Joint workshop on electronic publishing Beyond OAI-Services: Bielefeld Academic Search Engine (BASE) Dirk Pieper, Friedrich Summann Bielefeld University.


1 Joint workshop on electronic publishing Beyond OAI-Services: Bielefeld Academic Search Engine (BASE) Dirk Pieper, Friedrich Summann Bielefeld University Library

2 Overview
Part 1:
- Bielefeld UL: from meta search to search engines
- BASE: objectives, content, services
- Outlook and further information
Part 2:
- Backend, Frontend
- OAI dataflow, BASE dataflow
- OAI harvesting problems
- Further developments of BASE

3 Where we come from …

4 Where we come from …
- One central on-site library divided into groups of subject libraries
- 2 million books and other media items, the majority on open shelves
- Active registered users in 2004: 28,000
- 2,675 reader workplaces
- Budget for acquisitions in 2004: EUR 3,200,000 incl. special funds
- Journals: about 5,700 subscriptions
- Host of the International Bielefeld Conference series, a biennial conference that offers a major strategic discussion forum for library managers from all over Europe and beyond

5 From meta search to search engines (1)
Integrating heterogeneous information resources for users has always been a primary objective of UL Bielefeld.
Milestones:
- 1993: Introduction of the document delivery system JASON
- 1995: Development of IBIS, the first German library project for a cooperative electronic information supply
- 1998: Introduction of JASON-Subito. Online access to journals available in full-text versions (including through consortial agreements with publishers)
- 1998-2001: Main coordinator of the Digital Library NRW (a major grant of the NRW State Ministry)
- 2000: Combination of Digital Library NRW services and the library's local website in order to offer integrative services in corporate design
- 2002: Development of a net-based integrated learning and teaching environment (online learning) based on Blackboard, and of a university publications server (BieSOn) based on OPUS
- 2004: Launch of the Bielefeld Academic Search Engine (BASE) on the basis of FAST Data Search software

6 From meta search to search engines (2)
Integration at the level of the library's local system:
- OPAC: local holdings
- Institutional repository servers (OAI, with a focus on fulltext dissertations)
- Journal Article Database (JADE, about 39 million articles) in combination with document delivery (JASON, Elsevier-ppv, Subito)
- Inside Serials, Elsevier, Springer, JSTOR, …
- Meta search for several subject portals (Digital Library)

7 BASE: objectives, content, services (1)
First starting point: the reality of academic online information
[Diagram labels: web pages, subject databases, publishers' ejournals, library catalogues, institutional repository servers, search engine, digital libraries, portals, commercial providers, search]

8 BASE: objectives, content, services (2)
Second starting point: experience with meta search (Digital Library) and user studies:
- Users want a search engine look and feel
- Meta search environments are too slow compared to search engines like Google
- Little integration of fulltext resources
- Little integration of the “visible web”
Main objectives of BASE:
- to overcome the fragmentation of academic search information resources
- to use search & retrieval standards provided by search engine technology
- to provide comfortable search interfaces and flexible result presentation
- to handle both highly structured and unstructured data
- to create large shared indices for a new kind of “meta” search

9 BASE: objectives, content, services (3)
[Diagram: a search engine for academic online information, covering web pages, subject databases, publishers' ejournals, library catalogues and institutional repository servers]

10 BASE: objectives, content, services (4)

11 BASE: objectives, content, services (4): Projekt Gutenberg-DE Internet Library of Early Journals Oxford Various Institutional Repositories Springer Link Metadata Cornell HistMath Fulltext Crawl University Michigan Historical Math Biomed Central Project Euclid Zentralblatt Mathematik Bielefeld Univ: Math. Preprints OAI Verlag Krause und Pachernegg OPAC UL Bielefeld Bielefeld Univ: Documenta Mathematica Perseus Digital Library Zeitschriften der Aufklärung (Bielefeld UL) TIB Hannover MATH Collection

12 BASE: objectives, content, services (5)
Services provided by UL Bielefeld within BASE:
- Identification and selection of high-quality content repositories
- Contact and negotiations with content providers (universities, libraries, commercial content providers)
- Data aggregation, data pre-processing and data processing of internationally distributed and highly heterogeneous resources
- Data production (e.g. German Enlightenment, JADE, ...)
- Delivery of indexes in standardised formats (XML) for platform-independent re-use by other search engine providers
- Integration of BASE within meta search environments (e.g. SISIS-Elektra)
- Providing access to additional content in local OPAC environments

13 BASE: Outlook and further information (1)
The next steps:
- Leave the "demonstrator" status
- Increase the number of indexed OAI servers
- Integrate local library resources (OPAC and other databases)
- Integrate more commercial subject databases
- Increase fulltext indexing
- Make more use of FAST features

14 BASE: Outlook and further information (2)
- DLF Spring Forum New Orleans 2004: http://www.diglib.org/forums/Spring2004/
- Norbert Lossau: Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet, in: D-Lib Magazine, June 2004 (Volume 10, Number 6)
- Friedrich Summann, Norbert Lossau: Search engine technology and digital libraries: moving from theory to practice, in: D-Lib Magazine, September 2004 (Volume 10, Number 9)
- http://base.ub.uni-bielefeld.de

15 FAST based architecture and intelligent modifications
[Architecture diagram. Component labels: Search API; query & result processing pipeline; document processing pipeline; file traverser; filter; search; index files; connectors; web crawler; tuning, administration and debugging. Content sources: general web content and full text; OAI sources (metadata + docs); full text collections; database content (bibl. data)]

16 Added functionalities: Connectors
[Diagram: connectors for general web content and full text; OAI sources (metadata + docs); full text collections; database content (bibl. data)]

17 OAI dataflow
[Diagram labels: OAI data; harvesting; BASE internal index (FAST); OPAC; article database; dissertations, monographs (fulltext); articles (fulltext); PubMed, Euclid, ArXiv, CiteSeer, Citebase, DOAJ articles; all resources (texts, images, video, references, ....)]
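The harvesting step in this dataflow relies on OAI-PMH requests. The following is a minimal sketch of how a ListRecords request URL could be built; it is not taken from the BASE implementation. The function name `list_records_url` and the repository address are hypothetical; the `verb`, `metadataPrefix` and `resumptionToken` arguments are those defined by the OAI-PMH protocol.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# The repository address below is a placeholder, not a real OAI server.
first_page = list_records_url("http://repository.example.org/oai")
next_page = list_records_url("http://repository.example.org/oai",
                             resumption_token="abc/1")
```

A harvester would request the first page, read the resumption token from the response if one is present, and repeat with that token until the repository returns none.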

18 BASE dataflow
[Diagram: OAI data, web pages and database records pass through harvesting, pre-processing and processing into the internal index (FAST), which serves the user interface (PHP)]
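The stages of this dataflow can be illustrated with a toy pipeline. This is only a sketch under simplifying assumptions: the in-memory dictionary stands in for the FAST index, the record layout (`id`, `title`) is invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def preprocess(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Pre-processing: normalise whitespace, drop records lacking id or title."""
    for record in records:
        cleaned = {key: " ".join(value.split()) for key, value in record.items()}
        if cleaned.get("id") and cleaned.get("title"):
            yield cleaned


def process(records: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, List[str]]]:
    """Processing: turn each record into (id, lower-cased title terms)."""
    for record in records:
        yield record["id"], record["title"].lower().split()


def build_index(docs: Iterable[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Indexing: a plain inverted index mapping term -> set of record ids."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, terms in docs:
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


# Stand-in for harvested data; the identifiers are made up.
harvested = [
    {"id": "oai:example:1", "title": "  Search   engines in libraries "},
    {"id": "oai:example:2", "title": "Digital libraries"},
    {"id": "", "title": "Record without identifier"},
]
index = build_index(process(preprocess(harvested)))
```

Looking up `index["libraries"]` then returns both valid record identifiers, while the record without an identifier never reaches the index.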

19 OAI university repositories in BASE
[Map showing the number of repositories per country. Labelled counts: USA 34, Canada 7, Australia 8. Counts whose country labels are not recoverable: 1, 11, 6, 4, 27, 9, 3, 22, 3, 1, 9, 1, 6, 3, 2]

20 OAI harvesting problems
- Non-responding repositories
- Only references without fulltext
- Restricted access
- Invalid character set (not well-formed)
- Varying field content
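Two of these problems, responses that are not well-formed and records with missing fields, can be handled defensively when parsing. The sketch below uses Python's standard `xml.etree.ElementTree` and the OAI-PMH and Dublin Core namespace URIs; the function name `parse_records`, the choice of fields and the sample documents are assumptions made for illustration, not the BASE implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_records(xml_text: str) -> Optional[List[Dict[str, List[str]]]]:
    """Return one dict of field values per record, or None if not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed: signal it so the repository can be flagged or retried.
        return None
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        fields: Dict[str, List[str]] = {}
        for name in ("title", "creator", "identifier"):
            # Missing or empty elements simply yield an empty list.
            fields[name] = [el.text.strip()
                            for el in rec.iterfind(f".//dc:{name}", NS)
                            if el.text and el.text.strip()]
        records.append(fields)
    return records


# Invented sample responses for illustration.
GOOD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
BROKEN = "<OAI-PMH><ListRecords><record></ListRecords>"

good_result = parse_records(GOOD)
broken_result = parse_records(BROKEN)
```

Returning `None` for a broken response, rather than raising, lets a batch harvest continue with the next repository and report the failures afterwards.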

21 OAI Harvesting: Problems in Practice (Examples 1)
Field values from harvested records:
- http://elib.suub.uni-bremen.de/publications/ ELibD905_diplom_allnoch.pdf
- Barry Wellman,Jeffrey Boase,Kakuko Miyata
- Barry Wellman,Jeffrey Boase,Kakuko Miyata
- The Mobile-izing....
- Talk
- P. Bruzzone
- Bruzzone Pierluigi
- Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
- 2004-07-05
- Review
- http://www.rbej.com/content/2/1/52
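Values like these show why harvested fields need normalisation before indexing. The following sketch illustrates two possible clean-up steps: splitting and de-duplicating creator strings, and classifying identifier values. All function names are hypothetical, the DOI pattern is a common heuristic rather than a complete definition, and splitting on commas is a deliberate simplification that would wrongly split names written as "Last, First".

```python
import re
from typing import List, Optional

# Heuristic DOI pattern (an assumption; it does not cover every valid DOI).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def split_creators(values: List[str]) -> List[str]:
    """Split comma-joined creator strings and drop repeated names, keeping order."""
    names: List[str] = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names


def extract_doi(value: str) -> Optional[str]:
    """Return the first DOI-like substring, or None if there is none."""
    match = DOI_PATTERN.search(value)
    return match.group(0) if match else None


def classify_identifier(value: str) -> str:
    """Label an identifier value as 'url', 'doi' or 'other'."""
    if value.startswith(("http://", "https://")):
        return "url"
    if extract_doi(value) is not None:
        return "doi"
    return "other"


creators = split_creators(["Barry Wellman,Jeffrey Boase,Kakuko Miyata",
                           "Barry Wellman,Jeffrey Boase,Kakuko Miyata"])
citation = "Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52"
```

Here `classify_identifier(citation)` yields `"doi"`, so a citation string stored in an identifier field can still be linked, while plain URLs are passed through unchanged.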

22 OAI Harvesting: Problems in Practice (Examples 2)
http://www.forex.uni-bremen.de/cgi-bin/forex2/user/publish?search=sqn&sqn=00005223

23 BASE homepage

24 Advanced Search form

25 Result Presentation

26 Further Development (1): Frontend
- Combining metadata record and corresponding fulltext in result display [Done]
- Search history [Done]
- Truncation [Done]
- Flexible templating (customised views)
- Improvement of the search interface (based on search API)
- Refinement on data deliverer

27 Local view on BASE

28 Subject index browsing

29 Author index browsing

30 Further Development (2): Backend
- Automation of harvesting and content preprocessing
- Federated search, linking with external indexes
- Search result improvement (ranking, boosting, linguistics)
- Performance optimisation
- Support of standard protocols (Z39.50, OAI, SOAP) as a target system

31 Further Visions
- Integrating XML queries
- Link topology analysis
- Citations analysis
- Automatic linguistic analysis of anchor texts
- Push services
- Personalized ranking
- Cross-language information retrieval

