Martin Grötschel, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), Trier, March 12, 2001: On the Road.


1 On the Road to Scientific Information Portals: Cooperative Digital Libraries
  Remarks, Visions, Proposals
  Martin Grötschel, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)
  IuK 2001, Universität Trier, March 12, 2001

2 Konrad-Zuse-Zentrum für Informationstechnik Berlin / Martin Grötschel
  Contents
  Introduction
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science
  IV. Integrating All Types of Resources
  V. We should Organize the Cyberspace
  VI. To the Benefit of our Society

3 Contents
  Introduction
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science
  IV. Integrating All Types of Resources
  V. We should Organize the Cyberspace
  VI. To the Benefit of our Society

4 Personal Motivation
  I have broad interests.
  I (have to) search a lot.
  I do find things I look for.
  However, this process costs too much time and money.
  The scientific information system could be much better.
  It seems that some scientists have to get involved.
  The situation is similar with respect to communication.

5 Acting Forces
  Science drives Technology
  Technology drives Change
  Change induces Pressure
  Some Consequences:
    Higher Speed and Efficiency
    Lower Costs
    Universal Connectivity
    More and Global Competition
  What does this imply for Science?

6 The World of Information
  Tons of Printed Material
  Zillions
    of Scientific Web Sites
    of E-Journals, E-Prints
    of Databases and CD-ROMs
    of Multimedia Documents
    of Digital Photos and Videos
    etc.

7 The Players
  The Author
  The Publisher
  The Librarian
  The Software Developer
  The Service Provider
  The Scientific Information Center
  The Scientific Society
  etc.
  and the User

8 Some Unsolved Issues
  Accessibility
  Searchability
  Stability
  Compatibility
  Pricing
  Heterogeneity
  Diversity and Complexity of Structures
  Quality
  Authenticity
  etc.

9 Solution
  Scientists have to get involved
  Solution must be user driven
  Cooperation of players
  Consensus about structures
  Some Suggestions in this Talk

10 Contents
  I. All Information is Part of the Web
    Can we make this true?

11 Current Mathematical Resources
  Papers and Preprints
  Journals and Books
  Reviews and Abstracts
  Software and Data Collections
  Projects and Persons
  Voice, Images, and Video
  Information Links, Mail, and Virtual Libraries

12 Math Papers and Preprints
  Preprints of the Math-Net
  MPRESS (including ArXiv math, ...)
  EULER
  Digital ACM

13 Math Journals and Books
  SUB Göttingen (Sondersammelgebiet, i.e. special subject collection)
  TIB Hannover (Technical Information Library)
  Uni Osnabrück
  EMIS
  Springer LINK
  DOCUMENTA MATHEMATICA
  Lehmanns.de

14 Math Reviews and Abstracts
  Zentralblatt
  AMS
  FIZ-Karlsruhe
  Jahrbuch der Mathematik

15 Math Software and Data Collections
  ANL
  ZIB
  Uni Paderborn
  Algebraic Groups
  Cinderella
  OpenMath

16 Projects and Persons
  Web Sites of Math Research Institutes
  Web Sites of Math Departments
  BerNAM
  Directory of ACM
  Comb. Membership List AMS, SIAM, MAA
  PERSONA
  mat-net.de
  math-net.de

17 Voice, Images, and Video
  Computer Museum
  MSRI Video Server
  Electronic Geometric Models
  Application Servers and Software
    MATHEMATICA
    Cinderella
    Inverse Calculator

18 Links, Mail, and Virtual Libraries
  mathematik.de
  Math-Net.de
  Mathematical Archives
  ZIB
  MathML

19 There are zillions of Math Resources in the Net.

20 The Situation is Similar in all other Sciences
  How do you know that all this material exists and where it is?
  Old Approach: Link Lists = WWW Virtual Libraries
  But much more has come up in recent years!

21 Is Everything in the Web?
  Printed Books
  Printed Journals
  CD-ROMs
  Some Databases
  Historic Archives
  Catalog Cards
  ... are not electronically available

22 Is Everything from the Web in the Web?

23 Contents
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web

24 The Invisible / Deep Web
  A fundamental Problem with Search Engines: A Vast Amount of Information is Invisible
  Surface Web / Web Robots
    Start at some Hubs
    Interlinked Web Pages
  Deep Web
    Isolated Web Sites
    There are huge Isolated Islands in the Web
    Information within Databases, behind CGI Interfaces
    Information without Links (e.g. within OPACs of Libraries)
    Protected Material, Excluded Explicitly
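
  Note: the surface/deep distinction on this slide can be illustrated with a small sketch. It is not taken from the talk; the page names and link structure are made up, and the "robot" is reduced to a plain breadth-first traversal that only follows hyperlinks from a set of hub pages. Anything without an inbound link chain from a hub (an OPAC record behind a query form, an isolated site) is never visited.

```python
from collections import deque

def crawl(link_graph, hubs):
    """Breadth-first traversal that follows hyperlinks from the given hub pages."""
    seen = set(hubs)
    queue = deque(hubs)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy web: keys are pages, values are their outgoing links.
toy_web = {
    "hub": ["dept-home", "preprint-list"],
    "dept-home": ["preprint-list"],
    "preprint-list": [],
    "opac-record-4711": [],              # reachable only through a query form
    "isolated-site": ["isolated-page"],  # an island: no link from any hub leads here
    "isolated-page": [],
}

surface = crawl(toy_web, ["hub"])   # what a link-following robot can index
deep = set(toy_web) - surface       # what stays invisible to it

print("surface:", sorted(surface))
print("deep:   ", sorted(deep))
```

  In this toy setting the robot indexes three pages and misses the other three, even though all six are publicly served.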

25 A Web Search Engine Collecting Visible Information
  From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan. 2000

26 A Direct Meta Search Engine Fishing for Invisible Information
  From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan. 2000

27 Characteristics of the Deep Web (in Comparison to the Visible Web)
  Public information on the Deep Web is currently 400 to 500 times larger than the commonly defined World Wide Web
  The Deep Web holds 7,500 terabytes of information (550 billion individual documents), compared to 19 terabytes (1 billion documents) on the Visible Web
  From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan. 2000
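
  Note: a quick check of the quoted figures, assuming decimal units (1 terabyte = 10^12 bytes; this assumption is not stated on the slide). By volume the ratio comes out just under 400, by document count it is 550. The implied average document size is an inference from the slide's numbers, not a figure given in the talk.

```latex
\[
\frac{7{,}500\ \text{TB}}{19\ \text{TB}} \approx 395,
\qquad
\frac{550 \times 10^{9}\ \text{documents}}{1 \times 10^{9}\ \text{documents}} = 550
\]
\[
\frac{7{,}500 \times 10^{12}\ \text{bytes}}{550 \times 10^{9}\ \text{documents}} \approx 13.6\ \text{kB per document (Deep Web)},
\qquad
\frac{19 \times 10^{12}\ \text{bytes}}{1 \times 10^{9}\ \text{documents}} = 19\ \text{kB per document (Visible Web)}
\]
```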

28 Characteristics of the Deep Web (in Comparison to the Visible Web)
  More than 100,000 Deep Web sites currently exist
  60 of the largest Deep Web sites collectively contain about 750 terabytes of information (... narrower, with deeper content)
  More than half of the Deep Web content resides in topic-specific databases (BrightPlanet concentrates on about 20,000 sites)
  A full 95% of the Deep Web is publicly accessible information, not subject to fees or subscriptions
  The Deep Web is the largest growing category of new information on the Internet.
  But the Deep Web is largely unknown.
  From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan. 2000

29 Making the Deep Web Visible
  Technology:
    Meta Search Engines
    Bibliographic Meta Search Engines
    Virtual Catalogs and Link Lists
  Organisational Issues:
    Building Networks of Digital Libraries
    Forming Library and other Cooperatives
    Working on Standards and Formats (Common, Open, Metadata, ...)

30 Categories of Information Systems
  Web Sites – Collection, Query Interface
  Publications – E-Journals, Preprints, ...
  Regional/Nat. Collections – Harvesting Systems
  Topical Databases – Subject Specific Aggregation
  OPACs – Library Holdings
  Journal Archives – Archive of Publishers
  Software/Data Collection – Commercial / Public Archive
  Compute Servers – Math. Calculations / Demos
  Mailing Lists/Archive – Topical Communication Forum
  Topical Portals – Wide Spectrum Information System

31 Problems: Wide Variety of Servers
  Problems with Search Engines (Web Robots)
    Impose High Load on Servers and Networks
    Misuse of Metadata
    Robots cannot see behind CGI Interfaces
    Access Rights, Range of Licenses
  Problems with Cascading Search Engines
    Diversity of data formats (MAB, MARC Formats, DC, ...)
    Multitude of protocols (Z39.50, HTTP, proprietary)
  Specialized Repositories and Archives
    Scientific Journals provided by Commercial Publishers
    Document Delivery Systems and Specialized Historic Archives
    Maps, Music, Photos, Videos, Multimedia
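
  Note: the "diversity of data formats" problem is usually handled by mapping each source format onto a common set of elements before results are merged. The sketch below is an illustration only, not something shown in the talk. The crosswalk tables are a deliberately tiny, hypothetical subset (title and main author), the records are placeholders, and real MARC or MAB records have subfields, indicators, and repeatable fields that this ignores.

```python
# Hypothetical, heavily simplified crosswalks from source fields to
# Dublin Core-style element names. Only title and main author are covered.
CROSSWALKS = {
    "marc": {"245": "title", "100": "creator"},
    "mab": {"331": "title", "100": "creator"},
    "dc": {"title": "title", "creator": "creator"},
}

def to_common(record, source_format):
    """Map a flat {field: value} record onto the common element names.

    Fields without an entry in the crosswalk are silently dropped.
    """
    crosswalk = CROSSWALKS[source_format]
    return {crosswalk[f]: v for f, v in record.items() if f in crosswalk}

# Placeholder records, one per format, all describing the same made-up item.
incoming = [
    ("marc", {"245": "Sample Title", "100": "Sample Author", "008": "ignored"}),
    ("mab", {"331": "Sample Title", "100": "Sample Author"}),
    ("dc", {"title": "Sample Title", "creator": "Sample Author"}),
]

unified = [to_common(record, fmt) for fmt, record in incoming]
for row in unified:
    print(row)
```

  Once every source is reduced to the same element names, duplicate detection and merged result lists become straightforward; the hard part in practice is the size and quality of the crosswalk tables.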

32 Contents
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science

33 Virtual/Digital Library
  Virtual
    Search index
    Links
    Metadata
    OPAC catalog entries
  Digital
    Structured digital contents
    Full texts
    Databases

34 Towards a Scientific Portal to Interconnect the Digital World
  [Diagram labels: Virtual Library; Digital Library; Scientific Library; Information Portal: Cooperative Virtual Digital Library]
  The Scientific Portal (Information Portal for the Sciences) is an Entry Point to all Types of Information Products from the Sciences.
  Behind the Scientific Portal is a Structured Network to be coordinated and organized by the Sciences in a cooperative way.
  A Task for the IuK Initiative?

35 Lots of Examples already exist

36 An Example in the Making
  Virtuelle Fachbibliothek Technik (Virtual Engineering Library) of TIB Hannover

37 Example: The DOE Information Bridge
  Started in 1997 with searchable full text reports
  DOE Office of Scientific and Technical Information (OSTI)
  Direct Search based on the Distributed Explorer developed by a small Internet Company: Innovative Web Application Ltd. (IWA)
  A public version in partnership with the Government Printing Office (GPO) of the USA
  Many other Federal Deep Web collections added to the DOE Virtual Library
    PubScience
    PubMed
    NTIS Electronic Catalog (450,000 Titles)
    NASA Technical Report Server
    Energy Portal Search
  Digitization efforts for Gray Literature (OSTI)

38 OSTI Virtual Library

39 PubScience

40 The GrayLit Information Network
  Graphic from: Searching The Deep Web; W.L. Warnick et al., D-Lib Magazine, Vol. 7, No. 1, January 2001

41 Preprint Network

42 DOE OSTI

43 Energy Portal Search

44 PubMed

45 NASA Image Exchange

46 Federal R & D Architecture
  Graphic from: Searching The Deep Web; W.L. Warnick et al., D-Lib Magazine, Vol. 7, No. 1, January 2001

47 An Observation
  The voluntary work contributed so far was, and will remain, important.
  There will, however, be no satisfactory solution without substantial investment in personnel and funding.
  We need to become more professional, e.g., Google versus Math-Net.

48 Contents
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science
  IV. Integrating All Types of Resources

49 Distributed Meta Search Engines Exist
  What they do:
    Query Search Engines, OPACs, Databases
    Perform Distributed Searches in Parallel
    Cascade Search to reach Large/Vast Amounts of Targets
    Deliver Links, Metadata, and/or Full Texts
    Handle a Diversity of Data Structures
    Use a Multitude of Internet/Web Protocols
    Structure Heterogeneous/Large Result Sets
  They Rely on a Series of Small Configuration Files
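
  Note: the core loop of such a meta search engine (send one query to several targets in parallel, then merge what comes back) can be sketched in a few lines. This is an illustrative toy, not the design of any system named in the talk. The target names, the `make_stub` helper, and the in-memory holdings are all hypothetical stand-ins for real search engines, OPACs, and databases, and the `TARGETS` table plays the role of the small configuration files mentioned on the slide.

```python
from concurrent.futures import ThreadPoolExecutor

def make_stub(holdings):
    """Return a search function over a fixed in-memory list of titles.

    Stands in for a real backend (search engine, OPAC, database).
    """
    def search(query):
        return [title for title in holdings if query.lower() in title.lower()]
    return search

# Hypothetical target configuration: name -> callable that answers a query.
TARGETS = {
    "opac": make_stub(["Sample Title A", "Sample Title B"]),
    "preprints": make_stub(["Sample Title B", "Sample Title C"]),
    "reviews": make_stub(["Unrelated Item"]),
}

def meta_search(query, targets):
    """Query all targets in parallel and merge hits by title.

    Returns a dict mapping each title to the list of targets that reported it.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        # Collect in configuration order so the merged result is deterministic.
        for name, future in futures.items():
            for title in future.result():
                merged.setdefault(title, []).append(name)
    return merged

results = meta_search("sample", TARGETS)
for title, sources in results.items():
    print(title, "<-", ", ".join(sources))
```

  A cascading setup would simply register another meta search engine as one of the targets; the merge step stays the same.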

50 Combination of Search Engines
  As studied by J. Lügger in: Über Suchmaschinen, Verbünde und die Integration von Informationsangeboten; ABI-Technik, June 2000
  [Diagram labels: Math-Net: Harvest+DC; KOBV Search Engine; Shared Index; Distributed Search; Shared Index; EULER and Dublin Core; DigiBib NRW]

51 A Potential Math Information Portal
  [Architecture diagram; recoverable labels follow]
  Access paths and protocols: Browser; HTTP; SI; DS; Z39.50; Z39.50 with UNIMARC; Z39.50 with MAB2; USMARC; DigiBib with KOBV; DigiBib with WebPack; DigiBib with Math-Net
  Sources:
    ZIB, Uni Köln: Sigma, NetLib, Software, Persona Mathematica
    Zentralblatt für Mathematik: MATH, MATHDI, Jahrbuch für Mathematik
    Universität Osnabrück: ELib, MPRESS
    Special Interest Groups of DMV: OPT-NET, IM-Net, IuK, ...
    Publishers and Software Houses: E-Journals, Software
    SUB Göttingen: OPAC SSG Mathematik
    TIB Hannover: TIB CAT
    CWI Amsterdam: OPAC Mathematics
    Mathematische Fachbereiche & Institute (mathematics departments and institutes): Specialized OPACs
    Library Cooperatives: BVB, GBV, HBZ, KOBV, ...
    Die Deutsche Bibliothek: Authority Data
    Publishers and Math Societies: Math-Journals and -Document
  Goals: Open, Distributed, Efficient, Scalable, Stable

52 Contents
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science
  IV. Integrating All Types of Resources
  V. We should Organize the Cyberspace
    Scientists should Organize the Scientific Cyberspace Cooperatively (Summary and Proposals)

53 Organizing the Cyberspace: Suggestions
  Partners for the information portal?
  Who should form the information portals?
  Organizational framework?
  Cooperative Digital Libraries
  Main Issues: Sustainability and Finance

54 Partners of the Information Portal
  Scientific Libraries, Scientific Archives
  Scientific Departments, Research Institutes
  Database / Content Providers
  Document Delivery Services
  Digitization Centers
  Scientific Societies
  Publishers
  Software Houses
  Data (Collecting) Centers

55 Suggestions for an Information Portal
  Open Digital Archives of Specialized Collections
  Scientific Suppliers Obtain Free Access
  High Quality Information and Services
  Robust/Commercial Software/Database
  Distributed/Heterogeneous Architecture
  Some Centralization is Necessary Too
  Emphasis on Reliable/Long-Term Availability
  Activities in Long-Term Archiving
  Supported by a Specialized Information Center/Library
  Cooperation with Scientific Societies
  Not-for-Profit and For-Profit do not exclude each other.

56 Suggestions for an Organizational Framework
  University Level (local)
    University Library, University Computing Center, University Media Center (in cooperation)
  Scientific Level (topical/national)
    Specialized Library / Information Center
    Consulted by a Scientific Society
    Editorial Topical Competence Center
  National Level
    National Competence Center for New Technologies
    Research and Development for Production
    Consultation
    Standardization / Coordination Activities
  A Topical Competence Center may be a Research Institute.

57 Key Problems
  No progress without substantial investment
  Long-term sustainability
  No progress without further research and development
  Institutionalization (the IuK Initiative can literally initiate, but cannot run the show)
  But the show must go on!

58 Contents
  I. All Information is Part of the Web
    Can we make this true?
  II. The Visible Web and the Deep Web
  III. There could be an Interconnected Network of Science
  IV. Integrating All Types of Resources
  V. We should Organize the Cyberspace
  VI. To the Benefit of our Society

59 Who Will Benefit
  Student: Access to Vast Amount of Materials
  Employee: Further Training, Lifelong Learning
  Teacher: Reuse of High Quality Materials
  Author: Publishing Cheap, Fast, and Widely
  Publisher: Open Sources Generate New Chances
  Business: More Profit from Applying Science
  Citizen: Contacting Research More Directly
  Science: Communicating with the Public
  Society: Free Flow of Information

60 The End

