Jukka Klem & Salvatore Mele | D4Science-II Kick-Off Meeting | Pisa 15 Oct 2009.

1 Jukka Klem & Salvatore Mele | D4Science-II Kick-Off Meeting | Pisa 15 Oct 2009

2 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

4 CERN: European Organization for Nuclear Research (since 1954)
World-leading HEP laboratory, Geneva (CH)
2500 staff (mostly engineers, administrators/services)
9000 users (physicists from 580 institutes in 85 countries)
3 Nobel prizes (Accelerators, Detectors, Discoveries)
Invented the web
Ready to re-start the 27-km (6bn€) LHC accelerator, “the big-bang machine”
Top management committed to Open Access
Runs a 1-million-object Digital Library
CERN Convention (1953): ante-litteram Open Access manifesto
“… the results of its experimental and theoretical work shall be published or otherwise made generally available”

5 INSPIRE team @ CERN
Being Recruited (IT) – 100% (API, grid-ification)
Jukka Klem (OA) – 80% (Applications)
Jean-Yves le Meur (IT) – Infra supervision
Tibor Šimko (IT) – Tech supervision
Tim Smith (IT) – Infra strategy & MGA
Salvatore Mele (OA) – Apps strategy & TB
TBC: Junior developer (OA/IT) – (Interface applications/API)

6 Who is INSPIRE ? Fermilab CERN DESY SLAC arXiv ADS Who are our buddies ? APS SISSA Elsevier Springer Which publishers do we talk to ? PDG Durham KEK World Scientific

7 ~15’000 High Energy Physics (HEP) scientists smash stuff at the speed of light to produce new stuff

8 ~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

9 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

10 The HEP “preprint culture”
L. Goldschmidt-Clermont, 1965, http://eprints.rclis.org/archive/00000445/02/communication_patterns.pdf
Scientific journals of the ‘60s too slow for HEP
Mass-mail preprints to institutes worldwide
Ante litteram (institute-pays) Open Access
CERN Library starts to index and display preprints
Leading research libraries “serve” preprints
CERN Library, circa 1960

11 Before e-mail and RSS...
L. Addis, 2002, http://www.slac.stanford.edu/spires/papers/history.html
SLAC Library (Stanford) maintains preprint lists
Sending lists to subscribers worldwide as of ‘62
Scientists then request preprints of interest
Published articles go on anti-preprint list
Indispensable working tool from ‘60s to ‘80s

12 SPIRES: first electronic catalogue
http://www.slac.stanford.edu/spires/papers/history.html
http://www-conf.slac.stanford.edu/interlab99/program/kunz/EarlyWeb.frame.pdf
SLAC Library, 1974: now 750’000 records
With Fermilab (US) and DESY (DE) Libraries
Electronic catalogue of preprint metadata
Updated with publication reference
First terminal login, then e-mail interface
Then the first web server in U.S.
Date: Fri, 13 Dec 91 17:55:53 GMT+0100
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Subject: WWW to SPIRES on SLACVM - Experimental
To: www-interest@cernvax.cern.ch, www-talk@cernvax.cern.ch
There is an experimental W3 server for the SPIRES High energy Physics preprint database, thanks to Terry Hung, Paul Kunz and Louise Addis of SLAC. It's only just been put up, so don't expect perfection. With the w3 line mode browser, follow a link to it from our home page,
- Tim
Paul Kunz wrote a few days ago:-
“The SLAC Library maintainer of SPIRES databases, Louise Addis, is absolutely delighted. She will ask for a permanent VM service machine and finish off the polishing. Things are really moving now.”

13 arXiv.org: the archetypal repository
P. Ginsparg, LANL, 1991. Now Cornell Library
E-mail based, then immediately on the web
No mandate, no debate, author-driven
1/2 million preprints. Growing beyond HEP
http://vmsstreamer1.fnal.gov/VMS_Site_03/Lectures/Colloquium/presentations/090506Ginsparg.pdf

14 Where do HEP scientists go for info?
Survey of 2’000+ scientists (10% of the community)
Library/community answers to info needs
Google as proxy of arXiv, SPIRES, publishers
Gentil-Beccot et al. arxiv:0804.2701

15 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

16 What more do users want?
Gentil-Beccot et al. arxiv:0804.2701
[Chart: features rated on a scale from “Not important” to “Very important”: Depth of coverage, Quality of content, Access to full text]

17 Where do users see the systems go?
Gentil-Beccot et al. arxiv:0804.2701
Seamless Open Access to pre-’90s articles
“Greyer” literature (laboratory reports)
Conference slides (linked with articles)
“Publication” of “ancillary” material:
– Data behind tables, figures
– Re-usable experimental data
Some sort of peer-review overlaid on arXiv
“Smarter” search tools

18 What would users give?
Gentil-Beccot et al. arxiv:0804.2701
Would users contribute to tagging articles?
Indexing and keywording in a Web2.0 world!
Immense potential to be harnessed
[Chart: fraction of answers by seniority in the field; series “Would contribute 30 minutes/week or more” and “Would not contribute”]

19 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

28 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

29 Building INSPIRE
http://www.projecthepinspire.net/
Joint project of CERN, DESY, Fermilab, SLAC
Switch off aging SPIRES infrastructure
Import 750’000+ records into an Invenio instance
Inherit 50’000+ users (60+ million searches/year)
Roll out 1Q10 (working on back-office tools)
Out of the box: totally new back-office, bi-directional feeds with arXiv and publishers

30 Releasing INSPIRE
http://www.projecthepinspire.net/
Medium term add-ons to INSPIRE (2Q10-4Q10)
Full-text searching warehouse, Open Access & Copyrighted
Author disambiguation (algorithm & web2.0)
Personal shelves, with annotations. Alerts
Drop-box for old preprints, theses … (advocacy campaign)
Widespread “drop”, describe and search non-text material
User generated tags (taxonomic & à la Flickr)
Thesaurus-based semantics, then folksonomy & ontology

31 Who is INSPIRE? Where does INSPIRE come from? How does HEP communicate? What do scientists want? What is Invenio? Where does INSPIRE go? How do we go there together?

32 Use computational power of e-Infrastructure to grow repository services
1. Back-office infrastructural services
2. Back-office content-analysis services
3. Novel front-line services

33 1. Back-office infrastructural services
I. Parallelization of full-text indexing
II. OCR’ing old holdings/new scanned submissions
III. “Gorilla” classification of content
IV. Text-mining for metadata and citation extraction
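
As an illustration of item I on the slide above, here is a minimal sketch of parallelized full-text indexing. It is not INSPIRE or Invenio code: the function names, the sample records and the use of a local thread pool are all assumptions made for the example. On a real grid each shard would presumably be shipped to a separate worker node, but the shape of the computation (split, index each shard independently, merge) would be the same.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tokenizer: lowercase alphanumeric runs only.
TOKEN = re.compile(r"[a-z0-9]+")


def index_shard(shard):
    """Build a partial inverted index (term -> set of record ids) for one shard."""
    partial = defaultdict(set)
    for record_id, text in shard:
        for term in TOKEN.findall(text.lower()):
            partial[term].add(record_id)
    return partial


def merge_indexes(partials):
    """Merge the per-shard indexes into a single inverted index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged


def build_index(records, n_shards=4):
    """Split records into shards, index the shards concurrently, then merge."""
    records = list(records)
    shards = [records[i::n_shards] for i in range(n_shards)]
    # A thread pool stands in for grid workers in this sketch.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(index_shard, shards))
    return merge_indexes(partials)


# Made-up sample records, for illustration only.
RECORDS = [
    (1, "Higgs boson searches at the LHC"),
    (2, "Lattice QCD and the proton mass"),
    (3, "Open Access publishing in HEP"),
]
INDEX = build_index(RECORDS, n_shards=2)
```

Because each shard is indexed without reference to the others, the only sequential step is the final merge, which is what makes the task a natural fit for a latency-tolerant batch infrastructure.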

34 2. Back-office content-analysis services
Clustering of “similar” records for
I. Discovery (if you want this you might want that)
II. Ranking (first result is what you want)
Nightly re-clustering holdings including daily updates:
1. User-generated tags
2. New additions with their metadata/citations/logs
Use citations, author network, tags, logs
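
The “if you want this you might want that” idea on the slide above can be sketched with a simple set-overlap measure. The slides do not say which similarity measure or clustering algorithm would be used, so the Jaccard similarity below is a swapped-in stand-in, and the feature strings (`cites:`, `tag:`, `author:`) and record ids are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_records(features, record_id, top_n=3):
    """Return up to top_n other records ranked by feature overlap with record_id."""
    target = features[record_id]
    scored = [
        (jaccard(target, feats), other)
        for other, feats in features.items()
        if other != record_id
    ]
    # Drop records with no overlap; break ties by record id for stable output.
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Made-up features mixing citations, tags and authors, as the slide suggests.
FEATURES = {
    "rec-A": {"cites:X", "cites:Y", "tag:higgs"},
    "rec-B": {"cites:X", "cites:Y", "tag:susy"},
    "rec-C": {"cites:Y", "tag:higgs", "author:smith"},
    "rec-D": {"cites:Z", "tag:neutrino"},
}
SUGGESTIONS = similar_records(FEATURES, "rec-A")
```

A nightly job along these lines would recompute the feature sets from the day's new tags, citations and logs before re-ranking, which is where the pairwise comparisons over the full holdings become expensive enough to benefit from distributed computing.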

35 3. Novel front-line services
Reqs: Impossible without a Grid, but latency tolerant
“Find me a mentor”
User uploads A4-size research synopsis
INSPIRE identifies appropriate mentor (or referee)
Depends on success of parallel semantic project

36 Metadata extraction … Indexing parallelization OCR’ing SWORD Maintain INSPIRE API develop+maintain Clustering Find-me-a-mentor Infra Infra services Apps

