H.B. O'Connell HEP Info Summit DESY May 2008

1 H.B. O'Connell HEP Info Summit DESY 20-21 May 2008
Goals for HEP
Heath O’Connell

2 What do physicists want from us?
Access to full text (strongest interest). Search accuracy: finding the right information. Coverage: one central place with everything. Citation analysis: co-citation, author citation, etc. Conference proceedings. Experimental and theoretical results: all instances of a result (notes, conference papers, articles). Access to the data in tables and figures. Computer codes. Published comments and replies.

3 How to achieve this? Three corners of a triangle
Scientific community: inSPIRE uses its portal role to involve the scientific community. Author identification, metadata sharing (CrossRef), peer review, published articles. Traditional information curation. Publishers: APS, Elsevier, Springer, IEEE. Information resources: inSPIRE, arXiv, ADS, PDG. A sliding scale runs along this axis.

4 User-generated content
Poll: if a simple web interface showed you an article and offered a set of categories to which it could belong, how much time would you spend in this tagging system to provide a service to the community? The effort (in FTE) that scientists are willing to give exceeds the current library staff. An incredible community response, backed by incomparable scientific knowledge. How can we harness this important resource? Web 2.0! We would share this information with anyone.

5 User-generated content for SPIRES 1998-2008
12,000 HEPNAMES records verified. 10,000 article reference lists added. 10,000 articles added. 1,300 job listings added. Scientists contributed 3 FTE years (5 min per item). Staff spent 1/2 FTE year just cutting and pasting this information into the database (1 min per item).

6 Harnessing the hidden workforce
Tagging, correcting and updating records must be easy, automatic and standardized. Simple login interface, restricted to the community. Drop-down menus linked to authority files, e.g. author names, experiments, institutions, journal names and keywords. Obvious and rational conventions. Commenting on, ranking and enriching records.
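A drop-down backed by an authority file amounts to accepting only values from a controlled list. The Python sketch below illustrates that idea under stated assumptions: the authority file is a hypothetical in-memory list of experiment names, and `normalize` and `resolve_tag` are illustrative helpers, not part of any SPIRES or inSPIRE interface.

```python
from typing import Dict, Optional

# Hypothetical authority file of experiment names (illustrative entries only).
AUTHORITY_EXPERIMENTS = ["ATLAS", "CMS", "BaBar"]


def normalize(value: str) -> str:
    """Remove all whitespace and case-fold, so trivial variants compare equal."""
    return "".join(value.split()).casefold()


# Index each canonical name by its normalized form, as a drop-down backend might.
_INDEX: Dict[str, str] = {normalize(name): name for name in AUTHORITY_EXPERIMENTS}


def resolve_tag(user_value: str) -> Optional[str]:
    """Return the canonical authority-file entry, or None if the value is not listed."""
    return _INDEX.get(normalize(user_value))


print(resolve_tag(" atlas "))  # ATLAS
print(resolve_tag("Atlass"))   # None: not in the authority file
```

Rejecting unlisted values, rather than storing free text, is what keeps community tagging standardized.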

7 A Web 2.0 Approach for Records

8 A Web 2.0 Approach (continued)

9 Access to Full-text
Poll: 85% rated access to full text “very important” (the highest-rated issue). Older preprints might require scanning: time-intensive, but worthwhile. Immediate interest in Fermilab’s scanning of 1950s MURA reports. SLAC scanned “The Two Mile Accelerator” with permission and posted it, drawing a huge number of hits. CERN put 12,000 CERN-TH preprints on the web. Importing scans from KEK. Scanning at CERN from the collection; asking authors for copies. Asking publishers for permission to scan papers. Importing from publishers by mutual agreement.

10 Full-text: A role for authors
Many authors have pre-arXiv TeX files or paper preprints they could scan. There is potential for thousands of papers to be uploaded to the web. Personal web pages are not archivally stable. What is the most sensible way to do this?

11 Older preprints are not being deposited at arXiv
arXiv’s submission date stamp and “new” listings are important features for establishing precedence and distributing the latest scholarship, so authors are reluctant to submit older papers. Of 50K+ hep-ph papers, only 29 date from before 1991; of 40K+ hep-th papers, only 37. The same problem affects Ph.D. theses.

12 Need a place for older preprints
inSPIRE will establish a drop box for authors’ older preprints and other unpublished material. This would need an automatic way to link an upload to its record in INSPIRE, or to create a record in INSPIRE if none exists: “Click here to upload full-text of paper.”
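The “link to an existing record, or create one” step is essentially a get-or-create lookup. A minimal sketch, assuming a hypothetical in-memory `RecordStore` that matches first on a report number and then on a crudely normalized title; the class, its methods and the report number are invented for illustration, and a production system would need much more careful matching to avoid duplicate records:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    recid: int
    title: str
    report_number: Optional[str] = None
    fulltext_files: List[str] = field(default_factory=list)


def title_key(title: str) -> str:
    """Crude match key: lowercase alphanumeric words joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


class RecordStore:
    """Hypothetical in-memory stand-in for the literature database."""

    def __init__(self) -> None:
        self.records: Dict[int, Record] = {}
        self._by_report: Dict[str, int] = {}
        self._by_title: Dict[str, int] = {}
        self._next_id = 1

    def find(self, title: str, report_number: Optional[str] = None) -> Optional[Record]:
        # Prefer an exact report-number match, then fall back to the normalized title.
        if report_number and report_number in self._by_report:
            return self.records[self._by_report[report_number]]
        recid = self._by_title.get(title_key(title))
        return self.records[recid] if recid is not None else None

    def upload_fulltext(
        self, title: str, filename: str, report_number: Optional[str] = None
    ) -> Tuple[Record, bool]:
        """Attach an uploaded file to the matching record, creating the record
        if none exists. Returns (record, created)."""
        record = self.find(title, report_number)
        created = record is None
        if record is None:
            record = Record(self._next_id, title, report_number)
            self._next_id += 1
            self.records[record.recid] = record
            self._by_title[title_key(title)] = record.recid
            if report_number:
                self._by_report[report_number] = record.recid
        record.fulltext_files.append(filename)
        return record, created


store = RecordStore()
rec1, created1 = store.upload_fulltext(
    "An Old Preprint on Chiral Symmetry", "scan-1.pdf", "HYPO-REPORT-001"
)
rec2, created2 = store.upload_fulltext("an old preprint on chiral symmetry!", "scan-2.pdf")
print(created1, created2, rec2.fulltext_files)  # True False ['scan-1.pdf', 'scan-2.pdf']
```

The second upload lands on the record created by the first because both titles reduce to the same key.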

13 Better organizing the data
Scientists use limited search criteria: author, title, citation. It is difficult to find all papers on a topic. Powerful classification is non-trivial: there are hundreds of thousands of terms and articles, and assignment cannot be done by non-scientists. A taxonomy is required to enable, e.g., automatic classification from full text, improved search tools and display of results, finding related articles, assessing article relevance and finding reviews.
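One of the uses listed above, finding related articles, can be illustrated once articles share keywords from a common taxonomy. A minimal sketch, assuming made-up articles and keyword sets and using Jaccard similarity (shared keywords divided by all keywords of the pair) as one simple, illustrative choice of measure; the talk does not specify a method:

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target: str, keywords: Dict[str, Set[str]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank the other articles by keyword overlap with `target`, best first."""
    scored = [
        (other, jaccard(keywords[target], kws))
        for other, kws in keywords.items()
        if other != target
    ]
    # Drop articles with no shared keyword; break ties alphabetically for stable output.
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Made-up articles tagged with terms from a shared taxonomy.
articles = {
    "paper-A": {"symmetry breaking", "supersymmetry", "Higgs particle"},
    "paper-B": {"supersymmetry", "Higgs particle"},
    "paper-C": {"lattice field theory"},
    "paper-D": {"symmetry breaking", "chiral"},
}

for name, score in related("paper-A", articles):
    print(name, round(score, 2))  # paper-B 0.67, then paper-D 0.25
```

This only works if the keywords are standardized: author-chosen synonyms for the same concept would never overlap.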

14 Different schemes exist
PACS. Field codes: corresponding to the arXiv classification. Keywords: author keywords (non-standardized) and DESY keywords (DESY-KW). Other keyword taxonomies exist, e.g. INSPEC. All are already implemented in INSPIRES.

15 HEP Taxonomy
Evolving from the DESY thesaurus, it contains related, narrower, broader and composite descriptions, comments and synonyms. It enables automated keywording by text mining: a recommendation system (already implemented) proposes keywords, which physicists then select and supplement.
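Broader terms need not be stored separately: they can be derived by inverting the narrower relation. A minimal sketch of such a thesaurus structure, assuming a hypothetical in-memory `Entry` type and using terms from the keyword example on slide 16:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    term: str
    alt: List[str] = field(default_factory=list)       # synonyms / alternative forms
    related: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)


def broader_index(entries: Dict[str, Entry]) -> Dict[str, Set[str]]:
    """Invert the narrower relation: term -> set of its directly broader terms."""
    broader: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries.values():
        for child in entry.narrower:
            broader[child].add(entry.term)
    return dict(broader)


def all_broader(term: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """Collect every broader term transitively; the `seen` set guards against cycles."""
    seen: Set[str] = set()
    stack = list(broader.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
    return seen


thesaurus = {
    "symmetry": Entry("symmetry", related=["invariance"],
                      narrower=["supersymmetry", "symmetry breaking"]),
    "symmetry breaking": Entry("symmetry breaking", alt=["broken symmetry"],
                               related=["violation"],
                               narrower=["dynamical symmetry breaking",
                                         "spontaneous symmetry breaking"]),
}

index = broader_index(thesaurus)
print(sorted(all_broader("spontaneous symmetry breaking", index)))  # ['symmetry', 'symmetry breaking']
```

Deriving "broader" from "narrower" keeps the two directions of the hierarchy from drifting out of sync as the taxonomy evolves.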

16 KW Example
symmetry breaking. Alt: broken symmetry. Related: violation. Narrow: dynamical symmetry breaking; spontaneous symmetry breaking.
symmetry. Related: invariance. Narrow: asymmetry; hidden symmetry; horizontal symmetry; kappa symmetry; supersymmetry; symmetry breaking. Composite: symmetry: Becchi-Rouet-Stora; symmetry: Lorentz; symmetry: O(N); symmetry: SU(3) x SU(2) x U(1) x U(1); symmetry: chiral; time: symmetry; …: …
violation. Alt: violat\w+; non[-\s]*conserv\w+. Related: symmetry breaking.
There is a wealth of information, especially in the composite keywords: there are 100 composites with ‘symmetry’.
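Some alt forms in this example, such as `violat\w+` and `non[-\s]*conserv\w+`, look like regular expressions, which hints at how text mining can propose keywords. A minimal sketch, assuming each alt form is either a regular expression or a literal synonym and that a match only proposes a keyword for a physicist to confirm; the `ALT_FORMS` encoding and helper names are hypothetical, not the actual recommendation system:

```python
import re
from typing import Dict, List, Tuple

# Hypothetical encoding of two entries from the slide 16 example:
# each alternative form is (text, is_regex).
ALT_FORMS: Dict[str, List[Tuple[str, bool]]] = {
    "violation": [(r"violat\w+", True), (r"non[-\s]*conserv\w+", True)],
    "symmetry breaking": [("symmetry breaking", False), ("broken symmetry", False)],
}


def build_matchers(alt_forms: Dict[str, List[Tuple[str, bool]]]) -> Dict[str, re.Pattern]:
    """Compile one case-insensitive matcher per keyword; literal forms are escaped."""
    matchers = {}
    for keyword, forms in alt_forms.items():
        parts = [text if is_regex else re.escape(text) for text, is_regex in forms]
        # \b anchors each alternative at a word start, so "inviolate" is not matched.
        matchers[keyword] = re.compile(r"\b(?:" + "|".join(parts) + r")", re.IGNORECASE)
    return matchers


def propose_keywords(text: str, matchers: Dict[str, re.Pattern]) -> List[str]:
    """Keywords whose alternative forms occur in the text, for a physicist to review."""
    return sorted(kw for kw, rx in matchers.items() if rx.search(text))


matchers = build_matchers(ALT_FORMS)
print(propose_keywords("We discuss CP violation and a broken symmetry of the vacuum.", matchers))
# ['symmetry breaking', 'violation']
print(propose_keywords("Baryon number nonconservation in the early universe", matchers))
# ['violation']
```

Treating matches as proposals rather than final assignments mirrors the workflow on slide 15, where physicists select and supplement what the system suggests.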

17 Improving the Implementation
Scientists help through Web 2.0. Automated assignment of field codes. Linguistic algorithms. Intelligent search tools.

18 Towards common goals
SPIRES currently takes feeds from APS, arXiv, Elsevier, IEEE, IOP and Springer. 85% of scientists use community resources to find an article and access the full text (Google accounts for another 12%, probably through full-text searching). Publishers and libraries have a key role here: working together to ensure that the system is comprehensive and that scientists can find articles easily. inSPIRE will receive a large amount of metadata from authors, which we want to share with others. How do we do this without duplication?

19 Who’s Who of Science
Identifying authors is an increasing problem: collaborations with hundreds of people, ambiguous names, many languages. APS has recently introduced an innovative system for Chinese names. We want to find a single author’s papers to judge scientific output. There is a clear need for an author ID system. HEPNAMES: 12,000 verified records. It requires constant updating and help from the scientific community.
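An author ID decouples identity from the name string: several name variants can map to one ID, and one ambiguous variant can map to several IDs. A minimal sketch, assuming a hypothetical HEPNAMES-like registry; the IDs, names and the `name_key` normalization are invented for illustration:

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set


def name_key(name: str) -> str:
    """Hypothetical match key 'last, f.': accents stripped, lowercased, first initial only."""
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in plain:
        last, first = (part.strip() for part in plain.split(",", 1))
    else:
        # Assumes Western "First Last" order, which fails for e.g. many Chinese names.
        parts = plain.split()
        if not parts:
            return ""
        last, first = parts[-1], " ".join(parts[:-1])
    return f"{last.lower()}, {first[:1].lower()}."


# Hypothetical registry: author ID -> known name variants.
REGISTRY: Dict[str, Set[str]] = {
    "AUTHOR-0001": {"Müller, Hans", "H. Muller"},
    "AUTHOR-0002": {"Müller, Heike"},
    "AUTHOR-0003": {"Zhang, Wei"},
}


def build_index(registry: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each normalized name key to every author ID that uses it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for author_id, variants in registry.items():
        for variant in variants:
            index[name_key(variant)].add(author_id)
    return dict(index)


def candidates(name: str, index: Dict[str, Set[str]]) -> List[str]:
    """All author IDs a printed name could refer to; more than one means ambiguity."""
    return sorted(index.get(name_key(name), set()))


index = build_index(REGISTRY)
print(candidates("H. Müller", index))  # ['AUTHOR-0001', 'AUTHOR-0002']
```

The ambiguous lookup is the point: "H. Müller" cannot be resolved from the name alone, whereas a paper tagged with a stable ID needs no guessing.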

20 HEPNAMES record: author identified, community standard ID needed

21 Data from plots and figures
Refined, published data, such as a plot of cross-section vs. energy, are easily understood and useful for fitting with phenomenological models. They could be uploaded with the paper as a text file; otherwise, software can accurately extract numbers from figures. The Durham REACTIONS database has over 5,000 records for papers with data. We want to develop “Google Table” and “Google Plot” (like Google Images).
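Extracting numbers from a figure usually comes down to calibrating each axis from two known tick marks and mapping pixel positions to data values, linearly or logarithmically (cross-sections are often plotted on a log scale). A minimal sketch of that calibration step, assuming the pixel coordinates of the data points have already been located; the tick positions and units are made up:

```python
import math
from typing import Callable, List, Tuple


def axis_mapper(p1: float, v1: float, p2: float, v2: float,
                log: bool = False) -> Callable[[float], float]:
    """Map a pixel coordinate to a data value, calibrated from two reference
    ticks (pixel p1 -> value v1, pixel p2 -> value v2)."""
    if p1 == p2:
        raise ValueError("calibration ticks must be at different pixels")
    if log:
        if v1 <= 0 or v2 <= 0:
            raise ValueError("a log axis needs positive tick values")
        # Interpolate linearly in log10(value), then convert back.
        a, b = math.log10(v1), math.log10(v2)
        return lambda p: 10 ** (a + (p - p1) * (b - a) / (p2 - p1))
    return lambda p: v1 + (p - p1) * (v2 - v1) / (p2 - p1)


def digitize(points_px: List[Tuple[float, float]],
             x_axis: Callable[[float], float],
             y_axis: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Convert (x_pixel, y_pixel) points into (x_value, y_value) data points."""
    return [(x_axis(px), y_axis(py)) for px, py in points_px]


# Made-up calibration: x ticks at pixels 100 and 500 mark 0 and 200 GeV;
# y ticks at pixels 400 and 100 mark 1 and 1000 pb (image y grows downward).
x_axis = axis_mapper(100, 0.0, 500, 200.0)
y_axis = axis_mapper(400, 1.0, 100, 1000.0, log=True)
print(digitize([(300, 250)], x_axis, y_axis))  # x = 100.0 GeV, y ≈ 31.6 pb
```

Two ticks per axis are enough because each mapping is linear in either the value or its logarithm.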

22 Conclusion
Unification of the HEP literature in all its forms. Author names, etc., standardized with IDs. A new standardized taxonomy organizes the literature. Web 2.0 harnesses willing scientists. Putting resources in common to advance the common good: publishers, inSPIRE, ADS and arXiv work together and develop synergies to present the scientific literature to the community. Future plans evolve with the HEP community.