Seamless Searching of Numeric and Textual Resources Funded by a National Library Leadership Grant from the Institute of Museum and Library Services Michael.


1 Seamless Searching of Numeric and Textual Resources Funded by a National Library Leadership Grant from the Institute of Museum and Library Services Michael Buckland, Aitao Chen, Fredric Gey and Ray Larson Friday Afternoon Seminar, Feb 14, 2003 http://metadata.sims.berkeley.edu/papers/SeamlessSearchFinalReport.pdf

2 From numbers to texts: Iritani, Evelyn. "Normalizing ties to Vietnam important steps for U.S. firms; California stands to profit handsomely when barriers fall to trade with fast-growing country." Los Angeles Times v114 (July 12, 1995):D1. An article found using the keywords “Import” and “Vietnam” as the query.

3 From text to numbers: "U.S. bans import of most European meat". Los Angeles Times v116, n14 (Dec 14, 1997):A22. (On fear of mad cow disease.) "Ban on cattle and sheep is extended to all Europe." New York Times v147, sec1 (Dec 14, 1997):16(N), 42(L). (The U.S. Agriculture Department responds to threat of 'Mad Cow' disease.) Topic of interest: imports of beef to the United States from Britain. The sources at http://govinfo.kerr.orst.edu/import/import.html show: No reported edible beef imports from the United Kingdom.

4 Seamless Search Project Goals: Phase I: The development and demonstration of a library gateway supporting searches of both text and socio-economic numeric databases. Phase II: The demonstration of a library gateway supporting searches between text and numeric databases.

5 Data Sets to create Entry Vocabulary Indexes: MELVYL MARC Files
A sample training record extracted from a MARC record:
73180254
Book title: A study of operant conditioning under delayed reinforcement in early infancy
LC Subject Headings: Infant psychology. Operant conditioning.
Number of MARC records in the training data set: ~4,246,000.

6 Statistical association of title words and LCSH
[Diagram: title words linked to LCSHs through the documents they share.]
Title words: behavior, infant, infancy, psychology, child, attitude, baby, development
Doc IDs: doc1, doc2, doc3, doc4, doc5
LCSHs: Infant psychology; Operant conditioning; Infant development; Psychology; Parent and child
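The slide does not name the statistic used to associate title words with subject headings. As a minimal sketch, assuming a log-likelihood ratio (G²) over document co-occurrence counts, an index of this kind could be built as follows. The function names and the toy records are hypothetical; only the first title and its two headings come from the sample training record on slide 5.

```python
import math
import re
from collections import Counter

# Toy training data: (book title, LC subject headings). Only the first
# record comes from the slides; the others are made up for illustration.
toy_records = [
    ("A study of operant conditioning under delayed reinforcement in early infancy",
     ["Infant psychology", "Operant conditioning"]),
    ("Infant behavior and development", ["Infant psychology", "Infant development"]),
    ("Parent and child attitudes", ["Parent and child"]),
    ("Baby development", ["Infant development"]),
    ("Psychology of the infant", ["Infant psychology", "Psychology"]),
]

def build_counts(records):
    """Count, per document, how often each word, heading, and pair occurs."""
    word_df, head_df, pair_df = Counter(), Counter(), Counter()
    n_docs = 0
    for title, headings in records:
        n_docs += 1
        words = set(re.findall(r"[a-z]+", title.lower()))
        heads = {h.lower() for h in headings}
        word_df.update(words)
        head_df.update(heads)
        pair_df.update((w, h) for w in words for h in heads)
    return n_docs, word_df, head_df, pair_df

def _term(observed, expected):
    # Contribution of one contingency-table cell; empty cells add nothing.
    return observed * math.log(observed / expected) if observed > 0 else 0.0

def g2(k11, word_n, head_n, n_docs):
    """Log-likelihood ratio for the 2x2 word/heading contingency table."""
    cells = (
        (k11, word_n, head_n),                                     # word and heading
        (word_n - k11, word_n, n_docs - head_n),                   # word only
        (head_n - k11, n_docs - word_n, head_n),                   # heading only
        (n_docs - word_n - head_n + k11, n_docs - word_n, n_docs - head_n),
    )
    return 2.0 * sum(_term(k, row * col / n_docs) for k, row, col in cells)

def rank_headings(word, counts, top=5):
    """Return (heading, weight) pairs positively associated with `word`."""
    n_docs, word_df, head_df, pair_df = counts
    scored = []
    for (w, h), k11 in pair_df.items():
        if w != word:
            continue
        expected = word_df[w] * head_df[h] / n_docs
        if k11 > expected:  # keep only pairs that co-occur more than chance
            scored.append((h, g2(k11, word_df[w], head_df[h], n_docs)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]
```

Calling `rank_headings("infant", build_counts(toy_records))` returns the headings that co-occur with "infant" more often than chance, strongest first. With roughly 4.2 million training records the same counts would need to be accumulated in a database or on disk rather than in memory.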

7 Word to LCSH Entry Vocabulary Index (EVI)
List of the LCSHs that are most closely associated, statistically, with the query word: alcoholism.
Rank  LCSH                              Weight
1     alcoholism                        7470.46
2     alcoholic                         1745.23
3     alcohol                            709.26
4     alcoholism and employment          318.26
5     drug abuse                         257.75
6     alcohol, ethyl                     235.13
7     drinking of alcoholic beverages    151.46
8     substance abuse                    146.04

8 Words to LCSH Entry Vocabulary Index (EVI)
List of LCSHs that are most closely associated, statistically, with the German query word: Wirtschaftspolitik.
Rank  LCSH               Weight
1     economic policy    756.90
2     german (west)      645.02
3     switzerland         97.70
4     regional planning   96.39
5     economics           92.14
Note: The top-ranked LCSH “economic policy” happens to be the English translation of the German word “Wirtschaftspolitik”.

9 Words to LCSH Entry Vocabulary Index (EVI)
List of LCSHs that are most closely associated, statistically, with the phrase peanut butter as a query.
Rank  LCSH                      Weight
1     peanut                    1343.90
2     cookery (peanut butter)    429.61
3     cookery (peanuts)          423.47
4     peanut industry            359.57
5     peanut butter              316.23
6     butter                     309.36
7     schulz, charles m          277.30
8     cookery                    197.08
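The ranking for a multi-word query such as peanut butter mixes headings tied to each word (for example "butter" and "schulz, charles m"). The slides do not state how per-word weights are combined; the sketch below simply assumes they are summed per heading. The function name, the `toy_evi` structure, and all weights in it are made up for illustration and are not the values shown on the slide.

```python
from collections import defaultdict

# Hypothetical word -> {heading: weight} index with invented weights.
toy_evi = {
    "peanut": {"peanuts": 9.0, "cookery (peanut butter)": 3.0, "schulz, charles m": 2.5},
    "butter": {"butter": 6.0, "cookery (peanut butter)": 4.0},
}

def rank_for_query(query, evi, top=5):
    """Sum each heading's weight over the query words (an assumed rule)."""
    totals = defaultdict(float)
    for word in query.lower().split():
        for heading, weight in evi.get(word, {}).items():
            totals[heading] += weight
    # Highest total first; ties broken alphabetically for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Under this rule a heading supported by both words, such as "cookery (peanut butter)", rises above headings supported by only one of them at a similar weight, while words missing from the index are ignored.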

10 Words to LCSH Entry Vocabulary Index (EVI)
List of LCSHs that are most closely associated, statistically, with the query: Vietnam War.
Rank  LCSH                             Weight
1     world war, 1939-1945             16430.62
2     vietnamese conflict, 1961-1975   15388.68
3     united states                    13989.66
4     world war, 1914-1918              8055.60
5     vietnam                           6523.90
Note: “Vietnam War” is not an established (authorized) LCSH. The established LCSH is “Vietnamese conflict”.

11 LCSH to Words Entry Vocabulary Index
List of words that are most closely associated, statistically, with the Library of Congress Subject Heading: Alcoholism.
Rank  Word         Weight
1     alcohol      13471.94
2     alcoholism   11715.56
3     abuse         3708.09
4     drug          3467.22
5     drink         2563.53
6     alcoholic     2534.91
7     treatment     2349.03
8     prevention    1263.94
9     problem       1148.03
10    addiction      886.81

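12 EVI-based Access to MELVYL
[Architecture diagram, labels only.]
Components: Web Browser; Web server (httpd), "evi access" and "gateway access"; EVI; HTTP/Z39.50 Gateway; MELVYL Z39.50 server; other Z39.50 servers.
Protocols: HTTP, CGI, Z39.50.
Data flows: free-form query; ranked list of LCSHs; search results; full MARC record.
The steps in the figure are numbered 1 to 7.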

16 Counting California Database (http://countingcalifornia.cdlib.org/)
A collection of some 3,000 numeric tables, organized into 16 topics and 184 subtopics.
Sample topics: Banking, Finance and Insurance; Elections; Population and Demographics; Social Services and Public Assistance.
Sample subtopics under Agriculture and Natural Resources: Farms and Farming; Fishing; Forestry and Lumber; Minerals.

17 Enhanced Access to Counting California Database
Conventional probabilistic retrieval of numeric tables using table captions, mapping the query to the text of the captions.
Access to numeric tables through the words-to-subtopic entry vocabulary index.
A sample record created from http://countingcalifornia.cdlib.org:
education
libraries
STATISTICS, STATEWIDE SUMMARY BY TYPE OF LIBRARY CALIFORNIA, 1992-93 TO 1997-98

18 Probabilistic Access to Counting California Database
A search for the query “public libraries in California” gives a ranked list of captions.

20 EVI-based Access to Counting California Database
Ranked list of subtopics that are most closely associated, statistically, with the query: personal/individual income tax.
Rank  Subtopic                                  Weight
1     income                                    542.53
2     government earnings and tax revenues      251.71
3     property tax                              156.67
4     property tax                               74.58
5     personal income tax                        59.99

21 Numeric Tables with Subtopic: Personal income tax.

22 Traverse Searching Between Online Catalogs and Numeric Databases
[Flow diagram, labels only.]
Labels: online catalog; search interface 1; MARC; LCSH; EVI; new query; search interface 2; numeric database; search results; captions; numeric table.
The steps in the figure are numbered 1 to 11.

23 Melvyl MARC record as source of a query

24 Extract from MARC as a query. Any caption can become a query.
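The traverse step on slides 23 and 24 can be sketched as follows: take the title and subject headings from a catalog record, join them into a query, and rank numeric-table captions against it. This is only an illustration under stated assumptions. The record is a plain dictionary rather than parsed MARC, the function names are hypothetical, and simple word overlap stands in for the project's probabilistic ranking. The first caption is the sample from slide 17; the record and the second caption are invented.

```python
import re

STOPWORDS = {"a", "an", "and", "by", "in", "of", "the", "to"}

def content_words(text):
    """Lowercased alphabetic words minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def query_from_record(record):
    """Join a record's title and subject headings into one query string."""
    return " ".join([record.get("title", "")] + record.get("subjects", []))

def rank_captions(query, captions):
    """Rank captions by the number of content words shared with the query."""
    q = content_words(query)
    scored = [(len(q & content_words(c)), c) for c in captions]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

# Hypothetical catalog record (not real MARC data).
toy_record = {"title": "Library statistics in California", "subjects": ["Libraries"]}

toy_captions = [
    "STATISTICS, STATEWIDE SUMMARY BY TYPE OF LIBRARY CALIFORNIA, 1992-93 TO 1997-98",
    "PERSONAL INCOME TAX RETURNS, CALIFORNIA",  # invented caption
]
```

`rank_captions(query_from_record(toy_record), toy_captions)` places the library statistics caption first. The reverse direction works the same way: a caption string can be passed as the query to a catalog search.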

25 http://metadata.sims.berkeley.edu/papers/SeamlessSearchFinalReport.pdf
Final Report on “Seamless Searching of Numeric and Textual Resources” Project, 1999-2002.
Two sequels:
1. Adding search by place: “Going Places in the Catalog: Improved Geographic Access,” funded by a National Library Leadership Project from the Institute of Museum and Library Services, 2002-2004.
2. Multilingual Search Across Multiple Genres: Proposal submitted Feb 13, 2003!

