Published by Fred Glass. Modified over 9 years ago.
1
Data Mining the Largest Library Database in the World
Roy Tennant, OCLC Research
Leveraging WorldCat
2
EUROPE, MIDDLE EAST & AFRICA REGIONAL COUNCIL
Worldcat.org/identities/
Algorithmically constructed from WorldCat records
3
Viaf.org
A union database of authority records
4
The Responsible Party
Thom Hickey, Chief Scientist, OCLC Research
5
290+ million records
6
Language Coverage, 30 June 2012
60.2%
Total: 274 million
German: 36.5 million
French: 25.5 million
Spanish: 11.3 million
Italian: 4.7 million
Dutch: 4.3 million
Russian: 3.6 million
Latin: 3.5 million
7
Worldcat.org/identities/
10
(J.K. Rowling) (Diana Gabaldon) (Galileo)
11
13
Viaf.org
14
VIAF Participants
15
16
“Super” Authority File
17
19
Our Cataloging Future: “Moving from cataloging to catalinking” (Eric Miller, Zepheira)
20
21
Some Lessons
Widespread collaboration is essential
Normalizing the data is essential
Normalizing the data is complicated
Everything is interrelated:
–You can’t bring names together if titles don’t match
–You can’t bring titles together if names don’t match
Batch mode processing still rules (but we’re getting better and faster at it)
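The point about names and titles depending on each other can be sketched in a few lines of Python. This is only an illustration, not the processing OCLC uses for WorldCat Identities or VIAF: the record layout (`name`, `titles`), the helper names, and the matching rule are all assumptions made for the example.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def name_key(name: str) -> str:
    """Order-insensitive key, so 'Rowling, J. K.' and 'J.K. Rowling' agree."""
    return " ".join(sorted(normalize(name).split()))


def same_identity(rec_a: dict, rec_b: dict) -> bool:
    """Hypothetical rule: merge two records only if the name keys match
    AND the records share at least one normalized title."""
    if name_key(rec_a["name"]) != name_key(rec_b["name"]):
        return False
    titles_a = {normalize(t) for t in rec_a["titles"]}
    titles_b = {normalize(t) for t in rec_b["titles"]}
    return bool(titles_a & titles_b)


# Illustrative records (made up for this sketch, not taken from WorldCat).
rec_1 = {"name": "Rowling, J. K.",
         "titles": ["Harry Potter and the Chamber of Secrets"]}
rec_2 = {"name": "J.K. Rowling",
         "titles": ["HARRY POTTER AND THE CHAMBER OF SECRETS."]}
rec_3 = {"name": "J. K. Rowling",
         "titles": ["A Different Book"]}
```

Under this rule `same_identity(rec_1, rec_2)` holds because both the name key and a title agree after normalization, while `rec_1` and `rec_3` stay apart: the names match but no title does. Real authority matching has to cope with far messier data (dates, transliteration, variant forms), which is why the normalization step is described as complicated.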
22
Conclusions
Data mining isn’t just useful, it’s essential
Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
Thankfully, we are doing it, but there is much more to be done
23
Roy Tennant
tennantr@oclc.org
@rtennant
roytennant.com