| 29 | Machine-based issuing of DNB Subject Categories and DDC Short Numbers for Medicine | 25. April 2016 1 Machine-based issuing of DNB Subject Categories.


1 Machine-based issuing of DNB Subject Categories and DDC Short Numbers for Medicine in the German National Library
Frank Busse
25 April 2016

2 Outline
1. General Information
2. Automatic Classification of DNB Subject Categories
3. Automatic Classification of DDC Short Numbers for Medicine

3 General Information

4 Automated Cataloguing – why?

5 Timeline
- 2009: Start of the PETRUS project
- 2010: End of intellectual (i.e. manual) cataloguing of online publications
- 2012: Automatic classification / DNB Subject Categories
- 2014: Automatic indexing
- 2015: Automatic classification / DDC Short Numbers
- 2015: PETRUS project completed

6 Subject Cataloguing at the DNB
Subject cataloguing:
- DNB Subject Categories
- Subject headings
- DDC numbers
Further information: http://www.dnb.de/EN/Erwerbung/Inhaltserschliessung/inhaltserschliessung_node.html

7 Automatic Classification of DNB Subject Categories

8 DNB Subject Categories
- Since 2004
- Based on Dewey Decimal Classification (DDC)
- 102 categories

9 Examples of Subject Categories
- 330 Economics
- 560 Paleontology
- 640 Home and family management

10 Automatic Classification
- Start: 2012
- Method: machine learning / SVM
- Document type:
  - All online publications / without fiction
  - PDF (since 2012)
  - Epub (since 2015)
- Language: Ger/Eng
- Volume: 444,586 online publications (03/2016)

11 Machine Learning
- Supervised learning (learning by example)
- Pattern recognition
- Generalization of rules
- Classifying unknown objects
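The idea of learning by example and then classifying unknown objects can be illustrated with a minimal sketch. It trains a tiny linear SVM (hinge loss, subgradient descent) on bag-of-words features in plain Python. This is not the Averbis Extraction Platform and says nothing about how that product works internally: the training texts are made up, the function names are hypothetical, and only two of the 102 categories are used (a multi-class setup would need, for example, one such classifier per category).

```python
from collections import Counter

# Toy training set, invented for illustration (not DNB data).
# The labels reuse two category names that appear on the slides.
NEGATIVE = "330 Economics"
POSITIVE = "610 Medicine and health"
TRAIN = [
    ("market prices inflation trade banks", NEGATIVE),
    ("labour market wages trade policy", NEGATIVE),
    ("patients therapy clinical study", POSITIVE),
    ("children obesity clinical treatment patients", POSITIVE),
]


def bag_of_words(text):
    """Represent a text as sparse word counts (word -> count)."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision function: bias + sum of weight * count."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in features.items())


def train_linear_svm(examples, epochs=20, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss."""
    data = [(bag_of_words(t), 1.0 if lab == POSITIVE else -1.0)
            for t, lab in examples]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, y in data:
            margin = y * score(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for w in list(weights):
                weights[w] *= 1.0 - lr * reg
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for w, c in features.items():
                    weights[w] = weights.get(w, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias


def classify(text, weights, bias):
    """Assign one of the two categories to an unseen text."""
    return POSITIVE if score(weights, bias, bag_of_words(text)) > 0 else NEGATIVE


WEIGHTS, BIAS = train_linear_svm(TRAIN)
print(classify("clinical therapy for obesity", WEIGHTS, BIAS))
print(classify("inflation and wages", WEIGHTS, BIAS))
```

Words that never occurred in training (here "for" and "and") simply contribute nothing to the score, which is one reason the amount and coverage of training data matters.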

12 Software
- Averbis GmbH / Freiburg im Breisgau
- Averbis Extraction Platform (AEP)
- Version 2.2.2a
- Improvements and further development

13 Workflow
Training
- Base
- Create a model
- Software: Averbis Software
Routine
- Daily processing of new online publications
- Retro-processing
- Software: Averbis Software, DNB Interface, CBS

14 Routine

15 Training
Steps: Selection of training data, Parameter setting, Linguistic analysis, Training

16 Training Data
- Online publications & digitised Tables of Contents (ToC)
- Since 2004
- Language: Ger/Eng
- April 2016: 451,333 online publications & ToC

17 Training Workflow
Steps: Selection of training data, Parameter setting, Linguistic analysis, Training

18 Parameter Setting
- Language
- Text length
- Metadata weighting
- Exclusion conditions
- etc.

19 Training Workflow
Steps: Selection of training data, Parameter setting, Linguistic analysis, Training

20 Training Workflow
Steps: Selection of training data, Parameter setting, Linguistic analysis, Training

21 Quality Management
Cycle: sample check, data analysis, improvement
Two ways of generating sample data:
- Intellectual supervision
- Comparison with printed edition

22 Results 2012 - 2015
- Classified objects: 413,363
- Sample check: 73,509 (18%)
- Result: 75% correct

23 DDC Short Numbers for Medicine

24 DDC Short Numbers for Medicine
- Developed in 2006/2007
- Classification of printed medical theses
- Fast and time-saving

25 Example
Book content: Study Overweight Children Kiel 2000-2009
DNB-SC: 610
DDC: 618.92398009435123090511
Short Number: 618.92398009435123090511

26 DDC Short Numbers
- Start: Oct. 2015
- Method: machine learning / SVM
- Document type:
  - Subject Category 610 "Medicine and health"
  - Online publications (PDF / Epub)
- Language: Ger/Eng
- Volume: 8,121 online publications (03/2016)

27 Results October – December 2015
- Classified objects: 4,072
- Sample check: 574 (14%)
- Result: 74% correct
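The sample shares quoted on the two results slides (2012 - 2015 and October – December 2015) can be re-derived from the raw counts. The short sketch below does that arithmetic and also estimates how many sampled objects were judged correct. The helper names are hypothetical, and the "correct" counts are only approximations because the slides give the percentages already rounded.

```python
def sample_share(sample_size, classified):
    """Share of classified objects that were checked, in percent."""
    return 100.0 * sample_size / classified


def estimated_correct(sample_size, correct_percent):
    """Approximate number of sampled objects judged correct."""
    return round(sample_size * correct_percent / 100.0)


# Figures taken from the two results slides: (classified, sample, % correct).
RESULTS = {
    "DNB Subject Categories, 2012-2015": (413363, 73509, 75),
    "DDC Short Numbers, Oct-Dec 2015": (4072, 574, 74),
}

for name, (classified, sample, pct) in RESULTS.items():
    share = sample_share(sample, classified)
    correct = estimated_correct(sample, pct)
    print(f"{name}: {share:.1f}% sampled, about {correct} of {sample} correct")
```

Both shares round to the percentages shown on the slides (18% and 14%).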

28 Future challenges
- Improve results
- Development of DDC Short Numbers for other DNB Subject Categories
- No "automatic DDC" with this tool

29 Thank you for your attention! Questions?
Frank Busse
German National Library
Section Automatic Indexing, Online Publications
f.busse@dnb.de

