Content analysis and CERN, Roman Chyla. Artificial intelligence, natural language processing, Web of data, content analysis.

Presentation transcript:

1 Content analysis and CERN Roman Chyla

2 Artificial intelligence
Natural language processing
Web of data
Content analysis

3

4 Semantic Web

5 Information extraction

6 ?

7 A lot to do…

8 Semantic dictionary
Link between infinite and finite domains
Must be prepared (or at least revised) by humans
–Purposeful
–Incomplete
–Constantly changing
Very expensive to create/maintain
–Solution? Use existing data!
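As an illustration of the "link between infinite and finite domains": free text is unbounded, but the dictionary maps it onto a finite set of concepts. The sketch below is a minimal version of that lookup, assuming a dictionary keyed by tokenized phrases and greedy longest-match matching. The entries, the concept IDs and the names `SEMANTIC_DICTIONARY` and `find_concepts` are hypothetical and are not taken from the talk.

```python
import re

# Hypothetical semantic dictionary: tokenized surface forms -> concept IDs.
# Illustrative entries only; a real dictionary would be harvested from
# existing data and revised by humans.
SEMANTIC_DICTIONARY: dict[tuple[str, ...], str] = {
    ("higgs", "boson"): "concept:higgs-boson",
    ("higgs",): "concept:higgs-boson",
    ("dark", "matter"): "concept:dark-matter",
    ("calorimeter",): "concept:calorimeter",
}

# Longest phrase (in tokens) present in the dictionary.
MAX_PHRASE_LEN = max(len(key) for key in SEMANTIC_DICTIONARY)


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def find_concepts(text: str) -> list[str]:
    """Map unbounded text onto the finite concept set by greedy longest match."""
    tokens = tokenize(text)
    concepts: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so "dark matter" beats "dark".
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            concept = SEMANTIC_DICTIONARY.get(tuple(tokens[i:i + n]))
            if concept is not None:
                concepts.append(concept)
                i += n
                break
        else:
            i += 1  # word not in the dictionary: skip it
    return concepts
```

For example, `find_concepts("The Higgs boson and dark matter searches use a calorimeter.")` returns `['concept:higgs-boson', 'concept:dark-matter', 'concept:calorimeter']`. Preferring the longest match keeps multi-word terms from being split into less specific single-word concepts.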

9 Basic principles
Keep it simple, stupid (I didn't want to believe it could work; it was too simple!)
You can't get it 100% right
Dictionary ~ universal semantic language
–Not really a language, but a taxonomy (not even an ontology)
–Lacks expressiveness
–Still very vague (but that is a feature, not a bug!)
–Cannot infer from facts
BUT it is:
–Simple to maintain
–Ready to change and evolve, ready to accommodate other resources
–Language independent
–Problem of research question
–Problem of universal and domain-specific taxonomy
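The trade-off on this slide (a taxonomy rather than an ontology, but simple and language independent) can be pictured as a small data structure. In this sketch, which is an assumption made for illustration, each concept has a language-neutral ID, one label per language and a single `broader` link. Walking those links is the only reasoning available, which is why such a dictionary "cannot infer from facts". `Concept`, `TAXONOMY`, `broader_chain` and `label` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """A taxonomy node: one label per language and at most one broader concept."""
    concept_id: str
    labels: dict[str, str]          # language code -> preferred label
    broader: Optional[str] = None   # parent concept ID; None for a root


# Hypothetical entries: concept IDs are language independent, labels are not.
TAXONOMY: dict[str, Concept] = {
    "physics": Concept("physics", {"en": "Physics", "fr": "Physique"}),
    "particle-physics": Concept(
        "particle-physics",
        {"en": "Particle physics", "fr": "Physique des particules"},
        broader="physics",
    ),
    "higgs-boson": Concept(
        "higgs-boson",
        {"en": "Higgs boson", "fr": "Boson de Higgs"},
        broader="particle-physics",
    ),
}


def broader_chain(concept_id: str) -> list[str]:
    """Return all broader concepts, nearest first; the seen-set guards against cycles."""
    chain: list[str] = []
    seen = {concept_id}
    parent = TAXONOMY[concept_id].broader
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = TAXONOMY[parent].broader
    return chain


def label(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, falling back to English."""
    labels = TAXONOMY[concept_id].labels
    return labels.get(lang, labels[fallback])
```

Adding a language then means adding labels, not new concepts: `label("higgs-boson", "fr")` gives `"Boson de Higgs"`, and an unknown language such as `"de"` falls back to `"Higgs boson"`.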

10 Word sense disambiguation
Homonyms are an obvious problem… and Seman can work with many definitions at the same time (think of three people, each with their own definition of one word)
Possible solutions:
–Disambiguation by harvested definitions
–Rules
–Neural network (supervised learning)
–If problems are few, humans can decide
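One of the listed options, disambiguation by harvested definitions, can be sketched as a simplified Lesk-style overlap: choose the sense whose definition shares the most content words with the surrounding text. The definitions of "cat", the stop-word list and the function names below are hypothetical. Returning `None` when nothing overlaps stands in for the last option on the slide, leaving the case to a human.

```python
import re
from typing import Optional

# Hypothetical harvested definitions for the homonym "cat".
SENSES: dict[str, str] = {
    "cat/animal": "small domesticated feline animal kept as a pet that hunts mice",
    "cat/unix": "unix command line program that concatenates files and prints them to standard output",
}

# A deliberately tiny, illustrative stop-word list.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "that", "as", "it", "them", "is", "with", "my", "on"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context: str, senses: dict[str, str]) -> Optional[str]:
    """Pick the sense whose definition overlaps most with the context (simplified Lesk)."""
    context_words = content_words(context)
    best_sense: Optional[str] = None
    best_overlap = 0
    for sense, definition in senses.items():
        overlap = len(context_words & content_words(definition))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense  # None means no evidence: hand it to a human
```

With these definitions, `disambiguate("My cat caught two mice and is a lovely pet", SENSES)` returns `"cat/animal"` (overlap on "mice" and "pet"). Exact word matching is crude; stemming would let "print" match "prints", for instance.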

11 cat

12 So what I want to do…
Prepare another semantic dictionary for HEP (using whatever I can) and for English in general (UDC + existing Seman)
Differentiate HEP core and non-core
Search corrections (did you mean?)
Search results categorization/facets
Identify entities, data elements… and make them available (this is mainly an IE task)
Identification of topics (metrics of similarity between a document and "known characteristics")
Keywording – identification of statistically significant occurrences of concepts (not words)
Come up with faster ways to enrich the taxonomy
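For the keywording item, "statistically significant occurrences of concepts" needs a concrete test. The slide does not name one; the sketch below swaps in Dunning's log-likelihood ratio (G²), which compares how often a concept occurs in one document against a background corpus. It assumes the concepts have already been extracted, that the reference counts exclude the document itself, and it uses 10.83 (the χ² critical value for p < 0.001 with one degree of freedom) as a hypothetical cut-off. `log_likelihood` and `keywords` are illustrative names.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2: concept seen a times in c document concepts, b times in d reference concepts."""
    total = a + b
    expected_doc = c * total / (c + d)
    expected_ref = d * total / (c + d)
    g2 = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        g2 += a * math.log(a / expected_doc)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2.0 * g2


def keywords(doc_concepts: list[str], reference_counts: dict[str, int],
             threshold: float = 10.83) -> list[tuple[str, float]]:
    """Concepts over-represented in the document with G2 above the threshold, best first."""
    doc_counts = Counter(doc_concepts)
    c = sum(doc_counts.values())
    d = sum(reference_counts.values())
    scored: list[tuple[str, float]] = []
    for concept, a in doc_counts.items():
        b = reference_counts.get(concept, 0)
        if a / c <= b / d:
            continue  # not more frequent here than in the background
        g2 = log_likelihood(a, b, c, d)
        if g2 >= threshold:
            scored.append((concept, g2))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Consider a 20-concept document containing "higgs-boson" 8 times, "physics" 10 times and "detector" twice, against a 100,000-concept background containing them 50, 5,000 and 800 times. The first two pass (G² roughly 89.8 and 28.0) while "detector" (roughly 6.4) does not. Working on concepts rather than words means synonyms add up to the same count.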

13 Semantic dictionary
Did you mean?
IE engine (Bibclassify)
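The slide names the "Did you mean?" component without describing how it matches queries. A minimal sketch, assuming suggestions are drawn from the semantic dictionary's labels and using the similarity ratio from Python's standard `difflib` in place of whatever matching the real system used. `VOCABULARY`, `did_you_mean` and the 0.8 cut-off are hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical vocabulary: labels taken from the semantic dictionary.
VOCABULARY = ["higgs boson", "dark matter", "calorimeter", "supersymmetry", "neutrino"]


def did_you_mean(query: str, vocabulary: list[str] = VOCABULARY,
                 cutoff: float = 0.8) -> Optional[str]:
    """Suggest the closest known label, or None if the query is known or nothing is close."""
    q = query.lower().strip()
    if q in vocabulary:
        return None  # query is already a known term: no correction needed
    matches = difflib.get_close_matches(q, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

`did_you_mean("supersymetry")` suggests `"supersymmetry"`, while a correctly spelled `"neutrino"` gets no suggestion. Matching against dictionary labels rather than raw index terms keeps suggestions inside the controlled vocabulary.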

14 Thank you for your attention. Questions?

