Data and text mining: the search for unknown knowns Geoffrey Bilder UKSG, 2007


1

2 Data and text mining: the search for unknown knowns Geoffrey Bilder UKSG, 2007

3 "Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know."

4 The Mining Metaphor

5

6 Gold Mining

7 Diamond Mining

8 Data Mining

9 Data Mining: What it isn’t

10 ≠ Information Retrieval

11 ≠ Information Extraction

12 ≠ Information Analysis

13 Information Retrieval + Information Extraction + Information Analysis

14 Data Mining new, previously unknown information

15 And so what is text data mining?

16 Text Mining

17

18 Information Retrieval + Information Extraction + Information Analysis

19

20 A crucial question for publishers: “If ‘hiding’ information in unstructured text is a problem, then shouldn’t we be exploring new ways to ‘publish’?”

21 So how did we get here?

22 The word tobacco originates from the Taino Indians. There is no I in the word Team. The book captured the zeitgeist of the time. I am sure that I turned the gas off.

23 The book captured the zeitgeist of the time. I am sure that I turned the gas off.

24

25

26 Semantic Web “Light”

27

28

29

30

31

32 But we can do more...

33 The web as a database

34 The Relational Model
Title | Author | ISBN-13 | Publisher
Labyrinths | Jorge Luis Borges | | New Directions
Hopscotch | Julio Cortazar | | Pantheon
The Aleph | Jorge Luis Borges | | Penguin
...

35 (same table) Rows represent things

36 (same table) Columns are properties

37 (same table) The thing’s property
Subject: The book | Predicate: has an author | Object: “Jorge Luis Borges”

38 Subject: The book | Predicate: has an author | Object: “Jorge Luis Borges”
URI

39 has an author RDF: Resource Description Framework
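
The step from table rows to subject-predicate-object statements can be sketched in a few lines of Python. This is only an illustration, not something shown in the slides: the `http://example.org/` namespace, the `id` slugs, and the `rows_to_triples` helper are all hypothetical, and the ISBN-13 column is left out because its values do not appear in the transcript.

```python
# Hypothetical namespace: the slides do not give real URIs for the books.
EX = "http://example.org/"

# Rows from the slide's table. The ISBN-13 column is omitted because
# its values are not present in the transcript.
rows = [
    {"id": "labyrinths", "title": "Labyrinths",
     "author": "Jorge Luis Borges", "publisher": "New Directions"},
    {"id": "hopscotch", "title": "Hopscotch",
     "author": "Julio Cortazar", "publisher": "Pantheon"},
    {"id": "the-aleph", "title": "The Aleph",
     "author": "Jorge Luis Borges", "publisher": "Penguin"},
]


def rows_to_triples(rows):
    """Turn each row (a thing) and each column (a property) into triples."""
    triples = []
    for row in rows:
        # The row becomes the subject, identified by a URI.
        subject = EX + "book/" + row["id"]
        for column, value in row.items():
            if column == "id":
                continue  # the id only builds the subject URI
            # The column name becomes the predicate, also a URI.
            predicate = EX + "property/" + column
            # The cell value becomes the object (a plain literal here).
            triples.append((subject, predicate, value))
    return triples


triples = rows_to_triples(rows)
for triple in triples:
    print(triple)
```

Each row yields one triple per column, so the statement "the book has an author Jorge Luis Borges" becomes a single tuple whose subject and predicate are URIs.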

40 Journal A Journal B Wiki Blog Personal Website OPAC

41 Journal A Journal B Wiki Blog Personal Website OPAC

42

43 SPARQL
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name
WHERE {
  ?x rdf:type foaf:Person .
  ?x foaf:name ?name
}
ORDER BY ?name
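
To make the query on this slide concrete, here is a rough Python equivalent that runs over an in-memory list of triples. It is a sketch under stated assumptions, not a SPARQL engine: the sample data and the `distinct_person_names` helper are hypothetical, and only the `rdf:type`, `foaf:Person` and `foaf:name` URIs are the standard ones.

```python
# Standard RDF and FOAF terms used by the query.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("http://example.org/people/1", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/people/1", FOAF_NAME, "Julio Cortazar"),
    ("http://example.org/people/2", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/people/2", FOAF_NAME, "Jorge Luis Borges"),
    # A second resource with the same name: DISTINCT collapses it.
    ("http://example.org/people/3", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/people/3", FOAF_NAME, "Jorge Luis Borges"),
    # Has a name but is not typed as a Person, so it must not match.
    ("http://example.org/books/1", FOAF_NAME, "Labyrinths"),
]


def distinct_person_names(triples):
    """Mimic: SELECT DISTINCT ?name WHERE { ?x a foaf:Person . ?x foaf:name ?name } ORDER BY ?name."""
    # First pattern: every ?x that is typed as foaf:Person.
    people = {s for (s, p, o) in triples if p == RDF_TYPE and o == FOAF_PERSON}
    # Second pattern: names of those same ?x; the set gives DISTINCT.
    names = {o for (s, p, o) in triples if p == FOAF_NAME and s in people}
    # ORDER BY ?name
    return sorted(names)


print(distinct_person_names(triples))
```

The two set comprehensions correspond to the two triple patterns in the WHERE clause, joined on the shared variable `?x`.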

44

45

46 RSS 1.0 FRBR Creative Commons FOAF Geo SKOS

47 The Early Modern Internet

48 Data Mining = Information retrieval + Information extraction + Information analysis... With the goal of discovering new, previously unknown information

49 Data Mining = Information retrieval + Information extraction + Information analysis... With the goal of discovering new, previously unknown information
Text Data Mining = Complex data extraction layer + data mining

50

51

52

53

54 Why do we publish text?

55 Thank You

