BIG DATA Conceptual Modeling to the Rescue David W. Embley with special thanks to Stephen W. Liddle and the Data Extraction Research Group at Brigham Young.


1 BIG DATA Conceptual Modeling to the Rescue David W. Embley with special thanks to Stephen W. Liddle and the Data Extraction Research Group at Brigham Young University

2 Roadmap What is BIG DATA? Why should Conceptual Modeling apply? Examples to show how Conceptual Modeling can come to the rescue Summary (and take-home message): – Principles that guide the use of Conceptual Modeling in BIG DATA applications – Challenges and Research Opportunities ER 2013 Keynote2


4 BIG DATA Volume: typically exceeding terabytes Variety: heterogeneous sources; diverse needs Velocity: phenomenal rate of acquisition Veracity: trustworthiness & uncertainty ER 2013 Keynote4

5 Volume: Kilobyte (10^3): a paragraph of text

6 Volume: Megabyte (10^6): a small novel

7 Volume: Gigabyte (10^9): sound wave of Beethoven's Fifth Symphony

8 Volume: Terabyte (10^12): all the X-ray images in a large hospital …

9 Volume: Petabyte (10^15): 10 billion Facebook photos

10 Volume: Exabyte (10^18): 1/5 of the words ever spoken

11 Volume: Zettabyte (10^21): grains of sand on all the world's beaches

12 Volume: Yottabyte (10^24): atoms in 7,000 human bodies … NSA data site – purportedly designed to store yottabytes of data.

13 Variety: Heterogeneous Sources & Diverse Needs ER 2013 Keynote13 Radiology Report (John Doe, July 19, 12:14 pm)

14 Velocity ER 2013 Keynote14 Astronomers expect to be processing 10 petabytes of data every hour from the SKA telescope. Square Kilometer Array Telescope

15 Velocity ER 2013 Keynote15 One minute on the Internet: 640TB data transferred, 100k tweets, 204 million e-mails sent

16 Veracity: Uncertainty An age-old question: What is truth? Einstein: The pursuit of truth and beauty is a sphere of activity in which we are permitted to remain children all our lives. Of one thing we can be certain: ER 2013 Keynote16

17 Roadmap What is BIG DATA? Why should Conceptual Modeling apply? Examples to show how Conceptual Modeling can come to the rescue Summary (and take-home message): – Principles that guide the use of Conceptual Modeling in BIG DATA applications – Challenges and Research Opportunities ER 2013 Keynote17

18 Conceptual Modeling & BIG DATA Main thrust: organizing data [Chen, TODS76] And, that's one of the challenges of BIG DATA … but – Volume: too big – Variety: too much – Velocity: too fast – Veracity: too uncertain

19 Looking Backward ER 2013 Keynote19 select PART-NO, QUANTITY-ON-HAND where …

20 Looking Forward Conceptualization of the Web – Semantic search as well as keyword search – World-wide knowledge sharing Examples: – DBpedia – Conceptual Graphs Google's Knowledge Graph Yahoo!'s Web of Objects Facebook's Graph Search Microsoft's/Bing's Satori Knowledge Base – Metaweb – FamilySearch Conceptual Modeling should apply!

21 ER 2013 Keynote21 SELECT ?name ?description_en ?description_de ?musician WHERE { ?musician. ?musician foaf:name ?name. OPTIONAL { ?musician rdfs:comment ?description_en. FILTER (LANG(?description_en) = 'en'). } OPTIONAL { ?musician rdfs:comment ?description_de.

22 Google's Knowledge Graph

23 Yahoo!'s Web of Objects: Yahoo!'s image answer to: What is a food?

24 Facebook's Graph Search

25 Satori Knowledge Base ER 2013 Keynote25

26 Metaweb ER 2013 Keynote26 Boston ?

27 Metaweb Don't forget to take Wendy to Boston's birthday party at 2:00.

28 Metaweb Don't forget to take Wendy to Boston's birthday party at 2:00.

29 Roadmap What is BIG DATA? Why should Conceptual Modeling apply? Examples to show how Conceptual Modeling can come to the rescue Summary (and take-home message): – Principles that guide the use of Conceptual Modeling in BIG DATA applications – Challenges and Research Opportunities ER 2013 Keynote29

30 ER 2013 Keynote30 Visitors per day: 85,000+ Pages viewed per day: 5M+ A service provided by The Church of Jesus Christ of Latter-day Saints. © 2013 by Intellectual Reserve, Inc. All rights reserved. A free family history web site

31 FamilySearch & the WoK-HD Project FamilySearch – Volume: 1.8PB+ online (1.2B records along with 900M 2MB jpeg images); 42PB+ offline (1.2B 30–40MB tiff images) – Velocity: 500M+ images in 2013; 200K+ volunteer indexers WoK-HD scanned-book project (within FamilySearch) – Volume: 100,000 books (3.5TB expected) – Velocity: 25,000 books / year
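
The volume and velocity figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes) and reads "30–40MB" as a per-image range; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the FamilySearch figures quoted on the slide.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes).
MB, TB, PB = 10**6, 10**12, 10**15

# Online: 900M jpeg images at about 2 MB each.
online_jpeg_bytes = 900_000_000 * 2 * MB

# Offline: 1.2B tiff images at 30 to 40 MB each.
offline_tiff_low = 1_200_000_000 * 30 * MB
offline_tiff_high = 1_200_000_000 * 40 * MB

# WoK-HD scanned-book project: 100,000 books, 3.5 TB expected, 25,000 books / year.
bytes_per_book = 3.5 * TB / 100_000
years_to_scan = 100_000 / 25_000

print(online_jpeg_bytes / PB)                           # online volume in PB
print(offline_tiff_low / PB, offline_tiff_high / PB)    # offline range in PB
print(bytes_per_book / MB, years_to_scan)               # MB per book, years of scanning
```

Under these assumptions the jpeg images alone account for the 1.8PB online figure, and the quoted 42PB offline figure sits at the midpoint of the 36 to 48 PB range.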

32 WoK-HD (A Web of Knowledge Superimposed over Historical Documents) …… …… ER 2013 Keynote32

33 WoK-HD (A Web of Knowledge Superimposed over Historical Documents) …… grandchildren of Mary Ely …… ER 2013 Keynote33

34 WoK-HD (A Web of Knowledge Superimposed over Historical Documents) …… …… grandchildren of Mary Ely ER 2013 Keynote34

35 WoK-HD (A Web of Knowledge Superimposed over Historical Documents) …… grandchildren of Mary Ely …… ER 2013 Keynote35

36 grandchildren of Mary Ely WoK-HD (A Web of Knowledge Superimposed over Historical Documents) …… …… ER 2013 Keynote36

37 WoK-HD Construction Mitigating Velocity, Variety, & Volume – CM-based information extraction PatternReader (for semi-structured text) OntoSoar (for unstructured text) – Automated information harvesting & organization Assuring Veracity – CM-based query processing (with links and reasoning chains for extracted information) – Automated analysis with evidence-based CMs ER 2013 Keynote37

38 PatternReader ER 2013 Keynote38 THE ELY ANCESTRY. 419 SEVENTH GENERATION. 241213. Mary Eliza Warner, b. 1826, dau. of Samuel Selden Warner and Azubah Tully; m. 1850, Joel M. Gloyd (who was connected with Chief Justice Waite's family), 24331 1. Abigail Huntington Lathrop (widow), Boonton, N. J., b. 1810, dau. of Mary Ely and Gerard Lathrop ; m. 1835, Donald McKenzie. West Indies, who was b. 1812, d. 1839. (The widow is unable to give the names of her husband's parents.) Their children 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. 243312. William Gerard Lathrop, Boonton, N. J., b. 1812, d. 1882, son of Mary Ely and Gerard Lathrop; m. 1837, Charlotte Brackett Jennings, New York City, who was b. 1818, dau. of Nathan Tilestone Jennings and Maria Miller. Their children: 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. 243314. Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ; m. 1856, Mary Augusta Andruss, 992 Broad St., Newark, N. J., who was b. 1825, dau. of Judge Caleb Halstead Andruss and Emma Sutherland Goble. Mrs. Lathrop died at her home, 992 Broad St., Newark, N. J., Friday morning, Nov. 4, 1898. The funeral services were held at her residence on Monday, Nov. 7, 1898, at half-past two o'clock P. M. Their children: 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862. Miss Emma Goble Lathrop, official historian of the New York Chapter of the Daughters of the American Revolution, is one of the youngest members to hold office, but one whose intelligence and capability qualify her for such distinction. Miss Lathrop is not without experience; in her present home and native city, Newark, N. 
J., she has filled the positions of secretary and treasurer to the Girls' Friendly Society for nine years, secretary and president of the Woman's Auxiliary of Trinity Church Parish, treasurer of the St. Catherine's Guild of St. Barnabas Hospital, and manager of several of Newark's charitable institutions which her grandparents were instrumental in founding. Miss Lathrop traces her lineage back through many generations of famous progenitors on both sides. Her maternal ancestors were among the early settlers of New Jersey, among them John Ogden, who received patent in 1664 for the purchase of Elizabethtown, and who in 1673 was

39 PatternReader ER 2013 Keynote39 THE ELY ANCESTRY. 419 SEVENTH GENERATION. 241213. Mary Eliza Warner, b. 1826, dau. of Samuel Selden Warner and Azubah Tully; m. 1850, Joel M. Gloyd (who was connected with Chief Justice Waite's family), 24331 1. Abigail Huntington Lathrop (widow), Boonton, N. J., b. 1810, dau. of Mary Ely and Gerard Lathrop ; m. 1835, Donald McKenzie. West Indies, who was b. 1812, d. 1839. (The widow is unable to give the names of her husband's parents.) Their children 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. 243312. William Gerard Lathrop, Boonton, N. J., b. 1812, d. 1882, son of Mary Ely and Gerard Lathrop; m. 1837, Charlotte Brackett Jennings, New York City, who was b. 1818, dau. of Nathan Tilestone Jennings and Maria Miller. Their children: 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. 243314. Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ; m. 1856, Mary Augusta Andruss, 992 Broad St., Newark, N. J., who was b. 1825, dau. of Judge Caleb Halstead Andruss and Emma Sutherland Goble. Mrs. Lathrop died at her home, 992 Broad St., Newark, N. J., Friday morning, Nov. 4, 1898. The funeral services were held at her residence on Monday, Nov. 7, 1898, at half-past two o'clock P. M. Their children: 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862. Miss Emma Goble Lathrop, official historian of the New York Chapter of the Daughters of the American Revolution, is one of the youngest members to hold office, but one whose intelligence and capability qualify her for such distinction. Miss Lathrop is not without experience; in her present home and native city, Newark, N. 
J., she has filled the positions of secretary and treasurer to the Girls' Friendly Society for nine years, secretary and president of the Woman's Auxiliary of Trinity Church Parish, treasurer of the St. Catherine's Guild of St. Barnabas Hospital, and manager of several of Newark's charitable institutions which her grandparents were instrumental in founding. Miss Lathrop traces her lineage back through many generations of famous progenitors on both sides. Her maternal ancestors were among the early settlers of New Jersey, among them John Ogden, who received patent in 1664 for the purchase of Elizabethtown, and who in 1673 was … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862.

40 PatternReader ER 2013 Keynote40 … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862.

41 PatternReader ER 2013 Keynote41 … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862. OCR Error Twins (lost in OCR) }

42 PatternReader ER 2013 Keynote42 … #. Aaaa Aaaa, b, 18##, d. 18##. #. Aaaa Aaaa, b. 18##. … #. Aaaa Aaaa, b. 18##, d. 18##. #. Aaaa Aaaa, b. 18##. ). #. Aaaa AaAa, b. 18##, d. 18##. ] #. Aaaa Aaaa, b. 18##. … #. Aaaa Aaaa, b. 18##, d. 18##. #. Aaaa Aaaa, b. i8##. #. Aaaa Aaaa, b. 18##. ^(\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)$ ^(\d)\.\s(([A-Z][a-z][A-Z][a-z]{5})|([A-Z][a-z]{3,7}))\s([A-Z][a-z]{4,9}),\sb[.,]\s(18\d\d)\sd.\s(18\d\d)\.$ Conflate symbols and induce grammar
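
A minimal sketch of the "conflate symbols, then match an induced pattern" idea follows. The `conflate` rule here is an illustrative assumption, not PatternReader's actual algorithm, and the birth-only pattern is the unanchored form that appears on the Backbone slides; the sample records are taken from the page shown above.

```python
import re

def conflate(line: str) -> str:
    """Reduce a line to a coarse shape (illustrative rule, not PatternReader's)."""
    def shape(match: re.Match) -> str:
        word = re.sub(r"[A-Z]+", "A", match.group(0))   # runs of capitals -> 'A'
        return re.sub(r"[a-z]+", "a", word)             # runs of lowercase -> 'a'
    line = re.sub(r"\b[A-Z][A-Za-z]*\b", shape, line)   # capitalized words only
    return re.sub(r"\d", "#", line)                     # every digit -> '#'

# Birth-only child record; [i1] tolerates the OCR error 'i860' for '1860'.
BIRTH_ONLY = re.compile(
    r"(\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)"
)

records = [
    "1. Mary Ely, b, 1836, d. 1859.",
    "2. Gerard Lathrop, b. 1838.",
    "3. Donald McKenzie, b. 1840, d. 1843.",
    "3. Theodore Andruss, b. i860.",
    "4. Emma Goble, b. 1862.",
]

# Distinct shapes hint at how many record patterns a grammar must cover.
shapes = {conflate(r) for r in records}

# Apply the induced pattern to the original (unconflated) text.
extracted = [m.groups() for r in records if (m := BIRTH_ONLY.match(r))]

for fields in extracted:
    print(fields)
```

Records with a death date do not match this particular pattern; they would need a second pattern of their own, as the slide suggests.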

43 PatternReader ER 2013 Keynote43 … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862.

44 PatternReader ER 2013 Keynote44 … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862.

45 Conceptual Modeling: the Backbone

46 Conceptual Modeling: the Backbone (\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)

47 Conceptual Modeling: the Backbone (\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)

48 Conceptual Modeling: the Backbone

49 Extraction Ontologies: Linguistically Grounded Conceptual Models

50 Lexical Object-Set Recognizers BirthDate – external representation: \b[1][6-9]\d\d\b – left context: b\.\s – right context: [.,] – …

51 Non-lexical Object-Set Recognizers Person – object existence rule: {Name} – … Name – external representation: \b{FirstName}\s{LastName}\b – …

52 Relationship-Set Recognizers Person-BirthDate – external representation: ^\d{1,3}\.\s{Person},\sb\.\s{BirthDate}[.,] – …

53 Ontology-Snippet Recognizers ChildRecord – external representation: ^(\d{1,3})\.\s+([A-Z]\w+\s[A-Z]\w+)(,\sb\.\s([1][6-9]\d\d))?(,\sd\.\s([1][6-9]\d\d))?\.
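
To make the recognizer declarations concrete, here is a small sketch that applies two of them to lines from the Ely page. Folding the BirthDate left and right context into regex lookarounds is an assumption about how such a declaration could be executed, not a description of OntoES itself; `parse_child` is a hypothetical helper.

```python
import re

# Lexical object-set recognizer for BirthDate: the value pattern with its
# left context (b\.\s) and right context ([.,]) expressed as lookarounds.
BIRTH_DATE = re.compile(r"(?<=b\.\s)\b[1][6-9]\d\d\b(?=[.,])")

# Ontology-snippet recognizer: the ChildRecord pattern from the slide.
CHILD_RECORD = re.compile(
    r"^(\d{1,3})\.\s+([A-Z]\w+\s[A-Z]\w+)"
    r"(,\sb\.\s([1][6-9]\d\d))?(,\sd\.\s([1][6-9]\d\d))?\."
)

def parse_child(line: str):
    """Return the fields of one child record, or None if the line does not match."""
    m = CHILD_RECORD.match(line)
    if m is None:
        return None
    return {
        "number": int(m.group(1)),
        "name": m.group(2),
        "birth": m.group(4),   # None when no birth year was recognized
        "death": m.group(6),   # None when no death year was recognized
    }

print(parse_child("3. Donald McKenzie, b. 1840, d. 1843."))
print(parse_child("2. Gerard Lathrop, b. 1838."))
print(parse_child("3. Theodore Andruss, b. i860."))   # OCR error: no match
print(BIRTH_DATE.findall("3. Donald McKenzie, b. 1840, d. 1843."))
```

Both patterns are strict about OCR noise: the records containing `b,` or `i860` are not recognized, which is one reason to pair such recognizers with a pattern learner that tolerates these errors.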

54 HMM Recognizers 54

55 OntoSoar Recognizers ER 2013 Keynote55

56 OntoSoar Recognizers [Link-grammar parse diagram with links Xp, Wd, Ss, Ost, A, Mp, Js, DG over the sentence] Emma was.v official.a historian.n of the NYCDAR. Logical form: of(x1,x2) NYCDAR(x2) Emma(x1) historian(x1) official(x1) Recognized object sets: Name(Emma) Officer(historian) Organization(NYCDAR) Resulting assertions (diagram labels: OntoES, Soar): Person–Name(y1,Emma) Person-Officer-Organization(y1,official historian,NYCDAR)

57 Beyond Extraction (all based on Conceptual Modeling) Canonicalization Reasoning – Extraction of implied assertions – Generation of implied assertions – Object identity resolution Free-form query processing Form-based advanced query processing

58 Canonicalization for Lexical Object Sets Easter 1832 → JulianDate(1832113); 22 Apr 1832 → JulianDate(1832113); Saml and Geo. → Samuel and George; Boonton, N.J. → Boonton, NJ, USA Operations: – before(Date1, Date2): Boolean – probabilityMale(Name): 0.0..1.0
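
The date example can be reproduced in a few lines. This sketch assumes that JulianDate(1832113) encodes the year followed by a three-digit day of the year, and it uses the standard anonymous Gregorian Easter computation; neither is confirmed as the authors' implementation, and the function names are illustrative.

```python
from datetime import date

def gregorian_easter(year: int) -> date:
    """Easter Sunday by the anonymous Gregorian (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def julian_date(d: date) -> int:
    """Canonical form assumed here: year * 1000 + day of year."""
    return d.year * 1000 + d.timetuple().tm_yday

def before(d1: date, d2: date) -> bool:
    """The before(Date1, Date2) operation, evaluated on canonical values."""
    return julian_date(d1) < julian_date(d2)

easter_1832 = gregorian_easter(1832)
print(easter_1832, julian_date(easter_1832))
print(julian_date(date(1832, 4, 22)))
```

Once both "Easter 1832" and "22 Apr 1832" reduce to the same canonical value, comparisons such as `before` no longer depend on how the date was written in the source.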

59 Implied Assertions Author's View: Maria Jennings … daughter of … William Gerard Lathrop Desired View: Gender: Female; Name: GivenName: Maria Jennings, Surname: Lathrop

60 Implied Assertions Maria Jennings Lathrop … child of … William Gerard Lathrop … son of … Mary Ely … Female Mary Ely … grandmother of … Maria Jennings Lathrop
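
The grandmother example amounts to a single inference rule over extracted facts. The sketch below states that rule directly in Python; the fact representation and the function name `grandmother_of` are assumptions chosen for illustration.

```python
# Extracted facts, as (child, parent) pairs and a gender lookup.
child_of = {
    ("Maria Jennings Lathrop", "William Gerard Lathrop"),
    ("William Gerard Lathrop", "Mary Ely"),
}
gender = {"Mary Ely": "Female"}

def grandmother_of(child_of, gender):
    """Rule: child_of(x, y) and child_of(y, z) and Female(z) => grandmother_of(z, x)."""
    derived = set()
    for child, parent in child_of:
        for middle, grandparent in child_of:
            if middle == parent and gender.get(grandparent) == "Female":
                derived.add((grandparent, child))
    return derived

for grandmother, grandchild in grandmother_of(child_of, gender):
    print(f"{grandmother} ... grandmother of ... {grandchild}")
```

The derived pair is never stated in the book; it follows from two child-of assertions and one gender assertion.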

61 Object Identity Resolution ER 2013 Keynote61 0.032081 0.995030

62 Free-Form Query Processing ER 2013 Keynote62 Persons born in 1838

63 Free-Form Query Processing ER 2013 Keynote63 Persons born in 1838 born Person(s)?

64 Free-Form Query Processing ER 2013 Keynote64 Persons born in 1838 = 1838 Person Name BirthDate Person 11 Gerard Lathrop McKenzie 1838 Person 18 Maria Jennings Lathrop 1838 born Person(s)?

65 Free-Form Query Processing ER 2013 Keynote65 Persons born in 1838 Person Name BirthDate Person 11 Gerard Lathrop McKenzie 1838 Person 18 Maria Jennings Lathrop 1838 Gerard Lathrop McKenzie because: Person(Person 11 ) has GivenName (Gerard Lathrop) and Child(Person 11 ) of Person(Person 9 ) and Person(Person 9 ) has Gender(Male) and Person(Person 9 ) has Surname(McKenzie)
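
The "because" chain on this slide can be produced by recording each fact used while a name is derived. In the sketch below, Persons 9, 11 and 18 come from the slide; the id 14 for Maria's father and the dictionary layout are hypothetical.

```python
persons = {
    9:  {"given": "Donald", "surname": "McKenzie", "gender": "Male"},
    11: {"given": "Gerard Lathrop", "birth": 1838},
    14: {"given": "William Gerard", "surname": "Lathrop", "gender": "Male"},  # hypothetical id
    18: {"given": "Maria Jennings", "birth": 1838},
}
child_of = {11: [9], 18: [14]}   # child id -> parent ids

def full_name(pid):
    """Return (name, reasons); a missing surname is inherited from a male parent."""
    person = persons[pid]
    reasons = [f"Person({pid}) has GivenName({person['given']})"]
    if "surname" in person:
        return f"{person['given']} {person['surname']}", reasons
    for parent_id in child_of.get(pid, []):
        parent = persons[parent_id]
        if parent.get("gender") == "Male" and "surname" in parent:
            reasons += [
                f"Child({pid}) of Person({parent_id})",
                f"Person({parent_id}) has Gender(Male)",
                f"Person({parent_id}) has Surname({parent['surname']})",
            ]
            return f"{person['given']} {parent['surname']}", reasons
    return person["given"], reasons

def born_in(year):
    """Answer 'Persons born in <year>' with a reasoning chain per result."""
    hits = [pid for pid in sorted(persons) if persons[pid].get("birth") == year]
    return [(pid, *full_name(pid), year) for pid in hits]

for pid, name, reasons, year in born_in(1838):
    print(f"Person {pid}: {name}, born {year}, because: " + " and ".join(reasons))
```

Returning the reasons alongside each answer lets a user check an inferred value against the source text instead of taking it on trust.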

66 Form-Based Advanced Query Processing ER 2013 Keynote66 Cousins of Donald Lathrop who died before he was born or were born after he died. Cousin

67 Form-Based Advanced Query Processing ER 2013 Keynote67 Cousins of Donald Lathrop who died before he was born or were born after he died. … … 1. Mary Ely, b, 1836, d. 1859. 2. Gerard Lathrop, b. 1838. … 1. Maria Jennings, b. 1838, d. 1840. 2. William Gerard, b. 1840. ). 3. Donald McKenzie, b. 1840, d. 1843. ] 4. Anna Margaretta, b. 1843. 5. Anna Catherine, b. 1845. … 1. Charles Halstead, b. 1857, d. 1861. 2. William Gerard, b. 1858, d. 1861. 3. Theodore Andruss, b. i860. 4. Emma Goble, b. 1862.
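
The cousin query can be worked through by hand on the eleven child records above. This sketch assumes year-level comparison, reads the OCR'd "i860" as 1860, and uses the three parents' first names as hypothetical keys; all three parents are children of Mary Ely and Gerard Lathrop, so children of different parents are first cousins.

```python
from typing import NamedTuple, Optional

class Child(NamedTuple):
    name: str
    parent: str            # which of the three Lathrop siblings is the parent
    born: int
    died: Optional[int] = None

children = [
    Child("Mary Ely", "Abigail", 1836, 1859),
    Child("Gerard Lathrop", "Abigail", 1838),
    Child("Maria Jennings", "William", 1838, 1840),
    Child("William Gerard", "William", 1840),
    Child("Donald McKenzie", "William", 1840, 1843),
    Child("Anna Margaretta", "William", 1843),
    Child("Anna Catherine", "William", 1845),
    Child("Charles Halstead", "Charles", 1857, 1861),
    Child("William Gerard", "Charles", 1858, 1861),
    Child("Theodore Andruss", "Charles", 1860),   # 'i860' in the OCR text
    Child("Emma Goble", "Charles", 1862),
]

def non_overlapping_cousins(target: Child, people):
    """Cousins who died before the target was born or were born after the target died."""
    result = []
    for p in people:
        if p.parent == target.parent:
            continue                                   # sibling or the target, not a cousin
        died_before = p.died is not None and p.died < target.born
        born_after = target.died is not None and p.born > target.died
        if died_before or born_after:
            result.append(p)
    return result

donald = next(c for c in children if c.name == "Donald McKenzie")
for cousin in non_overlapping_cousins(donald, children):
    print(cousin.name, cousin.born, cousin.died)
```

With this data, the four children of Charles Christopher Lathrop qualify, all born after Donald's death in 1843; no cousin died before his birth in 1840.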

68 Veracity Knowledge – Populated conceptual model – Plato: justified true belief FamilySearch – Conceptual model of reality – Constraint violation (discovery) – Assertion verification (evidence) Conceptual modeling for veracity ER 2013 Keynote68 Mitigating Uncertainty with Conceptual Modeling

69 Veracity: Justified True Belief ER 2013 Keynote69 Persons born in 1838 Person Name BirthDate Person 11 Gerard Lathrop McKenzie 1838 Person 18 Maria Jennings Lathrop 1838 Gerard Lathrop McKenzie because: Person(Person 11 ) has GivenName (Gerard Lathrop) and Child(Person 11 ) of Person(Person 9 ) and Person(Person 9 ) has Gender(Male) and Person(Person 9 ) has Surname(McKenzie)

70 FamilySearch: Wiki-like Updates + Uncertain Information Sources of error: 1.Incorrect person merges 2.Incorrect parent-child relationship assertions Cyclic Pedigree:
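
A cyclic pedigree is a constraint violation that can be found mechanically: no person may be their own ancestor. The sketch below uses a depth-first search over a child-to-parents map; the function name and the sample ids are hypothetical, and this is one standard way to detect cycles rather than FamilySearch's actual check.

```python
from typing import Dict, List, Optional

def find_pedigree_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ancestor chain that loops back on itself, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(person: str) -> Optional[List[str]]:
        color[person] = GRAY
        path.append(person)
        for parent in parents.get(person, []):
            state = color.get(parent, WHITE)
            if state == GRAY:             # parent is already on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[person] = BLACK
        return None

    for person in list(parents):
        if color.get(person, WHITE) == WHITE:
            cycle = visit(person)
            if cycle:
                return cycle
    return None

# A bad merge or a wrong parent-child assertion can make P3 a child of P1.
bad = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
good = {"P1": ["P2", "P4"], "P2": ["P3"], "P4": ["P3"]}
print(find_pedigree_cycle(bad))
print(find_pedigree_cycle(good))
```

Reporting the whole chain, not just a yes/no answer, points a reviewer at the specific assertions that need evidence.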

71 FamilySearch: Useful More Expressive Conceptual Model Specifications 1:* 1:2.1:* x 2 Nov 1846 1 Nov 1845 p = 0.79 p = 0.35

72 Evidence-Based Conceptual Modeling (1) Model Reality, (2) Allow/Discover Discrepancies, (3) Add Evidence 1:* 1:2.1:* x 2 Nov 1846 1 Nov 1845 p = 0.79 p = 0.35

73 Roadmap What is BIG DATA? Why should Conceptual Modeling apply? Examples to show how Conceptual Modeling can come to the rescue Summary (and take-home message): – Principles that guide the use of Conceptual Modeling in BIG DATA applications – Challenges and Research Opportunities ER 2013 Keynote73

74 Principles that guide the use of Conceptual Modeling in BIG DATA Harvest wrt a conceptual model – Extraction ontologies – And … Organize wrt a conceptual model – Rich conceptualizations – And … Analyze wrt a conceptual model – Evidence-based reasoning – And … ER 2013 Keynote74

75 More Examples of Conceptual Modeling in BIG DATA Applications Knowledge Bundle Building for Research Studies (KBB) Multi-Lingual Query Processing (ML-OntoES) Table Understanding (TISP, Table Ontology) Automating Ontology Creation (TANGO) Automated Reading (OntoSoar) Homeland Security Twitter Suicide Study Human Genome Project ER 2013 Keynote75 Dream! Think Big! Contribute!

76 Knowledge Bundle Building (i.e., Construct and Populate CMs) Objective: Study the association of: – TP53 polymorphism and – Lung cancer Task: locate, gather, organize data from: – Single Nucleotide Polymorphism database – Medical journal articles – Medical-record database – Radiology images and reports ER 2013 Keynote76 Example: Bio-Medical Research

77 Form-Based Extraction Ontologies Gather SNP Information from the NCBI dbSNP Repository ER 2013 Keynote77

78 Linguistically Grounded Conceptual Models Search PubMed Literature ER 2013 Keynote78

79 Reverse-Engineer Human Subject Information from INDIVO into a Conceptual Model

80 Add Annotated Images into the Conceptual Knowledge Bundle Radiology Report (John Doe, July 19, 12:14 pm) ER 2013 Keynote80

81 Query and Analyze Data in the Knowledge Bundle

82 Q Honda moins de 8000 en « excellent état » marque prix mots de clé Honda 7826 Honda (2) 8000 français + ER 2013 Keynote82 Multi-Lingual Query Processing

83 Q Honda moins de 8000 en « excellent état » marque prix mots de clé Honda 7826 Honda (2) 8000 français + ER 2013 Keynote83

84 Table Understanding Tables on the web – 14.1 billion HTML tables [Cafarella et al. 08] Most are tables for layout 154 million high-quality relational tables – 50 million spreadsheet tables [Adelfio & Samet 13] Web table complexity (sampling statistics) [ibid] – Simple relational table: 25% (spreadsheet) 68% (HTML) – Multiple header rows: 15% (spreadsheet) 7% (HTML) – More complex: 60% (spreadsheet) 25% (HTML) ER 2013 Keynote84

85 Table Understanding
    A        B            C              D          E
                          Less than 100  100-299 p  300 pupils or more
 1  Schools  2003/04      36.2           39         24.8
 2           2004/05      35.2           39         25.8
 3           2005/06      35.2           39         25.8
 4           2006/07      34.3           40         25.7
 5           2007/08      34             39.6       26.4
 6           2008/09      33.3           40         26.7
 7           2009/10 [1]  32             40.7       27.3
 8  Pupils   2003/04      8.7            39.3       52
 9           2004/05      8.7            38.3       53
10           2005/06      8.8            38.3       52.9
11           2006/07      8.4            39         52.6
12           2007/08      8.3            38.2       53.5
12           2008/09      8.1            38.2       53.7
12           2009/10 [1]  7.7            38.2       54.1

91 Automating Ontology Creation with TANGO
Agglomeration          Population   Continent            Country
Tokyo                  31,139,900   Asia                 Japan
New York-Philadelphia  30,286,900   The Americas         United States of America
Mexico                 21,233,900   The Americas         Mexico
Seoul                  19,969,100   Asia                 Korea (South)
Sao Paulo              18,847,400   The Americas         Brazil
Jakarta                17,891,000   Asia                 Indonesia
Osaka-Kobe-Kyoto       17,621,500   Asia                 Japan
…                      …            …                    …
Niigata                503,500      Asia                 Japan
Raurkela               503,300      Asia                 India
Homjel                 502,200      Europe               Belarus
Zunyi                  501,900      Asia                 China
Santiago               501,800      The Americas         Dominican Republic
Pingdingshan           501,500      Asia                 China
Fargona                501,000      Asia                 Uzbekistan
Kirov                  500,200      Europe               Russia
Newcastle              500,000      Australia / Oceania  Australia
[Derived ontology diagram with object sets: Agglomeration, Population, Country, Continent]

92 Automating Ontology Creation with TANGO: Merge Results [Diagrams of ontology fragments before and after merging. Object sets: Agglomeration, Population, Country, Continent, City, Name, Geopolitical Entity, Location, Longitude, Latitude, Time. Relationship labels: has names; Latitude and longitude designates location; Has GMT]

93 Automated Reading: NELL ER 2013 Keynote93 http://rtw.ml.cmu.edu

94 Automated Reading: OntoSoar Populate conceptual model from text – Directly – By inference Augment conceptual model and populate ER 2013 Keynote94

95 Homeland Security: Terrorist Example ER 2013 Keynote95

96 Homeland Security: Terrorist Example ER 2013 Keynote96

97 Homeland Security: Terrorist Example Abu Aziz ? White House ER 2013 Keynote97

98 Homeland Security: Terrorist Example Abu Aziz ? White House ER 2013 Keynote98

99 Homeland Security: Terrorist Example Abu Aziz ? White House What If! ER 2013 Keynote99

100 Twitter Suicide Study ER 2013 Keynote100 Tweets could warn of a suicide risk, BYU study says Oct. 10 2013 … Over three months, the computer scientists screened millions of tweets and identified 37,717 that were "genuinely troubling" from 28,088 unique users …

101 Conceptual Modeling for Studying the Human Genome ER 2013 Keynote101

102 Conceptual Modeling for Studying the Human Genome ER 2013 Keynote102

103 Roadmap What is BIG DATA? Why should Conceptual Modeling apply? Examples to show how Conceptual Modeling can come to the rescue Summary (and take-home message): – Principles that guide the use of Conceptual Modeling in BIG DATA applications – Challenges and Research Opportunities ER 2013 Keynote103

104 Principles that guide the use of Conceptual Modeling in BIG DATA Harvest wrt a conceptual model – Extraction ontologies – And: table understanding, automated reading, … Organize wrt a conceptual model – Rich conceptualizations – And: KBs for research studies, multilingual web, … Analyze wrt a conceptual model – Evidence-based reasoning – And: what-if, warning signs search, DNA, … ER 2013 Keynote104

105 Summary & Challenge Conceptual Modeling Applies to BIG DATA (perhaps more than you might have thought) Challenge: find ways to use conceptual modeling to rescue/resolve BIG DATA issues BYU Data Extraction Research Group www.deg.byu.edu

106 Summary & Challenge Conceptual Modeling Applies to BIG DATA (perhaps more than you might have thought) Challenge: find ways to use conceptual modeling to rescue/resolve BIG DATA issues BYU Data Extraction Research Group www.deg.byu.edu

