What Do You Want—Semantic Understanding?


1 What Do You Want—Semantic Understanding?
(You’ve Got to be Kidding)
David W. Embley, Brigham Young University
Funded in part by the National Science Foundation

2 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

3 Grand Challenge: Semantic Understanding
Can we quantify and specify the nature of this grand challenge?

4 Grand Challenge: Semantic Understanding
“If ever there were a technology that could generate trillions of dollars in savings worldwide …, it would be the technology that makes business information systems interoperable.” (Jeffrey T. Pollock, VP of Technology Strategy, Modulant Solutions)

5 Grand Challenge: Semantic Understanding
“The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …” (Tim Berners-Lee, …, Weaving the Web)

6 Grand Challenge: Semantic Understanding
“20th Century: Data Processing. 21st Century: Data Exchange. The issue now is mutual understanding.” (Stefano Spaccapietra, Editor in Chief, Journal on Data Semantics)

7 Grand Challenge: Semantic Understanding
“The Grand Challenge [of semantic understanding] has become mission critical. Current solutions … won’t scale. Businesses need economic growth dependent on the web working and scaling (cost: $1 trillion/year).” (Michael Brodie, Chief Scientist, Verizon Communications)

8 Why Semantic Understanding?
We succeed in managing information if we can “[take] data and [analyze] it and [simplify] it and [tell] people exactly the information they want, rather than all the information they could have.” (Jim Gray, Microsoft Research)
Because we’re overwhelmed with data:
Point and click is too slow
“Give me what I want when I want it.”
Because it’s the key to revolutionary progress:
Automated interoperability and knowledge sharing
Automated negotiation in e-business
Large-scale, in-silico experiments in e-science

9 What is Semantic Understanding?
Semantics: “The meaning or the interpretation of a word, sentence, or other language form.”
Understanding: “To grasp or comprehend [what’s] intended or expressed.” (Dictionary.com)

10 Can We Achieve Semantic Understanding?
“A computer doesn’t truly ‘understand’ anything.” But computers can manipulate terms “in ways that are useful and meaningful to the human user.” (Tim Berners-Lee)
Key point: it only has to be good enough. And that’s our challenge and our opportunity!

11 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

12 Information Value Chain
Data → Information → Knowledge → Meaning: translating data into meaning

13 Foundational Definitions
Meaning: knowledge that is relevant or activates
Knowledge: information with a degree of certainty or community agreement (ontology)
Information: data in a conceptual framework
Data: attribute-value pairs
(Adapted from [Meadow92])

18 Data: Attribute-Value Pairs
Fundamental for information; thus, fundamental for knowledge and meaning
Data Frame: extensive knowledge about a data item
Everyday data: currency, dates, time, weights and measures
Textual appearance, units, context, operators, I/O conversion
An abstract data type with an extended framework (see the sketch below)
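As a concrete illustration (not the talk’s actual implementation; the class and field names here are assumptions), a data frame can be sketched in Python as an abstract data type that bundles a value recognizer with units, context keywords, and an I/O conversion:

```python
import re

class DataFrame:
    """A data frame: extensive knowledge about one kind of everyday data item,
    packaged as an abstract data type (appearance, context, units, conversion)."""

    def __init__(self, name, value_pattern, context_keywords=(), unit="",
                 to_internal=lambda s: s):
        self.name = name
        self.value_pattern = re.compile(value_pattern)  # textual appearance
        self.context_keywords = list(context_keywords)  # signals in nearby text
        self.unit = unit
        self.to_internal = to_internal                  # I/O conversion

    def recognize(self, text):
        """Yield (lexical value, start, end) for each appearance in text."""
        for m in self.value_pattern.finditer(text):
            yield m.group(), m.start(), m.end()

# Illustrative Mileage data frame for the car-ads domain
mileage = DataFrame("Mileage", r"\d{1,3}(?:,\d{3})*",
                    context_keywords=["miles", "mi."], unit="miles",
                    to_internal=lambda s: int(s.replace(",", "")))

for value, start, end in mileage.recognize("only 7,000 miles."):
    print(value, start, end, mileage.to_internal(value))  # 7,000 5 10 7000
```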

19 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

20 ?
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm

24 Digital Camera
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm

25 ?
Year 2002
Make Ford
Model Thunderbird
Mileage 5,500 miles
Features Red, ABS, 6 CD changer, keyless entry
Price $33,000
Phone (916)

29 Car Advertisement
Year 2002
Make Ford
Model Thunderbird
Mileage 5,500 miles
Features Red, ABS, 6 CD changer, keyless entry
Price $33,000
Phone (916)

30 ?
Flight # | Class | From | Time/Date | To | Time/Date | Stops
Delta | Coach | JFK | :05 pm | CDG | 7:35 am
Delta 119 | Coach | CDG | :20 am | JFK | 1:00 pm

32 Airline Itinerary
Flight # | Class | From | Time/Date | To | Time/Date | Stops
Delta | Coach | JFK | :05 pm | CDG | 7:35 am
Delta 119 | Coach | CDG | :20 am | JFK | 1:00 pm

33 ?
Monday, October 13, 2003
Group A (W L T GF GA Pts.): USA 3 0 0 11 1 9; Sweden; North Korea; Nigeria
Group B (W L T GF GA Pts.): Brazil

35 World Cup Soccer
Monday, October 13, 2003
Group A (W L T GF GA Pts.): USA 3 0 0 11 1 9; Sweden; North Korea; Nigeria
Group B (W L T GF GA Pts.): Brazil

36 ?
Calories 250 cal
Distance 2.50 miles
Time 23.35 minutes
Incline 1.5 degrees
Speed 5.2 mph
Heart Rate 125 bpm

39 Treadmill Workout
Calories 250 cal
Distance 2.50 miles
Time 23.35 minutes
Incline 1.5 degrees
Speed 5.2 mph
Heart Rate 125 bpm

40 ?
Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,000 feet
USGS Quad Mirror Lake
Latitude ºN
Longitude ºW

43 Maps
Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,100 feet
USGS Quad Mirror Lake
Latitude ºN
Longitude ºW

44 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

45 Information Extraction Ontologies
Source → Target: information extraction and information exchange

46 What is an Extraction Ontology?
An augmented conceptual-model instance:
Object and relationship sets
Constraints
Data frame value recognizers
A robust wrapper (ontology-based wrapper):
Extracts information
Works even when a site changes or when new sites come on-line

47 Extraction Ontology: Example
Car [-> object];
Car [0:1] has Year [1:*];
Car [0:1] has Make [1:*];
Car [0:*] has Feature [1:*];
PhoneNr [1:*] is for Car [0:1];
Year matches [4] constant {extract “\d{2}”; context “\b’[4-9]\d\b”; …}
Mileage matches [8] keyword {“\bmiles\b”, “\bmi\b.”, …}
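The declarations above are in the talk’s own data-frame language; the Python encoding below is purely illustrative (names and structure are assumed) of how such declarations might be held in memory:

```python
import re

# Illustrative (assumed) in-memory encoding of the slide's declarations.
# Each relationship set carries its participation constraints [min:max]:
relationship_sets = [
    ("Car", "0:1", "has", "Year", "1:*"),      # a car has at most one year
    ("Car", "0:1", "has", "Make", "1:*"),
    ("Car", "0:*", "has", "Feature", "1:*"),   # a car may have many features
    ("PhoneNr", "1:*", "is for", "Car", "0:1"),
]

# Data-frame recognizers: a constant pattern extracts candidate values in a
# confirming context; keyword patterns signal a nearby value of the object set.
# Patterns are lightly adapted from the slide; their exact semantics belong to
# the talk's data-frame engine, not to Python's re module.
recognizers = {
    "Year":    {"extract": re.compile(r"\d{2}"),
                "context": re.compile(r"'[4-9]\d\b")},
    "Mileage": {"keywords": [re.compile(r"\bmiles\b"), re.compile(r"\bmi\b\.")]},
}

print(bool(recognizers["Mileage"]["keywords"][0].search("only 7,000 miles")))  # True
```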

48 Extraction Ontologies: An Example of Semantic Understanding
“Intelligent” symbol manipulation
Gives the “illusion of understanding”
Obtains meaningful and useful results

49 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

50 A Variety of Applications
Information Extraction
High-Precision Classification
Schema Mapping
Semantic Web Creation
Agent Communication
Ontology Generation

51 Application #1 Information Extraction

52 Constant/Keyword Recognition
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. # JERRY SEINER MIDVALE, or
Descriptor|String|Position(start|end):
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr| |136|143
PhoneNr| |148|155
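A minimal sketch (illustrative patterns, not the actual prototype’s recognizers) of how data-frame recognizers could emit Descriptor|String|Position tuples like those above:

```python
import re

ad = ("'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. "
      "Previous owner heart broken! Asking only $11,995.")

# Simplified, illustrative recognizers for a few object sets.
patterns = {
    "Year":             r"(?<=')\d{2}",
    "Model":            r"Cavalier",
    "Mileage":          r"\d{1,3}(?:,\d{3})+",
    "KEYWORD(Mileage)": r"\bmiles\b",
}

def recognize(text):
    """Emit Descriptor|String|start|end tuples like slide 52's.
    Offsets are 0-based here; the slide's appear to be 1-based."""
    for descriptor, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            yield descriptor, m.group(), m.start(), m.end()

for desc, s, start, end in sorted(recognize(ad), key=lambda t: t[2]):
    print(f"{desc}|{s}|{start}|{end}")
```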

53 Heuristics
Keyword proximity
Subsumed and overlapping constants
Functional relationships
Nonfunctional relationships
First occurrence without constraint violation

54 Keyword Proximity
“7,000” (positions 38–42) lies only 2 characters from the Mileage keyword “miles” (44–48), while “11,995” (100–105) lies 52 characters away (D = 2 vs. D = 52), so “7,000” is the better Mileage candidate.
(Recognizer output as on slide 52.)
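A sketch of the proximity computation (the scoring function is assumed, but it reproduces the D values above from the slide-52 spans):

```python
def gap(a_start, a_end, b_start, b_end):
    """Character distance between two spans; 0 if they touch or overlap."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0

# Spans from slide 52: Mileage|7,000|38|42, KEYWORD(Mileage)|miles|44|48,
# and Mileage|11,995|100|105.
print(gap(38, 42, 44, 48))    # 2:  "7,000" sits right next to "miles"
print(gap(100, 105, 44, 48))  # 52: "11,995" is far from "miles"

# The Mileage candidate nearest a Mileage keyword wins the slot.
candidates = {"7,000": gap(38, 42, 44, 48), "11,995": gap(100, 105, 44, 48)}
print(min(candidates, key=candidates.get))  # 7,000
```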

55 Subsumed/Overlapping Constants
Make|CHEV|5|8 is subsumed by Make|CHEVY|5|9, so only the longer match CHEVY is kept; overlapping candidates are resolved the same way.
(Recognizer output as on slide 52.)
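A sketch of the pruning step, assuming subsumption means one candidate’s span lying strictly inside another’s for the same descriptor:

```python
def prune_subsumed(candidates):
    """Keep only candidates not strictly contained in another candidate's
    span for the same descriptor (e.g. CHEV at 5-8 inside CHEVY at 5-9)."""
    return [c for c in candidates
            if not any(c[0] == d and c[2] >= s and c[3] <= e
                       and (c[2], c[3]) != (s, e)
                       for d, _, s, e in candidates)]

cands = [("Make", "CHEV", 5, 8), ("Make", "CHEVY", 5, 9)]
print(prune_subsumed(cands))  # [('Make', 'CHEVY', 5, 9)]
```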

56 Functional Relationships
Car–Year, Car–Make, Car–Model, Car–Mileage, and Car–Price are functional (a car has at most one of each), so at most one candidate value is accepted per attribute.
(Recognizer output as on slide 52.)

57 Nonfunctional Relationships
Car–Feature is nonfunctional (a car may have many features), so both “Red” and “5 spd” are kept.
(Recognizer output as on slide 52.)

58 First Occurrence without Constraint Violation
Candidates are assigned in order of occurrence, skipping any assignment that would violate a constraint: “7,000” fills Mileage first, so “11,995”, recognized as both Mileage and Price, becomes the Price.
(Recognizer output as on slide 52.)

59 Database-Instance Generator
insert into Car values(1001, “97”, “CHEVY”, “Cavalier”, “7,000”, “11,995”, “ ”)
insert into CarFeature values(1001, “Red”)
insert into CarFeature values(1001, “5 spd”)
(Generated from the resolved recognizer output of slide 52.)
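A sketch (naive quoting, assumed helper names) of how the resolved candidates could be turned into the insert statements above:

```python
def make_inserts(oid, car_values, features):
    """Emit insert statements like slide 59's from resolved candidates.
    Column order follows the slide's Car table; quoting is naive."""
    cols = ", ".join(f'"{v}"' for v in car_values)
    stmts = [f"insert into Car values({oid}, {cols})"]
    stmts += [f'insert into CarFeature values({oid}, "{f}")' for f in features]
    return stmts

car = ["97", "CHEVY", "Cavalier", "7,000", "11,995", ""]  # phone nr elided
for stmt in make_inserts(1001, car, ["Red", "5 spd"]):
    print(stmt)
```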

60 Application #2 High-Precision Classification

61 An Extraction Ontology Solution

62 Density Heuristic
Density: the fraction of a document matched by the ontology’s recognizers.
Document 1: Car Ads (high density of car-ad values)
Document 2: Items for Sale or Rent (low density)
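The slide doesn’t give a formula; a plausible sketch treats density as the fraction of document characters covered by recognized strings:

```python
def density(doc_length, spans):
    """Fraction of the document covered by recognized strings.
    spans: (start, end) character ranges matched by the ontology's recognizers."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / doc_length

# A document whose text is mostly matched by car-ad recognizers scores high:
print(density(100, [(0, 40), (45, 90)]))  # 0.85 -> likely a car-ads document
print(density(100, [(0, 5)]))             # 0.05 -> likely not
```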

63 Expected Values Heuristic
Document 1 (Car Ads): Year 3, Make 2, Model 3, Mileage 1, Price 1, Feature 15, PhoneNr 3
Document 2 (Items for Sale or Rent): Year 1, Make 0, Model 0, Mileage 1, Price 0, Feature 0, PhoneNr 4

64 Vector Space of Expected Values
Each document’s vector over (Year, Make, Model, Mileage, Price, Feature, PhoneNr) is compared with the ontology’s expected-values vector OV: similarity D1: 0.996, D2: 0.567.
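The ontology vector OV isn’t preserved in the transcript, so the OV values below are assumed; the sketch only illustrates the vector-space comparison (cosine similarity) behind the D1 and D2 scores:

```python
import math

def cosine(u, v):
    """Cosine similarity between two expected-values vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Object-set order: Year, Make, Model, Mileage, Price, Feature, PhoneNr
d1 = [3, 2, 3, 1, 1, 15, 3]  # Document 1 counts (slide 63, car ads)
d2 = [1, 0, 0, 1, 0, 0, 4]   # Document 2 counts (slide 63, sale/rent items)
ov = [2, 2, 2, 1, 1, 10, 2]  # hypothetical expected-values vector OV

print(round(cosine(d1, ov), 3))  # high -> classify Document 1 as car ads
print(round(cosine(d2, ov), 3))  # low  -> reject Document 2
```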

65 Grouping Heuristic
Document 1 (Car Ads): values group as {Year, Make, Model, Price, Mileage}
Document 2 (Items for Sale or Rent): values group as {Year, Mileage, Price}

66 Grouping
Expected Number in a Group = floor(∑ Ave) = 4 (for our example)
Grouping = (sum of distinct 1-max object sets in each group) / (number of groups × expected number in a group)
Car Ads: Grouping = 14 / (4 × 4) = 0.875
Sale Items: Grouping = 8 / (4 × 4) = 0.500
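A sketch of the grouping measure; the four groups below are illustrative (the slide’s actual groups aren’t preserved), chosen so the distinct counts sum to 14 as in the Car Ads example:

```python
def grouping(groups, expected_per_group):
    """Sum of distinct 1-max object sets per group, normalized by
    (number of groups) * (expected number in a group)."""
    distinct_total = sum(len(set(g)) for g in groups)
    return distinct_total / (len(groups) * expected_per_group)

# Expected number in a group = floor(sum of averages) = 4 in the example.
# Illustrative groups whose distinct counts are 4 + 4 + 3 + 3 = 14:
car_ad_groups = [["Year", "Make", "Model", "Price"],
                 ["Year", "Make", "Model", "Mileage"],
                 ["Year", "Make", "Model"],
                 ["Year", "Make", "Price"]]
print(grouping(car_ad_groups, 4))  # 14 / (4 * 4) = 0.875
```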

67 Application #3 Schema Mapping

68 Problem: Different Schemas
Target Database Schema:
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
Different Source Table Schemas:
{Run #, Yr, Make, Model, Tran, Color, Dr}
{Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
{Vehicle, Distance, Price, Mileage}
{Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}

69 Solution: Remove Internal Factoring
Discover nesting: Make, (Model, (Year, Colour, Price, Auto, Air Cond, AM/FM, CD)*)*
Unnest: μ(Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table
(The example table factors cars under makes such as ACURA, with models such as Legend.)
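A sketch of the unnest operator μ on a factored table; the rows and values (an ACURA Legend example) are assumed for illustration, and extra columns are omitted:

```python
# Factored (nested) rows: Make, (Model, (Year, Colour, Price)*)*
nested = [
    ("ACURA", [
        ("Legend", [("1994", "Black", "$8,900"),
                    ("1995", "White", "$10,500")]),
    ]),
]

def unnest(rows):
    """One application of mu: distribute the flat prefix over the nested group."""
    return [(*prefix, *inner) for *prefix, group in rows for inner in group]

for row in unnest(unnest(nested)):  # apply mu twice, as on the slide
    print(row)  # ('ACURA', 'Legend', '1994', 'Black', '$8,900'), ...
```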

70 Solution: Replace Boolean Values
β_Auto β_Air Cond. β_AM/FM β_CD Table: each “Yes” under Auto, Air Cond., AM/FM, and CD is replaced by the attribute’s name.
(Example table with ACURA Legend rows.)
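A sketch of the Boolean-replacement operator β, assuming it rewrites a “Yes” under an attribute into the attribute’s own name:

```python
def beta(table, attribute):
    """One application of beta: rewrite "Yes" under attribute to the
    attribute's own name, so it can later serve as a Feature value."""
    out = []
    for row in table:
        row = dict(row)  # copy; don't mutate the input
        if row.get(attribute) == "Yes":
            row[attribute] = attribute
        out.append(row)
    return out

rows = [{"Make": "Honda", "Model": "Civic EX", "Auto": "Yes", "CD": ""}]
for attr in ["Auto", "Air Cond.", "AM/FM", "CD"]:
    rows = beta(rows, attr)
print(rows)  # [{'Make': 'Honda', 'Model': 'Civic EX', 'Auto': 'Auto', 'CD': ''}]
```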

71 Solution: Form Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto, Auto>, <Air Cond., Air Cond.>, <AM/FM, AM/FM>, <CD, >

72 Solution: Adjust Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto>, <Air Cond>, <AM/FM>
(Pairs whose value merely repeats the attribute name collapse to single-element pairs; the empty <CD, > pair is dropped.)

73 Solution: Do Extraction
(Extraction ontology applied to the example table’s attribute-value pairs.)

74 Solution: Infer Mappings
π_Make μ(Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table
π_Year Table
π_Model μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table
Note: mappings produce sets for attributes. Joining to form records is trivial because we have OIDs for table rows (each row is a car).
Target schema: {Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

75 Solution: Infer Mappings
π_Model μ(Year, Colour, Price, Auto, Air Cond, AM/FM, CD)* Table

76 Solution: Do Extraction
π_Price Table

77 Solution: Do Extraction
ρ_Colour←Feature π_Colour Table
∪ ρ_Auto←Feature π_Auto β_Auto Table
∪ ρ_Air Cond.←Feature π_Air Cond. β_Air Cond. Table
∪ ρ_AM/FM←Feature π_AM/FM β_AM/FM Table
∪ ρ_CD←Feature π_CD β_CD Table

78 Application #4 Semantic Web Creation

79 The Semantic Web
Make web content accessible to machines. What prevents this from working?
Lack of content
Lack of tools to create useful content
Difficulty of converting the web to the Semantic Web

80 Converting Web to Semantic Web

81 Superimposed Information

82 Application #5 Agent Communication

83 The Problem
Typically, agents must (1) share ontologies, (2) speak the same language, and (3) pre-agree on message format. Requiring these assumptions precludes agents from interoperating on the fly.
“The holy grail of semantic integration in architectures” is to “allow two agents to generate needed mappings between them on the fly without a priori agreement and without them having built-in knowledge of any common ontology.” [Uschold 02]

84 Solution
Eliminate all three assumptions. This requires:
Translating (developing mutual understanding)
Dynamically capturing a message’s semantics
Matching a message with a service

85 MatchMaking System (MMS)

86 Application #6 Ontology Generation

87 TANGO: Table Analysis for Generating Ontologies
Recognize and normalize table information
Construct mini-ontologies from tables
Discover inter-ontology mappings
Merge mini-ontologies into a growing ontology

88 Recognize Table Information
Country | Population (July 2001 est.) | Religion percentages (Albanian Orthodox, Roman Catholic, Shi’a Muslim, Sunni Muslim, Muslim, other)
Afghanistan | 26,813,… | Shi’a Muslim %, Sunni Muslim %, other %
Albania | …,510,… | Albanian Orthodox %, Roman Catholic %, Muslim %

89 Construct Mini-Ontology
(Built from the country–population–religion table of slide 88.)

90 Discover Mappings

91 Merge

92 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

93 Limitations and Pragmatics
Data-rich, narrow domain
Ambiguities ~ context assumptions
Incompleteness ~ implicit information
Common-sense requirements
Knowledge prerequisites

94 Busiest Airport in 2003?
Chicago: 928,735 landings (Nat. Air Traffic Controllers Assoc.); 931,000 landings (Federal Aviation Admin.)
Atlanta: 58,875,694 passengers (Sep., latest numbers available)
Memphis: 2,494,190 metric tons (Airports Council Int’l.)

97 Busiest Airport in 2003?
Ambiguous: whom do we trust? (How do they count?) Even Chicago’s landing counts differ between the two sources.
(Same data as slide 94.)

98 Busiest Airport in 2003?
Important qualification: Atlanta’s passenger count runs only through September, the latest numbers available.
(Same data as slide 94.)

99 Dow Jones Industrial Average
A quote table: columns High, Low, Last, Chg; rows 30 Indus, 20 Transp, 15 Utils, 66 Stocks (among the values: Last 10,409.85, Chg 44.07)
Graphics, icons, …

100 Dow Jones Industrial Average
The same table (High, Low, Last, Chg for 30 Indus, 20 Transp, 15 Utils, 66 Stocks) reported on the same date: one version is weekly, the other daily.
Implicit information: “weekly” is stated in the upper corner of the page; “daily” is not stated.

101 Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges

102 Some Key Ideas
Data, Information, and Knowledge
Data Frames
Knowledge about everyday data items
Recognizers for data in context
Ontologies
Resilient extraction ontologies
Shared conceptualizations
Limitations and Pragmatics

103 Some Research Issues
Building a library of open-source data recognizers
Creating corpora of test data for extraction, integration, table understanding, …
Precisely finding and gathering relevant information
Subparts of larger data
Scattered data (linked, factored, implied)
Data behind forms in the hidden web
Improving concept matching
Indirect matching
Calculations and unit conversions

104 Some Research Challenges
Automating ontology construction
Converting web data to Semantic Web data
Developing effective personal software agents

