
1 Concepts, Ontologies, and Project TANGO Deryle Lonsdale BYU Linguistics and English Language




1 1 Concepts, Ontologies, and Project TANGO Deryle Lonsdale BYU Linguistics and English Language lonz@byu.edu

2 2 Outline NSF projects Semantic Web Concepts Project TIDIE Ontologies Project TANGO Tables Ontology generation

3 3 Acknowledgements NSF David Embley (BYU CS), Steve Liddle (BYU Marriott School) and Yuri Tijerino BYU Data Extraction Group members

4 4 The National Science Foundation Federal agency, $5.5 billion budget, funds 20% of all federally supported basic research conducted by America’s colleges and universities 7 directorates Biological Sciences, Computer and Information Science and Engineering, Engineering, Geosciences, Mathematics and Physical Sciences, Social, Behavioral and Economic Sciences, and Education and Human Resources 200,000 scientists, engineers, educators and students at universities, laboratories and field sites 10,000 awards/year, 3 years duration (avg.)

5 5 The NSF Nifty 50 (general) ACCELERATING, EXPANDING UNIVERSE ANTARCTIC OZONE HOLE RESEARCH ARABIDOPSIS—A PLANT GENOME PROJECT BAR CODES BLACK HOLES CONFIRMED BUCKY BALLS COMPUTER VISUALIZATION TECHNIQUES DATA COMPRESSION TECHNOLOGY DISCOVERY OF PLANETS DOPPLER RADAR EFFECTS OF ACID RAIN EL NIÑO AND LA NIÑA PREDICTIONS FIBER OPTICS GEMINI TELESCOPES HANTAVIRUS IDENTIFICATION DNA FINGERPRINTING MRI—MAGNETIC RESONANCE IMAGING NANOTECHNOLOGY THE NATIONAL OBSERVATORIES OVERCOMING HEAVY METALS OVERCOMING SALT TOXICITY TISSUE ENGINEERING TUMOR DETECTION VOLCANIC ERUPTION DETECTION YELLOW BARRELS

6 6 Language-related Nifty 50 AMERICAN SIGN LANGUAGE DICTIONARY DEVELOPMENT COMPUTER VISUALIZATION TECHNIQUES THE DARCI CARD DATA COMPRESSION TECHNOLOGY THE "EYE CHIP" OR RETINA CHIP THE INTERNET PERSONS WITH DISABILITIES ACCESS TO THE WEB PROJECT LISTEN SPEECH RECOGNITION TECHNOLOGY vBNS—VERY HIGH SPEED BACKBONE NETWORK SYSTEM WEB BROWSERS

7 7 Hypernym Synonym Annotation The search query Browsing the Semantic Web

8 8 - Ranking based on content data and structure - Grouping results by their conceptual relationships - Using lexical semantics for similarity search movie astronomy sports Browsing the Semantic Web

9 9 Desirable, not (yet) possible Word sense disambiguation Other types of queries (e.g. services) What is the cheapest available round-trip flight to Cancun the day after finals this semester? Set up an appointment with my optometrist for next week. List available 4-person BYU-approved apartments in Orem for under $150/month. Find me a linguistics job in Tahiti.

10 10 Project TIDIE Apr 10, 2001 – May 12, 2005

11 11 Overview of TIDIE 3-year NSF project at BYU Total amount about $430,000 PI David Embley (BYU CS), 4 co-PIs from BYU 18 grad students, 45 publications Demos, tools, papers, presentations at website (www.deg.byu.edu/)

12 12 Goal of TIDIE Target-Based Independent-of-Document Information Extraction Target-based: user specifies what to find Not just keyword search, but concept-based search using an ontology Document independent Should work even if pages change over time, on new documents IE: match, merge, retrieve, format information Present in way that user can search, query results

13 13 Document-based IE

14 14 Recognition and extraction Car Year Make Model Mileage Price PhoneNr 0001 1989 Subaru SW $1900 (336)835-8597 0002 1998 Elantra (336)526-5444 0003 1994 HONDA ACCORD EX 100K (336)526-1081 Car Feature 0001 Auto 0001 AC 0002 Black 0002 4 door 0002 tinted windows 0002 Auto 0002 pb 0002 ps 0002 cruise 0002 am/fm 0002 cassette stereo 0002 a/c 0003 Auto 0003 jade green 0003 gold

15 15 Concepts What drives the matching process for IE Inherent in words, numbers, phrases, text Linguistics: lexical semantics Denotations: entities, attributes Location: relationships Occurrences: constraints

16 16 Concept matching We use exhaustive concept matching techniques to find concepts in documents including: Lexical information (lexicons) Natural language processing (NLP) techniques Similarity of values Features of value Data frames Constraints

17 17 Lexicons Repositories of enumerable classes of lexical information FirstNames, LastNames, USStates, ProvoOremApts, CarMakes, Drugs, CampGroundFeats, etc. WordNet (synonyms, word senses, hypernyms/hyponyms)

18 18 The data-frame library Snippets of real-world knowledge about data (type, length, nearby keywords, patterns [as in regexps], functional relations, etc.) Low-level patterns implemented as regular expressions Match items such as email addresses, phone numbers, names, etc.
Mileage matches [8]
  constant
    { extract "\b[1-9]\d{0,2}k"; substitute "[kK]" -> "000"; },
    { extract "[1-9]\d{0,2}?,\d{3}"; context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> ""; },
    { extract "[1-9]\d{0,2}?,\d{3}"; context "(mileage\:\s*)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> ""; },
    { extract "[1-9]\d{3,6}"; context "[^\$\d][1-9]\d{3,6}\s*mi(\.|\b\les\b)"; },
    { extract "[1-9]\d{3,6}"; context "(mileage\:\s*)[^\$\d][1-9]\d{3,6}\b"; };
  keyword "\bmiles\b", "\bmi\.", "\bmi\b", "\bmileage\b";
end;
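
To make the extract / context / substitute idea concrete, here is a minimal Python sketch that re-expresses three of the Mileage rules with the standard `re` module. It is an illustration only, not the data-frame engine itself: the rule list, the `val` group name, and the function `extract_mileage` are assumptions, and the context checks are approximated with lookarounds.

```python
import re

# Each entry pairs a pattern (context plus the extracted group `val`) with a
# normalizer that plays the role of the slide's "substitute" clause.
MILEAGE_RULES = [
    # "100K" or "100k" -> "100000"
    (re.compile(r"\b(?P<val>[1-9]\d{0,2})[kK]\b"), lambda v: v + "000"),
    # "7,000": not preceded by "$" or a digit, not followed by a digit
    (re.compile(r"(?<![$\d])(?P<val>[1-9]\d{0,2},\d{3})(?!\d)"),
     lambda v: v.replace(",", "")),
    # "85000 mi." or "85000 miles": plain digits followed by a mileage unit
    (re.compile(r"(?<![$\d])(?P<val>[1-9]\d{3,6})\s*mi(?:\.|les\b|\b)"),
     lambda v: v),
]


def extract_mileage(text):
    """Return normalized mileage strings found in `text`, in rule order."""
    found = []
    for pattern, normalize in MILEAGE_RULES:
        for match in pattern.finditer(text):
            found.append(normalize(match.group("val")))
    return found


if __name__ == "__main__":
    for ad in ["HONDA ACCORD EX 100K", "only 7,000 miles", "Asking $11,995"]:
        print(ad, "->", extract_mileage(ad))
```

The dollar-sign lookbehind is what keeps a price such as $11,995 from being read as a mileage in this sketch.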

19 19 Isolated concepts are OK, but... We’re also interested in the relations between concepts This is often best done graphically Ontology: arrangement of concepts that explicitizes their relations, constraints Conceptual modeling: field of CS / linguistics that deals with formalizing concepts, using such information BYU has its own well-known conceptual modeling framework (OSM)

20 20 Conceptual modeling (OSM) YearPrice Make Mileage Model Feature PhoneNr Extension Car has is for has 1..* 0..1 1..* 0..1 0..* 1..*

21 21 Ontologies and IE SourceTarget

22 22 '97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888 Constant/keyword recognition Descriptor/String/Position(start/end) Year|97|2|3 Make|CHEV|5|8 Make|CHEVY|5|9 Model|Cavalier|11|18 Feature|Red|21|23 Feature|5 spd|26|30 Mileage|7,000|38|42 KEYWORD(Mileage)|miles|44|48 Price|11,995|100|105 Mileage|11,995|100|105 PhoneNr|566-3800|136|143 PhoneNr|566-3888|148|155
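
The Descriptor/String/Position records above can be reproduced for the first sentence of the ad with a few regular expressions. This is a rough sketch: the `RECOGNIZERS` list and `recognize` function are hypothetical stand-ins for the lexicons and data frames, and positions are reported 1-based and inclusive, as on the slide.

```python
import re

AD = "'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles."

# Hypothetical, heavily simplified recognizers: (descriptor, regex) pairs.
RECOGNIZERS = [
    ("Year", r"(?<=')\d{2}"),
    ("Make", r"CHEVY"),
    ("Make", r"CHEV"),
    ("Model", r"Cavalier"),
    ("Feature", r"Red|\d spd"),
    ("Mileage", r"[1-9]\d{0,2},\d{3}"),
    ("KEYWORD(Mileage)", r"\bmiles\b"),
]


def recognize(text):
    """Return (descriptor, string, start, end) hits with 1-based inclusive offsets."""
    hits = []
    for descriptor, pattern in RECOGNIZERS:
        for match in re.finditer(pattern, text):
            # match.start() is 0-based; match.end() is exclusive, which equals
            # the 1-based inclusive end position.
            hits.append((descriptor, match.group(), match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: (hit[2], hit[3]))


if __name__ == "__main__":
    for hit in recognize(AD):
        print("|".join(str(field) for field in hit))
```

Overlapping hits such as CHEV and CHEVY are kept on purpose; choosing among them is left to a later step.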

23 23 Year|97|2|3 Make|CHEV|5|8 Make|CHEVY|5|9 Model|Cavalier|11|18 Feature|Red|21|23 Feature|5 spd|26|30 Mileage|7,000|38|42 KEYWORD(Mileage)|miles|44|48 Price|11,995|100|105 Mileage|11,995|100|105 PhoneNr|566-3800|136|143 PhoneNr|566-3888|148|155 Database instance generator
insert into Car values(1001, '97', 'CHEVY', 'Cavalier', '7,000', '11,995', '566-3800');
insert into CarFeature values(1001, 'Red');
insert into CarFeature values(1001, '5 spd');
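
One way to picture the database instance generator is as a step that resolves competing hits and then emits inserts. The sketch below uses Python's `sqlite3` and two assumed heuristics (prefer the candidate nearest a matching keyword, otherwise the longest and earliest match). The heuristics actually used by the tool are not given on the slide, so `pick` and `build_instance` are hypothetical.

```python
import sqlite3

HITS = [
    ("Year", "97", 2, 3), ("Make", "CHEV", 5, 8), ("Make", "CHEVY", 5, 9),
    ("Model", "Cavalier", 11, 18), ("Feature", "Red", 21, 23),
    ("Feature", "5 spd", 26, 30), ("Mileage", "7,000", 38, 42),
    ("KEYWORD(Mileage)", "miles", 44, 48), ("Price", "11,995", 100, 105),
    ("Mileage", "11,995", 100, 105), ("PhoneNr", "566-3800", 136, 143),
    ("PhoneNr", "566-3888", 148, 155),
]


def pick(hits, descriptor):
    """Choose one value for a single-valued attribute (assumed heuristics)."""
    candidates = [h for h in hits if h[0] == descriptor]
    keywords = [h for h in hits if h[0] == f"KEYWORD({descriptor})"]
    if keywords:
        # Prefer the candidate whose end is closest to a keyword's start.
        candidates.sort(key=lambda c: min(abs(k[2] - c[3]) for k in keywords))
    else:
        # Otherwise prefer the longest match, then the earliest one.
        candidates.sort(key=lambda c: (-(c[3] - c[2]), c[2]))
    return candidates[0][1] if candidates else None


def build_instance(hits, car_id=1001):
    """Create an in-memory database holding one Car row and its features."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Car (id, Year, Make, Model, Mileage, Price, PhoneNr)")
    db.execute("CREATE TABLE CarFeature (id, Feature)")
    attributes = ("Year", "Make", "Model", "Mileage", "Price", "PhoneNr")
    row = [car_id] + [pick(hits, name) for name in attributes]
    db.execute("INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)", row)
    db.executemany(
        "INSERT INTO CarFeature VALUES (?, ?)",
        [(car_id, h[1]) for h in hits if h[0] == "Feature"],
    )
    return db


if __name__ == "__main__":
    database = build_instance(HITS)
    print(database.execute("SELECT * FROM Car").fetchall())
    print(database.execute("SELECT * FROM CarFeature").fetchall())
```

With these heuristics the keyword "miles" settles the Mileage/Price ambiguity over 11,995 in favor of Price.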

24 24 Car ads extraction ontology

25 25 Car ads ontology (textual) Car [->object]; Car [0..1] has Year [1..*]; Car [0..1] has Make [1..*]; Car [0..1] has Model [1..*]; Car [0..1] has Mileage [1..*]; Car [0..*] has Feature [1..*]; Car [0..1] has Price [1..*]; PhoneNr [1..*] is for Car [0..*]; PhoneNr [0..1] has Extension [1..*]; Year matches [4] constant {extract "\d{2}"; context "([^\$\d]|^)[4-9]\d[^\d]"; substitute "^" -> "19"; }, … end;
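
The relationship-set declarations in this textual form are regular enough to load into a small data structure. The following sketch parses only the `X [m..n] has Y [m..n];` and `is for` lines. The class and function names are hypothetical, and the sketch records the participation bounds without interpreting them.

```python
import re
from typing import NamedTuple, Optional


class Participation(NamedTuple):
    minimum: int
    maximum: Optional[int]  # None stands for "*" (unbounded)


class RelationshipSet(NamedTuple):
    source: str
    source_part: Participation
    verb: str
    target: str
    target_part: Participation


DECLARATION = re.compile(
    r"(\w+)\s*\[(\d+)\.\.(\d+|\*)\]\s+(has|is for)\s+(\w+)\s*\[(\d+)\.\.(\d+|\*)\]"
)


def _part(low, high):
    return Participation(int(low), None if high == "*" else int(high))


def parse_ontology(text):
    """Return the relationship sets declared in `text`, in order of appearance."""
    return [
        RelationshipSet(src, _part(slo, shi), verb, tgt, _part(tlo, thi))
        for src, slo, shi, verb, tgt, tlo, thi in DECLARATION.findall(text)
    ]


if __name__ == "__main__":
    sample = "Car [->object]; Car [0..1] has Year [1..*]; PhoneNr [1..*] is for Car [0..*];"
    for relationship in parse_ontology(sample):
        print(relationship)
```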

26 26 A gene ontology

27 27 A genealogy data model

28 28 Finding jobs in linguistics Built ontology for linguistics jobs: what defines a linguistics job Data frames and lexicons: language names (www.ethnologue.com), subfields of linguistics (www.linguistlist.org), tools linguists use, programming languages, activities, responsibilities, country names Documents: 3500 web pages + emails to me Complete results reported in DLLS 2003

29 29 Sample query

30 30 Sample output

31 31 Subfield expertise sought

32 32 Technical skills sought

33 33 Sample observations 270 don’t have linguist* (!) Computer/computational background required for almost 1/3 (1116) Noticeable amount of headhunting, particularly in Seattle, DC areas Often a job title is not even listed (!) Great need for ontologies related to: linguistics job titles; theoretical frameworks, subfields; typical linguist job activities; linguistic research/development venues

34 34 An engineering discipline? 160 linguistics jobs ending in “engineer” Software development cycle research e., software design e. development e., software e. software quality e., linguistic test e., linguistic quality e. linguistic support e., user experience e. presales e., technical sales e. Specific subfields web site e. speech e., voice recognition e., speech recognition application e., speech e., ASR tuning e., audio e. dialog e. tools e. AI e., NLP e. knowledge e., ontology e. linguist e., natural language e. staff e. human factors e., user interface e.

35 35 A recent ontologist job ad Date: Thu, 28 Jul 2005 11:44:40 Subject: General Linguistics: Ontologist, Denver, USA Job Rank: Ontologist Specialty Areas: General Linguistics Position Summary: Ontologist This person will be responsible for modifying & editing Ontology structures.
Skills:
- Basic computer skills such as Internet, email, and spreadsheet programs
- In-depth knowledge of any major industry, such as Health Care, Automotive, Legal, Construction, and so forth helpful
- Superior communication skills, both oral and written. Ability to communicate effectively with reports, peers, superiors, and customers essential
- Travel &/or foreign language experience desired
Personal Characteristics:
- A healthy sense of logic, and a love for details
- A deep and abiding love of language, and of rule-governed classification systems. This person should be excited by the challenge of figuring out the precise place where a word belongs, and be delighted with the prospect of performing such tasks as the major part of their job
Position Qualifications:
- Bachelor's degree, preferably in Linguistics, Library Science, English, or related field

36 36 Another recent ontologist ad Position Summary: Lead Ontologist The Lead Ontologist will be responsible for creating & designing Ontology and Ontology structures. This person will be responsible for innovation and general Ontology development as Ontology requirements change. They will serve as Team Lead on various Ontology projects, and they will assist the Director with certain aspects of management, including the development of department culture and standards. They will also serve as a liaison between the Director and the rest of the team.
Skills:
- Ability to edit & manipulate text highly desired, using tools such as Emacs and Perl. High level programming language experience and SQL also desired
- Knowledge of Ontology structures, and experience with developing and maintaining such structures
- Ability to assist with Ontology development and use problem-solving skills to overcome obstacles
- Ability to QA own Ontology work, and work of others
- Ability to lead projects from set-up through to QA
- Leadership or management experience a plus
Position Qualifications:
- Bachelor's degree in Linguistics, Library Science, or related field
- 2-3 years experience in Ontology or related field
Application Deadline: Open until filled.

37 37 Matching request with ontology “Tell me about cruises on San Francisco Bay. I’d like to know scheduled times, cost, and the duration of cruises on Friday of next week.”

38 38 Building a query Selection constants: Friday, Oct. 29th; San Francisco Bay Projection: scheduled times, cost, duration Result = projection(selection(join path))
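
The query on this slide combines a join path, selection constants, and a projection. As a rough sketch, the three operators can be written over lists of dictionaries. The relations `CRUISE` and `SCHEDULE`, their attribute names, and their rows are invented for illustration and are not taken from the project's data.

```python
def natural_join(left, right):
    """Join two relations on whatever attribute names they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[name] == r_row[name] for name in shared):
                joined.append({**l_row, **r_row})
    return joined


def select(rows, constants):
    """Keep rows whose attributes equal every selection constant."""
    return [r for r in rows if all(r.get(k) == v for k, v in constants.items())]


def project(rows, attributes):
    """Keep only the requested attributes of each row."""
    return [{a: r.get(a) for a in attributes} for r in rows]


# Hypothetical toy relations.
CRUISE = [
    {"CruiseId": 1, "Location": "San Francisco Bay", "Price": "$20.00", "Duration": "1 Hour"},
    {"CruiseId": 2, "Location": "Elsewhere", "Price": "$35.00", "Duration": "2 Hours"},
]
SCHEDULE = [
    {"CruiseId": 1, "Date": "Friday, Oct. 29th", "StartTime": "10:45 am"},
    {"CruiseId": 1, "Date": "Saturday, Oct. 30th", "StartTime": "12:00 pm"},
    {"CruiseId": 2, "Date": "Friday, Oct. 29th", "StartTime": "9:00 am"},
]

RESULT = project(
    select(
        natural_join(CRUISE, SCHEDULE),
        {"Location": "San Francisco Bay", "Date": "Friday, Oct. 29th"},
    ),
    ["StartTime", "Price", "Duration"],
)

if __name__ == "__main__":
    print(RESULT)
```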

39 39
StartTime | Price | Duration | Source
10:45 am, 12:00 pm, 1:15, 2:30, 4:00 | $20.00, $16.00, $12.00 | | 1
10:00 am, 10:45 am, 11:15 am, 12:00 pm, 12:30 pm, 1:15 pm, 1:45 pm, 2:30 pm, 3:00 pm, 3:45 pm, 4:15 pm, 5:00 pm | $17.00, $16.00, $12.00 | 1 Hour | 2

40 40 Another example Service Request Match with Task Ontology Domain Ontology Process Ontology Complete, Negotiate, Finalize I want to see a dermatologist next week; any day would be ok for me, at 4:00 p.m. The dermatologist must be within 20 miles from my home and must accept my insurance.

41 41 Service domain ontology

42 42

43 43 Relevant mini-ontology

44 44 Ontologies: issues Most successful in data-rich, narrow-domain applications Ambiguities are problematic, context only partially eliminates Incompleteness: implicit information Commonsense world pragmatics evasive Knowledge prerequisites are steep Major efforts in creation, maintenance Must be created by experts Experts are biased in knowledge, agreement needed Ontologies continually change; upkeep a massive task

45 45 Ontologies: possible solutions Some automation is needed Current automatic generation of ontologies is not successful, because the ontologies are extracted from free-form, unstructured text. A more effective alternative is to extract ontologies from structured data on the web (tables, charts, etc.) TANGO project Part 1: Extract tables from the web Part 2: Define mini-ontologies from tables Part 3: Merge into growing domain ontology

46 46 Project TANGO

47 47 Overview Table ANalysis for Generating Ontologies 3-year NSF-funded project Joint BYU/RPI project Uses and extends TIDIE concepts, ontologies Goal is to process tables, generate ontologies, use results for IE

48 48 Motivation Keyword or link analysis search not enough to search for information in tables Structure in tables can lead to domain knowledge which includes concepts, relationships and constraints (ontologies) Tables on web created for human use can lead to robust domain ontologies

49 49 Table understanding What is a table? Why table normalization? What is table understanding? What is mini-ontology generation?

50 50 What is a table? “…a two-dimensional assembly of cells used to present information…” Lopresti and Nagy Normalized tables (row-column format) Small paper (using OCR) and/or electronic tables (marked up) intended for human use

51 51 ? Olympus C-750 Ultra Zoom Sensor Resolution:4.2 megapixels Optical Zoom:10 x Digital Zoom:4 x Installed Memory:16 MB Lens Aperture:F/8-2.8/3.7 Focal Length min:6.3 mm Focal Length max:63.0 mm

52 52 ? Olympus C-750 Ultra Zoom Sensor Resolution:4.2 megapixels Optical Zoom:10 x Digital Zoom:4 x Installed Memory:16 MB Lens Aperture:F/8-2.8/3.7 Focal Length min:6.3 mm Focal Length max:63.0 mm

53 53 ? Olympus C-750 Ultra Zoom Sensor Resolution:4.2 megapixels Optical Zoom:10 x Digital Zoom:4 x Installed Memory:16 MB Lens Aperture:F/8-2.8/3.7 Focal Length min:6.3 mm Focal Length max:63.0 mm

54 54 ? Olympus C-750 Ultra Zoom Sensor Resolution 4.2 megapixels Optical Zoom 10 x Digital Zoom 4 x Installed Memory 16 MB Lens Aperture F/8-2.8/3.7 Focal Length min 6.3 mm Focal Length max 63.0 mm

55 55 Digital Camera Olympus C-750 Ultra Zoom Sensor Resolution:4.2 megapixels Optical Zoom:10 x Digital Zoom:4 x Installed Memory:16 MB Lens Aperture:F/8-2.8/3.7 Focal Length min:6.3 mm Focal Length max:63.0 mm

56 56 ? Flight # Class From Time/Date To Time/Date Stops Delta 16 Coach JFK 6:05 pm CDG 7:35 am 0 02 01 04 03 01 04 Delta 119 Coach CDG 10:20 am JFK 1:00 pm 0 09 01 04 09 01 04

57 57 ? Flight # Class From Time/Date To Time/Date Stops Delta 16 Coach JFK 6:05 pm CDG 7:35 am 0 02 01 04 03 01 04 Delta 119 Coach CDG 10:20 am JFK 1:00 pm 0 09 01 04 09 01 04

58 58 Airline Itinerary Flight # Class From Time/Date To Time/Date Stops Delta 16 Coach JFK 6:05 pm CDG 7:35 am 0 02 01 04 03 01 04 Delta 119 Coach CDG 10:20 am JFK 1:00 pm 0 09 01 04 09 01 04

59 59 ? Place: Bonnie Lake; County: Duchesne; State: Utah; Type: Lake; Elevation: 10,000 feet; USGS Quad: Mirror Lake; Latitude: 40.711°N; Longitude: 110.876°W

60 60 ? Place: Bonnie Lake; County: Duchesne; State: Utah; Type: Lake; Elevation: 10,000 feet; USGS Quad: Mirror Lake; Latitude: 40.711°N; Longitude: 110.876°W

61 61 ? Place: Bonnie Lake; County: Duchesne; State: Utah; Type: Lake; Elevation: 10,000 feet; USGS Quad: Mirror Lake; Latitude: 40.711°N; Longitude: 110.876°W

62 62 Maps Place: Bonnie Lake; County: Duchesne; State: Utah; Type: Lake; Elevation: 10,100 feet; USGS Quad: Mirror Lake; Latitude: 40.711°N; Longitude: 110.876°W

63 63 Table normalization: take any table, produce a standard row-column table with all data cells containing expanded values and type information (slide labels: Raw table, Normalized table)
Country | GDP/PPP | Per Capita | Real-Growth Rate | Inflation
Afghanistan | $21,000,000,000 | $800 | ? | ?
Albania | $13,200,000,000 | $3,800 | 7.3% | 3.0%
Algeria | $177,000,000,000 | $5,600 | 3.8% | 3.0%
Andorra | $1,300,000,000 | $19,000 | 3.8% | 4.3%
Angola | $13,300,000,000 | $1,330 | 5.4% | 110.0%
Antigua and Barbuda | $674,000,000 | $10,000 | 3.5% | 0.4%
… | … | … | … | …

64 64 Normalizing across hyperlinks

65 65 Normalized table
?? | Population | Growth rate | Population Density | Birth Rate | Death Rate | Migration Rate | Life Expectancy Male | Life Expectancy Female | Infant Mortality
Afghanistan | 25,824,882 | 3.95% | 39.88 persons/km² | 4.19% | 1.70% | 1.46% | 47.82 years | 46.82 years | 14.06%
Albania | 3,364,571 | 1.05% | 122.79 persons/km² | 2.07% | 0.74% | -0.29% | 65.92 years | 72.33 years | 4.29%
Algeria | 31,133,486 | 2.10% | 13.07 persons/km² | 2.70% | 0.55% | -0.05% | 68.07 years | 70.46 years | 4.38%
American Samoa | 63,786 | 2.64% | 320.53 persons/km² | 2.65% | 0.40% | 0.39% | 71.23 years | 79.95 years | 1.02%
Andorra | 65,939 | 2.24% | 146.53 persons/km² | 1.03% | 0.55% | 1.76% | 80.55 years | 86.55 years | 0.41%
Angola | 11,510 | 2.84% | 8.97 persons/km² | 4.31% | 1.64% | 0.16% | 46.08 years | 50.82 years | 12.92%
… | … | … | … | … | … | … | … | … | …
Western Sahara | 239,333 | 2.34% | 0.90 persons/km² | 4.54% | 1.66% | -0.54% | 47.98 years | 50.57 years | 13.67%
World | 5,995,544,836 | 1.30% | 14.42 persons/km² | 2.20% | 0.90% | ? | 61.00 years | 65.00 years | 5.60%
Yemen | 16,942,230 | 3.34% | 32.09 persons/km² | 4.33% | 0.99% | 0.00% | 58.17 years | 61.88 years | 6.98%
Zambia | 9,663,535 | 2.12% | 13.05 persons/km² | 4.45% | 2.26% | 0.08% | 36.72 years | 37.21 years | 9.19%
Zimbabwe | 11,163,160 | 1.02% | 28.87 persons/km² | 3.06% | 2.04% | ? | 38.77 years | 38.94 years | 6.12%

66 66 How to understand tables Captions – in vicinity of table (above, below etc) Footnotes – on annotated column labels or data cells Embedded information – in rows, columns or cells {e.g., $, %, (1,000), billions, etc} Links to other views of the table, possibly with new information

67 67 Use of normalized data Take a table as an input and produce standard records in the form of attribute-value pairs as output Discover constraints among columns Understand the data values
Country | GDP/PPP | Per Capita | Real-Growth Rate | Inflation
Afghanistan | $21,000,000,000 | $800 | ? | ?
Albania | $13,200,000,000 | $3,800 | 7.3% | 3.0%
Algeria | $177,000,000,000 | $5,600 | 3.8% | 3.0%
Andorra | $1,300,000,000 | $19,000 | 3.8% | 4.3%
Angola | $13,300,000,000 | $1,330 | 5.4% | 110.0%
Antigua and Barbuda | $674,000,000 | $10,000 | 3.5% | 0.4%
… | … | … | … | …
{has(Country, GDP/PPP), has(Country, GDP/PPP Per Capita), has(Country, Real-growth rate*), has(Country, Inflation*)}
Annotations: Left-most, primary key; Dollar amount (from data frame); Percentage (from data frame); Country names (from data frame)
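
Turning a normalized table into typed attribute-value records might look like the sketch below. The type labels ("dollars", "percent", "missing", "text") and the functions `parse_value` and `to_records` are assumptions that stand in for the data frames. Only the first two data rows of the table are used.

```python
import re

HEADER = ["Country", "GDP/PPP", "GDP/PPP Per Capita", "Real-Growth Rate", "Inflation"]
ROWS = [
    ["Afghanistan", "$21,000,000,000", "$800", "?", "?"],
    ["Albania", "$13,200,000,000", "$3,800", "7.3%", "3.0%"],
]


def parse_value(cell):
    """Return a (type, value) pair for one table cell."""
    if cell == "?":
        return ("missing", None)
    if re.fullmatch(r"\$[\d,]+", cell):
        return ("dollars", int(cell[1:].replace(",", "")))
    if re.fullmatch(r"-?\d+(\.\d+)?%", cell):
        return ("percent", float(cell[:-1]))
    return ("text", cell)


def to_records(header, rows):
    """Produce one attribute -> (type, value) record per table row."""
    return [
        {name: parse_value(cell) for name, cell in zip(header, row)}
        for row in rows
    ]


if __name__ == "__main__":
    for record in to_records(HEADER, ROWS):
        print(record)
```

Keeping the type next to each value lets a later step notice, for example, that every Inflation cell is either a percentage or missing.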

68 68 Ontology generation overview Concepts of Interest Concepts with Relations Data extraction ontology Sample Documents

69 69 Example: Creating a domain ontology Has associated data frames Includes procedural knowledge Distances Duration between Time zones NameGeopolitical Entity Time Location LongitudeLatitude hasnames Latitude and longitude designates location CountryCity Has GMT

70 70 Example: Table understanding to mini-ontology generation
Agglomeration | Population | Continent | Country
Tokyo | 31,139,900 | Asia | Japan
New York-Philadelphia | 30,286,900 | The Americas | United States of America
Mexico | 21,233,900 | The Americas | Mexico
Seoul | 19,969,100 | Asia | Korea (South)
Sao Paulo | 18,847,400 | The Americas | Brazil
Jakarta | 17,891,000 | Asia | Indonesia
Osaka-Kobe-Kyoto | 17,621,500 | Asia | Japan
… | … | … | …
Niigata | 503,500 | Asia | Japan
Raurkela | 503,300 | Asia | India
Homjel | 502,200 | Europe | Belarus
Zunyi | 501,900 | Asia | China
Santiago | 501,800 | The Americas | Dominican Republic
Pingdingshan | 501,500 | Asia | China
Fargona | 501,000 | Asia | Uzbekistan
Kirov | 500,200 | Europe | Russia
Newcastle | 500,000 | Australia/Oceania | Australia
Mini-ontology concepts: Agglomeration, Population, Country, Continent

71 71 Example: Concept matching to ontology Merging Merge Results AgglomerationPopulation CountryContinent Time Location LongitudeLatitude hasnames Latitude and longitude designates location CountryCity NameGeopolitical Entity Continent Location LongitudeLatitude Latitude and longitude designates location NameGeopolitical Entity Population City Agglomeration Country Has GMT Time Location LongitudeLatitude hasnames Latitude and longitude designates location CountryCity NameGeopolitical Entity Has GMT

72 72 Ontology merging/growing Direct merge (no conflicts) Use results of matching phase to find similar concepts in ontologies (e.g., data value similarities, data frames, NLP, etc) Conflict resolution Interactively identify evidence and counter evidence of functional relationships among mini-ontologies using constraint resolution IDS Interaction with human knowledge engineer Issues – identify Default strategy – apply Suggestions – make
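
For the conflict-free case, a direct merge can be pictured as renaming mini-ontology concepts according to the matching results and taking the union of relationship sets. In the sketch below the triple representation, the `direct_merge` function, and the Agglomeration-to-City mapping are all hypothetical. Conflict resolution and the interactive steps are not modeled.

```python
def direct_merge(domain, mini, mapping):
    """Rename mini-ontology concepts via `mapping`, then union the relationship sets."""
    renamed = {
        (mapping.get(source, source), relation, mapping.get(target, target))
        for source, relation, target in mini
    }
    return set(domain) | renamed


# Relationship sets written as (concept, relation, concept) triples.
DOMAIN = {
    ("Geopolitical Entity", "has", "Name"),
    ("Location", "has", "Latitude"),
    ("Location", "has", "Longitude"),
}
MINI = {
    ("Agglomeration", "has", "Population"),
    ("Agglomeration", "has", "Country"),
}
# Assumed output of the matching phase.
MAPPING = {"Agglomeration": "City"}

if __name__ == "__main__":
    for triple in sorted(direct_merge(DOMAIN, MINI, MAPPING)):
        print(triple)
```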

73 73 Example: Another mini-ontology generation Place LongitudeLatitude Elevation USGS Quad Area MineReservoir LakeCity/town Country State Place Name ⊎

74 74 Example: Another mini-ontology generation Place LongitudeLatitude Elevation USGS Quad Area MineReservoir LakeCity/town Country State Place Name ⊎ Location LongitudeLatitude Latitude and longitude designates location NameGeopolitical Entity Population City Agglomeration Country Merge Continent Time hasnames has GMT

75 75 Example: Concept Mapping to Ontology Merging Place Elevation USGS Quad Area MineReservoir Lake Country State ⊎ Location LongitudeLatitude Latitude and longitude designates location NameGeopolitical Entity Population Agglomeration Country Continent Time hasnames has GMT Geopolitical Entity with population City/town

76 76 Recognize Table Information Religion Population Albanian Roman Shi’a Sunni Country (July 2001 est.) Orthodox Muslim Catholic Muslim Muslim other Afghanistan 26,813,057 15% 84% 1% Albania 3,510,484 20% 70% 30%

77 77 Construct Mini-Ontology Religion Population Albanian Roman Shi’a Sunni Country (July 2001 est.) Orthodox Muslim Catholic Muslim Muslim other Afghanistan 26,813,057 15% 84% 1% Albania 3,510,484 20% 70% 30%

78 78 Discover Mappings

79 79 Merge

80 80 Review: the TANGO process Start out with normalized table Generate likely candidates for: Object Sets Relationship Sets Functional Constraints Inclusion Constraints/Hierarchical Structure Get help from user when needed Choose best candidate for the ontology

81 81 Generate concepts Create list of candidate concepts (usually column names)

82 82 Example 1: Generate Concepts Determine lexicalization (columns with associated values are lexical)

83 83 Example 1: Generate Concepts Current ontology

84 84 Example 1: Generate Relationships Decide relationship sets Exponential number of combinations Basic assumption: one main concept relates to all others (attributes) Goal: find central column of interest

85 85 Example 1: Generate Relationships Look for mapping between one column and title of table

86 86 Example 1: Generate Relationships Current ontology

87 87 Example 1: Generate Constraints FDs and Participation Constraints FD definition: X → Y iff (X[i] = X[j]) → (Y[i] = Y[j]) for all row indexes i and j. Unless solid case (two or more same values), only consider FDs from central object to attributes Use heuristics for setting exact participation (0:1,1:*, etc)
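
The FD definition above translates almost directly into a check over table rows: X → Y fails as soon as two rows agree on X but differ on Y. A minimal sketch follows, using a few rows from the agglomeration table. The function name `holds_fd` is an assumption, and a dependency that holds on sample rows is only a candidate, not a proven constraint.

```python
def holds_fd(rows, x, y):
    """Return True if column `x` functionally determines column `y` in `rows`."""
    seen = {}
    for row in rows:
        key = row[x]
        if key in seen and seen[key] != row[y]:
            return False  # same X value, different Y value
        seen.setdefault(key, row[y])
    return True


ROWS = [
    {"Agglomeration": "Tokyo", "Continent": "Asia", "Country": "Japan"},
    {"Agglomeration": "Seoul", "Continent": "Asia", "Country": "Korea (South)"},
    {"Agglomeration": "Osaka-Kobe-Kyoto", "Continent": "Asia", "Country": "Japan"},
    {"Agglomeration": "Mexico", "Continent": "The Americas", "Country": "Mexico"},
]

if __name__ == "__main__":
    print(holds_fd(ROWS, "Country", "Continent"))  # Japan always maps to Asia
    print(holds_fd(ROWS, "Continent", "Country"))  # Asia maps to two countries
```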

88 88 Example 1: Generate Concepts Numerical values are usually functionally determined by column of interest and have 0:* participation constraint.

89 89 Example 1: Generate Constraints Completed mini-ontology

90 90 Example 2: Generate Concepts SubFamily, Group, and SubGroup are generic types Enumerate column values as object sets because there are fewer than 5 divisions (recursively)

91 91 Example 2: Generate Relationships Found mapping of central column of interest to title (Language) Exceptions to basic assumption Hierarchy (enumerated object sets) Transitive FDs (X → Y, Y → Z, remove X → Z) Create ISA hierarchy from table structure
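
The transitive-FD rule (given X → Y and Y → Z, remove X → Z) can be sketched over a set of single-attribute dependencies. The column names come from the slides, but the particular dependencies listed in `FDS` are assumed for illustration.

```python
def remove_transitive(fds):
    """Drop every X -> Z for which some Y gives both X -> Y and Y -> Z."""
    fds = set(fds)
    redundant = {
        (x, z)
        for (x, y) in fds
        for (y2, z) in fds
        if y == y2 and x != y and y != z and (x, z) in fds
    }
    return fds - redundant


# Assumed dependencies for a language table, written as (determinant, dependent).
FDS = {
    ("Language", "SubGroup"),
    ("SubGroup", "Group"),
    ("Language", "Group"),
}

if __name__ == "__main__":
    print(sorted(remove_transitive(FDS)))
```

Redundancy is decided against the original set, so the result does not depend on iteration order.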

92 92 Example 2: Generate Relationships Current ontology

93 93 Example 2: Generate Hierarchical Constraints Assign members to each object set for easy calculation Find inclusion dependencies: Union – All members of parents are members of one or more child Intersection (Less common) – Child members are always in both parents Mutual exclusion – Intersection of any two child members is empty.
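
Once members are assigned to each object set, the three inclusion constraints reduce to set operations. The sketch below follows the wording on the slide. The function names and the placeholder members (`lang1` and so on) are assumptions.

```python
from itertools import combinations


def union_holds(parent, children):
    """Every member of the parent belongs to at least one child."""
    return set(parent) <= set().union(*map(set, children))


def intersection_holds(child, parents):
    """Every member of the child belongs to all of the parents."""
    return all(set(child) <= set(parent) for parent in parents)


def mutually_exclusive(children):
    """No two children share a member."""
    return all(set(a).isdisjoint(b) for a, b in combinations(children, 2))


# Placeholder object sets.
GROUP = {"lang1", "lang2", "lang3"}
SUBGROUP_A = {"lang1", "lang2"}
SUBGROUP_B = {"lang3"}

if __name__ == "__main__":
    print(union_holds(GROUP, [SUBGROUP_A, SUBGROUP_B]))
    print(mutually_exclusive([SUBGROUP_A, SUBGROUP_B]))
    print(intersection_holds(SUBGROUP_A, [GROUP]))
```

When union and mutual exclusion both hold, the children partition the parent.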

94 94 Example 2: Generate Hierarchical Constraints Completed mini-ontology

95 95 Future direction Start with multiple tables (or URLs) and generate mini-ontologies Identify most suitable mini-ontologies to merge by calculating which tables have most overlap of concepts Generate multiple domain ontologies Integrate with form-based data extraction tools (smarter Web search engines)
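
Choosing which mini-ontologies to merge by concept overlap could be sketched with a simple set-similarity score. Jaccard similarity is used here as one plausible measure, since the slide does not name a measure. The table names and functions are hypothetical, and the concept sets are the column names of tables shown earlier.

```python
from itertools import combinations


def overlap(a, b):
    """Jaccard similarity between two concept sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def best_pair(tables):
    """Return the pair of table names whose concept sets overlap the most."""
    return max(
        combinations(tables, 2),
        key=lambda pair: overlap(tables[pair[0]], tables[pair[1]]),
    )


TABLES = {
    "agglomerations": {"Agglomeration", "Population", "Continent", "Country"},
    "gdp": {"Country", "GDP/PPP", "GDP/PPP Per Capita", "Real-Growth Rate", "Inflation"},
    "places": {"Place", "County", "State", "Type", "Elevation", "USGS Quad",
               "Latitude", "Longitude"},
}

if __name__ == "__main__":
    print(best_pair(TABLES))
```

Exact name matching is the weakest possible form of concept matching, so a fuller version would compare concepts using the lexicons and data frames described earlier.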

