
Goal 2 Activities 4, 6, 7 Letizia Tanca Politecnico di Milano.


1 Goal 2 Activities 4, 6, 7 Letizia Tanca Politecnico di Milano

2 Goal 2: Knowledge Management (Polimi)
–Activity 4: Knowledge extraction from natural language actions (Polimi + IBM + Bari)
–Activity 6: Knowledge extraction, modeling and integration from semi-structured information sources, driven by domain ontologies (Polimi)
–Activity 7: Knowledge fusion, tailoring and dissemination for business model redesign (Polimi)

3 Context-aware Web Portal

4 Contextual data analysis
At GialloRosso the oenologist and the agronomist interact with the data related to harvesting and to wine ageing:
–the information they interact with depends on their role and on the workflow phase
–the agronomist inserts information about natural phenomena
–the agronomist and the oenologist ask for information related to the current phase
At BiancoRosso the sales manager:
–analyzes sales data
–at a different moment, analyzes market trends, then
–reads similar information in natural language from the web
GialloRosso performs market analyses by accessing its own information combined with market information collected by its ally BiancoRosso

5 GialloRosso Logical Schema

6 BiancoRosso Logical Schema (Italian table names: VINO = wine, EVENTO = event, FONTE = source, DOCUMENTO = document, VALUTAZIONEMERITO = merit evaluation, RISULTATORICERCA = search result)
VINO(ID_Vino, nome, vinificazione, invecchiamento, denominazione, temperatura, min_temp, note)
EVENTO(ID_Evento, nome, tipo, data, luogo)
TRENDSETTER(ID_Trend, nome, professione)
FONTE(ID_Fonte, nome, uri, tipo, rilevanza, provenienza, descrizione)
DOCUMENTO(ID_Doc, riassunto, url, data, autore, titolo, argomento, descrittore, ID_Fonte)
VALUTAZIONEMERITO(ID_valutazione, descrizione, giudizio, lingua)
RISULTATORICERCA(ID_risultato, ID_vino, ID_evento, ID_trend, ID_fonte, ID_doc, posizione, ID_valutazione)

7 Context-aware data tailoring

8 Data tailoring via view composition

9 Context Dimension Tree

10 Some relevant areas

11 AT GIALLOROSSO THE OENOLOGIST AND THE AGRONOMIST INTERACT WITH THE DATA RELATED TO CULTIVATION AND TO THE CELLAR

12 A PORTION OF THE CDT OF OUR SCENARIO (figure; visible node: oenologist)

13 Some contextual views (figure: definitions of contexts C1, C2, C3, C4)

14 Some contextual queries
The agronomist during the harvesting phase (context C1) wants to collect all the available information coming from sensors:
SELECT m.date_time, m.value, s.s_id, s.meas_unit FROM sensor s, measure_data m WHERE s.s_id = m.s_id;
S/he obtains only the information from sensors placed in the vineyards (see Rel(C1)).
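A minimal sketch of how Rel(C1) can tailor the data so that the agronomist's query runs unchanged but returns only vineyard sensors. It uses Python's built-in sqlite3 module; the `location` column, the toy rows, and the helper `tailor` are hypothetical, and copying the selected rows into a fresh database stands in for the view composition used in the project.

```python
import sqlite3

# Schema sketch: sensor(s_id, meas_unit) and measure_data(s_id, date_time,
# value) come from the slide's query; the 'location' column is hypothetical.
SCHEMA = """
CREATE TABLE sensor (s_id INTEGER PRIMARY KEY, meas_unit TEXT, location TEXT);
CREATE TABLE measure_data (s_id INTEGER, date_time TEXT, value REAL);
"""

# Hypothetical Rel(C1): for each table, a query selecting the rows relevant
# to the agronomist in the harvesting phase (vineyard sensors only).
REL_C1 = {
    "sensor": "SELECT * FROM sensor WHERE location = 'vineyard'",
    "measure_data": "SELECT m.* FROM measure_data m JOIN sensor s "
                    "ON s.s_id = m.s_id WHERE s.location = 'vineyard'",
}


def tailor(source, rel):
    """Copy into a fresh in-memory DB only the rows selected by each view in rel."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)
    for table, view_sql in rel.items():
        rows = source.execute(view_sql).fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return target


# Full (untailored) toy database: one vineyard sensor, one cellar sensor.
full = sqlite3.connect(":memory:")
full.executescript(SCHEMA + """
INSERT INTO sensor VALUES (1, 'C', 'vineyard'), (2, '%', 'cellar');
INSERT INTO measure_data VALUES (1, '2010-09-20 08:00', 18.5),
                                (2, '2010-09-20 08:00', 70.0);
""")

# The agronomist's query from the slide, unchanged.
QUERY = ("SELECT m.date_time, m.value, s.s_id, s.meas_unit "
         "FROM sensor s, measure_data m WHERE s.s_id = m.s_id")

c1 = tailor(full, REL_C1)
print(c1.execute(QUERY).fetchall())  # [('2010-09-20 08:00', 18.5, 1, 'C')]
```

Run against `full`, the same query returns both sensors; run against the tailored database it returns only the vineyard one.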

15 Some contextual queries
The oenologist during the harvesting phase (context C3) wants to collect all the available information about bottles of Aglianico wine:
SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
However, the query is out of context: in context C3 only information about vineyards and grapevines is available to the oenologist.

16 Some more contextual queries
The previous query makes sense in context C4, where the oenologist is in the ageing phase:
SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
Here it produces a non-empty result.
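The difference between C3 and C4 can be sketched as a schema-level check: a query is in context only if every relation it reads is exposed by the context's tailored view. The map below is hypothetical and only encodes what the slides state (vineyard and grapevine data in C3, bottle data in C4); passing the check does not by itself guarantee a non-empty result.

```python
# Hypothetical map from each context to the relations its tailored view
# exposes. The slides state that C3 covers vineyard and grapevine data and
# that bottle data is available in C4; other relations are omitted.
CONTEXT_RELATIONS = {
    "C3": {"vineyard", "grapevine"},
    "C4": {"bottle"},
}


def missing_relations(context, query_relations):
    """Relations referenced by a query that are not available in the context."""
    return set(query_relations) - CONTEXT_RELATIONS.get(context, set())


def is_in_context(context, query_relations):
    """True when every relation the query reads is exposed by the context."""
    return not missing_relations(context, query_relations)


# The Aglianico query reads only the 'bottle' relation.
print(is_in_context("C3", ["bottle"]))  # False
print(is_in_context("C4", ["bottle"]))  # True
```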

17 AT BIANCOROSSO:
1. THE SALES MANAGER ANALYZES SALES DATA
2. THE OENOLOGIST ANALYZES WINE FEATURES TO DESIGN A NEW WINE
3. THEN S/HE READS SIMILAR INFORMATION IN NATURAL LANGUAGE FROM THE WEB
4. INTENSIONAL QUERIES ARE ALSO PERFORMED

18 Sales and promotions planning (Q1)
Sales and promotions planning for events and festivals
The sales manager of BiancoRosso wants to select the wines to promote for each event or festival
–For each event or type of event, he/she needs to identify the most related wines
–Interesting wines for each event can be obtained by analyzing frequent rules of the form EventType=value → Wine=value
E.g., EventType=Summer party → Wine=White wine (support=20%, confidence=36%)
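A minimal sketch of how single-item rules such as EventType=value → Wine=value can be mined and scored, using the standard definitions: support is the fraction of records containing both items, confidence is the fraction of records with the antecedent that also contain the consequent. The record layout, the function name `attribute_rules`, and the sales data are hypothetical; the project's actual mining tool is not described on the slide.

```python
from collections import Counter


def attribute_rules(records, lhs_attr, rhs_attr, min_support=0.0, min_confidence=0.0):
    """Mine rules lhs_attr=a -> rhs_attr=b from a list of dict records.

    support    = fraction of records containing both items
    confidence = fraction of records with lhs_attr=a that also have rhs_attr=b
    """
    n = len(records)
    lhs_counts = Counter(r[lhs_attr] for r in records if lhs_attr in r)
    pair_counts = Counter((r[lhs_attr], r[rhs_attr])
                          for r in records if lhs_attr in r and rhs_attr in r)
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        confidence = both / lhs_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first (by confidence, then support).
    return sorted(rules, key=lambda rule: (rule[3], rule[2]), reverse=True)


# Hypothetical sales records: one per wine sold at an event.
sales = [
    {"EventType": "Summer party", "Wine": "White wine"},
    {"EventType": "Summer party", "Wine": "White wine"},
    {"EventType": "Summer party", "Wine": "Red wine"},
    {"EventType": "Wine festival", "Wine": "Red wine"},
    {"EventType": "Wine festival", "Wine": "Sparkling wine"},
]

for a, b, s, c in attribute_rules(sales, "EventType", "Wine", min_support=0.2):
    print(f"EventType={a} -> Wine={b}  support={s:.0%}  confidence={c:.0%}")
```

The same function covers the other rule shapes on these slides by changing the attribute pair, e.g. `attribute_rules(records, "Month", "Wine")` for Q2 or `("Wine", "Source")` for Q5.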

19 Sales and promotions planning (Q2)
Sales and promotions planning depending on time periods
The sales manager wants to plan specific promotions for each time period of the year
–For each time period (e.g., month), the manager needs to select the most related wines
–Interesting wines can be obtained by analyzing frequent rules of the form Month=value → Wine=value
E.g., Month=June → Wine=White wine (support=20%, confidence=36%)

20 Design of wine (Q3)
Analysis of the main characteristics of wines
The oenologist of BiancoRosso wants to produce new wines. He/she needs to know the main characteristics of each wine in order to select the most interesting wines to produce
–He/she obtains the characteristics of each wine by exploiting rules of the form Wine=value → Characteristic=value
E.g., Wine=White wine → Characteristic=Mainly drunk in a specific time period (support=6%, confidence=100%)

21 Design of wine (Q4)
Identification of correlations between wines and time periods
Knowing the time period in which each wine is mainly consumed helps select the wines to produce. For each wine, the oenologist wants to obtain the time period (e.g., month) in which it is mainly consumed
–This allows selecting wines related to time periods not already covered by the wines BiancoRosso currently produces
–He/she uses rules of the form Wine=value → Month=value
E.g., Wine=White wine → Month=June (support=20%, confidence=100%)
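The Q4 reasoning can be sketched in two steps: for each wine, keep the month m that maximises confidence(Wine=w → Month=m), then list the months that are not the main month of any wine already produced. The consumption records, the list of produced wines, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def main_month_per_wine(records):
    """For each wine, the month m maximising confidence(Wine=w -> Month=m)."""
    by_wine = defaultdict(Counter)
    for r in records:
        by_wine[r["Wine"]][r["Month"]] += 1
    result = {}
    for wine, months in by_wine.items():
        month, count = months.most_common(1)[0]
        # count / total records for this wine is the rule's confidence.
        result[wine] = (month, count / sum(months.values()))
    return result


def uncovered_months(main_months, produced_wines, all_months):
    """Months that are not the main consumption month of any produced wine."""
    covered = {main_months[w][0] for w in produced_wines if w in main_months}
    return [m for m in all_months if m not in covered]


# Hypothetical consumption records.
consumption = [
    {"Wine": "White wine", "Month": "June"},
    {"Wine": "White wine", "Month": "June"},
    {"Wine": "White wine", "Month": "July"},
    {"Wine": "Red wine", "Month": "December"},
    {"Wine": "Red wine", "Month": "December"},
    {"Wine": "Sparkling wine", "Month": "December"},
]

main = main_month_per_wine(consumption)
gaps = uncovered_months(main, ["Red wine"], ["June", "July", "December"])
# Wines whose main month falls in a period not yet covered by production.
candidates = [w for w, (m, _) in main.items() if m in gaps]
print(main)
print(gaps, candidates)
```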

22 Design of wine (Q5)
Identification of correlations between wines and information sources
Once the oenologist has selected the new wines to be produced, he/she needs to identify the sources containing documents related to the selected wines
–The oenologist identifies the sources containing information about the wines of interest by exploiting rules of the form Wine=value → Source=value
E.g., Wine=Montello e colli asolani cabernet superiore → Source=Gambero Rosso (support=11%, confidence=100%)

23 DIESIRAE: a semantic search engine based on Natural Language Processing

24 Knowledge Management

25 Knowledge Indexing & Extraction: Goals
Domain model: ontology (W3C OWL standard)
–Describes the concepts of the domain
Domain vocabulary: semantic network
–Describes the lemmas of the domain
Mapping model: stochastic model
–Second-order, HMM-inspired model
–Transition probabilities approximated by means of MaxEnt models
–Solves mapping ambiguities
Queries:
–Keyword-based (AND/OR; max probability/exhaustive)
–Phrase-based (disambiguated word queries and ontological queries)
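The mapping model resolves ambiguities by finding the most probable sequence of concepts for a sequence of lemmas. The sketch below simplifies the slide's second-order, MaxEnt-driven model to plain first-order Viterbi decoding with fixed, hypothetical probability tables; the concept names and numbers are invented for illustration.

```python
import math


def _log(p):
    # log(0) is represented as -inf so impossible paths never win.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(lemmas, concepts, start_p, trans_p, emit_p):
    """Most probable concept sequence for a lemma sequence (first-order HMM).

    start_p[c], trans_p[(prev, c)] and emit_p[(c, lemma)] are probabilities;
    missing entries count as 0. Scores are kept in log space.
    """
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {c: (_log(start_p.get(c, 0)) + _log(emit_p.get((c, lemmas[0]), 0)), [c])
            for c in concepts}
    for lemma in lemmas[1:]:
        new_best = {}
        for c in concepts:
            emit = _log(emit_p.get((c, lemma), 0))
            score, path = max(
                ((s + _log(trans_p.get((prev, c), 0)) + emit, p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0])
            new_best[c] = (score, path + [c])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy model: "fruit" is ambiguous between a food and a wine
# descriptor; the food reading has the higher a-priori probability.
concepts = ["Fruit", "FruityFlavour", "Taste"]
start_p = {"Fruit": 0.6, "FruityFlavour": 0.4}
emit_p = {("Fruit", "fruit"): 1.0, ("FruityFlavour", "fruit"): 1.0,
          ("Taste", "taste"): 1.0}
trans_p = {("Fruit", "Taste"): 0.1, ("FruityFlavour", "Taste"): 0.9}

print(viterbi(["fruit"], concepts, start_p, trans_p, emit_p))           # ['Fruit']
print(viterbi(["fruit", "taste"], concepts, start_p, trans_p, emit_p))  # ['FruityFlavour', 'Taste']
```

Alone, "fruit" gets its most probable reading; followed by "taste", the context flips it. A second-order version would condition each transition on the two previous concepts (states become concept pairs), and MaxEnt models would replace the fixed `trans_p` table with probabilities conditioned on the linguistic context.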

26 Knowledge indexing & extraction: Functionalities (training; indexing, querying, and extending)

27 Knowledge indexing & extraction: Information Extraction Engine (training; indexing, querying, and extending)
Linguistic Context Extractor:
–Calls linguistic tools (Stanford Parser, FreeLing, JavaRAP,…)
–Words W_i → (lemmas L_i, linguistic context information I_i)
MaxEnt Models:
–Calculate HMM transition probabilities, taking into account the linguistic context information
Extended Viterbi:
–(L_i, I_i) → concepts C_i
TF-IDF:
–Document ranking based on concept frequencies
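The final ranking step can be sketched as TF-IDF computed over the concepts extracted from each document rather than over raw words. The document ids, concept lists, and the exact weighting (term frequency normalised by document length, idf = log(N/df)) are assumptions; the slide does not state which TF-IDF variant the engine uses.

```python
import math
from collections import Counter


def tfidf_rank(docs, query_concepts):
    """Rank documents by the summed TF-IDF weight of the query concepts.

    docs maps a document id to the list of concepts extracted from it.
    tf = occurrences / concepts in the document; idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for concepts in docs.values():
        df.update(set(concepts))
    scores = {}
    for doc_id, concepts in docs.items():
        tf = Counter(concepts)
        length = len(concepts) or 1
        scores[doc_id] = sum(tf[c] / length * math.log(n / df[c])
                             for c in query_concepts if df[c])
    # Highest score first; ties keep the insertion order of docs.
    return sorted(docs, key=lambda d: scores[d], reverse=True)


# Hypothetical concept annotations produced by the extraction engine.
docs = {
    "d1": ["Tannin", "RedWine", "Tannin"],
    "d2": ["WhiteWine", "Aroma"],
    "d3": ["RedWine", "Aroma", "Tannin", "Oak"],
}
print(tfidf_rank(docs, ["Tannin"]))  # ['d1', 'd3', 'd2']
```

A concept that appears in every document gets idf = log(1) = 0 and so does not affect the ranking.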

28 ART DECO Wine Domain Ontology

29 Keyword-based queries
Sequence of isolated words
–No linguistic structure
Exhaustive AND/OR keywords
–No concept disambiguation
–Searches for multiple tuples
–Example: light wine → several meanings found…; country wine → search for instances…; taste wine → search for subclasses…
Max-probability AND/OR keywords
–Searches for a single tuple
–Exploits the a-priori concept probabilities
–Example: [light wine] → max-probability meaning
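The two keyword modes can be contrasted on a toy lexicon: exhaustive search keeps every combination of candidate concepts, while max-probability search keeps a single tuple built from the a-priori probabilities. The lexicon, its numbers, and picking each keyword's concept independently are assumptions made for illustration; the engine may score whole tuples differently.

```python
from itertools import product

# Hypothetical lexicon: keyword -> {candidate concept: a-priori probability}.
LEXICON = {
    "light": {"LightBodiedWine": 0.7, "LowAlcoholWine": 0.3},
    "wine": {"Wine": 1.0},
}


def exhaustive(keywords):
    """Every combination of candidate concepts: one tuple per reading."""
    return list(product(*(LEXICON[kw] for kw in keywords if kw in LEXICON)))


def max_probability(keywords):
    """A single tuple: the most probable concept for each known keyword."""
    return tuple(max(LEXICON[kw], key=LEXICON[kw].get)
                 for kw in keywords if kw in LEXICON)


print(exhaustive(["light", "wine"]))
print(max_probability(["light", "wine"]))
```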

30 Phrase-based queries
Phrase
–Linguistic structure
–Context-based disambiguation
Disambiguated word queries
–Context used for concept disambiguation
–Index the phrase (→ extract concepts), then search for the AND-ed concepts
–Example: (fruit taste) disambiguates fruit
Ontological queries
–Context used to select the request to the ontology
–Index the sentence, select the request, then search the ontology for the mapped concepts
–Example: type of tannins in wine → instance list

31 GIALLOROSSO PERFORMS MARKET ANALYSES BY ACCESSING ITS OWN INFORMATION COMBINED WITH MARKET INFORMATION COLLECTED BY ITS ALLY BIANCOROSSO

32 The integration problem from the user point of view (figure: a user application exchanges queries and answers with a global knowledge interface over Data Source 1 (RDBMS), Data Source 2 (XML), Data Source 3 (WWW) and Data Source 4 (base station))

33 Information integration in ART DECO

34 Knowledge retrieval from the sources
In order to integrate the two original sources, we define the following query to populate the ontology:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?w1 ?w2 ?wn1 ?wn2 ?wb ?bq ?dse ?dso ?sn
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bq .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wn2 .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?sn .
}

35 Query 1
Quantity of bottles (in the GialloRosso DB) available for each wine cited by the web source Percorsi di Vino (stored in the BiancoRosso DB):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name (SUM(?bottle_quantity) AS ?total_bottles)
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER regex(?source_name, "PercorsiDiVino")
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name
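To make the join logic of Query 1 explicit, here is a plain-Python sketch over hypothetical in-memory stand-ins for the ontology individuals (the wine ids, appellations, quantities and source names are invented). A farm wine matches a cited wine when its appellation is a substring of the cited name, mirroring the `fn:contains` filter, and quantities are summed per (wine, source) group.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the ontology individuals used by Query 1.
farm_wines = {"w1": "Aglianico", "w2": "Falanghina"}    # WineInFarm -> appellationInFarm
bottles = [("w1", 120), ("w1", 30), ("w2", 40)]         # WineBottle: (containsWine, bottleQuantity)
cited = [("Aglianico del Vulture", "PercorsiDiVino"),   # (appellationInDocument, docSrcName)
         ("Falanghina", "GamberoRosso")]


def bottles_per_cited_wine(source_pattern):
    """Sum bottle quantities per (cited wine, source), mirroring Query 1.

    Sources are matched with re.search, which like SPARQL regex succeeds
    when the pattern occurs anywhere in the string.
    """
    totals = defaultdict(int)
    for wine_id, quantity in bottles:
        farm_name = farm_wines[wine_id]
        for wine_name, source_name in cited:
            if re.search(source_pattern, source_name) and farm_name in wine_name:
                totals[(wine_name, source_name)] += quantity
    return dict(totals)


print(bottles_per_cited_wine("PercorsiDiVino"))
```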

36 Query 2
Which sources (from BiancoRosso) cite wines of which we (GialloRosso) have at least one bottle available?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name ?source_name
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER (?bottle_quantity > 0)
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name
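The same kind of plain-Python sketch for Query 2: keep only farm wines with bottles in stock, match them to cited wines by substring, and return each (wine, source) pair once, as the GROUP BY does. All data values are hypothetical.

```python
# Hypothetical stand-ins for the ontology individuals used by Query 2.
farm_wines = {"w1": "Aglianico", "w2": "Falanghina"}
bottles = [("w1", 120), ("w2", 0)]
cited = [("Aglianico del Vulture", "PercorsiDiVino"),
         ("Aglianico", "GamberoRosso"),
         ("Falanghina", "GamberoRosso")]


def sources_citing_available_wines():
    """Distinct (cited wine, source) pairs for farm wines with bottles in stock."""
    pairs = set()
    for wine_id, quantity in bottles:
        if quantity <= 0:               # FILTER (?bottle_quantity > 0)
            continue
        farm_name = farm_wines[wine_id]
        for wine_name, source_name in cited:
            if farm_name in wine_name:  # FILTER fn:contains(?wine_name, ?wn1)
                pairs.add((wine_name, source_name))
    # GROUP BY on both variables yields each pair once.
    return sorted(pairs)


print(sources_citing_available_wines())
```

Falanghina is cited by a source but has no bottles in stock, so it does not appear in the result.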

37 Q & A (If you see this slide, we've not run out of time)

38 Part 3 of the book
–Ontology-based knowledge elicitation: an architecture (Chapter editors Licia Sbattella, Roberto Tedesco, Giorgio Orsi, Politecnico di Milano; Marcello Montedoro, IBM Italia)
–Knowledge extraction from Natural Language (Chapter editors Licia Sbattella, Roberto Tedesco, Politecnico di Milano)
–Knowledge extraction from event flows (Chapter editor Alberto Sillitti, Università di Bolzano)
–Context-aware knowledge querying in a networked enterprise (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Fabio A. Schreiber, Politecnico di Milano; Teresa Baldassare, Università di Bari)
–On-the-fly and Context-Aware Integration of Heterogeneous Data Sources (Chapter editors Giorgio Orsi, Letizia Tanca, Politecnico di Milano)
–A methodology for context-driven data-warehouse design (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Letizia Tanca, Politecnico di Milano)

