An Architecture for Emergent Semantics Sven Herschel, Ralf Heese, and Jens Bleiholder Humboldt-Universität zu Berlin/ Hasso-Plattner-Institut.


1 An Architecture for Emergent Semantics Sven Herschel, Ralf Heese, and Jens Bleiholder Humboldt-Universität zu Berlin/ Hasso-Plattner-Institut

2 S. Herschel, R. Heese, and J. Bleiholder: Emergent Semantics
Ideas of Emergent Semantics
• Improve document representation by aggregating many users’ opinions
  - Adding keywords implicitly while querying the corpus
• Living document representation instead of query reformulation
  - Entirely new keywords
  - Immediate change of the document representation and of the corpus index
[Diagram "Information Retrieval today": User, query, IR Query Engine, corpus / doc repr.]

3 Outline
• Basement (Background)
• Construction (Architecture of Emergent Semantics)
• Assessment (Evaluation)
• Roof and Windows (Conclusion and Future Work)

4 Basement (Background)

5 Information Retrieval
• Information Retrieval
  - Content-oriented search on a set of documents
  - Find a document representation to retrieve documents effectively and efficiently according to the user’s query
• Today's approaches
  - Capture the semantics of a document by analyzing syntactic information
  - No new words in document representation → Synonyms cannot be added
  - Query refinement
Basement | Construction | Assessment | Roof and Windows

6 Semiotics
Example sign "tree" (definitions from http://www.wordreference.com/definition/tree):
  - A tall perennial woody plant …
  - A figure that branches from a single root …
• Syntax: signs ⇔ signs
• Semantics: signs ⇔ represented object
• Pragmatics: signs ⇔ user interpretation
[Diagram labels: current IR approaches; emergent semantics]

7 Construction (Architecture of Emergent Semantics)

8 Components of Emergent Semantics
[Diagram: Interpreter, Query Engine, Retrieval Engine, Ranking Function, Annotation Filter, Quality Measure; data stores "corpus / doc repr." and "knowledge"; terms t1, t2, …, tn; steps 1 to 4; symbols "?" and "!"]

9 Bootstrapping
• Index the document corpus, e.g., TF/IDF, Latent Semantic Indexing
[Diagram: corpus / doc repr., knowledge; terms t1, t2, …, tn]

10 Receiving a Query
• Reformulate the query, e.g., query expansion, replacing terms
[Diagram: step 1; Interpreter; corpus / doc repr., knowledge]
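The slide names query expansion as one possible reformulation. A minimal sketch of that idea follows, assuming the "knowledge" store can be modeled as a plain dictionary from a term to related terms; the function name `expand_query` and the sample entries are hypothetical and not taken from the presentation.

```python
def expand_query(query, knowledge):
    """Append related terms from a (hypothetical) knowledge dictionary.

    Original terms keep their order; duplicates are dropped.
    """
    expanded = []
    for term in query:
        for candidate in [term] + knowledge.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Illustrative knowledge entries only; a real system would learn or curate these.
knowledge = {"rdbms": ["database", "relational"]}
print(expand_query(["rdbms", "sql"], knowledge))
```

Terms without an entry in the dictionary pass through unchanged, so the sketch never shrinks the query.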

11 Query Evaluation
• Select documents according to the query, e.g., inverted index of all terms
• Rank the list of matching documents, e.g., vector space model
[Diagram: steps 1 and 2; Interpreter, Query Engine, Retrieval Engine, Ranking Function; corpus / doc repr., knowledge]
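The two steps on this slide, selection through an inverted index and ranking in a vector space model, can be sketched as follows. This is an illustration under simplifying assumptions: documents are lists of tokens, term weights are raw term counts rather than TF/IDF, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def select(index, query):
    """Select every document that contains at least one query term."""
    matches = set()
    for term in query:
        matches |= index.get(term, set())
    return matches


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(docs, query):
    """Return matching document ids, best cosine score first."""
    index = build_inverted_index(docs)
    query_vec = Counter(query)
    scored = [(cosine(query_vec, Counter(docs[d])), d) for d in select(index, query)]
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = {
    "d1": ["sql", "sql", "database"],
    "d2": ["sql", "language", "database"],
    "d3": ["tree"],
}
print(rank(docs, ["sql", "language"]))
```

Here d3 is never scored because the inverted index filters it out before ranking, which is the point of separating selection from ranking.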

12 Query Result
• The user determines the set of relevant documents by evaluating the document surrogates.
[Diagram: steps 1 to 3; Interpreter, Query Engine, Retrieval Engine, Ranking Function; corpus / doc repr., knowledge]

13 Feedback
• The user retrieves the relevant documents.
• Add the original query to the document representation
Idea: Document is found by query terms and document is marked as relevant ⇒ All query terms are related to the document
[Diagram: steps 1 to 4; Interpreter, Query Engine, Retrieval Engine, Ranking Function, Annotation Filter, Quality Measure; corpus / doc repr., knowledge]

14 Emergent Semantics Architecture
• Pragmatics: What do I mean by my query?
• Semantics: How do most users formulate this query?
• Syntax: How is the corpus queried?
[Diagram: steps 1 to 4; Interpreter, Query Engine, Retrieval Engine, Ranking Function, Annotation Filter, Quality Measure; corpus / doc repr., knowledge]

15 Example: Querying the document corpus
• TF/IDF matrix of the document corpus
  TF/IDF: weight = (term freq ∙ #doc) / doc freq

        database   SQL     language   relational
  d1    2.76       19.83   0          3.22
  d2    3.68       0.94    2.76       3.68
  d3    …          …       …          …

  - RDBMS does not occur in the document corpus
• Query Q = {RDBMS, SQL, language}
• Ranked result D_Query = (d1, d5, d2, d10)
  D_relevant = {d1, d2}
[Diagram: Query Engine; doc repr.]
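The weight formula on this slide, weight = (term freq ∙ #doc) / doc freq, can be written out directly. The sketch below assumes documents are lists of tokens and uses a toy corpus; it does not reproduce the numbers in the slide's matrix, whose underlying corpus is not given, and the function name `tfidf_matrix` is hypothetical.

```python
from collections import Counter


def tfidf_matrix(docs):
    """Compute weight = (term freq * #doc) / doc freq for every document term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per document
    matrix = {}
    for doc_id, terms in docs.items():
        term_freq = Counter(terms)
        matrix[doc_id] = {t: term_freq[t] * n_docs / doc_freq[t] for t in term_freq}
    return matrix


# Toy corpus, not the corpus behind the slide's table.
docs = {
    "d1": ["sql", "sql", "database"],
    "d2": ["database", "language"],
}
for doc_id, weights in tfidf_matrix(docs).items():
    print(doc_id, weights)
```

A term that appears in every document gets a weight equal to its term frequency, while rarer terms are scaled up by #doc / doc freq.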

16 Example: Adding the query terms
• Adding {RDBMS, SQL, language} to document representation
  - Recalculation of the TF/IDF matrix necessary (shown on the slide for the keywords language, SQL, and RDBMS)

  Before:
        database   SQL     language   relational
  d1    2.76       19.83   0          3.22
  d2    3.68       0.94    2.76       3.68
  d3    …          …       …          …

  After:
        database   SQL     language   relational   RDBMS
  d1    2.76       19.99   0.02       3.21         0.29
  d2    3.68       1.13    1.32       3.67         0.33
  d3    …          …       …          …            0

[Diagram: Annotation Filter]
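The feedback step and the recalculation it triggers can be sketched as below. The sketch assumes that "adding the query" means appending the query tokens to the token list of each relevant document, and it uses the weight formula from the previous slide, (term freq ∙ #doc) / doc freq. The corpus is a toy example and the helper names are hypothetical, so the resulting numbers differ from the slide's table.

```python
from collections import Counter


def tfidf_matrix(docs):
    """weight = (term freq * #doc) / doc freq, as stated on the example slide."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    return {
        doc_id: {t: tf * n_docs / doc_freq[t] for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def feed_back(docs, query, relevant):
    """Return a new corpus with the query terms appended to each relevant document."""
    return {
        doc_id: (terms + list(query)) if doc_id in relevant else list(terms)
        for doc_id, terms in docs.items()
    }


docs = {
    "d1": ["sql", "database"],
    "d2": ["database", "language"],
    "d3": ["tree"],
}
before = tfidf_matrix(docs)
after = tfidf_matrix(feed_back(docs, ["rdbms", "sql", "language"], {"d1", "d2"}))
print("rdbms" in before["d1"], after["d1"]["rdbms"])
```

The new keyword "rdbms" has no weight before feedback and a positive weight afterwards, which mirrors the extra RDBMS column in the slide's recalculated matrix.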

17 Living Document Representation
• Document representations change over time (living document representation)
  - Many similar queries → weights of the query terms increase
  - Unrelated query terms → document representation changes only slightly
• New keywords / semantic concepts in document representation
[Diagram labels: Document representations; Query]

18 Assessment (Evaluation)

19 Experiment I - Setup
• CACM corpus
  - 3200 documents + 32 queries + gold standard
  - Title and abstract tokenized and indexed using Apache Lucene
• Retrieval and Ranking
  - Vector space model with TF/IDF weights
• Feedback
  - Attach the tokenized query to all relevant document representations

20 Exploit corpus correlations
• Split the set of queries into halves
  - Run first half and feed back all query terms
  - Run second half
• Small overlap between queries
• Small overlap between result sets
[Diagram labels: Run query set 1; Run query set 2; Add query terms to relevant document representation; Identical to TF/IDF without EmSem; Add query terms again; Measure again (1st EmSem run); …]
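The split-half protocol can be sketched as follows. Several parts are assumptions for illustration: the slides do not name the evaluation measure, so the sketch uses mean precision over the top k results; the ranking is a toy term-count score instead of the Lucene vector space model used in the experiment; and the corpus, queries, gold standard, and function names are all hypothetical.

```python
def search(docs, query, k):
    """Toy ranking: order documents by the number of query-term occurrences."""
    scores = {d: sum(terms.count(t) for t in query) for d, terms in docs.items()}
    hits = [d for d in sorted(scores, key=lambda d: (-scores[d], d)) if scores[d] > 0]
    return hits[:k]


def mean_precision(docs, queries, gold, k):
    """Average, over the queries, of the share of returned documents that are relevant."""
    values = []
    for query_id, terms in queries:
        result = search(docs, terms, k)
        hits = sum(d in gold[query_id] for d in result)
        values.append(hits / len(result) if result else 0.0)
    return sum(values) / len(values) if values else 0.0


def split_half_experiment(docs, queries, gold, k=2):
    """Measure the second half of the queries before and after feeding back the first half."""
    half = len(queries) // 2
    first, second = queries[:half], queries[half:]
    baseline = mean_precision(docs, second, gold, k)
    fed = {d: list(terms) for d, terms in docs.items()}
    for query_id, terms in first:
        for doc_id in gold[query_id]:  # gold standard plays the role of user feedback
            fed[doc_id].extend(terms)
    return baseline, mean_precision(fed, second, gold, k)


docs = {
    "d1": ["relational", "database"],
    "d2": ["sql", "tutorial"],
    "d3": ["rdbms", "pricing"],
}
queries = [("q1", ["rdbms", "sql"]), ("q2", ["rdbms"])]
gold = {"q1": {"d1", "d2"}, "q2": {"d1", "d2"}}
print(split_half_experiment(docs, queries, gold))
```

The toy queries overlap on purpose so that feedback from the first half can help the second half; with little overlap between the halves, as the slide notes for the real query set, the two numbers would be expected to stay close together.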

21 Feeding back all query terms
• Run all queries and feed back all query terms

22 Experiment II - Setup
• First phase
  - Presented a wide variety of images to users
  - Which keywords would you use to find the image with a search engine?
• Second phase
  - Rate the adequacy of the annotations

23 Results

  Term                                Phase 1 (% users)   Phase 2 (% users)
  Weihnachtsmann (Father Christmas)   26.5%               100.0%
  Brille (glasses)                    7.8%                51.8%
  Nikolaus (St. Nicholas)             7.8%                91.6%
  Weihnachten (Christmas)             6.5%                61.5%
  Santa Claus                         6.0%                75.0%

24 Conclusions from our Experiments
• Document representations become more precise over time.
• A small number of terms describe an image sufficiently.
• A large number of user queries can be satisfied by indexing a small number of terms.

25 Roof and Windows (Conclusion)

26 Roof and Windows
• Architecture for emergent semantics
• Users’ individual pragmatics aggregated into representation of documents
• Living document representation
Outlook
• Applying EmSem to distributed IR
  - Reducing the size of document representations
  - Less network traffic

27 Thank you!

