Exploiting Temporal References in Text Retrieval. Irem Arikan, advised by: Srikanta Bedathur, Klaus Berberich
Motivation: Users' information needs often have a temporal dimension, but traditional information retrieval systems do not exploit the temporal content in documents. Example query: PM United Kingdom 2000. The search engine is not aware that 2000 is actually mentioned implicitly by the document. Proposed: an approach which recognizes and exploits temporal references in documents to yield better search results.
Example Temporal Queries. Broad Queries: British colony 17th century; Economic situation Germany 1920s; President assassination 1950 – 2000. Specific Queries: US president October 1962; Pope 1940s; Academy awards best actor 1975. Ambiguous Queries: George Bush 1990 vs. George Bush 2007; Gulf war 1991 vs. Gulf war 2005.
Outline: Language Modeling for Information Retrieval; Time Modeling for Temporal Information Retrieval; Combining Text Relevance with Temporal Relevance; Experimental Results.
Language Modeling for Information Retrieval. Language Model: a statistical model to generate text. Language Modeling: the task of estimating the statistical parameters of a language model. Language Modeling for IR: the problem of estimating the likelihood that a query and a document could have been generated by the same language model. In practical IR approaches: Unigram Language Model, in which words occur independently.
Language Modeling for IR. 1) Treat the document as a sample from a language model: assume an underlying multinomial probability distribution over words for each document and estimate the statistics of this distribution, P[word]. 2) Estimate the likelihood that the query is generated by this distribution. 3) Rank the documents by P(q | d). [Figure: document, from which a model M_d is inferred: P[word | M_d]]
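The three steps above can be sketched in Python as follows. This is a minimal illustration, not the system described in the talk: the function names, the toy documents, and the Jelinek-Mercer smoothing with weight `lam` are assumptions added here, since the slide does not say how P[word | M_d] is estimated or smoothed.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam=0.5):
    """Log P(q | d) under a unigram model of the document.

    Assumption: document probabilities are mixed with collection
    probabilities (Jelinek-Mercer smoothing, weight lam) so that a
    query word missing from the document does not zero the score.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The word occurs nowhere in the collection.
            return float("-inf")
        # Unigram assumption: words are independent, so log-probabilities add.
        log_p += math.log(p)
    return log_p


def rank(query_terms, docs):
    """Return document names ordered by decreasing P(q | d)."""
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    scores = {
        name: query_log_likelihood(query_terms, terms, coll_counts, coll_len)
        for name, terms in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "prime minister united kingdom".split(),
    "d2": "football league results".split(),
}
ranking = rank(["prime", "minister"], docs)
```

Here `ranking` places `d1` first, because both query words occur in it.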
Temporal Modeling for Temporal Retrieval. General approach: similar to the LM approach, based on a generative model (the temporal model) which generates temporal references. The query is split into 2 parts: a text query and a temporal query. The temporal content of the document is produced by a probabilistic mechanism in which each time reference is generated by a different generative temporal model. To generate a time reference: 1) first choose a temporal model, 2) then generate a time reference using this temporal model.
Temporal Modeling. Estimating temporal query likelihood: infer a temporal model from each temporal reference in the document, then estimate the likelihood that the temporal query is generated by one of the models which generated the temporal content of the document. Temporal query generation probability.
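The estimate described above can be read as a mixture over the document's temporal models. The sketch below is one hedged interpretation, not the formula from the slide (which did not survive in the transcript): it assumes each temporal model is chosen with equal probability, and `overlap_fraction` is a hypothetical stand-in for the per-reference temporal model.

```python
def overlap_fraction(query, ref):
    """Hypothetical stand-in for P(query interval | model inferred from ref).

    Intervals are (start, end) pairs in whole years, inclusive.
    Returns shared years divided by the years spanned by both intervals.
    """
    q_start, q_end = query
    r_start, r_end = ref
    shared = min(q_end, r_end) - max(q_start, r_start) + 1
    spanned = max(q_end, r_end) - min(q_start, r_start) + 1
    return max(0, shared) / spanned


def temporal_query_likelihood(query, doc_refs, interval_prob=overlap_fraction):
    """Average the per-model probabilities over all temporal references.

    Assumption: step 1 of the generative process (choosing a temporal
    model) is uniform over the references found in the document.
    """
    if not doc_refs:
        return 0.0
    return sum(interval_prob(query, ref) for ref in doc_refs) / len(doc_refs)


# A document mentioning 1980-1990 and 2000-2005, queried with 1980-1990.
likelihood = temporal_query_likelihood((1980, 1990), [(1980, 1990), (2000, 2005)])
```

Any function with the same signature can be passed as `interval_prob`, so the two approaches to the temporal model can be swapped in without changing the mixture.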
Temporal Modeling. What is a temporal model? A probabilistic model to generate temporal references. What kind of distribution? How can we estimate its parameters? Formalize the problem in a goal-oriented way: we should infer a temporal model from each time interval in the document (the sample time interval), and this temporal model should be able to generate all time intervals which are relevant to the sample interval.
1. Approach. Assumptions: two time intervals are only relevant to each other if they intersect. The generative model inferred should be able to produce subintervals, superintervals, and overlapping intervals of the interval in the document. The probability of generating an intersecting time interval should be proportional to the length of the intersection. Example: for the query 1980 – 1990, the interval 1980 – 1989 is more relevant than 23 March 1984. Appropriate probabilistic model: 2 underlying triangular distributions, one for the start and one for the end. [Figure: interval [s, e] on time axis t, with example intervals sub1, sub2, sup1, sup2, lOverlap, rOverlap]
1. Approach (continued). [Figure: time axis with labels s, e + 1, u, e, l, q, s - 1 and regions r1, r2, r3, r4] Nonzero probability for intersecting intervals. r1 – r3: left overlaps; r1 – r4: super intervals; r2 – r3: subintervals; r2 – r4: right overlaps. The interval [s, e] has the highest probability. Probability decreases to the left and right, resulting in lower probability for intervals which have smaller intersection lengths.
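A minimal sketch of the first approach, under assumptions that go beyond what the transcript preserves: the start point is taken to follow a triangular distribution that is zero at l, peaks at s, and returns to zero at e + 1, while the end point follows one that is zero at s - 1, peaks at e, and returns to zero at u. This reading of the axis labels, the names `triangular_pdf` and `interval_score`, and the product of the two densities as an unnormalized score are all illustrative guesses, not the authors' definition.

```python
def triangular_pdf(x, lo, mode, hi):
    """Density of a triangular distribution on (lo, hi) that peaks at mode."""
    if x <= lo or x >= hi:
        return 0.0
    peak = 2.0 / (hi - lo)
    if x <= mode:
        return peak * (x - lo) / (mode - lo)
    return peak * (hi - x) / (hi - mode)


def interval_score(q_start, q_end, s, e, l, u):
    """Unnormalized score of query interval [q_start, q_end] given [s, e].

    Assumption: start and end are scored independently and multiplied.
    l and u are the outer bounds of the time axis.
    """
    start_density = triangular_pdf(q_start, l, s, e + 1)
    end_density = triangular_pdf(q_end, s - 1, e, u)
    return start_density * end_density


# Document interval 1980-1990 on a hypothetical axis from 1900 to 2100.
s, e, l, u = 1980, 1990, 1900, 2100
exact = interval_score(1980, 1990, s, e, l, u)
near = interval_score(1980, 1989, s, e, l, u)
point = interval_score(1984, 1984, s, e, l, u)
disjoint = interval_score(1991, 1995, s, e, l, u)
```

With these bounds, `exact` is the largest score, `near` (1980 – 1989) is larger than `point` (a single date in 1984), and `disjoint` is zero because the interval does not intersect [s, e].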
2. Approach. Assumptions: two intervals are only relevant to each other if they are positioned closely to each other on the time axis and have similar lengths: | start1 – start2 | < a and | length1 – length2 | < b. The generative model inferred should be able to produce temporal intervals in some neighbourhood on the time axis. [Figure: interval with start s and length l on time axis t, with variations ∆s and ∆l]
2. Approach (continued). [Figure: start axis with s - a, s, s + a; length axis with l - b, l, l + b] The temporal interval x = s, y = l has the highest probability. Probability decreases as the start point moves away from s and as the length moves away from l.
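The behaviour described above can be sketched with a simple tent-shaped weight on each axis. The slide does not name the distribution used for the second approach, so the linear fall-off, the function names, and the product of the two weights are assumptions made for illustration.

```python
def tent(x, center, half_width):
    """Weight 1 at center, falling linearly to 0 at center +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def neighbourhood_score(q_start, q_length, s, l, a, b):
    """Unnormalized score for an interval given by its start and length.

    Nonzero only when |q_start - s| < a and |q_length - l| < b,
    matching the two assumptions of the second approach.
    """
    return tent(q_start, s, a) * tent(q_length, l, b)


# Document interval starting in 1980 with length 10, tolerances a = b = 5.
best = neighbourhood_score(1980, 10, s=1980, l=10, a=5, b=5)
shifted = neighbourhood_score(1982, 10, s=1980, l=10, a=5, b=5)
too_far = neighbourhood_score(1985, 10, s=1980, l=10, a=5, b=5)
```

The score is highest at x = s, y = l, shrinks as either the start or the length drifts, and is zero once the start is a or more units away.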
Experimental Results-3. Query: George Bush 1990. Methods compared: Terrier, Boolean, Our Method. Retrieved pages, in slide order: George_W._Bush_insider_trading_allegations; Bush_family; President_Bush; Bush_family; Bush_administration; Early_life_of_George_W._Bush; Andrew_Card; President's Council of Advisors on Science and Technology; George_H._W._Bush; Approval_rating; George_H._W._Bush; C_Boyden_Gray; Brent_Scowcroft; Arbusto_Energy.