
1 BOOLEAN INFORMATION RETRIEVAL
Adrienn Skrop

2 Boolean Information Retrieval
The Boolean model of IR (BIR) is a classical IR model and, at the same time, the first and most widely adopted one. It is used by virtually all commercial IR systems today.
The BIR is based on Boolean logic and classical set theory, in that both the documents to be searched and the user's query are conceived as sets of terms. Retrieval is based on whether or not the documents contain the query terms.

3 Boolean logic

4 Proposition
A proposition is a statement (formulation, assertion) which can be assigned either a value T or a value F (there is no third alternative), where T and F are two different values, i.e., T ≠ F.
For example, T = true, F = false, or T = yes, F = no, or T = white, F = black, or T = 1, F = 0. The values T and F will be referred to as truth values. A proposition cannot be true and false at the same time (principle of noncontradiction).

5 Example
“I am reading this text.” is a true proposition.
The sentence “The sun is shining.” is also a proposition because either the value T or the value F can be assigned to it.
The sentence “The cooks wearing red hats are playing football at the North Pole.” becomes a proposition if a truth value can be assigned to it.
In general, a proposition need not be true. For example, the proposition “It is raining.” may be true or false. However, there are propositions that are ‘absolutely’ true (e.g., “The year 2001 is the first year of the 21st century.”).

6 Negation
The negation of a proposition P is a proposition denoted by ¬P and pronounced “not P”. If P is true, then ¬P is false, and if P is false, then ¬P is true. Hence, ¬(¬P) is always P (law of double negation).
Truth table of logical negation:
P   ¬P
T   F
F   T
“I am not reading this text.” is a false proposition, and is the negation of the proposition “I am reading this text.”

7 Conjunction
Given two propositions P and Q, the proposition denoted by P ∧ Q (pronounced “P and Q”) is called their conjunction. The conjunction is true if and only if both P and Q are true, and false otherwise. Thus, P ∧ (¬P) is always false (law of contradiction).
Truth table of logical conjunction:
P   Q   P ∧ Q
T   T   T
T   F   F
F   T   F
F   F   F

8 Example
“I am reading this text. ∧ It is raining.” is a proposition, and its truth value can be assigned by the Reader.
“I am thinking about myself. ∧ A bicycle has two wheels.” is a proposition (the Reader can assign a truth value to it), although one would rarely join its two constituent propositions into one sentence in everyday speech.

9 Disjunction
Given two propositions P and Q, the proposition denoted by P ∨ Q (pronounced “P or Q”) is called their disjunction. The disjunction is false if and only if both P and Q are false, and true otherwise. Thus, P ∨ (¬P) is always true (law of the excluded third, also known as the law of the excluded middle).
Truth table of logical disjunction:
P   Q   P ∨ Q
T   T   T
T   F   T
F   T   T
F   F   F

10 Example
“I am reading this text. ∨ It is raining.” is a true proposition (regardless of whether it is actually raining or not), because its first constituent is true.
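
The truth tables and the three laws above can be checked mechanically. The following is a minimal sketch, assuming T and F are modelled by Python's `True` and `False`; the helper name `truth_table` is illustrative and not part of the slides.

```python
from itertools import product

# Assumption: the truth values T and F are modelled as True and False.
TRUTH_VALUES = (True, False)

def truth_table(connective):
    """Return rows (P, Q, connective(P, Q)) for every pair of truth values."""
    return [(p, q, connective(p, q)) for p, q in product(TRUTH_VALUES, repeat=2)]

conjunction = truth_table(lambda p, q: p and q)
disjunction = truth_table(lambda p, q: p or q)

for name, table in (("P AND Q", conjunction), ("P OR Q", disjunction)):
    print(name)
    for p, q, result in table:
        print(f"  {p!s:5} {q!s:5} {result!s:5}")

# Each law is checked for both possible truth values of P.
double_negation = all((not (not p)) == p for p in TRUTH_VALUES)
contradiction = all((p and (not p)) is False for p in TRUTH_VALUES)
excluded_third = all((p or (not p)) is True for p in TRUTH_VALUES)
```

Because a proposition has only two possible truth values, checking both cases exhaustively is enough to confirm each law.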

11 Set theory

12 Sets
The notion of a set is fundamental: it is taken as primitive rather than given a mathematical definition. Informally, a set is a collection of distinct objects.
The objects in a set are called elements. If an object x is an element of a set S (equivalent formulation: x belongs to S), this is denoted as x ∈ S. x ∉ S means that x does not belong to S.
It is very important to note that:
An element can occur at most once in a set.
The order of the elements in a set is unimportant.

13 Sets
A set can be given
by enumerating its elements between braces, e.g., A = {a_1, a_2, ..., a_n}, or
by giving a property P(x) that all elements must share, as follows: A = {x | P(x)}.
A set having a finite number of elements is finite; otherwise it is infinite.
The empty set contains no elements and is denoted by ∅.
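
Both ways of giving a set have a direct counterpart in Python's built-in `set` type, which can only represent finite sets. The sketch below is illustrative: the example elements and the property `P` ("x is even") are assumptions, not taken from the slides.

```python
# Enumeration: list the elements between braces.
A = {"a", "b", "c"}

# Property-based definition A = {x | P(x)}.
# Assumption: P(x) is "x is even", applied to the finite range 0..9.
def P(x):
    return x % 2 == 0

B = {x for x in range(10) if P(x)}

# An element occurs at most once, and order is unimportant.
same = {1, 2, 2, 3} == {3, 1, 2}

# The empty set; note that {} would create an empty dict instead.
empty = set()
```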

14 Example
ℕ = {1, 2, ..., n, ...} denotes the set of natural numbers.
ℤ = {..., −2, −1, 0, 1, 2, ...} denotes the set of integers.
ℚ denotes the set of rational numbers.
ℝ denotes the set of real numbers.
ℂ denotes the set of complex numbers.
{thought, ape, quantum, Rembrandt} is a set.
{mammal | water content of mammal’s milk is less than 20%} is a set.

15 Union
The union of sets A and B is denoted by the symbol ∪ and defined as follows: A ∪ B = {x | (x ∈ A) ∨ (x ∈ B)}.

16 Union example
{thought, ape, quantum, Rembrandt} ∪ {1, 2} = {thought, ape, quantum, Rembrandt, 1, 2}.
Note that the operation of union is a purely formal one (just like the other set operations): it does not require that the elements of the sets be compatible with each other, or have the same nature, in any way.

17 Union
Set union satisfies the following properties (as can be easily checked using the definitions of set equality and union):
Commutativity: A ∪ B = B ∪ A, for any two sets A, B;
Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C, for any three sets A, B, C;
Idempotency: A ∪ A = A, for any set A.
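
As an illustration, the union example and the three properties can be checked on concrete sets with Python's `|` operator. This is a spot check on particular sets, not a proof; the third set `C` is a hypothetical addition needed only for the associativity check.

```python
A = {"thought", "ape", "quantum", "Rembrandt"}
B = {1, 2}
C = {"ape", 2, 3}  # hypothetical third set, not from the slides

# Union mixes elements of different kinds without complaint.
union = A | B

commutative = (A | B) == (B | A)
associative = (A | (B | C)) == ((A | B) | C)
idempotent = (A | A) == A
```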

18 Intersection
The intersection of sets A and B is denoted by the symbol ∩ and defined as follows: A ∩ B = {x | (x ∈ A) ∧ (x ∈ B)}.
[Figure: visualisation of set intersection A ∩ B]

19 Disjoint sets
If A ∩ B = ∅, sets A and B are said to be disjoint sets.
[Figure: visualisation of the disjoint sets A and B]

20 Intersection example
{thought, ape, quantum, Rembrandt} ∩ {thought, Rembrandt, 1, 2} = {thought, Rembrandt}.
Note that the result of the intersection consists of exactly those elements that belong to both sets.

21 Intersection
Set intersection satisfies the following properties (as can be easily checked using the definitions of set equality and intersection):
Commutativity: A ∩ B = B ∩ A, for any two sets A, B;
Associativity: A ∩ (B ∩ C) = (A ∩ B) ∩ C, for any three sets A, B, C;
Idempotency: A ∩ A = A, for any set A.
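
The intersection example, the three properties and the notion of disjoint sets can be spot-checked the same way with Python's `&` operator. The set `C` below is a hypothetical third set, chosen here so that it is disjoint from `A`.

```python
A = {"thought", "ape", "quantum", "Rembrandt"}
B = {"thought", "Rembrandt", 1, 2}
C = {1, 2}  # hypothetical third set, disjoint from A

# Only the elements that belong to both sets survive.
common = A & B

commutative = (A & B) == (B & A)
associative = (A & (B & C)) == ((A & B) & C)
idempotent = (A & A) == A

# A and C are disjoint: their intersection is the empty set.
disjoint = A.isdisjoint(C) and (A & C) == set()
```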

22 Difference
The difference of sets A and B (in this order) is denoted by the symbol \ and is defined as follows: A \ B = {x | (x ∈ A) ∧ (x ∉ B)}.
Note: in general, A \ B ≠ B \ A (i.e., set difference does not commute).
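
A short sketch of why the order matters, using Python's `-` operator. The two sets are reused from the intersection example as an assumption; the slide on difference gives no concrete example of its own.

```python
A = {"thought", "ape", "quantum", "Rembrandt"}
B = {"thought", "Rembrandt", 1, 2}

a_minus_b = A - B  # elements of A that are not in B
b_minus_a = B - A  # elements of B that are not in A

commutes = a_minus_b == b_minus_a
```

Here `a_minus_b` is {ape, quantum} while `b_minus_a` is {1, 2}, so the two differences are not equal.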

23 Powerset
The powerset ℘(A) of a set A is defined as follows: ℘(A) = {X | X ⊆ A}, i.e., the set of all subsets of A.
The empty set ∅ is a member of the powerset of any set A, i.e., ∅ ∈ ℘(A).
Example: ℘({thought, ape, quantum}) = {∅, {thought}, {ape}, {quantum}, {thought, ape}, {thought, quantum}, {ape, quantum}, {thought, ape, quantum}}.
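
The powerset of a finite set can be enumerated by taking all combinations of every size from 0 up to the size of the set. The following is a minimal sketch; the function name `powerset` is illustrative, and subsets are stored as `frozenset` because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of the finite set s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

result = powerset({"thought", "ape", "quantum"})
print(len(result))  # 8
```

A set with n elements has 2^n subsets, which is why the three-element example yields eight members, including the empty set and the set itself.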

24 Notation
Given a finite set T = {t_1, t_2, ..., t_j, ..., t_m} of elements called terms (e.g., words or expressions, which may be stemmed, describing or characterising documents, such as keywords given for a journal article).

25 Notation
Given a finite set D = {D_1, ..., D_i, ..., D_n}, D_i ∈ ℘(T), of elements called documents. These documents are formally conceived, for retrieval purposes, as being represented by sets of terms.

26 Notation
Given a Boolean expression Q called a query. For example (the terms are A, B, C, D, E):

27 Boolean Information Retrieval
Retrieval is defined as follows:
1. For each term occurring in Q, the set S_i of documents that contain the term (or, for a negated term, do not contain it) is obtained:
for term A: S_i = {D | A ∈ D}
for negated term A, i.e., ¬A: S_i = {D | A ∉ D}
2. The documents retrieved in response to Q are those that belong to the set obtained by applying the corresponding set operations: intersection ∩ corresponds to logical AND, union ∪ corresponds to logical OR.
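
Step 1 of this definition can be sketched as two small functions. Everything concrete here is an assumption made for illustration: the collection is a dict mapping a document name to its set of terms, and the names `docs_with`, `docs_without` and the sample documents are hypothetical.

```python
def docs_with(term, documents):
    """S = {D | term ∈ D}: names of the documents that contain the term."""
    return {name for name, terms in documents.items() if term in terms}

def docs_without(term, documents):
    """S = {D | term ∉ D}: names of the documents that do not contain the term."""
    return {name for name, terms in documents.items() if term not in terms}

# Hypothetical collection: each document is represented by a set of terms.
documents = {
    "D1": {"A", "B"},
    "D2": {"B", "C"},
    "D3": {"C"},
}

s_a = docs_with("A", documents)         # documents containing A
s_not_a = docs_without("A", documents)  # documents for the negated term ¬A
```

The two result sets are complementary: together they cover the whole collection and they have no document in common.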

28 BIR example 1
Q = A OR (B AND C)
S_1: result set for term A,
S_2: result set for term B,
S_3: result set for term C.
The set retrieved in response to Q: S_1 ∪ (S_2 ∩ S_3)
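
One way to turn a query such as A OR (B AND C) into set operations is to evaluate it recursively. The sketch below assumes a query is either a term (a string) or a nested tuple whose first item is "AND", "OR" or "NOT"; this representation, the function `evaluate` and the sample collection are all hypothetical and not part of the slides.

```python
def evaluate(query, documents):
    """Return the names of the documents that satisfy the query."""
    if isinstance(query, str):
        # A single term: S = {D | term ∈ D}.
        return {name for name, terms in documents.items() if query in terms}
    operator, *operands = query
    results = [evaluate(operand, documents) for operand in operands]
    if operator == "AND":
        return set.intersection(*results)
    if operator == "OR":
        return set.union(*results)
    if operator == "NOT":
        # Negated term: every document not in the operand's result set.
        return set(documents) - results[0]
    raise ValueError(f"unknown operator: {operator}")

# Hypothetical collection: document name -> set of terms.
documents = {
    "D1": {"A"},
    "D2": {"B", "C"},
    "D3": {"B"},
    "D4": {"C", "D"},
}

retrieved = evaluate(("OR", "A", ("AND", "B", "C")), documents)
print(sorted(retrieved))  # ['D1', 'D2']
```

With this collection S_1 = {D1}, S_2 = {D2, D3} and S_3 = {D2, D4}, so S_1 ∪ (S_2 ∩ S_3) = {D1} ∪ {D2} = {D1, D2}.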

29 BIR example 2
Let the set of original documents be O = {O_1, O_2, O_3} where:
O_1 = Bayes' Principle: The principle that, in estimating a parameter, one should initially assume that each possible value has equal probability (a uniform prior distribution).
O_2 = Bayesian Decision Theory: A mathematical theory of decision-making which presumes utility and probability functions, and according to which the act to be chosen is the Bayes act, i.e. the one with highest Subjective Expected Utility. If one had unlimited time and calculating power with which to make every decision, this procedure would be the best way to make any decision.

30 BIR example 2
O_3 = Bayesian Epistemology: A philosophical theory which holds that the epistemic status of a proposition (i.e. how well proven or well established it is) is best measured by a probability and that the proper way to revise this probability is given by Bayesian conditionalisation or similar procedures. A Bayesian epistemologist would use probability to define, and explore the relationship between, concepts such as epistemic status, support or explanatory power.

31 BIR example 2
Let the set T of terms be: T = {t_1 = Bayes' Principle, t_2 = probability, t_3 = decision-making, t_4 = Bayesian Epistemology}.
Then, the set D of documents is as follows: D = {D_1, D_2, D_3} where
D_1 = {Bayes' Principle, probability},
D_2 = {probability, decision-making},
D_3 = {probability, Bayesian Epistemology}.

32 BIR example 2
Let the query Q be: Q = probability ∧ decision-making
Firstly, the following sets S_1 and S_2 of documents D_i are obtained:
S_1 = {D_i | probability ∈ D_i} = {D_1, D_2, D_3},
S_2 = {D_i | decision-making ∈ D_i} = {D_2}.
Finally, the following documents D_i are retrieved in response to Q:
{D_i | D_i ∈ S_1 ∩ S_2} = {D_1, D_2, D_3} ∩ {D_2} = {D_2}.
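
The same computation can be reproduced in a few lines of code. This is a minimal sketch: the term sets are copied from the example, while representing the collection as a dict from document name to term set is an assumption made for illustration.

```python
# Documents of the example, each represented by its set of terms.
D = {
    "D1": {"Bayes' Principle", "probability"},
    "D2": {"probability", "decision-making"},
    "D3": {"probability", "Bayesian Epistemology"},
}

# Step 1: one result set per query term.
S1 = {name for name, terms in D.items() if "probability" in terms}
S2 = {name for name, terms in D.items() if "decision-making" in terms}

# Step 2: logical AND corresponds to set intersection.
retrieved = S1 & S2
print(sorted(retrieved))  # ['D2']
```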

