1 Yet another Example: This happens to be a rank-7 matrix
[Figure: the SVD factors U (9x7), S (7x7, diagonal matrix of singular values) and V (7x8); the numeric entries did not survive extraction.]
[Table: term-by-chapter occurrence matrix with rows controllability, observability, realization, feedback, controller, observer, transfer function, polynomial, matrices and columns ch2-ch9; most entries did not survive extraction.]
This happens to be a rank-7 matrix, so only 7 dimensions are required.
Singular values = Sqrt of the eigenvalues of A*A'  (' denotes transpose)
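Below is a minimal numpy sketch of this step. The slide's actual 9x8 term-by-chapter matrix did not survive extraction, so A is placeholder 0/1 data; only the SVD mechanics and the relation between singular values and the eigenvalues of A*A' are illustrated.

```python
# Minimal numpy sketch of the SVD step above.  A is placeholder 0/1 data,
# not the slide's real term-by-chapter matrix.
import numpy as np

A = np.random.default_rng(0).integers(0, 2, size=(9, 8)).astype(float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)     # A = U * diag(s) * Vt
print("rank of A:", np.linalg.matrix_rank(A))        # at most min(9, 8) = 8

# Singular values are the square roots of the eigenvalues of A*A'
eig = np.linalg.eigvalsh(A @ A.T)                    # eigenvalues, ascending
top = np.sqrt(np.clip(eig, 0, None))[::-1][:len(s)]  # largest ones, square-rooted
print(np.allclose(np.sort(top), np.sort(s)))         # True
```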

2 Formally, this will be the rank-k (k=2) matrix that is closest to M in the matrix-norm sense
[Figure: the full SVD factors U (9x7), S (7x7), V (7x8) shown next to the truncated factors U2 (9x2), S2 (2x2), V2 (8x2); the numeric entries did not survive extraction.]
U2*S2*V2' will be a 9x8 matrix that approximates the original matrix.
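A short sketch of the rank-2 truncation (keep the two largest singular values); A is again placeholder data rather than the slide's matrix.

```python
# Sketch of the rank-2 truncation: keep the two largest singular values
# (the best rank-2 approximation in the matrix-norm sense).
import numpy as np

A = np.random.default_rng(0).integers(0, 2, size=(9, 8)).astype(float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
U2 = U[:, :k]              # 9 x 2
S2 = np.diag(s[:k])        # 2 x 2
V2 = Vt[:k, :].T           # 8 x 2

A2 = U2 @ S2 @ V2.T        # 9 x 8 matrix that approximates the original A
print(A2.shape, np.linalg.matrix_rank(A2))   # (9, 8) 2
print(np.linalg.norm(A - A2))                # approximation error (Frobenius norm)
```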

3 Coordinate transformation inherent in LSI
Doc rep: T-D = T-F * F-F * (D-F)'  (' denotes transpose)
Mapping of keywords into LSI space is given by T-F*F-F.
Mapping of a doc d = [w1 … wk] into LSI space is given by d'*T-F*(F-F)^-1.
For k=2, the mapping is: the base keywords of the doc are first mapped to LSI keywords and then differentially weighted by S^-1.
[Figure: the nine terms (controllability, observability, realization, feedback, controller, observer, transfer function, polynomial, matrices) and the chapters (e.g. ch3) plotted in the two-dimensional LSI space with axes LSIx and LSIy.]
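A hedged sketch of this transformation, assuming T-F and F-F correspond to the truncated SVD factors U2 and S2 of the term-document matrix; the matrix TD and the example document d are made up for illustration.

```python
# Sketch of the coordinate transformation: a document's bag-of-words vector
# d is mapped into LSI space by d' * T-F * inv(F-F).  TD is placeholder data.
import numpy as np

TD = np.random.default_rng(1).integers(0, 2, size=(9, 8)).astype(float)   # terms x docs
U, s, Vt = np.linalg.svd(TD, full_matrices=False)
TF, FF = U[:, :2], np.diag(s[:2])    # T-F (9x2), F-F (2x2)

d = np.zeros(9)
d[[0, 3]] = 1.0                      # hypothetical doc containing the 1st and 4th terms
lsi_d = d @ TF @ np.linalg.inv(FF)   # map to LSI keywords, then weight by S^-1
print(lsi_d)                         # the document's 2-D LSI coordinates
```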

4 Querying
To query for feedback controller, the query vector would be q = [0 0 0 1 1 0 0 0 0]'  (' indicates transpose), since feedback and controller are the 4th and 5th terms in the index, and no other terms are selected.
Let q be the query vector. Then the document-space vector corresponding to q is given by: q'*TF(2)*inv(FF(2)) = Dq.
To find the best document match, we compare the Dq vector against all the document vectors in the 2-dimensional V2 space. The document vector that is nearest in direction to Dq is the best match.
The values of Dq and the cosine values for the eight document vectors did not survive extraction.
[Figure: F-F and D-F, with Dq shown as the centroid of the terms in the query (with scaling); the stray values -0.37, 0.967 and -0.94 are the only ones preserved.]
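A sketch of the whole query step under the same assumptions (placeholder term-document matrix, TF2/FF2/V2 from a rank-2 SVD); it reproduces the procedure, not the slide's numbers.

```python
# Sketch of the query step: Dq = q' * TF(2) * inv(FF(2)), then rank documents
# by the cosine between Dq and each row of V2.  TD is placeholder data.
import numpy as np

TD = np.random.default_rng(2).integers(0, 2, size=(9, 8)).astype(float)
U, s, Vt = np.linalg.svd(TD, full_matrices=False)
TF2, FF2, V2 = U[:, :2], np.diag(s[:2]), Vt[:2, :].T

q = np.zeros(9)
q[[3, 4]] = 1.0                       # "feedback controller": the 4th and 5th terms
Dq = q @ TF2 @ np.linalg.inv(FF2)     # query folded into the 2-D document space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

cosines = [cosine(Dq, V2[j]) for j in range(V2.shape[0])]
best = int(np.argmax(cosines))        # document nearest in direction to Dq
print(np.round(cosines, 3), "best match: doc", best + 1)
```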

5 Variations in the examples 
DB-Regression example:
- Started with the D-T matrix
- Used the term axes as T-F, and the doc rep as D-F*F-F
- Q is converted into q'*T-F
Chapter/Medline etc. examples:
- Started with the T-D matrix
- Used the term axes as T-F*F-F and the doc rep as D-F
- Q is converted to q'*T-F*(F-F)^-1
We will stick to this convention.

6 Query Expansion
Add terms that are closely related to the query terms to improve precision and recall. Two variants:
- Local: only analyze the closeness among the set of documents that are returned
- Global: consider all the documents in the corpus a priori
How to decide closely related terms? THESAURI!!
- Hand-coded thesauri (Roget and his brothers)
- Automatically generated thesauri
  - correlation based (association, nearness)
  - similarity based (terms as vectors in doc space)

7 Correlation/Co-occurrence analysis
Terms that are related to terms in the original query may be added to the query.
Two terms are related if they have high co-occurrence in documents.
Let n be the number of documents; n1 and n2 be the number of documents containing terms t1 and t2; m be the number of documents having both t1 and t2.
If t1 and t2 are independent: m/n = (n1/n)*(n2/n), i.e. the expected overlap is n1*n2/n
If t1 and t2 are correlated: m/n >> (n1/n)*(n2/n)
If inversely correlated: m/n << (n1/n)*(n2/n)
Measure the degree of correlation by how far the observed m deviates from the expected n1*n2/n.
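A tiny sketch of this test; the observed-to-expected ratio used below as the correlation measure is an assumption, since the slide's exact formula did not survive.

```python
# Sketch of the co-occurrence test.  Under independence we expect
# m ~ n1 * n2 / n; the observed-to-expected ratio below is one simple
# (assumed) way to quantify the degree of correlation.
def cooccurrence_ratio(n: int, n1: int, n2: int, m: int) -> float:
    expected = n1 * n2 / n    # expected number of docs containing both t1 and t2
    return m / expected       # >> 1: correlated, ~1: independent, << 1: inversely correlated

print(cooccurrence_ratio(n=1000, n1=100, n2=50, m=5))    # 1.0  -> roughly independent
print(cooccurrence_ratio(n=1000, n1=100, n2=50, m=40))   # 8.0  -> strongly correlated
```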

8 Association Clusters
Let Mij be the term-document matrix
- for the full corpus (global), or
- for the docs in the set of initial results (local)
(also, sometimes stems are used instead of terms)
Correlation matrix C = M*M'  (term-doc x doc-term = term-term)
Un-normalized association matrix: S = C
Normalized association matrix: Suv = Cuv / (Cuu + Cvv - Cuv)
nth Association Cluster for a term tu is the set of terms tv such that Suv are the n largest values among Su1, Su2, …, Suk.
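A minimal sketch of the construction, assuming the standard normalization Suv = Cuv / (Cuu + Cvv - Cuv); the 3x7 count matrix M is hypothetical.

```python
# Minimal sketch of the association-cluster construction.
import numpy as np

def association_cluster(M: np.ndarray, u: int, n: int) -> list:
    """Return the indices of the n terms most associated with term u."""
    C = M @ M.T                                    # term-term correlation matrix
    d = np.diag(C)
    S = C / (d[:, None] + d[None, :] - C)          # normalized association matrix
    scores = S[u].copy()
    scores[u] = -np.inf                            # exclude the term itself
    return list(np.argsort(scores)[::-1][:n])

# Hypothetical 3-term x 7-document count matrix, just to exercise the function.
M = np.array([[1, 0, 2, 0, 1, 0, 1],
              [0, 3, 1, 2, 0, 4, 0],
              [1, 1, 0, 2, 3, 1, 0]], dtype=float)
print(association_cluster(M, u=1, n=1))            # -> [2]
```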

9 Example
[Table: term-document occurrence matrix for terms K1, K2, K3 over documents d1-d7; its entries did not survive extraction.]
Correlation matrix (terms K1, K2, K3):
      K1   K2   K3
K1    11    4    6
K2     4   34   11
K3     6   11   26
Normalized correlation matrix: entries not preserved in the transcript (they can be recomputed with the formula above; see the sketch below).
1st Association Cluster for K2 is K3.
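The slide's conclusion can be re-derived from the surviving correlation matrix; the normalized entries below are recomputed with the normalization assumed above, not copied from the slide.

```python
# Re-deriving this slide's conclusion from the surviving correlation matrix,
# using Suv = Cuv / (Cuu + Cvv - Cuv).
import numpy as np

C = np.array([[11.0,  4.0,  6.0],    # correlation matrix for K1, K2, K3
              [ 4.0, 34.0, 11.0],
              [ 6.0, 11.0, 26.0]])
d = np.diag(C)
S = C / (d[:, None] + d[None, :] - C)
print(np.round(S, 3))
# K2's row is roughly [0.098, 1.0, 0.224]; its largest off-diagonal entry is
# the one for K3, so the 1st association cluster for K2 is {K3}.
```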

10 Scalar clusters
Even if terms u and v have low correlation, they may be transitively correlated (e.g. a term w has high correlation with both u and v).
Consider the normalized association matrix S.
The "association vector" Au of term u is (Su1, Su2, …, Suk).
To measure the neighborhood-induced correlation between terms, take the cosine between the association vectors of terms u and v.
nth Scalar Cluster for a term tu is the set of terms tv such that cos(Au, Av) are the n largest such cosine values for tu.
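A minimal sketch of the scalar-cluster construction, reusing the assumptions above; the demo matrix S is hypothetical.

```python
# Minimal sketch of scalar clusters: rows of the normalized association
# matrix S are the association vectors; terms are compared by their cosine.
import numpy as np

def scalar_cluster(S: np.ndarray, u: int, n: int) -> list:
    """Return the indices of the n terms in the nth scalar cluster of term u."""
    norms = np.linalg.norm(S, axis=1)
    cos = (S @ S.T) / np.outer(norms, norms)   # cosine between association vectors
    scores = cos[u].copy()
    scores[u] = -np.inf                        # exclude the term itself
    return list(np.argsort(scores)[::-1][:n])

# hypothetical normalized association matrix for three terms
S = np.array([[1.00, 0.10, 0.19],
              [0.10, 1.00, 0.22],
              [0.19, 0.22, 1.00]])
print(scalar_cluster(S, u=1, n=1))             # -> [2]
```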

11 Example
The cosine metric is applied to every pair of rows (association vectors) of the normalized correlation matrix from the previous slide. (A Lisp trace of the COSINE-METRIC calls appeared here; its values did not survive extraction.)
Scalar (neighborhood) cluster matrix: the row for AK1 is (1.0, 0.226, 0.383).
1st Scalar Cluster for K2 is still K3.
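Recomputing the scalar (neighborhood) cluster matrix from the slide-9 correlation matrix reproduces the surviving row (1.0, 0.226, 0.383), which supports the normalization assumed earlier.

```python
# Recomputing the scalar (neighborhood) cluster matrix of this slide.
import numpy as np

C = np.array([[11.0,  4.0,  6.0],
              [ 4.0, 34.0, 11.0],
              [ 6.0, 11.0, 26.0]])
d = np.diag(C)
S = C / (d[:, None] + d[None, :] - C)            # normalized association matrix

norms = np.linalg.norm(S, axis=1)
scalar = (S @ S.T) / np.outer(norms, norms)      # cosine between rows Au and Av
print(np.round(scalar, 3))
# first row ~ [1.0, 0.226, 0.383]; K2's largest off-diagonal entry is still
# the one for K3, so the 1st scalar cluster for K2 is {K3}.
```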

12 Querying
To query for database index, the query vector q would have 1s in the 1st and 3rd positions, since database and index are the 1st and 3rd terms in the index, and no other terms are selected (the vector itself did not survive extraction).
Let q be the query vector. Then the document-space vector corresponding to q is given by: q'*U2*inv(S2) = Dq.
To find the best document match, we compare the Dq vector against all the document vectors in the 2-dimensional doc space. The document vector that is nearest in direction to Dq is the best match.
The cosine values for the eight document vectors and the query vector did not survive extraction.
[Figure: Dq shown as the centroid of the terms in the query (with scaling).]

