Information Retrieval by means of Vector Space Model of Document Representation and Cascade Neural Networks Igor Mokriš, Lenka Skovajsová Institute of.


1 Information Retrieval by means of Vector Space Model of Document Representation and Cascade Neural Networks Igor Mokriš, Lenka Skovajsová Institute of Informatics, SAS Bratislava, Slovakia mokris@aoslm.sk, skovajsova@aoslm.sk

2 Summary: Development of a neural network model for information retrieval from text documents in the Slovak language, using the vector space model of document representation. Keywords: Information Retrieval, Queries, Keywords, Text Documents, Neural Networks, Slovak Language

3 Text Document Analysis. The most common approaches: Statistical – analyses words in text documents, comparing them with keywords. Linguistic – extracts linguistic units from text (phoneme, morpheme, lexeme, ...). Knowledge-based – uses domain models of documents described by an ontology. Porter algorithm for English.

4 The Slovak language is more complicated: Inflection of the Slovak language – grammatical forms of nouns, adjectives, pronouns, ... Complicated word conjugation and declension, prefixes and suffixes, ... Synonyms and homonyms. Phrases containing more than one word. And so on.

5 System for Information Retrieval in STD (Furdík, K.: Inf. Retrieval in Nat. Language by Hypertext Structure, 2003). [Figure: system diagram with blocks User, Indexation, Document, Administrator]

6 How to continue: Utilization of Neural Networks. A well-trained NN is able to simplify the Slovak text analysis, is invariant to Slovak word inflection, and performs linguistic analysis faster. Disadvantage: problems with learning and the static structure of the NN.

7 The System for Information Retrieval can be simplified

8 That is, a 3-layer information retrieval system. Most simplified structure of the system: Queries, Keywords, Documents.

9 Next solution: representation of the query, keyword and document layers by neural networks

10 Development of the 1st NN for Keyword Determination: the 1st NN is a feed-forward NN of the back-propagation type

11 Development of the 2nd NN for Document Determination – Vector Space Model. $K$ ($m \times n$) – vector space matrix; $k_{kd}$ – frequency of keywords in documents; $k$ – number of keywords; $d$ – number of documents.
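The construction of the vector space matrix can be illustrated with a minimal sketch. The following assumes that rows of K correspond to keywords and columns to documents, and that a keyword is counted by exact, case-insensitive token match; the function name `build_vector_space_matrix` and the sample data are hypothetical and not taken from the presentation. Exact matching ignores Slovak inflection, which is exactly the problem the slides assign to the 1st NN.

```python
def build_vector_space_matrix(keywords, documents):
    """Return K as a list of rows: K[i][j] = count of keywords[i] in documents[j].

    Assumption: keywords-by-documents orientation, exact token matching.
    """
    matrix = []
    for keyword in keywords:
        row = []
        for text in documents:
            tokens = text.lower().split()
            row.append(tokens.count(keyword.lower()))
        matrix.append(row)
    return matrix


# Hypothetical toy data (not from the presentation).
keywords = ["neural", "network", "retrieval"]
documents = [
    "neural network for retrieval",
    "retrieval of text documents",
    "neural neural network",
]

K = build_vector_space_matrix(keywords, documents)
print(K)  # [[1, 0, 2], [1, 0, 1], [1, 1, 0]]
```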

12 NN with Spreading Activation Function Determination of Documents

13 NN with Spreading Activation Function: the SAF NN is not trained. Its weights are set by the equation $W = K$.
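Because the weights are set directly as W = K, the document layer can be sketched as a single weighted sum with no training step. The sketch below assumes a linear activation, a keywords-by-documents weight matrix, and a keyword activation vector produced by the 1st NN; the name `spread_activation` and the toy values are hypothetical.

```python
def spread_activation(weights, keyword_activations):
    """Propagate keyword activations to documents: out[j] = sum_i W[i][j] * a[i]."""
    n_keywords = len(weights)
    n_documents = len(weights[0])
    return [
        sum(weights[i][j] * keyword_activations[i] for i in range(n_keywords))
        for j in range(n_documents)
    ]


# W = K: weights are copied from the vector space matrix (toy values).
W = [
    [1, 0, 2],  # keyword 0 in documents 0..2
    [1, 0, 1],  # keyword 1
    [1, 1, 0],  # keyword 2
]
a = [1, 0, 0]  # only keyword 0 was activated by the query

doc_act = spread_activation(W, a)
# Rank documents by activation, highest first.
ranking = sorted(range(len(doc_act)), key=lambda j: doc_act[j], reverse=True)
print(doc_act, ranking)  # [1, 0, 2] [2, 0, 1]
```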

14 Experiments: Model of the cascade NN in Matlab. Query layer – 12 characters. Keyword layer – 20 keywords. Document layer – 90 documents. Each document – approx. 50 words. QTrS – 164 queries of the training set. KwTrS – 20 keywords of the training set. The 2nd NN is not trained.

15 Experiments. 1st experiment: QTsS1 – 185 queries; the queries from QTsS1 belong to keywords from KwTrS; precision 0,996. 2nd experiment: QTsS1 – 100 queries; the queries from QTsS1 belong to no keywords from KwTrS; precision 0,97.

16 Disadvantage of the VS Model Approach: large dimension of the VS matrix. Next approach: dimension reduction of the VS matrix by the Latent Semantic Model.

17 Latent Semantic Model: Singular Value Decomposition of the vector space matrix $K$: $K = U S V^T$. $U$ – left singular vectors (eigenvectors of $K K^T$); $V$ – right singular vectors (eigenvectors of $K^T K$); $S$ – diagonal matrix of the singular values of $K$ (square roots of the eigenvalues of $K K^T$); dim($S$) < dim($K$).

18 VS Matrix Dimension Reduction – Truncated SVD: dim($S_r$) < dim($S$); $k$ – number of singular values $s_i$; $r < k$; $r$ – number of $s_i$ after dimension reduction. The number of elements of the reduced matrices is lower than the number of elements in the matrix $K$.
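A truncated SVD of this kind can be sketched with NumPy (the presentation itself used Matlab, so this is an illustration rather than the authors' code). The helper name `truncated_svd` and the 3 x 3 toy matrix are assumptions; `np.linalg.svd` returns the singular values sorted in descending order, so keeping the first r of them keeps the largest ones.

```python
import numpy as np


def truncated_svd(K, r):
    """Keep the r largest singular values of K and rebuild the reduced matrix K_r."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]
    K_r = U_r @ np.diag(s_r) @ Vt_r
    return U_r, s_r, Vt_r, K_r


# Hypothetical toy vector space matrix (keywords x documents).
K = np.array([
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

U_r, s_r, Vt_r, K_r = truncated_svd(K, r=2)
K_full = truncated_svd(K, r=3)[3]  # keeping all singular values recovers K

print(K_r.shape, np.linalg.matrix_rank(K_r))
```

K_r has the same shape as K but rank at most r; the storage saving comes from keeping the factors U_r, s_r and Vt_r instead of the full matrix.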

19 Solution of VSM Dimension Reduction: Document relevance $D$ is defined by $D = Q \times K$, where $Q$ is the set of queries and $K$ is the VS matrix. Reduced document relevance $D_r$ is defined by $D_r = Q \times K_r$, where $K_r = U S_r V^T$ is the reduced VS matrix.
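The two relevance definitions can be compared on a toy example. This sketch assumes that Q holds one query per row expressed over the keywords, and that K is oriented keywords x documents so that the product Q x K is defined; the matrices and the choice r = 2 are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 keywords x 3 documents, 2 queries x 3 keywords.
K = np.array([
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
Q = np.array([
    [1.0, 0.0, 0.0],  # query 0 uses keyword 0
    [0.0, 0.0, 1.0],  # query 1 uses keyword 2
])

# Full relevance: D = Q x K (queries x documents).
D = Q @ K

# Reduced relevance: D_r = Q x K_r with K_r = U S_r V^T.
U, s, Vt = np.linalg.svd(K, full_matrices=False)
r = 2
K_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
D_r = Q @ K_r

# Largest deviation introduced by the reduction.
max_error = np.abs(D - D_r).max()
print(D)
print(max_error)
```

For these one-hot queries the deviation is bounded by the first discarded singular value, which gives a rough sense of how much relevance information is lost for a given r.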

20 Experiments: Collection of 90 documents with 20 keywords – vector space matrix. Dimension reduction by truncated singular value decomposition. For each chosen number of singular values, computation of the precision, the recall, and the absolute and relative number of elements $k_{il}$.

21 Evaluation of Experiments – precision, recall, number of elements of the reduced VS matrix. $R$ – recall: $R = n_{retrel} / n_{rel}$, where $n_{retrel}$ is the number of retrieved relevant documents and $n_{rel}$ is the number of relevant documents. $P$ – precision: $P = n_{retrel} / n_{ret}$, where $n_{ret}$ is the number of retrieved documents.
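The two measures above can be computed from sets of document identifiers. This is a minimal sketch; the function name `precision_recall` and the example identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something stated in the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (P, R) with P = n_retrel / n_ret and R = n_retrel / n_rel."""
    retrieved, relevant = set(retrieved), set(relevant)
    n_retrel = len(retrieved & relevant)  # retrieved AND relevant
    precision = n_retrel / len(retrieved) if retrieved else 0.0
    recall = n_retrel / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p)  # 0.5
```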

22 Results

s_i    Precision    Recall    Elements (absolute)    Elements (relative)
1      0,7942       0,24      110                    0,632
2      0,95         0,314     121                    0,695
3      0,95         0,405     137                    0,787
5      0,975        0,512     148                    0,850
7      0,977        0,634     161                    0,925
10     1,0          0,754     165                    0,948
15     1,0          0,95      173                    0,994
20     1,0          1,0       174                    1,0

23 Conclusion: follows from the table

