
1 An Online Information Retrieval System by means of Artificial Neural Networks
Marta E. Zorrilla, José L. Crespo and Eduardo Mora
Department of Applied Mathematics and Computer Science, University of Cantabria
EUROCAST’01

2 Introduction I
What are 'Information Retrieval Systems' about? Two basic search modes:
- structured field search
- full-text search

3 General process
[Diagram: documents feed an indexing and storing stage that builds the indexes; a search interface queries those indexes, performs relevance classification and handles the transfer of the retrieved documents back to the user.]

4 Indexing and storing
[Diagram: original documents undergo text extraction to produce 'pure text' files; filtering with stopwords, stemming and a thesaurus yields the list of terms and the files to index; indexation and storing produce the indexes and the documents database, with links back to the original documents.]
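
A minimal Python sketch of this indexing pipeline, purely illustrative: the tiny stopword list, the naive suffix-stripping stemmer and the sample sentence are hypothetical placeholders, not part of the original system.

```python
import re

# Hypothetical, tiny Spanish stopword list; a real system would use a full one.
STOPWORDS = {"el", "la", "los", "las", "de", "del", "y", "o", "en", "que", "una"}

def extract_text(raw: str) -> str:
    """Text extraction: keep only letters and spaces, lower-cased ('pure text')."""
    return re.sub(r"[^a-záéíóúüñ\s]", " ", raw.lower())

def naive_stem(word: str) -> str:
    """Very rough stand-in for a Spanish stemmer: strip a few common suffixes."""
    for suffix in ("ciones", "cion", "mente", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def terms_of(document: str) -> list[str]:
    """Filtering: tokenise, drop stopwords, stem -> list of terms to index."""
    tokens = extract_text(document).split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

# Example: one hypothetical sentence reduced to its list of terms.
print(terms_of("La propiedad es el derecho de gozar y disponer de una cosa"))
```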

5 Classification of Information Retrieval Systems
[Diagram: systems based on a pre-established dictionary use inverse indexes, built on words or on n-grams; systems based on a free dictionary use a vectorial representation, exploited either by clustering (with statistics or a self-organising ANN) or by Latent Semantic Indexing.]

6 Inverse index
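
Since the slide only names the structure, here is a minimal sketch of an inverse (inverted) index in Python; the toy documents below are invented for illustration.

```python
from collections import defaultdict

def build_inverse_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document identifiers that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical toy collection.
docs = {
    "art1": "la propiedad es el derecho de gozar y disponer",
    "art2": "el contrato existe desde que una persona consiente",
}
index = build_inverse_index(docs)
print(index["el"])          # {'art1', 'art2'}
print(index["contrato"])    # {'art2'}
```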

7 Self-organising ANN
[Diagram: Kohonen's topological map and Fritzke's growing topological maps; each input vector is mapped to its best-matching unit (BMU), and the BMU's neighbourhood, defined by a radius that shrinks during training, is updated as well.]
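
A minimal sketch of one Kohonen training step (best-matching unit plus neighbourhood update); the grid size, learning rate and radius are arbitrary choices, and Fritzke's growing variant is not shown.

```python
import numpy as np

def som_step(weights: np.ndarray, x: np.ndarray, lr: float = 0.1, radius: float = 1.0) -> np.ndarray:
    """One Kohonen update: find the BMU, then pull it and its neighbours towards x.

    weights has shape (rows, cols, dim); x has shape (dim,).
    """
    rows, cols, _ = weights.shape
    # Best-matching unit: node whose weight vector is closest to the input.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    # Gaussian neighbourhood around the BMU on the grid.
    grid_r, grid_c = np.indices((rows, cols))
    grid_dist2 = (grid_r - bmu[0]) ** 2 + (grid_c - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    # Move every node towards x, weighted by its neighbourhood factor.
    return weights + lr * h[:, :, None] * (x - weights)

rng = np.random.default_rng(0)
w = rng.random((5, 5, 3))            # 5x5 map of 3-dimensional weight vectors
w = som_step(w, np.array([0.2, 0.8, 0.5]))
```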

8 Clustering statistics
[Figure: dendrogram of inter-document distances (roughly 0.14 to 0.99); cutting the tree at a chosen distance yields 7 clusters.]
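
A hedged sketch of how such a dendrogram and cut can be produced with SciPy's hierarchical clustering; the random document vectors are invented, and only the choice of 7 clusters mirrors the slide.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
doc_vectors = rng.random((20, 5))     # 20 hypothetical documents, 5 features each

# Agglomerative clustering: the linkage matrix records the merge distances
# that a dendrogram plots on its vertical axis.
Z = linkage(doc_vectors, method="average", metric="euclidean")

# Cut the tree so that exactly 7 clusters remain, as on the slide.
labels = fcluster(Z, t=7, criterion="maxclust")
print(labels)
```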

9 LSI
The m x n term-document matrix A (m terms, n documents) is factorised by Singular Value Decomposition, A = U Σ V^T, with U (m x r) holding the term vectors, Σ (r x r) the singular values and V^T (r x n) the document vectors. Keeping only the k largest singular values gives the rank-k approximation A_k = U_k Σ_k V_k^T, the reduced space into which queries, new documents and new terms are projected.
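
A minimal numpy sketch of the truncated SVD behind LSI; the tiny term-document matrix and k = 2 are invented for illustration.

```python
import numpy as np

# Hypothetical term-document matrix A (m terms x n documents).
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 2.0, 0.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                  # number of singular values kept
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]
Ak = Uk @ Sk @ Vtk                     # rank-k approximation of A

# Fold a query (a vector of term counts) into the reduced document space.
q = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
q_reduced = np.linalg.inv(Sk) @ Uk.T @ q
print(Ak.round(2))
print(q_reduced.round(3))
```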

10 ANN for classification
- Competitive networks (e.g., self-organising): a single processor in the output layer gives a non-null response.
- Radial basis networks: a continuous response, generally in a single layer.
- Multilayer perceptrons: similar to radial basis networks, except for the activation function and the operations made at the connections.

11 Proposal
An Information Retrieval System built as a neural network: each dictionary word w1, w2, w3, w4, ..., wn enters the input layer in binary representation, and each document doc1, doc2, doc3, doc4, ..., docn is represented by a processor in the output layer.
Dictionary: COES, a Spanish dictionary developed by Santiago Rodriguez and Jesús Carretero.
Documents: articles of the Spanish Civil Code.
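
A sketch of the proposed input/output coding under one reading of the slide: each dictionary word enters the network as the binary representation of its index, and the target is a one-hot pattern over the documents. The dictionary and document identifiers below are invented placeholders.

```python
import numpy as np

dictionary = ["propiedad", "contrato", "herencia", "usufructo"]   # hypothetical
documents = ["art348", "art1254", "art657"]                       # hypothetical

n_bits = max(1, int(np.ceil(np.log2(len(dictionary)))))

def encode_word(word: str) -> np.ndarray:
    """Input pattern: the word's dictionary index written in binary."""
    idx = dictionary.index(word)
    return np.array([(idx >> b) & 1 for b in range(n_bits)], dtype=float)

def encode_document(doc: str) -> np.ndarray:
    """Target pattern: one output processor per document (one-hot)."""
    out = np.zeros(len(documents))
    out[documents.index(doc)] = 1.0
    return out

print(encode_word("herencia"))      # index 2 -> bits, least-significant first: [0. 1.]
print(encode_document("art1254"))   # [0. 1. 0.]
```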

12 Test and Results I
Network: Radial Basis Functions (RBF).
Error function: mean squared error, entropy.
Nº of documents: 93. Nº of words in dictionary: 140.
Results:
- The error function tends towards false minima (the gradient becomes essentially zero).
- The network needs a processor for each word in the dictionary, i.e., the network is not compact.
Conclusion: the ordinary RBF approach is not appropriate; a change of approach or a change of network is necessary. We therefore turn to another network: the MLP.
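
A minimal sketch of the RBF forward pass evaluated on this slide, with Gaussian basis functions; the number of centres, the width and the random data are illustrative assumptions, not the original configuration.

```python
import numpy as np

def rbf_forward(X, centres, widths, W):
    """Radial basis network: hidden layer of Gaussian responses, linear output layer.

    X: (n_samples, n_inputs), centres: (n_hidden, n_inputs),
    widths: (n_hidden,), W: (n_hidden, n_outputs).
    """
    # Squared distance of every sample to every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * widths[None, :] ** 2))    # Gaussian activations
    return H @ W                                       # linear combination at the output

rng = np.random.default_rng(2)
X = rng.random((5, 8))            # 5 samples with 8 input features
centres = rng.random((6, 8))      # 6 hidden units
widths = np.full(6, 0.5)
W = rng.standard_normal((6, 3))   # 3 output documents
print(rbf_forward(X, centres, widths, W).shape)   # (5, 3)
```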

13 Test and Results II
Network: Multilayer Perceptron with tanh activation function.
Error function: mean squared error, entropy.
Nº of documents: 10. Nº of words in dictionary: 14.
Architectures: 10x5x10; 10x7x10; 10x10x10.
Optimisation methods: Conjugate Gradient, Quasi-Newton with linear and parabolic minimisation.
Results (with existing programs):
- A 10x5x10 architecture can learn the training set.
- The optimisation method can be of decisive importance.
- The same method, in different programs, gives different results.
- The error function does not make much of a difference.
Conclusion: in order to gain insight into the optimisation process, we programmed the network ourselves.
Results (with our own implementation):
- A 10x5x10 architecture almost learns the training set; a 10x10x10 architecture learns it perfectly.
- Quasi-Newton with parabolic minimisation is the most efficient method.
- Mean squared error gives better results than entropy.
- Sorting the training set by number of occurrences, or scaling the output between 0 and 1, does not improve the results.
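
A hedged sketch of training a 10x5x10 tanh MLP on mean squared error with a quasi-Newton optimiser; SciPy's BFGS is used here as a stand-in for the quasi-Newton method on the slide, and the random training patterns are invented.

```python
import numpy as np
from scipy.optimize import minimize

n_in, n_hid, n_out = 10, 5, 10                      # the 10x5x10 architecture
rng = np.random.default_rng(3)
X = rng.integers(0, 2, (10, n_in)).astype(float)    # 10 hypothetical binary input patterns
T = np.eye(n_out)                                   # one-hot document targets

def unpack(p):
    """Split the flat parameter vector into the two weight matrices (with bias rows)."""
    k = (n_in + 1) * n_hid
    W1 = p[:k].reshape(n_in + 1, n_hid)
    W2 = p[k:].reshape(n_hid + 1, n_out)
    return W1, W2

def forward(p, X):
    W1, W2 = unpack(p)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias input
    H = np.tanh(Xb @ W1)                            # tanh hidden layer
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    return Hb @ W2                                  # linear output layer

def mse(p):
    return np.mean((forward(p, X) - T) ** 2)        # mean squared error

p0 = 0.1 * rng.standard_normal((n_in + 1) * n_hid + (n_hid + 1) * n_out)
res = minimize(mse, p0, method="BFGS")              # quasi-Newton stand-in
print("final MSE:", res.fun)
```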

14 Test and Results III
Future work: grow the output layer as more documents are added; it will be necessary to increase the number of hidden neurons when the error becomes high.

15 Conclusions
- What an Information Retrieval System is and how it works.
- Classification of Information Retrieval Systems.
- Proposal: a neural network in which each output-layer processor represents a document and the input layer receives words in binary representation.
- Results: promising performance of the MLP on a toy problem.

