
1 An Online Information Retrieval System by means of Artificial Neural Networks
Marta E. Zorrilla, José L. Crespo and Eduardo Mora
Department of Applied Mathematics and Computer Science, University of Cantabria
EUROCAST’01

2 Introduction I
What are 'Information Retrieval Systems' about? Two basic search modes:
- structured field search
- full-text search

3 General process
[Diagram: documents feed an indexing and storing stage that builds the indexes; a search interface queries those indexes, performs relevance classification and handles the transfer of the retrieved documents back to the user.]

4 Indexing and storing
[Diagram: original documents undergo text extraction to produce 'pure text' files; filtering with stopwords, stemming and a thesaurus yields the list of terms and the files to index; indexation and storing produce the indexes and the documents database, with links back to the original documents.]
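
A minimal Python sketch of this indexing pipeline, purely illustrative: the tiny stopword list, the naive suffix-stripping stemmer and the sample sentence are hypothetical placeholders, not part of the original system.

```python
import re

# Hypothetical, tiny Spanish stopword list; a real system would use a full one.
STOPWORDS = {"el", "la", "los", "las", "de", "del", "y", "o", "en", "que", "una"}

def extract_text(raw: str) -> str:
    """Text extraction: keep only letters and spaces, lower-cased ('pure text')."""
    return re.sub(r"[^a-záéíóúüñ\s]", " ", raw.lower())

def naive_stem(word: str) -> str:
    """Very rough stand-in for a Spanish stemmer: strip a few common suffixes."""
    for suffix in ("ciones", "cion", "mente", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def terms_of(document: str) -> list[str]:
    """Filtering: tokenise, drop stopwords, stem -> list of terms to index."""
    tokens = extract_text(document).split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

# Example: one hypothetical sentence reduced to its list of terms.
print(terms_of("La propiedad es el derecho de gozar y disponer de una cosa"))
```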

5 Classification of Information Retrieval Systems
[Diagram: systems based on a pre-established dictionary use inverse indexes, built on words or on n-grams; systems based on a free dictionary use a vectorial representation, exploited either by clustering (with statistics or a self-organising ANN) or by Latent Semantic Indexing.]

6 Inverse index
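
Since the slide only names the structure, here is a minimal sketch of an inverse (inverted) index in Python; the toy documents below are invented for illustration.

```python
from collections import defaultdict

def build_inverse_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document identifiers that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical toy collection.
docs = {
    "art1": "la propiedad es el derecho de gozar y disponer",
    "art2": "el contrato existe desde que una persona consiente",
}
index = build_inverse_index(docs)
print(index["el"])          # {'art1', 'art2'}
print(index["contrato"])    # {'art2'}
```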

7 Self-organising ANN
[Diagram: Kohonen's topological map and Fritzke's growing topological maps; each input vector is mapped to its best-matching unit (BMU), and the BMU's neighbourhood, defined by a radius that shrinks during training, is updated as well.]
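
A minimal sketch of one Kohonen training step (best-matching unit plus neighbourhood update); the grid size, learning rate and radius are arbitrary choices, and Fritzke's growing variant is not shown.

```python
import numpy as np

def som_step(weights: np.ndarray, x: np.ndarray, lr: float = 0.1, radius: float = 1.0) -> np.ndarray:
    """One Kohonen update: find the BMU, then pull it and its neighbours towards x.

    weights has shape (rows, cols, dim); x has shape (dim,).
    """
    rows, cols, _ = weights.shape
    # Best-matching unit: node whose weight vector is closest to the input.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    # Gaussian neighbourhood around the BMU on the grid.
    grid_r, grid_c = np.indices((rows, cols))
    grid_dist2 = (grid_r - bmu[0]) ** 2 + (grid_c - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    # Move every node towards x, weighted by its neighbourhood factor.
    return weights + lr * h[:, :, None] * (x - weights)

rng = np.random.default_rng(0)
w = rng.random((5, 5, 3))            # 5x5 map of 3-dimensional weight vectors
w = som_step(w, np.array([0.2, 0.8, 0.5]))
```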

8 Clustering statistics
[Figure: dendrogram of inter-document distances (roughly 0.14 to 0.99); cutting the tree at a chosen distance yields 7 clusters.]
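
A hedged sketch of how such a dendrogram and cut can be produced with SciPy's hierarchical clustering; the random document vectors are invented, and only the choice of 7 clusters mirrors the slide.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
doc_vectors = rng.random((20, 5))     # 20 hypothetical documents, 5 features each

# Agglomerative clustering: the linkage matrix records the merge distances
# that a dendrogram plots on its vertical axis.
Z = linkage(doc_vectors, method="average", metric="euclidean")

# Cut the tree so that exactly 7 clusters remain, as on the slide.
labels = fcluster(Z, t=7, criterion="maxclust")
print(labels)
```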

9 LSI
The m x n term-document matrix A (m terms, n documents) is factorised by Singular Value Decomposition, A = U Σ V^T, with U (m x r) holding the term vectors, Σ (r x r) the singular values and V^T (r x n) the document vectors. Keeping only the k largest singular values gives the rank-k approximation A_k = U_k Σ_k V_k^T, the reduced space into which queries, new documents and new terms are projected.
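
A minimal numpy sketch of the truncated SVD behind LSI; the tiny term-document matrix and k = 2 are invented for illustration.

```python
import numpy as np

# Hypothetical term-document matrix A (m terms x n documents).
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 2.0, 0.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                  # number of singular values kept
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]
Ak = Uk @ Sk @ Vtk                     # rank-k approximation of A

# Fold a query (a vector of term counts) into the reduced document space.
q = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
q_reduced = np.linalg.inv(Sk) @ Uk.T @ q
print(Ak.round(2))
print(q_reduced.round(3))
```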

10 ANN for classification
- Competitive networks (e.g., self-organising): a single processor in the output layer gives a non-null response.
- Radial basis networks: a continuous response, generally in a single layer.
- Multilayer perceptrons: similar to radial basis networks, except for the activation function and the operations made at the connections.

11 Proposal
An Information Retrieval System built as a neural network: each dictionary word w1, w2, w3, w4, ..., wn enters the input layer in binary representation, and each document doc1, doc2, doc3, doc4, ..., docn is represented by a processor in the output layer.
Dictionary: COES, a Spanish dictionary developed by Santiago Rodriguez and Jesús Carretero.
Documents: articles of the Spanish Civil Code.
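
A sketch of the proposed input/output coding under one reading of the slide: each dictionary word enters the network as the binary representation of its index, and the target is a one-hot pattern over the documents. The dictionary and document identifiers below are invented placeholders.

```python
import numpy as np

dictionary = ["propiedad", "contrato", "herencia", "usufructo"]   # hypothetical
documents = ["art348", "art1254", "art657"]                       # hypothetical

n_bits = max(1, int(np.ceil(np.log2(len(dictionary)))))

def encode_word(word: str) -> np.ndarray:
    """Input pattern: the word's dictionary index written in binary."""
    idx = dictionary.index(word)
    return np.array([(idx >> b) & 1 for b in range(n_bits)], dtype=float)

def encode_document(doc: str) -> np.ndarray:
    """Target pattern: one output processor per document (one-hot)."""
    out = np.zeros(len(documents))
    out[documents.index(doc)] = 1.0
    return out

print(encode_word("herencia"))      # index 2 -> bits, least-significant first: [0. 1.]
print(encode_document("art1254"))   # [0. 1. 0.]
```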

12 Test and Results I
Network: Radial Basis Functions (RBF).
Error function: mean squared error, entropy.
Nº of documents: 93. Nº of words in dictionary: 140.
Results:
- The error function tends towards false minima (the gradient becomes essentially zero).
- The network needs a processor for each word in the dictionary, i.e., the network is not compact.
Conclusion: the ordinary RBF approach is not appropriate; a change of approach or a change of network is necessary. We therefore turn to another network: the MLP.
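
A minimal sketch of the RBF forward pass evaluated on this slide, with Gaussian basis functions; the number of centres, the width and the random data are illustrative assumptions, not the original configuration.

```python
import numpy as np

def rbf_forward(X, centres, widths, W):
    """Radial basis network: hidden layer of Gaussian responses, linear output layer.

    X: (n_samples, n_inputs), centres: (n_hidden, n_inputs),
    widths: (n_hidden,), W: (n_hidden, n_outputs).
    """
    # Squared distance of every sample to every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * widths[None, :] ** 2))    # Gaussian activations
    return H @ W                                       # linear combination at the output

rng = np.random.default_rng(2)
X = rng.random((5, 8))            # 5 samples with 8 input features
centres = rng.random((6, 8))      # 6 hidden units
widths = np.full(6, 0.5)
W = rng.standard_normal((6, 3))   # 3 output documents
print(rbf_forward(X, centres, widths, W).shape)   # (5, 3)
```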

13 Test and Results II
Network: Multilayer Perceptron with tanh activation function.
Error function: mean squared error, entropy.
Nº of documents: 10. Nº of words in dictionary: 14.
Architectures: 10x5x10; 10x7x10; 10x10x10.
Optimisation methods: Conjugate Gradient, Quasi-Newton with linear and parabolic minimisation.
Results (with existing programs):
- A 10x5x10 architecture can learn the training set.
- The optimisation method can be of decisive importance.
- The same method, in different programs, gives different results.
- The error function does not make much of a difference.
Conclusion: in order to gain insight into the optimisation process, we programmed the network ourselves.
Results (with our own implementation):
- A 10x5x10 architecture almost learns the training set; a 10x10x10 architecture learns it perfectly.
- Quasi-Newton with parabolic minimisation is the most efficient method.
- Mean squared error gives better results than entropy.
- Sorting the training set by number of occurrences, or scaling the output between 0 and 1, does not improve the results.
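
A hedged sketch of training a 10x5x10 tanh MLP on mean squared error with a quasi-Newton optimiser; SciPy's BFGS is used here as a stand-in for the quasi-Newton method on the slide, and the random training patterns are invented.

```python
import numpy as np
from scipy.optimize import minimize

n_in, n_hid, n_out = 10, 5, 10                      # the 10x5x10 architecture
rng = np.random.default_rng(3)
X = rng.integers(0, 2, (10, n_in)).astype(float)    # 10 hypothetical binary input patterns
T = np.eye(n_out)                                   # one-hot document targets

def unpack(p):
    """Split the flat parameter vector into the two weight matrices (with bias rows)."""
    k = (n_in + 1) * n_hid
    W1 = p[:k].reshape(n_in + 1, n_hid)
    W2 = p[k:].reshape(n_hid + 1, n_out)
    return W1, W2

def forward(p, X):
    W1, W2 = unpack(p)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias input
    H = np.tanh(Xb @ W1)                            # tanh hidden layer
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    return Hb @ W2                                  # linear output layer

def mse(p):
    return np.mean((forward(p, X) - T) ** 2)        # mean squared error

p0 = 0.1 * rng.standard_normal((n_in + 1) * n_hid + (n_hid + 1) * n_out)
res = minimize(mse, p0, method="BFGS")              # quasi-Newton stand-in
print("final MSE:", res.fun)
```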

14 Test and Results III
Future work: grow the output layer as more documents are added; it will be necessary to increase the number of hidden neurons when the error becomes high.

15 Conclusions
- What an Information Retrieval System is and how it works.
- Classification of Information Retrieval Systems.
- Proposal: a neural network in which each output-layer processor represents a document and the input layer receives words in binary representation.
- Results: promising performance of the MLP on a toy problem.

