1 Mining massive document collections by the WEBSOM method
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
Presenter: Yu-hui Huang
Authors: Krista Lagus, Samuel Kaski*, Teuvo Kohonen
Information Sciences, 2004
國立雲林科技大學 National Yunlin University of Science and Technology

2 Outline
- Motivation
- Objective
- Methodology
- Experimental
- Conclusion
- Personal Comments

3 Motivation
Browsing an encyclopaedia or a digital library would be much easier if the items could be preordered according to their contents. The main problem with multidimensional scaling (MDS) methods is that all the items must be known before the mapping is computed. The computation is also heavy, and even impossible, for any sizable collection of items.

4 Objective
A search can start from the documents that best match the search expression. Further relevant results can then be found on the basis of the pointers stored at the same or neighboring map units.
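The search idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `search_map`, `models`, and `pointers` are hypothetical, the map is a tiny hand-made grid, and the best match is found by squared Euclidean distance, which may differ from the matching used in the paper.

```python
def search_map(query, models, pointers, radius=1):
    """Return the best-matching map unit and the documents stored at it
    and at its grid neighbors.

    models:   dict mapping (row, col) -> model vector (hypothetical layout)
    pointers: dict mapping (row, col) -> list of document ids
    """
    # Best-matching unit: the model vector closest to the query.
    best = min(models,
               key=lambda rc: sum((q - m) ** 2 for q, m in zip(query, models[rc])))
    r0, c0 = best
    hits = list(pointers.get(best, []))  # documents at the best unit come first
    # Then collect documents from the neighboring units on the grid.
    for r in range(r0 - radius, r0 + radius + 1):
        for c in range(c0 - radius, c0 + radius + 1):
            if (r, c) != best:
                hits.extend(pointers.get((r, c), []))
    return best, hits


# Toy 1x3 map with one document per unit (made-up data).
models = {(0, 0): [1.0, 0.0], (0, 1): [0.5, 0.5], (0, 2): [0.0, 1.0]}
pointers = {(0, 0): ["doc_a"], (0, 1): ["doc_b"], (0, 2): ["doc_c"]}
best_unit, results = search_map([0.9, 0.1], models, pointers)
```

Here the query lands on unit `(0, 0)`, and the document from the adjacent unit is returned as a further candidate.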

5 Methodology
The Batch Map version of the SOM
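The slide's formula did not survive in the transcript. As a hedged sketch of the general Batch Map idea, each model vector can be replaced by the neighborhood-weighted mean of all inputs, with the weight depending on the grid distance to each input's winning unit. The 1-D chain of units, the Gaussian neighborhood, and the function names below are assumptions made for illustration, not details taken from the slide.

```python
import math


def best_matching_unit(x, models):
    """Index of the model vector closest to x (squared Euclidean distance)."""
    return min(range(len(models)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, models[i])))


def batch_map_step(data, models, sigma):
    """One Batch Map iteration on a 1-D chain of map units.

    Every model becomes the neighborhood-weighted mean of all inputs;
    the weight depends on the grid distance between the unit and the
    winner of each input.
    """
    dim = len(models[0])
    winners = [best_matching_unit(x, models) for x in data]
    new_models = []
    for i in range(len(models)):
        num = [0.0] * dim
        den = 0.0
        for x, c in zip(data, winners):
            # Gaussian neighborhood on the grid (an assumed choice).
            h = math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
            den += h
            for d in range(dim):
                num[d] += h * x[d]
        new_models.append([v / den for v in num])
    return new_models


# Toy data: two small clusters (made-up values).
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
models = [[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]]
for sigma in (1.0, 0.5, 0.1):  # shrink the neighborhood over iterations
    models = batch_map_step(data, models, sigma)
```

With a narrow final neighborhood, the two end units settle near the centers of the two clusters while the middle unit stays between them.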

6 Methodology-Encoding
Vector space method
Methods for dimensionality reduction
- Latent semantic indexing
- Random projection
- Word clustering

docs  t1  t2  t3
D1    1   0   1
D2    1   0   0
D3    0   1   1
D4    1   0   0
D5    1   1   1
D6    1   1   0
D7    0   1   0
D8    0   1   0
D9    0   0   1
D10   0   1   1
D11   1   0   1
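As a rough illustration of random projection, one of the dimensionality-reduction methods listed on this slide, the sketch below maps a few of the slide's binary document vectors into a lower-dimensional space. It assumes one random unit-length vector per term with Gaussian components; the function names, the seed, and the target dimension of 2 are hypothetical choices for the example.

```python
import math
import random

# A few rows of the slide's document-term table (terms t1, t2, t3).
docs = {
    "D1": [1, 0, 1],
    "D3": [0, 1, 1],
    "D5": [1, 1, 1],
    "D11": [1, 0, 1],
}


def random_projection_matrix(n_terms, n_dims, seed=0):
    """One random unit-length vector per term (Gaussian components, assumed)."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_terms):
        col = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    return columns


def project(doc_vector, columns):
    """Sum the random vectors of the terms present, scaled by their weights."""
    n_dims = len(columns[0])
    out = [0.0] * n_dims
    for weight, col in zip(doc_vector, columns):
        if weight:  # skip absent terms
            for d in range(n_dims):
                out[d] += weight * col[d]
    return out


columns = random_projection_matrix(n_terms=3, n_dims=2, seed=0)
projected = {name: project(vec, columns) for name, vec in docs.items()}
```

Documents with identical term vectors (D1 and D11 here) receive identical projections. In a realistic setting the vocabulary would have many thousands of terms and the target dimension would be a few hundred, so the reduction only pays off at that scale.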

7 Methodology-Encoding
Weighting of words
- IDF-based weights
- Entropy over topical document classes
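A minimal sketch of IDF-based weighting, assuming the common form log(N / df), where N is the number of documents and df is the number of documents containing the word. The exact variant used in the paper may differ, and the tiny corpus below is made up for illustration.

```python
import math


def idf_weights(documents):
    """Map each word to log(N / df) over a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = {}
    for words in documents:
        for w in set(words):  # count each word once per document
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


# Hypothetical toy corpus.
documents = [
    ["map", "patent", "search"],
    ["map", "document"],
    ["map", "search"],
]
weights = idf_weights(documents)
```

A word that occurs in every document ("map" here) gets weight zero, while rarer words get larger weights, so they contribute more to the document vector.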

8 Methodology-Fast
Rapid initialization by increasing the map size
Faster computation of the final state of the SOM
- Addressing old winners
- Initial best matching units
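One way to read "addressing old winners" is that each input remembers its previous winning unit, and the next winner search only looks in the grid neighborhood of that unit instead of scanning the whole map. The sketch below illustrates that reading; the function names, the square search window, and the toy grid are assumptions.

```python
def sq_dist(x, m):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def local_winner(x, models, old_winner, radius=1):
    """Search for the winner only among units near the stored old winner.

    models: dict mapping (row, col) -> model vector (hypothetical layout)
    """
    r0, c0 = old_winner
    candidates = [(r, c)
                  for r in range(r0 - radius, r0 + radius + 1)
                  for c in range(c0 - radius, c0 + radius + 1)
                  if (r, c) in models]
    return min(candidates, key=lambda rc: sq_dist(x, models[rc]))


# Toy 4x4 map whose model at (r, c) is simply [r, c].
models = {(r, c): [float(r), float(c)] for r in range(4) for c in range(4)}

near = local_winner([1.9, 1.2], models, old_winner=(1, 1))
far = local_winner([3.0, 3.0], models, old_winner=(0, 0))
```

The local search inspects at most 9 units here rather than all 16. The second call shows the trade-off: when the old winner is far from the true best match, the restricted search returns only the best unit inside the window, so the shortcut is suited to late training stages when winners change little.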

9 Methodology-Fast
Additional computational shortcuts
- Parallelized Batch Map algorithm
- Saving memory by reducing representation accuracy
- Utilizing the sparsity of the vectors
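To illustrate how sparsity can be exploited, the sketch below stores a document vector as a dictionary of its nonzero components and finds the winner from dot products alone. It relies on the identity ||x - m||^2 = ||x||^2 - 2 x·m + ||m||^2: since ||x||^2 is the same for every unit, comparing ||m||^2 - 2 x·m is enough, and ||m||^2 can be precomputed. The names and data are hypothetical, and the paper's actual implementation may differ.

```python
def sparse_dot(sparse_x, model):
    """Dot product that touches only the nonzero components of x."""
    return sum(value * model[i] for i, value in sparse_x.items())


def winner_sparse(sparse_x, models, model_sq_norms):
    """Winner by smallest ||m||^2 - 2 x.m, equivalent to smallest distance."""
    return min(range(len(models)),
               key=lambda i: model_sq_norms[i] - 2.0 * sparse_dot(sparse_x, models[i]))


# Toy dense model vectors and one sparse input (made-up values).
models = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
]
model_sq_norms = [sum(v * v for v in m) for m in models]  # precomputed once
sparse_x = {0: 1.0, 3: 1.0}  # nonzero entries only: index -> value

winner = winner_sparse(sparse_x, models, model_sq_norms)
```

The cost of each comparison scales with the number of nonzero entries in the document vector rather than with the full dimensionality.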

10 Methodology-Fast
Performance evaluation of the new methods
- Numerical comparison with the traditional SOM algorithm
- Comparison of the computational complexity

11 Experimental
Largest experiment: nearly 7 million patent abstracts

12 Experimental
Experiment on the Britannica collection
- Preprocessing and document encoding
- Construction of the map
- Obtaining descriptive labels for text clusters and map regions
- Exploration of the map

13 Experimental
- Exploration of the map

14 Conclusion
The WEBSOM method has been shown to be robust for organizing large and varied collections onto meaningfully ordered document maps. The developed computational speedups enable the creation of very large maps.

15 Personal Comments
Advantage
- …
Drawback
- …
Application
- Search engines
- Retrieval from large document collections such as an encyclopaedia or a digital library

