Major Web Intelligence Tools

Presentation transcript:

1 © 2005 Major Web Intelligence Tools

2 Web Intelligence Tools
I. Collection
–Offline Explorer
–SpidersRUs (AI Lab)
–Google Scholar
II. Analysis (Data and Text Mining)
–Google APIs
–Google Translation
–GATE
–Arizona Noun Phraser (AI Lab)
–Self-Organizing Map, SOM (AI Lab)
–Weka
III. Visualization
–NetDraw
–JUNG
–Analyst’s Notebook and Starlight

3 Collection: Offline Explorer
Developed by MetaProducts Corporation, Offline Explorer can download Web sites to your hard disk for offline browsing.
Advantages of Offline Explorer:
–Save time: download up to 500 files simultaneously.
–Save yesterday's Web sites for tomorrow's use.
–Monitor Web sites.
–Mine your data: the TextPipe tool in the Offline Explorer Pro edition can extract or change the desired data, or even export it to a database.

4 Offline Explorer
Screenshot callouts:
–Project list
–Project properties setup window
–File filters, URL filters, and other advanced properties
–Download URLs
–Download level
–File modification check

5 SpidersRUs
The SpidersRUs Digital Library Toolkit was developed by the Artificial Intelligence Lab at the University of Arizona.
–It provides modular tools for spidering, indexing, and searching, so that digital libraries in different languages can be built in a simple DIY (Do-It-Yourself) way.
–Users can create their own search engines easily and quickly via the friendly user interface.
–SpidersRUs can automate the development of vertical search engines in different domains and languages. It works with non-English languages such as Asian and Middle Eastern languages.
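The spidering, indexing, and searching stages named above can be sketched in a few lines of Python. This is only an illustration of the three-stage idea, not SpidersRUs code: the in-memory `PAGES` table, the example.org URLs, and the function names are all assumptions made for the example, and no real network access takes place.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://example.org/a": ("digital library toolkit",
                             ["http://example.org/b"]),
    "http://example.org/b": ("vertical search engine toolkit",
                             ["http://example.org/a", "http://example.org/c"]),
    "http://example.org/c": ("library search", []),
}


def spider(seed, pages, max_pages=100):
    """Stage 1: breadth-first traversal from a seed URL; returns {url: text}."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        if url not in pages:          # unknown or unreachable page: skip it
            continue
        text, links = pages[url]
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


def build_index(docs):
    """Stage 2: inverted index mapping each lowercased term to a set of URLs."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(url)
    return index


def search(index, query):
    """Stage 3: return the URLs that contain every query term (AND search)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


docs = spider("http://example.org/a", PAGES)
index = build_index(docs)
print(search(index, "library toolkit"))
```

The whitespace-and-punctuation tokenization used here is a simplification; languages written without spaces between words, such as Chinese, would need a language-specific segmenter at the indexing stage.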

6 SpidersRUs
An example of a Chinese search engine built by SpidersRUs
Screenshot callouts:
–Keyword search
–Search results

7 Google Scholar
Google Scholar provides a simple way to broadly search for scholarly literature.
Features of Google Scholar:
–Search diverse sources from one convenient place
–Find papers, abstracts and citations
–Locate the complete paper through your library or on the web
–Learn about key papers and scholars in any area of research

8 Google Scholar
Search for “Bioterrorism” in Google Scholar
Screenshot callouts:
–List of papers citing this paper
–366 citations

9 Analysis: Google APIs
Google provides many APIs to help you quickly develop your own applications.
Examples of Google APIs:
–Google API for Inlink: Discovers which pages link to your website.
–Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.
–Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
–Google Analytics: Allows users to gather, view, and analyze data about their Website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
–Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
–YouTube Data API: Integrates online videos from YouTube into your applications.

10 Example: Google API for Inlink
–Input “link URL” and search
–Results: all the related inlink Web pages

11 Google Translation
With Google's Translate function, the input and output languages can be Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian or Spanish.
Major functions of Google Translation include:
–Search multilingual Web pages: Search the Internet in one language and get the results in another.
–Translate text: Translate free text into multiple languages.
–Translate a Web page: Translate a Web page into multiple languages.

12 Google Translation
Screenshot callouts:
–Translate text from Arabic to English
–Search multilingual Web pages
–Translate a Web page

13 GATE
General Architecture for Text Engineering (GATE) is a toolkit for text mining. It was developed by the NLP group at the University of Sheffield (UK).
Information Extraction tasks:
–Named Entity Recognition (NE): Finds names, places, dates, etc.
–Co-reference Resolution (CO): Identifies identity relations between entities in texts.
–Template Element Construction (TE): Adds descriptive information to NE results (using CO).
–Template Relation Construction (TR): Finds relations between TE entities.
–Scenario Template Production (ST): Fits TE and TR results into specified event scenarios.
GATE also includes:
–Parsers, stemmers, and Information Retrieval tools;
–Tools for visualizing and manipulating ontologies; and
–Evaluation and benchmarking tools.

14 GATE
* Picture is from
Screenshot callouts:
–Project information
–Results display
–Attributes

15 Arizona Noun Phraser
The Arizona Noun Phraser was developed by the Artificial Intelligence Lab at the University of Arizona. It is made up of three major components: a tokenizer, a part-of-speech tagger, and a phrase generation tool. It generates precise topic descriptions.
–Tokenizer: Separates punctuation and symbols from text without affecting content.
–Part of Speech (POS) Tagger: Uses both lexical and contextual disambiguation in POS assignment. Lexicons include the Brown Corpus, the Wall Street Journal, and the Specialist Lexicon.
–Phrase Generation: Uses a simple Finite State Automaton (FSA) of noun phrasing rules to break sentences and clauses into grammatically correct noun phrases.
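The tokenizer, tagger, and phrase generator pipeline described above can be illustrated with a toy Python sketch. It is not the Arizona Noun Phraser: the tiny `LEXICON`, the "unknown words are nouns" default, and the single phrase rule (zero or more adjectives followed by one or more nouns) are assumptions chosen for the example, and the tagger here does lexical lookup only, with no contextual disambiguation.

```python
import re

# Tiny hypothetical lexicon; any word not listed is assumed to be a noun (NN).
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "of": "IN", "in": "IN", "on": "IN",
    "is": "VB", "uses": "VB", "generates": "VB",
    "precise": "JJ", "simple": "JJ", "finite": "JJ",
}


def tokenize(text):
    """Separate words from punctuation and symbols without dropping anything."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Lexical-only POS assignment: dictionary lookup with a noun default."""
    tagged = []
    for tok in tokens:
        if re.match(r"\w", tok):
            tagged.append((tok, LEXICON.get(tok.lower(), "NN")))
        else:
            tagged.append((tok, "PUNCT"))
    return tagged


def noun_phrases(tagged):
    """Finite-state pass accepting the pattern JJ* NN+ (assumed rule)."""
    phrases, current, state = [], [], "START"

    def flush():
        nonlocal current, state
        if state == "NOUN":               # accept only if a noun was seen
            phrases.append(" ".join(current))
        current, state = [], "START"

    for word, pos in tagged:
        if pos == "JJ":
            if state == "NOUN":           # adjective after a noun starts a new phrase
                flush()
            current.append(word)
            state = "ADJ"
        elif pos == "NN":
            current.append(word)
            state = "NOUN"
        else:                             # any other tag ends the current phrase
            flush()
    flush()
    return phrases


sentence = "The tagger uses simple finite state automata."
print(noun_phrases(tag(tokenize(sentence))))
```

Determiners, verbs, prepositions, and punctuation all act as phrase boundaries in this sketch, so a real noun phraser with richer rules would produce longer and more varied phrases.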

16 Arizona Noun Phraser

17 SOM
The multi-level self-organizing map (SOM) neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona.
–On a 2D map display, similar topics are positioned closer together according to their co-occurrence patterns; more important topics occupy larger regions.
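For readers unfamiliar with how a self-organizing map places similar items near each other, here is a minimal single-level Kohonen SOM in plain Python. It is a sketch of the basic algorithm only, not the AI Lab's multi-level variant; the grid size, the linear decay schedules, the Gaussian neighborhood, and the toy input vectors are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid cell whose weight vector is closest to vec."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], vec)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for vec in data:
            bmu = best_matching_unit(weights, vec)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull the cell toward the input, scaled by grid proximity to the BMU.
                for i in range(dim):
                    w[i] += lr * influence * (vec[i] - w[i])
    return weights


# Two toy "topics", each represented by two similar term-weight vectors.
data = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
        [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
som = train_som(data, rows=3, cols=3)
for vec in data:
    print(vec, "->", best_matching_unit(som, vec))
```

After training, each input is assigned to the grid cell of its best matching unit, and inputs with similar vectors tend to land in the same or neighboring cells, which is the property the 2D topic map relies on.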

18 SOM
Example: FMD Paper Content Map (2001-2005)
Screenshot callouts:
–Different topics
–Topic region
–Topic
–# of documents belonging to this topic
–Warm colors represent new topics.
Developed by the AI Lab at the University of Arizona

19 Weka
Weka was developed at the University of Waikato in New Zealand.
Tools include:
–Data preprocessing (e.g., Data Filters),
–Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM),
–Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression),
–Clustering (e.g., Simple K-means, Expectation Maximization (EM), Farthest First),
–Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided),
–Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and
–Visualization (e.g., View different two-dimensional plots of the data).
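As one concrete example of the feature selection measures listed above, information gain can be computed by hand in a few lines of Python. This is a sketch of the metric itself, not of Weka's Java API; the toy weather-style records and the function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data set: "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
for attr in ("outlook", "windy"):
    print(attr, round(information_gain(rows, attr, "play"), 3))
```

An attribute that splits the data into pure groups receives the maximum gain, while one that leaves the class mix unchanged receives zero, so ranking attributes by this score is a simple way to pick informative features.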

20 Weka
Screenshot callouts:
–Different analysis tools
–Different attributes to choose
–The value set of the chosen attribute and the # of input items with each value

21 Visualization: NetDraw
NetDraw is an open source program written by Steve Borgatti of Analytic Technologies for visualizing both 1-mode and 2-mode social network data.
–It can handle multiple relations at the same time, and can use node attributes to set the colors, shapes, and sizes of nodes.
–Pictures can be saved in metafile, jpg, gif and bitmap formats.
–Two basic kinds of layouts are implemented: a circle, and an MDS/spring embedding based on geodesic distance.
–You can also rotate, flip, shift, resize and zoom configurations.
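The two ingredients behind the layouts mentioned above, placing nodes on a circle and measuring geodesic (shortest-path) distance between nodes, can be sketched in Python as follows. This is not NetDraw's implementation; the function names and the four-person toy network are assumptions, and the MDS/spring embedding step that would consume the distances is deliberately left out.

```python
import math
from collections import deque


def circle_layout(nodes, radius=1.0):
    """Place nodes at equal angular steps on a circle; returns {node: (x, y)}."""
    n = len(nodes)
    return {node: (radius * math.cos(2 * math.pi * i / n),
                   radius * math.sin(2 * math.pi * i / n))
            for i, node in enumerate(nodes)}


def geodesic_distances(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical undirected chain of four people: A - B - C - D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(circle_layout(list(adjacency)))
print(geodesic_distances(adjacency, "A"))
```

A spring or MDS embedding would then try to position nodes so that their on-screen distances approximate these hop counts, which is why closely connected individuals end up drawn near each other.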

22 NetDraw
Screenshot callouts:
–Display setup of the nodes and relations
–Different functions
–The networks: nodes representing the individuals and links representing the relations

23 JUNG
The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed by the School of Information and Computer Science at the University of California, Irvine.
The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:
–Clustering
–Decomposition
–Optimization
–Random Graph Generation
–Statistical Analysis
–Calculation of Network Distances and Flows and Importance Measures (Centrality, PageRank, HITS, etc.)
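To make one of the importance measures listed above concrete, here is a small PageRank computed by power iteration in Python. It illustrates the measure only and does not use or mirror JUNG's Java classes; the damping factor of 0.85, the handling of dangling nodes, and the four-node toy graph are assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power iteration over a directed graph given as {node: [successors]}."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {node: (1 - damping) / n + damping * dangling / n
               for node in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[x] - rank[x]) for x in nodes)
        rank = new
        if delta < tol:               # stop early once the scores settle
            break
    return rank


# Hypothetical link graph: C is pointed to by A, B, and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for node, score in sorted(pagerank(graph).items()):
    print(node, round(score, 4))
```

In this toy graph the node with the most incoming links ends up with the highest score and the node with none ends up with the lowest, which matches the intuition behind link-based importance measures.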

24 JUNG
Examples of visualization types
* Pictures are from

25 Analyst’s Notebook & Starlight
–Analyst’s Notebook, by i2: A 2D graph and timeline layout tool for crime and intelligence analysis
–Starlight, by Pacific Northwest Lab (PNL): A 3D network visualization and navigation tool for intelligence analysis

26 Analyst’s Notebook, i2; Starlight, PNL