Definition, Research Challenges and Major Tools


1 Web Intelligence (WI): Definition, Research Challenges and Major Tools
Yang Chen, UNC Charlotte

2 Outline
A brief history of Web Intelligence
Motivations for WI
Definition and Perspectives of WI
Research Agenda
Major Web Intelligence Tools
Conclusion

3 A Brief History of WI
1999: Collaborative research initiatives
Ning Zhong: data mining and knowledge systems
Jiming Liu: intelligent agents and multi-agent systems
Yiyu Yao: information retrieval and intelligent information systems
Their combined research efforts shared a common goal: to create a new sub-discipline covering theories and techniques related to Web information. Unlike the Web and AI, WI has a relatively short history.

4 A Brief History of WI
2000: Publication of a two-page position paper on WI (Zhong, Liu, Yao, Ohsuga, COMPSAC 2000)
The two-page position paper first introduced WI at COMPSAC 2000. It also formally announced the new Web Intelligence conference.

5 A Brief History of WI
2001: First Asia-Pacific Conference on Web Intelligence
2002: Publication of the first special issue on WI in IEEE Computer
2002: Web Intelligence Consortium
2003: First edited book on WI
2005: The international WIC Institute
WI has grown rapidly and become a hot topic in the data mining, AI, and Web communities.

6 Outline
A brief history of Web Intelligence
Motivations for WI
Definition and Perspectives of WI
Trends and Research Agenda
Major Web Intelligence Tools
Conclusion
The introduction of Web Intelligence (WI) can be motivated from both academic and industrial perspectives.

7 Motivation
The sheer size of the Web: difficulties in storage, management, and efficient and effective retrieval.
Complexity of the Web: a heterogeneous collection of structured, unstructured, semi-structured, interrelated, and distributed Web documents consisting of text, images, and sounds.
Two features of the Web, its size and its complexity, make WI useful and necessary. The Web contains a huge number of interconnected Web documents known as Web pages. To deal with the size and complexity of the Web, one needs to study the design and implementation of Web-based information systems by combining and extending results from existing intelligent information systems. Existing theories and technologies need to be modified or enhanced.

8 Motivation Web Intelligence on the Web
The Web pages returned by most search engines contain both keywords “Web” and “Intelligence”. The co-occurrences of the two keywords show their strong association.

9 Industrial Interests in WI
Sample results: Web Intelligence kis-lab.com/wi01/ Web-Intelligence Home Page Intelligence on the Web WIN: home WEB INTELLIGENCE NETWORK, smarter.net/ CatchTheWeb - Web Research, Web Intelligence Collaboration Infonoia: Web Intelligence In Your Hands
We used Google to search for the exact phrase “Web Intelligence” and obtained 3,660 hits. We found that many companies concentrate on WI to provide intelligent solutions to businesses in the new Web-based information age. In fact, the majority of the top 40 pages returned by Google are industry related.

10 Motivations
In summary:
Data production on the Web is growing at an exponential rate.
Industrial interest in WI is growing fast.
Only a few academic papers have been published on WI.
We need to narrow the gap between industry needs and academic research.

11 Outline
A brief history of Web Intelligence
Motivations for WI
Definition and Perspectives of WI
Research Agenda
Major Web Intelligence Tools
Conclusion

12 What is Web Intelligence?
Web Intelligence (WI) explores the fundamental and practical impact that advanced Information Technology (IT) and innovative Artificial Intelligence (AI) will have on the Web:
Integration of IT with AI
Applications of AI on the Web

13 Web Intelligence System
Based on Zhong's AWIC03 keynote talk

14 An Example

15 Advanced Questions
How does the customer enter the VIP portal, so that products can be targeted and promotions and marketing campaigns managed?
What is the semantic association between the pages the customer visited?
Is the visitor familiar with the Web structure, or is he or she a new or random user?
Is the visitor a Web robot or another kind of user?

16 Advanced WI System Making a dynamic recommendation to a Web user based on the user profile and usage behavior; Automatic modification of a website’s contents and organization; Combining Web usage data with marketing data to give information about how visitors used a website.
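The first capability on this slide, a recommendation driven by usage behavior, can be illustrated with a minimal Python sketch. It is not taken from the presentation: the session data, the page names, and the scoring rule (count how strongly each past session overlaps with the current visit) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical usage log: each inner list is the set of pages seen in one past session.
sessions = [
    ["home", "laptops", "bags"],
    ["home", "laptops", "mice"],
    ["home", "phones"],
    ["laptops", "mice"],
]

def recommend(past_sessions, current_pages, k=2):
    """Suggest up to k unseen pages, scored by overlap with similar past sessions."""
    seen = set(current_pages)
    scores = Counter()
    for session in past_sessions:
        visited = set(session)
        overlap = len(visited & seen)
        if overlap == 0:
            continue  # this session shares nothing with the current visit
        for page in visited - seen:
            scores[page] += overlap
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]

print(recommend(sessions, ["home", "laptops"]))  # prints ['mice', 'bags']
```

A production system would also draw on the user profile and marketing data mentioned above; this sketch uses page co-visitation only.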

17 Advanced WI System

18 Perspectives of WI
WI can be classified into four categories (based on Russell & Norvig's scheme).
The four categories vary along two dimensions, and combining the two dimensions yields the four categories. The first row focuses on human issues and leads to the treatment of WI as an empirical science. It requires that a system have the usual human capabilities, such as knowledge representation, natural language processing, reasoning, planning, and learning. The second row focuses on the normative perspective. It deals with the theoretical principles and laws that a WI system must follow, and is based on logic and mathematics.

19 Outline
A brief history of Web Intelligence
Motivations for WI
Definition and Perspectives of WI
Research Agenda
Major Web Intelligence Tools
Conclusion
Web Intelligence presents excellent opportunities and challenges for the research and development of a new generation of Web-based information processing technology.

20 Research Agenda of WI
Semantic Web mining and automatic construction of ontologies
Social network intelligence

21 The Semantic Web The Semantic Web is based on languages that make more of the semantic content of the page available in machine-readable formats for agent-based computing. A “semantic” language that ties the information on a page to machine readable semantics (ontology).

22 Components of Semantic Web
A unifying data model such as RDF.
Languages with defined semantics, built on RDF, such as OWL (DAML+OIL).
Ontologies of standardized terminology for marking up Web resources.
Tools that assist the generation and processing of semantic markup.
Ontologies provide the semantic backbone for Semantic Web applications.
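To make the "unifying data model" concrete: RDF describes resources as subject, predicate, object statements (triples). The following is a minimal Python sketch of that idea only. The `TripleStore` class, the `ex:` names, and the sample statements are hypothetical and do not come from the presentation or from any RDF library.

```python
class TripleStore:
    """A toy in-memory collection of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

store = TripleStore()
# Hypothetical markup for two Web pages, using made-up "ex:" terms.
store.add("ex:page1", "ex:author", "ex:alice")
store.add("ex:page1", "ex:topic", "ex:WebIntelligence")
store.add("ex:page2", "ex:topic", "ex:WebIntelligence")

# An agent can now ask: which resources are about Web Intelligence?
pages_on_wi = [s for s, _, _ in store.match(predicate="ex:topic", obj="ex:WebIntelligence")]
print(pages_on_wi)  # prints ['ex:page1', 'ex:page2']
```

An ontology would additionally fix the meaning of terms such as `ex:topic`, so that different applications interpret the same statements consistently.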

23 Ontologies offer
Communication: normative models, networks of relationships
Sharing & Reuse: specifications, reliability
Control: classification, and finding, sharing, and discovering relationships

24 Categories of Ontologies
A domain-specific ontology describes a well-defined technical or business domain. A task ontology might be either domain-specific or reconstructed from a set of domain-specific ontologies for meeting the requirement of a task. A universal ontology describes knowledge at higher levels. Ontologies and agent technology can play a crucial role in WI by enabling Web-based knowledge processing, sharing, and reuse between applications.

25 Research Agenda of WI
Semantic Web mining and automatic construction of ontologies
Social network intelligence

26 The Web as a Graph We can view the Web as a directed social network that connects people (organizations or social entities). Research Questions: How big is the graph? (outdegree and indegree) Can we browse from any page to any other? (clicks) Can we exploit the structure of the Web? (searching and mining) How to discover and manage the Web communities? What does the Web graph reveal about social dynamics?
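Two of the research questions above (degree counts, and whether one page can be reached from another by clicking links) can be sketched in a few lines of Python. The four-page graph and the function names are assumptions made up for illustration; breadth-first search is one standard way to count clicks, not necessarily the method the presenter had in mind.

```python
from collections import deque

# Hypothetical toy Web graph: each page maps to the pages it links to.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def out_degrees(graph):
    """Number of outgoing links per page."""
    return {page: len(links) for page, links in graph.items()}

def in_degrees(graph):
    """Number of incoming links per page."""
    counts = {page: 0 for page in graph}
    for links in graph.values():
        for target in links:
            counts[target] = counts.get(target, 0) + 1
    return counts

def clicks_between(graph, start, goal):
    """Breadth-first search: fewest link clicks from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        page, distance = queue.popleft()
        if page == goal:
            return distance
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # goal cannot be reached by following links

print(in_degrees(web))                # prints {'A': 1, 'B': 1, 'C': 3, 'D': 0}
print(clicks_between(web, "D", "B"))  # prints 3
print(clicks_between(web, "A", "D"))  # prints None
```

Page D shows why the answer to "can we browse from any page to any other?" can be no: nothing links to it, so it is unreachable from the rest of the graph.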

27 Social Network Intelligence

28 Social Network

29 Outline
A brief history of Web Intelligence
Motivations for WI
Definition and Perspectives of WI
Trends and Research Agenda
Major Web Intelligence Tools
Conclusion

30 Major Web Intelligence Tools
I. Collection: Offline Explorer; SpidersRUs (AI Lab); Google Scholar
II. Analysis (Data and Text Mining): Google APIs; Google Translation; GATE; Arizona Noun Phraser (AI Lab); Self-Organizing Map, SOM (AI Lab); Weka
III. Visualization: NetDraw; JUNG; Analyst’s Notebook and Starlight

31 Collection: Offline Explorer
[Screenshot of Offline Explorer. Labeled elements: project list; project properties setup window; download URLs; file filters, URL filters, and other advanced properties; download level; file modification check.]

32 Analysis: Google APIs
Google provides many APIs to help you quickly develop your own applications. Examples of Google APIs:
Google API for Inlink: Discovers which pages link to your website.
Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.
Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
Google Analytics: Allows users to gather, view, and analyze data about their website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
YouTube Data API: Integrates online videos from YouTube into your applications.

33 GATE
Information Extraction tasks:
Named Entity Recognition (NE): Finds names, places, dates, etc.
Co-reference Resolution (CO): Identifies identity relations between entities in texts.
Template Element Construction (TE): Adds descriptive information to NE results (using CO).
Template Relation Construction (TR): Finds relations between TE entities.
Scenario Template Production (ST): Fits TE and TR results into specified event scenarios.
GATE also includes: parsers, stemmers, and Information Retrieval tools; tools for visualizing and manipulating ontologies; and evaluation and benchmarking tools.

34 GATE Attributes Project information Results display

35 SOM
The multi-level self-organizing map neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona. On a 2D map display, similar topics are positioned closer together according to their co-occurrence patterns, and more important topics occupy larger regions.
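For readers unfamiliar with self-organizing maps, here is a minimal Python sketch of the basic training loop. It is a plain single-level SOM, not the multi-level algorithm from the University of Arizona AI Lab, and the document vectors, grid size, and decay schedule are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x (squared distance)."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull this cell toward x; cells near the winner move the most.
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights

# Hypothetical documents as 2-D term-weight vectors forming two topics.
docs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
weights = train_som(docs, rows=3, cols=3)
cell = best_matching_unit(weights, docs[0])  # map position assigned to the first document
```

Because neighboring cells are updated together, documents with similar vectors tend to land on nearby cells, which is what produces the topic regions on the 2D display.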

36 SOM
[Screenshot of a SOM topic map. Labels: topic; topic region; number of documents belonging to each topic; different topics. Warm colors represent new topics.]

37 Visualization: JUNG
The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed by the School of Information and Computer Science at the University of California, Irvine. The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis: clustering, decomposition, optimization, random graph generation, statistical analysis, and the calculation of network distances, flows, and importance measures (centrality, PageRank, HITS, etc.).
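As an illustration of one importance measure named above, the following Python sketch computes PageRank by power iteration. It does not use or reproduce JUNG's Java API; the graph, the damping factor of 0.85, and the iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over {page: [linked pages]}; ranks sum to 1."""
    pages = set(graph) | {t for links in graph.values() for t in links}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, links in graph.items():
            for target in links:
                # Each page splits its rank equally among its outgoing links.
                new_rank[target] += damping * rank[page] / len(links)
        rank = new_rank
    return rank

# Hypothetical four-page graph: C is linked from A, B, and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
top_page = max(ranks, key=ranks.get)
print(top_page)  # prints C
```

Page C ranks highest because it collects links from three pages, while D, which nothing links to, keeps only the baseline share.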

38 JUNG Examples of visualization types

39 Conclusion
The marriage of hypertext and the Internet led to a revolution: the Web. The marriage of Artificial Intelligence and advanced Information Technology, on the platform of the Web, will lead to another paradigm shift: the Intelligent and Wisdom Web.

40 Thank You
Any questions?

