Enhanced Content Delivery, Action 2: Mine the Web. Industrial Day, Rome, 10 June 2004.


ECD - Industrial Day, Rome, 10 June 2004

Action 2 - Partners
- ICAR-CNR, Cosenza
- KDD & HPC Labs, ISTI-CNR, Pisa
- Dipartimento di Informatica, Università di Pisa

Action 2 – Mine the Web
The project: four Work Packages (Action Coordinator: Dr. Fosca Giannotti, ISTI-CNR)
- Work Package 2.1. Web Mining (UNIPI, ISTI, ICAR)
  - WP Coordinator: Dr. Salvatore Ruggieri, Dip. Informatica
- Work Package 2.2. Indexing and compression (UNIPI)
  - WP Coordinator: Prof. Paolo Ferragina, Dip. Informatica
- Work Package 2.3. Managing Terabytes (ISTI, ICAR)
  - WP Coordinator: Dr. Raffaele Perego, ISTI-CNR
- Work Package 2.4. Participatory Search Services (UNIPI)
  - WP Coordinator: Prof. Maria Simi, Dip. Informatica

Action 2 – Mine the Web
- The main goals of the ECD Project, content enhancement and delivery, are pursued here in a way that complements Action 1
- The focus is on delivering enhanced Web contents to (communities of) users:
  - Exploiting Web Mining to extract knowledge/models that can be used to enhance the efficacy and efficiency of the various phases of the information search process
  - Designing, validating, and providing efficient and scalable solutions for retrieving, storing, and delivering Web contents to users

Motivations
- On-line data grows rapidly:
  - 50+M new pages/day (source: IBM)
  - 100+k news items and articles/day (source: IBM)
  - Databases, digital libraries, etc.
- Internet use tracking produces additional interesting data:
  - Server logs, Web search engine (WSE) logs, network traffic logs
- Goldman Sachs estimates (2002): "between 80 and 90 percent of information on the Internet and corporate networks is unstructured"

Motivations
- The limits of the current means of access to web contents are becoming clear
  - Low precision and quality, difficulty of matching users' subjective relevance
    - over-abundance of low-quality web material
  - Low coverage and freshness
    - much relevant information is in the hidden web
    - ranking mechanisms penalize important pages that have just entered the scene
- Difficulties in
  - managing size, complexity, heterogeneity
  - identifying patterns and trends within huge amounts of unstructured content
Web Mining plays an important role: it makes it possible to synthesize and extract precious information and knowledge.

Web Mining
Web Mining: exploiting Data Mining techniques with data coming from the Web
Data Mining: the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other repositories
Goal: assist users or site owners in finding something useful/interesting/relevant
- User-Centric View (Client-Side)
  - discovery of documents on a subject
  - discovery of semantically related documents or document segments
  - extraction of relevant knowledge about a subject from multiple sources
- Owner-Centric View (Server-Side)
  - increasing contact / conversion efficiency (Web marketing)
  - targeted promotion of goods, services, products, ads
  - measuring effectiveness of site content / structure
  - providing dynamic personalized services or content

Web Mining Taxonomy
- Web Mining
  - Web Usage Mining
  - Web Content Mining
  - Web Structure Mining
Sample Web server access-log entries:
[27/May/2004:19:24: ] "GET /images/finger.jpg HTTP/1.1"
[27/May/2004:19:24: ] "GET /images/logokdd.jpg HTTP/1.1 "
[27/May/2004:19:24: ] "GET /didattica/BDM2004/TDM_intro pdf HTTP/1.1"
[27/May/2004:19:24: ] "GET /didattica/BDM2004/TDM_intro pdf HTTP/1.1"
[27/May/2004:19:24: ] "GET /didattica/BDM2004/TDM_intro pdf HTTP/1.1"

Web Content Mining
- Discover semantics of documents by examining
  - textual content
  - linkage structure
  - domain knowledge and meta-data
  - user attributes / profiles
  - Approaches: text mining, document semantic analysis
- Discover and extract a common schema to capture relevant semantic information from heterogeneous data sources
  - Approaches:
    - Web-based query languages: XML + WebSQL + WebML
    - Multiple-layered databases; discovery of concept hierarchies

Web Structure Mining
- Discovery and Analysis of Site Structures
  - Analyzing web site structure (viewed as a directed graph) by comparing the site graph against patterns discovered from site usage / content data
- Automatic site construction based on
  - correlations among pages
  - domain knowledge / site description
  - discovery of concept hierarchies among documents
- Co-Citation Analysis
  - Based on the view that the semantic content of a document/site is reflected in
    - documents/sites to which it refers
    - documents/sites that refer to it
  - Application: discovery of authoritative pages
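
The co-citation idea above can be illustrated with a minimal sketch. It assumes the link graph fits in memory as a dictionary from a page to the pages it links to; the function names `cocitation_counts` and `in_degree` are illustrative and do not come from the slides. In-degree is used here only as a crude proxy for authority, not as the method of any particular system.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(links):
    """Count, for each pair of pages, how many pages link to both.

    links: dict mapping a source page to an iterable of target pages.
    Returns a dict {(a, b): count} with a < b.
    """
    counts = defaultdict(int)
    for targets in links.values():
        # Each source page contributes one co-citation to every pair it cites.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def in_degree(links):
    """Number of distinct pages that refer to each page."""
    degree = defaultdict(int)
    for targets in links.values():
        for target in set(targets):
            degree[target] += 1
    return dict(degree)
```

Pairs with a high co-citation count are candidates for being semantically related; pages with a high in-degree are candidates for being authoritative.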

Web Usage Mining
- Discovery of meaningful patterns from data generated by client-server transactions on one or more Web localities
  - Web localities may involve one or more Web and/or application servers, usually belonging to the same organization
- Typical Sources of Data:
  - automatically generated data stored in web server access logs, referrer logs, proxy logs, agent logs, and client-side cookies
  - user profiles and/or user ratings
  - meta-data, page attributes, page content, site structure
  - e-commerce transaction data

Web Mining Applications
- Web Usage Mining
  - discovering customer preference and behavior
  - Web personalization / collaborative filtering
  - adaptive Web sites / improving Web site organization
  - e-business intelligence, etc.
- Web Content Mining
  - information filtering / knowledge extraction
  - Web document categorization
  - discovery of ontologies on the Web, etc.
- Web Structure Mining
  - Finding "quality" or "authoritative" sites based on linkage and citations
    - IBM CLEVER project
    - Google
- Etc.

Some related projects
- WebFountain - IBM
- WebBase - Stanford DBGroup

WebFountain (IBM)
- World-Wide Web, News Forums, Weblogs, etc.
- Newspapers, Magazines, etc.
- Customer Electronic Text
- WebFountain: Infrastructure for Advanced Text Analytics; finds patterns, trends and relationships in text
- Application Examples: Marketing Intelligence Research

WebFountain: an infrastructure for Advanced Text Analytics applications
- Cluster capacity: ½ petabyte
- Number of pages in store: 2,000,000,000
- Number of pages crawled per day: 25,000,000
- Number of pages mined per second: 10,000
- Number of 73GB hard drives: 3674
- Number of CPUs: 1231
- Number of scientists and researchers who have contributed to WebFountain technology: 250
- Patents pending: 100
- Patents issued: 75
- Traffic coming in from the Internet: 70 megabytes/sec
- Time to complete query: 5 minutes, 22 seconds
- Number of countries contributing to technology: 5

WebFountain: Reputation Tracking

WebBase (Stanford DBGroup)

WebBase Challenges
- Scalability
  - crawling
  - archive distribution
  - index construction
  - storage
- Consistency
  - freshness
  - versions
- Dissemination
- Archiving
  - "units"
  - coordination
- IP Management
  - copy access
  - link access
  - access control
- Hidden Web
- Topic-Specific Collection Building

Action 2 – Mine the Web: application scenario
- So far, hardly any approach analyzes how a given group of users accesses the Web with the aim of exploiting usage information to give the users of that group enhanced access to web resources
- We think that it is possible to learn new models and patterns from the usage data of a group of web users that, in combination with document content and structure, may yield enhanced content access and delivery
  - better search services, better categorization and document classification services, better question answering services

Action 2 – Mine the Web
- Ambitious objective: exploit the combination of Web data about USAGE, STRUCTURE, and CONTENT originated/accessed by a Virtual Organization, to improve the efficacy and efficiency of the knowledge extraction process from the users' point of view
- Developing solutions that are:
  - Innovative with respect to the state of the art
  - Appropriate for the Web domain

Virtual Organizations
[Figure: a Virtual Community and the Internet]

Tracking Virtual Organizations
- Tracking the interaction of the virtual community with the Internet allows us to collect a variety of interesting information
- Network traffic data provide detailed information about:
  - Usage
    - Preferred sites, user sessions
  - Content
    - Accessed documents
  - Structure
    - From client sessions we can build the usage Web subgraph
    - By parsing the retrieved documents we can build the corresponding link graph
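
The two graphs named above can be sketched as follows. This is a minimal illustration, assuming sessions are already available as ordered lists of URLs and pages as HTML strings; the names `traffic_graph`, `link_graph`, and `LinkExtractor` are hypothetical and are not taken from the project's code.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def link_graph(pages):
    """Structure: map each retrieved page URL to the URLs it links to."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = sorted(set(parser.links))
    return graph


def traffic_graph(sessions):
    """Usage: count how often users moved from one URL to the next."""
    edges = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            edges[(src, dst)] += 1
    return edges
```

Overlaying the weighted traffic edges on the link graph gives the combined view of which existing links the community actually follows.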

Tracking Virtual Organizations
[Figure panels for the Virtual Community: Link graph; Traffic graph; Link and Traffic graph]

Tracking Virtual Organizations
- The sequence of pages visited by a user after a query to a WSE gives us precious information about the subjective relevance of pages with respect to the query topic
- Query: www consortium

We need an infrastructure: the Web Object Store (WOS)
- A Web Data Management System optimized to efficiently handle content, usage, and structure web data
- Purpose: enable (possibly) innovative Web IR and Web Mining research by locally providing a small, but significant, portion of the Web built according to our user-centric view
- Manage large collections of
  - Web pages
  - Preprocessed usage data
  - Structure data
- Collected within our virtual community

WOS and related activities
WOS layers:
- Clustering/Pattern/Classification Web Mining algorithms
- Efficient and scalable access methods: IXE b-trees, full-text indexes, search in compressed data
- Data cleaning, preprocessing, filtering
- Population: traffic raw data of our community, IXE Crawler, Participatory search
- Efficient and scalable storage: IXE persistent objects, compression, distributed architecture
WOS features:
- Persistent store of objects
- Web data management system for web content, structure and usage data
- Management of data at many abstraction levels
- Fast development of new applications
- Easy C++ annotation of new persistent objects
- Read and write data in tables
Related activities:
- Clustering s
- Caching of documents and of query results
- Efficient and scalable pattern mining and clustering algorithms
- Enhanced compression methods
- Clustering/categorizing query result snippets
- Clustering XML documents
- Etc.

WOS data model
- HttpRequest (Usage)
- Citation (Structure)
- Page (Content)
- Higher-level abstractions
  - PageView
  - Session/Q-Sessions
  - User
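
To make the step from low-level HttpRequest records to the higher-level Session abstraction concrete, here is a rough sketch. The slides give only the object names; every field, the `sessionize` function, and the 30-minute inactivity timeout are assumptions made for illustration, not the WOS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HttpRequest:
    """Usage record (fields are assumed, not taken from the WOS)."""
    user: str
    url: str
    timestamp: float  # seconds since some epoch


@dataclass
class Session:
    """Higher-level abstraction: consecutive requests by one user."""
    user: str
    requests: List[HttpRequest] = field(default_factory=list)


def sessionize(requests, timeout=1800.0):
    """Group requests into sessions, starting a new session for a user
    whenever the gap since that user's previous request exceeds `timeout`
    seconds (a common heuristic, assumed here)."""
    sessions = []
    open_sessions = {}  # user -> that user's most recent session
    for req in sorted(requests, key=lambda r: r.timestamp):
        current = open_sessions.get(req.user)
        if current is None or req.timestamp - current.requests[-1].timestamp > timeout:
            current = Session(user=req.user)
            open_sessions[req.user] = current
            sessions.append(current)
        current.requests.append(req)
    return sessions
```

Sessions are returned in order of their first request, so interleaved traffic from several users is separated without losing chronology.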

WOS applications
- Some innovative applications are currently pursued within our project:
  - Characterization, on the basis of usage only or usage + contents + structure, of new important emerging sites, or of irrelevant sites (e.g., advertising sites)
    - crucial for steering the crawler of the community web repository towards fresh, relevant documents while avoiding unimportant documents
  - Page ranking based also on usage information, for achieving a more accurate and dynamic measurement of document relevance
  - Recommendation of similar/related documents and keywords, on the basis of combined usage/content analysis
  - Caching and clustering of web search results

WOS population: usage data (WP 2.1)
- Many-to-many interactions
  - Inter-site user sessions
- Massive data
  - Millions of HttpRequest records per day
  - ~1 GB/day of raw data
- We collected long periods of proxy-level IP traffic originating from the SERRA network (domain unipi.it)
  - The whole University of Pisa

WOS population: content data (WP 2.4)
- Methods to gather contents to populate the Web Object Store
  - IXE Crawler
  - Participatory Search System (main activity this year)
  - Hidden Web Search

WOS population: content data (WP 2.4)
- IXE crawler
[Figure: crawler loop with the steps init, get next url, get page, extract urls; labels: initial urls, web pages, Internet]
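
The four steps in the crawler diagram can be sketched as a simple loop. This is a generic, single-threaded illustration and not the IXE crawler: the `crawl` function, the injected `fetch` callable, and the `max_pages` limit are assumptions, and links are extracted with a naive regular expression that does not resolve relative URLs.

```python
import re
from collections import deque

# Naive link pattern: double-quoted href values only.
HREF = re.compile(r'href="([^"]+)"')


def crawl(initial_urls, fetch, max_pages=100):
    """Breadth-first crawl.

    fetch: callable taking a URL and returning the page HTML,
           or None if the page cannot be retrieved.
    Returns a dict {url: html} of the pages retrieved.
    """
    frontier = deque(initial_urls)          # init
    seen = set(initial_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()            # get next url
        html = fetch(url)                   # get page
        if html is None:
            continue
        pages[url] = html
        for link in HREF.findall(html):     # extract urls
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing `fetch` as a parameter keeps the loop testable against an in-memory dictionary; a real crawler would substitute an HTTP client and add politeness delays.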

IXE Crawler
- Parallel/distributed crawler
- High performance through:
  - asynchronous I/O (500 connections/thread)
  - asynchronous DNS resolution
  - keep-alive connections
  - multi-threading
  - URL compression
- 9 Mb/sec transfer rate (7 times that of the nutch.org crawler)

Participatory search: the idea
- Participatory search:
  - each participant builds an index of its local contents and sends it to a central server
  - the central server implements a community search service by collecting and merging the participants' indexes
- A model that fits community needs for dedicated search services
- A trade-off between a centralized search model (e.g., Google) and a distributed approach (e.g., Gnutella, Kazaa)
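
The division of labour described above can be sketched with toy inverted indexes. This is only an illustration of the idea: the functions `build_index`, `merge_indexes`, and `search`, the whitespace tokenization, and the AND-only query semantics are assumptions and say nothing about how the actual system is implemented.

```python
def build_index(docs):
    """Participant side: index local documents.

    docs: dict {doc_id: text}. Returns {term: set of doc_ids}.
    """
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def merge_indexes(participant_indexes):
    """Central server side: merge the indexes sent by the participants.

    participant_indexes: dict {participant: index}.
    Returns {term: sorted list of (participant, doc_id)}.
    """
    merged = {}
    for participant, index in participant_indexes.items():
        for term, doc_ids in index.items():
            merged.setdefault(term, set()).update(
                (participant, doc_id) for doc_id in doc_ids
            )
    return {term: sorted(postings) for term, postings in merged.items()}


def search(merged, query):
    """Return the documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(merged.get(terms[0], []))
    for term in terms[1:]:
        result &= set(merged.get(term, []))
    return sorted(result)
```

Only the index travels to the server; the documents stay with the participant, which decides what to index and when to publish.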

Participatory Search
[Figure: Centralized, Participatory, and Distributed search compared, showing Documents, Index, Search, and Search results. Legend: C – Crawler, I – Indexer, S – Search Engine]

Participatory Search: benefits
- Participants are in charge of
  - selecting what to index and to publish
  - when to publish (no need for coordination with an external crawler)
- Control over index update and freshness
- Publishing of Hidden Web content

Storage and access methods: compression (WP 2.2)
- Our technique takes a poor compressor A and turns it into a compressor A_boost with a better performance guarantee
[Figure: A compresses s into c; the Booster, using A, compresses s into c']
- Qualitatively, we show that
  - c' is shorter than c, if s is compressible
  - Time(A_boost) = Time(A), i.e. no slowdown
  - A is used as a black box
- The better A is, the better A_boost is
- The more compressible s is, the better A_boost is
- Key components: Burrows-Wheeler Transform, Suffix Tree, and a greedy processing of them
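
Of the key components listed above, the Burrows-Wheeler Transform is the easiest to show in a few lines. The sketch below is a naive, quadratic-space version for illustration only; it is not the boosting technique itself, and the sentinel character `$` and the function names are assumptions. Practical implementations use suffix sorting rather than materializing all rotations.

```python
def bwt(s, sentinel="$"):
    """Burrows-Wheeler Transform: last column of the sorted rotations.

    The sentinel must not occur in s and must sort before every
    character of s (true for "$" and lowercase letters).
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    s = s + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert the transform by repeatedly prepending and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform groups characters that share a following context, which is why equal symbols tend to form runs in the output and simple compressors do better on it.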

Storage and access methods (WP 2.1 and 2.2)
- Repository of URLs
  - Compressed
  - Prefix and suffix search within URLs
    - Search by hostname, path, file-ext, …
    - select count(*) from … where url LIKE ‘
- Up to two orders of magnitude faster than a sequential scan or a B-tree
- Space occupancy << B-tree
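
Prefix and suffix queries over a set of URLs can be illustrated with two sorted arrays and binary search. This is a deliberately simple stand-in, not the compressed index described in the slide: the `UrlIndex` class and its methods are hypothetical, and nothing here is compressed.

```python
import bisect


class UrlIndex:
    """Answers prefix and suffix queries over a fixed set of URLs."""

    def __init__(self, urls):
        self.forward = sorted(set(urls))
        # Suffix queries become prefix queries on the reversed strings.
        self.backward = sorted(url[::-1] for url in self.forward)

    @staticmethod
    def _matches(sorted_list, prefix):
        # Binary search for the first candidate, then scan the run
        # of consecutive entries that start with the prefix.
        lo = bisect.bisect_left(sorted_list, prefix)
        hi = lo
        while hi < len(sorted_list) and sorted_list[hi].startswith(prefix):
            hi += 1
        return sorted_list[lo:hi]

    def with_prefix(self, prefix):
        return self._matches(self.forward, prefix)

    def with_suffix(self, suffix):
        reversed_hits = self._matches(self.backward, suffix[::-1])
        return sorted(hit[::-1] for hit in reversed_hits)

    def count_prefix(self, prefix):
        return len(self.with_prefix(prefix))
```

A query by hostname is a prefix query and a query by file extension is a suffix query, so both avoid a sequential scan of the repository.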

Storage and access methods: index compression (WP 2.3)
- Assigning DocIDs in a clever way could improve the compression factor of traditional variable-[bit/byte] encoding methods by increasing the number of small DGaps.
- Clustering property: within each posting list there are dense zones (i.e., many small DGaps).
- Our problem consists of enhancing the clustering property of posting lists.
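
The effect of small DGaps on a variable-byte code can be checked with a short sketch. It assumes a posting list is a strictly increasing list of integer DocIDs and uses one common variable-byte layout (7 payload bits per byte, high bit set on the last byte of each number); the function names are illustrative and the DocID assignment algorithm itself is not shown.

```python
def dgaps(doc_ids):
    """Replace each DocID by its difference from the previous one."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def vbyte_encode(numbers):
    """Variable-byte code: low 7 bits first, high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers
```

With this code, the posting list [5, 6, 7, 9] has gaps [5, 1, 1, 2] and takes 4 bytes, while [5, 1000, 70000, 200000] has gaps [5, 995, 69000, 130000] and takes 9 bytes, which is the gain a clustering-aware DocID assignment aims for.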

Compression Enhancement

Assignment Performance

Content delivery (WP 2.1, 2.2 and 2.3)
- Web Caching
  - Mining of web/proxy server requests aimed at improving LRU-based document caching (WP 2.1)
- Recommendation system
  - (On-line/off-line) mining of web sessions aimed at profiling users and recommending related pages to them (WP 2.1, 2.3)
- Transactional Clustering
  - Clustering specialized for transactional data, aimed at categorizing web pages, user sessions, snippet sequences, search engine results (WP 2.1, 2.2)

Content delivery (WP 2.3)
- SUGGEST: a recommendation system made up of two distinct modules
  - Offline: performs model extraction using a clustering algorithm that partitions the Usage Graph
  - Online: performs user classification and suggestion generation
- The WOS remarkably shortened implementation time (< 500 C++ lines)
- We used three WOS objects to produce a persistent clustering structure
[Diagram labels: Citation, PageView, Session, sCluster]

Content delivery (WP 2.2)
Goal: retrieve the pages that match the user's needs. This is a very difficult task because:
- the size of the Web is increasing, and so is the number of answers
- Web coverage is a problem for a single search engine
- Web pages are heterogeneous
- user needs are subjective and time-varying
- the "list of keywords" paradigm for a user query may be ambiguous
SnakeT: clusters the web snippets returned by many search engines into hierarchically labeled folders, which are created on the fly to capture the various meanings of the answers returned for a user query

A commercial example: Vivisimo
Mainly a black art: IBM India [WWW 04] and Microsoft China [SIGIR 04] have not made their software publicly available

SnakeT
- It offers various interesting features:
  - Labels are non-contiguous sentences of variable length, selected on the basis of two knowledge bases
  - 13 search engines are queried on the fly
  - The hierarchy is built via a greedy strategy that aims for:
    - good coverage of the web snippets
    - effective readability of the labels
  - Parent labels are NOT substrings of descendant labels
  - Open-source architecture written in C and Perl

SnakeT: an example of use

SnakeT: an example of use
Look at the DEMO

Content delivery (WP 2.1)
- Clustering of
  - s (manco)
  - XML documents (chiara)
  - ??

Ongoing and future activities
- Work in progress
  - Pursuing our goal of exploiting USAGE, STRUCTURE, and CONTENT Web data to improve efficacy and efficiency in the interaction of the user with the Web
  - Implementation of additional WOS layers
    - Compression booster, XML clustering
- Future work (medium-long term)
  - WOS, final version
  - Community-oriented ranking
  - Content (news, XML, ...) clustering
  - Cooperation with Nutch.org (Doug Cutting in Pisa next October)
  - etc.

Deployment scenarios
- Concerning the role of the WOS and of the ECD applications, three (non-exclusive) deployment scenarios can be envisaged:
  - The WOS is a research infrastructure, in the spirit of the WebBase project at Stanford University
  - The WOS is an infrastructure for web analytics services to be offered to third parties, in a spirit close to IBM's WebFountain project
  - The WOS can become a Web Data Management System product aimed at developing and engineering web mining ECD applications, again in a spirit close to WebBase

Demo Session
- Three demos here
  - WOS: browsing usage data (Mirko Nanni, Vincenzo Bacarella)
  - SnakeT: Web snippets clustering (Paolo Ferragina, Antonio Gullì)
  - ANTIX: Participatory Search System (Andrea Esuli)
- Some other activities are described in the posters

More information
- Interested people can find these slides, more information, documents and the full list of publications at the address: