Improving searches through community clustering of information

Sinergia: Improving searches through community clustering of information

Motivation Finding useful information with today’s search engine technology is usually a time-consuming process. Users in a “community” who share similar interests usually search for the same kinds of things. There is no mechanism for sharing information about good search results, so the same time-consuming searches are repeated.

Main idea Collect information about search queries made by users in the community. Allow people in the community to rank pages. Combine that information to provide a way to reuse previous searches made by users to speed up similar search queries from other users of the community.
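The combination step above can be pictured with a small in-memory sketch. The slides do not specify a scoring rule, so the one used here (sum the community's votes per page and list endorsed pages ahead of the engine's own results) is an assumption, and the names `CommunityStore`, `record_vote`, `rerank` and `normalize` are hypothetical.

```python
from collections import defaultdict


def normalize(query: str) -> str:
    """Treat queries as equal if they contain the same words, ignoring case and order."""
    return " ".join(sorted(query.lower().split()))


class CommunityStore:
    """Hypothetical store of community votes, keyed by normalized query."""

    def __init__(self) -> None:
        self._votes: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_vote(self, query: str, url: str, vote: int = 1) -> None:
        """Add one user's vote for a page found through a query."""
        self._votes[normalize(query)][url] += vote

    def rerank(self, query: str, engine_results: list[str]) -> list[str]:
        """Put community-endorsed pages first, then keep the engine's own order."""
        votes = self._votes.get(normalize(query), {})
        endorsed = sorted((u for u in votes if votes[u] > 0), key=lambda u: -votes[u])
        rest = [u for u in engine_results if u not in endorsed]
        return endorsed + rest
```

A later user who issues a similar query gets the pages earlier users voted for at the top, without repeating their search effort.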

Main idea Cluster users by their surfing preferences and use that information to suggest/filter related links.

Proposed solution Watch users’ queries to search engines and keep track of the URLs they find interesting. Build up a profile of each user based on his/her surfing behavior and correlate that information with profiles from other users. Complement the results obtained from traditional search engines with our knowledge of what is interesting to users, thus potentially saving search time.
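As a rough illustration of correlating profiles, the sketch below represents each profile as the set of URLs a user found interesting and compares profiles with Jaccard similarity. The slides do not name a correlation technique, so Jaccard similarity, the threshold value, and the function names `jaccard` and `suggest` are all assumptions made for the example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of URLs two profiles have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(user: str, profiles: dict[str, set[str]], min_similarity: float = 0.2) -> list[str]:
    """Suggest URLs liked by similar users that this user has not seen yet."""
    mine = profiles[user]
    scores: dict[str, float] = {}
    for other, urls in profiles.items():
        if other == user:
            continue
        sim = jaccard(mine, urls)
        if sim < min_similarity:
            continue  # profile too different to be a useful signal
        for url in urls - mine:
            scores[url] = scores.get(url, 0.0) + sim
    # Best-supported suggestions first; ties broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Any other similarity measure with the same signature could replace `jaccard` without touching `suggest`.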

Goals of the implementation Use an approach that requires as few changes as possible on the client side and in the existing Internet infrastructure. Make our approach available on as many client platforms as possible; ideally, provide a truly platform-independent solution.

Goals of the implementation Complement existing search technology instead of reinventing it. Use a modular architecture that allows flexible support of different search engines and easy switching between correlation techniques. Design the solution for high scalability and availability.
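One possible reading of the modularity goal is that each supported search engine is described by a small adapter, so adding an engine means adding one entry. The sketch below shows this for recognizing search queries in request URLs. The hosts, paths and parameter names are placeholders, and `EngineAdapter` and `detect_search` are hypothetical names, not part of the original system.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class EngineAdapter:
    """Describes how one search engine encodes a query in its URLs."""
    host: str
    path: str
    query_param: str

    def extract_query(self, url: str) -> Optional[str]:
        """Return the search terms if the URL is a query on this engine, else None."""
        parts = urlsplit(url)
        if parts.hostname != self.host or parts.path != self.path:
            return None
        values = parse_qs(parts.query).get(self.query_param)
        return values[0] if values else None


# Placeholder engines; supporting another engine means appending an adapter.
ADAPTERS = [
    EngineAdapter("search.example.com", "/search", "q"),
    EngineAdapter("find.example.org", "/results", "query"),
]


def detect_search(url: str) -> Optional[str]:
    """Return the query string if any registered engine recognizes the URL."""
    for adapter in ADAPTERS:
        query = adapter.extract_query(url)
        if query is not None:
            return query
    return None
```

Correlation techniques could be made interchangeable in the same way, by hiding each one behind a common function signature.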

Architecture [Diagram: Proxy, Web Server, and DB. Hard state info is stored in the DB; soft state info is stored in the Proxy. Arrow labels: “Most requests”; “Search requests (modified)”; “Search queries”, “Vote for pages”, “Find related”.]

Architecture: Overview Community: the group of users that the proxy serves. The proxy collects information transparently. The Web Server provides users with information such as related pages and search results. Both the proxy and the Web Server talk to the same Database. The Database maintains the hard state of the system.

Architecture: Proxy Maintains the concept of a session for each user. For most HTTP requests it works as a normal proxy. For requests that are search queries on a search engine, it keeps track of the URLs returned for that query and logs activity on those URLs.

Architecture: Proxy Requests to the Sinergia web server are modified to carry user ID information in a fat URL. When a user’s session ends or he/she starts a new search, the proxy stores the information collected for that user in the DB. The proxy uses a modular architecture that makes it easy to add support for additional search engines.
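The fat-URL step can be sketched as follows: the proxy appends the user ID to the request URL, and the web server reads it back. The slides do not say how the ID is encoded, so carrying it as a query parameter, the parameter name `sinergia_uid`, and the function names are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

UID_PARAM = "sinergia_uid"  # hypothetical parameter name


def to_fat_url(url: str, user_id: str) -> str:
    """Proxy side: attach the user ID, replacing any ID the client supplied itself."""
    parts = urlsplit(url)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != UID_PARAM
    ]
    params.append((UID_PARAM, user_id))
    return urlunsplit(parts._replace(query=urlencode(params)))


def user_from_fat_url(url: str) -> Optional[str]:
    """Web server side: recover the user ID, or None if the URL is not fat."""
    for key, value in parse_qsl(urlsplit(url).query):
        if key == UID_PARAM:
            return value
    return None
```

Dropping any client-supplied ID before appending the proxy's own keeps a user from impersonating another member of the community in this sketch.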

Architecture: Web Server Provides a front end for the Sinergia search engine. Provides the back end that supports the “Vote”, “Get Rank” and “Find Related” functionality. To relate a request to the user ID of the client, it receives information from the proxy in the form of a fat URL.

Architecture: Scalability Scalability is achieved through the clear division of tasks between the Proxy, the Web Server and the DB. An increase in the number of search requests to our search engine should not create a bottleneck, because the Database is decoupled from the Web Server and the Proxy, so any of the three can be scaled as needed.

Architecture: Scalability Soft state information about current searches is maintained in the Proxy, and transactions to the database occur only when the user’s session ends or the user starts a new search. The Proxy itself is scalable using clustering, as is the Web Server.
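The soft-state behavior described above can be sketched as a per-user buffer that touches the database only at the two points the slide names. This is a minimal illustration using an in-memory SQLite database as a stand-in for the real DB; the `visits` table, the `ProxySession` class and its method names are hypothetical.

```python
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical schema for the hard state."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE visits (user_id TEXT, query TEXT, url TEXT)")
    return db


class ProxySession:
    """Soft state for one user: the current query and the result URLs visited."""

    def __init__(self, user_id: str, db: sqlite3.Connection) -> None:
        self.user_id = user_id
        self.db = db
        self.query: Optional[str] = None
        self.visited: list[str] = []

    def new_search(self, query: str) -> None:
        self.flush()  # a new search closes out the previous one
        self.query = query

    def visit(self, url: str) -> None:
        if self.query is not None:
            self.visited.append(url)  # kept in memory only, no database traffic

    def flush(self) -> None:
        """Write the buffered search to the database in a single transaction."""
        if self.query is not None and self.visited:
            with self.db:  # commits on success, rolls back on error
                self.db.executemany(
                    "INSERT INTO visits (user_id, query, url) VALUES (?, ?, ?)",
                    [(self.user_id, self.query, u) for u in self.visited],
                )
        self.query, self.visited = None, []

    def end(self) -> None:
        self.flush()  # ending the session also persists the buffer
```

Because page visits never reach the database individually, the number of transactions grows with the number of searches rather than with the number of requests passing through the proxy.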

Architecture: Availability Our architecture addresses availability by decoupling the Proxy, Web Server and Database, which allows clustering to be used for each of these components to provide a highly available system. Managing the hard state only in the Database keeps the information available as long as the database is working. Database availability can be ensured with established database techniques such as distribution and replication.