Crawler-Based Search Engine. By Morris Wright, Ryan Caplet, Bryan Chapman.

Presentation transcript:

By Morris Wright, Ryan Caplet, Bryan Chapman

Overview
• Crawler-based search engine: a script/bot that searches the web in a methodical, automated manner (Wikipedia, "web crawler")
• Limited to a subset of UConn's School of Engineering websites
• Resources: web server and MySQL servers provided by ECS
• Languages used: HTML, PHP, SQL, Perl

Task Breakdown
• Bryan
  - Design crawler
  - Analyze files and fill database with URLs to search
• Morris
  - Search functionality
  - Database/account management
• Ryan
  - UI development
• Ranking algorithm and keyword extraction done by group

Crawler Summary
• The crawler creates a "mirror" of our intended scope of websites on the local hard drive
• Using a script, the title is then extracted from the relevant files and placed into a DB table
• Another script then visits each URL and extracts keywords to populate the second DB table
• When a user types a word into the search engine, the word is queried in the keyword database; a second query then retrieves and displays all the URLs/titles matching that keyword

Crawler - Wget
• The Linux command wget is used in our script, restricted to a base domain, to limit our crawler to sites within the School of Engineering
• "Wget can follow links in HTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site" (linux.about.com)
• Our "Mirror"
• A script is then used to run a recursive call that removes all the tags from the files, preparing them for storage in the database
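The tag-removal step could look roughly like the sketch below. It is written in Python purely for illustration (the project's own scripts are not shown in the slides), and the names `TagStripper` and `strip_tags` are hypothetical. It handles one file's contents; walking the mirrored directory tree is left out.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text content of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the visible text of `html` with runs of whitespace collapsed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Script and style contents are skipped here on the assumption that they should not become keywords; the slides do not say how the original script treated them.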

Crawler – Stem Words
• A script is used to remove common filler ("stop") words and to combine like words by stripping suffixes, such as:
  - the, if, however
  - -ion, -ing, -ier, etc.
  - "Running" is the same as "Run"
• Helps with space in the database
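A minimal sketch of this filtering, in Python for illustration. The word list and suffixes are only the examples named on the slide, and the suffix stripping is deliberately crude; it is not the project's actual script nor a full stemming algorithm. The names `STOP_WORDS`, `SUFFIXES`, `stem` and `normalize` are hypothetical.

```python
STOP_WORDS = {"the", "if", "however"}  # examples from the slide; a real list would be longer
SUFFIXES = ("ing", "ion", "ier")       # suffixes named on the slide


def stem(word):
    """Very crude suffix stripping; an assumption, not a full stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" becomes "run".
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def normalize(words):
    """Drop stop words, then reduce the remaining words to crude stems."""
    return [stem(w) for w in words if w.lower() not in STOP_WORDS]
```

With these rules `stem("Running")` and `stem("Run")` both give `"run"`, so the two forms share one database entry.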

Crawler Functionality
• Once this is accomplished, our first database is populated with indexing information and has the layout shown below.

Site Index Table
  ID     Used as a primary key
  URL    Stores site's URL address
  TITLE  Stores extracted title
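The site index layout can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the project's MySQL server, and the table name `site_index`, the helper names, and the regex-based title extraction are all assumptions.

```python
import re
import sqlite3

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or '' if there is none."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


def create_site_index(conn):
    # Columns follow the slide: ID (primary key), URL, TITLE.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site_index ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL UNIQUE,"
        " title TEXT NOT NULL)"
    )


def index_page(conn, url, html):
    """Store one page's URL and extracted title; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO site_index (url, title) VALUES (?, ?)",
        (url, extract_title(html)),
    )
    return cur.lastrowid
```

The `?` placeholders keep the URL and title out of the SQL text itself, which also matters for the security concerns raised later in the deck.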

Crawler Functionality
• PHP is then used to loop through all the URL listings in our indexing database to create keywords
• Unwanted HTML syntax is removed and PHP's built-in function array_count_values is used to create a list of keywords and their frequencies
• For the time being, these keyword frequencies will be used to determine page rank and ordering on the search page
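The counting step is easy to illustrate. The project uses PHP's array_count_values; the sketch below shows the same idea in Python with `collections.Counter`, which plays a comparable role. The function name `keyword_frequencies` and the tokenizing rule (runs of letters and digits) are assumptions.

```python
import re
from collections import Counter


def keyword_frequencies(text):
    """Map each lowercase word in `text` to the number of times it occurs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)
```

For example, `keyword_frequencies("Information technology: information").most_common(1)` gives `[("information", 2)]`, and those counts are what the slides use as a provisional page rank.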

MySQL Database

Crawler Functionality
• Once this list is created for a given website, we then populate our keyword database by either creating a new table for the keyword or simply adding a new entry to an existing table

'Keyword' Table
  ID    Used as a primary key
  URL   Stores site's URL address
  Freq  Stores keyword frequency
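A sketch of storing the keyword list, again with sqlite3 standing in for MySQL. Note the deliberate simplification: instead of one table per keyword as described on the slide, this uses a single hypothetical `keywords` table with an extra `keyword` column, which avoids building table names from page content. The helper names are assumptions.

```python
import sqlite3


def create_keyword_table(conn):
    # One shared table; (keyword, url) pairs are unique.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords ("
        " id INTEGER PRIMARY KEY,"
        " keyword TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " freq INTEGER NOT NULL,"
        " UNIQUE (keyword, url))"
    )


def store_keywords(conn, url, frequencies):
    """Insert (or overwrite) one row per keyword found on `url`."""
    conn.executemany(
        "INSERT OR REPLACE INTO keywords (keyword, url, freq) VALUES (?, ?, ?)",
        [(kw, url, n) for kw, n in frequencies.items()],
    )
```

Re-crawling a page simply overwrites its old counts, so the table never holds two rows for the same keyword and URL.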

Sample Keyword Results
• Consider the following results
• URL:
  Title: For all your Technology Needs
  Keyword: technology 4
  Keyword: information 10
• URL:
  Title: For all your Sports Information
  Keyword: football 10
  Keyword: information 12

Crawler Functionality
• Once the databases have been populated, they just need to be integrated with the search function of the page and the UI to be fully functional
• The current UI is good for displaying a few results, but we will need something more efficient and better looking when there are hundreds of results

Search Function
• When a word is entered into the search bar, a query for that word is sent to the database
• If the word is in the database, the query will pull up all the URLs and their associated titles and display them on the page
• The pages should be ordered by their page rank: the higher the frequency of the keyword, the higher the rank
• The search function code is written in PHP and the queries are written in SQL
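The lookup and ordering can be sketched as a single parameterized query. As before this is Python with sqlite3 rather than the project's PHP and MySQL, with hypothetical table names (`site_index`, `keywords`) and a single shared keyword table. The sample rows reuse the titles and counts from the sample results; the example.edu URLs are invented placeholders, since the slides do not show the real ones.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database from the sample keyword counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE site_index (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT, url TEXT, freq INTEGER);
        INSERT INTO site_index (url, title) VALUES
            ('http://tech.example.edu/', 'For all your Technology Needs'),
            ('http://sports.example.edu/', 'For all your Sports Information');
        INSERT INTO keywords (keyword, url, freq) VALUES
            ('technology', 'http://tech.example.edu/', 4),
            ('information', 'http://tech.example.edu/', 10),
            ('football', 'http://sports.example.edu/', 10),
            ('information', 'http://sports.example.edu/', 12);
        """
    )
    return conn


def search(conn, word):
    """Return (url, title, freq) rows for `word`, highest frequency first."""
    return conn.execute(
        "SELECT s.url, s.title, k.freq"
        " FROM keywords AS k JOIN site_index AS s ON s.url = k.url"
        " WHERE k.keyword = ?"
        " ORDER BY k.freq DESC",
        (word.strip().lower(),),
    ).fetchall()
```

Searching for "information" on this data lists the sports page (frequency 12) ahead of the technology page (frequency 10); a word that is not in the database returns an empty list.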

Search Function Test

Search Function Example

Search Function - Mail

Search Function - Uconn

Search Function – N/a

Changes needed for Integration
• Need to set up the test database fields to match the layout of the crawler database
• The test setup uses only 1 database whereas the crawler uses 2: one for the URLs/titles, one for the keywords
• Need to work on security measures such as input validation, and test them with HackBar
• HackBar is a tool used for testing SQL injections, XSS holes and site security.
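One possible shape for the input validation, sketched in Python for illustration. The length limit, the allowed character set, and the function names are assumptions, not the project's rules. Validation of this kind complements parameterized queries and output escaping; it does not replace them.

```python
import html
import re

MAX_QUERY_LENGTH = 64  # assumed limit for a single search term
ALLOWED = re.compile(r"[A-Za-z0-9][A-Za-z0-9 -]*")  # letters, digits, spaces, hyphens


def validate_query(raw):
    """Return a cleaned, lowercase search term, or None if the input is rejected."""
    term = raw.strip()
    if not term or len(term) > MAX_QUERY_LENGTH:
        return None
    if not ALLOWED.fullmatch(term):
        return None
    return term.lower()


def render_heading(raw):
    """Echo user input back into HTML only after escaping it."""
    return "<h2>Results for " + html.escape(raw) + "</h2>"
```

Typical injection probes such as `' OR '1'='1` or `<script>alert(1)</script>` are rejected by `validate_query`, and `render_heading` shows any echoed text as plain characters rather than markup.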

Questions?