Fatma Y. ELDRESI (MPhil), Systems Analysis / Programming Specialist, AGOCO; part-time lecturer at the University of Garyounis,


NeuroSearch. Fatma Y. ELDRESI (MPhil), Systems Analysis / Programming Specialist, AGOCO; part-time lecturer at the University of Garyounis

2 Contents: Introduction; Implementation; Testing; Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine, (3) Query Engine, (4) Re-Crawler Engine (specialised crawler); Conclusions; Components of NeuroSearch and its architecture; Challenges

3 Introduction. What is a search engine? A server, or a collection of servers, dedicated to indexing internet web pages, storing the results and returning lists of pages that match particular queries. Conventional search engines generate their indexes in different ways: Google uses a spider; Yahoo uses a directory; NeuroSearch uses a spider together with the advance knowledge.

4 Introduction (cont.)
Defining the problem: (1) users find it hard to choose relevant keywords; (2) professionals sometimes fail in their search and get disappointing results, because the retrieved pages are sometimes (A) unrelated to, or (B) different from, what they are looking for.
The objective: create a specialised search engine (i.e. one that uses advance knowledge) to read web documents; index and update all the content on the local server; answer queries from the local database; update the system at a fixed interval.
Why is a specialised search engine needed? The Web has no centralised organisation. It is a huge, mixed collection of information that is updated continuously, has no standard format, and whose pages are extensively linked. Establishing standard measures of relevance is therefore a very challenging task.

5 Components of NeuroSearch. It has two components: (1) the Search/Crawler Engine; (2) the Query Engine.

6 Components explained [Diagram: Crawler Engine (Spider, Indexer, Re-crawler); Query Engine (Retriever)]

7 NeuroSearch Architecture Model [Diagram: Users, Search Engine Interface, Query Engine, Index, Indexer, WebCrawler, Re-Crawler, World Wide Web (WWW)]

8 Implementation and Case Study. The database was created using Access DB. All parts of NeuroSearch were implemented using the Java language and SQL.

9 NeuroSearch Database [Diagram: the database holds WebCrawler data, advance knowledge data, Re-crawler data, query data and indexer data]

10 The advance knowledge. Case study: Neuroscience (Vision), in three phases (Phase 1, Phase 2, Phase 3). NeuroSearch uses advance knowledge about neuroscience (vision) as a case study. Data mining is then applied to the vision domain knowledge to construct keywords and the relations between them. This knowledge is stored in the database and categorised by number; related knowledge is categorised too and stored in the database in the form of a data network.

11 Software lifecycle. The Crawler Engine consists of: 1. WebCrawler/Spider Engine; 2. Indexer Engine; 3. Re-Crawler (specialised).

12 WebCrawler (Spider)
1) This is a general-purpose web crawler that can download any kind of web page.
2) It does this by fetching a URL, retrieving all its web pages and saving them to the local drive.
3) In addition, the WebCrawler has to pass through the proxy firewall (e.g. on the Newcastle University LAN) before downloading any web site.
4) The crawler performs a breadth-first search, which means it collects a list of all the links on the current page before it follows any of the links to a new page.
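The breadth-first order described on this slide can be sketched as follows. This is a minimal illustration and not the NeuroSearch code: the class name `BreadthFirstCrawl`, the `maxPages` limit and the in-memory `links` map (which stands in for fetching and parsing real pages) are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: breadth-first traversal over an in-memory link graph.
final class BreadthFirstCrawl {
    // 'links' maps each URL to the URLs found on that page (a stand-in for
    // downloading the page and extracting its links).
    static List<String> crawl(String seed, Map<String, List<String>> links, int maxPages) {
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url); // a real crawler would save the page locally here
            // Queue every link on the current page before following any of them.
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // add() returns false for URLs already queued
                    frontier.add(next);
                }
            }
        }
        return visited;
    }
}
```

The FIFO queue is what makes the traversal breadth-first: every link found on a page is queued behind the pages discovered earlier, so all pages one link away from the seed are visited before any page two links away.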

13 WebCrawler: the real challenge. Challenge 1: connecting to the WWW and accessing private websites. Solution 1: the crawler's socket has to connect to the proxy server first. Challenge 2: connecting this socket onward to the WWW. Solution 2: the GET method. A straightforward socket connection issues GET with just the file name; in this case, however, the GET command has to carry the full URL.
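The difference between the two GET forms in Solution 2 can be illustrated with a small sketch. It only builds the request text and opens no socket. The class and method names are hypothetical, and the use of HTTP/1.0 with a Host header is an assumption, since the slide does not state the protocol version.

```java
import java.net.URI;

// Hypothetical sketch: the request a crawler would write to its socket.
final class ProxyRequest {
    // Direct connection to the origin server: GET carries only the path.
    static String directGet(URI url) {
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (url.getRawQuery() != null) {
            path += "?" + url.getRawQuery();
        }
        return "GET " + path + " HTTP/1.0\r\nHost: " + url.getHost() + "\r\n\r\n";
    }

    // Connection through a proxy: GET carries the full URL, so that the
    // proxy knows which server to forward the request to.
    static String proxyGet(URI url) {
        return "GET " + url.toASCIIString() + " HTTP/1.0\r\nHost: " + url.getHost() + "\r\n\r\n";
    }
}
```

With a socket opened to the proxy's host and port, the crawler would write the `proxyGet` text instead of the `directGet` text; the rest of the download logic stays the same.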

14 Indexer Engine
1) First, it searches the web page using its advance knowledge. The web page is deleted if it is not related to the case-study subject.
2) If the page is related to the case-study subject (neuroscience), the indexer collects the following information from the document:
3) all the keywords it contains, how many times each is repeated, the title and the contents. It then saves them in the database for later display in the query results and for other calculations.
4) The ranking method.
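Steps 1 to 3 can be sketched as a keyword count followed by a relatedness check. This is a simplified illustration under stated assumptions: the class `KeywordIndexer` and its methods are hypothetical, keywords are taken to be single words, and a page counts as related when at least one advance-knowledge keyword occurs in it. The slides do not give the actual relatedness rule or ranking method.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the indexer's keyword counting.
final class KeywordIndexer {
    // Returns keyword -> number of occurrences, for the keywords that occur in the text.
    static Map<String, Integer> countKeywords(String text, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Lower-case the text and split it into words on any non-alphanumeric run.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String keyword : keywords) {
            String k = keyword.toLowerCase(Locale.ROOT);
            int n = 0;
            for (String w : words) {
                if (w.equals(k)) {
                    n++;
                }
            }
            if (n > 0) {
                counts.put(k, n);
            }
        }
        return counts;
    }

    // Assumed rule: a page with no matching keyword is unrelated and would be discarded.
    static boolean isRelated(Map<String, Integer> counts) {
        return !counts.isEmpty();
    }
}
```

In a fuller version the counts, together with the page title and contents, would be written to the index tables with SQL rather than returned in a map.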

15 Query Engine. It has an interface that accepts keywords from the user and gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results. It searches for the query keywords in the index database and returns the results in HTML format.
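The two display choices can be sketched as a lookup over an in-memory index. Everything here is an assumption made for illustration: the `Entry` row layout, the `related` map standing in for the advance-knowledge keyword network, the reading of "most relevant" as exact keyword matches only, and ordering by repeat count. The real system queries its index database with SQL and returns HTML.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the query engine's two result modes.
final class QuerySketch {
    // One row of an assumed index table: page URL, keyword, repeat count.
    static final class Entry {
        final String url;
        final String keyword;
        final int count;

        Entry(String url, String keyword, int count) {
            this.url = url;
            this.keyword = keyword;
            this.count = count;
        }
    }

    // includeRelated = false: pages matching the query keyword only.
    // includeRelated = true: also pages matching keywords related to it.
    static List<String> search(String keyword, boolean includeRelated,
                               List<Entry> index, Map<String, List<String>> related) {
        Set<String> wanted = new HashSet<>();
        wanted.add(keyword);
        if (includeRelated) {
            wanted.addAll(related.getOrDefault(keyword, List.of()));
        }
        List<Entry> hits = new ArrayList<>();
        for (Entry e : index) {
            if (wanted.contains(e.keyword)) {
                hits.add(e);
            }
        }
        // Highest repeat count first (an assumed ranking rule).
        hits.sort((a, b) -> Integer.compare(b.count, a.count));
        List<String> urls = new ArrayList<>();
        for (Entry e : hits) {
            if (!urls.contains(e.url)) { // list each page once
                urls.add(e.url);
            }
        }
        return urls;
    }
}
```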

16 Query Result: this is indeed an edge over other conventional search engines.

17 Re-Crawling
1) The re-crawler is a WebCrawler specialised in any subject created in the advance knowledge in the database; it achieves this by reading URLs from the index database using SQL.
2) Its interface lets special users decide whether to continue crawling a website or cancel it.
3) This part of the software aims to update the index with newly found links. This makes it easier to search and crawl websites related to any advance-knowledge subject.

18 Testing phase. The test phase requires checking the first 10 ranked query results of NeuroSearch against the results of the same 10 queries on another search engine, such as Google. Categories: general keywords; specific keywords; abbreviation keywords; combined keywords; abbreviation & combined keywords. 20 tests for each category. Total of 1000 tests.

19 Testing (cont.) Ranking query test results in General Keywords. Table 1: (Query 1) Ranking query test result in General Keywords: (Eye). Columns, for the first 10 results and their average: Google search engine (Rank, Keyword, Repeated); NeuroSearch search engine (Rank, Keyword, Repeated, Related-keyword repeated); Quality/percentage. [Table values not recoverable; the Average row retains only "10%" and "100%".]

20 Testing (cont.) Chart 1: Average of keywords performance for category-based test results (Google). Chart 2: Average of keywords performance for category-based test results (NeuroSearch).

21 Analysing the search engines' ranking results by category. Table 4: The Average Ranking Engines Performance Query test results, category based.

22 Analysing the Average Ranking Engines Performance Query test results (category based). Result analysis: a t test is used to compare two groups' scores on the same variable (p value < .05). This indicates that NeuroSearch has a statistically significantly higher mean score across all category ranking results (100) than Google (52.35). The negative t values reflect the direction of the difference between the two means (Google's mean is lower than NeuroSearch's); a t test does not by itself show an inverse relation between the two sets of results.
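For readers unfamiliar with the statistic, the sketch below computes a two-sample t value. The slides do not say which form of the t test was used, so Welch's unequal-variance form is an assumption here, and the class `TTestSketch` is hypothetical. No data from the study is reproduced.

```java
// Hypothetical sketch: Welch's two-sample t statistic.
final class TTestSketch {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1); needs at least two values.
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
    static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }
}
```

The sign of the result depends only on the order of the arguments: `welchT(a, b)` is negative when the first group has the lower mean, and swapping the arguments flips the sign without changing the magnitude.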

23 Visual representation. Chart 3: Average of category-based engines' ranking performance. Chart 4: Average of keyword-based results in the documents in query test results for category-based query engines' performance.

24 Conclusion. Although the NeuroSearch search engine uses a simple algorithm to judge page quality compared with other conventional search engines, NeuroSearch proves to be very powerful in obtaining relevant results, particularly if its advance knowledge is built by a specialist (domain knowledge), e.g. oil, medicine, arts, etc.

