3 Introduction
What is a search engine?
A server, or a collection of servers, dedicated to indexing web pages, storing the results, and returning lists of pages that match particular queries.
Conventional search engines generate their indexes in different ways:
  Google uses a spider
  Yahoo uses a directory
  NeuroSearch uses a spider and advance knowledge
4 Introduction cont.
Defining the problem
(1) Users have difficulty choosing relevant keywords.
(2) Professionals sometimes fail in their searches and get disappointing results, because the retrieved pages are sometimes (A) unrelated to, or (B) different from, what they are looking for.
The objective
Create a specialised search engine (i.e. one that uses advance knowledge) to:
  read web documents;
  index and update all the content on the local server;
  answer queries from the local database;
  update the system at regular intervals.
Why is a specialised search engine needed?
The web has no centralised organisation. It is a huge, mixed collection of information that is updated continuously, has no standard format, and is extensively linked.
Therefore, establishing standard measures of relevance is a very challenging task.
5 Components of NeuroSearch
NeuroSearch has two components:
1- Search/crawler engine
2- Query engine
7 NeuroSearch Architecture Model
[Diagram components: Users, Search Engine Interface, Query Engine, Index, Indexer, WebCrawler, Re-Crawler, World Wide Web (WWW)]
8 Implementation and Case Study
The database was created with Microsoft Access.
All parts of NeuroSearch were implemented in Java and SQL.
9 NeuroSearch Database
[Diagram of the database contents: WebCrawler data, advance knowledge data, Re-crawler data, query data, indexer data]
10 The advance knowledge
Case study: Neuroscience (Vision)
[Diagram labels: Phase 1, Phase 2, Phase 3]
NeuroSearch uses advance knowledge about neuroscience (vision) as its case study. Data mining over this domain knowledge of vision is used to construct the keywords and the relations between them. The keywords are stored in the database and categorised by number; related knowledge is categorised in the same way and stored in the database as a network.
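A minimal sketch of how keywords, numeric categories, and keyword relations might be held as a network. The slides store this in database tables; the in-memory class below, its name AdvanceKnowledgeSketch, its method names, the example keywords, and the category numbers are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the advance knowledge tables. */
final class AdvanceKnowledgeSketch {
    // keyword -> numeric category (the numbers are illustrative)
    private final Map<String, Integer> categoryOf = new LinkedHashMap<>();
    // keyword -> related keywords; the undirected links form the network
    private final Map<String, Set<String>> related = new LinkedHashMap<>();

    private static String normalise(String keyword) {
        return keyword.toLowerCase(Locale.ROOT);
    }

    void addKeyword(String keyword, int category) {
        String key = normalise(keyword);
        categoryOf.put(key, category);
        related.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    /** Links two known keywords in both directions. */
    void relate(String a, String b) {
        String ka = normalise(a);
        String kb = normalise(b);
        if (!categoryOf.containsKey(ka) || !categoryOf.containsKey(kb)) {
            throw new IllegalArgumentException("Both keywords must be added first");
        }
        related.get(ka).add(kb);
        related.get(kb).add(ka);
    }

    /** Returns the category number, or null for an unknown keyword. */
    Integer categoryOf(String keyword) {
        return categoryOf.get(normalise(keyword));
    }

    /** Returns the directly related keywords, or an empty set for an unknown keyword. */
    Set<String> relatedTo(String keyword) {
        return related.getOrDefault(normalise(keyword), Set.of());
    }

    public static void main(String[] args) {
        AdvanceKnowledgeSketch knowledge = new AdvanceKnowledgeSketch();
        knowledge.addKeyword("eye", 1);
        knowledge.addKeyword("retina", 1);
        knowledge.addKeyword("visual cortex", 2);
        knowledge.relate("eye", "retina");
        knowledge.relate("retina", "visual cortex");
        System.out.println(knowledge.relatedTo("retina"));
    }
}
```

Storing the links in both directions means a lookup from either keyword finds the other, which is one simple way to read "network form".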
12 WebCrawler (Spider)
1) This is a general-purpose web crawler that can download any kind of web page. It works as follows:
2) It fetches a URL, retrieves all of its web pages, and saves them to the local drive.
3) The crawler has to pass through the proxy firewall (e.g. on the Newcastle University LAN) before it downloads any website.
4) The crawler performs a breadth-first search: it collects a list of all the links on the current page before it follows any of them to a new page.
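The breadth-first order described above can be sketched as follows. This is an illustration under assumptions, not the NeuroSearch crawler: the class name, the linksOf function (a stand-in for "download the page and extract its links"), the maxPages limit, and the tiny in-memory link graph are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

final class BreadthFirstCrawlSketch {

    /**
     * Visits pages breadth-first starting at seed and returns the visit order.
     * linksOf is a hypothetical stand-in for fetching a page and extracting its links.
     */
    static List<String> crawl(String seed, Function<String, List<String>> linksOf, int maxPages) {
        List<String> visitOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty() && visitOrder.size() < maxPages) {
            String url = frontier.remove();
            visitOrder.add(url); // a real crawler would save the page to disk here
            // Collect every link on the current page before following any of them.
            for (String link : linksOf.apply(url)) {
                if (seen.add(link)) { // enqueue each URL only once
                    frontier.add(link);
                }
            }
        }
        return visitOrder;
    }

    public static void main(String[] args) {
        // Illustrative link graph: page name -> links found on that page.
        Map<String, List<String>> web = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("D"),
                "C", List.of("D", "A"),
                "D", List.of());
        System.out.println(crawl("A", url -> web.getOrDefault(url, List.of()), 10));
    }
}
```

The FIFO queue is what makes the traversal breadth-first: all links found on one page are queued before any page reached through them is visited. The seen set keeps linked-back pages from being fetched twice.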
13 WebCrawler: the real challenge
Challenge 1: connecting to the WWW and accessing private websites.
Solution 1: the crawler first has to connect its socket to the proxy server.
Challenge 2: connecting this socket onward to the WWW.
Solution 2: the GET method. With a direct socket connection, GET needs only the file name. Through the proxy, however, the GET command has to carry the full URL.
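The difference between the two GET forms can be sketched as below. The class and method names, the HTTP/1.0 version string, the Host header, and the example URL are assumptions; the slides do not show the actual request text, and the proxy host and port passed to fetchViaProxy are placeholders to be supplied by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ProxyGetSketch {

    /** Request for a direct connection to the web server: only the path is named. */
    static String directRequest(URI url) {
        String path = (url.getRawPath() == null || url.getRawPath().isEmpty()) ? "/" : url.getRawPath();
        if (url.getRawQuery() != null) {
            path += "?" + url.getRawQuery();
        }
        return "GET " + path + " HTTP/1.0\r\nHost: " + url.getHost() + "\r\n\r\n";
    }

    /** Request sent to a proxy: the request line carries the full URL. */
    static String proxyRequest(URI url) {
        return "GET " + url.toASCIIString() + " HTTP/1.0\r\nHost: " + url.getHost() + "\r\n\r\n";
    }

    /** Connects to the proxy first, sends the full-URL request, and returns the raw response. */
    static byte[] fetchViaProxy(String proxyHost, int proxyPort, URI url) throws IOException {
        try (Socket socket = new Socket(proxyHost, proxyPort)) {
            OutputStream out = socket.getOutputStream();
            out.write(proxyRequest(url).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toByteArray();
        }
    }

    public static void main(String[] args) {
        URI url = URI.create("http://www.example.com/index.html");
        System.out.print(directRequest(url)); // request line names only /index.html
        System.out.print(proxyRequest(url));  // request line names the full URL
    }
}
```

The proxy needs the full URL because the socket is connected to the proxy, not to the target site, so the request line is the only place the proxy can learn which server to contact.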
14 Indexer Engine
1) First, the indexer searches the web page using its advance knowledge. The page is deleted if it is not related to the case study subject.
2) If the page is related to the case study subject (neuroscience), the indexer collects the following information from the document:
3) All the keywords it contains, how many times each is repeated, the title, and the contents. These are saved in the database for later display in the query results and for other calculations.
4) The ranking method
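The filtering and counting steps above can be sketched as follows. Several details are assumptions, since the slides do not give them: a page is treated as related when it contains at least one known keyword, matching is whole-word and case-insensitive, and the class, the Entry record, and the example keywords are hypothetical. The ranking method itself is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexerSketch {

    /** Hypothetical index record holding what is kept for each page. */
    static final class Entry {
        final String url;
        final String title;
        final String content;
        final Map<String, Integer> keywordCounts;

        Entry(String url, String title, String content, Map<String, Integer> keywordCounts) {
            this.url = url;
            this.title = title;
            this.content = content;
            this.keywordCounts = keywordCounts;
        }
    }

    /** Counts whole-word, case-insensitive occurrences of each known keyword in the text. */
    static Map<String, Integer> countKeywords(String text, Set<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            Pattern pattern = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);
            int n = 0;
            while (matcher.find()) {
                n++;
            }
            if (n > 0) {
                counts.put(keyword, n);
            }
        }
        return counts;
    }

    /** Returns an index entry, or empty when no known keyword occurs (the page is discarded). */
    static Optional<Entry> index(String url, String title, String content, Set<String> keywords) {
        Map<String, Integer> counts = countKeywords(title + " " + content, keywords);
        return counts.isEmpty()
                ? Optional.empty()
                : Optional.of(new Entry(url, title, content, counts));
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("eye", "retina"); // illustrative advance knowledge
        index("http://example.org/vision", "The eye", "Light reaches the retina of the eye.", keywords)
                .ifPresent(entry -> System.out.println(entry.url + " " + entry.keywordCounts));
    }
}
```

In a database-backed version, each Entry would become a row (or a row per keyword) written with SQL rather than an object kept in memory.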
15 Query Engine
It has an interface that accepts keywords from the user.
It gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results.
It searches for the query keywords in the index database and returns the results in HTML format.
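The two display choices can be sketched as below. How NeuroSearch decides what counts as "most relevant" is not stated, so this sketch assumes that pages containing the query keyword itself are the most relevant and that pages matched only through related keywords are the related results. The in-memory index, the scoring by repeat count, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QueryEngineSketch {

    /**
     * Returns page URLs ranked by descending keyword count.
     * index maps URL -> (keyword -> repeat count); related keywords count only if requested.
     */
    static List<String> search(Map<String, Map<String, Integer>> index, String keyword,
                               Set<String> relatedKeywords, boolean includeRelated) {
        List<Map.Entry<String, Integer>> scored = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> page : index.entrySet()) {
            int score = page.getValue().getOrDefault(keyword, 0);
            if (includeRelated) {
                for (String related : relatedKeywords) {
                    score += page.getValue().getOrDefault(related, 0);
                }
            }
            if (score > 0) {
                scored.add(Map.entry(page.getKey(), score));
            }
        }
        // Highest score first; ties are broken by URL so the order is deterministic.
        scored.sort((a, b) -> {
            int byScore = Integer.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : scored) {
            urls.add(entry.getKey());
        }
        return urls;
    }

    /** Renders the ranked URLs as an HTML ordered list, escaping special characters. */
    static String toHtml(List<String> urls) {
        StringBuilder html = new StringBuilder("<ol>\n");
        for (String url : urls) {
            String safe = url.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
            html.append("  <li><a href=\"").append(safe).append("\">")
                    .append(safe).append("</a></li>\n");
        }
        return html.append("</ol>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = Map.of(
                "http://example.org/a", Map.of("eye", 3),
                "http://example.org/b", Map.of("eye", 5, "retina", 1),
                "http://example.org/c", Map.of("retina", 4));
        System.out.print(toHtml(search(index, "eye", Set.of("retina"), true)));
    }
}
```

With includeRelated set to false, a page that mentions only related keywords is left out; with it set to true, the same page appears in the whole result set.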
16 Query Result
This is indeed an advantage over other conventional search engines.
17 Re-Crawling
1- The WebCrawler is specialised to any subject created in the advance knowledge in the database; it achieves this by reading the URLs from the index database using SQL.
2- Its interface allows special users to decide whether to continue crawling a website or to cancel.
3- This part of the software aims to update the index with newly found links. This makes it easier to search and crawl websites related to any advance knowledge subject.
18 Testing phase
The test phase compares the first 10 ranked results of NeuroSearch for each query with the first 10 results of another search engine, such as Google, for the same query.
Keyword categories:
  general keywords
  specific keywords
  abbreviation keywords
  combined keywords
  abbreviation & combined keywords
20 tests for each category
Total of 1000 tests
19 Testing cont.
Ranking query test results in General Keywords
Table 1: (Query 1) Ranking query test result in General Keywords: (Eye)
[Table column headings: Search Engine (Google, NeuroSearch), First 10 results, Rank, Keyword repeated, Rank, Keyword repeated, Related-keyword repeated, Quality/percentage. Rows: the first 10 results and an Average row. The only cell values that survive are 10% and 100% in the Average row.]
20 Testing cont.
Chart 1: Average keyword performance for the category-based test results (Google)
Chart 2: Average keyword performance for the category-based test results (NeuroSearch)
21 Analysing the search engines' ranking results by category
Table 4. The average ranking performance of the engines in the category-based query test results
22 Analysing the average ranking performance of the engines (category-based query test results)
Result analysis
A t-test is used to compare two groups' scores on the same variable (p value < .05).
This indicates that NeuroSearch has a statistically significantly higher mean score across all categories of ranking results (100) than Google (52.35).
The negative t values reflect the direction of the difference: Google's mean is lower than NeuroSearch's. On their own they do not show an inverse relation in which Google's results decrease as NeuroSearch's increase.
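A minimal sketch of how a two-group t statistic is computed and why its sign depends on which group is listed first. The slides do not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the scores in main are made up for illustration; they are not the study's data.

```java
final class WelchTSketch {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += (v - m) * (v - m);
        }
        return sumOfSquares / (x.length - 1);
    }

    /** Welch's t statistic for mean(first) - mean(second). */
    static double welchT(double[] first, double[] second) {
        if (first.length < 2 || second.length < 2) {
            throw new IllegalArgumentException("Each group needs at least two scores");
        }
        double standardError = Math.sqrt(sampleVariance(first) / first.length
                + sampleVariance(second) / second.length);
        if (standardError == 0.0) {
            throw new IllegalArgumentException("t is undefined when both groups have zero variance");
        }
        return (mean(first) - mean(second)) / standardError;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only; these are NOT the study's data.
        double[] engineA = {40, 50, 60};
        double[] engineB = {90, 95, 100};
        System.out.println(welchT(engineA, engineB)); // negative: engineA's mean is lower
        System.out.println(welchT(engineB, engineA)); // same magnitude, positive
    }
}
```

Swapping the argument order flips the sign of t without changing its magnitude, so the sign records only which group has the lower mean.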
23 Visual representation
Chart 3: Average category-based ranking performance of the engines
Chart 4: Average keyword-based performance of the engines in the documents returned by the category-based query tests
24 Conclusion
Although NeuroSearch uses a simple algorithm to judge page quality compared with other conventional search engines, it proves to be very powerful in obtaining relevant results, particularly if its advance knowledge is built by a specialist (domain knowledge), e.g. in oil, medicine, or the arts.