Quantitative Comparisons of Search Engine Results. Mike Thelwall, School of Computing and Information Technology, University of Wolverhampton, UK


1 Quantitative Comparisons of Search Engine Results
Mike Thelwall, School of Computing and Information Technology, University of Wolverhampton, UK
Journal of the American Society for Information Science and Technology, 2008

2 Abstract
Search engines
–To find information or web sites
Webometrics
–Finding and measuring web-based phenomena
Comparing the application programming interfaces (APIs)
–Google, Yahoo!, Live Search
Webometric applications
–hit count, number of URLs, number of domains, number of web sites, number of top-level domains

3 Search Engines and Web Crawlers
Three key operations:
–Crawling: identifying, downloading and storing pages in a database
–Results matching: the search engine identifies the pages in its database that match a user query

4 Search Engines and Web Crawlers
–Results ranking: the search engine arranges the matching URLs to maximize the probability that a relevant result appears on the first or second page of results
Search terms
Occurrence frequency
Number of clicks

5 Research Objectives
–Are there specific anomalies that make the hit count estimates (HCEs) of Google, Live Search or Yahoo! unreliable for particular values?
–How consistent are Google, Live Search and Yahoo! in the number of URLs returned for a search, and which of them typically returns the most URLs?
–How consistent are the search engines in terms of the spread of results (sites, domains and top-level domains), and which search engine gives the widest spread of results for a search?

6 Data
1,587 words
–Blogs
–Word frequency
–http://cybermetrics.wlv.ac.uk/paperdata/
Three search engines
–Google, Yahoo! and Live Search
–1,000 pages
Five webometric measures
–hit count, number of URLs, number of domains, number of web sites, number of top-level domains
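As a rough illustration of the four spread measures listed above, the sketch below counts distinct URLs, domains, sites and top-level domains (TLDs) in a list of result URLs. It assumes that a "domain" is the full host name and a "site" is the registrable name (for example wlv.ac.uk); the slides do not give these definitions. The function names, the small suffix set and the sample URLs are all hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: a tiny, hand-picked set of two-part public suffixes. A real study
# would need a complete list (for example the Public Suffix List).
TWO_PART_SUFFIXES = {"ac.uk", "co.uk", "org.uk", "gov.uk", "com.au", "edu.au", "co.jp"}


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if none is present."""
    return (urlsplit(url).hostname or "").lower()


def site_of(host):
    """Heuristic: keep the public suffix plus one more label."""
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])


def spread_metrics(urls):
    """Count distinct URLs, domains (hosts), sites and TLDs in a result list."""
    unique_urls = set(urls)
    hosts = {host_of(u) for u in unique_urls} - {""}
    return {
        "urls": len(unique_urls),
        "domains": len(hosts),
        "sites": len({site_of(h) for h in hosts}),
        "tlds": len({h.rsplit(".", 1)[-1] for h in hosts}),
    }


# Made-up result list, for illustration only (one URL is a duplicate).
results = [
    "http://www.wlv.ac.uk/a",
    "http://cybermetrics.wlv.ac.uk/paperdata/",
    "http://www.example.com/",
    "http://example.com/page",
    "http://www.example.com/",
]
print(spread_metrics(results))
```

The hit count is not computed here because it is an estimate reported by the search engine itself, not something derived from the returned URLs.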

7 Results - 1: Hit count estimates
Figure 2a,b,c. Hit count estimates of the three search engines compared (logarithmic scales, excluding data with zero values; r=0.80, 0.96, 0.83).
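A comparison of this kind can be sketched as follows, assuming that r denotes a Pearson correlation computed on log-transformed hit counts after dropping word pairs with a zero value; the slide does not state which coefficient was used. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_correlation(hits_a, hits_b):
    """Correlate log10 hit counts, skipping pairs where either count is zero."""
    pairs = [
        (math.log10(a), math.log10(b))
        for a, b in zip(hits_a, hits_b)
        if a > 0 and b > 0
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up hit counts for four query words from two engines.
engine_a = [10, 100, 1000, 0]      # the last word is excluded (zero hits)
engine_b = [100, 1000, 10000, 5]
print(round(log_correlation(engine_a, engine_b), 3))
```

Working on a logarithmic scale keeps a handful of very common words from dominating the coefficient, which is presumably why the figure uses it.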

8 Results - 2: Number of URLs returned
Figure 3a,b,c. URLs returned by the three search engines compared (r=0.71, 0.68, 0.84).

9 Results - 3: Number of domains returned
Figure 4a,b,c. Domains returned by the three search engines compared (r=0.65, 0.69, 0.83).

10 Results - 4: Number of sites returned
Figure 5a,b,c. Sites returned by the three search engines compared (r=0.66, 0.69, 0.81).

11 Results - 5: Number of TLDs returned
Figure 6a,b,c. TLDs returned by the three search engines compared (r=0.74, 0.77, 0.84).

12 Results - 6: Comparison within results

13 Conclusion
–Google seems to be the most consistent in terms of the relationship between its hit count estimates (HCEs) and the number of URLs returned.
–Yahoo! is recommended if the objective is to get results from the widest variety of web sites, domains or TLDs.

14 Evaluating Search Engine Effects on Web-based Relatedness Measurement

15 Snippets
Six manifest records
–snippets
–hit count
–number of URLs
–number of domains
–number of web sites
–number of top-level domains

16 Dataset
WordSimilarity-353 Test Collection (TC-353)
–TC-353 Full (353 pairs)
–TC-353 Testing (153 pairs)
Three well-known search engines
–Yahoo!
–Google
–Live Search
Five domains
–general web search (web09)
–.com
–.edu
–.net
–.org

17 The Model
A web-based relatedness measure Web Metric (X, Y) quantifies the association of two objects X and Y:
Web Metric (X, Y) = F(d(X, Y))
–where F is a transfer function and d is a dependency score.
The dependency score d reflects the mutual dependency of X and Y on the web.

18 The Model
Given a search engine G and two objects X and Y
–we employ two double-checking functions, f_G(Y@X) and f_G(X@Y), to estimate the dependence between X and Y
Web Metric (X, Y) =

19 Figure 8. Behaviors of the Gompertz Curve and a Mapping Example
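For readers unfamiliar with the curve named in Figure 8, the sketch below evaluates the standard Gompertz function y = a * exp(-b * exp(-c * x)) and uses it as a stand-in for a transfer function F applied to a dependency score d. The parameter values and the sample scores are hypothetical; the transcript does not preserve the parameters or the exact mapping used in the presentation.

```python
import math


def gompertz(x, a=1.0, b=5.0, c=1.0):
    """Standard Gompertz curve: a is the upper asymptote, b shifts the curve
    along the x axis, and c sets the growth rate. For positive a, b and c the
    curve rises monotonically from near 0 towards a."""
    return a * math.exp(-b * math.exp(-c * x))


def web_metric(dependency_score):
    """Hypothetical transfer function F: map a dependency score d to (0, 1)."""
    return gompertz(dependency_score)


# Made-up dependency scores, for illustration only.
for d in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(d, round(web_metric(d), 4))
```

An S-shaped transfer function of this kind compresses very small and very large dependency scores while spreading out the mid-range, which is one plausible reason for choosing it.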

20 Experiments
Web Metric (X, Y) =


