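Webometrics. Director of the Office of Library and Information Services: 官大智. Presenters: 陳嘉平 (Head of the Information Applications Division, Office of Library and Information Services) and research assistants 江欣倩 and 葉佳璋.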

2 2 History –Since 2004, the Webometrics ranking has been published twice a year (January and July). –The ranking covers more than 16,000 higher education institutions. –The most recent ranking is the January 2009 edition.

3 3 Methodology –The unit of analysis is the institutional domain, so only universities and research centers with an independent web domain are considered. –University activity is multi-dimensional, so the ranking combines a group of web-presence indicators that measure these different aspects.

4 4 Indicators –Size: the number of pages in a domain (as recovered by search engines) –Visibility: the number of unique external links received by a domain –Rich Files: the number of files of certain file types in a domain –Scholar: the number of papers and citations in a domain

12 12 Metrics –For each indicator, the universities are ranked. –The ranks on the four indicators are then combined into a single rank according to a formula (shown on the original slide, not reproduced in this transcript).
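Since the combining formula is missing from the transcript, the following is only a sketch of how such a combination could work. It assumes the weights Webometrics published for its editions of that period (visibility 50%, size 20%, rich files 15%, scholar 15%); these weights and the names `WEIGHTS` and `combined_score` are assumptions, not taken from the slide. The sample rows are copied from the "Ranking of Interests" slide, reading its four numeric columns as Size, Visibility, Rich Files, Scholar.

```python
# Assumed weights in percent (not stated on the slide).
WEIGHTS = {"size": 20, "visibility": 50, "rich_files": 15, "scholar": 15}


def combined_score(ranks):
    """Weighted sum of indicator ranks; a lower score means a better position."""
    return sum(WEIGHTS[name] * ranks[name] for name in WEIGHTS)


# Indicator ranks copied from the "Ranking of Interests" slide.
rows = {
    "National_Sun_Yat-Sen_University": {"size": 333, "visibility": 405, "rich_files": 348, "scholar": 28},
    "National_Taiwan_University": {"size": 116, "visibility": 87, "rich_files": 46, "scholar": 13},
    "National_Cheng_Kung_University": {"size": 316, "visibility": 421, "rich_files": 235, "scholar": 53},
    "National_Chiao_Tung_University": {"size": 90, "visibility": 178, "rich_files": 171, "scholar": 590},
    "National_Taiwan_Normal_University": {"size": 211, "visibility": 270, "rich_files": 400, "scholar": 527},
}

# Sort from best (lowest weighted rank sum) to worst.
ordered = sorted(rows, key=lambda name: combined_score(rows[name]))
for name in ordered:
    print(name, combined_score(rows[name]) / 100)
```

Under these assumed weights, the five sample institutions come out in the same order as their listed world ranks (55, 179, 273, 274, 282), which suggests the assumption is at least consistent with the table.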

13 13 Verifiable Data –The only source of data for this ranking is a small set of globally available, freely accessible search engines. –All the results can be reproduced by following the described methodology, taking into account the explosive growth of web content, its volatility, and the irregular behavior of the commercial engines.

14 14 Bad Practices –The use of link farms and paid backlinks to improve an institution's position in this ranking is not acceptable. –Institutions involved in such practices have no place in this ranking and will not be classified in future editions. –Random checks are made to ensure the correctness of the data obtained.

15 15 Ranking of Interests
(Indicator columns are assumed to follow the order of the Indicators slide: Size, Visibility, Rich Files, Scholar.)
No.  World rank  Institution                             Size   Visibility  Rich Files  Scholar
1    55          "National_Taiwan_University"            116    87          46          13
2    179         "National_Chiao_Tung_University"        90     178         171         590
3    273         "National_Taiwan_Normal_University"     211    270         400         527
4    274         "National_Cheng_Kung_University"        316    421         235         53
5    282         "National_Sun_Yat-Sen_University"       333    405         348         28
6    308         "National_Tsing_Hua_University_Taiwan"  161    502         220         328
7    370         "National_Central_University"           427    576         355         30
8    384         "National_Chung_Cheng_University"       321    446         390         644
9    391         "National_Chengchi_University"          336    414         492         675
10   491         "Tamkang_University"                    461    563         806         550
11   529         "I-Shou_University"                     318    855         397         469
12   564         "National_Chung_Hsing_University"       448    713         529         861
We need to work on size, visibility, and rich files, while keeping our strength in scholar.

16 16 Ranking of Interests
(Indicator columns are assumed to follow the order of the Indicators slide: Size, Visibility, Rich Files, Scholar.)
No.  World rank  Institution                             Size   Visibility  Rich Files  Scholar
13   659         "Providence_University"                 543    1,067       600         280
14   716         "Fu_Jen_Catholic_University"            622    848         610         1,321
15   748         "Feng_Chia_University"                  616    1,191       959         156
16   772         "Yuan_Ze_University"                    409    1,021       574         1,564
17   836         "NTUST"                                 1,068  1,325       563         120
18   851         "Shih_Hsin_University"                  617    1,356       948         336
19   896         "Tunghai_University"                    418    1,235       810         1,470
20   905         "National_Dong_Hwa_U"                   816    1,544       580         200
21   914         "Soochow_University_Taiwan"             560    1,393       1,042       665
22   921         "Chaoyang_University_of_T"              885    1,500       803         174
23   924         "NYUST"                                 719    1,470       1,212       109

17 17 URL Naming –Each institution should choose a unique institutional domain that can be used by all the websites of the institution. –Avoid changing the institutional domain, as it has a devastating effect on the visibility values. –Alternative or mirror domains should be disregarded. –Use well-known acronyms. –Consider including a descriptive word, such as the name of the city, in the domain name. –Change IP addresses to domain names!

18 18 Content: Create –Allow a large proportion of staff, researchers or graduate students to be potential authors. Individual persons or teams should maintain their own websites. –Libraries, documentation centers and similar services can be responsible for large databases, including bibliographic ones, and large repositories (theses, pre-prints, and reports). –Hosting external resources can be of interest to third parties and increase visibility: conference websites, software repositories, scientific societies and their publications, especially electronic journals.

19 19 Content: Convert –Important resources available in non-electronic format can be converted to web pages easily. –Most universities have a long record of activities that can be published in “historical web sites”. –Other candidates for conversion include reports on past activities and picture collections.

20 20 Interlinking –Measuring and classifying the links from others can be insightful. –You should expect links from your “natural” partners: your locality or region, similar organizations, portals covering your topics, and the personal pages of colleagues or partners. –Make an impact in your common language community. –Check for orphaned pages, i.e. pages not linked from any other page. –The most popular pages or directories are relevant.
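The orphaned-page check mentioned above can be sketched as follows. This is a minimal illustration, assuming the site's internal links have already been collected into a dictionary that maps each page to the pages it links to; the function name `find_orphans` and the sample paths are hypothetical.

```python
def find_orphans(links, home):
    """Return pages that no other page links to, excluding the home page."""
    linked = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked.add(target)
    return sorted(page for page in links if page not in linked and page != home)


# Hypothetical link map: page -> pages it links to.
site = {
    "/": ["/about", "/research"],
    "/about": ["/"],
    "/research": ["/research/papers"],
    "/research/papers": [],
    "/old-report": ["/"],
}

print(find_orphans(site, home="/"))
```

Here `/old-report` links out but receives no inbound link, so a crawler that starts from the home page would never reach it.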

21 21 Language –The WWW audience is truly global, so one should not think locally. –Language versions, especially in English, are mandatory not only for the main pages but also for selected sections such as scientific documents.

22 22 Rich Files –Although HTML is the standard format of web pages, sometimes it is better to use rich file formats. –Provide versions in different formats.

23 23 Search Engine Issues –Use a search-engine-friendly design. –Avoid cumbersome navigation menus based on Flash, Java or JavaScript that can block robot access. –Deeply nested directories or complex interlinking can block robots too. –Databases and even highly dynamic pages can be invisible to some search engines, so use directories or static pages instead, or offer them as an option. –Plain is good.

24 24 Archiving –Maintain a copy of old or outdated material in the site. –Archive media materials in web repositories. Collections of videos, interviews, presentations, animated graphs, and even digital pictures could be very useful in the long term.

25 25 Standards for Sites –The use of meaningful titles and descriptive meta-tags can increase the visibility of the pages. –Add authoring info, keywords and other data about the web sites.

26 26 Challenge –If the web performance of an institution is below the position expected from its academic excellence, university authorities should reconsider their web policy and promote substantial increases in the volume and quality of their electronic publications. –Again, NSYSU needs to improve on size, visibility, and rich files, while keeping its strength in scholar.

30 30 Experiments –For each institutional domain, we collect the data from search engines, following the described methodology. –Then we compare our ranking against the Webometrics ranking. –We need to verify whether our data agree with theirs. They may not agree exactly, but we can evaluate the correlation.

31 31 Size –Number of pages recovered from four engines: Google, Yahoo, Live Search and Exalead. –For each engine, results are log-normalized to 1 for the highest value. –For each domain, the maximum and minimum results are excluded. –An institution is assigned a rank according to the combined sum.
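The four steps above can be sketched as follows. The slide does not give the exact normalization, so the form log(count + 1) / log(max + 1) used here is an assumption, as are the function names, the placeholder domains and the made-up page counts.

```python
import math


def log_normalize(counts):
    """Scale log counts so the largest count of one engine maps to 1.

    Assumes the largest count is greater than zero.
    """
    top = max(counts.values())
    return {d: math.log(c + 1) / math.log(top + 1) for d, c in counts.items()}


def size_scores(per_engine):
    """Per domain: drop the lowest and highest normalized value, sum the rest."""
    normalized = {engine: log_normalize(c) for engine, c in per_engine.items()}
    domains = next(iter(per_engine.values())).keys()
    scores = {}
    for d in domains:
        values = sorted(normalized[engine][d] for engine in normalized)
        scores[d] = sum(values[1:-1])  # exclude the minimum and the maximum
    return scores


def to_ranks(scores):
    """Rank 1 goes to the domain with the largest combined sum."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {d: i + 1 for i, d in enumerate(ordered)}


# Made-up page counts for three placeholder domains.
per_engine = {
    "google": {"a.example.edu": 50000, "b.example.edu": 4000, "c.example.edu": 300},
    "yahoo": {"a.example.edu": 80000, "b.example.edu": 9000, "c.example.edu": 200},
    "live": {"a.example.edu": 20000, "b.example.edu": 2500, "c.example.edu": 900},
    "exalead": {"a.example.edu": 6000, "b.example.edu": 700, "c.example.edu": 50},
}

scores = size_scores(per_engine)
ranks = to_ranks(scores)
print(ranks)
```

Dropping the two extreme values per domain limits the influence of a single engine that over- or under-reports a site.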

32 32 Visibility –The total number of unique external links received by a site –Data gathered from Yahoo, Live and Exalead (Google excluded) –For each engine, results are log-normalized to 1 for the highest value. –An institution is assigned a rank according to the combined sum.

33 33 Rich Files –Four different file formats: Adobe Acrobat (pdf), Adobe PostScript (ps), Microsoft Word (doc), Microsoft PowerPoint (ppt). –Data (number of files) are extracted using Google. –The results for each file type are merged after log-normalization, in the same way as described before.
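The slide does not say how the file counts are extracted from Google. One plausible way, assumed here, is to combine Google's `site:` and `filetype:` search operators, one query per format. The function name `rich_file_queries` and the domain `example.edu` are placeholders; the sketch only builds the query strings and does not contact any search engine.

```python
# File extensions listed on the slide.
RICH_FILE_TYPES = ("pdf", "ps", "doc", "ppt")


def rich_file_queries(domain):
    """Build one query string per rich file type for the given domain."""
    return {ext: f"site:{domain} filetype:{ext}" for ext in RICH_FILE_TYPES}


for ext, query in rich_file_queries("example.edu").items():
    print(ext, "->", query)
```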

34 34 Scholar –Google Scholar provides the number of papers and citations for each academic domain. –These results from the Scholar database represent papers, reports and other academic items.

35 35 Number of Swaps –For two rankings of domains (institutions), say r and s, the number of swaps needed to bring ranking r to s is defined computationally as follows. If the top-ranked domain in s, say x, ranks 5th in r, then 4 swaps are needed to bring x to the top. Next, find the second-ranked domain of s in r and bring it to second place. Continue until the entire order is correct, accumulating the number of swaps, say N. –Smaller N is better.
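The procedure above can be sketched directly. This is a minimal illustration that assumes both rankings contain exactly the same domains; the function name `count_swaps` and the single-letter domain labels are hypothetical.

```python
def count_swaps(r, s):
    """Count adjacent swaps needed to turn ranking r into ranking s."""
    work = list(r)
    swaps = 0
    for target_pos, domain in enumerate(s):
        current_pos = work.index(domain)
        # Moving the domain up one place at a time costs one swap per place.
        swaps += current_pos - target_pos
        work.insert(target_pos, work.pop(current_pos))
    return swaps


r = ["a", "b", "c", "d", "x"]
s = ["x", "a", "b", "c", "d"]
print(count_swaps(r, s))  # x is 5th in r and 1st in s, so 4 swaps
```

Each swap of this kind removes exactly one pair of domains that the two rankings order differently, so N equals the number of such discordant pairs.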

36 36 Test 1: 03/27/2009 –Scholar (n = 23): N = 17 –Size (n = 23): N = 28 –Rich files (n = 23): N = 62 –Scholar (n = 100): N = 555 –Note that the worst case is n(n-1)/2 swaps, and a random ranking needs about n(n-1)/4 swaps on average.
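To put the reported N values in context, the sketch below compares each one with the worst case n(n-1)/2 and converts it to a rank correlation. The conversion 1 - 2N / (n(n-1)/2) is Kendall's tau for rankings without ties; the slides do not name this coefficient, so treating it as the intended correlation measure is an assumption. The function names are hypothetical.

```python
def max_swaps(n):
    """Worst case: every pair of domains is in the opposite order."""
    return n * (n - 1) // 2


def kendall_tau(n_swaps, n):
    """Rank correlation from a swap count: 1 is identical, -1 is reversed."""
    return 1 - 2 * n_swaps / max_swaps(n)


# (indicator, n, N) as reported for Test 1.
results = [
    ("Scholar", 23, 17),
    ("Size", 23, 28),
    ("Rich files", 23, 62),
    ("Scholar", 100, 555),
]

for name, n, swaps in results:
    print(f"{name} (n = {n}): {swaps} of at most {max_swaps(n)} swaps, "
          f"tau = {kendall_tau(swaps, n):.3f}")
```

A random ranking, at about n(n-1)/4 swaps, corresponds to a tau near 0, so all four reported results sit well on the positively correlated side.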

