
NAS_qual reports

NAS_qual - 1

A Java batch that works on Heritrix reports (extracted from the metadata W/ARC files). It compiles a large set of figures and lists and stores them in text files.

21 figures:
– processed URLs
– harvested URLs
– harvested seeds
– non-harvested seeds
– harvested hosts
– harvested domains
– non-harvested domains
– TLDs
– MIME types
– harvest duration
– average URL/s
– average Kb/s
– average job size in URLs
– average seeds per job
– average job size
– URLs not harvested because of robots exclusion
– total raw size
– number of W/ARC files
– size of W/ARC files
– number of processed jobs
– list of processed jobs
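The averaged figures in the list above can be illustrated with a small sketch. This is not the NAS_qual code: the class `HarvestAverages`, the `JobStats` holder, and its field names are hypothetical, the averages are assumed to be ratios of totals over all processed jobs, and 1 Kb is assumed to mean 1024 bytes.

```java
import java.util.Arrays;
import java.util.List;

public class HarvestAverages {

    /** Hypothetical per-job totals; the field names are illustrative only. */
    static final class JobStats {
        final long urls;
        final long bytes;
        final long seeds;
        final long durationSeconds;

        JobStats(long urls, long bytes, long seeds, long durationSeconds) {
            this.urls = urls;
            this.bytes = bytes;
            this.seeds = seeds;
            this.durationSeconds = durationSeconds;
        }
    }

    /** Average URL/s: total URLs divided by total harvest duration (assumption). */
    static double averageUrlsPerSecond(List<JobStats> jobs) {
        long urls = 0, seconds = 0;
        for (JobStats j : jobs) {
            urls += j.urls;
            seconds += j.durationSeconds;
        }
        return seconds == 0 ? 0.0 : (double) urls / seconds;
    }

    /** Average Kb/s, assuming 1 Kb = 1024 bytes. */
    static double averageKbPerSecond(List<JobStats> jobs) {
        long bytes = 0, seconds = 0;
        for (JobStats j : jobs) {
            bytes += j.bytes;
            seconds += j.durationSeconds;
        }
        return seconds == 0 ? 0.0 : (bytes / 1024.0) / seconds;
    }

    /** Average job size in URLs: total URLs divided by the number of jobs. */
    static double averageJobSizeInUrls(List<JobStats> jobs) {
        long urls = 0;
        for (JobStats j : jobs) {
            urls += j.urls;
        }
        return jobs.isEmpty() ? 0.0 : (double) urls / jobs.size();
    }

    /** Average seeds per job: total seeds divided by the number of jobs. */
    static double averageSeedsPerJob(List<JobStats> jobs) {
        long seeds = 0;
        for (JobStats j : jobs) {
            seeds += j.seeds;
        }
        return jobs.isEmpty() ? 0.0 : (double) seeds / jobs.size();
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<JobStats> jobs = Arrays.asList(
                new JobStats(1000, 2048000, 10, 100),
                new JobStats(3000, 2048000, 30, 100));
        System.out.println("average URL/s: " + averageUrlsPerSecond(jobs));
        System.out.println("average Kb/s: " + averageKbPerSecond(jobs));
        System.out.println("average job size in URLs: " + averageJobSizeInUrls(jobs));
        System.out.println("average seeds per job: " + averageSeedsPerJob(jobs));
    }
}
```

Dividing totals by totals weights long jobs more heavily than averaging each job's own rate would; which of the two the batch actually uses is not stated in the slides.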

NAS_qual - 2

01-codehttp_url.txt: URL distribution per HTTP response code.
02-typemime_url_octets.txt: URL and bytes distribution per MIME type.
03-tld_url_octets.txt: URL and bytes distribution per TLD.
04-tld-hotes.txt: host distribution per TLD.
05-tld-domaines.txt: domain distribution per TLD.
06-tranches_hotes_url.txt: number of hosts in a given slice of harvested URLs.
–= =100001;
07-tranches_domaines_url.txt: same with domains.
08-tranches_domaines_hotes.txt: same with hosts on domains.
09-tld2ndniveau_url_octets.txt: URL and bytes distribution per second-level TLD.
10-tld2ndniveau_hotes.txt: host distribution per second-level TLD.
11-top_domaines_url_octets.txt: URL and bytes distribution for the N largest domains.
12-top_hotes_url_octets.txt: URL and bytes distribution for the N largest hosts.
13-top_domaines_hotes.txt: list of domains having the largest number of hosts.
14-codereponse_seeds.txt: distribution of seeds per response code.
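As an illustration of what a "URL and bytes distribution per TLD" involves, here is a minimal sketch. It is not taken from NAS_qual: the class `TldDistribution`, its method names, and the input shape (parallel arrays of host names and response sizes) are assumptions, and the TLD is naively taken as the text after the last dot of the host name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TldDistribution {

    /** Naive TLD extraction: the part of the host name after the last dot. */
    static String tldOf(String host) {
        int dot = host.lastIndexOf('.');
        String tld = dot < 0 ? host : host.substring(dot + 1);
        return tld.toLowerCase(Locale.ROOT);
    }

    /**
     * Returns, for each TLD, a two-element array {urlCount, byteCount}.
     * hosts[i] is the host of the i-th harvested URL and sizes[i] its size in bytes.
     */
    static Map<String, long[]> distribution(String[] hosts, long[] sizes) {
        Map<String, long[]> perTld = new TreeMap<>();
        for (int i = 0; i < hosts.length; i++) {
            long[] totals = perTld.computeIfAbsent(tldOf(hosts[i]), k -> new long[2]);
            totals[0]++;            // one more URL for this TLD
            totals[1] += sizes[i];  // add this URL's bytes
        }
        return perTld;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        String[] hosts = {"www.example.fr", "example.fr", "example.com"};
        long[] sizes = {100, 50, 10};
        for (Map.Entry<String, long[]> e : distribution(hosts, sizes).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()[0] + "\t" + e.getValue()[1]);
        }
    }
}
```

The same accumulate-by-key pattern would apply to the per-MIME-type and per-response-code files, with a different key function in place of `tldOf`.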

