Evaluation Workshop: Quantitative Evaluation Methods
Peter Dowdell
NOF-digitise Technical Advisory Service
email: firstname.lastname@example.org
web: http://www.ukoln.ac.uk
Aims of this presentation:
– To explain the need to establish a measurement policy using a range of performance indicators
– To explain some of the possible measurements we can record
– To discuss some of the pitfalls and problem areas in attempting to measure performance
Why Have Performance Indicators?
Performance indicators for Web sites can be used for several purposes:
– In management reports showing service growth
– For Service Level Agreements with funding agencies
– To identify gaps in service provision
– To predict and plan for future load patterns
– To monitor performance levels
– To advise on the deployment of new technologies
– To inform and motivate contributors
Why have a measurement policy?
– To create stable, consistent viewpoints on your site
– To make sense of the available data
– To clarify and measure broader objectives
– To answer specific questions about how the site is used
What are we logging?
We log each request the web server receives, recording information such as:
– remote IP address
– date and time
– response (status) code
– request string
– method (GET, POST, etc.)
– execution time
– data transferred
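To make the list above concrete, here is a minimal sketch of one logged request as a Python record. The field names, the choice of milliseconds for execution time, and the to_w3c_line helper are all assumptions made for illustration, not part of any server's API. The helper writes the fields space-separated in the style of the W3C extended format, using "-" for empty values and "+" for embedded spaces (as in the sample log later in this presentation).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestRecord:
    """One request as received by the web server (illustrative field names)."""
    client_ip: str        # remote IP address
    timestamp: datetime   # date and time of the request
    method: str           # GET, POST, etc.
    uri_stem: str         # path part of the request string
    uri_query: str        # query part of the request string ("" if none)
    status: int           # response code, e.g. 200 or 404
    bytes_sent: int       # data transferred to the client
    time_taken_ms: int    # execution time; milliseconds assumed in this sketch

    # Column order written by to_w3c_line(), as W3C-style field names.
    # (Unannotated, so dataclass treats it as a plain class attribute.)
    FIELDS = "date time c-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes time-taken"

    def to_w3c_line(self) -> str:
        """Render the record as one space-separated log line."""
        values = [
            self.timestamp.strftime("%Y-%m-%d"),
            self.timestamp.strftime("%H:%M:%S"),
            self.client_ip,
            self.method,
            self.uri_stem,
            self.uri_query or "-",   # '-' marks an empty field
            str(self.status),
            str(self.bytes_sent),
            str(self.time_taken_ms),
        ]
        # Encode embedded spaces as '+' so every field stays a single token.
        return " ".join(v.replace(" ", "+") for v in values)
```

For example, RequestRecord("192.0.2.10", datetime(1999, 12, 25, 0, 0, 21), "GET", "/issue1/jobs/Default.asp", "", 200, 20407, 15).to_w3c_line() yields one line that a log analyser can split on spaces.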
How can we log this?
Each request is appended to a log file. There are different accepted formats: W3C …
Alternatively, requests can be logged to a database.
Log files can become very LARGE! Should we start a new log daily, weekly, monthly, or when the file reaches a certain size?
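The rotation question can be made concrete with a small helper that maps a request time to the file it belongs in, plus a check for size-based rotation. This is only a sketch: the naming scheme (base.YYYY-MM-DD.log, ISO week numbers for weekly files) and both function names are hypothetical, not a standard or a server setting.

```python
from datetime import datetime


def rotated_log_name(base: str, when: datetime, policy: str) -> str:
    """Hypothetical naming scheme: which log file a request at `when` goes into."""
    if policy == "daily":
        suffix = when.strftime("%Y-%m-%d")
    elif policy == "weekly":
        # ISO year and week number, so a week never straddles two files.
        year, week, _ = when.isocalendar()
        suffix = f"{year}-W{week:02d}"
    elif policy == "monthly":
        suffix = when.strftime("%Y-%m")
    else:
        raise ValueError(f"unknown rotation policy: {policy!r}")
    return f"{base}.{suffix}.log"


def needs_size_rotation(current_size: int, max_bytes: int) -> bool:
    """Size-based policy: start a new file once the current one reaches max_bytes."""
    return current_size >= max_bytes
```

Time-based names make it easy to line reports up with calendar periods; size-based rotation keeps individual files manageable but means a report period may span several files.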
Server logs:
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 1999-12-25 00:00:21
#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs(User-Agent) cs(Cookie) cs(Referer)
1999-12-25 00:00:21 188.8.131.52 - GET /issue1/jobs/Default.asp - 200 20407 AltaVista-Intranet/V2.3A+(email@example.com) - -
1999-12-25 00:03:39 184.108.40.206 - GET /statistics/ExpIntHits1.asp - 200 10519 AltaVista-Intranet/V2.3A+(firstname.lastname@example.org) - -
1999-12-25 00:26:54 220.127.116.11 - GET /robots.txt - 200 303 FAST-WebCrawler/2.0.9+(email@example.com;+http://www.fast.no/…) - -
1999-12-25 00:32:47 18.104.22.168 - GET /issue2/default.asp - 200 5332 AltaVista-Intranet/V2.3A+(firstname.lastname@example.org) - -
1999-12-25 01:49:54 22.214.171.124 - GET /resources/images/main/bg.gif - 200 300 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:54 126.96.36.199 - GET /issue1/webtechs/Default.asp - 200 24659 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) - http://www.statslab.cam.ac.uk/%7Esret1/analog/webtechs.html
1999-12-25 01:49:54 188.8.131.52 - GET /resources/images/main/global_home_h.gif - 200 487 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:54 184.108.40.206 - GET /resources/images/main/global_search.gif - 200 534 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:56 220.127.116.11 - GET /resources/images/main/local_home01.gif - 200 663 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
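Because the #Fields directive names each column, a reader for this format does not need to hard-code the column order. A minimal Python sketch follows, assuming (as in the sample above) that values contain no raw spaces because the server encodes them as "+"; parse_w3c is a hypothetical name, not a library function.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request line, keyed by the latest #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as #Software, #Version, #Date and #Fields.
            # A new #Fields line (e.g. after a restart) replaces the old layout.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            # Malformed or truncated entry: skip it rather than mis-assign columns.
            continue
        yield dict(zip(fields, values))
```

Each yielded dict can then be fed to whatever counting or filtering the report needs, for example rows["sc-status"] for failed requests or rows["cs(Referer)"] for referrers.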
How to analyse the log files
– Analog (http://www.analog.cx)
– Webalizer (http://www.mrunix.net/webalizer)
– WebTrends (http://www.webtrends.com)
– A bespoke solution? One of the scripting solutions?
What do we measure?
– Hits: the overall number of requests the server handles, including every file (images, etc.) that makes up a Web page
– Pages: the number of requests for files designated as base pages, usually determined by file extension (e.g. .htm, .html, .asp, .php, .cfm?)
– Visits: many assumptions must be made, e.g. how long a pause between requests from the same address ends one visit
– User agents
– Total data transfer
– Average processing time
– Search terms in referrer strings
– Failed requests
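Several of these measures can be computed directly from parsed log entries. The sketch below is illustrative only. The page extensions, the 30-minute visit timeout and the "q"/"query" referrer parameters are assumptions chosen for the example, and each request is assumed to be a dict with ip, time, path, status, bytes and referrer keys. The visit count in particular embodies exactly the kind of assumption the slide warns about.

```python
from datetime import datetime, timedelta
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

# Assumptions for this sketch; adjust to your own site and policy.
PAGE_EXTENSIONS = (".htm", ".html", ".asp", ".php", ".cfm")
VISIT_TIMEOUT = timedelta(minutes=30)
SEARCH_PARAMS = ("q", "query")


def summarise(requests: List[Dict]) -> Dict:
    """Compute hits, pages, visits, data transfer, failures and search terms."""
    hits = len(requests)
    pages = sum(1 for r in requests if r["path"].lower().endswith(PAGE_EXTENSIONS))
    total_bytes = sum(r["bytes"] for r in requests)
    failed = sum(1 for r in requests if r["status"] >= 400)

    # Visit = run of requests from one IP address with no gap longer than
    # VISIT_TIMEOUT. Proxies and NAT make this an estimate, not a count of people.
    visits = 0
    last_seen: Dict[str, datetime] = {}
    for r in sorted(requests, key=lambda req: req["time"]):
        previous = last_seen.get(r["ip"])
        if previous is None or r["time"] - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[r["ip"]] = r["time"]

    # Search terms: read the query string of referrers such as ...?q=web+logs
    terms: Dict[str, int] = {}
    for r in requests:
        if not r.get("referrer"):
            continue
        query = parse_qs(urlsplit(r["referrer"]).query)  # decodes '+' to space
        for name in SEARCH_PARAMS:
            for phrase in query.get(name, []):
                terms[phrase] = terms.get(phrase, 0) + 1

    return {"hits": hits, "pages": pages, "visits": visits,
            "bytes": total_bytes, "failed": failed, "search_terms": terms}
```

Changing VISIT_TIMEOUT alone can move the visit figure considerably, which is one reason a measurement policy should fix such settings and record them alongside the reports.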
What are the problems?
– Robots and other automated agents
– Developers and in-house access, which inflate usage figures
– Caches and proxy servers can conceal site usage
– IP address lookups can mislead
– A single IP address can mask multiple users: firewalls, NAT
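Robots and in-house traffic can be partly filtered out before counting. A rough sketch follows, assuming a hand-maintained list of user-agent markers (two of them taken from the sample log earlier) and a hypothetical office network, 192.0.2.0/24, which is a documentation address range. Treating any address that fetched /robots.txt as a robot is a common heuristic, not a guarantee.

```python
import ipaddress
from typing import Dict, List

# Hypothetical values: substitute your own robot markers and office networks.
ROBOT_MARKERS = ("bot", "crawler", "spider", "altavista-intranet")
IN_HOUSE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_robot(user_agent: str) -> bool:
    """Crude substring match against known robot markers (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def is_in_house(ip: str) -> bool:
    """True if the address falls inside one of the in-house networks."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in IN_HOUSE_NETWORKS)


def filter_requests(requests: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop robot and in-house requests; each request has ip, path and agent keys."""
    # Any client that asked for /robots.txt is treated as a robot for all its requests.
    robot_ips = {r["ip"] for r in requests if r["path"] == "/robots.txt"}
    return [
        r for r in requests
        if r["ip"] not in robot_ips
        and not is_robot(r["agent"])
        and not is_in_house(r["ip"])
    ]
```

Note the trade-off raised on this slide: because one IP address can hide many users behind a firewall or NAT, dropping every request from a robot's address may also discard genuine visitors.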
How to access the results
– Who needs to see the processed reports?
– Do you need a private area on your Web site?
– Will you allow third-party access to reports, possibly to a reduced set of information?
– How much configuration and re-processing will you allow?
External Services
Server usage can also be determined by third-party services:
– www.nedstat.com
– www.sitemeter.com
– Free for non-commercial use only; otherwise you pay!
– No guarantee of service
– Includes client-side sniffing (browser details collected by code embedded in your pages)
Service Monitoring
We would also like to know that our service is available:
– Remote monitoring services
– Alerting or reporting?
– Does this overlap with your hosting SLA?
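A do-it-yourself check can complement a remote monitoring service. This sketch assumes the check is run on a schedule (for example every few minutes) by something else, and takes "available" to mean an HTTP status below 400 after redirects. The names probe, availability and should_alert are hypothetical.

```python
import http.client
import urllib.request
from typing import Sequence


def probe(url: str, timeout: float = 10.0) -> bool:
    """One availability check: True if the URL answers with a status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (4xx, 5xx), DNS failures,
        # refused connections and timeouts; ValueError covers malformed URLs.
        return False


def availability(results: Sequence[bool]) -> float:
    """Percentage of successful probes, for comparison with an SLA target."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


def should_alert(results: Sequence[bool], consecutive: int = 3) -> bool:
    """Alert only after `consecutive` failures in a row, to ignore single blips."""
    recent = results[-consecutive:]
    return len(recent) == consecutive and not any(recent)
```

should_alert serves the alerting side and availability the reporting side; comparing the availability figure with the one your hosting SLA promises shows whether the two overlap.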
Link monitoring
– How is the site ranked in search engines?
– Use url= syntax in common search engines
– How many sites have linked to you?
– www.linkpopularity.com
Coverage By Search Engines
– Have you promoted your Web site?
– Can your Web site be accessed by search engines?
– Are you near the top of the search results?
– Search engines can report on their coverage of your Web site
– Coverage is an indication of potential use of your Web site
– For information on how to ensure that your Web site has been indexed see
Links To Your Site
– Search engines can be used to report on the number of links to a Web site
– LinkPopularity.com provides an interface to 3 search engines
– Monthly reports can be obtained
– Links are an indication of potential use of your Web site
– A survey of the number of links to University Web sites is available at.
www.linkpopularity.com
Links From Your Web Site
Links from your Web site:
– Usually implemented using: Foo
– Not normally possible to monitor the number of users following a link
– Possible if using a link of the form: Foo
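One common way to make outbound links countable is to route them through a local redirect script, so that each click reaches your own server before the user is forwarded. The sketch below shows only the core logic, assuming a hypothetical /redirect?url=... convention and an in-memory counter; a real deployment would wrap it in whatever web framework or CGI environment the site uses.

```python
from collections import Counter
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical convention: outbound links point at /redirect?url=<target>.
REDIRECT_PATH = "/redirect"
outbound_clicks: Counter = Counter()


def handle_redirect(request_path: str) -> Tuple[int, Optional[str]]:
    """Record an outbound click and return (HTTP status, Location header value)."""
    parts = urlsplit(request_path)
    if parts.path != REDIRECT_PATH:
        return 404, None
    targets = parse_qs(parts.query).get("url", [])
    if not targets:
        return 400, None
    target = targets[0]
    # Only forward to absolute http(s) URLs, so the script cannot be used
    # to send visitors to other URL schemes.
    if urlsplit(target).scheme not in ("http", "https"):
        return 400, None
    outbound_clicks[target] += 1
    return 302, target
```

Each click also appears in the ordinary server log as a request for the redirect path, so the usual log analysis tools can count outbound links without the in-memory counter.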
Considerations
– What will we measure?
– How often will we produce reports?
– How will we handle our raw server logs?
– Will we be able to view the results over the Web?
– Will we need different levels of reporting detail for different users (technical, executive, third party)?