Presentation on theme: "1 Technical Developments Related to Quality Issues Brian Kelly UK Web Focus UKOLN University of Bath Bath, BA2 7AY"— Presentation transcript:
1 Technical Developments Related to Quality Issues
Brian Kelly, UK Web Focus, UKOLN, University of Bath, Bath, BA2 7AY
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC's Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.
Contents
– Application-based Developments
– Protocol Developments
– Conclusions
2 Application-Based Solutions
Sophisticated search engines are being developed:
– Google: large-scale search engine for the research community (now commercial)
– Clever: IBM research project
– Direct Hit!: records how users make use of search engines
– Alexa: allows end users to vote on resources
3 Google
Google uses a "PageRank" technique: important resources are those pointed to from many sites and from important sites (e.g. Yahoo).
[Screenshots: a search for "Digital Libraries"; following the link to the first hit]
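The idea that a page is important when many pages, and important pages, point to it can be sketched as a simple iterative calculation. This is a minimal illustration only, not Google's implementation: the damping factor of 0.85, the iteration count, the handling of pages without outgoing links, and the sample page names are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    damping and iterations are assumed values for this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links,
                # so a link from an important page is worth more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "library" is pointed to by three pages.
links = {
    "yahoo": ["library", "news"],
    "news": ["library"],
    "blog": ["library", "yahoo"],
    "library": ["yahoo"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this sample graph "blog" has no incoming links and ends up with the lowest score, while "library" benefits from being linked by "yahoo", which is itself well linked.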
4 Clever
Aims to find a small set of the most authoritative documents on the requested subject.
Uses a standard search engine to gather a "root set" of pages matching the query.
Next, adds all pages pointing to or pointed to by the root set.
Thereafter, it uses only the links between these pages to distill the best authorities and hubs.
[Screenshots: AltaVista results include sites selling medical services; distinct pages found using Clever; Clever finds the key baseball sites]
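The three steps above (gather a root set, expand it along links, then score hubs and authorities using only the links) can be sketched as follows. This is a simplified hubs-and-authorities iteration written for illustration, not the Clever project's code; the function names, the iteration count and the sample pages are assumptions.

```python
import math


def expand_root_set(root, links):
    """Add every page pointing to, or pointed to by, a page in the root set."""
    base = set(root)
    for src, targets in links.items():
        if src in root:
            base.update(targets)      # pages the root set points to
        if any(t in root for t in targets):
            base.add(src)             # pages pointing into the root set
    return base


def hubs_and_authorities(links, iterations=50):
    """Score pages using only the links between them."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical web graph and root set for a baseball query.
web = {
    "fan_list": ["mlb", "espn"],
    "portal": ["mlb", "espn", "shop"],
    "mlb": [],
    "espn": ["mlb"],
    "shop": [],
    "unrelated": ["elsewhere"],
}
root = {"mlb", "fan_list"}
base = expand_root_set(root, web)
sub_links = {p: [t for t in web.get(p, []) if t in base] for p in base}
hub, auth = hubs_and_authorities(sub_links)
print("best authority:", max(auth, key=auth.get))
```

Pages outside the expanded set ("shop", "unrelated") play no part in the scoring, which is what keeps the calculation focused on the query topic.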
5 Direct Hit
Direct Hit:
– Integrated with search engines such as Yahoo
– Ranks results based on the click profiles of other users of the search service
Example: users searching for "Dublin Core" typically click on links related to metadata, so these are placed at the top of the search results.
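The Dublin Core example can be sketched as a re-ranking step applied to an existing result list. This is an illustrative sketch, not Direct Hit's algorithm: the click log format, the function name and the sample URLs are assumptions.

```python
from collections import Counter


def rerank_by_clicks(results, click_log, query):
    """Move results that earlier users clicked for this query to the top.

    `click_log` is assumed to be a list of (query, clicked_url) pairs.
    """
    clicks = Counter(url for q, url in click_log if q == query)
    # sorted() is stable, so results with equal click counts keep the
    # order the underlying search engine gave them.
    return sorted(results, key=lambda url: -clicks[url])


# Hypothetical result list and click log.
results = [
    "dublin-tourism.example",
    "dublin-core-metadata.example",
    "core-fitness.example",
]
click_log = [
    ("dublin core", "dublin-core-metadata.example"),
    ("dublin core", "dublin-core-metadata.example"),
    ("dublin core", "dublin-tourism.example"),
    ("baseball", "core-fitness.example"),
]
reranked = rerank_by_clicks(results, click_log, "dublin core")
print(reranked)
```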
6 Alexa
Alexa:
– Enables end users to "rate" a site when surfing
– Includes access to related links
– Based on a central archive of the web
See also Netscape's What's Related facility.
Possibilities:
– Signed votes
– Use the Alexa model with a UK database of resources
7 Summary
Good News
A new generation of experimental search engines is being developed.
Algorithms include:
– Making use of link information
– Making use of end-user input
– Collaborative bookmarks (cf. FireFly: you like "Sex" and "Drugs". So does he, and he also likes "Rock'n'Roll")
But such techniques make use of a "brute strength" approach.
Is there a more elegant solution?
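The collaborative bookmarks idea in the FireFly example can be sketched as a small overlap-based recommender. This is a toy illustration of the general approach, not FireFly's method; the scoring rule (weight each suggestion by the number of shared likes) and all names are assumptions.

```python
from collections import Counter


def recommend(likes, user):
    """Suggest items liked by people who share at least one like with `user`."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # nothing in common, so their tastes tell us nothing
        for item in theirs - mine:
            scores[item] += overlap
    return [item for item, _ in scores.most_common()]


# Hypothetical preference data.
likes = {
    "you": {"Sex", "Drugs"},
    "he": {"Sex", "Drugs", "Rock'n'Roll"},
    "she": {"Opera"},
}
print(recommend(likes, "you"))  # ["Rock'n'Roll"]
```

Every user is compared with every other user, which gives a sense of why the slide calls such techniques a "brute strength" approach.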
8 We Need Metadata!
The web was originally based on 3 architectural components. Metadata is the missing component.
– Addressing: URL
– Data format: HTML
– Transport: HTTP
– Metadata: RDF, PICS, IPR, MCF, DSig, DC, ...
The W3C is developing a machine-understandable metadata framework which can automate a variety of tasks (resource discovery, content filtering, etc.)
9 RDF
RDF (Resource Description Framework):
– Provides a metadata framework ("machine understandable metadata for the web")
– Based on ideas from content rating (PICS), resource discovery (Dublin Core), etc.
– Based on a formal data model (directed labelled graphs)
Applications include:
– cataloguing resources
– resource discovery
– intellectual property rights
– content rating
– digital signatures
– privacy
[Figure: RDF Data Model, showing Resource, PropertyType, Value and Property]
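The directed labelled graph can be sketched as a list of three-part statements, where each statement is an edge from a resource to a value, labelled with a property type. This is a minimal sketch, not an RDF library: it assumes that the figure's Resource, PropertyType and Value are the three parts of one statement, and the example URLs, the `dc:` labels and the helper names are illustrative.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One labelled edge of the graph: resource --property_type--> value."""
    resource: str
    property_type: str
    value: str


# Hypothetical resources, described with Dublin Core style property names.
THESIS = "http://example.org/thesis/42"
UNIVERSITY = "http://example.org/university"

graph = [
    Statement(THESIS, "dc:title", "A Study of Web Quality"),
    Statement(THESIS, "dc:creator", "A. Student"),
    # A value may itself be a resource, which is what makes this a graph.
    Statement(THESIS, "dc:publisher", UNIVERSITY),
    Statement(UNIVERSITY, "dc:title", "Example University"),
]


def values_of(graph, resource, property_type):
    """Return every value attached to a resource by the given property type."""
    return [
        s.value
        for s in graph
        if s.resource == resource and s.property_type == property_type
    ]


def follow(graph, resource, *property_types):
    """Walk a path of property types, treating each value as the next resource."""
    current = [resource]
    for property_type in property_types:
        current = [v for r in current for v in values_of(graph, r, property_type)]
    return current


print(values_of(graph, THESIS, "dc:creator"))             # ['A. Student']
print(follow(graph, THESIS, "dc:publisher", "dc:title"))  # ['Example University']
```

Because every statement has the same shape, software can answer questions such as "what is the title of this thesis's publisher?" without understanding the page itself.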
10 Certificates
Certificates can be provided for:
– Services
– Users
– Code (Java, ActiveX)
Certificate Authorities (CAs) can distribute certificates:
– Global CAs (Verisign, Thawte)
– National CAs (Post Office, central University body, British Library, etc.)
Government legislation this session related to digital signatures
11 Certificates Within An Organisation
Digital signatures will enable publishers (e.g. Universities) to give an authoritative stamp to digital resources.
Within the University, the Research Office and PR Office can allocate legally-binding signatures to authorised publications.
Staff and students can be given a certificate which is used for authentication.
The CVCP could give certificates to Universities, who would then be authorised to distribute certificates within the university.
[Diagram labels: University, Research Office, Press Office, Admissions, PhD Thesis, MSc, Prospectus]
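The delegation described here (the CVCP certifies a University, which certifies its own offices) forms a chain that a reader's software could walk back to a trusted root. The sketch below models only the issuer chain and performs no cryptographic signature checking, which a real certificate system would require; the `Certificate` class, the function name and the sample entries are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str  # who the certificate was issued to
    issuer: str   # who issued it


def chain_to_root(subject, certs, trusted_roots):
    """Return the chain from `subject` up to a trusted root, or None.

    Toy model: it follows issuer names only and verifies no signatures.
    """
    by_subject = {c.subject: c for c in certs}
    chain = [subject]
    seen = {subject}
    current = subject
    while current not in trusted_roots:
        cert = by_subject.get(current)
        if cert is None or cert.issuer in seen:
            return None  # no certificate, or a loop that never reaches a root
        current = cert.issuer
        seen.add(current)
        chain.append(current)
    return chain


# Hypothetical certificates following the delegation on the slide.
certs = [
    Certificate("University", "CVCP"),
    Certificate("Research Office", "University"),
    Certificate("Rogue Office", "Nobody"),
]
trusted_roots = {"CVCP"}

print(chain_to_root("Research Office", certs, trusted_roots))
print(chain_to_root("Rogue Office", certs, trusted_roots))
```

A thesis signed by the Research Office would be accepted because its chain ends at the CVCP, while a signature from an office with no path to a trusted root would be rejected.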
12 Developments for Gateways
Quality information gateways:
– Can make use of signed resources to help cataloguing
– Can provide input to sophisticated search engines (similar to Google)
A central organisation could give certificates to approved information gateways.
Signed gateway: "this gateway follows xx quality conventions"
[Diagram labels: Information Gateway, Signed PhD Thesis, Quality Resources, Advanced search engine, Signed Gateway, Unsigned Gateway]
13 Conclusions
Automated Indexing
– AltaVista approach
– Comprehensive
– Junk indexed
– Too many hits
Manual Indexing
– Subject Gateway approach
– Quality
– Value-added services
– Incomplete
– Expensive
A Third Way
– Combination of automated and manual approaches
– Involvement from SBIG, author and end user
– Exciting possibilities
– Uncertainty of timescales and success
– Coordination required: political issues (ownership of metadata, selling ads, etc.)