Institutional Webmasters Workshop, 7-9 September 1999, University of Cambridge Computing Service. Indexing your web server(s). Helen Varley Sargan.


Why create an index?
Helps users (and webmasters) to find things ...but isn't a substitute for good navigation
Gives cohesion to a group of unrelated servers
Observation of logs gives information on what people are looking for - and what they are having trouble finding
You are already being part-indexed by many search engines, unless you have taken specific action against it

Current situation

  Name        Total
  ht://Dig       25
  Excite         19
  Microsoft      12
  Harvest         8
  Ultraseek       7
  SWISH           5
  Webinator       4
  Netscape        3
  wwwwais         3
  FreeFind        2
  Other          13
  None           59

Based on UKOLN survey of search engines used in 160 UK HEIs carried out in July/Aug. Report to be published in Ariadne issue 21. See.

Current situation questions
Is the version of Muscat used by Surrey the free version available for a time (but not any more)?
Are the users of Excite quite happy with the security, and that development seems to have ceased?
Are users of local search engines that don't use robots.txt happy with what other search engines can index on their sites (you have got a robots.txt file, haven't you?)

Types of tool
External services are robots
Tools you install yourself fall into two main categories (some will work both ways):
–direct indexes of local and/or networked file structure
–robot- or spider-based, following instructions from the robots.txt file on each web server indexed
The programs are either in a form you have to compile yourself or are precompiled for your OS, or they are written in Perl or Java, so will need either a Perl or Java runtime to function.
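The difference between the two categories can be sketched in a few lines. A direct index walks the file system itself, with no HTTP involved; a robot would instead fetch pages over the network. This is a hypothetical minimal sketch (names like build_index are invented), not the behaviour of any tool named in this talk.

```python
# Minimal sketch of a "direct index" of a local file structure:
# walk a document root and build a word -> set-of-files inverted index.
# (A robot/spider-based tool would instead fetch pages over HTTP,
# honouring robots.txt on each server it visits.)
import os
import re
import tempfile

def build_index(doc_root):
    index = {}
    for dirpath, _dirs, files in os.walk(doc_root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            # index each distinct word once per file
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index.setdefault(word, set()).add(path)
    return index

# Tiny demonstration with two throwaway files.
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "w") as f:
    f.write("indexing your web server")
with open(os.path.join(root, "b.txt"), "w") as f:
    f.write("web navigation")

index = build_index(root)
print(len(index["web"]))       # "web" occurs in both files
print(len(index["indexing"]))  # "indexing" occurs only in a.txt
```

A real indexer adds stemming, file-type filters and ranking on top, but the core data structure is the same.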

Controlling robot access 1
All of our web servers are being part-indexed by external robots
Control of external robots and a local robot-mediated indexer is by the same route:
–a robots.txt file to give access information
–meta tags for robots in each HTML file giving indexing and link-following entry or exclusion
–meta tags in each HTML file giving description and keywords
The first two controls are observed by all the major search engines. Some search engines do not observe description and keyword meta tags.
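The robots.txt mechanism is simple enough to test with Python's standard library. The sketch below parses a sample robots.txt (the paths are invented for illustration) and checks what a robot may fetch - the same decision any well-behaved local or external robot makes before requesting a page.

```python
# Check robots.txt rules with the standard library's parser.
# The file content and paths here are invented examples.
# The per-page equivalent is a robots meta tag, e.g.
#   <meta name="robots" content="noindex,nofollow">
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("AnyBot", "/index.html"))         # True
print(rp.can_fetch("AnyBot", "/private/notes.html")) # False
```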

Controlling robot access 2
Some patchy support for Dublin Core metadata
Access to branches of the server can be limited by the server software - by combining access control with metadata you can give limited information to some users and more to others.
If you don't want people to read files, either password-protect that section of the server or remove them.
Limiting robot access to a directory can make nosey users flock to look what's inside.
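Description, keyword and Dublin Core metadata all travel in ordinary meta tags, which an indexer can extract with a few lines of standard-library Python. The page below is an invented example; DC.title is one of the standard Dublin Core element names.

```python
# Extract description, keyword and Dublin Core meta tags from an HTML
# page, as a robot-based indexer would. The page is an invented example.
from html.parser import HTMLParser

PAGE = """
<html><head>
<title>Departmental home page</title>
<meta name="description" content="Research and teaching information">
<meta name="keywords" content="indexing, search, metadata">
<meta name="DC.title" content="Departmental home page">
</head><body>...</body></html>
"""

class MetaCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}
    def handle_starttag(self, tag, attrs):
        # collect every <meta name="..." content="..."> pair
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"].lower()] = d["content"]

p = MetaCollector()
p.feed(PAGE)
print(p.meta["description"])  # Research and teaching information
print(p.meta["dc.title"])     # Departmental home page
```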

Security
There has been a security problem with indexing software (Excite free version in 1998)
Remember the security of the OS the indexing software is running under - keep all machines up-to-date with security patches whether they are causing trouble or not.
Seek help with security if you are not an expert in the OS, particularly with Unix or Windows NT

What tool to use? 1
Find out if any money, hardware and/or staff are available for the project first
Make a shopping list of your requirements and conditions:
–hosting the index (where)?
–platform (available and desirable)?
–how many servers (and/or pages) will I index?
–is the indexed data very dynamic?
–what types of files do I want indexed?
–what kind of search (keyword, phrase, natural language, constrained)?
Are you concerned how you are indexed by others?

What tool to use? 2
Equipped with the answers to the previous questions, you will be able to select a suitable category of tool
If you are concerned how others index your site, install a local robot- or spider-based indexer and look at indexer control measures
Free externally hosted services for very small needs
Free tools (mainly Unix-based) for the technically literate, or built in to some server software
Commercial tools cover a range of platforms and pocket-depths but vary enormously in features

Free externally hosted services
Will be limited in the number of pages indexed, possibly the number of times the index is accessed, and may be deleted if not used for a certain number of days (5-7)
Very useful for small sites and/or those with little technical experience or resources
Access is prey to Internet traffic (most services are in the US) and server availability, and for UK users incoming transatlantic traffic will be charged for
You may have to have advertising on your search page as a condition of use

Free tools - built in
Microsoft, Netscape, WebStar, WebTen and WebSite Pro all come with built-in indexers (others may too)
With any or all of these there may be problems indexing some other servers, since they are all using vendor-specific APIs (they may receive responses from other servers that they can't interpret). Problems are more likely with more and varied server types being indexed

Free tools - installed
Most active current development on SWISH (both -E and ++), Webglimpse, ht://Dig and Alkaline
Alkaline is a new product; all the others have been through long periods of inactivity and all are dependent on volunteer effort
All of these are now robot-based but may have other means of looking at directories as well
Alkaline is available on Windows NT, but all the others are Unix. Some need to be compiled.
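To give a flavour of what installing one of these involves, below is a minimal configuration sketch for ht://Dig. The attribute names (database_dir, start_url, limit_urls_to, exclude_urls) are from ht://Dig's documentation, but the values are invented and the whole fragment should be checked against the manual for the version you actually install.

```text
# Minimal htdig.conf sketch - invented values, verify against your version.
database_dir:   /opt/htdig/db
start_url:      http://www.example.ac.uk/
limit_urls_to:  ${start_url}
exclude_urls:   /cgi-bin/ .cgi
```

Restricting the crawl with limit_urls_to and exclude_urls is what keeps a robot-based indexer on your own servers rather than wandering off across the web.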

Commercial tools
Most have specialisms - sort out your requirements very carefully before you select a shortlist
Real money price may vary from US$250 to £10,000+ (possibly with additional yearly maintenance), depending on product
The cost of most will be on a sliding scale depending on the size of index being used
Bear in mind that Java-based tools will require the user to be running a Java-enabled browser

Case Study 1 - Essex
Platform: Windows NT
Number of servers searched: 16
Number of entries: approx 11,500
File types indexed: Office files, html and txt. Filters available for other formats
Index updating: Configured with Windows task scheduler. Incremental updates possible.
Constrained searches possible: Yes
Configuration: follows robots.txt but can take a 'back door' route as well. Obeys robots meta tag
Logs and reports: Creates reports on crawling progress. Log analysis not included but can be written as add-ons (ASP scripts)
Pros: Free of charge with Windows NT.
Cons: Needs high level of Windows NT expertise to set up and run it effectively. May run into problems indexing servers running diverse server software. Not compatible with Microsoft Index Server (a single-server product). Creates several catalog files, which may create network problems when indexing many servers.

Case Study 2 - Oxford
Platform: Unix
Number of servers searched: 131
Number of entries: approx 43,500 (indexing a maximum of 9 levels down on any server)
File types indexed: Office files, html and txt. Filters available for other formats
Index updating: Configured to reindex after a set time period. Incremental updates possible.
Constrained searches possible: Yes, but need to be configured on the ht://Dig server
Configuration: follows robots.txt but can take a 'back door' route as well.
Logs and reports: none generated in an obvious manner, but probably available somehow.
Pros: Free of charge. Wide number of configuration options available.
Cons: Needs high level of Unix expertise to set up and run it effectively. Index files are very large.

Case Study 3 - Cambridge
Platform: Unix
Number of servers searched: 232
Number of entries: approx 188,000
File types indexed: Many formats, including PDF, html and txt.
Index updating: Intelligent incremental reindexing dependent on the frequency of file updates - can be given a permitted schedule. Manual incremental updates easily done.
Constrained searches possible: Yes, easily configured by users, and can also be added to configuration as a known constrained search.
Configuration: follows robots.txt and meta tags. Configurable weighting given to terms in title and meta tags. Thesaurus add-on available to give user-controlled alternatives
Logs and reports: Logs and reports available for every aspect of use - search terms, number of terms, servers searched, etc.
Pros: Very easy to install and maintain. Gives extremely good results in a problematic environment. Technical support excellent.
Cons: Relatively expensive.

Recommendations
Choosing an appropriate search engine is wholly dependent on your particular needs and circumstances
Sort out all your robot-based indexing controls when you install your local indexer
Do review your indexing software regularly - if it's trouble-free it still needs maintaining