October 3, 2003 CDL -- Cheshire II & III -- Ray R. Larson Cheshire II: Recent Additions & Cheshire III: Design and System Overview Ray R. Larson School.


1 October 3, 2003 CDL -- Cheshire II & III -- Ray R. Larson Cheshire II: Recent Additions & Cheshire III: Design and System Overview Ray R. Larson School of Information Management and Systems University of California, Berkeley

2 Overview
– Cheshire II
  – Feature overview
  – Current usage
  – Recent Additions
    – Distributed Search and Indexing
    – Geographic Operators and Search Ranking
    – XML Schemas and Element Retrieval
    – MySQL and PostgreSQL interfaces
    – CORI, Okapi BM-25 ranking algorithms
    – Result Set sorting, merging and ranking operators, bitmapped indexes
– Cheshire III Design and Development

3 Overview of Cheshire II
– It supports SGML and XML
– It is a client/server application
– Uses the Z39.50 Information Retrieval Protocol; support for SRW, OAI, SOAP, SDLIP also implemented
– Server supports a Relational Database Gateway
– Supports Boolean searching of all servers
– Supports probabilistic ranked retrieval in the Cheshire search engine as well as Boolean and proximity search
– Search engine supports “nearest neighbor” searches and relevance feedback
– GUI interface on X window displays and Windows NT
– WWW/CGI forms interface for DL, using combined client/server CGI scripting via WebCheshire
– Scriptable clients using Tcl and (new) Python
– Store SGML/XML as files or “Datastore” database

4 Current Usage
– Over 100 Databases in the UK, including
  – AHDS/History Data Service
  – Mersey Libraries
  – ZETOC
  – Archives Hub
    – Distributed Archives Hub
  – JISC Resource Discovery Network (RDN)
    – (OAI-PMH Harvesting with Cheshire Search)
  – Planned use with TEL being developed by the BL
– Also being used at Harvard and Berkeley
– California Sheet Music Project
– Los Alamos National Lab (genomics metadata)

5 Distributed Search

6 The Problem
– The Digital Library vision: access for everyone to “all human knowledge”
– Lyman and Varian’s estimates of the “Dark Web”
– Hundreds or Thousands of servers with databases ranging widely in content, topic, format
  – Broadcast search is expensive in terms of bandwidth and in processing too many irrelevant results
  – How to select the “best” ones to search?
    – Which resource to search first?
    – Which to search next if more is wanted?
  – Topical/domain constraints on the search selections
  – Variable contents of database (metadata only, full text, multimedia…)

7 Distributed Search Tasks
– Resource Description
  – How to collect metadata about digital libraries and their collections or databases
– Resource Selection
  – How to select relevant digital library collections or databases from a large number of databases
– Distributed Search
  – How to perform parallel or sequential searching over the selected digital library databases
– Data Fusion
  – How to merge query results from different digital libraries with their different search engines, differing record structures, etc.

8 An Approach for Distributed Resource Discovery
– Distributed resource representation and discovery
  – New approach to building resource descriptions based on Z39.50
  – Instead of using broadcast search across resources we are using two Z39.50 Services
    – Identification of database metadata using Z39.50 Explain
    – Extraction of distributed indexes using Z39.50 SCAN
– Evaluation
  – How efficiently can we build distributed indexes?
  – How effectively can we choose databases using the index?
  – How effective is merging search results from multiple sources?
  – Can we build hierarchies of servers (general/meta-topical/individual)?

9 Z39.50 Explain
– Explain supports searches for
  – Server-Level metadata
    – Server Name
    – IP Addresses
    – Ports
  – Database-Level metadata
    – Database name
    – Search attributes (indexes and combinations)
  – Support metadata (record syntaxes, etc.)

10 Z39.50 SCAN
– Originally intended to support Browsing
– Query for
  – Database
  – Attributes plus Term (i.e., index and start point)
  – Step Size
  – Number of terms to retrieve
  – Position in Response set
– Results
  – Number of terms returned
  – List of Terms and their frequency in the database (for the given attribute combination)

11 Z39.50 SCAN Results
Syntax: zscan indexname1 term stepsize number_of_terms pref_pos

  % zscan title cat 1 20 1
  {SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 27} {cat-fight 1} {catalan 19} {catalogu 37} {catalonia 8} {catalyt 2} {catania 1} {cataract 1} {catch 173} {catch-all 3} {catch-up 2} …

  % zscan topic cat 1 20 1
  {SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 706} {cat-and-mouse 19} {cat-burglar 1} {cat-carrying 1} {cat-egory 1} {cat-fight 1} {cat-gut 1} {cat-litter 1} {cat-lovers 2} {cat-pee 1} {cat-run 1} {cat-scanners 1} …
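
A script that harvests SCAN results has to turn this Tcl-style list into term/frequency pairs first. The following is a minimal Python sketch, assuming the response has exactly the shape shown on this slide: a leading `{SCAN ...}` header followed by flat `{term count}` items. `parse_scan` is a hypothetical helper, not part of the Cheshire distribution, and it does not handle terms that contain spaces.

```python
import re

# A flat "{name value}" item, e.g. "{cat 27}" or "{Status 0}".
ITEM = re.compile(r"\{([^{}\s]+)\s+([^{}\s]+)\}")
# The leading "{SCAN {..} {..} ...}" header element.
HEADER = re.compile(r"\s*\{SCAN((?:\s*\{[^{}]*\})*)\s*\}")

def parse_scan(response):
    """Split a zscan response string into (header, terms).

    header: dict built from the fields inside the {SCAN ...} element.
    terms:  list of (term, frequency) tuples in the order returned.
    """
    match = HEADER.match(response)
    if match is None:
        raise ValueError("response does not start with a {SCAN ...} header")
    header = {name: int(value) for name, value in ITEM.findall(match.group(1))}
    rest = response[match.end():]
    terms = [(term, int(freq)) for term, freq in ITEM.findall(rest)]
    return header, terms

sample = ("{SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} "
          "{cat 27} {cat-fight 1} {catalan 19} {catalogu 37}")
header, terms = parse_scan(sample)
print(header["Terms"], terms[:2])
```

Separating the header from the term list avoids misreading an indexed term such as `Status` as a header field.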

12 Resource Index Creation
– For all servers, or a topical subset…
  – Get Explain information
  – For each index
    – Use SCAN to extract terms and frequency
    – Add term + freq + source index + database metadata to the XML “Collection Document” for the resource
  – Planned extensions:
    – Post-Process indexes (especially Geo Names, etc.) for special types of data
      – e.g. create “geographical coverage” indexes
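
The harvesting steps on this slide can be sketched in Python. This is only an illustration: the element and attribute names (`collection`, `index`, `term`, `freq`) and the shape of the `scans` argument are assumptions, since the slides do not give the actual Collection Document schema.

```python
import xml.etree.ElementTree as ET

def build_collection_document(db_name, host, scans):
    """Assemble one XML "Collection Document" for a harvested database.

    scans maps an index name to the (term, frequency) pairs that a SCAN
    of that index returned. All element and attribute names used here
    are hypothetical.
    """
    root = ET.Element("collection", {"database": db_name, "host": host})
    for index_name, terms in scans.items():
        index_el = ET.SubElement(root, "index", {"name": index_name})
        for term, freq in terms:
            term_el = ET.SubElement(index_el, "term", {"freq": str(freq)})
            term_el.text = term
    return root

doc = build_collection_document(
    "bibfile", "sherlock.berkeley.edu",
    {"title": [("cat", 27), ("catalan", 19)],
     "topic": [("cat", 706)]},
)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the source index on each term lets the same word carry different frequencies for, say, title and topic searches.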

13 MetaSearch Approach
[Figure: MetaSearch architecture diagram. Labels: MetaSearch Server; Map Explain and Scan Queries; Map Query; Map Results; Distributed Index; Internet; Search Engines; DB 1 through DB 6.]

14 Known Issues and Problems
– Not all Z39.50 Servers support SCAN or Explain
– Solutions that appear to work well:
  – Probing for attributes instead of Explain (e.g. DC attributes or analogs)
  – We also support OAI and can extract OAI metadata for servers that support OAI
  – Query-based sampling (Callan)
– Collection Documents are static and need to be replaced when the associated collection changes

15 Evaluation
– Test Environment
  – TREC Tipster data (approx. 3 GB)
  – Partitioned into 236 smaller collections based on source and date by month (no DOE)
    – High size variability (from 1 to thousands of records)
    – Same database as used in other distributed search studies by J. French and J. Callan among others
  – Used TREC topics 51-150 for evaluation (these are the only topics with relevance judgements for all 3 TIPSTER disks)

16 Harvesting Efficiency
– Tested using the databases on the previous slide + the full FT database (210,158 records ~ 600 Mb)
– Average of 23.07 seconds per database to SCAN each database (3.4 indexes on average) and create a collection representative, over the network
– Average of 14.07 seconds
– Also tested larger databases (e.g. the TREC FT database, ~600 Mb with 7 indexes, was harvested in 131 seconds)

17 Our Collection Ranking Approach
– We attempt to estimate the probability of relevance for a given collection with respect to a query using the Logistic Regression method developed at Berkeley (W. Cooper, F. Gey, D. Dabney, A. Chen) with a new algorithm for weight calculation at retrieval time
– Estimates from multiple extracted indexes are combined to provide an overall ranking score for a given resource (i.e., fusion of multiple query results)

18 Probabilistic Retrieval: Logistic Regression
Probability of relevance for a given index is based on logistic regression from a sample set of documents to determine values of the coefficients (TREC). At retrieval the probability estimate is obtained by:
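
The equation itself did not survive in this transcript. In the standard logistic regression form, which is assumed here rather than taken from the slide, the log odds of relevance are a linear combination of attribute values $X_i$ with fitted coefficients $c_i$, and the probability follows from the logistic transform:

```latex
\log O(R \mid Q, C) = c_0 + \sum_{i=1}^{m} c_i X_i
\qquad
P(R \mid Q, C) = \frac{e^{\log O(R \mid Q, C)}}{1 + e^{\log O(R \mid Q, C)}}
```

Here $Q$ is the query, $C$ the collection, and $m$ the number of attributes. The symbols are illustrative, and the coefficient values are not given in the slides.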

19 Probabilistic Retrieval: Logistic Regression attributes
– Average Absolute Query Frequency
– Query Length
– Average Absolute Collection Frequency
– Collection size estimate
– Average Inverse Collection Frequency
– Inverse Document Frequency (N = Number of collections; M = Number of Terms in common between query and document)

20 Evaluation
– Effectiveness
  – Tested using the collection representatives described above (as harvested from over the network) and the TIPSTER relevance judgements
  – Testing by comparing our approach to known algorithms for ranking collections
  – Results were measured against reported results for the Ideal and CORI algorithms and against the optimal “Relevance Based Ranking” (MAX)
  – Recall analog (how many of the Rel docs occurred in the top n databases, averaged)
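
The recall analog described on this slide can be sketched as follows. The normalisation is an assumption: this version divides by all relevant documents for the topic, whereas published collection-selection studies sometimes divide by what the optimal ranking achieves at the same n. The collection ids in the example are made up.

```python
def recall_at_n(ranking, relevant_counts, n):
    """Fraction of all relevant documents held by the top-n collections.

    ranking:         collection ids, best first, as produced by a ranker.
    relevant_counts: collection id -> number of relevant documents for
                     the topic, taken from the relevance judgements.
    """
    total = sum(relevant_counts.values())
    if total == 0:
        return 0.0
    found = sum(relevant_counts.get(c, 0) for c in ranking[:n])
    return found / total

def mean_recall_at_n(runs, n):
    """Average recall_at_n over (ranking, relevant_counts) pairs, one per topic."""
    return sum(recall_at_n(r, rel, n) for r, rel in runs) / len(runs)

# Hypothetical topic: three collections hold its ten relevant documents.
rel = {"coll-a": 6, "coll-b": 3, "coll-c": 1}
estimated = ["coll-b", "coll-a", "coll-d", "coll-c"]
# Relevance-based ranking: collections ordered by relevant-document count.
optimal = sorted(rel, key=rel.get, reverse=True)
print(recall_at_n(estimated, rel, 1), recall_at_n(optimal, rel, 1))
```

Plotting this value for increasing n, for each ranking method and for the relevance-based ranking, gives curves of the kind such evaluations compare.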

21 Titles only (short query)

22 Future
– Logically Clustering servers by topic
– Meta-Meta Servers (treating the MetaSearch database as just another database)

23 Distributed Metadata Servers
[Figure: server hierarchy diagram. Labels: Replicated servers; Meta-Topical Servers; General Servers; Database Servers.]

24 Geographic Operators and Search Ranking

25 The GEO Operations
– Operators established for the GEO Z39.50 profile
– Implemented using special operations on indexes
– Indexing allows extraction of geographic coordinates and dates from SGML/XML data in a variety of formats
– Normalized internal representation in indexes
– Search using geographic and time elements as primary or limiting search elements

26 The GEO Operations
– X-based interfaces permit (simple) map drawing and search
– Interface to MapServer for web-based map searching

27 GEO Geographic operators
Operator  Name              Meaning
>=<       Overlap           Search region and data overlap
>#<       Fully Enclosed    Data fully enclosed in search region
<#>       Encloses          Data fully encloses search region
<>#       Fully Outside     Data outside of search region
++        Near              Data is near search region
:<:       Before            Data date is before search date
:<=:      Before or During  Data date is before or during search date
:>=:      During or After   Data date is during or after search date
:>:       After             Data date is after search date
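
The four purely spatial operators listed on this slide can be illustrated with bounding-box tests. This is a minimal sketch, not the Cheshire implementation: it assumes regions are axis-aligned boxes in decimal degrees that do not cross the antimeridian, treats touching edges as overlap, and leaves out Near and the date operators. The `Box` type and function names are hypothetical.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

def overlaps(search, data):
    """>=< : search region and data share at least one point."""
    return (search.west <= data.east and data.west <= search.east and
            search.south <= data.north and data.south <= search.north)

def fully_enclosed(search, data):
    """>#< : data lies entirely inside the search region."""
    return (search.west <= data.west and data.east <= search.east and
            search.south <= data.south and data.north <= search.north)

def encloses(search, data):
    """<#> : data entirely contains the search region."""
    return fully_enclosed(data, search)

def fully_outside(search, data):
    """<># : data and search region have no point in common."""
    return not overlaps(search, data)

# Approximate boxes, for illustration only.
california = Box(-124.4, 32.5, -114.1, 42.0)
bay_area = Box(-123.0, 37.0, -121.5, 38.5)
paris = Box(2.2, 48.8, 2.5, 48.9)
print(fully_enclosed(california, bay_area), fully_outside(california, paris))
```

Each test only compares box corners, so it can run against normalized coordinates stored in an index without parsing the source record.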

28 Overlaps search

29 Fully Enclosed Search

30 Map-Based Search

31 GeoSearch Web Interface

32 XML Schemas and Element Retrieval

33 XML Schema Support
– XML Schemas can now be used to define the data contents
– Tested with a wide variety of schemas including METS (with various supporting schemas)

34 XML Element Extraction
– A new search “ElementSetName” is XML_ELEMENT_
– Any XPath, element name, or regular expression can be included following the final underscore when submitting a present request
– The matching elements are extracted from the records matching the search and delivered in a simple format.

35 XML Extraction
  % zselect sherlock
  372 {Connection with SHERLOCK (sherlock.berkeley.edu) database 'bibfile' at port 2100 is open as connection #372}
  % zfind topic mathematics
  {OK {Status 1} {Hits 26} {Received 0} {Set Default} {RecordSyntax UNKNOWN}}
  % zset recsyntax XML
  % zset elementset XML_ELEMENT_Fld245
  % zdisplay
  {OK {Status 0} {Received 10} {Position 1} {Set Default} {NextPosition 11} {RecordSyntax XML 1.2.840.10003.5.109.10}} { Singularitâes áa Cargáese … etc…
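
The effect of an element-set request such as `XML_ELEMENT_Fld245` can be imitated on the client side to show what is being asked for. This is a minimal Python sketch under stated assumptions: `extract_elements` is a hypothetical helper, the `USMARC` wrapper and `a` subelement are made-up record structure, and only plain element names are handled (no XPath or regular-expression selectors).

```python
import xml.etree.ElementTree as ET

def extract_elements(records, element_name):
    """Return the serialized elements named element_name from each record.

    records is an iterable of XML strings, one per matching record.
    """
    extracted = []
    for record in records:
        root = ET.fromstring(record)
        # iter() walks the whole tree, including the root element itself.
        for element in root.iter(element_name):
            extracted.append(ET.tostring(element, encoding="unicode").strip())
    return extracted

records = [
    "<USMARC><Fld245><a>Title one</a></Fld245><Fld260>x</Fld260></USMARC>",
    "<USMARC><Fld245><a>Title two</a></Fld245></USMARC>",
]
print(extract_elements(records, "Fld245"))
```

Doing the extraction on the server, as the slides describe, avoids shipping whole records when the client only needs one field.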

36 MySQL and PostgreSQL

37 RDBMS Support
– There are two reasons for RDBMS support
  – IR systems are not meant for LOTS of update transactions
  – Some applications need to have access to both relational data and text data via Z39.50
– Both MySQL and PostgreSQL are popular open source RDBMS, and either can now be used via Cheshire
  – Z39.50 mappings to RDBMS columns
  – “ZQL” submission of SQL as Z39.50 Type 0 query

38 Protocol Support

39 Protocols
– In Cheshire II most protocols (except Z39.50) are implemented using scripting
– Example scripts to support the following are included in the distribution
  – OAI
  – SRW (Python version)
  – SOAP
  – SDLIP

40 CORI, Okapi BM-25 ranking algorithms

41 Why additional ranking methods
– CORI is extremely hard to beat as a distributed search method
– OKAPI BM-25 is now the “default” retrieval algorithm in experimental IR
– New operators (later) let us mix and match ranking methods and Boolean operations

42 CORI ranking
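
The equation on this slide is not preserved in the transcript. As a hedged illustration, the sketch below follows the form of CORI collection scoring commonly given in the literature (Callan and colleagues); the constants 50, 150 and b = 0.4 are the usual published defaults, not values taken from the slide, and the function and parameter names are hypothetical.

```python
import math

def cori_belief(df, cf, cw, avg_cw, num_collections, b=0.4):
    """Belief that one collection satisfies a single query term.

    df:     documents in this collection that contain the term
    cf:     number of collections that contain the term
    cw:     number of words in this collection
    avg_cw: mean cw over all collections
    """
    if cf == 0:
        return b  # term unseen anywhere: fall back to the default belief
    t = df / (df + 50 + 150 * cw / avg_cw)
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return b + (1 - b) * t * i

def cori_score(term_stats, num_collections):
    """Average the per-term beliefs to score one collection for a query."""
    beliefs = [cori_belief(num_collections=num_collections, **s) for s in term_stats]
    return sum(beliefs) / len(beliefs) if beliefs else 0.0

stats = [{"df": 120, "cf": 40, "cw": 50000, "avg_cw": 60000},
         {"df": 0, "cf": 5, "cw": 50000, "avg_cw": 60000}]
print(cori_score(stats, num_collections=236))
```

The T component plays the role of a term frequency normalised by collection size and the I component that of an inverse collection frequency, which is why the method transfers document-ranking intuition to ranking collections.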

43 Okapi BM25
– Where:
  – Q is a query containing terms T
  – K is k1 ((1 - b) + b · dl/avdl)
  – k1, b and k3 are parameters, usually 1.2, 0.75 and 7-1000
  – tf is the frequency of the term in a specific document
  – qtf is the frequency of the term in a topic from which Q was derived
  – dl and avdl are the document length and the average document length measured in some convenient unit
  – w(1) is the Robertson-Sparck Jones weight.
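
The formula that these definitions refer to is missing from the transcript. The sketch below uses the widely published Okapi BM25 form, in which each query term contributes w(1) · (k1 + 1)tf / (K + tf) · (k3 + 1)qtf / (k3 + qtf). It assumes no relevance information, so w(1) reduces to log((N - n + 0.5) / (n + 0.5)); the function names and the k3 = 7 default are illustrative choices, not details from the slide.

```python
import math

def rsj_weight(n_docs, doc_freq):
    """Robertson-Sparck Jones weight w(1) with no relevance information."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, n_docs, doc_freqs,
               k1=1.2, b=0.75, k3=7.0):
    """Sum the BM25 contribution of every query term T in Q.

    query_tf:  term -> qtf (frequency in the topic)
    doc_tf:    term -> tf (frequency in this document)
    doc_freqs: term -> number of documents containing the term
    """
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        w1 = rsj_weight(n_docs, doc_freqs[term])
        score += w1 * ((k1 + 1) * tf / (K + tf)) * ((k3 + 1) * qtf / (k3 + qtf))
    return score

score = bm25_score({"cat": 1, "catalan": 1}, {"cat": 3}, doc_len=120,
                   avg_doc_len=100, n_docs=1000, doc_freqs={"cat": 27, "catalan": 19})
print(round(score, 3))
```

Because K grows with dl/avdl, the same tf counts for less in a longer-than-average document, which is the length normalisation that b controls.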

44 Result Set sorting, merging and ranking operators, bitmapped indexes

45 Sorting
– Support for Z39.50 Sort functions
– Merge multiple resultsets and sort new set
  – Sort by index name/key (ATTRIBUTE)
  – Sort by rank (ELEMENTS)
    – Merges ranked results and Boolean results
  – Sort by XML/SGML Tag contents (TAG)

46 Merging and Ranking Operators
– Extends the capabilities of merging to include merger operations in queries like Boolean operators
– Fuzzy Logic Operators
  – !FUZZY_AND
  – !FUZZY_OR
  – !FUZZY_NOT
– Restrict components to particular parents
– Merge Operators
  – !MERGE_SUM
  – !MERGE_MEAN
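
The slides name these operators without defining them, so the following Python sketch is an assumption about their behaviour rather than a description of Cheshire II. It uses the textbook fuzzy-set definitions (minimum, maximum, complement) and simple score arithmetic, with each result set modelled as a dict from record id to a score in [0, 1].

```python
def fuzzy_and(a, b):
    """Records in both sets; score is the smaller of the two scores."""
    return {r: min(a[r], b[r]) for r in a.keys() & b.keys()}

def fuzzy_or(a, b):
    """Records in either set; score is the larger available score."""
    return {r: max(a.get(r, 0.0), b.get(r, 0.0)) for r in a.keys() | b.keys()}

def fuzzy_not(a, b):
    """Records of a, discounted by how strongly they also match b."""
    return {r: min(s, 1.0 - b.get(r, 0.0)) for r, s in a.items()}

def merge_sum(a, b):
    """Union of both sets; scores of shared records are added."""
    return {r: a.get(r, 0.0) + b.get(r, 0.0) for r in a.keys() | b.keys()}

def merge_mean(a, b):
    """Union of both sets; a record missing from one set counts as 0 there."""
    return {r: s / 2.0 for r, s in merge_sum(a, b).items()}

ranked = {"r1": 0.9, "r2": 0.4}
other = {"r2": 0.7, "r3": 0.2}
print(sorted(merge_mean(ranked, other).items()))
```

A Boolean result set fits the same model if its members are given score 1.0, which is one way ranked and Boolean results could be mixed in a single query.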

47 Bitmapped Indexes
– Bitmap indexes can be used for Boolean operations where the data has only a few values and very large numbers of items with each value
– Only one bit per record stored in the index
– Processed on a demand basis so only blocks with the bits needed to resolve a query are fetched
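
The idea on this slide can be illustrated with a small in-memory bitmap index. This is a sketch only: it uses Python integers as bitsets, the class and method names are hypothetical, and it does not model the block-on-demand fetching that the slide describes.

```python
class BitmapIndex:
    """One bitmap per distinct value; bit i is set when record i has that value."""

    def __init__(self):
        self.bitmaps = {}      # value -> int used as an arbitrary-length bitset
        self.num_records = 0

    def add(self, record_id, value):
        self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << record_id)
        self.num_records = max(self.num_records, record_id + 1)

    def lookup(self, value):
        return self.bitmaps.get(value, 0)

    def negate(self, bitmap):
        """Complement a bitmap within the range of known records."""
        return ~bitmap & ((1 << self.num_records) - 1)

    def records(self, bitmap):
        """Expand a bitmap back into a sorted list of record ids."""
        return [i for i in range(self.num_records) if (bitmap >> i) & 1]

language = BitmapIndex()
form = BitmapIndex()
for rec, (lang, kind) in enumerate([("eng", "book"), ("fre", "book"),
                                    ("eng", "map"), ("eng", "book")]):
    language.add(rec, lang)
    form.add(rec, kind)

# Boolean AND across two indexes is a single bitwise operation.
print(language.records(language.lookup("eng") & form.lookup("book")))
```

With only a handful of distinct values, each bitmap is dense, so AND, OR and NOT over millions of records reduce to word-wide bit operations.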

48 Cheshire III Design and Development

49 Cheshire III Goals
– Retain or reproduce (and refine) all Cheshire II features
  – “Spring cleaning” of code base
  – Add Full Unicode Support
  – Store most system and content data in the database
– Permit easy and efficient integration in Web Services
– Use threaded server for economy of resource usage
– Enhanced Multiprotocol support
– Support for distributed processing (i.e. GRID clusters)
– Enhance expandability and “drop in” functionality
– Interfaces and/or APIs for Java, Python, C/C++

50 Cheshire II Design Overview
[Figure: Cheshire II design diagram. Labels: XML DOCS; XML DIRECTORY; INDEX CLUSTER; INDEX CHESHIRE; CONT BUILD; ASSOC; Z SERVER; CONFIG; COMPONENT DEFINITION; INDEX(S); ASSOC CLUSTER; EXTENSION.]

51 Cheshire III Server Overview
[Figure: Cheshire III server architecture diagram. Recoverable labels: Cheshire III Server; API; Indexing; Search; Scan; Clustering; Normalization; XSLT Record Transforms; Protocol Handler; DB API; Remote Systems (any protocol); XML Config & Metadata Info; Indexes; Local DB; Staff UI Config; Network; Result Sets; User Info; Config & Control; Access Info; Authentication; Apache Interface; Server Control; Native calls; Z39.50; SOAP; OAI; JDBC; Fetch ID; Put ID; OpenURL; UDDI; WSRP; SRW; OGIS; Client; User/Clients.]

52 [Figure: the Cheshire III server architecture diagram from slide 51, repeated.]

53 Retain Features
– The intent is to permit all of the types of indexing, searching and record formatting available now, while making it easier to add new capabilities
– The new system will also support full UNICODE for content and for metadata
– Store metadata and content in the database (including config information, etc.)

54 Permit easy integration of Web Services
– The assumption is that the web server will be the central server mechanism in the future.
– The new design relies on the session handling, threading and load management tools available in Apache (2.0.40+)
– The Cheshire server is dynamically loaded as part of the Web Server

55 Multiprotocol Support
– The Web server handles the network issues and passes requests in various protocols along to the Cheshire Server.
– Individual Protocol “plugins” and the Protocol Handler convert search, display, and metadata requests in a particular protocol to the internal Cheshire III control language, and convert outgoing messages and data to the appropriate protocol form

56 Distributed Processing (RESEARCH)
– The server will support protocols for interchange of partial results and collection statistics with a single “Master” controlling the actions of a large number of “Slave” servers
– These will run in parallel in a GRID environment
– This is still “research” but will probably be using “Storage Grid” technology from SDSC with our own applications
– Non-Grid use of the same protocols, etc. will be possible (but definitely slower)

57 Enhanced Expandability
– Clearly defined APIs for interacting with the server will permit easy addition of new functionality, and replacement or upgrade of existing functionality
– Interactive user interface for database configuration and setup
  – We want to make it easier for a user/administrator to create and manage the database

58 Multilingual APIs
– The system is being developed in a multilingual environment.
– We will include the ability to interface with (at a minimum) Java, Python and C/C++ applications.
– APIs for developing new functions will be available in these languages as well

59 Development
– Currently work is going on here (RRL) and (primarily) in the UK
– We have incomplete (Alpha) versions of the system, but haven’t been distributing them in the current form (changing constantly)
– First release version is expected in mid-’04

60 For More Information
– http://Cheshire.berkeley.edu
– ftp://Cheshire.berkeley.edu for source
– Contact ray@sims.berkeley.edu

