
1 Pears, Gwen and JZKit Training

2 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building Databases
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

3 Pears Database Building: Introduction
Pears provides tools that allow you to:
– Build databases from structured data such as:
  – MARC, which has a defined standard structure.
  – XML, which has a loose structure but clearly identified fields.
– Determine each index for the database.
– Load the records into a database following your indexing definitions.

4 Pears Database Building: Exercise Preview
– View the structure of a small set of MARC records.
– Build a small database from those records.
– Look at the database description file that has been set up.
– Build the database.
– Test the database for correctness using testgwen.
– Add the database to the JZKit configuration files, making it searchable by a Z39.50 client.

5 Pears Database Building: The Gwen Search Engine
The Gwen search engine is a generalized text retrieval engine.
Its functionality is contained in Java classes that can be embedded in Java applications, including the JZKit Z39.50 Server.
The JZKit server allows multiple simultaneous users, each using a client program that supports the Z39.50 protocol, to browse, search and display records from Pears databases.

6 Pears Database Building: Logical and Physical Databases
A Gwen Database is a logical database
– It provides features for searching and retrieving records
A Pears Database is a physical database
– It provides the information that a Gwen database needs

7 Pears Database Building: Gwen Database Features
A Gwen Database has:
– Indexes with numeric IDs
– Index Terms with Postings Lists
– Postings Lists have Record Numbers and Restrictor Data

8 Pears Database Building: What is a Pears Database?
A Pears database is a single physical file with three main kinds of data:
– Record data
– Index data
– Postings data

9 Pears Database Building: Record Data
– Contains the actual records of your database.
– Records are stored as BER-encoded records.
– Each record is identified by a unique logical record number.

10 Pears Database Building: Index Data
Contains a sorted list of all the Index Terms extracted from your data records.
Index Terms contain:
– term/index-id.
– number of records that the term appears in (postings count).
– a list of records that contain that term, or a pointer to such a list.

11 Understanding the Database Structure
INDEX
  abercrombie: au: postings=2, postings list=r17, r15
  anderson: au: postings=102, postings list ID=l21

12 Pears Database Building: Postings Data
– Contains a list of record IDs for each of the terms in the index.
– Each record ID may have restrictor and proximity information associated with it.

13 Understanding the Database Structure
INDEX
  abercrombie: au: postings=2, postings list=r17, r15
  anderson: au: postings=102, postings list ID=l21
POSTINGS
  l21: r1024, r1021, r1007, r995, …

14 Understanding the Database Structure
INDEX
  abercrombie: au: postings=2, postings list=r995, r175
  anderson: au: postings=102, postings list ID=l21
POSTINGS
  l21: r1024, r1021, r1007, r995, …
RECORDS
  r995: au: Abercrombie & Anderson ti: Tennis Made Easy yr: 1905
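The three kinds of data in this example can be mimicked with plain dictionaries. The following is only an illustrative Python sketch, not the way Pears lays out its file: the names `index`, `postings`, `records`, `lookup` and `search_and` are hypothetical, and the long postings list is cut to the four record numbers the slide shows.

```python
# Illustrative in-memory model of the slide's example; not the Pears file format.

# RECORDS: record number -> record data
records = {
    995: {"au": "Abercrombie & Anderson", "ti": "Tennis Made Easy", "yr": "1905"},
}

# POSTINGS: postings list ID -> record numbers (truncated to what the slide shows)
postings = {"l21": [1024, 1021, 1007, 995]}

# INDEX: (term, index id) -> postings count plus an inline list or a list ID
index = {
    ("abercrombie", "au"): {"postings": 2, "records": [995, 175]},
    ("anderson", "au"): {"postings": 102, "list_id": "l21"},
}


def lookup(term, index_id):
    """Return the record numbers posted for a term, following a list ID if needed."""
    entry = index.get((term, index_id))
    if entry is None:
        return []
    if "records" in entry:
        return entry["records"]
    return postings[entry["list_id"]]


def search_and(term_a, term_b, index_id):
    """Boolean AND: record numbers that appear in both postings lists."""
    return sorted(set(lookup(term_a, index_id)) & set(lookup(term_b, index_id)))


hits = search_and("abercrombie", "anderson", "au")
titles = [records[r]["ti"] for r in hits if r in records]
```

A short list is kept inline with the term, while a long one is reached through its list ID, which is the distinction the INDEX and POSTINGS parts of the slide draw.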

15 Pears Database Building: Data Conversion
– The Bartlett class is responsible for updating a Pears database.
– Bartlett automatically converts input records to the Pears internal BER format.
– The objects that do the conversion are called RecordHandlers.
– RecordHandler is a Java Interface class
  – You can write your own RecordHandlers!

16 Pears Database Building: Data Conversion Options
There are two primary Pears RecordHandlers that convert your data to BER format:
– HandleUSMARC
– HandleSGML
There are several others:
– HandleBER, HandleDB, HandlePDB, HandleUnimarc, HandleChinaMarc

17 Pears Database Building: Data Conversion
The RecordHandler class has a main() method that you can use to test RecordHandlers and/or your data.
– Usage: java ORG.oclc.RecordHandler.RecordHandler -c -i -o …
– Example: java ORG.oclc.RecordHandler.RecordHandler -cUSMARC -iscifi.usmarc -oscifi.ber -n10

18 Pears Database Building: Data Conversion
– BER (Basic Encoding Rules) is defined by ISO.
– It was created to encode ASN.1 records.
– Encodes tree-structured data (equivalent to DOM records).
– Can contain binary data (e.g. .jpeg files), unlike DOM records!

19 BER Record Structure
[Diagram: a root node tag=1 with children tag=2, tag=3 (Ohio) and tag=4 (OCLC); tag=2 in turn holds tag=1 (Ralph) and tag=2 (LeVan).]
tag=1, Class=1, form=1, count=3
  tag=2, Class=2, form=1, count=2
    tag=1, Class=2, form=0, count=5 data=Ralph
    tag=1, Class=2, form=0, count=5 data=LeVan
  tag=3, Class=2, form=0, count=4 data=Ohio
  tag=4, Class=2, form=0, count=4 data=OCLC
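To make the tag/Class/form/count notation concrete, here is a small Python sketch that rebuilds the record from this slide, prints it in the same style, and encodes it with the standard BER tag-length-value layout. It is an illustration under assumptions, not Pears code: `Node`, `dump` and `encode` are hypothetical names, `count` is taken to be the number of children for a constructed node (form=1) and the data length for a primitive node (form=0), as the dumps in these slides suggest, and only tag numbers below 31 are handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: int
    cls: int                      # BER class number: 1 = application, 2 = context-specific
    data: bytes = b""             # payload of a primitive node (form=0)
    children: List["Node"] = field(default_factory=list)  # contents of a constructed node (form=1)

    @property
    def form(self) -> int:
        return 1 if self.children else 0


def dump(node, depth=0):
    """Return lines in the tag/Class/form/count style used on the slide."""
    count = len(node.children) if node.form else len(node.data)
    line = "  " * depth + f"tag={node.tag}, Class={node.cls}, form={node.form}, count={count}"
    if not node.form:
        line += f" data={node.data.decode('ascii')}"
    lines = [line]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


def encode(node) -> bytes:
    """Standard BER identifier + definite length + content (low tag numbers only)."""
    if node.tag >= 31:
        raise ValueError("high tag numbers need the multi-byte tag form, omitted in this sketch")
    content = b"".join(encode(c) for c in node.children) if node.form else node.data
    identifier = bytes([(node.cls << 6) | (node.form << 5) | node.tag])
    n = len(content)
    if n < 128:
        length = bytes([n])                      # short form
    else:
        size = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(size)]) + size  # long form
    return identifier + length + content


# The record from the slide (both name parts use tag=1, as in the slide's dump).
record = Node(1, 1, children=[
    Node(2, 2, children=[Node(1, 2, b"Ralph"), Node(1, 2, b"LeVan")]),
    Node(3, 2, b"Ohio"),
    Node(4, 2, b"OCLC"),
])

dump_lines = dump(record)
encoded = encode(record)
```

Printing `"\n".join(dump_lines)` gives a listing in the same shape as the slide, while `encoded` shows why the encoded form is not human readable.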

20 Pears Database Building: MARC Data Example
For USMARC data (InputRecordType=USMARC):
000 nmm Ia 001 ocm OCoLC s1995 cau d eng d 040 $aFQM$cFQM 096 $aNTERNET $aOphthalmic Anesthesia Society $h[computer file]. 256$aComputer data. 260$a San Diego, CA : $b Ophthalmic Anesthesia Society, $c $aHtml text and images in GIF and JPeg. 538$aSystem requirements: Html browser, JPeg compatible browser or image viewer. 538$aMode of access: Internet. Host: 500$aTitle from title screen. 521$aMedical. 520$aHome page of the Ophthalmic Anesthesia Society with articles, references, addresses of members, pictures and ophthalmic anesthesia resources $aSocieties, Medical $aOphthalmology $aAnesthesia $aOphthalmic Anesthesia Society $u Anesthesia Society home page

21 HandleUSMARC converts this
cam ^^000000s1993e ng^_a /93/$06.00^^ ^_a ^^ ^_aPYLBAJ^^ ^_aA K-002^^ ^_aBrandenburg, A. ^^ ^_aMa, J.P.^^ ^_aInst. fur Theor. Phys., Heidelberg, Germany^^ ^_aCP odd observables for the top-antitop system produced at proton-antiproton and proton-proton colliders^^ ^_aNetherlands^_c7 Jan. 1993^^ ^_a SOURCE:Physics Letters B, vol.298, no.1-2, p ^^ ^_aTREATMENT: T; Theoretical or Mathematical^^ ^_aCLASS CODES: A1385K (Inclusive reactions, including total cross sections, (energy > 10 GeV))^_aA1110E (Lagrangian and Hamiltonian approach)^_aA1130E (Charge conjugation, parity, time reversal and other discret symmetries)^_aA1340F (Electromagnetic form factors; electric and magnetic moments; structure functions)^^ ^_ aThe authors propose some CP odd observables to test CP invariance in the tt system produced at pp and pp colliders. Using these observables the effects of CP violation from the production and from the decay of the top quarks can be separated well. The application of their observables to pp collisions, where one has no CP invariant initial state, is discussed. To parametrize CP violating interactions their use an effective lagrangian for the tt production and a general form factor approach for the decay of t and t (19 Refs.)^^ ^_aEnglish^^ ^_aCP invariance^^ ^_aform factors (elementary particles)^^ ^_aproton-proton inclusive interactions^^ ^_aquark production^^ ^_aantiproton+proton producing antitop+top^^ ^_aproton+proton producing antitop+top^^ ^_aCP odd observables^^ ^_aCP invariance^^ ^_aCP violating interactions^^ ^_aeffective lagrangian^^ ^_aform factor

22 ...to this
tag=0, Class=1, form=1, count=22
  tag=0, Class=2, form=0, count=8 data=nmm Ia
  tag=245, Class=2, form=1, count=3
    tag=0, Class=2, form=0, count=2 data=00
    tag=1, Class=2, form=0, count=29 data=Ophthalmic Anesthesia Society
    tag=8, Class=2, form=0, count=16 data=[computer file].
  tag=260, Class=2, form=1, count=4
    tag=0, Class=2, form=0, count=2 data=
    tag=1, Class=2, form=0, count=15 data=San Diego, CA :
    tag=2, Class=2, form=0, count=30 data=Ophthalmic Anesthesia Society,
    tag=3, Class=2, form=0, count=5 data=1995.
  tag=650, Class=2, form=1, count=2
    tag=0, Class=2, form=0, count=2 data= 2
    tag=1, Class=2, form=0, count=19 data=Societies, Medical.
  tag=650, Class=2, form=1, count=2
    tag=0, Class=2, form=0, count=2 data= 2
    tag=1, Class=2, form=0, count=14 data=Ophthalmology.
  tag=650, Class=2, form=1, count=2
    tag=0, Class=2, form=0, count=2 data= 2
    tag=1, Class=2, form=0, count=11 data=Anesthesia.
  tag=710, Class=2, form=1, count=2
    tag=0, Class=2, form=0, count=2 data=2
    tag=1, Class=2, form=0, count=30 data=Ophthalmic Anesthesia Society.

23 Pears Database Building: SGML Data Example
For SGML data (InputRecordType=SGML)
.tags file:
Title 1
Local-Subject-Index 2
Abstract 3
Spatial-Domain 4
Geographic-Coverage 1
Coverage-Description 2
Bounding-Coordinates 3
West-Bounding-Coordinate 1
East-Bounding-Coordinate 2
North-Bounding-Coordinate 3
South-Bounding-Coordinate 4
Time-Period 5
Time-Period-Textual 1
Name 6
Organization 7
Record data:
BEG - PANHANDLE COLOR INFRARED AERIAL PHOTOGRAPHY TNRIS file no File consists of original and duplicate positive transparencies, color-infrared, stereoscopic, 1:80,000, quad centered, aerial photography of the Texas Panhandle, flown in September, 1977 by Mark Hurd. US STATE TEXAS PANHANDLE BUREAU OF ECONOMIC GEOLOGY

24 Converted SGML
tag=0, Class=1, form=1, count=8
  tag=1, Class=2, form=1, count=1
    tag=1, Class=2, form=0, count=49 data=BEG - PANHANDLE COLOR INFRARED AERIAL PHOTOGRAPHY
  tag=2, Class=2, form=1, count=1
    tag=1, Class=2, form=0, count=35 data=AERIAL PHOTOGRAPHY; INFRARED; TEXAS
  tag=3, Class=2, form=1, count=1
    tag=1, Class=2, form=0, count=229 data=TNRIS file no File consists of original and duplicate positive transparencies, color-infrared, stereoscopic, 1:80,000, quad centered, aerial photography of the Texas Panhandle, flown in September, 1977 by Mark Hurd.
  tag=4, Class=2, form=1, count=3
    tag=1, Class=2, form=1, count=1
      tag=1, Class=2, form=0, count=8 data=US STATE
    tag=2, Class=2, form=1, count=1
      tag=1, Class=2, form=0, count=15 data=TEXAS PANHANDLE
    tag=3, Class=2, form=1, count=4
      tag=1, Class=2, form=1, count=1
        tag=1, Class=2, form=0, count=4 data=-102
      tag=2, Class=2, form=1, count=1
        tag=1, Class=2, form=0, count=3 data=-98
      tag=3, Class=2, form=1, count=1
        tag=1, Class=2, form=0, count=2 data=30
      tag=4, Class=2, form=1, count=1
        tag=1, Class=2, form=0, count=2 data=26
  tag=5, Class=2, form=1, count=1
    tag=1, Class=2, form=1, count=1
      tag=1, Class=2, form=0, count=9 data=
  tag=6, Class=2, form=1, count=1
    tag=1, Class=2, form=0, count=26 data=BUREAU OF ECONOMIC GEOLOGY
  tag=7, Class=2, form=1, count=1
    tag=1, Class=2, form=0, count=26 data=BUREAU OF ECONOMIC GEOLOGY

25 Pears Database Building: Viewing a BER Record with BufferedBerStream
BER records are not readable in their encoded form.
BufferedBerStream is a class whose main() method dumps BER records in a human-readable format.
usage: BufferedBerStream -i [-n ] [-s ]
To see a page at a time: BufferedBerStream -i | more
To dump to a file: BufferedBerStream -i > filename

26 Exercise Configuration Information
The database is in ~/dbs/scifi
The jar files are in ~/jars
Aliases are:
alias Bartlett 'java -Xmx800m ORG.oclc.pears.Bartlett.Bartlett'
alias BufferedBerStream 'java ORG.oclc.ber.BufferedBerStream'
alias IndexLoop 'java ORG.oclc.pears.util.IndexLoop'
alias RecordHandler 'java ORG.oclc.RecordHandler.RecordHandler'
alias testgwen 'java ORG.oclc.os.gwen.testgwen'
alias validate 'java ORG.oclc.pears.util.validate'
alias ZClient 'java com.k_int.z3950.client.ZClient'
alias ZServer 'java com.k_int.z3950.server.ZServer'

27 Exercise Configuration Information
The CLASSPATH is:
setenv CLASSPATH .:/home/levan/java:/home/levan/lib/pears.jar:/home/levan/lib/Dbutils.jar:/home/levan/lib/ki-jzkit-z3950.jar:/home/levan/lib/ki-util.jar:/home/levan/lib/log4j.jar:/home/levan/lib/a2jruntime.jar:/home/levan/lib/ki-jzkit-iface.jar:/home/levan/lib/gwen.jar:/home/levan/lib/xerces.jar
All of this is in ~/.tcshrc. Run tcsh at the command line to pick it up.

28 Pears Database Building: Exercise
Exercise 1: Identifying Data in a BER Record
Using the BER records generated from the MARC data file dbs/scifi/scifi.usmarc, identify the tags used for the data.
(Hint: run RecordHandler to make the BER records and then BufferedBerStream to look at them.)

29 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building Databases
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

30 Database Description File: Function
The database description is a text file that you set up to determine:
– Database indexing
– Which indexes support proximity searching
– Which index contains the unique recordID
It is known as the desc.ini file.

31 Database Description File: File Example
[DB]
Name=scifi    (Database Name)
RecordIDIndex=17    (Accession index)
InputRecordType=USMARC    (Raw Data Type)
[Title]    (Index definitions)
index=1    (Index ID)
routine=ORG.oclc.pears.IndexRoutines.Words    (Indexing Routine)
tagpath*=245/1    (Field to be indexed)
tagpath*=245/2
[Author]
index=3
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath*=100/1
tagpath*=100/2
tagpath*=700/1
[Control Number]
index=5
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath=1
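Because keys such as tagpath* repeat within a section, a stock INI reader that rejects duplicate keys is a poor fit for this file. The Python sketch below shows one way to read such a description for inspection; it is not the parser Bartlett uses. `parse_desc` and `index_sections` are hypothetical names, and two behaviours are assumptions: key names are treated as case-insensitive (the slides write both index= and Index=), and a trailing * is taken to mean "collect every occurrence into a list".

```python
DESC = """\
[DB]
Name=scifi
RecordIDIndex=17
InputRecordType=USMARC
[Title]
index=1
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath*=245/1
tagpath*=245/2
[Control Number]
index=5
routine=ORG.oclc.pears.IndexRoutines.Words
tagpath=1
"""


def parse_desc(text):
    """Parse a desc.ini-style text into {section: {key: value or list of values}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip().lower(), value.strip()
        if key.endswith("*"):
            # Assumed meaning of the star: the key may repeat, so keep a list.
            current.setdefault(key[:-1], []).append(value)
        else:
            current[key] = value
    return sections


def index_sections(sections):
    """Names of sections that carry index, routine and tagpath entries."""
    needed = {"index", "routine", "tagpath"}
    return [name for name, keys in sections.items() if needed <= set(keys)]


sections = parse_desc(DESC)
index_names = index_sections(sections)
```

Here `index_names` lists the sections that would become indexes, which is a quick way to check a description file before building.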

32 Database Description File: General Database Information
The [DB] section provides the database name, accession index and input record type.
Syntax:
– [DB]
– Name=
– RecordIDIndex=
– InputRecordType=

33 Database Description File: General Database Information
Example:
[DB]
Name=Test
RecordIDIndex=1
InputRecordType=SGML

34 Database Description File: Setting up Index Definitions
Any number of independent indexes can be defined.
An index can be made from multiple fields.
– Example: index 1 may include title, author, notes, etc.
Indexes can share fields.
– Example: index 2 may also include title

35 Database Description File: Setting up Index Definitions
An index section is any section with Index, Routine and Tagpath.
Syntax:
– [ ]
– Index=
– Routine=
– Tagpath*=
– OccurrenceRoutine=

36 Database Description File: Setting up Index Definitions
– index number is any number
– Index routine defines how the term is extracted
  - use ORG.oclc.pears.IndexRoutines.Words for basic keywords
  - use ORG.oclc.pears.IndexRoutines.Phrase for basic bound phrases
– path to field contains a list of BER tags separated by slashes
– occurrence routine (optional) specifies the routine to add proximity information to the index

37 Database Description File: Index Definition
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
Tagpath*=245/1
Tagpath*=245/2

38 Database Description File: Term Adjacency (Optional)
– Defines positional information stored with each indexed term.
– Adjacency information is stored at build time on a per-record basis, so adjacency holds within fields, NOT across field boundaries.
– Set by the OccurrenceRoutine.
– ORG.oclc.pears.Bartlett.wordfield is most commonly used.

39 Database Description File: Index Definition with Adjacency
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
Tagpath*=245/1
Tagpath*=245/2
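The rule that adjacency holds within a field but not across field boundaries can be shown with a short Python sketch. This illustrates the idea only; the slides do not describe what the wordfield routine actually stores, so the (field number, word number) pairs and the names `word_positions` and `adjacent` are assumptions made for the example.

```python
def word_positions(fields):
    """Map each word to the (field number, word number) pairs where it occurs."""
    positions = {}
    for field_no, text in enumerate(fields):
        for word_no, word in enumerate(text.lower().split()):
            positions.setdefault(word, []).append((field_no, word_no))
    return positions


def adjacent(positions, first, second):
    """True if `second` directly follows `first` inside the same field."""
    second_places = set(positions.get(second, []))
    return any((f, w + 1) in second_places for f, w in positions.get(first, []))


# Two fields of one record, e.g. the data found at two tagpaths.
one_field = word_positions(["Tennis Made Easy"])
two_fields = word_positions(["Tennis Made", "Easy"])

same_field_hit = adjacent(one_field, "made", "easy")
cross_field_hit = adjacent(two_fields, "made", "easy")
```

With the words split over two fields, "made" and "easy" are no longer adjacent even though they would be neighbours if the fields were concatenated.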

40 Database Description File: Global Stopwords
List of terms NOT indexed.
Syntax:
[Stopwords]
index=0
routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
tagpath=none
stopword*=

41 Database Description File: Global Stopwords
Example:
[Stopwords]
index=0
routine=ORG.oclc.pears.IndexRoutines.StopwordEnforcer
tagpath=none
stopword*=and
stopword*=the

42 Database Description File: Index Specific Stopwords
Syntax:
[ ]
Index=
Routine=
Tagpath*=
Stopword*=

43 Database Description File: Index Definition with Stopwords
Example:
[Title Words]
Index=2
Routine=ORG.oclc.pears.IndexRoutines.Words
OccurrenceRoutine=ORG.oclc.pears.Bartlett.wordfield
Tagpath*=245/1
Tagpath*=245/2
Stopword*=and
Stopword*=the
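The effect of a stopword list on a keyword index can be sketched in a few lines of Python. This is not the ORG.oclc.pears.IndexRoutines.Words routine: the tokenization rule (lowercase runs of letters and digits) and the name `extract_words` are assumptions chosen only to show stopwords being dropped before terms reach the index.

```python
import re


def extract_words(text, stopwords=()):
    """Split text into lowercase keywords, leaving out any stopwords."""
    stop = {word.lower() for word in stopwords}
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stop]


# Stopwords from the example above: and, the
title_terms = extract_words("The Cat and the Hat", stopwords=["and", "the"])
all_terms = extract_words("The Cat and the Hat")
```

Comparing `title_terms` with `all_terms` shows which terms never get a postings list when the stopwords are in force.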

44 Database Description File: Exercise
Exercise 2: Identifying Database Description Indexes
View the database description file (dbs/scifi/scifidesc.ini) that has been created for your student account.
Identify what indexes will be created from this file.

45 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building A Database
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

46 Building A Database: Program Steps
1.) Convert Input Data
2.) Store Records and Extract Index Terms
3.) Sort Extracted Terms
4.) Update Index and Postings
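The four steps can be traced in a toy Python pipeline. It is a sketch of the sequence only, not of Bartlett: `build`, `convert` and `extract` are hypothetical names, the pipe-delimited input format is invented for the example, and the index ids 1 (title) and 3 (author) are borrowed from the example description file.

```python
def convert(raw):
    """Step 1 stand-in: turn 'au=Smith|ti=Dog Stories' into a dict."""
    return dict(part.split("=", 1) for part in raw.split("|"))


def extract(record):
    """Step 2 stand-in: yield (term, index id) pairs for title and author words."""
    for word in record.get("ti", "").lower().split():
        yield word, 1
    for word in record.get("au", "").lower().split():
        yield word, 3


def build(raw_records):
    records = {}
    extracted = []
    for number, raw in enumerate(raw_records, start=1):
        record = convert(raw)                      # 1. convert input data
        records[number] = record                   # 2. store the record ...
        for term, index_id in extract(record):     #    ... and extract index terms
            extracted.append((term, index_id, number))
    extracted.sort()                               # 3. sort extracted terms
    index = {}
    for term, index_id, number in extracted:       # 4. update index and postings
        plist = index.setdefault((term, index_id), [])
        if not plist or plist[-1] != number:       # skip repeats within a record
            plist.append(number)
    return records, index


records, index = build(["au=Smith|ti=Dog Stories", "au=Jones|ti=Dog Tales"])
```

Sorting before the update means each term's postings arrive together and in record order, so every index entry is written once rather than revisited for each record.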

47 Building a Pears Database: Program Steps - Illustrated
[Diagram: Bartlett takes the database description (desc.ini) and the input data and produces the database (.pdb file).]

48 Building A Database: Bartlett
usage: Bartlett -i -d [-n ] [-s ] [-t ] [-w ] [-fX]
where the -f flags (which turn things on) are:
  -fg: guaranteed that all records are adds
  -fn: printing to a file / use newlines
  -fu: update the stored database description with a new one
All of the arguments are optional, but somehow you must specify an input file and a database file. If you specify then the others default to -i.recordType and -d desc.ini

49 Building A Database: Validate a Database
Use validate to verify the internal correctness of a database.
usage: java validate [-count] [-records] [-index] [-data] [-postings] [-regions] [-all]
  -count means validate the record count
  -records means validate the records and implies -count
  -index means validate the index structure
  -data means validate the data for each index term and implies -index
  -postings means validate the postings list for each term and implies -data
  -all means validate everything

50 Building a Database: Exercise 3
Build and validate the scifi database
– cd dbs/scifi
– type: Bartlett scifi
– type: validate scifi -all

51 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building A Database
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

52 Configuring and Testing: Test using testgwen
testgwen is a command-line search engine that demonstrates how to embed searching in your Java applications.
usage: testgwen -p

53 Configuring and Testing: Test using testgwen
scifi.properties:
database.name=scifi
implementation.class=ORG.oclc.os.pearsgwen.pDatabase
pearsgwen.inifileName=scifi.ini
#CQL Stuff
qualifier.srw.serverChoice= 1=1016
qualifier.dc.title= 1=4
structure.*= 4=6

54 Configuring and Testing: Test using testgwen
scifi.ini:
[Database]
ZBaseDbType=ORG.oclc.db.DbNewton
class=ORG.oclc.pears.pears
dbName= scifi
LongName = SiteSearch example USMARC database
pdbFile=scifi.pdb
# this allows for more than 1 attribute type BIB1, EXP1, ZDSR
[attributes]
type1=BIB1attributes

55 Configuring and Testing: Test using testgwen
(scifi.ini continued)
[BIB1attributes]
OID=BIB1
default=words
parse_mode = 0
browse_default=0
stopwords=
default operator= 0
index* = titleWords
index* = subjectCategoryCodes
index* = authorWords
index* = titlePhrase
…

56 Configuring and Testing: Test using testgwen
(scifi.ini continued)
[titleWords]
use=4
structure=2
alternateID=1
filter=ORG.oclc.pears.IndexRoutines.Words
[subjectCategoryCodes]
use=20
structure=2
alternateID=2
filter=ORG.oclc.pears.IndexRoutines.Words

57 Configuring and Testing: Test using testgwen
testgwen commands:
BROWSE
  b[rowse] [numberOfTerms] [positionOfSeed]
  numberOfTerms defaults to 10
  positionOfSeed defaults to numberOfTerms/2
  example: b dc.author=smith
SEARCH
  s[earch]
  example: s dog
DISPLAY DOCUMENT
  d[ocument] [startpoint][-endpoint]
  startpoint defaults to 1
  endpoint defaults to 1
  example: d 1

58 Configuring and Testing: testgwen testing suggestions
Test the indexes with the browse command
– Browse the top and bottom of the index; garbage in the records tends to go there
– Browse all of your indexes to verify the indexing rules
Test the postings lists with searches

59 Configuring and Testing: testgwen testing suggestions
Test the records with display commands
– e.g. d 1 to view the first record from the latest search

60 Configuring and Testing: Exercise 4
Test your scifi database using testgwen
testgwen -pscifi.properties
b dog
b dc.author=smith
s dc.title=ninja turtles
d
q

61 Configuring and Testing: Expose your database using JZKit's ZServer
JZKit is an open source Z39.50 server and client package
– http://www.k-int.com/products/jzkit/index.php
We have embedded Gwen inside the JZKit server through database interfaces provided in JZKit.
This allows the JZKit server to search Pears databases.

62 Configuring and Testing: Expose your database using JZKit's ZServer
Usage: ZServer
ZServer.props:
port=2105
evaluator=ORG.oclc.os.jzkit.GwenSearchable
Gwen.configuration=gwen.properties
#
# Record conversion configuration
#
XSLConverterConfiguratorClassName=com.k_int.IR.Syntaxes.Conversion.XMLConfigurator
ConvertorConfigFile=./SchemaMappings.xml

63 Configuring and Testing: Expose your database using JZKit's ZServer
gwen.properties:
gwen.db1=scifi.properties
scifi.properties:
The same as for testgwen!

64 Configuring and Testing: Expose your database using JZKit's ZServer
Converting your database records to Z39.50 records:
SchemaMappings.xml:

65 Configuring and Testing: Search your database using JZKit's ZClient
usage: ZClient
Commands:
  open hostname[:portnum] - Connect to z server on host[:port]
  show n[+i] - show i records starting at n
  find [rpn-string] - Process the supplied rpn query
  base db1 [db2.....] - Search the specified databases
  format [ xml|sutrs|grs..] - Ask the server for the specified kind of records
  scan [rpn-string]

66 Configuring and Testing: Search your database using JZKit's ZClient
usage: ZClient
rpn strings are composed as follows:
  rpn-string default-attrset expr
  expr = [ attr-plus-term | boolean ]
  attr-plus-term = attrdef [ attrdef...] { single-term | "quoted string" }
  attrdef [attrset] attrtype=attrval
  boolean = } expr expr

67 Configuring and Testing: Exercise 5
Start ZServer
– ZServer ZServer.props &
Test the database files with ZClient
– ZClient
– open localhost:2105
– base scifi 4=2 dog
– quit

68 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building Databases
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

69 Database Utilities and Maintenance: General Database Information Report
IndexLoop:
usage: java IndexLoop [-b ][-d ][-i ] [-n ] [-t ] [-f]
  -b the number of terms from the bottom of the index to be returned (default is 0)
  -d the number of terms distributed through the index to be returned (default is 0)
  -n the number of the most highly posted terms to be returned (default is 100)
  -t the number of terms from the top of the index to be returned (default is 0)

70 Database Utilities and Maintenance: Exercise 6
Using the Database Utilities
Run IndexLoop against the scifi database
– IndexLoop scifi

71 Database Utilities and Maintenance: Recovering From Error Conditions
– During the update process, changes to the database are written to a journal (.pdb.journal).
– At the end of the update, the journal is committed to the database.
– Errors that occur before the journal is committed can be undone by deleting the journal.
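The journal idea can be illustrated with a tiny in-memory Python model. It is a sketch of the general journal-then-commit pattern, not of the Pears journal format: the class `JournaledDatabase` and its method names are hypothetical, and dictionaries stand in for the .pdb and .pdb.journal files.

```python
class JournaledDatabase:
    """Toy model: changes go to a journal first and reach the database only on commit."""

    def __init__(self):
        self.regions = {}      # stands in for the .pdb file
        self.journal = None    # stands in for the .pdb.journal file

    def begin_update(self):
        self.journal = {}

    def write(self, region, data):
        # During the update only the journal changes; the database is untouched.
        self.journal[region] = data

    def discard_journal(self):
        # "Deleting the journal" undoes everything written since begin_update().
        self.journal = None

    def commit(self):
        # At the end of the update the journal is applied to the database.
        self.regions.update(self.journal)
        self.journal = None


db = JournaledDatabase()
db.begin_update()
db.write("r1", "first version")
db.commit()

db.begin_update()
db.write("r1", "second version")   # an error strikes before the commit ...
db.discard_journal()               # ... so the journal is thrown away
```

After the discarded update, `db.regions` still holds the first version, which is why an error before the commit is harmless.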

72 Database Utilities and Maintenance: Recovering From Error Conditions
Errors while the journal is being committed happen during one of two phases:
– Adding New Regions
– Replacing Old Regions

73 Database Utilities and Maintenance: Recovering From Error Conditions
Errors that occur while adding new regions
– These happen when the disk fills
– The database is still usable!
– Copy the .pdb and journal files to a disk with more space.
– Run java ORG.oclc.pears.util.MergeOldJournal to commit the journal to the database

74 Database Utilities and Maintenance: Recovering From Error Conditions
Errors that occur while replacing old regions
– These happen due to hardware errors
– The database is NOT usable!
– Fix the hardware problem
– Run java ORG.oclc.pears.util.MergeOldJournal to commit the journal to the database

75 Designing and Building Databases: Topics
A. Pears Database Building - Introduction
B. Database Description File
C. Building Databases
D. Configuring and Testing
E. Database Utilities and Maintenance
F. Advanced Database Description Concepts

76 Advanced Database Concepts: Topics
– Restrictors
– Replacing and Deleting Records

77 Advanced Database Concepts: Record Restrictions
– Used to additionally qualify indexes.
– Speeds up Boolean searching.
– Can only be used in combination with another search term.
– One database can have multiple restrictors defined.
– Can be linked with a searchable index.
  – by shared id

78 Advanced Database Concepts: Record Restrictions
Practical with data that has a defined range.
– categories like publication type
– range like publication date
– language
Binary value
– set on a per-record basis.
– stored in the postings entry for each extracted term.

79 Advanced Database Concepts: Defining Record Restrictions
Syntax:
[docrule ]
index=
routine=ORG.oclc.pears.Bartlett.termrest
parameters=
Example:
[docrule1]
index=24
routine=ORG.oclc.pears.Bartlett.termrest
parameters=english german french
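A restrictor stored with each posting lets a search filter while it walks one postings list, instead of intersecting two lists. The Python sketch below illustrates that idea; it does not reproduce what the termrest routine stores. The one-bit-per-parameter layout and the names `restrictor_bits`, `add_record` and `search` are assumptions made for the example.

```python
# Values taken from the example: parameters=english german french
RESTRICTOR_VALUES = ["english", "german", "french"]


def restrictor_bits(language):
    """One bit per configured value (assumed layout); 0 if the value is not configured."""
    if language in RESTRICTOR_VALUES:
        return 1 << RESTRICTOR_VALUES.index(language)
    return 0


# term -> list of (record number, restrictor bits) postings entries
postings = {}


def add_record(number, words, language):
    """Store the record's restrictor bits in the postings entry of every term."""
    bits = restrictor_bits(language)
    for word in sorted(set(words)):
        postings.setdefault(word, []).append((number, bits))


def search(term, restrict_to=None):
    """Return record numbers for a term, optionally filtered by a restrictor value."""
    mask = restrictor_bits(restrict_to) if restrict_to else None
    return [n for n, bits in postings.get(term, []) if mask is None or bits & mask]


add_record(1, ["dog", "stories"], "english")
add_record(2, ["dog", "tales"], "german")
add_record(3, ["cat", "tales"], "french")

german_dogs = search("dog", restrict_to="german")
```

Note that the restrictor is only ever applied to the postings of some search term, which matches the point that it can only be used in combination with another term.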

80 Advanced Database Concepts: Defining Record Restrictions
– Link to an index by using the same Id.
– routine - rule used for setting the restriction.
– parameters - specific to restriction routine.

81 Advanced Database Concepts: Replace and Delete Records
– Unique record key is in index.
– If a record is added that has the same unique record key as a previous record, then the new record replaces the existing record.
– HandleUSMARC uses record status values from the MARC fixed fields to delete records.
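The replace-and-delete behaviour amounts to an update keyed on the unique record key. The following Python sketch shows that logic only; `apply_update` is a hypothetical name, and the use of the status value "d" for deletion is an assumption based on the MARC leader's record status code for deleted records, since the slides do not say which status values HandleUSMARC checks.

```python
def apply_update(db, key, record, status="n"):
    """Add, replace or delete a record in `db` (unique record key -> record)."""
    if status == "d":                 # assumed: MARC record status 'd' = deleted
        db.pop(key, None)
        return "deleted"
    action = "replaced" if key in db else "added"
    db[key] = record                  # same key: the new record replaces the old one
    return action


db = {}
log = [
    apply_update(db, "ocm001", {"ti": "Tennis Made Easy"}),
    apply_update(db, "ocm001", {"ti": "Tennis Made Easy, 2nd ed."}),
    apply_update(db, "ocm002", {"ti": "Dog Stories"}),
    apply_update(db, "ocm002", None, status="d"),
]
```

After these four updates, `db` holds only the second version of the first record, and `log` records what each update did.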

82 Advanced Class Topics
A class on Advanced Database Building will cover:
– Building databases with SGML data.
– Advanced restrictor concepts.
– Debugging of data errors.
– and more exciting topics too numerous to mention.

83 Pears Designing and Building Databases
...and that's how you test your new database.
What questions do you have?

