Managing Files of Records CS 3050, Spring 2007 4/4/2007 Dr Melanie Martin.

Assume:
  We have a file
  The file is made up of records
  The records are made up of fields
  We want to access a specific record

Identifying the Record
RRN (relative record number)
  – Saw previously
Access fixed length records directly
  – Byte offset = RRN * size of record in bytes
Variable length
  – Use index
    » Fixed length records
    » At RRN j, index contains byte offset in data file
    » Adds an extra look-up
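As a minimal sketch of the two cases above, the Python below reads a fixed-length record by RRN and a variable-length record through an index of fixed-length entries. The 4-byte offset entries, the 2-byte length prefix on each data record, and the function names are assumptions made for illustration; the slides do not specify an on-disk layout.

```python
import io
import struct

def read_fixed(f, rrn, record_size):
    """Read the fixed-length record at relative record number rrn."""
    f.seek(rrn * record_size)          # byte offset = RRN * record size
    data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError(f"no record at RRN {rrn}")
    return data

# Assumed layouts (not from the slides):
OFFSET = struct.Struct("<I")   # each index entry is a 4-byte byte offset
LENGTH = struct.Struct("<H")   # each data record starts with a 2-byte length

def read_variable(index_f, data_f, rrn):
    """Look up entry rrn in the index, then fetch the record it points to."""
    (offset,) = OFFSET.unpack(read_fixed(index_f, rrn, OFFSET.size))
    data_f.seek(offset)                # the extra look-up mentioned above
    (length,) = LENGTH.unpack(data_f.read(LENGTH.size))
    return data_f.read(length)

# Small in-memory demonstration.
fixed = io.BytesIO(b"AAAA" + b"BBBB" + b"CCCC")
index_f, data_f = io.BytesIO(), io.BytesIO()
for name in [b"Ames", b"Mason", b"Li"]:
    index_f.write(OFFSET.pack(data_f.tell()))      # remember where it starts
    data_f.write(LENGTH.pack(len(name)) + name)    # length prefix + payload

print(read_fixed(fixed, 1, 4))
print(read_variable(index_f, data_f, 2))
```

The index itself is a file of fixed-length records, so it can be addressed by RRN in the same way as the first case.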

Identifying the Record
Key
  – Field or set of fields
  – Canonical
    » Rule for exact format
    » All caps
    » Remove or add ‘-’ in SSN or phone #
  – Distinct (unique)
    » Required for primary key
    » ISBN, SSN, Phone #
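One possible way to apply a canonical form before comparing or storing keys is sketched below. The function names and the specific rules (trim and upper-case text keys, keep digits only for SSN or phone keys) are illustrative assumptions; an application could just as well choose to add the dashes instead of removing them, as long as the rule is applied consistently.

```python
def canonical_text_key(value):
    """Canonical form for a text key: trimmed and all caps."""
    return value.strip().upper()

def canonical_digit_key(value):
    """Canonical form for an SSN or phone key: digits only, '-' removed."""
    return "".join(ch for ch in value if ch.isdigit())

# Two spellings of the same key compare equal once canonicalized.
print(canonical_text_key(" Ames ") == canonical_text_key("AMES"))
print(canonical_digit_key("123-45-6789") == canonical_digit_key("123456789"))
```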

Identifying the Record
Keys come in two main flavors
  – Primary
    » Uniquely identifies a single record
    » Ex: your specific bank account
  – Secondary
    » Identifies a group of records
    » Ex: all bank customers in Turlock
    » Ex: all bank customers overdrawn

Finding the Record
Two extremes
  – Direct access
  – Sequential search
  – Lots of algorithms in between, but we’ll start with the extremes

Measuring Algorithm Performance
In general we’ll count reads (seeks)
“Big O”
  – Asymptotic upper bound, used here for the worst case
  – g(n) = O(f(n)) means c*f(n) is an upper bound for g(n): there exist constants c and n_0 such that for every n to the right of n_0 (n >= n_0), the value of g(n) is at most c*f(n)
  – Draw Picture
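Written out formally, with one worked instance. The function g(n) = 3n + 5 and the constants c = 4, n_0 = 5 are chosen here purely for illustration and do not come from the slides.

```latex
g(n) = O(f(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ g(n) \le c \cdot f(n) \ \text{for all}\ n \ge n_0

% Example: g(n) = 3n + 5 is O(n), taking f(n) = n, c = 4, n_0 = 5
3n + 5 \le 4n \iff 5 \le n
```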

Direct Access
Just go get the record we want
O(1)
  – No matter how large the file, we can get the record in one seek
See previous discussion of using RRN for fixed length or index + RRN for variable length

Sequential Access
Go through the records in the file sequentially until we find the one we’re looking for
  – RRN or Key
  – Read one record at a time from disk
  – O(n) where n is the number of records in the file
  – I.e., time is proportional to the number of records in the file (average and worst case)
BUT what if we use blocks and read 100 records at a time?
STILL proportional to the number of records in the file
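The effect of blocking can be seen by counting reads directly. The sketch below assumes fixed-length records whose key occupies the first few bytes; that layout and the names `sequential_search` and `records_per_read` are illustrative assumptions only.

```python
import io

def sequential_search(f, key, record_size, key_size, records_per_read=1):
    """Scan from the start of f; return (rrn, reads), or (None, reads) if absent."""
    f.seek(0)
    reads = 0
    rrn = 0
    while True:
        block = f.read(record_size * records_per_read)
        if not block:                  # end of file, key not found
            return None, reads
        reads += 1                     # one read call per block
        for start in range(0, len(block), record_size):
            record = block[start:start + record_size]
            if record[:key_size] == key:
                return rrn, reads
            rrn += 1

# 1000 records of 8 bytes: a 4-digit key followed by 4 bytes of payload.
data = io.BytesIO(b"".join(b"%04d" % i + b"DATA" for i in range(1000)))

# Worst case: the key we want is in the last record.
print(sequential_search(data, b"0999", 8, 4))                        # (999, 1000)
print(sequential_search(data, b"0999", 8, 4, records_per_read=100))  # (999, 10)
```

Reading 100 records at a time divides the number of reads by 100, but doubling the file still doubles the reads, so the cost remains O(n).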

Why would we ever do this?
Sequential search can be good when
  – There are few records
  – We rarely need to search
  – ASCII files where we are looking for patterns (grep)
  – Lots of records will match a secondary key

Pros and Cons
Sequential search
  + easy to program
  + only requires simple file structures
  - takes too long
Soon we will start looking at ways to get around this and get closer to direct access

Some Miscellaneous Topics
Structure and length
  – Fixed length fields (think inventory example)
  – Make sure record size fits evenly into sectors
    » Ex: 512 byte sectors
    » 30 byte records -> increase to 32 bytes
    » Records never span sectors
  – More challenging with variable length fields (records)
    » Estimate longest possible field values (waste issues if too big, truncation/data loss if too small)
    » Averaging effect: longest name unlikely to occur with longest address in mailing list
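The 30-to-32-byte adjustment can be computed rather than guessed. This is a small sketch, with `padded_record_size` being a hypothetical helper name: it returns the smallest record size at or above the requested one that divides the sector size evenly, so no record spans two sectors.

```python
def padded_record_size(record_size, sector_size=512):
    """Smallest size >= record_size that divides sector_size evenly."""
    if not 0 < record_size <= sector_size:
        raise ValueError("record must be between 1 byte and one sector")
    for size in range(record_size, sector_size + 1):
        if sector_size % size == 0:
            return size

size = padded_record_size(30)
print(size, 512 // size)   # 32 16  (32-byte records, 16 per sector)
```

The cost is the padding: here 2 of every 32 bytes are unused.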

Some Miscellaneous Topics
Distinguishing data from unused space
  – Read length at beginning
  – Special delimiter at end
  – Count fields
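Two of these approaches are sketched below for a record stored in a fixed-size slot that is padded with unused bytes. The 32-byte slot, the 2-byte length prefix, the newline delimiter, and the function names are all assumptions made for the example.

```python
import struct

SLOT_SIZE = 32                  # assumed slot size in bytes
LENGTH = struct.Struct("<H")    # assumed 2-byte length prefix

def pack_slot(payload, slot_size=SLOT_SIZE):
    """Store payload with a length prefix, padding the rest of the slot."""
    if len(payload) > slot_size - LENGTH.size:
        raise ValueError("payload does not fit in slot")
    return (LENGTH.pack(len(payload)) + payload).ljust(slot_size, b"\x00")

def unpack_slot(slot):
    """Read the length at the beginning, then take exactly that many bytes."""
    (length,) = LENGTH.unpack_from(slot)
    return slot[LENGTH.size:LENGTH.size + length]

def unpack_delimited(slot, delimiter=b"\n"):
    """Alternative: data ends at the first special delimiter."""
    end = slot.find(delimiter)
    if end < 0:
        raise ValueError("delimiter not found")
    return slot[:end]

slot = pack_slot(b"Ames|Mary")
print(len(slot), unpack_slot(slot))
print(unpack_delimited(b"Ames|Mary\n" + b"\x00" * 5))
```

A length prefix lets the data contain any byte value, whereas a delimiter requires that the delimiter never appear inside the data.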

Some Miscellaneous Topics
Header records
  – Commonly used
  – At beginning of file
  – Might contain
    » # records
    » Length of records
    » Date and time of last update
    » Name of file
  – Need to be able to distinguish it from data
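A header record along these lines might look like the following sketch. The field widths, the `HDR1` marker used to tell the header apart from data, the file name, and the timestamp value are all invented for illustration; the slides do not prescribe a format.

```python
import io
import struct

# Assumed layout: marker, record count, record length, last update, file name.
HEADER = struct.Struct("<4sIHI16s")
MAGIC = b"HDR1"   # hypothetical marker identifying the header

def write_header(f, count, record_length, updated, name):
    f.seek(0)
    f.write(HEADER.pack(MAGIC, count, record_length, updated,
                        name.encode("ascii")))

def read_header(f):
    f.seek(0)
    magic, count, record_length, updated, name = HEADER.unpack(
        f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("file does not start with a header record")
    return count, record_length, updated, name.rstrip(b"\x00").decode("ascii")

def record_offset(rrn, record_length):
    """Data records start after the header, so shift every offset by its size."""
    return HEADER.size + rrn * record_length

f = io.BytesIO()
write_header(f, 3, 32, 1175644800, "customers.dat")  # arbitrary example values
print(read_header(f))
print(record_offset(0, 32))
```

Note that once a header is present, the byte offset of a record is no longer simply RRN times the record size; the header length has to be added.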

Some Miscellaneous Topics
Metadata
  – Data that describes the primary data in the file
  – Ex: Astronomer with image data generated by telescopes
    » Mostly interested in the image
    » Need info about image: where and when taken, which telescope, names of related files/images, etc.
