1 Real-World File Structures by Tom Davis Asst. Professor, Computer Science St. Edward's University 3001 South Congress Avenue Austin, Texas 78704

Presentation transcript:

1 Real-World File Structures by Tom Davis Asst. Professor, Computer Science St. Edward's University 3001 South Congress Avenue Austin, Texas

2 Metadata
Data About Data
- Usually in the form of a file header
- Examples in the text
    - Astronomy image storage format
    - HTML format (name = value)
    - But look on page 177: coding style makes a BIG difference
- Parsing this kind of data
    - Read field name; read field value
    - Convert ASCII value to the type required for storage and use
    - Store converted value into the right variable
- Why use this type of header?
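The three parsing steps above can be sketched in Python. This is only an illustration: the field names (`width`, `height`, `exposure`, `object`) and their types are hypothetical, since the slide does not list the actual header fields.

```python
# Hypothetical field names and target types; the slides do not specify them.
FIELD_TYPES = {"width": int, "height": int, "exposure": float, "object": str}


def parse_header(text):
    """Parse 'name = value' lines into a dict of typed values."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Step 1: read the field name and the field value.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name = value line: {line!r}")
        name = name.strip().lower()
        value = value.strip()
        # Step 2: convert the ASCII value to the type needed for use.
        convert = FIELD_TYPES.get(name, str)  # unknown fields stay as text
        # Step 3: store the converted value under the right name.
        header[name] = convert(value)
    return header


sample = "WIDTH = 512\nHEIGHT = 256\nEXPOSURE = 1.5\nOBJECT = M31\n"
header = parse_header(sample)
print(header)
```

One reason to use this kind of header is visible in the sketch: a reader can ignore fields it does not recognize, and the file stays readable in a text editor.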

3 More Metadata
Graphics Storage Formats
- Data
    - Color values for each pixel in the image
    - Data compression often used (GIF, JPG)
    - Different color "depth" possibilities
- Metadata
    - Height and width of image
    - Number of bits per pixel (color depth)
    - If not true color (24 bits/pixel)
        - Color look-up table
            - Normally 256 entries
            - Indexed by the values stored for each pixel (normally 1 byte)
            - Contains the R/G/B values for each color combination
        - Often formatted to be loaded directly into graphics RAM
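A minimal sketch of how a color look-up table is used, assuming one index byte per pixel stored row by row. The four-entry palette and the 3 x 2 image are made up for illustration; a real table would normally have 256 entries.

```python
# Illustrative palette: entry i holds the (R, G, B) value for pixel index i.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

# Metadata: image dimensions. Data: one palette index per pixel, row-major.
width, height = 3, 2
pixels = bytes([0, 1, 2, 3, 1, 0])


def to_true_color(pixels, palette):
    """Expand indexed pixels into 24-bit true color (3 bytes per pixel)."""
    out = bytearray()
    for index in pixels:
        out.extend(palette[index])  # look up R, G, B for this pixel
    return bytes(out)


rgb = to_true_color(pixels, palette)
print(len(pixels), "indexed bytes ->", len(rgb), "true-color bytes")
```

The indexed form needs one byte per pixel plus the table, while the expanded form needs three bytes per pixel.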

4 Mixing Kinds of Data in a File
Objective
- Store different types of data in the same file
- Textbook example: mix of astronomy data
    - Main file header (HTML-style)
    - Sub-files of notes: lines of ASCII text
    - Sub-files of image data: in whatever format is needed
- So our main file becomes a file of sub-files
    - Each sub-file (header, notes, or image) is really a "record" in the main file
    - These "records" are of varying length and format
    - How do we store the actual records in the sub-file "records"?
        - Could use another level of specified-length record software
        - Better: do what makes sense in the situation(s)

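5 Our Main File
Organization
[Diagram: the main file is a sequence of Main File Header, Notes Sub-file, Image Sub-file, Notes Sub-file, Image Sub-file, …
 A notes sub-file holds a Notes Header, then Text line … Text line, then a Terminator.
 An image sub-file holds an Image Header followed by Image Data.]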

6 More on Our Mixed-Data File
Access
- Can we just read it sequentially?
    - Why or why not?
    - What if we wanted to skip a notes sub-file?
    - What if some image didn't even have a notes sub-file?
- Can we access it directly?
    - What would the header have to include to allow that?
        - An index of the "records" in the file
        - We call the entries in that index "tags"
    - Each tag in the tag list has:
        - Type of sub-file referred to
            - Special-case type: end of file
        - RBA (relative byte address) of the sub-file in the main file
        - Length of sub-file (not necessary, but helpful)
        - Key information, if any, for the sub-file

7 Even More on Our Mixed-Data File
Access, continued
- So how can we access the mega-file now?
    - Read and process the header
        - Get information about the whole main file
        - Build in-memory table of tags (keys + locations) for sub-files
    - Sequential access
        - Same as before
        - May be able to program in some speed-ups from tag table
    - Direct access
        - Locate sub-file in tag table
        - Go right to it
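The tag-table idea can be sketched as follows. The layout here is an assumption made for the example, not the textbook's format: a 4-byte tag count, then one 12-byte tag per sub-file (4-character type, RBA, length), all big-endian. A count is used instead of the special end-of-file tag type to keep the sketch short.

```python
import io
import struct

# Assumed layout (hypothetical): count, then tags of (type, RBA, length).
COUNT = struct.Struct(">I")
TAG = struct.Struct(">4sII")


def build_file(subfiles):
    """Build a main file from a list of (type_bytes, payload_bytes)."""
    offset = COUNT.size + TAG.size * len(subfiles)  # first byte after header
    tags = []
    for kind, payload in subfiles:
        tags.append(TAG.pack(kind, offset, len(payload)))
        offset += len(payload)
    body = b"".join(payload for _, payload in subfiles)
    return COUNT.pack(len(subfiles)) + b"".join(tags) + body


def read_tags(f):
    """Read the header and build the in-memory tag table."""
    f.seek(0)
    (count,) = COUNT.unpack(f.read(COUNT.size))
    return [TAG.unpack(f.read(TAG.size)) for _ in range(count)]


def read_subfile(f, tag):
    """Direct access: seek to the sub-file's RBA and read its bytes."""
    _kind, rba, length = tag
    f.seek(rba)
    return f.read(length)


data = build_file([
    (b"NOTE", b"first line\nsecond line\n"),
    (b"IMAG", bytes([1, 2, 3, 4])),
])
f = io.BytesIO(data)
tags = read_tags(f)
image_tag = next(t for t in tags if t[0] == b"IMAG")
image = read_subfile(f, image_tag)  # skips the notes sub-file entirely
print(tags, image)
```

Storing the length in each tag is what lets `read_subfile` stop at the right byte without inspecting the sub-file's contents.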

8 Extensibility
Look at Our Main File Format Again
- Main header tells us things about the sub-files:
    - What kinds of files they are
    - Where to find them
- Sub-files themselves
    - To the main-file processor, they are just random bytes
    - To each sub-file processor, they are meaningful information
What If We Need a New Type of Sub-File?
- Define a new type of main header entry
- Extend main header processor to understand that entry
- Write (or borrow or buy) code to handle new sub-file
Cardinal Rule:
- Everything changes: file types, data types, ...
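One way to make the main-file processor extensible is a table that maps each sub-file type to its handler. This is a sketch of that design choice, not something the slides prescribe; the type names `NOTE`, `IMAG`, and `SPEC` and all handler functions are hypothetical.

```python
# Registry mapping a sub-file type name to the code that understands it.
HANDLERS = {}


def handler(kind):
    """Decorator that registers a function as the processor for one type."""
    def register(func):
        HANDLERS[kind] = func
        return func
    return register


@handler("NOTE")
def read_notes(payload):
    return payload.decode("ascii").splitlines()


@handler("IMAG")
def read_image(payload):
    return list(payload)


def process(kind, payload):
    """Main-file processor: the payload is opaque bytes at this level."""
    func = HANDLERS.get(kind)
    if func is None:
        return None  # unknown type: skip it rather than fail
    return func(payload)


# Adding a new sub-file type later needs no change to process().
@handler("SPEC")
def read_spectrum(payload):
    return [b / 255 for b in payload]


print(process("NOTE", b"a\nb\n"), process("SPEC", bytes([255, 0])))
```

Because unknown types are skipped, an older program can still read files that contain newer sub-file types.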

9 File Portability

10 Factors Affecting Portability - 1
Operating System Differences
- Example: text lines
    - End with line-feed character
    - End with carriage-return and line-feed
    - Prefixed by a count of characters in the line
Natural Language Differences
- Example: character coding
    - Single-byte coding: ASCII, EBCDIC
    - Double-byte coding: Unicode (in its original 16-bit form; UTF-8 and UTF-16 are variable-length)
Programming Language Differences
- Pascal can't directly process varying-length records
- Different C++ compilers use different byte lengths for the standard data types
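The three text-line conventions listed above can be compared with a short sketch. The one-byte count prefix is an assumption for the example; real systems that use counted lines may use a wider count field.

```python
def lines_terminated(data, terminator):
    """Split lines that each end with the given terminator bytes."""
    parts = data.split(terminator)
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return parts


def lines_counted(data):
    """Split lines that are each prefixed by a one-byte character count."""
    lines, pos = [], 0
    while pos < len(data):
        n = data[pos]                        # count of characters in line
        lines.append(data[pos + 1:pos + 1 + n])
        pos += 1 + n
    return lines


# The same two lines stored under each convention.
lf_file = b"alpha\nbeta\n"
crlf_file = b"alpha\r\nbeta\r\n"
counted_file = b"\x05alpha\x04beta"

print(lines_terminated(lf_file, b"\n"))
print(lines_terminated(crlf_file, b"\r\n"))
print(lines_counted(counted_file))
```

All three calls recover the same lines, but only if the reader knows which convention the writer used.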

11 Factors Affecting Portability - 2
Computer Architecture Differences
- Byte order in 16-bit and 32-bit integer values
    - "Big-endian": leftmost byte is most significant
    - "Little-endian": rightmost byte is most significant
    - Example: the two bytes 0x15 0x32 in memory
        - Big-endian interpretation: 0x1532
        - Little-endian interpretation: 0x3215
- Storage of data in memory
    - Most architectures require values that are N bytes long to start at a byte whose address is divisible by N
Don't ask.
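The byte-order example from this slide can be checked directly in Python, using the same two bytes. Both `int.from_bytes` and the `struct` module take the byte order explicitly, so the result does not depend on the machine running the code.

```python
import struct

raw = bytes([0x15, 0x32])  # the two bytes as they sit in the file

# Interpret the same bytes under each convention.
big = int.from_bytes(raw, "big")
little = int.from_bytes(raw, "little")

# The struct module does the same with format prefixes: > big, < little.
(big_struct,) = struct.unpack(">H", raw)
(little_struct,) = struct.unpack("<H", raw)

print(hex(big), hex(little))
```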

12 How to Port Files
Define Your Format C*A*R*E*F*U*L*L*Y
- Once a file format is defined, never change it
    - If you need a new file format, add it so as not to invalidate the existing formats
    - If you need to change a format, add a new one instead, and let programs that need the new version use it
- Decide on a standard format for data elements
    - Text lines
        - ASCII, EBCDIC, or Unicode?
        - Which character(s) to end lines?
    - Binary
        - Tightly packed or multiple-of-N addressing?
        - Which "endian"?
- You can always write code to convert to and from the standard format on a new language, computer, etc.
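As an illustration of fixing the binary decisions, the sketch below defines a record with an explicit byte order and tight packing. The record itself (a 1-byte flag followed by a 4-byte integer) and the choice of big-endian are assumptions made for the example.

```python
import struct

# Standard on-disk form: big-endian, tightly packed, 1 + 4 = 5 bytes.
RECORD = struct.Struct(">bi")


def write_record(flag, value):
    """Convert in-memory values to the standard file format."""
    return RECORD.pack(flag, value)


def read_record(data):
    """Convert from the standard file format back to in-memory values."""
    return RECORD.unpack(data)


packed = write_record(1, 0x1532)

# For contrast: the machine's native layout may insert padding so the
# integer starts on a multiple-of-4 address, so this size is not portable.
native_size = struct.calcsize("@bi")

print(packed.hex(), RECORD.size, native_size)
```

Writing through `write_record` and `read_record` on every platform is the "convert to and from the standard format" step; the native layout never reaches the file.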

13 The Conversion Problem
- Only a few environments: convert directly
    - [Diagram: IBM <-> Sun]
- Many environments: need an intermediate form
    - [Diagram: IBM, Sun, IA-32, IA each convert to and from XML (or some other standard format)]
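The reason an intermediate form pays off can be shown with a little arithmetic, assuming each direction of conversion needs its own converter. With n environments, direct conversion needs one converter per ordered pair, while an intermediate form needs only one converter in and one out per environment.

```python
def direct_converters(n):
    """Converters needed when every environment converts to every other."""
    return n * (n - 1)


def intermediate_converters(n):
    """Converters needed when each environment converts to and from one
    standard format."""
    return 2 * n


for n in (2, 4, 10):
    print(n, direct_converters(n), intermediate_converters(n))
```

For two environments the direct approach is cheaper (2 converters versus 4); from four environments on, the intermediate form needs fewer, and the gap widens quickly.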