Fluency with Information Technology INFO100 and CSE100 Katherine Deibel Katherine Deibel, Fluency in Information Technology1
We learn about data management. We have discussed spreadsheets; now we will get into databases. Lab 9 will get you involved in using database software (Access). Project 3 will have you use both spreadsheets and databases.
Databases are collections of information given a structure. We have done this before: XHTML describes the layout of information on a page; CSS describes the styling of information; JavaScript describes the computation of information; spreadsheets describe data organization and the flow of calculations. The repeated lesson: give the computer structure so it can help!
Some of us want to compute, but all of us want information… Most archived information is in tables. Databases enhance many applications. Databases introduce interesting ideas. Still, there is a lot of overlap with what spreadsheets can do.
Before relational databases, there were only “flat files”. Structural information was difficult to describe. All processing of information was “special cased” and required custom programs. Information was repeated in multiple places and was hard to keep consistent. A change in the format of one file meant all related programs had to be changed.
Relational databases were invented in 1970 by Ted Codd. Motivation: “The adverse impact on development productivity of requiring programmers to navigate along access paths to reach target data [...] was enormous. In addition, it was not possible to make slight changes in the layout in storage without simultaneously having to revise all programs that relied on the previous structure. [...] As a result, far too much manpower was being invested in continual (and avoidable) maintenance of application programs.”
Metadata; focusing on the relationships between the data entries; manipulating data tables through operations on the tables; separating the physical and logical aspects of the database.
Data about data about data about…
Metadata is data about data, and it is the key to making computers more useful. A database is composed of data and its metadata. Metadata was not available to computers in the past.
Bits and bytes encode the information, but that’s not all: tags can encode format and structure. Example uses: word processors, HTML, the Oxford English Dictionary.
byte (baIt). Computers. [Arbitrary, prob. influenced by bit sb. 4 and bite sb.] A group of eight consecutive bits operated on as a unit in a computer. Blaauw & Brooks in IBM Systems Jrnl. III. 122: An 8-bit unit of information is fundamental to most of the formats [of the System/360]. A consecutive group of n such units constitutes a field of length n. Fixed-length fields of length one, two, four, and eight are termed bytes, halfwords, words, and double words respectively. IBM Jrnl. Res. & Developm. VIII. 97/1: When a byte of data appears from an I/O device, the CPU is seized, dumped, used and restored. P. A. Stark Digital Computer Programming xix. 351: The normal operations in fixed point are done on four bytes at a time. Dataweek 24 Jan. 1/1: Tape reading and writing is at from 34,160 to 192,000 bytes per second.
The two kinds of metadata most important for us are tags and schemas. Tags label a value, such as 305,471,002, with what it is. Schemas are descriptions of tables and the kinds of values they can store.
The Extensible Markup Language (XML) has become the standard way to add metadata to data. Its success is largely driven by the Web. Example: Canada
The best part of XML is that YOU think up the tags. It is a “self-describing language”. There are no tags to learn!!! That’s why it is called “extensible”. You are already an expert on XML.
XML tags are like XHTML tags: they must be properly nested. Allowed characters in tag names: alphanumeric and _, no spaces! Everything must be tagged.
When we tag in XML, we use tags in different ways. Identity: say what something is. Affinity: say which properties go together. Collection: group like things together. Example data: Isabela, Fernandina, Tower.
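The three uses of tags can be sketched with a small XML document read by Python's standard library. The tag names (archipelago, island, name, also_known_as) are made up for this illustration; the slide only supplies the island names.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names, invented for this sketch:
#   <name>, <also_known_as>  identity: say what each value is
#   <island>                 affinity: properties of one island go together
#   <archipelago>            collection: group like things together
xml_text = """
<archipelago>
  <island><name>Isabela</name><also_known_as>Albemarle</also_known_as></island>
  <island><name>Fernandina</name><also_known_as>Narborough</also_known_as></island>
  <island><name>Tower</name><also_known_as>Genovesa</also_known_as></island>
</archipelago>
""".strip()

root = ET.fromstring(xml_text)

# Because everything is tagged, the computer can pull out exactly what we ask for.
island_names = [island.findtext("name") for island in root.findall("island")]
print(island_names)
```

Note that every opening tag has a matching closing tag and the tags nest properly; the parser would reject the text otherwise.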
Not really a fortress… More a specialized furniture store
In this course, we describe databases using XML. Every relational table can be written in XML, but not every XML database is relational. The difference: relational databases place further restrictions on the XML.
General XML approach: best when the data is not rigidly structured; more of an ad hoc organization. Relational database approach: the data comes with a rigid structure, which happens very frequently. Humans (and the computers we make) really, really, really like structure.
A relational database consists of multiple tables of data and descriptions of the relationships between the various tables. Sounds simple… and it kind of is.
Information is stored in tables. Each table consists of entities of one kind. Each entity has a set of characteristics known as attributes. Each row of a table is a tuple of attribute values, and each tuple must have a unique primary key. Relationships among the data are stored. The table structure is called a schema. The table contents are an instance.
Tables have names, attributes, and tuples. The contents are the instance; the structure is the schema. Schema:
  ID     number   unique number (key)
  Last   text     person’s last name
  First  text     person’s first name
  Hire   date     first day on job
  Addr   text     street address
Primary key: ID
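One way to see schema versus instance is to build this table with Python's built-in sqlite3 module. This is only a sketch: the course lab uses Access, the table name Example is borrowed from later slides, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema: attribute names, their types, and the primary key.
conn.execute("""
    CREATE TABLE Example (
        ID    INTEGER PRIMARY KEY,  -- unique number (key)
        Last  TEXT,                 -- person's last name
        First TEXT,                 -- person's first name
        Hire  TEXT,                 -- first day on job, stored as YYYY-MM-DD text
        Addr  TEXT                  -- street address
    )
""")

# The instance: the rows currently in the table (made-up data).
conn.execute("INSERT INTO Example VALUES (1, 'Doe', 'Jane', '2001-06-15', '12 Main St')")

# The primary key must be unique, so a second row with ID 1 is rejected.
try:
    conn.execute("INSERT INTO Example VALUES (1, 'Roe', 'Rick', '2003-01-06', '9 Elm Ave')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

schema = [column[1] for column in conn.execute("PRAGMA table_info(Example)")]
instance = conn.execute("SELECT * FROM Example").fetchall()
print(schema)
print(instance)
print(duplicate_rejected)
```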
Databases are composed of multiple tables, BUT DATA SHOULD NOT BE REPEATED!! Replicated data can differ in its different locations, e.g., multiple copies of an address can differ. Inconsistent data is worse than no data. Solution: keep a single copy of any data; if it is needed in multiple places, associate it with a key and store the key rather than the data.
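A minimal sketch of "store the key rather than the data", using plain Python dictionaries in place of tables. All names and values here (addresses, payroll, mailing_list, the key 101) are hypothetical.

```python
# The single copy of each address, found by its key.
addresses = {101: "12 Main St", 102: "9 Elm Ave"}

# Two other tables need the address; each stores only the key.
payroll = [{"employee_id": 7, "addr_key": 101}]
mailing_list = [{"employee_id": 7, "addr_key": 101}]


def address_of(record):
    """Look up the one stored address for a record."""
    return addresses[record["addr_key"]]


# The person moves: we update one place only.
addresses[101] = "40 Oak Rd"

# Both tables see the new address, so they cannot disagree.
print(address_of(payroll[0]), "|", address_of(mailing_list[0]))
```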
When looking for information, a single item or a table of answers is possible. “Who is currently taking FIT100?” Result: a table of students. “Who won the 1940 Best Actor Oscar?” Result: a table containing only a single row. “In what years has the US won the men’s World Cup?” Result: an empty table. A query to a database always produces a table.
Scalpel… Sponge… Union… Join…
There are five primitive operations on tables to create new tables. Select: pick rows from a table. Project: pick columns from a table. Union: combine two tables with like columns. Difference: remove one table from another. Product: create “all pairs” from two tables. Another fundamental operation is Join: combine tables based on common fields.
Select creates a table from the rows of another table that meet a criterion. Select from Example On Hire <
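Select can be sketched in a few lines of Python, treating a table as a list of rows. The rows and the cutoff date below are assumptions made for illustration; the criterion on the slide is cut off after "Hire <".

```python
def select(table, criterion):
    """Return a new table holding only the rows that meet the criterion."""
    return [row for row in table if criterion(row)]


# Made-up instance of the Example table (dates as YYYY-MM-DD text).
example = [
    {"ID": 1, "Last": "Doe", "First": "Jane", "Hire": "1998-03-02"},
    {"ID": 2, "Last": "Roe", "First": "Rick", "Hire": "2004-09-13"},
    {"ID": 3, "Last": "Poe", "First": "Pat", "Hire": "1999-11-30"},
]

# Assumed cutoff: keep people hired before the year 2000.
early_hires = select(example, lambda row: row["Hire"] < "2000-01-01")
print([row["ID"] for row in early_hires])
```

The result has the same columns as the original table but fewer rows.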
Project creates a table from the columns of another table. Project Last, First From Example
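A comparable sketch for Project, again with a table as a list of rows and invented data. Dropping duplicate rows is a choice made here to match the idea that a relational table has no repeated rows.

```python
def project(table, columns):
    """Return a new table holding only the named columns, without duplicate rows."""
    result = []
    for row in table:
        narrowed = {column: row[column] for column in columns}
        if narrowed not in result:
            result.append(narrowed)
    return result


# Made-up instance of the Example table.
example = [
    {"ID": 1, "Last": "Doe", "First": "Jane"},
    {"ID": 2, "Last": "Roe", "First": "Rick"},
    {"ID": 3, "Last": "Doe", "First": "Jane"},
]

names = project(example, ["Last", "First"])
print(names)
```

Two different people named Jane Doe collapse into one row once the ID column is gone, which is one reason keys matter.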
Union (written like addition) combines two tables with the same attributes. PoliticalUnits = States + Provinces
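Union can be sketched the same way. The two small tables are samples chosen for illustration, each with the same two attributes (name, capital).

```python
def union(table_a, table_b):
    """Return the rows of both tables, keeping each distinct row once."""
    result = list(table_a)
    for row in table_b:
        if row not in result:
            result.append(row)
    return result


# Sample rows only; both tables have the attributes (name, capital).
states = [("Washington", "Olympia"), ("Oregon", "Salem")]
provinces = [("British Columbia", "Victoria"), ("Alberta", "Edmonton")]

political_units = union(states, provinces)
print(political_units)
```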
Difference (written like subtraction) removes one table’s rows from another. Eastern = States - WestCoast
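A sketch of Difference under the same assumptions. Only a handful of states are listed to keep it short, so the result here is "sample states not on the West Coast".

```python
def difference(table_a, table_b):
    """Return the rows of table_a that do not appear in table_b."""
    return [row for row in table_a if row not in table_b]


# A small sample of rows, not the full list of states.
states = [("Washington",), ("Oregon",), ("California",), ("New York",), ("Maine",)]
west_coast = [("Washington",), ("Oregon",), ("California",)]

eastern = difference(states, west_coast)
print(eastern)
```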
Product (written like multiplication) combines columns and pairs all rows. Colors = Blues x Reds. Column rule: if A has x columns and B has y columns, then A x B has x+y columns. Row rule: if A has m rows and B has n rows, then A x B has m∙n rows.
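The column and row rules can be checked with a short sketch. The color names are invented; each table has one column, so the product should have 1+1 columns and 2∙3 rows.

```python
from itertools import product as all_pairs


def product(table_a, table_b):
    """Pair every row of table_a with every row of table_b."""
    return [row_a + row_b for row_a, row_b in all_pairs(table_a, table_b)]


# Made-up tables with one column each.
blues = [("navy",), ("sky",)]
reds = [("crimson",), ("scarlet",), ("rose",)]

colors = product(blues, reds)

print(len(colors))     # number of rows in the product
print(len(colors[0]))  # number of columns in the product
```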
To the right is a man who divides database tables. Do you want to be like him? Seriously though: division operations do exist, but they are an advanced database topic and are not used in regular practice.
Join (written like a bow tie) combines rows if a common field matches. Homes = States ⋈ Students
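Join can be sketched as "pair the rows, but keep only the pairs whose common field matches". The field names (State, Capital, Name, Home) and the students are assumptions for this example.

```python
def join(table_a, table_b, field_a, field_b):
    """Combine each row of table_a with each row of table_b whose fields match."""
    return [
        {**row_a, **row_b}
        for row_a in table_a
        for row_b in table_b
        if row_a[field_a] == row_b[field_b]
    ]


# Made-up students; the state capitals are real.
states = [
    {"State": "Washington", "Capital": "Olympia"},
    {"State": "Oregon", "Capital": "Salem"},
]
students = [
    {"Name": "Ana", "Home": "Washington"},
    {"Name": "Ben", "Home": "Oregon"},
    {"Name": "Cy", "Home": "Idaho"},
]

homes = join(states, students, "State", "Home")
for row in homes:
    print(row["Name"], row["State"], row["Capital"])
```

Cy does not appear in the result because no row of the states table matches Idaho.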
The five DB operations can create any table from a given set of tables. Join is not primitive, but can be built from the five. Join, select, and project are used most often. All modern relational database systems are built on these relational operations. The operations are not usually used directly, but are used indirectly from other languages; the SQL database language is one such example.
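To see how SQL uses the operations indirectly, here is a sketch with Python's built-in sqlite3 module and invented tables. The first query asks for a join directly; the second builds the same table from a product (listing both tables), a select (the WHERE test), and a project (the column list).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE States (State TEXT PRIMARY KEY, Capital TEXT);
    CREATE TABLE Students (Name TEXT, Home TEXT);
    INSERT INTO States VALUES ('Washington', 'Olympia'), ('Oregon', 'Salem');
    INSERT INTO Students VALUES ('Ana', 'Washington'), ('Ben', 'Oregon'), ('Cy', 'Washington');
""")

# Join, written directly in SQL.
joined = conn.execute(
    "SELECT Name, Home, Capital FROM Students JOIN States ON Home = State ORDER BY Name"
).fetchall()

# The same table built from primitives: product, then select, then project.
built = conn.execute(
    "SELECT Name, Home, Capital FROM Students, States WHERE Home = State ORDER BY Name"
).fetchall()

print(joined)
print(joined == built)
```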
Databases are a big topic: physical versus logical databases, constructing and designing a database, more on operations and queries, more about XML.
Like many aspects of computer fluency, understanding databases is about understanding structure: defining structure and manipulating structure. Databases are based around the simple notion of tables. More tables are built from existing tables using operations.