Database Modelling Lecture 1: Introduction to Databases. Akhtar Ali, 25 August 2014.


1 Database Modelling Lecture 1: Introduction to Databases Akhtar Ali 25 August 2014

2 Learning Objectives
To consider “What is a Database ?”
To consider “What is a Relational Database ?”, i.e. what a “relation” is, and therefore what differentiates a relational database from any other kind of database.
To consider how Set Theory defines the structure of relations.
To consider how relations can be exploited to provide the equivalent of bags of duplicates and ordered sequences.

3 What is a Database ?
To understand what we mean by the term database, we give a definition and some of the main features associated with it.
Definition : “A collection of data that is permanently stored on a computer”.
In a database we should be able to :
– have different types of data in the collection
– record relationships between different data items
– have varying sizes of data collections to suit our needs

4 We need to be able to
– Insert new data, delete old data, and amend existing data in the collection.
– Retrieve data from the collection.
– Manage the collection so that it stays permanently stored in the face of hazards that would otherwise corrupt or lose its data.
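The four operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `sighting`, its attributes and all values are assumptions invented for the sketch, and an in-memory database is used for convenience (passing a file path to `sqlite3.connect` instead would keep the data on backing store).

```python
import sqlite3

# Hypothetical table and values, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sighting (species TEXT, seen_on TEXT, birds INTEGER)")

# Insert new data.
conn.executemany(
    "INSERT INTO sighting VALUES (?, ?, ?)",
    [("Robin", "2014-08-01", 4), ("Wren", "2014-08-01", 2), ("Robin", "2014-08-02", 7)],
)

# Amend existing data.
conn.execute(
    "UPDATE sighting SET birds = 5 WHERE species = 'Robin' AND seen_on = '2014-08-01'"
)

# Delete old data.
conn.execute("DELETE FROM sighting WHERE species = 'Wren'")

# Retrieve data from the collection.
rows = conn.execute(
    "SELECT species, seen_on, birds FROM sighting ORDER BY seen_on"
).fetchall()
print(rows)
```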

5 Permanent Data Storage
“Permanent” means that once data is put into the computer, it stays there until someone explicitly removes it, or until the computer is damaged or broken in a way that causes the data to be lost. The data is not lost when the computer is switched off or when the database is not used for a long time.
A computer has two classes of data storage :
– Random Access Memory (RAM). This is used by the computer’s Central Processing Unit (CPU) to hold temporarily the data that the CPU is processing.
– Backing Store. This is used to store data permanently. Typically magnetic discs are used for this purpose, although other types of storage device, e.g. compact discs and magnetic tapes, are also used.
Thus the database invariably uses the backing store to make the data storage permanent.

6 Other Memory Types
In future, as RAM becomes ever cheaper, it may be used to store databases, but backing store will still be needed to keep the data permanently, because RAM cannot retain data when the computer is switched off.
For completeness, a third category of modern usage is that of a ‘switch’ in communication links, e.g. in computerised telephone exchanges and Internet routing nodes.


8 Descriptions
To store useful information, many types of data may be needed together. As well as the obvious ones such as numbers, text and dates, there are more complex types of data such as pictures, audio and video clips.
To describe something, such as a species of bird, a number of different kinds of facts might be necessary to complete the description.
Example: Recording observations of a species of bird.
Number of birds observed, textual description of characteristics, dates of observations, map of migration routes, pictures of birds, audio recording of bird calls, video of flight.

9 Relationships Among Data
In practice, we are not interested in just isolated items of data. We also want to know the relationships among various data items.
– Certain relationships are essential to make sense of the data.
– Isolated facts are rarely meaningful.
– The facts need to be given a context to make them meaningful, and the context is commonly provided by relating the data that represent facts.
For example, a number on its own has no significance; but if we relate it to, say, a particular bird species and a particular date, then it becomes meaningful when we interpret it as the number of birds of that species seen on that date.
– This would be an essential relationship.
Example: Observations of bird species.
It is important to know which bird numbers, descriptions, etc. relate to which bird species. It may also be useful to know which bird species have similar migration routes.

10 Open-ended Relationships
Other relationships may be ‘optional’, in the sense that some users may find it useful to compare (say) migration routes, while many other users of the data will never use that relationship.
All sorts of relationships are possible: some essential to make sense of the data, some useful for particular purposes.

11 What is a Relational DB ?
Definition : “A database in which all the data is stored in relations”.

12 Relation Example
A relation is a simple logical structure that contains related data (hence the name “relation”). It can be pictured as a table of data.

13 Purpose of Relation
Thus a relation not only holds a collection of data, but also relates together the data items that it holds. This is important because, as noted already, we need to relate data together to ensure that it is meaningful to us.

14 A Relation Example
A relation consists of a set of tuples. Each tuple must hold the same kinds of data. In the example below, each tuple represents one employee.

15 Tuples/types
Each tuple consists of a set of attributes. The data held in the attributes of one tuple is related.
– In this case, each tuple relates the data pertaining to a single employee.
Each attribute can contain data of any type. In this simple example, each attribute contains textual or numeric data.
Technically, data types are orthogonal to relations.
– This means they are independent of relations, and therefore data types do not constrain relations and vice versa.
Commercial relational DBMSs that apply this orthogonality are typically called Object-Relational DBMSs,
– because they can hold any kind of object as a data type in a relation.
As relations are just structures, they can hold any kind of data.
– So they can hold all the different kinds of data mentioned earlier, and more besides, without limit.

16 Relational Databases
In computing, a way of looking at data and thinking about its handling is called a model. A relation is a way of looking at data.
Relations themselves are based on simple mathematical principles.
Relations can be manipulated by users in ways that are conceptually simple.
A relational DB is perceived and used as a collection of relations.
The relational DBMS manages the physical processing of data, so the user doesn’t need to know what underlying computer processing goes on and merely perceives things in terms of relations.

17 Relational Data Model
So, a relational database utilises the relational data model.
The relational model raises the level of abstraction compared to the normal 3GL level of programming, which uses files to store data.
– It enables the database programmer to think about the DB in a way that is further from how the hardware actually works and nearer to how the outside world is perceived.
– This makes it easier for the programmer to solve problems and hence makes him/her more productive.

18 Importance of Relational DBMSs
Relational databases are built on sound mathematical principles, and this theory proves very practical.
– It makes relational databases easy to learn and use.
– Nevertheless, relational databases can be very powerful and flexible.
Therefore, relational databases :
– are the most common type of database on the market (over 90% share);
– are extremely important in practice.
Commercial relational DBMSs usually use a database programming language called SQL.

19 SQL
SQL stands for Structured Query Language.
SQL is sometimes pronounced “S Q L” and sometimes “Sequel” (which was its original name when it was invented by IBM). The latter pronunciation is particularly prevalent in America.
Terminology
Because relations are usually depicted as tables, the word table is used in SQL instead of relation.
– Consequently, as tables have columns and rows, the following tabular names are used in SQL (and often more generally) instead:
table instead of relation
column instead of attribute
row instead of tuple

20 Relational Implementations
The following diagram illustrates the relational implementation and example software systems.
[Diagram labels: Principles; Calculus (Domain, Tuple); Algebra; Graphical; Text; QBE; MS Access; SQL; Oracle; IBM’s DB2; MS SQL Server; Sybase; Ingres; RAQUEL]

21 Two Languages
Relational databases were developed mathematically using two kinds of language to manipulate relations:
– relational calculus (based on predicate calculus)
– relational algebra (based on traditional algebra, of which arithmetic is the most common example)
Relational calculus comes in two forms:
– domain calculus
– tuple calculus.

22 Status for Relational DB
The domain calculus can be represented graphically, and an IBM product called QBE (Query By Example) typifies this.
– QBE is no longer used commercially,
– but the graphical interface to Microsoft Access represents a watered-down version of it.
SQL uses a combination of tuple calculus and algebra ideas in its design, together with a number of its own idiosyncrasies.
– It is entirely a textual language.
– The five commercial products shown in the diagram are the main ones to implement SQL.
IBM and Oracle have the major market shares,
– although Microsoft’s SQL Server is now gaining a significant share as well.
RAQUEL is a purely algebraic language developed at the School of Informatics in the University of Northumbria.
– It will be used occasionally to describe ideas because of its inherent simplicity.

23 Relational Standards
QBE and relational algebra have not been standardised, whereas SQL has, and each later standard adds to SQL compared with the earlier ones.
There has been a sequence of SQL standards :
– SQL:87 came out in 1987 (known as SQL1).
– In 1989 there was an addendum to SQL1.
– SQL:92 came out in 1992 (known as SQL2).
– SQL:99 came out in 1999 (known as SQL3).
– SQL:2003 came out in 2003 (known as SQL4).
Further revisions have followed since, e.g. SQL:2008 and SQL:2011.
However, it is important to note that the SQL standards and products do not conform in every respect to relational principles, which increases the complexity of the language while weakening its power.

24 SQL Products
The SQL standards body is dominated by the major SQL vendors, who seek to make the latest standard relate to their products and be backwards compatible with earlier standards.
– Vendors’ SQL implementations often don’t adhere completely to SQL standards, and so are even more ad hoc.
No product today supports all of SQL:92, although many have additional features peculiar to themselves, and some support some of the features of SQL:99.
This module uses Oracle SQL, because it is popular in the database market, but note that it does not adhere completely to the SQL standards.

25 What is a Relation?
The relational model is based on mathematical sets.
Definition: “A set is a collection of things”
A set has two main characteristics:
– It has no sequence or structure
– It has no duplicates
Example: { 2, 7, 5, 4 } = { 7, 5, 4, 2 }
They are the same set because sets have no order.
Example: { 2, 2, 7, 5, 4, 8, 8 } is not a set because it has duplicates.
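Both characteristics of a set can be checked with Python's built-in `set` type. This is a minimal sketch rather than anything from the lecture; the variable names are assumptions.

```python
# Two sets written in different orders are the same set.
a = {2, 7, 5, 4}
b = {7, 5, 4, 2}
same_set = a == b

# Building a set from a collection with duplicates discards the duplicates,
# so the result has fewer members than the collection had items.
items = [2, 2, 7, 5, 4, 8, 8]
as_set = set(items)

print(same_set, len(items), len(as_set))
```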

26 Set/bag
A ‘set’ whose values are ordered is a Sequence.
A ‘set’ with duplicates is a Bag (or Multiset).
A set is the simplest construct that exists in mathematics.
– It is merely a collection of things, called the members of the set.
– It has no other properties. So sequences and bags are not sets.
Traditionally the contents of a set are written enclosed in curly brackets.
In principle, in mathematics a set can be a collection of any kinds of things. For example, a set could consist of:
{ a car, a bus ticket, a building, a person, a flight, a colour }
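As a rough illustration of the difference, Python offers a ready-made type for each construct: `list` behaves as a sequence, `collections.Counter` as a bag, and `set` as a set. The choice of these types to stand in for the mathematical ideas is an assumption of this sketch.

```python
from collections import Counter

# Sequence: order matters, so the same values in another order differ.
seq1 = [2, 7, 5, 4]
seq2 = [7, 5, 4, 2]
sequences_equal = seq1 == seq2

# Bag (multiset): order does not matter, but the number of occurrences does.
bag = Counter([2, 2, 7, 5, 4, 8, 8])
occurrences_of_2 = bag[2]

# Set: neither order nor number of occurrences is recorded.
members = set(bag)

print(sequences_equal, occurrences_of_2, members)
```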

27 Typed Sets
However, in computing, sets are invariably constrained to be typed sets, meaning that the members of a set all have to be of the same kind or type.
– Thus, the above example is not a typed set (unless you can find some way of generalising its members so that they all belong to some strange type !).
– Typed sets would be a set of cars, a set of bus tickets, etc.
Databases in general, and relational databases in particular, all use typed sets. Look out for how this applies as you work through the module.

28 Relation = Set of Tuples
Definition : A relation is a set of tuples.
Hence the two relations below are the same, even though their tuples appear in a different order.

29 Ordering of Tuples
Neither version of the relation has any particular rhyme or reason for the ordering of its tuples.
Because a relation is a set of tuples, its tuples are not fixed in a particular order: an ordering is a kind of structure, and sets have no structure.
– Put another way, the same relation can appear with its tuples in any order.
In practice, a relation might have a default sequence in which it appears, typically determined by the physical storage of the relation.
The number of tuples in a relation is called the cardinality of the relation. So the example relation above has a cardinality of 5.
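A relation can be modelled loosely as a Python set of tuples, which makes the points about ordering, duplicates and cardinality easy to check. The `Employee` type, its field names and the three employees are invented for this sketch; the lecture's own example relation has five tuples.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    # Hypothetical attributes, loosely following ENo, EName, M-S, Sal.
    eno: str
    ename: str
    m_s: str
    sal: int

smith = Employee("E1", "Smith", "S", 12000)
young = Employee("E2", "Young", "M", 15000)
jones = Employee("E3", "Jones", "S", 11000)

# The same tuples written down in two different orders.
version_a = {smith, young, jones}
version_b = {jones, smith, young}
same_relation = version_a == version_b

# Cardinality is the number of tuples.
cardinality = len(version_a)

# Adding a tuple that is already present changes nothing.
with_duplicate = version_a | {Employee("E1", "Smith", "S", 12000)}

print(same_relation, cardinality, with_duplicate == version_a)
```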

30 Relation - No Duplicate Tuples
A table in which some tuples appear more than once cannot be a relation.

31 Ordering
Since a relation is a set of tuples, it cannot have duplicate tuples, because duplicates don’t exist in a set.
Sometimes it is important to deal with matters of duplication and ordering, because these things occur in reality.
We will now consider how these two aspects can be handled using relations, duplication first and then ordering.

32 Storing a Bag in a Relation
Example: A library holds information about books, but it often has many copies of the same book.

33 Strategy for Storing Bags
The above example still uses a relation: each tuple is unique.
It uses the strategy of recording one instance of each item of data about some entity (in this case, a book), together with how many times that entity occurs.
– This saves storage space and is generally less confusing. It is what manual recording systems generally do, e.g. a shopping list.
Use this method when the entities whose data appear in a tuple are identical, but there can be many instances of them.
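The strategy amounts to replacing each run of duplicates by one tuple plus a count. A small sketch, assuming a hypothetical shelf of book titles:

```python
from collections import Counter

# Hypothetical bag of physical copies on a library shelf.
shelf = ["Database Systems", "SQL Basics", "Database Systems", "Database Systems"]

# One tuple per distinct book, with the number of copies as an extra attribute.
# Every tuple is unique, so the result is a genuine relation.
book_relation = {(title, copies) for title, copies in Counter(shelf).items()}

print(sorted(book_relation))
```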

34 Handling a Sequence with a Relation
Although the DBMS presents a relation to the user as a set of tuples, we can retrieve a relation in any tuple sequence that depends on the values in the tuples.

35 Sequencing is Flexible
Although the relation has no particular sequence, when the DBMS retrieves a relation full of data for the user, its tuples can be sorted into any order we like.
– Therefore the user or application program can see a sequence of data.
All DBMSs provide this feature because it is so useful in practice. There are many cases where it is very important, if not essential, for the tuples to appear in the right sequence.
Example : A telephone directory.
– The directory is a set of entities, each consisting of a name, address and telephone number, but it would be completely unusable if it were not sorted into alphabetical order of names !
The advantage of the relational approach is that it can provide any sequence of tuples; different users and/or applications might want the same relation of data in different sequences, and the relational DBMS can satisfy them all.
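The telephone directory can be sketched as a set of tuples that is sorted only at retrieval time, so that two users obtain two different sequences from the same relation. The entries below are invented for the illustration.

```python
# Hypothetical directory: (name, address, telephone number).
directory = {
    ("Smith", "1 High St", "555-0101"),
    ("Adams", "9 Low Rd", "555-0199"),
    ("Jones", "4 Mid Ln", "555-0150"),
}

# One user wants alphabetical order of names ...
by_name = sorted(directory, key=lambda entry: entry[0])

# ... another wants the same relation in order of telephone number.
by_number = sorted(directory, key=lambda entry: entry[2])

print([entry[0] for entry in by_name])
print([entry[0] for entry in by_number])
```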

36 A ‘Sequenced’ Relation
The following examples illustrate how the same relation of data can be sequenced differently depending on user/application requirements.

37 A ‘Sequenced’ Relation 2

38 Sorting a Relation into Sequences
Summary: A relation is sorted on the values of one or more of its attributes. Thus any sort attribute must contain orderable data.
Examples.
– Numbers have a natural sequence
– Text can be sorted alphabetically
– Photos : there is no agreed way of sorting them
Any orderable attribute’s data can be sorted into ascending or descending order.
If more than one attribute is used for sorting, a major to minor order must be specified.

39 Sorting Approach
Where several attributes are used for sorting, the approach is :
– Sort tuples using the 1st attribute’s values.
– Where more than one tuple has the same value for the 1st attribute, sort those tuples on the 2nd attribute’s values.
– Where more than one tuple has the same value for the 1st and 2nd attributes, sort those tuples on the 3rd attribute’s values, etc.
The order of 1st, 2nd, ..., nth attributes is called the major to minor order, the 1st attribute being the major attribute and the last attribute being the minor attribute.
Alternatively, tuples are said to be sorted “on the 3rd attribute within the 2nd attribute within the 1st attribute”.
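Major to minor sorting corresponds to sorting on a composite key. The sketch below uses invented employee tuples in the order (ENo, EName, M-S, Sal), with M-S as the major attribute and EName as the minor one. It also shows the same result obtained by sorting on the minor attribute first and then on the major one, which works because Python's sort is stable.

```python
# Hypothetical tuples: (ENo, EName, M-S, Sal).
employees = [
    ("E1", "Smith", "S", 12000),
    ("E2", "Young", "M", 15000),
    ("E3", "Jones", "S", 11000),
    ("E4", "Brown", "M", 9000),
]

# Sort on M-S (major); tuples with equal M-S are sorted on EName (minor).
major_to_minor = sorted(employees, key=lambda t: (t[2], t[1]))

# Equivalent approach: sort on the minor attribute, then stably on the major one.
two_passes = sorted(sorted(employees, key=lambda t: t[1]), key=lambda t: t[2])

print([t[1] for t in major_to_minor])
```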

40 SQL : Retrieving a Sequenced Relation
To retrieve the relation EMPLOYEE in order of increasing employee salary :

41 SQL Sort Example
The above example retrieves the whole of the relation EMPLOYEE from the current DB.
Asc is the default sort order, so it could be omitted if desired.
If Asc had been replaced by Desc (for descending), the sequence would have been in order of decreasing salaries.
The above retrieval is an example of an SQL statement, which must always be terminated with a semi-colon;
– this is because some SQL statements can be very long, so the semi-colon (rather than the end of the line) is used to indicate the end of the statement.
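The behaviour of Asc and Desc can be tried out with Python's sqlite3 module. This is a sketch under several assumptions: the statement is presumed to have the form `Select * From EMPLOYEE Order By Sal Asc ;`, the attribute names follow those used elsewhere in the lecture, the rows are invented, and SQLite stands in for the Oracle SQL used in the module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The hyphenated name M-S is written in double quotes, because most SQL
# dialects would otherwise read it as "M minus S".
conn.execute('CREATE TABLE EMPLOYEE (ENo TEXT, EName TEXT, "M-S" TEXT, Sal INTEGER)')
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("E1", "Smith", "S", 12000), ("E2", "Young", "M", 15000), ("E3", "Jones", "S", 11000)],
)

ascending = conn.execute("SELECT * FROM EMPLOYEE ORDER BY Sal ASC;").fetchall()
default = conn.execute("SELECT * FROM EMPLOYEE ORDER BY Sal;").fetchall()
descending = conn.execute("SELECT * FROM EMPLOYEE ORDER BY Sal DESC;").fetchall()

print([row[3] for row in ascending])
print([row[3] for row in descending])
```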

42 SQL code
Every SQL retrieval statement must include the Select and From phrases.
– All other phrases, like Order By, are optional.
The following SQL example illustrates a sequencing that uses a major and a minor attribute :
Select *
From EMPLOYEE
Order By M-S Asc, EName Asc ;
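The major and minor sort can be run against a small invented EMPLOYEE table using Python's sqlite3 module. Two assumptions are worth flagging: SQLite is used here instead of Oracle, and the attribute name M-S is wrapped in double quotes, since an unquoted hyphen would normally be parsed as a subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMPLOYEE (ENo TEXT, EName TEXT, "M-S" TEXT, Sal INTEGER)')
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("E1", "Smith", "S", 12000),
        ("E2", "Young", "M", 15000),
        ("E3", "Jones", "S", 11000),
        ("E4", "Brown", "M", 9000),
    ],
)

# "M-S" is the major attribute, EName the minor one.
rows = conn.execute(
    'SELECT * FROM EMPLOYEE ORDER BY "M-S" ASC, EName ASC;'
).fetchall()

print([(row[2], row[1]) for row in rows])
```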

43 Limitations of Sequencing
This relation contains a plumber’s job list, in order of priority:-

44 Tuple Sequence
Its tuple sequence is based on the urgency of the job, followed by the profit margin the plumber can make on the job.
– As this data is not stored in the relation, it cannot be used for sorting on tuple values;
– so the DBMS cannot retrieve the tuples in this order.
Thus the limitation on sequencing is that it is value-based, i.e. it must be based on values held in the relation.

45 Overcoming Sequencing Limitations
The solution is to add another attribute to the relation, on whose values the tuples can be sorted to create the required sequence.
Example: Add a priority attribute to the plumber’s relation and sort on it.
The only way to overcome this limitation is to add one or more attributes whose value(s) do reflect the sequence required, and then sort on these.
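A small sketch of the idea, with an invented job list: sorting on the stored values cannot reproduce the plumber's priorities, whereas an added priority attribute can.

```python
# Hypothetical job list without any attribute that encodes priority:
# (job, customer).
jobs = {
    ("Burst pipe", "Mrs Lee"),
    ("Dripping tap", "Mr Khan"),
    ("New radiator", "Ms Ortiz"),
}

# Sorting on stored values only gives value-based orders such as alphabetical.
alphabetical = [job for job, _ in sorted(jobs)]

# With a priority attribute added: (priority, job, customer).
jobs_with_priority = {
    (2, "New radiator", "Ms Ortiz"),
    (1, "Burst pipe", "Mrs Lee"),
    (3, "Dripping tap", "Mr Khan"),
}
job_list = [job for _, job, _ in sorted(jobs_with_priority, key=lambda t: t[0])]

print(alphabetical)
print(job_list)
```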

46 Tuple Definition
Tuple = A Set of Attributes
Definition: A tuple is a set of attributes.
Because a relation is a set of tuples, and every tuple has the same set of attributes, the relation has that same set of attributes.
– When depicted as a table, the attributes for each tuple are always shown in the same sequence, for simplicity and convenience.

47 Column Order does not Matter
These two relations are the same, even though their attributes appear in a different sequence.

48 Arity of Tuples
Each tuple in the relation has the same set of named attributes.
– Of course the values in those attributes can differ in different tuples.
The number of attributes in a tuple, and hence in a relation, is called the degree (or arity) of the tuple or relation.
– So the example relations above both have a degree (or arity) of 4.
“Degree” is generally the preferred term in Britain, “arity” in America.
A tuple/relation of degree one is said to be unary, of degree two binary, of degree three ternary, and so on;
– in general a tuple/relation of degree n is said to be n-ary.
– The corresponding terms 1-tuple, 2-tuple, 3-tuple, …, n-tuple can also be used for tuples.
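Treating a tuple as a set of named attributes can be imitated with a Python dictionary, whose equality test ignores the order in which the names were written. The attribute values below are invented, and the use of `dict` is an assumption of this sketch, not the lecture's notation.

```python
# The same tuple written with its attributes in two different sequences.
t1 = {"ENo": "E1", "EName": "Smith", "M-S": "S", "Sal": 12000}
t2 = {"EName": "Smith", "Sal": 12000, "M-S": "S", "ENo": "E1"}

# Attribute order is irrelevant to whether the tuples are the same.
same_tuple = t1 == t2

# Degree (arity) is the number of attributes.
degree = len(t1)

print(same_tuple, degree)
```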

49 Order in Retrieval
In a retrieval such as
Select * From EMPLOYEE ;
the order in which the attributes appear in the result is a default order, typically determined by the relation’s physical storage and/or by the order in which the attributes were listed when the relation was created.
To obtain a specific sequence of attributes in the retrieval, simply list their names in the Select phrase in the desired order.

50 Sequences in SQL
Thus the attribute sequences above would be produced by the following retrievals respectively :
Select ENo, EName, M-S, Sal From EMPLOYEE ;
Select EName, Sal, M-S, ENo From EMPLOYEE ;
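The effect of listing attribute names in the Select phrase can be seen with a one-row EMPLOYEE table in SQLite, accessed through Python's sqlite3 module. The row is invented, SQLite stands in for Oracle, and M-S is double-quoted because an unquoted hyphen would be treated as a minus sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMPLOYEE (ENo TEXT, EName TEXT, "M-S" TEXT, Sal INTEGER)')
conn.execute("INSERT INTO EMPLOYEE VALUES ('E1', 'Smith', 'S', 12000)")

# The same tuple retrieved in two different attribute sequences.
first = conn.execute('SELECT ENo, EName, "M-S", Sal FROM EMPLOYEE;').fetchone()
second = conn.execute('SELECT EName, Sal, "M-S", ENo FROM EMPLOYEE;').fetchone()

print(first)
print(second)
```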

51 Relation - No Duplicate Attributes
This cannot be a relation because attribute ‘M-S’ appears twice.
Duplicate attributes are those with duplicate names, whether their contents are duplicated or not.

52 Flexibility
If two attributes have the same name, we cannot tell the computer which of the two we want, even if they contain different data.
The lack of a fixed attribute sequence is useful,
– because we can retrieve data from the DB in any attribute sequence we like;
– and if we don’t care what sequence the attributes appear in, we need not specify one and simply get the default.

53 A Relation - its Definition so far
A relation is a set of tuples, each of which is a set of attributes.
Each tuple in the relation contains the same set of attributes by name.
Each named attribute holds data of one specific type (domain).
Consequently it is convenient to consider the relation as comprising named attributes, each with its particular data type/domain.

54 Intension/extension
A relation’s Heading (or Intension) is its type, which is a composite type based on the types of its attributes.
A relation’s Body (or Extension) is its value, which is a composite of the values of all of its attributes in all of its tuples.
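One way to picture the split between heading and body is to hold the attribute names and types in one structure and the tuples in another. The sketch below is only an informal model: the representation, the helper `conforms` and all values are assumptions made for illustration.

```python
# Heading (intension): attribute names with their types.
heading = {"ENo": str, "EName": str, "M-S": str, "Sal": int}

# Body (extension): the tuples currently held, each a set of named values.
body = [
    {"ENo": "E1", "EName": "Smith", "M-S": "S", "Sal": 12000},
    {"ENo": "E2", "EName": "Young", "M-S": "M", "Sal": 15000},
]

def conforms(tup, heading):
    """True if the tuple has exactly the heading's attributes, each of the right type."""
    return tup.keys() == heading.keys() and all(
        isinstance(tup[name], typ) for name, typ in heading.items()
    )

all_conform = all(conforms(tup, heading) for tup in body)

# A tuple with a missing attribute and a wrongly typed value does not fit the heading.
bad_tuple = {"ENo": "E9", "EName": "Nobody", "Sal": "lots"}
bad_conforms = conforms(bad_tuple, heading)

print(all_conform, bad_conforms)
```

The body can change over time as tuples are inserted or deleted, while the heading stays fixed.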

55 Data Types Available
In principle, any data type can be used in an attribute,
– but in practice you will be limited by the range of types available with your DBMS.
A DB relation is in fact a variable.
– It is a named object in the DB whose value can change over time.
Future sessions will consider relation/attribute data types in greater detail.

