Lecture 7 Tuesday 4/12/2018 Slides from A. Ballatore, Geography


1 Relational databases, II
Lecture 7, Tuesday 4/12/2018. Slides from A. Ballatore, Geography and Stelios Sotiriadis

2 Outline
1-to-many relationships
Many-to-many relationships
Normal forms
Joins
Aggregating records

3 Connect to SQL Plus
SQL script: Class7-sql-second-class.txt
We will run its statements one by one…

4 Film DB: ER relationships
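[ER diagram. Entities: Cities (city_id, name, in_country), Actors (actor_id, first_name, last_name, gender), Films (film_id, title), Passports (passport no). Relationships: born in (Actors N to Cities 1), worked in (Actors M to Films N), set in (Films and Cities), owns (Actors and Passports).]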

5 1-to-many relationships
An actor was born in a city.
Each city can be the birth place of many actors.
How do you link actors to their birth places in a DB with these two tables?
Actor: Actor id, First name, Last name, Date of birth
City: City id, In country, Population
Relationship: Actor (N) born in City

6 1-to-many relationships
One solution is based on a foreign key: an additional field on the "many" side (actor in this case, not city).
Actor: Actor id, First name, Last name, Date of birth, Birth place
City: City id, Name, In country, Population
Relationship: Actors (N) born in Cities (1)
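The foreign-key design above can be sketched as follows. This is a minimal illustration that assumes SQLite driven from Python's sqlite3 module rather than the SQL Plus setup used in the lecture; the column types and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite, which only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cities (
    city_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    in_country TEXT,
    population INTEGER
);
CREATE TABLE actors (
    actor_id      INTEGER PRIMARY KEY,
    first_name    TEXT,
    last_name     TEXT,
    date_of_birth TEXT,
    -- Foreign key on the "many" side: one birth place per actor.
    birth_place   INTEGER REFERENCES cities(city_id)
);
-- Hypothetical sample rows.
INSERT INTO cities VALUES (1, 'London', 'GB', 9000000);
INSERT INTO actors VALUES (10, 'Ann', 'Adams', '1980-01-01', 1);
INSERT INTO actors VALUES (11, 'Bob', 'Brown', '1975-05-05', 1);
""")

# Many actors can point to the same city.
n_born_in_city_1 = conn.execute(
    "SELECT count(*) FROM actors WHERE birth_place = 1"
).fetchone()[0]

# A birth place that is not an existing city is rejected.
try:
    conn.execute("INSERT INTO actors VALUES (12, 'Cy', 'Chen', '1990-09-09', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(n_born_in_city_1, fk_rejected)
```

Because birth_place is a single column of actors, each actor row can reference at most one city.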

7 1-to-many relationships
As desired, this design makes it impossible to define multiple birth places for an actor.
Actor: Actor id, First name, Last name, Date of birth, Birth place
City: City id, Name, In country, Population

8 Many-to-many relationships
How do you link actors to the movies they worked in (or vice-versa) in a DB with these two tables?
Actor: Actor id, First name, Last name, Date of birth
Film: Film id, Name, Release year
Relationship: Actor (M) worked in Film (N)

9 Many-to-many relationships
The solution is based on an additional table.
The primary key (PK) of the new table is a composite of the two foreign keys (FK). This means that each pair <actor_id, film_id> is unique and neither part can be null.
The relation can have attributes that do not belong to either entity, but to the relationship.
Relationship: Actors (M) worked in Films (N)
Actors: Actor id, First name, Last name, Date of birth
Worked_in: Actor id, Film id, Fee, Character
Films: Film id, Name, Release year

10 Many-to-many relationships

11 Many-to-many relationships
Constraint that is automatically enforced in this scenario:
The same actor cannot work more than once in the same film.
The pair <actor_id, film_id> is unique.
Relationship: Actors (M) worked in Films (N)
Actors: Actor id, First name, Last name, Date of birth
Worked in: Actor id, Film id, Fee, Character
Films: Film id, Name, Release year
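The uniqueness constraint can be seen in a small sketch of the linking table. It assumes SQLite through Python's sqlite3 module; the column name character_name (standing in for "Character"), the types and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE films  (film_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE worked_in (
    actor_id       INTEGER NOT NULL REFERENCES actors(actor_id),
    film_id        INTEGER NOT NULL REFERENCES films(film_id),
    fee            INTEGER,   -- attribute of the relationship itself
    character_name TEXT,      -- attribute of the relationship itself
    PRIMARY KEY (actor_id, film_id)  -- composite key made of the two FKs
);
-- Hypothetical sample rows.
INSERT INTO actors VALUES (1, 'Adams');
INSERT INTO films  VALUES (100, 'Film A');
INSERT INTO films  VALUES (200, 'Film B');
INSERT INTO worked_in VALUES (1, 100, 5000, 'Hero');
INSERT INTO worked_in VALUES (1, 200, 7000, 'Villain');
""")

# The same actor in the same film a second time violates the composite key.
try:
    conn.execute("INSERT INTO worked_in VALUES (1, 100, 1, 'Extra')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

n_links = conn.execute("SELECT count(*) FROM worked_in").fetchone()[0]
print(duplicate_rejected, n_links)
```

One actor can still appear in several films (and one film can have several actors); only an exact repeat of the pair is refused.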

12 DB normalisation
As a DB grows and more tables/relationships are added, it is easy to create redundancy.
Redundancy results in anomalies.
Normal forms are sets of criteria that help you keep the DB well-designed and robust.
https://en.wikipedia.org/wiki/First_normal_form

13 DB anomalies
Telephone information is not atomic, i.e., it can be split into separate records (see table below).
However, to update an attribute of customer 123 or 456, we would have to change multiple records.
How can we delete/update the phone numbers without deleting customers or dealing with inconsistent information?
[Table shown on the slide: customer records with telephone numbers]

14 DB anomalies
To solve this problem, the best approach is to split the table into two inter-linked tables, removing all redundancy.
The tables below are the best possible design (normalised).
This is the core idea behind normalisation.
[Tables shown on the slide: the normalised two-table design]
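A sketch of what such a split might look like is given below. The slide's actual tables are not reproduced in this transcript, so the table names, columns and rows here are assumptions (only the customer ids 123 and 456 come from the slides), and SQLite via Python's sqlite3 module stands in for the lecture's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalised design: one row per customer, one row per phone.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phones (
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    phone       TEXT NOT NULL,
    PRIMARY KEY (customer_id, phone)
);
INSERT INTO customers VALUES (123, 'Customer A');
INSERT INTO customers VALUES (456, 'Customer B');
INSERT INTO phones VALUES (123, '555-0101');
INSERT INTO phones VALUES (123, '555-0102');
INSERT INTO phones VALUES (456, '555-0103');
""")

# Updating a customer attribute touches exactly one row,
# no matter how many phone numbers the customer has.
rows_updated = conn.execute(
    "UPDATE customers SET name = 'Customer A (renamed)' WHERE customer_id = 123"
).rowcount

# Deleting phone numbers does not delete the customer.
conn.execute("DELETE FROM phones WHERE customer_id = 456")
customers_left = conn.execute("SELECT count(*) FROM customers").fetchone()[0]
phones_left = conn.execute("SELECT count(*) FROM phones").fetchone()[0]
print(rows_updated, customers_left, phones_left)
```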

15 1st normal form (1NF)
A table is in 1NF if:
Each row is unique, i.e., it has a primary key
There are no columns with repeated data
Each data item cannot be broken down any further
Each field has a unique name

16 2nd normal form (2NF)
A table is in 2NF if:
The DB is in 1st normal form
Records do not depend on anything other than a table's primary key (the whole of a compound key, if necessary)
To enforce it, create separate tables for sets of values that apply to multiple records
Relate these tables with a foreign key

17 3rd normal form (3NF)
A table is in 3NF if:
The DB is in 2nd normal form
Values in a record that do not depend on that record's key do not belong in the table
In general, any time the contents of a group of fields may apply to more than a single record in the table, place those fields in a separate table
Eliminate fields that do not depend on the key

18 Normalization example

19 Database example
Assume a video library maintains a database of movies rented out.

20 1NF (First Normal Form) rules
Each table cell should contain a single value.
Each record needs to be unique.

21 2NF rules
Rule 1: Be in 1NF
Rule 2: Single-column primary key
Records do not depend on anything other than a table's primary key

22 3NF rules
Rule 1: Be in 2NF
Rule 2: Has no transitive functional dependencies
Values in a record that do not depend on that record's key do not belong in the table

23 How can we join tables?
Now that we have defined and populated relationships in the DB, how can we formulate queries such as:
What French actors worked in films in English?
What actors born in London worked in films with French actors?
How many actresses born after 1985 worked in more than 2 films?
What actors were born in cities smaller than 1M people?
These queries need data that is stored in multiple tables. To solve them, we need to combine the tables.

24 Relational algebra
Formal semantics for querying relational databases
Created as a precise language to support the design and formulation of queries
Small set of operations that can be combined in very complex expressions to solve queries
In SQL, you apply these operations in the SELECT statements

25 Relational algebra (RA) operations
Operations on two sets (tables):
Cartesian product (x): combine rows from two tables in all possible ways
Natural join (⋈), Theta-join (θ), equijoin: combine rows applying different strategies
Union (∪) and intersection (∩)
These operations are combined to extract useful information from DBs

26 RA: Cartesian product
Set-theoretical operation to combine two sets

27 Cartesian product (or “cross join”)
    select columns from table1, table2;
Generates a new table with all possible combinations of rows from table1 and table2
Columns of the new table are from both table1 and table2

28 Cartesian product (or “cross join”)
    select * from actors, cities;
[Tables shown: actors, cities]

29 Cartesian product (or “cross join”)
    select * from actors, cities;
New table with 6 rows (3 actors x 2 cities)
The meaning of these rows is undefined, because they are not filtered by a relationship
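The row count of a Cartesian product can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module, and the three actors and two cities are invented sample rows with a reduced set of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data: 2 cities, 3 actors.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT, in_country TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, last_name TEXT, birth_place INTEGER);
INSERT INTO cities VALUES (1, 'London', 'GB');
INSERT INTO cities VALUES (2, 'Paris', 'FR');
INSERT INTO actors VALUES (1, 'Adams', 1);
INSERT INTO actors VALUES (2, 'Blanc', 2);
INSERT INTO actors VALUES (3, 'Chen', NULL);
""")

# No join condition: every actor row is paired with every city row.
rows = conn.execute("SELECT * FROM actors, cities").fetchall()
n_rows = len(rows)      # 3 actors x 2 cities
n_cols = len(rows[0])   # 3 actor columns + 3 city columns
print(n_rows, n_cols)
```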

30 Equi-join (or “inner join”)
Most common type of join
Two (or more) attributes, taken from the tables being joined, must be equal in each result row
The results can show a meaningful relationship between the two tables
    select columns from table1, table2 where table1.col = table2.col;

31 Equi-join (or “inner join”)
    select * from actors, cities where actors.birth_place = cities.city_id;
Note: the syntax table.column is recommended for readability, and necessary when there are columns with the same name in different tables being joined.
[Tables shown: actors, cities]

32 Equi-join (or “inner join”)
    select * from actors, cities where actors.birth_place = cities.city_id;
New table with 2 rows (2 actors matched with their cities)
The rows combine each actor's information with information about their place of birth (meaningful!)
The other rows are discarded
Actors with birth_place = NULL are also discarded
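A minimal sketch of this equi-join, assuming SQLite through Python's sqlite3 module and invented sample rows (three actors, one of them with no birth place), shows which rows survive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT, in_country TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, last_name TEXT, birth_place INTEGER);
INSERT INTO cities VALUES (1, 'London', 'GB');
INSERT INTO cities VALUES (2, 'Paris', 'FR');
INSERT INTO actors VALUES (1, 'Adams', 1);
INSERT INTO actors VALUES (2, 'Blanc', 2);
INSERT INTO actors VALUES (3, 'Chen', NULL);
""")

# Keep only the actor/city pairs where the foreign key matches the city key.
joined = conn.execute("""
    SELECT actors.last_name, cities.name
    FROM actors, cities
    WHERE actors.birth_place = cities.city_id
    ORDER BY actors.actor_id
""").fetchall()
print(joined)
```

The actor with a NULL birth_place does not appear, because NULL = city_id is never true.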

33 Equi-join with selection
    select * from actors, cities where actors.birth_place = cities.city_id and cities.in_country = 'GB';
The results contain actors who were born in the UK (only 1 in this case)
We have successfully retrieved meaningful information from two tables!
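The extra condition simply narrows the joined rows further. A sketch under the same kind of assumptions (SQLite via Python's sqlite3 module, invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT, in_country TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, last_name TEXT, birth_place INTEGER);
INSERT INTO cities VALUES (1, 'London', 'GB');
INSERT INTO cities VALUES (2, 'Paris', 'FR');
INSERT INTO actors VALUES (1, 'Adams', 1);
INSERT INTO actors VALUES (2, 'Blanc', 2);
INSERT INTO actors VALUES (3, 'Chen', NULL);
""")

# Join condition first, then a selection on a column of the joined table.
born_in_gb = conn.execute("""
    SELECT actors.last_name, cities.name
    FROM actors, cities
    WHERE actors.birth_place = cities.city_id
      AND cities.in_country = 'GB'
""").fetchall()
print(born_in_gb)
```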

34 Left/right join
The equi-join only keeps rows where both tables have matching values (~intersection).
What if we want to enrich the records from table A with data from table B, while keeping all records from table A? E.g.:
Actors with birth city information (including actors with birth_place = NULL)
Actors and their first films, also showing actors who never worked in a film
    select * from tableA left join tableB on tableA.key = tableB.key where conditions;

35 select * from actors left join cities on actors.birth_place = cities.city_id;
[Tables shown: actors, cities]

36 select * from actors left join cities on actors.birth_place = cities.city_id;
[Tables shown: actors, cities]
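The effect of the left join can be sketched as follows, assuming SQLite through Python's sqlite3 module and invented sample rows. Every actor is kept, and the city columns are filled with NULL (None in Python) where there is no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT, in_country TEXT);
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, last_name TEXT, birth_place INTEGER);
INSERT INTO cities VALUES (1, 'London', 'GB');
INSERT INTO cities VALUES (2, 'Paris', 'FR');
INSERT INTO actors VALUES (1, 'Adams', 1);
INSERT INTO actors VALUES (2, 'Blanc', 2);
INSERT INTO actors VALUES (3, 'Chen', NULL);
""")

# All rows of the left table (actors) survive, matched or not.
left_joined = conn.execute("""
    SELECT actors.last_name, cities.name
    FROM actors LEFT JOIN cities
      ON actors.birth_place = cities.city_id
    ORDER BY actors.actor_id
""").fetchall()
print(left_joined)
```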

37 select * from actors right join cities on actors.birth_place = cities.city_id;
[Tables shown: actors, cities]

38 select * from actors right join cities on actors.birth_place = cities.city_id;
[Tables shown: actors, cities]

39 Joining more than 2 tables
    select a.actor_id, a.first_name, a.last_name, f.film_id, f.title
    from actors as a, actor_work_film as w, films as f
    where a.actor_id = w.actor_id and f.film_id = w.film_id;
Note: tablename as t renames the table within the query to make the query easier to write.
[Tables shown: actors, actor_work_film, films]

40 Joining more than 2 tables
    select a.actor_id, a.first_name, a.last_name, f.film_id, f.title
    from actors as a, actor_work_film as w, films as f
    where a.actor_id = w.actor_id and f.film_id = w.film_id;
The result of this inner join is the set of actor/film pairs (excluding films without actors and actors without films).
Queries can also be nested, so that one query runs on the result of another, for example:
    select * from (select * from table where conditions) where conditions
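The three-table join can be tried out with the sketch below. It assumes SQLite through Python's sqlite3 module; the rows are invented, with one actor and one film deliberately left out of actor_work_film.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE films  (film_id  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actor_work_film (
    actor_id INTEGER NOT NULL REFERENCES actors(actor_id),
    film_id  INTEGER NOT NULL REFERENCES films(film_id),
    PRIMARY KEY (actor_id, film_id)
);
INSERT INTO actors VALUES (1, 'Ann', 'Adams');
INSERT INTO actors VALUES (2, 'Bea', 'Blanc');
INSERT INTO actors VALUES (3, 'Cy', 'Chen');      -- never worked in a film
INSERT INTO films  VALUES (10, 'Film A');
INSERT INTO films  VALUES (20, 'Film B');
INSERT INTO films  VALUES (30, 'Film C');         -- no actors recorded
INSERT INTO actor_work_film VALUES (1, 10);
INSERT INTO actor_work_film VALUES (2, 10);
INSERT INTO actor_work_film VALUES (1, 20);
""")

# The linking table connects actors to films through its two foreign keys.
pairs = conn.execute("""
    SELECT a.actor_id, a.last_name, f.film_id, f.title
    FROM actors AS a, actor_work_film AS w, films AS f
    WHERE a.actor_id = w.actor_id AND f.film_id = w.film_id
    ORDER BY a.actor_id, f.film_id
""").fetchall()
print(pairs)
```

Actor 3 and film 30 are absent from the result because neither has a row in actor_work_film.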

41 Overview of joins

42 Aggregating records
In data science/GIS, we often need to get aggregate results, and not individual records (e.g., mean, count, etc.).
E.g., how many actors were born in each city?
SQL can group records into blocks, and then aggregate the results with operators (GROUP BY).
Operators like count, sum, avg, max, min take multiple values and return a single value.
    select aggregated_results from table group by columns

43 Aggregating records
    select in_country, count(city_id) from cities group by in_country
    select in_country, sum(population) from cities group by in_country

44 Aggregating records
Count cities in each country:
    select in_country, count(city_id) from cities group by in_country
Sum population of cities in each country:
    select in_country, sum(population) from cities group by in_country
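Both aggregations can be run on a toy table, as in this sketch. It assumes SQLite through Python's sqlite3 module; the population figures are round illustrative numbers, not real data, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data with made-up populations.
CREATE TABLE cities (
    city_id INTEGER PRIMARY KEY, name TEXT, in_country TEXT, population INTEGER
);
INSERT INTO cities VALUES (1, 'London', 'GB', 9000000);
INSERT INTO cities VALUES (2, 'Leeds',  'GB',  800000);
INSERT INTO cities VALUES (3, 'Paris',  'FR', 2000000);
""")

# One output row per country: number of cities in that group.
city_counts = conn.execute("""
    SELECT in_country, count(city_id)
    FROM cities GROUP BY in_country ORDER BY in_country
""").fetchall()

# One output row per country: total population of that group.
population_totals = conn.execute("""
    SELECT in_country, sum(population)
    FROM cities GROUP BY in_country ORDER BY in_country
""").fetchall()

print(city_counts)
print(population_totals)
```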

45 Aggregating records
Count actors who worked in each film:
    select f.film_id, f.title, count(w.actor_id) n_actors
    from films f, actor_work_film w
    where f.film_id = w.film_id
    group by f.film_id, f.title
Note that the count column is renamed to make it easier to interpret (n_actors).
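A join combined with GROUP BY can be sketched like this, assuming SQLite through Python's sqlite3 module and invented rows. The sketch groups by both non-aggregated columns in the select list, which is the form accepted by the widest range of SQL engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data.
CREATE TABLE films (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actor_work_film (
    actor_id INTEGER NOT NULL,
    film_id  INTEGER NOT NULL REFERENCES films(film_id),
    PRIMARY KEY (actor_id, film_id)
);
INSERT INTO films VALUES (10, 'Film A');
INSERT INTO films VALUES (20, 'Film B');
INSERT INTO actor_work_film VALUES (1, 10);
INSERT INTO actor_work_film VALUES (2, 10);
INSERT INTO actor_work_film VALUES (1, 20);
""")

# Join films to the linking table, then count the linked actors per film.
actors_per_film = conn.execute("""
    SELECT f.film_id, f.title, count(w.actor_id) n_actors
    FROM films f, actor_work_film w
    WHERE f.film_id = w.film_id
    GROUP BY f.film_id, f.title
    ORDER BY f.film_id
""").fetchall()
print(actors_per_film)
```

Since this is an inner join, a film with no rows in actor_work_film would not be listed at all rather than appearing with a count of 0.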

46 Quote of the day
“It is a capital mistake to theorize before one has data.”
Sherlock Holmes (Arthur Conan Doyle)

