SQL.

Presentation on theme: "SQL."— Presentation transcript:

1 SQL

2 What is SQL?
DML (data manipulation language)
DDL (data definition language)
Partially declarative
Based on the relational algebra via the tuple calculus, so its core has an elegant set-theoretic foundation
Provides a sound foundation for mathematically precise optimization
Has a very simple structure; optimized for moving large numbers of tuples at once
Set-based, retrieval-based

3 What are relational databases?
Tables and SQL, a programming language
Engineered for:
    high-volume data applications
    “must always be correct” data management apps
    transaction-based applications
    non-network, non-object-based data

4 Chapter 3: The syntax of SQL

5 Details of the Select clause

6 Details of Select, continued
Select * means all columns (attributes)
Using arithmetic:
    SELECT invoice_total - payment_total - credit_total AS balance_due
Using a function:
    SELECT CONCAT(first_name, ' ', last_name) AS full_name
Renaming a column:
    SELECT invoice_number AS "Invoice Number", invoice_date AS Date, invoice_total AS Total
    FROM invoices

7 Operator precedence

8 Calculations & concatenation & strings
SELECT invoice_total, payment_total, credit_total,
    invoice_total - payment_total - credit_total AS balance_due
FROM invoices

SELECT vendor_city, vendor_state, CONCAT(vendor_city, vendor_state)
FROM vendors

SELECT vendor_contact_first_name, vendor_contact_last_name,
    CONCAT(LEFT(vendor_contact_first_name, 1), LEFT(vendor_contact_last_name, 1)) AS initials
FROM vendors
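A minimal sketch of a calculated column, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the deck itself targets MySQL). The rows are hypothetical sample data, and only the balance_due arithmetic is shown because it is portable across both systems.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_total REAL, payment_total REAL, credit_total REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(100.0, 40.0, 10.0), (250.0, 250.0, 0.0)],
)

# The expression is evaluated once per row; AS gives the result a column name.
balances = conn.execute(
    "SELECT invoice_total, "
    "invoice_total - payment_total - credit_total AS balance_due "
    "FROM invoices ORDER BY invoice_total"
).fetchall()
print(balances)
```

The alias only names the output column; the underlying table is unchanged.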

9 Date function, round function
SELECT invoice_date,
    DATE_FORMAT(invoice_date, '%m/%d/%y') AS 'MM/DD/YY',
    DATE_FORMAT(invoice_date, '%e-%b-%Y') AS 'DD-Mon-YYYY'
FROM invoices

SELECT invoice_date, invoice_total,
    ROUND(invoice_total) AS nearest_dollar,
    ROUND(invoice_total, 1) AS nearest_dime
FROM invoices

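10 Where clause format
WHERE [NOT] search_condition_1 {AND|OR} [NOT] search_condition_2 ...
Two conditions that are equivalent when the same date is used (the second is the first simplified with De Morgan's laws):
    WHERE NOT (invoice_total >= 5000 OR NOT invoice_date <= ' ')
    WHERE invoice_total < 5000 AND invoice_date <= ' '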
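A small check, under stated assumptions, that a negated compound condition and its simplified form select the same rows. It uses Python's sqlite3 module as a stand-in for MySQL; the rows and the CUTOFF date are hypothetical, since the slide's date literal is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER, invoice_total REAL, invoice_date TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        (1, 4000.0, "2011-06-15"),
        (2, 6000.0, "2011-06-15"),
        (3, 4000.0, "2011-08-01"),
        (4, 6000.0, "2011-08-01"),
    ],
)

CUTOFF = "2011-07-01"  # hypothetical date; ISO strings compare in date order

# Form 1: NOT applied to a parenthesized OR.
negated = conn.execute(
    "SELECT invoice_id FROM invoices "
    "WHERE NOT (invoice_total >= 5000 OR NOT invoice_date <= ?) "
    "ORDER BY invoice_id",
    (CUTOFF,),
).fetchall()

# Form 2: the same condition after pushing NOT inward (De Morgan).
simplified = conn.execute(
    "SELECT invoice_id FROM invoices "
    "WHERE invoice_total < 5000 AND invoice_date <= ? "
    "ORDER BY invoice_id",
    (CUTOFF,),
).fetchall()
print(negated, simplified)
```

Only invoice 1 is both under 5000 and dated on or before the cutoff, so both forms return it alone.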

11 In phrase and nested selects
WHERE test_expression [NOT] IN ({subquery|expression_1 [, expression_2]...})
Example with a nested select:
    WHERE vendor_id IN
        (SELECT vendor_id FROM invoices WHERE invoice_date = ' ')

12 Like clause
% matches any string of zero or more characters; _ matches exactly one character
WHERE vendor_city LIKE 'SAN%'
    Cities that will be retrieved: “San Diego”, “Santa Ana”
WHERE vendor_name LIKE 'COMPU_ER%'
    Vendors that will be retrieved: “Compuserve”, “Computerworld”
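The two LIKE patterns can be tried out with a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption); SQLite's LIKE is case-insensitive for ASCII letters by default, which resembles MySQL's behavior under its usual case-insensitive collations. The third vendor row is hypothetical and is there to show a non-match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_name TEXT, vendor_city TEXT)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [
        ("Compuserve", "San Diego"),
        ("Computerworld", "Santa Ana"),
        ("Compaq", "Sacramento"),  # hypothetical row that matches neither pattern
    ],
)

# % stands for any run of characters after the prefix SAN.
cities = [
    row[0]
    for row in conn.execute(
        "SELECT vendor_city FROM vendors "
        "WHERE vendor_city LIKE 'SAN%' ORDER BY vendor_city"
    )
]

# _ stands for exactly one character between COMPU and ER.
names = [
    row[0]
    for row in conn.execute(
        "SELECT vendor_name FROM vendors "
        "WHERE vendor_name LIKE 'COMPU_ER%' ORDER BY vendor_name"
    )
]
print(cities, names)
```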

13 Order By clause
SELECT vendor_name,
    CONCAT(vendor_city, ', ', vendor_state, ' ', vendor_zip_code) AS address
FROM vendors
ORDER BY vendor_name

14 Chapter 4: Manipulating multiple tables
Join
Equijoin
Natural join
Outer join
Self join
N-way join

15 Explicit versus implicit joins
Explicit join syntax:
    SELECT vendor_name, invoice_number, invoice_date, line_item_amount, account_description
    FROM vendors v
        JOIN invoices i ON v.vendor_id = i.vendor_id
        JOIN invoice_line_items li ON i.invoice_id = li.invoice_id
        JOIN general_ledger_accounts gl ON li.account_number = gl.account_number
    WHERE invoice_total - payment_total - credit_total > 0
    ORDER BY vendor_name, line_item_amount DESC
Implicit join syntax:
    SELECT invoice_number, vendor_name
    FROM vendors v, invoices i
    WHERE v.vendor_id = i.vendor_id
    ORDER BY invoice_number

16 Left outer join and natural join
Left outer join:
    SELECT vendor_name, invoice_number, invoice_total
    FROM vendors LEFT JOIN invoices
        ON vendors.vendor_id = invoices.vendor_id
    ORDER BY vendor_name
Natural join:
    SELECT invoice_number, vendor_name
    FROM vendors NATURAL JOIN invoices
    ORDER BY invoice_number
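A minimal sketch of how a left outer join differs from an inner join, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The two vendors and the single invoice are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id INTEGER, vendor_name TEXT)")
conn.execute(
    "CREATE TABLE invoices (invoice_number TEXT, vendor_id INTEGER, invoice_total REAL)"
)
conn.executemany("INSERT INTO vendors VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
conn.execute("INSERT INTO invoices VALUES ('A-1', 1, 100.0)")  # Bolt has no invoices

# Inner join: only vendors that have at least one matching invoice.
inner_rows = conn.execute(
    "SELECT vendor_name, invoice_number "
    "FROM vendors JOIN invoices ON vendors.vendor_id = invoices.vendor_id "
    "ORDER BY vendor_name"
).fetchall()

# Left outer join: every vendor, with NULL (None) where no invoice matches.
left_rows = conn.execute(
    "SELECT vendor_name, invoice_number "
    "FROM vendors LEFT JOIN invoices ON vendors.vendor_id = invoices.vendor_id "
    "ORDER BY vendor_name"
).fetchall()
print(inner_rows, left_rows)
```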

17 Union operator
SELECT 'Active' AS source, invoice_number, invoice_date, invoice_total
FROM active_invoices
WHERE invoice_date >= ' '
UNION
SELECT 'Paid' AS source, invoice_number, invoice_date, invoice_total
FROM paid_invoices
ORDER BY invoice_total DESC

18 Chapter 5: Aggregates in queries
AVG([ALL|DISTINCT] expression)
SUM([ALL|DISTINCT] expression)
MIN([ALL|DISTINCT] expression)
MAX([ALL|DISTINCT] expression)
COUNT([ALL|DISTINCT] expression)
COUNT(*): counts all rows, including rows that contain nulls
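The difference between the three COUNT forms shows up as soon as a column contains nulls. A minimal sketch with Python's sqlite3 module as a stand-in for MySQL (an assumption), on hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor_id INTEGER, payment_date TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "2011-05-01"), (1, None), (2, None)],  # two unpaid invoices
)

# COUNT(*) counts rows; COUNT(expr) skips NULLs; DISTINCT removes duplicates first.
all_rows, paid_rows, vendor_count = conn.execute(
    "SELECT COUNT(*), COUNT(payment_date), COUNT(DISTINCT vendor_id) FROM invoices"
).fetchone()
print(all_rows, paid_rows, vendor_count)
```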

19 Issues with aggregate data
Aggregation is when we intentionally create a new form of “object”, shifting to objects that are deliberately not related to individual tuples
Selections and projections, by contrast, often lose a perspective on object identity unintentionally

20 Aggregate examples
SELECT COUNT(*) AS number_of_invoices,
    SUM(invoice_total - payment_total - credit_total) AS total_due
FROM invoices
WHERE invoice_total - payment_total - credit_total > 0

SELECT 'After 1/1/2011' AS selection_date,
    COUNT(*) AS number_of_invoices,
    ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
    SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > ' '

SELECT 'After 1/1/2011' AS selection_date,
    COUNT(*) AS number_of_invoices,
    MAX(invoice_total) AS highest_invoice_total,
    MIN(invoice_total) AS lowest_invoice_total
FROM invoices
WHERE invoice_date > ' '

21 Aggregate examples, continued
SELECT MIN(vendor_name) AS first_vendor,
    MAX(vendor_name) AS last_vendor,
    COUNT(vendor_name) AS number_of_vendors
FROM vendors

SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,
    COUNT(vendor_id) AS number_of_invoices,
    ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
    SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > ' '

22 Having clause (with group by)
SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_list]
[HAVING search_condition]
[ORDER BY order_by_list]

23 Group by and Having: complexities
Group by: how to vertically group tuples
Having: which groups will be included in the final result
Note: all of this happens after the Where clause is applied
Group by is based on columns, or on expressions that contain columns
Any calculations in the Select clause happen after the Group by; i.e., they are performed once for each group that results from the Group by
Group by can be nested if you specify more than one column
Order by is performed after the Having
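The processing order described above (Where, then Group by, then Having, then Order by) can be observed with a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption) and a hypothetical flattened table that carries state, city, and invoice total in one place to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_invoices "
    "(vendor_state TEXT, vendor_city TEXT, invoice_total REAL)"
)
conn.executemany(
    "INSERT INTO vendor_invoices VALUES (?, ?, ?)",
    [
        ("CA", "Fresno", 100.0),
        ("CA", "Fresno", 300.0),
        ("CA", "Oakland", 50.0),
        ("NV", "Reno", 700.0),
        ("NV", "Reno", 20.0),  # removed by WHERE before any grouping happens
    ],
)

# Grouping on two columns nests cities inside states.
groups = conn.execute(
    "SELECT vendor_state, vendor_city, COUNT(*) AS invoice_qty, "
    "SUM(invoice_total) AS invoice_sum "
    "FROM vendor_invoices "
    "WHERE invoice_total > 30 "
    "GROUP BY vendor_state, vendor_city "
    "HAVING COUNT(*) > 1 "
    "ORDER BY vendor_state, vendor_city"
).fetchall()
print(groups)
```

Reno has two invoices in the table, but only one survives the Where clause, so its group fails the Having test.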

24 Having example, group by example
Having example:
    SELECT vendor_id, ROUND(AVG(invoice_total), 2) AS average_invoice_amount
    FROM invoices
    GROUP BY vendor_id
    HAVING AVG(invoice_total) > 2000
    ORDER BY average_invoice_amount DESC
    (Note: 2 is the number of decimals in the result.)
Group by example:
    SELECT vendor_name, COUNT(*) AS invoice_qty,
        ROUND(AVG(invoice_total), 2) AS invoice_avg
    FROM vendors JOIN invoices
        ON vendors.vendor_id = invoices.vendor_id
    WHERE invoice_total > 500
    GROUP BY vendor_name
    ORDER BY invoice_qty DESC

25 These are the same…
SELECT invoice_date, COUNT(*) AS invoice_qty,
    SUM(invoice_total) AS invoice_sum
FROM invoices
GROUP BY invoice_date
HAVING invoice_date BETWEEN ' ' AND ' '
    AND COUNT(*) > 1
    AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC

SELECT invoice_date, COUNT(*) AS invoice_qty,
    SUM(invoice_total) AS invoice_sum
FROM invoices
WHERE invoice_date BETWEEN ' ' AND ' '
GROUP BY invoice_date
HAVING COUNT(*) > 1
    AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC
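A quick way to convince yourself that a condition on the grouping column can sit in either Having or Where is to run both forms on the same data. The sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption); the rows and the May 2011 date range are hypothetical, since the slide's date literals are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_date TEXT, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("2011-05-10", 60.0), ("2011-05-10", 70.0),  # in range, 2 rows, sum 130
        ("2011-05-20", 500.0),                       # in range, only 1 row
        ("2011-05-25", 10.0), ("2011-05-25", 20.0),  # in range, sum too small
        ("2011-06-02", 80.0), ("2011-06-02", 90.0),  # out of range
    ],
)
date_range = ("2011-05-01", "2011-05-31")  # hypothetical range

# Date filter applied to each group in HAVING.
having_form = conn.execute(
    "SELECT invoice_date, COUNT(*), SUM(invoice_total) FROM invoices "
    "GROUP BY invoice_date "
    "HAVING invoice_date BETWEEN ? AND ? "
    "AND COUNT(*) > 1 AND SUM(invoice_total) > 100 "
    "ORDER BY invoice_date DESC",
    date_range,
).fetchall()

# Date filter applied to each row in WHERE, before grouping.
where_form = conn.execute(
    "SELECT invoice_date, COUNT(*), SUM(invoice_total) FROM invoices "
    "WHERE invoice_date BETWEEN ? AND ? "
    "GROUP BY invoice_date "
    "HAVING COUNT(*) > 1 AND SUM(invoice_total) > 100 "
    "ORDER BY invoice_date DESC",
    date_range,
).fetchall()
print(having_form, where_form)
```

The equivalence holds because invoice_date is the grouping column, so every row in a group shares the same date. The conditions on COUNT(*) and SUM() can only go in Having.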

26 We will shift our focus a bit…
We’ll run the actual queries that appear in chapters 5 onward
The slides will contain overview material and not the actual queries

27 Chapter 6: subqueries
Often used to pass an aggregate value to a parent query
Often a good way to organize what might otherwise be a very complex WHERE clause, perhaps with a multiway join
A good way to make a query more readable to someone who uses it later
Subqueries can also be reused in other queries
Note: the parent query cannot use the attributes selected in the embedded query unless the parent itself references the appropriate table(s) in its own (outer) FROM clause
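A minimal sketch of the first use listed above, passing an aggregate value to a parent query. It runs through Python's sqlite3 module as a stand-in for MySQL (an assumption), on hypothetical invoice rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 600.0)],  # average is 300
)

# The inner query yields one value (the average); the outer query compares against it.
above_average = conn.execute(
    "SELECT invoice_id, invoice_total FROM invoices "
    "WHERE invoice_total > (SELECT AVG(invoice_total) FROM invoices) "
    "ORDER BY invoice_id"
).fetchall()
print(above_average)
```

The subquery can be run and checked on its own before it is embedded.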

28 Subqueries, continued
Important tool: the IN operator, which is a set “element of” operator (written ∈, with an epsilon), and you can negate it (NOT IN)
Use SOME, ANY, or ALL when the subquery returns multiple values (i.e., multiple tuples, perhaps with only one attribute each)
Without one of the operators above, the default is that the subquery returns a single value
ALL is a set “for all elements of” operator (written ∀, an upside-down A)
ANY is a set “there exists” operator (written ∃, a backward E)
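The set-theoretic reading of IN, ALL, and ANY can be modeled in a few lines of plain Python. This is only a model of the quantifier semantics, not SQL: the helper names and the list of values are hypothetical, and NULL handling is deliberately ignored.

```python
# Values that a hypothetical one-column subquery might return.
subquery_values = [100, 250, 400]

def is_in(x, values):
    """Model of x IN (subquery): set membership."""
    return x in values

def gt_all(x, values):
    """Model of x > ALL (subquery): must hold for every value."""
    return all(x > v for v in values)

def gt_any(x, values):
    """Model of x > ANY (subquery): must hold for at least one value."""
    return any(x > v for v in values)

in_result = is_in(250, subquery_values)       # 250 is one of the values
all_result = gt_all(300, subquery_values)     # fails: 300 is not > 400
any_result = gt_any(300, subquery_values)     # holds: 300 > 100
all_high_result = gt_all(500, subquery_values)

# For a non-empty list without NULLs, > ALL means "above the maximum"
# and > ANY means "above the minimum".
same_as_max = gt_all(300, subquery_values) == (300 > max(subquery_values))
same_as_min = gt_any(300, subquery_values) == (300 > min(subquery_values))
print(in_result, all_result, any_result, all_high_result)
```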

29 Subqueries, continued
The default is that a subquery executes only once, but you can use a “correlated” subquery so that it runs once for each row processed by the parent query
This breaks the “execute from the inside out” paradigm of an uncorrelated subquery
EXISTS is often used with correlated subqueries
You can put a subquery in a HAVING, FROM, or SELECT clause as well, but such queries get messy and we will skip this for now
It’s a good idea to write and test subqueries independently whenever possible, unless they are trivial
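A minimal sketch of a correlated subquery with NOT EXISTS, finding vendors that have no invoices. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id INTEGER, vendor_name TEXT)")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, vendor_id INTEGER)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)", [(1, "Acme"), (2, "Bolt"), (3, "Cord")]
)
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(10, 1), (11, 1), (12, 3)])

# The inner query refers to v.vendor_id from the outer query, so conceptually
# it is evaluated once for each vendor row.
no_invoices = conn.execute(
    "SELECT vendor_name FROM vendors v "
    "WHERE NOT EXISTS "
    "(SELECT 1 FROM invoices i WHERE i.vendor_id = v.vendor_id) "
    "ORDER BY vendor_name"
).fetchall()
print(no_invoices)
```

Because of the reference to v.vendor_id, this inner query cannot be run on its own, which is what distinguishes it from an uncorrelated subquery.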

30 Examples from chapter 6
7: passing a single value
6: ANY
3: IN
8: NOT EXISTS
5: ALL

31 Chapter 7: changing the database state
This is when we need a transaction protocol
Updates must never overlap with each other or with read-only queries
Read-only queries can overlap
But we want to increase throughput by supporting as much “concurrency” as possible
Each transaction has the potential to update the DB state

32 2 Phase Transactions
Each SQL program is within a begin and end transaction pair
Each transaction has its own workspace for the DB items it is going to update
Any transactions that overlap in execution time will appear to have run in some serial order
This is done by transactions requesting read and write locks (also known as shared and exclusive locks)
Read locks can be shared with other readers
Write locks cannot be shared with readers or writers
All locks are held until the end of the transaction
The locks are then released, and the changes that the transaction has made are moved to the DB
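The sharing rules for read and write locks can be sketched as a tiny lock table. This is an illustrative model only: the LockTable class and its method names are hypothetical, and a request that would have to wait simply returns False instead of blocking.

```python
class LockTable:
    """Toy shared/exclusive lock table (hypothetical, for illustration)."""

    def __init__(self):
        self.readers = {}  # item -> set of transaction ids holding a read lock
        self.writer = {}   # item -> transaction id holding the write lock

    def acquire_read(self, txn, item):
        # A read lock is compatible with other readers, not with another writer.
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False
        self.readers.setdefault(item, set()).add(txn)
        return True

    def acquire_write(self, txn, item):
        # A write lock is compatible with no other reader or writer.
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False
        if self.readers.get(item, set()) - {txn}:
            return False
        self.writer[item] = txn  # also covers upgrading txn's own read lock
        return True

    def release_all(self, txn):
        # All locks are held to the end of the transaction, then dropped together.
        for holders in self.readers.values():
            holders.discard(txn)
        for item in [i for i, t in self.writer.items() if t == txn]:
            del self.writer[item]


locks = LockTable()
r1 = locks.acquire_read("T1", "x")         # first reader
r2 = locks.acquire_read("T2", "x")         # readers share
w2 = locks.acquire_write("T2", "x")        # refused: T1 still reads x
locks.release_all("T1")                    # T1 ends
w2_after = locks.acquire_write("T2", "x")  # upgrade now succeeds
r1_after = locks.acquire_read("T1", "x")   # refused: T2 writes x
print(r1, r2, w2, w2_after, r1_after)
```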

33 Serializability of transactions
The net effect is that the transactions that overlap in execution time appear to have run in some serial order
Transactions can be undone by throwing away the local store (conceptually, at least)
The write period at the end of the transaction must be atomic
The two phases:
    1. Request read, write, and upgrade locks (and wait on locks) and process
    2. Release locks and move updates to the DB
There is a notion of “serializability”, which means that the actual schedule of executed steps corresponds to some serial order of running the transactions
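Undoing a transaction can be tried out directly. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption); SQLite implements rollback with a journal rather than a per-transaction workspace, so this shows the visible effect, not the mechanism described above. The accounts table is hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / ROLLBACK / COMMIT statements below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")

def balances():
    return conn.execute("SELECT balance FROM accounts ORDER BY name").fetchall()

def transfer():
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'a'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'b'")

conn.execute("BEGIN")
transfer()
inside = balances()          # the transaction sees its own changes
conn.execute("ROLLBACK")
after_rollback = balances()  # both updates are undone together

conn.execute("BEGIN")
transfer()
conn.execute("COMMIT")
after_commit = balances()    # both updates become permanent together
print(inside, after_rollback, after_commit)
```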

34 Interesting facts on transactions…
Various legal schedules might produce different results
A crash during phase two can leave the database inconsistent
The transaction manager spends a lot of overhead resources handling locks
We still need to be able to roll the database back and rerun transaction logs
The user must control the nature of overlapping transactions, or there might be very little true concurrency
In a distributed database, the lock manager is a bottleneck because all components of the database must move in lockstep
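The first point, that different legal schedules can give different results, can be seen with two toy transactions. This is plain Python arithmetic on a single hypothetical item x, not a database run.

```python
def t1(x):
    """Hypothetical transaction T1: add 100 to x."""
    return x + 100

def t2(x):
    """Hypothetical transaction T2: double x."""
    return x * 2

start = 50

# Both serial orders are legal, yet they end in different states.
t1_then_t2 = t2(t1(start))
t2_then_t1 = t1(t2(start))
serial_results = {t1_then_t2, t2_then_t1}

# A non-serializable interleaving: both read x = 50, T1 writes 150,
# then T2 overwrites it with its own result computed from the stale read.
lost_update = t2(start)
print(t1_then_t2, t2_then_t1, lost_update)
```

Serializability only promises that the outcome matches one of the serial orders; it does not say which one. The lost-update outcome matches neither, which is what locking is meant to rule out.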

35 Updating data: changing the DB state
INSERT INTO invoices VALUES
    (115, 97, '456789', ' ', , 0, 0, 1, ' ', NULL);

INSERT INTO invoices VALUES
    (116, 97, '456701', ' ', , 0, 0, 1, ' ', NULL),
    (117, 97, '456791', ' ', , 0, 0, 1, ' ', NULL),
    (118, 97, '456792', ' ', , 0, 0, 1, ' ', NULL);

36 Updating tables, continued
USE ex;

INSERT INTO color_sample (color_number)
VALUES (606);

INSERT INTO color_sample (color_name)
VALUES ('Yellow');

INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, 'Orange');

INSERT INTO color_sample
VALUES (DEFAULT, 808, NULL);

INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, NULL);

37 Updating tables, continued
INSERT INTO invoice_archive
SELECT *
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;

INSERT INTO invoice_archive
    (invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
    payment_total, terms_id, invoice_date, invoice_due_date)
SELECT invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
    payment_total, terms_id, invoice_date, invoice_due_date
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;
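A minimal sketch of INSERT with a SELECT as its row source, copying fully paid invoices into an archive table. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption); the tables are cut down to four hypothetical columns, with integer amounts to avoid rounding issues in the equality test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = (
    "(invoice_id INTEGER, invoice_total INTEGER, "
    "payment_total INTEGER, credit_total INTEGER)"
)
conn.execute("CREATE TABLE invoices " + columns)
conn.execute("CREATE TABLE invoice_archive " + columns)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, 100, 100, 0),  # paid in full
        (2, 200, 50, 0),   # balance still due
        (3, 80, 60, 20),   # paid off by payment plus credit
    ],
)

# Every row the SELECT returns becomes a new row in invoice_archive.
conn.execute(
    "INSERT INTO invoice_archive "
    "SELECT * FROM invoices "
    "WHERE invoice_total - payment_total - credit_total = 0"
)
archived = conn.execute(
    "SELECT invoice_id FROM invoice_archive ORDER BY invoice_id"
).fetchall()
print(archived)
```

SELECT * relies on both tables having the same columns in the same order; naming the columns explicitly removes that dependency.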

38 Updating, continued
UPDATE invoices
SET payment_date = ' ', payment_total =
WHERE invoice_number = '97/522';

UPDATE invoices
SET terms_id = 1
WHERE vendor_id = 95;

UPDATE invoices
SET credit_total = credit_total + 100

39 Updating, continued
DELETE FROM general_ledger_accounts
WHERE account_number = 306;

DELETE FROM invoice_line_items
WHERE invoice_id = 78 AND invoice_sequence = 2;

DELETE FROM invoice_line_items
WHERE invoice_id = 12;

DELETE FROM invoice_line_items
WHERE invoice_id IN
    (SELECT invoice_id FROM invoices WHERE vendor_id = 115);
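A minimal sketch of a DELETE whose WHERE clause uses a subquery, removing the line items of every invoice that belongs to one vendor. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, vendor_id INTEGER)")
conn.execute(
    "CREATE TABLE invoice_line_items (invoice_id INTEGER, invoice_sequence INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(12, 115), (13, 115), (14, 90)]
)
conn.executemany(
    "INSERT INTO invoice_line_items VALUES (?, ?)",
    [(12, 1), (12, 2), (13, 1), (14, 1)],
)

# The subquery supplies the invoice ids (12 and 13) that belong to vendor 115.
cursor = conn.execute(
    "DELETE FROM invoice_line_items "
    "WHERE invoice_id IN (SELECT invoice_id FROM invoices WHERE vendor_id = 115)"
)
deleted_count = cursor.rowcount
remaining = conn.execute(
    "SELECT invoice_id, invoice_sequence FROM invoice_line_items"
).fetchall()
print(deleted_count, remaining)
```

Running the subquery's SELECT by itself first is a cheap way to preview which rows a DELETE will remove.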

