Introduction to SQL Structured Query Language Martin Egerhill.

Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Disposition Databases and their components What is SQL? The basics JOIN-predicate logic Advanced functionality Subqueries Regular expressions Inserting and updating data GCP – Good Coding Practice

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Databases and their components Databases are containers for data, relational databases have connections between data points. Database Management Systems (DBMS) handle interactions with the database. Tables are structured arrangements of specific data Schemas define the structures and parameters of the tables. Columns are single fields in a table containing specific information (columns have defined data types). Data types define the content format. Rows represent the records in the table and contain the information for the record in the respective column. Keys define the relational logic.

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 What is SQL? Database communication language with very few commands. Pros: Intuitively constructed syntax (case- and break-insensitive). Widely supported. Supports both data retrieval and manipulation Fast Cons: Several dialects Lack of advanced data manipulation Can be slow if not optimized

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 The basics SELECT-statement –Defines your output data. Rename columns with “AS”. FROM-statement –Defines your data sources and relations. WHERE-clause –Defines dependencies and restrictions on selection. ORDER BY-statement –Defines the sort order (columns delimited by “,”). Can be either ascending (asc) or descendng (desc). Different columns can have different directions! GROUP BY-statement –Used to group calculated columns, for example counts, by data level. Can have multiple levels (delimited by “,”). HAVING-clause –Same functionality as the WHERE-clause but for groups.

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 JOIN-predicate logic There are two main types of joins; inner and outer. There are two ways of writing inner joins: Explicit: Full specification of predicate using JOIN and ON keywords. Implicit: Simply states columns to be joined in the WHERE-clause.

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 JOIN-predicate logic (continued) Different types of inner joins are: Equi-join: Uses equality between columns as comparator. For columns with the same name the shorthand USING (column name) can be used (not supported by SQL server and Sybase). Natural join: Not recommended as it implicitly joins columns with the same name from both tables. Cross join: Returns the Cartesian product of the two tables joined (rows which combine each row from the first table with each row from the second table).

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 JOIN-predicate logic (continued) Different types of outer joins are: Left (outer) join: Preserves all the records in the “left” table while finding matches in the “right” table. Right (outer) join: Preserves all the records in the “right” table while finding matches in the “left” table. Full (outer) join: Retrieves all records regardless of match (combines the effects of both the left and right outer joins). Note: No implicit join-notation for outer joins exists in standard SQL. FROM MASTER A LEFT JOIN LM_SEQ_LENGTH B ON A.SEQ_LENGTH_ID = B.SEQ_LENGTH_ID

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Advanced functionality Commonly used functions in the SELECT-statement: COUNT(column): counts the number of rows (not NULL values). Should be combined with GROUP BY for stratification. DISTINCT: Selects only unique values. Used on the first selected column if not nested; COUNT(DISTINCT column). SUM(column): Returns the total of values in a column. MIN(column), MAX(column): returns the min/max value. CASE: Converts the returned data using logic: NVL: Converts NULL values, other values are left untouched: DECODE: Specifies conversion of (multiple) string(s): CASE WHEN x=w1 THEN r1 WHEN x=w2 THEN r2 ELSE r3 END NVL(column, replacement_value) DECODE(column [, string, match], default)

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Advanced functionality (continued) Commonly used functions in the WHERE-clause: LIKE: Matches string in desired column. The % is used as wildcard: EXISTS: Boolean operator used together with subqueries. Is negated with NOT: IN: Used to specify several equality conditions (limited to 1000): AND, OR: Standard logical operators to combine conditions. BETWEEN: Selects a range of data between two values. This argument can behave differently depending on database! WHERE country_name LIKE ‘%JAPAN%’ WHERE (NOT) EXISTS (SELECT * FROM master) WHERE country_id IN (63856, 37465, 98364) WHERE country_id BETWEEN 63856 AND 98364

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Subqueries Subqueries are used to compile a subset of data instead of actually creating a new table. The “virtual” subset-table can then be referenced. This can be especially useful when many operands and longer conditions are needed or when calculated conditions are applied. It is not recommended to use more than 2 nested layers. SELECT CASE WHEN A.LINEAGE_NAME IN (SELECT LINEAGE_NAME FROM (SELECT DISTINCT LINEAGE_NAME FROM HOSTS_AND_SITES WHERE COUNTRY_REGION_ID = 133)) THEN 1 END AS HAWAI FROM master A...

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Regular expressions Can be used with SQLite3 Requires third party installation Requires session initialization Syntax is slightly different than Perl

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Inserting and updating data Commonly used functions in the WHERE-clause: INSERT: Inserts specified data into specified table and columns: UPDATE: Will update existing values with possible condition: DELETE: Used to remove specified entries from specified table: There are other modifiers like CREATE, ALTER and DROP but that will not be covered in this presentation. INSERT INTO enterprise (col1, col2, col3, col4) VALUES (‘James T.’, ‘Kirk’, ‘Captain’, 1) UPDATE enterprise SET col1 = ‘Jean-Luc’, col2 = ‘Picard’ WHERE col4 = 1 DELETE FROM enterprise WHERE col3 = ‘Redshirt’

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 GCP – Good Coding Practice Use descriptive table and column names. Use descriptive table aliases. Avoid arbitrary column names or identical column names. Comment your code if necessary. Don’t mix joins if not needed (use explicit outer joins). Keep table relations in the FROM-clause. Use Primary Keys, Secondary Keys and Foreign Keys. Use list maintenance tables. Do not overuse subqueries. Split the code up if too complicated. Always try to avoid writing to the database when retrieving data.

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 For further reading visit http://www.w3schools.com/sql/ http://www.w3schools.com/sql/

Introduction to SQL Structured Query Language Martin Egerhill.

Similar presentations

Presentation on theme: "Introduction to SQL Structured Query Language Martin Egerhill."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Introduction to SQL Structured Query Language Martin Egerhill.

Similar presentations

Presentation on theme: "Introduction to SQL Structured Query Language Martin Egerhill."— Presentation transcript:

Similar presentations

About project

Feedback