SQL Queries and CRS Plus, Part 1. Jennifer Seiffert, Northrop Grumman, under contract to CDC's NPCR. For Registry Plus Users Group, August 21, 2008.
What is SQL? Structured Query Language, for working with databases to manage and retrieve data. Pronounced "sequel" or "S-Q-L". An ANSI/ISO standard, but with proprietary extensions for different DBMSs. SQL Server is Microsoft's database management system using the SQL language.
SQL Queries Queries are used to retrieve data or information about the data. Queries are a type of SQL Statement, the SELECT statement. Statements begin with a keyword, in this case the keyword is SELECT. Keywords are in red in these slides.
SQL Query Interface in CRS Plus (3) Click here to save your query or your results.
SQL Query Interface in CRS Plus (4) When focus is on the query entry area, you can save your query so you can re-run it in the future.
SQL Query Interface in CRS Plus (5) When focus is on the results area, you can save your results as a file in one of several forms and use the results in another program. Default is comma-delimited file.
CRS Plus Data Types SQL data types are listed in the Control Table. Almost all fields in CRS Plus tables are of type nVarchar (character strings of varying length). There are exceptions for some system fields. Strings in SQL statements must be enclosed in single quotes.
Model CRS Plus Tables

PatientID  LastName  SSN        Sex
0001       Smith     234567839  2
0002       Jones     123456789  1

MedRefID  PatientID  AgeDx  PSite  DxCounty
0001                 62     C509   013
0408      0002       68     C619   999
0025      0002       71     C349   041

Tables are linked by including keys from other tables. Patient 0002 has two tumors. Column name is FieldName from Control table.
SELECT (1) The SELECT statement is a description of the set of data to be returned by the query. The SELECT keyword is followed by a list of the columns from a specified table to be returned in the set, followed by FROM clause specifying source table(s). Symbol * selects all columns in a table.
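A minimal sketch of the two SELECT forms described above, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The table layout and rows are hypothetical, loosely modeled on the model tables in these slides.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (PatientID TEXT, LastName TEXT, Sex TEXT)")
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [("0001", "Smith", "2"), ("0002", "Jones", "1")],
)

# Named columns: only the listed columns come back, in the listed order.
named = conn.execute(
    "SELECT Patients.PatientID, Patients.LastName "
    "FROM Patients ORDER BY Patients.PatientID"
).fetchall()

# Asterisk: every column in the table comes back.
everything = conn.execute(
    "SELECT * FROM Patients ORDER BY PatientID"
).fetchall()

print(named)
print(everything)
```

The ORDER BY is included only so the row order is predictable in this sketch.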
SELECT (2) Two ways to specify columns:
–Use FieldName from Control Table in CRS Plus. Example: TypeRepSrc
–Use format TableName.FieldName. Example: MedicalSum_2.TypeRepSrc
–The second method is required for joins, to distinguish fields with the same name in different tables, so it is good practice to use it everywhere.
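A short sketch of why the TableName.FieldName form matters in a join, again assuming SQLite (through Python's sqlite3) in place of SQL Server, with made-up rows. Both tables carry a PatientID column, so the bare name cannot be resolved; the exact error text differs between database systems.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (PatientID TEXT, LastName TEXT)")
conn.execute("CREATE TABLE MedicalSum (MedRefID TEXT, PatientID TEXT, PSite TEXT)")
conn.execute("INSERT INTO Patients VALUES ('0002', 'Jones')")
conn.execute("INSERT INTO MedicalSum VALUES ('0408', '0002', 'C619')")

join = "FROM Patients, MedicalSum WHERE Patients.PatientID = MedicalSum.PatientID"

# Bare PatientID exists in both tables, so the database rejects the query.
try:
    conn.execute("SELECT PatientID, PSite " + join)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualifying each column with its table name removes the ambiguity.
qualified = conn.execute(
    "SELECT Patients.PatientID, MedicalSum.PSite " + join
).fetchall()

print(ambiguous_error)
print(qualified)
```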
SELECT (3) In standard SQL, all statements, including SELECT statements, must end with a semicolon (;). However, SQL Server will accept statements without a semicolon. SELECT statements are safe to run. They cannot modify or damage your database. But they can give a misleading answer if done incorrectly.
Not Case-Sensitive SQL is not case-sensitive. SELECT, select, and Select mean the same thing. It is a tradition to write SQL's own words in uppercase to distinguish SQL instructions from other words used in queries. Using all caps for keywords is good practice.
Registry Data Tables in CRS Plus Patients, Patients_2, MedicalSum, MedicalSum_2, Abstracts, Abstracts_2, Abstracts_3. Tables are split for performance and because of field number limits. See Saba Yemane's handouts (or Control Table) to map fields to tables.
Joining Tables Retrieving data in columns from more than one table requires a join of the tables.
–Example: LastName from Patients table and PSite from MedicalSum table
Joins require a field and its value to be the same in the two tables being joined.
Joins
General form:
WHERE Table1.FieldA = Table2.FieldA
  AND Table1.FieldA = Table3.FieldA
  AND Table3.FieldB = Table4.FieldB
CRS Plus example:
WHERE (Patients.PatientID = Patients_2.PatientID
  AND Patients.PatientID = MedicalSum.PatientID
  AND MedicalSum_2.MedRefID = MedicalSum.MedRefID)
Model Query
SELECT Patients.BirthDate, Patients.LastName, MedicalSum_2.RegNodExam
FROM Patients, MedicalSum_2, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID
  AND MedicalSum.MedRefID = MedicalSum_2.MedRefID
  AND Patients.BirthDate > '19400100'
ORDER BY Patients.BirthDate
SELECT clause: column names separated by commas
FROM clause: table names separated by commas
WHERE clause: filter criteria with joins and logical operators
ORDER BY clause: sort order
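The model query can be tried against a tiny stand-in database. This is a sketch only: it assumes SQLite (through Python's sqlite3) rather than SQL Server, reduces each table to the columns the query touches, and uses made-up rows.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables are cut down to the
# columns used by the model query, and every row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (PatientID TEXT, LastName TEXT, BirthDate TEXT)")
conn.execute("CREATE TABLE MedicalSum (MedRefID TEXT, PatientID TEXT)")
conn.execute("CREATE TABLE MedicalSum_2 (MedRefID TEXT, RegNodExam TEXT)")
conn.executemany("INSERT INTO Patients VALUES (?, ?, ?)", [
    ("0001", "Smith", "19380512"),
    ("0002", "Jones", "19451130"),
    ("0003", "Lee", "19410203"),
])
conn.executemany("INSERT INTO MedicalSum VALUES (?, ?)", [
    ("0001", "0001"), ("0408", "0002"), ("0025", "0002"), ("0300", "0003"),
])
conn.executemany("INSERT INTO MedicalSum_2 VALUES (?, ?)", [
    ("0001", "02"), ("0408", "00"), ("0025", "12"), ("0300", "98"),
])

model_query = """
SELECT Patients.BirthDate, Patients.LastName, MedicalSum_2.RegNodExam
FROM Patients, MedicalSum_2, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID
  AND MedicalSum.MedRefID = MedicalSum_2.MedRefID
  AND Patients.BirthDate > '19400100'
ORDER BY Patients.BirthDate
"""
rows = conn.execute(model_query).fetchall()
for row in rows:
    print(row)
```

Smith is filtered out by the BirthDate condition, and Jones appears twice because the join returns one row per tumor, not one row per patient.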
Exclude Abstracts by Status (1) Status is an attribute of abstracts, not of consolidated data. When querying the Abstracts tables, you may want to exclude Voided and Pending abstracts. See Sanjeev's Status Codes handout.
Exclude Abstracts by Status (2) Two versions of syntax to negate a condition:
–WHERE ... AND NOT (Status1 = 97 OR Status1 = 99)
–WHERE ... AND (Status1 <> 97 AND Status1 <> 99)
Status1 and Status2 are system variables with data type TinyInt, so no quotes are used around the values. Be careful with your use of AND and OR!
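The sketch below checks that the two negation forms select the same rows, and shows what happens when OR is used where AND belongs. It assumes SQLite (through Python's sqlite3) in place of SQL Server; the AbsRefID column name and the status values other than 97 and 99 are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. AbsRefID and the status
# values 0 and 5 are hypothetical; only 97 and 99 come from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Abstracts (AbsRefID INTEGER, Status1 INTEGER)")
conn.executemany(
    "INSERT INTO Abstracts VALUES (?, ?)",
    [(1, 0), (2, 97), (3, 99), (4, 5)],
)

def ids(where):
    """Return the AbsRefID values that satisfy the given WHERE condition."""
    sql = "SELECT AbsRefID FROM Abstracts WHERE " + where + " ORDER BY AbsRefID"
    return [row[0] for row in conn.execute(sql)]

not_or = ids("NOT (Status1 = 97 OR Status1 = 99)")
and_ne = ids("(Status1 <> 97 AND Status1 <> 99)")

# Common mistake: every value differs from 97 or from 99, so nothing is excluded.
wrong_or = ids("(Status1 <> 97 OR Status1 <> 99)")

print(not_or, and_ne, wrong_or)
```

Both correct forms drop the abstracts with status 97 and 99; the OR version returns every row because no single value can equal both 97 and 99.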
Nulls NULL means that data are not defined, but this is not the same as blank or unknown. Use of nulls complicates data retrieval with SQL, requiring 3-value logic (True, False, or Unknown). Nulls are not used in the CRS Plus database, so you can safely ignore this complication.
Parentheses Use them in complex conditional expressions. They aren't always necessary for correct results, but you need to really know what you're doing to know when it's safe to omit them. They also add clarity for reading.
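To illustrate why parentheses matter: in SQL, AND binds more tightly than OR, so a mixed condition without parentheses may not group the way it reads. A small sketch, assuming SQLite (through Python's sqlite3) as a stand-in for SQL Server, with made-up rows:

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedicalSum (MedRefID TEXT, DxCounty TEXT, PSite TEXT)")
conn.executemany("INSERT INTO MedicalSum VALUES (?, ?, ?)", [
    ("0001", "085", "C509"),
    ("0002", "085", "C619"),
    ("0003", "013", "C619"),
    ("0004", "085", "C349"),
])

def med_ref_ids(where):
    """Return the MedRefID values that satisfy the given WHERE condition."""
    sql = "SELECT MedRefID FROM MedicalSum WHERE " + where + " ORDER BY MedRefID"
    return [row[0] for row in conn.execute(sql)]

# Evaluated as (DxCounty = '085' AND PSite = 'C509') OR PSite = 'C619',
# so the C619 case from county 013 slips in.
without_parens = med_ref_ids(
    "DxCounty = '085' AND PSite = 'C509' OR PSite = 'C619'"
)

# Parentheses force the intended grouping: county 085 and either site.
with_parens = med_ref_ids(
    "DxCounty = '085' AND (PSite = 'C509' OR PSite = 'C619')"
)

print(without_parens)
print(with_parens)
```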
Using Site Group Standard site analysis categories used by SEER, NPCR, and NAACCR. Codes are 5 characters. For codes, see:
–NAACCR vol. III, section 22.214.171.124, Table 3, p. 50
–SEER Web site: http://seer.cancer.gov/siterecode/icdo3_d01272003/
3 levels of codes: Detail, Subaggregate, and Aggregate
–GrpSite3, detailed code. Example: 35011 = acute lymphocytic leukemia
–GrpSite2, subaggregate. Example: 35010 = acute leukemias
–GrpSite1, aggregate. Example: 35000 = all leukemias
GRPSite Codes in CRS Plus (1) Three fields in MedicalSum table:
–GrpSite1: Aggregate. Example: 35000, Leukemias
–GrpSite2: Subaggregate. Example: 35010, Acute leukemias
–GrpSite3: Detail. Example: 35011, Acute lymphocytic leukemia
GRPSite Codes in CRS Plus (2) Stored when tumor records are created or updated Can also be computed on demand on all tumor records by selecting Batch Update SEER Codes from the Administration menu
How to Formulate a Query (1) See handout: SQL Queries and CRS Plus Sample Query.doc. First, formulate your question in English. Example: I want a list of childhood leukemia cases in Kosciusko County residents, diagnosed in 2006, listed in order by age.
How to Formulate a Query (2) Identify the data items needed to identify the cases, and their tables. Example: For selection, I can use GrpSite1 (the aggregate code), DxCounty, DxDate, and AgeDx, all from MedicalSum table.
How to Formulate a Query (3) Identify the data items you want on the results list, and their tables. Example: For display, I want Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxCounty, MedicalSum.DxDate, and MedicalSum.GrpSite1. TIP: Always display the items you are filtering on as a check.
Caution! County codes are not unique. They are duplicated across states. To select a specific county, select on both state and county codes. For Kosciusko County, Indiana, I will need to select State = IN and County = 085.
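A brief sketch of the caution above, assuming SQLite (through Python's sqlite3) in place of SQL Server. The rows are made up so that county code 085 occurs in two states.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the rows are made up so
# that county code 085 appears under two different states.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedicalSum (MedRefID TEXT, DxState TEXT, DxCounty TEXT)")
conn.executemany("INSERT INTO MedicalSum VALUES (?, ?, ?)", [
    ("0001", "IN", "085"),
    ("0002", "OH", "085"),
    ("0003", "IN", "013"),
])

# County code alone matches cases from both states.
county_only = conn.execute(
    "SELECT MedRefID, DxState FROM MedicalSum "
    "WHERE DxCounty = '085' ORDER BY MedRefID"
).fetchall()

# State plus county pins down a single county.
state_and_county = conn.execute(
    "SELECT MedRefID, DxState FROM MedicalSum "
    "WHERE DxState = 'IN' AND DxCounty = '085' ORDER BY MedRefID"
).fetchall()

print(county_only)
print(state_and_county)
```

The codes are written as quoted strings, which also preserves the leading zero in 085.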
How to Formulate a Query (4a) Construct first part of query:
SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState, MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
--Continued
--(The 2 hyphens signal a comment.)
How to Formulate a Query (4b) FROM Patients, MedicalSum --Continued
How to Formulate a Query (5) Determine codes to use in WHERE clause. Example:
–AgeDx less than 16 (15 years and younger)
–GrpSite1 = 35000 (leukemia group)
–DxState = IN
–DxCounty = 085 (Kosciusko)
–Year of Dx = 2006
How to Formulate a Query (6a) Construct WHERE clause with joins and filters. Example:
WHERE Patients.PatientID = MedicalSum.PatientID AND
--Continued
How to Formulate a Query (6b)
MedicalSum.AgeDx < '016' AND
SUBSTRING(MedicalSum.DxDate, 1, 4) = '2006' AND
--This is SQL Server's substring function syntax, beginning in column 1 of the field for a length of 4 columns.
MedicalSum.GrpSite1 = '35000' AND
--Continued
How to Formulate a Query (6c)
DxState = 'IN' AND DxCounty = '085'
--End of WHERE clause
How to Formulate a Query (7) Construct ORDER BY clause. Example:
ORDER BY AgeDx;
--Note the terminal semicolon indicating the end of the entire SQL SELECT statement.
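Putting steps 4 through 7 together, the complete query can be tried against a small stand-in database. This is a sketch under stated assumptions: SQLite (through Python's sqlite3) replaces SQL Server, so its substr function is used in place of SUBSTRING; only Patients and MedicalSum, the two tables in the FROM clause, are joined; the character-field codes are written as quoted strings; and every row, including the site and histology values, is made up.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (PatientID TEXT, LastName TEXT)")
conn.execute(
    "CREATE TABLE MedicalSum (MedRefID TEXT, PatientID TEXT, AgeDx TEXT, "
    "PSite TEXT, HistTypeICDO3 TEXT, DxState TEXT, DxCounty TEXT, "
    "DxDate TEXT, GrpSite1 TEXT)"
)
conn.executemany("INSERT INTO Patients VALUES (?, ?)", [
    ("0001", "Smith"), ("0002", "Jones"), ("0003", "Lee"),
    ("0004", "Park"), ("0005", "Diaz"),
])
conn.executemany("INSERT INTO MedicalSum VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("0100", "0001", "012", "C421", "9835", "IN", "085", "20060315", "35000"),
    ("0101", "0002", "004", "C421", "9835", "IN", "085", "20061101", "35000"),
    ("0102", "0003", "045", "C421", "9835", "IN", "085", "20060620", "35000"),  # adult
    ("0103", "0004", "007", "C421", "9835", "OH", "085", "20060210", "35000"),  # other state
    ("0104", "0005", "009", "C421", "9835", "IN", "085", "20050930", "35000"),  # other year
])

query = """
SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx,
       MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState,
       MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
FROM Patients, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID AND
      MedicalSum.AgeDx < '016' AND
      substr(MedicalSum.DxDate, 1, 4) = '2006' AND  -- SUBSTRING(...) in SQL Server
      MedicalSum.GrpSite1 = '35000' AND
      DxState = 'IN' AND DxCounty = '085'
ORDER BY AgeDx;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the filter columns are displayed in the result, each returned row can be checked against the selection criteria, as the tip in step 3 suggests. The comparison AgeDx < '016' is a string comparison here, which works only if ages are stored zero-padded to a fixed width, as assumed in these rows.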
Preview of Part 2
–Using COUNT
–Using AS for alias names
–Converting data types for calculations
–Subqueries (nested queries)
–More complex conditional selections and joins
Request time! Send your query requests and we'll build a library on the Web site.
The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.