Indexing & Computational Efficiency

Organized by Farrokh Alemi, Ph.D. Narrated by Yara Alemi. This section provides a brief introduction to selected methods of making SQL code computationally more efficient.

Computational Efficiency
There are a number of steps to make SQL code run more efficiently.

Computers Don’t Get Tired
Computers do not get tired. They can do the same task repeatedly. They don’t need to rest, and they don’t get bored after millions of repetitions. Yet all these repetitions and data manipulations take time. Processing one record may take a nanosecond; doing so repeatedly for millions of records takes much longer.

When analyzing massive data, some SQL code may take hours or days to run. You would start your code, go to bed, and come back to see the results. Because an error in the code can force you to redo the entire task, code that takes a long time to run is frustrating. Steps need to be taken to make the code more efficient. We talk about some of these steps in these slides.

Parallel Processing
One approach to making code more efficient is to split computer tasks across several machines that work in parallel.

sp_spaceused [ [ @objname = ] 'objname' ] [, [ @updateusage = ] 'updateusage' ] [, [ @mode = ] 'mode' ] [, [ @oneresultset = ] oneresultset ] [, [ @include_total_xtp_storage = ] include_total_xtp_storage ]
This is the syntax of the sp_spaceused stored procedure, which reports the number of rows and the disk space used by a table or database. It measures the size of your data; it does not itself set up parallel processing. Configuring parallel processing is not something we talk about in these slides, and you are encouraged to read about it on your own. You should be aware that if you are working with big data, this feature is available to you. Code runs faster when there is less input and output of data and more run-time memory for the computations needed. Allocation of space to run-time memory can also be done in SQL code. Again, this is beyond the scope of these slides.

Reduce Data
A key method is to reduce the data so that your code can run faster.

This can be done by sampling or by eliminating unneeded data. Keep in mind that WHERE clauses are processed before GROUP BY, so the more rows you filter out with WHERE, the less data is left to group. Clean the data, remove duplicates, and take every step you can to remove unneeded cases before processing massive data. Do this before you do anything else.
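Sampling can be sketched as follows. This is only an illustration under stated assumptions: the 10 percent rate and the temporary table name #SampleDx are hypothetical choices, and CHECKSUM is used here simply as one convenient way to keep or drop all of a patient's rows together.

```sql
-- A minimal sketch, assuming a 10 percent sample of patients is enough
-- for a trial run of the analysis.
-- CHECKSUM(id) maps each id to an integer, so every row for the same
-- patient is either kept or dropped together.
SELECT id, icd9, AgeAtDx
INTO #SampleDx                 -- hypothetical temporary table name
FROM [AgeDx].[dbo].[final]
WHERE CHECKSUM(id) % 10 = 0;   -- keeps roughly one patient in ten
```

Testing the code on the small sample first means that an error is found in minutes rather than after an overnight run.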

SELECT id, icd9, AgeAtDx
-- Other calculations
FROM [AgeDx].[dbo].[final]
GROUP BY id, icd9, AgeAtDx
Here we see the use of GROUP BY to remove duplicated diagnoses for the same patient at the same age. Always remove any duplicates or unneeded cases before proceeding.

SELECT icd9
-- Calculate likelihood ratio here
FROM [AgeDx].[dbo].[final]
WHERE AgeAtDx < AgeAtDiabetes
Here we see the use of the WHERE clause to keep only diagnoses that occur before diabetes and remove all others.

Figure: Desired Results versus Actual Results
A second way to make the code faster is to check that no joins create duplicates. Always trace the size of your data and make sure that no additional records are added by the join. Joins can radically increase the size of your data. This occurs when there are duplicates in the variables used to join the two tables.
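One way to trace the size of the data around a join is sketched below. The second table, [AgeDx].[dbo].[patients], is a hypothetical name introduced only for illustration; the slides do not say which tables are joined.

```sql
-- Rows before the join
SELECT COUNT(*) AS RowsBefore
FROM [AgeDx].[dbo].[final];

-- Join-key values that appear more than once in the hypothetical
-- patients table; each one multiplies the matching rows in the join
SELECT id, COUNT(*) AS Copies
FROM [AgeDx].[dbo].[patients]
GROUP BY id
HAVING COUNT(*) > 1;

-- Rows after the join; with a LEFT JOIN this should equal RowsBefore
-- when the join key is unique in the patients table
SELECT COUNT(*) AS RowsAfter
FROM [AgeDx].[dbo].[final] AS f
LEFT JOIN [AgeDx].[dbo].[patients] AS p
    ON p.id = f.id;
```

If RowsAfter is larger than RowsBefore, the second query shows which key values are responsible.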

Indexing
Indexing is a method of improving computational efficiency. An index creates pointers to the data so that the computer knows where the next record is and does not need to search for it.

CREATE UNIQUE INDEX index_name ON table_name (column1, column2, ...);
This syntax creates a unique index on the specified columns. The syntax includes a name for the index, a name for the table, and the column names that compose the index.

Example
Here is example code for indexing.

CREATE UNIQUE INDEX Dx ON dbo.final (id, icd9, AgeAtDx);
This code uses the data from dbo.final and the three columns id, icd9, and AgeAtDx. It assumes that we have previously cleaned the data and that there are no duplicates across these three columns: a patient can have a given diagnosis only once at a specific age.
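The no-duplicates assumption can be checked before the index is created. The query below is a minimal sketch of such a check; the column alias Copies is an illustrative name.

```sql
-- List every (id, icd9, AgeAtDx) combination that occurs more than once.
-- If this returns any rows, CREATE UNIQUE INDEX on these columns will
-- fail until the duplicates are removed.
SELECT id, icd9, AgeAtDx, COUNT(*) AS Copies
FROM dbo.final
GROUP BY id, icd9, AgeAtDx
HAVING COUNT(*) > 1;
```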

Indexing can save time
Cursors and do-while loops repeatedly execute the same SQL code, so the time an index saves on each execution adds up across the repetitions.
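A loop of this kind might look like the sketch below. The index name AgeIdx, the age range of 0 to 100, and the assumption that AgeAtDx holds whole numbers are all hypothetical; whether the database actually uses the index is decided by its query optimizer.

```sql
-- Hypothetical non-unique index on the column the loop searches
CREATE INDEX AgeIdx ON dbo.final (AgeAtDx);

DECLARE @age INT = 0;

-- The same query runs once for each age, so any time saved per lookup
-- is saved on every pass through the loop
WHILE @age <= 100              -- assumed age range
BEGIN
    SELECT @age AS AgeAtDx, COUNT(*) AS Diagnoses
    FROM dbo.final
    WHERE AgeAtDx = @age;

    SET @age = @age + 1;
END;
```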
