Know What Your Code Is Doing!!


1 Know What Your Code Is Doing!!
Presentation by Kevin G. Boles

2 Who Am I??
Kevin G. Boles
SQL Server Consultant Extraordinaire
“The Wolfe” for the Relational Engine
Indicium Resources, Inc.
AIM/Twitter: TheSQLGuru
GTalk/GMail:
MVP, MCT, MCITP, yada-yada-yada

3 What Do I Do?
World-class Better-than-average relational engine expert
Almost 45,000 hours invested in SQL Server
Absolutely LOVE doing targeted performance analysis and tuning work, and designing and building high-scale, highly-performing database applications
VERY good at making others better at interacting with SQL Server
Couldn’t SSAS my way out of a paper bag … with both ends open!

4 GURUISM: Anything that allows developers to slap together code more quickly is inversely proportional to the performance and scalability you will get from that code!!

5 Entity Framework Is Such a Thing
Can be WAY more productive as a coder using it
Unfortunately, several traps that are easy to fall into have given it a reputation for performing poorly
Many of these problems center around database access, but not all
RTFM!!! Read The “Fine” Manual
Other ORMs have identical/similar issues
Disclaimer: I am NOT an EF/NHibernate/Pick-Your-Favorite-ORM Guru!!

6 Being Too Greedy with Rows
At its heart, Entity Framework is a way of exposing .NET objects without actually knowing their values, but then fetching / updating those values from the database behind the scenes when you need them. It’s important to be aware of when EF is going to hit the database – a process called materialization.

7 Being too greedy with Rows
Let’s say we have a database with an entity db.Schools. We might choose to write something like:
    string city = "New York";
    List<School> schools = db.Schools.ToList();
    List<School> newYorkSchools = schools.Where(s => s.City == city).ToList();
On line 2, when we call .ToList(), Entity Framework will go out to the database to materialize the entities, so that the application has access to the actual values of those objects, rather than just having an understanding of how to look them up from the database. It’s going to retrieve every row in that Schools table, then filter the list in .NET.

8 Being too greedy with Rows
NO WHERE CLAUSE!!

9 Being too greedy with Rows
LET SQL SERVER DO THE FILTERING!! We can do that either with ...
    List<School> newYorkSchools = db.Schools.Where(s => s.City == city).ToList();
... or even ...
    IQueryable<School> schools = db.Schools;
    List<School> newYorkSchools = schools.Where(s => s.City == city).ToList();

10 The ‘N+1 Select’ problem: Minimizing the trips to the DB
In our database, every Pupil belongs to a School, referencing the Schools table using a foreign key on the SchoolId column. Equivalently, in our EF model, the School object has a virtual property Pupils. We want to print a list of how many pupils attend each school:
    string city = "New York";
    List<School> schools = db.Schools.Where(s => s.City == city).ToList();
    var sb = new StringBuilder();
    foreach (var school in schools)
    {
        sb.Append(school.Name);
        sb.Append(": ");
        sb.Append(school.Pupils.Count);
        sb.Append(Environment.NewLine);
    }

11 The ‘N+1 Select’ problem: Minimizing the trips to the DB
If we look in ANTS at what happens when this code runs, we see a query run once to get a list of schools in New York, but another query is also run 500 times to fetch Pupil information.

12 The ‘N+1 Select’ problem: Minimizing the trips to the DB
This happens because BY DEFAULT, EF uses a loading strategy called Lazy Loading, where it doesn’t fetch any data associated with the virtual Pupils property on the School object when the first query is run. If you subsequently try to access data from one of the related Pupil objects, only then will it be retrieved from the database. Most of the time that’s a good idea, because otherwise any time you accessed a School object, EF would bring back all related Pupil data regardless of whether it was needed.
But in the example above, Entity Framework makes an initial request to retrieve the list of Schools, and then has to make a separate query for each of the 500 Schools returned to fetch the pupil data. This leads to the name “N+1 select problem”, because N plus 1 queries are executed, where N is the number of objects returned by the original query.
If you know that you’re definitely going to want the Pupil data, you’d be better off doing things differently, especially if you want it for a large number of School objects. This is particularly important if there is high latency between your application and the database server (“CLOUD” ANYONE?!?)
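The round-trip arithmetic can be sketched without a database at all. The following is only a toy simulation, not how EF implements its proxies: LazySchool, QueryCounter and LazyLoadingDemo are hypothetical names, and a Lazy<T> field stands in for the virtual Pupils property, bumping a counter each time a "query" would have been sent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts simulated database round trips (a stand-in for watching a profiler).
public class QueryCounter
{
    public int Count { get; set; }
}

// Toy school whose pupils are only "fetched" on first access,
// loosely mimicking a lazily loaded navigation property.
public class LazySchool
{
    private readonly Lazy<List<string>> pupils;

    public LazySchool(string name, Func<List<string>> loadPupils)
    {
        Name = name;
        pupils = new Lazy<List<string>>(loadPupils);
    }

    public string Name { get; }
    public List<string> Pupils => pupils.Value;
}

public static class LazyLoadingDemo
{
    // Simulates the initial query: one round trip, no pupil data loaded yet.
    public static List<LazySchool> LoadSchools(int n, QueryCounter counter)
    {
        counter.Count++; // the single query that returns the list of schools
        return Enumerable.Range(1, n)
            .Select(i => new LazySchool("School " + i, () =>
            {
                counter.Count++; // one more query, the first time this school's pupils are read
                return new List<string> { "Pupil A", "Pupil B" };
            }))
            .ToList();
    }

    // Reads Pupils.Count on every school, as the loop in the deck does.
    public static int QueriesWhenReadingAllPupilCounts(int n)
    {
        var counter = new QueryCounter();
        var total = 0;
        foreach (var school in LoadSchools(n, counter))
        {
            total += school.Pupils.Count; // triggers the simulated lazy load
        }
        return counter.Count;
    }
}
```

With 500 schools, QueriesWhenReadingAllPupilCounts(500) reports 501 simulated queries: one for the schools plus one per school.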

13 The ‘N+1 Select’ problem: Options
Use the Eager Loading data access strategy, which fetches the related data in a single query when you use an Include() statement.
    // The lambda overload of Include() needs: using System.Data.Entity;
    List<School> schools = db.Schools
        .Where(s => s.City == city)
        .Include(x => x.Pupils)
        .ToList();
But that gets them all. Make sure you really NEED them all. Isn’t it silly to get ALL rows from database just to count them in client?
Sproc? Other?
Key is DON’T GET DATA ITERATIVELY AND DON’T GET DATA YOU DON’T NEED!!
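One possible answer to the "Other?" above is to ask for the count inside the projection, so that only a name and a number per school come back. This is a minimal sketch rather than the deck's own code: the entity classes are hypothetical stand-ins, and SchoolPupilCount and SchoolReports are invented names. The method takes an IQueryable<School> so that db.Schools could be passed in; the expectation is that EF turns the Count() into part of a single SQL statement, which is worth confirming in a profiler.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the deck's entities.
public class Pupil
{
    public int PupilId { get; set; }
}

public class School
{
    public string Name { get; set; }
    public string City { get; set; }
    public virtual ICollection<Pupil> Pupils { get; set; } = new List<Pupil>();
}

// Hypothetical DTO carrying only what the report prints.
public class SchoolPupilCount
{
    public string Name { get; set; }
    public int PupilCount { get; set; }
}

public static class SchoolReports
{
    // Composes filter and count into one query; nothing runs until ToList().
    public static List<SchoolPupilCount> PupilCountsForCity(IQueryable<School> schools, string city)
    {
        return schools
            .Where(s => s.City == city)
            .Select(s => new SchoolPupilCount
            {
                Name = s.Name,
                PupilCount = s.Pupils.Count() // counted by the query provider, not in a client-side loop
            })
            .ToList();
    }
}
```

Because the method depends only on IQueryable<School>, it can also be exercised against an in-memory list via AsQueryable(), which is handy for unit tests but says nothing about the SQL that a real provider generates.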

14 Being too greedy with Columns
Let’s say we want to print the name of every pupil at a certain SchoolId:
    int schoolId = 1;
    List<Pupil> pupils = db.Pupils
        .Where(p => p.SchoolId == schoolId)
        .ToList();
    foreach (var pupil in pupils)
    {
        textBox_Output.Text += pupil.FirstName + " " + pupil.LastName;
        textBox_Output.Text += Environment.NewLine;
    }

15 But a LOT more data than the first and last names (FirstName and LastName) has been retrieved

16 Being too greedy with Columns
The problem here is that, at the point when the query is run, EF has no idea what properties you might want to read, so its only option is to retrieve all of an entity’s properties, i.e. every column in the table. That causes two problems:
1. We’re transferring more data than necessary. This impacts EVERYTHING in the application stack - from SQL Server I/O, CPU, RAM, network, right through to memory usage in our client application. EVERYTHING is retrieved, jpegs, XML, comments, etc!
2. By selecting every column (effectively running a “Select * From…” query), we make it almost impossible to index the database usefully. Covering indexes are not available as a tuning mechanism.

17 Being too greedy with Columns - Options
Fortunately we can tell Entity Framework to select only certain specific columns. We can either select into an anonymous type:
    var pupils = db.Pupils
        .Where(p => p.SchoolId == schoolId)
        .Select(x => new { x.FirstName, x.LastName })
        .ToList();

18 Being too greedy with Columns - Options
Or we could choose to define a separate class, sometimes called a DTO (Data Transfer Object), to select into:
    List<PupilName> pupils = db.Pupils
        .Where(p => p.SchoolId == schoolId)
        .Select(x => new PupilName
        {
            FirstName = x.FirstName,
            LastName = x.LastName
        })
        .ToList();

    public class PupilName
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

19 Mismatched Data Types
Data types matter, and if not enough attention is paid to them, even disarmingly simple database queries can perform surprisingly poorly.
We want to search for Pupils with zip code 90210. Easy:
    string zipCode = "90210";
    var pupils = db.Pupils
        .Where(p => p.PostalZipCode == zipCode)
        .Select(x => new { x.FirstName, x.LastName })
        .ToList();

20 Mismatched Data Types
Unfortunately it takes a very long time for the results to come back from the database. There are several million rows in the Pupils table, but there’s an index covering the PostalZipCode column which we’re searching against, so it should be quick to find the appropriate rows. Indeed the results are returned instantly if we directly query the database from SQL Server Management Studio using
    SELECT FirstName, LastName
    FROM Pupils p
    WHERE p.PostalZipCode = '90210'
What gives?!?

21

22 BAD STUFF DUE TO WRONG DATA TYPE!!

23 Mismatched Data Types
Query Plan Warning:
    Type conversion: Seek Plan for CONVERT_IMPLICIT(nvarchar(20),
So [Extent1].[PostalZipCode] was implicitly converted to NVARCHAR(20).
If we look back at the complete query which was run we can see why. Entity Framework has declared the variable as NVARCHAR, which seems sensible as strings in .NET are Unicode, and NVARCHAR is the SQL Server type which can represent Unicode strings.

24 Mismatched Data Types
But looking at the Pupils table we can see that the PostalZipCode column is VARCHAR(20)
So SQL Server is being FORCED to:
Evaluate EVERY PostalZipCode in the table!!
    INDEX SCAN
    lock/blocking/latches
    IO
    Buffer Pool slammed, forcing other useful pages out and harming entire server app performance
    Etc.
Then CONVERT the data type
    CPU BURN
    VOIDING ACQUIRING ACCURATE STATISTICS!!!!

25 Mismatched Data Types
All THAT happened because YOU the developer slapped some code together quickly and didn’t RTFM! :-D
Solution: You just need to edit the model to explicitly tell Entity Framework to use VARCHAR, using a column annotation:
    public string Adderss2 { get; set; }
    // ColumnAttribute lives in System.ComponentModel.DataAnnotations.Schema
    [Column(TypeName = "varchar")]
    public string PostalZipCode { get; set; }

26 Mismatched Data Types
After making this trivial change, the parameter will be sent to SQL Server as VARCHAR, so the data type will match the column in the Pupils table, and an Index Seek operator can be used.
Generally, these data type mismatches don't happen if EF creates the database for you and is the only tool to modify its schema.
BUT THAT ISN’T ALWAYS THE CASE! And there is lots of debate about whether that is even a good idea or not. ;-)

27 Overly-generic queries
Often we want to do a search that is based on several criteria. For example, we might have a set of four search boxes for a user to complete, where empty boxes are ignored, so we write something like:

28 Overly-generic queries
    //Search data as input by user
    var searchModel = new Pupil
    {
        FirstName = "Ben",
        LastName = null,
        City = null,
        PostalZipCode = null
    };
    List<Pupil> pupils = db.Pupils.Where(p =>
            (searchModel.FirstName == null || p.FirstName == searchModel.FirstName)
            && (searchModel.LastName == null || p.LastName == searchModel.LastName)
            && (searchModel.City == null || p.City == searchModel.City)
            && (searchModel.PostalZipCode == null || p.PostalZipCode == searchModel.PostalZipCode)
        )
        .Take(100)
        .ToList();
Only 1 of 4 possible filters has value

29 Overly-generic queries
We HOPE that the LastName, City, and PostalZipCode clauses, which all evaluate to true because in this case they are null, will be optimized away in .NET, leaving a query along the lines of ...
    NVARCHAR(20) = 'Ben'
    SELECT TOP 100 PupilId, FirstName, LastName, etc...
    FROM dbo.Pupils
    WHERE FirstName

30
    -- Generated by ANTS Performance Profiler
    -- Executed against .\SQL2014
    USE [EFSchoolSystem]
    NVarChar(4000) = 'Ben'
    NVarChar(4000) = 'Ben'
    NVarChar(4000) = ''
    NVarChar(4000) = ''
    NVarChar(4000) = ''
    NVarChar(4000) = ''
    NVarChar(4000) = ''
    NVarChar(4000) = ''
    -- Executed query
    SELECT TOP (100)
        [Extent1].[PupilId] AS [PupilId],
        [Extent1].[FirstName] AS [FirstName],
        [Extent1].[LastName] AS [LastName],
        [Extent1].[Address1] AS [Address1],
        [Extent1].[Adderss2] AS [Adderss2],
        [Extent1].[PostalZipCode] AS [PostalZipCode],
        [Extent1].[City] AS [City],
        [Extent1].[PhoneNumber] AS [PhoneNumber],
        [Extent1].[SchoolId] AS [SchoolId],
        [Extent1].[Picture] AS [Picture]
    FROM [dbo].[Pupils] AS [Extent1]
    WHERE IS NULL OR [Extent1].[FirstName]
        AND IS NULL OR [Extent1].[LastName]
        AND IS NULL OR [Extent1].[LastName]
        AND IS NULL OR [Extent1].[PostalZipCode]
BAD BAD BAD!!! Do ISNULL OR DEMO if haven’t

31 Overly-generic queries - Options
Sproc?!?
Conditionals in EF logic
Make SQL Server recompile the plans each time - from within EF
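The "conditionals in EF logic" option could look roughly like the sketch below: the null checks move into ordinary C# if statements, so each Where() is only appended when the user actually filled in that box, and the generated SQL should contain just the predicates that matter. The Pupil class is a trimmed-down hypothetical stand-in for the deck's entity, and PupilQueries.Search is an invented helper; as in the deck, a Pupil instance doubles as the search model.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down version of the deck's Pupil entity.
public class Pupil
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string City { get; set; }
    public string PostalZipCode { get; set; }
}

public static class PupilQueries
{
    // Builds the query step by step; nothing is executed until the caller
    // enumerates the result (for example with ToList()).
    public static IQueryable<Pupil> Search(IQueryable<Pupil> pupils, Pupil searchModel)
    {
        IQueryable<Pupil> query = pupils;

        if (searchModel.FirstName != null)
        {
            var firstName = searchModel.FirstName; // local copy, captured by the lambda
            query = query.Where(p => p.FirstName == firstName);
        }
        if (searchModel.LastName != null)
        {
            var lastName = searchModel.LastName;
            query = query.Where(p => p.LastName == lastName);
        }
        if (searchModel.City != null)
        {
            var city = searchModel.City;
            query = query.Where(p => p.City == city);
        }
        if (searchModel.PostalZipCode != null)
        {
            var zip = searchModel.PostalZipCode;
            query = query.Where(p => p.PostalZipCode == zip);
        }

        return query.Take(100);
    }
}
```

A call such as PupilQueries.Search(db.Pupils, searchModel).ToList() would then send a query with a single FirstName predicate for the "Ben" example. The trade-off is that each distinct combination of filled-in boxes gets its own query text and therefore its own cached plan.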

32 Overly-generic queries - Options
Write a custom database command interceptor to modify the EF-generated SQL before it’s run, to add an “option(recompile)” hint. You can write a class a little like this:
    // Needs: using System.Data.Common; using System.Data.Entity.Infrastructure.Interception;
    public class RecompileDbCommandInterceptor : IDbCommandInterceptor
    {
        public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
        {
            // Append the hint only once per command
            if (!command.CommandText.EndsWith(" option(recompile)"))
            {
                command.CommandText += " option(recompile)";
            }
        }
        //and implement other interface members
    }

33 Overly-generic queries - Options
And use it like this:
    var interceptor = new RecompileDbCommandInterceptor();
    DbInterception.Add(interceptor);
    var pupils = db.Pupils.Where(p => p.City == city).ToList();
    DbInterception.Remove(interceptor);
Note that this interception is enabled globally, not for the specific instance of the context, so you probably want to disable it again so that other queries aren’t affected.

34 Bloating the plan cache
The reuse of execution plans is often a good thing because it avoids the need to regenerate a plan each time a query is run.
In order for a plan to be reused, the statement text must be identical, which, as we just saw, is the case for parameterized queries.
So far we’ve seen that Entity Framework usually generates parameterized queries when we include values through variables. But with .Skip() or .Take() this doesn’t happen.

35 Bloating the plan cache
When implementing a paging mechanism we might choose to write the following:
    var schools = db.Schools
        .OrderBy(s => s.PostalZipCode)
        .Skip(model.Page * model.ResultsPerPage)
        .Take(model.ResultsPerPage)
        .ToList();

36

37 Bloating the plan cache
Looking at the executed query we see that the ResultsPerPage (100) and Page (417*100) integers are part of the query text, not parameters.
Next time we run this query for, say, page 567, a very slightly different query will be run with a different number, but it will be different enough that SQL Server won’t reuse the execution plan.

38 Bloating the plan cache - Options
Enable a SQL Server setting called ‘optimize for ad hoc workloads’. This makes SQL Server less aggressive at caching plans, and is generally a good thing to enable, but it doesn’t address the underlying issue.
The problem occurs in the first place because (due to an implementation detail) when passing an int to the Skip() and Take() methods, Entity Framework can’t see whether they were passed absolute values like Take(100), or a variable like Take(resultsPerPage), so it doesn’t know whether the value should be parameterized.

39 Bloating the plan cache - Options
But there’s an easy solution in EF 6, which includes versions of Skip() and Take() which take a lambda instead of an int, enabling it to see that variables have been used, and parameterize the query:
    // The lambda overloads are extension methods in System.Data.Entity
    int resultsToSkip = model.Page * model.ResultsPerPage;
    var schools = db.Schools
        .OrderBy(s => s.PostalZipCode)
        .Skip(() => resultsToSkip) //must pre-calculate this value
        .Take(() => model.ResultsPerPage)
        .ToList();

40

41 Inserting data
When modifying data in SQL Server, Entity Framework will run separate INSERT statements for every row being added. The performance consequences of this are not good if you need to insert a lot of data! (round trip, tlog buffer flush, etc). If there’s a lot of latency between the application and the database, this problem will be more pronounced.
You can use a NuGet package, EF.BulkInsert, which batches up Insert statements instead, in much the way that the SqlBulkCopy class does.
This approach is also supported out of the box in Entity Framework 7 (since renamed Entity Framework Core, with version 1.0 released in June 2016).
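For cases where going around EF entirely is acceptable, SqlBulkCopy can be fed directly. The sketch below covers only the preparation step, shaping objects into a DataTable that SqlBulkCopy.WriteToServer accepts. The Pupil class here is a hypothetical, trimmed-down stand-in, PupilBulkLoad is an invented helper, and the column names are assumed to match the destination table.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical, trimmed-down pupil; the real entity has more columns.
public class Pupil
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SchoolId { get; set; }
}

public static class PupilBulkLoad
{
    // Copies the pupils into an in-memory DataTable, one row per pupil,
    // so they can be sent to the server as one bulk operation.
    public static DataTable ToDataTable(IEnumerable<Pupil> pupils)
    {
        var table = new DataTable("Pupils");
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("LastName", typeof(string));
        table.Columns.Add("SchoolId", typeof(int));

        foreach (var pupil in pupils)
        {
            table.Rows.Add(pupil.FirstName, pupil.LastName, pupil.SchoolId);
        }
        return table;
    }
}
```

The resulting table would then be handed to a SqlBulkCopy instance: set DestinationTableName, add a ColumnMappings entry per column (the default mapping is by ordinal, which is unlikely to line up with a table that also has an identity column), and call WriteToServer(table). Those calls are left out of the block so that it compiles without a SQL Server client package or a live database.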

42 QUESTIONS??
Kevin G. Boles
@TheSQLGuru
Don’t forget about the #sqlhelp hash tag

43 References
This deck is based on an AWESOME Simple-Talk post by Ben Emmett, which has a LOT MORE great advice on EF performance problems outside of those that hit SQL Server!!!
It shows screenshots from RedGate’s AWESOME ANTS Performance Profiler.
performance-and-what-you-can-do-about-it/
Sample Code: (setup instructions are included in the readme)

