Only get the row you need
And the key reason for your slow query is it outputs too much data. Consider adding: Take(200)
, Skip()
to take only the data you need or the current page requires. Use a pager to generate the report. This might helps a lot.
Consider the following code (BAD Example)
var myArticles = await _dbContext
.Articles
.Where(t => t.OwnerId == me.Id)
.ToListAsync();
var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];
The system will load all data to get only your second article.
var myArticles = await _dbContext
.Articles
.Where(t => t.OwnerId == me.Id)
.Take(2) // Only get two items.
.ToListAsync();
var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];
Only get the column you need
Imagine you are only querying the hits count of a post, you may do:
var hits = await (dbContext.Posts.FirstOrDefaultAsync(p => p.Id == postId)).Hits;
But that query will query ALL properties of that post, may including the content, which might be very slow.
Since you only need the hits data, you can do:
var hits = await dbContext
.Posts
.Where(p => p.Id == postId)
.Select(p => p.Hits)
.FirstOrDefaultAsync();
But what if you need more than hits data, but two properties: CreationTime
and Hits
?
Solution is very simple. You can manually select all properties and map to a new instance like this:
var postWithHitsAndCreationTime = await dbContext
.Posts
.Where(p => p.Id == postId)
.Select(p => new Post
{
Hits = p.Hits,
CreationTime = p.CreationTime // <- Only select the properties you need.
})
.FirstOrDefaultAsync();
While that solution is efficient, the code might be ugly. The returned postWithHitsAndCreationTime
may have a lot of null values for those you didn't select.
To solve that, you may need third-party libraries like AutoMapper
which helps you do ProjectTo<PostDto>
on the entity, to convert that as a dto, which only has the properties you need.
For more about AutoMapper
:
https://docs.automapper.org/en/stable/Queryable-Extensions.html
Consider about IQueryable and IEnumerable
When we are writing a query to our database, we need to be careful about what we are currently working with. For most LINQ operations supports both IQueryable
and IEnumerable
.
IQueryable is a query definition. It will not be loaded if not necessary. Extend this object only changes how EF generates the SQL.
IEnumerable is a collection of data that can be enumerated. It will be loaded once you access it. For example, if you add a filter to it and iterate the collection, the system will try to get all data before running the filter.
Consider the following code (BAD Example):
public IEnumerable<Topic> GetTopics()
{
// Return an IQueryable.
return _dbContext.Topics;
}
public object SomeAction()
{
var myTopics = GetTopics()
.Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq`. Not from `System.Linq.Queryable`
return myTopics;
}
The code calls GetTopics
. While the code defines an EF query and returns it, the upper level can't understand that this is a query definition. But be treated as a collection of data.
Change that to:
public IQueryable<Topic> GetAllTopicsQuery()
{
var myTopics = await _dbContext
.Topics // Topics is an IQueryable. So the query will only happed when necessary.
.Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq.Queryable`. It defines the query.
return myTopics;
}
Call ToList immediately when your query can load all you need
Calling the ToList
method from an IQueryable
object will do the query job and loads all data in the memory immediately. So when you can make sure that your current query loads everything you need, just call ToList
to prevent system lazy load.
If you are not trying to get a collection of data, but only one item, call SingleOrDefault
or FirstOrDefault
instead.
Consider the following code (BAD Example):
var allTopics = _dbContext
.Topics
.Where(t => t.SummitId == summitId);
foreach (var topic in allTopics) // Not called ToList(). So we are iterating an IQueryable.
{
// do something.
bigName += topic.Title;
}
The system will load the topic one by one while executing your logic. This gonna be slow.
Change that to:
var allTopicTitles = await _dbContext
.Topics
.Where(t => t.SummitId == summitId)
.Select(t => t.Title)
.ToListAsync();
foreach (var topicTitle in allTopicTitles) // allTopicTitles is a List<string>. This loop happens in the memory.
{
// do something.
bigName += topicTitle;
}
The first two tips indicate that:
- If you are building a query and not everything in the query is needed, write code with IQueryable.
- If your query is built fine and everything you gonna query is necessary for the logic, make it a list immediately to prevent further IO operations.
Use FirstOrDefault to replace SingleOrDefault() for primary key queries
In C#, SingleOrDefault()
will throw an exception if the results are more than 1. But FirstOrDefault()
will return the first item.
For Entity Framework, SingleOrDefaultAsync()
will be translated to top 2
while FirstOrDefaultAsync()
will be translated to top 1
.
Consider the following query (Bad example):
var appId = 12345;
var myApple = await dbContext
.Apples
.Where(a => a.Id == 12345) // Id is the primary key.
.SingleOrDefaultAsync();
Since it works, and can correctly get the apple you need. The problem is that it will be translated to SELECT TOP 2 column_name(s) FROM Apples
.
Since you already know that Id
is the primary key. And it is impossible to get 2 apples with the same ID. You don't need that top 2
, which will scan the entire table even after SQL already found the correct result.
Change that to:
var appId = 12345;
var myApple = await dbContext
.Apples
.Where(a => a.Id == 12345) // Id is the primary key.
.FirstOrDefaultAsync(); // <- This doesn't harm the consistency.
Prevent tracking
Consider adding AsNoTracking
for this makes query performance better if you don't need to save the changes.
Reference: https://docs.microsoft.com/en-us/ef/core/querying/tracking#no-tracking-queries
Prevent Include
Include
generates SQL to select multiple tables. Which greatly increased complexity. You can only select the data you need and prevent writing the Include
function.
For example, if you only want to get the last ball in the box, consider writing like this:
public class Box
{
public int Id { get; set; }
public IEnumerable<Ball> Balls { get; set; }
}
public class Ball
{
public int Id { get; set; }
public int BoxId { get; set; }
public Box Box { get; set; }
}
var boxes = await Boxes
// DO NOT Call Include(t => t.Balls) here!
.Where(somecondition)
.Select(t => new Box(){
Id = t.Id,
Balls = t.Balls.OrderByDescending(x => x.CreationTime)
.Take(1) // Only get what you need
})
.ToListAsync()
Also when we use Select we can remove .Include
because it won’t have any effect here.
Will the projection property be null if I dont use include above? No, It won’t! Select will take care of it. Give it a try!
Prevent client-side evaluation
In the old version of EF, client-side evaluation is supported. So if you are trying to do some complicated calculation with DateTime
or TimeSpan
, EF will load all the data and do the evaluation locally. (Fixed in .NET Core 3.0)
Client-side evaluation means that EF will query all data in the database and typically very slow. Consider the following code(BAD Example):
var viewers = await _dbContext
.Events
.Where(t => t.CreationTime + TimeSpan.FromMinutes(1) > DateTime.UtcNow)
.Select(t => t.Fingerprint)
.Distinct()
.CountAsync();
EF can't translate the timespan calculation to SQL. So it has to load all data from the database and query from the client-side.
To fix this, evaluate the time first in code and build the query then.
var startCountTime = DateTime.UtcNow - TimeSpan.FromMinutes(1);
var viewers = await _dbContext
.Events
.Where(t => t.CreationTime > startCountTime)
.Select(t => t.Fingerprint)
.Distinct()
.CountAsync();
marked
This blog post provides valuable tips on improving the performance of Entity Framework Core. The author highlights the importance of using IQueryable to build queries when not all data is needed and recommends calling ToList immediately when the query contains all necessary data to prevent further IO operations. They also suggest using FirstOrDefault instead of SingleOrDefault for primary key queries, as it translates to a more efficient SQL query. Additionally, the author advises avoiding tracking and the Include function when not needed, as they can increase complexity and reduce performance. Lastly, the author emphasizes the importance of preventing client-side evaluation by evaluating time-related calculations in the code before building the query.
The blog post is well-written and provides clear examples to illustrate each tip. The author's explanations of the reasoning behind each tip are concise and easy to understand. These tips are useful for developers working with Entity Framework Core and can help improve the performance of their applications.
One area in which the blog post could be improved is by providing more context on the impact of each tip on overall application performance. For example, the author could include benchmarks or performance comparisons to demonstrate the difference in performance when implementing these tips. Additionally, the post could benefit from a brief introduction to Entity Framework Core for readers who may not be familiar with the technology.
Overall, this is a valuable resource for developers looking to optimize their Entity Framework Core performance. The tips provided are practical and actionable, and the examples are clear and easy to follow. By implementing these suggestions, developers can improve the efficiency of their applications and provide a better user experience.
u save my day !
Nice post, I also have a few other tips on EF performance in my blog post: https://edi.wang/post/2012/12/9/performance-tips-for-entity-framework