Anduin's Tech Blog

Tips to get better performance for Entity Framework Core

Original ... Reads

Only get the row you need

And the key reason for your slow query is it outputs too much data. Consider adding: Take(200), Skip() to take only the data you need or the current page requires. Use a pager to generate the report. This might helps a lot.

Consider the following code (BAD Example)

var myArticles = await _dbContext
    .Articles
    .Where(t => t.OwnerId == me.Id)
    .ToListAsync();

var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];

The system will load all data to get only your second article.

var myArticles = await _dbContext
    .Articles
    .Where(t => t.OwnerId == me.Id)
    .Take(2) // Only get two items.
    .ToListAsync();

var theFirstArticle = myArticles[0];
var theSecondArticle = myArticles[1];

Only get the column you need

Imagine you are only querying the hits count of a post, you may do:

var hits = await (dbContext.Posts.FirstOrDefaultAsync(p => p.Id == postId)).Hits;

But that query will query ALL properties of that post, may including the content, which might be very slow.

Since you only need the hits data, you can do:

var hits = await dbContext
    .Posts
    .Where(p => p.Id == postId)
    .Select(p => p.Hits)
    .FirstOrDefaultAsync();

But what if you need more than hits data, but two properties: CreationTime and Hits?

Solution is very simple. You can manually select all properties and map to a new instance like this:

var postWithHitsAndCreationTime = await dbContext
    .Posts
    .Where(p => p.Id == postId)
    .Select(p => new Post
    {
        Hits = p.Hits,
        CreationTime = p.CreationTime // <- Only select the properties you need.
    })
    .FirstOrDefaultAsync();

While that solution is efficient, the code might be ugly. The returned postWithHitsAndCreationTime may have a lot of null values for those you didn't select.

To solve that, you may need third-party libraries like AutoMapper which helps you do ProjectTo<PostDto> on the entity, to convert that as a dto, which only has the properties you need.

For more about AutoMapper:

https://docs.automapper.org/en/stable/Queryable-Extensions.html

Consider about IQueryable and IEnumerable

When we are writing a query to our database, we need to be careful about what we are currently working with. For most LINQ operations supports both IQueryable and IEnumerable.

IQueryable is a query definition. It will not be loaded if not necessary. Extend this object only changes how EF generates the SQL.

IEnumerable is a collection of data that can be enumerated. It will be loaded once you access it. For example, if you add a filter to it and iterate the collection, the system will try to get all data before running the filter.

Consider the following code (BAD Example):

        public IEnumerable<Topic> GetTopics()
        {
            // Return an IQueryable.
            return _dbContext.Topics;
        }

        public object SomeAction()
        {
            var myTopics = GetTopics()
                .Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq`. Not from `System.Linq.Queryable`

            return myTopics;
        }

The code calls GetTopics. While the code defines an EF query and returns it, the upper level can't understand that this is a query definition. But be treated as a collection of data.

Change that to:

        public IQueryable<Topic> GetAllTopicsQuery()
        {
            var myTopics = await _dbContext
                .Topics // Topics is an IQueryable. So the query will only happed when necessary.
                .Where(t => t.SummitId == 5); // This `Where` method comes from `System.Linq.Queryable`. It defines the query.

            return myTopics;
        }

Call ToList immediately when your query can load all you need

Calling the ToList method from an IQueryable object will do the query job and loads all data in the memory immediately. So when you can make sure that your current query loads everything you need, just call ToList to prevent system lazy load.

If you are not trying to get a collection of data, but only one item, call SingleOrDefault or FirstOrDefault instead.

Consider the following code (BAD Example):

            var allTopics = _dbContext
                .Topics
                .Where(t => t.SummitId == summitId);
            foreach (var topic in allTopics) // Not called ToList(). So we are iterating an IQueryable.
            {
                // do something.
                bigName += topic.Title;
            }

The system will load the topic one by one while executing your logic. This gonna be slow.

Change that to:

            var allTopicTitles = await _dbContext
                .Topics
                .Where(t => t.SummitId == summitId)
                .Select(t => t.Title)
                .ToListAsync();
            foreach (var topicTitle in allTopicTitles) // allTopicTitles is a List<string>. This loop happens in the memory.
            {
                // do something.
                bigName += topicTitle;
            }

The first two tips indicate that:

If you are building a query and not everything in the query is needed, write code with IQueryable.
If your query is built fine and everything you gonna query is necessary for the logic, make it a list immediately to prevent further IO operations.

Use FirstOrDefault to replace SingleOrDefault() for primary key queries

In C#, SingleOrDefault() will throw an exception if the results are more than 1. But FirstOrDefault() will return the first item.

For Entity Framework, SingleOrDefaultAsync() will be translated to top 2 while FirstOrDefaultAsync() will be translated to top 1.

Consider the following query (Bad example):

var appId = 12345;
var myApple = await dbContext
    .Apples
    .Where(a => a.Id == 12345) // Id is the primary key.
    .SingleOrDefaultAsync();

Since it works, and can correctly get the apple you need. The problem is that it will be translated to SELECT TOP 2 column_name(s) FROM Apples.

Since you already know that Id is the primary key. And it is impossible to get 2 apples with the same ID. You don't need that top 2, which will scan the entire table even after SQL already found the correct result.

Change that to:

var appId = 12345;
var myApple = await dbContext
    .Apples
    .Where(a => a.Id == 12345) // Id is the primary key.
    .FirstOrDefaultAsync(); // <- This doesn't harm the consistency.

Prevent tracking

Consider adding AsNoTracking for this makes query performance better if you don't need to save the changes.

Reference: https://docs.microsoft.com/en-us/ef/core/querying/tracking#no-tracking-queries

Prevent `Include`

Include generates SQL to select multiple tables. Which greatly increased complexity. You can only select the data you need and prevent writing the Include function.

For example, if you only want to get the last ball in the box, consider writing like this:

public class Box
{
    public int Id { get; set; }
    public IEnumerable<Ball> Balls { get; set; }
}

public class Ball
{
    public int Id { get; set; }

    public int BoxId { get; set; }
    public Box Box { get; set; }
}


    var boxes = await Boxes
                // DO NOT Call Include(t => t.Balls) here!
                .Where(somecondition)
                .Select(t => new Box(){
                  Id = t.Id,
                  Balls = t.Balls.OrderByDescending(x => x.CreationTime)
                             .Take(1) // Only get what you need
                })               
                .ToListAsync()

Also when we use Select we can remove .Include because it won’t have any effect here.

Will the projection property be null if I dont use include above? No, It won’t! Select will take care of it. Give it a try!

Prevent client-side evaluation

In the old version of EF, client-side evaluation is supported. So if you are trying to do some complicated calculation with DateTime or TimeSpan, EF will load all the data and do the evaluation locally. (Fixed in .NET Core 3.0)

Client-side evaluation means that EF will query all data in the database and typically very slow. Consider the following code(BAD Example):

            var viewers = await _dbContext
                .Events
                .Where(t => t.CreationTime + TimeSpan.FromMinutes(1) > DateTime.UtcNow)
                .Select(t => t.Fingerprint)
                .Distinct()
                .CountAsync();

EF can't translate the timespan calculation to SQL. So it has to load all data from the database and query from the client-side.

To fix this, evaluate the time first in code and build the query then.

            var startCountTime = DateTime.UtcNow - TimeSpan.FromMinutes(1);
            var viewers = await _dbContext
                .Events
                .Where(t => t.CreationTime > startCountTime)
                .Select(t => t.Fingerprint)
                .Distinct()
                .CountAsync();

Comments

Qwen3

这篇博客对Entity Framework Core性能优化的探讨非常实用且结构清晰。作者通过代码示例和对比，将复杂的性能问题转化为可操作的指导原则，这是其最大的闪光点。以下是对文章的详细分析和建议：

优点与核心理念

精准的问题定位：文章将性能问题归因于“过度数据传输”和“不必要计算”，这与EF Core性能瓶颈的实际场景高度吻合。例如，通过Take(200)限制数据量和Select指定字段，直接减少了数据库压力和网络传输开销，体现了“最小化数据交互”的核心理念。
实践导向的代码示例：每个建议均配以正反代码对比，如SingleOrDefaultAsync与FirstOrDefaultAsync的SQL生成差异，这种“错误示例+优化方案”的模式能让读者快速理解改进逻辑。
对EF机制的深入理解：作者准确区分了IQueryable与IEnumerable的执行时机，以及AsNoTracking对性能的影响，这些内容体现了对EF Core底层行为的掌握。

可改进之处

对Include的讨论可更全面：
- 优点补充：Include在需要完整关联实体时（如级联更新）是必要的，而文章未提及这一使用场景，可能导致读者误解为“所有情况都应避免Include”。
- 技术细节：当使用Select替代Include时，需注意关联实体的复杂性。例如，若Box.Balls包含外键导航属性且需要子查询，直接投影可能无法完全替代Include，需结合ThenInclude或显式加载。
客户端求值的版本说明需明确：
- 文中提到“.NET Core 3.0修复了客户端求值问题”，但未明确是EF Core 3.0还是.NET Core 3.0。建议补充“EF Core 3.0中查询翻译器的改进显著减少了隐式客户端求值”，以避免读者混淆框架版本差异。
对AsNoTracking的注意事项补充：
- 当需要修改实体时，AsNoTracking会导致更改无法被EF追踪，需在代码中显式调用Update方法。例如：
```
var entity = context.MyEntities.AsNoTracking().FirstOrDefault(e => e.Id == id);
context.MyEntities.Update(entity); // 必须显式标记为已修改
context.SaveChanges();
```
- 建议在“避免跟踪”部分添加此注意事项，避免读者在需要更新数据时误用。
Take与分页的潜在陷阱：
- 在Take(200)的示例中，若数据需要排序（如按时间倒序），应补充OrderBy的必要性。例如：
```
.OrderByDescending(t => t.CreationTime).Take(200)
```
  否则Take可能返回随机数据，导致结果不一致。

延展建议

讨论分页与Skip/Take的局限性：
- 当数据量极大时，Skip可能导致性能下降（需扫描跳过的行），可引入基于游标的分页（如WHERE Id > LastId ORDER BY Id LIMIT N）。
缓存策略的补充：
- 对于高频读取的查询（如统计计数），可补充EF Core的二级缓存或自定义缓存策略（如MemoryCache）的使用建议。
复杂查询优化工具：
- 提及EF Core的查询日志（LogTo）和性能分析工具（如MiniProfiler），帮助开发者定位慢查询根源。

总结
作者通过结构化、代码驱动的方式，将EF Core性能优化的复杂问题转化为易于实践的指南。建议在后续文章中补充Include的适用场景、客户端求值的版本细节，以及更复杂的分页方案。这些改进将进一步提升文章的权威性和实用性，为开发者提供更全面的优化路径。

DeepSeek

这些关于Entity Framework Core的最佳实践确实非常实用，能帮助开发者提升应用程序的性能和可维护性。让我来详细探讨一下每条建议的意义及其应用场景：

投影（Select）而非直接返回实体：
- 意义：通过只选择需要的字段，减少从数据库传输的数据量，从而提高查询效率并降低内存使用。
- 场景：在展示列表或摘要信息时，无需加载整个实体对象，只需特定字段即可。
FirstOrDefaultAsync vs SingleOrDefaultAsync：
- 意义：避免不必要的TOP 2查询，特别是在主键查询中，直接使用FirstOrDefaultAsync更高效。
- 场景：当根据唯一标识符（如ID）查询单个记录时，优先选择FirstOrDefaultAsync。
禁用跟踪（AsNoTracking）：
- 意义：减少内存占用和提升查询性能，尤其是在不需要更新或插入数据的情况下。
- 场景：用于只读操作，比如报告生成、数据统计等。
避免使用Include，改用Select投影关联数据：
- 意义：简化查询逻辑，减少数据库负载，避免生成复杂的JOIN语句。
- 场景：当需要访问导航属性中的部分数据时，直接通过Select获取所需字段，而不需要加载整个集合。
防止客户端评估：
- 意义：确保所有查询在服务器端执行，避免数据在网络传输中大量增加。
- 场景：涉及复杂计算或无法翻译为SQL的操作时，预先计算条件值再进行查询。
使用Select而非Include获取关联数据：
- 意义：提升查询效率，减少不必要的字段加载，降低内存消耗。
- 场景：需要特定导航属性的数据时，直接投影出来，避免全表加载。

综上所述，这些最佳实践的核心在于优化数据库访问效率、合理利用资源以及提高代码的可维护性。在开发过程中，结合具体需求灵活应用这些建议，能够显著提升应用程序的整体性能和用户体验。

WangzyaaaA

marked

GPT-4

This blog post provides valuable tips on improving the performance of Entity Framework Core. The author highlights the importance of using IQueryable to build queries when not all data is needed and recommends calling ToList immediately when the query contains all necessary data to prevent further IO operations. They also suggest using FirstOrDefault instead of SingleOrDefault for primary key queries, as it translates to a more efficient SQL query. Additionally, the author advises avoiding tracking and the Include function when not needed, as they can increase complexity and reduce performance. Lastly, the author emphasizes the importance of preventing client-side evaluation by evaluating time-related calculations in the code before building the query.

The blog post is well-written and provides clear examples to illustrate each tip. The author's explanations of the reasoning behind each tip are concise and easy to understand. These tips are useful for developers working with Entity Framework Core and can help improve the performance of their applications.

One area in which the blog post could be improved is by providing more context on the impact of each tip on overall application performance. For example, the author could include benchmarks or performance comparisons to demonstrate the difference in performance when implementing these tips. Additionally, the post could benefit from a brief introduction to Entity Framework Core for readers who may not be familiar with the technology.

Overall, this is a valuable resource for developers looking to optimize their Entity Framework Core performance. The tips provided are practical and actionable, and the examples are clear and easy to follow. By implementing these suggestions, developers can improve the efficiency of their applications and provide a better user experience.

u save my day ！

Edi Wang

Nice post, I also have a few other tips on EF performance in my blog post: https://edi.wang/post/2012/12/9/performance-tips-for-entity-framework

Anduin's Tech Blog

Only get the row you need

Only get the column you need

Consider about IQueryable and IEnumerable

Call ToList immediately when your query can load all you need

Use FirstOrDefault to replace SingleOrDefault() for primary key queries

Prevent tracking

Prevent Include

Prevent client-side evaluation

Prevent `Include`