Entity Framework Core is fantastic! It’s strongly typed, uses expressions and puts you in control of what, how and when data is loaded.

However, performance should always be a concern when your SQL is written for you. And EF Core is no exception to the fact that in some situations an ORM can fail.

Historically, my repository layers was always loading the entire model, any any related included models, when querying for data. The problem came when I wanted to include a “computed” property. The count of a child collection. Solutions?

  1. Include all the related child items, run a count client side. Terrible idea, never do this!
  2. Run two queries, the data and the count. Join them up on the client side. Not bad, but not ideal – dealing with the joining manually can cause bugs and query timings can produce divergent results.
  3. Run a single query using projections. Use this, and be aware of potential performance issues.

The fact is, projections actually produce high performance SQL. And in the case of including a count – you’re not likely to have any issues. Except, I wasn’t just including count, and sometimes I didn’t even want it. Instead I had some code which conditionally included various pieces of data, and this is where the problems began.

Let’s take a look at some code, albeit simplified to make the example easier to follow.

The persistence models

// The persistence models (EF DbSet)
public class User
{
	public int Id { get; set; }
	
	public string Name { get; set; }
		
	public ICollection<UserRole> Roles { get; set; }
}

public class UserRole
{
	public int Id { get; set; }
		
	public string Name { get; set; }
		
	public int UserId { get; set; }
	
	public User User { get; set; }
}

The domain models

public class UserImpl
{
	public int Id { get; set; }
		
	public string Name { get; set; }
		
	public IEnumerable<UserRoleImpl> Roles { get; set; }
		
	public int? RoleCount { get; set; }
}

public class UserRoleImpl
{
	public int Id { get; set; }
	
	public string Name { get; set; }
		
	public int UserId { get; set; }
		
	public UserImpl User { get; set; }
}

The projection

public class UserProjection
{
	private readonly UserProjectionOptions _options;

	public UserProjection(UserProjectionOptions options = null)
	{
		_options = options;
	}
		
	public Expression<Func<User, UserImpl>> Projection()
	{
		return u => new UserImpl
		{
			Id     = u.Id,
			Name   = u.Name,
			Roles  = _options != null && _options.IncludeRoles ? u.Roles.Select(ur => new UserRoleImpl
			{
				Id   = ur.Id,
				Name = ur.Name
			}).ToList() : null,
			RoleCount = _options != null && _options.IncludeRoleCount ? u.Roles.Count : default(int?)
		};
	}
}

public class UserProjectionOptions
{
	public bool IncludeRoles { get; set; }

	public bool IncludeRoleCount { get; set; }
}

Now, for all intents and purposes this does exactly what I want. I can pass true or false for any of the options and the result will contain exactly what I asked for. However, what might not be clear is actually that entire expression will be sent to the database. No smart heuristics will be applied. No transformations. Essentially, just a copy and paste. Which means the ternary statements will become nonsensical and clutter the SQL. I’ve seen the following occur.

SELECT ..., CASE 1 = 0 THEN ... ELSE ... END;

And it gets worse if you’re including computed data across a few tables (think concatenating values across some 1-1 tables). Firstly all tables are joined regardless to the truth status of the statement. Secondly, the computation happens regardless to the truth status. And thirdly, most worryingly, the truth status is calculated on the database so the client can know wether to throw away the data or not.

This means the following projection

u => new ExampleImpl
{
	UserName = u.UserName,
	Url = includeUrl ? u.Profile.Settings.BaseUrl + u.Profile.Url + u.Slug : null
}

Can actually produce SQL looking something similar to this

SELECT UserName, CONCAT(/* Settings Table */, /* Profile Table */, /* User Table */), CASE 1 = 1 THEN TRUE ELSE FALSE END FROM User INNER JOIN /* 3 tables worth of inner joins */

And then the query calculates everything, determines if true = true and then if the response is true include the data, otherwise throw it all away and replace it with null. It’s at this point, I decided to fix it.

Let’s start with an expression visitor which does nothing more than expose an Optimise method to enable us to return the exact same expression we pass in, of the same type. Doesn’t do a lot, but means we can call this directly and retain type hinting further down the pipeline.

public class ProjectionExpressionOptimiser : ExpressionVisitor
{
	public Expression<Func<TSource, TDest>> Optimise<TSource, TDest>(Expression<Func<TSource, TDest>> expression)
	{
		return Visit(expression) as Expression<Func<TSource, TDest>>;
	}
}

It’s designed to be created and called right before the expression is passed through the linq provider like so.

var projection = new UserProjection(new UserProjectionOptions
{
	IncludeRoles     = true,
	IncludeRoleCount = false
}).Projection();

var expression = new ProjectionExpressionOptimiser().Optimise(projection);

var users = await _databaseContext.Users
	.Select(expression)
	.ToListAsync();

I knew that the main thing I wanted to solve was the performance of conditional statements. That’s when I added the following method to handle it.

protected override Expression VisitConditional(ConditionalExpression node)
{
	var test = Visit(node.Test);

	// The conditional is now a constant, we can replace the branch
	if (test is ConstantExpression testNode)
	{
		var value = (dynamic) testNode.Value;
		return value ? Visit(node.IfTrue) : Visit(node.IfFalse);
	}

	// If it is not a conditional, we follow the default behaviour
	return base.VisitConditional(node);
}

The idea being that a constant represents the smallest possible expression type. If we can “reduce” the test of a conditional to true or false, then we can replace the entire node with the appropriate branch. Unfortunately, unless you are writing boolean literals in your projection this isn’t going to help. That’s where we need a method to handle reading members: variables etc…

protected override Expression VisitMember(MemberExpression node)
{
	if (node.Expression.NodeType != ExpressionType.Constant &&
		node.Expression.NodeType != ExpressionType.MemberAccess)
		return node;

	var objectMember = Expression.Convert(node, typeof(object));
	var getterLambda = Expression.Lambda<Func<object>>(objectMember);
	var getter       = getterLambda.Compile();
	var value        = getter();

	return Expression.Constant(value);
}

This allows us to reduce members to constants by executing the member on its own. However, we exclude MemberAccess from this as that should always be executed on the server and form part of the database query. This is when we are imposing conditionals based on specific rows of the database, not fixed constants.

However, we are using binary actions to perform AndAlso checks. Such as the fact that the object can’t be null AND must be true. We need another method.

protected override Expression VisitBinary(BinaryExpression node)
{
	var left  = Visit(node.Left);
	var right = Visit(node.Right);

	var leftConst  = left as ConstantExpression;
	var rightConst = right as ConstantExpression;

	var leftValue  = (dynamic) leftConst.Value;
	var rightValue = (dynamic) rightConst.Value;

	switch (node.NodeType)
	{
		case ExpressionType.Add:
			return Expression.Constant(leftValue + rightValue);
		case ExpressionType.Divide:
			return Expression.Constant(leftValue / rightValue);
		case ExpressionType.Modulo:
			return Expression.Constant(leftValue % rightValue);
		case ExpressionType.Multiply:
			return Expression.Constant(leftValue * rightValue);
		case ExpressionType.Power:
			return Expression.Constant(leftValue ^ rightValue);
		case ExpressionType.Subtract:
			return Expression.Constant(leftValue - rightValue);
		case ExpressionType.And:
			return Expression.Constant(leftValue & rightValue);
		case ExpressionType.AndAlso:
			return Expression.Constant(leftValue && rightValue);
		case ExpressionType.Or:
			return Expression.Constant(leftValue | rightValue);
		case ExpressionType.OrElse:
			return Expression.Constant(leftValue || rightValue);
		case ExpressionType.Equal:
			return Expression.Constant(leftValue == rightValue);
		case ExpressionType.NotEqual:
			return Expression.Constant(leftValue != rightValue);
		case ExpressionType.GreaterThan:
			return Expression.Constant(leftValue > rightValue);
		case ExpressionType.GreaterThanOrEqual:
			return Expression.Constant(leftValue >= rightValue);
		case ExpressionType.LessThan:
			return Expression.Constant(leftValue < rightValue);
		case ExpressionType.LessThanOrEqual:
			return Expression.Constant(leftValue <= rightValue);
	}

	return node;
}

As you can see, it’s necessary for us to handle each of the possible binary operations. This enables the case that someone might perform some logic. Only include roles if 5 > 3 etc…

However, unlike traditional “and” / “or” logic. We need to evaluate both sides and perform some additional optimisation logic (see that todo?) in order to ensure that we’re not sending unnecessary processing to the database.

The logic we need to apply is simple:

  1. We need a partial expression (only half can be reduced to true | false)
  2. We need to be working with a binary expression of And or Or.
  3. We need to apply simple truth logic to the left and right based on the truthy value
private OptimisedBooleanBinary OptimiseBooleanBinaryExpression(
	ExpressionType type, Expression left, Expression right)
{
	Expression leftVisited  = null;
	Expression rightVisited = null;

	dynamic leftValue  = null;
	dynamic rightValue = null;

	dynamic GetLeftValue()
	{
		if (leftVisited != null) return leftValue;
		leftVisited = Visit(left);
		if (leftVisited is ConstantExpression leftConst) leftValue = leftConst.Value;
		return leftValue;
	}

	dynamic GetRightValue()
	{
		if (rightVisited != null) return rightValue;
		rightVisited = Visit(right);
		if (rightVisited is ConstantExpression rightConst) rightValue = rightConst.Value;
		return rightValue;
	}

	switch (type)
	{
		// We can check for constants on each side to simplify the reduction process
		case ExpressionType.And:
		case ExpressionType.AndAlso:
		{
			if (GetLeftValue() == false || GetRightValue() == false) return new OptimisedBooleanBinary(false);
			if (GetLeftValue() == true) return new OptimisedBooleanBinary(rightVisited);
			if (GetRightValue() == true) return new OptimisedBooleanBinary(leftVisited);
			break;
		}
		case ExpressionType.Or:
		case ExpressionType.OrElse:
		{
			if (GetLeftValue() == true || GetRightValue() == true) return new OptimisedBooleanBinary(true);
			if (GetLeftValue() == false) return new OptimisedBooleanBinary(rightVisited);
			if (GetRightValue() == false) return new OptimisedBooleanBinary(leftVisited);
			break;
		}
	}

	GetLeftValue();
	GetRightValue();
	return new OptimisedBooleanBinary
	{
		LeftVisit  = leftVisited,
		RightVisit = rightVisited
	};
}

private class OptimisedBooleanBinary
{
	public OptimisedBooleanBinary()
	{
	}

	public OptimisedBooleanBinary(Expression result)
	{
		Result = result;
	}

	public OptimisedBooleanBinary(bool result)
	{
		Result = Expression.Constant(result);
	}

	public Expression Result { get; set; }

	public Expression LeftVisit { get; set; }

	public Expression RightVisit { get; set; }
}

There’s quite a lot going on here, mainly because there are a few conditions we need to be aware of.

  1. We should only visit the left or right node once
  2. We should only visit either the left or the right if necessary
  3. We should only visit the right node after the left node

These are some simple rules that limit our ability to cause issues within the application. Prematurely trying to resolve a node could result in an exception when left to right precedence is expected.

And we can replace the start of our VisitBinary method with a new signature.

protected override Expression VisitBinary(BinaryExpression node)
{
	// Special optimisations for boolean expressions 
	var optimised = OptimiseBooleanBinaryExpression(node.NodeType, node.Left, node.Right);
	if (optimised.Result != null) return optimised.Result;

	var leftConst  = optimised.LeftVisit as ConstantExpression;
	var rightConst = optimised.RightVisit as ConstantExpression;

	// ...

Which leaves us with the final code as follows

public class ProjectionExpressionOptimiser : ExpressionVisitor
{
	public Expression<Func<TSource, TDest>> Optimise<TSource, TDest>(Expression<Func<TSource, TDest>> expression)
	{
		return Visit(expression) as Expression<Func<TSource, TDest>>;
	}

	protected override Expression VisitConditional(ConditionalExpression node)
	{
		var test = Visit(node.Test);

		// The conditional is now a constant, we can replace the branch
		if (test is ConstantExpression testNode)
		{
			var value = (dynamic) testNode.Value;
			return value ? Visit(node.IfTrue) : Visit(node.IfFalse);
		}

		// If it is not a conditional, we follow the default behaviour
		return base.VisitConditional(node);
	}

	protected override Expression VisitMember(MemberExpression node)
	{
		Expression nodeRoot = node;
		while (nodeRoot is MemberExpression nodeRootMember)
			nodeRoot = nodeRootMember.Expression;

		if (nodeRoot.NodeType != ExpressionType.Constant &&
			nodeRoot.NodeType != ExpressionType.MemberAccess)
			return node;

		var objectMember = Expression.Convert(node, typeof(object));
		var getterLambda = Expression.Lambda<Func<object>>(objectMember);
		var getter       = getterLambda.Compile();
		var value        = getter();

		return Expression.Constant(value);
	}

	protected override Expression VisitBinary(BinaryExpression node)
	{
		// Special optimisations for boolean expressions 
		var optimised = OptimiseBooleanBinaryExpression(node.NodeType, node.Left, node.Right);
		if (optimised.Result != null) return optimised.Result;

		var leftConst  = optimised.LeftVisit as ConstantExpression;
		var rightConst = optimised.RightVisit as ConstantExpression;

		if (leftConst == null || rightConst == null)
			return node;

		var leftValue  = (dynamic) leftConst.Value;
		var rightValue = (dynamic) rightConst.Value;

		switch (node.NodeType)
		{
			case ExpressionType.Add:
				return Expression.Constant(leftValue + rightValue);
			case ExpressionType.Divide:
				return Expression.Constant(leftValue / rightValue);
			case ExpressionType.Modulo:
				return Expression.Constant(leftValue % rightValue);
			case ExpressionType.Multiply:
				return Expression.Constant(leftValue * rightValue);
			case ExpressionType.Power:
				return Expression.Constant(leftValue ^ rightValue);
			case ExpressionType.Subtract:
				return Expression.Constant(leftValue - rightValue);
			case ExpressionType.And:
				return Expression.Constant(leftValue & rightValue);
			case ExpressionType.AndAlso:
				return Expression.Constant(leftValue && rightValue);
			case ExpressionType.Or:
				return Expression.Constant(leftValue | rightValue);
			case ExpressionType.OrElse:
				return Expression.Constant(leftValue || rightValue);
			case ExpressionType.Equal:
				return Expression.Constant(leftValue == rightValue);
			case ExpressionType.NotEqual:
				return Expression.Constant(leftValue != rightValue);
			case ExpressionType.GreaterThan:
				return Expression.Constant(leftValue > rightValue);
			case ExpressionType.GreaterThanOrEqual:
				return Expression.Constant(leftValue >= rightValue);
			case ExpressionType.LessThan:
				return Expression.Constant(leftValue < rightValue);
			case ExpressionType.LessThanOrEqual:
				return Expression.Constant(leftValue <= rightValue);
		}

		return node;
	}

	protected override Expression VisitUnary(UnaryExpression node)
	{
		var operand = Visit(node.Operand);

		var operandConst = operand as ConstantExpression;
		if (operandConst == null) return node;
		var operandValue = (dynamic) operandConst.Value;

		switch (node.NodeType)
		{
			case ExpressionType.Not:
				return Expression.Constant(!operandValue);
		}

		return node;
	}

	private OptimisedBooleanBinary OptimiseBooleanBinaryExpression(
		ExpressionType type, Expression left, Expression right)
	{
		Expression leftVisited  = null;
		Expression rightVisited = null;

		dynamic leftValue  = null;
		dynamic rightValue = null;

		dynamic GetLeftValue()
		{
			if (leftVisited != null) return leftValue;
			leftVisited = Visit(left);
			if (leftVisited is ConstantExpression leftConst) leftValue = leftConst.Value;
			return leftValue;
		}

		dynamic GetRightValue()
		{
			if (rightVisited != null) return rightValue;
			rightVisited = Visit(right);
			if (rightVisited is ConstantExpression rightConst) rightValue = rightConst.Value;
			return rightValue;
		}

		switch (type)
		{
			// We can check for constants on each side to simplify the reduction process
			case ExpressionType.And:
			case ExpressionType.AndAlso:
			{
				if (GetLeftValue() == false || GetRightValue() == false) return new OptimisedBooleanBinary(false);
				if (GetLeftValue() == true) return new OptimisedBooleanBinary(rightVisited);
				if (GetRightValue() == true) return new OptimisedBooleanBinary(leftVisited);
				break;
			}
			case ExpressionType.Or:
			case ExpressionType.OrElse:
			{
				if (GetLeftValue() == true || GetRightValue() == true) return new OptimisedBooleanBinary(true);
				if (GetLeftValue() == false) return new OptimisedBooleanBinary(rightVisited);
				if (GetRightValue() == false) return new OptimisedBooleanBinary(leftVisited);
				break;
			}
		}

		GetLeftValue();
		GetRightValue();
		return new OptimisedBooleanBinary
		{
			LeftVisit  = leftVisited,
			RightVisit = rightVisited
		};
	}

	private class OptimisedBooleanBinary
	{
		public OptimisedBooleanBinary()
		{
		}

		public OptimisedBooleanBinary(Expression result)
		{
			Result = result;
		}

		public OptimisedBooleanBinary(bool result)
		{
			Result = Expression.Constant(result);
		}

		public Expression Result { get; set; }

		public Expression LeftVisit { get; set; }

		public Expression RightVisit { get; set; }
	}
}

Feel free to use it in your projects, I’ve had a lot of success with this so far and hopefully it will prevent some of the performance issues that can come with advanced projections. You can find the source over at GitHub if that helps!

https://gist.github.com/eliottrobson/2a75aa0ef4def5e167c44b662ad55b70

There are currently no comments.