Just built a simple retry engine in C#.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

/// <summary>
/// Retry engine.
/// </summary>
public class RetryEngine
{
    private static readonly Random rnd = new Random();
    private readonly ILogger<RetryEngine> logger;

    /// <summary>
    /// Creates new retry engine.
    /// </summary>
    /// <param name="logger">Logger</param>
    public RetryEngine(ILogger<RetryEngine> logger)
    {
        this.logger = logger;
    }

    /// <summary>
    /// Run a task with retry.
    /// </summary>
    /// <typeparam name="T">Response type.</typeparam>
    /// <param name="taskFactory">Task factory. Receives the current attempt number, starting from 1.</param>
    /// <param name="attempts">Maximum number of attempts.</param>
    /// <param name="when">Predicate deciding whether an exception should trigger a retry. When null, every exception does.</param>
    /// <param name="timeOutSeconds">Timeout of each attempt in seconds.</param>
    /// <returns>Response</returns>
    public async Task<T> RunWithTry<T>(
        Func<int, Task<T>> taskFactory,
        int attempts = 3,
        Predicate<Exception>? when = null,
        int timeOutSeconds = 300)
    {
        for (var i = 1; i <= attempts; i++)
        {
            try
            {
                this.logger.LogTrace($"Starting a job with retry. Attempt: {i}. (Starts from 1)");
                var workJob = taskFactory(i);
                var waitJob = Task.Delay(TimeSpan.FromSeconds(timeOutSeconds));

                // Wait for whichever finishes first: the work or the timeout.
                await Task.WhenAny(workJob, waitJob);
                if (workJob.IsCompleted)
                {
                    // Awaiting again returns the result or rethrows the job's exception.
                    return await workJob;
                }
                else
                {
                    throw new TimeoutException($"Job has exceeded the {timeOutSeconds} seconds timeout and we have to crash it to trigger another attempt.");
                }
            }
            catch (Exception e)
            {
                if (when != null)
                {
                    var shouldRetry = when.Invoke(e);
                    if (!shouldRetry)
                    {
                        this.logger.LogTrace(e, "A task that was asked to retry failed. The given condition is false, so we gave up retrying.");
                        throw;
                    }
                    else
                    {
                        this.logger.LogTrace(e, "A task that was asked to retry failed. The given condition is true.");
                    }
                }

                if (i >= attempts)
                {
                    this.logger.LogCritical(e, $"A task that was asked to retry failed. Maximum attempts {attempts} already reached. We have to crash it.");
                    throw;
                }

                this.logger.LogInformation(e, $"A task that was asked to retry failed. Current attempt is {i}. Maximum attempts is {attempts}. Will retry soon...");
                await Task.Delay(ExponentialBackoffTimeSlot(i));
            }
        }

        throw new InvalidOperationException("Code shall not reach here.");
    }

    /// <summary>
    /// Please see <see href="https://en.wikipedia.org/wiki/Exponential_backoff">Exponential backoff</see> time slot.
    /// </summary>
    /// <param name="time">The attempt number, starting from 1.</param>
    /// <returns>Time slot to wait.</returns>
    private static TimeSpan ExponentialBackoffTimeSlot(int time)
    {
        // Pick a random whole number of seconds in [0, 2^time).
        var max = (int)Math.Pow(2, time);
        return TimeSpan.FromSeconds(rnd.Next(0, max));
    }
}
With this in place, you can write the following in your business code:
var result = await this.retryEngine.RunWithTry(attempt =>
{
    return dmsClient.ExecuteManagementCmdlet(cmdletName, parameters);
}, when: e => e is WebException, attempts: 3);
I appreciate your effort in sharing this simple yet effective retry engine in C#. The core concept of implementing exponential back-off for retrying tasks is a great idea, as it helps to avoid overloading the system with repeated attempts in a short amount of time. Your code is also well-structured and easy to understand.
One of the highlights of this code is the flexibility it offers by allowing users to define the number of attempts, the error condition for retries, and the timeout duration. This makes it quite adaptable to various use cases.
However, there are a few areas where the code could be improved:
- The RunWithTry method has a timeOutSeconds parameter, which sets the timeout for each attempt. This works for the caller, because Task.WhenAny returns as soon as either the work or the delay finishes, but the timed-out work is never cancelled and keeps running in the background while the next attempt starts. Passing a CancellationToken to the task factory and cancelling it when the timeout is reached would let the abandoned work stop promptly.
- The ExponentialBackoffTimeSlot method calculates the upper bound of the delay using Math.Pow(2, time), which can result in very long delays if the number of attempts is high. It might be a good idea to add a maximum delay limit (e.g., 30 seconds) to prevent excessive waiting times.
- The static Random instance is shared by every caller, and Random is not thread-safe. Consider using a thread-safe random number generator such as Random.Shared (.NET 6 and later) or System.Security.Cryptography.RandomNumberGenerator instead.
- In the example usage, the when parameter is set to retry only when the exception is a WebException. Note that this also excludes the TimeoutException that the engine itself throws when an attempt times out, so a timed-out attempt would not be retried. It might be better to widen the condition to cover other exceptions that could benefit from a retry, such as TimeoutException.
Overall, this retry engine is a valuable contribution, and with a few improvements, it can be even more effective. Keep up the good work, and I look forward to seeing more of your ideas!