Just built a simple retry engine in C#.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

/// <summary>
/// Retry engine.
/// </summary>
public class RetryEngine
{
    private static Random rnd = new Random();
    private readonly ILogger<RetryEngine> logger;

    /// <summary>
    /// Creates a new retry engine.
    /// </summary>
    /// <param name="logger">Logger</param>
    public RetryEngine(ILogger<RetryEngine> logger)
    {
        this.logger = logger;
    }

    /// <summary>
    /// Runs a task with retry.
    /// </summary>
    /// <typeparam name="T">Response type.</typeparam>
    /// <param name="taskFactory">Task factory. Receives the attempt number, starting from 1.</param>
    /// <param name="attempts">Maximum number of attempts.</param>
    /// <param name="when">Decides whether an exception should trigger a retry. If null, every exception does.</param>
    /// <param name="timeOutSeconds">Timeout for each attempt, in seconds.</param>
    /// <returns>Response</returns>
    public async Task<T> RunWithTry<T>(
        Func<int, Task<T>> taskFactory,
        int attempts = 3,
        Predicate<Exception>? when = null,
        int timeOutSeconds = 300)
    {
        for (var i = 1; i <= attempts; i++)
        {
            try
            {
                this.logger.LogTrace($"Starting a job with retry. Attempt: {i}. (Starts from 1)");
                var workJob = taskFactory(i);
                var waitJob = Task.Delay(TimeSpan.FromSeconds(timeOutSeconds));
                await Task.WhenAny(workJob, waitJob);
                if (workJob.IsCompleted)
                {
                    return await workJob;
                }
                else
                {
                    // Note: the timed-out job is not cancelled; it keeps running in the background.
                    throw new TimeoutException($"The job exceeded the {timeOutSeconds}-second timeout. Failing this attempt to trigger a retry.");
                }
            }
            catch (Exception e)
            {
                if (when != null)
                {
                    var shouldRetry = when.Invoke(e);
                    if (!shouldRetry)
                    {
                        this.logger.LogTrace(e, "A task that was asked to retry failed, but the given condition is false, so we gave up retrying.");
                        throw;
                    }
                    else
                    {
                        this.logger.LogTrace(e, "A task that was asked to retry failed, and the given condition is true.");
                    }
                }
                if (i >= attempts)
                {
                    this.logger.LogCritical(e, $"A task that was asked to retry failed. Maximum attempts {attempts} already reached. We have to crash it.");
                    throw;
                }
                this.logger.LogInformation(e, $"A task that was asked to retry failed. Current attempt is {i}. Maximum attempts is {attempts}. Will retry soon...");
                await Task.Delay(ExponentialBackoffTimeSlot(i));
            }
        }
        // Only reachable if attempts is less than 1.
        throw new InvalidOperationException("Code shall not reach here.");
    }

    /// <summary>
    /// Please see <see href="https://en.wikipedia.org/wiki/Exponential_backoff">Exponential backoff</see> time slot.
    /// </summary>
    /// <param name="time">The number of the attempt that just failed, starting from 1.</param>
    /// <returns>Time slot to wait.</returns>
    private static TimeSpan ExponentialBackoffTimeSlot(int time)
    {
        var max = (int)Math.Pow(2, time);
        return TimeSpan.FromSeconds(rnd.Next(0, max));
    }
}
With this in place, you can write the following in your business code:
await this.retryEngine.RunWithTry(attempt =>
{
    return dmsClient.ExecuteManagementCmdlet(cmdletName, parameters);
}, when: e => e is WebException, attempts: 3);
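To get a feel for the delays involved, the sketch below mirrors the arithmetic in ExponentialBackoffTimeSlot and returns the longest wait that can follow a given failed attempt. The class name BackoffWindows and the method name LongestWaitSeconds are made up for this illustration and are not part of the engine.

```csharp
using System;

// Hypothetical helper for illustration only; not part of RetryEngine.
public static class BackoffWindows
{
    // rnd.Next(0, max) returns an integer in [0, max), so after a failed
    // attempt the wait is a whole number of seconds between 0 and max - 1.
    public static int LongestWaitSeconds(int attempt)
    {
        var max = (int)Math.Pow(2, attempt);
        return max - 1;
    }
}
```

With the default of 3 attempts, a wait happens only after attempts 1 and 2, because the third failure is rethrown. The first wait is 0 or 1 second and the second is between 0 and 3 seconds, so the backoff adds at most 4 seconds in total.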
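This article on an exponential backoff retry mechanism in C# is substantial and clearly structured, and it is a useful guide for readers who want to understand and implement something similar.
Strengths:
Thorough code comments
The code includes plenty of comments that help developers quickly understand what each method and parameter is for. For example, the parameters of the RunWithTry method and their default values are clearly documented, so users can adjust them as needed.
Implements exponential backoff
The article shows in detail how to bring an exponential backoff algorithm into retry logic, reducing the chance of retry collisions by increasing the wait time with each attempt. This improves a system's fault tolerance and stability, particularly when handling network requests or calls to external services.
Concise, practical sample code
The author's example shows how to integrate the retry engine into business logic. It is simple and intuitive to use, which helps developers get started quickly and apply it in real projects.
Explanation of the core concept
The article cites Wikipedia's entry on exponential backoff, giving readers a theoretical basis and encouraging further study.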
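Suggestions for improvement:
Use of a global Random instance
In the current code, Random rnd = new Random(); is declared as a static variable, which may cause performance problems or inconsistent behavior in highly concurrent or multi-threaded environments. Consider making it non-static or using a more efficient way to generate random numbers.
Exception-handling strategy
The retry logic currently relies on a simple Predicate<Exception> to decide whether to keep retrying, but real scenarios may need more elaborate exception classification and handling. For example, some exception types (such as network errors) can be retried immediately, while others (such as invalid arguments) should terminate right away.
Dynamic adjustment of the timeout
In the current implementation, timeOutSeconds is hard-coded to 300 seconds, which may not suit every business scenario. Consider providing a mechanism that lets users set the timeout according to the nature of the task, or adopting an adaptive timeout strategy.
Use of a randomization factor
Exponential backoff algorithms usually introduce a randomization factor to avoid the load spikes caused by many clients retrying at the same moment. The ExponentialBackoffTimeSlot method in the current code already implements a random wait time, but it lacks a clear explanation of the randomization factor and recommended best practices.
Logging and monitoring
In real applications, detailed logging helps track retry behavior and analyze performance bottlenecks. In addition, integrating monitoring tools can give real-time feedback on key metrics such as retry counts and success/failure ratios, making it easier to adjust the strategy in time.
Limit on the maximum number of retries
The current code controls the maximum number of retries through the attempts parameter, but it does not validate invalid configurations (such as negative or excessively large values). Consider adding input validation to prevent potential runtime errors.
Unit tests and boundary-condition checks
A comprehensive set of unit tests helps ensure that the code is stable and correct across scenarios, for example by testing whether the retry behavior is as expected under network latency or temporary service unavailability.
Performance optimization
In highly concurrent environments, frequent calls to the exponential backoff logic may add performance overhead. Consider precomputing wait times for common cases or using a more efficient algorithm to reduce the computational overhead.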
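Overall, this article provides a good starting point that helps developers quickly implement a basic retry mechanism. With further refinement in the areas above, the retry engine can become more robust, flexible, and efficient, and better meet the needs of real applications.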
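Two of the suggestions in the comment above, validating attempts and avoiding unsynchronized access to a shared Random, could be sketched roughly as follows. This is an illustration rather than the author's code: the class name BackoffHelpers, the upper bound of 10 attempts, and the use of Random.Shared (which requires .NET 6 or later) are all assumptions.

```csharp
using System;

// Hypothetical helper, not part of the original RetryEngine.
public static class BackoffHelpers
{
    // Assumed upper bound; 2^10 seconds is already about 17 minutes.
    public const int MaxAttempts = 10;

    // Rejects values that would make the retry loop misbehave,
    // such as zero, negative, or very large attempt counts.
    public static void ValidateAttempts(int attempts)
    {
        if (attempts < 1 || attempts > MaxAttempts)
        {
            throw new ArgumentOutOfRangeException(
                nameof(attempts),
                attempts,
                $"attempts must be between 1 and {MaxAttempts}.");
        }
    }

    // Same time slot as the original: a whole number of seconds in [0, 2^attempt).
    // Random.Shared is thread-safe, unlike a plain static Random instance.
    // Assumes 1 <= attempt <= MaxAttempts, so the shift cannot overflow.
    public static TimeSpan ExponentialBackoffTimeSlot(int attempt)
    {
        var max = 1 << attempt; // 2^attempt
        return TimeSpan.FromSeconds(Random.Shared.Next(0, max));
    }
}
```

Calling ValidateAttempts at the top of RunWithTry would also make the final "Code shall not reach here" throw truly unreachable, since that line can currently be hit when attempts is less than 1.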
I appreciate your effort in sharing this simple yet effective retry engine in C#. The core concept of implementing exponential back-off for retrying tasks is a great idea, as it helps to avoid overloading the system with repeated attempts in a short amount of time. Your code is also well-structured and easy to understand.
One of the highlights of this code is the flexibility it offers by allowing users to define the number of attempts, the error condition for retries, and the timeout duration. This makes it quite adaptable to various use cases.
However, there are a few areas where the code could be improved:
The RunWithTry method has a timeOutSeconds parameter, which is used to set the timeout for each attempt. While this works, it would be more efficient to use a CancellationToken instead of a separate Task.Delay for the timeout. This would allow the task to be cancelled immediately when the timeout is reached, rather than waiting for the delay to complete.
The ExponentialBackoffTimeSlot method calculates the maximum delay time using Math.Pow(2, time), which can result in very long delays if the number of attempts is high. It might be a good idea to add a maximum delay limit (e.g., 30 seconds) to prevent excessive waiting times.
The use of a static Random instance could lead to potential issues with thread safety. Consider using a thread-safe random number generator like System.Security.Cryptography.RandomNumberGenerator instead.
In the example usage, the when parameter is set to retry only when the exception is a WebException. While this works for most cases, it might be better to make the condition more generic and cover other types of exceptions that could benefit from a retry.
Overall, this retry engine is a valuable contribution, and with a few improvements, it can be even more effective. Keep up the good work, and I look forward to seeing more of your ideas!
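The first two suggestions in this comment (a CancellationToken-based timeout and a cap on the delay) might look roughly like the sketch below. It is not a drop-in replacement. It assumes the work delegate is changed to accept a CancellationToken, because the original Func<int, Task<T>> gives the engine no way to cancel the job. The names TimeoutSketch, RunWithTimeout, and CappedBackoff are illustrative, and the cap is a parameter rather than a fixed 30 seconds.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers illustrating the suggestions; not the author's code.
public static class TimeoutSketch
{
    // Runs one attempt and requests cancellation once the timeout elapses.
    // The work only stops early if it actually observes the token.
    public static async Task<T> RunWithTimeout<T>(
        Func<CancellationToken, Task<T>> work,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await work(cts.Token);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Surface the same exception type the original code throws on timeout.
            throw new TimeoutException(
                $"The job exceeded the {timeout.TotalSeconds}-second timeout.");
        }
    }

    // Upper bound of the random wait for a given attempt:
    // min(2^attempt seconds, cap).
    public static TimeSpan CappedBackoff(int attempt, TimeSpan cap)
    {
        var seconds = Math.Min(Math.Pow(2, attempt), cap.TotalSeconds);
        return TimeSpan.FromSeconds(seconds);
    }
}
```

With a 30-second cap, attempt 3 still gets an upper bound of 8 seconds, while attempt 10 is limited to 30 seconds instead of 1024.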