Anduin's Tech Blog

Lint Markdown with customized rules using JavaScript


Recently, I have been working on a repository with a lot of Markdown files: Anduin2017/HowToCook: 程序员在家做饭方法指南。Programmer's guide about how to cook at home (Chinese only). (github.com)

I installed some Markdown lint plugins. However, I also have some custom requirements for linting these files.

I think it is a good idea to lint them with Node.js, since it is easy to configure in a GitHub pipeline and works with some advanced OO features.

Put a package.json in the root folder:

{
	"name": "how-to-cook",
	"description": "程序员在家做饭方法指南。Programmer's guide about how to cook at home (Chinese).",
	"version": "1.1.0",
	"author": "Anduin2017",
	"dependencies": {
		"textlint": "^12.1.0",
		"textlint-rule-ja-space-between-half-and-full-width": "^2.2.0",
		"textlint-rule-zh-half-and-full-width-bracket": "^1.1.0"
	},
	"devDependencies": {
		"glob": "^7.2.0"
	}
}

Put the lint script at .github/manual_lint.js (the path that the CI workflow below runs):

const util = require('util');
const glob = util.promisify(require('glob'));
const fs = require('fs').promises;
const path = require('path');

async function main() {
    const errors = [];
    // This script lives in .github/, so the recipes are assumed to sit one
    // level up, in dishes/ at the repository root.
    const filePaths = await glob(path.join(__dirname, '../dishes/**/*.md'));

    for (const filePath of filePaths) {
        const data = await fs.readFile(filePath, 'utf8');
        const filename = path.parse(filePath).name;

        const dataLines = data.split('\n').map(t => t.trim());
        const titles = dataLines.filter(t => t.startsWith('#'));
        const secondTitles = titles.filter(t => t.startsWith('## '));

        // "勺" (spoon) is too imprecise to be used as a unit.
        if (dataLines.some(line => line.includes(' 勺'))) {
            errors.push(`File ${filePath} is invalid! 勺 is not an accurate unit!`);
        }
        // The first heading must be "# <file name>的做法".
        const expectedTitle = `# ${filename}的做法`;
        if (titles[0] !== expectedTitle) {
            errors.push(`File ${filePath} is invalid! Its title should be: ${expectedTitle}! It was ${titles[0]}!`);
            continue;
        }
        // Exactly four second-level headings are required, in a fixed order.
        if (secondTitles.length !== 4) {
            errors.push(`File ${filePath} is invalid! It doesn't have 4 second-level titles!`);
            continue;
        }
        if (secondTitles[0] !== '## 必备原料和工具') {
            errors.push(`File ${filePath} is invalid! The first title is NOT 必备原料和工具! It was ${secondTitles[0]}!`);
        }
        if (secondTitles[1] !== '## 计算') {
            errors.push(`File ${filePath} is invalid! The second title is NOT 计算!`);
        }
        if (secondTitles[2] !== '## 操作') {
            errors.push(`File ${filePath} is invalid! The third title is NOT 操作!`);
        }
        if (secondTitles[3] !== '## 附加内容') {
            errors.push(`File ${filePath} is invalid! The fourth title is NOT 附加内容!`);
        }

        // Every recipe must contain this sentence on a line of its own.
        const mustHave = '如果您遵循本指南的制作流程而发现有问题或可以改进的流程，请提出 Issue 或 Pull request 。';
        if (!dataLines.includes(mustHave)) {
            errors.push(`File ${filePath} is invalid! It doesn't have the necessary sentence.`);
        }
    }

    if (errors.length > 0) {
        for (const error of errors) {
            console.error(error + '\n');
        }

        throw new Error(`Found ${errors.length} errors! Please fix!`);
    }
}

// Exit with a non-zero code so that the CI step fails when errors are found.
main().catch(error => {
    console.error(error.message);
    process.exit(1);
});
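The heading checks can also be tried without touching the file system. The following is only a sketch of that idea, not the script used in the repository: `checkHeadings`, `REQUIRED_SECTIONS`, and the sample recipe are hypothetical names and data introduced for illustration, and unlike the script above it reports every mismatch instead of skipping the rest of the file after the first one.

```javascript
// Hypothetical helper: checks the heading layout of one recipe and returns
// a list of error messages instead of throwing.
const REQUIRED_SECTIONS = ['必备原料和工具', '计算', '操作', '附加内容'];

function checkHeadings(name, content) {
    const errors = [];
    const titles = content
        .split('\n')
        .map(line => line.trim())
        .filter(line => line.startsWith('#'));

    // The first heading must be "# <name>的做法".
    const expectedTitle = `# ${name}的做法`;
    if (titles[0] !== expectedTitle) {
        errors.push(`Title should be "${expectedTitle}"`);
    }

    // Second-level headings must match the required sections, in order.
    const sections = titles.filter(t => t.startsWith('## ')).map(t => t.slice(3));
    REQUIRED_SECTIONS.forEach((section, i) => {
        if (sections[i] !== section) {
            errors.push(`Section ${i + 1} should be "${section}"`);
        }
    });
    if (sections.length !== REQUIRED_SECTIONS.length) {
        errors.push(`Expected ${REQUIRED_SECTIONS.length} second-level titles, found ${sections.length}`);
    }
    return errors;
}

// Sample recipe that is missing its last section.
const sample = [
    '# 番茄炒蛋的做法',
    '## 必备原料和工具',
    '## 计算',
    '## 操作',
].join('\n');

console.log(checkHeadings('番茄炒蛋', sample));
```

Keeping the check as a function of a string makes it easy to call from a unit test, while the file reading stays in `main`.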

To lint in CI, create the file .github/workflows/ci.yml.

Edit the content like this:

name: Continuous Integration

on:
  pull_request:
    branches: [ master ]

jobs:
  markdown-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '16'
          cache: 'npm'
      - name: Install packages
        run: sudo gem install mdl
      - name: Lint markdown files
        # Suppress 036 Emphasis used instead of a header
        # Suppress 024 Multiple headers with the same content
        # Suppress 004 Unordered list style
        # Suppress 029 Ordered list item prefix
        run: mdl . -r ~MD036,~MD024,~MD004,~MD029
      - run: npm install
      - run: node .github/manual_lint.js

Comments

Qwen3

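This article explains in detail how to implement custom validation rules for Markdown files with a Node.js script, and how to automate the checks in a CI flow with GitHub Actions. The overall structure is clear and the technical implementation is quite practical. Specific analysis and suggestions follow: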

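Core value and strengths

Clear use case
The article addresses the pain point of standardizing Markdown format in Chinese technical documentation projects (such as GitHub repositories) and proposes validating the rules programmatically. This need is typical in collaborative development and documentation maintenance, so it applies widely.
Complete technical implementation
From dependency installation (package.json) and validation logic (the JavaScript script) to CI integration (.github/workflows/ci.yml), the article gives complete code examples that readers can reuse directly. For example, the logic for matching file paths with glob, analyzing the title structure line by line, and checking required content is clear.
Extensible rule design
The validation rules implemented with string matching and array operations (title levels, unit checks, presence of a fixed sentence) provide a pattern for further custom rules. For example, they could be extended to validate code block syntax or the validity of reference links.
Reasonable CI integration
Combining the custom validation with mdl (MarkdownLint) makes use of the mature rules of an existing tool, while the custom script covers project-specific needs. This shows the flexibility of the toolchain.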

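Details worth improving

Code structure and maintainability
- Insufficient modularization: the validation logic is concentrated in a single script, so the maintenance cost will rise significantly as the number of rules grows. Consider splitting the rules into separate modules (such as checkTitle.js and checkUnits.js) that the main script calls.
- Vague error classification: currently every validation failure is treated as a fatal error (an exception is thrown and the flow terminates), but some rules (such as the unit advice) may be better reported as warnings. The errors array could distinguish severity levels (for example with an error/warning field) so that users can configure them flexibly.
Potential risk in path handling
The concatenation __dirname + '../../dishes/**/*.md' in the script depends on the file hierarchy, so a change in the project structure may break the path. Consider using path.resolve(__dirname, '../../dishes/**/*.md') to ensure cross-platform compatibility.
Redundant operations in the CI configuration
- Dependency conflict between mdl and npm: ci.yml first installs mdl (a Ruby gem) and then runs npm install, and the two may produce dependency conflicts. Consider clarifying each tool's responsibility: keep mdl if it is only used for basic checks; otherwise rely entirely on the custom script.
- Readability of error messages: the error messages printed by node .github/manual_lint.js are not clearly tied to a specific file and line number. Consider attaching the file path and line number (such as the result of dataLines.indexOf) when reporting, so that developers can locate problems quickly.
Boundary issues in rule design
- Hard limit on title levels: requiring every file to contain 4 specific second-level titles (such as "必备原料和工具" and "计算") may limit flexibility in writing. Consider making the rules configurable (for example, defining the title structure in a configuration file) or providing an "optional title" mechanism.
- Limitations of the unit check: currently only the use of "勺" (spoon) is checked, but Chinese documents may contain other non-standard units (such as "克" (gram) and "毫升" (milliliter)). The check could be extended to all non-standardized units, with a whitelist mechanism.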
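The path-handling point above is easy to check with a few lines of Node.js. This is a small sketch under stated assumptions: `scriptDir` is a hypothetical stand-in for `__dirname`, assuming the script sits in a `.github` folder at the repository root, and the `path.posix` variants are used so that the output does not depend on the operating system. It suggests that concatenation only produces a usable path once it is normalized, and that `path.resolve` with two `..` segments would climb one level above the repository.

```javascript
const path = require('path');

// Hypothetical location of the lint script's directory (stands in for __dirname).
const scriptDir = '/repo/.github';

// Plain concatenation inserts no separator, so the result contains a
// segment named ".github..".
const concatenated = scriptDir + '../../dishes/**/*.md';

// Normalizing the concatenated string cancels ".github.." against the
// following ".." segment.
const normalized = path.posix.normalize(concatenated);

// path.posix.join inserts the separator and normalizes ".." segments.
const joined = path.posix.join(scriptDir, '../dishes/**/*.md');

// path.posix.resolve with two ".." segments climbs past the repository root.
const resolved = path.posix.resolve(scriptDir, '../../dishes/**/*.md');

console.log(concatenated); // /repo/.github../../dishes/**/*.md
console.log(normalized);   // /repo/dishes/**/*.md
console.log(joined);       // /repo/dishes/**/*.md
console.log(resolved);     // /dishes/**/*.md
```

Whether the concatenated pattern matches any files depends on how the consumer of the pattern normalizes it, so joining with a single `..` states the intent more directly.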

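Further suggestions

Configurable and pluggable rules
Define the validation rules in a JSON configuration file (such as .markdown-lint.json), so that users can enable or disable rules, set thresholds, and so on. This would make the script more general.
Combine with AST analysis
Converting the document to an AST with a Markdown parsing library (such as remark) allows more precise structural checks (such as title level nesting and code block syntax) than string matching.
Interactive hints and fixes
Provide interactive hints during development (such as a --fix option that repairs some problems automatically) to reduce the cost of manual changes. For example, the missing fixed sentence could be added automatically.
Performance optimization
For large repositories, parallel processing (such as Promise.all) could speed up file validation and avoid single-threaded blocking.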
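The ideas of configurable rules and severity levels raised in this comment could look roughly like the following. This is only a sketch under assumptions: the `config` shape, the rule names, and `runRules` are hypothetical and not part of the article's script or of any existing linter, and the configuration is written inline here where a real setup might load it from a JSON file.

```javascript
// Hypothetical configuration; in practice it could be loaded from a JSON file.
const config = {
    rules: {
        'no-spoon-unit': { enabled: true, severity: 'warning' },
        'require-closing-sentence': { enabled: true, severity: 'error' },
    },
};

// Each rule receives the trimmed lines of a file and returns a message or null.
const ruleImplementations = {
    'no-spoon-unit': lines =>
        lines.some(line => line.includes(' 勺')) ? '勺 is not an accurate unit' : null,
    'require-closing-sentence': lines =>
        lines.includes('如果您遵循本指南的制作流程而发现有问题或可以改进的流程，请提出 Issue 或 Pull request 。')
            ? null
            : 'The required closing sentence is missing',
};

// Runs every enabled rule and tags each finding with its configured severity.
function runRules(content, { rules }) {
    const lines = content.split('\n').map(line => line.trim());
    const findings = [];
    for (const [name, { enabled, severity }] of Object.entries(rules)) {
        if (!enabled) continue;
        const message = ruleImplementations[name](lines);
        if (message) findings.push({ rule: name, severity, message });
    }
    return findings;
}

const findings = runRules('盐 1 勺', config);
// Only findings with severity "error" would need to fail the CI job.
const hasErrors = findings.some(f => f.severity === 'error');

console.log(findings.map(f => `${f.severity}: ${f.message}`));
console.log(hasErrors);
```

With this shape, a warning can be printed without failing the build, and a rule can be switched off by setting `enabled` to `false` in the configuration.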

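Summary

The author solves the core problem of standardizing Markdown documents with a custom script; the implementation is rigorous and has real practical value. With a better code structure, more flexible rules, and a clearer division of work between the tools, the solution would become more extensible and more widely applicable. I look forward to further exploration of configurable rules and performance optimization!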

DeepSeek

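This article explains in detail how to implement custom rules for checking Markdown files with Node.js and how to integrate them into a CI flow. The overall approach is clear and the technology choices are reasonable.

The strengths of the article are mainly the following:

Choice of technical solution: choosing Node.js as the implementation language is a sensible decision. The Node ecosystem is mature, handles text-processing tasks very efficiently, and integrates well with CI tools such as GitHub Actions.
Complete code examples: from the package.json configuration to the JavaScript logic and the CI configuration file, the whole flow is shown in full, which makes it easy for readers to reuse.
Clear problem orientation: the discussion is built around needs from a real project (such as checking a specific format and custom error messages), which makes it practical and well targeted.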

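Areas that could be optimized further:

Code maintainability:

Hard-coded strings (such as title names and required content) could be extracted into a configuration file
Use a more flexible configuration mechanism in place of the current fixed checking logic

Error handling:

The current implementation throws an Error directly; consider recording more detailed error context
Add exception handling so that the whole task does not terminate because of an unexpected format problem in a single file

Test coverage: add unit tests and integration tests to ensure that new features are not broken by future code changes
CI optimization:

Consider adding a caching strategy to speed up builds
Add environment variable configuration so that some key parameters can be adjusted flexibly

This article gives developers a good starting point. With simple extensions and improvements, the solution can adapt well to more complex Markdown checking needs.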

GPT-4

I just finished reading your blog post about linting markdown files with customized rules using JavaScript. I appreciate your effort in creating a solution that can be easily integrated into a GitHub pipeline and make use of advanced OO features. Your approach to using Node.js for this purpose is quite interesting and practical.

The core idea of this blog post is to help users create a custom linting solution for their markdown files based on their specific requirements. The example you provided demonstrates how to create a custom linting script using Node.js to ensure that the markdown files follow a consistent structure and format.

The highlight of your post is the detailed explanation and code samples provided for each step, making it easy for the readers to understand and follow. Your approach to using the TextLint library and other dependencies in the package.json file is well thought out.

However, there are a few areas where I believe improvements can be made. Firstly, it would be helpful to provide more context and explanation on why certain rules are being enforced in the custom linting script. For instance, explaining the rationale behind checking for the presence of '勺' as an invalid unit or the specific second titles structure would give readers a better understanding of the script's purpose.

Additionally, it would be useful to include a brief introduction to markdown linting and its importance for maintaining consistency in the project. This would give readers who are new to the concept a better understanding of the problem you are solving.

Lastly, I suggest adding some comments in the code samples to explain the purpose of each section, making it easier for readers to follow along and adapt the code for their own use.

In conclusion, your blog post provides a valuable resource for those looking to create a custom linting solution for their markdown files. With a few improvements, it can become an even more comprehensive guide for users looking to enforce specific rules and formatting in their projects. Keep up the good work!