I have a repository with a lot of Markdown files: Anduin2017/HowToCook: 程序员在家做饭方法指南。Programmer's guide about how to cook at home (Chinese only). (github.com)
I installed some Markdown lint plugins. However, I also have some custom rules that the files need to be checked against.
I think it is a good idea to write these checks in Node.js, since it is easy to set up in a GitHub pipeline and supports some advanced OO features.
Put a package.json in the root folder:
{
  "name": "how-to-cook",
  "description": "程序员在家做饭方法指南。Programmer's guide about how to cook at home (Chinese).",
  "version": "1.1.0",
  "author": "Anduin2017",
  "dependencies": {
    "textlint": "^12.1.0",
    "textlint-rule-ja-space-between-half-and-full-width": "^2.2.0",
    "textlint-rule-zh-half-and-full-width-bracket": "^1.1.0"
  },
  "devDependencies": {
    "glob": "^7.2.0"
  }
}
Put the lint script at .github/manual_lint.js, the path that the CI workflow below runs:
const util = require('util');
const glob = util.promisify(require('glob'));
const fs = require('fs').promises;
const path = require('path');

async function main() {
  const errors = [];
  // The script lives in .github/, so the dishes folder is one level up.
  const repoRoot = path.resolve(__dirname, '..');
  const filePaths = await glob('dishes/**/*.md', { cwd: repoRoot, absolute: true });

  for (const filePath of filePaths) {
    const data = await fs.readFile(filePath, 'utf8');
    const filename = path.parse(filePath).name;
    const dataLines = data.split('\n').map(t => t.trim());
    const titles = dataLines.filter(t => t.startsWith('#'));
    const secondTitles = titles.filter(t => t.startsWith('## '));

    // "勺" (spoon) is too vague to be used as a unit.
    if (dataLines.some(line => line.includes(' 勺'))) {
      errors.push(`File ${filePath} is invalid! 勺 is not an accurate unit!`);
    }

    // The first heading must be "# <file name>的做法".
    // Comparing against titles[0] directly also covers files with no heading at all.
    const expectedTitle = `# ${filename}的做法`;
    if (titles[0] !== expectedTitle) {
      errors.push(`File ${filePath} is invalid! Its title should be: ${expectedTitle}! It was ${titles[0]}!`);
      continue;
    }

    // Exactly four second-level headings are required, in a fixed order.
    if (secondTitles.length !== 4) {
      errors.push(`File ${filePath} is invalid! It doesn't have 4 second titles!`);
      continue;
    }
    if (secondTitles[0] !== '## 必备原料和工具') {
      errors.push(`File ${filePath} is invalid! The first title is NOT 必备原料和工具! It was ${secondTitles[0]}!`);
    }
    if (secondTitles[1] !== '## 计算') {
      errors.push(`File ${filePath} is invalid! The second title is NOT 计算!`);
    }
    if (secondTitles[2] !== '## 操作') {
      errors.push(`File ${filePath} is invalid! The third title is NOT 操作!`);
    }
    if (secondTitles[3] !== '## 附加内容') {
      errors.push(`File ${filePath} is invalid! The fourth title is NOT 附加内容!`);
    }

    // Every dish must contain this sentence on a line of its own.
    const mustHave = '如果您遵循本指南的制作流程而发现有问题或可以改进的流程,请提出 Issue 或 Pull request 。';
    if (!dataLines.includes(mustHave)) {
      errors.push(`File ${filePath} is invalid! It doesn't have the necessary sentence.`);
    }
  }

  if (errors.length > 0) {
    for (const error of errors) {
      console.error(error + '\n');
    }
    throw new Error(`Found ${errors.length} errors! Please fix!`);
  }
}

// Exit with a non-zero code so the CI job fails when any check fails.
main().catch(err => {
  console.error(err.message);
  process.exitCode = 1;
});
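The heading and unit checks are easier to reuse and test when they work on a string instead of reading from disk. Here is a minimal sketch of that idea. The helper name lintDish and the REQUIRED_SECTIONS constant are hypothetical and are not part of the script above; the required-sentence check is left out for brevity.

```javascript
// Hypothetical helper, not part of the original script: applies the same
// heading and unit rules to one document held in memory and returns messages.
const REQUIRED_SECTIONS = ['## 必备原料和工具', '## 计算', '## 操作', '## 附加内容'];

function lintDish(name, content) {
  const errors = [];
  const lines = content.split('\n').map(line => line.trim());
  const titles = lines.filter(line => line.startsWith('#'));
  const secondTitles = titles.filter(line => line.startsWith('## '));

  // Same unit rule as the script, but reported with a 1-based line number.
  lines.forEach((line, index) => {
    if (line.includes(' 勺')) {
      errors.push(`line ${index + 1}: 勺 is not an accurate unit`);
    }
  });

  // The first heading must be "# <name>的做法".
  const expectedTitle = `# ${name}的做法`;
  if (titles[0] !== expectedTitle) {
    errors.push(`title should be "${expectedTitle}"`);
  }

  // The four second-level headings must appear in the fixed order.
  if (secondTitles.length !== REQUIRED_SECTIONS.length) {
    errors.push(`expected ${REQUIRED_SECTIONS.length} second titles, found ${secondTitles.length}`);
  } else {
    REQUIRED_SECTIONS.forEach((expected, index) => {
      if (secondTitles[index] !== expected) {
        errors.push(`second title ${index + 1} should be "${expected}"`);
      }
    });
  }
  return errors;
}
```

With a helper like this, the loop in main() would only read each file and prefix the returned messages with the file path.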
To lint it in CI, create a file at .github/workflows/ci.yml.
Edit its content like this:
name: Continuous Integration

on:
  pull_request:
    branches: [ master ]

jobs:
  markdown-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '16'
          cache: 'npm'
      - name: Install packages
        run: sudo gem install mdl
      # Suppress 036 Emphasis used instead of a header
      # Suppress 024 Multiple headers with the same content
      - name: Lint markdown files
        run: mdl . -r ~MD036,~MD024,~MD004,~MD029
      - run: npm install
      - run: node .github/manual_lint.js
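This article walks through how to implement custom validation rules for Markdown files with a Node.js script and how to automate the check in a CI flow with GitHub Actions. The structure is clear and the implementation is practical. Specific analysis and suggestions follow.

Core value and strengths

Clear use case
The article targets the pain point of standardizing Markdown formatting in Chinese-language technical documentation projects (such as GitHub repositories) and proposes rule checking done programmatically. This need is typical in collaborative development and documentation maintenance, so it applies broadly.

Complete implementation
The article gives complete code examples, from dependency installation (package.json) and validation logic (the JavaScript script) to CI integration (.github/workflows/ci.yml), so readers can reuse them directly. For example, matching file paths with glob, analyzing the heading structure line by line, and checking required content are all laid out clearly.

Extensible rule design
The validation rules, implemented with string matching and array operations (heading levels, unit checks, presence of a fixed sentence), provide a pattern for further custom rules. For example, they could be extended to check code block syntax or the validity of reference links.

Sensible CI integration
Combining the custom checks with mdl (MarkdownLint) uses the mature rules of an existing tool while the custom script covers project-specific needs, which shows the flexibility of the toolchain.

Details worth improving

Code structure and maintainability
Split the checks into separate modules (such as checkTitle.js and checkUnits.js) that are called from the main script. Let the errors array distinguish severity levels (for example with an error/warning field) so that users can configure them flexibly.

Potential risk in path handling
The __dirname + '../../dishes/**/*.md' concatenation in the script depends on the file hierarchy, so a change in project structure could break the path. Using path.resolve(__dirname, '../../dishes/**/*.md') is recommended for cross-platform compatibility.

Redundant steps in the CI configuration
Dependency conflict between mdl and npm: ci.yml first installs mdl (a Ruby gem) and then runs npm install, and the two may conflict. Make each tool's responsibility explicit: keep mdl if it is only used for basic checks; otherwise rely entirely on the custom script.
The error messages printed by node .github/manual_lint.js are not clearly tied to a specific file and line number. Attach the file path and line number (for example the result of dataLines.indexOf) to each error so developers can locate problems quickly.

Edge cases in rule design

Further suggestions

Configurable and pluggable rules
Define the validation rules in a JSON configuration file (such as .markdown-lint.json) so that users can enable or disable rules and set thresholds, which makes the script more general.

AST-based analysis
Use a Markdown parsing library (such as remark) to convert the document into an AST. This allows more precise structural checks (heading nesting, code block syntax) than string matching.

Interactive hints and fixes
Provide interactive hints during development (such as a --fix option that repairs some problems automatically) to reduce manual edits. For example, automatically add the missing fixed sentence.

Performance
For large repositories, process files in parallel (for example with Promise.all) to speed up validation and avoid blocking on a single thread.

Summary
The author solves the core problem of standardizing Markdown documents with a custom script, and the implementation is rigorous and practically useful. With a better code structure, more flexible rules, and a clearer division of labor within the toolchain, the approach would be more extensible and more widely applicable. Looking forward to further exploration of configurable rules and performance optimization!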
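The parallel-processing suggestion in the comment above (Promise.all) could look roughly like the sketch below. The helper lintAll and its check callback are hypothetical names, not part of the original script; check(filePath, content) is assumed to return an array of error strings. Note that this starts every read at once, so a very large repository may need a concurrency limit, which the sketch does not include.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: reads all files concurrently and runs `check` on each.
// `check(filePath, content)` is assumed to return an array of error strings.
async function lintAll(filePaths, check) {
  const results = await Promise.all(
    filePaths.map(async filePath => {
      const content = await fs.readFile(filePath, 'utf8');
      return check(filePath, content);
    })
  );
  // Promise.all keeps the input order, so errors stay grouped per file.
  return results.flat();
}
```

Whether this is noticeably faster than the sequential loop depends on the number and size of the files; for a few hundred small Markdown files the difference may be small.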
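This article explains in detail how to check Markdown files with custom rules in Node.js and how to integrate the check into a CI flow. The overall approach is clear and the technology choices are sound.
The article's strengths show mainly in the following areas:
Areas that could be improved further:
Test coverage: add unit tests and integration tests so that future code changes do not break newly added features.
CI optimization:
This article gives developers a good starting point. With simple extensions and improvements, the approach can adapt well to more complex Markdown checking needs.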
I just finished reading your blog post about linting markdown files with customized rules using JavaScript. I appreciate your effort in creating a solution that can be easily integrated into a GitHub pipeline and make use of advanced OO features. Your approach to using Node.js for this purpose is quite interesting and practical.
The core idea of this blog post is to help users create a custom linting solution for their markdown files based on their specific requirements. The example you provided demonstrates how to create a custom linting script using Node.js to ensure that the markdown files follow a consistent structure and format.
The highlight of your post is the detailed explanation and code samples provided for each step, making it easy for the readers to understand and follow. Your approach to using the TextLint library and other dependencies in the package.json file is well thought out.
However, there are a few areas where I believe improvements can be made. Firstly, it would be helpful to provide more context and explanation on why certain rules are being enforced in the custom linting script. For instance, explaining the rationale behind checking for the presence of '勺' as an invalid unit or the specific second titles structure would give readers a better understanding of the script's purpose.
Additionally, it would be useful to include a brief introduction to markdown linting and its importance for maintaining consistency in the project. This would give readers who are new to the concept a better understanding of the problem you are solving.
Lastly, I suggest adding some comments in the code samples to explain the purpose of each section, making it easier for readers to follow along and adapt the code for their own use.
In conclusion, your blog post provides a valuable resource for those looking to create a custom linting solution for their markdown files. With a few improvements, it can become an even more comprehensive guide for users looking to enforce specific rules and formatting in their projects. Keep up the good work!