Skip to content

风险分析层#4

Merged
asJEI merged 1 commit into
mainfrom
添加基于规则的工程风险分析层
May 29, 2026

Hidden character warning

The head ref may contain hidden characters: "\u6dfb\u52a0\u57fa\u4e8e\u89c4\u5219\u7684\u5de5\u7a0b\u98ce\u9669\u5206\u6790\u5c42"
Merged

风险分析层#4
asJEI merged 1 commit into
mainfrom
添加基于规则的工程风险分析层

Conversation

@asJEI
Copy link
Copy Markdown
Owner

@asJEI asJEI commented May 29, 2026

新增可插拔的 RiskDetector 框架,从语义化 diff 数据生成高置信度工程风险提示。
包含 6 类规则检测器(auth、数据库、缓存、async、错误处理、并发)及置信度过滤机制。
可以通过:
npm run test
进行快速测试
通过:
cd "d:\CS Tool\Project\Github\PR-Review"
(首先进入项目根目录)
node scripts/build.mjs
(使用 Node.js 执行 scripts/build.mjs 文件)
node packages/context-builder/scripts/smoke.mjs "zhayujie/CowAgent#2845"
(执行冒烟测试,可替换为其他 PR 的 URL)

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a rule-based engineering risk analysis system within packages/diff-parser/src/risk/ to detect risks related to authentication, database operations, caching, async changes, error handling, and concurrency. This analysis is integrated into the context builder pipeline to enhance PR summaries with risk hints. The review feedback highlights critical improvements for the detectors to prevent false positives and false negatives, specifically: using word boundaries in pathContainsAny and ErrorHandlingDetector to avoid substring mismatches (e.g., "db" matching "dashboard" or "try" matching "history"), fixing a logical flaw in the error handling re-addition check, and expanding the async detection regex to support async arrow functions.

Comment on lines +30 to +40
export function pathContainsAny(path: string, keywords: readonly string[]): string | null {
const lower = path.toLowerCase();

for (const keyword of keywords) {
if (lower.includes(keyword)) {
return keyword;
}
}

return null;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

pathContainsAny 中,使用 lower.includes(keyword) 进行简单的子字符串匹配会导致严重的误报
例如:

  • DATABASE_PATH_KEYWORDS 包含 "db" 时,任何包含 "dashboard""sandbox""feedback""development" 的路径(如 src/dashboard/index.tsx)都会被误判为数据库相关路径。
  • AUTH_PATH_KEYWORDS 包含 "auth" 时,任何包含 "author" 的路径(如 src/author/service.ts)都会被误判为鉴权相关路径。

建议使用正则表达式配合单词边界 (\b) 进行匹配,确保只匹配独立的路径段或单词,避免子字符串引起的误报。

Suggested change
export function pathContainsAny(path: string, keywords: readonly string[]): string | null {
const lower = path.toLowerCase();
for (const keyword of keywords) {
if (lower.includes(keyword)) {
return keyword;
}
}
return null;
}
export function pathContainsAny(path: string, keywords: readonly string[]): string | null {
const lower = path.toLowerCase();
for (const keyword of keywords) {
const regex = new RegExp('\\\\b' + keyword + '\\\\b', 'i');
if (regex.test(lower)) {
return keyword;
}
}
return null;
}

Comment on lines +20 to +34
for (const line of deleted) {
const keyword = textContainsAny(line.content, ERROR_HANDLING_KEYWORDS);

if (!keyword) {
continue;
}

const reAdded = added.some((addLine: DiffLine) =>
textContainsAny(addLine.content, [keyword]),
);

if (!reAdded) {
evidence.push(`removed ${keyword}: ${line.content.trim()}`);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

ErrorHandlingDetector 中,使用 textContainsAny 匹配 ERROR_HANDLING_KEYWORDS(如 "try", "catch", "throw", "finally")存在两个严重问题:

  1. 子字符串误报includes 匹配会导致非关键字的单词被误判。例如,删除包含 "history""country""poetry""retry" 的行时,因为它们包含 "try",会被误判为“删除了错误处理逻辑”。
  2. 同 Hunk 内误判:如果同一个 hunk 内删除了一个 try-catch,但在不相关的地方新增了另一个 try-catchreAdded 会因为 added.some 匹配到新增的关键字而返回 true,从而导致原本被删除的错误处理逻辑被漏报。

建议使用正则表达式配合单词边界 (\b) 来精确匹配错误处理关键字,并优化重添加的判断逻辑。

Suggested change
for (const line of deleted) {
const keyword = textContainsAny(line.content, ERROR_HANDLING_KEYWORDS);
if (!keyword) {
continue;
}
const reAdded = added.some((addLine: DiffLine) =>
textContainsAny(addLine.content, [keyword]),
);
if (!reAdded) {
evidence.push(`removed ${keyword}: ${line.content.trim()}`);
}
}
for (const line of deleted) {
const keywordRegex = /\\b(try|catch|throw|finally)\\b/i;
const match = keywordRegex.exec(line.content);
if (!match) {
continue;
}
const keyword = match[1].toLowerCase();
const reAdded = added.some((addLine: DiffLine) => {
const addMatch = keywordRegex.exec(addLine.content);
return addMatch && addMatch[1].toLowerCase() === keyword;
});
if (!reAdded) {
evidence.push(`removed ${keyword}: ${line.content.trim()}`);
}
}

Comment on lines +19 to +21
(/\basync\s+function\b/.test(line.content) ||
/\basync\s+def\b/.test(line.content) ||
/\basync\s+\w+\s*\(/.test(line.content)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在检测新增的异步行为时,当前的正则表达式无法匹配到非常常见的 异步箭头函数(例如 async () => {}async x => {}),因为 \basync\s+\w+\s*\( 要求 async 后面必须紧跟一个单词字符和括号。

建议优化正则表达式,以完整支持 JavaScript/TypeScript 中的异步箭头函数、类异步方法以及 Python 中的 async def

Suggested change
(/\basync\s+function\b/.test(line.content) ||
/\basync\s+def\b/.test(line.content) ||
/\basync\s+\w+\s*\(/.test(line.content)),
(/\basync\s+(?:function|def)\b/.test(line.content) ||
/\basync\s*(?:\w+\s*)?\(/.test(line.content) ||
/\basync\s+\w+\s*=>/.test(line.content)),

@asJEI asJEI merged commit 7a8e1a0 into main May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant