Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@ PR-Review 是一个具备上下文感知能力的 AI 代码审查系统,帮助
### 已实现
- **GitHub PR 数据获取**:完整获取 PR 元数据、变更文件、提交记录、评论
- **Diff 解析**:统一 diff 格式解析,提取 hunk、行号、变更类型
- **Diff 语义分析**:函数/类/interface/import/export/async 变更检测(regex,可扩展 Tree-sitter)
- **Diff 语义分析**:函数/类/interface/import/export/async 变更检测
- **工程风险分析**:auth、数据库、缓存、async、错误处理、并发等规则检测(含 confidence)
- **上下文构建**:
- 复用 diff-parser 语义层,映射为审查上下文
- 分析 import 依赖关系
Expand Down Expand Up @@ -104,6 +105,19 @@ console.log(semantic.imports); // { added: [], removed: [] }
console.log(semantic.asyncChanges);
```

### 工程风险分析

```typescript
import { parseUnifiedDiff, analyzeSemantics, analyzeRisk } from '@pr-review/diff-parser';

const parsed = parseUnifiedDiff('src/auth.ts', patch);
const semantic = analyzeSemantics(parsed, { language: 'typescript' });
const risk = analyzeRisk({ filename: 'src/auth.ts', language: 'typescript', semantic, parsed });

console.log(risk.riskHints); // 高置信度风险提示
console.log(risk.findings); // 含 confidence 与 evidence
```

### 构建审查上下文

```typescript
Expand Down
16 changes: 12 additions & 4 deletions docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,17 +52,24 @@ packages/diff-parser/src/
extractors/ # functions, classes, imports, exports, async
patterns/ # per-language regex rules
interfaces/ # SemanticExtractor (Tree-sitter hook)
risk/
analyze-risk.ts # analyzeRisk, parseAnalyzeAndAssessRisk
detectors/ # auth, db, cache, async, error, concurrency
engine/run-detectors.ts # composable rule pipeline + confidence filter
interfaces/ # RiskDetector
```

### Public API

```ts
const parsed = parseUnifiedDiff(filename, patch);
const semantic = analyzeSemantics(parsed, { language: "typescript" });
// semantic.functions, semantic.classes, semantic.imports, semantic.asyncChanges, ...
const risk = analyzeRisk({ filename, language, semantic, parsed });
// risk.riskHints — high-confidence engineering risk messages
// risk.findings — structured { id, message, confidence, evidence }
```

MVP uses `RegexSemanticExtractor`. No AST or LSP.
MVP uses `RegexSemanticExtractor` and rule-based `RiskDetector`s. No AST or LSP.

---

Expand All @@ -79,7 +86,7 @@ PullRequestData
→ extract-imports (mapSemanticToImportEdges)
→ build-dependency-graph
→ group-changes
→ build-summaries (deterministic)
→ build-summaries (deterministic + analyzeRisk riskHints)
→ compress-context (token budget)
→ ReviewContext
```
Expand Down Expand Up @@ -107,7 +114,8 @@ packages/context-builder/src/

| Interface | MVP | Future |
|-----------|-----|--------|
| `SymbolExtractor` | `HeuristicSymbolExtractor` | Tree-sitter / TS compiler |
| `SemanticExtractor` (diff-parser) | `RegexSemanticExtractor` | Tree-sitter |
| `RiskDetector` (diff-parser) | 6 rule-based detectors | Custom rules / ML scoring |
| `AstAnalyzer` | `NoopAstAnalyzer` | Full AST analysis, call graphs |
| `FileContentResolver` | Not wired | Fetch blobs from GitHub |

Expand Down
30 changes: 8 additions & 22 deletions packages/context-builder/src/pipeline/stages/build-summaries.ts
Original file line number Diff line number Diff line change
@@ -1,19 +1,8 @@
import type { ChangeProfile, SemanticSummary } from "@pr-review/shared";
import { analyzeRisk } from "@pr-review/diff-parser";

import type { PipelineState } from "../types.js";

const CONCURRENCY_HINTS = [
"mutex",
"lock",
"atomic",
"concurrent",
"parallel",
"async",
"thread",
"race",
"semaphore",
];

function directoryArea(filename: string): string {
const parts = filename.split("/");

Expand Down Expand Up @@ -100,11 +89,14 @@ function buildRiskHints(state: PipelineState): string[] {
}

for (const entry of state.parsedFiles) {
const lower = entry.changedFile.filename.toLowerCase();
const risk = analyzeRisk({
filename: entry.changedFile.filename,
language: entry.language,
semantic: entry.semantic,
parsed: entry.parsedDiff,
});

if (CONCURRENCY_HINTS.some((hint) => lower.includes(hint))) {
hints.push(`Concurrency-related path: ${entry.changedFile.filename}`);
}
hints.push(...risk.riskHints);

const largeHunk = entry.parsedDiff.hunks.some(
(h) => h.lines.length > 80,
Expand All @@ -113,12 +105,6 @@ function buildRiskHints(state: PipelineState): string[] {
if (largeHunk) {
hints.push(`Large hunk in ${entry.changedFile.filename}`);
}

if (entry.semantic.asyncChanges) {
hints.push(
`Async/sync signature change detected in ${entry.changedFile.filename}`,
);
}
}

if (state.skippedFiles.length > 0) {
Expand Down
18 changes: 18 additions & 0 deletions packages/diff-parser/src/index.ts
Original file line number Diff line number Diff line change
@@ -1,12 +1,23 @@
export { getChangedLines, parseUnifiedDiff } from "./parse-unified-diff.js";
export {
analyzeRisk,
parseAnalyzeAndAssessRisk,
} from "./risk/analyze-risk.js";
export type {
AnalyzeRiskOptions,
ParseAnalyzeAndAssessRiskOptions,
ParsedFileDiffWithSemanticAndRisk,
} from "./risk/analyze-risk.js";
export {
analyzeSemantics,
parseAndAnalyze,
} from "./semantic/analyze-semantics.js";
export type { AnalyzeSemanticsOptions, ParsedFileDiffWithSemantic } from "./semantic/analyze-semantics.js";
export { detectLanguage } from "./semantic/detect-language.js";
export { RegexSemanticExtractor } from "./semantic/regex-semantic-extractor.js";
export { createDefaultDetectors, DEFAULT_MIN_CONFIDENCE } from "./risk/engine/run-detectors.js";
export type { SemanticExtractor } from "./semantic/interfaces/semantic-extractor.js";
export type { RiskDetector } from "./risk/interfaces/risk-detector.js";
export type {
DiffHunk,
DiffLine,
Expand All @@ -21,4 +32,11 @@ export type {
SemanticFunction,
SemanticInterface,
} from "./semantic/types.js";
export type {
RiskAnalysisInput,
RiskAnalysisResult,
RiskCategory,
RiskFinding,
} from "./risk/types.js";
export { EMPTY_RISK_ANALYSIS } from "./risk/types.js";
export { EMPTY_SEMANTIC_ANALYSIS } from "./semantic/types.js";
120 changes: 120 additions & 0 deletions packages/diff-parser/src/risk/analyze-risk.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
import { describe, expect, it } from "vitest";

import { parseUnifiedDiff } from "../parse-unified-diff.js";
import { analyzeSemantics } from "../semantic/analyze-semantics.js";
import { analyzeRisk } from "./analyze-risk.js";

function assess(filename: string, patch: string, language = "typescript") {
const parsed = parseUnifiedDiff(filename, patch);
const semantic = analyzeSemantics(parsed, { language });

return analyzeRisk({ filename, language, semantic, parsed });
}

describe("analyzeRisk", () => {
it("detects authLogicChanged for auth path and login symbol", () => {
const patch = `@@ -0,0 +1,3 @@
+export function login() {
+ return verifyToken();
+}
`;
const result = assess("src/auth/service.ts", patch);

const finding = result.findings.find((f) => f.id === "authLogicChanged");
expect(finding).toBeDefined();
expect(finding?.confidence).toBeGreaterThanOrEqual(0.7);
expect(result.riskHints.length).toBeGreaterThan(0);
});

it("detects databaseOperationModified for prisma query changes", () => {
const patch = `@@ -1,1 +1,1 @@
-await prisma.user.findMany()
+await prisma.user.findFirst()
`;
const result = assess("src/db/user-repo.ts", patch);

expect(result.findings.some((f) => f.id === "databaseOperationModified")).toBe(
true,
);
});

it("detects cacheLayerTouched for redis import", () => {
const patch = `@@ -0,0 +1,1 @@
+import Redis from 'ioredis';
`;
const result = assess("src/cache/client.ts", patch);

expect(result.findings.some((f) => f.id === "cacheLayerTouched")).toBe(true);
});

it("detects asyncIntroduced for sync to async change", () => {
const patch = `@@ -1,2 +1,2 @@
-function fetchData() {
+async function fetchData() {
`;
const result = assess("src/api.ts", patch);

expect(result.findings.some((f) => f.id === "asyncIntroduced")).toBe(true);
expect(result.riskHints.some((hint) => hint.includes("Async"))).toBe(true);
});

it("detects errorHandlingRemoved when try/catch is deleted", () => {
const patch = `@@ -1,4 +1,1 @@
-try {
- doWork();
-} catch (error) {
-}
+doWork();
`;
const result = assess("src/worker.ts", patch);

expect(result.findings.some((f) => f.id === "errorHandlingRemoved")).toBe(true);
});

it("detects concurrencyRisk when lock keyword and async change coexist", () => {
const patch = `@@ -1,3 +1,3 @@
-function acquireLock() {
+async function acquireLock() {
mutex.lock();
`;
const result = assess("src/concurrent/lock.ts", patch);

expect(result.findings.some((f) => f.id === "concurrencyRisk")).toBe(true);
});

it("returns empty risk for null patch", () => {
const parsed = parseUnifiedDiff("empty.ts", null);
const semantic = analyzeSemantics(parsed);

const result = analyzeRisk({
filename: "empty.ts",
language: "typescript",
semantic,
parsed,
});

expect(result.riskHints).toHaveLength(0);
expect(result.findings).toHaveLength(0);
});

it("filters hints by minConfidence", () => {
const patch = `@@ -0,0 +1,1 @@
+// minor comment in migrations folder
`;
const parsed = parseUnifiedDiff("src/db/migrations/readme.md", patch);
const semantic = analyzeSemantics(parsed, { language: "typescript" });

const lowThreshold = analyzeRisk(
{ filename: parsed.filename, language: "typescript", semantic, parsed },
{ minConfidence: 0.6 },
);
const highThreshold = analyzeRisk(
{ filename: parsed.filename, language: "typescript", semantic, parsed },
{ minConfidence: 0.9 },
);

expect(lowThreshold.riskHints.length).toBeGreaterThanOrEqual(
highThreshold.riskHints.length,
);
});
});
65 changes: 65 additions & 0 deletions packages/diff-parser/src/risk/analyze-risk.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
import { parseUnifiedDiff } from "../parse-unified-diff.js";
import type { ParsedFileDiff } from "../types.js";
import { analyzeSemantics } from "../semantic/analyze-semantics.js";
import type { AnalyzeSemanticsOptions } from "../semantic/analyze-semantics.js";
import { detectLanguage } from "../semantic/detect-language.js";
import {
DEFAULT_MIN_CONFIDENCE,
runRiskDetectors,
} from "./engine/run-detectors.js";
import type { RiskDetector } from "./interfaces/risk-detector.js";
import {
EMPTY_RISK_ANALYSIS,
type RiskAnalysisInput,
type RiskAnalysisResult,
} from "./types.js";

export interface AnalyzeRiskOptions {
minConfidence?: number;
detectors?: RiskDetector[];
}

export function analyzeRisk(
input: RiskAnalysisInput,
options: AnalyzeRiskOptions = {},
): RiskAnalysisResult {
if (input.parsed.isEmpty) {
return { ...EMPTY_RISK_ANALYSIS };
}

return runRiskDetectors(input, {
minConfidence: options.minConfidence ?? DEFAULT_MIN_CONFIDENCE,
detectors: options.detectors,
});
}

export interface ParseAnalyzeAndAssessRiskOptions extends AnalyzeSemanticsOptions {
minConfidence?: number;
detectors?: RiskDetector[];
}

export interface ParsedFileDiffWithSemanticAndRisk extends ParsedFileDiff {
semantic: ReturnType<typeof analyzeSemantics>;
risk: RiskAnalysisResult;
}

/** Parses patch, runs semantic + risk analysis in one step. */
export function parseAnalyzeAndAssessRisk(
filename: string,
patch: string | null,
options: ParseAnalyzeAndAssessRiskOptions = {},
): ParsedFileDiffWithSemanticAndRisk {
const language = options.language ?? detectLanguage(filename);
const parsed = parseUnifiedDiff(filename, patch);
const semantic = analyzeSemantics(parsed, { ...options, language });

const risk = analyzeRisk(
{ filename, language, semantic, parsed },
{
minConfidence: options.minConfidence,
detectors: options.detectors,
},
);

return { ...parsed, semantic, risk };
}
39 changes: 39 additions & 0 deletions packages/diff-parser/src/risk/detectors/async-detector.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
import type { RiskDetector } from "../interfaces/risk-detector.js";
import type { RiskAnalysisInput, RiskFinding } from "../types.js";
import { truncateEvidence } from "../utils/confidence.js";
import { collectChangedLines } from "../utils/line-scan.js";

export class AsyncDetector implements RiskDetector {
readonly id = "asyncIntroduced" as const;

detect(input: RiskAnalysisInput): RiskFinding | null {
const evidence: string[] = [];

if (input.semantic.asyncChanges) {
evidence.push("semantic.asyncChanges=true");
}

const asyncAdds = collectChangedLines(input.parsed).filter(
(line) =>
line.side === "add" &&
(/\basync\s+function\b/.test(line.content) ||
/\basync\s+def\b/.test(line.content) ||
/\basync\s+\w+\s*\(/.test(line.content)),
Comment on lines +19 to +21
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在检测新增的异步行为时,当前的正则表达式无法匹配到非常常见的 异步箭头函数(例如 async () => {}async x => {}),因为 \basync\s+\w+\s*\( 要求 async 后面必须紧跟一个单词字符和括号。

建议优化正则表达式,以完整支持 JavaScript/TypeScript 中的异步箭头函数、类异步方法以及 Python 中的 async def

Suggested change
(/\basync\s+function\b/.test(line.content) ||
/\basync\s+def\b/.test(line.content) ||
/\basync\s+\w+\s*\(/.test(line.content)),
(/\basync\s+(?:function|def)\b/.test(line.content) ||
/\basync\s*(?:\w+\s*)?\(/.test(line.content) ||
/\basync\s+\w+\s*=>/.test(line.content)),

);

if (asyncAdds.length > 0) {
evidence.push(`async additions: ${asyncAdds.length}`);
}

if (evidence.length === 0) {
return null;
}

return {
id: this.id,
message: `Async behavior introduced in ${input.filename}`,
confidence: 0.9,
evidence: evidence.map(truncateEvidence),
};
}
}
Loading