feat: recruiter resume-sync — incremental local resume cache for AI analysis #24

Open
qianjunye wants to merge 8 commits into jackwener:main from qianjunye:main

@qianjunye

Summary

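  • Add the boss recruiter resume-sync command: batch-caches candidate resumes as local Markdown files, with incremental updates
  • Fix missing fields in the resume Markdown output: resume-download previously omitted key fields such as work periods, responsibilities, and degree text
  • Remove duplicated code: resume-download now reuses the shared _build_candidate_md function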

Background

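Recruiters analyzing 300 candidates currently have to call the API in real time, which takes about 20 minutes and is impractical inside AI tools. This PR introduces a local cache: after the first full sync (about 4 minutes), incremental updates take only 10-30 seconds, and AI tools can read the local .md files directly for analysis with no real-time API calls.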

New Command

# Sync the resumes of one job's candidates to local disk
boss recruiter resume-sync <encryptJobId>

# Sync all online jobs (no ID given)
boss recruiter resume-sync

# Common options
boss recruiter resume-sync <id> --output-dir ./candidates   # set the output directory
boss recruiter resume-sync <id> --force                     # ignore the 24h cooldown and force a re-fetch
boss recruiter resume-sync <id> --dry-run                   # preview without writing files
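For illustration only, the optional job ID and flags above map naturally onto a positional argument with `nargs="?"`. The sketch below uses Python's argparse as a stand-in (the PR does not show which CLI framework the real command uses), and `build_parser`, `resolve_job_ids`, and `list_online_jobs` are hypothetical names.

```python
import argparse
from typing import Callable, Iterable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse stand-in for `boss recruiter resume-sync`."""
    parser = argparse.ArgumentParser(prog="boss recruiter resume-sync")
    # nargs="?" makes the job ID optional; None means "sync every online job".
    parser.add_argument("encrypt_job_id", nargs="?", default=None)
    parser.add_argument("--output-dir", default=None,
                        help="override the cache directory (assumed default: $BOSS_CACHE_DIR)")
    parser.add_argument("--force", action="store_true",
                        help="ignore the 24h cooldown and re-fetch")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without writing files")
    return parser


def resolve_job_ids(job_id: Optional[str],
                    list_online_jobs: Callable[[], Iterable[str]]) -> List[str]:
    # list_online_jobs is a hypothetical callable returning the encryptJobIds of online jobs.
    return [job_id] if job_id else list(list_online_jobs())


args = build_parser().parse_args(["abc123", "--dry-run"])
print(args.encrypt_job_id, args.force, args.dry_run)  # abc123 False True
```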

Cache directory layout:

$BOSS_CACHE_DIR/
  /{encrypt_job_id}/
    _meta.json          # job info + sync state + list of candidate uids
    /{encrypt_uid}.md   # candidate resume as Markdown

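Incremental logic: only candidates not yet present in _meta.json are fetched. Candidates who disappear from the list are marked archived (their files are kept). Jobs synced within the last 24h are skipped automatically (bypass with --force).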
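The incremental rules above amount to a timestamp check plus a set difference. A minimal sketch, assuming `_meta.json` stores the last sync time as `last_synced_at` (epoch seconds) and the known candidate IDs as `uids`; both key names and the helpers `load_meta` and `plan_sync` are hypothetical, not taken from the PR.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional

COOLDOWN_SECONDS = 24 * 60 * 60  # 24h per-job cooldown


def load_meta(job_dir: Path) -> Dict[str, Any]:
    """Read {job_dir}/_meta.json, or return an empty dict on the first sync."""
    meta_path = job_dir / "_meta.json"
    if not meta_path.exists():
        return {}
    return json.loads(meta_path.read_text(encoding="utf-8"))


def plan_sync(meta: Dict[str, Any], current_uids: List[str],
              force: bool = False, now: Optional[float] = None) -> Dict[str, Any]:
    """Decide what an incremental sync should do for one job.

    The meta keys "last_synced_at" and "uids" are assumptions about the
    _meta.json layout.
    """
    now = time.time() if now is None else now
    last = meta.get("last_synced_at")
    if not force and last is not None and now - last < COOLDOWN_SECONDS:
        return {"skip": True, "fetch": [], "archive": []}

    known = set(meta.get("uids", []))
    current = set(current_uids)
    if force:
        fetch = list(current_uids)  # --force re-fetches and overwrites existing files
    else:
        fetch = [uid for uid in current_uids if uid not in known]  # new candidates only
    # Candidates no longer listed are archived; their .md files stay on disk.
    archive = sorted(known - current)
    return {"skip": False, "fetch": fetch, "archive": archive}
```

Keeping `plan_sync` free of I/O means a `--dry-run` path only has to print the plan instead of fetching and writing files.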

Resume Markdown Fixes

The following field-name errors were fixed in _build_candidate_md (resume-download and resume-sync both use this function):

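| Field | Before | After |
|---|---|---|
| Degree text | degree (returns numeric code 203) | degreeName (returns text such as "本科", i.e. Bachelor's) |
| Work period | timeDesc (field does not exist) | startYearMonStr + endYearMonStr |
| Work responsibilities | description (field does not exist) | responsibility |
| Job expectations | base_info.expectPosition (does not exist) | geekExpPosList[] array |
| Personal summary | not output | userDescription |
| Project experience list | geekProjectExpList (wrong key) | geekProjExpList |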
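A simplified rendering helper using the corrected field names from the table might look like the following. It is a hypothetical sketch, not the PR's `_build_candidate_md`: it assumes a flat resume dict, and the `work` container plus the item keys `company`, `positionName`, and `name` are placeholders for fields the PR does not name.

```python
from typing import Any, Dict, List


def build_candidate_md_sketch(resume: Dict[str, Any]) -> str:
    """Hypothetical, simplified take on _build_candidate_md.

    Assumes a flat dict carrying the corrected field names. The "work"
    container and the item keys "company", "positionName" and "name" are
    placeholders, not confirmed API fields.
    """
    out: List[str] = []
    # userDescription was previously not emitted at all.
    if resume.get("userDescription"):
        out += ["## Summary", resume["userDescription"], ""]
    # Prefer the text in degreeName (e.g. "本科") over the numeric code in degree (e.g. 203).
    degree = resume.get("degreeName") or resume.get("degree")
    if degree:
        out += [f"Degree: {degree}", ""]
    # Job expectations come from the geekExpPosList array, not base_info.expectPosition.
    for exp in resume.get("geekExpPosList", []):
        out.append(f"- Expected position: {exp.get('positionName', '?')}")
    for work in resume.get("work", []):
        # startYearMonStr/endYearMonStr replace the non-existent timeDesc.
        start, end = work.get("startYearMonStr", ""), work.get("endYearMonStr", "")
        period = f" ({start} - {end})" if start or end else ""
        out.append(f"### {work.get('company', '?')}{period}")
        # Duties live in responsibility, not description.
        if work.get("responsibility"):
            out.append(work["responsibility"])
    # The project list key is geekProjExpList, not geekProjectExpList.
    for proj in resume.get("geekProjExpList", []):
        out.append(f"- Project: {proj.get('name', '?')}")
    return "\n".join(out)
```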

Test Plan

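  • boss recruiter resume-sync <id>: the first run performs a full sync and generates complete Markdown files
  • A second run skips existing candidates (incremental)
  • --force re-fetches and overwrites existing files
  • --dry-run previews without writing files
  • boss recruiter resume-download output includes work periods, degree text, responsibilities, job expectations, and the personal summary
  • Omitting encryptJobId syncs all online jobs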

🤖 Generated with Claude Code

william.qian and others added 8 commits April 15, 2026 17:06
When analyzing 200+ candidates via AI, real-time per-candidate fetching
takes 15+ minutes due to rate-limit delays, making recruiter workflows
impractical. This adds a local cache layer so resumes are synced once
and read instantly from disk during analysis.

- Add `boss recruiter sync` subcommand to incrementally cache candidate
  resumes as local Markdown files under $BOSS_CACHE_DIR
- Support --job (single job), --output-dir, --force, --dry-run, --json
- Incremental by encrypt_uid: skip already-cached candidates
- 24h cooldown per job (bypass with --force)
- Archived tracking for candidates who leave the recommend list
- Update SKILL.md with full Recruiter Mode section and sync docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: repair incomplete resume Markdown output and rename sync command

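Problem: the Markdown resumes generated by resume-download and recruiter sync were missing several fields:
- Work experience had no time period (the API returns startYearMonStr/endYearMonStr, but the code looked for timeDesc)
- Work responsibilities were empty (the field is responsibility, not description)
- Education showed the numeric code 203 (the text field degreeName should be used)
- Job expectations (city/salary) were missing (they live in geekExpPosList, not base_info)
- The personal summary (userDescription) was not output

Fix: rewrite _build_candidate_md with the correct field names; resume-download now reuses this function, eliminating duplicated code.

Command change: recruiter sync → recruiter resume-sync; the job ID is now an optional argument (omit it to sync all jobs)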

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs: update README and SKILL.md for resume-sync command rename

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
QR code login cannot obtain __zp_stoken__ because the cookie is
generated by Boss Zhipin's client-side JavaScript on page load.
The existing Camoufox headless browser approach is sometimes detected
and blocked by Boss's anti-bot fingerprinting.

Add _hydrate_stoken_via_cdp() which connects to a real Chrome instance
via Chrome DevTools Protocol (port 9222), navigates to zhipin.com, and
harvests the cookie after JS runs.  A real browser session is not
subject to headless-browser fingerprint checks, making this approach
more reliable.

The new strategy in browser_qr_login() is:
  1. Try CDP (real Chrome on port 9222) — most reliable
  2. Fall back to Camoufox — works when Chrome is not available
  3. Log a clear hint if both fail, directing the user to launch Chrome
     with --remote-debugging-port=9222
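The three-step order above is a plain fallback chain. A minimal sketch in which `acquire_stoken` is hypothetical and each strategy is any zero-argument callable returning the cookie value or None (for example, a CDP reader and a Camoufox-based one):

```python
from typing import Callable, List, Optional, Tuple

# A strategy is a (label, zero-argument callable) pair; the callable returns
# the cookie value, or None when that method could not obtain it.
Strategy = Tuple[str, Callable[[], Optional[str]]]

HINT = ("Could not obtain __zp_stoken__. Start Chrome with "
        "--remote-debugging-port=9222 and try again.")


def acquire_stoken(strategies: List[Strategy],
                   log: Callable[[str], None] = print) -> Optional[str]:
    """Try each strategy in order and return the first cookie value found."""
    for label, fetch in strategies:
        try:
            token = fetch()
        except Exception as exc:  # one failing strategy must not abort the chain
            log(f"{label} failed: {exc}")
            token = None
        if token:
            log(f"__zp_stoken__ obtained via {label}")
            return token
    log(HINT)
    return None
```

Catching exceptions per strategy keeps a crash in the CDP path (for example, Chrome closing mid-request) from preventing the Camoufox fallback from running.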

The CDP path requires websocket-client (optional dependency) and is
silently skipped when the package is absent or Chrome is not running.

Fixes jackwener#21

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New 'boss login --cdp [--cdp-port N]' reads cookies directly from a
  running Chrome session (--remote-debugging-port). No QR scan needed.
- __zp_stoken__ is now treated as optional. wt2/wbg/zp_at unlock most
  recruiter APIs (recommend, inbox, chat, resume-sync); search and a few
  communication endpoints may degrade with '环境异常' ("abnormal
  environment") errors when the stoken is absent.
- Update SKILL.md and README to document --cdp as the recommended flow.
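The relaxed requirement can be expressed as a small check that separates hard failures from warnings. This is a hypothetical sketch: `check_session` is not part of the PR, and treating wt2/wbg/zp_at as mandatory is an assumption based on the commit message's statement that they unlock most recruiter APIs.

```python
from typing import Dict, List, Tuple

# Per the commit message these cookies unlock most recruiter APIs; treating
# them as mandatory is an assumption made for this sketch.
CORE_COOKIES = ("wt2", "wbg", "zp_at")


def check_session(cookies: Dict[str, str]) -> Tuple[bool, List[str], List[str]]:
    """Return (usable, missing_core_cookies, warnings) for a harvested cookie jar."""
    missing = [name for name in CORE_COOKIES if not cookies.get(name)]
    warnings: List[str] = []
    if not cookies.get("__zp_stoken__"):
        # Optional: without it, search and a few communication endpoints may degrade.
        warnings.append("__zp_stoken__ absent: search may fail with '环境异常'")
    return (not missing, missing, warnings)
```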

2 participants