feat(issue-labeler): add LLM issue labeler for area and type #255
Add the reusable strands-agents/devtools issue-labeler to evals, mirroring the harness-sdk setup but tuned for this repo:

- area.yml: ten area labels mapped to src/strands_evals/ subsystems (evaluators, multimodal, simulation, detectors, chaos, redteam, generators, tracing, cli, core). Includes precedence rules so the specialized areas (chaos/redteam/multimodal) win over the general ones they overlap with, and guards so area-core is not used as a catch-all.
- type.yml: bug/enhancement/question/chore, with the conventional-commit prefix treated as authoritative.
- workflow: area capped at 2 labels, type at 1. No language job since evals is Python-only.
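The precedence rule can be illustrated with a small sketch. This is not how the labeler works internally (the classifier is an LLM guided by area.yml); it only models the rule's intent. The `OVERRIDES` table and `apply_precedence` helper are hypothetical, and the only pair filled in is chaos over evaluators, which the PR description states explicitly.

```python
# Hypothetical model of the precedence rule; the real labeler is an LLM
# prompted by area.yml, not this code.
# Only the chaos-over-evaluators pair comes from the PR description. Any
# further pairs would have to be taken from area.yml itself.
OVERRIDES = {
    "area-chaos": {"area-evaluators"},
}


def apply_precedence(candidates: list[str]) -> list[str]:
    """Drop general areas that a specialized candidate overrides."""
    suppressed: set[str] = set()
    for label in candidates:
        suppressed |= OVERRIDES.get(label, set())
    # Preserve the incoming order of the surviving labels.
    return [label for label in candidates if label not in suppressed]
```

Under these assumptions, an issue that looks like both an evaluator and a chaos concern keeps only area-chaos, while an issue that is purely about evaluators is left alone.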
Assessment: Approve

Clean, well-scoped CI addition (3 files, ~114 lines). The area→subsystem mappings were verified against the actual

Review Categories
Nicely documented design notes in the PR description; the overlap/precedence reasoning made this easy to verify.
A web-based-interface feature request was misclassified as area-cli. Clarify that area-cli is strictly the command-line/console layer and that web UIs, GUIs, dashboards, and hosting have no dedicated area, so the classifier should assign no area label rather than reaching for area-cli.
Match conventional-commit prefixes regardless of scope (feat(core): == feat:), make the prefix authoritative over the user-facing heuristic so perf:/refactor: resolve to chore without contradiction, and frame the fallback around title + body since the classifier has no diff visibility.
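As a rough illustration of the scope-insensitive matching described in this commit, the sketch below parses a title prefix and maps it to a type label. The names `PREFIX_RE`, `PREFIX_TO_TYPE`, and `type_from_title` are hypothetical. The feat, perf, and refactor mappings follow the PR text; the fix and chore entries are assumptions added for illustration. The actual classification is done by the LLM from type.yml, not by a regex.

```python
import re
from typing import Optional

# Optional "(scope)" and optional "!" are skipped, so feat(core): == feat:.
PREFIX_RE = re.compile(r"^(?P<prefix>[a-z]+)(?:\([^)]*\))?!?:")

# feat, perf, and refactor follow the PR text.
# fix -> bug and chore -> chore are assumptions for illustration.
PREFIX_TO_TYPE = {
    "feat": "enhancement",
    "perf": "chore",
    "refactor": "chore",
    "fix": "bug",
    "chore": "chore",
}


def type_from_title(title: str) -> Optional[str]:
    """Return the type implied by a conventional-commit prefix, if any."""
    match = PREFIX_RE.match(title.strip())
    if match is None:
        return None  # no prefix: fall back to reading title + body
    return PREFIX_TO_TYPE.get(match.group("prefix"))
```

Returning `None` when no known prefix is present models the fallback case, where the classifier has to judge from the title and body alone.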
Thanks for the review. On the three optional suggestions, I'm going to decline all three for now and keep the config as-is:
All three are reasonable and we can revisit if the labeler grows.
With max_labels: 2 the classifier's output is truncated to the first two labels, so order is load-bearing. Instruct it to rank concrete subsystems ahead of the broader area-core, with bugs in a concrete subsystem taking priority.
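To see why order matters under truncation, here is a minimal sketch. The `rank_and_cap` helper is hypothetical: in the real setup the LLM is asked to produce the ordering and the action applies the cap. The sketch only models the combined effect of ranking concrete subsystems ahead of area-core and then keeping the first two labels.

```python
MAX_LABELS = 2  # mirrors max_labels: 2 for the area job


def rank_and_cap(labels: list[str], max_labels: int = MAX_LABELS) -> list[str]:
    """Push area-core behind concrete subsystems, then keep the first N."""
    # sorted() is stable: concrete subsystems keep their relative order,
    # and area-core (key True) sorts after them (key False).
    ranked = sorted(labels, key=lambda label: label == "area-core")
    return ranked[:max_labels]
```

Without the ranking step, an output that happened to list area-core first would push a concrete subsystem out of the two surviving labels.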
Add the two cross-cutting labels (parity with harness-sdk): area-devx for SDK usability/papercuts/confusing APIs, area-community for repo health and CI dependency bumps. Both carry the same not-a-catch-all guards and slot into the priority ordering below concrete subsystems, with a worked example for the subsystem+devx co-label case.
Thanks for the thorough replies — all three declines are well-reasoned and I agree with each:
I also reviewed the new commit
I re-verified the

Assessment: Approve. This is good to merge whenever you're ready.
Thanks for the thorough responses. All three declines are well-reasoned and I'm happy to drop them. The "starting point for triage, not a binding classification" framing in particular is the right call for

Re-review of the two follow-up commits

I checked
Assessment unchanged: Approve. Mappings still verified against the real
Summary
Adds the reusable strands-agents/devtools/issue-labeler action to this repo, mirroring the harness-sdk setup but tuned for Evals.

What it does
Two parallel jobs run on opened issues and PRs and apply labels from a hardcoded allowlist (the LLM has no tools/shell/API access; worst case is a mislabel):
- Area (max_labels: 2): ten area-* labels mapped to src/strands_evals/ subsystems: evaluators, multimodal, simulation, detectors, chaos, redteam, generators, tracing, cli, core.
- Type (max_labels: 1): bug / enhancement / question / chore.

Design notes
- area.yml includes precedence rules: the specialized area wins over the general one (e.g. the recovery-strategy evaluator is area-chaos, not area-evaluators). area-core is explicitly scoped to shared primitives, and the classifier is told to assign no area when none clearly applies.
- The conventional-commit prefix is authoritative in type.yml, so a feat: PR is an enhancement even if it touches CI.
- There is no language job, since evals is Python-only; the python label is applied elsewhere (issue templates / dependabot).

The 11 labels (10 area-* + chore) have already been created on the repo. Existing issues/PRs will be backfilled separately.

Testing