feat(issue-labeler): add LLM issue labeler for area and type #255

Merged
yonib05 merged 5 commits into strands-agents:main from yonib05:feat/issue-labeler
Jun 12, 2026

yonib05 commented Jun 11, 2026

Member

Summary

Adds the reusable strands-agents/devtools/issue-labeler action to this repo, mirroring the harness-sdk setup but tuned for Evals.

What it does

Two parallel jobs run on opened issues and PRs and apply labels from a hardcoded allowlist (the LLM has no tools/shell/API access; worst case is a mislabel):

  • Area (max_labels: 2) — ten area-* labels mapped to src/strands_evals/ subsystems: evaluators, multimodal, simulation, detectors, chaos, redteam, generators, tracing, cli, core.
  • Type (max_labels: 1) — bug / enhancement / question / chore.
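To illustrate why the worst case is a mislabel, here is a minimal Python sketch of allowlist filtering combined with the max_labels cap. It is not the devtools action's code. The function name constrain_labels, the two constants, and the assumption that each area label is spelled area-<subsystem> are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical allowlists built from the label names listed above.
AREA_ALLOWLIST = frozenset(
    f"area-{name}"
    for name in (
        "evaluators", "multimodal", "simulation", "detectors", "chaos",
        "redteam", "generators", "tracing", "cli", "core",
    )
)
TYPE_ALLOWLIST = frozenset({"bug", "enhancement", "question", "chore"})


def constrain_labels(
    proposed: Iterable[str], allowlist: frozenset[str], max_labels: int
) -> list[str]:
    """Drop anything outside the allowlist, de-duplicate, keep the first max_labels."""
    kept: list[str] = []
    for label in proposed:
        label = label.strip()
        if label in allowlist and label not in kept:
            kept.append(label)
    return kept[:max_labels]
```

Whatever text the model returns, only strings already in the allowlist can reach the labeling step, so unexpected output is discarded rather than acted on.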

Design notes

  • Overlap handling. chaos, redteam, and multimodal each have their own evaluators/generators/sessions, so area.yml includes precedence rules: the specialized area wins over the general one (e.g. the recovery-strategy evaluator is area-chaos, not area-evaluators).
  • No catch-all. area-core is explicitly scoped to shared primitives, and the classifier is told to assign no area when none clearly applies.
  • Conventional-commit prefix is authoritative in type.yml, so a feat: PR is an enhancement even if it touches CI.
  • Python-only. No language job; the python label is applied elsewhere (issue templates / dependabot).
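In this PR the overlap rule is expressed as prompt guidance in area.yml, not as code. As a rough illustration only, the same "specialized wins over general" rule could be written deterministically as below. The OVERRIDES table and apply_precedence are hypothetical names, and the exact set of general areas each specialized area overrides is an assumption based on the evaluators/generators overlap described above.

```python
from __future__ import annotations

# Hypothetical table: each specialized area suppresses the general areas it overlaps.
OVERRIDES: dict[str, frozenset[str]] = {
    "area-chaos": frozenset({"area-evaluators", "area-generators"}),
    "area-redteam": frozenset({"area-evaluators", "area-generators"}),
    "area-multimodal": frozenset({"area-evaluators", "area-generators"}),
}


def apply_precedence(labels: list[str]) -> list[str]:
    """Remove general labels when a specialized label that owns them is present."""
    suppressed: set[str] = set()
    for label in labels:
        suppressed |= OVERRIDES.get(label, frozenset())
    # Preserve the original order of the labels that survive.
    return [label for label in labels if label not in suppressed]
```

For the recovery-strategy evaluator example, apply_precedence(["area-evaluators", "area-chaos"]) keeps only area-chaos.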

The 11 labels (10 area-* + chore) have already been created on the repo. Existing issues/PRs will be backfilled separately.

Testing

  • After merge, open a test issue and confirm area + type labels are applied correctly.

Add the reusable strands-agents/devtools issue-labeler to evals, mirroring
the harness-sdk setup but tuned for this repo:

- area.yml: ten area labels mapped to src/strands_evals/ subsystems
  (evaluators, multimodal, simulation, detectors, chaos, redteam,
  generators, tracing, cli, core). Includes precedence rules so the
  specialized areas (chaos/redteam/multimodal) win over the general ones
  they overlap with, and guards so area-core is not used as a catch-all.
- type.yml: bug/enhancement/question/chore with the conventional-commit
  prefix treated as authoritative.
- workflow: area capped at 2 labels, type at 1. No language job since
  evals is Python-only.
Comment thread .github/workflows/issue-labeler.yml
Comment thread .github/workflows/issue-labeler.yml
Comment thread .github/workflows/issue-labeler.yml
@github-actions


Assessment: Approve

Clean, well-scoped CI addition (3 files, ~114 lines). The area→subsystem mappings were verified against the actual src/strands_evals/ tree and are accurate, and the precedence rules for chaos/redteam/multimodal correctly reflect that those subsystems own their own evaluators/generators. Only minor, non-blocking suggestions below.

Review Categories
  • Correctness: All area-* labels map to real packages; precedence and "no catch-all" guidance for area-core are sound.
  • Security: pull_request_target is used safely — sparse-checkout pulls only .github/labelers from the base repo (no PR-ref override), so no untrusted code executes, and the LLM output is constrained to a hardcoded label allowlist. Worst case is a mislabel, as the author notes.
  • Maintainability: Two near-duplicate jobs could be a matrix; action pinned to @main; opened-only trigger. All optional — see inline comments.
  • Conventions: checkout@v6 and devtools @main usage match existing workflows.

Nicely documented design notes in the PR description — the overlap/precedence reasoning made this easy to verify.

yonib05 added the area-cli (CLI commands (run, report, validate, diagnose) and console display) and enhancement (New feature or request) labels Jun 11, 2026
A web-based-interface feature request was misclassified as area-cli.
Clarify that area-cli is strictly the command-line/console layer and that
web UIs, GUIs, dashboards, and hosting have no dedicated area, so the
classifier should assign no area label rather than reaching for area-cli.

Match conventional-commit prefixes regardless of scope (feat(core): ==
feat:), make the prefix authoritative over the user-facing heuristic so
perf:/refactor: resolve to chore without contradiction, and frame the
fallback around title + body since the classifier has no diff visibility.
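The scope-insensitive prefix rule described in this commit can be sketched in Python as follows. This is an illustration, not the contents of type.yml, which instructs the classifier in prose. Only the feat and perf/refactor mappings come from this PR; the fix and chore entries, the regex, and the name type_from_title are assumptions.

```python
from __future__ import annotations

import re

# Matches "feat:", "feat(core):", and "feat(core)!:"; the scope and "!" are optional.
PREFIX_RE = re.compile(r"^(?P<kind>[a-z]+)(?:\([^)]*\))?!?:")

PREFIX_TO_TYPE = {
    "feat": "enhancement",  # stated in the PR description
    "perf": "chore",        # stated in the commit message
    "refactor": "chore",    # stated in the commit message
    "fix": "bug",           # assumption, not stated in the PR
    "chore": "chore",       # assumption, not stated in the PR
}


def type_from_title(title: str) -> str | None:
    """Return the type implied by a conventional-commit prefix, or None."""
    match = PREFIX_RE.match(title.strip().lower())
    if match is None:
        return None  # no prefix: fall back to judging the title and body
    return PREFIX_TO_TYPE.get(match.group("kind"))
```

When the function returns None there is no authoritative prefix, and the classifier falls back to the title and body alone, since it has no diff visibility.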
yonib05 commented Jun 11, 2026

Member Author

Thanks for the review. On the three optional suggestions, I'm going to decline all three for now and keep the config as-is:

  • Matrix over the two jobs: with only two configs the duplication is minimal, and separate named jobs (Label: Area / Label: Type) read more clearly in the Actions UI and let us tune each independently. A matrix would save a few lines at the cost of that clarity.
  • Relabel on title edits: deliberately opened-only. Re-running on every edit means repeated model calls and the risk of fighting a maintainer's manual relabeling. The type label is a starting point for triage, not a binding classification.
  • Pin @main: intentionally consistent with the repo's other strands-agents/devtools action usages, which all track @main. Pinning just this one would be inconsistent; if we want SHA-pinning it should be a repo-wide decision.

All three are reasonable and we can revisit if the labeler grows.

With max_labels: 2 the classifier's output is truncated to the first two
labels, so order is load-bearing. Instruct it to rank concrete subsystems
ahead of the broader area-core, with bugs in a concrete subsystem taking
priority.

Add the two cross-cutting labels (parity with harness-sdk): area-devx for
SDK usability/papercuts/confusing APIs, area-community for repo health and
CI dependency bumps. Both carry the same not-a-catch-all guards and slot
into the priority ordering below concrete subsystems, with a worked
example for the subsystem+devx co-label case.
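Why order is load-bearing under truncation can be shown with a small Python sketch. In the PR the ranking is an instruction to the classifier in area.yml; the code below only mimics its effect. The names CONCRETE_AREAS and rank_and_truncate are hypothetical, and treating area-core, area-devx, and area-community as a single lower tier is an assumption. The "bugs in a concrete subsystem first" rule is not modeled.

```python
from __future__ import annotations

# Hypothetical set of concrete subsystem labels (everything except area-core
# and the cross-cutting area-devx / area-community).
CONCRETE_AREAS = frozenset(
    f"area-{name}"
    for name in (
        "evaluators", "multimodal", "simulation", "detectors", "chaos",
        "redteam", "generators", "tracing", "cli",
    )
)


def rank_and_truncate(labels: list[str], max_labels: int = 2) -> list[str]:
    """Put concrete subsystems first, then keep only the first max_labels."""
    # sorted() is stable, so labels keep the classifier's order within a tier.
    ranked = sorted(labels, key=lambda label: 0 if label in CONCRETE_AREAS else 1)
    return ranked[:max_labels]
```

Without the ranking step, an output of area-core, area-chaos, area-cli would be cut to area-core and area-chaos; with it, the two concrete subsystems survive.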
@github-actions


Thanks for the thorough replies — all three declines are well-reasoned and I agree with each:

  • Matrix: fair call; two named jobs read more clearly in the Actions UI and the duplication is minimal. Worth revisiting only if a third classifier shows up.
  • opened-only: agreed — re-running on every edit risks fighting manual relabels and burning model calls. Treating the type label as a triage starting point is the right framing.
  • @main pin: agreed it should be a repo-wide decision rather than a one-off here.

I also reviewed the new commit 64c23f7 (fix(issue-labeler): add area label priority ordering):

  • The "Order matters … rank by specificity" instruction is a genuinely good catch — with max_labels: 2 the classifier's output order is load-bearing (truncation keeps the first two), so ranking concrete subsystems ahead of area-core directly improves correctness.
  • The area-cli scope tightening (excluding web UIs / GUIs / dashboards) closes a plausible mislabel path cleanly.
  • The type.yml scope-aware prefix matching (feat(core): → feat:) and the "title + body only, no diff" note are both correct and match how the action actually invokes the classifier.

I re-verified the area-* → src/strands_evals/ mappings against the current tree after this commit; they are still accurate. No new concerns.

Assessment: Approve. This is good to merge whenever you're ready.

@github-actions


Thanks for the thorough responses — all three declines are well-reasoned and I'm happy to drop them. The "starting point for triage, not a binding classification" framing in particular is the right call for opened-only.

Re-review of the two follow-up commits

I checked 147ecb7 and 64c23f7 (the latest config). Both changes are good and I have no blocking concerns:

fix(issue-labeler): add area label priority ordering (area.yml)

  • The ordering instruction is a genuinely good catch: with max_labels: 2 the classifier's output is truncated, so ranking concrete subsystems ahead of area-core (and bugs-in-a-subsystem first) is exactly what keeps the kept labels meaningful. ✅
  • The expanded area-cli scope (terminal/console only; web UIs/GUIs/dashboards get no area) is verified accurate — there is no web/UI/dashboard/server/http subsystem under src/strands_evals/, so directing those to "no area unless area-core" is correct and won't strand a real package.

type.yml refinements

  • "You are given only the title and body — there is no diff or file list" is a valuable constraint to hand the classifier; it stops the model from hallucinating about changed files it can't see.
  • Spelling out the conventional-commit prefix mapping as an explicit list (including "scope counts, e.g. feat(core):") and "the prefix wins even when the description sounds user-facing" removes the main ambiguity from the original prose. ✅

Assessment unchanged: Approve. Mappings still verified against the real src/strands_evals/ tree, security model unchanged (base-repo sparse-checkout, hardcoded allowlist), and the three open inline threads remain optional. Nice iteration.

yonib05 enabled auto-merge (squash) June 12, 2026 13:49
yonib05 merged commit b119e85 into strands-agents:main Jun 12, 2026
15 checks passed