feat(issue-labeler): add LLM issue labeler for area and type by yonib05 · Pull Request #255 · strands-agents/evals

yonib05 · 2026-06-11T02:14:59Z

Summary

Adds the reusable strands-agents/devtools/issue-labeler action to this repo, mirroring the harness-sdk setup but tuned for Evals.

What it does

Two parallel jobs run on opened issues and PRs and apply labels from a hardcoded allowlist (the LLM has no tools/shell/API access; worst case is a mislabel):

Area (max_labels: 2) — ten area-* labels mapped to src/strands_evals/ subsystems: evaluators, multimodal, simulation, detectors, chaos, redteam, generators, tracing, cli, core.
Type (max_labels: 1) — bug / enhancement / question / chore.
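
The allowlist and `max_labels` constraints described above can be sketched as a small post-processing step. This is a minimal illustration, not the devtools action's actual code: the function name `constrain_labels` and the constants are hypothetical, and only the label names and caps come from this PR.

```python
# Hypothetical allowlists; the label names come from the PR description.
AREA_LABELS = frozenset(
    f"area-{name}"
    for name in (
        "evaluators", "multimodal", "simulation", "detectors", "chaos",
        "redteam", "generators", "tracing", "cli", "core",
    )
)
TYPE_LABELS = frozenset({"bug", "enhancement", "question", "chore"})


def constrain_labels(model_output, allowlist, max_labels):
    """Keep only allowlisted labels, drop duplicates, and cap the count.

    Order is preserved, so the cap keeps the first `max_labels` valid labels.
    """
    kept = []
    for label in model_output:
        label = label.strip()
        if label in allowlist and label not in kept:
            kept.append(label)
    return kept[:max_labels]


# "area-database" is not on the allowlist, so it is dropped before the cap.
print(constrain_labels(
    ["area-chaos", "area-database", "area-evaluators", "area-core"],
    AREA_LABELS,
    max_labels=2,
))  # ['area-chaos', 'area-evaluators']
```

Because anything outside the allowlist is discarded, a bad model response can at worst produce a wrong but valid label, which matches the "worst case is a mislabel" claim.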

Design notes

Overlap handling. chaos, redteam, and multimodal each have their own evaluators/generators/sessions, so area.yml includes precedence rules: the specialized area wins over the general one (e.g. the recovery-strategy evaluator is area-chaos, not area-evaluators).
No catch-all. area-core is explicitly scoped to shared primitives, and the classifier is told to assign no area when none clearly applies.
Conventional-commit prefix is authoritative in type.yml, so a feat: PR is an enhancement even if it touches CI.
Python-only. No language job; the python label is applied elsewhere (issue templates / dependabot).
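
The overlap rule can be pictured as a precedence table. In this PR the rule lives in the prompt text of area.yml, so the sketch below only illustrates the intended logic as a deterministic filter. The table `SPECIALIZED_OVER_GENERAL` and the function `apply_precedence` are hypothetical, and only the chaos-over-evaluators case is stated explicitly in the description; the other rows are assumptions.

```python
# Hypothetical overlap table: each specialized area lists the general areas it
# overrides. Only the chaos-over-evaluators case is stated in the PR; the other
# entries are assumed from "their own evaluators/generators/sessions".
SPECIALIZED_OVER_GENERAL = {
    "area-chaos": {"area-evaluators", "area-generators"},
    "area-redteam": {"area-evaluators", "area-generators"},
    "area-multimodal": {"area-evaluators", "area-generators"},
}


def apply_precedence(labels):
    """Drop general areas that a specialized area in `labels` overrides."""
    overridden = set()
    for label in labels:
        overridden |= SPECIALIZED_OVER_GENERAL.get(label, set())
    return [label for label in labels if label not in overridden]


# A recovery-strategy evaluator issue: the specialized area wins.
print(apply_precedence(["area-evaluators", "area-chaos"]))  # ['area-chaos']
# No specialized area present: nothing is dropped.
print(apply_precedence(["area-evaluators", "area-tracing"]))
```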

The 11 labels (10 area-* + chore) have already been created on the repo. Existing issues/PRs will be backfilled separately.

Testing

After merge, open a test issue and confirm area + type labels are applied correctly.

Add the reusable strands-agents/devtools issue-labeler to evals, mirroring the harness-sdk setup but tuned for this repo:

- area.yml: ten area labels mapped to src/strands_evals/ subsystems (evaluators, multimodal, simulation, detectors, chaos, redteam, generators, tracing, cli, core). Includes precedence rules so the specialized areas (chaos/redteam/multimodal) win over the general ones they overlap with, and guards so area-core is not used as a catch-all.
- type.yml: bug/enhancement/question/chore with the conventional-commit prefix treated as authoritative.
- workflow: area capped at 2 labels, type at 1. No language job since evals is Python-only.

github-actions · 2026-06-11T02:18:29Z

Assessment: Approve

Clean, well-scoped CI addition (3 files, ~114 lines). The area→subsystem mappings were verified against the actual src/strands_evals/ tree and are accurate, and the precedence rules for chaos/redteam/multimodal correctly reflect that those subsystems own their own evaluators/generators. Only minor, non-blocking suggestions below.

Review Categories

Correctness: All area-* labels map to real packages; precedence and "no catch-all" guidance for area-core are sound.
Security: pull_request_target is used safely — sparse-checkout pulls only .github/labelers from the base repo (no PR-ref override), so no untrusted code executes, and the LLM output is constrained to a hardcoded label allowlist. Worst case is a mislabel, as the author notes.
Maintainability: Two near-duplicate jobs could be a matrix; action pinned to @main; opened-only trigger. All optional — see inline comments.
Conventions: checkout@v6 and devtools @main usage match existing workflows.

Nicely documented design notes in the PR description — the overlap/precedence reasoning made this easy to verify.

A web-based-interface feature request was misclassified as area-cli. Clarify that area-cli is strictly the command-line/console layer and that web UIs, GUIs, dashboards, and hosting have no dedicated area, so the classifier should assign no area label rather than reaching for area-cli.

Match conventional-commit prefixes regardless of scope (feat(core): == feat:), make the prefix authoritative over the user-facing heuristic so perf:/refactor: resolve to chore without contradiction, and frame the fallback around title + body since the classifier has no diff visibility.
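
Scope-insensitive prefix matching can be sketched with a short regular expression. The classifier in this PR is an LLM guided by type.yml, so this is only an illustration of the rule, not how the action works. The names `PREFIX_TO_TYPE`, `PREFIX_RE`, and `type_from_title` are hypothetical. The PR confirms feat mapping to enhancement and perf/refactor mapping to chore; the remaining rows of the table are assumptions.

```python
import re

# Hypothetical prefix table. The PR confirms feat -> enhancement and
# perf/refactor -> chore; the remaining rows are assumptions.
PREFIX_TO_TYPE = {
    "feat": "enhancement",
    "fix": "bug",
    "perf": "chore",
    "refactor": "chore",
    "chore": "chore",
    "ci": "chore",
    "docs": "chore",
    "test": "chore",
    "build": "chore",
}

# Matches "feat:" and "feat(core):" alike; the scope is optional and ignored.
PREFIX_RE = re.compile(r"^(?P<kind>[a-z]+)(?:\([^)]*\))?:")


def type_from_title(title):
    """Return the type label implied by the title prefix, or None."""
    match = PREFIX_RE.match(title.strip().lower())
    if match is None:
        return None  # no prefix: fall back to judging title + body
    return PREFIX_TO_TYPE.get(match.group("kind"))


print(type_from_title("feat(core): add retry policy"))    # enhancement
print(type_from_title("perf: speed up trace parsing"))    # chore
print(type_from_title("Web dashboard for eval results"))  # None
```

Returning `None` for unprefixed titles leaves room for the title + body fallback, since the classifier has no diff to inspect.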

yonib05 · 2026-06-11T03:11:59Z

Thanks for the review. On the three optional suggestions, I'm going to decline all three for now and keep the config as-is:

Matrix over the two jobs: with only two configs the duplication is minimal, and separate named jobs (Label: Area / Label: Type) read more clearly in the Actions UI and let us tune each independently. A matrix would save a few lines at the cost of that clarity.
Relabel on title edits: deliberately opened-only. Re-running on every edit means repeated model calls and the risk of fighting a maintainer's manual relabeling. The type label is a starting point for triage, not a binding classification.
Pin @main: intentionally consistent with the repo's other strands-agents/devtools action usages, which all track @main. Pinning just this one would be inconsistent; if we want SHA-pinning it should be a repo-wide decision.

All three are reasonable and we can revisit if the labeler grows.

With max_labels: 2 the classifier's output is truncated to the first two labels, so order is load-bearing. Instruct it to rank concrete subsystems ahead of the broader area-core, with bugs in a concrete subsystem taking priority.
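
Why order is load-bearing can be shown with a stable sort followed by truncation. This sketch covers only the "concrete subsystems ahead of area-core" part of the instruction, and `rank_by_specificity` is a hypothetical helper; in the PR the ranking is requested from the classifier through prompt text rather than enforced in code.

```python
def rank_by_specificity(labels, broad=("area-core",)):
    """Stable-sort labels so concrete subsystems precede broad ones."""
    # False sorts before True, and sorted() is stable, so the relative order
    # of the concrete subsystems is preserved.
    return sorted(labels, key=lambda label: label in broad)


candidates = ["area-core", "area-tracing", "area-cli"]

# Truncating the unranked list keeps the broad label and loses a subsystem.
print(candidates[:2])                       # ['area-core', 'area-tracing']
# Ranking first keeps both concrete subsystems.
print(rank_by_specificity(candidates)[:2])  # ['area-tracing', 'area-cli']
```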

Add the two cross-cutting labels (parity with harness-sdk): area-devx for SDK usability/papercuts/confusing APIs, area-community for repo health and CI dependency bumps. Both carry the same not-a-catch-all guards and slot into the priority ordering below concrete subsystems, with a worked example for the subsystem+devx co-label case.

github-actions · 2026-06-11T14:47:11Z

Thanks for the thorough replies — all three declines are well-reasoned and I agree with each:

Matrix: fair call; two named jobs read more clearly in the Actions UI and the duplication is minimal. Worth revisiting only if a third classifier shows up.
opened-only: agreed — re-running on every edit risks fighting manual relabels and burning model calls. Treating the type label as a triage starting point is the right framing.
@main pin: agreed it should be a repo-wide decision rather than a one-off here.

I also reviewed the new commit 64c23f7 (fix(issue-labeler): add area label priority ordering):

The "Order matters … rank by specificity" instruction is a genuinely good catch — with max_labels: 2 the classifier's output order is load-bearing (truncation keeps the first two), so ranking concrete subsystems ahead of area-core directly improves correctness.
The area-cli scope tightening (excluding web UIs / GUIs / dashboards) closes a plausible mislabel path cleanly.
The type.yml scope-aware prefix matching (feat(core): ≡ feat:) and the "title + body only, no diff" note are both correct and match how the action actually invokes the classifier.

I re-verified the area-* → src/strands_evals/ mappings against the current tree after this commit — still accurate. No new concerns.

Assessment: Approve. This is good to merge whenever you're ready.

github-actions · 2026-06-11T14:49:37Z

Thanks for the thorough responses — all three declines are well-reasoned and I'm happy to drop them. The "starting point for triage, not a binding classification" framing in particular is the right call for opened-only.

Re-review of the two follow-up commits

I checked 147ecb7 and 64c23f7 (the latest config). Both changes are good and I have no blocking concerns:

fix(issue-labeler): add area label priority ordering (area.yml)

The ordering instruction is a genuinely good catch: with max_labels: 2 the classifier's output is truncated, so ranking concrete subsystems ahead of area-core (and bugs-in-a-subsystem first) is exactly what keeps the kept labels meaningful. ✅
The expanded area-cli scope (terminal/console only; web UIs/GUIs/dashboards get no area) is verified accurate — there is no web/UI/dashboard/server/http subsystem under src/strands_evals/, so directing those to "no area unless area-core" is correct and won't strand a real package.

type.yml refinements

"You are given only the title and body — there is no diff or file list" is a valuable constraint to hand the classifier; it stops the model from hallucinating about changed files it can't see.
Spelling out the conventional-commit prefix mapping as an explicit list (including "scope counts, e.g. feat(core):") and "the prefix wins even when the description sounds user-facing" removes the main ambiguity from the original prose. ✅

Assessment unchanged: Approve. Mappings still verified against the real src/strands_evals/ tree, security model unchanged (base-repo sparse-checkout, hardcoded allowlist), and the three open inline threads remain optional. Nice iteration.

yonib05 temporarily deployed to auto-approve June 11, 2026 02:15 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 11, 2026

github-actions Bot reviewed Jun 11, 2026 (three comment threads on .github/workflows/issue-labeler.yml)

github-actions Bot removed the strands-running label Jun 11, 2026

yonib05 added the area-cli (CLI commands (run, report, validate, diagnose) and console display) and enhancement (New feature or request) labels Jun 11, 2026

yonib05 temporarily deployed to auto-approve June 11, 2026 02:34 — with GitHub Actions Inactive

github-actions Bot added strands-running and removed strands-running labels Jun 11, 2026

yonib05 temporarily deployed to auto-approve June 11, 2026 03:09 — with GitHub Actions Inactive

github-actions Bot added strands-running and removed strands-running labels Jun 11, 2026

yonib05 mentioned this pull request Jun 11, 2026

fix(templates): correct doc link and clean up issue/PR templates #256

Merged

1 task

fix(issue-labeler): add area label priority ordering

64c23f7

yonib05 temporarily deployed to auto-approve June 11, 2026 14:34 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 11, 2026

yonib05 temporarily deployed to auto-approve June 11, 2026 14:41 — with GitHub Actions Inactive

github-actions Bot removed the strands-running label Jun 11, 2026

yonib05 enabled auto-merge (squash) June 12, 2026 13:49

poshinchen approved these changes Jun 12, 2026

yonib05 merged commit b119e85 into strands-agents:main Jun 12, 2026
15 checks passed

yonib05 merged 5 commits into strands-agents:main from yonib05:feat/issue-labeler
