Pull requests: strands-agents/evals

Pull requests list

docs(agents): add high-quality PR guidance to AGENTS.md
#258 opened Jun 11, 2026 by yonib05 Member
1 task
docs: add AI contribution guidance to CONTRIBUTING and PR template
#257 opened Jun 11, 2026 by yonib05 Member
1 task
feat(issue-labeler): add LLM issue labeler for area and type
Labels: area-cli [CLI commands (run, report, validate, diagnose) and console display], enhancement [New feature or request]
#255 opened Jun 11, 2026 by yonib05 Member
1 task
feat(redteam): add SequentialBreak narrative-scaffold attack strategy
Labels: area-redteam [Red teaming: adversarial generation, attack strategies, attack success evaluation], enhancement
#254 opened Jun 11, 2026 by yeomjiwonyeom Contributor
feat(redteam): add PAIR single-stream multi-turn attack strategy
Labels: area-redteam, enhancement
#253 opened Jun 10, 2026 by yeomjiwonyeom Contributor
chore(redteam): redteam multi-agent session
Labels: area-redteam, chore [Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact]
#251 opened Jun 10, 2026 by poshinchen Contributor
5 of 7 tasks
feat(redteam): add GOAT multi-turn attack strategy
Labels: area-redteam, enhancement
#250 opened Jun 10, 2026 by yeomjiwonyeom Contributor
feat(redteam): add Bad Likert Judge multi-turn attack strategy
Labels: area-redteam, enhancement
#248 opened Jun 10, 2026 by yeomjiwonyeom Contributor
ci: update opentelemetry-instrumentation-langchain requirement from <0.50.0,>=0.40.0 to >=0.40.0,<0.62.0
Labels: area-tracing [Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL], chore, dependencies [Pull requests that update a dependency file], python [Pull requests that update python code]
#237 opened Jun 2, 2026 by dependabot Bot
ci: update pytest-asyncio requirement from <1.4.0,>=1.0.0 to >=1.0.0,<1.5.0
Labels: chore, dependencies, python
#233 opened May 26, 2026 by dependabot Bot
chore: allow additional fields to EvaluationData and flexible experiment report type
Labels: area-core [Core eval framework: Case, Experiment, task handler, evaluation data stores], chore
#232 opened May 22, 2026 by poshinchen Contributor
4 of 7 tasks
Feature/skills aggregator: add SkillEvalAggregator for batch evaluation comparisons
Labels: area-evaluators [Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics], enhancement
#229 opened May 14, 2026 by venkatkrish543re Draft
ci: update mypy requirement from <2.0.0,>=1.15.0 to >=1.15.0,<3.0.0
Labels: chore, dependencies, python
#216 opened May 7, 2026 by dependabot Bot
ci: update pre-commit requirement from <4.6.0,>=3.2.0 to >=3.2.0,<4.7.0
Labels: chore, dependencies, python
#203 opened Apr 22, 2026 by dependabot Bot
ci: update rich requirement from <15.0.0,>=14.0.0 to >=14.0.0,<16.0.0
Labels: area-cli, chore, dependencies, python
#196 opened Apr 13, 2026 by dependabot Bot
ci: bump actions/github-script from 8 to 9
Labels: chore, dependencies, github_actions [Pull requests that update GitHub Actions code]
#193 opened Apr 10, 2026 by dependabot Bot
feat: Add EvaluationPlugin for agent invocation evaluation and retry
Labels: area-core, area-evaluators, enhancement
#166 opened Mar 18, 2026 by afarntrog Contributor
5 of 7 tasks
ci: update langfuse requirement from <3,>=2.0.0 to >=2.0.0,<5
Labels: area-tracing, chore, dependencies, python
#155 opened Mar 11, 2026 by dependabot Bot
ci: bump actions/upload-artifact from 6 to 7
Labels: chore, dependencies, github_actions
#149 opened Feb 27, 2026 by dependabot Bot
ci: bump actions/download-artifact from 7 to 8
Labels: chore, dependencies, github_actions
#148 opened Feb 27, 2026 by dependabot Bot
feat: add OTel test semantic convention attributes to Experiment spans
Labels: area-core, area-tracing, enhancement
#131 opened Feb 10, 2026 by anirudha Draft
ci: update ruff requirement from <0.15.0,>=0.13.0 to >=0.13.0,<0.16.0
Labels: chore, dependencies, python
#116 opened Feb 4, 2026 by dependabot Bot
feat: Optional Case specific Goal for GoalSuccessRateEvaluator
Labels: area-evaluators, enhancement
#75 opened Dec 17, 2025 by dbermuehler Draft
7 tasks
feat: add ContextualFaithfulnessEvaluator
Labels: area-core, area-evaluators, enhancement
#64 opened Dec 7, 2025 by stefanoamorelli
7 tasks done
Mapper for parsing langfuse traces to standard format
Labels: area-tracing, enhancement
#49 opened Nov 25, 2025 by deepakdalakoti Collaborator
6 tasks done