Pull requests: strands-agents/evals
docs(agents): add high-quality PR guidance to AGENTS.md
#258
opened Jun 11, 2026 by
yonib05
Member
1 task
docs: add AI contribution guidance to CONTRIBUTING and PR template
#257
opened Jun 11, 2026 by
yonib05
Member
1 task
feat(issue-labeler): add LLM issue labeler for area and type
area-cli
CLI commands (run, report, validate, diagnose) and console display
enhancement
New feature or request
#255
opened Jun 11, 2026 by
yonib05
Member
1 task
feat(redteam): add SequentialBreak narrative-scaffold attack strategy
area-redteam
Red teaming: adversarial generation, attack strategies, attack success evaluation
enhancement
New feature or request
#254
opened Jun 11, 2026 by
yeomjiwonyeom
Contributor
feat(redteam): add PAIR single-stream multi-turn attack strategy
area-redteam
Red teaming: adversarial generation, attack strategies, attack success evaluation
enhancement
New feature or request
#253
opened Jun 10, 2026 by
yeomjiwonyeom
Contributor
chore(redteam): redteam multi-agent session
area-redteam
Red teaming: adversarial generation, attack strategies, attack success evaluation
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
#251
opened Jun 10, 2026 by
poshinchen
Contributor
5 of 7 tasks
feat(redteam): add GOAT multi-turn attack strategy
area-redteam
Red teaming: adversarial generation, attack strategies, attack success evaluation
enhancement
New feature or request
#250
opened Jun 10, 2026 by
yeomjiwonyeom
Contributor
feat(redteam): add Bad Likert Judge multi-turn attack strategy
area-redteam
Red teaming: adversarial generation, attack strategies, attack success evaluation
enhancement
New feature or request
#248
opened Jun 10, 2026 by
yeomjiwonyeom
Contributor
ci: update opentelemetry-instrumentation-langchain requirement from <0.50.0,>=0.40.0 to >=0.40.0,<0.62.0
area-tracing
Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#237
opened Jun 2, 2026 by
dependabot
Bot
ci: update pytest-asyncio requirement from <1.4.0,>=1.0.0 to >=1.0.0,<1.5.0
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#233
opened May 26, 2026 by
dependabot
Bot
chore: allow additional fields to EvaluationData and flexible experiment report type
area-core
Core eval framework: Case, Experiment, task handler, evaluation data stores
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
#232
opened May 22, 2026 by
poshinchen
Contributor
4 of 7 tasks
Feature/skills aggregator: add SkillEvalAggregator for batch evaluation comparisons
area-evaluators
Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics
enhancement
New feature or request
#229
opened May 14, 2026 by
venkatkrish543re
Draft
ci: update mypy requirement from <2.0.0,>=1.15.0 to >=1.15.0,<3.0.0
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#216
opened May 7, 2026 by
dependabot
Bot
ci: update pre-commit requirement from <4.6.0,>=3.2.0 to >=3.2.0,<4.7.0
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#203
opened Apr 22, 2026 by
dependabot
Bot
ci: update rich requirement from <15.0.0,>=14.0.0 to >=14.0.0,<16.0.0
area-cli
CLI commands (run, report, validate, diagnose) and console display
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#196
opened Apr 13, 2026 by
dependabot
Bot
ci: bump actions/github-script from 8 to 9
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#193
opened Apr 10, 2026 by
dependabot
Bot
feat: Add EvaluationPlugin for agent invocation evaluation and retry
area-core
Core eval framework: Case, Experiment, task handler, evaluation data stores
area-evaluators
Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics
enhancement
New feature or request
#166
opened Mar 18, 2026 by
afarntrog
Contributor
5 of 7 tasks
ci: update langfuse requirement from <3,>=2.0.0 to >=2.0.0,<5
area-tracing
Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#155
opened Mar 11, 2026 by
dependabot
Bot
ci: bump actions/upload-artifact from 6 to 7
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#149
opened Feb 27, 2026 by
dependabot
Bot
ci: bump actions/download-artifact from 7 to 8
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#148
opened Feb 27, 2026 by
dependabot
Bot
feat: add OTel test semantic convention attributes to Experiment spans
area-core
Core eval framework: Case, Experiment, task handler, evaluation data stores
area-tracing
Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL
enhancement
New feature or request
ci: update ruff requirement from <0.15.0,>=0.13.0 to >=0.13.0,<0.16.0
chore
Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact
dependencies
Pull requests that update a dependency file
python
Pull requests that update python code
#116
opened Feb 4, 2026 by
dependabot
Bot
feat: Optional Case specific Goal for GoalSuccessRateEvaluator
area-evaluators
Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics
enhancement
New feature or request
#75
opened Dec 17, 2025 by
dbermuehler
Draft
7 tasks
feat: add ContextualFaithfulnessEvaluator
area-core
Core eval framework: Case, Experiment, task handler, evaluation data stores
area-evaluators
Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics
enhancement
New feature or request
#64
opened Dec 7, 2025 by
stefanoamorelli
7 tasks done
Mapper for parsing langfuse traces to standard format
area-tracing
Trace/session ingestion: providers, session mappers, extractors, telemetry/OTEL
enhancement
New feature or request
#49
opened Nov 25, 2025 by
deepakdalakoti
Collaborator
6 tasks done