feat(ci): daily flake report agent #16417
Adds a Monday agentic workflow that analyses the previous 7 days of Playwright CI runs, clusters recurring failures, and opens a single Draft PR fixing flakes that meet the escalation threshold. Bumps Playwright artifact retention 7 -> 30 days so the 7-day window is reliably full.
Pull request overview
Adds a weekly agentic GitHub workflow (built on gh-aw) that analyses the previous 7 days of Playwright CI runs, clusters recurring test failures, opens a tracking issue, and opens a single Draft PR with auto-fixes for flake/infra/test-data clusters. Bumps Playwright artifact retention to 30 days so the 7-day analysis window is reliably populated, and pins the new gh-aw–managed actions.
Changes:
- New `weekly-flake-report.md` agentic workflow + auto-generated `.lock.yml`.
- `playwright.yaml`: artifact `retention-days` 7 → 30.
- `dependabot.yml` ignores `github/gh-aw-actions/**`; `actions-lock.json` adds pinned SHAs for the new actions.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `.github/workflows/weekly-flake-report.md` | New gh-aw prompt defining the 5-phase weekly flake-report agent. |
| `.github/workflows/weekly-flake-report.lock.yml` | Auto-generated compiled workflow (do not edit). |
| `.github/workflows/playwright.yaml` | Increases test-results artifact retention to 30 days. |
| `.github/dependabot.yml` | Reformatted; ignores gh-aw-managed actions from Dependabot. |
| `.github/aw/actions-lock.json` | Pins SHAs for `actions/github-script@v9.0.0` and `github/gh-aw-actions/setup@v0.77.5`. |
    opens a single Draft PR fixing all flakes that meet the escalation
    threshold (≥3 distinct PRs OR ≥5 occurrences).

    on:
      schedule: weekly on monday
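For readers of the hunk above, the quoted escalation threshold (≥3 distinct PRs OR ≥5 occurrences) can be read as a simple predicate. The sketch below is only an illustration: the `FailureCluster` shape and the function name are hypothetical and do not come from the workflow prompt.

```typescript
// Hypothetical shape of one clustered failure; field names are illustrative only.
interface FailureCluster {
  testId: string;
  signature: string;
  prNumbers: number[]; // PR number of each occurrence (repeats allowed)
  occurrences: number; // total failures observed in the analysis window
}

// Thresholds as quoted in the workflow description.
const MIN_DISTINCT_PRS = 3;
const MIN_OCCURRENCES = 5;

// True when the cluster was seen on enough distinct PRs OR often enough overall.
function meetsEscalationThreshold(cluster: FailureCluster): boolean {
  const distinctPrs = new Set<number>(cluster.prNumbers).size;
  return distinctPrs >= MIN_DISTINCT_PRS || cluster.occurrences >= MIN_OCCURRENCES;
}
```

Counting distinct PRs with a `Set` means that repeated failures on one PR only escalate through the occurrence count, not through the PR count.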
    gh api --paginate \
      "repos/ohcnetwork/care_fe/actions/workflows/playwright.yaml/runs?per_page=100&created=>=$SINCE&status=completed" \
      --jq '.workflow_runs[] | {id, run_number, head_branch, event, conclusion, created_at, pull_requests: [.pull_requests[].number], head_sha}' \
      > /tmp/gh-aw/agent/runs.ndjson
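As a rough illustration of the data this command works with, the sketch below computes a `$SINCE` value for a 7-day window and parses the resulting file. It assumes `$SINCE` is a UTC `YYYY-MM-DD` date and that `runs.ndjson` holds one JSON object per line, as the file name suggests; how the workflow actually derives `$SINCE` is not shown in this PR, and the helper names are hypothetical.

```typescript
// One line of runs.ndjson, mirroring the fields selected by the --jq filter.
interface WorkflowRun {
  id: number;
  run_number: number;
  head_branch: string | null;
  event: string;
  conclusion: string | null;
  created_at: string;
  pull_requests: number[];
  head_sha: string;
}

// Assumed derivation of $SINCE: the UTC date `days` before `now`, as YYYY-MM-DD.
function sinceDate(now: Date, days: number = 7): string {
  const ms = now.getTime() - days * 24 * 60 * 60 * 1000;
  return new Date(ms).toISOString().slice(0, 10);
}

// Parse NDJSON text into run records, skipping blank lines.
function parseRuns(ndjson: string): WorkflowRun[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as WorkflowRun);
}
```

For example, `sinceDate(new Date("2024-03-08T12:00:00Z"))` returns `"2024-03-01"`.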
Project convention: never add test-id / data-testid attributes to application source. Update the agent prompt to use role-based selectors (getByRole/getByLabel/getByText) and to leave any cluster that would require a test id for human triage.
…based IDs
- Cadence: weekly -> daily (analysis window stays at 7 days for stable signal)
- Rename file/branch/title prefixes from weekly-flake-report -> flake-report
- IDs: ISO week (YYYY-WW) -> date (YYYY-MM-DD)
- Add dedupe: agent checks for existing open flake-report issue/PR before opening another
- Quiet days no longer create empty issues
- Verification looks at last 7 days of open reports, not just last week
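The date-based IDs and the dedupe check listed in this commit could look roughly like the sketch below. It is an illustration under assumptions: the `OpenReport` shape, the function names, and the use of UTC dates are hypothetical, the `[flake-report]` title prefix is the one quoted in the PR description, and the comparison of cluster contents is left out entirely.

```typescript
// Hypothetical summary of an open issue or Draft PR returned by a listing call.
interface OpenReport {
  number: number;
  title: string;
  createdAt: string; // ISO 8601 timestamp
}

const REPORT_PREFIX = "[flake-report]";

// Date-based report ID (YYYY-MM-DD); UTC is an assumption.
function reportId(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Return the newest open report created within `lookbackDays`, or undefined.
// Matching on cluster contents is intentionally omitted from this sketch.
function findExistingReport(
  reports: OpenReport[],
  now: Date,
  lookbackDays: number
): OpenReport | undefined {
  const cutoff = now.getTime() - lookbackDays * 24 * 60 * 60 * 1000;
  const candidates = reports
    .filter((r) => r.title.indexOf(REPORT_PREFIX) === 0)
    .filter((r) => new Date(r.createdAt).getTime() >= cutoff)
    .sort((a, b) => new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime());
  return candidates.length > 0 ? candidates[0] : undefined;
}
```

When `findExistingReport` returns a report, the agent would comment on it rather than open a new issue or PR; the lookback is a parameter so the window can be tuned without touching the matching logic.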
🎭 Playwright Test Results
Status: ❌ Failed
📊 Detailed results are available in the playwright-final-report artifact. Run: #9323
    on:
      schedule: daily
      workflow_dispatch:

    gh api --paginate \
      "repos/ohcnetwork/care_fe/actions/workflows/playwright.yaml/runs?per_page=100&created=>=$SINCE&status=completed" \
Approach
A daily agentic workflow (`.github/workflows/flake-report.md`, built on `gh-aw`) runs every day and:
- Fetches `playwright.yaml` workflow runs via paginated `gh api`.
- Downloads `playwright-results-shard-*` artifacts and parses `test-results.json`.
- Clusters failures by `(test_id, signature)`.
- Applies fixes using `waitForResponse`, `waitForURL`, web-first assertions, role-based selectors (`getByRole`/`getByLabel`/`getByText`), and per-worker fixtures. Never adds sleeps, skips tests, weakens assertions, or adds `data-testid`/`testid` attributes to application code. Never touches `src/` or healthcare/clinical logic.

Dedupe: before opening a new issue or PR, the agent checks for an existing open `[flake-report]` issue/Draft PR from the last 3 days covering the same clusters and comments on the existing artifact instead.

Working
Cadence note: running daily during the shakedown period. The analysis window stays at 7 days (rolling) for stable signal. Cadence will move to weekly once the workflow is proven reliable.
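Within that rolling window, the grouping by `(test_id, signature)` described under Approach could be sketched as below. This is a minimal illustration, not the agent's actual logic: the `FailureRecord` and `Cluster` shapes and the function name are hypothetical, and how a failure signature is derived from `test-results.json` is not specified in this PR.

```typescript
// Hypothetical record for one failed test attempt extracted from test-results.json.
interface FailureRecord {
  testId: string;
  signature: string; // e.g. a normalised error message; derivation is an assumption
  prNumber: number | null; // null for runs not tied to a PR
}

interface Cluster {
  testId: string;
  signature: string;
  occurrences: number;
  distinctPrs: number[];
}

// Group failures by the (testId, signature) pair and count occurrences and distinct PRs.
function clusterFailures(failures: FailureRecord[]): Cluster[] {
  const byKey = new Map<string, Cluster>();
  failures.forEach((f) => {
    // JSON-encode the pair so that separators inside either field cannot collide.
    const key = JSON.stringify([f.testId, f.signature]);
    let cluster = byKey.get(key);
    if (cluster === undefined) {
      cluster = { testId: f.testId, signature: f.signature, occurrences: 0, distinctPrs: [] };
      byKey.set(key, cluster);
    }
    cluster.occurrences += 1;
    if (f.prNumber !== null && cluster.distinctPrs.indexOf(f.prNumber) === -1) {
      cluster.distinctPrs.push(f.prNumber);
    }
  });
  const clusters: Cluster[] = [];
  byKey.forEach((c) => clusters.push(c));
  // Most frequent clusters first.
  return clusters.sort((a, b) => b.occurrences - a.occurrences);
}
```

Keying on the pair rather than on the test alone keeps two different failure modes of the same test in separate clusters, so each can be triaged on its own.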
If a day has zero qualifying clusters, no issue or PR is created, so quiet days stay silent. If >15 clusters appear (regression spike), the issue is tagged `needs-triage` and no PR is opened.

Changes
- `.github/workflows/flake-report.md`: new agentic workflow
- `.github/workflows/flake-report.lock.yml`: auto-generated by `gh aw compile`
- `.github/workflows/playwright.yaml`: artifact `retention-days: 7 → 30` so the 7-day analysis window is reliably full
- `.github/dependabot.yml`: ignore `github/gh-aw-actions/**` (version-locked to gh-aw compiler)
- `.github/aw/actions-lock.json`: pinned SHAs for new actions referenced by the lock file

Merge Checklist