Candid Chrome QA
Drive a real Chrome session against your running web app, walk it like a real user, and write structured findings to disk for triage. Catches the bugs your unit tests miss — UX quirks, mobile-only layout breaks, accessibility violations, console errors, slow API calls.
Quick Start
/candid-chrome-qa
You’ll be asked for:
- goal — what surface to test (e.g. “Agent Config tabs, all 16”)
- prompt — free-form QA plan (edge cases, hot spots, recently-changed surfaces)
- app URL — verified before any work begins (curl health check)
The pass writes findings to .context/findings/<YYYY-MM-DD>-<slug>.json and prints a summary to your terminal at the end.
/candid-chrome-qa --url http://localhost:3000 # Skip URL prompt
/candid-chrome-qa --mobile-only # Mobile pass only
How It Works
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Candid Chrome QA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Goal: Agent Config tabs, all 16
URL: http://localhost:3000
Steps:
1. ✅ Pre-flight (server up, fresh tab, baseline cleared)
2. 🖥️ Desktop pass (1440x900) — walk every target
3. 📱 Mobile pass (390x844) — top 5 targets re-walked
4. 🔍 Cross-cutting probes (a11y, touch targets)
5. 📝 Append findings to JSON (per-finding, not batched)
6. 📊 Final summary (file + stdout)
Each target gets the flush-capture cycle: clear console + network → exercise interactions + 1 edge case → wait → capture telemetry → append finding. This keeps every step’s signal clean.
Pre-Flight Checks
Before any QA work, candid-chrome-qa verifies:
- Dev server is up: curl health check on the URL. Anything other than 2xx/3xx aborts with a clear ask.
- Fresh Chrome tab: uses mcp__claude-in-chrome__tabs_context_mcp to enumerate, and creates a new tab unless you say otherwise.
- Desktop default viewport: 1440x900.
- Logged-in state confirmed: if you land on a login or onboarding route, the skill asks before proceeding.
- Console baseline cleared: sets the high-water mark so per-target probes only show fresh entries.
- Required data present: if the goal touches lists/details and the account is empty, the skill asks rather than silently downgrading to source-only review.
- Findings file opened: .context/findings/<YYYY-MM-DD>-<slug>.json initialised with the schema and empty findings: [].
If any check fails, the pass stops with a focused error before any browser work begins.
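The 2xx/3xx rule for the health check can be sketched as a small pure function. This is only an illustration: the function name `preflightError` is hypothetical, and the message wording is copied from the FAQ entry on a stopped dev server rather than from the skill's source.

```javascript
// Hypothetical helper: returns null when the health check passes,
// or the abort message when the status is not 2xx/3xx.
// `status` is whatever the health check reported; curl reports 000
// when it could not connect at all.
function preflightError(url, status) {
  const code = Number(status);
  const ok = Number.isInteger(code) && code >= 200 && code < 400;
  if (ok) return null;
  const shown = String(status).padStart(3, "0");
  return `Pre-flight failed: ${url} returned ${shown} (not 2xx/3xx). Start your dev server and re-run.`;
}
```

A redirect such as 302 counts as healthy here, while 404, 500, and a refused connection all abort.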
Per-Target Loop
For each target identified in goal + prompt:
- Navigate (sidebar buttons, not direct URLs, for client-side routing apps)
- Flush console + network telemetry
- Read interactive elements
- Exercise 2–3 main interactions + 1 edge case (empty input, invalid format, rapid double-click, navigation while dirty)
- Wait 1–2s for in-flight requests
- Capture telemetry — apply the console triage table and network thresholds
- Append findings to disk (immediately, not buffered)
- Reset state if you mutated something that affects subsequent targets
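The ordering of the steps above can be sketched as a single function. Everything here is an assumption made for illustration: `driver` stands in for whatever wraps the browser tools, none of its method names come from the skill, and the calls are shown synchronously for brevity even though real browser tool calls would be awaited.

```javascript
// Hypothetical step runner for one target. `driver` and all of its
// methods are invented for this sketch.
function runTarget(target, driver, appendFinding) {
  driver.navigate(target);                    // via sidebar, not a direct URL
  driver.flushTelemetry();                    // clear console + network
  const elements = driver.readInteractive();  // what can be clicked or typed into
  const mutated = driver.exercise(elements);  // 2-3 interactions + 1 edge case
  driver.wait(1500);                          // 1-2 s for in-flight requests
  const findings = driver.captureFindings(target);
  for (const finding of findings) {
    appendFinding(finding);                   // written one at a time, not batched
  }
  if (mutated) driver.resetState();           // keep later targets clean
  return findings.length;
}
```

Flushing before the interactions is what keeps each target's telemetry separate from the previous one.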
Edge-case patterns
The “1 edge case” per target reaches for one of these — they catch the majority of production-only failures:
- Race condition: fire 3+ identical requests within 500 ms (rapid double-click save). Watch for mixed 200/500 on the same endpoint.
- Stale state: mutate → navigate away → come back. Missing re-fetch after mutation = stale state.
- Silent failure: perform an action that returns 200, then refresh and verify the change actually persisted.
- Memory drift: repeat an action 20× and snapshot document.querySelectorAll('*').length early vs late.
- Idle WebSocket: for real-time features, idle 60s and check console for close events without reconnect.
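As one example, the race-condition check can be expressed over captured network entries. This is a minimal sketch: it assumes the entries use the v2 evidence shape ({method, url, status}), that the capture window was already limited by the flush step, and it widens "mixed 200/500" to any 2xx alongside any 5xx, which is an interpretation. The function name is hypothetical.

```javascript
// Hypothetical check: list endpoints that returned both a success and a
// server error within the same capture window.
function mixedStatusEndpoints(requests) {
  const byEndpoint = new Map();
  for (const { method, url, status } of requests) {
    const key = `${method} ${url}`;
    const seen = byEndpoint.get(key) ?? { ok: 0, serverError: 0 };
    if (status >= 200 && status < 300) seen.ok += 1;
    if (status >= 500 && status < 600) seen.serverError += 1;
    byEndpoint.set(key, seen);
  }
  return [...byEndpoint]
    .filter(([, seen]) => seen.ok > 0 && seen.serverError > 0)
    .map(([key]) => key);
}
```

An endpoint that fails every time is not flagged by this check; it would surface as an ordinary failed request instead.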
Cross-Cutting Probes
Run once per pass via javascript_exec. These catch issues that click-by-click testing won’t:
- A11y probe — counts icon-only buttons without labels, images without alt text, inputs without labels.
- Touch-target probe (mobile) — flags any tappable element under 44×44 px.
Both probes return structured data the skill turns into findings tagged category: "a11y".
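A touch-target probe along these lines might look like the sketch below. It is not the skill's actual probe: the selector list is an assumption about what counts as "tappable", and "under 44×44" is read as either dimension being below 44 px. Passing the document in as a parameter keeps the function usable outside a browser.

```javascript
// Hypothetical probe. In the browser, call touchTargetProbe(document).
function touchTargetProbe(doc, minPx = 44) {
  // Assumed definition of "tappable"; adjust for your app.
  const selector = "a[href], button, input, select, textarea, [role='button']";
  const tooSmall = [];
  for (const el of doc.querySelectorAll(selector)) {
    const { width, height } = el.getBoundingClientRect();
    if (width === 0 && height === 0) continue; // not rendered, skip
    if (width < minPx || height < minPx) {
      tooSmall.push({
        tag: el.tagName.toLowerCase(),
        width: Math.round(width),
        height: Math.round(height),
      });
    }
  }
  return { count: tooSmall.length, tooSmall };
}
```

The returned object is plain data, so it can be turned into one finding per element or one aggregate finding.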
Mobile Pass
After desktop, the skill resizes to 390x844 and re-walks the 5 most-used targets (or your specified subset). The touch-target probe runs here. Mobile-only bugs surface in roughly 30% of passes — this step is required, not optional.
Resize ceiling: Chrome may enforce a minimum content width of ~1075 px depending on UI chrome. If 390 doesn’t take effect, the skill falls back to the smallest reachable width (often ~500 px). The mobile breakpoint still triggers, the desktop sidebar still hides, and the hamburger drawer still appears. The actual width is recorded in the finding’s evidence.
Output Schema (v2.0)
Every finding file looks like this:
{
"schemaVersion": "2.0",
"context": {
"createdAt": "2026-04-25T18:30:00Z",
"scope": "Agent Config tabs, all 16",
"originatingPrompt": "Walk every tab, exercise save/cancel, hammer phone provisioning",
"environment": {
"url": "http://localhost:3000",
"branch": "feature/agent-config-v2",
"commit": "8001a89",
"viewport": "1440x900",
"agentModel": "claude-opus-4-7"
}
},
"findings": [
{
"id": "F-a3f29b71",
"severity": "P0",
"category": "bug",
"surface": "dashboard/agents/[id]/voice",
"viewport": "both",
"url": "http://localhost:3000/dashboard/agents/agent_123/voice",
"title": "Save button does nothing on Voice tab when phone is unprovisioned",
"repro": "1. Open agent\n2. Voice tab\n3. Click Save",
"expected": "Either save or show validation error",
"actual": "Click registers, no network call, no error, no state change",
"evidence": {
"consoleErrors": [
{"level": "warn", "message": "[react-hook-form] missing required: phoneNumber"}
],
"networkRequests": [],
"filesLikelyTouched": ["app/src/components/agent/VoiceTab.tsx:142"]
},
"suggestedFix": "Surface the form validation error in the UI; current handler swallows it",
"groupHint": "form-state",
"confidence": "definite",
"capturedAt": "2026-04-25T18:34:12Z"
}
],
"summary": {
"total": 19,
"bySeverity": {"p0": 2, "p1": 5, "p2": 8, "p3": 2, "p4": 1, "p5": 1},
"byCategory": {"bug": 8, "a11y": 4, "perf": 2, "ux": 3, "copy": 1, "security": 0, "compat": 1}
}
}
Per-finding required fields
| Field | Type | Description |
|---|---|---|
| id | string | F-<8charSlug> |
| severity | enum | P0 (urgent) → P5 (idea) |
| category | enum | One of: bug, a11y, perf, ux, copy, security, compat |
| surface | string | Route or component name |
| viewport | enum | One of: desktop, mobile, both |
| url | string | Full repro URL with query/hash |
| title | string | One-line, < 80 chars |
| repro | string | Numbered steps |
| expected | string | One-line expected behaviour |
| actual | string | One-line observed behaviour |
| suggestedFix | string | One-line fix proposal |
| groupHint | string | Topic slug for clustering related findings |
| confidence | enum | One of: definite, likely, suspected |
| capturedAt | ISO8601 | Per-finding timestamp |
evidence is optional but include any non-obvious signal you captured.
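A consumer could check incoming findings against the table above with something like the following. This is a sketch, not a published validator: the function name and the problem messages are hypothetical, and only the field names, enum values, and title length limit come from the table.

```javascript
const REQUIRED_FIELDS = [
  "id", "severity", "category", "surface", "viewport", "url", "title",
  "repro", "expected", "actual", "suggestedFix", "groupHint",
  "confidence", "capturedAt",
];

const ENUMS = {
  severity: ["P0", "P1", "P2", "P3", "P4", "P5"],
  category: ["bug", "a11y", "perf", "ux", "copy", "security", "compat"],
  viewport: ["desktop", "mobile", "both"],
  confidence: ["definite", "likely", "suspected"],
};

// Hypothetical validator: returns a list of problems, empty when the
// finding satisfies the required-field table.
function validateFinding(finding) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof finding[field] !== "string" || finding[field] === "") {
      problems.push(`missing ${field}`);
    }
  }
  for (const [field, allowed] of Object.entries(ENUMS)) {
    const value = finding[field];
    if (typeof value === "string" && value !== "" && !allowed.includes(value)) {
      problems.push(`invalid ${field}: ${value}`);
    }
  }
  if (typeof finding.title === "string" && finding.title.length >= 80) {
    problems.push("title must be under 80 chars");
  }
  return problems;
}
```

The optional evidence object is deliberately not checked here.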
Severity scale
Maps to Linear priority:
| Severity | Meaning | Linear |
|---|---|---|
| P0 | feature broken, blocks user | Urgent |
| P1 | clear bug or a11y violation | High |
| P2 | UX polish | Medium |
| P3 | perf concern | Medium |
| P4 | copy / wording | Low |
| P5 | enhancement idea | No priority |
Migrating from v1
schemaVersion: "2.0" is the current contract. v1 consumers expecting body, status, tag, or stringified consoleErrors/networkRequests arrays need migration:
- tag → surface (rename)
- consoleErrors: string[] → [{level, message}]
- networkRequests: string[] → [{method, url, status, durationMs?}]
- body field: dropped (pure derivation; render at consumption time from repro/expected/actual)
- status field: dropped (finding lifecycle is the consumer’s concern, not the producer’s)
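Part of that migration can be sketched in a few lines. The sketch below covers only the rename and the two dropped fields, plus one possible way to derive a body at consumption time. Both function names and the rendered layout are hypothetical. Converting the string arrays is left out on purpose, because the layout of the v1 strings is not described here.

```javascript
// Hypothetical migration of one v1 finding: rename tag to surface and
// drop body and status. Other fields pass through unchanged.
function migrateFinding(v1) {
  const { tag, body, status, ...rest } = v1;
  return tag === undefined ? rest : { ...rest, surface: tag };
}

// Hypothetical renderer for consumers that used to read `body`.
// The layout is an assumption; v2 only says body is derivable from
// repro, expected, and actual.
function renderBody(finding) {
  return [
    `Repro:\n${finding.repro}`,
    `Expected: ${finding.expected}`,
    `Actual: ${finding.actual}`,
  ].join("\n\n");
}
```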
Final Summary
When the pass completes, the skill writes the summary block to the JSON file and prints to stdout:
Chrome QA pass: Agent Config tabs, all 16
Total findings: 19
Severity:
P0 (urgent): 2
P1 (high): 5
P2 (medium): 8
P3 (perf): 2
P4 (copy): 1
P5 (idea): 1
Category:
bug: 8
a11y: 4
perf: 2
ux: 3
copy: 1
compat: 1
Top issues (P0 + P1):
• [P0] Save button does nothing on Voice tab when phone is unprovisioned — http://localhost:3000/dashboard/agents/agent_123/voice
• [P0] Empty agent list crashes dashboard render — http://localhost:3000/dashboard
• [P1] Phone number input accepts non-numeric characters — http://localhost:3000/dashboard/agents/new
...
Findings file: .context/findings/2026-04-25-agent-config.json
This is a courtesy for users who don’t run a triage tool; the JSON file is always the source of truth.
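The summary block in the schema can be derived from the findings array alone. The following is a minimal sketch of that derivation, assuming every finding already carries a valid severity and category. The function name is hypothetical. The lowercase severity keys and the zero-filled categories mirror the example file above.

```javascript
const SEVERITIES = ["P0", "P1", "P2", "P3", "P4", "P5"];
const CATEGORIES = ["bug", "a11y", "perf", "ux", "copy", "security", "compat"];

// Hypothetical helper: build the `summary` object from a findings array.
function summarize(findings) {
  const bySeverity = Object.fromEntries(SEVERITIES.map((s) => [s.toLowerCase(), 0]));
  const byCategory = Object.fromEntries(CATEGORIES.map((c) => [c, 0]));
  for (const { severity, category } of findings) {
    bySeverity[severity.toLowerCase()] += 1;
    byCategory[category] += 1;
  }
  return { total: findings.length, bySeverity, byCategory };
}
```

Because each count is seeded with zero, categories with no findings still appear in the output, as "security": 0 does in the example.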
CLI Flags
| Flag | Description | Default |
|---|---|---|
| --url <url> | App URL to test (overrides prompted value) | (prompted) |
| --mobile-only | Skip the desktop pass and run the mobile pass only | false |
Hard Rules
The skill enforces these — it will refuse to act otherwise:
- Never click destructive actions without explicit approval each time: delete, disconnect, drop, force, publish, purchase.
- Never silently downgrade from live QA to source-only review. If data is missing, ask.
- Never invent the schema. v2.0 fields, byte-for-byte. Downstream consumers depend on it.
- Never batch-write findings at the end. Append per-finding. Context can exhaust mid-pass.
- Never claim “no issues” without running the cross-cutting probes.
- Never resize to mobile without first finishing desktop. Mobile is additive, not a substitute. Exception: --mobile-only explicitly opts out of this rule.
- Never use a stale tab. Always confirm tab context first.
- Never proceed without verifying Claude in Chrome is loaded in pre-flight — without it, every browser tool call fails opaquely.
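The per-finding append rule can be illustrated with a read-modify-write on the findings file. How the skill performs the write is not specified here, so treat this as one possible approach. The function name is hypothetical, and the file is assumed to already hold the v2 skeleton with a findings array.

```javascript
const fs = require("fs");

// Hypothetical helper: add one finding and rewrite the file right away,
// so nothing is lost if the pass stops before the final summary.
function appendFinding(filePath, finding) {
  const doc = JSON.parse(fs.readFileSync(filePath, "utf8"));
  doc.findings.push(finding);
  fs.writeFileSync(filePath, JSON.stringify(doc, null, 2) + "\n");
  return doc.findings.length;
}
```

Rewriting the whole file on each append is slower than buffering, but it keeps the file valid JSON after every finding.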
Configuration
Optional chromeQA block in .candid/config.json:
{
"chromeQA": {
"defaultUrl": "http://localhost:3000",
"apiPathPattern": "/api/",
"desktopViewport": "1440x900",
"mobileViewport": "390x844"
}
}
| Field | Default | Description |
|---|---|---|
| defaultUrl | (prompted) | Skip the URL prompt on every run |
| apiPathPattern | /api/ | Path prefix to capture when pulling network telemetry; useful for non-standard backends (e.g. /v2/api/, Supabase) |
| desktopViewport | 1440x900 | Viewport size for the desktop pass |
| mobileViewport | 390x844 | Viewport size for the mobile pass |
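To show how the defaults in the table combine with a partial config, here is a small sketch. It is an illustration only: the function names are hypothetical, and representing a missing defaultUrl as null (meaning "prompt for it") is an assumption.

```javascript
// Defaults taken from the table above. defaultUrl has none (it is prompted).
const DEFAULTS = {
  apiPathPattern: "/api/",
  desktopViewport: "1440x900",
  mobileViewport: "390x844",
};

// Turn "1440x900" into { width: 1440, height: 900 }.
function parseViewport(text) {
  const match = /^(\d+)x(\d+)$/.exec(text);
  if (!match) throw new Error(`Bad viewport: ${text}`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Hypothetical resolver: `config` is the parsed .candid/config.json.
function resolveChromeQA(config = {}) {
  const merged = { ...DEFAULTS, ...(config.chromeQA ?? {}) };
  return {
    defaultUrl: merged.defaultUrl ?? null, // null means "prompt for it"
    apiPathPattern: merged.apiPathPattern,
    desktop: parseViewport(merged.desktopViewport),
    mobile: parseViewport(merged.mobileViewport),
  };
}
```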
Technical.md integration
If your project has a Technical.md (Candid’s standards file — see Technical.md docs), the skill reads it during pre-flight and uses any QA-relevant rules during the pass. Project-specific rules to encode there:
- Browser support matrix (e.g. “support last 2 Chrome / Safari / Firefox; no IE”)
- Accessibility target (WCAG A / AA / AAA)
- API base path (e.g. “/v2/api/”)
- Design-system constraints (e.g. “all CTA buttons use Button.Primary; no inline styles”)
- Auth setup (e.g. “test account qa@example.com lives in .env.local”)
Findings against Technical.md rules surface with surface: "Technical.md" so they’re easy to filter downstream.
Requirements
- A running web app reachable via HTTP (typically http://localhost:<port>)
- The Claude in Chrome MCP installed in your Claude Code environment. The skill checks for mcp__claude-in-chrome__* tools in pre-flight step 0 and aborts with an install link if absent.
- A writable .context/ directory in your workspace (the skill creates .context/findings/ if missing)
FAQ
Can I use it on production URLs?
Yes, but be explicit about it. The skill’s “Hard Rules” prevent destructive actions by default — but production URLs deserve a careful read of the goal/prompt before kicking off a pass that mutates data.
What if the dev server isn’t running?
Pre-flight aborts with a focused error: Pre-flight failed: <url> returned 000 (not 2xx/3xx). Start your dev server and re-run. No browser work happens until the server responds.
What if Claude in Chrome isn’t installed?
Pre-flight step 0 catches this and prints an install link. Run /candid-chrome-qa again after you’ve installed the MCP and restarted Claude Code.
Why JSON instead of a markdown report?
Findings are designed to feed downstream triage tools that turn them into Linear issues. The JSON contract is stable across passes; markdown rendering can be done at consumption time. The end-of-pass stdout summary covers the “I just want to read it” case.
What if my findings file already exists for today?
The skill switches to a <YYYY-MM-DD>-<HHmm>-<slug>.json filename to avoid clobbering a prior pass. Two passes on the same day produce two files, both preserved.
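The naming rule can be sketched as follows. The assumptions: the date and time come from the local clock rather than UTC, the slug is supplied by the caller, and `exists` is a hypothetical callback that reports whether a file name is already taken.

```javascript
// Hypothetical helper: pick the findings file name for a pass.
// `exists(name)` should return true when that file is already present.
function findingsFileName(now, slug, exists) {
  const pad = (n) => String(n).padStart(2, "0");
  const date = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())}`;
  const base = `${date}-${slug}.json`;
  if (!exists(base)) return base;
  // Same-day collision: add HHmm so the earlier pass is preserved.
  return `${date}-${pad(now.getHours())}${pad(now.getMinutes())}-${slug}.json`;
}
```

In practice `exists` could wrap a check against the .context/findings/ directory.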
Can I disable the mobile pass?
Mobile is required by default, because mobile-only bugs surface in roughly 30% of passes. There is no flag to skip it; if you truly want desktop-only, run a single-target pass and stop after the desktop section. The --mobile-only flag does the opposite: it skips desktop and runs mobile only (use sparingly; it’s an explicit opt-out of the sixth Hard Rule, which requires finishing desktop before resizing to mobile).
Can the same finding be deduplicated across passes?
Yes. The id field is derived from a deterministic SHA-1 hash of <url>|<title>, so the same finding in a re-run gets the same ID. Triage tools can use this to suppress duplicates.
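A consumer that wants to recompute IDs could do so roughly as below. The hash input (<url>|<title>) and the F- prefix come from this page. Taking the first 8 hex characters of the digest is an assumption based on the F-<8charSlug> format, and the function name is hypothetical.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: deterministic finding ID from URL and title.
// Truncating to the first 8 hex characters is an assumption.
function findingId(url, title) {
  const digest = createHash("sha1").update(`${url}|${title}`).digest("hex");
  return `F-${digest.slice(0, 8)}`;
}
```

Because the title is part of the input, rewording a finding's title produces a new ID, so deduplication depends on titles staying stable between passes.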
See Also
- Candid Chrome QA Fix — automatically fix findings from a QA pass
- Candid Ship — ship fixes after resolving QA findings
- Technical.md — define standards that QA checks enforce