Building Auto-Quarantine Workflows

Quarantine state machine: Active to Quarantined on threshold breach, through Cooldown re-validation, then graduating back or staying isolated until fixed.

Pipeline Architecture & Trigger Logic

Auto-quarantine relies on a continuous feedback loop between test execution, result parsing, and state management. The workflow begins when a CI job completes and routes its test artifacts to a centralized processor. With Automated Flaky Test Detection Tools, teams can parse JUnit XML reports, extract failure signatures, and calculate flakiness scores as results arrive. The quarantine trigger runs as a dedicated CI stage that updates a shared configuration file or database before the next pipeline run.

Data Flow & State Management

Test results are serialized into a structured format containing test ID, failure count, pass count, and execution environment. A lightweight script evaluates these metrics against predefined thresholds. If the flakiness ratio exceeds the limit, the test is flagged for isolation. State is persisted via Git commits, artifact storage, or a lightweight KV store, so that re-running a pipeline against the same results produces the same quarantine state.
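The threshold evaluation described above can be sketched as follows. The `TestStats` shape, the `minRuns` guard, and the rule that a test with zero passes is treated as broken rather than flaky are assumptions made for illustration, not part of any specific tool.

```typescript
// Hypothetical record shape mirroring the fields named in the text.
interface TestStats {
  testId: string;
  failCount: number;
  passCount: number;
  environment: string;
}

// Flakiness ratio = failures / total runs; 0 when the test has never run.
function flakinessRatio(stats: TestStats): number {
  const total = stats.failCount + stats.passCount;
  return total === 0 ? 0 : stats.failCount / total;
}

// Returns the IDs whose ratio exceeds the threshold. The result is sorted so
// that repeated runs over the same input produce an identical manifest.
function selectForQuarantine(results: TestStats[], threshold: number, minRuns = 5): string[] {
  return results
    // Too few runs give no usable signal (assumed guard).
    .filter((s) => s.failCount + s.passCount >= minRuns)
    // A test that never passes is broken, not flaky, so it should keep failing loudly.
    .filter((s) => s.passCount > 0 && flakinessRatio(s) > threshold)
    .map((s) => s.testId)
    .sort();
}

const sample: TestStats[] = [
  { testId: 'checkout/applies-coupon', failCount: 3, passCount: 7, environment: 'linux-chrome' },
  { testId: 'login/redirects', failCount: 0, passCount: 10, environment: 'linux-chrome' },
  { testId: 'search/empty-state', failCount: 10, passCount: 0, environment: 'linux-chrome' },
];

// Only 'checkout/applies-coupon' is selected at a 0.15 threshold.
console.log(selectForQuarantine(sample, 0.15));
```

Sorting the output is one simple way to keep manifest commits free of spurious diffs when the same results are processed twice.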

Framework-Specific Implementation Patterns

Cypress and Playwright handle test isolation differently, requiring tailored quarantine logic. For Cypress, a pre-run plugin can skip quarantined tests by reading an external manifest. Playwright provides built-in test.skip() and test.fixme() APIs, which can be called conditionally from a beforeEach hook or a fixture. A custom reporter can record flaky outcomes, but it only observes results and cannot skip tests. For a complete Cypress implementation, refer to How to Auto-Quarantine Flaky Cypress Tests.

Dynamic Test Filtering

Instead of hardcoding skips, use a manifest-driven approach. Generate a quarantine-manifest.json during the analysis phase. In the pre-test hook, read the manifest and apply framework-specific skip directives. This keeps the quarantine logic decoupled from test code and enables rapid rollback.
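A framework-neutral sketch of the manifest lookup is shown below. The `QuarantineManifest` shape follows the `quarantined` array used elsewhere in this guide; `parseManifest`, `buildFilter`, and the decision to treat a malformed manifest as empty are illustrative assumptions.

```typescript
// Manifest shape assumed from the { quarantined: [...] } examples in this guide.
interface QuarantineManifest {
  quarantined: string[];
}

type SkipDirective = 'run' | 'skip';

// Parses manifest text defensively: a missing or malformed manifest
// quarantines nothing instead of failing the whole run.
function parseManifest(raw: string | undefined): QuarantineManifest {
  if (!raw) return { quarantined: [] };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === 'object' && parsed !== null) {
      const list = (parsed as { quarantined?: unknown }).quarantined;
      if (Array.isArray(list)) {
        return { quarantined: list.filter((t): t is string => typeof t === 'string') };
      }
    }
  } catch {
    // Invalid JSON falls through to the empty manifest below.
  }
  return { quarantined: [] };
}

// Returns a lookup that a pre-test hook can call with the current test title.
function buildFilter(manifest: QuarantineManifest): (title: string) => SkipDirective {
  const titles = new Set(manifest.quarantined);
  return (title) => (titles.has(title) ? 'skip' : 'run');
}

const directiveFor = buildFilter(parseManifest('{"quarantined":["cart updates total"]}'));
console.log(directiveFor('cart updates total')); // skip
console.log(directiveFor('login succeeds'));     // run
```

In a real hook, a `'skip'` result would map to the framework's own directive, such as `this.skip()` in Cypress or `test.skip()` in Playwright. Rolling back is then a matter of reverting or emptying the manifest.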

Threshold Configuration & Historical Baselines

Effective quarantine requires statistically sound thresholds. Relying on single-run failures causes over-quarantining, while lenient thresholds allow flaky tests to persist. Integrate Historical Flakiness Tracking & Analytics to establish rolling baselines. Configure a sliding window where a test is quarantined if it fails intermittently across multiple distinct commits or environments.
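One way to express the sliding-window rule is to count the distinct commits on which a test produced mixed outcomes. The sketch below assumes a hypothetical `RunRecord` shape and a default of two flaky commits. Both are placeholders to tune against your own history.

```typescript
// Hypothetical run record; field names are illustrative.
interface RunRecord {
  testId: string;
  commit: string;
  passed: boolean;
  timestamp: number; // epoch milliseconds
}

// Counts commits inside the window on which the test both passed and failed.
// Mixed outcomes on one commit point to flakiness rather than a regression.
function flakyCommitCount(runs: RunRecord[], testId: string, windowMs: number, now: number): number {
  const outcomes = new Map<string, { passed: boolean; failed: boolean }>();
  for (const run of runs) {
    if (run.testId !== testId || now - run.timestamp > windowMs) continue;
    const seen = outcomes.get(run.commit) ?? { passed: false, failed: false };
    if (run.passed) {
      seen.passed = true;
    } else {
      seen.failed = true;
    }
    outcomes.set(run.commit, seen);
  }
  let count = 0;
  outcomes.forEach((seen) => {
    if (seen.passed && seen.failed) count += 1;
  });
  return count;
}

// Quarantine only when enough distinct commits show mixed outcomes.
function meetsBaseline(
  runs: RunRecord[],
  testId: string,
  windowMs: number,
  now: number,
  minFlakyCommits = 2,
): boolean {
  return flakyCommitCount(runs, testId, windowMs, now) >= minFlakyCommits;
}
```

A single failing run never satisfies `meetsBaseline`, which guards against the over-quarantining described above.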

Adaptive Thresholds

Implement exponential backoff for re-evaluation. Once quarantined, a test enters a cooldown period. After the cooldown, it runs in a dedicated reliability suite. If it passes consistently, it graduates back to the main pipeline. If it fails again, it remains isolated until a code fix is merged.
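The cooldown and graduation cycle can be modelled as a small state transition function. In this sketch the cooldown is represented as a timestamp rather than a separate state. The one-day base, the 14-day cap, and the doubling rule are assumed values, not recommendations from a particular tool.

```typescript
type State = 'active' | 'quarantined';

interface QuarantineEntry {
  state: State;
  failedRevalidations: number; // times the test has failed re-validation
  nextEvalAt: number;          // epoch ms when the reliability suite may run it again
}

const BASE_COOLDOWN_MS = 24 * 60 * 60 * 1000;  // assumed base: one day
const MAX_COOLDOWN_MS = 14 * BASE_COOLDOWN_MS; // assumed cap: two weeks

// Exponential backoff: the wait doubles after each failed re-validation, up to the cap.
function cooldownMs(failedRevalidations: number): number {
  return Math.min(BASE_COOLDOWN_MS * 2 ** failedRevalidations, MAX_COOLDOWN_MS);
}

// Active -> Quarantined on threshold breach.
function quarantine(now: number): QuarantineEntry {
  return { state: 'quarantined', failedRevalidations: 0, nextEvalAt: now + cooldownMs(0) };
}

// Called with the reliability-suite verdict once the cooldown may have elapsed.
function revalidate(entry: QuarantineEntry, passedConsistently: boolean, now: number): QuarantineEntry {
  if (entry.state !== 'quarantined' || now < entry.nextEvalAt) {
    return entry; // not quarantined, or still cooling down
  }
  if (passedConsistently) {
    return { state: 'active', failedRevalidations: 0, nextEvalAt: now }; // graduates
  }
  const failures = entry.failedRevalidations + 1;
  return { state: 'quarantined', failedRevalidations: failures, nextEvalAt: now + cooldownMs(failures) };
}
```

Backing off spares the reliability suite from re-running tests that are unlikely to have been fixed yet, while the cap ensures every quarantined test is still re-checked periodically.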

Governance, Recovery & Team Alignment

Automation must be paired with clear ownership. Auto-quarantine generates actionable tickets, assigns them to the owning squad, and tracks mean time to resolution. While automation handles volume, human review remains critical for complex race conditions or infrastructure drift.

Quarantine Lifecycle Management

Define SLAs for quarantined tests. Use CI badges and messaging webhooks to notify stakeholders. Implement a mandatory PR check that prevents merging until quarantined tests are either fixed or explicitly approved for extended isolation.
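The merge gate can be reduced to a pure check over the quarantine records. The `QuarantinedTest` shape and the `extensionApproved` flag are hypothetical. How approval is recorded (a label, a ticket field, a manifest entry) depends on your tooling.

```typescript
// Hypothetical record for a quarantined test.
interface QuarantinedTest {
  testId: string;
  quarantinedAt: number;      // epoch milliseconds
  extensionApproved: boolean; // explicit sign-off for extended isolation
}

// Returns the IDs that have exceeded the SLA without an approved extension.
function slaViolations(tests: QuarantinedTest[], slaHours: number, now: number): string[] {
  const limitMs = slaHours * 60 * 60 * 1000;
  return tests
    .filter((t) => !t.extensionApproved && now - t.quarantinedAt > limitMs)
    .map((t) => t.testId);
}
```

A PR check script would call `slaViolations` and exit with a non-zero status when the returned list is non-empty, listing the offending test IDs in its output.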

Production-Ready Implementation Examples

Cypress Dynamic Quarantine Hook

// File: cypress/support/quarantine.ts
// Support files run in the browser, where Node's fs module is unavailable.
// This sketch assumes cypress.config.ts reads quarantine-manifest.json inside
// setupNodeEvents and copies its titles into config.env.quarantined.
const quarantinedTests = new Set<string>(Cypress.env('quarantined') ?? []);

// beforeEach runs once per test, so this.currentTest is the test about to run.
// Calling this.skip() from a before() hook would skip the whole suite instead.
beforeEach(function () {
  if (quarantinedTests.has(this.currentTest?.title ?? '')) {
    this.skip();
  }
});

CI Pipeline Impact: The hook runs before each test, and the manifest is read once per run in the Node process. Adds <50ms overhead per test suite.
Trade-offs: Requires manifest synchronization across parallel CI runners. Use a centralized artifact store or Git LFS for distributed execution.

Playwright Reporter-Based Quarantine

// File: tests/reporters/quarantine-reporter.ts
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';
import { readFileSync, writeFileSync, existsSync } from 'fs';

export default class QuarantineReporter implements Reporter {
  private manifestPath = 'quarantine-manifest.json';

  onTestEnd(test: TestCase, result: TestResult) {
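    // Playwright reports a test as 'flaky' when it fails, then passes on a retry.
    // That value comes from test.outcome(); result.status is only ever 'passed',
    // 'failed', 'timedOut', 'skipped', or 'interrupted'.
    // retries must be >= 1 in playwright.config.ts for the outcome to appear.
    if (test.outcome() === 'flaky') {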
      const manifest = existsSync(this.manifestPath)
        ? JSON.parse(readFileSync(this.manifestPath, 'utf-8'))
        : { quarantined: [] };

      if (!manifest.quarantined.includes(test.title)) {
        manifest.quarantined.push(test.title);
        writeFileSync(this.manifestPath, JSON.stringify(manifest, null, 2));
      }
    }
  }
}

CI Pipeline Impact: Runs post-execution. Zero impact on test runtime. Manifest updates trigger a follow-up commit or artifact upload.
Trade-offs: The flaky outcome requires retries >= 1 in playwright.config.ts. test.outcome() summarizes all attempts, so inspect test.results when per-attempt failure details are needed.

GitHub Actions Auto-Quarantine Step

# File: .github/workflows/ci-quarantine.yml
- name: Analyze & Quarantine Flaky Tests
  id: analyze
  run: |
    python scripts/analyze_flakiness.py \
      --report junit-results.xml \
      --threshold 0.15 \
      --output quarantine-manifest.json
  continue-on-error: true

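- name: Commit Quarantine Manifest
  # Assumes analyze_flakiness.py appends "changed=true" to $GITHUB_OUTPUT when
  # the manifest changes, and that the checkout step checked out the branch itself.
  if: steps.analyze.outputs.changed == 'true'
  env:
    # head_ref is empty outside pull_request events, so fall back to ref_name.
    TARGET_BRANCH: ${{ github.head_ref || github.ref_name }}
  run: |
    git config user.name 'ci-bot'
    git config user.email '[email protected]'
    git add quarantine-manifest.json
    git commit -m 'chore: auto-quarantine flaky tests [skip ci]'
    git push origin "HEAD:${TARGET_BRANCH}"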

CI Pipeline Impact: Adds ~15–30s to pipeline duration. [skip ci] prevents infinite commit loops.
Trade-offs: Direct branch commits bypass PR review gates. Enforce branch protection rules that allow only service accounts to push quarantine manifests.

Common Pitfalls & Mitigation Strategies

Over-quarantining stable tests due to environment-specific failures (network timeouts, third-party API rate limits). Mitigation: Filter out infrastructure-level errors via regex classification before applying flakiness thresholds.
Failing to implement a graduation or re-evaluation mechanism, causing permanent test debt accumulation. Mitigation: Enforce a mandatory cooldown_runs counter that triggers re-inclusion after N successful executions.
Hardcoding skip logic directly in test files instead of using external manifests or dynamic hooks. Mitigation: Decouple quarantine state from source code using runtime configuration or CI environment variables.
Ignoring root-cause analysis and treating quarantine as a permanent fix rather than a temporary isolation state. Mitigation: Auto-link quarantined tests to Jira/Linear tickets with a 72-hour SLA for triage.
Running quarantined tests in parallel with main suites without resource isolation, causing CI bottlenecks. Mitigation: Route quarantined tests to a dedicated, low-priority runner pool with extended timeouts.
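The regex classification mentioned in the first mitigation can be sketched as a pre-filter that runs before failures are counted. The patterns below are illustrative only and should be replaced with the error strings your CI environment actually produces.

```typescript
// Illustrative patterns only; tune them to the messages seen in your own CI logs.
const INFRA_PATTERNS: RegExp[] = [
  /ECONNRESET|ECONNREFUSED|ETIMEDOUT/, // socket-level network errors
  /\b(429|502|503|504)\b/,             // rate limiting and gateway status codes
  /rate limit/i,
  /net::ERR_/,                         // Chromium network error prefix
];

type FailureClass = 'infrastructure' | 'test';

// None of the patterns use the global flag, so RegExp.test() stays stateless.
function classifyFailure(message: string): FailureClass {
  return INFRA_PATTERNS.some((p) => p.test(message)) ? 'infrastructure' : 'test';
}

// Only failures attributed to the test itself count toward the flakiness ratio.
function countableFailures(messages: string[]): number {
  return messages.filter((m) => classifyFailure(m) === 'test').length;
}
```

Status-code patterns can also match unrelated numbers in assertion messages, so it is worth logging which pattern fired and reviewing the classifications periodically.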

Reliability Metrics & KPIs

Metric	Description	Target
Quarantine Activation Rate	Percentage of total tests moved to quarantine per sprint.	`<5%` of active suite
Mean Time to Resolution (MTTR)	Average time from quarantine trigger to test graduation or permanent removal.	`<72 hours`
False Positive Quarantine Rate	Tests quarantined due to infrastructure/environment issues rather than code flakiness.	`<10%`
CI Pass Rate Improvement	Delta in pipeline success rate before and after implementing auto-quarantine.	`+15–25%`

Frequently Asked Questions

How do I prevent auto-quarantine from masking real bugs? Auto-quarantine should trigger only on intermittent failures, meaning a test that both passes and fails on the same commit. A test that fails on every run of a commit is a deterministic failure and should keep blocking the pipeline. Combine the trigger with deterministic failure classification and require a human review step before permanent isolation.

What is the recommended flakiness threshold for triggering quarantine? Start with a 15–20% failure rate over a 30-day rolling window. Adjust based on suite maturity and CI frequency. Use statistical confidence intervals rather than raw counts to avoid noise.
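One common choice of confidence interval for a failure rate is the Wilson score interval. The sketch below compares its lower bound, not the raw rate, against the threshold. The function name and the default z-value of 1.96 (roughly 95% confidence) are choices made for this example.

```typescript
// Lower bound of the Wilson score interval for a binomial proportion.
// failures / runs is the observed failure rate; z = 1.96 gives roughly 95% confidence.
function wilsonLowerBound(failures: number, runs: number, z = 1.96): number {
  if (runs === 0) return 0;
  const p = failures / runs;
  const z2 = z * z;
  const centre = p + z2 / (2 * runs);
  const margin = z * Math.sqrt((p * (1 - p)) / runs + z2 / (4 * runs * runs));
  return (centre - margin) / (1 + z2 / runs);
}

// Quarantine only when we are confident the true failure rate exceeds the threshold.
function exceedsThreshold(failures: number, runs: number, threshold: number): boolean {
  return wilsonLowerBound(failures, runs) > threshold;
}

// 1 failure in 5 runs is a 20% raw rate, but the evidence is too thin to act on.
console.log(exceedsThreshold(1, 5, 0.15));    // false
// 50 failures in 200 runs is a 25% raw rate, backed by enough data.
console.log(exceedsThreshold(50, 200, 0.15)); // true
```

Using the lower bound means a test with only a handful of runs needs a much higher observed failure rate before it is isolated, which filters out noise from low-traffic suites.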

Can auto-quarantine workflows run in monorepos with multiple test runners? Yes. Implement a centralized quarantine service that aggregates results from Cypress, Playwright, and unit test frameworks. Use a unified manifest format and route framework-specific skip logic via pre-run hooks.
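A unified manifest might look like the sketch below, where a single list is projected into the per-framework shape each pre-run hook expects. The `UnifiedEntry` fields and the `manifestFor` helper are hypothetical names introduced for illustration.

```typescript
type Framework = 'cypress' | 'playwright' | 'unit';

// Hypothetical unified entry; one manifest serves every runner in the monorepo.
interface UnifiedEntry {
  framework: Framework;
  testId: string;
  reason: string;
}

// Projects the shared manifest down to the { quarantined: [...] } shape
// that the framework-specific hooks in this guide read.
function manifestFor(entries: UnifiedEntry[], framework: Framework): { quarantined: string[] } {
  const ids = entries.filter((e) => e.framework === framework).map((e) => e.testId);
  // Deduplicate while keeping the original order stable.
  return { quarantined: ids.filter((id, i) => ids.indexOf(id) === i) };
}

const unified: UnifiedEntry[] = [
  { framework: 'cypress', testId: 'cart updates total', reason: 'intermittent timeout' },
  { framework: 'playwright', testId: 'search returns results', reason: 'flaky on retry' },
  { framework: 'cypress', testId: 'cart updates total', reason: 'duplicate report' },
];

console.log(manifestFor(unified, 'cypress')); // one entry: 'cart updates total'
```

Each runner's CI job would then write only its own projection to disk, so the Cypress and Playwright hooks do not need to know about each other.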

Related guides

Child guides in this section

Auto-Quarantining Flaky Playwright Tests

How to Auto-Quarantine Flaky Cypress Tests