Operations
This section is the entry point for production operations.
What you will learn
- How to run go-live checks safely.
- How to monitor critical production signals.
- How to use replay in incident response and verification.
What this section covers
- Go-live readiness checks.
- Monitoring and SLO guardrails.
- Incident response with replay workflow.
- Repeatable operational runbooks.
- Release-safe rollback and evidence practices.
Task-driven operating paths
- Go live this week: start with Go-live Checklist, then continue with Monitoring and SLO.
- Investigate an incident now: start with Incident Response and Replay, then Runbooks.
- Build evidence for change approval: use Tutorial: Release Gate with Replay Evidence.
Operating model
flowchart LR
  A["Staging validation"] --> B["Core gate"] --> C["Go-live decision"] --> D["Production traffic"] --> E["Monitoring + replay checks"]
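The operating model above can be read as an ordered sequence of gates in which each stage must pass before the next one starts. The following is a minimal sketch of that reading, not the product's implementation. The stage names are taken from the flowchart; `run_pipeline` and the check callables are hypothetical stand-ins for whatever validation each stage performs in your environment.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage names are taken from the operating model flowchart.
STAGES: List[str] = [
    "Staging validation",
    "Core gate",
    "Go-live decision",
    "Production traffic",
    "Monitoring + replay checks",
]


def run_pipeline(
    checks: Dict[str, Callable[[], bool]],
) -> Tuple[List[str], Optional[str]]:
    """Walk the stages in order and stop at the first failing check.

    `checks` maps a stage name to a hypothetical callable that returns
    True when the stage passes. A stage with no registered check is
    treated as blocking, so nothing is promoted by omission.

    Returns (stages that passed, name of the blocking stage or None).
    """
    passed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage)
        if check is None or not check():
            return passed, stage
        passed.append(stage)
    return passed, None
```

Treating a missing check as a failure is a deliberate choice in this sketch: a stage that nobody has wired up should stop the rollout rather than let it through silently.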
Daily operator checklist
- Verify health status and key service metrics.
- Run policy sanity checks on a target scope.
- Validate one replay chain from recent traffic.
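The daily checklist lends itself to a small script that runs each item and records a result, so the outcome can be attached to a shift handover. The sketch below assumes nothing about the actual health, policy, or replay interfaces: `run_daily_checklist`, `all_passed`, and the check callables are hypothetical names, and each callable would wrap whatever command or API your deployment exposes.

```python
from typing import Callable, Dict, List, Tuple


def run_daily_checklist(
    checks: Dict[str, Callable[[], bool]],
) -> List[Tuple[str, str]]:
    """Run every check and return (name, status) pairs in order.

    Each value in `checks` is a hypothetical callable returning True on
    success. An exception is recorded as an ERROR instead of aborting,
    so one broken check does not hide the results of the others.
    """
    results: List[Tuple[str, str]] = []
    for name, check in checks.items():
        try:
            status = "PASS" if check() else "FAIL"
        except Exception as exc:  # record the failure and keep going
            status = f"ERROR: {exc}"
        results.append((name, status))
    return results


def all_passed(results: List[Tuple[str, str]]) -> bool:
    """True only when every recorded status is PASS."""
    return all(status == "PASS" for _, status in results)
```

A caller would register one callable per checklist item, for example under the keys "health and key metrics", "policy sanity on target scope", and "replay chain from recent traffic", and escalate when `all_passed` returns False.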
Weekly operator checklist
- Run governance and evidence workflows.
- Review SLO trends and incident noise.
- Confirm rollback and drill readiness.
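For the SLO review item in the weekly checklist, one common way to turn raw counts into a trend is the error-budget burn rate: the observed failure ratio divided by the failure ratio the SLO allows. This is a general SRE convention, not something this section prescribes. The function names and the example target below are assumptions for illustration.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 0.999 target."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed failure ratio divided by the allowed failure ratio.

    A value near 1.0 means the budget is being spent at exactly the
    sustainable pace; values above 1.0 mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    budget = error_budget(slo_target)
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return (failed / total) / budget
```

Comparing the weekly value against earlier weeks gives a simple trend line; the thresholds at which to act are a policy decision for your team.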
How to navigate this section
Use the left sidebar to move between checklists, monitoring, incident response, and runbooks.