Operations SSoT: strategy and runbooks
The single home for Parky's operational design: notifications, SLOs, incident response, observability, deployment, and security. Top-level strategy docs (the SSoT, single source of truth) sit above the individual runbooks, and every runbook follows them. Any new alert or runbook must conform to the rules in this chapter.
Operational design (SSoT)
These are strategy-level docs, and every individual runbook conforms to them. When adding a new notification, alert, or SLO, update the strategy doc first, then implement the individual runbook.
Notification strategy
Routes 4 severity tiers across 4 Discord channels, plus N business-event channels. Defines the per-source routing matrix, format standard, dedup rules, and anti-patterns in one place. §7 (Failure handling) also specifies a 3-layer defense: retry → DLQ → P0 email backup.
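To make the severity-to-channel routing and the dedup rule more concrete, here is a minimal TypeScript sketch. The channel names, the dedup key format, the dedup window, and every identifier below are assumptions for illustration only; the authoritative matrix is the one in the strategy doc.

```typescript
// Hypothetical sketch: channel names, key format, and window length are
// illustrative assumptions, not values taken from the strategy doc.
type AlertSeverity = "P0" | "P1" | "P2" | "P3";

interface AlertMessage {
  severity: AlertSeverity;
  source: string;   // e.g. "sentry" or "healthcheck"
  dedupKey: string; // stable key identifying "the same" alert
}

// One Discord channel per severity tier (placeholder names).
const CHANNEL_BY_SEVERITY: Record<AlertSeverity, string> = {
  P0: "#p0-example",
  P1: "#p1-example",
  P2: "#p2-example",
  P3: "#p3-example",
};

class AlertRouter {
  private readonly lastSentAt = new Map<string, number>();
  private readonly dedupWindowMs: number;

  constructor(dedupWindowMs: number) {
    this.dedupWindowMs = dedupWindowMs;
  }

  // Returns the target channel, or null when the same alert was already
  // sent within the dedup window and should be suppressed.
  routeAlert(msg: AlertMessage, nowMs: number): string | null {
    const key = `${msg.severity}:${msg.source}:${msg.dedupKey}`;
    const last = this.lastSentAt.get(key);
    if (last !== undefined && nowMs - last < this.dedupWindowMs) {
      return null;
    }
    this.lastSentAt.set(key, nowMs);
    return CHANNEL_BY_SEVERITY[msg.severity];
  }
}
```

Passing the clock in as `nowMs` keeps the dedup logic deterministic and easy to unit-test; a suppressed duplicate does not extend the window.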
SLO / Error budget
Per-service SLI/SLO targets, burn-rate alert thresholds (14.4× / 6×), and deploy-freeze criteria.
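The 14.4× and 6× multipliers match the multiwindow burn-rate alerts in the Google SRE Workbook: over a 30-day SLO period, spending 2% of the error budget in 1 hour corresponds to a 14.4× burn rate, and 5% in 6 hours to 6×. The sketch below shows that arithmetic. The 30-day period and the window lengths are assumptions here, since this page gives only the multipliers; the function names are hypothetical.

```typescript
// Burn rate = observed error rate / error rate the SLO allows.
function burnRate(errorRate: number, sloTarget: number): number {
  const allowed = 1 - sloTarget;
  if (allowed <= 0) throw new Error("sloTarget must be below 1");
  return errorRate / allowed;
}

// Burn rate at which `budgetFraction` of the error budget is consumed
// within `windowHours`, for an SLO period of `periodHours`.
function burnRateThreshold(
  budgetFraction: number,
  windowHours: number,
  periodHours: number,
): number {
  return (budgetFraction * periodHours) / windowHours;
}

const PERIOD_HOURS = 30 * 24; // assumed 30-day SLO period

// 2% of the budget in 1 hour: approximately 14.4
const fastBurnThreshold = burnRateThreshold(0.02, 1, PERIOD_HOURS);
// 5% of the budget in 6 hours: approximately 6
const slowBurnThreshold = burnRateThreshold(0.05, 6, PERIOD_HOURS);
```

For example, with a 99.9% SLO an observed error rate of 1.44% gives `burnRate(0.0144, 0.999)` of roughly 14.4, which would cross the fast-burn threshold.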
Incident response
Triage procedure when a P0/P1/P2 alert fires, first-response commands (wrangler tail / gh run / Supabase connectivity check), and escalation.
Postmortem template
SMART + blameless template, required after every P0 is resolved. Follows the Google SRE Workbook.
Observability: wiring up metrics and notifications
The wiring that makes the P0/P1/P2/P3 tiers of the notification strategy (SSoT) actually fire.
Sentry setup
Initial setup procedure for the Sentry org and projects. The dev Worker DSN is already configured and 12 alert rules are prepared.
Sentry alert rules
Definitions of the 12 alert rules (5xx burst, error budget burn, etc.). For each rule's severity routing, see notification-strategy.md.
Observability hookup
One consolidated procedure for the remaining Sentry channel DSNs, Honeycomb signup, and the Discord native integration.
Synthetic healthcheck
Probes /healthz on a 5-minute cron. Runs as a GitHub Actions cron, an interim measure until Cloudflare Health Checks is fully adopted.
OpenTelemetry
OTel collector wiring (Honeycomb, etc.). lib/otel.ts is complete and waiting for an endpoint to be configured.
Logging
Log level design, scope/resource structure, and createLogger usage patterns.
Logpush
Configuration for pushing Workers Logs to R2 (jobs 1606777 / 1606779).
Sentry/Logpush rollout
Phased rollout plan for Sentry + Logpush.
Analytics Engine
Usage policy for Cloudflare Analytics Engine (CAE), the aggregation backbone for SLO measurement and cost reports.
Deployment & infra
GitHub Environments
Setup of the production approval gate. Also covers the Free plan's limitations and the factors for deciding on a Team plan upgrade.
Deploy rollback
Procedures for wrangler rollback, Pages rollback, and DB migration revert.
Auto rollback (SLO burn)
A 5-minute cron monitors the SLO burn rate (error × 3 / p99 × 2); in dev it automatically fires scripts/deploy/rollback.sh api dev. The prod variant lives in auto-rollback-prod.yml with the same structure but is gated by vars.AUTO_ROLLBACK_PROD_ENABLED (initially disabled). Enabling prod will be considered only after dev has demonstrated 2 weeks with zero false positives.
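The decision the cron makes can be sketched as a small pure function. This is only one possible reading of "error × 3 / p99 × 2": it assumes the check compares the current window against a baseline, tripping when the error rate reaches 3× baseline or p99 latency reaches 2× baseline. The baseline comparison, the `minErrorRate` floor, and all names are assumptions, not the contents of the actual workflow.

```typescript
// Hypothetical sketch of the rollback decision; the real workflow may
// define "error × 3 / p99 × 2" differently.
interface WindowStats {
  errorRate: number; // fraction of failed requests, 0..1
  p99Ms: number;     // p99 latency in milliseconds
}

interface RollbackDecision {
  shouldRollback: boolean;
  reasons: string[];
}

function evaluateRollback(
  current: WindowStats,
  baseline: WindowStats,
  enabled: boolean,      // mirrors a gate such as AUTO_ROLLBACK_PROD_ENABLED
  minErrorRate = 0.001,  // assumed floor so a zero baseline cannot trip on noise
): RollbackDecision {
  const reasons: string[] = [];
  const errorBaseline = Math.max(baseline.errorRate, minErrorRate);
  if (current.errorRate >= errorBaseline * 3) reasons.push("error-rate");
  if (current.p99Ms >= baseline.p99Ms * 2) reasons.push("p99-latency");
  // When the gate is off, the reasons are still reported but nothing fires.
  return { shouldRollback: enabled && reasons.length > 0, reasons };
}
```

Returning the reasons even when the gate is disabled makes it possible to log would-be rollbacks, which is one way to count false positives before enabling prod.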
Chaos engineering
Runbook for the quarterly game day. Covers 6 scenarios, including Supabase pause, Hyperdrive disconnect, R2 failure, external API 5xx, and Auth outage, plus a recording template and the tooling still to be built (MSW / circuit breaker). chaos-fault-inject.yml posts drill signals to Discord.
Canary deploy
Staged rollout of 1 → 10 → 50 → 100% using the Cloudflare Workers Versions API. Enabled for dev/stg; prod is held back with if: false until the behavior is validated, after which the main thread turns the flag on.
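The stage progression itself is simple enough to express as a pure function. The sketch below assumes that an unhealthy stage aborts the canary by returning 0% for the new version; that abort behavior and the function name are assumptions, and the call that actually applies the percentage (via the Versions API) is deliberately left out.

```typescript
// Traffic percentages for the new version, in rollout order.
const CANARY_STAGES: readonly number[] = [1, 10, 50, 100];

// Returns the next traffic percentage for the new version.
// Assumption: an unhealthy stage aborts the rollout (0% to the new version).
function nextCanaryPercent(currentPercent: number, healthy: boolean): number {
  if (!healthy) return 0;
  for (const stage of CANARY_STAGES) {
    if (stage > currentPercent) return stage;
  }
  return 100; // already fully rolled out
}
```

Starting from 0, repeated healthy checks walk through 1, 10, 50, and 100; a failed check at any stage returns 0.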
Split worker deploy
Production cutover procedure for the 4 workers (public / admin / marketing / store-sync) decided in ADR-0010. Covers the deploy order (store-sync → public → admin → marketing), secret rollout (secret-keys-1p-map.json), smoke tests, and rollback. The legacy monolith [env.prod] was commented out on 2026-05-01, so npm run deploy:prod fails intentionally.
DORA Weekly Metrics
Every Monday at 09:00 JST, the 4 DORA metrics (Deployment Frequency / Lead Time / MTTR / Change Failure Rate) are aggregated from the GitHub API and posted to #p2-deploys as a Discord embed. Includes a threshold table and an interpretation for Parky's current phase.
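As a rough illustration of the aggregation step, the sketch below derives the four metrics from a list of deploy records. The record shape, the use of a median for lead time and a mean for MTTR, and all names are assumptions; how the real job maps GitHub API data onto these fields is not shown here.

```typescript
// Hypothetical record shape; the real job builds its data from the GitHub API.
interface DeployRecord {
  commitAt: number;    // epoch ms of the commit
  deployedAt: number;  // epoch ms of the deployment
  failed: boolean;     // deployment caused a failure in production
  restoredAt?: number; // epoch ms when service was restored (failed deploys)
}

interface DoraSummary {
  deploysPerDay: number;
  medianLeadTimeHours: number;
  changeFailureRate: number;
  mttrHours: number;
}

const HOUR_MS = 3_600_000;

function medianOf(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function meanOf(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function summarizeDora(records: DeployRecord[], windowDays: number): DoraSummary {
  const leadHours = records.map((r) => (r.deployedAt - r.commitAt) / HOUR_MS);
  const failedCount = records.filter((r) => r.failed).length;
  const recoveryHours = records
    .filter((r) => r.failed && r.restoredAt !== undefined)
    .map((r) => ((r.restoredAt ?? r.deployedAt) - r.deployedAt) / HOUR_MS);
  return {
    deploysPerDay: records.length / windowDays,
    medianLeadTimeHours: medianOf(leadHours),
    changeFailureRate: records.length === 0 ? 0 : failedCount / records.length,
    mttrHours: meanOf(recoveryHours),
  };
}
```

Empty inputs yield zeros rather than NaN, so a week with no deploys still produces a postable summary.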
Supply-chain security
Multi-layer defense using SBOM (CycloneDX), SLSA L3 provenance attestation, Dependency Review, and Sigstore. Also documents the SLSA level attained and the verification procedure (gh attestation verify / cosign).
Regional rollout
Future phased rollout strategy (Tokyo → Kanto → nationwide).
Terraform backend
Migration plan from local state to an R2 backend, which lays the groundwork for drift detection.
Supabase branching
Adoption plan for Supabase database branching, giving per-PR DB previews.
Build time scaling
CI time optimization (turbo cache / job parallelization / matrix consolidation).
Security
Secret rotation
Rotation procedures for SA / CF API token / Supabase / Firebase / R2 / webhook URLs.
Secret rotation runbook (SSoT)
Operational standard covering trigger, verification, and rollback for R2 / Stripe / FCM / Mapbox / Resend / Discord / OAuth.
Auth cookie policy
Policy on when to use SPA + Bearer versus SSR + cookie.
Leaked password protection
HIBP integration, login lockout, and rate-limit design.
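HIBP's Pwned Passwords range API works by k-anonymity: the client sends only the first 5 hex characters of the password's SHA-1 digest and receives lines of the form SUFFIX:COUNT for every digest sharing that prefix. The sketch below shows only the local split-and-match step. Whether Parky calls this API directly or relies on a provider-side integration is not stated on this page, so treat the function names and the flow as assumptions; hashing and the HTTP request are left out.

```typescript
// Splits a SHA-1 hex digest into the 5-character prefix that is sent to the
// range API and the 35-character suffix that never leaves the client.
function splitSha1ForRange(sha1Hex: string): { prefix: string; suffix: string } {
  const hex = sha1Hex.toUpperCase();
  if (!/^[0-9A-F]{40}$/.test(hex)) {
    throw new Error("expected a 40-character SHA-1 hex digest");
  }
  return { prefix: hex.slice(0, 5), suffix: hex.slice(5) };
}

// Scans a range response body ("SUFFIX:COUNT" per line) and returns how many
// times the given suffix appears in breaches, or 0 when it is absent.
function breachCount(rangeBody: string, suffix: string): number {
  for (const line of rangeBody.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) {
      return Number.parseInt(count ?? "0", 10);
    }
  }
  return 0;
}
```

A caller would reject or warn on any password whose `breachCount` is greater than zero; the exact policy threshold is a design decision for the runbook.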
SRI
Enforces Subresource Integrity on external CDN scripts such as Mapbox GL and the Sentry SDK.
Mobile security runbook
Operational handling of Flutter signing key management, cert pinning, and jailbreak detection.
Storage & performance
Cloudflare Images
Operational policy for image transformation and R2 integration, with AVIF/WebP/JPEG fallback.
Durable Objects
DO usage patterns (rate limiting / single-flight / coordination).
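Of the three patterns, rate limiting is the easiest to illustrate. A Durable Object suits it because all requests for one key reach a single instance, which can hold the counter in memory. The sketch below is a plain TypeScript class showing a fixed-window counter of the kind such an instance might keep; it does not use the Durable Objects API, and the class name, the fixed-window algorithm, and the parameters are assumptions rather than Parky's implementation.

```typescript
// Hypothetical fixed-window limiter holding the state that a single
// per-key instance would keep in memory.
class FixedWindowLimiter {
  private windowStartMs = 0;
  private count = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed, false when the limit for the
  // current window is already used up.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStartMs = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```

A fixed window allows short bursts at window boundaries; a sliding window or token bucket trades a little more state for smoother limits.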
Related
Operations-related audit results are kept in .work/parky/2026-04-29_001_parky_comprehensive_evaluation_v2.html (Operations axis 62/100) and .work/parky/2026-04-29_002_parky_notification_strategy.html (the HTML reference version of this chapter).