1. Why This Guide Exists
This guide walks the YieldGuard team through each step of handling serious incidents that could hurt our users' money, our app, or our legal status. The incidents we care about include hacked smart contracts, bad price feeds (oracles), broken or hacked servers, and customers who break KYC or sanctions rules.
2. How Bad Is the Problem? (Severity Levels)
| Level | What It Means | Examples | Time Goals |
|---|---|---|---|
| SEV-0 (Critical) | Money is being stolen right now, or the law stops us from operating. | Hacker drains a contract; evidence of a funds shortfall; our multisig wallet is compromised. | Confirm in 5 min, stop damage in 15 min, tell public in 1 hour |
| SEV-1 (High) | The app is breaking for most users, or a hacker could soon copy a working exploit. | Auto-pause triggered; serious bug-bounty report; leaked AWS keys. | Confirm in 15 min, stop damage in 1 hour, tell public in 4 hours |
| SEV-2 (Medium) | Annoying but not dangerous; can wait one day. | Slow indexer; small UI glitch; sanctions list update. | Confirm in 1 hour, fix in 24 hours |
| SEV-3 (Low) | Cosmetic mistakes. | Typos, broken link. | Fix in next sprint |
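The time goals in the table can be turned into concrete deadlines once an incident is declared. The sketch below is only an illustration: the names `TimeGoals`, `GOALS`, and `deadlines` are hypothetical, and treating the SEV-2 "fix in 24 hours" goal as the `contain` slot is an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TimeGoals:
    confirm: Optional[timedelta]   # acknowledge the alert
    contain: Optional[timedelta]   # stop the damage (assumed: "fix" for SEV-2)
    disclose: Optional[timedelta]  # first public statement


# Values copied from the severity table; None means the table sets no goal.
GOALS = {
    "SEV-0": TimeGoals(timedelta(minutes=5), timedelta(minutes=15), timedelta(hours=1)),
    "SEV-1": TimeGoals(timedelta(minutes=15), timedelta(hours=1), timedelta(hours=4)),
    "SEV-2": TimeGoals(timedelta(hours=1), timedelta(hours=24), None),
    "SEV-3": TimeGoals(None, None, None),  # handled in the next sprint
}


def deadlines(severity: str, detected_at: datetime) -> dict:
    """Return the absolute deadline for every goal the table defines."""
    goals = GOALS[severity]
    return {
        name: detected_at + delta
        for name, delta in vars(goals).items()
        if delta is not None
    }
```

For example, `deadlines("SEV-0", detected_at)` gives three timestamps that the IC could paste into the war-room channel.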
3. Who Does What
| Role | Main Person | Backup | Job |
|---|---|---|---|
| Incident Commander (IC) | On-call security engineer | Back-end dev lead | Picks severity, runs war-room, assigns tasks |
| Communications Lead | Head of Operations | CEO | Writes updates for staff, users, and press |
| Chain Lead | Smart-contract lead | CTO | Sends pause or emergency transactions |
| Infrastructure Lead | DevOps engineer | SRE | Fixes cloud servers, rotates keys |
| Compliance Liaison | Compliance officer | CFO | Tells regulators and bank partners |
| Scribe | Ops support | Any engineer | Writes timeline and keeps notes |
PagerDuty keeps an Incident Commander and an Infrastructure Lead on call 24/7.
4. Life of an Incident
4.1 Detect & Triage
- Alerts come from contracts, price feeds, AI risk scores, PagerDuty, bounty hunters, or user reports.
- IC checks the alert and sets a first severity level.
- Open Slack channel `#warroom-<id>` and Zoom call; invite leads.
- Scribe starts the incident log in Notion.
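The war-room naming and the Scribe's timeline could be sketched as follows. This is a minimal illustration, not our actual tooling: the helper `warroom_channel`, the `IncidentLog` class, and the incident ID format are all hypothetical. The slug rule reflects Slack's channel-name limits (lowercase letters, digits, hyphens, underscores, at most 80 characters).

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


def warroom_channel(incident_id: str) -> str:
    """Build a Slack-safe channel name of the form warroom-<id>."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", incident_id.lower()).strip("-")
    return f"warroom-{slug}"[:80]


@dataclass
class IncidentLog:
    """In-memory stand-in for the Scribe's timeline (hypothetical)."""
    incident_id: str
    severity: str
    entries: list = field(default_factory=list)

    def note(self, author: str, text: str) -> None:
        """Append a UTC-timestamped entry to the timeline."""
        self.entries.append((datetime.now(timezone.utc), author, text))
```

UTC timestamps keep the timeline unambiguous when responders sit in different time zones.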
4.2 Contain
- **Contract hack** – Chain Lead pauses deposits or calls the emergency withdraw function with a high gas price so the transaction lands quickly.
- **Bad price feed** – Pause deposits, switch to backup price, limit withdrawals.
- **Key leak** – Change all exposed secrets and kill old sessions.
- Check metrics to be sure the bleeding has stopped.
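One way to make "the bleeding has stopped" checkable is to look at recent TVL samples after the containment step. The sketch below assumes a hypothetical list of TVL readings (oldest first); the function name, window size, and tolerance are illustrative choices, not values from our monitoring setup.

```python
def bleeding_stopped(tvl_samples, window=3, tolerance=0.001):
    """Return True if TVL has been stable over the last `window` samples.

    Stable means no sample fell by more than `tolerance` (a fraction)
    compared with the sample just before it.
    """
    recent = tvl_samples[-window:]
    if len(recent) < window:
        return False  # not enough data to judge yet
    return all(
        later >= earlier * (1 - tolerance)
        for earlier, later in zip(recent, recent[1:])
    )
```

Returning `False` when there are too few samples errs on the side of keeping the pause in place.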
4.3 Eradicate & Fix
- Patch the code, run tests, and get a quick audit review.
- Deploy the upgrade with multisig. Skip the timelock only if funds are in danger.
- Rebuild any hacked servers with least‑privilege settings.
4.4 Recover & Verify
- Run the full test suite on staging.
- Simulate deposits and withdrawals; check net asset value (NAV).
- Remove the pause only when IC, Chain Lead, and Compliance agree.
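The unpause rule above amounts to a simple gate: tests pass, NAV checks out, and all three sign-offs are present. A minimal sketch, with hypothetical names (`REQUIRED_APPROVERS`, `can_unpause`) and role labels used as plain strings:

```python
REQUIRED_APPROVERS = frozenset({"IC", "Chain Lead", "Compliance"})


def can_unpause(approvals, tests_passed: bool, nav_ok: bool) -> bool:
    """Allow the unpause only when every check and every sign-off is in."""
    return tests_passed and nav_ok and REQUIRED_APPROVERS <= set(approvals)
```

Extra approvals do no harm, but a missing one always blocks the unpause.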
4.5 Learn & Improve
- Draft a post-mortem within 48 hours.
- Hold a blame-free meeting: what broke, what worked, what to improve.
- Publish the report to the repo and Statuspage.
- Track action items in JIRA and close on time.
5. Communication Plan
5.1 Inside the Team
- Talk in Slack `#warroom-<id>`.
- Zoom call is recorded.
- IC posts status every hour during SEV-0/1.
5.2 Outside
| Channel | Who Sees It | How Often |
|---|---|---|
| Statuspage | Everyone | First post in 1 hour for SEV-0/1; then every 2 hours |
| Email | Investors | First email in 2 hours; summary when fixed |
| Twitter/X | Crypto community | Post after Statuspage to avoid front-running |
| Discord | Users | Mirror Statuspage updates |
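The Statuspage cadence for SEV-0/1 (first post within 1 hour, then every 2 hours) can be laid out as a list of due times. This is a rough sketch; `statuspage_due_times` is a hypothetical helper, and it assumes updates stop once the incident is resolved.

```python
from datetime import datetime, timedelta


def statuspage_due_times(declared_at: datetime, resolved_at: datetime) -> list:
    """List the latest acceptable time for each Statuspage update."""
    due = declared_at + timedelta(hours=1)  # first post within 1 hour
    times = []
    while due < resolved_at:
        times.append(due)
        due += timedelta(hours=2)  # then every 2 hours
    return times
```

Twitter/X and Discord posts would follow each Statuspage update rather than run on their own schedule.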
5.3 Regulators & Partners
- Call and email AIFM & Depositary within 2 hours (SEV-0/1).
- Send an incident form to CSSF within 24 hours if NAV is hit.
- Call police only for crime (theft, extortion).
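The three notification rules can be expressed as a small decision helper for the Compliance Liaison. The function name and the tuple layout (action, deadline in hours) are assumptions for this sketch; the deadlines themselves come from the list above.

```python
def regulator_actions(severity: str, nav_affected: bool, crime_suspected: bool) -> list:
    """Return (action, deadline_in_hours) pairs; None means no fixed deadline."""
    actions = []
    if severity in ("SEV-0", "SEV-1"):
        actions.append(("Call and email AIFM and Depositary", 2))
    if nav_affected:
        actions.append(("Send incident form to CSSF", 24))
    if crime_suspected:  # theft or extortion
        actions.append(("Call police", None))
    return actions
```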
6. Quick Playbooks
6.1 Smart-Contract Hack
- Pause deposits; record TVL.
- Trace the attack on Tenderly; estimate loss risk.
- Patch and deploy the fix (skip the timelock only if funds are still leaking).
- Ask white-hat helpers if a rescue is needed.
6.2 Oracle Trouble
- Auto-pause.
- Switch to the last good price.
- Talk to Chainlink; watch proof-of-reserve.
- Resume when the feed is fresh (<15 min) and proof is good.
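The resume condition (feed younger than 15 minutes and proof-of-reserve good) could be checked like this. The sketch assumes the feed's last-update time is available as a timezone-aware `datetime`; `can_resume` and `MAX_FEED_AGE` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

MAX_FEED_AGE = timedelta(minutes=15)


def can_resume(feed_updated_at: datetime, proof_of_reserve_ok: bool, now=None) -> bool:
    """True when the feed is fresh (< 15 min old) and proof-of-reserve is good."""
    now = now or datetime.now(timezone.utc)
    age = now - feed_updated_at
    # A negative age means the feed timestamp is in the future: treat as bad.
    return proof_of_reserve_ok and timedelta(0) <= age < MAX_FEED_AGE
```

Passing `now` explicitly makes the check easy to replay against a recorded incident timeline.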
6.3 KYC / Sanctions Issue
- When the screening bot flags a wallet, burn its KYC pass.
- Freeze deposits; notify AIFM.
- File a SAR if the wallet is on the OFAC/UN list.
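As a rough illustration of the playbook, the helper below returns the steps for a wallet the bot has already flagged and adds the SAR step when the address appears on a sanctions list. The function name and the `sanctioned` collection are hypothetical; in practice the list would come from the OFAC/UN data our screening bot uses.

```python
def flagged_wallet_steps(address: str, sanctioned) -> list:
    """Return the playbook steps for a wallet flagged by the screening bot."""
    # Compare case-insensitively, since hex addresses may be checksummed.
    on_list = address.lower() in {a.lower() for a in sanctioned}
    steps = ["burn KYC pass", "freeze deposits", "notify AIFM"]
    if on_list:
        steps.append("file SAR")
    return steps
```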
6.4 Server Hack
- Revoke IAM keys; rotate all secrets.
- Save a forensic snapshot; give it to the Incident Commander (the on-call security engineer).
- Re-deploy clean infrastructure with Terraform.
7. Tools We Use
- PagerDuty – alerts and on-call
- Slack – chat and war-room creation
- Statuspage.io – public status
- Safe (Gnosis) – multisig actions
- Tenderly / Etherscan – trace transactions
- Grafana & Prometheus – metrics
- Notion – logs and post-mortems
- JIRA – track fixes
8. Goals & Metrics
| Metric | Target |
|---|---|
| Detect Time | <5 min for SEV-0 |
| Contain Time | <15 min for SEV-0 |
| Post-mortem done | 100% within 48h |
| Action items closed | 90% within 30 days |
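The last two targets are percentages, so they can be computed from simple per-incident records. The sketch below is an assumption about how such records might look (hours until the post-mortem, days until each action item closed, with `None` for anything still open); `fraction_within` and `scorecard` are hypothetical names.

```python
def fraction_within(values, limit) -> float:
    """Share of values that are present and <= limit; None counts as a miss."""
    if not values:
        return 1.0  # nothing was due, so nothing was missed
    hits = sum(1 for v in values if v is not None and v <= limit)
    return hits / len(values)


def scorecard(postmortem_hours, action_item_days) -> dict:
    """Compare post-mortem and action-item timing with the targets above."""
    pm = fraction_within(postmortem_hours, 48)
    ai = fraction_within(action_item_days, 30)
    return {
        "postmortems_on_time": pm,
        "postmortem_target_met": pm == 1.0,  # target: 100% within 48h
        "action_items_on_time": ai,
        "action_item_target_met": ai >= 0.9,  # target: 90% within 30 days
    }
```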