Why Use This This skill provides specialized capabilities for korallis's codebase.
Use Cases Developing new features in the korallis repository Refactoring existing code to follow korallis standards Understanding and working with korallis's codebase structure
Install Guide 2 steps 1 2 Install inside Ananke
Click Install Skill, paste the link below, then press Install.
https://github.com/korallis/Droidz/tree/main/droidz_installer/payloads/claude/default/skills/incident-response Skill Snapshot Auto scan of skill assets. Informational only.
Valid SKILL.md Checks against SKILL.md specification
Source & Community
Updated At Nov 26, 2025, 11:09 PM
Skill Stats
SKILL.md 114 Lines
Total Files 1
Total Size 0 B
License NOASSERTION
---
name: incident-response
description: Respond to production incidents systematically with triage, investigation, resolution, and post-mortem analysis to minimize downtime and prevent recurrence. Use when handling production outages, triaging incidents, investigating critical bugs, coordinating incident response, implementing hotfixes, conducting post-mortems, or establishing incident response procedures.
---
# Incident Response - Production Issue Management
## When to use this skill
- Responding to production outages
- Triaging critical incidents
- Investigating high-severity bugs
- Coordinating incident response teams
- Implementing emergency hotfixes
- Conducting post-mortem analyses
- Establishing incident response procedures
- Communicating status during incidents
- Creating runbooks for common issues
- Implementing rollback strategies
- Documenting incident timelines
- Preventing incident recurrence
## When to use this skill
- Responding to outages, managing incidents, conducting postmortems.
- When working on related tasks or features
- During development that requires this expertise
**Use when**: Responding to outages, managing incidents, conducting postmortems.
## Incident Response Process
### 1. Detect
- Monitoring alerts
- User reports
- Automated checks
### 2. Triage
- Assess severity (P0-P4)
- Page on-call engineer
- Create incident channel
### 3. Mitigate
- Rollback to last known good
- Scale resources
- Apply hotfix
- Communicate status
### 4. Resolve
- Verify fix
- Monitor metrics
- Update status page
- Close incident
### 5. Postmortem
- Timeline of events
- Root cause analysis
- Action items
- Follow-up tasks
## Severity Levels
- **P0 (Critical)**: Complete outage, data loss
- **P1 (High)**: Major feature broken, revenue impact
- **P2 (Medium)**: Degraded performance, workaround exists
- **P3 (Low)**: Minor bug, cosmetic issue
- **P4 (Informational)**: Enhancement request
## Example Runbook
\`\`\`markdown
# High CPU Usage Runbook
## Symptoms
- Server CPU > 90%
- Slow response times
- Request timeouts
## Investigation
1. Check top processes: \`top\`
2. Check memory: \`free -h\`
3. Check logs: \`tail -f app.log\`
## Mitigation
1. Scale horizontally: Add servers
2. Restart service: \`systemctl restart app\`
3. Rate limit: Enable aggressive rate limiting
## Resolution
1. Identify root cause (N+1 query, memory leak, etc.)
2. Deploy fix
3. Monitor for 1 hour
\`\`\`
## Communication Template
\`\`\`
[INCIDENT] Service X degraded
Status: Investigating
Impact: 20% of users seeing slow load times
ETA: 30 minutes
Updates:
- 10:00 AM: Issue detected
- 10:05 AM: On-call paged, investigation started
- 10:15 AM: Root cause identified (database bottleneck)
- 10:30 AM: Fix deployed, monitoring
Next update: 11:00 AM
\`\`\`
## Resources
- [Incident Management Guide](https://www.pagerduty.com/resources/learn/what-is-incident-management/)
- [Postmortem Template](https://github.com/dastergon/postmortem-templates)
- [PagerDuty](https://www.pagerduty.com/)