Why Use This
This skill provides specialized capabilities for jeremylongshore's codebase.
Use Cases
- Developing new features in the jeremylongshore repository
- Refactoring existing code to follow jeremylongshore standards
- Understanding and working with jeremylongshore's codebase structure
Install Guide
2 steps - 1
- 2
Install inside Ananke
Click Install Skill, paste the link below, then press Install.
https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/replit-pack/skills/replit-incident-runbook
Skill Snapshot
Auto scan of skill assets. Informational only.
Valid SKILL.md
Checks against SKILL.md specification
Source & Community
Updated At Apr 3, 2026, 03:47 AM
Skill Stats
SKILL.md 253 Lines
Total Files 1
Total Size 7.7 KB
License MIT
---
name: replit-incident-runbook
description: |
Execute Replit incident response: triage deployment failures, database issues, and platform outages.
Use when responding to Replit-related outages, investigating deployment crashes,
or running post-incident reviews for Replit app failures.
Trigger with phrases like "replit incident", "replit outage",
"replit down", "replit emergency", "replit broken", "replit crash".
allowed-tools: Read, Grep, Bash(curl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
compatible-with: claude-code, codex, openclaw
tags: [saas, replit, incident-response, debugging]
---
# Replit Incident Runbook
## Overview
Rapid incident response for Replit deployment failures, database issues, and platform outages. Covers triage, diagnosis, remediation, rollback, and communication.
## Prerequisites
- Access to Replit Workspace and Deployment settings
- Deployment URL for health checks
- Communication channel (Slack, email)
- Rollback awareness (Deployment History)
## Severity Levels
| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete outage | < 15 min | App returns 5xx, DB down |
| P2 | Degraded service | < 1 hour | Slow responses, intermittent errors |
| P3 | Minor impact | < 4 hours | Non-critical feature broken |
| P4 | No user impact | Next business day | Monitoring gap |
## Quick Triage (First 5 Minutes)
```bash
set -euo pipefail
DEPLOY_URL="https://your-app.replit.app"
echo "=== TRIAGE ==="
# 1. Check Replit platform status
echo -n "Replit Status: "
curl -s https://status.replit.com/api/v2/summary.json | \
python3 -c "import sys,json;print(json.load(sys.stdin)['status']['description'])" 2>/dev/null || \
echo "Check https://status.replit.com"
# 2. Check your deployment health
echo -n "App Health: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)" "$DEPLOY_URL/health" 2>/dev/null || echo "UNREACHABLE"
echo ""
# 3. Get health details
echo "Health Response:"
curl -s "$DEPLOY_URL/health" 2>/dev/null | python3 -m json.tool 2>/dev/null || echo "No response"
# 4. Check if it's a cold start issue (Autoscale)
echo -n "Second request: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)\n" "$DEPLOY_URL/health"
```
## Decision Tree
```
App not responding?
├─ YES: Is status.replit.com reporting an incident?
│ ├─ YES → Platform issue. Wait for Replit. Communicate to users.
│ └─ NO → Your deployment issue. Continue below.
│
│ Can you access the Replit Workspace?
│ ├─ YES → Check deployment logs:
│ │ ├─ Build error → Fix code, redeploy
│ │ ├─ Runtime crash → Check logs, fix, redeploy
│ │ └─ Secret missing → Add to Secrets tab, redeploy
│ └─ NO → Network/browser issue. Try incognito window.
│
└─ App responds but with errors?
├─ 5xx errors → Check logs for crash/exception
├─ Slow responses → Check database, cold start, memory
└─ Auth not working → Verify deployment domain, not dev URL
```
## Remediation by Error Type
### Deployment Crash (5xx / App Unreachable)
```markdown
1. Open Replit Workspace
2. Go to Deployment Settings > Logs
3. Look for the crash reason:
- "Error: Cannot find module..." → Missing dependency
- "FATAL: Missing secrets..." → Add to Secrets tab
- "EADDRINUSE" → Port conflict in .replit config
- "JavaScript heap out of memory" → Increase VM size or fix memory leak
4. Fix the issue in code
5. Click "Deploy" to redeploy
6. If fix is unclear, ROLLBACK:
- Deployment Settings > History
- Click "Rollback" on last known-good version
```
### Database Connection Failure
```markdown
1. Check database status in Database pane
2. Verify DATABASE_URL is set in Secrets
3. Test connection:
```
```bash
# From Replit Shell
node -e "
const {Pool} = require('pg');
const pool = new Pool({connectionString: process.env.DATABASE_URL, ssl:{rejectUnauthorized:false}});
pool.query('SELECT NOW()').then(r => console.log('OK:', r.rows[0])).catch(e => console.error('FAIL:', e.message)).finally(() => pool.end());
"
```
```markdown
4. If connection fails:
- Check if PostgreSQL is provisioned (Database pane)
- Try creating a new database
- Check for connection pool exhaustion (max connections)
```
### Cold Start Too Slow (Autoscale)
```markdown
If cold starts exceed acceptable latency:
1. Check deployment type: Autoscale scales to zero
2. Options:
a. Switch to Reserved VM (always-on, no cold starts)
b. Set up external keep-alive (ping /health every 4 min)
c. Optimize startup: lazy imports, defer DB connection
3. To switch:
- Update .replit: deploymentTarget = "cloudrun"
- Redeploy
```
### Secrets Missing After Deploy
```markdown
1. Open Secrets tab (lock icon in sidebar)
2. Verify all required secrets are present
3. Check Deployment Settings > Environment Variables
4. Secrets should auto-sync (2025+), but if not:
- Remove and re-add the secret
- Redeploy
5. For Account-level secrets:
- Account Settings > Secrets
- These apply to ALL Repls
```
## Rollback Procedure
```markdown
Replit supports one-click rollback to any previous deployment:
1. Deployment Settings > History
2. Find the last successful deployment
3. Click "Rollback to this version"
4. Verify health endpoint
5. Investigate root cause before redeploying fix
Rollback restores:
- Code at that deployment's commit
- Deployment configuration at that time
- Does NOT rollback database changes
```
## Communication Templates
### Internal (Slack)
```
P[1-4] INCIDENT: [App Name] on Replit
Status: INVESTIGATING / IDENTIFIED / MONITORING / RESOLVED
Impact: [What users are experiencing]
Cause: [If known]
Action: [What we're doing]
ETA: [When we expect resolution]
Next update: [Time]
```
### External (Status Page)
```
[App Name] Service Disruption
We are experiencing issues with [specific feature/service].
[Describe user impact].
We have identified the cause and are working on a fix.
Estimated resolution: [time].
Last updated: [timestamp]
```
## Post-Incident
### Evidence Collection
```bash
set -euo pipefail
# Capture deployment logs
# Go to Deployment Settings > Logs > Copy relevant entries
# Capture timeline
echo "Timeline of events:" > incident-report.md
echo "- [time] Issue detected" >> incident-report.md
echo "- [time] Investigation started" >> incident-report.md
echo "- [time] Root cause identified" >> incident-report.md
echo "- [time] Fix deployed / rollback executed" >> incident-report.md
echo "- [time] Service restored" >> incident-report.md
```
### Postmortem Template
```markdown
## Incident: [Title]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]
### Summary
[1-2 sentence description of what happened]
### Root Cause
[Technical explanation]
### Timeline
- HH:MM — First alert
- HH:MM — Investigation started
- HH:MM — Root cause found
- HH:MM — Fix deployed / rollback
- HH:MM — Service restored
### Impact
- Users affected: [N]
- Downtime: [duration]
### Action Items
- [ ] [Prevention measure] — Owner — Due date
```
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Can't access Workspace | Replit outage | Use status.replit.com, wait |
| Rollback not available | No previous deployments | Fix forward, deploy fix |
| Logs too short | Container restarted | Set up external log aggregator |
| DB rollback needed | Bad migration | Restore from Replit DB snapshot |
## Resources
- [Replit Status](https://status.replit.com)
- [Deployment Rollbacks](https://blog.replit.com/introducing-deployment-rollbacks)
- [Monitoring Deployments](https://docs.replit.com/cloud-services/deployments/monitoring-a-deployment)
- [Replit Support](https://replit.com/support)
## Next Steps
For data handling patterns, see `replit-data-handling`.