perplexity-incident-runbook by jeremylongshore
DevOps
945 Stars
114 Forks
Updated Jan 11, 2026, 10:30 PM
Why Use This
This skill provides specialized capabilities for jeremylongshore's codebase.
Use Cases
- Developing new features in the jeremylongshore repository
- Refactoring existing code to follow jeremylongshore standards
- Understanding and working with jeremylongshore's codebase structure
Skill Snapshot
Auto scan of skill assets. Informational only.
Valid SKILL.md
Checks against SKILL.md specification
Source & Community
Repository claude-code-plugins-plus-skills
Skill Version
main
Skill Stats
SKILL.md 203 Lines
Total Files 1
Total Size 0 B
License MIT
---
name: perplexity-incident-runbook
description: |
  Execute Perplexity incident response procedures with triage, mitigation, and postmortem.
  Use when responding to Perplexity-related outages, investigating errors, or running
  post-incident reviews for Perplexity integration failures.
  Trigger with phrases like "perplexity incident", "perplexity outage", "perplexity down",
  "perplexity on-call", "perplexity emergency", "perplexity broken".
allowed-tools: Read, Grep, Bash(kubectl:*), Bash(curl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Perplexity Incident Runbook

## Overview
Rapid incident response procedures for Perplexity-related outages.

## Prerequisites
- Access to Perplexity dashboard and status page
- kubectl access to production cluster
- Prometheus/Grafana access
- Communication channels (Slack, PagerDuty)

## Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete outage | < 15 min | Perplexity API unreachable |
| P2 | Degraded service | < 1 hour | High latency, partial failures |
| P3 | Minor impact | < 4 hours | Webhook delays, non-critical errors |
| P4 | No user impact | Next business day | Monitoring gaps |

## Quick Triage

```bash
# 1. Check Perplexity status
#    (jq needs a JSON response; if the page returns HTML, open it in a browser instead)
curl -s https://status.perplexity.com | jq .

# 2. Check our integration health
curl -s https://api.yourapp.com/health | jq '.services.perplexity'

# 3. Check error rate (last 5 min)
#    -G with --data-urlencode keeps the shell and curl from interpreting ( ) [ ]
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(perplexity_errors_total[5m])'

# 4. Recent error logs
kubectl logs -l app=perplexity-integration --since=5m | grep -i error | tail -20
```

## Decision Tree

```
Perplexity API returning errors?
├─ YES: Is status.perplexity.com showing an incident?
│   ├─ YES → Wait for Perplexity to resolve. Enable fallback.
│   └─ NO → Our integration issue. Check credentials, config.
└─ NO: Is our service healthy?
    ├─ YES → Likely resolved or intermittent. Monitor.
    └─ NO → Our infrastructure issue. Check pods, memory, network.
```

## Immediate Actions by Error Type

### 401/403 - Authentication
```bash
# Verify API key is set
# (this prints the key in plaintext; avoid running it on a shared screen)
kubectl get secret perplexity-secrets -o jsonpath='{.data.api-key}' | base64 -d

# Check if key was rotated
# → Verify in Perplexity dashboard

# Remediation: Update secret and restart pods
kubectl create secret generic perplexity-secrets --from-literal=api-key=NEW_KEY \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/perplexity-integration
```

### 429 - Rate Limited
```bash
# Check rate limit headers (body discarded, headers read from curl's verbose output)
curl -sv -o /dev/null https://api.perplexity.com 2>&1 | grep -i rate

# Enable request queuing
kubectl set env deployment/perplexity-integration RATE_LIMIT_MODE=queue

# Long-term: Contact Perplexity for limit increase
```

### 500/503 - Perplexity Errors
```bash
# Enable graceful degradation
kubectl set env deployment/perplexity-integration PERPLEXITY_FALLBACK=true

# Notify users of degraded service
# Update status page

# Monitor Perplexity status for resolution
```

## Communication Templates

### Internal (Slack)
```
🔴 P1 INCIDENT: Perplexity Integration
Status: INVESTIGATING
Impact: [Describe user impact]
Current action: [What you're doing]
Next update: [Time]
Incident commander: @[name]
```

### External (Status Page)
```
Perplexity Integration Issue

We're experiencing issues with our Perplexity integration.
Some users may experience [specific impact].

We're actively investigating and will provide updates.

Last updated: [timestamp]
```

## Post-Incident

### Evidence Collection
```bash
# Generate debug bundle
./scripts/perplexity-debug-bundle.sh

# Export relevant logs
kubectl logs -l app=perplexity-integration --since=1h > incident-logs.txt

# Capture metrics for the last 2 hours
# (query_range requires start, end, and step; times are Unix timestamps)
now=$(date +%s)
curl -sG 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=perplexity_errors_total' \
  --data-urlencode "start=$((now - 7200))" \
  --data-urlencode "end=${now}" \
  --data-urlencode 'step=60s' > metrics.json
```

### Postmortem Template
```markdown
## Incident: Perplexity [Error Type]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]

### Summary
[1-2 sentence description]

### Timeline
- HH:MM - [Event]
- HH:MM - [Event]

### Root Cause
[Technical explanation]

### Impact
- Users affected: N
- Revenue impact: $X

### Action Items
- [ ] [Preventive measure] - Owner - Due date
```

## Instructions

### Step 1: Quick Triage
Run the triage commands to identify the issue source.

### Step 2: Follow Decision Tree
Determine whether the issue is Perplexity-side or internal.

### Step 3: Execute Immediate Actions
Apply the appropriate remediation for the error type.

### Step 4: Communicate Status
Update internal and external stakeholders.

## Output
- Issue identified and categorized
- Remediation applied
- Stakeholders notified
- Evidence collected for postmortem

## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Can't reach status page | Network issue | Use a mobile connection or VPN |
| kubectl fails | Auth expired | Re-authenticate |
| Metrics unavailable | Prometheus down | Check backup metrics |
| Secret rotation fails | Permission denied | Escalate to admin |

## Examples

### One-Line Health Check
```bash
# jq -e exits non-zero on null, false, or empty input, so a failed curl also prints UNHEALTHY
curl -sf https://api.yourapp.com/health | jq -e '.services.perplexity.status' || echo "UNHEALTHY"
```

## Resources
- [Perplexity Status Page](https://status.perplexity.com)
- [Perplexity Support](https://support.perplexity.com)

## Next Steps
For data handling, see `perplexity-data-handling`.
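
## Appendix: Triage Helper Sketch

The decision tree and the per-error-type actions in this runbook can be sketched as two small shell functions, which may be handy for wiring the runbook into an on-call script. This is a minimal sketch, not part of the original skill: the function names `triage_decision` and `action_for_status` and the `yes`/`no` argument convention are assumptions made for illustration. The branch texts, status codes, and environment variable names are taken from the runbook itself.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: names and argument conventions are illustrative assumptions.

# triage_decision API_ERRORS STATUS_INCIDENT SERVICE_HEALTHY
# Each argument is "yes" or "no". Prints the action the decision tree recommends.
# STATUS_INCIDENT is only consulted when API_ERRORS is "yes";
# SERVICE_HEALTHY is only consulted when API_ERRORS is "no".
triage_decision() {
  local api_errors="$1" status_incident="$2" service_healthy="$3"
  if [ "$api_errors" = "yes" ]; then
    if [ "$status_incident" = "yes" ]; then
      echo "Wait for Perplexity to resolve. Enable fallback."
    else
      echo "Our integration issue. Check credentials, config."
    fi
  else
    if [ "$service_healthy" = "yes" ]; then
      echo "Likely resolved or intermittent. Monitor."
    else
      echo "Our infrastructure issue. Check pods, memory, network."
    fi
  fi
}

# action_for_status HTTP_CODE
# Maps an HTTP status code to the matching "Immediate Actions" section.
action_for_status() {
  case "$1" in
    401|403) echo "Authentication: verify the API key, update the secret, restart pods." ;;
    429)     echo "Rate limited: enable request queuing (RATE_LIMIT_MODE=queue)." ;;
    500|503) echo "Perplexity error: enable fallback (PERPLEXITY_FALLBACK=true)." ;;
    *)       echo "No runbook entry for status $1." ;;
  esac
}
```

After sourcing the file, `triage_decision yes no yes` selects the "Our integration issue" branch and `action_for_status 429` points to the rate-limit section. The functions only print the recommended action; running the actual `kubectl` commands is left to the operator.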