Why Use This This skill provides specialized capabilities for yennanliu's codebase.
Use Cases Developing new features in the yennanliu repository Refactoring existing code to follow yennanliu standards Understanding and working with yennanliu's codebase structure
Install Guide 2 steps 1 2 Install inside Ananke
Click Install Skill, paste the link below, then press Install.
https://github.com/yennanliu/CS_basics/tree/master/.claude/skills/system-architecture Skill Snapshot Auto scan of skill assets. Informational only.
Valid SKILL.md Checks against SKILL.md specification
Source & Community
Updated At Jan 18, 2026, 09:31 AM
Skill Stats
SKILL.md 349 Lines
Total Files 1
Total Size 0 B
License NOASSERTION
---
name: system-architecture
description: System design and architecture expert for creating scalable distributed systems. Covers system design interviews, architecture patterns, and real-world case studies like Netflix, Twitter, Uber. Use when designing systems, writing architecture docs, or preparing for system design interviews.
allowed-tools: Read, Glob, Grep, Write
---
# System Architecture Expert
## When to use this Skill
Use this Skill when:
- Designing distributed systems
- Writing system design documentation
- Preparing for system design interviews
- Creating architecture diagrams
- Analyzing trade-offs between design choices
- Reviewing or improving existing system designs
## System Design Framework
### 1. Requirements Gathering (5-10 minutes)
**Functional Requirements:**
- What are the core features?
- What actions can users perform?
- What are the inputs and outputs?
**Non-Functional Requirements:**
- Scale: How many users? How much data?
- Performance: Latency requirements? (p50, p95, p99)
- Availability: What uptime is needed? (99.9%, 99.99%)
- Consistency: Strong or eventual consistency?
**Constraints:**
- Budget limitations
- Technology stack constraints
- Team expertise
- Timeline
**Example Questions:**
```
- How many daily active users?
- What's the read:write ratio?
- What's the average data size?
- What's the peak load vs average load?
- Do we need real-time updates?
- Can we have data loss?
```
### 2. Capacity Estimation (Back-of-the-envelope)
**Calculate:**
```
Traffic:
- DAU = 100M users
- Each user makes 10 requests/day
- QPS = 100M * 10 / 86400 ≈ 11,574 QPS
- Peak QPS = 2-3x average ≈ 30,000 QPS
Storage:
- 100M users * 1KB per user = 100GB
- With 3x replication = 300GB
- Growth: 300GB * 365 days = 109.5TB/year
Bandwidth:
- QPS * average request size
- 11,574 * 10KB = 115.74MB/s
```
**Memory/Cache:**
- 80-20 rule: 20% of data gets 80% of traffic
- Cache = 20% of total data for hot data
### 3. High-Level Design
**Core Components:**
1. **Client Layer** (Web, Mobile, Desktop)
2. **API Gateway / Load Balancer**
3. **Application Servers** (Business logic)
4. **Cache Layer** (Redis, Memcached)
5. **Database** (SQL, NoSQL, or both)
6. **Message Queue** (Kafka, RabbitMQ)
7. **Object Storage** (S3, GCS)
8. **CDN** (CloudFront, Akamai)
**Draw Architecture:**
```
[Clients] → [CDN]
↓
[Load Balancer]
↓
[Application Servers]
↙ ↓ ↘
[Cache] [DB] [Queue] → [Workers]
↓
[Object Storage]
```
### 4. Database Design
**SQL vs NoSQL Decision:**
**Use SQL when:**
- ACID transactions required
- Complex queries with JOINs
- Structured data with relationships
- Examples: PostgreSQL, MySQL
**Use NoSQL when:**
- Massive scale (horizontal scaling)
- Flexible schema
- High write throughput
- Examples: Cassandra, DynamoDB, MongoDB
**Sharding Strategy:**
- Hash-based: `user_id % num_shards`
- Range-based: Users 1-100M on shard 1
- Geographic: US users on US shard
- Consistent hashing: For even distribution
**Schema Design:**
```sql
-- Example: URL Shortener
CREATE TABLE urls (
id BIGSERIAL PRIMARY KEY,
short_url VARCHAR(10) UNIQUE NOT NULL,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
click_count INT DEFAULT 0,
INDEX (short_url),
INDEX (user_id)
);
```
### 5. Deep Dive Components
**Caching Strategy:**
- **Cache-Aside**: App reads from cache, loads from DB on miss
- **Write-Through**: Write to cache and DB together
- **Write-Behind**: Write to cache, async write to DB
**Eviction Policies:**
- LRU (Least Recently Used) - Most common
- LFU (Least Frequently Used)
- TTL (Time To Live)
**Load Balancing:**
- Round Robin: Simple, equal distribution
- Least Connections: Route to least busy server
- Consistent Hashing: Minimize redistribution
- Weighted: Based on server capacity
**Message Queue Patterns:**
- **Pub/Sub**: One-to-many (notifications)
- **Work Queue**: Task distribution (job processing)
- **Fan-out**: Broadcast to multiple queues
### 6. Scalability Patterns
**Horizontal Scaling:**
- Add more servers
- Use load balancers
- Stateless application servers
- Session stored in cache/DB
**Vertical Scaling:**
- Add more CPU/RAM to servers
- Limited by hardware
- Simpler but has limits
**Microservices:**
```
Monolith:
[Single App] → [DB]
Microservices:
[User Service] → [User DB]
[Post Service] → [Post DB]
[Feed Service] → [Feed DB]
```
**Benefits:**
- Independent scaling
- Technology flexibility
- Fault isolation
**Drawbacks:**
- Increased complexity
- Network latency
- Distributed transactions
### 7. Reliability & Availability
**Replication:**
- Master-Slave: One writer, multiple readers
- Master-Master: Multiple writers (conflict resolution needed)
- Multi-region: Geographic redundancy
**Failover:**
- Active-Passive: Standby server takes over
- Active-Active: Both servers handle traffic
**Rate Limiting:**
- Token bucket algorithm
- Leaky bucket algorithm
- Fixed window counter
- Sliding window log
**Circuit Breaker:**
```
States:
Closed → Normal operation
Open → Reject requests immediately
Half-Open → Test if service recovered
```
### 8. Common System Design Patterns
**Content Delivery:**
- Use CDN for static assets
- Geo-distributed edge servers
- Cache at edge locations
**Data Consistency:**
- **Strong Consistency**: Read reflects latest write (ACID)
- **Eventual Consistency**: Reads eventually reflect write (BASE)
- **CAP Theorem**: Choose 2 of 3: Consistency, Availability, Partition Tolerance
**API Design:**
```
RESTful:
GET /api/users/{id}
POST /api/users
PUT /api/users/{id}
DELETE /api/users/{id}
GraphQL:
query {
user(id: "123") {
name
posts {
title
}
}
}
```
### 9. System Design Template
Use this structure (based on `system_design/00_template.md`):
```markdown
# {System Name}
## 1. Requirements
### Functional
- [List core features]
### Non-Functional
- Scale: [Users, QPS, Data]
- Performance: [Latency requirements]
- Availability: [Uptime target]
## 2. Capacity Estimation
- Traffic: [QPS calculations]
- Storage: [Data size, growth]
- Bandwidth: [Network requirements]
## 3. API Design
```
[endpoint] - [description]
```
## 4. High-Level Architecture
[Diagram]
## 5. Database Schema
[Tables and relationships]
## 6. Detailed Design
### Component 1
[Deep dive]
### Component 2
[Deep dive]
## 7. Scalability
[How to scale each component]
## 8. Trade-offs
[Decisions and alternatives]
```
### 10. Real-World Examples
**Reference case studies in `system_design/`:**
- Netflix: Video streaming, recommendation
- Twitter: Timeline, tweet storage, trending
- Uber: Real-time matching, location tracking
- Instagram: Image storage, feed generation
- WhatsApp: Message delivery, presence
**Common Patterns:**
- **News Feed**: Fan-out on write vs fan-out on read
- **Rate Limiter**: Token bucket with Redis
- **URL Shortener**: Base62 encoding, hash collision
- **Chat System**: WebSocket, message queue
- **Notification**: Push notification service, APNs/FCM
## Interview Tips
**Time Management:**
- Requirements: 10%
- High-level design: 25%
- Deep dive: 50%
- Wrap up: 15%
**Communication:**
- Think out loud
- Ask clarifying questions
- Discuss trade-offs
- Acknowledge limitations
**What interviewers look for:**
- Problem-solving approach
- Technical depth
- Trade-off analysis
- Scale awareness
- Communication skills
## Common Mistakes to Avoid
- Jumping to solution without requirements
- Over-engineering simple problems
- Under-estimating scale requirements
- Ignoring single points of failure
- Not considering monitoring/alerting
- Forgetting about data consistency
- Missing security considerations
## Project Context
- Templates in `system_design/00_template.md`
- Case studies in `system_design/*.md`
- Reference materials in `doc/system_design/`
- Follow the established documentation pattern