Why Use This
This skill provides specialized capabilities for jeremylongshore's codebase.
Use Cases
- Developing new features in the jeremylongshore repository
- Refactoring existing code to follow jeremylongshore standards
- Understanding and working with jeremylongshore's codebase structure
Install Guide
Install inside Ananke
Click Install Skill, paste the link below, then press Install.
https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/exa-pack/skills/exa-load-scale
Source & Community
Updated At Apr 3, 2026, 03:47 AM
Skill Stats
SKILL.md 229 Lines
Total Files 1
Total Size 6.7 KB
License MIT
---
name: exa-load-scale
description: |
Implement Exa load testing, capacity planning, and scaling strategies.
Use when running performance tests, planning capacity for Exa integrations,
or designing high-throughput search architectures.
Trigger with phrases like "exa load test", "exa scale",
"exa capacity", "exa k6", "exa benchmark", "exa throughput".
allowed-tools: Read, Write, Edit, Bash(k6:*), Bash(node:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
compatible-with: claude-code, codex, openclaw
tags: [saas, exa, testing, performance, scaling]
---
# Exa Load & Scale
## Overview
Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.
## Prerequisites
- k6 load testing tool installed
- Test environment Exa API key (separate from production)
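- A result cache: Step 3 uses the in-process `lru-cache` package; Redis is an option when several app instances need to share one cache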
## Capacity Reference
| Search Type | Typical Latency | Max Throughput (10 QPS) |
|-------------|----------------|-------------------------|
| `instant` | < 150ms | 10 req/s (600/min) |
| `fast` | < 425ms | 10 req/s (600/min) |
| `auto` | 300-1500ms | 10 req/s (600/min) |
| `neural` | 500-2000ms | 10 req/s (600/min) |
| `deep` | 2-5s | 10 req/s (600/min) |
**With caching (50% hit rate):** Only cache misses reach Exa, so the rate your application can serve doubles to about 20 req/s while actual API calls stay at 10 QPS.
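The arithmetic behind the caching figure, and the number of in-flight requests needed to actually reach the rate limit at a given latency, can be sketched as below. This is a minimal illustration: the function names are hypothetical and not part of exa-js, and the second helper simply applies Little's law (concurrency = rate × latency).

```typescript
// Illustrative helpers; names are hypothetical and not part of exa-js.

/** Requests/s the app can serve when only cache misses reach the API. */
function effectiveThroughput(apiLimitQps: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate >= 1) {
    throw new RangeError("cacheHitRate must be in [0, 1)");
  }
  return apiLimitQps / (1 - cacheHitRate);
}

/** Little's law: in-flight requests needed to sustain a rate at a given latency. */
function requiredConcurrency(targetQps: number, latencyMs: number): number {
  return Math.ceil(targetQps * (latencyMs / 1000));
}

console.log(effectiveThroughput(10, 0.5));  // 20
console.log(requiredConcurrency(10, 2000)); // 20
```

At the upper end of the `neural` range (2000 ms), about 20 requests must be in flight to reach 10 QPS, so a concurrency cap of 8 would hold throughput near 4 req/s rather than at the rate limit.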
## Instructions
### Step 1: k6 Load Test Against Your Wrapper
```javascript
// exa-load-test.js
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "1m", target: 5 },  // Ramp up to 5 VUs
    { duration: "3m", target: 5 },  // Steady state
    { duration: "1m", target: 10 }, // Push toward rate limit
    { duration: "2m", target: 10 }, // Stress test
    { duration: "1m", target: 0 },  // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<3000"], // 3s P95 for neural search
    http_req_failed: ["rate<0.05"],    // < 5% error rate
  },
};

// Fixed query pool so runs are comparable
const queries = [
  "best practices for building RAG systems",
  "transformer architecture improvements 2025",
  "TypeScript 5.5 new features",
  "vector database comparison guide",
  "AI safety alignment research",
];

export default function () {
  // Pick a random query from the pool
  const query = queries[Math.floor(Math.random() * queries.length)];

  // Hit your own wrapper endpoint, not the Exa API directly
  const response = http.post(
    `${__ENV.APP_URL}/api/search`,
    JSON.stringify({ query, numResults: 3 }),
    {
      headers: { "Content-Type": "application/json" },
      timeout: "10s",
    }
  );

  check(response, {
    "status 200": (r) => r.status === 200,
    "has results": (r) => {
      // A non-JSON body (e.g. an error page) fails the check instead of throwing
      try {
        return JSON.parse(r.body).results?.length > 0;
      } catch (e) {
        return false;
      }
    },
    "latency < 3s": (r) => r.timings.duration < 3000,
  });

  sleep(0.5 + Math.random()); // 0.5-1.5s between requests
}
```
```bash
# Run load test
k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
```
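To judge how close a given stage gets to the 10 QPS limit, it helps to estimate the request rate the virtual users generate. The sketch below uses a simple closed-model approximation (each VU sends a request, waits for the response, then sleeps); the function name and the 1.0 s mean response time are assumptions for illustration, not measured values.

```typescript
// Closed-model estimate: each VU loops request -> sleep, so one VU produces
// about 1 / (latency + sleep) requests per second. Name is illustrative.
function estimatedRequestRate(
  vus: number,
  avgLatencySec: number,
  avgSleepSec: number,
): number {
  return vus / (avgLatencySec + avgSleepSec);
}

// The script sleeps 0.5-1.5 s (mean 1.0 s). Assume a 1.0 s mean response time.
console.log(estimatedRequestRate(5, 1.0, 1.0));  // 2.5 req/s at steady state
console.log(estimatedRequestRate(10, 1.0, 1.0)); // 5 req/s at the stress stage
```

Under these assumptions the stress stage stays below 10 req/s at the wrapper, and if the wrapper caches results, the five-query pool means far fewer calls reach Exa. Raising the VU count or shortening the sleep would be needed to probe the limit itself.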
### Step 2: Throughput Maximizer with Request Queue
```typescript
import Exa from "exa-js";
import PQueue from "p-queue";

const exa = new Exa(process.env.EXA_API_KEY);

// Stay under 10 QPS rate limit
const searchQueue = new PQueue({
  concurrency: 8,  // max concurrent requests
  interval: 1000,  // length of the rate window in ms
  intervalCap: 10, // max requests started per window (Exa's QPS limit)
});

async function highThroughputSearch(queries: string[]) {
  const results = [];
  for (const query of queries) {
    // Enqueue every search; the queue decides when each one starts
    const promise = searchQueue.add(async () => {
      const result = await exa.searchAndContents(query, {
        type: "auto",
        numResults: 3,
        text: { maxCharacters: 500 },
      });
      return { query, results: result.results };
    });
    results.push(promise);
  }
  return Promise.all(results);
}

// Process 100 queries respecting rate limits
// (top-level await: run this file as an ES module)
const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`);
console.time("batch");
const results = await highThroughputSearch(queries);
console.timeEnd("batch");
// Expected: roughly 10-13 seconds. The 10 QPS cap gives a floor of ~10 s;
// the concurrency cap of 8 becomes the tighter bound once latency nears 1 s.
```
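The expected batch time depends on two bounds: the rate window (`interval` / `intervalCap`) and the concurrency cap combined with per-request latency. A rough estimator, assuming uniform latency and ignoring network jitter, might look like the following; the function name is hypothetical and this is an approximation of the queue's behavior, not a model taken from the p-queue documentation.

```typescript
// Rough batch-duration estimate for an interval-capped, concurrency-capped queue.
// Assumes every request takes avgLatencyMs; name and model are illustrative.
function estimateBatchSeconds(
  total: number,
  intervalCap: number,
  intervalMs: number,
  concurrency: number,
  avgLatencyMs: number,
): number {
  // Rate bound: the last request starts in the final window, then runs once.
  const rateBound =
    ((Math.ceil(total / intervalCap) - 1) * intervalMs + avgLatencyMs) / 1000;
  // Concurrency bound: requests run in waves of `concurrency`.
  const concurrencyBound =
    (Math.ceil(total / concurrency) * avgLatencyMs) / 1000;
  return Math.max(rateBound, concurrencyBound);
}

console.log(estimateBatchSeconds(100, 10, 1000, 8, 1000)); // 13
console.log(estimateBatchSeconds(100, 10, 1000, 8, 300));  // 9.3
```

With 1 s latency the concurrency cap dominates (13 s); with 300 ms latency the rate window dominates (9.3 s). Measured times will vary with real latency distribution.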
### Step 3: Caching for Scale
```typescript
import { LRUCache } from "lru-cache";

// Reuses `exa` and `searchQueue` from Step 2.
// Cache hits skip the API call entirely until the entry's TTL expires.
const cache = new LRUCache<string, any>({
  max: 10000,
  ttl: 3600 * 1000, // 1-hour TTL
});

async function scalableSearch(query: string, opts: any) {
  // Key covers only the query, type, and numResults; extend it if other options vary
  const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`;
  const cached = cache.get(key);
  if (cached) return cached;

  // Cache miss: go through the rate-limited queue
  const result = await searchQueue.add(() =>
    exa.searchAndContents(query, opts)
  );
  cache.set(key, result);
  return result;
}

// With a 50% cache hit rate:
// 100 requests → 50 API calls → ~5 seconds instead of ~10
```
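The key in `scalableSearch` covers only `type` and `numResults`, so two calls that differ in another option (for example `text.maxCharacters`) would share a cache entry. One possible remedy is to build the key from all options in a stable order. The helpers below are a hypothetical sketch, not part of exa-js or lru-cache.

```typescript
// Hypothetical key builder: serializes every option with sorted keys so that
// option order does not matter and differing options never share an entry.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function cacheKey(query: string, opts: Record<string, unknown>): string {
  // Normalize case and whitespace so trivially different queries share an entry
  const normalizedQuery = query.toLowerCase().trim().replace(/\s+/g, " ");
  return `${normalizedQuery}|${stableStringify(opts)}`;
}

const a = cacheKey("  Vector DB ", { type: "auto", numResults: 3 });
const b = cacheKey("vector db", { numResults: 3, type: "auto" });
console.log(a === b); // true
```

Lowercasing the query mirrors the original key; drop it if case matters for your queries.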
### Step 4: Capacity Planning Calculator
```typescript
interface CapacityEstimate {
  dailySearches: number;
  peakQPS: number;
  cacheHitRate: number;
  effectiveQPS: number;
  withinLimits: boolean;
  recommendation: string;
}

function estimateCapacity(
  dailySearches: number,
  peakMultiplier = 3,
  expectedCacheHitRate = 0.5
): CapacityEstimate {
  const avgQPS = dailySearches / (24 * 3600);
  const peakQPS = avgQPS * peakMultiplier;
  // Only cache misses count against the API rate limit
  const effectiveQPS = peakQPS * (1 - expectedCacheHitRate);
  const withinLimits = effectiveQPS <= 10; // Default Exa limit

  let recommendation = "Within default limits";
  if (effectiveQPS > 10 && effectiveQPS <= 50) {
    recommendation = "Contact [email protected] for Enterprise rate limits";
  } else if (effectiveQPS > 50) {
    recommendation = "Requires Enterprise plan + aggressive caching + request queue";
  }

  return {
    dailySearches,
    peakQPS,
    cacheHitRate: expectedCacheHitRate,
    effectiveQPS,
    withinLimits,
    recommendation,
  };
}

// Example: 50,000 searches/day
const estimate = estimateCapacity(50000);
console.log(estimate);
// { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
```
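The same relationship can be inverted to ask how many daily searches fit under a given rate limit. The sketch below is a hypothetical companion to `estimateCapacity`, using the same defaults (10 QPS limit, 3× peak multiplier, 50% cache hit rate); the function name is an assumption.

```typescript
// Hypothetical inverse of estimateCapacity: the largest daily volume whose
// peak, after cache hits are removed, still fits under the QPS limit.
function maxDailySearches(
  qpsLimit = 10,
  peakMultiplier = 3,
  cacheHitRate = 0.5,
): number {
  if (cacheHitRate < 0 || cacheHitRate >= 1) {
    throw new RangeError("cacheHitRate must be in [0, 1)");
  }
  const secondsPerDay = 24 * 3600;
  return Math.floor(
    (qpsLimit * secondsPerDay) / ((1 - cacheHitRate) * peakMultiplier),
  );
}

console.log(maxDailySearches());          // 576000
console.log(maxDailySearches(10, 3, 0));  // 288000 (no caching)
```

With these defaults, traffic stays within the default limit up to roughly 576,000 searches per day, provided the peak multiplier and hit rate assumptions hold.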
## Benchmark Results Template
```markdown
## Exa Performance Benchmark
**Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z
| Metric | Value |
|--------|-------|
| Total Requests | N |
| Success Rate | X% |
| Cache Hit Rate | X% |
| P50 Latency | Xms |
| P95 Latency | Xms |
| Peak QPS (actual API calls) | X |
| 429 Rate Limit Errors | N |
```
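If you collect latencies yourself rather than reading them from the k6 summary, the P50 and P95 rows can be filled in with a small percentile helper. The sketch below uses the nearest-rank method; other tools may interpolate between samples, so values can differ slightly. The sample latencies are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latencies in milliseconds, for illustration only
const latenciesMs = [120, 340, 310, 95, 1800, 410, 290, 505, 620, 150];
console.log(percentile(latenciesMs, 50)); // 310
console.log(percentile(latenciesMs, 95)); // 1800
```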
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| 429 errors in load test | Exceeding 10 QPS | Lower the queue's `intervalCap` or concurrency, add caching |
| Inconsistent latency | Different search types | Standardize on one type per test |
| Timeout errors | Deep search under load | Use `fast` or `auto` for load tests |
| Cache miss rate high | Unique queries per request | Use a fixed query pool |
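
For occasional 429 responses that slip past the queue, one common approach is to retry with exponential backoff and jitter. The sketch below is an assumption-laden illustration: `withBackoff` and `isRateLimited` are hypothetical helpers, and the error shape (a `status` or `statusCode` field equal to 429) is assumed rather than taken from the exa-js documentation, so check it against the errors your client actually throws.

```typescript
// Hypothetical error check: assumes the thrown error exposes an HTTP status
// as `status` or `statusCode`. Verify against the client you actually use.
function isRateLimited(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; statusCode?: unknown };
  return e.status === 429 || e.statusCode === 429;
}

// Retries only rate-limit errors; everything else is rethrown immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      // Exponential backoff with full jitter: 0 .. baseDelayMs * 2^attempt
      const delayMs = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

In the Step 2 queue this could wrap the API call, for example `searchQueue.add(() => withBackoff(() => exa.searchAndContents(query, opts)))`, so that a retried request still counts against the queue's concurrency.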
## Resources
- [Exa Rate Limits](https://docs.exa.ai/reference/rate-limits)
- [k6 Documentation](https://k6.io/docs/)
- [p-queue](https://github.com/sindresorhus/p-queue)
## Next Steps
For reliability patterns, see `exa-reliability-patterns`.