https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/langchain-pack/skills/langchain-rate-limits
Updated At Apr 3, 2026, 03:47 AM
Skill Stats
SKILL.md 211 Lines
Total Files 1
Total Size 5.8 KB
License MIT
---
name: langchain-rate-limits
description: |
  Implement LangChain rate limiting, retry strategies, and backoff.
  Use when handling API rate limits, controlling request throughput,
  or implementing concurrency-safe batch processing.
  Trigger: "langchain rate limit", "langchain throttling",
  "langchain backoff", "langchain retry", "API quota", "429 error".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
compatible-with: claude-code, codex, openclaw
tags: [saas, langchain, api, llm]
---
# LangChain Rate Limits
## Overview
Handle API rate limits gracefully with built-in retries, exponential backoff, concurrency control, provider fallbacks, and custom rate limiters.
## Provider Rate Limits (2026)
| Provider | Model | RPM | TPM |
|----------|-------|-----|-----|
| OpenAI | gpt-4o | 10,000 | 800,000 |
| OpenAI | gpt-4o-mini | 10,000 | 4,000,000 |
| Anthropic | claude-sonnet | 4,000 | 400,000 |
| Anthropic | claude-haiku | 4,000 | 400,000 |
| Google | gemini-1.5-pro | 360 | 4,000,000 |
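RPM = requests per minute; TPM = tokens per minute. Treat these figures as illustrative: actual limits depend on your account tier and change over time, so confirm them in each provider's dashboard or rate-limit documentation.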
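Because a request can hit either the RPM or the TPM ceiling first, it helps to work out which one binds for your workload. The sketch below is a hypothetical helper (the names `ProviderLimits`, `effectiveRequestsPerMinute`, and the 20% safety margin are assumptions, not part of LangChain) that estimates a sustainable request rate from an assumed average token cost per request.

```typescript
// Hypothetical helper: names and defaults are illustrative, not a LangChain API.
interface ProviderLimits {
  rpm: number; // requests per minute allowed by your tier
  tpm: number; // tokens per minute allowed by your tier
}

// Sustainable requests per minute, given an assumed average token cost per request.
function effectiveRequestsPerMinute(
  limits: ProviderLimits,
  avgTokensPerRequest: number,
  safetyMargin = 0.8, // assumption: keep 20% headroom for other traffic
): number {
  if (avgTokensPerRequest <= 0) {
    throw new Error("avgTokensPerRequest must be positive");
  }
  const byTokens = limits.tpm / avgTokensPerRequest; // ceiling imposed by TPM
  return Math.floor(Math.min(limits.rpm, byTokens) * safetyMargin);
}

// Example with the claude-sonnet row from the table above
const sonnet: ProviderLimits = { rpm: 4_000, tpm: 400_000 };
console.log(effectiveRequestsPerMinute(sonnet, 2_000)); // 160 (TPM binds: 200 req/min, times 0.8)
```

With 2,000 tokens per request, the token budget (200 requests/minute) is far tighter than the 4,000 RPM ceiling, so the result can be fed into a limiter or used to pick `maxConcurrency`.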
## Strategy 1: Built-in Retry (Simplest)
```typescript
import { ChatOpenAI } from "@langchain/openai";

// Built-in exponential backoff on 429/500/503
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5, // retries with exponential backoff
  timeout: 30000, // 30s timeout per request
});

// This automatically retries on rate limit errors
const response = await model.invoke("Hello");
```
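To make the retry behavior concrete, here is a minimal, library-free sketch of exponential backoff with jitter. It is not LangChain's internal implementation; the function names, the 1s base delay, and the 30s cap are assumptions chosen for illustration.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// "Full jitter": pick a random delay in [0, delayMs) so clients do not retry in lockstep.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}

// Hypothetical wrapper: retries `fn` up to `maxRetries` times while `isRetryable` allows it.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  isRetryable: (err: unknown) => boolean = () => true,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up once retries are exhausted or the error is not worth retrying
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      const delay = withJitter(backoffDelayMs(attempt));
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Without jitter the delays would be 1s, 2s, 4s, 8s, and 16s for the first five retries. Jitter spreads retries out so that many clients recovering from the same 429 do not hit the API at the same instant.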
## Strategy 2: Concurrency-Controlled Batch
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const chain = ChatPromptTemplate.fromTemplate("Summarize: {text}")
  .pipe(new ChatOpenAI({ model: "gpt-4o-mini", maxRetries: 3 }))
  .pipe(new StringOutputParser());

// Placeholder input: replace with your own documents
const articles: string[] = ["First article text", "Second article text"];
const inputs = articles.map((text) => ({ text }));

// batch() with maxConcurrency prevents flooding the API
const results = await chain.batch(inputs, {
  maxConcurrency: 5, // max 5 parallel requests
});
```
## Strategy 3: Provider Fallback on Rate Limit
```typescript
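import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const primary = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 2,
  timeout: 10000,
});

const fallback = new ChatAnthropic({
  model: "claude-sonnet-4-20250514",
  maxRetries: 2,
});

// Falls back to Anthropic when the OpenAI call still fails after its own retries
// (rate limits included)
const resilientModel = primary.withFallbacks({
  fallbacks: [fallback],
});

const prompt = ChatPromptTemplate.fromTemplate("Summarize: {text}");
const chain = prompt.pipe(resilientModel).pipe(new StringOutputParser());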
```
## Strategy 4: Custom Rate Limiter
```typescript
class TokenBucketLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number, // bucket size
    private refillRate: number, // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  async acquire(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      const waitMs = (1 / this.refillRate) * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// Usage: 100 requests per minute
const limiter = new TokenBucketLimiter(100, 100 / 60);

async function rateLimitedInvoke(chain: any, input: any) {
  await limiter.acquire();
  return chain.invoke(input);
}
```
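A limiter that reads `Date.now()` directly is awkward to unit test. As one possible variation, the sketch below injects the clock and exposes a non-blocking `tryAcquire()` plus the exact wait until the next token. The class name `TestableTokenBucket` and its methods are hypothetical, not part of the skill or of LangChain.

```typescript
// Hypothetical variant of the token bucket with an injectable clock for testing.
class TestableTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly maxTokens: number, // bucket size
    private readonly refillPerSecond: number, // tokens added per second
    private readonly now: () => number = Date.now, // clock in milliseconds
  ) {
    this.tokens = maxTokens;
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, up to the bucket size.
  private refill(): void {
    const current = this.now();
    const elapsedSec = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = current;
  }

  // Take one token if available; never blocks.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Milliseconds until at least one full token is available (0 if one already is).
  msUntilNextToken(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}

// Usage with a fake clock: bucket of 2 tokens, refilled at 1 token per second
let fakeTime = 0;
const bucket = new TestableTokenBucket(2, 1, () => fakeTime);
bucket.tryAcquire(); // true
bucket.tryAcquire(); // true
bucket.tryAcquire(); // false: bucket is empty until the clock advances
```

Sleeping for `msUntilNextToken()` rather than a fixed interval avoids waiting longer than necessary when the bucket is already partly refilled.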
## Strategy 5: Async Batch with Semaphore
```typescript
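async function batchWithSemaphore<T>(
  chain: { invoke: (input: any) => Promise<T> },
  inputs: any[],
  maxConcurrent = 5,
): Promise<T[]> {
  // Resolve immediately for empty input; otherwise the promise below would never settle
  if (inputs.length === 0) return [];

  let active = 0;
  let failed = false;
  const results: T[] = new Array(inputs.length);
  const queue = [...inputs.entries()];

  return new Promise((resolve, reject) => {
    function next() {
      // Start work until the concurrency cap is reached or the queue is empty
      while (!failed && active < maxConcurrent && queue.length > 0) {
        const [index, input] = queue.shift()!;
        active++;
        chain.invoke(input)
          .then((result) => {
            results[index] = result; // keep output order aligned with input order
            active--;
            if (queue.length === 0 && active === 0) resolve(results);
            else next();
          })
          .catch((err) => {
            failed = true; // stop launching new requests after the first failure
            reject(err);
          });
      }
    }
    next();
  });
}

// Process 100 items, 5 at a time
const results = await batchWithSemaphore(chain, inputs, 5);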
```
## Python Equivalent
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI

# Built-in retry
llm = ChatOpenAI(model="gpt-4o-mini", max_retries=5, request_timeout=30)

# Fallback
primary = ChatOpenAI(model="gpt-4o-mini", max_retries=2)
fallback = ChatAnthropic(model="claude-sonnet-4-20250514")
robust = primary.with_fallbacks([fallback])

# Batch with concurrency control
texts = ["First article text", "Second article text"]  # placeholder inputs
chain = ChatPromptTemplate.from_template("Summarize: {text}") | robust
results = chain.batch(
    [{"text": t} for t in texts],
    config=RunnableConfig(max_concurrency=10),
)
```
## Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| `429 Too Many Requests` | Rate limit hit | Increase `maxRetries`, reduce `maxConcurrency` |
| `Timeout` | Response too slow | Increase `timeout`, check network |
| `QuotaExceeded` | Monthly limit hit | Upgrade tier or switch provider |
| Batch partially fails | Some items rate limited | Pass `{ returnExceptions: true }` as the third argument to `.batch()`, then retry only the failed items |
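
When a batch is run with `returnExceptions: true`, the result array mixes outputs and `Error` objects in input order. The helper below is a hypothetical sketch (`partitionBatchResults` is not a LangChain function) that splits such an array so that only the failed inputs are retried.

```typescript
// Hypothetical helper: separates successful outputs from errors, keeping each input alongside.
function partitionBatchResults<I, O>(inputs: I[], results: (O | Error)[]) {
  const succeeded: { input: I; output: O }[] = [];
  const failed: { input: I; error: Error }[] = [];

  results.forEach((result, i) => {
    if (result instanceof Error) {
      failed.push({ input: inputs[i], error: result });
    } else {
      succeeded.push({ input: inputs[i], output: result });
    }
  });

  return { succeeded, failed };
}

// Example with hand-written results standing in for a real batch call
const sampleInputs = ["a", "b", "c"];
const sampleResults: (string | Error)[] = ["A", new Error("429 Too Many Requests"), "C"];
const { succeeded, failed } = partitionBatchResults(sampleInputs, sampleResults);
console.log(succeeded.length, failed.map((f) => f.input)); // 2 [ 'b' ]
```

The `failed` inputs can then be resubmitted with a lower `maxConcurrency` or after a delay, rather than rerunning the whole batch.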
## Resources
- [OpenAI Rate Limits](https://platform.openai.com/docs/guides/rate-limits)
- [Anthropic Rate Limits](https://docs.anthropic.com/en/api/rate-limits)
- [LangChain Batch Processing](https://js.langchain.com/docs/how_to/batch/)
## Next Steps
Proceed to `langchain-security-basics` for security best practices.