speculative-decoding by davila7

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

Coding

15.7K Stars

1.4K Forks

Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill provides specialized capabilities for davila7's codebase.

Use Cases

Developing new features in the davila7 repository
Refactoring existing code to follow davila7 standards
Understanding and working with davila7's codebase structure

Install Guide

2 steps

1

Download Ananke

Skip this step if Ananke is already installed.
2

Install inside Ananke

Click Install Skill, paste the link below, then press Install.

https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/ai-research/emerging-techniques-speculative-decoding

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Repository claude-code-templates

Skill Version

main

Community

15.7K 1.4K

Updated At Jan 12, 2026, 05:31 AM

Skill Stats

SKILL.md 468 Lines

Total Files 1

Total Size 0 B

License MIT

Source

GitHub Repository ↗ Commit main ↗ skill.extrachatgpt.com ↗