optimizing-attention-flash by davila7

Optimizes transformer attention with Flash Attention for a 2-4x speedup and a 10-20x memory reduction. Use it when training or running transformers on long sequences (>512 tokens), when attention is causing GPU out-of-memory issues, or when faster inference is needed. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

Coding
15.7K Stars
1.4K Forks
Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill provides specialized guidance for optimizing transformer attention with Flash Attention, from PyTorch's native SDPA through the flash-attn library, H100 FP8, and sliding window attention.
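For example, on newer PyTorch builds (2.3+, where torch.nn.attention is available) the native SDPA path can route through a Flash Attention kernel with no extra dependency. A minimal sketch, assuming a CUDA GPU and fp16 tensors; the shapes and the causal flag are illustrative, not taken from the skill itself:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.float16)

# Restrict SDPA to the Flash Attention backend; PyTorch raises a
# RuntimeError if that kernel cannot run for these dtypes/shapes.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Unlike a naive softmax(QK^T)V implementation, the fused kernel never materializes the full seq_len x seq_len attention matrix, which is where the memory savings on long sequences come from.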

Use Cases

  • Speeding up transformer training or inference on long sequences (>512 tokens)
  • Reducing GPU memory pressure when attention is the bottleneck
  • Choosing between PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention (see the sketch after this list)
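A sketch of the flash-attn library path with sliding window attention. The 512-token window is a made-up example; flash-attn 2.3+ exposes the window through the window_size argument of flash_attn_func:

```python
import torch
from flash_attn import flash_attn_func

# flash-attn expects (batch, seq_len, heads, head_dim) tensors
# in fp16/bf16 on a CUDA device.
q = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.bfloat16)

# Causal attention where each token attends to at most the
# previous 512 tokens: left window 512, no lookahead.
out = flash_attn_func(q, k, v, causal=True, window_size=(512, 0))
```

The window bounds per-token work, so attention cost grows roughly linearly rather than quadratically with sequence length. The H100 FP8 path needs the separate FlashAttention-3 kernels on Hopper GPUs and a different call surface, so it is not sketched here.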

Skill Snapshot

Auto scan of skill assets (informational only): SKILL.md is valid per the checks against the SKILL.md specification.

Source

Skill Version: main

Skill Stats

SKILL.md: 368 lines
Total Files: 1
Total Size: 0 B
License: MIT