tensorrt-llm by davila7

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Coding
15.7K Stars
1.4K Forks
Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill provides specialized capabilities for davila7's codebase.

Use Cases

  • Developing new features in the davila7 repository
  • Refactoring existing code to follow davila7 standards
  • Understanding and working with davila7's codebase structure

Install Guide

2 steps
  1. 1

    Download Ananke

    Skip this step if Ananke is already installed.

  2. 2

    Install inside Ananke

    Click Install Skill, paste the link below, then press Install.

    https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/ai-research/inference-serving-tensorrt-llm

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Skill Version
main
Community
15.7K 1.4K
Updated At Jan 12, 2026, 05:31 AM

Skill Stats

SKILL.md 188 Lines
Total Files 1
Total Size 0 B
License MIT