nemo-evaluator-sdk by zechenzhangAGI

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC clusters, or cloud platforms. Built on NVIDIA's enterprise-grade platform with a container-first architecture for reproducible benchmarking.
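The real entry points live in the skill's SKILL.md; as an illustrative sketch only, the snippet below models the multi-backend idea described above. All names here (`EvalConfig`, `pick_backend`) are hypothetical stand-ins, not the actual nemo-evaluator API.

```python
# Illustrative sketch only: EvalConfig and pick_backend are hypothetical
# stand-ins for the idea of "one benchmark config, many execution backends".
# They are NOT the real nemo-evaluator API; see SKILL.md for the actual usage.
from dataclasses import dataclass


@dataclass
class EvalConfig:
    model: str
    benchmarks: list          # e.g. ["mmlu", "gsm8k", "humaneval"]
    backend: str = "docker"   # "docker" | "slurm" | "cloud"


def pick_backend(cfg: EvalConfig) -> str:
    """Validate the backend name and describe where the run would launch."""
    supported = {"docker", "slurm", "cloud"}
    if cfg.backend not in supported:
        raise ValueError(f"unknown backend: {cfg.backend}")
    return f"launching {len(cfg.benchmarks)} benchmark(s) on {cfg.backend}"


cfg = EvalConfig(model="my-llm", benchmarks=["mmlu", "gsm8k"], backend="slurm")
print(pick_backend(cfg))  # launching 2 benchmark(s) on slurm
```

The point of the container-first design is exactly this separation: the benchmark selection stays fixed while the execution backend (local Docker, Slurm, or cloud) is swapped via configuration.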

4.1K Stars
347 Forks
Updated Jan 15, 2026, 08:38 PM

Why Use This

This skill packages NVIDIA's NeMo Evaluator workflow so that standardized LLM benchmarks can be run reproducibly from an agent.

Use Cases

  • Benchmarking LLMs across suites such as MMLU, HumanEval, and GSM8K
  • Running evaluations at scale on local Docker, Slurm HPC clusters, or cloud platforms
  • Producing reproducible, container-based evaluation results for model comparisons

Install Guide

2 steps

  1. Download Ananke

     Skip this step if Ananke is already installed.

  2. Install inside Ananke

     Click Install Skill, paste the link below, then press Install.

    https://github.com/zechenzhangAGI/AI-research-SKILLs/tree/main/11-evaluation/nemo-evaluator

Skill Snapshot

Automated scan of the skill's assets (informational only).

Valid SKILL.md: passes checks against the SKILL.md specification.

Source & Community

Repository: AI-research-SKILLs
Skill Version: main

Skill Stats

SKILL.md: 495 lines
Total Files: 2
Total Size: 11.9 KB
License: MIT