nemo-evaluator-sdk by zechenzhangAGI

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC clusters, or cloud platforms. Built on NVIDIA's enterprise-grade platform with a container-first architecture for reproducible benchmarking.
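The real entry points live in the skill's SKILL.md; as an illustrative sketch only, the snippet below models the multi-backend idea described above. All names here (`EvalConfig`, `pick_backend`) are hypothetical stand-ins, not the actual nemo-evaluator API.

```python
# Illustrative sketch only: EvalConfig and pick_backend are hypothetical
# stand-ins for the idea of "one benchmark config, many execution backends".
# They are NOT the real nemo-evaluator API; see SKILL.md for the actual usage.
from dataclasses import dataclass


@dataclass
class EvalConfig:
    model: str
    benchmarks: list          # e.g. ["mmlu", "gsm8k", "humaneval"]
    backend: str = "docker"   # "docker" | "slurm" | "cloud"


def pick_backend(cfg: EvalConfig) -> str:
    """Validate the backend name and describe where the run would launch."""
    supported = {"docker", "slurm", "cloud"}
    if cfg.backend not in supported:
        raise ValueError(f"unknown backend: {cfg.backend}")
    return f"launching {len(cfg.benchmarks)} benchmark(s) on {cfg.backend}"


cfg = EvalConfig(model="my-llm", benchmarks=["mmlu", "gsm8k"], backend="slurm")
print(pick_backend(cfg))  # launching 2 benchmark(s) on slurm
```

The point of the container-first design is exactly this separation: the benchmark selection stays fixed while the execution backend (local Docker, Slurm, or cloud) is swapped via configuration.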

4.1K Stars
347 Forks
Updated Jan 15, 2026, 08:38 PM

Why Use This

This skill packages NVIDIA's NeMo Evaluator workflow so that standardized LLM benchmarks can be run reproducibly from an agent.

Use Cases

  • Benchmarking LLMs across suites such as MMLU, HumanEval, and GSM8K
  • Running evaluations at scale on local Docker, Slurm HPC clusters, or cloud platforms
  • Producing reproducible, container-based evaluation results for model comparisons

Install Guide

2 steps

  1. Download Ananke

     Skip this step if Ananke is already installed.

  2. Install inside Ananke

     Click Install Skill, paste the link below, then press Install.

    https://github.com/zechenzhangAGI/AI-research-SKILLs/tree/main/11-evaluation/nemo-evaluator

Skill Snapshot

Automated scan of the skill's assets (informational only).

Valid SKILL.md: passes checks against the SKILL.md specification.

Source & Community

Repository: AI-research-SKILLs
Skill Version: main

Skill Stats

SKILL.md: 495 lines
Total Files: 2
Total Size: 11.9 KB
License: MIT