ray-data by davila7

Scalable data processing for ML workloads. Provides streaming execution across CPUs and GPUs and supports Parquet, CSV, JSON, and image data. Integrates with Ray Train, PyTorch, and TensorFlow. Scales from a single machine to hundreds of nodes. Use it for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
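
As a rough illustration of the preprocessing use case described above, here is a minimal sketch of a Ray Data pipeline. It assumes a recent Ray 2.x install (`pip install "ray[data]"`), where `ray.data.range` produces rows with an `id` column. The names `add_squared` and `run_pipeline` are hypothetical and exist only for this example; they are not taken from the skill itself.

```python
from typing import Any, Dict, List


def add_squared(row: Dict[str, Any]) -> Dict[str, Any]:
    """Per-row transform: keep the input columns and add a derived one."""
    return {**row, "squared": row["id"] ** 2}


def run_pipeline(n: int = 8) -> List[Dict[str, Any]]:
    """Build a small dataset, apply the transform, and collect the rows."""
    # Imported lazily so the transform above can be used without Ray installed.
    import ray

    ds = ray.data.range(n)    # assumed Ray 2.x: rows look like {"id": 0}, {"id": 1}, ...
    ds = ds.map(add_squared)  # lazy: the transform is only recorded here
    return ds.take_all()      # triggers execution and returns the rows as dicts


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

The transform is kept as a plain function so it can be checked on a single dictionary before being handed to Ray. For real workloads, a reader such as `ray.data.read_parquet` would likely replace `ray.data.range`, and `map_batches` is usually preferred over `map` for vectorized work such as batch inference.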

Data & Analytics
15.7K Stars
1.4K Forks
Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill packages Ray Data guidance for ML data workloads: batch inference, data preprocessing, multi-modal data loading, and distributed ETL pipelines.

Use Cases

  • **Pinterest**: Last-mile data processing for model training
  • **ByteDance**: Scaling offline inference with multi-modal LLMs
  • **Spotify**: ML platform for batch inference

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Skill Version main
Community 15.7K Stars, 1.4K Forks
Updated At Jan 12, 2026, 05:31 AM

Skill Stats

SKILL.md 327 Lines
Total Files 1
Total Size 0 B
License MIT