Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand qgallouedec • Dec 4, 2025 • 72
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 Text Generation • 335B • Updated about 12 hours ago • 198k • 194
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 6 days ago • 158
The ultimate guide to RL environments: building and scaling them in the LLM era Space • Running • Building and scaling RL environments for LLM training • 187
NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 25 days ago • 35
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 23 days ago • 141
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published May 9 • 23