Mixture-of-Recursions (MoR) is an architecture that integrates parameter sharing and adaptive token-level computation within Recursive Transformers. It achieves a better performance-cost trade-off by dynamically assigning recursion depths to tokens and optimizing Key-Value (KV) caching, demonstrating up to a 2.06x inference speedup over vanilla Transformers while using fewer parameters and FLOPs.
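A minimal sketch of the routing idea, assuming an expert-choice-style router that keeps a shrinking top-k of tokens at each pass through one shared block; the class name and routing rule are illustrative, not the released implementation:

```python
import torch
import torch.nn as nn

class RecursiveMoRSketch(nn.Module):
    def __init__(self, d_model=64, max_recursions=3, keep_ratio=0.5):
        super().__init__()
        # One shared block reused at every recursion step (parameter sharing).
        self.shared_block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, 1)  # scores "this token needs more depth"
        self.max_recursions = max_recursions
        self.keep_ratio = keep_ratio

    def forward(self, x):  # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            y = self.shared_block(x)  # recursion step (run densely in this sketch)
            x = torch.where(active.unsqueeze(-1), y, x)  # commit updates for active tokens only
            # Expert-choice-style selection: route the top-scoring half of the
            # still-active tokens one recursion level deeper.
            scores = self.router(x).squeeze(-1).masked_fill(~active, -1e9)
            k = max(1, int(self.keep_ratio * active.sum(-1).min().item()))
            keep = scores.topk(k, dim=-1).indices
            next_active = torch.zeros_like(active)
            next_active.scatter_(1, keep, True)
            active = next_active
        return x

x = torch.randn(2, 16, 64)
print(RecursiveMoRSketch()(x).shape)  # torch.Size([2, 16, 64])
```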
This paper demonstrates that previously reported performance gains of Qwen2.5 models on mathematical reasoning benchmarks under random reinforcement learning rewards were largely an artifact of data contamination rather than enhanced reasoning. Using a newly constructed, contamination-free synthetic dataset, the study confirms that Qwen2.5 models, like others, achieve stable performance gains only with accurate reward signals.
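For context, contamination checks of this kind typically look for long n-gram overlaps between benchmark items and the pretraining corpus. A hedged sketch of such a test; the 13-token window and set-overlap criterion are common illustrative choices, not necessarily this paper's exact protocol:

```python
def ngrams(text, n=13):
    # All contiguous n-token spans of a whitespace-tokenized string.
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(benchmark_items, corpus_docs, n=13):
    # Flag a benchmark item if any of its n-grams also appears in the corpus.
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & corpus_ngrams)
    return flagged / max(1, len(benchmark_items))
```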
Tencent AI Lab researchers reveal that "master keys" (simple non-word symbols or reasoning openers) can consistently trick LLM-as-a-judge systems into giving false-positive rewards, even with models like GPT-4o, leading to "collapsed training" in Reinforcement Learning with Verifiable Rewards (RLVR) setups. Their data augmentation strategy trains a robust reward model (Master-RM) that achieves near-zero false-positive rates against these attacks while maintaining high evaluation consistency.
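A hedged sketch of the augmentation idea: pair each prompt with truncated "reasoning openers" labeled as invalid answers so the reward model learns to reject them. The opener strings and tuple layout below are illustrative, not the paper's training data:

```python
# Illustrative master-key strings; the paper's attack set may differ.
MASTER_KEYS = ["Solution:", "Let's solve this step by step.", "Thought process:"]

def augment_with_master_keys(examples):
    # examples: list of (prompt, response, label) with label 1 = correct answer.
    augmented = list(examples)
    for prompt, _, _ in examples:
        for opener in MASTER_KEYS:
            # A bare opener answers nothing, so the reward model must say "no".
            augmented.append((prompt, opener, 0))
    return augmented
```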
A new relative positional encoding, PRoPE, is introduced to integrate complete camera frustum information into Transformer models for multiview computer vision tasks. This approach consistently outperforms absolute camera conditioning methods and demonstrates improved generalization to out-of-distribution camera parameters and input view counts across novel view synthesis, stereo depth estimation, and spatial cognition.
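A minimal sketch of the relativity trick behind such encodings: applying each camera's 4x4 matrix to keys and its inverse transpose to queries (blockwise over 4-channel groups) makes every attention logit depend only on the relative transform between the two cameras, exactly as RoPE does for 1D positions. The blocking scheme is an assumption, not the paper's exact parameterization:

```python
import torch

def apply_cam(x, M):  # x: (tokens, d) with d % 4 == 0, M: (tokens, 4, 4)
    t, d = x.shape
    blocks = x.view(t, d // 4, 4)
    return torch.einsum("tij,tbj->tbi", M, blocks).reshape(t, d)

def relative_attn_logits(q, k, cams_q, cams_k):
    q = apply_cam(q, torch.linalg.inv(cams_q).transpose(-1, -2))  # P_i^{-T} q
    k = apply_cam(k, cams_k)                                      # P_j k
    return q @ k.T  # each logit is q^T (P_i^{-1} P_j) k, blockwise

q, k = torch.randn(3, 64), torch.randn(3, 64)
cams = torch.eye(4).expand(3, 4, 4)  # identity cameras: reduces to plain attention
print(relative_attn_logits(q, k, cams, cams).shape)  # torch.Size([3, 3])
```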
H-Net, developed by researchers at Carnegie Mellon University and Cartesia AI, introduces an end-to-end hierarchical network that learns dynamic data chunking, enabling direct processing of raw bytes. This architecture surpasses traditional BPE-tokenized large language models in performance and robustness across various modalities while achieving better data efficiency.
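A hedged sketch of the dynamic-chunking idea, assuming boundaries are scored from disagreement between adjacent byte-level hidden states (cosine similarity is one plausible choice) and each chunk is mean-pooled for the next hierarchy level; the threshold and pooling are illustrative:

```python
import torch
import torch.nn.functional as F

def chunk_boundaries(h, threshold=0.5):
    # h: (seq, d) byte-level hidden states; boundary where neighbors disagree.
    sim = F.cosine_similarity(h[:-1], h[1:], dim=-1)
    p_boundary = (1 - sim) / 2  # map similarity to a [0, 1] boundary score
    bounds = torch.nonzero(p_boundary > threshold).squeeze(-1) + 1
    return torch.cat([torch.tensor([0], device=h.device), bounds,
                      torch.tensor([h.shape[0]], device=h.device)])

def pool_chunks(h, bounds):
    # One pooled vector per variable-length chunk, fed to the coarser level.
    return torch.stack([h[a:b].mean(0) for a, b in zip(bounds[:-1], bounds[1:])])

h = torch.randn(32, 16)
print(pool_chunks(h, chunk_boundaries(h)).shape)  # (num_chunks, 16)
```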
FR3E, from ByteDance and the University of Manchester, introduces a structured exploration framework for fine-tuning Large Language Models (LLMs) under Reinforcement Learning with Verifiable Rewards (RLVR). The method leverages token-wise entropy to identify uncertain decision points, enabling targeted partial rollouts with semantically grounded intermediate feedback, which leads to more stable training and improved reasoning capabilities on mathematical benchmarks.
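A minimal sketch of the branch-point selection, assuming token-wise entropy is computed from the policy's logits over a generated trajectory and the highest-entropy positions become restart states for partial rollouts; the top-k rule is an illustrative choice:

```python
import torch

def uncertain_positions(logits, top_k=4):
    # logits: (seq, vocab) policy logits along one generated trajectory.
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1)  # (seq,) token-wise entropy
    # Highest-entropy tokens mark uncertain decision points; return them in order.
    return torch.topk(entropy, k=min(top_k, len(entropy))).indices.sort().values

logits = torch.randn(32, 1000)
print(uncertain_positions(logits))  # prefix indices to launch partial rollouts from
```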
Researchers from Apple and Sorbonne University extended machine learning scaling laws to account for data domain weights, creating formulations that predict model loss based on model size, training tokens, and data mixture. This allows for the cost-effective determination of optimal data mixtures from a few small-scale training runs, consistently improving performance across Large Language Models, Native Multimodal Models, and Large Vision Models.
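A hedged sketch of that workflow, assuming a Chinchilla-style law whose coefficients depend log-linearly on a single domain weight; the exact parameterization in the paper may differ, and the run data below is synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_law(X, E, a, b, alpha, beta, c1, c2):
    N, D, w1 = X  # model size, training tokens, weight of domain 1
    A = np.exp(a + c1 * w1)  # mixture-dependent coefficients (assumed form)
    B = np.exp(b + c2 * w1)
    return E + A / N**alpha + B / D**beta

# Stand-in for a handful of small-scale training runs at varied (N, D, w1).
rng = np.random.default_rng(0)
N, D, w1 = rng.uniform(1e7, 2e8, 16), rng.uniform(1e9, 8e9, 16), rng.uniform(0, 1, 16)
y = loss_law((N, D, w1), 1.8, 4.0, 6.0, 0.32, 0.28, -0.5, 0.4)
y += rng.normal(0, 0.01, 16)

params, _ = curve_fit(loss_law, (N, D, w1), y,
                      p0=[2.0, 4.0, 6.0, 0.3, 0.3, 0.0, 0.0], maxfev=50000)
# Sweep the mixture weight at the target scale and keep the predicted minimizer.
best_w1 = min(np.linspace(0, 1, 101), key=lambda w: loss_law((1e9, 1e11, w), *params))
```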
StreamVGGT, developed at Tsinghua University, provides a method for real-time, on-the-fly 4D visual geometry reconstruction using a causal transformer architecture with cached memory tokens and knowledge distillation. This approach achieves over 19x faster inference than prior global-attention models while maintaining competitive reconstruction accuracy.
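A minimal sketch of the streaming mechanism, assuming single-head attention over keys and values cached from all previous frames, so each new frame is processed without re-running global attention; shapes and the class are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

class StreamingAttentionCache:
    def __init__(self):
        self.k_cache, self.v_cache = [], []

    def step(self, q, k, v):  # one incoming frame's tokens: (tokens, d)
        self.k_cache.append(k)  # memory tokens persist across frames
        self.v_cache.append(v)
        K = torch.cat(self.k_cache)  # all past frames + current
        V = torch.cat(self.v_cache)
        attn = F.softmax(q @ K.T / q.shape[-1] ** 0.5, dim=-1)  # causal by construction
        return attn @ V  # fused features for the current frame only

cache = StreamingAttentionCache()
for _ in range(4):  # four frames arriving one at a time
    toks = torch.randn(128, 64)
    out = cache.step(toks, toks, toks)  # (128, 64), constant work per new frame
```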
MoVieS introduces a unified feed-forward model that generates dynamic 4D novel views from monocular videos, performing inference in under one second per scene. The model learns generalizable priors for appearance, geometry, and motion, enabling robust performance on unseen dynamic scenes and supporting zero-shot applications like scene flow estimation.
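Zero-shot scene flow then falls out almost for free: if the model predicts per-pixel world-space point maps for the same pixels at consecutive timestamps, flow is just their difference. A sketch; the tensor names are illustrative, not the paper's API:

```python
import torch

def scene_flow(points_t, points_t1):
    # points_t, points_t1: (H, W, 3) world-space point maps for the same pixels
    # at times t and t+1; no flow supervision is needed to read this off.
    return points_t1 - points_t  # per-pixel 3D displacement over one step
```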
The work "Reinforcement Learning with Action Chunking" from UC Berkeley introduces Q-chunking, a method that adapts TD-based RL algorithms to operate on sequences of actions, enhancing exploration and sample efficiency in long-horizon, sparse-reward tasks. The approach leverages an unbiased form of n-step backups and generates temporally coherent behaviors, leading to superior performance in offline-to-online settings on robotic manipulation benchmarks.