Mixture-of-Recursions (MoR) is an architecture that integrates parameter sharing and adaptive token-level computation within Recursive Transformers. It achieves a better performance-cost trade-off by dynamically assigning recursion depths to tokens and optimizing Key-Value (KV) caching, demonstrating up to a 2.06x inference speedup over vanilla Transformers while using fewer parameters and FLOPs.
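A minimal sketch of the routing idea, assuming an expert-choice-style router that keeps a shrinking top-k of tokens at each pass through one shared block; the class name and routing rule are illustrative, not the released implementation:

```python
import torch
import torch.nn as nn

class RecursiveMoRSketch(nn.Module):
    def __init__(self, d_model=64, max_recursions=3, keep_ratio=0.5):
        super().__init__()
        # One shared block reused at every recursion step (parameter sharing).
        self.shared_block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, 1)  # scores "this token needs more depth"
        self.max_recursions = max_recursions
        self.keep_ratio = keep_ratio

    def forward(self, x):  # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            y = self.shared_block(x)  # recursion step (run densely in this sketch)
            x = torch.where(active.unsqueeze(-1), y, x)  # commit updates for active tokens only
            # Expert-choice-style selection: route the top-scoring half of the
            # still-active tokens one recursion level deeper.
            scores = self.router(x).squeeze(-1).masked_fill(~active, -1e9)
            k = max(1, int(self.keep_ratio * active.sum(-1).min().item()))
            keep = scores.topk(k, dim=-1).indices
            next_active = torch.zeros_like(active)
            next_active.scatter_(1, keep, True)
            active = next_active
        return x

x = torch.randn(2, 16, 64)
print(RecursiveMoRSketch()(x).shape)  # torch.Size([2, 16, 64])
```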
This paper demonstrates that previously reported performance gains of Qwen2.5 models on mathematical reasoning benchmarks under random reinforcement learning rewards were largely an artifact of data contamination rather than enhanced reasoning. Using a newly constructed, contamination-free synthetic dataset, the study confirms that Qwen2.5 models, like others, achieve stable performance gains only with accurate reward signals.
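For context, contamination checks of this kind typically look for long n-gram overlaps between benchmark items and the pretraining corpus. A hedged sketch of such a test; the 13-token window and set-overlap criterion are common illustrative choices, not necessarily this paper's exact protocol:

```python
def ngrams(text, n=13):
    # All contiguous n-token spans of a whitespace-tokenized string.
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(benchmark_items, corpus_docs, n=13):
    # Flag a benchmark item if any of its n-grams also appears in the corpus.
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & corpus_ngrams)
    return flagged / max(1, len(benchmark_items))
```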
Tencent AI Lab researchers reveal that "master keys" (simple non-word symbols or reasoning openers) can consistently trick LLM-as-a-judge systems into giving false-positive rewards, even with models like GPT-4o, leading to "collapsed training" in Reinforcement Learning with Verifiable Rewards (RLVR) setups. Their data augmentation strategy trains a robust reward model (Master-RM) that achieves near-zero false-positive rates against these attacks while maintaining high evaluation consistency.
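A hedged sketch of the augmentation idea: pair each prompt with truncated "reasoning openers" labeled as invalid answers so the reward model learns to reject them. The opener strings and tuple layout below are illustrative, not the paper's training data:

```python
# Illustrative master-key strings; the paper's attack set may differ.
MASTER_KEYS = ["Solution:", "Let's solve this step by step.", "Thought process:"]

def augment_with_master_keys(examples):
    # examples: list of (prompt, response, label) with label 1 = correct answer.
    augmented = list(examples)
    for prompt, _, _ in examples:
        for opener in MASTER_KEYS:
            # A bare opener answers nothing, so the reward model must say "no".
            augmented.append((prompt, opener, 0))
    return augmented
```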
A new relative positional encoding, PRoPE, is introduced to integrate complete camera frustum information into Transformer models for multiview computer vision tasks. This approach consistently outperforms absolute camera conditioning methods and demonstrates improved generalization to out-of-distribution camera parameters and input view counts across novel view synthesis, stereo depth estimation, and spatial cognition.
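A minimal sketch of the relativity trick behind such encodings: applying each camera's 4x4 matrix to keys and its inverse transpose to queries (blockwise over 4-channel groups) makes every attention logit depend only on the relative transform between the two cameras, exactly as RoPE does for 1D positions. The blocking scheme is an assumption, not the paper's exact parameterization:

```python
import torch

def apply_cam(x, M):  # x: (tokens, d) with d % 4 == 0, M: (tokens, 4, 4)
    t, d = x.shape
    blocks = x.view(t, d // 4, 4)
    return torch.einsum("tij,tbj->tbi", M, blocks).reshape(t, d)

def relative_attn_logits(q, k, cams_q, cams_k):
    q = apply_cam(q, torch.linalg.inv(cams_q).transpose(-1, -2))  # P_i^{-T} q
    k = apply_cam(k, cams_k)                                      # P_j k
    return q @ k.T  # each logit is q^T (P_i^{-1} P_j) k, blockwise

q, k = torch.randn(3, 64), torch.randn(3, 64)
cams = torch.eye(4).expand(3, 4, 4)  # identity cameras: reduces to plain attention
print(relative_attn_logits(q, k, cams, cams).shape)  # torch.Size([3, 3])
```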
H-Net, developed by researchers at Carnegie Mellon University and Cartesia AI, introduces an end-to-end hierarchical network that learns dynamic data chunking, enabling direct processing of raw bytes. This architecture surpasses traditional BPE-tokenized large language models in performance and robustness across various modalities while achieving better data efficiency.
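A hedged sketch of the dynamic-chunking idea, assuming boundaries are scored from disagreement between adjacent byte-level hidden states (cosine similarity is one plausible choice) and each chunk is mean-pooled for the next hierarchy level; the threshold and pooling are illustrative:

```python
import torch
import torch.nn.functional as F

def chunk_boundaries(h, threshold=0.5):
    # h: (seq, d) byte-level hidden states; boundary where neighbors disagree.
    sim = F.cosine_similarity(h[:-1], h[1:], dim=-1)
    p_boundary = (1 - sim) / 2  # map similarity to a [0, 1] boundary score
    bounds = torch.nonzero(p_boundary > threshold).squeeze(-1) + 1
    return torch.cat([torch.tensor([0], device=h.device), bounds,
                      torch.tensor([h.shape[0]], device=h.device)])

def pool_chunks(h, bounds):
    # One pooled vector per variable-length chunk, fed to the coarser level.
    return torch.stack([h[a:b].mean(0) for a, b in zip(bounds[:-1], bounds[1:])])

h = torch.randn(32, 16)
print(pool_chunks(h, chunk_boundaries(h)).shape)  # (num_chunks, 16)
```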
FR3E, from ByteDance and the University of Manchester, introduces a structured exploration framework for fine-tuning Large Language Models (LLMs) under Reinforcement Learning with Verifiable Rewards (RLVR). The method leverages token-wise entropy to identify uncertain decision points, enabling targeted partial rollouts with semantically grounded intermediate feedback, which leads to more stable training and improved reasoning capabilities on mathematical benchmarks.
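A minimal sketch of the branch-point selection, assuming token-wise entropy is computed from the policy's logits over a generated trajectory and the highest-entropy positions become restart states for partial rollouts; the top-k rule is an illustrative choice:

```python
import torch

def uncertain_positions(logits, top_k=4):
    # logits: (seq, vocab) policy logits along one generated trajectory.
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1)  # (seq,) token-wise entropy
    # Highest-entropy tokens mark uncertain decision points; return them in order.
    return torch.topk(entropy, k=min(top_k, len(entropy))).indices.sort().values

logits = torch.randn(32, 1000)
print(uncertain_positions(logits))  # prefix indices to launch partial rollouts from
```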
Researchers from Apple and Sorbonne University extended machine learning scaling laws to account for data domain weights, creating formulations that predict model loss based on model size, training tokens, and data mixture. This allows for the cost-effective determination of optimal data mixtures from a few small-scale training runs, consistently improving performance across Large Language Models, Native Multimodal Models, and Large Vision Models.
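A hedged sketch of that workflow, assuming a Chinchilla-style law whose coefficients depend log-linearly on a single domain weight; the exact parameterization in the paper may differ, and the run data below is synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_law(X, E, a, b, alpha, beta, c1, c2):
    N, D, w1 = X  # model size, training tokens, weight of domain 1
    A = np.exp(a + c1 * w1)  # mixture-dependent coefficients (assumed form)
    B = np.exp(b + c2 * w1)
    return E + A / N**alpha + B / D**beta

# Stand-in for a handful of small-scale training runs at varied (N, D, w1).
rng = np.random.default_rng(0)
N, D, w1 = rng.uniform(1e7, 2e8, 16), rng.uniform(1e9, 8e9, 16), rng.uniform(0, 1, 16)
y = loss_law((N, D, w1), 1.8, 4.0, 6.0, 0.32, 0.28, -0.5, 0.4)
y += rng.normal(0, 0.01, 16)

params, _ = curve_fit(loss_law, (N, D, w1), y,
                      p0=[2.0, 4.0, 6.0, 0.3, 0.3, 0.0, 0.0], maxfev=50000)
# Sweep the mixture weight at the target scale and keep the predicted minimizer.
best_w1 = min(np.linspace(0, 1, 101), key=lambda w: loss_law((1e9, 1e11, w), *params))
```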
StreamVGGT, developed at Tsinghua University, provides a method for real-time, on-the-fly 4D visual geometry reconstruction using a causal transformer architecture with cached memory tokens and knowledge distillation. This approach achieves over 19x faster inference than prior global-attention models while maintaining competitive reconstruction accuracy.
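A minimal sketch of the streaming mechanism, assuming single-head attention over keys and values cached from all previous frames, so each new frame is processed without re-running global attention; shapes and the class are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

class StreamingAttentionCache:
    def __init__(self):
        self.k_cache, self.v_cache = [], []

    def step(self, q, k, v):  # one incoming frame's tokens: (tokens, d)
        self.k_cache.append(k)  # memory tokens persist across frames
        self.v_cache.append(v)
        K = torch.cat(self.k_cache)  # all past frames + current
        V = torch.cat(self.v_cache)
        attn = F.softmax(q @ K.T / q.shape[-1] ** 0.5, dim=-1)  # causal by construction
        return attn @ V  # fused features for the current frame only

cache = StreamingAttentionCache()
for _ in range(4):  # four frames arriving one at a time
    toks = torch.randn(128, 64)
    out = cache.step(toks, toks, toks)  # (128, 64), constant work per new frame
```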
MoVieS introduces a unified feed-forward model that generates dynamic 4D novel views from monocular videos, performing inference in under one second per scene. The model learns generalizable priors for appearance, geometry, and motion, enabling robust performance on unseen dynamic scenes and supporting zero-shot applications like scene flow estimation.
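Zero-shot scene flow then falls out almost for free: if the model predicts per-pixel world-space point maps for the same pixels at consecutive timestamps, flow is just their difference. A sketch; the tensor names are illustrative, not the paper's API:

```python
import torch

def scene_flow(points_t, points_t1):
    # points_t, points_t1: (H, W, 3) world-space point maps for the same pixels
    # at times t and t+1; no flow supervision is needed to read this off.
    return points_t1 - points_t  # per-pixel 3D displacement over one step
```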
The work "Reinforcement Learning with Action Chunking" from UC Berkeley introduces Q-chunking, a method that adapts TD-based RL algorithms to operate on sequences of actions, enhancing exploration and sample efficiency in long-horizon, sparse-reward tasks. The approach leverages an unbiased form of n-step backups and generates temporally coherent behaviors, leading to superior performance in offline-to-online settings on robotic manipulation benchmarks.