2026-05-28

Compute Optimal Tokenization 88

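Tags: LLMs, Tokenization, Scaling Laws
Source: arXiv Computation and Language

[Summary]
This study examines how tokenizer compression rate affects scaling laws. It finds that, in compute-optimal configurations, model parameter count scales in proportion to the amount of training data measured in bytes rather than tokens, and that the optimal compression rate decreases as compute grows.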
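A small illustration of why the unit matters: a tokenizer's compression rate can be expressed as UTF-8 bytes per token, and multiplying a token budget by that rate gives the byte budget. The sketch below assumes a hypothetical whitespace tokenizer and made-up rates of 3.0 and 4.5 bytes per token; it does not reproduce the paper's fitted scaling law.

```python
def compression_rate(text: str, tokens: list[str]) -> float:
    """Bytes per token: average number of UTF-8 bytes each token covers."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(text.encode("utf-8")) / len(tokens)


def tokens_to_bytes(num_tokens: float, bytes_per_token: float) -> float:
    """Convert a token-denominated data budget into bytes of raw text."""
    return num_tokens * bytes_per_token


# Hypothetical tokenizer: whitespace splitting (real BPE tokenizers differ).
text = "scaling laws measured in bytes"
tokens = text.split()
print(f"{compression_rate(text, tokens):.2f} bytes/token")

# The same 1B-token budget covers different amounts of raw text
# under two hypothetical tokenizers with different compression rates.
for rate in (3.0, 4.5):
    print(rate, tokens_to_bytes(1e9, rate))
```

Because the byte count of a corpus stays fixed when the tokenizer changes while its token count does not, byte-denominated data budgets can be compared directly across tokenizers.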

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence 88

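Tags: LLMs, Mixture of Experts, Agents, Reinforcement Learning
Source: arXiv Artificial Intelligence

[Summary]
MiniMax introduces the M2 series of MoE LLMs, built on an architecture with 229.9B total parameters and 9.8B activated parameters and aimed at agentic deployment. The work also introduces Forge, a reinforcement learning system, and M2.7, a model with self-evolution capabilities.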
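The gap between total and activated parameters comes from sparse expert routing: each token runs through only a few experts. The sketch below computes the activated fraction from the two stated figures, applies the common rule of thumb of about 2 forward FLOPs per active parameter per token, and shows a generic softmax top-k router. The expert count, router scores and k are hypothetical, not M2's actual configuration.

```python
import math

TOTAL_PARAMS = 229.9e9   # total parameters (stated in the summary)
ACTIVE_PARAMS = 9.8e9    # parameters activated per token (stated in the summary)

print(f"activated fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.2%}")
# Rule of thumb: a forward pass costs roughly 2 FLOPs per active parameter per token.
print(f"~{2 * ACTIVE_PARAMS:.2e} forward FLOPs per token")


def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating: keep the k highest-scoring experts and softmax
    over them only, so the kept weights sum to 1."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)              # subtract max for stability
    exp = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exp.values())
    return {i: w / total for i, w in exp.items()}


# Hypothetical router scores for 8 experts; only 2 experts run for this token.
print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2))
```

Taking a softmax over only the selected logits gives the same weights as taking a softmax over all experts and renormalising the top k.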

Stateful Inference for Low-Latency Multi-Agent Tool Calling 85

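Tags: Inference Optimization, AI Agents, KV Cache
Source: arXiv Machine Learning

[Summary]
To address high latency in multi-agent tool calling, this study proposes a stateful inference architecture that uses a persistent KV cache and speculative decoding to reduce per-turn inference overhead to an incremental cost, achieving 2.1x to 4.2x per-turn speedups over vLLM and SGLang.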
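The incremental-cost idea can be illustrated with token accounting alone: a stateless server re-prefills the whole conversation every turn, while a stateful one keeps the KV cache for the already-processed prefix and prefills only the new suffix. The `StatefulSession` class below is a hypothetical sketch that counts tokens instead of computing attention; it does not model speculative decoding or the internals of vLLM or SGLang.

```python
class StatefulSession:
    """Hypothetical sketch: count how many prompt tokens must be prefilled
    per turn when the KV cache for the shared prefix is kept alive."""

    def __init__(self) -> None:
        self.cached: list[str] = []  # tokens whose KV entries are assumed cached

    def prefill_cost(self, prompt: list[str]) -> int:
        # Length of the longest common prefix with what is already cached.
        n = 0
        while n < min(len(self.cached), len(prompt)) and self.cached[n] == prompt[n]:
            n += 1
        self.cached = list(prompt)  # the cache now covers the full prompt
        return len(prompt) - n      # only the uncached suffix is recomputed


def stateless_cost(prompt: list[str]) -> int:
    return len(prompt)  # every turn recomputes the whole context


turns = [["sys", "user1"], ["tool_result1"], ["tool_result2", "user2"]]
session, history = StatefulSession(), []
for new_tokens in turns:
    history = history + new_tokens
    print("stateless:", stateless_cost(history), "stateful:", session.prefill_cost(history))
```

If an earlier part of the context is edited, the longest-common-prefix check makes the sketch fall back to recomputing from the first differing token, which is the usual behaviour of prefix caching.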

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training 85

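Tags: Optimization Algorithms, LLM Training, MoE
Source: arXiv Computation and Language

[Summary]
The authors propose MONA, an optimizer that combines Muon's orthogonalization framework with Nesterov acceleration. When training MoE models of up to 68B parameters on 1 trillion tokens, it outperforms both Muon and AdamW in convergence and downstream task performance.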
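For readers unfamiliar with Muon: it replaces the raw momentum update of a weight matrix with an approximately orthogonalised version computed by a few Newton-Schulz iterations. The pure-Python sketch below pairs that step with Nesterov-style momentum. It follows the publicly described Muon recipe (the quintic coefficients come from Muon's reference implementation); it is not MONA's exact update rule, and the learning rate and momentum values are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalise g by pushing its singular values towards 1,
    using the quintic iteration X <- aX + (bA + cA^2)X with A = X X^T."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from Muon's reference code
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in g]  # all singular values <= 1
    tall = len(x) > len(x[0])
    if tall:
        x = transpose(x)  # work in the wide orientation so X X^T is smaller
    for _ in range(steps):
        aa = matmul(x, transpose(x))
        aa2 = matmul(aa, aa)
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(aa, aa2)]
        px = matmul(poly, x)
        x = [[a * p + q for p, q in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if tall else x


def muon_nesterov_step(w, grad, buf, lr=0.02, mu=0.95):
    """One generic Muon-style update with Nesterov momentum (not MONA's rule)."""
    buf = [[mu * m + g for m, g in zip(rm, rg)] for rm, rg in zip(buf, grad)]
    # Nesterov look-ahead: combine the fresh gradient with the updated momentum.
    look = [[g + mu * m for g, m in zip(rg, rm)] for rg, rm in zip(grad, buf)]
    o = newton_schulz(look)
    w = [[p - lr * u for p, u in zip(rw, ro)] for rw, ro in zip(w, o)]
    return w, buf


w, buf = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
w, buf = muon_nesterov_step(w, [[3.0, 0.0], [0.0, 0.5]], buf)
print(w)
```

With these coefficients the singular values of the output land only approximately at 1 (they oscillate in a band around it), which Muon trades for a fast, fixed number of matrix multiplications.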

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning 85

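Tags: Embodied AI, Reinforcement Learning, Multimodal LLMs
Source: arXiv Computation and Language

[Summary]
The paper proposes SOLE-R1, a video-language reasoning model designed to serve as the sole reward signal for robot reinforcement learning. It generates dense task-progress estimates through spatiotemporal chain-of-thought reasoning, enabling zero-shot online reinforcement learning without ground-truth rewards, and significantly outperforms models such as GPT-5.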
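One common way to turn per-step task-progress estimates into a dense reward is to reward the change in progress between consecutive observations. The sketch below assumes a hypothetical list of progress scores in [0, 1], such as a video-language model might produce per frame, and an optional hypothetical success bonus; how SOLE-R1 itself maps its reasoning output to a reward is not reproduced.

```python
from typing import List


def progress_to_rewards(progress: List[float], success_bonus: float = 0.0,
                        threshold: float = 0.95) -> List[float]:
    """Dense reward r_t = p_{t+1} - p_t from estimated task progress p_t in [0, 1].
    Optionally add a (hypothetical) bonus when final progress reaches `threshold`."""
    if len(progress) < 2:
        return []
    rewards = [nxt - cur for cur, nxt in zip(progress, progress[1:])]
    if progress[-1] >= threshold:
        rewards[-1] += success_bonus
    return rewards


# Hypothetical per-frame progress estimates for one episode; the dip at
# frame 3 (e.g. the object slipping) yields a negative reward.
estimates = [0.0, 0.2, 0.5, 0.45, 0.9]
rewards = progress_to_rewards(estimates)
print(rewards, sum(rewards))
```

Because the rewards telescope, an episode's return equals final progress minus initial progress, so a policy cannot farm reward by oscillating back and forth.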

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline 85

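Tags: LLMs, Self-Training, Synthetic Data, Reasoning
Source: arXiv Computation and Language

[Summary]
The paper proposes Self-Verified Distillation, an algorithm in which an LLM, given only unlabeled prompts, generates solutions, filters them through a three-stage self-verification process, and fine-tunes on the survivors. It significantly improves performance on math, science and coding tasks without increasing inference cost.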
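The overall loop can be pictured as generate, filter, fine-tune. The sketch below shows only the filtering step, with three hypothetical checks standing in for the paper's verification stages (a parsable final answer, a strict-majority vote across samples, and a self-check); `accept_all` is a placeholder for asking the model to verify its own solution, and the `Answer:` marker is an assumed output format.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def extract_answer(solution: str) -> Optional[str]:
    """Hypothetical stage 1: the solution must end with a parsable final answer."""
    marker = "Answer:"
    if marker not in solution:
        return None
    return solution.rsplit(marker, 1)[1].strip() or None


def filter_solutions(prompt: str, samples: List[str],
                     self_check: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Keep (prompt, solution) pairs that pass three hypothetical checks."""
    parsed = [(s, extract_answer(s)) for s in samples]
    parsed = [(s, a) for s, a in parsed if a is not None]             # stage 1: parsable
    if not parsed:
        return []
    majority, count = Counter(a for _, a in parsed).most_common(1)[0]
    if 2 * count <= len(parsed):                                      # stage 2: strict majority
        return []
    agreeing = [s for s, a in parsed if a == majority]
    return [(prompt, s) for s in agreeing if self_check(prompt, s)]  # stage 3: self-check


def accept_all(prompt: str, solution: str) -> bool:
    """Placeholder for asking the model to verify its own solution."""
    return True


samples = ["2+2=4. Answer: 4", "2+2=5. Answer: 5", "Two plus two is four. Answer: 4", "no idea"]
sft_data = filter_solutions("What is 2+2?", samples, accept_all)
print(sft_data)  # surviving pairs would become fine-tuning data
```

No labels are consulted anywhere: every signal comes from the model's own samples, which is what lets such a pipeline run on unlabeled prompts.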

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation 85

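Tags: Image Generation, Video Generation, Open-Source Models
Source: arXiv Artificial Intelligence

[Summary]
Kandinsky 5.0 is a family of open-source foundation models for high-resolution image and video synthesis, comprising a 6B image model, a 2B lightweight video model and a 19B high-quality video model. The report details its data processing, multi-stage training and inference optimizations.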

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers 85

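Tags: Optimizers, LLM Training, Mixture of Experts
Source: arXiv Artificial Intelligence

[Summary]
This paper proposes a symmetry-compatible principle for optimizer design: gradient updates should be equivariant under the symmetry group of each weight block. It designs dedicated optimizers for embeddings, SwiGLU MLPs, MoE routers and similar blocks, which significantly improve model performance and training stability in pretraining.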

Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens 85

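Tags: LLMs, Chain of Thought, Reasoning Mechanisms
Source: arXiv Artificial Intelligence

[Summary]
This study challenges the semantic role of chain-of-thought (CoT) traces. Models trained on corrupted intermediate traces unrelated to the problem perform on par with models trained on correct traces, and even generalize better on out-of-distribution tasks, which cautions against over-interpreting CoT as evidence of reasoning.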

Qrita: High-performance Top-k and Top-p using Pivot-based Truncation and Selection 85

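Tags: Inference Optimization, LLMs, Open-Source Tools
Source: arXiv Artificial Intelligence

[Summary]
Qrita is an efficient Top-k/Top-p sampling algorithm that uses pivot-based truncation and selection to cut GPU sorting overhead, raising LLM inference throughput by up to 1.4x while halving memory usage. It is now the default sampler in vLLM.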
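The key observation behind threshold-based truncation is that top-p does not need a fully sorted vocabulary: it only needs the probability threshold at which the kept mass first reaches p. The sketch below finds that threshold by bisection and checks it against the sort-based reference. It illustrates the idea only; it is not Qrita's GPU kernel or pivot-selection scheme, and it keeps all tokens tied at the threshold, whereas a sort-based sampler may keep only some of them.

```python
from typing import List, Set


def top_p_sort(probs: List[float], p: float) -> Set[int]:
    """Reference: sort descending and keep the shortest prefix with mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: Set[int] = set()
    mass = 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


def top_p_threshold(probs: List[float], p: float) -> Set[int]:
    """Sort-free sketch: bisect on a probability threshold tau so that the tokens
    with prob >= tau carry mass >= p, then keep exactly those tokens.
    Assumes 0 < p <= sum(probs)."""
    def mass_at(tau: float) -> float:
        return sum(x for x in probs if x >= tau)

    lo, hi = 0.0, 2.0 * max(probs)      # invariant: mass_at(lo) >= p > mass_at(hi)
    while True:
        mid = (lo + hi) / 2
        if not lo < mid < hi:           # lo and hi are adjacent floats: done
            break
        if mass_at(mid) >= p:
            lo = mid
        else:
            hi = mid
    return {i for i, x in enumerate(probs) if x >= lo}


probs = [0.5, 0.05, 0.3, 0.15]
print(top_p_sort(probs, 0.9), top_p_threshold(probs, 0.9))
```

Each bisection step is a single pass of comparisons and a reduction over the vocabulary, which parallelises well on a GPU; this is the kind of work that replaces the full sort whose overhead Qrita targets.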

Causal Representation Learning for Generalisable Recommendation 83

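Tags: Recommender Systems, Causal Representation Learning, Out-of-Distribution Generalization
Source: arXiv Machine Learning

[Summary]
The authors propose a causal representation learning method that tackles distribution shift in recommender systems through information-theoretic disentanglement. It adds no inference cost and significantly increased online user engagement in a large-scale A/B test at Spotify.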

MinT: Managed Infrastructure for Training and Serving Millions of LLMs 83

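Tags: Infrastructure, Inference Optimization, Distributed Training, LLMs
Source: arXiv Artificial Intelligence

[Summary]
MinT is managed infrastructure for LoRA post-training and online serving. It efficiently manages, trains and retrieves millions of LoRA policies on top of shared trillion-parameter-scale base models, substantially reducing latency and GPU memory usage for multi-policy training and serving.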
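Serving many LoRA policies on one base model works because each adapter adds only a low-rank correction to the shared weights: for input x, y = W x + (alpha / r) * B (A x). The pure-Python sketch below applies per-request adapters to a single shared weight matrix; the adapter names, shapes and values are hypothetical, and it does not reflect MinT's kernels or storage layout.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, x: List[float],
                 adapter: Optional[Tuple[Matrix, Matrix]] = None,
                 alpha: float = 1.0) -> List[float]:
    """y = W x + (alpha / r) * B (A x). W is shared; (A, B) belongs to one policy."""
    y = matvec(w, x)
    if adapter is None:
        return y                      # plain base-model request
    a, b = adapter                    # A: r x d_in, B: d_out x r
    r = len(a)
    delta = matvec(b, matvec(a, x))   # low-rank path; never materialises B @ A
    return [yi + (alpha / r) * di for yi, di in zip(y, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]          # one shared base weight
adapters: Dict[str, Tuple[Matrix, Matrix]] = {   # two hypothetical rank-1 policies
    "policy_a": ([[1.0, 1.0]], [[1.0], [0.0]]),
    "policy_b": ([[0.0, 2.0]], [[0.0], [1.0]]),
}
batch = [("policy_a", [1.0, 2.0]), ("policy_b", [1.0, 2.0]), (None, [1.0, 2.0])]
for name, x in batch:
    print(name, lora_forward(W, x, adapters.get(name)))
```

A rank-r adapter stores r * (d_in + d_out) numbers instead of the d_in * d_out of a full weight copy, which is why millions of policies can share one base model in memory.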

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty 83

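Tags: LLMs, Reasoning Mechanisms, Self-Correction
Source: arXiv Artificial Intelligence

[Summary]
The authors propose an information-theoretic framework to explain self-correction in LLMs. They show that explicitly expressing uncertainty (for example, "Wait") helps steer the model toward correcting itself, suggesting that reasoning ability depends heavily on linguistic habits that externalize uncertainty.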

Code w/ Claude London Event: Reshaping the Developer Experience 82

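Tags: AI Agents, Deployment Optimization, Software Development
Source: AI HOT 精选 (AI HOT Picks)

[Summary]
Anthropic announced new capabilities for Claude agents, including self-hosted sandboxes and MCP tunnels that let agents run on an enterprise's own infrastructure, and described thinking-budget optimizations in Claude Code, aimed at improving the enterprise development and deployment experience.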

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models 82

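Tags: Embodied AI, AI Safety, VLA Models
Source: arXiv Machine Learning

[Summary]
This study proves an information-theoretic bound showing that vision-language-action (VLA) models cannot have both capability and robustness for free. It validates the theory on OpenVLA and provides new tools for evaluating and strengthening adversarial defenses in embodied AI.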

LLM-guided Hierarchical Search for End-to-end Reasoning Intensive Retrieval 82

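Tags: Information Retrieval, LLMs, RAG
Source: arXiv Machine Learning

[Summary]
The paper proposes LATTICE, an LLM-guided hierarchical search paradigm in which the LLM retrieves documents directly by navigating a hierarchical index, without an embedding model. It significantly improves performance on reasoning-intensive retrieval.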
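A hierarchical index lets the retriever descend a tree of summaries instead of comparing the query against every document. The sketch below runs a beam search over such a tree; `overlap_score` is a hypothetical stand-in for the LLM's relevance judgement (plain word overlap), and the tree, beam width and scoring are illustrative assumptions rather than LATTICE's actual procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    summary: str
    children: List["Node"] = field(default_factory=list)
    doc_id: Optional[str] = None  # set on leaf (document) nodes only


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for an LLM relevance judgement: word overlap."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)


def hierarchical_search(root: Node, query: str,
                        score: Callable[[str, str], float] = overlap_score,
                        beam: int = 1, top_k: int = 2) -> List[str]:
    """Beam search down the index: only the `beam` best inner nodes per level are expanded."""
    frontier, leaves = [root], []
    while frontier:
        children = [c for n in frontier for c in n.children]
        leaves += [c for c in children if c.doc_id is not None]
        inner = [c for c in children if c.doc_id is None]
        inner.sort(key=lambda c: score(query, c.summary), reverse=True)
        frontier = inner[:beam]
    leaves.sort(key=lambda c: score(query, c.summary), reverse=True)
    return [leaf.doc_id for leaf in leaves[:top_k]]


index = Node("root", [
    Node("number theory and proof methods", [
        Node("proof of the prime number theorem", doc_id="m1"),
        Node("induction proof techniques", doc_id="m2"),
    ]),
    Node("bread and soup recipes", [Node("sourdough bread recipe", doc_id="c1")]),
])
print(hierarchical_search(index, "prime number proof"))
```

With `beam=1` the recipes branch is never expanded for this query, so its documents are never scored; in an LLM-guided search, that pruning is what keeps the number of LLM calls well below the number of documents.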

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection 82

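Tags: Web Agents, Data Selection, Training Optimization
Source: arXiv Machine Learning

[Summary]
The authors propose Weasel, a framework that optimizes offline training of web agents through data selection that balances importance and diversity, AXTree pruning and consistent reasoning generation. It significantly improves out-of-domain generalization and speeds up training by roughly 10x.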
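Balancing importance against diversity is often done greedily: repeatedly pick the example with the best mix of its own importance score and its distance from what has already been selected, in the style of maximal marginal relevance. The sketch below uses hypothetical importance scores and 2-D feature vectors; Weasel's actual scoring, AXTree pruning and reasoning generation are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, Sequence[float]]  # (importance, feature vector)


def select(items: List[Item], k: int, lam: float = 0.5) -> List[int]:
    """Greedy importance-diversity selection; returns chosen indices in order.
    Score = lam * importance + (1 - lam) * distance to the nearest chosen item."""
    chosen: List[int] = []
    while len(chosen) < min(k, len(items)):
        best, best_score = -1, -math.inf
        for i, (imp, vec) in enumerate(items):
            if i in chosen:
                continue
            # Before anything is chosen, every item gets the same diversity term.
            div = min((math.dist(vec, items[j][1]) for j in chosen), default=1.0)
            s = lam * imp + (1 - lam) * div
            if s > best_score:
                best, best_score = i, s
        chosen.append(best)
    return chosen


items = [(0.9, (0.0, 0.0)), (0.85, (0.05, 0.0)), (0.5, (3.0, 3.0))]
print(select(items, k=2))           # diversity-aware choice
print(select(items, k=2, lam=1.0))  # importance only
```

Items 0 and 1 are near-duplicates, so the diversity term pushes the second pick toward item 2; with `lam=1.0` the rule degenerates to plain top-k by importance. Since the distances here are unnormalised, `lam` also absorbs the scale difference between the two terms.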

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations 82

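Tags: Hardware-Software Co-Design, Edge AI, Analog Chips, Recurrent Neural Networks
Source: arXiv Machine Learning

[Summary]
The authors propose a hardware-software co-design that uses bistable memory recurrent units (BMRUs) to address noise accumulation in recurrent neural networks (RNNs) implemented in analog circuits, enabling ultra-low-power edge AI inference at sub-microwatt levels.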

CompassDPO: Dynamics-Controlled Direct Preference Optimization for Robust Safety Alignment 82

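Tags: Alignment Algorithms, AI Safety, LLMs
Source: arXiv Machine Learning

[Summary]
The authors propose CompassDPO, a framework that stabilizes direct preference optimization by dynamically controlling the direction and magnitude of updates. Without an external reward model, it substantially improves the robustness of LLM safety alignment.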
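CompassDPO builds on standard DPO, whose per-pair loss is -log sigma(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]). The sketch below computes that baseline loss and the scalar factor sigma(-margin) that scales each pair's gradient. The dynamics-control mechanism that CompassDPO adds on top is not implemented here, and the log-probability values in the usage line are made up.

```python
import math


def dpo_loss(logp_w: float, logp_l: float, ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> tuple[float, float]:
    """Standard DPO loss for one pair (chosen w, rejected l), plus the gradient
    weight sigma(-margin). Inputs are sequence log-probs under policy and reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) = max(-m, 0) + log(1 + exp(-|m|)), stable for large |m|
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # sigma(-m) computed without overflow; d loss / d margin = -sigma(-m)
    if margin >= 0:
        e = math.exp(-margin)
        weight = e / (1.0 + e)
    else:
        weight = 1.0 / (1.0 + math.exp(margin))
    return loss, weight


print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The returned weight shows how much a pair still contributes to the update: it falls toward 0 as the policy's margin in favour of the chosen response grows, and approaches 1 for pairs the policy currently gets badly wrong.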

JLT: Clean-Latent Prediction in Latent Diffusion Transformers 82

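Tags: Diffusion Models, Transformers, Image Generation
Source: arXiv Machine Learning

[Summary]
This study proposes the JLT architecture and examines the advantages of predicting the clean latent rather than the velocity in latent diffusion Transformers. Mathematical analysis and ImageNet experiments show that this exploits low-dimensional structure more effectively and significantly improves generation quality.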
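Under the common linear (rectified-flow) interpolation z_t = (1 - t) x + t * eps, the velocity target is v = eps - x, so a network that predicts the clean latent x_hat can still drive a velocity-based sampler via v_hat = (z_t - x_hat) / t. The sketch below checks this conversion numerically on made-up vectors; the interpolation convention is an assumption, since the schedule JLT uses is not stated here.

```python
from typing import List


def interpolate(x: List[float], eps: List[float], t: float) -> List[float]:
    """z_t = (1 - t) * x + t * eps (rectified-flow convention, assumed)."""
    return [(1 - t) * a + t * b for a, b in zip(x, eps)]


def velocity_from_clean(z_t: List[float], x_hat: List[float], t: float) -> List[float]:
    """Convert a clean-latent prediction into a velocity: v = (z_t - x_hat) / t.
    Undefined at t = 0, where z_t is already the clean latent."""
    if t <= 0:
        raise ValueError("t must be > 0")
    return [(z - x) / t for z, x in zip(z_t, x_hat)]


x, eps, t = [0.5, -1.0, 2.0], [0.1, 0.3, -0.2], 0.25
z = interpolate(x, eps, t)
v_true = [e - a for a, e in zip(x, eps)]
print(velocity_from_clean(z, x, t), v_true)  # equal up to float rounding
```

The identity follows from substituting eps = (z_t - (1 - t) x) / t into v = eps - x. The division by t also means that any error in x_hat is amplified at small t, one practical consideration when choosing which target to predict.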
