Curated academic papers that inform our research directions.
Note: These are external publications from arXiv, not SparseTech publications. We share them as context for the mathematical foundations underlying our work.
Representation Forcing for Bottleneck-Free Unified Multimodal Models
Yuqing Wang, Zhijie Lin, Ceyuan Yang +10 more
Published: May 29, 2026
cs.CV
Unified multimodal models (UMMs) aim to handle perception and generation in a single model. Yet existing UMMs still rely on a frozen, separately pretrained VAE for image generation, imposing a structural bottleneck. Naively removing it introduces a quality gap, as the model must learn both high-level structure and…
Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models
Jiazheng Xing, Hangjie Yuan, Lingling Cai +9 more
Published: May 29, 2026
cs.CV, cs.AI
Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient…
Linear Scaling Video VLMs for Long Video Understanding
Cristobal Eyzaguirre, Jiajun Wu, Juan Carlos Niebles
Published: May 29, 2026
cs.CV
Video vision-language models (VLMs) are increasingly used in long-horizon and streaming settings, yet most video encoders still rely on spatiotemporal self-attention, causing compute and latency to grow quadratically with the number of frames. Existing efficiency methods improve scalability but often lose accuracy…
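As a rough illustration of the quadratic growth described above, the sketch below counts the query-key pairs scored by one self-attention layer over all spatiotemporal tokens. The tokens-per-frame value and the per-frame-attention baseline are assumptions for illustration only. The baseline simply shows what "linear in frames" looks like and is not the method proposed in the paper.

```python
# Hypothetical illustration: tokens_per_frame=256 is an assumed value, and the
# per-frame baseline is a stand-in for "linear in frames", not the paper's method.


def joint_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Query-key pairs scored by one self-attention layer over all frames jointly."""
    n = num_frames * tokens_per_frame
    return n * n  # quadratic in the number of frames


def per_frame_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Pairs scored if attention were restricted to tokens within the same frame."""
    return num_frames * tokens_per_frame * tokens_per_frame  # linear in frames


tokens = 256
for frames in (8, 16, 32):
    print(frames, joint_attention_pairs(frames, tokens), per_frame_attention_pairs(frames, tokens))
```

With 256 tokens per frame, going from 8 to 16 frames quadruples the joint count (4,194,304 to 16,777,216) but only doubles the per-frame count (524,288 to 1,048,576).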
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
Olaf Dünkel, Basavaraj Sunagad, Haoran Wang +3 more
Published: May 29, 2026
cs.CV
Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large…
KLIP: localized distribution shift detection via KL-divergence with diffusion priors in Inverse Problems
Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil
Published: May 29, 2026
cs.CV, cs.LG
Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require some knowledge of the shifted distribution, fail to detect subtle or localized distribution…
Learning Global Motion with Compact Gaussians for Feed-Forward 4D Reconstruction
Mungyeom Kim, Minkyeong Jeon, Honggyu An +10 more
Published: May 29, 2026
cs.CV
Dynamic scene reconstruction from monocular video remains a fundamental challenge in computer vision. Existing feed-forward methods predict 3D Gaussians pixel-wise for each frame, suffering from duplicated Gaussians and view-dependent biases that hinder effective learning of scene motion. We present C4G, a feed-forward…
A Tight Theory of Error Feedback Algorithms in Distributed Optimization
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
Published: May 29, 2026
cs.LG, math.OC
Communication costs are a major bottleneck in distributed learning and first-order optimization. A common approach to alleviate this issue is to compress the gradient information exchanged between agents. However, such compression typically degrades the convergence guarantees of gradient-based methods. Error feedback…
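For readers new to error feedback, here is a minimal single-worker sketch of the classical scheme (often called EF-SGD) with a top-k compressor on a toy quadratic. The function names, the choice of top-k, the step size, and the toy objective are illustrative assumptions. This is not the specific family of algorithms or the tight analysis developed in the paper.

```python
from typing import Callable, List

Vector = List[float]


def top_k(v: Vector, k: int) -> Vector:
    """Keep the k largest-magnitude entries of v and zero the rest (a biased compressor)."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def ef_sgd(grad: Callable[[Vector], Vector], x0: Vector, lr: float, k: int, steps: int) -> Vector:
    """Classical error feedback: compress (step + carried-over error), remember the residual."""
    x = list(x0)
    e = [0.0] * len(x)  # error memory: what the compressor has dropped so far
    for _ in range(steps):
        g = grad(x)
        p = [lr * gi + ei for gi, ei in zip(g, e)]  # intended update plus past error
        delta = top_k(p, k)  # only these k entries would be communicated
        x = [xi - di for xi, di in zip(x, delta)]
        e = [pi - di for pi, di in zip(p, delta)]  # residual is fed back next step
    return x


# Assumed toy problem: f(x) = 0.5 * ||x - b||^2, whose gradient is x - b.
b = [1.0, -2.0, 3.0, -4.0]
x_final = ef_sgd(lambda x: [xi - bi for xi, bi in zip(x, b)], [0.0] * 4, lr=0.05, k=1, steps=2000)
print([round(xi, 6) for xi in x_final])
```

Although only one of four coordinates is "sent" per step, the iterate still reaches `b`, because the components dropped by the compressor are kept in `e` and transmitted later instead of being lost.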
Stateful Online Monitoring Catches Distributed Agent Attacks
Davis Brown, Samarth Bhargav, Arav Santhanam +7 more
Published: May 29, 2026
cs.CR, cs.AI
Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, splitting a harmful task across many user accounts so each individual transcript looks benign. Because safety monitors…
CoFiDA-M: Concept-Aware Feature Modulation for Cross-Domain Adaptation with Image-Only Inference
Nurjahan Sultana, Moi Hoon Yap, Xinqi Fan +1 more
Published: May 29, 2026
cs.CV
Models for AI-based skin cancer screening suffer a severe performance drop when shifting from expert dermoscopic (source) images to consumer-grade clinical (target) images, hindering real-world deployment. Existing domain adaptation methods often ignore crucial semantic invariants, such as clinical concepts. While new…
TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation
Ruotong Liao, Guowen Huang, Qing Cheng +6 more
Published: May 29, 2026
cs.CV, cs.AI
Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoising trajectory where conditioning text…
Recognizing Co-Speech Gestures in-the-Wild
Sindhu B Hegde, K R Prajwal, Andrew Zisserman
Published: May 29, 2026
cs.CV
While humans naturally gesture during speech, only a sparse subset of these movements are visually depictive and semantically linked to specific spoken words. Current multimodal models struggle to capture these semantic co-speech gestures, heavily bottlenecked by a lack of precisely annotated training data. To address…
Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions
Wesley Scivetti, Ethan Wilcox, Nathan Schneider +2 more
Published: May 29, 2026
cs.CL, cs.AI
Grasping the semantics of rare constructions (form-meaning pairings) has been shown to be a challenging problem that has currently only been solved by the largest LLMs. It remains an open question if open-source models have robust constructional understanding, and if so, what learning dynamics underlie the acquisition…
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
Nianyi Lin, Jiajie Zhang, Lei Hou +1 more
Published: May 29, 2026
cs.CL, cs.AI, cs.LG
Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability…
Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation
Albert Sadowski, Jarosław A. Chudziak
Published: May 29, 2026
cs.AI
The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We introduce context-dependent argumentation frameworks (CDAFs), an extension of Dung's theory in which a defeat function…
Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings
Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad +1 more
Published: May 29, 2026
cs.LG
Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a…
Functional Multi-Target Detection via Bispectrum Inversion
Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney +1 more
Published: May 29, 2026
eess.SP, cs.IT, math.ST
This paper develops a functional theory for multi-target detection, where a compactly supported signal is recovered from a single noisy observation containing many unknown translations of the signal. Our formulation allows continuous, off-grid translations and correlated stationary Gaussian process noise, extending…
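The paper works with continuous, off-grid translations. As a much simpler discrete illustration of why the bispectrum is a natural statistic when translations are unknown, the sketch below checks numerically that the bispectrum of a cyclic signal is unchanged when the signal is shifted. The discrete cyclic setting, the naive DFT, the example signal, and all function names are assumptions for illustration, not the paper's estimator.

```python
import cmath
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def bispectrum(x: List[float]) -> List[List[complex]]:
    """B[k1][k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod N])."""
    X = dft(x)
    n = len(x)
    return [[X[k1] * X[k2] * X[(k1 + k2) % n].conjugate() for k2 in range(n)] for k1 in range(n)]


def cyclic_shift(x: List[float], s: int) -> List[float]:
    """Return y with y[t] = x[(t - s) mod N]."""
    s %= len(x)
    return x[-s:] + x[:-s]


signal = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]  # assumed example signal
B_orig = bispectrum(signal)
B_shift = bispectrum(cyclic_shift(signal, 3))
max_diff = max(abs(u - v) for ru, rv in zip(B_orig, B_shift) for u, v in zip(ru, rv))
print(max_diff)  # expected to be at floating-point round-off level
```

The invariance follows because a shift by s multiplies X[k] by exp(-2πiks/N), and these phase factors cancel in the product X[k1]·X[k2]·conj(X[k1+k2]).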
SurGe: Improved Surface Geometry in Point Maps
Karim Knaebel, Gonzalo Martin Garcia, Christian Schmidt +4 more
Published: May 29, 2026
cs.CV
Recent feedforward 3D reconstruction methods predict point maps and estimate global 3D geometry remarkably well. However, their predictions still exhibit inaccurate local surface geometry, which is clearly visible qualitatively but only weakly reflected in common metrics. To make these errors more explicit in…
Joint Multi-Camera LiDAR Extrinsic Calibration via Learned Pairwise Initialization and Geometric Refinement
Aziz Al-Najjar, Marzieh Amini, James R. Green +1 more
Published: May 29, 2026
cs.CV
Most learning-based camera-LiDAR calibration methods treat each camera-LiDAR pair independently, ignoring the rigid geometric coupling in multi-camera platforms. As a result, per-camera estimates may be individually accurate yet inconsistent at the system level. We present a two-stage framework for joint multi-camera…
SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics
Eric Liang
Published: May 29, 2026
cs.IR, cs.AI
Scalable information retrieval testing needs corpora that are large enough to stress index construction, ranking latency, query routing, and evaluation tooling, yet human-judged test collections remain expensive and may be unavailable when documents are private or still under design. This paper introduces SPECTRA, a…
nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving
Zhiyu Huang, Johnson Liu, Rui Song +13 more
Published: May 29, 2026
cs.CV
Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited…
What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation
Qing Wang, Jacob Devasier, Chengkai Li
Published: May 29, 2026
cs.CL, cs.AI
We present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation. We analyze MDLM generation trajectories -- the order in which tokens are unmasked during iterative decoding -- and find that, unlike autoregressive LLMs which generate text linearly, MDLMs naturally…
Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection
Benedetta Muscato, Beiduo Chen, Gizem Gezici +2 more
Published: May 29, 2026
cs.CL
Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggregate rationales beyond majority vote --…
Effective Biological Representation Learning by Masking Gene Expression
Kian Kenyon-Dean, Alina Selega, Ihab Bendidi +5 more
Published: May 29, 2026
cs.LG
RNA sequencing produces rich and diverse datasets of gene expression, offering compelling insights into cellular state and function that have many applications in drug discovery. Modeling such data is challenging due to inherent technical noise and experimental batch effects, as evidenced by many existing…
What Am I Missing? Question-Answering as Hidden State Probing
Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
Published: May 29, 2026
cs.CL
Test-time reasoning has become a significant field of study since the introduction of chain-of-thought reasoning in large language models (LLMs). However, the mechanisms of this reasoning process are still under-explored -- from the same input prompt, and even the same partial solution, LLMs can produce varied answers…