Research & Publications

Exploring the mathematical foundations of efficient signal processing and AI model compression. Join the waitlist for updates on our latest research.

Research Focus Areas

Our research explores fundamental challenges in sparse signal processing, deterministic algorithm design, and certification frameworks for production systems.

Efficient Spectral Analysis

Investigating efficient algorithms for identifying and reconstructing sparse frequency components in high-dimensional signals across various domains.

Deterministic Algorithms

Developing verification-ready signal processing frameworks with guaranteed reproducibility for production AI and high-reliability applications.

Computational Optimization

Advancing computational efficiency for frequency domain analysis through optimized processing and intelligent resource allocation.

Validation & Benchmarking

Establishing comprehensive benchmarking methodologies and validation frameworks for comparing signal processing implementations across diverse use cases.

Technical Papers

SparseTech technical papers covering deterministic sparse FFT engines, sublinear data discovery, and memory-compute workload modeling. A PDF is available for each paper.

Sparse FFT as a Memory-Compute Workload: FFTW Benchmarking and Traffic/Energy Modeling

Aaron R. Flouro, Shawn P. Chadwick

Published: May 12, 2026

eess.SP, cs.AR, cs.PF
We benchmark a production Rust implementation of a Four-View GATED CRT sparse FFT against FFTW and evaluate its suitability for near-memory sparse spectral processing. The sparse arithmetic core scales as $O(k \log k)$, while input acquisition remains streaming $O(N)$. On synthetic on-grid sparse signals, the…

SparseDSP: System-Level Evaluation of Deterministic Sparse FFT Engine Routing across Synthetic, Impaired, and Curated Real-Payload Workloads

Aaron R. Flouro, Shawn P. Chadwick

Published: May 10, 2026

eess.SP, cs.DS, cs.AR
We present SparseDSP, a regime-adaptive deterministic sparse FFT engine routing system evaluated against Dense FFT across on-grid, off-grid, and curated real-payload workloads. SparseDSP estimates input sparsity internally, then dispatches to an exact engine drawn from a complexity-class family spanning $O(k \log k)$,…

SparseDSP: System-Level Evaluation of Deterministic Sparse FFT for 5G/6G-Relevant Wideband Spectrum Sensing

Aaron R. Flouro, Shawn P. Chadwick

Published: May 11, 2026

eess.SP, cs.IT, cs.NI
We present SparseDSP, a regime-adaptive deterministic sparse FFT system evaluated against dense FFT baselines for transform-stage bin identification in 5G/6G-relevant wideband sensing regimes. SparseDSP estimates effective sparsity internally and dispatches among deterministic sparse recovery engines spanning…

SparseDSP: System-Level Evaluation of Deterministic Sparse FFT for Radar, Sonar, and LiDAR

Aaron R. Flouro, Shawn P. Chadwick

Published: April 19, 2026

eess.SP, cs.AR
We present SparseDSP, a regime-adaptive deterministic sparse FFT system evaluated against Dense FFT across radar, sonar, electronic warfare, and LiDAR operating points. SparseDSP estimates signal sparsity internally, then dispatches to an exact engine selected by its internal dispatch policy from an internal family of…

SparseDSP: Sublinear Data Discovery for Large-Scale Computational Pipelines

Aaron R. Flouro, Shawn P. Chadwick

Published: April 19, 2026

cs.DS, cs.LG, cs.IR
Large-scale data processing pipelines spend substantial time on discovery: selecting relevant subsets from large data stores before downstream computation begins. This discovery stage, which includes dense scans, FFT-based analysis, and exhaustive top-$k$ selection, scales linearly with data size regardless of…

SparseTech Publications

Deterministic Sparse FFT via Keyed Multi-View Gating with $O(\sqrt{N} \log k)$ Expected Time

Aaron R. Flouro, Shawn P. Chadwick

Published: May 5, 2026

eess.SP, cs.DS, cs.IT
We introduce a deterministic sparse Fourier transform framework based on a keyed multi-view gating mechanism that leverages 2-of-3 Chinese Remainder Theorem (CRT) agreement to reduce candidate frequency pairs from $O(k^2)$ to $\Theta(k)$ under sparse-regime assumptions. Unlike prior approaches that rely on randomized…
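
For readers unfamiliar with the Chinese Remainder Theorem mentioned in this abstract, the following is a minimal textbook sketch of how a frequency index can be recovered from its residues modulo two coprime numbers. It is a generic illustration only and is not the paper's keyed multi-view gating or its 2-of-3 agreement rule; the function name `crt_pair` and the moduli 37 and 41 are hypothetical choices made for this example.

```python
from math import gcd


def crt_pair(r1, m1, r2, m2):
    """Return the unique x in [0, m1*m2) with x % m1 == r1 and x % m2 == r2.

    Generic two-modulus CRT (hypothetical helper, not taken from the paper).
    """
    if gcd(m1, m2) != 1:
        raise ValueError("moduli must be coprime")
    # Write x = r1 + m1 * t and solve m1 * t = (r2 - r1) (mod m2) for t.
    inverse_m1 = pow(m1, -1, m2)  # modular inverse, Python 3.8+
    t = ((r2 - r1) * inverse_m1) % m2
    return r1 + m1 * t


# Illustrative values: a frequency index below 37 * 41 = 1517.
true_index = 1234
m1, m2 = 37, 41
recovered = crt_pair(true_index % m1, m1, true_index % m2, m2)
print(recovered == true_index)
```

The two residues together identify the index uniquely only because the moduli are coprime and their product exceeds the largest index of interest.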

Safety-Certified CRT Sparse FFT: $\Omega(k^2)$ Lower Bound and $O(N \log N)$ Worst-Case

Aaron R. Flouro, Shawn P. Chadwick

Published: April 20, 2026

eess.SP, cs.DS, cs.IT
Computing Fourier transforms of k-sparse signals, where only k of N frequencies are non-zero, is fundamental in compressed sensing, radar, and medical imaging. While the Fast Fourier Transform (FFT) evaluates all N frequencies in $O(N \log N)$ time, sufficiently sparse signals should admit sub-linear complexity in N.…

Post-Training Probability Manifold Correction via Structured SVD Pruning and Self-Referential Distillation

Aaron R. Flouro, Shawn P. Chadwick

Published: January 30, 2026

cs.LG, cs.AI, cs.CL
Large language models are expensive to deploy. We introduce Sparse Knowledge Distillation (SparseKD), a post-training method that compresses transformer models by combining structured SVD pruning with self-referential knowledge distillation. The key insight is simple: instead of using an external teacher, the model…

Adaptive Weighting in Knowledge Distillation: An Axiomatic Framework for Multi-Scale Teacher Ensemble Optimization

Aaron R. Flouro, Shawn P. Chadwick

Published: January 25, 2026

cs.LG
Knowledge distillation with multiple teachers is increasingly used to improve robustness, efficiency, and safety, yet existing approaches rely largely on heuristic or implementation-specific weighting schemes. This paper develops an operator-agnostic axiomatic framework for adaptive weighting in multi-teacher knowledge…

Recursive Meta-Distillation: An Axiomatic Framework for Iterative Knowledge Refinement

Aaron R. Flouro, Shawn P. Chadwick

Published: January 19, 2026

cs.LG
Recent work in probability-domain knowledge distillation has established axiomatic frameworks for temperature scaling, multi-teacher aggregation, and bias-variance trade-offs in single-stage settings. However, the mathematical behavior of recursive or multi-generation distillation remains poorly understood, with prior…

Multi-Teacher Ensemble Distillation: A Mathematical Framework for Probability-Domain Knowledge Aggregation

Aaron R. Flouro, Shawn P. Chadwick

Published: January 14, 2026

cs.LG
Building on the probability-domain distillation framework of Sparse-KD, we develop an axiomatic, operator-theoretic framework for multi-teacher ensemble knowledge distillation. Rather than prescribing a specific aggregation formula, we define five core axioms governing valid knowledge aggregation operators,…

Hallucinations Live in Variance

Aaron R. Flouro, Shawn P. Chadwick

Published: January 11, 2026

cs.LG, cs.AI
Benchmarks measure whether a model is correct. They do not measure whether a model is reliable. This distinction is largely academic for single-shot inference, but becomes critical for agentic AI systems, where a single rephrased prompt can trigger cascading failures in multi-step execution. Yet this form of…

Sparse Knowledge Distillation: A Mathematical Framework for Probability-Domain Temperature Scaling and Multi-Stage Compression

Aaron R. Flouro, Shawn P. Chadwick

Published: January 6, 2026

cs.LG
We develop a unified theoretical framework for sparse knowledge distillation based on probability-domain softening operators. While the equivalence $p^{1/T} \propto \mathrm{softmax}(z/T)$ is well known, our contribution is an operator-level analytical framework built on this foundation rather than the equivalence…
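
The equivalence cited in this abstract can be checked numerically. The sketch below is a minimal illustration using only the standard library: it raises a softmax distribution to the power $1/T$, renormalizes, and compares the result with $\mathrm{softmax}(z/T)$. The helper names `softmax` and `soften` and the sample logits are assumptions made for this example and do not come from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soften(probs, temperature):
    """Probability-domain softening: raise to 1/T, then renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [q / total for q in powered]


# Hypothetical logits and temperature, chosen only for illustration.
logits = [2.0, 0.5, -1.0]
T = 3.0

via_logits = softmax(logits, T)          # softmax(z / T)
via_probs = soften(softmax(logits), T)   # normalized p^(1/T)

agree = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(via_logits, via_probs)
)
print(agree)
```

The two routes agree because the normalizing constant of $p$ is raised to the same power in every component and cancels on renormalization.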

Foundational Readings

Curated academic papers that inform our research directions.

Note: These are external publications from arXiv, not SparseTech publications. We share them as context for the mathematical foundations underlying our work.

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Yuqing Wang, Zhijie Lin, Ceyuan Yang +10 more

Published: May 29, 2026

cs.CV
Unified multimodal models (UMMs) aim to handle perception and generation in a single model. Yet existing UMMs still rely on a frozen, separately pretrained VAE for image generation, imposing a structural bottleneck. Naively removing it introduces a quality gap, as the model must learn both high-level structure and…

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

Jiazheng Xing, Hangjie Yuan, Lingling Cai +9 more

Published: May 29, 2026

cs.CV, cs.AI
Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient…

Linear Scaling Video VLMs for Long Video Understanding

Cristobal Eyzaguirre, Jiajun Wu, Juan Carlos Niebles

Published: May 29, 2026

cs.CV
Video vision-language models (VLMs) are increasingly used in long-horizon and streaming settings, yet most video encoders still rely on spatiotemporal self-attention, causing compute and latency to grow quadratically with the number of frames. Existing efficiency methods improve scalability but often lose accuracy…

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

Olaf Dünkel, Basavaraj Sunagad, Haoran Wang +3 more

Published: May 29, 2026

cs.CV
Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large…

KLIP: localized distribution shift detection via KL-divergence with diffusion priors in Inverse Problems

Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil

Published: May 29, 2026

cs.CV, cs.LG
Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require some knowledge of the shifted distribution, fail to detect subtle or localized distribution…

Learning Global Motion with Compact Gaussians for Feed-Forward 4D Reconstruction

Mungyeom Kim, Minkyeong Jeon, Honggyu An +10 more

Published: May 29, 2026

cs.CV
Dynamic scene reconstruction from monocular video remains a fundamental challenge in computer vision. Existing feed-forward methods predict 3D Gaussians pixel-wise for each frame, suffering from duplicated Gaussians and view-dependent biases that hinder effective learning of scene motion. We present C4G, a feed-forward…

A Tight Theory of Error Feedback Algorithms in Distributed Optimization

Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

Published: May 29, 2026

cs.LG, math.OC
Communication costs are a major bottleneck in distributed learning and first-order optimization. A common approach to alleviate this issue is to compress the gradient information exchanged between agents. However, such compression typically degrades the convergence guarantees of gradient-based methods. Error feedback…

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam +7 more

Published: May 29, 2026

cs.CR, cs.AI
Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, splitting a harmful task across many user accounts so each individual transcript looks benign. Because safety monitors…

CoFiDA-M: Concept-Aware Feature Modulation for Cross-Domain Adaptation with Image-Only Inference

Nurjahan Sultana, Moi Hoon Yap, Xinqi Fan +1 more

Published: May 29, 2026

cs.CV
Models for AI-based skin cancer screening suffer a severe performance drop when shifting from expert dermoscopic (source) images to consumer-grade clinical (target) images, hindering real-world deployment. Existing domain adaptation methods often ignore crucial semantic invariants, such as clinical concepts. While new…

TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation

Ruotong Liao, Guowen Huang, Qing Cheng +6 more

Published: May 29, 2026

cs.CV, cs.AI
Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoising trajectory where conditioning text…

Recognizing Co-Speech Gestures in-the-Wild

Sindhu B Hegde, K R Prajwal, Andrew Zisserman

Published: May 29, 2026

cs.CV
While humans naturally gesture during speech, only a sparse subset of these movements are visually depictive and semantically linked to specific spoken words. Current multimodal models struggle to capture these semantic co-speech gestures, heavily bottlenecked by a lack of precisely annotated training data. To address…

Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions

Wesley Scivetti, Ethan Wilcox, Nathan Schneider +2 more

Published: May 29, 2026

cs.CL, cs.AI
Grasping the semantics of rare constructions (form-meaning pairings) has been shown to be a challenging problem that has currently only been solved by the largest LLMs. It remains an open question if open-source models have robust constructional understanding, and if so, what learning dynamics underlie the acquisition…

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou +1 more

Published: May 29, 2026

cs.CL, cs.AI, cs.LG
Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability…

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

Albert Sadowski, Jarosław A. Chudziak

Published: May 29, 2026

cs.AI
The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We introduce context-dependent argumentation frameworks (CDAFs), an extension of Dung's theory in which a defeat function…

Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad +1 more

Published: May 29, 2026

cs.LG
Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a…

Functional Multi-Target Detection via Bispectrum Inversion

Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney +1 more

Published: May 29, 2026

eess.SP, cs.IT, math.ST
This paper develops a functional theory for multi-target detection, where a compactly supported signal is recovered from a single noisy observation containing many unknown translations of the signal. Our formulation allows continuous, off-grid translations and correlated stationary Gaussian process noise, extending…

SurGe: Improved Surface Geometry in Point Maps

Karim Knaebel, Gonzalo Martin Garcia, Christian Schmidt +4 more

Published: May 29, 2026

cs.CV
Recent feedforward 3D reconstruction methods predict point maps and estimate global 3D geometry remarkably well. However, their predictions still exhibit inaccurate local surface geometry, which is clearly visible qualitatively but only weakly reflected in common metrics. To make these errors more explicit in…

Joint Multi-Camera LiDAR Extrinsic Calibration via Learned Pairwise Initialization and Geometric Refinement

Aziz Al-Najjar, Marzieh Amini, James R. Green +1 more

Published: May 29, 2026

cs.CV
Most learning-based camera-LiDAR calibration methods treat each camera-LiDAR pair independently, ignoring the rigid geometric coupling in multi-camera platforms. As a result, per-camera estimates may be individually accurate yet inconsistent at the system level. We present a two-stage framework for joint multi-camera…

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

Eric Liang

Published: May 29, 2026

cs.IR, cs.AI
Scalable information retrieval testing needs corpora that are large enough to stress index construction, ranking latency, query routing, and evaluation tooling, yet human-judged test collections remain expensive and may be unavailable when documents are private or still under design. This paper introduces SPECTRA, a…

nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving

Zhiyu Huang, Johnson Liu, Rui Song +13 more

Published: May 29, 2026

cs.CV
Reasoning is essential for autonomous driving (AD) in long-tail scenarios, where vehicles must apply commonsense knowledge, understand spatial relations, infer agent interactions, and make safe decisions. However, existing AD datasets and benchmarks mainly target perception, prediction, or planning, and provide limited…

What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation

Qing Wang, Jacob Devasier, Chengkai Li

Published: May 29, 2026

cs.CL, cs.AI
We present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation. We analyze MDLM generation trajectories -- the order in which tokens are unmasked during iterative decoding -- and find that, unlike autoregressive LLMs which generate text linearly, MDLMs naturally…

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

Benedetta Muscato, Beiduo Chen, Gizem Gezici +2 more

Published: May 29, 2026

cs.CL
Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggregate rationales beyond majority vote --…

Effective Biological Representation Learning by Masking Gene Expression

Kian Kenyon-Dean, Alina Selega, Ihab Bendidi +5 more

Published: May 29, 2026

cs.LG
RNA sequencing produces rich and diverse datasets of gene expression, offering compelling insights into cellular state and function that have many applications in drug discovery. Modeling such data is challenging due to inherent technical noise and experimental batch effects, as evidenced by many existing…

What Am I Missing? Question-Answering as Hidden State Probing

Chu Fei Luo, Samuel Dahan, Xiaodan Zhu

Published: May 29, 2026

cs.CL
Test-time reasoning has become a significant field of study since the introduction of chain-of-thought reasoning in large language models (LLMs). However, the mechanisms of this reasoning process are still under-explored -- from the same input prompt, and even the same partial solution, LLMs can produce varied answers…