Measurement noise scaling laws for cellular representation learning
submitted to Nature Biotechnology
Large genomic and imaging datasets can be used to fit models that learn representations of cellular systems, extracting informative structure from raw measurements. In other domains, model performance improves predictably with dataset size, providing a principled basis for allocating data and computation. In biological data, however, performance is also limited by measurement noise arising from technical factors such as molecular undersampling or imaging variability. By learning representations of single-cell genomic and imaging data, we show that noise defines a distinct axis along which performance improves predictably as noise decreases, across tasks. This scaling follows a simple logarithmic law that is consistent across model types, tasks, and datasets, and can be derived quantitatively from a model of noise propagation. We further identify robustness to noise and saturation of performance as properties that vary across models and tasks.
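The logarithmic relationship claimed above can be illustrated with a small numerical sketch. Everything below is a hypothetical assumption for illustration only: the functional form P(σ) = a − b·log(σ), the noise levels, and the coefficient values are not the paper's fitted results.

```python
import numpy as np

# Hypothetical noise levels sigma (e.g., degree of molecular undersampling).
noise = np.array([0.01, 0.02, 0.05, 0.1, 0.2, 0.5])

# Synthetic "task performance" drawn from an assumed logarithmic law
# P(sigma) = a - b * log(sigma); a_true and b_true are illustrative values,
# not estimates from the paper.
a_true, b_true = 0.5, 0.05
rng = np.random.default_rng(0)
performance = a_true - b_true * np.log(noise) + rng.normal(0.0, 0.005, noise.size)

# A logarithmic law is linear in log(noise), so an ordinary least-squares
# fit of performance against log(noise) recovers the coefficients.
slope, intercept = np.polyfit(np.log(noise), performance, deg=1)

print(f"fitted slope ~ {slope:.3f} (assumed -b_true = {-b_true})")
print(f"fitted intercept ~ {intercept:.3f} (assumed a_true = {a_true})")
```

Under this assumed form, checking whether measured task performance is linear in the logarithm of the noise level is a direct empirical test of the scaling law; departures from linearity at low noise would correspond to the saturating-performance regime mentioned in the abstract.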
