When Task-Specific Learning Outperforms Transfer Learning: A Benchmark of Gene and Expression Encoding Strategies

Submitted to The 2nd Workshop on Foundation Models for Science (and ICML 2026), 2025

This work, conducted at Somite.ai, presents a large-scale benchmark of gene and expression tokenization strategies for single-cell foundation models. It investigates how different encoding strategies affect transformer-based models trained on genomic data.

Paper link