arxiv:2606.19827

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Published on Jun 18 · Submitted by DaehwanKim on Jun 22

Abstract

Adaptive Binning introduces a training-adaptive discretization method for self-supervised learning on medical tabular data, improving representation learning through feature-wise refinement and heterogeneous feature handling.

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
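The abstract does not spell out the refinement rule, so what follows is only a minimal sketch of the "when" and "where" steps, built on several assumptions. Each feature starts from a coarse quantile binning. A per-feature plateau detector (hypothetical `PlateauDetector`, with assumed `patience` and `min_delta` values) decides when that feature should be refined. The new split goes at the median of the highest-variance bin. This split rule only approximates value-space concentration; the paper's representation-aware criterion, which also scores representation-space coherence, is not reproduced. All names (`FeatureCurriculum`, `refine`, `step`) are illustrative and are not taken from the authors' repository.

```python
import bisect
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlateauDetector:
    """Signals a plateau when the loss has not improved by min_delta for `patience` updates."""
    patience: int = 3
    min_delta: float = 1e-3
    best: float = float("inf")
    stale: int = 0

    def update(self, loss: float) -> bool:
        if loss < self.best - self.min_delta:
            self.best, self.stale = loss, 0
            return False
        self.stale += 1
        if self.stale < self.patience:
            return False
        # Reset: after refinement the target has more bins, so earlier losses are not comparable.
        self.best, self.stale = float("inf"), 0
        return True


def refine(values: List[float], edges: List[float]) -> Optional[List[float]]:
    """Split the highest-variance bin at its median (value-space heuristic only)."""
    groups = {}
    for x in values:
        groups.setdefault(bisect.bisect_right(edges, x), []).append(x)
    splittable = [g for _, g in sorted(groups.items()) if len(set(g)) >= 2]
    if not splittable:
        return None  # every bin holds a single distinct value
    widest = max(splittable, key=statistics.pvariance)  # ties -> lowest bin index
    distinct = sorted(set(widest))
    cut = statistics.median(widest)
    if cut <= distinct[0]:
        cut = distinct[1]  # guarantees both new bins are non-empty
    return sorted(edges + [cut])


class FeatureCurriculum:
    """Per-feature coarse-to-fine binning state (hypothetical helper, not the authors' code)."""

    def __init__(self, values, init_bins=2, max_bins=16, patience=3):
        self.values = [float(v) for v in values]
        # Start coarse: equal-frequency (quantile) cut points, deduplicated for tied data.
        self.edges = sorted(set(statistics.quantiles(self.values, n=init_bins)))
        self.max_bins = max_bins
        self.detector = PlateauDetector(patience=patience)

    @property
    def n_bins(self) -> int:
        return len(self.edges) + 1

    def targets(self, xs) -> List[int]:
        """Bin index of each value: the discrete pretext target for this feature."""
        return [bisect.bisect_right(self.edges, x) for x in xs]

    def step(self, feature_loss: float) -> bool:
        """Call once per epoch with this feature's pretext loss; returns True if bins were refined."""
        if self.n_bins >= self.max_bins or not self.detector.update(feature_loss):
            return False
        new_edges = refine(self.values, self.edges)
        if new_edges is None:
            return False
        self.edges = new_edges
        return True
```

Because the detector runs per feature, features that are still improving keep their coarse bins while stalled ones are refined. Presumably the prediction head for a refined feature would also have to grow to match the new number of bins. That detail is likewise an assumption here.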

Community


This paper proposes Adaptive Binning for medical tabular self-supervised learning. The core idea is to replace fixed global quantile binning with a learning-coupled, feature-wise coarse-to-fine curriculum that determines when to refine each feature, where to split its bins, and how to supervise mixed categorical–numerical schemas through type-aware ordinal reconstruction.
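The comment mentions type-aware ordinal reconstruction but does not give the exact loss. One common way to give bin targets an ordinal structure is a cumulative (CORAL-style) encoding, which is assumed here rather than taken from the paper. Under this encoding, a bin index b out of K bins becomes K - 1 binary targets, with t_k = 1 if b > k, and these are trained with binary cross-entropy. Categorical features keep ordinary softmax cross-entropy. The helper names below (`ordinal_targets`, `heterogeneous_loss`, and so on) are hypothetical.

```python
import math


def ordinal_targets(bin_index, n_bins):
    """Cumulative encoding: t_k = 1 if bin_index > k, for k = 0..n_bins-2."""
    return [1.0 if bin_index > k else 0.0 for k in range(n_bins - 1)]


def ordinal_bce(logits, bin_index):
    """Mean binary cross-entropy over the K-1 cumulative thresholds (numerical features)."""
    targets = ordinal_targets(bin_index, len(logits) + 1)
    loss = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable BCE with logits: max(z, 0) - z*t + log(1 + exp(-|z|)).
        loss += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return loss / len(logits)


def categorical_ce(logits, label):
    """Softmax cross-entropy via log-sum-exp (categorical features)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def decode_ordinal(logits):
    """Predicted bin = number of thresholds whose logit is positive."""
    return sum(1 for z in logits if z > 0)


def heterogeneous_loss(outputs, targets, feature_types):
    """Average per-feature loss, dispatching on the declared feature type."""
    total = 0.0
    for out, tgt, kind in zip(outputs, targets, feature_types):
        total += ordinal_bce(out, tgt) if kind == "numerical" else categorical_ce(out, tgt)
    return total / len(feature_types)
```

With this encoding, predicting bin 3 when the true bin is 2 costs less than predicting bin 0. Flat cross-entropy over bin indices would treat both mistakes alike, because it ignores the order of the bins.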

We show that adaptive discretization yields stronger representations across diverse public medical tabular datasets in both linear probing and fine-tuning evaluations. We also establish a unified benchmark for reproducible medical tabular self-supervised learning.


