HSFM: Hard-Set-Guided Feature-Space Meta-Learning for Robust Classification under Spurious Correlations

The University of Melbourne, Melbourne, Australia
ECCV 2026

Abstract

Deep neural networks often rely on spurious features to make predictions, which makes them brittle under distribution shift and on samples where the spurious correlation does not hold (e.g., minority-group examples). Recent studies have shown that, even in such settings, the feature extractor of a model trained with Empirical Risk Minimization (ERM) can learn rich and informative representations, and that much of the failure may be attributed to the classifier head. In particular, retraining a lightweight head while keeping the backbone frozen can substantially improve performance on shifted distributions and minority groups. Motivated by this observation, we propose a bilevel meta-learning method that performs augmentation directly in feature space to make the classifier head more robust to spurious correlations. Our method learns edits to the support-set features such that, after a small number of inner-loop updates on the edited features, the classifier achieves lower loss on hard examples and better worst-group performance. Because it operates on the backbone output rather than in pixel space or through end-to-end optimization, the method is efficient and stable, requiring only a few minutes of training on a single GPU. We further validate our method with CLIP-based visualizations, which show that the learned feature-space updates induce semantically meaningful shifts aligned with spurious attributes.

Method

Figure 1. Overview of HSFM. (a) Initial training samples are passed through a frozen ERM feature extractor to obtain support embeddings. The linear head is adapted on the support set through the inner loss, and the outer loss is computed on the hard set. (b) Flowchart of the training procedure.
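
The inner/outer structure described in Figure 1 can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes a bias-free binary logistic head, a single inner gradient step, a fixed head initialisation, plain gradient descent on additive edits `delta`, and a toy 2-D feature space whose second coordinate stands in for the spurious attribute. All names (`inner_step`, `outer_grad`, `run_demo`, `alpha`, `lr`) and all data are hypothetical. The outer gradient is derived by hand for this special case; a practical implementation would more likely rely on an autodiff framework.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bce_loss(w, feats, labels):
    """Mean binary cross-entropy of a bias-free linear head (assumed form)."""
    total = 0.0
    for x, y in zip(feats, labels):
        p = sigmoid(dot(w, x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(feats)


def bce_grad(w, feats, labels):
    """Gradient of bce_loss with respect to the head weights w."""
    grad = [0.0] * len(w)
    for x, y in zip(feats, labels):
        residual = sigmoid(dot(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj / len(feats)
    return grad


def apply_edits(support, delta):
    """Add one edit vector to each support embedding."""
    return [[z + d for z, d in zip(zi, di)] for zi, di in zip(support, delta)]


def inner_step(w, feats, labels, alpha):
    """One inner-loop gradient step of the head on the (edited) support set."""
    return [wj - alpha * gj for wj, gj in zip(w, bce_grad(w, feats, labels))]


def outer_loss(delta, w, support, s_labels, hard, h_labels, alpha):
    """Hard-set loss of the head after adapting on the edited support set."""
    w_adapted = inner_step(w, apply_edits(support, delta), s_labels, alpha)
    return bce_loss(w_adapted, hard, h_labels)


def outer_grad(delta, w, support, s_labels, hard, h_labels, alpha):
    """Gradient of outer_loss w.r.t. delta, differentiated through inner_step."""
    edited = apply_edits(support, delta)
    w_adapted = inner_step(w, edited, s_labels, alpha)
    g = bce_grad(w_adapted, hard, h_labels)  # dL_out / dw_adapted
    scale = -alpha / len(edited)
    grads = []
    for x, y in zip(edited, s_labels):
        p = sigmoid(dot(w, x))
        curvature = dot(g, x) * p * (1.0 - p)
        grads.append([scale * ((p - y) * gj + curvature * wj)
                      for gj, wj in zip(g, w)])
    return grads


def run_demo(steps=50, alpha=1.0, lr=0.5):
    # Toy features: coordinate 0 is the core cue, coordinate 1 the spurious cue.
    support = [[1.0, 1.0], [0.8, 1.2], [-1.0, -1.0], [-0.9, -1.1]]
    s_labels = [1, 1, 0, 0]
    hard = [[1.0, -1.0], [-1.0, 1.0]]  # spurious cue contradicts the label
    h_labels = [1, 0]
    w = [0.1, -0.1]  # fixed head initialisation (assumption)
    delta = [[0.0, 0.0] for _ in support]
    args = (w, support, s_labels, hard, h_labels, alpha)
    loss_before = outer_loss(delta, *args)
    for _ in range(steps):
        grads = outer_grad(delta, *args)
        delta = [[d - lr * gd for d, gd in zip(di, gi)]
                 for di, gi in zip(delta, grads)]
    return loss_before, outer_loss(delta, *args), delta


if __name__ == "__main__":
    before, after, _ = run_demo()
    print(f"hard-set loss before edits: {before:.4f}, after: {after:.4f}")
```

In this toy setting the only trainable quantity is `delta`; whether the head initialisation is also meta-learned, and how many inner steps are used, are details the overview above does not specify.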

Main Results

Worst-group and average accuracy on spurious-correlation benchmarks with a ResNet-50 backbone. Best overall in bold; best without group labels during training underlined.

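| Method | Waterbirds Worst | Waterbirds Avg | CelebA Worst | CelebA Avg | MetaShift Worst | MetaShift Avg | Dominoes Worst | Dominoes Avg |
|---|---|---|---|---|---|---|---|---|
| Base (ERM) | 74.6 | 90.21 | 30.6 | 95.83 | 64.1 | 75.65 | 78.6 | 88.83 |
| DFR | 92.3±0.2 | 93.3±0.5 | 88.3±1.1 | 91.3±0.3 | 72.8±0.6 | 77.5±0.6 | 90.0±0.4 | 92.3±0.2 |
| DaC | 92.3±0.4 | 95.3±0.4 | 81.9±0.7 | 91.4±1.1 | 78.3±1.6 | 79.3±0.1 | 89.2±0.1 | 92.2±0.3 |
| DDB | 93.0±0.1 | 93.6±0.1 | 85.8±1.4 | 87.3±0.7 | 81.2±0.2 | 81.3±0.2 | -- | -- |
| HSFM (ours) | 93.1±0.1 | 94.0±0.5 | 89.2±0.2 | 90.6±0.5 | 77.2±0.1 | 77.4±0.0 | 90.4±0.3 | 92.9±0.2 |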
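
For reference, the two reported metrics can be computed as in the sketch below. It assumes each test sample carries a group identifier (for example a (label, spurious attribute) pair) and that "average" means the unweighted mean over test samples. Some benchmarks instead report a group-weighted average, and the table does not say which convention applies. The function names and the toy data are illustrative only.

```python
from collections import defaultdict


def group_accuracies(preds, labels, groups):
    """Return accuracy per group as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_and_average(preds, labels, groups):
    """Return (worst-group accuracy, unweighted sample-average accuracy)."""
    per_group = group_accuracies(preds, labels, groups)
    worst = min(per_group.values())
    average = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
    return worst, average


if __name__ == "__main__":
    # Hypothetical groups: (label, spurious attribute) pairs.
    preds = [1, 1, 0, 0, 1, 0]
    labels = [1, 1, 0, 1, 1, 0]
    groups = [(1, 0), (1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
    worst, average = worst_and_average(preds, labels, groups)
    print(f"worst-group: {worst:.3f}, average: {average:.3f}")
```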

unCLIP Visualization

Figure 2. Original images, Stable Diffusion (SD) unCLIP outputs generated from the initial embeddings, and outputs generated from the optimized embeddings. The feature-space edits align with the spurious attributes (background on Waterbirds; gender on CelebA).
