Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs

Ben-Gurion University of the Negev

Figure: The same table serialized into the six encoding formats used in Tab-MIA.

Abstract

Tab-MIA introduces the first dedicated benchmark for measuring how susceptible large language models (LLMs) are to membership inference attacks (MIAs) when they are trained on tabular data. The benchmark bundles five real-world table collections, ranging from census records to Wikipedia QA tables, and serializes each one into six popular text-based formats (JSON, HTML, Markdown, Key-Value, Key-is-Value, and Line-Separated).

Using Tab-MIA, we evaluate three state-of-the-art black-box MIAs (LOSS, Min-K%, and Min-K%++) against four open-weight LLMs (LLaMA-3.1 8B, LLaMA-3.2 3B, Mistral 7B, and Gemma-3 4B). Even after as few as three epochs of fine-tuning, attacks achieve AUROC scores above 90% on most short-context datasets, revealing severe privacy risks. We further show that encoding choices matter: flat, row-oriented serializations (Line-Separated, Key-Value) amplify memorization, whereas tag-heavy encodings such as HTML dilute it.

Tab-MIA is released on HuggingFace together with evaluation scripts to catalyze future research on privacy-preserving training and defenses for tabular data in LLMs.

Benchmark Overview

Each dataset in Tab‑MIA undergoes a three‑step pipeline: (i) deduplication and filtering, (ii) optional row‑chunking for long tables, and (iii) six‑fold serialization. The result is a set of JSONL files whose lines correspond to individual tables (or table chunks), making it simple to stream data during fine‑tuning.
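As a concrete illustration of step (iii), the sketch below renders a single record in each of the six formats. The exact templates (delimiters, tags, and row handling) are assumptions made for illustration; the benchmark's released preprocessing scripts define the precise formatting.

# Minimal sketch of the six Tab-MIA text serializations for one record.
# The exact delimiters and templates are assumptions, not the benchmark's code.
import json

def serialize(row: dict, fmt: str) -> str:
    if fmt == "json":
        return json.dumps(row)
    if fmt == "html":
        head = "".join(f"<th>{k}</th>" for k in row)
        body = "".join(f"<td>{v}</td>" for v in row.values())
        return f"<table><tr>{head}</tr><tr>{body}</tr></table>"
    if fmt == "markdown":
        keys = list(row)
        return "\n".join([
            "| " + " | ".join(keys) + " |",
            "|" + "---|" * len(keys),
            "| " + " | ".join(str(row[k]) for k in keys) + " |",
        ])
    if fmt == "key_value":
        return ", ".join(f"{k}: {v}" for k, v in row.items())
    if fmt == "key_is_value":
        return ", ".join(f"{k} is {v}" for k, v in row.items())
    if fmt == "line_separated":
        return "\n".join(f"{k}: {v}" for k, v in row.items())
    raise ValueError(f"unknown format: {fmt}")

record = {"age": 39, "workclass": "State-gov", "education": "Bachelors"}
for fmt in ("json", "html", "markdown", "key_value", "key_is_value", "line_separated"):
    print(f"--- {fmt} ---\n{serialize(record, fmt)}\n")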

The table below summarizes the included collections.

Datasets in Tab-MIA

Collection           Domain             Context  # Records  # Features
WikiTableQuestions   Wikipedia          Short    1,290      ≥ 5
WikiSQL              Wikipedia          Short    17,900     ≥ 5
TabFact              Wikipedia          Short    13,100     ≥ 5
Adult (Census)       Income Prediction  Long     2,440      15
California Housing   Housing Prices     Long     1,030      10
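Since each split ships as a JSONL file, records can be streamed with standard tooling. Below is a minimal sketch using the Hugging Face datasets library; the file name is a hypothetical placeholder, not a path shipped with the benchmark.

# Stream records from a Tab-MIA JSONL split with the `datasets` library.
# "wtq_line_separated.jsonl" is a hypothetical file name used for illustration.
from datasets import load_dataset

ds = load_dataset("json", data_files="wtq_line_separated.jsonl", split="train")
for record in ds.select(range(3)):  # peek at the first three serialized tables
    print(record)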

Effect of Encoding Format

AUROC (%) on the WTQ (WikiTableQuestions) dataset for the Min-K%++ attack (k = 20%) across encoding formats and models. Mistral 7B attains the highest score in every row.

Encoding         Llama-3.2 3B  Mistral 7B  Gemma-3 4B
Markdown         85.3          94.2        86.7
JSON             79.8          82.7        79.2
HTML             79.7          92.9        83.3
Key-Value Pair   83.5          94.9        85.5
Key-is-Value     83.7          89.7        85.0
Line-Separated   89.7          97.7        89.6
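For context on how these scores are produced, the sketch below computes a plain Min-K% membership score with k = 20% under a causal LM: average the log-probabilities of the 20% least-likely tokens, where a higher average suggests the text was seen during training. Min-K%++ further normalizes each token's log-probability by the mean and standard deviation of the model's distribution at that position. The checkpoint name is only an example; the benchmark's evaluation scripts may differ.

# Minimal Min-K% scoring sketch (k = 0.2). Higher score => likelier member.
# The checkpoint name below is an example, not the benchmark's fixed choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_score(text: str, model, tokenizer, k: float = 0.2) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: [1, seq_len, vocab]
    # Log-probability assigned to each actual next token.
    logprobs = torch.log_softmax(logits[0, :-1].float(), dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Average the k% lowest token log-probabilities.
    n = max(1, int(k * token_lp.numel()))
    return token_lp.topk(n, largest=False).values.mean().item()

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # example
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
print(min_k_score("age: 39, workclass: State-gov, education: Bachelors", model, tokenizer))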

Effect of Fine-Tuning Epochs

AUROC (%) vs. number of fine-tuning epochs across datasets (Line-Separated encoding, Min-K%++ attack, k = 20%).

Model          # Epochs  Adult  California  WTQ   WikiSQL  TabFact
LLaMA-3.1 8B   1         55.1   59.0        61.6  64.5     64.9
               2         60.0   72.8        80.8  78.6     79.6
               3         71.1   87.8        93.6  88.9     89.9
LLaMA-3.2 3B   1         54.1   57.7        57.6  61.5     61.5
               2         58.0   66.8        74.8  73.6     73.4
               3         64.4   77.2        89.7  83.2     80.4
Mistral 7B     1         54.6   57.8        69.7  67.5     68.5
               2         58.9   70.3        88.4  80.0     81.2
               3         71.5   86.8        97.7  87.8     89.9
Gemma-3 4B     1         53.9   54.3        59.3  62.6     63.3
               2         58.9   62.5        77.0  76.6     77.9
               3         67.7   73.8        89.6  86.1     87.4
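The AUROC values above quantify how well the attack score separates fine-tuning members from held-out non-members. A toy computation with scikit-learn is shown below; the score arrays are made-up values for illustration, not results from the paper.

# AUROC of an attack score: members labeled 1, non-members labeled 0.
# Scores below are toy values, not results from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

member_scores = np.array([-0.8, -0.5, -1.0, -0.6])     # e.g. Min-K% scores on fine-tuning tables
nonmember_scores = np.array([-2.1, -1.9, -1.4, -2.5])  # scores on unseen tables
labels = np.r_[np.ones(len(member_scores)), np.zeros(len(nonmember_scores))]
print(f"AUROC: {roc_auc_score(labels, np.r_[member_scores, nonmember_scores]):.3f}")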

BibTeX

@misc{german2025tabmiabenchmarkdatasetmembership,
  title        = {Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs},
  author       = {Eyal German and Sagiv Antebi and Daniel Samira and Asaf Shabtai and Yuval Elovici},
  year         = {2025},
  eprint       = {2507.17259},
  archivePrefix = {arXiv},
  primaryClass = {cs.CR},
  url          = {https://arxiv.org/abs/2507.17259}
}