Membership Inference Under Differential Privacy: Quantifying How DP-SGD Prevents Privacy Leakage
Introduction
Machine learning models can inadvertently memorize training data, making them vulnerable to membership inference attacks (MIAs) [shokri2017membership]. In a membership inference attack, an adversary determines whether a specific data point was used to train a model, a direct violation of data privacy.
Differential privacy (DP) provides a principled defense. DP-SGD [abadi2016deep] modifies stochastic gradient descent by clipping per-sample gradients and adding calibrated Gaussian noise, bounding the influence of any individual training sample. The privacy guarantee is parameterized by (ε, δ): smaller ε means stronger privacy.
While the theory guarantees bounded information leakage, the practical effectiveness of DP-SGD against membership inference attacks—and the associated utility cost—is less well-characterized. In this work, we provide a controlled empirical study quantifying the privacy-utility-leakage triad across four privacy levels.
Method
Experimental Setup
Data. We use synthetic Gaussian cluster classification data: 500 samples, 10 features, 5 classes, with cluster standard deviation 2.5 and center spread 2.0. Each dataset is split 50/50 into members (training set) and non-members (holdout).
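For concreteness, the data generation described above can be sketched as follows. This is a minimal illustration under the stated parameters; the function name `gaussian_clusters` and the exact sampling scheme are assumptions, and the repository's `src/data.py` may differ in detail:

```python
import numpy as np

def gaussian_clusters(n=500, d=10, k=5, center_spread=2.0,
                      cluster_std=2.5, seed=42):
    # Draw k class centers, then sample each point around its class center.
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, center_spread, size=(k, d))
    y = rng.integers(0, k, size=n)
    X = centers[y] + rng.normal(0.0, cluster_std, size=(n, d))
    # 50/50 split: first half are members (training set), second half holdout.
    return (X[:n // 2], y[:n // 2]), (X[n // 2:], y[n // 2:])
```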
Target model. 2-layer MLP with 128 hidden units and ReLU activation, trained for 80 epochs with SGD (lr=0.1, batch size 32). The large model and many epochs are chosen to induce overfitting, which creates the generalization gap that membership inference exploits.
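A matching model definition, assuming the architecture exactly as stated (one hidden ReLU layer of width 128; the class name `MLP` and parameter names follow the repository's extension notes, but this sketch is not its verbatim code):

```python
import torch.nn as nn

class MLP(nn.Module):
    # 2-layer MLP: linear -> ReLU -> linear output head.
    def __init__(self, input_dim=10, hidden_dim=128, num_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes))

    def forward(self, x):
        return self.net(x)
```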
Privacy levels. We test four DP-SGD configurations with clipping norm C = 1.0:

| Level | σ | ε (approx.) | Description |
|---|---|---|---|
| Non-private | 0.0 | ∞ | Standard SGD |
| Weak DP | 0.5 | 53 | Minimal noise |
| Moderate DP | 2.0 | 9 | Moderate noise |
| Strong DP | 5.0 | 3 | Heavy noise |
DP-SGD Implementation
We implement DP-SGD from scratch (no Opacus) following [abadi2016deep]:
- Per-sample gradients via `torch.func.vmap` applied to `torch.func.grad`.
- Per-sample clipping: each gradient is clipped to L2 norm at most C = 1.0.
- Noise injection: Gaussian noise with standard deviation σC added to the sum of clipped gradients.
- Privacy accounting: simplified Rényi DP composition with conversion to (ε, δ)-DP.
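The first three steps can be sketched end to end with `torch.func`. This is a minimal illustration under the stated hyperparameters (lr = 0.1, C = 1.0), not the repository's exact `dp_sgd_step`:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call, grad, vmap

model = torch.nn.Sequential(torch.nn.Linear(10, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 5))
params = {k: v.detach() for k, v in model.named_parameters()}

def loss_fn(p, x, y):
    logits = functional_call(model, p, (x.unsqueeze(0),))
    return F.cross_entropy(logits, y.unsqueeze(0))

# vmap over grad yields one gradient dict per sample in the batch.
per_sample_grad = vmap(grad(loss_fn), in_dims=(None, 0, 0))

def dp_sgd_step(params, x, y, lr=0.1, C=1.0, sigma=2.0):
    grads = per_sample_grad(params, x, y)          # leaf shape: [batch, *param]
    batch = x.shape[0]
    # Per-sample L2 norm across all parameters, then clip factor min(1, C/norm).
    flat = torch.cat([g.reshape(batch, -1) for g in grads.values()], dim=1)
    scale = (C / (flat.norm(dim=1) + 1e-6)).clamp(max=1.0)
    new_params = {}
    for k, g in grads.items():
        clipped = (g * scale.view(-1, *([1] * (g.dim() - 1)))).sum(dim=0)
        noisy = clipped + sigma * C * torch.randn_like(params[k])
        new_params[k] = params[k] - lr * noisy / batch
    return new_params
```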
Membership Inference Attack
We implement the shadow model attack of [shokri2017membership]:
- Train 3 shadow models per configuration, each on a fresh random dataset with known member/non-member splits and the same DP training config as the target.
- For each sample, extract attack features from the model: softmax probability vector, maximum confidence, prediction entropy, cross-entropy loss on the true label, and correctness indicator.
- Train a binary neural network attack classifier to distinguish members (label 1) from non-members (label 0) based on these features.
- Apply the attack classifier to the target model's outputs.
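The feature extraction in the second step can be sketched as below. The helper name `attack_features` is hypothetical, and the repository's `src/attack.py` may order or normalize features differently:

```python
import torch
import torch.nn.functional as F

def attack_features(model, x, y):
    # One feature row per sample: softmax vector, max confidence,
    # prediction entropy, per-sample cross-entropy loss, correctness.
    with torch.no_grad():
        logits = model(x)
        probs = F.softmax(logits, dim=1)
        conf, pred = probs.max(dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        loss = F.cross_entropy(logits, y, reduction="none")
        correct = (pred == y).float()
    return torch.cat([probs, conf[:, None], entropy[:, None],
                      loss[:, None], correct[:, None]], dim=1)
```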
Evaluation
We report attack AUC (area under the ROC curve) and attack accuracy. AUC = 0.5 corresponds to random guessing (no information leakage). We run 3 seeds per configuration and report mean ± standard deviation.
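Attack AUC has a direct probabilistic reading: it is the probability that a randomly chosen member receives a higher attack score than a randomly chosen non-member. A small sketch of this pairwise (Mann-Whitney) computation, fine at this dataset size though O(n²):

```python
import numpy as np

def attack_auc(member_scores, nonmember_scores):
    # AUC = P(member score > non-member score) + 0.5 * P(tie).
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())
```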
Results
Membership inference results across privacy levels (mean ± std over 3 seeds).
| Privacy Level | σ | ε | Test Acc. | Attack AUC |
|---|---|---|---|---|
| Non-private | 0.0 | ∞ | 0.792 ± 0.116 | 0.664 ± 0.060 |
| Weak DP | 0.5 | 53.5 | 0.849 ± 0.085 | 0.532 ± 0.019 |
| Moderate DP | 2.0 | 9.4 | 0.805 ± 0.091 | 0.541 ± 0.010 |
| Strong DP | 5.0 | 3.4 | 0.709 ± 0.118 | 0.518 ± 0.004 |
Key findings:
Non-private models are vulnerable. Without DP, the attack achieves AUC = 0.664, well above random (0.5). The model's overfitting (generalization gap) leaks membership information through its confidence patterns.
DP-SGD effectively mitigates the attack. Even weak DP (σ = 0.5, ε ≈ 53) dramatically reduces attack AUC from 0.664 to 0.532. Strong DP (σ = 5.0, ε ≈ 3.4) further reduces it to 0.518, near random guessing.
Privacy-utility trade-off. Strong DP reduces test accuracy from 79.2% to 70.9% (an 8.3 percentage point drop). This quantifies the cost of privacy protection.
Overfitting drives vulnerability. The generalization gap (train accuracy minus test accuracy) strongly correlates with attack success, consistent with the intuition that membership inference exploits memorization.
Discussion
Our results confirm the theoretical prediction that DP-SGD bounds membership inference leakage. The mechanism is twofold: (1) noise injection prevents the model from memorizing individual samples, reducing the generalization gap; (2) gradient clipping bounds the sensitivity of the training algorithm to any single sample.
The strong practical effectiveness even at moderate privacy levels (even the weakest DP setting already reduces AUC substantially) suggests that DP-SGD provides meaningful privacy protection at reasonable utility cost.
Limitations. Our experiments use synthetic data and small models. Real-world datasets with richer structure may show different privacy-utility trade-offs. Our simplified privacy accounting provides upper-bound estimates; tighter accounting (e.g., PLD or Gaussian DP) would yield smaller ε values for the same noise levels.
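For concreteness, a simplified Gaussian-mechanism RDP accountant can be sketched as below. This illustrates only the composition-and-conversion pattern; it ignores subsampling amplification, so it will not reproduce the ε values reported above, and the body of `compute_epsilon` is an assumption rather than the repository's exact implementation:

```python
import math

def compute_epsilon(sigma, steps, delta=1e-5):
    # Gaussian mechanism: RDP at order alpha is alpha / (2 sigma^2) per step.
    # Compose linearly over steps, convert each order to (epsilon, delta)-DP,
    # and take the smallest epsilon over a grid of integer orders.
    return min(steps * a / (2 * sigma ** 2) + math.log(1 / delta) / (a - 1)
               for a in range(2, 64))
```

More noise (larger σ) or fewer steps yields a smaller ε, matching the trend in the privacy-levels table.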
Reproducibility
All experiments are reproducible via the accompanying SKILL.md. The DP-SGD implementation uses no external DP libraries. Seeds are fixed at [42, 123, 456]. Dependencies are pinned: PyTorch 2.6.0, NumPy 2.2.4. In our CPU-only verification runs, the metric outputs were stable across reruns while wall-clock runtime varied between roughly 30 and 35 seconds.
References
[abadi2016deep] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308--318, 2016.
[shokri2017membership] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3--18, 2017.
[mironov2017renyi] I. Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263--275, 2017.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# Skill: Membership Inference Under Differential Privacy

Reproduce an experiment showing that DP-SGD empirically reduces membership inference attack success in this controlled setting. Train 2-layer MLPs on synthetic Gaussian cluster data with four privacy levels (non-private, weak/moderate/strong DP), then run shadow-model membership inference attacks (Shokri et al. 2017) against each. Measure attack AUC, model utility, and the privacy-utility-leakage triad.

**Key finding:** On the verified March 28, 2026 runs, DP-SGD with strong privacy (sigma=5.0, epsilon~3.4) reduces membership inference AUC from 0.664 to 0.518 (near random guessing at 0.5), a reduction of 0.146.

## Prerequisites

- Python 3.11+ with `pip`
- ~500 MB disk (PyTorch CPU)
- CPU only; no GPU required
- No API keys or authentication needed
- Runtime: about 35 seconds wall-clock on a modern laptop CPU; budget up to 1 minute on slower machines

## Step 0: Get the Code

Clone the repository and navigate to the submission directory:

```bash
git clone https://github.com/davidydu/Claw4S.git
cd Claw4S/submissions/dp-membership/
```

All subsequent commands assume you are in this directory.

## Step 1: Set Up Virtual Environment

```bash
python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
```

**Expected output:** Successfully installed torch-2.6.0, numpy-2.2.4, scipy-1.15.2, matplotlib-3.10.1, pytest-8.3.5 (plus dependencies).

## Step 2: Run Unit Tests

```bash
.venv/bin/python -m pytest tests/ -v
```

**Expected output:** All 28 tests pass.

Key test groups:

- `test_data.py` (6 tests) — synthetic data generation, member/non-member split, reproducibility, no overlap
- `test_model.py` (3 tests) — MLP forward pass, shape checks, weight reproducibility
- `test_dp_sgd.py` (8 tests) — per-sample gradients, gradient clipping, noise injection, epsilon accounting
- `test_train.py` (3 tests) — standard + DP training, evaluation
- `test_attack.py` (6 tests) — attack features, classifier training, attack metrics
- `test_runtime.py` (2 tests) — script working-directory guard behavior

## Step 3: Run Full Experiment

```bash
.venv/bin/python run.py
```

This runs the complete experiment (about 35 seconds wall-clock on the verified CPU-only runs):

1. For each of 4 privacy levels x 3 seeds = 12 configurations:
   - Generate 500-sample synthetic classification data (10 features, 5 classes, Gaussian clusters)
   - Train target model (2-layer MLP, hidden=128, 80 epochs)
   - Train 3 shadow models with same DP config on fresh data
   - Extract attack features (softmax, confidence, entropy, loss, correctness)
   - Train attack classifier on shadow model features
   - Run membership inference attack against target model
2. Aggregate results and generate plots

**Expected output:**

```
[1/12] non-private (sigma=0.0), seed=42
  epsilon=inf, test_acc=0.768, attack_auc=0.687
...
[12/12] strong-dp (sigma=5.0), seed=456
  epsilon=3.38, test_acc=0.596, attack_auc=0.516
Results saved to results/results.json
Generated 3 plots in results/
========================================================================
MEMBERSHIP INFERENCE UNDER DIFFERENTIAL PRIVACY — RESULTS
========================================================================
Privacy Level   sigma  epsilon  Test Acc       Attack AUC     Attack Acc
non-private     0.0    inf      0.792+/-0.116  0.664+/-0.060  0.613+/-0.058
weak-dp         0.5    53.5     0.849+/-0.085  0.532+/-0.019  0.520+/-0.012
moderate-dp     2.0    9.4      0.805+/-0.091  0.541+/-0.010  0.529+/-0.009
strong-dp       5.0    3.4      0.709+/-0.118  0.518+/-0.004  0.521+/-0.017
========================================================================
```

**Generated files:**

- `results/results.json` — all per-trial and aggregated metrics
  - Includes reproducibility metadata: seeds, dataset shape, model/training hyperparameters, DP accounting parameters (`max_grad_norm`, `delta`)
- `results/summary.txt` — human-readable summary table
- `results/attack_auc_vs_privacy.png` — bar chart of attack AUC per privacy level
- `results/privacy_utility_leakage.png` — three-panel privacy-utility-leakage triad
- `results/generalization_gap_vs_attack.png` — overfitting correlates with leakage

## Step 4: Validate Results

```bash
.venv/bin/python validate.py
```

**Expected output:**

```
Privacy levels: 4
Seeds: 3
Total runs: 12 (expected 12)
Non-private attack AUC: 0.664
Strong-DP attack AUC: 0.518
AUC reduction: 0.146
DP epsilon means: weak=53.46, moderate=9.43, strong=3.38
Non-private test accuracy: 0.792
Plot exists: results/attack_auc_vs_privacy.png
Plot exists: results/privacy_utility_leakage.png
Plot exists: results/generalization_gap_vs_attack.png
Validation PASSED.
```

## Method Details

### DP-SGD (Abadi et al. 2016)

Implemented from scratch -- no Opacus or external DP library:

1. **Per-sample gradients** via `torch.func.vmap` + `torch.func.grad`
2. **Per-sample gradient clipping** to L2 norm bound C=1.0
3. **Gaussian noise** with std = sigma * C added to aggregated gradients
4. **Privacy accounting** using simplified RDP (Renyi Differential Privacy) composition, converted to (epsilon, delta)-DP

### Membership Inference Attack (Shokri et al. 2017)

Shadow model approach with enriched features:

1. Train N=3 shadow models per config, each on fresh data with known member/non-member split
2. Extract rich attack features per sample: softmax vector, max confidence, prediction entropy, cross-entropy loss, correctness indicator
3. Train binary neural network attack classifier on shadow model features
4. Apply attack classifier to target model's outputs to infer membership

### Privacy Levels

| Level | sigma | Approx. epsilon | Observed Attack AUC |
|-------|-------|-----------------|---------------------|
| Non-private | 0.0 | inf | 0.664 +/- 0.060 (vulnerable) |
| Weak DP | 0.5 | ~53 | 0.532 +/- 0.019 |
| Moderate DP | 2.0 | ~9 | 0.541 +/- 0.010 |
| Strong DP | 5.0 | ~3 | 0.518 +/- 0.004 (near-random) |

## How to Extend

1. **Different architectures:** Replace `MLP` in `src/model.py` with CNNs/Transformers; update `input_dim`, `hidden_dim`, `num_classes` parameters
2. **Real datasets:** Modify `src/data.py` to load CIFAR-10, MNIST, or tabular datasets; adjust `generate_gaussian_clusters()` or add a new data loader
3. **More attack types:** Add loss-threshold or label-only attacks in `src/attack.py` alongside the shadow model approach
4. **Tighter privacy accounting:** Replace RDP in `compute_epsilon()` with Gaussian DP (GDP) or Privacy Loss Distribution (PLD) accounting for tighter epsilon estimates
5. **More privacy levels:** Add entries to `PRIVACY_LEVELS` list in `src/experiment.py`
6. **Different DP mechanisms:** Modify `dp_sgd_step()` in `src/dp_sgd.py` to test alternative clipping strategies (e.g., adaptive clipping) or noise mechanisms

## Limitations

- Synthetic data may not capture real-world distribution complexity
- Small model (2-layer MLP, 128 hidden units) -- larger models may show different DP-utility trade-offs
- Simplified RDP accounting gives upper-bound epsilon estimates; tighter accounting would yield smaller epsilon values
- Shadow model attack assumes attacker knows the model architecture and training procedure
- 3 seeds provides limited statistical power; production studies should use more seeds