Postdoctoral Researcher

Zeyu Ding

Dr. rer. nat. · Statistics & Machine Learning

TU Dortmund & Lamarr Institute for ML and AI

I develop scalable Bayesian methods and coreset theory for large-scale statistical inference, with applications spanning particle physics simulations and high-dimensional generative models.

Bayesian Statistics · Coreset Theory · Computational Statistics · Scalable Inference

About Me

I am a Postdoctoral Researcher at TU Dortmund University and the Lamarr Institute for Machine Learning and AI, one of Germany's leading AI research centers. I completed my PhD in Statistics in February 2026 with a dissertation on large-scale data reduction based on coresets.

My work sits at the intersection of theoretical statistics and practical scalability: I develop coreset methods that compress massive datasets while provably preserving the statistical properties needed for Bayesian inference. Beyond methodology, I apply these tools to challenging domains including particle physics (CERN/ATLAS) and multivariate generative models.
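The core idea behind these coreset methods can be sketched briefly: sample a small weighted subset of the data with probabilities proportional to per-point "sensitivity" scores, then reweight so the subset's log-likelihood is an unbiased estimate of the full-data log-likelihood. The sketch below is a hedged illustration only, not the construction from my papers — the sensitivity proxy (row norms), the function names, and the logistic model are all illustrative assumptions.

```python
import numpy as np

def sensitivity_coreset(X, y, m, sensitivities, rng=None):
    """Sample m points with probability proportional to (illustrative)
    sensitivity scores; inverse-probability weights keep the weighted
    log-likelihood an unbiased estimate of the full-data one."""
    rng = np.random.default_rng(rng)
    p = sensitivities / sensitivities.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return X[idx], y[idx], weights

def weighted_loglik(beta, X, y, w):
    """Weighted logistic log-likelihood: sum_i w_i * log p(y_i | x_i, beta)."""
    z = X @ beta
    return np.sum(w * (y * z - np.log1p(np.exp(z))))

# toy data: n = 10,000 points, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(10_000) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

# crude sensitivity proxy (illustrative only): per-row norm
s = np.linalg.norm(X, axis=1) + 1e-3
Xc, yc, w = sensitivity_coreset(X, y, m=500, sensitivities=s, rng=1)

full = weighted_loglik(beta_true, X, y, np.ones(len(X)))
core = weighted_loglik(beta_true, Xc, yc, w)
print(full, core)  # the 500-point coreset estimate tracks the full value
```

The point of real sensitivity bounds (as opposed to this proxy) is that they control the variance of this estimator uniformly over the parameter space, which is what yields provable approximation guarantees.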

I am currently exploring opportunities in quantitative research, applied statistics, and ML research roles in industry — particularly in pharmaceutical statistics, tech, and finance.

PhD in Statistics
TU Dortmund, Germany
2021 – 2025 · Dr. rer. nat.
Dissertation: Large-Scale Data Reduction Based on Coresets
Supervisors: Prof. K. Ickstadt & Dr. A. Munteanu
MSc in Quantitative Economics
Georg-August-Universität Göttingen, Germany
2017 – 2020
Thesis: Variable Importance Measures for Functional Gradient Descent Boosting
Young Statistician Award, 67th Biometric Colloquium
BSc in Quantitative Economics
Xi'an Jiaotong University, China
2011 – 2015

Publications

Journal Articles

ADAC 2024

Scalable Bayesian p-Generalized Probit and Logistic Regression

Zeyu Ding, Katja Ickstadt, Alexander Munteanu, Simon Omlor

Advances in Data Analysis and Classification, 2024

Bayesian · Classification · MCMC
Book Chapter 2023

Bayesian Analysis for Dimensionality and Complexity Reduction

Zeyu Ding, Katja Ickstadt, Alexander Munteanu

Machine Learning under Resource Constraints, De Gruyter, Berlin, 58–70

Bayesian · Dimensionality Reduction

Preprints & Under Review

Under Review

A Benchmark Suite for Monte Carlo Sampling Algorithms

Zeyu Ding et al.

Submitted 2024

Monte Carlo · Benchmarking · Julia

Conference Talks & Posters

CMStatistics 2024, KCL London · A Benchmark Suite for Monte Carlo Sampling Algorithms · Poster
KISS Workshop, Hamburg 2024 · Artificial Intelligence for Large-Scale Scientific Simulations · Talk
CMStatistics 2023, Berlin · Efficient Coreset Techniques for MCTMs · Talk
CMStatistics 2022, KCL London · Scalable Bayesian p-Generalized Probit and Logistic Regression via Coresets · Talk

Experience

2026 – Present

Postdoctoral Researcher

TU Dortmund & Lamarr Institute for Machine Learning and AI

Continuing research on scalable Bayesian methods, coreset theory, and applications to particle physics simulation in collaboration with CERN/ATLAS.

Coresets · Bayesian Inference · CERN/ATLAS
2021 – 2025

PhD Researcher & Teaching Assistant

TU Dortmund — SFB 876-C4 Research Group

Developed novel algorithms for large-scale Bayesian inference. Taught Monte Carlo Simulation, Bayesian Statistics, Statistical Learning & Big Data (100+ students per semester). Developed R package BayesPprobit (CRAN) and Julia package MCBench.

R / Python / Julia · MCMC · Package Dev · Teaching
Sep 2019 – Mar 2020

Quantitative Risk Management Intern

Daimler Mobility AG, Global Headquarters

Led the AI for Credit Scoring project, improving the global corporate rating model's ROC AUC from 77% to 86% using XGBoost, LightGBM, and Random Forest. Built automated reporting tools in VBA and SAS.

Credit Risk · XGBoost · SAS · VBA
May – Jul 2019

Risk Management Intern

China Construction Bank, Frankfurt Branch

Developed R scripts for automated VaR calculation, backtested PD and LGD models, and prepared sovereign risk & liquidity stress-testing reports.
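The internship scripts were in R, but the underlying historical-simulation VaR calculation is simple enough to sketch; the Python version below is an illustration with made-up toy data, and the function name is my own.

```python
import numpy as np

def historical_var(returns, alpha=0.99):
    """One-day historical-simulation Value-at-Risk: the loss level
    exceeded on only (1 - alpha) of past days, reported as a positive number."""
    losses = -np.asarray(returns)        # losses are negated returns
    return float(np.quantile(losses, alpha))

# toy P&L history: 500 daily returns (illustrative, not real data)
rng = np.random.default_rng(42)
returns = rng.normal(0.0005, 0.01, size=500)

var_99 = historical_var(returns, alpha=0.99)
print(f"99% one-day VaR: {var_99:.4f}")
```

Backtesting then amounts to counting the days on which realized losses exceeded the reported VaR and checking that the exceedance rate is consistent with 1 − α.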

VaR · R · Stress Testing

Open Source & Research Projects


BayesPprobit

R Package · CRAN

Scalable Bayesian estimation for p-generalized probit and logistic regression models via coreset-accelerated MCMC. Handles large-scale binary classification problems with theoretical guarantees.

MCBench

Julia Package

A comprehensive benchmark suite for Monte Carlo sampling algorithms. Provides standardized test distributions, convergence diagnostics, and performance metrics for comparing MCMC methods.
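MCBench itself is a Julia package; as a hedged Python illustration of the kind of diagnostic it standardizes, the sketch below runs a random-walk Metropolis sampler on a known test target (a standard normal) and estimates the effective sample size from the chain's autocorrelations. The truncation rule and function names here are my own simplifications, not the package's API.

```python
import numpy as np

def effective_sample_size(chain, max_lag=200):
    """Crude ESS estimate: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    rho_sum = 0.0
    for rho in acf[1:max_lag]:
        if rho < 0.05:                   # truncate at the first negligible lag
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# random-walk Metropolis targeting N(0, 1), a standard test distribution
rng = np.random.default_rng(0)
x, chain = 0.0, []
for _ in range(5_000):
    prop = x + rng.normal(0.0, 0.8)
    if np.log(rng.random()) < 0.5 * (x**2 - prop**2):  # log acceptance ratio
        x = prop
    chain.append(x)

ess = effective_sample_size(chain)
print(round(np.mean(chain), 2), int(ess))  # sample mean near 0; ESS well below 5000
```

Comparing samplers on targets with known moments makes such diagnostics checkable: the estimated mean, variance, and ESS can be scored against ground truth rather than eyeballed.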


AI for Physics — KISS Project

Collaborative Research · TU Dortmund + CERN

Developing Bayesian and Monte Carlo algorithms for rapid simulation of particle collision experiments using ATLAS data. Applying GANs, normalizing flows, and diffusion models.


Big Data for MCTM / Normalizing Flows

Research · TU Dortmund

Novel coreset compression algorithms for multivariate conditional transformation models. Theoretical proofs of approximation guarantees with empirical validation across simulation studies.

Technical Skills

Programming Languages

Python · R · Julia · SAS · SQL · VBA · PySpark · PyTorch · LaTeX

Bayesian & Statistical Methods

MCMC · Coreset Theory · Prior Design · Bayesian Model Selection · GLMs · Time Series (ARIMA/GARCH) · Score Matching · Langevin Dynamics

Machine Learning & Deep Learning

Gradient Boosting / XGBoost · Random Forests · Normalizing Flows · Diffusion Models · GANs · Bayesian Neural Networks · RNN / LSTM · Transformers

Tools & Infrastructure

Git · AWS · Hadoop / Spark · Docker

Languages

Chinese (Native) · German (C1) · English (Professional)

Let's Connect

Open to research collaborations, industry opportunities, and academic exchanges.