About Me
I am a Ph.D. candidate at TU Dortmund specializing in Bayesian statistics, machine learning, and data compression techniques for high-dimensional models. My research develops efficient statistical methods for analyzing and modeling complex, high-dimensional data, with applications in scientific computing and large-scale simulation. I am particularly interested in how Bayesian methods can advance generative modeling.
Education
- Ph.D. in Statistics, TU Dortmund, Germany (2021 - 2025)
- M.Sc. in Quantitative Economics, Georg-August-University Göttingen, Germany (2017 - 2020)
- B.Sc. in Quantitative Economics, Xi’an Jiaotong University, China (2011 - 2015)
Research Interests
- Bayesian Statistics and Machine Learning
- High-dimensional Data Compression and Approximation
- Generative Models: GANs, Diffusion Models, and Normalizing Flows
- Exploring Transformer Architectures for High-dimensional Generative Modeling
- Integrating Bayesian Methods with Deep Learning Frameworks
Projects
- Big Data for Advanced Classification Models (2021 - 2023)
  Developed scalable Bayesian algorithms for high-dimensional probit and logistic regression models.
- AI for Physics (KISS Project) (2023 - Present)
  Applied advanced Bayesian and Monte Carlo methods for particle physics simulation.
- Big Data for Copula Models (2023 - Present)
  Created efficient data compression algorithms for multivariate conditional transformation models.
Working Experience
- Scientific Researcher, The Lamarr Institute for Machine Learning and Artificial Intelligence, Germany (2024 - Present)
- Scientific Researcher, TU Dortmund, Germany (2021 - Present)
- Intern in Quantitative Risk Management, Daimler Mobility AG, Germany (2019 - 2020)
- Intern in Risk Management, China Construction Bank, Frankfurt Branch (2019)
Academic Activities
Publications
- Scalable Bayesian p-Generalized Probit and Logistic Regression
  Advances in Data Analysis and Classification, 2024
  Developed scalable Bayesian algorithms for high-dimensional classification problems.
- Bayesian Analysis for Dimensionality and Complexity Reduction
  Machine Learning under Resource Constraints, De Gruyter, Berlin, 2023
  Unified Bayesian approaches for dimensionality reduction in resource-constrained environments.
- Efficiency Coresets Techniques for Multivariate Conditional Transformation Models
  Submitted, 2024
  Proposed coreset methods for high-dimensional data compression in generative models.
- A Benchmark Suite for Monte Carlo Sampling Algorithms
  Submitted, 2024
  Developed new test metrics for Monte Carlo sampling, aimed at academic and non-academic users.
Talks
- A Benchmark Suite for Monte Carlo Sampling Algorithms
  18th International Conference on Computational and Methodological Statistics (CMStatistics), KCL, London, Dec. 2024
  Poster presentation
- Artificial Intelligence for Large-Scale Scientific Simulations
  KISS Project Workshop, University of Hamburg, Feb. 2024
  Explored AI techniques in high-energy physics simulations with CERN’s LHC data.
- Efficiency Coresets Techniques for Multivariate Conditional Transformation Models
  17th International Conference on Computational and Methodological Statistics (CMStatistics), Berlin, Dec. 2023
  Presented data compression techniques for multivariate conditional transformation models.
- Scalable Bayesian p-Generalized Probit and Logistic Regression via Coresets
  16th International Conference on Computational and Methodological Statistics (CMStatistics), KCL, London, Dec. 2022
  Discussed computational efficiency in Bayesian high-dimensional classification.
- 6th International Summer School 2022 on Machine Learning under Resource Constraints
  Poster, TU Dortmund, Sep. 2022
  Topics: Bayesian models and coreset approaches.
Ongoing Research
- Adaptive Sliced Maximum Mean Discrepancy with Generalized Kernels and Random Fourier Features
- Enhancing Score Matching with P-Normalized Kernels: Theory and Langevin Dynamics Implementation
- Regularization and Prior Choice for the Bayesian Generalized Probit Model
Skills
Programming Languages
- Advanced: Python, R
- Proficient: SAS, Julia, SQL
- Intermediate: VBA, PySpark, PyTorch
Statistical and Machine Learning Expertise
- Bayesian Methods: MCMC, Prior Design, Model Selection
- Machine Learning: Gradient Boosting, Random Forests, SVMs
- Deep Learning: Neural Networks, RNNs, LSTMs, Bayesian Neural Networks
- Statistical Modeling: GLMs, Time Series (ARIMA, GARCH, etc.)
Contact
- Email: zeyu.ding@tu-dortmund.de
- GitHub: zeyudsai
- LinkedIn: Zeyu Ding