About Me
I am a Ph.D. candidate at TU Dortmund specializing in Bayesian Statistics, Machine Learning, and Data Compression techniques for high-dimensional models. My research develops efficient statistical methods for analyzing and modeling complex, high-dimensional data, with applications in scientific computing and large-scale simulation. Building on my core expertise in Bayesian Statistics, I am particularly interested in its potential to advance generative modeling methods.
Education
- Ph.D. in Statistics, TU Dortmund, Germany (2021 - 2025)
 - M.Sc. in Quantitative Economics, Georg-August-University Göttingen, Germany (2017 - 2020)
 - B.Sc. in Quantitative Economics, Xi’an Jiaotong University, China (2011 - 2015)
 
Research Interests
- Bayesian Statistics and Machine Learning
 - High-dimensional Data Compression and Approximation
 - Generative Models: GANs, Diffusion Models, and Normalizing Flows
 - Exploring Transformer Architectures for High-dimensional Generative Modeling
 - Integrating Bayesian Methods with Deep Learning Frameworks
 
Projects
- Big Data for Advanced Classification Models (2021 - 2023)
  Developed scalable Bayesian algorithms for high-dimensional probit and logistic regression models.
- AI for Physics (KISS Project) (2023 - Present)
  Applied advanced Bayesian and Monte Carlo methods for particle physics simulation.
- Big Data for Copula Models (2023 - Present)
  Created efficient data compression algorithms for multivariate conditional transformation models.
Working Experience
- Scientific Researcher, The Lamarr Institute for Machine Learning and Artificial Intelligence, Germany (2024 - Present)
 - Scientific Researcher, TU Dortmund, Germany (2021 - Present)
 - Intern in Quantitative Risk Management, Daimler Mobility AG, Germany (2019 - 2020)
 - Intern in Risk Management, China Construction Bank, Frankfurt Branch (2019)
 
Academic Activities
Publications
- Scalable Bayesian p-Generalized Probit and Logistic Regression
  Advances in Data Analysis and Classification, 2024
  Developed scalable Bayesian algorithms for high-dimensional classification problems.
- Bayesian Analysis for Dimensionality and Complexity Reduction
  Machine Learning under Resource Constraints, De Gruyter, Berlin, 2023
  Unified Bayesian approaches for dimensionality reduction in resource-constrained environments.
- Efficiency Coresets Techniques for Multivariate Conditional Transformation Models
  Submitted, 2024
  Proposed innovative coreset methods for high-dimensional data compression in generative models.
- A Benchmark Suite for Monte Carlo Sampling Algorithms
  Submitted, 2024
  Developed new Monte Carlo sampling test metrics for academic and non-academic users.
Talks
- A Benchmark Suite for Monte Carlo Sampling Algorithms
  18th International Conference on Computational and Methodological Statistics (CMStatistics), KCL, London, Dec. 2024
  Poster presentation.
- Artificial Intelligence for Large-Scale Scientific Simulations
  KISS Project Workshop, University of Hamburg, Feb. 2024
  Explored AI techniques in high-energy physics simulations with CERN’s LHC data.
- Efficiency Coresets Techniques for Multivariate Conditional Transformation Models
  17th International Conference on Computational and Methodological Statistics (CMStatistics), Berlin, Dec. 2023
  Presented data compression techniques for multivariate conditional transformations.
- Scalable Bayesian p-Generalized Probit and Logistic Regression via Coresets
  16th International Conference on Computational and Methodological Statistics (CMStatistics), KCL, London, Dec. 2022
  Discussed computational efficiency in Bayesian high-dimensional classification.
- 6th International Summer School 2022 on Machine Learning under Resource Constraints
  Poster, TU Dortmund, Sep. 2022
  Topics: Bayesian models and coreset approaches.
Ongoing Research
- Adaptive Sliced Maximum Mean Discrepancy with Generalized Kernels and Random Fourier Features
- Enhancing Score Matching with P-Normalized Kernels: Theory and Langevin Dynamics Implementation
- Regularization and Prior Choice for the Bayesian Generalized Probit Model
 
Skills
Programming Languages
- Advanced: Python, R
 - Proficient: SAS, Julia, SQL
 - Intermediate: VBA, PySpark, PyTorch
 
Statistical and Machine Learning Expertise
- Bayesian Methods: MCMC, Prior Design, Model Selection
 - Machine Learning: Gradient Boosting, Random Forests, SVMs
 - Deep Learning: Neural Networks, RNNs, LSTMs, Bayesian Neural Networks
 - Statistical Modeling: GLMs, Time Series (ARIMA, GARCH, etc.)
 
Contact
- Email: zeyu.ding@tu-dortmund.de
 - GitHub: zeyudsai
 - LinkedIn: Zeyu Ding