About
I am an Assistant Professor in the Department of Statistics at George Mason University.
Prior to joining Mason, I obtained a PhD in Statistics in 2020 from the University of British Columbia, where I was advised by Dr. Gabriela Cohen Freue. Before that, I obtained a Master of Science in Statistics from the Vienna University of Technology under the supervision of Prof. Peter Filzmoser.
My research agenda comprises methodological and computational aspects of robust estimation in high-dimensional problems as well as their application to the biomedical sciences. I work on statistical methods that perform reliably in the presence of adverse contamination anywhere in the numerous features of the data.
For regression problems, for instance, I work on estimators that are resilient not only to outliers in the response but also to unusual values in the (potentially) explanatory variables. If not handled appropriately, unusual values in the explanatory variables can have a much more detrimental effect on the analysis than outliers in the response alone.
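The point about unusual values in the explanatory variables can be illustrated with a minimal sketch. It uses made-up data and ordinary least squares (a non-robust method), and all names in it (`ols_fit`, `slope_response`, `slope_leverage`) are hypothetical. It is a toy comparison under these assumptions, not a general result and not one of the estimators described above.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Clean toy data lying exactly on the line y = 1 + 2x.
xs_clean = [float(x) for x in range(10)]
ys_clean = [1.0 + 2.0 * x for x in xs_clean]
_, slope_clean = ols_fit(xs_clean, ys_clean)

# Case A: one outlier in the response (an interior observation shifted by 30).
ys_outlier = list(ys_clean)
ys_outlier[4] += 30.0
_, slope_response = ols_fit(xs_clean, ys_outlier)

# Case B: one unusual value in the explanatory variable
# (x moved from 9 to 50 while the response is left unchanged).
xs_leverage = list(xs_clean)
xs_leverage[9] = 50.0
_, slope_leverage = ols_fit(xs_leverage, ys_clean)

print("slope on clean data:          ", round(slope_clean, 3))
print("slope with response outlier:  ", round(slope_response, 3))
print("slope with unusual x value:   ", round(slope_leverage, 3))
```

In this toy setting the single unusual x value pulls the fitted slope much further from the true value of 2 than the single response outlier does.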
Selected projects
Improving phenological modeling via participatory science
Can we guide the wisdom of the crowds to solve complex scientific problems?
We aim to improve phenological modeling by sourcing models from citizen scientists. The starting point for the project was our First International Cherry Blossom Prediction Competition, where we asked participants to predict the peak bloom date of cherry trees in four locations: Washington, D.C.; Vancouver, BC; Kyoto, Japan; and Liestal-Weideli, Switzerland.
In the news
The 2021 edition of the prediction competition has been featured prominently in the news. For example, The Weather Network, CBC Radio, Public Radio's The World, the Vancouver Sun, and the Daily Hive reported on the competition and how it can inform future research.
Find out more about the competition
Robust estimation and variable selection in high-dimensional models
Outliers and contamination in data sets with many variables make common tasks like variable selection and parameter estimation daunting. We are working on statistical methods and computational strategies for estimation, variable selection, and hyper-parameter selection that leverage as much information from the data set as possible without being affected by misleading values.
Teaching
George Mason University
- STAT 463 – Exploratory Data Analysis: Fall 2023
- STAT 634 – Case Studies in Data Analysis: Spring 2022, Spring 2021
- STAT 665 – Categorical Data Analysis: Fall 2023, Fall 2021
- STAT 778 – Statistical Computing: Spring 2024, Spring 2023
University of British Columbia
- STAT 305 – Introduction to Statistical Inference: Spring 2020
Publications
- Auerbach, J., Crimmins, T. M., Kepplinger, D., Lin, R., & Wolkovich, E. M. (2024). A nonparametric Bayesian model of citizen science data for monitoring environments stressed by climate change. Submitted.
- Auerbach, J., Kepplinger, D., & Rios, N. (2024). What is data science? A closer look at science’s latest priority dispute. Real World Data Science. https://doi.org/10.5281/zenodo.10679962
- Chang, J. J., Kepplinger, D., Metter, E. J., Kim, Y., Trankiem, C. T., Felbaum, D. R., Mai, J. C., Mason, R. B., Armonda, R. A., & Aulisi, E. F. (2024). Time thresholds for using pressure reactivity index in neuroprognostication for patients with severe traumatic brain injury. Neurosurgery. https://doi.org/10.1227/neu.0000000000002876
- Tehrani, B. N., Sherwood, M. W., Damluji, A. A., Epps, K. C., Bakhshi, H., Cilia, L., Dassanayake, I., Eltebaney, M., Gattani, R., Howard, E., Kepplinger, D., Ofosu-Somuah, A., & Batchelor, W. B. (2024). A randomized comparison of radial artery intimal hyperplasia following distal vs. proximal transradial access for coronary angiography: PRESERVE RADIAL trial. Journal of the American Heart Association. https://doi.org/10.1161/JAHA.123.031504
- Vasas, V., Lowell, M. C., Villa, J., Jamison, Q. D., Siegle, A. G., Katta, P. K. R., Bhagavathula, P., Kevan, P. G., Fulton, D., Losin, N., Kepplinger, D., Yetzbacher, M. K., Salehian, S., Forkner, R. E., & Hanley, D. (2024). Recording animal-view videos of the natural world using a novel camera system and software package. PLOS Biology, 22(1), 1–31. https://doi.org/10.1371/journal.pbio.3002444
- Chang, J. J., Kepplinger, D., Metter, E. J., Felbaum, D. R., Mai, J. C., Armonda, R. A., & Aulisi, E. F. (2023). Pressure reactivity index for early neuroprognostication in poor-grade subarachnoid hemorrhage. Journal of the Neurological Sciences, 450, 120691. https://doi.org/10.1016/j.jns.2023.120691
- Kepplinger, D. (2023). Robust variable selection and estimation via adaptive elastic net S-estimators for linear regression. Computational Statistics & Data Analysis, 183. https://doi.org/10.1016/j.csda.2023.107730
- Leonard, J., Kepplinger, D., Espina, V., Gillevet, P., Ke, Y., Birukov, K. G., Doctor, A., & Hoemann, C. D. (2023). Whole blood coagulation in an ex vivo thrombus is sufficient to induce clot neutrophils to adopt a myeloid-derived suppressor cell signature and shed soluble lox-1. Journal of Thrombosis and Haemostasis. https://doi.org/10.1016/j.jtha.2023.12.014
- Kepplinger, D., & Cohen Freue, G. V. (2022). Robust prediction and protein selection with adaptive PENSE. In T. Burger (Ed.), Statistical analysis of proteomic data: Methods and tools. Springer US. https://doi.org/10.1007/978-1-0716-1967-4
- Cohen Freue, G. V., Kepplinger, D., Salibián-Barrera, M., & Smucler, E. (2019). Robust elastic net estimators for variable selection and identification of proteomic biomarkers. Annals of Applied Statistics, 13(4), 2065–2090.
- Kepplinger, D., Takhar, M., Sasaki, M., Hollander, Z., Smith, D., McManus, B., McMaster, W. R., Ng, R. T., & Cohen Freue, G. V. (2017). PGCA: An algorithm to link protein groups created from MS/MS data. PLOS ONE, 12(5). https://doi.org/10.1371/journal.pone.0177569
- Kepplinger, D., Filzmoser, P., & Varmuza, K. (2017). Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure. arXiv e-Prints.
- Kepplinger, D., Templ, M., & Upadhyaya, S. (2013). Analysis of energy intensity in manufacturing industry using mixed-effects models. Energy, 59, 754–763. http://doi.org/10.1016/j.energy.2013.07.003
A complete list of publications, conference presentations, and other research experience can be found in my CV.
Lab members
Yang Long
PhD Candidate at Mason
Yang Long has been a member of my lab since 2021. His primary focus is on improving the computation of robust regularized regression estimators by direct minimization of the non-convex objective function.
Siqi Wei
PhD Candidate at Mason
Siqi Wei joined my lab in 2021 and currently works on improving hyper-parameter selection for robust regularized regression estimators and general non-convex estimators.
Code
I maintain several stable R packages on CRAN and Bioconductor as well as a few experimental software tools available on my GitHub and GitLab pages.
pense
Implementation of penalized adaptive Elastic Net S/MM-Estimators of Regression
Robust penalized adaptive elastic net S- and MM-estimators for linear regression.
More info View on CRAN
examinr
Create online exams from R markdown documents.
Write online exams as R markdown documents and publish them as a Shiny app. Allows for randomized exams, different question types (including R coding questions), and grading of submissions.
More info
pyinit
Peña-Yohai Initial Estimator for Robust S-Regression
Fast and deterministic procedure to compute initial estimates for robust S-estimators of regression, as described in Peña and Yohai (1999).
View on GitHub View on CRAN
gaselect
Genetic algorithms for variable selection.
Multi-threaded genetic algorithms applicable to a wide range of variable selection methods, but particularly suited for Partial Least Squares regression.
View on CRAN
complmrob
Robust Linear Regression with Compositional Covariates
Methods for robustly fitting regression models where the explanatory variables are compositional. Includes bootstrap methods for classical robust regression and compositional robust regression.
View on CRAN
nsoptim
Algorithms for non-smooth optimization
C++ template library, wrapped in an R package, providing modern and fast algorithms for optimizing non-smooth functions (e.g., L1 regularized objective functions).
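To give a flavor of what optimizing an L1-regularized objective involves, here is a minimal sketch of proximal gradient descent (ISTA) for the lasso, written in plain Python. This is a generic textbook technique chosen for illustration; it does not use or reproduce the nsoptim API, and the names `soft_threshold` and `lasso_ista` as well as the toy data are assumptions.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1 via ISTA."""
    n, p = len(X), len(X[0])
    # The squared Frobenius norm of X divided by n is an upper bound on the
    # Lipschitz constant of the gradient of the smooth part of the objective.
    lipschitz = sum(v * v for row in X for v in row) / n
    step = 1.0 / lipschitz
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth (least-squares) part of the objective.
        resid = [sum(xij * bj for xij, bj in zip(row, beta)) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, resid)) / n
                for j in range(p)]
        # Gradient step followed by the proximal (soft-thresholding) step.
        beta = [soft_threshold(bj - step * gj, step * lam)
                for bj, gj in zip(beta, grad)]
    return beta

# Toy data: two orthogonal predictors; the response depends strongly on the
# first predictor and only weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]
beta_hat = lasso_ista(X, y, lam=0.5)
print(beta_hat)
```

With this orthogonal toy design the penalty shrinks the first coefficient and sets the second exactly to zero, which is the kink in the objective that makes the problem non-smooth.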
View on GitLab
PGCA
Link Protein Groups Created from MS/MS Data
Protein Group Code Algorithm (PGCA) is a computationally inexpensive algorithm to merge protein summaries from multiple experimental quantitative proteomics data sets.
View on Bioconductor