Research Overview

During my PhD at SMU, I studied the internal structure of subatomic particles such as the pion using parton distribution function (PDF) models. PDFs are data-driven models constrained by high-energy collider experiments such as the currently operating Large Hadron Collider (LHC), and they will be further informed by the upcoming Electron-Ion Collider (EIC). They describe how a particle's momentum is distributed among its constituent quarks and gluons, which makes them a cornerstone of modern particle physics research.
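
For readers outside the field, the momentum sum rule is a standard way to make "how momentum is distributed" concrete. Write f_i(x, Q) for the PDF of parton species i (quarks, antiquarks, and the gluon). Here x is the fraction of the parent particle's momentum carried by that parton and Q is the resolution scale. The momentum fractions of all partons must then add up to one:

```latex
\sum_{i \,\in\, \{q,\ \bar{q},\ g\}} \int_0^1 x\, f_i(x, Q)\, dx \;=\; 1
```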

I developed new modeling approaches for pion PDFs and analyzed how different experimental datasets affect the fitted results. The pion plays a central role in holding atomic nuclei together through the strong force, which makes it an important particle to study.

Although my work centers on structured, interpretable alternatives to neural networks, it draws on many of the same statistical foundations, including gradient descent optimization, chi-squared loss minimization, and model reliability techniques. I used methods such as model verification, bootstrap-based uncertainty estimation, and comparison across multiple parameterizations to assess model stability and robustness. This overlap carries over directly to modern machine learning workflows and tools, and it also let me train and evaluate several baseline models as part of my broader data science development.
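
As a minimal sketch of the shared foundations mentioned above, the example below minimizes a chi-squared loss by gradient descent and estimates parameter uncertainties with a nonparametric bootstrap. Resampling data points with replacement is one of several bootstrap variants. The function names, the straight-line toy model, and the data are hypothetical and chosen for illustration only; this is not code from my PDF analyses.

```python
import numpy as np


def chi2(p, A, y, sigma):
    """chi^2 = sum_k ((y_k - (A p)_k) / sigma_k)^2 for a model that is linear in p."""
    r = (y - A @ p) / sigma
    return float(r @ r)


def fit_gradient_descent(A, y, sigma, steps=1000):
    """Minimize chi^2 by plain gradient descent, starting from p = 0.

    For a linear model chi^2 is quadratic, so its Hessian H = 2 A^T W A with
    W = diag(1/sigma^2) is constant; a step size of 1/lambda_max(H) cannot diverge.
    """
    w = 1.0 / sigma**2
    H = 2.0 * A.T @ (w[:, None] * A)
    lr = 1.0 / np.linalg.eigvalsh(H).max()
    p = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = -2.0 * A.T @ (w * (y - A @ p))  # d(chi^2)/dp
        p = p - lr * grad
    return p


def bootstrap(A, y, sigma, n_replicas=100, seed=0):
    """Refit on datasets resampled with replacement.

    The spread of the refitted parameters estimates their uncertainty.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    fits = []
    for _ in range(n_replicas):
        idx = rng.integers(0, n, size=n)
        fits.append(fit_gradient_descent(A[idx], y[idx], sigma[idx]))
    fits = np.array(fits)
    return fits.mean(axis=0), fits.std(axis=0)


# Hypothetical toy data: the line y = 1 + 2x with Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 21)
sigma = np.full_like(x, 0.1)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)
A = np.column_stack([np.ones_like(x), x])  # design matrix for y = a + b*x

p_best = fit_gradient_descent(A, y, sigma)
p_mean, p_err = bootstrap(A, y, sigma)
print("best fit:", p_best, "chi2/dof:", chi2(p_best, A, y, sigma) / (len(y) - 2))
print("bootstrap uncertainty:", p_err)
```

Because this toy chi-squared is quadratic, a closed-form least-squares solve would give the same minimum. Gradient descent is used here only because it is the technique named above, and it carries over to models that are not linear in their parameters.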

You can find all of my published work on INSPIRE-HEP.

Fantômas4QCD

The Fantômas4QCD project produced a custom PDF modeling module built on Bézier curves, which serve as universal function approximators. This approach combines the transparency of a simple polynomial model with the flexibility of a neural network, allowing us to explore a much wider range of viable PDF shapes in QCD analyses.
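
To show why Bézier curves behave like flexible yet transparent function approximators, here is a generic sketch. It is not the Fantômas4QCD implementation. The sketch assumes control points placed at equally spaced abscissae k/n. With that layout, the curve is the graph of y(x) = sum_k c_k b_{k,n}(x), where b_{k,n} are Bernstein polynomials. The curve is linear in the control-point heights c_k, so fitting them is a single least-squares solve, and raising the degree adds flexibility. The helper names and the target shape are hypothetical.

```python
import numpy as np
from math import comb


def bernstein_basis(x, n):
    """Bernstein polynomials b_{k,n}(x) = C(n,k) x^k (1-x)^(n-k), k = 0..n.

    Returns an array of shape (len(x), n + 1); each row sums to 1.
    """
    x = np.asarray(x, dtype=float)[:, None]
    k = np.arange(n + 1)[None, :]
    binom = np.array([comb(n, j) for j in range(n + 1)], dtype=float)[None, :]
    return binom * x**k * (1.0 - x) ** (n - k)


def fit_bezier(x, y, degree):
    """Least-squares fit of the control-point heights c_k of y(x) = sum_k c_k b_{k,n}(x).

    The curve is linear in its control points, so fitting them is one linear solve.
    """
    coeffs, *_ = np.linalg.lstsq(bernstein_basis(x, degree), y, rcond=None)
    return coeffs


def eval_bezier(coeffs, x):
    """Evaluate the fitted curve at points x."""
    return bernstein_basis(x, len(coeffs) - 1) @ coeffs


# Hypothetical target shape on [0, 1] (illustrative only, not a fitted PDF).
x = np.linspace(0.0, 1.0, 200)
y = np.sqrt(x) * (1.0 - x) ** 3

rms_by_degree = {}
for degree in (2, 4, 8):
    c = fit_bezier(x, y, degree)
    rms_by_degree[degree] = float(np.sqrt(np.mean((eval_bezier(c, x) - y) ** 2)))
    print(f"degree {degree}: rms residual = {rms_by_degree[degree]:.2e}")
```

The Bernstein basis functions are non-negative and sum to one. Every point on the curve is therefore a weighted average of nearby control points, which is what keeps the fitted coefficients easy to read, unlike the weights of a neural network.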

L2 Sensitivity

To interpret theoretical predictions accurately, it is important to understand how individual datasets influence the fitted model. L2 sensitivity quantifies this for each dataset by calculating how much that dataset's chi-squared changes when a PDF (at a given momentum fraction and scale) is shifted by one standard deviation of its uncertainty.

This method highlights which datasets exert the most pull on the fit, helping to identify potential outliers, inconsistencies between experiments, or overreliance on specific data.
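
The sketch below shows one way to compute this quantity. It follows my understanding of the definition used in the CTEQ-TEA L2 sensitivity literature and should be treated as an assumption, not the exact analysis code. The inputs come from a Hessian PDF set with N eigenvector pairs. Gradients in eigenvector space are approximated by symmetric differences. The sensitivity is the projection of grad(chi2_E) onto the unit vector along grad(f), that is, S_E = grad(chi2_E) . grad(f) / |grad(f)|, where |grad(f)| is the symmetric Hessian uncertainty of f. All function names and numbers are hypothetical.

```python
import numpy as np


def l2_sensitivity(f_plus, f_minus, chi2_plus, chi2_minus):
    """L2 sensitivity of one dataset E to one PDF value f (fixed x and Q).

    Each argument is an array over the N Hessian eigenvector directions i:
      f_plus[i], f_minus[i]: PDF value in the +/- eigenvector members
      chi2_plus[i], chi2_minus[i]: chi^2 of dataset E computed with those members
    """
    grad_f = 0.5 * (np.asarray(f_plus, dtype=float) - np.asarray(f_minus, dtype=float))
    grad_chi2 = 0.5 * (np.asarray(chi2_plus, dtype=float) - np.asarray(chi2_minus, dtype=float))
    delta_f = np.linalg.norm(grad_f)  # symmetric Hessian uncertainty of f
    # Change in chi^2_E when f is raised by one standard deviation:
    # projection of grad(chi^2_E) onto the unit vector along grad(f).
    return float(grad_chi2 @ grad_f / delta_f)


# Hypothetical numbers for three eigenvector pairs and two datasets.
f_plus = [0.52, 0.50, 0.51]
f_minus = [0.48, 0.50, 0.49]
chi2_tables = {
    "dataset_A": ([101.0, 100.5, 100.0], [99.0, 100.5, 100.0]),
    "dataset_B": ([50.0, 52.0, 49.0], [50.0, 48.0, 51.0]),
}
sensitivities = {
    name: l2_sensitivity(f_plus, f_minus, cp, cm)
    for name, (cp, cm) in chi2_tables.items()
}
# Rank datasets by how strongly they pull on f (largest |S| first).
for name, s in sorted(sensitivities.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: S = {s:+.3f}")
```

Under this sign convention, a positive S means that raising f worsens the dataset's agreement, so that dataset prefers a smaller f. A negative S means the dataset prefers a larger f.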

Data Visualization

Scientific results are only valuable if they can be effectively communicated. Throughout my research, I’ve emphasized clear data visualization to convey key findings in presentations, papers, and collaborations.

Using tools like Mathematica, ManeParse, and CERN's ROOT, I've created visualizations that reveal model behavior, compare datasets, and highlight statistical significance in an accessible way.