
Education
- Research Interests:
- Causal Inference
- Data Science
- Statistical Machine Learning
B.A. in Mathematics, Texas Christian University (2019 - 2022)
Minors in Computer Science and Economics
Honors Thesis: Statistical Models for Reading Count Data
Experiences
Researcher, Boston University (January 2024 - present)
- Establishing a theoretical framework for consistency and asymptotic normality of multi-component intervention packages in a newly developed adaptive clinical trial process, using probability theory, statistical inference, and empirical processes
- Developing a mixed-effects model with center characteristics to address confounding by indication in the trial process
Statistical Consultant
- Managed several teams with a total of 20+ master’s-level students to facilitate statistical analysis for a wide range of research projects conducted by university researchers, averaging 6-7 completed projects every 4 months
- Developed statistical models using machine learning, causal inference, and predictive modeling in various fields, including healthcare, psychological, and social sciences research
- Analyzed data, proposed appropriate statistical methods, created informative data visualizations, and evaluated results to ensure the integrity and validity of research projects, specifically to mitigate risk and bias in experimentation and findings
- Developed 5 discrete/count data models with different regularization configurations and conducted simulations to measure reading accuracy for elementary school students using students’ reading ability data
- Published the research paper in the Journal of Applied Statistics in July 2022 (DOI: 10.1080/02664763.2022.2103101)
- Built SQL (HiveQL) scripts to transfer historical batches of type 2 slowly changing dimension tables from the conformed zone to the processed zone for business use in Policy, Product, and Transactions
- Assisted in developing automated scripts/jobs (Python, PySpark, Glue) to transfer data from an AWS S3 bucket to Redshift
- Collected URLs and certificates associated with applications to populate a cybersecurity platform, giving the cybersecurity team an easier tracking system during cyber-attack events
- Built a sorting algorithm in Excel to manage the report of employees taking training courses
- Built two predictive classification models in Python, for smoker propensity and liar detection, using user profile data to help reduce insurance fraud
- Preprocessed the data using SQL on Hive/Spark, performed exploratory data analysis using Matplotlib/Seaborn in Python, and developed a supervised machine learning model with XGBoost that achieved an AUC of 0.74 and a precision of 0.9
- Presented results to life executive leadership and recommended deployment options; deployment was planned for H2 2020
- Developed a Python application using Requests to scrape NOAA weather spotter and radar data for claims use cases
- Created study materials and plans for 6 to 10 student-athletes every semester to help them succeed in various subjects, including lower-division and upper-level mathematics from Pre-Calculus to Discrete Math and beginner to intermediate Economics
Boston University (September 2023 – Present)
Texas Christian University (August 2020 - May 2022)
New York Life Insurance (May 2021 - August 2021)
COUNTRY Financial (May 2020 - August 2020)
Texas Christian University (November 2019 - May 2022)
Publications
Minh Thu Bui, Cornelis J. Potgieter & Akihito Kamata (2022) Penalized likelihood methods for modeling count data, Journal of Applied Statistics, DOI: 10.1080/02664763.2022.2103101
Presentations & Talks
Minh Thu Bui. "Penalized Likelihood Methods for Modeling of Reading Count Data"
- Honors College Student Research Symposium (SRS), Texas Christian University, April 2022
- 16th Annual Texas Undergraduate Mathematics Conference (TUMC), Virtual, October 2021