David Kartchner

Researcher + Entrepreneur (ML + Biomedicine)

I research how to enable natural language processing (NLP) on new and dynamic problems by developing generative means of structuring data via large language models (LLMs) and knowledge graphs. I use these technologies to structure clinical data and biomedical research, enabling clinicians to curate structured data, customized to their needs, from any unstructured text.
I have collaborated with researchers, developers, and clinicians while working at Enveda Biosciences, Facebook, GSK, Recursion Pharmaceuticals, and Intermountain Healthcare.

Education

2018 - 2023
Ph.D. in Computational Science & Engineering
Georgia Institute of Technology, Atlanta, GA
Advisor: Cassie Mitchell
Thesis: Extracting and Structuring Information for Clinical Meta-Analysis and Drug Repurposing
Committee: Cassie Mitchell, Chao Zhang, Duen Horng "Polo" Chau, Jon Duke, Daniel Domingo-Fernández
2017 - 2018
M.S. in Mathematics
Brigham Young University, Provo, UT
Thesis: ActuarAI: Machine Learning Models for Patient Disease Forecasting and Representation
Committee: Jeffrey Humpherys, Tyler Jarvis, David Wingate
GPA: 4.00/4.00
2010 - 2016
B.S. in Applied & Computational Mathematics
Brigham Young University, Provo, UT
Thesis: Walking the Walk: An Exploratory Analysis in Biometric Gait Recognition
Magna Cum Laude, University Honors. Overall GPA: 3.96/4.00. Applied and Computational Mathematics Emphasis (ACME)

Industry Experience

Sept 2022 - Present
Glassbox Health, Atlanta, GA
Co-Founder, CTO
Building an LLM-based assistant to provide personalized navigation of medical bills and healthcare costs
Summer 2022
Enveda Biosciences, Boulder, CO
Data Science Intern, Knowledge Graph
Mentors: Daniel Domingo-Fernández, David Healey, Joe Davison
Performed systematic survey + implementation of 20+ entity linking NLP models to improve the accuracy of evidence-based compound prioritization
Summer 2021
Facebook, Menlo Park, CA
Applied Research Science Intern, Enterprise Product Applied Research
Mentor: Minhazul Islam Sk
Designed and trained transformer-based semantic search document retrieval system to improve efficiency of customer support agents
Summer 2020
GlaxoSmithKline, Philadelphia, PA
Research Intern, AI/ML Engineering
Mentor: Anne Cocos
Built a model to jointly embed free-text entity mentions with a structured entity knowledge graph for 30M research articles/abstracts and a KG with 5M edges. Developed end-to-end pipeline to download, preprocess, and identify high-quality entity links for biomedical entities in 30M research articles. Engineered parallel model training workflow on distributed supercomputing cluster utilizing 10,000+ CPU cores and dozens of GPUs.
Summer 2018
Recursion Pharmaceuticals, Salt Lake City, UT
Data Science Intern, Machine Learning
Mentor: Andrew Blevins
Developed and deployed recommender system to infer biological mechanism of action and repurposing potential of 1M+ compounds
May 2016 - May 2018
Intermountain Healthcare, Salt Lake City, UT
Data Science Intern, Population Health Analytics
Mentor: Andy Merrill
Built and deployed models to forecast individual patient risk of chronic disease onset and long-term complex care from EHR and environmental data. Published in IEEE ICHI (2017) and AJRCCM (2018).

Academic Research Experience

Jan 2024 - Present
Georgia Institute of Technology, Atlanta, GA
Postdoctoral Researcher, Laboratory for Pathology Dynamics
Natural language processing tools for the indexing, extraction, and synthesis of clinical knowledge from medical literature
Aug 2019 - Dec 2023
Georgia Institute of Technology, Atlanta, GA
Graduate Research Assistant, Laboratory for Pathology Dynamics
Advisor: Cassie Mitchell
Member of the Laboratory for Pathology Dynamics, where we use machine learning to build tools that identify and prioritize cures and optimize care for neurodegenerative diseases.
Aug 2018 - May 2019
Georgia Institute of Technology, Atlanta, GA
Graduate Research Assistant, School of Computational Science and Engineering
Mentor: Jimeng Sun
Conducted research in predicting chronic disease outcomes from electronic health records (EHR) and free-text clinical notes.
Jan 2017 - Aug 2018
Brigham Young University, Provo, UT
Graduate Research Assistant, Department of Mathematics
Advisor: Jeffrey Humpherys
Developed models to predict individual onset of chronic conditions from patient electronic health records (EHR). Published in IEEE ICHI (2017, 2018).

Honors and Awards

2018
National Science Foundation GRFP Honorable Mention
Learning to Prescribe Optimal Disease Treatment via Machine Learning
2015
Dean and Helen Robinson Scholarship
Scholarship given to outstanding undergraduates in mathematics for the Putnam Mathematics Competition
2016
BYU University Honors
Awarded to undergraduates who write a thesis and complete requirements in leadership, service, and cross-disciplinary scholarship.
2010-2016
BYU Heritage Scholarship
Full-tuition merit-based scholarship for incoming students
2011
Amberly Rupp "Circle of Honor" Essay Contest Award
1st place in university-wide essay contest
2010
National Merit Scholarship
Merit-based scholarship awarded to the top <1% of incoming university students

Selected Publications*

A Comprehensive Evaluation of Biomedical Entity Linking Models
David Kartchner, Jennifer Deng, Shubham Lohiya, Tejasri Kopparthi, Prasanth Bathala, Daniel Domingo-Fernández, Cassie Mitchell
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.
Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19
David Kartchner, Kevin McCoy, Janhvi Dubey, Dongyu Zhang, Kevin Zheng, Rushda Umrani, James Kim, Cassie Mitchell
Biology. 2023.
Zero-Shot Information Extraction for Clinical Meta-Analysis using Large Language Models
David Kartchner, Irfan Al-Hussaini, Selvi Ramalingam, Olivia Kronick, Cassie Mitchell
22nd Workshop on Biomedical Natural Language Processing (BioNLP). Toronto, Canada, 2023.
BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis
David Kartchner, Irfan Al-Hussaini, Haydn Turner, Jennifer Deng, Shubham Lohiya, Prasanth Bathala, Cassie Mitchell
46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). Taipei, Taiwan, 2023.
Rule-Enhanced Active Learning for Semi-Automated Weak Supervision
David Kartchner, Davi Nakajima An, Wendi Ren, Chao Zhang, Cassie Mitchell
AI. Online, 2022.
Machine Learning Methods for Disease Prediction with Claims Data
Tanner Christensen, Abraham Frandsen, Seth Glazier, Jeff Humpherys, David Kartchner
IEEE International Conference on Healthcare Informatics (ICHI). New York City, NY, USA, 2018.
Short-Term Elevation of Fine Particulate Matter Air Pollution and Acute Lower Respiratory Infection
Benjamin D. Horne, Elizabeth A. Joy, Michelle G. Hofmann, Per H. Gesteland, John B. Cannon, Jacob S. Lefler, Denitza P. Blagev, E. Kent Korgenski, Natalie Torosyan, Grant I. Hansen, David Kartchner, C. Arden Pope III
American Journal of Respiratory and Critical Care Medicine (AJRCCM). New York, NY, USA, 2018.

Volunteer & Leadership Experience

2017-2018
Student Alumni Relations Representative
College of Physical and Mathematical Sciences, Brigham Young University, Provo, UT
Organized college-wide student-alumni networking dinner. Organized fundraising event for student-to-student need-based scholarship program. Met regularly with dean to discuss and address student needs.
Nov 2011 - Nov 2013
Full-time Missionary and Representative
Church of Jesus Christ of Latter-day Saints, San Pablo, Philippines
Taught lessons in Tagalog language designed to strengthen families and communities. Organized quarterly conference and trainings for volunteers across six cities. Gathered and analyzed organizational data for regional leadership. Organized and coordinated community service projects with local leaders.

Technical Skills

Mathematics: Matrix Analysis, Complex Analysis, Functional Analysis, Numerical Linear Algebra, Control Theory, Probability Theory, Parallel Computing, Algorithm Design, Linear & Nonlinear Optimization, Active Learning, Advanced Econometrics, Abstract Algebra, Differential Equations

Machine Learning: Natural Language Processing (NLP), Large Language Models (LLMs), Knowledge Graphs, Deep Learning, Bayesian Statistics, Computer Vision, Semi-Supervised Learning, Weak Supervision, Information Retrieval

Packages: PyTorch, Pandas, spaCy, NLTK, RDKit, Hugging Face, LangChain, OpenAI

Programming: Python, R, Stata, Mathematica

Web: HTML, Web scraping, SQL, Cypher, LaTeX, Markdown, Jekyll, Git, Google API suite

Visualization: Figma, Seaborn, Bokeh, Draw.io

Languages: English (Native), Tagalog (Professional), Spanish (Intermediate), German (Intermediate)

References

Dr. Cassie Mitchell, Associate Professor
School of Biomedical Engineering
Georgia Institute of Technology
Dr. Jeff Humpherys, Chief Data Scientist
Harbor Health
Dr. Tyler Jarvis, Director and Cofounder
Applied and Computational Mathematics Program
Brigham Young University
Dr. David Healey, Vice President of Data Science
Enveda Biosciences
Dr. Chao Zhang, Assistant Professor
School of Computational Science and Engineering
Georgia Institute of Technology
Dr. Jon Duke, Principal Research Scientist
Georgia Tech Research Institute
Georgia Institute of Technology