Researcher + Entrepreneur (ML + Biomedicine)
I research how to enable natural language processing (NLP) on new and dynamic problems by developing generative means of structuring data via large language models (LLMs) and knowledge graphs (KGs). I use these technologies to structure clinical data and biomedical research, enabling clinicians to curate customized structured data from any unstructured text.
I have collaborated with researchers, developers, and clinicians while working at Enveda Biosciences, Facebook, GSK, Recursion Pharmaceuticals, and Intermountain Healthcare.
2018 - 2023
Ph.D. in Computational Science & Engineering
2017 - 2018
M.S. in Mathematics
GPA: 4.00/4.00
2010 - 2016
B.S. in Applied & Computational Mathematics
Magna Cum Laude, University Honors
Overall GPA: 3.96/4.00
Applied and Computational Mathematics Emphasis (ACME)
Industry Experience
March 2024 - Present
Mehraveh Salehi,
Saman Zarandioon
Training agent-based LLM systems to perform end-to-end medical research with real-world data. Designed and implemented an LLM evaluation suite to automatically evaluate the quality of LLM outputs. Designed multi-agent LLM systems to automatically identify and correct erroneous training data. Current projects are focused on automatically evaluating LLM performance on medical tasks and using LLMs to systematically probe for areas of model weakness.
Sept 2022 - April 2024
Built an LLM-based assistant to provide personalized navigation of medical bills and healthcare costs. Our service reduced medical bills by 67% on average across all uses.
Summer 2022
Daniel Domingo-Fernandez,
David Healey,
Joe Davison
Performed systematic survey + implementation of 20+ entity linking NLP models to improve accuracy of evidence-based compound prioritization
Summer 2021
Minhazul Islam Sk
Designed and trained transformer-based semantic search document retrieval and ranking system to improve efficiency of customer support agents
Summer 2020
Anne Cocos
Built model to jointly embed free-text entity mentions with structured entity knowledge graph for 30M research articles/abstracts and KG with 5M edges. Developed end-to-end pipeline to download, preprocess, and identify high-quality entity links for biomedical entities in 30M research articles. Engineered parallel model training workflow on distributed supercomputing cluster utilizing 10,000+ CPU cores and dozens of GPUs.
Nov 2018 - Aug 2019
Created credit scoring model and interactive job density visualizations to move into new domestic markets.
Summer 2018
Andrew Blevins
Developed and deployed recommender system to infer biological mechanism of action and repurposing potential of 1M+ compounds
May 2016 - May 2018
Andy Merrill
Built and deployed models to forecast individual patient risk of chronic disease onset and long-term complex care from EHR and environmental data. Published in IEEE ICHI (2017) and AJRCCM (2018).
Summer 2015
Analyzed public loan data to predict consumer default on personal loans.
Academic Research Experience
Jan 2024 - March 2024
Trained LLMs to index, extract, and synthesize clinical knowledge from medical literature
Aug 2019 - Dec 2023
Designed NLP-based machine learning systems for automated discovery of high-value biological hypotheses via knowledge graph construction and inference. Further developed systems for identifying, extracting, and synthesizing relevant data from biomedical literature to quantitatively validate medical hypotheses via AI-driven meta-analysis. Publications featured in NeurIPS, ACL, EMNLP, and SIGIR.
Aug 2018 - May 2019
Jimeng Sun
Conducted research in predicting chronic disease outcomes from electronic health records (EHR) and free-text clinical notes.
Summer 2019; Fall 2022
Teaching assistant for machine learning and intro to graduate computing courses
Jan 2017 - Aug 2018
Jeffrey Humpherys
Developed models to predict individual onset of chronic conditions from patient electronic health records (EHR). Published in IEEE ICHI (2017, 2018).
Fall 2022
Graduate Teaching Assistant
Graded homework, held weekly office hours, and mentored students for CSE 6010, an introduction to graduate and parallel computing in C
Summer 2019
Graduate Teaching Assistant
Designed and graded homework, held weekly office hours, and mentored students on team projects for CX 4240, an undergraduate introduction to machine learning
Spring 2019
Invited Guest Lecturer
Presented a week of lectures on web scraping, tweet streaming, and natural language processing for Master's of Analytics program
Aug 2017 - April 2018
Graduate Teaching Assistant
Graded homeworks, taught lectures, designed curriculum, and mentored students on team projects for Math 322 and 324, a rigorous two-semester course on probabilistic mathematics and machine learning
Summer 2017
Invited Lecturer
Taught lectures on Markov Chain Monte Carlo (MCMC) to group of visiting scholars and professionals from the Philippines
Spring 2017
Graduate Teaching Assistant
Graded homeworks, held office hours, and reviewed concepts with students for Math 371, an undergraduate abstract algebra course.
Aug 2016 - April 2017
Taught and graded weekly lab on data analysis to cohort of 35 undergraduates. Topics covered included data cleaning and analysis in Python, SQL, bash shell, regular expressions, MongoDB, web scraping/crawling, and interactive visualization.
Spring 2016
Teaching Assistant
Graded homeworks, held office hours, and taught reviews for class of Econ 380, an undergraduate econometrics course
Fall 2014
Teaching Assistant
Graded homeworks, held office hours, and taught reviews for class of Econ 378, an undergraduate statistics course
Summer 2014
Teaching Assistant
Graded homeworks, held office hours, and taught reviews for class of Econ 381, an undergraduate microeconomics course
Tutored undergraduates in calculus, linear algebra, and economics. Also tutored wide range of high school subjects.
Honors and Awards
1st Place and People's Choice, Georgia Tech Startup Exchange Pitch Competition
Medical billing startup to identify and correct errors in patient medical bills
National Science Foundation GRFP Honorable Mention
Learning to Prescribe Optimal Disease Treatment via Machine Learning
Dean and Helen Robinson Scholarship
Scholarship given to outstanding undergraduates in mathematics for Putnam Mathematics competition
BYU University Honors
Awarded to undergraduates who write a thesis and complete requirements in leadership, service, and cross-disciplinary scholarship.
BYU Heritage Scholarship
Full-tuition merit-based scholarship for incoming students
Amberly Rupp "Circle of Honor" Essay Contest Award
1st place in university-wide essay contest
National Merit Scholarship
Merit-based scholarship awarded to top <1% of incoming university students
Selected Publications
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.
title={A Comprehensive Evaluation of Biomedical Entity Linking Models},
author={Kartchner, David and Deng, Jennifer and Lohiya, Shubham and Kopparthi, Tejasri and Bathala, Prasanth and Domingo-Fern\'andez, Daniel and Mitchell, Cassie S},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
Biology (Biology). 2023.
title={Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19},
publisher={MDPI AG},
author={Kartchner, David and McCoy, Kevin and Dubey, Janhvi and Zhang, Dongyu and Zheng, Kevin and Umrani, Rushda and Kim, James J. and Mitchell, Cassie S.},
22nd Workshop on Biomedical Natural Language Processing (BioNLP). Toronto, Canada, 2023.
46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). Taipei, Taiwan, 2023.
author = {Kartchner, David and Al-Hussaini, Irfan and Turner, Haydn and Deng, Jennifer and Lohiya, Shubham and Bathala, Prasanth and Mitchell, Cassie},
title = {BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis},
year = {2023},
maintitle = {SIGIR},
booktitle = {46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
AI (AI). Online, 2022.
title={Rule-Enhanced Active Learning for Semi-Automated Weak Supervision},
author={Kartchner, David and Nakajima An, Davi and Ren, Wendi and Zhang, Chao and Mitchell, Cassie S},
IEEE International Conference on Healthcare Informatics (ICHI). New York City, NY, USA, 2018.
title={Machine learning methods for disease prediction with claims data},
author={Christensen, Tanner and Frandsen, Abraham and Glazier, Seth and Humpherys, Jeffrey and Kartchner, David},
booktitle={2018 IEEE International Conference on Healthcare Informatics (ICHI)},
Benjamin D. Horne,
Elizabeth A. Joy,
Michelle G. Hofmann,
Per H. Gesteland,
John B. Cannon,
Jacob S. Lefler,
Denitza P. Blagev,
E. Kent Korgenski,
Natalie Torosyan,
Grant I. Hansen,
David Kartchner,
C. Arden Pope III
American Journal of Respiratory and Critical Care Medicine (AJRCCM). New York, NY, USA, 2018.
title={Short-term elevation of fine particulate matter air pollution and acute lower respiratory infection},
author={Horne, Benjamin D and Joy, Elizabeth A and Hofmann, Michelle G and Gesteland, Per H and Cannon, John B and Lefler, Jacob S and Blagev, Denitza P and Korgenski, E Kent and Torosyan, Natalie and Hansen, Grant I and others},
journal={American journal of respiratory and critical care medicine},
publisher={American Thoracic Society}
All Publications
Biology (Biology). 2023.
title={Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19},
publisher={MDPI AG},
author={Kartchner, David and McCoy, Kevin and Dubey, Janhvi and Zhang, Dongyu and Zheng, Kevin and Umrani, Rushda and Kim, James J. and Mitchell, Cassie S.},
AI (AI). Online, 2022.
title={Rule-Enhanced Active Learning for Semi-Automated Weak Supervision},
author={Kartchner, David and Nakajima An, Davi and Ren, Wendi and Zhang, Chao and Mitchell, Cassie S},
Kevin McCoy,
Sateesh Gudapati,
Lawrence He,
Elaina Horlander,
David Kartchner,
Soham Kulkarni,
Nidhi Mehra,
Jayant Prakash,
Helena Thenot,
Sri Vivek Vanga,
Abigail Wagner,
Brandon White,
Cassie Mitchell
Pharmaceutics (Pharm). Online, 2021.
title={Biomedical Text Link Prediction for Drug Discovery: A Case Study with COVID-19},
author={McCoy, Kevin and Gudapati, Sateesh and He, Lawrence and Horlander, Elaina and Kartchner, David and Kulkarni, Soham and Mehra, Nidhi and Prakash, Jayant and Thenot, Helena and Vanga, Sri Vivek and others},
publisher={Multidisciplinary Digital Publishing Institute}
Benjamin D. Horne,
Elizabeth A. Joy,
Michelle G. Hofmann,
Per H. Gesteland,
John B. Cannon,
Jacob S. Lefler,
Denitza P. Blagev,
E. Kent Korgenski,
Natalie Torosyan,
Grant I. Hansen,
David Kartchner,
C. Arden Pope III
American Journal of Respiratory and Critical Care Medicine (AJRCCM). New York, NY, USA, 2018.
title={Short-term elevation of fine particulate matter air pollution and acute lower respiratory infection},
author={Horne, Benjamin D and Joy, Elizabeth A and Hofmann, Michelle G and Gesteland, Per H and Cannon, John B and Lefler, Jacob S and Blagev, Denitza P and Korgenski, E Kent and Torosyan, Natalie and Hansen, Grant I and others},
journal={American journal of respiratory and critical care medicine},
publisher={American Thoracic Society}
The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.
title={A Comprehensive Evaluation of Biomedical Entity Linking Models},
author={Kartchner, David and Deng, Jennifer and Lohiya, Shubham and Kopparthi, Tejasri and Bathala, Prasanth and Domingo-Fern\'andez, Daniel and Mitchell, Cassie S},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). Taipei, Taiwan, 2023.
author = {Kartchner, David and Al-Hussaini, Irfan and Turner, Haydn and Deng, Jennifer and Lohiya, Shubham and Bathala, Prasanth and Mitchell, Cassie},
title = {BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis},
year = {2023},
maintitle = {SIGIR},
booktitle = {46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
Findings of EMNLP (EMNLP (Findings)). Online, 2020.
title = "Denoising Multi-Source Weak Supervision for Neural Text Classification",
author = "Ren, Wendi and
Li, Yinghao and
Su, Hanting and
Kartchner, David and
Mitchell, Cassie and
Zhang, Chao",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "",
doi = "10.18653/v1/2020.findings-emnlp.334",
pages = "3739--3754"
IEEE International Conference on Healthcare Informatics (ICHI). New York City, NY, USA, 2018.
title={Machine learning methods for disease prediction with claims data},
author={Christensen, Tanner and Frandsen, Abraham and Glazier, Seth and Humpherys, Jeffrey and Kartchner, David},
booktitle={2018 IEEE International Conference on Healthcare Informatics (ICHI)},
IEEE International Conference on Healthcare Informatics (ICHI). Park City, UT, USA, 2017.
title={Code2vec: Embedding and clustering medical diagnosis data},
author={Kartchner, David and Christensen, Tanner and Humpherys, Jeffrey and Wade, Sean},
booktitle={2017 IEEE International Conference on Healthcare Informatics (ICHI)},
IEEE International Conference on Healthcare Informatics (ICHI). Park City, UT, USA, 2017.
title={Cost reduction via patient targeting and outreach: a statistical approach},
author={Kartchner, David and Merrill, Andy and Wrathall, Jonathan},
booktitle={2017 IEEE International Conference on Healthcare Informatics (ICHI)},
22nd Workshop on Biomedical Natural Language Processing (BioNLP). Toronto, Canada, 2023.
Human and Model-in-the-Loop Evaluation and Training Strategies Workshop, NeurIPS (HAMLETS). Online, 2020.
author = {Kartchner, David and Ren, Wendi and Nakajima An, Davi and Zhang, Chao and Mitchell, Cassie},
title = {ReGAL: Rule-Generative Active Learning for Model-in-the-Loop Weak Supervision},
year = {2020},
maintitle = {Neural Information Processing Systems},
booktitle = {Human and Model-in-the-Loop Evaluation and Training Strategies Workshop},
Biomedical Engineering Society Annual Meeting (BMES). San Antonio, TX, USA, 2022.
American Mathematical Society Joint Meeting on Mathematics (ANA). Seattle, WA, USA, 2022.
American Neurological Association Annual Meeting (ANA). Online, 2021.
Nidhi Mehra,
Jeongjin Lee,
Helena Thenot,
Sparsh Kudrimoti,
Brandon White,
David Kartchner,
Sateesh Gudapati,
Jayant Prakash,
Vivek Vanga,
Cassie Mitchell
Biomedical Engineering Society Annual Meeting (BMES). Online, 2020.
Biomedical Engineering Society Annual Meeting (BMES). Online, 2020.
Nidhi Mehra,
Brandon White,
David Kartchner,
Helena Thenot,
Lawrence He,
Elaina Horlander,
Sateesh Gudapati,
Jayant Prakash,
Vivek Vanga,
Cassie Mitchell
Biomedical Engineering Society Annual Meeting (BMES). Online, 2020.
David Kartchner,
Haydn Turner,
Christophe Ye,
Irfan Al-Hussaini,
Jennifer Deng,
Zihan Wei,
Shubham Lohiya,
Prasanth Bathala,
Courtney Curtis,
Eva Duvaris,
Coral Jackson,
Sarah Tan,
Hannah Cho,
Cassie Mitchell
Preprint. 2024.
M1 2017.
title={Forward thinking: Building deep random forests},
author={Miller, Kevin and Hettinger, Chris and Humpherys, Jeffrey and Jarvis, Tyler and Kartchner, David},
journal={arXiv preprint arXiv:1705.07366},
LLMs in Neurology: Literature Review, Drug Repurposing, and Beyond
July 2024
American Neurological Association Annual Meeting
Automatic Extraction and Synthesis of Biomedical Data for AI-Driven Systematic Review and Meta-Analysis
March 2024
Allen Institute for Artificial Intelligence
Automated extraction and synthesis of biomedical data for AI-driven systematic review and meta-analysis
Dec 2023
Georgia Tech PhD Thesis Defense
Accelerating Biomedical Discovery with Knowledge Graphs and Weakly Supervised Learning
May 2022
Georgia Tech PhD Thesis Proposal
Biomedical Information Extraction
Mar. 2021
Brigham Young University, Machine Learning for Health Class
ReGAL: Rule-Guided Active Learning for Deep Text Classification
Oct. 2020
Georgia Tech HotCSE Seminar
Survey of Knowledge Graph Embedding Techniques
Jul. 2020
Extracting Actionable Insights from Biomedical Text
Mar. 2019
Georgia Tech PhD Qualifying Exam Oral Defense
ActuarAI: Machine Learning Models for Patient Disease Forecasting and Representation
Jul. 2018
Brigham Young University Masters Thesis Defense
Walking the Walk: An Exploratory Analysis in Biometric Gait Recognition
Nov. 2016
Brigham Young University Honors Thesis Defense
October 2022
Apr 2018
Spring 2024
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automated clinical meta-analysis using natural language processing
Spring 2024
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automated clinical meta-analysis using natural language processing
Spring 2024
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automated clinical meta-analysis using natural language processing
Spring 2024
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automated clinical meta-analysis using natural language processing
Spring 2024
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automated clinical meta-analysis using natural language processing
Dec 2023 - Present
Ph.D. in Electrical and Computer Engineering, Georgia Institute of Technology
Biomedical entity linking with global, cross-domain context
Dec 2023 - Present
Ph.D. in Electrical and Computer Engineering, Georgia Institute of Technology
Biomedical entity linking with global, cross-domain context
Fall 2023 - Present
B.S. in Computer Science, Georgia Institute of Technology
Automating clinical data extraction with LLMs
Fall 2023 - Present
B.S. in Computer Science, Georgia Institute of Technology
Automating clinical meta-analysis with LLMs
Fall 2022 - Present
B.S. in Computer Science, Georgia Institute of Technology
Entity linking for automated knowledge graph construction; automating clinical data extraction and meta-analysis with LLMs; weakly supervised document classification and filtering
Fall 2022 - Present
M.S. in Computer Science, Georgia Institute of Technology
Entity linking for automated knowledge graph construction; automating clinical data extraction with LLMs
Spring 2022 - Present
M.S. in Computer Science, Georgia Institute of Technology
Entity linking for automated knowledge graph construction; automating clinical data extraction with LLMs
Fall 2021 - Fall 2023
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automating biomedical meta-analysis via human-in-the-loop natural language processing
Fall 2022 - Spring 2023
M.S. in Biomedical Engineering, Georgia Institute of Technology
Automating clinical data extraction with LLMs
Fall 2022
M.S. in Computer Science, Georgia Institute of Technology
Entity linking for automated knowledge graph construction
Fall 2022
B.S. in Biomedical Engineering, Georgia Institute of Technology
Discovering causes of COVID-19 induced cardiovascular complications via text mining and knowledge graph analysis
Spring 2022 - Fall 2022
B.S. in Computer Science, Georgia Institute of Technology
Automating biomedical meta-analysis via human-in-the-loop natural language processing
Fall 2019 - Spring 2022
B.S. in Computer Science, Georgia Institute of Technology
Text mining and knowledge graph completion
Fall 2021 - Spring 2022
B.S. in Biomedical Engineering, Georgia Institute of Technology
Text mining for drug repurposing and mechanism of action prediction in COVID-19 and Cardiovascular Disease
Sigma Xi Undergraduate Research Award, Georgia Institute of Technology
Spring 2021
B.S. in Biomedical Engineering
Annotation pipelines for biomedical information extraction
Spring 2021
B.S. in Biomedical Engineering
Annotation pipelines for biomedical information extraction
Now: Optimized Operations Engineer at
Fall 2021
B.S. in Biomedical Engineering, Georgia Institute of Technology
Automating biomedical meta-analysis via human-in-the-loop natural language processing
Spring - Summer 2020
M.S. in Computer Science, Georgia Institute of Technology
Building a knowledge graph for COVID-19
Now: Senior Software Engineer at
Volunteer & Leadership Experience
Community Outreach
2024 - Present
Community Service Coordinator
Coordinate local service projects for community members with health challenges and impaired mobility. Service included organizing response to Nov. 2024 bomb cyclone that left 600k+ individuals without power.
2018 - 2023
Disaster Response Volunteer
Volunteer to clear debris and repair home damage caused by hurricanes in southeastern United States.
2019 - 2022
Youth Mentor
Organize community service projects and teach leadership & life skills to youth ages 8-17
Fall 2019
English Teacher
Taught semester-long English as a second language course for immigrants to United States
Spring 2015
Youth Mentor
Met weekly with elementary students to teach academic and life skills
Volunteer Translator
Provided occasional translation services to Tagalog-speaking visitors to BYU. Translation services provided for visiting dignitaries at international Law and Religion symposium and Filipino missionaries receiving training prior to full-time service.
Student Alumni Relations Representative
Organized college-wide student-alumni networking dinner. Organized fundraising event for student-to-student need-based scholarship program. Met regularly with dean to discuss and address student needs.
Nov 2011 - Nov 2013
Full-time Missionary and Representative
Taught lessons in Tagalog language designed to strengthen families and communities. Organized quarterly conference and trainings for volunteers across six cities. Gathered and analyzed organizational data for regional leadership. Organized and coordinated community service projects with local leaders.
2010 - 2011
Regularly visited with seniors confined to local nursing homes to provide friendship and emotional support.
2009 - 2010
Assisted with local community outreach events including food drives, civil rights benefits fundraiser, and community health fair.
Empirical Methods in Natural Language Processing
Neural Information Processing Systems
BYU CPMS Student Alumni Relations Representative, 2019 - 2020
2020 - Present
Association for Computational Linguistics (ACL)
2017 - 2018
Society for Industrial and Applied Mathematics (SIAM)
2010 - 2016
Phi Eta Sigma Honor Society
Technical Skills
Matrix Analysis,
Complex Analysis,
Functional Analysis,
Numerical Linear Algebra,
Control Theory,
Probability Theory,
Parallel Computing,
Algorithm Design,
Linear & Nonlinear Optimization,
Active Learning,
Advanced Econometrics,
Abstract Algebra,
Differential Equations
Machine Learning:
Natural Language Processing (NLP),
Large Language Models (LLMs),
Knowledge Graphs,
Deep Learning,
Bayesian Statistics,
Computer Vision,
Semi-Supervised Learning,
Weak Supervision,
Information Retrieval
Web scraping,
Google API suite
English (Native),
Tagalog (Professional),
Spanish (Intermediate),
German (Intermediate)
School of Biomedical Engineering
Georgia Institute of Technology
Harbor Health
Applied and Computational Mathematics Program
Brigham Young University
Enveda Biosciences
School of Computational Science and Engineering
Georgia Institute of Technology
Georgia Tech Research Institute
Georgia Institute of Technology