Unsupervised Ranking of Treatment-Related Infection Risk Factors in Pediatric Acute Leukemia
Abstract
Introduction: Pediatric patients with acute leukemia (AL), whether acute myeloid leukemia (AML) or acute lymphoblastic leukemia (ALL), undergo intense anti-neoplastic treatments including chemotherapy, hematopoietic stem cell transplant (HSCT), and immunotherapy. While survival rates have improved, treatment-related infection remains a dominant morbidity and cause of death. Antibacterial prophylaxis is administered to counteract the infection risk of treatment-induced neutropenia, but must be prescribed sparingly. The goal of this unsupervised machine learning (ML) literature mining study was to identify which factors should be included in a future supervised ML algorithm that predicts when AL patients should receive antibacterial prophylaxis.
Materials and Methods: SemNet is a novel literature-based discovery (LBD) tool that utilizes a Neo4j graph network of heterogeneous semantic relationships harvested from more than 30 million peer-reviewed PubMed articles. This biomedical concept graph consists of approximately 300,000 nodes connected by nearly 20,000,000 edges; nodes represent Unified Medical Language System (UMLS) literature concepts, while edges represent predications (affects, modulates, increases, etc.). SemNet queries return a list of potential risk factors related to specified target nodes. These risk factors are then ranked by their relative predictive strength using two scoring methods: HeteSim and novelty. HeteSim scores employ unsupervised ML to rank features agnostically, generating rankings normalized with respect to literature prevalence, while novelty scores downweight literature prevalence to emphasize less-investigated predictive factors that may be of high importance. Twenty simulations ranked risk factors related to the “Infection” target node paired with either the “AML” or the “ALL” target node, with one run for each node type of degree >50, yielding 20 lists of more than 50 nodes each, ranked by novelty and HeteSim scores. The top 20% of HeteSim rankings were used to assemble a comprehensive list of highly ranked risk factors predicting infectious complications.
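The final assembly step can be illustrated with a minimal sketch. It assumes each simulation has already produced a list ordered from best to worst HeteSim rank, and that the top 20% is taken per list and rounded up; neither detail is specified above. The function names and the toy factor labels are hypothetical and are not part of SemNet or of the study data.

```python
from collections import defaultdict


def top_percent(ranked, percent=20):
    """Return the leading `percent` of an already-ranked list (at least one item)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    k = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[:k]


def merge_top_ranked(ranked_lists, percent=20):
    """Pool the top slice of every list, recording the rank each factor reached."""
    ranks = defaultdict(dict)
    for list_name, ranked in ranked_lists.items():
        for position, factor in enumerate(top_percent(ranked, percent), start=1):
            ranks[factor][list_name] = position
    # Order by best (lowest) rank across lists, then by name for a stable result.
    return sorted(ranks.items(), key=lambda item: (min(item[1].values()), item[0]))


# Hypothetical toy lists standing in for two of the ranked outputs (not study data).
example = {
    "list_a": ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"],
    "list_b": ["f2", "f9", "f1", "f3", "f4", "f5", "f6", "f7", "f8", "f10"],
}
combined = merge_top_ranked(example)
for factor, per_list_ranks in combined:
    print(factor, per_list_ranks)
```

Keeping the per-list rank alongside each factor mirrors the AML:Hn, ALL:Hn notation, where the same factor can hold a different rank in each list.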
Results and Discussion: Highly HeteSim-ranked risk factors (H) common to both AL subtypes included pancytopenia (AML:H1, ALL:H4), neutropenia (AML:H4, ALL:H6), and thrombocytopenia (AML:H6, ALL:H9), as well as comorbid disorders such as aplastic anemia (AML:H3, ALL:H2) and hematological disease (AML:H5, ALL:H3). This suggests that low red cell and platelet counts, and not just low white cell counts, contribute to higher infection risk. Also notable was the elevated ranking of graft-versus-host disease (AML:H2, ALL:H1) together with bone marrow transplantation (BMT) (AML:H3, ALL:H1), allogeneic BMT (AML:H1, ALL:H2), HSCT (AML:H2, ALL:H5), autologous SCT (AML:H7, ALL:H8), and other stem cell therapies (SCT); this indicates that SCT is most strongly associated with infection, followed by high-dose chemotherapy (AML:H6, ALL:H3) and whole-body irradiation (AML:H6, ALL:H8).
Conclusion: Complete peripheral blood counts and categorical treatment type (especially transplant or SCT) are most predictive of patients at high risk for treatment-related infection. Future SemNet simulations will examine correlations with specific viral, bacterial, and fungal infections. These results assist in variable selection for an algorithm to predict when pediatric patients should receive infection prophylaxis.