Exploring Optimizations to HeteSim for Computing Relatedness in Heterogeneous Information Networks

Stephen Allegri

Evie Davalbhakta

David Kartchner

Anna Kirkpatrick

Davi Nakajima An

Chidozie Onyeze

Cassie S. Mitchell

Prasad Tetali

American Mathematical Society Joint Meeting on Mathematics (ANA), 2022

Abstract

Heterogeneous information networks, or knowledge graphs, are valuable tools for analyzing insights from the vast number of papers published in the biomedical sciences. Informally, a heterogeneous information network is a directed graph in which each node corresponds to a biomedical concept and each directed edge encodes a relationship between concepts. SemNet is a heterogeneous information network with approximately 300,000 nodes and 20,000,000 edges built from the abstracts of papers in the PubMed database. This work builds on the SemNet codebase with the goal of reducing algorithm runtime. We focus specifically on the similarity score HeteSim, which has been successfully used in multiple biomedical heterogeneous information networks. However, HeteSim can require prohibitive runtime. Performance improvements to HeteSim computation algorithms based on data structure changes and randomized approximation algorithms will be presented. Both theoretical bounds on algorithm performance and empirical timing results will be included.

Materials

Project

Exploring Optimizations to HeteSim for Computing Relatedness in Heterogeneous Information Networks

Abstract

Materials

BibTeX