Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery

Qingyun Wang, Doug Downey, Heng Ji, Tom Hope

Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links). LBD also ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background knowledge and motivations that human scientists consider (e.g., to find a drug candidate without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a new modeling framework using retrieval of ``inspirations'' from a heterogeneous network of citations and knowledge graph relations, and create a new dataset derived from papers. In automated and human evaluations, our models improve over baselines, including powerful large language models (LLMs), but also reveal challenges on the road to building machines that generate new scientific knowledge.

Knowledge Graph



Sign up or login to leave a comment