EnCore: Pre-Training Entity Encoders using Coreference Chains

Frank Mtumbuka, Steven Schockaert

Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. Since obtaining sufficient amounts of manual annotations is expensive, current state-of-the-art methods are typically trained on automatically labelled datasets, e.g. by exploiting links between Wikipedia pages. In this paper, we propose to use coreference chains as an additional supervision signal. Specifically, we pre-train an entity encoder using a contrastive loss, such that entity embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. Since this strategy is not tied to Wikipedia, we can pre-train our entity encoder on other genres than encyclopedic text and on larger amounts of data. Our experimental results show that the proposed pre-training strategy allows us to improve the state-of-the-art in fine-grained entity typing, provided that only high-quality coreference links are exploited.

Knowledge Graph



Sign up or login to leave a comment