Modular Representation Underlies Systematic Generalization in Neural Natural Language Inference Models

Atticus Geiger, Kyle Richardson, Christopher Potts

In adversarial (challenge) testing, we pose hard generalization tasks in order to gain insights into the solutions found by our models. What properties must a system have in order to succeed at these hard tasks? In this paper, we argue that an essential factor is the ability to form modular representations. Our central contribution is a definition of what it means for a representation to be modular and an experimental method for assessing the extent to which a system's solution is modular in this general sense. Our work is grounded empirically in a new challenge Natural Language Inference dataset designed to assess systems on their ability to reason about entailment and negation. We find that a BERT model with fine-tuning is strikingly successful at the hard generalization tasks we pose using this dataset, and our active manipulations help us to understand why: despite the densely interconnected nature of the BERT architecture, the learned model embeds modular, general theories of lexical entailment relations.
