Current textual question answering (QA) models achieve strong performance on in-domain test sets, but often do so by fitting surface-level patterns in the data, and consequently fail to generalize to out-of-distribution and adversarial settings. To build a more robust and interpretable QA system, we model question answering as an alignment problem. We decompose both the question and the context into smaller units based on off-the-shelf semantic representations (here, semantic roles), and solve a subgraph alignment problem to find the part of the context that matches the question. Our model uses BERT to compute alignment scores, and by training with a structured SVM, we can learn end-to-end despite complex inference. Our explicit use of alignments allows us to explore a set of constraints that prohibit certain types of bad behavior arising in cross-domain settings. Furthermore, by investigating differences in scores across candidate answers, we can seek to understand which aspects of the input led the model to choose its answer, without relying on "local" post-hoc explanation techniques. We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets. The results show that our model is more robust cross-domain than the standard BERT QA model, and that constraints derived from alignment scores allow us to effectively trade off coverage and accuracy.
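To make the structured SVM training objective more concrete, the following is a minimal sketch of a margin-rescaled structured hinge loss computed over candidate question-to-context alignments. It is an illustration under stated assumptions, not the model's actual implementation. The per-pair score table stands in for BERT alignment scores. The Hamming cost is an assumed task loss. The hypothetical enumerate_alignments helper brute-forces all candidate alignments and stands in for the constrained subgraph inference the model really performs.

```python
import itertools
from typing import Dict, List, Tuple

# An alignment pairs each question unit id (e.g., a semantic-role node)
# with one context unit id: ((q0, c0), (q1, c1), ...).
Alignment = Tuple[Tuple[int, int], ...]
Scores = Dict[Tuple[int, int], float]

def enumerate_alignments(options: Dict[int, List[int]]) -> List[Alignment]:
    """Hypothetical brute-force inference: every way of picking one allowed
    context unit per question unit (a stand-in for constrained subgraph search)."""
    qs = sorted(options)
    return [tuple(zip(qs, choice)) for choice in itertools.product(*(options[q] for q in qs))]

def alignment_score(scores: Scores, alignment: Alignment) -> float:
    """Score of an alignment as the sum of its pairwise scores (assumed
    decomposition; the pairwise scores stand in for BERT outputs)."""
    return sum(scores[pair] for pair in alignment)

def hamming_cost(pred: Alignment, gold: Alignment) -> float:
    """Assumed task loss: number of question units aligned differently from gold."""
    gold_map = dict(gold)
    return float(sum(1 for q, c in pred if gold_map.get(q) != c))

def predict(scores: Scores, candidates: List[Alignment]) -> Alignment:
    """Return the highest-scoring candidate alignment."""
    return max(candidates, key=lambda y: alignment_score(scores, y))

def structured_hinge(scores: Scores, candidates: List[Alignment], gold: Alignment) -> float:
    """Margin-rescaled structured hinge loss:
    max_y [s(y) + cost(y, gold)] - s(gold), with gold included in the max,
    so the loss is always >= 0."""
    augmented = max(
        alignment_score(scores, y) + hamming_cost(y, gold)
        for y in list(candidates) + [gold]
    )
    return augmented - alignment_score(scores, gold)

# Toy example: two question units, each with two possible context matches.
scores = {(0, 0): 2.0, (0, 1): 1.5, (1, 2): 1.0, (1, 3): 0.2}
options = {0: [0, 1], 1: [2, 3]}
gold = ((0, 0), (1, 2))
candidates = enumerate_alignments(options)
print(predict(scores, candidates))                  # ((0, 0), (1, 2))
print(structured_hinge(scores, candidates, gold))   # approximately 0.7
```

In this toy case the gold alignment already scores highest, yet the loss is still positive. The reason is that the margin-rescaled objective requires the gold alignment to beat each wrong alignment by at least its cost. Here, the candidate that mismatches both units scores 1.7 but incurs a cost of 2, so it exceeds the gold score of 3.0 by 0.7. In a real system, the maximization in the loss would be solved by the same (cost-augmented) inference procedure used at prediction time, rather than by enumeration.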