We propose a semi-supervised model for detecting anomalies in videos inspiredby the Video Pixel Network [van den Oord et al., 2016]. VPN is a probabilisticgenerative model based on a deep neural network that estimates the discrete jointdistribution of raw pixels in video frames. Our model extends the Convolutional-LSTM video encoder part of the VPN with a novel convolutional based attentionmechanism. We also modify the Pixel-CNN decoder part of the VPN to a frameinpainting task where a partially masked version of the frame to predict is given asinput. The frame reconstruction error is used as an anomaly indicator. We test ourmodel on a modified version of the moving mnist dataset [Srivastava et al., 2015]. Our model is shown to be effective in detecting anomalies in videos. This approachcould be a component in applications requiring visual common sense.