In a regression task, when the mean loss over a distribution of predictions is minimized, the optimal prediction distribution is always a delta function at a single value, even if the targets themselves follow a distribution. Methods for constructing generative models must overcome this tendency. We consider a simple alternative way of summarizing the prediction error: optimizing against the minimum value of the loss over a set of samples from the prediction distribution, rather than the mean. Under this objective, the optimal strategy is to output a distribution of predictions whose support matches the support of the target distribution. We show that models trained against this loss learn to capture the support of the target distribution and, when combined with an auxiliary classifier-like prediction task, can be projected via rejection sampling to reproduce the full distribution of targets. The resulting method compares well with other generative modeling approaches, particularly in low-dimensional spaces with highly non-trivial distributions, because mode-collapse solutions are globally suboptimal with respect to the extreme value loss. However, the method is less suited to high-dimensional spaces such as images, because the number of samples needed to accurately estimate the extreme value loss grows as the dimension of the data manifold becomes large.
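As a toy illustration of the contrast between the two objectives (a sketch under stated assumptions, not the implementation used in this work), the following NumPy snippet compares a collapsed predictor with one that spreads its samples over the target support. The squared-error loss, the two-point target distribution at -1 and +1, the sample count `k = 8`, and the helper names `mean_loss` and `min_loss` are all assumptions chosen for illustration.

```python
import numpy as np


def mean_loss(samples, targets):
    """Squared error averaged over the K prediction samples, then over targets."""
    err = (samples - targets[:, None]) ** 2  # shape (N, K)
    return err.mean(axis=1).mean()


def min_loss(samples, targets):
    """Squared error of the best of the K prediction samples, averaged over targets."""
    err = (samples - targets[:, None]) ** 2  # shape (N, K)
    return err.min(axis=1).mean()


rng = np.random.default_rng(0)
n, k = 10_000, 8  # number of targets, prediction samples per target (assumed values)

# Hypothetical bimodal targets: each target is -1 or +1 with equal probability.
targets = rng.choice([-1.0, 1.0], size=n)

# Collapsed predictor: every sample sits at the mean of the targets (a delta at 0).
collapsed = np.zeros((n, k))

# Spread predictor: samples are drawn from the same support as the targets.
spread = rng.choice([-1.0, 1.0], size=(n, k))

print("mean loss  collapsed:", mean_loss(collapsed, targets))
print("mean loss  spread:   ", mean_loss(spread, targets))
print("min loss   collapsed:", min_loss(collapsed, targets))
print("min loss   spread:   ", min_loss(spread, targets))
```

In this toy setting the mean loss favors the collapsed predictor, while the minimum over samples favors the predictor whose samples cover both modes, which is the behavior the abstract describes. Note that the minimum only rewards covering the support; it does not by itself enforce the correct relative weight of the two modes.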