Learning machines that have hierarchical structures or hidden variables are singular statistical models, because they are nonidentifiable and their Fisher information matrices are singular. In singular statistical models, the Bayes a posteriori distribution does not converge to the normal distribution, and the maximum likelihood estimator does not satisfy asymptotic normality. This is the main reason why it has been difficult to predict their generalization performance from trained states. In this paper, we study four errors: (1) the Bayes generalization error, (2) the Bayes training error, (3) the Gibbs generalization error, and (4) the Gibbs training error, and we prove that there are mathematical relations among them. The formulas proved in this paper are equations of states in statistical estimation, because they hold for any true distribution, any parametric model, and any a priori distribution. We also show that the Bayes and Gibbs generalization errors can be estimated from the Bayes and Gibbs training errors, and we propose widely applicable information criteria that can be applied to both regular and singular statistical models.
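As an illustration of the two training quantities named above, the following minimal sketch computes Bayes and Gibbs training losses from posterior draws. It rests on assumptions not stated in this abstract: the usual definitions are taken, in which the Bayes loss is the negative log of the likelihood averaged over the posterior and the Gibbs loss is the posterior average of the negative log likelihood. The function name `training_losses`, the array layout `log_lik[k, i]`, and the toy normal model with stand-in "posterior" draws are all hypothetical. The corresponding training errors would differ from these losses only by the empirical entropy of the true distribution, which is the same for both, so their difference is unchanged.

```python
import numpy as np

def training_losses(log_lik):
    """Return (bayes_loss, gibbs_loss) from log_lik[k, i] = log p(x_i | w_k).

    Rows index draws w_k (assumed to come from the posterior),
    columns index training points x_i. Hypothetical helper.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]

    # Bayes: average the likelihood over draws, then take the log.
    # Computed as a numerically stable log-mean-exp per data point.
    m = log_lik.max(axis=0)
    log_pred = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    bayes_loss = -log_pred.mean()

    # Gibbs: take the log first, then average over draws and data points.
    gibbs_loss = -log_lik.mean()
    return bayes_loss, gibbs_loss

# Toy usage: unit-variance normal model with unknown mean (an assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
# Stand-in for posterior draws of the mean; not a real sampler.
w = rng.normal(loc=x.mean(), scale=1 / np.sqrt(50), size=200)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - w[:, None]) ** 2

bayes_loss, gibbs_loss = training_losses(log_lik)
print("Bayes training loss:", bayes_loss)
print("Gibbs training loss:", gibbs_loss)
```

By Jensen's inequality the Gibbs loss is never smaller than the Bayes loss, so the gap between the two is nonnegative and is computable from training data alone.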