Contact processes form a large and highly interesting class of dynamic processes on networks, including epidemic and information spreading. While devising stochastic models of such processes is relatively easy, analyzing them is very challenging from a computational point of view, particularly for large networks appearing in real applications. One strategy to reduce the complexity of their analysis is to rely on approximations, often in terms of a set of differential equations capturing the evolution of a random node, distinguishing nodes with different topological contexts (i.e., different degrees of different neighborhoods), like degree-based mean field (DBMF), approximate master equation (AME), or pair approximation (PA). The number of differential equations so obtained is typically proportional to the maximum degree kmax of the network, which is much smaller than the size of the master equation of the underlying stochastic model, yet numerically solving these equations can still be problematic for large kmax. In this paper, we extend AME and PA, which has been proposed only for the binary state case, to a multi-state setting and provide an aggregation procedure that clusters together nodes having similar degrees, treating those in the same cluster as indistinguishable, thus reducing the number of equations while preserving an accurate description of global observables of interest. We also provide an automatic way to build such equations and to identify a small number of degree clusters that give accurate results. The method is tested on several case studies, where it shows a high level of compression and a reduction of computational time of several orders of magnitude for large networks, with minimal loss in accuracy.