Clustering is a fundamental task in unsupervised learning. The focus of this paper is the Correlation Clustering functional which combines positive and negative affinities between the data points. The contribution of this paper is two fold: (i) Provide a theoretic analysis of the functional. (ii) New optimization algorithms which can cope with large scale problems (>100K variables) that are infeasible using existing methods. Our theoretic analysis provides a probabilistic generative interpretation for the functional, and justifies its intrinsic "model-selection" capability. Furthermore, we draw an analogy between optimizing this functional and the well known Potts energy minimization. This analogy allows us to suggest several new optimization algorithms, which exploit the intrinsic "model-selection" capability of the functional to automatically recover the underlying number of clusters. We compare our algorithms to existing methods on both synthetic and real data. In addition we suggest two new applications that are made possible by our algorithms: unsupervised face identification and interactive multi-object segmentation by rough boundary delineation.