Recently, considerable efforts have been devoted to approximately computing the global and local (i.e., incident to each node) triangle counts of a large graph stream represented as a sequence of edges. Existing approximate triangle counting algorithms rely on sampling techniques to reduce the computational cost. However, their estimation errors are significantly determined by the covariance between sampled triangles. Moreover, little attention has been paid to developing parallel one-pass streaming algorithms that can be used to fast and approximately count triangles on a multi-core machine or a cluster of machines. To solve these problems, we develop a novel parallel method REPT to significantly reduce the covariance (even completely eliminate the covariance for some cases) between sampled triangles. We theoretically prove that REPT is more accurate than parallelizing existing triangle count estimation algorithms in a direct manner. In addition, we also conduct extensive experiments on a variety of real-world graphs, and the results demonstrate that our method REPT is several times more accurate than state-of-the-art methods.