Modeling molecular potential energy surfaces is of pivotal importance in science. Graph Neural Networks (GNNs) have shown great success in this field, especially those using rotation-equivariant representations. However, these models either suffer from a complex mathematical form or lack theoretical support and design principles. To avoid using equivariant representations, we introduce a novel local frame method for molecular representation learning and analyze its expressive power. Given a frame and the projections of equivariant vectors onto that frame, GNNs can injectively map the local environment of an atom to a scalar representation. Messages can also be passed across local environments by projecting one frame onto another. We further analyze when and how such local frames can be built. We prove that local frames always exist when the local environments have no symmetry, as is often the case in molecular dynamics simulations. For symmetric molecules, although only degenerate frames can be built, we find that the local frame method may still achieve high expressive power in some common cases because of the reduced degrees of freedom. Using only scalar representations allows us to adopt existing simple and powerful GNN architectures. Our model outperforms a range of state-of-the-art baselines in experiments. The simpler architecture also leads to higher scalability: our model takes only about 30% of the inference time of the fastest baseline.
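To illustrate why projecting equivariant vectors onto a local frame yields rotation-invariant scalars, the following is a minimal sketch in plain Python. It is not the frame construction proposed in this work: `build_frame` is a hypothetical stand-in that orthonormalizes the first two neighbor vectors (Gram-Schmidt plus a cross product) and assumes they are not parallel. All function names and the sample coordinates are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def build_frame(v1, v2):
    """Hypothetical frame: orthonormal axes from two non-parallel vectors."""
    e1 = normalize(v1)
    c = dot(v2, e1)
    e2 = normalize([x - c * y for x, y in zip(v2, e1)])  # Gram-Schmidt step
    e3 = cross(e1, e2)                                   # right-handed third axis
    return [e1, e2, e3]

def project(frame, v):
    """Coordinates of v in the frame: three scalars."""
    return [dot(v, e) for e in frame]

def rotate(v, angle_z, angle_x):
    """Proper rotation: about the z axis, then about the x axis."""
    x, y, z = v
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    x, y = cz * x - sz * y, sz * x + cz * y
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    y, z = cx * y - sx * z, sx * y + cx * z
    return [x, y, z]

# Made-up relative positions of the neighbors of one atom.
neighbors = [[1.0, 0.2, -0.3], [0.1, 1.1, 0.4], [-0.7, 0.5, 0.9], [0.3, -0.8, 0.6]]

frame = build_frame(neighbors[0], neighbors[1])
scalars = [project(frame, v) for v in neighbors]

# Rotate the whole environment; the frame rotates with it.
rotated = [rotate(v, 0.7, 1.1) for v in neighbors]
frame_rot = build_frame(rotated[0], rotated[1])
scalars_rot = [project(frame_rot, v) for v in rotated]

# The projected coordinates should agree up to floating-point error.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(scalars, scalars_rot)
    for a, b in zip(row_a, row_b)
)
print(invariant)
```

Because the frame axes rotate together with the neighbor vectors, each dot product is unchanged, so the projected coordinates can be fed to an ordinary scalar GNN. If the two vectors used to build the frame were parallel, the Gram-Schmidt step in this sketch would divide by zero, so it does not cover that case.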