We propose a Distributed and Collaborative Monitoring system, DCM, with the following properties. First, DCM allow switches to collaboratively achieve flow monitoring tasks and balance measurement load. Second, DCM is able to perform per-flow monitoring, by which different groups of flows are monitored using different actions. Third, DCM is a memory-efficient solution for switch data plane and guarantees system scalability. DCM uses a novel two-stage Bloom filters to represent monitoring rules using small memory space. It utilizes the centralized SDN control to install, update, and reconstruct the two-stage Bloom filters in the switch data plane. We study how DCM performs two representative monitoring tasks, namely flow size counting and packet sampling, and evaluate its performance. Experiments using real data center and ISP traffic data on real network topologies show that DCM achieves highest measurement accuracy among existing solutions given the same memory budget of switches.