We consider a wireless sensor network (WSN), consisting of several sensors and a fusion center (FC), that is tasked with solving an M-ary hypothesis testing problem. The sensors make M-ary decisions and transmit their digitally modulated decisions to the FC over orthogonal channels subject to Rayleigh fading and noise. Adopting a Bayesian optimality criterion, we consider training-based and non-training-based distributed detection systems and investigate the effect of imperfect channel state information (CSI) on the optimal maximum a posteriori probability (MAP) fusion rules and on the optimal power allocation between sensors, when the sum of the training and data symbol transmit powers is fixed. We use the J-divergence criterion to allocate power between sensors. The theoretical results show that, for coherent reception, the J-divergence is maximized when the total training power is half of the total power, whereas for noncoherent reception the training power that maximizes the J-divergence is zero. The simulation results likewise show that the probability of error is minimized when the training power is half of the total power for coherent reception and zero for noncoherent reception.
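The half-power result for coherent reception can be illustrated with a minimal sketch. It does not compute the J-divergence itself; it uses an effective-SNR proxy for a single sensor link, which is a swapped-in simplification and not the analysis of this work. The assumptions, none of which come from the text above, are: one training symbol and one data symbol per sensor, a Rayleigh channel with variance `fading_var`, complex Gaussian noise with variance `noise_var`, and an MMSE channel estimate at the FC. The function and parameter names are hypothetical.

```python
import numpy as np


def effective_snr(p_train, p_total, noise_var=1.0, fading_var=1.0):
    """Effective SNR proxy for coherent detection with an MMSE channel estimate.

    Assumed model (not from the source): one training symbol and one data
    symbol, channel h ~ CN(0, fading_var), noise ~ CN(0, noise_var).
    """
    p_data = p_total - p_train
    # Variance of the MMSE channel estimate obtained from the training symbol.
    est_var = fading_var**2 * p_train / (fading_var * p_train + noise_var)
    # Variance of the residual channel estimation error.
    err_var = fading_var - est_var
    # The estimation error acts as extra noise that scales with the data power.
    return p_data * est_var / (p_data * err_var + noise_var)


def best_training_fraction(p_total, num_points=1001, **kwargs):
    """Grid search for the training-power fraction that maximizes the proxy."""
    fractions = np.linspace(0.0, 1.0, num_points)
    snrs = effective_snr(fractions * p_total, p_total, **kwargs)
    return fractions[np.argmax(snrs)]
```

Under these assumptions the proxy simplifies to a constant times the product of training and data power, so `best_training_fraction(10.0)` lands at about 0.5 for any noise level. For noncoherent reception no channel estimate is formed, so this proxy does not apply.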