In Transport Mode Detection, a great diversity of methodologies exist according to the choice made on sensors, preprocessing, model used, etc. In this domain, the comparisons between each option are not always complete. Experiments on a public, real-life dataset are led here to evaluate carefully each of the choices that were made, with a specific emphasis on data fusion methods. Our most surprising finding is that none of the methods we implemented from the literature is better than a simple late fusion. Two important decisions are the choice of a sensor and the choice of a representation for the data: we found that using 2D convolutions on spectrograms with a logarithmic axis for the frequencies was better than 1-dimensional temporal representations.