Deforestation is a contributing factor to climate change, which is driven by the emission of greenhouse gases such as carbon dioxide into the atmosphere and has a serious impact on human life. Knowing the causes of deforestation is important for mitigation efforts, but data-driven studies that predict these deforestation drivers are lacking. In this work, we propose a contrastive learning architecture, called Multimodal SuperCon, for classifying drivers of deforestation in Indonesia using satellite images obtained from Landsat 8. Multimodal SuperCon combines contrastive learning and multimodal fusion to handle the available deforestation dataset. Our proposed model outperforms previous work on driver classification, improving accuracy by 7% over a state-of-the-art rotation-equivariant model for the same task.