Most invariance-based self-supervised methods pretrain on single object-centric images (e.g., ImageNet images) and learn representations that are invariant to geometric transformations. However, when images are not object-centric, geometric transformations such as random crops and multi-crops can significantly alter the semantics of the image. Furthermore, a model trained for invariance may struggle to capture location information. For this reason, we propose a Geometric Transformation Sensitive Architecture that learns features sensitive to geometric transformations such as four-fold rotation, random crop, and multi-crop. Our method encourages the student to learn sensitive features by increasing the similarity between overlapping regions, rather than entire views, and by applying rotations to the target feature map. Additionally, we use a patch correspondence loss to capture long-range dependencies. When pretrained on non-object-centric images, our approach outperforms methods that learn representations invariant to geometric transformations. We surpass the DINO baseline on image classification, semantic segmentation, detection, and instance segmentation, with improvements of 6.1 $Acc$, 0.6 $mIoU$, 0.4 $AP^b$, and 0.1 $AP^m$, respectively.
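To make the two mechanisms named above concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes crops are axis-aligned boxes in image coordinates, feature maps are arrays of shape (H, W, C), and region features are obtained by average pooling. The function names (`overlap_box`, `pool_region`, `rotate_target`, `cosine`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def overlap_box(box_a, box_b):
    """Intersection of two (x0, y0, x1, y1) boxes, or None if they are disjoint."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)

def pool_region(feat, crop_box, region):
    """Average the cells of feat (H, W, C) whose centers lie inside region.

    crop_box is the image-space box that feat covers (an assumption of this sketch).
    Returns None if no cell center falls inside the region.
    """
    h, w, _ = feat.shape
    cx0, cy0, cx1, cy1 = crop_box
    # Image-space coordinates of each feature cell's center.
    xs = cx0 + (np.arange(w) + 0.5) * (cx1 - cx0) / w
    ys = cy0 + (np.arange(h) + 0.5) * (cy1 - cy0) / h
    mask_x = (xs >= region[0]) & (xs < region[2])
    mask_y = (ys >= region[1]) & (ys < region[3])
    mask = mask_y[:, None] & mask_x[None, :]
    if not mask.any():
        return None
    return feat[mask].mean(axis=0)

def rotate_target(feat, k):
    """Rotate the spatial axes of a (H, W, C) feature map by k * 90 degrees."""
    return np.rot90(feat, k=k, axes=(0, 1))

def cosine(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

# Toy usage: two overlapping crops of the same image, with random features.
rng = np.random.default_rng(0)
box_s, box_t = (0, 0, 4, 4), (2, 2, 6, 6)      # student and target crop boxes
feat_s = rng.normal(size=(4, 4, 8))            # stand-in for student features
feat_t = rng.normal(size=(4, 4, 8))            # stand-in for target features
region = overlap_box(box_s, box_t)
if region is not None:
    sim = cosine(pool_region(feat_s, box_s, region),
                 pool_region(feat_t, box_t, region))
```

In this sketch the similarity is computed only over the shared region, so non-overlapping content in either view does not contribute to the objective. When the student view is rotated by a multiple of 90 degrees, `rotate_target` would be applied to the target feature map first so that the two maps are spatially aligned before pooling.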