Various multi-modal imaging sensors are currently involved at different steps of an interventional therapeutic workflow. Cone beam computed tomography (CBCT), computed tomography (CT) and magnetic resonance (MR) images provide complementary functional and/or structural information about the targeted region and the organs at risk. Merging this information relies on a correct spatial alignment of the observed anatomy between the acquired images. This can be achieved by means of multi-modal deformable image registration (DIR), which has been shown to estimate dense, elastic deformations between images acquired by different imaging devices. However, because the field-of-view (FOV) typically differs across imaging modalities, such algorithms may fail to find a satisfactory solution. In this study, we propose a new, fast method to align the FOV in multi-modal 3D medical images. To this end, a patch-based approach is introduced and combined with a state-of-the-art multi-modal image similarity metric. A shift is estimated for each patch, the occurrence of each shift value is counted along each spatial direction, and the value with maximum occurrence is selected and used to adjust the image FOV. We show that a regional registration approach using voxel patches provides a good structural compromise between the voxel-wise and "global shifts" approaches. The method proved beneficial for CT to CBCT and MRI to CBCT registration tasks, especially when the image FOVs differ markedly. The benefit of the method for these two registration tasks is analyzed, including the impact of artifacts generated by percutaneous needle insertions. Additionally, the computational cost is shown to be compatible with clinical constraints in the practical case of on-line procedures.
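The patch-voting step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this study: the function names, the regular patch grid, the exhaustive integer search, and the default patch size and search range are all assumptions made for illustration. In particular, `neg_ssd` is only a placeholder for the multi-modal similarity metric, which is not specified here; a sum of squared differences is meaningful for same-modality images only, and any metric where a higher score means a better match could be passed in its place.

```python
import itertools
import numpy as np


def neg_ssd(a, b):
    """Placeholder similarity (higher is better); NOT a multi-modal metric."""
    return -float(np.sum((a.astype(float) - b.astype(float)) ** 2))


def patch_shift(fixed, moving, corner, size, max_shift, similarity=neg_ssd):
    """Return the integer (dz, dy, dx) such that the fixed patch at `corner`
    is best matched by the moving patch at `corner + (dz, dy, dx)`."""
    z, y, x = corner
    ref = fixed[z:z + size, y:y + size, x:x + size]
    best, best_score = (0, 0, 0), -np.inf
    offsets = range(-max_shift, max_shift + 1)
    for dz, dy, dx in itertools.product(offsets, repeat=3):
        zz, yy, xx = z + dz, y + dy, x + dx
        if min(zz, yy, xx) < 0:
            continue  # candidate would start outside the moving volume
        cand = moving[zz:zz + size, yy:yy + size, xx:xx + size]
        if cand.shape != ref.shape:
            continue  # candidate is truncated by the volume border
        score = similarity(ref, cand)
        if score > best_score:
            best, best_score = (dz, dy, dx), score
    return best


def dominant_shift(shifts, max_shift):
    """For each axis, select the shift value with maximum occurrence."""
    shifts = np.asarray(shifts, dtype=int)
    result = []
    for axis in range(shifts.shape[1]):
        # Offset by max_shift so that negative shifts map to valid bin indices.
        counts = np.bincount(shifts[:, axis] + max_shift,
                             minlength=2 * max_shift + 1)
        result.append(int(np.argmax(counts)) - max_shift)
    return tuple(result)


def estimate_fov_shift(fixed, moving, size=6, max_shift=3, similarity=neg_ssd):
    """Estimate one integer shift per axis from a regular grid of patches."""
    # Keep a margin of max_shift voxels so every candidate stays inside `fixed`.
    starts = [range(max_shift, n - size - max_shift + 1, size)
              for n in fixed.shape]
    shifts = [patch_shift(fixed, moving, corner, size, max_shift, similarity)
              for corner in itertools.product(*starts)]
    return dominant_shift(shifts, max_shift)
```

Taking the most frequent value along each axis, rather than the mean, keeps the estimate robust to patches whose local match is wrong, for example in regions that are visible in only one of the two images.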