Finding translational biomarkers stands center stage of the future of personalized medicine in healthcare. We observed notable challenges in identifying robust biomarkers as some with great performance in one scenario often fail to perform well in new trials (e.g. different population, indications). With rapid development in the clinical trial world (e.g. assay, disease definition), new trials very likely differ from legacy ones in many perspectives and in development of biomarkers this heterogeneity should be considered. In response, we recommend considering building in the heterogeneity when evaluating biomarkers. In this paper, we present one evaluation strategy by using leave-one-study-out (LOSO) in place of conventional cross-validation (cv) methods to account for the potential heterogeneity across trials used for building and testing the biomarkers. To demonstrate the performance of K-fold vs LOSO cv in estimating the effect size of biomarkers, we leveraged data from clinical trials and simulation studies. In our assessment, LOSO cv provided a more objective estimate of the future performance. This conclusion remained true across different evaluation metrics and different statistical methods.