Fast computation of the singular value decomposition (SVD) is of great interest in various machine learning tasks. Recently, SVD methods based on randomized linear algebra have shown significant speedups in this regime. This paper attempts to further accelerate the computation by harnessing a modern computing architecture, namely the graphics processing unit (GPU), with the goal of processing large-scale data that may not fit in GPU memory. This leads to a new block randomized algorithm that fully utilizes the power of GPUs and efficiently processes large-scale data in an out-of-core fashion. Our experiment shows that the proposed block randomized SVD (BRSVD) method outperforms existing randomized SVD methods in terms of speed while retaining the same accuracy. We also demonstrate its application to convex robust principal component analysis, where it yields significant speedups in computer vision applications.