Imaging technologies are rapidly evolving to capture richer and more immersive representations of the 3D world. One such emerging technology is the light field (LF) camera based on micro-lens arrays. Because an LF image records the directional information of light rays, it requires much more storage space and transmission bandwidth than a conventional 2D image of similar spatial dimensions, which makes compression of LF data vital to its application. In this paper, we propose an LF codec that fully exploits the intrinsic geometry between the LF sub-views by first approximating the LF with disparity-guided sparse coding over a perspective-shifted light field dictionary. The sparse coding is based on only several optimized Structural Key Views (SKVs); however, the entire LF can be recovered from the coding coefficients. Because the approximation is kept identical at the encoder and the decoder, only the SKVs, the disparity map, and the residuals of the non-key views need to be compressed into the bit stream. An optimized SKV selection method is proposed so that most of the LF spatial information is preserved. To achieve optimum dictionary efficiency, the LF is divided into several Coding Regions (CRs), and the reconstruction is performed on each region individually. Experiments and comparisons on a benchmark LF dataset show that the proposed SC-SKV codec produces convincing compression results in terms of both rate-distortion performance and visual quality compared with High Efficiency Video Coding (HEVC): on average, it achieves a 47.87% BD-rate reduction and a 1.59 dB BD-PSNR improvement, with gains of up to 4 dB in low bit rate scenarios.
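
The idea that the encoder and decoder share an identical approximation, so that only key views, disparity, and non-key-view residuals are transmitted, can be illustrated with a toy sketch. This is not the proposed codec: the sparse coding over a perspective-shifted dictionary is replaced here by a much simpler stand-in (a circular shift of one key view by a single integer disparity), and no entropy coding is performed. The function names `predict_view`, `encode`, and `decode`, the 3x3 angular layout, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def predict_view(key_view, disparity, du, dv):
    """Approximate the sub-view at angular offset (du, dv) from the key view.

    Assumption: one integer disparity for the whole image, so the prediction
    is a circular shift. The actual method uses disparity-guided sparse coding.
    """
    return np.roll(key_view, shift=(dv * disparity, du * disparity), axis=(0, 1))


def encode(light_field, key_uv, disparity):
    """Return the key view and the residuals of all non-key views."""
    ku, kv = key_uv
    key_view = light_field[kv, ku]
    n_v, n_u = light_field.shape[:2]
    residuals = {}
    for v in range(n_v):
        for u in range(n_u):
            if (u, v) == (ku, kv):
                continue
            pred = predict_view(key_view, disparity, u - ku, v - kv)
            residuals[(u, v)] = light_field[v, u] - pred
    return key_view, residuals


def decode(key_view, residuals, key_uv, disparity, angular_shape):
    """Rebuild the full light field with the same prediction as the encoder."""
    ku, kv = key_uv
    n_v, n_u = angular_shape
    out = np.empty((n_v, n_u) + key_view.shape, dtype=key_view.dtype)
    out[kv, ku] = key_view
    for (u, v), res in residuals.items():
        out[v, u] = predict_view(key_view, disparity, u - ku, v - kv) + res
    return out


# Synthetic 3x3 light field: shifted copies of one image plus small noise.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(16, 16)).astype(np.int32)
lf = np.empty((3, 3, 16, 16), dtype=np.int32)
for v in range(3):
    for u in range(3):
        shifted = np.roll(base, ((v - 1) * 2, (u - 1) * 2), axis=(0, 1))
        lf[v, u] = shifted + rng.integers(-1, 2, size=base.shape)

key, res = encode(lf, key_uv=(1, 1), disparity=2)
rec = decode(key, res, key_uv=(1, 1), disparity=2, angular_shape=(3, 3))
print("lossless:", np.array_equal(rec, lf))
print("max |residual|:", max(int(np.abs(r).max()) for r in res.values()))
```

Because the decoder repeats exactly the prediction the encoder used, adding the residuals back recovers every view. When the prediction is good, the residuals have a much smaller dynamic range than the views themselves, which is what makes them cheaper to compress.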