We introduce VERTEX, an effective solution to recover 3D shape and intrinsic texture of vehicles from uncalibrated monocular input in real-world street environments. To fully utilize the template prior of vehicles, we propose a novel geometry and texture joint representation, based on implicit semantic template mapping. Compared to existing representations which infer 3D texture distribution, our method explicitly constrains the texture distribution on the 2D surface of the template as well as avoids limitations of fixed resolution and topology. Moreover, by fusing the global and local features together, our approach is capable to generate consistent and detailed texture in both visible and invisible areas. We also contribute a new synthetic dataset containing 830 elaborate textured car models labeled with sparse key points and rendered using Physically Based Rendering (PBRT) system with measured HDRI skymaps to obtain highly realistic images. Experiments demonstrate the superior performance of our approach on both testing dataset and in-the-wild images. Furthermore, the presented technique enables additional applications such as 3D vehicle texture transfer and material identification.