We consider the problem of optimizing a multi-user video streaming service over a time-varying multi-cell wireless network with mutual interference. The key research challenge is to adapt each user's video streaming rate to both the radio-frequency environment (e.g., channel fading and interference levels) and the service demands (e.g., playback requests), so that the users' long-term video viewing experience is optimized. To address this challenge, we propose a novel two-level cross-layer optimization framework for multi-user adaptive video streaming over wireless networks. The key idea is to jointly design an optimization-based beamforming scheme at the physical layer (performed at the base stations) and a Deep Reinforcement Learning (DRL)-based scheme at the application layer (performed at the user terminals). This decomposes a highly complex multi-user, cross-layer, time-varying video streaming problem into relatively simple subproblems that can be solved effectively. Our strategy departs significantly from existing schemes, which consider either only short-term user experience optimization or only single-user, point-to-point long-term optimization. Extensive simulations based on real datasets show that the proposed cross-layer design is effective and promising.