The iPhone was introduced only a decade ago, in 2007, yet it has fundamentally changed the way we interact with online information. Mobile devices differ radically from classic command-based and point-and-click user interfaces, allowing for gesture-based interaction through fine-grained touch and swipe signals. With the rapid growth of voice-controlled intelligent personal assistants on mobile devices, such as Microsoft's Cortana, Google Now, and Apple's Siri, these devices have become truly personal: they keep us online all the time and assist us in tasks both at work and in our daily lives, making context a crucial factor to consider. Mobile usage now exceeds desktop usage and is still growing rapidly, yet our main methods for training and evaluating personal assistants are still based on (and framed in) classical desktop interactions, focusing on explicit queries, clicks, and dwell time. However, modern user interaction with mobile devices is radically different, owing to touch screens with gesture- and voice-based control and the varying contexts of use (e.g., in a car or on a bike), which often invalidate the assumptions underlying today's user satisfaction evaluation. There is an urgent need to understand voice- and gesture-based interaction, taking all interaction signals and context into account in appropriate ways. We propose a research agenda for developing methods to evaluate and improve context-aware user satisfaction with mobile interactions using gesture-based signals at scale.