Remote services and applications that users access via their local clients (laptops or desktops) usually assume that, following a successful user authentication at the beginning of the session, all subsequent communication reflects the user's intent. However, this is not true if the adversary gains control of the client and can therefore manipulate what the user sees and what is sent to the remote server. To protect the user's communication with the remote server despite a potentially compromised local client, we propose the concept of continuous visual supervision by a second device equipped with a camera. Motivated by the rapid increase of the number of incoming devices with front-facing cameras, such as augmented reality headsets and smart home assistants, we build upon the core idea that the user's actual intended input is what is shown on the client's screen, despite what ends up being sent to the remote server. A statically positioned camera enabled device can, therefore, continuously analyze the client's screen to enforce that the client behaves honestly despite potentially being malicious. We evaluate the present-day feasibility and deployability of this concept by developing a fully functional prototype, running a host of experimental tests on three different mobile devices, and by conducting a user study in which we analyze participants' use of the system during various simulated attacks. Experimental evaluation indeed confirms the feasibility of the concept of visual supervision, given that the system consistently detects over 98% of evaluated attacks, while study participants with little instruction detect the remaining attacks with high probability.