Age of Information (AoI) measures the freshness of information at a remote location: it is the time elapsed since the most recently received packet was generated by its transmitter. In this paper, we consider a remote monitoring problem (e.g., a remote factory) in which multiple sensor nodes transmit time-sensitive measurements to a remote monitoring site. We minimize a metric that trades off the sum of the expected AoI of all sensors against an Ultra-Reliable Low-Latency Communication (URLLC) term, which penalizes the probability that the AoI of each sensor exceeds a predefined threshold. Moreover, we assume that sensors tolerate different threshold values and generate packets of different sizes. Motivated by the success of machine learning in solving large networking problems at low complexity, we develop a low-complexity algorithm based on reinforcement learning (RL) to solve the proposed formulation. We train the algorithm with a state-of-the-art actor-critic method over a set of public bandwidth traces. Simulation results show that the proposed algorithm outperforms the considered baselines in terms of both the expected AoI and the threshold-violation probability of each sensor.
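To make the two terms of the objective concrete, the sketch below estimates, from simulated AoI traces, each sensor's mean AoI and its empirical probability of exceeding its threshold, then combines them. It is a minimal illustration under stated assumptions, not the formulation used in this work: time is slotted, a delivered update resets the sensor's AoI to 1 slot, and the URLLC term is added to the AoI sum with a hypothetical scalar weight `weight`. All function names (`simulate_aoi`, `sensor_stats`, `objective`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SensorStats:
    mean_aoi: float        # empirical estimate of E[AoI_i]
    violation_prob: float  # empirical estimate of P(AoI_i > tau_i)


def simulate_aoi(deliveries: Sequence[Sequence[bool]], initial_age: int = 1) -> List[List[int]]:
    """Track per-slot AoI for each sensor (hypothetical discrete-time model).

    deliveries[i][t] is True if sensor i's update is delivered in slot t.
    Assumption: a delivered packet was generated at the start of that slot,
    so the AoI drops to 1; otherwise the AoI grows by one slot.
    """
    traces = []
    for sensor_deliveries in deliveries:
        age = initial_age
        trace = []
        for delivered in sensor_deliveries:
            age = 1 if delivered else age + 1
            trace.append(age)
        traces.append(trace)
    return traces


def sensor_stats(trace: Sequence[int], threshold: int) -> SensorStats:
    """Empirical mean AoI and threshold-violation probability of one sensor."""
    if not trace:
        raise ValueError("trace must contain at least one slot")
    n = len(trace)
    violations = sum(1 for age in trace if age > threshold)
    return SensorStats(mean_aoi=sum(trace) / n, violation_prob=violations / n)


def objective(traces: Sequence[Sequence[int]], thresholds: Sequence[int], weight: float) -> float:
    """Hypothetical objective: sum of mean AoI + weight * sum of violation probabilities."""
    stats = [sensor_stats(trace, tau) for trace, tau in zip(traces, thresholds)]
    aoi_term = sum(s.mean_aoi for s in stats)
    urllc_term = sum(s.violation_prob for s in stats)
    return aoi_term + weight * urllc_term
```

For example, with two sensors over six slots, `simulate_aoi([[True, False, False, True, False, False], [False, True, False, False, False, True]])` yields the traces `[1, 2, 3, 1, 2, 3]` and `[2, 1, 2, 3, 4, 1]`. With thresholds `[2, 3]`, the first sensor exceeds its threshold in 2 of 6 slots and the second in 1 of 6. A larger `weight` pushes a scheduler toward keeping every sensor under its own threshold, even at some cost in average AoI.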