Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions caused by simultaneous massive access. Although this collision problem has been addressed in existing RACH schemes, these schemes usually organize IoT devices' transmission and re-transmission with fixed parameters, and thus can hardly adapt to time-varying traffic patterns. Without adaptation, the RACH procedure easily suffers from high access delay, high energy consumption, or even access unavailability. To improve the RACH procedure, this paper aims to optimize it in real time by maximizing a long-term hybrid multi-objective function, which consists of the number of successfully accessing devices, the average energy consumption, and the average access delay. To do so, we first optimize the long-term objective in terms of the number of successfully accessing devices by using Deep Reinforcement Learning (DRL) algorithms for different RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). The convergence capability and efficiency of different DRL algorithms, including Policy Gradient (PG), Actor-Critic (AC), Deep Q-Network (DQN), and Deep Deterministic Policy Gradients (DDPG), are compared. Inspired by the results of this comparison, a decoupled learning strategy is developed to jointly and dynamically adapt the access control factors of those three access schemes. This decoupled strategy first leverages a Recurrent Neural Network (RNN) model to predict the real-time traffic values of the network environment, and then uses multiple DRL agents to cooperatively configure the parameters of each RACH scheme.
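As a rough illustration of the collision problem and of the ACB access control factor mentioned above, the following minimal Python sketch simulates a single random access opportunity. It is not the paper's simulator: the function name `simulate_acb_slot`, the default of 54 preambles, and the simplification that a preamble picked by exactly one device always succeeds are assumptions made here for illustration only.

```python
import random
from collections import Counter


def simulate_acb_slot(num_devices, acb_factor, num_preambles=54, rng=None):
    """Simulate one random access opportunity under Access Class Barring.

    Assumed toy model (not taken from the paper): every backlogged device
    passes the barring check with probability `acb_factor`, then picks one
    of `num_preambles` preambles uniformly at random.

    Returns a tuple (successes, collided, barred) of device counts.
    """
    rng = rng or random.Random()

    # Barring check: a device attempts access only if its draw is below the factor.
    attempting = sum(1 for _ in range(num_devices) if rng.random() < acb_factor)

    # Each attempting device selects a preamble independently.
    picks = Counter(rng.randrange(num_preambles) for _ in range(attempting))

    # A preamble chosen by exactly one device counts as a success;
    # a preamble chosen by two or more devices is a collision.
    successes = sum(1 for count in picks.values() if count == 1)
    collided = attempting - successes
    barred = num_devices - attempting
    return successes, collided, barred


if __name__ == "__main__":
    rng = random.Random(0)
    for factor in (0.1, 0.5, 1.0):
        print(factor, simulate_acb_slot(500, factor, rng=rng))
```

Under this toy model, sweeping `acb_factor` for different values of `num_devices` suggests that the factor yielding the most successes shifts with the load, which is the kind of adaptation a fixed-parameter scheme cannot provide.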