This paper aims to develop a unified paradigm that models one's learning behavior and the system's equilibrating processes in a routing game among atomic selfish agents. Such a paradigm can assist policymakers in devising optimal operational and planning countermeasures under both normal and abnormal circumstances. To this end, a multi-agent reinforcement learning (MARL) paradigm is proposed in which each agent learns and updates her own en-route path choice policy while interacting with others on transportation networks. This paradigm is shown to generalize the classical notion of dynamic user equilibrium (DUE) to model-free and data-driven scenarios. We also illustrate that the equilibrium outcomes computed from our developed MARL paradigm coincide with DUE and dynamic system optimal (DSO), respectively, when rewards are set differently. In addition, with the goal to optimize some systematic objective (e.g., overall traffic condition) of city planners, we formulate a bilevel optimization problem with the upper level as city planners and the lower level as a multi-agent system where each rational and selfish traveler aims to minimize her travel cost. We demonstrate the effect of two administrative measures, namely tolling and signal control, on the behavior of travelers and show that the systematic objective of city planners can be optimized by a proper control. The results show that on the Braess network, the optimal toll charge on the central link is greater or equal to 25, with which the average travel time of selfish agents is minimized and the emergence of Braess paradox could be avoided. In a large-sized real-world road network with 69 nodes and 166 links, the optimal offset for signal control on Broadway is derived as 4 seconds, with which the average travel time of all controllable agents is minimized.