In urban Vehicular Ad hoc Networks (VANETs), the high mobility of vehicles and the frequently changing network topology call for a low-delay end-to-end routing algorithm. In this paper, we propose a Multi-Agent Reinforcement Learning (MARL) based decentralized routing scheme that exploits the inherent similarity between the routing problem in VANETs and the MARL problem. The proposed routing scheme models the interaction between vehicles and the environment as a multi-agent problem in which each vehicle autonomously establishes a communication channel with a neighboring device without relying on global information. Simulations performed in the 3GPP Manhattan mobility model demonstrate that our proposed decentralized routing algorithm achieves an average latency of less than 45.8 ms and high stability, with an average failure rate of 0.05%, under varying vehicle capacities.
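To make the decentralized, local-information-only idea concrete, the following is a minimal sketch of a per-vehicle learning agent that selects a next hop from its current neighbors using Q-learning. The class name, state/reward design (destination-indexed Q-values, negative per-hop delay as reward), and all hyperparameter values are illustrative assumptions for exposition; they are not the exact formulation used in the paper.

```python
import random
from collections import defaultdict


class VehicleAgent:
    """Illustrative per-vehicle agent (assumed design, not the paper's exact scheme).

    The state is approximated by the packet's destination; the actions are the
    currently reachable neighbor vehicles, i.e., only local information is used.
    """

    def __init__(self, vehicle_id, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.vehicle_id = vehicle_id
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        # Q[destination][neighbor] ~ estimated (negative) delivery delay
        self.q = defaultdict(lambda: defaultdict(float))

    def choose_next_hop(self, destination, neighbors):
        """Pick a next hop epsilon-greedily over local Q-values."""
        if not neighbors:
            return None
        if random.random() < self.epsilon:
            return random.choice(neighbors)
        return max(neighbors, key=lambda n: self.q[destination][n])

    def update(self, destination, next_hop, hop_delay, next_hop_value):
        """Q-update from a delay-based reward and the neighbor's own estimate.

        `next_hop_value` is the chosen neighbor's best Q-value toward the
        destination (0 if the neighbor is the destination), which would be
        exchanged via local beacons rather than any global routing table.
        """
        reward = -hop_delay  # lower per-hop delay -> higher reward
        target = reward + self.gamma * next_hop_value
        old = self.q[destination][next_hop]
        self.q[destination][next_hop] = old + self.alpha * (target - old)


# Hypothetical usage: vehicle "v1" forwards a packet toward destination "d"
# given its current neighbor list, then learns from the observed one-hop delay.
agent = VehicleAgent("v1")
hop = agent.choose_next_hop("d", ["v2", "v3"])
agent.update("d", hop, hop_delay=0.004, next_hop_value=-0.02)
```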