Shathushan Sivashangaran and Azim Eskandarian
Adv. Artif. Intell. Mach. Learn., 3 (2):1198-1219
Shathushan Sivashangaran : Virginia Tech
Azim Eskandarian : Virginia Tech
DOI: 10.54364/AAIML.2023.1170
Article History: Received on: 18-May-23, Accepted on: 22-Jun-23, Published on: 24-Jun-23
Corresponding Author: Shathushan Sivashangaran
Email: shathushansiva@vt.edu
Citation: Shathushan Sivashangaran, Azim Eskandarian (2023). Deep Reinforcement Learning for Autonomous Ground Vehicle Exploration Without A-Priori Maps. Adv. Artif. Intell. Mach. Learn., 3 (2):1198-1219
Autonomous Ground Vehicles (AGVs) are essential tools for a wide range of applications because they can operate in hazardous environments with minimal human operator input. Effective motion planning is paramount for their successful operation. Conventional motion planning algorithms depend on prior knowledge of environment characteristics and offer limited utility in information-poor, dynamically changing environments, such as areas struck by emergency hazards like fires and earthquakes, and unexplored subterranean environments such as tunnels and lava tubes on Mars. We propose a Deep Reinforcement Learning (DRL) framework for intelligent AGV exploration without a-priori maps that uses Actor-Critic DRL algorithms to learn policies in continuous, high-dimensional action spaces directly from raw sensor data. The DRL architecture comprises feedforward neural networks for the actor and critic representations. The actor network selects linear and angular velocity control actions given the current state inputs, and the critic network evaluates those actions by learning to estimate Q-values, with the objective of maximizing the accumulated reward. Three off-policy DRL algorithms, DDPG, TD3 and SAC, are trained and compared in two environments of varying complexity, and further evaluated in a third with no prior training or knowledge of map characteristics. The agent is shown to learn optimal policies by the end of each training period, charting quick, collision-free exploration trajectories, and is extensible, adapting to an unknown environment without changes to network architecture or hyperparameters. The best algorithm is further evaluated in a realistic 3D environment.
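
The actor-critic structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The state dimension, hidden layer width, velocity limits, discount factor and all class and function names below are assumptions chosen for illustration, and the networks are untrained. The actor maps a state vector to a bounded linear and angular velocity, the critic maps a state-action pair to a scalar Q-value, and `td_target` computes the standard one-step bootstrapped target used by DDPG-style methods (here reusing the same networks rather than separate target networks, for brevity).

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one dense layer to input list x, with an optional elementwise activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def relu(v):
    return max(0.0, v)


class Actor:
    """Feedforward policy: state -> [linear velocity, angular velocity]."""

    def __init__(self, state_dim, hidden, rng, v_max=1.0, w_max=1.0):
        self.l1 = init_layer(state_dim, hidden, rng)
        self.l2 = init_layer(hidden, 2, rng)
        self.v_max, self.w_max = v_max, w_max  # assumed actuator limits

    def act(self, state):
        h = dense(state, self.l1, relu)
        v_raw, w_raw = dense(h, self.l2, math.tanh)  # tanh bounds outputs to [-1, 1]
        return [self.v_max * v_raw, self.w_max * w_raw]


class Critic:
    """Feedforward Q-function: (state, action) -> scalar Q-value estimate."""

    def __init__(self, state_dim, hidden, rng):
        self.l1 = init_layer(state_dim + 2, hidden, rng)
        self.l2 = init_layer(hidden, 1, rng)

    def q_value(self, state, action):
        h = dense(list(state) + list(action), self.l1, relu)
        return dense(h, self.l2)[0]


def td_target(reward, next_state, done, actor, critic, gamma=0.99):
    """One-step target: r if the episode ended, else r + gamma * Q(s', pi(s'))."""
    if done:
        return reward
    return reward + gamma * critic.q_value(next_state, actor.act(next_state))


rng = random.Random(0)
STATE_DIM = 8  # hypothetical, e.g. a handful of range-sensor readings
actor = Actor(STATE_DIM, hidden=16, rng=rng, v_max=0.5, w_max=1.0)
critic = Critic(STATE_DIM, hidden=16, rng=rng)

state = [rng.uniform(0.0, 1.0) for _ in range(STATE_DIM)]
action = actor.act(state)          # [linear velocity, angular velocity]
q = critic.q_value(state, action)  # critic's estimate for this state-action pair
```

In a full training loop the critic would be regressed toward `td_target` and the actor updated to increase the critic's output. TD3 and SAC modify this basic scheme (for example with twin critics, and in SAC a stochastic policy with an entropy term), which the sketch does not cover.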