ISSN: 2582-9793

SOccDPT: 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints.

Original Research (Published on: 26-May-2024)
DOI : https://dx.doi.org/10.54364/AAIML.2024.42126

ADITYA NALGUNDA GANESH

Adv. Artif. Intell. Mach. Learn., 4 (2):2201-2212

ADITYA NALGUNDA GANESH : PES University



Article History: Received on: 06-Feb-24, Accepted on: 19-May-24, Published on: 26-May-24

Corresponding Author: ADITYA NALGUNDA GANESH

Email: adityang5@gmail.com

Citation: ADITYA NALGUNDA GANESH (2024). SOccDPT: 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints. Adv. Artif. Intell. Mach. Learn., 4 (2):2201-2212


Abstract

    

We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets, including the Indian Driving Dataset and the Bengaluru Driving Dataset. Our semi-supervised training pipeline lets SOccDPT learn from datasets with limited labels: it reduces the need for manual labeling by substituting pseudo-ground-truth labels, which we use to produce our Bengaluru Semantic Occupancy Dataset. This broader training improves the model's ability to handle unstructured traffic scenarios. To overcome memory limitations during training, we introduce patch-wise training, in which only a selected subset of parameters is trained in each epoch, reducing memory usage during autograd graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches with an RMSE of 9.1473, achieves a semantic segmentation IoU of 46.02%, and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.
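The patch-wise training idea, training only a subset of parameters in each epoch, can be illustrated with a minimal sketch. The abstract does not say how subsets are chosen, so the round-robin grouping below is an assumption, and `Param`, `make_patches`, and `select_patch` are hypothetical names rather than part of the SOccDPT code. In an autograd framework, the `requires_grad` flag would play the role shown here: parameters with the flag off need no gradient storage.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Param:
    """Hypothetical stand-in for a framework tensor with a trainable flag."""
    name: str
    requires_grad: bool = True


def make_patches(params: List[Param], num_patches: int) -> List[List[Param]]:
    """Split parameters into disjoint groups (assumed round-robin scheme)."""
    return [params[i::num_patches] for i in range(num_patches)]


def select_patch(patches: List[List[Param]], epoch: int) -> List[Param]:
    """Freeze every parameter, then unfreeze the patch assigned to this epoch."""
    active = patches[epoch % len(patches)]
    for patch in patches:
        for p in patch:
            p.requires_grad = False
    for p in active:
        p.requires_grad = True
    return active


params = [Param(f"layer{i}.weight") for i in range(6)]
patches = make_patches(params, num_patches=3)

schedule: Dict[int, List[str]] = {}
for epoch in range(3):
    active = select_patch(patches, epoch)
    # Only `active` would be handed to the optimizer for this epoch.
    schedule[epoch] = [p.name for p in active]
```

With three patches, each parameter is trainable in one epoch out of every three, so at any time only a third of the parameters carry gradient state. How much memory this saves in practice depends on the model and framework.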

