Multimodal Large Language Models

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

Multimodal Large Language Models (MLLMs) have demonstrated impressive 2D image/video understanding capabilities. However, there are no publicly standardized benchmarks to assess the abilities of MLLMs in understanding 4D objects. In this paper, we …