To support the process of grasping objects on a tabletop for the blind or robotic arm,it is necessary to address fundamental computer vision tasks,such as detecting,recognizing,and locating objects in space,and determ...To support the process of grasping objects on a tabletop for the blind or robotic arm,it is necessary to address fundamental computer vision tasks,such as detecting,recognizing,and locating objects in space,and determining the position of the grasping information.These results can then be used to guide the visually impaired or to execute grasping tasks with a robotic arm.In this paper,we collected,annotated,and published the benchmark TQUGraspingObject dataset for testing,validation,and evaluation of deep learning(DL)models for detecting,recognizing,and localizing grasping objects in 2D and 3D space,especially 3D point cloud data.Our dataset is collected in a shared room,with common everyday objects placed on the tabletop in jumbled positions by Intel RealSense D435(IR-D435).This dataset includes more than 63k RGB-D pairs and related data such as normalized 3D object point cloud,3D object point cloud segmented,coordinate system normalizationmatrix,3D object point cloud normalized,and hand pose for grasping each object.At the same time,we also conducted experiments on fourDL networks with the best performance:SSD-MobileNetV3,ResNet50-Transformer,ResNet101-Transformer,and YOLOv12.The results present that YOLOv12 has the most suitable results in detecting and recognizing objects in images.All data,annotations,toolkit,source code,point cloud data,and results are publicly available on our project website:https://github.com/HuaTThanhIT2327Tqu/datasetv2.展开更多
文摘To support the process of grasping objects on a tabletop for the blind or robotic arm,it is necessary to address fundamental computer vision tasks,such as detecting,recognizing,and locating objects in space,and determining the position of the grasping information.These results can then be used to guide the visually impaired or to execute grasping tasks with a robotic arm.In this paper,we collected,annotated,and published the benchmark TQUGraspingObject dataset for testing,validation,and evaluation of deep learning(DL)models for detecting,recognizing,and localizing grasping objects in 2D and 3D space,especially 3D point cloud data.Our dataset is collected in a shared room,with common everyday objects placed on the tabletop in jumbled positions by Intel RealSense D435(IR-D435).This dataset includes more than 63k RGB-D pairs and related data such as normalized 3D object point cloud,3D object point cloud segmented,coordinate system normalizationmatrix,3D object point cloud normalized,and hand pose for grasping each object.At the same time,we also conducted experiments on fourDL networks with the best performance:SSD-MobileNetV3,ResNet50-Transformer,ResNet101-Transformer,and YOLOv12.The results present that YOLOv12 has the most suitable results in detecting and recognizing objects in images.All data,annotations,toolkit,source code,point cloud data,and results are publicly available on our project website:https://github.com/HuaTThanhIT2327Tqu/datasetv2.