In many cases, Digital Surface Models (DSMs) and Digital Elevation Models (DEMs) are obtained with Light Detection and Ranging (LiDAR) or stereo matching. As an active method, LiDAR is very accurate but expensive, which often limits its use to small-scale acquisition. Stereo matching is suitable for large-scale acquisition of terrain information given the growing number of satellite stereo sensors; however, it easily underperforms in textureless areas. Accordingly, this study proposes a Shading-Aware DSM GEneration method (SADGE) for high-resolution multi-view satellite images. Exploiting the complementarity of stereo matching and Shape from Shading (SfS), SADGE combines the advantages of both techniques. First, an improved Semi-Global Matching (SGM) technique generates an initial surface expressed as a DSM; this surface is then refined by optimizing an objective function that models the imaging process in terms of the illumination, surface albedo, and surface normals. Unlike existing shading-based DEM refinement or generation methods, no information about the illumination or viewing angle is required, and the concave/convex ambiguity is avoided because multi-view images are utilized. Experiments with ZiYuan-3 and GaoFen-7 images show that the proposed method generates more accurate DSMs (12.5-56.3% improvement) with sound overall shape and richer surface detail than a software solution (SURE) for multi-view stereo.
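The shading term such a refinement optimizes rests on a Lambertian imaging model: observed intensity is predicted from albedo and the angle between the surface normal and the light direction. The sketch below is a minimal illustration of that data term, not the paper's implementation; the function names and toy values are assumptions, and while SADGE estimates the illumination jointly, here a light vector is supplied for clarity.

```python
# Minimal sketch of a Lambertian shading residual of the kind a
# shading-aware DSM refinement might minimize. Illustrative only:
# SADGE estimates the illumination; here it is given as input.

def normalize(v):
    m = sum(x * x for x in v) ** 0.5
    return [x / m for x in v]

def shade(normal, light, albedo):
    """Lambertian reflectance: albedo * max(0, n . l)."""
    ndotl = sum(a * b for a, b in zip(normalize(normal), normalize(light)))
    return albedo * max(0.0, ndotl)

def shading_residual(observed, normal, light, albedo):
    """Per-pixel data term: observed intensity minus the intensity
    predicted from the current surface normal."""
    return observed - shade(normal, light, albedo)

# A flat surface lit from directly above reflects its full albedo.
print(shade([0, 0, 1], [0, 0, 1], 0.8))  # 0.8
```

Minimizing the squared residual over all pixels, with the DSM heights as unknowns behind the normals, is what couples image shading to surface shape.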
Multi-View Stereo (MVS) is a pivotal technique in computer vision for reconstructing 3D models from multiple images by estimating depth maps. However, reconstruction performance is hindered by visibility challenges such as occlusions and non-overlapping regions. In this paper, we propose a visibility-aware framework to address these issues. Central to our method is an Epipolar Line-based Transformer (ELT) module, which capitalizes on epipolar line correspondences and candidate matching features between images to enhance feature representation and correlation robustness. Furthermore, we propose a novel Supervised Visibility Estimation (SVE) module that estimates high-precision visibility maps, transcending the constraints of previous methods that rely on indirect supervision. By integrating these modules, our method achieves state-of-the-art results on standard benchmarks and demonstrates high-quality reconstruction even in challenging regions. The code will be released at https://github.com/npucvr/ETV-MVS.
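The epipolar-line correspondence the ELT module builds on is standard two-view geometry: a pixel in one image constrains its match in another image to a line l' = F·x, and candidate matching features are sampled along that line. A minimal sketch (illustrative, not the ELT module itself; the fundamental matrix below is the textbook one for a rectified stereo pair):

```python
# Minimal sketch: the epipolar line along which candidate matching
# features are sampled, from the fundamental matrix F. The F used here
# is that of a rectified pair (pure horizontal translation), where
# epipolar lines are horizontal scanlines.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def epipolar_line(F, p):
    """Line l' = F p in the second image, as (a, b, c) with a*u + b*v + c = 0."""
    return matvec(F, [p[0], p[1], 1.0])

def point_line_distance(l, p):
    a, b, c = l
    return abs(a * p[0] + b * p[1] + c) / (a * a + b * b) ** 0.5

F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]  # rectified pair

l = epipolar_line(F, (120.0, 45.0))
# A true match lies on the same scanline, so its distance to the line is 0.
print(point_line_distance(l, (300.0, 45.0)))  # 0.0
```

Restricting attention to features on (or near) this line is what makes epipolar-based matching both cheaper and more robust than unconstrained 2D search.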
Conventional robotic manipulators rely on touch and vision sensors to pick and place differently shaped objects. With advancing technology and the degradation of such sensors over long periods of use, stereo vision has become a promising alternative. In this study, a low-cost stereo vision system and a gripper mounted at the end of a robot arm (Fanuc M10 iA/12) are developed for position and orientation estimation in pick-and-place tasks. The stereo vision system estimates the position (X, Y, Z) and orientation (P_y) of the center of volume of four standard objects (cube, cuboid, cylinder, and sphere), while the robot arm with the gripper mechanically picks and places the objects. The stereo vision system is mounted on the movable robot arm and consists of two cameras that capture two 2D views of a stationary object to derive depth information in 3D space. Moreover, a graphical user interface is developed to train a linear regression model, predict the coordinates of objects live, and check the accuracy of the predictions; it can also send predicted coordinates and angles to the gripper and the robot arm. The system is implemented with Python modules and image processing techniques, which handle identification of the stationary object and estimation of its coordinates. The final product can be regarded as a device that converts a conventional robot arm without a vision system into a highly precise and accurate robot arm with an image processing vision system. Experimental studies are performed to test the efficiency and effectiveness of the techniques used and the gripper prototype, and necessary actions are taken to minimize errors in position and orientation estimation. As a future implementation, an embedded system with a user-friendly software interface will be developed to install the vision system on the Fanuc M10 iA/12 robot arm and to upgrade the system into a device that can be used with any customized robot arm available in the industry.
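The linear-regression step described above maps an image measurement to a world coordinate from calibration samples. A minimal sketch of that idea, assuming a per-axis linear model (the authors' actual features and training code are not given, so the data and function below are hypothetical):

```python
# Minimal sketch (assumed, not the authors' GUI code): closed-form
# least-squares fit of y = a*x + b, mapping a pixel measurement of the
# object centre to a world coordinate, trained from calibration pairs.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx  # slope, intercept

# Hypothetical calibration pairs: pixel column vs. measured X (mm).
pixels = [100, 200, 300, 400]
world_x = [50.0, 100.0, 150.0, 200.0]
a, b = fit_line(pixels, world_x)
print(a * 250 + b)  # predicted X for a new detection at pixel 250 -> 125.0
```

A separate fit per coordinate axis (X, Y, Z and the orientation angle) yields the full prediction the GUI sends to the arm.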
Background Aiming at free-view exploration of complicated scenes, this paper presents a method for interpolating views among multiple RGB cameras. Methods We combine the idea of a cost volume, which represents 3D information, with 2D semantic segmentation of the scene to accomplish view synthesis of complicated scenes. The cost volume is used to estimate the depth and confidence maps of the scene, and a multi-layer representation of the data at different resolutions optimizes the view synthesis of the main object. Results/Conclusions By applying different treatments to different layers of the volume, we can handle complicated scenes containing multiple persons and plentiful occlusions. We also propose a view-interpolation → multi-view reconstruction → view-interpolation pipeline to iteratively optimize the result. We test our method on a variety of multi-view scenes and generate convincing results.
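The cost-volume idea at the core of such pipelines can be shown in miniature: for each pixel and each depth (here disparity) hypothesis, store a matching cost; the per-pixel argmin is the estimate, and the margin between the best and second-best cost can serve as a confidence. The toy 1D example below is an illustration of the concept, not the paper's implementation:

```python
# Toy 1D cost volume: vol[d][x] = cost of disparity hypothesis d at
# pixel x. The per-pixel argmin over d recovers the disparity.

def cost_volume(left, right, max_disp):
    vol = []
    for d in range(max_disp + 1):
        costs = []
        for x, v in enumerate(left):
            xr = x - d
            costs.append(abs(v - right[xr]) if xr >= 0 else float("inf"))
        vol.append(costs)
    return vol

left = [9, 1, 7, 3, 5]
right = [1, 7, 3, 5, 2]   # left signal shifted by one pixel (disparity 1)
vol = cost_volume(left, right, 2)
disp = [min(range(3), key=lambda d: vol[d][x]) for x in range(len(left))]
print(disp)  # [0, 1, 1, 1, 1] (pixel 0 is a boundary with no valid match)
```

Real systems aggregate these raw costs spatially before the argmin; the layered treatment in the paper operates on exactly this kind of per-hypothesis structure.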
To achieve teaching-free path planning for robotic welding, an efficient and stable weld-seam recognition algorithm was developed by combining deep learning with point cloud processing. First, an industrial-grade 3D camera in an eye-to-hand (ETH) configuration captures 2D images and a 3D point cloud model of the area around the workpiece. A pre-trained YOLOv8 object detection model identifies the region of interest (ROI) containing the workpiece with a detection accuracy of 99.5%, allowing background points to be quickly discarded. Point cloud processing algorithms such as RANSAC plane fitting and Euclidean clustering then precisely locate the spatial position of the weld seam within the 3D point cloud of the ROI. Finally, the result is transformed into a welding trajectory in the robot's user coordinate frame using the hand-eye calibration. Results show that the developed algorithm can automatically recognize randomly placed weld seams and plan welding robot paths; the generated trajectories are comparable to manually taught ones, with deviations within 0.5 mm.
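The RANSAC plane-fitting step named above removes the dominant workpiece surface before seam localisation. A minimal sketch of that algorithm (pure Python, a fixed seed for reproducibility; not the production code, and the toy point cloud is an assumption):

```python
# Minimal RANSAC plane fit: repeatedly fit a plane to 3 random points
# and keep the plane with the most inliers within a distance tolerance.
import random

def plane_from_points(p, q, r):
    """Plane (a, b, c, d) with a*x + b*y + c*z + d = 0 through 3 points,
    or None if the points are (near-)collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
    m = sum(x * x for x in n) ** 0.5
    if m < 1e-12:
        return None
    n = [x / m for x in n]
    return n + [-sum(a * b for a, b in zip(n, p))]

def ransac_plane(points, iters=200, tol=0.05, seed=0):
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inl = [p for p in points if abs(a*p[0] + b*p[1] + c*p[2] + d) < tol]
        if len(inl) > len(best_inliers):
            best, best_inliers = plane, inl
    return best, best_inliers

# Toy cloud: a 5x5 grid on the z=0 "table" plus two raised points.
pts = [(x * 0.1, y * 0.1, 0.0) for x in range(5) for y in range(5)]
pts += [(0.2, 0.2, 1.0), (0.3, 0.1, 0.8)]
plane, inliers = ransac_plane(pts)
print(len(inliers))  # 25: the grid points; the two raised points are rejected
```

Subtracting the inliers of the dominant plane leaves the off-plane structure, which Euclidean clustering can then group into seam candidates.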
Learning-based multi-view stereo (MVS) algorithms have demonstrated great potential for depth estimation in recent years. However, they still struggle to estimate accurate depth in texture-less planar regions, which limits their reconstruction performance in man-made scenes. In this paper, we propose PlaneStereo, a new framework that utilizes planar priors to facilitate depth estimation. Our key intuition is that pixels inside a plane share the same set of plane parameters, which can be estimated collectively using information from the whole plane. Specifically, our method first segments planes in the reference image, and then fits 3D plane parameters for each segmented plane by solving a linear system using high-confidence depth predictions inside the plane. This allows us to recover the plane parameters accurately; they can then be converted to accurate depth values for each point in the plane, improving the depth prediction in low-textured local regions. This process is fully differentiable and can be integrated into existing learning-based MVS algorithms. Experiments show that our method consistently improves the performance of existing stereo matching and MVS algorithms on the DeMoN and ScanNet datasets, achieving state-of-the-art performance.
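The plane-fitting step can be sketched with a simplified parameterization: model depth inside a segmented plane as z = a·u + b·v + c over pixel coordinates (u, v), and solve the least-squares normal equations from a few high-confidence depths. This is an illustration of the linear-system idea, with an assumed parameterization, not PlaneStereo's exact formulation:

```python
# Minimal sketch: recover plane parameters from high-confidence depth
# samples by solving a 3x3 linear system, then fill in depths for
# low-textured pixels on the same plane.

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= f * M[i][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][c] * x[c] for c in range(i + 1, 3))) / M[i][i]
    return x

def fit_plane_depth(samples):
    """Least-squares fit of z = a*u + b*v + c from (u, v, z) samples."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for u, v, z in samples:
        row = (u, v, 1.0)
        for i in range(3):
            for j in range(3):
                A[i][j] += row[i] * row[j]
            rhs[i] += row[i] * z
    return solve3(A, rhs)

# Hypothetical high-confidence depths on the plane z = 0.01*u - 0.02*v + 5.
conf = [(10, 20, 4.7), (50, 10, 5.3), (30, 40, 4.5), (80, 5, 5.7)]
a, b, c = fit_plane_depth(conf)
# Depth for a low-textured pixel in the same plane, from the fit:
print(round(a * 60 + b * 30 + c, 2))  # 5.0
```

Because the fit is a linear least-squares solve, gradients flow through it, which is what makes the full pipeline end-to-end differentiable.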
Funding: supported by the National Natural Science Foundation of China [grant number 41801390] and the National Key R&D Program of China [grant number 2018YFD1100405].
Funding: supported by the National Natural Science Foundation of China (No. 62271410) and the Fundamental Research Funds for the Central Universities.