Abstract: Conventional robotic manipulators are equipped with touch and vision sensors in order to pick and place objects of different shapes. Owing to technological development and the degradation of such sensors over long periods, the stereo vision technique has become a promising alternative. In this study, a low-cost stereo vision-based system and a gripper to be mounted at the end of the robot arm (Fanuc M10 iA/12) are developed for position and orientation estimation, so that robotic manipulators can pick and place objects of different shapes. The stereo vision system estimates the position (X, Y, Z) and orientation (P_y) of the Center of Volume of four standard objects (cube, cuboid, cylinder, and sphere), whereas the robot arm with the gripper mechanically picks and places the objects. The stereo vision system is mounted on the movable robot arm and consists of two cameras that capture two 2D views of a stationary object, from which 3D depth information is derived. Moreover, a graphical user interface is developed to train a linear regression model, predict the coordinates of the objects live, and check the accuracy of the predicted data. The graphical user interface can also send the predicted coordinates and angles to the gripper and the robot arm. The project is implemented with Python modules and image processing techniques, which are used to identify the stationary object and estimate its coordinates. The final product can be regarded as a device that converts a conventional robot arm without an image processing vision system into a highly precise and accurate robot arm with one. Experimental studies are performed to test the efficiency and effectiveness of the techniques used and of the gripper prototype, and the necessary actions are taken to minimize the errors in position and orientation estimation. As future work, an embedded system with a user-friendly software interface will be developed to install the vision system on the Fanuc M10 iA/12 robot arm, and the system will be upgraded into a device that can be used with any kind of customized robot arm available in industry.
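As a rough illustration of how two 2D views can yield a 3D position, the sketch below applies the textbook relation for a rectified pinhole stereo pair, Z = f * B / d. It is not the authors' pipeline (the abstract describes a trained linear regression model); the function name and the focal length, baseline, and principal point values are assumptions made only for this example.

```python
def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in the left-camera frame for a rectified stereo pair.

    u_left, v_left : pixel coordinates of the point in the left image
    u_right        : column of the same point in the right image
    focal_px       : focal length in pixels (assumed equal for both cameras)
    baseline_m     : distance between the camera centres in metres
    cx, cy         : principal point in pixels
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px        # back-project the pixel column
    y = (v_left - cy) * z / focal_px        # back-project the pixel row
    return x, y, z


if __name__ == "__main__":
    # Hypothetical numbers: disparity = 40 px, so Z = 800 * 0.1 / 40 = 2.0 m.
    print(triangulate_rectified(400.0, 300.0, 360.0, 800.0, 0.1, 320.0, 240.0))
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason a calibrated or learned correction step is often added on top of the raw geometry.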
Funding: Supported by the National Natural Science Foundation of China [grant number 41801390] and the National Key R&D Program of China [grant number 2018YFD1100405].
Abstract: In many cases, Digital Surface Models (DSMs) and Digital Elevation Models (DEMs) are obtained with Light Detection and Ranging (LiDAR) or stereo matching. As an active method, LiDAR is very accurate but expensive, which often limits its use to small-scale acquisition. Stereo matching is suitable for large-scale acquisition of terrain information as the number of satellite stereo sensors increases. However, stereo matching tends to underperform in textureless areas. Accordingly, this study proposes a Shading Aware DSM GEneration Method (SADGE) that uses high-resolution multi-view satellite images. Considering the complementarity of stereo matching and Shape from Shading (SfS), SADGE combines the advantages of both techniques. First, an improved Semi-Global Matching (SGM) technique is used to generate an initial surface expressed as a DSM; the surface is then refined by optimizing an objective function that models the imaging process in terms of the illumination, the surface albedo, and the object surface normal. Unlike existing shading-based DEM refinement or generation methods, no information about the illumination or the viewing angle is needed, and concave/convex ambiguity can be avoided because multi-view images are used. Experiments with ZiYuan-3 and GaoFen-7 images show that the proposed method can generate a higher-accuracy DSM (12.5-56.3% improvement) with sound overall shape and temporarily detailed surface compared with a software solution (SURE) for multi-view stereo.
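The abstract does not say how its SGM variant is "improved", so the following is only a minimal sketch of the standard SGM cost aggregation along a single path, in which a small penalty is paid for a one-level disparity change and a larger penalty for any bigger jump. The function name, the penalties `p1` and `p2`, and the toy matching costs are assumptions for illustration.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one scan path (standard SGM recurrence).

    costs : list of pixels along the path; each pixel is a list of matching
            costs, one per disparity hypothesis
    p1    : penalty for a disparity change of exactly one level
    p2    : penalty for any larger disparity change (p2 >= p1)
    """
    aggregated = [list(costs[0])]            # first pixel has no predecessor
    for pixel_costs in costs[1:]:
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d, cost in enumerate(pixel_costs):
            candidates = [prev[d], prev_min + p2]      # same disparity, or a big jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)    # one level down
            if d + 1 < len(prev):
                candidates.append(prev[d + 1] + p1)    # one level up
            # Subtracting prev_min keeps the values from growing along the path.
            row.append(cost + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


if __name__ == "__main__":
    toy_costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]   # hypothetical matching costs
    for row in aggregate_path(toy_costs, p1=1, p2=4):
        print(row)
```

In a full SGM implementation this recurrence is run along several path directions and the results are summed before the lowest-cost disparity is selected per pixel.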
Abstract: Background: Aiming at free-view exploration of complicated scenes, this paper presents a method for interpolating views among multiple RGB cameras. Methods: In this study, we combine the idea of a cost volume, which represents 3D information, with 2D semantic segmentation of the scene to accomplish view synthesis of complicated scenes. We use the cost volume to estimate the depth and confidence map of the scene, and use a multi-layer representation and resolution of the data to optimize the view synthesis of the main object. Results/Conclusions: By applying different treatment methods to different layers of the volume, we can handle complicated scenes containing multiple persons and plentiful occlusions. We also propose a view interpolation → multi-view reconstruction → view interpolation pipeline to iteratively optimize the result. We test our method on varied multi-view scene data and generate decent results.
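To make the cost-volume idea concrete, the sketch below picks, for each pixel, the depth hypothesis with the lowest matching cost and derives a simple confidence from the margin between the best and second-best costs. The abstract does not define its confidence measure, so the ratio used here, the function name, and the toy data are all assumptions.

```python
def depth_and_confidence(cost_volume, depth_values):
    """Winner-takes-all depth and a margin-based confidence per pixel.

    cost_volume  : cost_volume[y][x] is a list of non-negative matching costs,
                   one per depth hypothesis (at least two hypotheses)
    depth_values : depth associated with each hypothesis index
    """
    depth_map, confidence_map = [], []
    for row in cost_volume:
        depth_row, conf_row = [], []
        for costs in row:
            order = sorted(range(len(costs)), key=costs.__getitem__)
            best, second = costs[order[0]], costs[order[1]]
            depth_row.append(depth_values[order[0]])
            # Assumed measure: 0 when the two best costs tie, close to 1
            # when the best cost is far below the runner-up.
            conf_row.append(0.0 if second == 0 else 1.0 - best / second)
        depth_map.append(depth_row)
        confidence_map.append(conf_row)
    return depth_map, confidence_map


if __name__ == "__main__":
    volume = [[[4.0, 1.0, 2.0], [3.0, 3.0, 3.0]]]   # one row, two pixels (toy data)
    print(depth_and_confidence(volume, [1.0, 2.0, 3.0]))
```

A low confidence flags pixels, such as those in occluded or textureless regions, where the depth estimate should be treated with caution during view synthesis.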
Funding: Supported by the National Natural Science Foundation of China (No. 62271410) and the Fundamental Research Funds for the Central Universities.
Abstract: Multi-View Stereo (MVS) is a pivotal technique in computer vision for reconstructing 3D models from multiple images by estimating depth maps. However, reconstruction performance is hindered by visibility challenges such as occlusions and non-overlapping regions. In this paper, we propose a visibility-aware framework to address these issues. Central to our method is an Epipolar Line-based Transformer (ELT) module, which exploits the epipolar line correspondence and candidate matching features between images to enhance the feature representation and the robustness of the correlation. Furthermore, we propose a Supervised Visibility Estimation (SVE) module that estimates high-precision visibility maps, overcoming the limitations of previous methods that rely on indirect supervision. By integrating these modules, our method achieves state-of-the-art results on the benchmarks and demonstrates its capability to perform high-quality reconstructions even in challenging regions. The code will be released at https://github.com/npucvr/ETV-MVS.
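The epipolar line correspondence mentioned above rests on a standard geometric fact: a pixel x in one image constrains its match in the other image to the line l' = F x, where F is the fundamental matrix. The sketch below shows only that relation, not the ELT module itself; the function names and the example matrix (an idealized rectified pair with horizontal camera motion) are assumptions.

```python
import math


def epipolar_line(fundamental, point):
    """Return line coefficients (a, b, c) of l' = F x, with a*u' + b*v' + c = 0."""
    u, v = point
    x = (float(u), float(v), 1.0)            # homogeneous pixel coordinates
    return tuple(
        sum(fundamental[r][c] * x[c] for c in range(3)) for r in range(3)
    )


def point_line_distance(line, point):
    """Perpendicular distance in pixels from a point to an epipolar line."""
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / math.hypot(a, b)


if __name__ == "__main__":
    # Hypothetical fundamental matrix of a rectified pair: matches share a row.
    F = [[0.0, 0.0, 0.0],
         [0.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
    line = epipolar_line(F, (100, 50))
    print(line, point_line_distance(line, (30, 53)))
```

Restricting candidate matches to pixels near this line reduces the search from two dimensions to one, which is what makes line-based feature aggregation tractable.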
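Abstract: To predict the quality of stereoscopic images accurately, an objective quality assessment method for stereoscopic images based on edge and feature-point matching is proposed. First, image quality is evaluated: building on quality assessment based on structural similarity, and considering the importance of edge information to human visual characteristics, an edge structural similarity index is added. Then, the stereoscopic perception of the image is evaluated, with a stereoscopic perception index extracted by feature-point matching. Finally, the image quality and stereoscopic perception indices are fitted into a single comprehensive index according to the overall disparity distortion method. Experimental results show that, when the proposed method is used to evaluate a stereoscopic image test database, the PLCC (Pearson Linear Correlation Coefficient) of the overall evaluation is above 0.94 in all cases; compared with other methods, the proposed method has higher prediction accuracy.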
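For readers unfamiliar with the PLCC figure quoted above, the sketch below computes the Pearson Linear Correlation Coefficient between objective scores and subjective ratings. The function name and the sample scores are assumptions for illustration and are not data from the paper.

```python
import math


def plcc(predicted, subjective):
    """Pearson Linear Correlation Coefficient between two score lists."""
    n = len(predicted)
    if n != len(subjective) or n < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    covariance = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("PLCC is undefined when one list is constant")
    return covariance / math.sqrt(var_p * var_s)


if __name__ == "__main__":
    objective_scores = [0.91, 0.75, 0.62, 0.40]   # hypothetical metric outputs
    subjective_scores = [4.6, 3.9, 3.1, 2.2]      # hypothetical opinion scores
    print(plcc(objective_scores, subjective_scores))
```

A PLCC close to 1 means the objective metric tracks the subjective ratings almost linearly, which is the sense in which the reported values above 0.94 indicate high prediction accuracy.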
Funding: The National Natural Science Foundation of China (Nos. 51105332, 51275465), the Science and Technology Plan of Zhejiang Province (No. 2014C31096), and the Key Program of Zhejiang Provincial Natural Science Foundation of China (No. LZ16E050001).