Real-time and accurate drogue pose measurement during docking is basic and critical for Autonomous Aerial Refueling (AAR). Vision measurement is the most practicable technique, but its accuracy and robustness are easily affected by the limited computing power of airborne equipment, complex aerial scenes, and partial occlusion. To address these challenges, we propose a novel drogue keypoint detection and pose measurement algorithm based on monocular vision and realize real-time processing on airborne embedded devices. First, a lightweight network is designed with structural re-parameterization to reduce computational cost and improve inference speed, and a sub-pixel keypoint prediction head and loss functions are adopted to improve keypoint detection accuracy. Second, a closed-form solution of the drogue pose is computed from double spatial circles, followed by a nonlinear refinement based on Levenberg-Marquardt optimization. Both virtual and physical simulation experiments were used to test the proposed method. In the virtual simulation, the mean pixel error of the proposed method is 0.787 pixels, significantly superior to that of other methods. In the physical simulation, the mean relative measurement error is 0.788% and the mean processing time is 13.65 ms on embedded devices.
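The closed-form-then-refine pipeline described above can be sketched with a generic reprojection-error minimizer. This is a minimal illustration, assuming a distortion-free pinhole model and SciPy's Levenberg-Marquardt solver; the model points, function names, and intrinsics are hypothetical, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_3d, rvec, tvec, fx, fy, cx, cy):
    """Pinhole projection with a Rodrigues rotation (no distortion)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
    p = points_3d @ R.T + tvec
    return np.column_stack((fx * p[:, 0] / p[:, 2] + cx,
                            fy * p[:, 1] / p[:, 2] + cy))

def refine_pose(pose0, model_pts, image_pts, intrinsics):
    """Levenberg-Marquardt refinement of an initial (closed-form) pose.

    pose0 = (rx, ry, rz, tx, ty, tz); minimizes pixel reprojection error.
    """
    def residual(x):
        return (project(model_pts, x[:3], x[3:], *intrinsics) - image_pts).ravel()
    return least_squares(residual, pose0, method="lm").x
```

The closed-form double-circle solution would supply `pose0`; the refinement then polishes it against the detected keypoints.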
The autonomous landing guidance of fixed-wing aircraft in unknown structured scenes presents a substantial technological challenge, particularly regarding the effectiveness of solutions for monocular visual relative pose estimation. This study proposes a novel airborne monocular visual estimation method based on structured scene features to address this challenge. First, a multitask neural network model is established for segmentation, depth estimation, and slope estimation on monocular images, and a comprehensive three-dimensional information metric for monocular images is designed, encompassing length, span, flatness, and slope information. Subsequently, structured edge features are leveraged to filter candidate landing regions adaptively, and the three-dimensional information metric is used to identify the optimal landing region accurately and efficiently. Finally, sparse two-dimensional key points are used to parameterize the optimal landing region for the first time, and a high-precision relative pose estimation is achieved. Additional measurement information is introduced to provide autonomous landing guidance between the aircraft and the optimal landing region. Experimental results on both synthetic and real data demonstrate the effectiveness of the proposed method for monocular pose estimation in autonomous aircraft landing guidance in unknown structured scenes.
In the dynamic scenes encountered by autonomous vehicles, monocular depth estimation often suffers from inaccurate depth at object edges. To solve this problem, we propose an unsupervised monocular depth estimation model based on edge enhancement, aimed specifically at the depth perception challenge in dynamic scenes. The model consists of two core networks, a depth prediction network and a motion estimation network, both adopting an encoder-decoder architecture. The depth prediction network is based on a U-Net structure with a ResNet18 backbone and generates the depth map of the scene; the motion estimation network is based on a U-Net structure derived from FlowNet and focuses on motion estimation of dynamic targets. In the decoding stage of the motion estimation network, we introduce an edge-enhanced decoder that integrates a convolutional block attention module (CBAM) to strengthen recognition of the edge features of moving objects. In addition, we design a strip convolution module to improve the model's ability to capture discrete moving targets. To further improve performance, we propose a novel edge regularization method based on the Laplace operator, which effectively accelerates the convergence of the model. Experimental results on the KITTI and Cityscapes datasets show that, compared with current advanced dynamic unsupervised monocular models, the proposed model significantly improves depth estimation accuracy and convergence speed. Specifically, the root mean square error (RMSE) is reduced by 4.8% compared with the DepthMotion algorithm, while training convergence speed is increased by 36%, demonstrating the superior performance of the model on the depth estimation task in dynamic scenes.
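An edge regularization term built on the Laplace operator, as the abstract describes, can be sketched as a penalty on the curvature of the predicted depth map. Below is a minimal NumPy version under that assumption; the paper's exact loss may differ.

```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_regularizer(depth):
    """Mean absolute response of the discrete Laplace operator.

    Penalizes high-curvature (non-smooth) regions of a predicted depth
    map; a sketch of an edge regularization term, not the paper's loss.
    """
    kernel = np.array([[0.0, 1.0, 0.0],
                       [1.0, -4.0, 1.0],
                       [0.0, 1.0, 0.0]])
    lap = convolve(depth.astype(float), kernel, mode="nearest")
    return np.abs(lap).mean()
```

A perfectly flat depth map scores zero; sharp depth steps raise the penalty, which is what drives the network toward cleaner edges.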
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates dependence on ground-truth depth. However, prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios by dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
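The SE(3) parameterization mentioned in contribution (3) is commonly realized as the exponential map from a 6-vector twist to a rigid transform; the sketch below shows that standard construction, with the caveat that the paper's exact layout is not specified in the abstract.

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a twist xi = (w, v) to a 4x4 SE(3) matrix.

    w: rotation (axis-angle, rad), v: translation part of the twist.
    Standard closed form via Rodrigues; a generic sketch, not the
    paper's specific network parameterization.
    """
    w, v = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])
    if theta < 1e-12:
        R, V = np.eye(3), np.eye(3)
    else:
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta**2
        C = (1 - A) / theta**2
        R = np.eye(3) + A * W + B * W @ W          # Rodrigues rotation
        V = np.eye(3) + B * W + C * W @ W          # left Jacobian
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T
```

Predicting six numbers per pixel (or per object) and decoding them this way keeps the pose output minimal while guaranteeing a valid rigid transform.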
Depth maps play a crucial role in practical applications such as computer vision, augmented reality, and autonomous driving. Obtaining clear and accurate depth information in video depth estimation is a significant challenge in computer vision, and existing monocular video depth estimation models tend to produce blurred or inaccurate depth in regions with object edges and low texture. To address this issue, we propose a monocular depth estimation architecture guided by semantic segmentation masks, which introduces semantic information into the model to correct ambiguous depth regions. Experimental results show that our method improves the accuracy of edge depth, demonstrating the effectiveness of the approach.
Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps with off-the-shelf depth estimators and use them as an additional input to augment the RGB images. Depth-based methods either convert estimated depth maps to pseudo-LiDAR and apply LiDAR-based object detectors, or focus on image-depth fusion learning; however, they show limited performance and efficiency as a result of depth inaccuracy and complex convolution-based fusion. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) uses normalizing flows to build priors in depth maps and obtain more accurate depth information. We then develop a novel Swin-Transformer-based backbone with a fusion module that processes RGB image patches and depth map patches in two separate branches and fuses them with cross-attention to exchange information. Furthermore, using the pixel-wise relative depth values in the depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture more accurate ordering of input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. Experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of the proposed method and its superior performance over previous counterparts.
Visual sensors are used to measure the relative state of the chaser spacecraft with respect to the target spacecraft during close-range rendezvous phases. This article proposes a two-stage iterative algorithm based on an inverse projection ray approach to estimate relative position and attitude from feature points and monocular vision. It consists of two stages: absolute orientation and depth recovery. In the first stage, Umeyama's algorithm is used to align the three-dimensional (3D) model point set with the estimated 3D point set; in the second stage, the depths of the observed feature points are estimated. The procedure is repeated until the result converges. The effectiveness and convergence of the proposed algorithm are verified through theoretical analysis and mathematical simulation.
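The absolute-orientation stage can be illustrated with the SVD-based Umeyama/Kabsch alignment (here without scale); the depth-recovery stage that alternates with it is omitted in this sketch.

```python
import numpy as np

def umeyama(src, dst):
    """Rigid alignment (R, t) minimizing ||R @ src_i + t - dst_i||.

    src, dst: (N, 3) corresponding point sets. SVD of the cross-covariance
    gives the optimal rotation; the sign correction handles reflections.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflection
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In the two-stage loop, `dst` would be the 3D points reconstructed from the current depth estimates; the fitted pose then feeds the next depth-recovery pass.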
To decrease vehicle crashes, a new rear-view vehicle detection system based on monocular vision is designed. First, a small and flexible hardware platform based on a DM642 digital signal processor (DSP) micro-controller is built. Then, a two-step vehicle detection algorithm is proposed. In the first step, a fast vehicle edge and symmetry fusion algorithm is used with a low threshold so that the true-positive detection rate for possible vehicles is nearly 100%, at the cost of a high false-positive rate; that is, all possible vehicles are retained. In the second step, a probabilistic neural network (PNN) classifier based on multi-scale, multi-orientation Gabor features is trained to classify the candidates and eliminate the falsely detected vehicles generated in the first step. Experimental results demonstrate that the proposed system maintains a high detection rate and a low false detection rate under different road, weather, and lighting conditions.
Vehicle anti-collision is a hot topic in Intelligent Transport System research. Detection of preceding vehicles and measurement of the inter-vehicle distance are key techniques that contribute greatly to safe driving. This paper presents a method for detecting preceding vehicles and measuring the distance between the host vehicle and the vehicle ahead. First, an adaptive threshold method extracts the shadow feature, and a shadow-area merging approach handles distortion of the shadow border; a region of interest (ROI) is obtained from the shadow feature. Then, within the ROI, symmetry is analyzed to verify the presence of vehicles and to locate them. Finally, monocular distance measurement based on the camera's interior parameters and geometrical reasoning yields the distance to the preceding vehicle. Experimental results show that the proposed method detects the preceding vehicle effectively and measures the inter-vehicle distance accurately.
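The "geometrical reasoning" distance step is typically the flat-road model: the pixel row of the vehicle's road contact point, the camera height, and the pitch angle determine the range. The formula below is that standard model, assumed here rather than taken from the paper.

```python
import math

def ground_distance(v, cy, fy, cam_height, pitch):
    """Distance to a point on a flat road imaged at pixel row v.

    cy, fy: principal-point row and focal length (pixels);
    cam_height: camera height above the road (m); pitch: downward tilt (rad).
    The ray through row v meets the road at h / tan(pitch + atan((v-cy)/fy)).
    """
    ray_angle = pitch + math.atan((v - cy) / fy)
    if ray_angle <= 0:
        raise ValueError("ray at or above the horizon; no road intersection")
    return cam_height / math.tan(ray_angle)
```

Rows nearer the bottom of the image (larger `v`) map to shorter distances, which matches the intuition that close vehicles sit low in a rear-view frame.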
Drogue recognition and 3D locating is a key problem during the docking phase of autonomous aerial refueling (AAR). To solve this problem, a novel and effective method based on monocular vision is presented in this paper. First, by exploiting the drogue's red-ring-shape feature, a drogue detection and recognition algorithm is proposed to guarantee safety and ensure robustness to drogue diversity and changes in environmental conditions, without using a set of infrared light emitting diodes (LEDs) on the parachute part of the drogue. Second, considering camera lens distortion, a monocular vision measurement algorithm for drogue 3D locating is designed to ensure the accuracy and real-time performance of the system, with the drogue attitude also provided. Finally, experiments are conducted to demonstrate the effectiveness of the proposed method. Experimental results comparing the entire system with other methods validate that the proposed method can recognize and locate the drogue three-dimensionally, rapidly, and precisely.
Objective: To explore the changes of the lateral geniculate body and visual cortex in rats with monocular strabismic and form-deprivation amblyopia, as well as the plastic stage of visual development and visual plasticity in adult rats. Methods: A total of 60 SD rats aged 13 d were randomly divided into three groups (A, B, C) of 20 each. Group A was the normal control group without any processing; group B was the strabismic amblyopia group, in which unilateral extraocular rectus resection was used to establish the strabismic amblyopia model; group C was the monocular form-deprivation amblyopia group, using unilateral eyelid margin resection plus lid suture. At the early (P25), middle (P35), and late (P45) phases of visual development and in adulthood (P120), the lateral geniculate body and visual cortex area 17 of five rats in each group were extracted for C-fos immunocytochemistry. Morphological changes of neurons in the lateral geniculate body and visual cortex were observed, differences in light-stimulation-induced C-fos-positive neurons were measured in each group, and radiation development of the P120 amblyopic adult rats was observed. Results: In groups B and C, C-fos-positive cells were significantly fewer than in the control group at P25 (P<0.05); there was no statistical difference in C-fos-positive cells between groups B and C (P>0.05), while the level in group B was significantly lower than that in group A (P<0.05). The binocular C-fos-positive cell levels of groups B and C were significantly higher than those of the control group at P35, P45, and P120, with statistically significant differences (P<0.05). Conclusions: The increase of C-fos expression in lateral geniculate body and visual cortex neurons of adult amblyopic rats suggests that the visual cortex neurons retain a certain degree of visual plasticity.
The rotation matrix estimation problem is a key point for mobile robot localization, navigation, and control. Based on quaternion theory and epipolar geometry, an extended Kalman filter (EKF) algorithm is proposed to estimate the rotation matrix using a single-axis gyroscope and image point correspondences from a monocular camera. Experimental results show that the precision of the mobile robot's yaw angle estimated by the proposed EKF algorithm is much better than the results of the image-only and gyroscope-only methods, which demonstrates that our method is a preferable way to estimate rotation for autonomous mobile robot applications.
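The gyro-plus-vision fusion structure can be shown with a deliberately simplified one-state filter for yaw alone; the paper's quaternion EKF is richer, so treat the scalar state and noise values below as illustrative assumptions rather than the described algorithm.

```python
class YawEKF:
    """Minimal one-state Kalman filter fusing gyro rate with visual yaw.

    predict() integrates the gyroscope; update() corrects with a yaw
    measurement derived from image correspondences. Noise values q, r
    are illustrative, not tuned values from the paper.
    """
    def __init__(self, yaw0=0.0, p0=1.0, q=1e-4, r=1e-2):
        self.x, self.p, self.q, self.r = yaw0, p0, q, r

    def predict(self, gyro_rate, dt):
        self.x += gyro_rate * dt          # integrate angular rate
        self.p += self.q                  # process noise inflates covariance

    def update(self, yaw_meas):
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (yaw_meas - self.x) # pull estimate toward measurement
        self.p *= (1.0 - k)
```

The same predict/update skeleton carries over to the quaternion case, where the state becomes a 4-vector and the measurement model comes from the epipolar constraint.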
AIM: Many studies have demonstrated that the N-methyl-D-aspartate receptor 1 subunit (NMDAR1) is associated with amblyopia, and the effectiveness of levodopa in improving the visual function of children with amblyopia has also been proved, but the mechanism is undefined. Our study explored the possible mechanism. METHODS: Sixty 14-day-old healthy SD rats were randomly divided into 4 groups of 15 rats each: a normal group, a monocular deprivation (MD) group, a levodopa (LD) group, and a normal saline (NS) group. We sutured the unilateral eyelids of all rats except the normal group to establish the monocular deprivation model and raised them in normal sunlight until 45 days old. NMDAR1 was detected in the visual cortex with immunohistochemistry, Western blot, and real-time PCR. The LD and NS groups were gavaged with levodopa (40 mg/kg) and normal saline, respectively, for 28 days, after which NMDAR1 was detected with the same methods. RESULTS: NMDAR1 in the visual cortex of the MD group was lower than that of the normal group. NMDAR1 in the visual cortex of the LD group was higher than that of the NS group. CONCLUSION: NMDAR1 is associated with the plasticity of visual development. Levodopa may influence the expression of NMDAR1 and improve visual function, and its target may lie in the visual cortex.
A new visual measurement method is proposed to estimate the three-dimensional (3D) position of an object on the floor using a single camera. The camera, fixed on a robot, is inclined with respect to the floor. A measurement model involving the camera's extrinsic parameters, namely its height and pitch angle, is described. After the camera's intrinsic parameters are calibrated, a single image of a chessboard pattern placed on the floor is enough to calibrate the extrinsic parameters; the position of an object on the floor can then be computed with the measurement model. Furthermore, the height of an object can be calculated from paired points on a vertical line that share the same position on the floor. Compared with conventional methods that estimate positions only on the plane, this method obtains full 3D positions. Indoor experiments verify the accuracy and validity of the proposed method.
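The measurement model (camera height plus pitch angle) amounts to intersecting a back-projected pixel ray with the floor plane. A minimal sketch under the assumption of zero camera roll and yaw:

```python
import numpy as np

def floor_point(u, v, K, cam_height, pitch):
    """Back-project pixel (u, v) onto the floor plane.

    K: 3x3 intrinsic matrix; cam_height: camera height above the floor (m);
    pitch: downward tilt from horizontal (rad). Camera frame: x right,
    y down, z forward; roll and yaw are assumed zero for simplicity.
    Returns (lateral, forward) floor coordinates in metres.
    """
    ray_c = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray in camera frame
    c, s = np.cos(pitch), np.sin(pitch)
    R = np.array([[1.0, 0.0, 0.0],                     # tilt about the x-axis
                  [0.0, c, s],
                  [0.0, -s, c]])
    ray_w = R @ ray_c                                  # ray in a level frame
    if ray_w[1] <= 0:
        raise ValueError("ray does not hit the floor")
    lam = cam_height / ray_w[1]                        # drop exactly cam_height
    p = lam * ray_w
    return np.array([p[0], p[2]])
```

The object-height step in the abstract then follows by back-projecting the top point of a vertical feature and comparing the two rays at the same floor position.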
In a laser displacement sensor measurement system, the laser beam direction is an important parameter; in particular, the azimuth and pitch angles are the most important parameters of a laser beam. In this paper, a laser beam direction measurement method based on monocular vision is proposed. First, a charge-coupled device (CCD) camera is placed above the base plane, and its position is adjusted and fixed so that the optical axis is nearly perpendicular to the base plane. The monocular vision localization model is established using a circular-aperture calibration board. Then the laser beam generating device is placed and held at a fixed position on the base plane, and a special target block is placed on the base plane so that the laser beam projects onto the target and forms a laser spot. The CCD camera above the base plane acquires the laser spot and the target-block image clearly, so the two-dimensional (2D) image coordinates of the centroid of the laser spot can be extracted by a correlation algorithm. The target is moved in equal steps along the laser beam direction, and the spot and target images at each position are collected by the CCD camera. Using the relevant transformation formula combined with the intrinsic parameters of the target block, the 2D coordinates of the spot centroid are converted to three-dimensional (3D) coordinates in the base plane. As the target moves, the 3D coordinates of the spot centroid at different positions are obtained, and these 3D coordinates are fitted to a spatial straight line representing the laser beam to be measured. In the experiment, the target parameters are measured by high-precision instruments, and the camera is calibrated with a high-precision calibration board to establish the corresponding positioning model. The measurement accuracy is mainly determined by the monocular vision positioning accuracy and the centroid extraction accuracy. The experimental results show that the maximum error of the angle between laser beams reaches 0.04° and the maximum error of the beam pitch angle reaches 0.02°.
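Synthesizing the moving spot centroids into a spatial straight line is a 3D line fit; the SVD-based version below recovers the beam's azimuth and pitch, with the angle conventions (azimuth in the base plane, pitch measured from it) being assumptions of this sketch.

```python
import numpy as np

def fit_beam_direction(points):
    """Fit a 3D line to laser-spot centroids and return (azimuth, pitch).

    points: (N, 3) spot positions measured along the beam. The first
    right singular vector of the centered points is the best-fit line
    direction in the least-squares sense.
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    d = vt[0]
    if d[2] < 0:                              # resolve the sign ambiguity
        d = -d
    azimuth = np.arctan2(d[1], d[0])          # angle in the x-y base plane
    pitch = np.arcsin(d[2] / np.linalg.norm(d))  # elevation above the plane
    return azimuth, pitch
```

Because every moved target position contributes one 3D point, extra steps along the beam average down the per-point centroid noise in the fitted direction.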
Purpose: To assess the effects of monocular lid closure during the critical period on cortical activity. Method: Pattern visual evoked potentials (PVEP) of normal and monocular deprivation (MD) cats were dynamically measured, and the numbers of gamma-aminobutyric acid immunopositive (GABA-IP) neurones in area 17 of the visual cortex and in the lateral geniculate nucleus (LGN) were quantitatively compared using an immunohistochemical method (ABC). Result: The amplitude of N1-P1 attenuated in deprived eyes (DE): NE/DE at postnatal week (PNW) 7-8 (P<0.05), NE/DE at PNW 15-16 (P<0.01); while the P1 latency was delayed: NE/DE at PNW 7-8 (P>0.05), NE/DE at PNW 15-16 (P<0.05). The numbers of GABA-IP neurones in layer A1 of the ipsilateral LGN and in layer A of the contralateral LGN, compared with those in the corresponding normal laminae, were not significantly different at PNW 7-8 and PNW 11-12 (P>0.05), while in the same cats a reduction in the number of GABA-IP neurones was found in layer IV of area 17 at PNW
A hierarchical mobile robot simultaneous localization and mapping (SLAM) method that obtains accurate maps is presented. The local level consists of a set of local metric feature maps that are guaranteed to be statistically independent. The global level is a topological graph whose arcs are labeled with the relative locations between local maps. An estimate of these relative locations is maintained with a local map alignment algorithm, and a more accurate estimate is calculated through a global minimization procedure using loop closure constraints. Each local map is built with a Rao-Blackwellised particle filter (RBPF), where the particle filter extends the path posterior by sampling new poses, and landmark position estimation and update are implemented through an extended Kalman filter (EKF). A monocular camera mounted on the robot tracks 3D natural point landmarks, which are structured from matched scale invariant feature transform (SIFT) feature pairs. The matching of high-dimensional SIFT features is implemented with a KD-tree at a time cost of O(lb N). Experimental results with a Pioneer mobile robot in a real indoor environment show the superior performance of the proposed method.
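KD-tree matching of high-dimensional SIFT descriptors, as used above, can be sketched with SciPy's cKDTree plus Lowe's ratio test; the 0.8 threshold is the conventional value, assumed here rather than taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    desc_a, desc_b: (N, 128) SIFT descriptor arrays. A KD-tree over
    desc_b gives O(log N) average query time per descriptor; a match is
    kept only when the best distance is clearly below the second best.
    """
    tree = cKDTree(desc_b)
    dist, idx = tree.query(desc_a, k=2)       # two nearest neighbours each
    matches = []
    for i, (d, j) in enumerate(zip(dist, idx)):
        if d[0] < ratio * d[1]:               # unambiguous match only
            matches.append((i, j[0]))
    return matches
```

The ratio test is what makes the matching robust for landmark tracking: repetitive texture produces two similar distances and is simply rejected.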
A system for mobile robot localization and navigation is presented. With the proposed system, the robot can be located and navigated from a single landmark in a single image, and the navigation mode may be track-following, teaching and playback, or programming. The basic idea is that the system computes the differences between the expected and recognized positions at each time step and then steers the robot in a direction that reduces those differences. To minimize the robot's sensor equipment, only one omnidirectional camera is used. Experiments in disturbed environments show that the presented algorithm is robust and easy to implement, without camera rectification. The root-mean-square error (RMSE) of localization is 1.4 cm, and the navigation error in teaching and playback is within 10 cm.
As the number of failed satellites and the amount of space debris increase year by year, relying only on ground surveillance and early warning would take considerable manpower and resources. An effective alternative is autonomous relative navigation for long-range non-cooperative targets. For such targets, the stereo cameras or lidars that are commonly used are not applicable. This paper studies a relative navigation method for long-range relative motion estimation of non-cooperative targets using only a monocular camera. First, the paper provides the nonlinear relative orbit dynamics equations and derives their discrete recursive form. An EKF is then designed to implement the relative navigation estimation. After that, the "local weak observability" theory for nonlinear systems is used to analyze the observability of monocular image sequences. The analysis shows that monocular image sequences alone make relative navigation for long-range non-cooperative targets possible. Finally, numerical simulations show that the method can achieve a complete estimate of the relative motion of long-range non-cooperative targets without orbital maneuvers.
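The discrete recursive form of relative orbit dynamics has a well-known closed-form state transition when linearized about a circular chief orbit (Clohessy-Wiltshire). The paper works with the nonlinear equations, so the linearized propagator below is only an illustrative stand-in for the EKF's prediction step.

```python
import numpy as np

def cw_stm(n, t):
    """Clohessy-Wiltshire state transition matrix over time t.

    State (x, y, z, vx, vy, vz): x radial, y along-track, z cross-track;
    n is the chief's mean motion (rad/s). Valid for relative motion
    linearized about a circular orbit, an assumption of this sketch.
    """
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3 * c,       0, 0,  s / n,           2 * (1 - c) / n,     0],
        [6 * (s - n * t), 1, 0,  2 * (c - 1) / n, (4 * s - 3 * n * t) / n, 0],
        [0,               0, c,  0,               0,                   s / n],
        [3 * n * s,       0, 0,  c,               2 * s,               0],
        [6 * n * (c - 1), 0, 0, -2 * s,           4 * c - 3,           0],
        [0,               0, -n * s, 0,           0,                   c],
    ])
```

An EKF prediction would then be `x_next = cw_stm(n, dt) @ x` with the covariance propagated through the same matrix.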
To meet the requirements of automatic pearl sorting, pearl contour feature extraction and shape recognition algorithms are studied in this paper for rapid online identification of pearl shape, and a monocular dynamic machine-vision-based pearl shape detection device is designed. By blowing air, a pearl is suspended in a funnel-shaped container and flipped rapidly inside the device, so the entire surface of the pearl can be promptly captured by a camera placed directly above the funnel. Illumination experiments conducted from different angles indicate that image contours acquired under medium-angle illumination are extracted best. The pearl shape test shows that the method, combined with the inflatable suspension device, classifies pearls into seven types according to the national standard, with an average error rate below 5.38%. The shape characteristics of a pearl can be detected promptly and reliably, satisfying the needs of high-speed automatic sorting.
Funding: Supported by the National Science Fund for Distinguished Young Scholars, China (No. 51625501), the Aeronautical Science Foundation of China (No. 20240046051002), and the National Natural Science Foundation of China (No. 52005028).
Funding: Co-supported by the Science and Technology Innovation Program of Hunan Province, China (No. 2023RC3023) and the National Natural Science Foundation of China (No. 12272404).
Funding: Funded by the Yangtze River Delta Science and Technology Innovation Community Joint Research Project (2023CSJGG1600), the Natural Science Foundation of Anhui Province (2208085MF173), and the Wuhu "ChiZhu Light" Major Science and Technology Project (2023ZD01, 2023ZD03).
Abstract: In the dynamic scenes encountered by autonomous vehicles, monocular depth estimation often suffers from inaccurate edge depth. To solve this problem, we propose an unsupervised monocular depth estimation model based on edge enhancement, aimed specifically at the depth perception challenges of dynamic scenes. The model consists of two core networks, a depth prediction network and a motion estimation network, both adopting an encoder-decoder architecture. The depth prediction network is based on a ResNet18 U-Net structure and generates the depth map of the scene; the motion estimation network is based on a FlowNet U-Net structure and focuses on motion estimation for dynamic targets. In the decoding stage of the motion estimation network, we introduce an edge-enhanced decoder that integrates a convolutional block attention module (CBAM) to strengthen recognition of the edge features of moving objects. In addition, we design a strip convolution module to improve the model's ability to capture discrete moving targets. To further improve performance, we propose a novel edge regularization method based on the Laplace operator, which effectively accelerates model convergence. Experimental results on the KITTI and Cityscapes datasets show that, compared with current advanced dynamic unsupervised monocular models, the proposed model significantly improves depth estimation accuracy and convergence speed. Specifically, the root mean square error (RMSE) is reduced by 4.8% compared with the DepthMotion algorithm, while training convergence speed is increased by 36%, demonstrating the model's superior performance on depth estimation in dynamic scenes.
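One minimal reading of a Laplace-operator edge regularizer is a penalty on the Laplacian response of the predicted depth map. The sketch below is our interpretation, not the paper's loss (the kernel choice and reduction are assumptions); it is zero on constant or linearly ramping depth and grows at depth discontinuities:

```python
import numpy as np

def laplacian_edge_loss(depth):
    """Mean absolute response of the 5-point discrete Laplacian kernel."""
    k = np.array([[0.0, 1.0, 0.0],
                  [1.0, -4.0, 1.0],
                  [0.0, 1.0, 0.0]])
    H, W = depth.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            # Correlate the kernel with each interior 3x3 neighborhood.
            out[i, j] = np.sum(depth[i:i + 3, j:j + 3] * k)
    return np.abs(out).mean()
```

Because the Laplacian of any affine surface is zero, the penalty leaves smooth planar regions untouched and concentrates on edges.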
基金supported in part by the National Natural Science Foundation of China under Grants 62071345。
Abstract: Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates the dependence on ground-truth depth. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios by dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities ("holes"), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
Abstract: Depth maps play a crucial role in practical applications such as computer vision, augmented reality, and autonomous driving. Obtaining clear and accurate depth information in video depth estimation is a significant challenge in computer vision: existing monocular video depth estimation models tend to produce blurred or inaccurate depth in regions with object edges and low texture. To address this issue, we propose a monocular depth estimation architecture guided by semantic segmentation masks, which introduces semantic information into the model to correct ambiguous depth regions. Experimental results show that our method improves the accuracy of edge depth, demonstrating its effectiveness.
基金supported in part by the Major Project for New Generation of AI (2018AAA0100400)the National Natural Science Foundation of China (61836014,U21B2042,62072457,62006231)the InnoHK Program。
Abstract: Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps from off-the-shelf depth estimators and use them as an additional input to augment the RGB images. Depth-based methods attempt to convert estimated depth maps to pseudo-LiDAR and then apply LiDAR-based object detectors, or focus on image-depth fusion learning. However, they show limited performance and efficiency as a result of depth inaccuracy and complex convolutional fusion modes. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) uses normalizing flows to build priors in depth maps and obtain more accurate depth information. We then develop a novel Swin-Transformer-based backbone with a fusion module that processes RGB image patches and depth map patches in two separate branches and fuses them with cross-attention to exchange information. Furthermore, with the help of pixel-wise relative depth values in depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. Experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our method and its superior performance over previous counterparts.
基金Program for Changjiang Scholars and Innovative Research Team in University (IRT0520)Ph.D.Programs Foundation of Ministry of Education of China (20070213055)
Abstract: Visual sensors are used to measure the relative state of the chaser spacecraft to the target spacecraft during close-range rendezvous phases. This article proposes a two-stage iterative algorithm based on an inverse projection ray approach to estimate relative position and attitude from feature points and monocular vision. It consists of two stages: absolute orientation and depth recovery. In the first stage, Umeyama's algorithm is used to fit the three-dimensional (3D) model set and estimate the 3D point set, while in the second stage the depths of the observed feature points are estimated. This procedure is repeated until the result converges. The effectiveness and convergence of the proposed algorithm are verified through theoretical analysis and mathematical simulation.
基金The National Key Technology R&D Program of China during the 11th Five-Year Plan Period(2009BAG13A04)Jiangsu Transportation Science Research Program(No.08X09)Program of Suzhou Science and Technology(No.SG201076)
Abstract: To reduce vehicle crashes, a new rear-view vehicle detection system based on monocular vision is designed. First, a small and flexible hardware platform based on a DM642 digital signal processor (DSP) micro-controller is built. Then a two-step vehicle detection algorithm is proposed. In the first step, a fast vehicle edge and symmetry fusion algorithm is applied with a low threshold, so that possible vehicles are detected at a nearly 100% true positive rate (TP) while non-vehicles incur a high false positive rate (FP); i.e., all possible vehicles are retained. In the second step, a classifier using a probabilistic neural network (PNN), based on multi-scale and orientation Gabor features, is trained to classify the candidate vehicles generated in the first step and eliminate the false detections. Experimental results demonstrate that the proposed system maintains a high detection rate and a low false detection rate under different road, weather, and lighting conditions.
基金Key Projects in the Tianjin Science & Technology Pillay Program
Abstract: Vehicle anti-collision technology is a hot topic in Intelligent Transport System research. Research on preceding-vehicle detection and distance measurement, the key techniques, contributes greatly to safe driving. This paper presents a method to detect preceding vehicles and obtain the distance between the ego vehicle and the vehicle ahead. First, an adaptive threshold method is used to extract the shadow feature, and a shadow-area merging approach handles distortion of the shadow border. A region of interest (ROI) is obtained from the shadow feature. Then, within the ROI, the symmetry feature is analyzed to verify whether vehicles are present and to locate them. Finally, using monocular distance measurement based on the camera's intrinsic parameters and geometrical reasoning, the distance to the preceding vehicle is obtained. Experimental results show that the proposed method detects the preceding vehicle effectively and measures the inter-vehicle distance accurately.
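The geometrical reasoning behind such monocular distance measurement is usually the flat-road pinhole model: the image row of the vehicle's shadow contact point fixes the angle of the viewing ray, and the camera height scales it to a distance. The sketch below is a textbook version under that assumption (the parameter names are ours, not the paper's):

```python
import math

def ground_distance(y_px, cam_height, focal_px, cy, pitch_rad=0.0):
    """Distance along the road to a ground-plane point seen at image row y_px.

    The ray through row y_px makes angle arctan((y_px - cy) / focal_px)
    with the optical axis; adding the camera pitch and intersecting with
    the flat ground at depth cam_height / tan(total angle).
    """
    ray = math.atan2(y_px - cy, focal_px) + pitch_rad
    if ray <= 0:
        raise ValueError("ray does not intersect the ground ahead")
    return cam_height / math.tan(ray)
```

For example, with a camera 1.2 m above the road, an 800 px focal length, and the shadow contact point 60 rows below the principal point, the model gives a 16 m range; accuracy degrades whenever the flat-road assumption is violated (slopes, pitching of the ego vehicle).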
基金supported by the National Natural Science Foundation of China(Nos.61473307,61304120)
Abstract: Drogue recognition and 3D locating is a key problem during the docking phase of autonomous aerial refueling (AAR). To solve this problem, a novel and effective method based on monocular vision is presented. First, by exploiting the drogue's red-ring-shape feature, a detection and recognition algorithm is proposed that guarantees safety and remains robust to drogue diversity and changes in environmental conditions, without requiring a set of infrared light-emitting diodes (LEDs) on the parachute part of the drogue. Second, accounting for camera lens distortion, a monocular vision measurement algorithm for drogue 3D locating is designed to ensure the accuracy and real-time performance of the system, with the drogue attitude also provided. Finally, experiments demonstrate the effectiveness of the proposed method. Experimental results compare the performance of the entire system with other methods, validating that the proposed method can recognize and locate the drogue three-dimensionally, rapidly, and precisely.
Abstract: Objective: To explore the changes in the lateral geniculate body and visual cortex in monocular strabismic and form-deprived amblyopic rats, as well as the plastic stage of visual development and visual plasticity in adult rats. Methods: A total of 60 SD rats aged 13 d were randomly divided into three groups (A, B, C) of 20 each. Group A served as the normal control group without any processing; group B was the strabismus amblyopia group, in which unilateral extraocular rectus resection was used to establish the strabismus amblyopia model; group C was the monocular form-deprivation amblyopia group, established by unilateral eyelid-margin resection plus lid suture. At the early (P25), middle (P35), and late (P45) phases of visual development and in adulthood (P120), the lateral geniculate body and visual cortex area 17 of five rats in each group were extracted for c-fos immunocytochemistry. Morphological changes of neurons in the lateral geniculate body and visual cortex were observed, the differences in c-fos-positive neurons induced by light stimulation were measured in each group, and radiation development in P120 amblyopic adult rats was observed. Results: In groups B and C, c-fos-positive cells were significantly fewer than in the control group at P25 (P<0.05); there was no statistical difference in c-fos-positive cells between group B and group A (P>0.05), while the level in group C was significantly lower than in group A (P<0.05). The binocular c-fos-positive cell levels of groups B and C were significantly higher than those of the control group at P35, P45, and P120, with statistically significant differences (P<0.05). Conclusions: The increase of c-fos expression in geniculate body and visual cortex neurons of adult amblyopic rats suggests that visual cortex neurons retain a certain degree of visual plasticity.
基金supported by National Natural Science Foundation of China (Nos. 60874010 and 61070048)Innovation Program of Shanghai Municipal Education Commission (No. 11ZZ37)+1 种基金Fundamental Research Funds for the Central Universities (No. 009QJ12)Collaborative Construction Project of Beijing Municipal Commission of Education
Abstract: The rotation matrix estimation problem is a key point for mobile robot localization, navigation, and control. Based on quaternion theory and epipolar geometry, an extended Kalman filter (EKF) algorithm is proposed to estimate the rotation matrix using a single-axis gyroscope and point correspondences from a monocular camera. Experimental results show that the precision of the mobile robot's yaw angle estimated by the proposed EKF algorithm is much better than the results of the image-only and gyroscope-only methods, demonstrating that our method is a preferable way to estimate rotation for autonomous mobile robot applications.
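The gyroscope side of such a quaternion-based filter typically propagates attitude with the quaternion kinematic equation. The first-order sketch below shows only that prediction step (the vision measurement update and filter covariances from the abstract are omitted, and the [w, x, y, z] convention is our assumption):

```python
import numpy as np

def gyro_propagate(q, omega, dt):
    """Propagate a unit quaternion [w, x, y, z] by body rate omega (rad/s)
    over a time step dt, using q_dot = 0.5 * Omega(omega) * q."""
    wx, wy, wz = omega
    Omega = 0.5 * np.array([
        [0.0, -wx, -wy, -wz],
        [wx,  0.0,  wz, -wy],
        [wy, -wz,  0.0,  wx],
        [wz,  wy, -wx,  0.0],
    ])
    q = q + dt * Omega @ q
    return q / np.linalg.norm(q)   # renormalize to stay on the unit sphere
```

Integrating a constant yaw rate of pi/2 rad/s for one second in 1 ms steps recovers the 90-degree z-rotation quaternion to high accuracy, which is why this simple first-order step plus renormalization is a common EKF prediction model.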
Abstract: AIM: Many studies have demonstrated that N-methyl-D-aspartate receptor subunit 1 (NMDAR1) is associated with amblyopia. The effectiveness of levodopa in improving the visual function of children with amblyopia has also been proved, but the mechanism is undefined. Our study explored the possible mechanism. METHODS: Sixty 14-day-old healthy SD rats were randomly divided into 4 groups of 15 each: a normal group, a monocular deprivation (MD) group, a levodopa (LD) group, and a normal saline (NS) group. We sutured the unilateral eyelids of all rats except the normal group to establish the monocular deprivation model and raised them in normal sunlight until 45 days old. NMDAR1 was detected in the visual cortex with immunohistochemistry, Western blot, and real-time PCR. The LD and NS groups were gavaged with levodopa (40 mg/kg) and normal saline, respectively, for 28 days, and NMDAR1 was detected again with the methods above. RESULTS: NMDAR1 in the visual cortex of the MD group was lower than that of the normal group. NMDAR1 in the visual cortex of the LD group was higher than that of the NS group. CONCLUSION: NMDAR1 is associated with the plasticity of visual development. Levodopa may influence the expression of NMDAR1 and improve visual function, and its target may lie in the visual cortex.
基金supported by National Natural Science Foundation of China(Nos.61273352 and 61473295)National High Technology Research and Development Program of China(863 Program)(No.2015AA042307)Beijing Natural Science Foundation(No.4161002)
Abstract: A new visual measurement method is proposed to estimate the three-dimensional (3D) position of an object on the floor using a single camera. The camera, fixed on a robot, is inclined with respect to the floor. A measurement model based on the camera's extrinsic parameters, namely its height and pitch angle, is described. A single image of a chessboard pattern placed on the floor is enough to calibrate the extrinsic parameters once the intrinsic parameters are calibrated. The position of an object on the floor can then be computed with the measurement model. Furthermore, the height of an object can be calculated from paired points on a vertical line sharing the same floor position. Compared to conventional methods that estimate only positions in the plane, this method obtains full 3D positions. Indoor experiments verify the accuracy and validity of the proposed method.
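A height-and-pitch measurement model of this kind amounts to back-projecting a pixel ray and intersecting it with the floor plane. The sketch below is one common parameterization under that assumption (axis conventions and names are ours; the paper's exact model may differ):

```python
import math

def pixel_to_floor(u, v, f, cx, cy, h, pitch):
    """Back-project pixel (u, v) to floor coordinates (lateral X, forward Z)
    for a camera at height h, pitched down by `pitch` radians.
    Camera frame: z forward, x right, y down; image y grows downward."""
    x = (u - cx) / f
    y = (v - cy) / f
    # Rotate the pixel ray (x, y, 1) by the pitch into a level frame
    # whose Y axis points straight down at the floor.
    denom = y * math.cos(pitch) + math.sin(pitch)
    if denom <= 0:
        raise ValueError("pixel ray does not intersect the floor")
    t = h / denom                                     # ray scale at the floor
    Z = t * (math.cos(pitch) - y * math.sin(pitch))   # forward distance
    X = t * x                                         # lateral offset
    return X, Z
```

As a sanity check, a pixel at the principal point lands on the floor at forward distance h / tan(pitch), which matches the usual optical-axis intersection formula.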
基金National Science and Technology Major Project of China(No.2016ZX04003001)Tianjin Research Program of Application Foundation and Advanced Technology(No.14JCZDJC39700)
Abstract: In laser displacement sensor measurement systems, the laser beam direction is an important parameter; in particular, the azimuth and pitch angles are the most important parameters of a laser beam. In this paper, a laser beam direction measurement method based on monocular vision is proposed. First, a charge-coupled device (CCD) camera is placed above the base plane, and its position is adjusted and fixed so that the optical axis is nearly perpendicular to the base plane. The monocular vision localization model is established using a circular-aperture calibration board. The laser beam generating device is then placed and held at a fixed position on the base plane, and a special target block is placed on the base plane so that the laser beam projects onto it and forms a laser spot. The CCD camera above the base plane acquires clear images of the laser spot and the target block, so the two-dimensional (2D) image coordinates of the spot centroid can be extracted by a correlation algorithm. The target is moved at equal distances along the laser beam direction, and the spot and target images at each position are collected by the CCD camera. Using the relevant transformation formula combined with the intrinsic parameters of the target block, the 2D coordinates of the spot's center of gravity are converted to three-dimensional (3D) coordinates in the base plane. As the target moves, the 3D coordinates of the spot's center of gravity at different positions are obtained, and these 3D coordinates are fitted to a spatial straight line representing the laser beam to be measured. In the experiment, the target parameters are measured by high-precision instruments, and the camera is calibrated with a high-precision calibration board to establish the corresponding localization model. The measurement accuracy is mainly determined by the monocular vision positioning accuracy and the center-of-gravity extraction accuracy. The experimental results show that the maximum error of the angle between laser beams reaches 0.04° and the maximum error of the beam pitch angle reaches 0.02°.
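Fitting a spatial straight line to the collected 3D spot centroids is a standard total-least-squares problem: the line passes through the centroid of the points, and its direction is the first principal component. A minimal sketch (the paper's exact fitting procedure is not specified, so this SVD-based fit is an assumption):

```python
import numpy as np

def fit_line_direction(points):
    """Best-fit 3D line through a point cloud: returns (point on line,
    unit direction). Direction is the first right singular vector of the
    centered points, i.e. the axis of maximum variance."""
    P = np.asarray(points, dtype=float)
    c = P.mean(axis=0)                 # the line passes through the centroid
    _, _, Vt = np.linalg.svd(P - c)
    d = Vt[0]
    return c, d / np.linalg.norm(d)
```

The azimuth and pitch of the beam then follow directly from the components of the unit direction vector, e.g. pitch = arcsin(d_z).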
Abstract: Purpose: To assess the effects of monocular lid closure during the critical period on cortical activity. Methods: Pattern visual evoked potentials (PVEP) of normal and monocularly deprived (MD) cats were dynamically measured, and the number of gamma-aminobutyric acid immunopositive (GABA-IP) neurones in area 17 of the visual cortex and the lateral geniculate nucleus (LGN) was quantitatively compared using an immunohistochemical method (ABC). Results: The amplitude of N1-P1 was attenuated in deprived eyes (DE): NE/DE at postnatal week (PNW) 7-8 (P<0.05), NE/DE at PNW 15-16 (P<0.01); while P1 latency was delayed: NE/DE at PNW 7-8 (P>0.05), NE/DE at PNW 15-16 (P<0.05). The numbers of GABA-IP neurones in layer A1 of the ipsilateral LGN and in layer A of the contralateral LGN, compared with those in the corresponding normal laminae, were not significantly different at PNW 7-8 and PNW 11-12 (P>0.05), while in the same cats a reduction in the number of GABA-IP neurones was found in layer IV of area 17 at PNW
基金The National High Technology Research and Development Program (863) of China (No2006AA04Z259)The National Natural Sci-ence Foundation of China (No60643005)
Abstract: A hierarchical mobile robot simultaneous localization and mapping (SLAM) method that obtains accurate maps is presented. The local map level is composed of a set of local metric feature maps that are guaranteed to be statistically independent. The global level is a topological graph whose arcs are labeled with the relative locations between local maps. An estimate of these relative locations is maintained with a local map alignment algorithm, and a more accurate estimate is calculated through a global minimization procedure using the loop-closure constraint. The local map is built with a Rao-Blackwellised particle filter (RBPF), where the particle filter extends the path posterior by sampling new poses. Landmark position estimation and update are implemented through an extended Kalman filter (EKF). A monocular camera mounted on the robot tracks 3D natural point landmarks, which are structured from matched scale-invariant feature transform (SIFT) feature pairs. Matching of the multi-dimensional SIFT features is implemented with a KD-tree at a time cost of O(lb N). Experimental results on a Pioneer mobile robot in a real indoor environment show the superior performance of the proposed method.
基金Supported by National Natural Science Foundation of China (No. 31000422 and No. 61201081)Tianjin Municipal Education Commission(No.20110829)Tianjin Science and Technology Committee(No. 10JCZDJC22800)
Abstract: A system for mobile robot localization and navigation is presented. With the proposed system, the robot can be localized and navigated using a single landmark in a single image, and the navigation mode may be track-following, teaching and playback, or programming. The basic idea is that the system computes the differences between the expected and the recognized position at each time step and then steers the robot in a direction that reduces those differences. To minimize the robot's sensor equipment, only one omnidirectional camera is used. Experiments in environments with disturbances show that the presented algorithm is robust and easy to implement, without camera rectification. The root-mean-square error (RMSE) of localization is 1.4 cm, and the navigation error in teaching and playback is within 10 cm.
Abstract: With the number of failed satellites and pieces of space debris increasing year by year, relying only on ground surveillance and early warning would take considerable manpower and resources. An effective alternative is autonomous long-range relative navigation for non-cooperative targets. For long-range non-cooperative targets, the stereo cameras or lidars that are commonly used are not applicable. This paper studies a relative navigation method for long-range relative motion estimation of non-cooperative targets using only a monocular camera. First, the paper provides the nonlinear relative orbit dynamics equations and derives their discrete recursive form. An EKF filter is then designed to implement the relative navigation estimation. After that, the relative "locally weak observability" theory for nonlinear systems is used to analyze the observability of monocular image sequences. The analysis shows that relying only on monocular image sequences makes relative navigation for long-range non-cooperative targets possible. Finally, numerical simulations show that the method can achieve a complete estimate of the relative motion of long-range non-cooperative targets without orbital maneuvers.
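The abstract's nonlinear relative orbit dynamics are not reproduced there, but their standard linearization about a circular reference orbit, the Clohessy-Wiltshire (Hill) equations, has a closed-form state transition matrix that is a common stand-in for the discrete recursive model inside such an EKF. A sketch under that assumption:

```python
import numpy as np

def cw_propagate(state, n, t):
    """Closed-form Clohessy-Wiltshire propagation of the relative state
    [x, y, z, vx, vy, vz] (x radial, y along-track, z cross-track) for a
    circular reference orbit with mean motion n over time t."""
    s, c = np.sin(n * t), np.cos(n * t)
    Phi = np.array([
        [4 - 3 * c,       0, 0,      s / n,           2 * (1 - c) / n,         0],
        [6 * (s - n * t), 1, 0,      2 * (c - 1) / n, (4 * s - 3 * n * t) / n, 0],
        [0,               0, c,      0,               0,                       s / n],
        [3 * n * s,       0, 0,      c,               2 * s,                   0],
        [6 * n * (c - 1), 0, 0,     -2 * s,           4 * c - 3,               0],
        [0,               0, -n * s, 0,               0,                       c],
    ])
    return Phi @ np.asarray(state, dtype=float)
```

Two familiar properties make quick sanity checks: a pure along-track offset is an equilibrium of the linearized dynamics, and the cross-track channel is an uncoupled harmonic oscillator at the orbital rate.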
基金the Foundation of Zhejiang Key Level1 Discipline of Forestry Engineering within the Research Project(No.2014lygcz018)the Public Welfare Project of Zhejiang Science and Technology Department(No.2012C32021)+1 种基金the Preresearch Project of the Research Center for Smart Agriculture and Forestry,Zhejiang Agricultural and Forestry University(No.2013ZHNL02)the Scientific Research Foundation of Zhejiang Agricultural and Forestry University(No.2012FR070)
Abstract: To meet the requirement of automatic pearl sorting, pearl contour feature extraction and shape recognition algorithms are studied in this paper for rapid online identification of pearl shape, and a pearl shape detection device based on monocular dynamic machine vision is designed. By blowing air, the pearl is suspended in a funnel-shaped container and flipped rapidly in the device, so the entire surface of the pearl under test can be promptly captured by the camera placed directly above the funnel. Illumination experiments conducted from different angles indicate that the image contour acquired under medium-angle illumination is extracted best. The pearl shape test shows that the method, combined with the inflatable suspension device, classifies pearls into seven types according to the national standard, with an average error rate kept below 5.38%. The shape characteristic of the pearl can be detected promptly and reliably, satisfying the needs of high-speed automatic sorting.