Effective and robust recognition and tracking of objects are key problems in visual surveillance systems. Most existing object recognition methods were designed with particular objects in mind. This study presents a general moving object recognition method using global features of targets. Targets are extracted with an adaptive Gaussian mixture model, and their silhouette images are captured and unified. A new object silhouette database is built to provide abundant samples for training the subspace feature; it offers more representative samples than previous databases. A more effective dimension reduction method based on graph embedding is used to obtain the projection eigenvector. Our experiments show that the method addresses the moving object recognition problem effectively and outperforms previous methods.
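The adaptive background model above can be illustrated with a simplified sketch. The paper uses a full Gaussian *mixture* per pixel; the single-Gaussian running-statistics version below captures the adaptation idea only, and all parameter values (learning rate, threshold, initial variance) are illustrative assumptions, not the paper's settings.

```python
import math

class AdaptiveBackgroundPixel:
    """Single-Gaussian per-pixel background model (simplified sketch)."""

    def __init__(self, init_value, alpha=0.05, k=2.5):
        self.mean = float(init_value)  # running background mean
        self.var = 15.0 ** 2           # running variance (assumed init)
        self.alpha = alpha             # learning rate
        self.k = k                     # foreground threshold in sigmas

    def update(self, x):
        """Classify intensity x as foreground (True) or background, then adapt."""
        is_fg = abs(x - self.mean) > self.k * math.sqrt(self.var)
        if not is_fg:
            # Only adapt the model to observations judged to be background.
            d = x - self.mean
            self.mean += self.alpha * d
            self.var += self.alpha * (d * d - self.var)
        return is_fg
```

A real mixture model would keep several such Gaussians per pixel, weighted by how often each explains the observed intensity.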
The noise present in depth images obtained with RGB-D sensors stems from hardware limitations as well as environmental factors; the limited capabilities of these sensors also degrade downstream computer vision results. Common image denoising techniques, being based on spatial- and frequency-domain filtering, tend to remove significant image detail along with the noise. The framework presented in this paper is a novel denoising model that combines Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, maximizing spatial structure integrity and reducing redundancy. An LSTMAE then processes these selected features, modeling depth pixel sequences to generate robust, noise-resistant representations: the encoder compresses the input into a latent space, which the decoder expands to recover the clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, 10 dB higher than conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models. Moreover, the feature selection step decreases input dimensionality by 40%, yielding a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers an efficient and scalable system for depth image denoising, with strong potential for close-range 3D applications such as robotic manipulation and gesture-based interfaces.
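The PSNR figures quoted above follow the standard definition, which can be sketched directly. This helper is ours for illustration (flat pixel lists, 8-bit dynamic range assumed), not code from the paper.

```python
import math

def psnr(clean, denoised, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized images,
    given as flat lists of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher is better; a 10 dB gap corresponds to a tenfold reduction in mean squared error.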
Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Object recognition and localization have long been research hotspots in machine vision, of great value to the development of service robots, industrial automation, unmanned driving, and other fields. To realize real-time recognition and localization of indoor scene objects, this article proposes an improved YOLOv3 neural network model that combines densely connected networks and residual networks to construct a new YOLOv3 backbone, applied to the detection and recognition of objects in indoor scenes. A RealSense D415 RGB-D camera is used to obtain the RGB map and depth map, and the actual distance value is calculated after each pixel in the scene image is mapped to the real scene. Experimental results show that the detection and recognition accuracy and real-time performance of the new network are clearly improved over the previous YOLOv3 model in the same scene. The improved network also detects objects that the unmodified YOLOv3 network misses, and the running time of object detection and recognition is reduced to less than half of the original. The improved network has reference value for practical engineering applications.
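Mapping a depth pixel to a real-scene distance, as described above, is the standard pinhole back-projection. The intrinsic values in the test are round illustrative numbers for a RealSense-class camera, not calibrated figures from the paper.

```python
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera-frame XYZ
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m
```

The Euclidean norm of the returned vector then gives the actual distance from the camera to the object point.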
With the rapid development of flexible electronics, tactile systems for object recognition are becoming increasingly delicate. This paper presents the design of a tactile glove for object recognition, integrating 243 palm pressure units and 126 finger-joint strain units implemented with piezoresistive Velostat film. Palm pressure and joint bending strain data from the glove were collected using a two-dimensional resistance-array scanning circuit and converted into tactile images with a resolution of 32×32. To verify the effect of tactile data types on recognition precision, three datasets of tactile images were built from palm pressure data, joint bending strain data, and a combination of both. An improved residual convolutional neural network (CNN) model, SP-ResNet, was developed by light-weighting ResNet-18 to classify these tactile images. Experimental results show that combining palm pressure and joint bending strain yields a 4.33% improvement in recognition precision over the best results obtained with either modality alone. The presented tactile glove with SP-ResNet achieves a recognition precision of 95.50% for 16 objects at a lower computational cost. The presented tactile system can serve as a sensing platform for intelligent prosthetics and robot grippers.
An object learning and recognition system is implemented for humanoid robots to discover and memorize objects through simple interactions with non-expert users. When an object is presented, the system uses motion information over consecutive frames to extract object features and implements machine learning based on the bag-of-visual-words approach. Instead of using a single local feature descriptor, the proposed system uses co-occurring local features to increase discriminative power in both the object model learning and inference stages. For objects with different textures, a hybrid sampling strategy is considered; this hybrid approach minimizes the consumption of computational resources and helps achieve good performance, demonstrated on a set of a dozen different daily objects.
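The bag-of-visual-words step above can be sketched in a few lines: each local descriptor is assigned to its nearest codeword and the object is represented by the resulting normalized histogram. The toy 2-D descriptors and two-word codebook are made up for illustration; real systems use high-dimensional descriptors and codebooks learned by clustering.

```python
def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook (Euclidean nearest
    codeword) and return a normalized bag-of-visual-words histogram."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: dist2(d, codebook[i]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The co-occurrence idea in the paper would pair nearby descriptors before quantization, enlarging the effective vocabulary.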
The generation of high-quality 3D models from single 2D images remains challenging in terms of accuracy and completeness. Deep learning has emerged as a promising solution, offering new avenues for improvement; however, building models from scratch is computationally expensive and requires large datasets. This paper presents a transfer-learning-based approach for category-specific 3D reconstruction from a single 2D image. The core idea is to fine-tune a pre-trained model on specific object categories using new, unseen data, resulting in specialized versions of the model that are better adapted to reconstructing particular objects. The proposed approach uses a three-phase pipeline comprising image acquisition, 3D reconstruction, and refinement. After ensuring the quality of the input image, a ResNet50 model performs object recognition, directing the image to the corresponding category-specific model to generate a voxel-based representation. The voxel-based 3D model is then refined into a detailed triangular mesh using the Marching Cubes algorithm and Laplacian smoothing. An experimental study using the Pix2Vox model and the Pascal3D dataset was conducted to evaluate and validate the approach. Results demonstrate that category-specific fine-tuning of Pix2Vox significantly outperforms both the original model and a general model fine-tuned on all object categories, with substantial gains in Intersection over Union (IoU) scores. Visual assessments confirm improvements in geometric detail and surface realism. These findings indicate that combining transfer learning with the category-specific fine-tuning and refinement strategy of our approach leads to better-quality 3D model generation.
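The IoU score used to compare reconstructions above is computed between a predicted voxel occupancy grid and the binary ground truth. A minimal sketch, assuming flattened grids and a 0.5 occupancy threshold (the threshold choice is ours):

```python
def voxel_iou(pred, target, thresh=0.5):
    """IoU between predicted voxel occupancy probabilities and a binary
    ground-truth grid, both flattened to equal-length sequences."""
    occ = [v >= thresh for v in pred]
    inter = sum(1 for a, b in zip(occ, target) if a and b)
    union = sum(1 for a, b in zip(occ, target) if a or b)
    return inter / union if union else 1.0
```

An IoU of 1.0 means the thresholded prediction matches the ground-truth occupancy exactly.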
This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view), with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning-driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
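Projecting a detected object into the 2D semantic map, as described above, amounts to a rigid transform from the robot frame (where the sensor reports range and bearing) into map coordinates using the SLAM pose estimate. This is a generic sketch with our own variable names, not the paper's code.

```python
import math

def object_to_map(robot_x, robot_y, robot_theta, obj_range, obj_bearing):
    """Transform an object observed at (range, bearing) in the robot frame
    into 2D map coordinates, given the robot pose (x, y, heading) from SLAM."""
    mx = robot_x + obj_range * math.cos(robot_theta + obj_bearing)
    my = robot_y + obj_range * math.sin(robot_theta + obj_bearing)
    return mx, my
```

Fusing such projections from multiple viewpoints is what raises the semantic completeness figure reported above.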
[Objectives] To investigate the ameliorative effects of Huanglian Jiedu Decoction (HLJDD) on cognitive impairment in an Alzheimer's disease (AD) mouse model induced by Porphyromonas gingivalis infection. [Methods] Thirty-six male C57BL/6 mice were randomly assigned to six groups: control, model, low-dose HLJDD, medium-dose HLJDD, high-dose HLJDD, and positive drug (moxifloxacin). With the exception of the control group, all groups underwent an 8-week chronic P. gingivalis infection induced via oral administration. Each treatment group then received the corresponding dose of HLJDD (2.5, 5, or 10 mg/g) or moxifloxacin for an 8-week intervention. The novel object recognition test was employed to evaluate non-spatial memory, and the novel object exploration preference index was calculated to assess cognitive function. [Results] Compared with the control group, the preference index of the model group was significantly reduced (P<0.01), indicating that P. gingivalis infection effectively induced cognitive impairment. Relative to the model group, mice treated with medium and high doses of HLJDD exhibited a significant, dose-dependent increase in the preference index, whereas the low-dose group showed no significant improvement. The positive drug moxifloxacin also demonstrated a significant neuroprotective effect on cognition. [Conclusions] HLJDD effectively improves cognitive impairment in AD model mice induced by P. gingivalis infection, offering novel experimental evidence for the heat-clearing and detoxification approach and for the therapeutic potential of traditional Chinese medicine (TCM) compounds in the intervention of AD.
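The exploration preference index used above is conventionally the time spent exploring the novel object as a fraction of total exploration time; a value near 0.5 means the animal shows no preference, i.e., no memory of the familiar object. A minimal sketch of that calculation (the exact formula variant used in the study is not stated, so this standard definition is an assumption):

```python
def preference_index(novel_s, familiar_s):
    """Novel-object exploration preference index:
    time on novel object / total exploration time (chance level = 0.5)."""
    total = novel_s + familiar_s
    return novel_s / total if total else 0.0
```

A mouse that remembers the familiar object explores the novel one more, pushing the index above 0.5.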
Humans perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly; replicating this unique capability in robots, however, remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition, and 3D location capabilities, combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 ℃) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa^(−1), and triboelectric durability over 6000 cycles. The devised algorithm is general and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception, advancing future society toward heightened sensing, recognition, and intelligence.
A method for moving object recognition and tracking in intelligent traffic monitoring systems is presented. To address the shortcomings of the frame-subtraction method, a redundant discrete wavelet transform (RDWT)-based moving object recognition algorithm is put forward, which detects moving objects directly in the RDWT domain. An improved adaptive mean-shift algorithm is used to track the moving object in subsequent frames. Experimental results show that the algorithm can effectively extract the moving object, even when the object is similar to the background, and the results are better than those of the traditional frame-subtraction method. Object tracking remains accurate regardless of changes in object size. The algorithm therefore has practical value and promise.
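For reference, the frame-subtraction baseline that the RDWT method improves on is simply a thresholded temporal difference: a pixel is flagged as moving if its intensity change between consecutive frames exceeds a fixed threshold. The threshold value below is an illustrative assumption.

```python
def frame_subtraction_mask(prev, curr, thresh=25):
    """Baseline frame-subtraction motion mask: True where the absolute
    temporal intensity difference exceeds the threshold. Frames are
    given as flat, equal-length pixel lists."""
    return [abs(a - b) > thresh for a, b in zip(prev, curr)]
```

Its weakness, noted above, is that objects similar to the background produce small differences and are missed, which is what motivates detecting in the wavelet domain instead.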
Space object recognition plays an important role in space exploitation and surveillance, and faces two main problems: lack of data and drastic changes in viewpoint. In this article, we first build a three-dimensional (3D) satellite dataset named BUAA Satellite Image Dataset (BUAA-SID 1.0) to supply data for 3D space object research. Based on this dataset, we propose to recognize full-viewpoint 3D space objects using kernel locality preserving projections (KLPP). To obtain a more accurate and separable description of the objects, we build feature vectors from moment invariants, Fourier descriptors, region covariance, and histograms of oriented gradients, then map the features into kernel space and reduce their dimensionality with KLPP to obtain the feature submanifold. Finally, k-nearest neighbor (kNN) classification is applied. Experimental results show that the proposed approach is well suited to space object recognition under viewpoint changes: encouraging recognition rates are obtained on images in BUAA-SID 1.0, with a best result of 95.87%.
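The final kNN classification step can be sketched directly. The toy 2-D vectors below stand in for the KLPP-projected features; a real run would operate on the submanifold coordinates described above.

```python
def knn_classify(query, samples, labels, k=3):
    """Plain k-nearest-neighbor majority vote with Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    order = sorted(range(len(samples)), key=lambda i: dist2(query, samples[i]))
    votes = {}
    for i in order[:k]:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)
```

Because kNN makes no parametric assumption about class shape, it pairs naturally with manifold-learning projections like KLPP.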
Traditional pattern classification methods usually assume that the object to be classified belongs to one of the given (known) classes of the training data set. In practice, however, the training data set may not contain the class of some objects; this is the Open-Set Recognition (OSR) problem. In this paper, we propose a new progressive open-set recognition method with an adaptive probability threshold. Both the labeled training data and the test data (objects to be classified) are put into a common data set, and the k-Nearest Neighbors (k-NNs) of each object are sought in this common set to determine the probability of the object lying in the given classes. If the majority of an object's k-NNs come from the labeled training data, the object quite likely belongs to one of the given classes; the density of the object and its neighbors is also taken into account here. When most of the k-NNs come from the unlabeled test data, the class of the object is considered very uncertain, because the classes of the test data are unknown, and the object cannot be classified in this step. Once all objects belonging to known classes with high probability are found, we re-calculate the probabilities of the remaining uncertain objects based on the labeled training data and the objects marked with estimated probabilities. The iteration stops when the probabilities of all objects belonging to known classes no longer change. A modified Otsu's method is then employed to adaptively determine the probability threshold for the final classification: objects whose probability of belonging to known classes falls below this threshold are assigned to the ignorant (unknown) class not included in the training data set, and the remaining objects are committed to a specific class. The effectiveness of the proposed method has been validated in experiments.
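The thresholding step above builds on Otsu's method, which picks the cut that maximizes between-class variance of a histogram. The paper uses a *modified* variant; the standard algorithm, applied here to per-object probabilities in [0, 1] with an assumed bin count, is sketched for illustration.

```python
def otsu_threshold(values, bins=10):
    """Classic Otsu threshold over probabilities in [0, 1]: choose the
    bin boundary maximizing between-class variance w0 * w1 * (m0 - m1)^2."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    best_t, best_var = 0.0, -1.0
    for t in range(1, bins):
        w0 = sum(hist[:t])
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # degenerate split, skip
        m0 = sum(i * hist[i] for i in range(t)) / w0
        m1 = sum(i * hist[i] for i in range(t, bins)) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t / bins
    return best_t
```

Objects below the returned threshold would be routed to the unknown class, the rest to a specific known class.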
The performance of deep learning (DL) networks has been improved by elaborating their structures. However, DL networks have many parameters that strongly influence performance. We propose a genetic algorithm (GA) based deep belief neural network (DBNN) method for robot object recognition and grasping. The method optimizes DBNN parameters such as the number of hidden units, the number of epochs, and the learning rates, reducing both the error rate and the network training time of object recognition. After recognizing objects, the robot performs pick-and-place operations. We built a database of six objects for the experiments, and the results demonstrate that our method performs well on the optimized robot object recognition and grasping tasks.
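The GA-based tuning loop above can be illustrated with a toy real-valued genetic algorithm. The selection, crossover, and mutation schemes here are simplified choices of ours, and the quadratic objective in the test is a stand-in for the DBNN error rate, which would be far more expensive to evaluate.

```python
import random

def ga_minimize(fitness, low, high, pop=20, gens=40, seed=0):
    """Toy GA over a single real-valued hyperparameter: truncation
    selection, arithmetic crossover, Gaussian mutation, with elitism
    (the best half always survives)."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(pop)]
    for _ in range(gens):
        xs.sort(key=fitness)
        parents = xs[: pop // 2]                      # keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            c = 0.5 * (a + b)                         # arithmetic crossover
            c += rng.gauss(0, 0.05 * (high - low))    # Gaussian mutation
            children.append(min(high, max(low, c)))   # clamp to bounds
        xs = parents + children
    return min(xs, key=fitness)
```

In the paper's setting the chromosome would encode several parameters at once (hidden units, epochs, learning rates), and fitness would be the validation error of the trained DBNN.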
A new method for finding key SURF (speeded-up robust features) features based on an adaptive Hessian matrix threshold is proposed and applied to an unmanned vehicle for dynamic object recognition and guided navigation. First, the object recognition algorithm based on SURF feature matching for unmanned vehicle guided navigation is introduced. Then, the standard local invariant feature extraction algorithm SURF is analyzed, with particular attention to the Hessian matrix, and an adaptive Hessian threshold method is proposed based on feedback of the correct matching point-pair count under a closed-loop framework. Finally, dynamic object recognition experiments under different weather and lighting conditions are discussed. The experimental results show that the key SURF feature extraction algorithm and the dynamic object recognition method are suitable for unmanned vehicle systems.
This paper proposes a method to recognize human-object interactions by modeling the context between human actions and interacted objects. Human-object interaction recognition is challenging because of severe occlusion between the human and objects during the interaction. Since human actions and interacted objects provide strong context information, i.e., some actions are usually related to specific objects, recognition accuracy is significantly improved for both. The proposed method extracts both global and local temporal features from skeleton sequences to model human actions, while kernel features are used to describe interacted objects. Finally, all candidate solutions for actions and objects are jointly optimized by modeling the context between them. Experimental results demonstrate the effectiveness of the method.
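The joint action-object optimization can be illustrated with a simple probabilistic stand-in: score every (action, object) pair by the product of the individual classifier confidences and a context term, then pick the best pair. The factorization and all probability tables below are made-up illustrations, not the paper's model.

```python
def best_action_object(p_action, p_object, p_obj_given_act):
    """Pick the (action, object) pair maximizing
    P(action) * P(object) * P(object | action)."""
    best, best_p = None, -1.0
    for a, pa in p_action.items():
        for o, po in p_object.items():
            p = pa * po * p_obj_given_act.get((a, o), 1e-6)  # tiny floor for unseen pairs
            if p > best_p:
                best, best_p = (a, o), p
    return best
```

The context term is what lets a confident object detection (e.g., a phone) rescue an occluded, ambiguous action.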
The application of high-performance imaging sensors in space-based surveillance systems makes it possible to recognize space objects and estimate their poses using vision-based methods. In this paper, we propose a kernel regression-based method for joint multi-view space object recognition and pose estimation. We built a new simulated satellite image dataset, BUAA-SID 1.5, to test the method with different image representations, and evaluated it on recognition-only, pose-estimation-only, and joint recognition and pose estimation tasks. Experimental results show that our method outperforms the state of the art in space object recognition, and recognizes space objects and estimates their poses effectively and robustly against noise and varying lighting conditions.
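Kernel regression for pose estimation can be sketched with the classic Nadaraya-Watson estimator: predict a continuous target (e.g., a pose angle) as a kernel-weighted average of training targets. The 1-D scalar features below stand in for image descriptors, and the bandwidth is an illustrative choice.

```python
import math

def kernel_regress(query, train_x, train_y, bandwidth=1.0):
    """Nadaraya-Watson estimator with a Gaussian kernel:
    y(q) = sum_i K(q, x_i) * y_i / sum_i K(q, x_i)."""
    weights = [math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2))
               for x in train_x]
    s = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / s
```

Because nearby training views dominate the weighted average, the estimate varies smoothly with viewpoint, which suits multi-view pose estimation.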
To accomplish object recognition in natural scenes, a new object recognition algorithm based on an improved convolutional neural network (CNN) is proposed. First, candidate object windows are extracted from the original image. Then, the candidate windows are fed into the improved CNN model to obtain deep features. Finally, the deep features are passed through the Softmax layer to obtain class confidence scores, and the candidate window with the highest confidence score is selected as the recognition result. Based on AlexNet, Inception V1 is introduced into the improved CNN and the fully connected layer is replaced by an average pooling layer, which both widens and deepens the network. Experimental results show that the improved algorithm obtains better recognition results on multiple natural scene images and achieves higher accuracy than classical object recognition algorithms.
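The final step, turning raw class scores into confidence scores, is the standard softmax, shown here in a numerically stable form (subtracting the maximum logit before exponentiating):

```python
import math

def softmax(logits):
    """Convert raw class scores into normalized confidence scores."""
    m = max(logits)                     # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

The candidate window whose top softmax score is largest is then kept as the recognition result.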
The complexity of fire and smoke in terms of shape, texture, and color presents significant challenges for accurate fire and smoke detection. To address this, a YOLOv8-based detection algorithm integrated with the Convolutional Block Attention Module (CBAM) has been developed. The algorithm first employs the latest YOLOv8 for object recognition; the integration of CBAM then enhances its feature extraction capabilities; finally, the WIoU function is used to optimize the network's bounding-box loss, facilitating rapid convergence. Experimental validation on a smoke and fire dataset demonstrated that the proposed algorithm achieved a 2.3% increase in smoke and fire detection accuracy, surpassing other state-of-the-art methods.
Memory deficit, which is often associated with aging and many psychiatric, neurological, and neurodegenerative diseases, has been a challenging issue for treatment; up to now, all potential drug candidates have failed to produce satisfactory effects. In the search for a solution, we found that treatment with the gene corresponding to the RGS14414 protein in visual area V2, a brain area connected with circuits of the ventral stream and the medial temporal lobe that is crucial for object recognition memory (ORM), can induce enhancement of ORM. In this study, we demonstrated that the same RGS14414 treatment in visual area V2, which remains relatively unaffected in neurodegenerative diseases such as Alzheimer's disease, produced long-lasting enhancement of ORM in young animals and prevented ORM deficits in rodent models of aging and Alzheimer's disease. Furthermore, we found that the prevention of memory deficits was mediated through upregulation of neuronal arborization and spine density, as well as an increase in brain-derived neurotrophic factor (BDNF). Knockdown of the BDNF gene in RGS14414-treated aging rats and Alzheimer's disease model mice caused complete loss of the upregulation of neuronal structural plasticity and of the prevention of ORM deficits. These findings suggest that BDNF-mediated neuronal structural plasticity in area V2 is crucial to the prevention of memory deficits in RGS14414-treated rodent models of aging and Alzheimer's disease. RGS14414 gene-mediated activation of neuronal circuits in visual area V2 therefore has therapeutic relevance for the treatment of memory deficits.
基金Project (No. 60805001) partially supported by the National NaturalScience Foundation of China
文摘Effective and robust recognition and tracking of objects are the key problems in visual surveillance systems. Most existing object recognition methods were designed with particular objects in mind. This study presents a general moving objects recognition method using global features of targets. Targets are extracted with an adaptive Gaussian mixture model and their silhouette images are captured and unified. A new objects silhouette database is built to provide abundant samples to train the subspace feature. This database is more convincing than the previous ones. A more effective dimension reduction method based on graph embedding is used to obtain the projection eigenvector. In our experiments, we show the effective performance of our method in addressing the moving objects recognition problem and its superiority compared with the previous methods.
文摘The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R410),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To address this,we present SCENET-3D,a transformer-drivenmultimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline.In the first stage,scene analysis,rich geometric and texture descriptors are extracted from RGB frames,including surface-normal histograms,angles between neighboring normals,Zernike moments,directional standard deviation,and Gabor-filter responses.In the second stage,scene-object analysis,non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information.In the third stage,human-pose estimation,silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints,which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons.Features from all three stages are fused and fed in a transformer encoder with multi-head attention to resolve visually similar activities.Experiments on UCLA(95.8%),ETRI-Activity3D(89.4%),andCAD-120(91.2%)demonstrate that combining pseudo-3D skeletonswith rich scene-object fusion significantly improves generalizable activity recognition,enabling safer elderly care,natural human–robot interaction,and robust context-aware robotic perception in real-world environments.
Funding: Supported by the Henan Province Science and Technology Project under Grant No. 182102210065.
Abstract: Object recognition and localization have long been research hotspots in machine vision, with great value for service robots, industrial automation, autonomous driving, and other fields. To realize real-time recognition and localization of indoor scene objects, this article proposes an improved YOLOv3 neural network model that combines densely connected networks and residual networks to construct a new YOLOv3 backbone, applied to the detection and recognition of objects in indoor scenes. A RealSense D415 RGB-D camera is used to obtain the RGB map and depth map, and the actual distance is calculated after each pixel in the scene image is mapped to the real scene. Experimental results show that detection and recognition accuracy and real-time performance of the new network are markedly improved over the original YOLOv3 model in the same scene: objects that the unimproved YOLOv3 network failed to detect are detected after the improvement, and the running time of object detection and recognition is reduced to less than half of the original. The improved network thus has reference value for practical engineering applications.
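Dense connectivity, the ingredient combined with residual links in the improved backbone, concatenates each layer's output onto all earlier feature maps so the channel count grows linearly with depth. A minimal NumPy sketch of that channel bookkeeping follows (the layer count, growth rate, and ReLU stand-in for a convolution are illustrative assumptions):

```python
import numpy as np

def dense_block(x, num_layers=4, growth=8, rng=np.random.default_rng(0)):
    """Toy dense block on features shaped (H, W, C): each 'layer' is a random
    1x1 projection to `growth` channels, concatenated onto the running stack."""
    for _ in range(num_layers):
        w = rng.standard_normal((x.shape[-1], growth))
        new = np.maximum(x @ w, 0.0)           # linear map + ReLU stand-in for a conv
        x = np.concatenate([x, new], axis=-1)  # dense connection: keep all earlier maps
    return x

out = dense_block(np.ones((16, 16, 3)))
print(out.shape)  # → (16, 16, 35): 3 input channels + 4 layers × 8 growth
```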
Funding: Supported by the Key Research and Development Program of Shaanxi Province (No. 2024GX-YBXM-178) and the Shaanxi Province Qinchuangyuan “Scientists+Engineers” Team Development Program (No. 2022KXJ032).
Abstract: With the rapid development of flexible electronics, tactile systems for object recognition are becoming increasingly refined. This paper presents the design of a tactile glove for object recognition, integrating 243 palm pressure units and 126 finger-joint strain units implemented with piezoresistive Velostat film. Palm pressure and joint bending strain data from the glove were collected using a two-dimensional resistance-array scanning circuit and converted into tactile images with a resolution of 32 × 32. To verify the effect of tactile data type on recognition precision, three tactile-image datasets were built from palm pressure data, joint bending strain data, and a combination of both. An improved residual convolutional neural network (CNN) model, SP-ResNet, was developed by lightweighting ResNet-18 to classify these tactile images. Experimental results show that combining palm pressure and joint bending strain improves recognition precision by 4.33% over the best results obtained with either modality alone. The presented tactile glove with SP-ResNet achieves a recognition precision of 95.50% on 16 objects at reduced computational cost, and the system can serve as a sensing platform for intelligent prosthetics and robot grippers.
Funding: The National Natural Science Foundation of China (Nos. 60672094, 60971098).
Abstract: An object learning and recognition system is implemented for humanoid robots to discover and memorize objects through simple interactions with non-expert users. When an object is presented, the system uses motion information over consecutive frames to extract object features and performs machine learning based on the bag-of-visual-words approach. Instead of using a local feature descriptor alone, the proposed system uses co-occurring local features to increase feature discriminative power in both the object-model learning and inference stages. For objects with different textures, a hybrid sampling strategy is adopted. This hybrid approach minimizes the consumption of computational resources and helps achieve good performance, demonstrated on a set of a dozen different daily objects.
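A bag-of-visual-words model represents an object as a histogram of codebook assignments over its local descriptors. The sketch below assumes a precomputed codebook (in the paper it would come from clustering training descriptors; here it is random for illustration):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word, then histogram counts."""
    # Pairwise squared distances between descriptors (N, D) and words (K, D).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize so feature-rich images compare fairly

rng = np.random.default_rng(1)
codebook = rng.standard_normal((8, 16))   # 8 visual words, 16-D descriptors
desc = rng.standard_normal((50, 16))      # 50 local features from one object
h = bow_histogram(desc, codebook)
print(h.shape, round(h.sum(), 6))  # → (8,) 1.0
```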
Funding: Funded by the Research, Development, and Innovation Authority (RDIA), Kingdom of Saudi Arabia, under the supervision of the Energy, Industry, and Advanced Technologies Research Center, Taibah University, Madinah, Saudi Arabia, with grant number 12979-iau-2023-TAU-R-3-1-EI-.
Abstract: The generation of high-quality 3D models from single 2D images remains challenging in terms of accuracy and completeness. Deep learning has emerged as a promising solution, offering new avenues for improvement; however, building models from scratch is computationally expensive and requires large datasets. This paper presents a transfer-learning-based approach for category-specific 3D reconstruction from a single 2D image. The core idea is to fine-tune a pre-trained model on specific object categories using new, unseen data, resulting in specialized versions of the model better adapted to reconstruct particular objects. The proposed approach uses a three-phase pipeline comprising image acquisition, 3D reconstruction, and refinement. After ensuring the quality of the input image, a ResNet50 model performs object recognition, directing the image to the corresponding category-specific model to generate a voxel-based representation. The voxel-based 3D model is then refined into a detailed triangular mesh using the Marching Cubes algorithm and Laplacian smoothing. An experimental study using the Pix2Vox model and the Pascal3D dataset was conducted to evaluate and validate the approach. Results demonstrate that category-specific fine-tuning of Pix2Vox significantly outperforms both the original model and a general model fine-tuned on all object categories, with substantial gains in Intersection over Union (IoU) scores. Visual assessments confirm improvements in geometric detail and surface realism. These findings indicate that combining transfer learning with category-specific fine-tuning and a refinement strategy leads to better-quality 3D model generation.
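Laplacian smoothing, used in the refinement phase after Marching Cubes, moves each mesh vertex toward the average of its neighbors. A minimal sketch on a toy 2D ring follows (the uniform-weight scheme and the λ step size are common defaults assumed here, not values from the paper):

```python
import numpy as np

def laplacian_smooth(vertices, neighbors, lam=0.5, iters=10):
    """Uniform-weight Laplacian smoothing: v <- v + lam * (mean(neighbors) - v)."""
    v = vertices.astype(float).copy()
    for _ in range(iters):
        centroids = np.array([v[nbrs].mean(axis=0) for nbrs in neighbors])
        v += lam * (centroids - v)
    return v

# A noisy 4-vertex ring: each vertex's neighbors are the two adjacent ring vertices.
verts = np.array([[1.0, 0.1], [0.1, 1.0], [-1.0, -0.1], [-0.1, -1.0]])
nbrs = [[3, 1], [0, 2], [1, 3], [2, 0]]
smoothed = laplacian_smooth(verts, nbrs)
print(np.abs(smoothed).max() < np.abs(verts).max())  # → True (smoothing contracts the ring)
```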
Funding: Supported by the National Science and Technology Council under Grant NSTC 114-2221-E-130-007.
Abstract: This paper presents an intelligent patrol and security robot integrating 2D LiDAR and RGB-D vision sensors to achieve semantic simultaneous localization and mapping (SLAM), real-time object recognition, and dynamic obstacle avoidance. The system employs the YOLOv7 deep-learning framework for semantic detection and SLAM for localization and mapping, fusing geometric and visual data to build a high-fidelity 2D semantic map. This map enables the robot to identify and project object information for improved situational awareness. Experimental results show that object recognition reached 95.4% mAP@0.5. Semantic completeness increased from 68.7% (single view) to 94.1% (multi-view) with an average position error of 3.1 cm. During navigation, the robot achieved 98.0% reliability, avoided moving obstacles in 90.0% of encounters, and replanned paths in 0.42 s on average. The integration of LiDAR-based SLAM with deep-learning-driven semantic perception establishes a robust foundation for intelligent, adaptive, and safe robotic navigation in dynamic environments.
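Projecting a detected object into a 2D semantic map amounts to composing the object's position in the robot frame with the robot's SLAM pose; a minimal sketch of that rigid transform (the pose and object values are illustrative, not from the experiments):

```python
import numpy as np

def to_map_frame(obj_xy_robot, robot_xy_map, robot_theta):
    """Transform an object's (x, y) in the robot frame into map coordinates
    by rotating through the robot's heading and translating by its position."""
    c, s = np.cos(robot_theta), np.sin(robot_theta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.asarray(obj_xy_robot) + np.asarray(robot_xy_map)

# Robot at (2, 1) facing +90 degrees; an object 1 m ahead lands at (2, 2) on the map.
p = to_map_frame([1.0, 0.0], [2.0, 1.0], np.pi / 2)
print(np.round(p, 6))  # → [2. 2.]
```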
Funding: Supported by the National Natural Science Foundation of China (82160832), the Natural Science Foundation of Guangxi Zhuang Autonomous Region (2017GXNS-FAA198255), the Open Project of the Guangxi Key Laboratory of Brain and Cognitive Neuroscience (GKLBCN-202206-02), the Guangxi Undergraduate Innovation and Entrepreneurship Training Program (202410601029, S202410601113), and the 4th Thousand Young and Middle-Aged Backbone Teachers Cultivation Program of Guangxi Higher Education Institutions.
Abstract: [Objectives] To investigate the ameliorative effects of Huanglian Jiedu Decoction (HLJDD) on cognitive impairment in an Alzheimer's disease (AD) mouse model induced by Porphyromonas gingivalis infection. [Methods] Thirty-six male C57BL/6 mice were randomly assigned to six groups: control, model, low-dose HLJDD, medium-dose HLJDD, high-dose HLJDD, and positive drug (moxifloxacin). With the exception of the control group, all groups underwent an 8-week chronic P. gingivalis infection induced via oral administration. Each treatment group then received the corresponding dose of HLJDD (2.5, 5, or 10 mg/g) or moxifloxacin for an 8-week intervention. The novel object recognition test was employed to evaluate non-spatial memory, and the novel object exploration preference index was calculated to assess cognitive function. [Results] Compared with the control group, the novel object exploration preference index of the model group was significantly reduced (P<0.01), indicating that P. gingivalis infection effectively induced cognitive impairment. Relative to the model group, mice treated with medium and high doses of HLJDD exhibited a significant, dose-dependent increase in the preference index, whereas the low-dose group showed no significant improvement. The positive drug moxifloxacin also demonstrated a significant neuroprotective effect on cognition. [Conclusions] HLJDD effectively improves cognitive impairment in AD model mice induced by P. gingivalis infection, offering novel experimental evidence for the heat-clearing and detoxification approach and for the therapeutic potential of traditional Chinese medicine (TCM) compounds in AD intervention.
Funding: The National Natural Science Foundation of China (Grant No. 52072041), the Beijing Natural Science Foundation (Grant No. JQ21007), the University of Chinese Academy of Sciences (Grant No. Y8540XX2D2), the Robotics Rhino-Bird Focused Research Project (No. 2020-01-002), and the Tencent Robotics X Laboratory.
Abstract: Humans perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition, and 3D location capabilities, combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 °C) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa⁻¹, and triboelectric durability of over 6000 cycles. The devised algorithm is universal and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition, and intelligence.
Abstract: A method for moving object recognition and tracking in intelligent traffic monitoring systems is presented. To address the shortcomings of the frame-subtraction method, a redundant discrete wavelet transform (RDWT) based moving object recognition algorithm is put forward, which detects moving objects directly in the RDWT domain. An improved adaptive mean-shift algorithm then tracks the moving object in subsequent frames. Experimental results show that the algorithm can effectively extract the moving object even when it is similar to the background, outperforming the traditional frame-subtraction method, and that tracking remains accurate under changes in object size. The algorithm therefore has practical value and promise.
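Mean-shift tracking repeatedly moves a search window to the weighted centroid of a target-likelihood map until it settles on the mode. A minimal sketch on a synthetic likelihood image (the window size and Gaussian blob are illustrative, not the paper's adaptive variant):

```python
import numpy as np

def mean_shift(likelihood, start, half=5, iters=20):
    """Shift a (2*half+1)^2 window to the weighted centroid of `likelihood` until it settles."""
    y, x = start
    for _ in range(iters):
        y0, y1 = max(y - half, 0), min(y + half + 1, likelihood.shape[0])
        x0, x1 = max(x - half, 0), min(x + half + 1, likelihood.shape[1])
        w = likelihood[y0:y1, x0:x1]
        if w.sum() == 0:
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        y = int(round((ys * w).sum() / w.sum()))  # weighted centroid = mean-shift step
        x = int(round((xs * w).sum() / w.sum()))
    return y, x

# Gaussian blob centred at (30, 40): starting nearby, the window converges onto it.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 30) ** 2 + (xx - 40) ** 2) / 20.0)
print(mean_shift(blob, (24, 34)))  # → (30, 40)
```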
Funding: National Natural Science Foundation of China (60776793, 60802043) and the National Basic Research Program of China (2010CB327900).
Abstract: Space object recognition plays an important role in spatial exploitation and surveillance, and faces two main problems: lack of data and drastic changes in viewpoint. In this article, we first build a three-dimensional (3D) satellite dataset named the BUAA Satellite Image Dataset (BUAA-SID 1.0) to supply data for 3D space object research. Then, based on this dataset, we propose to recognize full-viewpoint 3D space objects using kernel locality preserving projections (KLPP). To obtain a more accurate and separable description of the objects, we build feature vectors from moment invariants, Fourier descriptors, region covariance, and histograms of oriented gradients. We then map the features into kernel space and reduce dimensionality with KLPP to obtain the submanifold of the features. Finally, k-nearest neighbor (kNN) classification is applied. Experimental results show that the proposed approach is well suited to space object recognition under changing viewpoints. Encouraging recognition rates are obtained on images in BUAA-SID 1.0, with the best result reaching 95.87%.
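After KLPP reduces the fused features to a low-dimensional submanifold, the final classification is a plain k-nearest-neighbor vote. A minimal NumPy sketch of that step (the 2-D toy features stand in for the reduced descriptors):

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query` (Euclidean)."""
    d = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[counts.argmax()]

# Two toy clusters standing in for two object classes in the reduced feature space.
x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(x, y, np.array([4.8, 5.0])))  # → 1
```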
Funding: Supported by the National Natural Science Foundation of China (No. U20B2067).
Abstract: Traditional pattern classification usually assumes that an object to be classified must lie in one of the given (known) classes of the training data set. In practice, however, the training data set may not contain the class of some objects; this is the Open-Set Recognition (OSR) problem. In this paper, we propose a new progressive open-set recognition method with an adaptive probability threshold. Both the labeled training data and the test data (objects to be classified) are put into a common data set, and the k-Nearest Neighbors (k-NNs) of each object are sought in this common set, from which we determine the probability that the object lies in the given classes. If the majority of an object's k-NNs come from the labeled training data, the object quite likely belongs to one of the given classes; the density of the object and its neighbors is also taken into account here. When most of the k-NNs come from the unlabeled test data, however, the class of the object is considered very uncertain, because the classes of test data are unknown, and the object cannot be classified in this step. Once all objects belonging to known classes with high probability are found, we re-calculate the probability of the remaining uncertain objects belonging to known classes, based on the labeled training data and the objects marked with estimated probabilities. This iteration stops when the probabilities of all objects belonging to known classes no longer change. A modified Otsu's method is then employed to adaptively choose the probability threshold for the final classification: an object whose probability of belonging to known classes falls below this threshold is assigned to the ignorant (unknown) class not included in the training data set, and the remaining objects are committed to specific classes. The effectiveness of the proposed method has been validated experimentally.
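The final step adapts Otsu's method, normally used for image binarization, to pick the probability threshold separating known-class objects from unknowns. A minimal sketch of classic Otsu on a set of probability scores (the bin count is an assumed choice, and the paper's modification is not reproduced here):

```python
import numpy as np

def otsu_threshold(scores, bins=64):
    """Return the threshold maximizing between-class variance of a 1-D score set."""
    hist, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = 0.0, -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:i] * centers[:i]).sum() / w0
        mu1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance of this split
        if var > best_var:
            best_var, best_t = var, centers[i]
    return best_t

# Bimodal scores: unknowns near 0.1, known-class objects near 0.9.
scores = np.concatenate([np.full(50, 0.1), np.full(50, 0.9)])
t = otsu_threshold(scores)
print(0.1 < t < 0.9)  # → True
```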
Abstract: The performance of deep learning (DL) networks has been increased by elaborating network structures. However, DL networks have many parameters that strongly influence performance. We propose a genetic algorithm (GA) based deep belief neural network (DBNN) method for robot object recognition and grasping. The method optimizes DBNN parameters such as the number of hidden units, the number of epochs, and the learning rates, reducing both the recognition error rate and the network training time. After recognizing objects, the robot performs pick-and-place operations. We built a database of six objects for the experiments. Experimental results demonstrate that our method performs well on the optimized robot object recognition and grasping tasks.
Funding: Supported by the National Natural Science Foundation of China (61103157) and a Beijing Municipal Education Commission Project (SQKM201311417010).
Abstract: A new method for finding key SURF (speeded-up robust features) features based on an adaptive Hessian matrix threshold is proposed and applied to dynamic object recognition and guided navigation for an unmanned vehicle. First, the object recognition algorithm based on SURF feature matching for unmanned vehicle guided navigation is introduced. Then, the standard local invariant feature extraction algorithm SURF is analyzed, the Hessian matrix is discussed in particular, and an adaptive Hessian threshold method is proposed, based on feedback of the correct matching point-pair count under a closed-loop framework. Finally, dynamic object recognition experiments under different weather and lighting conditions are discussed. The experimental results show that the key SURF feature extraction algorithm and the dynamic object recognition method can be used in unmanned vehicle systems.
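The adaptive Hessian threshold behaves like a feedback controller: when too few correct match pairs survive, the threshold is lowered to admit more keypoints, and raised otherwise. The sketch below uses a hypothetical match-count model in place of a real detector; the target, gain, and model are all illustrative assumptions, not the paper's values.

```python
def adapt_threshold(count_matches, target=100, thresh=200.0, gain=0.2, iters=30):
    """Feedback loop: nudge the Hessian threshold until the number of
    correct match pairs returned by `count_matches(thresh)` nears `target`."""
    for _ in range(iters):
        matches = count_matches(thresh)
        error = matches - target
        thresh *= 1.0 + gain * error / max(target, 1)  # more matches -> raise threshold
        thresh = max(thresh, 1.0)
    return thresh

# Hypothetical detector model: a higher threshold yields fewer keypoints and matches.
model = lambda t: int(50000 / t)
t = adapt_threshold(model)
print(abs(model(t) - 100) <= 5)  # → True
```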
Abstract: This paper proposes a method to recognize human-object interactions by modeling the context between human actions and interacted objects. Human-object interaction recognition is a challenging task due to severe occlusion between the human and objects during interaction. Since human actions and interacted objects provide strong contextual information (some actions are usually related to specific objects), modeling this context significantly improves recognition accuracy for both. In the proposed method, both global and local temporal features are extracted from skeleton sequences to model human actions, while kernel features describe the interacted objects. Finally, all candidate solutions for actions and objects are jointly optimized by modeling the context between them. Experimental results demonstrate the effectiveness of the method.
Funding: Co-supported by the National Natural Science Foundation of China (Grant Nos. 61371134, 61071137) and the National Basic Research Program of China (No. 2010CB327900).
Abstract: The application of high-performance imaging sensors in space-based space surveillance systems makes it possible to recognize space objects and estimate their poses using vision-based methods. In this paper, we propose a kernel regression-based method for joint multi-view space object recognition and pose estimation. We built a new simulated satellite image dataset named BUAA-SID 1.5 to test our method with different image representations. We evaluated the method on recognition-only tasks, pose estimation-only tasks, and joint recognition and pose estimation tasks. Experimental results show that our method outperforms the state of the art in space object recognition, and can recognize space objects and estimate their poses effectively and robustly against noise and varying lighting conditions.
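Kernel regression predicts a continuous quantity such as a pose parameter as a similarity-weighted average of training labels. The sketch below is a Nadaraya-Watson estimator on toy 1-D features; the Gaussian bandwidth and the linear feature-angle relation are assumed for illustration, not taken from the paper.

```python
import numpy as np

def kernel_regress(train_x, train_y, query, bandwidth=0.5):
    """Nadaraya-Watson: weight each training label by a Gaussian kernel on feature distance."""
    d2 = ((train_x - query) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return (w * train_y).sum() / w.sum()

# Toy setup: a 1-D feature that varies linearly with the pose angle.
feats = np.linspace(0, 1, 11).reshape(-1, 1)
angles = 180.0 * feats.ravel()  # pose angle in degrees
est = kernel_regress(feats, angles, np.array([0.5]))
print(round(est, 1))  # → 90.0
```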
Funding: Supported by the National Natural Science Foundation of China (61701029), the Basic Research Foundation of Beijing Institute of Technology (20170542008), and the Industry-University Research Innovation Foundation of the Science and Technology Development Center of the Ministry of Education (2018A02012).
Abstract: To accomplish object recognition in natural scenes, a new object recognition algorithm based on an improved convolutional neural network (CNN) is proposed. First, candidate object windows are extracted from the original image. Then, the candidate windows are input into the improved CNN model to obtain deep features. Finally, the deep features are passed to a Softmax layer to obtain per-class confidence scores, and the candidate window with the highest confidence score is selected as the recognition result. Based on AlexNet, Inception V1 modules are introduced into the improved CNN and the fully connected layer is replaced by an average pooling layer, which both widens and deepens the network. Experimental results show that the improved algorithm obtains better recognition results on multiple natural scene images and achieves higher accuracy than classical object recognition algorithms.
Abstract: The complexity of fire and smoke in terms of shape, texture, and color presents significant challenges for accurate fire and smoke detection. To address this, a YOLOv8-based detection algorithm integrated with the Convolutional Block Attention Module (CBAM) has been developed. The algorithm first employs the latest YOLOv8 for object recognition; the integration of CBAM then enhances its feature extraction capabilities; finally, the WIoU function optimizes the network's bounding-box loss, facilitating rapid convergence. Experimental validation on a smoke and fire dataset demonstrated that the proposed algorithm achieved a 2.3% increase in detection accuracy, surpassing other state-of-the-art methods.
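CBAM's channel-attention half squeezes each feature map with global average and max pooling, passes both descriptors through a shared two-layer MLP, and gates the channels with a sigmoid. A minimal NumPy sketch (the layer sizes and random weights are illustrative, not YOLOv8's):

```python
import numpy as np

def channel_attention(x, w1, w2):
    """CBAM-style channel attention on features shaped (C, H, W):
    shared MLP over avg- and max-pooled descriptors, then a sigmoid gate per channel."""
    avg = x.mean(axis=(1, 2))  # (C,) global average pool
    mx = x.max(axis=(1, 2))    # (C,) global max pool
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid in (0, 1)
    return x * gate[:, None, None]  # rescale each channel

rng = np.random.default_rng(0)
c = 8
w1, w2 = rng.standard_normal((c, c // 2)), rng.standard_normal((c // 2, c))
out = channel_attention(rng.standard_normal((c, 16, 16)), w1, w2)
print(out.shape)  # → (8, 16, 16)
```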
Funding: Supported by grants from the Ministerio de Economia y Competitividad (BFU2013-43458-R) and the Junta de Andalucia (P12-CTS-1694 and Proyexcel-00422) to ZUK.
Abstract: Memory deficit, which is often associated with aging and many psychiatric, neurological, and neurodegenerative diseases, has been a challenging issue for treatment, and all potential drug candidates to date have failed to produce satisfactory effects. In the search for a solution, we found that treatment with the gene encoding the RGS14414 protein in visual area V2, a brain area connected with the circuits of the ventral stream and the medial temporal lobe that are crucial for object recognition memory (ORM), can enhance ORM. In this study, we demonstrated that the same RGS14414 treatment in visual area V2, which is relatively unaffected in neurodegenerative diseases such as Alzheimer's disease, produced long-lasting enhancement of ORM in young animals and prevented ORM deficits in rodent models of aging and Alzheimer's disease. Furthermore, we found that the prevention of memory deficits was mediated through upregulation of neuronal arborization and spine density, as well as an increase in brain-derived neurotrophic factor (BDNF). Knockdown of the BDNF gene in RGS14414-treated aging rats and Alzheimer's disease model mice caused complete loss of the upregulation of neuronal structural plasticity and of the prevention of ORM deficits. These findings suggest that BDNF-mediated neuronal structural plasticity in area V2 is crucial to the prevention of memory deficits in RGS14414-treated rodent models of aging and Alzheimer's disease, and that RGS14414 gene-mediated activation of neuronal circuits in visual area V2 has therapeutic relevance for the treatment of memory deficits.