Funding: Supported by the National Natural Science Foundation of China (Nos. 62372266, 61832012, 12271295, and 62072273), the Natural Science Foundation of Shandong Province (Nos. ZR2020MF149, ZR2022MF304, ZR2021MF075, ZR2021QF050, and ZR2019ZD10), and the Key Research and Development Program Project of Shandong Province (No. 2022CXPT055).
Abstract: The goal of fine-grained image classification (FGIC) is to identify more specific subcategories within a larger category, and the key is to locate local discriminative regions of visual features. Most existing methods rely on traditional convolution operations, which cannot extract multi-scale image features, and these methods are susceptible to interference from image background information. To address these problems, this paper proposes an FGIC model (Attention-PCNN) based on a hybrid attention mechanism and pyramidal convolution. The model feeds the multi-scale features extracted by a pyramidal convolutional neural network into two branches that capture global and local information, respectively. A hybrid attention mechanism is added to the global branch to reduce interference from background information and make the model focus on target regions containing fine-grained features. In addition, the mutual-channel loss (MC-LOSS) is introduced in the local branch to capture fine-grained features. We evaluated the model on three public datasets, CUB-200-2011, Stanford Cars, and FGVC-Aircraft, and the results show that Attention-PCNN outperforms state-of-the-art methods.
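The multi-scale idea behind pyramidal convolution can be sketched as follows: several kernels of increasing size are applied in parallel to the same input, and each output is one pyramid level at the input's resolution. This is a minimal, dependency-free illustration under stated assumptions, not the Attention-PCNN implementation: it assumes a single-channel input and fixed averaging kernels of sizes 3, 5, and 7 in place of learned weights, and it omits grouped convolutions, the hybrid attention branch, and MC-LOSS. The names `conv2d_same`, `box_kernel`, and `pyramid_conv` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation with zero padding; the output has the input's size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - pad, j + v - pad
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


def box_kernel(k: int) -> Matrix:
    """Averaging kernel used as a stand-in for learned weights (assumption)."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def pyramid_conv(image: Matrix, kernel_sizes=(3, 5, 7)) -> List[Matrix]:
    """Apply kernels of increasing size in parallel; each output is one pyramid level.

    Every level keeps the input resolution, so the levels could be stacked as
    channels and passed on to separate global and local branches.
    """
    return [conv2d_same(image, box_kernel(k)) for k in kernel_sizes]


if __name__ == "__main__":
    img = [[1.0] * 7 for _ in range(7)]
    levels = pyramid_conv(img)
    # Larger kernels see more zero padding at the corner, so corner values shrink.
    print([round(level[0][0], 3) for level in levels])
```

Larger kernels aggregate context over a wider neighbourhood, which is what lets the pyramid capture both small discriminative parts and larger object structure.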
Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
Abstract: Two-dimensional endoscopic images are susceptible to interference such as specular reflections, monotonous textures, and illumination variations, which hinders accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. The approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to improve robustness against lighting interference. Its Pseudo-Siamese disparity regression structure simplifies the comparison of left and right images, improving accuracy and efficiency. The model was evaluated on a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results show a significant improvement in the network's resistance to lighting interference without a substantial increase in parameters. The model also converged faster during training, contributing to its overall performance. This study improves the accuracy of endoscopic image processing and has potential implications for surgical robot applications in complex environments.
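The abstract does not state the form of the disparity regression. A common choice in end-to-end stereo networks is a soft-argmin over a matching-cost volume, sketched below on a single scanline. This is an illustrative assumption, not the authors' model: raw intensities and a windowed sum of absolute differences (SAD) stand in for the features that the two unshared-weight (Pseudo-Siamese) branches would produce, and the function names and the out-of-bounds penalty are hypothetical.

```python
import math
from typing import List, Sequence

OUT_OF_BOUNDS_COST = 1e3  # hypothetical penalty when a window leaves the image


def cost_volume_1d(left: Sequence[float], right: Sequence[float],
                   max_disp: int, radius: int = 1) -> List[List[float]]:
    """Windowed SAD cost for every pixel x and disparity d, matching left[x] to right[x - d]."""
    n = len(left)
    volume = []
    for x in range(n):
        row = []
        for d in range(max_disp + 1):
            cost = 0.0
            for o in range(-radius, radius + 1):
                xl, xr = x + o, x + o - d
                if 0 <= xl < n and 0 <= xr < n:
                    cost += abs(left[xl] - right[xr])
                else:
                    cost += OUT_OF_BOUNDS_COST
            row.append(cost)
        volume.append(row)
    return volume


def soft_argmin(costs: Sequence[float], beta: float = 1.0) -> float:
    """Expected disparity under softmax(-beta * cost); differentiable, allows sub-pixel output."""
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


if __name__ == "__main__":
    left = [0, 0, 5, 9, 1, 7, 3, 8, 2, 6]
    right = [5, 9, 1, 7, 3, 8, 2, 6, 0, 0]  # left shifted by two pixels
    volume = cost_volume_1d(left, right, max_disp=3)
    print(volume[5], soft_argmin(volume[5], beta=2.0))
```

Because soft-argmin returns an expectation over candidate disparities rather than a hard argmin, gradients flow through it, which is what makes end-to-end training of the disparity regressor possible.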
Funding: Supported by the Fundamental Research Grant Scheme (FRGS) of Universiti Sains Malaysia, Research Number: FRGS/1/2024/ICT02/USM/02/1.
Abstract: Classroom behavior recognition is an active research topic that plays a vital role in assessing and improving the quality of classroom teaching. However, existing classroom behavior recognition methods struggle to achieve high recognition accuracy on datasets with problems such as blurred images and inconsistent objects. To address this challenge, we propose RFNet (YOLO-FR), an effective, lightweight object detector. First, for efficient multi-scale feature extraction, a feature pyramid shared convolution (FPSC) module is designed in the backbone, which improves feature extraction by applying convolutional layers with different dilation rates to the input. Second, to handle multi-scale variability in the scene, we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network (RGCSPELAN), which further improves network performance while reducing computation and the number of parameters. Experiments were conducted on the SCB dataset3 and STBD-08 datasets. Compared with the baseline model, RFNet increases the mean average precision (mAP@50) from 69.6% to 71.0% on SCB dataset3 and from 91.8% to 93.1% on STBD-08. On SCB dataset3, RFNet achieves a precision of 68.6%, exceeding the baseline (YOLOv11) by 3.3%, and has the smallest model size (4.9 M). Finally, comparisons with other algorithms show that RFNet accurately detects student behavior in complex classroom environments, confirming that it is well suited to efficient, real-time recognition of classroom behaviors.
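One reading of the FPSC idea, a single convolution kernel shared across several dilation rates, is sketched below in 1-D. The fusion by summation, the dilation rates (1, 2, 3), and all names are assumptions, since the abstract does not specify them. Sharing the kernel means the three branches cost the parameters of one, while the span covered by a kernel of size k at dilation d grows as d(k - 1) + 1.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - k // 2) * dilation  # taps are spread `dilation` samples apart
            if 0 <= j < n:
                acc += signal[j] * w
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def shared_dilated_pyramid(signal: Sequence[float], kernel: Sequence[float],
                           dilations=(1, 2, 3)) -> List[float]:
    """Reuse ONE kernel (shared weights) at several dilation rates; fuse by summation."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    # The impulse response shows which positions each dilation rate reaches.
    print(shared_dilated_pyramid(impulse, [1.0, 1.0, 1.0]))
    print([receptive_field(3, d) for d in (1, 2, 3)])
```

The impulse response makes the design visible: every branch hits the centre, while each larger dilation reaches one step further out without adding weights.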
Funding: Supported by the National Key Research and Development Program of China (No. 2017YFB1302900), the National Natural Science Foundation of China (Nos. 81971709, M-0019, and 82011530141), the Foundation of Science and Technology Commission of Shanghai Municipality (Nos. 19510712200 and 20490740700), the Shanghai Jiao Tong University Foundation on Medical and Technological Joint Science Research (Nos. ZH2018ZDA15, YG2019ZDA06, and ZH2018QNA23), and the 2020 Key Research Project of Xiamen Municipal Government (No. 3502Z20201030).
Abstract: Sinus floor elevation with a lateral window approach requires a bone graft (BG) to ensure sufficient bone mass, and the BG region must be measured and analysed during postoperative follow-up. However, in cone-beam computed tomography (CBCT) images the BG region is connected to the margin of the maxillary sinus, and its boundary is blurred. Segmentation is usually performed manually by experienced doctors, which suffers from low efficiency and low precision. In this study, an automatic segmentation approach based on an atrous spatial pyramid convolution (ASPC) network was applied to the BG region within the maxillary sinus. The ASPC module uses residual connections to combine multiple atrous convolutions, extracting more features at multiple scales. A segmentation network for the BG region built from multiple ASPC modules was then established, which effectively improved segmentation performance. Although the training data were limited, the network achieved good automatic segmentation results, with a Dice coefficient of 87.13%, an Intersection over Union (IoU) of 78.01%, and a sensitivity of 95.02%. Compared with other methods, our method achieved better segmentation and reduced segmentation misjudgements. It can therefore be used to segment the BG region automatically and improve doctors' work efficiency, which is important for preliminary studies on measuring postoperative BG within the maxillary sinus.
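The three reported metrics follow standard definitions over the true-positive (TP), false-positive (FP), and false-negative (FN) voxels of a binary mask: Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN), and sensitivity = TP / (TP + FN). A minimal sketch on flattened masks follows. How the paper aggregates scores across cases is not stated, and returning 1.0 when a denominator is zero (nothing to find and nothing predicted) is an assumption.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Count TP, FP, FN for flattened binary masks (1 = bone graft, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # empty vs empty -> perfect (assumption)


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


def sensitivity(pred: Sequence[int], truth: Sequence[int]) -> float:
    tp, _, fn = confusion_counts(pred, truth)
    denom = tp + fn
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    truth = [1, 0, 0, 1, 1]
    print(dice(pred, truth), iou(pred, truth), sensitivity(pred, truth))
```

For a single mask pair, IoU = Dice / (2 - Dice), so the two metrics rank predictions identically; averaging per-case scores can break this exact relation.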
Funding: Funded by the National Natural Science Foundation of China (General Program: No. 52074314, No. U19B6003-05) and the National Key Research and Development Program of China (2019YFA0708303-05).
Abstract: Accurate prediction of formation pore pressure is essential for predicting fluid flow and managing hydrocarbon production in petroleum engineering. Deep learning has recently attracted increasing interest because of its great potential for pore pressure prediction. However, most traditional deep learning models generalize poorly. To fill this technical gap, we developed a new adaptive physics-informed deep learning model with high generalization capability that predicts pore pressure directly from seismic data. The model, named CGP-NN, consists of a novel parametric feature extraction approach (1DCPP), a stacked multilayer gated recurrent unit network (multilayer GRU), and an adaptive physics-informed loss function. Through training, the model automatically selects the optimal physical model to constrain each pore pressure prediction. CGP-NN generalizes best when the physics-related metric λ = 0.5. A hybrid approach combining the Eaton and Bowers methods is also proposed to build machine-learnable labels, addressing the scarcity of labels. To validate the model and methodology, a case study on a complex reservoir in the Tarim Basin was performed, demonstrating high accuracy in predicting the pore pressure of new wells and strong generalization ability. The adaptive physics-informed deep learning approach presented here has potential applications in predicting, from seismic data, pore pressures arising from multiple coupled generation mechanisms.
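The two physical models used for label construction have standard velocity forms: Eaton's method, Pp = Sv - (Sv - Ph)(V / Vn)^n, with n commonly taken as 3 for velocity data, and Bowers' method, which inverts the loading curve V = V0 + A * sigma_e^B for the effective stress sigma_e and then sets Pp = Sv - sigma_e. The sketch below implements both, plus one plausible form of a λ-weighted physics-informed loss. The loss form, the blending weight in `hybrid_label`, and all function names are assumptions: the abstract says only that Eaton and Bowers are combined and that λ = 0.5 generalized best, not how either is done. Consistent units are assumed throughout.

```python
from typing import Sequence


def eaton_pore_pressure(sv: float, ph: float, v: float, v_normal: float, n: float = 3.0) -> float:
    """Eaton (velocity form): Pp = Sv - (Sv - Ph) * (V / Vn) ** n."""
    return sv - (sv - ph) * (v / v_normal) ** n


def bowers_pore_pressure(sv: float, v: float, v0: float, a: float, b: float) -> float:
    """Bowers loading curve V = V0 + A * sigma_e ** B solved for sigma_e; Pp = Sv - sigma_e."""
    if v <= v0:
        raise ValueError("velocity must exceed V0 on the loading curve")
    sigma_e = ((v - v0) / a) ** (1.0 / b)
    return sv - sigma_e


def hybrid_label(p_eaton: float, p_bowers: float, w: float = 0.5) -> float:
    """Hypothetical linear blend of the two estimates; the paper's rule is not given."""
    return w * p_eaton + (1.0 - w) * p_bowers


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def physics_informed_loss(pred: Sequence[float], labels: Sequence[float],
                          physics: Sequence[float], lam: float = 0.5) -> float:
    """Assumed form: (1 - lam) * data misfit + lam * misfit to the physics-model estimate."""
    return (1.0 - lam) * mse(pred, labels) + lam * mse(pred, physics)


if __name__ == "__main__":
    # Slower-than-normal velocity implies overpressure under Eaton.
    print(eaton_pore_pressure(50.0, 20.0, 1500.0, 3000.0))
    print(bowers_pore_pressure(50.0, 1800.0, 1500.0, 10.0, 1.0))
```

With lam = 0 the loss reduces to ordinary supervised regression; with lam = 1 it ignores the labels and only matches the physics estimate, so an intermediate value trades the two off.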
Funding: This study was approved by the Ethics Committee of the First Affiliated Hospital of Army Medical University, PLA (approval No. KY201936). This work is supported by the National Key Research & Development Plan of China (2018YFC0116704) in data collection, and by the Chongqing Technology Innovation and Application Research and Development Project (cstc2019jscx-msxmx0237) in the design of the study.
Abstract: Perioperative heart failure affects the quality of medical services and threatens patient safety. Existing methods depend on doctors' judgment, so their results are affected by many factors, such as the doctors' knowledge and experience; accuracy is difficult to guarantee, and the assessment lags seriously. In this paper, a hybrid prediction model for perioperative heart failure adverse events is proposed that combines the advantages of Deep Pyramid Convolutional Neural Networks (DPCNN) and Extreme Gradient Boosting (XGBOOST). DPCNN is used to automatically extract features from patients' diagnostic texts; these text features are integrated with the patients' preoperative examination and intraoperative monitoring values, and XGBOOST is then used to build the heart failure prediction model. An experimental comparison was conducted using data from patients with heart failure at Southwest Hospital from 2014 to 2018. The results show that the DPCNN-XGBOOST model improved predictive sensitivity by 3% and 31% compared with the text-based DPCNN model and the numeric-based XGBOOST model, respectively.
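DPCNN builds its "pyramid" by repeatedly downsampling the token sequence with max pooling (window 3, stride 2), halving the length at each stage, so a sequence of length L is summarized after about log2(L) stages. The sketch below shows only that downsampling, here simplified to run down to a single position per channel, followed by concatenation with numeric features to form one input row for the tree model. It omits DPCNN's region embedding, convolution blocks, and shortcut connections, does not reproduce XGBOOST, and the channel values, feature names, and function names are hypothetical.

```python
from typing import List, Sequence


def max_pool_half(seq: Sequence[float]) -> List[float]:
    """Max pooling with window 3 and stride 2 (windows centred on even indices)."""
    return [max(seq[max(i - 1, 0): i + 2]) for i in range(0, len(seq), 2)]


def pyramid_levels(seq: Sequence[float]) -> List[List[float]]:
    """Halve the sequence repeatedly; each level is about half as long as the previous one."""
    levels = [list(seq)]
    while len(levels[-1]) > 1:
        levels.append(max_pool_half(levels[-1]))
    return levels


def text_vector(channels: Sequence[Sequence[float]]) -> List[float]:
    """Reduce each feature channel (hypothetical per-token conv outputs) to one value."""
    return [pyramid_levels(ch)[-1][0] for ch in channels]


def fuse(text_features: Sequence[float], numeric_features: Sequence[float]) -> List[float]:
    """Concatenate text and numeric features into one row for a gradient-boosted tree model."""
    return list(text_features) + list(numeric_features)


if __name__ == "__main__":
    print([len(level) for level in pyramid_levels(range(16))])
    # Two text channels plus two hypothetical numeric values (e.g. a lab result and a vital sign).
    print(fuse(text_vector([[0, 5, 1], [2, 2, 9, 1]]), [37.5, 120.0]))
```

Because the pooling windows overlap and cover every position, each channel's final value here equals its global maximum; in the real network, the convolution blocks between pooling stages are what make the deeper levels more than a max.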
Funding: This work is supported by the Innovation Capability Improvement Project of Science and Technology Service for the Elderly by the Beijing Municipal Science & Technology Commission under Grant No. Z1811000009218012, and by the Innovation Project Foundation NCUT.
Abstract: Facial expression recognition is a research hotspot in computer vision and pattern recognition. However, existing facial expression recognition models mainly target visible-light environments: they have insufficient generalization ability and low recognition accuracy, and they are vulnerable to environmental changes such as illumination and distance. To solve these problems, we combine the advantages of infrared and visible images captured simultaneously by an array device we developed with two infrared and two visible-light lenses, so that the fused image has both the texture information of the visible image and the contrast information of the infrared image. We also improve the WGAN by adding SSIM and LBP loss functions, which ensure structural similarity between the fused and infrared images and texture similarity between the fused and visible images, respectively. Finally, a facial expression recognition model, Pyconv-SE18, combining pyramid convolution with an attention mechanism module, is designed to extract important facial expression features at multiple scales, and a cosine distance loss function is added to reduce intra-class feature differences. Experimental results show that using the fused images improves the robustness of the expression recognition algorithm to illumination. The model achieves accuracies of 69.3% and 94.6% on the public FER2013 and CK+ datasets, respectively.
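The two added loss terms compare the fused image with each source: structural similarity (SSIM) against the infrared image and local binary pattern (LBP) texture against the visible image. The sketch below uses a single-window (global) SSIM with the usual constants C1 = (0.01L)^2 and C2 = (0.03L)^2, and basic 8-neighbour LBP codes compared by an L1 histogram distance. These simplifications (no sliding Gaussian window, no rotation-invariant LBP), the weights `alpha` and `beta`, and the combined `fusion_loss` are assumptions, not the paper's settings; the adversarial WGAN term is passed in as a precomputed number.

```python
from typing import List

Image = List[List[float]]

# Clockwise neighbour offsets starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def _flat(img: Image) -> List[float]:
    return [p for row in img for p in row]


def global_ssim(x: Image, y: Image, data_range: float = 255.0) -> float:
    """SSIM computed over the whole image as one window."""
    a, b = _flat(x), _flat(y)
    n = len(a)
    mx, my = sum(a) / n, sum(b) / n
    vx = sum((p - mx) ** 2 for p in a) / n
    vy = sum((q - my) ** 2 for q in b) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def lbp_codes(img: Image) -> List[int]:
    """8-bit LBP code for each interior pixel: a bit is set when the neighbour >= centre."""
    codes = []
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            centre = img[i][j]
            code = 0
            for bit, (di, dj) in enumerate(_OFFSETS):
                if img[i + di][j + dj] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_distance(x: Image, y: Image) -> float:
    """L1 distance between normalised 256-bin LBP histograms (0 = identical texture statistics)."""
    hx, hy = [0.0] * 256, [0.0] * 256
    cx, cy = lbp_codes(x), lbp_codes(y)
    for c in cx:
        hx[c] += 1.0 / len(cx)
    for c in cy:
        hy[c] += 1.0 / len(cy)
    return sum(abs(p - q) for p, q in zip(hx, hy))


def fusion_loss(adv: float, fused: Image, infrared: Image, visible: Image,
                alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical generator loss: adversarial term + structure term + texture term."""
    return adv + alpha * (1.0 - global_ssim(fused, infrared)) + beta * lbp_distance(fused, visible)


if __name__ == "__main__":
    img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    print(lbp_codes(img), global_ssim(img, img), fusion_loss(0.5, img, img, img))
```

Both penalties are zero when the fused image matches its reference exactly, so only the adversarial term remains in that case.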
Funding: Supported by the National Natural Science Foundation of China (51877079).
Abstract: Existing power grid fault diagnosis methods rely on manual experience to design diagnosis models, lack the ability to extract fault knowledge, and are difficult to adapt to complex and changeable engineering sites. Considering this situation, this paper proposes a power grid fault diagnosis method for the alarm information set based on a deep pyramid convolutional neural network. The approach uses the network's deep feature extraction ability to extract fault feature knowledge from alarm information texts and to achieve end-to-end fault classification and faulty device identification. First, a deep pyramid convolutional neural network model that extracts the overall characteristics of fault events is constructed to identify fault types. Second, a deep pyramid convolutional neural network model for alarm information texts is constructed: it extracts the text description characteristics associated with alarm information texts, identifies the key information corresponding to faults in the alarm information set, and selects suspicious faulty devices. Then, a faulty device identification strategy that integrates fault-type and time-sequence priorities is proposed to identify faulty devices. Finally, actual fault cases and simulated fault cases are studied, and the results verify the effectiveness and practicability of the proposed method.
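The final step merges the fault type predicted at the event level with the order in which alarms arrive. One possible reading of a strategy that integrates fault-type and time-sequence priorities is a lexicographic ranking, sketched below: suspicious devices whose type matches the predicted fault type come first, ties are broken by a fixed device-type priority, and then by the earliest alarm time. The priority table, the `Suspect` data class, the mapping of a fault type to a device type, and the device names are all hypothetical; the abstract does not give the actual rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Suspect:
    device: str           # device flagged by the alarm-text model
    device_type: str      # e.g. "line", "bus", "transformer"
    first_alarm_s: float  # time of its first related alarm, in seconds


# Hypothetical tie-break priority between device types (lower = checked first).
TYPE_PRIORITY: Dict[str, int] = {"line": 0, "bus": 1, "transformer": 2}


def rank_suspects(suspects: List[Suspect], predicted_fault_type: str) -> List[Suspect]:
    """Order suspects by (type mismatch, type priority, first alarm time).

    `predicted_fault_type` is assumed to name the affected device type,
    e.g. "line" for a line fault.
    """
    return sorted(
        suspects,
        key=lambda s: (
            s.device_type != predicted_fault_type,  # matching types first (False < True)
            TYPE_PRIORITY.get(s.device_type, len(TYPE_PRIORITY)),
            s.first_alarm_s,  # earlier alarms first
        ),
    )


if __name__ == "__main__":
    suspects = [
        Suspect("T1", "transformer", 0.5),
        Suspect("L2", "line", 1.2),
        Suspect("L1", "line", 0.8),
        Suspect("B1", "bus", 0.1),
    ]
    print([s.device for s in rank_suspects(suspects, "line")])
```

A lexicographic key keeps the event-level classification dominant while still using alarm timing to separate devices of the same type.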