Pneumonia is one of the leading causes of death in children. It is generally diagnosed from chest X-ray images. With the development of Deep Learning (DL), DL-based pneumonia diagnosis has received extensive attention. However, because the differences between pneumonia and normal images are small, the performance of DL methods still has room for improvement. This research proposes a new fine-grained Convolutional Neural Network (CNN) for children’s pneumonia diagnosis (FG-CPD). Firstly, fine-grained CNN classification, which can handle slight differences between images, is investigated. To prepare images from real-world chest X-ray data, the YOLOv4 algorithm is trained to detect and locate the chest region in the raw images. Secondly, a novel attention network named SGNet is proposed, which integrates the spatial and channel information of the images to locate the discriminative parts of the chest image and thereby enlarge the difference between pneumonia and normal images. Thirdly, an automatic data augmentation method is adopted to increase the diversity of the images and avoid overfitting of FG-CPD. FG-CPD has been tested on the public Chest X-ray 2017 dataset, and the results show that it performs well. FG-CPD is then tested on real chest X-ray images of children aged 3–12 years from Tongji Hospital. The results show that FG-CPD achieves up to 96.91% accuracy, which supports the potential of FG-CPD.
Traditional data-driven fault diagnosis methods depend on expert experience to manually extract effective fault features from signals, which has certain limitations. In contrast, deep learning techniques have become a central focus of fault diagnosis research because of their strong fault feature extraction ability and efficient end-to-end diagnosis. Recently, research that combines the convolutional neural network (CNN) and the Transformer, exploiting their respective strengths in local and global feature extraction, has shown promise in fault diagnosis. However, the cross-channel convolution mechanism in the CNN and the self-attention calculations in the Transformer make the combined model excessively complex. This complexity results in high computational costs and limited industrial applicability. To tackle these challenges, this paper proposes a lightweight CNN-Transformer named SEFormer for rotating machinery fault diagnosis. First, a separable multiscale depthwise convolution block is designed to extract and integrate multiscale feature information from different channel dimensions of vibration signals. Then, an efficient self-attention block is developed to capture critical fine-grained features of the signal from a global perspective. Finally, experimental results on the planetary gearbox dataset and the motor roller bearing dataset show that the proposed framework balances robustness, generalization and a lightweight design compared with recent state-of-the-art fault diagnosis models based on CNN and Transformer. This study presents a feasible strategy for developing a lightweight rotating machinery fault diagnosis framework aimed at economical deployment.
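As a rough illustration of why a separable depthwise convolution can keep such a model lightweight, the sketch below compares weight counts for a standard 1D convolution and a depthwise separable one. The function names and layer sizes are hypothetical and are not taken from SEFormer; biases are ignored.

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 1D convolution (every filter spans all input channels)."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable 1D convolution.

    Depthwise step: one kernel per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
for k in (3, 5, 7):
    full = standard_conv1d_params(k, 64, 128)
    light = separable_conv1d_params(k, 64, 128)
    print(f"kernel={k}: standard={full}, separable={light}")
```

The saving grows with the kernel size, which is one reason multiscale blocks with several kernel sizes are often built from depthwise convolutions.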
Joint Multimodal Aspect-based Sentiment Analysis (JMASA) is a significant task in multimodal fine-grained sentiment analysis that combines two subtasks: Multimodal Aspect Term Extraction (MATE) and Multimodal Aspect-oriented Sentiment Classification (MASC). Most existing JMASA models encode text and image features only at a basic level and often neglect in-depth analysis of the intrinsic features of each modality. This insufficient learning of intra-modal features may lead to low accuracy in aspect term extraction and poor sentiment prediction. To address this problem, we propose a Text-Image Feature Fine-grained Learning (TIFFL) model for JMASA. First, we construct an enhanced adjacency matrix of word dependencies and adopt a graph convolutional network to learn the syntactic structure features of the text, which addresses the problem of context interference when identifying different aspect terms. Then, adjective-noun pairs extracted from the image are introduced to make the semantic representation of visual features more intuitive, which addresses the problem of ambiguous semantic extraction during image feature learning. In this way, the model's performance on aspect term extraction and sentiment polarity prediction can be further improved. Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA, MATE and MASC, thus validating the effectiveness of the proposed methods.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
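The abstract does not spell out how the KNN interpolation of occluded joints works, so the following is only a minimal sketch under stated assumptions: a missing 2D joint is estimated as the inverse-distance-weighted average of its k nearest observed joints, with nearness measured in a hypothetical reference pose (`template`). The function name `knn_fill`, the template pose, and the weighting scheme are all illustrative choices, not the paper's JI Module.

```python
import math


def knn_fill(joints, template, k=2):
    """Fill occluded joints (None) from the k nearest observed joints.

    joints:   list of (x, y) tuples, or None where a joint is occluded.
    template: list of (x, y) tuples giving an assumed reference pose,
              used only to decide which joints are neighbours.
    """
    observed = [i for i, p in enumerate(joints) if p is not None]
    if not observed:
        raise ValueError("at least one joint must be observed")

    filled = list(joints)
    for i, point in enumerate(joints):
        if point is not None:
            continue
        # Rank observed joints by their distance to joint i in the reference pose.
        nearest = sorted(
            observed, key=lambda j: math.dist(template[i], template[j])
        )[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (math.dist(template[i], template[j]) + 1e-8) for j in nearest]
        total = sum(weights)
        filled[i] = tuple(
            sum(w * joints[j][d] for w, j in zip(weights, nearest)) / total
            for d in range(2)
        )
    return filled


# Hypothetical three-joint chain with the middle joint occluded.
template = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
frame = [(0.0, 0.0), None, (2.0, 0.0)]
print(knn_fill(frame, template, k=2))
```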
Severe ground-level ozone (O₃) pollution over major Chinese cities has become one of the most challenging problems, with deleterious effects on human health and the sustainability of society. This study explored the spatiotemporal distribution characteristics of ground-level O₃ and its precursors based on conventional pollutant and meteorological monitoring data in Zhejiang Province from 2016 to 2021. Then, a high-performance convolutional neural network (CNN) model was established by expanding the moment and the concentration variations into general factors. Finally, the response mechanism of O₃ to variations in crucial influencing factors was explored by controlling variables and interpolating target variables. The results indicated that the annual average MDA8-90th concentrations in Zhejiang Province are higher in the north and lower in the south. When the wind direction (WD) ranges from east to southwest and the wind speed (WS) is between 2 and 3 m/s, higher O₃ concentrations are prone to occur. At different temperatures (T), the O₃ concentration first increased and then decreased with increasing NO₂ concentration, peaking at an NO₂ concentration of around 0.02 mg/m³. The sensitivity of NO₂ to O₃ formation is not easily affected by temperature, barometric pressure or dew point temperature. Additionally, there is a minimum IRNO₂ at each temperature when the NO₂ concentration is 0.03 mg/m³, and this minimum IRNO₂ decreases with increasing temperature. The study explores the response mechanism of O₃ to changes in driving variables, which can provide a scientific foundation and methodological support for the targeted management of O₃ pollution.
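The abstract uses the MDA8-90th metric without defining it. Assuming it denotes the 90th percentile of the maximum daily 8-hour average (MDA8) O₃ concentration, a minimal sketch might look like the following. The window convention (every 8-hour window that fits inside one day of hourly values) and the nearest-rank percentile are assumptions; regulatory definitions differ in detail.

```python
import math


def mda8(hourly):
    """Maximum 8-hour running mean over one day of hourly concentrations."""
    if len(hourly) < 8:
        raise ValueError("need at least 8 hourly values")
    return max(sum(hourly[i:i + 8]) / 8 for i in range(len(hourly) - 7))


def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def mda8_90th(days):
    """90th percentile of daily MDA8 values; `days` is a list of hourly series."""
    return percentile_nearest_rank([mda8(day) for day in days], 90)
```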
The isolated fracture-vug systems controlled by small-scale strike-slip faults within ultra-deep carbonate rocks of the Tarim Basin exhibit significant exploration potential. The study employs a novel training set incorporating innovative fault labels to train a U-Net-structured CNN model, enabling effective identification of small-scale strike-slip faults through seismic data interpretation. Based on the CNN faults, we analyze the distribution patterns of small-scale strike-slip faults. The small-scale strike-slip faults can be categorized into NNW-trending and NE-trending groups with strike lengths ranging from 200 to 5000 m. The development intensity of small-scale strike-slip faults in the Lower Yingshan Member notably exceeds that in the Upper Member. The Lower and Upper Yingshan members are two distinct mechanical layers with contrasting brittleness characteristics, separated by a low-brittleness layer. The superior brittleness of the Lower Yingshan Member enhances the development intensity of small-scale strike-slip faults compared to the Upper Member, while the low-brittleness layer restricts vertical fault propagation. Fracture-vug systems formed by interactions of two or more small-scale strike-slip faults are larger than those controlled by individual faults. All fracture-vug system sizes show positive correlations with the vertical extents of the associated small-scale strike-slip faults; in particular, intersection and approaching fracture-vug systems exhibit accelerated size increases proportional to the vertical extents.
With the emphasis on user privacy and communication security, encrypted traffic has increased dramatically, which brings great challenges to traffic classification. Encrypted traffic classification methods based on GNN can handle encrypted traffic well. However, existing GNN-based approaches ignore the relationship between client or server packets. In this paper, we design a network traffic topology based on GCN, called Flow Mapping Graph (FMG). FMG establishes sequential edges between vertices according to the arrival order of packets and establishes jump-order edges between vertices by connecting packets in different bursts with the same direction. It not only reflects the time characteristics of the packets but also strengthens the relationship between the client or server packets. Based on FMG, a Traffic Mapping Classification model (TMC-GCN) is designed, which can automatically capture and learn the characteristics and structure information of the top vertex in FMG. The TMC-GCN model is used to classify the encrypted traffic. The encrypted traffic classification problem is thereby transformed into a graph classification problem, which can effectively deal with data from different data sources and application scenarios. The effectiveness of the FMG algorithm is verified by comparing the performance of TMC-GCN with other classical models on four public datasets: CICIOT2023, ISCXVPN2016, CICAAGM2017, and GraphDapp. The experimental results show that the accuracy of the TMC-GCN model is 96.13%, the recall is 95.04%, and the F1 score is 94.54%.
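The edge construction described above can be sketched as follows. This is only one possible reading of the abstract: a burst is taken to be a maximal run of consecutive packets with the same direction, and a jump-order edge is assumed to link the last packet of a burst to the first packet of the next burst with the same direction. The function name `build_fmg_edges` and both conventions are assumptions, not the authors' published definition.

```python
def build_fmg_edges(directions):
    """Build sequential and jump-order edges from per-packet directions.

    directions: list with +1 for client-to-server and -1 for server-to-client,
                in arrival order. Vertex i stands for the i-th packet.
    Returns (sequential_edges, jump_edges) as lists of (source, target) pairs.
    """
    n = len(directions)

    # Sequential edges follow the arrival order of packets.
    sequential = [(i, i + 1) for i in range(n - 1)]

    # Split the flow into bursts: maximal runs of same-direction packets.
    bursts = []
    start = 0
    for i in range(1, n + 1):
        if i == n or directions[i] != directions[start]:
            bursts.append((start, i - 1))
            start = i

    # Consecutive bursts alternate direction, so bursts b and b + 2 share one.
    # Assumed convention: link the end of burst b to the start of burst b + 2.
    jump = [(bursts[b][1], bursts[b + 2][0]) for b in range(len(bursts) - 2)]
    return sequential, jump


seq, jmp = build_fmg_edges([1, 1, -1, 1, -1, -1, 1])
print(seq)
print(jmp)
```

The resulting edge lists can be turned into an adjacency matrix or an edge index for whichever graph library is used to run the GCN.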
The ability to accurately predict urban traffic flows is crucial for optimising city operations. Consequently, various methods for forecasting urban traffic have been developed, focusing on analysing historical data to understand complex mobility patterns. Deep learning techniques, such as graph neural networks (GNNs), are popular for their ability to capture spatio-temporal dependencies. However, these models often become overly complex due to the large number of hyper-parameters involved. In this study, we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks (DMST-GNODE), a framework based on ordinary differential equations (ODEs) that autonomously discovers effective spatial-temporal graph neural network (STGNN) architectures for traffic prediction tasks. A comparative analysis of DMST-GNODE and baseline models indicates that DMST-GNODE performs best across multiple datasets, consistently achieving the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values alongside the highest accuracy. On the BKK (Bangkok) dataset, it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval, maintaining this trend at 40 and 60 min. Similarly, on the PeMS08 dataset, DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min, and remained effective over longer periods. The Los_Loop dataset results further emphasise this model’s advantage, with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min, consistently maintaining superiority across all time intervals. These results indicate that DMST-GNODE outperforms baseline models, with higher accuracy and lower errors across different time intervals and datasets.
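For readers unfamiliar with the two error metrics reported above, a minimal sketch of RMSE and MAE follows. The helper names are illustrative, and the accuracy measure used in the paper is not reproduced because the abstract does not define it.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

RMSE penalises large individual errors more heavily than MAE, so reporting both gives a fuller picture of forecast quality.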
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, which limits their ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture in which motion vectors (temporal features) are extracted using Optical Flow TV-L1 and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASMEII), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model achieves outstanding results compared to recent models. The accuracies are 79.01%, 69.22%, and 68.99% on CASMEII, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method also has the advantages of operational efficiency and shorter computation time.
Human disturbance is one of the main causes of geohazards. The metrics used to assess the ecological impact of roads are numerous, and their criteria are inconsistent. From the perspective of visual observation, environmental damage can be revealed by detecting areas without vegetation cover in images taken along a road. To this end, an end-to-end environmental damage detection model based on a convolutional neural network is proposed. A 50-layer residual network is used to extract the feature map, and the initial parameters are optimized by transfer learning. The method is demonstrated on a dataset of cliff and landslide damage that we collected along roads in Shennongjia National Forest Park. The results show an average precision (AP) of 0.4703 for cliff damage and 0.4809 for landslide damage. Compared with YOLOv3, our model achieves better accuracy in cliff and landslide detection, although at some cost in speed.
An evolution inequality of Sobolev type involving a nonlinear convolution term is considered. By using the nonlinear capacity method and a contradiction argument, the non-existence of nontrivial local weak solutions is proved.
Aspect-oriented sentiment analysis is a fine-grained sentiment analysis task that aims to analyse the sentiment polarity of specific aspects. Most current research builds graph convolutional networks based on dependency syntax trees, which improves the classification performance of the models to some extent. However, the technical limitations of dependency syntax trees can introduce considerable noise into the model. Meanwhile, it is difficult for a single graph convolutional network to aggregate both the semantic and the syntactic structural information of nodes, which affects the final sentence classification. To cope with these problems, this paper proposes a bi-channel graph convolutional network model. The model introduces a phrase structure tree and transforms it into a hierarchical phrase matrix. The adjacency matrix of the dependency syntax tree and the hierarchical phrase matrix are combined as the initial matrix of the graph convolutional network to enhance the syntactic information. The semantic feature representations of the sentences are obtained by the graph convolutional network with a multi-head attention mechanism and fused to achieve complementary learning of dual-channel features. Experimental results show that the model performs well and improves the accuracy of sentiment classification on three public benchmark datasets, namely Rest14, Lap14 and Twitter.
The integration of image analysis through deep learning (DL) into rock classification represents a significant leap forward in geological research. While traditional methods remain invaluable for their expertise and historical context, DL offers a powerful complement by enhancing the speed, objectivity, and precision of the classification process. This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks (CNNs) for geological image analysis, particularly in the classification of igneous, metamorphic, and sedimentary rock types from rock thin section (RTS) images. This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision. Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities, achieving an F1-Score of 0.9869 for igneous rocks, 0.9884 for metamorphic rocks, and 0.9929 for sedimentary rocks, all improvements over the baseline results on the original images. Moreover, the weighted average F1-Score across all classes and techniques is 0.9886, indicating an enhancement. Conversely, methods like Distort lead to decreased accuracy and F1-Score, with an F1-Score of 0.949 for igneous rocks, 0.954 for metamorphic rocks, and 0.9416 for sedimentary rocks, degrading performance relative to the baseline. The study underscores the practicality of image data augmentation in geological image classification and advocates the adoption of DL methods in this domain for automation and improved results. The findings of this study can benefit various fields, including remote sensing, mineral exploration, and environmental monitoring, by enhancing the accuracy of geological image analysis for both scientific research and industrial applications.
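Since the abstract reports per-class and weighted average F1-Scores, a short sketch of how those quantities are usually computed may help. The function names and the example numbers are illustrative only; the class supports used in the study are not given in the abstract, so none are assumed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(scores, supports):
    """Average of per-class scores weighted by the number of samples per class."""
    if len(scores) != len(supports):
        raise ValueError("scores and supports must have the same length")
    total = sum(supports)
    if total <= 0:
        raise ValueError("total support must be positive")
    return sum(score * n for score, n in zip(scores, supports)) / total


# Made-up values purely to show the calculation.
print(f1_score(0.5, 0.5))
print(weighted_average([1.0, 0.5], [3, 1]))
```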
During its growth stage, a plant is exposed to various diseases. Early detection of crop diseases is a major challenge in the horticulture industry. Crop infections can harm total crop yield and reduce farmers’ income if not identified early. The currently accepted method requires a professional plant pathologist to diagnose the disease by visual inspection of the afflicted plant leaves. This is an excellent use case for Community Assessment and Treatment Services (CATS), because manual disease diagnosis is lengthy and the accuracy of identification is directly proportional to the skills of the pathologist. An alternative to conventional Machine Learning (ML) methods, which require manual identification of parameters for exact results, is to develop a prototype that can classify without pre-processing. To automatically diagnose tomato leaf disease, this research proposes a hybrid model using the Convolutional Auto-Encoders (CAE) network and the CNN-based deep learning architecture of DenseNet. To date, none of the modern systems described in this paper have a combined model based on DenseNet, CAE, and Convolutional Neural Network (CNN) to diagnose the ailments of tomato leaves automatically. The models were trained on a dataset obtained from the Plant Village repository. The dataset consisted of 9920 tomato leaves, and the model-to-model accuracy ratio was 98.35%. Unlike other approaches discussed in this paper, this hybrid strategy requires fewer training components. Therefore, the training time to classify plant diseases with the trained algorithm, as well as the training time to automatically detect the ailments of tomato leaves, is significantly reduced.
Soil response to cavity expansion is inherently rate-dependent, especially in fine-grained soils. To better understand such rate effects, self-boring pressuremeter tests were conducted on Kunming peaty soil within a strain rate range of 0.1%/min to 5.0%/min. The results showed a clear dependence of cavity pressure and excess pore pressure (EPP) on strain rate: both increased with higher rates for a given radial displacement. In light of the experimental results, three cases of cylindrical cavity expansion were investigated using the finite element method and an analytical method: partially drained expansion in Modified Cam-Clay (MCC) soil, and undrained and partially drained expansion in elastoviscoplastic (EVP) soil. The EVP behavior was modeled using the MCC model and the overstress viscoplastic theory. The results indicated that, over the strain rate range of 0.0001%/min to 50%/min, the rate response of cavity pressure for partially drained expansion in MCC soil (permeability coefficient ranging from 5×10⁻⁶ m/s to 2.5×10⁻¹¹ m/s) is not obvious, while the EPP response during undrained expansion in EVP soil is rate-independent. Only the partially drained solution for cavity expansion in EVP soil captured the rate-sensitive responses of both cavity pressure and EPP, as confirmed by pressuremeter tests on Kunming peaty soil, Saint-Herblain clay, and Burswood clay. This suggests that the rate effect results from a combination of drainage-related and time-dependent soil behavior. Parametric studies further demonstrated that both viscous behavior and the overconsolidation ratio significantly influence the cylindrical cavity expansion response, and that the drainage conditions during expansion can be assessed using a nondimensional velocity.
Climate change is a global phenomenon that has profound impacts on ecological dynamics and biodiversity, shaping the interactions between species and their environment. To gain a deeper understanding of the mechanisms driving climate change, phenological monitoring is essential. Traditional methods of defining phenological phases often rely on fixed thresholds. However, deep learning-based classification models can now delineate phenological phases from images more accurately, enabling phenological monitoring. Despite the significant advances these models have made, they still face challenges in fully capturing the complexity of biotic-environmental interactions, which can limit the fine-grained accuracy of phenological phase identification. To address this, we propose a novel deep learning model, RESformer, designed to monitor tree phenology at a fine-grained level using PhenoCam images. RESformer features a lightweight structure, making it suitable for deployment in resource-constrained environments. It incorporates a dual-branch routing mechanism that considers both global and local information, thereby improving the accuracy of phenological monitoring. To validate the effectiveness of RESformer, we conducted a case study involving 82,118 images taken over two years at four different locations in Wisconsin, focusing on the phenology of Acer. The images were classified into seven distinct phenological stages, with RESformer achieving an overall monitoring accuracy of 96.02%. Furthermore, we compared RESformer with a phenological monitoring approach based on the Green Chromatic Coordinate (GCC) index and with ten popular classification models. The results showed that RESformer excelled in fine-grained monitoring, effectively capturing and identifying changes in phenological stages. This finding not only provides strong support for monitoring the phenology of Acer species but also offers valuable insights for understanding ecological trends and developing more effective ecosystem conservation and management strategies.
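The Green Chromatic Coordinate baseline mentioned above is commonly defined as GCC = G / (R + G + B). The sketch below computes it for a single pixel and for a region of interest using channel totals, which is one common convention; the function names and the choice of region-level aggregation are assumptions and may differ from the comparison pipeline used in the study.

```python
def gcc(red, green, blue):
    """Green Chromatic Coordinate of one pixel: G / (R + G + B)."""
    total = red + green + blue
    if total == 0:
        return 0.0  # a black pixel carries no colour information
    return green / total


def roi_gcc(pixels):
    """GCC of a region of interest, computed from the summed channel values.

    pixels: iterable of (red, green, blue) tuples.
    """
    red_sum = green_sum = blue_sum = 0
    for red, green, blue in pixels:
        red_sum += red
        green_sum += green
        blue_sum += blue
    return gcc(red_sum, green_sum, blue_sum)
```

A threshold-based approach would then track this value over the season and mark phase changes where it crosses fixed levels, which is the kind of fixed-threshold method the abstract contrasts with learned classifiers.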
Spray deposition was used to produce billets of Mg-4Al-1.5Zn-3Ca-1Nd (alloy A) and Mg-13Al-3Zn-3Ca-1Nd (alloy B), and the evolution of the deformation substructure and the Mg_xZn_yCa_z metastable phase in fine-grained (3 μm) Mg alloys was investigated by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD), and electron backscattered diffraction (EBSD). Different dislocation configurations were found to form in alloys A and B. Redundant free dislocations (RFDs) and dislocation tangles were the routes to deformation substructure formation in alloy A, whereas only dislocation tangles, and no RFDs, were found in alloy B. The interaction between nano-scale second phase particles (nano-scale C15 and β-Mg₁₇(Al,Zn)₁₂ phases) and the different dislocation configurations had a significant effect on the formation of deformation substructures. The mass transfer of Mg_xZn_yCa_z metastable phases and the stacking order of stacking faults were conducive to the formation of Mg-Nd-Zn type long period stacking ordered (LPSO) phases. Nano-scale C15 phases, Mg-Nd-Zn type LPSO phases, the c/a ratio, and β-Mg₁₇(Al,Zn)₁₂ phases were the key factors influencing texture formation. Different textures and grain boundary features (GB features) had a significant effect on the k-value. Non-basal textures were the main factor affecting the k-value in alloy A, while high-angle grain boundaries (HAGBs) were the main factor affecting the k-value in alloy B.
Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audio-visual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) the audiovisual feature extraction module, which adopts a multi-stage fine-tuning strategy to provide high-quality unimodal features, laying a solid foundation for modality fusion; (2) the audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 dataset and the publicly available SSW60 dataset, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains, outperforming existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
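The max pooling aggregation strategy mentioned above is not detailed in the abstract. One simple reading, shown here purely as a sketch, is an element-wise maximum over unimodal feature vectors of equal length. The function name `max_pool_fuse` and the assumption that audio and visual embeddings share a dimension are illustrative, not the MSFG-AVFNet design.

```python
def max_pool_fuse(*features):
    """Fuse equal-length feature vectors by taking the element-wise maximum."""
    if not features:
        raise ValueError("at least one feature vector is required")
    length = len(features[0])
    if any(len(vector) != length for vector in features):
        raise ValueError("all feature vectors must have the same length")
    return [max(values) for values in zip(*features)]


# Hypothetical audio and visual embeddings of the same clip.
audio = [0.1, 0.9, -0.3]
visual = [0.4, 0.2, -0.1]
print(max_pool_fuse(audio, visual))
```

Element-wise maximum keeps, for every dimension, the stronger response of the two modalities, so a noisy or missing modality is less likely to drag the fused feature down than it would be under averaging.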
Fine-grained aircraft target detection in remote sensing holds significant research value and practical applications, particularly in military defense and precision strikes. Given the complexity of remote sensing images, where targets are often small and similar within categories, detecting these fine-grained targets is challenging. To address this, we constructed a fine-grained dataset of remotely sensed airplanes. To deal with the obvious head-to-tail distributions and large variations in target size of fine-grained remote sensing targets, we propose the DWDet fine-grained target detection and recognition algorithm. First, to handle the unbalanced category distribution, we adopt an adaptive sampling strategy. In addition, we construct a deformable convolutional block and improve the decoupled head structure to improve the model's detection of deformed targets. Then, we design a localization loss function to improve the model’s localization ability for targets of different scales. The experimental results show that our algorithm improves the overall accuracy of the model by 4.1% compared to the baseline model and improves the detection accuracy of small targets by 12.2%. The ablation and comparison experiments also demonstrate the effectiveness of our algorithm.
Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely us...Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely used in pulmonary disease diagnosis,such as pneumonia and tuberculosis.However,traditional feature fusion methods often suffer from feature disparity,information loss,redundancy,and increased complexity,hindering the further extension of DL algorithms.To solve this problem,we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment(Self-FAGCFN)to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis.The network integrates Convolutional Neural Networks(CNNs)for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks(GCNs)within a Graph Neural Network branch to capture features based on graph structure,focusing on significant node representations.Additionally,an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs.To ensure effective feature alignment between pre-and post-fusion stages,we introduce a feature alignment loss that minimizes disparities.Moreover,to address the limitations of proposed methods,such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset,we develop a Feature-Centroid Fusion(FCF)strategy and a Multi-Level Feature-Centroid Update(MLFCU)algorithm,respectively.Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and 
tuberculosis,highlighting its potential for practical medical applications.展开更多
Funding: Supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 51805192 and the Major Special Science and Technology Project of Hubei Province under Grant No. 2020AEA009, and sponsored by the State Key Laboratory of Digital Manufacturing Equipment and Technology (DMET) of Huazhong University of Science and Technology (HUST) under Grant No. DMETKF2020029.
Abstract: Pneumonia is one of the leading causes of death in children. It is generally diagnosed from chest X-ray images. With the development of deep learning (DL), DL-based pneumonia diagnosis has received extensive attention. However, because the differences between pneumonia and normal images are small, the performance of DL methods leaves room for improvement. This research proposes a new fine-grained convolutional neural network (CNN) for children's pneumonia diagnosis (FG-CPD). First, fine-grained CNN classification, which can handle slight differences between images, is investigated. To obtain the raw images from real-world chest X-ray data, the YOLOv4 algorithm is trained to detect and localize the chest region in the raw images. Second, a novel attention network named SGNet is proposed, which integrates the spatial and channel information of the images to locate the discriminative parts of the chest image and thereby enlarge the difference between pneumonia and normal images. Third, an automatic data augmentation method is adopted to increase the diversity of the images and avoid overfitting of FG-CPD. FG-CPD has been tested on the public Chest X-ray 2017 dataset, and the results show that it performs well. FG-CPD is then tested on real chest X-ray images of children aged 3–12 years from Tongji Hospital. The results show that FG-CPD achieves up to 96.91% accuracy, which supports the potential of FG-CPD.
Funding: Supported by the National Natural Science Foundation of China (No. 52277055).
Abstract: Traditional data-driven fault diagnosis methods depend on expert experience to manually extract effective fault features from signals, which has certain limitations. In contrast, deep learning techniques have become a central focus of fault diagnosis research owing to their strong fault feature extraction ability and end-to-end diagnosis efficiency. Recently, research on combining the convolutional neural network (CNN) and the Transformer, which exploits their respective advantages in local and global feature extraction, has shown promise in fault diagnosis. However, the cross-channel convolution mechanism in the CNN and the self-attention calculations in the Transformer make the combined model excessively complex. This complexity results in high computational costs and limited industrial applicability. To tackle these challenges, this paper proposes a lightweight CNN-Transformer named SEFormer for rotating machinery fault diagnosis. First, a separable multiscale depthwise convolution block is designed to extract and integrate multiscale feature information from different channel dimensions of vibration signals. Then, an efficient self-attention block is developed to capture critical fine-grained features of the signal from a global perspective. Finally, experimental results on the planetary gearbox dataset and the motor roller bearing dataset show that the proposed framework balances robustness, generalization, and a lightweight design compared with recent state-of-the-art fault diagnosis models based on CNN and Transformer. This study presents a feasible strategy for developing a lightweight rotating machinery fault diagnosis framework aimed at economical deployment.
Funding: Supported by the Science and Technology Project of Henan Province (No. 222102210081).
Abstract: Joint Multimodal Aspect-based Sentiment Analysis (JMASA) is a significant task in multimodal fine-grained sentiment analysis that combines two subtasks: Multimodal Aspect Term Extraction (MATE) and Multimodal Aspect-oriented Sentiment Classification (MASC). Currently, most existing JMASA models encode text and image features only at a basic level and often neglect in-depth analysis of unimodal intrinsic features. This insufficient learning of intra-modal features may lead to low accuracy in aspect term extraction and poor sentiment prediction. To address this problem, we propose a Text-Image Feature Fine-grained Learning (TIFFL) model for JMASA. First, we construct an enhanced adjacency matrix of word dependencies and adopt a graph convolutional network to learn the syntactic structure features of the text, which addresses the context interference problem in identifying different aspect terms. Then, adjective-noun pairs extracted from the image are introduced to make the semantic representation of visual features more intuitive, which addresses the ambiguous semantic extraction problem during image feature learning. In this way, the model's performance in aspect term extraction and sentiment polarity prediction can be further improved. Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA, MATE, and MASC, validating the effectiveness of the proposed methods.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources such as BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, of which 1835 are labeled with emotions such as "Happy," "Sad," "Angry," and "Neutral." On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
Funding: Supported by the National Key Research and Development Program of China (Nos. 2022YFC3702000 and 2022YFC3703500) and the Key R&D Project of Zhejiang Province (No. 2022C03146).
Abstract: Severe ground-level ozone (O₃) pollution over major Chinese cities has become one of the most challenging problems, with deleterious effects on human health and the sustainability of society. This study explored the spatiotemporal distribution characteristics of ground-level O₃ and its precursors based on conventional pollutant and meteorological monitoring data in Zhejiang Province from 2016 to 2021. Then, a high-performance convolutional neural network (CNN) model was established by expanding the moment and the concentration variations to general factors. Finally, the response mechanism of O₃ to variations in crucial influencing factors was explored by controlling variables and interpolating target variables. The results indicated that the annual average MDA8-90th concentrations in Zhejiang Province are higher in the north and lower in the south. When the wind direction (WD) ranges from east to southwest and the wind speed (WS) is between 2 and 3 m/s, higher O₃ concentrations are prone to occur. At different temperatures (T), the O₃ concentration first increases and then decreases with increasing NO₂ concentration, peaking at an NO₂ concentration of around 0.02 mg/m³. The sensitivity of NO₂ to O₃ formation is not easily affected by temperature, barometric pressure, or dew point temperature. Additionally, there is a minimum IRNO₂ at each temperature when the NO₂ concentration is 0.03 mg/m³, and this minimum IRNO₂ decreases with increasing temperature. The study explores the response mechanism of O₃ to changes in the driving variables, which can provide a scientific foundation and methodological support for the targeted management of O₃ pollution.
Funding: Supported by the National Natural Science Foundation of China (No. U21B2062).
Abstract: The isolated fracture-vug systems controlled by small-scale strike-slip faults within ultra-deep carbonate rocks of the Tarim Basin exhibit significant exploration potential. The study employs a novel training set incorporating innovative fault labels to train a U-Net-structured CNN model, enabling effective identification of small-scale strike-slip faults through seismic data interpretation. Based on the CNN faults, we analyze the distribution patterns of small-scale strike-slip faults. The small-scale strike-slip faults can be categorized into NNW-trending and NE-trending groups with strike lengths ranging from 200 to 5000 m. The development intensity of small-scale strike-slip faults in the Lower Yingshan Member notably exceeds that in the Upper Member. The Lower and Upper Yingshan members are two distinct mechanical layers with contrasting brittleness characteristics, separated by a low-brittleness layer. The superior brittleness of the Lower Yingshan Member enhances the development intensity of small-scale strike-slip faults compared to the Upper Member, while the low-brittleness layer restricts vertical fault propagation. Fracture-vug systems formed by interactions of two or more small-scale strike-slip faults are larger than those controlled by individual faults. All fracture-vug system sizes correlate positively with the vertical extents of the associated small-scale strike-slip faults; in particular, intersection and approaching fracture-vug systems exhibit accelerated size increases proportional to the vertical extents.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFA1009500).
Abstract: With the emphasis on user privacy and communication security, encrypted traffic has increased dramatically, which brings great challenges to traffic classification. GNN-based classification methods can deal with encrypted traffic well. However, existing GNN-based approaches ignore the relationship between client or server packets. In this paper, we design a network traffic topology based on GCN, called Flow Mapping Graph (FMG). FMG establishes sequential edges between vertices by the arrival order of packets and establishes jump-order edges between vertices by connecting packets in different bursts with the same direction. It not only reflects the time characteristics of the packets but also strengthens the relationship between the client or server packets. Based on FMG, a Traffic Mapping Classification model (TMC-GCN) is designed, which can automatically capture and learn the characteristics and structure information of the top vertex in FMG. The TMC-GCN model is used to classify the encrypted traffic. The encrypted traffic classification problem is thus transformed into a graph classification problem, which can effectively deal with data from different data sources and application scenarios. By comparing the performance of TMC-GCN with other classical models on four public datasets, including CICIOT2023, ISCXVPN2016, CICAAGM2017, and GraphDapp, the effectiveness of the FMG algorithm is verified. The experimental results show that the accuracy of the TMC-GCN model is 96.13%, the recall is 95.04%, and the F1 score is 94.54%.
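The graph construction described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_flow_graph`, the encoding of direction as +1 (client to server) and -1 (server to client), and the choice to link the last packet of one burst to the first packet of the next burst in the same direction are all hypothetical, since the abstract does not specify these details.

```python
def build_flow_graph(directions):
    """Build sequential and jump-order edges for one flow.

    directions: list of +1 / -1 values, one per packet, in arrival order
    (hypothetical encoding: +1 = client to server, -1 = server to client).
    Returns (sequential_edges, jump_edges) as lists of (src, dst) packet indices.
    """
    n = len(directions)

    # Sequential edges follow the arrival order of packets.
    sequential_edges = [(i, i + 1) for i in range(n - 1)]

    # A burst is a maximal run of consecutive packets with the same direction.
    bursts = []
    for i, d in enumerate(directions):
        if bursts and directions[bursts[-1][-1]] == d:
            bursts[-1].append(i)
        else:
            bursts.append([i])

    # Jump-order edges (assumed rule): connect the last packet of a burst to
    # the first packet of the next burst that has the same direction.
    jump_edges = []
    previous_burst = {}
    for burst in bursts:
        d = directions[burst[0]]
        if d in previous_burst:
            jump_edges.append((previous_burst[d][-1], burst[0]))
        previous_burst[d] = burst

    return sequential_edges, jump_edges
```

For the direction sequence `[1, 1, -1, 1, -1, -1]` the bursts are `[0, 1]`, `[2]`, `[3]`, and `[4, 5]`, so the sketch links packet 1 to packet 3 (both client side) and packet 2 to packet 4 (both server side) in addition to the five sequential edges.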
Abstract: The ability to accurately predict urban traffic flows is crucial for optimising city operations. Consequently, various methods for forecasting urban traffic have been developed, focusing on analysing historical data to understand complex mobility patterns. Deep learning techniques, such as graph neural networks (GNNs), are popular for their ability to capture spatio-temporal dependencies. However, these models often become overly complex due to the large number of hyper-parameters involved. In this study, we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks (DMST-GNODE), a framework based on ordinary differential equations (ODEs) that autonomously discovers effective spatial-temporal graph neural network (STGNN) architectures for traffic prediction tasks. The comparative analysis of DMST-GNODE and baseline models indicates that the DMST-GNODE model demonstrates superior performance across multiple datasets, consistently achieving the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values, alongside the highest accuracy. On the BKK (Bangkok) dataset, it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval, maintaining this trend across 40 and 60 min. Similarly, on the PeMS08 dataset, DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min, demonstrating its effectiveness over longer periods. The Los_Loop dataset results further emphasise this model's advantage, with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min, consistently maintaining superiority across all time intervals. These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.
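For readers unfamiliar with the two error metrics reported above, the following is a minimal sketch of their standard definitions. The function names `rmse` and `mae` are illustrative; the abstract does not say how its "accuracy" figure is computed, so that metric is deliberately left out.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Because RMSE squares each residual before averaging, it penalises a few large forecasting errors more heavily than MAE does, which is why the two are usually reported together.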
Funding: Supported by the Key Research and Development Program of Jiangsu Province under Grant BE2022059-3, by CTBC Bank through the Industry-Academia Cooperation Project, and by the Ministry of Science and Technology of Taiwan through Grants MOST-108-2218-E-002-055, MOST-109-2223-E-009-002-MY3, MOST-109-2218-E-009-025, and MOST431109-2218-E-002-015.
Abstract: Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, unlike DNNs, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture in which motion vectors (temporal features) are extracted using TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted effective features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This comprehensive approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASMEII), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model achieves outstanding results compared to recent models. The accuracies are 79.01%, 69.22%, and 68.99% on CASMEII, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method also has the advantage of operational efficiency and lower computational time.
Abstract: Human disturbance is one of the main causes of geohazards. The metrics for assessing the ecological impact of roads are numerous, and their criteria are inconsistent. From the perspective of visual observation, environmental damage can be revealed by detecting areas without vegetation cover in images taken along a road. To achieve this, an end-to-end environmental damage detection model based on a convolutional neural network is proposed. A 50-layer residual network is used to extract the feature map, and the initial parameters are optimized by transfer learning. The method is demonstrated with an example. The dataset, which includes cliff and landslide damage, was collected by the authors along a road in Shennongjia National Forest Park. Results show an average precision (AP) of 0.4703 for cliff damage and 0.4809 for landslide damage. Compared with YOLOv3, the model achieves better accuracy in cliff and landslide detection, although a certain amount of speed is sacrificed.
Funding: Supported by the Scientific Research Fund of Hunan Provincial Education Department (23A0361).
Abstract: An evolution inequality of Sobolev type involving a nonlinear convolution term is considered. By using the nonlinear capacity method and a contradiction argument, the non-existence of nontrivial local weak solutions is proved.
Abstract: Aspect-oriented sentiment analysis is a fine-grained sentiment analysis task that aims to analyse the sentiment polarity of specific aspects. Most current research builds graph convolutional networks based on dependency syntax trees, which improves the classification performance of the models to some extent. However, the technical limitations of dependency syntax trees can introduce considerable noise into the model. Meanwhile, it is difficult for a single graph convolutional network to aggregate both the semantic and the syntactic structural information of nodes, which affects the final sentence classification. To cope with these problems, this paper proposes a bi-channel graph convolutional network model. The model introduces a phrase structure tree and transforms it into a hierarchical phrase matrix. The adjacency matrix of the dependency syntax tree and the hierarchical phrase matrix are combined as the initial matrix of the graph convolutional network to enhance the syntactic information. The semantic feature representations of the sentences are obtained by the graph convolutional network with a multi-head attention mechanism and fused to achieve complementary learning of dual-channel features. Experimental results show that the model performs well and improves the accuracy of sentiment classification on three public benchmark datasets, namely Rest14, Lap14, and Twitter.
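As a rough illustration of how two word-level matrices might be merged into one initial graph, the sketch below takes the element-wise maximum of a dependency adjacency matrix and a phrase matrix and adds self-loops. The abstract does not state the actual combination rule, so the union-style merge, the self-loops, and the function name `combine_adjacency` are all assumptions made for illustration.

```python
def combine_adjacency(dep_adj, phrase_adj):
    """Merge two square 0/1 matrices over the same tokens.

    Assumed rule: an edge exists in the combined graph if it exists in
    either input (element-wise maximum), and every token is linked to
    itself, as is common before a graph convolution.
    """
    n = len(dep_adj)
    if len(phrase_adj) != n or any(len(row) != n for row in dep_adj + phrase_adj):
        raise ValueError("both matrices must be square and of the same size")

    combined = [
        [max(dep_adj[i][j], phrase_adj[i][j]) for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        combined[i][i] = 1  # self-loop
    return combined
```

With three tokens, a dependency edge between tokens 0 and 1 and a phrase-level link between tokens 1 and 2 yield a combined graph in which token 1 is connected to both neighbours, so one graph convolution step can mix information from both structures.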
Abstract: The integration of image analysis through deep learning (DL) into rock classification represents a significant leap forward in geological research. While traditional methods remain invaluable for their expertise and historical context, DL offers a powerful complement by enhancing the speed, objectivity, and precision of the classification process. This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks (CNNs) for geological image analysis, particularly in the classification of igneous, metamorphic, and sedimentary rock types from rock thin section (RTS) images. This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision. Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities, achieving an F1-Score of 0.9869 for igneous rocks, 0.9884 for metamorphic rocks, and 0.9929 for sedimentary rocks, all improvements over the baseline results on the original images. Moreover, the weighted average F1-Score across all classes and techniques is 0.9886, indicating an enhancement. Conversely, methods like Distort lead to decreased accuracy and F1-Score, with an F1-Score of 0.949 for igneous rocks, 0.954 for metamorphic rocks, and 0.9416 for sedimentary rocks, a degradation relative to the baseline. The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results. The findings of this study can benefit various fields, including remote sensing, mineral exploration, and environmental monitoring, by enhancing the accuracy of geological image analysis for both scientific research and industrial applications.
Funding: This study was funded by UKRI EPSRC Grant EP/W020408/1, Project SPRITE+ 2: The Security, Privacy, Identity, and Trust Engagement Network plus (phase 2), and by PhD project RS718 on Explainable AI through the UKRI EPSRC Grant-funded Doctoral Training Centre at Swansea University.
Abstract: During its growth stage, a plant is exposed to various diseases. Early detection of crop diseases is a major challenge in the horticulture industry. Crop infections can harm total crop yield and reduce farmers' income if not identified early. Today's accepted method relies on a professional plant pathologist diagnosing the disease by visual inspection of the afflicted plant leaves. This is an excellent use case for Community Assessment and Treatment Services (CATS), because manual disease diagnosis is lengthy and identification accuracy depends directly on the pathologist's skill. An alternative to conventional Machine Learning (ML) methods, which require manual identification of parameters for exact results, is to develop a prototype that can classify without pre-processing. To automatically diagnose tomato leaf disease, this research proposes a hybrid model using the Convolutional Auto-Encoders (CAE) network and the CNN-based deep learning architecture of DenseNet. To date, none of the modern systems described in this paper has a combined model based on DenseNet, CAE, and a Convolutional Neural Network (CNN) to diagnose the ailments of tomato leaves automatically. The models were trained on a dataset obtained from the Plant Village repository. The dataset consisted of 9920 tomato leaves, and the model-to-model accuracy ratio was 98.35%. Unlike other approaches discussed in this paper, this hybrid strategy requires fewer training components. Therefore, the training time to classify plant diseases with the trained algorithm, as well as the training time to automatically detect the ailments of tomato leaves, is significantly reduced.
Funding: The financial support of the National Natural Science Foundation of China (Grant Nos. 41972293, 42272337) and the Science Fund for Distinguished Young Scholars of Hubei Province (Grant No. 2023AFA078) is gratefully acknowledged.
Abstract: Soil response to cavity expansion is inherently rate-dependent, especially in the case of fine-grained soils. To better understand such rate effects, self-boring pressuremeter tests were conducted on Kunming peaty soil within a strain rate range of 0.1%/min to 5.0%/min. The results showed a clear dependence of cavity pressure and excess pore pressure (EPP) on strain rate: both increased with higher rates for a given radial displacement. In light of the experimental results, three cases of cylindrical cavity expansion were investigated using the finite element method and an analytical method: partially drained expansion in Modified Cam-Clay (MCC) soil, and undrained and partially drained expansion in elastoviscoplastic (EVP) soil. The EVP behavior was modeled using the MCC model and the overstress viscoplastic theory. The results indicated that, over the strain rate range of 0.0001%/min to 50%/min, the rate response of cavity pressure for partially drained expansion in MCC soil (permeability coefficient ranging from 5×10⁻⁶ m/s to 2.5×10⁻¹¹ m/s) is not obvious, while the EPP response during undrained expansion in EVP soil is rate-independent. Only the partially drained solution for cavity expansion in EVP soil captured the rate-sensitive responses of both cavity pressure and EPP, as confirmed by the pressuremeter tests on the Kunming peaty soil, Saint-Herblain clay, and Burswood clay. This suggests that the rate effect results from a combination of drainage-related and time-dependent soil behavior. Parametric studies further demonstrated that both viscous behavior and the overconsolidation ratio significantly influence the cylindrical cavity expansion response, and that the drainage conditions during expansion can be assessed using a nondimensional velocity.
Funding: Supported by the National Natural Science Foundation of China (32171777), the Natural Science Foundation of Heilongjiang for Distinguished Young Scientists (JQ2023F002), and the Fundamental Research Funds for Central Universities (2572023CT16).
Abstract: Climate change is a global phenomenon that has profound impacts on ecological dynamics and biodiversity, shaping the interactions between species and their environment. To gain a deeper understanding of the mechanisms driving climate change, phenological monitoring is essential. Traditional methods of defining phenological phases often rely on fixed thresholds. However, with the development of technology, deep learning-based classification models are now able to delineate phenological phases from images more accurately, enabling phenological monitoring. Despite the significant advances these models have made in phenological monitoring, they still face challenges in fully capturing the complexity of biotic-environmental interactions, which can limit the fine-grained accuracy of phenological phase identification. To address this, we propose a novel deep learning model, RESformer, designed to monitor tree phenology at a fine-grained level using PhenoCam images. RESformer features a lightweight structure, making it suitable for deployment in resource-constrained environments. It incorporates a dual-branch routing mechanism that considers both global and local information, thereby improving the accuracy of phenological monitoring. To validate the effectiveness of RESformer, we conducted a case study involving 82,118 images taken over two years at four different locations in Wisconsin, focusing on the phenology of Acer. The images were classified into seven distinct phenological stages, with RESformer achieving an overall monitoring accuracy of 96.02%. Furthermore, we compared RESformer with a phenological monitoring approach based on the Green Chromatic Coordinate (GCC) index and with ten popular classification models. The results showed that RESformer excelled in fine-grained monitoring, effectively capturing and identifying changes in phenological stages. This finding not only provides strong support for monitoring the phenology of Acer species but also offers valuable insights for understanding ecological trends and developing more effective ecosystem conservation and management strategies.
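The Green Chromatic Coordinate used as the comparison baseline is commonly defined as GCC = G / (R + G + B), computed from the digital numbers of the red, green, and blue channels over a region of interest. The sketch below assumes that definition and averages the channels over the region before taking the ratio; the function name and the pixel format (a list of `(r, g, b)` tuples) are illustrative choices, not details taken from the paper.

```python
def green_chromatic_coordinate(pixels):
    """Return GCC = G / (R + G + B) for a region of interest.

    pixels: iterable of (r, g, b) tuples of non-negative numbers.
    Channel values are summed over the region first, which is equivalent
    to using the mean value of each channel in the ratio.
    """
    total_r = total_g = total_b = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b

    total = total_r + total_g + total_b
    if total == 0:
        return 0.0  # an all-black region has no defined greenness
    return total_g / total
```

A neutral grey region gives a GCC of one third, and the value rises as the canopy greens up, which is why threshold-based methods track it through the season.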
Funding: Financial support was provided by the National Natural Science Foundation of China (No. 51364032) and the Inner Mongolia Natural Science Foundation (No. 2022MS05028).
Abstract: Spray deposition was used to produce billets of Mg-4Al-1.5Zn-3Ca-1Nd (A alloy) and Mg-13Al-3Zn-3Ca-1Nd (B alloy), and the evolution of the deformation substructure and the Mg_xZn_yCa_z metastable phase in fine-grained (3 μm) Mg alloys was investigated by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD), and electron backscattered diffraction (EBSD). Different dislocation configurations were found to form in the A and B alloys. Redundant free dislocations (RFDs) and dislocation tangles were the routes by which the deformation substructure formed in the A alloy, whereas only dislocation tangles, and no RFDs, were found in the B alloy. The interaction between nano-scale second-phase particles (nano-scale C15 and β-Mg₁₇(Al,Zn)₁₂ phases) and the different dislocation configurations had a significant effect on the formation of deformation substructures. The mass transfer of Mg_xZn_yCa_z metastable phases and the stacking order of stacking faults were conducive to the formation of Mg-Nd-Zn type long period stacking ordered (LPSO) phases. Nano-scale C15 phases, Mg-Nd-Zn type LPSO phases, the c/a ratio, and β-Mg₁₇(Al,Zn)₁₂ phases were the key factors influencing texture formation. Different textures and grain boundary features (GB features) had a significant effect on the k-value. The non-basal textures were the main factor affecting the k-value in the A alloy, while the high-angle grain boundary (HAGB) was the main factor affecting the k-value in the B alloy.
Funding: Supported by the Beijing Natural Science Foundation (No. 5252014), the Open Fund of the Key Laboratory of Urban Ecological Environment Simulation and Protection, Ministry of Ecology and Environment of the People's Republic of China (No. UEESP-202502), and the National Natural Science Foundation of China (Nos. 62303063 and 32371874).
Abstract: Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audio-visual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) the audiovisual feature extraction module, which adopts a multi-stage finetuning strategy to provide high-quality unimodal features, laying a solid foundation for modality fusion; (2) the audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 and the publicly available SSW60 datasets, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains, outperforming existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
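One simple reading of a max pooling aggregation strategy is an element-wise maximum over the audio and visual feature vectors of a sample, so that the stronger response in each dimension survives the fusion. The sketch below shows only that idea; the paper's actual fusion module and its audiovisual loss are not described in the abstract, and the function name `max_pool_fuse` and the equal-length feature vectors are assumptions.

```python
def max_pool_fuse(audio_features, visual_features):
    """Fuse two equal-length feature vectors by element-wise maximum."""
    if len(audio_features) != len(visual_features):
        raise ValueError("feature vectors must have the same length")
    return [max(a, v) for a, v in zip(audio_features, visual_features)]
```

For example, fusing `[0.2, 0.9, -1.0]` with `[0.5, 0.1, 0.0]` keeps the larger value in each position. Unlike concatenation, this keeps the fused vector the same size as each input, at the cost of discarding the weaker modality's response per dimension.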
Funding: Supported by the National Natural Science Foundation of China (No. 62471034) and the Hebei Natural Science Foundation (No. F2023105001).
Abstract: Fine-grained aircraft target detection in remote sensing holds significant research value and practical applications, particularly in military defense and precision strikes. Given the complexity of remote sensing images, where targets are often small and similar within categories, detecting these fine-grained targets is challenging. To address this, we constructed a fine-grained dataset of remotely sensed airplanes. To handle the obvious head-to-tail distributions and large variations in target size found in fine-grained remote sensing targets, we propose the DWDet fine-grained target detection and recognition algorithm. First, to address the unbalanced category distribution, we adopt an adaptive sampling strategy. In addition, we construct a deformable convolutional block and improve the decoupled head structure to improve the model's detection of deformed targets. Then, we design a localization loss function that improves the model's localization ability for targets of different scales. The experimental results show that our algorithm improves the overall accuracy of the model by 4.1% compared to the baseline model and improves the detection accuracy of small targets by 12.2%. The ablation and comparison experiments also demonstrate the effectiveness of our algorithm.
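The abstract does not specify how the adaptive sampling strategy works. One common way to counter an unbalanced category distribution is to sample each example with probability inversely related to its class frequency; the sketch below illustrates that general idea under that assumption and is not DWDet's actual procedure. The function name `sampling_weights`, the `power` parameter, and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def sampling_weights(labels: Sequence[Hashable], power: float = 1.0) -> List[float]:
    """Return one sampling probability per example.

    Each example is weighted by 1 / (class count ** power):
    power=1.0 gives every class the same total probability mass,
    power=0.0 falls back to uniform sampling over examples.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    raw = [1.0 / (counts[label] ** power) for label in labels]
    total = sum(raw)
    return [w / total for w in raw]


# Toy long-tailed label set: 6 examples of A, 3 of B, 1 of C.
labels = ["A"] * 6 + ["B"] * 3 + ["C"]
weights = sampling_weights(labels)

# Total probability mass assigned to each class.
class_mass = {c: 0.0 for c in set(labels)}
for label, w in zip(labels, weights):
    class_mass[label] += w
```

The resulting weights can be passed to a weighted sampler such as `random.choices(population, weights=weights, k=n)`. Intermediate values of `power` between 0 and 1 soften the rebalancing when fully equalizing the classes would oversample the rare ones too heavily.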
Funding: Supported by the National Natural Science Foundation of China (62276092, 62303167), the Postdoctoral Fellowship Program (Grade C) of the China Postdoctoral Science Foundation (GZC20230707), the Key Science and Technology Program of Henan Province, China (242102211051, 242102211042, 212102310084), the Key Scientific Research Projects of Colleges and Universities in Henan Province, China (25A520009), the China Postdoctoral Science Foundation (2024M760808), and the Henan Province medical science and technology research plan joint construction project (LHGJ2024069).
Abstract: Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources. Recently, Deep Learning (DL) has been widely used in the diagnosis of pulmonary diseases such as pneumonia and tuberculosis. However, traditional feature fusion methods often suffer from feature disparity, information loss, redundancy, and increased complexity, hindering the further extension of DL algorithms. To address these limitations, we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment (Self-FAGCFN) for DL-based medical image classification of respiratory diseases. The network integrates Convolutional Neural Networks (CNNs) for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks (GCNs) within a Graph Neural Network branch to capture features based on graph structure, focusing on significant node representations. Additionally, an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs. To ensure effective feature alignment between the pre- and post-fusion stages, we introduce a feature alignment loss that minimizes disparities. Moreover, to address the limitations of the proposed methods, such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset, we develop a Feature-Centroid Fusion (FCF) strategy and a Multi-Level Feature-Centroid Update (MLFCU) algorithm, respectively. Extensive experiments on the public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis, highlighting its potential for practical medical applications.
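The abstract describes a feature alignment loss that minimizes disparities between pre- and post-fusion features, without giving its form. As a minimal sketch, the code below assumes the simplest plausible choice: a mean squared difference between matching feature vectors, averaged over a batch. This is an illustrative stand-in, not the loss defined for Self-FAGCFN, and the name `alignment_loss` and the toy batch are hypothetical.

```python
from typing import Sequence


def alignment_loss(
    pre_fusion: Sequence[Sequence[float]],
    post_fusion: Sequence[Sequence[float]],
) -> float:
    """Mean squared difference between paired feature vectors.

    Each row is one sample's feature vector; rows are paired by index.
    """
    if not pre_fusion or len(pre_fusion) != len(post_fusion):
        raise ValueError("batches must be non-empty and the same size")
    total = 0.0
    for pre, post in zip(pre_fusion, post_fusion):
        if not pre or len(pre) != len(post):
            raise ValueError("paired vectors must be non-empty and equal length")
        # Per-sample mean squared difference across feature dimensions.
        total += sum((a - b) ** 2 for a, b in zip(pre, post)) / len(pre)
    # Average over the batch.
    return total / len(pre_fusion)


# Toy batch of two samples with two-dimensional features.
pre = [[1.0, 2.0], [0.0, 0.0]]
post = [[1.0, 0.0], [0.0, 0.0]]
loss = alignment_loss(pre, post)
```

The loss is zero when the fused features match the pre-fusion features exactly and grows with their squared distance, so adding it to the classification objective penalizes fusion that drifts far from the original representations.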