In this work, a three-dimensional (3D) convolutional neural network (CNN) model based on image slices of various normal and pathological vocal folds is proposed for accurate and efficient prediction of glottal flows. The 3D CNN model is composed of a feature extraction block and a regression block. The feature extraction block learns low-dimensional features from the high-dimensional image data of the glottal shape, and the regression block flattens the output of the feature extraction block and produces the desired glottal flow data. The input image data are a condensed set of 2D image slices captured in the axial plane of the 3D vocal folds, where the glottal shapes are synthesized from the equations of normal vibration modes. The output flow data are the corresponding flow rate, averaged glottal pressure, and nodal pressure distributions over the glottal surface. The 3D CNN model is built to establish the mapping between the input image data and the output flow data. The ground-truth flow variables of each glottal shape in the training and test datasets are obtained with a high-fidelity sharp-interface immersed-boundary solver. The proposed model is trained to predict the flow variables of interest for glottal shapes in the test set. The present 3D CNN model is more efficient than traditional computational fluid dynamics (CFD) models while retaining accuracy, and more powerful than previous data-driven prediction models because it provides more details of the glottal flow. The accuracy and efficiency of the trained 3D CNN model indicate that it could be promising for future clinical applications.
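To make the idea of a 3D convolution over stacked 2D slices concrete, here is a minimal pure-Python sketch. It is not the model from the abstract: the function name `conv3d_valid`, the toy 3×3×3 volume, and the all-ones kernel are assumptions chosen for illustration, and a real network would use learned kernels in a deep learning framework.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation.

    Both arguments are nested lists indexed as [slice][row][col].
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                acc = 0.0
                # Sum the element-wise product over the kernel window.
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy input: three axial slices of 3x3 pixels, all ones.
slices = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
# Hypothetical 2x2x2 kernel of ones (a trained model would learn these values).
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(slices, kernel)

# Flatten the feature volume into a vector, as a regression block would
# do before mapping features to flow variables.
flat = [v for plane in features for row in plane for v in row]
```

Because the kernel spans several slices at once, each output value mixes information across neighbouring axial slices, which is what distinguishes this from applying a 2D convolution to each slice separately.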
3D sparse convolution has emerged as a pivotal technique for efficient voxel-based perception in autonomous systems, enabling selective feature extraction from non-empty voxels while suppressing computational waste. Despite its theoretical efficiency advantages, practical implementations face under-explored limitations: the fixed geometric patterns of conventional sparse convolutional kernels inevitably process non-contributory positions during sliding-window operations, particularly in regions with uneven point cloud density. To address this, we propose Hierarchical Shape Pruning for 3D Sparse Convolution (HSP-S), which dynamically eliminates redundant kernel stripes through layer-adaptive thresholding. Unlike static soft pruning methods, HSP-S maintains trainable sparsity patterns by progressively adjusting pruning thresholds during optimization, enlarging the original parameter search space while removing redundant operations. Extensive experiments validate the effectiveness of HSP-S across major autonomous driving benchmarks. On KITTI’s 3D object detection task, our method removes 93.47% of redundant kernel computations while maintaining comparable accuracy (1.56% mAP drop). Remarkably, on the more complex NuScenes benchmark, HSP-S achieves simultaneous computation reduction (21.94% sparsity) and accuracy gains (1.02% mAP (mean Average Precision) and 0.47% NDS (nuScenes detection score) improvement), demonstrating its scalability to diverse perception scenarios. This work establishes the first learnable shape pruning framework that simultaneously enhances computational efficiency and preserves detection accuracy in 3D perception systems.
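The general idea of pruning kernel positions against a threshold can be sketched as follows. This is not the HSP-S algorithm, whose details the abstract does not give: the L1-norm criterion, the linear threshold schedule, and the names `prune_kernel_positions` and `threshold_at` are all assumptions made for illustration.

```python
from itertools import product


def prune_kernel_positions(weights, threshold):
    """Mask out kernel offsets whose weights are small.

    weights: dict mapping a kernel offset (dz, dy, dx) to a list of
             per-channel weights at that offset.
    Returns (mask, sparsity), where mask[offset] is True if the offset
    is kept and sparsity is the fraction of offsets pruned.
    """
    mask = {}
    for offset, channel_weights in weights.items():
        l1_norm = sum(abs(w) for w in channel_weights)  # assumed criterion
        mask[offset] = l1_norm >= threshold
    pruned = sum(1 for keep in mask.values() if not keep)
    return mask, pruned / len(mask)


def threshold_at(epoch, total_epochs, final_threshold):
    """Hypothetical schedule: raise the threshold linearly during training."""
    return final_threshold * min(1.0, epoch / total_epochs)


# Hypothetical 3x3x3 kernel with two channels; weights shrink with the
# Manhattan distance from the kernel centre.
weights = {
    (dz, dy, dx): [1.0 / (1 + abs(dz) + abs(dy) + abs(dx))] * 2
    for dz, dy, dx in product((-1, 0, 1), repeat=3)
}

mask, sparsity = prune_kernel_positions(weights, threshold=0.6)
```

With these toy weights only the eight corner offsets fall below the threshold, so the kernel shape shrinks from 27 to 19 active positions. A sparse convolution can then skip the pruned offsets entirely when gathering neighbouring voxels.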
Data-driven models have become increasingly prominent in the building, architecture, and construction industries. One area ideally suited to exploit this powerful new technology is building performance simulation. Physics-based models have traditionally been used to estimate the energy flow, air movement, and heat balance of buildings. However, physics-based models require many assumptions, significant computational power, and a considerable amount of time to output predictions. Artificial neural networks (ANNs) with prefabricated or simulated data are likely to be a more feasible option for environmental analysis conducted by designers during the early design phase. Because ANNs require fewer inputs and shorter computation times and offer superior performance and potential for data augmentation, they have received increased attention for predicting the surface solar radiation on buildings. Furthermore, ANNs can provide innovative and quick design solutions, enabling designers to receive instantaneous feedback on the effects of a proposed change to a building's design. This research introduces deep learning methods as a means of simulating the annual radiation intensities and exposure level of buildings without the need for physics-based engines. We propose the CoolVox model to demonstrate the feasibility of using 3D convolutional neural networks to predict the surface radiation on building facades. The CoolVox model accurately predicted the radiation intensities of building facades under different boundary conditions and performed better than ARINet (with average mean square errors of 0.01 and 0.036, respectively) in predicting the radiation intensity both with (validation error = 0.0165) and without (validation error = 0.0066) the presence of boundary buildings.
Deep convolutional neural networks (CNNs) have demonstrated remarkable performance in video super-resolution (VSR). However, the ability of most existing methods to recover fine details in complex scenes is often hindered by the loss of shallow texture information during feature extraction. To address this limitation, we propose a 3D Convolutional Enhanced Residual Video Super-Resolution Network (3D-ERVSNet). This network employs a forward and backward bidirectional propagation module (FBBPM) that aligns features across frames using explicit optical flow through lightweight SPyNet. By incorporating an enhanced residual structure (ERS) with skip connections, shallow and deep features are effectively integrated, enhancing texture restoration capabilities. Furthermore, a 3D convolution module (3DCM) is applied after the backward propagation module to implicitly capture spatio-temporal dependencies. The architecture combines these components so that FBBPM extracts aligned features, ERS fuses hierarchical representations, and 3DCM refines temporal coherence. Finally, a deep feature aggregation module (DFAM) fuses the processed features, and a pixel-upsampling module (PUM) reconstructs the high-resolution (HR) video frames. Comprehensive evaluations on the REDS, Vid4, UDM10, and Vim4 benchmarks demonstrate strong performance, including 30.95 dB PSNR/0.8822 SSIM on REDS and 32.78 dB/0.8987 on Vim4. 3D-ERVSNet achieves significant gains over baselines while maintaining high efficiency with only 6.3M parameters and 77 ms/frame runtime (i.e., 20× faster than RBPN). The network’s effectiveness stems from its task-specific asymmetric design that balances explicit alignment and implicit fusion.
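The PSNR figures quoted above follow the standard definition, PSNR = 10·log10(MAX² / MSE). The sketch below shows that formula on a toy signal; the function name `psnr` and the four-pixel example values are assumptions for illustration and are unrelated to the benchmark data.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixels in [0, 1]; every estimate is off by 0.1, so the
# mean squared error is about 0.01.
reference = [0.0, 0.5, 1.0, 0.5]
estimate = [0.1, 0.4, 0.9, 0.6]

value_db = psnr(reference, estimate)
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.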
Cerenkov Luminescence Tomography (CLT) is a novel and promising imaging modality that can display the three-dimensional distribution of radioactive probes. However, because the inverse problem is severely ill-posed, obtaining accurate reconstruction results is still a challenge for traditional model-based methods. Recently emerged deep learning-based methods can directly learn the mapping between the surface photon intensity and the distribution of the radioactive source, which effectively improves the performance of CLT reconstruction. However, the previously proposed deep learning-based methods do not work well when the order of the input is disarranged. In this paper, a novel 3D graph convolution-based residual network, GCR-Net, is proposed, which can obtain a robust and accurate reconstruction result from the photon intensity of the surface. Additionally, it is proved that the network is insensitive to the order of the input. The performance of this method was evaluated with numerical simulations and in vivo experiments. The results demonstrated that, compared with existing methods, the proposed method can achieve efficient and accurate reconstruction in localization and shape recovery by utilizing three-dimensional information.
Because behavior recognition is based on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) and long short-term memory (LSTM). First, the residual module is extended to three dimensions so that it can extract features in the time and space domains simultaneously. Second, the size of the pooling layer window is changed to preserve the integrity of the time-domain features; at the same time, a batch normalization (BN) layer and a dropout layer are added to overcome the difficulty of network training and the over-fitting problem. After that, because the global average pooling (GAP) layer is affected by the size of the feature map and the network cannot be further deepened, a convolution layer and a max-pooling layer are added to the R3D network. Finally, because LSTM can memorize information and extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset.
Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising solution to overcoming communication barriers, particularly for individuals with temporary or permanent speech impairments. However, most Visual Speech Recognition (VSR) research has primarily focused on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL) based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgeries, whether temporarily or permanently impaired. To ensure relevance and effectiveness in real-world scenarios, a carefully curated vocabulary of twenty-five Italian words was selected, encompassing critical semantic fields such as Needs, Questions, Answers, Emergencies, Greetings, Requests, and Body Parts. These words were chosen to address both essential daily communication and urgent medical assistance requests. Our approach combines a spatiotemporal Convolutional Neural Network (CNN) with a bidirectional Long Short-Term Memory (BiLSTM) recurrent network and a Connectionist Temporal Classification (CTC) loss function to recognize individual words without requiring predefined word boundaries. The experimental results demonstrate the system’s robust performance in recognizing target words, reaching an average accuracy of 96.4% in individual word recognition, suggesting that the system is particularly well-suited for offering support in constrained clinical and caregiving environments, where quick and reliable communication is critical. In conclusion, the study highlights the importance of developing language-specific, application-driven VSR solutions, particularly for non-English languages with limited linguistic resources. By bridging the gap between deep learning-based lip-reading and real-world clinical needs, this research advances assistive communication technologies, paving the way for more inclusive and medically relevant applications of VSR in rehabilitation and healthcare.
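A CTC-trained model emits one label per video frame, including a special blank symbol, which is why no word boundaries are needed. The standard greedy decoding rule (collapse repeated labels, then drop blanks) can be sketched as below. The function name `ctc_greedy_decode`, the blank symbol `-`, and the example word are assumptions for illustration; the abstract does not state which decoder or which twenty-five words the study used.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse consecutive repeats, then remove blank symbols."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# Hypothetical per-frame best labels for an 11-frame clip.
frames = list("cc-ii-aa-oo")
word = ctc_greedy_decode(frames)

# A blank between two identical labels keeps both of them.
doubled = ctc_greedy_decode(list("--aa-a--"))
```

The blank symbol is what lets the decoder tell a genuinely doubled letter from a single letter held across several frames.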
An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed to address single-scale spatiotemporal feature extraction, information redundancy, and insufficient extraction of frequency-domain information in channels in 3D convolutional neural networks. First, based on a 3D CNN, this paper designs a new multilevel spatiotemporal feature fusion (MSF) structure, which is embedded in the network model. Mainly through multilevel spatiotemporal feature separation, splicing, and fusion, it fuses spatial receptive fields and short-, medium-, and long-term time series information at different scales while reducing network parameters. Second, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, we embed the proposed method into the R3D model, which replaces the 2D convolutional filters in the 2D ResNet with 3D convolutional filters, and conduct extensive experimental validation on the small and medium-sized dataset UCF101 and the large-sized dataset Kinetics-400. The findings show that our model increases the recognition accuracy on both datasets. In particular, results on the UCF101 dataset demonstrate that our model outperforms R3D, with a maximum recognition accuracy improvement of 7.2% while using 34.2% fewer parameters. The MSF and FSAM are also migrated to another traditional 3D action recognition model, C3D, for application testing. The test results on UCF101 show that the recognition accuracy is improved by 8.9%, demonstrating the strong generalization ability and universality of the proposed method.
Mural paintings hold significant historical information and possess substantial artistic and cultural value. However, murals are inevitably damaged by natural environmental factors such as wind and sunlight, as well as by human activities. For this reason, the study of damaged areas is crucial for mural restoration. These damaged regions differ significantly from undamaged areas and can be considered abnormal targets. Traditional manual visual processing lacks strong characterization capabilities and is prone to omissions and false detections. Hyperspectral imaging can reflect material properties more effectively than visual characterization methods. Thus, this study employs hyperspectral imaging to obtain mural information and proposes a mural anomaly detection algorithm based on a hyperspectral multi-scale residual attention network (HM-MRANet). The innovations of this paper include: (1) constructing mural painting hyperspectral datasets; (2) proposing a multi-scale residual spectral-spatial feature extraction module based on a 3D CNN (convolutional neural network) to better capture multi-scale information and improve performance on small-sample hyperspectral datasets; and (3) proposing the Enhanced Residual Attention Module (ERAM) to address the feature redundancy problem, enhance the network’s feature discrimination ability, and further improve abnormal area detection accuracy. The experimental results show that the AUC (Area Under Curve), Specificity, and Accuracy of the proposed algorithm reach 85.42%, 88.84%, and 87.65%, respectively, on this dataset. These results represent improvements of 3.07%, 1.11%, and 2.68% over the SSRN algorithm, demonstrating the effectiveness of this method for mural anomaly detection.
Objective: To construct a prediction model for the three-dimensional (3D) dose distribution of iodine-131 based on SPECT/CT radiomics features. Methods: A multi-scale feature pyramid network (MSFPN) was used to extract heterogeneity features of thyroid tissues before and after iodine-131 treatment. A spatiotemporal modeling framework (MSFPN+CNN+GAT+3D model) integrating a convolutional neural network (CNN) and a graph attention network (GAT) was established to achieve precise dose distribution prediction. Clinical and imaging data from 320 patients at a tertiary hospital were divided into training, validation, and test sets at a 7:2:1 ratio. The models were evaluated for iodine-131 residence time and time-activity curves (TACs) of key target organs. Predictive accuracy was assessed using root mean square error (RMSE), mean absolute error (MAE), and gamma index. Results: The residence time of iodine-131 in the thyroid, bladder, and stomach was longer than that in the bone marrow (P<0.05). Following iodine-131 treatment, the activity curve of bone marrow showed minimal variation over time. The bladder tissue exhibited an initial increase in activity, reaching its peak at 4 h, followed by a gradual decline. Both the thyroid and gastric tissues demonstrated a decreasing trend in activity over time, with the gastric tissue displaying even lower dose levels than the thyroid. The RMSE and MAE of the MSFPN+CNN+GAT+3D model for the dose distribution of each target organ were lower than those of the MSFPN, ResNet, and 3D CNN models, and its γ index was higher than those of the MSFPN, ResNet, and 3D CNN models (P<0.05); the differences from the ensemble model were not statistically significant (P>0.05). Conclusion: The MSFPN+CNN+GAT+3D model systematically captures deep radiomics features, effectively facilitating the imaging evaluation of the accurate dose distribution of multiple organs during radioactive iodine-131 treatment.
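The 7:2:1 split and the two error metrics named above can be illustrated with a short sketch. The per-set counts below are only what the stated ratio implies for 320 patients, since the abstract does not report them; the helper names `split_counts`, `rmse`, and `mae`, and the toy dose values, are assumptions for illustration. The gamma index is not covered here.

```python
import math


def split_counts(n, ratios=(7, 2, 1)):
    """Return set sizes for n samples divided according to integer ratios."""
    total = sum(ratios)
    counts = [n * r // total for r in ratios]
    counts[0] += n - sum(counts)  # assign any rounding remainder to training
    return tuple(counts)


def rmse(predicted, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


train_n, val_n, test_n = split_counts(320)

# Hypothetical predicted and reference dose values (arbitrary units).
predicted = [1.0, 2.0, 3.0]
actual = [1.0, 2.0, 5.0]
errors = (rmse(predicted, actual), mae(predicted, actual))
```

RMSE penalizes the single large error in the toy data more heavily than MAE does, which is why the two metrics are usually reported together.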
Purpose: The abnormal behaviors of staff at petroleum stations pose significant safety hazards. Addressing the challenges of high parameter counts, lengthy training periods, and low recognition rates in existing 3D ResNet behavior recognition models, this paper proposes GTB-ResNet, a network designed to detect abnormal behaviors in petroleum station staff. Design/methodology/approach: First, to mitigate the issues of excessive parameters and computational complexity in 3D ResNet, a lightweight residual convolution module called the Ghost residual module (GhostNet) is introduced in the feature extraction network. Ghost convolution replaces standard convolution, reducing model parameters while preserving multi-scale feature extraction capabilities. Second, to enhance the model’s focus on salient features amid wide surveillance ranges and small target objects, the triplet attention mechanism module is integrated to facilitate spatial and channel information interaction. Last, to address the challenge of short time-series features leading to misjudgments of similar actions, a bidirectional gated recurrent network is added to the feature extraction backbone network. This ensures the extraction of key long time-series features, thereby improving feature extraction accuracy. Findings: The experimental setup encompasses four behavior types: illegal phone answering, smoking, falling (abnormal), and touching the face (normal), comprising a total of 892 videos. Experimental results show GTB-ResNet achieving a recognition accuracy of 96.7% with a model parameter count of 4.46 M and a computational complexity of 3.898 G. This represents a 4.4% improvement over 3D ResNet, with reductions of 90.4% in parameters and 61.5% in computational complexity. Originality/value: Specifically designed for edge devices in oil stations, the 3D ResNet network is tailored for real-time action prediction. To address the challenges posed by the large number of parameters in 3D ResNet networks and the difficulties in deployment on edge devices, a lightweight residual module based on ghost convolution is developed. Additionally, to tackle the issue of low detection accuracy of behaviors in the noisy environment of petroleum stations, a triplet attention mechanism is introduced during feature extraction to enhance focus on salient features. Moreover, to overcome the potential for misjudgments arising from the similarity of actions, a Bi-GRU model is introduced to enhance the extraction of key long-term features.
Numerical weather prediction of wind speed requires statistical postprocessing of systematic errors to obtain reliable and accurate forecasts. However, use of postprocessing models is often undesirable for extreme weather events such as gales. Here, we propose a postprocessing algorithm based on a gale-aware deep attention network to simultaneously improve wind speed forecasts and gale area warnings. Specifically, the algorithm includes both a gale-aware loss function that focuses the model on potential gale areas and an observation station supervision strategy that alleviates the problem of missing extreme values caused by data gridding. The effectiveness of the proposed model was verified using data from 235 wind speed observation stations. Experimental results show that our model can produce wind speed forecasts with a root-mean-square error of 1.1547 m s^(-1) and a Hanssen–Kuipers discriminant score of 0.517, performance that is superior to that of the other postprocessing algorithms considered.
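The Hanssen–Kuipers discriminant reported above is a standard score for yes/no event forecasts: the probability of detection minus the probability of false detection, computed from a 2×2 contingency table. The sketch below shows the calculation on toy data. The function names, the six forecast/observation pairs, and the 17.2 m/s gale threshold are assumptions for illustration; the abstract does not state the threshold used in the study.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives for an event
    defined as wind speed >= threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        forecast_event = f >= threshold
        observed_event = o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """Probability of detection minus probability of false detection.

    Assumes at least one observed event and one observed non-event.
    """
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod - pofd


# Hypothetical wind speeds in m/s at six stations.
forecast = [18.0, 12.0, 19.5, 9.0, 17.5, 8.0]
observed = [20.0, 18.0, 10.0, 8.5, 17.3, 7.0]

table = contingency(forecast, observed, threshold=17.2)
score = hanssen_kuipers(*table)
```

The score ranges from -1 to 1, with 0 meaning no skill beyond random forecasting and 1 meaning every event and non-event is classified correctly.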
Funding (3D CNN glottal flow prediction study): Supported by the Open Project of Key Laboratory of Computational Aerodynamics, AVIC Aerodynamics Research Institute (Grant No. YL2022XFX0409).
Funding (3D-ERVSNet video super-resolution study): Supported in part by the Basic and Applied Basic Research Foundation of Guangdong Province [2025A1515011566]; in part by the State Key Laboratory for Novel Software Technology, Nanjing University [KFKT2024B08]; in part by Leading Talents in Gusu Innovation and Entrepreneurship [ZXL2023170]; and in part by the Basic Research Programs of Taicang 2024 [TC2024JC32].
Funding (GCR-Net Cerenkov luminescence tomography study): National Key Research and Development Program of China (2019YFC1521102); National Natural Science Foundation of China (61701403, 61806164, 62101439, 61906154); China Postdoctoral Science Foundation (2018M643719); Natural Science Foundation of Shaanxi Province (2020JQ-601); Young Talent Support Program of the Shaanxi Association for Science and Technology (20190107); Key Research and Development Program of Shaanxi Province (2019GY-215, 2021ZDLSF06-04); Major Research and Development Project of Qinghai (2020-SF-143).
Funding (R3D+LSTM behavior recognition study): Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280), the Shaanxi Province Natural Science Basic Research Program (No. 2021JM-459), and the National Natural Science Foundation of China (No. 61772417).
Abstract: Because behavior recognition is based on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) with long short-term memory (LSTM). First, the residual module is extended to three dimensions so that features can be extracted in the time and space domains simultaneously. Second, the window size of the pooling layer is changed to preserve the integrity of the time-domain features; at the same time, batch normalization (BN) and dropout layers are added to ease network training and reduce over-fitting. Third, because the global average pooling (GAP) layer is affected by the size of the feature map, the network cannot be deepened further, so a convolution layer and a max-pooling layer are added to the R3D network. Finally, because LSTM can memorize information and extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset.
Abstract: Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising solution to overcoming communication barriers, particularly for individuals with temporary or permanent speech impairments. However, most Visual Speech Recognition (VSR) research has focused on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL)-based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgeries, whether temporarily or permanently impaired. To ensure relevance and effectiveness in real-world scenarios, a carefully curated vocabulary of twenty-five Italian words was selected, encompassing critical semantic fields such as Needs, Questions, Answers, Emergencies, Greetings, Requests, and Body Parts. These words were chosen to address both essential daily communication and urgent medical assistance requests. Our approach combines a spatiotemporal Convolutional Neural Network (CNN) with a bidirectional Long Short-Term Memory (BiLSTM) recurrent network and a Connectionist Temporal Classification (CTC) loss function to recognize individual words without requiring predefined word boundaries. The experimental results demonstrate the system's robust performance in recognizing target words, reaching an average accuracy of 96.4% in individual word recognition. This suggests that the system is particularly well suited to constrained clinical and caregiving environments, where quick and reliable communication is critical. In conclusion, the study highlights the importance of developing language-specific, application-driven VSR solutions, particularly for non-English languages with limited linguistic resources. By bridging the gap between deep learning-based lip-reading and real-world clinical needs, this research advances assistive communication technologies, paving the way for more inclusive and medically relevant applications of VSR in rehabilitation and healthcare.
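CTC avoids predefined boundaries by letting the network emit one label per video frame, including a special blank, and then collapsing the sequence. The sketch below shows the standard greedy collapse rule (merge consecutive repeats, then drop blanks). The character-level labels, the blank symbol `_`, and the example word are assumptions for illustration; the abstract does not state the label units or list the vocabulary.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence using the CTC rule.

    1. Merge runs of identical consecutive labels.
    2. Remove blank labels.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


# Hypothetical per-frame argmax outputs for a 12-frame clip.
frames = list("cc_ii_aa__oo")
decoded = ctc_greedy_decode(frames)
```

A blank between two identical labels keeps them distinct, which is how CTC represents doubled letters: `a_a` decodes to `aa`, while `aa` decodes to `a`.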
Funding: Supported by the General Program of the National Natural Science Foundation of China (62272234), the Enterprise Cooperation Project (2022h160), and the Priority Academic Program Development of Jiangsu Higher Education Institutions Project.
Abstract: An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed to address the single-scale extraction of spatiotemporal features, information redundancy, and insufficient extraction of frequency-domain information in channels in 3D convolutional neural networks. First, based on a 3D CNN, this paper designs a new multilevel spatiotemporal feature fusion (MSF) structure, which is embedded in the network model. Through multilevel spatiotemporal feature separation, splicing, and fusion, it fuses spatial receptive fields and short-, medium-, and long-term time-series information at different scales while reducing network parameters. Second, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, we embed the proposed method into the R3D model, which replaces the 2D convolutional filters in the 2D ResNet with 3D convolutional filters, and conduct extensive experimental validation on the small-to-medium-sized dataset UCF101 and the large-sized dataset Kinetics-400. The results show that our model increases the recognition accuracy on both datasets. On UCF101 in particular, our model outperforms R3D with a maximum recognition accuracy improvement of 7.2% while using 34.2% fewer parameters. The MSF and FSAM are also migrated to another traditional 3D action recognition model, C3D, for application testing. The test results on UCF101 show that the recognition accuracy is improved by 8.9%, demonstrating the strong generalization ability and universality of the proposed method.
Funding: Supported by the Key Research and Development Plan of the Ministry of Science and Technology (No. 2023YFF0906200); Shaanxi Key Research and Development Plan (No. 2018ZDXM-SF-093); Shaanxi Province Key Industrial Innovation Chain (Nos. S2022-YF-ZDCXL-ZDLGY-0093 and 2023-ZDLGY-45); Light of West China (No. XAB2022YN10); the China Postdoctoral Science Foundation (No. 2023M740760); Shaanxi Key Research and Development Plan (No. 2024SF-YBXM-678).
Abstract: Mural paintings hold significant historical information and possess substantial artistic and cultural value. However, murals are inevitably damaged by natural environmental factors such as wind and sunlight, as well as by human activities. For this reason, the study of damaged areas is crucial for mural restoration. These damaged regions differ significantly from undamaged areas and can be considered abnormal targets. Traditional manual visual processing lacks strong characterization capabilities and is prone to omissions and false detections. Hyperspectral imaging can reflect material properties more effectively than visual characterization methods. Thus, this study employs hyperspectral imaging to obtain mural information and proposes a mural anomaly detection algorithm based on a hyperspectral multi-scale residual attention network (HM-MRANet). The innovations of this paper include: (1) constructing mural painting hyperspectral datasets; (2) proposing a multi-scale residual spectral-spatial feature extraction module based on a 3D convolutional neural network (CNN) to better capture multi-scale information and improve performance on small-sample hyperspectral datasets; (3) proposing the Enhanced Residual Attention Module (ERAM) to address the feature redundancy problem, enhance the network's feature discrimination ability, and further improve abnormal area detection accuracy. The experimental results show that the AUC (Area Under Curve), Specificity, and Accuracy of the proposed algorithm reach 85.42%, 88.84%, and 87.65%, respectively, on this dataset. These results represent improvements of 3.07%, 1.11%, and 2.68% over the SSRN algorithm, demonstrating the effectiveness of this method for mural anomaly detection.
Abstract: Objective: To construct a prediction model for the three-dimensional (3D) dose distribution of iodine-131 based on SPECT/CT radiomics features. Methods: A multi-scale feature pyramid network (MSFPN) was used to extract heterogeneity features of thyroid tissues before and after iodine-131 treatment. A spatiotemporal modeling framework (MSFPN+CNN+GAT+3D model) integrating a convolutional neural network (CNN) and a graph attention network (GAT) was established to achieve precise dose distribution prediction. Clinical and imaging data from 320 patients at a tertiary hospital were divided into training, validation, and test sets at a 7:2:1 ratio. The models were evaluated for iodine-131 residence time and time-activity curves (TACs) of key target organs. Predictive accuracy was assessed using root mean square error (RMSE), mean absolute error (MAE), and gamma index. Results: The residence time of iodine-131 in the thyroid, bladder, and stomach was longer than that in the bone marrow (P<0.05). Following iodine-131 treatment, the activity curve of bone marrow showed minimal variation over time. The bladder tissue exhibited an initial increase in activity, reaching its peak at 4 h, followed by a gradual decline. Both the thyroid and gastric tissues demonstrated a decreasing trend in activity over time, with the gastric tissue displaying even lower dose levels than the thyroid. The RMSE and MAE of the MSFPN+CNN+GAT+3D model for the dose distribution of each target organ were lower than those of the MSFPN, ResNet, and 3D CNN models, and the γ index was higher than those of the MSFPN, ResNet, and 3D CNN models (P<0.05); the differences from the ensemble model were not statistically significant (P>0.05). Conclusion: The MSFPN+CNN+GAT+3D model systematically captures deep radiomics features, effectively facilitating the imaging evaluation of the accurate dose distribution of multiple organs during radioactive iodine-131 treatment.
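The two error metrics named above and the 7:2:1 partition are simple enough to sketch directly. The code below uses the textbook definitions of RMSE and MAE and assumes an exactly proportional split, which for 320 patients works out to 224/64/32; the abstract does not report the actual set sizes, and the dose values in the example are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def split_counts(total, ratios=(7, 2, 1)):
    """Set sizes for a proportional split; any remainder goes to the first set."""
    unit = total / sum(ratios)
    counts = [int(unit * r) for r in ratios]
    counts[0] += total - sum(counts)
    return tuple(counts)


# Hypothetical predicted vs. reference dose values (arbitrary units).
pred = [1.0, 2.0, 4.0]
true = [1.0, 3.0, 2.0]
errors = (rmse(pred, true), mae(pred, true))
sizes = split_counts(320)
```

RMSE penalises large individual errors more heavily than MAE, which is why the two are usually reported together.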
Abstract: Purpose: The abnormal behaviors of staff at petroleum stations pose significant safety hazards. Addressing the challenges of high parameter counts, lengthy training periods, and low recognition rates in existing 3D ResNet behavior recognition models, this paper proposes GTB-ResNet, a network designed to detect abnormal behaviors in petroleum station staff. Design/methodology/approach: First, to mitigate the excessive parameters and computational complexity of 3D ResNet, a lightweight residual convolution module called the Ghost residual module (GhostNet) is introduced in the feature extraction network. Ghost convolution replaces standard convolution, reducing model parameters while preserving multi-scale feature extraction capabilities. Second, to enhance the model's focus on salient features amid wide surveillance ranges and small target objects, the triplet attention mechanism module is integrated to facilitate spatial and channel information interaction. Lastly, to address the problem of short time-series features leading to misjudgments of similar actions, a bidirectional gated recurrent network is added to the feature extraction backbone network. This ensures the extraction of key long time-series features, thereby improving feature extraction accuracy. Findings: The experimental setup encompasses four behavior types: illegal phone answering, smoking, falling (abnormal), and touching the face (normal), comprising a total of 892 videos. Experimental results show that GTB-ResNet achieves a recognition accuracy of 96.7% with a model parameter count of 4.46 M and a computational complexity of 3.898 G. This represents a 4.4% improvement over 3D ResNet, with reductions of 90.4% in parameters and 61.5% in computational complexity. Originality/value: Specifically designed for edge devices in oil stations, the 3D ResNet network is tailored for real-time action prediction. To address the challenges posed by the large number of parameters in 3D ResNet networks and the difficulties in deployment on edge devices, a lightweight residual module based on ghost convolution is developed. Additionally, to tackle the low detection accuracy of behaviors in the noisy environment of petroleum stations, a triplet attention mechanism is introduced during feature extraction to enhance focus on salient features. Moreover, to overcome the potential for misjudgments arising from the similarity of actions, a Bi-GRU model is introduced to enhance the extraction of key long-term features.
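To see why replacing standard convolution with ghost convolution reduces parameters, the weight counts of the two layer types can be compared. The sketch below follows the general GhostNet idea (a smaller primary convolution produces "intrinsic" feature maps, and cheap depthwise operations generate the rest), extended to 3D kernels as an assumption. The channel sizes, `ratio=2`, and `cheap_k=3` are hypothetical, and the paper's actual module may differ; the 90.4% reduction it reports is for the whole network, not for a single layer.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a standard 3D convolution with a k*k*k kernel (no bias)."""
    return c_in * c_out * k ** 3


def ghost_conv3d_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a ghost-style 3D convolution (no bias).

    A primary convolution yields c_out / ratio intrinsic maps; each intrinsic
    map then spawns (ratio - 1) extra maps through a depthwise cheap_k^3 filter.
    """
    if c_out % ratio:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = conv3d_params(c_in, intrinsic, k)
    cheap = intrinsic * (ratio - 1) * cheap_k ** 3
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3x3x3 kernel.
standard = conv3d_params(64, 128, 3)
ghost = ghost_conv3d_params(64, 128, 3)
reduction = 1 - ghost / standard
```

With `ratio=2` the layer needs roughly half the weights of the standard convolution; larger ratios reduce the count further at the cost of relying more on the cheap operations.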
Funding: Supported by the National Natural Science Foundation of China (62106169).
Abstract: Numerical weather prediction of wind speed requires statistical postprocessing of systematic errors to obtain reliable and accurate forecasts. However, use of postprocessing models is often undesirable for extreme weather events such as gales. Here, we propose a postprocessing algorithm based on a gale-aware deep attention network to simultaneously improve wind speed forecasts and gale area warnings. Specifically, the algorithm includes both a gale-aware loss function that focuses the model on potential gale areas and an observation station supervision strategy that alleviates the problem of missing extreme values caused by data gridding. The effectiveness of the proposed model was verified using data from 235 wind speed observation stations. Experimental results show that our model can produce wind speed forecasts with a root-mean-square error of 1.1547 m s⁻¹ and a Hanssen–Kuipers discriminant score of 0.517, performance that is superior to that of the other postprocessing algorithms considered.
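The Hanssen–Kuipers discriminant is the probability of detection minus the probability of false detection, computed from a 2×2 contingency table of forecast versus observed events. The sketch below applies that definition to a toy series. The 17.2 m/s gale threshold (the lower bound of Beaufort force 8) and the wind speeds are assumptions for illustration; the abstract does not state the threshold used in the paper.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives for an event
    defined as wind speed >= threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """HK score = probability of detection - probability of false detection."""
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod - pofd


# Hypothetical forecast and observed wind speeds in m/s.
forecast = [18.0, 12.0, 19.5, 9.0, 16.0, 20.1]
observed = [17.5, 18.3, 21.0, 8.0, 15.0, 16.0]
GALE_THRESHOLD = 17.2  # assumed threshold, lower bound of Beaufort force 8
table = contingency(forecast, observed, GALE_THRESHOLD)
score = hanssen_kuipers(*table)
```

The score ranges from -1 to 1, with 0 indicating no skill, so the reported 0.517 should be read on that scale.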