Deep multi-modal learning, a rapidly growing field with a wide range of practical applications, aims to effectively utilize and integrate information from multiple sources, known as modalities. Despite its impressive empirical performance, the theoretical foundations of deep multi-modal learning have yet to be fully explored. In this paper, we undertake a comprehensive survey of recent developments in multi-modal learning theories, focusing on the fundamental properties that govern this field. Our goal is to provide a thorough collection of current theoretical tools for analyzing multi-modal learning, to clarify their implications for practitioners, and to suggest future directions for establishing a solid theoretical foundation for deep multi-modal learning.
Early identification and treatment of stroke can greatly improve patient outcomes and quality of life. Although clinical tests such as the Cincinnati Pre-hospital Stroke Scale (CPSS) and the Face Arm Speech Test (FAST) are commonly used for stroke screening, accurate administration depends on specialized training. In this study, we propose a novel multi-modal deep learning approach, based on the FAST, for assessing suspected stroke patients exhibiting symptoms such as limb weakness, facial paresis, and speech disorders in acute settings. We collected a dataset of videos and audio recordings of emergency room patients performing designated limb movements, facial expressions, and speech tests based on the FAST. We compared our deep learning model, which was designed to process multi-modal data, with six prior models that achieved good action classification performance: I3D, SlowFast, X3D, TPN, TimeSformer, and MViT. Our model showed higher clinical value than these approaches. Moreover, the multi-modal model outperformed its single-modality variants, highlighting the benefit of using multiple types of patient data, such as action videos and speech audio. These results indicate that a multi-modal deep learning model combined with the FAST could greatly improve the accuracy and sensitivity of early stroke identification, providing a practical and powerful tool for assessing stroke patients in emergency clinical settings.
The rapid growth of online communities has brought an increase in cyber threats, including cyberbullying, hate speech, misinformation, and online harassment, making content moderation a pressing necessity. Traditional single-modal AI-based detection systems, which analyze text, images, or videos in isolation, have proven ineffective at capturing multi-modal threats, in which malicious actors spread harmful content across multiple formats. To address these challenges, we propose a multi-modal deep learning framework that integrates Natural Language Processing (NLP), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks to detect and mitigate online threats effectively. Our proposed model combines BERT for text classification, ResNet50 for image processing, and a hybrid LSTM-3D CNN network for video content analysis. We constructed a large-scale dataset comprising 500,000 text posts, 200,000 offensive images, and 50,000 annotated videos from multiple platforms, including Twitter, Reddit, YouTube, and online gaming forums. The system was carefully evaluated using standard machine learning metrics, including accuracy, precision, recall, F1-score, and ROC-AUC. Experimental results demonstrate that our multi-modal method significantly outperforms single-modal AI classifiers, achieving an accuracy of 92.3%, precision of 91.2%, recall of 90.1%, and an AUC of 0.95. The findings validate the need to integrate multi-modal AI for real-time, high-accuracy online threat detection and moderation.
Future work will focus on improving adversarial robustness, enhancing scalability for real-world deployment, and addressing ethical concerns associated with AI-driven content moderation.
The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Although existing research reports numerous data-driven methods for battery SOH estimation, these methods often perform inconsistently across application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. For each category of EL methods, different realizations and underlying connections are analyzed in detail, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensembles and diversity, aiming to inspire potential approaches for enhancing ensemble performance.
Moreover, the paper emphasizes challenges that arise in practical applications, such as base model selection, measuring model robustness and uncertainty, and the interpretability of ensemble models. Finally, future research prospects are outlined, noting in particular that deep learning ensembles are poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is expected to open valuable avenues for research. Accelerated research in ensemble learning holds promise for achieving more accurate and reliable battery SOH estimation under real-world conditions.
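The review singles out stacking as the strongest category, and its core mechanism fits in a few lines. This is a minimal sketch under stated assumptions, not a method from any reviewed paper. The base models are hypothetical polynomial regressors of different degree, the meta-learner is ordinary least squares, and the SOH curve is synthetic toy data. The essential detail is that the meta-learner is trained on out-of-fold base predictions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns the weight vector."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ w

def poly_features(x, degree):
    """Expand a 1-D input into the columns x, x^2, ..., x^degree."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def stacking_fit_predict(x_train, y_train, x_test, degrees=(1, 2, 3), k=5):
    """Two-level stacking: polynomial base models + linear meta-learner (toy setup)."""
    n = len(x_train)
    folds = np.array_split(np.arange(n), k)
    oof = np.zeros((n, len(degrees)))          # out-of-fold base predictions
    for j, deg in enumerate(degrees):
        for idx in folds:
            mask = np.ones(n, dtype=bool)
            mask[idx] = False                  # train on every fold except idx
            w = fit_linear(poly_features(x_train[mask], deg), y_train[mask])
            oof[idx, j] = predict_linear(w, poly_features(x_train[idx], deg))
    meta_w = fit_linear(oof, y_train)          # level-2 learner on OOF predictions
    # Refit each base model on all training data for test-time predictions.
    test_base = np.column_stack([
        predict_linear(fit_linear(poly_features(x_train, d), y_train),
                       poly_features(x_test, d))
        for d in degrees
    ])
    return predict_linear(meta_w, test_base)

# Synthetic SOH fade curve over a normalised cycle count (made-up data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
soh = 1.0 - 0.2 * x - 0.05 * x ** 2 + rng.normal(0, 0.005, x.size)
pred = stacking_fit_predict(x[:150], soh[:150], x[150:])
rmse = float(np.sqrt(np.mean((pred - soh[150:]) ** 2)))
```

Training the meta-learner on out-of-fold predictions keeps it from trusting base models that have memorised their own training points. A weighted-average ensemble is the special case where the meta-weights are fixed rather than learned.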
Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. In complex scenes, multi-modal fusion technology exploits the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed, in-depth analysis. According to the data fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality issues. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology, since ineffective data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
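Two of the four fusion stages named above are easy to contrast in code. In this minimal sketch, early (feature-level) fusion concatenates modality features before a single classifier, and late (decision-level) fusion averages per-modality class probabilities. The linear classifiers and random features are placeholders, not trained models, and deep and hybrid fusion are not shown.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def early_fusion(feat_a, feat_b, W):
    """Feature-level fusion: concatenate modality features, then one classifier W."""
    return softmax(np.concatenate([feat_a, feat_b], axis=-1) @ W)

def late_fusion(feat_a, feat_b, W_a, W_b, weight_a=0.5):
    """Decision-level fusion: classify each modality separately, then mix probabilities."""
    p_a = softmax(feat_a @ W_a)
    p_b = softmax(feat_b @ W_b)
    return weight_a * p_a + (1.0 - weight_a) * p_b

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(4, 8))     # 4 samples, 8-dim features from modality A (toy)
feat_b = rng.normal(size=(4, 16))    # 16-dim features from modality B (toy)
W = rng.normal(size=(24, 3))         # untrained weights, 3 classes
W_a, W_b = rng.normal(size=(8, 3)), rng.normal(size=(16, 3))
p_early = early_fusion(feat_a, feat_b, W)
p_late = late_fusion(feat_a, feat_b, W_a, W_b)
```

Late fusion degrades gracefully when one modality is missing, because its probability term can simply be dropped. Early fusion lets the classifier model cross-modal interactions.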
Low-carbon smart parks achieve self-balanced carbon emission and absorption through the cooperative scheduling of direct current (DC)-based distributed photovoltaics, energy storage units, and loads. Direct current power line communication (DC-PLC) enables real-time data transmission over DC power lines. With traffic adaptation, DC-PLC can be integrated with complementary media such as 5G to reduce transmission delay and improve reliability. However, traffic adaptation for DC-PLC and 5G integration still faces challenges such as the coupling between traffic admission control and traffic partition, the curse of dimensionality, and the neglect of extreme event occurrence. To address these challenges, we propose a deep reinforcement learning (DRL)-based delay-sensitive and reliable traffic adaptation algorithm (DSRTA) that minimizes the total queuing delay under constraints on traffic admission control, queuing delay, and extreme event occurrence probability. DSRTA jointly optimizes traffic admission control and traffic partition, and enables learning-based intelligent traffic adaptation. The long-term constraints are incorporated into both the state and the bound of the drift-plus-penalty to achieve delay awareness and enforce reliability guarantees. Simulation results show that DSRTA achieves lower queuing delay and a more reliable quality of service (QoS) guarantee than other state-of-the-art algorithms.
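The drift-plus-penalty bound mentioned above comes from Lyapunov optimization. A long-term constraint (time-average g ≤ 0) is tracked by a virtual queue Z, and each slot chooses the action minimizing V·penalty + Z·g. The sketch below is the classic greedy form only, not DSRTA, which adds DRL on top. The action set (share of traffic sent to a second link), the delay model, and the 0.4 budget are all hypothetical.

```python
def drift_plus_penalty(actions, penalty, constraint, V, T):
    """Classic Lyapunov drift-plus-penalty over T slots.

    penalty(a):    per-slot cost to minimise (e.g. queuing delay)
    constraint(a): per-slot value g(a); goal is time-average g(a) <= 0
    V:             weight trading off cost against constraint satisfaction
    """
    Z = 0.0          # virtual queue: accumulated constraint violation
    chosen = []
    for _ in range(T):
        a = min(actions, key=lambda x: V * penalty(x) + Z * constraint(x))
        chosen.append(a)
        Z = max(Z + constraint(a), 0.0)   # virtual queue update
    return chosen

# Hypothetical setting: a = share of traffic on the faster link; using it lowers
# delay, but its long-term average share must stay at or below 0.4.
actions = [0.0, 0.25, 0.5, 0.75, 1.0]
chosen = drift_plus_penalty(
    actions,
    penalty=lambda a: 5.0 * (1.0 - a),
    constraint=lambda a: a - 0.4,
    V=1.0,
    T=1000,
)
avg_share = sum(chosen) / len(chosen)
```

Larger V favours the penalty (lower delay) at the cost of a larger virtual queue, so the constraint is met more slowly. That is the usual O(1/V) versus O(V) trade-off.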
With rapid economic development, air pollution caused by industrial expansion has seriously harmed human health and social development. Establishing an effective air pollution concentration prediction system is therefore of great scientific and practical significance. This paper proposes a combined point-interval prediction system for pollutant concentration that leverages neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. In data preprocessing, fuzzy information granulation is used to transform numerical sequences into fuzzy granules for comprehensive feature extraction. In the optimization stage, the golden jackal optimization algorithm is employed to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines the training results of multiple models to obtain final point predictions, while quantile regression and kernel density estimation are used for interval predictions on the test set. Experimental results show that the combined model achieves a coefficient of determination (R²) of 99.3%, and that the maximum difference in mean absolute percentage error (MAPE) between it and the benchmark models is 12.6%. This suggests that the ensemble learning system proposed in this paper can provide more accurate deterministic predictions and more reliable uncertainty analysis than traditional models, offering a practical reference for air quality early warning.
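For reference, here are the two point-forecast metrics reported above, plus a very simple interval method. The interval adds empirical residual quantiles from a calibration set to new point forecasts. That is a named stand-in, not the quantile regression or kernel density estimation used in the paper, and all data below are synthetic.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zero targets."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def residual_quantile_interval(y_pred, residuals, alpha=0.1):
    """(1 - alpha) interval from quantiles of calibration residuals y_true - y_pred."""
    lo, hi = np.quantile(residuals, [alpha / 2, 1 - alpha / 2])
    return y_pred + lo, y_pred + hi

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 400.0])
r2 = r2_score(y_true, y_pred)
err = mape(y_true, y_pred)                      # (10% + 5% + 0%) / 3 = 5%

rng = np.random.default_rng(1)
calib_resid = rng.normal(0, 5, 2000)            # calibration residuals (toy)
y_pred_new = rng.uniform(20, 80, 2000)          # hypothetical point forecasts
y_true_new = y_pred_new + rng.normal(0, 5, 2000)
lo, hi = residual_quantile_interval(y_pred_new, calib_resid, alpha=0.1)
coverage = float(np.mean((y_true_new >= lo) & (y_true_new <= hi)))
```

With alpha = 0.1 the empirical coverage should land near 90%. Comparing coverage with the nominal level is the usual first check on any interval forecaster.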
Heart disease remains a leading cause of morbidity and mortality worldwide, highlighting the need for improved diagnostic methods. Traditional diagnostics face limitations such as reliance on single-modality data and vulnerability to apparatus faults, which can reduce accuracy, especially with poor-quality images. Additionally, these methods often require significant time and expertise, making them less accessible in resource-limited settings. Emerging technologies such as artificial intelligence and machine learning offer promising solutions by integrating multi-modality data and enhancing diagnostic precision, ultimately improving patient outcomes and reducing healthcare costs. This study introduces Heart-Net, a multi-modal deep learning framework designed to enhance heart disease diagnosis by integrating data from Cardiac Magnetic Resonance Imaging (MRI) and Electrocardiogram (ECG). Heart-Net uses a 3D U-Net for MRI analysis and a Temporal Convolutional Graph Neural Network (TCGN) for ECG feature extraction, combining these through an attention mechanism to emphasize relevant features. Classification is performed using an optimized TCGN. This approach improves early detection, reduces diagnostic errors, and supports personalized risk assessments and continuous health monitoring. Results show that Heart-Net significantly outperforms traditional single-modality models, achieving accuracies of 92.56% for Heartnet Dataset Ⅰ (HNET-DSⅠ), 93.45% for Heartnet Dataset Ⅱ (HNET-DSⅡ), and 91.89% for Heartnet Dataset Ⅲ (HNET-DSⅢ), while mitigating the impact of apparatus faults and image quality issues. These findings underscore the potential of Heart-Net to revolutionize heart disease diagnostics and improve clinical outcomes.
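The abstract does not specify Heart-Net's attention mechanism. One common form, assumed here, scores each modality embedding with a vector w and fuses them by a softmax-weighted sum. The random vectors below merely stand in for 3D U-Net (MRI) and TCGN (ECG) embeddings, and w would be learned in practice.

```python
import numpy as np

def attention_fusion(features, w):
    """Softmax attention over modalities.

    features: (num_modalities, dim), e.g. row 0 = MRI embedding, row 1 = ECG embedding
    w:        (dim,) scoring vector (learned in practice; random here)
    Returns the fused (dim,) vector and the per-modality weights.
    """
    scores = features @ w
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ features, weights

rng = np.random.default_rng(0)
mri_emb = rng.normal(size=32)    # placeholder for an MRI embedding (toy)
ecg_emb = rng.normal(size=32)    # placeholder for an ECG embedding (toy)
feats = np.stack([mri_emb, ecg_emb])
fused, alpha = attention_fusion(feats, rng.normal(size=32))
```

If one modality is degraded (e.g. a poor-quality image), a learned w can assign it a lower weight. With w = 0 the mechanism reduces to a plain average.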
Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news reading and e-shopping) require accurately recognizing much finer-grained concepts as entities and properly linking them to a knowledge graph (KG), which can take their performance to the next level. In light of this, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. We then design a learning-to-rank method based on a deep modal-attention neural network, which aggregates all features and maps visual objects to entities in the KG. Extensive experiments on a newly constructed dataset show that the proposed method is effective, improving accuracy from 66.46% to 83.16% compared with baselines.
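The learning-to-rank step can be sketched as follows, with the caveat that the paper's exact scoring function and loss are not given in the abstract. Here each candidate entity has one similarity per modality (visual, textual, KG), a softmax over modality logits mixes them into one score, and a pairwise hinge loss pushes the correct entity above each wrong one by a margin. All numbers are made up.

```python
import numpy as np

def modal_attention_scores(sims, logits):
    """sims: (num_candidates, num_modalities) per-modality similarities.
    logits: (num_modalities,) attention logits (learned in practice)."""
    a = np.exp(logits - logits.max())
    a = a / a.sum()
    return sims @ a

def pairwise_hinge_loss(scores, pos_idx, margin=1.0):
    """Sum over negatives of max(0, margin - s_pos + s_neg)."""
    neg = np.delete(scores, pos_idx)
    return float(np.maximum(0.0, margin - scores[pos_idx] + neg).sum())

# Three candidate KG entities for one detected object (toy similarities).
sims = np.array([[0.9, 0.8, 0.7],
                 [0.2, 0.9, 0.1],
                 [0.1, 0.2, 0.3]])
scores = modal_attention_scores(sims, np.zeros(3))   # equal weights -> [0.8, 0.4, 0.2]
ranking = np.argsort(-scores)                        # best candidate first
loss = pairwise_hinge_loss(scores, pos_idx=0, margin=0.5)
```

In training, the gradient of this loss would update both the attention logits and the feature extractors. Here they are fixed to show only the ranking logic.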
With the rapid development of artificial intelligence, the Internet of Things (IoT) can deploy various machine learning algorithms for network and application management. In the IoT environment, many sensors and devices generate massive amounts of data, but data security and privacy protection have become a serious challenge. Federated learning (FL) enables many intelligent IoT applications by training models on local devices, allowing AI training on distributed IoT devices without data sharing. This review explores the combination of FL and the IoT in depth, and analyzes the application of FL in the IoT from the perspectives of security and privacy protection. We first describe the potential advantages of FL and the challenges faced by current IoT systems in terms of network burden and privacy security. Next, we analyze the advantages of combining FL with the IoT, including privacy security, attack detection, efficient IoT communication, and enhanced learning quality. We also list various application scenarios of FL in the IoT. Finally, we propose several open research challenges and possible solutions.
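The basic FL loop described above can be sketched with federated averaging (FedAvg), a standard aggregation rule, shown here as an illustration rather than as any specific surveyed system. Each simulated device runs a few gradient steps on its own data, and the server averages the returned parameters weighted by local sample counts. Raw data never leave the clients. The linear-regression task and client sizes are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Full-batch gradient descent on mean squared error, using only local data."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg(client_params, client_sizes):
    """Server step: sample-size-weighted average of client parameters."""
    return np.average(np.stack(client_params), axis=0, weights=client_sizes)

# Three simulated IoT devices holding private samples of y = 1 + 2x (toy data).
rng = np.random.default_rng(0)
clients = []
for n in (20, 50, 80):
    x = rng.uniform(-1, 1, n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(0, 0.01, n)
    clients.append((X, y))

w_global = np.zeros(2)
for _ in range(50):  # communication rounds
    local = [local_update(w_global, X, y) for X, y in clients]
    w_global = fedavg(local, [len(y) for _, y in clients])
```

Only model parameters cross the network, which is where both the communication savings and the privacy and attack concerns discussed in the review arise.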
In Intelligent Railway Transportation Systems, effective multi-party collaboration is crucial, yet it is complicated by concerns over privacy and data silos. Vertical Federated Learning (VFL) has emerged as a promising approach to facilitate such collaboration, allowing diverse entities to jointly improve machine learning models without sharing sensitive training data. However, existing works have highlighted VFL's susceptibility to privacy inference attacks, in which an honest-but-curious server could reconstruct a client's raw data from the embeddings uploaded by that client. This vulnerability poses a significant threat to VFL-based intelligent railway transportation systems. In this paper, we introduce SensFL, a novel privacy-enhancing method for defending against privacy inference attacks in VFL. Specifically, SensFL integrates a regularization of the sensitivity of embeddings to the original data into the model training process, effectively limiting the information contained in shared embeddings. By reducing this sensitivity, SensFL can resist reverse privacy attacks and prevent the reconstruction of the original data from the embeddings. Extensive experiments on four distinct datasets and three different models demonstrate the efficacy of SensFL: it effectively mitigates privacy inference attacks while maintaining the accuracy of the primary learning task. These results underscore SensFL's potential to advance privacy protection within VFL-based intelligent railway systems, addressing critical security concerns in collaborative learning environments.
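SensFL penalizes how sensitive the shared embeddings are to the raw input, but the abstract does not give the exact regularizer. The sketch below assumes one plausible choice: the squared Frobenius norm of the Jacobian of a hypothetical one-layer tanh bottom model. It computes the Jacobian analytically and cross-checks it with central finite differences. A training loop would add lam times this term to the task loss.

```python
import numpy as np

def embed(W, x):
    """Hypothetical client-side bottom model: a single tanh layer."""
    return np.tanh(W @ x)

def jacobian_sensitivity(W, x):
    """Squared Frobenius norm of d embed / d x, computed analytically."""
    h = embed(W, x)
    J = (1.0 - h ** 2)[:, None] * W       # d tanh(Wx)/dx = diag(1 - h^2) W
    return float(np.sum(J ** 2))

def finite_difference_sensitivity(W, x, eps=1e-6):
    """Same quantity via central differences, as a sanity check."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((embed(W, x + e) - embed(W, x - e)) / (2 * eps))
    return float(np.sum(np.stack(cols, axis=1) ** 2))

def regularized_loss(task_loss, W, x, lam=0.1):
    """Task loss plus an (assumed) sensitivity penalty on the shared embedding."""
    return task_loss + lam * jacobian_sensitivity(W, x)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
x = rng.normal(size=6)
analytic = jacobian_sensitivity(W, x)
numeric = finite_difference_sensitivity(W, x)
```

A small Jacobian means that nearby inputs map to nearly identical embeddings. That is the intuition behind making reconstruction from uploaded embeddings harder.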
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Performance was further assessed through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with the model ablation and data ablation experiments and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
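For readers checking the evaluation above, the reported metrics can be computed as follows for the binary case. The multi-class case would average these per class. AUC is computed through its rank interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half. The labels and scores are a made-up example, not study data.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as P(score_pos > score_neg), ties counted as 1/2 (Mann-Whitney form).
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}

metrics = binary_metrics([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.35, 0.1, 0.8, 0.6])
```

AUC does not depend on the 0.5 threshold, whereas accuracy, precision, recall, and F1 all do. This is why studies usually report both kinds of metric.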
Mental health is a significant issue worldwide, and the use of technology to support mental health is a growing trend that aims to alleviate the workload of healthcare professionals and help individuals. Numerous applications have been developed to address the challenges of intelligent healthcare systems. However, because mental health data are sensitive, privacy concerns have emerged, and federated learning has attracted attention as a response. This study reviews research on federated learning for mental health in intelligent healthcare systems. It explores various dimensions of federated learning in mental health, such as datasets (their types and sources), applications categorized by mental health symptoms, federated mental health frameworks, federated machine learning, federated deep learning, and the benefits of federated learning in mental health applications. The study also surveys the current state of mental health applications, focusing mainly on the role of Federated Learning (FL) and related privacy and data security concerns. The survey provides valuable insights into how these applications are emerging and evolving, with particular emphasis on FL's impact.
Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous textures and illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. The resulting Pseudo-Siamese disparity regression model simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated on a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate a significant improvement in the network's resistance to lighting interference without a substantial increase in parameters. Moreover, the model converged faster during training, contributing to overall performance. This study improves the accuracy of endoscopic image processing and has potential implications for surgical robot applications in complex environments.
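Disparity estimation matters for 3D reconstruction because, for a rectified stereo pair, depth follows from disparity through the standard relation Z = f·B/d. Here f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels. The sketch is not part of the proposed network, and the calibration values are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm, min_disp=1e-6):
    """Depth (same unit as baseline) from a disparity map via Z = f * B / d.

    Assumes a rectified stereo pair; pixels with (near-)zero disparity are
    marked invalid (NaN) because they correspond to points at infinity.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.nan)
    valid = d > min_disp
    depth[valid] = focal_px * baseline_mm / d[valid]
    return depth

# Hypothetical calibration: 1000 px focal length, 5 mm stereo baseline.
depth = disparity_to_depth([[10.0, 20.0], [0.0, 40.0]], focal_px=1000.0, baseline_mm=5.0)
```

Because Z is inversely proportional to d, a one-pixel disparity error costs far more depth accuracy at small disparities, that is, for distant tissue. This is one reason robust sub-pixel disparity estimates matter.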
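To address the shortcomings of traditional network anomaly diagnosis algorithms in unsupervised settings, such as low accuracy in locating anomalies and classifying anomalous data, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, wireless network data streams are collected using ADU (Asynchronous Data Unit) units, and packet features are extracted. Then, a Q-learning model is built to find the balance point between state values and reward values, and the SA (Simulated Annealing) algorithm is used to accurately identify the next state from a global perspective. Finally, the joint distribution probability of the training samples is determined, improving how closely the output values approximate the target so as to balance exploration against cost. Test results show that the improved Q-learning algorithm achieves a mean network anomaly localization accuracy of 99.4%, and that it also outperforms three traditional network anomaly diagnosis methods in classification precision and efficiency across different types of network anomalies.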
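The method above pairs Q-learning with simulated annealing (SA). The abstract does not specify the exact rule, so this sketch assumes a common way of combining the two: Metropolis-criterion action selection. A random action replaces the greedy one with probability exp(ΔQ/T), and the temperature T is lowered after each episode. The sketch runs tabular Q-learning on a hypothetical five-state chain. It does not model ADU data collection, packet features, or the joint-distribution step.

```python
import math
import random

N_STATES = 5          # toy chain 0..4; state 4 is terminal (hypothetical environment)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the terminal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def sa_select(Q, s, T, rng):
    """Metropolis rule: accept a random action over the greedy one w.p. exp(dQ / T)."""
    greedy = max(ACTIONS, key=lambda a: Q[s][a])
    cand = rng.choice(ACTIONS)
    dq = Q[s][cand] - Q[s][greedy]           # <= 0 by construction
    if dq >= 0 or rng.random() < math.exp(dq / T):
        return cand
    return greedy

def train(episodes=500, alpha=0.5, gamma=0.9, T0=1.0, decay=0.99, T_min=0.05, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    T = T0
    for _ in range(episodes):
        s = 0
        for _ in range(100):                  # cap on episode length
            a = sa_select(Q, s, T, rng)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # standard Q-learning update
            s = s2
            if done:
                break
        T = max(T_min, T * decay)             # annealing schedule
    return Q

Q = train()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

A high initial temperature makes early episodes nearly random, so most state-action pairs are explored. As T falls, selection becomes almost greedy, which is the exploration-versus-cost balance the abstract refers to.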
Deep neural networks (DNNs) are effective in solving both forward and inverse problems for nonlinear partial differential equations (PDEs). However, conventional DNNs are not effective in handling problems such as delay differential equations (DDEs) and delay integro-differential equations (DIDEs) with constant delays, primarily due to their low regularity at delay-induced breaking points. In this paper, a DNN method combined with multi-task learning (MTL) is proposed to solve both the forward and inverse problems of DIDEs. The core idea is to divide the original equation into multiple tasks based on the delay, use auxiliary outputs to represent the integral terms, and then use MTL to seamlessly incorporate the properties at the breaking points into the loss function. Furthermore, given the increased training difficulty associated with multiple tasks and outputs, we employ a sequential training scheme to reduce training complexity and provide reference solutions for subsequent tasks. Comparisons with traditional DNN methods show that this approach significantly improves the approximation accuracy of DNN solutions to DIDEs. We validate the effectiveness of the method through several numerical experiments, testing various parameter-sharing structures in MTL and comparing their results. Finally, the method is applied to the inverse problem of a nonlinear DIDE, and the results show that the unknown parameters of the DIDE can be discovered from sparse or noisy data.
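Delay-induced breaking points show up clearly in a classical method-of-steps solution. That solution advances one delay interval at a time, much as the proposed method splits the equation into tasks by delay. The sketch below is a plain numerical illustration, not the DNN/MTL method. It integrates the toy DDE y'(t) = -y(t - τ) with constant history y = 1 (no integral term) by forward Euler. With τ = 1 the exact solution is y = 1 - t on [0, 1] and y = 3/2 - 2t + t²/2 on [1, 2], so y(2) = -1/2. The derivative y' jumps at t = 0, and y'' jumps at t = 1.

```python
import numpy as np

def solve_dde_method_of_steps(tau=1.0, t_end=2.0, n_per_delay=1000):
    """Forward-Euler solution of y'(t) = -y(t - tau), with y(t) = 1 for t <= 0.

    The step h divides tau exactly, so y(t - tau) is always a stored grid value.
    Each interval [k*tau, (k+1)*tau] depends only on the previous one (method of
    steps); the solution's smoothness changes at each breaking point t = k*tau.
    """
    h = tau / n_per_delay
    n_steps = int(round(t_end / h))
    y = np.empty(n_per_delay + n_steps + 1)
    y[: n_per_delay + 1] = 1.0                        # history on [-tau, 0]
    for i in range(n_per_delay, n_per_delay + n_steps):
        y[i + 1] = y[i] - h * y[i - n_per_delay]      # uses y(t_i - tau)
    t = np.linspace(-tau, t_end, n_per_delay + n_steps + 1)
    return t, y

t, y = solve_dde_method_of_steps()
```

A single smooth network fitted across t = 0 and t = 1 has to reproduce these kinks. Splitting the domain at the breaking points avoids that, which is the motivation for the task decomposition described above.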
As AI systems scale, the limitations of cloud-based architectures, including latency, bandwidth, and privacy concerns, demand decentralized alternatives. Federated learning (FL) and Edge AI provide a paradigm shift by combining privacy-preserving training with efficient on-device computation. This paper introduces an FL-edge integration framework that achieves a 10% to 15% increase in model accuracy and reduces communication costs by 25% in heterogeneous environments. Blockchain-based secure aggregation ensures robust, tamper-proof model updates, while exploratory quantum AI techniques enhance computational efficiency. By addressing key challenges such as device variability and non-IID data, this work sets the stage for the next generation of adaptive, privacy-first AI systems, with applications in IoT, healthcare, and autonomous systems.
In computer vision and artificial intelligence, automatic emotion identification from human facial expressions has become a popular research and industry problem. Recent demonstrations and applications in many fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance videos, depression therapy, patient monitoring, anxiety, and others, have drawn attention to its significant academic and commercial importance. This study focuses on research that uses only facial images for facial expression recognition (FER), because facial expressions are a basic way in which people communicate meaning to each other. The immense success of deep learning has led to growing use of its many architectures to improve efficiency. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames. It then briefly summarizes publicly available evaluation criteria and compares them with benchmark results, the most trustworthy way to assess FER research statistically. This brief synopsis may benefit both newcomers to FER and seasoned scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and a comprehensive understanding of the most recent state-of-the-art research.
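Preprocessing and augmentation are among the techniques the review covers. A typical image-only FER pipeline might include the steps below, sketched here with NumPy: a random crop, a random horizontal flip (expressions are roughly left-right symmetric), and per-image standardisation. The 48×48 grayscale input matches common FER benchmarks such as FER2013. The 40-pixel crop size and the random image are assumptions for illustration.

```python
import numpy as np

def augment_face(img, rng, crop=40):
    """Random crop + random horizontal flip + per-image standardisation.

    img: grayscale face image of shape (H, W), e.g. 48 x 48.
    """
    h, w = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop].astype(float)
    if rng.random() < 0.5:
        out = out[:, ::-1]                     # mirror left-right
    return (out - out.mean()) / (out.std() + 1e-8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 48))      # stand-in for a face image
out = augment_face(img, rng)
```

At test time one would usually take a centre crop without flipping, or average predictions over several crops, so that evaluation is deterministic.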
Funding: Supported by the Technology and Innovation Major Project of the Ministry of Science and Technology of China (2020AAA0108400, 2020AAA0108403) and the Tsinghua Precision Medicine Foundation (10001020109).
Funding: Supported by the Ministry of Science and Technology of China, No. 2020AAA0109605 (to XL), and the Meizhou Major Scientific and Technological Innovation Platforms Projects of Guangdong Provincial Science & Technology Plan Projects, No. 2019A0102005 (to HW).
Funding: National Natural Science Foundation of China (52075420); Fundamental Research Funds for the Central Universities (xzy022023049); National Key Research and Development Program of China (2023YFB3408600).
Abstract: The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Although existing research reports numerous data-driven methods for battery SOH estimation, these methods often perform inconsistently across application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into six classes based on their combination strategies. The different realizations and underlying connections within each category of EL methods are analyzed in detail, highlighting their distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across six dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, the ensemble methods are further examined from the perspectives of weighted ensembles and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, the paper emphasizes the need to address challenges in practical applications, such as base model selection, measuring model robustness and uncertainty, and the interpretability of ensemble models.
Finally, future research prospects are outlined, noting in particular that deep learning ensembles are poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is expected to open valuable avenues for research. Accelerated research in ensemble learning holds promise for achieving more accurate and reliable battery SOH estimation under real-world conditions.
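As a concrete instance of the weighted-ensemble perspective above, the minimal sketch below weights each base model inversely to its validation RMSE and averages the models' predictions. This is one simple weighting rule chosen for illustration, not a specific method from the reviewed papers. The model names (`gpr`, `lstm`) and SOH values are hypothetical.

```python
import math


def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def inverse_error_weights(val_preds, val_truth, eps=1e-12):
    """val_preds maps a base-model name to its validation SOH predictions.
    Each weight is proportional to 1 / validation RMSE, normalised to sum to 1."""
    inv = {name: 1.0 / (rmse(p, val_truth) + eps) for name, p in val_preds.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


def ensemble_predict(weights, test_preds):
    """Weighted average of the base models' test predictions, point by point."""
    n = len(next(iter(test_preds.values())))
    return [sum(weights[m] * test_preds[m][i] for m in weights) for i in range(n)]


# Hypothetical SOH values (fraction of rated capacity) from two hypothetical base models.
val_truth = [1.00, 0.98, 0.95]
val_preds = {"gpr": [1.00, 0.97, 0.95], "lstm": [0.99, 0.99, 0.93]}
w = inverse_error_weights(val_preds, val_truth)
print(w)
print(ensemble_predict(w, {"gpr": [0.92, 0.90], "lstm": [0.94, 0.91]}))
```

A stacking ensemble would instead train a meta-model on the base models' outputs rather than fixing the combination rule in advance.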
Funding: Supported by the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070) and the National Natural Science Foundation of China (Grant No. 62302086).
Abstract: Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. In complex scenes, multi-modal fusion technology exploits the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed, in-depth analysis. According to the stage at which data are fused, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality issues. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Ineffective data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
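The difference between fusing at the feature stage and at the decision stage can be shown with a toy example. The minimal sketch below uses hypothetical linear "models" as stand-ins for real per-modality networks.

- Early fusion concatenates the modality features and scores them with one classifier.
- Late fusion scores each modality separately and averages the resulting probabilities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, features, bias=0.0):
    return sum(w * x for w, x in zip(weights, features)) + bias


def early_fusion(img_feat, txt_feat, joint_weights):
    # Feature-level fusion: concatenate modality features, then apply one classifier.
    return sigmoid(linear(joint_weights, img_feat + txt_feat))


def late_fusion(img_feat, txt_feat, img_weights, txt_weights):
    # Decision-level fusion: score each modality separately, then average probabilities.
    p_img = sigmoid(linear(img_weights, img_feat))
    p_txt = sigmoid(linear(txt_weights, txt_feat))
    return (p_img + p_txt) / 2


# Hypothetical one-dimensional features and weights, for illustration only.
print(early_fusion([1.0], [1.0], [1.0, 1.0]))
print(late_fusion([2.0], [-2.0], [1.0], [1.0]))
```

Deep fusion would merge intermediate representations inside the networks, and hybrid fusion combines several of these stages.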
Funding: Supported by the Science and Technology Project of State Grid Corporation of China under grant 52094021N010 (5400-202199534A-0-5-ZN).
Abstract: Low-carbon smart parks achieve self-balanced carbon emission and absorption through the cooperative scheduling of direct current (DC)-based distributed photovoltaics, energy storage units, and loads. Direct current power line communication (DC-PLC) enables real-time data transmission over DC power lines. With traffic adaptation, DC-PLC can be integrated with complementary media such as 5G to reduce transmission delay and improve reliability. However, traffic adaptation for DC-PLC and 5G integration still faces challenges such as the coupling between traffic admission control and traffic partition, the curse of dimensionality, and the neglect of extreme events. To address these challenges, we propose a deep reinforcement learning (DRL)-based delay-sensitive and reliable traffic adaptation algorithm (DSRTA) that minimizes the total queuing delay under constraints on traffic admission control, queuing delay, and the probability of extreme events. DSRTA jointly optimizes traffic admission control and traffic partition, enabling learning-based intelligent traffic adaptation. The long-term constraints are incorporated into both the state and the drift-plus-penalty bound to achieve delay awareness and enforce reliability guarantees. Simulation results show that DSRTA achieves lower queuing delay and more reliable quality of service (QoS) guarantees than other state-of-the-art algorithms.
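The drift-plus-penalty idea mentioned above can be illustrated with a plain Lyapunov-optimization sketch. This is not DSRTA itself, which uses DRL. A long-term constraint, whose per-slot term `g` must be non-positive on average, is tracked by a virtual queue `z`. Each slot, the sketch picks the action that minimizes `v * penalty + z * g`. The two actions (`plc_only`, `plc_5g_split`) and their numbers are hypothetical.

```python
def choose_action(actions, z, v):
    """Pick the action minimising the per-slot drift-plus-penalty term V*penalty + Z*g."""
    return min(actions, key=lambda a: v * a["penalty"] + z * a["g"])


def run(actions, slots, v):
    z = 0.0          # virtual queue: accumulated violation of the long-term constraint
    history = []
    for _ in range(slots):
        a = choose_action(actions, z, v)
        z = max(z + a["g"], 0.0)   # virtual queue update
        history.append(a["name"])
    return history, z


# Hypothetical actions: sending everything over PLC has low delay but violates the
# reliability constraint (g > 0); splitting traffic with 5G costs more delay but satisfies it.
actions = [
    {"name": "plc_only", "penalty": 1.0, "g": 0.5},
    {"name": "plc_5g_split", "penalty": 2.0, "g": -0.5},
]
history, z = run(actions, slots=6, v=1.0)
print(history, z)
```

A larger `v` weights delay more heavily and tolerates a larger virtual-queue backlog, and hence more transient constraint violation, before the policy switches actions.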
Funding: Supported by the General Scientific Research Funding of the Science and Technology Development Fund (FDCT) in Macao (No. 0150/2022/A) and the Faculty Research Grants of Macao University of Science and Technology (No. FRG-22-074-FIE).
Abstract: With rapid economic development, air pollution caused by industrial expansion has seriously harmed human health and social development. Therefore, establishing an effective air pollution concentration prediction system that gives accurate and reliable predictions is of great scientific and practical significance. This paper proposes a combined point-interval prediction system for pollutant concentration that leverages neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation is used in data preprocessing to transform numerical sequences into fuzzy granules for comprehensive feature extraction. The golden jackal optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines the training results of multiple models to obtain the final point predictions, while quantile regression and kernel density estimation are used to produce interval predictions on the test set. Experimental results demonstrate that the combined model achieves a coefficient of determination (R²) of 99.3%, and that the largest difference in mean absolute percentage error (MAPE) between it and the benchmark models is 12.6%. This suggests that the integrated learning system proposed in this paper can provide more accurate deterministic predictions and more reliable uncertainty analysis than traditional models, offering a practical reference for air quality early warning.
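For reference, the two point-prediction metrics reported above have the standard definitions sketched below. The pollutant concentrations are made-up sample values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up concentrations (e.g. in µg/m³), for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 39.0]
print(r_squared(y_true, y_pred), mape(y_true, y_pred))
```

MAPE divides by the observed value, so errors on low-concentration hours weigh more than equal absolute errors on high-concentration hours.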
Funding: Funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Heart disease remains a leading cause of morbidity and mortality worldwide, highlighting the need for improved diagnostic methods. Traditional diagnostics face limitations such as reliance on single-modality data and vulnerability to apparatus faults, which can reduce accuracy, especially with poor-quality images. Additionally, these methods often require significant time and expertise, making them less accessible in resource-limited settings. Emerging technologies such as artificial intelligence and machine learning offer promising solutions by integrating multi-modality data and enhancing diagnostic precision, ultimately improving patient outcomes and reducing healthcare costs. This study introduces Heart-Net, a multi-modal deep learning framework designed to enhance heart disease diagnosis by integrating data from Cardiac Magnetic Resonance Imaging (MRI) and Electrocardiogram (ECG). Heart-Net uses a 3D U-Net for MRI analysis and a Temporal Convolutional Graph Neural Network (TCGN) for ECG feature extraction, combining the two through an attention mechanism that emphasizes relevant features. Classification is performed using an optimized TCGN. This approach improves early detection, reduces diagnostic errors, and supports personalized risk assessment and continuous health monitoring. The results show that Heart-Net significantly outperforms traditional single-modality models, achieving accuracies of 92.56% on Heartnet Dataset Ⅰ (HNET-DSⅠ), 93.45% on Heartnet Dataset Ⅱ (HNET-DSⅡ), and 91.89% on Heartnet Dataset Ⅲ (HNET-DSⅢ), while mitigating the impact of apparatus faults and image quality issues. These findings underscore the potential of Heart-Net to revolutionize heart disease diagnostics and improve clinical outcomes.
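The attention-based combination of MRI and ECG features described above can be sketched generically. The sketch assumes both modalities have already been encoded into vectors of the same length. A scoring vector `query`, a hypothetical stand-in for learned attention parameters, assigns each modality a score. The scores are softmax-normalized and used to weight the embeddings. Heart-Net's actual attention design is not specified here.

```python
import math


def softmax(xs):
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention_fuse(embeddings, query):
    """Score each modality embedding against `query`, softmax the scores, and
    return the weighted sum of the embeddings together with the weights."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]
    return fused, weights


# Hypothetical 2-D embeddings for an MRI branch and an ECG branch.
mri_emb, ecg_emb = [1.0, 0.0], [0.0, 1.0]
fused, weights = attention_fuse([mri_emb, ecg_emb], query=[2.0, 0.5])
print(fused, weights)
```

If one modality is degraded, for example by a poor-quality image, a trained scorer can lower that modality's weight. This is the intuition behind fusing with attention rather than fixed weights.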
Abstract: Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news reading and e-shopping) require accurate recognition of much finer-grained concepts, such as entities, and proper linking of them to a knowledge graph (KG), which can take their performance to the next level. In light of this, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a learning-to-rank method based on a deep modal-attention neural network, which aggregates all features and maps visual objects to entities in the KG. Extensive experimental results on the newly constructed dataset show that our proposed method is effective, significantly improving accuracy from 66.46% to 83.16% compared with the baselines.
Funding: Supported by the Shandong Province Science and Technology Project (2023TSGC0509, 2022TSGC2234), the Qingdao Science and Technology Plan Project (23-1-5-yqpy-2-qy), and the Open Topic Grants of Anhui Province Key Laboratory of Intelligent Building & Building Energy Saving, Anhui Jianzhu University (IBES2024KF08).
Abstract: With the rapid development of artificial intelligence, the Internet of Things (IoT) can deploy various machine learning algorithms for network and application management. In the IoT environment, many sensors and devices generate massive amounts of data, but data security and privacy protection have become serious challenges. Federated learning (FL) enables many intelligent IoT applications by training models on local devices, allowing AI training on distributed IoT devices without sharing data. This review explores in depth the combination of FL and the IoT and analyzes the application of FL in the IoT from the perspectives of security and privacy protection. We first describe the potential advantages of FL and the challenges that current IoT systems face regarding network burden and privacy security. Next, we analyze the advantages of combining FL with the IoT, including privacy security, attack detection, efficient IoT communication, and enhanced learning quality. We also list various application scenarios of FL in the IoT. Finally, we propose several open research challenges and possible solutions.
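The "train locally, share only model parameters" pattern described above can be sketched with federated averaging (FedAvg) on a toy linear model. Each hypothetical client takes one gradient step on its own data. The server then averages the resulting parameters, weighted by each client's sample count, so raw data never leaves a device. The data and learning rate are made up.

```python
def local_step(weights, data, lr=0.1):
    """One gradient-descent step on mean squared error for a linear model, run on-device."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]


def fed_avg(client_params, client_sizes):
    """Server-side aggregation: sample-count-weighted average of client parameters."""
    total = sum(client_sizes)
    return [sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(client_params[0]))]


def federated_round(global_w, client_datasets, lr=0.1):
    local = [local_step(global_w, d, lr) for d in client_datasets]
    return fed_avg(local, [len(d) for d in client_datasets])


# Two hypothetical IoT clients whose private data follow y = 2x.
clients = [[([1.0], 2.0)], [([2.0], 4.0), ([1.0], 2.0)]]
w = [0.0]
for _ in range(100):
    w = federated_round(w, clients)
print(w)  # approaches [2.0]
```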
Funding: Supported by the Systematic Major Project of Shuohuang Railway Development Co., Ltd., National Energy Group (Grant Number: SHTL-23-31) and the Beijing Natural Science Foundation (U22B2027).
Abstract: In Intelligent Railway Transportation Systems, effective multi-party collaboration is crucial because of concerns over privacy and data silos. Vertical Federated Learning (VFL) has emerged as a promising approach to facilitate such collaboration, allowing diverse entities to collectively improve machine learning models without sharing sensitive training data. However, existing work has highlighted VFL's susceptibility to privacy inference attacks, in which an honest-but-curious server could reconstruct a client's raw data from the embeddings the client uploads. This vulnerability poses a significant threat to VFL-based intelligent railway transportation systems. In this paper, we introduce SensFL, a novel privacy-enhancing method to defend against privacy inference attacks in VFL. Specifically, SensFL integrates a regularization term on the sensitivity of embeddings to the original data into the model training process, effectively limiting the information contained in shared embeddings. By reducing this sensitivity, SensFL can resist reverse privacy attacks and prevent the reconstruction of the original data from the embeddings. Extensive experiments on four distinct datasets and three different models demonstrate the efficacy of SensFL. The results show that SensFL effectively mitigates privacy inference attacks while maintaining the accuracy of the primary learning task. These results underscore SensFL's potential to advance privacy protection technologies within VFL-based intelligent railway systems, addressing critical security concerns in collaborative learning environments.
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with the model ablation and data ablation experiments and with various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
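The decision-level fusion strategies compared above (averaging, voting, and weighted schemes) can be sketched on per-modality probabilities. In this sketch, the "adaptive" weights are a softmax over each modality's validation accuracy, with a temperature parameter. That rule is an illustrative assumption, not the AWMDF model's actual weighting mechanism. All probabilities and scores are hypothetical.

```python
import math


def average_fusion(probs):
    return sum(probs) / len(probs)


def majority_vote(probs, threshold=0.5):
    votes = sum(1 for p in probs if p >= threshold)
    return 1.0 if votes > len(probs) / 2 else 0.0


def weighted_fusion(probs, weights):
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def adaptive_weights(val_scores, temperature=0.1):
    """Assumed rule: softmax over per-modality validation scores (higher score, larger weight)."""
    m = max(val_scores)
    exps = [math.exp((s - m) / temperature) for s in val_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical P(severe stenosis) from clinical, tongue and facial models, plus their
# hypothetical validation accuracies.
probs = [0.9, 0.4, 0.6]
w = adaptive_weights([0.9, 0.6, 0.7])
print(average_fusion(probs), majority_vote(probs), weighted_fusion(probs, w))
```

Lowering the temperature concentrates weight on the best-validated modality. Raising it moves the result toward plain averaging.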
Abstract: Mental health is a significant issue worldwide, and the use of technology to support mental health is a growing trend. This trend aims to alleviate the workload on healthcare professionals and to help individuals. Numerous applications have been developed to address the challenges in intelligent healthcare systems. However, because mental health data are sensitive, privacy concerns have emerged, and federated learning has attracted attention as a response. This research reviews studies on federated learning and mental health that address these issues in intelligent healthcare systems. It explores various dimensions of federated learning in mental health, such as datasets (their types and sources), applications categorized by mental health symptoms, federated mental health frameworks, federated machine learning, federated deep learning, and the benefits of federated learning in mental health applications. This research conducts surveys to evaluate the current state of mental health applications, focusing mainly on the role of Federated Learning (FL) and related privacy and data security concerns. The survey provides valuable insights into how these applications are emerging and evolving, with specific emphasis on FL's impact.
Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
Abstract: Two-dimensional endoscopic images are susceptible to interference such as specular reflections, monotonous textures, and uneven illumination, which hinders accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. The Pseudo-Siamese structure-based disparity regression model simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated on a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate a significant improvement in the network's resistance to lighting interference without a substantial increase in parameters. Moreover, the model converged faster during training, contributing to overall performance enhancement. This study advances the accuracy of endoscopic image processing and has potential implications for surgical robot applications in complex environments.
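Two standard relations help explain why disparity estimation and dilated convolutions matter here.

- For a rectified stereo pair, the predicted disparity `d` (in pixels) converts to depth as `Z = f * B / d`, where `f` is the focal length in pixels and `B` is the camera baseline.
- A convolution kernel of size `k` with dilation `r` spans `k + (k - 1)(r - 1)` pixels, which is how a pyramid of dilations widens the receptive field without adding parameters.

The calibration values in this sketch are hypothetical, not those of the da Vinci endoscope.

```python
def disparity_to_depth(disparity, focal_px, baseline):
    """Depth map from a disparity map (rectified stereo); non-positive disparities map to None."""
    return [[focal_px * baseline / d if d > 0 else None for d in row] for row in disparity]


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-wide kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical calibration: 500-pixel focal length, 5 mm baseline.
print(disparity_to_depth([[10.0, 20.0], [0.0, 25.0]], focal_px=500.0, baseline=5.0))
print([effective_kernel_size(3, r) for r in (1, 2, 4)])  # a pyramid of dilation rates
```

Because depth is inversely proportional to disparity, a one-pixel disparity error costs much more depth accuracy for distant tissue than for nearby tissue.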
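Abstract: Traditional network anomaly diagnosis algorithms in unsupervised environments suffer from low accuracy in locating anomalies and classifying anomalous data. To address these shortcomings, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, wireless network data streams are collected using ADU (Asynchronous Data Unit) units, and packet features are extracted. Then, a Q-learning model is constructed to explore the balance point between state values and reward values, and the SA (Simulated Annealing) algorithm is used to accurately identify the next state from a global perspective. Finally, the joint distribution probability of the training samples is determined, improving the approximation performance of the output values so as to balance exploration against cost. Test results show that the improved Q-learning algorithm achieves a mean anomaly localization accuracy of 99.4% and also outperforms three traditional network anomaly diagnosis methods in classification accuracy and efficiency across different types of network anomalies.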
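The combination of Q-learning with simulated annealing described above can be sketched as tabular Q-learning whose exploration uses a Metropolis acceptance rule instead of epsilon-greedy. A random candidate action that is worse than the greedy one is accepted with probability `exp(delta / T)`, and the temperature `T` cools over time. The toy environment (reward 1 when the chosen action matches the anomaly class of the current state) is a hypothetical placeholder, not the paper's ADU-based features.

```python
import math
import random


def sa_select(q_row, temperature, rng):
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]   # always <= 0
    # Metropolis rule: accept a worse action with probability exp(delta / T).
    if delta >= 0 or rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(n_states=2, steps=3000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_states for _ in range(n_states)]  # actions = predicted anomaly class
    s, temperature = rng.randrange(n_states), 1.0
    for _ in range(steps):
        a = sa_select(q[s], temperature, rng)
        r = 1.0 if a == s else 0.0                   # toy reward: correct class
        s_next = rng.randrange(n_states)
        q_update(q, s, a, r, s_next)
        s = s_next
        temperature = max(temperature * 0.999, 1e-3)  # annealing schedule
    return q


q = train()
print(q)
```

Early on, the high temperature makes exploration nearly uniform. As `T` cools, the policy becomes effectively greedy, which is the exploration-versus-cost balance the abstract refers to.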
Abstract: Deep neural networks (DNNs) are effective in solving both forward and inverse problems for nonlinear partial differential equations (PDEs). However, conventional DNNs are not effective in handling problems such as delay differential equations (DDEs) and delay integro-differential equations (DIDEs) with constant delays, primarily because of their low regularity at delay-induced breaking points. In this paper, a DNN method combined with multi-task learning (MTL) is proposed to solve both the forward and inverse problems of DIDEs. The core idea of this approach is to divide the original equation into multiple tasks based on the delay, use auxiliary outputs to represent the integral terms, and then use MTL to seamlessly incorporate the properties at the breaking points into the loss function. Furthermore, given the increased training difficulty associated with multiple tasks and outputs, we employ a sequential training scheme that reduces training complexity and provides reference solutions for subsequent tasks. This approach significantly enhances the accuracy of solving DIDEs with DNNs, as demonstrated by comparisons with traditional DNN methods. We validate the effectiveness of this method through several numerical experiments, testing various parameter-sharing structures in MTL and comparing their results. Finally, the method is applied to the inverse problem of a nonlinear DIDE, and the results show that the unknown parameters of the DIDE can be discovered from sparse or noisy data.
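Splitting the problem at multiples of the delay mirrors the classical method of steps. For `y'(t) = a * y(t - tau)` with constant history, the solution's derivatives can jump at the breaking points `t = k * tau`. On each interval `[k*tau, (k+1)*tau]`, the delayed term is already known from the previous interval. The sketch below applies forward Euler to each interval in turn. It is a classical numerical analogue of the sequential, task-by-task scheme, not the DNN/MTL method itself. The equation and parameters are hypothetical.

```python
def method_of_steps(a=-1.0, tau=1.0, history=1.0, n_intervals=2, steps_per_interval=1000):
    """Forward-Euler method of steps for y'(t) = a * y(t - tau), y(t) = history for t <= 0."""
    h = tau / steps_per_interval
    values = [history]                       # values[i] approximates y(i * h)
    for k in range(n_intervals):             # "task" k covers [k*tau, (k+1)*tau]
        for j in range(steps_per_interval):
            i = k * steps_per_interval + j
            delayed_index = i - steps_per_interval
            # The delayed value comes from the history or from an earlier, already solved interval.
            delayed = history if delayed_index < 0 else values[delayed_index]
            values.append(values[i] + h * a * delayed)
    return values, h


values, h = method_of_steps()
# Exact solution: y(t) = 1 - t on [0, 1], and y(2) = -0.5.
print(values[1000], values[2000])
```

Here `y'` jumps from 0 to -1 at `t = 0` and `y''` jumps at `t = 1`. This is the kind of low regularity that makes a single smooth network struggle across breaking points.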
Abstract: As AI systems scale, the limitations of cloud-based architectures, including latency, bandwidth, and privacy concerns, demand decentralized alternatives. Federated learning (FL) and Edge AI provide a paradigm shift by combining privacy-preserving training with efficient on-device computation. This paper introduces a cutting-edge FL-edge integration framework that achieves a 10% to 15% increase in model accuracy and reduces communication costs by 25% in heterogeneous environments. Blockchain-based secure aggregation ensures robust and tamper-proof model updates, while exploratory quantum AI techniques enhance computational efficiency. By addressing key challenges such as device variability and non-IID data, this work sets the stage for the next generation of adaptive, privacy-first AI systems, with applications in IoT, healthcare, and autonomous systems.
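The core goal of secure aggregation, letting a server learn only the sum of client updates, can be illustrated with pairwise additive masking. This is a swapped-in textbook technique, not the blockchain-based scheme described above. For every client pair `(i, j)`, client `i` adds a shared random mask and client `j` subtracts it. Each upload therefore looks random, but the masks cancel in the sum. The sketch simplifies in three ways, all of which are assumptions:

- A single seeded RNG stands in for pairwise key agreement.
- Floats are used instead of finite-field integers.
- Client dropout is not handled.

```python
import random


def masked_updates(updates, seed=0):
    rng = random.Random(seed)       # stand-in for per-pair shared secrets
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]   # client i adds the pair's mask
                masked[j][d] -= mask[d]   # client j subtracts the same mask
    return masked


def aggregate(masked):
    """Server-side sum: pairwise masks cancel, leaving the sum of the true updates."""
    return [sum(col) for col in zip(*masked)]


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical client model updates
masked = masked_updates(updates)
print(masked[0])            # individually looks random
print(aggregate(masked))    # approximately [9.0, 12.0]
```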
Abstract: In computer vision and artificial intelligence, automatic facial expression-based identification of human emotion has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance videos, depression therapy, patient monitoring, anxiety, and others, have drawn attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way in which people communicate meaning to each other. The immense success of deep learning has led to the growing use of its many architectures to improve efficiency. This review covers the use of preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames in machine learning, deep learning, and hybrid methods. It then briefly summarizes publicly available assessment criteria and compares them with benchmark results, the most trustworthy way to assess FER-related research statistically. The brief synopsis in this review may benefit both novices in the field of FER and seasoned scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.