This study proposes a method for analyzing the security distance of an Active Distribution Network (ADN) by incorporating the demand response of an Energy Hub (EH). Taking into account the impact of stochastic wind-solar power and flexible loads on the EH, an interactive power model was developed to represent the EH's operation under these influences. Additionally, an ADN security distance model, integrating an EH with flexible loads, was constructed to evaluate the effect of flexible load variations on the ADN's security distance. By considering scenarios such as air conditioning (AC) load reduction and base station (BS) load transfer, the security distances of phases A, B, and C increased by 17.1%, 17.2%, and 17.7%, respectively. Furthermore, a multi-objective optimal power flow model was formulated and solved using the Forward-Backward Power Flow Algorithm, the NSGA-II multi-objective optimization algorithm, and the maximum satisfaction method. The simulation results of the IEEE 33-node system example demonstrate that after optimization, the total energy cost for one day is reduced by 0.026%, and the total security distance limit of the ADN's three phases is improved by 0.1 MVA. This method effectively enhances the security distance, facilitates BS load transfer and AC load reduction, and contributes to the energy-saving, economical, and safe operation of the power system.
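The final step named above, the maximum satisfaction method, selects a single compromise solution from the NSGA-II Pareto front via fuzzy membership degrees. A minimal sketch of that selection rule (the objective values and the linear membership form are illustrative assumptions, not the paper's data):

```python
def satisfaction(value, worst, best):
    """Linear fuzzy membership: 1 at the best objective value, 0 at the worst."""
    if best == worst:
        return 1.0
    return max(0.0, min(1.0, (worst - value) / (worst - best)))

def max_satisfaction(front):
    """Pick the Pareto point whose minimum per-objective satisfaction is largest.
    `front` is a list of tuples of objectives to minimize."""
    n_obj = len(front[0])
    worst = [max(p[i] for p in front) for i in range(n_obj)]
    best = [min(p[i] for p in front) for i in range(n_obj)]

    def min_sat(point):
        return min(satisfaction(point[i], worst[i], best[i]) for i in range(n_obj))

    return max(front, key=min_sat)

# Hypothetical Pareto front: (daily energy cost, negated security distance)
front = [(100.0, -3.0), (102.0, -3.6), (106.0, -3.8)]
print(max_satisfaction(front))  # → (102.0, -3.6), the balanced compromise
```

Each extreme point scores zero on the objective it sacrifices, so the max-min rule lands on the interior trade-off.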
Low-carbon smart parks achieve self-balanced carbon emission and absorption through the cooperative scheduling of direct current (DC)-based distributed photovoltaics, energy storage units, and loads. Direct current power line communication (DC-PLC) enables real-time data transmission over DC power lines. With traffic adaptation, DC-PLC can be integrated with complementary media such as 5G to reduce transmission delay and improve reliability. However, traffic adaptation for DC-PLC and 5G integration still faces challenges such as the coupling between traffic admission control and traffic partition, the curse of dimensionality, and the neglect of extreme events. To address these challenges, we propose a deep reinforcement learning (DRL)-based delay-sensitive and reliable traffic adaptation algorithm (DSRTA) to minimize the total queuing delay under constraints on traffic admission control, queuing delay, and extreme-event occurrence probability. DSRTA jointly optimizes traffic admission control and traffic partition, enabling learning-based intelligent traffic adaptation. The long-term constraints are incorporated into both the state and the bound of the drift-plus-penalty term to achieve delay awareness and enforce reliability guarantees. Simulation results show that DSRTA achieves lower queuing delay and more reliable quality of service (QoS) guarantees than other state-of-the-art algorithms.
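The drift-plus-penalty construction mentioned above trades a penalty term (weighted by a parameter V) against the Lyapunov drift of a virtual queue, which yields a simple per-slot threshold rule for admission. A toy sketch (the arrival pattern, service rate, and V are hypothetical; the actual DSRTA couples this bound with DRL-learned traffic partition):

```python
def drift_plus_penalty(arrivals, service, V, utility=1.0):
    """Per-slot admission via a drift-plus-penalty threshold: admit a slot's
    arrivals only while the penalty reward V*utility outweighs the queue-drift
    cost Q per packet. Returns (admitted per slot, final queue backlog)."""
    Q = 0.0          # virtual queue enforcing the long-term delay constraint
    admitted = []
    for a in arrivals:
        x = a if V * utility > Q else 0   # threshold rule from the DPP bound
        admitted.append(x)
        Q = max(Q + x - service, 0.0)     # queue evolves with admitted traffic
    return admitted, Q

admitted, backlog = drift_plus_penalty([3, 3, 3, 3, 3], service=2, V=4)
print(admitted, backlog)  # → [3, 3, 3, 3, 0] 2.0
```

Larger V admits more traffic at the cost of a longer queue; the backlog stays bounded by roughly V, which is the classic delay/utility trade-off the abstract alludes to.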
Mill vibration is a common problem in rolling production that directly affects the thickness accuracy of the strip and, in serious cases, may even lead to strip fracture accidents. Existing vibration prediction models do not consider the features contained in the data, which limits improvements in model accuracy. To address these challenges, this paper proposes a multi-dimensional multi-modal cold rolling vibration time series prediction model (MDMMVPM) based on the deep fusion of multi-level networks. In the model, the long-term and short-term modal features of multi-dimensional data are considered, and appropriate prediction algorithms are selected for different data features. Based on the established prediction model, the effects of tension and rolling force on mill vibration are analyzed. Taking the 5th stand of a cold rolling mill in a steel plant as the research object, the model is applied to predict mill vibration for the first time. The experimental results show that the correlation coefficient (R²) of the proposed model is 92.5%, and the root-mean-square error (RMSE) is 0.0011, significantly improving modeling accuracy compared with existing models. The proposed model is also suitable for the hot rolling process, providing a new method for the prediction of strip rolling vibration.
Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from a source domain into many diverse styles in a target domain. However, the advanced approaches available typically employ a multi-generator mechanism to model the different domain mappings, which results in inefficient training of the neural networks and mode collapse, limiting the diversity of generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, a domain code is first introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are brought in. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality video. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector-Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission. With it, semantic information can be extracted from the key-frame images and audio of the video and differentially encoded so that the extracted text conveys accurate semantic information with fewer bits, thus improving system capacity. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments, particularly under low signal-to-noise ratio conditions, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
Recent advances in spatially resolved transcriptomic technologies have enabled unprecedented opportunities to elucidate tissue architecture and function in situ. Spatial transcriptomics can simultaneously provide multimodal and complementary information, including gene expression profiles, spatial locations, and histology images. However, most existing methods have limitations in efficiently utilizing spatial information and matched high-resolution histology images. To fully leverage the multi-modal information, we propose a SPAtially embedded Deep Attentional graph Clustering (SpaDAC) method to identify spatial domains while reconstructing denoised gene expression profiles. This method efficiently learns low-dimensional embeddings for spatial transcriptomics data by constructing multi-view graph modules that capture both spatial-location and morphological connectivity. Benchmark results demonstrate that SpaDAC outperforms other algorithms on several recent spatial transcriptomics datasets. SpaDAC is a valuable tool for spatial domain detection, facilitating the comprehension of tissue architecture and the cellular microenvironment. The source code of SpaDAC is freely available at GitHub (https://github.com/huoyuying/SpaDAC.git).
Objective: Oral squamous cell carcinoma (OSCC) is an aggressive cancer with a high mortality rate. San-Zhong-Kui-Jian-Tang (SZKJT), a Chinese herbal formula, has long been used as an adjuvant therapy in clinical cancer practice. Although its therapeutic effects and molecular mechanisms in OSCC have been previously elucidated, the potential interactions and mechanisms between the active phytochemicals and their therapeutic targets remain unclear. Methods: The present study employed network pharmacology and topology approaches to establish a "herbal ingredients–active phytochemicals–target interaction" network to explore the potential therapeutic targets of SZKJT-active phytochemicals in the treatment of OSCC. The role of the target proteins in oncogenesis was assessed via GO and KEGG enrichment analyses, and their interactions with the active phytochemicals of SZKJT were calculated via molecular docking and dynamics simulations. The pharmacokinetic properties and toxicity of the active phytochemicals were also predicted. Results: A total of 171 active phytochemicals of SZKJT fulfilled the bioavailability and drug-likeness screening criteria, with the flavonoids quercetin, kaempferol, and naringenin having the greatest potential. The four crucial targets of these active phytochemicals are PTGS2, TNF, BCL2, and CASP3, which encode cyclooxygenase-2, tumor necrosis factor (TNF), the BCL-2 apoptosis regulator, and caspase-3, respectively. The interactions between the phytochemicals and target proteins were predicted to be thermodynamically feasible and stable via molecular docking and dynamics simulations. Finally, the results revealed that the IL-6/JAK/STAT3 pathway and TNF signaling via NF-κB are the two prominent pathways targeted by SZKJT. Conclusion: In summary, this study provides computational data for an in-depth exploration of the mechanism by which the active phytochemicals of SZKJT treat OSCC.
Multimodal traffic equilibrium models carry an explicit or implicit assumption: if the equilibrium exists, it will also occur. This assumption is highly idealized; in fact, quite the contrary can happen, because in a multimodal traffic network, especially under mixed traffic conditions, the interaction among traffic modes is asymmetric, and this asymmetric interaction can destabilize the traffic system. In this paper, to study the stability of a multimodal traffic system, we present travel cost functions for mixed traffic conditions and for a network with dedicated bus lanes. Based on a day-to-day dynamical model, we study the evolution of travelers' daily route choices in a multimodal traffic network using 10,000 random initial values for different cases. The simulation results show that the asymmetric interaction between cars and buses under mixed traffic conditions can drive the traffic system to instability when traffic demand is large. We also study the effect of travelers' perception error on the stability of the multimodal traffic network. Although a larger perception error can alleviate the effect of the car-bus interaction and improve the stability of the traffic system under mixed traffic conditions, the system still becomes unstable once traffic demand exceeds a threshold. For all cases simulated in this study, with the same parameters, the system with a dedicated bus lane remains stable over a wider range of traffic demand than the mixed-traffic system. We also find that the network with a dedicated bus lane attracts a higher share of bus travelers than the mixed traffic network. It can therefore be concluded that building dedicated bus lanes can improve the stability of the traffic system and attract more travelers to buses, reducing traffic congestion.
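The day-to-day mode adjustment studied above can be illustrated with a two-mode logit dynamic. The cost functions and coefficients below are invented for illustration only (the paper's asymmetric interaction terms are richer); the key feature reproduced is that cars slow buses more than buses slow cars:

```python
import math

def day_to_day(p_bus, demand, days, theta=0.1, step=0.5):
    """Iterate a logit-based daily mode-choice adjustment for car vs. bus.
    Interaction is asymmetric: the car flow enters the bus cost (0.008)
    more strongly than the bus flow enters the car cost (0.002)."""
    for _ in range(days):
        car = (1 - p_bus) * demand
        bus = p_bus * demand
        cost_car = 10 + 0.010 * car + 0.002 * bus
        cost_bus = 15 + 0.008 * car + 0.001 * bus
        # logit share of bus given today's costs (theta = perception parameter)
        target = 1 / (1 + math.exp(theta * (cost_bus - cost_car)))
        p_bus += step * (target - p_bus)   # partial day-to-day adjustment
    return p_bus
```

With these toy coefficients the dynamic settles to a stable bus share; in the paper, stronger coupling at high demand is what pushes the analogous map into instability.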
Biometric authentication provides a reliable, user-specific approach to identity verification, significantly enhancing access control and security against unauthorized intrusions in cybersecurity. Unimodal biometric systems that rely on either face or voice recognition encounter several challenges, including inconsistent data quality, environmental noise, and susceptibility to spoofing attacks. To address these limitations, this research introduces a robust multi-modal biometric recognition framework, the Quantum-Enhanced Biometric Fusion Network. The proposed model strengthens security and boosts recognition accuracy through the fusion of facial and voice features. The model employs advanced pre-processing techniques to generate high-quality facial images and voice recordings, enabling more efficient face and voice recognition, and augmentation techniques enrich the training dataset with diverse and representative samples. Local facial features are extracted using advanced neural methods, while voice features are extracted using a Pyramid-1D Wavelet Convolutional Bidirectional Network, which effectively captures speech dynamics. A Quantum Residual Network encodes facial features into quantum states, enabling powerful quantum-enhanced representations. The normalized feature sets are fused using an early fusion strategy that preserves complementary spatial-temporal characteristics. Experimental validation is conducted on a biometric audio and video dataset, with comprehensive evaluations including ablation and statistical analyses. These analyses confirm that the proposed model attains superior performance, outperforming existing biometric methods with an average accuracy of 98.99%. The proposed model improves recognition robustness, making it an efficient multimodal solution for cybersecurity applications.
With the increasing importance of multimodal data in emotional expression on social media, mainstream methods for sentiment analysis have shifted from unimodal to multimodal approaches. However, extracting high-quality emotional features and achieving effective interaction between different modalities remain two major obstacles in multimodal sentiment analysis. To address these challenges, this paper proposes a Text-Gated Interaction Network with Inter-Sample Commonality Perception (TGICP). Specifically, an Inter-Sample Commonality Perception (ICP) module extracts common features from similar samples within the same modality and uses them to enhance the original features of each modality, yielding a richer and more complete multimodal sentiment representation. Subsequently, in the cross-modal interaction stage, a text-driven Text-Gated Interaction (TGI) module is designed. By calculating the mutual information difference between the text modality and the nonverbal modalities, the TGI module dynamically adjusts the influence of emotional information from the text modality on the nonverbal modalities. This helps reduce modality information asymmetry while enabling full cross-modal interaction. Experimental results show that the proposed model achieves outstanding performance on both the CMU-MOSI and CMU-MOSEI multimodal sentiment analysis benchmark datasets, validating its effectiveness in emotion recognition tasks.
Deep learning-based methods have been successfully applied to the semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, comprehensively utilizing multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation is a new challenge. In addition, semantic segmentation and height estimation in remote sensing data are two strongly correlated tasks, but existing methods usually study them separately, leading to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method that aggregates complementary information from different modalities to improve the accuracy of both semantic segmentation and height estimation. In addition, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
In this paper, a study of four African ports was carried out, all of which have the capability to become a hub port serving the central African region. The paper sought to determine which port was most suitable, and port indexing was the method used to evaluate the ports. The ports evaluated were the port of Kribi, the port of Bata, the port of Libreville, and the port of Pointe-Noire. Other models were also used, including linear regression and linear programming, which contributed to identifying the port with the most suitable potential to serve as a hub port, and meaningful results were obtained. The final results showed that the port of Pointe-Noire was the most suitable port to serve the central African region as a hub port.
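Port indexing of the kind used in the study reduces to scoring each port as a weighted sum of normalized criteria. A minimal sketch (the criteria, weights, and figures below are hypothetical, not the study's data):

```python
def port_index(ports, weights):
    """Rank ports by a weighted sum of min-max normalized criteria.
    `ports` maps name -> tuple of criterion values (higher is better)."""
    names = list(ports)
    n_crit = len(weights)
    lo = [min(ports[p][i] for p in names) for i in range(n_crit)]
    hi = [max(ports[p][i] for p in names) for i in range(n_crit)]

    def score(name):
        vals = ports[name]
        return sum(w * (v - l) / (h - l) if h > l else w
                   for w, v, l, h in zip(weights, vals, lo, hi))

    return max(names, key=score)

# Hypothetical criteria per port: (draft in m, connectivity index, capacity in kTEU)
ports = {"Kribi": (16, 7, 350), "Bata": (12, 4, 150),
         "Libreville": (13, 5, 200), "Pointe-Noire": (15, 9, 1000)}
print(port_index(ports, (0.3, 0.3, 0.4)))  # → Pointe-Noire
```

Min-max normalization puts the criteria on a common 0-1 scale before weighting, so no single unit (meters vs. TEU) dominates the index.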
The rapid development of electric buses has brought a surge in the number of bus hubs and their charging and discharging capacities. The location and construction scale of bus hubs will therefore greatly affect the operating costs and benefits of urban distribution networks in the future. Scientific and reasonable planning of public transport hubs, on the premise of meeting basic public transport service needs, can reduce the negative impact of electric bus charging loads on the power grid; furthermore, the hubs' flexible operating characteristics can provide flexible support for the distribution network. In this paper, taking the impact of public transport hubs on distribution network reliability as the starting point, a three-level programming optimization model based on the value and economics of distribution network load loss is proposed. The upper-level model generates several planning schemes, which provide boundary conditions for the middle-level optimization. The normal-operation dispatching scheme of the public transport hub obtained from the middle-level optimization in turn provides boundary conditions for the lower-level optimization. Through the lower-level optimization, the expected load loss of the whole distribution system, including the bus hub, can be obtained under the planning scheme given by the upper level. The effectiveness of the model is verified on an IEEE 33-bus example.
With the emergence and development of social networks, people can stay in touch with friends, family, and colleagues more quickly and conveniently, regardless of their location. This ubiquitous digital internet environment has also led to large-scale disclosure of personal privacy. Due to the complexity and subtlety of sensitive information, traditional sensitive information identification technologies cannot thoroughly address the characteristics of each piece of data, thus weakening the deep connections between text and images. In this context, this paper adopts the CLIP model as a modality discriminator. By applying contrastive learning between sensitive image descriptions and images, the similarity between the images and the sensitive descriptions is obtained to determine whether the images contain sensitive information. This provides the basis for identifying sensitive information across different modalities. Specifically, if the images in the original data do not contain sensitive information, only single-modality text-sensitive information identification is performed; if they do, multimodal sensitive information identification is conducted. This approach allows differentiated processing of each piece of data, thereby achieving more accurate sensitive information identification. The modality discriminator thus addresses the limitations of existing sensitive information identification technologies, making the identification of sensitive information from the original data more appropriate and precise.
Social media has become increasingly significant in modern society, but it has also turned into a breeding ground for the propagation of misleading information, potentially causing a detrimental impact on public opinion and daily life. Compared to pure text content, multimodal content significantly increases the visibility and shareability of posts. This has made the search for efficient modality representations and cross-modal information interaction methods a key focus in the field of multimodal fake news detection. To effectively address the critical challenge of accurately detecting fake news on social media, this paper proposes a fake news detection model based on cross-modal message aggregation and a gated fusion network (MAGF). MAGF first uses BERT to extract cumulative textual feature representations and word-level features, applies Faster Region-based Convolutional Neural Network (Faster R-CNN) to obtain image objects, and leverages ResNet-50 and Visual Geometry Group-19 (VGG-19) to obtain image region features and global features. The image region features and word-level text features are then projected into a low-dimensional space to calculate a text-image affinity matrix for cross-modal message aggregation. The gated fusion network combines text and image region features to obtain adaptively aggregated features. The interaction matrix is derived through an attention mechanism and further integrated with the global image features using a co-attention mechanism to produce multimodal representations. Finally, these fused features are fed into a classifier for news categorization. Experiments were conducted on two public datasets, Twitter and Weibo. The results show that the proposed model achieves accuracy rates of 91.8% and 88.7% on the two datasets, respectively, significantly outperforming traditional unimodal and existing multimodal models.
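The text-image affinity matrix at the heart of this kind of message aggregation is a row-normalized similarity between projected word and region features. A stripped-down sketch with toy 2-d features (MAGF's learned projections and subsequent gating are omitted; the softmax row normalization is an assumption about the aggregation form):

```python
import math

def affinity(text_feats, image_feats):
    """Row-softmax of dot-product similarities: one row per word,
    one column per image region; each row sums to 1, so it can weight
    region features when aggregating cross-modal messages per word."""
    def softmax(row):
        m = max(row)                       # subtract max for stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        return [e / s for e in exps]

    sims = [[sum(t * r for t, r in zip(tv, rv)) for rv in image_feats]
            for tv in text_feats]
    return [softmax(row) for row in sims]

# Two word vectors, three region vectors (hypothetical 2-d projections)
A = affinity([[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Here `A[0]` weights the regions most aligned with the first word, which is exactly the role an affinity matrix plays before gated fusion.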
In recent years, efficiently and accurately identifying multimodal fake news has become increasingly challenging. First, multimodal data provides more evidence, but not all of it is equally important. Second, social structure information has proven effective in fake news detection, and how to incorporate it while reducing noise is critical. Unfortunately, existing approaches fail to handle these problems. This paper proposes a multimodal fake news detection framework based on Text-modal Dominance and fusing Multiple Multimodal Cues (TD-MMC), which utilizes three valuable multimodal clues: text-modal importance, text-image complementarity, and text-image inconsistency. TD-MMC is dominated by textual content and assisted by image information, while using social network information to enhance the text representation. To reduce interference from irrelevant social structure information, we use a unidirectional cross-modal attention mechanism to selectively learn social structure features. A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining textual features, reducing the loss of important information. In addition, TD-MMC employs a new multimodal loss to improve the model's generalization ability. Extensive experiments on two public real-world English and Chinese datasets show that the proposed model outperforms state-of-the-art methods on classification evaluation metrics.
In recent years, wearable device-based Human Activity Recognition (HAR) models have received significant attention. Previously developed HAR models used hand-crafted features to recognize human activities, leading to the extraction of only basic features. The images captured by wearable sensors contain advanced features that can be analyzed by deep learning algorithms to enhance the detection and recognition of human actions. Poor lighting and limited sensor capabilities can degrade data quality, making the recognition of human actions a challenging task. Unimodal HAR approaches are not suitable for real-time environments. Therefore, an updated HAR model is developed using multiple types of data and an advanced deep learning approach. First, the required signals and sensor data are accumulated from standard databases. From these signals, wave features are retrieved. The extracted wave features and sensor data are then given as input for recognizing human activity. An Adaptive Hybrid Deep Attentive Network (AHDAN) is developed by combining a 1D Convolutional Neural Network (1DCNN) with a Gated Recurrent Unit (GRU) for the human activity recognition process. Additionally, the Enhanced Archerfish Hunting Optimizer (EAHO) is proposed to fine-tune the network parameters and enhance recognition. An experimental evaluation against various deep learning networks and heuristic algorithms confirms the effectiveness of the proposed HAR model. The EAHO-based HAR model outperforms traditional deep learning networks, with an accuracy of 95.36%, a recall of 95.25%, a specificity of 95.48%, and a precision of 95.47%. The results prove that the developed model recognizes human actions effectively in less time. Additionally, it reduces computational complexity and overfitting through the use of an optimization approach.
Funding: Supported in part by the National Natural Science Foundation of China (No. 51977012, No. 52307080).
Funding: Supported by the Science and Technology Project of State Grid Corporation of China under grant 52094021N010 (5400-202199534A-0-5-ZN).
Funding: Project 2023JH26-10100002 supported by the Liaoning Science and Technology Major Project, China; Projects U21A20117 and 52074085 supported by the National Natural Science Foundation of China; Project 2022JH2/101300008 supported by the Liaoning Applied Basic Research Program, China; Project 22567612H supported by the Hebei Provincial Key Laboratory Performance Subsidy Project, China.
Abstract: Mill vibration is a common problem in rolling production, which directly affects the thickness accuracy of the strip and may even lead to strip fracture accidents in serious cases. Existing vibration prediction models do not consider the features contained in the data, resulting in limited improvement of model accuracy. To address these challenges, this paper proposes a multi-dimensional multi-modal cold rolling vibration time series prediction model (MDMMVPM) based on the deep fusion of multi-level networks. In the model, the long-term and short-term modal features of multi-dimensional data are considered, and appropriate prediction algorithms are selected for different data features. Based on the established prediction model, the effects of tension and rolling force on mill vibration are analyzed. Taking the 5th stand of a cold mill in a steel mill as the research object, the model is applied to predict mill vibration for the first time. The experimental results show that the correlation coefficient (R²) of the proposed model is 92.5%, and the root-mean-square error (RMSE) is 0.0011, which significantly improves modeling accuracy compared with existing models. The proposed model is also suitable for the hot rolling process, providing a new method for the prediction of strip rolling vibration.
Funding: the National Natural Science Foundation of China (No. 61976080); the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y); the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008); the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced approaches typically employ a multi-generator mechanism to model the different domain mappings, which results in inefficient neural network training and mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, a domain code is first introduced to explicitly control the different generation tasks. Secondly, the paper brings in the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
Funding: supported by the National Key Research and Development Project under Grant 2020YFB1807602; the Key Program of the Marine Economy Development Special Foundation of the Department of Natural Resources of Guangdong Province (GDNRC[2023]24); and the National Natural Science Foundation of China under Grant 62271267.
Abstract: Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by the research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality video. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector-Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission. With it, semantic information can be extracted from the key-frame images and audio of the video and differentially encoded, ensuring that the extracted text conveys accurate semantic information with fewer bits and thus improving the capacity of the system. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that the proposed model maintains high robustness in complex noise environments, particularly under low signal-to-noise ratio conditions, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
Funding: supported by the National Natural Science Foundation of China (62003028). X.L. was supported by a scholarship from the China Scholarship Council.
Abstract: Recent advances in spatially resolved transcriptomic technologies have enabled unprecedented opportunities to elucidate tissue architecture and function in situ. Spatial transcriptomics can simultaneously provide multimodal and complementary information, including gene expression profiles, spatial locations, and histology images. However, most existing methods have limitations in efficiently utilizing spatial information and matched high-resolution histology images. To fully leverage the multi-modal information, we propose a SPAtially embedded Deep Attentional graph Clustering (SpaDAC) method to identify spatial domains while reconstructing denoised gene expression profiles. This method efficiently learns low-dimensional embeddings for spatial transcriptomics data by constructing multi-view graph modules that capture both spatial-location and morphological connectivity. Benchmark results demonstrate that SpaDAC outperforms other algorithms on several recent spatial transcriptomics datasets. SpaDAC is a valuable tool for spatial domain detection, facilitating the comprehension of tissue architecture and the cellular microenvironment. The source code of SpaDAC is freely available at GitHub (https://github.com/huoyuying/SpaDAC.git).
Abstract: Objective: Oral squamous cell carcinoma (OSCC) is an aggressive cancer with a high mortality rate. San-Zhong-Kui-Jian-Tang (SZKJT), a Chinese herbal formula, has long been used as an adjuvant therapy in clinical cancer practice. Although its therapeutic effects and molecular mechanisms in OSCC have been previously elucidated, the potential interactions and mechanisms between its active phytochemicals and their therapeutic targets remain unexplored. Methods: The present study employed network pharmacology and topology approaches to establish a “herbal ingredients–active phytochemicals–target interaction” network to explore the potential therapeutic targets of SZKJT's active phytochemicals in the treatment of OSCC. The role of the target proteins in oncogenesis was assessed via GO and KEGG enrichment analyses, and their interactions with the active phytochemicals of SZKJT were calculated via molecular docking and dynamics simulations. The pharmacokinetic properties and toxicity of the active phytochemicals were also predicted. Results: A total of 171 active phytochemicals of SZKJT fulfilled the bioavailability and drug-likeness screening criteria, with the flavonoids quercetin, kaempferol, and naringenin having the greatest potential. The four crucial targets of these active phytochemicals are PTGS2, TNF, BCL2, and CASP3, which encode cyclooxygenase-2, tumor necrosis factor (TNF), the BCL-2 apoptosis regulator, and caspase-3, respectively. The interactions between phytochemicals and target proteins were predicted to be thermodynamically feasible and stable via molecular docking and dynamics simulations. Finally, the results revealed that the IL-6/JAK/STAT3 pathway and TNF signaling via NF-κB are the two prominent pathways targeted by SZKJT. Conclusion: In summary, this study provides computational data for in-depth exploration of the mechanism by which SZKJT's active phytochemicals treat OSCC.
Funding: supported by the National Basic Research Development Program of China under Grant No. 2012CB725401; the Fundamental Research Funds for the Central Universities under Grant No. 2012JBZ005; Funds for International Cooperation and Exchange of the National Natural Science Foundation of China under Grant No. 71210001; the National Natural Science Foundation of China under Grant No. 71271023; the Foundation for the Author of National Excellent Doctoral Dissertation of China under Grant No. 201170; and the Beijing Nova Program under Grant No. 2009A15.
Abstract: There is an explicit and implicit assumption in multimodal traffic equilibrium models: if the equilibrium exists, then it will also occur. This assumption is highly idealized; in fact, quite the contrary can happen, because in a multimodal traffic network, and especially under mixed traffic conditions, the interaction among traffic modes is asymmetric, and this asymmetric interaction may destabilize the traffic system. In this paper, to study the stability of a multimodal traffic system, we present travel cost functions for mixed traffic conditions and for a traffic network with dedicated bus lanes. Based on a day-to-day dynamical model, we study the evolution of travelers' daily route choices in a multimodal traffic network using 10,000 random initial values for different cases. The simulation results show that the asymmetric interaction between cars and buses under mixed traffic conditions can drive the traffic system to instability when traffic demand is large. We also study the effect of travelers' perception error on the stability of the multimodal traffic network. Although larger perception errors can alleviate the effect of the car-bus interaction and improve system stability under mixed traffic conditions, the system still becomes unstable once traffic demand exceeds a certain level. For all cases simulated in this study, with the same parameters, the network with a dedicated bus lane remains stable over a wider range of traffic demand than the mixed traffic network. We also find that the network with a dedicated bus lane attracts a higher proportion of bus travelers than the mixed traffic network. It can therefore be concluded that building dedicated bus lanes improves the stability of the traffic system and attracts more travelers to the bus, reducing traffic congestion.
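A day-to-day dynamical model of the kind described above iterates a choice rule with smoothed adjustment until the system settles (or fails to). The sketch below is a deliberately tiny stand-in, not the paper's calibrated model: the cost functions, logit sensitivity, and adjustment rate are all invented. Car cost grows quadratically with car flow (congestion); bus cost grows mildly with car flow (mixed-traffic delay).

```python
import math

# Toy day-to-day car/bus mode-choice dynamics. All cost functions and
# parameters are illustrative assumptions, not the paper's model.

def day_to_day_share(days, demand, theta=0.5, alpha=0.3):
    """Evolve the car mode share by a smoothed logit adjustment:
    each 'day', travelers shift partway toward the logit choice
    probabilities implied by yesterday's costs."""
    p_car = 0.5                                   # initial car share
    for _ in range(days):
        cost_car = 1.0 + 2.0 * (p_car * demand) ** 2   # congestion cost
        cost_bus = 2.0 + 0.5 * (p_car * demand)        # mixed-traffic delay
        u_car = math.exp(-theta * cost_car)
        u_bus = math.exp(-theta * cost_bus)
        target = u_car / (u_car + u_bus)          # logit choice probability
        p_car = (1 - alpha) * p_car + alpha * target   # partial adjustment
    return p_car

low = day_to_day_share(days=100, demand=1.0)
high = day_to_day_share(days=100, demand=3.0)
print(round(low, 3), round(high, 3))  # higher demand shifts travelers to the bus
```

Even this toy version shows the qualitative mechanism the paper studies: as demand grows, the asymmetric car-bus cost interaction pushes the day-to-day map toward a different, bus-heavier operating point.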
Abstract: Biometric authentication provides a reliable, user-specific approach to identity verification, significantly enhancing access control and security against unauthorized intrusions in cybersecurity. Unimodal biometric systems that rely on either face or voice recognition encounter several challenges, including inconsistent data quality, environmental noise, and susceptibility to spoofing attacks. To address these limitations, this research introduces a robust multi-modal biometric recognition framework, the Quantum-Enhanced Biometric Fusion Network. The proposed model strengthens security and boosts recognition accuracy through the fusion of facial and voice features. Furthermore, the model employs advanced pre-processing techniques to generate high-quality facial images and voice recordings, enabling more efficient face and voice recognition. Augmentation techniques are deployed to enhance model performance by enriching the training dataset with diverse and representative samples. Local facial features are extracted using advanced neural methods, while voice features are extracted using a Pyramid-1D Wavelet Convolutional Bidirectional Network, which effectively captures speech dynamics. The Quantum Residual Network encodes facial features into quantum states, enabling powerful quantum-enhanced representations. These normalized feature sets are fused using an early fusion strategy that preserves complementary spatial-temporal characteristics. Experimental validation is conducted on a biometric audio and video dataset, with comprehensive evaluations including ablation and statistical analyses. The experiments confirm that the proposed model attains superior performance, outperforming existing biometric methods with an average accuracy of 98.99%. The proposed model improves recognition robustness, making it an efficient multimodal solution for cybersecurity applications.
Funding: supported by the Natural Science Foundation of Henan under Grant 242300421220; the Henan Provincial Science and Technology Research Project under Grants 252102211047 and 252102211062; the Jiangsu Provincial Double Initiative Plan (JSS-CBS20230474); the XJTLU RDF-21-02-008; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; and the Higher Education Teaching Reform Research and Practice Project of Henan Province under Grant 2024SJGLX0126.
Abstract: With the increasing importance of multimodal data in emotional expression on social media, mainstream methods for sentiment analysis have shifted from unimodal to multimodal approaches. However, extracting high-quality emotional features and achieving effective interaction between different modalities remain two major obstacles in multimodal sentiment analysis. To address these challenges, this paper proposes a Text-Gated Interaction Network with Inter-Sample Commonality Perception (TGICP). Specifically, we utilize an Inter-Sample Commonality Perception (ICP) module to extract common features from similar samples within the same modality, and use these common features to enhance the original features of each modality, thereby obtaining a richer and more complete multimodal sentiment representation. Subsequently, in the cross-modal interaction stage, we design a text-driven Text-Gated Interaction (TGI) module. By calculating the mutual information difference between the text modality and the nonverbal modalities, the TGI module dynamically adjusts the influence of emotional information from the text modality on the nonverbal modalities. This helps reduce modality information asymmetry while enabling full cross-modal interaction. Experimental results show that the proposed model achieves outstanding performance on both the CMU-MOSI and CMU-MOSEI multimodal sentiment analysis benchmark datasets, validating its effectiveness in emotion recognition tasks.
Funding: National Key R&D Program of China (No. 2022ZD0118401).
Abstract: Deep learning-based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, comprehensively utilizing multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation is a new challenge. In addition, semantic segmentation and height estimation in remote sensing data are two strongly correlated tasks, but existing methods usually study them separately, which leads to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information from different modalities to improve the accuracy of semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources, and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
Abstract: In this paper, a study of four African ports that all have the capability to become a hub port serving the central African region was carried out. The paper sought to determine which port was most suitable, and port indexing was the method used to evaluate these ports. The ports evaluated were the port of Kribi, the port of Bata, the port of Libreville, and the port of Pointe-Noire. Other models were also used, including linear regression and linear programming, which all contributed to the final evaluation, and meaningful results were obtained. The final results showed that the port of Pointe-Noire was the most suitable port to serve the central African region as a hub port.
Abstract: The rapid development of electric buses has brought a surge in the number of bus hubs and their charging and discharging capacities. Therefore, the location and construction scale of bus hubs will greatly affect the operating costs and benefits of an urban distribution network in the future. Scientific and reasonable planning of bus hubs, on the premise of meeting basic public transport service needs, can reduce the negative impact of electric bus charging loads on the power grid and, by exploiting the hubs' flexible operating characteristics, provide flexible support for the distribution network. In this paper, taking the impact of bus hubs on distribution network reliability as the starting point, a three-level programming optimization model based on the value and economics of distribution network load loss is proposed. The upper-level model generates several planning schemes, which provide boundary conditions for the middle-level optimization. The normal-operation dispatching scheme of the bus hub obtained from the middle-level optimization in turn provides boundary conditions for the lower-level optimization. Through the lower-level optimization, the expected load loss of the whole distribution system, including the bus hub, under the planning scheme given by the upper level can be obtained. The effectiveness of the model is verified on an IEEE 33-bus example.
基金supported by the National Natural Science Foundation of China(No.62302540),with author Fangfang Shan for more information,please visit their website at https://www.nsfc.gov.cn/(accessed on 05 June 2024)Additionally,it is also funded by the Open Foundation of Henan Key Laboratory of Cyberspace Situation Awareness(No.HNTS2022020),where Fangfang Shan is an author.Further details can be found at http://xt.hnkjt.gov.cn/data/pingtai/(accessed on 05 June 2024)the Natural Science Foundation of Henan Province Youth Science Fund Project(No.232300420422),and for more information,you can visit https://kjt.henan.gov.cn(accessed on 05 June 2024).
Abstract: With the emergence and development of social networks, people can stay in touch with friends, family, and colleagues more quickly and conveniently, regardless of their location. This ubiquitous digital internet environment has also led to large-scale disclosure of personal privacy. Due to the complexity and subtlety of sensitive information, traditional sensitive information identification technologies cannot thoroughly address the characteristics of each piece of data, thus weakening the deep connections between text and images. In this context, this paper adopts the CLIP model as a modality discriminator. By contrastively comparing sensitive image descriptions with images, the similarity between the images and the sensitive descriptions is obtained to determine whether the images contain sensitive information. This provides the basis for identifying sensitive information across different modalities. Specifically, if the original data does not contain sensitive information, only single-modality text-sensitive information identification is performed; if the original data contains sensitive information, multimodal sensitive information identification is conducted. This approach allows for differentiated processing of each piece of data, thereby achieving more accurate sensitive information identification. The modality discriminator can address the limitations of existing sensitive information identification technologies, making the identification of sensitive information in the original data more appropriate and precise.
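The modality discriminator described above reduces to a similarity test: embed the image and each sensitive description in a shared space (CLIP-style), then flag the image if any similarity clears a threshold. The sketch below shows only that decision rule; the toy 2-D embeddings and the 0.6 threshold are assumptions, and a real system would obtain the embeddings from CLIP's image and text encoders.

```python
# Minimal sketch of a similarity-threshold modality discriminator.
# Embeddings here are hand-made 2-D vectors for illustration only.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def is_sensitive_image(image_emb, sensitive_desc_embs, threshold=0.6):
    """Flag the image as sensitive if it is close enough to ANY
    sensitive-description embedding."""
    return max(cosine(image_emb, d) for d in sensitive_desc_embs) >= threshold

descs = [[0.9, 0.1], [0.0, 1.0]]
print(is_sensitive_image([1.0, 0.05], descs))          # aligned with descs[0] -> True
print(is_sensitive_image([1.0, -1.0], [[-1.0, 1.0]]))  # opposite direction -> False
```

The outcome of this test then routes the sample to either the text-only pipeline or the multimodal pipeline, matching the branching logic in the abstract.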
Funding: supported by the National Natural Science Foundation of China (No. 62302540), with author Fangfang Shan; for more information, please visit https://www.nsfc.gov.cn/ (accessed on 31/05/2024). Additionally funded by the Open Foundation of the Henan Key Laboratory of Cyberspace Situation Awareness (No. HNTS2022020), where Fangfang Shan is an author; further details can be found at http://xt.hnkjt.gov.cn/data/pingtai/ (accessed on 31/05/2024). Also supported by the Natural Science Foundation of Henan Province Youth Science Fund Project (No. 232300420422); for more information, visit https://kjt.henan.gov.cn/2022/09-02/2599082.html (accessed on 31/05/2024).
Abstract: Social media has become increasingly significant in modern society, but it has also turned into a breeding ground for the propagation of misleading information, potentially causing a detrimental impact on public opinion and daily life. Compared to pure text content, multimodal content significantly increases the visibility and shareability of posts. This has made the search for efficient modality representations and cross-modal information interaction methods a key focus in the field of multimodal fake news detection. To effectively address the critical challenge of accurately detecting fake news on social media, this paper proposes a fake news detection model based on cross-modal message aggregation and a gated fusion network (MAGF). MAGF first uses BERT to extract cumulative textual feature representations and word-level features, applies Faster Region-based Convolutional Neural Network (Faster R-CNN) to obtain image objects, and leverages ResNet-50 and Visual Geometry Group-19 (VGG-19) to obtain image region features and global features. The image region features and word-level text features are then projected into a low-dimensional space to calculate a text-image affinity matrix for cross-modal message aggregation. The gated fusion network combines text and image region features to obtain adaptively aggregated features. The interaction matrix is derived through an attention mechanism and further integrated with global image features using a co-attention mechanism to produce multimodal representations. Finally, these fused features are fed into a classifier for news categorization. Experiments were conducted on two public datasets, Twitter and Weibo. The results show that the proposed model achieves accuracy rates of 91.8% and 88.7% on the two datasets, respectively, significantly outperforming traditional unimodal and existing multimodal models.
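The gated fusion step above can be pictured as a learned sigmoid gate that decides, per feature dimension, how much of the text versus the image feature enters the fused representation. The sketch below is a minimal, hand-parameterized stand-in; MAGF's actual gate is a learned network, and the scalar gate parameters here are assumptions.

```python
import math

# Element-wise gated fusion of a text feature and an image-region feature.
# A sigmoid gate g in (0, 1) weights the text contribution (g) against the
# image contribution (1 - g) for each dimension.

def gated_fusion(text_feat, img_feat, w_gate, b_gate):
    fused = []
    for t, m, w, b in zip(text_feat, img_feat, w_gate, b_gate):
        g = 1.0 / (1.0 + math.exp(-(w * (t + m) + b)))   # sigmoid gate
        fused.append(g * t + (1.0 - g) * m)
    return fused

# With zero gate parameters, g = 0.5 everywhere: the fusion is a plain average.
print(gated_fusion([1.0, 0.0], [0.0, 2.0], [0.0, 0.0], [0.0, 0.0]))  # [0.5, 1.0]
```

In the full model the gate inputs would be the projected BERT and Faster R-CNN features, so the network learns when to trust text over image evidence and vice versa.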
Funding: This research was funded by the General Project of Philosophy and Social Science of Heilongjiang Province, Grant Number 20SHB080.
Abstract: In recent years, efficiently and accurately identifying multimodal fake news has become increasingly challenging. First, multimodal data provides more evidence, but not all of it is equally important. Second, social structure information has proven effective in fake news detection, and how to incorporate it while reducing noise is critical. Unfortunately, existing approaches fail to handle these problems. This paper proposes a multimodal fake news detection framework based on Text-modal Dominance and fusing Multiple Multimodal Cues (TD-MMC), which utilizes three valuable multimodal clues: text-modality importance, text-image complementarity, and text-image inconsistency. TD-MMC is dominated by textual content and assisted by image information, while using social network information to enhance the text representation. To reduce interference from irrelevant social-structure information, we use a unidirectional cross-modal attention mechanism to selectively learn the social structure's features. A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining textual features, to reduce the loss of important information. In addition, TD-MMC employs a new multimodal loss to improve the model's generalization ability. Extensive experiments have been conducted on two public real-world English and Chinese datasets, and the results show that our proposed model outperforms state-of-the-art methods on classification evaluation metrics.
Abstract: In recent years, wearable device-based Human Activity Recognition (HAR) models have received significant attention. Previously developed HAR models use hand-crafted features to recognize human activities, leading to the extraction of only basic features. The images captured by wearable sensors contain advanced features, allowing them to be analyzed by deep learning algorithms to enhance the detection and recognition of human actions. Poor lighting and limited sensor capabilities can impact data quality, making the recognition of human actions a challenging task. Unimodal HAR approaches are not suitable for real-time environments. Therefore, an updated HAR model is developed using multiple types of data and an advanced deep-learning approach. Firstly, the required signals and sensor data are accumulated from standard databases. From these signals, the wave features are retrieved. Then the extracted wave features and sensor data are given as input to recognize the human activity. An Adaptive Hybrid Deep Attentive Network (AHDAN) is developed by incorporating a 1D Convolutional Neural Network (1DCNN) with a Gated Recurrent Unit (GRU) for the human activity recognition process. Additionally, the Enhanced Archerfish Hunting Optimizer (EAHO) is proposed to fine-tune the network parameters and enhance the recognition process. An experimental evaluation against various deep learning networks and heuristic algorithms confirms the effectiveness of the proposed HAR model. The EAHO-based HAR model outperforms traditional deep learning networks with an accuracy of 95.36%, recall of 95.25%, specificity of 95.48%, and precision of 95.47%. The results prove that the developed model is effective in recognizing human actions while taking less time. Additionally, it reduces computational complexity and overfitting through the use of an optimization approach.