Multi-label learning is an active research area that plays an important role in machine learning. Traditional learning algorithms, however, depend on samples with complete labels. Existing algorithms for learning with missing labels do not consider the relevance between labels, which leads to label estimation errors on new samples. A new multi-label learning algorithm with support vector machine (SVM) based association (SVMA) is proposed to estimate missing labels by constructing the association between different labels. SVMA establishes a mapping function that minimizes the number of samples in the margin while keeping the margin large enough and minimizing the misclassification probability. To evaluate the performance of SVMA under missing labels, four typical data sets are adopted, with the integrity of the labels handled manually. Simulation results show the superiority of SVMA over other models in handling samples with missing labels in image classification.
Multi-label learning deals with objects associated with multiple class labels, and aims to induce a predictive model that can assign a set of relevant class labels to an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced from tailored features specific to each class label instead of identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded by the inherent instability of a single clustering algorithm. To address this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by first augmenting the original instance representation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.
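The label-specific feature idea attributed to LIFT above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly described recipe (cluster the positive and negative instances of one label separately, then re-represent every instance by its distances to the cluster centres), and the names `kmeans`, `label_specific_features` and the `ratio` parameter are hypothetical. The clustering-ensemble and EM steps of SENCE are not shown.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def label_specific_features(X, y, ratio=0.5, seed=0):
    """Map each instance to its distances from positive/negative centroids.

    Assumes y holds 0/1 values for ONE label and both classes are non-empty.
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    k = max(1, math.ceil(ratio * min(len(pos), len(neg))))
    centers = kmeans(pos, k, seed=seed) + kmeans(neg, k, seed=seed)
    features = [[math.dist(x, c) for c in centers] for x in X]
    return features, centers


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [1, 1, 0, 0]
feats, centers = label_specific_features(X, y)
```

Running this once per label yields one tailored feature space per label; a separate binary classifier would then be trained on each.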
In recent years, multi-label learning has received a lot of attention. However, most existing methods consider only global label correlation or only local label correlation. In fact, global and local label correlations can appear at the same time in real-world situations. Moreover, we should not be limited to pairwise labels while ignoring high-order label correlation. In this paper, we propose a novel and effective method called GLLCBN for multi-label learning. First, we obtain the global label correlation by exploiting label semantic similarity. Then, we analyze the pairwise labels in the label space of the data set to acquire the local correlation. Next, we build the initial version of the label dependency model from the global and local label correlations. After that, we use graph theory, probability theory and Bayesian networks to eliminate redundant dependency structure in the initial model, so as to obtain the optimal label dependency model. Finally, we obtain the feature extraction model by adjusting the Inception V3 convolutional neural network and combine it with the GLLCBN model to achieve multi-label learning. The experimental results show that the proposed model performs better than other multi-label learning methods.
In this paper, we utilize the framework of multi-label learning for face demographic classification. We also attempt to explore suitable classifiers and features for face demographic classification. The three most popular kinds of demographic information (gender, ethnicity and age) are considered in the experiments. Based on the results of demographic classification, we use statistical analysis to explore the correlation among various kinds of face demographic information. Through the analysis, we draw several conclusions on the correlation and interaction among these high-level face semantics, and the obtained results can be helpful in automatic face semantic annotation and other face analysis tasks.
This paper studies an urban security risk assessment model based on multi-label learning, which is transformed into the solution of linear equations through a series of transformations; the solution of the linear equations is then transformed into an optimization problem. Finally, this paper uses some classical optimization algorithms to solve these optimization problems, proves the convergence of the algorithm, and compares the advantages and disadvantages of several optimization methods.
Multi-label text categorization refers to the problem of categorizing text through a multi-label learning algorithm. Text classification for Asian languages such as Chinese is different from work for other languages such as English, which use spaces to separate words. Before classifying text, it is necessary to perform a word segmentation operation to convert a continuous language into a list of separate words and then convert it into a vector of a certain dimension. Generally, multi-label learning algorithms can be divided into two categories: problem transformation methods and adapted algorithms. This work uses customers' comments about some hotels as a training data set, which contains labels for all aspects of the hotel evaluation, aiming to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiment involves three basic problem transformation methods (Support Vector Machine, Random Forest, k-Nearest-Neighbor) and one adapted algorithm, a Convolutional Neural Network. The experimental results show that the Support Vector Machine has better performance.
Exploiting the label coupling relationship is a key challenge in multi-label classification (MLC) problems. Most previous work focused on pairwise label relations, in which generally only global statistical information is used to analyze the coupled label relationship. In this work, Bayesian and hypothesis testing methods are first applied to predict the label set size of testing samples within their k nearest neighbor samples, which combines global and local statistical information. The Apriori algorithm is then used to mine the label coupling relationship among multiple labels rather than pairwise labels, which exploits the label coupling relations more accurately and comprehensively. The experimental results on text, biology and audio datasets show that, compared with the state-of-the-art algorithm, the proposed algorithm obtains better performance on 5 common criteria.
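The Apriori step mentioned above, mining label groups that co-occur frequently beyond pairs, can be sketched as below. This is a generic Apriori frequent-itemset pass over training label sets, offered under assumptions: the function name `frequent_label_sets`, the `min_support` and `max_size` parameters, and the toy data are hypothetical, and the Bayesian label-set-size prediction from the abstract is not shown.

```python
from itertools import combinations


def frequent_label_sets(label_sets, min_support=0.5, max_size=3):
    """Return {frozenset(labels): support} for all frequent label groups."""
    rows = [frozenset(s) for s in label_sets]
    n = len(rows)
    # Level 1 candidates: every individual label.
    current = [frozenset([l]) for l in sorted({l for r in rows for l in r})]
    result = {}
    size = 1
    while current and size <= max_size:
        # Support = fraction of samples whose label set contains the candidate.
        kept = {}
        for cand in current:
            support = sum(1 for r in rows if cand <= r) / n
            if support >= min_support:
                kept[cand] = support
        result.update(kept)
        # Build candidates one label larger; prune any candidate that has
        # an infrequent subset (the Apriori property).
        size += 1
        candidates = set()
        for a, b in combinations(list(kept), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in kept for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


train_labels = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
groups = frequent_label_sets(train_labels, min_support=0.5)
```

In this toy data every pair is frequent (support 0.6) while the triple is not (support 0.4), which shows how the support threshold decides which higher-order label couplings are kept.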
Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade the classification performance. To address the above issues, we propose the multi-label weighted broad learning system (MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system (MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, and they are key factors in multi-label feature selection. Our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them at once. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
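As a rough illustration of the mutual-information quantities named above, the sketch below estimates mutual information between two discrete variables from empirical counts. It is an assumption-laden toy: the function name `mutual_information` and the example columns are hypothetical, the variables are assumed to be already discretized, and the paper's regression objective and gradient-based solver are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) * p(y)) ).
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy columns: two features and two labels over four samples.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
l1 = [0, 0, 1, 1]
l2 = [0, 0, 1, 1]

redundancy = mutual_information(f1, f2)        # feature vs feature
relevance = mutual_information(f1, l1)         # feature vs label
label_dependency = mutual_information(l1, l2)  # label vs label
```

Here `f1` determines `l1` exactly, so their mutual information is log 2, while `f1` and `f2` are independent and score zero.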
Owing to its powerful expressiveness for objects with multiple semantics and its great flexibility for complex object structures, multi-instance multi-label learning (MIML) has been widely applied. In practical MIML tasks, the naturally skewed label distribution and label interdependence give rise to the label imbalance issue and decrease model performance, a problem that is rarely studied. To solve these problems, we propose an imbalanced multi-instance multi-label learning method via tensor product-based semantic fusion (IMIML-TPSF) that deals with label interdependence and label distribution imbalance simultaneously. Specifically, to reduce the effect of label interdependence, it models the similarity between the query object and the object sets of different label classes to obtain similarity-structural features. To alleviate the disturbance caused by the imbalanced label distribution, it establishes an ensemble model to obtain imbalanced distribution features. Subsequently, IMIML-TPSF fuses the two types of features by tensor product and generates a new feature vector, which preserves the original and interactive feature information for each bag. Based on these semantically rich features, it trains a robust generalized linear classification model and further captures label interdependence. Extensive experimental results on several datasets validate the effectiveness of IMIML-TPSF against state-of-the-art methods.
Cyclohexene is an important raw material in the production of nylon. Selective hydrogenation of benzene is a key method for preparing cyclohexene. However, the Ru catalysts used in current industrial processes still face challenges, including high metal usage, high process costs, and low cyclohexene yield. This study combines existing literature data with machine learning methods to analyze the factors influencing benzene conversion, cyclohexene selectivity, and yield in the hydrogenation of benzene to cyclohexene. It constructs predictive models based on the XGBoost and Random Forest algorithms. The analysis found that reaction time, Ru content, and space velocity are key factors influencing cyclohexene yield, selectivity, and benzene conversion. Shapley Additive Explanations (SHAP) analysis and feature importance analysis further revealed the contribution of each variable to the reaction outcomes. Additionally, we randomly generated one million variable combinations using the Dirichlet distribution in an attempt to predict high-yield catalyst formulations. This paper provides new insights into the application of machine learning in heterogeneous catalysis and offers a reference for further research.
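The Dirichlet sampling step mentioned above can be illustrated with a small standard-library sketch. It assumes, as an interpretation not stated in the abstract, that each sampled vector represents fractions of a formulation that must sum to one; the function name `sample_dirichlet`, the concentration parameters, and the three-component example are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(42)
# Five candidate compositions over three hypothetical components;
# alphas of 1.0 give a uniform distribution over the simplex.
candidates = [sample_dirichlet([1.0, 1.0, 1.0], rng) for _ in range(5)]
```

Each candidate could then be passed to a trained yield model for scoring; scaling the loop to one million draws only changes the `range` argument.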
The uplift resistance of the soil overlying shield tunnels significantly impacts their anti-floating stability. However, research on uplift resistance for special-shaped shield tunnels is limited. This study combines numerical simulation with machine learning techniques to explore this issue. It presents a summary of special-shaped tunnel geometries and introduces a shape coefficient. Using the finite element software Plaxis3D, the study simulates six key parameters that affect uplift resistance under different conditions: shape coefficient, burial depth ratio, the tunnel's longest horizontal length, internal friction angle, cohesion, and soil submerged bulk density. The feature importance of each parameter was analyzed with XGBoost and ANN methods based on the numerical simulation results. The findings demonstrate that a tunnel shape more closely resembling a circle leads to reduced uplift resistance in the overlying soil, whereas the other parameters exhibit the opposite effect. Furthermore, feature importance for uplift resistance decreases in the following order: burial depth ratio, internal friction angle, the tunnel's longest horizontal length, cohesion, soil submerged bulk density, and shape coefficient.
As urbanization continues to accelerate, the challenges associated with managing transportation in metropolitan areas become increasingly complex. The surge in population density contributes to traffic congestion, impacting travel experiences and posing safety risks. Smart urban transportation management emerges as a strategic solution, conceptualized here as a multidimensional big data problem. The success of this strategy hinges on the effective collection of information from diverse, extensive, and heterogeneous data sources, necessitating the implementation of full-stack Information and Communication Technology (ICT) solutions. The main idea of the work is to investigate the current technologies of Intelligent Transportation Systems (ITS) and enhance the safety of urban transportation systems. Machine learning models, trained on historical data, can predict traffic congestion, allowing for the implementation of preventive measures. Deep learning architectures, with their ability to handle complex data representations, further refine traffic predictions, contributing to more accurate and dynamic transportation management. The background of this research underscores the challenges posed by traffic congestion in metropolitan areas and emphasizes the need for advanced technological solutions. By integrating GPS and GIS technologies with machine learning algorithms, this work focuses on the development of intelligent transportation systems that not only address current challenges but also pave the way for future advancements in urban transportation management.
BACKGROUND: The accurate prediction of lymph node metastasis (LNM) is crucial for managing locally advanced (T3/T4) colorectal cancer (CRC). However, both traditional histopathology and standard slide-level deep learning often fail to capture the sparse and diagnostically critical features of metastatic potential. AIM: To develop and validate a case-level multiple-instance learning (MIL) framework that mimics a pathologist's comprehensive review and improves T3/T4 CRC LNM prediction. METHODS: The whole-slide images of 130 patients with T3/T4 CRC were retrospectively collected. A case-level MIL framework utilising the CONCH v1.5 and UNI2-h deep learning models was trained on features from all haematoxylin and eosin-stained primary tumour slides for each patient. These pathological features were subsequently integrated with clinical data, and model performance was evaluated using the area under the curve (AUC). RESULTS: The case-level framework demonstrated superior LNM prediction over slide-level training, with the CONCH v1.5 model achieving a mean AUC (±SD) of 0.899 ± 0.033 vs 0.814 ± 0.083, respectively. Integrating pathology features with clinical data further enhanced performance, yielding a top model with a mean AUC of 0.904 ± 0.047, in sharp contrast to a clinical-only model (mean AUC 0.584 ± 0.084). Crucially, a pathologist's review confirmed that the model-identified high-attention regions correspond to known high-risk histopathological features. CONCLUSION: A case-level MIL framework provides a superior approach for predicting LNM in advanced CRC. This method shows promise for risk stratification and therapy decisions, pending further validation.
Although machine learning models have achieved sufficiently high accuracy in predicting shield position deviations, their "black box" nature makes the prediction mechanisms and decision-making processes opaque, which limits their explainability and practicability. This study introduces a novel explainable deep learning framework comprising the Informer model with enhanced attention mechanisms (EAMInfor) and deep learning important features (DeepLIFT), aimed at improving the prediction accuracy of shield position deviations and providing interpretability for the predictive results. The EAMInfor model integrates channel attention, spatial attention, and simple attention modules to improve the Informer model's performance. The framework is tested on datasets for four different geological conditions generated from Xiamen Metro Line 3, China. Results show that the EAMInfor model outperforms the traditional Informer and the comparison models. The analysis with the DeepLIFT method indicates that the thrust of the push cylinder and the earth chamber pressure are the most significant features, while the stroke length of the push cylinder shows lower importance. Furthermore, the variation trends in the significance of data points within input sequences differ substantially between single and composite strata. This framework not only improves predictive accuracy but also strengthens the credibility and reliability of the results.
Mental-health risk detection seeks early signs of distress from social media posts and clinical transcripts to enable timely intervention before crises. When such risks go undetected, consequences can escalate to self-harm, long-term disability, reduced productivity, and significant societal and economic burden. Despite recent advances, detecting risk from online text remains challenging due to heterogeneous language, evolving semantics, and the sequential emergence of new datasets. Effective solutions must encode clinically meaningful cues, reason about causal relations, and adapt to new domains without forgetting prior knowledge. To address these challenges, this paper presents a Continual Neuro-Symbolic Graph Learning (CNSGL) framework that unifies symbolic reasoning, causal inference, and continual learning within a single architecture. Each post is represented as a symbolic graph linking clinically relevant tags to textual content, enriched with causal edges derived from directional Point-wise Mutual Information (PMI). A two-layer Graph Convolutional Network (GCN) encodes these graphs, and a Transformer-based attention pooler aggregates node embeddings while providing interpretable tag-level importances. Continual adaptation across datasets is achieved through the Multi-Head Freeze (MH-Freeze) strategy, which freezes a shared encoder and incrementally trains lightweight task-specific heads (small classifiers attached to the shared embedding). Experimental evaluations across six diverse mental-health datasets, ranging from Reddit discourse to clinical interviews, demonstrate that MH-Freeze consistently outperforms existing continual-learning baselines in both discriminative accuracy and calibration reliability. Across the six datasets, MH-Freeze achieves up to 0.925 accuracy and 0.923 F1-Score, with AUPRC ≥ 0.934 and AUROC ≥ 0.942. The results confirm the framework's ability to preserve prior knowledge, adapt to domain shifts, and maintain causal interpretability, establishing CNSGL as a promising step toward robust, explainable, and lifelong mental-health risk assessment.
The solar cycle (SC), a phenomenon caused by quasi-periodic activity in the Sun, recurs approximately every 11 years. Intense solar activity can disrupt the Earth's ionosphere, affecting communication and navigation systems. Consequently, accurately predicting the intensity of the SC is of great significance. However, predicting the SC involves a long-term time series, and many existing time series forecasting methods have fallen short in terms of accuracy and efficiency. The Time-series Dense Encoder model is a deep learning solution tailored for long time series prediction. Based on a multi-layer perceptron structure, it outperforms the best previously existing models in accuracy while being efficiently trainable on general datasets. We propose a method based on this model for SC forecasting. Using a trained model, we predict the test set from SC 19 to SC 25 with an average mean absolute percentage error of 32.02, root mean square error of 30.3, mean absolute error of 23.32, and R^(2) (coefficient of determination) of 0.76, outperforming other deep learning models in terms of accuracy and training efficiency on sunspot number datasets. Subsequently, we use it to predict the peaks of SC 25 and SC 26. For SC 25, the peak has already passed, but a stronger peak of 199.3 is predicted for SC 26, within a range of 170.8-221.9, projected to occur during April 2034.
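For readers unfamiliar with the four error measures reported above, the sketch below computes them with their standard definitions. The function name `forecast_metrics` and the sample numbers are hypothetical, and it is an assumption that the reported mean absolute percentage error is expressed in percent; the abstract does not say so explicitly.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAPE (percent), RMSE, MAE and R^2 for two equal-length lists.

    Assumes no actual value is zero (MAPE divides by the actual values).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mape": mape, "rmse": rmse, "mae": mae, "r2": r2}


# Made-up values for illustration only, not sunspot data.
metrics = forecast_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Note that MAPE is scale-free while RMSE and MAE carry the units of the series, so the reported 30.3 and 23.32 are presumably in sunspot-number units.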
Nasopharyngeal carcinoma (NPC) is a malignant tumor prevalent in southern China and Southeast Asia, where its early detection is crucial for improving patient prognosis and reducing mortality rates. However, existing screening methods suffer from limitations in accuracy and accessibility, hindering their application in large-scale population screening. In this work, a surface-enhanced Raman spectroscopy (SERS)-based method was established to explore the profiles of different stratified components in saliva from NPC and healthy subjects after fractionation processing. The findings indicate that all fractionated samples exhibit disease-associated molecular signaling differences, with the small-molecule fraction (molecular weight cut-off of 10 kDa) demonstrating superior classification capability: a sensitivity of 90.5%, a specificity of 75.6%, and an area under the receiver operating characteristic (ROC) curve of 0.925 ± 0.031. The primary objective of this study was to qualitatively explore patterns in saliva composition across groups. The proposed SERS detection strategy for fractionated saliva offers novel insights for enhancing the sensitivity and reliability of noninvasive NPC screening, laying the foundation for translational application in large-scale clinical settings.
Unmanned Aerial Vehicles (UAVs) have become integral components in smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) present significant challenges for maintaining reliable, low-latency communication. Conventional geographic routing protocols often struggle where link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning. The approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware. The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption, delay, and link quality. Additionally, a Kalman filter is integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted using realistic scenarios, varying the number of UAVs to assess scalability. Key performance metrics were analyzed, including the packet delivery ratio, end-to-end delay, and total energy consumption. The results demonstrate that the proposed approach improves the packet delivery ratio by 12%–15% and reduces delay by up to 25.5% compared with the conventional GEO and QGEO protocols. However, this improvement comes at the cost of higher energy consumption due to additional computations and control overhead. Despite this trade-off, the proposed solution ensures reliable and efficient communication, making it well suited for large-scale UAV networks operating in complex urban environments.
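The combination of Q-learning with a reward built from link quality, delay, and energy can be sketched as follows. This is a generic tabular Q-learning update under stated assumptions, not the paper's protocol: the reward weights, the linear reward form, the normalization of inputs to [0, 1], and the names `reward` and `q_update` are all hypothetical, and the Kalman-filter mobility prediction is omitted.

```python
def reward(link_quality, delay, energy, w_link=0.5, w_delay=0.3, w_energy=0.2):
    """Hypothetical weighted reward: favor good links, penalize delay and energy.

    All three inputs are assumed to be normalized to [0, 1].
    """
    return w_link * link_quality - w_delay * delay - w_energy * energy


def q_update(q, state, action, r, next_state, next_actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; state = current node, action = chosen relay."""
    best_next = max(
        (q.get((next_state, a), 0.0) for a in next_actions), default=0.0
    )
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]


q_table = {}
r = reward(link_quality=0.8, delay=0.2, energy=0.1)
# UAV "A" forwards a packet to relay "B", which has no known onward hops yet.
first = q_update(q_table, "A", "B", r, next_state="B", next_actions=[])
# After "B" has learned a value for forwarding to "C", "A" benefits from it.
q_table[("B", "C")] = 1.0
second = q_update(q_table, "A", "B", r, next_state="B", next_actions=["C"])
```

The second update is larger than the first because the discounted value of the relay's best onward hop is added to the immediate reward, which is how route quality propagates back toward the source.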
Funding (SVMA missing-label study): Supported by the National High Technology Research and Development Program of China (No. 2012AA120802), the National Natural Science Foundation of China (No. 61771186), the Postdoctoral Research Project of Heilongjiang Province (No. LBH-Q15121), and the Undergraduate University Project of Young Scientist Creative Talent of Heilongjiang Province (No. UNPYSCT-2017125).
Funding (SENCE label-specific features study): This work was supported by the National Science Foundation of China (62176055) and the China University S&T Innovation Plan Guided by the Ministry of Education.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60605012); the Natural Science Foundation of Shanghai (Grant No. 08ZR1408200); the Open Project Program of the National Laboratory of Pattern Recognition of China (Grant No. 08-2-16); the Shanghai Leading Academic Discipline Project (Grant No. J50103).
Abstract: In this paper, we utilize the framework of multi-label learning for face demographic classification. We also attempt to explore suitable classifiers and features for face demographic classification. Three of the most popular demographic attributes (gender, ethnicity, and age) are considered in the experiments. Based on the results of demographic classification, we use statistical analysis to explore the correlation among the various kinds of face demographic information. Through this analysis, we draw several conclusions on the correlation and interaction among these high-level face semantics, and the obtained results can be helpful in automatic face semantic annotation and other face analysis tasks.
Abstract: This paper studies an urban security risk assessment model based on multi-label learning. Through a series of transformations, the model is reduced to the solution of a system of linear equations, which is in turn transformed into an optimization problem. Finally, several classical optimization algorithms are used to solve this optimization problem, the convergence of the algorithm is proved, and the advantages and disadvantages of the optimization methods are compared.
Funding: Supported by the NSFC (Grant Nos. 61772281, 61703212, 61602254); the Jiangsu Province Natural Science Foundation (grant number BK2160968); the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Abstract: Multi-label text categorization refers to the problem of categorizing text through a multi-label learning algorithm. Text classification for Asian languages such as Chinese differs from that for languages such as English, which use spaces to separate words. Before classifying text, it is necessary to perform word segmentation to convert continuous text into a list of separate words, and then to convert that list into a vector of a fixed dimension. Generally, multi-label learning algorithms can be divided into two categories: problem transformation methods and adapted algorithms. This work uses customers' comments about hotels as a training data set, which contains labels for all aspects of the hotel evaluation, aiming to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiment involves three basic problem transformation methods (Support Vector Machine, Random Forest, and k-Nearest-Neighbor) and one adapted algorithm, a Convolutional Neural Network. The experimental results show that the Support Vector Machine has better performance.
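The abstract above does not say which problem transformation is used. As a minimal sketch, assuming the simplest one (binary relevance, which trains one independent binary classifier per label), the data preparation step might look like the following. The function name, the example reviews, and the aspect labels are all hypothetical.

```python
def binary_relevance_datasets(samples, label_sets):
    """Split one multi-label data set into one binary data set per label.

    Each returned list pairs every sample with 1 if the label applies, else 0.
    """
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [(x, int(label in labels)) for x, labels in zip(samples, label_sets)]
        for label in all_labels
    }


# Hypothetical hotel reviews with hypothetical aspect labels.
reviews = ["clean room, rude staff", "great location", "friendly staff, central"]
aspects = [{"room", "service"}, {"location"}, {"service", "location"}]
per_label = binary_relevance_datasets(reviews, aspects)
```

Each entry of `per_label` can then be handed to any binary classifier (an SVM, a random forest, or k-nearest-neighbor), which is why the base learner and the transformation are separate choices.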
Funding: Supported by Australian Research Council Discovery (DP130102691); the National Science Foundation of China (61302157); the China National 863 Project (2012AA12A308); the China Pre-research Project of Nuclear Industry (FZ1402-08).
Abstract: Exploiting the label coupling relationship is a key challenge in multi-label classification (MLC) problems. Most previous work has focused on pairwise label relations, in which generally only global statistical information is used to analyze the coupled label relationship. In this work, Bayesian and hypothesis testing methods are first applied to predict the label set size of testing samples from their k nearest neighbor samples, which combines global and local statistical information. The Apriori algorithm is then used to mine the coupling relationship among multiple labels rather than pairwise labels, which exploits the label coupling relations more accurately and comprehensively. The experimental results on text, biology, and audio datasets show that, compared with the state-of-the-art algorithm, the proposed algorithm obtains better performance on 5 common criteria.
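To illustrate how Apriori can surface label groups larger than pairs, here is a minimal sketch that treats each sample's label set as a transaction and returns every label combination whose support meets a threshold. It is a generic textbook Apriori, not the authors' implementation; the function name, the toy labels, and the support threshold are assumptions.

```python
from itertools import combinations


def frequent_label_sets(label_sets, min_support):
    """Return {label combination: support} for all frequent combinations."""
    transactions = [frozenset(ls) for ls in label_sets]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({label for t in transactions for label in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        ]
    return frequent


# Toy label sets; "a", "b", "c" are placeholder label names.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = frequent_label_sets(toy, min_support=0.5)
```

With this toy input the pair `{b, c}` falls below the threshold, so the triple `{a, b, c}` is pruned without its support ever being counted, which is the property that keeps Apriori tractable as the combination size grows.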
Funding: Supported in part by the National Key R&D Program of China (2023YFA1011601); the Major Key Project of PCL, China (PCL2023AS7-1); in part by the National Natural Science Foundation of China (U21A20478, 62106224, 92267203); in part by the Science and Technology Major Project of Guangzhou (202007030006); in part by the Major Key Project of PCL (PCL2021A09); in part by the Guangzhou Science and Technology Plan Project (2024A04J3749).
Abstract: Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade classification performance. To address these issues, we propose the multi-label weighted broad learning system (MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system (MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2020-NR049579).
Abstract: High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, complexity grows further with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. They are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them at once. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62376281 and 62036013); the NSF for Huxiang Young Talents Program of Hunan Province (2021RC3070).
Abstract: Owing to its powerful expressiveness for objects with multiple semantics and its great flexibility for complex object structures, multi-instance multi-label learning (MIML) has been widely applied. In practical MIML tasks, the naturally skewed label distribution and label interdependence give rise to the label imbalance issue and decrease model performance, yet this problem is rarely studied. To solve these problems, we propose an imbalanced multi-instance multi-label learning method via tensor product-based semantic fusion (IMIML-TPSF) to deal with label interdependence and label distribution imbalance simultaneously. Specifically, to reduce the effect of label interdependence, it models the similarity between the query object and the object sets of different label classes to obtain similarity-structural features. To alleviate the disturbance caused by the imbalanced label distribution, it establishes an ensemble model to obtain imbalanced distribution features. Subsequently, IMIML-TPSF fuses the two types of features by tensor product and generates a new feature vector, which preserves the original and interactive feature information for each bag. Based on these semantically rich features, it trains a robust generalized linear classification model and further captures label interdependence. Extensive experimental results on several datasets validate the effectiveness of IMIML-TPSF against state-of-the-art methods.
Funding: Supported by the CAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300, XDB1190302); the Youth Innovation Promotion Association CAS (Y2021056); the Joint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (YLU-DNL Fund 2022007); the special fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007).
Abstract: Cyclohexene is an important raw material in the production of nylon. Selective hydrogenation of benzene is a key method for preparing cyclohexene. However, the Ru catalysts used in current industrial processes still face challenges, including high metal usage, high process costs, and low cyclohexene yield. This study combines existing literature data with machine learning methods to analyze the factors influencing benzene conversion, cyclohexene selectivity, and yield in the hydrogenation of benzene to cyclohexene. It constructs predictive models based on the XGBoost and Random Forest algorithms. The analysis found that reaction time, Ru content, and space velocity are key factors influencing cyclohexene yield, selectivity, and benzene conversion. Shapley Additive Explanations (SHAP) analysis and feature importance analysis further revealed the contribution of each variable to the reaction outcomes. Additionally, we randomly generated one million variable combinations using the Dirichlet distribution in an attempt to predict high-yield catalyst formulations. This paper provides new insights into the application of machine learning in heterogeneous catalysis and offers a reference for further research.
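The Dirichlet distribution is a natural choice for generating candidate compositions because each draw is a vector of non-negative fractions that sums to one. The sketch below shows one standard way to sample it using only the Python standard library (normalized gamma draws). The number of components, the concentration parameters, and the sample count are assumptions, since the abstract does not state them.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one composition from Dirichlet(alphas) by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Three hypothetical components with a flat prior; 1000 draws instead of one million.
candidates = [sample_dirichlet([1.0, 1.0, 1.0], rng) for _ in range(1000)]
```

Each candidate could then be scored by a trained regression model, keeping the highest predicted yields for closer inspection.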
Funding: Guangzhou Metro Scientific Research Project (No. JT204-100111-23001); Chongqing Municipal Special Project for Technological Innovation and Application Development (No. CSTB2022TIAD-KPX0101); Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (No. N2023G045).
Abstract: The uplift resistance of the soil overlying shield tunnels significantly impacts their anti-floating stability. However, research on uplift resistance for special-shaped shield tunnels is limited. This study combines numerical simulation with machine learning techniques to explore this issue. It presents a summary of special-shaped tunnel geometries and introduces a shape coefficient. Using the finite element software Plaxis3D, the study simulates six key parameters (shape coefficient, burial depth ratio, the tunnel's longest horizontal length, internal friction angle, cohesion, and soil submerged bulk density) that affect uplift resistance under different conditions. XGBoost and ANN methods were then used to analyze the feature importance of each parameter based on the numerical simulation results. The findings demonstrate that a tunnel shape more closely resembling a circle leads to reduced uplift resistance in the overlying soil, whereas the other parameters have the opposite effect. Furthermore, feature importance for uplift resistance decreases in the following order: burial depth ratio, internal friction angle, the tunnel's longest horizontal length, cohesion, soil submerged bulk density, and shape coefficient.
Abstract: As urbanization continues to accelerate, the challenges associated with managing transportation in metropolitan areas become increasingly complex. The surge in population density contributes to traffic congestion, impacting travel experiences and posing safety risks. Smart urban transportation management emerges as a strategic solution, conceptualized here as a multidimensional big data problem. The success of this strategy hinges on the effective collection of information from diverse, extensive, and heterogeneous data sources, necessitating the implementation of full-stack Information and Communication Technology (ICT) solutions. The main idea of the work is to investigate the current technologies of Intelligent Transportation Systems (ITS) and enhance the safety of urban transportation systems. Machine learning models, trained on historical data, can predict traffic congestion, allowing for the implementation of preventive measures. Deep learning architectures, with their ability to handle complex data representations, further refine traffic predictions, contributing to more accurate and dynamic transportation management. The background of this research underscores the challenges posed by traffic congestion in metropolitan areas and emphasizes the need for advanced technological solutions. By integrating GPS and GIS technologies with machine learning algorithms, this work focuses on the development of intelligent transportation systems that not only address current challenges but also pave the way for future advancements in urban transportation management.
Funding: Supported by the Chongqing Medical Scientific Research Project (Joint Project of Chongqing Health Commission and Science and Technology Bureau), No. 2023MSXM060.
Abstract: BACKGROUND: The accurate prediction of lymph node metastasis (LNM) is crucial for managing locally advanced (T3/T4) colorectal cancer (CRC). However, both traditional histopathology and standard slide-level deep learning often fail to capture the sparse and diagnostically critical features of metastatic potential. AIM: To develop and validate a case-level multiple-instance learning (MIL) framework mimicking a pathologist's comprehensive review and improve T3/T4 CRC LNM prediction. METHODS: The whole-slide images of 130 patients with T3/T4 CRC were retrospectively collected. A case-level MIL framework utilising the CONCH v1.5 and UNI2-h deep learning models was trained on features from all haematoxylin and eosin-stained primary tumour slides for each patient. These pathological features were subsequently integrated with clinical data, and model performance was evaluated using the area under the curve (AUC). RESULTS: The case-level framework demonstrated superior LNM prediction over slide-level training, with the CONCH v1.5 model achieving a mean AUC (±SD) of 0.899 ± 0.033 vs 0.814 ± 0.083, respectively. Integrating pathology features with clinical data further enhanced performance, yielding a top model with a mean AUC of 0.904 ± 0.047, in sharp contrast to a clinical-only model (mean AUC 0.584 ± 0.084). Crucially, a pathologist's review confirmed that the model-identified high-attention regions correspond to known high-risk histopathological features. CONCLUSION: A case-level MIL framework provides a superior approach for predicting LNM in advanced CRC. This method shows promise for risk stratification and therapy decisions, but requires further validation.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52378392, 52408356); the Foal Eagle Program Youth Top-notch Talent Project of Fujian Province, China (Grant No. 00387088).
Abstract: Although machine learning models have achieved sufficiently high accuracy in predicting shield position deviations, their “black box” nature makes the prediction mechanisms and decision-making processes opaque, which limits their explainability and practicality. This study introduces a novel explainable deep learning framework comprising the Informer model with enhanced attention mechanisms (EAMInfor) and deep learning important features (DeepLIFT), aimed at improving the prediction accuracy of shield position deviations and providing interpretability for the predictions. The EAMInfor model integrates channel attention, spatial attention, and simple attention modules to improve the Informer model's performance. The framework is tested on datasets from four different geological conditions, collected from Xiamen Metro Line 3, China. Results show that the EAMInfor model outperforms the traditional Informer and the comparison models. The analysis with the DeepLIFT method indicates that the push thrust of the push cylinder and the earth chamber pressure are the most significant features, while the stroke length of the push cylinder shows lower importance. Furthermore, the variation trends in the significance of data points within input sequences differ substantially between single and composite strata. This framework not only improves predictive accuracy but also strengthens the credibility and reliability of the results.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00518960); in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00563192).
Abstract: Mental-health risk detection seeks early signs of distress from social media posts and clinical transcripts to enable timely intervention before crises. When such risks go undetected, consequences can escalate to self-harm, long-term disability, reduced productivity, and significant societal and economic burden. Despite recent advances, detecting risk from online text remains challenging due to heterogeneous language, evolving semantics, and the sequential emergence of new datasets. Effective solutions must encode clinically meaningful cues, reason about causal relations, and adapt to new domains without forgetting prior knowledge. To address these challenges, this paper presents a Continual Neuro-Symbolic Graph Learning (CNSGL) framework that unifies symbolic reasoning, causal inference, and continual learning within a single architecture. Each post is represented as a symbolic graph linking clinically relevant tags to textual content, enriched with causal edges derived from directional Point-wise Mutual Information (PMI). A two-layer Graph Convolutional Network (GCN) encodes these graphs, and a Transformer-based attention pooler aggregates node embeddings while providing interpretable tag-level importances. Continual adaptation across datasets is achieved through the Multi-Head Freeze (MH-Freeze) strategy, which freezes a shared encoder and incrementally trains lightweight task-specific heads (small classifiers attached to the shared embedding). Experimental evaluations across six diverse mental-health datasets, ranging from Reddit discourse to clinical interviews, demonstrate that MH-Freeze consistently outperforms existing continual-learning baselines in both discriminative accuracy and calibration reliability. Across the six datasets, MH-Freeze achieves up to 0.925 accuracy and 0.923 F1-Score, with AUPRC ≥ 0.934 and AUROC ≥ 0.942. The results confirm the framework's ability to preserve prior knowledge, adapt to domain shifts, and maintain causal interpretability, establishing CNSGL as a promising step toward robust, explainable, and lifelong mental-health risk assessment.
Funding: Supported by the Academic Research Projects of Beijing Union University (ZK20202204); the National Natural Science Foundation of China (12250005, 12073040, 12273059, 11973056, 12003051, 11573037, 12073041, 11427901, 11572005, 11611530679, and 12473052); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0560000, XDA15052200, XDB09040200, XDA15010700, XDB0560301, and XDA15320102); the Chinese Meridian Project (CMP).
Abstract: The solar cycle (SC), a phenomenon caused by the quasi-periodic regular activities in the Sun, occurs approximately every 11 years. Intense solar activity can disrupt the Earth's ionosphere, affecting communication and navigation systems. Consequently, accurately predicting the intensity of the SC is of great significance. However, predicting the SC involves a long-term time series, and many existing time series forecasting methods have fallen short in terms of accuracy and efficiency. The Time-series Dense Encoder model is a deep learning solution tailored for long time series prediction. Based on a multi-layer perceptron structure, it outperforms the best previously existing models in accuracy, while being efficiently trainable on general datasets. We propose a method based on this model for SC forecasting. Using a trained model, we predict the test set from SC 19 to SC 25 with an average mean absolute percentage error of 32.02, root mean square error of 30.3, mean absolute error of 23.32, and R^2 (coefficient of determination) of 0.76, outperforming other deep learning models in terms of accuracy and training efficiency on sunspot number datasets. Subsequently, we use it to predict the peaks of SC 25 and SC 26. For SC 25, the peak time has ended, but a stronger peak of 199.3, within a range of 170.8-221.9, is predicted for SC 26 and projected to occur during April 2034.
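For readers comparing the four reported scores, the sketch below computes them from their standard definitions using only the Python standard library. The function name and the toy numbers are assumptions chosen for illustration and are unrelated to the sunspot data in the abstract.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAPE (in percent), RMSE, MAE, and R^2 for paired sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; the toy data avoids that case.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mape": mape, "rmse": rmse, "mae": mae, "r2": r2}


scores = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

MAPE is scale-free but grows sharply when true values are near zero, which is one reason it is usually reported alongside RMSE and MAE rather than on its own.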
Funding: Financially supported by the National Natural Science Foundation of China (No. 12374405); the Provincial Science Foundation for Distinguished Young Scholars of Fujian (No. 2024J010024); the Natural Science Foundation of Fujian Province of China (No. 2023J011267); Major Research Projects for Young and Middle-aged Researchers of Fujian Provincial Health Commission (No. 2021ZQNZD010).
Abstract: Nasopharyngeal carcinoma (NPC) is a malignant tumor prevalent in southern China and Southeast Asia, where its early detection is crucial for improving patient prognosis and reducing mortality rates. However, existing screening methods suffer from limitations in accuracy and accessibility, hindering their application in large-scale population screening. In this work, a surface-enhanced Raman spectroscopy (SERS)-based method was established to explore the profiles of different stratified components in saliva from NPC patients and healthy subjects after fractionation processing. The findings indicate that all fractionated samples exhibit disease-associated molecular signaling differences, with the small-molecule fraction (molecular weight cut-off value of 10 kDa) demonstrating superior classification capability: a sensitivity of 90.5%, a specificity of 75.6%, and an area under the receiver operating characteristic (ROC) curve of 0.925 ± 0.031. The primary objective of this study was to qualitatively explore patterns in saliva composition across groups. The proposed SERS detection strategy for fractionated saliva offers novel insights for enhancing the sensitivity and reliability of noninvasive NPC screening, laying the foundation for translational application in large-scale clinical settings.
Funding: Funded by Hung Yen University of Technology and Education under grant number UTEHY.L.2025.62.
Abstract: Unmanned Aerial Vehicles (UAVs) have become integral components in smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) present significant challenges for maintaining reliable, low-latency communication. Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning. The approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware. It optimizes the selection of relay nodes using an adaptive reward function that takes into account energy consumption, delay, and link quality. Additionally, a Kalman filter is integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted using realistic scenarios, varying the number of UAVs to assess scalability. Key performance metrics were analyzed, including the packet delivery ratio, end-to-end delay, and total energy consumption. The results demonstrate that the proposed approach improves the packet delivery ratio by 12%–15% and reduces delay by up to 25.5% compared with the conventional GEO and QGEO protocols. However, this improvement comes at the cost of higher energy consumption due to additional computations and control overhead. Despite this trade-off, the proposed solution ensures reliable and efficient communication, making it well-suited for large-scale UAV networks operating in complex urban environments.
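The abstract names the ingredients of the reward (energy consumption, delay, and link quality) but not how they are combined. As a minimal sketch, assuming a simple weighted sum with inputs normalized to [0, 1], a relay-selection reward and the standard tabular Q-learning update could be written as follows. The weights, learning rate, discount factor, and all function names are hypothetical.

```python
def relay_reward(link_quality, delay, energy, w_link=0.5, w_delay=0.3, w_energy=0.2):
    """Hypothetical reward: favor good links, penalize delay and energy use."""
    return w_link * link_quality - w_delay * delay - w_energy * energy


def q_update(q_value, reward, next_q_values, alpha=0.5, gamma=0.9):
    """Standard Q-learning update for one (node, relay) pair.

    next_q_values holds the Q-values available from the chosen relay;
    an empty list means the relay is the destination.
    """
    best_next = max(next_q_values) if next_q_values else 0.0
    return q_value + alpha * (reward + gamma * best_next - q_value)


reward = relay_reward(link_quality=0.8, delay=0.5, energy=0.5)
new_q = q_update(q_value=0.0, reward=reward, next_q_values=[1.0, 0.4])
```

In a routing setting the state is typically the node currently holding the packet and the action is the choice of next-hop relay, so each forwarded packet yields one such update.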