With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. Given the large differences from English, Chinese text tasks involve more complex semantic information representations. However, most existing Chinese text classification approaches treat feature representation and feature selection as the key points, but fail to take into account a learning strategy adapted to the task. Besides, these approaches compress each Chinese word into a representation vector without considering the distribution of the term among the categories of interest. To improve Chinese text classification, a unified method called Supervised Contrastive Learning with Term Weighting (SCL-TW) is proposed in this paper. Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, term-weighting scores are computed to optimize the data augmentation of Chinese text. The transformed features are then fed into a temporal convolutional network for feature representation. Experimental verifications are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by a substantial margin.
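The abstract above centres on a supervised contrastive objective. The paper's exact formulation is not given here, but a minimal NumPy sketch of the standard supervised contrastive (SupCon-style) loss, which treats same-class samples in a batch as positives, could look as follows. The function name and temperature default are illustrative, not taken from the paper.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of feature vectors.

    For each anchor, same-label samples are positives; the loss is the
    mean negative log-probability of positives under a softmax over all
    other samples' similarities."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)                # exclude the anchor itself
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stabilisation
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    # average log-probability of positives per anchor (anchors with no
    # positive contribute zero), then mean over anchors
    per_anchor = (log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor.mean()
```

When same-class features are tightly aligned and classes are well separated, the loss approaches zero; scattered features yield a larger loss, which is the gradient signal that pulls class clusters together.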
Contrastive self-supervised representation learning on attributed graph networks with graph neural networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems contain multiple relations, where entities are linked by different types of relations and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals, which are not fully exploited. A novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM²S) is presented in this study. It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, a contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), which uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals, is introduced first. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information comprises the outputs of different graph convolutional layers. Second, following the consensus assumption among inter-relations, the CoLM²S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and achieve a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
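The abstract describes a GCN encoder whose per-layer outputs serve as feature-level, multi-scale self-supervised signals. A minimal NumPy sketch of that idea (symmetric normalisation, illustrative function names and weights; not the authors' implementation) might be:

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph-convolution step: H' = tanh(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.tanh(a_norm @ h @ w)

def multiscale_encode(adj, x, weights):
    """Run stacked GCN layers and keep every layer's output; each layer
    aggregates a wider neighbourhood, giving one view per scale."""
    views, h = [], x
    for w in weights:
        h = gcn_layer(adj, h, w)
        views.append(h)
    return views
```

In a contrastive setup, embeddings of the same node across these views (and across relation-specific graphs) would be pulled together as positive pairs.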
Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information by relying on k-mer frequency features to encode lncRNA sequences. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of the graph. Then, SGCL-LncLoc applies graph convolutional networks to learn a comprehensive graph representation. Additionally, we propose a computational method that maps the attention weights of the graph nodes to the weights of nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples and label information, guiding the model to enhance representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. We also conduct a motif analysis, revealing that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc, and the source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
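The abstract's first step is turning an RNA sequence into a de Bruijn graph. One common construction (each k-mer becomes a node, consecutive k-mers are joined by a directed edge) can be sketched as below; the value of k and the exact edge convention used by SGCL-LncLoc may differ.

```python
def de_bruijn_graph(seq, k=3):
    """Build a de Bruijn-style graph over k-mers: every distinct k-mer is
    a node, and each k-mer is linked to the k-mer that follows it in the
    sequence (the two overlap in k-1 characters)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = sorted(set(kmers))
    edges = sorted({(kmers[i], kmers[i + 1]) for i in range(len(kmers) - 1)})
    return nodes, edges
```

Because repeated k-mers collapse into a single node, the graph is far smaller than the raw sequence, while the edges preserve the local order information that plain k-mer frequency features discard.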
Cells are the fundamental units of life and exhibit significant diversity in structure, behavior, and function, known as cell heterogeneity. The advent and development of single-cell RNA sequencing (scRNA-seq) technology have provided a crucial data foundation for studying cellular heterogeneity. Currently, most computational methods based on scRNA-seq involve a sequential process of clustering followed by annotation. However, such clustering-based methods are sensitive to the selection of genes and clustering parameters, resulting in inaccuracies in cell annotation. To address this issue, we develop a flexible data-driven cell correction framework based on partially annotated scRNA-seq data. The framework employs a neighborhood purity strategy and global selection strategies to select anchor cells; it then optimizes a prediction neural network using a classification loss with a contrastive regularization term to correct the labels of the remaining cells. The validity of this correction framework is demonstrated through various assessments on real scRNA-seq datasets. Based on the corrected labels of the scRNA-seq data, we further assess the latest unsupervised clustering methods, thereby establishing a more objective benchmark for comparing their performance.
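The neighborhood purity idea above can be sketched with a simple rule: keep as anchors only the cells whose nearest neighbors in feature space mostly share the cell's own annotation. The function name, k, and purity threshold below are illustrative; the paper's actual purity and global selection strategies are more involved.

```python
import numpy as np

def select_anchors(x, labels, k=4, purity=0.7):
    """Keep cells whose k nearest neighbours mostly share the cell's own
    label; low-purity cells are left for the correction network to relabel."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a cell is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest cells
    frac_same = (labels[nn] == labels[:, None]).mean(axis=1)
    return np.where(frac_same >= purity)[0]
```

A cell annotated differently from its whole neighborhood gets purity near zero and is excluded from the anchor set, which is exactly the kind of suspect label the correction network is trained to fix.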
Funding (SCL-TW): supported by the National Natural Science Foundation of China (No. U1936122) and the Primary Research & Development Plan of Hubei Province (Nos. 2020BAB101 and 2020BAA003).
Funding (CoLM²S): supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274.
Funding (SGCL-LncLoc): supported by the National Natural Science Foundation of China (No. 62102457), the Hunan Provincial Natural Science Foundation of China (No. 2023JJ40763), the Hunan Provincial Science and Technology Program (No. 2021RC4008), and the Fundamental Research Funds for the Central Universities of Central South University (No. CX20230271).
Funding (scRNA-seq cell correction framework): supported by the National Natural Science Foundation of China (Nos. 62202503 and 62225209) and the Hunan Provincial Natural Science Foundation of China (No. 2023JJ40780). This work was carried out in part using computing resources at the High Performance Computing Center, Central South University, China.