Funding: Funded by the Land Resources Evolution Mechanism and Sustainable Use in Global Black Soil Critical Zone Program (IGCP665), the Geochemical Survey of Land Quality in Northeast China Black Soil Area at 1:250000 Scale Program (Grant No. DD20160316), and the Program for JLU Science and Technology Innovative Research Team (Grant Nos. JLUSTIRT, 2017TD-26).
Abstract: 1 Introduction. Black soils are highly fertile soils with favorable properties that are well suited to plant growth (Liu et al., 2015). Black soil resources are widely distributed across North America, Eurasia, and South America and cover about 916 million ha worldwide, 35 million ha of which lie in northeast China (Liu et al., 2012).
Funding: Supported by the National Natural Science Foundation of China (No. 81373702).
Abstract: To digitalize traditional Chinese medicine (TCM), research is being conducted on the objectivization of diagnosis and treatment, on mathematical models of TCM theories, and on applying modern information technology to digitize the vast amount of existing information. However, the author believes that TCM practitioners should first conduct a systematic, comprehensive, and refined analysis of TCM knowledge and unify the data elements used in computer intelligence to avoid ambiguity. Thus, we must overcome epistemological constraints and carefully analyze the relationships among data elements to achieve systematic results and administer TCM appropriately.
Abstract: Traditional Chinese medicine (TCM) is one of the safe and effective methods for treating liver cirrhosis. TCM practitioners assess hepatic function in terms of syndromes, but the process of syndrome differentiation is subjective. At present, most research focuses on the relationship between syndromes and objective Western medicine indicators such as the Child-Pugh grade. In fact, a syndrome is a synthesis of signs and symptoms, and collecting signs and symptoms is easier than differentiating syndromes. We use data mining to explore the relationship between objective Western medicine standards (Child-Pugh grade, decompensated or compensated stage, and active or inactive period) and the signs and symptoms of TCM. We use the information gain method to assess the attributes and five typical classifiers (logistic, BayesNet, NaiveBayes, RBF, and C4.5) to obtain the classification accuracy. After attribute selection, we obtain the main TCM symptoms and signs related to the stage, period, and Child-Pugh grade of liver cirrhosis. The experimental results show that classification accuracy improves after some symptoms and signs are filtered out.
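The information gain step mentioned in this abstract can be illustrated with a minimal sketch. It ranks binary symptom attributes by how much they reduce the entropy of a class label. The symptom names, the records, and the two-level grade are hypothetical toy data, not taken from the study, and the code is not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy records: 1 = symptom present, 0 = symptom absent.
symptoms = {
    "fatigue": [1, 1, 1, 0, 0, 0],
    "dry_mouth": [1, 0, 1, 0, 1, 0],
}
grade = ["B", "B", "B", "A", "A", "A"]  # hypothetical class label per patient

gains = {name: information_gain(values, grade) for name, values in symptoms.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

Attributes with low gain would be the candidates for filtering before the classifiers are trained.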
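Abstract: Traditional data augmentation techniques such as synonym replacement, random insertion, and random deletion can change the original semantics of a text and may even lose key information. Moreover, data in text classification tasks usually comprise a text part and a label part, yet traditional augmentation methods address only the text. To solve these problems, a data augmentation technique combined with label confusion (LCDA) is proposed, which strengthens the data comprehensively from two basic aspects: text and labels. On the text side, augmentations such as random insertion and replacement of punctuation marks and completion of sentence-final punctuation increase textual diversity while preserving all of the text's information and word order. On the label side, a label confusion method generates a simulated label distribution that replaces the traditional one-hot label distribution, to better reflect the relationships between instances and labels and among labels. Experiments on few-shot datasets built from two Chinese news datasets, THUCNews (TsingHua University Chinese News) and Toutiao, combined with the TextCNN, TextRNN, BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa-CNN (Robustly optimized BERT approach Convolutional Neural Network) text classification models, show that performance improves significantly over the unaugmented baseline in every case. On the 50-THU dataset constructed from THUCNews, the accuracy of the four models with LCDA is 1.19, 6.87, 3.21, and 2.89 percentage points higher than before augmentation, and 0.78, 7.62, 1.75, and 1.28 percentage points higher than that of the models augmented with softEDA (Easy Data Augmentation with soft labels). The results of processing along both the text and label dimensions show that LCDA significantly improves model accuracy and performs particularly well in application scenarios with little data.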
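The two ideas in this abstract, punctuation-based text augmentation and soft label distributions, can be sketched as follows. This is an illustration under stated assumptions rather than the LCDA implementation: the function names, the punctuation set, the insertion ratio, the similarity scores, and the mixing weight `alpha` are all hypothetical, and the soft-label formula (a softmax over a weighted one-hot vector plus a similarity-based distribution) is a simplified stand-in for whatever the paper actually uses.

```python
import math
import random

PUNCTUATION = [",", ".", ";", ":", "!", "?"]  # assumed punctuation set


def insert_punctuation(tokens, ratio=0.3, rng=None):
    """Insert random punctuation marks between tokens; every original token keeps its order."""
    rng = rng or random.Random()
    n_insert = max(1, int(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens) + 1), n_insert))
    out = []
    for i, tok in enumerate(tokens):
        if i in positions:
            out.append(rng.choice(PUNCTUATION))
        out.append(tok)
    if len(tokens) in positions:  # an insertion slot after the last token
        out.append(rng.choice(PUNCTUATION))
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_label_distribution(true_index, similarity, alpha=4.0):
    """Blend a one-hot target with a similarity-based distribution (assumed formulation)."""
    confusion = softmax(similarity)
    mixed = [alpha * (1.0 if i == true_index else 0.0) + c
             for i, c in enumerate(confusion)]
    return softmax(mixed)


tokens = ["black", "soil", "is", "fertile"]
augmented = insert_punctuation(tokens, rng=random.Random(0))
# Hypothetical instance-to-label similarity scores for three classes.
soft = simulated_label_distribution(0, [2.0, 1.5, -1.0])
print(augmented, soft)
```

The soft target still peaks at the true class but assigns more mass to the similar class than to the dissimilar one, which is the property a one-hot label cannot express.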
Funding: Supported by the National Natural Science Foundation of China (No. 81202858), the National Key Technology Support Program (No. 2012BAI25B02), the Self-selected Subject of China Academy of Chinese Medical Sciences (No. ZZ05003, No. ZZ03090, No. Z0217), and the Beijing Key Laboratory of Advanced Information Science and Network Technology (No. XDXX1306).
Abstract: Treatment determination based on syndrome differentiation is the key to Chinese medicine. Improving clinical therapeutic effectiveness requires a feasible way to differentiate syndromes correctly from clinical manifestations. In this paper, a novel data mining method based on manifold ranking (MR) is proposed to explore the relation between syndromes and symptoms for viral hepatitis. Because MR can use symptom data both with and without expert differentiation in the syndrome classification task, the clinical information available for modeling syndrome features is greatly enlarged, which improves the precision of syndrome classification. In addition, the proposed method avoids two disadvantages of previous methods: the assumption of a linear relation in the clinical data and the assumption of mutually exclusive symptoms among different syndromes. It also helps exploit the latent relation between syndromes and symptoms more effectively. According to the experimental results and the clinical experts, the method achieves better syndrome classification performance.
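How manifold ranking lets unlabeled cases contribute can be seen in a small sketch of the standard iterative formulation: build a Gaussian affinity graph over all cases, normalize it symmetrically, and spread scores from the expert-labeled cases until they settle. The symptom vectors, `alpha`, `sigma`, and the iteration count are hypothetical choices for illustration, and this is not the authors' implementation.

```python
import math


def manifold_ranking(points, query_indices, alpha=0.9, sigma=1.0, iterations=200):
    """Score every point by its relevance to the query points along the data manifold."""
    n = len(points)
    # Gaussian affinity matrix with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # y marks the expert-labeled (query) cases; f is updated as alpha*S*f + (1-alpha)*y.
    y = [1.0 if i in query_indices else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical binary symptom vectors: the first three cases share one symptom
# pattern, the last three share another. Only case 0 has an expert label.
cases = [
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 0, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1, 0),
    (0, 0, 0, 0, 1, 1),
]
scores = manifold_ranking(cases, query_indices={0})
print([round(v, 4) for v in scores])
```

Cases 1 and 2 carry no label, yet they receive higher scores than cases 3 to 5 because they sit close to the labeled case in the graph.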