This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation between semantic sequences and computing the cover of the set of similar semantic sequences, the algorithm selects, at each step, the candidate cluster with the minimum entropy overlap value as a result cluster. Experimental comparison shows that the algorithm achieves higher precision than other algorithms under the same conditions, and the advantage is especially clear on sets of long documents.
Existing studies have challenged the current definition of named bacterial species, especially in the case of highly recombinogenic bacteria. This has led to considering the use of computational procedures to examine potential bacterial clusters that are not identified by species naming. This paper describes the use of sequence data obtained from MLST databases as input for a k-means algorithm extended to use housekeeping gene sequences as the similarity metric for the clustering process. An implementation of the k-means algorithm has been developed from an existing source code implementation and evaluated against MLST data. The results point to potential bacterial clusters that are close to more than one named species and thus may become candidates for alternative classifications that account for genotypic information. The use of hierarchical clustering with sequence comparison as the similarity metric has the potential to find clusters different from named species by using a more informed cluster formation strategy than a conventional nominal variant of the algorithm.
The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. Based on the morphological trait cluster analysis, the 41 entries were assigned to two clusters (an early or medium-maturing cluster and a medium or late-maturing cluster) and further assigned to six sub-clusters. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and four medium or late-maturing maintainer lines. The SSR cluster analysis classified the 41 entries into two groups (a maintainer line group and a restorer line group) and seven sub-groups. The maintainer line group consisted of all 19 maintainer lines and the two thermo-sensitive genic male sterile lines, while the restorer line group was composed of all 20 restorer lines. The SSR analysis fitted the pedigree information better. From the viewpoint of hybrid rice breeding, the results suggest that SSR analysis might be a better method for studying the diversity of parental lines in indica hybrid rice.
In this paper we study the scaling behavior of nucleotide clusters in the 11 chromosomes of the Encephalitozoon cuniculi genome. The statistical distribution of nucleotide clusters for the 11 chromosomes is characterized by the scaling behavior P(S) ∝ e^(-αS), where S represents the nucleotide cluster size. The cluster-size distribution P(S1+S2), where S1+S2 is the total size of a sequential C-G cluster and A-T cluster, was also studied. P(S1+S2) follows exponential decay. No case exists of a large C-G cluster following a large A-T cluster, or of a large A-T cluster following a large C-G cluster. We also discuss the relatively random walk length function L(n) and the local compositional complexity of nucleotide sequences based on a new model. These investigations may provide some insight into the nucleotide clusters of DNA sequences.
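The cluster-size statistics described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: a "C-G cluster" is taken to be a maximal run of consecutive C or G bases (and likewise for A-T), and α is estimated by maximum likelihood for a discrete distribution P(S) ∝ e^(-αS) on S = 1, 2, ..., which gives α = -ln(1 - 1/mean(S)). The function names `cluster_sizes` and `estimate_alpha` are hypothetical.

```python
import math
from itertools import groupby


def cluster_sizes(seq):
    """Return (cg_sizes, at_sizes): lengths of maximal runs of C/G and of A/T.

    Assumption: a cluster is a maximal run of bases from the same class.
    Any other symbol (for example N) breaks a run and is ignored.
    """
    cg, at = [], []
    classes = (
        "CG" if b in "CG" else "AT" if b in "AT" else None
        for b in seq.upper()
    )
    for cls, run in groupby(classes):
        n = sum(1 for _ in run)
        if cls == "CG":
            cg.append(n)
        elif cls == "AT":
            at.append(n)
    return cg, at


def estimate_alpha(sizes):
    """Maximum-likelihood alpha for P(S) proportional to exp(-alpha * S), S >= 1.

    The normalised form is a geometric distribution with ratio q = exp(-alpha),
    whose mean is 1 / (1 - q), hence alpha = -ln(1 - 1 / mean).
    """
    mean = sum(sizes) / len(sizes)
    if mean <= 1.0:
        raise ValueError("all clusters have size 1; alpha is unbounded")
    return -math.log(1.0 - 1.0 / mean)


if __name__ == "__main__":
    cg, at = cluster_sizes("AATCGGCTA")
    print(cg, at)
    print(estimate_alpha(at))
```

Under these assumptions, a larger α means that large clusters of that type are rarer. A real analysis would fit each chromosome separately and check the exponential form against a histogram of cluster sizes.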
Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we have examined the distribution functions for the amount of C or G and of A or T in consecutive, non-overlapping blocks of m bases in this system. The distribution function P(S) of the size of consecutive C-G or A-T clusters conforms to the relation P(S) ∝ e^(-αS); the values of the scaling exponent αCG are much larger than those of αAT; and αAT hardly changes across the 14 chromosomes, whereas αCG fluctuates considerably. We found that the maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. Our study of the width function ξ(m) of the C-G content of clusters showed that it follows a good power law, ξ(m) ∝ m^(-γ). The average γ for the 14 chromosomes is 0.931. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
The objective of this paper is to analyze the relationships among gene sequences associated with Alzheimer's disease (AD). The paper also studies the genetic factors in the occurrence of Alzheimer's disease, so as to provide more information for its prevention, clinical diagnosis and gene therapy. The AD-related gene sequences were each aligned against those in the National Center for Biotechnology Information (NCBI) database, and the quantitative relationships among these sequences were identified and analyzed by fuzzy clustering. The fuzzy cluster analysis indicates that gene sequences within one group are consistently more closely related to each other than to sequences in another group.
A modified DBSCAN algorithm is presented for deinterleaving radar pulses in modern EW environments. A main characteristic of the proposed method is that it can sort the pulses efficiently using only their time of arrival. Other PDW information such as rise time, carrier frequency, pulse width, modulation on pulse, fall time and direction of arrival is not required. To identify the valid PRIs in a set of interleaved pulses, an innovative modification of the DBSCAN algorithm is introduced which is accurate and easy to implement. The proposed method determines valid PRIs more accurately and rejects spurious ones more efficiently than classical histogram-based algorithms such as SDIF. Furthermore, without requiring any input parameter, the proposed method can deinterleave radar pulses while up to 30% jitter is present in the associated PRI. The accuracy and efficiency of the proposed method are verified by computer simulations and real data results. The simulations are based on different real and operational scenarios in which missing and spurious pulses are also considered, so the simulation results can be of practical significance.
The genetic diversity and relationships among 40 elite barley varieties were analyzed based on simple sequence repeat (SSR) genotyping data. The fragments amplified with the SSR primers were highly polymorphic in the barley accessions investigated. A total of 85 alleles were detected at 35 SSR loci, and allelic variation existed at 29 SSR loci. The allele number per locus ranged from 1 to 5, with an average of 2.4 alleles per locus detected across the 40 barley accessions. A cluster analysis based on the genetic similarity coefficients was conducted and the 40 varieties were classified into two groups. Seven malting barley varieties from China fell into the same subgroup. The genetic diversity within the Chinese malting barley varieties was found to be narrower than that in other barley germplasm sources, suggesting the importance and feasibility of introducing elite genotypes from different origins for malting barley breeding in China.
Iron-sulfur clusters (ISC) are essential cofactors for proteins involved in various biological processes, such as electron transport, biosynthetic reactions, DNA repair, and gene expression regulation. The ISC assembly protein IscA1 (or MagR) is found within the mitochondria of most eukaryotes. Magnetoreceptor (MagR) is a highly conserved A-type iron and iron-sulfur cluster-binding protein, characterized by two distinct types of iron-sulfur clusters, [2Fe-2S] and [3Fe-4S], each conferring unique magnetic properties. MagR forms a rod-like polymer structure in complex with photoreceptive cryptochrome (Cry) and serves as a putative magnetoreceptor for retrieving geomagnetic information in animal navigation. Although the N-terminal sequences of MagR vary among species, their specific function remains unknown. In the present study, we found that the N-terminal sequences of pigeon MagR, previously thought to serve as a mitochondrial targeting signal (MTS), were not cleaved following mitochondrial entry but instead modulated the efficiency with which iron-sulfur clusters and iron are bound. Moreover, the N-terminal region of MagR was required for the formation of a stable MagR/Cry complex. Thus, the N-terminal sequences in pigeon MagR fulfil more important functional roles than just mitochondrial targeting. These results further extend our understanding of the function of MagR and provide new insights into the origin of magnetoreception from an evolutionary perspective.
With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to avoid analysis, posing a threat to computer network security. Blockchains use P2P protocols to implement various functions across the network, and the P2P protocol format of a blockchain may differ from the standard format specification, so that sniffing tools such as Wireshark and Fiddler cannot recognize it. The ability to distinguish different types of unknown network protocols is therefore vital for network security. In this paper, we propose an unsupervised clustering algorithm for binary protocols based on maximum frequent sequences, which can distinguish various unknown protocols and thereby support the analysis of unknown protocol formats. We mine the maximum frequent sequences of protocol message sets at the byte level. We then calculate the fuzzy membership of each protocol message to each maximum frequent sequence, based on fuzzy set theory, and construct a fuzzy membership vector for each protocol message. Finally, we adopt K-means++ to split the different types of protocol messages into several clusters and evaluate the performance by calculating homogeneity, integrity, and the Fowlkes and Mallows Index (FMI). In addition, clustering algorithms based on Needleman–Wunsch and on the fixed-length prefix are compared with the algorithm presented in this paper. Compared with these traditional clustering methods, our work demonstrates a certain improvement in clustering performance.
From 1997 to 2003, 27 earthquakes with M ≥ 5.0 occurred in the Jiashi-Bachu area of Xinjiang, a rare case of strong earthquake swarm activity. The swarm had three time segments of activity with different magnitudes, in 1997, 1998 and 2003. In each time segment, the seismic activity showed strengthening-quiet changes of varying degree before earthquakes with M ≥ 5.0. To delimit effectively the precursory meaning of the clustering (strengthening)-quiet change in the sequence and to seek a time criterion for impending prediction, the nonlinear characteristics of seismic activity were used to analyze the time structure of the earthquake swarm sequence and, further, to forecast the future development tendency of earthquake sequences. Using the sequence catalogue recorded by the Kashi Station, and taking the earthquakes with Ms ≥ 5.0 in the sequence as the starting point and the next earthquake with Ms = 5.0 as the end, statistical analysis was performed on the time structure relations of the earthquake sequence in different stages. The main results are as follows: (1) Before the major earthquakes with M ≥ 5.0 in the swarm sequence, the time variation coefficient (δ-value) shows anomalies of varying degree. (2) Within 10 days after δ = 1, the occurrence of earthquakes with M ≥ 5.0 in the swarm is very likely. (3) The time variation coefficient has three types of change. (4) The change process before earthquakes with M5.0 is similar to that before earthquakes with M6.0, with little difference in the threshold value.
In an earthquake swarm sequence, it is difficult to determine accurately from the δ-value whether the current sequence is a foreshock or an aftershock sequence, or to judge the magnitude of the follow-up earthquake. We can only judge that earthquakes with M5.0 are likely to occur in the sequence. (5) The critical clustering characteristics of the sequence are hierarchical: the sequence shows a critical clustering state only with respect to a certain magnitude. (6) The time variation coefficient has a clear physical meaning. After a clustering-quiet state of earthquake activity has appeared, the coefficient can clearly describe the randomness of the seismogenic system and can efficiently clarify whether or not the clustering-quiescence variation has prognostic meaning. When the earthquake frequency attenuation is essentially normal (h > 1) and there is no remarkable clustering-quiescence state, it is still possible to discover an abnormal change in the sequence from the time variation coefficient. Conversely, in the later period of swarm activity, after many seismic quiescence phenomena had appeared, the coefficient showed no anomaly, even when h < 1, suggesting that the δ-value diagnosis is more universal.
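The abstract does not define the time variation coefficient. A common reading, assumed here and not confirmed by the source, is the coefficient of variation of inter-event times: δ = (standard deviation of the intervals) / (mean of the intervals). Under that assumption δ ≈ 1 corresponds to random (Poisson-like) occurrence, δ > 1 to clustering, and δ < 1 to more regular spacing. The function name `time_variation_coefficient` is hypothetical.

```python
import statistics


def time_variation_coefficient(event_times):
    """Coefficient of variation of the intervals between successive events.

    Assumption: delta = population standard deviation / mean of the
    inter-event times. Event times may be in any consistent unit (e.g. days).
    """
    times = sorted(event_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three events")
    mean = statistics.fmean(intervals)
    if mean == 0:
        raise ValueError("all events are simultaneous")
    return statistics.pstdev(intervals) / mean


if __name__ == "__main__":
    # Evenly spaced events: no variation in the intervals.
    print(time_variation_coefficient([0, 1, 2, 3, 4]))
    # A burst followed by a long quiet interval: strongly clustered.
    print(time_variation_coefficient([0.0, 0.1, 0.2, 10.0]))
```

In practice such a coefficient would be computed in a sliding window over the catalogue, so that its evolution before each M ≥ 5.0 event can be tracked.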
The task of clustering Web sessions is to group Web sessions by similarity, maximizing the intra-group similarity while minimizing the inter-group similarity. The first and foremost question in clustering Web sessions is how to measure the similarity between Web sessions, and traditional measures have many shortcomings. This paper introduces a new method for measuring the similarity between Web pages that takes into account not only the URL but also the viewing time of the visited Web page. We then describe in detail a new method for measuring the similarity of Web sessions using sequence alignment and the similarity of Web page accesses. Experiments show that our method is valid and efficient.
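The abstract does not give the formulas, so the following Python sketch only illustrates the general idea under stated assumptions: page similarity is taken as the shared URL path prefix multiplied by the ratio of viewing times, and session similarity is a Needleman-Wunsch style global alignment score normalised by the longer session. All function names, the gap penalty of -0.5, and both formulas are hypothetical choices, not the paper's method.

```python
from urllib.parse import urlparse


def url_similarity(u1, u2):
    """Fraction of leading path segments shared by two URLs (assumed measure)."""
    p1 = [s for s in urlparse(u1).path.split("/") if s]
    p2 = [s for s in urlparse(u2).path.split("/") if s]
    if not p1 and not p2:
        return 1.0
    common = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        common += 1
    return common / max(len(p1), len(p2))


def page_similarity(p, q):
    """Combine URL similarity with the ratio of viewing times (assumed measure).

    Each page access is a (url, viewing_time_seconds) pair.
    """
    (u1, t1), (u2, t2) = p, q
    longest = max(t1, t2)
    time_sim = min(t1, t2) / longest if longest > 0 else 1.0
    return url_similarity(u1, u2) * time_sim


def session_similarity(s1, s2, gap=-0.5):
    """Global alignment score of two sessions, normalised to [0, 1]."""
    n, m = len(s1), len(s2)
    if n == 0 or m == 0:
        return 0.0
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + page_similarity(s1[i - 1], s2[j - 1]),
                score[i - 1][j] + gap,  # skip a page of s1
                score[i][j - 1] + gap,  # skip a page of s2
            )
    return max(score[n][m], 0.0) / max(n, m)


if __name__ == "__main__":
    a = [("/shop/shoes/1", 30), ("/shop/cart", 10)]
    b = [("/news/today", 5)]
    print(session_similarity(a, a), session_similarity(a, b))
```

The resulting pairwise similarities could then be fed to any clustering method that accepts a similarity matrix.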
Shallow earthquakes usually show obvious spatio-temporal clustering patterns. In this study, several spatio-temporal point process models, built on classical empirical laws and a few assumptions, are applied to investigate the clustering characteristics of the well-known Tangshan sequence. The relative fit of the competing models is compared using the Akaike Information Criterion. The spatial clustering pattern is well characterized by the model that gives the best fit to the data. A simulated aftershock sequence is generated by a thinning algorithm and compared with the real seismicity.
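Model comparison by the Akaike Information Criterion uses AIC = 2k - 2 ln L, where k is the number of fitted parameters and ln L is the maximised log-likelihood; the model with the lowest AIC is preferred. A minimal Python sketch follows. The model names and log-likelihood values in the usage example are made up for illustration and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Pick the model with the lowest AIC.

    fits maps a model name to (maximised log-likelihood, number of parameters).
    Returns (best_name, {name: aic}).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical fits of two competing point process models.
    fits = {"model_A": (-520.0, 1), "model_B": (-480.0, 5)}
    print(best_model(fits))
```

Here the richer model is preferred only because its gain in log-likelihood outweighs the penalty of 2 per extra parameter.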
A new method of fault domain identification is proposed based on K-means clustering analysis using the wide-area information of the power grid. In the method, the associated domain of a node Intelligent Electronic Device (IED) is defined, the relationship of the positive sequence current fault component at the boundaries of the associated domain is derived, and the concept of the positive sequence fault component differential current of a node IED associated domain is introduced. The positive sequence fault component differential currents gathered by the node IEDs are selected as the objects of K-means clustering. The node IEDs of fault associated domains are classified into one category, and the node IEDs of non-fault associated domains into another. With the fault area minimum principle, the group of node IEDs of fault associated domains can be obtained. The overlap of the fault associated domains of different nodes is the fault area. A large number of simulations show that the proposed algorithm identifies fault domains with high accuracy and is not affected by the operating mode of the system or by topological changes.
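The two-category split described in this abstract can be sketched as one-dimensional K-means with k = 2. The sketch below is an illustration under assumptions, not the paper's algorithm: each node IED contributes a single differential-current magnitude, the two centres are initialised at the smallest and largest value, and the cluster with the larger centre is read as the fault-associated group. The function name `two_means_1d` and the sample values are hypothetical.

```python
def two_means_1d(values, max_iters=100):
    """Split scalar values into two clusters with Lloyd's algorithm (k = 2).

    Returns (labels, (low_centre, high_centre)); label 1 marks the cluster
    with the larger centre, assumed here to be the fault-associated group.
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; nothing to separate")
    c0, c1 = low, high  # initial centres at the extremes
    labels = []
    for _ in range(max_iters):
        # Assignment step: nearest centre wins.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: each centre moves to the mean of its cluster.
        n0, n1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (n0, n1) == (c0, c1):
            break
        c0, c1 = n0, n1
    return labels, (c0, c1)


if __name__ == "__main__":
    # Made-up differential-current magnitudes for six node IEDs.
    readings = [0.01, 0.02, 0.015, 5.2, 4.9, 0.03]
    print(two_means_1d(readings))
```

Initialising the centres at the extremes keeps both clusters non-empty, since the smallest value always stays with the low centre and the largest with the high one.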
Foraminifera are highly diverse and have a long evolutionary history. As key bioindicators, their phylogenetic schemes are of great importance for paleogeographic applications, but may be hard to recognize correctly. The phylogenetic relationships within the prominent genus Amphistegina are still uncertain. Molecular studies on Amphistegina have so far focused only on genetic diversity within single species and have suggested a cryptic diversity that calls for further investigation. Besides molecular sequencing-based approaches, different mass spectrometry-based proteomics approaches are increasingly used to give insights into the relationships between samples and organisms, especially as these do not require reference databases. To better understand the relationships of amphisteginids and to test different proteomics-based approaches, we applied de novo peptide sequencing and similarity clustering to several populations of Amphistegina lobifera, A. lessonii and A. gibbosa. We also analyzed the dominant photosymbiont community to study its influence on holobiont proteomes. Our analyses indicate that de novo peptide sequencing in particular allows the relationships among foraminiferal holobionts to be reconstructed, although the detected separation of A. gibbosa from A. lessonii and A. lobifera may be partly influenced by their different photosymbiont types. The resulting dendrograms reflect the previously suggested separation into two lineages and provide a basis for future studies.
Bacillus thuringiensis (Bt) parasporal crystal proteins are well known to be toxic to certain insects and to have cytocidal activity against various human cancer cells. Bt serovar coreanensis ST7, which is non-pathogenic to insects and non-hemolytic, has an important parasporin, PS4Aa1 (Cry45Aa1), with potential toxicity to human cancer cells. In this study, we report the features of the complete genome sequence of ST7 and the functional classification of its proteins by clusters of orthologous groups. The evolution of ST7 was also studied. The genome data of ST7 will contribute strongly to a better understanding of genomic diversity and evolution, and will enrich the Bt genome database.
Web log mining is the analysis of web log files containing web page sequences. Discovering user access patterns from web access logs is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, personalizing websites, predicting web access sequences, and so on. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine their motivation for visiting a website. With this approach, web usage mining proceeds through several stages: data cleaning, preprocessing, pattern discovery and pattern analysis. Results are given to show how this approach produces tighter usage clusters than existing web usage mining techniques. Rather than traditional distance-based clustering, a similarity measure is used during the clustering process in order to reduce computational complexity. This paper also deals with the problem of assessing the quality of user session clusters; cluster validity is measured with a statistical test that measures the distances between cluster distributions to infer their dissimilarity and degree of distinction. These statistical measures show that cluster accuracy improves to 0.83, compared with validity measures of 0.26 for existing k-means clustering, 0.56 for FCM (Fuzzy C-Means) clustering, and 0.54 for rough-set-based clustering. Generating dense clusters is essential for finding the interesting patterns needed for further mining and analysis.
The source parameters of the 2008 Yingjiang earthquake sequences are obtained by applying spectral analysis and Brune's source model to the digital waveform data recorded by the Yunnan Digital Seismic Network. Correlation coefficients are calculated from the low-frequency spectral amplitudes of pairs of events recorded by the same station, and events with similar focal mechanisms are then grouped by cluster analysis. Comparison with the obtained focal mechanisms shows good correlation with the azimuths of the P axes within each cluster group: the larger the correlation coefficient, the closer the azimuths of the P axes. We divide the Yingjiang area into three regions and analyze the stress level and stress direction by combining the source parameters with the mean focal mechanism of each group. The results show the following. The change and transformation of focal mechanism types at different stages can represent the temporal characteristics of the regional stress field. If the focal mechanism types become concentrated within a time period and switch towards the direction of the regional stress field, this may be a sign of a strong earthquake. There is some relationship between stress drop and the type of focal mechanism: earthquakes whose focal mechanisms reveal a stress field closer to the regional tectonic stress field have a higher stress drop, while those whose focal mechanisms reveal a stress field differing greatly from the regional tectonic stress field generally have a lower stress drop.
Funding (incremental text clustering paper): Supported by the National Natural Science Foundation of China (60173058)
Funding (Encephalitozoon cuniculi nucleotide cluster paper): Project supported by the National Natural Science Foundation of China (No. 20574052), the Program for New Century Excellent Talents in University, and the Natural Science Foundation of Zhejiang Province (Nos. R404047 and Y405011), China
Funding (Plasmodium falciparum nucleotide cluster paper): Project supported by the National Natural Science Foundation of China (Nos. 20174036 and 20274040) and the Natural Science Foundation of Zhejiang Province (Nos. R404047 and 10102), China
Funding: Project supported by the National Natural Science Foundation of China (Nos. 30700485 and 30771333) and the Zhejiang Provincial Natural Science Foundation (No. Y306641), China
Abstract: The genetic diversity and relationships among 40 elite barley varieties were analyzed based on simple sequence repeat (SSR) genotyping data. The fragments amplified with SSR primers were highly polymorphic in the barley accessions investigated. A total of 85 alleles were detected at 35 SSR loci, and allelic variation existed at 29 SSR loci. The allele number per locus ranged from 1 to 5, with an average of 2.4 alleles per locus across the 40 barley accessions. A cluster analysis based on the genetic similarity coefficients was conducted, and the 40 varieties were classified into two groups. Seven malting barley varieties from China fell into the same subgroup. The genetic diversity within the Chinese malting barley varieties was found to be narrower than that in other barley germplasm sources, suggesting the importance and feasibility of introducing elite genotypes from different origins for malting barley breeding in China.
Funding: Supported by the National Natural Science Foundation of China (31640001 and T2350005 to C.X., U21A20148 to X.Z. and C.X.), the Ministry of Science and Technology of China (2021ZD0140300 to C.X.), the Natural Science Foundation of Hainan Province (No. 822RC703 for J.L.), the Foundation of Hainan Educational Committee (No. Hnky2022-27 for J.L.), and the Presidential Foundation of Hefei Institutes of Physical Science, Chinese Academy of Sciences (Y96XC11131, E26CCG27, and E26CCD15 to C.X., E36CWGBR24B and E36CZG14132 to T.C.).
Abstract: Iron-sulfur clusters (ISC) are essential cofactors for proteins involved in various biological processes, such as electron transport, biosynthetic reactions, DNA repair, and gene expression regulation. ISC assembly protein IscA1 (or MagR) is found within the mitochondria of most eukaryotes. Magnetoreceptor (MagR) is a highly conserved A-type iron and iron-sulfur cluster-binding protein, characterized by two distinct types of iron-sulfur clusters, [2Fe-2S] and [3Fe-4S], each conferring unique magnetic properties. MagR forms a rod-like polymer structure in complex with photoreceptive cryptochrome (Cry) and serves as a putative magnetoreceptor for retrieving geomagnetic information in animal navigation. Although the N-terminal sequences of MagR vary among species, their specific function remains unknown. In the present study, we found that the N-terminal sequences of pigeon MagR, previously thought to serve as a mitochondrial targeting signal (MTS), were not cleaved following mitochondrial entry but instead modulated the efficiency with which iron-sulfur clusters and irons are bound. Moreover, the N-terminal region of MagR was required for the formation of a stable MagR/Cry complex. Thus, the N-terminal sequences in pigeon MagR fulfil more important functional roles than just mitochondrial targeting. These results further extend our understanding of the function of MagR and provide new insights into the origin of magnetoreception from an evolutionary perspective.
Funding: National Natural Science Foundation of China under Grant No. 61872111, the Sichuan Science and Technology Program (No. 2019YFSY0049), and the "Project for the Development and Application of Safety Testing and Verification Platform for Industrial Robots" of the Ministry of Industry and Information Technology.
Abstract: With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to avoid being analyzed, posing a threat to computer network security. The blockchain uses the P2P protocol to implement various functions across the network. Furthermore, the P2P protocol format of a blockchain may differ from the standard format specification, so that sniffing tools such as Wireshark and Fiddler cannot recognize it. Therefore, the ability to distinguish different types of unknown network protocols is vital for network security. In this paper, we propose an unsupervised clustering algorithm based on maximum frequent sequences for binary protocols, which can distinguish various unknown protocols to provide support for analyzing unknown protocol formats. We mine the maximum frequent sequences of protocol message sets in bytes, and we calculate the fuzzy membership of each protocol message to each maximum frequent sequence, based on fuzzy set theory. Then we construct the fuzzy membership vector for each protocol message. Finally, we adopt K-means++ to split different types of protocol messages into several clusters and evaluate the performance by calculating homogeneity, integrity, and the Fowlkes and Mallows Index (FMI). In addition, clustering algorithms based on Needleman-Wunsch and on the fixed-length prefix are compared with the algorithm presented in this paper. Compared with these traditional clustering methods, our work demonstrates a certain improvement in clustering performance.
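To make the membership-vector step concrete, the following is a minimal sketch only. The abstract does not state how fuzzy membership is computed, so the function here is a hypothetical stand-in: the share of a frequent sequence's bytes that appear in order in the message (longest common subsequence divided by pattern length). The patterns and messages are invented for illustration.

```python
from typing import List, Sequence


def lcs_length(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence of two byte strings."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def membership_vector(message: bytes, patterns: Sequence[bytes]) -> List[float]:
    """One membership value in [0, 1] per frequent sequence (assumed definition)."""
    return [lcs_length(message, p) / len(p) for p in patterns]


# Illustrative data, not taken from the paper.
patterns = [b"\x01\x02\x03", b"\xff\xfe"]
messages = [b"\x01\x02\x03\x10", b"\xff\x00\xfe", b"\x01\x09\x03"]
vectors = [membership_vector(m, patterns) for m in messages]
```

Each message is thereby mapped to a fixed-length numeric vector, which is the form a K-means++ clustering step would take as input.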
Funding: A sub-project entitled "Strong Earthquake Trend Assessment of the Jiashi-Bachu and the Tianshan, Xinjiang Areas (Grant No. 200333116-06)" under the project "The MS6.8 Jiashi-Bachu, Xinjiang Earthquakes the Strong Earthquake Trend in the Future" of the key science and technology research program of Xinjiang Uygur Autonomous Region
Abstract: In 1997-2003, 27 earthquakes with M ≥ 5.0 occurred in the Jiashi-Bachu area of Xinjiang, a rare strong earthquake swarm activity. The swarm has three time segments of activity with different magnitudes, in the years 1997, 1998 and 2003. In the different time segments, the seismic activity showed strengthening-quiet changes to various degrees before earthquakes with M ≥ 5.0. In order to delimit effectively the precursory meaning of the clustering (strengthening)-quiet change in the sequence and to seek a time criterion for impending prediction, the nonlinear characteristics of seismic activity have been used to analyze the time structure of the earthquake swarm sequence, and further to forecast the future development tendency of earthquake sequences. Using the sequence catalogue recorded by the Kashi Station, and taking the earthquakes with Ms ≥ 5.0 in the sequence as the starting point and the next earthquake with Ms = 5.0 as the end, statistical analysis has been performed on the time structure relations of the earthquake sequence in different stages. The main results are as follows: (1) Before the major earthquakes with M ≥ 5.0 in the swarm sequence, the time variation coefficient (δ-value) shows anomalies to different degrees. (2) Within 10 days after δ = 1, the occurrence of earthquakes with M ≥ 5.0 in the swarm is very possible. (3) The time variation coefficient has three types of change. (4) The change process before earthquakes with M5.0 is similar to that before earthquakes with M6.0, with little difference in the threshold value. In the earthquake swarm sequence, it is difficult to delimit accurately the attribute of the current sequence (foreshock or aftershock sequence) and to judge the magnitude of the follow-up earthquake by the δ-value. We can only make the judgment that earthquakes with M5.0 are likely to occur in the sequence. (5) The critical clustering characteristics of the sequence are hierarchical. The sequence can have the variation state of critical clustering only with respect to a certain magnitude. (6) The time variation coefficient has a clear physical meaning. After the clustering-quiet state of earthquake activity has appeared, it can describe clearly the randomness of the seismogenic system. Furthermore, it can efficiently clarify whether or not the clustering-quiescence variation has some prognostic meaning. In the case that the earthquake frequency attenuation is essentially normal (h > 1) and there is no remarkable clustering-quiescence state, it is still possible to discover the abnormal change of the sequence from the time variation coefficient. Conversely, in the later period of swarm activity, after the appearance of many seismic quiescence phenomena, this coefficient showed no anomaly, even when h < 1, suggesting that the δ-value diagnosis is more universal.
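The abstract does not give the formula for the time variation coefficient δ. A minimal sketch, assuming δ is the coefficient of variation of inter-event times (standard deviation divided by mean), could look as follows; under that assumed definition a value near 1 corresponds to Poisson-like random occurrence, larger values to clustering, and smaller values to quasi-regular spacing. The event times are invented for illustration.

```python
from statistics import mean, pstdev
from typing import List


def time_variation_coefficient(event_times: List[float]) -> float:
    """Std. deviation of inter-event times over their mean (assumed definition of delta)."""
    intervals = [b - a for a, b in zip(event_times, event_times[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three events")
    return pstdev(intervals) / mean(intervals)


# Illustrative event times in days, not from the Jiashi-Bachu catalogue.
regular = [0.0, 1.0, 2.0, 3.0, 4.0]        # evenly spaced events
clustered = [0.0, 0.1, 0.2, 0.3, 10.0]     # a burst followed by a long quiet period

delta_regular = time_variation_coefficient(regular)
delta_clustered = time_variation_coefficient(clustered)
```

In practice the coefficient would be evaluated in a sliding window over the catalogue, so that its changes before larger events can be tracked.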
Funding: Supported by the Foundation of Hubei Key Technology Research and Development (2005AA101C18) and the Natural Science Foundation of South-Central University for Nationalities (YZY06009)
Abstract: The task of clustering Web sessions is to group Web sessions based on similarity, maximizing the intra-group similarity while minimizing the inter-group similarity. The first and foremost question in clustering Web sessions is how to measure the similarity between Web sessions, and traditional measurements have many shortcomings. This paper introduces a new method for measuring similarities between Web pages that takes into account not only the URL but also the viewing time of the visited Web page. We then describe in detail a new method for measuring the similarity of Web sessions using sequence alignment and the similarity of Web page accesses. Experiments show that our method is valid and efficient.
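As a rough illustration of combining page similarity with sequence alignment, here is a minimal sketch. The scoring is hypothetical and not the paper's: page similarity is taken as the shared URL path prefix ratio multiplied by the ratio of viewing times, and session similarity is a Needleman-Wunsch style global alignment score normalized by the longer session. The `gap` penalty and all sample data are assumptions.

```python
from typing import List, Tuple

Page = Tuple[str, float]  # (URL path, viewing time in seconds)


def page_similarity(p: Page, q: Page) -> float:
    """Hypothetical score in [0, 1]: URL prefix overlap times viewing-time ratio."""
    a, b = p[0].strip("/").split("/"), q[0].strip("/").split("/")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    url_sim = shared / max(len(a), len(b))
    longest = max(p[1], q[1])
    time_sim = min(p[1], q[1]) / longest if longest > 0 else 1.0
    return url_sim * time_sim


def session_similarity(s: List[Page], t: List[Page], gap: float = 0.0) -> float:
    """Global alignment score of two sessions, divided by the longer length."""
    if not s or not t:
        return 0.0
    score = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, len(t) + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + page_similarity(s[i - 1], t[j - 1]),  # align pages
                score[i - 1][j] - gap,                                      # skip page in s
                score[i][j - 1] - gap,                                      # skip page in t
            )
    return score[-1][-1] / max(len(s), len(t))


session_a = [("/shop/shoes", 10.0), ("/cart", 5.0)]
session_b = [("/shop/hats", 5.0), ("/cart", 5.0)]
sim_ab = session_similarity(session_a, session_b)
```

The resulting pairwise similarities could then feed any clustering method that accepts a similarity matrix.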
Funding: Supported by the National Natural Science Foundation of China (No. 10871026)
Abstract: Shallow earthquakes usually show obvious spatio-temporal clustering patterns. In this study, several spatio-temporal point process models, based on classical empirical laws and a few assumptions, are applied to investigate the clustering characteristics of the well-known Tangshan sequence. The relative fit of the competing models is compared using the Akaike Information Criterion. The spatial clustering pattern is well characterized by the model that gives the best fit to the data. A simulated aftershock sequence is generated by a thinning algorithm and compared with the real seismicity.
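The thinning step can be sketched as follows. This is a purely temporal illustration using the Lewis-Shedler thinning scheme, whereas the paper's models are spatio-temporal and are not specified in the abstract. The decaying rate function follows the modified Omori form, and the parameter values `k`, `c` and `p` are invented.

```python
import random
from typing import Callable, List


def simulate_by_thinning(intensity: Callable[[float], float],
                         upper_bound: float,
                         t_end: float,
                         rng: random.Random) -> List[float]:
    """Lewis-Shedler thinning; requires intensity(t) <= upper_bound on [0, t_end]."""
    events: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(upper_bound)   # next candidate from a homogeneous process
        if t > t_end:
            return events
        # Keep the candidate with probability intensity(t) / upper_bound.
        if rng.random() * upper_bound <= intensity(t):
            events.append(t)


def omori_rate(t: float, k: float = 50.0, c: float = 0.5, p: float = 1.1) -> float:
    """Aftershock rate k / (t + c)^p; parameter values are illustrative only."""
    return k / (t + c) ** p


# The rate is decreasing, so its value at t = 0 bounds it on the whole interval.
events = simulate_by_thinning(omori_rate, omori_rate(0.0), 30.0, random.Random(42))
```

A simulated catalogue produced this way can be compared with the observed sequence, for example through cumulative event counts over time.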
Abstract: A new method of fault domain identification is proposed based on K-means clustering analysis, using the wide-area information of the power grid. In the method, the associated domain of a node Intelligent Electronic Device (IED) is defined, the relationship among the positive sequence current fault components at the association domain boundaries is derived, and the concept of the positive sequence fault component differential current for node IED association domains is introduced. The positive sequence fault component differential currents gathered by the node IEDs are selected as the objects of K-means clustering. The node IEDs of fault associated domains are classified into one category, and the node IEDs of non-fault associated domains into another. With the fault area minimum principle, the group of node IEDs belonging to fault associated domains can be obtained. The overlap of the fault associated domains of different nodes is the fault area. A large number of simulations show that the proposed algorithm identifies fault domains with high accuracy and is unaffected by the operating mode of the system and by topological changes.
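The two-category split described above can be illustrated with a one-dimensional K-means with two clusters. This is a minimal sketch under assumptions: the per-IED differential current magnitudes are invented, the initial centroids are simply the smallest and largest values, and the IED names are placeholders.

```python
from typing import Dict, List, Tuple


def two_means_1d(values: List[float], iterations: int = 100) -> Tuple[List[int], Tuple[float, float]]:
    """K-means with k = 2 on scalars; label 1 marks the high-valued cluster."""
    lo, hi = min(values), max(values)          # initial centroids: the two extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low_group = [v for v, lab in zip(values, labels) if lab == 0]
        high_group = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group) if high_group else hi
        if (new_lo, new_hi) == (lo, hi):       # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)


# Hypothetical differential current magnitudes (per unit) reported by node IEDs.
differential_current: Dict[str, float] = {
    "IED1": 0.02, "IED2": 0.03, "IED3": 1.90, "IED4": 2.10, "IED5": 0.01,
}
names = list(differential_current)
labels, centroids = two_means_1d(list(differential_current.values()))
fault_ieds = [name for name, lab in zip(names, labels) if lab == 1]
```

Here `fault_ieds` holds the IEDs whose associated domains would be treated as fault candidates; the overlap of those domains would then narrow down the fault area.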
Funding: Funded by the Leibniz Association (No. SAW-2014-ISAS-2), awarded to Hildegard Westphal, Albert Sickmann and Jorg Rahnenführer.
Abstract: Foraminifera are highly diverse and have a long evolutionary history. As key bioindicators, their phylogenetic schemes are of great importance for paleogeographic applications, but may be hard to recognize correctly. The phylogenetic relationships within the prominent genus Amphistegina are still uncertain. Molecular studies on Amphistegina have so far focused only on genetic diversity within single species and suggested a cryptic diversity that demands further investigation. Besides molecular sequencing-based approaches, different mass spectrometry-based proteomics approaches are increasingly used to give insights into the relationships between samples and organisms, especially as these do not require reference databases. To better understand the relationships of amphisteginids and to test different proteomics-based approaches, we applied de novo peptide sequencing and similarity clustering to several populations of Amphistegina lobifera, A. lessonii and A. gibbosa. We also analyzed the dominant photosymbiont community to study its influence on holobiont proteomes. Our analyses indicate that de novo peptide sequencing in particular allows reconstruction of the relationships among foraminiferal holobionts, although the detected separation of A. gibbosa from A. lessonii and A. lobifera may be partly influenced by their different photosymbiont types. The resulting dendrograms reflect the previously suggested separation into two lineages and provide a basis for future studies.
Abstract: Bacillus thuringiensis (Bt) parasporal crystal proteins are well known to be toxic to certain insects and to have cytocidal activity against various human cancer cells. Bt serovar coreanensis ST7, which is non-pathogenic to insects and non-hemolytic, has an important parasporin, PS4Aa1 (Cry45Aa1), with potential toxicity to human cancer cells. In this study, we report the features of the complete genome sequence of ST7 and the functional classification of its proteins by clusters of orthologous groups. The evolution of ST7 was also studied. The genome data of ST7 will contribute strongly to a better understanding of genomic diversity and evolution, and will enrich the Bt genome database.
Abstract: Web log mining is the analysis of web log files containing web page sequences. Discovering user access patterns from web access logs is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, web personalization, predicting web access sequences, etc. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine the motivation for visiting a website. Using this approach, web usage mining is done through different stages, namely data cleaning, preprocessing, pattern discovery and pattern analysis. Results are given to explain how this approach produces tighter usage clusters than the existing web usage mining techniques. Rather than traditional distance-based clustering, a similarity measure is used during the clustering process in order to reduce computational complexity. This paper also deals with the problem of assessing the quality of user session clusters; cluster validity is measured using a statistical test, which measures the distances between cluster distributions to infer their dissimilarity and level of distinction. Using such statistical measures, cluster accuracy is shown to improve to 0.83, compared with existing k-means clustering (validity measure 0.26), FCM (Fuzzy C-Means) clustering (validity measure 0.56), and rough-set-based clustering (validity measure 0.54). Generation of dense clusters is essential for finding the interesting patterns needed for further mining and analysis.
Funding: Funded under the National Science and Technology Support Program of the 12th "Five-year Plan", China (2012BAK19B02)
Abstract: The source parameters of the Yingjiang earthquake sequences in 2008 are obtained by applying spectral analysis and Brune's source model, based on the digital waveform data recorded by the Yunnan Digital Seismic Network. Correlation coefficients are calculated using the low-frequency spectral amplitudes of two events recorded by the same station; events with similar focal mechanisms are then grouped using cluster analysis. Compared with the obtained focal mechanisms, the P-axis azimuths within each cluster group are found to be well correlated, and the larger the correlation coefficient, the closer the azimuths of the P axes. We divide the Yingjiang area into 3 regions to analyze the stress level and stress direction by combining the source parameters and the mean focal mechanism of each group. The results show that the change and transformation of the focal mechanism types at different stages can represent the temporal characteristics of the regional stress field. If the earthquake focal mechanism types become concentrated within a time period and switch toward the direction of the regional stress field, this may be a sign of a strong earthquake. There is some relationship between the stress drop and the type of focal mechanism. Earthquakes whose focal mechanisms reveal stress fields closer to the regional tectonic stress field have higher stress drops, while those whose mechanism-revealed stress fields differ greatly from the regional tectonic stress field generally have lower stress drops.
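For readers unfamiliar with how a stress drop follows from spectral source parameters, the sketch below uses the commonly cited Brune relations: source radius r = 2.34β / (2π f_c) and stress drop Δσ = 7 M0 / (16 r³). The abstract names Brune's model but gives no formulas or values, so the shear-wave velocity, corner frequency and seismic moment here are illustrative assumptions.

```python
import math


def brune_source_radius(corner_freq_hz: float, shear_velocity_m_s: float) -> float:
    """Source radius in metres: r = 2.34 * beta / (2 * pi * f_c)."""
    return 2.34 * shear_velocity_m_s / (2.0 * math.pi * corner_freq_hz)


def brune_stress_drop(moment_nm: float, corner_freq_hz: float,
                      shear_velocity_m_s: float) -> float:
    """Stress drop in pascals: 7 * M0 / (16 * r^3), with M0 in newton-metres."""
    radius = brune_source_radius(corner_freq_hz, shear_velocity_m_s)
    return 7.0 * moment_nm / (16.0 * radius ** 3)


# Illustrative inputs, not values from the Yingjiang sequence.
beta = 3500.0          # shear-wave velocity, m/s
moment = 1.0e15        # seismic moment, N*m
radius = brune_source_radius(1.0, beta)
stress_drop = brune_stress_drop(moment, 1.0, beta)
```

Because the radius scales inversely with the corner frequency, the stress drop for a fixed moment grows with the cube of the corner frequency, which makes the estimate sensitive to how the corner frequency is picked from the spectrum.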