The main objective of this research is to determine the capacity of land cover classification combining spectral and textural features of Landsat TM imagery with ancillary geographical data in wetlands of the Sanjiang Plain, Heilongjiang Province, China. Semi-variograms and Z-test values were calculated to assess the separability of grey-level co-occurrence texture measures and to maximize the difference between land cover types. The degree of spatial autocorrelation showed that window sizes of 3×3 pixels and 11×11 pixels were most appropriate for Landsat TM image texture calculations. The texture analysis showed that the co-occurrence entropy, dissimilarity, and variance texture measures, derived from the Landsat TM spectral bands and vegetation indices, provided the most significant statistical differentiation between land cover types. Subsequently, a Classification and Regression Tree (CART) algorithm was applied to three different combinations of predictors: 1) TM imagery alone (TM-only model); 2) TM imagery plus image texture (TM+TXT model); and 3) all predictors, including TM imagery, image texture and additional ancillary GIS information (TM+TXT+GIS model). Compared with traditional supervised Maximum Likelihood Classification (MLC), all three classification tree models reduced the overall error rate significantly. Image texture measures and ancillary geographical variables effectively suppressed speckle noise and markedly reduced the classification error rate for marsh. For the classification tree model using all available predictors, the omission error rate for marsh was 12.90% and the commission error rate was 10.99%. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.
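To make the co-occurrence texture step concrete, here is a minimal, dependency-free sketch of a grey-level co-occurrence matrix (GLCM) and the three measures the study found most discriminating (entropy, dissimilarity, variance). It assumes the window has already been quantized to `levels` grey levels and uses a single horizontal offset; the function names and the toy window are hypothetical, and this is not the authors' implementation (image libraries such as scikit-image provide GLCM routines for production use).

```python
import math
from collections import Counter


def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized grey-level co-occurrence matrix for one pixel offset.

    `window` is a 2-D list of integer grey levels already quantized to 0..levels-1.
    """
    counts = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count the reverse pair too, giving a symmetric GLCM
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def texture_measures(p):
    """Entropy, dissimilarity and variance of a normalized GLCM `p`."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    dissimilarity = sum(v * abs(i - j) for i, j, v in cells)
    mean = sum(i * v for i, _, v in cells)  # GLCM mean over the reference-pixel index
    variance = sum(v * (i - mean) ** 2 for i, _, v in cells)
    return {"entropy": entropy, "dissimilarity": dissimilarity, "variance": variance}


# Hypothetical 3x3 window (3x3 was one of the window sizes the study found suitable).
window = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
print(texture_measures(glcm(window, levels=2)))
```

Averaging the measures over several offsets (for example 0°, 45°, 90° and 135°) is a common way to reduce directional bias.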
Urban tree species provide various essential ecosystem services in cities, such as regulating urban temperatures, reducing noise, capturing carbon, and mitigating the urban heat island effect. The quality of these services is influenced by species diversity, tree health, and the distribution and composition of trees. Traditionally, data on urban trees have been collected through field surveys and manual interpretation of remote sensing images. In this study, we evaluated the effectiveness of multispectral airborne laser scanning (ALS) data in classifying 24 common urban roadside tree species in Espoo, Finland. Tree crown structure information, intensity features, and spectral data were used for classification. Eight different machine learning algorithms were tested, with the extra trees (ET) algorithm performing best and achieving an overall accuracy of 71.7% using multispectral LiDAR data. This result shows that integrating structural and spectral information within a single framework can improve classification accuracy. Future research will focus on identifying the most important features for species classification and on developing more efficient and accurate algorithms.
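The extra trees (ET) classifier differs from a CART-style tree mainly in how it splits nodes: it draws one random cut-point per candidate feature instead of searching all thresholds. The sketch below shows only that split rule, in plain Python with invented crown features; a real study would use a full ensemble implementation such as scikit-learn's `ExtraTreesClassifier`, and nothing here reflects the study's actual settings.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def extra_trees_split(X, y, n_candidate_features, rng):
    """Choose a node split the Extremely Randomized Trees way.

    For each of `n_candidate_features` randomly chosen features, draw ONE random
    threshold between the feature's min and max at this node, then keep the
    candidate with the largest Gini decrease. Returns (feature, threshold, gain).
    """
    n = len(y)
    parent = gini(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_candidate_features):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature at this node: nothing to split
        threshold = rng.uniform(lo, hi)  # random cut-point instead of an exhaustive search
        left = [lab for row, lab in zip(X, y) if row[f] < threshold]
        right = [lab for row, lab in zip(X, y) if row[f] >= threshold]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if best is None or gain > best[2]:
            best = (f, threshold, gain)
    return best


# Toy crown features: [mean intensity, crown height (m)]; invented values.
X = [[0.1, 5.0], [0.2, 5.0], [0.9, 5.0], [1.0, 5.0]]
y = ["birch", "birch", "linden", "linden"]
print(extra_trees_split(X, y, n_candidate_features=2, rng=random.Random(0)))
```

Because each cut-point is random, individual trees are weak but cheap and decorrelated; the ensemble averages them.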
Flood disasters can seriously affect people's livelihoods and daily lives and can cause huge losses of life and property. Based on multi-source remote sensing data, this study established decision tree classification rules through multi-source and multi-temporal feature fusion, classified ground objects before the disaster, and extracted flood information in the disaster area from optical images acquired during the disaster, so as to rapidly determine how each disaster-bearing object was affected. In the case of Qianliang Lake, which was flooded in 2020, the results show that decision tree classification algorithms based on multi-temporal features can effectively integrate multi-temporal and multispectral information, overcome the shortcomings of single-date image classification, and achieve accurate classification of ground objects.
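A two-date rule of the kind described can be sketched as a tiny decision tree. This is an illustration under stated assumptions: water is detected with the McFeeters NDWI and a hypothetical threshold of 0, the pre-disaster class map is taken as given, and the class names and reflectances are invented; the study's actual rules and features are not given in the abstract.

```python
WATER_NDWI_THRESHOLD = 0.0  # hypothetical; in practice tuned per scene and sensor


def ndwi(green, nir):
    """McFeeters NDWI from green and near-infrared reflectance; > 0 usually means open water."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def flood_rule(pre_disaster_class, green_during, nir_during):
    """Minimal two-date decision tree: pre-flood land cover plus a during-flood optical image.

    Returns which disaster-bearing object (if any) is under water.
    """
    if ndwi(green_during, nir_during) <= WATER_NDWI_THRESHOLD:
        return "not flooded"
    if pre_disaster_class == "water":
        return "permanent water"  # water on both dates is not flood damage
    return "flooded " + pre_disaster_class


for pixel in [("cropland", 0.12, 0.05), ("water", 0.10, 0.03), ("building", 0.08, 0.30)]:
    print(pixel[0], "->", flood_rule(*pixel))
```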
Increasing competition, economic recession and financial crises have increased business failures, and researchers have therefore attempted to develop new approaches that yield more accurate and more reliable predictions. The classification and regression tree (CART) is one of the newer modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data from industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The results show that CART has high predictive power for financial failure one, two and three years prior to failure, and that profitability ratios are the most important ratios for predicting failure.
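At the heart of CART is an exhaustive search for the cut-point that makes the child nodes as pure as possible. The following sketch, assuming Gini impurity and a single financial ratio, shows that search; the return-on-assets values and labels are invented toy data, not ISE figures.

```python
def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Exhaustive CART search over one predictor: try the midpoint between every pair of
    adjacent distinct sorted values and keep the one with the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_imp = None, float("inf")
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_imp:
            best_thr, best_imp = (pairs[k - 1][0] + pairs[k][0]) / 2, impurity
    return best_thr, best_imp


# Invented return-on-assets values for five firms, one year before the outcome.
roa = [-0.10, 0.08, -0.05, 0.12, 0.02]
status = ["failed", "healthy", "failed", "healthy", "healthy"]
print(best_threshold(roa, status))
```

A full tree repeats this search over every ratio at every node and recurses on the two children.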
In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized knowledge of the enterprise domain. Recently, rule generation has been automated in enterprises, particularly through machine learning, to streamline routine tasks. Typically, these machine learning models are black boxes whose decisions are not always transparent, and end users need to verify the model's proposals as part of user acceptance testing before they can trust it. In such scenarios, rules excel over machine learning models because end users can verify the rules and therefore trust them more. In many scenarios the ground-truth label changes frequently, so a machine learning model has difficulty learning until a considerable amount of data has accumulated, whereas rules can be adapted to the new truth. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules of the form "if condition, then predict class", even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for manual rule construction saves time and money while ensuring that users can readily check and trust the system's automatic judgments. The framework's performance metrics, 99.85% accuracy and 96.30% precision, further support its effectiveness in translating complex data into comprehensible rules, ultimately empowering users and enhancing organizational decision-making processes.
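The step of turning a fitted CART tree into if-then rules can be sketched as a recursive walk from the root to each leaf. The nested-dict tree format, feature names and class labels below are hypothetical; with scikit-learn, a fitted tree's structure could instead be read from its `tree_` attribute or printed with `sklearn.tree.export_text`.

```python
def tree_to_rules(node, conditions=()):
    """Walk a binary decision tree and return one IF-THEN rule per leaf.

    Internal nodes: {"feature", "threshold", "left", "right"}; left means feature <= threshold.
    Leaves: {"class": label}. This dict layout is a hypothetical stand-in for a fitted CART tree.
    """
    if "class" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN class = {node['class']}"]
    feat, thr = node["feature"], node["threshold"]
    rules = tree_to_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    rules += tree_to_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return rules


tree = {
    "feature": "amount", "threshold": 1000,
    "left": {"class": "auto-approve"},
    "right": {
        "feature": "vendor_known", "threshold": 0.5,
        "left": {"class": "manual-review"},
        "right": {"class": "auto-approve"},
    },
}
for rule in tree_to_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so the rule set is mutually exclusive and exhaustive, which is what makes it easy for end users to audit.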
This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as the key features contributing to variance. However, while PCA effectively reduced dimensionality, it could not clearly distinguish between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model's accuracy and its interpretability. Future research could explore other machine learning models to improve classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
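For intuition about what PCA contributes here, the two-variable case has a closed form: the principal variances are the eigenvalues of the 2×2 covariance matrix. The sketch below (plain Python, toy data) computes them and the share of variance captured by the first component. With real anthropometric data the features would normally be standardized first, since height and reach may be measured on different scales, and more than two features would need a general eigen-solver.

```python
import math


def pca_2d(xs, ys):
    """Closed-form PCA for two variables: eigenvalues of the 2x2 sample covariance matrix
    and the share of total variance captured by the first principal component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]]: mean of the diagonal +/- half the eigenvalue gap.
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = centre + radius, centre - radius
    return lam1, lam2, lam1 / (lam1 + lam2)


# Two perfectly correlated toy features: all variance lies on the first component.
print(pca_2d([1, 2, 3, 4], [2, 4, 6, 8]))
```

High explained variance on a component says nothing about whether it separates the classes, which is consistent with the study's finding that PCA alone did not separate the groups.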
Conventional fuzzy class-association methods scan the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new text categorization approach based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules that share many repeated words. Unlike classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which the words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When classifying a new text, the sub-trees headed by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
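The space saving and pruned search can be illustrated with a prefix tree over rule antecedents. In this sketch each rule item is a (word, fuzzy set) pair, and a term frequency is mapped to a single crisp label with a hypothetical threshold, which simplifies real fuzzy membership; the k-FCR-tree partitioning described in the abstract is not modeled.

```python
from collections import Counter


def fuzzy_label(freq):
    """Hypothetical fuzzification: map a term frequency to its best-matching fuzzy set."""
    return "high" if freq >= 3 else "low"


def text_items(text):
    """(word, fuzzy set) items present in a text."""
    return {(w, fuzzy_label(c)) for w, c in Counter(text.lower().split()).items()}


def build_rule_tree(rules):
    """Prefix tree over rule antecedents; rules sharing leading items share nodes."""
    root = {"children": {}, "classes": []}
    for antecedent, label in rules:
        node = root
        for item in sorted(antecedent):  # fixed global order so common prefixes merge
            node = node["children"].setdefault(item, {"children": {}, "classes": []})
        node["classes"].append(label)
    return root


def matching_classes(root, items):
    """Classes of every rule whose whole antecedent occurs in `items`.

    Children whose (word, fuzzy set) item is absent from the text are skipped,
    so their entire sub-trees are never traversed.
    """
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.extend(node["classes"])
        for item, child in node["children"].items():
            if item in items:
                stack.append(child)
    return found


rules = [
    ((("ball", "high"), ("goal", "high")), "sports"),
    ((("goal", "high"),), "sports"),
    ((("market", "low"), ("stock", "high")), "finance"),
]
tree = build_rule_tree(rules)
print(matching_classes(tree, text_items("goal goal goal ball ball ball stock")))
```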
[Objective] This study aimed to improve the accuracy of remote sensing classification for the Dongting Lake wetland. [Method] Based on TM data and ground GIS information for Dongting Lake, a decision tree classification method was established using an expert classification knowledge base. Images of the Dongting Lake wetland were classified, layer by layer in the decision tree, into water area, mudflat, protection forest beach, Carem spp beach, Phragmites beach, Carex beach and other water bodies. [Result] The accuracy of the decision tree classification reached 80.29%, much higher than that of the traditional method, and the overall Kappa coefficient was 0.8839, indicating that the accuracy of this method meets the requirements of practical applications. In addition, knowledge-based image classification could correct some classification mistakes. [Conclusion] Compared with the traditional method, rule-based decision tree classification can classify images using multiple conditions, which reduces data processing time and improves classification accuracy.
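Overall accuracy and the Kappa coefficient both come from the classification confusion matrix. A minimal sketch, using an invented two-class matrix rather than the study's data:

```python
def overall_accuracy_and_kappa(matrix):
    """matrix[i][j] = number of pixels of reference class i mapped to class j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    po = sum(matrix[i][i] for i in range(k)) / n  # observed agreement (overall accuracy)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # chance agreement
    return po, (po - pe) / (1 - pe)


print(overall_accuracy_and_kappa([[45, 5],
                                  [10, 40]]))
```

Kappa discounts the agreement expected by chance from the marginal totals, so it is always at most the overall accuracy when that accuracy is below 1.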
A machine-learning approach was developed for automatically building knowledge bases for soil resource mapping, using a classification tree to generate knowledge from training data. With this method, building a knowledge base for automated soil mapping was easier than with the conventional knowledge acquisition approach. The knowledge base built by the classification tree was used by the knowledge classifier to classify soil types in Longyou County, Zhejiang Province, China, using bi-temporal Landsat TM images and GIS data. To evaluate the performance of the resulting knowledge bases, the classification results were compared with an existing soil map based on a field survey. The accuracy assessment and analysis of the resulting soil maps suggested that the knowledge bases built by the machine-learning method were of good quality for mapping the distribution of soil classes over the study area.
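Comparing a classified map with a reference survey map is usually summarized per class as omission error (1 − producer's accuracy) and commission error (1 − user's accuracy). A plain-Python sketch, assuming the rows of the matrix are reference classes and the columns are mapped classes; the soil class names and counts are invented.

```python
def per_class_errors(matrix, labels):
    """Omission and commission error per class.

    matrix[r][c] = number of pixels whose reference (survey) class is r and mapped class is c.
    """
    k = len(labels)
    errors = {}
    for c in range(k):
        correct = matrix[c][c]
        reference_total = sum(matrix[c])                    # all reference pixels of class c
        mapped_total = sum(matrix[r][c] for r in range(k))  # all pixels mapped as class c
        errors[labels[c]] = {
            "omission": 1 - correct / reference_total,      # 1 - producer's accuracy
            "commission": 1 - correct / mapped_total,       # 1 - user's accuracy
        }
    return errors


confusion = [[30, 10],
             [5, 55]]
print(per_class_errors(confusion, ["paddy soil", "red soil"]))
```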
The diversity of tree species and the complexity of land use in cities make tree species classification challenging. Combining deep learning methods with RGB optical images obtained by unmanned aerial vehicles (UAVs) provides a new research direction for urban tree species classification. We proposed an RGB optical image dataset of 10 urban tree species, termed TCC10, as a benchmark for tree canopy classification (TCC). The TCC10 dataset contains two types of data: tree canopy images with simple backgrounds and those with complex backgrounds. The objective was to examine the possibility of using deep learning methods (AlexNet, VGG-16, and ResNet-50) for individual tree species classification. The results of the convolutional neural networks (CNNs) were compared with those of K-nearest neighbor (KNN) and a BP neural network. Our results demonstrated that: (1) ResNet-50 achieved an overall accuracy (OA) of 92.6% and a kappa coefficient of 0.91 for tree species classification on TCC10, outperforming AlexNet and VGG-16; (2) the classification accuracy of KNN and the BP neural network was below 70%, while the accuracy of the CNNs was higher; (3) the classification accuracy for tree canopy images with complex backgrounds was lower than for images with simple backgrounds. For the deciduous tree species in TCC10, the classification accuracy of ResNet-50 was higher in summer than in autumn. Therefore, deep learning is effective for urban tree species classification using RGB optical images.
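KNN, one of the baselines, is simple enough to sketch in a few lines. The feature vectors here are hypothetical (for example, summary colour statistics of a canopy crop), and k = 3 is an arbitrary choice; the study's actual features and parameters are not stated in the abstract. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 2-D canopy features; the values are invented.
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["ginkgo"] * 3 + ["willow"] * 3
print(knn_predict(train_X, train_y, [0.2, 0.2]), knn_predict(train_X, train_y, [5.5, 5.5]))
```

Unlike a CNN, KNN only sees whatever hand-crafted features it is given, which is one plausible reason such baselines trail on raw RGB canopy images.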
Although airborne hyperspectral data with detailed spatial and spectral information have demonstrated significant potential for tree species classification, they have not been widely used over large areas. A comprehensive processing chain for multi-flightline airborne hyperspectral data is lacking for large forested areas affected by both bidirectional reflectance distribution function (BRDF) effects and cloud shadow contamination. In this study, hyperspectral data were collected over the Mengjiagang Forest Farm in Northeast China in the summer of 2017 using the Chinese Academy of Forestry's LiDAR, CCD, and hyperspectral systems (CAF-LiCHy). After BRDF correction and cloud shadow detection, a tree species classification workflow was developed for sunlit and cloud-shaded forest areas, with minimum noise fraction (MNF) reduced bands, spectral vegetation indices, and texture information as input features. Results indicate that BRDF-corrected sunlit hyperspectral data can provide stable and high classification accuracy given representative training data. Cloud-shaded pixels also have good spectral separability for species classification. Red-edge spectral information and ratio-based spectral indices, which had high importance scores, are recommended as input features for species classification under varying light conditions. Classification accuracies assessed against field survey data at multiple spatial scales showed that species classification within an extensive forest area using airborne hyperspectral data under various illumination conditions can be carried out successfully with an effective radiometric consistency process and feature selection strategy.
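One reason ratio-based indices behave well across sunlit and shaded crowns is that a uniform change in illumination scales the numerator and denominator equally. The sketch below computes the red-edge simple ratio R750/R705 by picking the nearest hyperspectral bands; the band centres, reflectances, the choice of index and the uniform-dimming shadow model are all assumptions for illustration, not the study's feature set.

```python
def nearest_band(wavelengths_nm, target_nm):
    """Index of the band whose centre wavelength is closest to `target_nm`."""
    return min(range(len(wavelengths_nm)), key=lambda i: abs(wavelengths_nm[i] - target_nm))


def red_edge_ratio(spectrum, wavelengths_nm):
    """Simple red-edge ratio R750 / R705 (SR705), one example of a ratio-based index."""
    r750 = spectrum[nearest_band(wavelengths_nm, 750)]
    r705 = spectrum[nearest_band(wavelengths_nm, 705)]
    return r750 / r705


wavelengths = [680, 702, 712, 740, 755]   # hypothetical band centres (nm)
sunlit = [0.04, 0.06, 0.09, 0.30, 0.36]    # invented reflectances
shaded = [0.3 * r for r in sunlit]         # crude shadow model: uniform dimming of all bands
print(red_edge_ratio(sunlit, wavelengths), red_edge_ratio(shaded, wavelengths))
```

Real cloud shadow also changes the spectral shape of the illumination (more diffuse, bluer light), so the invariance shown here is only approximate in practice.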
Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected based on how changes in groundwater level responded to factors such as rainfall and reservoir level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In validation, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed from the same set of factors produced a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater levels in landslides.
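The two error statistics can be computed as follows. The relative-error definition used here (mean of the absolute error divided by the observed value, in percent) is an assumption, since the abstract does not define it, and the water levels are invented.

```python
def mean_absolute_error(observed, predicted):
    """Mean of |prediction - observation|, in the units of the data (here metres)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_error_pct(observed, predicted):
    """Mean of |prediction - observation| / |observation|, in percent (assumed definition)."""
    return 100 * sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


observed = [24.0, 25.0, 26.0]    # invented groundwater levels (m)
predicted = [24.3, 24.8, 26.3]
print(mean_absolute_error(observed, predicted), mean_relative_error_pct(observed, predicted))
```

Relative error depends on the datum the levels are measured from, so MAE in metres is usually the easier figure to compare across models.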
Traditional dormitory allocation systems neglect students' personality traits, which causes dormitory disputes and affects students' quality of life and academic performance; designing intelligent allocation systems based on students' individual differences and needs has therefore become a priority. This paper collects freshmen's data on their personal preferences, conducts a classification comparison, and uses a decision tree classification algorithm based on the information gain principle as the core algorithm for dormitory allocation. It determines description rules for students' personal preferences and for decision tree classification, completes the conceptual design of the database (entity relations and data dictionaries), meets students' personality-based classification requirements for dormitories, and lays the foundation for an intelligent dormitory allocation system.
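The information gain criterion that drives such a tree (as in ID3) can be sketched as follows; the preference attributes and dormitory groups are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


students = [
    {"sleep": "early", "smoker": "no", "group": "A"},
    {"sleep": "early", "smoker": "yes", "group": "A"},
    {"sleep": "late", "smoker": "no", "group": "B"},
    {"sleep": "late", "smoker": "yes", "group": "B"},
]
for attr in ("sleep", "smoker"):
    print(attr, information_gain(students, attr, "group"))
```

The tree splits first on the attribute with the highest gain (here `sleep`), then repeats the calculation within each branch.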
Antarctic sea ice is an important part of the Earth's climate system, and satellite remote sensing is an important technology for observing it. Whether data from the altimeter on the Chinese Haiyang-2B (HY-2B) satellite can be used to estimate sea ice freeboard and, like other radar altimetry satellites, provide alternative, high-precision, long-time-series information on Antarctic sea ice thickness needs further investigation. This paper proposes an algorithm that discriminates leads and then retrieves sea ice freeboard and thickness from HY-2B radar altimeter data. We first collected the Moderate-resolution Imaging Spectroradiometer (MODIS) ice surface temperature (IST) product from the National Aeronautics and Space Administration (NASA) to extract leads in Antarctic waters and verified their accuracy using Sentinel-1 Synthetic Aperture Radar images. Second, a surface classification decision tree was generated for HY-2B altimeter measurements over Antarctic waters to extract leads and calculate local sea surface heights. We then estimated Antarctic sea ice freeboard and thickness from the local sea surface heights and the hydrostatic equilibrium equation. Finally, the retrieved HY-2B Antarctic sea ice thickness was compared with CryoSat-2 sea ice thickness and with Antarctic Sea Ice Processes and Climate (ASPeCt) ship-based sea ice thickness observations. The results indicate that the classification decision tree constructed for HY-2B altimeter measurements was reasonable, and the root mean square error of the retrieved sea ice thickness relative to the ship measurements was 0.62 m. The proposed sea ice thickness algorithm for HY-2B fills a gap in this application domain for the HY-series satellites and can complement existing Antarctic sea ice thickness products; it could provide long-time-series, large-scale sea ice thickness data that contribute to research on global climate change.
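The freeboard-to-thickness step follows from hydrostatic equilibrium. A minimal sketch, assuming the radar echo comes from the snow-ice interface (so the measured quantity is ice freeboard) and using typical literature densities rather than the study's values; over Antarctic snow this penetration assumption is often questionable, so treat the example as illustrative only.

```python
RHO_WATER = 1024.0  # kg/m^3, typical seawater density (assumed)
RHO_ICE = 915.0     # kg/m^3, typical sea-ice density (assumed)
RHO_SNOW = 300.0    # kg/m^3, typical snow density (assumed)


def ice_freeboard(surface_height_m, local_ssh_m):
    """Ice freeboard: floe surface elevation minus the local sea surface height from leads."""
    return surface_height_m - local_ssh_m


def ice_thickness(freeboard_m, snow_depth_m):
    """Hydrostatic equilibrium rho_w (h_i - F) = rho_i h_i + rho_s h_s, solved for h_i."""
    return (RHO_WATER * freeboard_m + RHO_SNOW * snow_depth_m) / (RHO_WATER - RHO_ICE)


f = ice_freeboard(0.35, 0.20)  # invented elevations (m)
print(round(f, 2), round(ice_thickness(f, 0.10), 2))
```

The small denominator (ρw − ρi ≈ 109 kg/m³) amplifies freeboard errors roughly ninefold, which is why accurate lead detection and local sea surface height matter so much.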
Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Advances in remote sensing increase the potential for land cover classification, which requires assessing the accuracy and efficiency of algorithms in fragile environments. This research identifies the best algorithms for LULC monitoring and for developing adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 was used to identify land use classes. Preprocessing in Google Earth Engine applied spectral indices such as the NDVI, NDWI, BSI, and NDBI. Supervised classification used random forest (RF), support vector machine (SVM), classification and regression trees (CART), gradient boosting trees (GBT), and naïve Bayes classifiers. An accuracy assessment was used to determine the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a weighted F1 score of 0.85, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68), indicating overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is optimal for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
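The weighted F1 score used to rank the classifiers averages per-class F1 values, weighting each class by its number of reference samples (its support). A plain-Python sketch with invented labels:

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true (support)."""
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += (tp + fn) / n * f1  # tp + fn is the class support in y_true
    return total


y_true = ["bushland", "bushland", "cropland", "cropland"]
y_pred = ["bushland", "cropland", "cropland", "cropland"]
print(weighted_f1(y_true, y_pred))
```

Because large classes dominate the weights, a high weighted F1 can still hide poor performance on rare classes, which relates to the class imbalance issues noted above.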
With the growing interest in e-commerce shopping, customer reviews have become one of the most important elements determining customer satisfaction with products, which shows the importance of text mining. This study is based on the Women's Clothing E-Commerce Reviews dataset, which consists of reviews written by real customers. The aim of this paper is to apply a text mining approach to a set of customer reviews. Each review was classified as positive or negative using a classification method. Four tree-based methods were applied to the classification problem, namely Classification Tree, Random Forest, Gradient Boosting and XGBoost. The dataset was split into training and test sets. The results indicate that Random Forest overfits, that XGBoost overfits if the number of trees is too high, that the Classification Tree is good at detecting negative reviews but poor at detecting positive reviews, and that Gradient Boosting shows stable values, with quality measures above 77% on the test dataset. The applied methods agree on the terms that are important for classification.
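Tree-based classifiers need the reviews as numeric features; a common starting point is a bag-of-words document-term matrix. The tokenizer and the example reviews below are assumptions for illustration, not the preprocessing used in the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs, min_df=1):
    """Rows = reviews, columns = vocabulary terms, cells = raw term counts.

    Terms appearing in fewer than `min_df` documents are dropped from the vocabulary.
    """
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(t for t, c in doc_freq.items() if c >= min_df)
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[t] for t in vocab] for c in counts]


vocab, matrix = document_term_matrix(["Love this dress, love it", "Fabric is cheap"])
print(vocab)
print(matrix)
```

Each column then acts as a predictor, so the terms a tree splits on near its root are natural candidates for the "important classification terms" the methods agreed on.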
We built a classification tree (CT) model to estimate the climatic factors controlling the distribution of cold temperate coniferous forest (CTCF) in Yunnan Province and to predict its potential habitats under current and future climates, using seven climate change scenarios projected for 2070-2099. The accurate CT model for CTCF showed that the minimum temperature of the coldest month (TMW) was by far the most powerful factor among the six climate variables. Areas with TMW < −4.05 °C were suitable habitats for CTCF, and areas with TMW > −1.35 °C were non-habitats, where temperate conifer and broad-leaved mixed forests (TCBLFs) are distributed at lower elevations, bordering the CTCF. The dominant species of the CTCF, Abies, Picea, and Larix, are more tolerant of winter coldness than Tsuga and the broad-leaved trees of the TCBLFs, including deciduous broad-leaved Acer and Betula and evergreen broad-leaved Cyclobalanopsis and Lithocarpus. Winter coldness may limit the cool-side distribution of TCBLFs in areas with TMW between −1.35 °C and −4.05 °C, whereas the warm-side distribution of CTCF may be controlled by competition with TCBLF species. Under future climate scenarios, the vulnerable area, where current potential (suitable + marginal) habitats (80,749 km²) shift to non-habitats, was predicted to decrease to 55.91% (45,053 km²) of the current area. Inferring from the current vegetation distribution pattern, TCBLFs will replace the declining CTCF. Vulnerable areas predicted by the models are important for setting priorities in ecosystem conservation.
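Read as a rule, the classification tree's reported split values partition TMW into three classes. A sketch, assuming the band between the two thresholds corresponds to the "marginal" habitat and that the boundary values themselves fall in that band (the abstract does not say how ties are handled):

```python
def ctcf_habitat(tmw_celsius):
    """Habitat class from the minimum temperature of the coldest month (TMW, degrees C).

    Thresholds are the split values reported in the abstract; treating the band between
    them (boundaries included) as 'marginal' is an assumption.
    """
    if tmw_celsius < -4.05:
        return "suitable"
    if tmw_celsius > -1.35:
        return "non-habitat"
    return "marginal"


for tmw in (-6.0, -2.5, 0.0):
    print(tmw, ctcf_habitat(tmw))
```

Applying this rule cell by cell to current and projected TMW grids, and flagging cells that move from suitable or marginal to non-habitat, is one way to map the vulnerable area.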
Impervious surface (IS) is often recognized as an indicator of urban environmental change. Numerous studies have examined its spatio-temporal dynamics and ecological effects, especially in the Beijing metropolitan region. However, most previous studies treated the Beijing metropolitan region as a whole, without considering the differences and heterogeneity among its function zones. In this study, subpixel impervious surface for Beijing was extracted for a time series (1991, 2001, 2005, 2011 and 2015) using the classification and regression tree (CART) model combined with change detection models. Then, using the standard deviational ellipse, the Lorenz curve, the contribution index (CI) and landscape metrics, the spatio-temporal dynamics and variations of IS (1991, 2001, 2011 and 2015) in different function zones and districts were analyzed. The total area of impervious surface in Beijing increased dramatically during the study period, by about 144.18%. The deflection angle of the major axis of the standard deviational ellipse decreased from 47.15° to 38.82°, indicating that Beijing's major development axis gradually shifted from northeast-southwest to north-south. Moreover, the heterogeneity of the impervious surface distribution among the 16 districts gradually weakened, but the CI values and landscape metrics differed greatly among the four function zones. The urban function extended zone (UFEZ), the main source of IS growth in Beijing, had the highest CI values; its lowest CI value, 1.79, was still much higher than the highest CI value of any other function zone.
The core function zone (CFZ), the traditional concentration zone of impervious surface, had the highest contagion index (CONTAG) values, but it contributed less than the UFEZ because of its small area. The CI value of the new urban developed zone (NUDZ) increased rapidly, turning from negative to positive and multiplying, so the NUDZ became an important contributor to the growth of urban impervious surface. However, the ecological conservation zone (ECZ) contributed negatively throughout the period, and its CI value gradually decreased. Moreover, the landscape metrics and centroids of impervious surface differed greatly among density classes. High-density impervious surface had a more compact configuration and a greater impact on the eco-environment.
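The Lorenz-curve analysis of how evenly impervious surface is spread across districts can be summarized by a Gini coefficient. The sketch below applies the trapezoid rule to the Lorenz curve; summarizing with the Gini coefficient is an assumption (the abstract does not say how the curves were compared), and the district areas are invented.

```python
def lorenz_gini(values):
    """Gini coefficient from the Lorenz curve of a non-negative quantity across units.

    0 means perfectly even; 1 - 1/n means everything is concentrated in a single unit.
    """
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    area, prev, cum = 0.0, 0.0, 0.0
    for v in ordered:
        cum += v / total                 # cumulative share of impervious surface
        area += (prev + cum) / (2 * n)   # trapezoid over a 1/n step of district share
        prev = cum
    return 1 - 2 * area                  # 1 - 2 x (area under the Lorenz curve)


print(lorenz_gini([5, 5, 5, 5]), lorenz_gini([0, 0, 0, 10]))  # invented district areas
```

A falling Gini value over the time series would correspond to the weakening heterogeneity among districts reported above.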
The accurate prediction of poverty is critical to poverty-reduction efforts, and high-resolution remote sensing (HRRS) data have shown great promise for facilitating such prediction. Accordingly, the present study used 1 m resolution HRRS data and data for 238 households to evaluate the utility and optimal scale of HRRS data for predicting household poverty in a grassland region of Inner Mongolia, China. Prediction of household poverty improved when remote sensing indicators at multiple scales were used instead of indicators at a single scale, and a model combining indicators from four scales (building land, household, neighborhood, and regional) provided the most accurate prediction, with testing and training accuracies of 48.57% and 70.83%, respectively. Furthermore, building area was the most efficient indicator of household poverty. Compared with household surveys, the analysis of HRRS data is a cheaper and more time-efficient method for predicting household poverty; in this case study, it reduced study time and cost by about 75% and 90%, respectively. This study provides the first evaluation of HRRS data for predicting household poverty in pastoral areas and thus provides technical support for identifying poverty in pastoral areas around the world.
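Building area, the most efficient indicator, can be derived from a 1 m building mask by counting pixels per household. A sketch assuming a hypothetical raster in which building pixels carry a household ID and all other pixels are `None`; how the study delineated households is not described in the abstract.

```python
def building_area_by_household(labels, pixel_size_m=1.0):
    """Building area (m^2) per household from a labelled building raster.

    labels[r][c] holds a household ID for building pixels and None elsewhere;
    each pixel covers pixel_size_m x pixel_size_m metres (1 m for the imagery described).
    """
    areas = {}
    for row in labels:
        for household in row:
            if household is not None:
                areas[household] = areas.get(household, 0.0) + pixel_size_m ** 2
    return areas


mask = [[None, "h1", "h1"],
        [None, "h1", None],
        ["h2", "h2", None]]
print(building_area_by_household(mask))  # {'h1': 3.0, 'h2': 2.0}
```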
Asphaltenes have long attracted researchers. However, the application of this fraction in geochemistry has been studied only to a limited extent: despite many studies on asphaltene structure, the use of asphaltene structures in organic geochemistry has not yet been assessed. Oil-oil correlation is a well-known concept in geochemical studies and plays a vital role in basin modeling, in reconstructing the burial history of basin sediments, and in accurately characterizing the relevant petroleum system. This study proposes the X-ray diffraction (XRD) technique as a novel method for oil-oil correlation and investigates its reliability and accuracy for different crude oils. To this end, 13 crude oil samples from the Iranian sector of the Persian Gulf region, previously correlated into four distinct genetic groups with traditional geochemical tools such as biomarker ratios and isotope values, were selected, and their asphaltene fractions were analyzed by two prevalent methods, XRD and Fourier-transform infrared spectroscopy (FTIR). To assess oil-oil correlation, various cross-plots and principal component analysis (PCA) based on the structural parameters of the studied asphaltenes were used. The results indicate that asphaltene structural parameters can also be used for oil-oil correlation, with results fully consistent with the previous classifications. The average distance between saturated portions (d_r) and the average distance between two aromatic layers (d_m) of the asphaltene molecules in the studied oil samples are 4.69 Å and 3.54 Å, respectively. Furthermore, the average values of the diameter of the aromatic sheets (L_a), the height of the clusters (L_c), the number of carbons per aromatic unit (C_au), the number of aromatic rings per layer (R_a), the number of sheets in the cluster (M_e) and the aromaticity (f_a) of these asphaltene samples are 10.09 Å, 34.04 Å, 17.42, 3.78, 10.61 and 0.26, respectively. The XRD parameters indicate that plots of d_r vs. d_m, d_r vs. M_e, d_r vs. f_a, d_m vs. L_c, L_c vs. L_a, and f_a vs. L_a perform well in distinguishing genetic groups. A comparison between XRD and FTIR results indicated that the XRD method is more accurate for this purpose. In addition, decision tree classification, one of the most effective machine learning approaches, was applied to the geochemical groups of this study for the first time. This tree, constructed from XRD data, distinguishes the genetic groups accurately and also determines the characteristics of each geochemical group. In conclusion, obtaining asphaltene structural parameters with the XRD technique is a novel, precise and inexpensive method that can be deployed as a new approach for oil-oil correlation. The findings of this study can help in the prompt determination of genetic groups as a screening method and can also be useful for assessing oil samples affected by secondary processes.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 40871188) and the National Key Technologies R&D Program of China (No. 2006BAD23B03).
Abstract: The main objective of this research is to determine the capacity of land cover classification that combines spectral and textural features of Landsat TM imagery with ancillary geographical data in the wetlands of the Sanjiang Plain, Heilongjiang Province, China. Semi-variograms and Z-test values were calculated to assess the separability of grey-level co-occurrence texture measures and to select the measures that maximize the difference between land cover types. The degree of spatial autocorrelation showed that window sizes of 3×3 pixels and 11×11 pixels were the most appropriate for Landsat TM image texture calculations. The texture analysis showed that the co-occurrence entropy, dissimilarity, and variance measures derived from the Landsat TM spectral bands and vegetation indices provided the most significant statistical differentiation between land cover types. Subsequently, a Classification and Regression Tree (CART) algorithm was applied to three combinations of predictors: 1) TM imagery alone (TM-only model); 2) TM imagery plus image texture (TM+TXT model); and 3) all predictors, including TM imagery, image texture, and ancillary GIS information (TM+TXT+GIS model). Compared with traditional supervised Maximum Likelihood Classification (MLC), all three classification tree models reduced the overall error rate significantly. Image texture measures and ancillary geographical variables effectively suppressed speckle noise and markedly reduced the classification error rate for marsh. For the classification tree model using all available predictors, the omission and commission error rates for marsh were 12.90% and 10.99%, respectively. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.
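The grey-level co-occurrence measures named above can be illustrated with a minimal pure-Python sketch. It builds a normalized, symmetric co-occurrence matrix for one window and one pixel offset, then computes entropy, dissimilarity, and variance with the usual Haralick-style definitions. The single horizontal offset, the absence of grey-level quantization, and the function names are assumptions for illustration; the abstract does not say how bands were quantized or how offsets were combined. In practice the function would be applied in a sliding 3×3 or 11×11 window over each band or vegetation index.

```python
import math
from collections import Counter

def glcm(window, dx=1, dy=0):
    """Normalized, symmetric grey-level co-occurrence matrix for one offset."""
    rows, cols = len(window), len(window[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # symmetric: count the pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def texture_measures(p):
    """Entropy, dissimilarity and variance of a normalized GLCM."""
    entropy = -sum(v * math.log(v) for v in p.values())
    dissimilarity = sum(v * abs(i - j) for (i, j), v in p.items())
    mean = sum(v * i for (i, _), v in p.items())
    variance = sum(v * (i - mean) ** 2 for (i, _), v in p.items())
    return {"entropy": entropy, "dissimilarity": dissimilarity, "variance": variance}

# A 3x3 window of invented grey levels
print(texture_measures(glcm([[1, 2, 2], [1, 1, 3], [2, 3, 3]])))
```

A homogeneous window (all pixels equal) gives zero for all three measures, which is why these measures separate smooth cover types from heterogeneous ones.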
Abstract: Urban tree species provide various essential ecosystem services in cities, such as regulating urban temperatures, reducing noise, capturing carbon, and mitigating the urban heat island effect. The quality of these services is influenced by species diversity, tree health, and the distribution and composition of trees. Traditionally, data on urban trees have been collected through field surveys and manual interpretation of remote sensing images. In this study, we evaluated the effectiveness of multispectral airborne laser scanning (ALS) data for classifying 24 common urban roadside tree species in Espoo, Finland. Tree crown structure information, intensity features, and spectral data were used for classification. Eight machine learning algorithms were tested; the extra trees (ET) algorithm performed best, achieving an overall accuracy of 71.7% using multispectral LiDAR data. This result highlights that integrating structural and spectral information within a single framework can improve classification accuracy. Future research will focus on identifying the most important features for species classification and on developing more efficient and accurate algorithms.
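The extra trees (ET) algorithm differs from CART mainly in how it splits a node: instead of searching for the optimal cut-point, it draws a random subset of features and a random threshold for each, then keeps the best of these random candidates. The sketch below shows only that node-splitting step, assuming Gini impurity as the score and two hypothetical crown features; it is not the full ensemble, and the abstract does not state which implementation or hyperparameters were used.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def extra_trees_split(X, y, n_candidates, rng):
    """Return (feature, threshold, weighted_gini) for one randomized split."""
    best = None
    n_features = len(X[0])
    for f in rng.sample(range(n_features), min(n_candidates, n_features)):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature: no cut-point possible
        t = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = [label for row, label in zip(X, y) if row[f] < t]
        right = [label for row, label in zip(X, y) if row[f] >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[2]:
            best = (f, t, score)
    return best

# Hypothetical crown features: [height_m, mean_intensity]; labels are placeholders
X = [[8.0, 0.31], [9.5, 0.35], [15.0, 0.62], [16.5, 0.58]]
y = ["A", "A", "B", "B"]
print(extra_trees_split(X, y, n_candidates=2, rng=random.Random(0)))
```

Skipping the exhaustive threshold search makes each tree cheaper to grow and the ensemble more diverse, at the cost of individually weaker splits.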
Abstract: Flood disasters can seriously disrupt people's production and daily lives and can cause huge losses of life and property. Based on multi-source remote sensing data, this study established decision tree classification rules through multi-source and multi-temporal feature fusion, classified ground objects before the disaster, and extracted flood information in the disaster area from optical images acquired during the disaster, so as to rapidly determine the disaster situation of each disaster-bearing object. Taking Qianliang Lake, which was flooded in 2020, as a case study, the results show that decision tree classification algorithms based on multi-temporal features can effectively integrate multi-temporal and multispectral information, overcome the shortcomings of single-date image classification, and classify ground objects accurately.
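A multi-temporal decision rule of the kind described can be sketched as follows: a pixel is labelled newly flooded when a water index exceeds a threshold in the image acquired during the flood but not in the pre-flood image. The choice of McFeeters' NDWI, the 0.0 threshold, and the class names are assumptions for illustration; the study's actual rules and features are not given in the abstract.

```python
def ndwi(green, nir):
    """McFeeters normalized difference water index."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total

def flood_class(pre, during, water_threshold=0.0):
    """pre/during: dicts of surface reflectance with 'green' and 'nir' keys."""
    water_before = ndwi(pre["green"], pre["nir"]) > water_threshold
    water_during = ndwi(during["green"], during["nir"]) > water_threshold
    if water_before and water_during:
        return "permanent water"
    if water_during:
        return "flooded"  # water now, dry land before the event
    return "non-water"

# Invented reflectances for a cropland pixel before and during the flood
print(flood_class({"green": 0.08, "nir": 0.35}, {"green": 0.10, "nir": 0.04}))
```

Comparing two dates is what lets the rule separate flood water from lakes and rivers that were already present, which a single-date classification cannot do.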
Abstract: Increasing competition, economic recession, and financial crises have increased business failures, and researchers have therefore attempted to develop new approaches that yield more accurate and reliable results. The classification and regression tree (CART) is one of the newer modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data from industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The study shows that CART has high predictive power for financial failure one, two, and three years prior to failure, with profitability ratios being the most important ratios for predicting failure.
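The core of CART for a binary outcome such as failed versus healthy is an exhaustive search, on each predictor, for the cut-point that minimizes the weighted Gini impurity of the two child nodes. A minimal sketch for a single financial ratio follows; the ratio values and labels are invented for illustration and do not come from the ISE data.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """CART-style search over midpoints between sorted distinct values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cut-point between equal values
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Invented return-on-assets values one year before the outcome
roa = [-0.10, -0.05, 0.02, 0.08, 0.12]
status = ["failed", "failed", "healthy", "healthy", "healthy"]
print(best_threshold(roa, status))
```

A full tree repeats this search over every predictor at every node and then prunes; the predictor chosen most often near the root is what makes a ratio group "most important".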
Abstract: In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized knowledge of the enterprise domain. Recently, rule generation has been automated in enterprises, particularly through machine learning, to streamline routine tasks. Typically, these machine learning models are black boxes whose reasons for decisions are not always transparent, and end users need to verify the model proposals as part of user acceptance testing before they can trust them. In such scenarios, rules excel over machine learning models because end users can verify the rules and therefore trust them more. In many scenarios the truth label changes frequently, so it is difficult for a machine learning model to learn until a considerable amount of data has accumulated, whereas rules can be adapted to the new truth. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules of the form "if <condition> then <predicted class>", even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for manual rule construction saves time and money while ensuring that users can readily check and trust the system's automatic judgments. The framework achieves 99.85% accuracy and 96.30% precision, further supporting its efficiency in translating complex data into comprehensible rules and, ultimately, in empowering users and enhancing organizational decision-making processes.
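Turning a fitted CART tree into "if <condition> then <class>" rules amounts to collecting the split conditions along every root-to-leaf path. The sketch below assumes a simple nested-dictionary tree (a hypothetical representation, not the framework's format) with the CART convention that the left branch takes `feature <= threshold`; the invoice-routing example is invented. For a fitted scikit-learn tree, `sklearn.tree.export_text` produces a comparable text dump.

```python
def extract_rules(node, conditions=()):
    """Return one human-readable rule per leaf of a binary decision tree."""
    if "label" in node:  # leaf: the accumulated conditions form the rule body
        premise = " and ".join(conditions) if conditions else "true"
        return [f"if {premise} then {node['label']}"]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right

# Hypothetical invoice-routing tree
tree = {
    "feature": "amount", "threshold": 1000,
    "left": {"label": "auto-approve"},
    "right": {
        "feature": "vendor_known", "threshold": 0.5,
        "left": {"label": "manual review"},
        "right": {"label": "auto-approve"},
    },
}
for rule in extract_rules(tree):
    print(rule)
```

Because every leaf yields exactly one rule and the paths are mutually exclusive, the rule set covers every input once, which is what makes it checkable by end users.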
Abstract: This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. The dataset was reduced using PCA, which identified height, reach, and age as the key features contributing to variance. However, while PCA effectively reduced dimensionality, it could not clearly distinguish between the two ethnic groups, a limitation noted in previous research. In contrast, the decision tree model performed significantly better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important classifiers, a finding supported by existing studies on ethnic differences in Northeast India. The results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification tasks. While PCA alone was insufficient for optimal class separation, its integration with decision trees improved both the model’s accuracy and interpretability. Future research could explore other machine learning models to enhance classification and examine a broader set of anthropometric features for more comprehensive ethnic group classification.
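The PCA step can be sketched with NumPy: standardize the features, eigen-decompose their covariance matrix, and rank components by explained variance. Standardization is an assumption here (height in centimetres and age in years are on different scales); the abstract does not say whether the data were scaled, and the sample rows are invented.

```python
import numpy as np

def pca(X, n_components, standardize=True):
    """Return (scores, explained_variance_ratio, loadings) for the top components."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    top = eigvecs[:, :n_components]
    return Z @ top, ratio[:n_components], top

# Invented rows: [height_cm, reach_cm, age_years]
X = [[170, 175, 30], [160, 162, 25], [180, 183, 40], [155, 158, 22], [175, 178, 35]]
scores, ratio, loadings = pca(X, n_components=2)
print(ratio)
print(loadings)
```

The loadings show which original features dominate each component, but PCA is unsupervised: a component that explains much variance need not separate the two groups, which is consistent with the abstract's finding that a supervised decision tree was needed.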
Funding: The National Natural Science Foundation of China (No. 60473045); the Technology Research Project of Hebei Province (No. 05213573); the Research Plan of Education Office of Hebei Province (No. 2004406).
Abstract: The conventional fuzzy class-association method classifies new texts by repeatedly scanning the classifier, which is inefficient. To address this problem, a new text categorization approach based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules that share many repeated words. Unlike classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies with which the words appear in texts. Therefore, the construction and structure of an FCR-tree differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the paths of sub-trees headed by words that do not appear in the text need not be searched, which reduces the number of rules traversed. Experimental results show that the proposed approach clearly outperforms the conventional method in efficiency.
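The idea of sharing rule prefixes and skipping absent words can be sketched with a small prefix tree. Each rule is a set of (word, fuzzy set) items stored in sorted order so that rules with common items share nodes; classification only descends into children whose word occurs in the text and scores a rule by the minimum membership of its items. The two fuzzy sets ("low", "high"), their membership functions, the min-then-sum voting, and the use of one tree instead of separate k-FCR-trees are simplifying assumptions, not the paper's definitions.

```python
from collections import Counter, defaultdict

def membership(fuzzy_set, freq):
    """Hypothetical fuzzy sets over raw term frequency (freq >= 1)."""
    if fuzzy_set == "low":
        return max(0.0, min(1.0, (4 - freq) / 3))
    if fuzzy_set == "high":
        return max(0.0, min(1.0, (freq - 1) / 3))
    raise ValueError(fuzzy_set)

class FCRTree:
    def __init__(self):
        self.root = {"children": {}, "labels": []}

    def insert(self, items, label):
        node = self.root
        for item in sorted(items):  # canonical order lets rules share prefixes
            node = node["children"].setdefault(item, {"children": {}, "labels": []})
        node["labels"].append(label)

    def classify(self, text):
        freqs = Counter(text.lower().split())
        votes = defaultdict(float)

        def walk(node, degree):
            for label in node["labels"]:
                votes[label] += degree  # a rule ends here: cast a fuzzy vote
            for (word, fuzzy_set), child in node["children"].items():
                if word not in freqs:
                    continue  # prune the whole sub-tree led by an absent word
                d = min(degree, membership(fuzzy_set, freqs[word]))
                if d > 0:
                    walk(child, d)

        walk(self.root, 1.0)
        return max(votes, key=votes.get) if votes else None

tree = FCRTree()
tree.insert([("goal", "high"), ("match", "low")], "sports")
tree.insert([("goal", "high"), ("team", "low")], "sports")  # shares the "goal" node
tree.insert([("market", "low"), ("stock", "low")], "finance")
print(tree.classify("goal goal goal goal match team"))
```

The pruning test `word not in freqs` is the point of the structure: a text that never mentions "goal" never visits any rule stored below that node.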
Abstract: [Objective] This study aimed to improve the accuracy of remote sensing classification for the Dongting Lake wetland. [Method] Based on TM data and ground GIS information for Dongting Lake, a decision tree classification method was established through an expert classification knowledge base. The images of the Dongting Lake wetland were classified, layer by layer through the decision tree, into water area, mudflat, protection forest beach, Carem spp beach, Phragmites beach, Carex beach, and other water bodies. [Result] The accuracy of the decision tree classification reached 80.29%, much higher than that of the traditional method, and the overall Kappa coefficient was 0.8839, indicating that the accuracy of this method could meet practical requirements. In addition, knowledge-based image classification corrected some classification errors. [Conclusion] Compared with the traditional method, rule-based decision tree classification can classify images using multiple conditions, which reduces data processing time and improves classification accuracy.
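Overall accuracy and the Kappa coefficient are both computed from the classification confusion matrix. A minimal sketch with an invented two-class matrix follows; the function name is illustrative.

```python
def accuracy_and_kappa(cm):
    """cm[i][j] = number of reference class i samples classified as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n  # overall accuracy
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

cm = [[50, 10], [5, 35]]  # invented two-class matrix
oa, kappa = accuracy_and_kappa(cm)
print(f"OA = {oa:.4f}, kappa = {kappa:.4f}")
```

Because kappa discounts the agreement expected by chance, for any single confusion matrix it cannot exceed the overall accuracy.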
Funding: Project supported by the National Natural Science Foundation of China (Nos. 40101014 and 40001008).
Abstract: A machine-learning approach was developed for automatically building knowledge bases for soil resource mapping by using a classification tree to generate knowledge from training data. With this method, building a knowledge base for automated soil mapping was easier than with the conventional knowledge acquisition approach. The knowledge base built by the classification tree was used by the knowledge classifier to classify soil types in Longyou County, Zhejiang Province, China, using bi-temporal Landsat TM images and GIS data. To evaluate the performance of the resulting knowledge bases, the classification results were compared with an existing soil map based on a field survey. The accuracy assessment and analysis of the resulting soil maps suggested that the knowledge bases built by the machine-learning method were of good quality for mapping the distribution of soil classes over the study area.
Funding: Supported by the Joint Fund of Natural Science Foundation of Zhejiang-Qingshanhu Science and Technology City (Grant No. LQY18C160002); the National Natural Science Foundation of China (Grant No. U1809208); the Zhejiang Science and Technology Key R&D Program Funded Project (Grant No. 2018C02013); and the Natural Science Foundation of Zhejiang Province (Grant No. LQ20F020005).
Abstract: The diversity of tree species and the complexity of land use in cities make tree species classification challenging. The combination of deep learning methods and RGB optical images obtained by unmanned aerial vehicles (UAVs) provides a new research direction for urban tree species classification. We proposed an RGB optical image dataset of 10 urban tree species, termed TCC10, as a benchmark for tree canopy classification (TCC). The TCC10 dataset contains two types of data: tree canopy images with simple backgrounds and those with complex backgrounds. The objective was to examine the possibility of using deep learning methods (AlexNet, VGG-16, and ResNet-50) for individual tree species classification. The results of the convolutional neural networks (CNNs) were compared with those of K-nearest neighbor (KNN) and a BP neural network. Our results demonstrated that: (1) ResNet-50 achieved an overall accuracy (OA) of 92.6% and a kappa coefficient of 0.91 for tree species classification on TCC10 and outperformed AlexNet and VGG-16; (2) the classification accuracy of KNN and the BP neural network was less than 70%, whereas that of the CNNs was higher; (3) the classification accuracy for tree canopy images with complex backgrounds was lower than that for images with simple backgrounds. For the deciduous tree species in TCC10, the classification accuracy of ResNet-50 was higher in summer than in autumn. Therefore, deep learning is effective for urban tree species classification using RGB optical images.
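The K-nearest neighbor baseline compared against the CNNs can be sketched in a few lines: a canopy image is assigned the majority species among its k closest training samples. Here each image is reduced to a hypothetical feature vector (mean R, G, B of the canopy) and Euclidean distance is assumed; the abstract does not say which features or distance the KNN baseline used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Invented mean-RGB features for four canopy images of two placeholder species
train_X = [(30, 80, 30), (35, 90, 40), (120, 100, 60), (125, 110, 70)]
train_y = ["species_A", "species_A", "species_B", "species_B"]
print(knn_predict(train_X, train_y, (32, 85, 35)))
```

Hand-crafted colour features like these ignore crown texture and shape, which is one plausible reason such baselines fall well behind CNNs that learn spatial features directly from the images.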
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42101403) and the National Key Research and Development Program of China (Grant No. 2017YFD0600404).
Abstract: Although airborne hyperspectral data with detailed spatial and spectral information have demonstrated significant potential for tree species classification, they have not been widely used over large areas. A comprehensive processing workflow for multi-flightline airborne hyperspectral data is lacking for large forested areas affected by both bidirectional reflectance distribution function (BRDF) effects and cloud shadow contamination. In this study, hyperspectral data were collected over the Mengjiagang Forest Farm in Northeast China in the summer of 2017 using the Chinese Academy of Forestry's LiDAR, CCD, and hyperspectral systems (CAF-LiCHy). After BRDF correction and cloud shadow detection, a tree species classification workflow was developed for sunlit and cloud-shaded forest areas, with minimum noise fraction reduced bands, spectral vegetation indices, and texture information as input features. Results indicate that BRDF-corrected sunlit hyperspectral data can provide stable and high classification accuracy given representative training data. Cloud-shaded pixels also have good spectral separability for species classification. Red-edge spectral information and ratio-based spectral indices, which have high importance scores, are recommended as input features for species classification under varying light conditions. Classification accuracies assessed against field survey data at multiple spatial scales show that species classification over an extensive forest area using airborne hyperspectral data acquired under varying illumination can be carried out successfully with an effective radiometric consistency process and feature selection strategy.
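A ratio-based red-edge index of the kind recommended above can be computed from a hyperspectral pixel by picking the bands nearest two target wavelengths. The sketch assumes the commonly used 705 nm and 750 nm pair (the normalized red-edge index often written NDVI705, and the simple ratio R750/R705); the abstract does not name the specific indices selected, and the band centres and reflectances are invented. It also shows why ratio indices tolerate illumination changes: a uniform multiplicative dimming of the spectrum, a first-order model of shading, cancels out.

```python
def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))

def red_edge_indices(wavelengths, reflectance):
    """Normalized and simple-ratio red-edge indices from the 705/750 nm bands."""
    r705 = reflectance[nearest_band(wavelengths, 705)]
    r750 = reflectance[nearest_band(wavelengths, 750)]
    return {
        "NDVI705": (r750 - r705) / (r750 + r705),
        "SR705": r750 / r705,
    }

# Invented band centres (nm) and reflectance for one sunlit pixel
wl = [700.0, 704.8, 709.6, 749.5, 754.3]
sunlit = [0.08, 0.10, 0.12, 0.40, 0.42]
shaded = [0.5 * r for r in sunlit]  # same surface under half the irradiance
print(red_edge_indices(wl, sunlit))
print(red_edge_indices(wl, shaded))
```

Real cloud shadow also changes the spectral shape of the illumination (more diffuse, bluer light), so the cancellation is only approximate in practice.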
Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Based on groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, and on the response of the groundwater level to influential factors such as rainfall and reservoir level, the factors influencing the groundwater level were selected. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In validation, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed from the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater levels in landslides.
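The two error measures quoted above can be computed as follows. Defining the relative error as the mean of |observed - predicted| / |observed| is an assumption, since the abstract does not give the formula, and the sample water levels are invented.

```python
def mae_and_relative_error(observed, predicted):
    """Mean absolute error (same units as input) and mean relative error (%)."""
    n = len(observed)
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(abs_err) / n
    rel = sum(e / abs(o) for e, o in zip(abs_err, observed)) / n * 100
    return mae, rel

obs = [20.0, 25.0]   # invented groundwater levels (m)
pred = [20.5, 24.5]
print(mae_and_relative_error(obs, pred))
```

Because the relative error divides by the observed level, the same absolute error gives a smaller percentage where the groundwater level is higher, so both measures are worth reporting together.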
Abstract: Traditional dormitory allocation systems neglect students’ personality traits, which causes dormitory disputes and affects students’ quality of life and academic performance; designing intelligent allocation systems based on students’ individual differences and needs has therefore become a priority. This paper collects data on freshmen's personal preferences, performs a classification comparison, and uses a decision tree classification algorithm based on the information gain principle as the core dormitory allocation algorithm. It determines the description rules for students’ personal preferences and the decision tree classification preferences, and completes the conceptual database design (entity relationships and data dictionary), meeting students’ personality-based classification requirements for dormitories and laying the foundation for an intelligent dormitory allocation system.
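The information gain criterion at the heart of the allocation algorithm (as in ID3-style decision trees) picks the preference attribute whose split most reduces the entropy of the target grouping. A minimal sketch with invented survey records follows; the attribute names and the target "group" are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

rows = [
    {"sleep": "early", "smoker": "no", "group": "A"},
    {"sleep": "early", "smoker": "yes", "group": "A"},
    {"sleep": "late", "smoker": "no", "group": "B"},
    {"sleep": "late", "smoker": "yes", "group": "B"},
]
print(information_gain(rows, "sleep", "group"), information_gain(rows, "smoker", "group"))
```

In this toy data the sleep schedule fully determines the group (gain of 1 bit) while smoking says nothing about it (gain of 0), so the tree would split on sleep first.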
Funding: The National Natural Science Foundation of China under contract No. 42076235.
Abstract: Antarctic sea ice is an important part of the Earth’s atmospheric system, and satellite remote sensing is an important technology for observing Antarctic sea ice. Whether data from the altimeter on the Chinese Haiyang-2B (HY-2B) satellite can be used to estimate sea ice freeboard and provide alternative, high-precision, long-time-series Antarctic sea ice thickness information, as data from other radar altimetry satellites can, requires further investigation. This paper proposes an algorithm to discriminate leads and then retrieve sea ice freeboard and thickness from HY-2B radar altimeter data. We first collected the Moderate-resolution Imaging Spectroradiometer (MODIS) ice surface temperature (IST) product from the National Aeronautics and Space Administration to extract leads in Antarctic waters and verified their accuracy using Sentinel-1 Synthetic Aperture Radar images. Second, a surface classification decision tree was generated for HY-2B satellite altimeter measurements over Antarctic waters to extract leads and calculate local sea surface heights. We then estimated the Antarctic sea ice freeboard and thickness from the local sea surface heights and the hydrostatic equilibrium equation. Finally, the retrieved HY-2B Antarctic sea ice thickness was compared with CryoSat-2 sea ice thickness and with ship-based sea ice thickness observations from the Antarctic Sea Ice Processes and Climate (ASPeCt) program. The results indicate that the classification decision tree constructed for HY-2B satellite altimeter measurements was reasonable, and the root mean square error of the retrieved sea ice thickness relative to the ship measurements was 0.62 m. The proposed sea ice thickness algorithm for the HY-2B radar satellite fills a gap in this application domain for the HY-series satellites and can complement existing Antarctic sea ice thickness products; it could provide long-time-series, large-scale sea ice thickness data that contribute to research on global climate change.
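The freeboard-to-thickness step rests on hydrostatic equilibrium: the weight of ice plus snow equals the weight of displaced sea water, which gives T = (rho_w * F + rho_s * h_s) / (rho_w - rho_i) for ice freeboard F and snow depth h_s. The sketch assumes the radar freeboard has already been converted to ice freeboard and uses densities of 1024, 915 and 300 kg/m³ for sea water, ice and snow as typical literature values, not the values used for HY-2B in the study; the heights in the example are invented.

```python
def freeboard(surface_height, local_sea_surface_height):
    """Freeboard (m): floe surface height above the local sea surface derived from leads."""
    return surface_height - local_sea_surface_height

def ice_thickness(ice_freeboard, snow_depth, rho_w=1024.0, rho_i=915.0, rho_s=300.0):
    """Hydrostatic equilibrium rho_i*T + rho_s*h_s = rho_w*(T - F), solved for T (m)."""
    return (rho_w * ice_freeboard + rho_s * snow_depth) / (rho_w - rho_i)

F = freeboard(surface_height=-12.35, local_sea_surface_height=-12.50)  # invented heights
print(round(F, 3), round(ice_thickness(F, snow_depth=0.20), 3))
```

The small denominator (rho_w - rho_i, about 109 kg/m³ here) amplifies errors: a 1 cm error in freeboard becomes roughly 9 cm in thickness, which is why accurate lead detection and local sea surface heights matter so much.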
Abstract: Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Advances in remote sensing increase the potential for land cover classification, which requires assessing algorithm accuracy and efficiency in fragile environments. This research identifies the best algorithms for LULC monitoring and for developing adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 was used to identify land use classes. Preprocessing in Google Earth Engine applied spectral indices such as the NDVI, NDWI, BSI, and NDBI. Supervised classification was performed with random forest (RF), support vector machine (SVM), classification and regression trees (CART), gradient boosting trees (GBT), and naïve Bayes. An accuracy assessment was used to determine the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a weighted F1 score of 0.85, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68), indicating overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is optimal for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
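The four spectral indices used in preprocessing can be sketched as normalized differences of surface-reflectance bands. The sketch assumes the Landsat 9 OLI-2 band mapping blue = B2, green = B3, red = B4, NIR = B5, SWIR1 = B6, and the McFeeters (green/NIR) form of NDWI; the abstract does not specify which NDWI variant was used, and the pixel values are invented.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), the common form of all four indices."""
    return (a - b) / (a + b)

def spectral_indices(blue, green, red, nir, swir1):
    """Surface-reflectance inputs; assumed Landsat 9 OLI-2 bands B2, B3, B4, B5, B6."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDWI": normalized_difference(green, nir),  # McFeeters form (assumed)
        "NDBI": normalized_difference(swir1, nir),
        "BSI": normalized_difference(swir1 + red, nir + blue),
    }

veg = spectral_indices(blue=0.04, green=0.08, red=0.05, nir=0.40, swir1=0.20)  # invented vegetated pixel
print(veg)
```

Each index is bounded in [-1, 1] and largely insensitive to overall brightness, which makes them convenient extra predictors for tree-based classifiers such as RF and CART.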
Abstract: With the increasing interest in e-commerce shopping, customer reviews have become one of the most important elements determining customer satisfaction with products, which demonstrates the importance of Text Mining. This study is based on the Women’s Clothing E-Commerce Reviews database, which consists of reviews written by real customers. The aim of this paper is to apply a Text Mining approach to a set of customer reviews. Each review was classified as either positive or negative using a classification method. Four tree-based methods were applied to the classification problem, namely Classification Tree, Random Forest, Gradient Boosting, and XGBoost. The dataset was split into training and test sets. The results indicate that the Random Forest method overfits, XGBoost overfits when the number of trees is too high, Classification Tree is good at detecting negative reviews but poor at detecting positive reviews, and Gradient Boosting shows stable values, with quality measures above 77% on the test dataset. The applied methods agree on the terms that are important for classification.
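Before any of the tree-based classifiers can be trained, each review must be turned into a numeric vector. A minimal bag-of-words sketch builds a document-term matrix from raw text, keeping terms that occur in at least `min_df` reviews; the tokenizer and the `min_df` filter are assumptions, since the abstract does not describe the preprocessing, and the two reviews are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens (apostrophes kept inside words)."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(docs, min_df=1):
    """Return (vocabulary, term-count rows) for a list of documents."""
    counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)  # each term once per review
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_df)
    return vocab, [[c.get(t, 0) for t in vocab] for c in counts]

reviews = ["Love this dress", "Dress runs small, did not love"]  # invented reviews
vocab, matrix = document_term_matrix(reviews)
print(vocab)
print(matrix)
```

Each column of the matrix is a candidate splitting variable for a tree, so the "important classification terms" the methods agree on correspond to the columns chosen for high-level splits.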
Funding: Supported by the Environment Research and Technology Development Fund (S-14) of the Ministry of the Environment, Japan, and JSPS KAKENHI Grant Numbers 15H02833.
Abstract: We built a classification tree (CT) model to estimate the climatic factors controlling the distribution of cold temperate coniferous forests (CTCFs) in Yunnan Province and to predict their potential habitats under current and future climates, using seven climate change scenarios projected for 2070-2099. The accurate CT model for CTCFs showed that the minimum temperature of the coldest month (TMW) was by far the most potent of the six climate variables. Areas with TMW < -4.05 °C were suitable habitats for CTCFs, and areas with TMW > -1.35 °C were non-habitats, where temperate conifer and broad-leaved mixed forests (TCBLFs) are distributed at lower elevations bordering the CTCFs. The dominant species of the CTCFs, Abies, Picea, and Larix, are more tolerant of winter coldness than Tsuga and the broad-leaved trees of TCBLFs, including the deciduous broad-leaved Acer and Betula and the evergreen broad-leaved Cyclobalanopsis and Lithocarpus. Winter coldness may limit the cool-side distributions of TCBLFs in areas with TMW between -1.35 °C and -4.05 °C, while the warm-side distributions of CTCFs may be controlled by competition with TCBLF species. Under future climate scenarios, the vulnerable area, where current potential (suitable + marginal) habitats (80,749 km²) shift to non-habitats, was predicted to decrease to 55.91% (45,053 km²) of the current area. Inferring from the current vegetation distribution pattern, TCBLFs will replace declining CTCFs. Vulnerable areas predicted by models are important in determining priorities for ecosystem conservation.
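The two TMW thresholds reported by the classification tree can be read as a simple rule, and a vulnerable cell as one whose class changes from potential habitat to non-habitat between the current and future climates. The sketch treats the interval between -4.05 °C and -1.35 °C as "marginal" habitat, which is an inference from the abstract's "suitable + marginal" wording rather than a stated rule; the full tree may contain further splits.

```python
def ctcf_habitat(tmw):
    """Habitat class from the minimum temperature of the coldest month (deg C)."""
    if tmw < -4.05:
        return "suitable"
    if tmw > -1.35:
        return "non-habitat"
    return "marginal"  # assumed label for the interval between the two thresholds

def is_vulnerable(tmw_now, tmw_future):
    """Potential (suitable or marginal) habitat now that becomes non-habitat later."""
    return ctcf_habitat(tmw_now) != "non-habitat" and ctcf_habitat(tmw_future) == "non-habitat"

# Invented TMW values for one grid cell under current and future climate
print(ctcf_habitat(-3.0), ctcf_habitat(-0.5), is_vulnerable(-3.0, -0.5))
```

Summing the area of cells for which `is_vulnerable` is true, per climate scenario, gives the kind of vulnerable-area estimate reported above.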
Funding: National Basic Research Program of China, No. 2015CB953603; National Natural Science Foundation of China, No. 41671339; State Key Laboratory of Earth Surface Processes and Resource Ecology, No. 2017-FX-01(1).
Abstract: Impervious surface (IS) is often recognized as an indicator of urban environmental change. Numerous research efforts have been devoted to studying its spatio-temporal dynamics and ecological effects, especially for IS in the Beijing metropolitan region. However, most previous studies considered the Beijing metropolitan region as a whole, without accounting for the differences and heterogeneity among its function zones. In this study, subpixel impervious surface maps of Beijing for a time series (1991, 2001, 2005, 2011, and 2015) were extracted using a classification and regression tree (CART) model combined with change detection models. Then, using the standard deviational ellipse, the Lorenz curve, the contribution index (CI), and landscape metrics, the spatio-temporal dynamics and variations of IS (1991, 2001, 2011, and 2015) in different function zones and districts were analyzed. The total area of impervious surface in Beijing increased dramatically during the study period, by about 144.18%. The deflection angle of the major axis of the standard deviational ellipse decreased from 47.15° to 38.82°, indicating that the major development axis of Beijing gradually shifted from northeast-southwest toward north-south. Moreover, the heterogeneity of the impervious surface distribution among the 16 districts gradually weakened, but the CI values and landscape metrics of the four function zones differed greatly. The urban function extended zone (UFEZ), the main source of IS growth in Beijing, had the highest CI values; its lowest CI value, 1.79, is still much higher than the highest CI value in any other function zone. The core function zone (CFZ), the traditional aggregation zone of impervious surface, had the highest contagion index (CONTAG) values, but it contributed less than the UFEZ because of its small area.
The CI value of the new urban developed zone (NUDZ) increased rapidly, turning from negative to positive and multiplying, so that the NUDZ became an important contributor to the growth of urban impervious surface. However, the ecological conservation zone (ECZ) made a negative contribution throughout the study period, and its CI value decreased gradually. Moreover, the landscape metrics and centroids of impervious surface in different density classes differed greatly. High-density impervious surface had a more compact configuration and a greater impact on the eco-environment.
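The deflection angle of the standard deviational ellipse can be computed from the coordinates of impervious-surface cells. The sketch uses the classic closed-form rotation formula, tan θ = (A + sqrt(A^2 + 4B^2)) / (2B), with A = Σx'^2 - Σy'^2 and B = Σx'y' over mean-centred coordinates, and reports θ clockwise from north, as common GIS tools do. Whether the study weighted cells by impervious fraction is not stated, so the sketch is unweighted.

```python
import math

def sde_angle(points):
    """Major-axis rotation of the standard deviational ellipse, degrees clockwise from north."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    dx = [x - mx for x, _ in points]
    dy = [y - my for _, y in points]
    a = sum(v * v for v in dx) - sum(v * v for v in dy)
    b = sum(u * v for u, v in zip(dx, dy))
    # atan2 handles b == 0; the numerator is never negative, so theta lies in [0, 180]
    theta = math.atan2(a + math.sqrt(a * a + 4 * b * b), 2 * b)
    return math.degrees(theta)

# Invented cell centres spread along a northeast-southwest line
print(sde_angle([(0, 0), (1, 1), (2, 2), (3, 3)]))
```

Under this convention 0° is a north-south axis and 45° a northeast-southwest axis, so a decrease from 47.15° to 38.82° corresponds to the axis turning from northeast-southwest toward north-south.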
Funding: This study was supported by the Key Science and Technology Program of Inner Mongolia (Grant No. ZDZX2018020, 2020GG0007, 2019GG009); the Natural Science Foundation of Inner Mongolia (Grant No. 2020MS03068); the Research Project of China Institute of Water Resources and Hydropower Research (Grant No. MK2019J02); and the Grassland Talents Program of Inner Mongolia (Grant No. CYYC9013).
Abstract: The accurate prediction of poverty is critical to poverty reduction efforts, and high-resolution remote sensing (HRRS) data have shown great promise for facilitating such prediction. Accordingly, the present study used 1 m resolution HRRS data and data from 238 households to evaluate the utility and optimal scale of HRRS data for predicting household poverty in a grassland region of Inner Mongolia, China. Household poverty prediction improved when remote sensing indicators at multiple scales were used instead of indicators at a single scale, and a model that combined indicators from four scales (building land, household, neighborhood, and regional) provided the most accurate prediction, with testing and training accuracies of 48.57% and 70.83%, respectively. Furthermore, building area was the most efficient indicator of household poverty. Compared with household surveys, the analysis of HRRS data is a cheaper and more time-efficient method for predicting household poverty; in this case study, it reduced study time and cost by about 75% and 90%, respectively. This study provides the first evaluation of HRRS data for predicting household poverty in pastoral areas and thus provides technical support for identifying poverty in pastoral areas around the world.
Abstract: Asphaltenes have always been an attractive subject for researchers. However, the application of this fraction in the geochemical field has been studied only to a limited extent: despite many studies on asphaltene structure, the application of asphaltene structures in organic geochemistry has not so far been assessed. Oil-oil correlation is a well-known concept in geochemical studies and plays a vital role in basin modeling, in the reconstruction of the burial history of basin sediments, and in the accurate characterization of the relevant petroleum system. This study proposes the X-ray diffraction (XRD) technique as a novel method for oil-oil correlation and investigates its reliability and accuracy for different crude oils. To this end, 13 crude oil samples from the Iranian sector of the Persian Gulf region, which had previously been correlated into four distinct genetic groups using traditional geochemical tools such as biomarker ratios and isotope values, were selected, and their asphaltene fractions were analyzed by two prevalent methods: XRD and Fourier-transform infrared spectroscopy (FTIR). For the oil-oil correlation assessment, various cross-plots and principal component analysis (PCA) were used, based on the structural parameters of the studied asphaltenes. The results indicate that asphaltene structural parameters can also be used for oil-oil correlation, with results fully in accord with the previous classifications. The average distance between saturated portions (d_r) and the average distance between two aromatic layers (d_m) of the asphaltene molecules in the studied oil samples are 4.69 Å and 3.54 Å, respectively. Furthermore, the average diameter of the aromatic sheets (L_a), the height of the clusters (L_c), the number of carbons per aromatic unit (C_au), the number of aromatic rings per layer (R_a), the number of sheets in the cluster (M_e), and the aromaticity (f_a) of these asphaltene samples are 10.09 Å, 34.04 Å, 17.42, 3.78, 10.61, and 0.26, respectively. The XRD parameters indicate that plots of d_r vs. d_m, d_r vs. M_e, d_r vs. f_a, d_m vs. L_c, L_c vs. L_a, and f_a vs. L_a perform well in distinguishing genetic groups. A comparison of XRD and FTIR results indicated that the XRD method is more accurate for this purpose. In addition, decision tree classification, one of the most effective machine learning approaches, was applied to the geochemical groups of this study for the first time. This tree, constructed using XRD data, can distinguish genetic groups accurately and can also determine the characteristics of each geochemical group. In conclusion, obtaining asphaltene structural parameters by the XRD technique is a novel, precise, and inexpensive method that can be deployed as a new approach to oil-oil correlation. The findings of this study can help in the prompt determination of genetic groups as a screening method and can also be useful for assessing oil samples affected by secondary processes.
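Several of the structural parameters above are linked by simple relations that are widely used in XRD studies of asphaltenes: d_m follows from Bragg's law at the (002) peak, M_e = L_c/d_m + 1, and R_a = L_a/2.667. The abstract does not state which formulas the authors used, so these relations, the Cu K-alpha wavelength, and the example peak position are assumptions; the reported mean L_c, d_m and L_a serve only as inputs.

```python
import math

CU_K_ALPHA = 1.5406  # X-ray wavelength in angstroms (assumed Cu K-alpha source)

def bragg_spacing(two_theta_deg, wavelength=CU_K_ALPHA):
    """Interlayer spacing d = lambda / (2 sin theta), in angstroms."""
    return wavelength / (2 * math.sin(math.radians(two_theta_deg / 2)))

def sheets_per_cluster(lc, dm):
    """Number of aromatic sheets stacked in a cluster: M_e = L_c / d_m + 1."""
    return lc / dm + 1

def rings_per_layer(la):
    """Number of aromatic rings per layer: R_a = L_a / 2.667 (assumed relation)."""
    return la / 2.667

print(round(bragg_spacing(25.0), 2))  # d_m for a hypothetical (002) peak at 2-theta = 25 deg
print(round(sheets_per_cluster(34.04, 3.54), 2))
print(round(rings_per_layer(10.09), 2))
```

With the reported means, M_e comes out at about 10.6 and R_a at about 3.78, in line with the reported averages; because these parameters are derived from L_c, d_m and L_a, cross-plots that pair them are not fully independent.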