When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes...When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets.The proposed method first uses a biased second-order cone programming support vectormachine(B-SOCP-SVM)to identify the support vectors(SVs)and non-support vectors(NSVs)in the imbalanced data.Then,it applies the synthetic minority over-sampling technique(SV-SMOTE)to oversample the support vectors of the minority class and uses the random under-sampling technique(NSV-RUS)multiple times to undersample the non-support vectors of the majority class.Combining the above-obtained minority class data set withmultiple majority class datasets can obtainmultiple new balanced data sets.Finally,SOCP-SVM is used to classify each data set,and the final result is obtained through the integrated algorithm.Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.展开更多
Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveill...Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveillance,biometric authentication,and human-computer interaction.This paper provides a comprehensive review of face detection techniques developed to handle occluded faces.Studies are categorized into four main approaches:feature-based,machine learning-based,deep learning-based,and hybrid methods.We analyzed state-of-the-art studies within each category,examining their methodologies,strengths,and limitations based on widely used benchmark datasets,highlighting their adaptability to partial and severe occlusions.The review also identifies key challenges,including dataset diversity,model generalization,and computational efficiency.Our findings reveal that deep learning methods dominate recent studies,benefiting from their ability to extract hierarchical features and handle complex occlusion patterns.More recently,researchers have increasingly explored Transformer-based architectures,such as Vision Transformer(ViT)and Swin Transformer,to further improve detection robustness under challenging occlusion scenarios.In addition,hybrid approaches,which aim to combine traditional andmodern techniques,are emerging as a promising direction for improving robustness.This review provides valuable insights for researchers aiming to develop more robust face detection systems and for practitioners seeking to deploy reliable solutions in real-world,occlusionprone environments.Further improvements and the proposal of broader datasets are required to developmore scalable,robust,and efficient models that can handle complex occlusions in real-world scenarios.展开更多
Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geograph...Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geographical study was conducted in Arizona State,USA,to examine monthly precipi-tation concentration rates over time.This analysis used a high-resolution 0.50×0.50 grid for monthly precip-itation data from 1961 to 2022,Provided by the Climatic Research Unit.The study aimed to analyze climatic changes affected the first and last five years of each decade,as well as the entire decade,during the specified period.GIS was used to meet the objectives of this study.Arizona experienced 51–568 mm,67–560 mm,63–622 mm,and 52–590 mm of rainfall in the sixth,seventh,eighth,and ninth decades of the second millennium,respectively.Both the first and second five year periods of each decade showed accept-able rainfall amounts despite fluctuations.However,rainfall decreased in the first and second decades of the third millennium.and in the first two years of the third decade.Rainfall amounts dropped to 42–472 mm,55–469 mm,and 74–498 mm,respectively,indicating a downward trend in precipitation.The central part of the state received the highest rainfall,while the eastern and western regions(spanning north to south)had significantly less.Over the decades of the third millennium,the average annual rainfall every five years was relatively low,showing a declining trend due to severe climate changes,generally ranging between 35 mm and 498 mm.The central regions consistently received more rainfall than the eastern and western outskirts.Arizona is currently experiencing a decrease in rainfall due to climate change,a situation that could deterio-rate further.This highlights the need to optimize the use of existing rainfall and explore alternative water sources.展开更多
Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensi...Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensive applications in law enforcement and the commercial domain,and the rapid advancement of practical technologies.Despite the significant advancements,modern recognition algorithms still struggle in real-world conditions such as varying lighting conditions,occlusion,and diverse facial postures.In such scenarios,human perception is still well above the capabilities of present technology.Using the systematic mapping study,this paper presents an in-depth review of face detection algorithms and face recognition algorithms,presenting a detailed survey of advancements made between 2015 and 2024.We analyze key methodologies,highlighting their strengths and restrictions in the application context.Additionally,we examine various datasets used for face detection/recognition datasets focusing on the task-specific applications,size,diversity,and complexity.By analyzing these algorithms and datasets,this survey works as a valuable resource for researchers,identifying the research gap in the field of face detection and recognition and outlining potential directions for future research.展开更多
The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficu...The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns.The article also discusses a potential path of AI development in the engineering domain.Based on the existing understanding of the principles of multilevel com-plexity,this article suggests that consistency among the logical structures of datasets,AI models,model-building software,and hardware will be an important AI development direction and is worthy of careful consideration.展开更多
Inferring phylogenetic trees from molecular sequences is a cornerstone of evolutionary biology.Many standard phylogenetic methods(such as maximum-likelihood[ML])rely on explicit models of sequence evolution and thus o...Inferring phylogenetic trees from molecular sequences is a cornerstone of evolutionary biology.Many standard phylogenetic methods(such as maximum-likelihood[ML])rely on explicit models of sequence evolution and thus often suffer from model misspecification or inadequacy.The on-rising deep learning(DL)techniques offer a powerful alternative.Deep learning employs multi-layered artificial neural networks to progressively transform input data into more abstract and complex representations.DL methods can autonomously uncover meaningful patterns from data,thereby bypassing potential biases introduced by predefined features(Franklin,2005;Murphy,2012).Recent efforts have aimed to apply deep neural networks(DNNs)to phylogenetics,with a growing number of applications in tree reconstruction(Suvorov et al.,2020;Zou et al.,2020;Nesterenko et al.,2022;Smith and Hahn,2023;Wang et al.,2023),substitution model selection(Abadi et al.,2020;Burgstaller-Muehlbacher et al.,2023),and diversification rate inference(Voznica et al.,2022;Lajaaiti et al.,2023;Lambert et al.,2023).In phylogenetic tree reconstruction,PhyDL(Zou et al.,2020)and Tree_learning(Suvorov et al.,2020)are two notable DNN-based programs designed to infer unrooted quartet trees directly from alignments of four amino acid(AA)and DNA sequences,respectively.展开更多
This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,...This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,China.Based on randomly generated 40 NTDs,the study developed models for the geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performances using the area under the receiver operating characteristic curve(AUC).Specifically,the means and standard deviations of the AUC values from all models were then utilized to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment,as well as the uncertainty introduced by the NTDs.A risk and return methodology was thus employed to quantify and mitigate the uncertainty,with log odds ratios used to characterize the susceptibility assessment levels.The risk and return values were calculated based on the standard deviations and means of the log odds ratios of various locations.After the mean log odds ratios were converted into probability values,the final susceptibility map was plotted,which accounts for the uncertainty induced by random NTDs.The results indicate that the AUC values of the models ranged from 0.810 to 0.963,with an average of 0.852 and a standard deviation of 0.035,indicating encouraging prediction effects and certain uncertainty.The risk and return analysis reveals that low-risk and high-return areas suggest lower standard deviations and higher means across multiple model-derived assessments.Overall,this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models,aimed at improving their robustness and reliability.Additionally,by identifying low-risk and high-return areas,resource allocation for geologic hazard prevention and control can be optimized,thus ensuring that limited resources are directed toward the most effective prevention and control measures.展开更多
This paper proposed a method to generate semi-experimental biomedical datasets based on full-wave simulation software.The system noise such as antenna port couplings is fully considered in the proposed datasets,which ...This paper proposed a method to generate semi-experimental biomedical datasets based on full-wave simulation software.The system noise such as antenna port couplings is fully considered in the proposed datasets,which is more realistic than synthetical datasets.In this paper,datasets containing different shapes are constructed based on the relative permittivities of human tissues.Then,a back-propagation scheme is used to obtain the rough reconstructions,which will be fed into a U-net convolutional neural network(CNN)to recover the high-resolution images.Numerical results show that the network trained on the datasets generated by the proposed method can obtain satisfying reconstruction results and is promising to be applied in real-time biomedical imaging.展开更多
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to...This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.展开更多
Scientific knowledge on the chemical compositions of fine particulate matter(PM_(2.5)) is essential for properly assessing its health and climate effects,and for decisionmakers to develop efficient mitigation strategi...Scientific knowledge on the chemical compositions of fine particulate matter(PM_(2.5)) is essential for properly assessing its health and climate effects,and for decisionmakers to develop efficient mitigation strategies.A high-resolution PM_(2.5) chemical composition dataset(CAQRA-aerosol)is developed in this study,which provides hourly maps of organic carbon,black carbon,ammonium,nitrate,and sulfate in China from 2013 to 2020 with a horizontal resolution of 15 km.This paper describes the method,access,and validation results of this dataset.It shows that CAQRA-aerosol has good consistency with observations and achieves higher or comparable accuracy with previous PM_(2.5) composition datasets.Based on CAQRA-aerosol,spatiotemporal changes of different PM_(2.5) compositions were investigated from a national viewpoint,which emphasizes different changes of nitrate from other compositions.The estimated annual rate of population-weighted concentrations of nitrate is 0.23μg m^(−3)yr^(−1) from 2015 to 2020,compared with−0.19 to−1.1μg m^(−3)yr^(−1) for other compositions.The whole dataset is freely available from the China Air Pollution Data Center(https://doi.org/10.12423/capdb_PKU.2023.DA).展开更多
Onemust interact with a specific webpage or website in order to use the Internet for communication,teamwork,and other productive activities.However,because phishing websites look benign and not all website visitors ha...Onemust interact with a specific webpage or website in order to use the Internet for communication,teamwork,and other productive activities.However,because phishing websites look benign and not all website visitors have the same knowledge and skills to inspect the trustworthiness of visited websites,they are tricked into disclosing sensitive information and making them vulnerable to malicious software attacks like ransomware.It is impossible to stop attackers fromcreating phishingwebsites,which is one of the core challenges in combating them.However,this threat can be alleviated by detecting a specific website as phishing and alerting online users to take the necessary precautions before handing over sensitive information.In this study,five machine learning(ML)and DL algorithms—cat-boost(CATB),gradient boost(GB),random forest(RF),multilayer perceptron(MLP),and deep neural network(DNN)—were tested with three different reputable datasets and two useful feature selection techniques,to assess the scalability and consistency of each classifier’s performance on varied dataset sizes.The experimental findings reveal that the CATB classifier achieved the best accuracy across all datasets(DS-1,DS-2,and DS-3)with respective values of 97.9%,95.73%,and 98.83%.The GB classifier achieved the second-best accuracy across all datasets(DS-1,DS-2,and DS-3)with respective values of 97.16%,95.18%,and 98.58%.MLP achieved the best computational time across all datasets(DS-1,DS-2,and DS-3)with respective values of 2,7,and 3 seconds despite scoring the lowest accuracy across all datasets.展开更多
The increasing adoption of Industrial Internet of Things(IIoT)systems in smart manufacturing is leading to raise cyberattack numbers and pressing the requirement for intrusion detection systems(IDS)to be effective.How...The increasing adoption of Industrial Internet of Things(IIoT)systems in smart manufacturing is leading to raise cyberattack numbers and pressing the requirement for intrusion detection systems(IDS)to be effective.However,existing datasets for IDS training often lack relevance to modern IIoT environments,limiting their applicability for research and development.To address the latter gap,this paper introduces the HiTar-2024 dataset specifically designed for IIoT systems.As a consequence,that can be used by an IDS to detect imminent threats.Likewise,HiTar-2024 was generated using the AREZZO simulator,which replicates realistic smart manufacturing scenarios.The generated dataset includes five distinct classes:Normal,Probing,Remote to Local(R2L),User to Root(U2R),and Denial of Service(DoS).Furthermore,comprehensive experiments with popular Machine Learning(ML)models using various classifiers,including BayesNet,Logistic,IBK,Multiclass,PART,and J48 demonstrate high accuracy,precision,recall,and F1-scores,exceeding 0.99 across all ML metrics.The latter result is reached thanks to the rigorous applied process to achieve this quite good result,including data pre-processing,features extraction,fixing the class imbalance problem,and using a test option for model robustness.This comprehensive approach emphasizes meticulous dataset construction through a complete dataset generation process,a careful labelling algorithm,and a sophisticated evaluation method,providing valuable insights to reinforce IIoT system security.Finally,the HiTar-2024 dataset is compared with other similar datasets in the literature,considering several factors such as data format,feature extraction tools,number of features,attack categories,number of instances,and ML metrics.展开更多
In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceana...In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceanatmosphere models,which exhibit varying levels of complexity.This nonlinear approach demonstrated extraordinary superiority and effectiveness in constructing ENSO MME.Subsequently,we employed the leave-one-out crossvalidation and the moving base methods to further validate the robustness of the neural network model in the formulation of ENSO MME.In conclusion,the neural network algorithm outperforms the conventional approach of assigning a uniform weight to all models.This is evidenced by an enhancement in correlation coefficients and reduction in prediction errors,which have the potential to provide a more accurate ENSO forecast.展开更多
Knowing the influence of the size of datasets for regression models can help in improving the accuracy of a solar power forecast and make the most out of renewable energy systems.This research explores the influence o...Knowing the influence of the size of datasets for regression models can help in improving the accuracy of a solar power forecast and make the most out of renewable energy systems.This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction,contributing to better forecasting methods.The study analyzes data from two solar panels,aSiMicro03036 and aSiTandem72-46,over 7,14,17,21,28,and 38 days,with each dataset comprising five independent and one dependent parameter,and split 80–20 for training and testing.Results indicate that Random Forest consistently outperforms other models,achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error(MAE)of 2.0544 on the aSiTandem72-46 panel with 21 days of data.For the aSiMicro03036 panel,the best MAE of 4.2978 was reached using the k-Nearest Neighbor(k-NN)algorithm,which was set up as instance-based k-Nearest neighbors(IBk)in Weka after being trained on 17 days of data.Regression performance for most models(excluding IBk)stabilizes at 14 days or more.Compared to the 7-day dataset,increasing to 21 days reduced the MAE by around 20%and improved correlation coefficients by around 2.1%,highlighting the value of moderate dataset expansion.These findings suggest that datasets spanning 17 to 21 days,with 80%used for training,can significantly enhance the predictive accuracy of solar power generation models.展开更多
Lithology identification is a critical aspect of geoenergy exploration,including geothermal energy development,gas hydrate extraction,and gas storage.In recent years,artificial intelligence techniques based on drill c...Lithology identification is a critical aspect of geoenergy exploration,including geothermal energy development,gas hydrate extraction,and gas storage.In recent years,artificial intelligence techniques based on drill core images have made significant strides in lithology identification,achieving high accuracy.However,the current demand for advanced lithology identification models remains unmet due to the lack of high-quality drill core image datasets.This study successfully constructs and publicly releases the first open-source Drill Core Image Dataset(DCID),addressing the need for large-scale,high-quality datasets in lithology characterization tasks within geological engineering and establishing a standard dataset for model evaluation.DCID consists of 35 lithology categories and a total of 98,000 high-resolution images(512×512 pixels),making it the most comprehensive drill core image dataset in terms of lithology categories,image quantity,and resolution.This study also provides lithology identification accuracy benchmarks for popular convolutional neural networks(CNNs)such as VGG,ResNet,DenseNet,MobileNet,as well as for the Vision Transformer(ViT)and MLP-Mixer,based on DCID.Additionally,the sensitivity of model performance to various parameters and image resolution is evaluated.In response to real-world challenges,we propose a real-world data augmentation(RWDA)method,leveraging slightly defective images from DCID to enhance model robustness.The study also explores the impact of real-world lighting conditions on the performance of lithology identification models.Finally,we demonstrate how to rapidly evaluate model performance across multiple dimensions using low-resolution datasets,advancing the application and development of new lithology identification models for geoenergy exploration.展开更多
Deep neural networks provide accurate results for most applications.However,they need a big dataset to train properly.Providing a big dataset is a significant challenge in most applications.Image augmentation refers t...Deep neural networks provide accurate results for most applications.However,they need a big dataset to train properly.Providing a big dataset is a significant challenge in most applications.Image augmentation refers to techniques that increase the amount of image data.Common operations for image augmentation include changes in illumination,rotation,contrast,size,viewing angle,and others.Recently,Generative Adversarial Networks(GANs)have been employed for image generation.However,like image augmentation methods,GAN approaches can only generate images that are similar to the original images.Therefore,they also cannot generate new classes of data.Texture images presentmore challenges than general images,and generating textures is more complex than creating other types of images.This study proposes a gradient-based deep neural network method that generates a new class of texture.It is possible to rapidly generate new classes of textures using different kernels from pre-trained deep networks.After generating new textures for each class,the number of textures increases through image augmentation.During this process,several techniques are proposed to automatically remove incomplete and similar textures that are created.The proposed method is faster than some well-known generative networks by around 4 to 10 times.In addition,the quality of the generated textures surpasses that of these networks.The proposed method can generate textures that surpass those of someGANs and parametric models in certain image qualitymetrics.It can provide a big texture dataset to train deep networks.A new big texture dataset is created artificially using the proposed method.This dataset is approximately 2 GB in size and comprises 30,000 textures,each 150×150 pixels in size,organized into 600 classes.It is uploaded to the Kaggle site and Google Drive.This dataset is called BigTex.Compared to other texture datasets,the proposed dataset is the largest and can serve as a comprehensive texture dataset for training more powerful deep neural networks and mitigating overfitting.展开更多
The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These con...The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These concerns have spurred a growing demand for dataset copyright auditing techniques,which aim to detect and verify potential infringements in the training data of commercial AI systems.This paper presents a survey of existing auditing solutions,categorizing them across key dimensions:data modality,model training stage,data overlap scenarios,and model access levels.We highlight major trends,including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training.Through an in-depth analysis of 12 representative works,we extract four key observations that reveal the limitations of current methods.Furthermore,we identify three open challenges and propose future directions for robust,multimodal,and scalable auditing solutions.Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.展开更多
The source region of the Yellow River, accounting for over 38% of its total runoff, is a critical catchment area,primarily characterized by alpine grasslands. In 2005, the Maqu land surface processes observational sit...The source region of the Yellow River, accounting for over 38% of its total runoff, is a critical catchment area,primarily characterized by alpine grasslands. In 2005, the Maqu land surface processes observational site was established to monitor climate, land surface dynamics, and hydrological variability in this region. Over a 10-year period(2010–19), an extensive observational dataset was compiled, now available to the scientific community. This dataset includes comprehensive details on site characteristics, instrumentation, and data processing methods, covering meteorological and radiative fluxes, energy exchanges, soil moisture dynamics, and heat transfer properties. The dataset is particularly valuable for researchers studying land surface processes, land–atmosphere interactions, and climate modeling, and may also benefit ecological, hydrological, and water resource studies. The report ends with a discussion on perspectives and challenges of continued observational monitoring in this region, focusing on issues such as cryosphere influences, complex topography,and ecological changes like the encroachment of weeds and scrubland.展开更多
Land use/cover change is an important parameter in the climate and ecological simulations. Although they had been widely used in the community, SAGE dataset and HYDE dataset, the two representative global historical l...Land use/cover change is an important parameter in the climate and ecological simulations. Although they had been widely used in the community, SAGE dataset and HYDE dataset, the two representative global historical land use datasets, were little assessed about their accuracies in regional scale. Here, we carried out some assessments for the traditional cultivated region of China (TCRC) over last 300 years, by comparing SAGE2010 and HYDE (v3.1) with Chinese Historical Cropland Dataset (CHCD). The comparisons were performed at three spatial scales: entire study area, provincial area and 60 km by 60 km grid cell. The results show that (1) the cropland area from SAGE2010 was much more than that from CHCD moreover, the growth at a rate of 0.51% from 1700 to 1950 and -0.34% after 1950 were also inconsistent with that from CHCD. (2) HYDE dataset (v3.1) was closer to CHCD dataset than SAGE dataset on entire study area. However, the large biases could be detected at provincial scale and 60 km by 60 km grid cell scale. The percent of grid cells having biases greater than 70% (〈-70% or 〉70%) and 90% (〈-90% or 〉90%) accounted for 56%-63% and 40%-45% of the total grid cells respectively while those having biases range from -10% to 10% and from -30% to 30% account for only 5%-6% and 17% of the total grid cells respectively. (3) Using local historical archives to reconstruct historical dataset with high accuracy would be a valu- able way to improve the accuracy of climate and ecological simulation.展开更多
We analyzed the spatial local accuracy of land cover (LC) datasets for the Qiangtang Plateau,High Asia,incorporating 923 field sampling points and seven LC compilations including the International Geosphere Biosphere ...We analyzed the spatial local accuracy of land cover (LC) datasets for the Qiangtang Plateau,High Asia,incorporating 923 field sampling points and seven LC compilations including the International Geosphere Biosphere Programme Data and Information System (IGBPDIS),Global Land cover mapping at 30 m resolution (GlobeLand30),MODIS Land Cover Type product (MCD12Q1),Climate Change Initiative Land Cover (CCI-LC),Global Land Cover 2000 (GLC2000),University of Maryland (UMD),and GlobCover 2009 (Glob- Cover).We initially compared resultant similarities and differences in both area and spatial patterns and analyzed inherent relationships with data sources.We then applied a geographically weighted regression (GWR) approach to predict local accuracy variation.The results of this study reveal that distinct differences,even inverse time series trends,in LC data between CCI-LC and MCD12Q1 were present between 2001 and 2015,with the exception of category areal discordance between the seven datasets.We also show a series of evident discrepancies amongst the LC datasets sampled here in terms of spatial patterns,that is,high spatial congruence is mainly seen in the homogeneous southeastern region of the study area while a low degree of spatial congruence is widely distributed across heterogeneous northwestern and northeastern regions.The overall combined spatial accuracy of the seven LC datasets considered here is less than 70%,and the GlobeLand30 and CCI-LC datasets exhibit higher local accuracy than their counterparts,yielding maximum overall accuracy (OA) values of 77.39% and 61.43%,respectively.Finally,5.63% of this area is characterized by both high assessment and accuracy (HH) values,mainly located in central and eastern regions of the Qiangtang Plateau,while most low accuracy regions are found in northern,northeastern,and western regions.展开更多
基金supported by the Natural Science Basic Research Program of Shaanxi(Program No.2024JC-YBMS-026).
文摘When dealing with imbalanced datasets,the traditional support vectormachine(SVM)tends to produce a classification hyperplane that is biased towards the majority class,which exhibits poor robustness.This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets.The proposed method first uses a biased second-order cone programming support vectormachine(B-SOCP-SVM)to identify the support vectors(SVs)and non-support vectors(NSVs)in the imbalanced data.Then,it applies the synthetic minority over-sampling technique(SV-SMOTE)to oversample the support vectors of the minority class and uses the random under-sampling technique(NSV-RUS)multiple times to undersample the non-support vectors of the majority class.Combining the above-obtained minority class data set withmultiple majority class datasets can obtainmultiple new balanced data sets.Finally,SOCP-SVM is used to classify each data set,and the final result is obtained through the integrated algorithm.Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
基金funded by A’Sharqiyah University,Sultanate of Oman,under Research Project grant number(BFP/RGP/ICT/22/490).
文摘Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks,sunglasses,and other obstructions.Addressing this issue is crucial for applications such as surveillance,biometric authentication,and human-computer interaction.This paper provides a comprehensive review of face detection techniques developed to handle occluded faces.Studies are categorized into four main approaches:feature-based,machine learning-based,deep learning-based,and hybrid methods.We analyzed state-of-the-art studies within each category,examining their methodologies,strengths,and limitations based on widely used benchmark datasets,highlighting their adaptability to partial and severe occlusions.The review also identifies key challenges,including dataset diversity,model generalization,and computational efficiency.Our findings reveal that deep learning methods dominate recent studies,benefiting from their ability to extract hierarchical features and handle complex occlusion patterns.More recently,researchers have increasingly explored Transformer-based architectures,such as Vision Transformer(ViT)and Swin Transformer,to further improve detection robustness under challenging occlusion scenarios.In addition,hybrid approaches,which aim to combine traditional andmodern techniques,are emerging as a promising direction for improving robustness.This review provides valuable insights for researchers aiming to develop more robust face detection systems and for practitioners seeking to deploy reliable solutions in real-world,occlusionprone environments.Further improvements and the proposal of broader datasets are required to developmore scalable,robust,and efficient models that can handle complex occlusions in real-world scenarios.
文摘Climate change significantly affects environment,ecosystems,communities,and economies.These impacts often result in quick and gradual changes in water resources,environmental conditions,and weather patterns.A geographical study was conducted in Arizona State,USA,to examine monthly precipi-tation concentration rates over time.This analysis used a high-resolution 0.50×0.50 grid for monthly precip-itation data from 1961 to 2022,Provided by the Climatic Research Unit.The study aimed to analyze climatic changes affected the first and last five years of each decade,as well as the entire decade,during the specified period.GIS was used to meet the objectives of this study.Arizona experienced 51–568 mm,67–560 mm,63–622 mm,and 52–590 mm of rainfall in the sixth,seventh,eighth,and ninth decades of the second millennium,respectively.Both the first and second five year periods of each decade showed accept-able rainfall amounts despite fluctuations.However,rainfall decreased in the first and second decades of the third millennium.and in the first two years of the third decade.Rainfall amounts dropped to 42–472 mm,55–469 mm,and 74–498 mm,respectively,indicating a downward trend in precipitation.The central part of the state received the highest rainfall,while the eastern and western regions(spanning north to south)had significantly less.Over the decades of the third millennium,the average annual rainfall every five years was relatively low,showing a declining trend due to severe climate changes,generally ranging between 35 mm and 498 mm.The central regions consistently received more rainfall than the eastern and western outskirts.Arizona is currently experiencing a decrease in rainfall due to climate change,a situation that could deterio-rate further.This highlights the need to optimize the use of existing rainfall and explore alternative water sources.
文摘Face recognition has emerged as one of the most prominent applications of image analysis and under-standing,gaining considerable attention in recent years.This growing interest is driven by two key factors:its extensive applications in law enforcement and the commercial domain,and the rapid advancement of practical technologies.Despite the significant advancements,modern recognition algorithms still struggle in real-world conditions such as varying lighting conditions,occlusion,and diverse facial postures.In such scenarios,human perception is still well above the capabilities of present technology.Using the systematic mapping study,this paper presents an in-depth review of face detection algorithms and face recognition algorithms,presenting a detailed survey of advancements made between 2015 and 2024.We analyze key methodologies,highlighting their strengths and restrictions in the application context.Additionally,we examine various datasets used for face detection/recognition datasets focusing on the task-specific applications,size,diversity,and complexity.By analyzing these algorithms and datasets,this survey works as a valuable resource for researchers,identifying the research gap in the field of face detection and recognition and outlining potential directions for future research.
文摘The aim of this article is to explore potential directions for the development of artificial intelligence(AI).It points out that,while current AI can handle the statistical properties of complex systems,it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns.The article also discusses a potential path of AI development in the engineering domain.Based on the existing understanding of the principles of multilevel com-plexity,this article suggests that consistency among the logical structures of datasets,AI models,model-building software,and hardware will be an important AI development direction and is worthy of careful consideration.
基金supported by the National Key R&D Program of China(2022YFD1401600)the National Science Foundation for Distinguished Young Scholars of Zhejang Province,China(LR23C140001)supported by the Key Area Research and Development Program of Guangdong Province,China(2018B020205003 and 2020B0202090001).
文摘Inferring phylogenetic trees from molecular sequences is a cornerstone of evolutionary biology.Many standard phylogenetic methods(such as maximum-likelihood[ML])rely on explicit models of sequence evolution and thus often suffer from model misspecification or inadequacy.The on-rising deep learning(DL)techniques offer a powerful alternative.Deep learning employs multi-layered artificial neural networks to progressively transform input data into more abstract and complex representations.DL methods can autonomously uncover meaningful patterns from data,thereby bypassing potential biases introduced by predefined features(Franklin,2005;Murphy,2012).Recent efforts have aimed to apply deep neural networks(DNNs)to phylogenetics,with a growing number of applications in tree reconstruction(Suvorov et al.,2020;Zou et al.,2020;Nesterenko et al.,2022;Smith and Hahn,2023;Wang et al.,2023),substitution model selection(Abadi et al.,2020;Burgstaller-Muehlbacher et al.,2023),and diversification rate inference(Voznica et al.,2022;Lajaaiti et al.,2023;Lambert et al.,2023).In phylogenetic tree reconstruction,PhyDL(Zou et al.,2020)and Tree_learning(Suvorov et al.,2020)are two notable DNN-based programs designed to infer unrooted quartet trees directly from alignments of four amino acid(AA)and DNA sequences,respectively.
基金supported by a project entitled Loess Plateau Region-Watershed-Slope Geological Hazard Multi-Scale Collaborative Intelligent Early Warning System of the National Key R&D Program of China(2022YFC3003404)a project of the Shaanxi Youth Science and Technology Star(2021KJXX-87)public welfare geological survey projects of Shaanxi Institute of Geologic Survey(20180301,201918,202103,and 202413).
文摘This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,China.Based on randomly generated 40 NTDs,the study developed models for the geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performances using the area under the receiver operating characteristic curve(AUC).Specifically,the means and standard deviations of the AUC values from all models were then utilized to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment,as well as the uncertainty introduced by the NTDs.A risk and return methodology was thus employed to quantify and mitigate the uncertainty,with log odds ratios used to characterize the susceptibility assessment levels.The risk and return values were calculated based on the standard deviations and means of the log odds ratios of various locations.After the mean log odds ratios were converted into probability values,the final susceptibility map was plotted,which accounts for the uncertainty induced by random NTDs.The results indicate that the AUC values of the models ranged from 0.810 to 0.963,with an average of 0.852 and a standard deviation of 0.035,indicating encouraging prediction effects and certain uncertainty.The risk and return analysis reveals that low-risk and high-return areas suggest lower standard deviations and higher means across multiple model-derived assessments.Overall,this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models,aimed at improving their robustness and reliability.Additionally,by identifying low-risk and high-return areas,resource allocation for geologic hazard prevention and control can be optimized,thus ensuring that limited resources are directed toward the most effective prevention and control measures.
基金National Natural Science Foundation of China(No.61971036)Fundamental Research Funds for the Central Universities(No.2023CX01011)Beijing Nova Program(No.20230484361)。
文摘This paper proposed a method to generate semi-experimental biomedical datasets based on full-wave simulation software.The system noise such as antenna port couplings is fully considered in the proposed datasets,which is more realistic than synthetical datasets.In this paper,datasets containing different shapes are constructed based on the relative permittivities of human tissues.Then,a back-propagation scheme is used to obtain the rough reconstructions,which will be fed into a U-net convolutional neural network(CNN)to recover the high-resolution images.Numerical results show that the network trained on the datasets generated by the proposed method can obtain satisfying reconstruction results and is promising to be applied in real-time biomedical imaging.
文摘This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.
基金support from the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (Earth Lab)sponsored by the National Natural Science Foundation of China (Grant Nos. 42175132, 92044303, and 42205119)+2 种基金the National Key R&D Program (Grant Nos. 2020YFA0607802 and 2022YFC3703003)the CAS Information Technology Program (Grant No. CAS-WX2021SF-0107-02)the fellowship of China Postdoctoral Science Foundation (Grant No. 2022M723093)
文摘Scientific knowledge on the chemical compositions of fine particulate matter(PM_(2.5)) is essential for properly assessing its health and climate effects,and for decisionmakers to develop efficient mitigation strategies.A high-resolution PM_(2.5) chemical composition dataset(CAQRA-aerosol)is developed in this study,which provides hourly maps of organic carbon,black carbon,ammonium,nitrate,and sulfate in China from 2013 to 2020 with a horizontal resolution of 15 km.This paper describes the method,access,and validation results of this dataset.It shows that CAQRA-aerosol has good consistency with observations and achieves higher or comparable accuracy with previous PM_(2.5) composition datasets.Based on CAQRA-aerosol,spatiotemporal changes of different PM_(2.5) compositions were investigated from a national viewpoint,which emphasizes different changes of nitrate from other compositions.The estimated annual rate of population-weighted concentrations of nitrate is 0.23μg m^(−3)yr^(−1) from 2015 to 2020,compared with−0.19 to−1.1μg m^(−3)yr^(−1) for other compositions.The whole dataset is freely available from the China Air Pollution Data Center(https://doi.org/10.12423/capdb_PKU.2023.DA).
文摘Onemust interact with a specific webpage or website in order to use the Internet for communication,teamwork,and other productive activities.However,because phishing websites look benign and not all website visitors have the same knowledge and skills to inspect the trustworthiness of visited websites,they are tricked into disclosing sensitive information and making them vulnerable to malicious software attacks like ransomware.It is impossible to stop attackers fromcreating phishingwebsites,which is one of the core challenges in combating them.However,this threat can be alleviated by detecting a specific website as phishing and alerting online users to take the necessary precautions before handing over sensitive information.In this study,five machine learning(ML)and DL algorithms—cat-boost(CATB),gradient boost(GB),random forest(RF),multilayer perceptron(MLP),and deep neural network(DNN)—were tested with three different reputable datasets and two useful feature selection techniques,to assess the scalability and consistency of each classifier’s performance on varied dataset sizes.The experimental findings reveal that the CATB classifier achieved the best accuracy across all datasets(DS-1,DS-2,and DS-3)with respective values of 97.9%,95.73%,and 98.83%.The GB classifier achieved the second-best accuracy across all datasets(DS-1,DS-2,and DS-3)with respective values of 97.16%,95.18%,and 98.58%.MLP achieved the best computational time across all datasets(DS-1,DS-2,and DS-3)with respective values of 2,7,and 3 seconds despite scoring the lowest accuracy across all datasets.
文摘The increasing adoption of Industrial Internet of Things(IIoT)systems in smart manufacturing is leading to raise cyberattack numbers and pressing the requirement for intrusion detection systems(IDS)to be effective.However,existing datasets for IDS training often lack relevance to modern IIoT environments,limiting their applicability for research and development.To address the latter gap,this paper introduces the HiTar-2024 dataset specifically designed for IIoT systems.As a consequence,that can be used by an IDS to detect imminent threats.Likewise,HiTar-2024 was generated using the AREZZO simulator,which replicates realistic smart manufacturing scenarios.The generated dataset includes five distinct classes:Normal,Probing,Remote to Local(R2L),User to Root(U2R),and Denial of Service(DoS).Furthermore,comprehensive experiments with popular Machine Learning(ML)models using various classifiers,including BayesNet,Logistic,IBK,Multiclass,PART,and J48 demonstrate high accuracy,precision,recall,and F1-scores,exceeding 0.99 across all ML metrics.The latter result is reached thanks to the rigorous applied process to achieve this quite good result,including data pre-processing,features extraction,fixing the class imbalance problem,and using a test option for model robustness.This comprehensive approach emphasizes meticulous dataset construction through a complete dataset generation process,a careful labelling algorithm,and a sophisticated evaluation method,providing valuable insights to reinforce IIoT system security.Finally,the HiTar-2024 dataset is compared with other similar datasets in the literature,considering several factors such as data format,feature extraction tools,number of features,attack categories,number of instances,and ML metrics.
基金The fund from Southern Marine Science and Engineering Guangdong Laboratory(Zhuhai)under contract No.SML2021SP310the National Natural Science Foundation of China under contract Nos 42227901 and 42475061the Key R&D Program of Zhejiang Province under contract No.2024C03257.
文摘In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceanatmosphere models,which exhibit varying levels of complexity.This nonlinear approach demonstrated extraordinary superiority and effectiveness in constructing ENSO MME.Subsequently,we employed the leave-one-out crossvalidation and the moving base methods to further validate the robustness of the neural network model in the formulation of ENSO MME.In conclusion,the neural network algorithm outperforms the conventional approach of assigning a uniform weight to all models.This is evidenced by an enhancement in correlation coefficients and reduction in prediction errors,which have the potential to provide a more accurate ENSO forecast.
文摘Knowing the influence of the size of datasets for regression models can help in improving the accuracy of a solar power forecast and make the most out of renewable energy systems.This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction,contributing to better forecasting methods.The study analyzes data from two solar panels,aSiMicro03036 and aSiTandem72-46,over 7,14,17,21,28,and 38 days,with each dataset comprising five independent and one dependent parameter,and split 80–20 for training and testing.Results indicate that Random Forest consistently outperforms other models,achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error(MAE)of 2.0544 on the aSiTandem72-46 panel with 21 days of data.For the aSiMicro03036 panel,the best MAE of 4.2978 was reached using the k-Nearest Neighbor(k-NN)algorithm,which was set up as instance-based k-Nearest neighbors(IBk)in Weka after being trained on 17 days of data.Regression performance for most models(excluding IBk)stabilizes at 14 days or more.Compared to the 7-day dataset,increasing to 21 days reduced the MAE by around 20%and improved correlation coefficients by around 2.1%,highlighting the value of moderate dataset expansion.These findings suggest that datasets spanning 17 to 21 days,with 80%used for training,can significantly enhance the predictive accuracy of solar power generation models.
基金support from the National Natural Science Foundation of China(Nos.U24B2034,U2139204)the China Petroleum Science and Technology Innovation Fund(2021DQ02-0501)the Science and Technology Support Project of Langfang(2024011073).
文摘Lithology identification is a critical aspect of geoenergy exploration,including geothermal energy development,gas hydrate extraction,and gas storage.In recent years,artificial intelligence techniques based on drill core images have made significant strides in lithology identification,achieving high accuracy.However,the current demand for advanced lithology identification models remains unmet due to the lack of high-quality drill core image datasets.This study successfully constructs and publicly releases the first open-source Drill Core Image Dataset(DCID),addressing the need for large-scale,high-quality datasets in lithology characterization tasks within geological engineering and establishing a standard dataset for model evaluation.DCID consists of 35 lithology categories and a total of 98,000 high-resolution images(512×512 pixels),making it the most comprehensive drill core image dataset in terms of lithology categories,image quantity,and resolution.This study also provides lithology identification accuracy benchmarks for popular convolutional neural networks(CNNs)such as VGG,ResNet,DenseNet,MobileNet,as well as for the Vision Transformer(ViT)and MLP-Mixer,based on DCID.Additionally,the sensitivity of model performance to various parameters and image resolution is evaluated.In response to real-world challenges,we propose a real-world data augmentation(RWDA)method,leveraging slightly defective images from DCID to enhance model robustness.The study also explores the impact of real-world lighting conditions on the performance of lithology identification models.Finally,we demonstrate how to rapidly evaluate model performance across multiple dimensions using low-resolution datasets,advancing the application and development of new lithology identification models for geoenergy exploration.
基金supported via funding from Prince Sattam bin Abdulaziz University(PSAU/2025/R/1446)Princess Nourah bint Abdulrahman University(PNURSP2025R300)Prince Sultan University.
文摘Deep neural networks provide accurate results for most applications.However,they need a big dataset to train properly.Providing a big dataset is a significant challenge in most applications.Image augmentation refers to techniques that increase the amount of image data.Common operations for image augmentation include changes in illumination,rotation,contrast,size,viewing angle,and others.Recently,Generative Adversarial Networks(GANs)have been employed for image generation.However,like image augmentation methods,GAN approaches can only generate images that are similar to the original images.Therefore,they also cannot generate new classes of data.Texture images presentmore challenges than general images,and generating textures is more complex than creating other types of images.This study proposes a gradient-based deep neural network method that generates a new class of texture.It is possible to rapidly generate new classes of textures using different kernels from pre-trained deep networks.After generating new textures for each class,the number of textures increases through image augmentation.During this process,several techniques are proposed to automatically remove incomplete and similar textures that are created.The proposed method is faster than some well-known generative networks by around 4 to 10 times.In addition,the quality of the generated textures surpasses that of these networks.The proposed method can generate textures that surpass those of someGANs and parametric models in certain image qualitymetrics.It can provide a big texture dataset to train deep networks.A new big texture dataset is created artificially using the proposed method.This dataset is approximately 2 GB in size and comprises 30,000 textures,each 150×150 pixels in size,organized into 600 classes.It is uploaded to the Kaggle site and Google Drive.This dataset is called BigTex.Compared to other texture datasets,the proposed dataset is the largest and can serve as a comprehensive texture dataset for training more powerful deep neural networks and mitigating overfitting.
基金supported in part by NSFC under Grant Nos.62402379,U22A2029 and U24A20237.
文摘The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These concerns have spurred a growing demand for dataset copyright auditing techniques,which aim to detect and verify potential infringements in the training data of commercial AI systems.This paper presents a survey of existing auditing solutions,categorizing them across key dimensions:data modality,model training stage,data overlap scenarios,and model access levels.We highlight major trends,including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training.Through an in-depth analysis of 12 representative works,we extract four key observations that reveal the limitations of current methods.Furthermore,we identify three open challenges and propose future directions for robust,multimodal,and scalable auditing solutions.Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.
基金supported by the National Natural Science Foundation of China for Distinguished Young Scholars (Grant No.42325502)the 2nd Scientific Expedition to the Qinghai–Tibet Plateau (Grant No.2019QZKK0102)+3 种基金the West Light Foundation of the Chinese Academy of Sciences (Grant No.xbzg-zdsys-202215)the Science and Technology Research Plan of Gansu Province (Grant Nos.23JRRA654 and 20JR10RA070)the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No.QCH2019004)iLEAPS (integrated Land Ecosystem–Atmosphere Processes Study)。
文摘The source region of the Yellow River, accounting for over 38% of its total runoff, is a critical catchment area,primarily characterized by alpine grasslands. In 2005, the Maqu land surface processes observational site was established to monitor climate, land surface dynamics, and hydrological variability in this region. Over a 10-year period(2010–19), an extensive observational dataset was compiled, now available to the scientific community. This dataset includes comprehensive details on site characteristics, instrumentation, and data processing methods, covering meteorological and radiative fluxes, energy exchanges, soil moisture dynamics, and heat transfer properties. The dataset is particularly valuable for researchers studying land surface processes, land–atmosphere interactions, and climate modeling, and may also benefit ecological, hydrological, and water resource studies. The report ends with a discussion on perspectives and challenges of continued observational monitoring in this region, focusing on issues such as cryosphere influences, complex topography,and ecological changes like the encroachment of weeds and scrubland.
基金China Global Change Research Program, No.2010CB950901 National Natural Science Foundation of China, No.41271227 No.41001122
文摘Land use/cover change is an important parameter in the climate and ecological simulations. Although they had been widely used in the community, SAGE dataset and HYDE dataset, the two representative global historical land use datasets, were little assessed about their accuracies in regional scale. Here, we carried out some assessments for the traditional cultivated region of China (TCRC) over last 300 years, by comparing SAGE2010 and HYDE (v3.1) with Chinese Historical Cropland Dataset (CHCD). The comparisons were performed at three spatial scales: entire study area, provincial area and 60 km by 60 km grid cell. The results show that (1) the cropland area from SAGE2010 was much more than that from CHCD moreover, the growth at a rate of 0.51% from 1700 to 1950 and -0.34% after 1950 were also inconsistent with that from CHCD. (2) HYDE dataset (v3.1) was closer to CHCD dataset than SAGE dataset on entire study area. However, the large biases could be detected at provincial scale and 60 km by 60 km grid cell scale. The percent of grid cells having biases greater than 70% (〈-70% or 〉70%) and 90% (〈-90% or 〉90%) accounted for 56%-63% and 40%-45% of the total grid cells respectively while those having biases range from -10% to 10% and from -30% to 30% account for only 5%-6% and 17% of the total grid cells respectively. (3) Using local historical archives to reconstruct historical dataset with high accuracy would be a valu- able way to improve the accuracy of climate and ecological simulation.
基金The Strategic Priority Research Program of the Chinese Academy of Sciences,Nos.XDA20040200,XDB03030500Key Foundation Project of Basic Work of the Ministry of Science and Technology of China,No.2012FY111400National Key Technologies R&D Program,No.2012BC06B00
文摘We analyzed the spatial local accuracy of land cover (LC) datasets for the Qiangtang Plateau,High Asia,incorporating 923 field sampling points and seven LC compilations including the International Geosphere Biosphere Programme Data and Information System (IGBPDIS),Global Land cover mapping at 30 m resolution (GlobeLand30),MODIS Land Cover Type product (MCD12Q1),Climate Change Initiative Land Cover (CCI-LC),Global Land Cover 2000 (GLC2000),University of Maryland (UMD),and GlobCover 2009 (Glob- Cover).We initially compared resultant similarities and differences in both area and spatial patterns and analyzed inherent relationships with data sources.We then applied a geographically weighted regression (GWR) approach to predict local accuracy variation.The results of this study reveal that distinct differences,even inverse time series trends,in LC data between CCI-LC and MCD12Q1 were present between 2001 and 2015,with the exception of category areal discordance between the seven datasets.We also show a series of evident discrepancies amongst the LC datasets sampled here in terms of spatial patterns,that is,high spatial congruence is mainly seen in the homogeneous southeastern region of the study area while a low degree of spatial congruence is widely distributed across heterogeneous northwestern and northeastern regions.The overall combined spatial accuracy of the seven LC datasets considered here is less than 70%,and the GlobeLand30 and CCI-LC datasets exhibit higher local accuracy than their counterparts,yielding maximum overall accuracy (OA) values of 77.39% and 61.43%,respectively.Finally,5.63% of this area is characterized by both high assessment and accuracy (HH) values,mainly located in central and eastern regions of the Qiangtang Plateau,while most low accuracy regions are found in northern,northeastern,and western regions.