The main goal of this research is to assess the impact of race, age at diagnosis, sex, and phenotype on the incidence and survivability of acute lymphocytic leukemia (ALL) among patients in the United States. By taking these factors into account, the study aims to explore how existing cancer registry data can aid in the early detection and effective treatment of ALL. We hypothesized that race, age at diagnosis, sex, and phenotype are statistically significantly correlated with the incidence and survivability of ALL. Incidence and survival data were evaluated using the SEER*Stat statistical software from the National Cancer Institute. Analysis of the incidence data revealed a higher prevalence of ALL among the Caucasian population. The majority of ALL cases (59%) occurred in patients aged 0 to 19 years at diagnosis, and 56% of the affected individuals were male. Most ALL cases (73%) were of the B-cell phenotype. The survival data showed that 5-year survival rates slightly exceeded 10-year survival rates for the respective demographic groups. African American patients had the lowest survival rates compared with Caucasian, Asian, Pacific Islander, Alaskan Native, Native American, and other patients. Survival rates progressively decreased with increasing patient age. The study also examined the typical treatments applied to ALL patients, which mainly comprise chemotherapy, occasionally supplemented by radiation therapy as required. Chemotherapy considerably improved patients' chances of survival, whereas untreated patients faced a less favorable prognosis.
Although a significant amount of data and information already exists, this study can help doctors in the future to diagnose patients with certain characteristics. It will further assist health care professionals in screening potential patients and detecting cases early. This could also save the lives of elderly patients, who have a higher mortality rate from this disease.
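The abstract compares 5-year and 10-year survival across groups. SEER*Stat reports its own observed and relative survival statistics; as a generic illustration only, the sketch below uses the Kaplan-Meier estimator (a swapped-in technique, not necessarily what the study computed) on a hypothetical toy cohort. Because the estimate can only stay flat or fall over time, the 10-year value can never exceed the 5-year value for the same group.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimator.

    times  -- follow-up time in years for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


def survival_at(steps: List[Tuple[float, float]], horizon: float) -> float:
    """Survival probability at `horizon` years (right-continuous step function)."""
    value = 1.0
    for t, s in steps:
        if t > horizon:
            break
        value = s
    return value


# Hypothetical toy cohort: years of follow-up and death indicator.
times = [1, 3, 4, 6, 7, 8, 11, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]
steps = kaplan_meier(times, events)
five_year = survival_at(steps, 5)
ten_year = survival_at(steps, 10)
```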
To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a pre-defined dictionary and grammatical analysis, the question classifier draws both semantic and grammatical information into information retrieval and machine learning methods in the form of various training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun, among others. The QA query results are then re-ranked using the question class information. Experiments show that the classifier accurately classifies questions in real-world web data sets and that re-ranking clearly improves the QA results. The results show that applications such as QA built on real-world web data sets perform better when both semantic and grammatical information are used.
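The re-ranking step can be illustrated with a minimal sketch, assuming each candidate answer already carries a retrieval score and a predicted answer type. The class labels, the additive `class_boost` bonus, and the `Candidate` structure are hypothetical; the abstract does not specify how class information is combined with retrieval scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    retrieval_score: float  # score from the underlying IR/QA engine
    answer_type: str        # hypothetical type label, e.g. "DATE" or "LOCATION"


def rerank(candidates: List[Candidate], question_class: str,
           class_boost: float = 0.5) -> List[Candidate]:
    """Re-order candidates, rewarding those whose type matches the question class."""
    def score(c: Candidate) -> float:
        bonus = class_boost if c.answer_type == question_class else 0.0
        return c.retrieval_score + bonus

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("Beijing", 0.7, "LOCATION"),
    Candidate("in 1998", 0.6, "DATE"),
]
# For a question classified as asking for a date, the DATE candidate
# overtakes the higher-scored LOCATION candidate.
ranked_for_date = rerank(candidates, "DATE")
ranked_for_location = rerank(candidates, "LOCATION")
```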
The emergence of adversarial examples has revealed inadequacies in the robustness of image classification models based on Convolutional Neural Networks (CNNs). In recent years in particular, the discovery of natural adversarial examples has posed significant challenges, as traditional defenses against adversarial attacks have proven largely ineffective against them. This paper explores defenses against natural adversarial examples from three perspectives: the adversarial examples themselves, model architecture, and dataset. First, Class Activation Mapping (CAM) is used to visualize how models classify natural adversarial examples, identifying several typical attack patterns. Next, various common CNN models are analyzed to evaluate their susceptibility to these attacks, revealing that different architectures exhibit varying defensive capabilities; in particular, defenses against natural adversarial examples strengthen as network depth increases. Finally, the impact of the dataset's class distribution on defense capability is examined from two aspects: the number of classes in the training set and the number of predicted classes. Results indicate that reducing the number of training classes enhances the model's defense against natural adversarial examples. Additionally, for a fixed number of training classes, some CNN models show an optimal range of predicted classes that achieves the best defense performance against these examples.
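Class Activation Mapping, as originally formulated, applies to networks whose last convolutional layer feeds global average pooling and a fully connected classifier: the map for class c is the sum of the final feature maps weighted by that class's classifier weights. Below is a minimal NumPy sketch assuming that architecture; the toy feature maps and weights are hypothetical, and upsampling the map to input resolution is omitted. Clipping negatives and rescaling to [0, 1] is a common visualization choice, not part of the definition.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """Compute a CAM.

    feature_maps -- (K, H, W) activations of the last convolutional layer
    fc_weights   -- (num_classes, K) weights of the FC layer after global average pooling
    """
    w = fc_weights[class_idx]                    # (K,)
    cam = np.tensordot(w, feature_maps, axes=1)  # (H, W) = sum_k w_k * f_k(x, y)
    cam = np.maximum(cam, 0.0)                   # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 fires at the top-left, channel 1 at the bottom-right.
fmaps = np.zeros((2, 3, 3))
fmaps[0, 0, 0] = 1.0
fmaps[1, 2, 2] = 1.0
weights = np.array([[1.0, 0.0],    # class 0 relies on channel 0
                    [0.0, 2.0]])   # class 1 relies on channel 1
cam0 = class_activation_map(fmaps, weights, 0)
cam1 = class_activation_map(fmaps, weights, 1)
```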
Evaluating soil quality (SQ) is crucial for ensuring the long-term stability of restored slope ecosystems, yet selecting efficient assessment methods remains challenging. The aim of this study was to develop a targeted SQ evaluation system to compare the effectiveness of different ecological restoration methods for slopes. We analysed 18 soil physicochemical and biological indices within a total data set (TDS) for five slopes restored with distinct ecological restoration techniques and three untreated slopes (as controls) in Yichang, China. Principal component analysis, the entropy weight method, and Norm values were employed to identify a minimum data set (MDS), and four soil quality index (SQI) models, linear unweighted (SQI_(L-A)), linear weighted (SQI_(L-W)), nonlinear unweighted (SQI_(NL-A)), and nonlinear weighted (SQI_(NL-W)), were used to comprehensively evaluate the MDS-based SQ. The results revealed that (1) the MDS, consisting of microbial biomass carbon (MBC), microbial biomass phosphorus (MBP), microbial biomass quotient (qMBC), catalase (CAT), and bulk density (BD), effectively characterized the SQ of the ecologically restored slopes; (2) the SQI_(NL-W) model discriminated best among the different restored slopes, with a significantly greater coefficient of determination (R² = 0.881, P < 0.01) than the other SQI models; and (3) all five ecological restoration techniques improved slope SQ to varying degrees, raising it from low to high levels, with the vegetative cement-soil eco-restoration & vegetation concrete eco-restoration technique showing the best effect (SQI_(NL-W) = 0.627). Our study developed a practical SQ evaluation system based on the validated MDS and the most suitable SQI model (SQI_(NL-W)). This system enables reliable assessment of the effectiveness of restoration techniques.
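The nonlinear weighted model (SQI_(NL-W)) combines a nonlinear score for each MDS indicator with an indicator weight. This is a minimal sketch, assuming weights from the entropy weight method named in the abstract, the sigmoid scoring curve S = 1/(1 + (x/x0)^b) with b = -2.5 for "more is better" and b = 2.5 for "less is better" indicators, and the indicator mean as x0. These curve parameters are common choices in the SQI literature, not values stated here; the PCA/Norm step that selects the MDS is omitted, and the data are hypothetical.

```python
import numpy as np


def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy weight method. X: (n_samples, n_indicators), strictly positive."""
    n = X.shape[0]
    P = X / X.sum(axis=0)                         # share of each sample per indicator
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)  # normalized entropy in [0, 1]
    d = 1.0 - e                                   # degree of diversification
    return d / d.sum()


def nonlinear_score(x: np.ndarray, x0: float, more_is_better: bool) -> np.ndarray:
    """Sigmoid scoring curve S = 1 / (1 + (x / x0) ** b), b = -2.5 or +2.5 (assumed)."""
    b = -2.5 if more_is_better else 2.5
    return 1.0 / (1.0 + (x / x0) ** b)


def sqi_nonlinear_weighted(X: np.ndarray, more_is_better: list) -> np.ndarray:
    """SQI = sum_j w_j * S_j over the MDS indicators, one value per sample."""
    w = entropy_weights(X)
    x0 = X.mean(axis=0)  # baseline: indicator mean (assumed)
    S = np.column_stack([nonlinear_score(X[:, j], x0[j], more_is_better[j])
                         for j in range(X.shape[1])])
    return S @ w


# Hypothetical plots: column 0 ~ MBC (more is better), column 1 ~ BD (less is better).
X = np.array([[10.0, 1.4],
              [20.0, 1.3],
              [30.0, 1.2]])
weights = entropy_weights(X)
sqi = sqi_nonlinear_weighted(X, [True, False])
```

Because every score lies in (0, 1) and the weights sum to 1, the resulting index also lies in (0, 1), which makes values such as 0.627 directly comparable across slopes.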
For satellite remote sensing data obtained in the visible and infrared bands, cloud cover over the ocean often causes large-scale gaps in the retrieved (inversion) products, and thin clouds that are difficult to detect can make the retrieved values abnormal. Alvera et al. (2005) proposed a method for reconstructing missing data based on Empirical Orthogonal Function (EOF) decomposition, but their method could not process images with extreme cloud coverage (more than 95%) and required a long reconstruction time. In addition, abnormal data in the images strongly affected the reconstruction result. This paper therefore improves on that method by applying EOF decomposition twice to reconstruct missing data sets. First, abnormal times are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data set, excluding the abnormal data, is analyzed using EOF decomposition, and the temporal modes are filtered to enhance the ability of the EOF method to reconstruct images that contain no data or very little data. Finally, the method was applied to a large data set of 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, giving a total reconstruction root mean square error (RMSE) of 0.82℃. The results show that this improved EOF reconstruction method is robust for reconstructing missing and unreliable satellite data.
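The core EOF gap-filling step can be sketched as an iterative truncated-SVD reconstruction in the spirit of the EOF method the abstract builds on: gaps start at the field mean, the field is rebuilt from its leading EOF modes, and only the gap values are updated until they stop changing. This is a minimal NumPy sketch under stated assumptions: the number of modes is fixed by hand rather than chosen by cross-validation, the toy field is hypothetical, and the paper's two refinements (detecting abnormal times from the temporal modes and filtering the temporal modes) are not implemented.

```python
import numpy as np


def eof_fill(data: np.ndarray, n_modes: int = 2, n_iter: int = 1000,
             tol: float = 1e-10) -> np.ndarray:
    """Fill NaNs in a (space x time) field by iterative truncated-SVD (EOF) reconstruction."""
    missing = np.isnan(data)
    if not missing.any():
        return data.copy()
    mean = np.nanmean(data)
    A = np.where(missing, 0.0, data - mean)  # anomalies; gaps start at the mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # Rebuild the field from the leading EOFs (spatial U, temporal Vt).
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        change = np.sqrt(np.mean((recon[missing] - A[missing]) ** 2))
        A[missing] = recon[missing]          # only gap values are updated
        if change < tol:
            break
    return A + mean


# Hypothetical smooth field: 6 grid points x 8 images, with three cloud gaps.
space = np.linspace(1.0, 2.0, 6)
time = np.linspace(0.5, 1.5, 8)
truth = np.outer(space, time)
observed = truth.copy()
for i, j in [(0, 0), (3, 5), (5, 7)]:
    observed[i, j] = np.nan
filled = eof_fill(observed, n_modes=2)
```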
To evaluate the influence of data set noise, the network in network (NIN) model is introduced, and the negative effects of different types and proportions of noise on deep convolutional models are studied. Different types and proportions of data noise are added to two reference data sets, CIFAR-10 and CIFAR-100. The noisy data are then used to train deep convolutional models and to classify the validation data set. The experimental results show that noise in the data set has obvious adverse effects on deep convolutional classification models. The adverse effects of random noise are small, but cross-category noise can significantly reduce the recognition ability of the model. A solution is therefore proposed to improve the quality of data sets contaminated with a single noise category: a model trained on the noisy data set is used to evaluate the current training data, and the categories of the anomalous samples are reassigned to form a new data set. Repeating these steps can greatly reduce the noise ratio, so the influence of cross-category noise can be effectively avoided.
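The two noise types and the cleaning loop can be sketched as follows. This is a minimal sketch under loudly stated assumptions: the abstract does not define "cross-category noise" precisely, so it is modeled here as a systematic flip of class c to class c + 1, and the relabeling rule (accept the trained model's prediction when it disagrees with the current label at confidence of at least `threshold`) is a hypothetical stand-in for the paper's procedure. Model training itself is omitted.

```python
import random
from typing import List


def add_random_noise(labels: List[int], num_classes: int, ratio: float,
                     seed: int = 0) -> List[int]:
    """Replace a fraction of labels with uniformly random classes (may hit the true class)."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), int(ratio * len(labels))):
        noisy[i] = rng.randrange(num_classes)
    return noisy


def add_cross_category_noise(labels: List[int], num_classes: int, ratio: float,
                             seed: int = 0) -> List[int]:
    """Hypothetical model: systematically relabel class c as (c + 1) % num_classes."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), int(ratio * len(labels))):
        noisy[i] = (labels[i] + 1) % num_classes
    return noisy


def relabel(labels: List[int], predictions: List[int], confidences: List[float],
            threshold: float = 0.9) -> List[int]:
    """One cleaning round: adopt the model's prediction where it confidently disagrees."""
    return [p if (p != y and c >= threshold) else y
            for y, p, c in zip(labels, predictions, confidences)]


labels = [i % 10 for i in range(100)]  # CIFAR-10-like label vector
cross = add_cross_category_noise(labels, 10, 0.2)
rand = add_random_noise(labels, 10, 0.2)
cleaned = relabel([0, 1, 2], [0, 2, 2], [0.99, 0.95, 0.5])
```

Random noise scatters errors across all classes, while the systematic flip gives the model a consistent wrong signal, which is one plausible reading of why the abstract finds cross-category noise more damaging.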
Many classical clustering algorithms perform well when their assumptions are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) was proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form a local feature set. The local feature set is used in a second round of characteristic aggregation over the source data to find the global eigenvector. Finally, according to the global eigenvector, each data point is assigned by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Because it is insensitive to data dimensionality, distribution, and the number of natural clusters, it has a wide range of applications in clustering VLDS.
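The divide-then-aggregate idea can be sketched with k-means centroids standing in for the per-block "eigenvectors". That substitution, the block count, and the toy data are assumptions, since the abstract does not say how the local features are computed. Each block is summarized locally, the pooled local summaries are clustered again to obtain global representatives, and every point is finally assigned by minimum distance.

```python
import numpy as np


def nearest(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center for each row of X (minimum-distance criterion)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def dp_cluster(X: np.ndarray, k: int, n_blocks: int):
    """Division-and-partition sketch: local summaries per block, then a global pass."""
    blocks = np.array_split(X, n_blocks)
    local = np.vstack([kmeans(b, k, seed=i) for i, b in enumerate(blocks)])
    global_centers = kmeans(local, k)
    return global_centers, nearest(X, global_centers)


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
truth = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(len(X))
X, truth = X[perm], truth[perm]
centers, labels = dp_cluster(X, k=2, n_blocks=4)
```

Only the small set of local summaries is clustered globally, so the expensive pass never touches the full data set at once; the final assignment is a single linear scan.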
In this paper, we consider the problem of evaluating system reliability using statistical data from reliability tests of its elements, where element lifetimes are described by an exponential distribution. We assume that the lifetime data may be reported imprecisely and that this imprecision may be described using fuzzy sets. Because direct application of the fuzzy set methodology leads in this case to very complicated and time-consuming calculations, we propose simple approximations of fuzzy numbers using the shadowed sets introduced by Pedrycz (1998). The proposed methodology can easily be extended to general lifetime probability distributions.
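Pedrycz's construction can be sketched numerically: memberships at or below a threshold α are reduced to 0, those at or above 1 - α are elevated to 1, and α is chosen so that the total reduction plus elevation balances the size of the remaining "shadow" region. This is a minimal NumPy sketch assuming triangular fuzzy numbers and a grid search over α; for a triangular number the balance condition reduces to α² + 2α - 1 = 0, i.e. α = √2 - 1 ≈ 0.414. The series-system reliability helper is an illustrative assumption about how the resulting intervals might be used, not the paper's procedure.

```python
import math

import numpy as np


def triangular(x, a: float, m: float, b: float) -> np.ndarray:
    """Membership function of the triangular fuzzy number (a, m, b), vectorised over x."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (m - a)
    right = (b - x) / (b - m)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


def pedrycz_alpha(membership, lo: float, hi: float, n: int = 20000, grid: int = 1000) -> float:
    """Threshold alpha in (0, 0.5) balancing reduced + elevated area against the shadow size."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    mu = membership(x)
    best_alpha, best_v = None, math.inf
    for alpha in np.arange(1, grid // 2) / grid:
        reduced = mu[mu <= alpha].sum() * dx                 # memberships pushed down to 0
        elevated = (1.0 - mu[mu >= 1.0 - alpha]).sum() * dx  # memberships raised to 1
        shadow = np.count_nonzero((mu > alpha) & (mu < 1.0 - alpha)) * dx
        v = abs(reduced + elevated - shadow)
        if v < best_v:
            best_alpha, best_v = float(alpha), v
    return best_alpha


def shadowed_triangular(a: float, m: float, b: float, alpha: float):
    """Return (outer, core) intervals of the shadowed set induced by alpha."""
    outer = (a + alpha * (m - a), b - alpha * (b - m))              # membership above alpha
    core = (a + (1 - alpha) * (m - a), b - (1 - alpha) * (b - m))   # membership >= 1 - alpha
    return outer, core


def series_reliability(t: float, rates) -> float:
    """Series system of exponential elements: R(t) = exp(-t * sum(lambda_i))."""
    return math.exp(-t * sum(rates))


alpha = pedrycz_alpha(lambda x: triangular(x, 0.0, 1.0, 2.0), 0.0, 2.0)
outer, core = shadowed_triangular(0.0, 1.0, 2.0, alpha)
```

Since R(t) decreases in every failure rate, plugging the upper ends of the rate intervals into `series_reliability` gives a lower reliability bound and the lower ends give an upper bound, which is the kind of cheap interval arithmetic shadowed sets make possible.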
The Chaoshan depression, a Mesozoic basin in the Dongsha sea area of the northern South China Sea, is characterized by well-preserved Mesozoic strata and good conditions for oil and gas preservation, promising good prospects for oil and gas exploration. However, no breakthrough in oil and gas exploration in the Mesozoic strata has been achieved because of limited seismic surveys. New long-offset seismic data, acquired on a dense grid with a single source and a single cable, were processed with a 3D imaging method, and finer processing was performed to highlight the target strata. Combining the new imaging results with other geological information, we conducted an integrated interpretation and proposed exploratory well A-1-1 for potential hydrocarbons. The results provide a reliable basis for achieving breakthroughs in oil and gas exploration in the Mesozoic strata of the northern South China Sea.
Rapid developments in fields such as telecommunications, sensor data, financial applications, and data stream analysis have increased the rate of data arrival, making data mining a vital process. Data analysis consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Because classification is a continuous process, it requires a design that can adapt the classification model to concept changes or to shifts in the boundaries between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify new incoming data streams. Rough set theory together with a tangential holoentropy function is used to design the dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of THRFuzzy is verified on three data sets (skin segmentation, localization, and breast cancer) using accuracy and time as evaluation metrics, and is compared with HRFuzzy and adaptive k-NN classifiers. The experimental results show that THRFuzzy achieves better classification results than the existing classifiers, providing the highest accuracy in the least time.
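The kernel fuzzy c-means step used for rule generation can be sketched as below. This is a minimal NumPy sketch assuming a Gaussian kernel K(x, v) = exp(-||x - v||²/σ²), fuzzifier m = 2, and caller-supplied initial centers; the kernel, σ, and the toy data are assumptions, and THRFuzzy's rough-set rules and tangential holoentropy membership update are not shown.

```python
import numpy as np


def gaussian_kernel(X: np.ndarray, V: np.ndarray, sigma: float) -> np.ndarray:
    """K[k, i] = exp(-||x_k - v_i||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)


def kfcm_memberships(X, V, m=2.0, sigma=1.0, eps=1e-12):
    """Fuzzy memberships from the kernel-induced distance 1 - K (rows sum to 1)."""
    dist = np.maximum(1.0 - gaussian_kernel(X, V, sigma), eps)
    inv = dist ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def kfcm_centers(X, U, V, m=2.0, sigma=1.0):
    """Center update weighted by u^m * K, as used with the Gaussian kernel."""
    W = (U ** m) * gaussian_kernel(X, V, sigma)  # (n_samples, n_clusters)
    return (W.T @ X) / W.sum(axis=0)[:, None]


def kernel_fcm(X, V0, m=2.0, sigma=1.0, n_iter=100):
    """Alternate membership and center updates starting from centers V0."""
    V = np.asarray(V0, dtype=float)
    for _ in range(n_iter):
        U = kfcm_memberships(X, V, m, sigma)
        V_new = kfcm_centers(X, U, V, m, sigma)
        converged = np.allclose(V_new, V)
        V = V_new
        if converged:
            break
    return kfcm_memberships(X, V, m, sigma), V


X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
U, V = kernel_fcm(X, V0=[[0.5, 0.5], [2.5, 2.5]], sigma=2.0)
```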
Vendor lock-in can occur at any layer of the cloud stack: Infrastructure, Platform, and Software as a Service. This paper addresses vendor lock-in at the Platform as a Service (PaaS) level, where applications can be created, deployed, and managed without worrying about the underlying infrastructure. Applications and their persisted data on one PaaS provider are not easy to port to another provider. To overcome this issue, we propose a middleware that abstracts database services and makes them cloud-agnostic. The middleware supports several SQL and NoSQL data stores that can be hosted on and ported among disparate PaaS providers. It provides developers with data portability and data migration among relational and NoSQL-based cloud databases. NoSQL databases are fundamental to Big Data applications because they can handle an enormous volume of highly variable data while assuring fault tolerance, availability, and scalability. The implementation shows that using the middleware reduces the effort of rewriting application code when the backend database system changes. A working prototype of a migration tool has been developed using this middleware to facilitate database migration (moving existing data from a database on one cloud to a new database, even on a different cloud). Although the middleware adds some overhead compared with native code for the cloud services being used, an experimental evaluation on a Twitter data set (a Big Data application) shows that this overhead is negligible.
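The abstraction idea can be sketched as a single interface with interchangeable backends plus a generic migration routine. This is a minimal sketch, not the paper's middleware: the `DataStore` interface and class names are hypothetical, SQLite (from the Python standard library) stands in for a cloud SQL service, and an in-memory dictionary stands in for a NoSQL document store.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple


class DataStore(ABC):
    """Provider-neutral interface the application codes against (hypothetical)."""

    @abstractmethod
    def put(self, key: str, doc: Dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[Dict]: ...

    @abstractmethod
    def items(self) -> Iterator[Tuple[str, Dict]]: ...


class SQLStore(DataStore):
    """Relational backend: documents serialised as JSON in a key/value table."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS docs (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, doc):
        self.conn.execute("INSERT OR REPLACE INTO docs (k, v) VALUES (?, ?)",
                          (key, json.dumps(doc)))
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute("SELECT v FROM docs WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def items(self):
        for k, v in self.conn.execute("SELECT k, v FROM docs ORDER BY k").fetchall():
            yield k, json.loads(v)


class DocumentStore(DataStore):
    """In-memory stand-in for a key/document NoSQL backend."""

    def __init__(self):
        self._docs: Dict[str, str] = {}

    def put(self, key, doc):
        self._docs[key] = json.dumps(doc)  # store a serialised copy

    def get(self, key):
        raw = self._docs.get(key)
        return json.loads(raw) if raw is not None else None

    def items(self):
        for k in sorted(self._docs):
            yield k, json.loads(self._docs[k])


def migrate(src: DataStore, dst: DataStore) -> int:
    """Copy every document from one backend to another; returns the count moved."""
    moved = 0
    for key, doc in src.items():
        dst.put(key, doc)
        moved += 1
    return moved


sql = SQLStore()
sql.put("tweet:1", {"user": "a", "text": "hello"})
sql.put("tweet:2", {"user": "b", "text": "world"})
nosql = DocumentStore()
moved = migrate(sql, nosql)
```

Application code that depends only on `DataStore` does not change when the backend does, and `migrate` works for any pair of backends because it uses only the shared interface.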
In this paper, we build a remote-sensing satellite imagery priori-information data set and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The TH Priori-Information (TPI) data set we built, containing 2297 remote-sensing images, serves as a standardized high-resolution data set for studies of remote-sensing image features. TPI contains 1) raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively) and 2) a built-in 3D target-area model that supports view position, view angle, lighting, shadowing, and other transformations. Based on TPI, we further present a quantitative approach, including the feature recurrence rate, the feature match score, and the weighted feature robustness score, to evaluate the robustness of remote-sensing image feature detectors. This approach gives general and objective assessments of the robustness of feature detectors under complex remote-sensing conditions. Three remote-sensing image feature detectors, scale-invariant feature transform (SIFT), speeded up robust features (SURF), and priori information based robust features (PIRF), are evaluated on the TPI data set using the proposed approach. Experimental results show that PIRF outperforms the others in robustness by over 6.2%.
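The abstract names the feature recurrence rate and a weighted robustness score without giving formulas. As an illustration only, the sketch below assumes a common repeatability-style definition: a reference keypoint "recurs" if, after mapping it through the known ground-truth transform (a pure translation here), some detected keypoint in the transformed image lies within `eps` pixels; per-condition scores are then combined by a weighted mean. Both definitions, the tolerance, and the toy keypoints are hypothetical.

```python
from math import hypot
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def recurrence_rate(kp_ref: List[Point], kp_test: List[Point],
                    transform: Callable[[Point], Point], eps: float = 2.0) -> float:
    """Share of reference keypoints that reappear within eps pixels after a known transform."""
    if not kp_ref:
        return 0.0
    hits = 0
    for p in kp_ref:
        qx, qy = transform(p)
        if any(hypot(qx - bx, qy - by) <= eps for bx, by in kp_test):
            hits += 1
    return hits / len(kp_ref)


def weighted_score(rates: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-condition scores (e.g. view angle, lighting) with condition weights."""
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


def shift(p: Point) -> Point:
    """Known ground-truth transform for the toy example: 10-pixel translation in x."""
    return (p[0] + 10.0, p[1])


kp_a = [(0.0, 0.0), (5.0, 5.0), (20.0, 3.0)]
kp_b = [(10.0, 0.0), (15.5, 5.0), (50.0, 50.0)]
rate = recurrence_rate(kp_a, kp_b, shift)   # two of three keypoints recur
score = weighted_score([1.0, 0.5], [1.0, 3.0])
```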
An attempt was made to apply a novel genetic programming (GP) technique, a new member of the evolutionary algorithms, to predict the water storage of Wolonghu wetland in response to climate change in northeastern China using a small data set. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions showed a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP, a popular artificial neural network model) and a grey model (GM) were applied to the same data set. The GP technique performed better than the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations of water storage in wetlands when only a small data set is available.
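The two goodness-of-fit figures quoted above can be computed as follows, assuming the standard definitions (MAPE expressed in percent, r as the Pearson correlation coefficient). The abstract does not state its formulas, and the sample numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical observed vs. simulated water storage values.
error = mape([100.0, 200.0], [110.0, 180.0])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

MAPE measures the size of the errors while r only measures co-movement, which is why a model can reach r = 0.99 yet still have a MAPE near 9%.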
With an increasing number of scientific achievements being published, literature-based knowledge discovery and data mining have become particularly important. Flooding, one of the most destructive natural disasters, has been the subject of numerous scientific publications. On January 1, 2018, we collected and processed literature data on flood research and categorized the retrieved paper records into a Whole SCI Dataset (WS) and a High-Citation SCI Dataset (HCS). These data sets can serve as basic data for bibliometric analysis to identify the status of global flood research during 1990-2017. Our study shows that the Chinese Academy of Sciences was the most productive institution during this period, while the United States was the most productive country. In addition, our keyword analysis reveals potential popular issues and future trends in flood research.
Recently, owing to the rapid growth in the number of data sensors, a massive volume of data is being generated from different sources. Administering such data, that is, storing, managing, and analyzing it and extracting insightful information from it, is a challenging task. Big data analytics is becoming a vital research area in domains such as climate data analysis, which demands fast access to data. Nowadays, MapReduce, a distributed computing framework with open-source implementations, is widely used in many domains of big data analysis. In our work, we have developed a conceptual data-modeling framework for implementing a hybrid data warehouse model that stores the features of National Climatic Data Center (NCDC) climate data. The hybrid data warehouse model for climate big data enables the identification of weather patterns applicable to agricultural and other climate-change-related studies, which will play a major role in recommending actions to domain experts and in making contingency plans for extreme weather variability.
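The map/shuffle/reduce pattern that MapReduce distributes across a cluster can be sketched in a single process, using the classic "maximum temperature per year" query on climate records. The simplified `station,year,temperature` CSV layout, the `MISSING` marker, and the data are hypothetical (real NCDC records use their own formats), and this sketch says nothing about the hybrid data warehouse model itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, float]]:
    """Emit (year, temperature) for one record; skip missing readings."""
    _station, year, temp = line.strip().split(",")
    if temp != "MISSING":
        yield year, float(temp)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group mapped values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[float]) -> Tuple[str, float]:
    """Aggregate all values for one key; here, the yearly maximum."""
    return key, max(values)


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    mapped = (kv for line in lines for kv in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(mapped).items())


records = ["S1,2001,31.5", "S2,2001,33.0", "S1,2002,29.1", "S2,2002,MISSING"]
yearly_max = run_job(records)
```

In a real cluster the mapper and reducer run on many nodes and the framework performs the shuffle over the network; the functions themselves keep the same shape.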
Recently, numerous studies have demonstrated that the physics-informed neural network (PINN) can effectively and accurately solve hyperelastic finite deformation problems. In this paper, a PINN framework for tackling hyperelastic-magnetic coupling problems is proposed. Since the solution space consists of two-phase domains, two separate networks are constructed to independently predict the solution in each phase region. In addition, a conscious point allocation strategy is incorporated to enhance the prediction precision of the PINN in regions with sharp gradients. With the developed framework, the magnetic and deformation fields of magnetorheological elastomers (MREs) are solved under the hyperelastic-magnetic coupling equations. Illustrative examples are compared with reference results to validate the predictive accuracy of the proposed framework. Moreover, the advantages of the framework in solving hyperelastic-magnetic coupling problems are validated, particularly in handling small data sets, as is its ability to swiftly and precisely forecast magnetostrictive motion.
Preserving the soil quality of the siltated back area in the lower reaches of the Yellow River Basin is key to the sustainable ecological development of the basin. Soil quality has gradually become an important part of ecological landscape construction, so evaluating soil quality in the lower reaches of the Yellow River supports the rational utilization of soil resources and can effectively guide the development and construction of the silt back area. Siltated soil under three different land-use modes was collected in the Gaoqing County section of the lower reaches of the Yellow River Basin, and 16 soil physical and chemical properties were used as evaluation indexes. Principal component analysis was used to account for the correlations between the indexes, and suitable soil indexes were selected to establish a minimum data set for comprehensively evaluating the quality of the silt back soil. The results show three key aspects. (1) The minimum data set for evaluating siltated soil quality in the siltation area of the lower reaches of the Yellow River comprised six indexes: capillary water holding capacity, available phosphorus, water content, water-stable macroaggregate content, available potassium, and alkaline hydrolyzable nitrogen. The soil quality index SQI-MDS was 0.421, the overall soil quality level was low, and the soil nutrient status was generally characterized by "nitrogen deficiency and potassium deficiency". (2) The linear fit between the full data set and the minimum data set (R² = 0.82737) indicated a positive correlation, so the minimum data set can accurately evaluate the quality of the soil in the silt back area. (3) The soil quality index values of bare land, forest land, and cultivated land were 0.321, 0.581, and 0.360, respectively, with the highest soil quality in forest land and the lowest in bare land. These findings can provide a theoretical basis and reference for the rational utilization and sustainable development of sedimentary soil in the lower reaches of the Yellow River.
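The PCA-based selection of a minimum data set can be sketched with screening rules widely used in soil-quality studies: retain principal components with eigenvalue of at least 1, keep indicators whose absolute loading is within 10% of the largest loading on that component, and drop indicators strongly correlated (|r| > 0.7) with one already kept. These thresholds are assumptions (the abstract does not list them), and the three synthetic indicators are hypothetical.

```python
import numpy as np


def select_mds(X: np.ndarray, names, eig_threshold: float = 1.0,
               within: float = 0.10, max_corr: float = 0.7):
    """PCA-based minimum data set selection (common heuristic, assumed here)."""
    R = np.corrcoef(X, rowvar=False)                    # indicator correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    selected = []
    for j in np.flatnonzero(eigvals >= eig_threshold):  # keep PCs with eigenvalue >= 1
        col = np.abs(loadings[:, j])
        # Highly weighted indicators: within 10% of the largest loading on this PC.
        cands = [i for i in np.argsort(col)[::-1] if col[i] >= (1.0 - within) * col.max()]
        kept = []
        for i in cands:
            # Drop an indicator strongly correlated with one already kept on this PC.
            if all(abs(R[i, k]) <= max_corr for k in kept):
                kept.append(i)
        for i in kept:
            if names[i] not in selected:
                selected.append(names[i])
    return selected


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)  # near-duplicate of a (redundant)
c = rng.normal(size=200)             # independent indicator
mds = select_mds(np.column_stack([a, b, c]), ["a", "b", "c"])
```

The redundancy check is what keeps the data set "minimum": of two nearly identical indicators, only the one with the higher loading survives.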
Soil health assessment is an important step toward understanding the potential effects of agricultural practices on crop yield, quality, and human health. The objectives of this study were to select a minimum data set for soil health evaluation from the physical, chemical, and biological properties and the environmental pollution characteristics of agricultural soil, and to develop a soil health diagnosis model for determining soil health status under different planting patterns and soil types on Chongming Island, Shanghai, China. The results showed that the largest share of farmland soils on Chongming Island were in poor health (48.9% of the survey samples), followed by moderately healthy soils (32.2% of the samples), which were mainly distributed in the central and mid-eastern regions of the island. The indicators pH, total organic carbon, microbial biomass carbon, and Cd exerted less influence on soil health, whereas soil salinization and nitrate accumulation under greenhouse cropping and phosphate fertilizer shortage in paddy fields limited soil health. Dichlorodiphenyltrichloroethanes, hexachlorocyclohexanes, and Hg contributed less to the soil health index (SHI) and showed no significant differences among paddy fields, greenhouses, and open-air vegetable/watermelon fields. The SHI differed significantly among the three soil types at the P = 0.05 level. Paddy soil had the highest SHI values, followed by gray alluvial soil, while coastal saline soil was in poor health, indicating a need to plant salt-tolerant crops to effectively improve soil quality.
Assessment of soil quality is important for optimum production and natural resource conservation. The quality of agricultural and pasture soils in the Deh-Sorkh region, south of Mashhad in northeastern Iran, was assessed using the integrated quality index (IQI) and Nemero quality index (NQI) models in combination with two data sets: a total data set (TDS) and a minimum data set (MDS). In this study, 6 soil properties constituting the MDS were selected from the 18 properties of the TDS using principal component analysis. Soil samples were divided into 3 groups based on the optimum ranges of 8 soil physical quality indicators: samples with the most indicators in the optimum range formed group 1, and samples with fewer indicators in the optimum range formed groups 2 and 3. Optimum ranges of soil pore size distribution functions were also determined as soil physical quality indices based on the 8 indicators, and the pore size distribution curves of group 1 were taken as the optimum pore size functions. The results showed that relatively high organic carbon contents could improve pore size distribution. Mean comparisons of the soil physical quality indicators showed that the mean weight diameter of wet aggregates, the structural stability index, the slope of the moisture retention curve at the inflection point, and plant available water content decreased significantly under agricultural land use relative to pasture land use. In addition, the results demonstrated that the selected MDS could be a suitable representative of the TDS. Of the pasture soils, 78% had optimum pore size distribution functions, compared with only 13% of the agricultural soils. In general, the soils of the studied region showed high limitations for plant growth according to the studied indicators.
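The two index models can be written compactly. The forms below, IQI = Σ W_i · N_i and NQI = √((P_ave² + P_min²)/2) · (n - 1)/n, are the versions commonly used in soil-quality work and are assumed here because the abstract gives no formulas; the scores and weights are hypothetical. Because it includes the minimum score, NQI is sensitive to the most limiting indicator.

```python
from math import sqrt
from typing import Sequence


def iqi(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Integrated quality index: weighted sum of indicator scores (weights normalised to 1)."""
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, scores))


def nqi(scores: Sequence[float]) -> float:
    """Nemero-type index: combines mean and minimum scores, penalising the weakest indicator."""
    n = len(scores)
    p_ave = sum(scores) / n
    p_min = min(scores)
    return sqrt((p_ave ** 2 + p_min ** 2) / 2.0) * (n - 1) / n


# Hypothetical indicator scores in [0, 1] for one soil sample.
example_iqi = iqi([0.2, 0.8], [1.0, 3.0])
example_nqi = nqi([0.5, 0.5])
```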
Funding: Microsoft Research Asia Internet Services in Academic Research Fund (No. FY07-RES-OPP-116); the Science and Technology Development Program of Tianjin (No. 06YFGZGX05900)
Abstract: To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a predefined dictionary and grammatical analysis, the question classifier brings both semantic and grammatical information into information retrieval and machine learning methods in the form of various training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun. The QA query results are then re-ranked using the question class information. Experiments show that questions in real-world web data sets can be accurately classified by the classifier and that the QA results are clearly improved after re-ranking. This demonstrates that applications built on real-world web data sets, such as QA, can be improved by using both semantic and grammatical information.
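The re-ranking step can be pictured with a small sketch. The abstract does not give the form of the re-ranking model, so this assumes a hypothetical scheme in which each candidate answer carries a retrieval score and an answer-type label, and candidates whose type matches the predicted question class receive a fixed additive boost; `Candidate`, `rerank`, `boost`, and the class labels are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float        # retrieval score from the base QA system
    answer_type: str    # hypothetical answer-type label, e.g. "DATE"


def rerank(candidates: list[Candidate], question_class: str,
           boost: float = 0.5) -> list[Candidate]:
    """Re-rank answers, boosting those whose type matches the question class.

    Assumption: a single additive boost stands in for the paper's
    (unspecified) re-ranking model.
    """
    def adjusted(c: Candidate) -> float:
        return c.score + (boost if c.answer_type == question_class else 0.0)

    # sorted() is stable, so equal adjusted scores keep their retrieval order
    return sorted(candidates, key=adjusted, reverse=True)
```

For a "when" question classified as `DATE`, a date answer with a lower retrieval score (say 0.7) would move ahead of an untyped answer scored 0.9, because its adjusted score becomes 1.2.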
Abstract: The emergence of adversarial examples has revealed inadequacies in the robustness of image classification models based on convolutional neural networks (CNNs). In recent years in particular, the discovery of natural adversarial examples has posed significant challenges, because traditional defenses against adversarial attacks have proven largely ineffective against them. This paper explores defenses against natural adversarial examples from three perspectives: the adversarial examples themselves, model architecture, and dataset. First, Class Activation Mapping (CAM) is used to visualize how models classify natural adversarial examples, identifying several typical attack patterns. Next, various common CNN models are analyzed to evaluate their susceptibility to these attacks, revealing that different architectures exhibit different defensive capabilities; the defense against natural adversarial examples strengthens as network depth increases. Finally, the impact of the dataset's class distribution on the models' defense capability is examined from two aspects: the number of classes in the training set and the number of predicted classes. Results indicate that reducing the number of training classes enhances the model's defense against natural adversarial examples. Additionally, for a fixed number of training classes, some CNN models show an optimal range of predicted classes that yields the best defense performance against these examples.
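CAM highlights the image regions that drive a class decision by weighting the last convolutional feature maps with that class's classifier weights, M_c(x, y) = Σ_k w_k^c f_k(x, y). A minimal pure-Python sketch of this standard computation, assuming a network that ends in global average pooling followed by a linear classifier (the setting in which CAM is defined); the toy feature maps and weights in the usage note are illustrative, not from the study.

```python
def class_activation_map(feature_maps, class_weights):
    """Compute a CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps:  K maps from the last conv layer, each an H x W list of lists
    class_weights: K weights w_k^c linking pooled unit k to class c
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    return cam


def normalize(cam):
    """Rescale a CAM to [0, 1] so it can be overlaid as a heat map."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[(v - lo) / span for v in row] for row in cam]
```

With two 2 x 2 maps `[[1, 0], [0, 0]]` and `[[0, 0], [0, 1]]` and weights `[2.0, -1.0]`, the map is `[[2.0, 0.0], [0.0, -1.0]]`: the top-left region supports the class and the bottom-right region argues against it.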
Funding: Supported by the fund project of the Key Laboratory of Geological Hazards on Three Gorges Reservoir Area (China Three Gorges University), Hubei Province, China (Grant No. 2023KDZ12) and the Hubei Provincial Engineering Research Center of Slope Habitat Construction Technique Using Cement-based Materials (China Three Gorges University), Hubei Province, China (Grant No. 2022SNJ04).
Abstract: Evaluating soil quality (SQ) is crucial for ensuring the long-term stability of restored slope ecosystems, yet selecting efficient assessment methods remains challenging. The aim of this study was to develop a targeted SQ evaluation system to compare the effectiveness of different ecological restoration methods for slopes. We analysed the characteristics of 18 soil physicochemical and biological indices within a total data set (TDS) for five slopes restored with distinct ecological restoration techniques and three untreated slopes (as controls) in Yichang, China. Principal component analysis, the entropy weight method, and Norm values were used to identify a minimum data set (MDS), and four soil quality index (SQI) models, linear unweighted (SQI_(L-A)), linear weighted (SQI_(L-W)), nonlinear unweighted (SQI_(NL-A)), and nonlinear weighted (SQI_(NL-W)), were used to comprehensively evaluate the MDS-based SQ. The results revealed that (1) the MDS, consisting of microbial biomass carbon (MBC), microbial biomass phosphorus (MBP), microbial biomass quotient (qMBC), catalase (CAT), and bulk density (BD), effectively characterized the SQ of the restored slopes; (2) the SQI_(NL-W) model discriminated best among the different restored slopes, with a significantly greater coefficient of determination (R² = 0.881, P < 0.01) than the other SQI models; and (3) all five ecological restoration techniques improved slope SQ to varying degrees, raising it from low to high levels, with the vegetative cement-soil eco-restoration & vegetation concrete eco-restoration technique performing best (SQI_(NL-W) = 0.627). Our study developed a practical SQ evaluation system based on the validated MDS and the most suitable SQI model (SQI_(NL-W)). This system enables reliable assessment of the effectiveness of restoration techniques.
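The four SQI variants differ only in how each MDS indicator is scored (linear or nonlinear) and whether the scores are averaged or weighted. A minimal sketch, assuming the scoring functions commonly used in the SQI literature: linear scores of x / max ("more is better", e.g. MBC) or min / x ("less is better", e.g. BD), and a sigmoid-type nonlinear score a / (1 + (x / x0)^b) with x0 the indicator mean and b = ±2.5. The abstract does not state these functions or the weights, so all of them are assumptions here.

```python
def linear_score(x, values, more_is_better=True):
    """Linear scoring: x / max for 'more is better', min / x for 'less is better'."""
    return x / max(values) if more_is_better else min(values) / x


def nonlinear_score(x, values, more_is_better=True, a=1.0):
    """Sigmoid-type scoring S = a / (1 + (x / x0) ** b), with x0 the indicator mean.

    b = -2.5 ('more is better') or +2.5 ('less is better') is a common
    convention, assumed here rather than taken from the study.
    """
    x0 = sum(values) / len(values)
    b = -2.5 if more_is_better else 2.5
    return a / (1.0 + (x / x0) ** b)


def sqi(scores, weights=None):
    """Unweighted SQI = mean of scores; weighted SQI = sum(W_i * S_i)."""
    if weights is None:
        return sum(scores) / len(scores)
    return sum(w * s for w, s in zip(weights, scores))
```

Combining `linear_score` or `nonlinear_score` with `sqi(scores)` or `sqi(scores, weights)` yields the L-A, L-W, NL-A, and NL-W variants; in the study the weights would come from the PCA/entropy analysis.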
Funding: The National Natural Science Foundation of China under contract Nos. 40576080 and 40506036; the National "863" Project of China under contract No. 2007AA12Z182
Abstract: For satellite remote sensing data retrieved from visible and infrared bands, cloud coverage over the ocean often causes large-scale gaps in the retrieval products, and thin clouds that are difficult to detect can make the retrieved values abnormal. Alvera et al. (2005) proposed a method for reconstructing missing data based on Empirical Orthogonal Function (EOF) decomposition, but their method could not process images with extreme cloud coverage (more than 95%) and required a long reconstruction time. Moreover, abnormal data in the images strongly affected the reconstruction result. This paper therefore improves on that approach by applying EOF decomposition twice to reconstruct missing data sets. First, the times at which anomalies occur are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets, excluding the abnormal data, are analyzed with EOF decomposition, and the temporal modes are filtered to enhance the ability of the EOF method to reconstruct images containing little or no data. Finally, the method was applied to a large data set of 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, giving a total reconstruction root mean square error (RMSE) of 0.82 ℃. The results show that this improved EOF reconstruction method is robust for reconstructing missing and unreliable satellite data.
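The EOF gap-filling idea that this work builds on is an iteration: fill the gaps with a first guess, reconstruct the field from its leading EOF modes, copy the reconstruction back into the gaps only, and repeat until the filled values stop changing. A minimal pure-Python sketch, assuming a single retained mode and power iteration in place of a full SVD; it deliberately omits the paper's anomaly detection and temporal-mode filtering steps, and the function names are illustrative.

```python
import math


def _leading_triplet(X, v, steps=50):
    """Power iteration for the leading singular triplet (sigma, u, v) of X."""
    rows, cols = len(X), len(X[0])
    sigma, u = 0.0, [0.0] * rows
    for _ in range(steps):
        u = [sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(X[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


def eof_fill(data, max_iter=500, tol=1e-10):
    """Fill None entries of a space x time matrix by iterated rank-1 EOF reconstruction."""
    missing = [(i, j) for i, row in enumerate(data)
               for j, x in enumerate(row) if x is None]
    observed = [x for row in data for x in row if x is not None]
    first_guess = sum(observed) / len(observed)
    X = [[first_guess if x is None else float(x) for x in row] for row in data]
    v = [1.0] * len(X[0])  # warm start, reused across outer iterations
    for _ in range(max_iter):
        sigma, u, v = _leading_triplet(X, v)
        change = 0.0
        for i, j in missing:  # only gap values are overwritten
            new = sigma * u[i] * v[j]
            change = max(change, abs(new - X[i][j]))
            X[i][j] = new
        if change < tol:
            break
    return X
```

On a field that is exactly one spatial pattern times one temporal pattern, a single gap is recovered to high precision; real SST fields need several modes, chosen for example by cross-validation.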
Funding: The Science and Technology R&D Fund Project of Shenzhen (No. JCYJ2017081765149850)
Abstract: To evaluate the influence of data set noise, the network in network (NIN) model is introduced, and the negative effects of different types and proportions of noise on deep convolutional models are studied. Different types and proportions of data noise are added to two reference data sets, Cifar-10 and Cifar-100. The noisy data are then used to train deep convolutional models, which classify the validation data set. The experimental results show that noise in the data set has obvious adverse effects on deep convolutional classification models. The adverse effects of random noise are small, but cross-category noise can significantly reduce the recognition ability of the model. Therefore, a solution is proposed to improve the quality of data sets contaminated with a single type of noise. The model trained on the noisy data set is used to evaluate the current training data and reassign the categories of anomalous samples, forming a new data set. Repeating these steps can greatly reduce the noise ratio, so the influence of cross-category noise can be effectively avoided.
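To make the two noise types concrete, the sketch below injects either random or cross-category noise into a list of class labels and then applies one relabeling pass of the kind described above. It assumes, hypothetically, that the noise acts on labels, that cross-category noise maps each class to a fixed partner class, and that relabeling accepts the trained model's prediction only above a confidence threshold; none of these details is given in the abstract.

```python
import random


def add_label_noise(labels, num_classes, ratio, kind="random", seed=0):
    """Corrupt a fraction `ratio` of labels.

    kind="random": replace the label with a uniformly chosen *different* class.
    kind="cross":  replace it with a fixed partner class (c + 1) mod K, a
                   simple stand-in for systematic cross-category confusion.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(ratio * len(labels)))
    for idx in rng.sample(range(len(labels)), n_flip):
        c = noisy[idx]
        if kind == "random":
            noisy[idx] = rng.choice([k for k in range(num_classes) if k != c])
        elif kind == "cross":
            noisy[idx] = (c + 1) % num_classes
        else:
            raise ValueError(f"unknown noise kind: {kind}")
    return noisy


def relabel(noisy_labels, predictions, confidences, threshold=0.9):
    """One cleaning pass: adopt the model's prediction where it confidently disagrees."""
    return [pred if pred != lab and conf >= threshold else lab
            for lab, pred, conf in zip(noisy_labels, predictions, confidences)]
```

Repeating "train, predict, `relabel`" corresponds to the iterative cleaning loop; the threshold guards against the model overwriting correct labels it is unsure about.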
Funding: Projects (60903082, 60975042) supported by the National Natural Science Foundation of China; Project (20070217043) supported by the Research Fund for the Doctoral Program of Higher Education of China
Abstract: Many classical clustering algorithms perform well when their prerequisites are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form a local feature set. The local feature set is used in a second round of characteristic aggregation over the source data to find the global eigenvector. Finally, each data point is assigned according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Because it is insensitive to data dimensionality, data distribution, and the number of natural clusters, DP has a wide range of applications in clustering VLDS.
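The two-round structure of DP can be sketched as follows. The abstract does not say how a block's "eigenvector" is extracted, so this sketch assumes, purely for illustration, that each block is summarized by its k-means centroids; the second round clusters those local centroids into global centroids, and every point is finally assigned by minimum distance. The function names are hypothetical.

```python
import random


def _dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means returning k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            groups[nearest].append(p)
        # keep the old centroid if a group ends up empty
        centroids = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids


def divide_and_partition(points, k, block_size):
    """Round 1: local centroids per block. Round 2: cluster them. Then assign."""
    local = []
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]
        local.extend(kmeans(block, min(k, len(block))))   # local feature set
    global_centroids = kmeans(local, k)                     # second round
    labels = [min(range(k), key=lambda c: _dist2(p, global_centroids[c]))
              for p in points]                              # minimum-distance rule
    return global_centroids, labels
```

Only one block is held in each first-round call, and the second round sees just the small local feature set, which is what lets this style of method scale to data that does not fit a single pass of a classical algorithm.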
Abstract: In this paper, we consider the problem of evaluating system reliability using statistical data obtained from reliability tests of its elements, in which the lifetimes of the elements are described by an exponential distribution. We assume that these lifetime data may be reported imprecisely and that this lack of precision may be described using fuzzy sets. As the direct application of fuzzy set methodology leads in this case to very complicated and time-consuming calculations, we propose simple approximations of fuzzy numbers using shadowed sets introduced by Pedrycz (1998). The proposed methodology can easily be extended to the case of general lifetime probability distributions.
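Pedrycz's shadowed set replaces a membership function with three zones: values at or below a threshold α are reduced to 0, values at or above 1 - α are elevated to 1, and the rest form a "shadow" of complete uncertainty. α is chosen so that the area removed plus the area added balances the extent of the shadow. A minimal numerical sketch, assuming a triangular fuzzy number and a simple grid search; for triangular membership functions this balance gives α = √2 - 1 ≈ 0.414, which the search should approximately recover. The function names are illustrative.

```python
def triangular(a, m, b):
    """Triangular fuzzy number with support [a, b] and peak at m."""
    def mu(x):
        if a < x <= m:
            return (x - a) / (m - a)
        if m < x < b:
            return (b - x) / (b - m)
        return 0.0
    return mu


def shadowed_threshold(membership, lo, hi, n=1000, grid=500):
    """Grid-search alpha in (0, 0.5) minimizing
    V(alpha) = |reduced area + elevated area - shadow length|."""
    dx = (hi - lo) / n
    mus = [membership(lo + (i + 0.5) * dx) for i in range(n)]  # midpoint rule
    best_alpha, best_v = None, float("inf")
    for g in range(1, grid):
        alpha = 0.5 * g / grid
        reduced = sum(m for m in mus if m <= alpha) * dx            # lowered to 0
        elevated = sum(1 - m for m in mus if m >= 1 - alpha) * dx   # raised to 1
        shadow = sum(1 for m in mus if alpha < m < 1 - alpha) * dx  # becomes [0, 1]
        v = abs(reduced + elevated - shadow)
        if v < best_v:
            best_alpha, best_v = alpha, v
    return best_alpha
```

For `triangular(0.0, 1.0, 2.0)` on `[0, 2]` the search lands near 0.414; the resulting three-valued approximation is what keeps the downstream reliability calculations simple.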
Funding: Supported by the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (No. GML2019ZD0208); the National Natural Science Foundation of China (No. 41606030); the Science and Technology Program of Guangzhou (No. 202102080363); and the China Geological Survey projects (Nos. DD20190212, DD20190216).
Abstract: The Chaoshan depression, a Mesozoic basin in the Dongsha sea area of the northern South China Sea, is characterized by well-preserved Mesozoic strata with good conditions for oil and gas preservation, promising good prospects for oil and gas exploration. However, no breakthrough in oil and gas exploration in the Mesozoic strata has been achieved because few seismic surveys have been conducted. New long-offset seismic data, acquired on a dense grid with a single source and a single cable, were processed with a 3D imaging method, and finer processing was performed to highlight the target strata. Combining the new imaging results with other geological information, we conducted an integrated interpretation and proposed an exploratory well, A-1-1, for potential hydrocarbons. The result provides a reliable basis for achieving breakthroughs in oil and gas exploration in the Mesozoic strata of the northern South China Sea.
Funding: Supported by proposal No. OSD/BCUD/392/197, Board of Colleges and University Development, Savitribai Phule Pune University, Pune
Abstract: Rapid developments in fields such as telecommunications, sensor data, financial applications, and data stream analysis have increased the rate of data arrival, making data mining a vital process. Data analysis consists of several tasks, among which data stream classification faces more challenges than other commonly used techniques. Because classification is a continuous process, it requires a design that can adapt the classification model to concept changes or to changes in the boundaries between classes. Hence, we design a novel fuzzy classifier, called THRFuzzy, to classify new incoming data streams. Rough set theory, together with a tangential holoentropy function, is used to design the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets, namely the skin segmentation, localization, and breast cancer datasets, using accuracy and time as evaluation metrics, and is compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier outperforms the existing classifiers, achieving the highest accuracy in the least time.
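The rule-generation step relies on kernel fuzzy c-means. The sketch below shows a standard KFCM iteration with a Gaussian kernel, assuming the common kernel-induced distance 1 - K(x, v) and fuzzifier m = 2; the tangential holoentropy update of the membership function that distinguishes THRFuzzy is not reproduced, and all names and defaults are illustrative.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / sigma ** 2)


def kfcm(points, centers, m=2.0, sigma=1.0, iters=50, eps=1e-12):
    """Kernel fuzzy c-means with a Gaussian kernel.

    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, computed from the centers at the start of
    the last iteration.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        # membership update with the kernel-induced distance 1 - K(x, v)
        memberships = []
        for x in points:
            d = [max(1.0 - gaussian_kernel(x, v, sigma), eps) for v in centers]
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(len(d)))
                 for i in range(len(d))])
        # center update: weighted mean with weights u^m * K(x, v)
        for i, v in enumerate(centers):
            w = [memberships[k][i] ** m * gaussian_kernel(x, v, sigma)
                 for k, x in enumerate(points)]
            total = sum(w)
            centers[i] = [sum(wk * x[dim] for wk, x in zip(w, points)) / total
                          for dim in range(len(v))]
    return centers, memberships
```

Each membership row sums to 1, so a point's degrees can be read directly as the fuzzy antecedent strengths from which classification rules would be built.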
Abstract: Vendor lock-in can occur at any layer of the cloud stack: Infrastructure-, Platform-, and Software-as-a-Service. This paper addresses vendor lock-in at the Platform as a Service (PaaS) level, where applications can be created, deployed, and managed without worrying about the underlying infrastructure. Such applications and their persisted data on one PaaS provider are not easy to port to another provider. To overcome this issue, we propose a middleware that abstracts database services and makes them cloud-agnostic. The middleware supports several SQL and NoSQL data stores that can be hosted on and ported among disparate PaaS providers, and it provides developers with data portability and data migration between relational and NoSQL cloud databases. NoSQL databases are fundamental to Big Data applications because they can handle an enormous volume of highly variable data while assuring fault tolerance, availability, and scalability. The implementation shows that using the middleware reduces the effort of rewriting application code when the backend database system is changed. A working prototype of a migration tool has been developed with this middleware to migrate databases (moving existing data from a database on one cloud to a new database, possibly on a different cloud). Although the middleware adds some overhead compared with native code for the cloud services used, an experimental evaluation on a Twitter (a Big Data application) data set shows that this overhead is negligible.
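Coding against one abstraction while swapping backends underneath can be illustrated with a small sketch. Everything here is hypothetical and is not the paper's middleware: `DataStore` is an assumed interface, an in-memory dictionary stands in for a NoSQL store, Python's built-in `sqlite3` stands in for a relational cloud database, and `migrate` copies a table between any two backends.

```python
from __future__ import annotations

import json
import sqlite3
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Hypothetical cloud-agnostic interface the application codes against."""

    @abstractmethod
    def put(self, table: str, key: str, record: dict) -> None: ...

    @abstractmethod
    def get(self, table: str, key: str) -> dict | None: ...

    @abstractmethod
    def scan(self, table: str) -> list[tuple[str, dict]]: ...


class InMemoryKeyValueStore(DataStore):
    """Stand-in for a NoSQL backend; a real adapter would wrap a provider SDK."""

    def __init__(self) -> None:
        self._tables: dict[str, dict[str, dict]] = {}

    def put(self, table, key, record):
        self._tables.setdefault(table, {})[key] = dict(record)

    def get(self, table, key):
        return self._tables.get(table, {}).get(key)

    def scan(self, table):
        return sorted(self._tables.get(table, {}).items())


class SQLiteStore(DataStore):
    """Stand-in for a relational backend, storing each record as JSON text."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def _ensure(self, table: str) -> None:
        # table names cannot be bound as SQL parameters, so restrict them
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, table, key, record):
        self._ensure(table)
        self._conn.execute(f"INSERT OR REPLACE INTO {table} VALUES (?, ?)",
                           (key, json.dumps(record)))
        self._conn.commit()

    def get(self, table, key):
        self._ensure(table)
        row = self._conn.execute(
            f"SELECT v FROM {table} WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def scan(self, table):
        self._ensure(table)
        rows = self._conn.execute(
            f"SELECT k, v FROM {table} ORDER BY k").fetchall()
        return [(k, json.loads(v)) for k, v in rows]


def migrate(source: DataStore, target: DataStore, table: str) -> int:
    """Copy every record of `table` from one backend to another."""
    count = 0
    for key, record in source.scan(table):
        target.put(table, key, record)
        count += 1
    return count
```

Because application code only calls `put`, `get`, and `scan`, switching from the key-value store to the SQL store requires no rewrite; the per-call translation into backend operations is the kind of overhead the abstract measures.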
Funding: Supported in part by the National Key Research and Development Program of China under Grant 2018YFF0301205 and in part by the National Natural Science Foundation of China under Grant NSFC 61925105 and Grant 61801260.
Abstract: In this paper, we build a remote-sensing satellite imagery priori-information data set and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The TH Priori-Information (TPI) data set, built from 2297 remote sensing images, serves as a standardized high-resolution data set for studies of remote-sensing image features. The TPI contains 1) raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively), and 2) a built-in 3-D target area model that supports view position, view angle, lighting, shadowing, and other transformations. Based on TPI, we further present a quantized approach, comprising the feature recurrence rate, the feature match score, and the weighted feature robustness score, to evaluate the robustness of remote-sensing image feature detectors. The quantized approach gives general and objective assessments of feature detector robustness under complex remote-sensing conditions. Three remote-sensing image feature detectors, scale-invariant feature transform (SIFT), speeded up robust features (SURF), and priori information based robust features (PIRF), are evaluated with the proposed approach on the TPI data set. Experimental results show that PIRF outperforms the other detectors in robustness by over 6.2%.
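The abstract does not define the feature recurrence rate, so the sketch below assumes the usual repeatability-style definition: a reference keypoint "recurs" if, after being mapped into the second image by the known geometric transform (here a 3 x 3 homography, which a model such as TPI's could supply), some detected keypoint lies within ε pixels. The names and the ε = 2 px default are illustrative.

```python
import math


def apply_homography(H, point):
    """Map (x, y) through a 3 x 3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def recurrence_rate(ref_points, test_points, H, eps=2.0):
    """Share of reference keypoints re-detected within eps pixels after mapping by H."""
    if not ref_points:
        return 0.0
    hits = 0
    for p in ref_points:
        px, py = apply_homography(H, p)
        if any(math.hypot(px - qx, py - qy) <= eps for qx, qy in test_points):
            hits += 1
    return hits / len(ref_points)
```

With a pure 10-pixel horizontal shift, reference points (0, 0), (5, 5), and (20, 20) against detections (10, 0), (15.5, 5), and (100, 100) give a rate of 2/3, since the third point has no detection nearby.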
Funding: Sponsored by the National Basic Research Program of China (Grant No. 2006CB403302), the National Education Ministry Foundation of China (Grant No. 705011), and the National Special Science and Technology Program Water Pollution Control and Treatment (Grant Nos. 2009ZX07526-006, 2008AX07208-001)
Abstract: A novel genetic programming (GP) technique, a new member of the evolutionary algorithms, was applied to predict the water storage of Wolonghu wetland in response to climate change in northeastern China using a small data set. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions showed a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP), a popular artificial neural network model, and a grey model (GM) were applied to the same data set. The GP technique performed better than the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations in the water storage of wetlands when only a small data set is available.
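The two goodness-of-fit statistics quoted above (MAPE and r) can be computed as follows. This is a generic sketch of the standard definitions, assuming MAPE is expressed in percent as is conventional; the wetland data themselves are not reproduced.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((o - p) / o)
                       for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The two measures are complementary: r = 0.99 says the predicted series tracks the observed one almost perfectly in shape, while MAPE reports how large the errors are relative to the observed storage.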
Funding: National Key Research and Development Program of China (2016YFE0122600).
Abstract: With an increasing number of scientific achievements being published, it is particularly important to conduct literature-based knowledge discovery and data mining. Floods, among the most destructive natural disasters, have been the subject of numerous scientific publications. On January 1, 2018, we collected and processed literature data on flood research and categorized the retrieved paper records into a Whole SCI Dataset (WS) and a High-Citation SCI Dataset (HCS). These data sets can serve as basic data for bibliometric analysis to identify the status of global flood research during 1990-2017. Our study shows that the Chinese Academy of Sciences was the most productive institution during this period, while the United States was the most productive country. In addition, our keyword analysis reveals potential popular issues and future trends in flood research.
Abstract: Recently, owing to the rapid growth in the number of data sensors, massive volumes of data are generated from different sources. Administering such data, that is, storing, managing, and analyzing it and extracting insightful information from it, is a challenging task. Big data analytics is becoming a vital research area in domains such as climate data analysis, which demand fast access to data. The open-source platform MapReduce, a distributed computing framework, is now widely used in many domains of big data analysis. In our work, we have developed a conceptual data-modeling framework for implementing a hybrid data warehouse model that stores the features of National Climatic Data Center (NCDC) climate data. The hybrid data warehouse model for climate big data enables the identification of weather patterns applicable to agricultural and other climate change-related studies, which will play a major role in recommending actions to domain experts and in making contingency plans for extreme weather variability.
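MapReduce splits work into a map phase that emits key/value pairs and a reduce phase that aggregates all values sharing a key. A minimal single-process simulation of that pattern, computing the maximum temperature per year, is shown below. The comma-separated `YYYY-MM-DD,temp` record format is a simplified assumption, not the actual NCDC layout, and a real job would run on a distributed framework such as Hadoop rather than in one process.

```python
from collections import defaultdict
from itertools import chain


def mapper(record):
    """Emit (year, temperature) from a hypothetical 'YYYY-MM-DD,temp' line."""
    date, temp = record.split(",")
    if temp.strip() == "":
        return []                      # skip missing readings
    return [(date[:4], float(temp))]


def shuffle(pairs):
    """Group mapper output by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    return key, max(values)


def run_job(records):
    pairs = chain.from_iterable(mapper(r) for r in records)
    return dict(reducer(k, vs) for k, vs in sorted(shuffle(pairs).items()))
```

In a distributed run, mappers process separate file splits in parallel and each reducer receives only the keys assigned to it; the logic of `mapper` and `reducer` stays the same.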
Funding: Supported by the National Natural Science Foundation of China (Nos. 12072105 and 11932006).
Abstract: Recently, numerous studies have demonstrated that the physics-informed neural network (PINN) can effectively and accurately solve hyperelastic finite deformation problems. In this paper, a PINN framework for tackling hyperelastic-magnetic coupling problems is proposed. Since the solution space consists of two-phase domains, two separate networks are constructed to independently predict the solution in each phase region. In addition, a conscious point allocation strategy is incorporated to improve the prediction precision of the PINN in regions with sharp gradients. With the developed framework, the magnetic fields and deformation fields of magnetorheological elastomers (MREs) are solved under the hyperelastic-magnetic coupling equations. Illustrative examples are provided and compared with reference results to validate the predictive accuracy of the proposed framework. Moreover, the advantages of the proposed framework in solving hyperelastic-magnetic coupling problems are validated, particularly its handling of small data sets and its ability to forecast magnetostrictive motion quickly and precisely.
Funding: The Project of China Coal Geology Group Co., Ltd. (2023HXFWSBXY005).
Abstract: Preserving the soil quality of the siltated back area in the lower reaches of the Yellow River Basin is key to the sustainable ecological development of the basin. Soil quality has gradually become an important part of ecological landscape construction, so evaluating soil quality in the lower reaches of the Yellow River supports the rational utilization of soil resources and can effectively guide the actual development and construction of the silt back area. Siltated soil under three land-use modes was collected in the Gaoqing County section of the lower reaches of the Yellow River Basin, and 16 soil physical and chemical properties were used as evaluation indexes. Principal component analysis was used to account for the correlations between the indexes, and suitable indexes were selected to establish a minimum data set for comprehensively evaluating the quality of the silt back soil. The results show three key aspects of this system. (1) The minimum dataset for evaluating the quality of siltated soil in the siltation area of the lower reaches of the Yellow River comprised six indexes: capillary water holding capacity, available phosphorus, water content, water-stable macroaggregate content, available potassium, and alkaline hydrolyzable nitrogen. The soil quality index SQI-MDS was 0.421, the overall soil quality level was low, and the soil nutrient status was generally characterized by "nitrogen deficiency and potassium deficiency". (2) The linear fit between the full dataset and the minimum dataset (R² = 0.82737) indicated a positive correlation, so the minimum dataset can accurately evaluate the quality of the soil in the silt back area. (3) The soil quality index values of bare land, forest land, and cultivated land were 0.321, 0.581, and 0.360, respectively, with the highest soil quality in forest land and the lowest in bare land. The findings of this paper can provide a theoretical basis and reference for the rational utilization and sustainable development of sedimentary soil in the lower reaches of the Yellow River.
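Minimum-data-set selection of this kind typically scores each indicator by its Norm value across the principal components retained in the PCA, N_i = sqrt(Σ_k u_ik² λ_k), where u_ik is the loading of indicator i on component k and λ_k that component's eigenvalue, and then keeps the highest-scoring indicators in each group. A minimal sketch, assuming the usual conventions of retaining components with eigenvalue ≥ 1 and keeping indicators within 10% of the group's highest Norm; the abstract does not state these thresholds, and the indicator names in the usage note are placeholders.

```python
import math


def norm_values(loadings, eigenvalues, min_eigenvalue=1.0):
    """Norm value of each indicator over the retained principal components.

    loadings[i][k]: loading of indicator i on component k
    eigenvalues[k]: eigenvalue of component k; components with eigenvalue
    below `min_eigenvalue` are dropped (a common convention).
    """
    kept = [k for k, lam in enumerate(eigenvalues) if lam >= min_eigenvalue]
    return [math.sqrt(sum(row[k] ** 2 * eigenvalues[k] for k in kept))
            for row in loadings]


def shortlist(names, norms, within=0.10):
    """Keep indicators whose Norm is within `within` (10%) of the group maximum."""
    top = max(norms)
    return [n for n, v in zip(names, norms) if v >= (1.0 - within) * top]
```

Norms of 1.8, 1.65, and 0.4 would keep the first two indicators (threshold 1.62) and drop the third; in practice highly correlated survivors are then pruned further so the MDS avoids redundancy.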
Funding: Supported by the Major National Science and Technology Project of Water Pollution Control and Treatment, China (No. 2009ZX07317-006), the National Natural Science Foundation of China (No. 40971259), and the Shanghai Excellent Academic Leaders Plan, China (No. 10XD1401600)
Abstract: Soil health assessment is an important step toward understanding the potential effects of agricultural practices on crop yield, quality, and human health. The objectives of this study were to select a minimum data set for soil health evaluation from the physical, chemical, and biological properties and the environmental pollution characteristics of agricultural soil, and to develop a soil health diagnosis model for determining soil health status under different planting patterns and soil types on Chongming Island, Shanghai, China. The results showed that the largest share of farmland soils on Chongming Island (48.9% of the survey samples) was in poor health, followed by soils of medium health (32.2% of the survey samples), which were mainly distributed in the central and mid-eastern regions of the island. The indicators pH, total organic carbon, microbial biomass carbon, and Cd exerted less influence on soil health, whereas soil salinization and nitrate accumulation under greenhouse cropping and phosphate fertilizer shortage in paddy fields limited soil health. Dichlorodiphenyltrichloroethanes, hexachlorocyclohexanes, and Hg contributed little to the soil health index (SHI) and showed no significant difference among paddy fields, greenhouses, and open-air vegetable/watermelon fields. The SHI differed significantly among the three soil types at the P = 0.05 level. The paddy soil had the highest SHI values, followed by the gray alluvial soil, whereas the coastal saline soil was in poor health, indicating a need to plant salt-tolerant crops to effectively improve soil quality.
Funding: Supported by the Ferdowsi University of Mashhad, Iran
Abstract: Assessment of soil quality is important for optimum production and the conservation of natural resources. The qualities of agricultural and pasture soils in the Deh-Sorkh region, south of Mashhad in northeastern Iran, were assessed using the integrated quality index (IQI) and Nemero quality index (NQI) models combined with two data sets, i.e., a total data set (TDS) and a minimum data set (MDS). In this study, 6 soil properties were selected as the MDS out of the 18 properties of the TDS using principal component analysis. Soil samples were divided into 3 groups based on the optimum ranges of 8 soil physical quality indicators: samples with the most indicators in the optimum range formed group 1, and samples with fewer indicators in the optimum range were placed in groups 2 and 3. Optimum ranges of soil pore size distribution functions were also determined as soil physical quality indices based on the 8 soil physical quality indicators, and the pore size distribution curves of group 1 were taken as the optimum pore size functions. The results showed that relatively high organic carbon contents could improve pore size distribution. Mean comparisons of soil physical quality indicators demonstrated that the mean weight diameter of wet aggregates, the structural stability index, the slope of the moisture retention curve at the inflection point, and the plant available water content were significantly lower under agricultural land use than under pasture land use. In addition, the results demonstrated that the studied MDS could be a suitable representative of the TDS. Of the pasture soils, 78% had optimum pore size distribution functions, compared with only 13% of the agricultural soils. In general, the soils of the studied region showed high limitations for plant growth according to the studied indicators.
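The two indices compared in this study combine indicator scores differently: IQI is a weighted sum, while NQI blends the mean and the minimum score so that a single poor indicator pulls the index down. A minimal sketch, assuming the forms commonly used in the soil-quality literature, IQI = Σ W_i N_i and NQI = sqrt((P_ave² + P_min²) / 2) × (n - 1) / n, with weights typically derived from PCA communalities; the abstract does not spell out the formulas.

```python
import math


def iqi(scores, weights):
    """Integrated quality index: weighted sum of indicator scores."""
    return sum(w * s for w, s in zip(weights, scores))


def nqi(scores):
    """Nemero quality index: sqrt((mean^2 + min^2) / 2) * (n - 1) / n."""
    n = len(scores)
    p_ave = sum(scores) / n
    p_min = min(scores)
    return math.sqrt((p_ave ** 2 + p_min ** 2) / 2.0) * (n - 1) / n
```

For scores 0.8, 0.4, and 0.6 the mean is 0.6 but NQI is about 0.34, showing how strongly the weakest indicator (0.4) drags the Nemero index down compared with a plain average.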