Funding: Under the auspices of the Special Fund for Ocean Public Welfare Profession Scientific Research (No. 201105020), the National Natural Science Foundation of China (No. 41471178, 41023010, 41431177), and the National Key Technology Innovation Project for Water Pollution Control and Remediation (No. 2013ZX07103006).
Abstract: The spatial distribution of soil salinity can be estimated from environmental factors because soil salinity is strongly affected, and indicated, by them. Unlike other properties such as soil texture, soil salinity varies over short time scales, so choosing strong environmental predictors is especially important. This paper presents a similarity-based prediction approach to map soil salinity and identifies strong environmental predictors for the Huanghe (Yellow) River Delta area in China. The similarity-based approach predicts the soil salinity of unsampled locations from the environmental similarity between unsampled and sampled locations. A dataset of 92 points with salt data at a depth of 30–40 cm was divided into two subsets, one for prediction and one for validation. Topographical parameters, soil textures, distances to irrigation channels and to the coastline, land surface temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS), and Normalized Difference Vegetation Indices (NDVIs) and land surface reflectance data from Landsat Thematic Mapper (TM) imagery were generated. The similarity-based prediction approach was applied to several combinations of environmental factors. Based on three evaluation indices (the correlation coefficient (CC) between observed and predicted values, the mean absolute error, and the root mean squared error), we found that elevation, distance to irrigation channels, soil texture, night land surface temperature, NDVI, and land surface reflectance Band 5 form the optimal combination for mapping soil salinity at the 30–40 cm depth in the study area (CC of 0.69 and root mean squared error of 0.38). Our results indicate that the similarity-based prediction approach could be a viable alternative to other methods for mapping soil salinity, especially for areas with limited observation data, and could be used to monitor soil salinity distributions in the future.
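The prediction and evaluation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the limiting-factor similarity rule, the covariate ranges, and the sample values are all assumptions made for the example.

```python
import math

def similarity(a, b, ranges):
    # Assumed limiting-factor rule: overall similarity is the minimum of the
    # per-covariate similarities, each scaled by an assumed covariate range.
    return min(max(0.0, 1.0 - abs(x - y) / r) for x, y, r in zip(a, b, ranges))

def predict(target_env, samples, ranges):
    # samples: list of (environment_vector, measured_salinity) pairs.
    weights = [similarity(target_env, env, ranges) for env, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return None  # no sampled site resembles the target location
    # Similarity-weighted mean of the sampled salinity values.
    return sum(w * value for w, (_, value) in zip(weights, samples)) / total

def evaluate(observed, predicted):
    # Returns the three indices named in the abstract: CC, MAE, RMSE.
    n = len(observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cc = cov / math.sqrt(var_o * var_p)
    return cc, mae, rmse

# Made-up sites: (elevation in m, distance to irrigation channel in m) -> salinity
SAMPLES = [((2.0, 500.0), 0.4), ((6.0, 1500.0), 1.2)]
RANGES = (4.0, 1000.0)
```

Calling `predict((4.0, 1000.0), SAMPLES, RANGES)` weights both made-up sites equally, since the target sits halfway between them on each covariate. Comparing covariate combinations, as the study does, would amount to repeating `predict` with different subsets of the environment vector and comparing the `evaluate` results on the validation subset.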
Funding: Supported by the National Natural Science Foundation of China (41130530, 91325301, 41431177, 41571212, 41401237), the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science, Chinese Academy of Sciences (ISSASIP1622), the Government Interest Related Program between Canadian Space Agency and Agriculture and Agri-Food, Canada (13MOA01002), and the Natural Science Research Program of Jiangsu Province (14KJA170001).
Abstract: Conventional soil maps generally contain one or more soil types within a single soil polygon, but the geographic locations of those types within the polygon are not specified. This restricts current applications of the maps in site-specific agricultural management and environmental modelling. We examined the utility of legacy pedon data for disaggregating soil polygons and the effectiveness of similarity-based prediction for making use of under- or over-sampled legacy pedon data in the disaggregation. The method consisted of three steps. First, environmental similarities between the pedon sites and each location were computed based on soil-formative environmental factors. Second, according to the soil types of the pedon sites, the similarities were aggregated to derive a similarity distribution for each soil type. Third, a hardening process was performed on the maps to allocate candidate soil types within the polygons. The study was conducted at the soil subgroup level in a semi-arid area in Manitoba, Canada. Based on 186 independent pedon sites, the evaluation of the disaggregated map of soil subgroups showed an overall accuracy of 67% and a Kappa statistic of 0.62. The map represented the spatial pattern of soil subgroups better, in both detail and accuracy, than the dominant soil subgroup map commonly used in practice. Incorrect predictions mainly occurred in the agricultural plain area and among soil subgroups that are very similar in taxonomy, indicating that new environmental covariates need to be developed. We concluded that combining legacy pedon data with similarity-based prediction is an effective solution for soil polygon disaggregation.
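The three steps above can be sketched as follows. This is a hedged illustration only: the minimum-based similarity rule, the maximum-based aggregation per soil type, the covariate ranges, and the soil type labels "A", "B", "C" are all assumptions for the example, not details taken from the study.

```python
def env_similarity(a, b, ranges):
    # Assumed rule: minimum of per-covariate similarities in [0, 1].
    return min(max(0.0, 1.0 - abs(x - y) / r) for x, y, r in zip(a, b, ranges))

def disaggregate(location_env, pedons, candidates, ranges):
    # pedons: list of (environment_vector, soil_type) legacy pedon sites.
    # candidates: soil types the map lists for the polygon containing the location.
    by_type = {}
    for env, soil_type in pedons:
        # Step 1: similarity between the location and each pedon site.
        s = env_similarity(location_env, env, ranges)
        # Step 2: aggregate per soil type (assumed here: keep the maximum).
        by_type[soil_type] = max(by_type.get(soil_type, 0.0), s)
    # Step 3: hardening, restricted to the polygon's candidate soil types.
    scored = {t: by_type.get(t, 0.0) for t in candidates}
    return max(scored, key=scored.get), scored

# Hypothetical pedon sites with two covariates each.
PEDONS = [((10.0, 2.0), "A"), ((30.0, 8.0), "B"), ((12.0, 2.5), "C")]
RANGES = (40.0, 10.0)
```

In this toy data, `disaggregate((12.0, 2.5), PEDONS, ["A", "B"], RANGES)` assigns type "A": the pedon of type "C" is the closest match, but "C" is not among the polygon's candidates, so hardening ignores it.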
Funding: This study was financially supported by the CNPC Innovation Foundation (2020D-5007-0214), the Major Strategic Project of CNPC (ZLZX2020-01-04), and the Beijing Municipal Excellent Talent Training Funds Youth Advanced Individual Project (2018000020124G163).
Abstract: Tight conglomerate reservoirs are characterized by extremely low permeability, strong heterogeneity, and poor water injectivity. CO2 huff-n-puff has been considered a promising candidate for enhancing oil recovery in tight reservoirs, owing to its advantages in reducing oil viscosity, improving the mobility ratio, quickly replenishing formation pressure, and potentially achieving a miscible state. However, reliable in-house laboratory evaluation of CO2 huff-n-puff in natural conglomerate cores is challenging because of the inherently high formation pressure. In this study, we put forward an equivalent method, based on the similarity of the miscibility index and the Grashof number, to obtain a lab-controllable pressure that reproduces the flow characteristics of CO2 injection in a tight conglomerate reservoir. The impacts of depletion degree, injected pore volume of CO2, and soaking time on ultimate oil recovery in tight cores from the Mahu conglomerate reservoir were successfully tested at an equivalent pressure. Our results showed that oil recovery decreased with increasing depletion degree, while it exhibited a non-monotonic tendency (first increasing, then decreasing) with increasing CO2 injection volume and soaking time. The lower oil recoveries under excess CO2 injection and soaking time were attributed to limited CO2 dissolution and asphaltene precipitation. This work guides secure and reliable laboratory design of CO2 huff-n-puff in tight reservoirs with high formation pressure.
Funding: This research was supported by the China National Key R&D Program during the 13th Five-year Plan Period (Grant No. 2018YFC0807000) and the China National Science Foundation for Post-doctoral Scientists (Grant No. 2019M660663).
Abstract: Emergency events need early detection, quick response, and accurate recovery. In the era of big data, social media platforms are becoming widely used, and social media users can be seen as social sensors for monitoring emergency events in real time. In this paper, a similarity-based method is proposed for the early detection of all kinds of emergency events in social media, including natural disasters, accidents, public health events, and social security events. The method focuses on clustering social media texts based on the 3W attribute information (What, When, and Where) of events. First, a two-step classification detects emergency-related messages among massive irrelevant data and divides them into different types. Second, the time and location information are extracted with regular expression matching and a BiLSTM model, respectively. Finally, text similarity is calculated using the type, time, and location information, and social media texts are clustered into different events on that basis. Experiments on Sina Weibo data demonstrate the superiority of the proposed framework. Case studies on real emergency events show that the proposed framework has good performance and high timeliness. Because the attribute information of events is extracted during the algorithm flow, the method can describe what emergency occurred and when and where it happened.
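The time-extraction and clustering steps can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the timestamp pattern, the equal weighting of the three 3W components, the 6-hour window, the 0.8 threshold, and the greedy single-pass clustering are all hypothetical choices. The event type and location are taken as already-extracted fields, standing in for the classifier and the BiLSTM model.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp format "YYYY-MM-DD HH:MM"; real posts would need more patterns.
TIME_PATTERN = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2}) (\d{1,2}):(\d{2})")

def extract_time(text):
    # Returns the first timestamp found in the text, or None.
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    year, month, day, hour, minute = map(int, match.groups())
    return datetime(year, month, day, hour, minute)

def similarity(a, b, window=timedelta(hours=6)):
    # Equal-weight mean of What / Where / When scores (an assumption).
    what = 1.0 if a["type"] == b["type"] else 0.0
    where = 1.0 if a["location"] == b["location"] else 0.0
    gap = abs(a["time"] - b["time"])
    when = max(0.0, 1.0 - gap / window)  # timedelta / timedelta gives a float
    return (what + where + when) / 3.0

def cluster(messages, threshold=0.8):
    # Greedy single pass: join the first cluster whose seed message is similar
    # enough, otherwise start a new cluster (one new event).
    clusters = []
    for msg in messages:
        for members in clusters:
            if similarity(msg, members[0]) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters

# Made-up messages with pre-extracted attributes.
MESSAGES = [
    {"type": "earthquake", "location": "Chengdu", "time": datetime(2020, 3, 1, 14, 0)},
    {"type": "earthquake", "location": "Chengdu", "time": datetime(2020, 3, 1, 15, 0)},
    {"type": "fire", "location": "Wuhan", "time": datetime(2020, 3, 1, 14, 30)},
]
```

With these made-up messages, `cluster(MESSAGES)` groups the two earthquake reports into one event and keeps the fire report separate. Each resulting cluster carries its own type, time, and location, which is what lets the method describe the detected event.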
Abstract: Feature selection (FS) is a pivotal pre-processing step in developing data-driven models, influencing reliability, performance, and optimization. Although existing FS techniques can yield high performance metrics for certain models, they do not always guarantee the extraction of the most critical or impactful features. Prior literature underscores the significance of equitable FS practices and has proposed diverse methodologies for identifying appropriate features. However, the challenge of discerning the most relevant and influential features persists, particularly given the exponential growth and heterogeneity of big data, a challenge that is increasingly salient in modern artificial intelligence (AI) applications. In response, this study introduces an automated statistical method termed Farea Similarity for Feature Selection (FSFS). The FSFS approach computes a similarity metric for each feature by benchmarking it against the record-wise mean, thereby finding feature dependencies and mitigating the influence of outliers that could distort evaluation outcomes. Features are subsequently ranked according to their similarity scores, with the threshold set at the average similarity score. Notably, lower FSFS values indicate higher similarity and stronger data correlations, whereas higher values suggest lower similarity. The FSFS method is designed not only to yield reliable evaluation metrics but also to reduce data complexity without compromising model performance. Comparative analyses were performed against several established techniques, including Chi-squared (CS), Correlation Coefficient (CC), Genetic Algorithm (GA), Exhaustive Approach, Greedy Stepwise Approach, Gain Ratio, and Filtered Subset Eval, using a variety of datasets: the Experimental Dataset, Breast Cancer Wisconsin (Original), KDD CUP 1999, NSL-KDD, UNSW-NB15, and Edge-IIoT. Without the FSFS method, the highest classifier accuracies observed were 60.00%, 95.13%, 97.02%, 98.17%, 95.86%, and 94.62% for the respective datasets. When the FSFS technique was integrated with data normalization, encoding, balancing, and feature importance selection processes, accuracies were 100.00%, 97.81%, 98.63%, 98.94%, 94.27%, and 98.46%, respectively, an improvement on every dataset except UNSW-NB15. The FSFS method, with a computational complexity of O(fn log n), scales well and is suited to large datasets, remaining efficient even when the number of features is substantial. By automatically eliminating outliers and redundant data, FSFS reduces computational overhead, resulting in faster training and improved model performance. Overall, the FSFS framework not only optimizes performance but also enhances the interpretability and explainability of data-driven models, thereby facilitating more trustworthy decision-making in AI applications.
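The score, rank, and threshold pipeline described above can be sketched as follows. The abstract does not give the FSFS formula, so this is not the FSFS metric itself: the score used here (mean absolute deviation of a feature from the record-wise mean, on pre-normalized data) and the decision to keep features at or below the average score are both assumptions made purely for illustration.

```python
def deviation_scores(rows):
    # rows: list of records, each a list of normalized feature values.
    n_features = len(rows[0])
    # Record-wise mean: one mean per record, taken across its features.
    row_means = [sum(row) / n_features for row in rows]
    # Assumed score: mean absolute deviation of feature j from the record-wise
    # mean. Lower score = feature tracks the records more closely.
    return [
        sum(abs(row[j] - mean) for row, mean in zip(rows, row_means)) / len(rows)
        for j in range(n_features)
    ]

def select_features(rows):
    scores = deviation_scores(rows)
    # Threshold at the average score, as the abstract describes.
    threshold = sum(scores) / len(scores)
    # Rank feature indices from lowest (most similar) to highest score.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])
    # Assumption: keep features whose score does not exceed the threshold.
    kept = [j for j in ranked if scores[j] <= threshold]
    return kept, scores, threshold

# Made-up, already normalized data: 2 records x 3 features.
ROWS = [[0.5, 0.5, 0.0], [0.4, 0.6, 1.0]]
```

On this toy input, `select_features(ROWS)` keeps features 1 and 0 (in that rank order) and drops feature 2, whose values sit furthest from each record's mean.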
Abstract: Similarity has been playing an important role in computer science, artificial intelligence (AI), and data science. However, similarity intelligence has been ignored in these disciplines. Similarity intelligence is a process of discovering intelligence through similarity. This article explores similarity intelligence, similarity-based reasoning, and similarity computing and analytics. More specifically, it looks at similarity as an intelligence and at its impact on a few real-world areas. It examines how similarity intelligence, alongside experience-based, knowledge-based, and data-based intelligence, plays an important role in computer science, AI, and data science. The article explores similarity-based reasoning (SBR) and proposes three similarity-based inference rules. It then examines similarity computing and analytics, and a multiagent SBR system. The main contributions of this article are: 1) Similarity intelligence is discovered from experience-based intelligence, which consists of data-based intelligence and knowledge-based intelligence. 2) Similarity-based reasoning, computing, and analytics can be used to create similarity intelligence. The proposed approach will facilitate research and development of similarity intelligence, similarity computing and analytics, machine learning, and case-based reasoning.
Funding: Supported by the 2013 Comprehensive Reform Pilot of Marine Engineering Specialty (No. ZG0434).
Abstract: Knowledge-based modeling is a trend in complex system modeling technology. To extract process knowledge from an information system, this paper presents an approach to knowledge modeling based on interval-valued fuzzy rough sets, in which attribute reduction is the key to obtaining a simplified knowledge model. By defining dependency and inclusion functions, algorithms for attribute reduction and rule extraction are obtained. Approximate inference plays an important role in the development of fuzzy systems. To improve the inference mechanism, we provide a method of similarity-based inference in an interval-valued fuzzy environment. Combining the conventional compositional rule of inference with similarity-based approximate reasoning, an inference result is deduced via rule translation, similarity matching, relation modification, and projection operation. This approach is applied to the problem of predicting welding distortion in marine structures, and the experimental results validate the effectiveness of the proposed methods of knowledge modeling and similarity-based inference.
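Of the four inference stages listed above, the similarity-matching stage is the easiest to illustrate. The sketch below assumes a common distance-based similarity between interval-valued fuzzy sets (one minus the mean absolute difference of the interval bounds); the abstract does not state which measure the paper uses, and the rule names and membership intervals are invented for the example.

```python
def iv_similarity(a, b):
    # a, b: interval-valued fuzzy sets over the same universe, given as lists
    # of (lower, upper) membership intervals. Assumed measure: one minus the
    # mean absolute difference of lower and upper bounds.
    n = len(a)
    distance = sum(
        abs(al - bl) + abs(au - bu) for (al, au), (bl, bu) in zip(a, b)
    ) / (2 * n)
    return 1.0 - distance

def best_matching_rule(observation, rules):
    # Similarity matching: pick the rule whose antecedent is closest to the
    # observed interval-valued fuzzy set.
    return max(rules, key=lambda rule: iv_similarity(observation, rule["antecedent"]))

# Hypothetical rule base with two rules over a two-element universe.
RULES = [
    {"name": "low_distortion", "antecedent": [(0.2, 0.4), (0.6, 0.8)]},
    {"name": "high_distortion", "antecedent": [(0.0, 0.2), (0.9, 1.0)]},
]
```

The remaining stages (rule translation, relation modification, and projection) would then operate on the matched rule. They are not sketched here because the abstract gives no detail about them.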
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62376281 and 62036013) and the NSF for Huxiang Young Talents Program of Hunan Province (2021RC3070).
Abstract: Because multi-instance multi-label learning (MIML) is highly expressive for objects with multiple semantics and flexible for complex object structures, it has been widely applied. In practical MIML tasks, the naturally skewed label distribution and label interdependence give rise to a label imbalance issue and decrease model performance, a problem that is rarely studied. To solve these problems, we propose an imbalanced multi-instance multi-label learning method via tensor product-based semantic fusion (IMIML-TPSF), which deals with label interdependence and label distribution imbalance simultaneously. Specifically, to reduce the effect of label interdependence, it models the similarity between the query object and the object sets of different label classes to obtain similarity-structural features. To alleviate the disturbance caused by the imbalanced label distribution, it builds an ensemble model to obtain imbalanced distribution features. Subsequently, IMIML-TPSF fuses the two types of features by tensor product and generates a new feature vector, which preserves the original and interactive feature information for each bag. Based on these semantically rich features, it trains a robust generalized linear classification model and further captures label interdependence. Extensive experimental results on several datasets validate the effectiveness of IMIML-TPSF against state-of-the-art methods.
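One way a tensor product can keep both the original and the interactive feature information is sketched below. Prepending a constant 1 to each vector before taking the outer product is an assumption borrowed from common tensor-fusion practice, not a detail confirmed by the abstract, and the function name is hypothetical.

```python
def tensor_fuse(u, v):
    # Prepend a constant 1 to each vector (assumed trick) so the flattened
    # outer product contains: 1, every v_j, every u_i, and every u_i * v_j.
    u_aug = [1.0] + list(u)
    v_aug = [1.0] + list(v)
    # Row-major flattening of the outer product u_aug x v_aug.
    return [a * b for a in u_aug for b in v_aug]

# Made-up vectors standing in for similarity-structural features (u) and
# imbalanced distribution features (v) of one bag.
fused = tensor_fuse([2.0, 3.0], [5.0])
```

Here `fused` has (2 + 1) × (1 + 1) = 6 entries: the constant, the original values 5, 2, and 3, and the interaction terms 10 and 15. A vector of this kind could then be passed to a linear classifier, which is the role the abstract assigns to the generalized linear model.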