Funding: Partly supported by the National Basic Research Program of China (973 Program, 2011CB707802, 2013CB910200) and the National Science Foundation of China (11201466).
Abstract: Traditional model selection criteria try to balance fitting error against model complexity. Before they can be used, assumptions must be made about the distribution of the response or the noise, and these assumptions may be misspecified. In this article, we give a new model selection criterion that minimizes the association strength between the regression residuals and the response. It rests on the assumption that the noise term in the model is independent of the explanatory variables, and so requires fewer assumptions. The Maximal Information Coefficient (MIC), a recently proposed dependence measure, captures a wide range of associations and gives almost the same score to different types of relationships with equal noise, so MIC is used to measure the association strength. Furthermore, the partial maximal information coefficient (PMIC) is introduced to capture the association between two variables after removing the effect of a third, controlling random variable. In addition, a definition of the general partial relationship is given.
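To make the idea of a grid-based dependence score concrete, here is a minimal sketch of a crude MIC-like statistic. It is not the approximation algorithm used in the MIC literature: it only tries equal-frequency grids, assumes all values are distinct (ties are split by position), and the names `crude_mic`, `equal_frequency_bins`, and `mutual_information` are hypothetical. The grid budget `n ** 0.6` follows the commonly cited default for MIC.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal counts.

    Assumption: values are distinct; tied values are split by position.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def crude_mic(x, y, alpha=0.6):
    """Largest normalized MI over equal-frequency grids with nx * ny <= n**alpha."""
    n = len(x)
    budget = n ** alpha
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        bx = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny <= budget:
            by = equal_frequency_bins(y, ny)
            # Normalize by log(min(nx, ny)) so the score lies in [0, 1].
            score = mutual_information(bx, by) / math.log(min(nx, ny))
            best = max(best, score)
            ny += 1
        nx += 1
    return best


# Illustrative data only: a noiseless monotone relationship.
x = [float(i) for i in range(200)]
y = [v ** 3 for v in x]
print(crude_mic(x, y))
```

Because the sketch works on ranks, any noiseless strictly monotone relationship receives the same score, which is a small-scale illustration of the "same score for different relationship types" property described in the abstract.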
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. 2016YJS087, the National Natural Science Foundation of China under Grant No. U1434209, and the Research Foundation of State Key Laboratory of Railway Traffic Control and Safety, Beijing Jiaotong University under Grant No. RCS2016ZJ001.
Abstract: Identifying the important influencing factors is a key issue in railway accident analysis. This paper employs the maximal information coefficient (MIC), a good measure of dependence for two-variable relationships that can capture a wide range of associations, to design a complex network model for railway accident analysis. In the network, nodes denote factors of railway accidents, and an edge is generated between two factors whose MIC value is larger than or equal to the dependence criterion. The variation of the network structure is studied: as the dependence criterion increases, the network becomes an approximately scale-free network. Moreover, the proposed network is used to identify important influencing factors. We find that the annual track density-gross tonnage factor is an important factor, being a cut vertex when the dependence criterion is equal to 0.3. The network also shows that railway development is unbalanced across states, which is consistent with the facts.
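The network construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the factor names and scores are hypothetical placeholders rather than data from the paper, the scores stand in for precomputed MIC values, and cut vertices are found by a simple remove-and-recount check rather than by any method the authors specify.

```python
def build_network(scores, threshold):
    """Adjacency sets: an edge joins two factors whose score >= threshold."""
    nodes = sorted({name for pair in scores for name in pair})
    adjacency = {name: set() for name in nodes}
    for (a, b), value in scores.items():
        if value >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def count_components(adjacency, skip=None):
    """Number of connected components, optionally ignoring one node."""
    seen = set()
    components = 0
    for start in adjacency:
        if start == skip or start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour != skip and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def cut_vertices(adjacency):
    """Nodes whose removal increases the number of connected components."""
    base = count_components(adjacency)
    return [n for n in adjacency if count_components(adjacency, skip=n) > base]


# Hypothetical pairwise dependence scores (placeholders, not real MIC values).
scores = {
    ("factor_a", "factor_b"): 0.50,
    ("factor_b", "factor_c"): 0.40,
    ("factor_a", "factor_c"): 0.32,
    ("factor_c", "factor_d"): 0.35,
}

network = build_network(scores, threshold=0.3)
print(cut_vertices(network))  # ['factor_c']
```

Raising the threshold removes edges, so the set of cut vertices depends on the chosen criterion; in this toy data, a threshold of 0.45 leaves only one edge and no cut vertex.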
Abstract: It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability in this mixed setting. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening property and the ranking consistency property. We conduct simulation studies and provide real data analysis examples to demonstrate its performance in finite samples. In summary, the proposed method offers a way to screen features effectively in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
Funding: Supported by the China Postdoctoral Science Foundation (2019M650981) and the Shandong Provincial Natural Science Foundation, China (ZR2018MG003).
Abstract: In the era of big data, correlation analysis is significant because it can quickly detect correlations between factors, and it has therefore received much attention. Owing to its good properties of generality and equitability, the maximal information coefficient (MIC) is a hotspot in correlation analysis research. However, if the original approximate algorithm for MIC is applied directly to mining correlations in big data, the computation time is very long. The theoretical time complexity of the original approximate algorithm is therefore analyzed in depth; it is n^2.4 under the default parameters. Experiments show that the large number of candidate partitions for random relationships results in the long computation time. The analysis is good preparation for the next step of designing new fast algorithms.
Funding: Shanxi Province Graduate Research Practice Innovation Project, No. 2023KY465; Project on the Reform of Graduate Education and Teaching in Shanxi Province, No. 2021YJJG146; Research Project of Shanxi Provincial Cultural Relics Bureau, No. 22-8-14-1400-119; National Key R&D Program of China, No. 2021YFB3901300.
Abstract: Human activities have significantly impacted the land surface temperature (LST), endangering human health; however, the relationship between these two factors has not been adequately quantified. This study constructs a comprehensive Human Activity Intensity (HAI) index and employs the Maximal Information Coefficient, a four-quadrant model, and an XGBoost-SHAP model to investigate the spatiotemporal relationship and influencing factors of HAI-LST in the Yellow River Basin (YRB) from 2000 to 2020. The results indicate that from 2000 to 2020, as HAI and LST increased, the static HAI-LST relationship in the YRB showed a positive correlation that continued to strengthen. The dynamic relationship exhibited conflicting development, with the proportion of coordinated to conflicting regions shifting from 1:4 to 1:2, indicating a reduction in conflict intensity. Notably, the degree of conflict decreased significantly only in the source area, whereas it intensified in the upper and lower reaches. The key factors influencing the HAI-LST relationship include fractional vegetation cover, slope, precipitation, and evapotranspiration, along with region-specific factors such as PM2.5, biodiversity, and elevation. Based on these findings, region-specific ecological management strategies are proposed to reduce conflict in conflict-prone areas and alleviate thermal stress, thereby providing important guidance for promoting harmonious development between humans and nature.
Funding: Supported by the Science Center Program of National Natural Science Foundation of China (62188101), the National Natural Science Foundation of China (61833009, 61690212, 51875119), the Heilongjiang Touyan Team, and the Guangdong Major Project of Basic and Applied Basic Research (2019B030302001).
Abstract: The momentum wheel plays a dominant role as an inertial actuator in satellite attitude control systems. Owing to structural aging and external interference, the momentum wheel may gradually develop irreversible faults, whose features become apparent in the telemetry signal transmitted by the momentum wheel. This paper introduces ADTWformer, a lightweight model for long-term time series prediction, to analyze the time evolution trend and the multi-dimensional data coupling mechanism of satellite momentum wheel faults. Moreover, combining the approximate Markov blanket with the maximum information coefficient yields a novel methodology for correlation analysis from a data-centric standpoint. Finally, an adaptive alarm mechanism achieves momentum wheel fault warning by detecting changes in the health status curves. The analysis methodology outlined in this article has shown positive results in identifying satellite momentum wheel failures in two scenarios, showing considerable promise for large-scale applications.
Funding: National Natural Science Foundation of China (62161048); Sichuan Science and Technology Program (2022NSFSC0547, 2022ZYD0109).
Abstract: In this paper, a feature selection method for determining input parameters in antenna modeling is proposed. In antenna modeling, the input features of the artificial neural network (ANN) are geometric parameters. The selection criteria are the correlation and the sensitivity between a geometric parameter and the electromagnetic (EM) response. The maximal information coefficient (MIC), an exploratory data mining tool, is introduced to evaluate both linear and nonlinear correlations. The EM response range is used to evaluate the sensitivity: a wide response range over the varying values of a parameter implies that the parameter is highly sensitive, and a narrow response range suggests that it is insensitive. Only parameters that are both highly correlated and sensitive are selected as inputs of the ANN, so the sampling space of the model is greatly reduced. The modeling of a wideband, circularly polarized antenna is studied as an example to verify the effectiveness of the proposed method. The number of input parameters decreases from 8 to 4. The testing errors of |S11| and axial ratio are reduced by 8.74% and 8.95%, respectively, compared with the ANN without feature selection.
Funding: This work was supported by the National Natural Science Foundation of China (No. U1862201, 91834303 and 22208208), the China Postdoctoral Science Foundation (No. 2022M712056), and the China National Postdoctoral Program for Innovative Talents (No. BX20220205).
Abstract: The present study extracts human-understandable insights from machine learning (ML)-based mesoscale closure in fluid-particle flows via several novel data-driven analysis approaches, i.e., maximal information coefficient (MIC), interpretable ML, and automated ML. It was previously shown that the solid volume fraction has the greatest effect on the drag force. The present study aims to quantitatively investigate the influence of flow properties on the mesoscale drag correction (H_d). The MIC results show strong correlations between the features (i.e., slip velocity (u*_sy) and particle volume fraction (ε_s)) and the label H_d. The interpretable ML analysis confirms this conclusion and quantifies the contributions of u*_sy, ε_s, and the gas pressure gradient to the model as 71.9%, 27.2%, and 0.9%, respectively. Automated ML, which does not require selecting the model structure and hyperparameters, is used for modeling, improving the prediction accuracy over our previous model (Zhu et al., 2020; Ouyang, Zhu, Su, & Luo, 2021).
Funding: Supported by the National Key R&D Program of China (No. 2017YFB0902800) and the National Natural Science Foundation of China (Grant No. 52077136).
Abstract: This paper proposes a data-driven topology identification method for distribution systems with distributed energy resources (DERs). First, a neural network is trained to capture the relationship between nodal power injections and voltage magnitude measurements. It is then used to generate synthetic measurements under independent nodal power injections, thus eliminating the influence of correlated nodal power injections on topology identification. Second, a maximal information coefficient-based maximum spanning tree algorithm is developed to obtain the network topology by evaluating the dependence among the synthetic measurements. The proposed method is tested on different distribution networks, and the simulation results are compared with those of other methods to validate its effectiveness.
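The second step of this abstract, building a maximum spanning tree from pairwise dependence scores, can be sketched as follows. This is a minimal illustration, assuming the pairwise scores have already been computed. The abstract does not say which spanning tree algorithm the authors use, so Kruskal's algorithm is a choice made here, and the node indices and weights are hypothetical placeholders.

```python
def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm on descending weights; returns the chosen (u, v, w) edges."""
    parent = list(range(n_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical dependence scores between the measurements of four nodes.
edges = [
    (0, 1, 0.9),
    (1, 2, 0.8),
    (0, 2, 0.7),
    (2, 3, 0.6),
    (1, 3, 0.3),
]

tree = maximum_spanning_tree(4, edges)
print(tree)  # [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.6)]
```

The tree keeps the strongest dependencies that connect all nodes without forming a cycle, which matches the radial structure typical of distribution networks.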