Recent studies have pointed out the potential of the odd Fréchet family (or class) of continuous distributions in fitting data of all kinds. In this article, we propose an extension of this family through the so-called "Topp-Leone strategy", aiming to improve its overall flexibility by adding a shape parameter. The main objective is to offer original distributions with modifiable properties, from which adaptive and pliant statistical models can be derived. For the new family, these aspects are illustrated by means of comprehensive mathematical and numerical results. In particular, we emphasize a special three-parameter distribution based on the exponential distribution. The related model is shown to be well suited to fitting various lifetime data, more or less heterogeneous. Among all the possible applications, we consider two data sets of current interest, linked to the COVID-19 pandemic. They concern the daily confirmed and recovered cases in Pakistan from March 24 to April 28, 2020. According to our analyses, the proposed model gives the best fit in comparison with serious challengers, including the former odd Fréchet model.
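The abstract does not state the distribution function, so the following is only a sketch under an assumption: that the "Topp-Leone strategy" means applying the usual Topp-Leone-G transform, F = [1 - (1 - H)^2]^alpha, to the odd Fréchet-G distribution function H(x) = exp(-[(1 - G(x))/G(x)]^theta), with an exponential baseline G. The function names and the parameter names alpha, theta and rate are hypothetical.

```python
import math


def exp_cdf(x, rate):
    """Baseline exponential CDF G(x) = 1 - exp(-rate * x) for x > 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def tl_odd_frechet_exp_cdf(x, alpha, theta, rate):
    """Assumed CDF: {1 - [1 - exp(-((1 - G)/G)**theta)]**2}**alpha."""
    g = exp_cdf(x, rate)
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return 1.0
    # Work on the log scale so a very large odds ratio cannot overflow.
    power = theta * math.log((1.0 - g) / g)
    odd_frechet = math.exp(-math.exp(power)) if power < 700.0 else 0.0
    return (1.0 - (1.0 - odd_frechet) ** 2) ** alpha


# Evaluate the assumed CDF at a few points (parameter values are arbitrary).
values = [tl_odd_frechet_exp_cdf(x, alpha=2.0, theta=1.5, rate=1.0) for x in (0.1, 1.0, 5.0)]
```

With three free parameters (alpha, theta and the exponential rate), this matches the parameter count mentioned in the abstract, but the exact parametrization used by the authors may differ.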
This study proposes an approach based on machine learning to forecast currency exchange rates by applying sentiment analysis to messages on Twitter (called tweets). A dataset of the exchange rates between the United States Dollar (USD) and the Pakistani Rupee (PKR) was formed by collecting information from a forex website as well as a collection of tweets from the business community in Pakistan containing finance-related words. The dataset was collected in raw form and was subjected to natural language processing by way of data preprocessing. Response variable labeling was then applied to the standardized dataset, where the response variables were divided into two classes: "1" indicated an increase in the exchange rate and "−1" indicated a decrease in it. To better represent the dataset, we used linear discriminant analysis and principal component analysis to visualize the data in three-dimensional vector space. Clusters that were obtained using a sampling approach were then used for data optimization. Five machine learning classifiers (the simple logistic classifier, the random forest, bagging, naïve Bayes, and the support vector machine) were applied to the optimized dataset. The results show that the simple logistic classifier yielded the highest accuracy, 82.14%, in forecasting the USD/PKR exchange rate.
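The two-class response labeling described above can be sketched as follows. This is an illustration only: the function name, the sample rates, and the choice to leave unchanged days unlabeled are assumptions, since the abstract does not say how ties were handled.

```python
def label_rate_moves(rates):
    """Return (index, label) pairs: +1 if the rate rose from the previous
    observation, -1 if it fell."""
    labels = []
    for i in range(1, len(rates)):
        change = rates[i] - rates[i - 1]
        if change > 0:
            labels.append((i, 1))
        elif change < 0:
            labels.append((i, -1))
        # Unchanged rates get no label here (an assumption, not from the source).
    return labels


daily_usd_pkr = [155.0, 156.2, 156.2, 155.8]  # made-up values for illustration
print(label_rate_moves(daily_usd_pkr))  # prints [(1, 1), (3, -1)]
```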
Accelerated life testing has been widely used in product life testing experiments because it can quickly provide information on lifetime distributions by testing products or materials at stress levels higher than normal operating conditions, such as pressure, temperature, vibration, voltage, or load, to induce early failures. In this paper, a step-stress partially accelerated life test (SSPALT) is considered under progressive type-II censored data with random removals. The removals from the test are assumed to follow the binomial distribution. The lifetimes of the test items are assumed to follow the length-biased weighted Lomax distribution. The maximum likelihood method is used for estimating the model parameters of the length-biased weighted Lomax distribution. The asymptotic confidence interval estimates of the model parameters are evaluated using the Fisher information matrix. The Bayesian estimators cannot be obtained in explicit form, so the Markov chain Monte Carlo method is employed to address this problem, which allows both obtaining the Bayesian estimates and constructing the credible intervals of the involved parameters. The precision of the Bayesian estimates and the maximum likelihood estimates is compared by simulations. In addition, the performance of the considered confidence intervals is compared for different parameter values and sample sizes. The Bootstrap confidence intervals give more accurate results than the approximate confidence intervals, since the lengths of the former are less than the lengths of the latter for different sample sizes, observed failures, and censoring schemes, in most cases. Also, the percentile Bootstrap confidence intervals give more accurate results than the Bootstrap-t intervals, since the lengths of the former are less than the lengths of the latter for different sample sizes, observed failures, and censoring schemes, in most cases. Further performance comparison is conducted by experiments with real data.
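As a rough illustration of the percentile Bootstrap interval mentioned above, here is a generic sketch for a complete (uncensored) sample and an arbitrary estimator. It is not the authors' procedure for progressively censored SSPALT data; the function name, the sample values and the default settings are assumptions.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, estimator, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval: resample with replacement, re-estimate,
    and take empirical quantiles of the bootstrap estimates."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        estimator([rng.choice(sample) for _ in range(n)]) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper


lifetimes = [1.2, 0.7, 3.4, 2.2, 0.9, 1.8, 2.9, 1.1]  # made-up lifetimes
low, high = percentile_bootstrap_ci(lifetimes, statistics.mean)
interval_length = high - low  # the quantity compared between methods in the abstract
```

The interval length computed at the end is the criterion the abstract uses when it calls one interval type more accurate than another.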
The interface between computer science and statistics has developed considerably in recent years, with exponential progress in the fields of data analysis, stochastic modeling, machine learning, econometrics, simulation, algorithms, classification, and networks. Innovative discoveries in this field appear every day, opening new scientific horizons for the modern world. This is especially true in the post-2020 period, with the treatment of large volumes of data that feed the daily operations of large corporations, as well as the development of artificial intelligence, including advanced machine learning techniques, particularly "deep learning".
Survival data with a multi-state structure are frequently observed in follow-up studies. An analytic approach based on a multi-state model (MSM) should be used in longitudinal health studies in which a patient experiences a sequence of clinical progression events. One main objective in the MSM framework is variable selection, where attempts are made to identify the risk factors associated with the transition hazard rates or probabilities of disease progression. The usual variable selection methods, including stepwise and penalized methods, do not provide information about the importance of variables. In this context, we present a two-step algorithm to evaluate the importance of variables for multi-state data. Three different machine learning approaches (random forest, gradient boosting, and neural network), as the most widely used methods, are considered to estimate the variable importance in order to identify the factors affecting disease progression and rank these factors according to their importance. The performance of our proposed methods is validated by simulation and applied to the COVID-19 data set. The results revealed that the proposed two-stage method has promising performance for estimating variable importance.
This paper introduces a new rich family of distributions based on mixtures and the so-called Marshall-Olkin family of distributions. It includes a wide variety of well-established mixture distributions, ensuring a high ability for data fitting. Some distributional properties are derived for the general family. The Weibull distribution is then considered as the baseline, yielding a pliant four-parameter lifetime distribution. Five estimation methods for the related parameters are discussed. Bootstrap confidence intervals are also considered for these parameters. The distribution is reparametrized with location-scale parameters and used for a lifetime regression analysis. An extensive simulation is carried out on the estimation methods for the distribution parameters and the regression model parameters. Applications to two practical data sets illustrate the applicability of the new family.
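For orientation, the standard Marshall-Olkin family on which the construction builds can be written as below. This is the classical definition only, not the new mixture-based family, whose form is not given in the abstract. Here \bar{G} = 1 - G is the survival function of the baseline distribution and \alpha is the tilt parameter.

```latex
\bar{F}(x;\alpha) \;=\; \frac{\alpha\,\bar{G}(x)}{1-(1-\alpha)\,\bar{G}(x)},
\qquad \alpha > 0 .
```

Setting \alpha = 1 recovers the baseline survival function \bar{G}(x).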
This paper is devoted to the derivation of macroscopic fluid dynamics from the Boltzmann mesoscopic dynamics of a binary mixture of hard-sphere gas particles. Specifically, the hydrodynamic limit is performed by employing different time and space scalings. The paper shows that, depending on the magnitude of the parameters which define the scaling, the macroscopic quantities (number density, mean velocity and local temperature) are solutions of the acoustic equation, the linear incompressible Euler equation and the incompressible Navier–Stokes equation. The derivation is formally tackled by the moment method proposed in [C. Bardos, et al., J. Stat. Phys. 63 (1991) 323], and the results generalize the analysis performed in [C. Bianca, et al., Commun. Nonlinear Sci. Numer. Simulat. 29 (2015) 240].
Understanding a phenomenon from observed data requires contextual and efficient statistical models. Such models are based on probability distributions having sufficiently flexible statistical properties to adapt to a maximum of situations. Modern examples include the distributions of the truncated Fréchet generated family. In this paper, we go even further by introducing a more general family, based on a truncated version of the generalized Fréchet distribution. This generalization involves a new shape parameter modulating to the extreme some central and dispersion parameters, as well as the skewness and weight of the tails. We also investigate the main functions of the new family, the stress-strength parameter, diverse functional series expansions, incomplete moments, various entropy measures, bivariate extensions through the use of copulas, and the theoretical and practical estimation of the model parameters. By considering a special member of the family having the Weibull distribution as the parent, we fit two data sets of interest, one about waiting times and the other about precipitation. Solid statistical criteria attest that the proposed model is superior to other extended Weibull models, including the one derived from the former truncated Fréchet generated family.
The purpose of this research is the segmentation of lung computed tomography (CT) scans for the diagnosis of COVID-19 by using machine learning methods. Our dataset contains data from patients who are prone to the epidemic. It contains three types of lung CT images (Normal, Pneumonia, and COVID-19) collected from two different sources: the first is the Radiology Department of Nishtar Hospital Multan and Civil Hospital Bahawalpur, Pakistan, and the second is a freely available public medical imaging database known as Radiopaedia. For the preprocessing, a novel fuzzy c-mean automated region-growing segmentation approach is deployed to extract automated regions of interest (ROIs) and acquire 52 hybrid statistical features for each ROI. Also, 12 optimized statistical features are selected via the chi-square feature reduction technique. For the classification, five machine learning classifiers, namely deep learning J4, multilayer perceptron, support vector machine, random forest, and naive Bayes, are applied to the optimized hybrid statistical features dataset. It is observed that deep learning J4 has promising results (sensitivity and specificity: 0.987; accuracy: 98.67%) among all the deployed classifiers. As a complementary study, a statistical work is devoted to the use of a new statistical model to fit the main datasets of COVID-19 collected in Pakistan.
In this paper, we introduce a modified family of distributions that unifies three different families with only one tuning parameter: the so-called exp-G, Topp–Leone-G and exp-half-G families of distributions. We study mathematical properties of the proposed family, including linear representations, quantile function, probability weighted moments, reliability parameter and stochastic ordering. One of the corresponding parametric statistical models is outlined, with estimation of the parameters by the method of maximum likelihood and an investigation of possible applications to glycosaminoglycans concentration level in urine, in comparison with the beta Weibull and Kumaraswamy Weibull distributions. The goodness-of-fit of five other members of the family is also assessed. A regression model based on the proposed distribution is also discussed and applied to establish the relationship between the glycosaminoglycans concentration level and the age of the children.
On the basis of a well-established binomial structure and the so-called Poisson-Lindley distribution, a new two-parameter discrete distribution is introduced. Its properties are studied from both the theoretical and practical sides. For the theory, we discuss the moments, survival and hazard rate functions, mode and quantile function. The statistical inference on the model parameters is investigated by the maximum likelihood, moments, proportions, least squares, and weighted least squares estimations. A simulation study is conducted to assess the bias and mean square error of the obtained estimates. Then, applications to two practical data sets are given. Finally, we construct a new flexible count data regression model, called the binomial-Poisson Lindley regression model, with two practical examples in the medical area.
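As background for the building block named above, here is a small sketch of the classical one-parameter Poisson-Lindley probability mass function, P(X = x) = theta^2 (x + theta + 2) / (theta + 1)^(x + 3) for x = 0, 1, 2, ... It is not the new two-parameter binomial-based distribution, whose form the abstract does not give; the function name and the value theta = 1.5 are arbitrary choices for illustration.

```python
def poisson_lindley_pmf(x, theta):
    """Classical Poisson-Lindley pmf for a non-negative integer x and theta > 0."""
    if x < 0 or theta <= 0:
        raise ValueError("x must be >= 0 and theta must be > 0")
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)


theta = 1.5
support = range(200)  # the tail beyond this range is numerically negligible here
total_mass = sum(poisson_lindley_pmf(x, theta) for x in support)
mean = sum(x * poisson_lindley_pmf(x, theta) for x in support)
closed_form_mean = (theta + 2) / (theta * (theta + 1))
```

Comparing the numerical mean with the closed-form mean (theta + 2) / (theta (theta + 1)) is a quick sanity check on the formula.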
Let M and N be topological spaces, let G be a group, and let τ: G×M→M be a proper free action of G. In this paper, we define a Borsuk-Ulam-type property for homotopy classes of maps from M to N with respect to the pair (G, τ) that generalises the classical antipodal Borsuk-Ulam theorem of maps from the n-sphere S^n to R^n. In the cases where M is a finite pathwise-connected CW-complex, G is a finite, non-trivial Abelian group, τ is a proper free cellular action, and N is either R^2 or a compact surface without boundary different from S^2 and RP^2, we give an algebraic criterion involving braid groups to decide whether a free homotopy class β∈[M, N] has the Borsuk-Ulam property. As an application of this criterion, we consider the case where M is a compact surface without boundary equipped with a free action τ of the finite cyclic group Z_n. In terms of the orientability of the orbit space of M by the action τ, the value of n modulo 4 and a certain algebraic condition involving the first homology group of M, we are able to determine if the single homotopy class of maps from M to R^2 possesses the Borsuk-Ulam property with respect to (Z_n, τ). Finally, we give some examples of surfaces on which the symmetric group acts, and for these cases, we obtain some partial results regarding the Borsuk-Ulam property for maps whose target is R^2.
Funding (Topp-Leone odd Fréchet family study): This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under grant No. (G: 550-247-1441).
Funding (step-stress partially accelerated life test study): This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under Grant No. FP-190-42.
Funding (truncated generalized Fréchet family study): Funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under Grant No. G: 531-305-1441.
Funding (COVID-19 lung CT segmentation study): Support provided by the Center of Excellence in Theoretical and Computational Science (TaCS-CoE), KMUTT. Moreover, this research project is supported by the Thailand Science Research and Innovation (TSRI) Basic Research Fund: Fiscal year 2021, received by Dr. Poom Kumam, under project number 64A306000005; sponsor's URL: https://www.tsri.or.th/.
Funding (unified exp-G, Topp–Leone-G and exp-half-G family study): Financial support from the Science and Engineering Research Board, Department of Science & Technology, Govt. of India, under the scheme Early Career Research Award (file no.: ECR/2017/002416).
Funding (Borsuk-Ulam property study): Supported by the CNPq project n° 140836; the Capes/COFECUB project n° 12693/13-8; the Capes/INCTMat project n° 88887.136371/2017-00-465591/2014-0; and partially supported by the Projeto Temático FAPESP, grant n° 2016/24707-4: Topologia Algébrica, Geométrica e Diferencial.