Even though data wrangling (DW) accounts for more than half of the machine learning (ML) process, there is a dearth of research on data wrangling and dataset preparation. Consequently, we present a state-of-the-art dataset prepared from medical and pharmaceutical claims, as well as patient-level data, collected from the transactional processing system of a Zimbabwean health insurance company. We were given two initial raw sub-datasets (one for diabetic patients and one for hypertensive patients) that had been retrieved and divided based on the International Classification of Diseases, 10th Revision (ICD-10) codes. Since adherence metrics measure the percentage of patients covered by prescription claims for the same medication in the same therapeutic class within the measurement year, the created dataset comprises data for 2022 only, spanning January 1 to December 31, 2022. We acknowledge that there is no universally approved compendium of DW stages; nevertheless, certain building blocks and functionalities with considerable overlaps are commonly accepted and can be considered typical of DW. In light of this, we take a pragmatic approach to generating the dataset, incorporating nine essential DW building blocks and tasks. These tasks were employed iteratively and incrementally throughout the DW process, akin to an agile approach to dataset generation, rather than being restricted to a single stage. Following a rigorous DW procedure, the created dataset was used to build an ML model, which achieved a generally good degree of performance, as evidenced by its 81% accuracy. The outcomes, techniques, and insights presented in this article inform future researchers, data scientists, and analysts on how to conduct data mining, dataset enrichment, and DW. The created dataset can stimulate further study in areas such as evaluating medication adherence (MA) and performing ML tasks such as classifying, predicting, and clustering MA behaviours.
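The adherence measure mentioned in the abstract can be illustrated with a small sketch. It assumes a proportion-of-days-covered style calculation, which the abstract does not name, and hypothetical claim records of the form (fill date, days' supply); the function names are illustrative and are not taken from the paper. The measurement window is the 2022 calendar year stated above.

```python
from datetime import date, timedelta

# Measurement year stated in the abstract.
YEAR_START = date(2022, 1, 1)
YEAR_END = date(2022, 12, 31)


def covered_days(claims):
    """Return the set of days in the measurement year covered by any claim.

    `claims` is a list of (fill_date, days_supply) tuples. This record layout
    is an assumption made for illustration only.
    """
    days = set()
    for fill_date, days_supply in claims:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if YEAR_START <= day <= YEAR_END:
                days.add(day)  # overlapping fills are counted once
    return days


def proportion_of_days_covered(claims):
    """Fraction of the measurement year covered by at least one claim."""
    total_days = (YEAR_END - YEAR_START).days + 1
    return len(covered_days(claims)) / total_days
```

For example, `proportion_of_days_covered([(date(2022, 1, 1), 30), (date(2022, 1, 16), 30)])` counts the overlapping days only once. Using the whole calendar year as the denominator is a simplification; other definitions start the window at a patient's first fill.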
This paper proposes a modified version of the Dwarf Mongoose Optimization (DMO) algorithm, referred to as IDMO, for constrained engineering design problems. IDMO modifies the base algorithm in three simple but effective ways. First, alpha selection in IDMO differs from that in DMO, where evaluating the probability value of each fitness is merely a computational overhead that contributes nothing to the quality of the alpha or the other group members. Instead, the fittest dwarf mongoose is selected as the alpha, and a new operator ω is introduced to control the alpha's movement, thereby enhancing the exploration and exploitation abilities of IDMO. Second, the scout group movements are modified by randomization to introduce diversity into the search process and to explore unvisited areas. Finally, the babysitter exchange criterion is modified: once the criterion is met, the exchanged babysitters interact with the dwarf mongooses that replace them to gain information about food sources and sleeping mounds, which could result in better-fitted mongooses, instead of being initialized afresh as in DMO; the counter is then reset to zero. The proposed IDMO was used to solve the classical and CEC 2020 benchmark functions and 12 continuous/discrete engineering optimization problems. The performance of IDMO, assessed using different performance metrics and statistical analysis, is compared with that of DMO and eight other existing algorithms. In most cases, the results show that the solutions achieved by IDMO are better than those obtained by the existing algorithms.
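The change to alpha selection described in the abstract can be sketched as follows. The sketch assumes a minimization problem and a plain list of fitness values; `select_alpha_fittest` and `fitness_probabilities` are hypothetical names, and the probability formula is only one plausible reading of the "probability value of each fitness" step that IDMO drops. The operator ω is not defined in the abstract, so it is left out.

```python
def select_alpha_fittest(fitness, minimize=True):
    """Return the index of the fittest individual (the IDMO-style alpha).

    Assumes lower fitness is better unless `minimize` is False.
    """
    if not fitness:
        raise ValueError("fitness must not be empty")
    chooser = min if minimize else max
    return chooser(range(len(fitness)), key=lambda i: fitness[i])


def fitness_probabilities(fitness):
    """Fitness-proportional probabilities, shown only for contrast.

    This is an assumed form of the per-individual probability evaluation
    that the abstract describes as overhead in the base algorithm.
    """
    total = sum(fitness)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(fitness)] * len(fitness)
    return [value / total for value in fitness]
```

With fitness values `[3.2, 0.7, 5.1]`, `select_alpha_fittest` picks index 1 directly, with no need to compute or sample from a probability vector.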