Customer attrition in the banking industry occurs when consumers stop using the bank's goods and services for some time and then end their relationship with the bank. Customer retention is therefore essential in today's extremely competitive banking market. Additionally, a solid customer base helps attract new consumers by fostering confidence and generating referrals from the current clientele. These factors make reducing client attrition a crucial step that banks must pursue. In our research, we aim to examine bank data and forecast which users are most likely to discontinue using the bank's services and cease to be paying customers. We use various machine learning algorithms to analyze the data and present a comparative analysis across different evaluation metrics. In addition, we developed a data visualization RShiny app for data science and management of customer churn analysis. Analyzing this data will help the bank identify the trend and then try to retain customers on the verge of attrition.
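The abstract does not say which evaluation metrics were compared. As a minimal sketch, assuming the usual binary-classification metrics (accuracy, precision, recall, F1) and a label of 1 for a churned customer, the comparison could be computed as follows. The function names and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary churn label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def churn_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels: 1 = customer churned, 0 = customer stayed.
    actual = [1, 1, 0, 0, 1, 0, 0, 0]
    model_a = [1, 0, 0, 0, 1, 0, 1, 0]  # hypothetical predictions
    print(churn_metrics(actual, model_a))
```

Because churners are usually a minority, recall and F1 tend to be more informative than accuracy when several models are ranked.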
In trying to explain why Hong Kong of China ranks highest in life expectancy in the world, we review what various experts are hypothesizing and how data science methods may be used to provide more evidence-based conclusions. As more data become available, we find that some data analysis studies were too simplistic, while others were too overwhelming to answer this challenging question. We find the approach that analyzes life expectancy related data (mortality causes and rates for different cohorts) inspiring, and use this approach to study a carefully selected set of targets for comparison. In discussing the factors that matter, we argue that it is more reasonable to try to identify a set of factors that together explain the phenomenon.
Health data and cutting-edge technologies empower medicine and improve healthcare. This has become even more true during the COVID-19 pandemic. Through coronavirus data sharing and worldwide collaboration, the speed of vaccine development for COVID-19 is unprecedented. Digital and data technologies were quickly adopted during the pandemic, showing how those technologies can be harnessed to enhance public health and healthcare. A wide range of digital data sources are being utilized and visually presented to enhance the epidemiological surveillance of COVID-19. Digital contact tracing mobile apps have been adopted by many countries to control community transmission. Deep learning has been utilized to achieve various solutions for COVID-19 disruption, including outbreak prediction and virus spread tracking.
The data production elements are driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the impact mechanism of the data elements on the "three transformations" (high-end, intelligent, and green) in the manufacturing sector, theoretically elucidating the intrinsic mechanisms by which the data elements influence these transformations. The study finds that the data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of the pathways of impact, the data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Improving population health by creating more equitable health systems is a major focus of health policy and planning today. However, before we can achieve equity in health, we must first begin by leveraging all we have learned, and are continuing to discover, about the many social, structural, and environmental determinants of health. We must fully consider the conditions in which people are born, grow, learn, work, play, and age. The study of social determinants of health has made tremendous strides in recent decades. At the same time, we have seen huge advances in how health data are collected, analyzed, and used to inform action in the health sector. It is time to merge these two fields, to harness the best from both and to improve decision-making to accelerate evidence-based action toward greater health equity.
Semantic communication (SemCom) aims to achieve high-fidelity information delivery at low communication cost by guaranteeing only semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility, so developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) becomes indispensable. In that regard, instead of discarding previously transmitted information, incremental knowledge-based HARQ (IK-HARQ) is deemed a more effective mechanism that can fully utilize the information semantics. However, given the possible semantic ambiguity in image transmission, a simple bit-level cyclic redundancy check (CRC) might compromise the performance of IK-HARQ. There is therefore a strong incentive to rethink the CRC mechanism and thus reap the benefits of both SemCom and HARQ more effectively. In this paper, building on Swin Transformer-based joint source-channel coding (JSCC) and IK-HARQ, we propose a semantic image transmission framework, SC-TDA-HARQ. In particular, unlike the conventional CRC, we introduce a topological data analysis (TDA)-based error detection method, which extracts the inner topological and geometric information of images to capture semantic information and determine whether re-transmission is necessary. Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework, especially under limited bandwidth, and demonstrate the superiority of the TDA-based error detection method in image transmission.
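The concern about bit-level CRC can be illustrated with a short sketch. It uses the CRC-32 from Python's standard `zlib` module as a stand-in for whatever CRC the system would use, and a synthetic byte string as a hypothetical placeholder for the encoded image; neither comes from the paper. The point is only that a single flipped bit, which may be semantically irrelevant to the image, already fails the check and would trigger a re-transmission.

```python
import zlib


def crc_ok(payload: bytes, expected_crc: int) -> bool:
    """Bit-level check: passes only if every bit of the payload is intact."""
    return zlib.crc32(payload) == expected_crc


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    """Return a copy of the payload with exactly one bit inverted."""
    data = bytearray(payload)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


# Hypothetical stand-in for the bytes of an encoded image (assumption).
image = bytes(range(256)) * 4
sent_crc = zlib.crc32(image)

# Simulate a channel error that touches a single bit out of 8192.
received = flip_bit(image, 1000)

if __name__ == "__main__":
    print("intact payload passes CRC:", crc_ok(image, sent_crc))
    print("one-bit error passes CRC:", crc_ok(received, sent_crc))
```

A semantic error detector, such as the TDA-based one the abstract proposes, would instead ask whether the received image still conveys the same content; that part is not sketched here because the abstract gives no implementation detail.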
BACKGROUND: Hepatocellular carcinoma (HCC) remains a significant public health concern in South Korea even though incidence rates are declining. While medical travel for cancer treatment is common, its patterns and influencing factors for patients with HCC are unknown. AIM: To assess medical travel patterns and determinants and their policy implications among patients with newly diagnosed HCC in South Korea. METHODS: This retrospective cohort study used the National Health Insurance Service database to identify patients with newly diagnosed HCC from 2013 to 2021. Medical travel was defined as receiving initial treatment outside one's residential region. Patient characteristics and regional trends were analyzed, and factors influencing medical travel were identified using logistic regression analysis. RESULTS: Among 64,808 patients, 52.4% received treatment in the capital. This proportion increased to 67.4% when the surrounding metropolitan area was included. Medical travel was significantly more common among younger and wealthier patients. Patients with a greater comorbidity burden or liver cirrhosis were less likely to travel. While geographic distance influenced travel patterns, high-volume academic centers in the capital attracted patients nationwide regardless of proximity. CONCLUSION: This nationwide study highlighted the centralization of HCC care in the capital. This observation indicates that regional cancer hubs should be strengthened and promoted for equitable healthcare access.
The widespread use of rechargeable batteries in portable devices, electric vehicles, and energy storage systems has underscored the importance of accurately predicting their lifetimes. However, data scarcity often limits the accuracy of prediction models, and the problem is exacerbated by incomplete data caused by issues such as sensor failures. To address these challenges, we propose a novel approach that compensates for insufficient data by extracting additional information from incomplete data samples, which are usually discarded in existing studies. To fully unleash the predictive power of incomplete data, we investigated the Multiple Imputation by Chained Equations (MICE) method, which diversifies the training data by exploring potential data patterns. The experimental results demonstrate that the proposed method significantly outperforms the baselines in most of the scenarios considered, reducing the prediction root mean square error (RMSE) by up to 18.9%. Furthermore, we observed that including incomplete data benefits the explainability of the prediction model by facilitating feature selection.
Face recognition has emerged as one of the most prominent applications of image analysis and understanding, gaining considerable attention in recent years. This growing interest is driven by two key factors: its extensive applications in law enforcement and the commercial domain, and the rapid advancement of practical technologies. Despite significant advances, modern recognition algorithms still struggle in real-world conditions such as varying lighting, occlusion, and diverse facial poses. In such scenarios, human perception is still well above the capabilities of present technology. Using a systematic mapping study, this paper presents an in-depth review of face detection and face recognition algorithms, with a detailed survey of advancements made between 2015 and 2024. We analyze key methodologies, highlighting their strengths and limitations in the application context. Additionally, we examine various datasets used for face detection and recognition, focusing on their task-specific applications, size, diversity, and complexity. By analyzing these algorithms and datasets, this survey serves as a valuable resource for researchers, identifying research gaps in the field of face detection and recognition and outlining potential directions for future research.
Electric Vehicle Charging Systems (EVCS) are increasingly vulnerable to cybersecurity threats as they integrate deeply into smart grids and Internet of Things (IoT) environments, raising significant security challenges. Most existing research emphasizes network-level anomaly detection, leaving critical vulnerabilities at the host level underexplored. This study introduces a novel forensic analysis framework that leverages host-level data, including system logs, kernel events, and Hardware Performance Counters (HPC), to detect and analyze sophisticated cyberattacks such as cryptojacking, Denial-of-Service (DoS), and reconnaissance activities targeting EVCS. Using comprehensive forensic analysis and machine learning models, the proposed framework significantly outperforms existing methods, achieving an accuracy of 98.81%. The findings offer insights into distinct behavioral signatures associated with specific cyber threats, enabling improved cybersecurity strategies and actionable recommendations for robust EVCS infrastructure protection.
Accurate capacity and State of Charge (SOC) estimation are crucial for ensuring the safety and longevity of lithium-ion batteries in electric vehicles. This study examines ten machine learning architectures, including Deep Belief Network (DBN), Bidirectional Recurrent Neural Network (BiDirRNN), Gated Recurrent Unit (GRU), and others, using the NASA B0005 dataset of 591,458 instances. Results indicate that DBN excels in capacity estimation, achieving orders-of-magnitude lower error values and explaining over 99.97% of the predicted variable's variance. When computational efficiency is paramount, the Deep Neural Network (DNN) offers a strong alternative, delivering near-competitive accuracy with significantly reduced prediction times. The GRU achieves the best overall performance for SOC estimation, attaining an R^2 of 0.9999, while the BiDirRNN provides a marginally lower error at a slightly higher computational speed. In contrast, Convolutional Neural Networks (CNN) and Radial Basis Function Networks (RBFN) exhibit relatively high error rates, making them less viable for real-world battery management. Analyses of error distributions reveal that the top-performing models cluster most predictions within tight bounds, limiting the risk of overcharging or deep discharging. These findings highlight the trade-off between accuracy and computational overhead, offering valuable guidance for battery management system (BMS) designers seeking optimal performance under constrained resources. Future work may further explore advanced data augmentation and domain adaptation techniques to enhance these models' robustness in diverse operating conditions.
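The two quantities behind statements such as "an R^2 of 0.9999" and "explaining over 99.97% of the variance" can be made concrete with a small sketch. It assumes the standard definitions of the coefficient of determination and root mean square error; the SOC values below are invented for illustration and are not taken from the NASA B0005 dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical state-of-charge readings (fractions), not real data.
    measured = [0.95, 0.80, 0.65, 0.50, 0.35, 0.20]
    predicted = [0.94, 0.81, 0.66, 0.49, 0.36, 0.19]
    print("RMSE:", rmse(measured, predicted))
    print("R^2 :", r_squared(measured, predicted))
```

An R^2 of 1 means the predictions reproduce every measurement exactly, while predicting the mean of the measurements for every sample gives an R^2 of 0.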
Metaheuristic optimization methods are iterative search processes that aim to solve complex optimization problems efficiently. They explore the solution space efficiently, often without using gradient information, and are inspired by biological and social heuristics. Metaheuristic optimization algorithms are increasingly applied to complex feature selection problems in high-dimensional medical datasets. Among these, Teaching-Learning-Based Optimization (TLBO) has proven effective for continuous design tasks by balancing exploration and exploitation phases. However, its binary version (BTLBO) suffers from limited exploitation ability, often converging prematurely or getting trapped in local optima, particularly when applied to discrete feature selection tasks. Previous studies reported that BTLBO yields lower classification accuracy and higher feature subset variance than other hybrid methods in benchmark tests, motivating the development of hybrid approaches. This study proposes a novel hybrid algorithm, BTLBO-Cheetah Optimizer (BTLBO-CO), which integrates the global exploration strength of BTLBO with the local exploitation efficiency of the Cheetah Optimization (CO) algorithm. The objective is to enhance the feature selection process for cancer classification tasks involving high-dimensional data. The proposed BTLBO-CO algorithm was evaluated on six benchmark cancer datasets: 11 tumors (T), Lung Cancer (LUC), Leukemia (LEU), Small Round Blue Cell Tumor or SRBCT (SR), Diffuse Large B-cell Lymphoma or DLBCL (DL), and Prostate Tumor (PT). The results demonstrate superior classification accuracy across all six datasets, achieving 93.71%, 96.12%, 98.13%, 97.11%, 98.44%, and 98.84%, respectively. These results validate the effectiveness of the hybrid approach in addressing diverse feature selection challenges using a Support Vector Machine (SVM) classifier.
Sinkhole formation poses a significant geohazard in karst regions, where unpredictable subsurface erosion often necessitates costly grouting for stabilization. Accurate estimation of grout volume remains a persistent challenge due to spatial variability, site-specific conditions, and the limitations of traditional empirical methods. This study introduces a novel machine learning-based regression model for grout volume prediction that integrates cone penetration test (CPT)-derived Sinkhole Resistance Ratio (SRR) values, spatial correlations between CPT and grouting points (GPs), and field-recorded grout volumes from six sinkhole sites in Florida. Three data transformation methods, the Proximal Allocation Method (PAM), the Equitable Distribution Method (EDM), and the Threshold-based Equitable Distribution Method (TEDM), were applied to distribute grout influence across CPTs, with TEDM demonstrating superior predictive performance. Synthetic data augmentation using spline methodology further improved model robustness. A high-degree polynomial regression model, optimized with ridge regularization, achieved high accuracy (R^2 = 0.95; PEV = 0.94) and significantly outperformed existing linear and logarithmic models. Results confirm that lower SRR values correlate with higher grout demand, and the proposed model reliably captures these nonlinear relationships. This research advances sinkhole remediation practice by providing a data-driven, accurate, and generalizable framework for grout volume estimation, enabling more efficient resource allocation and improved project outcomes.
In this paper, we establish and study a single-species logistic model with impulsive age-selective harvesting. First, we prove the ultimate boundedness of the solutions of the system. Then, we obtain conditions for the asymptotic stability of the trivial solution and the positive periodic solution. Finally, numerical simulations are presented to validate our results. Our results show that age-selective harvesting is more conducive to sustainable population survival than non-age-selective harvesting.
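The abstract does not state the model equations, so the following is only a sketch of the classical non-age-structured baseline that such studies are usually compared against: logistic growth dx/dt = r x (1 - x/K) between harvests, and removal of a fixed fraction h of the population every T time units. All parameter names and values are assumptions for illustration; the age-selective model of the paper is not reproduced here.

```python
import math


def logistic_flow(x0, r, K, T):
    """Exact logistic solution after time T starting from x0."""
    growth = math.exp(r * T)
    return K * x0 * growth / (K + x0 * (growth - 1.0))


def harvest_cycle(x0, r, K, T, h):
    """Grow for one period, then remove the fraction h impulsively."""
    return (1.0 - h) * logistic_flow(x0, r, K, T)


def periodic_level(r, K, T, h):
    """Post-harvest level of the positive periodic solution.

    It exists only when (1 - h) * exp(r * T) > 1; otherwise the trivial
    (extinction) solution attracts all orbits and 0.0 is returned.
    """
    growth = math.exp(r * T)
    if (1.0 - h) * growth <= 1.0:
        return 0.0
    return K * ((1.0 - h) * growth - 1.0) / (growth - 1.0)


def simulate(x0, r, K, T, h, cycles):
    """Population just after the last of the given number of harvests."""
    x = x0
    for _ in range(cycles):
        x = harvest_cycle(x, r, K, T, h)
    return x


if __name__ == "__main__":
    # Hypothetical parameters: growth rate, capacity, period, harvest rate.
    print("sustainable:", simulate(10.0, 1.0, 100.0, 1.0, 0.3, 200),
          "vs", periodic_level(1.0, 100.0, 1.0, 0.3))
    print("over-harvested:", simulate(10.0, 1.0, 100.0, 1.0, 0.9, 200))
```

In this baseline the threshold (1 - h) e^{rT} = 1 separates the regime in which the trivial solution is stable from the one with a stable positive periodic solution, which mirrors the two stability results the abstract mentions.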
Alzheimer's disease (AD) is the most common form of dementia, affecting over 50 million people worldwide. This figure is projected to nearly double every 20 years, reaching 82 million by 2030 and 152 million by 2050 (Alzheimer's Disease International). The apolipoprotein ε4 (APOE4) allele is the strongest genetic risk factor for late-onset AD (after age 65 years). Apolipoprotein E, a lipid transporter, exists in three variants: ε2, ε3, and ε4. APOE ε2 (APOE2) is protective against AD, APOE ε3 (APOE3) is neutral, while APOE4 significantly increases the risk. Individuals with one copy of APOE4 have a 4-fold greater risk of developing AD, and those with two copies face an 8-fold risk compared to non-carriers. Even in cognitively normal individuals, APOE4 carriers exhibit brain metabolic and vascular deficits decades before amyloid-beta (Aβ) plaques and neurofibrillary tau tangles, the hallmark pathologies of AD, emerge (Reiman et al., 2001, 2005; Thambisetty et al., 2010). Notably, studies have demonstrated reduced glucose uptake, or hypometabolism, in brain regions vulnerable to AD in asymptomatic middle-aged APOE4 carriers, long before clinical symptoms arise (Reiman et al., 2001, 2005).
Background: There is insufficient evidence to provide recommendations for leisure-time physical activity among workers across various occupational physical activity levels. This study aimed to assess the association of leisure-time physical activity with cardiovascular and all-cause mortality across occupational physical activity levels. Methods: This study utilized individual participant data from 21 cohort studies, comprising both published and unpublished data. Eligibility criteria included individual-level data on leisure-time and occupational physical activity (categorized as sedentary, low, moderate, and high) along with data on all-cause and/or cardiovascular mortality. A 2-stage individual participant data meta-analysis was conducted, with separate analysis of each study using Cox proportional hazards models (Stage 1). These results were combined using random-effects models (Stage 2). Results: Higher leisure-time physical activity levels were associated with lower all-cause and cardiovascular mortality risk across most occupational physical activity levels, for both males and females. Among males with sedentary work, high compared to sedentary leisure-time physical activity was associated with lower all-cause (hazard ratio (HR) = 0.77, 95% confidence interval (95%CI): 0.70-0.85) and cardiovascular mortality (HR = 0.76, 95%CI: 0.66-0.87) risk. Among males with high levels of occupational physical activity, high compared to sedentary leisure-time physical activity was associated with lower all-cause (HR = 0.84, 95%CI: 0.74-0.97) and cardiovascular mortality (HR = 0.79, 95%CI: 0.60-1.04) risk, while HRs for low and moderate levels of leisure-time physical activity ranged between 0.87 and 0.97 and were not statistically significant. Among females, most effects were similar but more imprecise, especially in the higher occupational physical activity levels. Conclusion: Higher levels of leisure-time physical activity were generally associated with lower mortality risks. However, results for workers with moderate and high occupational physical activity levels, especially women, were more imprecise. Our findings suggest that workers may benefit from engaging in high levels of leisure-time physical activity, irrespective of their level of occupational physical activity.
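Stage 2 of the analysis combines study-level hazard ratios with a random-effects model. The abstract does not name the estimator, so the sketch below assumes the common DerSimonian-Laird method applied to log hazard ratios; the study-level inputs in the demo are hypothetical, and only the confidence interval 0.70-0.85 used to illustrate the standard-error conversion comes from the abstract.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def pool_random_effects(hazard_ratios, standard_errors):
    """DerSimonian-Laird pooling of hazard ratios on the log scale.

    Returns (pooled HR, lower 95% bound, upper 95% bound, tau^2).
    """
    y = [math.log(hr) for hr in hazard_ratios]
    w = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_star = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se_pooled),
            math.exp(pooled + Z_95 * se_pooled),
            tau2)


if __name__ == "__main__":
    # Hypothetical Stage 1 results from three cohorts (not real data).
    hrs = [0.72, 0.81, 0.78]
    ses = [0.08, 0.06, 0.10]
    print(pool_random_effects(hrs, ses))
    print("SE implied by 95%CI 0.70-0.85:", se_from_ci(0.70, 0.85))
```

When the cohorts agree closely, tau^2 shrinks to zero and the result coincides with a fixed-effect inverse-variance average; larger heterogeneity widens the pooled interval.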
It is important for modern hospital management to strengthen medical humanistic care and build a harmonious doctor-patient relationship. Innovative applications of the big data resources of patient experience help modern hospital management achieve real-time supervision, dynamic management, and scientific decision-making based on patients' experiences. This is helping to transform hospital management from an administrator's perspective to a patient's perspective, and from experience-driven to data-driven. Technological innovations in hospital management based on patient experience data can assist the optimization and continuous improvement of healthcare quality, and therefore help to increase patient satisfaction with medical services.
Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-differences (DID) has been increasingly used in science of science and bibliometrics studies. The DID method yields an unbiased estimate provided that several assumptions hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science and of potential ways to improve the accuracy of the DID method. Design/methodology/approach: First, we reviewed the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to better satisfy the necessary assumptions before conducting DID regression and to improve the accuracy of estimation, we introduced some matching techniques that serve as a pre-selection step for the DID design, matching control individuals who are equivalent to the treated ones on observed variables before the intervention. Lastly, we performed a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors. Findings: We introduced the procedures for conducting a DID estimation and demonstrated the effectiveness of using matching methods to improve the results. As a case study, we found no significant increase in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations: This study omitted the rigorous mathematical derivation of DID and focused on the practical parts. Practical implications: This work gives experimental practice and potential guidelines for using the DID method in science of science and bibliometrics studies. Originality/value: This study offers insights into the usage of econometric tools in science of science.
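The core of a DID estimate can be sketched in a few lines. This is the textbook two-group, two-period version, not the regression specification of the paper, and the citation counts are invented for illustration: the effect is the change in the treated group minus the change in the control group, which is valid only under the common trend assumption discussed above.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate.

    (treated change over time) minus (control change over time).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Hypothetical yearly citations per paper, before and after the prize.
    laureates_pre = [10.0, 12.0, 11.0]
    laureates_post = [15.0, 17.0, 16.0]
    coauthors_pre = [9.0, 11.0, 10.0]
    coauthors_post = [14.0, 16.0, 15.0]
    effect = did_estimate(laureates_pre, laureates_post,
                          coauthors_pre, coauthors_post)
    print("estimated prize effect on citations:", effect)
```

In the toy numbers both groups gain five citations per year, so the estimated effect is zero even though the laureates' raw citations rose, which is the kind of pattern a naive before-and-after comparison would misread.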
Anomaly detection in high-dimensional data is a critical research issue with serious implications for real-world problems. Many issues in this field remain unsolved, and several modern anomaly detection methods struggle to maintain adequate accuracy due to the highly descriptive nature of big data. This phenomenon, referred to as the "curse of dimensionality", affects traditional techniques in terms of both accuracy and performance. This research therefore proposes a hybrid model based on a Deep Autoencoder Neural Network (DANN) with five layers to reduce the difference between the input and output. The proposed model was applied to a real-world gas turbine (GT) dataset that contains 87,620 columns and 56 rows. During the experiment, two issues were investigated and solved to enhance the results. The first is the dataset class imbalance, which was solved using the SMOTE technique. The second is poor performance, which can be addressed with an optimization algorithm. Several optimization algorithms were investigated and tested, including stochastic gradient descent (SGD), RMSprop, Adam, and Adamax. The Adamax optimization algorithm showed the best results when employed to train the DANN model. The experimental results show that the proposed model can detect anomalies by efficiently reducing the high dimensionality of the dataset, with an accuracy of 99.40%, an F1-score of 0.9649, an Area Under the Curve (AUC) of 0.9649, and a minimal loss function during the hybrid model training.
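The class-imbalance step can be illustrated with a simplified sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a pure-Python illustration under stated assumptions (Euclidean distance, random choice of the base sample, hypothetical two-feature data), not the implementation used in the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    Requires at least two minority samples. For every new sample, pick a
    base point, pick one of its k nearest minority neighbours, and
    interpolate between them with a random gap in [0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others,
                            key=lambda p: squared_distance(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical anomalous readings with two features each.
    anomalies = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(anomalies, n_new=5, k=2):
        print(point)
```

Because every synthetic point lies between two real minority samples, the method fills in the minority region rather than duplicating existing rows, which is the usual motivation for preferring it over plain oversampling.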
The advent of healthcare information management systems (HIMSs) continues to produce large volumes of healthcare data for patient care and for compliance and regulatory requirements at a global scale. Analysis of this big data offers boundless potential for discovering knowledge. Big data analytics (BDA) in healthcare can, for instance, help determine causes of diseases, generate effective diagnoses, enhance QoS guarantees by increasing the efficiency of healthcare delivery and the effectiveness and viability of treatments, generate accurate predictions of readmissions, enhance clinical care, and pinpoint opportunities for cost savings. However, BDA implementations in any domain are generally complicated and resource-intensive, with a high failure rate and no roadmap or success strategies to guide practitioners. In this paper, we present a comprehensive roadmap for deriving insights from BDA in the healthcare (patient care) domain, based on the results of a systematic literature review. We first determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research, focusing particularly on NoSQL databases. We also identify the limitations and challenges of these applications and justify the potential of NoSQL databases to address these challenges and further enhance BDA healthcare research. We then propose and describe a state-of-the-art BDA architecture for the healthcare domain called Med-BDA, which solves all current BDA challenges and is based on the latest zeta big data paradigm. We also present success strategies to ensure the working of Med-BDA and outline the major benefits of BDA applications to healthcare. Finally, we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work. The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators, practitioners, and professionals to successfully implement BDA initiatives in their organizations.
Funding: supported by funding (No. UGC/IDS(R)11/21) from the Hong Kong SAR Government.
文摘In trying to explain why Hong Kong of China ranks highest in life expectancy in the world,we review what various experts are hypothesizing,and how data science methods may be used to provide more evidence-based conclusions.While more data become available,we find some data analysis studies were too simplistic,while others too overwhelming in answering this challenging question.We find the approach that analyzes life expectancy related data(mortality causes and rate for different cohorts)inspiring,and use this approach to study a carefully selected set of targets for comparison.In discussing the factors that matter,we argue that it is more reasonable to try to identify a set of factors that together explain the phenomenon.
Abstract: Health data and cutting-edge technologies empower medicine and improve healthcare. This has become even more true during the COVID-19 pandemic. Through coronavirus data sharing and worldwide collaboration, the speed of vaccine development for COVID-19 is unprecedented. Digital and data technologies were quickly adopted during the pandemic, showing how those technologies can be harnessed to enhance public health and healthcare. A wide range of digital data sources are being utilized and visually presented to enhance the epidemiological surveillance of COVID-19. Digital contact tracing mobile apps have been adopted by many countries to control community transmission. Deep learning has been utilized to achieve various solutions for COVID-19 disruption, including outbreak prediction and virus spread tracking.
Abstract: The data production elements are driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the impact mechanism of the data elements on the “three transformations” (high-end, intelligent, and green) in the manufacturing sector, theoretically elucidating the intrinsic mechanisms by which the data elements influence these transformations. The study finds that the data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of the pathways of impact, the data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Abstract: Improving population health by creating more equitable health systems is a major focus of health policy and planning today. However, before we can achieve equity in health, we must first begin by leveraging all we have learned, and are continuing to discover, about the many social, structural, and environmental determinants of health. We must fully consider the conditions in which people are born, grow, learn, work, play, and age. The study of social determinants of health has made tremendous strides in recent decades. At the same time, we have seen huge advances in how health data are collected, analyzed, and used to inform action in the health sector. It is time to merge these two fields, to harness the best from both and to improve decision-making to accelerate evidence-based action toward greater health equity.
Funding: Supported in part by the National Key Research and Development Program of China under Grant 2024YFE0200600; in part by the National Natural Science Foundation of China under Grant 62071425; in part by the Zhejiang Key Research and Development Plan under Grant 2022C01093; in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LR23F010005; in part by the National Key Laboratory of Wireless Communications Foundation under Grant 2023KP01601; in part by the Big Data and Intelligent Computing Key Lab of CQUPT under Grant BDIC-2023-B-001.
Abstract: Semantic communication (SemCom) aims to achieve high-fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility, and thus developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) becomes indispensable. In that regard, instead of discarding previously transmitted information, the incremental knowledge-based HARQ (IK-HARQ) is deemed a more effective mechanism that could sufficiently utilize the information semantics. However, considering the possible existence of semantic ambiguity in image transmission, a simple bit-level cyclic redundancy check (CRC) might compromise the performance of IK-HARQ. Therefore, there emerges a strong incentive to revolutionize the CRC mechanism, thus more effectively reaping the benefits of both SemCom and HARQ. In this paper, built on top of Swin Transformer-based joint source-channel coding (JSCC) and IK-HARQ, we propose a semantic image transmission framework, SC-TDA-HARQ. In particular, different from the conventional CRC, we introduce a topological data analysis (TDA)-based error detection method, which extracts the inner topological and geometric information of images, to capture semantic information and determine the necessity for re-transmission. Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework, especially under limited bandwidth conditions, and demonstrate the superiority of the TDA-based error detection method in image transmission.
Funding: Supported by Dong-A University Research Fund, No. 20230598.
Abstract: BACKGROUND: Hepatocellular carcinoma (HCC) remains a significant public health concern in South Korea even though the incidence rates are declining. While medical travel for cancer treatment is common, its patterns and influencing factors for patients with HCC are unknown. AIM: To assess medical travel patterns and determinants and their policy implications among patients with newly diagnosed HCC in South Korea. METHODS: This retrospective cohort study used the National Health Insurance Service database to identify patients with newly diagnosed HCC from 2013 to 2021. Medical travel was defined as receiving initial treatment outside one’s residential region. Patient characteristics and regional trends were analyzed, and factors influencing medical travel were identified using logistic regression analysis. RESULTS: Among 64808 patients, 52.4% received treatment in the capital. This proportion increased to 67.4% when including the surrounding metropolitan area. Medical travel was significantly more common among younger and wealthier patients. Patients with greater comorbidity burden or liver cirrhosis were less likely to travel. While geographic distance influenced travel patterns, high-volume academic centers in the capital attracted patients nationwide regardless of proximity. CONCLUSION: This nationwide study highlighted the centralization of HCC care in the capital. This observation indicates that regional cancer hubs should be strengthened and promoted for equitable healthcare access.
Abstract: The widespread usage of rechargeable batteries in portable devices, electric vehicles, and energy storage systems has underscored the importance of accurately predicting their lifetimes. However, data scarcity often limits the accuracy of prediction models, and it is exacerbated by incomplete data caused by issues such as sensor failures. To address these challenges, we propose a novel approach to accommodate data insufficiency by obtaining external information from incomplete data samples, which are usually discarded in existing studies. In order to fully unleash the prediction power of incomplete data, we have investigated the Multiple Imputation by Chained Equations (MICE) method, which diversifies the training data by exploring the potential data patterns. The experimental results demonstrate that the proposed method significantly outperforms the baselines in most of the considered scenarios while reducing the prediction root mean square error (RMSE) by up to 18.9%. Furthermore, we have also observed that the incorporation of incomplete data benefits the explainability of the prediction model by facilitating feature selection.
Abstract: Face recognition has emerged as one of the most prominent applications of image analysis and understanding, gaining considerable attention in recent years. This growing interest is driven by two key factors: its extensive applications in law enforcement and the commercial domain, and the rapid advancement of practical technologies. Despite the significant advancements, modern recognition algorithms still struggle in real-world conditions such as varying lighting, occlusion, and diverse facial postures. In such scenarios, human perception is still well above the capabilities of present technology. Using the systematic mapping study, this paper presents an in-depth review of face detection algorithms and face recognition algorithms, presenting a detailed survey of advancements made between 2015 and 2024. We analyze key methodologies, highlighting their strengths and limitations in the application context. Additionally, we examine various datasets used for face detection/recognition, focusing on the task-specific applications, size, diversity, and complexity. By analyzing these algorithms and datasets, this survey serves as a valuable resource for researchers, identifying the research gap in the field of face detection and recognition and outlining potential directions for future research.
Abstract: Electric Vehicle Charging Systems (EVCS) are increasingly vulnerable to cybersecurity threats as they integrate deeply into smart grids and Internet of Things (IoT) environments, raising significant security challenges. Most existing research primarily emphasizes network-level anomaly detection, leaving critical vulnerabilities at the host level underexplored. This study introduces a novel forensic analysis framework leveraging host-level data, including system logs, kernel events, and Hardware Performance Counters (HPC), to detect and analyze sophisticated cyberattacks such as cryptojacking, Denial-of-Service (DoS), and reconnaissance activities targeting EVCS. Using comprehensive forensic analysis and machine learning models, the proposed framework significantly outperforms existing methods, achieving an accuracy of 98.81%. The findings offer insights into distinct behavioral signatures associated with specific cyber threats, enabling improved cybersecurity strategies and actionable recommendations for robust EVCS infrastructure protection.
Abstract: Accurate capacity and State of Charge (SOC) estimation are crucial for ensuring the safety and longevity of lithium-ion batteries in electric vehicles. This study examines ten machine learning architectures, including Deep Belief Network (DBN), Bidirectional Recurrent Neural Network (BiDirRNN), Gated Recurrent Unit (GRU), and others, using the NASA B0005 dataset of 591,458 instances. Results indicate that DBN excels in capacity estimation, achieving orders-of-magnitude lower error values and explaining over 99.97% of the predicted variable’s variance. When computational efficiency is paramount, the Deep Neural Network (DNN) offers a strong alternative, delivering near-competitive accuracy with significantly reduced prediction times. The GRU achieves the best overall performance for SOC estimation, attaining an R² of 0.9999, while the BiDirRNN provides a marginally lower error at a slightly higher computational speed. In contrast, Convolutional Neural Networks (CNN) and Radial Basis Function Networks (RBFN) exhibit relatively high error rates, making them less viable for real-world battery management. Analyses of error distributions reveal that the top-performing models cluster most predictions within tight bounds, limiting the risk of overcharging or deep discharging. These findings highlight the trade-off between accuracy and computational overhead, offering valuable guidance for battery management system (BMS) designers seeking optimal performance under constrained resources. Future work may further explore advanced data augmentation and domain adaptation techniques to enhance these models’ robustness in diverse operating conditions.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through the Large Research Project under grant number RGP2/417/46.
Abstract: Metaheuristic optimization methods are iterative search processes that aim to efficiently solve complex optimization problems. These methods explore the solution space very efficiently, often without utilizing gradient information, and are inspired by biological and socially motivated heuristics. Metaheuristic optimization algorithms are increasingly applied to complex feature selection problems in high-dimensional medical datasets. Among these, Teaching-Learning-Based Optimization (TLBO) has proven effective for continuous design tasks by balancing exploration and exploitation phases. However, its binary version (BTLBO) suffers from limited exploitation ability, often converging prematurely or getting trapped in local optima, particularly when applied to discrete feature selection tasks. Previous studies reported that BTLBO yields lower classification accuracy and higher feature subset variance compared to other hybrid methods in benchmark tests, motivating the development of hybrid approaches. This study proposes a novel hybrid algorithm, BTLBO-Cheetah Optimizer (BTLBO-CO), which integrates the global exploration strength of BTLBO with the local exploitation efficiency of the Cheetah Optimization (CO) algorithm. The objective is to enhance the feature selection process for cancer classification tasks involving high-dimensional data. The proposed BTLBO-CO algorithm was evaluated on six benchmark cancer datasets: 11 tumors (T), Lung Cancer (LUC), Leukemia (LEU), Small Round Blue Cell Tumor or SRBCT (SR), Diffuse Large B-cell Lymphoma or DLBCL (DL), and Prostate Tumor (PT). The results demonstrate superior classification accuracy across all six datasets, achieving 93.71%, 96.12%, 98.13%, 97.11%, 98.44%, and 98.84%, respectively. These results validate the effectiveness of the hybrid approach in addressing diverse feature selection challenges using a Support Vector Machine (SVM) classifier.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1C1C1005409); supported by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant RS-2023-00251002); the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. RS-2025-00516147); support provided by the NSF PREM program (DMR-2122178); the Institute of Advanced Manufacturing (IAM) at the University of Texas at Rio Grande Valley (UTRGV).
Abstract: Sinkhole formation poses a significant geohazard in karst regions, where unpredictable subsurface erosion often necessitates costly grouting for stabilization. Accurate estimation of grout volume remains a persistent challenge due to spatial variability, site-specific conditions, and the limitations of traditional empirical methods. This study introduces a novel machine learning-based regression model for grout volume prediction that integrates cone penetration test (CPT)-derived Sinkhole Resistance Ratio (SRR) values, spatial correlations between CPT and grouting points (GPs), and field-recorded grout volumes from six sinkhole sites in Florida. Three data transformation methods, the Proximal Allocation Method (PAM), the Equitable Distribution Method (EDM), and the Threshold-based Equitable Distribution Method (TEDM), were applied to distribute grout influence across CPTs, with TEDM demonstrating superior predictive performance. Synthetic data augmentation using spline methodology further improved model robustness. A high-degree polynomial regression model, optimized with ridge regularization, achieved high accuracy (R² = 0.95; PEV = 0.94) and significantly outperformed existing linear and logarithmic models. Results confirm that lower SRR values correlate with higher grout demand, and the proposed model reliably captures these nonlinear relationships. This research advances sinkhole remediation practice by providing a data-driven, accurate, and generalizable framework for grout volume estimation, enabling more efficient resource allocation and improved project outcomes.
Funding: Supported by the National Natural Science Foundation of China (12261018); Universities Key Laboratory of Mathematical Modeling and Data Mining in Guizhou Province (2023013).
Abstract: In this paper, we establish and study a single-species logistic model with impulsive age-selective harvesting. First, we prove the ultimate boundedness of the solutions of the system. Then, we obtain conditions for the asymptotic stability of the trivial solution and the positive periodic solution. Finally, numerical simulations are presented to validate our results. Our results show that age-selective harvesting is more conducive to sustainable population survival than non-age-selective harvesting.
Funding: Supported by National Institute on Aging (NIH-NIA) R01AG054459 (to ALL).
Abstract: Alzheimer’s disease (AD) is the most common form of dementia, affecting over 50 million people worldwide. This figure is projected to nearly double every 20 years, reaching 82 million by 2030 and 152 million by 2050 (Alzheimer’s Disease International). The apolipoprotein E ε4 (APOE4) allele is the strongest genetic risk factor for late-onset AD (after age 65 years). Apolipoprotein E, a lipid transporter, exists in three variants: ε2, ε3, and ε4. APOE ε2 (APOE2) is protective against AD, APOE ε3 (APOE3) is neutral, while APOE4 significantly increases the risk. Individuals with one copy of APOE4 have a 4-fold greater risk of developing AD, and those with two copies face an 8-fold risk compared to non-carriers. Even in cognitively normal individuals, APOE4 carriers exhibit brain metabolic and vascular deficits decades before amyloid-beta (Aβ) plaques and neurofibrillary tau tangles, the hallmark pathologies of AD, emerge (Reiman et al., 2001, 2005; Thambisetty et al., 2010). Notably, studies have demonstrated reduced glucose uptake, or hypometabolism, in brain regions vulnerable to AD in asymptomatic middle-aged APOE4 carriers, long before clinical symptoms arise (Reiman et al., 2001, 2005).
Funding: The Trøndelag Health Study (HUNT) is a collaboration between HUNT Research Centre (Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology), Trøndelag County Council, Central Norway Regional Health Authority, and the Norwegian Institute of Public Health; the coordination of the European Prospective Investigation into Cancer and Nutrition - Spain study (EPIC) is financially supported by the International Agency for Research on Cancer (IARC) and by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC); supported by Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); funded by The Netherlands Organisation for Health Research and Development ZonMw (Grant No.: 531-00141-3); funding for the SHIP study has been provided by the Federal Ministry for Education and Research (BMBF; identification codes 01 ZZ96030, 01 ZZ0103, and 01 ZZ0701); support from the Swedish Research Council (2018-02527 and 2019-00193); financed by the Helmholtz Zentrum München - German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria.
Abstract: Background: There is insufficient evidence to provide recommendations for leisure-time physical activity among workers across various occupational physical activity levels. This study aimed to assess the association of leisure-time physical activity with cardiovascular and all-cause mortality across occupational physical activity levels. Methods: This study utilized individual participant data from 21 cohort studies, comprising both published and unpublished data. Eligibility criteria included individual-level data on leisure-time and occupational physical activity (categorized as sedentary, low, moderate, and high) along with data on all-cause and/or cardiovascular mortality. A 2-stage individual participant data meta-analysis was conducted, with separate analysis of each study using Cox proportional hazards models (Stage 1). These results were combined using random-effects models (Stage 2). Results: Higher leisure-time physical activity levels were associated with lower all-cause and cardiovascular mortality risk across most occupational physical activity levels, for both males and females. Among males with sedentary work, high compared to sedentary leisure-time physical activity was associated with lower all-cause (hazard ratio (HR) = 0.77, 95% confidence interval (95% CI): 0.70-0.85) and cardiovascular mortality (HR = 0.76, 95% CI: 0.66-0.87) risk. Among males with high levels of occupational physical activity, high compared to sedentary leisure-time physical activity was associated with lower all-cause (HR = 0.84, 95% CI: 0.74-0.97) and cardiovascular mortality (HR = 0.79, 95% CI: 0.60-1.04) risk, while HRs for low and moderate levels of leisure-time physical activity ranged between 0.87 and 0.97 and were not statistically significant. Among females, most effects were similar but more imprecise, especially in the higher occupational physical activity levels. Conclusion: Higher levels of leisure-time physical activity were generally associated with lower mortality risks. However, results for workers with moderate and high occupational physical activity levels, especially women, were more imprecise. Our findings suggest that workers may benefit from engaging in high levels of leisure-time physical activity, irrespective of their level of occupational physical activity.
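The second stage of such a two-stage meta-analysis can be sketched roughly as follows. The abstract states only that random-effects models were used; the DerSimonian-Laird estimator, the function name `pool_random_effects`, and the per-study numbers below are all assumptions made for illustration, not the authors' code or data.

```python
import math

def pool_random_effects(log_hrs, std_errs):
    """Pool per-study log hazard ratios with a DerSimonian-Laird
    random-effects model (an assumed estimator, for illustration).

    Returns (pooled log HR, its standard error, between-study variance).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errs]
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    df = len(log_hrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * y for wi, y in zip(w_star, log_hrs)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical Stage 1 output: one hazard ratio and standard error
# (on the log scale) per cohort, e.g. from separate Cox models.
study_hrs = [0.72, 0.81, 0.78, 0.90]
study_ses = [0.08, 0.12, 0.10, 0.15]

pooled, se, tau2 = pool_random_effects([math.log(h) for h in study_hrs], study_ses)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"Pooled HR = {math.exp(pooled):.2f} (95% CI: {low:.2f}-{high:.2f})")
```

Pooling is done on the log scale because log hazard ratios are approximately normally distributed; the result is exponentiated only for reporting.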
Abstract: It is important for modern hospital management to strengthen medical humanistic care and build a harmonious doctor-patient relationship. Innovative applications of the big data resources of patient experience in modern hospital management enable hospital management to achieve real-time supervision, dynamic management and scientific decision-making based on patients' experiences. This is helping the transformation of hospital management from an administrator's perspective to a patient's perspective, and from experience-driven to data-driven. Technological innovations in hospital management based on patient experience data can assist the optimization and continuous improvement of healthcare quality, and therefore help to increase patient satisfaction with medical services.
Funding: This work was supported by grants from the National Natural Science Foundation of China, with No. NSFC62006109 and NSFC12031005.
Abstract: Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-differences (DID) is increasingly used in science of science and bibliometrics studies. The DID method yields an unbiased estimate on condition that several hypotheses hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science, and of potential ways to improve the accuracy of the DID method. Design/methodology/approach: First, we reviewed the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to better satisfy the necessary assumptions before conducting DID regression and to improve the accuracy of estimation, we introduced some matching techniques serving as a pre-selection step for DID design, matching control individuals who are equivalent to the treated ones on observational variables before the intervention. Lastly, we performed a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors. Findings: We introduced the procedures to conduct a DID estimation and demonstrated the effectiveness of using matching methods to improve the results. As a case study, we found that there are no significant increases in citations for Nobel laureates compared to their prizewinning coauthors. Research limitations: This study omitted the rigorous mathematical derivation of DID and focused on the practical aspects. Practical implications: This work gives experimental practice and potential guidelines for using the DID method in science of science and bibliometrics studies. Originality/value: This study gains insights into the usage of econometric tools in science of science.
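As a rough illustration of the DID logic summarized in the abstract above, the following minimal sketch computes the classic two-group, two-period estimate. The citation counts and the function name `did_estimate` are hypothetical, chosen only for illustration; they are not data or code from the study, which uses a regression specification with matching.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Under the common trend assumption, the control group's change
    stands in for what the treated group would have experienced
    without the intervention.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical yearly citation counts (illustrative only).
laureates_pre = [10.0, 12.0, 14.0]
laureates_post = [15.0, 17.0, 19.0]
coauthors_pre = [8.0, 9.0, 10.0]
coauthors_post = [12.0, 13.0, 14.0]

effect = did_estimate(laureates_pre, laureates_post, coauthors_pre, coauthors_post)
print(f"DID estimate: {effect:.2f}")
```

Subtracting the control group's change removes any shift that both groups share over time, which is why the estimate is only trustworthy when the two groups would have followed parallel trends without the intervention.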
Funding: This research/paper was fully supported by Universiti Teknologi PETRONAS, under the Yayasan Universiti Teknologi PETRONAS (YUTP) Fundamental Research Grant Scheme (YUTP-015LC0-123).
Abstract: Anomaly detection in high dimensional data is a critical research issue with serious implications for real-world problems. Many issues in this field remain unsolved, and several modern anomaly detection methods struggle to maintain adequate accuracy due to the highly descriptive nature of big data. This phenomenon is referred to as the “curse of dimensionality”, which affects traditional techniques in terms of both accuracy and performance. Thus, this research proposed a hybrid model based on a Deep Autoencoder Neural Network (DANN) with five layers to reduce the difference between the input and output. The proposed model was applied to a real-world gas turbine (GT) dataset that contains 87620 columns and 56 rows. During the experiment, two issues were investigated and solved to enhance the results. The first is the dataset class imbalance, which was solved using the SMOTE technique. The second issue is poor performance, which can be solved using one of the optimization algorithms. Several optimization algorithms were investigated and tested, including stochastic gradient descent (SGD), RMSprop, Adam and Adamax. The Adamax optimization algorithm showed the best results when employed to train the DANN model. The experimental results show that our proposed model can detect anomalies by efficiently reducing the high dimensionality of the dataset, with an accuracy of 99.40%, an F1-score of 0.9649, an Area Under the Curve (AUC) of 0.9649, and a minimal loss function during the hybrid model training.
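The class-imbalance step mentioned in the abstract above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The abstract names only the technique; the function `smote`, its parameters, and the toy points are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between
    a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of the base point within the minority class.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class points in two dimensions.
minority_class = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority_class, n_synthetic=5)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling this way avoids exact duplicates while staying inside the region the minority class already occupies.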
Funding: Supported by two research grants provided by the Karachi Institute of Economics and Technology (KIET) and the Big Data Analytics Laboratory at the Institute of Business Administration (IBA Karachi).
Abstract: The advent of healthcare information management systems (HIMSs) continues to produce large volumes of healthcare data for patient care and compliance and regulatory requirements at a global scale. Analysis of this big data allows for boundless potential outcomes for discovering knowledge. Big data analytics (BDA) in healthcare can, for instance, help determine causes of diseases, generate effective diagnoses, enhance QoS guarantees by increasing efficiency of the healthcare delivery and effectiveness and viability of treatments, generate accurate predictions of readmissions, enhance clinical care, and pinpoint opportunities for cost savings. However, BDA implementations in any domain are generally complicated and resource-intensive with a high failure rate and no roadmap or success strategies to guide the practitioners. In this paper, we present a comprehensive roadmap to derive insights from BDA in the healthcare (patient care) domain, based on the results of a systematic literature review. We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research focusing particularly on NoSQL databases. We also identify the limitations and challenges of these applications and justify the potential of NoSQL databases to address these challenges and further enhance BDA healthcare research. We then propose and describe a state-of-the-art BDA architecture called Med-BDA for healthcare domain which solves all current BDA challenges and is based on the latest zeta big data paradigm. We also present success strategies to ensure the working of Med-BDA along with outlining the major benefits of BDA applications to healthcare. Finally, we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work. The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators, practitioners and professionals to successfully implement BDA initiatives in their organizations.