RETRACTION: P. Goyal and R. Malviya, "Challenges and Opportunities of Big Data Analytics in Healthcare," Health Care Science 2, no. 5 (2023): 328-338, https://doi.org/10.1002/hcs2.66. The above article, published online on 4 October 2023 in Wiley Online Library (wileyonlinelibrary.com), has been retracted by agreement between the journal Editor-in-Chief, Zongjiu Zhang; Tsinghua University Press; and John Wiley & Sons Ltd.
Big data analytics has been widely adopted by large companies to achieve measurable benefits including increased profitability, customer demand forecasting, cheaper product development, and improved stock control. Small and medium-sized enterprises (SMEs) are the backbone of the global economy, comprising 90% of businesses worldwide. However, only 10% of SMEs have adopted big data analytics despite the competitive advantage they could achieve. Previous research has analysed the barriers to adoption, and a strategic framework has been developed to help SMEs adopt big data analytics. The framework was converted into a scoring tool which has been applied to multiple case studies of SMEs in the UK. This paper documents the process of evaluating the framework based on structured feedback from a focus group of experienced practitioners. The results of the evaluation are presented and discussed, and the paper concludes with recommendations to improve the scoring tool based on the proposed framework. The research demonstrates that this positioning tool helps SMEs achieve competitive advantage by increasing the application of business intelligence and big data analytics.
As financial criminal methods become increasingly sophisticated, traditional anti-money laundering and fraud detection approaches face significant challenges. This study focuses on the application technologies and challenges of big data analytics in anti-money laundering and financial fraud detection. The research begins by outlining the evolutionary trends of financial crimes and highlighting the new characteristics of the big data era. Subsequently, it systematically analyzes the application of big data analytics technologies in this field, including machine learning, network analysis, and real-time stream processing. Through case studies, the research demonstrates how these technologies enhance the accuracy and efficiency of anomalous transaction detection. However, the study also identifies challenges faced by big data analytics, such as data quality issues, algorithmic bias, and privacy protection concerns. To address these challenges, the research proposes solutions from both technological and managerial perspectives, including the application of privacy-preserving technologies like federated learning. Finally, the study discusses the development prospects of Regulatory Technology (RegTech), emphasizing the importance of synergy between technological innovation and regulatory policies. This research provides guidance for financial institutions and regulatory bodies in optimizing their anti-money laundering and fraud detection strategies.
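The abstract above mentions machine learning for anomalous transaction detection without giving details. As a hedged illustration of the simplest technique in that family (statistical outlier flagging, not the study's actual method), the sketch below flags amounts that deviate sharply from an account's historical mean; all record names and figures are invented.

```python
from statistics import mean, stdev

def flag_anomalous(transactions, threshold=3.0):
    """Flag transactions whose amount deviates from the sample mean
    by more than `threshold` standard deviations."""
    amounts = [t["amount"] for t in transactions]
    mu, sigma = mean(amounts), stdev(amounts)
    return [t for t in transactions
            if sigma > 0 and abs(t["amount"] - mu) / sigma > threshold]

# Hypothetical account history: eight routine payments and one large outlier.
history = [{"id": i, "amount": a} for i, a in enumerate(
    [120, 95, 110, 105, 130, 98, 115, 102, 9500])]
print([t["id"] for t in flag_anomalous(history, threshold=2.0)])  # → [8]
```

Real deployments described in such studies replace the z-score with learned models (isolation forests, graph features from network analysis), but the flag-and-review loop around the score is the same.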
This paper focuses on facilitating state-of-the-art applications of big data analytics (BDA) architectures and infrastructures in the telecommunications (telecom) industrial sector. Telecom companies deal with terabytes to petabytes of data on a daily basis, and IoT applications in telecom further contribute to this data deluge. Recent advances in BDA have exposed new opportunities to extract actionable insights from telecom big data. These benefits and the fast-changing BDA technology landscape make it important to investigate existing BDA applications in the telecom sector. We initially identify published research on BDA applications in telecom through a systematic literature review, through which we filter 38 articles and categorize them into frameworks, use cases, literature reviews, white papers and experimental validations. We also discuss the benefits and challenges mentioned in these articles. We find that the experiments are all proofs of concept (POC) on a severely limited BDA technology stack (compared to the available technology stack); i.e., we did not find any work focusing on a full-fledged BDA implementation in an operational telecom environment. To facilitate such applications at research level, we propose a state-of-the-art lambda architecture for BDA pipeline implementation (called LambdaTel) based entirely on open-source BDA technologies and the standard Python language, along with relevant guidelines. We discovered only one research paper which presented a relatively limited lambda architecture using the proprietary AWS cloud infrastructure. We believe LambdaTel presents a clear roadmap for telecom industry practitioners to implement and enhance BDA applications in their enterprises.
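The lambda pattern that the proposed architecture builds on has a simple core: a batch layer periodically recomputes a view over the master dataset, a speed layer absorbs events that arrived since the last batch run, and the serving layer merges both at query time. The toy sketch below illustrates that merge in plain Python; the per-subscriber event counting is an invented example, not part of the paper's pipeline.

```python
from collections import Counter

class LambdaPipeline:
    """Toy lambda architecture: a batch view rebuilt periodically plus a
    speed layer of recent increments; queries merge the two views."""
    def __init__(self):
        self.batch_view = Counter()   # precomputed, immutable between runs
        self.speed_view = Counter()   # incremental real-time updates

    def run_batch(self, master_events):
        # Batch layer: recompute the view from the full master dataset.
        self.batch_view = Counter(e["subscriber"] for e in master_events)
        self.speed_view.clear()       # real-time view superseded by batch

    def ingest(self, event):
        # Speed layer: cheap per-event increment, no recomputation.
        self.speed_view[event["subscriber"]] += 1

    def query(self, subscriber):
        # Serving layer: merge batch and speed views at read time.
        return self.batch_view[subscriber] + self.speed_view[subscriber]

pipe = LambdaPipeline()
pipe.run_batch([{"subscriber": "A"}, {"subscriber": "A"}, {"subscriber": "B"}])
pipe.ingest({"subscriber": "A"})
print(pipe.query("A"))  # 2 from the batch view + 1 from the speed layer → 3
```

In a production stack the three roles are played by distinct systems (e.g. a batch engine, a stream processor, and a low-latency store); the merge semantics stay the same.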
The advent of healthcare information management systems (HIMSs) continues to produce large volumes of healthcare data for patient care and for compliance and regulatory requirements at a global scale. Analysis of this big data allows for boundless potential outcomes for discovering knowledge. Big data analytics (BDA) in healthcare can, for instance, help determine causes of diseases, generate effective diagnoses, enhance QoS guarantees by increasing the efficiency of healthcare delivery and the effectiveness and viability of treatments, generate accurate predictions of readmissions, enhance clinical care, and pinpoint opportunities for cost savings. However, BDA implementations in any domain are generally complicated and resource-intensive, with a high failure rate and no roadmap or success strategies to guide practitioners. In this paper, we present a comprehensive roadmap to derive insights from BDA in the healthcare (patient care) domain, based on the results of a systematic literature review. We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research, focusing particularly on NoSQL databases. We also identify the limitations and challenges of these applications and justify the potential of NoSQL databases to address these challenges and further enhance BDA healthcare research. We then propose and describe a state-of-the-art BDA architecture called Med-BDA for the healthcare domain which solves all current BDA challenges and is based on the latest zeta big data paradigm. We also present success strategies to ensure the working of Med-BDA along with outlining the major benefits of BDA applications to healthcare. Finally, we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work. The aforementioned contributions are collectively unique and clearly present a roadmap for clinical administrators, practitioners and professionals to successfully implement BDA initiatives in their organizations.
Big Data and Data Analytics affect almost all aspects of modern organisations' decision-making and business strategies, and they create opportunities, challenges, and implications for the external auditing procedure. The purpose of this article is to reveal essential aspects of the impact of Big Data and Data Analytics on external auditing. Big Data Analytics appears to be a critical tool for organisations, as well as for auditors, that contributes to enhancing the auditing process. Legislative implications must also be taken into consideration, since existing standards may need to change. Lastly, auditors need to develop new skills and competences, and educational organisations need to adapt their programs to correspond to new market needs.
To obtain the platform's big data analytics support, manufacturers in the traditional retail channel must decide whether to use the direct online channel. A retail supply chain model and a direct online supply chain model are built, in which manufacturers design products alone in the retail channel, while the platform and the manufacturer complete the product design together in the direct online channel. These two models are analyzed using a game-theoretic model and numerical simulation. The findings indicate that if the manufacturers' design capabilities are not very high and the commission rate is not very low, the manufacturers will choose the direct online channel when the platform's technical efforts lie within an interval. When the platform's technical efforts are exogenous, they positively influence the manufacturers' decisions; in the endogenous case, however, the platform's effect on the manufacturers is reflected in the interaction of the commission rate and cost efficiency. The manufacturers and the platform should make joint effort decisions based on the manufacturer's development capabilities, the intensity of market competition, and the cost efficiency of the platform.
In recent years, huge volumes of healthcare data have been generated in various forms. The advancements made in medical imaging are tremendous, owing to which biomedical image acquisition has become easier and quicker. Due to this massive generation of big data, the utilization of new methods based on Big Data Analytics (BDA), Machine Learning (ML), and Artificial Intelligence (AI) has become essential. In this aspect, the current research work develops a new Big Data Analytics with Cat Swarm Optimization based Deep Learning (BDA-CSODL) technique for medical image classification in an Apache Spark environment. The aim of the proposed BDA-CSODL technique is to classify medical images and diagnose disease accurately. The BDA-CSODL technique involves different stages of operation such as preprocessing, segmentation, feature extraction, and classification. In addition, the BDA-CSODL technique follows a multi-level thresholding-based image segmentation approach for the detection of infected regions in medical images. Moreover, a deep convolutional neural network based on Inception v3 is utilized in this study as the feature extractor, and the Stochastic Gradient Descent (SGD) model is used for parameter tuning. Furthermore, a CSO with Long Short-Term Memory (CSO-LSTM) model is employed as the classifier to determine the appropriate class labels. Both the SGD and CSO design approaches help improve the overall image classification performance of the proposed BDA-CSODL technique. A wide range of simulations was conducted on benchmark medical image datasets, and the comprehensive comparative results demonstrate the supremacy of the proposed BDA-CSODL technique under different measures.
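Multi-level thresholding, the segmentation step named in the abstract, partitions pixel intensities into classes using an ordered set of cut-off values. The minimal sketch below shows the labelling step only; the thresholds in real pipelines are chosen by an optimizer (here, that is the abstract's CSO's job), whereas these values and the tiny "scan" are invented for illustration.

```python
def multilevel_threshold(image, thresholds):
    """Label each pixel with its intensity class: thresholds [85, 170]
    split the 0-255 range into three classes (0, 1, 2)."""
    ts = sorted(thresholds)
    def label(pixel):
        for k, t in enumerate(ts):
            if pixel <= t:
                return k
        return len(ts)
    return [[label(p) for p in row] for row in image]

# Toy 2x3 grayscale "scan"; values are arbitrary.
scan = [[10, 200, 90],
        [255, 60, 140]]
print(multilevel_threshold(scan, [85, 170]))  # → [[0, 2, 1], [2, 0, 1]]
```

Pixels falling in the highest class could then be treated as candidate "infected regions" for the downstream feature extractor.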
With high computational capacity, e.g. many cores and wide floating-point SIMD units, the Intel Xeon Phi shows promising prospects for accelerating high-performance computing (HPC) applications. But the application of Intel Xeon Phi to data analytics workloads in the data center is still an open question. PhiBench 2.0 is built for the latest generation of Intel Xeon Phi (KNL, Knights Landing), based on the prior work PhiBench (also named BigDataBench-Phi), which was designed for the former generation of Intel Xeon Phi (KNC, Knights Corner). The workloads of PhiBench 2.0 are carefully chosen based on BigDataBench 4.0 and PhiBench 1.0. Beyond that, these workloads are well optimized on KNL and run on real-world datasets to evaluate their performance and scalability. Further, microarchitecture-level characteristics including CPI, cache behavior, vectorization intensity, and branch prediction efficiency are analyzed, and the impact of affinity and scheduling policy on performance is investigated. It is believed that these observations will help other researchers working on Intel Xeon Phi and data analytics workloads.
Lately, Internet of Things (IoT) applications generate millions of structured and unstructured data items, raising numerous problems such as data organization, production, and capture. To address these shortcomings, big data analytics is the most suitable technology to adopt. Even though big data and IoT can make human life more convenient, those benefits come at the expense of security. To manage these kinds of threats, intrusion detection systems have been extensively applied to identify malicious network traffic, particularly once preventive techniques fail at the level of endpoint IoT devices. As cyberattacks targeting IoT have gradually become stealthier and more sophisticated, intrusion detection systems (IDS) must continually evolve to manage emerging security threats. This study devises a Big Data Analytics with Internet of Things Assisted Intrusion Detection using Modified Buffalo Optimization Algorithm with Deep Learning (IDMBOA-DL) algorithm. In the presented IDMBOA-DL model, the Hadoop MapReduce tool is exploited to manage big data. The MBOA algorithm is applied to derive an optimal subset of features. Finally, the sine cosine algorithm (SCA) with a convolutional autoencoder (CAE) mechanism is utilized to recognize and classify intrusions in the IoT network. A wide range of simulations was conducted to demonstrate the enhanced results of the IDMBOA-DL algorithm. The comparison outcomes emphasize the better performance of the IDMBOA-DL model over other approaches.
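The MapReduce contract the abstract relies on for big data management is map (emit key-value pairs), shuffle (group by key), reduce (aggregate each group). A minimal in-process sketch of that contract on hypothetical network flow records is shown below; Hadoop distributes the same three phases across a cluster, and the record fields here are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Mapper: emit (source_ip, 1) for each flow record.
    yield (record["src"], 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate each key's values, here by summing.
    return {key: sum(values) for key, values in groups.items()}

flows = [{"src": "10.0.0.1"}, {"src": "10.0.0.2"}, {"src": "10.0.0.1"}]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(f) for f in flows)))
print(counts)  # → {'10.0.0.1': 2, '10.0.0.2': 1}
```

Per-source flow counts like these are a typical input feature for the feature-selection stage an IDS pipeline runs next.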
Big data applications face different types of complexities in classification. Cleaning and purifying data by eliminating irrelevant or redundant data becomes a complex operation when attempting to maintain discriminative features in the processed data. Existing schemes have many disadvantages, including continuity in training, the need for more samples and longer training times in feature selection, and increased classification execution times. Recently, ensemble methods have made a mark in classification tasks as they combine multiple results into a single representation. Compared to a single model, this technique offers improved prediction. Ensemble-based feature selection parallels multiple experts' judgments on a single topic. The major goal of this research is to propose HEFSM (Heterogeneous Ensemble Feature Selection Model), a hybrid approach that combines multiple algorithms. Further, the individual outputs produced by methods yielding feature subsets, rankings, or votes are also combined in this work. A KNN (K-Nearest Neighbor) classifier is used to classify the big dataset obtained from the ensemble learning approach. The results of the study are promising, demonstrating the proposed model's efficiency in classification in terms of the performance metrics used: precision, recall, F-measure and accuracy.
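One common way to combine per-method feature rankings, as the ensemble feature selection above describes, is a Borda count: each method's ranking awards points by position and the totals produce the consensus order. The sketch below is an illustration of that aggregation rule, not HEFSM itself; the feature names and rankings are invented.

```python
def borda_combine(rankings):
    """Combine per-method feature rankings (best feature first) by Borda
    count: a feature at position p among n features scores n - p points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Consensus ranking: highest total score first.
    return sorted(scores, key=lambda f: -scores[f])

rankings = [["f3", "f1", "f2"],   # e.g. a filter method's ranking
            ["f3", "f2", "f1"],   # a wrapper method's ranking
            ["f1", "f3", "f2"]]   # an embedded method's ranking
print(borda_combine(rankings))  # → ['f3', 'f1', 'f2']
```

The top-k features of the consensus ranking would then feed the KNN classifier mentioned in the abstract.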
Climate change and global warming result in natural hazards, including flash floods. Flash floods can create blue spots: areas where transport networks (roads, tunnels, bridges, passageways) and other engineering structures within them are at flood risk. The economic and social impact of flooding reveals that the damage caused by flash floods leading to blue spots is very high in dollar terms and in direct impacts on people's lives. The impact of flooding within blue spots is either infrastructural or social, affecting lives and properties. Currently, more than 16.1 million properties in the U.S. are vulnerable to flooding, and this is projected to increase by 3.2% within the next 30 years. Models have been developed for flood risk analysis and management, including hydrological models, algorithms, machine learning, and geospatial models. The models and methods reviewed are based on location data collection, statistical analysis and computation, and visualization (mapping). This research aims to create a blue spots model for the State of Tennessee using the ArcGIS visual programming language (model) and a data analytics pipeline.
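Blue-spot screening on an elevation raster often starts by locating sinks: cells lower than all their neighbours, where runoff can pond. The toy sketch below finds such cells on a small grid with 4-connected neighbours; it is a simplification for illustration (GIS tools like the ArcGIS workflow above operate on large rasters with fill and flow-direction analysis), and the grid values are invented.

```python
def blue_spots(elevation):
    """Return (row, col) cells strictly lower than every 4-connected
    neighbour -- sinks where runoff can pond (candidate blue spots)."""
    rows, cols = len(elevation), len(elevation[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [elevation[r + dr][c + dc]
                          for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(elevation[r][c] < n for n in neighbours):
                spots.append((r, c))
    return spots

# Toy elevation grid (metres); the centre cell is a local depression.
grid = [[5, 4, 5],
        [4, 1, 4],
        [5, 4, 5]]
print(blue_spots(grid))  # → [(1, 1)]
```

Candidate sinks would then be intersected with the transport network layer to flag at-risk roads, tunnels and underpasses.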
The information gained from data analysis is vital for implementing its outcomes to optimize processes and systems for more straightforward problem-solving. Therefore, the first step of data analytics deals with identifying data requirements, mainly how the data should be grouped or labeled. For example, for data about cybersecurity in organizations, grouping can be done into categories such as denial of service (DoS), unauthorized access from local or remote hosts, and surveillance and other probing. Next, after identifying the groups, the researcher or whoever is carrying out the data analytics goes out into the field and collects the data. The collected data is then organized in an orderly fashion to enable easy analysis. We aim to study different articles, comparing the performance of each algorithm in order to choose the most suitable classifier.
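The grouping step described above can be sketched as a rule that maps each raw event to one of the coarse attack categories. The keyword lists and event signatures below are purely illustrative assumptions, not an established taxonomy or the article's actual labelling scheme.

```python
def categorize(event):
    """Map a raw security event to a coarse attack category by keyword.
    Keyword lists are illustrative placeholders only."""
    signature = event["signature"].lower()
    if any(k in signature for k in ("syn flood", "udp flood", "smurf")):
        return "DoS"
    if any(k in signature for k in ("brute force", "credential", "rootkit")):
        return "unauthorized access"
    if any(k in signature for k in ("port scan", "ping sweep", "probe")):
        return "probing"
    return "other"

events = [{"signature": "SYN flood from botnet"},
          {"signature": "SSH brute force attempt"},
          {"signature": "Nmap port scan detected"}]
print([categorize(e) for e in events])
```

In practice these labels come from a labelled dataset or an analyst, and the classifier being evaluated learns the mapping rather than applying hand-written rules.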
Big data analytics is emerging as one of the most important classes of workloads in modern data centers. Hence, it is of great interest to identify how to achieve the best performance for big data analytics workloads running on state-of-the-art SMT (simultaneous multithreading) processors, which requires a comprehensive understanding of workload characteristics. This paper chooses Spark workloads as representative big data analytics workloads and performs comprehensive measurements on the POWER8 platform, which supports a wide range of multithreading. The research finds that the thread assignment policy and cache contention have significant impacts on application performance. In order to identify potential optimization methods from the experimental results, this study performs microarchitecture-level characterization by means of hardware performance counters and gives implications accordingly.
Data science is an interdisciplinary discipline that employs big data, machine learning algorithms, data mining techniques, and scientific methodologies to extract insights and information from massive amounts of structured and unstructured data. The healthcare industry constantly creates large, important databases on patient demographics, treatment plans, results of medical exams, insurance coverage, and more. The data that IoT (Internet of Things) devices collect is also of interest to data scientists. Data science can help with the healthcare industry's massive amounts of disparate, structured, and unstructured data by processing, managing, analyzing, and integrating it. To get reliable findings from this data, proper management and analysis are essential. This article provides a comprehensive study and discussion of process data analysis as it pertains to healthcare applications, and discusses the advantages and disadvantages of using big data analytics (BDA) in the medical industry. The insights offered by BDA, which can also aid in making strategic decisions, can assist the healthcare system.
The amount and scale of worldwide data centers grow rapidly in the era of big data, leading to massive energy consumption and formidable carbon emissions. To achieve the efficient and sustainable development of the information technology (IT) industry, researchers propose to schedule data or data analytics jobs to data centers with low electricity prices and carbon emission rates. However, due to the highly heterogeneous and dynamic nature of geo-distributed data centers in terms of resource capacity, electricity price, and carbon emission rate, it is quite difficult to optimize the electricity cost and carbon emissions of data centers over a long period. In this paper, we propose an energy-aware data backup and job scheduling method with minimal cost (EDJC) to minimize the electricity cost of geo-distributed data analytics jobs while ensuring the long-term carbon emission budget of each data center. Specifically, we first design a cost-effective data backup algorithm to generate a data backup strategy that minimizes cost based on historical job requirements. Then, based on the data backup strategy, we utilize an online carbon-aware job scheduling algorithm to calculate the job scheduling strategy in each time slot. In this algorithm, we use Lyapunov optimization to decompose the long-term job scheduling optimization problem into a series of real-time job scheduling subproblems, thereby minimizing the electricity cost while satisfying the carbon emission budget. The experimental results show that the EDJC method can significantly reduce the total electricity cost of the data center while meeting its carbon emission constraints.
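The Lyapunov drift-plus-penalty idea behind such per-slot scheduling can be sketched with a virtual queue per data center that grows when the center overshoots its per-slot carbon budget; each slot, the scheduler minimizes electricity cost plus a queue-weighted carbon penalty, so a cheap but dirty center is gradually priced out as its queue builds. This is a heuristic illustration of the pattern, not the EDJC algorithm; all prices, rates, and the weight V are invented.

```python
def schedule_slot(centers, job_load, V=10.0):
    """One drift-plus-penalty slot: pick the centre minimizing
    V * electricity_cost + queue * carbon, then update virtual queues
    against each centre's per-slot carbon budget."""
    best = min(centers, key=lambda c: V * c["price"] * job_load
                                      + c["queue"] * c["carbon_rate"] * job_load)
    for c in centers:
        emitted = c["carbon_rate"] * job_load if c is best else 0.0
        c["queue"] = max(0.0, c["queue"] + emitted - c["budget"])
    return best["name"]

# Two hypothetical centres: one cheap but carbon-heavy, one dear but clean.
centers = [
    {"name": "dc-cheap-dirty", "price": 0.04, "carbon_rate": 0.9,
     "queue": 0.0, "budget": 0.5},
    {"name": "dc-dear-clean", "price": 0.09, "carbon_rate": 0.1,
     "queue": 0.0, "budget": 0.5},
]
choices = [schedule_slot(centers, job_load=1.0) for _ in range(3)]
print(choices)
```

With these numbers the scheduler picks the cheap centre twice, then its growing carbon queue tips the third slot to the clean centre, which is exactly the cost-versus-budget trade-off the method formalizes.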
More and more marketplace platforms choose to use the data gathered from consumers (e.g., customer search terms, demographics) to provide a data analytics service for third-party sellers, both encouraging innovation and improving the sellers' operations. By adopting the data analytics service, a seller can enhance its competitiveness by improving the quality of its products when more than one seller offers substitutes. This study develops a game-theoretic model to characterize a scenario with a marketplace platform and two competing sellers that sell substitutable products and decide whether to adopt the data analytics service provided by the platform. We find that a seller's decision of whether to purchase the service depends on the competitor's decision and the two sellers' absorptive capacities for knowledge. Furthermore, a seller can benefit from adopting the data analytics service only if it has a high absorptive capacity, which can help increase product quality. When only one seller adopts, the platform and the competing sellers can benefit from the adoption if that seller's absorptive capacity is high and the other's is moderate. The sellers' interests and social welfare are aligned unless both sellers' absorptive capacities are low or high.
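The adopt-or-not decision the model studies reduces, for fixed parameters, to finding the pure-strategy Nash equilibria of a 2x2 game. The sketch below checks every action profile for mutual best responses; the payoff numbers are an invented illustration (adoption only pays against a non-adopter), not values derived from the paper's model.

```python
from itertools import product

def nash_equilibria(payoffs):
    """Pure-strategy Nash equilibria of a two-player game given as
    payoffs[(a1, a2)] = (u1, u2) over actions 'adopt' / 'skip'."""
    actions = ("adopt", "skip")
    eqs = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        # A profile is an equilibrium iff neither player gains by deviating.
        best1 = all(u1 >= payoffs[(b, a2)][0] for b in actions)
        best2 = all(u2 >= payoffs[(a1, b)][1] for b in actions)
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs

# Hypothetical payoffs: adopting is a dominant strategy, yet mutual
# adoption leaves both sellers worse off than mutual skipping.
payoffs = {("adopt", "adopt"): (2, 2), ("adopt", "skip"): (5, 1),
           ("skip", "adopt"): (1, 5), ("skip", "skip"): (3, 3)}
print(nash_equilibria(payoffs))  # → [('adopt', 'adopt')]
```

In the paper's setting the payoff entries would be functions of the commission rate and each seller's absorptive capacity, and the equilibrium shifts as those parameters move.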
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short of presenting a holistic view of complex urban challenges. System dynamics (SD) models, often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews papers on SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models' empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
This study examines the effect of key big data analytics capabilities (data, skills, technology, and culture) on innovation performance and business performance. Data were gathered from 91 companies and analyzed to determine correlations in the proposed model. The results show that big data analytics capabilities (BDAC) partially mediate the effect between innovation performance and business performance. Further, since a company's performance is a multidimensional construct, it was necessary to analyze more than one attribute to evaluate the relationship with BDAC through a canonical correlation analysis. The results reveal that the four big data capabilities increase sales growth, revenue, the number of workers, the net profit margin, innovation management, the development of new products and services, and the adoption of new information technologies.
It is no exaggeration to say that data-driven journalism has taken off in the era of the Internet. Data analytics and visualisation tools help journalists uncover more in-depth insights into their stories and present complex information in a more easily understandable way. There are different ways to collect, analyse and visualise data; various tools, from Web scraping to sophisticated coding or modelling, are used depending on the topic, the data type, and the objectives the journalist wants to achieve. The purpose of this paper is to showcase the techniques and applications of data-driven journalism and to offer a concise overview of the topic. It is clear from the case studies and technological developments presented below that data journalism does have transformative power in helping to tell better stories and to better engage audiences by offering more context on important issues. Of course, among the greatest challenges for data journalism are the traditional problems journalists have in general: accuracy, humanity, and context. These are as important as ever in the data age.
Funding: Supported in part by the Big Data Analytics Laboratory (BDALAB) at the Institute of Business Administration under a research grant approved by the Higher Education Commission of Pakistan (www.hec.gov.pk), and by the Darbi company (www.darbi.io).
Abstract: This paper focuses on facilitating state-of-the-art applications of big data analytics (BDA) architectures and infrastructures in the telecommunications (telecom) industrial sector. Telecom companies deal with terabytes to petabytes of data on a daily basis, and IoT applications in telecom further contribute to this data deluge. Recent advances in BDA have exposed new opportunities to get actionable insights from telecom big data. These benefits and the fast-changing BDA technology landscape make it important to investigate existing BDA applications in the telecom sector. We initially determine published research on BDA applications to telecom through a systematic literature review, through which we filter 38 articles and categorize them into frameworks, use cases, literature reviews, white papers and experimental validations. We also discuss the benefits and challenges mentioned in these articles. We find that the experiments are all proofs of concept (POC) on a severely limited BDA technology stack (compared to the available technology stack); i.e., we did not find any work focusing on a full-fledged BDA implementation in an operational telecom environment. To facilitate such applications at research level, we propose a state-of-the-art lambda architecture for BDA pipeline implementation (called Lambda Tel), based entirely on open-source BDA technologies and the standard Python language, along with relevant guidelines. We discovered only one research paper which presented a relatively limited lambda architecture using the proprietary AWS cloud infrastructure. We believe Lambda Tel presents a clear roadmap for telecom industry practitioners to implement and enhance BDA applications in their enterprises.
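The batch/speed/serving split that a lambda architecture such as Lambda Tel relies on can be illustrated with a minimal, self-contained sketch; the class and field names below are invented for illustration and are not taken from the Lambda Tel implementation:

```python
from collections import defaultdict

class LambdaPipeline:
    """Minimal lambda-architecture sketch: a batch layer recomputes call
    volumes per subscriber from the immutable master dataset, while a
    speed layer keeps incremental counts for records that arrived after
    the last batch run; the serving layer merges both views at query time."""

    def __init__(self):
        self.master = []                    # immutable, append-only event log
        self.batch_view = {}                # output of the last batch recomputation
        self.speed_view = defaultdict(int)  # real-time increments since last batch

    def ingest(self, subscriber, calls):
        # New events land in both the master dataset and the speed layer.
        self.master.append((subscriber, calls))
        self.speed_view[subscriber] += calls

    def run_batch(self):
        # Batch layer: full recomputation over the entire master dataset.
        view = defaultdict(int)
        for subscriber, calls in self.master:
            view[subscriber] += calls
        self.batch_view = dict(view)
        self.speed_view.clear()  # speed layer now only covers post-batch data

    def query(self, subscriber):
        # Serving layer: merge the batch view with real-time increments.
        return self.batch_view.get(subscriber, 0) + self.speed_view[subscriber]
```

In a production stack the batch layer would be a Spark or Hadoop job and the speed layer a stream processor; the merging logic, however, is exactly this shape.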
Funding: Supported by two research grants provided by the Karachi Institute of Economics and Technology (KIET) and the Big Data Analytics Laboratory at the Institute of Business Administration (IBA, Karachi).
Abstract: The advent of healthcare information management systems (HIMSs) continues to produce large volumes of healthcare data for patient care, compliance and regulatory requirements at a global scale. Analysis of this big data allows for boundless potential outcomes for discovering knowledge. Big data analytics (BDA) in healthcare can, for instance, help determine causes of diseases, generate effective diagnoses, enhance QoS guarantees by increasing the efficiency of healthcare delivery and the effectiveness and viability of treatments, generate accurate predictions of readmissions, enhance clinical care, and pinpoint opportunities for cost savings. However, BDA implementations in any domain are generally complicated and resource-intensive, with a high failure rate and no roadmap or success strategies to guide practitioners. In this paper, we present a comprehensive roadmap to derive insights from BDA in the healthcare (patient care) domain, based on the results of a systematic literature review. We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research, focusing particularly on NoSQL databases. We also identify the limitations and challenges of these applications and justify the potential of NoSQL databases to address these challenges and further enhance BDA healthcare research. We then propose and describe a state-of-the-art BDA architecture called Med-BDA for the healthcare domain, which solves all current BDA challenges and is based on the latest zeta big data paradigm. We also present success strategies to ensure the working of Med-BDA, along with outlining the major benefits of BDA applications to healthcare. Finally, we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work. The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators, practitioners and professionals to successfully implement BDA initiatives in their organizations.
Abstract: Big Data and Data Analytics affect almost all aspects of modern organisations' decision-making and business strategies, and they create opportunities, challenges, and implications for the external auditing procedure. The purpose of this article is to reveal essential aspects of the impact of Big Data and Data Analytics on external auditing. It seems that Big Data Analytics is a critical tool for organisations, as well as auditors, that contributes to the enhancement of the auditing process. Also, legislative implications must be taken into consideration, since existing standards may need to change. Last, auditors need to develop new skills and competences, and educational organisations need to change their educational programmes in order to correspond to new market needs.
Funding: The National Natural Science Foundation of China (No. 72071039) and the Foundation of China Scholarship Council (No. 202106090197).
Abstract: To obtain the platform's big data analytics support, manufacturers in the traditional retail channel must decide whether to use the direct online channel. A retail supply chain model and a direct online supply chain model are built, in which manufacturers design products alone in the retail channel, while the platform and the manufacturer complete the product design together in the direct online channel. These two models are analyzed using a game-theoretic model and numerical simulation. The findings indicate that if the manufacturers' design capabilities are not very high and the commission rate is not very low, the manufacturers will choose the direct online channel when the platform's technical efforts fall within a certain interval. When the platform's technical efforts are exogenous, they positively influence the manufacturers' decisions; however, in the endogenous case, the platform's effect on the manufacturers is reflected in the interaction of the commission rate and cost efficiency. The manufacturers and the platform should make joint effort decisions based on the manufacturers' development capabilities, the intensity of market competition, and the cost efficiency of the platform.
Funding: The author extends his appreciation to the Deanship of Scientific Research at Majmaah University for funding this study under Project Number (R-2022-61).
Abstract: In recent years, huge volumes of healthcare data have been generated in various forms. The advancements made in medical imaging are tremendous, owing to which biomedical image acquisition has become easier and quicker. Due to such massive generation of big data, the utilization of new methods based on Big Data Analytics (BDA), Machine Learning (ML), and Artificial Intelligence (AI) has become essential. In this aspect, the current research work develops a new Big Data Analytics with Cat Swarm Optimization based Deep Learning (BDA-CSODL) technique for medical image classification in an Apache Spark environment. The aim of the proposed BDA-CSODL technique is to classify medical images and diagnose disease accurately. The BDA-CSODL technique involves different stages of operation such as preprocessing, segmentation, feature extraction, and classification. In addition, the BDA-CSODL technique follows a multi-level thresholding-based image segmentation approach for the detection of infected regions in medical images. Moreover, a deep convolutional neural network-based Inception v3 method is utilized in this study as the feature extractor. A Stochastic Gradient Descent (SGD) model is used for the parameter tuning process. Furthermore, Cat Swarm Optimization with Long Short-Term Memory (CSO-LSTM) is employed as the classification model to assign the appropriate class labels. Both the SGD and CSO design approaches help improve the overall image classification performance of the proposed BDA-CSODL technique. A wide range of simulations was conducted on benchmark medical image datasets, and the comprehensive comparative results demonstrate the superiority of the proposed BDA-CSODL technique under different measures.
Funding: Supported by the National High Technology Research and Development Program of China (No. 2015AA015308), the National Key Research and Development Plan of China (No. 2016YFB1000600, 2016YFB1000601), and the Major Program of the National Natural Science Foundation of China (No. 61432006).
Abstract: With high computational capacity, e.g. many cores and wide floating-point SIMD units, the Intel Xeon Phi shows promising prospects for accelerating high-performance computing (HPC) applications. But the suitability of the Intel Xeon Phi for data analytics workloads in the data center is still an open question. PhiBench 2.0 is built for the latest generation of Intel Xeon Phi (KNL, Knights Landing), based on the prior work PhiBench (also named BigDataBench-Phi), which was designed for the former generation of Intel Xeon Phi (KNC, Knights Corner). The workloads of PhiBench 2.0 are carefully chosen based on BigDataBench 4.0 and PhiBench 1.0. Beyond that, these workloads are well optimized on KNL and run on real-world datasets to evaluate their performance and scalability. Further, the microarchitecture-level characteristics including CPI, cache behavior, vectorization intensity, and branch prediction efficiency are analyzed, and the impact of affinity and scheduling policy on performance is investigated. It is believed that these observations will help other researchers working on Intel Xeon Phi and data analytics workloads.
Abstract: Lately, Internet of Things (IoT) applications have been generating millions of structured and unstructured data records, raising numerous problems in data organization, production, and capture. To address these shortcomings, big data analytics is the most suitable technology to adopt. Even though big data and IoT can make human life more convenient, those benefits come at the expense of security. To manage these kinds of threats, intrusion detection systems have been extensively applied to identify malicious network traffic, particularly once preventive techniques fail at the level of endpoint IoT devices. As cyberattacks targeting IoT have gradually become stealthier and more sophisticated, intrusion detection systems (IDS) must continually evolve to manage emerging security threats. This study devises a Big Data Analytics with Internet of Things Assisted Intrusion Detection using Modified Buffalo Optimization Algorithm with Deep Learning (IDMBOA-DL) algorithm. In the presented IDMBOA-DL model, the Hadoop MapReduce tool is exploited for managing big data. The MBOA algorithm is applied to derive an optimal subset of features from the candidate feature subsets. Finally, the sine cosine algorithm (SCA) with a convolutional autoencoder (CAE) mechanism is utilized to recognize and classify intrusions in the IoT network. A wide range of simulations was conducted to demonstrate the enhanced results of the IDMBOA-DL algorithm. The comparison outcomes emphasize the better performance of the IDMBOA-DL model over other approaches.
Abstract: Big data applications face different types of complexities in classification. Cleaning and purifying data by eliminating irrelevant or redundant items becomes a complex operation when attempting to maintain discriminative features in the processed data. Existing schemes have many disadvantages, including the need for continuous training, more samples and longer training times in feature selection, and increased classification execution times. Recently, ensemble methods have made a mark in classification tasks, as they combine multiple results into a single representation. Compared to a single model, this technique offers improved prediction. Ensemble-based feature selection parallels multiple experts' judgments on a single topic. The major goal of this research is to propose HEFSM (Heterogeneous Ensemble Feature Selection Model), a hybrid approach that combines multiple algorithms. Further, the individual outputs produced by methods yielding feature subsets, rankings or votes are also combined in this work. A KNN (K-Nearest Neighbor) classifier is used to classify the big dataset obtained from the ensemble learning approach. The results of the study are promising, demonstrating the proposed model's efficiency in classification in terms of the performance metrics used: precision, recall, F-measure and accuracy.
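One common way to combine the rankings produced by heterogeneous base selectors is Borda-count aggregation; the sketch below is a generic illustration of that idea, not the HEFSM code, and all names are hypothetical:

```python
from collections import Counter

def ensemble_select(rankings, k):
    """Heterogeneous ensemble feature selection sketch: each base
    selector contributes a ranked feature list; features are scored
    by Borda count (a higher rank earns more points) and the top-k
    features by aggregate score are kept for the final classifier."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position  # Borda points: best rank = n points
    return [feature for feature, _ in scores.most_common(k)]
```

Subset- and vote-producing selectors fit the same scheme by treating membership or a vote as a fixed point contribution instead of a rank-weighted one.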
Abstract: Climate change and global warming result in natural hazards, including flash floods. Flash floods can create blue spots: areas where transport networks (roads, tunnels, bridges, passageways) and other engineering structures within them are at flood risk. The economic and social impact of flooding reveals that the damage caused by flash floods leading to blue spots is very high in dollar terms and in direct impacts on people's lives. The impact of flooding within blue spots is either infrastructural or social, affecting lives and properties. Currently, more than 16.1 million properties in the U.S. are vulnerable to flooding, and this is projected to increase by 3.2% within the next 30 years. Some models have been developed for flood risk analysis and management, including hydrological models, algorithms, machine learning and geospatial models. The models and methods reviewed are based on location data collection, statistical analysis and computation, and visualization (mapping). This research aims to create a blue spots model for the State of Tennessee using the ArcGIS visual programming language (model) and a data analytics pipeline.
Abstract: The information gained from data analysis is vital for implementing its outcomes to optimize processes and systems and to enable more straightforward problem-solving. The first step of data analytics therefore deals with identifying data requirements, mainly how the data should be grouped or labeled. For example, for data about cybersecurity in organizations, grouping can be done into categories such as DoS (denial of service), unauthorized access from local or remote hosts, and surveillance and other probing. Next, after identifying the groups, the researcher or whoever is carrying out the data analytics goes into the field and collects the data. The collected data is then organized in an orderly fashion to enable easy analysis. We aim to study different articles and compare the performance of each algorithm in order to choose the most suitable classifier.
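The side-by-side comparison of candidate classifiers described above amounts to an accuracy bake-off on the same held-out labeled samples; the sketch below is a generic illustration, and the classifier names and category labels are toy stand-ins, not those of the surveyed articles:

```python
def accuracy(predict, samples):
    """Fraction of labeled samples a classifier predicts correctly."""
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)

def best_classifier(classifiers, samples):
    """Compare candidate classifiers ({name: predict_fn}) on the same
    labeled evaluation data and return the name of the most accurate
    one -- the kind of side-by-side evaluation used to pick a suitable
    classifier for the grouped cybersecurity categories."""
    return max(classifiers, key=lambda name: accuracy(classifiers[name], samples))
```

In practice the predict functions would be trained models and accuracy would be complemented by per-class metrics, since attack categories are typically imbalanced.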
Funding: Supported by the National High Technology Research and Development Program of China (No. 2015AA015308) and the State Key Development Program for Basic Research of China (No. 2014CB340402).
Abstract: Big data analytics is emerging as one of the most important classes of workloads in modern data centers. Hence, it is of great interest to identify how to achieve the best performance for big data analytics workloads running on state-of-the-art SMT (simultaneous multithreading) processors, which requires a comprehensive understanding of workload characteristics. This paper chooses Spark workloads as representative big data analytics workloads and performs comprehensive measurements on the POWER8 platform, which supports a wide range of multithreading levels. The research finds that the thread assignment policy and cache contention have significant impacts on application performance. To identify potential optimization methods from the experimental results, this study performs microarchitecture-level characterization by means of hardware performance counters and gives implications accordingly.
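Microarchitecture-level characterization of this kind boils down to deriving ratios from raw counter readings; a minimal sketch, assuming illustrative counter names rather than actual POWER8 event names:

```python
def microarch_metrics(counters):
    """Derive the microarchitecture-level metrics such studies examine
    (CPI, vectorization intensity, branch misprediction rate) from raw
    hardware performance-counter readings. The dictionary keys are
    illustrative stand-ins, not real PMU event names."""
    cpi = counters["cycles"] / counters["instructions"]
    vec_intensity = counters["vector_ops"] / counters["instructions"]
    branch_miss_rate = counters["branch_misses"] / counters["branches"]
    return {
        "cpi": cpi,                                  # cycles per instruction
        "vectorization_intensity": vec_intensity,    # share of SIMD ops
        "branch_miss_rate": branch_miss_rate,        # mispredicted fraction
    }
```

Comparing these ratios across thread assignment policies is what exposes effects like cache contention: CPI rises while instruction counts stay flat.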
Abstract: Data science is an interdisciplinary discipline that employs big data, machine learning algorithms, data mining techniques, and scientific methodologies to extract insights and information from massive amounts of structured and unstructured data. The healthcare industry constantly creates large, important databases on patient demographics, treatment plans, results of medical exams, insurance coverage, and more. The data that IoT (Internet of Things) devices collect is also of interest to data scientists. Data science can help the healthcare industry handle its massive amounts of disparate, structured, and unstructured data by processing, managing, analyzing, and integrating it. To get reliable findings from this data, proper management and analysis are essential. This article provides a comprehensive study and discussion of data analysis as it pertains to healthcare applications. The article discusses the advantages and disadvantages of using big data analytics (BDA) in the medical industry. The insights offered by BDA, which can also aid in making strategic decisions, can assist the healthcare system.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. U23B2004 and 62162018, the Guangxi Natural Science Foundation under Grant Nos. 2025GXNSFBA069389 and 2025GXNSFBA069101, the Key Research and Development Program of Guangxi under Grant No. AB25069157, the Hunan Provincial Excellent Youth Fund under Grant No. 2023JJ20055, and the Open Project Program of the Guangxi Key Laboratory of Digital Infrastructure under Grant No. GXDIOP2024005.
Abstract: The amount and scale of worldwide data centers grow rapidly in the era of big data, leading to massive energy consumption and formidable carbon emissions. To achieve the efficient and sustainable development of the information technology (IT) industry, researchers propose scheduling data or data analytics jobs to data centers with low electricity prices and carbon emission rates. However, due to the highly heterogeneous and dynamic nature of geo-distributed data centers in terms of resource capacity, electricity price, and carbon emission rate, it is quite difficult to optimize the electricity cost and carbon emissions of data centers over a long period. In this paper, we propose an energy-aware data backup and job scheduling method with minimal cost (EDJC) to minimize the electricity cost of geo-distributed data analytics jobs while ensuring the long-term carbon emission budget of each data center. Specifically, we first design a cost-effective data backup algorithm to generate a data backup strategy that minimizes cost based on historical job requirements. After that, based on the data backup strategy, we utilize an online carbon-aware job scheduling algorithm to calculate the job scheduling strategy in each time slot. In this algorithm, we use Lyapunov optimization to decompose the long-term job scheduling optimization problem into a series of real-time job scheduling subproblems, thereby minimizing the electricity cost while satisfying the carbon emission budget. The experimental results show that the EDJC method can significantly reduce the total electricity cost of the data centers while meeting their carbon emission constraints.
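The per-slot decision produced by a Lyapunov drift-plus-penalty decomposition can be sketched roughly as follows; this is a simplified illustration of the general technique, not the EDJC algorithm itself, and every field name and parameter is an assumption:

```python
def schedule_slot(centers, job_load, V):
    """One time slot of a drift-plus-penalty scheduler sketch: each
    data center carries a virtual queue tracking how far it has
    overshot its per-slot carbon budget; the job goes to the center
    minimizing V * electricity cost + queue-weighted carbon emission.
    Larger V favors cost savings over budget compliance. Each center
    is a dict with illustrative fields: name, price, carbon_rate,
    queue, budget."""
    def penalty(c):
        cost = job_load * c["price"]
        carbon = job_load * c["carbon_rate"]
        return V * cost + c["queue"] * carbon

    best = min(centers, key=penalty)
    # Virtual queue update: backlog grows when emissions exceed budget.
    for c in centers:
        emitted = job_load * c["carbon_rate"] if c is best else 0.0
        c["queue"] = max(c["queue"] + emitted - c["budget"], 0.0)
    return best["name"]
```

Repeated over many slots, the growing queue of a dirty-but-cheap center steers later jobs toward cleaner centers, which is how the long-term budget constraint is enforced without solving the long-horizon problem directly.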
Funding: Supported by the Major Program of the National Natural Science Foundation of China (No. 72394373), the National Excellent Youth Fund of the National Natural Science Foundation of China (No. 72022012), and the Key Program and General Program of the National Natural Science Foundation of China (No. 72231004, No. 71871155 and No. 71971153). The authors are very grateful to the editor and all reviewers, whose invaluable comments and suggestions substantially helped improve the quality of our paper.
Abstract: More and more marketplace platforms choose to use the data gathered from consumers (e.g., customer search terms, demographics) to provide a data analytics service for third-party sellers, both encouraging innovation and improving the operations of the latter. By adopting the data analytics service, a seller can enhance its competitiveness by improving the quality of its products when more than one seller offers substitutes. This study develops a game-theoretic model to characterize a scenario with a marketplace platform and two competing sellers that sell substitutable products and decide whether to adopt the data analytics service provided by the platform. We find that a seller's decision of whether or not to purchase the service depends on the competitor's decision and the two sellers' absorptive capacities for knowledge. Furthermore, a seller can benefit from adopting the data analytics service only if it has a high absorptive capacity, which can help increase product quality. When only one seller adopts, the platform and the competing sellers can benefit from the adoption if this seller's absorptive capacity is high and the other's is moderate. The sellers' interests and social welfare can be aligned unless both sellers' absorptive capacities are low or high.
Funding: Sponsored by the U.S. Department of Housing and Urban Development (Grant No. NJLTS0027-22). The opinions expressed in this study are the authors' alone and do not represent the opinions of the U.S. Department of HUD.
Abstract: This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short of presenting a holistic view of complex urban challenges. System dynamics (SD) models, often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, an approach this study identifies as rare in the field, to enhance the empirical foundation of SD models. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
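The stock-and-flow structure at the core of any SD model can be illustrated with a one-stock Euler-integrated simulation; the variable names and rates below are purely illustrative, not drawn from any reviewed model:

```python
def simulate_population(initial, birth_rate, death_rate, steps, dt=1.0):
    """Tiny system-dynamics sketch: a single stock (urban population)
    driven by two flows (births in, deaths out), integrated with Euler
    steps -- the basic stock-and-flow structure SD models build on.
    Data-driven calibration would replace the fixed rates with values
    estimated from observed time series."""
    stock = initial
    trajectory = [stock]
    for _ in range(steps):
        inflow = birth_rate * stock    # flow into the stock per step
        outflow = death_rate * stock   # flow out of the stock per step
        stock += (inflow - outflow) * dt
        trajectory.append(stock)
    return trajectory
```

Real urban SD models chain many such stocks (population, housing, land use, emissions) through feedback loops, but each link is integrated exactly this way.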
Funding: Financially supported by the Spanish State Research Agency (Agencia Estatal de Investigación) of the Spanish Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) via the project 'Speeding up the transition towards resilient circular economy networks: forecasting, inventory and production control, reverse logistics and supply chain dynamics' (SPUR, grant ref. PID2020-117021GB-I00). Author Omar León: contract financed under Royal Decree 289/2021, of April 20, Ministry of Universities, for the requalification of the Spanish university system, and Order UNI/551/2021 of May 26, Maria Zambrano Modality, Project Ref. MU-21-UP2021-030 AX492867.
Abstract: This study examines the effect of key big data analytics capabilities (data, skills, technology, and culture) on innovation performance and business performance. Data were gathered from 91 companies and analyzed to determine correlations in the proposed model. The results show that big data analytics capabilities (BDAC) partially mediate the effect between innovation performance and business performance. Further, as a company's performance is a multidimensional construct, it was necessary to analyze more than one attribute to evaluate its relationship with BDAC, through a canonical correlation analysis. The results in this sense reveal that the four big data capabilities increase sales growth, revenue, the number of workers, the net profit margin, innovation management, the development of new products and services, and the adoption of new information technologies.
Abstract: It is no exaggeration to say that data-driven journalism has taken off in the era of the Internet. Data analytics and visualisation tools help journalists uncover more in-depth insights for their stories and present complex information in a more easily understandable way. There are different ways to collect, analyse and visualise data. Various tools, from Web scraping to sophisticated coding or modelling, are used depending on the topic, the data type, and the objectives the journalist wants to achieve. The purpose of this paper is to showcase the techniques and applications of data-driven journalism and to offer a concise overview of the topic. It is clear from the case studies and technological developments presented below that data journalism does have transformative power in helping to tell better stories and to better engage audiences by offering more context on important issues. Of course, among the greatest challenges for data journalism are the traditional problems journalists have in general: accuracy, humanity, and context. These are as important as ever in the data age.