Abstract: The ease of accessing a virtually unlimited pool of resources makes Infrastructure as a Service (IaaS) clouds an ideal platform for running data-intensive workflow applications comprising hundreds of computational tasks. However, executing scientific workflows in IaaS cloud environments poses significant challenges due to conflicting objectives, such as minimizing execution time (makespan) and reducing resource utilization costs. This study responds to the increasing need for efficient and adaptable optimization solutions in dynamic and complex environments, which are critical for meeting the evolving demands of modern users and applications. It presents an innovative multi-objective approach for scheduling scientific workflows in IaaS cloud environments. The proposed algorithm, MOS-MWMC, aims to minimize total execution time (makespan) and resource utilization cost by leveraging key features of virtual machine instances, such as a high number of cores and fast local SSD storage. By integrating realistic simulations based on the WRENCH framework, the method effectively dimensions the cloud infrastructure and optimizes resource usage. Experimental results highlight the superiority of MOS-MWMC over the benchmark algorithms HEFT and Max-Min. The Pareto fronts obtained for the CyberShake, Epigenomics, and Montage workflows lie closer to the optimal front, confirming the algorithm's ability to balance conflicting objectives. This study contributes to optimizing scientific workflows in complex environments by providing solutions tailored to specific user needs while minimizing costs and execution times.
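The multi-objective comparison this abstract describes rests on Pareto dominance over (makespan, cost) pairs. Below is a minimal sketch of that dominance relation and of extracting a Pareto front from candidate schedules; it is not the MOS-MWMC algorithm itself, and the candidate values are invented for illustration.

```python
def dominates(a, b):
    """True if point a is at least as good as b on every objective and
    strictly better on at least one (both objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the non-dominated (makespan, cost) points."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

if __name__ == "__main__":
    # Hypothetical (makespan_seconds, cost_dollars) pairs for illustration.
    schedules = [(120, 9.0), (100, 12.0), (150, 7.5), (110, 11.0), (130, 10.0)]
    print(sorted(pareto_front(schedules)))
    # → [(100, 12.0), (110, 11.0), (120, 9.0), (150, 7.5)]
```

Note that (130, 10.0) is dropped because (120, 9.0) beats it on both objectives; the remaining points are exactly the trade-off curve a scheduler like the one described would present to the user.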
Funding: Supported by the National Key R&D Plan of China under Grant No. 2017YFA0604500; the National Sci-Tech Support Plan of China under Grant No. 2014BAH02F00; the National Natural Science Foundation of China under Grant No. 61701190; the Youth Science Foundation of Jilin Province of China under Grant Nos. 20160520011JH and 20180520021JH; the Youth Sci-Tech Innovation Leader and Team Project of Jilin Province of China under Grant No. 20170519017JH; the Key Technology Innovation Cooperation Project of Government and University for the Whole Industry Demonstration under Grant No. SXGJSF2017-4; and the Key Scientific and Technological R&D Plan of Jilin Province of China under Grant No. 20180201103GX.
Abstract: With the increasing complexity of scientific computing, it is imperative to enhance the efficiency and ease of High Performance Computing (HPC) utilization. Scientific workflows were introduced to that end, but the current infrastructure still needs optimization. In this paper, we discuss the current problems in scientific computing scenarios and design a more user-friendly workflow system targeting HPC services. In the proposed solution, we introduce a structured method to describe workflows and employ a more user-friendly interface for scientific workflows, providing a better experience than traditional command-line approaches. We have integrated a variety of methods to enhance the user experience during geoscience experiments: data analytics are used to make more intelligent recommendations to users, and runtime predictions help users better plan their research schedules. Statistics from the testing period and user feedback show that the proposed workflow management system effectively reduces scientists' operating time and effort while saving computing resources. The proposed system has several advantageous features, including ease of use, a uniform specification with scalability, improved utilization of computing resources, and value as a model for similar systems.
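The runtime predictions mentioned in this abstract can be illustrated with the simplest possible model: an ordinary least-squares fit of past (input size, runtime) observations. This is purely a toy sketch under assumed data; the paper does not describe its actual prediction model, and all numbers below are invented.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical history: input sizes (GB) and observed runtimes (minutes).
sizes = [1, 2, 4, 8]
runtimes = [10, 19, 41, 79]
a, b = fit_line(sizes, runtimes)
print(f"predicted runtime for a 6 GB input: {a * 6 + b:.1f} min")
```

A production system would use richer features (node count, solver parameters, queue state), but the planning value to the user is the same: an estimate before the job is submitted.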
Abstract: Cloud computing is considered to offer a more cost-effective way to deploy scientific workflows. The individual tasks of a scientific workflow require access to many large datasets that are spatially distributed across different datacenters, resulting in long delays during data transmission. Edge computing minimizes these transmission delays and supports a fixed storage strategy for the private datasets of scientific workflows; however, this fixed strategy makes storage capacity a serious bottleneck. Integrating the merits of cloud and edge computing while rationalizing the data placement of scientific workflows, and optimizing the energy and time incurred in data transmission across datacenters, therefore remains a challenge. In this paper, the Adaptive Cooperative Foraging and Dispersed Foraging Strategies-Improved Harris Hawks Optimization Algorithm (ACF-DFS-HHOA) is proposed for optimizing the energy and data transmission time incurred when placing the data of a specific scientific workflow. ACF-DFS-HHOA takes the factors influencing the transmission delay and energy consumption of datacenters into account while rationalizing data placement. The adaptive cooperative and dispersed foraging strategies are included in HHOA to guide position updates, improving population diversity and effectively preventing the algorithm from being trapped in local optima. The experimental results confirm the predominance of ACF-DFS-HHOA in minimizing the energy and data transmission time incurred during workflow execution.
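For context, the base metaheuristic this abstract extends can be sketched as follows: the exploration-phase position update of standard Harris Hawks Optimization (HHO). The adaptive cooperative and dispersed foraging modifications of ACF-DFS-HHOA are not reproduced here, and the sphere objective is only a toy stand-in for the paper's energy/transfer-time cost.

```python
import random

def sphere(x):
    """Toy objective standing in for the data-placement cost."""
    return sum(v * v for v in x)

def hho_explore(pop, i, lb, ub, rng=random):
    """Return a new position for hawk i using standard HHO's two
    exploration rules, chosen at random with equal probability."""
    dim = len(pop[i])
    q, r1, r2, r3, r4 = (rng.random() for _ in range(5))
    if q >= 0.5:
        # Perch based on a randomly selected member of the population.
        rand_hawk = rng.choice(pop)
        new = [rand_hawk[d] - r1 * abs(rand_hawk[d] - 2 * r2 * pop[i][d])
               for d in range(dim)]
    else:
        # Perch based on the best hawk (the "rabbit") and the swarm mean.
        rabbit = min(pop, key=sphere)
        mean = [sum(h[d] for h in pop) / len(pop) for d in range(dim)]
        new = [rabbit[d] - mean[d] - r3 * (lb + r4 * (ub - lb))
               for d in range(dim)]
    return [min(max(v, lb), ub) for v in new]  # clamp to the search box
```

The paper's contribution sits on top of updates like these: the added foraging strategies perturb the positions further to keep the population diverse and avoid premature convergence.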
Funding: Supported by the National Key R&D Program of China under Grant No. 2023YFB3002204.
Abstract: Scientific workflows are essential to modern scientific computing, yet traditional execution approaches, based on control-flow paradigms and disk-based data transfers, struggle as data movement, rather than computation, emerges as the dominant performance bottleneck. These methods suffer from long latency due to centralized orchestration, sequential task triggering, and inefficient disk-mediated exchanges. We propose HPCFlow, a data-flow-oriented workflow framework designed for high-performance computing (HPC) environments. HPCFlow supports decentralized, input-driven execution. Functions are decomposed into computation and data transmission, enabling asynchronous data propagation and efficient overlap. HPCFlow incorporates context-aware data transfer strategies and alleviates small-file I/O inefficiencies through mini-batching. Additionally, HPCFlow implements an input synchronization mechanism to guarantee data completeness during parallel execution under elastic scaling. Empirical results from a production HPC environment demonstrate that, compared to a control-flow baseline, HPCFlow significantly reduces makespan and end-to-end latency, achieves efficient overlap, and alleviates pressure on network file systems, validating its effectiveness for data-intensive scientific workflows.
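The mini-batching idea mentioned for small-file I/O can be sketched simply: instead of shipping each small payload separately, group payloads until a size threshold is reached. The function name and the 1 MiB default are illustrative assumptions, not HPCFlow's actual API.

```python
def mini_batches(files, max_bytes=1 << 20):
    """Group (name, size_in_bytes) pairs into batches whose total size
    stays within max_bytes; a single oversized file still forms its
    own batch rather than being split."""
    batches, current, used = [], [], 0
    for name, size in files:
        if current and used + size > max_bytes:
            batches.append(current)   # flush the full batch
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)       # flush the final partial batch
    return batches

# Two 600 KB files exceed a 1 MB batch together, so they are split;
# the tiny third file rides along with the second.
print(mini_batches([("a", 600_000), ("b", 600_000), ("c", 100)],
                   max_bytes=1_000_000))
# → [['a'], ['b', 'c']]
```

Fewer, larger transfers amortize per-file metadata and connection overhead, which is precisely the network-file-system pressure the abstract reports alleviating.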
Funding: Supported by the Initiative and Networking Fund of the Helmholtz Association [Grant Number].
Abstract: Earth system science is an interdisciplinary effort to understand the fundamentals and interactions of environmental processes. Interdisciplinary research is challenging because it demands the integration of scientific schemes and practices from different research fields into a collaborative work environment. This paper introduces the framework F4ESS, which supports this integration. F4ESS provides methods and technologies that facilitate the development of integrative work environments for Earth system science. F4ESS enables scientists (a) to outline structured and summarized descriptions of scientific procedures to facilitate communication and synthesis, (b) to combine a large variety of distributed data analysis software into seamless data analysis chains and workflows, and (c) to visually combine and interactively explore the manifold spatiotemporal data and results to support understanding and knowledge creation. The F4ESS methods and technologies are generic and can be applied in various scientific fields. We discuss F4ESS in the context of the interdisciplinary investigation of flood events.
Funding: A project funded by the European Union under contracts H2020-INFRAEDI-02-2018 (823830) and H2020-EINFRA-2015-1 (675728); funded through EOSC-Life (https://www.eosc-life.eu), contract H2020-INFRAEOSC-2018-2 (824087), and ELIXIR-CONVERGE (https://elixir-europe.org), contract H2020-INFRADEV-2019-2 (871075).
Abstract: We introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology for describing and wrapping computational tools so that they can be utilised in a reproducible manner from multiple workflow languages and execution platforms. The concept is implemented and demonstrated with the BioExcel Building Blocks library (BioBB), a collection of tool wrappers in the field of computational biomolecular simulation. Interoperability across different workflow languages is showcased through a transversal protein Molecular Dynamics setup workflow, built using this library and run with five different Workflow Management Systems (WfMS). We argue that such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR), in order to improve widespread adoption and reuse of computational methods across workflow language barriers.
Funding: Funded by EPSRC grants EP/R026939/1, EP/R026815/1, EP/R026645/1, EP/R027129/1, or EP/M013219/1 (biocatalysis); part-funded by the European Regional Development Fund (ERDF) via the Welsh Government.
Abstract: The UK Catalysis Hub (UKCH) is designing a virtual research environment to support data processing and analysis, the Catalysis Research Workbench (CRW). The development of this platform requires identifying the processing and analysis needs of UKCH members and mapping them to potential solutions. This paper presents a proposal for a demonstrator that analyses the use of scientific workflows for large-scale data processing. The demonstrator provides a concrete target to promote further discussion of the processing and analysis needs of the UKCH community. We discuss the main data processing requirements elicited, the proposed adaptations that will be incorporated into the design of the CRW, and how to integrate the proposed solutions with existing UKCH practices. The demonstrator has been used in discussions with researchers and in presentations to the UKCH community, generating increased interest and motivating further development.