An evaluation approach for the response time probability distribution of workflows, based on the fluid stochastic Petri net formalism, is presented. Firstly, some problems of stochastic workflow net modeling are discussed. Then, how to convert a stochastic workflow net model into a fluid stochastic Petri net model is described. The response time distribution can be obtained directly from the transient-state solution of the fluid stochastic Petri net model. The proposed approach places no restrictions on the structure of workflow models, and the processing times of workflow tasks can be modeled with arbitrary probability distributions. Large workflow models can be tackled efficiently by recursively applying a net reduction technique.
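The abstract obtains the distribution analytically from the transient solution of the fluid model; that approach is not reproduced here, but the basic idea that arbitrary task-time distributions induce a workflow response-time distribution can be illustrated with a small Monte Carlo sketch. The two-branch workflow and all distribution parameters below are invented for illustration only.

```python
import random

def simulate_response_time(n=10000, seed=0):
    """Monte Carlo estimate of the response-time distribution for a toy
    workflow: task A, then tasks B and C in parallel, then task D.
    Task times follow arbitrary distributions (uniform, exponential, ...)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.uniform(1.0, 2.0)          # task A: uniform service time
        b = rng.expovariate(1.0)           # task B: exponential, rate 1
        c = rng.expovariate(0.5)           # task C: exponential, rate 0.5
        d = rng.uniform(0.5, 1.0)          # task D: uniform service time
        samples.append(a + max(b, c) + d)  # parallel branch joins at the max
    samples.sort()
    return samples

def cdf_at(samples, t):
    """Empirical P(response time <= t)."""
    return sum(1 for s in samples if s <= t) / len(samples)

samples = simulate_response_time()
```

The empirical CDF returned here is what the fluid stochastic Petri net method computes exactly via its transient solution, without sampling.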
In recent years, several researchers have applied workflow technologies to service automation in ubiquitous computing environments. However, most context-aware workflows do not offer a method to compose several workflows into a larger or more complicated workflow; they provide only a simple workflow model, not a composite one. In this paper, the authors propose a context-aware workflow model that supports composite workflows by expanding the patterns of existing context-aware workflows, which support the basic workflow patterns. The suggested workflow model offers composite workflow patterns for a context-aware workflow, consisting of various flow patterns such as simple, split, parallel, and subflow. With the suggested model, a new workflow can easily reuse existing workflows. As a result, it can save the development effort and time of context-aware workflows and increase workflow reusability. The suggested model is therefore expected to ease the development of applications for context-aware workflow services in ubiquitous computing environments.
Based on the methods of acquaintance caching and group-based intelligent forwarding of service recommendations, a novel group-based active service (GAS) protocol for migrating workflows was proposed. This protocol does not require service requesters to discover services or resources. A semantic acquaintance knowledge representation is exploited to describe service groups, and this semantic information is used to recommend services to the respective clients. The experimental results show that the proposed protocol offers better performance than other protocols in terms of first response time, success scope, and the ratio of success packets to total packets. When the number of service request packets is 20, the first response time of the GAS protocol is only 5.1 s, significantly lower than that of the other protocols. The success scope of the GAS protocol is 49.1%, showing that it can effectively improve the reliability of mobile transactions, and its ratio of success packets to total packets reaches 0.080, clearly higher than that of the other protocols.
As the Internet of Things (IoT) and mobile devices have rapidly proliferated, their computationally intensive applications have developed into complex, concurrent IoT-based workflows involving multiple interdependent tasks. By exploiting its low latency and high bandwidth, mobile edge computing (MEC) has emerged to achieve high-performance computation offloading of these applications and satisfy the quality-of-service requirements of workflows and devices. In this study, we propose an offloading strategy for IoT-based workflows in a high-performance MEC environment. The proposed task-based offloading strategy is formulated as an optimization problem that accounts for task dependency, communication costs, workflow constraints, device energy consumption, and the heterogeneous characteristics of the edge environment. The placement of workflow tasks is then optimized using a discrete teaching-learning-based optimization (DTLBO) metaheuristic. Extensive experimental evaluations demonstrate that the proposed offloading strategy is effective at minimizing the energy consumption of mobile devices and reducing the execution times of workflows compared to offloading strategies based on other metaheuristics, including particle swarm optimization and ant colony optimization.
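The discrete TLBO variant used in the paper is not detailed in this abstract; as a hedged sketch, the standard continuous TLBO teacher and learner phases can be written as follows, applied to a toy sphere objective (all parameters and the objective are invented, not the authors' formulation).

```python
import random

def tlbo_minimize(f, dim, bounds, pop_size=20, iters=100, seed=1):
    """Minimal teaching-learning-based optimization (TLBO) sketch.
    Teacher phase pulls learners toward the current best solution;
    learner phase lets random pairs of learners teach each other."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return max(lo, min(hi, v))

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        scores = [f(x) for x in pop]
        teacher = pop[scores.index(min(scores))]
        mean = [sum(x[j] for x in pop) / pop_size for j in range(dim)]
        # Teacher phase: move each learner toward the teacher, away from the mean.
        for i, x in enumerate(pop):
            tf = rng.choice([1, 2])  # teaching factor
            cand = [clip(x[j] + rng.random() * (teacher[j] - tf * mean[j]))
                    for j in range(dim)]
            if f(cand) < f(x):       # greedy acceptance
                pop[i] = cand
        # Learner phase: learn from a random peer (toward if better, away if worse).
        for i in range(pop_size):
            j = rng.randrange(pop_size)
            if j == i:
                continue
            sign = 1.0 if f(pop[j]) < f(pop[i]) else -1.0
            cand = [clip(pop[i][k] + rng.random() * sign * (pop[j][k] - pop[i][k]))
                    for k in range(dim)]
            if f(cand) < f(pop[i]):
                pop[i] = cand
    best = min(pop, key=f)
    return best, f(best)

# Toy objective: sphere function, minimum 0 at the origin.
best, score = tlbo_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5, 5))
```

A discrete variant such as DTLBO would replace the continuous update with a discrete move operator over task-to-server assignments while keeping the same teacher/learner structure.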
With the prevalence of service-oriented architecture (SOA), web services have become the dominant technology for constructing workflow systems. As a workflow is the composition of a series of interrelated web services that realize its activities, the interoperability of workflows can be treated as the composition of web services. To address this, a framework for the interoperability of business process execution language (BPEL)-based workflows is presented, which performs three phases: transformation, conformance test, and execution. The core components of the framework are proposed, with emphasis on how these components promote interoperability. In particular, dynamic binding and re-composition of workflows in terms of web service testing are presented. An example of business-to-business (B2B) collaboration is also provided to illustrate how to perform composition and conformance testing.
With the rapid growth of the Industrial Internet of Things (IIoT), Mobile Edge Computing (MEC) has become widely used in many emerging scenarios. In MEC, each workflow task can be executed locally or offloaded to the edge to help improve Quality of Service (QoS) and reduce energy consumption. However, most existing offloading strategies focus on independent applications and cannot be applied efficiently to workflow applications with a series of dependent tasks. To address this issue, this paper proposes an energy-efficient task offloading strategy for large-scale workflow applications in MEC. First, we formulate task offloading as an optimization problem with the goal of minimizing the utility cost, a trade-off between energy consumption and total execution time. Then, a novel heuristic algorithm named Green DVFS-GA is proposed, which includes a task offloading step based on the genetic algorithm and a further step that reduces energy consumption using the Dynamic Voltage and Frequency Scaling (DVFS) technique. Experimental results show that the proposed strategy can significantly reduce energy consumption and achieves the best trade-off compared with other strategies.
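Green DVFS-GA itself is not specified in this abstract; the genetic-algorithm offloading step it builds on can be sketched roughly as below. The per-task costs and the utility weighting are invented for illustration, and the DVFS post-processing step is omitted.

```python
import random

# Hypothetical per-task costs (not from the paper):
# (local_time, local_energy, edge_time_incl_transfer, transmission_energy).
TASKS = [(4.0, 3.0, 2.0, 1.0), (1.0, 0.5, 2.5, 0.8),
         (6.0, 5.0, 2.5, 1.2), (3.0, 2.0, 1.5, 0.7)]
ALPHA = 0.5  # trade-off weight between energy and time

def utility(assign):
    """Utility cost: weighted sum of total energy and total time.
    assign[i] == 1 means task i is offloaded to the edge."""
    time = sum(t[2] if a else t[0] for t, a in zip(TASKS, assign))
    energy = sum(t[3] if a else t[1] for t, a in zip(TASKS, assign))
    return ALPHA * energy + (1 - ALPHA) * time

def ga_offload(pop_size=20, gens=50, seed=2):
    """Plain GA over binary offloading decisions, minimizing utility()."""
    rng = random.Random(seed)
    n = len(TASKS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=utility)
        elite = pop[: pop_size // 2]       # selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:         # mutation: flip one gene
                k = rng.randrange(n)
                child[k] = 1 - child[k]
            children.append(child)
        pop = elite + children
    return min(pop, key=utility)

best = ga_offload()
```

In the real strategy, task dependencies would constrain the schedule derived from an assignment, and a DVFS step would then lower CPU frequencies on slack to save further energy.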
Workflow-based systems are typically said to lead to better use of staff and better management and productivity. The first phase in building a workflow-based system is capturing the real-world process in a conceptual representation suitable for the following phases of formalization and implementation. The specification may be in text or diagram form or written in a formal language. This paper proposes a flow-based diagrammatic methodology as a tool for workflow specification. The expressiveness of the method is appraised through its ability to capture a workflow-based application. We show that the proposed conceptual diagrams can express situations arising in practice as an alternative to tools currently used in workflow systems. This is demonstrated by using the proposed methodology to partially build demo systems for two government agencies.
The current application of digital workflows for the understanding, promotion, and participation in the conservation of heritage sites involves several technical challenges and should be governed by serious ethical engagement. Recording consists of capturing (or mapping) the physical characteristics of the character-defining elements that provide the significance of cultural heritage sites. Usually, the outcome of this work is the cornerstone information serving their conservation, whether it is used actively for maintaining them or for ensuring a posterity record in case of destruction. The records produced can guide the decision-making process at different levels by property owners, site managers, public officials, and conservators around the world, as well as present the historical knowledge and values of these resources. Rigorous documentation may also serve a broader purpose: over time, it becomes the primary means by which scholars and the public apprehend a site that has since changed radically or disappeared. This contribution aims to provide an overview of the potential applications and threats of the technology utilised by heritage recording professionals, addressing the need to develop ethical principles that can improve heritage recording practice at large.
Research data infrastructures currently face a huge increase in data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services: not only making data and workflows findable, accessible, interoperable, and reusable, but doing so in a way that leverages machine support for better efficiency. One primary need is findability, and achieving better findability benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations, and machine learning techniques.
In Geographic Information Systems (GIS), geoprocessing workflows allow analysts to organize their methods on spatial data into complex chains. We propose a method for expressing workflows as linked data and for semi-automatically enriching them with semantics at the level of their operations and datasets. Linked workflows can easily be published on the Web and queried for types of inputs, results, or tools. Thus, GIS analysts can reuse their workflows in a modular way, selecting, adapting, and recommending resources based on compatible semantic types. Our typing approach starts from minimal annotations of workflow operations with classes of GIS tools, and then propagates data types and implicit semantic structures through the workflow using an OWL typing scheme and SPARQL rules, backtracking over GIS operations. The method is implemented in Python and evaluated on two real-world geoprocessing workflows generated with Esri's ArcGIS. To illustrate potential applications of our typing method, we formulate and execute competency questions over these workflows.
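The paper's OWL/SPARQL machinery is not reproduced here; as a simplified stand-in, forward propagation of semantic types through a workflow from minimal tool annotations can be sketched in plain Python. The tool signatures, type names, and the example workflow below are invented.

```python
# Toy semantic type propagation through a geoprocessing workflow, standing in
# for the paper's OWL/SPARQL approach. All signatures here are invented.
TOOL_SIGNATURES = {
    # tool name: (expected input types, produced output type)
    "Buffer":    (["VectorLayer"], "VectorLayer"),
    "Rasterize": (["VectorLayer"], "RasterLayer"),
    "SlopeMap":  (["RasterLayer"], "RasterLayer"),
}

def propagate_types(workflow, source_types):
    """Infer the type of every dataset node by walking operations in order.
    workflow: list of (tool, input_ids, output_id); raises on a type clash.
    Untyped inputs are enriched from the tool signature, mirroring how the
    paper starts from minimal annotations and propagates semantics."""
    types = dict(source_types)
    for tool, inputs, output in workflow:
        expected, produced = TOOL_SIGNATURES[tool]
        for node, want in zip(inputs, expected):
            if types.get(node, want) != want:
                raise TypeError(f"{tool}: {node} is {types[node]}, expected {want}")
            types[node] = want  # enrich a previously untyped input
        types[output] = produced
    return types

wf = [("Buffer", ["roads"], "roads_buf"),
      ("Rasterize", ["roads_buf"], "roads_ras"),
      ("SlopeMap", ["dem"], "slope")]
types = propagate_types(wf, {"roads": "VectorLayer"})
```

Note how `dem` receives a type it was never annotated with, purely because `SlopeMap` demands a raster input; this is the enrichment effect the abstract describes, realized in the paper with OWL classes and SPARQL rules rather than a dictionary.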
When defining indicators on the environment, using existing initiatives should be a priority rather than redefining indicators each time. From an Information and Communication Technology perspective, data interoperability and standardization are critical to improving data access and exchange, as promoted by the Group on Earth Observations. GEOEssential follows an end-user-driven approach by defining Essential Variables (EVs) as an intermediate layer between environmental policy indicators and their appropriate data sources. Environmental policies and indicators are increasingly percolating down from global to local agendas. The scientific business processes for generating EVs and related indicators can be formalized in workflows specifying the necessary logical steps. To this aim, GEOEssential is developing a Virtual Laboratory whose main objective is to instantiate conceptual workflows, stored in a dedicated knowledge base, into executable workflows. To interpret and present the outputs produced by the thematic workflows considered in GEOEssential (i.e., biodiversity, ecosystems, extractives, night light, and the food-water-energy nexus), a Dashboard is built as a visual front-end. This is a valuable instrument to track progress towards environmental policy goals.
There is a growing recognition of the interdependencies among the supply systems for food, water, and energy. Billions of people lack safe and sufficient access to these systems, while global demand grows rapidly and resource constraints increase. Modeling frameworks are considered one of the few means available to understand the complex interrelationships among the sectors; however, the development of nexus-related frameworks has been limited. We describe three open-source models well known in their respective domains (i.e., TerrSysMP, WOFOST, and SWAT) whose components, if combined, could help decision-makers address the nexus issue. As a first step, we propose developing simple workflows that utilize essential variables and address components of the above-mentioned models, which can act as building blocks for an eventual comprehensive nexus model framework. The outputs of the workflows and the model framework are designed to address the SDGs.
Since their introduction by James Dixon in 2010, data lakes have received more and more attention, driven by the promise of high reusability of the stored data due to schema-on-read semantics. Building on this idea, several additional requirements have been discussed in the literature to improve the general usability of the concept, such as a central metadata catalog including all provenance information, overarching data governance, or integration with (high-performance) processing capabilities. Although the necessity of a logical and a physical organisation of data lakes in order to meet those requirements is widely recognized, no concrete guidelines are yet provided. The most common architecture implementing this conceptual organisation is the zone architecture, where data is assigned to a certain zone depending on its degree of processing. This paper discusses how FAIR Digital Objects can be used in a novel approach to organize a data lake based on data types instead of zones, how they can be used to abstract the physical implementation, and how they empower generic and portable processing capabilities through a provenance-based approach.
We discuss the problem of accountability when multiple parties cooperate towards an end result, such as multiple companies in a supply chain or departments of a government service under different authorities. In cases where a fully trusted central point does not exist, it is difficult to obtain a trusted audit trail of a workflow when each individual participant is unaccountable to all others. We propose AudiWFlow, an auditing architecture that makes participants accountable for their contributions in a distributed workflow. Our scheme provides confidentiality in most cases, collusion detection, and availability of evidence after the workflow terminates. AudiWFlow is based on verifiable secret sharing and real-time peer-to-peer verification of records; it further supports multiple levels of assurance to meet a desired trade-off between the availability of evidence and the overhead resulting from the auditing approach. We propose and evaluate two implementation approaches for AudiWFlow. The first is fully distributed except for a central auxiliary point that, nevertheless, needs only a low level of trust. The second is based on smart contracts running on a public blockchain, which removes the need for any central point but requires integration with a blockchain.
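AudiWFlow's full protocol is not given in this abstract; verifiable secret sharing, however, builds on Shamir's (t, n) secret sharing, which can be sketched as follows. (A verifiable scheme additionally publishes commitments to the polynomial coefficients so shareholders can check their shares; that part is omitted here.)

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def make_shares(secret, t, n, seed=None):
    """Shamir (t, n) sharing: the secret is the constant term of a random
    polynomial of degree t-1; shares are the points (x, f(x)) for x = 1..n."""
    rng = random.Random(seed)
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t - 1)]

    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little theorem).
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

In an auditing setting, a record (or a key protecting it) can be shared among the workflow participants so that evidence remains available after termination as long as any t of them cooperate, without any single trusted custodian.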
With the quick development of grid techniques and the growing complexity of grid applications, reasoning about temporal properties of grid workflows to probe potential pitfalls and errors is becoming critical for ensuring reliability and trustworthiness at the initial design phase. A state Pi calculus is proposed and implemented in this work, which not only enables flexible abstraction and management of historical grid system events, but also facilitates modeling and temporal verification of grid workflows. Furthermore, a relaxed region analysis (RRA) approach is proposed to decompose large-scale grid workflows into sequentially composed regions with relaxation of parallel workflow branches, and the corresponding verification strategies are also decomposed following modular verification principles. Performance evaluation results show that the RRA approach can dramatically reduce the CPU time and memory usage of formal verification.
Computational workflows describe the complex multi-step methods used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative suggests a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments thereof. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding additional load onto the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over the course of the last year, highlights several projects that hold promise for the CWFR approach, including Galaxy, Jupyter Notebook, and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.
Data streaming applications, usually composed of sequential/parallel data processing tasks organized as a workflow, bring new challenges to workflow scheduling and resource allocation in grid environments. Due to the high volumes of data and relatively limited storage capability, resource allocation and data streaming have to be storage-aware; to improve system performance, data streaming and processing also have to be concurrent. This study uses a genetic algorithm (GA) for workflow scheduling, with on-line measurements and predictions made by a gray model (GM). On-demand data streaming is used to avoid data overflow through repertory strategies. Tests show that tasks with on-demand data streaming must be balanced to improve overall performance, avoid system bottlenecks and backlogs of intermediate data, and increase data throughput for the data processing workflows as a whole.
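The abstract does not specify which gray model is used; assuming the classic GM(1,1), the prediction step can be sketched as follows (the input series is toy data, and the GA scheduling around it is omitted).

```python
import math

def gm11_forecast(x0, steps=1):
    """GM(1,1) gray model sketch: fit a first-order gray differential
    equation to a short positive series x0 and forecast the next values."""
    n = len(x0)
    x1 = [sum(x0[: i + 1]) for i in range(n)]             # accumulated series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    # Least-squares fit of x0[k] + a*z[k] = b via the 2x2 normal equations.
    m = n - 1
    szz = sum(v * v for v in z)
    sz = sum(z)
    sy = sum(x0[1:])
    szy = sum(zk * yk for zk, yk in zip(z, x0[1:]))
    det = szz * m - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    # Time-response function of the fitted equation; x1_hat(0) == x0[0].
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    # Differencing the accumulated forecast recovers the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

forecast = gm11_forecast([10, 12, 14.4, 17.28], steps=1)
```

In the scheduling context described above, such a forecast of the incoming data rate would feed the GA's fitness evaluation, so that allocation decisions anticipate rather than react to storage pressure.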
Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power pave the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities are slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss FAIR Digital Objects (FDOs) and Research Objects (ROs) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasts in Germany by using ML. Our concept envisages a raster datacube to provide data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all these elements and the HPC resources to the researchers via an easy-to-use graphical interface.
Funding: The National Natural Science Foundation of China (No. 60175027).
Funding: Supported by the Ministry of Knowledge Economy, Korea, under the ITRC (Information Technology Research Center) support program (IITA-2009-(C1090-0902-0007)).
Funding: Project (60573169) supported by the National Natural Science Foundation of China.
Funding: The National High Technology Research and Development Programme of China (No. 2006AA04Z151 and 2006AA04Z166), the National Natural Science Foundation of China (No. 60674080 and No. 60504030), and the EU FP6 (No. 033610).
Funding: Supported by the National Natural Science Foundation of China (62102292), the Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology) of China (HBIRL202103, HBIRL202204), the Science Foundation Research Project of Wuhan Institute of Technology of China (K202035), and the Graduate Innovative Fund of Wuhan Institute of Technology of China (CX2021265).
文摘With the rapid growth of the Industrial Internet of Things(IIoT), the Mobile Edge Computing(MEC) has coming widely used in many emerging scenarios. In MEC, each workflow task can be executed locally or offloaded to edge to help improve Quality of Service(QoS) and reduce energy consumption. However, most of the existing offloading strategies focus on independent applications, which cannot be applied efficiently to workflow applications with a series of dependent tasks. To address the issue,this paper proposes an energy-efficient task offloading strategy for large-scale workflow applications in MEC. First, we formulate the task offloading problem into an optimization problem with the goal of minimizing the utility cost, which is the trade-off between energy consumption and the total execution time. Then, a novel heuristic algorithm named Green DVFS-GA is proposed, which includes a task offloading step based on the genetic algorithm and a further step to reduce the energy consumption using Dynamic Voltage and Frequency Scaling(DVFS) technique. Experimental results show that our proposed strategy can significantly reduce the energy consumption and achieve the best trade-off compared with other strategies.
文摘Workflow-based systems are typically said to lead to better use of staff and better management and productivity. The first phase in building a workflow-based system is capturing the real-world process in a conceptual representation suitable for the following phases of formalization and implementation. The specification may be in text or diagram form or written in a formal language. This paper proposes a flow-based diagrammatic methodology as a tool for workflow specification. The expressiveness of the method is appraised though its ability to capture a workflow-based application. Here we show that the proposed conceptual diagrams are able to express situations arising in practice as an alternative to tools currently used in workflow systems. This is demonstrated by using the proposed methodology to partial build demo systems for two government agencies.
Abstract: The current application of digital workflows for the understanding, promotion and participation in the conservation of heritage sites involves several technical challenges and should be governed by serious ethical engagement. Recording consists of capturing (or mapping) the physical characteristics of the character-defining elements that give cultural heritage sites their significance. Usually, the outcome of this work represents the cornerstone information serving their conservation, whether it is used actively to maintain them or to ensure a posterity record in case of destruction. The records produced can guide the decision-making process at different levels by property owners, site managers, public officials, and conservators around the world, as well as present the historical knowledge and values of these resources. Rigorous documentation may also serve a broader purpose: over time, it becomes the primary means by which scholars and the public apprehend a site that has since changed radically or disappeared. This contribution aims to provide an overview of the potential applications and threats of the technology utilised by heritage recording professionals, by addressing the need to develop ethical principles that can improve heritage recording practice at large.
Abstract: Research data infrastructures currently face a huge increase in the number of data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
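As a toy illustration of machine-supported findability (plain TF-IDF keyword matching, not the Digital Object Architecture machinery the article describes), a machine can already rank registered workflows against a free-text query; all records below are invented:

```python
import math
from collections import Counter

# Invented metadata descriptions for registered workflows.
workflows = {
    "wf-001": "genome alignment quality control fastq trimming",
    "wf-002": "climate model ensemble postprocessing netcdf regridding",
    "wf-003": "fastq variant calling alignment annotation",
}

def tfidf_vectors(docs):
    """Term frequency * inverse document frequency, per document."""
    tokenized = {k: v.split() for k, v in docs.items()}
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    n = len(docs)
    return {
        k: {t: c / len(toks) * math.log(n / df[t])
            for t, c in Counter(toks).items()}
        for k, toks in tokenized.items()
    }

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find(query):
    """Return the workflow whose metadata best matches the query."""
    vecs = tfidf_vectors({**workflows, "_q": query})
    q = vecs.pop("_q")
    return max(vecs, key=lambda k: cosine(vecs[k], q))

print(find("fastq alignment variant"))  # wf-003 shares the most informative terms
```

A production service would index structured metadata records and learned embeddings rather than raw description strings, but the ranking principle is the same.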
Abstract: In Geographic Information Systems (GIS), geoprocessing workflows allow analysts to organize their methods on spatial data in complex chains. We propose a method for expressing workflows as linked data, and for semi-automatically enriching them with semantics on the level of their operations and datasets. Linked workflows can be easily published on the Web and queried for types of inputs, results, or tools. Thus, GIS analysts can reuse their workflows in a modular way, selecting, adapting, and recommending resources based on compatible semantic types. Our typing approach starts from minimal annotations of workflow operations with classes of GIS tools, and then propagates data types and implicit semantic structures through the workflow using an OWL typing scheme and SPARQL rules by backtracking over GIS operations. The method is implemented in Python and is evaluated on two real-world geoprocessing workflows generated with Esri's ArcGIS. To illustrate the potential applications of our typing method, we formulate and execute competency questions over these workflows.
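The type-propagation idea can be sketched without OWL or SPARQL: given tool-class signatures, data types are pushed forward through the annotated operations. The tool names and signatures below are illustrative stand-ins, not the paper's actual annotation scheme or Esri's tool classes:

```python
# Hypothetical tool-class signatures, stand-ins for the OWL typing scheme:
# each GIS tool class maps a list of expected input types to an output type.
SIGNATURES = {
    "Buffer":     (["VectorLayer"], "VectorLayer"),
    "Rasterize":  (["VectorLayer"], "RasterLayer"),
    "MapAlgebra": (["RasterLayer", "RasterLayer"], "RasterLayer"),
}

def propagate(workflow, types):
    """Forward-propagate data types through tool-annotated operations."""
    types = dict(types)
    for tool, inputs, output in workflow:
        expected, result = SIGNATURES[tool]
        for ds, exp in zip(inputs, expected):
            if types[ds] != exp:
                raise TypeError(f"{tool}: {ds} is {types[ds]}, expected {exp}")
        types[output] = result
    return types

# A three-step workflow: buffer roads, rasterize, overlay with elevation.
wf = [
    ("Buffer",     ["roads"],                  "roads_buf"),
    ("Rasterize",  ["roads_buf"],              "roads_ras"),
    ("MapAlgebra", ["roads_ras", "elevation"], "cost"),
]
final_types = propagate(wf, {"roads": "VectorLayer", "elevation": "RasterLayer"})
print(final_types["cost"])  # RasterLayer
```

Starting from annotations on only the source datasets, every intermediate and final dataset receives a type, which is what makes the workflow queryable for "all workflows producing a RasterLayer" and similar competency questions.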
Funding: This work was supported by the European Commission [grant number H2020 ERA-PLANET project No. 689443].
Abstract: When defining indicators on the environment, the use of existing initiatives should be a priority rather than redefining indicators each time. From an Information, Communication and Technology perspective, data interoperability and standardization are critical to improve data access and exchange, as promoted by the Group on Earth Observations. GEOEssential follows an end-user-driven approach by defining Essential Variables (EVs) as an intermediate value between environmental policy indicators and their appropriate data sources. Environmental policies and indicators are increasingly percolating down from global to local agendas. The scientific business processes for the generation of EVs and related indicators can be formalized in workflows specifying the necessary logical steps. To this aim, GEOEssential is developing a Virtual Laboratory whose main objective is to instantiate conceptual workflows, stored in a dedicated knowledge base, into executable workflows. To interpret and present the relevant outputs of the different thematic workflows considered in GEOEssential (i.e. biodiversity, ecosystems, extractives, night light, and the food-water-energy nexus), a Dashboard is built as a visual front-end. This is a valuable instrument to track progress towards environmental policies.
Funding: The authors would like to acknowledge the European Commission Horizon 2020 Program, which funded both the ERA-PLANET/GEOEssential (Grant Agreement No. 689443) and ConnectinGEO (Grant Agreement No. 641538) projects.
Abstract: There is a growing recognition of the interdependencies among the supply systems that provide food, water and energy. Billions of people lack safe and sufficient access to these systems, while global demand grows rapidly and resource constraints increase. Modeling frameworks are considered one of the few means available to understand the complex interrelationships among the sectors; however, development of nexus-related frameworks has been limited. We describe three open-source models well known in their respective domains (i.e. TerrSysMP, WOFOST and SWAT), components of which, if combined, could help decision-makers address the nexus issue. We propose as a first step the development of simple workflows utilizing essential variables and addressing components of the above-mentioned models, which can act as building blocks to be used ultimately in a comprehensive nexus model framework. The outputs of the workflows and the model framework are designed to address the SDGs.
Funding: This work was funded by the "Niedersächsisches Vorab" funding line of the Volkswagen Foundation.
Abstract: Since their introduction by James Dixon in 2010, data lakes have attracted more and more attention, driven by the promise of high reusability of the stored data due to schema-on-read semantics. Building on this idea, several additional requirements have been discussed in the literature to improve the general usability of the concept, such as a central metadata catalog including all provenance information, overarching data governance, or integration with (high-performance) processing capabilities. Although the necessity of a logical and a physical organisation of data lakes in order to meet those requirements is widely recognized, no concrete guidelines have yet been provided. The most common architecture implementing this conceptual organisation is the zone architecture, where data is assigned to a certain zone depending on its degree of processing. This paper discusses how FAIR Digital Objects can be used in a novel approach to organize a data lake based on data types instead of zones, how they can be used to abstract the physical implementation, and how they empower generic and portable processing capabilities based on a provenance-based approach.
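A minimal sketch of type-based organisation, under invented names: operations are registered per data type and dispatched generically, so no zone assignment is needed and provenance accrues on the object itself. The `FDO` class, the handle, and the profiling operation are illustrative assumptions, not the FAIR Digital Object specification:

```python
from dataclasses import dataclass, field

@dataclass
class FDO:
    """Minimal stand-in for a FAIR Digital Object: a persistent identifier,
    a registered data type, a payload, and provenance of applied operations."""
    pid: str
    data_type: str
    payload: object
    provenance: list = field(default_factory=list)

OPERATIONS = {}  # data type -> processing capabilities (instead of zones)

def register(data_type):
    def deco(fn):
        OPERATIONS.setdefault(data_type, []).append(fn)
        return fn
    return deco

@register("text/csv")
def profile_csv(fdo):
    rows = fdo.payload.strip().splitlines()
    return {"rows": len(rows) - 1, "columns": len(rows[0].split(","))}

def process(fdo):
    """Generic dispatch: the object's data type, not a zone, decides what runs."""
    ops = OPERATIONS.get(fdo.data_type, [])
    fdo.provenance.extend(op.__name__ for op in ops)
    return [op(fdo) for op in ops]

obj = FDO("hdl:21.T11148/demo1", "text/csv", "a,b\n1,2\n3,4")
print(process(obj), obj.provenance)
```

Because dispatch keys on the registered data type, the same processing code is portable across physical storage layouts, which is the abstraction the paper argues for.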
Abstract: We discuss the problem of accountability when multiple parties cooperate towards an end result, such as multiple companies in a supply chain or departments of a government service under different authorities. In cases where a fully trusted central point does not exist, it is difficult to obtain a trusted audit trail of a workflow when each individual participant is unaccountable to all others. We propose AudiWFlow, an auditing architecture that makes participants accountable for their contributions in a distributed workflow. Our scheme provides confidentiality in most cases, collusion detection, and availability of evidence after the workflow terminates. AudiWFlow is based on verifiable secret sharing and real-time peer-to-peer verification of records; it further supports multiple levels of assurance to meet a desired trade-off between the availability of evidence and the overhead resulting from the auditing approach. We propose and evaluate two implementation approaches for AudiWFlow. The first one is fully distributed except for a central auxiliary point that, nevertheless, needs only a low level of trust. The second one is based on smart contracts running on a public blockchain, which removes the need for any central point but requires integration with a blockchain.
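The underlying primitive can be sketched with plain Shamir secret sharing over a prime field. AudiWFlow uses *verifiable* secret sharing, which additionally publishes commitments to the polynomial coefficients so shareholders can check their shares; that verification layer is omitted here, and the record hash is an invented stand-in:

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is over GF(P)

def share(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return secret

record_hash = 0x1234567890ABCD  # stand-in for the hash of a workflow record
shares = share(record_hash, k=3, n=5)
print(reconstruct(shares[:3]) == record_hash)  # True: any 3 of 5 shares suffice
```

Distributing shares among workflow participants is what keeps evidence available after termination even if some parties withhold theirs, while fewer than k shares reveal nothing about the record.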
Funding: Supported by the National Basic Research 973 Program of China under Grant Nos. 2011CB302805 and 2011CB302505, the National High Technology Research and Development 863 Program of China under Grant No. 2011AA040501, and the National Natural Science Foundation of China under Grant No. 60803017. Fan Zhang is supported by an IBM 2011-2012 Ph.D. Fellowship.
Abstract: With the quick development of grid techniques and the growing complexity of grid applications, reasoning about the temporal properties of grid workflows is becoming critical for probing potential pitfalls and errors, in order to ensure reliability and trustworthiness at the initial design phase. A state Pi calculus is proposed and implemented in this work, which not only enables flexible abstraction and management of historical grid system events, but also facilitates modeling and temporal verification of grid workflows. Furthermore, a relaxed region analysis (RRA) approach is proposed to decompose large-scale grid workflows into sequentially composed regions with relaxation of parallel workflow branches, and the corresponding verification strategies are also decomposed following modular verification principles. Performance evaluation results show that the RRA approach can dramatically reduce the CPU time and memory usage of formal verification.
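The region decomposition can be sketched for a single-entry workflow DAG: a node closes a sequentially composed region exactly when every entry-to-exit path passes through it, so parallel branches stay inside one region. This is an illustrative reconstruction of the decomposition idea, not the paper's RRA algorithm:

```python
from collections import deque

def cut_points(n, edges):
    """Nodes through which every path of a single-entry DAG must pass.

    Processing nodes in topological order, `open_` counts edges that cross
    from processed to unprocessed nodes; when it drops to zero at node v,
    all paths have converged at v, so v closes a sequential region.
    """
    indeg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm for a topological order.
    order, q, deg = [], deque(i for i in range(n) if indeg[i] == 0), indeg[:]
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            deg[v] -= 1
            if deg[v] == 0:
                q.append(v)
    cuts, open_ = [], 0
    for v in order:
        open_ -= indeg[v]
        if open_ == 0:
            cuts.append(v)      # region boundary: parallel branches rejoined
        open_ += len(adj[v])
    return cuts

# 0 -> {1, 2} in parallel -> 3 -> 4: nodes 0, 3 and 4 bound the regions.
print(cut_points(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]))  # [0, 3, 4]
```

Each region between consecutive boundary nodes can then be verified in isolation, which is the modular-verification intuition behind decomposing the workflow before model checking.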
Funding: Carole Goble acknowledges funding by BioExcel2 (H2020 823830), IBISBA 1.0 (H2020 730976), and EOSCLife (H2020 824087). Daniel Schober's work was financed by Phenomenal (H2020 654241) at the initiation phase of this effort; current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI) and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683.
Abstract: Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
Abstract: The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative suggests a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments thereof. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding additional load onto the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over the course of the last year or so, highlights several projects that hold promise for the CWFR approach, including Galaxy, Jupyter Notebook, and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.
Funding: Supported by the National Natural Science Foundation of China (No. 60803017), the National High-Tech Research and Development (863) Program of China (Nos. 2006AA10Z237, 2007AA01Z179, and 2008AA01Z118), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the FIT Foundation of Tsinghua University.
Abstract: Data streaming applications, usually composed of sequential/parallel data processing tasks organized as a workflow, bring new challenges to workflow scheduling and resource allocation in grid environments. Due to the high volumes of data and relatively limited storage capability, resource allocation and data streaming have to be storage-aware. Also, to improve system performance, data streaming and processing have to be concurrent. This study used a genetic algorithm (GA) for workflow scheduling, with on-line measurements and predictions made with a gray model (GM). On-demand data streaming is used to avoid data overflow through repertory strategies. Tests show that tasks with on-demand data streaming must be balanced to improve overall performance, to avoid system bottlenecks and backlogs of intermediate data, and to increase data throughput for the data processing workflows as a whole.
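The gray-model prediction component can be sketched with a standard GM(1,1) fit; the coupling to the GA scheduler and the on-line measurements are omitted, and the sample series below is invented:

```python
import math

def gm11_predict(x0, steps=1):
    """Fit a GM(1,1) gray model to series x0 and forecast `steps` values ahead.

    Degenerate for constant series (development coefficient a -> 0)."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]                 # accumulated series
    z1 = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]    # mean sequence
    # Least-squares fit of x0[k] = -a*z1[k] + b via the 2x2 normal equations.
    m = n - 1
    sz, szz = sum(z1), sum(z * z for z in z1)
    sx = sum(x0[1:])
    sxz = sum(x * z for x, z in zip(x0[1:], z1))
    det = m * szz - sz * sz
    a = (sz * sx - m * sxz) / det
    b = (szz * sx - sz * sxz) / det
    f = lambda k: (x0[0] - b / a) * math.exp(-a * k) + b / a  # fitted x1 curve
    return [f(n - 1 + s) - f(n - 2 + s) for s in range(1, steps + 1)]

# On a geometric series with ratio 1.2 the one-step forecast comes out close
# to the true next value 2.48832.
print(gm11_predict([1, 1.2, 1.44, 1.728, 2.0736]))
```

In the scheduling context, such forecasts of measured quantities (e.g. data arrival rates) let the GA evaluate candidate schedules against predicted rather than stale load figures.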
Funding: The German Bundesministerium für Bildung und Forschung (BMBF) is acknowledged for funding the DeepRain project under grant agreement 01 IS18047A-E.
Abstract: Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power are paving the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities are slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss the FAIR Digital Object (FDO) and Research Object (RO) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasts in Germany by using ML. Our concept envisages a raster datacube to provide data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all these elements and the HPC resources to the researchers via an easy-to-use graphical interface.