Warning: This document is for the development version of Pydra: A simple dataflow engine with scalable semantics. The main version is master.

pydra.engine package

The core of the workflow engine.

class pydra.engine.AuditFlag(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Flag

Auditing flags.

ALL = 3

Track provenance and resource utilization.

NONE = 0

Do not track provenance or monitor resources.

PROV = 1

Track provenance only.

RESOURCE = 2

Monitor resource utilization only.
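Because AuditFlag derives from Flag, its members combine with bitwise operators. The sketch below illustrates this with a stand-in class, AuditFlagSketch, that only mirrors the values listed above. It is a hypothetical copy for illustration, not the pydra class itself.

```python
from enum import Flag


class AuditFlagSketch(Flag):
    """Hypothetical stand-in that mirrors the documented AuditFlag values."""

    NONE = 0
    PROV = 1
    RESOURCE = 2
    ALL = 3


# Combining the two single-purpose flags yields the value of ALL (1 | 2 == 3).
requested = AuditFlagSketch.PROV | AuditFlagSketch.RESOURCE

# Membership tests use bitwise AND; an empty result is falsy.
tracks_provenance = bool(requested & AuditFlagSketch.PROV)
none_tracks_provenance = bool(AuditFlagSketch.NONE & AuditFlagSketch.PROV)
```

With these values, a check such as `flags & AuditFlagSketch.RESOURCE` is truthy for both RESOURCE and ALL, and falsy for NONE and PROV.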

class pydra.engine.DockerTask(container_info=None, *args, **kwargs)

Bases: ContainerTask

Extend shell command task for containerized execution with the Docker Engine.

property container_args

Get container-specific CLI arguments. Returns a list if the task has a state.

init = False

class pydra.engine.ShellCommandTask(container_info=None, *args, **kwargs)

Bases: TaskBase

Wrap a shell command as a task element.

property cmdline

Get the actual command line that will be submitted. Returns a list if the task has a state.

property command_args

Get command line arguments.

input_spec = None

output_spec = None

class pydra.engine.Submitter(plugin='cf', **kwargs)

Bases: object

Send a task to the execution backend.

close()

Close submitter.

Does not close an event loop that was already running.

async expand_runnable(runnable, wait=False, rerun=False)

This coroutine handles state expansion.

Removes any states from runnable. If wait is False (the default), aggregates all worker execution coroutines and returns them. If wait is True, waits for all coroutines to complete or raise an error, then returns None.

Parameters:
  • runnable (pydra Task) – Task instance (Task, Workflow)

  • wait (bool (False)) – Await all futures before completing

Returns:

futures – Coroutines for TaskBase execution.

Return type:

set or None
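The wait semantics described above can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not pydra's implementation: run_one and expand are hypothetical names, and each state is modelled as a trivial coroutine.

```python
import asyncio


async def run_one(index):
    """Hypothetical stand-in for one worker execution coroutine."""
    await asyncio.sleep(0)
    return index * 2


async def expand(n_states, wait=False):
    """Schedule one future per state; await them all only if wait is True."""
    futures = {asyncio.ensure_future(run_one(i)) for i in range(n_states)}
    if wait:
        # Block until every future has finished, then return nothing.
        await asyncio.gather(*futures)
        return None
    # Hand the still-pending futures back to the caller.
    return futures


async def main():
    pending = await expand(3, wait=False)
    results = sorted(await asyncio.gather(*pending))
    done = await expand(3, wait=True)
    return results, done


results, done = asyncio.run(main())
```

With wait=False the caller decides when to await the returned set; with wait=True the call completes only after all work is finished and returns None.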

async expand_workflow(wf, rerun=False)

Expand and execute a stateless Workflow. This method is only reached by Workflow._run_task.

Parameters:

wf (Workflow) – Workflow Task object

Returns:

wf – The computed workflow

Return type:

pydra.engine.core.Workflow

async submit_from_call(runnable, rerun)

This coroutine should be called only once per Submitter call. It serves as the bridge between synchronous and asynchronous code.

There are 4 potential paths based on the type of runnable:

  0) Workflow has a different plugin than a submitter
  1) Workflow without State
  2) Task without State
  3) (Workflow or Task) with State

Once Python 3.10 is the minimum, this should probably be refactored into using structural pattern matching.

class pydra.engine.Workflow(name, audit_flags: AuditFlag = AuditFlag.NONE, cache_dir=None, cache_locations=None, input_spec: List[str] | SpecInfo | None = None, cont_dim=None, messenger_args=None, messengers=None, output_spec: SpecInfo | BaseSpec | None = None, rerun=False, propagate_rerun=True, **kwargs)

Bases: TaskBase

A composite task structured as a computational graph.

add(task)

Add a task to the workflow.

Parameters:

task (TaskBase) – The task to be added.

property checksum

Calculate the unique checksum of the task. It is used to name the directory for each task that is run, and to create the node checksums needed for graph checksums (before the tasks have inputs, etc.).

create_connections(task, detailed=False)

Add and connect a particular task to existing nodes in the workflow.

Parameters:
  • task (TaskBase) – The task to be added.

  • detailed (bool) – If True, add_edges_description is run for self.graph to add a detailed description of the connections (input/output field names).

create_dotfile(type='simple', export=None, name=None, output_dir=None)

Create a dotfile for the graph and optionally export it to other formats.
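For readers unfamiliar with dotfiles, the sketch below shows how a small graph can be written in the Graphviz DOT language using only the standard library. The helper to_dot and the node names are hypothetical. This does not reproduce the output of create_dotfile, whose exact layout is not described here.

```python
def to_dot(name, nodes, edges):
    """Render a directed graph as DOT text (hypothetical helper)."""
    lines = [f"digraph {name} {{"]
    for node in nodes:
        # Declare each node explicitly so isolated nodes still appear.
        lines.append(f'    "{node}";')
    for src, dst in edges:
        # One directed edge per connection between tasks.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot("wf", ["a", "b"], [("a", "b")])
```

Text in this form can be saved to a .dot file and rendered to an image format with the Graphviz command-line tools.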

property graph_sorted

Get a sorted graph representation of the workflow.

property nodes

Get the list of node names.

set_output(connections)

Write outputs.

Parameters:

connections – TODO

Submodules