Warning: This document is for an old version of Pydra: A simple dataflow engine with scalable semantics. The main version is master.

pydra.engine.core module

Basic processing graph elements.

class pydra.engine.core.TaskBase(name: str, audit_flags: AuditFlag = AuditFlag.NONE, cache_dir=None, cache_locations=None, inputs: str | File | Dict | None = None, cont_dim=None, messenger_args=None, messengers=None, rerun=False)

Bases: object

A base structure for the nodes in the processing graph.

Tasks are a generic compute step from which both elementary tasks and Workflow instances inherit.

DEFAULT_COPY_COLLATION = 0
SUPPORTED_COPY_MODES = 15
audit_flags: AuditFlag = 0

AuditFlag.

Type:

What to audit – available flags

property cache_dir

Get the location of the cache directory.

property cache_locations

Get the list of cache sources.

property can_resume

Whether the task accepts checkpoint-restart.

property checksum

Calculates the unique checksum of the task. Used to create specific directory name for task that are run; and to create nodes checksums needed for graph checksums (before the tasks have inputs etc.)

checksum_states(state_index=None)

Calculate a checksum for the specific state or all of the states of the task. Replaces lists in the inputs fields with a specific values for states. Used to recreate names of the task directories,

Parameters:

state_index – TODO

combine(combiner: List[str] | str, overwrite: bool = False)

Combine inputs parameterized by one or more previous tasks.

Parameters:
  • combiner (list[str] or str) – the

  • overwrite (bool) – whether to overwrite an existing combiner on the node

  • **kwargs (dict[str, Any]) – values for the task that will be “combined” before they are provided to the node

Returns:

self – a reference to the task

Return type:

TaskBase

property cont_dim
property done

Check whether the tasks has been finalized and all outputs are stored.

property errored

Check if the task has raised an error

property generated_output_names

Get the names of the outputs generated by the task. If the spec doesn’t have generated_output_names method, it uses output_names. The results depends on the input provided to the task

get_input_el(ind)

Collect all inputs required to run the node (for specific state element).

help(returnhelp=False)

Print class help.

property lzout
property output_dir

Get the filesystem path where outputs will be written.

property output_names

Get the names of the outputs from the task’s output_spec (not everything has to be generated, see generated_output_names).

pickle_task()

Pickling the tasks with full inputs

result(state_index=None, return_inputs=False)

Retrieve the outcomes of this particular task.

Parameters:
  • state_index (:obj: int) – index of the element for task with splitter and multiple states

  • return_inputs (:obj: bool, str) – if True or “val” result is returned together with values of the input fields, if “ind” result is returned together with indices of the input fields

Returns:

result – the result of the task

Return type:

Result

set_state(splitter, combiner=None)

Set a particular state on this task.

Parameters:
  • splitter – TODO

  • combiner – TODO

split(splitter: str | List[str] | Tuple[str, ...] | None = None, overwrite: bool = False, cont_dim: dict | None = None, **inputs)

Run this task parametrically over lists of split inputs.

Parameters:
  • splitter (str or list[str] or tuple[str] or None) – the fields which to split over. If splitting over multiple fields, lists of fields are interpreted as outer-products and tuples inner-products. If None, then the fields to split are taken from the keyword-arg names.

  • overwrite (bool, optional) – whether to overwrite an existing split on the node, by default False

  • cont_dim (dict, optional) – Container dimensions for specific inputs, used in the splitter. If input name is not in cont_dim, it is assumed that the input values has a container dimension of 1, so only the most outer dim will be used for splitting.

  • **split_inputs – fields to split over, will automatically be wrapped in a StateArray object and passed to the node inputs

Returns:

self – a reference to the task

Return type:

TaskBase

property uid

the unique id number for the task It will be used to create unique names for slurm scripts etc. without a need to run checksum

property version

Get version of this task structure.

class pydra.engine.core.Workflow(name, audit_flags: AuditFlag = AuditFlag.NONE, cache_dir=None, cache_locations=None, input_spec: List[str] | Dict[str, Type[Any]] | SpecInfo | None = None, cont_dim=None, messenger_args=None, messengers=None, output_spec: List[str] | Dict[str, type] | SpecInfo | BaseSpec | None = None, rerun=False, propagate_rerun=True, **kwargs)

Bases: TaskBase

A composite task with structure of computational graph.

add(task)

Add a task to the workflow.

Parameters:

task (TaskBase) – The task to be added.

property checksum

Calculates the unique checksum of the task. Used to create specific directory name for task that are run; and to create nodes checksums needed for graph checksums (before the tasks have inputs etc.)

create_connections(task, detailed=False)

Add and connect a particular task to existing nodes in the workflow.

Parameters:
  • task (TaskBase) – The task to be added.

  • detailed (bool) – If True, add_edges_description is run for self.graph to add a detailed descriptions of the connections (input/output fields names)

create_dotfile(type='simple', export=None, name=None, output_dir=None)

creating a graph - dotfile and optionally exporting to other formats

property graph_sorted

Get a sorted graph representation of the workflow.

property lzin
property nodes

Get the list of node names.

set_output(connections: Tuple[str, LazyField] | List[Tuple[str, LazyField]])

Set outputs of the workflow by linking them with lazy outputs of tasks

Parameters:

connections (tuple[str, LazyField] or list[tuple[str, LazyField]] or None) – single or list of tuples linking the name of the output to a lazy output of a task in the workflow.

pydra.engine.core.is_lazy(obj)

Check whether an object has any field that is a Lazy Field

pydra.engine.core.is_task(obj)

Check whether an object looks like a task.

pydra.engine.core.is_workflow(obj)

Check whether an object is a Workflow instance.