Glossary
========

.. glossary::

   Cache-root
      The directory under which cache directories for tasks to be executed
      are created. Task cache directories are named within the cache-root
      directory using a hash of the task's parameters, so that the results
      of the same task run with the same parameters can be reused.

   Combiner
      A combiner is used to combine :ref:`State-array` values created by a
      split operation defined by a :ref:`Splitter`, whether on the current
      node, an upstream workflow node or a stand-alone task.

   Container-ndim
      The number of dimensions of the container object to be iterated over
      when using a :ref:`Splitter` to split over an iterable value. For
      example, a list-of-lists or a 2D array with `container_ndim=2` would
      be split over the elements of the inner lists into a single 1-D state
      array. However, with `container_ndim=1`, the outer list/2D array
      would be split into a 1-D state array of lists/1-D arrays.

   Environment
      An environment refers to a specific software encapsulation, such as a
      Docker or Singularity image, that is used to run a task.

   Field
      A field is a parameter of a task, or of a task's outputs object, that
      can be set to a specific value. Fields can be of any type, including
      arbitrary objects and file-system objects.

   Hook
      A hook is a user-defined function that is executed at a specific
      point in the task execution process. Hooks can be used to
      prepare/finalise the task cache directory or to send notifications.

   Job
      A job is a discrete unit of work: a :ref:`Task`, with all inputs
      resolved (i.e. no lazy-values or state-arrays), that has been
      assigned to a worker. A task describes "what" is to be done and a
      submitter object describes "how" it is to be done; a job combines
      both objects to describe a concrete unit of processing.

   Lazy-fields
      A lazy-field is a field that is not immediately resolved to a value.
      Instead, it is a placeholder that will be resolved at runtime,
      allowing for dynamic parameterisation of tasks.

   Node
      A single task within the context of a workflow, which is assigned a
      name and references a state. Note that this task can itself be a
      nested workflow task.

   Read-only-caches
      A read-only cache is a cache-root directory created by previous Pydra
      runs, which is checked for matching task caches to reuse, but is
      neither written to nor modified during the execution of a task.

   State
      The combination of all upstream splits and combines with any
      splitters and combiners for a given node. It is used to track how
      many jobs, and which parameterisations of them, need to be run for a
      given workflow node.

   State-array
      A state array is a collection of parameterised tasks or values
      generated by a split operation, either at the current node or an
      upstream node of a workflow. The size of the array is determined by
      the :ref:`State` of the workflow node.

   Splitter
      Defines how a task's inputs are to be split into multiple jobs. For
      example, if a task's input takes an integer, a list of integers can
      be passed to it and split over to create a :ref:`State-array` of
      jobs. Different combinations of inputs can be split over together
      ("scalar" splitting) or combinatorially ("outer" splitting).

   Submitter
      A submitter object parameterises how a task is to be executed, by
      defining the worker, environment, cache-root directory and other key
      execution parameters to be used when executing a task.

   Task
      A task describes a unit of work to be done (but not how it will be
      done), either standalone or as one step in a larger workflow. Tasks
      can be of various types, including Python functions, shell commands
      and nested workflows. Tasks are parameterised, meaning they can
      accept inputs and produce outputs, as the sketch below illustrates.
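      As an illustration of how tasks, splitters and combiners fit
      together, here is a minimal sketch. It assumes the
      `pydra.compose.python.define` decorator style of task definition;
      exact names and signatures vary between Pydra versions.

      .. code-block:: python

         from pydra.compose import python

         # define a task from a plain Python function; the task describes
         # *what* is to be done, not how or where it is run
         @python.define
         def Add(a: int, b: int) -> int:
             return a + b

         # splitting over "a" creates a state-array of three jobs, and
         # combining over "a" collapses the results back into a list
         task = Add(b=10).split(a=[1, 2, 3]).combine("a")
         outputs = task()  # runs one job per state-array element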
   Worker
      Encapsulation of a task execution environment. It is responsible for
      executing tasks and managing their lifecycle. Workers can be local
      (e.g. a thread or process) or remote (e.g. on a high-performance
      cluster).

   Workflow
      A Directed Acyclic Graph (DAG) of parameterised tasks, to be executed
      in order. Note that a Workflow object is created by a
      :class:`WorkflowTask`'s `construct()` method at runtime and is not
      directly created by the end user.
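To tie the execution-side terms together, the following sketch pairs a task
with a submitter that selects the worker and cache-root. It assumes a
`Submitter` class importable from `pydra.engine.submitter`; the import path,
worker name and parameter names are illustrative and may differ between
Pydra versions.

.. code-block:: python

   from pydra.compose import python
   from pydra.engine.submitter import Submitter

   @python.define
   def Double(x: int) -> int:
       return 2 * x

   # the submitter parameterises *how* the task is run: which worker
   # executes it and where the cache-root lives; pairing it with the
   # task yields the concrete jobs that are actually executed
   task = Double(x=21)
   with Submitter(worker="cf", cache_root="/tmp/pydra-cache") as sub:
       result = sub(task)  # a job = this task + the submitter's settings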