Typing and file-formats

Pydra implements strong(-ish) type-checking at workflow construction time so some errors can be caught before workflows are run on potentially expensive computing resources. Input and output fields of tasks can be typed using Python annotations. Unlike how they are typically used, in Pydra these type annotations are not just for documentation and linting purposes, but are used to enforce the types of the inputs and outputs of tasks and workflows at workflow construction and runtime.

Note

With the exception of fields containing file-system paths, which should be typed as a FileFormats class, types don't need to be specified if not desired.

File formats

The FileFormats package provides a way to specify the format of a file, or set of files, through an extensible collection of file format classes. These classes can be used to specify the format of a file in a task input or output, and to validate the format of a file at runtime.

It is important to use a FileFormats type instead of a str or pathlib.Path when defining a field that takes paths to file-system objects, because otherwise only the file path, not the file contents, will be used in the hash used to locate the cache (see Caches and hashes). However, in most cases, it is sufficient to use the generic fileformats.generic.File, fileformats.generic.Directory, or the even more generic fileformats.generic.FsObject or fileformats.generic.FileSet classes.
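The difference between hashing a path and hashing file contents can be seen in a minimal sketch that uses only the standard library. The helpers hash_path and hash_contents are hypothetical names introduced for illustration and are not part of Pydra or FileFormats, whose own hashing is more involved.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_path(path: Path) -> str:
    """Hash only the path string, as would happen for a plain str or Path field."""
    return hashlib.sha256(str(path).encode()).hexdigest()


def hash_contents(path: Path) -> str:
    """Hash the bytes stored in the file (illustrative, not Pydra's own hasher)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.txt"

    data_file.write_text("first version")
    path_hash_before = hash_path(data_file)
    content_hash_before = hash_contents(data_file)

    data_file.write_text("second version")  # same path, new contents
    path_hash_after = hash_path(data_file)
    content_hash_after = hash_contents(data_file)

# The path-based hash cannot see the edit; the content-based hash can.
print(path_hash_before == path_hash_after)
print(content_hash_before == content_hash_after)
```

With a path-only hash, the edited file would map to the same cache entry as the original, which is the situation a FileFormats type is meant to avoid.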

The only cases where generic classes aren't sufficient are those where implicit header or sidecar files are assumed to be present adjacent to the primary file (e.g. a NIfTI file my_nifti.nii with an associated JSON sidecar file my_nifti.json). By default, the header/sidecar file(s) will not be included in the hash calculation and may be omitted if the "file set" is copied into a different working directory. In such cases, a specific file format class, such as fileformats.medimage.NiftiGzX, should be used instead.
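To make the sidecar problem concrete, the sketch below hashes a primary file either on its own or together with any adjacent file that shares its stem. hash_file_set and its sidecar_exts parameter are hypothetical, and the file contents are placeholders; a specific FileFormats class takes care of locating the sidecar in practice.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file_set(primary: Path, sidecar_exts=(".json",)) -> str:
    """Hash a primary file plus adjacent files sharing its stem (hypothetical helper)."""
    members = [primary]
    for ext in sidecar_exts:
        candidate = primary.with_suffix(ext)  # my_nifti.nii -> my_nifti.json
        if candidate.exists():
            members.append(candidate)
    digest = hashlib.sha256()
    for member in members:
        digest.update(member.name.encode())
        digest.update(member.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    nifti = Path(tmp) / "my_nifti.nii"
    sidecar = Path(tmp) / "my_nifti.json"
    nifti.write_bytes(b"placeholder image bytes")
    sidecar.write_text('{"EchoTime": 0.03}')

    primary_only_before = hash_file_set(nifti, sidecar_exts=())
    full_set_before = hash_file_set(nifti)

    sidecar.write_text('{"EchoTime": 0.05}')  # only the sidecar changes

    primary_only_after = hash_file_set(nifti, sidecar_exts=())
    full_set_after = hash_file_set(nifti)

# Hashing the primary file alone misses the sidecar edit.
print(primary_only_before == primary_only_after)
print(full_set_before == full_set_after)
```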

Coercion

Pydra will attempt to coerce an input to the correct type if it is not of that type already. For example, if a tuple is provided to a field that is typed as a list, Pydra will convert the tuple to a list before the task is run. By default, coercions are automatically applied between the following types:

  • ty.Sequence → ty.Sequence

  • ty.Mapping → ty.Mapping

  • Path → os.PathLike

  • str → os.PathLike

  • os.PathLike → Path

  • os.PathLike → str

  • ty.Any → MultiInputObj

  • int → float

  • field.Integer → float

  • int → field.Decimal

In addition, fileformats.fields.Singular types (see FileFormats) can be coerced to and from their primitive types, and NumPy ndarrays and NumPy primitive types can be coerced to and from Python sequences and built-in types, respectively.
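The rule-table idea behind these coercions can be sketched with the standard library alone. The coerce function, the COERCIBLE and NOT_COERCIBLE tables, and the choice to keep str out of the sequence rule are all assumptions of this sketch; it covers only a few of the pairs listed above and is not Pydra's type parser.

```python
import os
import typing as ty
from collections.abc import Sequence
from pathlib import Path

# (source type, target type) pairs that may be converted: a subset of the list above.
COERCIBLE = [
    (Sequence, Sequence),
    (str, os.PathLike),
    (os.PathLike, str),
    (int, float),
]

# Assumption of this sketch: a str is never treated as a generic sequence.
NOT_COERCIBLE = [(str, Sequence), (Sequence, str)]


def coerce(value: ty.Any, target: type) -> ty.Any:
    """Return value converted to target, or raise TypeError (hypothetical helper)."""
    if isinstance(value, target):
        return value  # already the right type, nothing to do

    def matches(pairs):
        return any(
            isinstance(value, src) and issubclass(target, dst) for src, dst in pairs
        )

    if matches(NOT_COERCIBLE) or not matches(COERCIBLE):
        raise TypeError(f"cannot coerce {type(value).__name__} to {target.__name__}")
    return target(value)  # the target's constructor performs the conversion


as_list = coerce((1, 2, 3), list)       # tuple to list
as_float = coerce(3, float)             # int to float
as_path = coerce("data.txt", Path)      # str to a path-like object
as_str = coerce(Path("data.txt"), str)  # path-like object to str
```

Pairs that match no rule, such as a float passed to an int field, raise TypeError in this sketch.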

Superclass auto-casting

Pydra is designed so that strict and specific typing can be used without it becoming a burden where it isn't wanted. Upstream fields that are typed as a superclass of the task input they are connected to (or left as typing.Any, the default) are therefore automatically cast to the subclass when the task is run. This allows workflows and tasks to be connected together regardless of how specifically the types are defined in the task definitions. This includes file format types, so a task that expects a fileformats.medimage.NiftiGz file can be connected to a task that outputs a fileformats.generic.File. A typing error is raised only when the upstream field can't be cast or coerced to the downstream field, e.g. a fileformats.medimage.DicomSeries cannot be cast to a fileformats.medimage.Nifti file.
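The connection rule described above can be sketched as a pair of issubclass checks. The classes below are toy stand-ins with a simplified inheritance tree that is assumed for illustration, not the real fileformats hierarchy, and check_connection is a hypothetical helper rather than a Pydra function.

```python
import typing as ty


# Toy stand-ins; the inheritance tree here is an assumption made for illustration.
class FileSet: ...
class File(FileSet): ...
class Nifti(File): ...
class NiftiGz(Nifti): ...
class DicomSeries(FileSet): ...


def check_connection(upstream: ty.Any, downstream: type) -> str:
    """Classify a connection between an upstream output and a downstream input."""
    if upstream is ty.Any:
        return "auto-cast"  # untyped upstream fields are cast when the task runs
    if issubclass(upstream, downstream):
        return "match"  # upstream is at least as specific as the input requires
    if issubclass(downstream, upstream):
        return "auto-cast"  # upstream is a superclass, so it is cast to the subclass
    raise TypeError(
        f"cannot connect {upstream.__name__} to {downstream.__name__}"
    )


generic_to_specific = check_connection(File, NiftiGz)
specific_to_generic = check_connection(NiftiGz, File)
untyped_to_specific = check_connection(ty.Any, Nifti)

try:
    check_connection(DicomSeries, Nifti)  # unrelated branches of the tree
    unrelated_rejected = False
except TypeError:
    unrelated_rejected = True
```

Only the last connection is rejected, since neither class is an ancestor of the other.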