{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Shell-tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Command-line templates\n", "\n", "Shell task specs can be defined using string templates that resemble the command-line usage examples typically used in in-line help. They can therefore be a quick and intuitive way to specify a shell task. For example, a simple spec for the copy command `cp` that omits optional flags:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydra.compose import shell\n", "\n", "Cp = shell.define(\"cp <in_file> <out|destination>\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Input and output fields are both specified by placing the name of the field within enclosing `<` and `>`. Outputs are differentiated by the `out|` prefix.\n", "\n", "This shell task can then be run just as a Python task would be run, by first parameterising it and then executing it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from tempfile import mkdtemp\n", "\n", "# Make a test file to copy\n", "test_dir = Path(mkdtemp())\n", "test_file = test_dir / \"in.txt\"\n", "with open(test_file, \"w\") as f:\n", "    f.write(\"Contents to be copied\")\n", "\n", "# Parameterise the task\n", "cp = Cp(in_file=test_file, destination=test_dir / \"out.txt\")\n", "\n", "# Print the cmdline to be run to double check\n", "print(f\"Command-line to be run: {cp.cmdline}\")\n", "\n", "# Run the shell-command task\n", "outputs = cp()\n", "\n", "print(\n", "    f\"Contents of copied file ('{outputs.destination}'): \"\n", "    f\"'{Path(outputs.destination).read_text()}'\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If paths to output files are not provided in the parameterisation, they will default to the name of the field." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"cp = Cp(in_file=test_file)\n", "print(cp.cmdline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining input/output types\n", "\n", "By default, shell-command fields are considered to be of `fileformats.generic.FsObject` type. However, more specific file formats or built-in Python types can be specified by appending the type to the field name after a `:`.\n", "\n", "File formats are specified by their MIME type or \"MIME-like\" strings (see the [FileFormats docs](https://arcanaframework.github.io/fileformats/mime.html) for details)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fileformats.image import Png\n", "\n", "TrimPng = shell.define(\"trim-png \")\n", "\n", "trim_png = TrimPng(in_image=Png.mock(), out_image=\"/path/to/output.png\")\n", "\n", "print(trim_png.cmdline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Flags and options\n", "\n", "Command line flags can also be added to the shell template, either the single or double hyphen form.\n", "The field template name immediately following the flag will be associate with that flag.\n", "If there is no space between the flag and the field template, then the field is assumed\n", "to be a boolean, otherwise it is assumed to be of type string unless otherwise specified.\n", "\n", "If a field is optional, the field template should end with a `?`. Tuple fields are\n", "specified by comma separated types. The ellipsis (`...`) can signify tuple types with\n", "variable number of items. Arguments and options that can be repeated are specified by\n", "appending a `+` (at least one must be provided) or `*` (defaults to empty list). Note that\n", "for options, this signifies that the flag itself is printed multiple times. e.g.\n", "`my-command --multi-opt 1 2 --multi-opt 1 5`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydra.utils import print_help\n", "\n", "Cp = shell.define(\n", "    \"cp <in_fs_objects:fs-object+> <out|out_dir:directory> \"\n", "    \"-R<recursive> \"\n", "    \"--text-arg <text_arg?> \"\n", "    \"--int-arg <int_arg:int?> \"\n", "    \"--tuple-arg <tuple_arg:int,str?> \"\n", ")\n", "\n", "print_help(Cp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defaults\n", "\n", "Defaults can be specified by appending them to the field template after `=`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydra.utils import task_fields\n", "\n", "# The default values used in this template are illustrative\n", "Cp = shell.define(\n", "    \"cp <in_fs_objects:fs-object+> <out|out_dir:directory> \"\n", "    \"-R<recursive> \"\n", "    \"--text-arg <text_arg='foo'> \"\n", "    \"--int-arg <int_arg:int=99> \"\n", "    \"--tuple-arg <tuple_arg:int,str=(1,'bar')> \"\n", ")\n", "\n", "print(f\"'--int-arg' default: {task_fields(Cp).int_arg.default}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Path templates for output files\n", "\n", "By default, when an output file argument is defined, a `path_template` attribute will\n", "be assigned to the field based on its name and extension (if applicable). For example,\n", "the `out_file` output field in the following Gzip command will be assigned a\n", "`path_template` of `out_file.gz`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydra.compose import shell\n", "from fileformats.generic import File\n", "\n", "Gzip = shell.define(\"gzip <out|out_file:application/gzip> <in_files+>\")\n", "gzip = Gzip(in_files=File.mock(\"/a/file.txt\"))\n", "print(gzip.cmdline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, a different path template can be specified using the `$` operator, e.g." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Gzip = shell.define(\"gzip <out|out_file:application/gzip$archive.gz> <in_files+>\")\n", "gzip = Gzip(in_files=File.mock(\"/a/file.txt\"))\n", "print(gzip.cmdline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This gives the field a `path_template` of `archive.gz` when it is written on the command line.\n", "Note that this value can always be overridden when the task is initialised, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gzip = Gzip(in_files=File.mock(\"/a/file.txt\"), out_file=\"/path/to/archive.gz\")\n", "print(gzip.cmdline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional field attributes\n", "\n", "Additional attributes of the fields in the template can be specified by passing `shell.arg` or `shell.outarg` fields to the `inputs` and `outputs` keyword arguments of `shell.define`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Cp = shell.define(\n", "    (\n", "        \"cp <in_fs_objects:fs-object+> <out|out_dir:directory> <out|out_file:file?> \"\n", "        \"-R<recursive> \"\n", "        \"--text-arg <text_arg?> \"\n", "        \"--int-arg <int_arg:int?> \"\n", "        \"--tuple-arg <tuple_arg:int,str?> \"\n", "    ),\n", "    inputs={\n", "        \"recursive\": shell.arg(\n", "            help=(\n", "                \"If source_file designates a directory, cp copies the directory and \"\n", "                \"the entire subtree connected at that point.\"\n", "            )\n", "        )\n", "    },\n", "    outputs={\n", "        \"out_dir\": shell.outarg(position=-2),\n", "        \"out_file\": shell.outarg(position=-1),\n", "    },\n", ")\n", "\n", "\n", "print_help(Cp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Callable outputs\n", "\n", "In addition to outputs that are specified to the tool on the command line, outputs can be derived from the outputs of the tool by providing a Python function that can take the output directory and inputs as arguments and return the output value. 
Callables can be specified either in the `callable` attribute of the `shell.out` field, or in a dictionary mapping the output name to the callable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from pydra.compose import shell\n", "from pathlib import Path\n", "from fileformats.generic import File\n", "\n", "\n", "# The callable can accept any of the arguments listed after this example\n", "def get_file_size(out_file: Path) -> int:\n", "    \"\"\"Calculate the file size\"\"\"\n", "    result = os.stat(out_file)\n", "    return result.st_size\n", "\n", "\n", "CpWithSize = shell.define(\n", "    \"cp <in_file> <out|out_file>\",\n", "    outputs={\"out_file_size\": get_file_size},\n", ")\n", "\n", "# Parameterise the task\n", "cp_with_size = CpWithSize(in_file=File.sample())\n", "\n", "# Run the command\n", "outputs = cp_with_size()\n", "\n", "\n", "print(f\"Size of the output file is: {outputs.out_file_size}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The callable can take any combination of the following arguments, which will be passed\n", "to it when it is called\n", "\n", "* field: the `Field` object to be provided a value, useful when writing generic callables\n", "* cache_dir: a `Path` object referencing the working directory the command was run within\n", "* inputs: a dictionary containing all the resolved inputs to the task\n", "* stdout: the standard output stream produced by the command\n", "* stderr: the standard error stream produced by the command\n", "* *name of an input*: the name of any of the input arguments to the task, including output args that are part of the command line (i.e. output files)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make workflows that use the interface type-checkable, the canonical form of a shell\n", "task dataclass should inherit from `shell.Task` parameterized by its nested `Outputs` class,\n", "and the `Outputs` nested class should inherit from `shell.Outputs`. 
Arguments that are\n", "provided `None` values are not included in the command line, so optional arguments should\n", "be typed using one of the equivalent forms `ty.Union[T, None]`, `ty.Optional[T]` or `T | None`\n", "and have a default of `None`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydra.utils.typing import MultiInputObj\n", "from fileformats.generic import FsObject, Directory\n", "\n", "\n", "@shell.define\n", "class Cp(shell.Task[\"Cp.Outputs\"]):\n", "\n", "    executable = \"cp\"\n", "\n", "    in_fs_objects: MultiInputObj[FsObject]\n", "    recursive: bool = shell.arg(argstr=\"-R\", default=False)\n", "    text_arg: str = shell.arg(argstr=\"--text-arg\")\n", "    int_arg: int | None = shell.arg(argstr=\"--int-arg\", default=None)\n", "    tuple_arg: tuple[int, str] | None = shell.arg(argstr=\"--tuple-arg\", default=None)\n", "\n", "    class Outputs(shell.Outputs):\n", "        out_dir: Directory = shell.outarg(path_template=\"{out_dir}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dynamic definitions\n", "\n", "In some cases, the definition for a task needs to be generated dynamically. This can be done by providing just the executable to `shell.define` and specifying all inputs and outputs explicitly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fileformats.generic import File\n", "from pydra.utils import print_help\n", "\n", "# get_file_size is the callable defined in the \"Callable outputs\" section\n", "ACommand = shell.define(\n", "    \"a-command\",\n", "    inputs={\n", "        \"in_file\": shell.arg(type=File, help=\"input file\", argstr=\"\", position=-2)\n", "    },\n", "    outputs={\n", "        \"out_file\": shell.outarg(type=File, help=\"output file\", argstr=\"\", position=-1),\n", "        \"out_file_size\": {\n", "            \"type\": int,\n", "            \"help\": \"size of the output file\",\n", "            \"callable\": get_file_size,\n", "        },\n", "    },\n", ")\n", "\n", "print_help(ACommand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } 
], "metadata": { "kernelspec": { "display_name": "wf13", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 2 }