5. ShellCommandTask

import nest_asyncio

nest_asyncio.apply()

In addition to FunctionTask, pydra allows for creating tasks from shell commands by using ShellCommandTask.

Let’s run a simple command, pwd, using pydra:

import pydra
cmd = 'pwd'
# we should use executable to pass the command we want to run
shelly = pydra.ShellCommandTask(name='shelly', executable=cmd)

# we can always check the cmdline of our task
shelly.cmdline
'pwd'

Now let’s try to run it:

with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)

Then we can check the result:

shelly.result()
Result(output=Output(return_code=0, stdout='/tmp/tmpvdlo4t4p/ShellCommandTask_617121258bf6bc240a2d201f488975df\n', stderr=''), runtime=None, errored=False)

The result should have return_code, stdout and stderr. If everything goes well, return_code should be 0, stdout should contain the path of the working directory, and stderr should be an empty string.
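For comparison, the same three pieces of information can be obtained with the standard library alone. The following is a minimal sketch and not how pydra runs the command internally; it assumes a POSIX system where pwd is available, and the temporary directory only stands in for the task's working directory.

```python
import os
import subprocess
import tempfile

# Run `pwd` inside a fresh temporary directory, loosely mirroring the way the
# task above ran inside its own working directory.
with tempfile.TemporaryDirectory() as workdir:
    completed = subprocess.run(
        ["pwd"], cwd=workdir, capture_output=True, text=True
    )
    # Resolve symlinks on both sides so the two paths can be compared.
    expected_dir = os.path.realpath(workdir)
    reported_dir = os.path.realpath(completed.stdout.strip())

print("return_code:", completed.returncode)
print("stdout:", repr(completed.stdout))
print("stderr:", repr(completed.stderr))
```

Here completed.returncode, completed.stdout and completed.stderr play the roles of return_code, stdout and stderr in the pydra result.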

5.1. Commands with arguments and inputs

You can also run a longer command by providing a list:

cmd = ['echo', 'hail', 'pydra']
shelly = pydra.ShellCommandTask(name='shelly', executable=cmd)
print('cmndline = ', shelly.cmdline)

with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline =  echo hail pydra
Result(output=Output(return_code=0, stdout='hail pydra\n', stderr=''), runtime=None, errored=False)

5.1.1. Using args

In addition to executable, we can also use args. The last example can be rewritten as:

cmd = 'echo'
args = ['hail', 'pydra']

shelly = pydra.ShellCommandTask(name='shelly', executable=cmd, args=args)
print('cmndline = ', shelly.cmdline)

with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline =  echo hail pydra
Result(output=Output(return_code=0, stdout='hail pydra\n', stderr=''), runtime=None, errored=False)
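Both variants above produce the same command line. As a rough, standard-library illustration of that equivalence (not pydra's own code), the executable and its arguments can be joined into a single string with shlex.join:

```python
import shlex

# Variant 1: the whole command given as one list.
cmd_as_list = ["echo", "hail", "pydra"]

# Variant 2: the executable and its arguments given separately.
executable = "echo"
args = ["hail", "pydra"]

# Join the pieces into one shell-safe string for each variant.
cmdline_from_list = shlex.join(cmd_as_list)
cmdline = shlex.join([executable, *args])

print("cmdline =", cmdline)
print("same as list form:", cmdline == cmdline_from_list)
```

Keeping the executable separate from args makes it easier to reuse the same command with different arguments.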

5.2. Customized input

Pydra always checks executable and args, but we can also provide additional inputs. To do so, we first have to modify input_spec using the SpecInfo class:

import attr

my_input_spec = pydra.specs.SpecInfo(
    name='Input',
    fields=[
        (
            'text',
            attr.ib(
                type=str,
                metadata={
                    'position': 1,
                    'argstr': '',
                    'help_string': 'text',
                    'mandatory': True,
                },
            ),
        )
    ],
    bases=(pydra.specs.ShellSpec,),
)

Notice that in order to create your own input_spec, you have to provide a list of fields. There are several valid syntaxes for specifying the elements of fields:

  • (name, attribute)

  • (name, type, default)

  • (name, type, default, metadata)

  • (name, type, metadata)

where name, type, and default are the name, type and default value of the field. attribute is defined using attr.ib; in the example the attribute has a type and metadata, but the full specification can be found in the attrs documentation.

In metadata, you can provide additional information that is used by pydra. help_string is the only required key; the full list of supported keys is ['position', 'argstr', 'requires', 'mandatory', 'allowed_values', 'output_field_name', 'copyfile', 'separate_ext', 'container_path', 'help_string', 'xor', 'output_file_template']. Among the supported keys, you have:

  • help_string: a string, a description of the argument;

  • position: an integer greater than 0, defines the relative position of the argument when the shell command is constructed;

  • argstr: a string, e.g. “-o”, can be used to specify a flag if one is needed for the command argument;

  • mandatory: a bool, if True, pydra will raise an exception if the argument is not provided.

The complete documentation for all supported keys is available in the pydra documentation.
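To make the roles of position, argstr and mandatory more concrete, here is a deliberately simplified sketch in plain Python. The build_cmdline helper and the out_file field are hypothetical and exist only for this illustration; pydra's real handling of these keys is richer, so treat this as a mental model rather than its implementation.

```python
import shlex


def build_cmdline(executable, fields, values):
    """Hypothetical helper: assemble a command line from field metadata."""
    parts = []
    for name, metadata in fields.items():
        value = values.get(name)
        if value is None:
            # A missing mandatory input is an error; optional ones are skipped.
            if metadata.get("mandatory", False):
                raise ValueError(f"mandatory input '{name}' was not provided")
            continue
        argstr = metadata.get("argstr", "")
        # Prefix the value with its flag when argstr is non-empty.
        tokens = [argstr, str(value)] if argstr else [str(value)]
        parts.append((metadata["position"], tokens))
    # Lower positions come first on the command line.
    parts.sort(key=lambda item: item[0])
    ordered = [token for _, tokens in parts for token in tokens]
    return shlex.join([executable, *ordered])


# Field definitions; `out_file` is an invented field used only in this sketch.
fields = {
    "out_file": {"position": 2, "argstr": "-o", "help_string": "output file"},
    "text": {"position": 1, "argstr": "", "help_string": "text", "mandatory": True},
}

only_text = build_cmdline("echo", fields, {"text": "HELLO"})
with_flag = build_cmdline("echo", fields, {"text": "HELLO", "out_file": "out.txt"})
print(only_text)
print(with_flag)
```

Note that the order of the entries in fields does not matter here: the position values alone decide where each argument ends up.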

To define my_input_spec we used the most general syntax, (name, attribute), but perhaps the simplest syntax is the last one, (name, type, metadata). Using this syntax, my_input_spec could look like this:

my_input_spec_short = pydra.specs.SpecInfo(
    name="Input",
    fields=[
        ("text", str, {"position": 1, "help_string": "text", "mandatory": True}),
    ],
    bases=(pydra.specs.ShellSpec,),
)

After defining my_input_spec, we can define our task:

cmd_exec = 'echo'
hello = 'HELLO'
shelly = pydra.ShellCommandTask(
    name='shelly', executable=cmd_exec, text=hello, input_spec=my_input_spec
)

print('cmndline = ', shelly.cmdline)

with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline =  echo HELLO
Result(output=Output(return_code=0, stdout='HELLO\n', stderr=''), runtime=None, errored=False)

5.3. Customized output

We can also customize the output if we want to return something more than stdout, e.g. a file.

my_output_spec = pydra.specs.SpecInfo(
    name='Output',
    fields=[('newfile', pydra.specs.File, 'newfile_tmp.txt')],
    bases=(pydra.specs.ShellOutSpec,),
)

Now we can create a task that returns a new file:

cmd = ['touch', 'newfile_tmp.txt']
shelly = pydra.ShellCommandTask(
    name='shelly', executable=cmd, output_spec=my_output_spec
)

print('cmndline = ', shelly.cmdline)

with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline =  touch newfile_tmp.txt
Result(output=Output(return_code=0, stdout='', stderr='', newfile=File('/tmp/tmp08mnff5c/ShellCommandTask_291a745f54ab54d29f6c45c6c4efb950/newfile_tmp.txt')), runtime=None, errored=False)
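In the result above, newfile points to a path inside the task's working directory. The idea of declaring an output file name and resolving it against the working directory can be sketched with the standard library alone; this is only an analogy under the assumption of a POSIX system with touch available, not pydra's output-collection code.

```python
import subprocess
import tempfile
from pathlib import Path

# The temporary directory stands in for the task's working directory.
with tempfile.TemporaryDirectory() as workdir:
    completed = subprocess.run(
        ["touch", "newfile_tmp.txt"], cwd=workdir, capture_output=True, text=True
    )
    # Resolve the declared output name relative to the working directory.
    newfile = Path(workdir) / "newfile_tmp.txt"
    newfile_exists = newfile.exists()

print("return_code:", completed.returncode)
print("newfile:", newfile.name, "exists:", newfile_exists)
```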

5.3.1. Exercise 1

Write a task that creates two new files, using the provided output spec.

cmd = 'touch'
args = ['newfile_1.txt', 'newfile_2.txt']

my_output_spec = pydra.specs.SpecInfo(
    name='Output',
    fields=[
        (
            'out1',
            attr.ib(
                type=pydra.specs.File,
                metadata={
                    'output_file_template': '{args}',
                    'help_string': 'output file',
                },
            ),
        )
    ],
    bases=(pydra.specs.ShellOutSpec,),
)

# write your solution here

DO NOT RUN IF Docker IS NOT AVAILABLE

Note that the following tasks use Docker, so they will fail if Docker is not available. They will also fail in Binder.

5.4. DockerTask

All the commands can also be run in a Docker container using DockerTask. The syntax is very similar, but an additional argument, image, is required.

cmd = 'whoami'
docky = pydra.DockerTask(name='docky', executable=cmd, image='busybox')

with pydra.Submitter() as sub:
    docky(submitter=sub)

docky.result()
Result(output=Output(return_code=0, stdout='root\n', stderr="Unable to find image 'busybox:latest' locally\nlatest: Pulling from library/busybox\n3f4d90098f5b: Pulling fs layer\n3f4d90098f5b: Verifying Checksum\n3f4d90098f5b: Download complete\n3f4d90098f5b: Pull complete\nDigest: sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79\nStatus: Downloaded newer image for busybox:latest\n"), runtime=None, errored=False)

5.4.1. Exercise 2

Use splitter to run the same command in two different images:

cmd = 'whoami'
# the values to split over are passed to split() together with the splitter name
docky = pydra.DockerTask(name='docky', executable=cmd).split(
    'image', image=['busybox', 'ubuntu']
)

with pydra.Submitter() as sub:
    docky(submitter=sub)

docky.result()

# write your solution here

5.4.1.1. Using ShellCommandTask with container_info argument

You can run the shell command in a Docker container by adding the container_info argument to ShellCommandTask:

shelly = pydra.ShellCommandTask(
    name='shelly', executable='whoami', container_info=('docker', 'busybox')
)
with pydra.Submitter() as sub:
    shelly(submitter=sub)

shelly.result()
Result(output=Output(return_code=0, stdout='root\n', stderr=''), runtime=None, errored=False)

If we don’t provide container_info, the command runs on the host, so the output should be different:

shelly = pydra.ShellCommandTask(name='shelly', executable='whoami')
with pydra.Submitter() as sub:
    shelly(submitter=sub)

shelly.result()
Result(output=Output(return_code=0, stdout='runner\n', stderr=''), runtime=None, errored=False)