5. ShellCommandTask
import nest_asyncio
nest_asyncio.apply()
In addition to FunctionTask, pydra allows for creating tasks from shell commands by using ShellCommandTask.
Let’s run a simple command, pwd, using pydra:
import pydra
cmd = 'pwd'
# we should use executable to pass the command we want to run
shelly = pydra.ShellCommandTask(name='shelly', executable=cmd)
# we can always check the cmdline of our task
shelly.cmdline
'pwd'
and now let’s try to run it:
with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
and check the result:
shelly.result()
Result(output=Output(return_code=0, stdout='/tmp/tmpvdlo4t4p/ShellCommandTask_617121258bf6bc240a2d201f488975df\n', stderr=''), runtime=None, errored=False)
The result should have return_code, stdout and stderr. If everything goes well, return_code should be 0, stdout should point to the working directory, and stderr should be an empty string.
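These three fields correspond to what any process hands back to its parent. As a rough illustration outside pydra, assuming a POSIX system where pwd is available, the same command can be run with Python's standard subprocess module. The variable name proc exists only in this sketch and is not part of pydra:

```python
import subprocess

# Run the same command directly, capturing both output streams as text.
proc = subprocess.run(["pwd"], capture_output=True, text=True)

print("return_code:", proc.returncode)  # 0 signals success
print("stdout:", repr(proc.stdout))     # the working directory plus a newline
print("stderr:", repr(proc.stderr))     # empty when nothing went wrong
```

The directory printed here is the one the script runs in, whereas the pydra task above reports its own temporary working directory.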
5.1. Commands with arguments and inputs
You can also run a longer command by providing a list:
cmd = ['echo', 'hail', 'pydra']
shelly = pydra.ShellCommandTask(name='shelly', executable=cmd)
print('cmndline = ', shelly.cmdline)
with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline = echo hail pydra
Result(output=Output(return_code=0, stdout='hail pydra\n', stderr=''), runtime=None, errored=False)
5.1.1. Using args
In addition to executable, we can also use args. The last example can also be rewritten as:
cmd = 'echo'
args = ['hail', 'pydra']
shelly = pydra.ShellCommandTask(name='shelly', executable=cmd, args=args)
print('cmndline = ', shelly.cmdline)
with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline = echo hail pydra
Result(output=Output(return_code=0, stdout='hail pydra\n', stderr=''), runtime=None, errored=False)
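Both forms produce the same command line. The following is a minimal sketch of that equivalence, not pydra's actual implementation: build_cmdline is a hypothetical helper that accepts a string or a list for either argument and joins everything with spaces.

```python
def build_cmdline(executable, args=None):
    """Join an executable (string or list) and optional args into one string."""
    # Normalize the executable to a list of parts.
    parts = [executable] if isinstance(executable, str) else list(executable)
    # Append the extra arguments, if any, in the order given.
    if args is not None:
        parts += [args] if isinstance(args, str) else list(args)
    return " ".join(str(p) for p in parts)


as_list = build_cmdline(["echo", "hail", "pydra"])
with_args = build_cmdline("echo", args=["hail", "pydra"])
print(as_list)
print(with_args)
```

Keeping the executable and its arguments separate is convenient when the same program is reused with different arguments.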
5.2. Customized input
Pydra always checks executable and args, but we can also provide additional inputs. To do so, we first have to modify input_spec by using the SpecInfo class:
import attr
my_input_spec = pydra.specs.SpecInfo(
    name='Input',
    fields=[
        (
            'text',
            attr.ib(
                type=str,
                metadata={
                    'position': 1,
                    'argstr': '',
                    'help_string': 'text',
                    'mandatory': True,
                },
            ),
        )
    ],
    bases=(pydra.specs.ShellSpec,),
)
Notice that in order to create your own input_spec, you have to provide a list of fields. There are several valid syntaxes for specifying the elements of fields:
(name, attribute)
(name, type, default)
(name, type, default, metadata)
(name, type, metadata)
where name, type, and default are the name, type and default value of the field. attribute is defined by using attr.ib; in the example the attribute has type and metadata, but the full specification can be found in the attrs documentation.
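To make the four tuple shapes concrete, here is a small sketch that maps each of them onto a common dictionary. It is an illustration only: normalize_field and NO_DEFAULT are hypothetical names, and the rule used to tell (name, type, metadata) from (name, type, default), namely checking whether the third element is a dict, is an assumption of this sketch.

```python
NO_DEFAULT = object()  # sentinel meaning "no default value was given"


def normalize_field(field):
    """Map one field tuple onto a dict with a uniform set of keys."""
    name, *rest = field
    if len(rest) == 1:
        # (name, attribute): the attribute object carries everything else
        return {"name": name, "attribute": rest[0]}
    tp, *extra = rest
    default, metadata = NO_DEFAULT, {}
    if len(extra) == 2:
        # (name, type, default, metadata)
        default, metadata = extra
    elif isinstance(extra[0], dict):
        # (name, type, metadata)
        metadata = extra[0]
    else:
        # (name, type, default)
        default = extra[0]
    return {"name": name, "type": tp, "default": default, "metadata": metadata}


examples = [
    ("text", str, {"help_string": "text", "mandatory": True}),
    ("count", int, 1),
    ("flag", bool, False, {"help_string": "a flag", "argstr": "-f"}),
]
for field in examples:
    print(normalize_field(field))
```

A field whose default value is itself a dict would be ambiguous under this rule, so such a field is better written in the four-element form.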
In metadata, you can provide additional information that is used by pydra. help_string is the only required key, and the full list of supported keys is ['position', 'argstr', 'requires', 'mandatory', 'allowed_values', 'output_field_name', 'copyfile', 'separate_ext', 'container_path', 'help_string', 'xor', 'output_file_template']. Among the supported keys, you have:
help_string: a string, a description of the argument;
position: an integer greater than 0, which defines the relative position of the argument when the shell command is constructed;
argstr: a string, e.g. “-o”, which can be used to specify a flag if one is needed for the command argument;
mandatory: a bool; if True, pydra will raise an exception if the argument is not provided.
The complete documentation for all supported keys is available in the pydra documentation.
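The interplay of position, argstr and mandatory can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how pydra builds its command line: render_cmdline is a hypothetical helper, fields use the (name, type, metadata) form, and inputs without a position are placed last by this sketch's own choice.

```python
def render_cmdline(executable, fields, values):
    """Build a command string from (name, type, metadata) fields and input values."""
    positioned = []
    for name, _type, metadata in fields:
        value = values.get(name)
        if value is None:
            # A missing mandatory input is an error; optional ones are skipped.
            if metadata.get("mandatory", False):
                raise ValueError(f"mandatory input '{name}' was not provided")
            continue
        argstr = metadata.get("argstr", "")
        # An empty argstr yields the bare value; otherwise the flag precedes it.
        rendered = f"{argstr} {value}" if argstr else str(value)
        positioned.append((metadata.get("position", float("inf")), rendered))
    # Lower positions come first; the executable always leads.
    ordered = [text for _pos, text in sorted(positioned)]
    return " ".join([executable, *ordered])


fields = [
    ("out", str, {"position": 2, "argstr": "-o", "help_string": "output name"}),
    ("text", str, {"position": 1, "argstr": "", "help_string": "text", "mandatory": True}),
]
cmdline = render_cmdline("mycmd", fields, {"text": "HELLO", "out": "res.txt"})
print(cmdline)
```

Note that the order of the entries in fields does not matter here; only position decides where each argument ends up.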
To define my_input_spec we used the most general syntax, (name, attribute), but perhaps the simplest syntax is the last one, (name, type, metadata). Using this syntax, my_input_spec could look like this:
my_input_spec_short = pydra.specs.SpecInfo(
    name="Input",
    fields=[
        ("text", str, {"position": 1, "help_string": "text", "mandatory": True}),
    ],
    bases=(pydra.specs.ShellSpec,),
)
After defining my_input_spec, we can define our task:
cmd_exec = 'echo'
hello = 'HELLO'
shelly = pydra.ShellCommandTask(
    name='shelly', executable=cmd_exec, text=hello, input_spec=my_input_spec
)
print('cmndline = ', shelly.cmdline)
with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline = echo HELLO
Result(output=Output(return_code=0, stdout='HELLO\n', stderr=''), runtime=None, errored=False)
5.3. Customized output
We can also customize the output if we want to return something more than stdout, e.g. a file:
my_output_spec = pydra.specs.SpecInfo(
    name='Output',
    fields=[('newfile', pydra.specs.File, 'newfile_tmp.txt')],
    bases=(pydra.specs.ShellOutSpec,),
)
Now we can create a task that returns a new file:
cmd = ['touch', 'newfile_tmp.txt']
shelly = pydra.ShellCommandTask(
    name='shelly', executable=cmd, output_spec=my_output_spec
)
print('cmndline = ', shelly.cmdline)
with pydra.Submitter(plugin='cf') as sub:
    sub(shelly)
shelly.result()
cmndline = touch newfile_tmp.txt
Result(output=Output(return_code=0, stdout='', stderr='', newfile=File('/tmp/tmp08mnff5c/ShellCommandTask_291a745f54ab54d29f6c45c6c4efb950/newfile_tmp.txt')), runtime=None, errored=False)
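The newfile path in the result points inside the task's working directory, which suggests that the file name given in the output spec is resolved relative to that directory once the command has finished. A minimal sketch of this idea with the standard library only, assuming a POSIX system with touch available; run_and_collect is a hypothetical helper and not part of pydra:

```python
import subprocess
import tempfile
from pathlib import Path


def run_and_collect(cmd, expected_name):
    """Run cmd in a fresh directory and return the path of the expected file."""
    workdir = Path(tempfile.mkdtemp())
    # check=True raises CalledProcessError on a non-zero return code.
    subprocess.run(cmd, cwd=workdir, check=True)
    newfile = workdir / expected_name
    if not newfile.exists():
        raise FileNotFoundError(f"{expected_name} was not created in {workdir}")
    return newfile


newfile = run_and_collect(["touch", "newfile_tmp.txt"], "newfile_tmp.txt")
print(newfile)
```

Running each command in its own directory keeps the outputs of different tasks from overwriting each other.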
5.3.1. Exercise 1
Write a task that creates two new files, using the provided output spec.
cmd = 'touch'
args = ['newfile_1.txt', 'newfile_2.txt']
my_output_spec = pydra.specs.SpecInfo(
    name='Output',
    fields=[
        (
            'out1',
            attr.ib(
                type=pydra.specs.File,
                metadata={
                    'output_file_template': '{args}',
                    'help_string': 'output file',
                },
            ),
        )
    ],
    bases=(pydra.specs.ShellOutSpec,),
)
# write your solution here
DO NOT RUN IF Docker IS NOT AVAILABLE
Note that the following tasks use Docker, so they will fail if Docker is not available. They will also fail in Binder.
5.4. DockerTask
All the commands can also be run in a Docker container using DockerTask. The syntax is very similar, but an additional argument, image, is required.
cmd = 'whoami'
docky = pydra.DockerTask(name='docky', executable=cmd, image='busybox')
with pydra.Submitter() as sub:
    docky(submitter=sub)
docky.result()
Result(output=Output(return_code=0, stdout='root\n', stderr="Unable to find image 'busybox:latest' locally\nlatest: Pulling from library/busybox\n3f4d90098f5b: Pulling fs layer\n3f4d90098f5b: Verifying Checksum\n3f4d90098f5b: Download complete\n3f4d90098f5b: Pull complete\nDigest: sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79\nStatus: Downloaded newer image for busybox:latest\n"), runtime=None, errored=False)
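Conceptually, running a command in a container amounts to prefixing it with a docker run invocation for the chosen image. The sketch below only assembles such an argument list and does not call Docker, so it works even where Docker is unavailable. docker_cmdline is a hypothetical helper; the command line pydra actually generates is not shown in this tutorial and may contain further options, for example volume mounts for the working directory.

```python
def docker_cmdline(image, executable, args=()):
    """Return the argument list for running a command in a throwaway container."""
    # --rm removes the container again once the command has finished.
    return ["docker", "run", "--rm", image, executable, *args]


docker_args = docker_cmdline("busybox", "whoami")
print(" ".join(docker_args))
```

The stderr in the result above comes from Docker itself: the image was not available locally and had to be pulled before the command could run.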
5.4.1. Exercise 2
Use splitter to run the same command in two different images:
cmd = 'whoami'
# split() needs the values to iterate over as keyword arguments; passing them
# only to the constructor raises
# "ValueError: Split is missing values for the following fields ['image']"
docky = pydra.DockerTask(name='docky', executable=cmd).split(
    'image', image=['busybox', 'ubuntu']
)
with pydra.Submitter() as sub:
    docky(submitter=sub)
docky.result()
# write your solution here
5.4.1.1. Using ShellCommandTask with container_info argument:
You can run the shell command in a Docker container by adding the container_info argument to ShellCommandTask:
shelly = pydra.ShellCommandTask(
    name='shelly', executable='whoami', container_info=('docker', 'busybox')
)
with pydra.Submitter() as sub:
    shelly(submitter=sub)
shelly.result()
Result(output=Output(return_code=0, stdout='root\n', stderr=''), runtime=None, errored=False)
If we don’t provide container_info, the output should be different:
shelly = pydra.ShellCommandTask(name='shelly', executable='whoami')
with pydra.Submitter() as sub:
    shelly(submitter=sub)
shelly.result()
Result(output=Output(return_code=0, stdout='runner\n', stderr=''), runtime=None, errored=False)