General Linear Model (GLM)

In this tutorial, which is adapted from the Nilearn docs, we will go through a simple workflow for a first-level general linear model (GLM) analysis of a BIDS dataset from OpenNeuro. The analysis is performed on a single subject.

This tutorial is based on the Nilearn GLM tutorial.

[1]:
import nest_asyncio
nest_asyncio.apply()

Preparation

Import packages that will be used globally and set up the output directory

[2]:
import warnings
import sys
if not sys.warnoptions:
    warnings.simplefilter("ignore")

import os
import typing as ty
from pathlib import Path

from pydra.compose import python, workflow
from pydra.engine.submitter import Submitter
from fileformats.generic import File, Directory
from fileformats.text import Csv
import pandas as pd
from scipy.stats import norm

import nibabel as nib
# These functions were removed from nilearn, so this notebook needs to be rewritten
# to use the 'openneuro' module instead (a possible replacement is sketched below)
# from nilearn.datasets import (
#         fetch_openneuro_dataset_index,
#         fetch_openneuro_dataset,
#         select_from_index,
#     )
from nilearn.interfaces.fsl import get_design_from_fslmat
from nilearn.glm.first_level import first_level_from_bids
from nilearn.reporting import get_clusters_table, make_glm_report
from nilearn.plotting import (
    plot_glass_brain,
    plot_img_comparison,
    plot_contrast_matrix,
)
[3]:
# get the tutorial directory (the parent of the current working directory)
pydra_tutorial_dir = os.path.dirname(os.getcwd())

# set up output directory
workflow_dir = Path(pydra_tutorial_dir) / 'outputs'
workflow_out_dir = workflow_dir / '6_glm'

# create the output directory if it does not exist
os.makedirs(workflow_out_dir, exist_ok=True)
[4]:
workflow_out_dir
[4]:
PosixPath('/home/runner/work/pydra/pydra/docs/source/outputs/6_glm')

Create tasks

In this section, we convert the major steps into tasks. Each Pydra task can contain multiple Python functions. We recommend putting logically related functions into the same task.

It is very important to keep in mind what the adjacent tasks of your current task will be:

  1. The previous task determines the arguments of the current task.

  2. The returns of the current task affect the next task.
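To make these two points concrete, here is a minimal toy sketch using the same pydra.compose API imported above. The task names (AddOne, Square) and their fields are made up purely for illustration:

# Two toy tasks, made up for illustration: the output of AddOne feeds Square.
@python.define(outputs=["out"])
def AddOne(x: int) -> int:
    return x + 1


@python.define(outputs=["out"])
def Square(x: int) -> int:
    return x * x


@workflow.define(outputs=["result"])
def ToyWorkflow(x: int) -> int:
    add_one = workflow.add(AddOne(x=x))
    # the previous task's return (add_one.out) becomes this task's argument
    square = workflow.add(Square(x=add_one.out))
    return square.out

Submitting ToyWorkflow(x=2) with a Submitter, as in the final cell of this tutorial, should give 9.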

fetch openneuro BIDS dataset

In this task, we do the following:

  1. get openneuro dataset index

  2. specify exclusion patterns and number of subjects

  3. download the data we need

Notes: Here we still use n_subjects as an argument. Given that we will only analyze one subject, you could also remove this argument and hard-code n_subjects=1 in select_from_index. If you do, do not forget to modify the argument in the workflow later.

[5]:
@python.define(outputs=["data_dir"])
def GetOpenneuroDataset(exclusion_patterns: list, n_subjects: int) -> str:
    _, urls = fetch_openneuro_dataset_index()
    urls = select_from_index(
        urls, exclusion_filters=exclusion_patterns, n_subjects=n_subjects
    )
    data_dir, _ = fetch_openneuro_dataset(urls=urls)
    return data_dir
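Since those nilearn fetchers are no longer available, one possible replacement (not executed in this tutorial) is to download the dataset with the openneuro-py package. The dataset ID (ds000030), the subject (sub-10159), and the include patterns below are assumptions based on the original Nilearn GLM tutorial:

@python.define(outputs=["data_dir"])
def GetOpenneuroDatasetAlt(target_dir: str) -> str:
    # Hypothetical alternative using openneuro-py (pip install openneuro-py);
    # the dataset ID, subject, and include patterns are illustrative assumptions.
    import openneuro

    openneuro.download(
        dataset='ds000030',
        target_dir=target_dir,
        include=[
            'sub-10159',
            'derivatives/fmriprep/sub-10159',
            'derivatives/task/sub-10159',
        ],
    )
    return target_dir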

obtain FirstLevelModel objects automatically and fit arguments

To get the first-level model(s), we have to specify:

  1. the dataset directory

  2. the task_label

  3. the space_label

  4. the folder with the desired derivatives (fMRIPrep)

In our case, we only have one subject, so we will only have one first-level model. Then, for this model, we will obtain:

  1. the list of run images

  2. events

  3. confound regressors

Those are inferred automatically from the files available in the BIDS dataset (e.g., the events.tsv and confounds.tsv files).

[6]:
@python.define(outputs=["model", "imgs", "subject"])
def GetInfoFromBids(
    data_dir: Directory,
    task_label: str,
    space_label: str,
    smoothing_fwhm: float,
    derivatives_folder: Directory,
) -> ty.Tuple[ty.Any, list, str]:
    (
        models,
        models_run_imgs,
        models_events,
        models_confounds,
    ) = first_level_from_bids(
        dataset_path=data_dir,
        task_label=task_label,
        space_label=space_label,
        smoothing_fwhm=smoothing_fwhm,
        derivatives_folder=derivatives_folder,
    )
    model, imgs, events, confounds = (
        models[0],
        models_run_imgs[0],
        models_events[0],
        models_confounds[0],
    )
    subject = 'sub-' + model.subject_label
    return model, imgs, subject

Get design matrix

This task does the following:

  1. read the design matrix from the .mat file

  2. rename the columns

  3. save the new design matrix as a .csv file

Think: What if we don't save the new design matrix but return it directly? In other words, we return a pandas.DataFrame instead of a path. What will happen? Worth a try :) (one possible variant is sketched after the next cell)

[7]:
@python.define(outputs=["dm_path"])
def GetDesignMatrix(data_dir: Directory, subject: str) -> Csv:
    fsl_design_matrix_path = Path(data_dir).joinpath(
        'derivatives',
        'task',
        subject,
        'stopsignal.feat',
        'design.mat',
    )
    design_matrix = get_design_from_fslmat(
        fsl_design_matrix_path, column_names=None
    )

    design_columns = [
        'cond_%02d' % i for i in range(len(design_matrix.columns))
    ]
    design_columns[0] = 'Go'
    design_columns[4] = 'StopSuccess'
    design_matrix.columns = design_columns
    dm_path = Path('designmatrix.csv')
    design_matrix.to_csv(dm_path, index=None)
    return dm_path
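If you want to experiment with the Think prompt above, a variant that returns the DataFrame directly could look like the sketch below (whether Pydra can hash and pass the DataFrame between tasks is left for you to find out):

@python.define(outputs=["design_matrix"])
def GetDesignMatrixObject(data_dir: Directory, subject: str) -> ty.Any:
    # Same as GetDesignMatrix, but returns the pandas.DataFrame itself
    # instead of writing it to a CSV file.
    fsl_design_matrix_path = Path(data_dir).joinpath(
        'derivatives', 'task', subject, 'stopsignal.feat', 'design.mat'
    )
    design_matrix = get_design_from_fslmat(
        fsl_design_matrix_path, column_names=None
    )
    design_columns = [
        'cond_%02d' % i for i in range(len(design_matrix.columns))
    ]
    design_columns[0] = 'Go'
    design_columns[4] = 'StopSuccess'
    design_matrix.columns = design_columns
    return design_matrix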

Fit the first level model

What we are doing here is:

  1. use the design matrix to fit the first level model

  2. compute the contrast

  3. save the z_map and keep the masker for further use

  4. generate a glm report (HTML file)

[8]:
@python.define(outputs=["model", "z_map_path", "masker", "glm_report_file"])
def ModelFit(model, imgs, dm_path, contrast: str) -> ty.Tuple[ty.Any, str, ty.Any, str]:
    design_matrix = pd.read_csv(dm_path)
    model.fit(imgs, design_matrices=[design_matrix])
    z_map = model.compute_contrast(contrast)
    z_map_path = Path('firstlevel_z_map.nii.gz')
    z_map.to_filename(z_map_path)
    # keep the fitted masker so that the plotting task can reuse it
    masker = model.masker_
    glm_report_file = Path('glm_report.html')
    report = make_glm_report(model, contrast)
    report.save_as_html(glm_report_file)
    return model, z_map_path, masker, glm_report_file

Get cluster table

For publication purposes, we obtain a cluster table.

[9]:
@python.define(outputs=["output_file"])
def ClusterTable(z_map_path: File) -> Csv:
    stat_img = nib.load(z_map_path)
    output_file = Path('cluster_table.csv')
    df = get_clusters_table(
        stat_img, stat_threshold=norm.isf(0.001), cluster_threshold=10
    )
    df.to_csv(output_file, index=None)
    return output_file
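As a side note, the stat_threshold passed above is simply the z value corresponding to an uncorrected one-sided p < 0.001:

from scipy.stats import norm

# z threshold for an uncorrected one-sided p < 0.001
print(norm.isf(0.001))  # ~3.09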

Make plots

Here we make some plots to display our results and compare them with the corresponding results from FSL.

  1. plot nilearn z-map

  2. plot fsl z-map

  3. plot nilearn and fsl comparison

  4. plot design matrix contrast

You can also separate this task into multiple sub-tasks, but it makes more sense to keep them in one task, as they use the same files and call functions from nilearn.plotting repeatedly.

[10]:
@python.define(outputs=["output_file1", "output_file2", "output_file3", "output_file4"])
def Plots(
    data_dir: Directory,
    dm_path: File,
    z_map_path: File,
    contrast: str,
    subject: str,
    masker
) -> ty.Tuple[str, str, str, str]:
    # plot and save nilearn z-map
    z_map = nib.load(z_map_path)
    output_file1 = Path('nilearn_z_map.jpg')
    plot_glass_brain(
        z_map,
        output_file=output_file1,
        colorbar=True,
        threshold=norm.isf(0.001),
        title='Nilearn Z map of "StopSuccess - Go" (unc p<0.001)',
        plot_abs=False,
        display_mode='ortho',
    )

    # plot and save fsl z-map
    fsl_z_map = nib.load(
        os.path.join(
            data_dir,
            'derivatives',
            'task',
            subject,
            'stopsignal.feat',
            'stats',
            'zstat12.nii.gz',
        )
    )
    output_file2 = Path('fsl_z_map.jpg')
    plot_glass_brain(
        fsl_z_map,
        output_file=output_file2,
        colorbar=True,
        threshold=norm.isf(0.001),
        title='FSL Z map of "StopSuccess - Go" (unc p<0.001)',
        plot_abs=False,
        display_mode='ortho',
    )

    # plot and save nilearn and fsl comparison
    plot_img_comparison(
        [z_map],
        [fsl_z_map],
        masker,
        output_dir=workflow_out_dir,
        ref_label='Nilearn',
        src_label='FSL',
    )
    # plot_img_comparison writes its figure (0000.png) into output_dir,
    # so rename it from there into the task's working directory
    old = Path(workflow_out_dir) / '0000.png'
    new = Path('nilearn_fsl_comp.jpg')
    os.rename(old, new)
    output_file3 = new
    print(output_file3)

    # plot and save design matrix contrast
    design_matrix = pd.read_csv(dm_path)
    output_file4 = Path('firstlevel_contrast.jpg')
    plot_contrast_matrix(contrast, design_matrix, output_file=output_file4)
    return output_file1, output_file2, output_file3, output_file4

Make a workflow from tasks

Now that we have created all the tasks we need for this first-level analysis, there are two choices for the next step:

  1. create one workflow to connect all tasks together

  2. create sub-workflows with some closely related tasks, and connect these workflows along with other tasks into a larger workflow.

We recommend the second approach, as it is good practice to group tasks, especially when the analysis contains many tasks.

Our analysis can be divided into three parts: (1) get/read the data, (2) analyze the data, and (3) plot the results, where (1) and (3) only have one task each. So we can put all the tasks in (2) into one workflow and name it firstlevel (or whatever you prefer).

[11]:
@workflow.define(outputs=["z_map", "masker", "subject", "dm_path", "cluster_table", "glm_report"])
def FirstLevelWorkflow(
    data_dir: Directory,
    contrast: str,
    output_dir: Path,
    task_label: str = 'stopsignal',
    space_label: str = 'MNI152NLin2009cAsym',
    derivatives_folder: str = 'derivatives/fmriprep',
    smoothing_fwhm: float = 5.0,
) -> ty.Tuple[str, str, str, File, str, str]:

    # add task - get_info_from_bids
    get_info_from_bids = workflow.add(
        GetInfoFromBids(
            data_dir=data_dir,
            task_label=task_label,
            space_label=space_label,
            derivatives_folder=derivatives_folder,
            smoothing_fwhm=smoothing_fwhm,
        )
    )
    # add task - get_designmatrix
    get_designmatrix = workflow.add(
        GetDesignMatrix(
            data_dir=data_dir,
            subject=get_info_from_bids.subject,
        )
    )
    l1estimation = workflow.add(
        ModelFit(
            model=get_info_from_bids.model,
            imgs=get_info_from_bids.imgs,
            dm_path=get_designmatrix.dm_path,
            contrast=contrast,
        )
    )
    # add task - cluster_table
    cluster_table = workflow.add(
        ClusterTable(
            z_map_path=l1estimation.z_map_path,
        )
    )
    # specify output
    return (
        l1estimation.z_map_path,
        l1estimation.masker,
        get_info_from_bids.subject,
        get_designmatrix.dm_path,
        cluster_table.output_file,
        l1estimation.glm_report_file,
    )

The overarching workflow

Connect other tasks and the above workflow into one

Now we need to create the overarching GLM workflow that connects the above workflow with the other tasks (i.e., fetching/reading the data and plotting the results).

[12]:
@workflow.define(outputs=["output1", "output2", "output3", "output4"])
def FullWorkflow(
    output_dir: Path,
    n_subjects: int = 1,
    contrast: str = 'StopSuccess - Go',
    exclusion_patterns: list[str] | None = None,
) -> tuple[ty.Any, ty.Any, ty.Any, ty.Any]:
    if exclusion_patterns is None:
        exclusion_patterns = [
            '*group*',
            '*phenotype*',
            '*mriqc*',
            '*parameter_plots*',
            '*physio_plots*',
            '*space-fsaverage*',
            '*space-T1w*',
            '*dwi*',
            '*beh*',
            '*task-bart*',
            '*task-rest*',
            '*task-scap*',
            '*task-task*',
        ]

    get_openneuro_dataset = workflow.add(
        GetOpenneuroDataset(
            exclusion_patterns=exclusion_patterns,
            n_subjects=n_subjects,
        )
    )

    wf_firstlevel = workflow.add(
        FirstLevelWorkflow(
            data_dir=get_openneuro_dataset.data_dir,
            contrast=contrast,
            output_dir=output_dir,
        )
    )

    plots = workflow.add(
        Plots(
            data_dir=get_openneuro_dataset.data_dir,
            dm_path=wf_firstlevel.dm_path,
            z_map_path=wf_firstlevel.z_map,
            contrast=contrast,
            subject=wf_firstlevel.subject,
            masker=wf_firstlevel.masker,
        )
    )

    return (
        plots.output_file1,
        plots.output_file2,
        plots.output_file3,
        plots.output_file4,
    )

Run Workflow Run

[13]:
wf = FullWorkflow(output_dir=workflow_out_dir, n_subjects=1, contrast='StopSuccess - Go')

if __name__ == "__main__":
    with Submitter(worker='cf', n_procs=4) as sub:
        results = sub(wf)

    print(results)

Task execution failed
Full crash report for 'FullWorkflow' job is here: /home/runner/.cache/pydra/1.0a0/run-cache/workflow-970ddbd8393fbbd70801771ec2b613e9/_error.pklz
Result(cache_dir=PosixPath('/home/runner/.cache/pydra/1.0a0/run-cache/workflow-970ddbd8393fbbd70801771ec2b613e9'), outputs=None, runtime=None, errored=True, task=FullWorkflow(output_dir=PosixPath('/home/runner/work/pydra/pydra/docs/source/outputs/6_glm')))

Visualization

If you arrive here without any errors, yay, you just made your first pydra workflow for a first-level GLM!

Examine folder structure

Let’s take a look at what you have got.

[14]:
! ls ../outputs/6_glm

Plot figures

[15]:
from IPython.display import Image, display


if not results.errored:
    # First-level contrast
    display(Image(filename='../outputs/6_glm/firstlevel_contrast.jpg'))

    # Nilearn Z map
    display(Image(filename='../outputs/6_glm/nilearn_z_map.jpg'))

    # FSL Z map
    display(Image(filename='../outputs/6_glm/fsl_z_map.jpg'))

    # Nilearn and FSL comparison
    display(Image(filename='../outputs/6_glm/nilearn_fsl_comp.jpg'))

Exercise

What if we need to run the first-level GLM on multiple subjects? We will need the splitter.

So, where should we add .split? One possible placement is sketched below.
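Here is a hedged sketch of one option. It assumes FirstLevelWorkflow (and GetInfoFromBids) were extended with a subject_label input that is forwarded to first_level_from_bids (e.g. via its sub_labels argument); that input and the subject labels used here do not exist in the workflow above and are made up for illustration:

# Hypothetical: split the first-level sub-workflow over a list of subject labels.
# `subject_label` would need to be added as an input of FirstLevelWorkflow and
# forwarded to first_level_from_bids inside GetInfoFromBids.
wf_firstlevel = workflow.add(
    FirstLevelWorkflow(
        data_dir=get_openneuro_dataset.data_dir,
        contrast=contrast,
        output_dir=output_dir,
    ).split(subject_label=['10159', '10171', '10189'])
)

Downstream tasks that consume the split outputs would then either be split in the same way or combine the results (e.g. with .combine) before plotting.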