General Linear Model (GLM)¶
In this tutorial, which is adapted from the Nilearn docs, we will go through a simple workflow of the first level general linear modeling with a BIDS dataset from openneuro. This analysis is only performed on one subject.
This tutorial is based on the Nilearn GLM tutorial.
[1]:
import nest_asyncio
nest_asyncio.apply()
Preparation¶
Import packages that will be used globally and set up output directory
[2]:
import warnings
import sys
if not sys.warnoptions:
warnings.simplefilter("ignore")
import os
import typing as ty
from pathlib import Path
from pydra.compose import python, workflow
from pydra.engine.submitter import Submitter
from fileformats.generic import File, Directory
from fileformats.text import Csv
import pandas as pd
from scipy.stats import norm
import nibabel as nib
# These functions were removed within nilearn, so this notebook needs to be rewritten
# to use the 'openneuro' module instead
# from nilearn.datasets import (
# fetch_openneuro_dataset_index,
# fetch_openneuro_dataset,
# select_from_index,
# )
from nilearn.interfaces.fsl import get_design_from_fslmat
from nilearn.glm.first_level import first_level_from_bids
from nilearn.reporting import get_clusters_table, make_glm_report
from nilearn.plotting import (
plot_glass_brain,
plot_img_comparison,
plot_contrast_matrix,
)
[3]:
# get current directory
pydra_tutorial_dir = os.path.dirname(os.getcwd())
# set up output directory
workflow_dir = Path(pydra_tutorial_dir) / 'outputs'
workflow_out_dir = workflow_dir / '6_glm'
# create the output directory if not exit
os.makedirs(workflow_out_dir, exist_ok=True)
[4]:
workflow_out_dir
[4]:
PosixPath('/home/runner/work/pydra/pydra/docs/source/outputs/6_glm')
Create tasks¶
In this section, we converte major steps into tasks. Each pydra task can have multiple python functions. We recommend to put those logically more related functions into the same task.
It is very important to keep in mind what adjacent tasks of your current task will be.
Your previous task will decide your arguments in the current task
Your next task will be impacted by the returns in the current task
fetch openneuro BIDS dataset¶
In this task, we do the following:
get openneuro dataset index
specify exclusion patterns and number of subjects
download the data we need
Notes: Here we still use n_subjects
as an argument. Given that we will only analyze one subject, you can also remove this argument and specify n_subjects =1
in select_from_index
. If you do, do not forget to modify the argument in the workflow later.
[5]:
@python.define(outputs=["data_dir"])
def GetOpenneuroDataset(exclusion_patterns: list, n_subjects: int) -> str:
_, urls = fetch_openneuro_dataset_index()
urls = select_from_index(
urls, exclusion_filters=exclusion_patterns, n_subjects=n_subjects
)
data_dir, _ = fetch_openneuro_dataset(urls=urls)
return data_dir
obtain FirstLevelModel objects automatically and fit arguments¶
To get the first level model(s) we have to specify
the dataset directory
the task_label
the space_label
the folder with the desired derivatives (fMRIPrep)
In our case, we only have one subject so we will only have one first level model. Then, for this model, we will obtain
the list of run images
events
confound regressors
Those are inferred from the confounds.tsv files available in the BIDS dataset.
[6]:
@python.define(outputs=["model", "imgs", "subject"])
def GetInfoFromBids(
data_dir: Directory,
task_label: str,
space_label: str,
smoothing_fwhm: float,
derivatives_folder: Directory,
) -> ty.Tuple[ty.Any, list, str]:
(
models,
models_run_imgs,
models_events,
models_confounds,
) = first_level_from_bids(
dataset_path=data_dir,
task_label=task_label,
space_label=space_label,
smoothing_fwhm=smoothing_fwhm,
derivatives_folder=derivatives_folder,
)
model, imgs, events, confounds = (
models[0],
models_run_imgs[0],
models_events[0],
models_confounds[0],
)
subject = 'sub-' + model.subject_label
return model, imgs, subject
Get design matrix¶
This task does the following:
read the design matrix in
.mat
rename the column
save the new design matrix as
.csv
Think: What if we don’t save the new design matrix, but return
it directly? In other words, we return
a pandas.DataFrame
instead of a path
. What will happen? Worth a try :)
[7]:
@python.define(outputs=["dm_path"])
def GetDesignMatrix(data_dir: Directory, subject: str) -> Csv:
fsl_design_matrix_path = data_dir.joinpath(
'derivatives',
'task',
subject,
'stopsignal.feat',
'design.mat',
)
design_matrix = get_design_from_fslmat(
fsl_design_matrix_path, column_names=None
)
design_columns = [
'cond_%02d' % i for i in range(len(design_matrix.columns))
]
design_columns[0] = 'Go'
design_columns[4] = 'StopSuccess'
design_matrix.columns = design_columns
dm_path = Path('designmatrix.csv')
design_matrix.to_csv(dm_path, index=None)
return dm_path
Fit the first level model¶
What we are doing here is:
use the design matrix to fit the first level model
compute the contrast
save the z_map and masker for further use
generate a glm report (HTML file)
[8]:
@python.define(outputs=["model", "z_map_path", "masker", "glm_report_file"])
def ModelFit(model, imgs, dm_path, contrast: str) -> ty.Tuple[ty.Any, str, ty.Any, str]:
design_matrix = pd.read_csv(dm_path)
model.fit(imgs, design_matrices=[design_matrix])
z_map = model.compute_contrast(contrast)
z_map_path = Path('firstlevel_z_map.nii.gz')
z_map.to_filename(z_map_path)
masker_path = Path('firstlevel_masker.nii.gz')
masker = model.masker_
glm_report_file = Path('glm_report.html')
report = make_glm_report(model, contrast)
report.save_as_html(glm_report_file)
return model, z_map_path, masker, glm_report_file
Get cluster table¶
For publication purposes, we obtain a cluster table.
[9]:
@python.define(outputs=["output_file"])
def ClusterTable(z_map_path: File) -> Csv:
stat_img = nib.load(z_map_path)
output_file = Path('cluster_table.csv')
df = get_clusters_table(
stat_img, stat_threshold=norm.isf(0.001), cluster_threshold=10
)
df.to_csv(output_file, index=None)
return output_file
Make plots¶
Here we want to make some plots to display our results and compare the result from FSL.
plot nilearn z-map
plot fsl z-map
plot nilearn and fsl comparison
plot design matrix contrast
You can also separate this task into multiple sub-tasks. But it makes more sense to put them into one task as they use the same files and function nilearn.plotting
repeatedly.
[10]:
@python.define(outputs=["output_file1", "output_file2", "output_file3", "output_file4"])
def Plots(
data_dir: Directory,
dm_path: File,
z_map_path: File,
contrast: str,
subject: str,
masker
) -> ty.Tuple[str, str, str, str]:
# plot and save nilearn z-map
z_map = nib.load(z_map_path)
output_file1 = Path('nilearn_z_map.jpg')
plot_glass_brain(
z_map,
output_file=output_file1,
colorbar=True,
threshold=norm.isf(0.001),
title='Nilearn Z map of "StopSuccess - Go" (unc p<0.001)',
plot_abs=False,
display_mode='ortho',
)
# plot and save fsl z-map
fsl_z_map = nib.load(
os.path.join(
data_dir,
'derivatives',
'task',
subject,
'stopsignal.feat',
'stats',
'zstat12.nii.gz',
)
)
output_file2 = Path('fsl_z_map.jpg')
plot_glass_brain(
fsl_z_map,
output_file=output_file2,
colorbar=True,
threshold=norm.isf(0.001),
title='FSL Z map of "StopSuccess - Go" (unc p<0.001)',
plot_abs=False,
display_mode='ortho',
)
# plot and save nilearn and fsl comparison
plot_img_comparison(
[z_map],
[fsl_z_map],
masker,
output_dir=workflow_out_dir,
ref_label='Nilearn',
src_label='FSL',
)
old = Path('0000.png')
new = Path('nilearn_fsl_comp.jpg')
os.rename(old, new)
output_file3 = new
print(output_file3)
# plot and save design matrix contrast
design_matrix = pd.read_csv(dm_path)
output_file4 = Path('firstlevel_contrast.jpg')
plot_contrast_matrix(contrast, design_matrix, output_file=output_file4)
return output_file1, output_file2, output_file3, output_file4
Make a workflow from tasks¶
Now we have created all tasks we need for this first level analysis, and there are two choices for our next step.
create one workflow to connect all tasks together
create sub-workflows with some closely related tasks, and connect these workflows along with other tasks into a larger workflow.
We recommend the second approach as it is always a good practice to group tasks, especially when there are a large number of tasks in the analysis.
Our analysis can be divided into three parts: (1) get/read the data, (2) analyze the data, and (3) plot the result, where (1) and (3) only have one task each. So we can put all tasks in (2) into one workflow and name it as firstlevel
or whatever you prefer.
[11]:
@workflow.define(outputs=["z_map", "masker", "subject", "dm_path", "cluster_table", "glm_report"])
def FirstLevelWorkflow(
data_dir: Directory,
contrast: str,
output_dir: Path,
task_label: str = 'stopsignal',
space_label: str = 'MNI152NLin2009cAsym',
derivatives_folder: str = 'derivatives/fmriprep',
smoothing_fwhm: float = 5.0,
) -> ty.Tuple[str, str, str, File, str, str]:
# add task - get_info_from_bids
get_info_from_bids = workflow.add(
GetInfoFromBids(
data_dir=data_dir,
task_label=task_label,
space_label=space_label,
derivatives_folder=derivatives_folder,
smoothing_fwhm=smoothing_fwhm,
)
)
# add task - get_designmatrix
get_designmatrix = workflow.add(
GetDesignMatrix(
data_dir=data_dir,
subject=get_info_from_bids.subject,
)
)
l1estimation = workflow.add(
ModelFit(
model=get_info_from_bids.model,
imgs=get_info_from_bids.imgs,
dm_path=get_designmatrix.dm_path,
contrast=contrast,
)
)
# add task - cluster_table
cluster_table = workflow.add(
ClusterTable(
z_map_path=l1estimation.z_map_path,
)
)
# specify output
return (
l1estimation.z_map_path,
l1estimation.masker,
get_info_from_bids.subject,
get_designmatrix.dm_path,
cluster_table.output_file,
l1estimation.glm_report_file,
)
The overaching workflow¶
Connect other tasks and the above workflow into one
Now we need to create the overaching glm workflow that connects the above workflow and other tasks (e.g., get/read the data
and plot the result
)
[12]:
@workflow.define(outputs=["output1", "output2", "output3", "output4"])
def FullWorkflow(
output_dir: Path,
n_subjects: int = 1,
contrast: str = 'StopSuccess - Go',
exclusion_patterns: list[str] | None = None,
) -> tuple[ty.Any, ty.Any, ty.Any, ty.Any]:
if exclusion_patterns is None:
exclusion_patterns = [
'*group*',
'*phenotype*',
'*mriqc*',
'*parameter_plots*',
'*physio_plots*',
'*space-fsaverage*',
'*space-T1w*',
'*dwi*',
'*beh*',
'*task-bart*',
'*task-rest*',
'*task-scap*',
'*task-task*',
]
get_openneuro_dataset = workflow.add(
GetOpenneuroDataset(
exclusion_patterns=exclusion_patterns,
n_subjects=n_subjects,
)
)
wf_firstlevel = workflow.add(
FirstLevelWorkflow(
data_dir=get_openneuro_dataset.data_dir,
contrast=contrast,
output_dir=output_dir,
)
)
plots = workflow.add(
Plots(
data_dir=get_openneuro_dataset.data_dir,
dm_path=wf_firstlevel.dm_path,
z_map_path=wf_firstlevel.z_map,
contrast=contrast,
subject=wf_firstlevel.subject,
masker=wf_firstlevel.masker,
)
)
return (
plots.output_file1,
plots.output_file2,
plots.output_file3,
plots.output_file4,
)
Run Workflow Run¶
[13]:
wf = FullWorkflow(output_dir=workflow_out_dir, n_subjects=1, contrast='StopSuccess - Go')
if __name__ == "__main__":
with Submitter(worker='cf', n_procs=4) as sub:
results = sub(wf)
print(results)
Task execution failed
Full crash report for 'FullWorkflow' job is here: /home/runner/.cache/pydra/1.0a0/run-cache/workflow-970ddbd8393fbbd70801771ec2b613e9/_error.pklz
Result(cache_dir=PosixPath('/home/runner/.cache/pydra/1.0a0/run-cache/workflow-970ddbd8393fbbd70801771ec2b613e9'), outputs=None, runtime=None, errored=True, task=FullWorkflow(output_dir=PosixPath('/home/runner/work/pydra/pydra/docs/source/outputs/6_glm')))
Visualization¶
If you arrive here without any errors, yay, you just made your first pydra workflow for a first-level GLM!
Examine folder structure¶
Let’s take a look at what you have got.
[14]:
! ls ../outputs/6_glm
Plot figures¶
[15]:
from IPython.display import Image
if not results.errored:
# First-level contrast
Image(filename='../outputs/6_glm/firstlevel_contrast.jpg')
# Nilearn Z map
Image(filename='../outputs/6_glm/nilearn_z_map.jpg')
# FSL Z map
Image(filename='../outputs/6_glm/fsl_z_map.jpg')
# Nilearn and FSL comparison
Image(filename='../outputs/6_glm/nilearn_fsl_comp.jpg')
Exercise¶
What if we need to run the first-level GLM on multiple subject? We will need the splitter
.
So, where should we add .split
?