Subtyping pipeline

Overview of the subtyping pipeline

The goal of this pipeline is to cluster subjects based on their similarity on a given measure (e.g. functional connectivity) and then to perform subsequent statistical analyses.

The steps of niak_pipeline_subtype.m are as follows:

Preprocessing of the data to create a "stack" map per network, which serves as input for the rest of the pipeline, with the option of regressing out confounds prior to subtyping. The stack map is a Subjects x Voxels array. See niak_brick_network_stack.m for more info.
Clustering the subjects to form subtypes (or subgroups) within the dataset. See niak_brick_subtyping.m for more info.
Calculating the subtype "weights" for each subject, a measure of the strength of the association between each subject and a given subtype. See niak_brick_subtype_weight.m for more info.
Statistical tests of association, to test how subtypes relate to variables of interest. For more info, see niak_brick_association_test.m to test with general linear models (GLM) and niak_brick_chi_cramer.m to test with Chi2 statistics. For visualization of GLMs, see niak_brick_association_test.
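To make step 3 more concrete, here is a minimal sketch of how a subject's weight for a subtype can be computed from the stack. It assumes the weight is the Pearson correlation between a subject's row of the stack and a subtype map; the toy values and the variables stack and sbt_map are hypothetical, and niak_brick_subtype_weight.m may define the weight differently (for example by centering maps on the grand mean first).

```matlab
% Toy stack: 4 subjects x 5 voxels (hypothetical values)
stack = [1 2 3 4 5; 2 2 4 4 6; 5 4 3 2 1; 6 4 4 2 1];
% Toy subtype maps: 2 subtypes x 5 voxels (hypothetical values)
sbt_map = [1 2 3 4 5; 5 4 3 2 1];

nb_subj = size(stack, 1);
nb_subtype = size(sbt_map, 1);
weights = zeros(nb_subj, nb_subtype);
for ii = 1:nb_subj
    for ss = 1:nb_subtype
        % Center both maps, then take their Pearson correlation
        a = stack(ii, :) - mean(stack(ii, :));
        b = sbt_map(ss, :) - mean(sbt_map(ss, :));
        weights(ii, ss) = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
    end
end
disp(weights)   % Subjects x Subtypes, values in [-1, 1]
```

Subjects 1 and 2 correlate positively with subtype 1 and negatively with subtype 2, and the reverse holds for subjects 3 and 4.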

The command to run the pipeline in a Matlab/Octave session is: niak_pipeline_subtype(files_in,opt) where "files_in" is a structure describing how the dataset is organized, and "opt" is a structure describing the options of the pipeline. See this test script for an example of how to write your own script to call the pipeline.
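As a starting point, the script skeleton below assembles the files_in and opt fields described in the sections that follow into a single call. It is a sketch, not the referenced test script: every path is a placeholder (the mask path in particular is hypothetical), a real dataset needs one files_in.data entry per subject, and the values should be adapted to your data.

```matlab
% Sketch of a script calling the subtyping pipeline (placeholder paths and values)
clear files_in opt

% Individual maps: one subfield per subject (add one line per subject)
files_in.data.subject1 = 'data/subject1_session1_stability_maps.mnc.gz';
files_in.data.subject2 = 'data/subject2_session1_stability_maps.mnc.gz';

files_in.mask  = 'data/func_mask_group_stereonl.mnc.gz';  % 3D binary mask (hypothetical path)
files_in.model = 'data/model.csv';                         % optional model file

opt.folder_out = 'data/subtype_results/';      % where to store results
opt.scale = 5;                                 % number of networks in the maps
opt.stack.regress_conf = {'Gender', 'Age'};    % confounds, must be columns of model.csv
opt.subtype.nb_subtype = 5;                    % number of subtypes
opt.subtype.sbt_map_type = 'median';           % how subtype maps are built
opt.flag_chi2 = true;                          % contingency table, Chi2 and Cramer's V
opt.chi2.group_col_id = 'Group';               % group variable in model.csv

niak_pipeline_subtype(files_in, opt);
```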

Input files

Individual maps (e.g. rmap_part, stability_maps, etc).
A 3D binary mask
A model file (optional)
A .mat file with previously generated subtypes (optional)

These inputs must be specified in a structure, with required subfields for data and mask, and optionally, model.

Individual maps

These maps can be any type of preprocessed map (for example, the rmap from niak_pipeline_connectome). N.B. The pipeline assumes there is exactly one .mnc.gz or .nii.gz file per subject.

To specify the individual maps, build a structure with one subfield per subject. For example:
files_in.data.subject1 = 'data/subject1_session1_stability_maps.mnc.gz';
files_in.data.subject2 = 'data/subject2_session1_stability_maps.mnc.gz';
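With many subjects, the structure can be filled in a loop rather than by hand. The sketch below assumes, hypothetically, that the subject ID is the text before the first underscore of each file name; the file list is hard-coded here but could come from dir. Field names must be valid Matlab identifiers (e.g. starting with a letter).

```matlab
% Hypothetical file list; in practice it could come from:
%   d = dir(fullfile('data', '*_stability_maps.mnc.gz')); file_list = {d.name};
file_list = {'subject1_session1_stability_maps.mnc.gz', ...
             'subject2_session1_stability_maps.mnc.gz', ...
             'subject3_session1_stability_maps.mnc.gz'};

files_in = struct();
for ff = 1:numel(file_list)
    % Subject ID = text before the first underscore (assumed naming convention)
    tok = regexp(file_list{ff}, '^([^_]+)_', 'tokens', 'once');
    files_in.data.(tok{1}) = fullfile('data', file_list{ff});
end
disp(fieldnames(files_in.data))
```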

3D binary mask

The "mask" field is the name of a 3D binary volume serving as a mask for the analysis. It can be a mask of the brain common to all subjects, or a mask of a specific brain area, e.g. the thalami. It is important to make sure that this segmentation is in the same space and resolution as the fMRI datasets. If not, use SPM or mincresample to resample the mask at the correct resolution.

To specify the mask, add a subfield for the mask to the files_in structure. For example:
files_in.mask = '/home/pbellec/demo_niak_preproc/quality_control/group_coregistration/func_mask_group_stereonl.mnc.gz';
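To illustrate the role of the mask, the sketch below keeps only the in-mask voxels of a 3D map, which is how one row of a Subjects x Voxels stack can be formed. The toy volume and mask are hypothetical; real data would be read from disk first (e.g. with NIAK's niak_read_vol). The size check only catches a resolution mismatch: two volumes of equal size can still be in different spaces.

```matlab
% Toy 3D map and mask (hypothetical values)
vol  = reshape(1:24, [2 3 4]);   % individual map
mask = false(2, 3, 4);
mask(1, :, 2) = true;            % region of interest

if ~isequal(size(vol), size(mask))
    error('Mask and map differ in size: resample the mask first.');
end

% One row of the stack: map values inside the mask, as a 1 x Voxels vector
row = vol(mask)';
disp(row)   % 7 9 11
```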

Model file

The model file is a .csv file containing demographic information, including variables of interest and confound variables, for each subject specified in files_in.data. This input is optional.

To specify the model file, add a subfield to the files_in structure. For example:
files_in.model = 'data/model.csv';
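Before launching the pipeline, it can help to check that every subject in files_in.data has an entry in the model. The sketch below assumes a layout that this page does not specify: the first row holds variable names and the first column holds subject IDs matching the files_in.data field names. The CSV content is hypothetical and given as a cell array of lines to keep the example self-contained.

```matlab
% Hypothetical content of data/model.csv, one line per cell
csv_lines = {',Age,Gender,Group', ...
             'subject1,25,1,1', ...
             'subject2,31,0,2'};

header = regexp(csv_lines{1}, ',', 'split');
var_names = header(2:end);                   % variable names
nb_row = numel(csv_lines) - 1;
subj_ids = cell(nb_row, 1);
model = zeros(nb_row, numel(var_names));
for ll = 1:nb_row
    cols = regexp(csv_lines{ll + 1}, ',', 'split');
    subj_ids{ll} = cols{1};                  % subject ID
    model(ll, :) = str2double(cols(2:end));  % numeric values of each variable
end

% Every subject listed in files_in.data should have a row in the model
files_in.data.subject1 = 'data/subject1_session1_stability_maps.mnc.gz';
files_in.data.subject2 = 'data/subject2_session1_stability_maps.mnc.gz';
missing = setdiff(fieldnames(files_in.data), subj_ids);
if ~isempty(missing)
    error('No model entry for: %s', strjoin(missing(:)', ', '));
end
```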

Options

The different options are passed through fields in the structure "opt".

The first option is the name of the folder where the results will be stored. Note that this folder does not need to be created beforehand. Example:
opt.folder_out = 'data/subtype_results/'; % Where to store results

The second option is the scale of the networks specified in files_in.data (e.g. a brain partition of 5 networks is considered to be at scale 5). Example:
opt.scale = 5;

You can optionally regress out confounding variables during the generation of the stack maps, prior to subtyping. N.B. The confounding variables specified in this option must correspond to variables (columns) in the model file. For example:
opt.stack.regress_conf = {'Gender', 'Age'}; % Regress out variables gender and age from stack maps
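Conceptually, regressing out confounds replaces the stack by the residuals of a voxel-wise linear model. The sketch below shows ordinary least squares with an intercept on toy data; whether niak_brick_network_stack.m includes an intercept (and thus also removes the mean) is an assumption of this sketch, as are the variable names.

```matlab
% Toy stack (Subjects x Voxels) and confounds (Subjects x Confounds), hypothetical
stack = [2 1; 3 2; 5 2; 4 6; 7 5];
conf  = [20 0; 35 1; 50 0; 41 1; 62 0];     % e.g. Age, Gender

X = [ones(size(conf, 1), 1), conf];         % design matrix with an intercept
beta = X \ stack;                           % least-squares fit, one column per voxel
stack_res = stack - X * beta;               % residuals: stack with confounds removed
disp(stack_res)
```

By construction, the residuals are orthogonal to every column of the design matrix, so no linear trend with the confounds remains.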

Subtyping options

There are several options that may be specified for the subtyping part of the pipeline. These options must be specified in the structure "opt.subtype".

Number of subtypes to extract. For example:
opt.subtype.nb_subtype = 5; % We will extract 5 subtypes
The type of map used to represent each subtype. For example:
opt.subtype.sbt_map_type = 'median'; % Subtype maps are computed as the median of the maps of the subjects in each subtype
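As a rough illustration of what sbt_map_type controls, the sketch below computes voxel-wise median and mean subtype maps from a toy stack, given a known subtype assignment part. It is not the code of niak_brick_subtyping.m; it only shows that a median map is less sensitive to one outlying subject (the value 9 in the first voxel of subtype 1).

```matlab
stack = [1 4; 2 5; 9 6; 10 1; 11 2; 12 3];   % Subjects x Voxels (toy values)
part  = [1 1 1 2 2 2]';                       % subtype of each subject (hypothetical)
nb_subtype = max(part);

sbt_median = zeros(nb_subtype, size(stack, 2));
sbt_mean   = zeros(nb_subtype, size(stack, 2));
for ss = 1:nb_subtype
    sbt_median(ss, :) = median(stack(part == ss, :), 1);
    sbt_mean(ss, :)   = mean(stack(part == ss, :), 1);
end
disp(sbt_median)   % [2 5; 11 2]
disp(sbt_mean)     % [4 5; 11 2]
```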

Association options

There are also several options that may be specified for the GLM association testing part of the pipeline. These options must be specified in the structure "opt.association".

Scale
qFDR
Type of FDR correction
Contrast
Interaction
Normalization

For visualization of the above GLM results, you can specify the following options:
A flag to turn on/off the generation of plots for the GLM: opt.flag_visu = false;
Specification of the type of data used for the plots: opt.visu.data_type = 'categorical';
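The GLM association test can be pictured as regressing subtype weights on a design matrix and testing a contrast. The sketch below does this for one subtype and one network with an intercept, a variable of interest (age) and a confound (sex); all data are hypothetical, and the actual field names under opt.association are not given here, so plain variables are used. The two-sided p-value uses the regularized incomplete beta function, which avoids any toolbox dependency.

```matlab
w   = [0.10 0.25 0.30 0.42 0.55 0.61]';   % toy subtype weights (one subtype, one network)
age = [22 30 35 41 50 58]';               % variable of interest
sex = [0 1 0 1 0 1]';                     % confound

X = [ones(numel(w), 1), age, sex];        % design matrix with an intercept
c = [0 1 0]';                             % contrast selecting the age effect
beta = X \ w;                             % least-squares estimates
res = w - X * beta;
dof = size(X, 1) - rank(X);               % residual degrees of freedom
sigma2 = sum(res .^ 2) / dof;             % residual variance
t = (c' * beta) / sqrt(sigma2 * (c' * ((X' * X) \ c)));
p = betainc(dof / (dof + t ^ 2), dof / 2, 0.5);   % two-sided p-value of t
fprintf('t = %.2f, p = %.4f\n', t, p);
```

With one such test per network and subtype, the resulting p-values would then be corrected for multiple comparisons (the qFDR and FDR type options above).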

There are also options that may be specified for the calculation of Chi2 statistics.

The following flag turns on/off the generation of a contingency table and subsequent calculation of Cramer's V and Chi-2 statistics:
opt.flag_chi2 = true;
The group column of files_in.model on which the contingency table is built, identified by its name. For example:
opt.chi2.group_col_id = 'Group'; % name of the variable of interest in the csv
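The sketch below shows the standard computation behind these statistics: a groups x subtypes contingency table, Pearson's Chi2 statistic without continuity correction, its p-value and Cramer's V. The group and subtype vectors are hypothetical (subjects could be assigned to the subtype with their highest weight, for instance), and niak_brick_chi_cramer.m may differ in details such as the correction applied.

```matlab
group = [1 1 1 1 2 2 2 2]';   % hypothetical values of the 'Group' column
part  = [1 1 1 2 2 2 2 1]';   % hypothetical subtype of each subject

[~, ~, g] = unique(group);
[~, ~, s] = unique(part);
tab = accumarray([g(:) s(:)], 1);            % contingency table: groups x subtypes

n = sum(tab(:));
expected = sum(tab, 2) * sum(tab, 1) / n;    % counts expected under independence
chi2 = sum(sum((tab - expected) .^ 2 ./ expected));
df = (size(tab, 1) - 1) * (size(tab, 2) - 1);
p = 1 - gammainc(chi2 / 2, df / 2);          % upper tail of the Chi2 distribution
cramer_v = sqrt(chi2 / (n * (min(size(tab)) - 1)));
fprintf('Chi2 = %.2f, p = %.4f, V = %.2f\n', chi2, p, cramer_v);
```

Here the table is [3 1; 1 3], giving Chi2 = 2 and Cramer's V = 0.5.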

Outputs

A number of subfolders and files are created in the "opt.folder_out" directory. In the following, EXT will denote the extension associated with the file type of the functional images, e.g. ".nii" or ".nii.gz" for nifti. An exhaustive description of the outputs follows; most users will not need all of them.

subtype_weights.mat : a .mat file containing a single variable, weight_mat. weight_mat is a 3D Subjects x Subtypes x Networks array that contains the weight of each subject for each subtype and each network.
sbt_weights_net_<number>.csv : a .csv containing the weights of each subject for each subtype, for one network. For N networks, N .csv's will be generated.
fig_sbt_weights_net_<number>.pdf : a .pdf illustrating the weights matrix per network. For N networks, N .pdf's will be generated.
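To work with the weights after the pipeline has run, weight_mat can be sliced per network. The sketch below uses a random stand-in for the loaded array and assigns each subject to the subtype with the highest weight; that assignment rule is one common choice for illustration, not necessarily what the pipeline does internally.

```matlab
% In practice: load(fullfile(opt.folder_out, 'subtype_weights.mat'), 'weight_mat');
weight_mat = rand(10, 5, 7);     % stand-in: 10 subjects x 5 subtypes x 7 networks

net = 3;
w_net = weight_mat(:, :, net);   % Subjects x Subtypes weights for network 3
[~, best_subtype] = max(w_net, [], 2);   % subtype with the highest weight per subject
disp(best_subtype')
```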

Logs

The logs folder keeps track of every execution of the pipeline (see the section on pipeline management). This folder needs to be left intact at all times. It contains all the logs of the pipeline execution, but no results directly relevant to the subtypes.

Subfolders per network

A subfolder will be generated for each network (e.g. if 7 networks were analyzed, there will be 7 subfolders labeled "network_1", "network_2" ... "network_7"). Each subfolder will contain:

By default

network_<number>_stack.mat : a .mat file that contains two variables: (1) provenance, a structure that contains information about the subjects, model, and volume; (2) stack, a Subjects x Voxels array.
network_<number>_similarity_matrix.mat : a .mat file that contains four variables: (1) provenance; (2) hier, a 2D array defining a hierarchy; (3) sim_matrix, a Subjects x Subjects array; (4) subj_order, a vector defining a permutation of the subjects, as defined by "hier" when splitting the subjects backward.
network_<number>_subtype.mat : a .mat file that contains five variables: (1) provenance; (2) hier; (3) opt, a structure describing the options that the user specified; (4) part, a vector where part(i) = j if subject i is assigned to subtype j; (5) sub, a structure containing arrays for different maps.
similarity_matrix.pdf : a .pdf illustrating a Subjects x Subjects correlation matrix
dendrogram.pdf : a .pdf illustrating the clustering of the subjects
grand_mean.nii.gz : a 3D map illustrating the mean connectivity within the network across all subjects
grand_std.nii.gz : a 3D map illustrating the standard deviation of the connectivity within the network across all subjects
mean_subtype.nii.gz : a 4D map illustrating the mean connectivity within the network for each subtype
ttest_subtype.nii.gz : a 4D map of the t-statistics of a t-test comparing each subtype with all other subtypes
eff_subtype.nii.gz : a 4D map illustrating the effect size of the difference between each subtype and all other subtypes
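The variables in these .mat files can be combined directly. The sketch below counts subjects per subtype from part and reorders the similarity matrix with subj_order, which groups subjects in dendrogram order. The values are toy stand-ins for what would be loaded from the files of one network folder.

```matlab
% In practice, e.g.:
%   load(fullfile(opt.folder_out, 'network_1', 'network_1_subtype.mat'), 'part');
%   load(fullfile(opt.folder_out, 'network_1', 'network_1_similarity_matrix.mat'), ...
%        'sim_matrix', 'subj_order');
part = [2 1 2 3 1 2]';                    % part(i) = subtype of subject i (toy values)
sim_matrix = corrcoef(rand(6, 20)');      % toy Subjects x Subjects similarity
subj_order = [2 5 1 3 6 4];               % toy permutation of the subjects

counts = accumarray(part, 1);             % number of subjects per subtype
sim_sorted = sim_matrix(subj_order, subj_order);   % rows/columns in dendrogram order
disp(counts')   % 2 3 1
```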

Optional : The following will only be generated if opt.flag_assoc = true

results_overview.csv : a .csv detailing significant or non-significant results from the association testing.
association_stats.mat : a .mat file containing results from statistics generated from the association testing.

Optional : The following will only be generated if opt.flag_visu = true

fig_association_net_<number>.pdf : a .pdf illustrating the association between subtype weights and variables of interest. For N networks, N .pdf's will be generated.

Optional : The following will only be generated if opt.flag_chi2 = true

group_stats.mat : a .mat file that contains two variables: (1) model, a structure containing information about the subjects; (2) stats, a structure with results from Chi-2 and Cramer's V tests.
chi2_contingency_table.csv : a .csv file that contains a contingency table based on user-specified options
pie_chart_<number>.pdf : a .pdf illustrating the proportions of subjects within subtypes.

Brought to you by the SIMEXP lab
