# Methods

## Data and Preprocessing

In this work we used 66 male subjects aged 31-35 from the S500 group of the Human Connectome Project (HCP), all preprocessed with the HCP minimal preprocessing pipeline [7]. The main advantages of using this public database are: each subject possesses a dense mesh representing their cortical surface, which we use to create seed-points for tractography; all the mesh’s vertices are coregistered across subjects, a property that we use to create the groupwise parcellation; each subject possesses the Desikan atlas [5] parcellation already computed over their cortical mesh; and for each cortical mesh there are z-score maps representing the response to different stimuli obtained with functional MRI (fMRI) [2]. Finally, the S500 group contains z-score maps representing the average functional response to stimuli for 100 unrelated subjects (U100). These functional maps are used to validate our technique’s results.

## Cortical Connectivity Model and Tractography

Our model assumes that the cortex is divided into clusters of homogeneous extrinsic connectivity. That is, nearby neurons in the cortex share approximately the same long-range physical connections; we call this the *local coherence criterion*. Our assumption is based on histological results in the macaque brain [19]. As in clustered data models in statistics [16], we allow intra-cluster and across-subject variability. We formalize this concept as:

$$\mathcal{K} = \bigsqcup_{c} K_c, \qquad \operatorname{conn}(p) = \operatorname{conn}(K_c) \quad \forall p \in K_c, \tag{1}$$

where the set of points on the cortex $\mathcal{K}$ is the disjoint union of the clusters $K_c$ and $\operatorname{conn}(\cdot)$ is the extrinsic connectivity fingerprint of a cluster. We will make the notion of variability explicit in Eq. (3). In this work, the connectivity fingerprint of a seed-point in the brain is a binary vector denoting to which other seed-points it is connected through axonal bundles. That is, the physical connections of a point $p \in K_c$ in the brain are represented by its connectivity fingerprint $\operatorname{conn}(p) = \operatorname{conn}(K_c)$.

Nowadays, the most common tool for estimating the extrinsic connectivity fingerprint of a point in vivo is probabilistic tractography [9]. Given a seed-point in the brain, probabilistic tractography creates a *tractogram*: an image where each voxel is valued with its probability of being connected to the seed through axonal bundles. One way of calculating these probabilities is with a Monte Carlo procedure that simulates the random walk of water particles through the white matter [3]. Each of these paths is known as a streamline. If we think of each streamline as a Bernoulli trial, which yields a value for the connection between our seed and every other point (1 if they are connected by the streamline, 0 if not) [3], then we can model the tractogram of the subject *s* at the seed-point *p* as:

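$$T_{sp} = \left[ P(\tilde{C}_{spi} = 1) \right]_{1 \le i \le n} = \left[ \theta_{spi} \right]_{1 \le i \le n}, \qquad \tilde{C}_{spi} \sim \operatorname{Bernoulli}(\theta_{spi}), \tag{2}$$

where $n$ is the number of voxels and $\tilde{C}_{spi}$ is a Bernoulli random variable^{[1]} representing “the point $p$ of the subject $s$ is connected to the voxel $i$”. Each Bernoulli parameter $\theta_{spi}$ represents the probability of being connected, and is estimated as the proportion of successes in the Bernoulli trials of each seed.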
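As a minimal sketch of the proportion-of-successes estimate described above, the following Python snippet counts, for each voxel, the fraction of streamlines from one seed that reach it. The function name `estimate_tractogram` and the representation of a streamline as a list of visited voxel indices are assumptions made for illustration; they are not taken from any tractography package.

```python
def estimate_tractogram(streamlines, n_voxels):
    """Estimate each connection probability as a proportion of successes.

    `streamlines` holds the streamlines launched from one seed-point; each
    streamline is a list of visited voxel indices (hypothetical format).
    """
    if not streamlines:
        raise ValueError("need at least one streamline")
    counts = [0] * n_voxels
    for streamline in streamlines:
        # One streamline is one Bernoulli trial per voxel, so a voxel is
        # counted at most once per streamline, however often it is visited.
        for i in set(streamline):
            counts[i] += 1
    return [c / len(streamlines) for c in counts]


# Toy example: 4 streamlines from one seed in a 5-voxel image.
toy_streamlines = [[0, 1, 2], [0, 1], [0, 3, 3], [0, 1, 2]]
tractogram = estimate_tractogram(toy_streamlines, n_voxels=5)
print(tractogram)  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

In practice the number of streamlines per seed is far larger than in this toy case, so the proportions are less coarse.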

To formulate the tractogram in accordance with our hypothesis of cortical connectivity, we model a tractogram as a vector of random variables. In our model, each element of a tractogram comes from a random variable that depends on the point’s cluster, along with its intra-cluster and across-subject variability:

$$T_{sp} = \left[ P(\tilde{C}_{spi} = 1 \mid \operatorname{conn}(K_c), e_{ci}, e_{si}) \right]_{1 \le i \le n}, \tag{3}$$

where the point $p$ belongs to the cluster $c$, $e_{ci}$ represents the intra-cluster variability, and $e_{si}$ represents the across-subject variability for the connectivity to voxel $i$ in the cluster $c$.

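Since each $\tilde{C}_{spi}$ follows a Bernoulli distribution [Eq. (2)], it is difficult to find an explicit formulation for $P(\tilde{C}_{spi} = 1 \mid \operatorname{conn}(K_c), e_{ci}, e_{si})$ that accounts for the variabilities. For this, we use the theory of generalized linear models (GLM). In this theory, the data are assumed to follow a linear form after being transformed with an appropriate link function [11]. Using the following abuse of notation:

$$\operatorname{logit}(T_{sp}) \triangleq \left[ \operatorname{logit}\left( P(\tilde{C}_{spi} = 1 \mid \operatorname{conn}(K_c), e_{ci}, e_{si}) \right) \right]_{1 \le i \le n}, \tag{4}$$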

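we derive from GLM a logistic random-effects model [16] for each point $p$:

$$\operatorname{logit}(T_{sp}) = \beta_c + e_c + e_s, \tag{5}$$

where $e_c$ and $e_s$ represent the intra-cluster and across-subject variability respectively. According to GLM theory, $\beta_c \in \mathbb{R}^n$ is the extrinsic connectivity fingerprint of cluster $K_c$ transformed:

$$\beta_c = \operatorname{logit}(\operatorname{conn}(K_c)). \tag{6}$$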
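To illustrate how the logistic random-effects model above generates a tractogram, the sketch below adds the two variability terms to a transformed cluster fingerprint and maps the sum back to probabilities with the inverse logit. Several choices here are assumptions for illustration only: the random effects are drawn from zero-mean Gaussians with made-up standard deviations, the cluster fingerprint is given as probabilities strictly inside (0, 1) so that its logit is finite, and the names `tractogram_from_effects`, `sigma_c` and `sigma_s` are hypothetical.

```python
import math
import random


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tractogram_from_effects(beta_c, e_c, e_s):
    """Elementwise expit(beta_c + e_c + e_s): one tractogram under the model."""
    return [expit(b + ec + es) for b, ec, es in zip(beta_c, e_c, e_s)]


# Hypothetical cluster fingerprint, kept strictly inside (0, 1) so that the
# logit is finite (an assumption of this sketch).
conn_c = [0.9, 0.5, 0.1]
beta_c = [logit(q) for q in conn_c]

# Gaussian random effects with made-up standard deviations (assumption).
rng = random.Random(0)
sigma_c, sigma_s = 0.3, 0.5
e_s = [rng.gauss(0.0, sigma_s) for _ in beta_c]  # across-subject term
e_c = [rng.gauss(0.0, sigma_c) for _ in beta_c]  # intra-cluster term

t_sp = tractogram_from_effects(beta_c, e_c, e_s)
no_noise = tractogram_from_effects(beta_c, [0.0] * 3, [0.0] * 3)
print(t_sp)      # a perturbed version of conn_c, still inside (0, 1)
print(no_noise)  # recovers conn_c up to floating-point error
```

With both variability terms set to zero, the simulated tractogram reduces to the cluster fingerprint, which is the sense in which the fingerprint acts as the fixed effect of the model.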

The choice of logit as link function is based on the work of Pohl et al. [17]. There, they show that the logit function’s codomain is a Euclidean space, which allows us to transform and manipulate the tractograms in a well-known space.
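The practical consequence of working in a Euclidean space can be sketched as follows: once tractograms are logit-transformed, ordinary operations such as averaging and Euclidean distance apply directly. The clipping constant `eps`, which keeps probabilities of exactly 0 or 1 from producing infinite values, is an assumption of this sketch, as are the function names.

```python
import math


def logit_tractogram(tractogram, eps=1e-6):
    """Map a tractogram to R^n; clipping by eps is an assumption here."""
    out = []
    for p in tractogram:
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) and division by zero
        out.append(math.log(p / (1.0 - p)))
    return out


def mean_vector(vectors):
    """Elementwise mean of equally long real-valued vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def euclidean_distance(u, v):
    """Standard Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two toy tractograms with mirrored connection probabilities.
z1 = logit_tractogram([0.5, 0.9, 1.0])
z2 = logit_tractogram([0.5, 0.1, 0.0])

mean_logit = mean_vector([z1, z2])
print(mean_logit)                  # close to the zero vector
print(euclidean_distance(z1, z2))  # a finite, positive distance
```

A mean of zero in logit space corresponds to a probability of 0.5 for every voxel, which is the expected midpoint of the two mirrored tractograms.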

- [1] For the sake of clarity we denote all random variables with a tilde, e.g. $\tilde{C}$.