# Individual faces elicit distinct response patterns in human anterior temporal cortex


Kriegeskorte |

## Supporting Information

#### Files in this Data Supplement:

SI Figure 4

SI Figure 5

SI Figure 6

SI Figure 7

SI Text

SI Figure 8

SI Figure 9

SI Figure 10

SI Figure 11

SI Figure 4

**Fig. 4.** Subject error rates on anomaly-detection task during fMRI. The anomaly-detection task (Fig. 1, *Methods*) was designed to require subjects to attend to every presentation, despite the fact that 88% of all trials consisted of repetitions of the four standard images. Subjects correctly detected approximately two-thirds (66%) of the anomalous versions of the images presented. Analysis of responses across time (data not shown) indicated that all subjects attentively viewed the stimuli throughout both runs. Bars show group-average percentages of trials with error bars indicating the standard error of the mean (computed from the standard deviation of the single-subject means).

SI Figure 5

**Fig. 5.** Detailed behavioral results for anomaly-detection task during fMRI (stimulus-specific reaction times and error rates). (*a*) Anomaly-detection error analysis as in SI Fig. 4, but performed separately for each of the four standard images (right column) and its anomalous variants (left column). Subjects performed similarly on each of the four standard images. (Only standard-image trials were used for the fMRI analyses.) However, they missed more anomalies for the houses than for the faces. (*b*) Reaction-time analysis for each of the four standard images (right column) and its anomalous variants (left column). Both error rates and reaction times indicate that the anomaly-detection task was slightly more challenging to subjects for the houses than for the faces. Bars show group-average percentages of trials (*a*) and reaction times (*b*) with error bars in both indicating the standard error of the mean (computed from the standard deviation of the single-subject means).

SI Figure 6

**Fig. 6.** Definition of regions of interest in a single subject. The three rows (*a-c*) show 13 axial brain slices as acquired (echoplanar fMRI slices averaged across time) from inferior to superior (top and bottom slices are omitted because they contain incomplete data after head-motion correction). (*a*) The univariate mapping for the contrast faces-houses reveals the fusiform face area (FFA) as indicated. The statistical map (see color bar in *c*) was thresholded to control the false-discovery rate, *q* <0.05. For this activation analysis only, data were spatially smoothed by convolution with a Gaussian kernel of 6-mm full-width at half-maximum. (All pattern-information analyses were performed on unsmoothed data.) (*b*) A manually drawn cortex mask (transparent yellow) marks all cortical voxels in our imaging volume in each subject. (*c*) For each subject and hemisphere, the "FFA vicinity" is defined as 4,000 cortical voxels [voxel size: (2 mm)^{3}] within a sphere centered on (and including) FFA (ROI in light and dark magenta). Note that voxels within the sphere, but outside the cortex or imaging volume are not included and not counted. Analogously aIT is defined as the 4,000 most-anterior voxels in temporal cortex (ROI in light and dark red). Again only voxels within the cortex mask are included. (*d*) The 4,000-voxel ROIs from *c* for aIT (red, left) and the FFA vicinity (magenta, right). (*e*) These ROIs were analyzed for face-exemplar information (see Fig. 2) by using all 4,000 voxels and progressively reduced sets (1,400-voxel subsets shown) selected by thresholding the face-exemplar information map. The 1,400-voxel subsets are shown in light red and light magenta in *c*.

SI Figure 7

**Fig. 7.** Activation-based group mapping in Talairach space. (*a*) The occipito-temporal measurement slab (blue) and the Talairach-space slices shown in the other images (red) superimposed on a sagittal high-resolution anatomical MR image (MNI brain). Measured slices and Talairach slices are 2-mm thick with no gap. The anatomical locations of Talairach slices 10 and 21 (which contain FFA and aIT, respectively) are shown in the high-resolution anatomy (*a*, *Center* and *Right*). To indicate where in Talairach space our measurements provided data for all subjects, we use a Talairach-space group average of the functional data as the background for the statistical maps in *b-d*. (*b*) Activation-based statistical map for the contrast faces versus houses. Right and left FFA appear in orange-yellow (face activation greater than house activation) in slice 10 and adjacent slices. The parahippocampal place area (PPA) appears bilaterally as well, medial to FFA (blue-green). (*c*) Activation-based statistical map for the contrast face 1 versus face 2. The two faces do not elicit different levels of overall activation in any region within our occipito-temporal imaging slab. This is plausible because the face images are physically very similar and because activation-based mapping as shown here involves smoothing out of fine-grained pattern information. (*d*) Activation-based statistical map for the contrast house 1 versus house 2. One of the house images elicits somewhat greater activity, particularly in early visual cortices. This is unsurprising because the two house images are very different physically, although they share the same category (house). (*b-d*) All maps in this figure show univariate fixed-effects group analyses performed on data smoothed with a Gaussian kernel of 6-mm full-width at half-maximum. All maps are thresholded to control the false-discovery rate, *q* < 0.05. The right hemisphere is on the right side of each slice.

SI Figure 8

**Fig. 8.** Information-based response-pattern analysis. (*a*) For a region of interest (ROI, green voxel cluster) predefined by statistical mapping (1), a linear-model fit (2) provides an estimate of the response amplitude during each condition for each voxel of the region. For each condition, the pattern of responses across the voxels of the ROI can be thought of either as a point in the multidimensional space spanned by the voxel activities (3, red and blue central dots) or as an event-related spatial response pattern on the cortex (4). The analysis is applied to each pair of conditions in turn and is illustrated only for the face pair. For visualization of the ROI's response patterns, we compute an approximation to a regional cortical flatmap by a neighborhood-preserving self-organizing projection of the 3D voxel locations onto the unit square. The response patterns shown (4) are those found in the anterior inferotemporal face-exemplar region of subject K. The residuals of the linear-model fit provide a multinormal model of the variability of the response-pattern estimates (3, red and blue iso-probability-density contours). Under multinormality and homoscedasticity, the optimal decision boundary for classification of response patterns is a hyperplane and the optimal discriminant dimension is the Fisher linear discriminant (5, dashed arrow), which is the dimension orthogonal to the optimal decision boundary (solid line separating red and blue distributions). All computations described thus far are performed on data from run A of each subject. We use independent data (run B) to estimate the pair-wise condition information (our effect measure, see *Methods* and *SI* *Text*, *Single-trial pair-wise condition information*) and to perform statistical inference (right side). The independent run-B data set is projected onto the Fisher discriminant and analyzed univariately. 
This projection amounts to a weighted sum across the voxels, where each voxel can have a positive or a negative weight depending on the sign of the difference in response between the two conditions. If multinormality holds and the two data sets are consistent, no information is lost by this projection. If multinormality does not hold, the analysis becomes conservative, i.e., the information estimate will be lowered and the sensitivity of statistical inference will suffer. Note that the specificity of the test, i.e., its validity, depends only on univariate normality after projection onto the Fisher discriminant. The results of analysis for all pairs of conditions are summarized in a pairwise-effects icon (see step 8 and SI Fig. 10). Group analysis (SI Fig. 10) is performed by averaging information effects and combining the t values (representing the individual response-pattern differences) across subjects. (*b*) To find regions whose response pattern distinguishes two conditions, we scan the measured volume with a 3-mm-radius spherical multivariate searchlight (1, red voxel cluster). Note that this aspect of the analysis is purely descriptive. Inference is later performed on independent data. The searchlight is centered on each voxel in turn (selecting overlapping voxel sets at adjacent positions). For each voxel position, the time courses of all voxels falling within the searchlight are subjected to joint multivariate analysis (2, 3). As a measure of response-pattern difference, we use the Mahalanobis distance. The Mahalanobis distance representing the response-pattern difference within the searchlight is recorded in a map at the central voxel position (4). The whole volume is scanned in this manner (5). Note that the resulting map represents local response-pattern information, not activation. The map shown is that of subject *K*. The white arrow marks the map maximum. The map is thresholded (6) to define the region distinguishing the two conditions. 
Optionally, the highlighted voxel cluster can be expanded by the searchlight radius (7), to obtain an ROI that includes all voxels that contributed to the local multivariate effects indicated by the superthreshold voxels. The location and shape of the region thus defined represents a subject-specific hypothesis, which is subsequently tested on independent data as described in *a*.

SI Figure 9

**Fig. 9.** Information-based group mapping in Talairach space. (*a*) Same as SI Fig. 7*a*, repeated for convenient reference. (*b*) Information-based group map showing regions whose local activity pattern distinguishes the two faces. This is the full information-based map shown selectively to display the aIT location in Fig. 3. Note that the only cluster is in right aIT (slices 21 and 22) and that there are a few isolated voxels in other slices as well. The highlighted aIT volume is 8 voxels = 64 mm^{3}; the volume contributing information is 56 voxels = 448 mm^{3} (highlighted volume expanded by searchlight radius of 3 mm). Because of the effects of group-averaging and thresholding, these volumes should not be considered as estimates of the extent of the distributed code. The peak voxel had *P* < 0.0001. (*c*) Information-based group map showing regions whose local activity pattern distinguishes the two houses. Here, the information-based mapping reflects fine-grained pattern effects along with the activation-effects also seen in SI Fig. 7*d* for the house-exemplar contrast. (*b-c*) Information-based searchlight mapping is illustrated in SI Fig. 8*b*. Here, we used a randomization scheme for statistical inference (see *SI Text*: *Information-based group mapping in Talairach space*). Group maps were thresholded to highlight voxels with *P* < 0.001, uncorrected. All information-based analyses were performed on unsmoothed data. The right hemisphere is on the right side of each slice.

SI Figure 10

**Fig. 10.** Response-pattern effects in key regions. For each region of interest, a pair-wise-effects icon shows the multivariate effects for each pair of images. The color of each connection line indicates whether the response-pattern difference was significant for the group (red, *P* < 0.01; pink, 0.01 ≤ *P* < 0.05; dotted gray, *P* ≥ 0.05, not significant). The thickness of each line reflects the multivariate effect size in terms of the pairwise condition information (see *SI Text*), which is also given explicitly in single-trial bits (numbers on lines). A pairwise condition information of 1 bit would indicate that the response pattern estimated from a single trial allows us to determine with perfect certainty which of the two images has been presented. To focus the analysis on genuine combinatorial effects, the spatial-mean effect has been removed from the data before the multivariate analysis (significance testing and information estimation) by subtracting the spatial-mean time course of the region from each single-voxel time course. The analyses are fixed-effects group analyses (based on all 11 subjects) for regions of interest defined individually in each subject on the basis of mapping analyses (FFA, PPA, aIT face-exemplar region) or anatomical location (early visual cortex). Independent data were used for (1) defining the ROIs and (2) testing response-pattern effects and estimating pairwise condition information. Note the transformation of response-pattern similarity across regions: In retinotopic visual areas, all image pairs elicit distinct response patterns, except the two faces. This may reflect the greater physical similarity of the two face images. In FFA and PPA, the category distinction (faces versus houses) is emphasized, whereas within-category differences appear to be deemphasized. The aIT face-exemplar region distinguishes the face images, but there are no significant effects for any other pair of images.
This is consistent with the weaker overall response to houses in aIT (SI Fig. 7) and suggests that the house response patterns tend to lie in between the two distinct face response patterns in multivariate space, rendering them statistically indistinguishable from each of the face response patterns and from each other.

SI Figure 11

**Fig. 11.** Anterior inferotemporal face-exemplar region (subject TS). (*a*) Event-related spatial response patterns elicited by the four images in the anterior temporal face-exemplar region of subject TS. The face-exemplar effect, but none of the other pairwise effects, is significant in this subject. The two horizontal dimensions of each surface plot represent an approximate local cortical flatmap obtained by a neighborhood-preserving projection of the voxels onto the unit square. The vertical axes represent single-image beta estimates. Each voxel is represented by a little black circle and the pattern is interpolated to yield a smooth surface. For an explanation of the central pairwise-effects icon, see legend of SI Fig. 10. The region was defined based on data set A using information-based brain mapping (see *Methods*). Data set B was used (*i*) to compute the event-related spatial response patterns, (*ii*) to estimate single-trial pairwise condition information (numbers on central pairwise-effects icon; 0 for effects of inconsistent direction between data set A and data set B), and (*iii*) to test the effect (red connection indicates *P* < 0.01 in data-set-B *t* test on a linear discriminant defined by data set A). The spatial-mean effects have been removed by subtracting the spatial-mean time course of the region from each single-voxel time course before the analysis. (Omitting this step yields the same pattern of multivariate effects with negligible changes to the effect sizes.) (*b*) The anatomical location of the face-exemplar region in subject TS. The anatomical background slices were obtained by averaging the functional volumes across time. Slices progress from inferior to superior (left to right).

**SI Text**

**Results of Control Analyses. Information in early visual cortex.** To investigate information in early visual cortex (EVC), we anatomically defined an ROI around the calcarine sulcus in each subject individually. Comparing the EVC response patterns for each pair of images (SI Fig. 10), we found significant response-pattern differences (multivariate fixed-effects group analysis, *P* < 0.05) for all pairs of images, except the two faces (*P* > 0.05). This reflects the physical similarity of the images (for example, spatial correlation of the face images is substantial, whereas all other pairs of images are essentially uncorrelated). Although the faces must have elicited subtly different response patterns in EVC, their retinotopic representations are too similar to be distinguished from our fMRI data. This suggests that our matching of view, lighting, and intensity histogram was successful at reducing low-level confounds to a negligible level.

**Activation effects in FFA and aIT.** In addition to analyzing the information in the FFA and aIT response patterns (using unsmoothed single-subject data), we asked, more conventionally, what overall activation the images elicited. First, we performed an activation-based mapping (using data smoothed with a Gaussian kernel of 6-mm full-width at half-maximum) for the contrast faces-houses. This revealed face-category activation (*i*) in bilateral FFA (by definition), (*ii*) more posteriorly in bilateral regions including the lateral occipital complex and the occipital face area, and also (*iii*) more anteriorly in bilateral aIT (SI Fig. 7*b*, Talairach-space group maps). The face-category activation effects in aIT were weaker than in FFA and more posterior regions; and they were not detected in every subject (SI Fig. 6*a*, single-subject map).

Second, we performed an analysis of ROI-average activation. Independent data sets were used to (*i*) define the ROIs and (*ii*) analyze their activation effects. Our right-aIT face-exemplar region (defined by mapping for face-exemplar information) did not respond significantly more strongly to the faces than to the houses (*P* > 0.05), or vice versa (*P* > 0.05).

Note that absence of face-category activation in the right aIT face-exemplar region is not in contradiction to the distinctness of the two face response patterns: positive and negative single-voxel responses to a given face can yield an average across voxels that is close to the baseline, while the spatial response patterns are distinct.

FFA, as expected, did respond much more strongly to each of the faces than to each of the houses (*P* < 0.01). In both FFA and aIT, the two faces did not elicit significantly different overall activation (*P* > 0.05); the two houses did not elicit significantly different activation, either (*P* > 0.05).

**Adaptation effects caused by stimulus repetition.** The design of this study is unconventional in that each stimulus image forms a separate condition. To be able to obtain stable estimates of the single-image response patterns, we present the same four images (Fig. 1*a*) many times in a pseudorandom sequence. A potential concern with such a design is that the repetitions could lead to reduced responses as a result of local neuronal adaptation or a more complex process of repetition suppression, which could include a gradual loss of attention directed at the stimuli. Although a response reduction cannot explain a positive finding (e.g., face-exemplar information in right aIT), it might explain a negative finding (e.g., the absence of a significant face-exemplar effect in FFA).

To assess whether the repeated presentation of the four images caused an overall decrease of the responses elicited, we divided each of the two fMRI runs performed with each subject into four equal temporal segments and analyzed the activation (ROI-average) elicited by each image in early visual cortex, FFA, and PPA. Results (data not shown) suggest that adaptation effects were small if they were present at all. A single-stimulus-per-condition design with only four stimuli can, thus, elicit stable responses throughout an event-related fMRI experiment. We think that our experimental task contributed to the stability of the responses across time. The anomaly-detection task (Fig. 1*b*) served to motivate subjects to attentively view each presentation and allowed us to monitor attentive viewing. Performance indicated that subjects viewed attentively throughout the experiment (SI Fig. 4).

**Details on Experimental Procedures. Stimuli and task.** The basic set of stimuli consisted of four photographs, depicting a woman's face, a man's face, a traditional house and a modern building (Fig. 1*a*). The images were in 8-bit grayscale and had a resolution of 512 × 512 pixels. Each image was processed to have a precisely uniform histogram. The images, thus, had identical light and spatial-signal energy.

Before the experiment, subjects were familiarized with the four images. They were instructed to continually fixate a central cross, which was always visible, and to perform an anomaly-detection task during the experiment (Fig. 1*b*). On 12% of the trials of each experimental run, subtle variations of the four images were presented. In each anomalous version, the global shape of the object as well as several details were slightly distorted. The particular changes were unpredictable to the subjects because several anomalous versions were used for each original. Subjects were asked to press a button placed underneath their right index finger on a regular trial and a button underneath their left index finger when they detected an anomalous image. The task served to motivate subjects to attend to each image presentation even after many repetitions and allowed us to monitor attentive viewing. Behavioral performance (SI Figs. 4 and 5) indicated that all subjects attentively viewed the stimuli throughout both runs.

**Experimental design.** We used a rapid event-related design with a basic trial duration of 3 s (minimal stimulus-onset asynchrony) corresponding to two functional volumes of TR = 1,500 ms. The stimulus sequence was optimized for estimation of the contrasts between the responses to the four original images by a method based on a genetic algorithm (1). Each image was presented for 400 ms. In each run, there were 63 presentations of each of the four original images, 33 presentations of anomalous versions of the images (see *Stimuli and task*, above), and 9 null trials, on which the image presentation was omitted and the fixation cross remained visible. The total number of 3-s time slots was, thus, 4 × 63 + 33 + 9 = 294, and the duration of the run including two empty time slots at the end was (294 + 2) × 3 s = 14.8 min.
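The trial counts and run duration given above can be checked with a few lines of arithmetic. The sketch below only restates numbers from the text. The volume counts per run and per subrun are derived here and are not stated in the source, and all variable names are illustrative.

```python
# Trial counts and timing as stated in the text.
n_images = 4
n_standard_per_image = 63
n_anomalous = 33
n_null = 9
slot_s = 3.0          # duration of one time slot in seconds
tr_s = 1.5            # repetition time in seconds
n_empty_slots = 2     # empty slots appended at the end of a run

n_standard = n_images * n_standard_per_image
n_slots = n_standard + n_anomalous + n_null
run_s = (n_slots + n_empty_slots) * slot_s
run_min = run_s / 60.0

# Derived quantities (not stated in the source).
volumes_per_run = int(run_s / tr_s)
volumes_per_subrun = volumes_per_run // 2

# Share of anomalous images among stimulus trials (null trials excluded).
pct_anomalous = 100.0 * n_anomalous / (n_standard + n_anomalous)

print(n_slots, run_min, volumes_per_run, volumes_per_subrun, round(pct_anomalous))
```

Excluding the null trials, anomalous images make up about 12% of stimulus presentations, which matches the 12% and 88% figures quoted for the task.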

**Subjects.** Eleven subjects between 18 and 30 years of age participated in the experiments (average age: 24.5 years). All had normal or corrected-to-normal vision. Five of them were female, six male. Ten of them were right-handed; one was left-handed. After receiving information about magnetic resonance imaging they gave their informed consent by signing a form. The experimental techniques used in this study and the consent form were approved by the ethical committee CWOM of the Academisch Ziekenhuis (university hospital) associated with the Katholieke Universiteit Nijmegen (The Netherlands).

**Magnetic resonance imaging.** We measured 15 transversal functional slices with a Siemens Magnetom Trio scanner (3 Tesla) using a single-shot gradient-echo echo-planar-imaging (EPI) sequence and a standard birdcage headcoil. The imaged volume consisted of a 3-cm-thick temporal-occipital slab including early visual regions as well as the entire ventral visual stream. The pulse-sequence parameters were as follows: in-plane resolution: 2 × 2 mm^{2}, slice thickness: 2 mm, gap: 0 mm, slice acquisition order: interleaved, field of view (FoV): 256 × 256 mm^{2}, acquisition matrix: 128 × 128, time to repeat (TR): 1,500 ms, time to echo (TE): 32 ms, flip angle (FA): 75°. A functional run lasted 14.8 min. Each subject underwent a single imaging session including two functional runs and a high-resolution T1-weighted anatomical MPRAGE scan lasting 9.8 min (192 slices, slice thickness: 1 mm, TR: 2,300 ms, TE: 3.93 ms, FA: 8°, FoV: 256 × 256 mm^{2}, matrix: 256 × 256). The experiments were performed at the Donders Centre for Cognitive Neuroimaging (Nijmegen, The Netherlands).

**Details on Statistical Analysis. Preprocessing.** The fMRI data sets were subjected to slice-scan-time adjustment and head-motion correction by using the BrainVoyager 2000 software package (version 4.8). (*i*) Slice-scan-time correction was performed by resampling the time courses with linear interpolation such that all voxels in a given volume represent the signal at the same point in time. (*ii*) Small head movements were automatically detected and corrected by using the anatomical contrast present in functional MR images. The Levenberg-Marquardt algorithm was used to determine translation and rotation parameters (six parameters) that minimize the sum of squares of the voxel-wise intensity differences between each volume and the first volume of the run. Each volume was then resampled in 3D space according to the optimal parameters by using trilinear interpolation.

**Design matrix and multiple linear regression.** Single-subject analyses were performed by multiple linear regression of the response time course at each voxel. For each of the four original images, there was one predictor for the regular version and one predictor for the anomalous versions presented. The predictor time courses were computed by using a linear model of the hemodynamic response (2) and assuming an immediate rectangular neural response during each condition of visual stimulation. For each 7.4-min subrun (see *Data splitting*, below), the design matrix included these cognitive predictors along with six head-motion-parameter time courses, a linear trend term, a six-predictor Fourier-basis for nonlinear trends (sines and cosines up to 3 cycles per subrun) and a confound-mean predictor. This design matrix was used for all univariate and multivariate analyses. Multiple linear regression was performed with custom software developed in Matlab.
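As an illustration of the trend and confound part of such a design matrix, the following NumPy sketch builds a linear trend, a six-column Fourier basis (sines and cosines at 1 to 3 cycles per subrun), and a constant column. It is not the authors' Matlab code. The function name `nuisance_predictors`, the scaling of the trend, and the subrun length of 296 volumes (7.4 min at TR = 1.5 s) are assumptions made for this example, and the stimulus and head-motion predictors are omitted.

```python
import numpy as np

def nuisance_predictors(n_vols, max_cycles=3):
    """Trend and confound columns for one subrun (hypothetical helper).

    Returns an (n_vols, 2 + 2 * max_cycles) array with columns:
    linear trend, sin/cos pairs for 1..max_cycles cycles, constant.
    """
    t = np.arange(n_vols) / n_vols             # normalized time in [0, 1)
    cols = [np.linspace(-1.0, 1.0, n_vols)]    # linear trend, zero mean
    for k in range(1, max_cycles + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    cols.append(np.ones(n_vols))               # confound-mean predictor
    return np.column_stack(cols)

# 7.4 min * 60 s / 1.5 s per volume = 296 volumes (derived, see lead-in).
X_nuis = nuisance_predictors(296)
print(X_nuis.shape)
```

In a full model these eight columns would sit next to the eight stimulus predictors (regular and anomalous versions of four images) and the six head-motion time courses described in the text.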

**Data splitting.** Each 14.8-min run was split into two 7.4-min subruns (first half, second half), yielding four subruns per subject. Odd subruns (when numbered chronologically as acquired) constituted data set A (used for mapping, ROI definition, and discriminant fitting). Even subruns constituted data set B (used for significance testing and information estimation). This splitting strategy is preferable to using each run as a separate data set here for three reasons: (*i*) The same stimulus sequence was used for run 1 and run 2. Set A and set B data correspond to independent stimulus subsequences. (*ii*) Each set contains an earlier and a later portion of the experimental session. (*iii*) Trend artefacts that have a similar time course across each run cannot introduce artefactual dependence between data set A and data set B.

**Definition of ROIs.** ROIs were defined by thresholding statistical maps computed from data set A of each individual subject (SI Fig. 6). To avoid a dependence of our results on the threshold used, thresholds were varied in small steps to highlight between 10 and 4,000 voxels. All regions were restricted to a cortex mask manually defined in each subject (transparent yellow in SI Fig. 6*b*).

*Fusiform face area*. The FFA was defined in each subject and hemisphere by thresholding the *t* map for the contrast "faces minus houses." The *t* map was computed from data set A only, after spatial smoothing by convolution with a Gaussian kernel of 6-mm full-width at half-maximum. The region was defined as contiguous and seeded at the maximum of the face-house contrast map within the fusiform gyrus. (For small numbers of voxels, this definition matches what is called FFA in the literature. For large numbers of voxels, the region can extend far into anterior IT and posterior cortex. Nevertheless, it does not show evidence of face-exemplar information.)

*Anterior IT face-exemplar region.* The aIT face-exemplar region was defined in each subject and hemisphere by thresholding the face-exemplar information map obtained by using a 3-mm searchlight on data set A only (SI Fig. 8*b*). For a given number of voxels *n*, the region was defined as the discontiguous set of voxels (within the anterior 4,000 cortex-mask voxels in the hemisphere in question) with the highest entries in the data-set-A face-exemplar information map (computed by searchlight mapping on unsmoothed data).
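The searchlight geometry and the distance measure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a voxel is taken to belong to the searchlight when its center lies within 3 mm of the central voxel's center, and the covariance matrix passed to `mahalanobis_distance` is whatever error-covariance estimate is available for the searchlight voxels. Both function names are hypothetical.

```python
import itertools
import numpy as np

def searchlight_offsets(radius_mm=3.0, voxel_mm=2.0):
    """Integer voxel offsets whose centers lie within radius_mm of the center."""
    reach = int(np.floor(radius_mm / voxel_mm))
    steps = range(-reach, reach + 1)
    return np.array([o for o in itertools.product(steps, steps, steps)
                     if voxel_mm * np.linalg.norm(o) <= radius_mm])

def mahalanobis_distance(p1, p2, cov):
    """Mahalanobis distance between two response patterns given an error covariance."""
    diff = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

offsets = searchlight_offsets()
d_demo = mahalanobis_distance([3.0, 0.0], [0.0, 4.0], np.eye(2))
print(len(offsets), d_demo)
```

Under the center-within-radius assumption, a 3-mm sphere on a 2-mm grid covers the central voxel plus its face and edge neighbors. With an identity covariance the Mahalanobis distance reduces to the Euclidean distance, which is what the small demo checks.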

*FFA-vicinity face-exemplar region.* To test for face-exemplar information in FFA and its vicinity in exactly the same way as in aIT, we defined the "FFA vicinity" as 4,000 cortex-mask voxels within a sphere around FFA in each subject and hemisphere. (First the center of FFA was defined as the peak of the face-house contrast map in the fusiform gyrus. Then a sphere was grown around this center. Voxels within both the sphere and the cortex mask were included in the ROI. The sphere was expanded until the ROI included exactly 4,000 voxels.) For a given number of voxels *n*, the region was then defined exactly the same way as the aIT face-exemplar region: as the discontiguous set of *n* voxels (within the 4,000-voxel FFA vicinity) that had the highest entries in the data-set-A face-exemplar information map.
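Growing a sphere until it contains a fixed number of cortex-mask voxels is equivalent to taking the voxels nearest to the center, which the sketch below uses. The helper names `vicinity_roi` and `top_n_voxels`, the array layout, and the tie-breaking by original voxel order are assumptions for this example. The demo uses a toy 3 × 3 × 3 grid rather than real data.

```python
import numpy as np

def vicinity_roi(mask_coords, center, n_roi=4000):
    """Indices of the n_roi cortex-mask voxels closest to center.

    mask_coords: (n_voxels, 3) coordinates of voxels inside the cortex mask.
    """
    dist = np.linalg.norm(mask_coords - center, axis=1)
    return np.argsort(dist, kind="stable")[:n_roi]

def top_n_voxels(info_map, roi_idx, n):
    """The n voxels within roi_idx with the highest information-map entries."""
    order = np.argsort(info_map[roi_idx], kind="stable")[::-1]
    return roi_idx[order[:n]]

# Toy example: 27 voxels on a 3 x 3 x 3 grid, made-up information values.
coords = np.array([(i, j, k) for i in range(3) for j in range(3) for k in range(3)],
                  dtype=float)
info = np.arange(27, dtype=float)
roi = vicinity_roi(coords, center=np.array([1.0, 1.0, 1.0]), n_roi=7)
best = top_n_voxels(info, roi, n=2)
print(sorted(roi.tolist()), best.tolist())
```

The selected set need not be contiguous, in line with the "discontiguous set of *n* voxels" described in the text.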

**Significance testing of ROI response-pattern differences. General approach and motivation.** To test in a single subject whether two stimuli elicit distinct response patterns in a given ROI, we first use data set A to formulate a subject-specific hypothesis as to the multivariate dimension and the direction of the effect. We then test this hypothesis by a single-sided *t* test performed on data set B. Forming a subject-specific hypothesis obviates the need for tests at multiple locations (multiple-comparisons problem) and allows us to apply a standard univariate *t* test, which requires fewer assumptions and affords more power than multivariate tests. As the statistical cost of these advantages, only half the data are available for the test.

*Discriminant estimation from data set A.* For a given contrast (e.g., face 1 vs. face 2) and ROI, we estimate the Fisher linear discriminant based on data set A using the linear model described above (see *Design matrix and multiple linear regression*). The Fisher discriminant is a set of weights (one for each voxel in the ROI) defining the dimension (in the multivariate space spanned by voxel activities) that best separates two multinormal distributions of equal covariance (i.e., the dimension on which the ratio of between-class and within-class variance is maximal). The distributions here are distributions of spatial response patterns and each distribution corresponds to an experimental condition. The discriminant is defined by $\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\mathbf{p}_1 - \mathbf{p}_2)$, where $\mathbf{p}_1$ and $\mathbf{p}_2$ are the two spatial response patterns, and $\boldsymbol{\Sigma}$ is the error covariance matrix. We assume a diagonal covariance matrix for stability and to be able to test voxel sets larger than the number of time points (up to 4,000 voxels here). The resulting linear discriminant would be the optimally sensitive discriminant if the data were Gaussian with no dependence of errors between voxels. (The validity of the test, i.e., its specificity, is not affected by these assumptions.)
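With a diagonal error covariance, the standard Fisher form w = Σ⁻¹(p₁ − p₂) reduces to dividing the pattern difference by each voxel's error variance. The sketch below shows this reduction on a two-voxel toy example. The function name `fisher_discriminant_diag` and the symbols are illustrative assumptions, and the error variances are taken as given rather than estimated from model residuals.

```python
import numpy as np

def fisher_discriminant_diag(p1, p2, error_var):
    """Fisher discriminant weights under a diagonal error covariance.

    p1, p2:    response-pattern estimates, one value per voxel
    error_var: error variance of each voxel (the diagonal of the covariance)
    """
    p1, p2, error_var = (np.asarray(a, dtype=float) for a in (p1, p2, error_var))
    return (p1 - p2) / error_var

# Toy example with two voxels.
w = fisher_discriminant_diag([2.0, 1.0], [0.0, 1.0], [1.0, 4.0])

# Same result from the general formula with an explicit diagonal matrix.
w_full = np.linalg.solve(np.diag([1.0, 4.0]), np.array([2.0, 0.0]))
print(w, w_full)
```

Because no matrix inversion is needed, this form stays well defined when the number of voxels exceeds the number of time points, which is the stability argument made in the text.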

*Test on data set B.* We project the ROI time courses from data set B onto the data-set-A discriminant. This projection amounts to a weighted sum of the data-set-B time courses, yielding a single time course (discriminant time course) for a given ROI. We then perform a one-sided *t* test on the discriminant time course, with the direction of the test requiring consistency between data sets A and B. The *t* test assesses the same contrast that defines the discriminant and uses the same linear model (see above).
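The projection and contrast test can be illustrated with ordinary least squares on simulated data. This is a sketch under simplifying assumptions: the design has two stimulus indicators and a constant only, the errors are white (no prewhitening), the weights `w` are set by hand instead of being estimated from a separate data set, and the function name `discriminant_t` is hypothetical. The one-sided *P* value would follow from the upper tail of the *t* distribution with the returned degrees of freedom.

```python
import numpy as np

def discriminant_t(Y_b, w, X, c):
    """t value for contrast c on the discriminant time course.

    Y_b: (n_vols, n_voxels) ROI time courses of the test data set
    w:   (n_voxels,) discriminant weights from the training data set
    X:   (n_vols, n_predictors) design matrix
    c:   (n_predictors,) contrast vector
    """
    d = Y_b @ w                                   # discriminant time course
    beta, _, rank, _ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ beta
    dof = X.shape[0] - rank
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    return float(c @ beta / se), int(dof)

# Simulated example: 120 volumes, 20 voxels, two conditions plus a constant.
rng = np.random.default_rng(0)
n_vols, n_vox = 120, 20
X = np.zeros((n_vols, 3))
X[0::6, 0] = 1.0
X[3::6, 1] = 1.0
X[:, 2] = 1.0
p1, p2 = np.ones(n_vox), -np.ones(n_vox)
Y_b = np.outer(X[:, 0], p1) + np.outer(X[:, 1], p2) + rng.normal(size=(n_vols, n_vox))
w = p1 - p2                                       # stands in for the set-A discriminant
t_value, dof = discriminant_t(Y_b, w, X, np.array([1.0, -1.0, 0.0]))
print(dof, t_value > 0)
```

A positive *t* value means the effect in the test data has the direction predicted by the discriminant, which is the consistency the one-sided test requires.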

*Group analysis.* We perform a fixed-effects group analysis as defined in ref. 3 by concatenating the discriminant time courses of all subjects and fitting a composite design matrix with separate predictors for each subject.

*Temporal autocorrelation.* To ensure valid statistical inference in the presence of temporal autocorrelation of the errors, we apply the Cochrane-Orcutt prewhitening method (4) to the discriminant time courses and design matrix using an AR(1) model as described in ref. 5.
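A textbook version of Cochrane-Orcutt prewhitening with an AR(1) error model is sketched below. The details of the procedure in refs. 4 and 5 may differ. The function name, the number of iterations, and the simulated data are assumptions made for this example.

```python
import numpy as np

def cochrane_orcutt(y, X, n_iter=2):
    """Estimate AR(1) coefficient rho and refit on quasi-differenced data.

    Returns (beta, rho, y_w, X_w), where y_w and X_w are the prewhitened
    time course and design matrix (one sample shorter than the input).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho, y_w, X_w = 0.0, y, X
    for _ in range(n_iter):
        e = y - X @ beta                          # residuals of the current fit
        rho = float(e[1:] @ e[:-1] / (e[:-1] @ e[:-1]))
        y_w = y[1:] - rho * y[:-1]                # quasi-differencing
        X_w = X[1:] - rho * X[:-1]
        beta = np.linalg.lstsq(X_w, y_w, rcond=None)[0]
    return beta, rho, y_w, X_w

# Simulated time course with AR(1) errors (rho = 0.5).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), np.sin(np.arange(n) / 10.0)])
noise = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + noise[t]
y = X @ np.array([1.0, 2.0]) + e
beta_hat, rho_hat, y_w, X_w = cochrane_orcutt(y, X)
print(round(rho_hat, 2), np.round(beta_hat, 2))
```

The same transformation is applied to the data and to the design matrix, so the regression coefficients keep their meaning while the transformed errors are approximately white.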

**Single-Trial Pairwise Condition Information.** *General approach.* For each pair of conditions (corresponding to the four stimulus images here), we estimate a lower bound on the mutual information between the condition and the multivariate response in the ROI for a single trial. Although the experiment as a whole (comprising many trials) provides much more information, we use the mutual information for a single-trial response as a measure, because it is less dependent on accidental properties of the experiment such as the amount of data acquired and the efficiency of the design. Unlike classification accuracy, single-trial information estimates can, in principle, be compared between different experiments.

*Technical details.* The mutual information *I*(*S*;*R*) between stimulus and response is defined as follows:

$I(S;R) = H(S) + H(R) - H(S,R) = \sum_{s}\sum_{r} p(s,r)\,\log\frac{p(s,r)}{p(s)\,p(r)}$, [**1**]

where $H(X) = -\sum_{x} p(x)\,\log p(x)$ is the entropy of a variable *X* with particular values *x*, *S* is the stimulus variable (dichotomous here; two stimuli considered at a time), *R* is the response variable (continuous here), *s* and *r* are particular values of *S* and *R*, respectively, *p*(*x*) is the discrete probability mass function of random variable *X*, and log is the base-2 logarithm.

We first estimate the probability distributions, then plug them into the above formula to estimate the mutual information. The stimulus variable is dichotomous (two images at a time) and uniform (each image has the same probability of occurrence). The response is continuous and multivariate with one dimension for each voxel in the region. For stability of the estimate and to be able to deal with large voxel sets (up to 4,000 voxels here), we assume the errors to be multinormal with diagonal covariance (i.e., Gaussian and independent between voxels) and equal across conditions. A multinormal response can be projected onto the Fisher linear discriminant (see above) without loss of information about the stimulus. (This is because the likelihood ratio is constant on hyperplanes orthogonal to the Fisher discriminant.) If the true population means and the true population covariance are known, the mutual information can thus equivalently be computed from the one-dimensional response distributions of the two stimuli on the Fisher discriminant. However, as the Fisher discriminant maximizes class separation, estimation of the Fisher dimension and the distributions from the same noisy data gives strongly positively biased information estimates (overfitting). To avoid this bias, we use data set A to determine the Fisher discriminant and estimate the mutual information on the basis of the scatter and separation of the means of data set B on the data-set-A Fisher discriminant.

To estimate the distribution of the response estimates on the Fisher discriminant for single trials, we assume a design matrix *X* with two nonoverlapping predictors, each of which describes a complete hemodynamic response (time window considered: 20 s). This provides the scaling factor that relates the standard deviation of the measurement error to the standard error of the response estimates on the Fisher discriminant for a single trial. The set-B response estimates and their standard errors on the data-set-A Fisher discriminant define two univariate normal distributions (one for each stimulus), which are plugged into Eq. **1** to obtain an estimate of the mutual information.
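
The plug-in step can be illustrated numerically. The sketch below takes two univariate normal distributions on the discriminant (one per stimulus, equally probable), discretizes the response axis into fine bins, and evaluates the mutual information as H(S) + H(R) - H(S,R) in bits. The binning, the grid span, and all names are assumptions made for illustration. The sketch does not reproduce the hemodynamic scaling factor described above.

```python
import numpy as np

def single_trial_information(mean_1, mean_2, sd_1, sd_2, n_bins=20001, span=10.0):
    """Mutual information (bits) between a binary stimulus and a 1D response."""
    lo = min(mean_1 - span * sd_1, mean_2 - span * sd_2)
    hi = max(mean_1 + span * sd_1, mean_2 + span * sd_2)
    r = np.linspace(lo, hi, n_bins)            # discretized response axis

    def binned_normal(mean, sd):
        density = np.exp(-0.5 * ((r - mean) / sd) ** 2)
        return density / density.sum()         # probability mass per bin

    p_r_given_s = np.vstack([binned_normal(mean_1, sd_1),
                             binned_normal(mean_2, sd_2)])
    p_s = np.array([0.5, 0.5])                 # uniform stimulus distribution
    p_sr = p_s[:, None] * p_r_given_s          # joint distribution
    p_r = p_sr.sum(axis=0)                     # marginal response distribution

    def entropy(p):
        p = p[p > 0]                           # 0 * log(0) is taken as 0
        return -np.sum(p * np.log2(p))

    return entropy(p_s) + entropy(p_r) - entropy(p_sr.ravel())

info_identical = single_trial_information(0.0, 0.0, 1.0, 1.0)   # no separation
info_moderate = single_trial_information(0.0, 1.0, 1.0, 1.0)    # partial overlap
info_separate = single_trial_information(0.0, 100.0, 1.0, 1.0)  # no overlap
```

The estimate ranges from 0 bits for indistinguishable response distributions to 1 bit for perfectly separable ones, the maximum for two equiprobable stimuli.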

Because of instrumental measurement noise and limited measurement resolution in space and time and because of the assumptions involved (multinormality, independent voxel responses, no temporal pattern information), our estimate should be considered an estimate of a lower bound on the actual information carried by the region.

**Information-Based Group Mapping in Talairach Space.** The information-based group mapping in Talairach space (Fig. 3, SI Fig. 9) differed from the information-based mapping performed in single subjects for the ROI analyses in two respects: (*i*) Statistical inference was performed by using a randomization scheme (instead of the pattern-discriminant *t* test based on splitting the data). This allowed us to use all data for the mapping. (*ii*) For computational efficiency, the Mahalanobis distance was replaced by the mean squared *t* value (which is closely related to the Euclidean distance). The steps of the procedure are as follows:

*(i) Null-simulation design matrices from randomization of condition-labels.* For a given contrast (e.g., face 1 versus face 2) the condition labels (e.g., "face 1" and "face 2") were randomly reassigned within that set (i.e., labels were scrambled within the face and the house set, but a house trial would never receive a face label). This random relabeling was repeated 1,000 times. For each relabeling, a new design matrix with hemodynamic response predictors (2) was constructed. The resulting design matrices were spectrally similar to the actual design matrix used. (This was a concern because of the stimulus-sequence optimization, which should otherwise have been used to create the relabeling sequences as well.) Each of these design matrices was extended to include trend and head-motion components as described above (see *Design matrix and multiple linear regression*) and a confound mean predictor for each run.
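
The relabeling step can be sketched as a permutation of condition labels restricted to each stimulus set. Treating the reassignment as a permutation (which preserves the number of trials per label) is an assumption, as are the function name and the toy trial sequence. Construction of the hemodynamic predictors from each relabeled sequence is not shown.

```python
import numpy as np

def relabel_within_sets(labels, label_sets, rng):
    """Permute trial labels within each set, never across sets."""
    labels = np.asarray(labels)
    relabeled = labels.copy()
    for members in label_sets:
        idx = np.flatnonzero(np.isin(labels, members))  # trials of this set
        relabeled[idx] = rng.permutation(labels[idx])
    return relabeled

# Toy trial sequence with the four standard images (made-up order).
labels = np.array(["face 1", "face 2", "house 1", "house 2"] * 3)
label_sets = [("face 1", "face 2"), ("house 1", "house 2")]
rng = np.random.default_rng(0)

# 1,000 relabelings, one per null-simulation design matrix.
relabelings = [relabel_within_sets(labels, label_sets, rng) for _ in range(1000)]
```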

*(ii) Randomization distribution of single-subject information-based maps.* Information-based mapping was performed in each subject separately for each of the 1,000 null-simulation design matrices using a 3-mm-radius spherical searchlight. To perform these mappings efficiently, the Mahalanobis distance (SI Fig. 8*b*) was replaced by the mean squared *t* (MST) value within the searchlight for the contrast of interest. (The MST is closely related to the Euclidean distance: to obtain the Euclidean distance, the MST needs to be multiplied by the number of voxels entering the estimate and the square root taken.)
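
The stated relation between the MST and the Euclidean distance can be checked directly. The sketch below uses made-up *t* values for a four-voxel searchlight. The function names are hypothetical.

```python
import numpy as np

def mean_squared_t(t_values):
    """Mean squared t value across the voxels of one searchlight."""
    t_values = np.asarray(t_values, dtype=float)
    return np.mean(t_values ** 2)

def euclidean_from_mst(mst, n_voxels):
    """Multiply the MST by the voxel count, then take the square root."""
    return np.sqrt(mst * n_voxels)

t_values = np.array([3.0, 4.0, 0.0, 0.0])      # toy contrast t values
mst = mean_squared_t(t_values)                 # 25 / 4 = 6.25
distance = euclidean_from_mst(mst, t_values.size)
# distance equals the Euclidean norm of the t pattern, here 5.
```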

*(iii) Talairach-space group averaging of information-based maps.* For each subject, each of the 1,000 information-based maps was projected into Talairach space. We used BrainVoyager to define this transformation based on the T1-weighted anatomical volumes. For each of the 1,000 null-simulations, the resulting information-based maps were averaged across the 11 subjects in Talairach space, yielding 1,000 group-average null-simulation maps. The information-based maps obtained using the design matrix for the true labeling of the experimental trials were averaged in Talairach space in the same way.

*(iv) P values from voxel-specific randomization distributions.* For each voxel, we used the 1,000 values in null-simulation maps as a voxel-specific randomization distribution. The *P* value of the voxel was estimated as the percent rank of the actual map's value at that voxel in the voxel-specific randomization distribution, divided by 100. The resulting map for face 1 versus face 2 highlights right aIT at *P* < 0.001. The voxels exceeding this threshold have actual values greater than all values in the randomization distribution. However, because there are only 1,000 values in the randomization distribution, this distribution does not allow us to estimate how much smaller than 0.001 the *P* value is.
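
A minimal sketch of the voxel-specific computation follows. It assumes that the *P* value is the fraction of null-simulation values at a voxel that are at least as large as the actual value. That tail convention and the names are assumptions. With *n* null simulations, any voxel whose actual value exceeds the whole distribution gets 0, which can only be reported as *P* < 1/*n*.

```python
import numpy as np

def voxelwise_p(actual_map, null_maps):
    """Upper-tail fraction of each voxel's own randomization distribution.

    actual_map: one value per voxel; null_maps: simulations x voxels.
    """
    actual_map = np.asarray(actual_map, dtype=float)
    null_maps = np.asarray(null_maps, dtype=float)
    return np.mean(null_maps >= actual_map[None, :], axis=0)

# Toy example: 4 null simulations, 2 voxels.
null_maps = np.array([[0.0, 1.0],
                      [1.0, 2.0],
                      [2.0, 3.0],
                      [3.0, 4.0]])
actual_map = np.array([2.5, 10.0])
p_values = voxelwise_p(actual_map, null_maps)
# Voxel 0: one of four null values is >= 2.5, so 0.25.
# Voxel 1: no null value reaches 10, so 0 (reportable only as P < 1/4).
```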

*(v) P values from pooling randomization distributions across voxels.* To obtain more precise *P* value estimates, we pooled randomization distributions across voxels. To account for inhomogeneities across voxels, the voxel-specific randomization distributions were first normalized: For each voxel, the mean and standard deviation of the voxel-specific randomization distribution were computed, then each value of that randomization distribution as well as the actual value of the information-based map at that voxel were normalized by subtracting the mean and dividing by the standard deviation. The resulting normalized randomization distributions were combined across cortical voxels. The normalized values of the actual information-based map were converted to *P* values as described above but using the randomization distribution pooled across voxels.
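
The normalization and pooling can be sketched as below on simulated null maps. The use of the sample standard deviation (`ddof=1`), the upper-tail convention, and the names are assumptions. Pooling raises the resolution of the estimate from 1/1,000 to 1/(1,000 x number of voxels).

```python
import numpy as np

def pooled_p(actual_map, null_maps):
    """P values from voxelwise-normalized null values pooled across voxels."""
    mu = null_maps.mean(axis=0)                # voxel-specific null mean
    sd = null_maps.std(axis=0, ddof=1)         # voxel-specific null SD
    z_null = (null_maps - mu) / sd
    z_actual = (actual_map - mu) / sd
    pooled = np.sort(z_null.ravel())           # one distribution for all voxels
    # Count pooled null values that are >= each normalized actual value.
    n_greater_equal = pooled.size - np.searchsorted(pooled, z_actual, side="left")
    return n_greater_equal / pooled.size

# Simulated null maps: 1,000 simulations x 50 voxels (made-up values).
rng = np.random.default_rng(0)
null_maps = rng.normal(loc=5.0, scale=2.0, size=(1000, 50))
actual_map = np.full(50, 5.0)                  # typical null-like values
actual_map[0] = 50.0                           # one strongly informative voxel
p_values = pooled_p(actual_map, null_maps)
```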

The resulting *P* map for face 1 versus face 2 (Fig. 3, SI Fig. 9) was thresholded at *P* < 0.001. The peak voxel has *P* < 0.0001, thus surviving small-volume Bonferroni correction for 500 voxels (4,000 mm³). Similar results were obtained by using a randomization distribution of null-simulation map maxima to correct for multiple comparisons.
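
Both corrections mentioned here can be sketched briefly. The Bonferroni line assumes a familywise level of 0.05, which the text does not state. The map-maximum correction compares each actual value with the distribution of the maxima of the null-simulation maps. Names and toy values are hypothetical.

```python
import numpy as np

# Bonferroni threshold for 500 voxels, ASSUMING a familywise alpha of 0.05.
bonferroni_threshold = 0.05 / 500              # 0.0001 per voxel

def max_statistic_p(actual_map, null_maps):
    """Corrected P values from the distribution of null-map maxima."""
    actual_map = np.asarray(actual_map, dtype=float)
    null_maxima = np.asarray(null_maps, dtype=float).max(axis=1)
    return np.mean(null_maxima[:, None] >= actual_map[None, :], axis=0)

# Toy example: 4 null simulations, 2 voxels; the null maxima are 1, 2, 3, 4.
null_maps = np.array([[0.0, 1.0],
                      [1.0, 2.0],
                      [2.0, 3.0],
                      [3.0, 4.0]])
actual_map = np.array([2.5, 10.0])
p_corrected = max_statistic_p(actual_map, null_maps)
# Voxel 0: two of four maxima are >= 2.5, so 0.5. Voxel 1: none, so 0.
```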

1. Wager TD, Nichols TE (2003) *NeuroImage* 18:293-309.

2. Boynton GM, Engel SA, Glover GH, Heeger DJ (1996) *J Neurosci* 16:4207-4221.

3. Lazar NA, Luna B, Sweeney JA, Eddy WF (2002) *NeuroImage* 16:538-550.

4. Cochrane D, Orcutt GH (1949) *J Am Stat Assoc* 44:32-61.

5. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter TA, Brammer M (2001) *Hum Brain Mapp* 12:61-78.