## Lorelei Howard and Nick Wright MfD 2008


**t-tests, ANOVA and regression**
and their application to the statistical analysis of fMRI data
Lorelei Howard and Nick Wright, MfD 2008

**Overview**
• Why do we need statistics?
• P values
• T-tests
• ANOVA

**Why do we need statistics?**
• To enable us to test experimental hypotheses
  • H0 = null hypothesis
  • H1 = experimental hypothesis
• In terms of fMRI
  • Null = no difference in brain activation between these 2 conditions
  • Exp = there is a difference in brain activation between these 2 conditions

**2 types of statistics**
• Descriptive statistics
  • e.g. mean and standard deviation (S.D.)
• Inferential statistics
  • t-tests, ANOVAs and regression

**So how do we know whether the effect observed in our sample was genuine?**
• We don't
• Instead we use p values to indicate how confident we can be that our results represent a genuine effect present in the whole population

**P values**
• P value = the probability of obtaining a result at least as extreme as the one observed by chance alone, i.e. when the null hypothesis is true
• α level is set a priori (usually 0.05)
• If p < α level then we reject the null hypothesis and accept the experimental hypothesis
  • i.e. with α = 0.05, a result this extreme would arise less than 5% of the time if the null hypothesis were true
• If, however, p > α level then we fail to reject the null hypothesis (this does not show that the null hypothesis is true)

**Two types of errors**
• Type I error = false positive
  • α level of 0.05 means that there is a 5% risk of a type I error when the null hypothesis is true
• Type II error = false negative

**t-tests**
• Compare two group means

**Hypothetical experiment**
• Q: does viewing pictures of the Simpson family and the Griffin family activate the same brain regions?
• Condition 1 = Simpson family faces
• Condition 2 = Griffin family faces

**Calculating T**
[Figure: Group 1 and Group 2]
• t = difference between the two group means divided by the pooled standard error of that difference

**Degrees of freedom**
• = number of unconstrained data points
• Which in this case = number of data points − 1
• Can use the t value and df to find the associated p value
• Then compare it to the α level

**Different types of t-test**
• 2 sample t-tests
  • Related = two samples related, i.e. same people in both conditions
  • Independent = two independent samples, i.e. different people in the 2 conditions
• One sample t-tests
  • Compare the mean of one sample to a given value

**Another approach to group differences**
• Analysis Of VAriance (ANOVA)
  • Variances not means
  • Multiple groups, e.g. different facial expressions
• H0 = no differences between groups
• H1 = differences between groups

**Calculating F**
• F = the between group variance divided by the within group variance
  • i.e. the model variance / error variance
• For F to be significant the between group variance should be considerably larger than the within group variance

**What can be concluded from a significant ANOVA?**
• There is a significant difference between the groups
• NOT where this difference lies
• Finding exactly where the differences lie requires further statistical analyses

**Different types of ANOVA**
• One-way ANOVA
  • One factor with more than 2 levels
• Factorial ANOVAs
  • More than 1 factor
• Mixed design ANOVAs
  • Some factors independent, others related

**Conclusions**
• T-tests assess if two group means differ significantly
  • Can compare two samples or one sample to a given value
• ANOVAs compare more than two groups or more complicated scenarios
  • They compare group means by analysing variances

**Further reading**
• Howell. Statistical methods for psychology
• Howitt and Cramer. An introduction to statistics in psychology
• Huettel. Functional magnetic resonance imaging (especially chapter 12)

**Acknowledgements**
• MfD Slides 2005 – 2007

**PART 2**
• Correlation
• Regression
• Relevance to GLM and SPM

**Correlation**
• Strength and direction of the relationship between variables
• Scattergrams
[Figure: scattergrams of Y against X showing positive correlation, negative correlation and no correlation]

**Describe correlation: covariance**
• A statistic representing the degree to which 2 variables vary together
• Covariance formula: cf. the variance formula, with each deviation of x multiplied by the matching deviation of y rather than squared
• But the absolute value of cov(x,y) is also a function of the standard deviations of x and y

**Describe correlation: Pearson correlation coefficient (r)**
• Equation: $r = \dfrac{\mathrm{cov}(x,y)}{s_x s_y}$, where s = st dev of sample
• r = -1 (max. negative correlation); r = 0 (no linear relationship); r = 1 (max. positive correlation)
• Limitations:
  • Sensitive to extreme values
  • r is an estimate from the sample, but does it represent the population parameter?
  • Relationship, not a prediction

**Summary**
• Correlation
• Regression
• Relevance to SPM

**Regression**
• Regression: prediction of one variable from knowledge of one or more other variables
• Regression v. correlation: regression allows you to predict one variable from the other (not just say if there is an association)
• Linear regression aims to fit a straight line to the data that, for any value of x, gives the best prediction of y
[Figure: best fit line, minimising the sum of squared errors; ŷ = predicted, y_i = observed, ε = residual]
• Describing the line as in GCSE maths: y = mx + c
• Here, ŷ = bx + a
  • ŷ: predicted value of y
  • b: slope of regression line
  • a: intercept
• Residual error (ε): difference between obtained and predicted values of y (i.e. y − ŷ)
• Best fit line (values of b and a) is the one that minimises the sum of squared errors (SSerror): $\sum (y - \hat{y})^2$

**How to minimise SSerror**
[Figure: sums of squared error (SSerror) plotted against values of a and b; min SSerror is where the gradient = 0]
• Minimise $\sum (y - \hat{y})^2$, which is $\sum (y - bx - a)^2$
• Plotting SSerror for each possible regression line gives a parabola
• Minimum SSerror is at the bottom of the curve where the gradient is zero, and this can be found with calculus
• Take partial derivatives of $\sum (y - bx - a)^2$ with respect to a and b, set them to 0 and solve as simultaneous equations, giving:
  $b = \dfrac{r\,s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$

**How good is the model?**
• We can calculate the regression line for any data, but how well does it fit the data?
• Total variance = predicted variance + error variance: $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$
• Also, it can be shown that $r^2$ is the proportion of the variance in y that is explained by our regression model: $r^2 = s_{\hat{y}}^2 / s_y^2$
• Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get: $s_{er}^2 = s_y^2 (1 - r^2)$
• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction

**Is the model significant?**
• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
• F-statistic (after some complicated rearranging): $F_{(df_{\hat{y}},\,df_{er})} = \dfrac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \dfrac{r^2 (n - 2)}{1 - r^2}$
• And it follows that: $t_{(n-2)} = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
• So all we need to know are r and n!

**General Linear Model**
• Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept
  y = bx + a + ε
• A General Linear Model is just any model that describes the data as a linear combination of predictors plus error (a straight line in the simplest case)

**One voxel: The GLM**
• Our aim: solve the equation for β, which tells us how much of the BOLD signal is explained by X
[Figure: Y = X × β + ε, where Y = data vector (voxel), X = design matrix, β = parameters (a, m, b3 … b9) and ε = error vector]

**Multiple regression**
• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y
• The different x variables are combined in a linear way and each has its own regression coefficient:
  y = b0 + b1x1 + b2x2 + … + bnxn + ε
• The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y
  • i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for

**SPM**
• Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y
• Multiple regression models the effect of several independent variables, x1, x2 etc., on one dependent variable, y
• Both are types of General Linear Model
• This is what SPM does and will be explained soon…

Thanks!
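The regression results in Part 2 can be checked numerically. The following is only a minimal sketch in plain Python, not part of the original slides: the data values and the function names (`fit_line`, `pearson_r`, `model_stats`, `glm_fit`) are made up for illustration. It fits ŷ = bx + a by least squares, computes F and t from r and n alone, and then refits the same line as a one-predictor GLM by solving the normal equations for a design matrix with a column of ones and a column of x.

```python
import math

def mean(v):
    return sum(v) / len(v)

def fit_line(x, y):
    """Least-squares slope b and intercept a for y-hat = b*x + a."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # equivalent to r * s_y / s_x
    a = my - b * mx        # the line passes through (x-bar, y-bar)
    return b, a

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def model_stats(x, y):
    """F(1, n-2) and t(n-2) computed from r and n only."""
    n = len(x)
    r = pearson_r(x, y)
    f = r ** 2 * (n - 2) / (1 - r ** 2)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, f, t

def glm_fit(x, y):
    """Same fit as a GLM: solve (X'X) beta = X'y with X = [1, x]."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx            # determinant of the 2x2 matrix X'X
    a = (sxx * sy - sx * sxy) / det    # intercept
    b = (n * sxy - sx * sy) / det      # slope
    return b, a

# Made-up illustrative data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b, a = fit_line(x, y)
r, f, t = model_stats(x, y)
b_glm, a_glm = glm_fit(x, y)

print(f"regression: b = {b:.3f}, a = {a:.3f}")
print(f"GLM:        b = {b_glm:.3f}, a = {a_glm:.3f}")
print(f"r = {r:.4f}, F(1, {len(x) - 2}) = {f:.1f}, t({len(x) - 2}) = {t:.2f}")
```

Both fits should print the same slope and intercept, and t squared should equal F, which is the sense in which simple linear regression is one case of the General Linear Model. Turning F or t into a p value additionally needs the F or t distribution, which is left out of this sketch.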