Multiple Regression Power Analysis | G*Power Data Analysis Examples (2024)

NOTE: This page was developed using G*Power version 3.1.9.2. You can download the current version of G*Power from https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html. You can also find help files, the manual and the user guide on this website.

Introduction

Power analysis is the name given to the process for determining the sample size for a research study. The technical definition of power is that it is the probability of detecting a “true” effect when it exists. Many students think that there is a simple formula for determining sample size for every research situation. However, the reality is that there are many research situations that are so complex that they almost defy rational power analysis. In most cases, power analysis involves a number of simplifying assumptions, in order to make the problem tractable, and running the analyses numerous times with different variations to cover all of the contingencies.

In this unit we will try to illustrate how to do a power analysis for a multiple regression model that has two control variables, one continuous research variable and one categorical research variable (three levels).

Description of the experiment

A school district is designing a multiple regression study looking at the effect of gender, family income, mother’s education and language spoken in the home on the English language proficiency scores of Latino high school students. The variables gender and family income are control variables and not of primary research interest. Mother’s education is a continuous variable that measures the number of years that the mother attended school. The range of this variable is expected to be from 4 to 20. The variable language spoken in the home (homelang) is a categorical research variable with three levels: 1) Spanish only, 2) both Spanish and English, and 3) English only. Since there are three levels, it will take two dummy variables to code language spoken in the home.

The full regression model will look something like this:

engprof = b₀ + b₁(gender) + b₂(income) + b₃(momeduc) + b₄(homelang1) + b₅(homelang2)

Thus, the primary research hypotheses are the test of b₃ and the joint test of b₄ and b₅. These tests are equivalent to testing the change in R² when momeduc (or homelang1 and homelang2) is added last to the regression equation.
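For reference, the equivalence can be written out as the usual F statistic for a change in R². The symbols here are our own notation, not G*Power's: q is the number of predictors being tested (1 for momeduc, 2 for the homelang dummies), p is the total number of predictors in the full model (5), and N is the sample size.

```latex
F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / q}
         {\left(1 - R^2_{\text{full}}\right) / (N - p - 1)}
\qquad \text{with } q \text{ and } N - p - 1 \text{ degrees of freedom}
```

The numerator is the R² added by the tested predictors and the denominator is the residual variance of the full model, which are the two quantities the power analysis below asks for.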

The power analysis

Let’s set up the analysis. Under Test family select F tests, and under Statistical test select ‘Linear multiple regression: Fixed model, R² increase’. Under Type of power analysis, choose ‘A priori…’, which will be used to identify the sample size required given the alpha level, power, number of predictors and effect size.

The effect size can be determined via the ‘Determine =>’ button, which calls up a menu requesting the variance explained by the special effect and the residual variance.

We believe, from previous research, that the R² for the full model with five predictor variables (2 controls, 1 continuous research, and 2 dummy variables for the categorical variable) will be about 0.48.

Let’s start with the continuous predictor (momeduc, the special effect in this case). We think that it will add about 0.03 to the R² when it is added last to the model. This is what we put under ‘Variance explained by special effect’.

The residual variance is defined as 1 – (R² of the full model), and in this case is 1 – 0.48 = 0.52. The total number of variables (predictors) is 5 and the number being tested (df) is one. Let’s start by assuming that the power is 0.70.
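As a quick check on what the ‘Determine =>’ menu does with these two inputs, the effect size f² is the variance explained by the special effect divided by the residual variance. The short sketch below reproduces that arithmetic; the function and variable names are ours and purely illustrative.

```python
def effect_size_f2(variance_explained, residual_variance):
    """Cohen's f^2 for an R^2 increase: explained variance over residual variance."""
    return variance_explained / residual_variance


r2_full = 0.48                  # anticipated R^2 of the full five-predictor model
residual_variance = 1 - r2_full # 0.52
r2_added_by_momeduc = 0.03      # anticipated R^2 increase when momeduc is added last

f2_momeduc = effect_size_f2(r2_added_by_momeduc, residual_variance)
print(round(f2_momeduc, 4))     # 0.0577
```

An f² of about 0.058 sits between Cohen's conventional "small" (0.02) and "medium" (0.15) benchmarks.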

We will run three calculations with power equal to 0.7, 0.8 and 0.9. We make use of the ‘X-Y plot for a range of values’ button, plotting total sample size as a function of power, with power ranging from 0.7 to 0.9 in steps of 0.1.

This gives us sample sizes ranging from 109 to 184, depending on power.

Let’s see how this compares with the categorical predictor (homelang1 and homelang2), which uses two dummy variables in the model. We believe that the change in R² (variance explained by the special effect) attributed to the two dummy variables will be about 0.025. Note that the residual variance is still defined as 1 – 0.48 = 0.52, so this does not change. The total number of predictors stays at 5 while the numerator df (number of tested predictors) is now 2.

This series of power analyses yielded sample sizes ranging from 163 to 266. These sample sizes are larger than those for the continuous research variable.
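For readers who want to cross-check these figures outside G*Power, the sketch below searches for the smallest N that reaches a target power. It is written under assumptions of ours, not taken from the page: the noncentrality parameter is λ = f² × N, the denominator df is N minus the number of predictors minus 1, and the noncentral F distribution is evaluated as a Poisson mixture of incomplete beta functions. All function names are hypothetical, and only the Python standard library is used.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_power(n, f2, n_tested, n_predictors, alpha=0.05):
    """Power of the R^2-increase F test (assumes lambda = f2 * n)."""
    a, b = n_tested / 2.0, (n - n_predictors - 1) / 2.0
    # Critical value on the beta scale x = df1*F / (df1*F + df2), found by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if betainc(a, b, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x_crit = (lo + hi) / 2.0
    # Noncentral F CDF as a Poisson(lambda / 2) mixture of central beta CDFs.
    half_lambda = f2 * n / 2.0
    weight, cdf = math.exp(-half_lambda), 0.0
    for j in range(1000):
        cdf += weight * betainc(a + j, b, x_crit)
        weight *= half_lambda / (j + 1)
        if j > half_lambda and weight < 1e-16:
            break
    return 1.0 - cdf


def required_n(f2, n_tested, n_predictors, power, alpha=0.05):
    """Smallest total sample size reaching the target power (bisection over N)."""
    lo = n_predictors + 2  # smallest N that leaves one denominator df
    hi = lo + 1
    while f_test_power(hi, f2, n_tested, n_predictors, alpha) < power:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f_test_power(mid, f2, n_tested, n_predictors, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


residual = 1 - 0.48
for label, r2_change, n_tested in (("momeduc", 0.03, 1), ("homelang", 0.025, 2)):
    f2 = r2_change / residual
    sizes = [required_n(f2, n_tested, 5, target) for target in (0.7, 0.8, 0.9)]
    print(label, sizes)
```

If the assumptions match what G*Power does internally, the printed sample sizes should land on or very near the ranges quoted above. Passing a different `alpha` to `required_n` gives a way to explore the adjusted alpha levels discussed next.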

If it is the case that both of these research variables are important, we might want to take into account that we are testing two separate hypotheses (one for the continuous and one for the categorical variable) by adjusting the alpha level. The simplest but most draconian method would be to use a Bonferroni adjustment by dividing the nominal alpha level, 0.05, by the number of hypotheses, 2, yielding an alpha of 0.025. We will rerun the categorical variable power analysis using the new adjusted alpha level.

The Bonferroni adjustment treats the tests of the two hypotheses as if they were independent, which is, in fact, not the case, so it is more conservative than necessary here. The squared correlation between the two sets of predictors is about .2, which is equivalent to a correlation of approximately .45. Using an internet applet to compute a Bonferroni adjusted alpha taking into account the correlation gives us an adjusted alpha value of 0.034 to use in the power analysis.
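The page does not say which applet or formula produced 0.034. One correlation-adjusted formula that appears in some online calculators shrinks the number of tests k to an effective number k^(1 − r) and then applies a Šidák-style correction. The sketch below uses that formula as an assumption of ours, alongside the plain Bonferroni value for comparison; the function names are illustrative.

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Plain Bonferroni: divide the nominal alpha by the number of tests."""
    return alpha / n_tests


def correlation_adjusted_alpha(alpha, n_tests, r):
    """Sidak-style adjustment with an effective number of tests n_tests ** (1 - r).

    Assumed formula, not confirmed by the source. With r = 0 it reduces to the
    Sidak correction; with r = 1 it returns the unadjusted alpha.
    """
    effective_tests = n_tests ** (1.0 - r)
    return 1.0 - (1.0 - alpha) ** (1.0 / effective_tests)


r = math.sqrt(0.2)  # squared correlation of about .2, so r is about .45
print(round(bonferroni_alpha(0.05, 2), 3))               # 0.025
print(round(correlation_adjusted_alpha(0.05, 2, r), 3))  # 0.034
```

Under this assumed formula the adjusted alpha falls between the Bonferroni value of 0.025 and the nominal 0.05, and it rounds to the 0.034 quoted above.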

Based on the series of power analyses the school district has decided to collect data on a sample of about 225 students. This sample size should yield a power of around 0.8 in testing hypotheses concerning both the continuous research variable (momeduc) and the categorical research variable, language spoken in the home (homelang1 and homelang2). The nominal alpha level is 0.05 but has been adjusted to .034 to take into account the number of hypotheses tested and the correlation between the predictors.
