DRAFT: This module has unpublished changes.

https://robschnidert-test.shinyapps.io/Herb/

A Friendly Statistical Guide to the Power of a Test

The main statistical concept that this Shiny app explores is the power of a t-test, or, more specifically, how changing the elements of a one sample t-test affects how often the test rejects the hypothesized population mean. Let’s take a moment to go over each of those concepts before we get into the app itself.

First off, what is a one sample t-test? A one sample t-test is a statistical test that tells you whether the sample mean, or the mean you get from your data, is significantly different from the hypothesized mean, or the value of the mean you’d expect if nothing unusual was going on. If you get a low p-value from your t-test (which happens when the test statistic is far from zero), you reject the null hypothesis that the population mean equals the hypothesized value. The main components of a one sample t-test are the sample size, the hypothesized value of the population mean, the sample mean you get from your data, the sample standard deviation, and the significance level of the test, which the app labels the confidence level.

This app will explore how the proportion of null hypothesis rejections increases or decreases depending upon how the user chooses to change these components, particularly the true mean, standard deviation, sample size, and the test’s confidence level. This should help demonstrate the statistical power of a t-test. Statistical power is the probability that a statistical test correctly rejects a false null hypothesis. In other words, the power of a test tells us how often the test will detect a difference that really exists. By looking at the proportion of times a test rejected the null hypothesis for various hypothesized values of the mean, we can see how powerful our test is.
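
To make those moving parts concrete, here is a minimal sketch in R of a single one sample t-test worked out by hand and then checked against R’s built-in t.test() function. The seed, the simulated sample, and the hypothesized value of 103 are made-up values for illustration; none of them come from the app.

```r
# A one sample t-test by hand, compared with t.test().
# All numbers here are illustrative assumptions.
set.seed(1)
x   <- rnorm(30, mean = 100, sd = 5)  # a sample of size 30
mu0 <- 103                            # hypothesized mean under the null

# Test statistic: distance of the sample mean from mu0, in standard errors
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = length(x) - 1, lower.tail = FALSE)

builtin <- t.test(x, mu = mu0)  # R's built-in version of the same test
reject  <- p_val < 0.05         # TRUE means we reject the null at the 0.05 level
```

The hand-computed statistic and p-value should agree with builtin$statistic and builtin$p.value, which is a handy way to convince yourself of what the test is doing.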

Shiny App Tutorial

So, what does our app do exactly? Well, it lets you observe the concept of power in action. Behind the scenes, our app samples data from the normal distribution, which you might also know as a bell curve, and splits it up into 1,000 sections, each of which is one sample. After that, it performs a t-test on each section to see whether or not that section would make us reject the null hypothesis that the population mean equals the value being tested. Finally, it finds the proportion of t-tests that rejected the hypothesis out of the total number of t-tests, then it produces a graph to show you those proportions for eleven tested values, which are the true mean and five values to either side of it. Our app contains four adjustable components: the sample size, the value of the true mean, the standard deviation, and the confidence level being used. In this section, we’ll explain each of these options and how to use them to alter the graph on the right side of the screen.
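
The steps above can be sketched in a few lines of R. This is only an illustration of the idea, not the app’s actual code (that is in the appendix): the helper name simulate_rejection_rate and the seed are our own assumptions, and the sketch calls t.test() instead of computing the statistic by hand.

```r
# Estimate how often a one sample t-test rejects H0: mean = mu0
# when the data really come from Normal(mu, sigma).
# simulate_rejection_rate is a hypothetical helper written for this sketch.
simulate_rejection_rate <- function(n, mu, sigma, mu0, alpha = 0.05, reps = 1000) {
  p_values <- replicate(reps, t.test(rnorm(n, mu, sigma), mu = mu0)$p.value)
  mean(p_values < alpha)  # proportion of the reps that rejected the null
}

set.seed(42)
tested_values <- (100 - 5):(100 + 5)  # the true mean and five values either side

rates <- sapply(tested_values, function(m0) {
  simulate_rejection_rate(n = 30, mu = 100, sigma = 5, mu0 = m0)
})
names(rates) <- tested_values
print(rates)
```

Each entry of rates plays the role of one point on the app’s graph: the proportion of 1,000 simulated t-tests that rejected the tested value.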

These are the default settings of the app. As you can see, it has options for the sample size, true mean, standard deviation, and the confidence level. Each of those affects the graph on the right, changing how narrow its basic shape is, among other things. Feel free to play around with it a little bit and see what you can get! We’ll give you a quick summary of these options below so that you can get a better idea of what they are and how they work.

The Sample Size:

Remember how earlier we said that our app sampled data from the normal distribution and then split it into sections? This slider bar allows you to control the size of those sections. Its default value is 30, a common rule-of-thumb minimum for a sample size. As you move the slider further to the left, assuming you keep the other options the same, you’ll notice that the V shape in the graph gets wider and flatter, with more data points “pooling” closer to the bottom.

Conversely, if you crank it up to the right, you’ll see more and more data points rise to higher proportions. This is because the fewer data points you have in each section/trial, the less reliable its sample mean will be, while larger samples give more reliable results: the standard error of the sample mean shrinks in proportion to the square root of the sample size.
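
As a rough illustration of that effect, the sketch below uses R’s built-in power.t.test() function to compute the theoretical power of a two-sided one sample t-test at a few sample sizes. The difference of 2 between the true and tested means and the particular sample sizes are our own illustrative choices; the standard deviation of 5 matches the app’s default.

```r
# How sample size changes the standard error and the power of the test.
# delta = 2 and the chosen sample sizes are illustrative assumptions.
sample_sizes <- c(5, 10, 30, 100)

power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 2, sd = 5, sig.level = 0.05,
               type = "one.sample", alternative = "two.sided",
               strict = TRUE)$power
})

summary_table <- data.frame(
  n              = sample_sizes,
  standard_error = 5 / sqrt(sample_sizes),  # sigma / sqrt(n)
  power          = round(power_by_n, 3)
)
print(summary_table)
```

Reading down the table, the standard error falls and the power rises as n grows, which is the same thing the slider shows you on the graph.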

The True Mean, aka the True Value of Mu:

This option lets you set the true mean that our data is generated from. Since the app draws the data from a normal distribution with this mean, we know that whatever value we set here is correct: it really is the true mean of our data. Therefore, we should expect the rejection rate at this value to be very low (close to the significance level), since the hypothesized value equals the true mean, e.g., 100 = 100. It’s worth noting that we put a minimum value of mu = 0 for this option, so if you go into negative numbers you will not receive any output, and you’ll lose sight of that beautiful graph you’ve been looking so intently at. After playing around with this option, you’ll probably notice that the shape of the graph doesn’t change very much. Why is that? It’s because of what we said a few sentences ago: you won’t get many rejections at mu itself, but you will get more the further away from it you go. The graph always tests the eleven values from mu - 5 to mu + 5, and the data is resampled around the new mu every time, so changing mu shifts the values on the horizontal axis without changing the pattern of rejections.
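
If you’d like to check this for yourself outside the app, here is a small sketch. It estimates the rejection rate for a tested value that sits 2 units above the true mean, once with a true mean of 100 and once with a true mean of 50. The helper name reject_rate, the offset of 2, and the seed are assumptions made for this example.

```r
# Only the distance between the tested value and the true mean matters,
# not the true mean itself. reject_rate is a hypothetical helper.
reject_rate <- function(mu, offset, n = 30, sigma = 5, alpha = 0.05, reps = 1000) {
  rejected <- replicate(reps, {
    sample_data <- rnorm(n, mean = mu, sd = sigma)
    t.test(sample_data, mu = mu + offset)$p.value < alpha
  })
  mean(rejected)
}

# Reusing the seed gives both runs the same random draws, just re-centered
set.seed(7)
rate_mu_100 <- reject_rate(mu = 100, offset = 2)
set.seed(7)
rate_mu_50 <- reject_rate(mu = 50, offset = 2)

print(c(mu_100 = rate_mu_100, mu_50 = rate_mu_50))
```

The two rates should come out essentially the same, which is why moving the true mean around in the app relabels the horizontal axis but leaves the pattern alone.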

The Standard Deviation:

Like the previous option, which sets the true mean, this option sets the standard deviation of the distribution we take our data from, which affects how spread out our data points are. If they’re very close together, meaning a low standard deviation, you’ll notice that the test almost always rejects the values that differ from the true mean. For example, let’s look at the graph where the standard deviation is equal to 1.

As you can see, the test correctly rejects the null at every point we’ve defined except for the true mean, which still has a very low number of rejections. The opposite is true with high standard deviations, which mean that our data is very spread out and real differences are hard to detect. For example, where the standard deviation is equal to 25, our graph looks like this:

As you can see, the pattern is not as well defined as when the standard deviation was equal to 5 in our default graph. Ratcheting it up even more, setting the standard deviation equal to 50, we can see there is almost no indication of a pattern whatsoever.

Much like with the true mean, we also set a minimum value of 0 for the standard deviation. This is because the standard deviation cannot be negative, so we gave it that lower bound. Keep that in mind as you plug values in!
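
To put rough numbers on those graphs, the sketch below computes the theoretical power of the test at the four standard deviations mentioned above (1, 5, 25, and 50), using R’s built-in power.t.test(). The sample size of 30 is the app’s default, while the difference of 3 between the true and tested means is an illustrative assumption.

```r
# Power of a two-sided one sample t-test as the standard deviation grows.
# delta = 3 is an illustrative choice, not a value fixed by the app.
sds <- c(1, 5, 25, 50)

power_by_sd <- sapply(sds, function(s) {
  power.t.test(n = 30, delta = 3, sd = s, sig.level = 0.05,
               type = "one.sample", alternative = "two.sided",
               strict = TRUE)$power
})
names(power_by_sd) <- sds
print(round(power_by_sd, 3))
```

The power drops steadily as the standard deviation rises, which matches the way the pattern in the graph washes out.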

The Confidence Level:

Finally, we come to the confidence level. By clicking these buttons, you can adjust the standard we use when performing our t-test. When conducting a t-test, we convert the test statistic into a p-value, and we use low p-values to determine that we have sufficient evidence to reject the null hypothesis that the population mean equals the hypothesized value. Usually, statisticians use 0.05 as the cutoff: a p-value below it counts as a significant difference. That’s why we use it as our default here, and it means that when the null hypothesis is true, the test will wrongly reject it only about 5% of the time, which corresponds to a 95% confidence level. Similarly, 0.10 is another common threshold, which corresponds to a 90% confidence level. Changing this option alters the graph slightly, although not as much as the sample size or standard deviation options. Here’s another graph for comparison: compare this graph, which uses the standard of 0.10, with the default settings graph, which uses the standard of 0.05.

Notice how several of the points have shifted up compared to the first graph. This is because, by raising the value of our rejection condition, more tests are rejecting the null than before.
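
One way to see why the points can only move up is to apply both cutoffs to the very same set of p-values, as in this sketch. The seed and the tested value of 102 are assumptions chosen for illustration; the other settings match the app’s defaults.

```r
# Apply the 0.05 and 0.10 cutoffs to the same 1,000 simulated t-tests.
set.seed(3)
n <- 30
samples <- matrix(rnorm(1000 * n, mean = 100, sd = 5), nrow = 1000)  # one sample per row

# p-value for H0: mean = 102 from each row
p_values <- apply(samples, 1, function(one_sample) t.test(one_sample, mu = 102)$p.value)

rate_05 <- mean(p_values < 0.05)  # proportion rejected at the 0.05 level
rate_10 <- mean(p_values < 0.10)  # proportion rejected at the 0.10 level

print(c(cutoff_0.05 = rate_05, cutoff_0.10 = rate_10))
```

Every p-value below 0.05 is also below 0.10, so the rejection proportion at 0.10 can never be smaller than the one at 0.05.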

What can we learn?

After playing around with our app for a while, you’ll have hopefully noticed the V-shaped pattern in the graph that occurs in most instances. This indicates that the proportion of tests that correctly detect the difference between the hypothesized mean being tested and the true mean increases the further the hypothesized mean gets from the true mean. In other words, the further away from the true mean, the more powerful our t-test is. However, we can also say that as the hypothesized mean gets closer to the true mean, the proportion of tests that fail to detect the difference increases. Therefore, our test is less powerful the closer we get to the true mean. Through this Shiny app, you should be able to see that a test’s power depends on the elements of a t-test: the sample size, the true mean, the hypothesized mean being tested, the standard deviation, and the confidence level.
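
The V shape can also be computed directly instead of simulated. The sketch below is one way to do that under the app’s default settings, using the noncentral t distribution that R exposes through the ncp argument of pt(). The function name t_test_power is our own, and this is an illustration of the theory, not something the app itself does.

```r
# Theoretical rejection probability of a two-sided one sample t-test of
# H0: mean = mu0 when the data come from Normal(mu, sigma).
# t_test_power is a hypothetical helper written for this sketch.
t_test_power <- function(mu0, mu = 100, sigma = 5, n = 30, alpha = 0.05) {
  df   <- n - 1
  ncp  <- (mu - mu0) / (sigma / sqrt(n))  # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)           # two-sided critical value
  # Probability that the t statistic lands in either rejection region
  pt(-crit, df, ncp = ncp) + pt(crit, df, ncp = ncp, lower.tail = FALSE)
}

tested_values <- 95:105
power_curve <- sapply(tested_values, t_test_power)
names(power_curve) <- tested_values
print(round(power_curve, 3))
```

The smallest value sits at the true mean, where the rejection probability equals the significance level, and the values climb symmetrically on either side, just like the points in the app.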

Appendix:

# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
#

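library(shiny)

# Simulate `rows` samples of size `n` from Normal(mu, sigma), run a two-sided
# one sample t-test of H0: mean = mu0 on each sample, and return the
# proportion of tests that reject. size = "YES" uses a 0.05 cutoff;
# any other value (the app passes "NO") uses 0.10.
normData <- function(rows=1000,n=3,mu=100,sigma=5,mu0=107,size="YES"){
  samplesize <- n
  x <- rnorm(rows*samplesize,mu,sigma)
  mydata <- matrix(x,nrow=rows)  # one simulated sample per row
  mymeans <- apply(mydata,1,mean)
  mysd <- apply(mydata,1,sd)
  test.stat.num <- mymeans - mu0
  test.stat.denom <- mysd/sqrt(samplesize)
  test.stat <- test.stat.num/test.stat.denom
  pvals <- 2*pt(abs(test.stat),lower.tail=FALSE,df=samplesize-1)
  cutoff <- if (size=="YES") 0.05 else 0.10
  myprop <- sum(pvals < cutoff)/rows
  myprop  # return the proportion without printing it on every call
}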

library(ggplot2)

# Define the UI: input controls in the sidebar, rejection plot in the main panel
ui <- fluidPage(
  # Application title
  titlePanel("Rob Schneider is: The Power of a Test"),
  # Sidebar with the four adjustable inputs
  sidebarLayout(
    sidebarPanel(
      # min = 2 because a standard deviation needs at least two observations
      sliderInput("n","The Sample Size for the Proportions", value = 30, min = 2, max = 100),
      numericInput("mu","The True Value of Mu", value = 100, min = 0),
      numericInput("sigma","The Value of the Standard Deviation",value = 5, min = 0),
      radioButtons("size","Confidence Level Being Used:", choices = list("0.05"="YES","0.10"="NO"))
    ),
    # Show the plot of rejection proportions
    mainPanel(
      plotOutput("anteater")
    )
  )
)

server <- function(input, output) {
  # Rejection proportions for the eleven tested values mu - 5, ..., mu + 5
  normdata <- reactive({
    robert<-function(rangeroo = input$mu){
      val<-c((rangeroo-5):(rangeroo+5))
      gpoints<-numeric(length = length(val))
      # Index by position so that non-integer values of mu also work
      for (k in seq_along(val)) {
        gpoints[k]<-normData(rows=1000,n=input$n,mu=input$mu,sigma=input$sigma,mu0=val[k],size=input$size)
      }
      gpoints
    }
    thingys<-robert(rangeroo = input$mu)
    data.frame(thingys = thingys)
  })
  output$anteater <- renderPlot({
    normdata <- normdata()
    ggplot(data = normdata, aes(x = c((input$mu-5):(input$mu+5)), y = thingys))+
      theme_bw()+
      geom_point(color = "darkblue", size = 2)+
      labs(title = "The Proportion of Times the Null Hypothesis was Rejected",
           subtitle = paste("With the True Value of Mu Being", input$mu),
           caption = "Disclaimer: Rob Schneider is in no way affiliated with this project and should not be contacted to determine the power of a test.",
           x = paste("Values Tested Against Mu = ", input$mu),
           y = "Proportion of Times Mu was Rejected")
  })
}

# Run the application

shinyApp(ui = ui, server = server)
