Sampling error

As part of a research project I was writing R code that samples and resamples data from a given population. I was surprised by how badly sampling error affects small samples and how easy it was to visualise that. Here, I’ve posted some pictures and R code so you can see for yourself.

Ok, sampling error means that the distribution of a sample of observations drawn from a population looks different from the population distribution (this is roughly how Wikipedia puts it). As a rule, the more independent observations you draw, the more the sample data will look like the population data.

In psychology and decision-making research, participants often answer questions like ‘how happy are you’ on a 1-7 scale, where 1 = very unhappy and 7 = very happy. Usually, answers to questions like these are assumed to be normally distributed.

Using R, I drew samples from a distribution that looks like this:

[Figure: bar chart of the population distribution the samples are drawn from]

On the question “how happy are you”, 5% of the population say ‘1’, 10% say ‘2’, 20% say ‘3’, 30% say ‘4’, 20% say ‘5’, 10% say ‘6’, and 5% say ‘7’.
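The population mean of 4 follows directly from these shares. Here is a minimal sketch that checks it; the names `pop_mean`, `pop_var` and `pop_sd` are just my own labels for illustration.

```r
# Population from the post: answers 1-7 with the stated shares
answers <- 1:7
weights <- c(0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)

# Mean of a discrete distribution: sum of value * probability
pop_mean <- sum(answers * weights)

# Variance: probability-weighted squared distance from the mean
pop_var <- sum(weights * (answers - pop_mean)^2)
pop_sd <- sqrt(pop_var)

print(c(mean = pop_mean, variance = pop_var, sd = pop_sd))
```

The shares add up to 1, the mean is 4, the variance is 2.1 and the standard deviation is about 1.45.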

So, using this code (available for copy-paste at the end and for download here) you can play around with how many samples you draw from the population above and how large those samples are. Or change the distribution. Or adjust the color of the bars. Or whatever. Anyway, if you read this on a phone, do not have R installed, or if you are just plain lazy, here are some pictures.

I drew 10 samples of 20 observations from the population and plotted them in the pictures below. I was surprised by how much variation there is between the samples and how different they look from the original distribution. I mean, look at that fourth one.

(Note: I plotted the raw observed frequencies, not percentages.)

[Figures: bar plots of the observed frequencies in each of the 10 samples of N = 20]

Quite a bit of variation, right?

I also drew 1000 samples of N = 20 and saved the means. The population mean is 4, and this is the distribution of the means in those 1000 samples. Look at how often the observed mean differs from 4 by more than 0.5!
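To put a number on "how often", here is a rough sketch that counts the share of sample means more than 0.5 away from 4 and compares it with a simple normal approximation. The seed and the variable names are arbitrary choices of mine, and the approximation ignores that the means can only take values in steps of 0.05, so the two numbers will not match exactly.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
answers <- 1:7
weights <- c(0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)
N <- 20
repetitions <- 1000

# One sample mean per repetition
sample_means <- replicate(
  repetitions,
  mean(sample(answers, N, replace = TRUE, prob = weights))
)

# Share of sample means more than 0.5 away from the population mean of 4
share_far <- mean(abs(sample_means - 4) > 0.5)

# Normal approximation: standard error = population SD / sqrt(N)
pop_sd <- sqrt(sum(weights * (answers - 4)^2))
se <- pop_sd / sqrt(N)
approx_share <- 2 * pnorm(-0.5 / se)

print(c(simulated = share_far, normal_approx = approx_share))
```

With N = 20 the standard error is about 0.32, so a mean that is off by more than 0.5 should turn up in something like one in ten samples.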


You may ask, why choose N = 20? Good question. N = 20 per condition is pretty common in the papers I read. However, if you’ve ever run a power calculation, you know that N = 20 per condition leaves typical decision-making / psychology survey studies underpowered.
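If you have never run one, a power calculation in R can look like the sketch below, using `power.t.test` from the built-in stats package. The effect size (a true difference of half a standard deviation between two conditions) and the 5% significance level are assumptions I picked for illustration, not values taken from any particular study.

```r
# Hypothetical two-condition study with 20 participants per condition.
# delta = 0.5 with sd = 1 is an assumed effect of half a standard deviation.
result <- power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05,
                       type = "two.sample", alternative = "two.sided")
print(result$power)

# Per-condition sample size needed for 80% power under the same assumptions
needed <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
print(ceiling(needed$n))
```

Under these assumptions, the power with N = 20 per condition comes out at roughly a third, and you would need somewhere above 60 participants per condition to reach 80%.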

So, to close, the following pictures show what happens when you draw samples of, let’s say, N = 100. The sample data still do not look exactly like the population data, but they are much, much closer.

[Figures: bar plots of the observed frequencies in each of the 10 samples of N = 100]

Much less variation, right?
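One way to quantify "closer" is to look at the largest gap between the shares observed in a sample and the population shares. The sketch below averages that gap over 1000 samples for each sample size; `max_gap` is a helper I made up for this purpose, and the seed is arbitrary.

```r
set.seed(2)  # arbitrary seed, only so the run is reproducible
answers <- 1:7
weights <- c(0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)

# Largest absolute gap between a sample's observed shares
# and the population shares, for one sample of size N
max_gap <- function(N) {
  sim <- sample(answers, N, replace = TRUE, prob = weights)
  observed <- as.numeric(table(factor(sim, levels = answers))) / N
  max(abs(observed - weights))
}

# Average that gap over 1000 samples for each sample size
gap_20 <- mean(replicate(1000, max_gap(20)))
gap_100 <- mean(replicate(1000, max_gap(100)))

print(c(N20 = gap_20, N100 = gap_100))
```

The average gap for N = 100 should come out clearly smaller than for N = 20, which matches what the pictures show.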

The distribution of means in 1000 samples of N = 100 looks like this. It’s clear (look at the x-axis) that the means differ much less from the true mean (4) than they do in the N = 20 samples.
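The narrower spread is what the standard error of the mean predicts: it equals the population standard deviation divided by the square root of N. A short sketch for the two sample sizes used here (variable names are mine):

```r
answers <- 1:7
weights <- c(0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)

# Population SD of the 1-7 distribution used in this post
pop_mean <- sum(answers * weights)
pop_sd <- sqrt(sum(weights * (answers - pop_mean)^2))

# Standard error of the mean for each sample size
N <- c(20, 100)
se <- pop_sd / sqrt(N)
names(se) <- paste0("N=", N)
print(round(se, 3))

# How many times wider the spread of means is at N = 20 than at N = 100
print(unname(se[1] / se[2]))
```

The standard error drops from about 0.32 to about 0.14. Five times as many observations shrink the spread of the means by a factor of sqrt(5), or about 2.2, not by a factor of 5.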


Finally, here’s the code (or dropbox link to code), have fun and let me know if you have cool adjustments / insights!

# set possible answers, here 1-7
answers <- 1:7

# make 'normal distribution' by setting weights (they must sum to 1)
weights <- c(0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)

# set sample size to be drawn and the number of samples
N <- 20
repetitions <- 1000

# vector that will hold the mean of each sample
means <- numeric(repetitions)

for (k in 1:repetitions) {
  # draw N observations from distribution specified above
  sim <- sample(answers, N, replace = TRUE, prob = weights)

  # plot observed frequencies (note y-axis starts at 0 and limit is N / 2)
  # each plot replaces the previous one; lower 'repetitions' to inspect single samples
  barplot(table(factor(sim, levels = answers)), main = paste0("N = ", N),
          xlab = "Values 1-7", ylab = "Observed frequencies",
          col = "navyblue", ylim = c(0, N / 2))

  # save the mean of repetition k
  means[k] <- round(mean(sim), 2)
}

# plot how often each mean was observed across all samples
barplot(table(means), col = "navyblue",
        main = paste0("Observed means in ", repetitions, " samples of N = ", N))


About Job van Wolferen

PhD-student at TIBER / Tilburg University. Research: Moral hazard and fraud in the insurance industry. Interested in behavioral economics and JDM