Coupon Collecting and Football Stickers

Although Euro 2016 is still months away from officially starting, another footballing tradition has just started: collecting football stickers. For children and adults alike the quest to complete the entire sticker album is something that takes time and money but the excitement and joy makes it worth it.

As the number of stickers in each album have dramatically increased from the first few albums produced around 40 years ago, the task of collecting all of the stickers has become much harder. Mathematically this problem happens to be a classic problem in combinatorial optimisation: the coupon-collector problem. This problem is when there is a finite number of different coupons any of which can be given to a person one at a time. In a statistical sense this is the problem of sampling with replacement. The question is then how many times do you need to sample to obtain a copy of every single sticker?

albumcover2

You probably already have some intuition to the answer. The first sticker you get will be guaranteed to be one that you don’t have, whereas the chance of the same sticker being the one to complete your collection is tiny. Clearly the probability of finding a duplicate increases with the more stickers you own, it takes a short time to find the first few stickers but a very long time to find the last few.

Forming the Problem

The classic way to model the problem is by using the geometric distribution. Suppose that there are a total of n stickers in the album. Then we introduce a geometric random variable h_i that is the number of stickers needed to be purchased to move from having i-1 different stickers to i different stickers in our collection. This means that the total number of stickers we need to purchase to complete the collection H is the sum of all these h_i.

H = h_1 + h_2 + h_3 + \ldots + h_n

We now need to consider what the probability of finding a new sticker actually is. Let’s start by assuming that the probability of getting a sticker is exactly the same as getting any other. If there are a total of N stickers to choose from, then the chance of finding a new sticker the first time is N/N. This is because there are N stickers we need out of a possible N types of stickers. To get a second sticker we have a N-1 stickers left that we need again out of a possible N types of stickers. Similarly for the third sticker we have the N-2 stickers left that we want out of a possible N stickers and so on and so forth.

p_1 = \cfrac{N}{N}, p_2 = \cfrac{N-1}{N}, p_3 = \cfrac{N-2}{N}, \ldots, p_n = \cfrac{N-n+1}{N}

This means that the probability p_i of finding the ith new sticker is (N-i+1)/N. As a result h_i is a geometric random variable with expectation 1/p_i.

By taking into account the independence of the h variables and the linearity of their expectations we can calculate a formula for the expected number of stickers needed to be bought to complete the album, E(H).

E(H) = E(h_1) + E(h_2) + E(h_3) + \ldots + E(h_n) = \cfrac{1}{p_1} + \cfrac{1}{p_2} + \cfrac{1}{p_3} + \ldots + \cfrac{1}{p_n}

If we then substitute the probabilities into the above equation and factorise the equation we produce the result given below.

E(H) = \cfrac{n}{n} + \cfrac{n}{n-1} + \ldots + \cfrac{n}{2} + \cfrac{n}{1} = n(\cfrac{1}{n} + \cfrac{1}{n-1} + \ldots ++ \cfrac{1}{2} +\cfrac{1}{1}) = n \sum\limits_{i=1}^n \cfrac{1}{i}

This result produces an interesting result, it involves the sum of the reciprocals of the first n natural numbers. This is known as the nth harmonic number. It should be noted that this differs from the harmonic series as we are not summing to infinity. This sequence crops up in many interesting scenarios, but we are interested in finding its value. It turns out that there is no general formula to find harmonic numbers, we have to use asymptotic expansions or approximations to find it.

Asymptotic Expansion

The simplest way to illustrate this is to look at its relation with the natural logarithm and the area under the graph of y=\cfrac{1}{x}.

If we look at H_n - \cfrac{1}{n} = 1 + \cfrac{1}{2} + \ldots + \cfrac{1}{n-1} we can see that this is an overestimation of the area beneath the graph by using rectangles.

above1overx

Overestimate; source

If we also useH_n -1 = \cfrac{1}{2} + \cfrac{1}{3} + \ldots + \cfrac{1}{n} we can see that this turns out to be an underestimation of the area.

below1overx

Underestimate; source

 

As a result the true value of H_n is bounded by these two values,

H_n - 1 < \int\limits_1^n \frac{1}{x} dx < H_n - \cfrac{1}{n}

\rightarrow \ln{n} + \cfrac{1}{n} < H_n < \ln{n} + 1

It is clear that the natural logarithm has something to do with this number, which was proved later by Euler who found that the difference between a harmonic number and the natural logarithm was bounded by a constant.

\text{lim}_{n \rightarrow \infty} (H_n - \ln{n}) = \gamma

This constant \gamma is known as the Euler-Mascheroni constant and has the value of \gamma \approx 0.5772156649.

Finding Answers

Rearranging this sequence and multiplying it by n yields the answer to earlier equation.

E(H) = nH_n \approx n(\ln{n} + \gamma)

As the current Euro 2016 sticker album contains 680 stickers, we see that using this value for n that the expected number of stickers needed is about 4827. Since stickers are bought in packets of 5 and each packet costs 50p, if you wanted to complete the album by only purchasing packets it would most likely take 966 packets and cost you £483.

The real life problem is slightly more tricky since every sticker in a pack is supposed to be different. This shouldn’t change the answer significantly, 5 stickers out of 660 is pretty small. The correct analytical answer is given below.

\text{Expected Number of Packets} = {{680}\choose{5}} \sum\limits_{j=1}^{680}(-1)^{j+1} \cfrac{{{680}\choose{5}}}{{{680}\choose{5}} + {{680-j}\choose{5}}}

Swapping Stickers

Traditionally people who want to complete their album don’t just rely on buying packets, but also swap stickers with other people. By swapping in person or over the internet it is possible to reduce the number of packets needed by turning duplicates into missing stickers.

swaps2

Swapping stickers with my cool sister Hannah

Sardy and Velenik did numerical computation to determine the cheapest way of completing a collection. First you should buy a box of 100 packets to get about 500 different stickers. Then you should buy packets and swap stickers until you have about 50 stickers to go. These final 50 can be ordered directly to complete the album. As a result the cost of this album could be as low as £85 depending on the price of the box.

optimalswapping

Swapping reduces the number of sticker needed; source

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s