Santa. Secret Santa.

python simulation

With “Spooky Season” coming to an end, Holiday Season is up next. If you have done Secret Santa rounds before, you might have wondered, what the odds are that someone draws his or her own name from the lots.

Christian A. Gebhard https://twitter.com/c_gebhard (jolly data blog)https://jollydata.blog/about.html
2021-11-04

Introduction

Secret Santa is a game related to the tradition of giving gifts to one another on Christmas, with the twist that you (usually) don’t know who gave you the gift. In some regions the game is quite popular and is played in many communities, sports groups, school classes,…

At the time of writing, there are Wikipedia articles on Secret Santa traditions available in 11 languages. I haven’t read all of them, but it is clear, that there are many different variations across the world. In Germany this game is called “Wichteln”. There’s also a variation called “Schrottwichteln” where you ought to give something that is worthless to you, but might be of value for someone else.

For this post I will use the variant that I got to know as the “classic version” and will build the mathematical question on these rules:

A group of N participants draws lots from a bucket. On each lot is the name of one participant. Each name is included just once, so that the number of lots matches the number of participants. Each participant draws one lot. If any of the participants draws their own name, the draw is invalid and the whole process has to be repeated.
Q: What is the probability, that a draw is valid for any group size N?1

What’s your intuition?

If you think about it from an intuitive point of view, there are two contradicting effects:

  1. As more people participate, each one has a larger “pool” of lots to draw from, so the probability is lower to end up with one’s own name2
  2. As more players participate, the probability of at least one of them drawing their own name increases.3

Before you continue reading, think about what your intuition says: Do larger groups have to repeat the drawing more often or less often? If you want, you can leave your comment in the end and let me know if the result surprised you or not!

Let’s do a simulation first

First let’s create the functions that perform the necessary simulations of drawings for different group sizes and evaluate the correctness of the draws.

Show code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random

np.random.seed(1224)

Basic simulation

draw_lots() simply returns a random permutation of a sequence from 1 to the group size.

Show code
def draw_lots(n_people = 5):
    """ 
    Takes a number of people and returns the simulated frequency of correct drawing rounds.
    

    Parameters
    ----------
    n_people : int, optional
        The number of people in a wichtel-round. The default is 5.

    Returns
    -------
    Array of people (arange from 0 to n_people).
    Array of lots (random choices without replacement).

    """
    people = np.arange(n_people)
    lots = np.random.choice(people,size = len(people), replace=False)  
    return people, lots

verify_draft() checks, whether any of the participants drew herself/himself

Show code
def verify_draft(people, lots):
    """
    Take two vectors of equal length and returns a boolean whether the draft was a correct wichtel-draft.

    Parameters
    ----------
    people : 1D-Array
    Array of people (arange from 0 to n_people).
    
    lots : 1D-Array
    Vector of lots (random choices without replacement).

    Returns
    -------
    True if the round was valid, false, if not.

    """

    # compare participants with the lot they drew
    check = people == lots

    # check if no one drew the own name
    valid_round = not any(check)

    # return true or false
    return valid_round

sim_wichteln() uses the above two functions to simulate multiple draws for a given group and returns the fraction of correct draws.

Show code
def sim_wichteln(n_people = 10, rounds = 100):
    """
    Simulate rounds drawings of lots for one group of size n_people participants, returns mean fraction of correct drawings.

    Parameters
    ----------
    n_people : int, optional
        Number of participants. The default is 10.
    rounds : int, optional
        Number of rounds to simulate. The default is 100.

    Returns
    -------
    Float: fraction of correct drafts.

    """
    results = []
    
    for i in range(rounds):
        people, lots = draw_lots(n_people)
        valid_round = verify_draft(people, lots)

        results.append(valid_round)
        
    return np.mean(results)

sim_groups() repeatedly calls sim_wichteln() with increasing group sizes from 1 to the specified max_group.

Show code
def sim_groups(max_group = 1000, rounds = 1000):
    """
    Simulate 'rounds' times drawings of lots for all group sizes from 1 to max_group.

    Parameters
    ----------
    max_group : int, optional
        Maximum number of participants. The default is 1000.
    rounds : int, optional
        Number of rounds to simulate for each group size. The default is 1000.

    Returns
    -------
    pd.DataFrame(): A dataframe containing the group sizes, the simulated rounds for each group size and the fraction of valid draws.

    """
    
    results_df = pd.DataFrame()
    
    for n_people in range(1, max_group):
        results = sim_wichteln(n_people, rounds)
        results_df = results_df.append({'people':n_people, 'rounds':rounds, 'valid':results}, ignore_index =True)
    
    return results_df

Let’s have a look at how the fraction of correct draws is simulated for groups of size 1 to 50, where each group is simulated 100 times:

Show code
simulated_groups = sim_groups(max_group = 50, rounds = 100)

fig, ax0 = plt.subplots(1, 1, figsize=(8, 5))
plt.suptitle("Fraction of valid draws from 100 simulated\nrounds in different group sizes", fontsize=18, y=0.95)

sns.scatterplot(x = simulated_groups.people, y = simulated_groups.valid, ax=ax0)
ax0.set_xlabel("Group size [n people]")
ax0.set_ylabel("Fraction of valid rounds in 100 simulated rounds")
sns.despine()

plt.show()

As you can see in the graph, there is at least one outlier: n = 1 (fraction of 0). But most other values are distributed between 0.3 and 0.5. The summary statistics of the simulated numbers4 are:

Show code
print(simulated_groups[simulated_groups.people > 3].describe())
#>           people  rounds      valid
#> count  46.000000    46.0  46.000000
#> mean   26.500000   100.0   0.363261
#> std    13.422618     0.0   0.054977
#> min     4.000000   100.0   0.230000
#> 25%    15.250000   100.0   0.330000
#> 50%    26.500000   100.0   0.365000
#> 75%    37.750000   100.0   0.390000
#> max    49.000000   100.0   0.460000

How are the values distributed?

Is there a trend, as the number of participants grows? To analyse this and to narrow down the distribution, we will now repeat the simulation for a larger max_group size, and we will increase the number of simulated rounds per group. For this we’ll use another function sim_steps() that calls the previously described functions. For each step a histogram of the resulting fractions of correct draws is plotted, showing a narrowing distribution.

Show code

def sim_steps(max_group = 1000, round_steps = [10, 100, 1000]):
  """
  Perform the simulation with different simulations per group size.

  Parameters
  ----------
 max_group : int, optional
      Maximum number of participants. The default is 1000.
  rounds : list, optional
      List with differnt steps of simulaiton rounds.

  Returns
  -------
  None. Plots are dynamically generated within the function, no value is returned.

  """
  
  fig, axes = plt.subplots(len(round_steps), 1, figsize=(8, 5*len(round_steps)), sharex=True)
  plt.subplots_adjust(hspace=0.5)
  plt.suptitle(f"Distribution of the fraction of valid draws\nin groups of 1 to {max_group} participants", fontsize=18, y=0.95)


  for step, ax in zip(round_steps, axes):
    sim_df = sim_groups(max_group, step)
    
    sns.histplot(sim_df['valid'], ax=ax)
    ax.set_title(f"Simulating {step} draws at each group size.")
    sns.despine()
    ax.set_xlabel(f"Mean fraction: {np.round(sim_df['valid'].mean(), 3)}, SD: {np.round(sim_df['valid'].std(), 3)}.")
    
  plt.show()

  return None

Let’s call the new function with increasing orders of magnitude:

Show code

sim_steps(max_group = 1000, round_steps = [10, 100, 1000, 5000])

The stepwise increase demonstrates, that the fraction of correct draws narrows down somewhere around 0.36. So it seems, that the fraction is not trending towards 1 or 0, but rather approaches a certain value somewhere below 0.4. If this simulation is sufficiently reliable for you, you can take this figure as an estimate for the next Secret Santa group you participate in and skip the rest.

However, if you’re into math, you might recognize the direction where this is going: isn’t 0.36something close to…yes it is!

The actual math is beautiful

The necessary keyword, which describes the underlying math is “derangements”. By definition, a “derangement” is a permutation without fixed points, i.e. no element appears in its previous position. Applied to the example of this post, someone drawing his or her own name would be such a fixed point. Hence, a valid round, where this does not happen, is a proper “derangement”.

When \(n!\) is the factorial of a set of size \(n\), then the number of derangements is the subfactorial \(!n\).

The number of derangements for a given set of size \(n\) can be described as:

\[!n=n!\cdot\sum_{i=0}^{n}\frac{(-1)^i}{i!} \text{ for } n\geq 1\]

The fraction of derangements can then be described as:

\[p_n = \frac{!n}{n!} = \sum_{i=0}^{n}\frac{(-1)^i}{i!}\]

If you increase \(n\) to infinity, \(p_n\) asymptotically approaches the following:

\[\lim_{n \to \infty} p_n = \sum_{i=0}^{\infty}\frac{(-1)^i}{i!} = \frac{1}{e}\]

where \(e\) is “Euler’s number”5.

This means that the fraction of valid draws for Secret Santa groups tends towards \(\frac{1}{e}\sim 0,3678794\). However this only applies for groups of at least 4 people. In smaller communities the numbers are different.

For groups of size 1 to 10 the fractions of valid derangements are as follows6

Show code
ffp <- tidyr::tribble(
  ~n, ~subfactorials, ~factorials, ~fraction,
  0, 1, 1, 1,
  1, 0, 1, 0,
  2, 1, 2, 0.5,
  3, 2, 6, 0.3333333,
  4, 9,24, 0.375,
  5, 44, 120, 0.36666666,
  6, 265, 720, 0.36805555,
  7, 1854, 5040, 0.36785714,
  8, 14833,  40320,  0.36788194,
  9,  133496,  362880,  0.36787918,
  10,  1334961,  3628800,  0.36787946
)

ffp |>
  knitr::kable() |>
  kableExtra::kable_paper()
n subfactorials factorials fraction
0 1 1 1,0000000
1 0 1 0,0000000
2 1 2 0,5000000
3 2 6 0,3333333
4 9 24 0,3750000
5 44 120 0,3666667
6 265 720 0,3680556
7 1854 5040 0,3678571
8 14833 40320 0,3678819
9 133496 362880 0,3678792
10 1334961 3628800 0,3678795

As you can see, for \(n=1\), a derangement is impossible. For \(n=2\) the fraction is every other game, while for \(n=3\) the probability is \(\frac{1}{3}\).

Summary and comments

If you’re planning on starting a Secret Santa round with infinitely many people, you can expect a valid draw of lots in \(\frac{1}{e}\sim 0,3678794\). If you’re playing it in groups of at least 4 participants, the probability that no one draws a lot with her or his own name on is roughly 37%, no matter, how many people participate.

Thanks for reading! Did this meet your intuition, that you had in the beginning? Leave a reply below if so, but also if not so or if you have made different experiences in your Secret Santa rounds!


  1. This simple riddle came across my mind, when I was invited to join a Wichtelgruppe in 2020. As I couldn’t find a convincing intuitive solution and my school combinatorics knwoledge wasn’t quite sufficient, I went for a simulation to get a feeling for the numbers. I later discussed this with a friend working in maths, who eventually pointed me to the correct equations to describe the problem properly.↩︎

  2. (and vice versa for smaller groups)↩︎

  3. (and vice versa for smaller groups)↩︎

  4. for groups of n >= 4, see next chapter for explanation↩︎

  5. ~2.718↩︎

  6. due to time constraints I’m taking these numbers from the German Wikipedia article on derangements.↩︎

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gebhard (2021, Nov. 4). jolly data: Santa. Secret Santa.. Retrieved from https://jollydata.blog/posts/2021-11-02-secret-santa/

BibTeX citation

@misc{gebhard2021santa.,
  author = {Gebhard, Christian A.},
  title = {jolly data: Santa. Secret Santa.},
  url = {https://jollydata.blog/posts/2021-11-02-secret-santa/},
  year = {2021}
}