Statistics Homework 7 — Server Security Random Walk Simulation

Problem Statement

Consider a server that receives weekly security updates for \(n\) weeks. There are \(m\) attackers who can breach the system with probability \(p\) each week.

We model the server's security as a random walk:

If the server remains secure in a week: score increases by +1
If the server becomes compromised in a week: score decreases by −1

Starting from an initial score of 0, after \(n\) weeks the cumulative score represents the server's overall security trajectory.

Task: Simulation and Analysis

1. Generate Random Trajectories

Simulate \(m\) independent random walk trajectories, each representing one possible security history over \(n\) weeks. For each week \(t = 1, 2, \ldots, n\):

With probability \(p\): server is breached → add −1 to score
With probability \(q = 1 - p\): server remains secure → add +1 to score

Each trajectory is a sequence \((S_0, S_1, S_2, \ldots, S_n)\) where \(S_0 = 0\) and \(S_t = S_{t-1} \pm 1\).

2. Visualize Trajectories

Draw all \(m\) trajectories on a single plot (spaghetti plot) showing how different security histories evolve over time. Observe how the trajectories fan out from the origin and how their endpoints distribute across possible final scores.

3. Count Final Scores

For each possible final score \(s \in \{-n, -n+2, \ldots, n-2, n\}\), count how many of the \(m\) trajectories end at that score. Create a histogram showing this empirical distribution.

4. Compare with Binomial Distribution

The final score \(S_n\) after \(n\) weeks is related to a binomial distribution. If \(K\) is the number of secure weeks (out of \(n\) total weeks), then: \[ S_n = 2K - n \] where \(K \sim \text{Binomial}(n, q)\) with \(q = 1 - p\).

The theoretical probability that \(S_n = 2k - n\) (i.e., exactly \(k\) secure weeks) is: \[ P(S_n = 2k - n) = \binom{n}{k} (1-p)^k \cdot p^{n-k} \]

Convergence Task: As \(n\) and \(m\) increase, observe how the empirical histogram converges to this theoretical binomial distribution. Try different values of \(n\) (20, 50, 100, 200) and \(m\) (50, 100, 200, 500) to see the Law of Large Numbers in action.

Expected Behavior

The expected final score after \(n\) weeks is: \[ \mathbb{E}[S_n] = n(1 - 2p) \]

If \(p < 0.5\) (server usually secure): trajectories drift upward, \(\mathbb{E}[S_n] > 0\)
If \(p = 0.5\) (symmetric case): trajectories have no drift, \(\mathbb{E}[S_n] = 0\)
If \(p > 0.5\) (server usually breached): trajectories drift downward, \(\mathbb{E}[S_n] < 0\)

The variance is: \[ \text{Var}(S_n) = 4np(1-p) \]

Note: For large \(n\), the distribution of \(S_n\) approaches a Normal distribution with mean \(n(1-2p)\) and variance \(4np(1-p)\), as predicted by the Central Limit Theorem.

Interactive Simulation

Use the controls below to configure the simulation parameters and generate random walk trajectories. Experiment with different values to observe how the probability \(p\) affects the distribution of security outcomes.

Simulation Parameters

m — Trajectories

Number of random walks to simulate

n — Weeks

Duration of each trajectory (time steps)

p — Breach Probability

Likelihood of security breach per week

Seed — Randomness

Random number generator seed

Simulation Statistics

Configure parameters and click "Generate Trajectories" to compute statistics.

Trajectory Visualization

This plot shows all \(m\) simulated trajectories. Each line represents one possible security history over \(n\) weeks. The horizontal axis is time (weeks), and the vertical axis is the cumulative security score \(S_t\).

Final Score Distribution

This histogram shows how many trajectories ended at each possible final score. The bars are the observed counts from your simulation. The red curve shows the theoretical binomial distribution. As \(m\) and \(n\) increase, the histogram should converge to the red curve.

Generate trajectories to see exact counts

Observations and Questions

Click on each question to reveal the answer and deepen your understanding.

Answer: Yes! This demonstrates the Law of Large Numbers. As \(m\) increases, the empirical distribution (histogram) converges to the theoretical binomial distribution (red curve). With \(m = 50\), noticeable deviations may be observed. With \(m = 500\), the match becomes much tighter. This occurs because larger sample sizes reduce random sampling variability.

Try it: Run with \(m = 50\), then \(m = 200\), then \(m = 500\). Watch how the bars align more closely with the red curve.

Answer: The hack probability \(p\) controls the drift (directional bias) of the random walk:

\(p < 0.5\) (more secure): Distribution shifts right (positive scores). Expected value: \(\mathbb{E}[S_n] = n(1-2p) > 0\)
\(p = 0.5\) (symmetric): Distribution centered at 0. No drift. \(\mathbb{E}[S_n] = 0\)
\(p > 0.5\) (more hacks): Distribution shifts left (negative scores). \(\mathbb{E}[S_n] < 0\)

Try it: Run with \(p = 0.3\), \(p = 0.5\), and \(p = 0.7\). Notice how the peak moves.

Answer: As \(n\) increases, the distribution becomes wider and flatter. This is because:

More time steps = more opportunities for random variation
Standard deviation grows as \(\text{Std}(S_n) = 2\sqrt{np(1-p)}\), which increases with \(\sqrt{n}\)
The range of possible outcomes expands: \(S_n \in [-n, n]\)

Try it: Run with \(n = 20\), \(n = 50\), \(n = 100\). Watch the distribution spread horizontally while maintaining its binomial shape.

Answer: Yes! When \(p = 0.5\), the random walk is perfectly symmetric around 0. This is because:

Each step has equal probability of going up (+1) or down (−1)
For every trajectory ending at \(+k\), there's an equally likely trajectory ending at \(−k\)
Mathematically: \(P(S_n = k) = P(S_n = -k)\) for all \(k\)

The expected value \(\mathbb{E}[S_n] = n(1-2p) = 0\) when \(p = 0.5\), confirming zero drift.

Try it: Set \(p = 0.5\) and run multiple times. The distribution should mirror itself perfectly across the vertical line at 0.

Answer: Notice that all possible final scores have the same parity as \(n\):

If \(n\) is even: possible scores are ..., −4, −2, 0, +2, +4, ... (all even)
If \(n\) is odd: possible scores are ..., −3, −1, +1, +3, ... (all odd)

Why? Starting at \(S_0 = 0\), each step changes the score by ±1. After \(n\) steps:

If \(k\) weeks are secure (+1) and \(n-k\) are hacked (−1), then \(S_n = k - (n-k) = 2k - n\)
Since \(2k\) is always even, \(S_n\) has the same parity as \(n\)

Try it: Run with \(n = 50\) (even) and verify only even scores appear. Then try \(n = 51\) (odd) and see only odd scores.

For deeper theoretical analysis of this random walk, including its relationship to Pascal's triangle, binomial coefficients, and the Fibonacci sequence, see Homework 8.

References

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Volume I (3rd ed.). John Wiley & Sons.
Lawler, G. F. & Limic, V. (2010). Random Walk: A Modern Introduction. Cambridge University Press.
Ross, S. M. (2019). Introduction to Probability Models (12th ed.). Academic Press.