Statistics Homework 8 — Random Walk Theory & Combinatorics

Overview: Connecting Random Walks, Binomial Distributions, and Combinatorics

In Homework 7, we simulated server security as a random walk and observed how trajectory endpoints converge to a binomial distribution. In this homework, we dive deep into the mathematical theory behind these observations.

We will explore:

1. Analogies with Homework 4: Bernoulli Processes

Homework 4 Recap: \(\{0, 1\}\) Encoding

In Homework 4, we studied the Law of Large Numbers using Bernoulli trials \(Y_t \in \{0, 1\}\) with \(P(Y_t = 1) = p\).

Homework 7: \(\{-1, +1\}\) Encoding

In Homework 7, we used \(X_t \in \{-1, +1\}\) to represent security outcomes:

The cumulative sum \(S_n = \sum_{t=1}^{n} X_t\) represents a signed displacement from the origin, ranging from \(-n\) to \(+n\).

Relationship Between the Two Encodings

The two representations are affinely related: \[ S_n = \sum_{t=1}^{n} X_t = \sum_{t=1}^{n} (2Y_t - 1) = 2\sum_{t=1}^{n} Y_t - n = 2K_n - n \] where \(K_n = \sum_{t=1}^{n} Y_t\) is the number of secure weeks (successes in Bernoulli trials).

Equivalence: If \(K_n = k\), then \(S_n = 2k - n\). Thus:

Key Differences

Aspect HW4: \(\{0,1\}\) Encoding HW7: \(\{-1,+1\}\) Encoding
Interpretation Count of successes Signed displacement (random walk position)
Range \(0\) to \(n\) \(-n\) to \(+n\)
Mean \(np\) \(n(1-2p)\)
Symmetry Skewed (unless \(p=0.5\)) Centered at origin, symmetric when \(p=0.5\)
Parity All integers \(0, 1, 2, \ldots, n\) Only same parity as \(n\): \(-n, -n+2, \ldots, n-2, n\)
Geometric View Accumulation on \([0, n]\) Symmetric walk around 0

Insight: The \(\{-1, +1\}\) encoding naturally emphasizes displacement and reflection symmetry, making it ideal for random walk analysis and path counting problems.

2. Combinatorial Enumeration and Binomial Coefficients

Total Number of Trajectories

Since each of the \(n\) steps can independently be either \(+1\) or \(-1\), the total number of distinct \(n\)-step random walk trajectories is: \[ \text{Total trajectories} = 2^n \] This exponential growth reflects the fact that the trajectory space corresponds to all binary sequences of length \(n\).

Trajectories Ending at a Specific Position

To end at position \(S_n = s\), where \(s = 2k - n\), we need exactly:

The number of ways to choose which \(k\) steps out of \(n\) are \(+1\) is the binomial coefficient: \[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]

Verification: Sum Over All Positions

Summing over all possible values of \(k\) (from 0 to \(n\)): \[ \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This confirms that the binomial coefficients enumerate all possible trajectories.

Probability Distribution

If each step is \(+1\) with probability \(q = 1-p\) and \(-1\) with probability \(p\), then: \[ P(S_n = 2k - n) = \binom{n}{k} q^k p^{n-k} \] This is the binomial distribution for \(K_n \sim \text{Binomial}(n, q)\), transformed to the random walk scale.

3. Pascal's Triangle: Recursive Structure

Definition and Construction

Pascal's triangle (also called Tartaglia's triangle) is a triangular array where each entry is a binomial coefficient: \[ \begin{array}{ccccccccc} & & & & 1 & & & & \\ & & & 1 & & 1 & & & \\ & & 1 & & 2 & & 1 & & \\ & 1 & & 3 & & 3 & & 1 & \\ 1 & & 4 & & 6 & & 4 & & 1 \end{array} \]

Row \(n\) contains the coefficients \(\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}\).

Pascal's Identity (Recursive Formula)

Each entry is the sum of the two entries above it: \[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \]

Combinatorial Interpretation: A trajectory ending at position \(2k-n\) after \(n\) steps is formed by:

Row Sums

The sum of row \(n\) is: \[ \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This confirms the total number of \(n\)-step trajectories.

Interactive Pascal's Triangle

Enter a value of \(n\) (between 1 and 10) to visualize Pascal's triangle up to row \(n\):

Click "Generate Triangle" to display Pascal's triangle.

4. Binomial Expansion and the Binomial Theorem

The Binomial Theorem

For any real numbers \(x\) and \(y\), and non-negative integer \(n\): \[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \]

Special Cases

Case 1: \(x = y = 1\)

\[ (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This gives the total number of trajectories.

Case 2: \(x = 1, y = -1\)

\[ (1 - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} = 0 \quad \text{for } n \geq 1 \] This shows the alternating sum property: binomial coefficients at even and odd positions cancel out.

Case 3: Probability Generating Function

In the random walk context with probabilities \(q\) (secure) and \(p\) (breach): \[ (q + p)^n = \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} = 1 \] since \(q + p = 1\). This confirms that probabilities sum to 1.

Connection to Random Walk

The binomial expansion directly gives the probability distribution of final positions: \[ \mathbb{E}[z^{S_n}] = \mathbb{E}[z^{2K_n - n}] = z^{-n}(qz^2 + p)^n \] where \(K_n \sim \text{Binomial}(n, q)\). Setting \(z = 1\) recovers the sum of probabilities.

5. Fibonacci Sequence and Diagonal Sums

Fibonacci Numbers

The Fibonacci sequence is defined recursively: \[ F_0 = 0, \quad F_1 = 1, \quad F_{n} = F_{n-1} + F_{n-2} \quad \text{for } n \geq 2 \] This gives: \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots\)

Connection to Pascal's Triangle

The Fibonacci numbers appear as diagonal sums in Pascal's triangle: \[ F_{n+1} = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k} \]

Visually, sum the entries along the "rising diagonals" (going up-right) in Pascal's triangle:

Combinatorial Interpretation

\(F_{n+1}\) counts the number of ways to tile a \(1 \times n\) board with \(1 \times 1\) and \(1 \times 2\) tiles. This is related to random walks where certain step patterns (consecutive \(+1\) steps) are restricted.

Interactive Fibonacci Calculator

Compute Fibonacci numbers and verify the Pascal's triangle relationship:

Enter a value and click "Compute F(n)".

6. Combinatorial Coefficients and Symmetries

Symmetry Property

The binomial coefficients are symmetric: \[ \binom{n}{k} = \binom{n}{n-k} \] Interpretation: The number of trajectories with \(k\) secure weeks equals the number with \(n-k\) breached weeks. This reflects the symmetry of choosing which weeks are secure vs. which are breached.

Hockey Stick Identity

Summing along a diagonal: \[ \sum_{k=0}^{r} \binom{k}{m} = \binom{r+1}{m+1} \]

Vandermonde's Identity

For non-negative integers \(m, n, r\): \[ \binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{k} \binom{n}{r-k} \] Interpretation: Counting paths in a 2D grid by breaking them into two segments.

Catalan Numbers

The \(n\)-th Catalan number is: \[ C_n = \frac{1}{n+1}\binom{2n}{n} = \binom{2n}{n} - \binom{2n}{n+1} \] Catalan numbers count random walk trajectories from \((0,0)\) to \((2n,0)\) that never go below the x-axis (ballot problem, Dyck paths).

7. Probability Insights and Expected Values

Expected Final Position

From the binomial distribution with \(K_n \sim \text{Binomial}(n, q)\) where \(q = 1-p\): \[ \mathbb{E}[S_n] = \mathbb{E}[2K_n - n] = 2nq - n = n(2q - 1) = n(1 - 2p) \]

Variance

\[ \text{Var}(S_n) = \text{Var}(2K_n) = 4\text{Var}(K_n) = 4nq(1-q) = 4np(1-p) \] The spread increases with \(\sqrt{n}\), typical of random walk behavior.

Central Limit Theorem

For large \(n\), the distribution of \(S_n\) approaches a Normal distribution: \[ S_n \sim \mathcal{N}\big(n(1-2p), 4np(1-p)\big) \quad \text{approximately for large } n \]

8. Summary of Key Relationships

Concept Formula / Relationship Significance
Encoding Conversion \(S_n = 2K_n - n\) Links \(\{0,1\}\) and \(\{-1,+1\}\) representations
Total Trajectories \(2^n\) Exponential growth of trajectory space
Trajectories to Position \(s\) \(\binom{n}{k}\) where \(s = 2k-n\) Combinatorial enumeration via binomial coefficients
Pascal's Identity \(\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}\) Recursive structure of path counting
Binomial Theorem \((x+y)^n = \sum \binom{n}{k} x^k y^{n-k}\) Algebraic expansion generates probabilities
Fibonacci in Pascal \(F_{n+1} = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k}\) Diagonal sums reveal Fibonacci sequence
Expected Position \(\mathbb{E}[S_n] = n(1-2p)\) Drift determined by asymmetry (\(p \neq 0.5\))
Variance \(\text{Var}(S_n) = 4np(1-p)\) Spread grows as \(\sqrt{n}\)

9. Practical Implications

For Security Analysis

For Algorithm Design

For Statistical Learning

References