Statistics Homework 8 — Random Walk Theory & Combinatorics
Overview: Connecting Random Walks, Binomial Distributions, and Combinatorics
In Homework 7, we simulated server security as a random walk and observed how trajectory endpoints converge to a binomial distribution. In this homework, we dive deep into the mathematical theory behind these observations.
We will explore:
- Analogies with Homework 4 (Bernoulli process and Law of Large Numbers)
- The relationship between \(\{0, 1\}\) and \(\{-1, +1\}\) encodings
- Combinatorial enumeration using binomial coefficients
- Pascal's triangle and its recursive structure
- Binomial expansion and the Binomial Theorem
- Connections to the Fibonacci sequence
- Combinatorial interpretations and path counting
1. Analogies with Homework 4: Bernoulli Processes
Homework 4 Recap: \(\{0, 1\}\) Encoding
In Homework 4, we studied the Law of Large Numbers using Bernoulli trials \(Y_t \in \{0, 1\}\) with \(P(Y_t = 1) = p\).
- Cumulative sum: \(\sum_{t=1}^{n} Y_t\) counts the number of successes (range: \(0\) to \(n\))
- Relative frequency: \(f_n = \frac{1}{n}\sum_{t=1}^{n} Y_t\) converges to \(p\) as \(n \to \infty\)
- Distribution: The sum follows a Binomial\((n, p)\) distribution
Homework 7: \(\{-1, +1\}\) Encoding
In Homework 7, we used \(X_t \in \{-1, +1\}\) to represent security outcomes:
- \(X_t = +1\) if server remains secure (probability \(q = 1-p\))
- \(X_t = -1\) if server is breached (probability \(p\))
The cumulative sum \(S_n = \sum_{t=1}^{n} X_t\) represents a signed displacement from the origin, ranging from \(-n\) to \(+n\).
Relationship Between the Two Encodings
The two representations are affinely related: \[ S_n = \sum_{t=1}^{n} X_t = \sum_{t=1}^{n} (2Y_t - 1) = 2\sum_{t=1}^{n} Y_t - n = 2K_n - n \] where \(K_n = \sum_{t=1}^{n} Y_t\) is the number of secure weeks (successes in Bernoulli trials).
Equivalence: If \(K_n = k\), then \(S_n = 2k - n\). Thus:
- \(k = 0\) (no secure weeks) \(\Rightarrow S_n = -n\) (maximum breach)
- \(k = n\) (all weeks secure) \(\Rightarrow S_n = +n\) (maximum security)
- \(k = n/2\) (half secure) \(\Rightarrow S_n = 0\) (balanced)
Key Differences
| Aspect | HW4: \(\{0,1\}\) Encoding | HW7: \(\{-1,+1\}\) Encoding |
|---|---|---|
| Interpretation | Count of successes | Signed displacement (random walk position) |
| Range | \(0\) to \(n\) | \(-n\) to \(+n\) |
| Mean | \(np\) | \(n(1-2p)\) |
| Symmetry | Skewed (unless \(p=0.5\)) | Centered at origin, symmetric when \(p=0.5\) |
| Parity | All integers \(0, 1, 2, \ldots, n\) | Only same parity as \(n\): \(-n, -n+2, \ldots, n-2, n\) |
| Geometric View | Accumulation on \([0, n]\) | Symmetric walk around 0 |
Insight: The \(\{-1, +1\}\) encoding naturally emphasizes displacement and reflection symmetry, making it ideal for random walk analysis and path counting problems.
2. Combinatorial Enumeration and Binomial Coefficients
Total Number of Trajectories
Since each of the \(n\) steps can independently be either \(+1\) or \(-1\), the total number of distinct \(n\)-step random walk trajectories is: \[ \text{Total trajectories} = 2^n \] This exponential growth reflects the fact that the trajectory space corresponds to all binary sequences of length \(n\).
Trajectories Ending at a Specific Position
To end at position \(S_n = s\), where \(s = 2k - n\), we need exactly:
- \(k\) steps of \(+1\) (secure weeks)
- \(n - k\) steps of \(-1\) (breached weeks)
The number of ways to choose which \(k\) steps out of \(n\) are \(+1\) is the binomial coefficient: \[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]
Verification: Sum Over All Positions
Summing over all possible values of \(k\) (from 0 to \(n\)): \[ \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This confirms that the binomial coefficients enumerate all possible trajectories.
Probability Distribution
If each step is \(+1\) with probability \(q = 1-p\) and \(-1\) with probability \(p\), then: \[ P(S_n = 2k - n) = \binom{n}{k} q^k p^{n-k} \] This is the binomial distribution for \(K_n \sim \text{Binomial}(n, q)\), transformed to the random walk scale.
3. Pascal's Triangle: Recursive Structure
Definition and Construction
Pascal's triangle (also called Tartaglia's triangle) is a triangular array where each entry is a binomial coefficient: \[ \begin{array}{ccccccccc} & & & & 1 & & & & \\ & & & 1 & & 1 & & & \\ & & 1 & & 2 & & 1 & & \\ & 1 & & 3 & & 3 & & 1 & \\ 1 & & 4 & & 6 & & 4 & & 1 \end{array} \]
Row \(n\) contains the coefficients \(\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}\).
Pascal's Identity (Recursive Formula)
Each entry is the sum of the two entries above it: \[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \]
Combinatorial Interpretation: A trajectory ending at position \(2k-n\) after \(n\) steps is formed by:
- Taking a trajectory ending at \(2(k-1)-(n-1) = 2k-n\) after \(n-1\) steps, then adding a \(+1\) step
- OR taking a trajectory ending at \(2k-(n-1) = 2k-n+1\) after \(n-1\) steps, then adding a \(-1\) step
Row Sums
The sum of row \(n\) is: \[ \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This confirms the total number of \(n\)-step trajectories.
Interactive Pascal's Triangle
Enter a value of \(n\) (between 1 and 10) to visualize Pascal's triangle up to row \(n\):
Click "Generate Triangle" to display Pascal's triangle.
4. Binomial Expansion and the Binomial Theorem
The Binomial Theorem
For any real numbers \(x\) and \(y\), and non-negative integer \(n\): \[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \]
Special Cases
Case 1: \(x = y = 1\)
\[ (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k} = 2^n \] This gives the total number of trajectories.Case 2: \(x = 1, y = -1\)
\[ (1 - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} = 0 \quad \text{for } n \geq 1 \] This shows the alternating sum property: binomial coefficients at even and odd positions cancel out.Case 3: Probability Generating Function
In the random walk context with probabilities \(q\) (secure) and \(p\) (breach): \[ (q + p)^n = \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} = 1 \] since \(q + p = 1\). This confirms that probabilities sum to 1.
Connection to Random Walk
The binomial expansion directly gives the probability distribution of final positions: \[ \mathbb{E}[z^{S_n}] = \mathbb{E}[z^{2K_n - n}] = z^{-n}(qz^2 + p)^n \] where \(K_n \sim \text{Binomial}(n, q)\). Setting \(z = 1\) recovers the sum of probabilities.
5. Fibonacci Sequence and Diagonal Sums
Fibonacci Numbers
The Fibonacci sequence is defined recursively: \[ F_0 = 0, \quad F_1 = 1, \quad F_{n} = F_{n-1} + F_{n-2} \quad \text{for } n \geq 2 \] This gives: \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots\)
Connection to Pascal's Triangle
The Fibonacci numbers appear as diagonal sums in Pascal's triangle: \[ F_{n+1} = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k} \]
Visually, sum the entries along the "rising diagonals" (going up-right) in Pascal's triangle:
- Diagonal 0: \(\binom{0}{0} = 1 = F_1\)
- Diagonal 1: \(\binom{1}{0} = 1 = F_2\)
- Diagonal 2: \(\binom{2}{0} + \binom{1}{1} = 1 + 1 = 2 = F_3\)
- Diagonal 3: \(\binom{3}{0} + \binom{2}{1} = 1 + 2 = 3 = F_4\)
- Diagonal 4: \(\binom{4}{0} + \binom{3}{1} + \binom{2}{2} = 1 + 3 + 1 = 5 = F_5\)
Combinatorial Interpretation
\(F_{n+1}\) counts the number of ways to tile a \(1 \times n\) board with \(1 \times 1\) and \(1 \times 2\) tiles. This is related to random walks where certain step patterns (consecutive \(+1\) steps) are restricted.
Interactive Fibonacci Calculator
Compute Fibonacci numbers and verify the Pascal's triangle relationship:
Enter a value and click "Compute F(n)".
6. Combinatorial Coefficients and Symmetries
Symmetry Property
The binomial coefficients are symmetric: \[ \binom{n}{k} = \binom{n}{n-k} \] Interpretation: The number of trajectories with \(k\) secure weeks equals the number with \(n-k\) breached weeks. This reflects the symmetry of choosing which weeks are secure vs. which are breached.
Hockey Stick Identity
Summing along a diagonal: \[ \sum_{k=0}^{r} \binom{k}{m} = \binom{r+1}{m+1} \]
Vandermonde's Identity
For non-negative integers \(m, n, r\): \[ \binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{k} \binom{n}{r-k} \] Interpretation: Counting paths in a 2D grid by breaking them into two segments.
Catalan Numbers
The \(n\)-th Catalan number is: \[ C_n = \frac{1}{n+1}\binom{2n}{n} = \binom{2n}{n} - \binom{2n}{n+1} \] Catalan numbers count random walk trajectories from \((0,0)\) to \((2n,0)\) that never go below the x-axis (ballot problem, Dyck paths).
7. Probability Insights and Expected Values
Expected Final Position
From the binomial distribution with \(K_n \sim \text{Binomial}(n, q)\) where \(q = 1-p\): \[ \mathbb{E}[S_n] = \mathbb{E}[2K_n - n] = 2nq - n = n(2q - 1) = n(1 - 2p) \]
- If \(p < 0.5\): \(\mathbb{E}[S_n] > 0\) (upward drift, server usually secure)
- If \(p = 0.5\): \(\mathbb{E}[S_n] = 0\) (no drift, symmetric random walk)
- If \(p > 0.5\): \(\mathbb{E}[S_n] < 0\) (downward drift, server usually breached)
Variance
\[ \text{Var}(S_n) = \text{Var}(2K_n) = 4\text{Var}(K_n) = 4nq(1-q) = 4np(1-p) \] The spread increases with \(\sqrt{n}\), typical of random walk behavior.
Central Limit Theorem
For large \(n\), the distribution of \(S_n\) approaches a Normal distribution: \[ S_n \sim \mathcal{N}\big(n(1-2p), 4np(1-p)\big) \quad \text{approximately for large } n \]
8. Summary of Key Relationships
| Concept | Formula / Relationship | Significance |
|---|---|---|
| Encoding Conversion | \(S_n = 2K_n - n\) | Links \(\{0,1\}\) and \(\{-1,+1\}\) representations |
| Total Trajectories | \(2^n\) | Exponential growth of trajectory space |
| Trajectories to Position \(s\) | \(\binom{n}{k}\) where \(s = 2k-n\) | Combinatorial enumeration via binomial coefficients |
| Pascal's Identity | \(\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}\) | Recursive structure of path counting |
| Binomial Theorem | \((x+y)^n = \sum \binom{n}{k} x^k y^{n-k}\) | Algebraic expansion generates probabilities |
| Fibonacci in Pascal | \(F_{n+1} = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n-k}{k}\) | Diagonal sums reveal Fibonacci sequence |
| Expected Position | \(\mathbb{E}[S_n] = n(1-2p)\) | Drift determined by asymmetry (\(p \neq 0.5\)) |
| Variance | \(\text{Var}(S_n) = 4np(1-p)\) | Spread grows as \(\sqrt{n}\) |
9. Practical Implications
For Security Analysis
- Risk Assessment: The binomial distribution predicts the likelihood of various security outcomes
- Threshold Detection: Compute probability that \(S_n < -T\) (server security score falls below threshold \(T\))
- Resource Allocation: Expected drift \(\mathbb{E}[S_n]\) informs long-term security investment needs
For Algorithm Design
- Randomized Algorithms: Many algorithms use random walks for sampling and optimization
- Monte Carlo Methods: Understanding convergence rates (related to variance \(4np(1-p)\))
- PageRank & Graph Algorithms: Random walks on graphs extend these principles to networks
For Statistical Learning
- Classification Boundaries: Random walk theory underlies perceptron convergence proofs
- Sequential Analysis: Deciding when to stop collecting data based on trajectory thresholds
- Time Series: Random walks model stock prices, sensor data, and other sequential phenomena
References
- Graham, R. L., Knuth, D. E., & Patashnik, O. (1994). Concrete Mathematics: A Foundation for Computer Science (2nd ed.). Addison-Wesley.
- Stanley, R. P. (2011). Enumerative Combinatorics, Volume 1 (2nd ed.). Cambridge University Press.
- Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Volume I (3rd ed.). John Wiley & Sons.