Statistical Distributions and Simulations

Paritosh Kumar Roy

2025-01-19

Random Processes

What is Monte Carlo Statistical Methods?

Mone Carlo methods are the procedures for solving a complex analytical problem stochastically by simulating the random process involved.
A formal definition of Monte Carlo methods was given by Halton (1970). He defined a Monte Carlo method as

Definition

Representing the solution of a problem as a parameter of a hypothetical population, and using a random sequence of numbers to construct a sample of the population, from which statistical estimates of the parameter can be obtained.

Common Problems in Statistical Inference

Two major classes of numerical problems that arise in statistical inference

optimization problems
integration problems

Although optimization is generally associated with the likelihood approach, and integration with the Bayesian approach, these are not strict classifications.

Application of Monte Carlo Statistical Methods

optimization problems
integration problems

Monte Carlo Integration

Generic problem of evaluating the integral \[\begin{align*} E\left[h(X)\right] = \int_{x} h(x)\; f(x)\; dx. \end{align*}\]
Based on previous developments, it is natural to propose using a sample \((X_1,\ldots,X_m)\) generated from the density f
Approximate the integral by the empirical average
This approach is often referred to as the Monte Carlo method.

Example

Consider the following the integration: \[\begin{align*} I = \int_{x=a}^{b} (x^2+x+2)\; \mathrm{d}x = \frac{17}{6} \approx 2.833 \end{align*}\]

Example: Monte Carlo method

The integral \(I\) can be represented as expectation \[\begin{align*} I &= \int_{x=a}^{b} (x^2+x+2)\; \mathrm{d}x = (b-a) \int_{x=a}^{b} (x^2+x+2) \frac{1}{b-a}\; \mathrm{d}x\\ &= (b-a) \operatorname{E}\left(X^2+X+2\right), \quad \text{where} \quad X \sim U(a,b). \end{align*}\]

Under Monte Carlo integration \(I\) is approximated as \[\begin{align*} \widehat{I} = (b-a) \frac{1}{n} \sum_{i=1}^{n}(x_i^2+x_i+2) \quad \text{with} \quad x_1,\ldots,x_n \quad \text{iid realizations from} \quad U(a,b). \end{align*}\]

x <- runif(1000, min = 0, max = 1);
(1-0)*mean(x^2+x+2)

[1] 2.835954

Exercise: Monte Carlo method

Consider computing the \(r\)-th moment of the random variable having the pdf \[\begin{align*} f_X(x | \lambda, \alpha) = \frac{\lambda^{\alpha} x^{\alpha-1}\exp(-\lambda x)}{\Gamma(\alpha)};\; \text{for } x>0, \end{align*}\] which is defined as \[\begin{align*} \mu_r = E(X^r)=\int_{0}^{\infty} x \, f_X(x | \lambda, \alpha) dx. \end{align*}\] For given any \(\lambda\) and \(\alpha\), describe the Monte-Carlo integration for computing \(\mu_r\) for all \(r\).

Monte Carlo method

Strong Law

For a sample \((X_1,\ldots,X_m)\), the empirical average \[\begin{align*} \bar{h}_m = \frac{1}{m} = \sum_{j=1}^{m} h(x_j), \end{align*}\] converges almost surely to \[\begin{align*} E\left[h(X)\right] \end{align*}\]
This is the Strong Law of Large Numbers

Monte Carlo method

Central Limit Theorem

Estimate the variance with \[\begin{align*} \operatorname{V}\left(\bar{h}_m\right) = \frac{1}{m} \int_{x} \left(h(x) - \operatorname{E}\left[h(X)\right]\right)^2 f(x) dx \end{align*}\]
For large \(m\), \begin{alig8} \end{align*} is therefore approximately distributed as a \(N(0,1)\) variable, where
\[\begin{align*} v_m = \frac{1}{m} \sum_{i=1}^m \left(h(x_i) - \bar{h}_m\right)^2 \end{align*}\]
This leads to the construction of a convergence test and of confidence bounds on the approximation of \(\operatorname{E}[h(X)]\).

Exercise: Monte Carlo method

Compute the following integral values, i) algebrically ii) using numerical integration (Simson’ rule/Trapezoidal rule) and iii) using Monte Carlo integration \[\begin{align*} &\mathrm{i)} \quad \int_{x=0}^{3} (x^2+x+2)\; \mathrm{d}x\\ &\mathrm{ii)} \quad \int_{x=0}^{1}\left[\cos(50x) + \sin(20x)\right]^2\; \mathrm{d}x \end{align*}\]

Exercise: Monte Carlo method

To calculate the integral, we generate \(x_1,\ldots,x_n\) iid \(U(0,1)\) random variables, and approximate \[\begin{align*} \int_{x=0}^{1}\left[\cos(50x) + \sin(20x)\right]^2 \mathrm{d}x \quad \text{with} \quad \sum_{i=1}^{n} \frac{h(x_i)}{n} \end{align*}\]

[1] 0.9828217

It is clear that the Monte Carlo average is converging, with value of 0.963 after 10,000 iterations.

Exercise: Monte Carlo method

Let \(X \sim N(0,1)\). Use Monte Carlo estimation to obtain an estimate for \(E(\cos (X))\) to three digits of accuracy.

Assume \(X \sim Exp(1)\) and \(Y \sim N(0,X)\), that is \(Y\) is normally distributed with a random variance. Use Monte Carlo estimation to estimate \(E(X|Y=4)\) and \(Var(X|Y=4)\).

Exercise: Monte Carlo method

Given the density function in \[\begin{align*} f(x)=\frac{1}{x\sqrt{2\pi}} e^{-\frac{(\log x)^2}{2}}, \; x>0 \end{align*}\]

What are the steps for computing the mean using the Monte Carlo integration method?
What are the steps for computing the mode, median, and standard deviation using the concept of stochastic solution of a complex problem?
Using the Monte Carlo integration calculate the mean, median, mode, standard deviation, 25th, 50th and 75th percentiles of \(X\).