Abstract
The Monte Carlo method is a numerical approach for solving problems using random sampling. The original motivation for the method was to facilitate the resolution of partial differential equations appearing in natural studies, but has been adopted for the general solution of problems where an analytical solution is not available or too complicated to be used. Typical situations include optimisation, numeric integration, sampling from probability distributions, and random variable computations. This article discusses the fundamental elements of the method and provides guidance for implementation using standard spreadsheet software.
Random variables
A discrete random variable can take values from a discrete set . The discrete random variable is specified by the following table:
The values can be arbitrary, but the associated probabilities must satisfy two conditions
The expected value of a discrete random variable is defined as
The expected value has the following properties, where is an arbitrary non-random number
The variance of a random variable is defined as
The variance has the following properties, where is an arbitrary non-random number
If a random variable is sampled times then it will assume values from and the arithmetic mean will be close to
The following relationships are valid for independent random variables
A continuous random variable is defined over a continuous interval with a probability density function (pdf) . All the properties of discrete random variables also apply to continuous random variables. In addition if is a continuous arbitrary function then
generally
Two useful random variables
The random variable defined over the interval with a pdf is said to be uniformly distributed in
It can be verified that , and .
The random variable defined over the interval with a pdf where and are numeric parameters.
It can be verified that , , and
The Central Limit Theorem
A set of identical independent random variables will have identical expected values and variances
Lets call the sum of these values
It can be seen that
Lets consider now a normal random variable with the same parameters:
The central limit theorem states that for any interval and for a large the sum of identical random variables is approximate normal
The central limit theorem is also valid for any sum of random variables (not necessarily identical and independent) provided that no single variable plays a significant role in the sum.
The Monte Carlo method
To determine some unknown value we shall try to find a random variable such that with .
Lets consider independent random variables with pdfs identical to . If is sufficiently large then the pdf of the sum of the random variables will be approximately normal with parameters and
Rearranging this equation we arrive at the Monte Carlo formula
From this expression we can determine the expected value of the random variable with an estimation error as a function of the number of trials
The method consists of sampling values of the random variable (determining a single value of each of the variables is equivalent to determining values of a single variable because all these random variables are identical). The arithmetic mean of these values will be approximately equal to with an error not exceeding .
Practical implementation
There are a number of commercial packages that run Monte Carlo simulation, however a basic spreadsheet program can be used in most situations. The generation of multiple trials is implemented by propagating a basic formula times.
A typical problem consists in the determination of an unknown value which is the value of a combination of random variables.
Lets assume that the total cost of a project is the sum of the cost of a number of independent activities . The cost of each activities is a random variable which can have a value in the range . The total cost is a random variable within the interval . Each of these activities may have a particular probability distribution, and the sum of all these variables will have a composite probability distribution, but thanks to the central limit theorem we can use a uniform distribution without compromising the result.
An important assumption is that each of the variables is independent of the others. This means that the cost of any activity is not influenced by the cost of any other activity.
The general scheme of the Monte Carlo method is as follows:
- Generate random values for each of the activity costs
- Add these variables in a total project cost
- Repeat this operation times
- The expected project cost is the average of the total project cost
The first step is to generate random values for each of the activity costs. Assuming a uniform distribution, we can use the RAND() function to generate random numbers in the interval (0,1) and multiply these by the range of each variable. The range is the difference between the maximum value and the minimum value .
The cost for Activity will look like this:
To determine the number of iterations we estimate the standard deviation of the random variable and specify the desired error .
We can estimate an upper bound of by calculating the standard deviation between the maximum, the minimum and average values of the random variable:
Note that we used the function STDEV.P, which calculates the standard deviation of the entire population, in this case only 3 values.
To calculate we set the error as fraction of the average value of the random variable, calculated between the minimum and the maximum.
The number of iterations to obtain a result with an error of less than is:
The kurtosis and skewness of the generated distribution will be close to the normal distribution.
Conditional variables
An important assumption is that the random variables are independent. Covariance is defined as:
If and are independent then . It should be noted that the converse is not necessarily true; if the covariance of two random variables is zero they are not necessarily independent.
If the simulation model requires conditional random variables then they can be generated using the following algorithm:
- Generate two uniform random variables and
- Generate a third random variable
Where is the coefficient of correlation.
Variance Reduction Techniques
The Monte Carlo method produces a result that is the expected value of the unknown variable . The confidence interval of this estimator is determined by the variance of the result . We shall discuss some techniques to reduce the variance of the Monte Carlo result and hence the confidence interval of the estimate.
Common Random Numbers. If we are comparing two different systems it is better to use the same set of random numbers for both simulation runs. The idea is based on the following property of the variance:
If and are positively correlated with then the variance of will be smaller than if and were independent.
Antithetic Variables. If is uniformly distributed on is also uniformly distributed. The two random variables are negatively correlated, . The idea is that if and are two successive results, then
Therefore if and are negatively correlated then the variance of is smaller than if the variables are independent.
References
Metropolis, N; Ulam, S (1949) “The Monte Carlo Method”, Journal of the American Statistical Association, No 247, Vol 44