Simulation: 16 Week Course Condensed into 10 Minutes

Introduction

What do you think of when you hear the word simulation? Personally, my head goes straight to The Matrix. Are we living in a simulation?

Well, over the past few months, I have been living in a simulation. More specifically, I have been taking the simulation class in the Georgia Tech MS Analytics program. Simulation is an incredibly important tool for anyone working with data. Not everyone has time to spend four months learning about simulation modeling, but everyone does have ten to fifteen minutes to spare for this post. Enjoy your trip into the simulation.

[mathjax]

Table of Contents

The course was divided into ten modules. These include:

Prerequisites (Module 1 and 2)
Hand Simulations (Module 3)
General Simulation Techniques (Module 4)
ARENA (Module 5)
Random Uniform Number Generation (Module 6)
Random Variate Generation (Module 7)
Input Data Analysis (Module 8)
Output Data Analysis (Module 9)
Comparing Systems (Module 10).

Note: Most module titles below are hyperlinked to older versions of slide decks used in Georgia Tech classes.

Objectives

The objectives of this course were to:

  • Identify simulation models
  • Create simulations using the software tool, Arena
  • Analyze the statistical aspects of simulation, such as input analysis, random variate generation, output analysis, and variance reduction techniques.

Prerequisites (Modules 1 and 2)

Return to Table of Contents

  1. Models are high-level representations of the operation of a real-world process or system.
  2. You can solve a model via three different methods:
Method | Example
Analytical methods (e.g. physics equations) | Toss a stone off a cliff and model its position
Numerical methods | Model the weather
Simulation methods | Call center analysis
  1. Simulation can study models too complicated for analytical/numerical treatment
  2. Simulation is one of the top three industrial engineering/operations research/management science technologies.
  3. Simulation models can be applied across any industry. Examples include traffic simulation, portfolio analysis, supply chain analysis, patient flow in a hospital, etc.
  4. In a room of just 23 people there’s a 50-50 chance of at least two people having the same birthday. This is called the birthday paradox. Simulation could help you visualize this answer.
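Here is a quick Monte Carlo sketch of that claim (my own, not from the course slides):

```python
import random

def shares_birthday(people=23):
    """Return True if at least two of `people` random birthdays collide."""
    birthdays = [random.randint(1, 365) for _ in range(people)]
    return len(set(birthdays)) < people

trials = 100_000
prob = sum(shares_birthday() for _ in range(trials)) / trials
print(prob)  # roughly 0.507
```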
  5. You can simulate a situation with equally likely outcomes using the Uniform (0,1) distribution. For example, if you want to simulate a 10-sided die, you could sample a $Unif(0,1)$ random variable, multiply it by 10, round up, and then have your random variate.
    If $U = 0.54$, then $10U = 5.4$, which rounded up becomes $X = 6$.
  6. Theorem: If $X$ is a continuous random variable with cdf $F(x)$, then the random variable $F(X) \sim Unif(0,1)$. This lets you generate realizations of the RV $X$. You set $F(X) = U \sim Unif(0,1)$ and solve for $X = F^{-1}(U)$.

    Inverse Transform Method:
    – Sample $U$ from $Unif(0,1)$
    – Return $X = F^{-1}(U)$
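A quick Python sketch of the inverse transform method (my own example, not from the course), applied to the Exponential($\lambda$) distribution, whose cdf $F(x) = 1 - e^{-\lambda x}$ inverts to $F^{-1}(u) = -\ln(1-u)/\lambda$:

```python
import math
import random

def exponential_variate(lam):
    """Inverse transform: solve F(X) = U for X, i.e. X = -ln(1 - U) / lambda."""
    u = random.random()              # U ~ Unif(0,1)
    return -math.log(1.0 - u) / lam

# Five Exponential(2) variates, e.g. interarrival times with rate 2
print([round(exponential_variate(2.0), 3) for _ in range(5)])
```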
  1. Definition of the expected value of a random variable (RV): $E[X] = \sum_x x f(x)$ in the discrete case, or $\int x f(x) \, dx$ in the continuous case.
  1. Law of the Unconscious Statistician (LOTUS): to compute $E[g(X)]$ you do not need the distribution of $g(X)$; just use $E[g(X)] = \sum_x g(x) f(x)$ or $\int g(x) f(x) \, dx$.
  1. Joint probability distribution function and probability mass function (PDF/PMF). A function that depends on two random variables.
  1. Marginal PDF/PMFs. What is the probability of $X$, regardless of $Y$?
  1. $X_1, \ldots, X_n$ form a random sample from $f(x)$ if (i) $X_1, \ldots, X_n$ are independent, and (ii) each $X_i$ has the same pdf (or pmf) $f(x)$
  2. Common Discrete Probability Distributions:
    – Bernoulli ($p$): success probability, $p$
    – Binomial ($n,p$): # of successes in n Bern($p$) trials
    – Geometric ($p$): # of Bern($p$) trials until first success
    – Negative Binomial
    – Poisson ($\lambda$): counts # of arrivals over time
    – Empirical: “sample” distribution
  3. Common Continuous Probability Distributions:
    – Uniform: not much known from the data, except maybe the minimum and maximum values, each value is equally likely
    – Triangular: limited data, know minimum, maximum, and “most likely” values
    – Exponential ($\lambda$): interarrival times from a Poisson process
    – Normal: good model for heights, weights, IQs, sample means
    – Beta: good for specifying bounded data
    – Gamma/Weibull/Gumbel/Lognormal: good for reliability data
    – Empirical: “sample” distribution
  4. By the Law of Large Numbers, the sample mean will approximate the population mean as sample size becomes large.
  5. A statistic is a function of the observations that does not explicitly depend on any unknown parameters. Statistics are used to estimate unknown parameters of the underlying probability distribution of the observations.
  6. Statistics are random variables. If you take two different samples, then you would expect two different values for a statistic.
  7. Mean Squared Error (MSE) equals variance plus squared bias: $MSE = Var + Bias^2$.
  8. Bias refers to results that are systematically off the mark, while variance is how far the results are spread out from their average. Here is a fantastic article on bias in statistics. A good introduction to variance is explained in another article I published.
  1. Maximum likelihood estimation (MLE) is a popular method of estimating the parameters (e.g. mean, variance) of a probability distribution.
  2. Invariance property of MLEs: if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
  1. The Central Limit Theorem (CLT): if $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then for large $n$ the sample mean $\bar{X}_n$ is approximately $Nor(\mu, \sigma^2 / n)$.

Module 3: Hand Simulations

Return to Table of Contents

  1. You can approximate an integral using Monte Carlo simulation. All you need to do is generate a large number of uniform random variates (i.e. random numbers between 0 and 1), as in the sketch below.
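For instance, here is a minimal Monte Carlo sketch (my own, in Python rather than Excel) that approximates $\int_0^1 e^x \, dx = e - 1 \approx 1.718$ by averaging $e^{U_i}$ over many uniforms:

```python
import math
import random

n = 100_000
# Estimate the integral of e^x over [0, 1] as the average of e^U for U ~ Unif(0,1)
estimate = sum(math.exp(random.random()) for _ in range(n)) / n
print(estimate)  # should land close to e - 1 = 1.71828...
```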
  2. Excel has functions that let you generate random numbers. RAND( ) generates a random number between 0-1.
  3. To generate a random normal number in Excel, you can use NORM.INV(RAND( ), mean, standard deviation). NORM.INV uses the Inverse Transform Method that was mentioned in item #8.

Module 4: General Simulation Principles

Return to Table of Contents

  1. Steps in a simulation study
    (i) Problem Formulation- statement of the problem
    (ii) Objectives and Planning- what specific questions should be answered?
    (iii) Model Building- what type of model is necessary?
    (iv) Data Collection- what kinds of data and how much?
    (v) Coding- choose a software tool, write the program, pick the modeling paradigm (event-scheduling or process-interaction)
    (vi) Verification- is the code OK? If not, go back to (v)
    (vii) Validation- is the model OK? If not, go back to (iii)
    (viii) Experimental Design- what experiments need to be run to efficiently answer the questions?
    (ix) Run Experiments
    (x) Output Analysis- statistical analysis, estimate relevant measures of performance. Run more experiments if needed.
    (xi) Write reports and implement
  2. The simulation clock is a variable that represents simulated time, which is not equal to real time.
  3. Simulated time advances through two different approaches, Fixed-Increment Time Advance and Next-Event Time Advance.
  4. Fixed-Increment updates the state of the system at fixed times, which is useful for continuous-time models.
  5. Next-Event advances to the next most imminent event. Times of all known future events are determined and placed in the future events list (FEL). At each event, the system state is updated, and the FEL is updated. Nothing happens in between events in this list.
  6. Efficient list processing is critical for the FEL. Luckily, most commercial simulation packages take care of the FEL stuff for you.
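To make the next-event mechanism concrete, here is a tiny hand-rolled sketch (my own illustration, not course code) of a future events list kept as a priority queue, with the clock jumping straight from one event to the next:

```python
import heapq

# Future events list (FEL): (event_time, event_name) pairs kept in time order
fel = []
heapq.heappush(fel, (2.0, "arrival"))
heapq.heappush(fel, (5.5, "departure"))
heapq.heappush(fel, (3.1, "arrival"))

clock = 0.0
while fel:
    clock, event = heapq.heappop(fel)  # advance the clock to the most imminent event
    print(f"t = {clock}: handle {event}")
    # ...update the system state and push any newly scheduled events here...
```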
  7. There are two simulation modeling approaches, Event-Scheduling and Process-Interaction.
  8. The Event-Scheduling approach concentrates on the events and how they affect the system state. It keeps track of every event in increasing order of time of occurrence, which requires A LOT of effort.
  9. The Process-Interaction approach concentrates on a generic customer (entity) and the sequence of events and activities it undergoes as it progresses through the system. The simulation language handles the event scheduling for all of the generic customers. This approach saves a lot of programming effort. For example, a customer is generated, eventually gets served, and then leaves.
    CREATE–> PROCESS –> DISPOSE.
  10. When considering which software tool to use, focus on the learning curve, cost, publicly available documentation, and its features (e.g. random variate generators, debugging aids, etc.).

Module 5: ARENA

Return to Table of Contents

  1. Arena, an easy-to-use discrete event simulation software tool, can be downloaded for free for students. Here is the user manual.
  2. Arena follows the Process-Interaction approach and will handle the FEL for you in the background.
  3. Arena contains a large number of modules that are organized into panels. The panels are structured from high level to low level concepts (e.g. Basic Process, Advanced Process, Advanced Transfer).
  4. Entities must be created to enter the model and disposed to leave the model.
  5. Entities queue when they need processing. An entity will try to seize a resource, but if the resource is not available it will wait in a queue.
  6. The time the entity uses the resource is the delay. The entity releases the resource when processing is complete.
  7. Resources have a name, a capacity, and can have a schedule.
  8. Arena keeps a number of internal, global variables continually updated. For example, you can access the simulation clock through the variable, TNOW.

Module 6: Random Number Generation

Return to Table of Contents

  1. $Unif (0,1)$ random numbers are the key to random variate generation in simulation (i.e. transforming uniforms to get other random variables).
  2. The goal of random number generation is to use an algorithm that produces a sequence of pseudo-random numbers (PRNs) that “appear” to be independent and identically distributed Unif(0,1). The algorithm should be very fast and able to reproduce any sequence it generates. These PRNs are not truly random, but they are close enough if generated correctly.
  3. PRNs are the building blocks of every other random variable.
  4. Examples of bad random generators are outputs of a random device (e.g. flipping a coin), a table of random numbers, the Mid-Square Method, and Fibonacci and Additive Congruential Generators.
  5. Linear congruential generators (LCG) are the most commonly used generators. They are good when implemented properly.
  6. For LCGs, choose $a, c, m$ carefully to get good statistical quality and a long period/cycle length (i.e. time until the LCG starts to repeat itself).
    $X_i = (aX_{i-1} + c) \bmod m$, where $X_0$ is the seed
    $R_i = X_i / m, \ i = 1, 2, \ldots$
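Here is a minimal LCG sketch in Python (my own illustration; the parameters $a = 16807$, $c = 0$, $m = 2^{31} - 1$ are the well-known “minimal standard” choices, not something prescribed above):

```python
def lcg(seed, n, a=16807, c=0, m=2**31 - 1):
    """Generate n pseudo-random Unif(0,1) numbers via X_i = (a*X_{i-1} + c) mod m."""
    x = seed
    uniforms = []
    for _ in range(n):
        x = (a * x + c) % m
        uniforms.append(x / m)  # R_i = X_i / m
    return uniforms

print(lcg(seed=12345, n=5))
```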
  7. Tausworthe generators are another good option. They seem to be more popular with computer science folks because they rely on binary numbers to generate uniforms.
  8. In order to confirm you chose a good PRN generator, you need to run statistical tests to check goodness of fit and independence. A goodness of fit test confirms that the PRNs approximately fit a Unif (0,1) distribution and the test for independence confirms that the PRNs are approximately independent.
  9. When you design the statistical test, you set the level of significance: $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true})$. Alpha, $\alpha$, is the probability of Type I error.
  10. The $\chi^2$ goodness of fit statistic is $$ \chi_0^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} $$
    $O_i$ are observed values and $E_i$ are expected values.
  11. Reject the null hypothesis (PRNs are uniform) if $\chi_0^2 > \chi_{\alpha,k-1}^2 $
  12. For the g-o-f test to work, you pick $k, n$ such that $E_i \ge 5$ and $n$ is at least 30. $k$ is the number of intervals and $n$ is the number of observations.
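A quick sketch of this uniformity test in Python (my own example; it bins $n = 1000$ PRNs into $k = 10$ equal intervals and leans on scipy for the critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n, k = 1000, 10
u = rng.random(n)                                    # PRNs to test

observed, _ = np.histogram(u, bins=k, range=(0, 1))  # O_i per interval
expected = n / k                                     # E_i = n * (1/k)
chi2_stat = np.sum((observed - expected) ** 2 / expected)

critical = stats.chi2.ppf(0.95, df=k - 1)            # alpha = 0.05
print(chi2_stat, critical, "reject" if chi2_stat > critical else "fail to reject")
```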
  13. The null hypothesis for an independence test is that the PRNs are independent.
  14. A “run” is a series of similar observations. Runs tests check whether the PRNs are approximately independent. A runs test will reject the null hypothesis if there are too many or too few runs.
  15. Two examples of runs tests are “up and down” and “above and below the mean”. They act exactly as they sound by testing the frequency that the runs increase/decrease or the frequency that the runs are above/below the mean.
  16. Other tests for independence include autocorrelation tests. These look for correlation in the observations. Spoilers…there should not be any.

Module 7: Random Variate Generation

Return to Table of Contents

  1. The goal is to use $Unif (0,1)$ numbers to generate observations (variates) from other distributions, and even stochastic processes.
  2. PRN’s can be used to generate random interarrival times, service times, breakdowns, correlated heights and weights, Markov chains, Poisson processes, time series, and even Brownian motion. These random variates, plugged into the appropriate distribution or process, will output simulated data.
  3. Inverse Transform Method Poisson Example
  1. For more difficult cases, there are more efficient algorithms for random variate generation. These include the cutpoint method, convolution, the composition method, and the acceptance-rejection method, among others.
  2. The convolution method refers to adding things up. Adding multiple RVs can be quite inefficient if done one by one.
  3. If the random variables (RVs) are independent and identically distributed $Bernoulli(p)$, then their sum is $Binomial(n,p)$.
    $$ \text{If } X_1, \ldots, X_n \ \text{i.i.d.} \sim Bern(p), \text{ then } Y = \sum_{i=1}^n X_i \sim Bin(n,p) $$
  4. $$ \text{If } U_1 \text{ and } U_2 \text{ are i.i.d. } Unif(0,1), \text{ then } U_1 + U_2 \sim Triangular(0,1,2) $$
  1. If the RVs are independent and identically distributed $Exponential(\lambda)$, then their sum has a $Gamma$, or $Erlang_n(\lambda)$, distribution. This can be shown by the inverse transform method: each $Exp(\lambda)$ can be written as $-\frac{1}{\lambda}\ln U_i$, so the Erlang sum is $-\frac{1}{\lambda}\ln \prod_{i=1}^n U_i$ (the upper case pi symbol indicates taking the product of the uniform random numbers). A short sketch follows below.
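A hedged Python sketch of that product-of-uniforms trick (my own example):

```python
import math
import random

def erlang_variate(n, lam):
    """Sum of n i.i.d. Exp(lam) variates via -1/lam * ln(product of n uniforms)."""
    product = 1.0
    for _ in range(n):
        product *= random.random()
    return -math.log(product) / lam

print(erlang_variate(n=3, lam=2.0))  # one Erlang_3(2) variate
```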
  1. If the RVs are independent and identically distributed $Normal(0,1)$, then the sum of their squares is $\chi^2(n)$.
  2. The Acceptance Rejection (A-R) method is one of the more popular methods in practice.
  3. The majority of c.d.f.’s cannot be inverted efficiently. A-R samples from a distribution that is “almost” the one we want, and then adjusts by “accepting” only a certain proportion of those samples.
  1. Suppose we want to simulate a continuous RV $X$ with p.d.f. $f(x)$, but that it’s difficult to generate directly. Also suppose that we can easily generate a RV having p.d.f. $h(x) \equiv t(x)/c$, where $t(x)$ majorizes $f(x)$ (i.e. $t(x) \ge f(x)$ for all values of $x$).
  1. The above theorem suggests the following A-R algorithm, where “$U \le g(Y)$” is the acceptance event.
    Repeat
        Generate $U$ from $Unif(0,1)$
        Generate $Y$ from $h(y)$ (independent of $U$)
    until $U \le g(Y) = f(Y) / t(Y) = f(Y) / (c \, h(Y))$
    Return $X \leftarrow Y$

Intuitively:
Generate a point $Y$ uniformly under $t(x)$ (equivalently, sample $Y$ from p.d.f. $h(x)$).
Accept the point with probability $f(Y) / t(Y) = f(Y) / (c \, h(Y))$.
If you accept, then set $X = Y$ and stop.

  1. A-R Example: Obtain a $Pois(\lambda = 2)$ RV. Keep sampling uniforms until
    $$ \prod_{i=1}^{n+1} U_i < e^{-\lambda} = e^{-2} \approx 0.1353 $$
$n$ | $U_{n+1}$ | $\prod_{i=1}^{n+1} U_i$ | Stop?
0 | 0.3911 | 0.3911 | No
1 | 0.9451 | 0.3696 | No
2 | 0.5033 | 0.1860 | No
3 | 0.7003 | 0.1303 | Yes
Keep multiplying the uniform random numbers until the value is less than 0.1353. This occurs when $n=3$. Therefore, we take $X=3$.
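A small Python sketch of this Poisson generator (my own rendering of the product-of-uniforms rule shown above):

```python
import math
import random

def poisson_variate(lam):
    """Multiply uniforms until the running product drops below e^{-lam}; return the count n."""
    threshold = math.exp(-lam)
    n, product = 0, random.random()
    while product >= threshold:
        product *= random.random()
        n += 1
    return n

print(poisson_variate(2.0))  # one Pois(2) variate
```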
  1. Box-Muller Method: An easy way to generate standard normal random variates. Given i.i.d. $U_1, U_2 \sim Unif(0,1)$, the pair $Z_1 = \sqrt{-2 \ln U_1} \cos(2 \pi U_2)$ and $Z_2 = \sqrt{-2 \ln U_1} \sin(2 \pi U_2)$ is i.i.d. $Nor(0,1)$.
Note: The trig calculations must be done in radians.
  1. Corollary: The sum of the squares of two independent standard normal RVs has an Exponential distribution. Therefore, using the chi-squared result above, we can show that $$ Z_1^2 + Z_2^2 \sim \chi^2(2) \sim Exp(1/2) $$
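A minimal Box-Muller sketch in Python (my own example; note the trig is in radians, as Python’s math module expects):

```python
import math
import random

def box_muller():
    """Turn two Unif(0,1) draws into two independent Nor(0,1) variates."""
    u1 = 1.0 - random.random()             # in (0, 1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    z1 = r * math.cos(2.0 * math.pi * u2)  # angle in radians
    z2 = r * math.sin(2.0 * math.pi * u2)
    return z1, z2

print(box_muller())
```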
  1. Multi-dimensional generation is important too. For example, heights and weights can be modeled as bivariate normal distributions.
  2. A multivariate normal distribution is a vector in multiple normally distributed variables, such that any linear combination of the variables is also normally distributed. Follow these steps to calculate the random variate vector, $X$.
    – Calculate the Cholesky matrix $C$ such that $\Sigma = C C'$, where $\Sigma$ is the covariance matrix
    – Generate $Z_1, Z_2, \ldots, Z_k \sim$ i.i.d. $Nor(0,1)$
    – $X = \mu + C Z$
A normal (or Gaussian) distribution in 2 variables.
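A sketch of those three steps with numpy (my own example; the means and covariance matrix below are made-up numbers standing in for heights and weights):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu = np.array([70.0, 170.0])       # hypothetical mean height and weight
sigma = np.array([[9.0, 10.0],
                  [10.0, 225.0]])  # hypothetical covariance matrix

C = np.linalg.cholesky(sigma)      # Sigma = C @ C.T
z = rng.standard_normal(2)         # Z_1, Z_2 ~ i.i.d. Nor(0,1)
x = mu + C @ z                     # one bivariate normal variate
print(x)
```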
  1. Different methods for generating stochastic processes include Markov Chains, Poisson Arrivals, Nonhomogeneous Poisson Arrivals, Time Series (MA(1), AR(1), ARMA(p,q), EAR(1), Autoregressive Pareto) , M/M/1 Queue Waiting Times, and Brownian Motion.
  2. Markov chains are stochastic models describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. It compares random uniform numbers to the values in the transition probability matrix in order to generate output. This is a time-series stochastic process.
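A tiny sketch (my own, with made-up transition probabilities) of stepping a two-state Sunny/Rainy chain by comparing uniforms against the transition matrix:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

states = ["Sunny", "Rainy"]
P = np.array([[0.8, 0.2],   # hypothetical transition probabilities from Sunny
              [0.4, 0.6]])  # and from Rainy

state = 0                   # start the chain in the Sunny state
path = [states[state]]
for _ in range(9):
    u = rng.random()                     # compare a uniform to the transition row
    state = 0 if u < P[state, 0] else 1  # pick the next state
    path.append(states[state])
print(path)
```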
  1. When the arrival rate is a constant $\lambda$, the interarrivals of a $Poisson(\lambda)$ process are i.i.d. $Exp(\lambda)$; but if the arrival rate varies with time, the process becomes a nonhomogeneous Poisson process, handled next.
  1. For the Nonhomogeneous Poisson process, we can use The Thinning Algorithm, a type of Acceptance-Rejection method, with the following:

    – Assumes that $\lambda^* = \max_t \lambda(t)$ is finite
    – Generates potential arrivals at the max rate $\lambda^*$
    – Accepts a potential arrival at time $t$ with probability $\lambda(t) / \lambda^*$
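A minimal thinning sketch in Python (my own example; the rate function $\lambda(t) = 1 + \sin(t)$ and the horizon are arbitrary choices for illustration):

```python
import math
import random

def nhpp_arrivals(rate_fn, lam_star, horizon):
    """Thinning: propose arrivals at the max rate lam_star, keep each with prob rate_fn(t)/lam_star."""
    t, arrivals = 0.0, []
    while True:
        t += -math.log(1.0 - random.random()) / lam_star  # candidate interarrival at rate lambda*
        if t > horizon:
            return arrivals
        if random.random() <= rate_fn(t) / lam_star:      # accept with probability lambda(t)/lambda*
            arrivals.append(round(t, 3))

# Hypothetical rate function lambda(t) = 1 + sin(t), so lambda* = 2
print(nhpp_arrivals(lambda t: 1 + math.sin(t), lam_star=2.0, horizon=10.0))
```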
  1. The First-Order Moving Average Process, or MA(1), is defined by
    $$ Y_i = \epsilon_i + \theta \, \epsilon_{i-1}, \ \text{for } i = 1, 2, \ldots $$
    where $\theta$ is a constant and the error terms $\epsilon_i$ are i.i.d. $Nor(0,1)$ RVs.
  2. The MA(1) is a popular tool for modeling and detecting trends.
  3. Brownian motion is a widely used tool for financial analysis, queuing theory, statistics, and other operations research or industrial engineering applications. It is sort of a random walk that evolves over time. It is arguably the most important stochastic process.
  4. Brownian motion definition: a stochastic process $\{W(t), t \ge 0\}$ with $W(0) = 0$, $W(t) \sim Nor(0, t)$, continuous sample paths, and stationary, independent increments.
  1. Increments are defined as anything like $W(b) - W(a)$.
    Stationary increments: the distribution of $W(t + h) - W(t)$ only depends on $h$.
    Independent increments: If $a < b < c < d$, then $W(d) - W(c)$ is independent of $W(b) - W(a)$
  2. $ Cov(W(s), W(t)) = \ min(s,t) $
  3. Generating Brownian Motion Random Variates: by a central limit theorem argument, a scaled random walk converges to Brownian motion, so you can approximate $W(t)$ by $\frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} Z_i$, where the $Z_i$ are i.i.d. $Nor(0,1)$ (or even $\pm 1$ coin flips) and $n$ is large.
  1. Example of generating Brownian Motion Random Variates
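A short numpy sketch of that construction (my own example, approximating one path of $W$ on $[0, 1]$ with $n = 1000$ steps):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

n = 1000                       # number of steps on [0, 1]
z = rng.standard_normal(n)     # i.i.d. Nor(0,1) increments
w = np.cumsum(z) / np.sqrt(n)  # W(i/n) is approximately (1/sqrt(n)) * sum of the first i Z's
print(w[-1])                   # W(1), which should look like a Nor(0,1) draw
```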
  1. A Brownian Bridge, $B(t)$, is conditioned Brownian Motion such that
    $W(0) \ = \ W(1) = \ 0$

Module 8: Input Data Analysis

Return to Table of Contents

  1. How do we know which random variables to use as inputs for simulation? If you feed the simulation with the wrong RVs or questionable data, it will give you the wrong output. “Garbage in, garbage out.”
  1. The goal is to use RVs in your simulation that adequately approximate what’s going on in the real world. Examples include interarrival times, service times, and breakdown times. Take a look at this case study where Netflix simulated power outages to make their system more robust and resilient in times of trouble. Netflix’s smooth performance is no coincidence. It requires constant iteration through simulation.
  2. High level steps for input data analysis:
    – Collect data for analysis
    – Estimate the underlying distribution, along with associated parameters
    – Conduct a statistical test to see if the distribution is “approximately” correct
  3. If you take enough observations, the histogram will eventually converge to the true distribution.
  4. Which distribution (See item #14/#15)?
    – Continuous vs. Discrete
    – Univariate vs. Multivariate
    – How much data is available?
    – Any experts to ask about the nature of the data?
    – What if you don’t have much data? Can you make an educated guess at the underlying distribution?
  5. Prior to conducting any statistical tests, you need to estimate the relevant parameters (See item #17 through #22).
  6. In a goodness of fit (GOF) test, the null hypothesis is that the observations fit a particular distribution. The alternative hypothesis is that they do not fit this distribution.
  7. High level steps for the GOF:
    – Divide the domain of $f(x)$ into $k$ sets, $A_1, …A_k$ (distinct points for discrete and intervals if $X$ is continuous)
    – Tally the actual number of observations that fall in each set, $O_i, \ i = 1,2,\ldots,k$. If $p_i \equiv P(X \in A_i)$, then $O_i \sim Bin(n, p_i)$.
    – Determine the expected number of observations that would fall in each set if $H_0$ were true (e.g. $E_i = E[O_i] = np_i, \ i = 1,2,\ldots,k$).
    – Calculate a test statistic based on the differences between the $E_i$’s and $O_i$’s. The chi-squared GOF test statistic is:
    $$ \chi_0^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} $$
  8. Reject the null hypothesis (the observations fit the distribution) if $\chi_0^2 > \chi_{\alpha,k-1-s}^2$. $k$ is the number of sets and $s$ is the number of parameters estimated.
  9. Example: Test whether the observations fit an exponential distribution by using a $\chi^2$ GOF test and equal probability intervals.
    – $H_0$: the observations fit an Exponential distribution
    – $F(a_i) = P(X \le a_i) = 1 - e^{-\lambda a_i} = i/k, \ i = 1,2,\ldots,k$
    – $a_i = -(1/\lambda) \ln(1 - i/k), \ i = 1,2,\ldots,k$
    – $\lambda$ is unknown, so $s = 1$
    – MLE: $\hat\lambda = 1 / \bar{X}$. By the Invariance Property, the MLEs for the $a_i$’s are:
    – $\hat{a}_i = -(1/\hat\lambda) \ln(1 - i/k) = -\bar{X} \ln(1 - i/k), \ i = 1,2,\ldots,k$
    – $n = 100$ observations, divided into $k = 5$ equal probability intervals. Assume the sample mean is $\bar{X} = 9$.
    – $\hat{a}_i = -9.0 \ln(1 - 0.2i), \ i = 1,\ldots,5$
Conclusion: Reject $H_0$. These observations are not Exponential.
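A rough Python sketch of this recipe (my own, with made-up data standing in for the 100 observations; scipy supplies the critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
x = rng.gamma(shape=2.0, scale=4.5, size=100)  # made-up data pretending to come from the real system

n, k = len(x), 5
lam_hat = 1.0 / x.mean()                       # MLE of lambda for an Exponential fit
edges = [-(1.0 / lam_hat) * np.log(1 - i / k) for i in range(k)] + [np.inf]  # equal-probability cutpoints
observed, _ = np.histogram(x, bins=edges)      # O_i per interval
expected = n / k                               # E_i = n / k

chi2_stat = np.sum((observed - expected) ** 2 / expected)
critical = stats.chi2.ppf(0.95, df=k - 1 - 1)  # k - 1 - s degrees of freedom, with s = 1
print(chi2_stat, critical)
```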
  1. There are goodness of fit tests other than the $\chi^2$ test. One test that is good for small sample situations is the Kolmogorov-Smirnov (K-S) goodness of fit test. It can only be applied to continuous distributions.
  2. Input analysis can prove difficult if there is any of the following situations:
    – little/no data
    – data that doesn’t look like one of the usual distributions
    – Nonstationary data (data that changes over time)
    – Multivariate/correlated data
  3. Potential solutions for the difficult situations:
    – Interview experts to learn more about the nature of the data (min, max, most likely distribution, any other physical characteristics, etc.)
    – Sample from the empirical distribution or a smoothed version of the empirical (i.e. bootstrap samples from empirical distribution)
    – Model as a mixture of reasonable distributions
    – Look at nonstationary models like nonhomogeneous poisson
    – Specify a constant arrival rate for different periods of data
    – Correlated data might require multivariate models or time series models
    – Estimate relevant parameters
  4. ARENA has the Input Analyzer that gives you the “best” distribution from its library with relevant sample and GOF statistics

Module 9: Output Data Analysis

Return to Table of Contents

  1. If the inputs to a simulation are random variables, then the output of the simulation is random as well.
  2. Runs of the simulation only provide estimates of measures of system performance, which are random variables and subject to sampling error. Sampling error must be taken into account to make valid conclusions concerning the system’s performance.
  3. Output from simulations is rarely i.i.d. It tends to be serially correlated, not identically distributed, and not normally distributed. For example, customer waiting times from a queuing system are correlated. My waiting time in a line at Chick-fil-a will likely affect the waiting time of the next guy. Therefore, “classical” statistical techniques are difficult to apply.
  4. Two types of simulation with respect to output analysis (in this course):
    – Finite-Horizon (Terminating, short-term performance)
    – Steady-State (Long-term performance)
  5. Finite Horizon
    – The termination of a Finite-Horizon simulation takes place at a specific time or is caused by the occurrence of a specific event
    – Examples: Mass transit during rush hour, production system until a machine breaks down
  6. Steady-State
    – A characteristic of the equilibrium distribution of an output process
    – Examples: Distribution system over a long period of time, Markov Chains
Markov Chain example: the probability it is Sunny/Rainy given that the previous day was Sunny/Rainy.
  1. In these notes, assume that the output observations are identically distributed with mean $\mu$, but not independent. This is a frequent assumption for steady state simulations.
  2. In situations with correlated output data, you need to include the effects of the covariances in any estimation of the standard deviation. These are not the typical sample standard error formulas you use elsewhere in statistics where i.i.d. is assumed. As a result, estimating these parameters can be tricky.
  3. One method of dealing with output data that is not i.i.d. is independent replications (IR).
  4. IR estimates the variance of the sample means, $Var(\overline{Y_m})$, by conducting $r$ independent runs/replications of the system, where each run consists of $m$ observations.
  5. To make the runs independent you simply re-initialize each replication with a different pseudo-random number seed. Item #46 is relevant here because it is important that the generator can reproduce any sequence it produces.
  6. The sample mean from replication $i$ is:
    $$ Z_i \equiv \frac{1}{m} \sum_{j=1}^m Y_{i,j}, $$
    where $Y_{i,j}$ is observation $j = 1,\ldots,m$ from replication $i = 1,\ldots,r$ (e.g. $Y_{i,j}$ is customer $j$’s waiting time from rep $i$).
    If each run is started under the same operating conditions, then the replication sample means, $Z_1,…., Z_r$ are i.i.d. random variables.
  7. Variance estimator for the replication sample means:
    $$ S_Z^2 \equiv \frac{1}{r-1} \sum_{i=1}^r (Z_i - \bar{Z})^2, \quad \text{where } \bar{Z} = \frac{1}{r} \sum_{i=1}^r Z_i $$
  1. The approximate IR $100 (1 – \alpha)$% two-sided confidence interval for $\theta$:
    $$ \theta \in \bar{Z} \ \pm \ t_{\alpha / 2, r-1} \sqrt{S_Z^2 / r} $$
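A compact sketch of independent replications in Python (my own example; the exponential “waiting times” are stand-in data, and scipy supplies the t quantile):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)

r, m = 10, 500                                  # r independent replications of m observations each
reps = rng.exponential(scale=4.0, size=(r, m))  # stand-in output data (e.g. waiting times)

z = reps.mean(axis=1)                           # replication sample means Z_1, ..., Z_r
z_bar, s2_z = z.mean(), z.var(ddof=1)
half_width = stats.t.ppf(0.975, df=r - 1) * np.sqrt(s2_z / r)
print(f"95% CI for the mean: {z_bar:.3f} +/- {half_width:.3f}")
```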
  1. Prior to running a simulation, you need to provide initial values for all of the simulation’s state variables. These arbitrary values could have a significant effect on the outcome of the simulation. This initialization bias problem can lead to errors, particularly in steady-state analysis. Just imagine if you always ran a bank teller line’s simulation as starting from empty (i.e. when it opens). However, this might not be the case for any customer who enters the line during the middle of the day.
  2. Initialization bias can lead to point estimators for steady-state parameters having high MSE and confidence intervals having poor coverage.
  3. You can detect initialization bias visually or via statistical tests (e.g. does the mean or variance of the process change over time).
  4. If initialization bias is detected, you can address it with the following:
    – Truncating the output by allowing the simulation to “warm up” before data is retained for analysis
    – Make a “very long” run
  5. For truncation, it is suggested to average observations across several reps and visually choose a truncation point. Newer techniques include change point detection algorithms.
  6. Truncating the output is the most popular method for addressing initialization bias.
  7. A very long simulation run will overwhelm the initialization bias. However, this approach can be wasteful with observations and could be too expensive from a monetary or time perspective.
  8. Once initialization bias is dealt with, you can perform steady state analysis.
  9. Steady state output analysis can be conducted via batch means, IR, standardized time series, spectral analysis, regeneration, ARMA time series modeling, etc.
  10. Batch means is often used to estimate variance, $\sigma^2$, and to estimate confidence intervals for mean, $\mu$.
  11. For batch means, you divide one long simulation run into a number of contiguous batches and apply the CLT to assume that the resulting batch means are approximately i.i.d. normal. In particular, suppose we partition $Y_1, …, Y_n$ into $b$ nonoverlapping, contiguous batches consisting of $m$ observations (assume $n = \ b*m)$.
  1. The $i$th batch mean is the sample mean of the $m$ observations from batch $i = 1,2,….,b$,
    $$\bar{Y_{i,m}} \equiv \ 1/m * \sum_{j=1}^m Y_{(i-1)*m + j} $$
  2. The batch means are correlated for small $m$, but for large $m$, they are approximately $Nor(\mu, \sigma^2 / m)$
  3. The batch means variance estimator is defined as:
    $$ \hat{V}_B \equiv \frac{m}{b-1} \sum_{i=1}^b (\bar{Y}_{i,m} - \bar{Y}_n)^2 $$
  1. The goal is to find the best choice of $b$ and $m$, subject to the sample constraint that $n=b*m$, so as to minimize the MSE.
  2. Since the batch means are approximately normal for large $m$, the approximate $100(1- \alpha)$% confidence interval for mean $\mu$:

    $$ \mu \in \bar{Y}_n \ \pm \ t_{\alpha / 2, b-1} \sqrt{\hat{V}_B / n} $$
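A small batch means sketch in Python (my own example; the long exponential run is stand-in data for a truncated steady-state output stream):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=13)

n, b = 10_000, 30                       # one long run split into b = 30 batches
m = n // b                              # batch size
y = rng.exponential(scale=4.0, size=n)  # stand-in output from one long (warmed-up) run

batch_means = y[: b * m].reshape(b, m).mean(axis=1)
y_bar = y[: b * m].mean()
v_b = m * np.sum((batch_means - y_bar) ** 2) / (b - 1)  # batch means variance estimator
half_width = stats.t.ppf(0.975, df=b - 1) * np.sqrt(v_b / (b * m))
print(f"95% CI for the mean: {y_bar:.3f} +/- {half_width:.3f}")
```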
  3. A common recommendation is to take $b = \ 30$ batches and increase batch size, $m$, as much as possible.
  4. Problems with batch means typically occur if the outputs are not stationary (e.g. significant initialization bias present), if the batch means are not normal, or if the batch means are not independent.
  1. If initialization bias is present in each replication, batch means would be the preferred method over IR.
  2. Another method, overlapping batch means, has the same bias as batch means, but lower variance.

Module 10: Comparing Systems

Return to Table of Contents

  1. Statistics and simulation experiments are typically performed to analyze a small number of systems. For comparing two systems you can use confidence intervals or various paired versions. However, if there are more than 2 systems, you can use ranking and selection techniques.
  2. Confidence intervals (CIs) for the difference in two normal means can be carried out by any of the following methods:
    – pooled CI (use when $\sigma_x^2$ and $\sigma_y^2$ are equal but unknown)
    – approximate CI (use when $\sigma_x^2$ and $\sigma_y^2$ are unequal and unknown)
    – paired CI (use when $Cov(X_i, Y_i) > 0$)
  3. Recall the formula for sample variance:
    $$ S_X^2 \equiv \sum_{i=1}^n (X_i - \bar{X}_n)^2 / (n-1) $$
  4. Pooled CI: If the $X$’s and $Y$’s are independent but with common, unknown variance, then the usual CI for the difference in means is:
    $$ \mu_X - \mu_Y \in \bar{X}_n - \bar{Y}_m \ \pm \ t_{\alpha / 2, n+m-2} \, S_p \sqrt{\frac{1}{n} + \frac{1}{m}} $$
    $$ S_p^2 \equiv \frac{(n-1) S_X^2 + (m-1) S_Y^2}{n+m-2} $$
    $S_p^2$ is the pooled variance estimator for $\sigma^2$.
  5. Approximate CI: If the $X$’s and $Y$’s are independent but with arbitrary, unknown variances, then the usual CI for the difference in means is:
    $$ \mu_X - \mu_Y \in \bar{X}_n - \bar{Y}_m \ \pm \ t_{\alpha / 2, \nu} \sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}} $$
    $$ \nu \equiv \frac{\left( S_X^2/n + S_Y^2/m \right)^2}{\frac{(S_X^2/n)^2}{n+1} + \frac{(S_Y^2/m)^2}{m+1}} - 2 $$
    The CI is not quite exact, since it uses an approximate number of degrees of freedom.
  6. Paired CI: The Assumptions and Set Up
    – Collected data from two populations in pairs
    – The different pairs are independent, but the values within a pair might not be independent
    – Utilize common random numbers for both populations
    – Assume the population variances are unknown and possibly unequal
  7. Paired CI: Defining the Difference
    – Define the pair-wise difference as $D_i = \ X_i – Y_i, \ i=1,2,…,n$
    – Then, the differences are $Normal(\mu_D, \sigma_D^2)$
    – $\sigma_D^2 \equiv \sigma_X^2 + \sigma_Y^2 - 2 \, Cov(X_i, Y_i)$
    – A higher covariance between the values lowers the variance, which is good!
  8. Paired CI: Mean, Variance, and CI
    $$ \bar{D} \equiv \frac{1}{n} \sum_{i=1}^n D_i \ \sim \ Nor(\mu_D, \sigma_D^2 / n) $$
    $$ S_D^2 \equiv \frac{1}{n-1} \sum_{i=1}^n (D_i - \bar{D})^2 $$
    $$ \mu_D \in \bar{D} \ \pm \ t_{\alpha / 2, n-1} \sqrt{S_D^2 / n} $$
  9. Paired CI Example: The amount of time it took for the same person to park each type of car.
Person | Toyota | Ford | Difference
1 | 10 | 20 | -10
2 | 25 | 40 | -15
3 | 5 | 5 | 0
4 | 20 | 35 | -15
5 | 15 | 20 | -5

The 90% two-sided CI is:
$$ \mu_D \in \bar{D} \ \pm \ t_{0.10/2, \ 5-1} \sqrt{S_D^2 / n} $$
$$ -9 \ \pm \ 2.13 \sqrt{42.5 / 5} \ = \ -9 \ \pm \ 6.21 $$
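The same calculation as a Python sketch (my own; the numbers are copied from the table above):

```python
import numpy as np
from scipy import stats

toyota = np.array([10, 25, 5, 20, 15])
ford = np.array([20, 40, 5, 35, 20])

d = toyota - ford                      # pairwise differences D_i
d_bar, s2_d, n = d.mean(), d.var(ddof=1), len(d)
half_width = stats.t.ppf(1 - 0.10 / 2, df=n - 1) * np.sqrt(s2_d / n)
print(f"90% CI: {d_bar:.1f} +/- {half_width:.2f}")  # about -9 +/- 6.21
```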

  1. Simulation is advantageous for comparing systems via classical confidence intervals, variance reduction methods, and through ranking and selection procedures.
  2. High Level Example: Evaluate two different restart strategies that an airline could initiate following a snowstorm. Which policy minimizes the cost function associated with the restart?
    – Let each observation, $Z_{i,j}$ be the $j$th simulation replication from strategy $i$
    – Assume the observations from strategy i are i.i.d. with unknown mean and variance
  3. High Level Example (Continued)
    – Get independent data by controlling the random numbers between replications
    – Get identically distributed costs between reps by performing reps under identical conditions
    – Get approximately normal data by adding up (or averaging) many sub-costs to get overall costs for both strategies
    – If the CI of the difference between means contains zero, then the two systems are statistically about the same. If the CI is entirely to the left or right of zero, then one system is likely better than the other.
  4. Variance reduction techniques include common random numbers, antithetic numbers, and control variates.
  5. Common random numbers uses the same pseudo-random numbers in exactly the same ways for corresponding runs of each of the competing systems. For example use the same customer arrival and service times when testing different customer service systems at a Starbucks. By subjecting the alternative systems to identical experimental conditions, we hope to make it easy to distinguish which systems are best, even though the respective estimators have sampling error.
  1. Common random numbers can sometimes induce positive correlation between point estimators $\hat\theta_A$ and $\hat\theta_B$. The covariance between the two estimators reduces the variance of their difference, as seen below.
    $$ Var(\hat\theta_A - \hat\theta_B) = Var(\hat\theta_A) + Var(\hat\theta_B) - 2 \, Cov(\hat\theta_A, \hat\theta_B) < Var(\hat\theta_A) + Var(\hat\theta_B) $$
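A toy demonstration (entirely hypothetical systems and numbers) of why reusing the same uniforms for both systems shrinks the variance of the difference:

```python
import numpy as np

rng = np.random.default_rng(seed=17)
n = 10_000
u = rng.random(n)                       # one stream of common random numbers

# Two hypothetical systems driven by the same uniforms (CRN) vs. independent uniforms
a_crn = -np.log(u) / 1.0                # system A, e.g. Exp(1) service times
b_crn = -np.log(u) / 1.2                # system B driven by the SAME uniforms
b_indep = -np.log(rng.random(n)) / 1.2  # system B driven by independent uniforms

print(np.var(a_crn - b_crn), np.var(a_crn - b_indep))  # the CRN difference has much lower variance
```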
  2. Antithetic random numbers is similar to common random numbers, except it introduces negative correlation between estimators. If we can introduce negative correlation between the two estimators, then the average of the two is also unbiased and may have low variance.
  3. Ranking and Selection methods answer questions such as:
    – Which of the 10 fertilizers produces the largest mean crop yield? (Normal)
    – Find the pain reliever that has the highest probability of giving relief for a cough. (Binomial)
    – Who is the most popular former Green Bay Packer football player? (Multinomial)
  4. Ranking and Selection methods are relevant to simulation:
    – Normally distributed data by batching
    – Independence by controlling random numbers
    – Multi-stage sampling by retaining seeds
  5. Finding the normal distribution with the largest mean can be accomplished via the indifference zone approach. A correct selection (CS) means picking the population with the largest mean, $\mu_{[k]}$; the procedure guarantees that $P(CS) \ge P^*$ whenever $\mu_{[k]}$ is at least a specified amount $\delta^*$ larger than the rest of the means.
  6. Single-Stage Procedure $N_B$ (Bechhofer 1954)
    – Takes all necessary observations and makes the selection decision in one single stage
    – Assume populations have common known variance
    – For the given $k$ number of means and specified $(P^*, \ \delta^* / \sigma)$, determine sample size $n$ (usually from a table).
  7. Single-Stage Procedure (Continued)
    – The sample size determined above is the minimum amount of samples needed to be confident that the largest sample mean will be the largest population mean.
    – The conclusion will be referenced as the probability of a correct selection, $P(CS)$, when stating that one population has the largest mean.
  8. Single-Stage Procedure Drawbacks
    – It assumes the variances are common and known
    – It requires populations to be independent
    – Need to be careful when using it for simulated data
    – It’s conservative since sample size is for the least favorable configuration of the means
  9. Other options for unknown and unequal variances include the following:
    – Rinott’s two stage procedure
    – Kim and Nelson sequential procedure
  10. Rinott’s two stage procedure and Kim and Nelson’s sequential procedure both require i.i.d normal observations. The batch means method helps address these concerns because batch means are approximately normal for large sample sizes. They are also approximately independent.
  11. The goal of Bernoulli Probability Selection is to select the Bernoulli population with the largest success parameter. It also uses the indifference zone approach.
    Examples:
    – Which drug treatment is the most effective?
    – Which simulated system is most likely to meet design specs?
  12. Bernoulli Probability Selection
    – Select the population having the largest probability, $p_{[k]}$.
Delta is interpreted as the smallest difference worth detecting.
  1. Single-Stage Procedure $B_{SH}$ (Sobel and Huyett 1957)
    – For the specified $(P^*, \Delta^*)$, find $n$ from a table
    – Take a sample of $n$ observations $X_{ij} \ (1 \le j \le n)$ in a single stage from each population $(1 \le i \le k)$
    – Calculate the $k$ sample sums $Y_{in} = \sum_{j=1}^n X_{ij}$
    – Select the treatment that yielded the largest $Y_{in}$ as the one associated with $p_{[k]}$; in the case of ties, randomize.
  1. Multinomial Cell Selection
    – Select the multinomial category having the largest probability
    – Uses the indifference zone approach
    – Examples:
    – Who is the most popular political candidate?
    – Which TV show is most watched during a particular time slot?
  2. Multinomial Experimental Set Up:
    – $k$ possible outcomes (categories)
    – $p_i$ is the probability of the $i$th category
    – $n$ independent replications of the experiment
    – $Y_i$ is the number of outcomes falling in category $i$ after the $n$ observations have been taken
  3. Definition of a multinomial distribution:
  1. The Single-Stage Procedure $M_{BEM}$:
    – For any given $k, \ P^*, \ \theta^*$, find $n$ from the table (related to Bechhofer, Elmaghraby, and Morse 1959)
    – Take $n$ multinomial observations $X_j = (X_{1j}, \ldots, X_{kj}) \ (1 \le j \le n)$ in a single stage
    – Calculate the ordered sample sums $Y_{[1]n} \ \le \ …. \ \le Y_{[k]n}$. Select the category with the largest sum, $Y_{[k]n}$, as the one associated with $p_{[k]}$, randomizing to break ties

Done. Snap back to reality.

~ The Data Generalist


