Simulating Data

We’ll start our statistical analysis of financial data by simulating a data set. This is useful for two reasons. First, it gives us explicit control over what the data looks like. When we turn to testing and analyzing data, having specified the properties of the data ourselves makes it easier to see how the analysis reflects the underlying structure of the data. Second, some more complex types of analytics require simulation (in addition to real data). Option pricing is a classic example, and one that we will explore later.

A computer cannot create true randomness on its own. Computers only follow instructions, so any simulated “randomness” is actually the output of a deterministic procedure written by a human. For example, suppose that you simulate a coin flip on a computer. The coin is fair, so you tell the computer that the coin lands on heads with probability one half and on tails with probability one half. But your computer doesn’t have a real, physical coin to flip. It has to imitate a coin flip using a random number generator (RNG). The RNG returns either “heads” or “tails”, but the process by which it selects between them is determined by the programmer who created the RNG. Confused? Let’s step through an example problem. In this case, we want to simulate the roll of a regular six-sided die.

Start by importing the numpy module.

import numpy as np

Next, use the randint() function from the random sub-module of numpy. The function takes a value for low and a value for high, and returns a whole number from low up to, but not including, high. To cover all six faces of the die, high must therefore be 7. Try running the following block of code several times.

np.random.randint(low=1, high=7)

The randint(low=1, high=7) command is “randomly” selecting a whole number from 1 to 6.
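The exclusive upper bound of randint() is easy to get wrong, so it can help to check it directly. The sketch below is only an illustration: the seed value, the sample size, and the variable names are arbitrary choices, not part of the chapter's own code. It draws many values with high=6 and with high=7 and prints the smallest and largest value seen in each case.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, used only to make this check repeatable

# high is exclusive: high=6 can only produce the integers 1 through 5
draws_high_6 = np.random.randint(low=1, high=6, size=10_000)

# high=7 covers all six faces of a die
draws_high_7 = np.random.randint(low=1, high=7, size=10_000)

print(draws_high_6.min(), draws_high_6.max())
print(draws_high_7.min(), draws_high_7.max())
```

With this many draws, the first line should report a range of 1 to 5 and the second a range of 1 to 6.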

What makes this not truly random is that the RNG behind the randint() function is an algorithm written by a human. The RNG is simply a pre-defined, deterministic process that follows the instructions of its author.

We can control the RNG process with a seed. The seed value sets the starting point of the RNG procedure, so if we specify a seed before using an RNG, we have an explicitly defined path that the RNG will follow (even though, without looking at the code for the RNG, we won’t know precisely what this path looks like).
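To make the idea of a pre-defined path concrete, here is a toy generator written from scratch. It is a minimal sketch of a linear congruential generator, using widely quoted textbook constants. It is not the algorithm that numpy uses, which is far more sophisticated, and the function name toy_rng and its parameters are hypothetical names chosen for this illustration.

```python
def toy_rng(seed, n, modulus=2**31, multiplier=1103515245, increment=12345):
    """Toy linear congruential generator: return n die rolls starting from seed."""
    state = seed
    rolls = []
    for _ in range(n):
        # Deterministic update rule: the next state depends only on the current state
        state = (multiplier * state + increment) % modulus
        # Use the higher-order bits of the state to pick a face from 1 to 6
        rolls.append((state >> 16) % 6 + 1)
    return rolls

print(toy_rng(seed=0, n=10))
print(toy_rng(seed=0, n=10))  # same seed, same "random" rolls
print(toy_rng(seed=1, n=10))  # different seed, different path
```

Nothing in this function is random. The seed fixes the first state, and every later state follows from the update rule, which is why the first two printed lists are identical.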

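For example, run the following block of code several times. The size=10 argument requests ten draws at once. Because high is exclusive, high=6 here means the draws range from 1 to 5, so the output shown contains only those values.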

print( np.random.randint(low=1, high=6, size=10) )

[5 4 1 1 1 2 1 2 4 1]

Now, specify a seed and run the following multiple times.

np.random.seed(0)
print( np.random.randint(low=1, high=6, size=10) )

[5 1 4 4 4 2 4 3 5 1]

In this latter scenario, the array of random numbers does not change from run to run. The seed forces the RNG to start from the same state each time, so the “randomly” generated numbers are the same on every run.
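The effect of the seed can also be checked within a single run. The sketch below resets the seed between two calls and compares the results, then makes a third call without resetting. The variable names are illustrative. The last few lines use np.random.default_rng(), the newer Generator interface in numpy, which is an addition here and is not used elsewhere in this chapter.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(low=1, high=6, size=10)

np.random.seed(0)  # resetting the seed restarts the sequence from the same point
second = np.random.randint(low=1, high=6, size=10)

# Without resetting the seed, the generator simply continues along its path
third = np.random.randint(low=1, high=6, size=10)

print(np.array_equal(first, second))  # True: same seed, same draws
print(np.array_equal(first, third))

# Assumption: the same idea shown with the newer Generator interface
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
gen_first = rng_a.integers(low=1, high=6, size=10)
gen_second = rng_b.integers(low=1, high=6, size=10)
print(np.array_equal(gen_first, gen_second))  # True: two generators, same seed
```

The Generator interface keeps the seed inside an object rather than in global state, which can make it easier to see which draws depend on which seed.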

To keep the simulated output in this chapter consistent across runs, many of the code blocks will begin by setting a seed value.

When every outcome is equally likely, as with “heads” and “tails” for a fair coin, or the faces 1, 2, 3, 4, 5, and 6 of a fair die, the outcomes are said to follow a uniform distribution. The probability is the same for every outcome.
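One way to see the uniform distribution in simulated data is to roll the die many times and compare how often each face appears. The following is a minimal sketch. The seed, the number of rolls, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

np.random.seed(0)
n_rolls = 60_000  # illustrative sample size

# high is exclusive, so high=7 gives the faces 1 through 6
rolls = np.random.randint(low=1, high=7, size=n_rolls)

# Count how often each face appears and convert counts to relative frequencies
faces, counts = np.unique(rolls, return_counts=True)
frequencies = counts / n_rolls

for face, freq in zip(faces, frequencies):
    print(f"face {face}: {freq:.3f}")
```

Each relative frequency should land close to 1/6, or about 0.167, and the match generally tightens as n_rolls grows.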

For our purposes, the normal distribution is more commonly used than a uniform distribution. This is explored in the next section.
