Introduction

This module introduces methods to simulate, store, and summarize data. We begin with two tools for storing data: dictionaries and DataFrames. Dictionaries have a key-value structure; you can think of the value as the information worth recording and the key as the “tag” that makes the data easier to look up. DataFrames are superpowered versions of dictionaries (an analogy that will be made clear in the lecture videos). You should visualize a DataFrame as the equivalent of recording data in an Excel spreadsheet, with the difference that DataFrames live in Python-land and can be manipulated with code, which makes them far more flexible than simple spreadsheets! After learning how to store information in DataFrames, we’ll learn to simulate information to create hypothetical datasets. These simulations will be helpful for learning about statistical analysis in the following module, as well as for forecasting applications in a later module. When data is not generated by simulation and is instead observed from the real world, we need tools to verify its quality and look for outliers. We’ll do this by way of descriptive statistics and diagnostics.
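
To make the key-value idea concrete, here is a minimal sketch, assuming the pandas library is installed. The tickers, prices, and share counts are invented for illustration, and the variable names (`prices`, `table`) are hypothetical rather than anything used in the lectures.

```python
import pandas as pd

# Hypothetical closing prices, invented for illustration only.
# Each key is a "tag" (here, a ticker); each value is the data worth recording.
prices = {"AAA": 101.5, "BBB": 47.2}

# Look up a value by its key.
aaa_price = prices["AAA"]

# A dictionary whose values are equal-length lists maps naturally onto a
# DataFrame: each key becomes a column label, each list becomes that column.
table = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "price": [101.5, 47.2],
    "shares": [10, 25],
})

# Columns are looked up by label, much as dictionary values are looked up by key.
table["value"] = table["price"] * table["shares"]
print(table)
```

The printed table has one row per ticker and one column per key, which is the spreadsheet-like layout described above.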

By the end of this module, you’ll be able to:

  1. Create dictionary and DataFrame variables to store complex data

  2. Simulate financial data using a pseudo-random number generator

  3. Report descriptive statistics about financial data
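
As a preview of the second and third objectives, the following is a rough sketch, assuming NumPy and pandas are available. The return parameters (a 0.05% mean daily return and 1% daily volatility), the starting price of 100, and the normal-distribution assumption are illustrative choices, not values taken from the course.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the pseudo-random draws reproducible.
rng = np.random.default_rng(seed=42)

# Assumed parameters, for illustration only.
n_days = 250
returns = rng.normal(loc=0.0005, scale=0.01, size=n_days)

# Store the simulated returns in a DataFrame.
sim = pd.DataFrame({"daily_return": returns})

# Compound the returns into a hypothetical price path starting at 100.
sim["price"] = 100 * (1 + sim["daily_return"]).cumprod()

# Descriptive statistics: count, mean, std, min, quartiles, and max per column.
summary = sim.describe()
print(summary)
```

Rerunning the script with the same seed reproduces the same dataset, which is what makes a pseudo-random generator convenient for building hypothetical data.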