Dictionaries and DataFrames

Data, Three Ways

In this book, data (i.e., a collection of information) is stored as one of three variable types. These types are:

  • list

  • dict

  • DataFrame

and we will discuss the relative advantages and disadvantages of each type here.

Recall from our earlier example the list of daily stock returns, re-created below.

daily_stock_returns = [0.03, 0.01, -0.02, 0.01, -0.01]

Lists are the easiest way to deal with data, but they’re also the most limited. For instance, what does daily_stock_returns[2] refer to? We know that it has value \(-0.02\), since that’s the third element of the list. But without any other reference point, the utility of this data is low. We don’t know, for instance, what date corresponds to list index value 2. We also don’t know which stock’s returns are represented in this list.

Thus, lists keep track of a very limited amount of information. They are simple and convenient, and we’ll use them as appropriate. Lists are a fundamental tool, and we’ll never do away with them entirely. However, for certain applications, better tools exist.

Dictionaries

Lists are quick and convenient. No matter how advanced you get in programming, you will always make use of lists. However, lists aren’t the best tool for every job.

For example, one limitation of lists is that we have to refer to items by their position in the list. If we type daily_stock_returns[0], we’d get the stock return for the first item in the list. But what is 0? Besides a relative positioning, it doesn’t tell us anything. In some cases, it would be better if we could access other information, like the date of the stock return.

In python, a dictionary is a tool for storing information in a key-value structure. Have a look at the following dictionary, and we’ll use it to clarify what is meant by “key-value.”

finance_dict = {'debt': 'something that is owed or that one is bound to pay to or perform for another',
                'equity': 'the monetary value of a property or business beyond any amounts owed on it in mortgages, claims, liens, etc.'
                'mortgage': 'a conveyance of an interest in property as security for the repayment of money borrowed'}

These words (definitions taken from dictionary.com), make up a small dictionary variable which we’ve named finance_dict.

Each word in the dictionary (debt, equity, mortgage) constitutes a key. We “look up” keys in dictionaries to find their values. Thus, the key is a word and the value is the definition. The command

finance_dict['debt']

would return the definition for debt.

Returning to the daily stock returns example, suppose that we want to keep track of the day on which each return occurs. This is accomplished with a dictionary below.

daily_dict = {'7/8/19':0.03,
             '7/9/19':0.01,
             '7/10/19':-0.02,
             '7/11/19':0.01,
             '7/12/19':-0.01}

Now, given the variable daily_dict, we can quickly look up the stock return from \(7/8/19\) by entering

daily_dict['7/8/19']

This adds a bit of utility to the data, because now it includes additional information. The keys and the values of a dictionary can each be returned individually. Suppose we wish to know what dates are included in this data:

daily_dict.keys()
dict_keys(['7/8/19', '7/9/19', '7/10/19', '7/11/19', '7/12/19'])

or alternatively the returns:

daily_dict.values()
dict_values([0.03, 0.01, -0.02, 0.01, -0.01])

Recall that, with lists, the only reference point is the index of elements in each list. For example, to get the first element of daily_stock_returns, we use daily_stock_returns[0]. To add to the list, we simply need to use .append() and Python adds to the end of the list. To remove items from a list, we call .pop(i) and Python will remove the element in position i of the list.

Dictionaries work slightly differently, since they have more structure to them. To add to a dictionary, we need to specify both a key (word) and a value (definition). We do this with the .append() function. The function takes as an input a dictionary of new key-value pairs that we want to add.

Thus, to add one more day of return values to daily_dict, we’d do:

daily_dict.update( {'7/15/19':0.04} )

while for a list we’d do

daily_stock_returns.append(0.04)

Note that either lists or dictionaries allow for multiple values to be added simultaneously.

daily_dict.update( {'7/16/19':0.02, '7/17/19':-0.01} )
daily_stock_returns = daily_stock_returns + [0.02, -0.01]

Thus, to add multiple items to a dictionary, we call .append() and give as an argument another dictionary of key-value pairs to add.

To add multiple items to a list, the syntax changes! The .append() function takes whatever you put inside the parentheses and adds that input, as is, to the list. We’ll see later that lists are flexible and that makes them quite convenient to use. However, in this case, the flexibility of a list is tricky. Lists can store elements of different types, so if we used .append( [.02, -.01] ) then Python would as a list element to the pre-existing list of numbers. The following example clarifies.

my_list = [1, 2, 3]
my_list = my_list + [4, 5]
print(my_list)
[1, 2, 3, 4, 5]
my_list = [1, 2, 3]
my_list.append( [4,5] )
print(my_list)
[1, 2, 3, [4, 5]]

Deleting key-value pairs in a dictionary is straightforward. Simply instruct Python to delete a key from the dictionary with del.

my_dict = {'word 1': 'dfn 1', 'word 2':'dfn 2'}
del my_dict['word 1']
print(my_dict)
{'word 2': 'dfn 2'}