Stock Prices
We’ll begin by loading the pandas_datareader module for access to convenient APIs, and the datetime module to specify start and end dates of the data that we want to pull.

There are multiple options in the pandas_datareader module for collecting stock price data. Some reliable data providers are:
- Tiingo
- IEX
- AlphaVantage
These three sources require API keys to access their data. An alternative option is to use Yahoo Finance, which is also accessible via pandas_datareader. You can use Yahoo without an API key because pandas_datareader pulls data from Yahoo not through a computer-friendly API but rather by scraping information off of the Yahoo site. If the page gets redesigned, this method will fail until the pandas_datareader module is updated to address the change. Yahoo data also has some reliability issues (reported prices are occasionally inaccurate), so it is not recommended to use Yahoo data for any kind of important analysis.
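For reference, a Yahoo request looks like the sketch below. No key is needed, but because the data are scraped rather than served through a supported API, the call may fail whenever Yahoo changes its site:

import pandas_datareader.data as web
from datetime import datetime

start = datetime(2015, 1, 1)
end = datetime(2015, 12, 31)
# No api_key argument; this scrapes Yahoo and can break without warning
visa_yahoo = web.DataReader('V', 'yahoo', start, end)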
import pandas_datareader.data as web
from datetime import datetime
If this import fails with a ModuleNotFoundError, the pandas_datareader module is not installed in your environment; install it (e.g. with pip install pandas_datareader) and re-run the cell.
The API that we will make use of today is from Alpha Vantage. Visit the Alpha Vantage website to request a free key. Limits on free API users at the time of writing this notebook include:

- no more than 5 API requests per minute
- no more than 500 API requests per day

Premium memberships allow for more access to the data. For our purposes, the free key is sufficient.
Run the following code cell to pickle your key, filling in the value for api_key as appropriate, using the key that you receive from the Alpha Vantage website. If you want, you may delete this cell from your notebook after successfully pickling your key.
import pickle
api_key = 'YOUR KEY HERE'
# Save the key to disk so that it does not need to appear in the notebook again
with open('../pickle_jar/av_key.p', 'wb') as f:
    pickle.dump(api_key, f)
With your key pickled, you can now load it as follows:
import pickle
with open('../pickle_jar/av_key.p', 'rb') as f:
    api_key = pickle.load(f)
# Print only the first few characters, confirming the load without exposing the full key
print(api_key[0:3])
VAS
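Pickling is just one way to keep the key out of your notebook. A common alternative (a sketch; the AV_KEY variable name is arbitrary, and you would need to set the variable in your shell yourself) is to read the key from an environment variable:

import os
# Assumes you have set the variable beforehand, e.g. `export AV_KEY=...`
api_key = os.environ.get('AV_KEY')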
Let’s start by investigating VISA’s stock price in 2015. Begin by specifying start and end dates as datetime dates, as we did in the previous chapter. To access daily data from Alpha Vantage, we use av-daily-adjusted as our source. Unlike in the previous chapter, we now need to pass an api_key argument to the DataReader() function. This key is what allows Alpha Vantage to enforce rate limits on free accounts.
start = datetime(2015, 1, 1)
end = datetime(2015, 12, 31)
visa = web.DataReader('V', 'av-daily-adjusted', start, end, api_key=api_key)
To see a snapshot of this data, let’s print the head.
visa.head()
| | open | high | low | close | adjusted close | volume | dividend amount | split coefficient |
|---|---|---|---|---|---|---|---|---|
| 2015-01-02 | 263.38 | 266.7500 | 262.4900 | 265.02 | 63.342410 | 2098520 | 0.0 | 1.0 |
| 2015-01-05 | 263.49 | 263.4999 | 258.8652 | 259.17 | 61.944202 | 3187800 | 0.0 | 1.0 |
| 2015-01-06 | 259.79 | 260.7800 | 254.8550 | 257.50 | 61.545055 | 2767523 | 0.0 | 1.0 |
| 2015-01-07 | 258.44 | 261.1400 | 257.4500 | 260.95 | 62.369639 | 2336705 | 0.0 | 1.0 |
| 2015-01-08 | 263.20 | 265.1200 | 262.6300 | 264.45 | 63.206174 | 2610864 | 0.0 | 1.0 |
The terms 'close' and 'adjusted close' may or may not be new to you. To see the utility of the 'adjusted close' data, we’ll begin by plotting the closing stock price for VISA, as reported in the 'close' column.
visa['close'].plot()
<AxesSubplot:>

Yikes! There’s quite a crash in March of 2015. The crash in the closing stock price for VISA corresponds to a stock split. If you don’t adjust your data for stock splits, stock returns based on these underlying prices will have ridiculous outliers, and those erroneous data points will adversely affect the quality of your statistical analysis. For this reason, stock price analysis relies on split-adjusted data. The Alpha Vantage adjusted closing price is a price series that takes into account splits (following the standard CRSP methodology). The adjusted close price does not exhibit a similar crash.
visa['adjusted close'].plot()
<AxesSubplot:>

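To quantify how badly the split distorts the raw series, a quick sketch (reusing the visa DataFrame from above) compares the largest one-day drop in the raw versus the adjusted close:

# Daily percent changes from each price series
raw_ret = visa['close'].pct_change()
adj_ret = visa['adjusted close'].pct_change()
# The raw series should show a huge artificial one-day "loss" on the split date
print(raw_ret.min(), adj_ret.min())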
Let’s now pull data for four different companies, using a longer series of data. Our interest here will be in the FANG stocks: Facebook, Amazon, Netflix, Google. Each data series will be a new API call to Alpha Vantage.
end = datetime(2019, 12, 31)
fb = web.DataReader('FB', 'av-daily-adjusted', start, end, api_key=api_key)
amzn = web.DataReader('AMZN', 'av-daily-adjusted', start, end, api_key=api_key)
nflx = web.DataReader('NFLX', 'av-daily-adjusted', start, end, api_key=api_key)
goog = web.DataReader('GOOG', 'av-daily-adjusted', start, end, api_key=api_key)
The four DataFrames that we create above can be used just like in the case of VISA. For example:
fb['adjusted close'].plot()
<AxesSubplot:>

One wrinkle in comparing prices across companies is that we need to normalize the prices. The price of a share of stock in a company depends on both the value of the company and the number of shares outstanding. Thus, it’s not an apples-to-apples comparison to look at stock price movements for two different companies with vastly different stock prices. In fact, it’s sometimes difficult to even visualize two price series of vastly different magnitudes. For example:
fb['adjusted close'].plot()
amzn['adjusted close'].plot()
<AxesSubplot:>

The relative magnitude of AMZN versus FB makes it hard to visualize price movements for FB when AMZN is plotted in the same figure.
Thus, we divide prices by the starting price (the price at the start of our data series). This will scale the prices for all companies in the data so that their movements are more comparable.
Because fb['adjusted close'] corresponds to one series (one column) of data, we can use fb['adjusted close'].iloc[0] to get the first element of that series.
fb['ret'] = fb['adjusted close'] / fb['adjusted close'].iloc[0]
fb['ret'].plot()
<AxesSubplot:>

Compared to our earlier plot of FB’s stock movement, the information is the same. That is, the ups and downs look identical. Note, however, that the scale on the y-axis has changed. This makes comparison more straightforward.
amzn['ret'] = amzn['adjusted close'] / amzn['adjusted close'].iloc[0]
fb['ret'].plot()
amzn['ret'].plot()
<AxesSubplot:>

Every stock here has its own DataFrame. That can become a lot of information to keep track of. One common trick to deal with all of these DataFrames is to pack them into a dictionary. Here, the key of the dictionary is the stock name (the ticker symbol). The value of the dictionary is the corresponding DataFrame of data for the relevant ticker.
fang = {'FB':fb, 'AMZN':amzn, 'NFLX':nflx, 'GOOG':goog}
We can thus reference a DataFrame implicitly. Rather than referring to its name directly (e.g. fb), we can get it indirectly via the dictionary (e.g. fang['FB']). Thus, we can re-create the above plot with:
fang['FB']['ret'].plot()
fang['AMZN']['ret'].plot()
<AxesSubplot:>

At first glance, this may seem more cumbersome than the original method. Note, however, that packing DataFrames into dictionaries is a huge time saver, because it allows us to quickly modify values for all DataFrames at once. Note that we can get the keys of a dictionary via .keys().
fang.keys()
dict_keys(['FB', 'AMZN', 'NFLX', 'GOOG'])
This is convenient because we can then loop over a list of keys. By doing so, we can modify every DataFrame in our dictionary with a simple for loop. Implicitly referencing DataFrames based on a dictionary key-value relationship can be much more efficient than explicitly referencing each DataFrame by name.
for tic in fang.keys():
    fang[tic]['ret'] = fang[tic]['adjusted close'] / fang[tic]['adjusted close'].iloc[0]
Having normalized prices for all DataFrames, we can now plot the four FANG stocks.
for tic in fang.keys():
    fang[tic]['ret'].plot()

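The figure above has no legend, since every series carries the same name, 'ret'. A small sketch (assuming matplotlib, which pandas uses for plotting under the hood) passes each ticker as a label and then adds a legend:

import matplotlib.pyplot as plt

for tic in fang.keys():
    # Label each line with its ticker so that the legend is informative
    fang[tic]['ret'].plot(label=tic)
plt.legend()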
It’s not immediately clear from the figure which stock is the standout, that is, the stock that grew much more than the others over this sample period. The easiest check is to simply print out the ending value for each series. Note that just as we can type .iloc[0] to get the first element of a series, we can type .iloc[-1] to get the last element of the series (and, similarly, the second-to-last element is .iloc[-2]).
for tic in fang.keys():
    print(tic)
    print(fang[tic]['ret'].iloc[-1])
FB
2.6163161249203313
AMZN
5.989368598470115
NFLX
6.4910651257874274
GOOG
2.5476267601608202
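Rather than scanning the printout by eye, a short sketch can pick out the winner programmatically, using Python’s built-in max() with a key function:

# The ticker whose normalized price ends the sample period highest
best = max(fang.keys(), key=lambda tic: fang[tic]['ret'].iloc[-1])
print(best)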
The standout stock is thus NFLX.
Now, suppose we want to combine all of these return series into one DataFrame. That will simplify some later analysis for us. Besides, once we compute returns, we generally don’t do much with the other columns of data (e.g. opening price or volume).
As a preliminary step toward combining the data, we need to give the return series in each DataFrame a distinct name. Recall that we can combine strings with +, as in 'cat'+'dog', which equals 'catdog' (note the lack of a space).
for tic in fang.keys():
    print(tic + ' ret')
FB ret
AMZN ret
NFLX ret
GOOG ret
Rather than name the return series 'ret' for each DataFrame, we’ll name it tic+' ret', where tic holds the dictionary key that we’re working with. Hence, the return series for FB will be named 'FB ret'.
for tic in fang.keys():
    fang[tic][tic + ' ret'] = fang[tic]['adjusted close'] / fang[tic]['adjusted close'].iloc[0]
fang['FB'].head()
| | open | high | low | close | adjusted close | volume | dividend amount | split coefficient | ret | FB ret |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-02 | 78.58 | 78.9300 | 77.700 | 78.450 | 78.450 | 18177475 | 0.0 | 1.0 | 1.000000 | 1.000000 |
| 2015-01-05 | 77.98 | 79.2455 | 76.860 | 77.190 | 77.190 | 26452191 | 0.0 | 1.0 | 0.983939 | 0.983939 |
| 2015-01-06 | 77.23 | 77.5900 | 75.365 | 76.150 | 76.150 | 27399288 | 0.0 | 1.0 | 0.970682 | 0.970682 |
| 2015-01-07 | 76.76 | 77.3600 | 75.820 | 76.150 | 76.150 | 22045333 | 0.0 | 1.0 | 0.970682 | 0.970682 |
| 2015-01-08 | 76.74 | 78.2300 | 76.080 | 78.175 | 78.175 | 23960953 | 0.0 | 1.0 | 0.996495 | 0.996495 |
With that done, we can now subset our data to only include this named return series (e.g. 'FB ret', 'AMZN ret', etc.). All other columns will be removed.
for tic in fang.keys():
    fang[tic] = fang[tic][[tic + ' ret']].copy()
fang['FB'].head()
| | FB ret |
|---|---|
| 2015-01-02 | 1.000000 |
| 2015-01-05 | 0.983939 |
| 2015-01-06 | 0.970682 |
| 2015-01-07 | 0.970682 |
| 2015-01-08 | 0.996495 |
Here, we have four DataFrames that we want to combine. The .merge() function used previously can link together two DataFrames, but not more than two. To handle a merger of four DataFrames, we can use .join(), which by default combines DataFrames on their indexes (here, the date index). The syntax for merging two DataFrames with .merge() is df1.merge(df2), where df1 and df2 are DataFrames. We can likewise do df1.join(df2) to merge via .join(). More useful, however, is the fact that we can do df1.join([df2, df3, df4]) to combine four DataFrames all at once. Thus, we need two items. First, we need a df1; we’ll use FB for this. Second, we need a list [df2, df3, df4]; we’ll use AMZN, NFLX, and GOOG here.
F_value = fang['FB']
ANG_value = [fang[tic] for tic in ['AMZN', 'NFLX', 'GOOG']]
F_value
| | FB ret |
|---|---|
| 2015-01-02 | 1.000000 |
| 2015-01-05 | 0.983939 |
| 2015-01-06 | 0.970682 |
| 2015-01-07 | 0.970682 |
| 2015-01-08 | 0.996495 |
| ... | ... |
| 2019-12-24 | 2.614659 |
| 2019-12-26 | 2.648693 |
| 2019-12-27 | 2.652645 |
| 2019-12-30 | 2.605609 |
| 2019-12-31 | 2.616316 |

1258 rows × 1 columns
ANG_value
[ AMZN ret
2015-01-02 1.000000
2015-01-05 0.979483
2015-01-06 0.957118
2015-01-07 0.967263
2015-01-08 0.973875
... ...
2019-12-24 5.799332
2019-12-26 6.057209
2019-12-27 6.060547
2019-12-30 5.986289
2019-12-31 5.989369
[1258 rows x 1 columns],
NFLX ret
2015-01-02 1.000000
2015-01-05 0.949103
2015-01-06 0.932854
2015-01-07 0.937697
2015-01-08 0.958489
... ...
2019-12-24 6.684250
2019-12-26 6.672816
2019-12-27 6.601801
2019-12-30 6.485849
2019-12-31 6.491065
[1258 rows x 1 columns],
GOOG ret
2015-01-02 1.000000
2015-01-05 0.979154
2015-01-06 0.956460
2015-01-07 0.954822
2015-01-08 0.957832
... ...
2019-12-24 2.560088
2019-12-26 2.592176
2019-12-27 2.575961
2019-12-30 2.545950
2019-12-31 2.547627
[1258 rows x 1 columns]]
Thus, we can now create a DataFrame that joins together all four single-stock DataFrames.
df = F_value.join(ANG_value)
df.head()
| | FB ret | AMZN ret | NFLX ret | GOOG ret |
|---|---|---|---|---|
| 2015-01-02 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2015-01-05 | 0.983939 | 0.979483 | 0.949103 | 0.979154 |
| 2015-01-06 | 0.970682 | 0.957118 | 0.932854 | 0.956460 |
| 2015-01-07 | 0.970682 | 0.967263 | 0.937697 | 0.954822 |
| 2015-01-08 | 0.996495 | 0.973875 | 0.958489 | 0.957832 |
Having all data in one DataFrame is powerful. For instance, suppose that we want to create an equal-weighted portfolio of these four stocks. An equal-weighted portfolio allocates 1/4 weight to each stock, which is equivalent to adding up the returns for all four stocks and then dividing by four. We can add across each row via .sum(axis=1), where the axis=1 argument tells pandas to sum across columns within a row (rather than summing down each column, which happens via axis=0).
df['ret'] = df[['FB ret', 'AMZN ret', 'NFLX ret', 'GOOG ret']].sum(axis=1) / 4
df.head()
| | FB ret | AMZN ret | NFLX ret | GOOG ret | ret |
|---|---|---|---|---|---|
| 2015-01-02 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2015-01-05 | 0.983939 | 0.979483 | 0.949103 | 0.979154 | 0.972920 |
| 2015-01-06 | 0.970682 | 0.957118 | 0.932854 | 0.956460 | 0.954279 |
| 2015-01-07 | 0.970682 | 0.967263 | 0.937697 | 0.954822 | 0.957616 |
| 2015-01-08 | 0.996495 | 0.973875 | 0.958489 | 0.957832 | 0.971673 |
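As an aside, summing the four columns and dividing by four is the same as taking a row-wise mean, so an equivalent version of the computation above uses .mean(axis=1):

# Equivalent to .sum(axis=1) / 4
df['ret'] = df[['FB ret', 'AMZN ret', 'NFLX ret', 'GOOG ret']].mean(axis=1)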
Plotting is also more convenient when the data come from a single DataFrame. Each series name is added to the plot legend so that it’s easy to determine which stock is which.
df.plot()
<AxesSubplot:>

This portfolio perhaps doesn’t give fantastic diversification benefits. The four FANG stocks have similar ups and downs, and consequently our equal-weighted portfolio also follows those ups and downs.
One way to think about diversification is through return correlation. Use the .pct_change() function to quickly get stock returns (percent changes in price).
pct_chng = df.pct_change()
pct_chng.head()
| | FB ret | AMZN ret | NFLX ret | GOOG ret | ret |
|---|---|---|---|---|---|
| 2015-01-02 | NaN | NaN | NaN | NaN | NaN |
| 2015-01-05 | -0.016061 | -0.020517 | -0.050897 | -0.020846 | -0.027080 |
| 2015-01-06 | -0.013473 | -0.022833 | -0.017121 | -0.023177 | -0.019160 |
| 2015-01-07 | 0.000000 | 0.010600 | 0.005192 | -0.001713 | 0.003497 |
| 2015-01-08 | 0.026592 | 0.006836 | 0.022173 | 0.003153 | 0.014679 |
The .std() function will then yield a volatility estimate for each stock. Note that this is a daily volatility, since we are using daily stock returns. We can convert it to an annual volatility by multiplying by \(\sqrt{252}\): there are roughly 252 trading days in a year, and for (approximately) independent daily returns, variance scales linearly with time, so standard deviation scales with the square root of time.
import numpy as np
vol = pct_chng.std() * np.sqrt(252)
print(vol)
FB ret 0.282214
AMZN ret 0.292495
NFLX ret 0.416745
GOOG ret 0.240093
ret 0.261443
dtype: float64
Here, we observe that, while NFLX definitely grew the most over the period, it also had the highest volatility.
The correlation matrix via .corr() reports the pairwise correlations of returns.
pct_chng.corr()
| | FB ret | AMZN ret | NFLX ret | GOOG ret | ret |
|---|---|---|---|---|---|
| FB ret | 1.000000 | 0.569323 | 0.405252 | 0.599273 | 0.698254 |
| AMZN ret | 0.569323 | 1.000000 | 0.482377 | 0.641027 | 0.817429 |
| NFLX ret | 0.405252 | 0.482377 | 1.000000 | 0.466525 | 0.854580 |
| GOOG ret | 0.599273 | 0.641027 | 0.466525 | 1.000000 | 0.745282 |
| ret | 0.698254 | 0.817429 | 0.854580 | 0.745282 | 1.000000 |
As usual, seaborn will make things a little more aesthetically pleasing. Correlation matrices can be difficult to digest, because a big matrix of numbers is hard to scan. The heatmap() function from seaborn colorizes the correlation matrix to make it easier to explore.
import seaborn as sns
sns.heatmap(pct_chng.corr())
<AxesSubplot:>

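If you want to read exact values off the figure, seaborn’s annot argument (a standard option of heatmap()) writes each correlation inside its cell:

sns.heatmap(pct_chng.corr(), annot=True)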
Let’s finish with another example: an equal-weighted portfolio of four chip and hardware companies (the commented-out line in the code below offers three cellular providers as an alternative set to try). The pandas_datareader module makes it phenomenally easy to construct a portfolio like this.
#ticker_list = ['VZ', 'T', 'TMUS']  # an alternative set of three cellular providers to try
ticker_list = ['AMD', 'NVDA', 'INTC', 'AAPL']
data = {}
for tic in ticker_list:
    df = web.DataReader(tic, 'av-daily-adjusted', start, end, api_key=api_key)
    # Normalize the adjusted close by its first value, as before
    df[tic + ' ret'] = df['adjusted close'] / df['adjusted close'].iloc[0]
    # Keep only the normalized return series
    df = df[[tic + ' ret']].copy()
    data.update({tic: df})
df2 = data[ticker_list[0]].join([ data[m] for m in ticker_list[1:] ])
df2['portfolio ret'] = df2[[tic + ' ret' for tic in ticker_list]].sum(axis=1) / len(ticker_list)
df2.plot()
<AxesSubplot:>

An interesting feature of the above code cell is that we only need to change the ticker_list line, and we can create an equal-weighted portfolio for any set of stocks that we want (note, however, that the free version of the Alpha Vantage API limits you to five requests per minute).
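If your ticker list grows beyond five names, that 5-requests-per-minute cap becomes binding. A minimal sketch for staying under it pauses between API calls; the 12-second spacing here is just an assumption derived from the stated limit:

import time

data = {}
for tic in ticker_list:
    df = web.DataReader(tic, 'av-daily-adjusted', start, end, api_key=api_key)
    data.update({tic: df})
    # Space out requests: roughly 60 seconds / 5 requests = 12 seconds per call
    time.sleep(12)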