Dates¶
If you see the string '2013-02-14', your brain automatically reads this as “February 14, 2013”. Python doesn’t have a brain. If we want Python to translate a string that represents a date into something that it can recognize as a date, we need give it some specific instructions.
What does it mean for a variable to be recognized as a date in Python? Well, consider the date '2013-01-31'. People with brains know that if we increment the date by one, the following date is '2013-02-01'. Brainless Python doesn’t understand something like '2013-01-31'+1.
Just like a pandas DataFrame special variable type is the most useful way to store data in Python, the best way to work with dates is using the datetime module’s datetime special variable type. The datetime module’s datetime variable has built-in features that enable Python to do things like add on a day (e.g. add a day to \(1/31/2013\) to get \(2/1/2013\)).
Remember: our goal is not to make things easy today, our goal is to be smart enough to make things easy for the rest of our lives!
Begin by importing the datetime module
import datetime
The date column in our pandas DataFrame is not a datetime variable. We will create a new column, dt, that is.
The datetime module has a function datetime.datetime.strptime() that takes a date and converts it into a special datetime variable. Similarly the function datetime.datetime.timedelta(days=1) is how we could add \(1\) day to our date.
x = 20110131
x_dt = datetime.datetime.strptime( str(x), '%Y%m%d' )
print(x)
print(x_dt)
print(x+1)
print(x_dt + datetime.timedelta(days=1))
20110131
2011-01-31 00:00:00
20110132
2011-02-01 00:00:00
Let’s break down what happens in x_dt = datetime.datetime.strptime(str(x), '%Y%m%d')
x_dt =: define a new variablex_dtequal to…datetime.datetime.strptime(): thedatetimefunction capable of converting integer or string dates intodatetimevariablesstr(x): we happened to define x as an integer, sostr()is the function we use to convert it to a stringRemember, integers are variables where we can ask Python to do
x/2and it understands what to doIn contrast, a string like
y = 'dog'does not have mathematical properties, and Python will freak out if you ask it to doy/2It just so happens that the
strptime()function expects string arguments and not integers
'%Y%m%d': this is how we tell thestrptime()function what the string date is formatted like – lookup the codes here‘20110131’ is of type ‘%Y%m%d’
‘11Jan31’ is of type ‘%y%b%d’
‘31January2011’ is of type ‘%d%B%Y’
To add a new column to data, all we need to do is put data['dt'] = and then tell Python what this column should be equal to. For example:
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/learning-fintech/data/master/stocks/all_stocks_5yr.csv.gz')
data['dt'] = 1
data.head()
| date | open | high | low | close | volume | Name | ret | Mkt | SMB | HML | RMW | CMA | RF | dt | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-02-08 | 45.07 | 45.35 | 45.00 | 45.08 | 1824755 | A | NaN | 0.58 | 0.03 | -0.21 | 0.12 | -0.45 | 0.0 | 1 |
| 1 | 2013-02-11 | 45.17 | 45.18 | 44.45 | 44.60 | 2915405 | A | -0.010648 | -0.08 | -0.03 | 0.39 | -0.18 | 0.28 | 0.0 | 1 |
| 2 | 2013-02-12 | 44.81 | 44.95 | 44.50 | 44.62 | 2373731 | A | 0.000448 | 0.16 | 0.26 | 0.71 | -0.54 | 0.45 | 0.0 | 1 |
| 3 | 2013-02-13 | 44.81 | 45.24 | 44.68 | 44.75 | 2052338 | A | 0.002913 | 0.14 | 0.08 | -0.01 | -0.01 | -0.04 | 0.0 | 1 |
| 4 | 2013-02-14 | 44.72 | 44.78 | 44.36 | 44.58 | 3826245 | A | -0.003799 | 0.11 | 0.32 | 0.12 | -0.30 | -0.21 | 0.0 | 1 |
data['dt'] = 1 creates a new column dt equal to all 1s.
We want to take the column date and use the strptime() function to make a column dt
Fortunately, pandas has a function to_datetime() that works like the strptime() function but it can convert and entire column all at once
data['dt'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data.head()
| date | open | high | low | close | volume | Name | ret | Mkt | SMB | HML | RMW | CMA | RF | dt | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-02-08 | 45.07 | 45.35 | 45.00 | 45.08 | 1824755 | A | NaN | 0.58 | 0.03 | -0.21 | 0.12 | -0.45 | 0.0 | 2013-02-08 |
| 1 | 2013-02-11 | 45.17 | 45.18 | 44.45 | 44.60 | 2915405 | A | -0.010648 | -0.08 | -0.03 | 0.39 | -0.18 | 0.28 | 0.0 | 2013-02-11 |
| 2 | 2013-02-12 | 44.81 | 44.95 | 44.50 | 44.62 | 2373731 | A | 0.000448 | 0.16 | 0.26 | 0.71 | -0.54 | 0.45 | 0.0 | 2013-02-12 |
| 3 | 2013-02-13 | 44.81 | 45.24 | 44.68 | 44.75 | 2052338 | A | 0.002913 | 0.14 | 0.08 | -0.01 | -0.01 | -0.04 | 0.0 | 2013-02-13 |
| 4 | 2013-02-14 | 44.72 | 44.78 | 44.36 | 44.58 | 3826245 | A | -0.003799 | 0.11 | 0.32 | 0.12 | -0.30 | -0.21 | 0.0 | 2013-02-14 |
To the human eye, the columns date and dt look identical. That’s because datetime dates still print out nicely as formatted strings.
We can use the .dtypes function that is built-in to pandas DataFrame objects to confirm that dt is indeed a datetime variable.
data.dtypes
date object
open float64
high float64
low float64
close float64
volume int64
Name object
ret float64
Mkt float64
SMB float64
HML float64
RMW float64
CMA float64
RF float64
dt datetime64[ns]
dtype: object
String columns will be reported as object types in pandas.