Dates¶
If you see the string '2013-02-14'
, your brain automatically reads this as “February 14, 2013”. Python doesn’t have a brain. If we want Python to translate a string that represents a date into something that it can recognize as a date, we need give it some specific instructions.
What does it mean for a variable to be recognized as a date in Python? Well, consider the date '2013-01-31'
. People with brains know that if we increment the date by one, the following date is '2013-02-01'
. Brainless Python doesn’t understand something like '2013-01-31'+1
.
Just like a pandas
DataFrame special variable type is the most useful way to store data in Python, the best way to work with dates is using the datetime
module’s datetime special variable type. The datetime
module’s datetime variable has built-in features that enable Python to do things like add on a day (e.g. add a day to \(1/31/2013\) to get \(2/1/2013\)).
Remember: our goal is not to make things easy today, our goal is to be smart enough to make things easy for the rest of our lives!
Begin by importing the datetime
module
import datetime
The date
column in our pandas
DataFrame is not a datetime
variable. We will create a new column, dt
, that is.
The datetime
module has a function datetime.datetime.strptime()
that takes a date and converts it into a special datetime
variable. Similarly the function datetime.datetime.timedelta(days=1)
is how we could add \(1\) day to our date.
x = 20110131
x_dt = datetime.datetime.strptime( str(x), '%Y%m%d' )
print(x)
print(x_dt)
print(x+1)
print(x_dt + datetime.timedelta(days=1))
20110131
2011-01-31 00:00:00
20110132
2011-02-01 00:00:00
Let’s break down what happens in x_dt = datetime.datetime.strptime(str(x), '%Y%m%d')
x_dt =
: define a new variablex_dt
equal to…datetime.datetime.strptime()
: thedatetime
function capable of converting integer or string dates intodatetime
variablesstr(x)
: we happened to define x as an integer, sostr()
is the function we use to convert it to a stringRemember, integers are variables where we can ask Python to do
x/2
and it understands what to doIn contrast, a string like
y = 'dog'
does not have mathematical properties, and Python will freak out if you ask it to doy/2
It just so happens that the
strptime()
function expects string arguments and not integers
'%Y%m%d'
: this is how we tell thestrptime()
function what the string date is formatted like – lookup the codes here‘20110131’ is of type ‘%Y%m%d’
‘11Jan31’ is of type ‘%y%b%d’
‘31January2011’ is of type ‘%d%B%Y’
To add a new column to data
, all we need to do is put data['dt'] =
and then tell Python what this column should be equal to. For example:
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/learning-fintech/data/master/stocks/all_stocks_5yr.csv.gz')
data['dt'] = 1
data.head()
date | open | high | low | close | volume | Name | ret | Mkt | SMB | HML | RMW | CMA | RF | dt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2013-02-08 | 45.07 | 45.35 | 45.00 | 45.08 | 1824755 | A | NaN | 0.58 | 0.03 | -0.21 | 0.12 | -0.45 | 0.0 | 1 |
1 | 2013-02-11 | 45.17 | 45.18 | 44.45 | 44.60 | 2915405 | A | -0.010648 | -0.08 | -0.03 | 0.39 | -0.18 | 0.28 | 0.0 | 1 |
2 | 2013-02-12 | 44.81 | 44.95 | 44.50 | 44.62 | 2373731 | A | 0.000448 | 0.16 | 0.26 | 0.71 | -0.54 | 0.45 | 0.0 | 1 |
3 | 2013-02-13 | 44.81 | 45.24 | 44.68 | 44.75 | 2052338 | A | 0.002913 | 0.14 | 0.08 | -0.01 | -0.01 | -0.04 | 0.0 | 1 |
4 | 2013-02-14 | 44.72 | 44.78 | 44.36 | 44.58 | 3826245 | A | -0.003799 | 0.11 | 0.32 | 0.12 | -0.30 | -0.21 | 0.0 | 1 |
data['dt'] = 1
creates a new column dt
equal to all 1s.
We want to take the column date
and use the strptime()
function to make a column dt
Fortunately, pandas
has a function to_datetime()
that works like the strptime()
function but it can convert and entire column all at once
data['dt'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data.head()
date | open | high | low | close | volume | Name | ret | Mkt | SMB | HML | RMW | CMA | RF | dt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2013-02-08 | 45.07 | 45.35 | 45.00 | 45.08 | 1824755 | A | NaN | 0.58 | 0.03 | -0.21 | 0.12 | -0.45 | 0.0 | 2013-02-08 |
1 | 2013-02-11 | 45.17 | 45.18 | 44.45 | 44.60 | 2915405 | A | -0.010648 | -0.08 | -0.03 | 0.39 | -0.18 | 0.28 | 0.0 | 2013-02-11 |
2 | 2013-02-12 | 44.81 | 44.95 | 44.50 | 44.62 | 2373731 | A | 0.000448 | 0.16 | 0.26 | 0.71 | -0.54 | 0.45 | 0.0 | 2013-02-12 |
3 | 2013-02-13 | 44.81 | 45.24 | 44.68 | 44.75 | 2052338 | A | 0.002913 | 0.14 | 0.08 | -0.01 | -0.01 | -0.04 | 0.0 | 2013-02-13 |
4 | 2013-02-14 | 44.72 | 44.78 | 44.36 | 44.58 | 3826245 | A | -0.003799 | 0.11 | 0.32 | 0.12 | -0.30 | -0.21 | 0.0 | 2013-02-14 |
To the human eye, the columns date
and dt
look identical. That’s because datetime
dates still print out nicely as formatted strings.
We can use the .dtypes
function that is built-in to pandas
DataFrame objects to confirm that dt
is indeed a datetime
variable.
data.dtypes
date object
open float64
high float64
low float64
close float64
volume int64
Name object
ret float64
Mkt float64
SMB float64
HML float64
RMW float64
CMA float64
RF float64
dt datetime64[ns]
dtype: object
String columns will be reported as object
types in pandas.