Dates

If you see the string '2013-02-14', your brain automatically reads this as “February 14, 2013”. Python doesn’t have a brain. If we want Python to translate a string that represents a date into something that it can recognize as a date, we need give it some specific instructions.

What does it mean for a variable to be recognized as a date in Python? Well, consider the date '2013-01-31'. People with brains know that if we increment the date by one, the following date is '2013-02-01'. Brainless Python doesn’t understand something like '2013-01-31'+1.

Just like a pandas DataFrame special variable type is the most useful way to store data in Python, the best way to work with dates is using the datetime module’s datetime special variable type. The datetime module’s datetime variable has built-in features that enable Python to do things like add on a day (e.g. add a day to \(1/31/2013\) to get \(2/1/2013\)).

Remember: our goal is not to make things easy today, our goal is to be smart enough to make things easy for the rest of our lives!

Begin by importing the datetime module

import datetime

The date column in our pandas DataFrame is not a datetime variable. We will create a new column, dt, that is.

The datetime module has a function datetime.datetime.strptime() that takes a date and converts it into a special datetime variable. Similarly the function datetime.datetime.timedelta(days=1) is how we could add \(1\) day to our date.

x = 20110131
x_dt = datetime.datetime.strptime( str(x), '%Y%m%d' )
print(x)
print(x_dt)
print(x+1)
print(x_dt + datetime.timedelta(days=1))
20110131
2011-01-31 00:00:00
20110132
2011-02-01 00:00:00

Let’s break down what happens in x_dt = datetime.datetime.strptime(str(x), '%Y%m%d')

  • x_dt = : define a new variable x_dt equal to…

  • datetime.datetime.strptime() : the datetime function capable of converting integer or string dates into datetime variables

  • str(x) : we happened to define x as an integer, so str() is the function we use to convert it to a string

    • Remember, integers are variables where we can ask Python to do x/2 and it understands what to do

    • In contrast, a string like y = 'dog' does not have mathematical properties, and Python will freak out if you ask it to do y/2

    • It just so happens that the strptime() function expects string arguments and not integers

  • '%Y%m%d' : this is how we tell the strptime() function what the string date is formatted like – lookup the codes here

    • ‘20110131’ is of type ‘%Y%m%d’

    • ‘11Jan31’ is of type ‘%y%b%d’

    • ‘31January2011’ is of type ‘%d%B%Y’

To add a new column to data, all we need to do is put data['dt'] = and then tell Python what this column should be equal to. For example:

import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/learning-fintech/data/master/stocks/all_stocks_5yr.csv.gz')

data['dt'] = 1
data.head()
date open high low close volume Name ret Mkt SMB HML RMW CMA RF dt
0 2013-02-08 45.07 45.35 45.00 45.08 1824755 A NaN 0.58 0.03 -0.21 0.12 -0.45 0.0 1
1 2013-02-11 45.17 45.18 44.45 44.60 2915405 A -0.010648 -0.08 -0.03 0.39 -0.18 0.28 0.0 1
2 2013-02-12 44.81 44.95 44.50 44.62 2373731 A 0.000448 0.16 0.26 0.71 -0.54 0.45 0.0 1
3 2013-02-13 44.81 45.24 44.68 44.75 2052338 A 0.002913 0.14 0.08 -0.01 -0.01 -0.04 0.0 1
4 2013-02-14 44.72 44.78 44.36 44.58 3826245 A -0.003799 0.11 0.32 0.12 -0.30 -0.21 0.0 1

data['dt'] = 1 creates a new column dt equal to all 1s.

We want to take the column date and use the strptime() function to make a column dt

Fortunately, pandas has a function to_datetime() that works like the strptime() function but it can convert and entire column all at once

data['dt'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data.head()
date open high low close volume Name ret Mkt SMB HML RMW CMA RF dt
0 2013-02-08 45.07 45.35 45.00 45.08 1824755 A NaN 0.58 0.03 -0.21 0.12 -0.45 0.0 2013-02-08
1 2013-02-11 45.17 45.18 44.45 44.60 2915405 A -0.010648 -0.08 -0.03 0.39 -0.18 0.28 0.0 2013-02-11
2 2013-02-12 44.81 44.95 44.50 44.62 2373731 A 0.000448 0.16 0.26 0.71 -0.54 0.45 0.0 2013-02-12
3 2013-02-13 44.81 45.24 44.68 44.75 2052338 A 0.002913 0.14 0.08 -0.01 -0.01 -0.04 0.0 2013-02-13
4 2013-02-14 44.72 44.78 44.36 44.58 3826245 A -0.003799 0.11 0.32 0.12 -0.30 -0.21 0.0 2013-02-14

To the human eye, the columns date and dt look identical. That’s because datetime dates still print out nicely as formatted strings.

We can use the .dtypes function that is built-in to pandas DataFrame objects to confirm that dt is indeed a datetime variable.

data.dtypes
date              object
open             float64
high             float64
low              float64
close            float64
volume             int64
Name              object
ret              float64
Mkt              float64
SMB              float64
HML              float64
RMW              float64
CMA              float64
RF               float64
dt        datetime64[ns]
dtype: object

String columns will be reported as object types in pandas.