Strings

Strings are a simple but powerful tool for storing non-numerical information.

String variables store textual data. To tell Python that we want to treat some text as a string, we wrap it in either single quotes ' or double quotes ". Both forms are valid and produce the same kind of string.

fruit = 'banana'
vegetable = "carrot"

Use of single or double quotes can make nested quotations easier. For instance, what if you want to use an apostrophe inside a string variable? Consider the apostrophe in:

This isn’t an interesting sentence.

In that case, be sure to use double quotes to mark your string.

sent = "This isn't an interesting sentence." # this code will work
sent = 'This isn't an interesting sentence. # this has problems
  File "<ipython-input-4-fcbf08c4a876>", line 1
    sent = 'This isn't an interesting sentence. # this has problems
                     ^
SyntaxError: invalid syntax

Because Python won’t be able to differentiate between the ' character being used as an apostrophe and the ' character being used to signify a single quote, the easiest way to enter the above example into Python is to use double quotes for the string.

Similarly, use single quotes to mark the string if you want to include double quotes inside the string. Consider the example:

Bob said, “Yes, that is a boring sentence.”

In this case, the easiest way to enter this as a string in Python is to use single quotes.

quote = 'Bob said, "Yes, that is a boring sentence."'
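As a quick sketch of the point above, the choice of quote style only changes how the string is typed, not the data it holds. The variable names here are made up for illustration.

```python
# The same text written with each quote style is the same string.
single = 'banana'
double = "banana"
print(single == double)  # True

# Pick whichever style does not clash with the quotes inside the text.
with_apostrophe = "This isn't an interesting sentence."
with_quotes = 'Bob said, "Yes, that is a boring sentence."'
print(with_apostrophe)
print(with_quotes)
```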

Escape Characters

The trickiest problem with quotations is something like the following example:

Bob said, “Yes, that’s a boring sentence.”

Note that the textual data we want to include here has both double quotes and a single quote inside the data. To enter this text into Python as a string variable, we need to use an escape character.

Escape characters are special sequences inside a string that tell Python to treat a character differently from its usual meaning. Without an escape character inside a string, Python interprets every character literally. For instance, the literal interpretation of n is that it represents the letter “n”. The literal interpretation of ' is that it represents a single quote. We can “escape” from this literal interpretation of a character by prefacing that character with \. The \ symbol tells Python to treat the character following the \ specially. In the example here, \' will tell Python that the ' character is to be treated as part of the data, rather than a potential end for a single quoted string.

quote = 'Bob said, "Yes, that\'s a boring sentence."'
print(quote)
Bob said, "Yes, that's a boring sentence."
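The same idea works the other way around. As a small sketch (the variable names are illustrative), a double-quoted string can hold double quotes if each one is escaped with \", and the result is the same string as the single-quoted version above.

```python
# Single-quoted string: escape the apostrophe.
quote_single = 'Bob said, "Yes, that\'s a boring sentence."'

# Double-quoted string: escape the double quotes instead.
quote_double = "Bob said, \"Yes, that's a boring sentence.\""

print(quote_double)
print(quote_single == quote_double)  # True
```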

There are only a handful of escape characters to remember. The rest can be Googled if you need them. The escape characters worth remembering are:

  • \': tells Python to treat ' as a single quote (apostrophe) inside the string data

  • \": tells Python to treat " as a double quote inside the string data

  • \n: tells Python to enter a line break.

  • \t: tells Python to enter a tab.

print('Line one of text.\nLine two of text')
Line one of text.
Line two of text
print('\tThis is an indented sentence.')
	This is an indented sentence.

Concept check: Set the variable, v, below equal to a string such that the second line evaluates to (aka prints out) the following:

That isn't a smart investment.
You should buy Dogecoin instead.
    (just kidding)
v =
print(v)

Characters

A string is a sequence of characters. The string 'cat' is composed of the individual characters 'c', 'a', and 't'. The string 'my\ncat' is made up of the characters 'm', 'y', '\n', 'c', 'a', 't'. Note that the third character in this latter example is an escape character '\n' rather than an alphanumeric character (the letters 'a'-'z' or numbers '0'-'9') or a symbol (e.g. '!', '@', or '#').
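One way to check this character count is sketched below. It relies on two Python built-ins not introduced in the text above, len (which counts the items in a sequence) and list (which splits a string into its characters); the variable name is illustrative.

```python
word = 'my\ncat'

# '\n' is typed with two keystrokes but counts as one character.
print(len(word))   # 6

# Splitting the string shows each character separately.
print(list(word))  # ['m', 'y', '\n', 'c', 'a', 't']
```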

Because strings are composed of characters, it is helpful to know how to access individual characters of those strings. We will work with the examples 'AAPL Stock' and 'F Stock' for a moment. These two strings are items that we may come across in a written document. Suppose that we need to extract the ticker names from these strings. For instance, if these strings are items that we’ve extracted from a news article or a tweet, then finding the ticker names helps us determine what the subject of the article/tweet is.

The first character of a string is the \(0^{th}\) character in the string. This is a quirk that Python shares with many other programming languages: counting begins at zero, rather than at one. Hence, the first character is character number \(0\), the second character is character number \(1\), etc. We can select character number \(n\) with square brackets placed immediately after the string name.

string = 'cat'
print(string[0])
c
print(string[1])
a
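The same zero-based counting can be tried on one of the ticker strings. This is a small sketch; the variable names are made up, and len is a Python built-in that returns the number of characters in the string.

```python
headline = 'AAPL Stock'

print(headline[0])  # A  (first character is number 0)
print(headline[3])  # L  (fourth character is number 3)

# Because counting starts at zero, the last character sits at
# position len(headline) - 1, not len(headline).
last = len(headline) - 1
print(last)            # 9
print(headline[last])  # k
```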

Along with selecting a single character with its position in the string, we can choose a subset of characters with a set of sequential numbers. Such numbers are listed with a starting point and ending point. For instance, if we want the numbers \(1\), \(2\), \(3\), then we would indicate the starting point of \(1\) and the ending point of \(4\). Yes, \(4\), not \(3\). The ending point is the number after the last number to be included.

Consequently, 0:2 would select characters number \(0\) and \(1\) because the starting point is \(0\) and the ending point is \(2\).

print(string[0:2])
ca

Similarly, 1:3 selects characters number \(1\) and \(2\) because the starting point is \(1\) and the ending point is \(3\).

print(string[1:3])
at
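Applied to the ticker examples from earlier, slicing might look like the sketch below. It assumes we already know how many characters each ticker has, so the end points are hard-coded; the variable names are illustrative.

```python
apple = 'AAPL Stock'
ford = 'F Stock'

# 'AAPL' occupies characters 0, 1, 2, 3, so the ending point is 4.
apple_ticker = apple[0:4]
print(apple_ticker)  # AAPL

# 'F' is only character 0, so the ending point is 1.
ford_ticker = ford[0:1]
print(ford_ticker)   # F
```

Hard-coding the end point works only when the ticker length is known in advance, which is why the two strings need different slices.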