3.13 Reading in Data

When working in Python, we often want to import data from an external file or, conversely, write our data to a file. To interact with external files, we use the built-in open() function. It takes the name of the file we want to work with and, as its second argument, the mode in which we want to interact with that file. If the mode is omitted, it defaults to 'r' (read).

The common modes we use are:

  • r - Read. This allows us to import data from an external file into Python.
  • w - Write. This allows us to write our data to a file. If a file with this name already exists, opening it in w mode will overwrite its existing contents. If it does not exist, it is created.
  • a - Append. This allows us to write our data to a file. If a file with this name already exists, opening it in a mode will add new data to the end of its existing contents. If it does not exist, it is created.
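The difference between the three modes is easiest to see side by side. The sketch below is a minimal illustration, not part of the original example: it assumes we are free to create a scratch insects.txt inside a temporary directory (made with tempfile) so that no real file is overwritten.

```python
import os
import tempfile

# Scratch location so we never clobber a real insects.txt
path = os.path.join(tempfile.mkdtemp(), 'insects.txt')

# 'w' creates the file (it doesn't exist yet) and writes to it
f = open(path, 'w')
f.write('Hercules beetle\n')
f.write('Swallowtail\n')
f.close()

# 'a' adds to the end of the existing contents
f = open(path, 'a')
f.write('Ornate mantis\n')
f.close()

# 'r' reads the contents back
f = open(path, 'r')
contents = f.read()
f.close()
print(contents)

# Opening in 'w' mode again wipes the old contents before writing
f = open(path, 'w')
f.write('Weevil\n')
f.close()

f = open(path, 'r')
overwritten = f.read()
f.close()
print(overwritten)
```

After the append step the file holds all three insects; after reopening in w mode, only Weevil is left.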

To open a file named insects.txt and read its contents, we would therefore use:

f = open('insects.txt', 'r')

3.13.1 Parsing a file

If we print f, we get the following output: <_io.TextIOWrapper name='insects.txt' mode='r' encoding='UTF-8'>

This is because we have opened the file and saved it as a variable, but we haven’t actually read the data it contains. In base Python there are two main ways of doing this:

3.13.1.1 1. Parsing with a for loop

One way to look through a file’s contents line by line is to use a for loop. We can loop through a file with the syntax:

for line in f:
  print(line)
## Hercules beetle
## 
## Swallowtail
## 
## Ornate mantis
## 
## Weevil
## 
## Pine chaffer

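As it turns out, each line ends with a special newline character, \n, and print() adds its own line break after it, which is why the output above has a blank line between entries. To read in the data without the newline character, we can use .strip('\n'). (Called with no argument, .strip() removes all leading and trailing whitespace, including \n.) Because the loop above has already read through f to the end, we first reopen the file:

f = open('insects.txt', 'r')  # reopen: the previous loop used up the file
for line in f:
  print(line.strip('\n'))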
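Why does the file need reopening? A file object remembers its current position, so once a loop has read to the end, a second loop over the same object finds nothing left. The sketch below shows this and uses f.seek(0) to move back to the start instead of reopening. It assumes a scratch copy of insects.txt, recreated in a temporary directory with the contents shown above.

```python
import os
import tempfile

# Recreate insects.txt in a scratch directory (assumption for a self-contained example)
path = os.path.join(tempfile.mkdtemp(), 'insects.txt')
f = open(path, 'w')
f.write('Hercules beetle\nSwallowtail\nOrnate mantis\nWeevil\nPine chaffer\n')
f.close()

f = open(path, 'r')
first_pass = [line.strip('\n') for line in f]

# The file position is now at the end, so this yields no lines
second_pass = [line.strip('\n') for line in f]

# seek(0) rewinds to the start of the file, so we can loop again
f.seek(0)
third_pass = [line.strip('\n') for line in f]
f.close()

print(first_pass)
print(second_pass)
print(third_pass)
```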

3.13.1.2 2. Parsing with readlines()

readlines() is a method that reads the entire file at once and returns its contents as a list, with one string per line:

f = open('insects.txt', 'r')
print(f.readlines())
## ['Hercules beetle\n', 'Swallowtail\n', 'Ornate mantis\n', 'Weevil\n', 'Pine chaffer\n']

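This is more concise than a for loop, but every line is read in as-is, with its trailing \n. Any cleanup we would have done line by line inside the loop has to be applied to the list afterwards. Because readlines() loads the whole file into memory at once, looping line by line is usually the better choice for very large files.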
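One way to do that cleanup afterwards is a list comprehension over the result of readlines(). The sketch below also shows a common alternative, f.read().splitlines(), which reads the file as a single string and splits it into lines without the newline characters. As before, it assumes a scratch copy of insects.txt created in a temporary directory.

```python
import os
import tempfile

# Recreate insects.txt in a scratch directory (assumption for a self-contained example)
path = os.path.join(tempfile.mkdtemp(), 'insects.txt')
f = open(path, 'w')
f.write('Hercules beetle\nSwallowtail\nOrnate mantis\nWeevil\nPine chaffer\n')
f.close()

f = open(path, 'r')
raw = f.readlines()  # every element still ends in '\n'
f.close()

# Strip the newline from each element after the fact
insects = [line.strip('\n') for line in raw]

# Alternative: read() returns one string; splitlines() drops the line endings
f = open(path, 'r')
insects_alt = f.read().splitlines()
f.close()

print(raw)
print(insects)
print(insects_alt)
```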

3.13.1.3 Data Types

When we read in data, each line is stored as a string. If we want the interpreter to know that our data is numeric, we need to convert it manually. For example, consider a file, numbers.txt, with the following contents:

4
12.2
-9.854

Let’s read in this data and examine its type:

f = open('numbers.txt')
fileContents = []

for l in f:
  fileContents.append(l.strip())
  
print(fileContents)
## ['4', '12.2', '-9.854']
print(type(fileContents[1]))
## <class 'str'>

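If we try to do mathematics with these values while they are still strings, we will get either an error (subtracting one string from another raises a TypeError) or a surprising result (adding two strings concatenates them). We need to convert each value to an integer or a float before operating on it mathematically. Here we use float(), since int() cannot parse a string such as '12.2':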

f = open('numbers.txt')
fileContents = []

for l in f:
  fileContents.append(float(l.strip()))
  
print(fileContents)
## [4.0, 12.2, -9.854]
print(type(fileContents[1]))
## <class 'float'>
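For contrast, the sketch below shows what happens with two of these values if we skip the conversion, and why float() rather than int() was used above. It then adds a small helper, to_number, which is hypothetical (not part of Python or the original example): it keeps whole numbers as int and falls back to float otherwise. numbers.txt is recreated in a temporary directory so the example is self-contained.

```python
import os
import tempfile

# Part 1: arithmetic on the raw strings, before any conversion
a, b = '4', '12.2'
joined = a + b  # + concatenates strings
print(joined)   # 412.2

subtraction_failed = False
try:
    a - b  # - is not defined for strings
except TypeError as err:
    subtraction_failed = True
    print('TypeError:', err)

int_failed = False
try:
    int(b)  # int() rejects text containing a decimal point
except ValueError as err:
    int_failed = True
    print('ValueError:', err)

total = float(a) + float(b)  # converted first, so this is numeric addition
print(total)

# Part 2: hypothetical helper that keeps whole numbers as ints
def to_number(text):
    """Return an int if text parses as one, otherwise a float.

    Still raises ValueError for non-numeric text, such as a blank line.
    """
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)

# Recreate numbers.txt in a scratch directory (assumption for a self-contained example)
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')
f = open(path, 'w')
f.write('4\n12.2\n-9.854\n')
f.close()

f = open(path)
fileContents = [to_number(l) for l in f]
f.close()

print(fileContents)                                # [4, 12.2, -9.854]
print([type(x).__name__ for x in fileContents])  # ['int', 'float', 'float']
```

If a real file might contain blank lines, they would need to be skipped (for example with `if l.strip()` in the comprehension), since neither int('') nor float('') succeeds.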

3.13.1.4 Closing files

We can close a file with the .close() method. To close insects.txt (previously saved as the variable f), we would run:

f.close()

Once the file is closed, we cannot read it or write to it without opening it again.
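The sketch below checks this directly: f.closed reports whether a file has been closed, and trying to read from a closed file raises a ValueError. It also shows Python's with statement, a common alternative to calling .close() by hand, which closes the file automatically when the indented block ends, even if an error occurs inside it. A scratch insects.txt in a temporary directory is assumed.

```python
import os
import tempfile

# Recreate a short insects.txt in a scratch directory (assumption for a self-contained example)
path = os.path.join(tempfile.mkdtemp(), 'insects.txt')
f = open(path, 'w')
f.write('Hercules beetle\nSwallowtail\n')
f.close()

f = open(path, 'r')
f.close()
print(f.closed)  # True

# Reading from a closed file raises a ValueError
read_failed = False
try:
    f.read()
except ValueError as err:
    read_failed = True
    print('ValueError:', err)

# The with block closes the file for us when it ends
with open(path, 'r') as f:
    lines = [line.strip('\n') for line in f]

print(f.closed)  # True
print(lines)
```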