3.13 Reading in Data
When working in Python, we often want to import data from an external file or, conversely, write our data to a file. To interact with external data objects, we use the open()
function. This function takes two arguments: the name
of the file we want to work with, and the mode
that we want to interact with this file.
The common modes we use are:
r
- Read. This allows us to import data from an external file into python.w
- Write. This allows us to write our data to a file. If a file with this name already exists, operating inw
mode will overwrite existing file contents.a
- Append. This allows us to write our data to a file. If a file with this name already exists, operating ina
mode will add to existing file contents.
To open a file with the name insects.txt
, for reading in file contents, we would therefore use the syntax:
= open('insects.txt', 'r') f
3.13.1 Parsing a file
If we try to print()
f, we get the following output: <_io.TextIOWrapper name='insects.txt' mode='r' encoding='UTF-8'>
This is because we have opened the file and saved it as a variable, but we haven’t actually read through and manipulated the data which it contains. We have two main ways of doing this in base Python:
3.13.1.1 1. Parsing with a for
loop
One way to look through a file’s contents line by line is to use a for
loop. We can loop through a file with the syntax:
for line in f:
print(line)
## Hercules beetle
##
## Swallowtail
##
## Ornate mantis
##
## Weevil
##
## Pine chaffer
As it turns out, at the end of each line, there is a special end of line character, \n
. To just read in the data without the return character, we can use .strip()
:
for line in f:
print(line.strip('\n'))
3.13.1.2 2. Parsing with readlines()
readlines()
is a method that allows us to read through the entire file all at once, returning file contents as a list:
## ['Hercules beetle\n', 'Swallowtail\n', 'Ornate mantis\n', 'Weevil\n', 'Pine chaffer\n']
This is more concise than a for loop, but all lines are read in without the manipulations that we can perform line-by-line in our loop.
3.13.1.3 Data Types
When we read in data, each line is stored as a string. If we want the interpreter to known that our data is numeric, we need to convert in manually. For example, consider a file with the following contents:
4
12.2
-9.854
Let’s read in this data and examine its type:
= open('numbers.txt')
f = []
fileContents
for l in f:
fileContents.append(l.strip())
print(fileContents)
## ['4', '12.2', '-9.854']
print(type(fileContents[1]))
## <class 'str'>
If we want, to do mathematics using these values, we will get an error. We need to convert to an integer or float to operate on them mathematically:
= open('numbers.txt')
f = []
fileContents
for l in f:
float(l.strip()))
fileContents.append(
print(fileContents)
## [4.0, 12.2, -9.854]
print(type(fileContents[1]))
## <class 'float'>
3.13.1.4 Closing files
We can close a file with the .close()
method. To close insects.txt
(previously saved as the variable f
), we would run:
Once the file is closed, we cannot read it or write to it without opening it again.