File Read and Write in Python

What Is a File?

At its core, a file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything from something as simple as a text file to something as complicated as a program executable. In the end, every file is just a sequence of bytes, and each byte is stored as binary 1s and 0s that the computer processes.

How are files created?

A file is created by a software program. For example, to create a text file you would use a text editor, to create an image file you would use an image editor, and to create a document you would use a word processor.

File extensions and file types

Computers today use a huge number of file types, and a file's type is identified either by its file extension or by the file's data. For example, common extensions for image files in Windows are '.jpg', '.png', and '.gif'. In recent versions of Windows, file extensions are hidden by default. If they are hidden, you can still get a general idea of a file's type by looking at the Type column.
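Both ways of identifying a file can be sketched in Python. The example below reads the extension with pathlib and, separately, checks whether a file's first bytes match the 8-byte signature that every PNG file starts with. The helper names and the demo file are hypothetical, and checking a single signature is only an illustration, not a general file-type detector.

```python
import os
import tempfile
from pathlib import Path

# Every PNG file begins with these 8 bytes (the PNG signature).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extension_of(filename):
    """Identify the type by name: return the extension, lowercased."""
    return Path(filename).suffix.lower()


def starts_like_png(path):
    """Identify the type by data: compare the first bytes to the PNG signature."""
    with open(path, "rb") as f:  # 'rb' reads raw bytes
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


# Demo: a plain text file misleadingly named with a .png extension
with tempfile.TemporaryDirectory() as folder:
    fake_png = os.path.join(folder, "not-really.png")
    with open(fake_png, "w") as f:
        f.write("red\norange\n")
    print(extension_of(fake_png))     # .png  (the name says PNG)
    print(starts_like_png(fake_png))  # False (the data says otherwise)
```

The extension is only a naming convention, so the two checks can disagree, as they do for the demo file.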

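Illegal characters for filenames

On Windows, the characters below are reserved and cannot be used in file names. Unix-like systems are less strict (typically only / and the null character are forbidden), but avoiding all of these characters keeps file names portable. Trying to create a file with these characters in its name will either generate an error or make the file hard to access.

\ / : * ? " < > |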
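A minimal sketch of checking a proposed file name against the characters listed above before creating the file. The function names are hypothetical, and only these nine characters are checked; Windows also reserves some whole names (such as CON), which this sketch ignores.

```python
# The nine characters listed above (reserved on Windows)
ILLEGAL_CHARS = set('\\/:*?"<>|')


def find_illegal_chars(filename):
    """Return the illegal characters in filename, in order of appearance."""
    return [ch for ch in filename if ch in ILLEGAL_CHARS]


def is_portable_filename(filename):
    """True if filename is non-empty and contains none of the illegal characters."""
    return bool(filename) and not find_illegal_chars(filename)


print(find_illegal_chars("hsl_color.txt"))          # []
print(find_illegal_chars("what?is<this>.txt"))      # ['?', '<', '>']
print(is_portable_filename("tangent-circles.png"))  # True
```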

Let’s say I want to access the tangent-circles file, and my current location is the top-level folder that contains Python-Turtle-EN. To reach the file, I need to go through the Python-Turtle-EN folder and then the Session-9_File-Read-and-Write folder, finally arriving at the tangent-circles file. The Folder Path is Python-Turtle-EN/Session-9_File-Read-and-Write/, the File Name is tangent-circles, and the File Extension is .png. So the full path is Python-Turtle-EN/Session-9_File-Read-and-Write/tangent-circles.png.

Now suppose my current location, or current working directory (cwd), is the Session-9_File-Read-and-Write folder of our example folder structure. Instead of referring to tangent-circles.png by the full path Python-Turtle-EN/Session-9_File-Read-and-Write/tangent-circles.png, I can reference the file simply by its file name and extension, tangent-circles.png.

/
│
├── Python-Turtle-EN/
│   │
│   ├── Session-9_File-Read-and-Write/  ← Your current working directory (cwd) is here
│   │   └── tangent-circles.png  ← Accessing this file
│   │
│   └── hsl_color.txt
│
└── egg_rings.png

But what about hsl_color.txt? How would you access that without using the full path? You can use the special double-dot (..) path component to move one directory up. This means that, from the Session-9_File-Read-and-Write directory, ../hsl_color.txt references the hsl_color.txt file in Python-Turtle-EN:

/
│
├── Python-Turtle-EN/  ← Referencing this parent folder
│   │
│   ├── Session-9_File-Read-and-Write/  ← Current working directory (cwd)
│   │   └── tangent-circles.png
│   │
│   └── hsl_color.txt  ← Accessing this file
│
└── egg_rings.png

The double-dot (..) can be chained together to traverse multiple directories above the current directory. For example, to access egg_rings.png from the Session-9_File-Read-and-Write folder, you would use ../../egg_rings.png.

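from matplotlib import image
from matplotlib import pyplot as plt

# Jupyter-only magic that draws plots inside the notebook; remove it in a plain .py script
%matplotlib inline

# Each path is relative to the cwd, Session-9_File-Read-and-Write
im1 = image.imread('tangent-circles.png')  # same folder as the cwd
im2 = image.imread('../black-bear.png')    # one level up, in Python-Turtle-EN
im3 = image.imread('../../egg_rings.png')  # two levels up, in the top-level folder

# Show the three images side by side
fig, axs = plt.subplots(1, 3, figsize=(15, 10))
axs[0].imshow(im1)
axs[1].imshow(im2)
axs[2].imshow(im3)
plt.show()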
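Because .. is just a path component, you can check where a relative path points without touching the disk. This sketch uses posixpath.normpath (the forward-slash version of os.path.normpath, so it behaves the same on every OS) and writes the cwd relative to the tree's top-level / folder. The resolve helper is hypothetical; normpath works purely on the string and does not follow symbolic links.

```python
import posixpath

# The cwd from the diagrams, written relative to the top-level / folder
CWD = "Python-Turtle-EN/Session-9_File-Read-and-Write"


def resolve(cwd, relative_path):
    """Join relative_path onto cwd, then collapse every '..' (string only)."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))


for relative in ["tangent-circles.png", "../hsl_color.txt", "../../egg_rings.png"]:
    print(relative, "->", resolve(CWD, relative))
# tangent-circles.png -> Python-Turtle-EN/Session-9_File-Read-and-Write/tangent-circles.png
# ../hsl_color.txt -> Python-Turtle-EN/hsl_color.txt
# ../../egg_rings.png -> egg_rings.png
```

Each .. cancels the folder name just before it, which is why two of them lead from Session-9_File-Read-and-Write back to the top-level folder.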

File Paths

When you access a file on an operating system, a file path is required. The file path is a string that represents the location of a file. It’s broken up into three major parts:

  • Folder Path: the file folder location on the file system where subsequent folders are separated by a forward slash / (Unix) or backslash \ (Windows)
  • File Name: the actual name of the file
  • Extension: the end of the file path, starting with a period (.), used to indicate the file type
/
│
├── Python-Turtle-EN/
│   │
│   ├── Session-9_File-Read-and-Write/
│   │   └── tangent-circles.png
│   │
│   └── hsl_color.txt
│
└── egg_rings.png
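The pathlib module can split a path into these three parts without the file having to exist. This sketch uses the pure path classes so the result is the same on any operating system; PureWindowsPath shows the same path joined with Windows backslashes.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings; no file access happens here
parts = ("Python-Turtle-EN", "Session-9_File-Read-and-Write", "tangent-circles.png")
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path.parent)  # Folder Path: Python-Turtle-EN/Session-9_File-Read-and-Write
print(posix_path.stem)    # File Name:   tangent-circles
print(posix_path.suffix)  # Extension:   .png

print(posix_path)    # Python-Turtle-EN/Session-9_File-Read-and-Write/tangent-circles.png
print(windows_path)  # Python-Turtle-EN\Session-9_File-Read-and-Write\tangent-circles.png
```

On a real system you would normally use pathlib.Path, which picks the right flavor for the current OS automatically.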

Line Endings

One problem often encountered when working with file data is the representation of a new line, or line ending. The line ending has its roots in the Morse code era, when a specific prosign was used to communicate the end of a transmission or the end of a line.

Later, this was standardized for teleprinters by both the International Organization for Standardization (ISO) and the American Standards Association (ASA). The ASA standard states that line endings should use the sequence of the Carriage Return (CR or \r) and the Line Feed (LF or \n) characters (CR+LF or \r\n). The ISO standard, however, allowed for either the CR+LF characters or just the LF character.

Windows uses the CR+LF characters to indicate a new line, while Unix-like systems, including modern macOS, use just the LF character. This can cause complications when you process a file on an operating system different from the one it was created on. Here’s a quick example. Let’s say that we examine the file hsl_color.txt, which was created on a Windows system:

red\r\n
orange\r\n
yellow\r\n
green\r\n
blue\r\n
purple\r\n
pink
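Python hides this difference by default. The sketch below writes Windows-style bytes to a temporary file (a stand-in for hsl_color.txt) and reads them back three ways: in text mode, where "universal newlines" translate \r\n to \n; with newline='', which leaves line endings untranslated; and in binary mode, which returns the raw bytes.

```python
import os
import tempfile

windows_bytes = b"red\r\norange\r\nyellow"  # CR+LF line endings, as on Windows

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hsl_color.txt")
    with open(path, "wb") as f:  # binary mode writes the bytes untouched
        f.write(windows_bytes)

    # Text mode (default newline=None): '\r\n' is translated to '\n'
    with open(path, "r") as f:
        text_mode = f.read()

    # newline='' disables the translation, so '\r\n' is kept
    with open(path, "r", newline="") as f:
        untranslated = f.read()

    # Binary mode returns the raw bytes
    with open(path, "rb") as f:
        raw = f.read()

print(repr(text_mode))     # 'red\norange\nyellow'
print(repr(untranslated))  # 'red\r\norange\r\nyellow'
print(raw)                 # b'red\r\norange\r\nyellow'
```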

Character Encodings

Another common problem that you may face is the encoding of the byte data. An encoding is a translation from byte data to human-readable characters. This is typically done by assigning a numerical value to represent each character. Two of the most common standards are ASCII and Unicode (most often stored using the UTF-8 encoding). ASCII can only represent 128 characters, while Unicode can contain up to 1,114,112 characters.

ASCII is actually a subset of Unicode: the 128 ASCII characters have the same numerical values in Unicode, and UTF-8 stores them as the same single bytes. It’s important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of characters. For example, if a file was created using the UTF-8 encoding and you try to parse it using the ASCII encoding, an error will be raised as soon as a character outside those 128 values is encountered.
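A short sketch of both failure modes described above: decoding UTF-8 bytes as ASCII raises UnicodeDecodeError, while decoding them with a different single-byte encoding (Latin-1, chosen here only for illustration) silently produces the wrong characters. When reading files, passing an explicit encoding such as encoding='utf-8' to open() avoids depending on the platform default.

```python
text = "café"  # 'é' is outside the 128 ASCII characters

utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'

# ASCII-only text produces identical bytes in both encodings
print("red".encode("ascii") == "red".encode("utf-8"))  # True

# Failure: the ASCII codec rejects the first byte above 127
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("Error:", err.reason)  # Error: ordinal not in range(128)

# Misrepresentation: a permissive wrong encoding garbles instead of failing
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©
```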

Opening and Closing a File in Python

When you want to work with a file, the first thing to do is to open it. This is done by invoking the open() built-in function. open() has a single required argument, the path to the file, and it returns a file object. When you are done with the file, you need to close it. The first way to make sure a file is closed properly is to use a try-finally block:

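# Forward slashes also work on Windows and avoid backslash-escape problems
reader = open('../hsl_color.txt')
try:
    # Further file processing goes here
    pass
finally:
    reader.close()  # runs even if the processing raises an error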

The second way to close a file is to use the with statement. The with statement automatically takes care of closing the file once execution leaves the with block, even in cases of error. It’s highly recommended that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you:

with open('../hsl_color.txt', 'r') as reader:
    # Further file processing goes here
    pass
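One way to check that the with statement really closes the file is the file object's closed attribute. The sketch below uses a temporary file instead of hsl_color.txt so it can run anywhere, and it raises a deliberate ValueError inside a with block to show that the file is closed on errors too.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hsl_color.txt")
    with open(path, "w") as writer:
        writer.write("red\norange\n")

    with open(path, "r") as reader:
        closed_inside = reader.closed  # still open inside the block
    closed_after = reader.closed       # closed automatically on exit

    # Raise an error on purpose while the file is open
    try:
        with open(path, "r") as reader:
            raise ValueError("something went wrong during processing")
    except ValueError:
        pass
    closed_after_error = reader.closed  # closed even though an error escaped

print(closed_inside, closed_after, closed_after_error)  # False True True
```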

Most likely, you’ll also want to use the second positional argument, mode. This argument is a string containing one or more characters that describe how you want to open the file. The default and most common is 'r', which opens the file in read-only mode as a text file:

File open mode:

  • ‘r’: Open for reading (default)
  • ‘w’: Open for writing, truncating (overwriting) the file first
  • ‘a’: Open for appending
  • ‘rb’ or ‘wb’: Open in binary mode (read/write using byte data)
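A quick sketch of how some of these modes behave, using a temporary file with a hypothetical name: opening with 'w' a second time discards the earlier contents, and the binary modes work with bytes objects rather than strings.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file
        f.write("red\norange\n")
    with open(path, "w") as f:   # 'w' again: the old contents are truncated first
        f.write("blue\n")
    with open(path, "r") as f:   # 'r' is the default mode
        after_w = f.read()

    with open(path, "wb") as f:  # 'wb' writes bytes, with no newline translation
        f.write(b"pink\n")
    with open(path, "rb") as f:  # 'rb' reads them back as bytes
        as_bytes = f.read()

print(repr(after_w))  # 'blue\n'
print(as_bytes)       # b'pink\n'
```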

There are two main categories of file objects: text files and binary files. Binary files can be further divided into buffered binary files and raw binary files.
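These categories correspond to classes in Python's io module, and the mode and buffering arguments of open() decide which one you get. As this small sketch shows (in CPython), text mode gives a TextIOWrapper, 'rb' gives a BufferedReader, and 'rb' with buffering=0 gives a raw FileIO.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    with open(path, "w") as f:                # text file
        f.write("red\n")
        text_type = type(f)

    with open(path, "rb") as f:               # buffered binary file
        buffered_type = type(f)

    with open(path, "rb", buffering=0) as f:  # raw binary file (no buffer)
        raw_type = type(f)

print(text_type.__name__)      # TextIOWrapper
print(buffered_type.__name__)  # BufferedReader
print(raw_type.__name__)       # FileIO

# Each class belongs to one of the io module's base categories
print(issubclass(text_type, io.TextIOBase),
      issubclass(buffered_type, io.BufferedIOBase),
      issubclass(raw_type, io.RawIOBase))  # True True True
```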

Reading and Writing Opened Files

Once you’ve opened a file, you’ll want to read from or write to it. First off, let’s cover reading a file. There are multiple methods that can be called on a file object to help you out:

  • .read(size=-1): This reads up to size characters (text mode) or bytes (binary mode) from the file. If no argument is passed, or None or -1 is passed, then the entire file is read.
  • .readline(size=-1): This reads at most size characters from the current line. The next call continues where the previous one stopped, and a single call never reads past the end of the line. If no argument is passed, or None or -1 is passed, then the entire line (or the rest of the line) is read.
  • .readlines(): This reads the remaining lines from the file object and returns them as a list.
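To make the size arguments concrete, this sketch writes a small stand-in for hsl_color.txt to a temporary folder and mixes the three methods. In text mode, size counts characters; note how the second .readline(4) call stops at the end of the line instead of continuing into the next one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hsl_color.txt")
    with open(path, "w") as f:
        f.write("red\norange\nyellow\ngreen\n")

    with open(path, "r") as reader:
        first = reader.read(3)          # 'red'  (3 characters)
        rest_of_line = reader.readline()  # '\n' (finishes the current line)
        part1 = reader.readline(4)      # 'oran' (stops after 4 characters)
        part2 = reader.readline(4)      # 'ge\n' (never reads past the line end)
        remaining = reader.readlines()  # ['yellow\n', 'green\n']

print(repr(first), repr(rest_of_line), repr(part1), repr(part2), remaining)
```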

Here’s an example of how to read one line at a time using the Python .readline() method:

with open('../hsl_color.txt', 'r') as reader:
    # Each call returns the next line, including its '\n' line ending,
    # so end='' stops print() from adding a second newline
    print(reader.readline(), end='')
    print(reader.readline(), end='')
    print(reader.readline(), end='')
    print(reader.readline(), end='')

red
orange
yellow
green

Here’s an example of how to read the entire file as a list using the Python .readlines() method:

with open('../hsl_color.txt') as f:
    lines = f.readlines()  # Returns a list object
lines  # In a notebook, the last expression is displayed

['red\n',
 'orange\n',
 'yellow\n',
 'green\n',
 'blue\n',
 'purple\n',
 'pink\n',
 'Black']

The above example can also be done by using list() to create a list out of the file object:

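with open('../hsl_color.txt') as f:
    lines = list(f)  # Iterating over the file object yields its lines
lines  # In a notebook, the last expression is displayed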

['red\n',
 'orange\n',
 'yellow\n',
 'green\n',
 'blue\n',
 'purple\n',
 'pink\n',
 'Black']

Iterating Over Each Line in the File

A common thing to do while reading a file is to iterate over each line. Here’s an example of how to use the Python .readline() method to perform that iteration:

with open('../hsl_color.txt', 'r') as reader:
    # Read and print the entire file line by line
    line = reader.readline()
    while line != '':  # At end of file, readline() returns an empty string
        print(line, end='')
        line = reader.readline()

red
orange
yellow
green
blue
purple
pink

Another way you could iterate over each line in the file is to use the Python .readlines() method of the file object. Remember, .readlines() returns a list where each element in the list represents a line in the file:

with open('../hsl_color.txt', 'r') as reader:
    for line in reader.readlines():
        print(line, end='')  # each line already ends with '\n'

red
orange
yellow
green
blue
purple
pink
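A file object can also be iterated over directly, which is what list(f) relies on. Unlike .readlines(), this reads one line per loop step instead of building the whole list first, which matters for large files. A minimal sketch using a temporary stand-in file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hsl_color.txt")
    with open(path, "w") as f:
        f.write("red\norange\nyellow\n")

    with open(path, "r") as reader:
        # The file object is its own iterator: each step yields one line
        colors = [line.rstrip("\n") for line in reader]

print(colors)  # ['red', 'orange', 'yellow']
```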

Write Files

As with reading files, file objects have multiple methods that are useful for writing to a file:

  • .write(string): This writes the string to the file.
  • .writelines(seq): This writes the sequence to the file. No line endings are appended to each sequence item. It’s up to you to add the appropriate line ending(s).

with open('../hsl_color.txt', 'r') as reader:
    # Note: readlines doesn't trim the line endings
    hsl_color = reader.readlines()

# The last line may lack a trailing '\n'; add one so lines don't run together
if hsl_color and not hsl_color[-1].endswith('\n'):
    hsl_color[-1] += '\n'

with open('../hsl_color_reversed.txt', 'w') as writer:
    # Alternatively you could use:
    # writer.writelines(reversed(hsl_color))

    # Write the hsl_color lines to the file in reversed order
    for color in reversed(hsl_color):
        writer.write(color)
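Because .writelines() adds no line endings, it is easy to write lines that run together. This sketch, with a hypothetical colors list and a temporary file, writes the same items with and without explicit '\n' endings.

```python
import os
import tempfile

colors = ["red", "orange", "yellow"]  # items without line endings

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "colors.txt")

    with open(path, "w") as writer:
        writer.writelines(colors)  # items are written back to back
    with open(path) as reader:
        run_together = reader.read()

    with open(path, "w") as writer:
        writer.writelines(color + "\n" for color in colors)  # add endings yourself
    with open(path) as reader:
        one_per_line = reader.read()

print(repr(run_together))  # 'redorangeyellow'
print(repr(one_per_line))  # 'red\norange\nyellow\n'
```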

Appending to a File

Sometimes, you may want to append to a file or start writing at the end of an already populated file. This is easily done by using the ‘a’ character for the mode argument:

with open('../hsl_color.txt', 'a') as a_writer:
    a_writer.write('\nWhite\n')
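The leading '\n' in '\nWhite\n' matters when the file does not already end with a newline, as in the hsl_color.txt example. This sketch reproduces that situation in a temporary file; 'Gray' is a hypothetical second color, added only to show that every write in 'a' mode lands at the end of the existing contents.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hsl_color.txt")
    with open(path, "w") as writer:
        writer.write("red\norange")  # note: no trailing newline

    with open(path, "a") as a_writer:  # 'a' keeps the existing contents
        a_writer.write("\nWhite\n")    # leading '\n' ends the 'orange' line

    with open(path, "a") as a_writer:  # each 'a' write goes to the end
        a_writer.write("Gray\n")

    with open(path) as reader:
        lines = reader.read().splitlines()

print(lines)  # ['red', 'orange', 'White', 'Gray']
```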