Reading from text files

Here's how to read data from a text file.

To begin, use IDLE to create a small test text file named text_file.txt containing these three lines:

    The first line.
    Line 2.
    The third and last line.

We can visualize this on disk as:

T h e   f i r s t   l i n e . \n L i n e   2 . \n T h e   t h i r d   a n d   l a s t   l i n e . \n EOF

The only new items here are the \n characters. These denote newline characters, which, like EOF markers, vary between operating systems. (Don't worry: the Python you installed knows what they are on your system, and when you read a file in text mode it presents each one to your program as \n.)
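
If you'd like to see the newline characters for yourself, here is a minimal sketch that writes the three test lines and then prints the raw string with repr(). The file name newline_demo.txt is an assumption made for this illustration, chosen so that it doesn't overwrite your test file.

```python
# newline_demo.py (hypothetical file name, used only for this illustration)
text = "The first line.\nLine 2.\nThe third and last line.\n"

f = open('newline_demo.txt', 'w')  # Create the demo file.
f.write(text)
f.close()

f = open('newline_demo.txt', 'r')  # Read it back in text mode.
s = f.read()
f.close()

print(repr(s))                     # repr() shows each newline as \n.
newline_count = s.count("\n")
print(newline_count, 'newline characters found.')
```

Whatever your operating system stores on disk, the string your program sees should contain three \n characters, one at the end of each line.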

There are three common ways of reading a text file like this in:

  1. One line at a time, processing lines as we go,
  2. The whole file into a string,
  3. The whole file into a list of strings, one entry per line.

When possible, option 1 is preferred because it is the most memory efficient: only a small part of the file (one line) is held in memory at any given time.
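
As a rough illustration of why option 1 scales well, the sketch below finds the longest line in a file without ever loading the whole file. The file name longest_demo.txt and the variable names are assumptions made for this example; the program creates its own small file so it can run on its own.

```python
# longest_line.py (hypothetical file name, used only for this illustration)
f = open('longest_demo.txt', 'w')    # Create a small demo file.
f.write("The first line.\nLine 2.\nThe third and last line.\n")
f.close()

longest = ""
line_count = 0
f = open('longest_demo.txt', 'r')
for line in f:                       # Holds the current line, plus the longest so far.
    line_count += 1
    if len(line) > len(longest):
        longest = line
f.close()

print(line_count, 'lines read.')
print('Longest line:', longest, end="")
```

The same loop would work on a file far too large to fit in memory, since no more than a line or two is kept at once.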

Here's the Python code for each of these options.

One line at a time:

    # file_read_1.py
    f = open('text_file.txt', 'r') # Open the file.
    for line in f:                 # Iterate through the file a line at a time.
        print(line, end="")        # Process the current line.
    f.close()                      # Close the file.

Or, with a while loop:

    # file_read_1.py
    f = open('text_file.txt', 'r') # Open the file.
    line = f.readline()            # Get the first line.
    while line != "":              # Iterate through the file a line at a time.
        print(line, end="")        # Process the current line.
        line = f.readline()        # Get the next line.
    f.close()                      # Close the file.

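Notes: each line read from the file still ends with its newline character, so end="" stops print from adding a second one. In the while loop version, readline() returns an empty string at the end of the file, which ends the loop. Both versions print: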

>>> 
The first line.
Line 2.
The third and last line.
>>>
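
A common variation, not used in the listings above, is to open the file in a with statement so that Python closes it for you, even if an error occurs part-way through. This is only a sketch: the file name with_demo.txt is an assumption, and the first two lines exist just to make the example run on its own.

```python
# file_read_with.py (hypothetical file name, used only for this illustration)
with open('with_demo.txt', 'w') as f:   # Create a demo file to read.
    f.write("The first line.\nLine 2.\nThe third and last line.\n")

with open('with_demo.txt', 'r') as f:   # Open the file.
    for line in f:                      # Iterate through the file a line at a time.
        print(line, end="")             # Process the current line.

print(f.closed)                         # True: the with block closed the file.
```

No explicit f.close() is needed here, because leaving the with block closes the file.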

The whole file into a string:

    # file_read_2.py
    f = open('text_file.txt', 'r')
    s = f.read()
    print('s is', len(s), 'characters long.')
    print(s)
    f.close()

Output:

>>> 
s is 49 characters long.
The first line.
Line 2.
The third and last line.
>>> 

The whole file into a list of strings:

    # file_read_3.py
    f = open('text_file.txt', 'r')
    lines = f.readlines()
    print(lines)
    f.close()

Output:

>>> 
['The first line.\n', 'Line 2.\n', 'The third and last line.\n']
>>>

Note that the newline character is included at the end of each string. If you don't want it, you need to strip it yourself, for example with line.rstrip('\n').
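
One way to strip the newlines is sketched below, using rstrip('\n') on each entry of the list. The file name strip_demo.txt and the variable name stripped are assumptions made for this example, and the program creates its own demo file so it can run on its own.

```python
# file_read_strip.py (hypothetical file name, used only for this illustration)
f = open('strip_demo.txt', 'w')      # Create a small demo file.
f.write("The first line.\nLine 2.\nThe third and last line.\n")
f.close()

f = open('strip_demo.txt', 'r')
lines = f.readlines()                # Each entry still ends with \n.
f.close()

stripped = [line.rstrip('\n') for line in lines]  # Remove the trailing newline.
print(stripped)
```

This should print ['The first line.', 'Line 2.', 'The third and last line.']. Using rstrip('\n') removes only trailing newlines, whereas strip() with no argument would also remove any leading or trailing spaces and tabs.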