RSS

Mocking two calls to open()

How to test a function that reads and writes data to the file system.

Some days ago, a coworker asked me how he could test a function which loaded a source text file, transformed the data and then wrote it back to target file.

The function looked something like this:

def convert_file(source_path, target_path):
    with open(source_path, "r") as source_file:
        data = source_file.read()
    
    [processing]

    with open(source_path, "w") as target_file:
        target_file.write(data)

He tried to write a test for it, but got stuck when he tried to mock the file system by mocking the open() function. The function called open() two times (one time to open the source file and one time to open the target file). And in both cases the mock needs to do something different. So the test function looked something like this:

call_counter = 0

@mock.patch("__builtin__.open")
def test_convert_file(mock_open):
    read_mock = MagicMock()
    read_mock.read.return_value = "test"
    write_mock = MagicMock()
    mock_open.side_effect = lambda x: read_mock if counter == 0 else write_mock
    
    convert_file("dummy_path", "other_dummy_path")
    
    write_mock.write.assert_called_with("expected_result")

Solutions

He asked me if I would knew a better way to do that.

There are two options:

  1. Use a temporary directory, placing the source file there and write the target file to this folder
  2. Split the function into IO operations and processing

The first option has several drawbacks. You need to first create the source file and read the target file in to assert correct writing. Also, you need to cleanup this temporary directory (that’s something your test framework could do for you, for pytest see here). But the function is still more coupled to the file system than it has to. So let’s take a look at the second option.

If we take a closer look at what the function does, we can see two things happening:

  • IO (Reading source file data as string, Writing data as string to target file).
  • Converting the data

We know that every function should only do one thing and should do this well. So why not split the function into two functions, one for handling IO and the other for converting the data?

The functions could look something like this:

def convert_file(source_path, target_path):
    with open(source_path, "r") as source_file:
        data = source_file.read()
    
    converted_data = convert(data)

    with open(source_path, "w") as target_file:
        target_file.write(converted_data)

def convert(data):
    
    [processing]

    return converted_data

convert_file is responsable for IO and reads the data as a string. This string is passed to the convert function, which processes the data and returns the converted data as a string. convert_file takes this string data and writes it to the target file. Testing the convert function is now really easy, because we can just pass a string into the function and assert that the correct string was returned:

def test_convert():
    result = convert("test_data")

    assert "expected_result" == result

Since convert_file is now pretty simple and contains no logic, we could also say “Too simple to fail” and don’t write a test for it. If we want to write a test to prove the IO is done correctly, we can split this function into another function:

def convert_file(source_path, target_path):
    with open(source_path, "r") as source_file:
        with open(source_path, "w") as target_file:
            return convert_file_like(source_file, target_file):

def convert_file_like(source_file, target_file):
    data = source_file.read()
    converted_data = convert(data)
    target_file.write(converted_data)

def convert(data):
    
    [processing]

    return converted_data

convert_file is now too simple to test. To test convert_file_like, we can use StringIO. StringIO enables us to have an object behave like a text file, but it’s all in memory. We can use strings as input data and get the data written to the StringIO instance as string. A test for convert_file_like would look like this:

import io

def test_convert_file_like():
    source_file_like = io.StringIO("test_data")
    target_file_like = io.StringIO()
    
    convert_file_like(source_file_like, target_file_like)
    
    assert "expected_data" == target_file_like.read()

Conclusion

To answer the question “How to mock two calls to open()?": You don’t have to mock it!

This approach is more simple, because it does not rely on a file system. It looks like we need way more code. It’s true, but the code is cleaner, because it decouples the conversion code from the IO code. So it can be resued much more easily. Imagine you had to build an Http Rest API for the conversion. With this code, you can just take the data as a string from the Http request, pass it to the convert function and write the string result to the Http response.