The File System

We use the file system to create and read files, create folders, check folders for content and move files and folders.

Depending on the file system can lead to issues in the code. For example, code that reads and processes file content gets tightly coupled to the file system, but in some cases, it doesn’t have to. What happens if the data comes from a text field in a GUI? If the code depends on files to process, it has to be adjusted. It makes testing more challenging, because you have to check in test data files into your VCS or write the test data to the file system during the test. As an alternative, the code should get the content to process as a string or as a file-like object. File-like objects provide the same methods as files but are not necessarily stored on disk. See for example StringIO.

Of course, code that deals with the creation of folders can not be decoupled from the file system. This code also needs real files and folders to work.

For automated tests, we got two cases: code that has to rely on the file system and code that can be structured differently to avoid the dependency on the file system.

If we need to rely on the file system, the best way is usually a temporary directory cleaned up after test execution. Using a temporary directory during a test has several benefits:

  1. A temporary directory is always empty; a regular folder can’t guarantee this.
  2. A temporary directory separates the tests from other files and can’t accidentally destroy them.

The test execution does not depend on a server mount where a folder might be located. Test execution does not depend on server mount, might be an interesting consideration if tests should be executed on Build servers, and they should not get server mounts. Pytest has a tmpdir fixture to support this. A fixture is a function used to setup test functions.

See this test to check if a folder was created:

def test_should_create_folder(tmpdir):
    tmp_dir_path = str(tmpdir)
    target_path = os.path.join(tmp_dir_path, 'my-folder')

    os.makedirs(target_path)

    assert os.path.exists(tmp_dir_path)

To get the path to the temporary directory, we convert the tmpdir fixture to a string. After that, we can use all the path methods we are used to. Pytest will clean up the temporary directory when the test executed.

For code that can be structured differently, we can use strings or file-like objects like StringIO. If the code depends on the way the data is read (for example, skipping empty lines a file) or the data is huge, StringIO should be used.

Here is an example with StringIO:

def test_string_io():
    my_json = '{"test": 1}'
    string_io = StringIO(my_json)
    
    data = complicated_processing(string_io)

    assert data['test'] == 1


def complicated_processing(string_io):
    return json.load(string_io)

Here is an example that uses string.

def test_string():
    my_json = '{"test": 1}'

    data = complicated_processing(my_json)

    assert data['test'] == 1


def complicated_processing(string_io):
    return json.loads(string_io)

You should make sure that classes interacting with the file system can be mocked in other classes during tests. Otherwise, these classes also depend on the file system and that can make writing tests very hard, because you always have to deal with the file system. There might be cases for integration tests (LINK), where it’s ok to rely on the file system. But unit tests should try to rely as little on the file system as possible.

Also, it decouples your logic and makes it more reusable. Imagine, data you processed and stored on the file system should now be stored in a database. If your logic is separated, this is easy to do. Otherwise, you might have to refactor a lot first.

Last modified September 20, 2020