Databases

Note: This section requires general knowledge of databases!

Databases are necessary to store and query all data related to visual effects pipelines. Examples of databases are MySQL or PostgreSQL. Shotgun and ftrack are very common in visual effects. They are also considered databases in this section.

Databases can lead to various issues in the code and during tests. Code interacting with databases should not be spread everywhere; it has to be encapsulated. Otherwise, it’s hard to change the database later, because everywhere in the code adjustments have to be done. If the database code is concentrated in some classes, exchanging it is straigtforward. If the database code is spread everywhere, it’s also hard to test it because always a database is needed. But if the database is encapsulated, only tests for database classes need a connection to a database. Testing with a database connection also provides challenges:

Database schema: The database needs to have the correct schema. If you run your tests against a shared database, the schema might be incorrect. This might often be the case when using a shared database for tests.
Cleaning up the database: If you insert into the database during tests, you should remove the data afterward.
It’s unknown what is already stored in the database. Data already stored in the database might impact the tests, for example, if you try to insert a project with the same name again, but project names have to be unique.
If the code relies on query results, this data has to be inserted upfront.
Running tests against a database might be slow.

For this reason, we want to encapsulate the database dependency to a few classes. We will use the DAO design pattern for that.

DAO Design Pattern.

The DAO design pattern is a common solution to encapsulate the database dependency. DAO is an acronym for Data Access Object. It consists of three classes:

TransferObject
DataAccessObject
DAOFactory (optional)

The TransferObject represents the data to be stored in the database. It contains no logic, just the plain data. It will be used everywhere in the code to represent a the data. Here is an example of a TransferObject:

class Project(object):
    
    def __init__(self, project_id, name):
        self.project_id = project_id
        self.name = name

The DataAccessObject implements the actual database access. The DataAccessObject uses the TransferObject for parameters and results. A DataAccessObject usually implements an abstract interface. The implementation of a DataAccessObject is specific to a database. So there might be one DataAccessObject for MySQL and one for PostgreSQL.

The DAOFactory is not strictly necessary. It’s based around the factory design pattern and is responsible for instantiating the correct DataAccessObject implementation. It encapsulates the instantiating in one method and is the only location in the code where DataAccessObjects are instantiated. All code uses the DAOFactory to get new DataAccessObjects instances. Since DataAccessObjects are only created in one place, it’s easy to switch the DataAccessObject implementation to a different database. You could even decide at runtime which implementation should be used.

Example: DAO Design Pattern with SQLite

Let’s take a look at an example implementation for SQLite. To avoid boilerplate code, we will use the attrs framework for the DataTransferObject. The attrs framework generates for you a constructor, a nice human-readable __repr__ and comparison methods (equality and ordering).

from collections import Iterable

import attr
from typing import Optional

# TransferObject
class Project(object):
    
    def __init__(self, project_id, name):
        self.project_id = project_id
        self.name = name

# abstract DataAccessObject
class ProjectDao(object):

    def save(self, project):
        # type: (Project) -> Project
        raise NotImplementedError()

    def find_by_id(self, id_):
        # type: (int) -> Project
        raise NotImplementedError()

    def find_all(self):
        # type: () -> Iterable[Project]
        raise NotImplementedError()

    def exists_by_name(self, name):
        # type: (str) -> bool
        raise NotImplementedError()

# DataAccessObject implementation specifically to SQLite
class ProjectDaoSqliteImplementation(ProjectDao):

    def __init__(self, connection):
        self._connection = connection

    def save(self, project):
        cursor = self._connection.cursor()
                
        cursor.execute("INSERT INTO projects VALUES (NULL, ?)", (project.name,))
                
        self._connection.commit()
        
        return Project(id=cursor.lastrowid, name=project.name)

    def find_by_id(self, id_):
        # type: (int) -> Optional[Project]
        cursor = self._connection.cursor()

        cursor.execute('SELECT id, name FROM projects WHERE id=?', (id,))
        row = cursor.fetchone()
        if row:
            return self._map_row(row)

    def _map_row(self, row):
        project_id, project_name = row
        return Project(id=project_id, name=project_name)

    def find_all(self):
        # type: () -> Iterable[Project]
        cursor = self._connection.cursor()

        cursor.execute('SELECT id, name FROM projects')
        rows = cursor.fetchall()
        return [self._map_row(row) for row in rows]

    def exists_by_name(self, name):
        # type: (str) -> bool
        cursor = self._connection.cursor()

        cursor.execute('SELECT id, name FROM projects WHERE name=?', (name,))
        row = cursor.fetchone()
        if row:
            return True
        return False

# DAOFactory
class ProjectDaoFactory(object):

    def create():
        # type: () -> ProjectDao
        connection = sqlite3.connect(':memory:')
        return ProjectDaoSqliteImplementation(connection)

This example shows how to store and query projects to the database using the DAO pattern. Our TransferObject is the Project class, ProjectDao defines the abstract interface which is implemented specifically for SQLite by ProjectDaoSqliteImplementation. Instances of a ProjectDao are created by the ProjectDaoFactory, the DAOFactory in this example. It’s also possible to use a configuration class to get the SQLite connection string.

Tests using the DAO pattern:

Tests for DAO classes

Tests for DAO classes have to interact with the database. In this example, we will use SQLite as an in-memory database:

import sqlite3

import mock
import pytest

from dao_pattern_simple.dao_pattern_simple import (
    ProjectDaoSqliteImplementation,
    Project,
    ProjectCreator,
)


@pytest.fixture
def connection():
    conn = sqlite3.connect(':memory:')
    c = conn.cursor()

    c.execute("CREATE TABLE projects(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

    conn.commit()
    return conn


def test_should_save_and_populate_id(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)
    project = Project(id=0, name="test")

    project = project_dao.save(project)

    assert project.id


def test_should_save_and_find_saved_project_by_id(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)
    project = Project(id=0, name="test")

    project = project_dao.save(project)

    assert project_dao.find_by_id(project.id).name == "test"


def test_find_should_return_none(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)

    project = project_dao.find_by_id(1)

    assert not project


def test_should_find_all(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)
    project_dao.save(Project(id=0, name="test1"))
    project_dao.save(Project(id=0, name="test2"))
    project_dao.save(Project(id=0, name="test3"))

    projects = project_dao.find_all()

    assert len(list(projects)) == 3


def test_exists_by_name_should_return_true(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)
    project_dao.save(Project(id=0, name="test1"))

    assert project_dao.exists_by_name("test1")


def test_exists_by_name_should_return_false(connection):
    project_dao = ProjectDaoSqliteImplementation(connection)
    project_dao.save(Project(id=0, name="test1"))

    assert not project_dao.exists_by_name("test2")


def test_project_creator_should_save():
    project_dao = mock.MagicMock()
    project_dao.exists_by_name.return_value = False
    project_creator = ProjectCreator(project_dao)

    project_creator.create_project("test")

    project_dao.save.assert_called_with(Project(id=0, name="test"))


def test_project_creator_should_not_save_and_raise_value_error():
    project_dao = mock.MagicMock()
    project_dao.exists_by_name.return_value = True
    project_creator = ProjectCreator(project_dao)

    with pytest.raises(ValueError):
        project_creator.create_project("test")

    project_dao.save.assert_not_called()

We use a fixture to create a new database instance for each test, so we ensure the database is always clean.

In-memory databases have the advantage that they run in-memory on the local machine, so they are fast. After each test, they can be thrown away, so the database is always in a known state. They are really nice for tests. But in-memory databases also got disadvantages. Most databases can’t create a database in memory, for example, MySQL and PostgreSQL can not. We can still use an in-memory database in our tests and run a different database in production. But this can lead to issues since not all databases behave the same and have small differences in their SQL implementations. Also, each database has some proprietary SQL extensions which don’t work in other databases. So if you use a MySQL specific feature in your code, running tests against a SQLite in-memory database won’t work.

There are some alternatives to using in-memory databases during tests: Use an embedded version of the database as an alternative. For some databases exist embedded versions that can start an instance of the database completly isolated, which can be used during tests. This has the same benefits as using an in-memory database. An alternative is TestContainers. This project allows us to use any Docker container during tests in a reproducable way. If you use Shotgun, take a look at mockgun.

Tests for classes using DAOs

In tests for classes using DAOs we can simply mock DAOs instances. We know that the DAOs work correctly, because they have their own tests running against the database. If we mock the DAOs, we can write tests that don’t depend at all on the database. Let’s take a look at an example. The class we want to test is a simple class creating projects making sure their names are unique:

class ProjectCreator(object):
    _project_dao = None  # type: ProjectDao

    def __init__(self, project_dao):
        self._project_dao = project_dao

    def create_project(self, name):
        if self._project_dao.exists_by_name(name):
            raise ValueError("Project with name {} already exists!".format(name))

        project = Project(id=0, name=name)
        project = self._project_dao.save(project)
        print("Saved project {}".format(project))

Since the ProjectDao is passed in using Dependency Injection, we can exchange it during the tests with a mock:

def test_project_creator_should_save():
    project_dao = mock.MagicMock()
    project_dao.exists_by_name.return_value = False
    project_creator = ProjectCreator(project_dao)

    project_creator.create_project('test')

    project_dao.save.assert_called_with(Project(id=0, name='test'))


def test_project_creator_should_not_save_and_raise_value_error():
    project_dao = mock.MagicMock()
    project_dao.exists_by_name.return_value = True
    project_creator = ProjectCreator(project_dao)

    with pytest.raises(ValueError):
        project_creator.create_project('test')

    project_dao.save.assert_not_called()

Last modified September 20, 2020