Functools: Caching

The functools module is an interesting assortment of utilities that alter functions or methods in some way.

functools.cache

This decorator memoizes a function: it stores the results of previous calls so that repeated calls with the same arguments become quick dictionary lookups instead of recomputations.


from functools import cache


@cache
def factorial(n):
    return n * factorial(n - 1) if n else 1


print(factorial(10))
# 3628800
# first call: runs the recursive calls and caches each intermediate result

print(factorial(5))
# 120, served straight from the cache, no recomputation

print(factorial(12))
# 479001600, makes new recursive calls only for 11 and 12, then caches those results too
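Since cache is equivalent to lru_cache(maxsize=None), the decorated function also exposes cache_info() and cache_clear(). A quick sketch with a Fibonacci function:

```python
from functools import cache


@cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(10))           # 55
# 11 misses (fib(0) through fib(10) each computed once),
# 8 hits (the second recursive call for n=3..10 finds a cached value)
print(fib.cache_info())  # CacheInfo(hits=8, misses=11, maxsize=None, currsize=11)
```

Without the cache, fib(10) would make 177 calls; with it, each value is computed exactly once.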

functools.cached_property

This decorator computes a property's value once on first access, stashes it on the instance, and reuses it on every later access. For expensive operations, this simple change can dramatically improve performance.


from functools import cached_property


class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def sum(self):
        return sum(self._data)


ds = DataSet([1, 2, 3, 4, 5])
print(ds.sum)   # 15, computes sum(self._data) once
print(ds.sum)   # 15 again, instant lookup from the cache, not reevaluated


# The cached value must be deleted manually; otherwise sum would keep
# returning the stale result after the underlying data changes.
del ds.sum

ds._data.append(7)
print(ds.sum)   # 22, recomputed because the cache was cleared
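Under the hood, cached_property stashes the computed value in the instance's __dict__, which is why deleting the attribute resets the cache. A small sketch (the Square class is just an illustration):

```python
from functools import cached_property


class Square:
    def __init__(self, n):
        self.n = n

    @cached_property
    def area(self):
        return self.n * self.n


s = Square(4)
print("area" in s.__dict__)  # False, nothing computed yet
print(s.area)                # 16, computed and stored in s.__dict__
print("area" in s.__dict__)  # True, later accesses read straight from __dict__
```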

Example projects where you could use this: database connections and file processing.

Database Connection


import sqlite3
from functools import cached_property

class DB:
    def __init__(self, db_path):
        self._path = db_path

    @cached_property
    def conn(self):
        return sqlite3.connect(self._path)

    def get_conn(self):
        """Return a working connection, refresh if dead."""
        try:
            # Quick sanity check: lightweight query
            self.conn.execute("SELECT 1;")
            return self.conn
        except sqlite3.Error:
            # Connection is stale -> reset cache
            if "conn" in self.__dict__:
                del self.__dict__["conn"]
            return self.conn  # triggers re-connect
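Reusing the DB class above, a usage sketch with an in-memory database (the ":memory:" path is just for the demonstration):

```python
import sqlite3
from functools import cached_property


class DB:
    def __init__(self, db_path):
        self._path = db_path

    @cached_property
    def conn(self):
        return sqlite3.connect(self._path)

    def get_conn(self):
        """Return a working connection, refresh if dead."""
        try:
            self.conn.execute("SELECT 1;")
            return self.conn
        except sqlite3.Error:
            if "conn" in self.__dict__:
                del self.__dict__["conn"]
            return self.conn  # triggers re-connect


db = DB(":memory:")
conn1 = db.get_conn()
conn2 = db.get_conn()
print(conn1 is conn2)  # True, the cached connection is reused

conn1.close()          # simulate a dead connection
conn3 = db.get_conn()  # the sanity check fails, cache is reset
print(conn3 is conn1)  # False, a fresh connection was created and cached
```

Closing the handle makes the "SELECT 1;" probe raise sqlite3.ProgrammingError (a subclass of sqlite3.Error), so get_conn transparently reconnects.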

JSON file processing


import json
from functools import cached_property

class Config:
    def __init__(self, path):
        self._path = path

    @cached_property
    def data(self):
        with open(self._path, "r") as f:
            return json.load(f)
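Reusing the Config class above, a short sketch that writes a throwaway JSON file and shows the file is read only once (the file contents are made up for the demonstration):

```python
import json
import os
import tempfile
from functools import cached_property


class Config:
    def __init__(self, path):
        self._path = path

    @cached_property
    def data(self):
        with open(self._path, "r") as f:
            return json.load(f)


# Write a throwaway config file for the demonstration
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump({"debug": True, "retries": 3}, f)

cfg = Config(path)
print(cfg.data["retries"])  # 3, the file is opened and parsed here

os.remove(path)             # even with the file gone...
print(cfg.data["debug"])    # True, the parsed dict is served from the cache
```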

functools.lru_cache

Yet another function decorator, this one more flexible than the plain cache because you can set a maximum size or pass typed=True (so it caches f(3) and f(3.0) separately, for example).
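A minimal sketch of what typed=True changes (double is a made-up example function):

```python
from functools import lru_cache


@lru_cache(maxsize=None, typed=True)
def double(x):
    return x * 2


double(3)    # miss, cached under the int 3
double(3.0)  # miss, cached separately under the float 3.0
print(double.cache_info().misses)  # 2
```

With the default typed=False, 3 and 3.0 compare and hash equal, so the second call would be a cache hit instead.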

LRU stands for Least Recently Used: once maxsize is reached, adding a new result evicts the least recently used one.

Don't use it on functions with side effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random(). Use it only when you want to reuse previously computed values.
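A sketch of why caching an impure function backfires, using random() as the example:

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def roll():
    return random.randint(1, 6)


# The first call computes a roll; every later call is a cache hit
# that returns the same number, which defeats the point of a die.
print(roll() == roll())  # True, always
```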

To help you tune the maxsize of the cache, the decorated function comes with a method called cache_info(). It's a good idea to tune it so that your cache does not grow without bound in long-running processes.

CacheInfo(hits=1, misses=3, maxsize=2, currsize=2)

What the terms in cache_info() mean:

  • hits: Number of times Python found the result already in the cache and just handed it back.
  • misses: Number of times your function had to run because it wasn't cached yet.
  • maxsize: How many results you're allowed to keep (You could set it to None if you want infinite caching, but it's only recommended for small input domains.)
  • currsize: How many results are actually in there right now.

Fetching articles from an API


from functools import lru_cache

@lru_cache(maxsize=2)
def get_article(article_id):
    print(f"Fetching article {article_id} from API...")
    return {"id": article_id, "content": f"Article {article_id}"}

# First two fetches -> real calls
get_article(1)
get_article(2)

# Cache hit for 1
get_article(1)

# New fetch -> kicks out least recently used (2)
get_article(3)

print(get_article.cache_info())

Output:

Fetching article 1 from API...
Fetching article 2 from API...
Fetching article 3 from API...
CacheInfo(hits=1, misses=3, maxsize=2, currsize=2)

In this example the cache had room for only two articles at once, so it evicted article 2, the least recently used entry.