Zict: Composable Mutable Mappings

The dictionary / mutable mapping interface is powerful and multi-faceted.

  • We store data in different locations such as in memory, on disk, in archive files, etc.

  • We manage old data with different policies like LRU, random eviction, etc.

  • We might encode or transform data as it arrives in or departs from the dictionary through compression, encoding, etc.

To this end we build abstract MutableMapping classes that consume and build on other MutableMappings. We can compose several of these with each other to form intuitive interfaces over complex storage systems and policies.

Example

In the following example we create an LRU dictionary backed by a pickle-encoded, zlib-compressed directory of files.

import pickle
import zlib

from zict import File, Func, LRU

a = File('mydir/')
b = Func(zlib.compress, zlib.decompress, a)
c = Func(pickle.dumps, pickle.loads, b)
d = LRU(100, c)

>>> d['x'] = [1, 2, 3]
>>> d['x']
[1, 2, 3]
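
As a rough illustration of what this stack does to a value on its way to disk, the two Func layers amount to the following standard-library round trip. This is a sketch of the data flow only, not of zict's internals; the variable names are illustrative.

```python
import pickle
import zlib

value = [1, 2, 3]

# Writing: pickle first (layer c), then compress (layer b). The resulting
# bytes are what the File layer (a) would be asked to store.
stored = zlib.compress(pickle.dumps(value))

# Reading: the same steps in reverse order.
restored = pickle.loads(zlib.decompress(stored))
```

The order matters: each Func applies its dump function before handing the result to the mapping it wraps, so the outermost layer transforms the value first.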

Thread-safety

Most classes in this library are thread-safe. Refer to the documentation of the individual mappings for exceptions.

API

zict defines the following MutableMappings:

class zict.Buffer(fast: MutableMapping[KT, VT], slow: MutableMapping[KT, VT], n: float, weight: Callable[[KT, VT], float] = <function Buffer.<lambda>>, fast_to_slow_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, slow_to_fast_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None)

Buffer one dictionary on top of another

This creates a MutableMapping by combining two MutableMappings, one that feeds into the other when it overflows, based on an LRU mechanism. When the first evicts elements these get placed into the second. When an item is retrieved from the second it is placed back into the first.
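
The spill-and-promote behaviour described above can be sketched with plain dictionaries. The class below is a toy written for this page, not zict's Buffer: the name TwoTierBuffer is hypothetical, it counts items instead of using a weight function, and it has no callbacks or locking.

```python
from collections import OrderedDict


class TwoTierBuffer:
    """Toy fast/slow buffer; illustrative only, not zict's Buffer."""

    def __init__(self, fast, slow, n):
        self.fast = fast              # small, quick mapping
        self.slow = slow              # large, slower mapping
        self.n = n                    # max number of items kept in fast
        self._order = OrderedDict()   # keys in fast, least recently used first

    def __setitem__(self, key, value):
        self.slow.pop(key, None)      # a key lives in exactly one tier
        self.fast[key] = value
        self._order[key] = None
        self._order.move_to_end(key)
        self._spill()

    def __getitem__(self, key):
        if key in self.fast:
            self._order.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)    # KeyError if absent from both tiers
        self[key] = value             # promote back into fast
        return value

    def _spill(self):
        # Move least recently used keys to slow until fast holds at most n.
        while len(self._order) > self.n:
            old, _ = self._order.popitem(last=False)
            self.slow[old] = self.fast.pop(old)


fast, slow = {}, {}
buff = TwoTierBuffer(fast, slow, n=2)
buff["a"], buff["b"], buff["c"] = 1, 2, 3   # "a" overflows into slow
first = buff["a"]                            # "a" returns to fast, "b" spills
```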

Parameters:
fast: MutableMapping
slow: MutableMapping
n: float

Number of elements to keep, or total weight if weight is used.

weight: f(k, v) -> float, optional

Weight of each key/value pair (default: 1)

fast_to_slow_callbacks: list of callables

These functions run every time data moves from the fast to the slow mapping. They take two arguments, a key and a value. If an exception occurs during one of the fast_to_slow_callbacks (e.g. a callback tried storing to disk and raised a disk full error), the key will remain in the LRU.

slow_to_fast_callbacks: list of callables

These functions run every time data moves from the slow to the fast mapping.

See also

LRU

Notes

If you call methods of this class from multiple threads, access will be fast as long as all methods of fast, plus slow.__contains__ and slow.__delitem__, are fast. slow.__getitem__, slow.__setitem__ and callbacks are not protected by locks.

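Examples

>>> import sys
>>> from pickle import dumps, loads  # assumed serializer pair; any dump/load functions work
>>> from zict import Buffer, File, Func
>>> fast = {}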
>>> slow = Func(dumps, loads, File('storage/'))  
>>> def weight(k, v):
...     return sys.getsizeof(v)
>>> buff = Buffer(fast, slow, 1e8, weight=weight)  
close() -> None

Release any system resources held by this object

evict_until_below_target(n: float | None = None) -> None

Wrapper around zict.LRU.evict_until_below_target(). Presented here to allow easier overriding.

items() -> a set-like object providing a view on D's items

property n: float

Maximum weight in the fast mapping before eviction happens. Can be updated; this won’t trigger eviction by itself; you should call evict_until_below_target() afterwards.

property offset: float

Offset to add to the total weight in the fast buffer to determine when eviction happens. Note that increasing offset is not the same as decreasing n, as the latter also changes what keys qualify as “heavy” and should not be stored in fast.

Always starts at zero and can be updated; this won’t trigger eviction by itself; you should call evict_until_below_target() afterwards.

set_noevict(key: KT, value: VT) None[source]

Variant of __setitem__ that does not move keys from fast to slow if the total weight exceeds n

values() -> an object providing a view on D's values

class zict.AsyncBuffer(fast: MutableMapping[KT, VT], slow: MutableMapping[KT, VT], n: float, weight: Callable[[KT, VT], float] = <function Buffer.<lambda>>, fast_to_slow_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, slow_to_fast_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None)

Extension of Buffer that allows offloading all reads and writes from/to slow to a separate worker thread.

This requires fast to be fully thread-safe (e.g. a plain dict).

slow.__setitem__ and slow.__getitem__ will be called from the offloaded thread, while all of its other methods (including, notably for the purpose of thread-safety consideration, __contains__ and __delitem__) will be called from the main thread.

Parameters:
Same as in Buffer, plus:
executor: concurrent.futures.Executor, optional

An Executor instance to use for offloading. It must not pickle/unpickle. Defaults to an internal ThreadPoolExecutor.

nthreads: int, optional

Number of offloaded threads to run in parallel. Defaults to 1. Mutually exclusive with executor parameter.

See also

Buffer

async_evict_until_below_target(n: float | None = None) -> None

If the total weight exceeds n, asynchronously start moving keys from fast to slow in a worker thread.

async_get(keys: Collection[KT], missing: Literal['raise', 'omit'] = 'raise') -> Future[dict[KT, VT]]

Fetch one or more key/value pairs. If not all keys are available in fast, offload to a worker thread moving keys from slow to fast, as well as possibly moving older keys from fast to slow.

Parameters:
keys:

collection of zero or more keys to get

missing: raise or omit, optional
raise (default)

If any key is missing, raise KeyError.

omit

If a key is missing, return a dict with fewer keys than those requested.

Notes

All keys may be present when you call async_get, but __delitem__ may be called on one of them before the actual data is fetched. __setitem__ also internally calls __delitem__ in a non-atomic way, so you may get KeyError when updating a value too.
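
The general shape of such a call, a function that returns a Future and resolves keys in a worker thread, can be sketched with the standard library. The helper fetch_async below is hypothetical and is not zict's implementation: for brevity it always offloads to the worker, even when every key is already in fast, and it does no eviction or locking.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def fetch_async(executor, fast, slow, keys, missing="raise") -> Future:
    """Toy stand-in for an async_get-style call; not zict's implementation."""

    def work():
        out = {}
        for key in keys:
            if key in fast:
                out[key] = fast[key]
            elif key in slow:
                fast[key] = slow.pop(key)   # promote to the fast tier
                out[key] = fast[key]
            elif missing == "raise":
                raise KeyError(key)
            # missing == "omit": silently skip the key
        return out

    return executor.submit(work)


fast, slow = {"a": 1}, {"b": 2}
with ThreadPoolExecutor(max_workers=1) as executor:
    future = fetch_async(executor, fast, slow, ["a", "b", "c"], missing="omit")
    found = future.result()       # blocks until the worker is done
    failed = fetch_async(executor, fast, slow, ["c"])
    error = failed.exception()    # waits for the worker, returns the KeyError
```

With missing="omit" the caller has to compare the returned dict against the requested keys; with the default, the KeyError surfaces when the Future is resolved, not when the call is made.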

close() -> None

Release any system resources held by this object

class zict.Cache(data: MutableMapping[KT, VT], cache: MutableMapping[KT, VT], update_on_set: bool = True)

Transparent write-through cache around a MutableMapping with an expensive __getitem__ method.

Parameters:
data: MutableMapping

Persistent, slow to read mapping to be cached

cache: MutableMapping

Fast cache for reads from data. This mapping may lose keys on its own; e.g. it could be an LRU.

update_on_set: bool, optional

If True (default), the cache will be updated both when writing and reading. If False, update the cache when reading, but just invalidate it when writing.

Notes

If you call methods of this class from multiple threads, access will be fast as long as all methods of cache, plus data.__delitem__, are fast. Other methods of data are not protected by locks.

Examples

Keep the latest 100 accessed values in memory

>>> from zict import Cache, File, LRU, WeakValueMapping
>>> d = Cache(File('myfile'), LRU(100, {}))

Read data from disk every time, unless it was previously accessed and it’s still in use somewhere else in the application

>>> d = Cache(File('myfile'), WeakValueMapping())
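
To make the effect of update_on_set concrete, here is a minimal write-through cache built on plain dictionaries. It is a sketch based on the parameter descriptions above, not zict's Cache; the class name WriteThroughCache is hypothetical and thread-safety is ignored.

```python
class WriteThroughCache:
    """Toy write-through cache; illustrative only, not zict's Cache."""

    def __init__(self, data, cache, update_on_set=True):
        self.data = data                  # slow, authoritative mapping
        self.cache = cache                # fast mapping, may drop keys
        self.update_on_set = update_on_set

    def __getitem__(self, key):
        try:
            return self.cache[key]
        except KeyError:
            value = self.data[key]        # slow read
            self.cache[key] = value       # remember it for next time
            return value

    def __setitem__(self, key, value):
        self.data[key] = value            # always write through
        if self.update_on_set:
            self.cache[key] = value
        else:
            self.cache.pop(key, None)     # invalidate a stale entry


data, cache = {}, {}
c = WriteThroughCache(data, cache, update_on_set=False)
c["x"] = 1
cached_after_write = "x" in cache   # False: writes only invalidate
value = c["x"]
cached_after_read = "x" in cache    # True: reads populate the cache
```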

close() -> None

Release any system resources held by this object

class zict.File(directory: str | pathlib.Path, memmap: bool = False)

Mutable Mapping interface to a directory

Keys must be strings, values must be buffers

Note this shouldn’t be used for interprocess persistence, as keys are cached in memory.

Parameters:
directory: str

Directory to write to. If it already exists, existing files will be imported as mapping elements. If it doesn’t exist, it will be created.

memmap: bool (optional)

If True, use mmap for reading. Defaults to False.

Notes

If you call methods of this class from multiple threads, access will be fast as long as atomic disk access such as open, os.fstat, and os.remove is fast. This is not always the case, e.g. with slow network mounts or spun-down magnetic drives. Reading and writing bytes in the files is not protected by locks; this could cause failures on Windows, NFS, and in general whenever it’s not OK to delete a file while there are file descriptors open on it.

Examples

>>> from zict import File
>>> z = File('myfile')
>>> z['x'] = b'123'  
>>> z['x']  
b'123'

Also supports writing lists of bytes objects

>>> z['y'] = [b'123', b'4567']  
>>> z['y']  
b'1234567'

Or anything that can be used with file.write, like a memoryview

>>> import numpy as np
>>> z['data'] = np.ones(5).data

class zict.Func(dump: Callable[[VT], WT], load: Callable[[WT], VT], d: MutableMapping[KT, WT])

Translate the values of a MutableMapping with a pair of input/output functions

Parameters:
dump: callable

Function to call on value as we set it into the mapping

load: callable

Function to call on value as we pull it from the mapping

d: MutableMapping

See also

KeyMap

Examples

>>> from zict import Func
>>> def double(x):
...     return x * 2
>>> def halve(x):
...     return x / 2
>>> d = {}
>>> f = Func(double, halve, d)
>>> f['x'] = 10
>>> d
{'x': 20}
>>> f['x']
10.0

close() -> None

Release any system resources held by this object

class zict.KeyMap(fn: Callable[[KT], JT], d: MutableMapping[JT, VT])

Translate the keys of a MutableMapping with a key-mapping function

Parameters:
fn: callable

Function to call on a key of the KeyMap to transform it to a key of the wrapped mapping. It must be pure (if called twice on the same key it must return the same result) and it must not generate collisions. In other words, fn(a) == fn(b) iff a == b.

d: MutableMapping

Wrapped mapping
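
The no-collision requirement on fn is easy to violate by accident. The check below uses only built-ins; to_file_key is a hypothetical name for a candidate fn, and the repr comparison is shown only for this particular pair of keys, not as a general recommendation.

```python
def to_file_key(key):
    """Candidate fn for a KeyMap: turn any object into a string key."""
    return str(key)


# Two distinct keys that translate to the same string would overwrite
# each other in the wrapped mapping.
collides = to_file_key(1) == to_file_key("1")

# repr() keeps these two apart, since repr("1") includes the quotes.
distinct = repr(1) != repr("1")
```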

See also

Func

Examples

Use any Python object as keys of a File, instead of just strings, as long as their str representation is unique:

>>> from zict import File, KeyMap
>>> z = KeyMap(str, File("myfile"))  
>>> z[1] = 10  
close() -> None

Release any system resources held by this object

class zict.LMDB(directory: str | pathlib.Path, map_size: int | None = None)

Mutable Mapping interface to an LMDB database.

Keys must be strings, values must be bytes

Parameters:
directory: str
map_size: int

On Linux and macOS, maximum size of the database file on disk. Defaults to 1 TiB on 64 bit systems and 1 GiB on 32 bit ones.

On Windows, preallocated total size of the database file on disk. Defaults to 10 MiB to encourage explicitly setting it.

Notes

None of this class is thread-safe, not even normally trivial methods such as __len__ or __contains__.

Examples

>>> from zict import LMDB
>>> z = LMDB('/tmp/somedir/')
>>> z['x'] = b'123'  
>>> z['x']  
b'123'

close() -> None

Release any system resources held by this object

items() -> a set-like object providing a view on D's items

values() -> an object providing a view on D's values

class zict.LRU(n: float, d: MutableMapping[KT, VT], *, on_evict: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, on_cancel_evict: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, weight: Callable[[KT, VT], float] = <function LRU.<lambda>>)

Evict Least Recently Used Elements.

Parameters:
n: int or float

Number of elements to keep, or total weight if weight is used. Any individual key that is heavier than n will be automatically evicted as soon as it is inserted.

It can be updated after initialization. See also: offset attribute.

d: MutableMapping

Dict-like in which to hold elements. There are no expectations on its internal ordering. Iteration on the LRU follows the order of the underlying mapping.

on_evict: callable or list of callables

Function (k, v) -> action to call on key/value pairs prior to eviction. If an exception occurs during an on_evict callback (e.g. a callback tried storing to disk and raised a disk full error), the key will remain in the LRU.

on_cancel_evict: callable or list of callables

Function (k, v) -> action to call on key/value pairs if they’re deleted or updated from one thread while the on_evict callables are being executed in another. If you’re not accessing the LRU from multiple threads, ignore this parameter.

weight: callable

Function (k, v) -> number that determines the cost of keeping the item in the mapping. Defaults to (k, v) -> 1.

Notes

If you call methods of this class from multiple threads, access will be fast as long as all methods of d are fast. Callbacks are not protected by locks and can be arbitrarily slow.

Examples

>>> from zict import LRU
>>> lru = LRU(2, {}, on_evict=lambda k, v: print("Lost", k, v))
>>> lru['x'] = 1
>>> lru['y'] = 2
>>> lru['z'] = 3
Lost x 1

close() -> None

Release any system resources held by this object

evict(key: KT | NoDefault = NoDefault.nodefault) -> tuple[KT, VT, float] | tuple[None, None, float]

Evict least recently used key, or least recently inserted key with individual weight > n, if any. You may also evict a specific key.

This is typically called internally, but can be triggered externally as well.

Returns:
Tuple of (key, value, weight), or (None, None, 0) if the key that was being evicted was updated or deleted from another thread while the on_evict callbacks were being executed. This outcome is only possible in multithreaded access.

evict_until_below_target(n: float | None = None) -> None

Evict key/value pairs until the total weight falls below n

Parameters:
n: float, optional

Total weight threshold to achieve. Defaults to self.n.

get_all_or_nothing(keys: Collection[KT]) -> dict[KT, VT]

If all keys exist in the LRU, update their LRU priority and return their values; this would be the same as {k: lru[k] for k in keys}. If any keys are missing, however, raise KeyError for the first one missing and do not bring any of the available keys to the top of the LRU.

items() -> a set-like object providing a view on D's items

keys() -> a set-like object providing a view on D's keys

n: float

Maximum weight before eviction is triggered, as set during initialization. Updating this attribute doesn’t trigger eviction by itself; you should call evict_until_below_target() explicitly afterwards.

offset: float

Offset to add to total_weight to determine if key/value pairs should be evicted. It always starts at zero and can be updated afterwards. Updating this attribute doesn’t trigger eviction by itself; you should call evict_until_below_target() explicitly afterwards. Increasing offset is not the same as reducing n, as the latter will also reduce the threshold below which a value is considered “heavy” and qualifies for immediate eviction.
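
The interplay of n, offset and "heavy" keys can be sketched with a toy weighted LRU. The class below is written from the descriptions on this page and is not zict's LRU: WeightedLRU is a hypothetical name, the eviction condition (total weight plus offset greater than n) is an assumption, and there are no callbacks or locks.

```python
from collections import OrderedDict


class WeightedLRU:
    """Toy weighted LRU; illustrative only, not zict's LRU."""

    def __init__(self, n, weight=lambda k, v: 1):
        self.n = n
        self.offset = 0
        self.weight = weight
        self.data = OrderedDict()   # least recently used first
        self.weights = {}
        self.evicted = []           # record of (key, value) pairs dropped

    @property
    def total_weight(self):
        return sum(self.weights.values())

    def __setitem__(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        self.weights[key] = self.weight(key, value)
        if self.weights[key] > self.n:
            self._evict(key)        # "heavy" key: evicted immediately
        self.evict_until_below_target()

    def _evict(self, key):
        self.evicted.append((key, self.data.pop(key)))
        del self.weights[key]

    def evict_until_below_target(self, n=None):
        n = self.n if n is None else n
        # Assumed condition: the offset counts towards the total.
        while self.data and self.total_weight + self.offset > n:
            self._evict(next(iter(self.data)))


lru = WeightedLRU(10, weight=lambda k, v: len(v))
lru["x"] = "aaaa"                # weight 4
lru["y"] = "bbbb"                # weight 4, total 8
lru["big"] = "c" * 11            # weight 11 > n: evicted at once, x and y stay
lru.offset = 3                   # 8 + 3 > 10, but nothing happens yet
kept_before = list(lru.data)
lru.evict_until_below_target()   # now "x" (least recently used) goes
```

Raising offset pushed out an ordinary key without changing which keys count as heavy; lowering n to 7 instead would also have made any key of weight 8 or more heavy.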

set_noevict(key: KT, value: VT) -> None

Variant of __setitem__ that does not evict if the total weight exceeds n. Unlike __setitem__, this method does not depend on the on_evict functions to be thread-safe for its own thread-safety. It also is not prone to re-raising exceptions from the on_evict callbacks.

values() -> an object providing a view on D's values

class zict.Sieve(mappings: Mapping[MKT, MutableMapping[KT, VT]], selector: Callable[[KT, VT], MKT])

Store values in different mappings based on a selector’s output.

This creates a MutableMapping combining several underlying MutableMappings for storage. Items are dispatched based on a selector function provided by the user.
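
A minimal version of this dispatch, using plain dictionaries as the underlying mappings, might look as follows. ToySieve is a hypothetical name and the class is a sketch, not zict's Sieve; in particular it only implements setting and getting.

```python
import sys


class ToySieve:
    """Toy selector-based dispatcher; illustrative only, not zict's Sieve."""

    def __init__(self, mappings, selector):
        self.mappings = mappings      # {mapping key: underlying mapping}
        self.selector = selector      # (key, value) -> mapping key
        self.key_to_mapping = {}      # remembers where each key was stored

    def __setitem__(self, key, value):
        old = self.key_to_mapping.pop(key, None)
        if old is not None:
            del old[key]              # the new value may belong elsewhere
        target = self.mappings[self.selector(key, value)]
        target[key] = value
        self.key_to_mapping[key] = target

    def __getitem__(self, key):
        return self.key_to_mapping[key][key]


small, large = {}, {}


def is_small(key, value):
    return sys.getsizeof(value) < 10000


d = ToySieve({True: small, False: large}, is_small)
d["a"] = b"x" * 10
d["b"] = b"x" * 20000
d["a"] = b"y" * 20000     # "a" migrates from small to large
```

Because the selector sees the value as well as the key, updating a key can move it between mappings, which is why the sketch deletes the old entry before storing the new one.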

Parameters:
mappings: dict of {mapping key: MutableMapping}
selector: callable (key, value) -> mapping key

Notes

If you call methods of this class from multiple threads, access will be fast as long as the __contains__ and __delitem__ methods of all underlying mappings are fast. __getitem__ and __setitem__ methods of the underlying mappings are not protected by locks.

Examples

>>> import sys
>>> from zict import Sieve
>>> small = {}
>>> large = DataBase()  # placeholder for any large-capacity MutableMapping
>>> mappings = {True: small, False: large}
>>> def is_small(key, value):
...     return sys.getsizeof(value) < 10000
>>> d = Sieve(mappings, is_small)

close() -> None

Release any system resources held by this object

class zict.Zip(filename: str, mode: Literal['r', 'w', 'x', 'a'] = 'a')

Mutable Mapping interface to a Zip file

Keys must be strings, values must be bytes

Parameters:
filename: string
mode: string (‘r’, ‘w’, ‘x’ or ‘a’), defaults to ‘a’
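
For a feel of the storage model behind these parameters, the standard-library zipfile module can store and retrieve bytes under string names in much the same way. The sketch below uses zipfile directly with an in-memory buffer instead of a real file; it does not use zict's Zip class and makes no claim about how that class is implemented.

```python
import io
import zipfile

buffer = io.BytesIO()                     # in-memory stand-in for a zip file
with zipfile.ZipFile(buffer, mode="a") as zf:
    zf.writestr("x", b"123")              # string name, bytes payload
# Leaving the with-block closes the archive, which is when zipfile writes
# the central directory (the archive's metadata).

with zipfile.ZipFile(buffer, mode="r") as zf:
    names = zf.namelist()
    payload = zf.read("x")
```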

Notes

None of this class is thread-safe, not even normally trivial methods such as __len__ or __contains__.

Examples

>>> from zict import Zip
>>> z = Zip('myfile.zip')
>>> z['x'] = b'123'  
>>> z['x']  
b'123'
>>> z.flush()  # flush and write metadata to disk  

Additionally, zict makes available the following general-purpose objects:

class zict.InsertionSortedSet(other: Iterable[T] = ())

A set-like that retains insertion order, like a dict. Thread-safe.

Equality does not compare order or class, but only compares against the contents of any other set-like, coherently with dict and the AbstractSet design.

add(value: T) -> None

Add element to the set. If the element is already in the set, retain original insertion order.

clear() -> None

This is slow (creates N new iterators!) but effective.

discard(value: T) -> None

Remove an element. Do not raise an exception if absent.

pop() -> T

Pop the latest-inserted key from the set

popleft() -> T

Pop the oldest-inserted key from the set

popright() -> T

Pop the latest-inserted key from the set

remove(value: T) -> None

Remove an element. If not a member, raise a KeyError.
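
The semantics of the methods above can be sketched on top of a plain dict, which already preserves insertion order. ToyOrderedSet is a hypothetical name; the class is not zict's implementation and is not thread-safe.

```python
class ToyOrderedSet:
    """Toy insertion-ordered set; illustrative only, not zict's class."""

    def __init__(self, other=()):
        self._d = dict.fromkeys(other)   # dicts preserve insertion order

    def add(self, value):
        self._d.setdefault(value)        # no-op if present: order unchanged

    def discard(self, value):
        self._d.pop(value, None)

    def popleft(self):
        for value in self._d:            # first key is the oldest
            del self._d[value]
            return value
        raise KeyError("pop from an empty set")

    def popright(self):
        return self._d.popitem()[0]      # newest; KeyError if empty

    def __eq__(self, other):
        return set(self._d) == set(other)  # order is ignored

    def __iter__(self):
        return iter(self._d)


s = ToyOrderedSet(["a", "b", "c"])
s.add("a")                 # already present: stays first
order = list(s)
oldest = s.popleft()
newest = s.popright()
```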

class zict.WeakValueMapping(other=(), /, **kw)

Variant of weakref.WeakValueDictionary which silently ignores objects that can’t be referenced by a weakref.ref
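
The difference from the standard class shows up with values such as int, which cannot be weakly referenced. The subclass below is a sketch of the described behaviour under that reading, not zict's code; Node and TolerantWeakValues are hypothetical names.

```python
import weakref


class Node:
    """Plain class; instances support weak references."""


class TolerantWeakValues(weakref.WeakValueDictionary):
    """Toy variant that skips values which cannot be weakly referenced."""

    def __setitem__(self, key, value):
        try:
            super().__setitem__(key, value)
        except TypeError:
            pass  # e.g. int: silently not stored


strict = weakref.WeakValueDictionary()
try:
    strict["n"] = 123
    strict_failed = False
except TypeError:
    strict_failed = True

tolerant = TolerantWeakValues()
node = Node()                 # keep a strong reference alive
tolerant["node"] = node
tolerant["n"] = 123           # ignored instead of raising
```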

Changelog

Release notes can be found here.