Zict: Composable Mutable Mappings
The dictionary / mutable mapping interface is powerful and multi-faceted.
We store data in different locations such as in-memory, on disk, in archive files, etc.
We manage old data with different policies like LRU, random eviction, etc.
We might encode or transform data as it arrives or departs the dictionary through compression, encoding, etc.
To this end we build abstract MutableMapping classes that consume and build on other MutableMappings. We can compose several of these with each other to form intuitive interfaces over complex storage systems and policies.
Example
In the following example we create an LRU dictionary backed by a pickle-encoded, zlib-compressed directory of files.
import pickle
import zlib
from zict import File, Func, LRU
a = File('mydir/')
b = Func(zlib.compress, zlib.decompress, a)
c = Func(pickle.dumps, pickle.loads, b)
d = LRU(100, c)
>>> d['x'] = [1, 2, 3]
>>> d['x']
[1, 2, 3]
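To make the layering concrete, here is a minimal stdlib-only sketch of what a value-translating wrapper does. `TinyFunc` is a hypothetical stand-in written for illustration, not zict's `Func` implementation, and a plain dict takes the place of `File('mydir/')`.

```python
import pickle
import zlib
from collections.abc import MutableMapping


class TinyFunc(MutableMapping):
    """Hypothetical illustration: apply dump on write and load on read."""

    def __init__(self, dump, load, d):
        self.dump, self.load, self.d = dump, load, d

    def __setitem__(self, key, value):
        self.d[key] = self.dump(value)

    def __getitem__(self, key):
        return self.load(self.d[key])

    def __delitem__(self, key):
        del self.d[key]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        return len(self.d)


store = {}  # assumption: a plain dict stands in for the directory of files
compressed = TinyFunc(zlib.compress, zlib.decompress, store)
pickled = TinyFunc(pickle.dumps, pickle.loads, compressed)

pickled["x"] = [1, 2, 3]
roundtrip = pickled["x"]  # goes through decompress, then unpickle
raw = store["x"]          # what the innermost mapping actually holds
```

Each layer only sees the mapping directly beneath it, which is why the order of composition matters: the innermost store ends up holding zlib-compressed pickle bytes.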
Thread-safety
Most classes in this library are thread-safe. Refer to the documentation of the individual mappings for exceptions.
API
zict defines the following MutableMappings:
- class zict.Buffer(fast: MutableMapping[KT, VT], slow: MutableMapping[KT, VT], n: float, weight: Callable[[KT, VT], float] = <function Buffer.<lambda>>, fast_to_slow_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, slow_to_fast_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None)
Buffer one dictionary on top of another
This creates a MutableMapping by combining two MutableMappings, one that feeds into the other when it overflows, based on an LRU mechanism. When the first evicts elements these get placed into the second. When an item is retrieved from the second it is placed back into the first.
- Parameters:
- fast: MutableMapping
- slow: MutableMapping
- n: float
Number of elements to keep, or total weight if weight is used.
- weight: f(k, v) -> float, optional
Weight of each key/value pair (default: 1)
- fast_to_slow_callbacks: list of callables
These functions run every time data moves from the fast to the slow mapping. They take two arguments, a key and a value. If an exception occurs during one of the fast_to_slow_callbacks (e.g. a callback tried storing to disk and raised a disk full error), the key will remain in the LRU.
- slow_to_fast_callbacks: list of callables
These functions run every time data moves from the slow to the fast mapping.
Notes
If you call methods of this class from multiple threads, access will be fast as long as all methods of fast, plus slow.__contains__ and slow.__delitem__, are fast. slow.__getitem__, slow.__setitem__ and callbacks are not protected by locks.
Examples
>>> fast = {}
>>> slow = Func(dumps, loads, File('storage/'))
>>> def weight(k, v):
...     return sys.getsizeof(v)
>>> buff = Buffer(fast, slow, 1e8, weight=weight)
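The spill-over behaviour described for this class can be sketched with the standard library alone. `TinyBuffer` below is a hypothetical toy, not zict's `Buffer`: it ignores callbacks and locking, and it assumes eviction is due when the total weight strictly exceeds n.

```python
from collections import OrderedDict


class TinyBuffer:
    """Hypothetical illustration of fast/slow spill-over; not zict's Buffer."""

    def __init__(self, fast, slow, n, weight=lambda k, v: 1):
        self.fast = OrderedDict(fast)  # ordering tracks recency, oldest first
        self.slow = slow
        self.n = n
        self.weight = weight

    def _total(self):
        return sum(self.weight(k, v) for k, v in self.fast.items())

    def _spill(self):
        # Move least recently used items to slow until fast fits within n
        while self.fast and self._total() > self.n:
            k, v = self.fast.popitem(last=False)
            self.slow[k] = v

    def __setitem__(self, key, value):
        self.slow.pop(key, None)   # a key lives in exactly one mapping
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._spill()

    def __getitem__(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)  # raises KeyError if absent everywhere
        self.fast[key] = value      # retrieved items move back into fast
        self._spill()
        return value


buff = TinyBuffer({}, {}, n=2)
buff["a"], buff["b"], buff["c"] = 1, 2, 3  # "a" overflows into slow
value = buff["a"]                          # "a" returns to fast, "b" spills
```

Recomputing the total weight on every operation keeps the sketch short; a real implementation would track it incrementally.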
- evict_until_below_target(n: float | None = None) -> None
Wrapper around zict.LRU.evict_until_below_target(). Presented here to allow easier overriding.
- property n: float
Maximum weight in the fast mapping before eviction happens. Can be updated; this won’t trigger eviction by itself; you should call evict_until_below_target() afterwards.
- property offset: float
Offset to add to the total weight in the fast buffer to determine when eviction happens. Note that increasing offset is not the same as decreasing n, as the latter also changes what keys qualify as “heavy” and should not be stored in fast.
Always starts at zero and can be updated; this won’t trigger eviction by itself; you should call evict_until_below_target() afterwards.
- class zict.AsyncBuffer(fast: MutableMapping[KT, VT], slow: MutableMapping[KT, VT], n: float, weight: Callable[[KT, VT], float] = <function Buffer.<lambda>>, fast_to_slow_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, slow_to_fast_callbacks: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None)
Extension of Buffer that allows offloading all reads and writes from/to slow to a separate worker thread.
This requires fast to be fully thread-safe (e.g. a plain dict).
slow.__setitem__ and slow.__getitem__ will be called from the offloaded thread, while all of its other methods (including, notably for the purpose of thread-safety consideration, __contains__ and __delitem__) will be called from the main thread.
- Parameters:
- Same as in Buffer, plus:
- executor: concurrent.futures.Executor, optional
An Executor instance to use for offloading. It must not pickle/unpickle. Defaults to an internal ThreadPoolExecutor.
- nthreads: int, optional
Number of offloaded threads to run in parallel. Defaults to 1. Mutually exclusive with executor parameter.
- async_evict_until_below_target(n: float | None = None) -> None
If the total weight exceeds n, asynchronously start moving keys from fast to slow in a worker thread.
- async_get(keys: Collection[KT], missing: Literal['raise', 'omit'] = 'raise') -> Future[dict[KT, VT]]
Fetch one or more key/value pairs. If not all keys are available in fast, offload to a worker thread moving keys from slow to fast, as well as possibly moving older keys from fast to slow.
- Parameters:
- keys:
collection of zero or more keys to get
- missing: raise or omit, optional
- raise (default)
If any key is missing, raise KeyError.
- omit
If a key is missing, return a dict with fewer keys than those requested.
Notes
All keys may be present when you call async_get, but __delitem__ may be called on one of them before the actual data is fetched. __setitem__ also internally calls __delitem__ in a non-atomic way, so you may get KeyError when updating a value too.
- class zict.Cache(data: MutableMapping[KT, VT], cache: MutableMapping[KT, VT], update_on_set: bool = True)
Transparent write-through cache around a MutableMapping with an expensive __getitem__ method.
- Parameters:
- data: MutableMapping
Persistent, slow to read mapping to be cached
- cache: MutableMapping
Fast cache for reads from data. This mapping may lose keys on its own; e.g. it could be an LRU.
- update_on_set: bool, optional
If True (default), the cache will be updated both when writing and reading. If False, update the cache when reading, but just invalidate it when writing.
Notes
If you call methods of this class from multiple threads, access will be fast as long as all methods of cache, plus data.__delitem__, are fast. Other methods of data are not protected by locks.
Examples
Keep the latest 100 accessed values in memory
>>> from zict import Cache, File, LRU, WeakValueMapping
>>> d = Cache(File('myfile'), LRU(100, {}))  # doctest: +SKIP
Read data from disk every time, unless it was previously accessed and it’s still in use somewhere else in the application
>>> d = Cache(File('myfile'), WeakValueMapping())  # doctest: +SKIP
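The effect of update_on_set can be illustrated with two plain dicts. The helper names `cached_set` and `cached_get` are hypothetical and only mirror the behaviour described in the parameter list above; they are not part of zict.

```python
def cached_set(data, cache, key, value, update_on_set=True):
    """Write through to data, then either refresh or invalidate the cache."""
    data[key] = value
    if update_on_set:
        cache[key] = value      # cache is updated on write
    else:
        cache.pop(key, None)    # cache entry is only invalidated on write


def cached_get(data, cache, key):
    """Serve from cache when possible; otherwise read data and remember it."""
    if key in cache:
        return cache[key]
    value = data[key]
    cache[key] = value
    return value


data, cache = {}, {}

cached_set(data, cache, "k", 1)                       # default: write updates cache
after_default = dict(cache)

cached_set(data, cache, "k", 2, update_on_set=False)  # write invalidates cache
after_invalidate = dict(cache)

read_back = cached_get(data, cache, "k")              # read repopulates cache
after_read = dict(cache)
```

Invalidating on write is the cheaper choice when freshly written values are rarely read back soon after.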
- class zict.File(directory: str | pathlib.Path, memmap: bool = False)
Mutable Mapping interface to a directory
Keys must be strings, values must be buffers
Note this shouldn’t be used for interprocess persistence, as keys are cached in memory.
- Parameters:
- directory: str
Directory to write to. If it already exists, existing files will be imported as mapping elements. If it doesn’t exist, it will be created.
- memmap: bool (optional)
If True, use mmap for reading. Defaults to False.
Notes
If you call methods of this class from multiple threads, access will be fast as long as atomic disk access such as open, os.fstat, and os.remove is fast. This is not always the case, e.g. in case of slow network mounts or spun-down magnetic drives. Reading and writing bytes in the files is not protected by locks; this could cause failures on Windows, NFS, and in general whenever it’s not OK to delete a file while there are file descriptors open on it.
Examples
>>> z = File('myfile')
>>> z['x'] = b'123'
>>> z['x']
b'123'
Also supports writing lists of bytes objects
>>> z['y'] = [b'123', b'4567']
>>> z['y']
b'1234567'
Or anything that can be used with file.write, like a memoryview
>>> z['data'] = np.ones(5).data
- class zict.Func(dump: Callable[[VT], WT], load: Callable[[WT], VT], d: MutableMapping[KT, WT])
Translate the values of a MutableMapping with a pair of input/output functions
- Parameters:
- dump: callable
Function to call on value as we set it into the mapping
- load: callable
Function to call on value as we pull it from the mapping
- d: MutableMapping
Examples
>>> def double(x):
...     return x * 2
>>> def halve(x):
...     return x / 2
>>> d = {}
>>> f = Func(double, halve, d)
>>> f['x'] = 10
>>> d
{'x': 20}
>>> f['x']
10.0
- class zict.KeyMap(fn: Callable[[KT], JT], d: MutableMapping[JT, VT])
Translate the keys of a MutableMapping with a key-transforming function
- Parameters:
- fn: callable
Function to call on a key of the KeyMap to transform it to a key of the wrapped mapping. It must be pure (if called twice on the same key it must return the same result) and it must not generate collisions. In other words, fn(a) == fn(b) iff a == b.
- d: MutableMapping
Wrapped mapping
Examples
Use any Python object as keys of a File, instead of just strings, as long as their str representation is unique:
>>> from zict import File, KeyMap
>>> z = KeyMap(str, File("myfile"))
>>> z[1] = b"10"
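The no-collision requirement on fn is easy to violate with `str` when keys of different types are mixed. The helper below is a hypothetical checker, not part of zict, that groups keys by their transformed value and reports any group with more than one member.

```python
def find_collisions(fn, keys):
    """Return groups of keys that fn maps to the same transformed key."""
    groups = {}
    for key in keys:
        groups.setdefault(fn(key), []).append(key)
    return [group for group in groups.values() if len(group) > 1]


safe = find_collisions(str, [1, 2, 3])      # distinct ints stay distinct
clash = find_collisions(str, [1, "1", 2])   # 1 and "1" both become "1"
```

With colliding keys such as `1` and `"1"`, both would address the same entry of the wrapped mapping, so one write would silently overwrite the other.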
- class zict.LMDB(directory: str | pathlib.Path, map_size: int | None = None)
Mutable Mapping interface to an LMDB database.
Keys must be strings, values must be bytes
- Parameters:
- directory: str
- map_size: int
On Linux and macOS, maximum size of the database file on disk. Defaults to 1 TiB on 64-bit systems and 1 GiB on 32-bit ones.
On Windows, preallocated total size of the database file on disk. Defaults to 10 MiB to encourage explicitly setting it.
Notes
None of this class is thread-safe - not even normally trivial methods such as __len__ or __contains__.
Examples
>>> z = LMDB('/tmp/somedir/')
>>> z['x'] = b'123'
>>> z['x']
b'123'
- class zict.LRU(n: float, d: MutableMapping[KT, VT], *, on_evict: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, on_cancel_evict: Callable[[KT, VT], None] | list[Callable[[KT, VT], None]] | None = None, weight: Callable[[KT, VT], float] = <function LRU.<lambda>>)
Evict Least Recently Used Elements.
- Parameters:
- n: int or float
Number of elements to keep, or total weight if weight is used. Any individual key that is heavier than n will be automatically evicted as soon as it is inserted.
It can be updated after initialization. See also: offset attribute.
- d: MutableMapping
Dict-like in which to hold elements. There are no expectations on its internal ordering. Iteration on the LRU follows the order of the underlying mapping.
- on_evict: callable or list of callables
Function k, v -> action to call on key/value pairs prior to eviction. If an exception occurs during an on_evict callback (e.g. a callback tried storing to disk and raised a disk full error), the key will remain in the LRU.
- on_cancel_evict: callable or list of callables
Function k, v -> action to call on key/value pairs if they’re deleted or updated from a thread while the on_evict callables are being executed in another. If you’re not accessing the LRU from multiple threads, ignore this parameter.
- weight: callable
Function k, v -> number to determine the size of keeping the item in the mapping. Defaults to (k, v) -> 1.
Notes
If you call methods of this class from multiple threads, access will be fast as long as all methods of d are fast. Callbacks are not protected by locks and can be arbitrarily slow.
Examples
>>> lru = LRU(2, {}, on_evict=lambda k, v: print("Lost", k, v))
>>> lru['x'] = 1
>>> lru['y'] = 2
>>> lru['z'] = 3
Lost x 1
- evict(key: KT | NoDefault = NoDefault.nodefault) -> tuple[KT, VT, float] | tuple[None, None, float]
Evict least recently used key, or least recently inserted key with individual weight > n, if any. You may also evict a specific key.
This is typically called internally, but can be triggered externally as well.
- Returns:
Tuple of (key, value, weight), or (None, None, 0) if the key that was being evicted was updated or deleted from another thread while the on_evict callbacks were being executed. This outcome is only possible in multithreaded access.
- evict_until_below_target(n: float | None = None) -> None
Evict key/value pairs until the total weight falls below n
- Parameters:
- n: float, optional
Total weight threshold to achieve. Defaults to self.n.
- get_all_or_nothing(keys: Collection[KT]) -> dict[KT, VT]
If all keys exist in the LRU, update their FIFO priority and return their values; this would be the same as {k: lru[k] for k in keys}. If any keys are missing, however, raise KeyError for the first one missing and do not bring any of the available keys to the top of the LRU.
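The difference from a plain dict comprehension is that no priority is touched unless every key is present. A minimal sketch of that check-then-update pattern, using an `OrderedDict` as a stand-in for the recency order (the function name `all_or_nothing` is hypothetical, not zict's implementation):

```python
from collections import OrderedDict


def all_or_nothing(order, keys):
    """Return {k: order[k]} and refresh recency only if all keys exist."""
    for key in keys:
        if key not in order:
            raise KeyError(key)   # first missing key; nothing was reordered
    for key in keys:
        order.move_to_end(key)    # most recently used keys sit at the end
    return {key: order[key] for key in keys}


order = OrderedDict(a=1, b=2, c=3)

hit = all_or_nothing(order, ["a"])
after_hit = list(order)

try:
    all_or_nothing(order, ["b", "missing"])
    raised = False
except KeyError:
    raised = True
after_miss = list(order)          # "b" was not promoted
```

By contrast, `{k: lru[k] for k in keys}` would promote "b" before failing on the missing key.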
- n: float
Maximum weight before eviction is triggered, as set during initialization. Updating this attribute doesn’t trigger eviction by itself; you should call evict_until_below_target() explicitly afterwards.
- offset: float
Offset to add to total_weight to determine if key/value pairs should be evicted. It always starts at zero and can be updated afterwards. Updating this attribute doesn’t trigger eviction by itself; you should call evict_until_below_target() explicitly afterwards. Increasing offset is not the same as reducing n, as the latter will also reduce the threshold below which a value is considered “heavy” and qualifies for immediate eviction.
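A small worked comparison may help separate the two knobs. The two predicates below are assumptions distilled from the descriptions of n and offset, not zict code; in particular the strict inequalities are a guess at the exact boundary behaviour.

```python
def must_evict(total_weight: float, offset: float, n: float) -> bool:
    """Eviction is due when the offset-adjusted total exceeds n (assumed strict)."""
    return total_weight + offset > n


def is_heavy(item_weight: float, n: float) -> bool:
    """A single item heavier than n is evicted as soon as it is inserted."""
    return item_weight > n


total, item = 8.0, 7.0

# Option A: keep n at 10 and raise offset to 4
a_evicts = must_evict(total, offset=4.0, n=10.0)
a_heavy = is_heavy(item, n=10.0)

# Option B: keep offset at 0 and lower n to 6
b_evicts = must_evict(total, offset=0.0, n=6.0)
b_heavy = is_heavy(item, n=6.0)
```

Both options trigger eviction for a total weight of 8, but only option B treats an item of weight 7 as heavy.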
- set_noevict(key: KT, value: VT) -> None
Variant of __setitem__ that does not evict if the total weight exceeds n. Unlike __setitem__, this method does not depend on the on_evict functions to be thread-safe for its own thread-safety. It also is not prone to re-raising exceptions from the on_evict callbacks.
- class zict.Sieve(mappings: Mapping[MKT, MutableMapping[KT, VT]], selector: Callable[[KT, VT], MKT])
Store values in different mappings based on a selector’s output.
This creates a MutableMapping combining several underlying MutableMappings for storage. Items are dispatched based on a selector function provided by the user.
- Parameters:
- mappings: dict of {mapping key: MutableMapping}
- selector: callable (key, value) -> mapping key
Notes
If you call methods of this class from multiple threads, access will be fast as long as the __contains__ and __delitem__ methods of all underlying mappings are fast. __getitem__ and __setitem__ methods of the underlying mappings are not protected by locks.
Examples
>>> small = {}
>>> large = DataBase()
>>> mappings = {True: small, False: large}
>>> def is_small(key, value):
...     return sys.getsizeof(value) < 10000
>>> d = Sieve(mappings, is_small)
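The dispatch idea can be tried out with plain dicts. The helpers `sieve_set` and `sieve_get` are hypothetical and are not zict's implementation; removing a key from the other mappings when it is re-routed is a design choice of this sketch, and `len(value)` replaces `sys.getsizeof` to keep the outcome deterministic.

```python
def sieve_set(mappings, selector, key, value):
    """Store value in the mapping chosen by selector(key, value)."""
    target_key = selector(key, value)
    for mapping_key, mapping in mappings.items():
        if mapping_key != target_key:
            mapping.pop(key, None)   # sketch assumption: one home per key
    mappings[target_key][key] = value


def sieve_get(mappings, key):
    """Look the key up in whichever underlying mapping holds it."""
    for mapping in mappings.values():
        if key in mapping:
            return mapping[key]
    raise KeyError(key)


small, large = {}, {}
mappings = {True: small, False: large}


def is_small(key, value):
    return len(value) < 4


sieve_set(mappings, is_small, "a", b"xy")               # goes to small
sieve_set(mappings, is_small, "b", b"0123456789")       # goes to large
sieve_set(mappings, is_small, "a", b"now much longer")  # re-routed to large
found = sieve_get(mappings, "a")
```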
- class zict.Zip(filename: str, mode: Literal['r', 'w', 'x', 'a'] = 'a')
Mutable Mapping interface to a Zip file
Keys must be strings, values must be bytes
- Parameters:
- filename: string
- mode: string, one of ‘r’, ‘w’, ‘x’, ‘a’; defaults to ‘a’
Notes
None of this class is thread-safe - not even normally trivial methods such as __len__ or __contains__.
Examples
>>> z = Zip('myfile.zip')
>>> z['x'] = b'123'
>>> z['x']
b'123'
>>> z.flush()  # flush and write metadata to disk
Additionally, zict makes available the following general-purpose objects:
- class zict.InsertionSortedSet(other: Iterable[T] = ())
A set-like that retains insertion order, like a dict. Thread-safe.
Equality does not compare order or class, but only compares against the contents of any other set-like, coherently with dict and the AbstractSet design.
- add(value: T) -> None
Add element to the set. If the element is already in the set, retain original insertion order.
- pop() -> T
Pop the latest-inserted key from the set
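The ordering rules for add and pop match what a plain dict already provides, so they can be sketched on top of one. `TinyOrderedSet` is a hypothetical illustration, not zict's class, and unlike InsertionSortedSet it makes no thread-safety guarantees.

```python
class TinyOrderedSet:
    """Hypothetical illustration: a set-like that keeps insertion order."""

    def __init__(self, other=()):
        self._d = dict.fromkeys(other)  # dicts preserve insertion order

    def add(self, value):
        # setdefault leaves an existing key in its original position
        self._d.setdefault(value, None)

    def pop(self):
        # dict.popitem removes the most recently inserted key
        return self._d.popitem()[0]

    def __contains__(self, value):
        return value in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


s = TinyOrderedSet("abc")
s.add("a")            # already present: order is unchanged
order = list(s)
last = s.pop()        # removes the latest-inserted element
remaining = list(s)
```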
Changelog
Release notes can be found here.