Warning: This document is for the development version of Pydra: A simple dataflow engine with scalable semantics. The main version is master.

pydra.utils.hash module

Generic object hashing dispatch

class pydra.utils.hash.Cache(persistent: Path | str | PersistentCache | None = None, hashes: Dict[int, Hash] = _Nothing.NOTHING)

Bases: object

Cache for hashing objects, used to avoid infinite recursion caused by circular references between objects, and to store hashes of objects that have already been hashed to avoid recomputation.
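The recursion guard described above can be illustrated with a minimal, standard-library-only sketch. It is not Pydra's implementation: the function name sketch_hash, the plain dict keyed by id(), the placeholder value and the choice of BLAKE2b are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def sketch_hash(obj: object, cache: Dict[int, bytes]) -> bytes:
    """Hash nested lists and dicts without looping forever on cycles.

    Hypothetical helper: `cache` maps id(obj) to a digest.
    """
    key = id(obj)
    if key in cache:
        # Already hashed, or currently being hashed further up the stack.
        return cache[key]
    # Store a placeholder before descending, so a cycle hits the branch above.
    cache[key] = b"\x00"
    digest = hashlib.blake2b(digest_size=16)
    digest.update(type(obj).__name__.encode() + b":")
    if isinstance(obj, dict):
        for k in sorted(obj, key=repr):
            digest.update(repr(k).encode() + b"=" + sketch_hash(obj[k], cache))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            digest.update(sketch_hash(item, cache))
    else:
        digest.update(repr(obj).encode())
    cache[key] = digest.digest()
    return cache[key]


# A list that contains itself would recurse forever without the cache.
loop = [1, 2]
loop.append(loop)
first = sketch_hash(loop, {})
second = sketch_hash(loop, {})
```

Because the cache is keyed by id(), it is only meaningful while the hashed objects are alive, which is one reason to start each top-level hash with a fresh cache, as the sketch does.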

This concept is extended to persistent caching of hashes for certain object types, for which calculating the hash is a potentially expensive operation (e.g. File/Directory types). For these classes the bytes_repr override function yields a “locally unique cache key” (e.g. file-system path + mtime) as the first item of its iterator.
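As a rough sketch of the "locally unique cache key" idea, the example below reuses a stored digest for as long as a file's path and modification time are unchanged. The helper names file_cache_key and hash_file, and the in-memory dict standing in for a persistent store, are hypothetical and are not part of pydra.utils.hash.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def file_cache_key(path: Path) -> bytes:
    """Cheap key: resolved path plus modification time in nanoseconds."""
    stat = path.stat()
    return f"{path.resolve()}:{stat.st_mtime_ns}".encode()


def hash_file(path: Path, store: Dict[bytes, bytes]) -> bytes:
    """Return the content digest, reading the file only on a cache miss."""
    key = file_cache_key(path)
    if key not in store:
        # Expensive step: read and hash the whole file.
        store[key] = hashlib.blake2b(path.read_bytes(), digest_size=16).digest()
    return store[key]


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.txt"
    target.write_text("hello")
    store: Dict[bytes, bytes] = {}
    first = hash_file(target, store)   # computes and stores the digest
    second = hash_file(target, store)  # served from the store
    entries = len(store)
```

The key is only unique locally: it says nothing about the same file on another machine, and it relies on the file system updating the modification time whenever the contents change.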

persistent: PersistentCache | None

pydra.utils.hash.bytes_repr_mapping_contents(mapping: Mapping, cache: Cache) → Iterator[bytes]

Serialize the contents of a mapping

Concatenates byte-serialized keys and hashed values.

>>> from pydra.utils.hash import bytes_repr_mapping_contents, Cache
>>> generator = bytes_repr_mapping_contents({"a": 1, "b": 2}, Cache())
>>> b''.join(generator)
b'str:1:a=...str:1:b=...'

pydra.utils.hash.bytes_repr_sequence_contents(seq: Sequence, cache: Cache) → Iterator[bytes]

Serialize the contents of a sequence

Concatenates hashed values.

>>> from pydra.utils.hash import bytes_repr_sequence_contents, Cache
>>> generator = bytes_repr_sequence_contents([1, 2], Cache())
>>> list(generator)
[b'm...', b'\xa3...']

pydra.utils.hash.hash_function(obj, **kwargs)

Generate hash of object.

pydra.utils.hash.hash_object(obj: object, cache: Cache | None = None, persistent_cache: PersistentCache | Path | None = None) → Hash

Hash an object

Constructs a byte string that uniquely identifies the object, and returns the hash of that string.

Base Python types are implemented, including recursive lists and dicts. Custom types can be registered with register_serializer().

pydra.utils.hash.hash_single(obj: object, cache: Cache) → Hash

Single object-scoped hash

Uses a local cache to prevent infinite recursion. This cache is unsafe to reuse across multiple objects, so this function should not be used directly.

pydra.utils.hash.register_serializer(cls, func=None)

Register a custom serializer for a type

The generator function should yield byte strings that will be hashed to produce the final hash. A recommended convention is to yield a qualified type prefix (e.g. f"{module}.{class}"), followed by a colon, followed by the serialized value.

If serializing an iterable, an open and close bracket may be yielded to identify the start and end of the iterable.

Consider using bytes_repr_mapping_contents() and bytes_repr_sequence_contents() to serialize the contents of a mapping or sequence. These do not include the prefix or brackets, so they can be reused as part of a custom serializer.

As an example, the following is the default serializer for user-defined classes:

@register_serializer
def bytes_repr(obj: object, cache: Cache) -> Iterator[bytes]:
    cls = obj.__class__
    yield f"{cls.__module__}.{cls.__name__}:{{".encode()
    yield from bytes_repr_mapping_contents(obj.__dict__, cache)
    yield b"}"

Serializers must accept a cache argument, which is a Cache object that permits caching of hashes for recursive objects. If the hashes of sub-objects are used to create an object's serialization, the hash_single() function should be called with the same cache object.
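The conventions above (a type prefix, brackets around contents, and one cache shared across sub-objects) can be tried out in a standard-library-only sketch. It uses functools.singledispatch as one possible way to build such a registry. The names to_chunks, digest_of and Point, the example.Point prefix and the dict-based cache are hypothetical stand-ins, not the pydra.utils.hash API.

```python
import hashlib
from functools import singledispatch
from typing import Dict, Iterator


@singledispatch
def to_chunks(obj: object, cache: Dict[int, bytes]) -> Iterator[bytes]:
    """Fallback: qualified type prefix, then hashed attributes in braces."""
    cls = obj.__class__
    yield f"{cls.__module__}.{cls.__name__}:{{".encode()
    for name, value in sorted(vars(obj).items()):
        yield name.encode() + b"=" + digest_of(value, cache)
    yield b"}"


@to_chunks.register(int)
def _(obj: int, cache: Dict[int, bytes]) -> Iterator[bytes]:
    yield f"int:{obj}".encode()


@to_chunks.register(str)
def _(obj: str, cache: Dict[int, bytes]) -> Iterator[bytes]:
    yield f"str:{len(obj)}:{obj}".encode()


def digest_of(obj: object, cache: Dict[int, bytes]) -> bytes:
    """Hash the chunks for obj, memoizing by id() in the shared cache."""
    key = id(obj)
    if key not in cache:
        cache[key] = b"\x00"  # placeholder guards against cycles
        h = hashlib.blake2b(digest_size=16)
        for chunk in to_chunks(obj, cache):
            h.update(chunk)
        cache[key] = h.digest()
    return cache[key]


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


@to_chunks.register(Point)
def _(obj: Point, cache: Dict[int, bytes]) -> Iterator[bytes]:
    yield b"example.Point:("          # type prefix and opening bracket
    yield digest_of(obj.x, cache)     # sub-object hashes reuse the same cache
    yield digest_of(obj.y, cache)
    yield b")"                        # closing bracket
```

The type prefixes keep values such as 1 and "1" from producing the same byte string, and passing the same cache down to every sub-object is what lets shared or self-referencing members be handled once.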