Warning: This document is for an old version of Pydra: A simple dataflow engine with scalable semantics. The main version is master.

pydra.utils.hash module

Generic object hashing dispatch

pydra.utils.hash.bytes_repr_mapping_contents(mapping: Mapping, cache: Cache) Iterator[bytes]

Serialize the contents of a mapping

Concatenates byte-serialized keys and hashed values.

>>> from pydra.utils.hash import bytes_repr_mapping_contents, Cache
>>> generator = bytes_repr_mapping_contents({"a": 1, "b": 2}, Cache({}))
>>> b''.join(generator)
b'str:1:a=...str:1:b=...'
pydra.utils.hash.bytes_repr_sequence_contents(seq: Sequence, cache: Cache) Iterator[bytes]

Serialize the contents of a sequence

Concatenates hashed values.

>>> from pydra.utils.hash import bytes_repr_sequence_contents, Cache
>>> generator = bytes_repr_sequence_contents([1, 2], Cache({}))
>>> list(generator)
[b'm...', b'£...']
pydra.utils.hash.hash_function(obj)

Generate hash of object.

pydra.utils.hash.hash_object(obj: object) Hash

Hash an object

Constructs a byte string that uniquely identifies the object, and returns the hash of that string.

Base Python types are implemented, including recursive lists and dicts. Custom types can be registered with register_serializer().

pydra.utils.hash.hash_single(obj: object, cache: Cache) Hash

Single object-scoped hash

Uses a local cache to prevent infinite recursion. This cache is unsafe to reuse across multiple objects, so this function should not be used directly.

pydra.utils.hash.register_serializer(cls, func=None)

Register a custom serializer for a type

The generator function should yield byte strings that will be hashed to produce the final hash. A recommended convention is to yield a qualified type prefix (e.g. f"{module}.{class}"), followed by a colon, followed by the serialized value.

If serializing an iterable, an open and close bracket may be yielded to identify the start and end of the iterable.

Consider using bytes_repr_mapping_contents() and bytes_repr_sequence_contents() to serialize the contents of a mapping or sequence. These do not include the prefix or brackets, so they can be reused as part of a custom serializer.

As an example, the following example is the default serializer for user-defined classes:

@register_serializer
def bytes_repr(obj: object, cache: Cache) -> Iterator[bytes]:
    cls = obj.__class__
    yield f"{cls.__module__}.{cls.__name__}:{{".encode()
    yield from bytes_repr_mapping_contents(obj.__dict__, cache)
    yield b"}"

Serializers must accept a cache argument, which is a dictionary that permits caching of hashes for recursive objects. If the hash of sub-objects is used to create an object serialization, the hash_single() function should be called with the same cache object.