pydra.utils.hash module
Generic object hashing dispatch
- class pydra.utils.hash.Cache(persistent: Path | str | PersistentCache | None = None, hashes: Dict[int, Hash] = _Nothing.NOTHING)
Bases: object
Cache for hashing objects, used to avoid infinite recursion caused by circular references between objects, and to store hashes of objects that have already been hashed to avoid recomputation.
This concept is extended to persistent caching of hashes for certain object types, for which calculating the hash is a potentially expensive operation (e.g. File/Directory types). For these classes the bytes_repr override function yields a “locally unique cache key” (e.g. file-system path + mtime) as the first item of its iterator.
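The "locally unique cache key" idea can be illustrated with a minimal, standard-library-only sketch. It is not pydra's implementation: the class `MtimeKeyedHashes`, its `misses` counter, and the choice of BLAKE2b are assumptions made for illustration. The key is built from the resolved path plus the modification time, so the expensive read of the file contents is repeated only when the file appears to have changed.

```python
import hashlib
from pathlib import Path


class MtimeKeyedHashes:
    """Hypothetical store mapping (path, mtime) keys to content digests."""

    def __init__(self) -> None:
        self._store: dict[bytes, str] = {}
        self.misses = 0  # how many times file contents were actually read

    @staticmethod
    def key(path: Path) -> bytes:
        # Cheap "locally unique" key: resolved path plus mtime in nanoseconds
        stat = path.stat()
        return f"{path.resolve()}:{stat.st_mtime_ns}".encode()

    def digest(self, path: Path) -> str:
        k = self.key(path)
        if k not in self._store:
            # Cache miss: pay for the expensive read-and-hash once
            self.misses += 1
            content = path.read_bytes()
            self._store[k] = hashlib.blake2b(content, digest_size=16).hexdigest()
        return self._store[k]
```

Such a key is only valid on the machine and file system that produced it, which is why it is described as locally unique rather than as a hash of the object itself.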
- pydra.utils.hash.bytes_repr_mapping_contents(mapping: Mapping, cache: Cache) -> Iterator[bytes]
Serialize the contents of a mapping
Concatenates byte-serialized keys and hashed values.
>>> from pydra.utils.hash import bytes_repr_mapping_contents, Cache
>>> generator = bytes_repr_mapping_contents({"a": 1, "b": 2}, Cache())
>>> b''.join(generator)
b'str:1:a=...str:1:b=...'
- pydra.utils.hash.bytes_repr_sequence_contents(seq: Sequence, cache: Cache) -> Iterator[bytes]
Serialize the contents of a sequence
Concatenates hashed values.
>>> from pydra.utils.hash import bytes_repr_sequence_contents, Cache
>>> generator = bytes_repr_sequence_contents([1, 2], Cache())
>>> list(generator)
[b'm...', b'£...']
- pydra.utils.hash.hash_function(obj, **kwargs)
Generate hash of object.
- pydra.utils.hash.hash_object(obj: object, cache: Cache | None = None, persistent_cache: PersistentCache | Path | None = None) -> Hash
Hash an object
Constructs a byte string that uniquely identifies the object, and returns the hash of that string.
Base Python types are implemented, including recursive lists and dicts. Custom types can be registered with register_serializer().
- pydra.utils.hash.hash_single(obj: object, cache: Cache) -> Hash
Single object-scoped hash
Uses a local cache to prevent infinite recursion. This cache is unsafe to reuse across multiple objects, so this function should not be used directly.
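The role of a local cache in preventing infinite recursion can be sketched with the standard library alone. The function `sketch_hash`, the `<cycle>` placeholder, and the plain `dict` keyed by `id()` are all assumptions for illustration and do not reproduce pydra's code. The sketch also suggests one plausible reason such a cache should not outlive a single top-level call: `id()` values are only unique among objects that are alive at the same time.

```python
import hashlib


def sketch_hash(obj, cache=None) -> bytes:
    """Hash nested lists, tuples, dicts and scalars, tolerating cycles."""
    if cache is None:
        cache = {}  # maps id(obj) -> digest, local to one top-level call
    key = id(obj)
    if key in cache:
        # Either already hashed, or currently being hashed (a cycle)
        return cache[key]
    # Store a placeholder before recursing so that a circular reference
    # resolves to it instead of recursing forever
    cache[key] = b"<cycle>"
    h = hashlib.blake2b(digest_size=16)
    h.update(type(obj).__name__.encode() + b":")
    if isinstance(obj, dict):
        for k, v in obj.items():
            h.update(sketch_hash(k, cache) + b"=" + sketch_hash(v, cache))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            h.update(sketch_hash(item, cache))
    else:
        h.update(repr(obj).encode())
    cache[key] = h.digest()
    return cache[key]
```

A list that contains itself, such as `a = [1, 2]; a.append(a)`, is hashed without recursion errors because the inner reference to `a` hits the placeholder.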
- pydra.utils.hash.register_serializer(cls, func=None)
Register a custom serializer for a type
The generator function should yield byte strings that will be hashed to produce the final hash. A recommended convention is to yield a qualified type prefix (e.g. f"{module}.{class}"), followed by a colon, followed by the serialized value. If serializing an iterable, an open and close bracket may be yielded to identify the start and end of the iterable.
Consider using bytes_repr_mapping_contents() and bytes_repr_sequence_contents() to serialize the contents of a mapping or sequence. These do not include the prefix or brackets, so they can be reused as part of a custom serializer.
For example, the following is the default serializer for user-defined classes:
    @register_serializer
    def bytes_repr(obj: object, cache: Cache) -> Iterator[bytes]:
        cls = obj.__class__
        yield f"{cls.__module__}.{cls.__name__}:{{".encode()
        yield from bytes_repr_mapping_contents(obj.__dict__, cache)
        yield b"}"
Serializers must accept a cache argument, a Cache instance that permits caching of hashes for recursive objects. If the hash of sub-objects is used to create an object serialization, the hash_single() function should be called with the same cache object.
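The recommended convention (qualified type prefix, colon, serialized value, with brackets around iterable contents) can be tried out without pydra using `functools.singledispatch`. This is a sketch under stated assumptions: `to_bytes`, `digest`, and the `Point` class are hypothetical names, and the `cache` argument that pydra serializers must accept is omitted for brevity.

```python
import hashlib
from functools import singledispatch
from typing import Iterator


@singledispatch
def to_bytes(obj: object) -> Iterator[bytes]:
    # Fallback: qualified type prefix, a colon, then the repr of the value
    cls = type(obj)
    yield f"{cls.__module__}.{cls.__name__}:".encode()
    yield repr(obj).encode()


class Point:
    """Hypothetical user-defined type used only for this illustration."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


@to_bytes.register(Point)
def _(obj: Point) -> Iterator[bytes]:
    cls = type(obj)
    # Prefix, colon, and an opening bracket marking the start of the contents
    yield f"{cls.__module__}.{cls.__name__}:(".encode()
    for coord in (obj.x, obj.y):
        yield from to_bytes(coord)
        yield b","
    yield b")"  # closing bracket marking the end of the contents


def digest(obj: object) -> str:
    # Feed the yielded chunks into a single hash object
    h = hashlib.blake2b(digest_size=16)
    for chunk in to_bytes(obj):
        h.update(chunk)
    return h.hexdigest()
```

The type prefix keeps values of different types from colliding when their serialized contents happen to match, and the brackets keep nested iterables unambiguous.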