Digester

`digesters.digester.Digester(*args, **kwargs)`

Bases: ABC

The Digester class is an abstract base class designed to assist with digesting data from various computational chemistry and biology resources. This class provides a framework for processing, parsing, and validating data extracted from simulations, geometry optimizations, and other computational methods.

`checks()` `classmethod`

Perform basic checks to raise warnings or errors before digesting.

This method should be overridden in the child class to include specific checks required for the particular type of data being processed.

`digest(ensemble_schema=None, digester_args=None, digester_kwargs=None, parallelize=False, max_workers=None)` `classmethod`

Given the inputs in digester_args and digester_kwargs, digest all possible atomistic frames and populate an EnsembleSchema.

This method initializes an EnsembleSchema, calls the checks method to perform preliminary checks, prepares the inputs for digestion by calling the prepare_inputs_digester method, and iterates through each step of the digestion process to append frames to the EnsembleSchema.

This method keeps the entire EnsembleSchema in memory. If you have a large amount of data, consider using digest_chunks.

PARAMETER	DESCRIPTION
`ensemble_schema`	The schema to which frames will be appended. TYPE: `EnsembleSchema | None` DEFAULT: `None`
`digester_args`	Arguments to pass into `prepare_inputs_digester`. TYPE: `tuple[Any, ...] | None` DEFAULT: `None`
`digester_kwargs`	Keyword arguments to pass into `prepare_inputs_digester`. TYPE: `dict[str, Any] | None` DEFAULT: `None`
`parallelize`	Execute concurrently. TYPE: `bool` DEFAULT: `False`
`max_workers`	Maximum number of workers for concurrent operation. TYPE: `int | None` DEFAULT: `None`

RETURNS	DESCRIPTION
`EnsembleSchema`	The updated schema with frames extracted from a digester.
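
The call order described above can be sketched with a toy class. This is a minimal illustration, not the library's implementation: `ToyDigester` is a hypothetical stand-in, a plain list takes the place of `EnsembleSchema`, and `digest_frame` is simplified to a single argument.

```python
from typing import Any


class ToyDigester:
    """Hypothetical stand-in that mimics the documented call order of digest."""

    @classmethod
    def checks(cls) -> None:
        # A real child class would raise warnings or errors here.
        pass

    @classmethod
    def prepare_inputs_digester(cls, values: list[float]) -> dict[str, Any]:
        return {"values": values}

    @classmethod
    def gen_inputs_frame(cls, inputs_digester: dict[str, Any]):
        # Assumed behaviour: yield one input dictionary per frame.
        for value in inputs_digester["values"]:
            yield {"value": value}

    @classmethod
    def digest_frame(cls, inputs_frame: dict[str, Any]) -> dict[str, Any]:
        # Simplified: the documented method also takes schema_map and cadence_eval.
        return {"energy": inputs_frame["value"] * 2.0}

    @classmethod
    def digest(cls, ensemble=None, digester_args=None, digester_kwargs=None):
        # A plain list stands in for EnsembleSchema in this sketch.
        frames = [] if ensemble is None else ensemble
        cls.checks()
        inputs_digester = cls.prepare_inputs_digester(
            *(digester_args or ()), **(digester_kwargs or {})
        )
        for inputs_frame in cls.gen_inputs_frame(inputs_digester):
            frames.append(cls.digest_frame(inputs_frame))
        return frames


frames = ToyDigester.digest(digester_args=([1.0, 2.5],))
print(frames)
```

Every digested frame stays in the returned container, which is why the memory note above applies.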

`digest_chunks(ensemble_schema=None, digester_args=None, digester_kwargs=None, chunk_size=100, parallelize=False, max_workers=None)` `classmethod`

Same as digest, but instead of returning one complete EnsembleSchema, it yields an EnsembleSchema each time chunk_size frames have been processed.

PARAMETER	DESCRIPTION
`ensemble_schema`	The schema to which frames will be appended. TYPE: `EnsembleSchema | None` DEFAULT: `None`
`digester_args`	Arguments to pass into `prepare_inputs_digester`. TYPE: `tuple[Any, ...] | None` DEFAULT: `None`
`digester_kwargs`	Keyword arguments to pass into `prepare_inputs_digester`. TYPE: `dict[str, Any] | None` DEFAULT: `None`
`chunk_size`	Number of frames to process before yielding an `EnsembleSchema`. TYPE: `int` DEFAULT: `100`
`parallelize`	Execute concurrently. TYPE: `bool` DEFAULT: `False`
`max_workers`	Maximum number of workers for concurrent operation. TYPE: `int | None` DEFAULT: `None`
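
The chunking idea can be sketched with a plain generator. This is an illustration under assumptions: `iter_chunks` is a hypothetical helper, lists of dictionaries stand in for `EnsembleSchema` objects, and yielding a final partial chunk is assumed rather than confirmed by this page.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Any


def iter_chunks(
    frames: Iterable[dict[str, Any]], chunk_size: int = 100
) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most chunk_size frames (hypothetical helper)."""
    iterator = iter(frames)
    while True:
        # Pull the next chunk_size frames without materializing the rest.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Five frames produced lazily, consumed two at a time.
frames = ({"frame": i} for i in range(5))
chunk_lengths = [len(chunk) for chunk in iter_chunks(frames, chunk_size=2)]
print(chunk_lengths)
```

Only one chunk is held in memory at a time, so each chunk can be written out or reduced before the next is produced.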

`digest_frame(inputs_frame, schema_map, cadence_eval='microstate')` `classmethod`

Digest a single frame of input data into an EnsembleSchema.

This method processes a single frame of data by invoking static methods implemented in the child digester class. These static methods with a SchemaUUID are responsible for processing specific parts of the frame input data and returning key-value pairs that correspond to fields in the EnsembleSchema.

PARAMETER	DESCRIPTION
`inputs_frame`	The inputs for the frame digestion process. This dictionary should contain all necessary data for processing a single frame. TYPE: `dict[str, Any]`
`schema_map`	A mapping of UUIDs to field keys from `get_schema_map` TYPE: `dict[str, dict[str, str]]`
`cadence_eval`	Cadence of properties to evaluate and digest. TYPE: `Literal['microstate', 'ensemble']` DEFAULT: `'microstate'`

RETURNS	DESCRIPTION
`dict[str, Any]`	Data parsed or computed for this frame. Keys are field keys and values are the data from this frame.

RAISES	DESCRIPTION
`AttributeError`	If the static method corresponding to a field's UUID is not found in the class.
`Exception`	For any other exceptions that occur during the processing of the frame.

Notes

The method relies on metadata defined within the fields of EnsembleSchema to determine which static method to call for processing each field.
Each field in the EnsembleSchema should have metadata that includes a 'uuid' and optionally a 'cadence'. The 'cadence' should be set to 'microstate' to indicate that the field is processed per microstate.
Static methods in the child class should be decorated with @SchemaUUID to associate them with the corresponding fields in EnsembleSchema.

Example

Suppose inputs_frame contains data for atomic coordinates. The static method decorated with the appropriate UUID is called to process these coordinates, and the resulting values are assigned to the corresponding field in EnsembleSchema.
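
The UUID-based dispatch can be sketched as follows. This is a toy model, not the library's code: `ToyDigester` and its `uuid_map` attribute are hypothetical, and the inner layout of `schema_map` (the `"key"` and `"cadence"` entries) is an assumption made for illustration.

```python
from typing import Any

COORDINATES_UUID = "81c7cec9-beec-4126-b6d8-91bee28951d6"


class ToyDigester:
    # Stand-in for the result of get_uuid_map(): UUID -> static method name.
    uuid_map = {COORDINATES_UUID: "coordinates"}

    @staticmethod
    def coordinates(positions: list[list[float]]) -> list[list[float]]:
        return positions

    @classmethod
    def digest_frame(cls, inputs_frame, schema_map, cadence_eval="microstate"):
        frame_data: dict[str, Any] = {}
        for uuid, field_info in schema_map.items():
            # Skip fields whose cadence differs from the one being evaluated.
            if field_info["cadence"] != cadence_eval:
                continue
            method_name = cls.uuid_map.get(uuid)
            if method_name is None or not hasattr(cls, method_name):
                raise AttributeError(f"no static method registered for UUID {uuid}")
            # Call the matching static method with the frame inputs.
            frame_data[field_info["key"]] = getattr(cls, method_name)(**inputs_frame)
        return frame_data


# Assumed schema_map layout: UUID -> {"key": field key, "cadence": cadence}.
schema_map = {COORDINATES_UUID: {"key": "coordinates", "cadence": "microstate"}}
frame = ToyDigester.digest_frame({"positions": [[0.0, 0.0, 0.0]]}, schema_map)
print(frame)
```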

`gen_inputs_frame(inputs_digester)` `classmethod`

Generate inputs for each frame starting from a specific frame.

PARAMETER	DESCRIPTION
`inputs_digester`	The initial inputs for the digestion process. TYPE: `dict[str, Any]`

YIELDS	DESCRIPTION
`dict[str, Any]`	A generator yielding input dictionaries for each frame.

`get_inputs_frame(inputs_digester)` `abstractmethod` `classmethod`

Build a dictionary of keyword arguments for the current frame specified in inputs_digester. This method is called for every frame.

PARAMETER	DESCRIPTION
`inputs_digester`	A dictionary of inputs for the digestion process. TYPE: `dict[str, Any]`

RETURNS	DESCRIPTION
`dict[str, Any]`	A dictionary of keyword arguments for the current frame.

`get_uuid_map()` `classmethod`

Update the function UUID map by inspecting the class methods decorated with @SchemaUUID.

This method scans through all the methods in the class, identifies those decorated with @SchemaUUID, and constructs a dictionary mapping the UUIDs to the method names. This map is used to dynamically call methods based on their UUIDs during the data digestion process.

By using @SchemaUUID, each method that processes a part of the input data can be easily identified and called based on its UUID. This allows for a flexible and dynamic way to handle various data processing tasks, ensuring that each piece of data is processed by the appropriate method.

RETURNS	DESCRIPTION
`dict[str, str]`	A dictionary mapping UUIDs to method names.

Example

If a method called coordinates is decorated with @SchemaUUID("81c7cec9-beec-4126-b6d8-91bee28951d6"), the returned dictionary will include an entry: {"81c7cec9-beec-4126-b6d8-91bee28951d6": "coordinates"}

Notes

This method only includes methods that are callable, have names that do not start with __, and have the __uuid__ attribute.
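
The scan described above can be sketched with a stand-in decorator. This is a minimal illustration: `schema_uuid` is a hypothetical substitute for `@SchemaUUID` that is assumed to set a `__uuid__` attribute on the function, and `ToyDigester` is not the library class.

```python
from collections.abc import Callable


def schema_uuid(uuid: str) -> Callable:
    """Hypothetical stand-in for @SchemaUUID: tag a function with __uuid__."""

    def decorator(func: Callable) -> Callable:
        func.__uuid__ = uuid
        return func

    return decorator


class ToyDigester:
    @staticmethod
    @schema_uuid("81c7cec9-beec-4126-b6d8-91bee28951d6")
    def coordinates(positions):
        return positions

    @staticmethod
    def helper():
        # Not decorated, so it must not appear in the map.
        return None

    @classmethod
    def get_uuid_map(cls) -> dict[str, str]:
        uuid_map: dict[str, str] = {}
        for name in dir(cls):
            if name.startswith("__"):
                continue  # skip dunder attributes
            attr = getattr(cls, name)
            if callable(attr) and hasattr(attr, "__uuid__"):
                uuid_map[attr.__uuid__] = name
        return uuid_map


uuid_map = ToyDigester.get_uuid_map()
print(uuid_map)
```

Storing method names rather than function objects keeps the map easy to serialize; the method is then looked up on the class by name when it is needed.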

`next_frame(inputs_digester)` `abstractmethod` `classmethod`

Advance the digester inputs to the next frame in the data. This abstract method must be implemented in any child class as each data source may have a different way of advancing to the next frame.

PARAMETER	DESCRIPTION
`inputs_digester`	A dictionary of inputs for the digestion process. TYPE: `dict[str, Any]`

RETURNS	DESCRIPTION
`dict[str, Any]`	A dictionary of inputs for the digestion process.

`prepare_inputs_digester(*args, **kwargs)` `abstractmethod` `classmethod`

Prepare and return the inputs necessary to start the digestion process.

This abstract method must be implemented in any child class. It should return a dictionary of inputs that will be used by get_inputs_frame.

PARAMETER	DESCRIPTION
`*args`	Variable length argument list. TYPE: `Any` DEFAULT: `()`
`**kwargs`	Arbitrary keyword arguments. TYPE: `Collection[Any]` DEFAULT: `{}`

RETURNS	DESCRIPTION
`dict[str, Any]`	A dictionary of inputs for the frame digesting process.
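
How the three abstract methods fit together can be sketched with a toy subclass. Everything here is hypothetical: `DigesterLike` is a local stand-in for the abstract base, `ListDigester` reads frames from an in-memory list, and the `"index"` key is an invented way of tracking the current frame. This page does not state how the real class signals the last frame, so the sketch steps through frames manually.

```python
from abc import ABC, abstractmethod
from typing import Any


class DigesterLike(ABC):
    """Hypothetical stand-in declaring the three documented abstract hooks."""

    @classmethod
    @abstractmethod
    def prepare_inputs_digester(cls, *args: Any, **kwargs: Any) -> dict[str, Any]: ...

    @classmethod
    @abstractmethod
    def get_inputs_frame(cls, inputs_digester: dict[str, Any]) -> dict[str, Any]: ...

    @classmethod
    @abstractmethod
    def next_frame(cls, inputs_digester: dict[str, Any]) -> dict[str, Any]: ...


class ListDigester(DigesterLike):
    """Toy child class whose data source is a list of energies."""

    @classmethod
    def prepare_inputs_digester(cls, energies):
        # Start at the first frame.
        return {"energies": list(energies), "index": 0}

    @classmethod
    def get_inputs_frame(cls, inputs_digester):
        # Build the keyword arguments for the current frame only.
        index = inputs_digester["index"]
        return {"energy": inputs_digester["energies"][index]}

    @classmethod
    def next_frame(cls, inputs_digester):
        # Advance the shared inputs to the next frame.
        inputs_digester["index"] += 1
        return inputs_digester


inputs = ListDigester.prepare_inputs_digester([-1.0, -2.0])
first = ListDigester.get_inputs_frame(inputs)
inputs = ListDigester.next_frame(inputs)
second = ListDigester.get_inputs_frame(inputs)
print(first, second)
```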
