Digester
digesters.digester.Digester(*args, **kwargs)
Bases: ABC
The Digester class is an abstract base class designed to assist with digesting data from various computational chemistry and biology resources. This class provides a framework for processing, parsing, and validating data extracted from simulations, geometry optimizations, and other computational methods.
checks()
classmethod
Perform basic checks to raise warnings or errors before digesting.
This method should be overridden in the child class to include specific checks required for the particular type of data being processed.
digest(ensemble_schema=None, digester_args=None, digester_kwargs=None, parallelize=False, max_workers=None)
classmethod
Given the inputs in digester_args and digester_kwargs, digest all possible atomistic frames and populate an EnsembleSchema.
This method initializes an EnsembleSchema, calls the checks method to perform preliminary checks, prepares the inputs for digestion by calling the prepare_inputs_digester method, and iterates through each step of the digestion process to append frames to the EnsembleSchema.
This method keeps the entire EnsembleSchema in memory. If you have a large amount of data, consider using digest_chunks.
PARAMETER | DESCRIPTION |
---|---|
ensemble_schema | The schema to which frames will be appended. |
digester_args | Arguments to pass into |
digester_kwargs | Keyword arguments to pass into |
parallelize | Execute concurrently. |
max_workers | Maximum number of workers for concurrent operation. |
RETURNS | DESCRIPTION |
---|---|
EnsembleSchema | The updated schema with frames extracted from a digester. |
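The control flow described above can be sketched with a toy class. This is a minimal illustration under stated assumptions, not the library's implementation: ToyDigester, the plain list standing in for EnsembleSchema, and the "frames"/"index" keys are all hypothetical.

```python
from typing import Any, Iterator


class ToyDigester:
    """Hypothetical stand-in for a Digester child class."""

    @classmethod
    def checks(cls) -> None:
        # A real child class would warn or raise here before digesting.
        pass

    @classmethod
    def prepare_inputs_digester(cls, frames: list[dict[str, Any]]) -> dict[str, Any]:
        # Assumed layout: the raw frames plus a cursor into them.
        return {"frames": frames, "index": 0}

    @classmethod
    def gen_inputs_frame(cls, inputs: dict[str, Any]) -> Iterator[dict[str, Any]]:
        # Yield the inputs for one frame at a time, then advance the cursor.
        while inputs["index"] < len(inputs["frames"]):
            yield inputs["frames"][inputs["index"]]
            inputs["index"] += 1

    @classmethod
    def digest(cls, digester_args=None, digester_kwargs=None) -> list[dict[str, Any]]:
        ensemble: list[dict[str, Any]] = []  # stands in for an EnsembleSchema
        cls.checks()
        inputs = cls.prepare_inputs_digester(
            *(digester_args or ()), **(digester_kwargs or {})
        )
        for inputs_frame in cls.gen_inputs_frame(inputs):
            ensemble.append(dict(inputs_frame))  # every frame stays in memory
        return ensemble


frames = [{"energy": -1.0}, {"energy": -1.5}]
result = ToyDigester.digest(digester_args=(frames,))
```

Because every frame is appended to one in-memory container, memory use grows with the number of frames, which is the motivation for digest_chunks.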
digest_chunks(ensemble_schema=None, digester_args=None, digester_kwargs=None, chunk_size=100, parallelize=False, max_workers=None)
classmethod
Same as digest, but instead of returning a whole EnsembleSchema, this method generates a sequence of them, each with the specified chunk_size.
PARAMETER | DESCRIPTION |
---|---|
ensemble_schema | The schema to which frames will be appended. |
digester_args | Arguments to pass into |
digester_kwargs | Keyword arguments to pass into |
chunk_size | Number of frames to process before yielding an |
parallelize | Execute concurrently. |
max_workers | Maximum number of workers for concurrent operation. |
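The chunking idea can be illustrated with a plain generator. This is a sketch only: iter_chunks is a hypothetical helper, lists stand in for EnsembleSchema objects, and yielding a shorter final chunk is an assumption about how leftover frames are handled.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_chunks(frames: Iterable[Any], chunk_size: int = 100) -> Iterator[list[Any]]:
    """Yield lists of at most ``chunk_size`` frames (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(frames)
    while True:
        # Pull the next chunk_size items without materializing the whole input.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunk_lengths = [len(chunk) for chunk in iter_chunks(range(250), chunk_size=100)]
```

Only one chunk is held at a time, so a caller can write each chunk to disk before the next one is produced.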
digest_frame(inputs_frame, schema_map, cadence_eval='molecule')
classmethod
Digest a single frame of input data into a MoleculeSchema (schemas.atomistic.MoleculeSchema).
This method processes a single frame of data by invoking static methods implemented in the child digester class. These static methods, each decorated with SchemaUUID (digesters.ids.SchemaUUID), are responsible for processing specific parts of the frame input data and returning key-value pairs that correspond to fields in the MoleculeSchema.
PARAMETER | DESCRIPTION |
---|---|
inputs_frame | The inputs for the frame digestion process. This dictionary should contain all necessary data for processing a single frame. |
schema_map | A mapping of UUIDs to field keys from |
cadence_eval | Cadence of properties to evaluate and digest. |
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | Data parsed or computed for this frame. Keys are field keys and values are the data from this frame. |
RAISES | DESCRIPTION |
---|---|
AttributeError | If the static method corresponding to a field's UUID is not found in the class. |
Exception | For any other exceptions that occur during the processing of the frame. |
Notes
- The method relies on metadata defined within the fields of MoleculeSchema to determine which static method to call for processing each field.
- Each field in the MoleculeSchema should have metadata that includes a 'uuid' and optionally a 'cadence'. The 'cadence' should be set to 'molecule' to indicate that the field is processed per molecule.
- Static methods in the child class should be decorated with @SchemaUUID to associate them with the corresponding fields in MoleculeSchema.
Example
Suppose inputs_frame contains data for atomic coordinates. The static method decorated with the appropriate UUID is called to process these coordinates, and the resulting values are assigned to the corresponding field in MoleculeSchema.
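The dispatch in this example can be sketched as follows. Everything here is a hypothetical stand-in: schema_uuid imitates the role of @SchemaUUID, ToyDigester is not the library's class, and passing uuid_map explicitly is a simplification of the real signature.

```python
from typing import Any, Callable


def schema_uuid(uuid: str) -> Callable:
    """Hypothetical decorator that tags a function with a ``__uuid__`` attribute."""

    def decorator(func: Callable) -> Callable:
        func.__uuid__ = uuid
        return func

    return decorator


class ToyDigester:
    @staticmethod
    @schema_uuid("81c7cec9-beec-4126-b6d8-91bee28951d6")
    def coordinates(inputs_frame: dict[str, Any]) -> list[list[float]]:
        # Process one part of the frame: here, just return the stored positions.
        return inputs_frame["xyz"]

    @classmethod
    def digest_frame(cls, inputs_frame, schema_map, uuid_map) -> dict[str, Any]:
        data = {}
        for uuid, field_key in schema_map.items():
            # getattr raises AttributeError if no method matches this UUID.
            method = getattr(cls, uuid_map[uuid])
            data[field_key] = method(inputs_frame)
        return data


COORD_UUID = "81c7cec9-beec-4126-b6d8-91bee28951d6"
frame_data = ToyDigester.digest_frame(
    {"xyz": [[0.0, 0.0, 0.0]]},
    schema_map={COORD_UUID: "coordinates"},  # UUID -> schema field key
    uuid_map={COORD_UUID: "coordinates"},  # UUID -> method name
)
```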
gen_inputs_frame(inputs_digester)
classmethod
Generate inputs for each frame starting from a specific frame.
PARAMETER | DESCRIPTION |
---|---|
inputs_digester | The initial inputs for the digestion process. |
YIELDS | DESCRIPTION |
---|---|
dict[str, Any] | A generator yielding input dictionaries for each frame. |
get_inputs_frame(inputs_digester)
abstractmethod
classmethod
Build a dictionary of keyword arguments for the current frame specified in inputs_digester. This method is called for every frame.
PARAMETER | DESCRIPTION |
---|---|
inputs_digester | A dictionary of inputs for the digestion process. |
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | A dictionary of inputs for the digestion process. |
get_uuid_map()
classmethod
Update the function UUID map by inspecting the class methods decorated with @SchemaUUID (digesters.ids.SchemaUUID).
This method scans through all the methods in the class, identifies those decorated with @SchemaUUID, and constructs a dictionary mapping the UUIDs to the method names. This map is used to dynamically call methods based on their UUIDs during the data digestion process.
By using @SchemaUUID, each method that processes a part of the input data can be easily identified and called based on its UUID. This allows for a flexible and dynamic way to handle various data processing tasks, ensuring that each piece of data is processed by the appropriate method.
RETURNS | DESCRIPTION |
---|---|
A dictionary mapping UUIDs to method names. |
Example
If a method called coordinates is decorated with @SchemaUUID("81c7cec9-beec-4126-b6d8-91bee28951d6"), the returned dictionary will include an entry:
{"81c7cec9-beec-4126-b6d8-91bee28951d6": "coordinates"}
Notes
This method only includes methods that
- are callable,
- do not start with __, and
- have the __uuid__ attribute.
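The three filters listed in the notes can be sketched as a class scan. This is an illustration under assumptions rather than the library's code: schema_uuid is a hypothetical decorator standing in for @SchemaUUID, and using dir() to enumerate attributes is one plausible way to perform the scan.

```python
from typing import Callable


def schema_uuid(uuid: str) -> Callable:
    """Hypothetical decorator that stores the UUID on the function object."""

    def decorator(func: Callable) -> Callable:
        func.__uuid__ = uuid
        return func

    return decorator


class ToyDigester:
    @staticmethod
    @schema_uuid("81c7cec9-beec-4126-b6d8-91bee28951d6")
    def coordinates(inputs_frame):
        return inputs_frame["xyz"]

    @staticmethod
    def helper():
        # Not decorated, so it must not appear in the map.
        return None

    @classmethod
    def get_uuid_map(cls) -> dict[str, str]:
        uuid_map = {}
        for name in dir(cls):
            if name.startswith("__"):  # skip dunder attributes
                continue
            attr = getattr(cls, name)
            if callable(attr) and hasattr(attr, "__uuid__"):
                uuid_map[attr.__uuid__] = name
        return uuid_map


uuid_map = ToyDigester.get_uuid_map()
```

Here the undecorated helper is skipped because it lacks a __uuid__ attribute, so only coordinates is mapped.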
next_frame(inputs_digester)
abstractmethod
classmethod
Advance the digester inputs to the next frame in the data. This abstract method must be implemented in any child class as each data source may have a different way of advancing to the next frame.
PARAMETER | DESCRIPTION |
---|---|
inputs_digester | A dictionary of inputs for the digestion process. |
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | A dictionary of inputs for the digestion process. |
prepare_inputs_digester(*args, **kwargs)
abstractmethod
classmethod
Prepare and return the inputs necessary to start the digestion process.
This abstract method must be implemented in any child class. It should return a dictionary of inputs that will be used by get_inputs_frame.
PARAMETER | DESCRIPTION |
---|---|
*args | Variable length argument list. |
**kwargs | Arbitrary keyword arguments. |
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | A dictionary of inputs for the frame digesting process. |
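To show how the three abstract hooks (prepare_inputs_digester, get_inputs_frame, next_frame) might fit together in a child class, here is a minimal sketch. FrameSource and ListSource are hypothetical names, the base class only mirrors the method names documented here, and the "energies"/"index" layout of the inputs dictionary is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any


class FrameSource(ABC):
    """Hypothetical base class mirroring the three abstract hooks."""

    @classmethod
    @abstractmethod
    def prepare_inputs_digester(cls, *args: Any, **kwargs: Any) -> dict[str, Any]: ...

    @classmethod
    @abstractmethod
    def get_inputs_frame(cls, inputs_digester: dict[str, Any]) -> dict[str, Any]: ...

    @classmethod
    @abstractmethod
    def next_frame(cls, inputs_digester: dict[str, Any]) -> dict[str, Any]: ...


class ListSource(FrameSource):
    """Toy child class whose data source is an in-memory list of energies."""

    @classmethod
    def prepare_inputs_digester(cls, energies):
        # Store the data together with a cursor pointing at the current frame.
        return {"energies": list(energies), "index": 0}

    @classmethod
    def get_inputs_frame(cls, inputs_digester):
        # Build the keyword arguments for the frame the cursor points at.
        i = inputs_digester["index"]
        return {"energy": inputs_digester["energies"][i]}

    @classmethod
    def next_frame(cls, inputs_digester):
        # Advancing is source-specific; for a list it is just index + 1.
        inputs_digester["index"] += 1
        return inputs_digester


inputs = ListSource.prepare_inputs_digester([-1.0, -1.5])
first = ListSource.get_inputs_frame(inputs)
inputs = ListSource.next_frame(inputs)
second = ListSource.get_inputs_frame(inputs)
```

A file-backed source would typically keep an open handle or byte offset in the inputs dictionary instead of an index, which is why advancing is left to each child class.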