Schema Design

Each ensemble is represented by an Ensemble object, which comprises the following sections:

  • identification: metadata and fingerprinting;
  • system: atomic composition, charges, coordinates;
  • topology: atom types, bonding, and groupings;
  • energy: electronic, kinetic, and classical energies;
  • qc: quantum chemistry parameters and results;
  • time: time step and sampling intervals.

These fields are built from modular Pydantic models such as Microstate, EnergySchema, QCSchema, and TimeSchema, all of which follow the same UUID and cadence conventions. Together, they form a structured, extensible data model for representing atomic systems at scale.
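As a rough illustration of how the six sections listed above fit together, here is a minimal sketch. It uses standard-library dataclasses as a stand-in for the Pydantic models, and the names EnsembleSketch, sections, and the "atomic_numbers" key are hypothetical; none of them come from the actual package.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class EnsembleSketch:
    """Hypothetical stand-in for the Ensemble container (not the real class)."""

    ens_id: str
    # One attribute per section listed above. Plain dicts stand in for the
    # Pydantic sub-models, whose actual fields are not shown in this section.
    identification: dict[str, Any] = field(default_factory=dict)
    system: dict[str, Any] = field(default_factory=dict)
    topology: dict[str, Any] = field(default_factory=dict)
    energy: dict[str, Any] = field(default_factory=dict)
    qc: dict[str, Any] = field(default_factory=dict)
    time: dict[str, Any] = field(default_factory=dict)

    def sections(self) -> list[str]:
        """Return the section names in declaration order."""
        return [f.name for f in fields(self) if f.name != "ens_id"]


ens = EnsembleSketch(ens_id="water_dimer")
# "atomic_numbers" is an illustrative key, not a documented field name.
ens.system["atomic_numbers"] = [8, 1, 1, 8, 1, 1]
print(ens.sections())
```

Each section gets its own default_factory so that two ensembles never share the same underlying dictionary.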

Ensemble(ens_id, parent)

Bases: Container

The Ensemble class represents a collection of molecular structures, each referred to as a microstate. It manages and validates this collection, which makes it easier to handle multiple molecular configurations such as those produced during atomistic calculations.

Only data that could reasonably change shape or dimensions between ensembles (due to different numbering or ordering of atoms) should be stored here. All other data should be stored in a Project (schemas.Project).

cadence = Cadence.MICROSTATE

coordinates = Data[adt.Float64](store_kind=StoreKind.ARRAY, uuid='81c7cec9-beec-4126-b6d8-91bee28951d6', description='Atomic coordinates')

Coordinates are the three-dimensional positions of particles, expressed as Cartesian coordinates (\(x\), \(y\), \(z\)).
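To make the shape of this data concrete, the sketch below assumes coordinates are laid out as one (x, y, z) triple per atom per microstate. That layout, the Coordinates alias, and the validate_coordinates helper are assumptions for illustration; the real storage goes through Data[adt.Float64] with StoreKind.ARRAY, whose exact array shape is not stated here.

```python
# Hypothetical layout: coords[microstate][atom] = (x, y, z).
Coordinates = list[list[tuple[float, float, float]]]


def validate_coordinates(coords: Coordinates) -> tuple[int, int]:
    """Check that every microstate has the same atom count and 3 components per atom.

    Returns (n_microstates, n_atoms).
    """
    if not coords:
        return (0, 0)
    n_atoms = len(coords[0])
    for i, micro in enumerate(coords):
        if len(micro) != n_atoms:
            raise ValueError(
                f"microstate {i} has {len(micro)} atoms, expected {n_atoms}"
            )
        for j, xyz in enumerate(micro):
            if len(xyz) != 3:
                raise ValueError(
                    f"atom {j} in microstate {i} does not have 3 components"
                )
    return (len(coords), n_atoms)


# Two microstates of a three-atom system (values are illustrative only).
coords = [
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    [(0.0, 0.0, 0.1), (0.97, 0.0, 0.1), (-0.25, 0.92, 0.1)],
]
shape = validate_coordinates(coords)
print(shape)
```

A check like this reflects the note above: the atom count is fixed within an ensemble, which is why data that depends on atom numbering or ordering is stored per ensemble.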

label = ens_id

topology = Topology(self)

__repr__()

n_micro(view=None, run_id=None)

Total number of microstates.
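One plausible reading of the run_id argument is that it restricts the count to microstates belonging to a single run. The sketch below illustrates that reading only: n_micro_sketch and the per-microstate run_ids labels are hypothetical, and the view argument is left out because its semantics are not described in this section.

```python
from typing import Optional, Sequence


def n_micro_sketch(run_ids: Sequence[str], run_id: Optional[str] = None) -> int:
    """Count microstates, optionally restricted to one run.

    `run_ids` is a hypothetical sequence holding one run label per microstate;
    the real method reads from the ensemble's own storage instead.
    """
    if run_id is None:
        return len(run_ids)
    return sum(1 for label in run_ids if label == run_id)


labels = ["run_a", "run_a", "run_b"]
total = n_micro_sketch(labels)
only_a = n_micro_sketch(labels, run_id="run_a")
print(total, only_a)
```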