str

typing._array._str.OptionalStr = Str | None module-attribute

Optional string array type.

Extends Str to include None, representing string arrays that may be missing or not yet initialized.

Type Definition

npt.NDArray[np.str_] | None

Example

Optional element symbols::

symbols: Data[OptionalStr] = Data[OptionalStr](
    dtype=np.str_,
    meta=Metadata(description="Element symbols"),
    default=None
)

# Conditional symbol processing
symbol_data = molecule.symbols.value
if symbol_data is not None:
    hydrogen_mask = symbol_data == 'H'
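
The None check above can also be wrapped in a small helper. The following is a minimal sketch, not part of the documented API: `count_symbol` is a hypothetical function, and `Str` and `OptionalStr` are redefined locally as stand-ins for the aliases on this page so that the snippet runs on its own.

```python
from typing import Optional

import numpy as np
import numpy.typing as npt

# Local stand-ins for the aliases documented on this page (assumption:
# the real ones are imported from the package instead).
Str = npt.NDArray[np.str_]
OptionalStr = Optional[Str]


def count_symbol(symbols: OptionalStr, target: str) -> int:
    """Count occurrences of `target`, treating a missing array as empty.

    Hypothetical helper for illustration only.
    """
    if symbols is None:
        return 0
    # Elementwise comparison yields a boolean mask; count its True entries.
    return int(np.count_nonzero(symbols == target))


water = np.array(["H", "H", "O"], dtype=np.str_)
n_hydrogen = count_symbol(water, "H")  # 2
n_missing = count_symbol(None, "H")    # 0
```

Returning 0 for a missing array is a design choice made for this sketch; callers that need to distinguish "no data" from "no matches" should keep the explicit `is not None` check.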
See Also

Str: Non-optional version

typing._array._str.Str = npt.NDArray[np.str_] module-attribute

NumPy string array type for text data.

Represents NumPy arrays containing string values. These arrays store fixed-length Unicode strings and are useful for categorical text data such as element symbols, residue names, or other textual identifiers in scientific datasets.

Characteristics
  • Fixed-length Unicode strings
  • Memory efficient for short strings of similar length
  • NumPy array operations supported
  • Vectorized string operations available
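As an illustration of the fixed-length storage listed above, the sketch below inspects the dtype NumPy infers for a small symbol array. The variable names are illustrative only.

```python
import numpy as np

symbols = np.array(["H", "He", "Li"], dtype=np.str_)

# The element width is set by the longest string in the input ("He", "Li").
# NumPy stores each Unicode character in 4 bytes.
bytes_per_char = np.dtype("U1").itemsize              # 4
chars_per_element = symbols.itemsize // bytes_per_char  # 2
total_bytes = symbols.nbytes                          # 3 elements * 8 bytes = 24

# Padding is invisible when reading values back: "H" still has length 1.
first_len = len(symbols[0])                           # 1
```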
Typical Use Cases
  • Chemical element symbols ('H', 'He', 'Li', etc.)
  • Residue names in proteins ('ALA', 'GLY', 'SER', etc.)
  • Atom types in force fields ('CT', 'HC', 'OH', etc.)
  • Categorical labels and identifiers
  • File names or path components
Example

Element symbols::

atom_symbols: Data[Str] = Data[Str](
    dtype=np.str_,
    meta=Metadata(
        description="Chemical element symbols",
        store=StoreKind.ARRAY
    )
)

Creating symbol data::

# Water molecule symbols
symbols = np.array(['H', 'H', 'O'], dtype=np.str_)
molecule.atom_symbols = symbols

Protein residues::

# Protein sequence
residues = np.array(['MET', 'ALA', 'GLY', 'SER'], dtype=np.str_)
protein.residue_names = residues
String Operations
  • Vectorized comparison: symbols == 'H'
  • String methods: np.char.upper(symbols)
  • Substring search: np.char.find(symbols, 'C'), np.char.startswith(symbols, 'H')
  • Concatenation: np.char.add(symbols, '_1')
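The operations listed above can be sketched on a small array of atom types. This uses only standard `numpy.char` functions; the array contents and the `_1` suffix are made up for illustration.

```python
import numpy as np

atom_types = np.array(["CT", "HC", "OH", "hc"], dtype=np.str_)

# Vectorized comparison is case-sensitive, so "hc" does not match.
is_hc = atom_types == "HC"                      # [False, True, False, False]

# Elementwise string method.
upper = np.char.upper(atom_types)               # ['CT', 'HC', 'OH', 'HC']

# Searching: prefix test and index of the first match (-1 if absent).
starts_with_h = np.char.startswith(upper, "H")  # [False, True, False, True]
h_position = np.char.find(upper, "H")           # [-1, 0, 1, 0]

# Concatenation; the result dtype widens to hold the longer strings.
labeled = np.char.add(upper, "_1")              # ['CT_1', 'HC_1', 'OH_1', 'HC_1']
```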
Memory Considerations
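  • All strings in an array share the same maximum length, fixed by the dtype (for example, U3)
  • Shorter strings are padded with null characters; each character occupies 4 bytes
  • Assigning a string longer than the maximum length silently truncates it
  • Consider Python object arrays for variable-length strings
  • More memory efficient than object arrays for short strings of similar length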
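The fixed width has a practical consequence worth seeing once: writing a longer string into an existing array truncates it without warning. The sketch below shows this, along with two possible workarounds (widening the dtype, or switching to an object array). The residue names are illustrative.

```python
import numpy as np

names = np.array(["ALA", "GLY"], dtype=np.str_)  # width inferred as 3 characters

# Assignment beyond the fixed width is silently truncated.
names[0] = "ALANINE"
truncated = str(names[0])          # "ALA"

# Workaround 1: widen the dtype before assigning.
wide = names.astype("U7")
wide[0] = "ALANINE"
widened = str(wide[0])             # "ALANINE"

# Workaround 2: an object array holds Python strings of any length,
# at the cost of per-element Python object overhead.
flexible = names.astype(object)
flexible[0] = "ALANINE"
unbounded = flexible[0]            # "ALANINE"
```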
Performance Notes
  • Fast for fixed-length string operations
  • NumPy vectorization benefits
  • Suitable for categorical data analysis
  • Less flexible than Python string objects
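For the categorical use mentioned above, one common approach is `np.unique`, which returns the distinct labels together with integer codes and counts. This is a sketch of that approach, not a method prescribed by this module; the residue sequence is made up.

```python
import numpy as np

residues = np.array(
    ["MET", "ALA", "GLY", "SER", "ALA", "GLY", "ALA"], dtype=np.str_
)

# categories: sorted distinct labels
# codes:      index into categories for every element of residues
# counts:     number of occurrences of each category
categories, codes, counts = np.unique(
    residues, return_inverse=True, return_counts=True
)

# Pair each label with its count for reporting.
frequency = dict(zip(categories.tolist(), counts.tolist()))

# The integer codes reconstruct the original array exactly.
roundtrip = categories[codes]
```

Working with the integer `codes` instead of the strings themselves is usually cheaper for repeated comparisons or grouping.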
See Also

OptionalStr: Optional version including None

numpy.str_: NumPy documentation for string arrays

numpy.char: NumPy string operation functions