Slurm
schemas.workflow.slurm.SlurmSchema
¶
Bases: BaseModel
, YamlIO
, Render
Context manager for Slurm job submission scripts.
This class provides a structured way to define and manage the configuration for submitting jobs to a Slurm workload manager. Each attribute corresponds to a specific Slurm configuration parameter or job setup step.
account: str | None = None
class-attribute
instance-attribute
¶
Charge resources used by this job to specified account. The account is an arbitrary string.
Set this to your project's account name to properly attribute resource usage, or
leave it as None
if not needed.
Example
"research_project_123"
cluster: str = 'smp'
class-attribute
instance-attribute
¶
Cluster name where the job will run.
Ensure this matches the available cluster names in your Slurm environment. This helps direct the job to the appropriate set of resources.
Example
"hpc_cluster"
commands_post: list[str] = []
class-attribute
instance-attribute
¶
List of commands to run after the main job command.
Use this for cleanup tasks or additional processing. These commands will be executed after the main job tasks are completed.
Example
["cp /scratch/my_job/output.dat .", "rm -rf /scratch/my_job"]
commands_pre: list[str] = []
class-attribute
instance-attribute
¶
List of commands to run before the main job command.
Useful for setup tasks like copying files, creating directories, or loading additional software. These commands will be executed before the main job starts.
Example
["mkdir -p /scratch/my_job", "cp input.dat /scratch/my_job/"]
commands_run: list[str] = []
class-attribute
instance-attribute
¶
List of main commands to run for the job.
This should include the primary executable or script for the job. These are the main tasks that the job will perform.
Example
["python my_script.py", "./run_simulation.sh"]
constraint: str | None = None
class-attribute
instance-attribute
¶
Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option.
cores_per_socket: int | None = None
class-attribute
instance-attribute
¶
Restrict node selection to nodes with at least the specified number of cores per socket.
cpus_per_gpu: int | None = None
class-attribute
instance-attribute
¶
Request that ncpus processors be allocated per allocated GPU. Steps inheriting
this value will imply --exact
. Not compatible with the
--cpus-per-task
option.
cpus_per_task: int | None = None
class-attribute
instance-attribute
¶
Advise the Slurm controller that ensuing job steps will require ncpus
number
of processors per task. Without this option, the controller will just try to
allocate one processor per task.
env_vars: dict[str, str] = {}
class-attribute
instance-attribute
¶
Dictionary of environment variables to set before running the job.
Use this to configure the job's environment, setting any necessary environment variables.
Example
{"OMP_NUM_THREADS": "16", "MY_VARIABLE": "value"}
error: str = 'slurm-%j.err'
class-attribute
instance-attribute
¶
Path for the job's error output file.
Similar to output_path
, using %j
in the filename ensures that errors for each
job are logged separately.
Example
"logs/job_errors_%j.err"
gpus: int | str | None = None
class-attribute
instance-attribute
¶
Specify the total number of GPUs required for the job.
gpus_per_node: int | str | None = None
class-attribute
instance-attribute
¶
Specify the number of GPUs required for the job on each node included in the job's resource allocation.
gpus_per_socket: int | str | None = None
class-attribute
instance-attribute
¶
Specify the number of GPUs required for the job on each socket included in the job's resource allocation.
gpus_per_task: int | str | None = None
class-attribute
instance-attribute
¶
Specify the number of GPUs required for the job on each task to be spawned in the job's resource allocation.
gres: str | None = None
class-attribute
instance-attribute
¶
Specifies a comma-delimited list of generic consumable resources. The format
for each entry in the list is "name[[:type]:count]"
. The name is the type
of consumable resource (e.g. gpu). The type is an optional classification for
the resource (e.g. a100). The count is the number of those resources with a
default value of 1.
job_name: str = 'job'
class-attribute
instance-attribute
¶
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just "sbatch" if the script is read on sbatch's standard input.
Example
"data_analysis_job"
mem: str | None = None
class-attribute
instance-attribute
¶
Specify the real memory required per node. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T].
mem_per_cpu: str | None = None
class-attribute
instance-attribute
¶
Minimum memory required per usable allocated CPU. Default units are megabytes.
The default value is DefMemPerCPU
and the maximum value is MaxMemPerCPU
.
modules: list[str] = []
class-attribute
instance-attribute
¶
List of modules to load before running the job.
Include all necessary software modules that your job requires. This ensures the environment is correctly set up before execution.
Example
["python/3.8", "gcc/9.2"]
nodes: int = 1
class-attribute
instance-attribute
¶
The minimum number of nodes to use for the Slurm job.
Adjust this based on the job's resource requirements. For instance, a large parallel job might need several nodes, while a smaller job might only need one.
Example
4
ntasks: int | None = None
class-attribute
instance-attribute
¶
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources.
Example
16
ntasks_per_node: int | None = None
class-attribute
instance-attribute
¶
Request that ntasks
be invoked on each node.
This typically corresponds to the number of CPU cores to use on each node. Adjust this based on the node's capabilities and the parallelism of your job.
Example
16
output: str = 'slurm-%j.out'
class-attribute
instance-attribute
¶
Path for the job's standard output file.
Use %j
to include the job ID in the filename, ensuring that each job's output is
saved to a unique file.
Example
"logs/job_output_%j.out"
partition: str = 'smp'
class-attribute
instance-attribute
¶
Partition name to submit the job to.
Choose an appropriate partition based on resource needs and availability. Partitions can have different resource limits and policies.
Example
"short"
time: str = '1-00:00:00'
class-attribute
instance-attribute
¶
Maximum time for the job.
Specified in the format D-HH:MM:SS
. Adjust this based on the expected runtime of
your job.
Example
"0-12:00:00"
for a 12-hour job.