
Using Slurm and OpenMPI

Slurm is a powerful workload manager and job scheduler for high-performance computing (HPC) clusters.

  • Batch Job (sbatch): Submit a script for later execution.
  • Interactive Job (srun): Run a job interactively in real-time.
# Submit a batch script
sbatch myscript.sh
# Start an interactive bash session
srun --pty /bin/bash
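
If the interactive session needs specific resources, the same flags described in the table below can be passed straight to srun (a minimal sketch; the debug partition name is an assumption):

# Request 4 CPUs and 4 GB of memory for an interactive shell
# ("debug" is a placeholder for a partition available on your cluster)
srun -p debug -c 4 --mem=4G --pty /bin/bash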

You can specify resources using command-line flags or #SBATCH directives in your script.

Resource   | Flag                 | Example                       | Description
CPU        | -c / --cpus-per-task | sbatch -c 4 script.sh         | Request 4 CPUs per task.
Memory     | --mem                | sbatch --mem=8G script.sh     | Request 8 GB of memory per node.
GPU        | --gres               | sbatch --gres=gpu:2 script.sh | Request 2 GPUs.
Partition  | -p / --partition     | sbatch -p debug script.sh     | Submit to the debug partition (queue).
Tasks      | -n / --ntasks        | sbatch -n 10 script.sh        | Launch 10 tasks (processes).
Tasks/Node | --ntasks-per-node    | sbatch --ntasks-per-node=5    | Launch 5 tasks per node.
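
The same requests can also be written as #SBATCH directives at the top of the batch script, so sbatch picks them up without extra command-line flags (a minimal sketch; the partition, output file, and program names are placeholders):

#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --partition=debug
#SBATCH --output=result_%j.out   # %j expands to the job ID

# my_program is a placeholder for your actual executable
./my_program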

OpenMPI is an open-source Message Passing Interface (MPI) implementation.

To run an MPI program manually (outside of Slurm), you often need a hostfile to specify where to run processes.

Example hostfile:

node01 slots=4
node02 slots=4

Run Command:

mpirun -n 8 -hostfile hostfile ./your_app
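
If the executable has not been built yet, OpenMPI's mpicc wrapper adds the MPI include and library paths for you (a sketch; your_app.c is a placeholder source file):

# Compile the MPI program with the OpenMPI wrapper compiler
mpicc -o your_app your_app.c
# Launch 8 processes across the hosts listed in the hostfile
mpirun -n 8 -hostfile hostfile ./your_app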

When using Slurm, you typically do not need to manually specify nodes or hostfiles. Slurm integrates with MPI to handle process distribution automatically.

Create a script named job.sh:

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --output=mpi_job_%j.out
# Load MPI module if needed
# module load openmpi
# Run the MPI program
# srun is preferred over mpirun within Slurm
srun ./my_mpi_program
Submit the job:
sbatch job.sh

Slurm will allocate 2 nodes with 16 tasks each (32 tasks total) and launch my_mpi_program on them.
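
After submission you can follow the job with standard Slurm commands (a quick sketch; replace <jobid> with the ID that sbatch prints):

squeue -u $USER             # list your queued and running jobs
sacct -j <jobid>            # accounting and state information for the job
scancel <jobid>             # cancel the job if needed
cat mpi_job_<jobid>.out     # output file named by the --output directive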