This procedure explains how to build and run a C program in an OpenMP + Intel MPI environment on an Ubuntu virtual machine running on OpenStack.
Here, we assume a 16-core environment and demonstrate execution with 4 MPI processes × 4 OpenMP threads per process (4 × 4 = 16-way parallelism in total).
1. Preparing the Intel MPI environment
We use the compilers and the MPI runtime provided by Intel oneAPI.
To use Intel MPI and the Intel compilers on Ubuntu, installing Intel oneAPI is the recommended approach.
# Install required tools (for GPG key management and wget)
sudo apt update
sudo apt install -y gpg-agent wget
# Retrieve Intel’s GPG key and save it to APT’s trusted keyring
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
# Add Intel oneAPI repository to APT sources
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
# Update package list
sudo apt update
# Install Intel oneAPI HPC Toolkit
sudo apt install intel-oneapi-hpc-toolkit
# Set up the environment (must be run in each new shell session; add this line to ~/.bashrc to have it run automatically)
source /opt/intel/oneapi/setvars.sh
For detailed instructions and the latest information, see the official Intel website.
2. Create a test source code
# Create source code
vi hybrid_test.c
Paste the following code and save (press Esc, then type :wq, and press Enter to save and exit).
#define _GNU_SOURCE
#include <stdio.h>
#include <mpi.h>
#include <omp.h>
#include <sched.h>
int main(int argc, char** argv) {
    // Request MPI_THREAD_FUNNELED: only the main thread will make MPI calls
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int mpi_rank, mpi_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    // Each OpenMP thread reports its ID and the CPU core it is currently running on
    #pragma omp parallel
    {
        int thread_id   = omp_get_thread_num();
        int num_threads = omp_get_num_threads();
        int cpu_id      = sched_getcpu();
        printf("MPI Rank %d/%d: OpenMP Thread %d/%d running on CPU %d\n",
               mpi_rank, mpi_size, thread_id, num_threads, cpu_id);
    }

    MPI_Finalize();
    return 0;
}
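As an aside, hybrid_test requests MPI_THREAD_FUNNELED but never checks the level the library actually granted in provided. In a real application it is worth verifying this right after initialization. The following is a minimal, stand-alone sketch of that check (the file name init_check.c is an arbitrary choice, not part of the procedure above):
// init_check.c (arbitrary name): verify the granted MPI threading level
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    // Abort if the library cannot guarantee even funneled threading
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "MPI_THREAD_FUNNELED not supported (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    printf("Granted threading level: %d\n", provided);
    MPI_Finalize();
    return 0;
}
It is compiled and run in exactly the same way as hybrid_test below.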
3. Compilation (using the Intel compiler)
# Use Intel MPI compiler wrapper (icx/mpiicx)
mpiicx -qopenmp -o hybrid_test hybrid_test.c
・mpiicx: Intel MPI compiler wrapper for the LLVM-based Intel C compiler (icx)
・-qopenmp: option to enable OpenMP
4. Runtime environment settings (specify OpenMP thread count)
# Specify the number of OpenMP threads per MPI process
export OMP_NUM_THREADS=4
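If you want to confirm from inside a program which thread count this setting actually produces, the following minimal sketch (the file name check_threads.c is arbitrary) prints omp_get_max_threads() on each rank; compile and run it the same way as hybrid_test.
// check_threads.c (arbitrary name): confirm the thread count set via OMP_NUM_THREADS
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // omp_get_max_threads() returns the number of threads the next
    // parallel region will use; with OMP_NUM_THREADS=4 it should print 4
    printf("Rank %d: omp_get_max_threads() = %d\n", rank, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}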
5. Execution (Intel MPI × OpenMP hybrid)
# Run with 4 processes × 4 threads each (Intel MPI pins processes to cores automatically by default)
mpirun -n 4 ./hybrid_test
・-n 4: number of MPI processes (in Intel MPI, -np works the same)
※ On a virtual machine, performance can fluctuate depending on how the underlying physical CPUs are utilized. Please keep this in mind when running benchmarks.
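When benchmarking, it is better to time the section of interest explicitly than to rely on overall wall-clock impressions. Below is a minimal, hypothetical sketch (the loop and problem size are placeholders, not taken from the procedure above): each rank times an OpenMP-parallel loop with MPI_Wtime(), and rank 0 prints the slowest rank's elapsed time. Repeating the measurement several times helps average out the fluctuation mentioned above.
// bench_sketch.c (hypothetical): time an OpenMP-parallel loop under MPI
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    const long n = 100000000L;      // placeholder problem size
    double sum = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);    // start the measurement on all ranks together
    double t0 = MPI_Wtime();

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += (double)i * 0.5;     // placeholder work
    }

    double elapsed = MPI_Wtime() - t0;

    // The slowest rank determines the overall runtime, so report its time
    double max_elapsed;
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("slowest rank took %.3f s (sum=%g)\n", max_elapsed, sum);
    }

    MPI_Finalize();
    return 0;
}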
【Note】Process placement and control in Intel MPI
Intel MPI allows detailed control of core allocation via environment variables such as I_MPI_PIN and I_MPI_PIN_DOMAIN.
export I_MPI_PIN=1           # Enable process pinning (this is the default)
export I_MPI_PIN_DOMAIN=omp  # Give each rank a core domain sized to OMP_NUM_THREADS (one core per OpenMP thread)
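To verify the effect of these pinning settings directly, each OpenMP thread can print the full set of CPUs it is allowed to run on (not just the core it happens to be on at that moment) using sched_getaffinity. A minimal sketch assuming Linux; the file name affinity_check.c and the 64-CPU display limit are arbitrary choices:
// affinity_check.c (arbitrary name): print each thread's allowed CPU set
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <omp.h>
#include <sched.h>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        // pid 0 = the calling thread; fills 'mask' with the CPUs it may run on
        sched_getaffinity(0, sizeof(mask), &mask);

        // List only the first 64 logical CPUs (more than enough for this 16-core example)
        char cpus[256] = "";
        for (int c = 0; c < CPU_SETSIZE && c < 64; c++) {
            if (CPU_ISSET(c, &mask)) {
                char buf[8];
                snprintf(buf, sizeof(buf), "%d ", c);
                strcat(cpus, buf);
            }
        }
        printf("Rank %d, thread %d: allowed CPUs = %s\n",
               rank, omp_get_thread_num(), cpus);
    }

    MPI_Finalize();
    return 0;
}
With I_MPI_PIN_DOMAIN=omp and OMP_NUM_THREADS=4, the threads of one rank should all report the same 4-CPU domain, and different ranks should report non-overlapping domains (unless the OpenMP runtime applies further per-thread binding, e.g. via OMP_PROC_BIND).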
Example execution result of hybrid_test (with 16 virtual CPU cores)
MPI Rank 0/4: OpenMP Thread 0/4 running on CPU 0
MPI Rank 0/4: OpenMP Thread 1/4 running on CPU 2
MPI Rank 0/4: OpenMP Thread 2/4 running on CPU 1
MPI Rank 0/4: OpenMP Thread 3/4 running on CPU 3
MPI Rank 3/4: OpenMP Thread 0/4 running on CPU 12
MPI Rank 3/4: OpenMP Thread 1/4 running on CPU 13
MPI Rank 2/4: OpenMP Thread 0/4 running on CPU 8
MPI Rank 3/4: OpenMP Thread 2/4 running on CPU 15
MPI Rank 3/4: OpenMP Thread 3/4 running on CPU 14
MPI Rank 2/4: OpenMP Thread 3/4 running on CPU 10
MPI Rank 1/4: OpenMP Thread 0/4 running on CPU 6
MPI Rank 1/4: OpenMP Thread 1/4 running on CPU 4
MPI Rank 1/4: OpenMP Thread 2/4 running on CPU 7
MPI Rank 1/4: OpenMP Thread 3/4 running on CPU 5
MPI Rank 2/4: OpenMP Thread 2/4 running on CPU 11
MPI Rank 2/4: OpenMP Thread 1/4 running on CPU 9
With this, the basic operation check of an Intel MPI × OpenMP hybrid parallel program is complete.
For performance-critical use cases, consider combining the Intel compilers and the Intel MPI runtime with the pinning controls described above.