Procedure for Building an OpenMP × OpenMPI Execution Environment

This procedure explains how to build and run C programs in an OpenMP + OpenMPI environment on an Ubuntu virtual machine running on OpenStack.
Here, we assume a 16-core environment and use the example of 4 MPI processes × 4 OpenMP threads per process (total 16-way parallelism).

1. Install required packages (Compiler, MPI, Visualization tools)

# Update package information
sudo apt update
# Install essential development tools
sudo apt install -y build-essential gfortran openmpi-bin libopenmpi-dev hwloc

・build-essential: Build tools for C/C++ (gcc, make, etc.)
・gfortran: Fortran compiler (not used here, but often required)
・openmpi-bin, libopenmpi-dev: MPI runtime environment and libraries
・hwloc: Tool for checking CPU core topology (lstopo)
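
To confirm the toolchain is installed and to see how many cores the VM exposes, the following quick checks can be used (lstopo falls back to a text view of the topology when no graphical display is available; output details depend on the hwloc version):

# Check compiler / MPI versions and the visible CPU cores
mpicc --version
mpirun --version
nproc
lstopo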

2. Create test source code

# Create and edit the source code (using vi editor here)
vi hybrid_test.c

Paste the following code and save (press Esc, type :wq, then Enter to save and exit):

#define _GNU_SOURCE
#include <stdio.h>
#include <mpi.h>
#include <omp.h>
#include <sched.h>

int main(int argc, char** argv) {
    int provided;
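    /* Request FUNNELED threading support: only the main thread of each
       process makes MPI calls; "provided" returns the level actually granted. */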
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int mpi_rank, mpi_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    #pragma omp parallel
    {
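        /* Each MPI process runs this region with OMP_NUM_THREADS threads. */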
        int thread_id = omp_get_thread_num();
        int num_threads = omp_get_num_threads();
        int cpu_id = sched_getcpu();

        printf("MPI Rank %d/%d: OpenMP Thread %d/%d running on CPU %dn",
               mpi_rank, mpi_size, thread_id, num_threads, cpu_id);
    }

    MPI_Finalize();
    return 0;
}

3. Compile (supporting both OpenMP and MPI)

# Build an executable supporting both OpenMP and MPI
mpicc -fopenmp -o hybrid_test hybrid_test.c

・mpicc: MPI C compiler wrapper included with OpenMPI (based on gcc)
・-fopenmp: Enable OpenMP support
・-o hybrid_test: Specify the output file name
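
As an optional check, you can confirm that both the OpenMP and MPI runtimes were linked into the binary; the exact library names (e.g. libgomp, libmpi) depend on the GCC and OpenMPI versions:

# Optional: confirm that the OpenMP and MPI shared libraries are linked
ldd ./hybrid_test | grep -E 'gomp|mpi'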

4. Configure the execution environment (control OpenMP behavior)

# Set the number of OpenMP threads per MPI process
export OMP_NUM_THREADS=4
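
Optionally, thread placement can also be controlled with the standard OpenMP environment variables below (supported by GCC's libgomp); whether they improve performance depends on how the VM exposes its cores:

# Optional: bind each process's threads to nearby cores
export OMP_PROC_BIND=close
export OMP_PLACES=cores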

5. Run (Hybrid MPI × OpenMP execution)

# Run with 4 MPI processes × 4 OpenMP threads each
mpirun -np 4 --map-by slot:PE=4 ./hybrid_test

・-np 4: Launch 4 MPI processes
・--map-by slot:PE=4: Assign 4 CPU cores (processing elements, PE) to each process
Total execution = 4 processes × 4 threads = 16-way parallelism
Note: On a virtual machine, performance can vary from run to run depending on how heavily the underlying physical CPUs are being used, so keep this in mind when benchmarking.
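
To see exactly which cores OpenMPI assigned to each process, the --report-bindings option of mpirun can be added (the format of the binding report differs between OpenMPI versions):

# Optional: print each process's core binding at launch
mpirun -np 4 --map-by slot:PE=4 --report-bindings ./hybrid_test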

[Note] Using Intel MPI

With Intel MPI, the runtime recognizes the OMP_NUM_THREADS setting and applies a suitable core allocation by default, so options such as --map-by are usually unnecessary.
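
As a rough sketch (assuming Intel MPI / oneAPI; variable names and defaults can differ between versions), the equivalent hybrid run would look like this:

# Intel MPI: one pinning domain of OMP_NUM_THREADS cores per process
export OMP_NUM_THREADS=4
export I_MPI_PIN_DOMAIN=omp
mpirun -np 4 ./hybrid_test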

Binaries built with the Intel compiler are often 10–20% faster than GCC builds, although the actual gain depends on the workload. For performance-critical applications, using an Intel environment is worth considering.

For details on building an OpenMP × Intel MPI execution environment, please refer to this page.

Example output (on a VM with 16 virtual CPUs)

MPI Rank 1/4: OpenMP Thread 0/4 running on CPU 4
MPI Rank 1/4: OpenMP Thread 3/4 running on CPU 7
MPI Rank 1/4: OpenMP Thread 2/4 running on CPU 5
MPI Rank 1/4: OpenMP Thread 1/4 running on CPU 6
MPI Rank 2/4: OpenMP Thread 0/4 running on CPU 8
MPI Rank 2/4: OpenMP Thread 3/4 running on CPU 11
MPI Rank 2/4: OpenMP Thread 2/4 running on CPU 9
MPI Rank 2/4: OpenMP Thread 1/4 running on CPU 10
MPI Rank 0/4: OpenMP Thread 0/4 running on CPU 0
MPI Rank 0/4: OpenMP Thread 2/4 running on CPU 2
MPI Rank 0/4: OpenMP Thread 1/4 running on CPU 3
MPI Rank 0/4: OpenMP Thread 3/4 running on CPU 1
MPI Rank 3/4: OpenMP Thread 0/4 running on CPU 12
MPI Rank 3/4: OpenMP Thread 3/4 running on CPU 15
MPI Rank 3/4: OpenMP Thread 2/4 running on CPU 14
MPI Rank 3/4: OpenMP Thread 1/4 running on CPU 13

This completes the basic verification of an OpenMP × OpenMPI parallel program implemented in C.
You can build on this configuration to develop actual numerical simulations and parallel processing codes.