OpenMP support
OpenMP directives and clauses
This section lists the supported OpenMP directives and the support status of each of their clauses.
Note
✅ = supported; ❌ = unsupported; 🔶 = partial support
barrier
No clauses.
critical
No clauses.
for
✅ collapse, firstprivate, lastprivate, private
🔶 reduction
❌ allocate, linear, nowait, order, ordered, schedule
parallel
✅ default, firstprivate, if, num_threads, private, shared
🔶 reduction
❌ allocate, copyin, proc_bind
parallel for
Combines parallel and for directives. See clauses for for and parallel above.
single
❌ allocate, copyprivate, firstprivate, nowait, private
task
✅ default, firstprivate, private, shared
❌ affinity, allocate, detach, if, in_reduction, final, mergeable, priority, untied
taskwait
❌ depend, nowait
target
✅ device, firstprivate, map, private, thread_limit
❌ allocate, defaultmap, depend, has_device_addr, if, in_reduction, is_device_ptr, nowait, uses_allocators
teams
✅ default, firstprivate, num_teams, private, shared, thread_limit
🔶 reduction
distribute
✅ firstprivate, lastprivate, private
❌ allocate, collapse, dist_schedule, order
teams distribute
Combines teams and distribute directives. See clauses for teams and distribute above.
target teams
Combines target and teams directives. See clauses for target and teams above.
target data
✅ device, map
❌ if, use_device_ptr, use_device_addr
target enter data
✅ device, map
❌ depend, if, nowait
target exit data
Same clauses as target enter data. See above.
target update
✅ device, from, to
❌ depend, if, nowait
target teams distribute
Combines target, teams, and distribute directives. See clauses for target, teams, and distribute above.
distribute parallel for
Combines distribute and parallel for directives. See clauses for distribute, parallel, and for above.
target teams distribute parallel for
Combines target, teams, distribute, and parallel for directives. See clauses for target, teams, distribute, parallel, and for above.
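As a rough illustration of how the clause lists above translate into code, the sketch below applies the combined parallel for directive with a reduction clause to a midpoint-rule estimate of pi. It assumes the njit and openmp_context imports used in the device example later in this document. The function name calc_pi and the serial fallback stubs are illustrative additions, included only so the snippet still runs where PyOMP is not installed. Because reduction is listed as partially supported, treat this as a pattern to verify on your installation.

```python
import math
from contextlib import nullcontext

try:
    # Import path as used in the device example of this document.
    from numba.openmp import njit, openmp_context as openmp
except ImportError:
    # Assumption for illustration: without PyOMP, fall back to plain
    # serial Python so the sketch still runs.
    def njit(func):
        return func

    def openmp(directive):
        return nullcontext()


@njit
def calc_pi(num_steps):
    step = 1.0 / num_steps
    total = 0.0
    # "parallel for" splits the iterations across threads;
    # reduction(+:total) combines the per-thread partial sums.
    with openmp("parallel for reduction(+:total)"):
        for i in range(num_steps):
            x = (i + 0.5) * step
            total += 4.0 / (1.0 + x * x)
    return step * total


estimate = calc_pi(100_000)
print("estimate:", estimate, "error:", abs(estimate - math.pi))
```

Only clauses marked as supported or partially supported are used here. Clauses such as schedule or nowait are listed as unsupported for the for directive and are left out.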
OpenMP runtime functions
Thread and team information
| Function | Description |
|---|---|
| omp_get_thread_num() | Returns the calling thread's number within its current team (0 for the primary thread) |
| omp_get_num_threads() | Returns the number of threads in the current team (1 outside a parallel region) |
| omp_set_num_threads(n) | Sets the number of threads for subsequent parallel regions that do not specify num_threads |
| omp_get_max_threads() | Returns the maximum number of threads a new parallel region could use |
| omp_get_num_procs() | Returns the number of processors available to the device |
| omp_get_thread_limit() | Returns the maximum number of OpenMP threads available in the current contention group |
| omp_in_parallel() | Returns 1 if called within an active parallel region, 0 otherwise |
| omp_get_team_num() | Returns the team number of the calling thread in a teams region |
| omp_get_num_teams() | Returns the number of teams in the current teams region |
Timing
| Function | Description |
|---|---|
| omp_get_wtime() | Returns elapsed wall-clock time in seconds (useful for performance profiling) |
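The timing routine is typically called before and after a region of interest. The sketch below shows one way to do this, assuming that omp_get_wtime can be imported from numba.openmp alongside njit and openmp_context (an assumption about the import path; check your PyOMP version). The function name timed_sum and the serial fallback are hypothetical and exist only to keep the snippet runnable without PyOMP.

```python
import time
from contextlib import nullcontext

try:
    # Assumed import path for the runtime function.
    from numba.openmp import njit, openmp_context as openmp, omp_get_wtime
except ImportError:
    # Illustrative serial fallback when PyOMP is not installed.
    def njit(func):
        return func

    def openmp(directive):
        return nullcontext()

    omp_get_wtime = time.perf_counter


@njit
def timed_sum(n):
    start = omp_get_wtime()
    total = 0.0
    with openmp("parallel for reduction(+:total)"):
        for i in range(n):
            total += i * 0.5
    # The difference of two readings is the elapsed time in seconds.
    elapsed = omp_get_wtime() - start
    return total, elapsed


total, elapsed = timed_sum(1000)
print("sum:", total, "seconds:", elapsed)
```

Only the difference between two readings is meaningful. With a JIT-compiled function, a call made purely to trigger compilation before timing may give more representative numbers.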
Nested and hierarchical parallelism
| Function | Description |
|---|---|
| omp_set_nested(flag) | Enables or disables nested parallelism |
| omp_set_dynamic(flag) | Enables or disables dynamic thread adjustment |
| omp_set_max_active_levels(n) | Sets the maximum number of nested parallel levels |
| omp_get_max_active_levels() | Returns the maximum number of nested parallel levels |
| omp_get_level() | Returns the current nesting level |
| omp_get_active_level() | Returns the current active nesting level |
| omp_get_ancestor_thread_num(level) | Returns the thread number at a given nesting level |
| omp_get_team_size(level) | Returns the team size at a given nesting level |
| omp_get_supported_active_levels() | Returns the supported number of nested active levels |
Advanced features
| Function | Description |
|---|---|
| omp_get_proc_bind() | Returns the processor binding policy |
| omp_get_num_places() | Returns the number of available places |
| omp_get_place_num_procs(place) | Returns the number of processors in a place |
| omp_get_place_num() | Returns the current place number |
| omp_in_final() | Returns 1 if called in a final task, 0 otherwise |
Device and target offloading
| Function | Description |
|---|---|
| omp_get_num_devices() | Returns the number of available target devices |
| omp_get_device_num() | Returns the device number of the current target device |
| omp_set_default_device(device_id) | Sets the default device for subsequent target regions |
| omp_get_default_device() | Returns the default device ID for target regions |
| omp_is_initial_device() | Returns 1 if executing on the initial device (host), 0 otherwise |
| omp_get_initial_device() | Returns the device ID of the initial device (host) |
Supported features and platforms
OpenMP and GPU offloading support
PyOMP builds on Numba Just-In-Time (JIT) compilation extensions and leverages LLVM’s OpenMP implementation to provide portable parallel execution. The supported OpenMP features depend on your versions of LLVM and Numba. For compatibility details, see the support information in the Numba documentation.
PyOMP also supports GPU offloading for NVIDIA GPUs. The supported GPU architectures depend on the LLVM version and its OpenMP runtime. Consult the LLVM OpenMP documentation for details on your specific version.
Device selection and querying
PyOMP provides utilities in the offloading module to query available OpenMP target
devices and select specific devices for offloading based on device type, vendor, and
architecture. This enables fine-grained control over where target regions execute.
Discovering available devices
To see all available devices and their properties, use print_offloading_info():
from numba.openmp.offloading import print_offloading_info
print_offloading_info()
This prints information about all devices, including device counts and default device settings.
Finding devices by criteria
To programmatically find device IDs matching specific criteria, use find_device_ids():
from numba.openmp.offloading import find_device_ids
# Find all GPU devices
gpu_devices = find_device_ids(type="gpu")
# Find all NVIDIA GPUs
nvidia_gpus = find_device_ids(vendor="nvidia")
# Find NVIDIA GPUs with specific architecture (e.g., sm_80)
sm80_gpus = find_device_ids(vendor="nvidia", arch="sm_80")
# Find all AMD GPUs
amd_gpus = find_device_ids(vendor="amd")
# Find host/CPU device
host_devices = find_device_ids(type="host")
The function returns a list of device IDs (integers) matching the criteria. Any parameter
can be None to act as a wildcard and match all values.
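Since each call returns a possibly empty list, a common pattern is to try several criteria in order of preference and take the first match. The sketch below illustrates that idea under stated assumptions: pick_device and make_fake_finder are hypothetical helpers, and the inventory is made-up data that mimics the keyword interface of find_device_ids so the example runs without PyOMP or any GPU.

```python
def pick_device(find_ids, preferences):
    """Return the first device ID matching any criteria set, in order."""
    for criteria in preferences:
        matches = find_ids(**criteria)
        if matches:
            return matches[0]
    return None


def make_fake_finder(inventory):
    """Hypothetical stand-in for find_device_ids, backed by a dict."""
    def find(type=None, vendor=None, arch=None):
        wanted = {"type": type, "vendor": vendor, "arch": arch}
        return [
            dev_id
            for dev_id, props in sorted(inventory.items())
            # None acts as a wildcard, as described above.
            if all(v is None or props.get(k) == v for k, v in wanted.items())
        ]
    return find


# Made-up inventory for illustration only.
inventory = {
    0: {"type": "gpu", "vendor": "nvidia", "arch": "sm_80"},
    1: {"type": "host", "vendor": "host", "arch": None},
}
fake_find = make_fake_finder(inventory)

preferences = [
    {"vendor": "amd"},
    {"vendor": "nvidia", "arch": "sm_80"},
    {"type": "host"},
]
chosen = pick_device(fake_find, preferences)
print("chosen device:", chosen)
```

With PyOMP installed, the real find_device_ids could be passed in place of fake_find, assuming it accepts the same keyword arguments shown in the examples above.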
Querying device properties
To determine the type, vendor, or architecture of a specific device ID, use the property getter functions:
from numba.openmp.offloading import (
    get_device_type,
    get_device_vendor,
    get_device_arch,
)

# Check device type
dev_type = get_device_type(device_id)  # Returns "gpu", "host", or None

# Check vendor
vendor = get_device_vendor(device_id)  # Returns "nvidia", "amd", "host", or None

# Check architecture
arch = get_device_arch(device_id)  # Returns architecture string or None
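Because each getter may return None, code that reports device properties needs to handle missing values. The following sketch shows one possible way to combine the three results into a readable label. The describe_device helper and the stub getters are hypothetical; they only mirror the return values documented above, so the example runs without PyOMP.

```python
def describe_device(device_id, get_type, get_vendor, get_arch):
    """Build a one-line summary, tolerating None from any getter."""
    dev_type = get_type(device_id)
    if dev_type is None:
        return f"device {device_id}: unknown"
    vendor = get_vendor(device_id) or "unknown vendor"
    arch = get_arch(device_id) or "unknown arch"
    return f"device {device_id}: {dev_type} ({vendor}, {arch})"


# Made-up property table standing in for real hardware.
props = {0: ("gpu", "nvidia", "sm_80"), 1: ("host", "host", None)}


def stub_type(device_id):
    return props[device_id][0] if device_id in props else None


def stub_vendor(device_id):
    return props[device_id][1] if device_id in props else None


def stub_arch(device_id):
    return props[device_id][2] if device_id in props else None


for dev in (0, 1, 7):
    print(describe_device(dev, stub_type, stub_vendor, stub_arch))
```

Passing the getters as arguments keeps the helper independent of PyOMP; the real get_device_type, get_device_vendor, and get_device_arch could be supplied instead of the stubs.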
Using device IDs in target regions
Once you have identified a device ID, you can use it in OpenMP target directives via the
device clause:
from numba.openmp import njit, openmp_context as openmp
from numba.openmp.offloading import find_device_ids
import numpy as np

# Find first available NVIDIA GPU
nvidia_devices = find_device_ids(vendor="nvidia")
if nvidia_devices:
    device_id = nvidia_devices[0]
else:
    # Fall back to host if no NVIDIA GPU found
    device_id = find_device_ids(type="host")[0]

@njit
def inc(x):
    with openmp(f"target loop device({device_id}) map(tofrom: x)"):
        # Computation runs on specified device
        for i in range(len(x)):
            x[i] = x[i] + 1
    return x

x = inc(np.ones(10))
print(f"Result on device {device_id}: {x}")
Version and platform support
The following table shows tested combinations of PyOMP, Numba, Python, LLVM, and supported platforms:
| PyOMP | Numba | Python | LLVM | Supported platforms |
|---|---|---|---|---|
| 0.5.x | 0.62.x - 0.63.x | 3.10 - 3.14 | 20.x | linux-64, osx-arm64, linux-arm64 |
| 0.4.x | 0.61.x | 3.10 - 3.13 | 15.x | linux-64, osx-arm64, linux-arm64 |
| 0.3.x | 0.57.x - 0.60.x | 3.9 - 3.12 | 14.x | linux-64, osx-arm64, linux-arm64 |
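The tested combinations can also be checked programmatically. The sketch below transcribes the table into a dictionary and compares major.minor versions; the names TESTED_COMBINATIONS and is_tested_combination are hypothetical and not part of PyOMP. A combination missing from the table is untested, which does not necessarily mean it fails.

```python
def major_minor(version):
    """Reduce a version string such as '0.62.1' to the tuple (0, 62)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


# Transcribed from the table above: inclusive (low, high) ranges.
TESTED_COMBINATIONS = {
    (0, 5): {"numba": ((0, 62), (0, 63)), "python": ((3, 10), (3, 14)), "llvm": 20},
    (0, 4): {"numba": ((0, 61), (0, 61)), "python": ((3, 10), (3, 13)), "llvm": 15},
    (0, 3): {"numba": ((0, 57), (0, 60)), "python": ((3, 9), (3, 12)), "llvm": 14},
}


def is_tested_combination(pyomp, numba, python):
    """Return True if the versions fall inside a tested row of the table."""
    row = TESTED_COMBINATIONS.get(major_minor(pyomp))
    if row is None:
        return False
    numba_lo, numba_hi = row["numba"]
    py_lo, py_hi = row["python"]
    return (
        numba_lo <= major_minor(numba) <= numba_hi
        and py_lo <= major_minor(python) <= py_hi
    )


print(is_tested_combination("0.5.0", "0.62.1", "3.12"))
print(is_tested_combination("0.4.0", "0.62.0", "3.12"))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "3.9" sorting after "3.10".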
OpenMP parallelism support by platform
| Platform | CPU | NVIDIA GPU | AMD GPU |
|---|---|---|---|
| linux-64 | ✅ Supported | ✅ Supported | 🔶 Work in progress |
| linux-arm64 | ✅ Supported | ✅ Supported | 🔶 Work in progress |
| osx-arm64 | ✅ Supported | ❌ Unsupported | ❌ Unsupported |
Platform details
linux-64: Linux x86_64 architecture
osx-arm64: macOS ARM64 (Apple Silicon)
linux-arm64: Linux ARM64 architecture
GPU offloading: Available on Linux platforms only (linux-64 and linux-arm64)
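A script can use the platform details above to decide whether to attempt GPU offloading at all. The sketch below is one way to do that with the standard library. The mapping from platform.system() and platform.machine() values to the platform labels is an assumption based on common naming, and platform_label and nvidia_offload_supported are hypothetical helpers.

```python
import platform

# Assumed mapping from (system, machine) to the labels used above.
PLATFORM_LABELS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-arm64",
    ("Darwin", "arm64"): "osx-arm64",
}

# NVIDIA GPU column of the support table above.
NVIDIA_OFFLOAD = {"linux-64": True, "linux-arm64": True, "osx-arm64": False}


def platform_label(system=None, machine=None):
    """Return the platform label, or None for an unlisted platform."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return PLATFORM_LABELS.get((system, machine))


def nvidia_offload_supported(label):
    """Return True only for platforms listed with NVIDIA GPU support."""
    return NVIDIA_OFFLOAD.get(label, False)


label = platform_label()
print("platform:", label, "NVIDIA offload:", nvidia_offload_supported(label))
```

A True result only means the platform is listed as supported; an NVIDIA GPU and driver still need to be present on the machine.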
Notes
The Python 3.14 free-threaded build (cp314t) is not supported with the current Numba/llvmlite version.
Current PyOMP 0.5.x releases use LLVM 20.1.8.
GPU offloading requires an NVIDIA GPU and NVIDIA driver on a supported Linux platform.
AMD GPU support is in active development.