OpenMP support
==============

OpenMP directives and clauses
-----------------------------

The following section shows supported OpenMP directives and the support status
of their clauses.

.. note::

   ✅ = supported; ❌ = unsupported; 🔶 = partial support

barrier
~~~~~~~

No clauses.

critical
~~~~~~~~

No clauses.

for
~~~

* ✅ collapse, firstprivate, lastprivate, private
* 🔶 reduction
* ❌ allocate, linear, nowait, order, ordered, schedule

parallel
~~~~~~~~

* ✅ default, firstprivate, if, num_threads, private, shared
* 🔶 reduction
* ❌ allocate, copyin, proc_bind

parallel for
~~~~~~~~~~~~

Combines ``parallel`` and ``for`` directives.
See clauses for `for`_ and `parallel`_ above.

single
~~~~~~

* ❌ allocate, copyprivate, firstprivate, nowait, private

task
~~~~

* ✅ default, firstprivate, private, shared
* ❌ affinity, allocate, detach, if, in_reduction, final, mergeable, priority, untied

taskwait
~~~~~~~~

* ❌ depend, nowait

target
~~~~~~

* ✅ device, firstprivate, map, private, thread_limit
* ❌ allocate, defaultmap, depend, has_device_addr, if, in_reduction, is_device_ptr, nowait, uses_allocators

teams
~~~~~

* ✅ default, firstprivate, num_teams, private, shared, thread_limit
* 🔶 reduction

distribute
~~~~~~~~~~

* ✅ firstprivate, lastprivate, private
* ❌ allocate, collapse, dist_schedule, order

teams distribute
~~~~~~~~~~~~~~~~

Combines ``teams`` and ``distribute`` directives.
See clauses for `teams`_ and `distribute`_ above.

target teams
~~~~~~~~~~~~

Combines ``target`` and ``teams`` directives.
See clauses for `target`_ and `teams`_ above.

target data
~~~~~~~~~~~

* ✅ device, map
* ❌ if, use_device_ptr, use_device_addr

target enter data
~~~~~~~~~~~~~~~~~

* ✅ device, map
* ❌ depend, if, nowait

target exit data
~~~~~~~~~~~~~~~~

Same clauses as `target enter data`_. See above.

target update
~~~~~~~~~~~~~

* ✅ device, from, to
* ❌ depend, if, nowait

target teams distribute
~~~~~~~~~~~~~~~~~~~~~~~

Combines ``target``, ``teams``, and ``distribute`` directives.
See clauses for `target`_, `teams`_, and `distribute`_ above.

distribute parallel for
~~~~~~~~~~~~~~~~~~~~~~~

Combines ``distribute`` and ``parallel for`` directives.
See clauses for `distribute`_, `parallel`_, and `for`_ above.

target teams distribute parallel for
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combines ``target``, ``teams``, ``distribute``, and ``parallel for`` directives.
See clauses for `target`_, `teams`_, `distribute`_, `parallel`_, and `for`_ above.

OpenMP runtime functions
------------------------

Thread and team information
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 35 65

   * - **omp_get_thread_num()**
     - Returns the number of the calling thread within its team, starting at 0
   * - **omp_get_num_threads()**
     - Returns the number of threads in the current parallel region
   * - **omp_set_num_threads(n)**
     - Sets the number of threads for subsequent parallel regions
   * - **omp_get_max_threads()**
     - Returns the maximum number of threads a new parallel region can use
   * - **omp_get_num_procs()**
     - Returns the number of processors available to the current device
   * - **omp_get_thread_limit()**
     - Returns the maximum number of threads available to the current
       contention group
   * - **omp_in_parallel()**
     - Returns 1 if called within an active parallel region, 0 otherwise
   * - **omp_get_team_num()**
     - Returns the team number of the calling thread in a ``teams`` region
   * - **omp_get_num_teams()**
     - Returns the number of teams in the current ``teams`` region

Timing
~~~~~~

.. list-table::
   :widths: 35 65

   * - **omp_get_wtime()**
     - Returns elapsed wall-clock time in seconds (useful for performance
       profiling)

Nested and hierarchical parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 35 65

   * - **omp_set_nested(flag)**
     - Enables or disables nested parallelism
   * - **omp_set_dynamic(flag)**
     - Enables or disables dynamic adjustment of the number of threads
   * - **omp_set_max_active_levels(n)**
     - Sets the maximum number of nested active parallel levels
   * - **omp_get_max_active_levels()**
     - Returns the maximum number of nested active parallel levels
   * - **omp_get_level()**
     - Returns the current nesting level (active and inactive parallel regions)
   * - **omp_get_active_level()**
     - Returns the current active nesting level
   * - **omp_get_ancestor_thread_num(level)**
     - Returns the thread number of the calling thread's ancestor at a given
       nesting level
   * - **omp_get_team_size(level)**
     - Returns the team size at a given nesting level
   * - **omp_get_supported_active_levels()**
     - Returns the number of nested active levels the implementation supports

Advanced features
~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 35 65

   * - **omp_get_proc_bind()**
     - Returns the processor binding policy
   * - **omp_get_num_places()**
     - Returns the number of available places
   * - **omp_get_place_num_procs(place)**
     - Returns the number of processors in a place
   * - **omp_get_place_num()**
     - Returns the number of the place to which the calling thread is bound
   * - **omp_in_final()**
     - Returns 1 if called in a final task, 0 otherwise

Device and target offloading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 35 65

   * - **omp_get_num_devices()**
     - Returns the number of available target devices
   * - **omp_get_device_num()**
     - Returns the device number of the device on which the calling thread
       executes
   * - **omp_set_default_device(device_id)**
     - Sets the default device for subsequent target regions
   * - **omp_get_default_device()**
     - Returns the default device ID for target regions
   * - **omp_is_initial_device()**
     - Returns 1 if executing on the initial device (host), 0 otherwise
   * - **omp_get_initial_device()**
     - Returns the device ID of the initial device (host)

Supported features and platforms
--------------------------------

OpenMP and GPU offloading support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyOMP builds on Numba Just-In-Time (JIT) compilation extensions and leverages
LLVM's OpenMP implementation to provide portable parallel execution.
The supported OpenMP features depend on your versions of LLVM and Numba.
For compatibility details, see the Numba support info in the Numba
documentation.

PyOMP also supports GPU offloading for NVIDIA GPUs.
The supported GPU architectures depend on the LLVM version and its OpenMP
runtime.
Consult the LLVM OpenMP documentation for details on your specific version.

Device selection and querying
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyOMP provides utilities in the ``offloading`` module to query available OpenMP
target devices and to select specific devices for offloading by device type,
vendor, and architecture.
This enables fine-grained control over where target regions execute.

Discovering available devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To see all available devices and their properties, use
``print_offloading_info()``:

.. code-block:: python

   from numba.openmp.offloading import print_offloading_info

   print_offloading_info()

This prints information about all devices, including device counts and default
device settings.
Finding devices by criteria
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To programmatically find device IDs matching specific criteria, use
``find_device_ids()``:

.. code-block:: python

   from numba.openmp.offloading import find_device_ids

   # Find all GPU devices
   gpu_devices = find_device_ids(type="gpu")

   # Find all NVIDIA GPUs
   nvidia_gpus = find_device_ids(vendor="nvidia")

   # Find NVIDIA GPUs with specific architecture (e.g., sm_80)
   sm80_gpus = find_device_ids(vendor="nvidia", arch="sm_80")

   # Find all AMD GPUs
   amd_gpus = find_device_ids(vendor="amd")

   # Find host/CPU device
   host_devices = find_device_ids(type="host")

The function returns a list of device IDs (integers) matching the criteria.
Any parameter set to ``None`` acts as a wildcard and matches all values.

Querying device properties
^^^^^^^^^^^^^^^^^^^^^^^^^^

To determine the type, vendor, or architecture of a specific device ID, use the
property getter functions:

.. code-block:: python

   from numba.openmp.offloading import (
       get_device_type,
       get_device_vendor,
       get_device_arch,
   )

   # Example device ID; in practice use a value returned by find_device_ids()
   device_id = 0

   # Check device type
   dev_type = get_device_type(device_id)  # Returns "gpu", "host", or None

   # Check vendor
   vendor = get_device_vendor(device_id)  # Returns "nvidia", "amd", "host", or None

   # Check architecture
   arch = get_device_arch(device_id)  # Returns architecture string or None

Using device IDs in target regions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you have identified a device ID, you can use it in OpenMP target
directives via the ``device`` clause:

.. code-block:: python

   from numba.openmp import njit, openmp_context as openmp
   from numba.openmp.offloading import find_device_ids
   import numpy as np

   # Find first available NVIDIA GPU
   nvidia_devices = find_device_ids(vendor="nvidia")
   if nvidia_devices:
       device_id = nvidia_devices[0]
   else:
       # Fall back to host if no NVIDIA GPU found
       device_id = find_device_ids(type="host")[0]


   @njit
   def inc(x):
       with openmp(f"target loop device({device_id}) map(tofrom: x)"):
           # Computation runs on specified device
           for i in range(len(x)):
               x[i] = x[i] + 1
       return x


   x = inc(np.ones(10))
   print(f"Result on device {device_id}: {x}")

Version and platform support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following table shows tested combinations of PyOMP, Numba, Python, and
LLVM, together with the supported platforms:

.. table::
   :widths: auto

   ===================== ==================== ==================== ============ ================================
   PyOMP                 Numba                Python               LLVM         Supported Platforms
   ===================== ==================== ==================== ============ ================================
   0.5.x                 0.62.x - 0.63.x      3.10 - 3.14          20.x         linux-64, osx-arm64, linux-arm64
   0.4.x                 0.61.x               3.10 - 3.13          15.x         linux-64, osx-arm64, linux-arm64
   0.3.x                 0.57.x - 0.60.x      3.9 - 3.12           14.x         linux-64, osx-arm64, linux-arm64
   ===================== ==================== ==================== ============ ================================

OpenMP parallelism support by platform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

=========== ================ ================= ===================
Platform    CPU              NVIDIA GPU        AMD GPU
=========== ================ ================= ===================
linux-64    ✅ Supported      ✅ Supported       🔶 Work in progress
linux-arm64 ✅ Supported      ✅ Supported       🔶 Work in progress
osx-arm64   ✅ Supported      ❌ Unsupported     ❌ Unsupported
=========== ================ ================= ===================

Platform details
^^^^^^^^^^^^^^^^

* **linux-64**: Linux x86_64 architecture
* **osx-arm64**: macOS ARM64 (Apple Silicon)
* **linux-arm64**: Linux ARM64 architecture
* **GPU offloading**: Available on Linux platforms only (linux-64 and
  linux-arm64)

Notes
^^^^^

* The Python 3.14 free-threaded build (cp314t) is not supported with the
  current Numba/llvmlite version.
* LLVM version 20.1.8 is used for the current PyOMP 0.5.x releases.
* GPU offloading requires an NVIDIA GPU and an NVIDIA driver on a supported
  Linux platform.
* AMD GPU support is in active development.
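
The platform rules above can be sketched as a small lookup that tells a script
whether NVIDIA offloading is worth attempting on the current machine. This is
an illustration only and not part of PyOMP: ``platform_tag`` and
``nvidia_offloading_supported`` are hypothetical helper names, and the mapping
from ``platform.system()`` / ``platform.machine()`` values to the platform tags
is an assumption based on common values reported by Python.

```python
import platform

# Platforms on which the table above lists NVIDIA GPU offloading as supported.
NVIDIA_OFFLOAD_PLATFORMS = {"linux-64", "linux-arm64"}


def platform_tag(system=None, machine=None):
    """Map an OS/architecture pair to the platform tags used in this document.

    Hypothetical helper: defaults to the running interpreter's platform and
    returns None for combinations the table does not list.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if system == "linux" and machine in ("x86_64", "amd64"):
        return "linux-64"
    if system == "linux" and machine in ("aarch64", "arm64"):
        return "linux-arm64"
    if system == "darwin" and machine in ("arm64", "aarch64"):
        return "osx-arm64"
    return None


def nvidia_offloading_supported(tag):
    """Return True if the table lists NVIDIA GPU offloading for this tag."""
    return tag in NVIDIA_OFFLOAD_PLATFORMS


if __name__ == "__main__":
    tag = platform_tag()
    print(f"platform tag: {tag}")
    print(f"NVIDIA offloading listed as supported: {nvidia_offloading_supported(tag)}")
```

A check like this only reflects the documented platform matrix; whether a
usable device is actually present still has to be determined at run time, for
example with ``find_device_ids()``.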