HIP: Heterogeneous-computing Interface for Portability

| Term | CUDA | HIP | HC | C++AMP | OpenCL |
|---|---|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `hc::accelerator` | `concurrency::accelerator` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `hc::accelerator_view` | `concurrency::accelerator_view` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `hc::completion_future` | `concurrency::completion_future` | `cl_event` |
| Memory | `void *` | `void *` | `void *`; `hc::array`; `hc::array_view` | `concurrency::array`; `concurrency::array_view` | `cl_mem` |
| | | | | | |
| | grid | grid | extent | extent | NDRange |
| | block | block | tile | tile | work-group |
| | thread | thread | thread | thread | work-item |
| | warp | warp | wavefront | N/A | sub-group |
| | | | | | |
| Thread-index | `threadIdx.x` | `hipThreadIdx_x` | `t_idx.local[0]` | `t_idx.local[0]` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `hipBlockIdx_x` | `t_idx.tile[0]` | `t_idx.tile[0]` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `hipBlockDim_x` | `t_ext.tile_dim[0]` | `t_idx.tile_dim0` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `hipGridDim_x` | `t_ext[0]` | `t_ext[0]` | `get_global_size(0)` |
| | | | | | |
| Device Kernel | `__global__` | `__global__` | lambda inside `hc::parallel_for_each` or `[[hc]]` | `restrict(amp)` | `__kernel` |
| Device Function | `__device__` | `__device__` | `[[hc]]` (detected automatically in many cases) | `restrict(amp)` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | `[[cpu]]` (default) | `restrict(cpu)` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | `[[hc]] [[cpu]]` | `restrict(amp,cpu)` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` | `hc::parallel_for_each` | `concurrency::parallel_for_each` | `clEnqueueNDRangeKernel` |
| | | | | | |
| Global Memory | `__global__` | `__global__` | Unnecessary / Implied | Unnecessary / Implied | `__global` |
| Group Memory | `__shared__` | `__shared__` | `tile_static` | `tile_static` | `__local` |
| Constant | `__constant__` | `__constant__` | Unnecessary / Implied | Unnecessary / Implied | `__constant` |
| | | | | | |
| | `__syncthreads` | `__syncthreads` | `tile_static.barrier()` | `t_idx.barrier()` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `hc::atomic_fetch_add` | `concurrency::atomic_fetch_add` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `hc::precise_math::cos(f)` | `concurrency::precise_math::cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `hc::fast_math::cos(f)` | `concurrency::fast_math::cos(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `hc::short_vector::float4` | `concurrency::graphics::float_4` | `float4` |

The indexing rows (starting with Thread-index) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.

The HC syntax for tile dimensions is `t_ext.tile_dim[0]`, while the C++AMP syntax is `t_ext.tile_dim0`.
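
To make the 1D indexing rows concrete, here is a minimal host-side sketch in plain C++ that emulates how the thread-index, block-index, and block-dim values usually combine into a flat index. It is not device code and uses none of the APIs above: `Idx1D`, `global_index`, and `emulate_vector_add` are hypothetical names introduced only for this illustration, and the nested loops stand in for a kernel launch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread built-ins listed in the table.
struct Idx1D {
    std::size_t threadIdx_x;  // CUDA threadIdx.x, HIP hipThreadIdx_x, OpenCL get_local_id(0)
    std::size_t blockIdx_x;   // CUDA blockIdx.x,  HIP hipBlockIdx_x,  OpenCL get_group_id(0)
    std::size_t blockDim_x;   // CUDA blockDim.x,  HIP hipBlockDim_x,  OpenCL get_local_size(0)
};

// Flat position of one thread within the whole 1D grid.
std::size_t global_index(const Idx1D& i) {
    return i.blockIdx_x * i.blockDim_x + i.threadIdx_x;
}

// Serial emulation of a 1D vector-add "kernel launch".
// Assumes a.size() == b.size() and block_dim > 0.
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t block_dim) {
    const std::size_t n = a.size();
    // Number of blocks, rounded up so every element is covered.
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    std::vector<float> c(n, 0.0f);

    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            const Idx1D idx{thread, block, block_dim};
            const std::size_t gid = global_index(idx);
            // Bounds guard: the last block may extend past the end of the data.
            if (gid < n) {
                c[gid] = a[gid] + b[gid];
            }
        }
    }
    return c;
}
```

On a real device the two loops disappear: each thread evaluates the body once with its own index values, and the bounds guard stays in the kernel because the block count is rounded up.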