HIP RTC lib
HIP allows you to compile kernels at runtime with its hiprtc*
APIs. Kernels can be store as a text string and can be passed on to hiprtc APIs alongside options to guide the compilation.
NOTE:
- This library can be used on systems without HIP install nor AMD GPU driver installed at all (offline compilation). Therefore it does not depend on any HIP runtime library.
- But it does depend on COMGr. We may try to statically link COMGr into hipRTC to avoid any ambiguity.
- Developers can decide to bundle this library with their application.
Compile APIs
Example
To use hiprtc functionality, hiprtc header needs to be included first.
{#include}
Kernels can be stored in a string:
```cpp
static constexpr auto kernel {
R"(
extern "C"
__global__ void gpu_kernel(...) {
// Kernel Functionality
}
)"};
Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done via declaring hiprtcProgram prog;
and associating the string of kernel with this program:
kernel,
"gpu_kernel.cu",
num_headers,
&header_sources[0],
&header_names[0]);
hiprtcResult hiprtcCreateProgram(hiprtcProgram *prog, const char *src, const char *name, int numHeaders, const char **headers, const char **includeNames)
Creates an instance of hiprtcProgram with the given input parameters, and sets the output hiprtcProgr...
hiprtcCreateProgram API also allows you to add headers which can be included in your rtc program. For online compilation, the compiler pre-defines HIP device API functions, HIP specific types and macros for device compilation, but does not include standard C/C++ headers by default. Users can only include header files provided to hiprtcCreateProgram.
After associating the kernel string with hiprtcProgram, you can now compile this program using:
0,
options);
hiprtcResult hiprtcCompileProgram(hiprtcProgram prog, int numOptions, const char **options)
Compiles the given runtime compilation program.
hiprtcCompileProgram returns a status value which can be converted to string via hiprtcGetErrorString
. If compilation is successful, hiprtcCompileProgram will return HIPRTC_SUCCESS
.
If the compilation fails, you can look up the logs via:
size_t logSize;
if (logSize) {
string log(logSize, '\0');
}
hiprtcResult hiprtcGetProgramLogSize(hiprtcProgram prog, size_t *logSizeRet)
Gets the size of log generated by the runtime compilation program instance.
hiprtcResult hiprtcGetProgramLog(hiprtcProgram prog, char *log)
Gets the log generated by the runtime compilation program instance.
If the compilation is successful, you can load the compiled binary in a local variable.
size_t codeSize;
vector<char> kernel_binary(codeSize);
hiprtcResult hiprtcGetCodeSize(hiprtcProgram prog, size_t *codeSizeRet)
Gets the size of compilation binary by the runtime compilation program instance.
hiprtcResult hiprtcGetCode(hiprtcProgram prog, char *code)
Gets the pointer of compilation binary by the runtime compilation program instance.
After loading the binary, hiprtcProgram can be destroyed.
{hiprtcDestroyProgram(&prog);```}
The binary present in ```kernel_binary``` can now be loaded via ```hipModuleLoadData``` API.
```cpp
hipModule_t module;
hipFunction_t kernel;
hipModuleLoadData(&module, kernel_binary.data());
hipModuleGetFunction(&kernel, module, "gpu_kernel");
And now this kernel can be launched via hipModule APIs.
Please have a look at saxpy.cpp and hiprtcGetLoweredName.cpp files for a detailed example.
HIPRTC specific options
HIPRTC provides a few hiprtc specific flags
--gpu-architecture
: This flag can guide the code object generation for a specific gpu arch. Example: --gpu-architecture=gfx906:sramecc+:xnack-
, its equivalent to --offload-arch
.
- This option is compulsory if compilation is done on a system without AMD GPUs supported by HIP runtime.
- Otherwise, hipRTC will load the hip runtime and gather the current device and its architecture info and use it as option.
-fgpu-rdc
: This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs.
Bitcode
In the usual scenario, the kernel associated with hiprtcProgram is compiled into the binary which can be loaded and run. However, if -fpu-rdc option is provided in the compile options, HIPRTC calls comgr and generates only the LLVM bitcode. It doesn't convert this bitcode to ISA and generate the final binary.
std::string sarg = std::string("-fgpu-rdc");
const char* options[] = {
sarg.c_str() };
1,
options);
If the compilation is successful, one can load the bitcode in a local variable using the bitcode APIs provided by HIPRTC.
size_t bitCodeSize;
vector<char> kernel_bitcode(bitCodeSize);
hiprtcResult hiprtcGetBitcodeSize(hiprtcProgram prog, size_t *bitcode_size)
Gets the size of compiled bitcode by the runtime compilation program instance.
hiprtcResult hiprtcGetBitcode(hiprtcProgram prog, char *bitcode)
Gets the pointer of compiled bitcode by the runtime compilation program instance.
Linker APIs
Introduction
The bitcode generated using the HIPRTC Bitcode APIs can be loaded using hipModule APIs and also can be linked with other generated bitcodes with appropriate linker flags using the HIPRTC linker APIs. This also provides more flexibility and optimizations to the applications who want to generate the binary dynamically according to their needs. The input bitcodes can be generated only for a specific architecture or it can be a bundled bitcode which is generated for multiple architectures.
Example
Firstly, hiprtc link instance or a pending linker invocation must be created using hiprtcLinkCreate, with the appropriate linker options provided.
options,
option_vals,
&rtc_link_state );
hiprtcResult hiprtcLinkCreate(unsigned int num_options, hiprtcJIT_option *option_ptr, void **option_vals_pptr, hiprtcLinkState *hip_link_state_ptr)
Creates the link instance via hiprtc APIs.
Following which, the bitcode data can be added to this link instance via hiprtcLinkAddData (if the data is present as a string) or hiprtcLinkAddFile (if the data is present as a file) with the appropriate input type according to the data or the bitcode used.
input_type,
bit_code_ptr,
bit_code_size,
"a",
0,
0,
0);
hiprtcResult hiprtcLinkAddData(hiprtcLinkState hip_link_state, hiprtcJITInputType input_type, void *image, size_t image_size, const char *name, unsigned int num_options, hiprtcJIT_option *options_ptr, void **option_values)
Completes the linking of the given program.
input_type,
bc_file_path.c_str(),
0,
0,
0);
hiprtcResult hiprtcLinkAddFile(hiprtcLinkState hip_link_state, hiprtcJITInputType input_type, const char *file_path, unsigned int num_options, hiprtcJIT_option *options_ptr, void **option_values)
Adds a file with bit code to be linked with options.
Once the bitcodes for multiple archs are added to the link instance, the linking of the device code must be completed using hiprtcLinkComplete which generates the final binary.
&binary,
&binarySize);
hiprtcResult hiprtcLinkComplete(hiprtcLinkState hip_link_state, void **bin_out, size_t *size_out)
Completes the linking of the given program.
If the hiprtcLinkComplete returns successfully, the generated binary can be loaded and run using the hipModule* APIs.
hipError_t hipModuleLoadData(hipModule_t *module, const void *image)
builds module from code object which resides in host memory. Image is pointer to that location.
Note
- The compiled binary must be loaded before hiprtc link instance is destroyed using the hiprtcLinkDestroy API.
hiprtcResult hiprtcLinkDestroy(hiprtcLinkState hip_link_state)
Deletes the link instance via hiprtc APIs.
- The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy.
Input Types
HIPRTC provides hiprtcJITInputType enumeration type which defines the input types accepted by the Linker APIs. Here are the enum values of hiprtcJITInputType. However only the input types HIPRTC_JIT_INPUT_LLVM_BITCODE, HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE and HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE are supported currently.
HIPRTC_JIT_INPUT_CUBIN = 0,
HIPRTC_JIT_INPUT_PTX,
HIPRTC_JIT_INPUT_FATBINARY,
HIPRTC_JIT_INPUT_OBJECT,
HIPRTC_JIT_INPUT_LIBRARY,
HIPRTC_JIT_INPUT_NVVM,
HIPRTC_JIT_NUM_LEGACY_INPUT_TYPES,
HIPRTC_JIT_INPUT_LLVM_BITCODE = 100,
HIPRTC_JIT_INPUT_LLVM_BUNDLED_BITCODE = 101,
HIPRTC_JIT_INPUT_LLVM_ARCHIVES_OF_BUNDLED_BITCODE = 102,
HIPRTC_JIT_NUM_INPUT_TYPES = (HIPRTC_JIT_NUM_LEGACY_INPUT_TYPES + 3)
Error Handling
HIPRTC defines the hiprtcResult enumeration type and a function hiprtcGetErrorString for API call error handling. hiprtcResult enum defines the API result codes. HIPRTC APIs return hiprtcResult to indicate the call result. hiprtcGetErrorString function returns a string describing the given hiprtcResult code, e.g., HIPRTC_SUCCESS to "HIPRTC_SUCCESS". For unrecognized enumeration values, it returns "Invalid HIPRTC error code".
hiprtcResult enum supported values and the hiprtcGetErrorString usage are mentioned below.
HIPRTC_SUCCESS = 0,
HIPRTC_ERROR_OUT_OF_MEMORY = 1,
HIPRTC_ERROR_PROGRAM_CREATION_FAILURE = 2,
HIPRTC_ERROR_INVALID_INPUT = 3,
HIPRTC_ERROR_INVALID_PROGRAM = 4,
HIPRTC_ERROR_INVALID_OPTION = 5,
HIPRTC_ERROR_COMPILATION = 6,
HIPRTC_ERROR_LINKING = 7,
HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE = 8,
HIPRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION = 9,
HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 10,
HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 11,
HIPRTC_ERROR_INTERNAL_ERROR = 12
hiprtcResult result;
if (result != HIPRTC_SUCCESS) {
}
const char * hiprtcGetErrorString(hiprtcResult result)
Returns text string message to explain the error which occurred.
HIPRTC General APIs
HIPRTC provides the following API for querying the version.
hiprtcVersion(int* major, int* minor) - This sets the output parameters major and minor with the HIP Runtime compilation major version and minor version number respectively.
Currently, it returns hardcoded value. This should be implemented to return HIP runtime major and minor version in the future releases.
Lowered Names (Mangled Names)
HIPRTC mangles the __global__
function names and names of __device__
and __constant__
variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or __device__/__constant__
variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map __global__
function or __device__/__constant__
variable names in the source to the mangled names present in the generated binary.
The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the __global__
function or __device__/__constant__
variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API.
Note
- The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name.
- The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram.
- The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed.
Example
kernel containing various definitions __global__
functions/function templates and __device__/__constant__
variables can be stored in a string.
static constexpr const char gpu_program[]{
R"(
__device__ int V1; // set from host code
static __global__ void f1(int *result) { *result = V1 + 10; }
namespace N1 {
namespace N2 {
__constant__ int V2; // set from host code
__global__ void f2(int *result) { *result = V2 + 20; }
}
}
template<typename T>
__global__ void f3(int *result) { *result = sizeof(T); }
)"};
hiprtcAddNameExpression is called with various name expressions referring to the address of __global__
functions and __device__/__constant__
variables.
kernel_name_vec.push_back("&f1");
kernel_name_vec.push_back("N1::N2::f2");
kernel_name_vec.push_back("f3<int>");
variable_name_vec.push_back("&V1");
variable_name_vec.push_back("&N1::N2::V2");
hiprtcResult hiprtcAddNameExpression(hiprtcProgram prog, const char *name_expression)
Adds the given name exprssion to the runtime compilation program.
After which, the program is compiled using hiprtcCompileProgram and the generated binary is loaded using hipModuleLoadData. And the mangled names can be fetched using hirtcGetLoweredName.
for (decltype(variable_name_vec.size()) i = 0; i != variable_name_vec.size(); ++i) {
const char* name;
}
hiprtcResult hiprtcGetLoweredName(hiprtcProgram prog, const char *name_expression, const char **lowered_name)
Gets the lowered (mangled) name from an instance of hiprtcProgram with the given input parameters,...
for (decltype(kernel_name_vec.size()) i = 0; i != kernel_name_vec.size(); ++i) {
const char* name;
}
The mangled name of the variables are used to look up the variable in the module and update its value.
hipDeviceptr_t variable_addr;
size_t bytes{};
hipModuleGetGlobal(&variable_addr, &bytes, module, name);
hipMemcpyHtoD(variable_addr, &initial_value, sizeof(initial_value));
Finally, the mangled name of the kernel is used to launch it using the hipModule APIs.
hipFunction_t kernel;
hipModuleLaunchKernel(kernel, 1, 1, 1, 1, 1, 1, 0,
nullptr,
nullptr, config);
hipError_t hipModuleLaunchKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void **kernelParams, void **extra)
launches kernel f with launch parameters and shared memory on stream with arguments passed to kernelp...
hipError_t hipModuleGetFunction(hipFunction_t *function, hipModule_t module, const char *kname)
Function with kname will be extracted if present in module.
Please have a look at hiprtcGetLoweredName.cpp for the detailed example.
Versioning
HIPRTC follows the below versioning.
- Linux
- HIPRTC follows the same versioning as HIP runtime library.
- The soname field for the shared library is set to MAJOR version. eg: For HIP 5.3 the soname is set to 5 (hiprtc.so.5).
- Windows
- Currently, the HIPRTC dll doesn't have any version attached. It is just named as hiprtc.dll.
- In the upcoming releases, HIPRTC dll will be named as hiprtc_XXYY.dll where XX is MAJOR version and YY is MINOR version. eg: For HIP 5.3 the name is hiprtc_0503.dll.
Deprecation notice
Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explictly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols.