

20 Debugging Heterogeneous Programs

Note: The commands presented in this chapter are not currently fully implemented. See AMD GPU for the current support available.

In some operating systems, such as Linux with the AMD ROCm platform installed, a single program may have multiple threads in the same process, executing on different devices which may have different target architectures. Such a system is termed a heterogeneous system and a program that uses the multiple devices is termed a heterogeneous program.

The multiple devices of a heterogeneous system are termed heterogeneous agents. They can include the following kinds of devices: CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), as well as other specialized hardware.

The device of a heterogeneous system that starts the execution of the program is termed the heterogeneous host agent.

The precise way threads are created on different heterogeneous agents may vary from one heterogeneous system to another, but in general the threads behave similarly no matter what heterogeneous agent is executing them, except that the target architecture may be different.

A heterogeneous program can create heterogeneous queues associated with a heterogeneous agent. The heterogeneous program can then place heterogeneous packets on a heterogeneous queue to control the actions of the associated heterogeneous agent. A heterogeneous agent removes heterogeneous packets from the heterogeneous queues associated with it and performs the requested actions. The packet actions and the scheduling of packet processing vary depending on the heterogeneous system and the target architecture of the heterogeneous agent. See Architectures.

A heterogeneous dispatch packet is used to initiate code execution on a heterogeneous agent. A single heterogeneous dispatch packet may specify that the heterogeneous agent create a set of threads that are all associated with a corresponding heterogeneous dispatch. Each thread typically has an associated position within the heterogeneous dispatch, possibly expressed as a multi-dimensional grid position. The heterogeneous agent typically can create multiple threads that execute concurrently. If a heterogeneous dispatch is larger than the number of concurrent threads that can be created, the heterogeneous agent creates threads of the heterogeneous dispatch as other threads complete. When all the threads of a heterogeneous dispatch have been created and have completed, the heterogeneous dispatch is considered complete.

The threads of a heterogeneous dispatch may be grouped into heterogeneous work-groups. The threads that belong to the same heterogeneous work-group may have special shared memory, and efficient execution synchronization abilities. A thread that is part of a heterogeneous work-group typically has an associated position within the heterogeneous work-group, possibly also expressed as a multi-dimensional grid position.

Other heterogeneous packets may control heterogeneous packet scheduling, memory visibility between the threads of a heterogeneous dispatch and other threads, or other services supported by the heterogeneous system.

On some heterogeneous systems there can be heterogeneous agents that support SIMD (Single Instruction Multiple Data) or SIMT (Single Instruction Multiple Threads) machine instructions. On these target architectures, a single machine instruction can operate in parallel on multiple heterogeneous lanes.

Source languages used by heterogeneous programs can be implemented on target architectures that support multiple heterogeneous lanes by mapping a source language thread of execution onto a heterogeneous lane of a single target architecture thread. Control flow in the source language may be implemented by controlling which heterogeneous lanes are active. If the source language control flow may result in some heterogeneous lanes becoming inactive while some remain active, the control flow is said to be divergent. Typically, the machine code may execute different divergent paths for different sets of heterogeneous lanes, before the control flow reconverges and all heterogeneous lanes become active.
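The idea of divergent control flow can be sketched with a toy model. The following Python snippet is only an illustration under simplifying assumptions (the function name `run_divergent` and the chosen if/else are hypothetical, and real hardware does not execute lanes in a Python loop): a single thread evaluates a condition for all lanes, runs the "then" path with one set of lanes active, runs the "else" path with the complementary set active, and then reconverges.

```python
# Toy model of one target thread executing a source-level if/else over
# several lanes. Illustrative only; not how any GPU or ROCGDB works internally.

def run_divergent(values):
    """Apply `x * 2 if x is even else x + 1` to every lane using a lane mask."""
    num_lanes = len(values)
    results = list(values)

    # The condition is evaluated for every lane at once.
    take_then = [v % 2 == 0 for v in values]

    # "then" path: only lanes whose condition is true are active.
    active = take_then
    for lane in range(num_lanes):
        if active[lane]:
            results[lane] = values[lane] * 2

    # "else" path: the complementary set of lanes is active.
    active = [not t for t in take_then]
    for lane in range(num_lanes):
        if active[lane]:
            results[lane] = values[lane] + 1

    # After both paths, control flow reconverges and all lanes are active.
    # The flow was divergent only if some lanes took each path.
    divergent = any(take_then) and not all(take_then)
    return results, divergent


if __name__ == "__main__":
    print(run_divergent([1, 2, 3, 4]))
```

In this model, a lane that is inactive on one path simply keeps its previous value until its own path is executed, which mirrors why an inactive lane appears not to move while other lanes of the same thread advance.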

The fact that a target architecture supports multiple lanes does not mean that the source language is mapped to use them to implement source language threads of execution. Therefore, a thread is only considered to have multiple heterogeneous lanes if its current frame corresponds to a source language that does perform such a mapping.

On some heterogeneous systems there can be heterogeneous agents with target architectures that support multiple address spaces. In these target architectures, there may be memory that is physically disjoint from regular global virtual memory. There can also be cases when the same underlying memory can be accessed using linear addresses that map to the underlying physical memory in an interleaved manner. In these target architectures there can be distinct machine instructions to access the distinct address spaces. For example, there may be physical hardware scratch pad memory that is allocated and accessible only to the threads that are associated with the same heterogeneous work-group. There may be hardware address swizzle logic that allows regular global virtual memory to be allocated per heterogeneous lane such that they have a linear address view, which in fact maps to an interleaved global virtual memory access to improve cache performance.

ROCGDB provides these facilities for debugging heterogeneous programs:

A heterogeneous system may use separate code objects for the different target architectures of the heterogeneous agents. The info sharedlibrary command lists all the code objects currently loaded, regardless of their target architecture.

The following rules apply in determining the target architecture used by commands when debugging heterogeneous programs:

  1. Typically the target architecture of the heterogeneous host agent is the target architecture of the program’s code object. The set architecture command (see Specifying a Debugging Target) can be used to change this target architecture. The target architecture of other heterogeneous agents is typically the target architecture of the associated device.
  2. The target architecture of a thread is the target architecture of the selected stack frame. Typically, stack frames have the same target architecture as the heterogeneous agent on which the thread was created; however, a target may associate different target architectures with different stack frames.
  3. The current target architecture is the target architecture of the selected thread, or the target architecture of the heterogeneous host agent if there are no threads.
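Rules 2 and 3 can be read as a small lookup procedure. The Python sketch below models them with hypothetical `Frame` and `Thread` classes and example architecture strings. None of these names come from ROCGDB, and the sketch makes no claim about how the debugger implements the rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    arch: str  # target architecture of this stack frame (example strings only)


@dataclass
class Thread:
    frames: List[Frame]
    selected_frame: int = 0

    def arch(self) -> str:
        # Rule 2: a thread's architecture is that of its selected stack frame.
        return self.frames[self.selected_frame].arch


def current_arch(host_agent_arch: str, selected_thread: Optional[Thread]) -> str:
    # Rule 3: use the selected thread's architecture, or fall back to the
    # host agent's architecture when there are no threads.
    if selected_thread is None:
        return host_agent_arch
    return selected_thread.arch()


if __name__ == "__main__":
    gpu_thread = Thread(frames=[Frame("gfx906"), Frame("x86_64")])
    print(current_arch("x86_64", None))        # no threads: host agent
    print(current_arch("x86_64", gpu_thread))  # selected frame 0
```

Changing `selected_frame` in this model changes the thread's architecture, which is why commands that depend on the current target architecture can behave differently after a frame switch.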

ROCGDB handles the heterogeneous agent, queue, and dispatch entities in a similar manner to threads (see Threads):

The following debugger convenience variables (see Convenience Variables) are related to heterogeneous debugging. You may find these useful in writing breakpoint conditional expressions, command scripts, and so forth.

$_thread
$_gthread
$_thread_systag
$_thread_name

See Convenience Variables.

$_agent
$_queue
$_dispatch

There are debugger convenience variables that contain the number of each heterogeneous entity associated with the current thread if it was created by a heterogeneous dispatch, or 0 otherwise. $_agent, $_queue, and $_dispatch contain the corresponding per-inferior heterogeneous entity number.

$_lane
$_glane

The heterogeneous lane number of the current lane of the current thread. $_lane contains the corresponding per-inferior heterogeneous lane number, while $_glane contains the corresponding global heterogeneous lane number. If the current thread does not have multiple heterogeneous lanes, it is treated as if it has a single heterogeneous lane.

$_dispatch_pos

The heterogeneous dispatch position string of the current thread within its associated heterogeneous dispatch if it was created by a heterogeneous dispatch, or the empty string otherwise. The format varies depending on the heterogeneous system and target architecture of the heterogeneous agent. See Architectures.

$_lane_name

The heterogeneous lane name string of the current heterogeneous lane, or the empty string if no name has been assigned by the lane name command.

$_thread_workgroup_pos
$_lane_workgroup_pos

The heterogeneous work-group position string of the current thread or heterogeneous lane within its associated heterogeneous dispatch if it was created by a heterogeneous dispatch, or the empty string otherwise. The format varies depending on the heterogeneous system and target architecture of the heterogeneous agent. See Architectures.

$_lane_systag

The target system’s heterogeneous lane identifier (lane_systag) string of the current heterogeneous lane. See target system lane identifier.

$_agent_systag

The target system’s heterogeneous agent identifier (agent_systag) string of the heterogeneous agent of the current thread.

$_queue_systag

The target system’s heterogeneous queue identifier (queue_systag) string of the heterogeneous queue of the current thread.

$_dispatch_systag

The target system’s heterogeneous dispatch identifier (dispatch_systag) string of the heterogeneous dispatch of the current thread.

The following debugger convenience functions (see Convenience Functions) are related to heterogeneous debugging. Given the very large number of threads on heterogeneous systems, these may be very useful. They allow threads or thread lists to be specified based on the target system’s thread identifier (systag) or thread name, and allow heterogeneous lanes or heterogeneous lane lists to be specified based on the target system’s heterogeneous lane identifier (lane_systag) or heterogeneous lane name.

$_thread_find
$_thread_find_first_gid

See Convenience Functions.

$_lane_find(regex)

Searches for heterogeneous lanes whose name or lane_systag matches the supplied regular expression. The syntax of the regular expression is that specified by Python’s regular expression support.

Returns a string that is the space-separated list of per-inferior heterogeneous lane numbers of the found heterogeneous lanes. If debugging multiple inferiors, the heterogeneous lane numbers are qualified with the inferior number. If no heterogeneous lanes are found, the empty string is returned. The string can be used in commands that accept a heterogeneous lane ID list. See heterogeneous entity ID list.

For example, the following command lists all heterogeneous lanes that are part of a heterogeneous work-group with work-group position ‘(1,2,3)’ (see Heterogeneous Debugging):

(gdb) info lanes $_lane_find ("(1,2,3)")

$_lane_find_first_gid(regex)

Similar to the $_lane_find convenience function, except it returns a number that is the global heterogeneous lane number of one of the heterogeneous lanes found, or 0 if no heterogeneous lanes were found. The number can be used in commands that accept a global heterogeneous lane number. See global heterogeneous entity numbers.

For example, the following command sets the current heterogeneous lane to one of the heterogeneous lanes that are part of a heterogeneous work-group with work-item position ‘[1,2,3]’:

(gdb) lane -gid $_lane_find_first_gid ("[1,2,3]")
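Because the regular expression syntax is stated to be Python’s, parentheses and square brackets in a position string are regex metacharacters rather than literal characters. The following Python sketch illustrates the difference using the standard `re` module. The `systags` list reuses sample identifiers from the info lanes example in this chapter, and the `find` helper is hypothetical. It only models the regex syntax and does not claim to reproduce how ROCGDB performs the match.

```python
import re

# Sample lane identifiers, taken from the "info lanes" example output.
systags = [
    "AMDGPU Lane 1:2:3:463/2 (2,3,4) work-item(1,2,4)",
    "AMDGPU Lane 1:2:4:456/12 (2,4,4) work-item(1,2,3)",
    "process 35 thread 13",
]


def find(pattern):
    """Return the 1-based positions of identifiers that contain a match."""
    return [i for i, tag in enumerate(systags, start=1) if re.search(pattern, tag)]


# "[1,2,3]" is a character class: it matches any single '1', ',', '2' or '3'.
unescaped = find("[1,2,3]")

# Escaping makes the parentheses literal, so only "(1,2,3)" itself matches.
escaped = find(re.escape("(1,2,3)"))

if __name__ == "__main__":
    print(unescaped)
    print(escaped)
```

If a literal match is intended, escaping the brackets or parentheses in the pattern (for example `\[1,2,3\]`) is the conservative choice.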

The following commands are related to heterogeneous debugging:

info agents [agent-id-list]

The info agents command lists the following information for each heterogeneous agent (in this order):

  1. The heterogeneous agent ID. See qualified heterogeneous entity numbers.
  2. The target system’s heterogeneous agent identifier.
  3. The target system’s device name.
  4. The number of compute unit cores.
  5. The total number of threads. The number of threads that a core can execute concurrently is dependent on the target architecture of the device.
  6. The target-specific string identifying the location of the agent. For example, some target agents may use the PCI slot number in BDF (Bus:Device.Function) notation.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous agent number indicates the heterogeneous agent executing the current thread.

Some heterogeneous agents may not be listed until the inferior has started execution of the program.

With no arguments, the command displays information about all heterogeneous agents. You can specify the list of heterogeneous agents that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info agents
  Id   Target Id                  Device Name Cores Threads Location
* 1    AMDGPU Agent (GPUID 45151) vega20      240   2400    0a:00.0
  2    AMDGPU Agent (GPUID 39113) vega20      240   2400    44:00.0

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous agent IDs using the qualified inferior-num.agent-num format. Otherwise, only agent-num is shown.

info queues [queue-id-list]

The info queues command lists the following information for each heterogeneous queue (in this order):

  1. The heterogeneous queue ID. See qualified heterogeneous entity numbers.
  2. The target system’s heterogeneous queue identifier.
  3. The type of the heterogeneous queue. The meaning of the queue types is target architecture and operating system dependent.
  4. The target system’s heterogeneous packet identifier for the next packet that will be read from the heterogeneous queue by the device.
  5. The target system’s heterogeneous packet identifier for the next packet that will be written to the heterogeneous queue for submission to the device.
  6. The size in bytes of the heterogeneous queue packet buffer.
  7. The global memory address of the heterogeneous queue packet buffer.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous queue number indicates the heterogeneous queue executing the current thread.

With no arguments, the command displays information about all heterogeneous queues. You can specify the list of heterogeneous queues that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info queues
  Id   Target Id                Type         Read   Write  Size     Address
* 1    AMDGPU Queue 1:1 (QID 1) HSA (Multi)  0      2      65536    0x00007ffff7f60000
  2    AMDGPU Queue 1:2 (QID 2) DMA                        1048576  0x00007ffde4e00000
  3    AMDGPU Queue 1:3 (QID 0) HSA (Multi)  4      4      262144   0x00007ffff7f00000

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous queue IDs using the qualified inferior-num.queue-num format. Otherwise, only queue-num is shown.

info dispatches [-full] [dispatch-id-list]

The info dispatches command lists the following information for each heterogeneous dispatch (in this order):

  1. The heterogeneous dispatch ID. See qualified heterogeneous entity numbers.
  2. The target system’s identifier of the dispatch packet that initiated the dispatch.
  3. The x, y, and z heterogeneous grid dimensions, in that order, in terms of work-items of the heterogeneous dispatch. The number of dimensions displayed matches the dimensionality of the heterogeneous dispatch.
  4. The x, y, and z heterogeneous work-group dimensions, in that order, in terms of work-items of the heterogeneous dispatch. The number of dimensions displayed matches the dimensionality of the heterogeneous dispatch.
  5. The fences associated with the heterogeneous dispatch packet. The kinds of fences are target architecture dependent.
  6. The size in bytes of address spaces associated with the heterogeneous dispatch. The address spaces are target architecture dependent. Only displayed if the -full option is specified.
  7. The global memory address of the kernel descriptor for the heterogeneous dispatch. The descriptor is target architecture dependent. Only displayed if the -full option is specified.
  8. The global memory address of the kernel arguments for the heterogeneous dispatch. Only displayed if the -full option is specified.
  9. The global memory address of the completion event for the heterogeneous dispatch. Omitted if the heterogeneous dispatch has no completion event. The event is target architecture and operating system dependent. Only displayed if the -full option is specified.
  10. The kernel function prototype.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous dispatch number indicates the heterogeneous dispatch executing the current thread.

With no arguments, the command displays information about all heterogeneous dispatches. You can specify the list of heterogeneous dispatches that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info dispatches -full
  Id   Target Id                      Grid      Workgroup Fence   Address Spaces          Kernel Descriptor  Kernel Args        Completion Signal  Kernel Function
* 1    AMDGPU Dispatch 1:1:1 (PKID 0) [256,1,1] [128,1,1] B|As    Shared(0), Private(220) 0x00007ffde5409800 0x00007ffff7e00000 (nil)              bit_extract_kernel(unsigned int*, unsigned int const*, unsigned long)

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous dispatch IDs using the qualified inferior-num.dispatch-num format. Otherwise, only dispatch-num is shown.
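The Grid and Workgroup columns are both expressed in work-items, so the number of work-groups in each dimension follows from a rounded-up division. The Python sketch below shows that arithmetic for the sample dispatch above. The helper names are hypothetical, and the value of 64 lanes per thread is an assumption taken from the 64-lane totals shown in the info threads example, not a property of every target.

```python
from math import prod


def workgroup_counts(grid, workgroup):
    """Number of work-groups per dimension (integer ceiling division)."""
    return [-(-g // w) for g, w in zip(grid, workgroup)]


def threads_per_workgroup(workgroup, lanes_per_thread):
    """Threads needed to hold one work-group when each thread has a fixed
    number of lanes (assumed here; the real mapping is target dependent)."""
    return -(-prod(workgroup) // lanes_per_thread)


if __name__ == "__main__":
    grid = [256, 1, 1]        # Grid column of the sample dispatch
    workgroup = [128, 1, 1]   # Workgroup column of the sample dispatch
    print(workgroup_counts(grid, workgroup))
    print(threads_per_workgroup(workgroup, 64))
```

Under these assumptions the sample dispatch has two work-groups of two threads each, which is consistent with the four AMDGPU threads listed in the info threads example.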

agent find regexp
queue find regexp
dispatch find regexp

These commands operate the same way as the ‘thread find’ command (see thread find) except that they use the target system’s heterogeneous agent, queue, and dispatch identifiers respectively.

info packets [queue-id-list]

Display information about the heterogeneous packets on one or more heterogeneous queues. With no arguments, the command displays information about all heterogeneous queues. You can specify the list of heterogeneous queues that you want to display using the heterogeneous queue ID list syntax (see heterogeneous entity ID list).

Since heterogeneous agents may be processing heterogeneous packets asynchronously, the display is at best a snapshot, and may be inconsistent due to the heterogeneous queues being updated while they are being inspected.

The heterogeneous packets are listed contiguously for each heterogeneous agent, and for each heterogeneous queue of that heterogeneous agent, with the oldest packet first.

ROCGDB displays for each heterogeneous packet (in this order):

  1. The associated heterogeneous agent ID. See qualified heterogeneous entity numbers.
  2. The associated heterogeneous queue ID. See qualified heterogeneous entity numbers.
  3. The packet position in the heterogeneous queue, with the oldest one being 1.
  4. Additional information about the heterogeneous packet that varies depending on the heterogeneous system and may vary depending on the target architecture of the heterogeneous entity (see Architectures).

info threads [-gid] [thread-id-list]

The info threads command (see Threads) lists the threads created on all the heterogeneous agents.

If any of the threads listed have multiple heterogeneous lanes, then an additional Lanes column is displayed before the target system’s thread identifier (systag) column. For threads that have multiple heterogeneous lanes, the number of heterogeneous lanes that are active followed by a slash and the total number of heterogeneous lanes of the current frame of the thread is displayed. Otherwise, nothing is displayed.

The target system’s thread identifier (systag) (see target system thread identifier) for threads associated with heterogeneous dispatches varies depending on the heterogeneous system and target architecture of the heterogeneous agent. However, it typically will include information about the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position within the heterogeneous dispatch, and thread position within the heterogeneous work-group. See Architectures.

The stack frame summary displayed is for the active lanes of the thread. This may differ from the stack frame information for the current lane if the focus is on an inactive lane. Use the info lanes command for information about individual lanes of a thread. See Threads.

For example,

(gdb) info threads
  Id  Lanes  Target Id                                       Frame
  1          Thread 0x7ffff7fc4cc0 (LWP 74764) "bit_extract" 0x00007ffff6b56f37 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
  2          Thread 0x7ffff59cb700 (LWP 74773) "bit_extract" 0x00007ffff6b696d7 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
  4          Thread 0x7ffff7fc1700 (LWP 74775) "bit_extract" 0x00007ffff6b696d7 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
* 5   62/64  AMDGPU Thread 1:1:1:1 (0,0,0)/0 "bit_extract"   bit_extract_kernel (C_d=0x7ffde8800000, A_d=0x7ffde8e00000, N=4000000) at bit_extract.cpp:38
  6   2/64   AMDGPU Thread 1:1:1:2 (0,0,0)/1 "bit_extract"   __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462
  7   64/64  AMDGPU Thread 1:1:1:3 (1,0,0)/0 "bit_extract"   __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462
  8   8/64   AMDGPU Thread 1:1:1:4 (1,0,0)/1 "bit_extract"   __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462

thread [-gid] thread-id [lane-index]

The thread command has an optional lane-index argument to specify the heterogeneous lane index. If the value is not between 1 and the number of heterogeneous lanes of the current frame of the thread, then ROCGDB will print an error. If omitted, it defaults to 1.

The current thread is set to thread-id and the current heterogeneous lane is set to the heterogeneous lane corresponding to the specified heterogeneous lane index.

If the thread has multiple heterogeneous lanes, ROCGDB responds by displaying the system identifier of the heterogeneous lane you selected, otherwise it responds with the system identifier of the thread you selected, followed by its current stack frame summary.

thread apply [thread-id-list | all [-ascending]] [flag]… command
taas [option]… command
tfaas [option]… command
thread name [name]
thread find regexp

These commands operate the same way for all threads, regardless of whether or not the thread is associated with a heterogeneous dispatch.

If the thread’s frame has multiple heterogeneous lanes, then heterogeneous lane index 1 is used. Use the heterogeneous lane counterpart commands if you want to perform the command on each lane of a thread.

See Threads.

info lanes [-gid] [lane-id-list]

Display information about one or more heterogeneous lanes. With no arguments, the command displays information about all heterogeneous lanes. You can specify the list of heterogeneous lanes that you want to display using the heterogeneous lane ID list syntax (see heterogeneous entity ID list).

ROCGDB displays for each heterogeneous lane (in this order):

  1. The heterogeneous lane ID assigned by ROCGDB. See qualified heterogeneous entity numbers.
  2. The global heterogeneous lane ID, if the -gid option was specified. See global heterogeneous entity numbers.
  3. The thread number assigned by ROCGDB for the thread that contains the heterogeneous lane. This is displayed as a global thread number if the -gid option was specified, otherwise as a per-inferior thread number. If the thread has multiple heterogeneous lanes then this is followed by a slash and the heterogeneous lane index of the heterogeneous lane within the thread with the first lane being 1.
  4. An indication of whether the heterogeneous lane is active or inactive.
  5. The target system’s heterogeneous lane identifier (lane_systag). This varies depending on the system and target architecture of the heterogeneous agent. However, for heterogeneous agents it typically will include information about the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position within the heterogeneous dispatch, and position of the heterogeneous lane in the heterogeneous work-group. See Architectures.
  6. The heterogeneous lane’s name, if one is assigned by the user (see lane name, below).
  7. The current stack frame summary for that heterogeneous lane. If the heterogeneous lane is inactive this is the source position at which the heterogeneous lane will resume.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous lane number indicates the current heterogeneous lane.

For example,

(gdb) info lanes
  Id  Thread  Active  Target Id                                          Frame
* 1   4       Y       process 35 thread 13                               main (argc=1, argv=0x7ffffff8)
  2   5/2     Y       AMDGPU Lane 1:2:3:463/2 (2,3,4) work-item(1,2,4)   0x34e5 in saxpy ()
  3   6/12    N       AMDGPU Lane 1:2:4:456/12 (2,4,4) work-item(1,2,3)  0x34e5 in saxpy ()

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous lane IDs using the qualified inferior-num.lane-num format. Otherwise, only lane-num is shown.

If you specify the -gid option, ROCGDB displays a column indicating each heterogeneous lane’s global heterogeneous lane ID, and displays the thread’s global thread number:

(gdb) info lanes -gid
  Id   GId  Thread  Active  Target Id                                          Frame
* 1.1  1    4       Y       process 35 thread 13                               main (argc=1, argv=0x7ffffff8)
  1.2  3    5/2     Y       AMDGPU Lane 1:2:3:463/2 (2,3,4) work-item(1,2,4)   0x34e5 in saxpy ()
  2.1  1    4       Y       process 65 thread 1                                main (argc=1, argv=0x7ffffff8)
  2.2  4    6/12    N       AMDGPU Lane 1:2:4:456/12 (2,4,4) work-item(1,2,3)  0x34e5 in saxpy ()

lane [-gid] lane-id

Make heterogeneous lane ID lane-id the current heterogeneous lane and the thread that contains the heterogeneous lane the current thread. The command argument lane-id is the ROCGDB heterogeneous lane ID: if the -gid option is given it is a global heterogeneous lane identifier, as shown in the second field of the info lanes -gid display; otherwise it is a per-inferior heterogeneous lane identifier, with or without an inferior qualifier (e.g., ‘2.1’ or ‘1’), as shown in the first field of the info lanes display.

ROCGDB responds by displaying the system identifier of the heterogeneous lane you selected, and its current stack frame summary:

(gdb) lane 2
[Switching to lane 2 (Thread 0xb7fdab70 (LWP 12747))]
#0  some_function (ignore=0x0) at example.c:8
8	    printf ("hello\n");

As with the ‘[New …]’ message, the form of the text after ‘Switching to’ depends on your system’s conventions for identifying heterogeneous lanes.

lane name [name]

This command assigns a name to the current heterogeneous lane. If no argument is given, any existing user-specified name is removed. The heterogeneous lane name appears in the info lanes display.

lane find [regexp]

Search for and display the heterogeneous lane IDs whose name or lane_systag matches the supplied regular expression. The syntax of the regular expression is that specified by Python’s regular expression support.

As well as being the complement to the lane name command, this command also allows you to identify a heterogeneous lane by its target lane_systag. For instance, on the AMD ROCm platform, the target lane_systag is the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position and heterogeneous work-item position.

(gdb) lane find "work-group(2,3,4)"
Lane 2 has lane id 'ROCm process 35 agent 1 queue 2 dispatch 3 work-group(2,3,4) work-item(1,2,4)'
(gdb) info lanes 2
  Id  Thread  Active  Target Id                                Frame
  2   5/2     Y       AMDGPU Lane 1:2:3:324/2 (2,3,4)[1,2,4]   0x34e5 in saxpy ()

lane apply [lane-id-list | all [-ascending]] [flag]… command
laas [option]… command
lfaas [option]… command

The lane apply, laas, and lfaas commands are similar to their thread counterparts thread apply, taas, and tfaas respectively, except that they operate on heterogeneous lanes. See Threads.

backtrace [option]… [qualifier]… [count]
frame [ frame-selection-spec ]
frame apply [all | count | -count | level level…] [option]… command
select-frame [ frame-selection-spec ]
up-silently n
down-silently n
info frame
info args [-q] [-t type_regexp] [regexp]
info locals [-q] [-t type_regexp] [regexp]
faas command

The frame commands apply to the current heterogeneous lane.

If the frame is switched from one that has multiple heterogeneous lanes to one with fewer (including only one) then the current lane is switched to the heterogeneous lane corresponding to the highest heterogeneous lane index of the new frame and ROCGDB responds by displaying the system identifier of the heterogeneous lane selected.

See Examining the Stack.

set libthread-db-search-path
show libthread-db-search-path
set debug libthread-db
show debug libthread-db

These commands only apply to threads created on the heterogeneous host agent that are not associated with a heterogeneous dispatch. There are no commands that support reporting of heterogeneous dispatch thread events.

x/i
display/i

The x/i and display/i commands (see Examining Memory) can be used to disassemble machine instructions. They use the current target architecture.

disassemble

The disassemble command (see Source and Machine Code) can also be used to disassemble machine instructions. If the start address of the range is within a loaded code object, then the target architecture of the code object is used. Otherwise, the current target architecture is used.

info registers
info all-registers
maint print reggroups

The register commands display information about the current architecture.

print

The print command evaluates the source language expression in the context of the current heterogeneous lane.

step
next
finish
until
stepi
nexti

If the current heterogeneous lane is set to an inactive heterogeneous lane, then the step, next, finish and until commands (see Continuing and Stepping) may cause other heterogeneous lanes of the same thread to advance so that the current heterogeneous lane becomes active. This may result in other heterogeneous lanes completing whole functions.

If the current heterogeneous lane is set to an inactive heterogeneous lane, then the stepi and nexti commands (see Continuing and Stepping) may not cause the source position to appear to move until execution reaches a point that makes the current heterogeneous lane active. However, other heterogeneous lanes of the same thread will advance.

break [-lane lane-index] [location] [if cond]
tbreak [-lane lane-index] [location] [if cond]
hbreak [-lane lane-index] [location] [if cond]
thbreak [-lane lane-index] [location] [if cond]
rbreak [-lane lane-index] regex
info breakpoints [list]
watch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
rwatch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
awatch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
info watchpoints [list]
catch [-lane lane-index] event
tcatch [-lane lane-index] event

When a breakpoint, watchpoint, or catchpoint (see Breakpoints; Watchpoints; and Catchpoints) is hit by a frame of a thread with multiple heterogeneous lanes, each active lane is treated independently:

If a heterogeneous lane causes a thread to halt, then the other heterogeneous lanes of the thread will no longer execute even if in non-stop mode.

For break, watch, catch, and their variants, the -lane lane-index option can be specified. This limits ROCGDB to only process breakpoints if the heterogeneous lane has a heterogeneous lane index that matches lane-index.

If any breakpoint has a lane-index specified, the info break and info watch commands add a Lane column before the Address column that displays the heterogeneous lane index.

maint print address-spaces [file]

maint print address-spaces displays the address space names supported by each target architecture. The optional argument file names the file to which the information is written.

The address spaces info looks like this:

(gdb) maint print address-spaces
 Class      Arch
 global     All
 group      AMDGPU
 private    AMDGPU
 generic    AMDGPU

The global address space corresponds to the default global virtual memory address space and is available for all target architectures.

Every address entered or displayed can optionally specify the address space qualifier by appending an ‘@’ followed by an address space name. ROCGDB will print an error if the address space name is not supported by the current architecture.

For example,

(gdb) x/x 0x10021608@group
0x10021608@group:     0x0022fd98

If there is no current thread then the only address space that can be specified is global.

If entering an address and no address space is specified, the global address space is used.

If an address is displayed, the address space qualifier is omitted for the global address space.
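The qualifier rules above (an optional ‘@’ suffix, global as the default on input, and omission of the qualifier for global on output) can be summarized in a short sketch. The Python functions below are hypothetical helpers written for illustration and are not part of ROCGDB. The set of supported names is taken from the sample maint print address-spaces output and would differ by architecture.

```python
# Address space names from the sample "maint print address-spaces" output.
SUPPORTED = ("global", "group", "private", "generic")


def parse_address(text, supported=SUPPORTED):
    """Split 'ADDRESS[@SPACE]' into (address, space); default to global."""
    addr_text, sep, space = text.partition("@")
    if not sep:
        space = "global"  # no qualifier entered: the global space is used
    if space not in supported:
        # Mirrors the documented behaviour of reporting an error for an
        # address space the current architecture does not support.
        raise ValueError(f"unsupported address space: {space!r}")
    return int(addr_text, 0), space


def format_address(addr, space):
    """Render an address, omitting the qualifier for the global space."""
    text = f"0x{addr:08x}"
    return text if space == "global" else f"{text}@{space}"


if __name__ == "__main__":
    addr, space = parse_address("0x10021608@group")
    print(format_address(addr, space))
    print(format_address(*parse_address("0x1000")))
```

The sketch treats the qualifier purely as text. In the debugger, whether a given name is accepted also depends on the current thread and architecture, as described above.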

Heterogeneous systems often have very large numbers of threads. Breakpoint conditions can be used to limit the number of threads reporting breakpoint hits. For example,

break kernel_foo if $_streq($_lane_workgroup_pos, "(0,0,0)")

The tbreak command can be used so only one heterogeneous lane will report the breakpoint. Before continuing execution, the breakpoint will need to be set again if necessary.

The set scheduler-locking on command (see Non-Stop Mode) together with the -lane breakpoint option can be used to lock ROCGDB to only resume the current thread, and only report breakpoints for a fixed heterogeneous lane index. This avoids the overhead of resuming a large number of threads each time execution resumes from a breakpoint, and also avoids the focus being switched to other threads that hit the breakpoints. Note, however, that other threads will not be executed.

The scheduler locking commands can also be helpful to prevent ROCGDB switching to other threads while concentrating on debugging one particular thread. Non-stop mode can be helpful to prevent the continue command from resuming other threads that are intentionally halted or from cancelling a single step command that is in progress by another thread and resuming it instead. See Non-Stop Mode.

