Debugging SiFive HiFive Unmatched via JTAG

A mysterious bug

Debugging user-space applications with GDB is commonplace. But when boot firmware fails, things get a bit trickier.

With a U-Boot v2024.01 build, booting my RISC-V Unmatched board failed with the following output:

U-Boot SPL 2024.01+dfsg-1ubuntu1 (Feb 01 2024 - 13:52:26 +0000)
Trying to boot from MMC1

U-Boot 2024.01+dfsg-1ubuntu1 (Feb 01 2024 - 13:52:26 +0000)

CPU: rv64imafdc
Model: SiFive HiFive Unmatched A00
DRAM: 16 GiB
__wrpll_calc_filter_range: post-divider reference freq out of range: 4294967295
__wrpll_calc_filter_range: post-divider reference freq out of range: 4294967295
__wrpll_calc_filter_range: post-divider reference freq out of range: 4294967295
__wrpll_calc_filter_range: post-divider reference freq out of range: 4294967295
initcall failed at call 00000000fff5432e (err=-34)
### ERROR ### Please RESET the board ###

From the source code I saw that this must be related to the SiFive clock driver, so I added some printf() statements to understand what was going on. Unexpectedly, with the printf() statements in place the error no longer occurred! I needed a way to single-step through the unchanged code.

JTAG on the Unmatched board

JTAG is a standard for on-chip and on-board test circuitry. The Unmatched board provides access to it via the same USB connector as the serial console.

OpenOCD, the Open On-Chip Debugger, is a software tool for accessing JTAG. It can be installed with:

sudo apt-get install openocd

To use it, a configuration file describing the board is needed. The script provided in SiFive’s SDK generated some errors. In the end, this configuration worked for me:

adapter speed   10000
adapter driver  ftdi

ftdi device_desc "Dual RS232-HS"
ftdi vid_pid 0x0403 0x6010
ftdi layout_init 0x0008 0x001b
ftdi layout_signal nSRST -oe 0x0020 -data 0x0020

set _CHIPNAME riscv
transport select jtag
jtag newtap $_CHIPNAME cpu -irlen 5

# Target: S7 (coreid 0) and U74 (coreid 1-4)
target create $_CHIPNAME.cpu0 riscv -chain-position $_CHIPNAME.cpu -coreid 0 -rtos hwthread
target create $_CHIPNAME.cpu1 riscv -chain-position $_CHIPNAME.cpu -coreid 1
target create $_CHIPNAME.cpu2 riscv -chain-position $_CHIPNAME.cpu -coreid 2
target create $_CHIPNAME.cpu3 riscv -chain-position $_CHIPNAME.cpu -coreid 3
target create $_CHIPNAME.cpu4 riscv -chain-position $_CHIPNAME.cpu -coreid 4
target smp $_CHIPNAME.cpu0 $_CHIPNAME.cpu1 $_CHIPNAME.cpu2 $_CHIPNAME.cpu3 $_CHIPNAME.cpu4

init 
halt

Debugging

When OpenOCD is running, it offers several ports to connect to:

$ openocd -f openocd.cfg
Info : starting gdb server for riscv.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections

I needed U-Boot to stop at a suitable point from which to start debugging. Adding an endless loop to the probe function of the clock driver gave me a defined entry point. As the error occurred only in main U-Boot and not in the SPL phase, I wrapped the loop in an #ifndef.

static int sifive_prci_probe(struct udevice *dev)
{
        int i, err;
        struct __prci_clock *pc;
        struct __prci_data *pd = dev_get_priv(dev);

        struct prci_clk_desc *data =
                (struct prci_clk_desc *)dev_get_driver_data(dev);

#ifndef CONFIG_SPL_BUILD
        /* Spin here until the debugger moves the PC past the jump. */
        asm(
                "loop:\n"
                "j loop\n"
        );
#endif

GDB is attached to the OpenOCD port with:

gdb-multiarch u-boot -ex 'target extended-remote localhost:3333'

The Unmatched board has multiple harts. Only a randomly chosen one is running in U-Boot. We need to know which:

(gdb) info thread
  Id   Target Id                                                      Frame
* 1    Thread 1 "riscv.cpu0" (Name: riscv.cpu0, state: debug-request) 0x000000008000c182 in ?? ()
  2    Thread 2 "riscv.cpu1" (Name: riscv.cpu1, state: debug-request) 0x00000000800007e0 in ?? ()
  3    Thread 3 "riscv.cpu2" (Name: riscv.cpu2, state: debug-request) 0x00000000800007e0 in ?? ()
  4    Thread 4 "riscv.cpu3" (Name: riscv.cpu3, state: debug-request) 0x00000000fffb239e in ?? ()
  5    Thread 5 "riscv.cpu4" (Name: riscv.cpu4, state: debug-request) 0x000000008000c182 in ?? ()

The address of thread 4 (riscv.cpu3, that is, the hart with core ID 3) sticks out. This must be the one running in U-Boot, which has relocated itself to just below 4 GiB.

(gdb) thread 4
[Switching to thread 4 (Thread 4)]
#0  0x00000000fffb239e in ?? ()

As U-Boot has relocated itself, the relocation address is needed to display the source code in the debugger. U-Boot's global data structure has a corresponding field. On RISC-V, U-Boot stores the pointer to the global data in the gp register.

(gdb) p/x *(gd_t *)$gp
$1 = {bd = 0xff731f90, flags = 0x301, baudrate = 0x1c200, cpu_clk = 0x0,
  bus_clk = 0x0, pci_clk = 0x0, mem_clk = 0x0, have_console = 0x1,
  env_addr = 0xfffd0e48, env_valid = 0x1, env_has_init = 0x400, env_load_prio = 0x0,
  ram_base = 0x80000000, ram_top = 0x100000000, relocaddr = 0xfff52000,
  ram_size = 0x400000000, mon_len = 0xad808, irq_sp = 0x0, start_addr_sp = 0xff72a900,
  reloc_off = 0x7fd52000, new_gd = 0xff731df0, dm_root = 0xff732050,
  dm_root_f = 0x801f9030, uclass_root_s = {next = 0xff733eb0, prev = 0xff732030},
  uclass_root = 0xff731ea8, timer = 0x0, fdt_blob = 0xff72a910, new_fdt = 0xff72a910,
  fdt_size = 0x74e0, fdt_src = 0x0, jt = 0x0, env_buf = {0x31, 0x31, 0x35, 0x32, 0x30,
    0x30, 0x0 <repeats 26 times>}, timebase_h = 0x0, timebase_l = 0x0,
  malloc_base = 0x801f9000, malloc_limit = 0x3000, malloc_ptr = 0x910, hose = 0x0,
  pci_ram_top = 0x0, cur_serial_dev = 0x801f9820, arch = {boot_hart = 0x3,
    firmware_fdt_addr = 0x802a5f40, available_harts = 0x8, smbios_start = 0x0},
  smbios_version = 0x0, event_state = {spy_head = {next = 0xff731f70,
      prev = 0xff731f70}}, dmtag_list = {next = 0xff731f80, prev = 0xff731f80}}

relocaddr = 0xfff52000 is the value required. Now we need to inform GDB:

(gdb) add-symbol-file u-boot 0xfff52000
add symbol table from file "u-boot" at
        .text_addr = 0xfff52000
(y or n) y
Reading symbols from u-boot...

We are still stuck in the endless loop. To get out of it, we either use GDB’s jump command or increment the program counter past the jump instruction, which is 2 bytes long here.

(gdb) set $pc += 2

The source code is displayed with:

(gdb) layout src

Now we are set for debugging.

The outcome

It turned out that the in-memory device tree was corrupted. Using a watchpoint, I was able to track down where this occurred and provide a patch.
