FiveForths

32-bit RISC-V Forth for microcontrollers

Devlog 20 Interpreter

December 15, 2022

  1. Log 20
  2. Fixme
  3. Error handling
  4. More initialization
  5. Interpreter
  6. Closing thoughts

Log 20

I didn’t realize but it’s been exactly 1 month since I’ve reloaded this project and started hacking on it again. I’m quite happy with the progress so far, even though I only dedicate a couple hours per session to this project.

This session’s focus is on starting to make something that looks like a Forth interpreter.

Fixme

Before I start on the interpreter, let’s get rid of a few # FIXME comments in the code (and actually fix them).

The first are a couple of missing bounds checks in COLON. First we want to ensure our token size is never larger than 32 characters because we only have 5 bits to store the size (2^5).

    li t0, 32           # load max token size  (2^5 = 32) in temporary
    bgtu a1, t0, error  # error if token size is greater than 32

Next, when we store a word, we need 3 available cells (link, hash, codeword). We want to check this before we update any important values in memory:

    # bounds check on new word memory location
    addi t4, t2, 3*CELL # prepare to move the HERE pointer to the end of the word
    li t5, PAD          # load out of bounds memory address (PAD)
    bgt t4, t5, error   # error if the memory address is out of bounds

Incidentally, the out of bounds memory address is where our PAD starts, and the space required for the word will end exactly at the new HERE address.

Finally, we’ll repeat this bounds check but only with 1 CELL, for the exit memory location (SEMI):

    # bounds check on the exit memory location
    addi t2, t2, CELL   # prepare to move the HERE pointer by 1 CELL
    li t3, PAD          # load out of bounds memory address (PAD)
    bgt t2, t3, error   # error if the memory address is out of bounds

Well in total those were a lot of changes for bounds check (and fixing a bug I discovered), but before we get to the interpreter let’s add some UART code to the error function (another # FIXME).

Error handling

I didn’t try very hard for this one and simply copied exactly what derzforth does:

# print an error message to the UART
error:
    li a0, ' '
    call uart_put
    li a0, '?'
    call uart_put
    li a0, '\n'
    call uart_put

    j reset

Once an error is hit, the next step is to reset everything, we need to reinitialize the stack pointers, variables, state, etc.. which is what we’ve done in the reset function, so let’s just jump there.

More initialization

In the reset function, there’s a bit more initialization required. The main thing we were missing is zero-filling the terminal input buffer (TIB):

tib_init:
    # initialize TOIN variable
    li t0, TIB          # load TIB memory address
    li t1, TOIN         # load TOIN variable
    li t2, TIB_TOP      # load TIB_TOP variable
    sw t0, 0(t1)        # initialize TOIN variable to contain TIB start address
tib_zerofill:
    # initialize the TIB
    beq t2, t0,tib_done # loop until TIB_TOP == TIB
    addi t2, t2, -CELL  # decrement TIB_TOP by 1 CELL
    sw zero, 0(t2)      # zero-fill the memory address
    j tib_zerofill      # repeat
tib_done:
    j interpreter       # jump to the main interpreter REPL

This is somewhat different from derzforth and sectorforth. In this case we’re starting from the top of the TIB (highest memory address), and filling it with 1 CELL (4 bytes on 32-bit RISC-V), decrementing the memory address and then looping until the entire TIB is filled with zeros.

Since we only allocated a stack size of 256 Bytes for the TIB, that equates to just 64 CELLs (i.e: 64 loop iterations) to clear it out. It’s reasonably fast.

We also need to make sure the TOIN variable gets reinitialized, so I just moved that part from reset to tib_init.

Interpreter

OK so we’ve cleared up all our # FIXMEs, now we can jump to the interpreter (haha, see what I did there?).

Here’s what I’ve got so far:

# here's where the program starts (the interpreter)
interpreter:
    call uart_get       # read a character from UART
    call uart_put       # send the character to UART

    # FIXME: validate the character
    j interpreter

Let’s see.. how should this work?

The first thing we want to do is read a character and echo it back (so we can see what we’re typing). Next, we want to validate the character by checking if it’s a comment, backspace, or newline, and printable character. At each point we’ll add the character to the terminal input buffer (TIB) until we’ve gotten a full word (token).

If we get a newline and we’re currently compiling a word then we’ll just ignore it until the semicolon is given. That will allow us to write multi-line definitions and even “upload” them via UART. If all is good, then we’ll jump into the process to validate the token, hash it, search for it in the dictionary, and either execute or compile based on the STATE variable or immediate status of the word.

Whew, that’s a mouthful but it’s pretty straightforward. I think I’ll need to write a lookup function but we’ll defer that for later.

Closing thoughts

So far quite a few changes were made in this session, a lot of code was re-organized and moved around, but everything still compiles and works perfectly so far (I think?).

I was going to get right into character validation, but I want to take a break to think more about this (and re-read my Forth books). I’ll get back to character validation in the next session.