Devlog 24 Checking Numbers

December 25, 2022

Log 24
Checking numbers
Closing thoughts

Log 24

What better way to spend Christmas afternoon than to write a cool function in RISC-V Assembly? (Don’t answer that.)

Checking numbers

One feature of Forth is its ability to check if a string token is actually a literal number, and then store it in memory as such.

I read quite a bit about this and decided to write my own simple number routine just for this use case. This feature seems to be missing from sectorforth and derzforth, which is fine, but I kind of feel like it’s quite important, and much better than doing something like:

: dup sp@ @ ;
: -1 dup dup nand dup dup nand nand ;
: 0 -1 dup nand ;
: 1 -1 dup + dup nand ;
: 2 1 1 + ;
... etc

What horror! My implementation doesn’t handle various bases like jonesforth does, but it’s a good start and can be used later in the interpreter’s main loop. The number function accepts 2 parameters: the W working register (a0 in the code below) holds the start address of the token buffer, and the X working register (a1) holds the length (in bytes) of the token. The function then returns the signed integer in W and a flag in X. The flag is either 1 for an OK result, or 0 for an ERROR result (e.g. if the token is not a number).

To start, I want numbers to be a maximum of 29 bits, which works out to at most 9 ASCII digits. Let’s start the number function:

number:
    li t1, 9                    # initialize temporary to 9: floor(log10(2^29)) + 1 = 8 + 1 = max 9 characters
    bgtu a1, t1, number_error   # if token is more than 9 characters, it's too long to be an integer

The reason for the 29 bit limit is that I actually want to store a number together with the first 3 flag bits. I’ll use the currently unused user-defined flag and set it to 1 if the value is a number, or 0 otherwise (the current default). This will make it super easy to identify a number in memory as opposed to a word’s memory address. For now, that means we’ll limit the actual size of a number to 29 bits instead of 32. We want to quickly exit the function if the token is too long, so we add a guard right at the start.
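The arithmetic behind the 29 bit and 9 character limits can be checked on a host machine. This is a minimal C sketch, not part of FiveForths: the macro names and the decimal_digits helper are made up for illustration, and it only assumes that 3 of the 32 bits in a cell are reserved for flags, without saying which ones.

```c
#include <stdint.h>

/* Assumption for illustration: 3 of the 32 bits in a cell are flag bits,
   which leaves 29 bits for the value. Which bits they are is not modelled. */
#define FLAG_BITS   3u
#define VALUE_BITS  (32u - FLAG_BITS)            /* 29 */
#define VALUE_MAX   ((1u << VALUE_BITS) - 1u)    /* largest 29 bit value */

/* Count how many decimal digits are needed to print an unsigned value. */
static unsigned decimal_digits(uint32_t v) {
    unsigned n = 1;
    while (v >= 10u) {
        v /= 10u;
        n++;
    }
    return n;
}
```

Calling decimal_digits(VALUE_MAX) gives the digit count that the guard compares the token length against.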

Next, we want to initialize a few temporaries to some important values that we’ll use throughout our number conversion loop:

    mv t0, zero                 # initialize temporary to 0: holds the final integer
    li t3, CHAR_MINUS           # initialize temporary to minus character '-'
    mv t4, zero                 # initialize temporary to 0: sign flag of integer
    li t5, 10                   # initialize temporary to 10: multiplier used to convert the number

Next, we want to check if the first character in the token is a minus sign (0x2D). This tells us the number will be negative, so let’s keep track of that if it is negative, or jump to our digit checking loop if it’s positive:

    lbu t2, 0(a0)               # load first character from the address held in W
    bne t2, t3, number_digit    # jump to number digit loop if the first character is not a minus sign
    # first character is a minus sign, so the number will be negative
    li t4, 1                    # number is negative, store a 1 flag in temporary
    addi a0, a0, 1              # increment buffer address by 1 character
    addi a1, a1, -1             # decrease buffer size by 1
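For readers more comfortable with C, the sign handling above can be sketched like this. The function name consume_sign is hypothetical, and the extra check that the length is non-zero is my own addition for safety on the host; the assembly above does not have it.

```c
#include <stddef.h>

/* If the token starts with '-', skip it and report a negative number.
   Returns 1 for negative, 0 otherwise. The pointer and length are only
   changed when a minus sign was consumed. */
static int consume_sign(const char **buf, size_t *len) {
    if (*len > 0 && **buf == '-') {
        (*buf)++;   /* same idea as: addi a0, a0, 1  */
        (*len)--;   /* same idea as: addi a1, a1, -1 */
        return 1;
    }
    return 0;
}
```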

Now we enter our digit checking loop, which performs a few validations on the character we’ve loaded. The first thing to do in the loop is exit if the remaining buffer size is 0:

number_digit:
    beqz a1, number_done        # if the size of the buffer is 0 then we're done

Next, we know the ASCII code of the digit 0 is 0x30, so we’ll subtract that from the loaded character and then check the result. If the character is a digit, subtracting 0x30 gives us its actual value between 0 and 9. Anything else lands outside that range:

    lbu t2, 0(a0)               # load next character into temporary
    addi t2, t2, -0x30          # subtract 0x30 from the character
    bltz t2, number_error       # check if character is lower than 0, if yes then error
    bgtu t2, t1, number_error   # check if character is greater than 9, if yes then error

See there, we load the character, subtract 0x30, and then check if it’s less than 0, or more than 9. In both cases it’s an error and we jump to the number_error handler. Otherwise we’ve got a valid digit and we can continue:

    mul t0, t0, t5              # multiply previous number by 10 (base 10)
    add t0, t0, t2              # add previous number to current digit

Here we’re multiplying the previous value by 10, because we’re using base 10 numbers and we want to essentially add a zero to the right of that digit. Then we add the loaded digit to that. Example: if we have ASCII characters “12”, then it’ll become decimal 1, then 10 (after multiplying by 10), then 12 (after adding 2). Easy!
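The validate-then-accumulate step can be modelled in C as a rough sketch. The name accumulate_digit is hypothetical. One difference from the assembly: the C version uses a single unsigned comparison, because a character below 0x30 wraps around to a very large unsigned value and is rejected by the same test. As far as I can tell the bgtu in the assembly has the same effect, so the separate bltz is harmless but not strictly needed.

```c
#include <stdint.h>

/* Try to fold one character into the running total.
   Returns 1 and updates *acc if c is '0'..'9', otherwise returns 0
   and leaves *acc untouched. */
static int accumulate_digit(int32_t *acc, unsigned char c) {
    uint32_t d = (uint32_t)c - 0x30u;   /* wraps around for c < '0' */
    if (d > 9u) {
        return 0;                       /* not a decimal digit */
    }
    *acc = *acc * 10 + (int32_t)d;      /* shift left one decimal place, add digit */
    return 1;
}
```

Feeding it '1' and then '2' walks the total through 1 and then 12, matching the example above.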

Next we’re simply moving the pointer for the token buffer and decreasing the buffer size, before looping again:

    addi a0, a0, 1              # increment buffer address by 1 character
    addi a1, a1, -1             # decrease buffer size by 1
    j number_digit              # loop to check the next character

Now let’s assume we had an error, for example the token was “2abc”. Then we’ll end up here:

number_error:
    li a1, 0                    # number is too large or not an integer, return 0
    ret

All it does is return 0 (or false) indicating an error.

If it wasn’t an error, then we’ll end up here:

number_done:
    beqz t4, number_store       # don't negate the number if it's positive
    neg t0, t0                  # negate the number using two's complement

This does two things. First it checks if our number was positive or negative, using the flag we set early in the number function. If it is negative, it negates the number using two’s complement and falls through to the next label. Otherwise it jumps straight there:

number_store:
    li t1, (1 << 29) - 1        # largest acceptable number size: 29 bits
    bgt t0, t1, number_error    # check if the signed number is larger than 29 bits
    mv a0, t0                   # copy final number to W working register
    li a1, 1                    # number is an integer, return 1
    ret

That’s the final part of the function. It first loads the largest value of a 29 bit number. The constant is written with a shift because, assuming the GNU assembler is used, ^ in an expression means bitwise XOR rather than exponentiation, so (2^29)-1 would evaluate to 30. It then performs a signed compare with the final number. If the number is larger than the limit, we return an error. Otherwise we copy the number to the W register and return 1 in the X register.
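To sanity check the whole routine without flashing a board, here is a small host-side reference model in C. It is only a sketch of the steps described in this log, not the FiveForths code: number_model and number_result are hypothetical names, the limit is taken to be 2 to the power of 29, minus 1, and the up-front rejection of an empty token is my own addition so the model never reads past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMBER_MAX ((int32_t)((1u << 29) - 1u))   /* 536870911 */

typedef struct {
    int32_t value;  /* W on return, only meaningful when ok is 1 */
    int     ok;     /* X on return: 1 = OK, 0 = ERROR */
} number_result;

static number_result number_model(const char *buf, size_t len) {
    number_result err = {0, 0};
    int32_t acc = 0;
    int negative = 0;

    /* Deviation from the assembly: reject an empty token explicitly. */
    if (len == 0 || len > 9) {
        return err;
    }
    /* Optional leading minus sign (checked after the length guard). */
    if (buf[0] == '-') {
        negative = 1;
        buf++;
        len--;
    }
    /* Digit loop: validate each character, then accumulate in base 10. */
    while (len > 0) {
        uint32_t d = (uint32_t)(unsigned char)*buf - 0x30u;
        if (d > 9u) {
            return err;
        }
        acc = acc * 10 + (int32_t)d;
        buf++;
        len--;
    }
    if (negative) {
        acc = -acc;
    }
    /* Only the upper bound is checked, mirroring the steps above. */
    if (acc > NUMBER_MAX) {
        return err;
    }
    return (number_result){acc, 1};
}
```

With this model, “12” and “-42” come back as OK, while “2abc” and the 10 character “1234567890” come back as errors. Two edge cases fall out of following the steps literally: because the length guard runs before the minus sign is consumed, a negative number can only have 8 digits, and a lone “-” is reported as an OK 0. Both might be worth a look when wiring the routine into the interpreter.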

Closing thoughts

This was surprisingly fun and easy to write, and I’m actually surprised that it works as expected (I think?). In the next session, I’ll focus on adding that to the interpreter’s main loop and use it to validate tokens and store them correctly in memory.
