Devlog 24 Checking Numbers
December 25, 2022
Log 24
What better way to spend Christmas afternoon than to write a cool function in RISC-V Assembly? (Don’t answer that.)
Checking numbers
One feature of Forth is its ability to check if a string token is actually a literal number, and then store it in memory as such.
I read quite a bit about this and decided to write my own simple number routine just for this use case. It seems this simple feature was missing in sectorforth and derzforth, which is fine but I kind of feel like it’s quite important, and much better than doing something like:
: dup sp@ @ ;
: -1 dup dup nand dup dup nand nand ;
: 0 -1 dup nand ;
: 1 -1 dup + dup nand ;
: 2 1 1 + ;
... etc
What horror! My implementation doesn’t handle various bases like in jonesforth, but it’s a good start and can be used later in the interpreter’s main loop. The number function accepts 2 parameters: the W working register (a0) holds the start address of the token buffer, and the X working register (a1) holds the length (in bytes) of the token. The function then returns the signed integer in W and a flag in X. The flag is either 1 for an OK result, or 0 for an ERROR result (e.g. if the token is not a number).
To start, I want numbers to be a maximum of 29 bits, which is at most 9 ASCII characters (2^29 = 536870912). Let’s start the number function:
number:
li t1, 9 # initialize temporary to 9: 2^29 = 536870912 has 9 digits, so max 9 characters
bgtu a1, t1, number_error # if token is more than 9 characters, it's too long to be an integer
The reason is that I actually want to store a number together with the 3 leading flag bits. I’ll use the currently unused user-defined flag and set it to 1 if the value is a number, or 0 otherwise (the current default). This will make it super easy to identify a number in memory as opposed to a word’s memory address. For now, that means we’ll limit the actual size of a number to 29 bits instead of 32. We want to exit the function quickly if the token is too long, so we add a guard right at the start.
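The 9-character limit can be checked with a few lines of Python, used here purely as a calculator and not as part of the Forth itself. The names MAX_29_BIT, MAX_DIGITS and passes_length_guard are made up for this sketch. It also shows a side effect of the guard: the minus sign counts toward the length, so a negative token gets at most 8 digits.

```python
# Largest magnitude that fits in 29 bits.
MAX_29_BIT = (1 << 29) - 1            # 536870911

# Number of decimal digits needed to write it down.
MAX_DIGITS = len(str(MAX_29_BIT))     # 9


def passes_length_guard(token: str) -> bool:
    """Model of the guard at the top of number: reject tokens over 9 characters."""
    return len(token) <= MAX_DIGITS


print(MAX_29_BIT, MAX_DIGITS)              # 536870911 9
print(passes_length_guard("536870911"))    # True: 9 digits
print(passes_length_guard("-99999999"))    # True: minus sign + 8 digits
print(passes_length_guard("-100000000"))   # False: minus sign + 9 digits
```

If 9-digit negative numbers ever matter, one option would be to apply the guard after the sign has been consumed.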
Next, we want to initialize a few temporaries to some important values that we’ll use throughout our number conversion loop:
mv t0, zero # initialize temporary to 0: holds the final integer
li t3, CHAR_MINUS # initialize temporary to minus character '-'
mv t4, zero # initialize temporary to 0: sign flag of integer
li t5, 10 # initialize temporary to 10: multiplier used to convert the number
Next, we want to check if the first character in the token is a minus sign (0x2D). This tells us the number will be negative, so we record that if it is, or jump straight to our digit checking loop if it isn’t:
lbu t2, 0(a0) # load first character from W working register
bne t2, t3, number_digit # jump to number digit loop if the first character is not a minus sign
# first character is a minus sign, so the number will be negative
li t4, 1 # number is negative, store a 1 flag in temporary
addi a0, a0, 1 # increment buffer address by 1 character
addi a1, a1, -1 # decrease buffer size by 1
beqz a1, number_error # a lone '-' has no digits left, so it's not a number
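The sign handling can be modelled in Python as a small helper. This is only a sketch: strip_sign is a hypothetical name, and a bytes object stands in for the token buffer and its length.

```python
CHAR_MINUS = 0x2D  # ASCII '-'


def strip_sign(token: bytes):
    """Return (negative, rest): the sign flag and the bytes still to be parsed."""
    if len(token) > 0 and token[0] == CHAR_MINUS:
        # Skip the minus sign: advance the pointer, shrink the length.
        return True, token[1:]
    return False, token


print(strip_sign(b"-12"))   # (True, b'12')
print(strip_sign(b"12"))    # (False, b'12')
print(strip_sign(b"-"))     # (True, b'')
```

The last case is worth keeping in mind: a lone "-" leaves nothing to parse, and is probably better treated as an error than as the number 0.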
Now we enter our digit checking loop, which performs a few validations on the character we’ve loaded. The first thing to do in the loop is exit once the remaining buffer length reaches 0:
number_digit:
beqz a1, number_done # if the size of the buffer is 0 then we're done
Next, we know the hex value of the 0 digit is 0x30, so we’ll subtract that from the loaded character and then check the result. If the character was a digit, subtracting 0x30 gives us its actual value between 0 and 9; anything else lands outside that range:
lbu t2, 0(a0) # load next character into temporary
addi t2, t2, -0x30 # subtract 0x30 from the character
bltz t2, number_error # if the result is below 0, it's not a digit: error
bgtu t2, t1, number_error # if the result is above 9 (t1 still holds 9), it's not a digit: error
See there, we load the character, subtract 0x30, and then check if it’s less than 0, or more than 9. In both cases it’s an error and we jump to the number_error handler. Otherwise we’ve got a valid digit and we can continue:
mul t0, t0, t5 # multiply previous number by 10 (base 10)
add t0, t0, t2 # add the current digit to the running total
Here we’re multiplying the previous value by 10, because we’re using base 10 numbers and we want to essentially add a zero to the right of that digit. Then we add the loaded digit to that. Example: if we have ASCII characters “12”, then the value becomes decimal 1, then 10 (after multiplying by 10), then 12 (after adding 2). Easy!
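To make the validate-then-accumulate step concrete, here is a minimal Python model of it. The helper names digit_value and accumulate are invented for this sketch, and the 32-bit mask assumes RV32-sized registers. It also shows that an unsigned "greater than 9" compare rejects characters below '0' on its own, because a negative result wraps around to a huge unsigned value.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def digit_value(ch: int):
    """Return 0..9 for an ASCII digit byte, or None for anything else."""
    d = (ch - 0x30) & MASK32          # wraps around like a register would
    return d if d <= 9 else None      # unsigned compare catches both sides


def accumulate(token: bytes):
    """Return (total, trace): the parsed value and the running total per digit."""
    total = 0
    trace = []
    for ch in token:
        d = digit_value(ch)
        if d is None:
            return None, trace        # same idea as jumping to number_error
        total = total * 10 + d        # shift left one decimal place, add the digit
        trace.append(total)
    return total, trace


print(accumulate(b"12"))     # (12, [1, 12])
print(accumulate(b"2abc"))   # (None, [2])
```

With at most 9 digits the running total never exceeds 999999999, so it cannot overflow a 32-bit register before the final range check.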
Next we’re simply moving the pointer for the token buffer and decreasing the buffer size, before looping again:
addi a0, a0, 1 # increment buffer address by 1 character
addi a1, a1, -1 # decrease buffer size by 1
j number_digit # loop to check the next character
Now let’s assume we had an error, for example the token was “2abc”; then we’ll end up here:
number_error:
li a1, 0 # number is too large or not an integer, return 0
ret
All it does is return 0 (or false) in the X register, indicating an error.
If it wasn’t an error, then we’ll end up here:
number_done:
beqz t4, number_store # don't negate the number if it's positive
neg t0, t0 # negate the number using two's complement
This does two things. First, it checks whether our number is positive or negative, using the sign flag we set early in the number function. If it is negative, it negates the number (two’s complement) and falls through to number_store; otherwise it jumps there directly:
number_store:
li t1, (1<<29)-1 # largest acceptable number size: 29 bits (a shift, because ^ is XOR in GNU as expressions)
bgt t0, t1, number_error # check if the signed number is larger than 29 bits
mv a0, t0 # copy final number to W working register
li a1, 1 # number is an integer, return 1
ret
That’s the final part of the function. It first loads the largest value of a 29 bit number, then performs a signed compare with the final number. If it doesn’t fit, then we return an error. Otherwise we copy the number to the W register and return 1 in the X register.
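As a sanity check, the whole routine can be modelled in Python and run against a few tokens. This is a sketch of the logic described above, not the assembly itself. It returns None as the value on error, where the real routine simply leaves W unspecified. One deliberate difference, flagged as an assumption: the sketch rejects a token with no digits at all, a case the walkthrough doesn’t discuss. The upper bound is written as a shift because in Python `2 ^ 29` is XOR rather than a power.

```python
MASK32 = 0xFFFFFFFF
MAX_NUMBER = (1 << 29) - 1   # 536870911; note that 2 ^ 29 would be XOR (== 31)


def to_signed32(x: int) -> int:
    """Interpret a 32-bit register value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def number(token: bytes):
    """Model of number: returns (value, flag) with flag 1 = OK, 0 = ERROR."""
    if len(token) > 9:                       # length guard
        return None, 0
    negative = token[:1] == b"-"             # leading minus sign?
    digits = token[1:] if negative else token
    if not digits:                           # assumption: no digits means error
        return None, 0
    total = 0
    for ch in digits:
        d = ch - 0x30
        if d < 0 or d > 9:                   # not an ASCII digit
            return None, 0
        total = total * 10 + d
    if negative:
        # Two's complement negation in 32 bits: invert, add 1.
        total = to_signed32((~total + 1) & MASK32)
    if total > MAX_NUMBER:                   # signed compare against the bound
        return None, 0
    return total, 1


for tok in (b"12", b"-12", b"2abc", b"536870911", b"536870912"):
    print(tok, number(tok))
```

Running it shows "12" and "-12" parsing correctly, "2abc" failing in the digit loop, and 536870912 being rejected by the final range check while 536870911 passes.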
Closing thoughts
This was surprisingly fun and easy to write, and I’m actually surprised that it works as expected (I think?). In the next session, I’ll focus on adding that to the interpreter’s main loop and use it to validate tokens and store them correctly in memory.