objvm - Introducing asm0 and asm1

Building a virtual machine and building an assembler for that virtual machine are very different projects that exercise very different parts of the brain.

Because I’m most interested in small VM construction, I tried to make the assembler as simple as possible. It is inspired by two projects: nanopass (https://github.com/nanopass) and uxn/uxntal (https://100r.co/site/uxn.html, https://wiki.xxiivv.com/site/uxntal.html).

nanopass is a Scheme-based DSL for building layered compilers: you define a series of mini-languages and the passes that convert one language into the next. It is nice, but more powerful than my current needs call for.

uxn is a “personal computing stack” with its own assembly language, uxntal. It aims to be simple enough for human coders. It is nice and powerful, but I’m a city-bound land-dweller, so low memory and power usage are not among my goals.

I’ve therefore built two simple assemblers called asm0 and asm1.

The entirety of the source code for both is below.

asm0:

sed -e 's,;.*,,g' -e 's, ,\n,g' |
    awk -b '/0x/ { printf("%c", strtonum($1)) }'

asm1:

cpp -P "$@"

What and Why

When building any sort of program that is merely a function from input to output, such as a compiler or assembler, it helps me to break it down into manageable chunks.

I’ve stumbled on what I would call the “identity function” of assemblers: take code written in hex, a string of 0xNN tokens, and output it as binary.

The simplest and therefore best program is one in which the input is copied to the output. The next simplest, therefore the second best, is one which simply converts the same data from one presentation to another. This is asm0.
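To make the hex-to-binary step concrete, here is a minimal sketch of the same transformation in plain POSIX shell. It is not the asm0 listed above, which relies on gawk's -b flag and strtonum. The function name asm0_sketch is hypothetical, and the sketch assumes every byte is written as its own whitespace-separated 0xNN token in the range 0x00 to 0xff.

```sh
#!/bin/sh
# asm0_sketch: hypothetical, POSIX-only approximation of asm0.
# Reads assembly text on stdin and writes raw bytes on stdout.
asm0_sketch() {
    # sed drops everything from ';' to the end of the line (comments).
    # tr puts every whitespace-separated token on its own line.
    sed -e 's/;.*//' | tr -s ' \t' '\n\n' |
    while read -r tok || [ -n "$tok" ]; do
        # A 0xNN token becomes a three-digit octal escape, which the outer
        # printf turns into one byte. Every other token is silently ignored.
        case $tok in
            0x*) printf "\\$(printf '%03o' "$tok")" ;;
        esac
    done
}

# Usage: three tokens in, three bytes out; od shows them as hex (48 69 0a).
printf '0x48 0x69 ; says "Hi"\n0x0a\n' | asm0_sketch | od -An -tx1
```

Unlike the awk version, this sketch only accepts tokens that start with 0x, so it is an approximation of the behaviour, not a drop-in replacement.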

asm1, then, becomes a superset of asm0 that merely calls the C preprocessor. This gives us a quick but robust syntax for writing constructs, such as loops:

#include "asm1.h"
PUSH CONSTANT 0x03
PUSH SPECIAL IP PUSH CONSTANT 0x0a ADD POP REGISTER 0x00
    ... my code ...
    DEC
JNZ REGISTER 0x00

Compiling and assembling our program then becomes: asm1 < input.1asm | asm0 > output.bin

We can even convert the steps here into a meta-opcode via cpp:

#define STOREIP08 PUSH SPECIAL IP PUSH CONSTANT 0x0a ADD POP
PUSH CONSTANT 0x03
STOREIP08 REGISTER 0x00
  ...
  DEC
JNZ REGISTER 0x00

note: STOREIP08 ends with a POP and therefore becomes a 1-argument meta-opcode!

caveat: the creation of these meta-opcodes makes it harder to count how many bytes an operation takes up in code. The 08 at the end of STOREIP08 signifies that it is 8 bytes in length.
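One way to keep such size suffixes honest is to count the tokens in a macro body. This is a small sketch, assuming (as the 8-token, 8-byte STOREIP08 suggests) that every mnemonic and every 0xNN literal assembles to exactly one byte. The helper name opsize is hypothetical.

```sh
#!/bin/sh
# opsize (hypothetical helper): print how many whitespace-separated tokens a
# macro body has, which equals its byte size under the one-token-one-byte
# assumption.
opsize() {
    set -f        # disable filename globbing while splitting
    set -- $1     # unquoted on purpose: split the body on whitespace
    set +f
    echo "$#"     # number of tokens
}

opsize 'PUSH SPECIAL IP PUSH CONSTANT 0x0a ADD POP'   # prints 8
```

If a mnemonic ever expands to more than one byte, the count would have to be taken after cpp has expanded the macro instead.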

Will there be an asm2? asm3?

Yes! asm2 is written and is slightly more complex than asm0 and asm1.

Does the VM exist?

Somewhat! All of the asm1 programs in the asm/ folder work, but the VM needs some code cleanup before publishing.