# Why Write a Compiler?

I think my primary motivation is simply that I had never done so before.

Across 30-some years of programming, I've written a bunch of assemblers, several
interpreted languages, a number of bytecode runtimes, a linker/preprocessor that
turned Java class files and bytecode into a smaller/lighter bytecode (for the
hiptop family of devices), various disassemblers, and a pile of domain-specific
configuration languages, but never a full top-to-bottom compiled programming
language that starts with source code and ends up with executable machine
instructions of some form or another.

It feels like one of those things that everybody should do at least once.

### Open Source FPGA Toolchains

Advances in the state of the art of open source FPGA toolchains have made doing
"real" digital design stuff without awful vendor tools possible, which
re-kindled my interest in this space. It's pretty trivial to build a little
CPU in Verilog. It's not hard to crank out a little assembler for it. But if
you're doing anything serious, writing a lot of software in assembly is just
not that fun or efficient, at least not for me. I'd like to have a little
systems compiler amongst my tools that I could easily target various little
CPUs with.

### The Big Compilers are Way Too Big

GCC and LLVM/clang are open source. And very powerful. And frickin' enormous.

So while I could invest in learning how to make use of, say, LLVM to build my
own frontends and share backends, or write backends for either, both of them
are quite enormous, complex source bases, and I'm not super excited about
dealing with giant piles of other people's software.

### Self-Hosting and Embedded Platforms

While I'm writing the initial version in C, once I have sufficient feature
coverage I hope to mechanically translate the C version and migrate to building
the compiler with itself.
A self-hosting compiler is another one of those
"it'd be fun to do that at least once" sort of things.

I'd also like to be able to use the compiler not just to build for but run on
small embedded devices, retro computing systems, or experimental small
platforms, on the scale of single-digit megabytes of memory or less at the low
end. That definitely rules out GCC and clang.

### At Most, I Want "Just Enough" Optimization

Modern optimizing compilers are amazing things, but they increasingly seem to
be getting too clever for their own good (or at least mine). For systems
programming, especially, I really don't want the compiler to silently drop code
or massively rearrange it. I want to be able to rely on the compiler mostly
doing what I tell it and not getting all inventive about "undefined behaviour."

C and C++ have a bunch of undefined behaviour around integer and unsigned math,
for example, which results in a weird gap between what the underlying machine
does if you ask it and what a modern compiler considers "valid". When code is
"not valid", the compiler happily pretends it doesn't matter. In a way I'm
looking for a bit more of the "high level assembly language" that C is often
accused of being while it is increasingly not...

https://www.yodaiken.com/2021/05/19/undefined-behavior-in-c-is-a-reading-error/

https://www.yodaiken.com/2021/05/16/c-is-not-a-serious-programming-language/

### They Still Haven't Built Me a "Better C"

Go comes close, but its insistence on green threads, userspace scheduling,
garbage collection, largish libraries, large runtime memory footprint, and
really awkward interworking with native C/ELF ABI code are deal breakers.


Rust is more pragmatic about interworking with C/C++ native code and native
platform threading (yay), but still suffers from a large standard library that
doesn't seem to layer/subset well, resulting in even small programs being
several megabytes in size.

I feel like its heart is in the right place with the lifetime tracking stuff,
but it feels rather awkward to code with. It also suffers from very slow
compilation.

And so on and so on.

This project gives me a chance to do my own experimenting, and while I am
skeptical that I shall succeed wildly where all others have failed, at least
I'll have fun trying.