Script v0.4: From VM to Native - The Self-Hosting Journey Begins

January 27, 2026 · 8 min read

Founder of Script

This week marks a major milestone for Script: we've officially begun Phase 4 - Self-Hosting Compiler. After completing Phase 3 (Language Completion), we're now taking the next step toward making Script a truly self-contained language that can compile itself. This post explores what self-hosting means, why it matters, and how we're building toward it.

What is Self-Hosting?

A self-hosting compiler is a compiler written in the language it compiles. Currently, Script's compiler is written in Rust. The goal of Phase 4 is to port the compiler to Script itself, so that Script can compile Script.

This might sound like a chicken-and-egg problem, but it's actually a well-established pattern in language development. Here's how it works:

tscl₀ (Rust compiler) ──compile──> tscl₁ (Script-compiled binary)
     │                                 │
     │                                 └──compile──> tscl₂ (tscl₁-compiled)
     │                                               │
     │                                               └──verify──> ✓
     │
     └──validate hash(tscl₁) == hash(tscl₂)

Success condition: If tscl₁ and tscl₂ are bit-for-bit identical binaries, we've achieved self-hosting.

Why Self-Hosting Matters

Self-hosting is more than just a technical achievement—it's a proof of maturity. Here's why it matters:

1. Language Completeness

If a language can compile itself, it's complete enough to build real software. You can't write a compiler in a language that's missing critical features. Self-hosting proves that Script has:

Sufficient control flow (loops, conditionals, functions)
Adequate data structures (objects, arrays, strings)
Proper error handling
Module system for code organization
Performance characteristics suitable for large codebases

2. Removing Runtime Dependencies

Currently, Script's VM is written in Rust and linked into every binary. While this works, it means:

Production binaries include the entire Rust-built VM runtime
We're limited by Rust's compilation model
We can't optimize the runtime as aggressively as we could if it were in Script

With self-hosting, the compiler itself runs as native code, and we can eventually remove the VM from production builds entirely.

3. Faster Iteration

Once self-hosting is achieved, we can iterate on the compiler using the compiler itself. This creates a positive feedback loop:

Improve the compiler → faster compilation
Faster compilation → easier to improve the compiler
Better compiler → better language features

4. Community Confidence

Self-hosting demonstrates that Script is serious and production-ready. It shows that the language isn't just a toy project, but something that can be used to build real, complex software.

The Bootstrap Architecture

Our self-hosting strategy follows a three-step bootstrap process:

Step 1: Stabilize the Output

Before we can port the compiler, we need deterministic, reproducible builds. This means:

ABI Versioning: We've frozen the Application Binary Interface (ABI) at version 1. This is the contract between compiled code and the runtime:

// src/runtime/abi_version.rs
pub const ABI_VERSION: u32 = 1;

The ABI defines:

Function signatures for runtime stubs (tscl_add_any, tscl_alloc_object, etc.)
NaN-boxed value encoding (64-bit words)
Heap object layouts
Calling conventions

Once frozen, we can't change these without bumping the version.
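
To make the versioning contract concrete, here is a minimal sketch of a load-time check. Only `ABI_VERSION` comes from the snippet above; `check_abi` and `AbiMismatch` are hypothetical names for illustration, not Script's actual runtime API.

```rust
/// ABI version this runtime implements (value taken from the post).
pub const ABI_VERSION: u32 = 1;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct AbiMismatch {
    pub expected: u32,
    pub found: u32,
}

/// Reject compiled code that was built against a different ABI version.
pub fn check_abi(compiled_against: u32) -> Result<(), AbiMismatch> {
    if compiled_against == ABI_VERSION {
        Ok(())
    } else {
        Err(AbiMismatch {
            expected: ABI_VERSION,
            found: compiled_against,
        })
    }
}
```

A strict equality check like this is the simplest policy: any bump is treated as incompatible, which matches the rule that breaking changes require a new version.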

IR Serialization: We've created a deterministic Intermediate Representation (IR) format:

./target/release/script build app.tscl --emit-ir -o app

This produces a stable, text-based IR that can be:

Verified (--verify-ir)
Serialized and deserialized
Used for bootstrap verification

The IR format ensures that:

Register numbering is deterministic
Block ordering is stable
Function ordering is lexicographic
No random seeds or non-deterministic operations
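
As an illustration of deterministic register numbering, the sketch below hands out registers from a plain counter, so a register's number depends only on the order in which instructions are defined. `FnBuilder` and `build_example` are hypothetical and do not reflect Script's real IR builder; the instruction text mirrors the IR sample shown later in this post.

```rust
/// Hypothetical builder for one function body; not Script's real IR API.
pub struct FnBuilder {
    next_reg: u32,
    lines: Vec<String>,
}

impl FnBuilder {
    pub fn new() -> Self {
        FnBuilder { next_reg: 0, lines: Vec::new() }
    }

    /// Append an instruction and return the register holding its result.
    /// The register number comes from a counter, never from a hash or address.
    pub fn emit(&mut self, op: &str) -> u32 {
        let reg = self.next_reg;
        self.next_reg += 1;
        self.lines.push(format!("v{reg} = {op}"));
        reg
    }

    /// Join the emitted instructions into the final text form.
    pub fn finish(self) -> String {
        self.lines.join("\n")
    }
}

/// Build the same three instructions every time it is called.
pub fn build_example() -> String {
    let mut b = FnBuilder::new();
    let s = b.emit("const \"test\"");
    let c = b.emit("load.local $0");
    b.emit(&format!("call.method v{c}.log(v{s})"));
    b.finish()
}
```

Calling `build_example()` twice yields the same text, which is the property the bootstrap check relies on.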

Step 2: Port the Compiler

The compiler consists of several modules:

| Module | Lines (est.) | Priority |
| --- | --- | --- |
| `lexer.tscl` | ~400 | 1 |
| `parser.tscl` | ~1200 | 2 |
| `emitter.tscl` | ~800 | 2 |
| `ir.tscl` | ~600 | 3 |
| `codegen.tscl` | ~1000 | 4 |

We'll port incrementally, keeping Rust as the reference implementation. Each module will be tested independently before moving to the next.
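
One way to keep Rust as the reference during the port is differential testing: run the reference and the ported module on the same inputs and report every disagreement. The sketch below is an assumption about how that could look. `differential`, `reference_lex`, and `ported_lex` are hypothetical stand-ins, and the "ported" lexer contains a deliberate bug so the harness has something to catch.

```rust
use std::fmt::Debug;

/// Run both implementations on every input and describe each disagreement.
pub fn differential<T, R, C>(inputs: &[&str], reference: R, candidate: C) -> Vec<String>
where
    T: PartialEq + Debug,
    R: Fn(&str) -> T,
    C: Fn(&str) -> T,
{
    let mut failures = Vec::new();
    for &input in inputs {
        let expected = reference(input);
        let actual = candidate(input);
        if expected != actual {
            failures.push(format!("{input:?}: expected {expected:?}, got {actual:?}"));
        }
    }
    failures
}

/// Stand-in reference lexer: splits on any run of whitespace.
pub fn reference_lex(src: &str) -> Vec<String> {
    src.split_whitespace().map(str::to_string).collect()
}

/// Stand-in ported lexer with a deliberate bug: it splits on single spaces,
/// so consecutive spaces produce empty tokens.
pub fn ported_lex(src: &str) -> Vec<String> {
    src.split(' ').map(str::to_string).collect()
}
```

With the inputs `"let x = 1"` and `"let  y"` (two spaces), only the second is reported, because the buggy lexer emits an empty token between the spaces.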

Step 3: Bootstrap Verification

Once the compiler is ported, we verify self-hosting:

// tests/bootstrap/loop.tscl
export function testBootstrapLoop(): void {
    // Step 1: Compile compiler with Rust tscl
    const tscl1 = runRustTscl("build compiler.tscl --dist -o /tmp/tscl1");
    
    // Step 2: Compile compiler with tscl₁
    const tscl2 = runTscl("/tmp/tscl1", "build compiler.tscl --dist -o /tmp/tscl2");
    
    // Step 3: Verify bit-for-bit match
    assert(hash("/tmp/tscl1") === hash("/tmp/tscl2"), "Bootstrap not deterministic");
}

If tscl₁ and tscl₂ are identical binaries, we've achieved self-hosting.
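
When the hashes differ, it helps to know where the two binaries diverge. The sketch below is a hypothetical helper, not part of Script's test suite, that compares two byte buffers directly and returns the first differing offset.

```rust
/// Return the offset of the first byte where two build outputs differ,
/// or `None` if they are byte-for-byte identical.
pub fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    // Compare the common prefix byte by byte.
    if let Some(i) = a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        return Some(i);
    }
    // Same prefix but different lengths: they diverge where the shorter one ends.
    if a.len() != b.len() {
        return Some(a.len().min(b.len()));
    }
    None
}
```

In practice the two buffers would come from `std::fs::read` on the tscl₁ and tscl₂ outputs. An offset near the start often points at headers or timestamps, while a later one points at code or data sections.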

ABI Freezing Strategy

The ABI (Application Binary Interface) is the contract between compiled Script code and the runtime. Freezing it is critical because:

Stability: Once frozen, we can't change function signatures without breaking compatibility
Verification: We can verify that the ABI hasn't changed between bootstrap stages
Documentation: It serves as a clear specification for what the runtime provides

Current ABI Surface

The ABI consists of ~20 runtime stubs. A representative subset:

extern "C" {
    // Arithmetic
    fn tscl_add_any(a: u64, b: u64) -> u64;
    fn tscl_sub_any(a: u64, b: u64) -> u64;
    fn tscl_mul_any(a: u64, b: u64) -> u64;
    
    // Allocation
    fn tscl_alloc_object() -> u64;
    fn tscl_alloc_array() -> u64;
    fn tscl_alloc_string(ptr: *const u8, len: usize) -> u64;
    
    // Property access
    fn tscl_get_prop(obj: u64, key: u64) -> u64;
    fn tscl_set_prop(obj: u64, key: u64, val: u64) -> u64;
    
    // Function calls
    fn tscl_call(func: u64, args: u64, arg_count: u32) -> u64;
    
    // Error handling
    fn tscl_abort(msg: *const u8, len: usize) -> !;
}

All values are NaN-boxed into 64-bit words, allowing us to represent:

Numbers (as f64)
Booleans, null, undefined (in NaN space)
Pointers to heap objects (in NaN space)
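
For readers new to NaN-boxing, here is a minimal sketch of one possible encoding. The tag assignments, the 48-bit payload, and all names (`Value`, `encode`, `decode`) are assumptions chosen for illustration. Script's frozen encoding is specified in docs/ABI.md and may differ.

```rust
/// Decoded view of a 64-bit boxed value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Number(f64),
    Bool(bool),
    Null,
    Undefined,
    Ptr(u64), // heap address, assumed to fit in 48 bits
}

// Assumed layout: quiet-NaN prefix, 3 tag bits (48..=50), 48-bit payload.
const QNAN: u64 = 0x7FF8_0000_0000_0000;
const TAG_SHIFT: u32 = 48;
const TAG_MASK: u64 = 0x7 << TAG_SHIFT;
const PAYLOAD_MASK: u64 = (1 << TAG_SHIFT) - 1;

const TAG_BOOL: u64 = 1;
const TAG_NULL: u64 = 2;
const TAG_UNDEFINED: u64 = 3;
const TAG_PTR: u64 = 4;

pub fn encode(v: Value) -> u64 {
    match v {
        // Every NaN number collapses to the tag-0 pattern so it cannot
        // be mistaken for a tagged value.
        Value::Number(n) if n.is_nan() => QNAN,
        Value::Number(n) => n.to_bits(),
        Value::Bool(b) => QNAN | (TAG_BOOL << TAG_SHIFT) | b as u64,
        Value::Null => QNAN | (TAG_NULL << TAG_SHIFT),
        Value::Undefined => QNAN | (TAG_UNDEFINED << TAG_SHIFT),
        Value::Ptr(p) => QNAN | (TAG_PTR << TAG_SHIFT) | (p & PAYLOAD_MASK),
    }
}

pub fn decode(bits: u64) -> Value {
    // Anything without the full quiet-NaN prefix is an ordinary f64.
    if (bits & QNAN) != QNAN {
        return Value::Number(f64::from_bits(bits));
    }
    match (bits & TAG_MASK) >> TAG_SHIFT {
        TAG_BOOL => Value::Bool((bits & 1) == 1),
        TAG_NULL => Value::Null,
        TAG_UNDEFINED => Value::Undefined,
        TAG_PTR => Value::Ptr(bits & PAYLOAD_MASK),
        _ => Value::Number(f64::NAN), // tag 0 and unused tags decode as NaN
    }
}
```

The key design point is that ordinary numbers need no tagging at all: their bit pattern is the f64 itself, and only the otherwise unused NaN patterns carry tags and payloads.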

ABI Versioning Rules

No signature changes without version bump
No NaN-box encoding changes without version bump
No layout changes to heap objects without version bump
Version bump required for any breaking change

We've documented the ABI in docs/ABI.md and created compatibility tests to ensure it doesn't drift.
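
A compatibility test for drift can be as simple as comparing the current stub signatures against a frozen snapshot. The sketch below assumes signatures are tracked as plain strings. `FROZEN_V1` holds a subset copied from the listing above, and `abi_drift` is a hypothetical helper, not the project's actual test.

```rust
use std::collections::BTreeSet;

/// Frozen snapshot of ABI v1 signatures (subset, for illustration).
pub const FROZEN_V1: &[&str] = &[
    "tscl_add_any(u64, u64) -> u64",
    "tscl_alloc_object() -> u64",
    "tscl_get_prop(u64, u64) -> u64",
];

/// List every signature removed from ("- ") or added to ("+ ") the frozen set.
/// An empty result means the ABI surface has not drifted.
pub fn abi_drift(frozen: &[&str], current: &[&str]) -> Vec<String> {
    let old: BTreeSet<&str> = frozen.iter().copied().collect();
    let new: BTreeSet<&str> = current.iter().copied().collect();
    let removed = old.difference(&new).map(|s| format!("- {s}"));
    let added = new.difference(&old).map(|s| format!("+ {s}"));
    removed.chain(added).collect()
}
```

A changed signature shows up as one removal plus one addition, which makes the required version bump easy to spot in review.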

IR Serialization

The Intermediate Representation (IR) is the bridge between the compiler and code generation. Serializing it allows us to:

Verify determinism: Same source → same IR (bit-for-bit)
Debug compilation: Inspect IR at each stage
Bootstrap verification: Compare IR between bootstrap stages

IR Format

; ============================================================
; tscl IR Module
; Format version: 1
; ABI version: 1
; ============================================================

fn main() -> any {
    ; Local variables
    local $0: any = console

bb0:
    v0 = const "test"
    v1 = load.local $0
    v2 = call.method v1.log(v0)
    return
}
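
Because the header carries both version numbers, a tool can check them before reading the rest of the module. The following sketch parses only the two header fields visible in the sample above. `IrHeader` and `parse_header` are hypothetical names, and the real format may carry more fields.

```rust
/// Versions declared in an IR module header.
#[derive(Debug, PartialEq)]
pub struct IrHeader {
    pub format_version: Option<u32>,
    pub abi_version: Option<u32>,
}

/// Scan the leading `;` comment lines for the two version fields.
pub fn parse_header(ir: &str) -> IrHeader {
    let mut header = IrHeader { format_version: None, abi_version: None };
    for line in ir.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // The header ends at the first line that is not a comment.
        let Some(comment) = line.strip_prefix(';') else { break };
        let comment = comment.trim();
        if let Some(v) = comment.strip_prefix("Format version:") {
            header.format_version = v.trim().parse().ok();
        } else if let Some(v) = comment.strip_prefix("ABI version:") {
            header.abi_version = v.trim().parse().ok();
        }
    }
    header
}
```

Stopping at the first non-comment line means comments inside function bodies can never be mistaken for header fields.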

Determinism Guarantees

Register numbering: Allocated in definition order, no renumbering
Block ordering: Entry block first, then by first reference
Function ordering: Lexicographic by name
No randomness: Fixed seeds, deterministic hash maps
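
The function-ordering guarantee is easy to get in Rust by keying functions in a `BTreeMap`, which iterates in sorted key order, instead of a `HashMap`, whose iteration order is unspecified. The sketch below is an assumption about how such an emitter might look. `emit_module` is a hypothetical name and it prints only function headers.

```rust
use std::collections::BTreeMap;

/// Emit one header line per function, sorted by name,
/// regardless of the order in which functions were registered.
pub fn emit_module(functions: &[(&str, &str)]) -> String {
    // BTreeMap iterates in key order, so output order is lexicographic by name.
    let by_name: BTreeMap<&str, &str> = functions.iter().copied().collect();
    let mut out = String::new();
    for (name, ret) in &by_name {
        out.push_str(&format!("fn {name}() -> {ret}\n"));
    }
    out
}
```

Registering `main`, `add`, `lex` in any order produces the same text, with `add` first and `main` last.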

CLI Flags

# Emit IR to file
./target/release/script build app.tscl --emit-ir -o app
# → app.ir

# Verify IR validity
./target/release/script build app.tscl --verify-ir

# Emit LLVM IR
./target/release/script build app.tscl --emit-llvm -o app
# → app.ll

# Emit object file
./target/release/script build app.tscl --emit-obj -o app
# → app.o

Performance Implications

Self-hosting has significant performance implications:

Native Code Generation

Currently, Script can compile to:

Cranelift JIT: Fast development, ~6x faster than VM
LLVM AOT: Optimized native binaries with LTO

Once self-hosting is complete, the compiler itself will run as native code, providing:

Faster compilation: No VM overhead
Better optimization: LLVM can optimize the compiler itself
Smaller binaries: No VM runtime in production builds

VM Removal

The VM will remain for:

Development mode: --dev flag for debugging
REPL: Interactive console
Testing: Test runner

But production builds (--release, --dist) will be VM-free, resulting in:

Smaller binaries
Faster startup
Lower memory usage

Challenges and Risks

Self-hosting is where many languages fail. Here are the risks we're aware of:

1. IR Drift

Risk: The IR format changes during porting, breaking bootstrap.

Mitigation: Keep Rust as reference, test incrementally, version the IR format.

2. Non-Determinism

Risk: Builds aren't bit-for-bit identical, breaking verification.

Mitigation: Fixed seeds, deterministic ordering, stable linker flags.

3. Bootstrap Loop

Risk: The bootstrap chain breaks if tscl₁ can't compile the compiler source into tscl₂.

Mitigation: Verify at each step, keep Rust compiler as fallback.

4. Performance Regression

Risk: Self-hosted compiler is slower than Rust version.

Mitigation: Benchmark suite, performance budgets, incremental optimization.
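
A performance budget can be expressed as a maximum allowed slowdown relative to the Rust compiler. The sketch below is illustrative only: `measure` and `within_budget` are hypothetical helpers, and any concrete threshold (such as 1.5x) is an assumption, not a number from the Script project.

```rust
use std::time::{Duration, Instant};

/// Time a single run of `f`.
pub fn measure<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

/// True if `candidate` is at most `max_slowdown` times slower than `baseline`.
pub fn within_budget(baseline: Duration, candidate: Duration, max_slowdown: f64) -> bool {
    candidate.as_secs_f64() <= baseline.as_secs_f64() * max_slowdown
}
```

A benchmark suite would call `measure` on both compilers for the same input, ideally over several runs, and fail the build when `within_budget` returns false.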

What's Next

Phase 4 is a multi-week effort. Here's the roadmap:

Week 1-2: ✅ ABI freezing and IR serialization (COMPLETE)
Week 3-8: Port compiler modules (lexer → parser → emitter → IR → codegen)
Week 9: Bootstrap tests and verification
Week 10+: Performance tuning and VM removal

We're taking it step by step, ensuring each stage is solid before moving to the next.

Conclusion

Self-hosting is a major milestone that proves Script's maturity and sets the foundation for future growth. By freezing the ABI and creating deterministic IR serialization, we've laid the groundwork for a robust bootstrap process.

The journey from VM to native is challenging, but it's also exciting. Every language that achieves self-hosting joins an elite group of systems that can truly stand on their own.

We're building Script to be fast, safe, and practical. Self-hosting is the next step in that journey.

Try Script today:

cargo build --release
./target/release/script build hello.tscl -o hello
./hello
