Skip to content

Architecture

operator workstation provisioned cloud host
───────────────────── ──────────────────────
~/mvm/ (public) ─┐
├─rsync─→ /home/mvm/mvm/, /home/mvm/mvmd/
~/mvmd/ (private) ┘ ↓
cargo build --release
/usr/local/bin/{mvmd, mvm-hostd}
systemctl enable --now
mvm-hostd, mvmd-agent,
mvmd-coordinator, mvmd-gateway
  • mvm is the public Rust workspace — the core microVM runtime, CLI, guest, security crates, etc. Lives at tinylabscom/mvm.
  • mvmd is the private daemon repo — coordinator, gateway, agent, runtime that hosts microVM tenants.
  • mvm-deployment (this repo) is the operator-facing deploy + ops layer. It does not contain mvm or mvmd source — only the scripts, configs, and provider adapters that turn a fresh cloud host into a working live-KVM environment.

The split between cloud-init and post-deploy.sh exists because mvmd is private. The box can’t git clone it, so we rsync from the operator’s local checkout after the system is set up.

ops/ephemeral/up.sh invokes the provider adapter:

  1. Provider creates the instance with cloud-init.yaml as user-data.
  2. up.sh waits on SSH for cloud-init status: done.

Cloud-init is system-only:

  • Toolchain: rustup, Nix, Firecracker (pinned), cargo-fuzz / cargo-deny / cargo-audit
  • KVM gate: hard-fail if /dev/kvm is absent
  • State dirs: /var/lib/mvm, /etc/mvmd with correct ownership
  • System users: mvm, mvmd
  • SSH key propagation: root → mvm

No clone, no build, no service install at this stage.

After cloud-init reaches done:

  1. rsync mvm + mvmd onto the box (/home/mvm/mvm, /home/mvm/mvmd).
  2. ssh in and run ops/ephemeral/post-deploy.sh:
    • cargo build --release mvmd (each binary in its own workspace package)
    • install /usr/local/bin/{mvmd,mvm-hostd}
    • render /etc/mvmd/{coordinator,gateway}.toml from in-repo templates with a fresh HMAC secret
    • install + enable the four mvmd systemd units
    • write /etc/mvm-deploy-status = ready

post-deploy.sh is idempotent: every step is [ -x bin ] && skip or systemctl enable --now. Re-running up.sh against the same box is fast.

Every providers/<name>.sh implements four functions:

Terminal window
provider_default_instance_type() # echoed default
provider_default_region() # echoed default
provider_up <label> <cloud-init> # creates instance, prints IP on stdout's last line
provider_down <label-or-empty> # destroys instance(s) tagged mvm-ephemeral=true

up.sh and down.sh know nothing provider-specific — they source the chosen providers/${PROVIDER}.sh and call those four functions. up.sh enforces the contract via declare -F checks at startup, so a broken adapter fails fast with a clear error before any cloud calls.

  • All shell scripts pass shellcheck and shfmt -i 2 -ci. Adapters start with # shellcheck shell=bash.
  • All YAML parses cleanly: python3 -c "import yaml; yaml.safe_load(open(...))".
  • All scripts use set -euo pipefail.
  • run() helper in ops/hetzner/run-tests.sh uses local rc=0; cmd || rc=$? because if cmd; then…fi; rc=$? has a known bash + set -e interaction bug that silently captures rc=0 on failure.