Local LLMs · TUI · CLI · llama.cpp · Rust

Fast local LLMs with zero overhead.

Launch llama.cpp workloads from a beautiful terminal UI, without the setup drag.

init wizard daemon on demand OpenAI proxy --json CLI
curl -fsSL https://llamastash.dev/install.sh | sh

Next step

llamastash init

The wizard installs the right llama.cpp build for your hardware, downloads a starter GGUF, writes tuned config, and gets you started.

Agents can use llamastash init --recommended --json.

llamastash
llamastash TUI showing model list and resource panels

Recorded session · auto-loops · reduced motion swaps to static image

Features

The whole local-model loop, not just a launcher.

Install, discover, launch, smoke-test, script, and proxy local models through one tool. It stays transparent about llama.cpp while still smoothing off the annoying parts.

  • Zero-to-chat init wizard

    Run llamastash init once and it handles the annoying first-run work: detect hardware, install the right llama-server build, download a starter GGUF, write config, and smoke-launch it.

  • Scans what you already have

    Walks HuggingFace, Ollama, and LM Studio caches plus user paths. Reads GGUF metadata, dedupes symlinks and split files, and watches the catalog for new models without a restart.

  • One binary, three roles

    The TUI, CLI, and daemon are the same binary. The daemon auto-spawns when needed, running models survive UI exit, and the same launch primitives show up in both human and agent workflows.

  • Hardware-aware launches

    Built-in arch defaults, per-model ports, health-probed lifecycle, and intelligent context auto-fit mean fewer bad launches and fewer manual llama.cpp flags for every machine you touch.

  • OpenAI + Ollama proxy

    A built-in localhost proxy on 127.0.0.1:11435/v1 routes by model name, auto-starts the model you ask for, and can impersonate Ollama on 11434 when you need drop-in compatibility.

  • Agent-ready CLI

    Stable --json output, documented exit codes, pull and recommend subcommands, plus an installable AgentSkills bundle for Claude Code, OpenClaw, OpenCode, and other harnesses.

Why local

The cloud is great until it isn't.

Local models are catching up faster than most people realize. A modern laptop with 16+ GB of unified memory or a mid-range desktop GPU runs models that genuinely earn their keep — for code, writing, brainstorming, and tool-using agents.

LlamaStash exists to make that path low-friction: one binary, an init wizard that gets you to first response fast, and the same runtime surface in the TUI, CLI, and proxy.

  • Your data, your machine

    Prompts, context windows, completions — all stay in RAM on the box you're sitting at. There is no third party in the chain who could log them, leak them, or change their mind about retention policy tomorrow.

  • No surprise bills

    Local inference is free at the margin. The 10,000-token brainstorm at 2 a.m. costs the same as 10,000 tokens at noon: nothing. Sunsetting a model is your decision, not a vendor's.

  • Offline by default

    Once a model is on disk, it works on a plane, in a SCIF, in a coffee shop with sketchy WiFi. The CLI surfaces it the same way every time — no cold-start latency from a remote API.

  • Works with the tools you already use

    LlamaStash exposes a local OpenAI-compatible proxy on loopback. OpenCode, Pi, Cline, the OpenAI SDKs, or your own scripts can all talk to one stable URL while LlamaStash handles model routing and auto-start.

FAQ

Common questions

  • What does LlamaStash actually do?
    It's a terminal-native TUI and CLI for launching local models through llama.cpp. It scans the GGUF files you already have, helps you pick the right one for your hardware, starts and supervises llama-server, and exposes a local OpenAI-compatible proxy for tools and agents.
  • Does it send any data to a server?
    Inference traffic stays on your machine. The main runtime surface is local-only: Unix-socket IPC for the daemon and a loopback-only proxy for OpenAI-compatible clients. `init`, `pull`, and parts of `doctor` do use the network when they need to download or verify artifacts, but LlamaStash itself has no telemetry or analytics pipeline.
  • What platforms does it support?
    macOS (Apple Silicon + Intel) and Linux (x86_64 + aarch64) are first-class. Windows is still on the roadmap; it is not part of the first release contract.
  • Do I need llama.cpp installed already?
    Optional. If you already have `llama-server` on your PATH, LlamaStash will use it. If not, the init wizard offers to install a recommended build via brew or by downloading a prebuilt binary from llama.cpp's releases.
  • How is this different from Ollama, LM Studio, or jan?
    Ollama is opinionated around its own model packaging and daemon workflow. LM Studio and jan are heavier GUIs. LlamaStash is the transparent middle ground: one binary, daemon on demand, works directly against the GGUF files you already have, and gives you the same primitives in the TUI, CLI, and local proxy.
  • Is the install script safe to pipe into sh?
    The script is served from this site as a content-verified mirror of the asset published with each GitHub Release, with a SHA-256 sidecar verified at deploy time. If you want the most paranoid path, download it, read it, and run it yourself. Or skip the script entirely and use `cargo install llamastash --locked` or `brew install llamastash/llamastash/llamastash`.
  • Can I use LlamaStash with non-GGUF models?
    Not in 0.0.1 — llama.cpp is the runtime, and llama.cpp consumes GGUF. Other runtimes (vLLM, mlx-lm) are on the roadmap as opt-in backends once the GGUF + llama.cpp path is solid.
  • Can I point agents or editors at it?
    Yes. LlamaStash ships a local OpenAI-compatible proxy at `http://127.0.0.1:11435/v1` by default, with optional Ollama-compat mode on `11434`. It also ships an AgentSkills bundle under `skills/llamastash/` and a Claude Code plugin manifest for install flows that want a repo-packaged skill.
  • Is it open source?
    Yes — MIT licensed. Source at github.com/llamastash/llamastash. Issues and PRs welcome.