A 1-bit language model running on a Sprite.
Model: BitNet-b1.58-2B-4T (2.4B params, 1.1GB)
Hardware: 8x AMD EPYC vCPU, 16GB RAM
Speed: ~50 tokens/sec
Uptime: 165h 56m 48s
Requests served: 17
Grab the client (zero dependencies):
curl https://bitnet-llm-beony.sprites.app/client.py -o llm.py
Use it:
from llm import ask, classify
print(ask("What is a Sprite?"))
print(classify("server is down", ["bug", "feature", "ops"]))
Or hit the API directly:
curl https://bitnet-llm-beony.sprites.app/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"hello"}]}'
POST /v1/chat/completions — OpenAI-compatible chat endpoint
GET /client.py — self-addressed Python client
GET /chat — chat UI
GET /health — health check
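If you're scripting against the service, the /health endpoint above makes an easy liveness probe. A small stdlib sketch (the response body format isn't documented here, so this only checks the HTTP status):

```python
import urllib.request

def healthy(base_url="https://bitnet-llm-beony.sprites.app"):
    # Return True if GET /health answers with HTTP 200, False on any
    # connection error or timeout.
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Useful as a gate before sending real requests, since the Sprite may be cold or restarting.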