#3 internal benchmark rank, above every open-weight model tested
Ternary weights (−1, 0, +1) — ~1.1 GB on disk
Trained & aligned on one RTX 3050 laptop
No datacenter. No cloud bill. ~6 tok/s on CPU alone
A 2B model in a 7B+ fight.
Internal Benchmark v2 — 100 questions across 8 categories, semantic similarity scoring. Orchid lands third of twelve models, ahead of every open-weight system, including Qwen2.5-7B and Kimi k1.5.
Science 100% · Math 93.3% · Coding 93.3%
See full benchmarks →
Semantic-similarity scoring is a relative comparison tool, not a substitute for standard NLP benchmarks.
A model, and the engine that makes it run.

Orchid 1.0
The first competitive LLM trained and aligned in Colombia. Aligned with ORPO for unbiased, multilingual responses on consumer hardware — no cloud dependency.
Explore the model →

ternative
The inference engine for ternary-weight LLMs with runtime LoRA — "the llama.cpp of BitNet models." It serves combinations no other stack can run correctly.
How it works →

88 hours. One laptop. No datacenter.
Every training stage ran on a single RTX 3050 laptop — 4 GB of VRAM, 16 GB of RAM, Windows 11. SFT, then two rounds of ORPO alignment, with memory tricks that made it possible to fine-tune a 2B model on hardware most people already own.
Read the full story →