OPEN-SOURCE AI · MADE IN COLOMBIA

Ranks #3.
Behind only
Claude
and GPT-4o.

Orchid 1.0 is a 2-billion-parameter ternary-weight language model fine-tuned on a single 4 GB laptop GPU. On our internal benchmark it outscores every open-weight model we tested — including 7B–9B systems.

Download Orchid Desktop · Explore the model

2B ternary weights | Apache 2.0 | Runs CPU-only | Win · Linux

#3 / 12

Internal benchmark rank, above every open-weight model tested

2B params

Ternary weights (−1, 0, +1) — ~1.1 GB on disk

4 GB VRAM

Trained & aligned on one RTX 3050 laptop

0 cloud GPUs

No datacenter. No cloud bill. ~6 tok/s on CPU alone
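
For readers wondering what "ternary weights (−1, 0, +1)" means in practice, here is a minimal sketch of one way to map ordinary floating-point weights onto three values with a single shared scale. It assumes absmean scaling in the style described for BitNet b1.58; this page does not state Orchid's exact scheme, and the function names are illustrative only.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling (BitNet b1.58 style). Orchid's actual
    quantization recipe is not documented here.
    """
    # The scale is the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, clamp to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Recover approximate float weights from ternary values."""
    return [t * scale for t in ternary]


weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, 1, -1, 0, -1]
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight needs well under two bits of information, which is where the small on-disk footprint comes from.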

THE PROOF

A 2B model in a
7B+ fight.

Internal Benchmark v2 — 100 questions across 8 categories, semantic similarity scoring. Orchid lands third of twelve models, ahead of every open-weight system, including Qwen2.5-7B and Kimi k1.5.

Science 100% · Math 93.3% · Coding 93.3%

See full benchmarks →

INTERNAL BENCHMARK V2

Orchid 1.0

Model                 Score
Claude 3.5 Sonnet     89.5
GPT-4o                89.2
Orchid 1.0 · 2B       87.9
BitNet b1.58 · 2B     84.2
Kimi k1.5             82.2
Qwen2.5 · 7B          78.4

Semantic-similarity scoring is a relative comparison tool, not a substitute for standard NLP benchmarks.
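
To make "semantic similarity scoring" concrete, here is a rough sketch of how a score like this can be computed: embed the model's answer and a reference answer, then take the cosine of the two vectors. Everything here is an assumption for illustration. The `embed` function is a stand-in that counts words, where a real scorer would use a sentence-embedding model, and the 0 to 100 scaling is a guess at how the figures above are normalized.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in embedding: lowercase word counts. A real semantic scorer
    # would call a sentence-embedding model here (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(answer, reference):
    """Similarity of an answer to its reference, scaled to 0-100."""
    return 100.0 * cosine(embed(answer), embed(reference))


def benchmark_score(pairs):
    """Average score over (answer, reference) pairs."""
    return sum(score(a, r) for a, r in pairs) / len(pairs)
```

Because the score measures closeness to a reference answer rather than correctness, it is best read as a way to rank models against each other on the same questions.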

TWO OPEN-SOURCE PROJECTS

A model, and the engine
that makes it run.

Model

Orchid 1.0

The first competitive LLM trained and aligned in Colombia. Aligned with ORPO for unbiased, multilingual responses on consumer hardware — no cloud dependency.

Explore the model →

−1

0

+1

Engine

ternative

The inference engine for ternary-weight LLMs with runtime LoRA: "the llama.cpp of BitNet models." It serves ternary base weights combined with LoRA adapters applied at runtime, combinations no other stack can run correctly.

How it works →
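
As a rough picture of what "ternary weights with runtime LoRA" involves, the sketch below computes one linear layer: the base output from a ternary matrix and its scale, plus a low-rank LoRA correction added at inference time instead of being merged into the weights. This follows the standard LoRA formulation and is not ternative's actual API; all names and shapes are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def ternary_lora_forward(x, w_ternary, scale, lora_a, lora_b, alpha, rank):
    """y = scale * (W_t @ x) + (alpha / rank) * (B @ (A @ x)).

    w_ternary: out x in matrix with entries in {-1, 0, +1}
    lora_a:    rank x in matrix (float)
    lora_b:    out x rank matrix (float)
    """
    # Base path: ternary weights need only adds and subtracts, then one scale.
    base = [scale * y for y in matvec(w_ternary, x)]
    # Adapter path: kept in floating point and applied at runtime.
    low_rank = matvec(lora_b, matvec(lora_a, x))
    gain = alpha / rank
    return [b + gain * d for b, d in zip(base, low_rank)]


y = ternary_lora_forward(
    x=[1.0, 2.0],
    w_ternary=[[1, -1], [0, 1]],
    scale=0.5,
    lora_a=[[1.0, 0.0]],
    lora_b=[[0.5], [-0.5]],
    alpha=2,
    rank=1,
)
print(y)  # [0.5, 0.0]
```

Keeping the adapter separate means the base weights stay ternary, which merging a float adapter into them would not preserve.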

THE STORY

88 hours. One laptop.
No datacenter.

Every training stage ran on a single RTX 3050 laptop — 4 GB of VRAM, 16 GB of RAM, Windows 11. SFT, then two rounds of ORPO alignment, with memory tricks that made it possible to fine-tune a 2B model on hardware most people already own.

Read the full story →

training_run.log

# single RTX 3050 · 4 GB VRAM · no cloud
SFT-A   LoRA r=16   reasoning        ~1 h
SFT-B   LoRA r=16   5,500 samples    ~7 h
ORPO-2  LoRA r=8    2,038 pairs      ~26 h
ORPO-3  LoRA r=8    2,104 pairs      ~54 h
# total cloud GPUs used: 0