Daniel & Michael Han:
DeepSeek-R1 has been making waves recently by rivaling OpenAI's o1 reasoning model while being fully open-source. We explored how to make it easier for local users to run, and managed to quantize DeepSeek's 671B-parameter R1 model down to 131GB, an 80% reduction from the original 720GB, while keeping it very functional.
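As a quick sanity check on the numbers above (720GB down to 131GB), the claimed reduction works out like this:

```python
# Arithmetic check of the quoted size reduction (figures taken from the text).
original_gb = 720   # original DeepSeek-R1 671B model size
quantized_gb = 131  # dynamic 1.58-bit quant size

reduction_pct = (1 - quantized_gb / original_gb) * 100
print(f"{reduction_pct:.1f}% smaller")  # ~81.8%, i.e. roughly an 80% reduction
```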
By studying DeepSeek-R1's architecture, we managed to selectively quantize certain layers to higher bits (like 4-bit) while taking most MoE layers (like those used in GPT-4) down to 1.58-bit (see Unsloth Dynamic 4-bit). Naively quantizing all layers breaks the model entirely, causing endless loops and gibberish outputs. Our dynamic quants solve this.
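The selective scheme above can be sketched as a per-layer bit-width rule. This is purely illustrative and not Unsloth's actual code; the layer-name patterns are hypothetical stand-ins for the sensitive layers kept at higher precision:

```python
# Illustrative sketch of a dynamic quantization rule (not Unsloth's real code).
# Sensitive layers keep 4-bit; the bulk of the MoE expert weights drop to 1.58-bit.
def choose_bits(layer_name: str) -> float:
    """Return a target bit-width for a layer under a dynamic quant scheme."""
    # Embeddings, attention, and dense/shared blocks are fragile: keep 4-bit.
    if any(k in layer_name for k in ("embed", "attn", "shared_expert", "dense")):
        return 4.0
    # Routed MoE expert weights (the vast majority of parameters): 1.58-bit.
    if "expert" in layer_name:
        return 1.58
    return 4.0  # default: stay conservative for anything unrecognized

# Hypothetical layer names, just to show the rule in action:
for name in ("model.embed_tokens", "blk.10.attn_q", "blk.10.ffn_expert_3"):
    print(name, "->", choose_bits(name), "bits")
```

Quantizing everything to 1.58-bit (i.e. dropping the first branch) is the "naive" approach the text says breaks the model.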
The 1.58-bit quantization should fit in 160GB of VRAM (2x H100 80GB) for fast inference, attaining around 140 tokens per second. You don't need VRAM (a GPU) to run the 1.58-bit R1; just 20GB of RAM (CPU) will work, however it may be slow. For optimal performance, we recommend the sum of VRAM + RAM to be at least 80GB.
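The memory guidance above can be captured as two simple checks. This is an illustrative helper, not an official tool; the thresholds come straight from the figures in the text:

```python
# Rough feasibility checks for the memory guidance above (illustrative only).
MODEL_GB = 131  # 1.58-bit quant size from the text

def fits_in_vram(vram_gb: float) -> bool:
    """Whole model in VRAM -> fast inference (e.g. 2x H100 80GB = 160GB)."""
    return vram_gb >= MODEL_GB

def meets_recommendation(vram_gb: float, ram_gb: float) -> bool:
    """The text recommends VRAM + RAM totaling at least 80GB."""
    return vram_gb + ram_gb >= 80

print(fits_in_vram(160))             # True: fits on 2x H100 80GB
print(meets_recommendation(0, 20))   # False: still runs, but expect it to be slow
```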