Run Claude Code for FREE with Local Gemma 4
Published: April 15, 2026 | Tags: claude-code, gemma-4, lm-studio, local-ai, tutorial
The Problem
Claude Code's API costs can hit $200/month, and free credits keep getting cut. That's a steep price for a coding assistant.
The solution: Run Claude Code locally with Google's Gemma 4 model. Zero API costs, completely on your machine.
Result: Free, fast, and actually works for real coding tasks.
What You Need
- Mac with 16GB RAM (or similar Linux setup)
- Claude Code installed
- LM Studio
- Google Gemma 4 E4B model (only 6GB!)
Installation Steps
Step 1: Install Claude Code
npm install -g @anthropic-ai/claude-code
Step 2: Install LM Studio
Download from lmstudio.ai and install.
Step 3: Verify LM Studio CLI
lms
If you get "command not found", close and reopen your terminal so the refreshed PATH is picked up.
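If a fresh terminal still can't find `lms`, the CLI directory may not be on your PATH. The sketch below assumes LM Studio placed the binary under `~/.lmstudio/bin`, which is where current installs put it (the exact location may vary by version):

```shell
# Add the LM Studio CLI directory to PATH for this session.
# Put this line in your shell profile to make it permanent.
export PATH="$HOME/.lmstudio/bin:$PATH"

# Confirm the directory is now on PATH.
echo "$PATH" | tr ':' '\n' | grep lmstudio
```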
Step 4: Download Gemma 4 Model
- Model: Gemma 4 E4B
- Size: ~6GB (vs 16-19GB for other models)
- Runs smoothly on 16GB RAM Macs
- Download time: a few minutes
Step 5: Start Local Server
lms server start --port 1234
You'll see a success message when started.
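Before wiring up Claude Code, you can confirm the server is actually listening by probing the OpenAI-compatible `/v1/models` endpoint LM Studio exposes (a quick sketch, assuming the default port 1234):

```shell
# Probe the local LM Studio server; print the state instead of failing hard.
if curl -sf http://localhost:1234/v1/models > /dev/null 2>&1; then
  echo "server up"
else
  echo "server down (run: lms server start --port 1234)"
fi
```

If the server is up, the endpoint also lists the models you've downloaded, which is a handy sanity check that Gemma is visible.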
Step 6: Configure Claude Code (3 lines)
export CLAUDE_API_BASE_URL=http://localhost:1234/v1
export ANTHROPIC_API_KEY=anything
# Add to ~/.zshrc for permanent setup
The API key can be any string: requests go to your own machine, and LM Studio never validates it.
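To make the configuration survive new shells, the two variables can live in your shell profile (shown for zsh, per the tutorial; adjust the file for bash):

```shell
# ~/.zshrc — point Claude Code at the local LM Studio server.
# The key is a placeholder; the local server never checks it.
export CLAUDE_API_BASE_URL=http://localhost:1234/v1
export ANTHROPIC_API_KEY=anything
```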
Step 7: Launch Claude Code
claude
Check the top-left corner - it should show Gemma 4, not Sonnet or Opus.
⚠️ Critical Setting: Context Window
The default context window is too small! This will cause Claude Code to freeze when using tools on complex tasks. The fix is simple but essential.
Increase context length before starting work:
lms unload --all
lms load --context-length 40960
When prompted, pick the Gemma model (or pass its identifier directly to lms load).
Why 40960? It leaves room for Claude Code's large system prompt, its tool definitions, and the files it reads during a task. Without the increase, tool calls overflow the context and fail, which is why sessions appear to freeze; this one setting is the difference between constant stalls and a smooth session.
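To see why the default is too small, a rough token budget helps. The numbers below are illustrative assumptions, not measurements, but they already land well above a small default context:

```shell
# Rough, illustrative token budget for one Claude Code task
# (all figures are assumptions for the sake of the arithmetic):
system_prompt=12000   # Claude Code's system prompt + tool definitions
file_context=15000    # files read into context during the task
conversation=8000     # accumulated turns and tool results
total=$((system_prompt + file_context + conversation))
echo "estimated usage: $total tokens"   # prints: estimated usage: 35000 tokens
```

Even with generous slack, 40960 comfortably covers a budget like this, while a few thousand tokens clearly cannot.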
Demo: Build a Chrome Dinosaur Game
Test run with a real project - building Chrome's offline dinosaur game from scratch:
- Enter a detailed prompt in one shot
- Let Gemma 4 generate the complete game
- Result: Fully playable game with score, collision detection, restart button
Outcome: One prompt → complete working game → zero API costs
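A one-shot prompt for this demo might look like the following. The wording is illustrative, not the original video's prompt; you would paste it as the first message of the claude session:

```shell
# Store the full spec in one prompt; small models do best when
# every requirement is stated up front.
PROMPT='Build the Chrome offline dinosaur game as a single index.html.
Requirements: canvas rendering, spacebar to jump, moving cactus obstacles,
collision detection, a running score counter, and a game-over screen
with a restart button. No external libraries.'
printf '%s\n' "$PROMPT"
```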
Pro Tips
- One-shot prompts: Small models lose track as context accumulates across turns. Put everything you need into the first prompt.
- Be specific: Include all requirements upfront - features, style, behavior.
- Context matters: Load the model with higher context before complex tasks.
- Speed vs capability: 4B models are fast but have limits. Know when to use cloud for heavy tasks.
Resources
Originally from a Chinese tutorial video, translated and summarized for an English audience.