Working Paper

Benchmarking Local LLMs for Financial Tweet Sentiment

Can locally deployed open-weight LLMs match cloud APIs for financial sentiment analysis? We benchmark 12 Ollama models against VADER and FinBERT on 1,870 real financial tweets from 27 Twitter/X accounts, running entirely on consumer hardware.
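As a rough illustration of what classifying one tweet with a local Ollama model could look like, the sketch below posts a prompt to Ollama's `/api/generate` endpoint and maps the reply to a label. The label set (`positive`, `negative`, `neutral`), the prompt wording, the model tag `qwen3:4b`, and the helper names are all assumptions made for this example; they are not taken from the paper.

```python
import json
import re
import urllib.request
from typing import Optional

# Assumed label set; the benchmark's actual labels may differ.
LABELS = ("positive", "negative", "neutral")


def build_prompt(tweet: str) -> str:
    """Build a single-label classification prompt (wording is illustrative)."""
    return (
        "Classify the sentiment of this financial tweet as exactly one of: "
        + ", ".join(LABELS)
        + ".\n"
        + f"Tweet: {tweet}\n"
        + "Answer with the single label only."
    )


def parse_label(raw: str) -> Optional[str]:
    """Return the first label mentioned in the reply, or None if none is found."""
    # Reasoning models may emit a <think>...</think> block first; drop it.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.S).lower()
    hits = [(text.find(label), label) for label in LABELS if label in text]
    return min(hits)[1] if hits else None


def classify(tweet: str, model: str = "qwen3:4b",
             host: str = "http://localhost:11434") -> Optional[str]:
    """Send one non-streaming request to a locally running Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(tweet), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return parse_label(json.loads(response.read())["response"])
```

Returning `None` for unparseable replies, rather than silently defaulting to a label, keeps formatting failures visible so they can be counted separately from genuine misclassifications.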

The benchmark evaluates Qwen3, Gemma3, LLaMA, Mistral, Phi-4, DeepSeek-R1, and three finance-tuned models using a full set of ML metrics, statistical significance tests, and confidence calibration.
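The summary above does not say which metrics or calibration measure are reported. As one plausible reading, the sketch below computes macro-averaged F1 and expected calibration error (ECE) in plain Python; both choices, the function names, and the 10-bin default are assumptions for illustration only.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present or predicted contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes in the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return ece
```

Here `confidences` would hold each model's stated or derived confidence in its predicted label, and `correct` holds 1 or 0 per prediction. A well-calibrated model has an ECE near zero.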

1,870 real tweets
14 models tested (12 Ollama LLMs plus the VADER and FinBERT baselines)
27 Twitter/X accounts
4 GB VRAM required
