Research

We publish our findings. Transparent methodology, open data, reproducible results.

Working Paper

Benchmarking Local LLMs for Financial Tweet Sentiment

Can locally deployed open-weight LLMs match cloud APIs for financial sentiment analysis? We benchmark 12 Ollama models against VADER and FinBERT on 1,870 real financial tweets from 27 Twitter/X accounts, running entirely on consumer hardware.

We evaluate Qwen3, Gemma3, LLaMA, Mistral, Phi-4, DeepSeek-R1, and 3 finance-tuned models, reporting classification metrics, statistical significance tests, and confidence calibration.
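To make the three evaluation ingredients concrete, here is a minimal sketch of one common choice for each: macro-averaged F1 as the classification metric, an exact McNemar test for comparing two models on the same tweets, and expected calibration error (ECE) for confidence calibration. These specific choices, the three-way label set, and all function names are illustrative assumptions; the working paper may use different metrics or tests.

```python
from math import comb

# Assumed three-way label set; the paper's actual labels may differ.
LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two models on the same tweets."""
    # Only discordant pairs matter: exactly one of the two models is right.
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail with p = 0.5, doubled for a two-sided test.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted gap between mean confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(ok for _, ok in items) / len(items)
            ece += len(items) / total * abs(mean_conf - accuracy)
    return ece
```

In this sketch, `confidences` would be each model's self-reported probability for its predicted label and `correct` a parallel list of booleans; how the benchmark actually elicits confidence from each model is not specified here.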

Pre-print (coming soon) | Code & Data

1,870 real tweets

12 Ollama models tested

27 Twitter/X handles

4 GB VRAM required

See how research powers the platform

Our strategies are backed by transparent methodology and reproducible backtests.
