S7

Research

We publish our findings. Transparent methodology, open data, reproducible results.

Working Paper

Benchmarking Local LLMs for Financial Tweet Sentiment

Can locally deployed open-weight LLMs match cloud APIs for financial sentiment analysis? We benchmark 12 models served through Ollama against VADER and FinBERT on 1,870 real financial tweets from 27 Twitter/X accounts, running entirely on consumer hardware.

The evaluation covers Qwen3, Gemma3, LLaMA, Mistral, Phi-4, DeepSeek-R1, and 3 finance-tuned models, and reports full ML metrics, statistical significance tests, and confidence calibration.
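As a rough illustration of what classification metrics and a calibration check typically involve, here is a minimal Python sketch. This page does not list the working paper's actual metrics, so macro-F1 and expected calibration error (ECE) are assumptions chosen as common examples. The three-class label set and the function names are likewise hypothetical.

```python
# Illustrative only: the label set and metric choices are assumptions,
# not the working paper's confirmed methodology.
LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def expected_calibration_error(y_true, y_pred, confidences, n_bins=10):
    """Bin predictions by confidence, then average |accuracy - confidence|
    over the bins, weighting each bin by its share of the predictions."""
    bins = [[] for _ in range(n_bins)]
    for t, p, c in zip(y_true, y_pred, confidences):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((t == p, c))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(hit for hit, _ in members) / len(members)
        mean_conf = sum(c for _, c in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    y_true = ["positive", "negative", "neutral", "positive"]
    y_pred = ["positive", "negative", "positive", "positive"]
    conf = [0.9, 0.8, 0.6, 0.7]
    print("macro-F1:", macro_f1(y_true, y_pred))
    print("ECE:", expected_calibration_error(y_true, y_pred, conf))
```

Macro-F1 weights every class equally, which matters when neutral tweets dominate a sample. ECE is lower when a model's stated confidence tracks how often it is actually right.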

Pre-print (coming soon)

Code & Data

1,870 real tweets

14 models tested

27 Twitter handles

4 GB VRAM required

See how research powers the platform

Our strategies are backed by transparent methodology and reproducible backtests.

Get Started Free