Starling

Market Credibility Intelligence

AI-powered credibility scoring for retail investor predictions using sentiment analysis and systematic backtesting against market outcomes.

Financial Intelligence | drksci research
Active Development | Australia (ASX) | Financial Services | 2024-25 | FastAPI | Next.js | PostgreSQL | Playwright | Perplexity AI | Yahoo Finance

The Signal Problem

1,200 Posts Per Day. How Many Are Right?

HotCopper hosts Australia's largest retail investor community. Thousands of predictions daily, zero accountability. Everyone has an opinion. Nobody tracks who's actually correct.

650K+ Active Users
2M+ Posts/Year
0% Tracked

Financial forums are peculiar places. They’re simultaneously the most democratic form of investment research and the least accountable. Anyone can post a stock tip. Nobody checks if it was right.

This creates an information asymmetry in reverse. The experienced trader who’s been consistently accurate for years looks identical to the newcomer making their first wild guess. There’s no reputation system, no track record, no empirical basis for trust.

Starling asks: What if we could systematically track every prediction and measure who’s actually right?


The Credibility Gap

Traditional financial credibility comes from credentials - CFA certifications, institutional affiliations, Bloomberg terminals. But retail investor communities operate outside this framework. They’re credential-free zones where an anonymous username can move markets.

CREDIBILITY SOURCES: TRADITIONAL VS RETAIL

INSTITUTIONAL | RETAIL FORUMS
CFA Certification | Anonymous Username
Track Record (Audited) | Post History (Unverified)
Regulatory Oversight | Community Moderation
Fiduciary Duty | No Accountability
High Trust Barrier | Zero Trust Barrier

The gap creates opportunity for empirical credibility systems
The credibility infrastructure gap in retail investor communities

This gap isn’t just theoretical. It has real consequences:

Information asymmetry - Experienced traders with genuine insight are drowned out by volume. The signal-to-noise ratio approaches zero.

Manipulation vulnerability - Without accountability, pump-and-dump schemes thrive. Bad actors face no consequences for consistently wrong or misleading predictions.

Wasted attention - Retail investors spend hours parsing forum posts with no way to prioritize whose opinions actually matter.


The Extraction Pipeline

Building credibility from chaos requires a systematic approach. We process forum data through a four-stage pipeline that transforms unstructured discussion into measurable prediction accuracy.

1. INGESTION - Stealth scraping with Playwright. JavaScript rendering. Rate-limited to avoid detection. Thread-level batching for efficiency.

2. ANALYSIS - Perplexity AI extracts sentiment, direction, magnitude estimates, and time horizons from natural language posts.

3. VALIDATION - Backtesting against Yahoo Finance data. Multiple time horizons: 7-day, 30-day, 90-day windows.

4. AGGREGATION - Per-user accuracy scores. Ticker-specific performance. Confidence-weighted rankings.
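
As a rough illustration of the validation stage, the sketch below checks a prediction's direction against closing prices at the 7-, 30-, and 90-day marks. The `closes` mapping stands in for the price history that the pipeline pulls from Yahoo Finance. The function names, the data layout, and the sample prices are assumptions made for this example, not Starling's actual code.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Optional

HORIZONS_DAYS = (7, 30, 90)  # validation windows named in the pipeline


def close_on_or_before(closes: dict[date, float], day: date) -> Optional[float]:
    """Return the latest close at or before `day` (markets skip weekends)."""
    candidates = [d for d in closes if d <= day]
    return closes[max(candidates)] if candidates else None


def direction_hits(
    closes: dict[date, float], posted: date, direction: str
) -> dict[int, Optional[bool]]:
    """Map each horizon to True/False, or None when prices are not yet available."""
    start = close_on_or_before(closes, posted)
    results: dict[int, Optional[bool]] = {}
    for days in HORIZONS_DAYS:
        target_day = posted + timedelta(days=days)
        if start is None or target_day > max(closes):
            results[days] = None  # horizon cannot be observed yet
            continue
        end = close_on_or_before(closes, target_day)
        results[days] = (end > start) if direction == "UP" else (end < start)
    return results


# Made-up closing prices for illustration only.
closes = {
    date(2024, 3, 1): 0.85,
    date(2024, 3, 8): 0.91,
    date(2024, 3, 28): 0.80,
    date(2024, 4, 2): 0.80,
}
hits = direction_hits(closes, posted=date(2024, 3, 1), direction="UP")
print(hits)
```

With this sample data the 7-day call counts as a hit, the 30-day call as a miss, and the 90-day window stays unresolved because no price has been recorded that far out.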


Sentiment Extraction

The AI analysis layer processes each forum thread in a single API call, a cost optimization that also provides better context: each post is analyzed within the conversational flow of its discussion.
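
A minimal sketch of thread-level batching, assuming posts arrive as dictionaries with `thread_id`, `author`, `posted_at`, and `text` keys. Those field names, the prompt wording, and the sample posts are hypothetical. The sketch stops at building one prompt per thread and deliberately leaves out the call to the analysis API.

```python
from collections import defaultdict


def group_by_thread(posts: list[dict]) -> dict[str, list[dict]]:
    """Bucket posts by thread so each thread becomes one analysis request."""
    threads = defaultdict(list)
    for post in posts:
        threads[post["thread_id"]].append(post)
    # Keep posts in posting order so the model sees the conversation flow.
    for thread_posts in threads.values():
        thread_posts.sort(key=lambda p: p["posted_at"])
    return dict(threads)


def build_thread_prompt(thread_posts: list[dict]) -> str:
    """Render a whole thread as one numbered prompt."""
    lines = ["Extract one structured prediction per post, or mark it NEUTRAL.", ""]
    for i, post in enumerate(thread_posts, start=1):
        lines.append(f"[{i}] {post['author']}: {post['text']}")
    return "\n".join(lines)


# Made-up posts for illustration only.
posts = [
    {"thread_id": "LKE-101", "author": "user_a",
     "posted_at": "2024-03-01T09:00", "text": "LKE looking strong here."},
    {"thread_id": "XYZ-7", "author": "user_b",
     "posted_at": "2024-03-01T09:05", "text": "Any news on the drilling results?"},
    {"thread_id": "LKE-101", "author": "user_c",
     "posted_at": "2024-03-01T08:30", "text": "Volume is picking up."},
]
prompts = {tid: build_thread_prompt(tp) for tid, tp in group_by_thread(posts).items()}
print(len(prompts), "API calls for", len(posts), "posts")
```

Three posts collapse into two requests here; on long threads the saving grows with the number of replies.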

// Sample extraction from HotCopper post
INPUT (forum post):
"LKE looking strong here. Chart forming a cup and handle, expecting a breakout above 0.85 within the next few weeks. Could see 1.20 by end of month if volume picks up."
OUTPUT (structured prediction):
ticker: LKE
sentiment: BULLISH
direction: UP
entry_price: 0.85
target_price: 1.20
time_horizon: MEDIUM_TERM (8-30 days)
confidence: 0.72

The extraction handles ambiguity. Not every post contains a tradeable prediction - many are questions, reactions, or general commentary. The AI distinguishes between:

  • Explicit predictions - Clear directional calls with price targets
  • Implicit sentiment - General bullish/bearish tone without specific targets
  • Neutral content - Questions, news sharing, or factual discussion
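
One way to express these three categories downstream of the AI extraction is a small rule over the structured fields. This is a sketch under the assumption that each extraction carries a `sentiment` label and an optional `target_price`. The `Extraction` class and `classify` function are hypothetical names, and in the real system the distinction is made by the AI rather than by a rule like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    sentiment: str                        # "BULLISH", "BEARISH", or "NEUTRAL"
    target_price: Optional[float] = None  # set only when the post names a target


def classify(extraction: Extraction) -> str:
    """Return 'explicit', 'implicit', or 'neutral' for one extracted post."""
    if extraction.sentiment in ("BULLISH", "BEARISH"):
        # A directional call with a price target can be backtested on magnitude;
        # without a target, only the tone is known.
        return "explicit" if extraction.target_price is not None else "implicit"
    return "neutral"


print(classify(Extraction("BULLISH", target_price=1.20)))  # explicit
print(classify(Extraction("BEARISH")))                     # implicit
print(classify(Extraction("NEUTRAL")))                     # neutral
```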

Time Horizon Buckets

Predictions aren’t binary. A correct 7-day call is different from a correct 90-day thesis. We validate across three distinct windows:

NEAR-TERM

1-7 days
Momentum plays, news reactions, technical breakouts
Avg predictions/day 342

MEDIUM-TERM

8-30 days
Earnings plays, catalyst events, trend following
Avg predictions/day 187

LONG-TERM

31-90 days
Fundamental thesis, sector rotation, macro calls
Avg predictions/day 94
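
The bucket boundaries above translate directly into a lookup. The sketch below assumes the horizon has already been expressed as a whole number of days; the function name is hypothetical, and the labels follow the `MEDIUM_TERM` style used in the sample extraction.

```python
def horizon_bucket(days: int) -> str:
    """Map a prediction horizon in days to its validation bucket."""
    if days < 1 or days > 90:
        raise ValueError(f"horizon of {days} days is outside the 1-90 day range")
    if days <= 7:
        return "NEAR_TERM"
    if days <= 30:
        return "MEDIUM_TERM"
    return "LONG_TERM"


for d in (3, 21, 60):
    print(d, horizon_bucket(d))
```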

Accuracy Metrics

We measure prediction quality across multiple dimensions. Direction accuracy alone isn’t sufficient - magnitude matters.

ACCURACY DECOMPOSITION

Direction Accuracy: Did price move in predicted direction? (UP / DOWN / NEUTRAL)
Magnitude Accuracy: How close was the % change estimate? (|predicted - actual| / actual)
Composite Score: Weighted combination with confidence factor
Score = (0.6 * direction) + (0.3 * magnitude) + (0.1 * confidence_calibration)
Multi-dimensional accuracy scoring system

A user who correctly predicts direction 80% of the time but wildly overestimates magnitude isn’t as valuable as one who’s accurate on both dimensions. The composite score captures this nuance.
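
To make the composite formula concrete, here is a small sketch for a single prediction. The 0.6 / 0.3 / 0.1 weights and the relative-error term come from the decomposition above. The source does not say how the error is turned into a 0-1 magnitude score or how confidence calibration is measured, so both conversions below are assumptions, as are the function names.

```python
def magnitude_accuracy(predicted_pct: float, actual_pct: float) -> float:
    """Turn relative error |predicted - actual| / |actual| into a 0-1 score."""
    if actual_pct == 0:
        return 0.0  # relative error is undefined; treated as a miss (assumption)
    # abs() on the denominator keeps the error non-negative for price falls.
    error = abs(predicted_pct - actual_pct) / abs(actual_pct)
    return max(0.0, 1.0 - error)  # assumption: clamp so large misses score 0


def composite_score(
    direction_hit: bool, predicted_pct: float, actual_pct: float, confidence: float
) -> float:
    """Score = 0.6 * direction + 0.3 * magnitude + 0.1 * confidence_calibration."""
    direction = 1.0 if direction_hit else 0.0
    magnitude = magnitude_accuracy(predicted_pct, actual_pct)
    # Assumption: calibration rewards high confidence on hits, low on misses.
    calibration = 1.0 - abs(confidence - direction)
    return 0.6 * direction + 0.3 * magnitude + 0.1 * calibration


# Same direction and confidence, different magnitude estimates.
close_call = composite_score(True, predicted_pct=25.0, actual_pct=20.0, confidence=0.72)
wild_call = composite_score(True, predicted_pct=40.0, actual_pct=20.0, confidence=0.72)
print(round(close_call, 3), round(wild_call, 3))
```

Both calls get the direction right, but the one that overshoots the move by a factor of two earns nothing from the magnitude term, which is the distinction the composite score is meant to capture.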


The Leaderboard Emerges

After processing 6 months of HotCopper data, patterns emerge. The distribution of accuracy is not normal - it’s heavily skewed.

[Figure: User accuracy distribution (n=12,847). X-axis: direction accuracy rate, 30% to 70%. Y-axis ticks: 0 to 1000. The top 2% tail is marked.]
Most users cluster around 50% (random). The tail reveals signal.

Most users perform at or below chance level - consistent with noise. But the right tail tells a different story. Roughly 2% of users maintain accuracy above 65% across 50+ predictions. This is a statistically significant signal.
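
The source does not state which significance test was applied. One conventional check, sketched here as an assumption, is an exact one-sided binomial tail: how often would a user guessing direction at random (p = 0.5) reach at least this many hits? The function name and the 33-of-50 example are illustrative.

```python
from math import comb


def chance_tail_probability(hits: int, n: int, p: float = 0.5) -> float:
    """P(X >= hits) for X ~ Binomial(n, p): how often pure guessing does this well."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


# A user with 33 correct calls out of 50 (66% direction accuracy).
p_value = chance_tail_probability(hits=33, n=50)
print(f"chance of >= 33/50 by guessing: {p_value:.4f}")
```

Because thousands of users are tested at once, a stricter threshold or a multiple-comparison correction would be a natural refinement of this check.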


User Personas

The data reveals distinct user archetypes based on prediction behavior and accuracy patterns:

🎯

THE SPECIALIST

High accuracy on 2-3 specific tickers. Deep sector knowledge. Fewer predictions but higher conviction. Average 68% direction accuracy in their domain.

📊

THE TECHNICIAN

Chart-focused predictions. Strong near-term accuracy, weaker long-term. High volume of predictions. Pattern recognition over fundamentals.

🔔

THE PROMOTER

Consistently bullish regardless of conditions. High prediction volume, low accuracy. Often first to post on small-cap announcements. Signal value: negative.


System Architecture

The technical implementation prioritizes cost efficiency and scalability. Processing millions of posts through AI analysis requires careful batching and caching.

HotCopper ──[playwright]──> Posts DB ──[perplexity]──> Predictions
Yahoo Finance ──[yfinance]──> Price Data ──[backtest]──> Outcomes
Predictions + Outcomes ──[aggregate]──> User Rankings ──> Dashboard

Key optimizations:

  • Thread-level batching - One Perplexity API call per forum thread, not per post
  • Activity-based scheduling - Active threads scraped every 6 hours, dormant threads weekly
  • Response caching - Duplicate content detection avoids redundant AI processing
  • Resumable checkpoints - Long-running scrapes can resume from interruption
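
Two of these optimizations are easy to sketch in isolation. The 6-hour and weekly intervals come from the list above. The two-day cutoff for calling a thread "active", the whitespace and case normalization, and all names are assumptions for illustration.

```python
import hashlib
from datetime import datetime, timedelta

ACTIVE_INTERVAL = timedelta(hours=6)   # active threads: every 6 hours
DORMANT_INTERVAL = timedelta(weeks=1)  # dormant threads: weekly
ACTIVE_WINDOW = timedelta(days=2)      # hypothetical cutoff for "active"


def next_scrape_at(last_scraped: datetime, last_post_at: datetime) -> datetime:
    """Schedule the next scrape based on how recently the thread saw a post."""
    is_active = last_scraped - last_post_at <= ACTIVE_WINDOW
    return last_scraped + (ACTIVE_INTERVAL if is_active else DORMANT_INTERVAL)


def content_key(text: str) -> str:
    """Hash normalized text so trivially different copies share one cache key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AnalysisCache:
    """Skip the AI call when the same content has already been analyzed."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, text, analyze):
        key = content_key(text)
        if key not in self._store:
            self._store[key] = analyze(text)
        return self._store[key]


calls = []
cache = AnalysisCache()
fake_analyze = lambda text: calls.append(text) or "BULLISH"  # stand-in for the AI call
cache.get_or_compute("LKE looking strong here.", fake_analyze)
cache.get_or_compute("lke  looking strong here.", fake_analyze)  # duplicate after normalizing
print(len(calls), "analysis call(s) made")
```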

The Console

The system exposes a Next.js dashboard for exploring credibility data in real-time. Key interfaces include user rankings, prediction signals, and operational monitoring.

[Screenshot: Dashboard overview with real-time metrics and recent activity]
[Screenshot: User rankings sorted by composite credibility score]
[Screenshot: Extracted predictions with sentiment analysis and validation status]
[Screenshot: AI operations panel for monitoring extraction and analysis pipelines]

Key Research Questions

  • Does historical prediction accuracy predict future accuracy? Is credibility persistent or mean-reverting?
  • Can we identify market manipulation patterns through coordinated prediction behavior?
  • What's the correlation between prediction confidence and actual accuracy across user tiers?
  • Can ensemble predictions from top-ranked users outperform individual institutional analysis?


Future Applications

Credibility API

Real-time credibility scores for forum users. Integration with trading platforms to surface high-signal predictions.

Multi-Platform Expansion

Extend beyond HotCopper to Reddit (r/ASX_Bets), Twitter FinTwit, and international equivalents.

Manipulation Detection

Pattern recognition for coordinated pump schemes. Early warning system for retail investor protection.


This research focuses on the Australian Securities Exchange (ASX) via HotCopper. For collaboration inquiries: research@drksci.com