Project Overview

I didn't just code a strategy; I engineered a research terminal for pairs mean-reversion, statistical analysis, backtesting, risk/position sizing, and on-disk caching. It's an interactive Dash app that I can extend at will.


Elevator Pitch

Type two tickers; the app fetches and caches their prices, aligns the time series, estimates a hedge ratio via OLS, computes naïve and hedged spreads, standardises them to Z-scores, and generates entry/exit signals that feed a configurable backtester. It also runs Engle-Granger cointegration and ADF tests to confirm stationarity, not just correlation. On top, there's position sizing, a small risk panel (profit/stop ranges from dispersion), and a ticker metadata subsystem (dividends, valuation, liquidity, earnings) with a file manager to persist and compare fundamentals.


Technical Implementation

Architecture at a Glance

UI Layer:
  • Dash + dash-bootstrap-components with multi-panel layout
  • Reactive callbacks for dynamic updates
  • Interactive controls for all parameters

Data Layer:
  • yfinance downloads with CSV caching per (ticker, period, interval)
  • On-disk metadata cache per ticker
  • Robust error handling and retry logic

Analytics Core:
  • statsmodels (OLS, ADF, cointegration tests)
  • numpy/pandas for vectorized time-series math
  • Statistical validation

Backtesting Engine:
  • Deterministic state machine over Z-scores
  • Configurable entry/exit thresholds
  • Event markers on price & Z-score charts

Persistence & Operations:
  • JSON serialization through hidden Dash store
  • Timestamped logging
  • File/folder managers for CSVs & metadata
  • USFederalHolidayCalendar + CustomBusinessDay integration

Data Engineering - Clean, Aligned, Cache-First

The foundation is a robust data pipeline that prioritises speed, reliability, and reproducibility:


Cache-Aware Loader:
  • load_or_download_data() scans Price-CSV-Saves/ for existing files
  • Standardised schema: Datetime, Close, High, Low, Open, Volume
  • Cuts API pressure and ensures reproducibility
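A minimal sketch of how a cache-first loader like this might look. The function name, the file-naming scheme, and the injectable `downloader` callable are assumptions made for illustration; the project's actual load_or_download_data() may differ.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

# Column order taken from the schema listed above.
SCHEMA = ["Datetime", "Close", "High", "Low", "Open", "Volume"]


def load_or_download_sketch(
    ticker: str,
    period: str,
    interval: str,
    downloader: Callable[[str, str, str], pd.DataFrame],
    cache_dir: Path = Path("Price-CSV-Saves"),
) -> pd.DataFrame:
    """Return cached prices if a CSV exists, otherwise download and cache."""
    # Hypothetical naming scheme: one file per (ticker, period, interval).
    path = cache_dir / f"{ticker.upper()}_{period}_{interval}.csv"
    if path.exists():
        # Cache hit: no network call is made.
        return pd.read_csv(path, parse_dates=["Datetime"])

    # Cache miss: fetch, normalise the column order, and persist.
    frame = downloader(ticker, period, interval)[SCHEMA]
    cache_dir.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return frame
```

Passing the downloader in as an argument keeps the caching logic testable without touching the network; in the app it would wrap the yfinance call.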

Robust Alignment:
  • Computes spreads on inner join of both Close series
  • Removes holes from holidays/missing bars
  • Logs matrix shapes and early errors via Write_Log()
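The inner-join alignment described above could be sketched as follows. The helper name and the `Close_1`/`Close_2` column labels are illustrative assumptions, not the project's identifiers.

```python
import pandas as pd


def align_closes(prices_1: pd.DataFrame, prices_2: pd.DataFrame) -> pd.DataFrame:
    """Keep only timestamps where both legs have a Close value."""
    close_1 = prices_1.set_index("Datetime")["Close"].rename("Close_1")
    close_2 = prices_2.set_index("Datetime")["Close"].rename("Close_2")
    # join="inner" keeps the intersection of the two indexes;
    # dropna() then removes bars where either leg is missing a price.
    return pd.concat([close_1, close_2], axis=1, join="inner").dropna()
```

Every downstream quantity (spread, Z-score, regression) is then computed on rows that exist for both tickers.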

Trading-Days Awareness:
  • get_trading_days() uses federal holiday calendar + business-day offset
  • Measures real trading horizons for normalizing holding time
  • Essential for accurate performance metrics
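As a rough illustration of the trading-days idea, the pandas federal holiday calendar can be combined with a custom business-day offset like this. The function name and signature are assumptions; the project's get_trading_days() may return something other than a count.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day offset that skips weekends and US federal holidays.
US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())


def count_trading_days(start: str, end: str) -> int:
    """Count business days between start and end, both inclusive."""
    return len(pd.date_range(start=start, end=end, freq=US_BUSINESS_DAY))
```

For example, the week of Monday 1 July 2024 to Friday 5 July 2024 counts four days, because 4 July is excluded. Note that the federal calendar is an approximation of an exchange calendar: the two differ on a few dates.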

Value: This is production-style data plumbing: iterations stay fast, rate limits are avoided, and every step is logged.

Statistical Engine: Real Cointegration Discipline

The heart of the system is statistical validation, which separates mere correlation from true cointegration:


Hedge Ratio via OLS:
  • Regression: Ordinary Least Squares (OLS) of Close₁ on Close₂
  • Hedge ratio: β, the slope coefficient from the regression
  • Hedged spread: Sₜ = Close₁ₜ - β Close₂ₜ
  • Visualisation: 5-period moving average of spreads to see drift vs. mean reversion
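The hedge-ratio step can be sketched in a few lines. To keep the example dependency-light it uses NumPy's least-squares line fit rather than statsmodels, and it assumes the regression includes an intercept; both choices are assumptions, as are the function names.

```python
import numpy as np
import pandas as pd


def hedge_ratio_sketch(close_1: pd.Series, close_2: pd.Series) -> float:
    """Slope of the least-squares line Close_1 = alpha + beta * Close_2."""
    # np.polyfit with deg=1 returns [slope, intercept].
    beta, _alpha = np.polyfit(close_2.to_numpy(), close_1.to_numpy(), deg=1)
    return float(beta)


def naive_spread(close_1: pd.Series, close_2: pd.Series) -> pd.Series:
    """Unhedged spread: a one-for-one price difference."""
    return close_1 - close_2


def hedged_spread(close_1: pd.Series, close_2: pd.Series, beta: float) -> pd.Series:
    """Hedged spread S_t = Close_1 - beta * Close_2."""
    return close_1 - beta * close_2
```

With a perfectly linear pair such as Close_1 = 2 * Close_2 + 5, the fitted β is 2 and the hedged spread is flat at 5, while the naïve spread still trends with Close_2.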

Stationarity:
  • Engle-Granger cointegration: p-value, test statistic, and critical values, run in both directions
  • ADF testing: On each price series and on the spread
  • Unit root: the ADF null hypothesis is that a unit root is present, i.e. that the series is non-stationary

Backtesting

A deterministic backtesting engine that marks entries and exits clearly on historical data:


Signal Generation:
  • generate_trading_signals() tracks the Z-score series
  • Clean entry/exit protocol with configurable thresholds
  • Enter long spread when z < -entry; enter short spread when z > +entry
  • Exit when |z| crosses back inside the exit band
  • Tracks entries, exits, and position over time
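The entry/exit protocol above amounts to a small state machine. The sketch below is one way to write it; the function name, default thresholds, event labels, and output columns are assumptions, not the project's generate_trading_signals().

```python
import pandas as pd


def generate_signals_sketch(z: pd.Series, entry: float = 2.0, exit_band: float = 0.5) -> pd.DataFrame:
    """Walk the Z-score series once, tracking position: +1 long, -1 short, 0 flat."""
    position = 0
    positions, events = [], []
    for value in z:
        event = ""
        if position == 0:
            # Flat: look for an entry on either side.
            if value < -entry:
                position, event = 1, "enter_long"
            elif value > entry:
                position, event = -1, "enter_short"
        elif abs(value) < exit_band:
            # In a trade: close once z is back inside the exit band.
            position, event = 0, "exit"
        positions.append(position)
        events.append(event)
    return pd.DataFrame({"z": z, "position": positions, "event": events}, index=z.index)
```

Because the only state is the current position, the same Z-score series and thresholds always produce the same trades, and NaN values simply leave the state unchanged.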

Configurable Engine:
  • UI inputs for entry/exit thresholds
  • Z-score source selection (naïve vs. hedged)
  • Flexible parameter tuning

Visual Guides:
  • Dual-axis price chart: Event markers (green triangle-up = buy, red triangle-down = sell, purple X = exit)
  • Z-score chart: Entry/exit lines and zero line for clarity
  • Stats summary: Total trade count + thresholds used for reproducibility

Value: The backtester is simple, intuitive, and debuggable, so its output is easy to interpret.

"Pairs Analysis" Checks All in One Place

A comprehensive pre-trade quality assessment:


One-Click Analysis:
  • Correlation: Sanity check for basic relationship
  • Hedge ratio (β): OLS estimate
  • ADF testing: On both legs and on the spread
  • Cointegration: Both directions for robustness

Suitability Verdict:
  • Combines correlation + cointegration + spread stationarity
  • Go/no-go decision for pairs trading
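One possible shape for such a verdict is sketched below. The thresholds (0.7 correlation, 5% significance) and the rule that both cointegration directions must pass are assumptions chosen for illustration; the project's actual criteria are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairChecks:
    correlation: float
    coint_p_forward: float   # Engle-Granger p-value, leg 1 on leg 2
    coint_p_reverse: float   # Engle-Granger p-value, leg 2 on leg 1
    spread_adf_p: float      # ADF p-value of the hedged spread


def suitability_verdict(checks: PairChecks, min_corr: float = 0.7, alpha: float = 0.05):
    """Return (go, failed_checks) from the three families of tests."""
    failed = []
    if abs(checks.correlation) < min_corr:
        failed.append("correlation")
    # Assumed rule: cointegration must hold in both directions.
    if max(checks.coint_p_forward, checks.coint_p_reverse) >= alpha:
        failed.append("cointegration")
    if checks.spread_adf_p >= alpha:
        failed.append("spread_stationarity")
    return (not failed, failed)
```

Returning the list of failed checks alongside the boolean makes a no-go result explainable in the UI.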

Metadata / Fundamentals Cache for Comparison

Comprehensive fundamental analysis and comparison tools:


Data Fetcher:
  • yfinance .info grouped into categories
  • Overview, Valuation, Dividends, Liquidity, Earnings

Persistence:
  • Materialise each category to CSV under Metadata-CSV-Saves/<TICKER>/<CATEGORY>.csv
  • Structured data for easy access
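A minimal sketch of the per-category persistence, taking an already-fetched info dictionary as input so no network call is needed. The field-to-category mapping and the two-column Field/Value layout are assumptions; the project's grouping may be different.

```python
from pathlib import Path

import pandas as pd

# Illustrative mapping only: which info keys land in which category file.
CATEGORY_FIELDS = {
    "Overview": ["longName", "sector", "marketCap"],
    "Valuation": ["trailingPE", "forwardPE"],
    "Dividends": ["dividendYield", "payoutRatio"],
    "Liquidity": ["averageVolume"],
    "Earnings": ["trailingEps", "forwardEps"],
}


def save_metadata_sketch(ticker: str, info: dict, root: Path = Path("Metadata-CSV-Saves")) -> list:
    """Write one CSV per category under <root>/<TICKER>/<CATEGORY>.csv."""
    folder = root / ticker.upper()
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for category, fields in CATEGORY_FIELDS.items():
        # Missing keys become empty cells rather than raising.
        rows = [{"Field": name, "Value": info.get(name)} for name in fields]
        path = folder / f"{category}.csv"
        pd.DataFrame(rows).to_csv(path, index=False)
        written.append(path)
    return written
```

One small file per category keeps the side-by-side comparison simple: the UI only needs to read the same category file for each ticker.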

User Interface:
  • Category switcher + side-by-side cards for comparison
  • File manager with refresh and delete actions
  • Easy navigation between different fundamental aspects

Why it matters: Fundamentals context (dividend policy, liquidity, beta) often explains spread behavior or why cointegration holds/fails.

Visualisation: Clean, Informative

Professional-grade charts that provide actionable insights:


Three Core Charts:
  • Dual-axis price: Both assets on same timeline
  • Naïve spread: + 5-MA + Z-score standardisation
  • Hedged spread: + 5-MA + Z-score standardisation
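The two overlays on each spread chart can be computed as below. This sketch assumes the Z-score uses the full-sample mean and standard deviation; a rolling window is an equally plausible choice, and the function and column names are hypothetical.

```python
import pandas as pd


def spread_overlays(spread: pd.Series, ma_window: int = 5) -> pd.DataFrame:
    """Add a moving average and a standardised Z-score to a spread series."""
    moving_average = spread.rolling(ma_window).mean()
    # Full-sample standardisation: (value - mean) / sample standard deviation.
    z_score = (spread - spread.mean()) / spread.std()
    return pd.DataFrame({"spread": spread, f"ma_{ma_window}": moving_average, "z": z_score})
```

The first four moving-average values are NaN because a 5-period window is not yet full. Note also that a full-sample Z-score uses information from the whole history, which is fine for charting but introduces look-ahead if used directly for backtest signals.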

Graph Selector:
  • Users toggle which charts render
  • Customisable view based on analysis needs

Design Standards:
  • Plotly dark template for professional appearance
  • Consistent legends and axis titles
  • Colour-coded dual axes for clarity

Why This Project Has Real Value

This is not a static notebook. It's an interactive decision-making tool.

  • Validates the pair statistically (cointegration + ADF)
  • Generates deterministic signals
  • Backtests them with clear visual attribution
  • Sizes positions coherently with hedge ratio
  • Saves everything so you can iterate fast without rate limits

This project represents the intersection of quant finance and software engineering.


For me, this was about building a research tool that I could actually use: a system that supports my own trading decisions.


Project Gallery