Project Overview
I didn't just code a strategy; I engineered a research terminal for pairs mean-reversion, statistical analysis, backtesting, risk/position sizing, and on-disk caching. It's an interactive Dash app that I can extend at will.
Elevator Pitch
Type two tickers and the app fetches and caches their prices, aligns the time series, estimates a hedge ratio via OLS, computes naïve and hedged spreads, standardises them to Z-scores, and generates entry/exit signals through a configurable backtester. I also run Engle-Granger cointegration and ADF tests to confirm stationarity (not just correlation). On top, there's position sizing, a small risk panel (profit/stop ranges from dispersion), and a ticker metadata subsystem (dividends, valuation, liquidity, earnings) with a file manager to persist and compare fundamentals.
Technical Implementation
Architecture at a Glance
UI Layer:
- Dash + dash-bootstrap-components with multi-panel layout
- Reactive callbacks for dynamic updates
- Interactive controls for all parameters
Data Layer:
- yfinance downloads with CSV caching per (ticker, period, interval)
- On-disk metadata cache per ticker
- Robust error handling and retry logic
Analytics Core:
- statsmodels (OLS, ADF, cointegration tests)
- numpy/pandas for vectorized time-series math
- Statistical validation
Backtesting Engine:
- Deterministic state machine over Z-scores
- Configurable entry/exit thresholds
- Event markers on price & Z-score charts
Persistence & Operations:
- JSON serialization through hidden Dash store
- Timestamped logging
- File/folder managers for CSVs & metadata
- USFederalHolidayCalendar + CustomBusinessDay integration
Data Engineering - Clean, Aligned, Cache-First
The foundation is a robust data pipeline that prioritises speed, reliability, and reproducibility:
Cache-Aware Loader:
- load_or_download_data() scans Price-CSV-Saves/ for existing files
- Standardised schema: Datetime, Close, High, Low, Open, Volume
- Cuts API pressure and ensures reproducibility
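A minimal sketch of a cache-first loader along these lines. The signature of load_or_download_data(), the file-naming scheme, and the injected fetch callable are assumptions made for illustration; in the app the download step is yfinance, which is left out here so the example runs offline.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

CACHE_DIR = Path("Price-CSV-Saves")
COLUMNS = ["Datetime", "Close", "High", "Low", "Open", "Volume"]


def cache_path(ticker: str, period: str, interval: str, cache_dir: Path = CACHE_DIR) -> Path:
    # Assumed naming scheme: one CSV per (ticker, period, interval).
    return cache_dir / f"{ticker.upper()}_{period}_{interval}.csv"


def load_or_download_data(
    ticker: str,
    period: str,
    interval: str,
    fetch: Callable[[str, str, str], pd.DataFrame],
    cache_dir: Path = CACHE_DIR,
) -> pd.DataFrame:
    """Return cached prices if a CSV exists, otherwise fetch and cache them."""
    path = cache_path(ticker, period, interval, cache_dir)
    if path.exists():
        # Cache hit: no network call at all.
        return pd.read_csv(path, parse_dates=["Datetime"])

    # Cache miss: fetch once, enforce the standard schema, then persist.
    frame = fetch(ticker, period, interval)[COLUMNS]
    cache_dir.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return frame
```

Passing the downloader in as a parameter keeps the caching logic testable without hitting the API; a thin wrapper around the real download call can be supplied in production.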
Robust Alignment:
- Computes spreads on inner join of both Close series
- Removes holes from holidays/missing bars
- Logs matrix shapes and early errors via Write_Log()
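The inner-join alignment can be sketched as follows. The helper name align_closes and the output column names Close_1 and Close_2 are hypothetical; only the Datetime and Close columns come from the schema above.

```python
import pandas as pd


def align_closes(prices_1: pd.DataFrame, prices_2: pd.DataFrame) -> pd.DataFrame:
    """Keep only timestamps where both legs have a Close value."""
    left = prices_1.set_index("Datetime")["Close"].rename("Close_1")
    right = prices_2.set_index("Datetime")["Close"].rename("Close_2")
    # join="inner" drops bars that exist for only one ticker;
    # dropna() removes bars that exist but carry a missing Close.
    return pd.concat([left, right], axis=1, join="inner").dropna()
```

Every downstream spread is then computed on rows that are guaranteed to have both prices, so holiday gaps in one leg cannot leak into the Z-scores.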
Trading-Days Awareness:
- get_trading_days() uses federal holiday calendar + business-day offset
- Measures real trading horizons for normalizing holding time
- Essential for accurate performance metrics
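A short sketch of how get_trading_days() could count sessions with pandas' federal holiday calendar. The signature and the integer return value are assumptions; the function in the app may return the dates themselves.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day offset that skips weekends and US federal holidays.
US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())


def get_trading_days(start, end) -> int:
    """Count business days between start and end, both inclusive."""
    return len(pd.date_range(start=start, end=end, freq=US_BUSINESS_DAY))
```

For example, get_trading_days("2024-07-01", "2024-07-05") counts four days because 4 July is skipped. The federal calendar is an approximation of the exchange calendar: Good Friday is a market holiday but not a federal one, while Columbus Day and Veterans Day are federal holidays on which US equity markets stay open.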
Value: This is production-style data plumbing. Cached data keeps iterations fast, avoids rate-limit drama, and every load is logged.
Statistical Engine: Real Cointegration Discipline
The heart of the system is statistical validation, which separates mere correlation from true cointegration:
Hedge Ratio via OLS:
- Hedged spread: Sₜ = Close₁ₜ - β Close₂ₜ
- Regression: Close₁ regressed on Close₂ using Ordinary Least Squares (OLS)
- Hedge ratio: β, taken from the regression results
- Visualisation: 5-period moving average of spreads to see drift vs. mean reversion
Stationarity:
- Engle-Granger cointegration: p-value, test statistic, and critical values, run in both directions
- ADF testing: On each price series and on the spread
- Unit root: ADF's null hypothesis is a unit root, so failing to reject it means the series is treated as non-stationary
Backtesting
A deterministic backtesting engine that shows clear visual entries and exits over past data:
Signal Generation:
- generate_trading_signals() tracks the Z-score series
- Clean entry/exit protocol with configurable thresholds
- Enter long spread when z < -entry; enter short spread when z > +entry
- Exit when |z| crosses back inside the exit band
- Tracks entry, exit, position over time
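The entry/exit protocol above amounts to a three-state machine (flat, long, short). A minimal sketch follows; the signature of generate_trading_signals(), the default thresholds, and the output column names are assumptions for illustration.

```python
import pandas as pd


def generate_trading_signals(z: pd.Series, entry: float = 2.0, exit_band: float = 0.5) -> pd.DataFrame:
    """Walk the Z-score series once and record position and events per bar."""
    position = 0  # 0 = flat, +1 = long spread, -1 = short spread
    rows = []
    for timestamp, value in z.items():
        event = ""
        if position == 0:
            if value < -entry:
                position, event = 1, "enter_long"
            elif value > entry:
                position, event = -1, "enter_short"
        elif abs(value) < exit_band:
            # Spread has reverted inside the exit band: close the trade.
            position, event = 0, "exit"
        rows.append((timestamp, value, position, event))
    frame = pd.DataFrame(rows, columns=["Datetime", "z", "position", "event"])
    return frame.set_index("Datetime")
```

Because the position depends only on the previous state and the current Z-score, the same inputs always produce the same trades. NaN Z-scores fail every comparison, so they never open or close a position.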
Configurable Engine:
- UI inputs for entry/exit thresholds
- Z-score source selection (naïve vs. hedged)
- Flexible parameter tuning
Visual Guides:
- Dual-axis price chart: Event markers (green triangle-up = buy, red triangle-down = sell, purple X = exit)
- Z-score chart: Entry/exit lines and zero line for clarity
- Stats summary: Total trade count + thresholds used for reproducibility
Value: The backtester is simple, intuitive, and debuggable, so every trade on the chart can be traced back to the rule that produced it.
"Pairs Analysis" Checks All in One Place
A comprehensive pre-trade quality assessment:
One-Click Analysis:
- Correlation: Sanity check for basic relationship
- Hedge ratio (β): OLS estimate
- ADF testing: On both legs and on the spread
- Cointegration: Both directions for robustness
Suitability Verdict:
- Combines correlation + cointegration + spread stationarity
- Go/no-go decision for pairs trading
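One way the verdict could combine these checks is sketched below. Everything specific here is an assumption: the PairChecks container, the 0.7 correlation floor, the 5% significance level, and the rule that cointegration must hold in both directions. The app's actual thresholds may differ.

```python
from dataclasses import dataclass


@dataclass
class PairChecks:
    correlation: float
    coint_pvalue_1_on_2: float
    coint_pvalue_2_on_1: float
    spread_adf_pvalue: float


def is_suitable(checks: PairChecks, min_corr: float = 0.7, alpha: float = 0.05) -> bool:
    """Go/no-go: correlated, cointegrated both ways, and a stationary spread."""
    correlated = abs(checks.correlation) >= min_corr
    # Assumed rule: require rejection of "no cointegration" in both directions.
    cointegrated = max(checks.coint_pvalue_1_on_2, checks.coint_pvalue_2_on_1) < alpha
    stationary_spread = checks.spread_adf_pvalue < alpha
    return correlated and cointegrated and stationary_spread
```

Requiring both directions is the stricter choice; it guards against a result that depends on which leg was picked as the dependent variable.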
Metadata / Fundamentals Cache for Comparison
Comprehensive fundamental analysis and comparison tools:
Data Fetcher:
- yfinance .info grouped into categories
- Overview, Valuation, Dividends, Liquidity, Earnings
Persistence:
- Materialise each category to CSV under Metadata-CSV-Saves/<TICKER>/<CATEGORY>.csv
- Structured data for easy access
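The on-disk layout can be sketched with the standard library alone. The save_metadata helper, the two-column Field/Value format, and the upper-casing of the ticker folder are assumptions; the field names in the usage example are illustrative placeholders rather than a list of what the app stores.

```python
import csv
from pathlib import Path


def save_metadata(ticker: str, categories: dict, root: Path = Path("Metadata-CSV-Saves")) -> list:
    """Write one CSV per category under <root>/<TICKER>/<CATEGORY>.csv."""
    folder = root / ticker.upper()
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for category, fields in categories.items():
        path = folder / f"{category}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["Field", "Value"])
            writer.writerows(fields.items())
        written.append(path)
    return written
```

A call such as save_metadata("ko", {"Dividends": {"dividendYield": 0.03}, "Valuation": {"trailingPE": 25.1}}) would create Metadata-CSV-Saves/KO/Dividends.csv and Metadata-CSV-Saves/KO/Valuation.csv, which the file manager can then list, refresh, or delete.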
User Interface:
- Category switcher + side-by-side cards for comparison
- File manager with refresh and delete actions
- Easy navigation between different fundamental aspects
Why it matters: Fundamentals context (dividend policy, liquidity, beta) often explains spread behavior or why cointegration holds/fails.
Visualisation: Clean, Informative
Professional-grade charts that provide actionable insights:
Three Core Charts:
- Dual-axis price: Both assets on same timeline
- Naïve spread: + 5-MA + Z-score standardisation
- Hedged spread: + 5-MA + Z-score standardisation
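The two overlays on each spread chart can be sketched as below. The helper name is hypothetical, and standardising against the full-sample mean and standard deviation is an assumption; a rolling window is an equally plausible choice.

```python
import pandas as pd


def spread_features(spread: pd.Series, ma_window: int = 5) -> pd.DataFrame:
    """Return the spread with its moving average and Z-score."""
    # Assumption: full-sample standardisation (pandas uses the sample std, ddof=1).
    z = (spread - spread.mean()) / spread.std()
    return pd.DataFrame({
        "spread": spread,
        "ma": spread.rolling(ma_window).mean(),  # NaN until the window fills
        "z": z,
    })
```

The same function serves both the naïve and the hedged spread, so the two charts stay directly comparable.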
Graph Selector:
- User toggles which charts render
- Customisable view based on analysis needs
Design Standards:
- Plotly dark template for professional appearance
- Consistent legends and axis titles
- Colour-coded dual-axis for clarity
Why This Project Has Real Value
This is not a static notebook. It's an interactive decision-making tool.
- Validates the pair statistically (cointegration + ADF)
- Generates deterministic signals
- Backtests them with clear visual attribution
- Sizes positions coherently with hedge ratio
- Saves everything so you can iterate fast without rate limits
This project represents the intersection of quant finance and software engineering.
For me, this was about building a research tool that I could actually use, one that supports my own trading decisions.
Project Gallery
Dual Securities Monitoring
Shows two securities on the same graph with dual y-axes, allowing real-time monitoring of price movements and spread behavior between them.
Z-Score & Rolling Average
Displays the hedged spread with Z-score standardisation and rolling averages, providing statistical context for mean-reversion analysis.
Backtesting Entry/Exit Signals
Visualises the backtesting results with z-score thresholds.