Project Overview
I wanted to move beyond just analysing secondary data and actually record my own. Since I'm obsessed with American Football, and the Combine was taking place, I took an interest in the 40-yard dash. The plan: build a system using everyday devices to measure sprint times accurately at home. I also wanted to study combine data to analyse performance.
The tech stack evolved fast:
- Started with Python + OpenCV for motion detection.
- Shifted to HTML/JS + PHP so external devices could access it via a hosted site on the internet.
- Linked cameras to detect motion, log timestamps, and push data into a remote MySQL database through a self-built API.
- Tried apps with Kotlin and Android Studio, but laptops proved more reliable (better time sync).
- Set up four laptops, 10 yards apart, with a countdown beep system.
- Later, ran Machine Learning models to predict missing splits and refine results.
This was essentially about breaking down the "simple" problem (timing a run) into proper software engineering and statistical analysis.
Technical Implementation
Phase 1: One Device, Motion Detection in Python
The goal here was to prove I could make a motion detector from scratch with Python.
Key steps:
- Used OpenCV (cv2) to grab frames, greyscale them, and apply Gaussian blur to smooth out noise like trees moving in the background.
- Built logic to compare frames against a static background; if enough pixels differed beyond the threshold, motion was flagged.
- Used time.perf_counter() for precise timestamps (Unix time wasn't reliable across devices).
- Added beep countdowns via winsound.Beep(), though lag meant “end correction” would need attention later.
- Solved hardware problems (camera focus) with LEGO contraptions that forced cameras to face straight ahead.
- Saved outputs to CSV initially, but flagged MySQL as necessary for scaling and data safety (to avoid issues like CSVs breaking from rogue commas).
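The core of the steps above can be sketched in a few lines. This is a minimal stand-in using NumPy rather than the original script, with illustrative threshold values, assuming frames have already been greyscaled and blurred:

```python
import time
import numpy as np

DIFF_THRESHOLD = 25     # per-pixel intensity change treated as "different" (illustrative)
MOTION_THRESHOLD = 500  # how many changed pixels count as motion (illustrative)

def motion_timestamp(background: np.ndarray, frame: np.ndarray):
    """Return a time.perf_counter() timestamp if `frame` differs enough
    from `background`. Both are 2-D greyscale uint8 frames that have
    already been Gaussian-blurred to suppress noise like swaying trees."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    changed = np.count_nonzero(diff > DIFF_THRESHOLD)
    if changed > MOTION_THRESHOLD:
        return time.perf_counter()
    return None

# A runner crossing the frame changes a block of pixels:
bg = np.zeros((120, 160), dtype=np.uint8)
runner = bg.copy()
runner[40:80, 60:100] = 200                      # 40x40 = 1600 changed pixels

assert motion_timestamp(bg, bg) is None          # static scene: no motion
assert motion_timestamp(bg, runner) is not None  # runner flagged with a timestamp
```

The two thresholds are where the noise control happens: the per-pixel one ignores sensor grain, while the pixel-count one ignores small localised changes like a branch moving.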
Learning outcome: Motion detection is not as simple as “spot a difference.” Computers need pixel-level rules and careful noise control. This set the foundation for scaling.
Phase 2: Multiple Devices and Early Web Approach
Now the challenge was scaling beyond one device. I tried a load of routes:
- Android App (Kotlin + XML + Gradle): a massive learning curve and permission nightmares led me to abandon it.
- Python Apps (Kivy + Buildozer on Ubuntu): wouldn't compile properly on my WSL system.
- Running Python scripts on phones (Pydroid/Termux): blocked libraries and weak hardware led to errors.
- Web route (HTML + JS): finally worked.
Main breakthroughs:
- Learned to call OpenCV via CDN on client-side JS.
- Used getUserMedia API to access cameras after permissions.
- Recreated Python logic: greyscale → Gaussian blur (21x21 kernel) → set static background → detected absolute differences.
- Output displayed via HTML canvases instead of .imshow().
- Had to shift from synchronous Python loops to asynchronous JS (setInterval).
- The motion detection loop compared each greyscaled frame against the static screenshot, counted the non-zero difference pixels, and logged a timestamp when motion was detected.
Lesson: JS required a total rethink - async coding, different syntax, DOM manipulation - but I finally managed to replicate the Python pipeline.
Phase 3: Hosting, Databases, and Sync
Time to make the system work across multiple machines.
Hosting attempts:
- IIS (Windows Server): ports/firewalls blocked external access.
- Python (Flask/FastAPI): couldn't get stable multi-device connections.
- Hostinger Web Hosting: the route that worked. HTTPS, password protection, and a secure remote MySQL database enabled the solution.
Solutions built:
- PHP scripts for writing to the DB, using prepared statements for SQL injection protection.
- Wrote a basic API: JS sends data via formatted URL → PHP reads via $_GET → data then inserted into the database.
- Tried to fix time disparity with Google's NTP servers, but phone hardware lagged. Eventually synced all laptops via the Windows Time service instead.
- Final setup: 4 laptops across 40 yards (1 every 10 yards). I synced their clocks, logging each split into the remote MySQL database.
- Hybrid Approach: one laptop also ran Python (for start + 10yd split), while the others ran the hosted JS site.
Testing: worked outdoors, but wind/tree motion sometimes triggered false positives. Had to hotspot all devices for network access in poor connection spots.
Big win: cracked multi-device motion detection + timestamping with web hosting, PHP, and Database integration.
Phase 4: Data Analysis with ML + Stats on Secondary Data
I now wanted to perform statistical analysis on NFL combine and 100m sprint data, then apply what I learned from modelling those datasets to fill the gaps in my own recorded sprint data.
Combine Data (2009-2019):
- Created and analysed scatterplot matrices; found that the 40yd dash, shuttle, and 3-cone are highly correlated.
- Inverse relationship between jumps (broad/vertical) and running drills.
- Wide Receivers (WRs) performed much better in the draft with lower 40 times.
Machine Learning - Random Forest Classifier:
- Encoded categorical variables and scaled values with MinMaxScaler.
- Achieved 75% accuracy on predicting draft status from combine metrics.
- Feature importances: 40yd dash = most important with 3-cone close behind. Age was the least relevant factor.
Sprint Science (100m Olympic splits):
- Analysed whether 10m acceleration or max velocity is a better predictor of overall time.
- Used Linear regression with error metrics (MAPE, MAE, Bias, R², NSE).
- Found final 100m time correlates better with max velocity than early acceleration.
- Regression accuracy with both features improved slightly, but the high correlation between them made the gains marginal.
Own Data Regression:
- Trained a Linear Regression model on local runs (10-40m splits).
- I then used this model to fill in missing data (e.g. 30m/40m splits) with ~1-2% error.
- Proved small-sample regression still works if trained on domain-similar data.
Takeaway: Speed metrics dominated draft potential. In sprint mechanics, top speed beats early acceleration for predicting overall performance. Regression allowed filling gaps in my own dataset accurately.
Final Word
This project turned from “timing a sprint at home” into a full-stack software engineering deep dive:
- Motion detection (OpenCV, Gaussian blur, pixel thresholds).
- Client/server architecture (HTML, JS, PHP, APIs, Databases).
- Networking & hosting (IIS, Flask, Hostinger, sync via NTP/Windows Time).
- Data science (linear regression, data visualisation, error metrics, feature importance).
This project forced me to think like an engineer: breaking problems down, testing in small increments, constant pivoting when something fails. I now know more about computer vision systems, asynchronous programming, server security, time protocols, and machine learning models.
And most importantly: I had fun timing my own 40yd dashes!
Project Gallery
Training Session
Debugging the motion detection system
Performance Recording
Video recording of personal 40-yard dash performance for testing, analysis and improvement