Project Overview
This project evolved from a simple computer science homework assignment into a comprehensive AI/ML pipeline. The game starts with 15 matchsticks; two players alternate taking 1-3 sticks per turn, and the player forced to take the last stick loses.
I built incremental versions (V1-V10), progressing from basic game logic to full AI analysis, ultimately training a Random Forest Classifier to predict the winner of the game based on different game states.
Technical Implementation
Phase 1: Game Logic & Mathematical Analysis
Developed core game mechanics (CPU vs CPU, CPU vs human) and proved the optimal strategy mathematically:
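- Mathematical proof: The formula T_n = 4n - 3 (n = 1, 2, 3, ...) identifies the losing positions (1, 5, 9, 13, ...), i.e. the stick counts at which the player to move loses against perfect play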
- Strategy implementation: Optimal CPU player wins ~98.8% against random opponents
- Probability analysis: A player that follows the optimal strategy on only 50% of its moves wins ≈ 71.5% of games; at ⅓ of its moves, ≈ 58.4%
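The strategy above can be sketched in a few lines of Python. This is an illustrative sketch rather than the project's actual code: the names `is_losing`, `optimal_move`, `random_move`, `play_game`, and `win_rate` are hypothetical, and it assumes the optimal player takes a single stick whenever it is stuck on a losing position.

```python
import random

def is_losing(sticks):
    """True when the player to move loses under perfect play (sticks = 4n - 3)."""
    return sticks % 4 == 1

def optimal_move(sticks, rng=None):
    """Leave the opponent on a losing position (1, 5, 9, 13) whenever possible."""
    take = (sticks - 1) % 4
    # take == 0 means we are already on a losing position; assumption: take 1.
    return take if take != 0 else 1

def random_move(sticks, rng):
    """Pick uniformly among the legal moves."""
    return rng.randint(1, min(3, sticks))

def play_game(policies, rng, sticks=15):
    """Return the index (0 or 1) of the winner; taking the last stick loses."""
    player = 0
    while True:
        sticks -= policies[player](sticks, rng)
        if sticks == 0:
            return 1 - player  # the current player took the last stick
        player = 1 - player

def win_rate(games, optimal_first, seed=0):
    """Fraction of games won by the optimal player against a random one."""
    rng = random.Random(seed)
    if optimal_first:
        policies, optimal_index = (optimal_move, random_move), 0
    else:
        policies, optimal_index = (random_move, optimal_move), 1
    wins = sum(play_game(policies, rng) == optimal_index for _ in range(games))
    return wins / games
```

Under these assumptions, an optimal player who moves first from 15 sticks never loses. One who moves second loses only if the random opponent happens to leave 13, 9, 5, and 1 sticks in turn, which has probability (1/3)^4 = 1/81. That gives a win rate of 80/81 ≈ 98.8%, consistent with the figure quoted above.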
Phase 2: Data Engineering Pipeline
Scaled from CSV storage to MySQL database architecture:
- Database integration: Used mysql.connector with Pandas dataframes for efficient data handling
- Data generation: Automated simulation of 1 million+ games for ML training
- Readability considerations: Moving data handling into Pandas and MySQL produced cleaner data and, in turn, simpler and more readable code
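A minimal sketch of the generate-and-store step might look like the following. It is not the project's pipeline: to stay self-contained it swaps in Python's built-in `sqlite3` for MySQL and skips the Pandas layer, both players move randomly, and the table layout (`games`, `moves`, `winner`) and function names are hypothetical.

```python
import random
import sqlite3

def simulate_game(rng, sticks=15):
    """Play one random-vs-random game; return (moves, winner index)."""
    moves, player = [], 0
    while True:
        take = rng.randint(1, min(3, sticks))
        moves.append(take)
        sticks -= take
        if sticks == 0:
            return moves, 1 - player  # taking the last stick loses
        player = 1 - player

def build_dataset(n_games, seed=0):
    """Simulate n_games and store them in an in-memory SQLite database."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games ("
        "id INTEGER PRIMARY KEY, moves TEXT NOT NULL, winner INTEGER NOT NULL)"
    )
    rows = []
    for _ in range(n_games):
        moves, winner = simulate_game(rng)
        rows.append((",".join(map(str, moves)), winner))
    # sqlite3 uses "?" placeholders; mysql.connector would use "%s" instead.
    conn.executemany("INSERT INTO games (moves, winner) VALUES (?, ?)", rows)
    conn.commit()
    return conn

conn = build_dataset(1000)
total, first_player_wins = conn.execute(
    "SELECT COUNT(*), SUM(winner = 0) FROM games"
).fetchone()
```

Batching the inserts with `executemany` keeps the database round trips low, which matters once the simulation is scaled to a million or more games.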
Phase 3: Machine Learning Implementation
Implemented scikit-learn Random Forest Classifier for game outcome prediction:
- ML pipeline: 80/20 train-test split, feature engineering from game move sequences
- Model architecture: Random Forest chosen for noise robustness and interpretability
- Visualization: Used Graphviz & sklearn.tree for decision tree analysis
- Performance: 99.9-100% accuracy on games played with fixed strategies, >99% on the 1M-game dataset
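The training steps listed above can be sketched as follows. This is a simplified illustration, not the project's model: the two features (`sticks_left`, `player_to_move`) are hypothetical, and the labels come directly from the perfect-play rule rather than from simulated move sequences, so the task is much easier than the one described in this section.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import export_graphviz

# Assumption: each sample is a game state, labelled with the index of the
# player who wins if both sides play perfectly from that state.
rng = random.Random(0)
X, y = [], []
for _ in range(2000):
    sticks = rng.randint(1, 15)
    player = rng.randint(0, 1)
    mover_wins = sticks % 4 != 1  # 1, 5, 9, 13 are losing for the mover
    X.append([sticks, player])
    y.append(player if mover_wins else 1 - player)

# 80/20 train-test split, as in the pipeline described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Export one tree of the forest as Graphviz DOT source for inspection.
dot_source = export_graphviz(
    model.estimators_[0],
    out_file=None,
    feature_names=["sticks_left", "player_to_move"],
    class_names=["player 0 wins", "player 1 wins"],
)
```

The `dot_source` string can be rendered with the Graphviz `dot` tool to inspect how a single tree separates winning from losing states.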
Technical Achievements
This project demonstrates a complete data science workflow:
- Game theory & probability: Mathematical proof of optimal strategy
- Simulation engineering: Millions of automated games with statistical analysis
- Data engineering: CSV → MySQL migration, Pandas workflows, scalable architectures
- Machine learning: Random Forest implementation, model interpretation, decision tree visualization
- Data visualization: Matplotlib for probability distributions and Graphviz for decision tree visualization