Project Overview

This project grew from a simple computer science homework assignment into a complete AI/ML pipeline. The game starts with 15 matchsticks; players alternate taking 1-3 sticks per turn, and the player forced to take the last stick loses.

I built ten incremental versions (V1-V10), progressing from basic game logic to full AI analysis and ultimately training a Random Forest Classifier to predict the winner from a given game state.

Technical Implementation

Phase 1: Game Logic & Mathematical Analysis

Developed core game mechanics (CPU vs CPU, CPU vs human) and proved the optimal strategy mathematically:

  • Mathematical proof: The formula Tn = 4n - 3 (n = 1, 2, 3, ...) gives the positions that are lost for the player about to move (1, 5, 9, 13, ...)
  • Strategy implementation: Optimal CPU player wins ~98.8% of games against a random opponent
  • Probability analysis: Playing the optimal move 50% of the time gives a ≈ 71.5% win rate; playing it ⅓ of the time gives ≈ 58.4%
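
The strategy and the random-opponent win rate above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and it assumes the random opponent moves first, picks uniformly among legal moves, and that the optimal player takes one stick when it has no winning move. Under those assumptions the exact win probability is 80/81 (about 98.77%), in line with the ~98.8% figure.

```python
from fractions import Fraction
from functools import lru_cache

MAX_TAKE = 3  # a player may take 1, 2 or 3 sticks per turn


def is_losing(sticks: int) -> bool:
    """True if the player to move loses under perfect play (1, 5, 9, 13, ...)."""
    return sticks % 4 == 1


def optimal_move(sticks: int) -> int:
    """Leave the opponent on a losing position if possible, else take 1 (assumed fallback)."""
    for take in range(1, min(MAX_TAKE, sticks) + 1):
        if is_losing(sticks - take):
            return take
    return 1


@lru_cache(maxsize=None)
def optimal_win_prob(sticks: int, optimal_to_move: bool) -> Fraction:
    """Exact probability that the optimal player beats a uniformly random opponent."""
    if sticks == 0:
        # Whoever just moved took the last stick and lost,
        # so the player now "to move" is the winner.
        return Fraction(1) if optimal_to_move else Fraction(0)
    if optimal_to_move:
        return optimal_win_prob(sticks - optimal_move(sticks), False)
    # Random opponent: average over all legal moves.
    moves = range(1, min(MAX_TAKE, sticks) + 1)
    return sum(optimal_win_prob(sticks - t, True) for t in moves) / len(moves)


# Random opponent moves first from the 15-stick starting position.
p = optimal_win_prob(15, False)
print(p, float(p))
```

If the optimal player moves first instead, it can always leave 13 sticks and the same function returns a win probability of exactly 1.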

Phase 2: Data Engineering Pipeline

Scaled from CSV storage to MySQL database architecture:

  • Database integration: Used mysql.connector with Pandas dataframes for efficient data handling
  • Data generation: Automated simulation of 1 million+ games for ML training
  • Readability considerations: Handling data through Pandas and MySQL produced cleaner data and, in turn, more readable code
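
A rough sketch of the simulate-then-store workflow is shown below. It is an assumption-laden stand-in rather than the project's pipeline: the table name, column names, and the play_random_game helper are hypothetical, both players move randomly, only 10,000 games are generated, and Python's built-in sqlite3 replaces MySQL so the example runs without a database server.

```python
import random
import sqlite3

import pandas as pd


def play_random_game(rng: random.Random, sticks: int = 15) -> dict:
    """Simulate one game between two random players; taking the last stick loses."""
    moves = []
    player = 0  # player 0 moves first
    loser = None
    while sticks > 0:
        take = rng.randint(1, min(3, sticks))
        moves.append(take)
        sticks -= take
        loser = player  # overwritten each turn; final value is whoever took the last stick
        player = 1 - player
    return {
        "moves": "".join(map(str, moves)),  # e.g. "3212..." (one digit per move)
        "n_moves": len(moves),
        "winner": 1 - loser,
    }


rng = random.Random(0)
games = pd.DataFrame([play_random_game(rng) for _ in range(10_000)])

# sqlite3 stands in for MySQL here (assumption made so the sketch is self-contained).
conn = sqlite3.connect(":memory:")
games.to_sql("games", conn, index=False, if_exists="replace")

# Read an aggregate back into a DataFrame, as one would from the MySQL table.
summary = pd.read_sql("SELECT winner, COUNT(*) AS n FROM games GROUP BY winner", conn)
conn.close()
print(summary)
```

With a real MySQL server, the connection object would come from mysql.connector (or an SQLAlchemy engine) instead of sqlite3; the DataFrame side of the code stays the same.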

Phase 3: Machine Learning Implementation

Implemented scikit-learn Random Forest Classifier for game outcome prediction:

  • ML pipeline: 80/20 train-test split, feature engineering from game move sequences
  • Model architecture: Random Forest chosen for noise robustness and interpretability
  • Visualization: Used Graphviz & sklearn.tree for decision tree analysis
  • Performance: 99.9-100% accuracy on fixed strategies, >99% on the 1M-game dataset
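
The pipeline steps above can be illustrated with a small, self-contained sketch. The feature encoding here is a simplifying assumption of mine: instead of features built from move sequences, each sample is just (sticks remaining, player to move), labelled with the winner under optimal play by both sides. The variable names and hyperparameters are likewise illustrative, not the project's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import export_graphviz

rng = np.random.default_rng(0)
n_samples = 5_000

# Hypothetical features: sticks remaining (1-15) and the player about to move (0 or 1).
sticks = rng.integers(1, 16, size=n_samples)
to_move = rng.integers(0, 2, size=n_samples)

# Label: the winner if both sides play optimally from this state.
# The player to move loses exactly when sticks % 4 == 1.
winner = np.where(sticks % 4 == 1, 1 - to_move, to_move)

X = np.column_stack([sticks, to_move])
X_train, X_test, y_train, y_test = train_test_split(
    X, winner, test_size=0.2, random_state=0  # 80/20 split
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")

# Export one tree of the forest as Graphviz DOT source for inspection.
dot_source = export_graphviz(
    model.estimators_[0],
    out_file=None,  # return the DOT text instead of writing a file
    feature_names=["sticks", "to_move"],
    class_names=["player 0", "player 1"],
    filled=True,
)
```

The DOT text in dot_source can be rendered with the Graphviz command-line tools or the graphviz Python package to view a single tree's splits.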

Technical Achievements

This project demonstrates a complete data science workflow:

  • Game theory & probability: Mathematical proof of optimal strategy
  • Simulation engineering: Millions of automated games with statistical analysis
  • Data engineering: CSV → MySQL migration, Pandas workflows, scalable architectures
  • Machine learning: Random Forest implementation and model interpretation
  • Data visualization: Matplotlib for probability distributions, Graphviz for decision trees