aaronson-oracle-baseball

  • Metadata

  • Adapting Aaronson’s Oracle to predict baseball pitches for a complete game

  • Aaronson’s Oracle

    • Algorithm that predicts the user's next keystroke (one of two keys) by tracking the last 5 keystrokes, which form one of 2^5 = 32 possible sequences
    • For each 5-keystroke sequence, it records which keystroke tends to follow it
    • So when the user starts to type a sequence it has seen before, it guesses the most frequent follower as the next keystroke
    • It is about 70% accurate against humans trying to type randomly
    • A possible enhancement: track several n-gram lengths at once and use the prediction from the most confident one
  • Pseudo-code

    • initialize a dict mapping each 5-gram combination to a dict counting how many times each keystroke has followed it
    • maintain a rolling 5-gram window over the user's input
    • look up the most probable next keystroke; return it, or fall back to a random choice if there are no stored counts
    • update the dict with the user's actual input
    • track rolling accuracy each time a prediction is made
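
The steps above, as a minimal Python sketch (class and method names are my own; assumes a two-key f/d input like the original demo):

```python
import random
from collections import defaultdict, deque

class Oracle:
    """Aaronson-style oracle sketch: count which symbol follows each
    rolling n-gram and predict the most frequent follower."""

    def __init__(self, n=5, symbols=("f", "d"), seed=0):
        self.n = n
        self.symbols = symbols
        self.counts = defaultdict(lambda: defaultdict(int))  # n-gram -> {symbol: count}
        self.history = deque(maxlen=n)
        self.rng = random.Random(seed)
        self.correct = 0
        self.total = 0

    def predict(self):
        key = tuple(self.history)
        followers = self.counts.get(key)
        if len(self.history) < self.n or not followers:
            return self.rng.choice(self.symbols)  # no stored counts: guess by chance
        return max(followers, key=followers.get)  # most frequent follower

    def observe(self, symbol):
        prediction = self.predict()        # predict before seeing the input
        self.total += 1
        self.correct += (prediction == symbol)
        if len(self.history) == self.n:    # update counts for the current 5-gram
            self.counts[tuple(self.history)][symbol] += 1
        self.history.append(symbol)        # roll the window forward
        return prediction

    @property
    def accuracy(self):
        return self.correct / self.total if self.total else 0.0
```

On any patterned input (e.g. strict alternation), accuracy climbs well above chance once each 5-gram has been seen once.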

2026-01-01: performance

The existing performance of the models was:

Model                      Overall  Fast    Breaking  Off-Speed  N Pitches
Naive (Always Fast)        0.5643   1.0000  0.0000    0.0000     24,917
N-Gram (n=3)               0.5335   0.7795  0.2496    0.1121     24,917
N-Gram (n=4)               0.5376   0.8223  0.1977    0.0837     24,917
Frequency-Based (Oracle)   0.4834   0.5979  0.3775    0.2104     24,917
Markov Context             0.5552   0.9493  0.0540    0.0175     24,917
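
The per-class columns read as per-class recall, so each model's overall accuracy should be the class-share-weighted sum of its recalls. The naive row pins the fastball share at 0.5643, and a back-of-envelope solve (assuming that interpretation) recovers the rest of the pitch mix and reproduces the other rows:

```python
# Check: overall = sum(share_c * recall_c), assuming the Fast/Breaking/
# Off-Speed columns are per-class recall.
rows = {
    "naive":  (0.5643, 1.0000, 0.0000, 0.0000),
    "ngram3": (0.5335, 0.7795, 0.2496, 0.1121),
    "ngram4": (0.5376, 0.8223, 0.1977, 0.0837),
    "oracle": (0.4834, 0.5979, 0.3775, 0.2104),
    "markov": (0.5552, 0.9493, 0.0540, 0.0175),
}

fast = rows["naive"][0]   # naive always predicts fast, so its overall = fast share
rest = 1.0 - fast
# Solve one row for the breaking share: overall = fast*rf + b*rb + (rest - b)*ro
o, rf, rb, ro = rows["ngram3"]
breaking = (o - fast * rf - rest * ro) / (rb - ro)
offspeed = rest - breaking
print(f"shares: fast={fast:.3f} breaking={breaking:.3f} off-speed={offspeed:.3f}")

# The recovered mix should reproduce every row's overall accuracy
for name, (o, rf, rb, ro) in rows.items():
    pred = fast * rf + breaking * rb + offspeed * ro
    assert abs(pred - o) < 0.001, (name, pred, o)
```

The recovered mix (~56% fastballs, ~33% breaking, ~11% off-speed) is consistent across all five rows, which also explains why the always-fast baseline is so hard to beat on overall accuracy.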

2026-03-15: autoresearch

Leveraging Andrej Karpathy's autoresearch loop to discover better models for predicting pitch sequencing. After running overnight (with my MacBook dying halfway through), it completed 35 experiments and discovered a model with 0.61 accuracy on the evaluation dataset: a 6-layer transformer with 4 heads, a 128-dim model, an FFN hidden dim of 6x the model dim, and some regularization. It also found that other architectures like GBMs or LSTMs performed much worse.
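
A rough back-of-envelope parameter count for that architecture (my own estimate, ignoring biases, layer norms, and embeddings):

```python
# 6-layer transformer, d_model=128, FFN hidden = 6 * d_model
d_model, n_layers, ffn_mult = 128, 6, 6

attn = 4 * d_model**2            # Q, K, V, and output projections
ffn = 2 * ffn_mult * d_model**2  # up- and down-projection
per_layer = attn + ffn
total = n_layers * per_layer
print(f"~{total / 1e6:.1f}M parameters")  # → ~1.6M
```

So the discovered model is tiny (under 2M weights), which seems proportionate to a ~25k-pitch dataset.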

Side note on how it works -> autoresearch

#data-science #projects