real games

We finally get some real California Cup games back this week with the LA and Bay Area rivalry games. I also added some more “real game” sauce to the Monte Carlo sampled margin predictor. Instead of sampling only margins, I checked in a fixed array representing 10 years of actual game scores from MCC games. So instead of faking a score completely, or using a baseline and only adding a margin, our random predictor now draws real game scores. (These scores are naturally biased toward the home team winning at exactly the percentage we want to mimic, so the net effect should be as good as flipping the 55% coin.)
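
As a rough sketch of what drawing from real scores looks like (not the actual code in mcc_schedule.py): the names `SAMPLE_SCORES`, `sample_score`, and `home_win_rate` are made up for illustration, and the score pairs are placeholders, not the real 10-year MCC array. The placeholders are chosen so that 11 of 20 are home wins, matching the 55% coin.

```python
import random

# Placeholder (away, home) final scores standing in for the real 10-year
# MCC array, which is not shown in this post. 11 of these 20 are home wins.
SAMPLE_SCORES = [
    (17, 24), (35, 28), (10, 41), (27, 30), (31, 14),
    (20, 23), (45, 38), (7, 34), (24, 21), (13, 27),
    (28, 31), (38, 17), (14, 20), (21, 35), (30, 27),
    (3, 16), (33, 36), (42, 24), (26, 23), (22, 19),
]


def sample_score(rng):
    """Draw one historical (away, home) score pair uniformly at random."""
    return rng.choice(SAMPLE_SCORES)


def home_win_rate(scores):
    """Fraction of games in `scores` won by the home team."""
    return sum(home > away for away, home in scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(2021)  # seeded only so the demo repeats
    away, home = sample_score(rng)
    print(f"sampled score: away {away}, home {home}")
    print(f"home win rate in the table: {home_win_rate(SAMPLE_SCORES):.0%}")
```

Because each draw is a whole score pair, the home-field bias and the spread of margins both come along for free. No separate coin flip or margin distribution is needed.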

This doesn’t meaningfully change the Monte Carlo simulation for the California data, but it does produce an interesting result when I sanity-check it against the Texas data set:

$ python3 ./mcc_schedule.py  -v
Texas Tech 38 at Houston 21 on Sep 04, 2021
Baylor 29 at Texas State 22 on Sep 04, 2021
Houston 44 at Rice 7 on Sep 11, 2021
North Texas 12 at SMU 35 on Sep 11, 2021
Rice 0 at Texas 58 on Sep 18, 2021
SMU 42 at TCU 34 on Sep 25, 2021
Texas Tech 35 at Texas 70 on Sep 25, 2021
Texas 32 at TCU 27 on Oct 02, 2021
TCU 52 at Texas Tech 31 on Oct 09, 2021
Rice 0 at UT San Antonio 45 on Oct 16, 2021
Texas 24 at Baylor 31 on Oct 30, 2021
North Texas 30 at Rice 24 on Oct 30, 2021
SMU 37 at Houston 44 on Oct 30, 2021
Baylor 28 at TCU 30 on Nov 06, 2021
UT San Antonio 44 at UTEP 23 on Nov 06, 2021
UTEP 17 at North Texas 20 on Nov 13, 2021
Rice at UTEP on Nov 20, 2021
Texas Tech at Baylor on Nov 26, 2021
UT San Antonio at North Texas on Nov 27, 2021

Full Enumeration Simulation:
Baylor 2 [25%]
UT San Antonio 4 [50%]
Texas 2 [25%]

Monte Carlo [Sampled Home Margin Predictor] Simulation:
Baylor 2938 [29%]
UT San Antonio 4444 [44%]
Texas 2455 [24%]
North Texas 162 [1%]

UT San Antonio          2-0
Texas                   3-1
North Texas             2-1
SMU                     2-1
Baylor                  2-1
Houston                 2-1
TCU                     2-2
Texas Tech              1-2
Texas State             0-1
UTEP                    0-2
Rice                    0-4

There are only three games remaining, so the full enumeration has only 8 possibilities, none of which involve North Texas winning the title. But the Monte Carlo run shows UNT winning in 1% of outcomes. Is this a bug? I dumped out detailed results for the trials where that 1% hit, and each random projection shows North Texas beating UTSA by a massive (though historically possible) margin in the same timeline where Baylor’s win over Texas Tech is quite modest. In that case one of the tiebreakers on margins would favor North Texas.

This is a nice side effect of using real data for the predictor. As we discussed, the “full” enumerator only checks two outcomes for each game: a win and a loss by the same (large) margin. When your tiebreaker algorithm relies on margins, there are going to be corner cases that aren’t tested with only two outcomes. The 10,000-trial randomizer leaks in just enough “weird” scores to tease out an actual possibility that was hidden from the enumerator. This does raise the possibility that our recursive tree-walk should actually branch out four paths from each node: a close win and a blowout win for each team.
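
To make the branching argument concrete, here is a minimal sketch of a two-way versus four-way tree-walk over the three remaining games. It is not the enumerator in mcc_schedule.py: the `CLOSE` and `BLOWOUT` margins, the `walk` function, and the `unt_corner_case` check are all hypothetical, and the real tiebreaker is not applied. The sketch only counts whether the "North Texas routs UTSA while Baylor barely wins" combination is ever visited.

```python
# Remaining games as (away, home), taken from the schedule output above.
REMAINING = [
    ("Rice", "UTEP"),
    ("Texas Tech", "Baylor"),
    ("UT San Antonio", "North Texas"),
]

# Hypothetical home-minus-away margins; the post does not give real values.
CLOSE, BLOWOUT = 3, 28
TWO_WAY = (BLOWOUT, -BLOWOUT)                  # one win, one loss per game
FOUR_WAY = (CLOSE, BLOWOUT, -CLOSE, -BLOWOUT)  # close and blowout for each side


def walk(games, margins, chosen=()):
    """Recursively yield every scenario as a dict of game -> home margin."""
    if len(chosen) == len(games):
        yield dict(zip(games, chosen))
        return
    for margin in margins:
        yield from walk(games, margins, chosen + (margin,))


def unt_corner_case(scenario):
    """True when North Texas routs UTSA while Baylor only edges Texas Tech."""
    return (scenario[("UT San Antonio", "North Texas")] == BLOWOUT
            and scenario[("Texas Tech", "Baylor")] == CLOSE)


if __name__ == "__main__":
    for label, margins in (("two-way", TWO_WAY), ("four-way", FOUR_WAY)):
        leaves = list(walk(REMAINING, margins))
        hits = sum(unt_corner_case(s) for s in leaves)
        print(label, len(leaves), "leaves,", hits, "with the UNT corner case")
```

With two branches per game the walk visits 2^3 = 8 leaves and never pairs a blowout with a close win. With four branches it visits 4^3 = 64 leaves, which is still tiny, and the mixed-margin scenarios show up. Whether they actually flip the title depends on the tiebreaker rules, which this sketch leaves out.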