2022 forecasting – Mythical California Cup

The 2022 schedule is up in the cfbd API so let’s try a run:

$ python3 ./mcc_schedule.py -v
USC at Stanford on Sep 09, 2022
Fresno State at USC on Sep 16, 2022
San José State at Fresno State on Oct 14, 2022
San Diego State at Fresno State on Oct 28, 2022
Stanford at UCLA on Oct 28, 2022
California at USC on Nov 04, 2022
San José State at San Diego State on Nov 11, 2022
Stanford at California on Nov 18, 2022
USC at UCLA on Nov 18, 2022
UCLA at California on Nov 24, 2022

Full Enumeration Simulation:
USC 152 [14%]
San Diego State 144 [14%]
San José State 144 [14%]
California 128 [12%]
UCLA 124 [12%]
Stanford 124 [12%]
Fresno State 116 [11%]
No Winner 92 [8%]

Monte Carlo [Sampled Home Margin Predictor] Simulation:
Fresno State 1588 [15%]
California 1575 [15%]
USC 1561 [15%]
UCLA 1509 [15%]
San Diego State 1361 [13%]
Stanford 1186 [11%]
San José State 1119 [11%]
No Winner 101 [1%]

Monte Carlo [Elo Predictor] Simulation:
USC 1621 [16%]
California 1565 [15%]
UCLA 1523 [15%]
Fresno State 1441 [14%]
San Diego State 1346 [13%]
Stanford 1251 [12%]
San José State 1153 [11%]
No Winner 100 [1%]

There are no standings, possibly because no games were completed.
2022, 10, ,

Nice performance out of the box. These are the expected games I identified “by hand” in our earlier 2022 simulation exercise. The Monte Carlo sampled home margin predictor likes Fresno State and Cal for their extra home games. One unfortunate thing about this schedule is that USC/Fresno is the only P5/G5 crossover game. The whole exercise risks fragmentation if the schools don’t schedule each other a little more.
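As a sanity check on the full enumeration numbers, here is a minimal sketch (not the actual mcc_schedule.py code) that walks every win/loss combination for the 10 games listed above. The names GAMES, REPORTED, and enumerate_outcomes are made up for illustration. The counts in REPORTED are copied from the output above.

```python
from collections import Counter
from itertools import product

# (away, home) pairs copied from the schedule printed above.
GAMES = [
    ("USC", "Stanford"),
    ("Fresno State", "USC"),
    ("San José State", "Fresno State"),
    ("San Diego State", "Fresno State"),
    ("Stanford", "UCLA"),
    ("California", "USC"),
    ("San José State", "San Diego State"),
    ("Stanford", "California"),
    ("USC", "UCLA"),
    ("UCLA", "California"),
]

# Title counts from the "Full Enumeration Simulation" output above.
REPORTED = {
    "USC": 152, "San Diego State": 144, "San José State": 144,
    "California": 128, "UCLA": 124, "Stanford": 124,
    "Fresno State": 116, "No Winner": 92,
}

def enumerate_outcomes(games):
    """Yield a Counter of wins per team for every possible set of results."""
    for picks in product((0, 1), repeat=len(games)):
        # picks[i] == 0 means the away team wins game i, 1 means the home team.
        yield Counter(game[pick] for game, pick in zip(games, picks))

total = sum(1 for _ in enumerate_outcomes(GAMES))
print(total)                   # 1024, i.e. 2**10
print(sum(REPORTED.values()))  # 1024, so the table covers every outcome
```

Ten games with two possible results each give 2^10 = 1024 outcomes, which is why the full enumeration counts add up to exactly 1024.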

The Monte Carlo Elo Predictor looks suspiciously evenly spread. How could Stanford win 12% of all simulations? I dug around a little and realized that cfbd hasn’t populated Elo at all for 2022. Their web UI tool for direct queries returns an empty set for 2022 Elo. Looking into the code as I have it and…

    def predict_game(self, game):
        if game.home_team not in self.elo_dict:
            home_elo = 1400
        else:
            home_elo_entry = self.elo_dict[game.home_team]
            home_elo = home_elo_entry.elo
        if game.away_team not in self.elo_dict:
            away_elo = 1400
        else:
            away_elo_entry = self.elo_dict[game.away_team]
            away_elo = away_elo_entry.elo

If a team has no rating, we silently fall back to the Elo midpoint, 1400, so with no 2022 data every team ends up with the same rating. That looks like a bug. If we change the code to look at last year’s Elos we get a totally different run:

Monte Carlo [Elo Predictor] Simulation:
UCLA 4781 [47%]
Fresno State 3347 [33%]
USC 720 [7%]
California 652 [6%]
San Diego State 464 [4%]
San José State 13 [0%]
Stanford 9 [0%]
No Winner 14 [0%]

Now the have-nots are really shaken to the bottom. As a prediction this still has problems. We all know USC massively improved in the off-season, but for now we have no way to capture that algorithmically. UCLA’s big advantage comes from playing two apparent doormats by last year’s Elo (USC and Stanford) at home. Their one road game, at Cal, is essentially the bottleneck that decides whether they win the title.
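Going back to the fallback fix: a minimal sketch of what "look at last year’s Elos" could mean, with a loud warning instead of a silent default. This is not the code in my predictor class. The names lookup_elo and home_win_probability are hypothetical, the ratings are plain dicts of team name to number instead of cfbd entries, and the probability function assumes the standard Elo expected-score formula with no home-field bonus.

```python
import warnings

DEFAULT_ELO = 1400  # same default the snippet above falls back to

def lookup_elo(team, current, previous, default=DEFAULT_ELO):
    """Return (rating, source), preferring this season, then last season."""
    if team in current:
        return current[team], "current"
    if team in previous:
        return previous[team], "previous"
    # Last resort: still use the default, but say so out loud.
    warnings.warn(f"No Elo rating for {team}; using default {default}")
    return default, "default"

def home_win_probability(home_elo, away_elo):
    """Standard Elo expected score for the home side (assumed, no home bonus)."""
    return 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / 400))

# With nothing populated for the season, both sides get the default rating
# and the game is a coin flip under this formula.
home, _ = lookup_elo("Stanford", {}, {})
away, _ = lookup_elo("USC", {}, {})
print(home_win_probability(home, away))  # 0.5
```

Under those assumptions, two teams stuck on the 1400 default are rated identically and every game is a toss-up, which would explain how evenly the first Elo run was spread.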

What about those “No Winner” outcomes? Most of them look something like this:

Stanford                2-1
Fresno State            2-1
UCLA                    2-1
California              2-1
San Diego State         1-1
USC                     1-3
San José State          0-2

Extremely unlikely, but our strict policy of refusing to break 4-way ties means an outcome like this ends with no winner.
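To make the 4-way tie concrete, here is a small sketch that finds the teams sharing the best record in the standings above. It assumes teams are ranked by win percentage, which matches the order shown but is my simplification here. The helper name leaders is made up for this example, and tie-breaking for smaller ties is not covered.

```python
from fractions import Fraction

# (wins, losses) copied from the example standings above.
standings = {
    "Stanford": (2, 1),
    "Fresno State": (2, 1),
    "UCLA": (2, 1),
    "California": (2, 1),
    "San Diego State": (1, 1),
    "USC": (1, 3),
    "San José State": (0, 2),
}

def leaders(records):
    """Return every team tied for the best win percentage."""
    def win_pct(team):
        wins, losses = records[team]
        games = wins + losses
        # Fraction keeps comparisons exact; a team with no games counts as 0.
        return Fraction(wins, games) if games else Fraction(0)

    best = max(win_pct(team) for team in records)
    return [team for team in records if win_pct(team) == best]

tied = leaders(standings)
print(tied)  # ['Stanford', 'Fresno State', 'UCLA', 'California']
print("No Winner" if len(tied) >= 4 else tied)  # No Winner
```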

We’re only 4 months away from actual games, and the testing harness and usability improvements aren’t done. Those are still next up.