algorithmic tie-breaking

As we identified last month, our worst bug right now is the handling of multi-way ties in the final standings, so let's dive in with some new code. First, in this commit we get rid of the false positive and identify the ties we are not handling, so that we actually fail when we fail. No more silently pretending everything is fine when we can't actually evaluate a champion. The old break_ties is renamed to the more accurate break_two_team_tie.
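A minimal sketch of the fail-loud shape of that commit. The exception name and signatures here are illustrative assumptions, not the project's actual API:

```python
class UnhandledTieError(Exception):
    """Raised when a tie can't be evaluated, instead of silently guessing."""

def break_two_team_tie(a, b, season):
    # Stand-in for the real exhaustive two-team tiebreak cascade.
    ...

def break_ties(tied, season):
    # Only the two-team case is actually handled; everything else is a
    # loud, honest failure rather than a false positive.
    if len(tied) == 2:
        return break_two_team_tie(tied[0], tied[1], season)
    raise UnhandledTieError(f"unhandled {len(tied)}-way tie: {tied}")
```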

Then with this commit we actually deal with three-team ties and move four-plus-team ties into well-handled failure.

My initial instinct was that we could isolate the two-team-tie algorithm and simply run it for all three dyads in the three-team scenario. The problem is that our series of tiebreaks has a priority order, so in a three-dyad check we need to do a "shallow" sweep: we want every higher-priority check to complete across all dyads before a lower-priority one is invoked.

Specifically: the most important tiebreak is head-to-head record; the least important is overall scoring margin. If three teams A, B, and C are all tied at 3-1, we want to check every dyad for head-to-head before diving down to gross margin. If B's one loss was to C, that's an immediate short-circuit and B is removed from consideration. If we instead did a "depth-first" comparison, we would risk improperly eliminating A on the A-B dyad check because of gross margin. Because the three-team check fails fast on any resolution of any check on any dyad, there is an element of randomness in which team is eliminated first.

This is not a bug, because the remaining two teams must still pass through the exhaustive two-team tiebreak. The three-team check is picking a loser, not a winner. In other words, if A, B, and C all have 3-1 records but A beat B and B beat C, we might eliminate B first and fall out of the three-team check. When the A-C dyad is evaluated as the final check, C's loss to B will show up in the common-opponent margin and produce A as the winner.
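The shallow sweep described above can be sketched as a breadth-first loop: iterate checks in priority order on the outside, dyads on the inside, and return the first loser found. The check function and the results-dictionary shape are assumptions for illustration, not the project's real code:

```python
from itertools import combinations

def head_to_head(a, b, results):
    """Return the loser of the a-b game, or None if they didn't play or tied.

    `results` maps (home-ish pair) -> a's scoring margin against b; this
    shape is a hypothetical stand-in for the real schedule data.
    """
    margin = results.get((a, b))
    if margin is None:
        return None
    return b if margin > 0 else a if margin < 0 else None

def three_team_loser(teams, checks, results):
    """Breadth-first sweep: exhaust a higher-priority check across every
    dyad before any lower-priority check touches any dyad."""
    for check in checks:                      # priority order, outermost
        for a, b in combinations(teams, 2):   # all three dyads
            loser = check(a, b, results)
            if loser is not None:
                return loser                  # fail fast: pick a loser
    return None
```

With the A/B/C example from the text (A beat B, B beat C), the head-to-head check resolves on the first dyad it finds and eliminates B, leaving the A-C pair for the exhaustive two-team tiebreak.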

So how does it work? Fortunately, we have real-world data from just this year in the "Texas" dataset:

$ python3 ./ -s 2021 -e 2021 -v
Texas Tech 38 at Houston 21 on Sep 04, 2021
Baylor 29 at Texas State 22 on Sep 04, 2021
Houston 44 at Rice 7 on Sep 11, 2021
North Texas 12 at SMU 35 on Sep 11, 2021
Rice 0 at Texas 58 on Sep 18, 2021
SMU 42 at TCU 34 on Sep 25, 2021
Texas Tech 35 at Texas 70 on Sep 25, 2021
Texas 32 at TCU 27 on Oct 02, 2021
TCU 52 at Texas Tech 31 on Oct 09, 2021
Rice 0 at UT San Antonio 45 on Oct 16, 2021
Texas 24 at Baylor 31 on Oct 30, 2021
North Texas 30 at Rice 24 on Oct 30, 2021
SMU 37 at Houston 44 on Oct 30, 2021
Baylor 28 at TCU 30 on Nov 06, 2021
UT San Antonio 44 at UTEP 23 on Nov 06, 2021
UTEP 17 at North Texas 20 on Nov 13, 2021
Rice 28 at UTEP 38 on Nov 20, 2021
Texas Tech 24 at Baylor 27 on Nov 27, 2021
UT San Antonio 23 at North Texas 45 on Nov 27, 2021

TBRK 3-team tie
TBRK 3-team tie broken by H2H for Baylor and Texas
TBRK 2-team tie
TBRK head-to-head didn't resolve anything
TBRK oppo check for North Texas and Baylor
TBRK common opponent margin didn't resolve anything
TBRK total margin for North Texas and Baylor
2021 final standings

Baylor                  3-1
North Texas             3-1
Texas                   3-1
UT San Antonio          2-1
SMU                     2-1
Houston                 2-1
TCU                     2-2
UTEP                    1-2
Texas Tech              1-3
Texas State             0-1
Rice                    0-5

2021, 19, Baylor, 3-1

Three teams finished 3-1, but in evaluating all the dyads the three-team check finds that Baylor beat Texas, so Texas is out. The final tiebreaker between Baylor and UNT comes down to gross margin. It's not perfect, but at least it eliminates the random injustice we had before: in the naive "all ties are just two teams" codebase, whichever team settled out in third (with an identical record) was never considered at all. In this case the obvious worst candidate is eliminated, then the champion is awarded by the weakest method, gross margin.

In the post where I identified the bug, we used the 2013 season of the Texas dataset as the test. Here's what it looks like now:

$ python3 ./ -s 2013 -e 2013 -v
Texas Tech 41 at SMU 23 on Aug 30, 2013
Rice 31 at Texas A&M 52 on Aug 31, 2013
TCU 10 at Texas Tech 20 on Sep 12, 2013
Houston 31 at Rice 26 on Sep 21, 2013
SMU 13 at Texas A&M 42 on Sep 21, 2013
Texas State 7 at Texas Tech 33 on Sep 21, 2013
UT San Antonio 32 at UTEP 13 on Sep 21, 2013
SMU 17 at TCU 48 on Sep 28, 2013
Houston 59 at UT San Antonio 28 on Sep 28, 2013
Rice 27 at UT San Antonio 21 on Oct 12, 2013
UTEP 7 at Rice 45 on Oct 26, 2013
Texas 30 at TCU 7 on Oct 26, 2013
Rice 16 at North Texas 28 on Oct 31, 2013
UTEP 7 at Texas A&M 57 on Nov 02, 2013
UTEP 7 at North Texas 41 on Nov 09, 2013
Texas Tech 34 at Baylor 63 on Nov 16, 2013
UT San Antonio 21 at North Texas 13 on Nov 23, 2013
Texas Tech 16 at Texas 41 on Nov 28, 2013
SMU 0 at Houston 34 on Nov 29, 2013
Baylor 41 at TCU 38 on Nov 30, 2013
Texas 10 at Baylor 30 on Dec 07, 2013

TBRK 3-team tie
TBRK oppo check for Baylor and Texas A&M
TBRK oppo check for Baylor and Houston
TBRK oppo check for Texas A&M and Houston
TBRK 3-team tie broken by common oppo for Texas A&M and Houston
TBRK 2-team tie
TBRK head-to-head didn't resolve anything
TBRK oppo check for Baylor and Texas A&M
TBRK common opponent margin didn't resolve anything
TBRK total margin for Baylor and Texas A&M
2013 final standings

Texas A&M               3-0
Baylor                  3-0
Houston                 3-0
North Texas             2-1
Texas                   2-1
Texas Tech              3-2
UT San Antonio          2-2
Rice                    2-3
TCU                     1-3
Texas State             0-1
UTEP                    0-4
SMU                     0-4

2013, 21, Texas A&M, 3-0

Now Houston gets a fair shake in the tiebreaker and is eliminated because it didn't beat Rice as soundly as A&M did. Again, the final tiebreaker comes down to gross margin. In the last few months we've added Elo APIs to the project for the Monte Carlo simulation, and I think a case could be made for a "strength of schedule" tiebreak, where we aggregate the year-end Elo of the two teams' opponents (within the virtual conference).
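A hedged sketch of what that strength-of-schedule tiebreak could look like. The schedule and Elo data shapes here are assumptions; the project's real Elo API is not shown in this post:

```python
def schedule_strength(team, schedule, year_end_elo):
    """Average year-end Elo of a team's in-conference opponents.

    `schedule` is assumed to be a list of (team, opponent) pairs and
    `year_end_elo` a dict of team -> Elo rating; both are hypothetical.
    """
    opponents = [opp for (t, opp) in schedule if t == team]
    return sum(year_end_elo[o] for o in opponents) / len(opponents)

def sos_tiebreak(a, b, schedule, year_end_elo):
    """Return the winner of the tiebreak, or None if it resolves nothing."""
    sa = schedule_strength(a, schedule, year_end_elo)
    sb = schedule_strength(b, schedule, year_end_elo)
    if sa == sb:
        return None  # doesn't resolve anything; fall through to next check
    return a if sa > sb else b  # the stronger schedule wins the tiebreak
```

Like the other checks, this one returns None when it can't distinguish the teams, so it could slot into the existing priority cascade somewhere above gross margin.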

There's a grey area here: if you create a virtual conference that doesn't have much scheduling tie-in, you will end up with pods of teams with similar records and no common opponents or head-to-head results. No algorithm can save you from a badly designed "world." One nice thing about the California Cup is that over the years there has been a strong tradition of intra-California scheduling, so it all makes sense. We have not yet seen a three-team tie in the California data.

Some to-dos that are obvious from this work: true test scenarios! It's nice that Texas in 2013 produced a tie, but we need to enumerate some torturous schedules with different ties and store them in a way we can easily feed into the code, rather than depending on real-world data to supply test cases. Also, the code monolith is getting a little much even for me; all the tie-break code is a natural candidate to move into its own library.
