San Diego State is making history

We certainly had a decisive game this weekend. San Diego State went into Fresno as a slight 1.5-point favorite and won the game 23-0.

Aztecs defeat Bulldogs, image by MidJourney

With that, SDSU takes control of the cup race and Fresno State is out. Current run:

$ python3 ./mcc_schedule.py -v
California 0 at San Diego State 34 on Sep 20, 2025
San José State 29 at Stanford 30 on Sep 27, 2025
San Diego State 23 at Fresno State 0 on Oct 25, 2025
California at Stanford on Nov 21, 2025
San José State at San Diego State on Nov 21, 2025
UCLA at USC on Nov 28, 2025
Fresno State at San José State on Nov 28, 2025

Full Enumeration Simulation:
San Diego State 10 [62%]
Stanford 4 [25%]
San José State 2 [12%]

Monte Carlo [Sampled Home Margin Predictor] Simulation:
San Diego State 6449 [64%]
Stanford 2415 [24%]
San José State 1136 [11%]

Monte Carlo [Elo Predictor] Simulation:
San Diego State 8242 [82%]
San José State 974 [10%]
Stanford 784 [8%]

San Diego State 2-0
Stanford 1-0
California 0-1
San José State 0-1
Fresno State 0-1
2025, 7, ,

With their second shutout victory in MCC play, San Diego State has the chance to put a cup-winning season on the board in which they don’t allow a single point. San José State has been scoring 29 ppg in conference games, so it won’t be a cakewalk.

Has a flawless victory season ever been done? What is the lowest total points allowed by a cup winner?

AI coding and structured data

I started this project in the fall of 2021, which means it was probably the last software I wrote without AI coding help. Looking back, that makes it a time capsule of good and bad coding practices. (I have contemplated dumping the whole codebase into an AI-driven redo but if it ain’t too broke…)

But there’s an interesting result of all the AI advancements that I don’t think every non-coder realizes: custom code for data inspection has become far more feasible, faster, and cheaper. I forget the proper acronym, but what we have with this system is a fairly recognizable pattern. The College Football Database has all the raw data, accessible behind a series of APIs. To create our virtual conference output, our code loads lots of data from the API, manipulates it in its ephemeral runtime model, and then prints out our desired human-readable text. Load, manipulate, output, die.
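Here is a minimal sketch of that load, manipulate, output shape, for illustration only. Nothing here comes from the real mcc_schedule.py: `load_games` is a hypothetical stand-in for the API call (it just returns the three completed 2025 games from the run above), and `build_standings` and `render` are made-up names. It also ignores ties, which the real code presumably handles.

```python
from collections import defaultdict


def load_games():
    """Load: stand-in for the API fetch (hypothetical, hardcoded data).

    Each tuple is (away team, away score, home team, home score).
    """
    return [
        ("California", 0, "San Diego State", 34),
        ("San José State", 29, "Stanford", 30),
        ("San Diego State", 23, "Fresno State", 0),
    ]


def build_standings(games):
    """Manipulate: fold the games into an in-memory win/loss model."""
    record = defaultdict(lambda: [0, 0])  # team -> [wins, losses]
    for away, away_pts, home, home_pts in games:
        # Simplification: a tie would be credited to the home team here.
        winner, loser = (away, home) if away_pts > home_pts else (home, away)
        record[winner][0] += 1
        record[loser][1] += 1
    # Most wins first, then fewest losses; the sort is stable for equal records.
    return sorted(record.items(), key=lambda kv: (-kv[1][0], kv[1][1]))


def render(standings):
    """Output: turn the model into human-readable lines."""
    return [f"{team} {wins}-{losses}" for team, (wins, losses) in standings]


if __name__ == "__main__":
    for line in render(build_standings(load_games())):
        print(line)
    # Die: the model disappears when the process exits.
```

The point of the sketch is the last comment: once the text is printed, the structured model is gone, and the printed text is the only artifact left to query.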

Ten years ago, if I wanted to answer the further question “What is the lowest total points allowed by a cup winner?” I would have gone to work on the main code and augmented the model to add up points allowed, create a new dictionary of global results, sort, output, etc. If someone had suggested “Why don’t you treat your formatted output as a database itself and write a script that parses it and gets results there?” my response would have been “that sounds gross.”

You’d be talking about chaining another parser onto the end of the funnel. Writing parsers is annoying. If I’m going to spend my coding time I want to be working with the data structures I’ve already designed.

Enter LLMs. One of the areas where they really shine is reading structured data and writing code that reads structured data, whatever the structure. Our yearly results files are meant to be human-readable, but they are a structure of sorts. Now, instead of messing with the entire codebase, I can treat my favorite coding LLM (I like blackbox.ai) as a business intelligence tool and just ask it to create a one-off to answer my question.

There’s a directory full of numbered text files with football schedule results. The files look like this:

San José State 8 at Stanford 10 on Sep 25, 1964
Stanford 20 at UCLA 27 on Oct 02, 1964
California 21 at USC 26 on Oct 23, 1964
San José State 37 at Pacific 13 on Oct 23, 1964
UCLA 25 at California 21 on Oct 30, 1964
USC 15 at Stanford 10 on Nov 06, 1964
Stanford 21 at California 3 on Nov 20, 1964
USC 34 at UCLA 13 on Nov 20, 1964

1964 final standings

USC 3-0
UCLA 2-1
Stanford 2-2
San José State 1-1
Pacific 0-1
California 0-3

1964, 8, USC, 3-0

What I want to do is write a script that loops through all files and aggregates total score info. I want to find the case where the winner at the bottom (In this example USC) has given up 0 points. It will be enough for the first pass of the script to simply aggregate how many points the winner gave up.

import os
import glob

# Directory containing the files (assuming current directory)
directory = '.'
files = glob.glob(os.path.join(directory, '*.txt'))

for file_path in files:
    with open(file_path, 'r') as f:
        lines = f.readlines()

    if not lines:
        continue

    # Get the summary line (last line)
    summary = lines[-1].strip()
    parts = summary.split(',')
    if len(parts) < 4:
        continue
    year = parts[0].strip()
    winner = parts[2].strip()

    # Find the start of standings
    year_str = str(year)
    standings_start = None
    for i, line in enumerate(lines):
        if line.strip().startswith(year_str + " final standings"):
            standings_start = i
            break
    if standings_start is None:
        continue

    # Get game lines
    games = lines[:standings_start]

    points_allowed = 0
    for game in games:
        game = game.strip()
        if not game:
            continue
        # Parse game line: "TeamA score at TeamB score on date"
        parts = game.split(' at ')
        if len(parts) != 2:
            continue
        teamA_part, rest = parts
        # Parse TeamA: last word is score
        teamA_words = teamA_part.split()
        if not teamA_words:
            continue
        try:
            teamA_score = int(teamA_words[-1])
            teamA = ' '.join(teamA_words[:-1])
        except ValueError:
            continue
        # Parse rest: "TeamB score on date"
        rest_parts = rest.split(' on ')
        if not rest_parts:
            continue
        teamB_part = rest_parts[0]
        teamB_words = teamB_part.split()
        if not teamB_words:
            continue
        try:
            teamB_score = int(teamB_words[-1])
            teamB = ' '.join(teamB_words[:-1])
        except ValueError:
            continue
        # Check if winner is playing
        if teamA == winner:
            points_allowed += teamB_score
        elif teamB == winner:
            points_allowed += teamA_score

    # For this file, points_allowed is the total points the winner gave up
    # Check if it's 0
    if points_allowed == 0:
        print(f"File: {os.path.basename(file_path)}, Winner: {winner}, Points given up: {points_allowed}")
    # Optional: print all for aggregation
    # print(f"File: {os.path.basename(file_path)}, Winner: {winner}, Points given up: {points_allowed}")

It turned out my initial prompt wasn’t phrased right, and it was adding up all scores for all time. (I’m eliding a lot of excess LLM explanation. Skip to the code, robot.)

Sorry I should have emphasized that we’re looking for low scoring season… i.e. the file is called 1954.txt and it contains info for 1954. So if USC gave up 6 points total in 1954 then USC-1954 is the datapoint we’re interested in. The dict should be keyed on team-year not just team

import os
import glob

# Directory containing the files (assuming current directory)
directory = '.'
files = glob.glob(os.path.join(directory, '*.txt'))

# Dictionary to store team-year: points_allowed
season_points = {}

for file_path in files:
    with open(file_path, 'r') as f:
        lines = f.readlines()

    if not lines:
        continue

    # Get the summary line (last line)
    summary = lines[-1].strip()
    parts = summary.split(',')
    if len(parts) < 4:
        continue
    year = parts[0].strip()
    winner = parts[2].strip()

    # Find the start of standings
    year_str = str(year)
    standings_start = None
    for i, line in enumerate(lines):
        if line.strip().startswith(year_str + " final standings"):
            standings_start = i
            break
    if standings_start is None:
        continue

    # Get game lines
    games = lines[:standings_start]

    points_allowed = 0
    for game in games:
        game = game.strip()
        if not game:
            continue
        # Parse game line: "TeamA score at TeamB score on date"
        parts = game.split(' at ')
        if len(parts) != 2:
            continue
        teamA_part, rest = parts
        # Parse TeamA: last word is score
        teamA_words = teamA_part.split()
        if not teamA_words:
            continue
        try:
            teamA_score = int(teamA_words[-1])
            teamA = ' '.join(teamA_words[:-1])
        except ValueError:
            continue
        # Parse rest: "TeamB score on date"
        rest_parts = rest.split(' on ')
        if not rest_parts:
            continue
        teamB_part = rest_parts[0]
        teamB_words = teamB_part.split()
        if not teamB_words:
            continue
        try:
            teamB_score = int(teamB_words[-1])
            teamB = ' '.join(teamB_words[:-1])
        except ValueError:
            continue
        # Check if winner is playing
        if teamA == winner:
            points_allowed += teamB_score
        elif teamB == winner:
            points_allowed += teamA_score

    # Key on team-year
    key = f"{winner}-{year}"
    season_points[key] = points_allowed

# Sort by points_allowed ascending (lowest first)
sorted_seasons = sorted(season_points.items(), key=lambda x: x[1])

# Print the top-10 (lowest points given up)
print("Top-10 low-scoring seasons (lowest points given up first):")
for i, (key, points) in enumerate(sorted_seasons[:10]):
    print(f"{i+1}. {key}: {points} points given up")

Note that all the annoying bits of parsing are handled without anyone having to spend any dev time on it. The “business intelligence” code itself is so cheap it becomes disposable without needing to disturb the codebase. Sure, we could check it in, but it’s really a one-time question. Since we were able to constitute it from just one sample file and a paragraph or two of human questioning, we can almost assume that this code still exists, deterministically, without needing to maintain it or even save it ourselves.
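As an aside on how small the parsing problem really is: the generated script splits on ' at ' and ' on ', but the same game-line shape could also be captured with a single regular expression. This is an alternative sketch, not what the LLM produced. `GAME_RE` and `parse_game` are made-up names, and it assumes every played game follows the sample file's "Team score at Team score on date" shape.

```python
import re

# Assumed shape of a played game: "<away> <pts> at <home> <pts> on <date>".
GAME_RE = re.compile(
    r"^(?P<away>.+?) (?P<away_pts>\d+) at "
    r"(?P<home>.+?) (?P<home_pts>\d+) on (?P<date>.+)$"
)


def parse_game(line):
    """Return (away, away_pts, home, home_pts, date), or None if the line
    is not a played game (standings, summary, blank, or unplayed game)."""
    m = GAME_RE.match(line.strip())
    if m is None:
        return None
    return (
        m["away"],
        int(m["away_pts"]),
        m["home"],
        int(m["home_pts"]),
        m["date"],
    )


if __name__ == "__main__":
    print(parse_game("San José State 8 at Stanford 10 on Sep 25, 1964"))
    # An unplayed game has no scores, so it does not match.
    print(parse_game("California at Stanford on Nov 21, 2025"))
```

Either approach skips the standings and summary lines for free, because those lines never contain the " at " plus score pattern.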

$ python3 ./winner_by_team_year.py
Top-10 low-scoring seasons (lowest points given up first):
1. UCLA-1954: 6 points given up
2. USC-1947: 14 points given up
3. USC-1952: 19 points given up
4. UCLA-1955: 20 points given up
5. UCLA-1966: 22 points given up
6. USC-1960: 22 points given up
7. UCLA-1961: 22 points given up
8. USC-1963: 23 points given up
9. USC-1962: 23 points given up
10. March Field-1943: 23 points given up

And there we have our answer. I added links back to the results files but spot checks look good. Nobody’s pitched a season-long shutout, but UCLA came pretty close in the deadball-era 1950s. If San Diego State can hold San José State under 6 points, they go to the top of this list…
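To make that last scenario concrete, here is a rough sketch of where a cup-winning SDSU season would land. It assumes they do win the cup and that the San José State game is their only remaining MCC game, as the schedule above shows. The `top10` values are copied from the script output; `projected_rank` is a hypothetical helper, and a tie shares the better rank.

```python
import bisect

# Points allowed by the ten stingiest cup winners, copied from the output above.
top10 = [6, 14, 19, 20, 22, 22, 22, 23, 23, 23]


def projected_rank(points_allowed_so_far, points_allowed_in_finale):
    """Rank = 1 + number of listed seasons that allowed strictly fewer points."""
    total = points_allowed_so_far + points_allowed_in_finale
    return bisect.bisect_left(top10, total) + 1


if __name__ == "__main__":
    # SDSU has allowed 0 points in two MCC games so far.
    for finale in (0, 5, 6, 7, 29):
        print(f"allow {finale:>2} to San José State -> rank {projected_rank(0, finale)}")
```

Allowing 5 or fewer puts them alone at the top, exactly 6 ties UCLA-1954, and matching San José State's 29 would leave them outside the ten seasons listed.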