How to Predict FRC Matches

An Introduction to Scouting

In FRC (the FIRST Robotics Competition), one of the most important yet most overlooked aspects of the season is scouting. Normally, the focus of the season is on building robots, writing and testing software, and training drivers. But there's another key part of the season: scouting matches and analyzing the performance of other teams. Scouting can often be the factor that determines a team's success or failure at an event -- if you don't have a good strategy going into a match or alliance selections, it's entirely possible that you'll make strategic errors.

So, what exactly is scouting? For my team (5160), scouting has traditionally involved sending people up into the stands, having them watch as many matches as possible while recording the actions that various teams take. Our team's scouting methodology in the past has been extremely time consuming and has led to rapid burnout among the people scouting. Last year I set out to fix that.

2019: Destination Deep Space

The Blue Alliance and Data Visualization

Most FRC teams have used The Blue Alliance at some point or another, and our team is no exception. Outside of our manually collected match data, our analytics and team selection process have relied heavily on the data that The Blue Alliance makes available. That normally works decently well, but teams aren't given much in the way of data visualization.

[Image: The Blue Alliance]

Fortunately, all of the team data is made available through an API, so with a bit of inspiration from FRC Programming Done Right, I was able to set up a system that pulled team data down and generated team event summaries. I initially started out with a few graphs and charts focusing primarily on an individual team's performance at an event without much consideration for other teams.
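
As a rough sketch of the data-pulling side, the whole thing boils down to a couple of authenticated GET requests against the TBA API (the event key and auth key below are placeholders, and my actual code is structured differently):

import requests

TBA_API = "https://www.thebluealliance.com/api/v3"
HEADERS = {"X-TBA-Auth-Key": "YOUR_TBA_AUTH_KEY"}  # generated from your TBA account page

def get_event_matches(event_key):
    # e.g. event_key = "2020ncwak"
    return requests.get(f"{TBA_API}/event/{event_key}/matches", headers=HEADERS).json()

matches = get_event_matches("2020ncwak")
red_scores = [m["alliances"]["red"]["score"] for m in matches]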

[Image: Sakurajima V1]

This, as a visualization tool, works reasonably well for seeing if there are noticeable trends in a team's performance, like if they suddenly start having higher match scores in the later part of an event. However, it has a key flaw: my initial program didn't actually compute any metrics or perform any analysis.

Rudimentary Data Analytics

After using the data visualization software with reasonably good results at one of our events, I decided to take it a step further and introduce more "usable" metrics to gauge team performance. I started off by adding in OPR (offensive power rating, how many points a team is expected to score in a match) as a metric, followed by a standard score comparing a team's performance to that of the other teams at the event. Basically, that score looks at how many standard deviations a team's mean score sits above or below the mean of all teams at the event.

$$z = \frac{\text{mean team score} - \text{mean score of all teams}}{\text{event score deviation}}$$

As a metric, this is fine for quickly seeing how good a team is compared to the mean, but I was able to take it a step further. If we can prove (or assume) that mean match scores for a team at an event are normally distributed, then we can treat that standard score as a Z-score and calculate a percentile to go with it. Fortunately, we don't really need to do much to prove it: since we're looking at mean scores, the central limit theorem suggests they should be approximately normal. Taking advantage of that fact, I converted my Z-scores into percentiles, finally letting us evaluate teams at a glance in the context of an event.
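
As a sketch, that conversion is only a couple of lines with scipy (the numbers here are made up):

from scipy.stats import norm

team_mean_score = 42.0  # a team's mean match score (made-up numbers)
event_mean      = 35.0  # mean of all teams' mean scores at the event
event_std       = 10.0  # standard deviation of those mean scores

z = (team_mean_score - event_mean) / event_std
percentile = norm.cdf(z) * 100  # ~75.8 here: better than about 76% of teams at the event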

Eventually, the 2019 season approached its end as the last off-season matches were played, leaving the initial work that I did unneeded... for the time being.

2020: Infinite Recharge

With the 2020 season, and events that we'd play in, quickly approaching, I figured that it was time to revive this project. During the first few weeks of build season, I put some time into working through the math involved with calculating an improved scoring metric: Component OPRs. As I mentioned briefly before, the OPR metric in the context of match score can be considered as a way to see how many points a team is expected to contribute to a match that they play in. Mathematically, it can be solved for by treating matches as a system of equations and then solving for the team contributions.

$$\begin{bmatrix} T_a + T_b + T_c \\ T_c + T_d + T_e \\ T_a + T_d + T_f \\ \vdots \end{bmatrix} = \begin{bmatrix} S_1 \\ S_2 \\ S_3 \\ \vdots \end{bmatrix}$$

where \(T_i\) is team \(i\)'s expected contribution and \(S_j\) is the alliance score in match \(j\).

This technique is quite powerful, as it effectively lets us calculate "how good" a team is for whatever we want. In the case of Component OPR, what we're able to do is apply that technique to how many game pieces a team scores in a given match, for example. Estimating that number is extremely useful, as we just added another metric to rank teams in the context of each other. But it also has another property that we can abuse -- the ability to predict scores.

Basic Component OPR Score Prediction

With the power of knowing how many game pieces a team is expected to score in a match, we can now figure out how many points they're expected to score in a match as well. As every game objective has a score associated with it, we can simply multiply the scoring value of a game piece by the number of pieces we expect a team to score to see how many points we expect a team to score from a given objective in a match. This can then be added up with all the other objectives and teams in a match and just like that, you have a really basic way of predicting scores.
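
Here's a minimal sketch of that calculation for one team's autonomous scoring, using this year's autonomous point values (the component OPR numbers themselves are made up):

# made-up component OPRs for one team: expected balls scored per match in each goal
component_oprs = {"auto_low": 3.1, "auto_outer": 1.8, "auto_inner": 0.4}
# autonomous point values per ball: low goal = 2, outer goal = 4, inner goal = 6
point_values   = {"auto_low": 2,   "auto_outer": 4,   "auto_inner": 6}

# expected points from each objective = expected pieces scored * points per piece
expected_points = sum(component_oprs[obj] * point_values[obj] for obj in component_oprs)

# an alliance prediction is just this repeated for each of the three teams (and for
# the teleop and endgame objectives), then summed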

Now, somewhat unsurprisingly, this has quite a few inherent issues that need to be tackled. Firstly, OPR (and Component OPR) is an extremely unstable metric with quite a bit of variance. When solving for it, you basically end up performing a regression, so there's associated error with it, and it usually won't converge on the true value without more data points than are given throughout the first part of a tournament. OPR is still a great factor to take into account when building a model to predict scores, however, as it clearly has a strong relation with how well a team scores.

[Image: OPR vs. mean score]

The other issue with this basic model is that it assumes all game objectives contribute to the score at an equal level. Basically, it's saying that if a robot is expected to score 3 balls in the low goal during autonomous (yielding 6 points), those points are as important to the final score as scoring 3 balls in the inner goal during autonomous (yielding 18 points). It feels like a safe assumption to make, but the next improvement to the model somewhat disproves it.

Linear Regression (machine learning for statisticians)

I figured that the next step to improve my model would be to take advantage of a basic multiple linear regression. Using the previously calculated component OPRs as well as a calculation representing endgame points as the explanatory variables, and using a team's score as the dependent variable, I was able to create a linear model with an R^2 value of .73, which I consider to be decently good. Granted, this early version of my score model is only really useful for predicting an alliance's score given one team, so it should be used more as a method of ranking teams against each other.
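
For reference, here's a minimal sketch of that kind of regression with scikit-learn -- the data below is synthetic stand-in data, not my actual inputs (which were the component OPRs plus the endgame estimate for each team):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# synthetic stand-ins for the explanatory variables (component OPRs + endgame estimate)
X = rng.uniform(0, 20, size=(200, 3))
# synthetic noisy scores, standing in for the observed team scores
y = X @ np.array([1.3, 0.9, 1.1]) + rng.normal(0, 8, size=200)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # regression coefficients and intercept
print(model.score(X, y))              # R^2 of the fit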

[Image: initial regression coefficients]

The more interesting thing about the model was in the regression coefficients: not all of the explanatory variables contributed to the dependent variable at the one-to-one rate I expected! This contradicts the initial assumption made with my first model, meaning that the sum of the "normalized component OPRs" (the OPR for points scored in a game objective) isn't the same as the expected score. This is good to know, as it means that when developing our model, we'll need to use some kind of linear regression.

Week 2 & Performance Across Events

The other issue with the first model that I developed was OPR not being able to converge at only one event. Fortunately, most teams play at more than one event -- this means that we can look back at results from previous events to figure out a team's OPR, right? Unfortunately, it's hard to make that claim without significant statistical backing. Teams will normally improve between events or make modifications to their designs, so intuitively team performance would vary from event to event. However, I figured that this assumption would be the crux of my model's success in a competition setting, so I decided to test whether it actually holds up.

[Image: score boxplots]

I figured that the easiest way to do that would be with a few standard statistical tools. Initially, I looked at all the scores across the two weeks with a boxplot to see if there was any significant difference, and visually I didn't notice one. I also treated each week's scores as a distribution and compared the two with a KS test, and didn't find statistical evidence that the score distributions are different. To be more thorough, I should test each team's performance across multiple weeks, although given that most teams only played once this year (darn COVID-19), I can't really do that.
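
As a rough sketch, that comparison looks something like this (the scores below are placeholders; by "KS test" I mean the two-sample Kolmogorov-Smirnov test from scipy):

from scipy.stats import ks_2samp

# placeholder alliance scores from week 1 and week 2 events
week1_scores = [38, 45, 52, 61, 40, 47, 55, 66, 43]
week2_scores = [41, 50, 48, 63, 39, 58, 44, 60, 51]

stat, p_value = ks_2samp(week1_scores, week2_scores)
# a large p-value means we can't conclude the two distributions differ
print(stat, p_value)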

The Secret Sauce: Accounting for Variance

After having dealt with the two more obvious issues that could arise with the model, the next thing to tackle is the extreme variance in OPR. I decided that for my model, in order to make more accurate predictions with an associated confidence, I'd need to determine what the statistical variances and deviations in the OPRs for a given team are.

OPR Variance

The first thing that we need to consider is our inputs to the model -- the actual component OPRs. Those OPRs are determined through team performance at an event, which is unfortunately inconsistent (this is the crux of why I'm modelling the distributions). Let's just pretend, for now, that a team's performance follows a normal distribution: for example, each match they score some mean number of point objectives, with some deviation around that mean.

Next, let's quickly look at what OPR really is (if you haven't read this blog post by TBA before, it's definitely a better primer than what I'm capable of writing).

To pull from TBA, the following is how we solve for OPR through matrices.

$$ A = \text{incidence matrix of teams/matches played} \\ x = \text{OPRs being solved for} \\ s = \text{scores produced} \\ A^\intercal Ax=A^\intercal s $$

When solving this system, we actually end up using a least squares method to approximate the answer, since the system is overdetermined -- solving the normal equation \(A^\intercal Ax=A^\intercal s\) for \(x\) gives us the least squares estimate of the OPRs.
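
In code, that solve is one call to a least squares routine -- here's a toy sketch with a made-up incidence matrix (numpy's lstsq handles the overdetermined system for us):

import numpy as np

# toy incidence matrix: 4 teams (columns), 3 matches (rows); A[m][t] = 1 if team t
# played on the alliance in match m
A = np.array([
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
], dtype=float)
s = np.array([60.0, 55.0, 70.0])  # alliance scores for those matches

# least squares solution to Ax = s (equivalent to solving A^T A x = A^T s)
x, *_ = np.linalg.lstsq(A, s, rcond=None)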

Solving for the mean [point objective] scored each match is the easy part, that's what solving for OPR does for us. Unfortunately, in doing that, we're obfuscating the variance in the [objective] being scored, and (afaik) there's not an easy way to recover it.

I first set out looking for a method to solve for it by going to Chief Delphi, but it looked like the question of finding an associated variance/standard deviation (SD) for OPRs had only been asked once, and it was a pretty inconclusive thread (here). After reading through that thread, I was able to latch on to a few different leads for determining the standard deviation.

I initially considered

  • Using some kind of jackknifing to determine the variance
  • Taking the standard error of the OPR fit to get a level of uncertainty for each team
  • Using "wgardner's method", a method which "approximates the error of the OPR calculation" and connect that to the calculation error

but settled on a method which basically takes calculated OPRs and determines the residuals from them when used as a prediction of team performance.

To get a better idea of what it does, here's my "pseudocode" implementation of it (for one team):

import numpy as np

oprs    # a dict of precalculated OPRs for each team
matches # match data for the team we're looking at

expected = np.array([])
observed = np.array([])

for match in matches:
    teammates = match.alliance  # get alliance partners
    score = match.score         # our observed score

    # we expect the sum of the OPRs to be the score
    expected = np.append(expected, sum(
        oprs[team] for team in teammates
    ))

    # and we know what the actual value was
    observed = np.append(observed, score)

# calculate the RMSE (root mean squared error) of those "predictions"
residuals = observed - expected
RMSE = np.sqrt(np.mean(residuals ** 2))

Even though RMSE and SD are technically different, it's probably safe to treat them as the same thing here, especially since they conceptually can be applied in similar situations (like what we'll encounter later). But now that we have some measurement of a standard error/standard deviation, we can move on and start integrating it into our model.

Regressing with Distributions

To start off, we can take the model from before, but the only things that we really need to keep are the coefficients associated with each explanatory variable and the intercept. When integrating our newly parameterized distributions into that regression, we'll be applying a few addition and multiplication operations to the distributions in order to change and merge together the variances. There are a few rules for standard deviations that are important to keep in mind throughout this:

$$ \text{Summing (independent) deviations:} \quad \sqrt{\sum_{i=1}^{n} \sigma_i^2} \\ \text{Multiplying a deviation by a constant } A\text{:} \quad |A|\,\sigma $$

These two properties basically let us run our standard deviations through our predictive model for a single team and get a predicted score distribution -- this is pretty easy to do simply because our model is a combination of multiplication operations (by our regressed coefficients) summed at the end.
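
As a quick sketch of what that propagation looks like in code (the coefficients, means, and deviations here are placeholders, not real values):

import numpy as np

coefs     = np.array([1.3, 0.9, 1.1])    # regressed coefficients (placeholders)
intercept = 4.2                          # regressed intercept (placeholder)
means     = np.array([6.0, 10.0, 12.0])  # a team's component OPRs (distribution means)
sigmas    = np.array([2.0, 3.5, 4.0])    # estimated deviation of each component

# run the means through the linear model as usual
predicted_mean = coefs @ means + intercept
# scale each deviation by its coefficient, then sum them in quadrature
predicted_sigma = np.sqrt(np.sum((coefs * sigmas) ** 2))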

Predicting with Confidence

Anyways, now that we have the ability to construct a predicted score distribution with a standard deviation and a mean, we should be able to create predictive intervals. Unfortunately, this is where the math gets very ugly and unclear. Basically, we need to take our predicted distributions, determine the standard error for them, and then use that to get an interval. The issue is in finding the standard error; from my research, I was unable to find a solid source describing how to find it so I had to fudge a few formulas together.

And speaking of fudging formulas, a quick disclaimer: I'm not a statistician, I'm a student. It's extremely probable that a few things here are wrong, but the end product works quite well, so I'm letting those errors slide for now.

(in the following formulas, this is what each variable represents)

$$ \hat y: \text{our predicted mean score}\\ t^*: \text{our t-statistic for the chosen confidence level}\\ n: \text{the number of matches being used to make the prediction}\\ \sigma: \text{our predicted standard deviation}\\ (x^* - \bar x)^2: \text{the squared difference between our predicted mean and the observed mean score}\\ $$

Firstly, we know that to construct a confidence interval with a t-statistic we use the following:

$$\hat y \pm t^* SE$$

We also know that for our specific use case, the standard error is

$$SE = \sigma \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{\Sigma (x - \bar x)^2}}$$

(if you want to know, this is taken from p544 of "The Basic Practice of Statistics, Second Edition" by David S. Moore).

And finally, we know that the standard deviation (\(\sigma\)) is

$$\sigma = \sqrt{\frac{\Sigma(x - \bar x)^2}{n}}$$

This means that with a bit of algebraic substitution, we should be able to solve for our interval in terms of what we know, like so:

$$\hat y \pm t^* \sqrt{\frac{n \sigma^2 + \sigma^2 + (x^* - \bar x)^2}{n}}$$

And just like that, we now can get a predicted score interval given a confidence.
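
Put together in code, the interval calculation looks roughly like this (a sketch with placeholder numbers; I'm also assuming \(n - 1\) degrees of freedom for the t-statistic, which the formulas above don't pin down):

import numpy as np
from scipy.stats import t

confidence = 0.95
n      = 10    # matches used to make the prediction
y_hat  = 85.0  # predicted mean score
sigma  = 12.0  # predicted standard deviation
x_star = 85.0  # predicted mean, used in the (x* - xbar)^2 term
x_bar  = 80.0  # mean score observed

t_star = t.ppf(1 - (1 - confidence) / 2, df=n - 1)  # two-sided critical value
se = np.sqrt((n * sigma**2 + sigma**2 + (x_star - x_bar)**2) / n)
low, high = y_hat - t_star * se, y_hat + t_star * se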

Who will win?

The powerful thing about our confidence based model is that we can now make match predictions and assign a probability to the outcome. This is pretty much required for any predictive model, and it lets us take advantage of a few predictive accuracy metrics such as the Brier score. The question is, how do we implement it from what we currently have?
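
For example, the Brier score is just the mean squared difference between the predicted win probability and what actually happened (a tiny sketch with made-up numbers):

import numpy as np

# predicted probability that the red alliance wins, and whether they actually did
p       = np.array([0.81, 0.55, 0.67, 0.92])
outcome = np.array([1, 0, 1, 1])

brier = np.mean((p - outcome) ** 2)  # 0 is perfect; 0.25 is no better than always guessing 50%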

[Image: confidence sliding window]

We've fortunately re-entered the easy part of the math, and can do it programmatically. Basically, we just need to predict the two alliance score distributions for a given match and then play with the confidence until we get something statistically significant (i.e. there's no overlap). The best way to implement this is probably through some kind of binary search, but I settled for simply lowering the confidence until there's no overlap. Once there is no overlap left, all we have to do is see which distribution is higher than the other, and that's our prediction.
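
Here's roughly what that loop looks like (a sketch -- predict_interval is a hypothetical helper that wraps the interval calculation from earlier and returns a (low, high) pair for an alliance at a given confidence):

def predict_winner(red, blue, predict_interval, step=0.01):
    # start with a wide interval and shrink it until the two alliances separate
    confidence = 0.99
    while confidence > 0:
        red_low, red_high = predict_interval(red, confidence)
        blue_low, blue_high = predict_interval(blue, confidence)
        if red_low > blue_high:
            return "red", confidence
        if blue_low > red_high:
            return "blue", confidence
        confidence -= step  # still overlapping, so lower the confidence and try again
    return "toss-up", 0.0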

Confidence != Probability??

One of the issues that I encountered with my model at this point was that the confidence wasn't actually lining up with the predictive accuracy. This is technically a sign that there's an issue with the model, but it's not one that can't be buffed out with a bit more modelling. In order to deal with it, I took a random sample of matches, iterated through the different confidence levels, and found the percent accuracy of the model at each given confidence.

[Image: percent confidence vs. accuracy]

After graphing the two, it became somewhat apparent to me that they were almost linearly related, meaning that I could fire up Desmos and create a fit of confidence vs. accuracy (here's my regression). With that newly regressed formula in hand, I plugged it into my model so that it automatically converts confidence to accuracy.
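
For what it's worth, the same fit can be done programmatically -- here's a sketch with placeholder samples, assuming the (confidence, accuracy) pairs have already been collected:

import numpy as np

# sampled (confidence, observed accuracy) pairs -- placeholder numbers
confidences = np.array([0.55, 0.65, 0.75, 0.85, 0.95])
accuracies  = np.array([0.60, 0.68, 0.74, 0.83, 0.90])

# fit a line mapping confidence to accuracy, then use it inside the model
slope, intercept = np.polyfit(confidences, accuracies, 1)

def confidence_to_accuracy(c):
    return slope * c + intercept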

Evaluating Accuracy

Now that the model is finished, it's time to evaluate it. For all of these initial accuracy tests, I used all the match data that I cached, so the accuracy is probably slightly higher than it should be, given that we're basically verifying using training data. Across all matches played, my model was 77% accurate at predicting the victor. Given the sample size of matches I was looking at (n=3735), that's really quite impressive to me. I also individually looked at its performance at ncwak and ncpem, two events in our district, and my model was more accurate at predicting the victor there than TBA's match prediction software. My model also beats FRC Cheesecake (71% accuracy). Granted, again, my model is looking back in time, so it likely has more data on teams than FRC Cheesecake did. I don't think TBA's match prediction locks in after a match is played, however, so it is likely more accurate than that comparison suggests.

Victory.

COVID-19

Unfortunately, I'll likely never get to the end of this story, due to COVID-19. My model was finished two days before week 3 events, which were effectively all canceled due to the outbreak. Hopefully this post will be updated in the future, but for now, thanks for reading, and I hope something here was helpful to you.