Friday, September 1, 2017

Saving your best for last: the recency effect in Cy Young voting

TLDR: "[...] thanks to a cognitive bias in humans, September [baseball is] where pitchers can make up ground in the race for the title of their league's best pitcher - the Cy Young winner."

"[...] A strong performance in September is roughly twice as influential for a pitcher's Cy Young voting points as the exact same performance would have been in April."

September is known as the most important time in Major League Baseball (MLB): teams are jockeying for position in their divisions and each game begins to weigh more heavily on their chances of making the playoffs. But thanks to a cognitive bias in humans, September also becomes a time where pitchers can make up ground in the race for the title of their league's best pitcher - the Cy Young winner.

The Cy Young Award is given each year to the pitcher who accumulates the most vote-points in each of the American and National Leagues (abbreviated AL and NL, respectively). Thirty members of the Baseball Writers' Association of America (BBWAA) rank their top five pitchers in each league, and the votes are weighted as follows:

1st - 7 points
2nd - 4 points
3rd - 3 points
4th - 2 points
5th - 1 point
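
To make the weighting concrete, here is a minimal sketch of how ballots could be turned into vote-points. The ballots and the placeholder pitcher names ("A" through "E") are hypothetical and only illustrate the arithmetic; they are not real votes.

```python
from collections import Counter

# Points per ballot position, 1st through 5th, as listed above.
POINTS = (7, 4, 3, 2, 1)

def tally(ballots):
    """Sum weighted points; each ballot lists up to five pitchers, best first."""
    totals = Counter()
    for ballot in ballots:
        for points, pitcher in zip(POINTS, ballot):
            totals[pitcher] += points
    return totals

# Hypothetical ballots with placeholder names (not real voting data).
ballots = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "C", "E", "D"],
    ["A", "C", "B", "D", "E"],
]

totals = tally(ballots)
for pitcher, points in totals.most_common():
    print(pitcher, points)
```

In this made-up example pitcher A finishes with 18 points (7 + 4 + 7), ahead of pitcher B with 14 (4 + 7 + 3).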

Since the voting takes place at the end of the regular season (yet before the beginning of the post-season), we might expect to see evidence of the recency effect, that is, the tendency for humans to recall recent events more easily than events from the more distant past. In the context of Cy Young voting, we would expect to see the recency effect if voters tend to overlook pitcher performances that occurred earlier in the season and vote based on more recent performances.

For an example of the recency effect, look no further than last year's AL Cy Young voting (perhaps it is the recency effect that leads me to consider the most recent voting period). Rick Porcello's win over Justin Verlander was highly contested (especially by the runner-up's fiancée, Kate Upton) and the graph below may help to explain why.

I plot the cumulative game scores of the top 4 starting pitchers in the 2016 AL Cy Young voting race. A game score is a measure of how dominant a starting pitcher's performance was, with a baseline score of 50. Game scores above the 50-point baseline indicate stronger performances and game scores below 50 indicate poor pitching performances. For this exercise, I subtract the 50-point baseline from each game score to help illustrate the data. In the left panel below, I begin summing each individual's game scores from the commencement of the 2016 season and focus on the final two months of the regular season (plus the first two days in October). Here Rick Porcello (in Red Sox red) looks like a poor choice for the Cy Young. However, in the right panel, I begin summing the game scores from August 1st, 2016 through the end of the regular season. Now the choice for the Cy Young becomes less obvious. (Note that the number of voting points each pitcher received is in parentheses next to their name.)
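
The two panels can be sketched in a few lines. The post does not say which version of game score is used, so this sketch assumes the original Bill James formula (start at 50, +1 per out, +2 per inning completed after the fourth, +1 per strikeout, -2 per hit, -4 per earned run, -2 per unearned run, -1 per walk). The function names and the list of starts are hypothetical and exist only to show how the starting date changes the running total.

```python
from datetime import date
from itertools import accumulate

def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James game score (assumed version; the post does not specify one)."""
    innings_completed = outs // 3
    return (50 + outs + 2 * max(0, innings_completed - 4) + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)

def cumulative_from(starts, first_day):
    """Running sum of (game score - 50) over starts on or after first_day."""
    adjusted = [score - 50 for day, score in starts if day >= first_day]
    return list(accumulate(adjusted))

# Sanity check: 9 innings, 20 strikeouts, 1 hit, no runs, no walks.
print(game_score(27, 20, 1, 0, 0, 0))  # 105

# Hypothetical starts as (date, game score) pairs; not real data.
starts = [
    (date(2016, 4, 10), 70),
    (date(2016, 6, 1), 65),
    (date(2016, 8, 5), 40),
    (date(2016, 9, 20), 45),
]

print(cumulative_from(starts, date(2016, 4, 1)))  # [20, 35, 25, 20]
print(cumulative_from(starts, date(2016, 8, 1)))  # [-10, -15]
```

The made-up pitcher above finishes comfortably positive when the sum starts in April but negative when it starts on August 1st, which is the same change of perspective the two panels are meant to show.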

To test my conjecture empirically, I collect information on all pitchers from the 2015 and 2016 seasons who:
  1. Started a minimum of 20 games, and
  2. Finished in the top 10 in their respective league in at least one of the following categories:
    • Earned Run Average (ERA),
    • Wins, or
    • Strikeouts.

I then calculate the game scores for each pitcher (recall that I do not add in the 50-point baseline to the game scores of each pitcher). All my data comes from [...]. Next, I calculate the correlation between the BBWAA voting points and the average game score of a pitcher for a given month. The results are depicted graphically below.

As I expected, there appears to be evidence of the recency effect in Cy Young voting. A quality pitching performance in April does not correlate nearly as strongly with voting points as a quality pitching performance in September. A simple regression suggests that a strong performance in September is roughly twice as influential for a pitcher's Cy Young voting points as the exact same performance would have been in April. This is consistent with the recency effect hypothesis that the voters tend to forget the performances in early April and remember more recent performances in August and September when determining their choices for Cy Young Awards.
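
For readers who want to try the month-by-month comparison themselves, here is a minimal sketch of the correlation step. It is not my actual analysis code: the numbers are hypothetical and chosen only to show the mechanics, and it computes a plain Pearson correlation rather than the regression mentioned above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for three pitchers (not real results).
# voting_points[i] and each monthly list's entry i belong to the same pitcher.
voting_points = [10, 20, 30]
monthly_avg_game_score = {
    "April": [5, 1, 3],       # average (game score - 50) in April
    "September": [1, 2, 3],   # average (game score - 50) in September
}

correlations = {
    month: pearson(scores, voting_points)
    for month, scores in monthly_avg_game_score.items()
}
for month, r in correlations.items():
    print(month, round(r, 2))
```

With these invented numbers, September tracks the voting points perfectly while April does not, which is the kind of pattern the recency effect would produce.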

I am not the first to point out that the BBWAA does not always give the award for the best pitcher in baseball to the best pitcher in baseball. But thanks to the recency effect, the BBWAA lends empirical support to the phrase "you are only as good as your last performance."

If you are enjoying what you are reading, I would love to hear from you! Please like, share, and comment below!

As of September 1, 2017, my simple model mentioned above predicts the following:
AL Cy Young winner:
  • Corey Kluber - Cleveland Indians (84.6% chance of winning)
  • Chris Sale - Boston Red Sox (15.4%)
NL Cy Young winner:
  • Max Scherzer - Washington Nationals (76.3% chance of winning)
  • Gio Gonzalez - Washington Nationals (23.5%)
  • Clayton Kershaw - Los Angeles Dodgers (0.2%)

1 comment:

  1. Agree with you 100%. But, I would add that it is significantly more important for a pitcher to have a great September while their team is in a playoff hunt than to have a great April. Therefore, there should be more weighting to the last month.