On May 7, 2015, one day after the so-called Wells Report was made public, my colleague and fellow economist, Michael J. Moore of Northwestern University, sent me a five-word email: “Would you file this report?”
As I reviewed the attachment, I understood that Michael wasn’t asking about the Wells Report per se, but rather about its 250-page attachment from Exponent, a consulting firm hired to analyze the air pressure of the footballs used in this year’s AFC Championship Game between the New England Patriots and the Indianapolis Colts. There Exponent summarized its data, its statistical analysis, and its off-site physical testing results.
I restated Michael’s question:
Would either of us be willing to submit an expert report like Exponent’s in a serious proceeding? And for me, even though the NFL and business schools are vastly different, would I be willing to use such a report in personnel decisions?
For the fun of it, as others have, Michael and I explored Exponent’s analyses, the data and protocol issues, and the conclusions reached, with an eye toward writing a case study on judgment and decision-making. Then, due to chance events, the NFL Players Association hired us to prepare testimony for the Tom Brady Appeal.
On June 23rd everyone gathered three floors below ground at 350 Park Avenue in a room with a dropped ceiling and fluorescent lights. Counsel each had a side table (the fold-out, laminate type), the witness had a special chair, and the court reporter was in the corner. Forty plastic chairs in rows were occupied by others. Commissioner Goodell sat at the head table and stayed dialed in for the 10-hour hearing.
After opening statements by counsel, Mr. Brady testified and I followed. In my direct testimony and during questions, I kept two major points in mind. First, the testing protocol used by the NFL Officials during the game was impromptu. No one, to anyone’s knowledge, had tested footballs during a game. No one had wondered how much footballs might deflate in December at Mile High Stadium in Denver. When the question of whether the Patriots’ balls were deflated outside the rules arose, the Officials demonstrated quite good intuition: They measured the 11 Patriots footballs and 4 of the Colts balls to serve as controls; measured each ball with two gauges to improve accuracy; and measured footballs after the game to shed light on other factors. These steps gathered the data used to answer the question posed in the Wells Report and by Exponent (p. 10), i.e., “whether the drop in pressure, on average, measured for the Patriots footballs after the first half of play was statistically significantly greater than the corresponding average drop in pressure for Colts footballs.”
But the impromptu protocol had limitations. No pre-game measurements were recorded, so Exponent relied on the NFL Official’s recollection that the Patriots balls were set to 12.5 PSI and the Colts balls to 13.0 PSI. No one recorded the locker room temperature or the timing of measurements. No one checked for differences in moisture. Exponent also encountered anomalies that led them to presume that the gauge actually used by the NFL Official pre-game was not the one he recalled using. In addition, the two PSI measurements for Colts ball #3 didn’t make sense to Exponent. Were they put in the wrong columns? Aggravating these problems, the data sample was puny.
The other major point I emphasized to Commissioner Goodell was that Exponent’s statistical analysis used to answer the key question above – did the pressure in the Patriots balls drop more than in the Colts balls – did not account for an obviously important factor. Officials brought the game balls from a cool and somewhat wet first-half environment into the warm locker room. They first measured the 11 Patriots balls and later measured the 4 Colts balls. Why only four? They ran out of time, in part because they decided to reinflate the Patriots balls.
While the extent of the time differences in halftime measurements is not known – another protocol issue – the big fact remains: the Officials measured all Patriots balls early in halftime when they were colder and relatively wet, and they measured the Colts balls later when they were warmer and relatively dry. Does measuring (foot)balls right after they are in a cold and relatively wet environment matter? Some know this from the hilarious Seinfeld “Shrinkage” episode involving George right after a cold swim. But if one needs convincing, Exponent’s Figure 22 shows that the PSI of a wet and cool (48°F) Patriots ball brought into a warm (72°F) and dry locker room would increase by up to 1.0 PSI in 12 minutes, more than the “extra deflation” attributed to the Patriots balls. Therefore, to isolate the extent of potential deflation outside the rules, one must control for time differences.
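The temperature part of this effect can be approximated with a back-of-the-envelope calculation. The sketch below is not Exponent's model. It assumes an ideal gas at constant volume, a standard atmospheric pressure of 14.7 PSI, and a dry ball, and it borrows the 12.5 PSI, 72°F, and 48°F figures mentioned above. The function and constant names are illustrative only.

```python
# Rough ideal-gas estimate of how gauge pressure changes with temperature.
# Assumptions (not from the Exponent report): constant ball volume,
# dry air, and standard atmospheric pressure of 14.7 PSI.

ATMOSPHERIC_PSI = 14.7  # assumed sea-level atmospheric pressure


def fahrenheit_to_rankine(temp_f):
    """Convert Fahrenheit to the absolute Rankine scale."""
    return temp_f + 459.67


def gauge_psi_after_temperature_change(start_gauge_psi, start_temp_f, end_temp_f):
    """Return the gauge pressure after the air inside moves between temperatures.

    At constant volume, absolute pressure scales with absolute temperature,
    so the gauge reading is converted to absolute pressure, scaled, and
    converted back.
    """
    start_absolute_psi = start_gauge_psi + ATMOSPHERIC_PSI
    ratio = fahrenheit_to_rankine(end_temp_f) / fahrenheit_to_rankine(start_temp_f)
    return start_absolute_psi * ratio - ATMOSPHERIC_PSI


# A ball set to 12.5 PSI in a 72°F room, then cooled to 48°F on the field.
field_psi = gauge_psi_after_temperature_change(12.5, 72.0, 48.0)
drop = 12.5 - field_psi

print(f"Gauge pressure at 48°F: {field_psi:.2f} PSI")
print(f"Drop from temperature alone: {drop:.2f} PSI")
```

Under these simplifying assumptions, temperature alone moves the reading by roughly 1.2 PSI, and the ball regains that pressure as it warms back up. The calculation ignores moisture and says nothing about how fast the pressure recovers, which is the question that measurement timing raises.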
Remarkable to Professor Moore and me is that, after figuring out the science represented in their Figure 22, Exponent did not put average time differences between the measurements of the Patriots balls and the Colts balls into their statistical model. They did inquire how the order of measurement affected the subset of Patriots football measurements. They also did off-site studies and reached conclusions that timing couldn’t explain everything. But neither of these steps is a control for time in their analysis of the differences in pressure drops. When Professor Moore and I included average time differences in the Exponent model, the statistically significant result simply went away. In one scenario, the average pressure difference between the 11 Patriots balls and the 4 Colts balls that remained “unexplained” was 0.1 PSI.
Hence, I told Commissioner Goodell that there is no statistical basis to conclude that changes in the PSI of the Patriots footballs from pre-game to halftime differed from those of the Colts footballs. One confirming highlight came near the end of testimony. Professor Daniel R. Marlow, who seems like a good guy and certainly looks the part of a Princeton physics professor, acknowledged that Exponent did not control for time differences.
Throughout the day Mr. Brady sat with a leg crossed, arms crossed at the wrist, excellent posture, and virtually no fidgeting. I can’t read minds, but he seemed keen to understand everything, including the effects of changes in environment on PSI, significance tests, and why Exponent chose not to use the post-game data.
I concluded my testimony to Commissioner Goodell by expressing, with appropriate respect, a concern. As a dean, I play a role in decisions that affect the standing of professors, students, and professional staff. While no one in my world matches Mr. Brady’s profile, the Yale and Chicago campuses where I’ve worked are home to leading scholars, important practitioners, and aspiring leaders of business and society. Given the lack of anything close to a solid protocol and the lack of reliable results, I would not give any weight to such empirical findings in a personnel decision.
Edward A. Snyder is Dean of the Yale School of Management. Professor Michael J. Moore of Northwestern University and Dean Snyder were supported by Pierre Cremieux, Paul Greenberg, and Jimmy Royer of Analysis Group, Inc.
August 6, 2015