YH: What first got you interested in statistics?
SS: Being a big Red Sox fan, even in the 90’s, I was constantly having numbers thrown at me, and when I read Moneyball, that triggered some interesting thoughts. So I had been looking at things like on-base percentage and some other interesting baseball stats for a while, but going to the MIT Sloan Conference, an open colloquium about sports analytics, for the first time made me start to generate my own ideas.
YH: How have you gone about studying statistical analysis? Has it been a big part of your Yale education, or are you mostly self-taught?
SS: I think to learn about sports statistics in a meaningful way you need to have pressure to deliver something and you need to have someone you can email to ask which function to learn in Excel. Once you have those two things, the sky is the limit, and you can move really quickly. I didn’t have either of those things for a while. When I first became interested, I would copy and paste data into Excel, and struggle. Then, when I had a boss for the Pistons, I was working under deadlines and firm instructions, which was really useful. At Yale I think it’s true that without deadlines, you don’t really move.
YH: What was your job with the Pistons?
SS: They had the ninth draft pick going into that year, and they were trying to determine who they wanted to take. My job was to try to make a model given the professional output of college players. What was hard about that was adjusting carefully for the relative strength of a given team’s schedule. They ended up drafting Andre Drummond, who had a pretty good season, even though no draft models liked him that year. He was shooting 28 percent on free throws.
YH: You’re also currently working for the White Sox, right?
SS: Yeah. Analytics in baseball are now all about what’s going on in the strike zone, in those however-many inches above home plate: the angle at which the ball is coming in, the speed of the ball, the speed of the bat, the pitch location. All that is really what’s being analyzed. They were looking for some undergrad to do some quantitative and statistical analysis, and I came up. I have only been with them for a week now, but it’s a really interesting dataset. I didn’t know R [a programming language and software environment for statistical computing]. I was basically using a ton of Excel and a little Stata, but now I’m learning R, which is just massively powerful. And it’s great because you can make beautiful pictures, and with those you can convince people who don’t like math.
YH: Utilizing big data in sports like baseball and basketball has become hugely popular. Why do you think it’s had a smaller impact on other sports like soccer and football?
SS: The basic reasons are the large number of players not touching the ball and the even larger number of players on the field at one time. And on top of that, there is not as big a data culture in Europe. Here, there might not be as much data as people like me would want, but even in the 80’s you had the batting average flashing at the bottom of the screen. We’re moving in the direction of more data, and baseball is just perfect for it because it’s easier to find trends when you have a lot of isolated, one-on-one interactions with a clear winner. Once Dean Oliver convinced people of that fact, we found trends really quickly. I think that the next step for looking at football will be to analyze the one-on-one battles between offense and defense (receiver and cornerback, O-line and D-line, etc.). There’s a lot of progress to be made.
YH: Do you think every factor of sports performance can be modeled adequately using statistics, or are there some intangibles that can’t be modeled?
SS: You try to model everything and try not to act confident. There are tons of studies about how overconfidence in estimations makes for the worst forecasts and how one should never trust overconfident forecasters. You just try to understand the baseline power of your model. I’m not going to try to predict how the Pacers bench plays against the Bucks bench when they have three players on ten-day contracts; it would just never be accurate.
Soccer’s tough, for example. I got some data from OPTA [a sports data resource], which shows completed passes, cuts the field into three sections, and has some more advanced x-y coordinate stuff. But I’m not good enough to unlock it yet. The problem with soccer stats is that you have these various theories: you see big correlations between pass completion and winning, and you see pretty clearly that turnovers in your own third lead to goals at an alarming rate. A short, frequent passing strategy may seem effective because it leads to a lot of goals scored, but you need to look at how it affects getting scored on. It’s attractive to look at goals because there are so few of them, and they’re exciting, but you have to look at real games in convincing ways and think about what it means for your model.
YH: Most of the methods used today in sports analysis were advanced in the past decade. Where do you see the field going in the next five to ten years?
SS: In baseball, we’re looking over the plate; in basketball, they’re looking spatially at the court. I think we’re going to see interesting stuff in how we look at passing, but one big thing that’s developing is how we look at practice. In the NBA especially (I don’t know if it’s the case in other sports, but I assume it’s the same), there’s no real formula for how to get your players to perform on game day. They might be overworked, under-worked, or just not practicing the right things. Do you want to allocate practice time based on how often a situation comes up in games or based on how difficult it is? How hard are players really working during practice? Teams will probably start to see correlations between how hard a guy works in practice and how hard he works in the next game, and alter rotations accordingly.
YH: Do you find yourself thinking about any of this stuff when you’re playing on the squash team?
SS: I try to boil it down: a big thing is unforced errors. In squash, the low-risk thing to do is to get the ball to the back of the court, so I try to think about getting the ball back there and getting out of the way. I guess you could say that those are statistically inclined things to be thinking, but I’m not thinking of it in regression terms while I’m out there. I don’t think anyone ought to. That’s not the goal of analytics, to get people to think like that. What you think while you’re out there is the domain of sports psychologists.
YH: This summer, comedian Aziz Ansari said in an interview that on his new show, called “Work in Progress,” he collected demographic data on the audience to analyze how different jokes appealed to different cohorts. Given that you’re in Just Add Water, how do you feel about using this kind of data-driven approach to humor?
SS: In JAW, there is an interesting data piece: you’re going to find that if one person laughs, the probability of everyone else in the crowd laughing is doubled, or something like that. Demographics would be interesting. And if you look at demographics, you’d probably see that JAW does strikingly similar things at nursery schools and at Yale; when we’re out there we’re doing the same damn thing. But that is, again, moving into psychology.
YH: Do you feel like you watch sports differently now than you used to?
SS: Absolutely. The stuff you’re asking about mindset has all changed for me. Even my friend Chris, a South African dude who’s really bad at basketball, now says when he’s watching a game with me that you can’t be taking long twos. You try to see what’s really going on, and you try to see where guys are without the ball.
YH: Do you think it’s something everyone should be learning and is it available at Yale?
SS: They should be able to if they want. If they don’t want to, they don’t want to. There are a lot of happy people who don’t know anything about statistics. I like statistics. I’d encourage everyone to at least give it a shot. But one thing about Yale is that I want a little help learning R. I’m not great at it, and I don’t know where to get it; it’s not very easy. Office hours are during practice. There’s definitely room for more statistics at Yale.
—This interview was condensed by the author