The AT&T: It comes up short on data for PGA Tour statisticians

Pebble Beach is a great place to visit, but the tournament’s three-course format hampers data analysis for PGA Tour statisticians. 

For serious golf statisticians, the PGA Tour’s visit this week to the AT&T Pebble Beach Pro-Am concludes the Tour’s annual sojourn back into the dark ages of data analysis.

The AT&T will mark the ninth tournament of the past dozen on Tour in which full statistical data will not be available. Specifically, as is most commonly the case, Strokes Gained information will not be calculated for all four rounds of the event.

Analysts get used to the absence of full Strokes Gained data this time of year, since that data is not calculated for most tournaments played outside the United States and is limited at events played over multiple courses.

Beginning with the CJ Cup played last October in Korea, one or both of those conditions has been routine on the tournament schedule. In fact, the only events producing a full statistical profile since the CJ Cup have been the Sentry Tournament of Champions, the Sony Open and last weekend’s Waste Management Phoenix Open.

The AT&T is played over three courses, and Strokes Gained data will be unavailable at two of them, Monterey Peninsula and Spyglass Hill. Counting the American Express and the Farmers, that will make three of the last four U.S.-based tournaments for which a full, modern statistical profile of the results cannot be calculated.

Fortunately, that is about to change. The Tour will not play another event on split courses all season. And while it will occasionally leave the United States (for the WGC Mexico, the Open Championship and the Canadian Open), those events all will be fitted with the equipment necessary to provide full Strokes Gained data.

Why does anybody care about Strokes Gained data as opposed to the basic tour data? After all, that portfolio — most of which has been collected since 1980 — usually includes such elementals as driving distance, fairway accuracy, greens in regulation, scrambling percentage and putting.

In combination, that ought to provide enough data to make any analyst happy.

The problem lies in the relationship of the data to the game’s bottom line. Data is not pertinent in the abstract; its true value lies in the extent to which it can be said to influence scoring.

A fairly simple mathematical test measures the correlation between any two sets of numbers, for example driving distance and scoring average. It produces a correlation coefficient, a standard tool of regression analysis.

The bottom line is that the closer the result is to 1.0, the stronger the relationship. The closer it is to 0.0, the weaker the relationship. In golf, where lower scores are better, the result often falls between 0.0 and -1.0, but the idea is the same.
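For readers who want to see the arithmetic, here is a minimal sketch of that test, assuming the standard Pearson correlation coefficient is the measure in question. The function name `pearson_r` and the six pairs of numbers are invented for illustration; they are not Tour data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two lists around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each list around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list never varies")
    return cov / sqrt(var_x * var_y)


# Hypothetical players: average driving distance (yards) and scoring average.
# These values are made up purely to show the calculation.
driving_distance = [285.0, 292.5, 301.0, 296.0, 310.5, 288.0]
scoring_average = [71.2, 70.8, 70.1, 70.9, 69.9, 71.0]

r = pearson_r(driving_distance, scoring_average)
print(f"correlation: {r:.2f}")
```

Because longer drives go with lower scores in this made-up sample, the result comes out negative, which is the golf-specific wrinkle described above.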

For the 2018-19 PGA Tour season, the following table illustrates the strength of the relationship between some standard statistical measurements and scoring. I’ve also included the relationship between the complementary Strokes Gained data point and scoring. Remember, in every case, we can ignore the minus signs, but we want to get as far from 0.0 and as close to 1.0 (or -1.0) as we can.

Skill                    Correlation with score
Driving distance         -0.27
Driving accuracy         -0.18
SG Driving               -0.51
Greens in Regulation     -0.42
SG Approach Green        -0.62
Scrambling               -0.58
SG Around Green          -0.41
Putts/round               0.39
SG Putting                0.40

You will notice that with the exception of recovery shots, where scrambling (-0.58) outperformed SG Around Green (-0.41), Strokes Gained correlated more closely with a player’s actual score, and often it wasn’t a close call. In terms of measuring the significance of approach shots, Strokes Gained data correlated 20 percentage points more strongly with the score than did basic GIR (-0.62 versus -0.42).
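The comparison can be checked with a few lines of arithmetic. The sketch below uses the correlations from the table and ignores the signs, as suggested earlier. How each traditional statistic is matched to a Strokes Gained counterpart is an assumption made here for illustration (only the GIR and recovery-shot pairings are spelled out in the text), and `strength_gap` is a hypothetical helper name.

```python
# (traditional stat, its correlation, Strokes Gained stat, its correlation)
# Values are copied from the 2018-19 table; the pairings are assumed.
comparisons = [
    ("Driving distance", -0.27, "SG Driving", -0.51),
    ("Driving accuracy", -0.18, "SG Driving", -0.51),
    ("Greens in Regulation", -0.42, "SG Approach Green", -0.62),
    ("Scrambling", -0.58, "SG Around Green", -0.41),
    ("Putts/round", 0.39, "SG Putting", 0.40),
]


def strength_gap(basic_r, sg_r):
    """How much stronger the Strokes Gained correlation is, ignoring sign."""
    return round(abs(sg_r) - abs(basic_r), 2)


gaps = {basic: strength_gap(basic_r, sg_r)
        for basic, basic_r, _sg, sg_r in comparisons}

for basic, _basic_r, sg, _sg_r in comparisons:
    # A positive gap means the Strokes Gained stat tracks score more closely.
    print(f"{sg} vs. {basic}: {gaps[basic]:+.2f}")
```

A positive number favors Strokes Gained; the lone negative number is the recovery-shot exception.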

No, this was not a one-time thing. Since Strokes Gained data was first collected during the 2004 PGA Tour season, the average correlation between a player’s Strokes Gained data and his score has been 18 percentage points stronger than the average correlation between the more standard measurements and score.

The Strokes Gained numbers simply paint a better, more accurate picture than legacy data of why a victorious player won, or of why your hometown favorite came up short.

That is not to say the game’s analysts hate the AT&T. Pebble’s always a nice place to visit.

But for those analysts, it will also be nice to get down to the Genesis next week. It’s played on just one course, Riviera, making the full toy box of analytical data once again available.
