DIRT Report

Background

CGA engaged Hanover Research (Hanover) to develop a methodology to track and trend the annual level of U.S. damages over time in order to measure progress toward its goal of decreasing damages by 50% over 5 years.

Hanover recommended modeling at the county rather than state level because it would allow selected variables to be more specific and increase the sample size of the regression model (since there are more counties than states).

Hanover used classification and regression trees in an iterative process to identify independent variables according to how much variation in the number of damages each one explains. The first explanatory variable is selected based on how much variation in the number of damages it can explain. Other variables are then selected based on how much additional variation they can explain, given the variables already in the model. The model assumes a linear relationship between weighted damages and each independent variable.

More than 25 variables with a direct or indirect relation to underground damages were analyzed. With weighted damages[1] as the dependent variable, correlations were checked to ensure that the independent variables were not highly correlated with each other. Only variables that explained some variation in the number of damages were retained in the final model. This process identified three independent variables that, in combination, correlate with the number of damages at the county level:

  • Urbanicity
  • Precipitation
  • Number of companies in utility and construction related industries

 

[1] "Weighted" means adjusted for multiple reports of the same event, as used in annual DIRT reports and dashboards.

Methodology

Hanover gathered precipitation data for 2021-2023 for 3,137 counties across the U.S., available through the NOAA National Climatic Data Center. Total annual precipitation was determined for each county and year.

Hanover gathered data from the U.S. Census Bureau on the total number of companies by county and industry, as defined by the North American Industry Classification System (NAICS). Those were then subset to industries of interest selected by CGA. These industries are outlined in Table 1. Data for 2021-2023 was available for 3,030 counties across the U.S.

The U.S. Department of Agriculture (USDA) Economic Research Service’s rural-urban continuum code was used to code degree of urbanicity. This nine-point scale classifies a county from 1 (most urban) to 9 (most rural), using population size and adjacency to metro areas. Data for 2021-2023 was available for 3,234 counties and county equivalents. Table 2 describes the nine points of the scale.

Each of the predictor variables was significant at p < 0.0001. The overall model adjusted R-squared (R²) was 0.54, indicating that the independent variables explain 54% of the variation in weighted damages. This is considered a moderately strong R² in social science fields, where values of 0.5 and above are typically considered acceptable given the complexity of the phenomena being studied.

Each of the three predictor variables was divided into three subgroups resulting in 27 (3 x 3 x 3) groups. CGA provided DIRT data for calendar years 2021–2023, both the full datasets and the three-year consistent companies as used in the “trending” sections of the DIRT Report and Interactive Dashboard. DIRT data was aggregated to county level (FIPS codes) and sorted into the 27 variable groups for each data-year.
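The grouping step can be sketched as follows. This is a minimal illustration, not Hanover's implementation: the source does not state how each variable was split into three subgroups, so the cut points, the "low"/"mid"/"high" labels, and the helper names `level` and `assign_group` are all hypothetical.

```python
from itertools import product

LEVELS = ("low", "mid", "high")  # hypothetical subgroup labels


def level(value, cut_low, cut_high):
    """Place a value in one of three subgroups given two cut points."""
    if value <= cut_low:
        return "low"
    if value <= cut_high:
        return "mid"
    return "high"


# Hypothetical cut points; the source does not publish the real ones.
CUTS = {
    "urbanicity": (3, 6),           # rural-urban continuum code, 1 to 9
    "precipitation": (25.0, 45.0),  # total annual precipitation, inches
    "companies": (50, 500),         # companies in the selected industries
}


def assign_group(county):
    """Return the (urbanicity, precipitation, companies) subgroup triple."""
    return tuple(level(county[name], *CUTS[name]) for name in CUTS)


# Three variables with three subgroups each give 3 x 3 x 3 = 27 groups.
all_groups = list(product(LEVELS, repeat=len(CUTS)))
print(len(all_groups))  # 27

# A made-up county record, used only to show the lookup.
example_county = {"urbanicity": 2, "precipitation": 38.5, "companies": 1200}
print(assign_group(example_county))  # ('low', 'mid', 'high')
```

Here "low" for urbanicity refers to a low continuum code (more urban), not to a low degree of urbanicity. Because precipitation and company counts change from year to year, a county can land in a different group in different data-years.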

Index Variables (2023)

All model interpretations involve comparison to the average U.S. county. For example, a ten-inch increase in annual precipitation in a county, compared to a county with the average amount of precipitation, results in a 0.1 percentage point increase in underground damage reports. A one percent increase in the number of companies in selected industries, compared to a U.S. county with the average number of companies in selected industries, results in a 0.55 percentage point increase in damage reports. Company counts are based on U.S. Census results, and both headquarters and other physical locations (e.g., satellite offices) are included. As a county's degree of urbanicity decreases (compared to a county with average urbanicity), it is 10.3% less likely to have an underground damage report.

After applying the model to different combinations of datasets (i.e., full vs. three-year consistent reporting companies) and testing how variations in DIRT reporting levels would affect the model, it was decided to base the model on the three-year consistent reporting company DIRT dataset and the number of weighted damages at the 80th percentile for each group.

The assumption is that all counties within the same group would behave similarly in terms of number of damages. With DIRT reporting, we know that damages are not overreported,[2] but rather that there are some counties where damages are underreported due to low or non-existent DIRT participation. Theoretically, each county could have had as many damages as the county with the highest (maximum) count in its group. However, to remove outliers at the high end, ensure year-over-year stability, and provide a more conservative estimate, the model assumes that each county within the group could have as many damages as the 80th percentile.
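To make the 80th percentile assumption concrete, the sketch below computes it for a single group and contrasts it with the maximum. The county counts are invented for illustration, and the percentile method (linear interpolation between closest ranks) is an assumption, since the source does not say which definition Hanover used.

```python
import math


def percentile(values, pct):
    """Percentile using linear interpolation between the closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty group is undefined")
    rank = (len(ordered) - 1) * pct / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Hypothetical weighted damage counts for the counties in one group.
# Low values stand in for counties with little or no DIRT participation,
# and the two largest values stand in for high-end outliers.
group_damages = [0, 2, 5, 9, 14, 20, 31, 47, 60, 250, 400]

p80 = percentile(group_damages, 80)
group_estimate = p80 * len(group_damages)  # every county assumed at the 80th percentile
max_estimate = max(group_damages) * len(group_damages)

print(p80)             # 60
print(group_estimate)  # 660
print(max_estimate)    # 4400
```

With these made-up numbers, the two outliers barely affect the 80th percentile, whereas an estimate based on the maximum would be driven entirely by the single largest county.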

 

[2] Other than for multiple reports of the same event, which the model accounts for by using “weighted” damages.

Results

In consultation with CGA, to make the Index interpretable and intuitive it was decided to use 2022 as the “baseline” or “Year Zero” for purposes of “50-in-5,” and to scale it such that the 2022 Index value is set at 100.

Using Group 14 as an example, the steps to calculate the Index are as follows:

  1. Multiply the number of calendar year (CY) 2023 counties by the 80th percentile value for that group: 202 * 28.00 = 5,656.
  2. Do the same for CY 2022: 188 * 31.00 = 5,828.
  3. Divide the 2023 value by the 2022 value and multiply by 100. The 2023 Index value for Group 14 is (5,656 / 5,828) * 100 = 97.
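The three steps above can be reproduced with a few lines of Python. The function name `group_index` is hypothetical, and the inputs are the Group 14 figures quoted in the steps.

```python
def group_index(counties_current, p80_current, counties_baseline, p80_baseline):
    """Index for one group, scaled so that the baseline year equals 100."""
    current = counties_current * p80_current
    baseline = counties_baseline * p80_baseline
    return current / baseline * 100


# Group 14 figures from the worked example.
value_2023 = 202 * 28.00  # step 1: 5,656
value_2022 = 188 * 31.00  # step 2: 5,828
index_2023 = group_index(202, 28.00, 188, 31.00)  # step 3

print(value_2023, value_2022)  # 5656.0 5828.0
print(round(index_2023))       # 97
```

Passing the 2022 figures as both the current and baseline arguments returns 100, which is why every group's 2022 value is 100 by construction.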

Note that the number of counties within a group may change from year to year due to movement within the variables, such as more or less precipitation. Also note that for all groups the 2022 value is always 100, but the 2023 value can be above or below 100 (for example, Group 15 = 113). Having an Index value for each group will enable new analysis opportunities and a focus on the areas with the greatest potential for improvement.

The Index value for the entire U.S. is the sum of the 2023 80th percentiles divided by the same value for 2022, multiplied by 100. This comes to 94.0 for 2023, year one of 50-in-5.
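A rough sketch of the national calculation follows. It assumes that each year's total is the sum, across groups, of county count times 80th percentile value, which is one reading of the sentence above and is not confirmed by the source. Only the Group 14 row uses published figures. The second row is an invented placeholder, so the result does not reproduce the reported 94.0.

```python
# (county count, 80th percentile weighted damages) per group and year.
groups = {
    "group_14": {2022: (188, 31.00), 2023: (202, 28.00)},    # from the worked example
    "placeholder": {2022: (100, 10.00), 2023: (90, 12.00)},  # invented, illustration only
}


def year_total(groups, year):
    """Sum of county count times 80th percentile value across all groups."""
    return sum(count * p80 for count, p80 in (g[year] for g in groups.values()))


def overall_index(groups, year, baseline=2022):
    """Index across all groups, scaled so that the baseline year equals 100."""
    return year_total(groups, year) / year_total(groups, baseline) * 100


index_2023 = overall_index(groups, 2023)
print(year_total(groups, 2022), year_total(groups, 2023))  # 6828.0 6736.0
print(round(index_2023, 1))
```

Summing before dividing weights each group by its size, so the national value is not simply the average of the 27 group Index values.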

To account for variations in reporting levels going forward, while also ensuring consistency of the model, each year the most recent set of three-year consistent reporting companies will be identified. By using the 80th percentile value for each year-county group, the model is designed to identify overall trends (up or down) but also absorb fluctuations in the number of reporting companies.

Hanover recommends that, in addition to this primary three-year dataset, an additional subset of two-year consistent reporters be reviewed and potentially reported on in a directional manner. This group represents the reporters one year away from meeting the primary consistent reporter definition and as such will include more reporters but have a slightly lower degree of confidence. This review will include a summary of the geographical and stakeholder assignment for the two-year consistent reporters, as well as an alternate estimate of the Index. This step will allow for additional exploration of the reporting sample, to ensure that the Index is calculated on a representative group of reporters.
