Methodology
Hanover gathered precipitation data for 2021-2023 for 3,137 counties across the U.S., available through the NOAA National Climatic Data Center. Total annual precipitation was determined for each county and year.
Hanover gathered data from the U.S. Census Bureau on the total number of companies by county and industry, as defined by the North American Industry Classification System (NAICS). These counts were then restricted to industries of interest selected by CGA, which are outlined in Table 1. Data for 2021-2023 was available for 3,030 counties across the U.S.
The U.S. Department of Agriculture (USDA) Economic Research Service’s rural-urban continuum code was used to code degree of urbanicity. This nine-point scale classifies a county from 1 (most urban) to 9 (most rural), using population size and adjacency to metro areas. Data for 2021-2023 was available for 3,234 counties and county equivalents. Table 2 provides a description of the nine points of the scale.
Each of the predictor variables was significant at p<0.0001. The overall model's adjusted R-squared (R2) was 0.54, indicating that the independent variables included explain 54% of the variation in the dependent variable. This is considered a moderately strong R2 in social science fields, where values of 0.5 and above are typically considered acceptable given the complexity of the phenomena being studied.
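For reference, adjusted R-squared discounts the ordinary R-squared for the number of predictors in the model. The Python sketch below shows the standard adjustment. The function name and the example inputs (an unadjusted R-squared of 0.50 and 101 observations) are illustrative assumptions and are not values from this analysis; only the count of three predictors reflects the study design.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs for illustration only (not figures from the report).
example = adjusted_r_squared(r_squared=0.50, n_obs=101, n_predictors=3)
```

With thousands of counties and only three predictors, the adjustment is small, so the adjusted and unadjusted values would be expected to be close.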
Each of the three predictor variables was divided into three subgroups resulting in 27 (3 x 3 x 3) groups. CGA provided DIRT data for calendar years 2021–2023, both the full datasets and the three-year consistent companies as used in the “trending” sections of the DIRT Report and Interactive Dashboard. DIRT data was aggregated to county level (FIPS codes) and sorted into the 27 variable groups for each data-year.
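The grouping step can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the text does not say how each variable was split into three subgroups, so the sketch assumes equal-sized terciles, and the labels, function names, FIPS codes, and county values are all made up.

```python
from itertools import product

LEVELS = ("low", "mid", "high")

# Every combination of three levels across three variables: 3 x 3 x 3 = 27.
ALL_GROUPS = list(product(LEVELS, repeat=3))


def tercile_cutoffs(values):
    """Return two cut points that split the values into three roughly equal parts."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def level(value, cutoffs):
    """Map one value to 'low', 'mid', or 'high' using the two cut points."""
    lower, upper = cutoffs
    if value < lower:
        return "low"
    if value < upper:
        return "mid"
    return "high"


def assign_groups(counties):
    """counties maps FIPS code -> (precipitation, company count, urbanicity code)."""
    columns = list(zip(*counties.values()))
    cutoffs = [tercile_cutoffs(column) for column in columns]
    return {
        fips: tuple(level(v, c) for v, c in zip(row, cutoffs))
        for fips, row in counties.items()
    }


# Made-up counties: (annual precipitation in inches, companies, urbanicity 1-9).
sample = {
    "00001": (20.0, 150, 1),
    "00002": (35.0, 900, 4),
    "00003": (55.0, 4000, 8),
}
groups = assign_groups(sample)
```

Each county ends up with a three-part label such as ("low", "mid", "high"), which identifies one of the 27 groups used to sort the county-level DIRT data.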
Index Variables (2023)
All model interpretations involve comparison to the average U.S. county. For example, a ten-inch increase in the amount of annual precipitation in a county, compared to a county with the average amount of precipitation, results in a 0.1 percentage point increase in the amount of underground damage reports. A one percent increase in the number of companies in selected industries, compared to a U.S. county with the average number of companies in selected industries, results in a 0.55 percentage point increase in damage reports. Company counts are based on U.S. Census Bureau data, and both headquarters and other physical locations (e.g., satellite offices) are included. As a county's degree of urbanicity decreases (compared to a county with average urbanicity), it is 10.3% less likely to have an underground damage report.
After applying the model to different combinations of datasets (i.e., full vs. three-year consistent reporting companies) and testing how variations in DIRT reporting levels would affect the model, the three-year consistent reporting company DIRT dataset was chosen as the basis of the model, together with the number of weighted damages at the 80th percentile for each group.
The assumption is that all counties within the same group would behave similarly in terms of number of damages. With DIRT reporting, we know that damages are not overreported,[2] but rather that there are some counties where damages are underreported due to low or non-existent DIRT participation. Theoretically, each county could have had as many damages as the county with the highest (maximum) count in its respective group. However, to remove outliers at the high end, ensure stability in the model year over year, and provide a more conservative estimate, the model assumes that each county within the group could have as many damages as the 80th percentile.
[2] Other than for multiple reports of the same event, which the model accounts for by using “weighted” damages.
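The 80th-percentile benchmark described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the model itself: the text does not specify which percentile definition was used, so the sketch uses linear interpolation (the "inclusive" method in Python's statistics module), and the function name, group label, county identifiers, and damage counts are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def group_p80(weighted_damages, group_of):
    """Return the 80th percentile of weighted damages for each group.

    weighted_damages maps county -> weighted damage count.
    group_of maps county -> group label.
    """
    by_group = defaultdict(list)
    for county, damages in weighted_damages.items():
        by_group[group_of[county]].append(damages)

    benchmarks = {}
    for group, values in by_group.items():
        if len(values) < 2:
            # A single county gives no distribution; use its own value.
            benchmarks[group] = float(values[0])
        else:
            # n=5 yields the 20th, 40th, 60th, and 80th percentile cut points.
            benchmarks[group] = quantiles(values, n=5, method="inclusive")[3]
    return benchmarks


# Hypothetical weighted damage counts for five counties in one group.
damages = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 40}
membership = {county: "G1" for county in damages}
benchmarks = group_p80(damages, membership)
```

In this made-up group the maximum is 40 while the 80th-percentile benchmark is 32, which shows how the percentile choice trims the high end relative to using the group maximum.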