This week, the U.S. Census Bureau will release the final data product from the 2020 Census. On Sept. 19, we’ll release the Supplemental Demographic and Housing Characteristics File (S-DHC).
While the pandemic delayed our operations, we moved deliberately to ensure we produced the high-quality statistics the public expects and to implement new confidentiality protections.
In this blog post, our goal is to prepare you for the release of the S-DHC.
Combining People and Households
The S-DHC does what its name implies – it supplements the data we released in May 2023 through the Demographic and Housing Characteristics File (DHC). The DHC provided information about people (age, sex, race, ethnicity, relationship to the householder) and households (household type such as family/nonfamily and owner/renter) – in mostly separate tables.
On the other hand, the S-DHC combines data about people and households in the same tables. Specifically, it provides statistics for the average size of families and households, as well as counts of people living in certain types of households.
For example, from the DHC we learned:
- The number of married-couple households that had children living in them.
- The number of households that owned their home with a mortgage or a loan.
In the S-DHC, we’ll learn:
- The number of children who lived in married-couple households.
- The average household size of people living in a home that’s owned with a mortgage or a loan.
The differences may seem subtle, but combining these details about the structure of households and the people living in them complicates protecting the confidentiality of the data, which we’ll talk more about below.
As a result of the need for stronger disclosure avoidance techniques, we are only releasing the S-DHC data at the national and state levels – a decision that enables us to both protect respondent confidentiality and provide quality statistics.
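To make the difference between household-level and person-level counts concrete, here is a minimal sketch on toy data. It is only an illustration, not Census Bureau code: the `Household` class, its field names, the category labels, and the simplification that a "child" is anyone under 18 are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Household:
    # All field names and category labels here are hypothetical.
    household_type: str   # e.g. "married_couple" or "nonfamily"
    tenure: str           # e.g. "owned_with_mortgage" or "renter"
    ages: list            # ages of everyone living in the household

def households_with_children(households, household_type):
    """DHC-style count: households of a given type with at least one person under 18."""
    return sum(
        1
        for h in households
        if h.household_type == household_type and any(a < 18 for a in h.ages)
    )

def children_in_households(households, household_type):
    """S-DHC-style count: people under 18 living in households of a given type."""
    return sum(
        sum(1 for a in h.ages if a < 18)
        for h in households
        if h.household_type == household_type
    )

def average_household_size(households, tenure):
    """S-DHC-style average: mean number of people per household for a tenure category."""
    sizes = [len(h.ages) for h in households if h.tenure == tenure]
    return sum(sizes) / len(sizes) if sizes else None

toy = [
    Household("married_couple", "owned_with_mortgage", [41, 39, 12, 9, 4]),
    Household("married_couple", "renter", [30, 29]),
    Household("married_couple", "owned_with_mortgage", [52, 50, 16]),
    Household("nonfamily", "renter", [24]),
]

print(households_with_children(toy, "married_couple"))    # 2 households
print(children_in_households(toy, "married_couple"))      # 4 children
print(average_household_size(toy, "owned_with_mortgage")) # 4.0 people
```

The first function only needs to know whether a household contains a child. The other two depend on every person's record in the household, which is the linkage that makes the combined statistics harder to protect.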
Topics and Race Groups
The S-DHC will provide eight tables, and six of them will be repeated for race and Hispanic origin groups.
The tables available are:
- Average household size by age.*
- Household type (e.g., family or nonfamily households) for the population in households.
- Population under the age of 18 by relationship and household type.*
- Population in families by age.*
- Average family size by age.*
- Family type and age for own children under the age of 18.
- Total population in occupied housing units by tenure.*
- Average household size of occupied housing units by tenure.*
* The tables marked with an asterisk are also available for the following race and Hispanic origin groups:
- White alone.
- Black or African American alone.
- American Indian and Alaska Native alone.
- Asian alone.
- Native Hawaiian and Other Pacific Islander alone.
- Some Other Race alone.
- Two or More Races.
- Hispanic or Latino.
- White alone, not Hispanic or Latino.
Note that the S-DHC data are not available for detailed race and ethnicity groups, such as Chinese or Mexican, unlike the recent Detailed DHC-B and Detailed DHC-A products.
Guidance on Using the Data
As we mentioned above, protecting the combined person and household data is complicated and requires robust disclosure avoidance methods. Combining the data increases the risk of disclosing information about individuals because information for each person in the household (especially the householder) is linked to the information for everyone else in the household. This interrelationship makes it much harder to obscure the effect that one person’s record has on the others, which in turn makes it harder to guarantee that they are protected.
As with other 2020 Census data products, we protected the data by adding “statistical noise” (small, random additions to or subtractions from the data). With this data product, we also took two additional steps:
- Truncation. For the S-DHC, we needed to protect outliers in the data, specifically very large households. For these households, we removed individuals at random until the household size was reduced to a set threshold. We call this “truncation.” We’ll describe it in much more detail in an upcoming brief about disclosure avoidance in the S-DHC.
- Statistical post-processing and credible intervals. In post-processing, we also corrected for illogical situations (like negative numbers) that can arise from adding statistical noise. This post-processing also allows us to create and publish “credible intervals” alongside the estimated counts. Each interval represents the range in which we expect the true (but truncated) count to fall 90% of the time. Note that unlike the margins of error published alongside American Community Survey estimates, the intervals are not a single uniform value applied in both directions, such as ±3. That’s why we publish both the low and high end of each interval. Figure 1 shows an example of the credible intervals (with the not-yet-released data blurred out).
Figure 1. Example of Credible Intervals
Providing the intervals was an innovative step for us, as it marks the first time that we have published decennial census statistics with associated estimates of disclosure avoidance-related error. While they don’t reflect all sources of error, such as coverage error and truncation error, we hope they will help you gauge the impact of confidentiality protections on the quality of the S-DHC data.
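The truncation step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Census Bureau's implementation: the function name `truncate_household` and the threshold of 7 are hypothetical, and the actual threshold is not given in this post.

```python
import random

def truncate_household(members, max_size, rng):
    """Remove members at random until the household has at most max_size people.

    Keeping a uniformly random subset of max_size members is equivalent to
    removing members one at a time at random until the threshold is met.
    """
    if len(members) <= max_size:
        return list(members)
    # Choose which positions to keep, then preserve the original order.
    keep = sorted(rng.sample(range(len(members)), max_size))
    return [members[i] for i in keep]

rng = random.Random(2020)   # fixed seed so the sketch is repeatable
MAX_SIZE = 7                # hypothetical threshold, chosen only for illustration

small = ["p1", "p2", "p3"]
large = [f"p{i}" for i in range(1, 13)]   # a 12-person household

truncated_small = truncate_household(small, MAX_SIZE, rng)
truncated_large = truncate_household(large, MAX_SIZE, rng)

print(len(small), "->", len(truncated_small))   # 3 -> 3 (unchanged)
print(len(large), "->", len(truncated_large))   # 12 -> 7 (five people removed)
```

Because people are removed before any noise is added, counts computed from the truncated households can be lower than the counts from the original responses. That is why the credible intervals describe the true but truncated count and do not capture truncation error.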
Finally, we’ll note that because noise is added to each statistic independently, there will be some inconsistencies in the S-DHC, just like there were with other 2020 Census data products. For example:
- While “total population in households” and “total population in occupied housing units” represent the same thing, they often will not match. Use the “total population in occupied housing units” in table PH7 when possible because it has less noise than the total from table PH2.
- Average household size usually matches across tables but occasionally may not. Use data in the table most closely related to your subject of interest.
- Data aggregated across states may not match the national total.
- The race and ethnicity iterations, when aggregated, do not sum to the total household population count.
- Totals in the S-DHC do not match other 2020 Census data products.
With these situations in mind, we encourage you to use caution when aggregating published counts to produce statistics for custom groups or geographies. Each published count carries its own noise, so adding counts together accumulates that noise.
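A small simulation can show why summing published counts accumulates noise. This is a sketch under assumptions: it uses independent Gaussian noise with a made-up standard deviation (`SIGMA = 10`) and made-up true counts, none of which come from the S-DHC. It only illustrates the general behavior of independent noise.

```python
import random
import statistics

def noisy(true_count, sigma, rng):
    """Return a count with independent zero-mean Gaussian noise added (an assumption)."""
    return true_count + rng.gauss(0.0, sigma)

def error_of_sum(true_counts, sigma, rng, trials=2000):
    """Empirical standard deviation of the error in a sum of independently noised counts."""
    true_total = sum(true_counts)
    errors = []
    for _ in range(trials):
        noisy_total = sum(noisy(c, sigma, rng) for c in true_counts)
        errors.append(noisy_total - true_total)
    return statistics.pstdev(errors)

rng = random.Random(2020)   # fixed seed so the sketch is repeatable
SIGMA = 10.0                # hypothetical noise scale, for illustration only

one_count = error_of_sum([1000], SIGMA, rng)           # a single published count
many_counts = error_of_sum([1000] * 51, SIGMA, rng)    # 51 counts added together

print(f"typical error, one count:      {one_count:.1f}")
print(f"typical error, sum of 51:      {many_counts:.1f}")
print(f"theory for 51 (sigma*sqrt(51)): {SIGMA * 51 ** 0.5:.1f}")
```

Under these assumptions, the typical error of a sum of k independently noised counts grows roughly in proportion to the square root of k. A total built by adding many published values is therefore noisier than a single published total for the same group or geography, which is one reason to prefer a published total when one exists.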
Conclusion
We hope you will find the S-DHC informative about the people living in certain types of households in your state and in the country. While we have released similar, more recent data from the American Community Survey, the S-DHC represents the strength of many, many more responses since it comes from the 2020 Census. We’ve done our best to protect those responses and provide you with timely, relevant data.
While we’re excited the S-DHC wraps up the 2020 Census data products, we’re already looking ahead to the 2030 Census data products. In the coming months, we plan to share more information about our 2030 Census research on disclosure avoidance, data product planning, and public engagement opportunities.
Thank you for your input along the way as we developed the 2020 Census data products!