In Fall 2007 the Horace Hagedorn Foundation funded the Center for Urban Research to help the Foundation improve its grantmaking on immigration issues by providing Census-related information that clarifies how Hispanic demographics are changing throughout Long Island. At a time when national attention has focused on the strain between established residents and newly arrived immigrants in this suburban region, the Foundation supports creative community and public-private initiatives that work to diminish these tensions. Our maps and data analysis will provide valuable tools in support of this important work.
This case study describes how we used Census data sets, geographic information system (GIS) software, and statistical analysis for one component of our work for the Foundation -- estimating the Hispanic population on Long Island at the level of state legislative districts (Senate and Assembly) and county legislative districts. We estimate total Hispanic population, foreign- and native-born Hispanic population, and the population of Hispanics eligible to vote (citizens 18 and over) by legislative district.
The United States Census Bureau conducts a census of population every ten years. More than seven years after the 2000 Census, the Foundation needed an up-to-date understanding of how the size and spatial patterns of the Hispanic population on Long Island had changed.
In particular, the Foundation and its grantees sought to visualize where the current Hispanic population was concentrated in relation to state and local policymakers whose leadership could facilitate solutions to immigrant issues. The Center met this goal by analyzing data from the latest Census surveys and comparing current data with the 2000 decennial Census. We used GIS to allocate those population estimates to legislative districts, then created maps to visualize these patterns and inform the Foundation's grantmaking.
Allocating Census data to non-Census geographies
In 2000, one out of every six households was asked a range of sociodemographic questions on topics such as income, commuting, housing characteristics, family structure, etc., as part of the Census Long Form. In order to provide current data during the intercensal period, the Bureau is replacing the Long Form with the annual American Community Survey (ACS).
For the 2005 and 2006 ACS, the smallest geography reported is the Public Use Microdata Area (PUMA), a census-defined statistical area with at least 100,000 persons. PUMAs do not correspond to legislative districts or to any other geography (such as villages, towns, or counties). This can be clearly seen in the figure below. PUMA geography is constrained only in that PUMAs cannot cross state lines and must be built out of Census Tracts (a unit of Census geography that includes approximately 4,000 persons).
The challenge is therefore to take population data reported at the level of the PUMA, such as the number of Hispanic immigrants, and allocate the population to legislative districts. While GIS software makes it easy to perform an areal allocation—if 40% of a PUMA’s area falls within District X, 40% of the Hispanic population is allocated to District X—it would be better to do the allocation in a way that takes into account the likely distribution (i.e., physical location) of Hispanics within the PUMA. While this data is not reported as part of the 2005 and 2006 ACS, the 2000 Census reports characteristics of the Hispanic population (such as country of origin and immigrant vs. native birth) at the Tract level. Legislative districts may cross Tract boundaries, but do not cross the boundaries of Census Blocks (a smaller unit of geography than Tracts); therefore, our approach is to allocate the PUMA population to Blocks, then aggregate the Blocks to the legislative district.
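The difference between the two allocation strategies can be illustrated with a minimal Python sketch. The district names, area shares, and population counts below are hypothetical, chosen only to show how far an areal split can drift from a split that follows the known 2000 distribution; the function names are ours and are not part of any GIS package.

```python
def areal_allocation(puma_population, area_shares):
    """Split a PUMA total across districts in proportion to land area."""
    return {d: puma_population * share for d, share in area_shares.items()}


def pattern_allocation(puma_population, counts_2000):
    """Split a PUMA total in proportion to where the group lived in 2000."""
    total_2000 = sum(counts_2000.values())
    return {d: puma_population * n / total_2000 for d, n in counts_2000.items()}


# Hypothetical PUMA with 10,000 Hispanic residents overlapping two districts.
area_shares = {"District X": 0.40, "District Y": 0.60}    # share of PUMA area
hispanic_2000 = {"District X": 6000, "District Y": 2000}  # 2000 Census pattern

by_area = areal_allocation(10_000, area_shares)
by_pattern = pattern_allocation(10_000, hispanic_2000)

for district in area_shares:
    print(district, round(by_area[district]), round(by_pattern[district]))
```

In this made-up case the areal split assigns District X 40% of the population, while the 2000 pattern suggests it holds 75%.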
Our analysis proceeds as follows:
- Allocate PUMA-level population counts to Tracts – this “spreads” current (2005/06) population estimates across each PUMA based on 2000 tract-level patterns;
- Allocate Tract-level population estimates to Blocks – this enables us to transfer the tract estimates to legislative districts using coterminous block-level geography; and
- Aggregate Blocks to New York State Assembly District, Senate District, and county legislative district geographies.
Our analysis was informed in part by analyses done by Rob Paral in which he allocated large-area population estimates to smaller geographies. For example, in Undocumented Immigration by Congressional District, Paral allocates an estimate of the 2006 population of undocumented immigrants to congressional districts by assuming that undocumented immigrants continue to follow immigration patterns from 1990-2000. Paral has worked with the Illinois Coalition for Immigrant and Refugee Rights (ICIRR), an organization with which the Hagedorn Foundation works closely.
Step 1: Obtain PUMA-level estimates of the populations of interest
We want to estimate the 2006 population of foreign-born and native-born Hispanics (mutually exclusive categories) as well as Hispanic eligible voters (a category which includes both native-born and naturalized citizens, and therefore partially overlaps both foreign- and native-born Hispanics). We obtain estimates of these three categories from ACS Table B06004I – “Place of Birth by Race (Hispanic or Latino)” and Table B05003I – “Sex by Age by Citizenship Status (Hispanic or Latino)”. We use ACS estimates for both 2005 and 2006; see Step 2 for details. Tables were downloaded from American FactFinder.
Some PUMAs only report total Hispanic population. In this case, we estimate immigrant, native, and vote-eligible Hispanic population of the PUMA by assuming that the proportion of each is the same in 2006 (or 2005) as it was in 2000.
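A minimal sketch of this fallback, in Python, is shown here. The function name and all counts are hypothetical; the only assumption taken from the text is that each subgroup keeps the same proportion of the PUMA's total Hispanic population that it had in 2000.

```python
def fill_subgroups_from_2000(total_current, total_2000, subgroups_2000):
    """Estimate current subgroup counts for a PUMA that reports only a total.

    Each subgroup is assumed to keep its 2000 proportion of the PUMA's
    total Hispanic population.
    """
    return {
        name: total_current * count / total_2000
        for name, count in subgroups_2000.items()
    }


# Hypothetical PUMA: 8,000 Hispanic residents in 2000, 10,000 in the ACS.
subgroups_2000 = {
    "foreign_born": 3200,     # 40% of the 2000 total
    "native_born": 4800,      # 60% of the 2000 total
    "eligible_voters": 3600,  # 45% of the 2000 total (overlaps both groups)
}

estimates = fill_subgroups_from_2000(10_000, 8_000, subgroups_2000)
print(estimates)
```

Because eligible voters overlap the two nativity groups, the three estimates are not expected to sum to the PUMA total; only the foreign-born and native-born estimates do.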
In order to apply the constant-share methodology described in Step 3, we also need PUMA-level estimates of Hispanic subpopulations in 2000. We find tract-level estimates in the 2000 Census Summary File 3 (SF3). For the foreign- and native-born Hispanic population, we obtain tract-level estimates from SF3 Table PCT63H – “Place of Birth by Citizenship Status (Hispanic or Latino)”, and aggregate the tract-level estimates over each PUMA on Long Island. SF3 does not have a table comparable to ACS Table B05003I, which reports both age and citizenship status, so there is no way to extract a count of eligible voters from SF3. To estimate the population of eligible voters, we assume that citizenship of Hispanics is uncorrelated with age. The fraction of Hispanic eligible voters in each tract is then easily calculated as the fraction of Hispanic citizens multiplied by the fraction of voting-age Hispanics. For example, if 90% of the Hispanic population of a tract are citizens, and 70% are 18 years old and over, then we assume that 63% (0.9 × 0.7 = 0.63) of Hispanics in that tract are eligible voters (i.e., both citizens and at least 18 years of age). We then aggregate these tract-level estimates over each PUMA on Long Island.
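The eligible-voter calculation can be sketched in Python as follows. The tract records, field names, and PUMA code are hypothetical; the first tract reproduces the 90% and 70% example from the text, and the independence of citizenship and age is the assumption stated above.

```python
def estimate_eligible_voters(hispanic_total, hispanic_citizens, hispanic_adults):
    """Estimate citizens aged 18+ assuming citizenship is independent of age."""
    if hispanic_total == 0:
        return 0.0
    citizen_share = hispanic_citizens / hispanic_total
    adult_share = hispanic_adults / hispanic_total
    return hispanic_total * citizen_share * adult_share


# Hypothetical 2000 tract records within one PUMA.
tracts = [
    {"tract": "A", "puma": "P1", "total": 1000, "citizens": 900, "adults": 700},
    {"tract": "B", "puma": "P1", "total": 500, "citizens": 250, "adults": 400},
]

# Sum the tract-level estimates over each PUMA.
puma_eligible = {}
for t in tracts:
    estimate = estimate_eligible_voters(t["total"], t["citizens"], t["adults"])
    puma_eligible[t["puma"]] = puma_eligible.get(t["puma"], 0.0) + estimate

print(puma_eligible)
```

Tract A contributes about 630 eligible voters (63% of 1,000) and tract B about 200 (50% × 80% of 500).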
Step 2: Smooth the PUMA-level sample fluctuations in ACS data
Initially we used the ACS estimates of immigrant and native Hispanics for 2006 alone, from ACS Table B06004I – “Place of Birth by Race (Hispanic or Latino)”. But because the sample size of the ACS is much smaller than the sample size of the 2000 Census, PUMA-level estimates of the Hispanic population had large (Census-reported) margins of error. While total population (Hispanic + non-Hispanic) by PUMA was fairly stable, estimates of Hispanic population showed large changes from 2000 to 2005, and large changes again (often not in the same direction) from 2005 to 2006. In order to smooth out fluctuations that are clearly statistical artifacts, we average 2005 and 2006 PUMA-level estimates of Hispanic population.
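As an illustration, the smoothing step might look like the following Python sketch. It assumes a simple unweighted mean of the two years, and the PUMA codes and counts are made up to show a swing of the kind described.

```python
def average_estimates(est_2005, est_2006):
    """Average two years of PUMA-level estimates (simple unweighted mean)."""
    shared_pumas = est_2005.keys() & est_2006.keys()
    return {p: (est_2005[p] + est_2006[p]) / 2 for p in shared_pumas}


# Hypothetical Hispanic population estimates by PUMA.
acs_2005 = {"P1": 12_000, "P2": 8_400}
acs_2006 = {"P1": 9_000, "P2": 9_600}

smoothed = average_estimates(acs_2005, acs_2006)
print(smoothed)
```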
Step 3: Allocate population of interest to Census Tracts
Relatively straightforward methods exist in the demography literature for small area population estimates when large area (in this case, PUMA) and small area (in this case, Census Tract) counts are available for one point in time (2000) and large area counts are available for a later time (2006). We use the constant-share method described in Smith, Tayman & Swanson (2002). This involves calculating the small-area share of the large-area population, and assuming that the share stays constant over time. The small-area population in 2006 is calculated as:
Tract population (2006) = [Tract population (2000) / PUMA population (2000)] × PUMA population (2006)
For example, if a given Tract contains 3% of the PUMA’s Hispanic immigrant population in 2000, it is assumed to contain 3% of the PUMA’s Hispanic immigrant population in 2006 as well. An algebraically equivalent way of conceptualizing this is that if the PUMA’s Hispanic immigrant population grows by 29%, every Tract within it is assumed to also grow by 29%.
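A minimal Python sketch of the constant-share calculation follows. The tract identifiers and counts are hypothetical, set up so that the first tract holds 3% of the PUMA's 2000 population and the PUMA grows by 29%. Returning zeros when the PUMA had no such population in 2000 is a choice made for this sketch, not something the source specifies.

```python
def constant_share(tract_counts_2000, puma_total_later):
    """Allocate a later PUMA total to tracts using their 2000 shares."""
    puma_total_2000 = sum(tract_counts_2000.values())
    if puma_total_2000 == 0:
        # No 2000 pattern to follow; this sketch simply allocates nothing.
        return {tract: 0.0 for tract in tract_counts_2000}
    return {
        tract: puma_total_later * count / puma_total_2000
        for tract, count in tract_counts_2000.items()
    }


# Hypothetical PUMA: 10,000 Hispanic immigrants in 2000, 12,900 in 2006.
tracts_2000 = {"T1": 300, "T2": 2700, "T3": 7000}
tracts_2006 = constant_share(tracts_2000, 12_900)

for tract, count_2000 in tracts_2000.items():
    growth = tracts_2006[tract] / count_2000 - 1
    print(tract, round(tracts_2006[tract]), f"{growth:.0%}")
```

Every tract shows the same growth rate as the PUMA, which is the algebraic equivalence noted above.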
Step 4: Allocate Tract-level estimated population to Blocks
Most Census Tracts are wholly contained within a legislative district, but for those that aren’t, some way of allocating the population to a smaller geography is necessary. Since legislative districts can be built out of Blocks, we need to allocate population to that level. Information on immigration status (foreign-born vs. native, citizenship) is not available at the Block level, but counts of total Hispanic population are. We allocate the population of interest by the pattern of total Hispanic population. If a Block contained 10% of the Tract’s total Hispanic population in 2000, we allocate 10% of both the immigrant and native-born 2006 Tract-level estimate to that Block. Another way of conceptualizing this is to say that we assume that the ratio of immigrant to native Hispanics is consistent throughout the Tract, or that the proportion of vote-eligible Hispanics is consistent throughout the Tract.
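The block-level allocation can be sketched in Python as below. Block identifiers, counts, and the function name are hypothetical; the first block holds 10% of the tract's 2000 Hispanic population, matching the example in the text.

```python
def allocate_to_blocks(tract_estimates, block_hispanic_2000):
    """Spread tract-level subgroup estimates across blocks.

    Every subgroup is distributed using the same weights: each block's
    share of the tract's total Hispanic population in 2000.
    """
    tract_total = sum(block_hispanic_2000.values())
    if tract_total == 0:
        return {b: {g: 0.0 for g in tract_estimates} for b in block_hispanic_2000}
    return {
        block: {g: value * n / tract_total for g, value in tract_estimates.items()}
        for block, n in block_hispanic_2000.items()
    }


# Hypothetical tract with three blocks and 2006 tract-level estimates.
blocks_2000 = {"B1": 50, "B2": 150, "B3": 300}
tract_2006 = {"foreign_born": 400.0, "native_born": 600.0}

block_estimates = allocate_to_blocks(tract_2006, blocks_2000)
print(block_estimates["B1"])
```

Because one set of weights is applied to every subgroup, the ratio of foreign-born to native-born estimates is identical in each block.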
Step 5: Aggregate Blocks to NYS & County Legislative Districts
This is straightforward addition of the estimated populations in each Block. It is done separately for three different legislative geographies: state Assembly Districts, state Senate Districts, and county legislative districts.
Step 6: Map the results
Microsoft Access 2003 - Data from the 2005 and 2006 ACS and the 2000 Census were imported into MS Access. The calculations described in Steps 2 - 4 were performed using SQL queries. Correspondence tables that link state legislative districts to Census Blocks were available from the NYS Legislative Task Force on Demographic Research and Reapportionment. Links between varying geographies (Tracts, PUMAs, state legislative districts, etc.) were created using the JOIN statement. Aggregation of small-area estimates in Step 5 was done using the GROUP BY clause.
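The JOIN and GROUP BY pattern described here can be sketched with Python's built-in sqlite3 module. SQLite stands in for MS Access purely for illustration, and the table names, column names, district labels, and values are all hypothetical rather than taken from the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE block_estimates (
        block_id TEXT PRIMARY KEY, foreign_born REAL, native_born REAL
    );
    CREATE TABLE block_to_district (
        block_id TEXT, district_type TEXT, district_id TEXT
    );
""")

# Hypothetical block-level estimates (the output of the allocation steps).
conn.executemany(
    "INSERT INTO block_estimates VALUES (?, ?, ?)",
    [("B1", 40.0, 60.0), ("B2", 120.0, 180.0), ("B3", 240.0, 360.0)],
)

# Hypothetical correspondence table: each block belongs to one district
# of each type.
conn.executemany(
    "INSERT INTO block_to_district VALUES (?, ?, ?)",
    [
        ("B1", "assembly", "AD 1"), ("B2", "assembly", "AD 1"),
        ("B3", "assembly", "AD 2"),
        ("B1", "senate", "SD 1"), ("B2", "senate", "SD 1"),
        ("B3", "senate", "SD 1"),
    ],
)

# Link blocks to districts, then sum the estimates within each district.
rows = conn.execute("""
    SELECT d.district_type, d.district_id,
           SUM(e.foreign_born), SUM(e.native_born)
    FROM block_estimates AS e
    JOIN block_to_district AS d ON d.block_id = e.block_id
    GROUP BY d.district_type, d.district_id
    ORDER BY d.district_type, d.district_id
""").fetchall()

for row in rows:
    print(row)
```

Keeping the district type as a column in the correspondence table lets one query serve several legislative geographies; separate correspondence tables per geography would work equally well.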
ArcGIS Desktop 9.2 - Early in our research, ArcGIS was used to visually inspect differences between census and legislative district geographies (see figure above), and the extent to which the population was clustered within Tracts and PUMAs. Without clustering, an areal allocation would have been adequate, but since Tracts are intended to have a roughly commensurable number of households, many Tracts were constructed with populated Census Blocks attached to unpopulated areas such as parks or transportation corridors. No correspondence table was available to link county legislative districts to Census Blocks. We used ArcGIS to perform a spatial join between county legislative districts (obtained in 2004 from the Nassau County Legislature and the Suffolk County Board of Elections) and Census Blocks. Finally, ArcGIS was used to create the final maps displayed above, which show variation in Hispanic population (by nativity and eligibility to vote) over various census and legislative geographies.
The 100,000-person minimum population for PUMAs is designed to guarantee the privacy of survey respondents. Beginning with the 2008 ACS (which will be released in 2009), the Census Bureau will report sociodemographic data at the Tract level, but in order to prevent disclosure, the data will be based on a 3-year average.
Note that since SF3 is based on a sample, the total number of Hispanics reported in SF3 does not equal the total number of Hispanics reported in SF1, which is based on the full-count census. For large geographic areas, SF3 should very closely match SF1.
If citizenship is correlated with age, our estimate of eligible voters will be biased. We did find that younger Hispanics are more likely to be citizens, most likely because some Hispanic minors were born in the United States to noncitizen parents. However, the effect was not large, and we felt that attempting to correct for it would involve making assumptions based on limited evidence.