🔀 Merge branch:dev_atkinson into branch:main (#19)

* ✨ Added `holder` argument to `atkinson()` * 'ndi' version 0.1.6.9003 * Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. The function can now compute *A* without the Hölder mean. The default is `holder = FALSE`. * Add example for `holder` argument in `atkinson()` function in README
idblr · Aug 23, 2024 · 6448163 · 6448163
1 parent 2828b07
commit 6448163
Show file tree

Hide file tree

Showing 10 changed files with 98 additions and 24 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: ndi
 Title: Neighborhood Deprivation Indices
-Version: 0.1.6.9002
-Date: 2024-08-22
+Version: 0.1.6.9003
+Date: 2024-08-23
 Authors@R:
     c(person(given = "Ian D.",
              family = "Buller",

diff --git a/NEWS.md b/NEWS.md
@@ -1,13 +1,14 @@
 # ndi (development version)
 
-## ndi v0.1.6.9002
+## ndi v0.1.6.9003
 
 ### New Features
 * Added `hoover()` function to compute the aspatial racial/ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089)
 * Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0)
 * Added `lieberson()` function to compute he aspatial racial/ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and and [Bell (1954)](https://doi.org/10.2307/2574118)
 * Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `lieberson()`, `sudano()`, and `white()`, `white_blau()` functions.
 * Thank you for the feature suggestions, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592)
+* Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. The function can now compute *A* without the Hölder mean. The default is `holder = FALSE`.
 
 ### Updates
 * `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout
@@ -17,7 +18,8 @@
 * 'package.R' deprecated. Replaced with 'ndi-package.R'
 * Re-formatted code and documentation throughout for consistent readability
 * Updated documentation about value range of *V* (White) from `{0 to 1}` to `{-Inf to Inf}`
-* Add examples for `hoover()` and `white_blau()` in vignette and README
+* Add examples for `hoover()` and `white_blau()` functions in vignette and README
+* Add example for `holder` argument in `atkinson()` function in README
 * Reformatted functions for consistent internal structure
 * Updated examples in vignette to showcase a larger variety of U.S. states
 * Updated examples in functions to better describe the metrics

diff --git a/R/atkinson.R b/R/atkinson.R
@@ -7,6 +7,7 @@
 #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.
 #' @param subgroup Character string specifying the income or racial/ethnic subgroup(s) as the comparison population. See Details for available choices.
 #' @param epsilon Numerical. Shape parameter that denotes the aversion to inequality. Value must be between 0 and 1.0 (the default is 0.5).
+#' @param holder Logical. If TRUE, will compute index using the Hölder mean. If FALSE, will not compute with the Hölder mean. The default is FALSE.
 #' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE.
 #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.
 #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics
@@ -85,6 +86,7 @@ atkinson <- function(geo_large = 'county',
                      year = 2020,
                      subgroup,
                      epsilon = 0.5,
+                     holder = FALSE,
                      omit_NAs = TRUE,
                      quiet = FALSE,
                      ...) {
@@ -124,6 +126,7 @@ atkinson <- function(geo_large = 'county',
 
     # Select census variables
     vars <- c(
+      TotalPop = 'B03002_001',
       NHoL = 'B03002_002',
       NHoLW = 'B03002_003',
       NHoLB = 'B03002_004',
@@ -143,11 +146,10 @@ atkinson <- function(geo_large = 'county',
       HoLSOR = 'B03002_018',
       HoLTOMR = 'B03002_019',
       HoLTRiSOR = 'B03002_020',
-      HoLTReSOR = 'B03002_021',
-      MedHHInc = 'B19013_001'
+      HoLTReSOR = 'B03002_021'
     )
 
-    selected_vars <- vars[subgroup]
+    selected_vars <- vars[c('TotalPop', subgroup)]
     out_names <- names(selected_vars) # save for output
     in_subgroup <- paste0(subgroup, 'E')
 
@@ -297,7 +299,7 @@ atkinson <- function(geo_large = 'county',
     ## Compute
     out_tmp <- out_dat %>%
       split(., f = list(out_dat$oid)) %>%
-      lapply(., FUN = a_fun, epsilon = epsilon, omit_NAs = omit_NAs) %>%
+      lapply(., FUN = a_fun, epsilon = epsilon, omit_NAs = omit_NAs, holder = holder) %>%
       utils::stack(.) %>%
       dplyr::mutate(
         A = values,
@@ -306,7 +308,7 @@ atkinson <- function(geo_large = 'county',
       dplyr::select(A, oid)
 
     # Warning for missingness of census characteristics
-    missingYN <- as.data.frame(out_dat[, in_subgroup])
+    missingYN <- as.data.frame(out_dat[, c('TotalPopE', in_subgroup)])
     names(missingYN) <- out_names
     missingYN <- missingYN %>%
       tidyr::pivot_longer(

diff --git a/R/utils.R b/R/utils.R
@@ -18,20 +18,32 @@ d_fun <- function(x, omit_NAs) {
 # Internal function for the Atkinson Index (Atkinson 1970)
 ## Returns NA value if only one smaller geography in a larger geography
 ## If denoting the Hölder mean
-a_fun <- function(x, epsilon, omit_NAs) {
-  if (omit_NAs == TRUE) { 
-    xx <- stats::na.omit(x$subgroup)
-  } else {
-    xx <- x$subgroup
-  } 
+a_fun <- function(x, epsilon, omit_NAs, holder) {
+  xx <- x[ , c('TotalPopE', 'subgroup')]
+  if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] }
   if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) {
     NA
   } else {
-    if (epsilon == 1) {
-      1 - (exp(mean(log(stats::na.omit(xx)))) / mean(xx, na.rm = TRUE))
+    if (holder == TRUE) {
+      x_i <- xx$subgroup
+      if (epsilon == 1) {
+        A <- 1 - (exp(mean(log(stats::na.omit(x_i)), na.rm = TRUE)) / mean(x_i, na.rm = TRUE))
+        return(A)
+      } else {
+        xxx <- (x_i / mean(x_i, na.rm = TRUE)) ^ (1 - epsilon)
+        A <- 1 - mean(xxx, na.rm = TRUE) ^ (1 / (1 - epsilon))
+        return(A)
+      }
     } else {
-      xxx <- (xx / mean(xx, na.rm = TRUE)) ^ (1 - epsilon)
-      1 - mean(xxx, na.rm = TRUE) ^ (1 / (1 - epsilon)) 
+      x_i <- xx$subgroup
+      X <- sum(xx$subgroup, na.rm = TRUE)
+      t_i <- xx$TotalPopE
+      N <- sum(xx$TotalPopE, na.rm = TRUE)
+      p_i <- x_i / t_i
+      P <- X / N
+      b <- epsilon
+      A <- 1 - (P / (1 - P)) * abs(sum((1 - p_i) ^ (1 - b) * p_i ^ b * t_i / (P * N), na.rm = TRUE)) ^ (1 / (1 - b))
+      return(A)
     }
   }
 }

diff --git a/README.md b/README.md
@@ -12,7 +12,7 @@
 [![DOI](https://zenodo.org/badge/521439746.svg)](https://zenodo.org/badge/latestdoi/521439746)
 <!-- badges: end -->
 
-**Date repository last updated**: 2024-08-22
+**Date repository last updated**: 2024-08-23
 
 ### Overview
 
@@ -796,6 +796,54 @@ ggplot() +
 
 ![](man/figures/a.png)
 
+```r
+# ----------------------------------------------------------------------------- #
+# Compute aspatial racial/ethnic Atkinson Index (Atkinson) with the Hölder mean #
+# ----------------------------------------------------------------------------- #
+
+# Atkinson Index based on Atkinson (1970)
+## Selected subgroup: Not Hispanic or Latino, Black or African American alone
+## Selected large geography: census tract
+## Selected small geography: census block group
+## Default epsilon (0.5 or over- and under-representation contribute equally)
+## Using the Hölder mean based on the `Atkinson()` function from 'DescTools' package
+A_2020_DC <- atkinson(
+  geo_large = 'tract',
+  geo_small = 'block group',
+  state = 'DC',
+  year = 2020,
+  subgroup = 'NHoLB',
+  holder = TRUE
+)
+
+# Obtain the 2020 census tracts from the 'tigris' package
+tract_2020_DC <- tracts(state = 'DC', year = 2020, cb = TRUE)
+
+# Join the AI (Atkinson) values to the census tract geometry
+A_2020_DC <- tract_2020_DC %>%
+  left_join(A_2020_DC$a, by = 'GEOID')
+
+ggplot() +
+  geom_sf(
+    data = A_2020_DC,
+    aes(fill = A),
+    color = 'white'
+  ) +
+  theme_bw() +
+  scale_fill_viridis_c(limits = c(0, 1)) +
+  labs(
+    fill = 'Index (Continuous)',
+    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
+  ) +
+  ggtitle(
+    'Atkinson Index (Atkinson) with Hölder mean\n
+    Washington, D.C. census block groups to tracts',
+    subtitle = expression(paste('Black non-Hispanic (', epsilon, ' = 0.5)'))
+  )
+```
+
+![](man/figures/a_holder.png)
+
 ```r
 # ------------------------------------------------------- #
 # Compute aspatial racial/ethnic Interaction Index (Bell) #
@@ -889,7 +937,6 @@ ggplot() +
     Washington, D.C. census block groups to tracts',
     subtitle = 'Black non-Hispanic'
   )
-  ggsave('man/figures/v.png', width = 7, height = 7)
 ```
 
 ![](man/figures/v.png)
@@ -1138,7 +1185,7 @@ This package was originally developed while the author was a postdoctoral fellow
 
 ### Acknowledgments
 
-The [`messer()`](R/messer.R) function functionalizes the code found in [Hruska et al. (2022)](https://doi.org/10.1016/j.janxdis.2022.102529) available on an [OSF repository](https://doi.org/10.17605/OSF.IO/M2SAV), but with percent with income less than $30K added to the computation based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x). The [`messer()`](R/messer.R) function also allows for the computation of *NDI* (Messer) for each year between 2010-2020 (when the U.S. census characteristics are available to date). There was no code companion to compute *NDI* (Powell-Wiley) included in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) or [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) only a [description](https://www.gis.cancer.gov/research/NeighDeprvIndex_Methods.pdf), but the package author worked directly with the latter manuscript authors to replicate their [*SAS*](https://www.sas.com) code in [**R**](https://cran.r-project.org/) for the [`powell_wiley()`](R/powell_wiley.R) function. See the Accumulating Data to Optimally Predict Obesity Treatment [(ADOPT)](https://gis.cancer.gov/research/adopt.html) Core Measures Project for more details. Please note: the *NDI* (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) because the two studies used a different statistical platform (i.e., [*SPSS*](https://www.ibm.com/spss) and [*SAS*](https://www.sas.com), respectively) that intrinsically calculate the principal component analysis differently from [**R**](https://cran.r-project.org/). The internal function to calculate the Atkinson Index is based on the `atkinson()` function in the [*DescTools*](https://cran.r-project.org/package=DescTools) package.
+The [`messer()`](R/messer.R) function functionalizes the code found in [Hruska et al. (2022)](https://doi.org/10.1016/j.janxdis.2022.102529) available on an [OSF repository](https://doi.org/10.17605/OSF.IO/M2SAV), but with percent with income less than $30K added to the computation based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x). The [`messer()`](R/messer.R) function also allows for the computation of *NDI* (Messer) for each year between 2010-2020 (when the U.S. census characteristics are available to date). There was no code companion to compute *NDI* (Powell-Wiley) included in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) or [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) only a [description](https://www.gis.cancer.gov/research/NeighDeprvIndex_Methods.pdf), but the package author worked directly with the latter manuscript authors to replicate their [*SAS*](https://www.sas.com) code in [**R**](https://cran.r-project.org/) for the [`powell_wiley()`](R/powell_wiley.R) function. See the Accumulating Data to Optimally Predict Obesity Treatment [(ADOPT)](https://gis.cancer.gov/research/adopt.html) Core Measures Project for more details. Please note: the *NDI* (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) because the two studies used a different statistical platform (i.e., [*SPSS*](https://www.ibm.com/spss) and [*SAS*](https://www.sas.com), respectively) that intrinsically calculate the principal component analysis differently from [**R**](https://cran.r-project.org/). The internal function to calculate the Atkinson Index with the Hölder mean is based on the `Atkinson()` function in the [*DescTools*](https://cran.r-project.org/package=DescTools) package.
 
 When citing this package for publication, please follow:
 

diff --git a/cran-comments.md b/cran-comments.md
@@ -5,15 +5,16 @@
   * Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0)
   * Added `lieberson()` function to compute he aspatial racial/ethnic Isolation Index (_xPx\*_) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0)
   * Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `sudano()`, and `white()`, `white_blau()` functions.
-  * Thank you for the feature suggestions, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592)
+  * Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. The function can now compute *A* without the Hölder mean. The default is `holder = TRUE`.
   * `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout
   * Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` functions when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA
   * Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively.
   * `tigris` and `units` are now Imports
   * 'package.R' deprecated. Replaced with 'ndi-package.R'
   * Re-formatted code and documentation throughout for consistent readability
   * Updated documentation about value range of *V* (White) from `{0 to 1}` to `{-Inf to Inf}`
-  * Add examples for `hoover()` and `white_blau()` in vignette and README
+  * Add examples for `hoover()` and `white_blau()` functions in vignette and README
+  * Add example for `holder` argument in `atkinson()` function in README
   * Reformatted functions for consistent internal structure
   * Updated examples in vignette to showcase a larger variety of U.S. states
   * Updated examples in functions to better describe the metrics

diff --git a/man/atkinson.Rd b/man/atkinson.Rd
diff --git a/man/figures/a.png b/man/figures/a.png
diff --git a/man/figures/a_holder.png b/man/figures/a_holder.png
diff --git a/tests/testthat/test-atkinson.R b/tests/testthat/test-atkinson.R
@@ -69,6 +69,13 @@ test_that('atkinson works', {
     subgroup = c('NHoLB', 'HoLB')
   ))
 
+  expect_silent(atkinson(
+    state = 'DC',
+    year = 2020,
+    subgroup = c('NHoLB', 'HoLB'),
+    holder = TRUE
+  ))
+
   expect_silent(atkinson(
     state = 'DC',
     year = 2020,