This week's data is from The Washington Post courtesy of Kate Rabinowitz, Laura Meckler, and Armand Emamdjomeh.
A lot of the visualizations were written in a scrollytelling
format with JavaScript
. If you want to play around with a similar format you could try out the experimental package rolldown
by Yihui. There is geospatial and shapefile data linked below in the methodology.
A methodology section taken verbatim from the article is below:
"This analysis used the Common Core of Data from the National Center for Education Statistics (NCES). Charter and private schools were excluded because the government has limited control over them. Virtual schools were also excluded.
The Washington Post used data from the 1994-1995 school year, the earliest near-comprehensive data, and from 2016-2017, the latest available data. Findings were checked against interim years at a five-year interval.
Diversity was defined by the proportion of students in the dominant racial group. Diverse districts are places where fewer than 75 percent of students are of the same race. Undiverse districts are where 75 to 90 percent of students are the same race. In extremely undiverse districts one racial group constitutes more than 90 percent of students.
Black, Asian, Native American and white data excludes anyone with Hispanic ethnicity. Asian includes Asians, Native Hawaiians and other Pacific Islanders. Multiracial was not a racial category in 1995.
The Post measured integration for diverse school districts that have at least six schools, more than 1,000 students and where the sum total of black and hispanic students was at least 5 percent and no more than 95 percent of students.
The variance or correlation ratio, also referred to as eta-squared, was used to measure integration. The ratio calculates how isolated a racial group or groups are while controlling for the demographics of the district. The variance ratio was computed for black and Hispanic students because of the history of exclusion and achievement gaps faced by these groups.
Integration groupings were defined by calculating Jenks breaks, a classification method for optimally determining data groupings, for the most recent data and applying it to earlier data.
The Post confirmed findings against an analysis that looked only at elementary schools, a method some researchers use to better control for differences in race across age groups and the typically smaller number of upper-level schools.
Geographic classifications are from NCES. Geospatial data is courtesy of the U.S. Census Bureau.
The code for this analysis and the output data can be found here."
school_diversity <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-24/school_diversity.csv")
variable | class | description |
---|---|---|
LEAID | character | Unique school id |
LEA_NAME | character | School District Name |
ST | character | State of school district |
d_Locale_Txt | character | Type of school district, town, rural, city, suburban combined with distant, remote, fringe, small, midsize, large |
SCHOOL_YEAR | character | School year (either 1994-1995 or 2016-2017) |
AIAN | double | American indian and alaskan native proportion of student population |
Asian | double | Asian proportion of student population |
Black | double | Black proportion of student population |
Hispanic | double | Hispanic proportion of student population |
White | double | White proportion of student population |
Multi | double | Multi-ethnic proportion of student population |
Total | double | Total student body count |
diverse | character | Diverse rating (Diverse, undiverse, extremely undiverse) |
variance | double | the variance ratio |
int_group | character | the level of integration, defined as "Highly integated", "Somewhat integrated" and "Not integrated" |