Aggregate functions #6
Comments
@fgregg hey, we're looking to develop this functionality at ARGO. The key need is to aggregate census statistics like median income correctly for our California water agency partners, whose service area boundaries don't align nicely with census boundaries. You know the story :) We have a team of CUSP grad students looking to sprint on this from mid December to mid January and would love your thoughts. The plan is a simple fork for the sprint, and then we can PR assuming everything works nicely :) |
Sounds great! I think I would start by following the Census's guidance on aggregating statistics: https://www.census.gov/content/dam/Census/library/publications/2018/acs/acs_general_handbook_2018_ch08.pdf It would be very, very nice to make use of the variance data that the Census has started to make available (https://www.census.gov/programs-surveys/acs/data/variance-tables.html), but that's probably a phase II or phase III project. I'd also recommend that you develop the aggregation code in separate files from the existing ones, as it may be nice, in the future, to pull the aggregation code into a separate library. |
Hey @fgregg - I put together an initial project board for our team of students. I will be continuing to update that, but wanted to drop it in this thread for those interested. I also wanted to run the actual technical approach by you all to increase the probability of things lining up nicely. So right now looks like there is a family of .geo_X() methods that can return geojson-like structures with statistics and geometries for lower level census geographies within higher level ones as well as for arbitrary geometries. (Though for sf3, the naming convention changes?) One approach came to mind that would act pretty independently of the existing codebase, which would allow us to pull things into a separate library if that ends up feeling better. In this approach, one would create a new aggregator function that takes as inputs the statistic and geometry outputs of the .geo_X() methods along with the type of statistic to aggregate and the geometry to aggregate to--thinking is that this last piece would be necessary to properly downscale the statistics for the partial edge geometries. So something like:
Any feedback there? Lastly, on the Census Data API side of things, the table and attribute names do seem cryptic--e.g. The human-readable table/attribute name --> code direction might be tough, but the other direction doesn't seem too far-fetched and it would really be great if these codes were parsable for type of statistic. This could be used to help prevent statistical gotchas like trying to aggregate a median like an average. Not sure if you all have thought about this bit. May be for down the road though. Hopefully explicitly asking the user to provide type of statistic is a reasonable enough solution for now. |
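To make the "parsable for type of statistic" idea concrete, here is a rough sketch of a keyword check on the human-readable label. Everything in it is an assumption for illustration: `guess_statistic_type` is a hypothetical helper, not part of census_area or the Census Data API, and the sample labels are made up rather than pulled from the API. It only recognizes the obvious cases and returns `None` otherwise, so the user still has to state the type explicitly.

```python
import re
from typing import Optional


def guess_statistic_type(label: str) -> Optional[str]:
    """Guess the statistic type from a human-readable variable label.

    Hypothetical heuristic: returns "median" or "mean" when the label says
    so, and None when nothing can be inferred from the wording.
    """
    if re.search(r"\bmedian\b", label, flags=re.IGNORECASE):
        return "median"
    if re.search(r"\b(mean|average)\b", label, flags=re.IGNORECASE):
        return "mean"
    # No keyword matched: do not guess, make the caller decide.
    return None


# Illustrative labels only (assumed wording, not fetched from the API).
for label in ["Median household income", "Mean travel time", "Total population"]:
    print(label, "->", guess_statistic_type(label))
```

A check like this could at least refuse to average something whose label says it is a median, which is the gotcha mentioned above.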
Do you mean that the desired shape can cut across census geographies, and you'll need to figure out what data to apportion? |
Yep, that's all I meant by that. We see that with California water district boundaries for example. |
Okay, finding the intersections is a fairly expensive operation. When we do it here: census_area/census_area/core.py Lines 62 to 63 in 5e62f7d
It would probably be a good idea to go ahead and return the proportion of the census tract falling within the target geography, and stuff it into the statistics dictionary. That coverage proportion is probably what you would be calculating with. If you did it that way, you would only need "sequence of statistics", "sequence of weights", and "type of statistic". |
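A minimal sketch of what an aggregator with just those three inputs might look like. This is an assumption about the shape of the function, not code from census_area: `aggregate` and the `kind` strings are hypothetical names. For counts, the weights would be the coverage proportions; for means, the weights would more likely need to be something like the covered population rather than a bare areal proportion. Medians are rejected on purpose, since they cannot be combined correctly from per-tract values and weights alone.

```python
from typing import Sequence


def aggregate(values: Sequence[float], weights: Sequence[float], kind: str) -> float:
    """Combine per-geography statistics given one weight per geography.

    Hypothetical helper for illustration, not part of census_area.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")

    if kind == "count":
        # Apportion each count by the share of the geography that is covered.
        return sum(v * w for v, w in zip(values, weights))

    if kind == "mean":
        # Weighted mean: weights should reflect how much each geography
        # contributes (for example, its covered population).
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero")
        return sum(v * w for v, w in zip(values, weights)) / total

    # Medians and anything else cannot be combined from these inputs alone.
    raise ValueError(f"unsupported statistic type: {kind!r}")


# Two tracts with counts 100 and 200, covered 50% and 25% respectively.
print(aggregate([100, 200], [0.5, 0.25], "count"))
```

Asking for the statistic type explicitly, and failing loudly on unsupported ones, keeps the "aggregate a median like an average" mistake from happening silently.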
Nice, thanks a lot Forest. I'll look into that. |
Weights are going to be important as, for example, sometimes you'll want to know the size of the associated population. Anyway, I think you have enough to move forward. |
@dmarulli, any updates on your project? |
His student team has their kickoff call scheduled for this upcoming Friday, 12/21, so probably not. |
@fgregg FYI the functionality to calculate the areal interpolation is getting pretty close, though there is some outstanding refactoring to clean up the student code. See here for the latest: https://github.com/argo-marketplace/census_area/tree/dev_branch Do you A) have any stylistic preferences on integration to note and B) have capacity to help with that integration (we're a bit swamped on our end)? Thanks much! |
Hi @patwater, this looks like it's pretty far from ready to be brought in. There are some nice ideas in here, but
I'm sorry to hear that you don't have the bandwidth to work on the integration. Let me know when you do. |
Yeah I hear you. Part of working with grad students early in their program... will keep you posted. |
Some interest reviving here (also I want my Hacktoberfest contributions ;). @fgregg I see your reference to |
It would be great if census_area handled the aggregations of census variables correctly.
Prior art