[DATA] population statistics error #436
Comments
Thanks -- I see digits transposed in the Northwest figures, but can't see the problem in Gauteng: 15176115 corresponds to Figure 1 on page vi. Could you elaborate, please?
Good Day
The appendix of the Stats SA document gives the Gauteng population as 15176116.
Kind Regards
Lethabo Maluleke
…-------- Original message --------
From: Scott Hazelhurst <[email protected]>
Date: Thu, Jun 11, 2020, 10:55 AM
To: dsfsi/covid19za <[email protected]>
Cc: "Maluleke, LM, Miss [[email protected]]" <[email protected]>, Author <[email protected]>
Subject: Re: [dsfsi/covid19za] [DATA] (#436)
Thanks -- I see digits transposed in the Northwest figures, but can't see the problem in Gauteng: 15176115 corresponds to Figure 1 on page vi. Could you elaborate please.
Thanks @18306063 @shaze. We also have the Stats SA mid-year estimates now in the staging area folder. We might want to make a choice on where to put them, maybe data/official_statistics/
OK -- the one in data/district_data has been there longer, so there may be scripts dependent on it. But that is easy to change, and it is more important to have it in the right logical place, so I have no objection to moving or replacing it. But if we use the new file, I think it needs to be made program friendly -- if you read it in with Pandas, the columns seem to come in as text by default, and it is even harder to handle if not using Pandas.
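A minimal sketch of what "program friendly" could mean here, without Pandas. It assumes (this is a guess, not confirmed in the thread) that the counts are read as text because they contain spaces or commas as thousands separators. The column names `province` and `population` and the one-row sample are hypothetical; only the Gauteng figure comes from this thread.

```python
import csv
import io


def parse_count(text: str) -> int:
    """Convert a count stored as text (e.g. '15 176 115') to an int."""
    # Strip ordinary spaces, non-breaking spaces and commas used as separators.
    cleaned = text.strip().replace(" ", "").replace("\u00a0", "").replace(",", "")
    return int(cleaned)


# Hypothetical two-column sample; the real file's layout may differ.
SAMPLE = "province,population\nGauteng,15 176 115\n"


def load_population(handle) -> dict[str, int]:
    """Read province -> population, converting text counts to integers."""
    reader = csv.DictReader(handle)
    return {row["province"]: parse_count(row["population"]) for row in reader}


populations = load_population(io.StringIO(SAMPLE))
print(populations)
```

If the file stored plain digits instead, no cleaning step would be needed and any reader, Pandas or not, could treat the column as numeric directly.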
@elolelo, can you comment?
Hi Lethabo,
Thanks -- it seems that they have slightly contradictory figures in the same document. Fortunately it is only off by 1, so well below any margin of error (also, adding the provincial figures does not give the total figure, so we can't check that way to find which is correct).
The NW figure is definitely wrong. Will fix and push with today's figures in a few minutes.
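For anyone wanting to repeat the two checks mentioned above, a rough sketch follows. The function names are made up for illustration. The two Gauteng figures are the ones quoted in this thread; the provincial values in the second call are placeholders, not real statistics.

```python
def relative_gap(a: int, b: int) -> float:
    """Difference between two figures as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


def total_gap(provincial: dict[str, int], national_total: int) -> int:
    """National total minus the sum of the provincial figures (0 if consistent)."""
    return national_total - sum(provincial.values())


# The two Gauteng figures quoted in this thread.
gap = relative_gap(15176115, 15176116)
print(f"relative gap: {gap:.1e}")

# Placeholder values, not real statistics, just to show the check.
print(total_gap({"Province A": 10, "Province B": 20}, 31))
```

A non-zero result from `total_gap` means the provincial figures cannot be used to decide which of two competing values is the correct one.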
I am not sure to what extent these new files are program friendly. They may be changed if necessary.
Thanks. Ideally they must be computer-readable -- Pandas is the most flexible so readable by Pandas is essential.
Also, for the age breakdown file, I think having 5 provinces followed by 4 provinces is very difficult for a computer to follow. Two possible formats are below. My preference would be for 1, though 2 is what we're doing in other places and may be more human friendly.
1. Have columns: province, age group, male, female, total. Province is repeated.
2. Use the same format that we're using for keys. Note this uses the same convention as we do for district -- spaces separating words in names of provinces and tabs separating the name of the province from the category. This approach is very readable in GitHub, but programs can parse it easily, and using the convention of tabs separating the province name from the category means that
Final point -- I note in several places that the total is not equal to the sum of males and females. I doubt that these figures were produced at a time when non-binary categories were allowed, so they are likely to be errors (in the source document). It might be worth pointing this out in the README. The discrepancy is so small as to be inconsequential for any work being done.
Many thanks for all this work -- it is very helpful.
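As a rough illustration of format 1 (one row per province and age group, province repeated), here is a sketch that reads such a file and lists the rows where male + female differs from total. The header names follow the columns suggested above; the rows themselves are made up for illustration and are not Stats SA figures, and `find_total_mismatches` is a hypothetical helper name.

```python
import csv
import io

# Illustrative rows only: these numbers are made up, not Stats SA figures.
SAMPLE = """province,age group,male,female,total
Province A,0-4,100,110,210
Province A,5-9,90,95,186
Province B,0-4,50,55,105
"""


def find_total_mismatches(handle):
    """Return (province, age group, male + female - total) for inconsistent rows."""
    mismatches = []
    for row in csv.DictReader(handle):
        gap = int(row["male"]) + int(row["female"]) - int(row["total"])
        if gap != 0:
            mismatches.append((row["province"], row["age group"], gap))
    return mismatches


mismatches = find_total_mismatches(io.StringIO(SAMPLE))
print(mismatches)
```

The output of a check like this could be pasted into the README to document which source rows are inconsistent.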
Which Dataset
The za_province_pop
Error Description
The Gauteng and NorthWest populations do not correspond to the National Statistics PDF document
Suggested fixes