I am trying to run the Sao Paulo pipeline. I have two questions.
When executing stage data.spatial.zones__{...}, I get the following error: "OverflowError: Python int too large to convert to C long"
I found out that the zone_id numbers (column AP_2010_CH in the spatial datafiles) are very big (e.g. 3550308005107). How did you resolve this error when you were running the Sao Paulo pipeline?
I was thinking about renaming the zone_ids, but I do not have a clear overview of the pipeline stages in which these zone numbers are used (e.g. to connect spatial and census data?). I do not want to break these connections in the pipeline. Could you explain in which stages these zone_ids are used? Or do you know a better way of solving this issue?
When looking at the raw.py for the census data, I noticed the columns that are selected from the original census datafile are these: ['V0001', 'V0011', 'V0221', 'V0222', 'V0601', 'V6036', 'V0401', 'V1004', 'V0010', 'V0641', 'V0642', 'V0643', 'V0644', 'V0628', 'V6529', 'V0504']. Later in the code these are renamed to ["federationCode", "areaCode", "householdWeight", "metropolitanRegion", "personNumber", "gender", "age", "goingToSchool", "employment", "onLeave", "helpsInWork", "farmWork", "householdIncome", "motorcycleAvailability", "carAvailability", "numberOfMembers"]. I looked up the meaning of these codes ("V...") in the documentation accompanying the census data. That is when I noticed that the order of the V-codes and the column names used for renaming is different. Is this correct? Or did I miss a processing step in the pipeline that makes sure that the correct column names are given to the V-codes?
Could you help me with these two questions?
Kind regards,
Lotte
Hi @LotteNotelaers, first I have to say that we had no errors when we last used this pipeline. Since then, Python and the libraries the pipeline depends on have changed substantially, so something may have stopped working in the meantime.
Zones are essential, so rather than renaming the zone IDs, a better alternative is to load them with a different integer type. What is the exact error message you get? Maybe using int instead of np.int could help.
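To illustrate the type issue, here is a minimal sketch, not the pipeline's actual code. The ID below is the one quoted in the question; it is too large for a 32-bit integer but fits easily in 64 bits. As far as I know, the default NumPy integer was 32 bits on some platforms (for example Windows with NumPy versions before 2.0), so requesting np.int64 explicitly avoids depending on that default. The data frame and the zone_id column name are made up for the example.

```python
import numpy as np
import pandas as pd

# Zone ID of the magnitude quoted in the question.
zone_id = 3550308005107

# The value exceeds the 32-bit integer range but fits in 64 bits.
exceeds_int32 = zone_id > np.iinfo(np.int32).max
fits_int64 = zone_id <= np.iinfo(np.int64).max
print(exceeds_int32, fits_int64)

# Hypothetical frame standing in for the spatial data; only the
# column name AP_2010_CH comes from the thread.
df = pd.DataFrame({"AP_2010_CH": [str(zone_id)]})

# Request a 64-bit integer explicitly instead of relying on the
# platform default integer type.
df["zone_id"] = df["AP_2010_CH"].astype(np.int64)
print(df["zone_id"].dtype)
```

If the overflow is raised in a different place (for example while writing the zones to a file), the same idea applies: look for the cast that uses a platform-dependent integer and make it np.int64.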
For which variables did you observe the inconsistency? And which documentation are you reading?
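One thing that may be worth checking in the meantime, under the assumption (not verified against raw.py) that the census file is read with pandas read_csv and a usecols list: pandas ignores the order of usecols and returns the columns in the order they appear in the file. A positional rename would then pair the new names with the file order, not with the order of the V-code list. The sketch below uses a toy CSV with hypothetical column names to show the effect.

```python
import io
import pandas as pd

# Toy CSV with hypothetical column names; file order is col_a, col_b, col_c.
csv_text = "col_a,col_b,col_c\n1,2,3\n"

# Columns requested in a different order than they appear in the file.
wanted = ["col_c", "col_a"]

# read_csv keeps the file order; the order of `usecols` is ignored.
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(list(df.columns))

# Re-indexing with the list enforces the requested order, after which
# a positional rename would line up with `wanted`.
df_ordered = df[wanted]
print(list(df_ordered.columns))
```

Renaming through an explicit dictionary (old name to new name) with DataFrame.rename(columns=...) avoids depending on column order at all.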