Confusing nationality with citizenship and religion #27
Comments
Thank you for opening the issue! I agree that some regional names we used here to mean a group of countries may lead to confusion (definitely in the case of Muslim), but I do want to clarify that we used terminology similar to that described in this paper. Figure 5 shows their 39-leaf nationality taxonomy, which is similar to what we used as our categories. Our specific country-to-region mapping can be found in this online file. @arielah, should we write "Jewish" here instead of "Israeli"? Nonetheless, I agree with you that "nationality" may be a confusing term. Perhaps "Estimation of name origins" would be more proper? |
Thanks @idoerg for the feedback. I wanted to embed the figure from http://www.name-prism.com/about that we based these categories on: Here is the portion of the manuscript describing how we extracted a country for each living person on Wikipedia:
So I bolded the two strategies. The first detects place of birth for a name. The second seems to detect nationality/citizenship (or whatever Wikipedia curators consider to be the person's primary country). @arielah is that correct? We are not tied to using the Name-Prism region hierarchy. So if there is a better way to group countries, we could consider that. We can also consider changing our terminology. @cgreene maybe we can collect additional feedback and take a bit of time to research this topic further. And @idoerg, of course, any additional feedback you provide is greatly appreciated! I think that we want to stay with an approach that is based on inferring countries from names, because that is what the Wikipedia dataset supports. So we should update our language and analyses as needed to reflect this.
I like "Estimation of name origins", but am not really an expert on whether that would also have misleading connotations. |
The problem here is threefold: (1) confusing citizenship (a legal concept) with nationality (a mostly social concept, and overloaded with different, sometimes contradictory meanings in different, um, nationalities); (2) inferring citizenship from name, which is problematic at best, especially in countries with large immigrant populations and/or large ethnic diversity (India has the third-largest Muslim population in the world, yet it is not a Muslim-majority country); and (3) doing so with scientists, who tend to have a large representation of immigrants/expats. E.g., many of the Muslim and Israeli names you put up there are American citizens / residents. I don't think the dataset supports that.
Probably not. The names you gave are (mostly modern) Hebrew names, if anything. Hebrew names are a subset of Jewish names (again, many of which can be misclassified as European, Muslim, African, etc.). Diasporic ethnic minorities are a problem to classify geographically, due to being, well, dispersed. |
I would say "name etymology". |
BTW, if it is geographical information you want, just use the geographic information in the mesh headings or in the author affiliation. |
@idoerg : The goal of this first effort is to measure honor and authorship rates. I agree that what we are observing are differences by name etymology, which is a more precise phrasing. The long-term objective would be to understand the reasons behind disparities in invitation rates. In this case, we might want to know whether scientists who are within certain geographic regions (say, the US and Europe) by affiliation, but whose predictions denote a high confidence of East Asian name etymology, are also honored at lower rates, or whether the disparities arise from geographic bias in the organizations doing the honoring. Thank you for your comment - it has been really helpful in clarifying my thinking on this. I propose that we switch to the "name etymology" term now and more fully lay out potential future avenues of research that would get at the underlying disparities more precisely towards the end of the results or the start of the conclusions. |
Looking some more into the term "nationality", I am starting to think that it is the least inaccurate word for what we're extracting from Wikipedia (a mix of place of birth and country adjectives). From https://www.merriam-webster.com/dictionary/nationality From https://en.wikipedia.org/wiki/Nationality
I see how collapsing nations into the Name-Prism categories, which are labeled by things such as religion, creates confusion and is a leap from nationality.
I think we need to be clear that we're using the Wikipedia country extraction as a proxy for nationality. That it's not an exact match of nationality, but it seems like we are assigning the correct nationality to the overwhelming majority of Wikipedia names, if we are to go off of the definitions above. Thoughts? |
@dhimmel : I am not sure that we want to look specifically at nationality with our analysis. If there is a bias against honoring scientists with a family history in a country within a grouping I think we would want to detect that, even if it is not due to current nationality. I agree that the Name Prism categories are a large leap from nationality. |
From reading more of the Wikipedia documentation, I agree with the comments that what we have at our disposal is what Wikipedia editors interpret to be nationality. We need to increase the specificity of how we describe this. |
I disagree with @dhimmel. Nationality is probably the most inaccurate wording you can use, given that there are 5 definitions in MW, some contradictory. The image @dhimmel has shown is the exact confusion of nationality as synonymous with citizenship. Go with 5: So I looked a bit deeper into the labeling table you were using, country_to_region.tsv This made me chuckle: Not sure why "Italian" is there? So I looked a bit more into this table, and many things there seem to be patently wrong:
These are not anomalies: many, probably most, entries in country_to_region.tsv are wrong, some are completely arbitrary, and I am not sure what this table is trying to represent. Which, finally, explains to me why the categories in Table 1 are so confusing. |
There are a few things to address here. We are continuing to make revisions to both the figures and the text. The categories on the rightmost column that you are referring to are not used, so we will remove those for clarity. I did want to briefly address:
We performed the PubMed analysis precisely because of the concern about what would happen if a classification call was particularly inaccurate. We don't see the "New Jersey" effect that you propose there. |
Probably because you did not train on New Jersey names :) Do this:
Two things will happen: 1: Israel will not be over-represented anymore, because it is not a region of its own. Repeat with 🇳🇱 🇩🇪 etc. Then take all Nordic countries together. Use Nordic names. The specific issue is that a regional classification where any single country, especially a small one, comprises its own region, while all the other countries get rolled into multi-country regions doesn't make sense to me. The bigger picture here is that any regional division will probably yield different results. What if you separated Europe into Nordic and everything else? Germanic and everything else? EMBO? Why is there Hispanic and "Celtic English" but no Francophone (which will include, in addition to France and Switzerland, Quebec, New Orleans, and countries that are now lumped in Africa or East Asia). How about separating Japan from the rest of East Asia? |
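The point that any regional division will probably yield different results can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the per-country counts are invented, and the two mappings are hypothetical stand-ins, not the contents of country_to_region.tsv.

```python
from collections import Counter

# Invented per-country counts, for illustration only (not data from the study).
counts_by_country = {
    "Netherlands": 40, "Germany": 90, "Sweden": 25, "Norway": 15,
    "Japan": 50, "China": 60,
}

def regroup(counts, country_to_region):
    """Sum per-country counts into regions using the given mapping."""
    totals = Counter()
    for country, n in counts.items():
        totals[country_to_region[country]] += n
    return dict(totals)

# Hypothetical mapping A: two broad regions.
mapping_a = {
    "Netherlands": "Europe", "Germany": "Europe",
    "Sweden": "Europe", "Norway": "Europe",
    "Japan": "East Asia", "China": "East Asia",
}

# Hypothetical mapping B: Nordic split from Europe, Japan split from East Asia.
mapping_b = {
    "Netherlands": "Europe (other)", "Germany": "Europe (other)",
    "Sweden": "Nordic", "Norway": "Nordic",
    "Japan": "Japan", "China": "East Asia (other)",
}

totals_a = regroup(counts_by_country, mapping_a)
totals_b = regroup(counts_by_country, mapping_b)
print(totals_a)
print(totals_b)
```

The same underlying counts produce two regions under mapping A and four under mapping B, so any over- or under-representation reported per region depends on where the boundaries were drawn.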
Acknowledging that the names of the groups of countries were not appropriate, as some were by country, some were by region, and others were by religion, we have retained the data-driven groupings but selected more appropriate names for the name origin groups. We also performed an analysis of author affiliations, as suggested by @idoerg and the reviewers, to detect the affiliated countries for authors and honorees (see #35, #36 and #87). This is a major improvement of the study because, in the past work, an author's affiliation and name origin had some chance of being interlinked. Now we can directly examine geographic discrepancies. Within the most represented country (the US), we also examined differences by name origin to remove the geographic confounder. We found that both components (geography and name origin) still play a role. |
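The idea of holding affiliation country fixed and then comparing name origin groups can be sketched as follows. This is only an illustration under stated assumptions: the records are invented, the field names (`affiliation`, `name_origin`, `honored`) are hypothetical, and a simple per-group rate is used here rather than whatever statistic the study actually computes.

```python
from collections import defaultdict

# Invented records, for illustration only; field names are hypothetical.
records = [
    {"affiliation": "US", "name_origin": "European", "honored": True},
    {"affiliation": "US", "name_origin": "European", "honored": True},
    {"affiliation": "US", "name_origin": "European", "honored": False},
    {"affiliation": "US", "name_origin": "European", "honored": False},
    {"affiliation": "US", "name_origin": "East Asian", "honored": True},
    {"affiliation": "US", "name_origin": "East Asian", "honored": False},
    {"affiliation": "US", "name_origin": "East Asian", "honored": False},
    {"affiliation": "US", "name_origin": "East Asian", "honored": False},
    {"affiliation": "DE", "name_origin": "European", "honored": True},
]

def honor_rate_by_origin(rows, country):
    """Within one affiliation country, return honored / total per name origin group."""
    tallies = defaultdict(lambda: [0, 0])  # origin -> [honored count, total count]
    for row in rows:
        if row["affiliation"] != country:
            continue  # restrict to a single country to hold geography fixed
        tally = tallies[row["name_origin"]]
        tally[1] += 1
        if row["honored"]:
            tally[0] += 1
    return {origin: honored / total for origin, (honored, total) in tallies.items()}

us_rates = honor_rate_by_origin(records, "US")
print(us_rates)
```

Because every record in the comparison shares the same affiliation country, a remaining gap between groups cannot be attributed to where the scientists work.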
Seems like Figure 2 confounds citizenship with religion and nationality.
Citizenship is a pretty clear term: there is a fairly straightforward legal definition of what citizenship is in each country.
Nationality is more vague: in the US, it is often confused with citizenship. But actually in the US, a US national may not be a US citizen.
In other countries, there are legal or common-law definitions of nationality. They vary, and they may not be post-Enlightenment textbook history definitions. Many people identify themselves with their nationality first, and their citizenship second. In countries where a nationality equals a minority or majority equity issue, you may be missing out on a lot of equity issues this paper is supposed to highlight.
Celtic English: an ancestry, at best.
European: regional definition, losing considerable nuances of ethnicity, race, and nationality.
Hispanic: in the US this has discriminated-minority connotations, but this can include a variety of people, including Hispanic names that are common in former Spanish colonies in Africa?
East Asian: again, like European, a grab-all bag that does not really
Muslim: a religion; overlaps with all of the above (and below).
South Asian: again, Muslim names from this region, which includes the largest Muslim population in the world, would go to “Muslim”.
African: sub-Saharan Africa is probably the most diverse region on earth -- genetically as well as ethnically -- lumped in one category.
Israeli: the names in the example are all those of Israeli Jews, mostly of certain diaspora origins. Israelis named Muhammad, Sergey, Adisu would go to the Muslim, European, and African categories, respectively.
Bottom line: not sure what to do, but don’t call it “nationality”. Perhaps “Rough historical name groupings”.