Proposal: data mining on positions of users for the 2 repos #1474
Comments
That's highly unethical, I think. |
What is unethical? Analysing corporate culture based on information that people themselves willingly made public? |
Well, yes. You ask people to sign a letter to support Stallman, and then use their personal data for data mining. That's what corporations are doing, and what they are hated for: collecting personal data and then selling/abusing it. |
But indeed, the idea is cool. I think that's interesting research, and you should do it, but some time later, after the situation with Stallman is resolved. |
A more interesting study that I think would have huge value is to compute the percentage of signers that received money, or worked on projects that received money, from GAFAM & friends (e.g. GSoC, sponsorship and so on). This is not something you can automate, but I think that once the signatures settle it could have huge sociological (and journalistic) value. |
We could probably collect anonymous-ish data, like the percentage of corporate users here vs. there and such, but without names and without building a model. Some simple graphs would be nice to look at and wouldn't harm anyone's privacy. |
Is that from our repo or their repo? |
Debian is holding a vote on a general resolution about the letter. OMG, Debian is corrupted by a small group of people. General resolution: ratify https://github.com/rms-open-letter/rms-open-letter.github.io |
To collect "anonymous-ish" data you need to collect non-anonymous data first. And I see no issue in processing non-anonymous data: it is already public data. |
My point is: don't list the names of the supporters in order to blame them or anything like that. This would give SJWs a reason to call out our behavior. |
it's from open-letter |
This comment was marked as abuse.
Not really 'promoted', we just have a site that every Russian developer reads--Habr, and the letter was posted there. I think no other region has a similar site, hence the bias. |
I don't think the bias is that extreme. If you check one by one, there are people from all over the world, including friends from Asia, Australia and Africa. Believe it or not, there are free software activities in the third world too. Besides, due to the pressure from "popular opinion", I think a lot of people actually cannot support us. They might lose their jobs for signing our letter. If someone works for RedHat/SUSE and signs our letter, they will probably get fired for the same reason as RMS. Oops, this kind of feels like playing against big corps. Even though a lot of us don't work for big companies, we are still users and developers of free software. |
If hackernews didn't nip the post about this letter in the bud, we'd be at 10k signatures right now easily |
This comment was marked as abuse.
Uh oh, beware of the evil Russian hackers!! Seriously... Please cool it with your racism. |
This comment was marked as abuse.
There is quite popular Hacker News, but submission about RMS support letter got
I'd say it's rather that the counterletter is being heavily dis-promoted in the US. |
Oh wow! I posted the link to the support letter on Hackernews too and got flagged too. What a coincidence! |
This comment was marked as abuse.
Some outsiders criticize us because some of our signers' accounts are new or don't have much activity. We have the same number of signers as them now. However, we have 3.1k PRs and 2.7k forks, while the open letter only has 2.3k PRs and 2k forks. This may imply that a lot of their signers came in by email and were added in bulk. That's interesting. |
This comment was marked as abuse.
I stand for transparency. All signatures we receive have a public source: either a comment on a public issue, or a PR, or an email that we publish to #3105. rms-open-letter does not publish the sources of its bulk signatures. |
Do you want a takedown because of violating GDPR? |
Not Russia, but Russian-speaking. There is a great difference. |
I'm maintaining a (best-effort) list of GitHub accounts that signed the opposing letter here: https://github.com/BlueFlo0d/fashy-detector/blob/main/github-users.txt |
@BlueFlo0d, this list may be completely useless. Why? Because the only reliable info on the anti-RMS letter is what can be obtained through GitHub. The rest of the info mined from the list may be complete bullshit and misinformation, and we shouldn't trust it or relay it. In fact, it contained quite a few names of non-existent people that are obvious trolling to anyone who knows Russian, like literally "a girl with penis" from "asshole-of-a-homosexulaist-labs" and Vlad(imir)Len(in)a (yes, some people really were named in honour of Lenin) daughter-of-sucking-(gerund, grammatically)-of-dicks. The latter is a patronymic (or, in extremely rare circumstances, a matronymic), not a surname: in Russia the respectful way to address a person, especially one who is older or has a higher status in the situation, is <first name> <patronymic>. This at least shows that they haven't done basic checks when accepting signatures. Even if the anti-RMS letter maintainers recorded the signatures faithfully, the fact that anyone can send bullshit there without any difficulty means that we cannot even estimate the amount of bullshit: it may be 1 record, or it may be the whole base of signatories accepted via email. |
Yes, I'm aware of that, so I'm retrieving information only from commit history and repository metadata. The name may be used to improve the confidence level of the mined GitHub IDs, though. We know that a name can be nonsense, but if it's the same name as the GitHub account profile shows (and if the GitHub account has sufficient activity), then it's a good indicator that we have a genuine entry. |
Also, I only aim for a snapshot of the data from before they closed GitHub PRs and switched to email completely. |
This comment was marked as abuse.
This comment has been minimized.
This comment was marked as abuse.
This comment was marked as abuse.
Why does naming them always cause them to lash out? |
This comment was marked as abuse.
@nukeop, I don't think he said anything that deserves blocking. I think he should be unblocked.
IMHO only votes of people with FOSS contributions should be counted. Making a PR that is merged into real projects is not easy. Making more than 1 merged PR is even more difficult.
Not very easy. Even registering an email account without giving a phone number is not easy in 2021.
It is irrelevant. The name just makes it obvious that the anti-RMS letter accepts "signatures" without enough checks. This automatically means that all signatures accepted this way are not valid, at least until the contrary is proved.
You shouldn't speak on behalf of the whole of Russia. Most of the population of the Russian Federation just doesn't give a f**k about what is happening with Stallman. They don't even know who Stallman is.
It shows that the anti-RMS letter doesn't have sufficient checks. If opponents of the anti-RMS letter managed to send some obviously and explicitly fake signatures without being noticed, how many could the proponents of RMS's removal, who are directly interested in making the count as high as possible, have sent? |
He only came here to troll, not participate in the discussion in a meaningful way, so it's just noise. |
@KOLANICH, while I agree with your arguments, I don't think it's time to talk to people who come and shout their point of view in our ears. I already had some spam in many places, was told to avoid some, etc. |
Even if he has an anti-Stallman point of view, it doesn't mean that he should be blocked. |
For each repo of {rms-support-letter/rms-support-letter.github.io, rms-open-letter/rms-open-letter.github.io}:
for each user of {stargazers, forks owners}, compute a vector describing their membership in orgs and companies (1 - member, 0 - non-member). Membership can be detected from the orgs and from the company field in the user's profile.
Compute an Euler diagram (a - b, b - a, a ^ b).
For nonintersecting users, compute the correlation of their position (binary variable, 0 - pro-Stallman, 1 - against-Stallman) with the companies they are members of.
Sort the results and plot the nice plots.
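The steps above can be sketched roughly as follows. This is only a minimal illustration on made-up in-memory data, not the proposer's implementation: the user names, the org names, the `euler_regions` and `company_correlations` helpers, the reading of the third Euler region as "users in both sets", and the choice of the phi coefficient as the correlation measure are all assumptions. Fetching stargazers and fork owners and plotting are left out, and only per-company aggregates are returned.

```python
from math import sqrt

# Hypothetical toy data: user -> set of stated orgs/companies, one dict per repo.
support = {"alice": {"org_x"}, "bob": {"org_y"}, "carol": {"org_x", "org_y"}}
against = {"carol": {"org_x", "org_y"}, "dave": {"org_y"}, "erin": {"org_y"}}


def euler_regions(a, b):
    """Return (only in a, only in b, in both) for two collections of users."""
    a, b = set(a), set(b)
    return a - b, b - a, a & b


def phi(xs, ys):
    """Phi coefficient: Pearson correlation of two equal-length 0/1 sequences."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x and y)
    n10 = sum(1 for x, y in pairs if x and not y)
    n01 = sum(1 for x, y in pairs if not x and y)
    n00 = sum(1 for x, y in pairs if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant column has no defined correlation; report 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def company_correlations(support, against):
    """Correlate company membership with position for non-intersecting users."""
    only_support, only_against, _both = euler_regions(support, against)
    users = sorted(only_support | only_against)
    # Position: 0 = signed only the support letter, 1 = only the opposing one.
    position = [0 if u in only_support else 1 for u in users]
    orgs = {u: (support[u] if u in only_support else against[u]) for u in users}
    companies = sorted(set().union(*orgs.values())) if users else []
    scores = {
        c: phi([int(c in orgs[u]) for u in users], position) for c in companies
    }
    # Sorted from "most associated with position 0" to "most associated with 1".
    return sorted(scores.items(), key=lambda kv: kv[1])


regions = euler_regions(support, against)
ranking = company_correlations(support, against)
```

With four non-intersecting users the numbers are of course meaningless; the point is only the shape of the pipeline: set differences first, then one binary membership column per company correlated against the binary position.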
It also may be possible to train an XGBoost model predicting the position from memberships in companies and repos, and then apply SHAP, and again visualize feature importances.
Stated companies and locations