This repository has been archived by the owner on Sep 24, 2022. It is now read-only.

Research questions

mx edited this page May 1, 2020 · 1 revision

This page lists questions and research areas that may help with directing this group's efforts.

Distribution of open-source project repository hosting

Knowing these numbers will tell us where to look and where it is not worth looking.

  • GitHub
  • GitLab
  • SourceForge (still hosts some excellent projects)
  • Apache Foundation (have they all moved to GitHub by now?)
  • Self-hosted (e.g. Postgres)
  • Other
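
One way to get these numbers is to tally a list of repository URLs by host name. The sketch below is only an illustration: the `KNOWN_HOSTS` mapping and the function names are assumptions, not an agreed method. A URL alone cannot distinguish a self-hosted project from any other unknown host, so both end up in "Other", and Apache projects mirrored under github.com would count as GitHub.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from host domain to the categories listed above.
KNOWN_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "sourceforge.net": "SourceForge",
    "apache.org": "Apache Foundation",
}


def classify_host(url: str) -> str:
    """Return the hosting category for a repository URL."""
    host = (urlparse(url).hostname or "").lower()
    for domain, label in KNOWN_HOSTS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return label
    # Self-hosted projects cannot be told apart from other hosts here.
    return "Other"


def hosting_distribution(urls):
    """Count how many repository URLs fall into each hosting category."""
    return Counter(classify_host(u) for u in urls)
```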

Project size to docs size

Correlating project metrics to documentation metrics may help us assess documentation maturity levels and their distribution.

Project metrics:

  • number of files
  • total size in lines of code
  • total size of code in bytes
  • number of contributors
  • contributors per country
  • number of forks, stars, downloads
  • number of commits
  • amount of comments in the code
  • NPM, NuGet, RubyGems and other distribution numbers
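
Some of the project metrics above (file count, lines, bytes, comments) can be read straight from a local checkout. The following is a rough sketch under stated assumptions: `project_metrics`, the default `.py` suffix, and the `#` comment prefix are hypothetical choices for illustration, and only whole-line comments are counted. Contributors, forks, stars, and download numbers would need the hosting platform's or package registry's data instead.

```python
from pathlib import Path


def project_metrics(root, code_suffixes=(".py",), comment_prefix="#"):
    """Collect simple size metrics for the code files under `root`."""
    files = lines = size = comment_lines = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in code_suffixes:
            continue
        data = path.read_bytes()
        # Decode leniently so a stray non-UTF-8 byte does not abort the scan.
        file_lines = data.decode("utf-8", errors="replace").splitlines()
        files += 1
        size += len(data)
        lines += len(file_lines)
        # Count only lines that consist of a comment; trailing comments
        # after code are not detected by this simple check.
        comment_lines += sum(
            1 for line in file_lines if line.lstrip().startswith(comment_prefix)
        )
    return {
        "files": files,
        "lines": lines,
        "bytes": size,
        "comment_lines": comment_lines,
    }
```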

Documentation metrics:

  • number of documents
  • document format (e.g. .md, .txt, .rst)
  • file names
  • number of commits to doc files
  • correlation of code commits to docs commits or a time series for both
  • number of contributors to doc files and their country
  • size of doc files in lines
  • size of doc files in bytes
  • number of images
  • code examples (.md files only?)
  • grammar score
  • translations
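
Several of the documentation metrics above can be gathered the same way. This is a minimal sketch, assuming that documentation means files ending in .md, .txt, or .rst, that images use the Markdown `![alt](path)` syntax, and that code examples are fenced blocks in .md files. The name `doc_metrics` and these detection rules are illustrative assumptions. Commit counts, contributors, grammar scores, and translations are out of scope here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed set of documentation file formats.
DOC_SUFFIXES = {".md", ".txt", ".rst"}
# Markdown inline image syntax: ![alt](path)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")
# Three backticks, built programmatically so this example nests cleanly.
FENCE = "`" * 3
FENCE_RE = re.compile(r"^[ \t]*" + re.escape(FENCE), re.MULTILINE)


def doc_metrics(root):
    """Collect simple metrics for documentation files under `root`."""
    formats = Counter()
    lines = size = images = code_blocks = 0
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in DOC_SUFFIXES:
            continue
        data = path.read_bytes()
        text = data.decode("utf-8", errors="replace")
        formats[suffix] += 1
        lines += len(text.splitlines())
        size += len(data)
        if suffix == ".md":
            images += len(IMAGE_RE.findall(text))
            # Each fenced block has an opening and a closing fence line.
            code_blocks += len(FENCE_RE.findall(text)) // 2
    return {
        "documents": sum(formats.values()),
        "formats": dict(formats),
        "lines": lines,
        "bytes": size,
        "images": images,
        "code_blocks": code_blocks,
    }
```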

ReadMe analysis

Since ReadMe files are so ubiquitous, it may be helpful to take a deeper look at their structure and contents.

  • Common headings / sections
  • Relation to other docs in the project
  • Basic linguistic analysis, e.g. corpus of words
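
As a starting point for the heading and word analysis, the sketch below extracts headings from Markdown ReadMe text and builds a simple word-frequency table. It is an assumption-laden illustration: only ATX-style headings (lines starting with `#`) are recognised, underlined (setext) headings and other formats are ignored, and the function names are hypothetical.

```python
import re
from collections import Counter

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"(#{1,6})[ \t]+(.+)")
WORD_RE = re.compile(r"[a-z']+")
# Three backticks, built programmatically so this example nests cleanly.
FENCE = "`" * 3


def readme_headings(text):
    """Return (level, title) pairs, skipping lines inside fenced code."""
    headings = []
    in_fence = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(stripped)
        if match:
            headings.append((len(match.group(1)), match.group(2).strip()))
    return headings


def common_headings(readmes):
    """Count in how many ReadMe texts each heading title appears."""
    counts = Counter()
    for text in readmes:
        # A set ensures each title is counted once per ReadMe.
        counts.update({title.lower() for _, title in readme_headings(text)})
    return counts


def word_frequencies(text, top=10):
    """Return the `top` most frequent lowercase words in the text."""
    return Counter(WORD_RE.findall(text.lower())).most_common(top)
```

Calling `common_headings` on a collection of ReadMe texts and then `most_common()` on the result gives a first ranking of typical sections.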