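The code was written in Python 3 and requires the following third-party packages: pandas, NumPy, Matplotlib, seaborn and SciPy. It also uses the collections and warnings modules, which are part of the Python standard library and need no separate installation.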
The motivation behind this analysis is to explore how data scientists compare with other software developers ("non-data scientists") with regard to demographics, programming languages used, coding experience and job satisfaction. To that end, I set out to answer the following questions using data collected by Stack Overflow in its 2018 Annual Developer Survey:
- How does the demographic profile of data scientists differ from that of non-data scientists?
- What programming languages do data scientists favour and how do they differ from those used by non-data scientists?
- How much coding experience do data scientists have compared to non-data scientists?
- Are data scientists more satisfied with their jobs/careers than non-data scientists?
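
Each of the questions above relies on splitting respondents into data scientists and non-data scientists. The following is a minimal sketch of one way to do that split and tally languages per group, using only the standard library. It is not necessarily how the notebook does it: the column names `DevType` and `LanguageWorkedWith`, the semicolon separator and the role label are assumptions about the survey file, and the rows are made-up illustrative records, not survey results.

```python
from collections import Counter

# Assumed role label and column names; check them against the survey schema.
DS_LABEL = "Data scientist or machine learning specialist"


def is_data_scientist(dev_type):
    """Return True if the semicolon-separated role string includes the DS label."""
    if not dev_type:
        return False
    return DS_LABEL in [role.strip() for role in dev_type.split(";")]


def language_counts(rows):
    """Tally languages separately for data scientists and everyone else."""
    counts = {True: Counter(), False: Counter()}
    for row in rows:
        group = is_data_scientist(row.get("DevType"))
        langs = row.get("LanguageWorkedWith") or ""
        counts[group].update(l.strip() for l in langs.split(";") if l.strip())
    return counts[True], counts[False]


# Made-up rows for illustration only.
rows = [
    {"DevType": DS_LABEL + ";Back-end developer", "LanguageWorkedWith": "Python;SQL"},
    {"DevType": "Front-end developer", "LanguageWorkedWith": "JavaScript;HTML"},
    {"DevType": None, "LanguageWorkedWith": "Python"},
    {"DevType": DS_LABEL, "LanguageWorkedWith": "Python;R"},
]

ds_counts, non_ds_counts = language_counts(rows)
print(ds_counts.most_common())
print(non_ds_counts.most_common())
```

In this sketch, respondents with no role recorded fall into the non-data scientist group; a real analysis might prefer to exclude them instead.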
All analysis is contained in the Jupyter notebook DS Survey Analysis.ipynb.
To run this code, first download the 2018 Stack Overflow Developer Survey dataset from https://insights.stackoverflow.com/survey. Then save the folder containing the data (developer_survey_2018) inside a folder named "Data" in the current working directory.
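
Before opening the notebook, it can help to confirm that the data sits where the code expects it. The sketch below checks the folder layout described above. The helper names `survey_dir` and `list_survey_csvs` are hypothetical and not part of the repository, and the sketch assumes only that the folder contains one or more CSV files.

```python
from pathlib import Path


def survey_dir(base="."):
    """Return the folder where the survey data is expected to live."""
    return Path(base) / "Data" / "developer_survey_2018"


def list_survey_csvs(base="."):
    """List CSV file names in the survey folder, or raise if the folder is missing."""
    folder = survey_dir(base)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected survey data in {folder.resolve()}")
    return sorted(p.name for p in folder.glob("*.csv"))


if __name__ == "__main__":
    try:
        print(list_survey_csvs())
    except FileNotFoundError as err:
        print(err)
```

Run it from the directory that contains the notebook; a missing or misnamed folder produces a message showing the full path that was checked.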
The main findings of this analysis are summarised in a blog post available here.
The dataset used in this analysis was created by Stack Overflow and made available for download under the Open Database License (ODbL).
The code contained in this repository may be used freely with acknowledgement.