The content of this course is still under construction and will be updated throughout the semester.
Welcome to the Data Science Course repository for the Fall 2024 semester! This course is designed to provide a comprehensive introduction to Data Science, covering key concepts, tools, and techniques used in the field. Whether you are a beginner or have some prior experience, this course will help you build a solid foundation and enable you to work on practical projects.
The course is divided into several modules, each focusing on a different aspect of Data Science. You will find the following resources organized in each module folder:
- Lecture Notes: Detailed notes covering key concepts and explanations.
- Notebooks: Interactive Jupyter notebooks with examples and exercises.
- Readings: Supplementary reading materials and references.
- Exercises: Hands-on exercises to practice the concepts covered in the lectures.
- Projects: Capstone projects to apply your knowledge in real-world scenarios.
Each module will focus on a core component of Data Science:
- Introduction to Data Science
- Python Basics
- Data Collection and Cleaning
- Exploratory Data Analysis (EDA)
- Data Visualization
- Statistical Inference
- Machine Learning Basics
- Supervised Learning
- Unsupervised Learning
- Feature Engineering
- Model Evaluation and Tuning
- Introduction to Deep Learning
- Final Project
- Clone the repository: You can clone this repository to your local machine using the following command:
git clone https://github.com/your-username/data-science-course-fall2024.git
- Navigating the Modules: Each module has its own folder with specific materials, exercises, and projects. Follow the order of modules as outlined in the syllabus, and complete the exercises before moving to the next module.
- Working with Jupyter Notebooks: Jupyter notebooks will be used throughout the course for coding exercises. You can launch Jupyter by running:
jupyter notebook
- Submitting Assignments: Each exercise and project folder will include submission instructions. Typically, you’ll be asked to submit your Jupyter notebook solutions via a pull request or directly on the course platform.
This course runs for 14 weeks, with each two-week block covering one or two modules. The recommended timeline is as follows:
- Week 1-2: Introduction to Data Science & Data Collection
- Week 3-4: Python Basics
- Week 5-6: Data Cleaning & EDA
- Week 7-8: Data Visualization & Statistical Inference
- Week 9-10: Machine Learning Basics & Supervised Learning
- Week 11-12: Unsupervised Learning & Feature Engineering
- Week 13-14: Model Evaluation & Final Project
In this course, we will be using the following tools and libraries:
- Python: The main programming language for this course.
- Pandas: For data manipulation.
- NumPy: For numerical computations.
- Matplotlib/Seaborn: For data visualization.
- Scikit-Learn: For machine learning algorithms.
- Jupyter Notebooks: For interactive coding.
- Google Colab: For collaborative, team-based work in the browser.
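To see which of the libraries listed above are already available in your environment, a minimal sketch like the following may help. It uses only the Python standard library. The import names in `COURSE_PACKAGES` are an assumption based on the list above (note that Scikit-Learn is imported as `sklearn`), and `check_packages` is a hypothetical helper, not part of the course materials.

```python
import importlib.util

# Import names for the libraries listed above (an assumption, not an official
# course requirements list). Scikit-Learn is imported as "sklearn", and the
# Jupyter Notebook package is imported as "notebook".
COURSE_PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "notebook"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(COURSE_PACKAGES).items():
        status = "installed" if found else "MISSING"
        print(f"{name:12s} {status}")
```

Any package reported as missing can usually be installed with pip or through Anaconda, as described in the installation steps in this README.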
Jupyter Notebook is an essential tool for data science, allowing you to create and share documents that contain live code, equations, visualizations, and narrative text. Here's how you can install it on your system.
Anaconda is a popular platform that comes with Jupyter Notebook and a variety of other essential data science libraries pre-installed. This is the easiest way to get started.
- Download Anaconda:
  - Go to the official Anaconda website: https://www.anaconda.com/products/distribution.
  - Download the installer for your operating system (Windows, macOS, or Linux).
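- Install Anaconda:
  - Run the installer and follow the instructions.
  - On Windows, the installer offers an option to add Anaconda to your system’s PATH environment variable. The installer itself marks this option as not recommended; if you leave it unchecked, run the commands in this guide from Anaconda Prompt instead of the regular command prompt.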
- Launch Jupyter Notebook:
  - Open Anaconda Navigator (you can search for it in your start menu or applications).
  - From the Anaconda Navigator interface, click on the "Launch" button under Jupyter Notebook.
Alternatively, you can launch Jupyter Notebook directly via the command line:
- Open your command prompt (Windows) or terminal (macOS/Linux).
- Type:
jupyter notebook
- A new tab should open in your default browser with the Jupyter Notebook interface.
If you already have Python installed on your system and prefer a lightweight installation without Anaconda, you can install Jupyter Notebook using pip.
- Install Python and pip:
  - If you don't already have Python installed, download it from: https://www.python.org/downloads/.
  - During installation, ensure you check the option to "Add Python to PATH."
- Install Jupyter Notebook:
  - Open your command prompt (Windows) or terminal (macOS/Linux).
  - Run the following command to install Jupyter:
pip install notebook
- Launch Jupyter Notebook:
  - Once installation is complete, launch Jupyter Notebook by typing:
jupyter notebook
  - This will start the notebook server and open the Jupyter interface in your web browser.
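If the `jupyter` command is not recognized after installation, the usual cause is that the install location is not on your PATH. The following is a minimal sketch, using only the standard library, that reports the running Python version and whether the `pip` and `jupyter` commands can be found. `find_command` is a hypothetical helper name introduced here for illustration.

```python
import shutil
import sys


def find_command(name):
    """Return the full path of a command found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.x.y (...)".
    print("Python version:", sys.version.split()[0])
    for command in ("pip", "jupyter"):
        path = find_command(command)
        print(f"{command}: {path if path else 'not found on PATH'}")
```

If `jupyter` is reported as not found, running `python -m notebook` is a common alternative that does not depend on the command being on PATH.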
If you're looking for a cloud-based solution that doesn't require installation, you can use Google Colaboratory (Colab). Colab allows you to write and execute Python code in your browser with access to various data science libraries, and it’s free to use.
- Go to Google Colab: https://colab.research.google.com.
- Log in with your Google account.
- Create a new notebook by clicking on "New Notebook".
- Start coding! Colab supports many of the same features as Jupyter and is great for collaborative projects.
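Because the same notebook may be run both locally and in Colab, it can be useful for a notebook to detect where it is running, for example to decide where to look for data files. The sketch below uses a widely used check that assumes the Colab runtime has already loaded the `google.colab` module when the notebook starts. `running_in_colab` is a hypothetical helper name, not part of any library.

```python
import sys


def running_in_colab():
    """Return True if the google.colab module is loaded in this session.

    Assumption: the Colab runtime imports google.colab at startup, so its
    presence in sys.modules indicates a Colab session.
    """
    return "google.colab" in sys.modules


if __name__ == "__main__":
    if running_in_colab():
        print("Running in Google Colab")
    else:
        print("Running outside Colab (for example, local Jupyter)")
```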
The final project will allow you to apply everything you’ve learned to a real-world dataset analysis or machine learning problem. Detailed instructions and datasets will be provided in the final project folder. This project is a critical component of the course, and it can serve as a portfolio piece.
Feel free to contribute to the course materials by creating a pull request. If you find a bug or have suggestions for improvements, please open an issue in the repository.
For any questions or issues, feel free to reach out:
- Instructor: Abbas Pak
Happy learning and coding! 🎓🚀