Data Familiarisation #2

Open
brylie opened this issue May 31, 2024 · 0 comments

brylie commented May 31, 2024

Examine the structure, data types, volume, and velocity of the data.

Goals and Purpose:

  • Understand the basic characteristics of the data.
  • Identify any immediate issues or considerations for data handling.
  • Establish a foundation for further data analysis and processing.

Steps:

  1. Load the dataset into a suitable environment (e.g., JupyterLab).
  2. Examine the structure of the data, including:
    • Columns and rows
    • Data types of each attribute
    • Basic statistics (e.g., mean, median, standard deviation)
  3. Assess the volume of the data (e.g., number of records, file size).
  4. Analyse the velocity of the data (e.g., how often it is updated and whether it arrives as a stream).
  5. Document your findings, highlighting any potential issues or areas for further investigation.
  6. Share the data familiarisation report with the team and gather feedback.
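
As a starting point for steps 2 to 4, here is a minimal sketch of a first profiling pass. It is only an illustration: the issue does not name a dataset, so the sample CSV, its column names (`timestamp`, `sensor_id`, `reading`), and the helper names `infer_type`, `profile`, and `update_intervals` are all hypothetical. It uses only the Python standard library so it runs in any notebook; a dataframe library would likely make the same checks shorter.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical sample data; the real dataset and its columns are not specified.
SAMPLE_CSV = """timestamp,sensor_id,reading
2024-05-01T00:00:00,a,1.5
2024-05-01T00:10:00,b,2.5
2024-05-01T00:20:00,a,
2024-05-01T00:30:00,b,4.0
"""


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    casts = (("int", int), ("float", float), ("datetime", datetime.fromisoformat))
    for name, cast in casts:
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def profile(text):
    """Summarise structure, types, basic statistics, and volume of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    report = {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "size_bytes": len(text.encode("utf-8")),
        "columns": {},
    }
    for col in columns:
        values = [row[col] for row in rows]
        kind = infer_type(values)
        info = {"type": kind, "missing": sum(v == "" for v in values)}
        if kind in ("int", "float"):
            nums = [float(v) for v in values if v != ""]
            info["mean"] = statistics.mean(nums)
            info["median"] = statistics.median(nums)
            # Sample standard deviation needs at least two values.
            info["stdev"] = statistics.stdev(nums) if len(nums) > 1 else 0.0
        report["columns"][col] = info
    return report


def update_intervals(text, column):
    """Seconds between consecutive timestamps: a rough proxy for velocity."""
    rows = csv.DictReader(io.StringIO(text))
    stamps = sorted(datetime.fromisoformat(r[column]) for r in rows if r[column])
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    report = profile(SAMPLE_CSV)
    print(report["n_rows"], "rows,", report["n_columns"], "columns")
    for name, info in report["columns"].items():
        print(name, info)
    print("median update interval (s):",
          statistics.median(update_intervals(SAMPLE_CSV, "timestamp")))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to `profile`; for large files a streaming pass would be more appropriate than loading everything into memory. The per-column `missing` counts and the spread of update intervals are the kind of findings worth recording in step 5.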