rafaelleinio/README.md

hi! 👋

🤓 facts about me:

  • passionate about technologies that change people's lives
  • coding since 15 years old
  • 12+ years of programming, 7+ with Python, 7+ in data
  • my main interests are in open-source, self-service data platforms, MLOps, DDD, and TDD

💼 I have many years of experience building data platforms and dev tools in modern tech organizations. From small startups (< 50) to 10k+ corporations, I know how to operate at different growth stages.

📚 I studied at the Federal University of São Paulo (UNIFESP), where I earned a bachelor's degree in Science and Technology and another in Computer Science. I also recently finished a Master of Science in intelligent systems. My research over the last 4 years focused on automated anomaly detection for data quality, exploring novel AutoML and Metrics Repository architectures. My university is one of the most prestigious in Brazil and is fully funded by the Brazilian government. In the years I was there, it was ranked in the top 5 in all of LATAM by Times Higher Education.

If you're passionate about data quality like me, you'll definitely like my publication, which can be read here.

tech I've worked with:

  • languages: Python (main language), Shell, SQL
  • dev/data ops: Git, GitHub Actions, Drone CI/CD, CircleCI; Docker, Kubernetes, Helm; Datadog
  • data oss: dbt, Apache Spark (PySpark), Databricks, Airflow, Airbyte; PostgreSQL, Cassandra, MySQL, MongoDB, DynamoDB, Redis, DuckDB; Kafka, NATS.io
  • cloud (AWS): S3, EMR, ECR, Athena, RDS, Redshift, Glue, Lambda, SNS, SQS, EC2
  • cloud (GCP): Composer, Cloud Storage, BigQuery, Datastore, Cloud Run, Compute Engine, Kubernetes Engine, Artifact Registry
  • OS: Linux, macOS
  • 🐍 libs I ❤️: aiohttp, fastapi, pydantic, typer, scrapy, streamlit, tenacity
  • 🐍 libs I ❤️ (Test and Quality): pytest, mypy, flake8, isort, black
  • 🐍 libs I ❤️ (ML/AI/DS): langchain, scikit-learn, prophet, merlion, jupyter, pandas, numpy, matplotlib, seaborn

my open source work 🤘

I'm the creator of the following PyPI packages:

  • biar: batteries-included async requests tool for Python
  • thoth: Python tool for profiling-based anomaly monitoring on ETL data pipelines, leveraging ML and Apache Spark

I'm also the co-creator of butterfree, a tool for feature engineering and feature stores. We created this tool when I was in the first MLOps squad at @quintoandar. It's used for most ML data pipelines there and has 260+ stars on GitHub.

other contributions

I've also made contributions to the following awesome open-source libraries:

  • airflow: the biggest open-source orchestration framework, created by Airbnb
  • aws-sdk-pandas: easy data integration with AWS services, created by AWS
  • merlion: a time series forecasting library for Python, created by Salesforce
  • sageintacct-sdk-py: a Python SDK created by the open-source community for Sage Intacct (a market leader in accounting, payroll, and payments solutions)

my projects

I have a bunch of data engineering test cases that landed me senior positions at competitive tech companies. So before asking me for a take-home assignment, please check these instead 👇

  • strider-challenge: a simple typer and sqlmodel application developed with DDD and TDD
  • pyspark-pipeline: an implementation of a PySpark data aggregation pipeline with automated tests
  • legiti-challenge: a nice solution for building and running feature store pipelines
  • meli-challenge: a graph- and Spark-based solution for the Characters Interactions problem

Here's an archive of old college projects (don't judge me 😅):

  • ntsa: repository for code, reports, and projects for the Nonlinear Time Series Analysis class from the Computer Science master's degree course at the Federal University of São Paulo (UNIFESP)
  • neural-networs: repository for the projects of the 2019 Neural Networks class at the National Institute for Space Research (INPE)
  • software-testing: repository for the projects of the 2020 Software Testing class at the Federal University of São Paulo (UNIFESP)

let's connect!

LinkedIn Badge

@rafaelleinio on Discord
