Rafael Leiniö rafaelleinio

hi! 👋

🤓 facts about me:

passionate about technologies that change people's lives
coding since 15 years old
12+ years of programming, 7+ with Python, 7+ in data
my main interests are in open-source, self-service data platforms, MLOps, DDD, and TDD

💼 I have many years of experience building data platforms and dev tools in modern tech organizations. From small startups (< 50) to 10k+ corporations, I know how to operate in different growth stages.

📚 I studied at the Federal University of São Paulo (UNIFESP), where I earned a bachelor's degree in Science and Technology and another in Computer Science. I also recently finished a Master of Science in intelligent systems. My research over the last 4 years focused on automated anomaly detection for data quality, exploring novel AutoML and Metrics Repository architectures. UNIFESP is one of the most prestigious universities in Brazil and is fully funded by the Brazilian government. During my years there it was ranked in the top 5 in all of LATAM by Times Higher Education.

If you're passionate about data quality like me, you'll definitely like my publication which can be read here.

tech I've worked with:


languages	Python (main language), Shell, SQL
dev/data ops	Git, GitHub Actions, Drone CI/CD, CircleCI; Docker, Kubernetes, Helm; Datadog
data oss	dbt, Apache Spark (PySpark), Databricks, Airflow, Airbyte; PostgreSQL, Cassandra, MySQL, MongoDB, DynamoDB, Redis, DuckDB; Kafka, NATS.io
cloud	AWS: S3, EMR, ECR, Athena, RDS, Redshift, Glue, Lambda, SNS, SQS, EC2; GCP: Composer, Cloud Storage, BigQuery, Datastore, Cloud Run, Compute Engine, Kubernetes Engine, Artifact Registry
OS	Linux, macOS
🐍 libs I ❤️	aiohttp, fastapi, pydantic, typer, scrapy, streamlit, tenacity; Test and Quality: pytest, mypy, flake8, isort, black; ML/AI/DS: langchain, scikit-learn, prophet, merlion, jupyter, pandas, numpy, matplotlib, seaborn

my open source work 🤘

I'm the creator of the following PyPI packages:

biar: batteries-included async requests tool for python
thoth: Python tool for profiling-based anomaly monitoring on ETL data pipelines leveraging ML and Apache Spark.

I'm also the co-creator of butterfree, a tool for feature engineering and feature stores. We created it when I was in the first MLOps squad at @quintoandar. It's used for most ML data pipelines there and has 260+ stars on GitHub.

other contributions

I've also made contributions to the following awesome open-source libraries:

airflow: the biggest open-source orchestration framework, created by Airbnb
aws-sdk-pandas: easy data integration with AWS services, created by AWS
merlion: a time series forecasting library for Python, created by Salesforce
sageintacct-sdk-py: a python SDK created by the open-source community for Sage Intacct (a market leader for solutions for accounting, payroll, and payments)

my projects

I have a bunch of data engineering test cases that landed me Senior positions at competitive tech companies. So before asking me for a take-home assignment, please check these instead 👇

strider-challenge: a simple typer and sqlmodel application developed with DDD and TDD
pyspark-pipeline: shows the implementation of a pyspark data aggregation pipeline with automated tests
legiti-challenge: a solution for building and running feature store pipelines
meli-challenge: a solution for the character interactions problem using graphs and Spark

Here's an archive of old college projects (don't judge me 😅):

ntsa: repository for code, reports, and projects for the Nonlinear Time Series Analysis class of the Computer Science Master's Degree course at the Federal University of São Paulo (UNIFESP)
neural-networs: repository for the projects of the 2019 Neural Networks class at the National Institute for Space Research (INPE)
software-testing: repository for the projects of the 2020 Software Testing class at the Federal University of São Paulo (UNIFESP)

let's connect!

@rafaelleinio on Discord
