This repository contains the materials for the PySpark workshop at AMLD2019.
To install on Unix, see INSTALLATION_UNIX.md in the docs folder.
To install on Windows, see INSTALLATION_WINDOWS.md in the docs folder.
To use Google Colab, see GOOGLECOLAB_README.md in the docs folder.
For data processing:
If you run PySpark on your laptop, start with the notebook data_processing_start.ipynb in the src folder.
If you run PySpark on Google Colab, start with the notebook data_processing_gc_start.ipynb in the src folder.
For Spark MLlib:
If you run PySpark on your laptop, start with the notebook spark_mllib_start.ipynb in the src folder.
If you run PySpark on Google Colab, start with the notebook spark_mllib_gc_start.ipynb in the src folder.
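The choice of starting notebook can be summarized as a small lookup. The snippet below is only an illustrative sketch, not part of the workshop materials: the helper names `running_on_colab` and `start_notebook`, the part labels `data_processing` and `spark_mllib`, and the `src/` path prefix are assumptions made for this example. It guesses that you are on Google Colab when the `google.colab` module can be found.

```python
import importlib.util

# Notebook file names come from this README; the dictionary layout and the
# "src/" prefix are assumptions made for this illustration.
START_NOTEBOOKS = {
    "data_processing": {
        "laptop": "src/data_processing_start.ipynb",
        "colab": "src/data_processing_gc_start.ipynb",
    },
    "spark_mllib": {
        "laptop": "src/spark_mllib_start.ipynb",
        "colab": "src/spark_mllib_gc_start.ipynb",
    },
}


def running_on_colab() -> bool:
    """Heuristic (assumption): Colab runtimes ship the google.colab module."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ImportError:
        # Raised when the parent package "google" is not installed at all.
        return False


def start_notebook(part: str, on_colab=None) -> str:
    """Return the starting notebook for a workshop part (hypothetical helper)."""
    if on_colab is None:
        on_colab = running_on_colab()
    environment = "colab" if on_colab else "laptop"
    return START_NOTEBOOKS[part][environment]


if __name__ == "__main__":
    for part in START_NOTEBOOKS:
        print(part, "->", start_notebook(part))
```

Passing `on_colab=True` or `on_colab=False` overrides the detection, which is useful if the heuristic guesses wrong on your machine.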
For AWS, see AWS_README.md in the docs folder.