Spark NLP uses the GPU for annotators/models that run on TensorFlow as a backend. What Apache Spark itself does with GPUs has nothing to do with deep learning libraries like TensorFlow or PyTorch; Apache Spark 3.x only accelerates some SQL queries and joins on the GPU.

The TensorFlow shipped with Spark NLP is 2.4.x, which requires CUDA 11 and cuDNN 8.x. The platforms you mentioned already have those installed; any platform that supports TensorFlow or PyTorch today comes with CUDA 11, even Google Colab, so there is no need to install or reinstall anything. Just make sure to choose the option that has CUDA 11.x and not 10.x (providers still offer CUDA 10.x to support older TensorFlow and PyTorch releases). My recommendation is to first select the correct target that has CUDA 11: the correct runtime (as in Databricks), the correct release version (as in EMR), or the right instance type (as in Dataproc).
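For reference, here is a minimal sketch of what the Python side looks like once the cluster image already provides CUDA 11 and cuDNN 8.x. It assumes the `spark-nlp` and `pyspark` packages are installed; `sparknlp.start(gpu=True)` is the switch that pulls the `spark-nlp-gpu` package instead of the CPU one, and the rest of the pipeline code stays the same.

```python
# Minimal sketch, assuming `spark-nlp` and `pyspark` are installed and the
# cluster image already ships CUDA 11 / cuDNN 8.x.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

# gpu=True makes sparknlp.start() load the GPU build (spark-nlp-gpu)
# instead of the CPU build; no other code changes are needed.
spark = sparknlp.start(gpu=True)

# A tiny pipeline just to confirm the session starts and annotators load.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

pipeline = Pipeline(stages=[document_assembler, tokenizer])

data = spark.createDataFrame(
    [["Spark NLP runs its TensorFlow-backed annotators on the GPU."]]
).toDF("text")

pipeline.fit(data).transform(data).select("token.result").show(truncate=False)
```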
With the upgrade to Spark 3.0, Spark can utilize GPU clusters natively with no code change.
I see in the Spark NLP notes that in order to use GPUs, CUDA 11 and cuDNN 8.0.2 need to be installed.
Is that the case? Will it not just piggyback off of Spark? I'm just wondering how much of a hassle it would be to get an EMR or Dataproc cluster with those installed.
(Sorry for my limited knowledge of architecture)
Please and thanks