Spark NLP uses the GPU for annotators/models that run on TensorFlow as a backend. What Apache Spark itself does with GPUs has nothing to do with deep learning libraries like TensorFlow or PyTorch; Apache Spark 3.x only accelerates some SQL queries and joins on the GPU.

The TensorFlow shipped with Spark NLP is 2.4.x, which requires CUDA 11 and cuDNN 8.x. The platforms you mentioned already have those installed; any platform that supports TensorFlow or PyTorch today comes with CUDA 11, even Google Colab, so there is no need to install or reinstall anything. Just make sure to choose the option that has CUDA 11.x and not 10.x (providers still offer CUDA 10.x to support older TensorFlow and PyTorch releases). My recommendation is to first select the correct target that has CUDA 11: the correct runtime (as in Databricks), the correct release version (as in EMR), or the right instance type (as in Dataproc).
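For reference, here is a minimal sketch of what the Python side looks like once the cluster image already provides CUDA 11 and cuDNN 8.x. It assumes the `spark-nlp` and `pyspark` packages are installed; `sparknlp.start(gpu=True)` is the switch that pulls the `spark-nlp-gpu` package instead of the CPU one, and the rest of the pipeline code stays the same.

```python
# Minimal sketch, assuming `spark-nlp` and `pyspark` are installed and the
# cluster image already ships CUDA 11 / cuDNN 8.x.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from pyspark.ml import Pipeline

# gpu=True makes sparknlp.start() load the GPU build (spark-nlp-gpu)
# instead of the CPU build; no other code changes are needed.
spark = sparknlp.start(gpu=True)

# A tiny pipeline just to confirm the session starts and annotators load.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

pipeline = Pipeline(stages=[document_assembler, tokenizer])

data = spark.createDataFrame(
    [["Spark NLP runs its TensorFlow-backed annotators on the GPU."]]
).toDF("text")

pipeline.fit(data).transform(data).select("token.result").show(truncate=False)
```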
With the upgrade to Spark 3.0, Spark can utilize GPU clusters natively with no code change.
I see in the Spark NLP notes that in order to use GPUs, CUDA 11 and cuDNN 8.0.2 need to be installed.
Is that the case? Will it not just piggyback off of Spark? I'm just wondering how much of a hassle it would be to get an EMR or Dataproc cluster with those installed.
(Sorry for my limited knowledge of architecture)
Please and thanks