
Cannot download Deep Learning models from SparkNLP model hub #14378

Open
olivierr42 opened this issue Aug 23, 2024 · 3 comments

@olivierr42
Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

@maziyarpanahi
I saw you answered similar requests in the past. Thank you in advance.

What are you working on?

I am working with an in-house dataset. This is not an official example. I am trying to use this model specifically:
https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/embeddings/xlm_roberta_embeddings/index.html

I get the same issue when trying to load the SentenceDetectorDL model (mentioned on the Hub for this model).

Current Behavior

When I try to instantiate my pipeline:

  # input_col and output_col are column-name strings defined elsewhere in the application.
  from pyspark.ml import Pipeline
  from sparknlp.base import DocumentAssembler
  from sparknlp.annotator import SentenceDetector, XlmRoBertaSentenceEmbeddings

  document_assembler = DocumentAssembler().setInputCol(input_col).setOutputCol("document")

  sentencer = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")

  embeddings = (
      XlmRoBertaSentenceEmbeddings.pretrained("multilingual_e5_base", "xx")
      .setInputCols(["sentence"])
      .setOutputCol(output_col)
  )

  pipeline = Pipeline().setStages([document_assembler, sentencer, embeddings])

I get the following error:

answer = 'xro63'
gateway_client = <py4j.clientserver.JavaClient object at 0x13f3dd710>
target_id = 'z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader'
name = 'downloadModel'

    def get_return_value(answer, gateway_client, target_id=None, name=None):
        """Converts an answer received from the Java gateway into a Python object.
    
        For example, string representation of integers are converted to Python
        integer, string representation of objects are converted to JavaObject
        instances, etc.
    
        :param answer: the string returned by the Java gateway
        :param gateway_client: the gateway client used to communicate with the Java
            Gateway. Only necessary if the answer is a reference (e.g., object,
            list, map)
        :param target_id: the name of the object from which the answer comes from
            (e.g., *object1* in `object1.hello()`). Optional.
        :param name: the name of the member from which the answer comes from
            (e.g., *hello* in `object1.hello()`). Optional.
        """
        if is_error(answer)[0]:
            if len(answer) > 1:
                type = answer[1]
                value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
                if answer[1] == REFERENCE_TYPE:
>                   raise Py4JJavaError(
                        "An error occurred while calling {0}{1}{2}.\n".
                        format(target_id, ".", name), value)
E                   py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
E                   : java.lang.UnsatisfiedLinkError: no jnitensorflow in java.library.path

Expected Behavior

I know support for M1 is experimental, but I would expect it not to crash, especially since I am able to run Word2Vec models without issue.
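
For reference, a minimal sketch of the kind of Word2Vec pipeline that does run without error (a standard Tokenizer stage and the default pretrained Word2VecModel are assumed here, not taken from the report above):

  from pyspark.ml import Pipeline
  from sparknlp.base import DocumentAssembler
  from sparknlp.annotator import Tokenizer, Word2VecModel

  document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

  tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

  # Word2Vec is a classic (non-deep-learning) annotator, so it does not depend on
  # the native TensorFlow library that fails to load above.
  word2vec = (
      Word2VecModel.pretrained()  # default pretrained Word2Vec model (assumed)
      .setInputCols(["token"])
      .setOutputCol("embeddings")
  )

  pipeline = Pipeline().setStages([document_assembler, tokenizer, word2vec])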

Steps To Reproduce

  # input_col and output_col are column-name strings defined elsewhere in the application.
  from pyspark.ml import Pipeline
  from sparknlp.base import DocumentAssembler
  from sparknlp.annotator import SentenceDetector, XlmRoBertaSentenceEmbeddings

  document_assembler = DocumentAssembler().setInputCol(input_col).setOutputCol("document")

  sentencer = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")

  embeddings = (
      XlmRoBertaSentenceEmbeddings.pretrained("multilingual_e5_base", "xx")
      .setInputCols(["sentence"])
      .setOutputCol(output_col)
  )

  pipeline = Pipeline().setStages([document_assembler, sentencer, embeddings])

Spark NLP version and Apache Spark

sparknlp = '5.3.3'
pyspark = '3.5.1'

Type of Spark Application

Python Application

Java Version

java version "1.8.0_411"

Java Home Directory

/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home

Setup and installation

poetry add sparknlp=5.3.3

Operating System and Version

Mac M1, macOS Sonoma 14.5

Link to your project (if available)

No response

Additional Information

I do not have issues with Word2Vec models. I also tried with Spark NLP 5.4.1, to no avail.

@maziyarpanahi
Member

Hi @olivierr42

The support for Apple Silicon is experimental at this point. This is true for all DL-based models/annotators. Word2Vec is implemented purely as a classic machine learning algorithm, so it works independently of the operating system.

@olivierr42
Author

It seems like the issue is with downloading the model. There seems to be a way to load models from local storage, but I cannot seem to make it work (it tries to find an assets subfolder within the model folder, which does not exist if I download from the provided URL).

Do you have any tips to make it work locally?

@maziyarpanahi
Member

What is the error when downloading models? You can always test it quickly in Google Colab to be sure whether it's the model or your environment.

Spark NLP works 100% offline. You can follow these instructions, which show how to download any model, extract it, and use .load() instead of .pretrained(): https://sparknlp.org/docs/en/install#offline

PS: Your Spark application must have access to that local path
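
A minimal sketch of that offline approach, assuming the model archive from the hub has already been downloaded and extracted (the local path below is a placeholder; the column names follow the pipeline from this issue):

  from sparknlp.annotator import XlmRoBertaSentenceEmbeddings

  # Load the extracted model from a local directory instead of downloading it.
  # "/path/to/multilingual_e5_base_xx" is a placeholder for the unzipped archive.
  embeddings = (
      XlmRoBertaSentenceEmbeddings.load("/path/to/multilingual_e5_base_xx")
      .setInputCols(["sentence"])
      .setOutputCol("embeddings")
  )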
