--is-pipeline relative import error #52

Open
joeyzhou98 opened this issue Aug 27, 2019 · 3 comments
Labels
bug Something isn't working


joeyzhou98 commented Aug 27, 2019

While writing a new crawler for Zenodo, I was looking for a way to test and execute an existing pipeline to observe the expected behavior. The problem is that executing
datalad crawl --is-pipeline datalad_crawler/pipelines/<pipeline>.py
fails with a relative import error. So my question is: how do we successfully test crawling existing pipelines with the --is-pipeline flag? I tested multiple different paths and all gave me the same error:
[ERROR ] Failed to import pipeline from datalad_crawler/pipelines/nda.py: attempted relative import with no known parent package [nda.py:<module>:13] [pipeline.py:load_pipeline_from_module:403] (RuntimeError)
I picked nda.py at random as the pipeline for testing.
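
For context, the error text itself can be reproduced outside of datalad. The sketch below is only an illustration, not the crawler's code: it builds a throwaway package (the names demo_pkg, helper and stage are made up for this example) and imports the same file twice, first as a top-level module and then as a member of its package. Only the second form gives the relative import a parent package to resolve against.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are hypothetical):
#   <root>/demo_pkg/__init__.py
#   <root>/demo_pkg/helper.py
#   <root>/demo_pkg/stage.py   <- uses a relative import
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.makedirs(pkg_dir)
files = {
    '__init__.py': '',
    'helper.py': 'VALUE = 42\n',
    'stage.py': 'from .helper import VALUE\n',
}
for name, content in files.items():
    with open(os.path.join(pkg_dir, name), 'w') as f:
        f.write(content)
importlib.invalidate_caches()

# 1. Put the package directory itself on sys.path and import the file as a
#    top-level module. The module has no parent package, so the relative
#    import inside it cannot be resolved.
sys.path.insert(0, pkg_dir)
try:
    importlib.import_module('stage')
    top_level_error = None
except ImportError as exc:
    top_level_error = str(exc)
finally:
    sys.path.remove(pkg_dir)
    sys.modules.pop('stage', None)

# 2. Put the parent directory on sys.path and import the file by its dotted
#    package name. Now the relative import resolves against demo_pkg.
sys.path.insert(0, root)
try:
    mod = importlib.import_module('demo_pkg.stage')
    package_value = mod.VALUE
finally:
    sys.path.remove(root)

print('top-level import error:', top_level_error)
print('package import value:', package_value)
```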


Edit: It would be great if the documentation could be more in-depth.

@kyleam
Collaborator

kyleam commented Aug 27, 2019

Thanks for reporting.

I suspect --is-pipeline doesn't get much use, and it doesn't look to be covered by tests. load_pipeline_from_module() has a bit that's supposed to handle these relative imports:

    dirname_ = dirname(module)
    assert(module.endswith('.py'))
    try:
        sys.path.insert(0, dirname_)
        modname = basename(module)[:-3]
        # to allow for relative imports within "stock" pipelines
        if dirname_ == opj(dirname(__file__), 'pipelines'):
            mod = __import__('datalad_crawler.pipelines.%s' % modname,
                             fromlist=['datalad_crawler.pipelines'])
        else:
            mod = __import__(modname, level=0)

The problem is that we don't go down the if-arm: the condition compares the two paths exactly as given, so it only matches when dirname_ and __file__ are in the same form (both relative or both absolute), which isn't necessarily the case. As a quick and dirty fix, we can work around this with

diff --git a/datalad_crawler/pipeline.py b/datalad_crawler/pipeline.py
index a23c70e..4f117c3 100644
--- a/datalad_crawler/pipeline.py
+++ b/datalad_crawler/pipeline.py
@@ -50,6 +50,7 @@
 
 import sys
 from glob import glob
+from os.path import abspath
 from os.path import dirname, join as opj, isabs, exists, curdir, basename
 from os import makedirs
 
@@ -391,7 +392,7 @@ def load_pipeline_from_module(module, func=None, args=None, kwargs=None, return_
         sys.path.insert(0, dirname_)
         modname = basename(module)[:-3]
         # to allow for relative imports within "stock" pipelines
-        if dirname_ == opj(dirname(__file__), 'pipelines'):
+        if abspath(dirname_) == opj(abspath(dirname(__file__)), 'pipelines'):
             mod = __import__('datalad_crawler.pipelines.%s' % modname,
                              fromlist=['datalad_crawler.pipelines'])
         else:
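
To make the effect of that change concrete, here is a minimal standalone sketch of the comparison, not the actual datalad_crawler code. The helper name is_stock_pipeline_dir and the install location under the current directory are assumptions made for the illustration; the comparison expression itself is the one from the diff.

```python
import os
from os.path import abspath, dirname, join as opj


def is_stock_pipeline_dir(dirname_, package_file):
    """Return True if dirname_ is the 'pipelines' directory next to package_file.

    Hypothetical helper: both sides are normalized with abspath() first, so a
    relative dirname_ and an absolute package_file can still match.
    """
    return abspath(dirname_) == opj(abspath(dirname(package_file)), 'pipelines')


# Assumed layout for the illustration: the package lives under the current
# directory, and the user passed a relative path on the command line.
package_file = opj(os.getcwd(), 'datalad_crawler', 'pipeline.py')
relative_dir = opj('datalad_crawler', 'pipelines')

# Original condition: a relative string compared against an absolute one.
naive_match = relative_dir == opj(dirname(package_file), 'pipelines')

# Patched condition: both sides made absolute before comparing.
normalized_match = is_stock_pipeline_dir(relative_dir, package_file)

print(naive_match, normalized_match)
```

Note that abspath() does not resolve symlinks, so a path that reaches the same directory through a symlink would still not match in this sketch.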

But that just gets us to another failure:

$ datalad crawl --is-pipeline datalad_crawler/pipelines/nda.py
[INFO   ] Loading pipeline definition from datalad_crawler/pipelines/nda.py 
[ERROR  ] Failed to import pipeline from datalad_crawler/pipelines/nda.py: pipeline() missing 1 required positional argument: 'collection' [pipeline.py:load_pipeline_from_module:402] [pipeline.py:load_pipeline_from_module:404] (RuntimeError) 

So --is-pipeline needs some attention.
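
The second failure is an ordinary Python calling error rather than an import problem: the pipeline function requires an argument and is being called without it. A minimal sketch of that situation follows. Both functions are hypothetical stand-ins; this is not how datalad_crawler actually forwards arguments, and whether --is-pipeline offers a way to supply them is not established in this thread.

```python
def pipeline(collection):
    """Hypothetical stand-in for a stock pipeline that requires an argument."""
    return ['crawl', collection]


def call_pipeline(func, args=None, kwargs=None):
    """Call func with optional positional and keyword arguments.

    Hypothetical loader step, loosely modeled on the args/kwargs parameters
    visible in the load_pipeline_from_module() signature.
    """
    return func(*(args or ()), **(kwargs or {}))


# Calling without arguments reproduces the kind of error shown in the log.
try:
    call_pipeline(pipeline)
    error = None
except TypeError as exc:
    error = str(exc)

# Supplying the required argument lets the call succeed.
result = call_pipeline(pipeline, kwargs={'collection': 'example'})

print(error)
print(result)
```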

kyleam added the bug (Something isn't working) label Aug 27, 2019
@joeyzhou98
Author

@kyleam thanks for the quick reply!

How would you test new or existing pipelines? That is, what are the commands to execute them?

@kyleam
Collaborator

kyleam commented Aug 27, 2019

what are the commands to execute them?

Have you tried following the demo here?
