Problem replicating the "Using Gin as a data source behind the scenes" walkthrough #779

Open
mslw opened this issue Nov 15, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@mslw
Collaborator

mslw commented Nov 15, 2021

Current behavior

I tried to follow the Using Gin as a data source behind the scenes walkthrough from Chapter 8.6 (Dataset hosting on GIN) verbatim, but didn't succeed. There is an error when configuring the gin remote as a common data source, which I don't know how to correct. If I ignore it and carry on, the dataset gets pushed to GitHub & Gin, but without the link between the two (i.e. datalad get in a clone made from GitHub doesn't work). Commands and their outputs below.

As a side note - I assume that the git + gin part of the walkthrough is meant to be stand-alone (it starts by creating a new repository on Gin & uses different repo names than the examples before). If, however, the steps depend on some previous configuration, then the wording might need tweaking.

datalad siblings add -d . --name gin-update --pushurl git@gin.g-node.org:/msz/gin-as-data-source.git --url https://gin.g-node.org/msz/gin-as-data-source --as-common-datasrc gin
[INFO   ] Could not enable annex remote gin-update. This is expected if gin-update is a pure Git remote, or happens if it is not accessible. 
CommandError: 'git -c diff.ignoreSubmodules=none annex initremote gin type=git location=https://gin.g-node.org/msz/gin-as-data-source autoenable=true -c annex.dotfiles=true' failed with exitcode 1 under /home/mszczepanik/Documents/tmp/gg
initremote gin 
failed
git-annex: could not find existing git remote with specified location
initremote: 1 failed

datalad siblings
.: here(+) [git]
[WARNING] Could not detect whether gin-update carries an annex. If gin-update is a pure Git remote, this is expected.  
.: gin-update(-) [https://gin.g-node.org/msz/gin-as-data-source (git)]

git config --unset-all remote.gin-update.annex-ignore

datalad push --to gin-update
copy(ok): hubert-neufeld-j-udI4zim2E-unsplash.jpg (file) [to gin-update...]                                                                 
copy(ok): jay-ruzesky-9zTafGVsv-c-unsplash.jpg (file) [to gin-update...]                                                                    
copy(ok): paul-carroll-Y-nyDv3TWm0-unsplash.jpg (file) [to gin-update...]                                                                   
publish(ok): . (dataset) [refs/heads/git-annex->gin-update:refs/heads/git-annex ce10543..0fb67dd]                                           
publish(ok): . (dataset) [refs/heads/main->gin-update:refs/heads/main [new branch]]                                                         
action summary:                                                                                                                             
  copy (ok: 3)
  publish (ok: 2)

datalad siblings
.: here(+) [git]
.: gin-update(+) [https://gin.g-node.org/msz/gin-as-data-source (git)]

datalad siblings add --dataset . --name github --url git@github.com:mslw/gin-mirror.git
[INFO   ] Could not enable annex remote github. This is expected if github is a pure Git remote, or happens if it is not accessible. 
[WARNING] Could not detect whether github carries an annex. If github is a pure Git remote, this is expected. Remote was marked by annex as annex-ignore. Edit .git/config to reset if you think that was done by mistake due to absent connection etc 
.: github(-) [git@github.com:mslw/gin-mirror.git (git)]

datalad push --to github
publish(ok): . (dataset) [refs/heads/main->github:refs/heads/main [new branch]]                                                             
publish(ok): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [new branch]]                                                   
action summary:                                                                                                                             
  publish (ok: 2)

cd ..

datalad clone git@github.com:mslw/gin-mirror.git
[INFO   ] Unable to parse git config from origin                                                                                            
[INFO   ] Remote origin does not have git-annex installed; setting annex-ignore                                                             
[INFO   ] This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin 
install(ok): /home/mszczepanik/Documents/tmp/gin-mirror (dataset)

cd gin-mirror

datalad get hubert-neufeld-j-udI4zim2E-unsplash.jpg
get(error): hubert-neufeld-j-udI4zim2E-unsplash.jpg (file) [not available; (Note that these git remotes have annex-ignore set: origin)]

Operating system

Debian GNU/Linux 11 (bullseye)

DataLad information

Datalad version 0.15.0
Git annex version 8.20210903-1~ndall+1

Browser

No response

Additional context

The git annex error from datalad siblings add -d . --name gin-update ... --as-common-datasrc gin says "could not find existing git remote with specified location". But that doesn't mean that I need to configure something previously, right? I'd expect the command to do the configuration.

Coincidentally, I also ran into the same error message in a completely separate dataset and use case, when I experimented with git annex initremote (to change a local remote into a special one). So that might be something to do with my setup, but I think that one was replicated by another person.

@mslw mslw added the bug Something isn't working label Nov 15, 2021
@welcome

welcome bot commented Nov 15, 2021

Welcome Banner (Image: CC-BY license, The Turing Way Community, & Scriberia. Zenodo. http://doi.org/10.5281/zenodo.3332808) Hi there, and welcome to the DataLad Handbook! 📙 👋 Thank you for filing an issue. We're excited to have your input and welcome your idea! 😊 If you haven't done so already, please make sure you check out our Code of Conduct.

@mslw
Collaborator Author

mslw commented Nov 15, 2021

Huh, now I cannot replicate my own problem... I'll go through it once more tomorrow, checking my bash history, but it seems that the sample code is fine and the description is also correct.

@adswa
Contributor

adswa commented Nov 16, 2021

Sounds good!

@mslw
Collaborator Author

mslw commented Nov 16, 2021

For the record, it seems that the git-annex: could not find existing git remote with specified location error happens if the repository is public, but not when it's private. We opened a new issue in datalad and also referenced an older datalad issue about .git suffix confusion, as the presence/absence of .git in URLs produces different problems.

That being said, in a private repository and without .git at the end of https URL everything seems to work, so the handbook example still seems fine.

@mslw
Collaborator Author

mslw commented Nov 16, 2021

I am still at a loss here.

Private repo. The datalad siblings add succeeds:

datalad siblings add \
> -d . \
> --name gin-update \
> --pushurl git@gin.g-node.org:/msz/gin-as-data-source.git \
> --url https://gin.g-node.org/msz/gin-as-data-source \
> --as-common-datasrc gin
[INFO   ] Could not enable annex remote gin-update. This is expected if gin-update is a pure Git remote, or happens if it is not accessible. 
[WARNING] Could not detect whether gin-update carries an annex. If gin-update is a pure Git remote, this is expected. Remote was marked by annex as annex-ignore. Edit .git/config to reset if you think that was done by mistake due to absent connection etc 
.: gin-update(-) [https://gin.g-node.org/msz/gin-as-data-source (git)]

... and I'm able to follow through with the following steps:

git config --unset-all remote.gin-update.annex-ignore 
datalad push --to gin-update
(...)
action summary:                                                                                                                             
  copy (ok: 3)
  publish (ok: 2)
datalad siblings add -d . --name github --url git@github.com:mslw/data-in-gin.git
[INFO   ] Could not enable annex remote github. This is expected if github is a pure Git remote, or happens if it is not accessible. 
[WARNING] Could not detect whether github carries an annex. If github is a pure Git remote, this is expected. Remote was marked by annex as annex-ignore. Edit .git/config to reset if you think that was done by mistake due to absent connection etc 
.: github(-) [git@github.com:mslw/data-in-gin.git (git)]
datalad siblings
.: here(+) [git]
.: gin-update(+) [https://gin.g-node.org/msz/gin-as-data-source (git)]
.: github(-) [git@github.com:mslw/data-in-gin.git (git)]
datalad push --to github
(...)
action summary:                                                                                                                             
  publish (ok: 2)

But then, in a clone made from github, the get operation fails:

datalad get jay-ruzesky-9zTafGVsv-c-unsplash.jpg
get(error): jay-ruzesky-9zTafGVsv-c-unsplash.jpg (file) [not available; (Note that these git remotes have annex-ignore set: origin)]

There is a trace of the gin repository being recorded:

git annex whereis jay-ruzesky-9zTafGVsv-c-unsplash.jpg
whereis jay-ruzesky-9zTafGVsv-c-unsplash.jpg (2 copies) 
  	84756a04-af88-439c-9c27-58aea2971eb1 -- mszczepanik@bnbnb64:~/Documents/tmp/gingit
   	ee707470-5aa1-42cd-b47f-157dba749717 -- git@8242caf9acd8:/data/repos/msz/gin-as-data-source.git

But the clone does not have any remotes other than its GitHub origin:

datalad siblings
.: here(+) [git]
.: origin(-) [git@github.com:mslw/data-in-gin.git (git)]

(not sure if relevant, but also note - above - that the source repository only had one remote, gin-update, not gin)
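
One way to narrow this down, sketched under assumptions rather than confirmed behavior: git-annex keeps its record of special remotes in remote.log on the git-annex branch, and a clone can only auto-enable a data source that is listed there with autoenable=true. The helper below lists such remotes, which would show whether a gin special remote was ever recorded. The name autoenabled_remotes is hypothetical, and the one-line-per-remote key=value layout is an assumption about the log format.

```shell
# Hypothetical helper: read git-annex remote.log lines on stdin and print the
# name of every special remote recorded with autoenable=true.
# Assumed line layout: <uuid> key=value key=value ... timestamp=<time>
autoenabled_remotes() {
    awk '{
        name = ""; auto = 0
        for (i = 2; i <= NF; i++) {
            # "name=" is 5 characters, so the value starts at position 6
            if ($i ~ /^name=/) name = substr($i, 6)
            if ($i == "autoenable=true") auto = 1
        }
        if (auto && name != "") print name
    }'
}
```

In the clone, git show git-annex:remote.log | autoenabled_remotes would feed it the real log. Empty output would be consistent with the source dataset having only gin-update and no gin remote.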

@adswa
Contributor

adswa commented Nov 23, 2021

making a note of a comment with a todo: datalad/datalad#6204 (comment)

mslw added a commit that referenced this issue Dec 8, 2021
Suggested add - push - configure - push as a correct sequence. Addresses #779
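
The "add - push - configure - push" sequence named in that commit message could be read as follows. This is a sketch of one interpretation, not a copy of the handbook text: the sibling is added without --as-common-datasrc, the data is pushed to Gin, the sibling is then configured as a common data source, and only afterwards is the dataset pushed to GitHub. The function names step and publish_sequence are hypothetical, as are the three URL arguments. By default the commands are only printed; setting RUN=1 would execute them.

```shell
# Hypothetical wrapper: print a command unless RUN=1, in which case run it.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Assumed reading of "add - push - configure - push".
# $1: Gin SSH push URL, $2: Gin HTTPS URL, $3: GitHub SSH URL
publish_sequence() {
    gin_ssh=$1
    gin_https=$2
    github_ssh=$3
    # add: register the Gin sibling, without declaring a common data source yet
    step datalad siblings add -d . --name gin-update --pushurl "$gin_ssh" --url "$gin_https"
    step git config --unset-all remote.gin-update.annex-ignore
    # push: transfer annexed data and Git history to Gin
    step datalad push --to gin-update
    # configure: now declare the Gin sibling as a common data source
    step datalad siblings configure -d . --name gin-update --as-common-datasrc gin
    # push: publish to GitHub, so that the git-annex branch carries the data source record
    step datalad siblings add -d . --name github --url "$github_ssh"
    step datalad push --to github
}
```

Calling publish_sequence with the two Gin URLs and the GitHub URL from the comments above prints the six commands in order, so they can be reviewed before running them with RUN=1.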