Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a few tests, refactoring #44

Merged
merged 18 commits into from
Oct 15, 2023
Merged

Conversation

felix-schott
Copy link
Collaborator

@felix-schott felix-schott commented Oct 13, 2023

Hi @cholmes,

As promised, I added a few tests (#36) and refactored the code a bit. I prefer to work in containers, so I added a Dockerfile for developing as well. When working in a container, you can avoid things like the issue Darren reported (DuckDB spatial extension not installed) as you develop in a more controlled environment. If you're on linux, you can simply run the dev-container.sh script or make use of the VSCode container extensions.

Apart from adding tests, I changed a few things:

  • make the download function (more) stand-alone, decouple from CLI
  • add docstring to download function
  • centralised settings module
  • some input validation
  • type hints
  • allow dst to be a directory

Sorry for overloading this PR, some changes could have arguably been a separate PR. There are more tests that could/should be added, I focussed on the download function for now (might add more in the future).

Also, I updated the CI/CD pipeline - I'm not too familiar with Github actions and wasn't sure how to test, so not guaranteed that this will work (we'll find out).

One important thing: I added tests for all the different data formats. Downloads for all pass except for shapefile - at least in my environment. Not sure if this is a bug or something wrong with my setup? Has this ever worked? Hash out the if != shapefile bit in the test to reproduce.

Thanks for reviewing these changes, let me know if anything is unclear.
Felix

@felix-schott
Copy link
Collaborator Author

I created a bug report for the shapefile issue (#48), I believe this can be dealt with in a separate PR as it doesn't seem to be related to the test setup.

Copy link
Collaborator

@cholmes cholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this - it looks good to me, and definitely beyond my python skills, so I'm going to merge it in so we can start running tests.

},
extensions={
Format.SHAPEFILE: 'shp',
Format.GEOJSON: 'json',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to have this recognize both .geojson and .json as 'GeoJSON'? My previous solution was a total hack:

    output_extension = {
        'shapefile': '.shp',
        'geojson': 'json',

It just checked for ending in 'json' so it'd grab both .json and .geojson. Tagged it in #49 - don't need more in this PR, and would love to get this one in asap to start to have a test framework.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good point!

sources: Dict[Source, SourceSettings]
extensions: Dict[Format, str]

settings = SettingsSchema(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this, makes a ton of sense. I'm learning more python just reading this PR :)

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

I prefer to work in containers, so I added a Dockerfile for developing as well.

elif source.lower() == "overture":
data_path = "s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-quad-hive/*/*.parquet"
hive_partitioning = True
match source.lower():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i realise match case was only introduced in python 3.10 - will revert to if/else to support older versions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds great. As I mentioned in the other comment I don't know the tradeoffs in python well enough to reason about what version to target, but I generally lean towards making it easier for users as long as it's not onerous for devs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think it's better to support older versions if it's not too much hassle :)

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

So looks like 'match' is syntax from 3.10, and our test suite does 3.8 and 3.9. I have no idea about the python ecosystem, and how important it is to support older versions (and how many people are on like 3.10 and above), so I'm probably fine to just remove those tests if 3.8 and 3.9 aren't worth the tradeoffs, but I defer to those who know Python better.

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

I prefer to work in containers, so I added a Dockerfile for developing as well.

Great - I'll try out working that way, it makes lots of sense to me.

Sorry for overloading this PR, some changes could have arguably been a separate PR.

It's all good - I think as the project matures we can get more disciplined about tighter PR's, but since we don't even have tests yet and this adds them I think it is all good.

There are more tests that could/should be added, I focussed on the download function for now (might add more in the future).

Very reasonable - we can try to start pushing any new PR to have tests, and ticket some test making that could be good 'first issues' for people interested in joining.

Also, I updated the CI/CD pipeline - I'm not too familiar with Github actions and wasn't sure how to test, so not guaranteed that this will work (we'll find out).

Sounds good - happy to YOLO this and see what happens.

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

@felix-schott - just gave you increased rights on the repo, as I'd love you reviewing PR's, as I'm sure I'll learn a lot having you look over my code. No worries if you don't want to at any point, but wanted to give you the rights to do so.

@felix-schott
Copy link
Collaborator Author

i changed the | syntax to the older Union syntax - sorry for the mess, I'm used to developing in more recent Python versions. i had also added the shapefile test back and it turns out it seems to be a general linux problem as it also failed in the pipeline! anyway, i gotta leave now, will check back in tomorrow if there's still issues (looks like the ci/cd run needs to be approved by you).

thanks for giving me increased rights! happy to review changes :)

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

Awesome, thanks! I'm going to try to figure out why I need to approve every workflow. I was hoping that giving you more rights would at least let you run it. Hopefully I can find something more liberal.

@cholmes
Copy link
Collaborator

cholmes commented Oct 14, 2023

Ok, made it more liberal, to 'Only first-time contributors who recently created a GitHub account will require approval to run workflows.' so hopefully this will work by default more. It's definitely annoying for me to come to a new PR and then have to hit 'run' and wait, and clearly would be more helpful to a first time contributor to be able to see the results of CI.


@pytest.mark.integration
@pytest.mark.flaky(reruns=NUM_RERUNS)
@pytest.mark.parametrize("format", [f for f in Format if format != Format.SHAPEFILE]) # fails for shapefile!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting:

[gw5] linux -- Python 3.10.13 /opt/hostedtoolcache/Python/3.10.13/x64/bin/python3
worker 'gw5' crashed while running 'tests/test_open_buildings.py::test_download_format[Format.SHAPEFILE]'
=========================== short test summary info ============================
FAILED tests/test_open_buildings.py::test_download_format[Format.SHAPEFILE]

I was going to go into the code to skip testing shapefile, but it seems to me the line above should do that, so I'm not sure what to do other than remove shapefile as an output format, but that seems silly since it works on mac and windows.

@felix-schott felix-schott merged commit faf9388 into opengeos:main Oct 15, 2023
7 checks passed
@felix-schott
Copy link
Collaborator Author

@cholmes All tests passed - I took the liberty and merged since you previously signalled approval. Hope that's okay. Probably doesn't need a new tag since it doesn't add any major features, just tests and some refactoring. The only feature would be that directories are permitted as dst.
Thanks for your time reviewing these changes :)

@felix-schott felix-schott deleted the 36_add-tests branch October 15, 2023 10:25
@cholmes
Copy link
Collaborator

cholmes commented Oct 15, 2023

Definitely ok to merge, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants