Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build year aggregation #1056

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Build year aggregation #1056

wants to merge 30 commits into from

Conversation

koen-vg
Copy link
Contributor

@koen-vg koen-vg commented May 9, 2024

As mentioned in PyPSA/linopy#248, myopic foresight optimisation comes with a significant progressive memory overhead. The problem seems to be two-fold:

  1. The increasing number of components with different build years leads to increasing memory (and time) requirements in myopic foresight optimisations.
  2. Moreover, in at least some (realistic) cases, there are memory spikes during the model construction in myopic foresight settings.

See PyPSA/linopy#248 for a graph illustrating both effects.

This PR is an attempt at leading with the first of these two issues; in the process it does also solve the second issue in my tests (but cannot guarantee that that issue is resolved for any configuration).

Changes proposed in this Pull Request

The idea is that myopic foresight optimisation involves adding many near-copies of components, one for each planning horizon, with only one (the one corresponding to the "current" planning horizon) being extendable for each optimisation. For example, you have NO0 0 onwind-2020, NO0 0 onwind-2025, etc.

Now, these components are essentially the same apart from nominal capacity: they have the same bus connections, efficiencies, capacity factors, etc. This means that they could be treated as a single component. That is what this PR does.

The PR introduces two new functions in solve_network.py: an aggregation function and a disaggregation function. The aggregation function would aggregate e.g. the two components NO0 0 onwind-2020, NO0 0 onwind-2025 to a single component NO0 0 onwind. The network is then solved. After solving the disaggregation function reconstructs the two original components NO0 0 onwind-2020, NO0 0 onwind-2025. Ideally, these two functions should be inverses in that first aggregating and then disaggregating results in the original network.

Of course, some information might be lost in aggregation. Importantly, one needs to keep track of the nominal capacity (and lifetime) in each build year so that these capacities can be phased out at the end of their lifetimes correctly.

To deal with this, the aggregation function adds additional columns p_nom-2020, p_nom-2025, etc. to n.df(c) (for each class of components c); the contents of these columns can then be used to reconstruct the original components in the disaggregation function.

Exactly which attributes are stored between aggregation and disaggregation is flexible; currently this stands at p_nom, p_nom_max, lifetime and capital_cost.

Results

I have tested the implementation with the following configuration:

run:
  name: "mem-test"
  disable_progressbar: true
  shared_resources: false
  shared_cutouts: true

enable:
  retrieve_cutout: false

foresight: myopic

scenario:
  ll:
  - v1.5
  clusters:
  - 50
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2025
  - 2030

countries: ["DE", "NL", "BE"]

clustering:
  temporal:
    resolution_sector: "24H"
  build_year_aggregation: false

sector:
  central_heat_vent: true
  transmission_efficiency_enabled: false

electricity:
  extendable_carriers:
    Generator: [OCGT, solar, onwind, offwind-ac, offwind-dc]
    StorageUnit: [battery]
    Store: [H2]
    Link: [H2 pipeline]

  renewable_carriers: [solar, onwind, offwind-ac, offwind-dc]

  estimate_renewable_capacities:
    enable: false

renewable:
  solar:
    cutout: "europe-2013-era5"

solving:
  solver:
    name: "gurobi"

Note that I added an option and sector_opts wildcard flag to enable or disable the build year aggregation.

The networks with and without build year aggregation have nearly identical characteristics, which I take as very strong evidence that the aggregation and disaggregation implementations are correct; see e.g. the energy balances:
image

As for what the whole point of this was, the build year aggregation leads to a drastic decrease in memory footprint. Here is the memory usage over time for the first and last planning horizons with and without build year aggregation:
image

That's an improvement of more than a factor of 5! (from more than 20GB to about 4GB). (The graph includes the aggregation and disaggregation steps.)

Interestingly the solving time doesn't change so much. Evidently Gurobi can solve the non-aggregated problem just as quickly; it only needs a lot more memory.

The aggregation and disaggregation does take some time, leading to a slight overhead. At the resolution of the above test (50 nodes, 24H), this just about breaks even. At lower resolutions, the overhead of the (dis)aggregation means that the total amount of time taken when aggregating build-years can be higher than without aggregation; the memory footprint is however consistently much lower.

Notes

I wrestled with this aggregation and disaggregation for a little while, and I have to say it's rather gnarly stuff.

As it stands, I think this branch could be ready for general use "at your own risk"; it could be interesting to merge it but disable the aggregation, leaving it an option for people (like me, currently) who need to cut their memory footprint, with the understanding that things could possibly go wrong and you have to double-check that the results are correct.

(Obviously a little bit of polishing is still needed, i.e. documentation, release notes, etc. etc.)

I'd also be interested in hearing about alternative approaches. The radical approach would be for pypsa to provide built-in support for build-years somehow; this would circumvent the aggregation and disaggregation implemented here but would presumably be a pretty substantial change to pypsa. Maybe it's the best solution in the long term, however.

It's also possible that there are neater ways of implementing what I did, and suggestions for improvement are welcome. In particular, it might be possible to optimise some the operations, cause they do take some time (leading to the above overhead) and it feels like it could be faster. I did a little bit of profiling and couldn't find any immediate improvements, however.

Caveats

  • The aggregation overwrites p_nom_min and p_nom_min (and e_nom_min, e_nom_max) for non-extendable components in order to set the corresponding attributes correctly for the aggregated component.
  • Different efficiencies for different build-years is something that's only dealt with in a "lossy"/approximate manner in the current implementation: the efficiencies are simply averaged (not a weighted average!) and not reconstructed upon disaggregation. Aggregating expandable components with different efficiencies cannot really be done exactly/correctly; an alternative is to only aggregate previous build years (i.e. leaving one non-expandable and one expandable component). In practice I haven't seen this issue affect the results in a significant way, but it could if for example the efficiency of some component is changed radically from one planning horizon to the next.
  • Objective values are slightly different between aggregated and non-aggregated optimisations, and I think this could be due to aggregation of capital cost. As it stands, aggregated components have to have the capital cost of the current planning horizon for the optimisation to make sense, but that historical (existing) capacities sometimes have a lower capital cost, the aggregation leads to a larger total number. This is of course "fixed" upon disaggregation since capital cost is stored in columns by build year. In my tests I see total objective differences in the order of 0.5-1%. This could have an impact on near-optimal studies etc.
  • This does not work when some locations have urban central heat buses at some planning horizons and not others, and waste heat recovery is enabled, since this means e.g. some electrolysers with different build years will have different values for bus2. To counter this, I added an option to add central heat buses everywhere, regardless of the fraction of population served by urban central heating.

Checklist

  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Changed dependencies are added to envs/environment.yaml.
  • Changes in configuration options are added in all of config.default.yaml.
  • Changes in configuration options are also documented in doc/configtables/*.csv.
  • A release note doc/release_notes.rst is added.

@koen-vg
Copy link
Contributor Author

koen-vg commented May 9, 2024

I could also briefly explain the other changes here apart from the aggregation and disaggregation functions:

  • Of separate interest, I fixed a couple of instances of the same bug as this one: Bugfix: don't modify store carrier attribute when defining operational limits PyPSA#880, in solve_network.py. (This is also the reason why this PR needs pypsa>=0.28)
  • I encountered a strange class of bugs, appearing in make_summary.py (but possibly originating earlier) where some components would "lose" their time-varying data. This only happens when those time-varying data are all 0 anyway, but for technical reasons they would be exactly 0.0 after aggregation/disaggregation, but some close-to-zero floating point value in a non-aggregated network. Is it the case that pypsa networks will automatically remove all-zero (or all-default) columns from time-varying dataframes, and this could lead to problems? I don't know, but this is the reason for the changes to make_summary.py (just a fix, not assuming the presence of said columns and using 0 by default).
  • I also somehow had problems with the reversed column in n.links, somehow leading to netCDF errors saying that bool values are not supported. This is the reason that I had to put some effort into getting rid if this column.

(The CI is btw failing cause pypsa==0.28.0 isn't quite available yet; use the latest commit instead or wait a little before testing.)

@koen-vg
Copy link
Contributor Author

koen-vg commented May 10, 2024

I've also done some testing on a larger European-wide model (config pasted at the bottom of this comment) with 50 nodes and 6-hourly resolutions. The results hold at this scale as well:
image
In this case, the build year aggregation saves time as well as memory.

(It is curious that the optimisation at 2050 takes significantly more time than the one at 2020 even with build year aggregation; maybe this is simply down to the enforced CO2 neutrality making the solution space more "complicated"?)

In this European-wide test there were (very) minor differences in the modelling results; with build year aggregation there was some use of biogas; without there wasn't:
image
This could be down to an obscure bug that I didn't catch yet.

Total system costs do look very similar overall (except for biogas again):
image

The above figures were produced with this configuration:

run:
  name: "mem-test"
  disable_progressbar: true
  shared_resources: false
  shared_cutouts: true

enable:
  retrieve_cutout: false

foresight: myopic

scenario:
  ll:
  - v1.5
  clusters:
  - 50
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2030
  - 2040
  - 2050

clustering:
  temporal:
    resolution_sector: "6H"
  build_year_aggregation: false

sector:
  central_heat_vent: true
  central_heat_everywhere: true
  transmission_efficiency_enabled: false

electricity:
  extendable_carriers:
    Generator: [OCGT, solar, onwind, offwind-ac, offwind-dc]
    StorageUnit: [battery]
    Store: [H2]
    Link: [H2 pipeline]

  renewable_carriers: [solar, onwind, offwind-ac, offwind-dc]

  estimate_renewable_capacities:
    enable: false

renewable:
  solar:
    cutout: "europe-2013-era5"

solving:
  options:
    io_api: "direct"
  solver:
    name: "gurobi"

@fneum fneum added this to the v0.11.0 milestone May 12, 2024
Saves a large amount of memory during problem preparation and solving.
The aggregation and disaggregation steps take some time-overhead
however.
@koen-vg
Copy link
Contributor Author

koen-vg commented May 13, 2024

I'd be cool to see this in v0.11 :) I've taken the liberty to clean up the history of this branch a little bit, and added what I think is sufficient documentation. After also handling changing marginal cost over time, I can now barely see any difference between aggregated and non-aggregated results.

From my side this is good to go!

A couple of quick things:

  • I have no idea why the CI is still failing to set up the mamba environment; on my end it works fine to install pypsa 0.28.0. Does it have to update its repository is something?
  • Talking about tests, I think it's good to leave the build year aggregation off by default, but it might be an idea to add a test for it at some point. I know that testing as it stands is rather rudimentary but eventually it would be nice to have a proper testing suite (in a similar vain to pypsa) where it would be possible to test that build year aggregation comes similar results, for example. But any strong opinions on whether I should add a simple testing configuration for build year aggregation and add it to test.sh?
  • For right now build year aggregation is incompatible with time segmentation since the segmentation can be different between different planning horizons. I would like to fix this by "pinning" the segmentation scheme by planning horizon, but can solve that in a different PR.
  • On a vaguely related note: is there any policy on configuration validation? Would there be any interest in a simple validation script, called from the Snakefile, with a couple of asserts making sure that the configuration makes sense? It would be relevant for build year aggregation (making sure that segmentation isn't use, that transmission efficiency is turned off, etc.) but also in a bunch of other cases, e.g. config["electricity"]["estimate_renewable_capacities"]["enable"] == True incompatible with myopic foresight #1016. I would be happy to contribute towards this.

@koen-vg
Copy link
Contributor Author

koen-vg commented May 13, 2024

Actually sorry I take it back about this being good to go; please give me time for one more commit with a couple more fixes. Sorry I had two branches laying around and mixed them up a little.

Saves a large amount of memory during problem preparation and solving.
The aggregation and disaggregation steps take some time-overhead
however.
@koen-vg
Copy link
Contributor Author

koen-vg commented May 13, 2024

Okay now it's finally all good :)

@koen-vg
Copy link
Contributor Author

koen-vg commented May 21, 2024

I'm actually still ironing out some small issues here and there; maybe it's an idea to still wait a little bit before merging.

@fneum
Copy link
Member

fneum commented May 23, 2024

Just a few answers so far:

I encountered a strange class of bugs, appearing in make_summary.py (but possibly originating earlier) where some components would "lose" their time-varying data. This only happens when those time-varying data are all 0 anyway

I think this is a known issue: PyPSA/PyPSA#722

I have no idea why the CI is still failing to set up the mamba environment; on my end it works fine to install pypsa 0.28.0.

The environment is cached, maybe that was the cause. Anyway, the issue seems to be resolved.

Talking about tests, I think it's good to leave the build year aggregation off by default, but it might be an idea to add a test for it at some point.

Yes, fully agree.

For right now build year aggregation is incompatible with time segmentation since the segmentation can be different between different planning horizons.

This was addressed in #1065 and should be fine now.

On a vaguely related note: is there any policy on configuration validation? Would there be any interest in a simple validation script, called from the Snakefile, with a couple of asserts making sure that the configuration makes sense?

Yes, could start with a function in _helpers.py, although I wonder how sustainable/maintained it will be.

@koen-vg
Copy link
Contributor Author

koen-vg commented May 23, 2024

What about next week I write a small function in _helpers.py to validate build-year-aggregation specific configuration, just so people that want to use it don't have to track down this PR and read the listed caveats. I think using assert statements or something similar that will throw exceptions unless they are met should be effective at making sure such validation stays up to date; false positives will be discovered and have to be fixed immediately. (False negatives, i.e. illegal configuration not covered by validation, is of course the status quo anyway.)

By next week I'll have tested the build year aggregation even more myself too. It's currently working fine for me, and by then I should be even more confident that it really works.

There are no remaining outstanding issues from my side that would have to be fixed before merging.

@fneum
Copy link
Member

fneum commented May 24, 2024

I hope you don't mind @koen-vg, but I want to move this feature to the v0.12.0 release given that the pending v0.11.0 release is already very big. We can do this in relatively quick succession.

@fneum fneum removed this from the v0.11.0 milestone May 24, 2024
@fneum fneum modified the milestones: v0.12.0, v0.13.0 Aug 29, 2024
koen-vg and others added 9 commits August 31, 2024 21:53
The "reversed" column can lead to problems upon exporting a network;
somehow these problems only become apparent upon build year
aggregation and disaggregation. Nominally, netCDF doesn't support
boolean values, which is a likely explanation. However, it's unclear
the problem arises specifically with the "reversed" column and only
when aggregating/disaggregating.
The solar potential constraint as originally formulated was not
compatible with build year aggregation. Here, the constraint has been
reformulated in such a way that it works for both "regular" networks
and ones which have been aggregated by build-year.

The previous formulation effectively calculated the remaining area
available for solar (by subtracting installed solar-hsat from
remaining area as given by p_nom_max, which itself has already been
reduced by existing solar), and using this to limit the sum of newly
installed solar and solar-hsat.

The new formulation instead calculates _total_ available solar
capacity (by summing installed capacity of the "solar" carrier and
p_nom_max of extendable "solar" generators), and uses this to limit
the total (both extendable and non-extendable) capacity of all solar
carriers (including "solar-hsat" and potentially others).

The new formulation is also a little shorter overall, and another
bonus is that it should work if ever more types of solar are
added (whereas the old formulation wasn't as easily adaptable to a
situation with more than just "solar" and "solar-hsat").
@koen-vg
Copy link
Contributor Author

koen-vg commented Sep 6, 2024

In my opinion, this feature is mature enough to be merged. I have tested it quite a bit, and it seems to work very well. Build year aggregation now works with the default configuration (except for having to set the central_heat_everywhere option). I have updated the documentation too, which should reflect the current state of the feature well.

The only substantial change I implemented recently for this feature is a reformulation of the solar potential constraint in order to make it compatible with build year aggregation. I hope the new formulation is easy enough to understand, and I have tested both implementations in parallel to make sure that they give identical results:
image
(Here, blue and orange are the old and new solar potential constraint implementations, green with build year aggregation.)

I have documented the new constraint a little bit in the commit introducing it, but in short it now constrains the total amount of solar, both existing and newly built, at any given location, not just the newly built solar capacity.

@koen-vg
Copy link
Contributor Author

koen-vg commented Sep 6, 2024

Note: this feature also depends on the fix in #1262! That PR should be merged first, then this one.

@koen-vg
Copy link
Contributor Author

koen-vg commented Sep 17, 2024

@fneum what do you think about merging this before the next release? Or is there anything that's still missing from your point of view (documentation, examples, tests, whatever)? It'll make my life a little easier if I don't have to keep this branch up-to-date :)

It's an opt-in feature so shouldn't make a difference to users. Happy to write in more detail about the implementation, pros and cons, etc. if that would be helpful.

@fneum
Copy link
Member

fneum commented Sep 17, 2024

@koen-vg, I'll try to find time this week to review! Sorry that this PR hasn't been prioritized!

@koen-vg
Copy link
Contributor Author

koen-vg commented Sep 17, 2024

No worries, I understand that it's hard to find time for these things!

@koen-vg
Copy link
Contributor Author

koen-vg commented Sep 23, 2024

Hmm, I take it that with PyPSA/PyPSA#1038, this implementation could be simplified a little bit, at least in that components that are to be clustered can just be marked as inactive instead of having to store information about them in ad-hoc extra columns as is currently done. I can have a look at implementing that if we want to use that new functionality, or it can also be done in a separate PR after merging this one.

@koen-vg
Copy link
Contributor Author

koen-vg commented Oct 7, 2024

Just as a small update, for the record, build year aggregation still doesn't quite seem to eliminate all memory spikes during optimisation preparation. Here's an example from an optimisation, with and without build year aggregation. Note however that some components (with time-varying efficiencies) were excluded from build year aggregation for better accuracy.
bilde
While the advantage of build year aggregation is clear, even when aggregated there is still a memory spike up to about 33GB, whereas the build of optimisation "only" takes less than 25GB. Again, in a SLURM (or similar) setup, this leads to a ~35% memory overhead on individual optimisation jobs. Better than the 100%+ overhead without build year aggregation of course, and build year aggregation also speeds up the optimisation. But the problem of memory spikes during LP building shouldn't have to be solved by careful model set-up.

(P.S. note to anyone interested in replicating these graphs: you will have to set the memory polling interval to something less than 30 seconds, the default, otherwise the spikes won't be visible.)

@koen-vg
Copy link
Contributor Author

koen-vg commented Oct 11, 2024

Okay I've refactored the implementation, using the new active attribute from pypsa 0.31. It's actually a lot more readable now, I think!

I recommend using the following minimal configuration to test the implementation for anyone that's interested:

run:
  name: "buildyearagg"

foresight: myopic

scenario:
  ll:
  - vopt
  clusters:
  - 45
  opts:
  - ""
  sector_opts:
  - ""
  - "aggBuildYear"
  planning_horizons:
  - 2020
  - 2030
  - 2040
  - 2050

sector:
  central_heat_everywhere: true

clustering:
  temporal:
    resolution_sector: "168H"

solving:
  solver:
    name: gurobi
    options: gurobi-default

Then run the workflow up to plot_summary and compare the energy balances, etc. There are some very minor differences in the modelling results (I think due to efficiencies changing over time; they are averaged when aggregating over build-years) but overall you obtain good accuracy. Certainly the trade-off is more than worth it compared to decreasing spatio-temporal resolution.

@koen-vg
Copy link
Contributor Author

koen-vg commented Oct 11, 2024

Now, the problem is that, while correct, the way that the active attribute is implemented means that the current build year aggregated doesn't actually reduce memory overhead properly! That is because it seems like optimisation variables are created for components even if active is set to False. Properly excluding components that are not active is mentioned in PyPSA/PyPSA#1038, but implementation hasn't been discussed further and will presumably only happen after PyPSA/PyPSA#848. So if we want to keep the implementation of build-year aggregation as is, it will only become really useful after the active attribute is taken "more seriously" by PyPSA.

Anyone who still wants the memory benefits could for now use f4b208c.

@koen-vg
Copy link
Contributor Author

koen-vg commented Oct 11, 2024

I did manage to fully exclude inactive components from model building in PyPSA/PyPSA@094af75; that's a pretty quick-and-dirty implementation but shows that it works in principle. The results are good: memory is back down to where it's supposed to be.

@fneum
Copy link
Member

fneum commented Nov 14, 2024

Hey @koen-vg, sorry that I still did not get to this PR :/ I've just been running out of time. I'll be on a 2-week holiday now, but I will prioritize this after I return.

@koen-vg
Copy link
Contributor Author

koen-vg commented Nov 15, 2024

No worries at all @fneum. It anyway looks like the utility of this PR will be limited until PyPSA/PyPSA#1056 gets fixed. That is, the improved implementation using the active attribute is much neater, but doesn't actually improve memory usage (which is supposed to be the whole point) as of now since non-active components are still included in model building. So I wouldn't merge this quite yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants