skyfield calculations may be overly accurate for requirements at the cost of computation #4
Upon closer look, while the lines pointed out above do need addressing, the skyfield library seems generally quite slow for what we are asking it to do. Both the _t_since_sunset and _sun_elevation functions are slow for large datasets, with the latter taking the crown. These lines also cause speed issues: nowcastlib/nowcastlib/pipeline/features/generate.py Lines 182 to 186 in f054704
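To see where the time goes, the two halves of such a call can be timed separately. This is a minimal, self-contained sketch (not the actual nowcastlib code): it contrasts materialising one Python datetime object per timestamp, as the profiled functions do, against staying inside numpy. The function names here are hypothetical stand-ins.

```python
import timeit
import pandas as pd

# An evenly spaced timeline, like the datasets discussed in this issue.
index = pd.date_range("2021-01-01", periods=10_000, freq="min", tz="UTC")

def via_python_datetimes():
    # to_pydatetime() builds one Python datetime object per timestamp,
    # which is the kind of per-element cost suspected above.
    return index.to_pydatetime()

def via_numpy_values():
    # Staying with the underlying numpy array avoids per-element objects.
    return index.values

t_objects = timeit.timeit(via_python_datetimes, number=10)
t_array = timeit.timeit(via_numpy_values, number=10)
print(f"python datetimes: {t_objects:.4f}s, numpy array: {t_array:.4f}s")
```

On a timeline of millions of timestamps, the per-object path grows linearly in both time and memory, which is consistent with the slowdowns reported below.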
Skyfield was selected because the popular library pyephem suggests it. Further attention should be dedicated to determining whether
I'll be interested to hear which component in that call has the higher cost. Is there a way you can produce that series of times, I wonder, without creating n separate Python datetime objects? If the times are evenly spaced on the clock or calendar there are other approaches in Skyfield for creating the time array. In the meantime, if you split your call into two lines with a temporary variable:
You could then separately measure the cost of the Pandas operation and the cost of the Skyfield operation.
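The split Brandon suggests might look like the sketch below. The combined call being split is hypothetical (the actual nowcastlib code lives at the permalinks above); the point is that each line can then be profiled on its own, and that evenly spaced times can be built without n separate datetime objects at all.

```python
import pandas as pd

# An evenly spaced timeline standing in for the real dataset's index.
index = pd.date_range("2021-01-01", periods=1_000, freq="min", tz="UTC")

# Hypothetical split of a combined call into two measurable steps:
datetimes = index.to_pydatetime()  # step 1: the Pandas operation

# step 2 would be the Skyfield operation, e.g. (not run here):
#   t = timescale.from_datetimes(datetimes)
#
# Because the samples are evenly spaced, Skyfield can also build the
# whole Time array vectorised, skipping the datetime objects entirely,
# by passing a sequence to ts.utc:
#   t = timescale.utc(2021, 1, 1, 0, range(1_000))
print(len(datetimes))
```

With the call split this way, a line profiler such as %lprun attributes the cost to each step individually instead of to the single combined line.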
@brandon-rhodes Hi, thanks for commenting! Wasn't expecting the library creator to come and help 😅

The times are indeed evenly spaced, so this could be done, as you say, with other approaches from skyfield. That was just a quick and dirty solution to avoid digging too deeply into the documentation. I've updated my second comment with the complete section of code causing problems.

Evaluating line by line as you suggested sheds some light on which operations are more costly. You can see my experimentation in this notebook, where I use %lprun to profile the same function with different input time series. While for small time series the pandas operation is more costly than skyfield's, skyfield's cost dominates as the series grows. In the notebook, the time series I try are of length 1000, 10000 and 100000. I wasn't able to try 1000000, since execution would not complete. For this particular project, time series of that size (millions of timestamps) are what we are concerned with.

Thanks again for the interest! I hope this satisfies your curiosity :). We would really appreciate any suggestions on how to do this in a smarter way. We're trying to get the sun's altitude at each timestamp for millions of timestamps, with the timestamps evenly spaced (a regular sample rate, basically). We are also interested in determining the most recent sunset for each timestamp (code here), although this has been less problematic.
My guess is that most of the time is invested in computing precession and nutation — the precise angle at which the Earth's poles point at any given moment. The standard nutation series, in particular, has 687 terms of 5 coefficients each that need to be evaluated at every point on your timeline, so if you have one million times, that's something like 5×687×10⁶ multiplies, plus all the necessary adds to put them together. What kind of accuracy are you after? The full series is only useful if you're going for observatory-grade angles suitable for pointing a radio telescope (which is the accuracy Skyfield aims for by default). Since your times are so close together, my guess is that we could evaluate nutation and precession daily, or only a few times a day, and linearly interpolate between them, to get better speed. But that will introduce error, so we first need to know how exact an answer you need on the Sun's position in the sky. What will you be doing with the angles? That can help establish tolerances.
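The "evaluate rarely, interpolate in between" idea can be sketched with numpy. The function below is a hypothetical smooth stand-in for a slowly varying quantity like a nutation angle, not the real 687-term series; the point is that daily coarse samples plus linear interpolation cover a million fine timestamps while exposing exactly how much error the shortcut introduces.

```python
import numpy as np

def slow_angle(t_days):
    # Hypothetical slowly varying angle (radians); in the real case this
    # evaluation would be the expensive precession/nutation computation.
    return 0.1 * np.sin(2 * np.pi * t_days / 365.25)

# One million closely spaced sample times spanning about a week.
t_fine = np.linspace(0.0, 7.0, 1_000_000)

# Evaluate the expensive function only once per day...
t_coarse = np.arange(0.0, 8.0, 1.0)
angle_coarse = slow_angle(t_coarse)

# ...and linearly interpolate onto the fine timeline: 8 expensive
# evaluations instead of 1,000,000.
angle_interp = np.interp(t_fine, t_coarse, angle_coarse)

# The error the shortcut introduces, to compare against the tolerance
# the application actually needs.
max_err = np.max(np.abs(angle_interp - slow_angle(t_fine)))
print(f"max interpolation error: {max_err:.2e} rad")
```

For a smooth function, linear interpolation error scales with the square of the coarse spacing, so halving the spacing roughly quarters the error — a knob to turn once the required tolerance is known.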
We are using the angles as additional input data to provide to machine learning models to be used for turbulence forecasting.
I have asked my colleagues what sort of accuracy we are after and will update this post
This seems like a good idea, and depending on our tolerance it may be sufficient to go this route. I imagine this will be the case. Thank you again for all the help, let us know if you have any more comments 😊
Have not analyzed big-O performance, but it is slow enough to be a nuisance for larger datasets, especially since this calculation needs to be repeated across train/test sets and perhaps even across folds.
The following lines need addressing:
nowcastlib/nowcastlib/pipeline/process/postprocess/generate.py
Lines 130 to 133 in b24e0d8