Where are all the ocean variables? #34
Comments
Ryan- You are correct, there is not much ocean data there. The plan was to add additional data on request, so thanks for your request. We do have room in our AWS allocation to add more, and I would be glad for us to do so. I will discuss this with some team members at our meeting on Thursday. Besides the THETA, UVEL, VVEL, WVEL fields you mentioned, can you indicate which specific other variables you desire? Gary Strand's list of variable names is at http://www.cgd.ucar.edu/ccr/strandwg/CESM-CAM5-BGC_LENS_fields.html |
Thanks @jeffdlb! I will review the list of fields and get back to you. Our general interest is ocean heat and salt budgets. |
PS. We currently only have monthly ocean data, whereas the other realms also have some daily or 6-hour data. Is monthly sufficient for your use case? |
For coarse-resolution (non eddy-resolving) models, the oceans tend not to have too much sub-monthly variability. If we did need daily data, it would just be surface fluxes. Monthly should be fine for the other stuff. |
Ok, here is my best guess at identifying the variables we would need for the heat and salt budgets. Would be good for someone with more POP experience (e.g. @matt-long) to verify.
|
Ryan-
Many of these variables do not seem present in the monthly ocean data on GLADE:
1310> pwd
/glade/collections/cdg/data/cesmLE/CESM-CAM5-BGC-LE/ocn/proc/tseries
1311> while read ln; do ls -d "monthly/$ln"; done < ~/oceanVars.txt
ls: cannot access monthly/THETA: No such file or directory
monthly/UVEL
ls: cannot access monthly/UVEL2: No such file or directory
monthly/VVEL
ls: cannot access monthly/VVEL2: No such file or directory
monthly/WVEL
ls: cannot access monthly/FW: No such file or directory
ls: cannot access monthly/HDIFB_SALT: No such file or directory
ls: cannot access monthly/HDIFB_TEMP: No such file or directory
ls: cannot access monthly/HDIFE_SALT: No such file or directory
ls: cannot access monthly/HDIFE_TEMP: No such file or directory
ls: cannot access monthly/HDIFN_SALT: No such file or directory
ls: cannot access monthly/HDIFN_TEMP: No such file or directory
monthly/HMXL
ls: cannot access monthly/HOR_DIFF: No such file or directory
ls: cannot access monthly/KAPPA_ISOP: No such file or directory
ls: cannot access monthly/KAPPA_THIC: No such file or directory
ls: cannot access monthly/KPP_SRC_SALT: No such file or directory
ls: cannot access monthly/KPP_SRC_TEMP: No such file or directory
ls: cannot access monthly/RESID_S: No such file or directory
ls: cannot access monthly/RESID_T: No such file or directory
monthly/QFLUX
monthly/SHF
monthly/SHF_QSW
monthly/SFWF
ls: cannot access monthly/SFWF_WRST: No such file or directory
monthly/SSH
monthly/TAUX
monthly/TAUX2
monthly/TAUY
monthly/TAUY2
ls: cannot access monthly/VNT_ISOP: No such file or directory
ls: cannot access monthly/VNT_SUBM: No such file or directory
monthly/UES
ls: cannot access monthly/UET: No such file or directory
monthly/VNS
ls: cannot access monthly/VNT: No such file or directory
monthly/WTS
ls: cannot access monthly/WTT: No such file or directory
On Wed, Feb 12, 2020 at 1:46 PM Ryan Abernathey wrote:
Ok, here is my best guess at identifying the variables we would need for the heat and salt budgets. Would be good for someone with more POP experience (e.g. @matt-long) to verify.
THETA
UVEL
UVEL2
VVEL
VVEL2
WVEL
FW
HDIFB_SALT
HDIFB_TEMP
HDIFE_SALT
HDIFE_TEMP
HDIFN_SALT
HDIFN_TEMP
HMXL
HOR_DIFF
KAPPA_ISOP
KAPPA_THIC
KPP_SRC_SALT
KPP_SRC_TEMP
RESID_S
RESID_T
QFLUX
SHF
SHF_QSW
SFWF
SFWF_WRST
SSH
TAUX
TAUX2
TAUY
TAUY2
VNT_ISOP
VNT_SUBM
UES
UET
VNS
VNT
WTS
WTT
|
TEMP is the POP variable name for potential temperature (not THETA). Many of the data not available on GLADE are available on HPSS and (possibly) on NCAR Campaign storage here. Note that to close a tracer budget you need: lateral diffusion (GM, submeso), vertical mixing, and surface fluxes. |
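For reference, one possible grouping of the requested variables against those three budget ingredients is sketched below in Python; the grouping is an editorial assumption for illustration (not an official POP mapping) and should be checked against the POP tavg documentation. The salt budget would use the analogous *_SALT, UES, VNS, WTS, and SFWF fields.

# Hypothetical grouping of the requested heat-budget variables; an
# illustrative assumption, not an official POP mapping.
heat_budget_terms = {
    "resolved advection": ["UET", "VNT", "WTT"],
    "eddy-induced (GM, submeso) transport": ["VNT_ISOP", "VNT_SUBM"],
    "lateral diffusion": ["HDIFE_TEMP", "HDIFN_TEMP", "HDIFB_TEMP"],
    "vertical mixing (KPP non-local term)": ["KPP_SRC_TEMP"],
    "surface fluxes": ["SHF", "SHF_QSW", "QFLUX"],
}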
According to Gary Strand, all of the LENS data is available on Glade here: /glade/collections/cdg/data/cesmLE/CESM-CAM5-BGC-LE When I look at what is available for the ocean, I find this:
If someone can help decipher these variables and determine if they are worth publishing, I would be happy to work on getting them onto AWS. |
@bonnland -- were you the one who produced the original S3 LENS datasets? If so, it would be nice to build on that effort. My impression from @jeffdlb is that they have a pipeline set up, they just need to find the data! Maybe you were part of that...sorry for my ignorance. As for the missing variables, I guess I would just request that you take the intersection between my requested list and what is actually available. I think that the list of monthly and daily variables you showed above is a great start. I would use nearly all of it. |
@rabernat I just got word from Gary; I was originally looking in a slightly different folder. I have the correct folder path now, and all 273 monthly ocean variables appear to be present. I was part of the original data publishing, so I know parts of the workflow. The most time-consuming part is creating the CSV file describing an intake-esm catalog, which I did not originally take part in. The catalog is used to load data into xarray and then write out to Zarr. We have the file paths now; I just need to research how to construct the remaining fields for the CSV file. |
@rabernat I've loaded some variables, and the datasets are big. A single variable will take over 2TB. Here are some stats for five of the variables: Note that these sizes are uncompressed sizes, and they will be smaller on disk. Is there a priority ordering that makes sense if we can initially publish just a subset? Anderson believes that if the available space on AWS has not changed, we have around 30 TB available. @jeffdlb Do you know more exactly how much space is left on S3, and when we might get more? |
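As a rough cross-check on those numbers (assuming float32 fields on the 384x320 POP grid with 60 levels), the 20C experiment alone works out to a bit over 2 TB per 3D variable:

# Back-of-the-envelope uncompressed size of one 3D monthly ocean variable
# for the 20C experiment: 40 members x 1872 months (156 years) x 60 levels.
members, months, levels, nlat, nlon = 40, 1872, 60, 384, 320
nbytes = members * months * levels * nlat * nlon * 4  # float32
print(f"{nbytes / 1e12:.2f} TB")  # ~2.21 TB, consistent with "over 2 TB" above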
Brian-
On Thu, Mar 5, 2020 at 5:06 PM bonnland wrote:
@jeffdlb Do you know more exactly how much space is left on S3, and when we might get more?
We are currently using 61.5 TB for 905023 objects in ncar-cesm-lens bucket, the vast majority of which is for atmosphere data. Ocean data only use 1.2 TB at present.
I don't think there is an automatically-enforced limit to the allocation, so nothing will prevent writing objects after 100 TB. However, we should as a courtesy notify AWS if we plan to go over. They have already said we can use more if needed, within reason.
Is 20.95 TB the total for all the new ocean variables, or only for a subset? If subset, can you estimate the total (uncompressed) for all the new vars?
-Jeff
|
The 20.95 TB is for only 5 variables. Of the 39 variables listed in #34 (comment), we found 38. A back-of-the-envelope calculation shows that their total uncompressed size would be ~170 TB. |
Do we have any idea what typical zarr + zlib compression rates are for these datasets? I would not be surprised to see a factor of 2 or more. |
@rabernat The one data point I have so far is for atm/monthly/cesmLE-RCP85-TREFHT.zarr: |
I will ask Joe & Ana whether we can have up to ~150 TB more. If not, we may need to prioritize. @rabernat Do you know of any other expected users of these ocean variables? We might need to have some good justification for this >2x allocation increase. |
TEMP, UVEL, VVEL, WVEL, SHF, and SFWF would be the bare minimum I think. Will try to get a sense of other potential users. |
I've been using some of the ocean output from the CESM-LE. I've mainly been looking at overturning, heat transport, and surface forcing (i.e., MOC, SHF, UVEL, VVEL, TEMP, SST, SALT). I know there would be a lot of interest in biogeochemical variables, too. I agree it would be nice to have this on AWS data storage! |
I would definitely use it if available! SHF, SFWF, UVEL, VVEL, VNS, VNT, TEMP and SALT at least would be helpful. But also TAUX, TAUY, UES, UET and PD would be good too. |
I would definitely be keen to look at some biogeochemical variables, like DIC, DOC and O2. The full O2 budget would be dope, but I presume that is a lot of data (not exactly sure which terms are needed, but it seems they are usually the ones with ‘_O2’ appended, e.g. VN_O2, UE_O2, etc.). Thanks for pinging me. |
@rabernat I'm in the process of creating the Zarr files for TEMP, UVEL, VVEL, WVEL, SHF, and SFWF, just as an initial test. I've discovered in the process that the coordinate dimension describing vertical levels has different names depending on the variable. For example:
The chunk size for UVEL is 30 because we were originally thinking of splitting the 60 vertical levels into two chunks. We could do the same for WVEL; we just need to be careful about using the different coordinate dimension names when we specify chunk sizes. Should we somehow unify the dimension names for vertical levels, to simplify future user interaction with the data, or is it important to keep them distinct? Also, is there perhaps a better chunking strategy than what we are considering here? |
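For what it's worth, a minimal sketch of how that per-variable chunking could be expressed with xarray before writing Zarr is below; the input globs, output paths, and chunk sizes are placeholders, and the point is only that the vertical dimension name in the chunk spec must match the variable:

import xarray as xr

# Hypothetical inputs; the real workflow assembles these via the intake-esm catalog.
ds_uvel = xr.open_mfdataset("UVEL/*.nc", combine="by_coords")
ds_wvel = xr.open_mfdataset("WVEL/*.nc", combine="by_coords")

uvel_chunks = {"time": 6, "z_t": 30, "nlat": 384, "nlon": 320}      # UVEL uses z_t
wvel_chunks = {"time": 6, "z_w_top": 30, "nlat": 384, "nlon": 320}  # WVEL uses z_w_top

ds_uvel.chunk(uvel_chunks).to_zarr("cesmLE-20C-UVEL.zarr", consolidated=True, mode="w")
ds_wvel.chunk(wvel_chunks).to_zarr("cesmLE-20C-WVEL.zarr", consolidated=True, mode="w")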
@bonnland the different vertical coordinates signify different locations in the level (z_t is the layer midpoint; z_w_top and z_w_bot are the top and bottom faces). That's a long-winded way of saying:
Keep them distinct, please |
👍. When producing analysis-ready data, we should always think very carefully before changing any of the metadata.
Going a bit off topic, but I find POP to be pretty inconsistent about its dimension naming conventions. In the vertical, it uses different dimension names for the different grid positions. But in the horizontal, it is perfectly happy to use the same nlat and nlon dimensions for both the T and U grids. |
I don't know for sure, but I suspect that POP (or an ancestor of POP) originally had
* Going even further off topic, the inconsistency that trips me up is trying to remember when I need the "g" in "lon"... going off memory, I'm 80% sure it's |
@rabernat I'm finishing up code for processing and publishing the ocean variables. I'd like to see what difference zlib compression makes. Are there any special parameters needed, or just use all defaults for the compression? Do you have an example of specifying this compression choice? |
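In case it helps, a minimal sketch of passing a zlib compressor through xarray's Zarr encoding via numcodecs is below; the input file, variable name, and compression level are placeholders, and this is not the pipeline's actual code:

import xarray as xr
from numcodecs import Zlib

ds = xr.open_dataset("SFWF_monthly.nc")  # placeholder input
ds.to_zarr(
    "cesmLE-20C-SFWF-zlib.zarr",
    mode="w",
    consolidated=True,
    encoding={"SFWF": {"compressor": Zlib(level=4)}},  # level is tunable
)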
We may want to stick with the default compressor because it appears to be providing a pretty good compression ratio:

In [1]: import zarr
In [2]: zstore = "/glade/scratch/bonnland/lens-aws/ocn/monthly/cesmLE-20C-SFWF.zarr"
In [3]: ds = zarr.open_consolidated(zstore)
In [5]: ds["SFWF"].info
Out[5]:
Name : /SFWF
Type : zarr.core.Array
Data type : float32
Shape : (40, 1872, 384, 320)
Chunk shape : (40, 12, 384, 320)
Order : C
Read-only : False
Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type : zarr.storage.ConsolidatedMetadataStore
Chunk store type : zarr.storage.DirectoryStore
No. bytes : 36805017600 (34.3G)
Chunks initialized : 156/156

In [7]: !du -h {zstore}
2.5K /glade/scratch/bonnland/lens-aws/ocn/monthly/cesmLE-20C-SFWF.zarr/time_bound
....
13G /glade/scratch/bonnland/lens-aws/ocn/monthly/cesmLE-20C-SFWF.zarr |
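For what it's worth, those numbers imply roughly a 2.6x compression ratio with the default Blosc/LZ4 settings:

# 36,805,017,600 uncompressed bytes vs. ~13 GiB reported by du -h
print(36805017600 / (13 * 1024**3))  # ~2.6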
@rabernat I should be transferring the following to AWS sometime today and tomorrow: TEMP, UVEL, VVEL, WVEL, VNS, VNT, SHF, SFWF. All will cover the CTRL, RCP85, and 20C experiments. @andersy005 should be updating the AWS intake catalog when the transfer is complete. |
Actually, it looks like we inadvertently wrote out the Zarr files with incorrect metadata. It is going to take a few more days to re-write and then transfer to AWS. |
I don't think I can make that decision for you. I'm simply noting that, due to the coronavirus pandemic and its associated impacts on my time (enormous new child care responsibilities, remote teaching, etc.), I personally won't be able to do much on this until May (post spring semester). |
Thanks very much for doing this! I will try to make a start with what's there sometime next week. If it is easy to upload TAUX, TAUY, those would also be helpful to have (though I can start without them if you'd prefer to wait until I've tried it). |
@cspencerjones Thanks for offering to check things. It takes a good chunk of CPU hours to produce these files, so I'd feel better knowing there isn't some glitch in what we have so far that makes these data difficult to use. I will create and upload TAUX and TAUY, hopefully by Tuesday, and I'll respond here when they are ready. It would be great to see if you can use them successfully before creating more Zarr files. |
I did find some time to simply open up some data. Overall it looks good! Thanks for making this happen. Based on this quick look, I do have some feedback. Let's consider, for example:

import s3fs
import xarray as xr
fs = s3fs.S3FileSystem(anon=True)
s3_path = 's3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr'
ds = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)
ds

Which gives the following long output:
Based on this, I have two suggestions.
I hope this feedback is useful. |
@rabernat That is helpful feedback, and worth talking about IMHO, thank you. I will move forward with the chunking you suggest if I don't hear any objections in the next day or so. This issue of which variables should be coordinates has come up before in discussions with @andersy005. Across variables, and possibly across ensemble members, these extra variables differ in the original NetCDF files (for example, ULAT, ULONG, TLAT, and TLONG are missing in some cases). The differences can apparently prevent concatenation into xarray objects from working properly. I'm not as clear as Anderson on the potential problems. At any rate, it's good that we can address the metadata later if needed. It means I can move forward with creating these variables now. |
I can see how this could cause problems. However, I personally prefer to have all that stuff as coordinates. It's easy enough to just drop them when they get in the way. An even better option is to just drop all of the non-dimension coordinates before writing the zarr data, and then save them to a standalone grid dataset. |
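A minimal sketch of that split is below, assuming ds is the assembled Dataset for one variable and experiment just before it is written; the names and paths are placeholders, not the code actually used for the published stores:

import xarray as xr

ds = xr.open_mfdataset("WVEL/*.nc", combine="by_coords")  # placeholder input

# Non-dimension, time-independent coordinates: TLAT, TLONG, TAREA, dz, constants, ...
static_names = [
    c for c in ds.coords
    if c not in ds.dims and "time" not in ds[c].dims
]

grid = ds.reset_coords()[static_names]   # static grid/metadata variables only
ds_clean = ds.drop_vars(static_names)    # field plus dimension coordinates only

ds_clean.to_zarr("cesmLE-20C-WVEL.zarr", consolidated=True, mode="w")
grid.to_zarr("grid.zarr", consolidated=True, mode="w")

Users could then recombine the two with xr.merge([ds_clean, grid]), as shown later in this thread.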
OK, that sounds like good advice. I'm assuming that removal of these variables is also something that can be done retroactively. Be sure to let me know if this is not the case; otherwise I will go ahead with the same procedure we've been using for now (since we have to go back anyway to fix the metadata for our other Zarr stores). |
Should be as simple as deleting the directories for those variables and re-consolidating metadata. |
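A sketch of what that retroactive cleanup might look like against the S3 store is below; the bucket path and variable names are examples from this thread, and write credentials are assumed:

import s3fs
import zarr

fs = s3fs.S3FileSystem()  # write access assumed
root = "ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr"

# Delete the directories of the unwanted grid variables...
for var in ["ULAT", "ULONG", "TLAT", "TLONG", "TAREA"]:
    fs.rm(f"{root}/{var}", recursive=True)

# ...then rebuild .zmetadata so consolidated reads no longer reference them.
zarr.consolidate_metadata(s3fs.S3Map(root, s3=fs))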
It turns out that these variables consume ~20 MB per zarr store.
👍. Would this work?

>>> print(grid_vars)
['hflux_factor', 'nsurface_u', 'DXU', 'latent_heat_vapor', 'salt_to_Svppt', 'DYT', 'TLONG', 'DYU', 'HTE', 'rho_air', 'HU', 'ULONG', 'DXT', 'rho_sw', 'HUS', 'HUW', 'moc_components', 'TAREA', 'ULAT', 'REGION_MASK', 'grav', 'transport_regions', 'KMU', 'sound', 'omega', 'ANGLET', 'HT', 'UAREA', 'heat_to_PW', 'days_in_norm_year', 'salt_to_ppt', 'dzw', 'sea_ice_salinity', 'cp_air', 'salt_to_mmday', 'dz', 'fwflux_factor', 'TLAT', 'HTN', 'mass_to_Sv', 'radius', 'latent_heat_fusion', 'T0_Kelvin', 'salinity_factor', 'sflux_factor', 'transport_components', 'KMT', 'rho_fw', 'cp_sw', 'ocn_ref_salinity', 'vonkar', 'nsurface_t', 'ANGLE', 'stefan_boltzmann', 'ppt_to_salt', 'momentum_factor']

Removing these grid variables produces a clean xarray dataset:

<xarray.Dataset>
Dimensions: (d2: 2, lat_aux_grid: 395, member_id: 40, moc_z: 61, nlat: 384, nlon: 320, time: 1872, z_t: 60, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
* z_t (z_t) float32 500.0 1500.0 2500.0 ... 512502.8 537500.0
* z_t_150m (z_t_150m) float32 500.0 1500.0 2500.0 ... 13500.0 14500.0
* moc_z (moc_z) float32 0.0 1000.0 2000.0 ... 525000.94 549999.06
* z_w_top (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
* z_w_bot (z_w_bot) float32 1000.0 2000.0 3000.0 ... 525000.94 549999.06
* lat_aux_grid (lat_aux_grid) float32 -79.48815 -78.952896 ... 89.47441 90.0
* z_w (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
* time (time) object 1850-02-01 00:00:00 ... 2006-01-01 00:00:00
* member_id (member_id) int64 1 2 3 4 5 6 7 ... 34 35 101 102 103 104 105
Dimensions without coordinates: d2, nlat, nlon
Data variables:
time_bound (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
VVEL (member_id, time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>
Attributes:
nsteps_total: 750
nco_openmp_thread_number: 1
cell_methods: cell_methods = time: mean ==> the variable val...
tavg_sum: 2592000.0
tavg_sum_qflux: 2592000.0
source: CCSM POP2, the CCSM Ocean Component
contents: Diagnostic and Prognostic Variables |
@rabernat wrote:
FYI, for the 3D atmospheric data (at least monthly Q), each chunk contains all ensemble members, 12 months of data, and 2 levels:

<xarray.DataArray 'Q' (member_id: 40, time: 1032, lev: 30, lat: 192, lon: 288)>

If we were to put all 30 levels in one chunk then we'd need to divide something else by a factor of ~15. Perhaps the x-y dimension should be 4x4 chunks instead of global? I know Anderson was striving for 100MB chunks but haven't checked the size of these. The ocean data have, I think, 60 levels instead of 30, so the problem is even worse. Also, @jhamman stated at the start of this project that it is possible to re-chunk under the hood if we don't like the arrangement, but I'm curious about how you do that in practice given the immutability of objects in an object store. |
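For the record, the Q chunk shape quoted above works out to roughly 200 MiB uncompressed:

# (member_id=40, time=12, lev=2, lat=192, lon=288) float32 chunks
print(40 * 12 * 2 * 192 * 288 * 4 / 2**20)  # ~202 MiB (~212 MB) per chunk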
I also just opened the data and had a look. I agree with Ryan that rechunking so that each chunk contains all vertical levels would be very helpful: oceanographers like to plot sections! I don't object to chunking more in time in order to achieve this. I also think that it's sensible to continue chunking by memberID, because I will want to write and test my code for one member and then operate on all the members only once or twice. I'll probably hold off doing anything more until this is a bit more sorted out. Thanks to everyone for putting in this effort! |
As an update I have re-chunked the data accordingly for all ocean variables:

<xarray.Dataset>
Dimensions: (d2: 2, lat_aux_grid: 395, member_id: 40, moc_z: 61, nlat: 384, nlon: 320, time: 1872, z_t: 60, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
* z_t (z_t) float32 500.0 1500.0 2500.0 ... 512502.8 537500.0
* z_t_150m (z_t_150m) float32 500.0 1500.0 2500.0 ... 13500.0 14500.0
* moc_z (moc_z) float32 0.0 1000.0 2000.0 ... 525000.94 549999.06
* z_w_top (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
* z_w_bot (z_w_bot) float32 1000.0 2000.0 3000.0 ... 525000.94 549999.06
* lat_aux_grid (lat_aux_grid) float32 -79.48815 -78.952896 ... 89.47441 90.0
* z_w (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
* time (time) object 1850-02-01 00:00:00 ... 2006-01-01 00:00:00
* member_id (member_id) int64 1 2 3 4 5 6 7 ... 34 35 101 102 103 104 105
Dimensions without coordinates: d2, nlat, nlon
Data variables:
time_bound (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
VVEL (member_id, time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>

As you can see, I removed the grid variables. |
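For reference, each of those chunks is about 177 MB uncompressed:

# (member_id=1, time=6, z_t=60, nlat=384, nlon=320) float32 chunks
print(1 * 6 * 60 * 384 * 320 * 4 / 1e6)  # ~177 MB per chunk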
Does @jhamman have a strategy for re-chunking in place directly on AWS S3? I suspect this would require reading data from the old objects, creating the new objects in a separate bucket as scratch space, deleting the old objects, copying the new objects to the main bucket, deleting the new objects from the scratch bucket. I can create a scratch bucket under our AWS account if desired. |
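A rough sketch of that copy-based approach is below; the scratch bucket name is a placeholder, this is not an established workflow, and the final delete/copy step is left as a comment:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem()  # write credentials assumed
src = xr.open_zarr(
    fs.get_mapper("ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr"),
    consolidated=True,
)

# Re-chunk lazily with Dask, then write the new store to a scratch bucket.
dst = src.chunk({"member_id": 1, "time": 6, "z_w_top": 60})
for v in dst.variables:
    dst[v].encoding.pop("chunks", None)  # drop stale Zarr chunk encoding

dst.to_zarr(
    fs.get_mapper("ncar-cesm-lens-scratch/ocn/monthly/cesmLE-CTRL-WVEL.zarr"),
    consolidated=True,
    mode="w",
)
# Finally, delete the old objects and copy the scratch store into the main bucket.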
This is a minor nit, but I personally prefer |
Also, there appear to be quite a few coordinates that are not used by the data variables. These could probably be removed as well. |
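A quick way to list those, for whatever it's worth; a sketch only, with the store path a placeholder for the VVEL dataset shown a few comments above:

import xarray as xr

ds = xr.open_zarr("cesmLE-20C-VVEL.zarr", consolidated=True)  # placeholder path

used_dims = set()
for v in ds.data_vars:
    used_dims |= set(ds[v].dims)

# For the VVEL listing above this should flag z_t_150m, moc_z, z_w, z_w_top,
# z_w_bot, and lat_aux_grid.
unused = [c for c in ds.coords if not (set(ds[c].dims) | {c}) & used_dims]
print(unused)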
I have created the Zarr files for TAUX and TAUY, but I chose to place all members in a single chunk because the chunks are so much smaller (these are 2D variables, so each chunk would be 1/60 the size of a 3D variable chunk). But because I didn't perform the same metadata operations as @andersy005, and because they are fast to recreate, I will let Anderson make these also. |
As an update, I updated the chunking scheme for all existing ocean variables on AWS-S3, removed the grid variables from the zarr stores, and created a standalone grid zarr store:

In [2]: import intake
...: url = 'https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json'
...: col = intake.open_esm_datastore(url)
...: subset = col.search(component='ocn')
In [3]: subset.unique(columns=['variable', 'experiment', 'frequency'])
Out[3]:
{'variable': {'count': 11,
'values': ['SALT',
'SFWF',
'SHF',
'SSH',
'SST',
'TEMP',
'UVEL',
'VNS',
'VNT',
'VVEL',
'WVEL']},
'experiment': {'count': 3, 'values': ['20C', 'CTRL', 'RCP85']},
'frequency': {'count': 1, 'values': ['monthly']}}

In [1]: import s3fs
...: import xarray as xr
...:
...: fs = s3fs.S3FileSystem(anon=True)
...: s3_path = 's3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr'
...: ds = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)
...: ds
Out[1]:
<xarray.Dataset>
Dimensions: (d2: 2, member_id: 1, nlat: 384, nlon: 320, time: 21612, z_w_top: 60)
Coordinates:
* member_id (member_id) int64 1
* time (time) object 0400-02-01 00:00:00 ... 2201-01-01 00:00:00
time_bound (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
* z_w_top (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
Dimensions without coordinates: d2, nlat, nlon
Data variables:
WVEL (member_id, time, z_w_top, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>
Attributes:
Conventions: CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netc...
NCO: 4.3.4
calendar: All years have exactly 365 days.
cell_methods: cell_methods = time: mean ==> the variable val...
contents: Diagnostic and Prognostic Variables
nco_openmp_thread_number: 1
revision: $Id: tavg.F90 41939 2012-11-14 16:37:23Z mlevy...
source: CCSM POP2, the CCSM Ocean Component
tavg_sum: 2678400.0
tavg_sum_qflux: 2678400.0
title: b.e11.B1850C5CN.f09_g16.005
In [2]: s3_path = 's3://ncar-cesm-lens/ocn/grid.zarr'
In [3]: grid = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)

In [6]: xr.merge([ds, grid])
Out[6]:
<xarray.Dataset>
Dimensions: (d2: 2, lat_aux_grid: 395, member_id: 1, moc_comp: 3, moc_z: 61, nlat: 384, nlon: 320, time: 21612, transport_comp: 5, transport_reg: 2, z_t: 1, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
* member_id (member_id) int64 1
* time (time) object 0400-02-01 00:00:00 ... 2201-01-01 00:00:00
time_bound (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
* z_w_top (z_w_top) float32 0.0 1000.0 ... 500004.7 525000.94
ANGLE (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
ANGLET (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
DXT (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
DXU (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
DYT (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
DYU (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HT (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HTE (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HTN (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HU (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HUS (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
HUW (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
KMT (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
KMU (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
REGION_MASK (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
T0_Kelvin float64 ...
TAREA (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
TLAT (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
TLONG (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
UAREA (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
ULAT (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
ULONG (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
cp_air float64 ...
cp_sw float64 ...
days_in_norm_year timedelta64[ns] ...
dz (z_t) float32 dask.array<chunksize=(1,), meta=np.ndarray>
dzw (z_w) float32 dask.array<chunksize=(60,), meta=np.ndarray>
fwflux_factor float64 ...
grav float64 ...
heat_to_PW float64 ...
hflux_factor float64 ...
* lat_aux_grid (lat_aux_grid) float32 -79.48815 -78.952896 ... 90.0
latent_heat_fusion float64 ...
latent_heat_vapor float64 ...
mass_to_Sv float64 ...
moc_components (moc_comp) |S256 dask.array<chunksize=(3,), meta=np.ndarray>
* moc_z (moc_z) float32 0.0 1000.0 ... 525000.94 549999.06
momentum_factor float64 ...
nsurface_t float64 ...
nsurface_u float64 ...
ocn_ref_salinity float64 ...
omega float64 ...
ppt_to_salt float64 ...
radius float64 ...
rho_air float64 ...
rho_fw float64 ...
rho_sw float64 ...
salinity_factor float64 ...
salt_to_Svppt float64 ...
salt_to_mmday float64 ...
salt_to_ppt float64 ...
sea_ice_salinity float64 ...
sflux_factor float64 ...
sound float64 ...
stefan_boltzmann float64 ...
transport_components (transport_comp) |S256 dask.array<chunksize=(5,), meta=np.ndarray>
transport_regions (transport_reg) |S256 dask.array<chunksize=(2,), meta=np.ndarray>
vonkar float64 ...
* z_t (z_t) float32 500.0
* z_t_150m (z_t_150m) float32 500.0 1500.0 ... 13500.0 14500.0
* z_w (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
* z_w_bot (z_w_bot) float32 1000.0 2000.0 ... 549999.06
Dimensions without coordinates: d2, moc_comp, nlat, nlon, transport_comp, transport_reg
Data variables:
WVEL (member_id, time, z_w_top, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray> |
@andersy005 Did you have to create the new Zarr on GLADE and then delete/upload/replace the Zarr stores on S3, or was it possible to re-chunk in place on AWS? |
I am updating the dataset landing page to include the new variables. QUESTION: We added VNS & VNT (salt and heat fluxes in y-direction). Shouldn't we also include UES & UET (salt and heat fluxes in x-direction), and maybe WTS & WTT (fluxes across top face)? I don't see how only one component of the flux vectors can be useful. |
Hi Jeff, those variables are actually in transit now. I was going to announce their availability for performance testing after the transfer was completed. Once they have been transferred, I will update the catalog for AWS users. The variables in transit are:
3D variables: DIC, DOC, UES, UET, WTS, WTT, PD
2D variables: TAUX, TAUY, TAUX2, TAUY2, QFLUX, FW, HMXL, QSW_HTP, QSW_HBL, SHF_QSW, SFWF_WRST, RESID_S, RESID_T
It has been an uphill climb to understand the difficulties of creating very large Zarr stores; the Dask workers were bogging down and crashing at first, but eventually I began understanding what configurations would lead to successful Zarr saves. |
@bonnland Excellent! Thank you very much. I will update the landing page to include those (but not publish until you are ready). |
FYI the draft unpublished landing page with recent updates is temporarily at |
@cspencerjones @rabernat @jbusecke Transfer of new ocean data is complete and available on Amazon AWS. It would be very helpful if someone could try a nontrivial computation with the data to make sure performance based on our chunking scheme is adequate. I've confirmed that the Binder notebook on Amazon works (see the README.md for the link), and the variables are visible in the catalog. Here is what I got:
|
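In case it is useful for that performance check, here is a minimal sketch of one nontrivial calculation (an area-weighted global-mean surface heat flux for a single member); the store path follows the pattern shown earlier in the thread and is an assumption, as is the availability of the separate grid store:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
ds = xr.open_zarr(fs.get_mapper("s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-SHF.zarr"),
                  consolidated=True)
grid = xr.open_zarr(fs.get_mapper("s3://ncar-cesm-lens/ocn/grid.zarr"),
                    consolidated=True)

shf = ds["SHF"].isel(member_id=0)
area = grid["TAREA"].where(grid["REGION_MASK"] > 0)  # open-ocean cells only

mean_shf = (shf * area).sum(["nlat", "nlon"]) / area.sum(["nlat", "nlon"])
mean_shf.load()  # triggers the actual reads from S3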
I tried a few things with the data this morning, including calculating density from temperature and salinity and plotting sections, transforming some variables to density coordinates and plotting time means, etc. I tried using multiple workers as well. This worked OK and I think that the performance is adequate. |
That's great to hear; we can tentatively move forward with the remaining variables requested so far. They are all 3D variables: UVEL2, VVEL2. I've spent some time looking at MOC, which has a different parameterization than the other variables. Any thoughts on chunking are appreciated. At first glance, it seems we want to chunk in time, and leave all other dimensions unchunked, aiming for a chunk size between 100 and 200 MB.
|
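Regarding the MOC chunking question just above, a rough way to check candidate chunk sizes against a 100-200 MB target (dimension sizes taken from the grid listing earlier; the time chunk and dimension ordering are placeholders):

import numpy as np

def chunk_mb(shape, dtype="float32"):
    # uncompressed size of one chunk in MB
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1e6

# Hypothetical MOC chunk: (member_id, time, transport_reg, moc_comp, moc_z, lat_aux_grid)
print(chunk_mb((1, 240, 2, 3, 61, 395)))  # ~139 MB with a 240-month time chunk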
Now that the new data have been uploaded, I believe I can publish this draft as the new landing page. |
There are still small inconsistencies to work out, AFAIK. Unless I am mistaken, Anderson republished all the ocean data with the grid variables removed, but grid variables still coexist in the atmospheric data, and these are probably distinct from the ocean grid variables. The separate grid variables have been pushed to AWS, but they don't quite fit yet into our catalog framework, which is not yet general enough to handle variables that extend across experiments (CTRL, 20C, RCP85, etc.). So the user can't load the grid variables until we generalize the catalog logic to make them available. And I'm not yet clear on whether transparent loading of these variables is a simple matter. Simpler from a data-provider engineering perspective would be to modify the Kay notebook to show how grid variables are loaded for area-based computations, which would require republishing the atmosphere variables. So, some kinks are left to work out. |
I started to look at the LENS AWS data. I discovered there is very little available
There are only 3 variables: SALT (3D), SSH (2D), and SST (2D).
At minimum, I would also like to have THETA (3D), UVEL (3D), VVEL (3D), and WVEL (3D), and all the surface fluxes of heat and freshwater. Beyond that, it would be ideal to also have the necessary variables to reconstruct the tracer and momentum budgets.
Are there plans to add more data?