Basic Examples
ds : an ingrid stream or an xarray dataset
%ingrid:
/ds {SOURCES .LOCAL .sst.mon.mean.nc
% Replace the time grid by an 'ingrid-friendly' (but not CF-compliant) time.
time /time (months since 1891-01-01) ordered 0.5 1 1565.5 NewEvenGRID replaceGRID
} def
ds
N.B.: you could also download the latest COBE SSTs and then open the file directly: xr.open_dataset('sst.mon.mean.nc')
#python:
import xarray as xr
url = 'http://kage.ldeo.columbia.edu:81/SOURCES/.LOCAL/.sst.mon.mean.nc/.sst/dods'
ds = xr.open_dataset(url)
ds
N.B.: A dataset/stream ds can contain multiple variables and grids. A dataarray/field da (for example, ingrid syntax: ds .sst; python syntax: ds.sst) contains a single variable. Most of the commands used on this page can be applied to both datasets and dataarrays. If a command is applied to a dataset, it is applied to every variable in that dataset.
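As a quick illustration of the dataset-vs-dataarray distinction, a dataset-level command acts on every variable. The tiny dataset below is synthetic, invented just for this sketch (not the COBE file):

```python
import numpy as np
import xarray as xr

# A tiny synthetic dataset with two variables on a shared time grid
ds2 = xr.Dataset(
    {'sst': ('time', np.array([10.0, 12.0, 14.0])),
     'ice': ('time', np.array([1.0, 0.5, 0.0]))},
    coords={'time': [0, 1, 2]},
)

da = ds2.sst              # a dataarray: one variable
m = ds2.mean('time')      # applied to the dataset: every variable is averaged
print(float(m.sst), float(m.ice))   # → 12.0 0.5
```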
A dataset (stream) contains variables, grids, coordinates and metadata. These can be selected by similar methods for ingrid and python. Try selecting .sst and .lon
%ingrid:
ds .sst
#python:
ds.sst
In python, we can use the built-in xarray quick-and-dirty plotting:
#python:
ds.sst[-1].plot()
But it is nicer to have control over the size and aspect ratio, add titles, etc:
#python:
from matplotlib import pyplot as plt
fig = plt.figure(figsize=(8,4))
ds.sst[-1].plot()
plt.title('SSTs of the last month');
%ingrid:
ds .sst 273.15 add
In python, compatible objects (xarray datasets/dataarrays, numbers) can be added together
#python:
ds.sst + 273.15
%ingrid:
ds .sst time (Jan 1960) VALUE lat 20 VALUE
#python:
ds.sst.sel(time='1960-01', lat=20, method='nearest').plot()
%ingrid:
ds time (Jan 1982) (Dec 1995) RANGE lon 20 60 RANGE
#python:
ds.sel(time=slice('1982-01','1995-12'), lon=slice(20,60))
%ingrid:
ds [time] average
ds [lat lon] average
#python:
ds.mean('time')
ds.mean(['lat','lon'])
%ingrid:
ds lon 5 boxAverage time 12 boxAverage
In python we normally use resample for time sampling/averaging, but we can use coarsen on any grid.
If the grid length is not divisible by the window size, use boundary='trim'.
#python:
ds.coarsen(lon=5).mean()
#or
ds.coarsen(time=12,boundary='trim').mean()
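A minimal, self-contained sketch of what boundary='trim' does, on synthetic 1-D data:

```python
import numpy as np
import xarray as xr

# 7 points coarsened in blocks of 3: the leftover 7th point is dropped
da = xr.DataArray(np.arange(7.0), dims='x')
out = da.coarsen(x=3, boundary='trim').mean()
print(out.values)   # → [1. 4.]
```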
For the time grid, here is a resample example - note the location of the time values for dsY0 vs. dsY
#python:
url = 'http://kage.ldeo.columbia.edu:81/SOURCES/.LOCAL/.sst.mon.mean.nc/.sst/dods'
ds = xr.open_dataset(url).mean(['lon','lat']).sel(time=slice('2000','2021'))
dsY0 = ds.resample(time='Y').mean()
dsY = ds.resample(time='Y',label='left',loffset='6M').mean()
ds.sst.plot()
dsY.sst.plot()
dsY0.sst.plot()
%ingrid:
ds .sst time 3 runningAverage
#python:
ds.sst.rolling(time=3, center=True).mean()
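A toy check of the centered rolling mean, with NaN at the ends where the 3-point window is incomplete (synthetic data):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0, 4.0, 5.0], dims='time')
out = da.rolling(time=3, center=True).mean()
print(out.values)   # → [nan  2.  3.  4. nan]
```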
%ingrid:
ds .sst [time]detrend-bfl
#python:
dfit = ds.sst.polyfit('time', 1, skipna=True)
ds.sst - xr.polyval(coord=ds.time, coeffs=dfit.polyfit_coefficients)
%ingrid:
ds .sst dup [time]detrend-bfl sub dup time last VALUE exch time first VALUE sub
#python:
dfit = ds.sst.polyfit('time', 1, skipna=True)
ds['linear_fit'] = xr.polyval(coord=ds.time, coeffs=dfit.polyfit_coefficients)
ds['trend'] = (ds.linear_fit[-1] - ds.linear_fit[0])
%ingrid:
ds .sst [time] 1 SM121
#python:
# note: an unweighted 3-point mean (ingrid's SM121 uses 1-2-1 weights);
# pad the end points, smooth, then trim the padded values
ds.sst.pad(time=1, mode='symmetric').rolling(time=3, center=True).mean().isel(time=slice(1,-1))
%ingrid:
ds .sst [time] 1 SM121
#python:
# mode='wrap' treats the grid as periodic, then trim the padded values
ds.sst.pad(time=1, mode='wrap').rolling(time=3, center=True).mean().isel(time=slice(1,-1))
%ingrid:
ds .sst [time]rmsover
% or, removing the mean first:
ds .sst [time]rmsaover
#python:
ds.sst.std('time')
# or, removing the mean first:
(ds - ds.mean('time')).sst.std('time')
%ingrid:
ds .sst [lon lat] maxover
ds .sst [time] minover
#python:
ds.sst.max(['lon','lat'])
ds.sst.min('time')
%ingrid:
ds .sst 0 max 28 min
#python:
ds.sst.clip(min=0,max=28)
- Masking:
%ingrid:
ds .sst 10.0 maskgt
#python:
ds.sst.where(ds.sst <= 10)  # keep values up to 10, mask the rest
- Flagging:
%ingrid:
ds .sst 10.0 flaglt
#python:
# 1 where sst < 10, 0 otherwise, keeping NaN where sst is missing
(ds.sst < 10).where(ds.sst.notnull())
%ingrid:
% time must be called `T`
ds .sst
time (Jan 1950) (Dec 2019) RANGE
time /T renameGRID yearly-climatology
#python:
ds.sst.sel(time=slice('1950-01','2019-12')).groupby('time.month').mean()
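To see what groupby('time.month') does, here is a two-year synthetic series (hypothetical data, not the SST file): each calendar month is averaged across years, giving a 12-value climatology.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2000-01-01', periods=24, freq='MS')   # two synthetic years, monthly
da = xr.DataArray(np.arange(24.0), coords={'time': time}, dims='time')
clim = da.groupby('time.month').mean()   # 12 values, one per calendar month
print(float(clim.sel(month=1)))   # the two Januaries are 0 and 12 → 6.0
```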
%ingrid:
ds .sst
time (Jan 1950) (Dec 2019) RANGE
time /T renameGRID yearly-anomalies
#python:
dss = ds.sel(time=slice('1950-01','2019-12'))
dss.sst.groupby('time.month') - dss.sst.groupby('time.month').mean()
%ingrid:
ds .sst time 12 splitstreamgrid
time (Dec) (Jan) (Feb) (Mar) (Apr) VALUES
[time]average
#python:
def is_djfma(month): # define a function to select the desired months (Dec-Apr)
    return (month >= 12) | (month <= 4)
ds.sst.sel(time=is_djfma(ds.sst['time.month'])).groupby('time.year').mean()
%ingrid:
ds .sst time /T renameGRID
T (Jan 1949) (Dec 1958) RANGE
yearly-anomalies
ds .sst time /T renameGRID
T (Jan 1959) (Dec 1978) RANGE
yearly-anomalies appendstream
ds .sst time /T renameGRID
T (Jan 1979) (Dec 2001) RANGE
yearly-anomalies appendstream
#python:
ds1 = ds.sst.sel(time=slice('1949-01','1958-12'))
ds2 = ds.sst.sel(time=slice('1959-01','1978-12'))
ds3 = ds.sst.sel(time=slice('1979-01','2001-12'))
xr.concat([ds1,ds2,ds3],dim='time')
%ingrid:
ds .sst
[time]standardize
#python:
ds.sst/ds.sst.std('time')
%ingrid:
ds .sst time /T renameGRID
T (Jan 1949) (Dec 2001) RANGE
yearly-anomalies
dup lon 190 240 RANGE lat -5 5 RANGE [lon lat]average
[T]correlate
#python:
dsg = ds.sst.sel(time=slice('1949-01','2001-12')).groupby('time.month')
ds_anom = dsg - dsg.mean()
ds_nino34 = ds_anom.sortby('lat').sel(lon=slice(190,240),lat=slice(-5,5)).mean(['lon','lat'])
xr.corr(ds_anom,ds_nino34,'time')
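A quick sanity check of xr.corr on toy data: a perfect linear relation between two series gives a correlation of 1.

```python
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims='time')
b = xr.DataArray([2.0, 4.0, 6.0, 8.0], dims='time')
print(float(xr.corr(a, b, dim='time')))   # → 1.0
```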
%ingrid:
ds .sst time /T renameGRID
T (Jan 1949) (Dec 2001) RANGE
yearly-anomalies
dup lon 190 240 RANGE lat -5 5 RANGE [lon lat]average
[T]standardize
mul [T]average
#python:
dsg = ds.sst.sel(time=slice('1949-01','2001-12')).groupby('time.month')
ds_anom = (dsg - dsg.mean()).drop('month')
ds_nino34 = ds_anom.sortby('lat').sel(lon=slice(190,240),lat=slice(-5,5)).mean(['lon','lat'])
(ds_anom * ds_nino34/ds_nino34.std('time')).mean('time')
%ingrid:
ds lat cosd
#python:
import numpy as np
coslat = np.cos(np.deg2rad(ds.lat))
So we can use this to compute area weighted averages:
%ingrid:
ds .sst {lat cosd}[lon lat]weighted-average
#python:
import numpy as np
weights = np.cos(np.deg2rad(ds.lat))
ds.sst.weighted(weights).mean(['lon', 'lat'])
# NOTE: this is NOT the same as ds.sst.weighted(weights).mean('lon').mean('lat')
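A minimal check that .weighted() forms a single cos-lat-weighted mean over both dimensions, sum(w*x)/sum(w), on a synthetic two-by-two grid:

```python
import numpy as np
import xarray as xr

sst = xr.DataArray([[10.0, 20.0], [30.0, 40.0]],
                   dims=('lat', 'lon'),
                   coords={'lat': [0.0, 60.0], 'lon': [0.0, 180.0]})
w = np.cos(np.deg2rad(sst.lat))            # weights: 1.0 at the equator, 0.5 at 60N
wm = sst.weighted(w).mean(['lat', 'lon'])  # (10 + 20 + 15 + 20) / 3
print(float(wm))   # → 21.666...
```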
%ingrid:
ds .sst a:
lat partial
lat :a: .lat REGRID
:a
110000. div
#python:
ds.sst.differentiate('lat')/110000.  # per degree -> per meter (~110 km per degree of latitude)
%ingrid:
ds .sst a:
lon integral
lon :a: .lon REGRID
:a
#python:
ds.sst.cumsum('lon')  # note: a plain cumulative sum; the grid spacing is not applied
%ingrid:
ds .sst [lat]average
lon 10 40 definite-integral
#python:
ds.sst.mean('lat').sel(lon=slice(10,40)).integrate('lon')
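Unlike cumsum, integrate applies the trapezoidal rule using the coordinate values as spacing; a toy check:

```python
import xarray as xr

da = xr.DataArray([0.0, 1.0, 2.0], dims='x', coords={'x': [0.0, 1.0, 2.0]})
print(float(da.integrate('x')))   # trapezoidal integral of y=x from 0 to 2 → 2.0
```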
%ingrid:
SOURCES .LOCAL .tas_day_CESM2_amip_20100101-20150101.nc .tas
time /T renameGRID
monthlyAverage
#python:
import xarray as xr
url='http://kage.ldeo.columbia.edu:81/expert/SOURCES/.LOCAL/.tas_day_CESM2_amip_20100101-20150101.nc/.tas/dods'
ds = xr.open_dataset(url, decode_times=True)
ds = ds.sel(time=slice('2010-01-01','2014-12-31'))
ds_monthly = ds.resample(time='1M', label='left', loffset='15D').mean()
%ingrid:
SOURCES .LOCAL .tas_day_CESM2_amip_20100101-20150101.nc .tas
time /T renameGRID
monthlyAverage
(tas.mon.mean.nc)writeCDF
Please note that xarray saves our usual dataset with a time grid that ingrid (and ncview, etc.) will not be able to parse. If you just want to read it back using xr.open_dataset(), that is fine. Otherwise the time grid and the default encoding of the netCDF file need to be changed.
#python:
import xarray as xr
url='http://kage.ldeo.columbia.edu:81/expert/SOURCES/.LOCAL/.tas_day_CESM2_amip_20100101-20150101.nc/.tas/dods'
ds=xr.open_dataset(url,decode_times=True)
ds_monthly = ds.resample(time='1M',label='left',loffset='15D').mean()
ds_monthly.to_netcdf('tas.mon.mean.nc')
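One way to control the time encoding is to pass an encoding dict to to_netcdf. This is a sketch on a hypothetical tiny dataset standing in for ds_monthly; the units string and dtype below are example choices, not the only valid ones:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical tiny dataset, standing in for ds_monthly
time = pd.date_range('2010-01-01', periods=3, freq='MS')
out = xr.Dataset({'tas': ('time', np.array([280.0, 281.0, 282.0]))},
                 coords={'time': time})

# Write with an explicit CF-style time encoding
out.to_netcdf('tas.mon.mean.nc',
              encoding={'time': {'units': 'days since 1891-01-01',
                                 'dtype': 'float64'}})
```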
%ingrid:
ds .sst 100 replaceNaN
#python:
ds.fillna(100)
# fill missing values using interpolation or nearest neighbor values:
ds.interpolate_na(dim='lon',method='nearest').interpolate_na(dim='lat',method='linear')
%ingrid:
home .datasets .ERA5-monthly .ingrid-ready .sst
X 0 2.5 357.5 GRID
Y -90 2.5 90 GRID
#python:
import numpy as np
import xesmf as xe
# target grid; ds_orig below is the source dataset to be regridded
ds_regrid = xr.Dataset({'lon': (['lon'], np.arange(0.5, 359.5, 1.0, dtype='float32')),
                        'lat': (['lat'], np.arange(-88.5, 88.5, 1.0, dtype='float32')),
                        })
regridder = xe.Regridder(ds_orig, ds_regrid, 'bilinear', periodic=True)
ds_orig_regridded = regridder(ds_orig)
%ingrid:
expert
SOURCES .NOAA .NCEP-NCAR .CDAS-1 .MONTHLY .Intrinsic .PressureLevel .rhum
[X Y]1 SM121
#python:
import scipy.ndimage as ndimage
# gaussian_filter operates on a plain numpy array (e.g., ds.rhum.values) and returns one
ds_array_smoothed = ndimage.gaussian_filter(ds_array, sigma=5, order=0)