How to subset the RADOLAN data file based on the radarlocation?

Hello All,

I want to subset the radolan data file for a specific region. I have converted the RADOLAN data file(RW product) to Xarray dataset. Now I have 3 coordinates (time, y, x) where y is northing and x is easting.

# load radolan files
rw_filename = os.path.join(src_path, "raa01-rw_10000-*")
ds = xr.open_mfdataset(rw_filename, engine="radolan")
# print the xarray dataset
ds

Result:

<xarray.Dataset>
Dimensions:  (time: 17518, y: 900, x: 900)
Coordinates:
  * time     (time) datetime64[ns] 2006-01-01T00:45:00 ... 2021-12-31T23:50:00
  * y        (y) float64 -4.658e+06 -4.657e+06 ... -3.76e+06 -3.759e+06
  * x        (x) float64 -5.23e+05 -5.22e+05 -5.21e+05 ... 3.75e+05 3.76e+05
Data variables:
    RW       (time, y, x) float32 dask.array<chunksize=(1, 900, 900), meta=np.ndarray>
Attributes:
    radarid:         10000
    formatversion:   2
    radolanversion:  01.01.00
    radarlocations:  ['bln', 'drs', 'eis', 'emd', 'ess', 'fbg', 'fld', 'fra',...

Hi @SandeepAllampalli,

the easiest way with the current layout of your dataset:

sub = ds.sel(x=slice(-2.5e+05, 2.5e+05), y=slice(-4.25e+06, -3.75e+06), drop=True)

Please also consult the xarray documentation on Indexing and selecting data.

Hi @kmuehlbauer,

Thank you for the reply. I am trying to subset the data of a specific state in Germany. For that I have X and Y coordinates of the entire Germany. In order to subset data based on a region, first I should find out the latitudinal and longitudinal extent of a region. This is where I am facing problem. How to find out the lat and longitudinal extent of any region? Anyway I will try to post the same in xarray community group as well.

Thank you,
Sandeep

@SandeepAllampalli,

How is your specific region defined? Some binary raster filer or a vector shapefile? You would need to have at least something which defines the boundary of your region. In my example I was assuming a simple rectangular bounding box.

HTH,
Kai

@kmuehlbauer ,

It’s a binary file. This is how my dataset looks like.

<xarray.Dataset>
Dimensions:  (time: 17518, y: 900, x: 900)
Coordinates:
  * time     (time) datetime64[ns] 2006-01-01T00:45:00 ... 2021-12-31T23:50:00
  * y        (y) float64 -4.658e+06 -4.657e+06 ... -3.76e+06 -3.759e+06
  * x        (x) float64 -5.23e+05 -5.22e+05 -5.21e+05 ... 3.75e+05 3.76e+05
Data variables:
    RW       (time, y, x) float32 dask.array<chunksize=(1, 900, 900), meta=np.ndarray>
Attributes:
    radarid:         10000
    formatversion:   2
    radolanversion:  01.01.00
    radarlocations:  ['bln', 'drs', 'eis', 'emd', 'ess', 'fbg', 'fld', 'fra',...

@SandeepAllampalli,

your dataset is on RADOLAN coordinates, but the question is your region of interest. Is this region defined by some file? If so, which format?

Best, Kai

@kmuehlbauer ,

This is a dataset of entire Germany. I was asked to subset only Brandenburg state data. There is no other file.

Thanks
Sandeep

Hi @SandeepAllampalli,

then you would need e.g. a vector shape file with a polygon describing the boundary of the state of Brandenburg. With that vector shape you can subset your raster Dataset. There are xarray based solutions for this (rioxarray). You might also use other tools like GDAL. Also wradlib has some tools which might be used for these tasks (zonalstats).

You might find shapefiles for state of Brandenburg and other German regions at the Bundesamt für Kartographie und Geodäsie Verwaltungsgebiete.

Also at www.naturalearthdata.com you might find state and federal boundaries of many countries.

Best, Kai

Hi @kmuehlbauer ,

I have saved Radolan data to Netcdf file. Now I have a Germany and brandenburg shape files. I am trying to plot the netcdf data onto these corresponding shape files. I am really curious to know how to plot these netcdf data on these shape files?

import geopandas
import rioxarray
import xarray
from shapely.geometry import mapping

rad = xarray.open_dataarray("D:/Sandeep/Thesis/Data/ds2021d.nc")
rad
Out[117]: 
<xarray.DataArray 'RW' (time: 365, y: 900, x: 900)>
[295650000 values with dtype=float32]
Coordinates:
  * y            (y) float64 -4.658e+06 -4.657e+06 ... -3.76e+06 -3.759e+06
  * x            (x) float64 -5.23e+05 -5.22e+05 -5.21e+05 ... 3.75e+05 3.76e+05
  * time         (time) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
    spatial_ref  int32 0
Attributes:
    valid_min:      0
    valid_max:      4095
    standard_name:  rainfall_rate
    long_name:      RW
    unit:           mm h-1
rad.rio.set_spatial_dims(x_dim="x", y_dim="y", inplace=True)
rad.rio.write_crs("epsg:4326", inplace=True)
germany = geopandas.read_file(r'D:\Sandeep\Thesis\vg2500_12-31.utm32s.shape\vg2500\VG2500_LAN.shp', crs="epsg:4326")
clipped = rad.rio.clip(germany.geometry.apply(mapping), rad.rio.crs, drop=False)
1 Like

Hi @SandeepAllampalli,

How did you aggregate the RW data? I’d expect 365 * 24 timesteps as RW is hourly data. Did you sum it up? Then you probably calculated a rainfall_depth with unit mm.

I’m not sure about the crs, it’s definitely not epsg:4326 but the famous RADOLAN projection (RADOLAN Grid — wradlib) which can be plotted using cartopy as shown here RADOLAN Quick Start — wradlib.

For plotting I’d try to do something along the lines of (taken from the above resource):

import cartopy.crs as ccrs

map_proj = ccrs.Stereographic(
    true_scale_latitude=60.0, central_latitude=90.0, central_longitude=10.0
)
doy = 10
clipped.isel(time=doy).plot(subplot_kws=dict(projection=map_proj))

This should only plot the clipped portion of your dataset. If anything fails, please provide the output of print(clipped) so that we can see the layout of the clipped dataset.

More information on recent changes to RADOLAN (for recent datasets) are available in this gist Comparison of legacy and new DWD grids · GitHub.

HTH,
Kai

Hi @kmuehlbauer ,

I have resampled the radolan hourly data to daily, monthly, quarterly and yearly values. Basically here my goal is to subset and plot the data of Brandenburg state. I have been trying different ways in the last couple of days, however I was not successful though. This time I am sharing the radolan hourly data file. Please have a look and let me know what I have to do inorder to plot the data for the specified region.

import geopandas
import rioxarray
import xarray
from shapely.geometry import mapping
import cartopy.crs as ccrs
rad = xarray.open_dataset("D:/Sandeep/Thesis/Data/ds2006h.nc")
rad
Out[65]: 
<xarray.Dataset>
Dimensions:      (time: 8757, y: 900, x: 900)
Coordinates:
  * time         (time) datetime64[ns] 2006-01-01T00:45:00 ... 2006-12-31T22:...
  * y            (y) float64 -4.658e+06 -4.657e+06 ... -3.76e+06 -3.759e+06
  * x            (x) float64 -5.23e+05 -5.22e+05 -5.21e+05 ... 3.75e+05 3.76e+05
    lon          (y, x) float64 ...
    lat          (y, x) float64 ...
    spatial_ref  int32 0
Data variables:
    RW           (time, y, x) float32 ...
Attributes:
    radarid:         10000
    formatversion:   2
    radolanversion:  01.01.00
    radarlocations:  ['bln', 'drs', 'eis', 'emd', 'ess', 'fbg', 'fld', 'fra',...
rad.rio.set_spatial_dims(x_dim="x", y_dim="y", inplace=True)
rad.rio.write_crs("epsg:4839", inplace=True)
brandenburg = geopandas.read_file(r'D:\Sandeep\Thesis\Gemeindegrenzen_-_Brandenburg\Gemeindegrenzen_Brandenburg.shp')

This is how error looks like when I run the below line.

clipped = rad.rio.clip(brandenburg.geometry.apply(mapping), rad.rio.crs, drop=False)
Traceback (most recent call last):

  File "C:\Users\Admin\AppData\Local\Temp\ipykernel_9012\191679719.py", line 1, in <cell line: 1>
    clipped = rad.rio.clip(brandenburg.geometry.apply(mapping), rad.rio.crs, drop=False)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\rioxarray\raster_dataset.py", line 380, in clip
    self._obj[var]

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\rioxarray\raster_array.py", line 930, in clip
    cropped_ds = _clip_xarray(

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\rioxarray\raster_array.py", line 229, in _clip_xarray
    cropped_ds = xds.where(clip_mask_xray)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\common.py", line 1099, in where
    return ops.where_method(self, cond, other)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\ops.py", line 177, in where_method
    return apply_ufunc(

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\computation.py", line 1204, in apply_ufunc
    return apply_dataarray_vfunc(

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\computation.py", line 315, in apply_dataarray_vfunc
    result_var = func(*data_vars)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\computation.py", line 683, in apply_variable_ufunc
    input_data = [

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\computation.py", line 684, in <listcomp>
    broadcast_compat_data(arg, broadcast_dims, core_dims)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\computation.py", line 599, in broadcast_compat_data
    data = variable.data

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\variable.py", line 386, in data
    return self.values

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\variable.py", line 559, in values
    return _as_array_or_item(self._data)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\variable.py", line 265, in _as_array_or_item
    data = np.asarray(data)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\indexing.py", line 653, in __array__
    self._ensure_cached()

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\indexing.py", line 650, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\indexing.py", line 623, in __array__
    return np.asarray(self.array, dtype=dtype)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\core\indexing.py", line 524, in __array__
    return np.asarray(array[self.key], dtype=None)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\coding\variables.py", line 72, in __array__
    return self.func(self.array)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\coding\variables.py", line 219, in _scale_offset_decoding
    data = np.array(data, dtype=dtype, copy=True)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\coding\variables.py", line 72, in __array__
    return self.func(self.array)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\xarray\coding\variables.py", line 139, in _apply_mask
    data = np.asarray(data, dtype=dtype)

MemoryError: Unable to allocate 26.4 GiB for an array with shape (8757, 900, 900) and data type float32
1 Like

@SandeepAllampalli,

obviously your dataset was about to read completely into RAM. You would need to make sure your DataArrays are dask-backed. This can be achieved by adding keyword argument chunks={"time": 1} to the call to xr.open_dataset.

I’ve created a gist showing the workflow Clip RADOLAN xarray dataset using rioxarray with shapefile · GitHub.

HTH,
Kai

Dear @kmuehlbauer,

When I run this line of code:

bounds = brandenburg.to_crs(proj_radolan).bounds.iloc[0]

I get this error:

bounds = brandenburg.to_crs(proj_radolan).bounds.iloc[0]
Traceback (most recent call last):

  File "C:\Users\Admin\AppData\Local\Temp\ipykernel_5896\4005256220.py", line 1, in <cell line: 1>
    bounds = brandenburg.to_crs(proj_radolan).bounds.iloc[0]

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\geopandas\base.py", line 2713, in bounds
    bounds = GeometryArray(self.geometry.values).bounds

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\geopandas\array.py", line 927, in bounds
    return vectorized.bounds(self.data)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\geopandas\_vectorized.py", line 1083, in bounds
    [

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\geopandas\_vectorized.py", line 1084, in <listcomp>
    geom.bounds

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\shapely\geometry\base.py", line 475, in bounds
    return self.impl['bounds'](self)

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\shapely\coords.py", line 187, in __call__
    env = this.envelope

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\shapely\geometry\base.py", line 500, in envelope
    return geom_factory(self.impl['envelope'](self))

  File "C:\Users\Admin\anaconda3\envs\wradlib\lib\site-packages\shapely\topology.py", line 80, in __call__
    return self.fn(this._geom, *args)

OSError: exception: access violation writing 0x0000000000000001