RasterIO

A brief intro to a pythonic raster library

In order to run this notebook, please do what follows:

cd  /media/sf_LVM_shared/my_SE_data/exercise
jupyter-notebook RasterIO_Intro.ipynb

RasterIO is a library that simplifies the use of the raster swiss-knife geospatial library (GDAL) from Python (… and talking about that, how do you pronounce GDAL?).

Ratio: GDAL is C++ library with a public C interface for other languages bindings, therefore its Python API is quite rough from the mean Python programmer’s point of view (i.e. it is not pythonic enough). RasterIO is for rasters what Fiona is for vectors.

Note:`Open Source Approaches in Spatial Data Handling <https://link.springer.com/book/10.1007/978-3-540-74831-1>`__is a good comprehensive book about the history and roles of FOSS in the geospatial domain.

Rasterio basics

[2]:
import rasterio # that's to use it
[3]:
rasterio.__version__
[3]:
'1.3.2'
[38]:
dataset = rasterio.open('geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif', mode='r') # r+ w w+ r

A lots of different parameters can be specified, including all GDAL valid open() attributes. Most of the parameters make sense in write mode to create a new raster file: * dtype * nodata * crs * width, height * transform * count * driver

More information are available in the API reference documentation here. Note that in some specific cases you will need to preceed a regular open() with rasterio.Env() to customize GDAL for your uses (e.g. caching, color tables, driver options, etc.).

Note that the filename in general can be any valid URL and even a dataset within some container file (e.g. a variable in Netcdf or HDF5 file). Even object storage (e.g AWS S3) and zip URLs can be considered.

Let’s see a series of simple methods that can be used to access metadata and pixel values

[39]:
dataset
[39]:
<open DatasetReader name='geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif' mode='r'>
[40]:
dataset.name
[40]:
'geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif'
[41]:
dataset.mode # read-only?
[41]:
'r'
[42]:
dataset.closed # bis still open? you gess what the meaning of dataset.close()
[42]:
False
[43]:
dataset.count
[43]:
1
[44]:
dataset.bounds
[44]:
BoundingBox(left=-83.0, bottom=-56.0, right=-34.0, top=14.0)
[45]:
dataset.transform
[45]:
Affine(0.008333333333333333, 0.0, -83.0,
       0.0, -0.008333333333333333, 14.0)
[46]:
dataset.transform * (0,0)
[46]:
(-83.0, 14.0)
[47]:
dataset.transform * (dataset.width-1, dataset.height-1)
[47]:
(-34.00833333333333, -55.99166666666666)
[48]:
dataset.index(9.99975, 47.90025) # use geographical coords, by long and lat and gives (row,col)
[48]:
(-4069, 11159)
[49]:
dataset.xy(8399, 15999) # give geometric coords long and lat, by row and col !
[49]:
(50.32916666666665, -55.99583333333334)
[50]:
dataset.crs
[50]:
CRS.from_epsg(4326)
[51]:
dataset.shape
[51]:
(8400, 5880)
[52]:
dataset.meta
[52]:
{'driver': 'GTiff',
 'dtype': 'int16',
 'nodata': -9999.0,
 'width': 5880,
 'height': 8400,
 'count': 1,
 'crs': CRS.from_epsg(4326),
 'transform': Affine(0.008333333333333333, 0.0, -83.0,
        0.0, -0.008333333333333333, 14.0)}
[53]:
dataset.dtypes
[53]:
('int16',)

Raster bands can be loaded as a whole thanks to numpy

[54]:
dataset.indexes
[54]:
(1,)
[55]:
band = dataset.read(1) # loads the first band in a numpy array
[56]:
band
[56]:
array([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
       [-9999, -9999, -9999, ..., -9999, -9999, -9999],
       [-9999, -9999, -9999, ..., -9999, -9999, -9999],
       ...,
       [-9999, -9999, -9999, ..., -9999, -9999, -9999],
       [-9999, -9999, -9999, ..., -9999, -9999, -9999],
       [-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
[57]:
band[0,0]
[57]:
-9999
[58]:
band[4440,1555] # access by row,col as a matrix
[58]:
1325
[59]:
dataset.width, dataset.height
[59]:
(5880, 8400)
[60]:
band[949,1549]
[60]:
95

Rasterio includes also some utility methods to show raster data, using matplotlib under the hood.

[61]:
import rasterio.plot
[67]:
rasterio.plot.show(band, title='Elevation')
../_images/PYTHON_RasterIO_Intro_33_0.png
[67]:
<AxesSubplot:title={'center':'Elevation'}>
[63]:
rasterio.plot.plotting_extent(dataset)
[63]:
(-83.0, -34.0, -56.0, 14.0)
[64]:
rasterio.plot.show_hist(band, bins=100)
../_images/PYTHON_RasterIO_Intro_35_0.png

Note that you need to explicity filter out nodata values at read time.

[65]:
band = dataset.read(masked=True)
band
[65]:
masked_array(
  data=[[[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]]],
  mask=[[[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]]],
  fill_value=-9999,
  dtype=int16)
[66]:
rasterio.plot.show_hist(band, bins=100)
../_images/PYTHON_RasterIO_Intro_38_0.png

The mask operation is also efection the plot layout.

[68]:
rasterio.plot.show(band, title='Elevation')
../_images/PYTHON_RasterIO_Intro_40_0.png
[68]:
<AxesSubplot:title={'center':'Elevation'}>

Alternatively (e.g. when raster missing nodata definition, which in practice happens quite often :-)) you need to explicitly mask on your own the input data.

[69]:
import numpy as np

band = dataset.read()
band_masked = np.ma.masked_array(band, mask=(band == -9999.0)) # mask whatever required
band_masked
rasterio.plot.show_hist(band_masked, bins=100)
../_images/PYTHON_RasterIO_Intro_42_0.png

The rasterio can package can also read directly multi-bands images.

[103]:
multi = rasterio.open('geodata/glad_ard/942_crop.tif', mode='r')
[71]:
m = multi.read()
m.shape
[71]:
(8, 440, 400)

Note that rasterio uses bands/rows/cols order in managing images, which is not the same of other packages!

[93]:
def pct_clip(array,pct=[2,98]):
    array_min, array_max = np.nanpercentile(array,pct[0]), np.nanpercentile(array,pct[1])
    clip = (array - array_min) / (array_max - array_min)
    clip[clip>1]=1
    clip[clip<0]=0
    return clip

with rasterio.open("geodata/glad_ard/942_crop.tif") as src:
    with rasterio.open(
            '942_crop.tif', 'w+',
            driver='GTiff',
            dtype= rasterio.float32,
            count=3,
            crs = src.crs,
            width=src.width,
            height=src.height,
            transform=src.transform,
        ) as dst:
        V = pct_clip(src.read(4))
        dst.write(V,1)
        V = pct_clip(src.read(3))
        dst.write(V,2)
        V = pct_clip(src.read(2))
        dst.write(V,3)

fig,ax=plt.subplots()
with rasterio.open("942_crop.tif") as src2:
    rasterio.plot.show(src2.read(),transform=src2.transform,ax=ax)
../_images/PYTHON_RasterIO_Intro_47_0.png

1.2 An example of computation: NDVI

The real computation via numpy arrays

[105]:
multi_msk = multi.read(masked=True)
ndvi = (multi_msk[3]-multi_msk[2]) / (multi_msk[3]+multi_msk[2])
ndvi.min(), ndvi.max()
[105]:
(0.07209653092006033, 0.851783458062008)
[106]:
multi.shape
[106]:
(440, 400)

In order to write the result all metainfo must be prepared, the easiest way is by using the input ones.

[107]:
multi.profile
[107]:
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 400, 'height': 440, 'count': 8, 'crs': CRS.from_epsg(4326), 'transform': Affine(0.00025, 0.0, 15.24,
       0.0, -0.00025, 38.11), 'tiled': False, 'compress': 'deflate', 'interleave': 'pixel'}
[108]:
profile = multi.profile
profile.update(dtype=rasterio.float32, count=1, driver='GTiff')
profile
[108]:
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 400, 'height': 440, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(0.00025, 0.0, 15.24,
       0.0, -0.00025, 38.11), 'tiled': False, 'compress': 'deflate', 'interleave': 'pixel'}
[109]:
with rasterio.open('geodata/glad_ard/942_crop_ndvi.tif', mode='w', **profile) as out:
    out.write(ndvi.astype(rasterio.float32), 1)
[110]:
rasterio.plot.show(ndvi)
../_images/PYTHON_RasterIO_Intro_56_0.png
[110]:
<AxesSubplot:>
[111]:
import numpy as np

urb = np.ma.masked_array(ndvi, mask=(ndvi >= 0.6))
rasterio.plot.show_hist(urb)
rasterio.plot.show(urb,cmap='Greys')
../_images/PYTHON_RasterIO_Intro_57_0.png
../_images/PYTHON_RasterIO_Intro_57_1.png
[111]:
<AxesSubplot:>
[112]:
ndvi_stats = [ndvi.min(), ndvi.mean(), np.ma.median(ndvi), ndvi.max(), ndvi.std()]
ndvi_stats
[112]:
[0.07209653092006033,
 0.579103729330517,
 0.6012272128758722,
 0.851783458062008,
 0.11555229471805274]
[113]:
urb_stats = [urb.min(), urb.mean(), np.ma.median(urb), urb.max(), urb.std()]
urb_stats
[113]:
[0.07209653092006033,
 0.48462524261699785,
 0.49912903658046365,
 0.5999793856936714,
 0.08450404687909031]

It is possibile to use a vector model for masking in rasterio, let’s see how. First of all, we are creating a GeoJSON vector.

[137]:
%%bash

cat >/tmp/clip.json <<EOF
{
"type": "FeatureCollection",
"name": "clip",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::32119" } },
"features": [
{ "type": "Feature", "properties": { "id": null }, "geometry": { "type": "Polygon", "coordinates": [ [ [15.24, 38.09 ], [ 15.32, 38.09 ], [15.32, 38.00], [15.24, 38.00], [15.24, 38.09] ] ] } }
]
}
EOF

[138]:
! ogrinfo -al /tmp/clip.json
INFO: Open of `/tmp/clip.json'
      using driver `GeoJSON' successful.

Layer name: clip
Geometry: Polygon
Feature Count: 1
Extent: (15.240000, 38.000000) - (15.320000, 38.090000)
Layer SRS WKT:
PROJCRS["NAD83 / North Carolina",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 North Carolina zone (meters)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",33.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-79,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",36.1666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",34.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",609601.22,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - North Carolina - counties of Alamance; Alexander; Alleghany; Anson; Ashe; Avery; Beaufort; Bertie; Bladen; Brunswick; Buncombe; Burke; Cabarrus; Caldwell; Camden; Carteret; Caswell; Catawba; Chatham; Cherokee; Chowan; Clay; Cleveland; Columbus; Craven; Cumberland; Currituck; Dare; Davidson; Davie; Duplin; Durham; Edgecombe; Forsyth; Franklin; Gaston; Gates; Graham; Granville; Greene; Guilford; Halifax; Harnett; Haywood; Henderson; Hertford; Hoke; Hyde; Iredell; Jackson; Johnston; Jones; Lee; Lenoir; Lincoln; Macon; Madison; Martin; McDowell; Mecklenburg; Mitchell; Montgomery; Moore; Nash; New Hanover; Northampton; Onslow; Orange; Pamlico; Pasquotank; Pender; Perquimans; Person; Pitt; Polk; Randolph; Richmond; Robeson; Rockingham; Rowan; Rutherford; Sampson; Scotland; Stanly; Stokes; Surry; Swain; Transylvania; Tyrrell; Union; Vance; Wake; Warren; Washington; Watauga; Wayne; Wilkes; Wilson; Yadkin; Yancey."],
        BBOX[33.83,-84.33,36.59,-75.38]],
    ID["EPSG",32119]]
Data axis to CRS axis mapping: 1,2
id: String (0.0)
OGRFeature(clip):0
  id (String) = (null)
  POLYGON ((15.24 38.09,15.32 38.09,15.32 38.0,15.24 38.0,15.24 38.09))

[139]:
import rasterio.mask
import fiona
[140]:
with fiona.open('/tmp/clip.json', 'r') as geojson:
    shapes = [feature['geometry'] for feature in geojson]
shapes
[140]:
[{'type': 'Polygon',
  'coordinates': [[(15.24, 38.09),
    (15.32, 38.09),
    (15.32, 38.0),
    (15.24, 38.0),
    (15.24, 38.09)]]}]
[141]:
clip_image, clip_transform = rasterio.mask.mask(multi, shapes, crop=True) # why transform? ;-)
clip_image.shape
[141]:
(8, 360, 320)
[142]:
multi.meta
[142]:
{'driver': 'GTiff',
 'dtype': 'uint16',
 'nodata': None,
 'width': 400,
 'height': 440,
 'count': 8,
 'crs': CRS.from_epsg(4326),
 'transform': Affine(0.00025, 0.0, 15.24,
        0.0, -0.00025, 38.11)}
[143]:
profile = multi.meta
profile.update({'driver': 'GTiff',
                'width': clip_image.shape[1],
                'height': clip_image.shape[2],
                'transform': clip_transform,
               })
profile
[143]:
{'driver': 'GTiff',
 'dtype': 'uint16',
 'nodata': None,
 'width': 360,
 'height': 320,
 'count': 8,
 'crs': CRS.from_epsg(4326),
 'transform': Affine(0.00025, 0.0, 15.24,
        0.0, -0.00025, 38.089999999999996)}
[144]:
dst = rasterio.open('/tmp/clipped.tif', 'w+', **profile)
dst.write(clip_image)
[145]:
dst.shape
[145]:
(320, 360)
[146]:
rst = dst.read(masked=True)
rst.shape
[146]:
(8, 320, 360)
[147]:
rasterio.plot.show(rst[0])
../_images/PYTHON_RasterIO_Intro_71_0.png
[147]:
<AxesSubplot:>

Note that all read/write operation in RasterIO are performed for the whole size of the dataset, so the package is limited (somehow …) by memory available. An alternative is using windowing in read/write operation.

[148]:
import rasterio.windows

rasterio.windows.Window(0,10,100,100)
[148]:
Window(col_off=0, row_off=10, width=100, height=100)
[149]:
win = rasterio.windows.Window(0,10,100,100)
src = rasterio.open('/tmp/clipped.tif', mode='r')
w = src.read(window=win)
w.shape
[149]:
(8, 100, 100)

Of course if you tile the input source and output sub-windows of results, a new trasform will need to be provided for each tile, and the window_transform() method can be used for that.

[150]:
src.transform
[150]:
Affine(0.00025, 0.0, 15.24,
       0.0, -0.00025, 38.089999999999996)
[151]:
src.window_transform(win)
[151]:
Affine(0.00025, 0.0, 15.24,
       0.0, -0.00025, 38.0875)

Note that in many cases could be convenient aligning to the existing (format related) block size organization of the file, which should be the minimal chunck of I/O in GDAL.

[152]:
for i, shape in enumerate(src.block_shapes, 1):
    print(i, shape)
1 (1, 360)
2 (1, 360)
3 (1, 360)
4 (1, 360)
5 (1, 360)
6 (1, 360)
7 (1, 360)
8 (1, 360)

More information, examples and documentation about RasterIO are here

## 1.3 The rio CLI tool

RasterIO has also a command line tool (rio) which can be used to perform a series of operation on rasters, which complementary in respect with the GDAL tools.If you install rasterio from distribution/kit it will reside in the ordinary system paths (note: you could find a rasterio binary instead of rio). If you install from the PyPI repository via pip it will install under ~/.local/bin.

[153]:
! rio --help
Usage: rio [OPTIONS] COMMAND [ARGS]...

  Rasterio command line interface.

Options:
  -v, --verbose           Increase verbosity.
  -q, --quiet             Decrease verbosity.
  --aws-profile TEXT      Select a profile from the AWS credentials file
  --aws-no-sign-requests  Make requests anonymously
  --aws-requester-pays    Requester pays data transfer costs
  --version               Show the version and exit.
  --gdal-version
  --show-versions         Show dependency versions
  --help                  Show this message and exit.

Commands:
  blocks     Write dataset blocks as GeoJSON features.
  bounds     Write bounding boxes to stdout as GeoJSON.
  calc       Raster data calculator.
  clip       Clip a raster to given bounds.
  convert    Copy and convert raster dataset.
  edit-info  Edit dataset metadata.
  env        Print information about the Rasterio environment.
  gcps       Print ground control points as GeoJSON.
  info       Print information about a data file.
  insp       Open a data file and start an interpreter.
  mask       Mask in raster using features.
  merge      Merge a stack of raster datasets.
  overview   Construct overviews in an existing dataset.
  rasterize  Rasterize features.
  rm         Delete a dataset.
  sample     Sample a dataset.
  shapes     Write shapes extracted from bands or masks.
  stack      Stack a number of bands into a multiband dataset.
  transform  Transform coordinates.
  warp       Warp a raster dataset.
[154]:
! rio bounds --indent 1 geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif # output a GeoJSON polygon for the bounding box
{
 "bbox": [
  -83.0,
  -56.0,
  -34.0,
  14.0
 ],
 "geometry": {
  "coordinates": [
   [
    [
     -83.0,
     -56.0
    ],
    [
     -34.0,
     -56.0
    ],
    [
     -34.0,
     14.0
    ],
    [
     -83.0,
     14.0
    ],
    [
     -83.0,
     -56.0
    ]
   ]
  ],
  "type": "Polygon"
 },
 "properties": {
  "filename": "SA_elevation_mn_GMTED2010_mn_msk.tif",
  "id": "0",
  "title": "geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif"
 },
 "type": "Feature"
}

A quite interesting feature is the raster calculator which uses a Lisp-like language notation, which is the s-expression notation of Snuggs, part of numpy. Largely undocumented, you need to have a look to the code here and its README file :-(

[155]:
! rio calc "(+ 2.0 (* 0.95 (read 1)))" --overwrite geodata/dem/SA_elevation_mn_GMTED2010_mn_msk.tif /tmp/elev.tif # Lisp like lists :-(

Simple comuputations can be done via rio or written via ad hoc script at your will.