Estimation of tree height using GEDI dataset - Predictors extraction at point location

The tree_height Dataset is available following the procedure described at https://spatial-ecology.net/docs/build/html/CASESTUDY/Tree_Height_01DataExplore.html

In order to run this notebook, please do what follows:

cd  /media/sf_LVM_shared/my_SE_data/exercise
pip3 install rasterio
jupyter-notebook Tree_Height_02Predictors_extraction

Hint: you should never use ``pip3`` under ``sudo``, else you could break your system-wide Python environment in very creative and subtle ways.

Introduction

The main objectives of this tutorial is to show several ways how to extract pixel value at point location.

We will use the files stored at

[27]:
! ls tree_height/txt/eu_x_y_*
tree_height/txt/eu_x_y_height_predictors_select.txt
tree_height/txt/eu_x_y_height_select.txt
tree_height/txt/eu_x_y_predictors_select.txt
tree_height/txt/eu_x_y_select.txt

In particular we will reproduce this data-table

[2]:
! head -3 tree_height/txt/eu_x_y_predictors_select.txt # what we would get
ID X Y BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 convergence cti dev-magnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness ORCDRC_WeigAver outlet_dist_dw_basin SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm treecover
1 6.050001 49.727499 1540 13 2113 5893 -10.4865598678589 -238043120 1.15841722488403 0.0690935924649239 353.983123779297 23 276.87109375 46.444091796875 347.665405273438 0.0424997806549072 9 780403 19.7989921569824 440.672210693359 85
2 6.0500017 49.922155 1491 12 1993 5912 33.2743606567383 -208915344 -1.75534081459045 0.269112348556519 267.511688232422 19 -49.5263671875 19.552734375 -130.541748046875 0.182779803872108 16 772777 20.8894119262695 457.756195068359 85

using the latitude longitude of each point

[28]:
! head tree_height/txt/eu_x_y_select.txt
6.050001 49.727499
6.0500017 49.922155
6.0500021 48.602377
6.0500089 48.151979
6.0500102 49.58841
6.0500143 48.608456
6.0500165 48.571401
6.0500189 49.921613
6.0500201 48.822645
6.0500238 49.847522
[1]:
! wc -l tree_height/txt/eu_x_y_select.txt # quite a lots of coords!
1267239 tree_height/txt/eu_x_y_select.txt

and the predictors

[30]:
! ls tree_height/geodata_raster/*.tif
tree_height/geodata_raster/BLDFIE_WeigAver.tif
tree_height/geodata_raster/CECSOL_WeigAver.tif
tree_height/geodata_raster/CHELSA_bio18.tif
tree_height/geodata_raster/CHELSA_bio4.tif
tree_height/geodata_raster/convergence.tif
tree_height/geodata_raster/cti.tif
tree_height/geodata_raster/dev-magnitude.tif
tree_height/geodata_raster/eastness.tif
tree_height/geodata_raster/elev.tif
tree_height/geodata_raster/forestheight.tif
tree_height/geodata_raster/glad_ard_SVVI_max.tif
tree_height/geodata_raster/glad_ard_SVVI_med.tif
tree_height/geodata_raster/glad_ard_SVVI_min.tif
tree_height/geodata_raster/latitude.tif
tree_height/geodata_raster/longitude.tif
tree_height/geodata_raster/northness.tif
tree_height/geodata_raster/ORCDRC_WeigAver.tif
tree_height/geodata_raster/outlet_dist_dw_basin.tif
tree_height/geodata_raster/SBIO3_Isothermality_5_15cm.tif
tree_height/geodata_raster/SBIO4_Temperature_Seasonality_5_15cm.tif
tree_height/geodata_raster/treecover.tif

Running rio tool and using the bash shell and its tools

by Francesco Lovergine

RasterIO has also a command line tool (rio) which can be used to perform a series of operation on rasters, which complementary in respect with the GDAL tools.If you install rasterio from distribution/kit it will reside in the ordinary system paths (note: you could find a rasterio binary instead of rio). If you install from the PyPI repository via pip it will install under ~/.local/bin.

[33]:
! rio --help
Usage: rio [OPTIONS] COMMAND [ARGS]...

  Rasterio command line interface.

Options:
  -v, --verbose           Increase verbosity.
  -q, --quiet             Decrease verbosity.
  --aws-profile TEXT      Select a profile from the AWS credentials file
  --aws-no-sign-requests  Make requests anonymously
  --aws-requester-pays    Requester pays data transfer costs
  --version               Show the version and exit.
  --gdal-version
  --show-versions         Show dependency versions
  --help                  Show this message and exit.

Commands:
  blocks     Write dataset blocks as GeoJSON features.
  bounds     Write bounding boxes to stdout as GeoJSON.
  calc       Raster data calculator.
  clip       Clip a raster to given bounds.
  convert    Copy and convert raster dataset.
  edit-info  Edit dataset metadata.
  env        Print information about the Rasterio environment.
  gcps       Print ground control points as GeoJSON.
  info       Print information about a data file.
  insp       Open a data file and start an interpreter.
  mask       Mask in raster using features.
  merge      Merge a stack of raster datasets.
  overview   Construct overviews in an existing dataset.
  rasterize  Rasterize features.
  rm         Delete a dataset.
  sample     Sample a dataset.
  shapes     Write shapes extracted from bands or masks.
  stack      Stack a number of bands into a multiband dataset.
  transform  Transform coordinates.
  warp       Warp a raster dataset.

The goal of the next few snippets of code is creating in Python and rasterio. That can be done either by using th rio tool or completely by Python script.

First of all, the input for rio-sample submodule needs to be in list form, and that can be easily done via sed tool (or even awk if you prefer so):

[3]:
! head tree_height/txt/eu_x_y_select.txt | sed -e 's/ /,/' -e 's/^/[/' -e 's/$/]/' # why and how?
[6.050001,49.727499]
[6.0500017,49.922155]
[6.0500021,48.602377]
[6.0500089,48.151979]
[6.0500102,49.58841]
[6.0500143,48.608456]
[6.0500165,48.571401]
[6.0500189,49.921613]
[6.0500201,48.822645]
[6.0500238,49.847522]

Once created a compatible input data file for rio, it can be used to sample every georaster.

[4]:
! head tree_height/txt/eu_x_y_select.txt | sed -e 's/ /,/' -e 's/^/[/' -e 's/$/]/' | rio sample tree_height/geodata_raster/BLDFIE_WeigAver.tif
[1540]
[1491]
[1521]
[1526]
[1547]
[1515]
[1520]
[1490]
[1554]
[1521]

Now, in principle you could stack together the input predictors and apply later the sampling. That could be done via gdalbuildvrt or rio-stack, BUT they both work currently only for homogeneous dtype, which is not our case, unfortunately.

Homework: extract data types and sizes from all those files and check rasters are different types with the same dimensions. Hint: you can do that via rio or gdal tools or pktools

[5]:
# rio stack `ls tree_height/geodata_raster/*.tif|grep -Ev 'latitude|longitude'` -o /tmp/tree_height_preds.tif
# gdalbuildvrt tree_height/geodata_raster/*.tif -o /tmp/predictors.vrt

One quite simple way to get the final result is using rio-sample to sample separately each field, then using paste to join together each per-field file.

[24]:
%%bash

# this is VERY slow on 1M of records...
time (
FILES=$(awk '{if (NR==1) print $4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22}' tree_height/txt/eu_x_y_predictors_select.txt)
for n in $FILES
do
    sed -e 's/ /,/' -e 's/^/[/' -e 's/$/]/' tree_height/txt/eu_x_y_select.txt | rio sample tree_height/geodata_raster/$n.tif | tr -d '[]' > /tmp/$n
done

seq $(wc -l tree_height/txt/eu_x_y_select.txt|cut -d' ' -f1) >/tmp/ids
echo 'ID X Y' "$FILES" >  tree_height/txt/eu_x_y_predictors_select_rio.txt
paste -d ' ' /tmp/ids tree_height/txt/eu_x_y_select.txt \
$(for n in  $(awk '{if (NR==1) print $4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22}' tree_height/txt/eu_x_y_predictors_select.txt); do ls /tmp/$n ; done) \
>> tree_height/txt/eu_x_y_predictors_select_rio.txt
)


real    20m51.133s
user    18m36.829s
sys     2m39.808s

Running time

real    20m51.133s
user    18m36.829s
sys     2m39.808s
[25]:
! head -3  tree_height/txt/eu_x_y_predictors_select_rio.txt
ID X Y BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 convergence cti dev-magnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness ORCDRC_WeigAver outlet_dist_dw_basin SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm treecover
1 6.050001 49.727499 1540 13 2113 5893 -10.486559867858887 -238043120 1.1584172248840332 0.06909359246492386 353.9831237792969 23 276.87109375 46.444091796875 347.6654052734375 0.04249978065490723 9 780403 19.798992156982422 440.6722106933594 85
2 6.0500017 49.922155 1491 12 1993 5912 33.27436065673828 -208915344 -1.755340814590454 0.26911234855651855 267.5116882324219 19 -49.5263671875 19.552734375 -130.541748046875 0.18277980387210846 16 772777 20.88941192626953 457.7561950683594 85

Running gdallocationinfo tool and using the bash shell and its tools

By Giuseppe Amatulli

We are going to use gdallocationinfo to extract information in each raster predictors.

Usage: gdallocationinfo [--help-general] [-xml] [-lifonly] [-valonly]
                    [-b band]* [-overview overview_level]
                    [-l_srs srs_def] [-geoloc] [-wgs84]
                    [-oo NAME=VALUE]* srcfile x y
[18]:
%%bash

time (
for file in $(awk '{if (NR==1) print $1="", $2="", $3="", $0}' tree_height/txt/eu_x_y_predictors_select.txt);  do
echo $file > tree_height/txt/$file.txt
gdallocationinfo -valonly -geoloc -wgs84 tree_height/geodata_raster/$file.tif < tree_height/txt/eu_x_y_select.txt >> tree_height/txt/$file.txt
done

paste -d " " <(awk '{ print $1,$2,$3}' tree_height/txt/eu_x_y_predictors_select.txt) \
$(for file in  $(awk '{if (NR==1)  print $1="", $2="", $3="", $0}' tree_height/txt/eu_x_y_predictors_select.txt); do ls tree_height/txt/$file.txt; done) \
> tree_height/txt/eu_x_y_predictors_select_gdal.txt
)

real    1m50.404s
user    1m24.222s
sys     0m12.117s

Running time

real    1m50.404s
user    1m24.222s
sys     0m12.117s
[19]:
! head -3 tree_height/txt/eu_x_y_predictors_select_gdal.txt
ID X Y BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 convergence cti dev-magnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness ORCDRC_WeigAver outlet_dist_dw_basin SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm treecover
1 6.050001 49.727499 1540 13 2113 5893 -10.4865598678589 -238043120 1.15841722488403 0.0690935924649239 353.983123779297 23 276.87109375 46.444091796875 347.665405273438 0.0424997806549072 9 780403 19.7989921569824 440.672210693359 85
2 6.0500017 49.922155 1491 12 1993 5912 33.2743606567383 -208915344 -1.75534081459045 0.269112348556519 267.511688232422 19 -49.5263671875 19.552734375 -130.541748046875 0.182779803872108 16 772777 20.8894119262695 457.756195068359 85

Of course, the same result can be achieved with direct read/write via rasterio packagein one full Python script.

Using Python with rasterio

Script construction, step by step. By Francesco Lovergine

Due to the size of the dataset it is mandatory increasing the memory available for the VM and possibly add on demand swap space (i.e. virtual memory on disk), for instance via:

sudo apt install swapspace

Of course, consider that eventually it could require a lot of temporary storage taken under /var/lib/swapspace which could be freed after use.

[51]:
import csv
import glob
import rasterio
import os.path
[32]:
files = []
for filename in glob.glob('tree_height/geodata_raster/[!l]*.tif'):
    ds = rasterio.open(filename, mode='r')
    files.append(ds)
[33]:
files
[33]:
[<open DatasetReader name='tree_height/geodata_raster/BLDFIE_WeigAver.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CECSOL_WeigAver.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CHELSA_bio18.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CHELSA_bio4.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/ORCDRC_WeigAver.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/SBIO3_Isothermality_5_15cm.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/SBIO4_Temperature_Seasonality_5_15cm.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/convergence.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/cti.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/dev-magnitude.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/eastness.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/elev.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/forestheight.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_max.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_med.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_min.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/northness.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/outlet_dist_dw_basin.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/treecover.tif' mode='r'>]
[34]:
print(len(files))
19
[35]:
os.path.basename(files[0].name)
[35]:
'BLDFIE_WeigAver.tif'
[36]:
(name, ext) = os.path.splitext(os.path.basename(files[0].name))
name
[36]:
'BLDFIE_WeigAver'
[37]:
header = ''
for ds in files:
    field_name =   os.path.splitext(os.path.basename(ds.name))[0]
    header += ' '+field_name
header.lstrip()
[37]:
'BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 ORCDRC_WeigAver SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm convergence cti dev-magnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness outlet_dist_dw_basin treecover'

Starting from those files now open in rasterio, it is possible to read the band and store them in a list

[40]:
band = files[0].read(1)
band
[40]:
array([[ 1523,  1523, 65535, ..., 65535, 65535, 65535],
       [65535, 65535, 65535, ..., 65535, 65535, 65535],
       [65535, 65535, 65535, ..., 65535, 65535, 65535],
       ...,
       [ 1514,  1514,  1514, ...,  1455,  1455,  1455],
       [65535, 65535, 65535, ...,  1457,  1457,  1457],
       [65535, 65535, 65535, ...,  1458,  1458,  1458]], dtype=uint16)
[41]:
bands = []
for ds in files:
    bands.append(ds.read(1))
[42]:
bands
[42]:
[array([[ 1523,  1523, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        ...,
        [ 1514,  1514,  1514, ...,  1455,  1455,  1455],
        [65535, 65535, 65535, ...,  1457,  1457,  1457],
        [65535, 65535, 65535, ...,  1458,  1458,  1458]], dtype=uint16),
 array([[   11,    11, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        ...,
        [   16,    16,    16, ...,    20,    20,    20],
        [65535, 65535, 65535, ...,    19,    19,    19],
        [65535, 65535, 65535, ...,    19,    19,    19]], dtype=uint16),
 array([[ 2024,  2024, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        ...,
        [ 2401,  2401,  2401, ...,  4405,  4405,  4405],
        [65535, 65535, 65535, ...,  4406,  4406,  4406],
        [65535, 65535, 65535, ...,  4408,  4408,  4408]], dtype=uint16),
 array([[ 5879,  5879, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        ...,
        [ 6272,  6272,  6272, ...,  6348,  6348,  6348],
        [65535, 65535, 65535, ...,  6348,  6348,  6348],
        [65535, 65535, 65535, ...,  6349,  6349,  6349]], dtype=uint16),
 array([[   12,    12, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        [65535, 65535, 65535, ..., 65535, 65535, 65535],
        ...,
        [    9,     9,     9, ...,    23,    23,    23],
        [65535, 65535, 65535, ...,    22,    22,    22],
        [65535, 65535, 65535, ...,    21,    21,    21]], dtype=uint16),
 array([[   22.9   ,    22.9   , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        [-9999.    , -9999.    , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        [-9999.    , -9999.    , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        ...,
        [   21.3825,    21.3825,    21.3825, ...,    24.24  ,    24.24  ,
            24.24  ],
        [-9999.    , -9999.    , -9999.    , ...,    24.264 ,    24.264 ,
            24.264 ],
        [-9999.    , -9999.    , -9999.    , ...,    24.288 ,    24.288 ,
            24.288 ]], dtype=float32),
 array([[  470.5   ,   470.5   , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        [-9999.    , -9999.    , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        [-9999.    , -9999.    , -9999.    , ..., -9999.    , -9999.    ,
         -9999.    ],
        ...,
        [  514.73  ,   514.73  ,   514.73  , ...,   491.1825,   491.1825,
           491.1825],
        [-9999.    , -9999.    , -9999.    , ...,   491.0895,   491.0895,
           491.0895],
        [-9999.    , -9999.    , -9999.    , ...,   490.9965,   490.9965,
           490.9965]], dtype=float32),
 array([[-3.0291458e+01, -3.0291458e+01, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        ...,
        [-5.1806598e+00, -5.1806598e+00, -5.7902360e+00, ...,
          4.0210270e+01,  4.8858276e+01,  4.8858276e+01],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          4.1173668e+01,  5.1179237e+01,  5.1179237e+01],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          3.4467628e+01,  4.4355145e+01,  4.4355145e+01]], dtype=float32),
 array([[ -266332416,  -266332416, -2147483648, ..., -2147483648,
         -2147483648, -2147483648],
        [-2147483648, -2147483648, -2147483648, ..., -2147483648,
         -2147483648, -2147483648],
        [-2147483648, -2147483648, -2147483648, ..., -2147483648,
         -2147483648, -2147483648],
        ...,
        [ -330251232,  -330251232,  -330806208, ...,   -72420176,
           -38702784,   -38702784],
        [-2147483648, -2147483648, -2147483648, ...,   -97209304,
           -71794584,   -71794584],
        [-2147483648, -2147483648, -2147483648, ...,  -112299568,
           -99132240,   -99132240]], dtype=int32),
 array([[ 6.7067027e-01,  6.7067027e-01, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        ...,
        [ 5.0633794e-01,  5.0633794e-01,  5.0400752e-01, ...,
          1.6250600e+00,  1.6136067e+00,  1.6136067e+00],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          1.6310033e+00,  1.6191456e+00,  1.6191456e+00],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          1.6421642e+00,  1.6330340e+00,  1.6330340e+00]], dtype=float32),
 array([[-6.9691122e-02, -6.9691122e-02, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
         -9.9990000e+03, -9.9990000e+03, -9.9990000e+03],
        ...,
        [-1.5847646e-02, -1.5847646e-02, -7.8007574e-03, ...,
          1.3672341e-02,  1.2951906e-02,  1.2951906e-02],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          1.4678583e-02,  1.4141800e-02,  1.4141800e-02],
        [-9.9990000e+03, -9.9990000e+03, -9.9990000e+03, ...,
          1.3527840e-02,  1.3261721e-02,  1.3261721e-02]], dtype=float32),
 array([[  379.93466,   379.93466, -9999.     , ..., -9999.     ,
         -9999.     , -9999.     ],
        [-9999.     , -9999.     , -9999.     , ..., -9999.     ,
         -9999.     , -9999.     ],
        [-9999.     , -9999.     , -9999.     , ..., -9999.     ,
         -9999.     , -9999.     ],
        ...,
        [  259.5505 ,   259.5505 ,   260.30533, ...,   737.22675,
           736.4584 ,   736.4584 ],
        [-9999.     , -9999.     , -9999.     , ...,   737.5121 ,
           736.75574,   736.75574],
        [-9999.     , -9999.     , -9999.     , ...,   738.052  ,
           737.48334,   737.48334]], dtype=float32),
 array([[  0,   0, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [  4,   4,   3, ...,  27,  29,  29],
        [255, 255, 255, ...,  24,  25,  28],
        [255, 255, 255, ...,  22,  26,  27]], dtype=uint8),
 array([[ 1492.6785  ,  1306.2141  , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        ...,
        [  406.16797 ,   512.14136 ,   852.699   , ...,    13.673584,
            16.567871,  -119.38574 ],
        [-9999.      , -9999.      , -9999.      , ...,    53.41284 ,
           106.63818 ,  -103.65137 ],
        [-9999.      , -9999.      , -9999.      , ...,  -225.1709  ,
            59.3938  ,   -91.39722 ]], dtype=float32),
 array([[ 1085.7559  ,  1035.9121  , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        ...,
        [  370.3777  ,   485.29468 ,   596.9763  , ...,  -115.798096,
           -88.07202 ,   -88.33276 ],
        [-9999.      , -9999.      , -9999.      , ...,   -47.25415 ,
           -46.921387,  -104.95679 ],
        [-9999.      , -9999.      , -9999.      , ...,  -243.46167 ,
           -13.637451,  -121.86597 ]], dtype=float32),
 array([[  959.5752  ,   688.7239  , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
         -9999.      , -9999.      ],
        ...,
        [  311.42188 ,   464.50146 ,   542.05835 , ...,  -282.40723 ,
          -237.9851  ,  -215.59985 ],
        [-9999.      , -9999.      , -9999.      , ...,  -162.25928 ,
          -190.5232  ,  -179.57202 ],
        [-9999.      , -9999.      , -9999.      , ...,  -221.65869 ,
           -44.732178,  -285.79663 ]], dtype=float32),
 array([[-7.71632195e-02, -7.71632195e-02, -9.99900000e+03, ...,
         -9.99900000e+03, -9.99900000e+03, -9.99900000e+03],
        [-9.99900000e+03, -9.99900000e+03, -9.99900000e+03, ...,
         -9.99900000e+03, -9.99900000e+03, -9.99900000e+03],
        [-9.99900000e+03, -9.99900000e+03, -9.99900000e+03, ...,
         -9.99900000e+03, -9.99900000e+03, -9.99900000e+03],
        ...,
        [ 1.26748577e-01,  1.26748577e-01,  1.23799555e-01, ...,
          5.50076319e-03,  5.47871692e-03,  5.47871692e-03],
        [-9.99900000e+03, -9.99900000e+03, -9.99900000e+03, ...,
          6.67779474e-03,  6.95462432e-03,  6.95462432e-03],
        [-9.99900000e+03, -9.99900000e+03, -9.99900000e+03, ...,
          6.94931159e-03,  7.58498861e-03,  7.58498861e-03]], dtype=float32),
 array([[ 790129,  790129,   -9999, ...,   -9999,   -9999,   -9999],
        [  -9999,   -9999,   -9999, ...,   -9999,   -9999,   -9999],
        [  -9999,   -9999,   -9999, ...,   -9999,   -9999,   -9999],
        ...,
        [ 769627,  769627,  769634, ..., 2915172, 2915151, 2915151],
        [  -9999,   -9999,   -9999, ..., 2915209, 2915168, 2915168],
        [  -9999,   -9999,   -9999, ..., 2915389, 2915195, 2915195]],
       dtype=int32),
 array([[ 37,  37, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [ 29,  27,  27, ...,  95,  87,  94],
        [255, 255, 255, ...,  98,  92,  85],
        [255, 255, 255, ...,  72,  72,  85]], dtype=uint8)]

Now let’s have a trial for concatenating raster values

[72]:
with open('tree_height/txt/eu_x_y_select.txt') as csvfile:
    coords = csv.reader(csvfile, delimiter=' ')
    i = 1
    for (long, lat) in coords:
        print('{} {} {} '.format(i, long, lat),end='')
        for j, ds in enumerate(files):
            idx = ds.index(float(long), float(lat))
            band = bands[j]
            val = band[idx]
            print('{} '.format(val), end='')
        print("")b
        i+=1
        if i > 10: break # just for the very first rows and check

1 6.050001 49.727499 1540 13 2113 5893 9 19.798992156982422 440.6722106933594 -10.486559867858887 -238043120 1.1584172248840332 0.06909359246492386 353.9831237792969 23 276.87109375 46.444091796875 347.6654052734375 0.04249978065490723 780403 85
2 6.0500017 49.922155 1491 12 1993 5912 16 20.88941192626953 457.7561950683594 33.27436065673828 -208915344 -1.755340814590454 0.26911234855651855 267.5116882324219 19 -49.5263671875 19.552734375 -130.541748046875 0.18277980387210846 772777 85
3 6.0500021 48.602377 1521 17 2124 5983 14 20.695877075195312 481.87969970703125 0.045293472707271576 -137479792 1.9087798595428467 -0.01605454459786415 389.75115966796875 21 93.25732421875 50.74365234375 384.5224609375 0.0362534299492836 898820 62
4 6.0500089 48.151979 1526 16 2569 6130 15 19.375 479.4102783203125 -33.654273986816406 -267223072 0.9657867550849915 0.06776714324951172 380.20770263671875 27 542.4013671875 202.26416015625 386.15673828125 0.0051393029280006886 831824 85
5 6.0500102 49.58841 1547 14 2108 5923 17 18.77750015258789 457.88006591796875 27.493824005126953 -107809368 -0.1626242697238922 0.014064788818359375 308.04278564453125 25 136.04833984375 146.835205078125 198.12744140625 0.02884702757000923 796962 85
6 6.0500143 48.608456 1515 19 2124 6010 14 19.398880004882812 474.3313293457031 -1.6020394563674927 17384282 1.44797945022583 -0.01891239918768406 364.527099609375 18 221.33984375 247.38720703125 480.387939453125 0.042747415602207184 897945 62
7 6.0500165 48.571401 1520 19 2169 6147 11 20.17045021057129 476.4145202636719 27.856502532958984 -66516432 -1.0739555358886719 0.0022801030427217484 254.67959594726562 19 125.25048828125 87.865234375 160.69677734375 0.037254270166158676 908426 96
8 6.0500189 49.921613 1490 12 1995 5912 15 20.8559627532959 457.1954040527344 22.10213851928711 -297770784 -1.402632713317871 0.30976489186286926 294.9277648925781 26 -86.7294921875 -145.584228515625 -190.06298828125 0.22243468463420868 772784 86
9 6.0500201 48.822645 1554 18 1973 6138 8 21.81229019165039 496.2311096191406 18.496583938598633 -25336536 -0.8000159859657288 0.0103699816390872 240.49375915527344 22 -51.470703125 -245.88671875 172.07470703125 0.004428229294717312 839132 64
10 6.0500238 49.847522 1521 15 2187 5886 13 21.137710571289062 466.9766845703125 -5.660452842712402 -278652608 1.4779508113861084 -0.06871972978115082 376.671142578125 12 277.29736328125 273.141845703125 -138.89599609375 0.09881718456745148 768873 70

The first 10 rows seem correct in respect with the previous file, now it is only need to dump the rows to a file, with an appropriate header.

[83]:
%%timeit -r 1 -n 1
with open('tree_height/txt/eu_x_y_predictors_select_new.txt', 'w') as out:
    print('ID X Y' + header, file=out)
    with open('tree_height/txt/eu_x_y_select.txt') as csvfile:
        coords = csv.reader(csvfile, delimiter=' ')
        i = 1
        for (long, lat) in coords:
            print('{} {} {} '.format(i, long, lat),end='', file=out)
            for j, ds in enumerate(files):
                idx = ds.index(float(long), float(lat))
                band = bands[j]
                val = band[idx]
                print('{} '.format(val), end='', file=out)
            print("", file=out)
            i+=1
            if i>100000: break # that's to get a fast partial result...
2min 10s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

You can notice that the output is extremely slow. Efficiency can be eventually increased by changing the buffering size from the default one.

[86]:
! head -3 tree_height/txt/eu_x_y_predictors_select_new.txt
ID X Y BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 ORCDRC_WeigAver SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm convergence cti dev-magnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness outlet_dist_dw_basin treecover
1 6.050001 49.727499 1540 13 2113 5893 9 19.798992156982422 440.6722106933594 -10.486559867858887 -238043120 1.1584172248840332 0.06909359246492386 353.9831237792969 23 276.87109375 46.444091796875 347.6654052734375 0.04249978065490723 780403 85
2 6.0500017 49.922155 1491 12 1993 5912 16 20.88941192626953 457.7561950683594 33.27436065673828 -208915344 -1.755340814590454 0.26911234855651855 267.5116882324219 19 -49.5263671875 19.552734375 -130.541748046875 0.18277980387210846 772777 85

Stend-alone Python script. By Francesco Lovergine

Running the script for the whole content of 1.2M of records take quite a full time, you can run it yourself or use the resulting file stored in the staging area of the VM and clone via git.

#!/usr/bin/env python3
#
# This is the whole script written in a proper way to work even in bash.
# You can copy&paste this block in a text `whatever.py` file, then
# chmod a+x whatever.py
# and run it as ./whatever.py
#

import csv
import glob
import rasterio
import os.path

files = []
for filename in glob.glob('tree_height/geodata_raster/[!l]*.tif'):
    ds = rasterio.open(filename, mode='r')
    files.append(ds)

(name, ext) = os.path.splitext(os.path.basename(files[0].name))
header = ''
for ds in files:
    field_name = os.path.splitext(os.path.basename(ds.name))[0]
    header += ' '+field_name

print('Reading raster files...')

bands = []
for ds in files:
    bands.append(ds.read(1))

print('Writing samples')

with open('tree_height/txt/eu_x_y_predictors_select_new.txt', 'w') as out:
    print('ID X Y' + header, file=out)
    with open('tree_height/txt/eu_x_y_select.txt') as csvfile:
        coords = csv.reader(csvfile, delimiter=' ')
        i = 1
        for (long, lat) in coords:
            print('{} {} {} '.format(i, long, lat),end='', file=out)
            for j, ds in enumerate(files):
                idx = ds.index(float(long), float(lat))
                band = bands[j]
                val = band[idx]
                print('{} '.format(val), end='', file=out)
            print("", file=out)
            if not i % 10: print('Record {} ...'.format(i))
            i+=1
    csvfile.close()
out.close()

print('Finished')

exit(0)

This script can be run by

[ ]:
! time python3 /media/sf_LVM_shared/my_SE_data/exercise/Tree_Height_02Predictors_extraction_python1.py

Using Python with rasterio and numpy

Script construction, step by step. By Hannah Weiser

Imports

[26]:
from pathlib import Path
import rasterio as rio
import numpy as np

Create a list of (opened) raster datasets and of the corresponding field names

[27]:
datasets = []
fieldnames = []
for filename in Path('tree_height/geodata_raster/').glob('[!l]*.tif'):
    ds = rio.open(filename, mode='r')
    datasets.append(ds)
    fieldnames.append(filename.stem)
[28]:
fieldnames
[28]:
['northness',
 'forestheight',
 'elev',
 'CHELSA_bio4',
 'SBIO3_Isothermality_5_15cm',
 'SBIO4_Temperature_Seasonality_5_15cm',
 'dev-magnitude',
 'BLDFIE_WeigAver',
 'CHELSA_bio18',
 'glad_ard_SVVI_min',
 'eastness',
 'ORCDRC_WeigAver',
 'cti',
 'treecover',
 'outlet_dist_dw_basin',
 'glad_ard_SVVI_med',
 'convergence',
 'glad_ard_SVVI_max',
 'CECSOL_WeigAver']
[29]:
datasets
[29]:
[<open DatasetReader name='tree_height/geodata_raster/northness.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/forestheight.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/elev.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CHELSA_bio4.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/SBIO3_Isothermality_5_15cm.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/SBIO4_Temperature_Seasonality_5_15cm.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/dev-magnitude.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/BLDFIE_WeigAver.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CHELSA_bio18.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_min.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/eastness.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/ORCDRC_WeigAver.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/cti.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/treecover.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/outlet_dist_dw_basin.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_med.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/convergence.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/glad_ard_SVVI_max.tif' mode='r'>,
 <open DatasetReader name='tree_height/geodata_raster/CECSOL_WeigAver.tif' mode='r'>]

Read the coordinates to a numpy array

[30]:
coords = np.genfromtxt('tree_height/txt/eu_x_y_select.txt', delimiter=' ')
[31]:
coords
[31]:
array([[ 6.050001 , 49.727499 ],
       [ 6.0500017, 49.922155 ],
       [ 6.0500021, 48.602377 ],
       ...,
       [ 9.9499985, 49.227932 ],
       [ 9.9499988, 49.936763 ],
       [ 9.9499996, 49.48964  ]])
[32]:
coords.shape
[32]:
(1267239, 2)

Create a dictionary with an array of pixel values at the coordinates for each field (i.e. values sampled from each raster)

[33]:
raster_val_dict = {}
for i, ds in enumerate(datasets):
    # transform coordinates to raster row and column indices
    rows, cols = rio.transform.rowcol(ds.transform, coords[:, 0], coords[:, 1])
    rowcols = np.array(list(zip(rows, cols)))
    # read band of raster
    band = ds.read(1)
    # get values by indexing band with row and column indices
    vals = np.array([band[row, col] for row, col in rowcols])
    # add value array to dictionary
    raster_val_dict[fieldnames[i]] = vals
    ds.close()
[34]:
raster_val_dict
[34]:
{'northness': array([ 0.04249978,  0.1827798 ,  0.03625343, ..., -0.06422238,
        -0.00754081,  0.10954276], dtype=float32),
 'forestheight': array([23, 19, 21, ..., 23, 27, 23], dtype=uint8),
 'elev': array([353.98312, 267.5117 , 389.75116, ..., 422.33276, 306.65903,
        349.73883], dtype=float32),
 'CHELSA_bio4': array([5893, 5912, 5983, ..., 6622, 6495, 6588], dtype=uint16),
 'SBIO3_Isothermality_5_15cm': array([19.798992, 20.889412, 20.695877, ..., 18.91893 , 17.010637,
        17.769867], dtype=float32),
 'SBIO4_Temperature_Seasonality_5_15cm': array([440.6722 , 457.7562 , 481.8797 , ..., 490.463  , 457.73627,
        489.45746], dtype=float32),
 'dev-magnitude': array([ 1.1584172 , -1.7553408 ,  1.9087799 , ...,  0.28438434,
        -1.0900238 ,  0.52376586], dtype=float32),
 'BLDFIE_WeigAver': array([1540, 1491, 1521, ..., 1523, 1533, 1517], dtype=uint16),
 'CHELSA_bio18': array([2113, 1993, 2124, ..., 2283, 2103, 1926], dtype=uint16),
 'glad_ard_SVVI_min': array([ 347.6654 , -130.54175,  384.52246, ...,  220.32666,  -10.63208,
         263.45093], dtype=float32),
 'eastness': array([ 0.06909359,  0.26911235, -0.01605454, ..., -0.03627645,
         0.02672861, -0.00700016], dtype=float32),
 'ORCDRC_WeigAver': array([ 9, 16, 14, ..., 11,  6,  8], dtype=uint16),
 'cti': array([-238043120, -208915344, -137479792, ..., -212329056,  140822048,
        -227026464], dtype=int32),
 'treecover': array([85, 85, 62, ..., 75, 85, 97], dtype=uint8),
 'outlet_dist_dw_basin': array([780403, 772777, 898820, ..., 867238, 862879, 812856], dtype=int32),
 'glad_ard_SVVI_med': array([ 46.44409 ,  19.552734,  50.743652, ..., 123.39404 , 104.30469 ,
        133.24927 ], dtype=float32),
 'convergence': array([-10.48656   ,  33.27436   ,   0.04529347, ..., -14.551437  ,
         16.743006  ,  -9.116502  ], dtype=float32),
 'glad_ard_SVVI_max': array([276.8711  , -49.526367,  93.257324, ..., 377.35132 , 209.16504 ,
         27.061035], dtype=float32),
 'CECSOL_WeigAver': array([13, 12, 17, ..., 16, 11, 15], dtype=uint16)}

Create the header of the output file

[35]:
# get keys from dictionary
keys = raster_val_dict.keys()
keys
[35]:
dict_keys(['northness', 'forestheight', 'elev', 'CHELSA_bio4', 'SBIO3_Isothermality_5_15cm', 'SBIO4_Temperature_Seasonality_5_15cm', 'dev-magnitude', 'BLDFIE_WeigAver', 'CHELSA_bio18', 'glad_ard_SVVI_min', 'eastness', 'ORCDRC_WeigAver', 'cti', 'treecover', 'outlet_dist_dw_basin', 'glad_ard_SVVI_med', 'convergence', 'glad_ard_SVVI_max', 'CECSOL_WeigAver'])
[36]:
# add keys (=field names) to string which starts with "ID, Longitude and Latitude"
header = f"ID Longitude Latitude {' '.join(keys)}"
header
[36]:
'ID Longitude Latitude northness forestheight elev CHELSA_bio4 SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm dev-magnitude BLDFIE_WeigAver CHELSA_bio18 glad_ard_SVVI_min eastness ORCDRC_WeigAver cti treecover outlet_dist_dw_basin glad_ard_SVVI_med convergence glad_ard_SVVI_max CECSOL_WeigAver'

Create a 2D array with the indixes, coordinates and corresponding raster values for each sampled point

[37]:
# get all arrays from dictionary as list
all_vals = list(raster_val_dict.values())
# get index, x-coordinate and y-coordinate as list and add the value arrays to that list
all_vals = [np.arange(1, coords.shape[0]+1), coords[:, 0], coords[:, 1]] + all_vals
[38]:
# stack all arrays to a 2D array
all_vals_arr = np.hstack([all_vals]).T
all_vals_arr.shape
[38]:
(1267239, 22)
[39]:
# have a look at the first three rows
all_vals_arr[:3, :]
[39]:
array([[ 1.00000000e+00,  6.05000100e+00,  4.97274990e+01,
         4.24997807e-02,  2.30000000e+01,  3.53983124e+02,
         5.89300000e+03,  1.97989922e+01,  4.40672211e+02,
         1.15841722e+00,  1.54000000e+03,  2.11300000e+03,
         3.47665405e+02,  6.90935925e-02,  9.00000000e+00,
        -2.38043120e+08,  8.50000000e+01,  7.80403000e+05,
         4.64440918e+01, -1.04865599e+01,  2.76871094e+02,
         1.30000000e+01],
       [ 2.00000000e+00,  6.05000170e+00,  4.99221550e+01,
         1.82779804e-01,  1.90000000e+01,  2.67511688e+02,
         5.91200000e+03,  2.08894119e+01,  4.57756195e+02,
        -1.75534081e+00,  1.49100000e+03,  1.99300000e+03,
        -1.30541748e+02,  2.69112349e-01,  1.60000000e+01,
        -2.08915344e+08,  8.50000000e+01,  7.72777000e+05,
         1.95527344e+01,  3.32743607e+01, -4.95263672e+01,
         1.20000000e+01],
       [ 3.00000000e+00,  6.05000210e+00,  4.86023770e+01,
         3.62534299e-02,  2.10000000e+01,  3.89751160e+02,
         5.98300000e+03,  2.06958771e+01,  4.81879700e+02,
         1.90877986e+00,  1.52100000e+03,  2.12400000e+03,
         3.84522461e+02, -1.60545446e-02,  1.40000000e+01,
        -1.37479792e+08,  6.20000000e+01,  8.98820000e+05,
         5.07436523e+01,  4.52934727e-02,  9.32573242e+01,
         1.70000000e+01]])

Write the array to a txt-file

[42]:
# define the format of each column
fmt = '%u %.6f %.6f %u %u %u %u %.6f %i %.8f %.8f %.5f %u %.6f %.6f %.6f %.8f %u %u %.6f %.4f %i'
# save
np.savetxt('tree_height/txt/eu_x_y_predictors_select_python2.txt', all_vals_arr, fmt=fmt, delimiter=' ', header=header, comments='')
[41]:
# depending on how we process the data, we could also use np.save() to save the array to a binary format

Stend-alone Python script. By Hannah Weiser

#!/usr/bin/env python
# coding: utf-8

"""
Modification of the RasterIO Final Script for Preparing the Dataset for the Next ML Exercise

Hannah Weiser, 2023-04-22
"""

# Imports
from pathlib import Path
import rasterio as rio
import numpy as np
import time

start = time.time()
print("Start processing ...")

# Create a list of (opened) raster datasets and of the corresponding field names
datasets = []
fieldnames = []
for filename in Path('tree_height/geodata_raster/').glob('[!l]*.tif'):
    ds = rio.open(filename, mode='r')
    datasets.append(ds)
    fieldnames.append(filename.stem)

# Read the coordinates to a numpy array
coords = np.genfromtxt('tree_height/txt/eu_x_y_select.txt', delimiter=' ')

# Create a dictionary with an array of pixel values at the coordinates for each field (i.e. values sampled from each raster)
raster_val_dict = {}
for i, ds in enumerate(datasets):
    # transform coordinates to raster row and column indices
    rows, cols = rio.transform.rowcol(ds.transform, coords[:, 0], coords[:, 1])
    rowcols = np.array(list(zip(rows, cols)))
    # read band of raster
    band = ds.read(1)
    # get values by indexing band with row and column indices
    vals = np.array([band[row, col] for row, col in rowcols])
    # add value array to dictionary
    raster_val_dict[fieldnames[i]] = vals
    ds.close()

# get keys from dictionary
keys = raster_val_dict.keys()
# add keys (=field names) to string which starts with "ID, Longitude and Latitude"
header = f"ID X Y {' '.join(keys)}"


# Create a 2D array with the indixes, coordinates and corresponding raster values for each sampled point

# get all arrays from dictionary as list
all_vals = list(raster_val_dict.values())
# get index, x-coordinate and y-coordinate as list and add the value arrays to that list
all_vals = [np.arange(1, coords.shape[0]+1), coords[:, 0], coords[:, 1]] + all_vals

# stack all arrays to a 2D array
all_vals_arr = np.hstack([all_vals]).T

end_operation = time.time()
print(f"Done in {end_operation-start:.1f} sec., Writing output file...")

# Write the array to a txt-file

# define the format of each column
fmt = '%u %.6f %.6f %u %u %u %u %.6f %i %.8f %.8f %.5f %u %.6f %.6f %.6f %.8f %u %u %.6f %.4f %i'
# save
np.savetxt('tree_height/txt/eu_x_y_predictors_select_python2.txt', all_vals_arr, fmt=fmt, delimiter=" ", header=header, comments='')

# depending on how we process the data later on, we could also use np.save() to save the array to a binary format

end = time.time()

print(f"Done. Writing took {end-end_operation:.1f} sec. Full script took {end-start:.1f}")
[44]:
! time python3 /media/sf_LVM_shared/my_SE_data/exercise/Tree_Height_02Predictors_extraction_python2.py
Start processing ...
Done in 101.4 sec., Writing output file...
Done. Writing took 9.7 sec. Full script took 111.2

real    1m51.826s
user    1m21.891s
sys     0m14.028s
real    1m51.826s
user    1m21.891s
sys 0m14.028s
[ ]: