Estimation of tree height using the GEDI dataset - Support Vector Regression (SVR) - 2025

Let’s see a quick example of how to use Support Vector Regression (SVR) for tree height estimation.

Here are some general installation instructions:

  1. Create a conda environment if you don’t have one and activate it:

conda create --name geo_comp_2025 python=3.9

conda activate geo_comp_2025

  2. Install the packages needed for this tutorial:

conda install -c conda-forge pandas numpy scikit-learn scipy rasterio matplotlib

Notes:

  1. If you are using JupyterLab or VS Code, you might need to add this new environment to the list of kernels. In that case, do the following:

  • Install ipykernel: conda install ipykernel

  • Add the environment to the list of kernels: python -m ipykernel install --user --name geo_comp_2025 --display-name "Python geo_comp_2025"

  2. If you haven’t loaded the Tree Height dataset yet, follow the steps below:

  • Install gdown on your conda env: pip install gdown

  • Download the dataset and unpack it

gdown 1Y60EuLsfmTICTX-U_FxcE1odNAf04bd

tar -xvzf tree_height.tar.gz

[ ]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
import scipy.stats # scipy.stats.linregress is used below

# For visualization
import rasterio
from rasterio.plot import show
# The Raster stack class used for the map prediction at the end of this
# notebook comes from pyspatialml, not rasterio (pip install pyspatialml)
from pyspatialml import Raster
import matplotlib.pyplot as plt

We will load the data using Pandas and display a few samples of it.

[ ]:

predictors = pd.read_csv("/home/ahf38/Documents/geo_comp_offline/tree_height/txt/eu_x_y_height_predictors_select.txt", sep=" ", index_col=False)
pd.set_option('display.max_columns', None)
# rename the column so its name is a valid Python identifier
predictors = predictors.rename({'dev-magnitude': 'devmagnitude'}, axis='columns')
predictors.head(10)
ID X Y h BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 convergence cti devmagnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness ORCDRC_WeigAver outlet_dist_dw_basin SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm treecover
0 1 6.050001 49.727499 3139.00 1540 13 2113 5893 -10.486560 -238043120 1.158417 0.069094 353.983124 23 276.871094 46.444092 347.665405 0.042500 9 780403 19.798992 440.672211 85
1 2 6.050002 49.922155 1454.75 1491 12 1993 5912 33.274361 -208915344 -1.755341 0.269112 267.511688 19 -49.526367 19.552734 -130.541748 0.182780 16 772777 20.889412 457.756195 85
2 3 6.050002 48.602377 853.50 1521 17 2124 5983 0.045293 -137479792 1.908780 -0.016055 389.751160 21 93.257324 50.743652 384.522461 0.036253 14 898820 20.695877 481.879700 62
3 4 6.050009 48.151979 3141.00 1526 16 2569 6130 -33.654274 -267223072 0.965787 0.067767 380.207703 27 542.401367 202.264160 386.156738 0.005139 15 831824 19.375000 479.410278 85
4 5 6.050010 49.588410 2065.25 1547 14 2108 5923 27.493824 -107809368 -0.162624 0.014065 308.042786 25 136.048340 146.835205 198.127441 0.028847 17 796962 18.777500 457.880066 85
5 6 6.050014 48.608456 1246.50 1515 19 2124 6010 -1.602039 17384282 1.447979 -0.018912 364.527100 18 221.339844 247.387207 480.387939 0.042747 14 897945 19.398880 474.331329 62
6 7 6.050016 48.571401 2938.75 1520 19 2169 6147 27.856503 -66516432 -1.073956 0.002280 254.679596 19 125.250488 87.865234 160.696777 0.037254 11 908426 20.170450 476.414520 96
7 8 6.050019 49.921613 3294.75 1490 12 1995 5912 22.102139 -297770784 -1.402633 0.309765 294.927765 26 -86.729492 -145.584229 -190.062988 0.222435 15 772784 20.855963 457.195404 86
8 9 6.050020 48.822645 1623.50 1554 18 1973 6138 18.496584 -25336536 -0.800016 0.010370 240.493759 22 -51.470703 -245.886719 172.074707 0.004428 8 839132 21.812290 496.231110 64
9 10 6.050024 49.847522 1400.00 1521 15 2187 5886 -5.660453 -278652608 1.477951 -0.068720 376.671143 12 277.297363 273.141846 -138.895996 0.098817 13 768873 21.137711 466.976685 70
[ ]:
len(predictors) # original: 1267239
1267239

As explained in the previous lecture, ‘h’ is the estimated tree height (in centimeters). So let’s use it as our target.

[ ]:
bins = np.linspace(min(predictors['h']),max(predictors['h']),100)
plt.hist((predictors['h']),bins,alpha=0.8);
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_7_1.png
[ ]:
predictors_sel = predictors.loc[predictors['h'] < 7000].sample(100000)
predictors_sel.insert(4, 'hm', predictors_sel['h']/100) # add a column with height in meters
print(len(predictors_sel))
predictors_sel.head(10)
100000
ID X Y h hm BLDFIE_WeigAver CECSOL_WeigAver CHELSA_bio18 CHELSA_bio4 convergence cti devmagnitude eastness elev forestheight glad_ard_SVVI_max glad_ard_SVVI_med glad_ard_SVVI_min northness ORCDRC_WeigAver outlet_dist_dw_basin SBIO3_Isothermality_5_15cm SBIO4_Temperature_Seasonality_5_15cm treecover
1095625 1095626 9.329930 49.926325 2751.75 27.5175 1431 12 2419 6210 21.117981 -216327344 1.391024 0.053138 436.207947 28 144.577393 22.498047 8.659180 0.111112 21 715006 19.184645 462.504242 92
196725 196726 6.725681 49.683399 954.25 9.5425 1511 14 2300 6115 -6.578615 -272214336 -0.653943 0.322846 317.342010 20 104.212891 19.286133 116.542969 -0.066891 20 683524 20.000200 461.280579 82
401987 401988 7.224553 49.492585 857.00 8.5700 1509 11 2218 6174 -5.644562 -63283032 -0.104253 0.053422 366.646790 22 632.265625 203.650635 397.963135 -0.016646 14 892348 19.547623 471.590210 96
218452 218453 6.780731 49.180887 1919.75 19.1975 1513 11 2044 6155 16.325632 -233124048 1.383732 0.019032 288.472107 25 112.897461 30.783203 -145.130981 0.047718 9 782564 17.172062 445.589813 82
195001 195002 6.721547 48.388958 193.75 1.9375 1522 14 2571 6272 -7.304356 -250884928 -0.231154 -0.052327 337.373840 0 841.455078 560.348389 227.107910 -0.117437 12 933888 19.015015 478.525848 37
396727 396728 7.212731 48.873052 2346.75 23.4675 1525 17 2184 6229 -65.911629 -65190392 2.136951 -0.002663 336.190887 27 190.351562 167.358887 282.472168 -0.004691 10 874345 20.289476 505.442719 97
260995 260996 6.888748 48.435493 776.75 7.7675 1422 15 2833 6231 38.480377 -365362560 -0.644891 0.146649 397.598541 21 -177.143066 -254.826416 -398.431152 0.183483 19 953960 20.743057 440.955780 85
272979 272980 6.920438 49.619537 1789.25 17.8925 1372 10 2331 5838 10.296758 -314147840 2.219449 0.126081 554.973389 20 100.246094 164.923096 320.666748 0.032013 19 801361 21.880178 433.262115 76
1047921 1047922 9.209383 49.685987 3036.00 30.3600 1484 11 2025 6534 22.080761 -208613008 -0.341067 0.173462 295.235352 28 194.066406 26.444824 46.955078 0.215097 16 711355 19.485374 472.121124 85
419140 419141 7.261843 48.594148 2283.25 22.8325 1385 18 3274 5690 17.750772 -42494012 3.259593 0.010078 871.413452 26 -16.880371 -69.031250 -269.744263 0.201674 23 856886 22.416262 434.105347 95
[ ]:
predictors_sel.shape
(100000, 24)
[ ]:
bins = np.linspace(min(predictors_sel['hm']),max(predictors_sel['hm']),100)
plt.hist((predictors_sel['hm']),bins,alpha=0.8);
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_10_0.png
[ ]:
# The baseline we are trying to beat: the forestheight layer already in the dataset
y_true = predictors_sel['hm']
y_pred = predictors_sel['forestheight']

slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(y_pred, y_true)

fig,ax=plt.subplots(1,1,figsize=(5,5))
ax.scatter(y_pred, y_true)
ax.set_xlabel('Prediction')
ax.set_ylabel('True')
ax.set_title('slope: {:.4f}, r_value: {:.4f}'.format(slope, r_value))
plt.show()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_11_0.png
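
Beyond the slope and correlation above, it helps to put standard error metrics on this baseline. A minimal sketch using scikit-learn’s regression metrics (reusing y_true and y_pred from the cell above):

[ ]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Quantify the forestheight baseline against the GEDI-derived heights (both in meters)
print('RMSE: {:.3f} m'.format(np.sqrt(mean_squared_error(y_true, y_pred))))
print('MAE:  {:.3f} m'.format(mean_absolute_error(y_true, y_pred)))
print('R^2:  {:.3f}'.format(r2_score(y_true, y_pred)))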
[ ]:
# Explore the raw data: histograms of the first 20 columns
n_plots_x = int(np.ceil(np.sqrt(predictors_sel.shape[1])))
n_plots_y = int(np.floor(np.sqrt(predictors_sel.shape[1])))
fig, ax = plt.subplots(n_plots_x, n_plots_y, figsize=(10, 10), dpi=100, facecolor='w', edgecolor='k')
ax=ax.ravel()
for idx in range(20):
    ax[idx].hist(predictors_sel.iloc[:, idx].to_numpy().flatten())
    ax[idx].set_title(predictors_sel.columns[idx])
fig.tight_layout()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_12_0.png
[ ]:
tree_height = predictors_sel['hm'].to_numpy()
data = predictors_sel.drop(columns=['ID','h','X','Y', 'hm','forestheight'], axis=1)

Now we will split the data into training and test sets and normalize the features.

[ ]:
X_train, X_test, y_train, y_test = train_test_split(data.to_numpy()[:20000,:],tree_height[:20000], random_state=0)
print('X_train.shape:{}, X_test.shape:{} '.format(X_train.shape, X_test.shape))
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train.shape:(15000, 18), X_test.shape:(5000, 18)
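
Note that the scaler is fit on the training set only and then reused to transform the test set, which avoids leaking test statistics into the preprocessing. A quick sanity check: training features land exactly in [0, 1], while test values can fall slightly outside that range:

[ ]:
# The scaler uses the training min/max, so test values may fall outside [0, 1]
print('train range: [{:.3f}, {:.3f}]'.format(X_train.min(), X_train.max()))
print('test range:  [{:.3f}, {:.3f}]'.format(X_test.min(), X_test.max()))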
[ ]:
# Check the normalized data
n_plots_x = int(np.ceil(np.sqrt(X_train.shape[1])))
n_plots_y = int(np.floor(np.sqrt(X_train.shape[1])))
fig, ax = plt.subplots(n_plots_x, n_plots_y, figsize=(10, 10), dpi=100, facecolor='w', edgecolor='k')
ax=ax.ravel()
for idx in range(18):
    ax[idx].hist(X_train[:, idx].flatten())
    # ax[idx].set_title(predictors_sel.columns[idx])
fig.tight_layout()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_16_0.png

Now we will build our SVR model. For more details on all the parameters it accepts, please check the scikit-learn documentation.

[ ]:
svr = SVR()
svr.fit(X_train, y_train) # Fit the SVR model according to the given training data.
print('Accuracy of SVR on training set: {:.5f}'.format(svr.score(X_train, y_train))) # Returns the coefficient of determination (R^2) of the prediction.
print('Accuracy of SVR on test set: {:.5f}'.format(svr.score(X_test, y_test)))
Accuracy of SVR on training set: 0.23120
Accuracy of SVR on test set: 0.23116
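
The score method returns the coefficient of determination, R^2 = 1 - SS_res/SS_tot. As a sanity check, here is the same quantity computed by hand from the test predictions:

[ ]:
# R^2 computed manually from the predictions
y_hat = svr.predict(X_test)
ss_res = np.sum((y_test - y_hat)**2)
ss_tot = np.sum((y_test - y_test.mean())**2)
print('R^2 by hand: {:.5f}'.format(1 - ss_res/ss_tot))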

Now, let’s test the fitted model

[ ]:
y_pred = svr.predict(X_test)
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(y_pred, y_test)

fig,ax=plt.subplots(1,1,figsize=(5,5))
ax.scatter(y_pred, y_test)
ax.set_xlabel('Prediction')
ax.set_ylabel('True')
ax.set_title('slope: {:.4f}, r_value: {:.4f}'.format(slope, r_value))
plt.show()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_20_0.png
[ ]:
svr = SVR(epsilon=0.01) # Let's change the size of the epsilon-tube
svr.fit(X_train, y_train) # Fit the SVR model according to the given training data.
print('Accuracy of SVR on training set: {:.5f}'.format(svr.score(X_train, y_train))) # Returns the coefficient of determination (R^2) of the prediction.
print('Accuracy of SVR on test set: {:.5f}'.format(svr.score(X_test, y_test)))
Accuracy of SVR on training set: 0.23103
Accuracy of SVR on test set: 0.23090
[ ]:
svr = SVR(epsilon=0.01, C=2.0) # Let's increase C, i.e. penalize points outside the epsilon-tube more heavily
svr.fit(X_train, y_train) # Fit the SVR model according to the given training data.
print('Accuracy of SVR on training set: {:.5f}'.format(svr.score(X_train, y_train))) # Returns the coefficient of determination (R^2) of the prediction.
print('Accuracy of SVR on test set: {:.5f}'.format(svr.score(X_test, y_test)))
Accuracy of SVR on training set: 0.24250
Accuracy of SVR on test set: 0.23752
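
Epsilon also controls model complexity: points that fall inside the tube incur no loss and do not become support vectors, so a narrower tube generally keeps more support vectors. You can check this on any fitted model:

[ ]:
# Number of training samples retained as support vectors by the current model
print('support vectors: {} out of {} training samples'.format(len(svr.support_), X_train.shape[0]))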

Let’s do a quick plot to show how this model is performing.

[ ]:
y_pred = svr.predict(X_test)
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(y_pred, y_test)

fig,ax=plt.subplots(1,1,figsize=(5,5))
ax.scatter(y_pred, y_test)
ax.set_xlabel('Prediction')
ax.set_ylabel('True')
ax.set_title('slope: {:.4f}, r_value: {:.4f}'.format(slope, r_value))
plt.show()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_24_0.png

Since ‘C’ and ‘epsilon’ are hyperparameters, how can we be sure we are selecting the best values for this task? One way is to test a large range of parameter combinations.

[ ]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Define the ranges of 'C' and 'epsilon' values to test
C_values = [0.1, 0.5, 2, 5, 10, 20, 30]
epsilon_values = [0.01, 0.1, 1, 5, 10, 15]

# Matrix to store scores for each combination of C and epsilon
scores = np.zeros((len(C_values), len(epsilon_values)))

# Loop over the values of C and epsilon
for j, epsilon in enumerate(epsilon_values):
    for i, C in enumerate(C_values):
        svr = SVR(epsilon=epsilon, C=C)
        svr.fit(X_train, y_train)

        # Evaluate on the test set
        score = svr.score(X_test, y_test)
        scores[i, j] = score
        print(f'C: {C}, epsilon: {epsilon}, score: {score}')

# Plotting the results using a heatmap
plt.figure(figsize=(12, 8))
plt.imshow(scores, interpolation='nearest', cmap=plt.cm.hot, aspect='auto')
plt.xlabel('Epsilon')
plt.ylabel('C')
plt.colorbar()
plt.xticks(np.arange(len(epsilon_values)), ['{:.2f}'.format(eps) for eps in epsilon_values], rotation=45)
plt.yticks(np.arange(len(C_values)), ['{:.2f}'.format(c) for c in C_values])
plt.title('Test R^2 Score for SVR by C and Epsilon')
plt.show()
C: 0.1, epsilon: 0.01, score: 0.20978741774185794
C: 0.5, epsilon: 0.01, score: 0.224984890251273
C: 2, epsilon: 0.01, score: 0.23751817108305906
C: 5, epsilon: 0.01, score: 0.2430134874714842
C: 10, epsilon: 0.01, score: 0.24592576392099974
C: 20, epsilon: 0.01, score: 0.2471275230713288
C: 30, epsilon: 0.01, score: 0.2460728898880883
C: 0.1, epsilon: 0.1, score: 0.20969380775602353
C: 0.5, epsilon: 0.1, score: 0.22484377552190515
C: 2, epsilon: 0.1, score: 0.23767933245407313
C: 5, epsilon: 0.1, score: 0.2431856008775254
C: 10, epsilon: 0.1, score: 0.24580704932103192
C: 20, epsilon: 0.1, score: 0.24712884164946158
C: 30, epsilon: 0.1, score: 0.246328775308439
C: 0.1, epsilon: 1, score: 0.20973564642231624
C: 0.5, epsilon: 1, score: 0.22581238652491897
C: 2, epsilon: 1, score: 0.23756944043467765
C: 5, epsilon: 1, score: 0.2437983964450433
C: 10, epsilon: 1, score: 0.24644442384127674
C: 20, epsilon: 1, score: 0.24715628012800106
C: 30, epsilon: 1, score: 0.2460471720860985
C: 0.1, epsilon: 5, score: 0.2078240741405224
C: 0.5, epsilon: 5, score: 0.22818775814332437
C: 2, epsilon: 5, score: 0.24265338342738174
C: 5, epsilon: 5, score: 0.24928741274665522
C: 10, epsilon: 5, score: 0.2530321969187912
C: 20, epsilon: 5, score: 0.25528488754992074
C: 30, epsilon: 5, score: 0.2552112081099446
C: 0.1, epsilon: 10, score: 0.17127167081172956
C: 0.5, epsilon: 10, score: 0.19584892469125115
C: 2, epsilon: 10, score: 0.20986469171197608
C: 5, epsilon: 10, score: 0.2161045583331157
C: 10, epsilon: 10, score: 0.21884297620381743
C: 20, epsilon: 10, score: 0.22052872534191204
C: 30, epsilon: 10, score: 0.22185325289567737
C: 0.1, epsilon: 15, score: 0.07433020597762285
C: 0.5, epsilon: 15, score: 0.09707592589881109
C: 2, epsilon: 15, score: 0.10631525255623031
C: 5, epsilon: 15, score: 0.11130035394140514
C: 10, epsilon: 15, score: 0.11384968636367021
C: 20, epsilon: 15, score: 0.11342898882767949
C: 30, epsilon: 15, score: 0.11237800691693522
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_26_1.png
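
Instead of reading the best combination off the heatmap, you can pull it directly from the scores matrix:

[ ]:
# Index of the best test score in the (C, epsilon) grid
best_i, best_j = np.unravel_index(np.argmax(scores), scores.shape)
print('best C: {}, best epsilon: {}, score: {:.5f}'.format(C_values[best_i], epsilon_values[best_j], scores[best_i, best_j]))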

For a fixed error margin (epsilon), you can perform a cross-validated grid search over C.

[ ]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, r2_score
# Define your model without setting the C parameter
svr = SVR(epsilon=5)

# Define the parameter grid: list of dictionaries with the parameters you want to test
param_grid = {
    'C': [0.1, 0.5, 1, 2, 5, 10, 15, 20, 25, 30, 35, 40]  # Example values for C
}

# Define the scorer, if you want to change the default scoring method (which is R^2 for regression models)
scorer = make_scorer(r2_score)

# Set up the grid search with cross-validation
grid_search = GridSearchCV(estimator=svr, param_grid=param_grid, scoring=scorer, cv=5, verbose=1)

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Get the best model
best_svr = grid_search.best_estimator_

# Output the best parameter
print('Best value of C:', grid_search.best_params_['C'])
print('Best cross-validated R^2 of the best estimator:', grid_search.best_score_)

# Evaluate the best model on the training set
print('Accuracy of SVR on training set: {:.5f}'.format(best_svr.score(X_train, y_train)))

# Evaluate the best model on the test set
print('Accuracy of SVR on test set: {:.5f}'.format(best_svr.score(X_test, y_test)))
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Best value of C: 25
Best cross-validated R^2 of the best estimator: 0.24966897220448353
Accuracy of SVR on training set: 0.29812
Accuracy of SVR on test set: 0.25533
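
GridSearchCV keeps the per-fold results in cv_results_; plotting the cross-validated mean score (with its standard deviation across folds) against C makes the selection above easier to interpret:

[ ]:
# Cross-validated R^2 as a function of C, +/- one standard deviation across folds
cv_res = grid_search.cv_results_
C_grid = [p['C'] for p in cv_res['params']]
plt.errorbar(C_grid, cv_res['mean_test_score'], yerr=cv_res['std_test_score'], marker='o')
plt.xlabel('C')
plt.ylabel('Cross-validated R^2')
plt.title('GridSearchCV results (epsilon=5)')
plt.show()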
[ ]:
y_pred = best_svr.predict(X_test)
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(y_pred, y_test)

fig,ax=plt.subplots(1,1,figsize=(5,5))
ax.scatter(y_pred, y_test)
ax.set_xlabel('Prediction')
ax.set_ylabel('True')
ax.set_title('slope: {:.4f}, r_value: {:.4f}'.format(slope, r_value))
plt.show()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_29_0.png
[ ]:
svr = SVR(kernel='linear',epsilon=5, C=30)
svr.fit(X_train, y_train) # Fit the SVR model according to the given training data.
print('Accuracy of SVR on training set: {:.5f}'.format(svr.score(X_train, y_train))) # Returns the coefficient of determination (R^2) of the prediction.
print('Accuracy of SVR on test set: {:.5f}'.format(svr.score(X_test, y_test)))
Accuracy of SVR on training set: 0.18720
Accuracy of SVR on test set: 0.20898
[ ]:
svr = SVR(kernel='linear')
svr.fit(X_train, y_train) # Fit the SVR model according to the given training data.
print('Accuracy of SVR on training set: {:.5f}'.format(svr.score(X_train, y_train))) # Returns the coefficient of determination (R^2) of the prediction.
print('Accuracy of SVR on test set: {:.5f}'.format(svr.score(X_test, y_test)))
Accuracy of SVR on training set: 0.17729
Accuracy of SVR on test set: 0.20066
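
One advantage of the linear kernel is that the fitted model exposes one weight per feature via coef_, which gives a rough (scale-dependent) view of which predictors drive the predictions. A quick sketch, reusing the column names from the data frame built earlier:

[ ]:
# Feature weights of the linear SVR (coef_ is only defined for kernel='linear')
coefs = pd.Series(svr.coef_.ravel(), index=data.columns).sort_values()
coefs.plot.barh(figsize=(6, 6))
plt.xlabel('SVR coefficient (on min-max scaled features)')
plt.tight_layout()
plt.show()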
[ ]:
# import satellite indices
glad_ard_SVVI_min = "./tree_height_2/geodata_raster/glad_ard_SVVI_min.tif"
glad_ard_SVVI_med = "./tree_height_2/geodata_raster/glad_ard_SVVI_med.tif"
glad_ard_SVVI_max = "./tree_height_2/geodata_raster/glad_ard_SVVI_max.tif"

# import climate
CHELSA_bio4 = "./tree_height_2/geodata_raster/CHELSA_bio4.tif"
CHELSA_bio18 = "./tree_height_2/geodata_raster/CHELSA_bio18.tif"

# soil
BLDFIE_WeigAver = "./tree_height_2/geodata_raster/BLDFIE_WeigAver.tif"
CECSOL_WeigAver = "./tree_height_2/geodata_raster/CECSOL_WeigAver.tif"
ORCDRC_WeigAver = "./tree_height_2/geodata_raster/ORCDRC_WeigAver.tif"

# Geomorphological
elev = "./tree_height_2/geodata_raster/elev.tif"
convergence = "./tree_height_2/geodata_raster/convergence.tif"
northness = "./tree_height_2/geodata_raster/northness.tif"
eastness = "./tree_height_2/geodata_raster/eastness.tif"
devmagnitude = "./tree_height_2/geodata_raster/dev-magnitude.tif"

# Hydrography
cti = "./tree_height_2/geodata_raster/cti.tif"
outlet_dist_dw_basin = "./tree_height_2/geodata_raster/outlet_dist_dw_basin.tif"

# Soil climate

SBIO3_Isothermality_5_15cm = "./tree_height_2/geodata_raster/SBIO3_Isothermality_5_15cm.tif"
SBIO4_Temperature_Seasonality_5_15cm = "./tree_height_2/geodata_raster/SBIO4_Temperature_Seasonality_5_15cm.tif"

# forest

treecover = "./tree_height_2/geodata_raster/treecover.tif"
[ ]:
predictors_rasters = [glad_ard_SVVI_min, glad_ard_SVVI_med, glad_ard_SVVI_max,
                      CHELSA_bio4,CHELSA_bio18,
                      BLDFIE_WeigAver,CECSOL_WeigAver, ORCDRC_WeigAver,
                      elev,convergence,northness,eastness,devmagnitude,cti,outlet_dist_dw_basin,
                      SBIO3_Isothermality_5_15cm,SBIO4_Temperature_Seasonality_5_15cm,treecover]
print(predictors_rasters)
stack = Raster(predictors_rasters)
['./tree_height_2/geodata_raster/glad_ard_SVVI_min.tif', './tree_height_2/geodata_raster/glad_ard_SVVI_med.tif', './tree_height_2/geodata_raster/glad_ard_SVVI_max.tif', './tree_height_2/geodata_raster/CHELSA_bio4.tif', './tree_height_2/geodata_raster/CHELSA_bio18.tif', './tree_height_2/geodata_raster/BLDFIE_WeigAver.tif', './tree_height_2/geodata_raster/CECSOL_WeigAver.tif', './tree_height_2/geodata_raster/ORCDRC_WeigAver.tif', './tree_height_2/geodata_raster/elev.tif', './tree_height_2/geodata_raster/convergence.tif', './tree_height_2/geodata_raster/northness.tif', './tree_height_2/geodata_raster/eastness.tif', './tree_height_2/geodata_raster/dev-magnitude.tif', './tree_height_2/geodata_raster/cti.tif', './tree_height_2/geodata_raster/outlet_dist_dw_basin.tif', './tree_height_2/geodata_raster/SBIO3_Isothermality_5_15cm.tif', './tree_height_2/geodata_raster/SBIO4_Temperature_Seasonality_5_15cm.tif', './tree_height_2/geodata_raster/treecover.tif']
[ ]:
result = stack.predict(estimator=svr, dtype='int16', nodata=-1)
[ ]:
# plot regression result
plt.rcParams["figure.figsize"] = (12,12)
result.iloc[0].cmap = "plasma"
result.plot()
plt.show()
../_images/CASESTUDY_Tree_Height_04SVM_pred_2025_35_0.png
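
If you want to keep the predicted map, pyspatialml can write it to a GeoTIFF while predicting. The file_path argument below is an assumption about the installed pyspatialml version, and the output filename is just an example; check help(stack.predict) if it errors:

[ ]:
# Write the SVR prediction to disk (file_path support is assumed here;
# the output filename is an example)
result = stack.predict(estimator=svr, dtype='int16', nodata=-1, file_path='svr_tree_height_prediction.tif')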

Exercise: explore the other parameters offered by the SVM library and try to improve the model. Some suggestions:

  • Better cleaning of the data (follow Peppe’s suggestions)

  • Stronger regularization might be helpful

  • Play with different kernels (see the sketch after this list)
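
As a starting point for the kernel suggestion above, here is a minimal loop over the kernels SVR supports out of the box (the other hyperparameters are kept at the values found earlier, so treat the numbers as a first look rather than a fair comparison):

[ ]:
# Quick first look at the different kernels
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    svr_k = SVR(kernel=kernel, epsilon=5, C=30)
    svr_k.fit(X_train, y_train)
    print('{:>8}: train R^2 {:.5f}, test R^2 {:.5f}'.format(kernel, svr_k.score(X_train, y_train), svr_k.score(X_test, y_test)))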

For the brave ones, try to implement the SVR algorithm from scratch. As we saw in class, the algorithm is quite simple. Below is a simple sketch of the (classification) SVM algorithm; make the appropriate modifications to turn it into a regression (a hint follows the sketch). Let us know if your implementation is better than sklearn’s.

[ ]:
## Support Vector Machine (linear, trained with a simple sub-gradient loop)
import numpy as np

# The original sketch assumed a 90-sample, 2-feature binary dataset was already
# loaded as x_train/y_train; we generate a toy one here so the cell runs on its own
rng = np.random.default_rng(0)
x_train = np.vstack([rng.normal(-1, 1, (45, 2)), rng.normal(1, 1, (45, 2))])
y_train = np.hstack([-np.ones(45), np.ones(45)])  # labels must be -1/+1

train_f1 = x_train[:, 0].reshape(90, 1)
train_f2 = x_train[:, 1].reshape(90, 1)
y_train = y_train.reshape(90, 1)

# one scalar weight per feature
w1 = 0.0
w2 = 0.0

epochs = 1
alpha = 0.0001

while epochs < 10000:
    y = w1 * train_f1 + w2 * train_f2  # raw decision values, shape (90, 1)
    prod = y * y_train                 # functional margin of each sample
    if epochs % 1000 == 0:
        print(epochs)
    for count, val in enumerate(prod.ravel()):
        if val >= 1:
            # correct side with margin: only the regularization shrinkage applies
            w1 = w1 - alpha * (2 * 1/epochs * w1)
            w2 = w2 - alpha * (2 * 1/epochs * w2)
        else:
            # inside the margin or misclassified: hinge-loss gradient step
            w1 = w1 + alpha * (train_f1[count, 0] * y_train[count, 0] - 2 * 1/epochs * w1)
            w2 = w2 + alpha * (train_f2[count, 0] * y_train[count, 0] - 2 * 1/epochs * w2)
    epochs += 1
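
Hint for the regression version: replace the hinge-loss margin test (y * y_train >= 1) with the epsilon-insensitive condition |y_train - y| <= epsilon, and move the weights toward the target only when the residual leaves the tube. A minimal sketch of the modified update, reusing the toy train_f1/train_f2/y_train arrays and alpha from the cell above:

[ ]:
# Epsilon-insensitive update: no loss while |residual| <= epsilon,
# otherwise push the prediction toward the target
epsilon = 0.1
w1, w2, b = 0.0, 0.0, 0.0
for epoch in range(1, 1000):
    for i in range(len(train_f1)):
        pred = w1 * train_f1[i, 0] + w2 * train_f2[i, 0] + b
        residual = y_train[i, 0] - pred
        if abs(residual) > epsilon:
            step = alpha * np.sign(residual)
            w1 += step * train_f1[i, 0] - alpha * 2 / epoch * w1
            w2 += step * train_f2[i, 0] - alpha * 2 / epoch * w2
            b += step
        else:
            # inside the tube: only the regularization shrinkage applies
            w1 -= alpha * 2 / epoch * w1
            w2 -= alpha * 2 / epoch * w2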
[ ]: