{ "cells": [ { "cell_type": "markdown", "id": "5f5d6166-1c49-4927-aebb-f50e24baba16", "metadata": { "tags": [] }, "source": [ "# Hemalatha Velappan: Classification of different tree species plantations using deep learning\n", "\n", "[Video recording](https://youtu.be/1OAzeb71lwU)\n", "\n", "### The goal of this work is to develop a model to identify planted forests and the tree species growing there. The model is developed using the \n", "#### (1) known locations of planted forests based on literature and personal communications, \n", "#### (2) image analysis and feature extraction of planted trees\n", "#### (3) spectral signatures unique to each species" ] }, { "cell_type": "code", "execution_count": 1, "id": "81424f42-e772-47f6-b18e-a7fbae518e62", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import torch\n", "import torch.nn as nn\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy\n", "from osgeo import ogr\n", "from sklearn.metrics import r2_score\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": 52, "id": "b99b7457-d19e-4642-a0c6-1e3267afe73b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/media/sf_LVM_shared/my_SE_data/Plantation_datasets/Peru_Plantation_Shapefile-Updated\n" ] } ], "source": [ "#Loading plantation shapefile location\n", "\n", "%cd /media/sf_LVM_shared/my_SE_data/Plantation_datasets/Peru_Plantation_Shapefile-Updated" ] }, { "cell_type": "code", "execution_count": 9, "id": "063acc5f-873d-4514-b1fc-476e6250f163", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Driver: GTiff/GeoTIFF\n", "Files: SentinelMap.tif\n", "Size is 898, 770\n", "Coordinate System is:\n", "GEOGCRS[\"WGS 84\",\n", " DATUM[\"World Geodetic System 1984\",\n", " ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n", " LENGTHUNIT[\"metre\",1]]],\n", " PRIMEM[\"Greenwich\",0,\n", " ANGLEUNIT[\"degree\",0.0174532925199433]],\n", " CS[ellipsoidal,2],\n", " AXIS[\"geodetic latitude (Lat)\",north,\n", " ORDER[1],\n", " ANGLEUNIT[\"degree\",0.0174532925199433]],\n", " AXIS[\"geodetic longitude (Lon)\",east,\n", " ORDER[2],\n", " ANGLEUNIT[\"degree\",0.0174532925199433]],\n", " ID[\"EPSG\",4326]]\n", "Data axis to CRS axis mapping: 2,1\n", "Origin = (-76.889769608227439,-7.186342609899349)\n", "Pixel Size = (0.000269494585236,-0.000269494585236)\n", "Metadata:\n", " AREA_OR_POINT=Area\n", "Image Structure Metadata:\n", " COMPRESSION=LZW\n", " INTERLEAVE=PIXEL\n", "Corner Coordinates:\n", "Upper Left ( -76.8897696, -7.1863426) ( 76d53'23.17\"W, 7d11'10.83\"S)\n", "Lower Left ( -76.8897696, -7.3938534) ( 76d53'23.17\"W, 7d23'37.87\"S)\n", "Upper Right ( -76.6477635, -7.1863426) ( 76d38'51.95\"W, 7d11'10.83\"S)\n", "Lower Right ( -76.6477635, -7.3938534) ( 76d38'51.95\"W, 7d23'37.87\"S)\n", "Center ( -76.7687665, -7.2900980) ( 76d46' 7.56\"W, 7d17'24.35\"S)\n", "Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray\n", " Description = B1\n", " Computed Min/Max=1005.000,8417.000\n", "Band 2 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B2\n", " Computed Min/Max=696.000,9929.000\n", "Band 3 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B3\n", " Computed Min/Max=471.000,10214.000\n", "Band 4 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B4\n", " Computed Min/Max=269.000,11283.000\n", "Band 5 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " 
Description = B5\n", " Computed Min/Max=269.000,10690.000\n", "Band 6 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B6\n", " Computed Min/Max=419.000,11095.000\n", "Band 7 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B7\n", " Computed Min/Max=437.000,11857.000\n", "Band 8 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B8\n", " Computed Min/Max=390.000,11342.000\n", "Band 9 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B8A\n", " Computed Min/Max=355.000,12318.000\n", "Band 10 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B9\n", " Computed Min/Max=56.000,2030.000\n", "Band 11 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B10\n", " Computed Min/Max=3.000,31.000\n", "Band 12 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B11\n", " Computed Min/Max=78.000,13317.000\n", "Band 13 Block=256x256 Type=UInt16, ColorInterp=Undefined\n", " Description = B12\n", " Computed Min/Max=24.000,16040.000\n" ] } ], "source": [ "!gdalinfo -mm SentinelMap.tif" ] }, { "cell_type": "markdown", "id": "f3382d76-5c85-40d5-aa39-bb08d62203eb", "metadata": {}, "source": [ "## Performing zonal statistics on the polygon shapefile wrt the sentinel satellite image" ] }, { "cell_type": "code", "execution_count": 3, "id": "f5897df2-7050-4db4-bf0e-adf4dcf9144a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "processing layer Peru_with_XY\n", "0...10...20...30...40...50...60...70...80...90...100 - done.\n" ] } ], "source": [ "!pkextractogr -f CSV -i SentinelMap.tif -s Peru_XY/Peru_with_XY.shp -r allpoints -r mean -r stdev -o extracted2.csv" ] }, { "cell_type": "code", "execution_count": 3, "id": "40d7ca48-33b5-4f4f-a1f5-a8c11e6f9729", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OBJECTIDFUENTE/SOUDOCREGFECREGOBSERVZONUTMORIGENTIPCOMNUMREGNOMTIT/Tit...b4b5b6b7b8b9b10b11b12Classification
099039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...798196527242588298147511130149310
199039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...796206927402661305347511126946010
299039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...805211427172656313044710130948510
399039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...741173122532031250144710119744010
499039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...85421272660265131444619149658010
599039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...75619012482242728414619130249210
699039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...73917922374229326484389128351810
799039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...797210827452731318548810136450210
899039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...73817552448221427634729117843910
999039GORE San MartínAPP-RNPF27:59.0ara autorizar el aprovechamiento la U.O.G.F. ...181122-SAM/REG-PLT-2019-050CARBAJAL VIGO, SEBASTIAN...81719252571239629004729140453310
\n", "

10 rows × 56 columns

\n", "
" ], "text/plain": [ " OBJECTID FUENTE/SOU DOCREG FECREG \\\n", "0 99039 GORE San Martín APP-RNPF 27:59.0 \n", "1 99039 GORE San Martín APP-RNPF 27:59.0 \n", "2 99039 GORE San Martín APP-RNPF 27:59.0 \n", "3 99039 GORE San Martín APP-RNPF 27:59.0 \n", "4 99039 GORE San Martín APP-RNPF 27:59.0 \n", "5 99039 GORE San Martín APP-RNPF 27:59.0 \n", "6 99039 GORE San Martín APP-RNPF 27:59.0 \n", "7 99039 GORE San Martín APP-RNPF 27:59.0 \n", "8 99039 GORE San Martín APP-RNPF 27:59.0 \n", "9 99039 GORE San Martín APP-RNPF 27:59.0 \n", "\n", " OBSERV ZONUTM ORIGEN TIPCOM \\\n", "0 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "1 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "2 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "3 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "4 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "5 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "6 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "7 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "8 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "9 ara autorizar el aprovechamiento la U.O.G.F. ... 18 1 1 \n", "\n", " NUMREG NOMTIT/Tit ... b4 b5 b6 \\\n", "0 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 798 1965 2724 \n", "1 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 796 2069 2740 \n", "2 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 805 2114 2717 \n", "3 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 741 1731 2253 \n", "4 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 854 2127 2660 \n", "5 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 756 1901 2482 \n", "6 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 739 1792 2374 \n", "7 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 797 2108 2745 \n", "8 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 738 1755 2448 \n", "9 22-SAM/REG-PLT-2019-050 CARBAJAL VIGO, SEBASTIAN ... 
817 1925 2571 \n", "\n", " b7 b8 b9 b10 b11 b12 Classification \n", "0 2588 2981 475 11 1301 493 10 \n", "1 2661 3053 475 11 1269 460 10 \n", "2 2656 3130 447 10 1309 485 10 \n", "3 2031 2501 447 10 1197 440 10 \n", "4 2651 3144 461 9 1496 580 10 \n", "5 2427 2841 461 9 1302 492 10 \n", "6 2293 2648 438 9 1283 518 10 \n", "7 2731 3185 488 10 1364 502 10 \n", "8 2214 2763 472 9 1178 439 10 \n", "9 2396 2900 472 9 1404 533 10 \n", "\n", "[10 rows x 56 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictors = pd.read_csv(\"extracted2.csv\")\n", "predictors.head(10)" ] }, { "cell_type": "markdown", "id": "5acc5a43-9afe-4760-a059-58067371b978", "metadata": {}, "source": [ "## The following are the tree species and the corresponding label numbers\n", "\n", "##### Acrocarpus fraxinifolius\t1\n", "##### Calycophyllum spruceanum\t2\n", "##### Cedrela Mixed\t3\n", "##### Guazuma crinita\t4\n", "##### Miconia barbeyana\t5\n", "##### Ochroma pyramidale\t6\n", "##### Other Mixed\t7\n", "##### Swietenia Cedrela Mixed\t8\n", "##### Swietenia macrophylla\t9\n", "##### Swietenia Mixed\t10" ] }, { "cell_type": "code", "execution_count": 4, "id": "44a36f7f-1f15-42d1-a113-0cbf528a48a1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['X', 'Y', 'b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8', 'b9', 'b10', 'b11', 'b12', 'Classification']\n" ] } ], "source": [ "Desired_columns = ['X', 'Y', 'b0', 'b1','b2', 'b3','b4', 'b5', 'b6', 'b7', 'b8', 'b9', 'b10', 'b11', 'b12', 'Classification']\n", "print(Desired_columns)" ] }, { "cell_type": "code", "execution_count": 5, "id": "5542ec02-8bce-414c-adff-a84cd25dd789", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
XYb0b1b2b3b4b5b6b7b8b9b10b11b12Classification
0-76.75757-7.18621230932792507798196527242588298147511130149310
1-76.75757-7.18621230922787479796206927402661305347511126946010
2-76.75757-7.18621227930806495805211427172656313044710130948510
3-76.75757-7.18621227916761476741173122532031250144710119744010
4-76.75757-7.1862123294382353585421272660265131444619149658010
\n", "
" ], "text/plain": [ " X Y b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 \\\n", "0 -76.75757 -7.1862 1230 932 792 507 798 1965 2724 2588 2981 475 \n", "1 -76.75757 -7.1862 1230 922 787 479 796 2069 2740 2661 3053 475 \n", "2 -76.75757 -7.1862 1227 930 806 495 805 2114 2717 2656 3130 447 \n", "3 -76.75757 -7.1862 1227 916 761 476 741 1731 2253 2031 2501 447 \n", "4 -76.75757 -7.1862 1232 943 823 535 854 2127 2660 2651 3144 461 \n", "\n", " b10 b11 b12 Classification \n", "0 11 1301 493 10 \n", "1 11 1269 460 10 \n", "2 10 1309 485 10 \n", "3 10 1197 440 10 \n", "4 9 1496 580 10 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Desired_Output = predictors[Desired_columns]\n", "Desired_Output.head()" ] }, { "cell_type": "code", "execution_count": 6, "id": "08680efb-d9fb-4c19-a9ba-31eae361d021", "metadata": {}, "outputs": [], "source": [ "Desired_Output = Desired_Output.to_numpy()" ] }, { "cell_type": "markdown", "id": "c4aeff4f-1488-4109-835c-3576764f92a4", "metadata": {}, "source": [ "## The input and output variables are split between training and testing by 70:30\n", "\n", "#### All the 14 columns are X variables. The final column that has categorical numbers is the target or Y variable" ] }, { "cell_type": "code", "execution_count": 7, "id": "2d70727c-91d1-407a-a982-92c641d4d0c2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X_train.shape: torch.Size([6942, 14]), X_test.shape: torch.Size([2976, 14]), y_train.shape: torch.Size([6942]), y_test.shape: torch.Size([2976])\n" ] } ], "source": [ "#Split the data\n", "X_train, X_test, y_train, y_test = train_test_split(Desired_Output[:,:14], Desired_Output[:,15], test_size=0.30, random_state=0)\n", "X_train = torch.FloatTensor(X_train)\n", "y_train = torch.LongTensor(y_train)-1\n", "X_test = torch.FloatTensor(X_test)\n", "y_test = torch.LongTensor(y_test)-1\n", "print('X_train.shape: {}, X_test.shape: {}, y_train.shape: {}, y_test.shape: {}'.format(X_train.shape, X_test.shape, y_train.shape, y_test.shape))" ] }, { "cell_type": "markdown", "id": "d9b93bdb-5529-4e5a-b951-9ec3472a305d", "metadata": {}, "source": [ "## The feedforward module with 3 hidden layers are built with different activation functions applied to the layers" ] }, { "cell_type": "code", "execution_count": 8, "id": "88a5fa44-c211-4879-b043-2e50ba5e4950", "metadata": {}, "outputs": [], "source": [ "# Creating a feedforward module\n", "\n", "class Feedforward(torch.nn.Module):\n", " def __init__(self, input_size, hidden_size, output_size=10):\n", " super(Feedforward, self).__init__()\n", " self.input_size = input_size\n", " self.hidden_size = hidden_size\n", " self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)\n", " self.fc2 = torch.nn.Linear(self.hidden_size, self.hidden_size)\n", " self.relu = torch.nn.ReLU()\n", " self.fc3 = torch.nn.Linear(self.hidden_size, output_size)\n", " self.sigmoid = torch.nn.Sigmoid()\n", " self.tanh = torch.nn.Tanh()\n", " def forward(self, x):\n", " hidden = self.relu(self.fc1(x))\n", " hidden = self.relu(self.fc2(hidden))\n", " output = self.tanh(self.fc3(hidden))\n", "\n", " return output" ] }, { "cell_type": "code", "execution_count": 9, "id": "33e371f4-d88a-4ccb-ac3e-717ec92a8e2c", "metadata": {}, "outputs": [], "source": [ "model = Feedforward(14, 256) #input_size = 14 and hidden_size = 256\n", "optimizer = torch.optim.SGD(model.parameters(), lr=0.001)\n", "loss_function = nn.CrossEntropyLoss()" ] }, { "cell_type": "code", "execution_count": 10, "id": 
"14ec12f1-cf05-4013-a61b-05a0e73da061", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 1 loss: 2.96000528\n", "epoch: 26 loss: 2.80332470\n", "epoch: 27 loss: 2.8033246994\n" ] } ], "source": [ "epochs = 27 #27 is chosen because choosing higher numbers make python to crash and after 26 the loss is stabilized\n", "aggregated_losses = []\n", "\n", "for i in range(epochs):\n", " i += 1\n", " y_pred = model(X_train)\n", " single_loss = loss_function(y_pred, y_train)\n", " aggregated_losses.append(single_loss)\n", "\n", " if i%25 == 1:\n", " print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')\n", "\n", " optimizer.zero_grad()\n", " single_loss.backward()\n", " optimizer.step()\n", "\n", "print(f'epoch: {i:3} loss: {single_loss.item():10.10f}')" ] }, { "cell_type": "code", "execution_count": 11, "id": "ec56cfc6-8bc2-49e8-854f-a5e717792efd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss: 2.82178712\n" ] } ], "source": [ "with torch.no_grad():\n", " y_val = model(X_test)\n", " loss = loss_function(y_val, y_test)\n", "print(f'Loss: {loss:.8f}')" ] }, { "cell_type": "code", "execution_count": 12, "id": "9b79a967-c913-4905-9f49-9712cc019f2f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 52 0 0 0 0 0 0 0]\n", " [ 0 17 0 0 0 0 0 0 0]\n", " [ 0 10 0 0 0 0 0 0 0]\n", " [ 0 269 0 0 0 0 0 0 0]\n", " [ 0 869 0 0 0 0 0 0 0]\n", " [ 0 922 0 0 0 0 0 0 0]\n", " [ 0 293 0 0 0 0 0 0 0]\n", " [ 0 8 0 0 0 0 0 0 0]\n", " [ 0 536 0 0 0 0 0 0 0]]\n", " precision recall f1-score support\n", "\n", " 0 0.00 0.00 0.00 52\n", " 1 0.01 1.00 0.01 17\n", " 2 0.00 0.00 0.00 10\n", " 3 0.00 0.00 0.00 269\n", " 5 0.00 0.00 0.00 869\n", " 6 0.00 0.00 0.00 922\n", " 7 0.00 0.00 0.00 293\n", " 8 0.00 0.00 0.00 8\n", " 9 0.00 0.00 0.00 536\n", "\n", " accuracy 0.01 2976\n", " macro avg 0.00 0.11 0.00 2976\n", "weighted avg 0.00 0.01 0.00 2976\n", "\n", "0.00571236559139785\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/usr/lib/python3/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", " _warn_prf(average, modifier, msg_start, len(result))\n" ] } ], "source": [ "from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n", "\n", "print(confusion_matrix(y_test, y_val.argmax(dim=1)))\n", "print(classification_report(y_test,y_val.argmax(dim=1)))\n", "print(accuracy_score(y_test, y_val.argmax(dim=1)))" ] }, { "cell_type": "markdown", "id": "cf4578c8-c0a9-4fc5-8bc2-1d54f4da73bb", "metadata": {}, "source": [ "## Creating a single-layer perceptron model" ] }, { "cell_type": "code", "execution_count": 13, "id": "7eeaf9c3-6f79-4b8f-b673-72691596609b", "metadata": {}, "outputs": [], "source": [ "# Create the model\n", "class Perceptron(torch.nn.Module):\n", " def __init__(self,input_size, output_size,use_activation_fn=None):\n", " super(Perceptron, self).__init__()\n", " self.fc = nn.Linear(input_size,output_size)\n", " self.relu = torch.nn.ReLU() # instead of Heaviside step fn\n", " self.sigmoid = torch.nn.Sigmoid()\n", " self.tanh = torch.nn.Tanh()\n", " self.use_activation_fn=use_activation_fn\n", " def forward(self, x):\n", " output = self.fc(x)\n", " if self.use_activation_fn=='sigmoid':\n", " output = self.sigmoid(output) # To add the non-linearity. 
{ "cell_type": "markdown", "id": "cf4578c8-c0a9-4fc5-8bc2-1d54f4da73bb", "metadata": {}, "source": [ "## Creating a single-layer perceptron model" ] },
{ "cell_type": "code", "execution_count": 13, "id": "7eeaf9c3-6f79-4b8f-b673-72691596609b", "metadata": {}, "outputs": [], "source": [ "# Create the model\n", "class Perceptron(torch.nn.Module):\n", "    def __init__(self, input_size, output_size, use_activation_fn=None):\n", "        super(Perceptron, self).__init__()\n", "        self.fc = nn.Linear(input_size, output_size)\n", "        self.relu = torch.nn.ReLU() # instead of Heaviside step fn\n", "        self.sigmoid = torch.nn.Sigmoid()\n", "        self.tanh = torch.nn.Tanh()\n", "        self.use_activation_fn = use_activation_fn\n", "    def forward(self, x):\n", "        output = self.fc(x)\n", "        if self.use_activation_fn == 'sigmoid':\n", "            output = self.sigmoid(output) # To add the non-linearity. Try training your Perceptron with and without the non-linearity\n", "        elif self.use_activation_fn == 'tanh':\n", "            output = self.tanh(output)\n", "        elif self.use_activation_fn == 'relu':\n", "            output = self.relu(output)\n", "\n", "        return output" ] },
{ "cell_type": "code", "execution_count": 46, "id": "f9e1249a-0960-42b2-a576-1c17b58c3ce9", "metadata": {}, "outputs": [], "source": [ "model = Perceptron(input_size=14, output_size=10, use_activation_fn='tanh')\n", "criterion = torch.nn.CrossEntropyLoss()\n", "optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)" ] },
{ "cell_type": "code", "execution_count": 47, "id": "ceef18da-5b12-47f8-9ad0-7e4454e69b43", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 1 loss: 2.86867380\n", "epoch: 26 loss: 2.86806893\n", "epoch: 51 loss: 2.86806870\n", "epoch: 76 loss: 2.86806870\n", "epoch: 101 loss: 2.86806870\n", "epoch: 126 loss: 2.86806870\n", "epoch: 151 loss: 2.86801815\n", "epoch: 176 loss: 2.86790514\n", "epoch: 201 loss: 2.86790514\n", "epoch: 226 loss: 2.86790514\n", "epoch: 251 loss: 2.86790514\n", "epoch: 276 loss: 2.86790514\n", "epoch: 300 loss: 2.8679051399\n" ] } ], "source": [ "epochs = 300\n", "aggregated_losses = []\n", "\n", "for i in range(epochs):\n", "    i += 1\n", "    y_pred = model(X_train)\n", "    single_loss = loss_function(y_pred, y_train)\n", "    aggregated_losses.append(single_loss)\n", "\n", "    if i%25 == 1:\n", "        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')\n", "\n", "    optimizer.zero_grad()\n", "    single_loss.backward()\n", "    optimizer.step()\n", "\n", "print(f'epoch: {i:3} loss: {single_loss.item():10.10f}')" ] },
{ "cell_type": "code", "execution_count": 48, "id": "5e5d77f3-791c-4977-a4ea-b4ec422e8521", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss: 2.87435365\n" ] } ], "source": [ "with torch.no_grad():\n", "    y_val = model(X_test)\n", "    loss = loss_function(y_val, y_test)\n", "print(f'Loss: {loss:.8f}')" ] },
{ "cell_type": "code", "execution_count": 49, "id": "78680374-f6f7-4533-bde0-75dbcfe7fcf7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 52 0 0 0 0 0 0 0 0]\n", " [ 17 0 0 0 0 0 0 0 0]\n", " [ 10 0 0 0 0 0 0 0 0]\n", " [269 0 0 0 0 0 0 0 0]\n", " [869 0 0 0 0 0 0 0 0]\n", " [922 0 0 0 0 0 0 0 0]\n", " [293 0 0 0 0 0 0 0 0]\n", " [ 8 0 0 0 0 0 0 0 0]\n", " [536 0 0 0 0 0 0 0 0]]\n", " precision recall f1-score support\n", "\n", " 0 0.02 1.00 0.03 52\n", " 1 0.00 0.00 0.00 17\n", " 2 0.00 0.00 0.00 10\n", " 3 0.00 0.00 0.00 269\n", " 5 0.00 0.00 0.00 869\n", " 6 0.00 0.00 0.00 922\n", " 7 0.00 0.00 0.00 293\n", " 8 0.00 0.00 0.00 8\n", " 9 0.00 0.00 0.00 536\n", "\n", " accuracy 0.02 2976\n", " macro avg 0.00 0.11 0.00 2976\n", "weighted avg 0.00 0.02 0.00 2976\n", "\n", "0.01747311827956989\n" ] } ], "source": [ "from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n", "\n", "print(confusion_matrix(y_test, y_val.argmax(dim=1)))\n", "print(classification_report(y_test, y_val.argmax(dim=1)))\n", "print(accuracy_score(y_test, y_val.argmax(dim=1)))" ] },
{ "cell_type": "markdown", "id": "b95b9593-40db-4ee4-8b9b-300187895589", "metadata": {}, "source": [ "### Results:\n", "\n", "#### Neither the single-layer perceptron nor the multi-layer feedforward model yielded high accuracy in distinguishing the different tree plantations from the Sentinel-2 dataset. There could be multiple reasons for this. Spectral bands from the Sentinel-2 image make up most of the X variables, and because of the high spectral similarity between the tree species, the models could not separate them. Poor modelling choices, such as the selection of activation functions, learning rate, and number of epochs, could also have affected the results. In the future, additional input parameters such as texture metrics and Sentinel-1 band information can be added to the model to provide more variance, and different modelling parameters can be explored to improve the accuracy." ] },
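{ "cell_type": "markdown", "id": "note-followup-sketch", "metadata": {}, "source": [ "#### The cell below is a rough, unexecuted sketch of one such follow-up experiment: standardizing the predictors (an extra assumption, not part of the pipeline above) and training the raw-logit feedforward variant with the Adam optimizer for more epochs." ] },
{ "cell_type": "code", "execution_count": null, "id": "sketch-followup", "metadata": {}, "outputs": [], "source": [ "# Sketch only (not executed for this report): explore different modelling\n", "# parameters, as suggested in the results above.\n", "X_mean, X_std = X_train.mean(dim=0), X_train.std(dim=0)\n", "X_train_n = (X_train - X_mean) / (X_std + 1e-8)  # standardized predictors\n", "X_test_n = (X_test - X_mean) / (X_std + 1e-8)\n", "\n", "model2 = FeedforwardLogits(14, 256)  # raw-logit variant sketched earlier\n", "optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)\n", "\n", "for epoch in range(200):\n", "    optimizer2.zero_grad()\n", "    loss = loss_function(model2(X_train_n), y_train)\n", "    loss.backward()\n", "    optimizer2.step()\n", "\n", "with torch.no_grad():\n", "    preds = model2(X_test_n).argmax(dim=1)\n", "print('test accuracy:', (preds == y_test).float().mean().item())" ] },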
{ "cell_type": "code", "execution_count": 55, "id": "bc3b813c-6bc7-4560-88eb-16f38d88255a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[NbConvertApp] Converting notebook /media/sf_LVM_shared/my_SE_data/exercise/Final_Project.ipynb to html\n", "[NbConvertApp] Writing 642237 bytes to /media/sf_LVM_shared/my_SE_data/exercise/Final_Project.html\n" ] } ], "source": [ "!jupyter nbconvert --to html /media/sf_LVM_shared/my_SE_data/exercise/Final_Project.ipynb" ] },
{ "cell_type": "code", "execution_count": null, "id": "4c878985-6243-4e1b-af63-0ca0be1716ae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }