===== Geocomputation at High Performance Computing Cluster (HPC) Anunna =====

{{ :wiki:cluster_computing.jpg?400 |}}

You can log in to Anunna with the following line, replacing user_name with the login name that Anunna's system administrator sent you. A prompt will then ask for the password you received from the administrator.

  ssh -X -Y user_name@login.anunna.wur.nl

==== Setting up your home and software for Geocomputation analysis ====

  cd $HOME
  # create a folder for your scripts
  mkdir $HOME/scripts
  cd $HOME/scripts
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/anunna_setting.sh
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc01_split_tif.sh
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02a_filter_tif_forloop.sh
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02b_filter_tif_xargs.sh
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02c_filter_tif_njobs.sh
  wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02d_filter_tif_arrayjobs.sh
  # replace the insert_your_user placeholder with your login name in all scripts
  sed -i -e "s/insert_your_user/$USER/g" *

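The sed line above replaces the literal placeholder insert_your_user (used in the #SBATCH -o/-e paths of the downloaded scripts) with your login name. A safe dry-run of the same substitution on a throwaway file, with a fallback value for $USER so the sketch also works off the cluster:

```shell
# $USER is your Anunna login name; the fallback is only for off-cluster testing
USER=${USER:-demo_user}

# create a throwaway file containing the placeholder
echo '#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/out' > /tmp/sed_demo.txt

# same in-place substitution as in the setup commands above
sed -i -e "s/insert_your_user/$USER/g" /tmp/sed_demo.txt
cat /tmp/sed_demo.txt
```
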
**Available storage in Anunna**\\
/lustre/scratch/GUESTS/$USER : regularly cleaned up (files >1 month old will be removed)\\
/lustre/nobackup/GUESTS/$USER : extra cost\\
/lustre/backup/GUESTS/$USER : will be backed up (for extra cost!)\\

== Run anunna_setting.sh to copy data, create directories and copy the bash settings. ==

  bash $HOME/scripts/anunna_setting.sh

<code bash| anunna_setting.sh>
# create folders for standard error and standard output
mkdir /lustre/scratch/GUESTS/$USER/stderr
mkdir /lustre/scratch/GUESTS/$USER/stdout

# create a soft link to scratch
ln -s /lustre/scratch/GUESTS/$USER/ $HOME/scratch30

# copy data
cp -r /tmp/ost4sem $HOME/

# copy the visualization tool
mkdir $HOME/bin
cp -r /tmp/bin $HOME

# copy the bash settings
cp /tmp/.bashrc_GUESTS $HOME/.bashrc

# load the new bash settings
source $HOME/.bashrc
</code>

At this point your home should be configured to run Geocomputation procedures.
The Geocomputation software (GRASS, PKTOOLS and GDAL) is loaded directly in your .bashrc. You can read it with "more $HOME/.bashrc".

===== Filter an image =====

{{overview_scripts.png?1000 |}}

The status of jobs in SLURM can be seen with:

  squeue --all
  sacct
  sinfo

You can build a convenient alias and save it to $HOME/.bashrc:

  alias myq='squeue -u $USER   -o "%.9F %.10K %.4P %.80j %3D%2C%.8T %.9M  %.9l  %.S  %R"'

==== Prepare raster dataset ====

A portion of a Landsat image will be divided into 4 VRT tiles, each containing 3 bands. The VRTs will be used in the following scripting procedures.

  sbatch /home/GUESTS/$USER/scripts/sc01_split_tif.sh

<code bash| sc01_split_tif.sh>
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc01_split_tif.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc01_split_tif.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc01_split_tif.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M

#### sbatch /home/GUESTS/$USER/scripts/sc01_split_tif.sh

DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat

# build one 3-band VRT per quadrant: UL, LL, UR, LR (-te = xmin ymin xmax ymax)
gdalbuildvrt -overwrite -separate -te 36.5 -1.5 37 -1 $DIR/stack_UL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 36.5 -2 37 -1.5 $DIR/stack_LL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 37 -1.5 37.5 -1 $DIR/stack_UR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 37 -2 37.5 -1.5 $DIR/stack_LR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
</code>

==== sc02a Process 4 tiles in one node using one cpu with the bash for loop ====

This is the easiest procedure to perform a geocomputation operation. Launch a job that uses a normal for loop to iterate over the 4 tiles. After the iterations (pkfilter) the four tiles can be re-merged by gdalbuildvrt and gdal_translate.

  sbatch /home/GUESTS/$USER/scripts/sc02a_filter_tif_forloop.sh

<code bash| sc02a_filter_tif_forloop.sh>
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02a_filter_tif_forloop.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02a_filter_tif_forloop.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02a_filter_tif_forloop.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500

#### sbatch /home/GUESTS/$USER/scripts/sc02a_filter_tif_forloop.sh

DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat

echo filter the stack_??.vrt files

for file in $DIR/stack_??.vrt ; do
filename=$(basename $file .vrt)
pkfilter -of GTiff -dx 3 -dy 3 -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o $DIR/$filename.tif
done

echo re-create the large tif

gdalbuildvrt -overwrite $DIR/stack.vrt $DIR/stack_UL.tif $DIR/stack_LL.tif $DIR/stack_UR.tif $DIR/stack_LR.tif
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9 $DIR/stack.vrt $DIR/stack_filter.tif
rm $DIR/stack_UL.tif $DIR/stack_LL.tif $DIR/stack_UR.tif $DIR/stack_LR.tif $DIR/stack.vrt
</code>

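The loop derives each output name from the input tile by stripping the directory part and the .vrt suffix with basename; a quick stand-alone check of that naming rule (with made-up paths):

```shell
# basename FILE .vrt drops the directory part and the .vrt suffix,
# so /some/dir/stack_UL.vrt becomes stack_UL, used to name the output GeoTIFF
for file in /some/dir/stack_UL.vrt /some/dir/stack_LR.vrt ; do
    filename=$(basename $file .vrt)
    echo "$file -> $filename.tif"
done
```
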
==== sc02b Multi-process inside one node using 4 cpu using xargs ====

This is one of the most efficient ways to perform a geocomputation operation. Launch a job that uses xargs to run the iterations on multiple cores (4 cpu in this case). After the iterations (pkfilter) the 4 tiles can be re-merged by gdalbuildvrt and gdal_translate. The use of xargs constrains all the iterations to one node, using different cpus. The advantage is that when xargs finishes, all the tiles are ready to be merged back. A disadvantage is that if you request many cpus (e.g. 24) you have to wait until one node has 24 cpus free. A good compromise is to request just 8-12 cpus and add more time to the wall time (-t).

  sbatch /home/GUESTS/$USER/scripts/sc02b_filter_tif_xargs.sh

<code bash| sc02b_filter_tif_xargs.sh>
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02b_filter_tif_xargs.sh
#SBATCH -n 1 -c 4 -N 1
#SBATCH -t 1:00:00
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02b_filter_tif_xargs.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02b_filter_tif_xargs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500

#### sbatch /home/GUESTS/$USER/scripts/sc02b_filter_tif_xargs.sh

export DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat

echo start the multicore computation

# -n 1: one file per invocation; -P 4: up to 4 processes in parallel
ls $DIR/stack_??.vrt | xargs -n 1 -P 4 bash -c $'
file=$1
filename=$(basename $file .vrt)
pkfilter -of GTiff -dx 3 -dy 3 -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o $DIR/$filename.tif
' _

echo re-create the large tif

gdalbuildvrt -overwrite $DIR/stack.vrt $DIR/stack_UL.tif $DIR/stack_LL.tif $DIR/stack_UR.tif $DIR/stack_LR.tif
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9 $DIR/stack.vrt $DIR/stack_filter.tif
rm $DIR/stack_UL.tif $DIR/stack_LL.tif $DIR/stack_UR.tif $DIR/stack_LR.tif $DIR/stack.vrt
</code>

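The xargs idiom can be tried safely with a toy command before attaching the real pkfilter call. The trailing `_` fills `$0` of the inline script, so each file name delivered by xargs arrives as `$1`:

```shell
# run up to 4 "tasks" in parallel; each task just reports its argument
# (output order is not guaranteed, since tasks run concurrently)
printf '%s\n' tileA tileB tileC tileD | xargs -n 1 -P 4 bash -c '
name=$1
echo "processing $name"
' _
```
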
==== sc02c Process 4 tiles with 4 independent jobs - one node one cpu ====

This is a good way to run 4 independent jobs, where each job performs one iteration. This option is good if you need to launch 100-200 jobs. You can also nest an xargs operation inside each job. The disadvantage is that each script finishes independently of the others, so the only way to re-merge the tif is to wait until all the jobs are finished.

  for file in /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_??.vrt
  do sbatch --export=file=$file /home/GUESTS/$USER/scripts/sc02c_filter_tif_njobs.sh
  done

<code bash| sc02c_filter_tif_njobs.sh>
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02c_filter_tif_njobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02c_filter_tif_njobs.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02c_filter_tif_njobs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500

#### for file in /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_??.vrt ; do sbatch --export=file=$file /home/GUESTS/$USER/scripts/sc02c_filter_tif_njobs.sh ; done

DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat

echo filter the $file file

filename=$(basename $file .vrt)
pkfilter -of GTiff -dx 3 -dy 3 -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o $DIR/$filename.tif

echo re-create the large tif by another script
</code>

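One way to avoid waiting by hand for the independent jobs is a SLURM job dependency: collect the job ID that sbatch prints for every tile and submit the merge job with --dependency=afterok, so it starts only when every filter job has succeeded. This is a sketch, not tested on Anunna: the sbatch call is replaced by canned output so the ID-collection logic can be tried off the cluster, and merge_script.sh is a hypothetical name for a script holding the gdalbuildvrt/gdal_translate merge step.

```shell
# collect one job ID per tile, then chain the merge job on all of them
ids=""
jobid=1000                               # canned IDs; the real ones come from sbatch
for file in stack_UL.vrt stack_LL.vrt stack_UR.vrt stack_LR.vrt ; do
    # on the cluster: out=$(sbatch --export=file=$file sc02c_filter_tif_njobs.sh)
    out="Submitted batch job $jobid"     # sbatch prints exactly this format
    id=$(echo "$out" | awk '{print $4}') # the job ID is the 4th word
    ids="$ids:$id"
    jobid=$((jobid + 1))
done

echo "sbatch --dependency=afterok$ids merge_script.sh"
# on the cluster, run that printed sbatch line to submit the dependent merge
```
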
==== sc02d Process 4 tiles with 1 job launching a 4-array-job - one node one cpu ====

This is a good way to run 4 independent array tasks, where each task performs one iteration. This option is good if you need to launch very many computations (e.g. 1000-2000). You can also nest an xargs operation inside each job. The disadvantage is that each task finishes independently of the others, so the only way to re-merge the tif is to wait until all the jobs are finished.

  sbatch /home/GUESTS/$USER/scripts/sc02d_filter_tif_arrayjobs.sh

<code bash| sc02d_filter_tif_arrayjobs.sh>
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02d_filter_tif_arrayjobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02d_filter_tif_arrayjobs.sh.%A_%a.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02d_filter_tif_arrayjobs.sh.%A_%a.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500
#SBATCH --array=1-4

#### sbatch /home/GUESTS/$USER/scripts/sc02d_filter_tif_arrayjobs.sh

DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat

# pick the n-th tile, where n is the array task index (1-4)
file=$(ls $DIR/stack_??.vrt | head -n $SLURM_ARRAY_TASK_ID | tail -1 )

filename=$(basename $file .vrt)
pkfilter -of GTiff -dx 3 -dy 3 -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o $DIR/$filename.tif
</code>
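The head | tail pair above is a compact way to select the n-th line of a list. A stand-alone check of that selection rule, with a hand-set task index instead of the SLURM_ARRAY_TASK_ID that the scheduler exports to each array task:

```shell
SLURM_ARRAY_TASK_ID=3     # set by SLURM on the cluster; hand-set here for the sketch

# head keeps the first n lines, tail -1 keeps the last of those: the n-th line
file=$(printf '%s\n' stack_UL.vrt stack_LL.vrt stack_UR.vrt stack_LR.vrt \
       | head -n $SLURM_ARRAY_TASK_ID | tail -1)
echo "task $SLURM_ARRAY_TASK_ID processes $file"
```
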
wiki/anunna_setting.txt · Last modified: 2019/07/14 08:48 (external edit)