Manipulate GSIM files
Recorded lecture: 1:25:10 - 2:05:20
The Global Streamflow Indices and Metadata Archive (GSIM) –
Part 1: The production of a daily streamflow archive and metadata
Part 2: Quality Control, Time-series Indices and Homogeneity Assessment
Copy and paste the following commands into the terminal. No Jupyter notebook file is provided.
Download the GSIM archive
cd /media/sf_LVM_shared
mkdir -p GSIM/zip
cd GSIM/zip
wget https://store.pangaea.de/Publications/GudmundssonL-etal_2018/GSIM_indices.zip
unzip GSIM_indices.zip
cd GSIM_indices/TIMESERIES/monthly
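The awk commands below rely on specific field positions, so it helps to look at the layout of one file first. This is a minimal sketch: `show_header` is a hypothetical helper, and it assumes metadata lines start with "#" and are followed by a line of CSV column names.

```bash
# Hypothetical helper (not part of GSIM): print the header of one file.
# Assumes metadata lines start with "#" and the first other non-blank
# line holds the CSV column names.
show_header() {
  awk '
    /^[ \t]*$/ { next }          # skip blank lines
    /^#/       { print; next }   # metadata line
               { print; exit }   # first non-metadata line (column names), then stop
  ' "$1"
}
```

For example, `files=(US_*.mon); show_header "${files[0]}"` prints the header of the first US file.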
Data exploration
To get results quickly, we only operate on the ./US_*.mon files (the US stations).
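To see how much the US subset reduces the workload, the files can be counted per country from their names. This sketch assumes GSIM-style file names such as US_0000001.mon (country code, underscore, station number); `count_files_per_country` is a hypothetical helper meant to be run inside GSIM_indices/TIMESERIES/monthly.

```bash
# Hypothetical helper: count .mon files per country code, assuming
# GSIM-style names such as US_0000001.mon (code before the first "_").
count_files_per_country() {
  printf '%s\n' *.mon |   # one name per line; printf is a bash builtin, so no argument-length limit
    cut -d _ -f 1 |       # keep the country code
    sort | uniq -c |      # number of files per code
    sort -rn              # largest groups first
}
```

Running `count_files_per_country | head` lists the countries with the most stations.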
Create x_y.txt file
First, count the latitude and longitude entries:
grep latitude US*.mon | awk '{ print $4 }' | wc -l
grep longitude US*.mon | awk '{ print $4 }' | wc -l
Both counts are equal, so no longitude or latitude entry is missing and the two lists can be combined line by line into x_y.txt (one “longitude latitude” pair per line):
paste -d " " <(grep longitude US*.mon | awk '{print $4}') <(grep latitude US*.mon | awk '{print $4 }') > x_y.txt
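The paste approach relies on both grep outputs listing the files in the same order, with exactly one line per file. A sketch of an alternative reads each file once and keeps the station ID next to its coordinates. `extract_xy` is a hypothetical helper, and it assumes the coordinate value is the last whitespace-separated field of the "#" header line containing "longitude" or "latitude"; check this against the real header before relying on it.

```bash
# Hypothetical helper: print "ID longitude latitude" once per file.
# Assumption: the value is the last field ($NF) of the "#" header line
# that contains "longitude" or "latitude".
extract_xy() {
  awk '
    FNR == 1 { lon = ""; lat = ""; done = 0 }     # new file: reset state
    done { next }                                # this file is already printed
    /^#/ && /longitude/ && lon == "" { lon = $NF }
    /^#/ && /latitude/  && lat == "" { lat = $NF }
    lon != "" && lat != "" {
      id = FILENAME
      sub(/^.*\//, "", id)                       # strip the directory part
      sub(/\.mon$/, "", id)                      # strip the extension
      print id, lon, lat
      done = 1
    }
  ' "$@"
}
```

For example, `extract_xy ./US_*.mon > id_x_y.txt` followed by `cut -d " " -f 2,3 id_x_y.txt` gives a two-column "longitude latitude" file like x_y.txt. A file lacking one coordinate simply produces no line, instead of shifting every later pair.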
Count number of observations
For the “MEAN” column, count the total number of observations and how many of them are reported as NA:
paste -d " " <(awk -F , 'NF > 5 && $1 !~ /date/ { print $2 }' US*.mon | wc -l) <(awk -F , 'NF > 5 && $1 !~ /date/ { print $2 }' US*.mon | grep NA | wc -l)
Result: 2,053,753 observations in total, of which 556,197 are NA.
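Both counts can also be obtained by reading the files once instead of twice. A minimal sketch: `count_mean_na` is a hypothetical helper that keeps lines with more than five comma-separated fields, skips the column-name line (assumed to be the one whose first field contains "date"), and counts a MEAN value as missing when it contains "NA", as `grep NA` does.

```bash
# Hypothetical helper: count MEAN entries (column 2) and NA entries in one pass.
# Keeps lines with more than 5 comma-separated fields and skips the
# column-name line, assumed to be the one whose first field contains "date".
count_mean_na() {
  awk -F , '
    NF > 5 && $1 !~ /date/ {
      total++
      if ($2 ~ /NA/) na++            # same match as `grep NA`
    }
    END { print total + 0, na + 0 }  # "+ 0" prints 0 rather than an empty string
  ' "$@"
}
```

`count_mean_na US*.mon` prints the total and the NA count on one line, in the same order as the paste command.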
Count observations per date
List the unique dates and count how many observations each date has:
awk -F , '{ if(NF>5) { if ($1 > 0) { print $1 }} }' ./US_*.mon | sort | uniq -c > count_date.txt
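A short summary of count_date.txt shows the period covered and the date with the most reports. This sketch assumes each line has the "COUNT DATE" layout written by `uniq -c` and that dates are fixed-width YYYY-MM-DD strings, so the plain `sort` above already put them in chronological order; `summarize_date_counts` is a hypothetical name.

```bash
# Hypothetical helper: summarise a "COUNT DATE" file produced by uniq -c.
# Assumes the lines are already in chronological order (fixed-width dates).
summarize_date_counts() {
  local f=${1:-count_date.txt}
  awk '
    BEGIN   { max = 0 }
    NR == 1 { first = $2 }                      # earliest date
            { last = $2 }                       # ends as the latest date
    $1 > max { max = $1; busiest = $2 }         # date with the most observations
    END {
      print "first date:  " first
      print "last date:   " last
      print "busiest date: " busiest " (" max " observations)"
    }
  ' "$f"
}
```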
Monthly MEAN distribution
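Check whether the data are normally distributed by counting how many monthly MEAN values fall into each integer bin (NA values and the column-name line are skipped):
awk -F , 'NF > 5 && $2 != "NA" && $2 + 0 > 0 { print int($2) }' ./US_*.mon | sort -g | uniq -c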
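Monthly flows span several orders of magnitude, so integer bins give a very long table. Grouping values by power of ten is more compact. In this sketch (`decade_histogram` is a hypothetical helper), bin k holds values in [10^k, 10^(k+1)). The bin is found by repeated division or multiplication by 10 rather than with `log()`, because `log(1000)/log(10)` evaluates to slightly less than 3 in floating point and would put 1000 in the wrong bin. Non-numeric input such as NA converts to 0 and is skipped.

```bash
# Hypothetical helper: count positive numbers read on stdin (one per line)
# per power-of-ten bin; bin k holds values in [10^k, 10^(k+1)).
decade_histogram() {
  awk '
    {
      x = $1 + 0                        # NA, text and empty lines become 0
      if (x <= 0) next                  # power-of-ten bins need positive values
      b = 0
      while (x >= 10) { x /= 10; b++ }
      while (x < 1)   { x *= 10; b-- }
      n[b]++
    }
    END { for (b in n) print b, n[b] }  # "bin count", in arbitrary order
  ' | sort -n                           # order the bins
}
```

For example: `awk -F , 'NF > 5 { print $2 }' ./US_*.mon | decade_histogram`.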
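The monthly MEAN is not normally distributed: the values are concentrated at the low end (the left of the histogram) with a long tail toward high flows, i.e. the distribution is right-skewed (positively skewed).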
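The skew can also be summarised as one number. The sketch below computes the population skewness g1 = m3 / m2^(3/2), which is 0 for a normal distribution and positive when the right tail is longer. It updates the central moments in one pass instead of summing raw cubes, which limits round-off for large flow values. `skewness` is a hypothetical helper that expects one number per line on stdin, so NA values must be filtered out first.

```bash
# Hypothetical helper: population skewness g1 = m3 / m2^(3/2) of the numbers
# on stdin (one per line; input must already be numeric, NA removed).
# M2 and M3 are running sums of squared and cubed deviations from the mean.
skewness() {
  awk '
    {
      x = $1 + 0
      n1 = n; n++
      delta = x - mean; dn = delta / n
      term1 = delta * dn * n1
      mean += dn
      M3 += term1 * dn * (n - 2) - 3 * dn * M2   # uses M2 before its update
      M2 += term1
    }
    END {
      if (n < 2 || M2 == 0) { print "n=" n, "skewness undefined"; exit }
      printf "n=%d skewness=%.4f\n", n, sqrt(n) * M3 / (M2 ^ 1.5)
    }
  '
}
```

For example: `awk -F , 'NF > 5 && $2 != "NA" && $2 + 0 > 0 { print $2 }' ./US_*.mon | skewness`. A clearly positive value confirms the long right tail seen in the histogram.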