Linux Operating System as a base for Spatial Ecology Computing

Linux is a generic term referring to Unix-like computer operating systems based on the Linux kernel. Their development is one of the most prominent examples of free and open source software collaboration; typically all the underlying source code can be used, freely modified, and redistributed, both commercially and non-commercially, by anyone under licenses such as the GNU General Public License (GPL).

This site introduces the Unix/Linux shell, using the Bash language to manipulate data rather than to configure or interact with the operating system. The final aim is to build stand-alone processing chains that combine bash, R, AWK, and gnuplot commands, can be run repeatedly, and draw on the features of each tool. In this part of the training site we provide various examples of bash commands reported in this Unix/Linux Command Reference.

In a Jupyter notebook you can run bash commands in two ways. Either put the cell magic

%%bash
bash-command

on the first line of a cell, before the bash commands, or prefix a single command with an exclamation mark:

! bash-command

Bash language syntax

The object of this document is the use of the Bash language to explore and manipulate files rather than to configure or interact with the operating system. You can read and follow the Jupyter notebook, or you can copy the commands shown in the code frames of this document and paste them into an interactive Bash shell. Once you are familiar with the general Bash commands, you can go further with online manuals and guides. A large variety of documentation is available, for example at: http://www.linux.org/lessons/advanced/x1110.html and http://tldp.org/LDP/abs/html/

The best way to learn is to try each command on a file, and to search the Internet for more examples and deeper explanations.

Searching for a command, getting help

In a shell window (the terminal) the following prompt is written:

user@pc_name:directory$

After the $ you can type a command. The command syntax is:

command [option] [file]

The square brackets “[ ]” mark an optional part of the command. An option can be added to retrieve more information or to change the behaviour of a command. To find a command for a specific action, type “man -k thewordthatyouneed”.

For example, to search for a command able to count the lines in a file:

[1]:
! man -k count
acct (2)             - switch process accounting on or off
acct (5)             - process accounting file
argz_count (3)       - functions to handle an argz list
cksum (1)            - checksum and count the bytes in a file
CPU_COUNT (3)        - macros for manipulating CPU sets
CPU_COUNT_S (3)      - macros for manipulating CPU sets
error_message_count (3) - glibc error reporting functions
ibv_attach_counters_point_flow (3) - attach individual counter definition to ...
ibv_destroy_counters (3) - Create or destroy a counters handle
ibv_read_counters (3) - Read counter values
fincore (1)          - count pages of file contents in core
get_avphys_pages (3) - get total and available physical page counts
get_phys_pages (3)   - get total and available physical page counts
git-count-objects (1) - Count unpacked number of objects and their disk consu...
goa-daemon (8)       - GNOME Online Accounts Daemon
ibv_create_counters (3) - Create or destroy a counters handle
mlx5dv_dr_action_create_flow_counter (3) - Create devx flow counter actions
mlx5dv_ts_to_ns (3)  - Convert device timestamp from HCA core clock units to ...
pam_lastlog (8)      - PAM module to display date of last login and perform i...
pam_succeed_if (8)   - test account characteristics
pam_tally (8)        - The login counter (tallying) module
pam_tally2 (8)       - The login counter (tallying) module
pcre16_refcount (3)  - Perl-compatible regular expressions
pcre2_get_ovector_count (3) - Perl-compatible regular expressions (revised API)
pcre32_refcount (3)  - Perl-compatible regular expressions
pcre_refcount (3)    - Perl-compatible regular expressions
rdma-statistic (8)   - RDMA statistic counter configuration
sum (1)              - checksum and count the blocks in a file
systemd-bless-boot-generator (8) - Pull systemd-bless-boot.service into the i...
timer_getoverrun (2) - get overrun count for a POSIX per-process timer
userdel (8)          - delete a user account and related files
usermod (8)          - modify a user account
v.in.geonames (1grass) - Imports geonames.org country files into a vector poi...
v.qcount (1grass)    - Indices for quadrat counts of vector point lists.
v.vect.stats (1grass) - Count points in areas, calculate statistics from poin...
wc (1)               - print newline, word, and byte counts for each file

In the last line you get:

“wc (1) - print newline, word, and byte counts for each file”

so “wc” is the command you need. To get information about a command, type “man command” or “info command”, e.g.

[2]:
! man wc
WC(1)                            User Commands                           WC(1)

NAME
       wc - print newline, word, and byte counts for each file

SYNOPSIS
       wc [OPTION]... [FILE]...
       wc [OPTION]... --files0-from=F

DESCRIPTION
       Print newline, word, and byte counts for each FILE, and a total line if
       more than one FILE is specified.  A word is a non-zero-length  sequence
       of characters delimited by white space.

       With no FILE, or when FILE is -, read standard input.

       The  options  below may be used to select which counts are printed, al‐
       ways in the following order: newline, word,  character,  byte,  maximum
       line length.

       -c, --bytes
              print the byte counts

       -m, --chars
              print the character counts

       -l, --lines
              print the newline counts

       --files0-from=F
              read  input  from the files specified by NUL-terminated names in
              file F; If F is - then read names from standard input

       -L, --max-line-length
              print the maximum display width

       -w, --words
              print the word counts

       --help display this help and exit

       --version
              output version information and exit

AUTHOR
       Written by Paul Rubin and David MacKenzie.

REPORTING BUGS
       GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
       Report wc translation bugs to <https://translationproject.org/team/>

COPYRIGHT
       Copyright © 2018 Free Software Foundation, Inc.   License  GPLv3+:  GNU
       GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
       This  is  free  software:  you  are free to change and redistribute it.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       Full documentation at: <https://www.gnu.org/software/coreutils/wc>
       or available locally via: info '(coreutils) wc invocation'

GNU coreutils 8.30              September 2019                           WC(1)
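As a minimal sketch of the options listed in the man page, the following lines create a small throwaway file and count its lines, words, and bytes. The file name sample.txt and the temporary directory are assumptions made only for this illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so that no existing file is touched.
workdir=$(mktemp -d)

# Write three short lines to a hypothetical sample file.
printf 'one two\nthree\nfour five six\n' > "$workdir/sample.txt"

# "With no FILE ... read standard input": feeding the file with "<"
# makes wc print only the number, without the file name.
n_lines=$(wc -l < "$workdir/sample.txt")
n_words=$(wc -w < "$workdir/sample.txt")
n_bytes=$(wc -c < "$workdir/sample.txt")

echo "lines: $n_lines, words: $n_words, bytes: $n_bytes"

# Clean up the scratch directory.
rm -r "$workdir"
```

Running wc with the file name as an argument instead (wc -l sample.txt) prints the count followed by the file name.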

Input/Output redirection

Running a command, saving a result

The symbol “>” is used to save the result of a command in a file, and “>>” appends the result to the end of an existing file instead of overwriting it. The symbol “<” is used instead to read input from a file. In computing terminology these operations are called “standard output redirection” and “standard input redirection”.

This page summarizes the commonly used standard input and output redirections.

In this course we will mainly use the symbols “>”, “>>” and “<”, e.g.

[4]:
!ls
00_Setting_Colab_for_for_Spatial_Ecology_course.ipynb  02_pktools_osgeo.ipynb
01_gdal.ipynb                                          03_bash_osgeo.ipynb
02_pktools_colab.ipynb                                 geodata
[5]:
! ls > mylist.txt
[6]:
! more mylist.txt
00_Setting_Colab_for_for_Spatial_Ecology_course.ipynb
01_gdal.ipynb
02_pktools_colab.ipynb
02_pktools_osgeo.ipynb
03_bash_osgeo.ipynb
geodata
mylist.txt
[7]:
! ls >> mylist.txt
[8]:
! more mylist.txt
00_Setting_Colab_for_for_Spatial_Ecology_course.ipynb
01_gdal.ipynb
02_pktools_colab.ipynb
02_pktools_osgeo.ipynb
03_bash_osgeo.ipynb
geodata
mylist.txt
00_Setting_Colab_for_for_Spatial_Ecology_course.ipynb
01_gdal.ipynb
02_pktools_colab.ipynb
02_pktools_osgeo.ipynb
03_bash_osgeo.ipynb
geodata
mylist.txt
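The cells above show “>” and “>>”; the sketch below puts all three symbols side by side, including “<”. The file name log.txt and the scratch directory are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
out="$workdir/log.txt"        # hypothetical output file

# ">" creates the file, or overwrites it if it already exists.
echo "first" > "$out"
echo "second" > "$out"        # the file now holds only "second"

# ">>" appends to the end of the file instead of overwriting it.
echo "third" >> "$out"

# "<" feeds the content of the file to a command through standard input.
n_lines=$(wc -l < "$out")
first_line=$(head -n 1 < "$out")

echo "lines: $n_lines, first line: $first_line"

rm -r "$workdir"
```

Because “>” silently overwrites, it is worth checking the target file name before running a command that redirects its output.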

Special Characters

Special characters, also called metacharacters, are a group of characters that have particular meanings in the bash language. Listed here are those used in the following scripts. Type the examples and try to get the meaning.

The asterisk “*” symbol matches any string of zero or more characters (note that /dev/tty itself is listed below)

[9]:
! ls /dev/tty*
/dev/tty    /dev/tty23  /dev/tty39  /dev/tty54      /dev/ttyS10  /dev/ttyS26
/dev/tty0   /dev/tty24  /dev/tty4   /dev/tty55      /dev/ttyS11  /dev/ttyS27
/dev/tty1   /dev/tty25  /dev/tty40  /dev/tty56      /dev/ttyS12  /dev/ttyS28
/dev/tty10  /dev/tty26  /dev/tty41  /dev/tty57      /dev/ttyS13  /dev/ttyS29
/dev/tty11  /dev/tty27  /dev/tty42  /dev/tty58      /dev/ttyS14  /dev/ttyS3
/dev/tty12  /dev/tty28  /dev/tty43  /dev/tty59      /dev/ttyS15  /dev/ttyS30
/dev/tty13  /dev/tty29  /dev/tty44  /dev/tty6       /dev/ttyS16  /dev/ttyS31
/dev/tty14  /dev/tty3   /dev/tty45  /dev/tty60      /dev/ttyS17  /dev/ttyS4
/dev/tty15  /dev/tty30  /dev/tty46  /dev/tty61      /dev/ttyS18  /dev/ttyS5
/dev/tty16  /dev/tty31  /dev/tty47  /dev/tty62      /dev/ttyS19  /dev/ttyS6
/dev/tty17  /dev/tty32  /dev/tty48  /dev/tty63      /dev/ttyS2   /dev/ttyS7
/dev/tty18  /dev/tty33  /dev/tty49  /dev/tty7       /dev/ttyS20  /dev/ttyS8
/dev/tty19  /dev/tty34  /dev/tty5   /dev/tty8       /dev/ttyS21  /dev/ttyS9
/dev/tty2   /dev/tty35  /dev/tty50  /dev/tty9       /dev/ttyS22
/dev/tty20  /dev/tty36  /dev/tty51  /dev/ttyprintk  /dev/ttyS23
/dev/tty21  /dev/tty37  /dev/tty52  /dev/ttyS0      /dev/ttyS24
/dev/tty22  /dev/tty38  /dev/tty53  /dev/ttyS1      /dev/ttyS25
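The content of /dev differs from one machine to another, so the following sketch reproduces the same behaviour on empty files created in a scratch directory. The file names are assumptions chosen to mimic the device names above.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Empty files with hypothetical names that mimic the devices listed above.
touch "$workdir/tty" "$workdir/tty1" "$workdir/tty10" "$workdir/ttyS0" "$workdir/loop0"

# The shell replaces tty* with every matching name before echo runs.
# "tty" matches too, because "*" also stands for zero characters.
star_matches=$(cd "$workdir" && echo tty*)
echo "$star_matches"

# "*" alone matches every (non-hidden) name in the directory.
all_matches=$(cd "$workdir" && echo *)
echo "$all_matches"

rm -r "$workdir"
```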

The question mark “?” symbol matches exactly one character

[14]:
%%bash
ls /dev/tty?
/dev/tty0
/dev/tty1
/dev/tty2
/dev/tty3
/dev/tty4
/dev/tty5
/dev/tty6
/dev/tty7
/dev/tty8
/dev/tty9

The square brackets “[ ]” match a single character taken from the listed set or range

[15]:
! ls /dev/tty[2-4]
/dev/tty2  /dev/tty3  /dev/tty4
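A small sketch of “?” and “[ ]” on files created in a scratch directory, so that the result does not depend on the devices present on your machine. The file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/tty" "$workdir/tty1" "$workdir/tty2" "$workdir/tty3" \
      "$workdir/tty10" "$workdir/ttyS0"

# "?" stands for exactly one character: tty and tty10 do not match.
one_char=$(cd "$workdir" && echo tty?)

# Two question marks stand for exactly two characters.
two_chars=$(cd "$workdir" && echo tty??)

# "[2-3]" stands for one character in the range 2 to 3.
in_range=$(cd "$workdir" && echo tty[2-3])

# "[13]" stands for one character that is either 1 or 3.
in_set=$(cd "$workdir" && echo tty[13])

echo "tty?     -> $one_char"
echo "tty??    -> $two_chars"
echo "tty[2-3] -> $in_range"
echo "tty[13]  -> $in_set"

rm -r "$workdir"
```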

Curly brackets “{ }” expand to each of the comma-separated strings listed

[24]:
%%bash
ls /dev/{tty,loop}*
/dev/loop0
/dev/loop1
/dev/loop10
/dev/loop11
/dev/loop12
/dev/loop13
/dev/loop14
/dev/loop15
/dev/loop16
/dev/loop2
/dev/loop3
/dev/loop4
/dev/loop5
/dev/loop6
/dev/loop7
/dev/loop8
/dev/loop9
/dev/loop-control
/dev/tty
/dev/tty0
/dev/tty1
/dev/tty10
/dev/tty11
/dev/tty12
/dev/tty13
/dev/tty14
/dev/tty15
/dev/tty16
/dev/tty17
/dev/tty18
/dev/tty19
/dev/tty2
/dev/tty20
/dev/tty21
/dev/tty22
/dev/tty23
/dev/tty24
/dev/tty25
/dev/tty26
/dev/tty27
/dev/tty28
/dev/tty29
/dev/tty3
/dev/tty30
/dev/tty31
/dev/tty32
/dev/tty33
/dev/tty34
/dev/tty35
/dev/tty36
/dev/tty37
/dev/tty38
/dev/tty39
/dev/tty4
/dev/tty40
/dev/tty41
/dev/tty42
/dev/tty43
/dev/tty44
/dev/tty45
/dev/tty46
/dev/tty47
/dev/tty48
/dev/tty49
/dev/tty5
/dev/tty50
/dev/tty51
/dev/tty52
/dev/tty53
/dev/tty54
/dev/tty55
/dev/tty56
/dev/tty57
/dev/tty58
/dev/tty59
/dev/tty6
/dev/tty60
/dev/tty61
/dev/tty62
/dev/tty63
/dev/tty7
/dev/tty8
/dev/tty9
/dev/ttyprintk
/dev/ttyS0
/dev/ttyS1
/dev/ttyS10
/dev/ttyS11
/dev/ttyS12
/dev/ttyS13
/dev/ttyS14
/dev/ttyS15
/dev/ttyS16
/dev/ttyS17
/dev/ttyS18
/dev/ttyS19
/dev/ttyS2
/dev/ttyS20
/dev/ttyS21
/dev/ttyS22
/dev/ttyS23
/dev/ttyS24
/dev/ttyS25
/dev/ttyS26
/dev/ttyS27
/dev/ttyS28
/dev/ttyS29
/dev/ttyS3
/dev/ttyS30
/dev/ttyS31
/dev/ttyS4
/dev/ttyS5
/dev/ttyS6
/dev/ttyS7
/dev/ttyS8
/dev/ttyS9
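Unlike “*”, “?” and “[ ]”, curly brackets do not look at existing files: the shell simply generates one word per listed string. A minimal sketch, using echo and a scratch directory with hypothetical file names:

```shell
#!/usr/bin/env bash
# Brace expansion only generates text: echo prints both words,
# in the order listed, without checking that such files exist.
expanded=$(echo /dev/{tty,loop}0)
echo "$expanded"

# Combined with "*", each generated word is then matched against files.
workdir=$(mktemp -d)
touch "$workdir"/{tty0,tty1,loop0,sda}     # creates four empty files
brace_matches=$(cd "$workdir" && echo {tty,loop}*)
echo "$brace_matches"

rm -r "$workdir"
```

In the ls output above the loop devices come first only because ls sorts its arguments; the expansion itself keeps the order in which the strings are written.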

Quoting

You can prevent the shell from interpreting a metacharacter by placing a backslash “\” before it. In this way the metacharacter becomes a normal character.

In the example below, mylist.txt is copied to a file literally named mylist?.txt

[26]:
! cp mylist.txt mylist\?.txt
! ls

You can also insert the metacharacter between quotation marks. The error message below shows that the quoted “*” is taken literally and is no longer expanded.

[27]:
! ls /dev/"tt*"
ls: cannot access '/dev/tt*': No such file or directory
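The following sketch compares an unquoted pattern with the backslash and with the two kinds of quotation marks. The file names and the variable name are hypothetical. The last two lines go slightly beyond the text above: they show that double quotes still expand variables, while single quotes do not.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# Unquoted: the shell expands the pattern to the matching file names.
unquoted=$(cd "$workdir" && echo *.txt)

# Backslash, single quotes, double quotes: "*" stays a normal character.
escaped=$(cd "$workdir" && echo \*.txt)
single=$(cd "$workdir" && echo '*.txt')
double=$(cd "$workdir" && echo "*.txt")

echo "unquoted: $unquoted"
echo "escaped:  $escaped"
echo "single:   $single"
echo "double:   $double"

# Double quotes still expand variables, single quotes do not.
name="geodata"
double_var=$(echo "dir: $name")
single_var=$(echo 'dir: $name')
echo "$double_var"
echo "$single_var"

rm -r "$workdir"
```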

Pipe

The pipe “|” metacharacter enables you to run a set of chained processes, where the output of one command becomes the input of the next. To understand it, let's work through an example that creates a temporary file called tmp.txt and counts how many lines it contains.

[30]:
%%bash
ls /usr/bin > tmp.txt
wc -l tmp.txt
2227 tmp.txt

The same can be written

[31]:
! ls /usr/bin | wc -l
2227

without creating an intermediate file.
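A pipe can chain more than two commands. The sketch below assumes a hypothetical list of file names, produced by printf so that the result does not depend on the content of your disk, and passes it through several commands in a row.

```shell
#!/usr/bin/env bash
# printf writes one name per line, grep keeps the names ending in .tif,
# and wc -l counts the lines that are left.
n_tif=$(printf '%s\n' dem.tif slope.tif notes.txt aspect.tif | grep '\.tif$' | wc -l)
echo "tif files: $n_tif"

# sort orders the names alphabetically and head keeps only the first one.
first_sorted=$(printf '%s\n' dem.tif slope.tif notes.txt aspect.tif | sort | head -n 1)
echo "first name: $first_sorted"
```

Each command in the chain starts working on the output of the previous one, so no intermediate file has to be written or deleted.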