Introduction to Python

Longzhu Shen Spatial Ecology Jun 2019

The notebook is available at:

wget https://raw.githubusercontent.com/selvaje/spatial-ecology-codes/master/docs/source/PYTHON/01_Python_Intro.ipynb

Why Python?

  • Free, portable, easy to learn

  • Wildly popular, huge and growing community

  • Intuitive, natural syntax

  • Ideal for rapid prototyping but also for large applications

  • Very efficient to write, reasonably efficient to run as is

  • Can be very efficient (numpy, cython, …)

  • Huge number of packages (modules)

You can use Python to…

  • Convert or filter files

  • Automate repetitive tasks

  • Compute statistics

  • Build processing pipelines

  • Build simple web applications

  • Perform large numerical computations

Python can be run interactively or as a program.

Python Environment

Home

Science Applications

Integrated Environment

IDE

Different ways to run Python

  1. Create a file using an editor, then run it:

    $ python myscript.py

  2. Run interpreter interactively

    $ python

Basic Data Types: integer, floating point, string, boolean

  • variables do not need to be declared or typed

  • integers and floating-point numbers can be mixed in the same expression

  • the same variable can hold different types

[1]:
radius=3
pi=3.14
diam=radius*2
area=pi*(radius**2)
title="fun with strings"
pi='cherry'
delicious=True
print (radius,diam,area,title,pi,delicious)
3 6 28.26 fun with strings cherry True
  • data type conversion

[10]:
num = [234,2435,243264]
print (num)
num.append(23453)
print (num)
[234, 2435, 243264]
[234, 2435, 243264, 23453]
[11]:
str(num)
[11]:
'[234, 2435, 243264, 23453]'
[12]:
strn = '234'
int(strn)
[12]:
234
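
The same built-in constructors cover the other basic types as well. The following is a small sketch of the most common conversions; the variable names are made up for illustration.

```python
# Converting between the basic types with the built-in constructors.
count_text = "42"                  # a number stored as text
count = int(count_text)            # str -> int
ratio = float("3.5")               # str -> float
truncated = int(7.9)               # float -> int drops the fractional part
label = str(count) + " samples"    # int -> str, so it can be concatenated

print(type(count_text), type(count), type(ratio))
print(truncated, label)
```

Note that int() truncates toward zero rather than rounding; round() is the built-in to use when rounding is wanted.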

Data Types: lists

  • Lists are like arrays in other languages, but more flexible

  • elements can have heterogeneous data types

  • lists can be nested

[13]:
l=[1,2,3,4,5,6,7,8,9,10]
l[3]
[13]:
4
[16]:
l[5:7]
[16]:
[6, 7]
[19]:
l[:]
[19]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[20]:
#comment
l[2]=3.14
l
[20]:
[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10]
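
Slices also accept negative indices and a step, and a list can mix types and contain other lists. A short sketch, using variable names chosen only for this illustration:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_three = numbers[:3]       # from the start up to, not including, index 3
last_two = numbers[-2:]         # negative indices count from the end
every_other = numbers[::2]      # third slice value is the step
reversed_copy = numbers[::-1]   # a step of -1 walks the list backwards

mixed = [1, "two", 3.0, [4, 5]]   # heterogeneous and nested
inner = mixed[3][0]               # index the inner list with a second []

print(first_three, last_two, every_other)
print(reversed_copy[0], inner)
```

A slice always returns a new list, so the original list is left unchanged.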

Add to a list

[21]:
l.append(999)
l
[21]:
[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10, 999]

Modify a list

[22]:
l=[1,2,3,4,5,6,7,8,9]
l[2]=[11,12,13]
l
[22]:
[1, 2, [11, 12, 13], 4, 5, 6, 7, 8, 9]
[23]:
l[3:6]=['four to six']
l

[23]:
[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]

Joining lists

[14]:
my_string_list = ['apple', 'orange', 'grape']
my_string_list
[14]:
['apple', 'orange', 'grape']
[25]:
additions_to_list = ['pineapple', 'mango']
my_string_list + additions_to_list
[25]:
['apple', 'orange', 'grape', 'pineapple', 'mango']
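
The + operator builds a new list and leaves both operands untouched. To grow an existing list in place, extend() can be used instead. A brief sketch of the difference, with illustrative variable names:

```python
fruits = ['apple', 'orange', 'grape']
extras = ['pineapple', 'mango']

combined = fruits + extras      # new list; fruits is unchanged
length_before = len(fruits)     # still 3 at this point

fruits.extend(extras)           # modifies fruits in place
print(combined)
print(length_before, len(fruits))
```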

Data Types: tuples

  • Tuples are ‘immutable’ lists, meaning that once they are created they cannot be changed

[15]:
t=(1,2,3,4,5,6,7,8,10)
t
[15]:
(1, 2, 3, 4, 5, 6, 7, 8, 10)
[16]:
t[4:6]
[16]:
(5, 6)
[17]:
t[5]=99
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-62c2311f04ba> in <module>
----> 1 t[5]=99

TypeError: 'tuple' object does not support item assignment
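
Since a tuple cannot be changed, the usual approach is to build a new tuple from the old one. The sketch below also shows tuple unpacking and how the TypeError above could be caught; the names are illustrative only.

```python
point = (3, 7)
x, y = point                 # unpacking: one name per element

message = ""
try:
    point[0] = 99            # not allowed: tuples are immutable
except TypeError as err:
    message = str(err)

moved = (point[0] + 1, point[1])   # build a new tuple instead
print(x, y, moved)
print(message)
```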

Data Types: strings

Strings are fully featured types in Python.

  • strings are defined with single (') or double (") quotes

  • strings are immutable: they cannot be modified in place

  • strings can be concatenated and sliced much like lists

  • strings are objects with lots of useful methods

[10]:
s="Some0String"
s
[10]:
'Some0String'
[9]:
s="int\"s"
s
[9]:
'int"s'
[11]:
s[4]
[11]:
'0'
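
The points above about concatenation, slicing, and methods can be sketched as follows. Every method returns a new string, because the original cannot be modified.

```python
s = "Some0String"

upper = s.upper()             # new string in upper case
parts = s.split("0")          # split on a separator -> list of strings
joined = " ".join(parts)      # join a list back into one string
fixed = s.replace("0", " ")   # replace returns a new string; s is unchanged
sliced = s[:4]                # slicing works as for lists
greeting = sliced + " text"   # + concatenates strings

print(upper, parts, joined)
print(fixed, sliced, greeting)
```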

Data Types: dictionaries

Dicts are Python’s built-in hash tables.

  • dicts associate keys with values, which can be of (almost) any type

  • dicts have a length; since Python 3.7 they keep insertion order, but items are looked up by key, not by position

  • looking up values in dicts is very fast, even if the dict is BIG.

[12]:
coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}
coins
[12]:
{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}
[13]:
coins['dime']
[13]:
10
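
A few more everyday dict operations, as a minimal sketch. The smaller coins dict and the extra keys here are made up for the illustration.

```python
coins = {'penny': 1, 'dime': 10, 'quarter': 25}

coins['half dollar'] = 50          # assigning to a new key adds an entry
has_dime = 'dime' in coins         # membership test looks at the keys
missing = coins.get('euro', 0)     # get() returns a default instead of raising KeyError
size = len(coins)                  # number of key/value pairs

del coins['penny']                 # remove an entry by key
print(has_dime, missing, size, len(coins))
```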

Basic Printing

[14]:
print("Simple")
Simple
[15]:
import math
x=16
print("The sqrt of %i is %f" % (x, math.sqrt(x)))
print("The sqrt of {} is {}".format(x, math.sqrt(x)))
print("the sqrt of %(x)i is %(xx)f" % {"x":x, "xx":math.sqrt(x)})
The sqrt of 16 is 4.000000
The sqrt of 16 is 4.0
the sqrt of 16 is 4.000000
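
Python 3.6 and later also offer formatted string literals (f-strings), which are not used in the cell above but are often the most readable option. A short sketch:

```python
import math

x = 16
root = math.sqrt(x)

# Expressions inside {} are evaluated; :.2f asks for two decimal places.
line = f"The sqrt of {x} is {root:.2f}"
print(line)
```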

Control Flow Statements: if

  • if statements allow you to do a test, and do something based on the result

  • else is optional

[18]:
import random
v=random.randint(0,100)
if v < 50:
    print ("small", v)
    print ("another line")
else:
    print ("big", v)
print ("after else")
small 3
another line
after else
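
When there are more than two cases, elif adds further tests between if and else. A minimal sketch, with a fixed value in place of the random one so the result is predictable:

```python
v = 72   # fixed value chosen for illustration

if v < 33:
    size = "small"
elif v < 66:        # only checked when the first test is false
    size = "medium"
else:               # runs when no test above is true
    size = "big"

print(size, v)
```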

Control Flow Statements: while

  • While statements execute one or more statements repeatedly until the test is false

[25]:
import random
count=0
# add a random step (0 to 10) until the running total reaches 100
while count<100:
    count=count+random.randint(0,10)
    print (count)

The printed totals differ from run to run because the steps are random; the loop stops at the first total of 100 or more.

Control Flow Statements: for

For statements take some sort of iterable object and loop once for every value.

[23]:
for fruit in ['apple', 'orange', 'banana']:
    print(fruit)
apple
orange
banana
[38]:
for i in range(3,7):
    print(i)
3
4
5
6

Using for loops and dicts

If you loop over a dict, you’ll get just the keys. Use values() for the values alone, or items() for key/value pairs.

[39]:
for denom in coins:
    print (denom)
penny
nickle
dime
quarter
[25]:
for value in coins.values():
    print (value)
1
5
10
25
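
Looping with items() gives each key together with its value. A short sketch that repeats the coins dict from the earlier cell so it runs on its own:

```python
coins = {'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

total = 0
for denom, value in coins.items():   # each item is a (key, value) tuple
    print(denom, value)
    total += value

print("total", total)
```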

Control Flow Statements: altering loops

Both while and for loops can skip the rest of an iteration (continue) or terminate early (break).

[40]:
for i in range(10):
    if i%2 != 0: continue
    print (i)
0
2
4
6
8
[41]:
for i in range(10):
    if i>5: break
    print (i)
0
1
2
3
4
5

Read from standard input

[43]:
inputstr = input()
inputstr
test
[43]:
'test'

Functions

Functions allow you to write code once and use it many times.

Functions also hide details so code is more understandable.

[44]:
def area(w, h):
    return w*h

area(6, 10)
[44]:
60
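
Functions can also carry a docstring and default argument values. The variant of area() below is only a sketch; treating a missing height as a square is an assumption made for the example.

```python
def area(w, h=None):
    """Return the area of a w-by-h rectangle; a square if h is omitted."""
    if h is None:      # default: no height given, so use the width
        h = w
    return w * h

print(area(6, 10))      # positional arguments
print(area(4))          # h falls back to its default
print(area(h=2, w=5))   # keyword arguments can be given in any order
```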

Summary of basic elements of Python

  • 4 basic types: int, float, bool, str

  • 3 container types: list, dict, tuple

  • 4 control constructs: if, while, for, def

Example 1: File Reformatter

Task: given a file of hundreds or thousands of lines:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10
...

Remove the last 3 letters from the 5th column:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10
...

In this example, we’ll show:

  • reading lines of a file

  • parsing and modifying the lines

  • writing them back out

  • creating a script to do the above and running it

  • passing the script the file to modify

Step 1: open the input file

[32]:
import sys
fp=open('badfile.txt')
[33]:
fp
[33]:
<_io.TextIOWrapper name='badfile.txt' mode='r' encoding='UTF-8'>

open() takes a filename and returns a “file pointer” (a file object).

We’ll use that to read from the file.
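
A file opened this way stays open until it is closed. A common alternative, not used in the cells here, is the with statement, which closes the file automatically. The sketch below first writes a small stand-in file (made-up contents, in a temporary directory) so that it runs on its own.

```python
import os
import tempfile

# Create a small stand-in for badfile.txt so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "badfile.txt")
with open(sample_path, "w") as out:
    out.write("FCID,Lane,Sample_ID\n160212,1,A1\n")

# The file is closed automatically when the with-block ends.
with open(sample_path) as fp:
    header = fp.readline().strip()

print(header)
print(fp.closed)
```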

Step 2: read the first header line, and print it out

[26]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project

We’ll call readline() on the file pointer to get a single line from the file (the header line).

strip() removes the newline at the end of the line, along with any other leading or trailing whitespace.

Then we print it.

Step 3: for each remaining line in the file, read the line

[38]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    print(l.strip())
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10

A file pointer is an example of an iterator.

Instead of explicitly calling readline() for each line, we can just loop on the file pointer, getting one line each time.

Since we already read the header, we won’t get that line.

Step 4: find the value in the 5th column, and remove last 3 letters

[39]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    flds=l.strip().split(',')
    flds[4]=flds[4][:-3]     # drop the last 3 characters of the 5th field
    #print(flds)
    print(",".join(flds))    # rejoin with bare commas, matching the input format
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10

As before, we strip the trailing newline from the line.

We split it into individual elements where we find commas.

The 5th field is referenced by flds[4], since python starts indexing with 0. [:-3] takes all characters of the string until the last 3.
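
The remaining goals listed at the start of this example (writing the lines back out, and running the code as a script that is passed the file to modify) could be sketched as below. The function names, the second command-line argument for the output file, and the choice to join fields with a bare comma (so the output keeps the input's format) are assumptions made for this illustration.

```python
import sys

def trim_index_column(lines, column=4, n_chars=3):
    """Return the lines with n_chars removed from the end of the given column.

    The first line is treated as the header and passed through unchanged.
    """
    result = []
    for i, line in enumerate(lines):
        line = line.strip()
        if i > 0 and line:                 # skip the header and blank lines
            flds = line.split(',')
            flds[column] = flds[column][:-n_chars]
            line = ','.join(flds)
        result.append(line)
    return result

def main(in_path, out_path):
    # Read and fix all lines, then write them to a new file.
    with open(in_path) as fp:
        fixed = trim_index_column(fp)
    with open(out_path, 'w') as out:
        out.write('\n'.join(fixed) + '\n')

# Only runs when started as a script with two arguments: input and output file.
if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a name such as reformat.py (hypothetical), it could be run as `$ python reformat.py badfile.txt fixedfile.txt`. Writing to a separate output file keeps the original intact in case something goes wrong.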

Exercise 1: Fibonacci Series

In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, in which every number after the first two is the sum of the two preceding ones.

https://en.wikipedia.org/wiki/Fibonacci_number

[7]:
# This is the well-known Fibonacci series
a, b = 0, 1
while b<2000:
    print (b)
    a, b = b, a+b
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597

Task: write a function to generate a Fibonacci series for a given boundary (any number) and save the output into a list

Exercise 2: Unique pair

Input file

1 2
1 2
3 4
4 5
3 4

Output file

1 2 3
3 4 2
4 5 1

  1. Read the file /home/user/ost4sem/exercise/python_intro/pairs.dat

  2. Loop through the rows

  3. Split the string

  4. Count unique pairs

  5. Print unique pairs and their count

Sources: this material was adapted from the following sources