Introduction to Python

Longzhu Shen Spatial Ecology Jun 2019

Code availability at

wget https://raw.githubusercontent.com/selvaje/spatial-ecology-codes/master/docs/source/PYTHON/01_Python_Intro.ipynb

Why Python?

Free, portable, easy to learn
Wildly popular, huge and growing community
Intuitive, natural syntax
Ideal for rapid prototyping but also for large applications
Very efficient to write, reasonably efficient to run as is
Can be very efficient (numpy, cython, …)
Huge number of packages (modules)

You can use Python to…

Convert or filter files
Automate repetitive tasks
Compute statistics
Build processing pipelines
Build simple web applications
Perform large numerical computations
Python can be run interactively or as a program

Python Environment

Home

https://www.python.org/

Science Applications

Integrated Environment

https://www.anaconda.com/

IDE

https://jupyter.org/

Different ways to run Python

Create a file using an editor, then run it:

$ python myscript.py

Run the interpreter interactively:

$ python

Basic Data Types: integer, floating point, string, boolean

variables do not need to be declared or typed
integers and floating points can be used together
the same variable can hold different types

[1]:

radius=3
pi=3.14
diam=radius*2
area=pi*(radius**2)
title="fun with strings"
pi='cherry'
delicious=True
print (radius,diam,area,title,pi,delicious)

3 6 28.26 fun with strings cherry True
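The three points above can be checked with the built-in type() function. This is a minimal sketch; the variable names simply mirror the cell above.

```python
# A variable is just a name; type() reports the type of the value it currently holds.
radius = 3              # int
pi = 3.14               # float
area = pi * radius      # int and float mixed: the result is a float
print(type(radius), type(pi), type(area))

pi = 'cherry'           # the same name now refers to a string
print(type(pi))
```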

data type conversion

[10]:

num = [234,2435,243264]
print (num)
num.append(23453)
print (num)

[234, 2435, 243264]
[234, 2435, 243264, 23453]

[11]:

str(num)

[11]:

'[234, 2435, 243264, 23453]'

[12]:

strn = '234'
int(strn)

[12]:

234

Data Types: lists

Lists are like arrays in other languages, but more flexible
lists can hold heterogeneous data types
lists can be nested

[13]:

l=[1,2,3,4,5,6,7,8,9,10]
l[3]

[13]:

4

[16]:

l[5:7]

[16]:

[6, 7]

[19]:

l[:]

[19]:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

[20]:

#comment
l[2]=3.14
l

[20]:

[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10]

Add to a list

[21]:

l.append(999)
l

[21]:

[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10, 999]

Modify a list

[22]:

l=[1,2,3,4,5,6,7,8,9]
l[2]=[11,12,13]
l

[22]:

[1, 2, [11, 12, 13], 4, 5, 6, 7, 8, 9]

[23]:

l[3:6]=['four to six']
l

[23]:

[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]

joining lists

[14]:

my_string_list = ['apple', 'orange', 'grape']
my_string_list

[14]:

['apple', 'orange', 'grape']

[25]:

additions_to_list = ['pineapple', 'mango']
my_string_list + additions_to_list

[25]:

['apple', 'orange', 'grape', 'pineapple', 'mango']
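Note that + builds a new list and leaves both operands unchanged. The sketch below contrasts it with extend(), which modifies a list in place, and also shows how to reach into a nested list. The variable names are illustrative only.

```python
fruits = ['apple', 'orange', 'grape']
extra = ['pineapple', 'mango']

combined = fruits + extra      # builds a new list; fruits is unchanged
print(len(fruits), len(combined))   # 3 5

fruits.extend(extra)           # modifies fruits in place
print(fruits)

# Mixed types, with one list stored inside another
nested = [1, 2, [11, 12, 13], 'four to six']
print(nested[2][0])            # first element of the inner list: 11
```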

Data Types: tuples

Tuples are ‘immutable’ lists, meaning that once they are created they cannot be changed

[15]:

t=(1,2,3,4,5,6,7,8,10)
t

[15]:

(1, 2, 3, 4, 5, 6, 7, 8, 10)

[16]:

t[4:6]

[16]:

(5, 6)

[17]:

t[5]=99

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-62c2311f04ba> in <module>
----> 1 t[5]=99

TypeError: 'tuple' object does not support item assignment
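Since a tuple cannot be changed, the usual workaround is to build a new one. A minimal sketch, which also shows tuple unpacking (the names t2, x and y are illustrative):

```python
t = (1, 2, 3, 4, 5, 6, 7, 8, 10)

# Item assignment is not allowed; catch the error so the script keeps running
try:
    t[5] = 99
except TypeError as err:
    print("cannot modify:", err)

# Build a new tuple from slices of the old one instead
t2 = t[:5] + (99,) + t[6:]
print(t2)

# Tuples can be unpacked into separate names
x, y = (3, 7)
print(x, y)
```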

Data Types: strings

Strings are fully featured types in Python.

strings are defined with ' or "
strings are immutable: they cannot be modified in place
strings can be concatenated and sliced much like lists
strings are objects with lots of useful methods

[10]:

s="Some0String"
s

[10]:

'Some0String'

[9]:

s="int\"s"
s

[9]:

'int"s'

[11]:

s[4]

[11]:

'0'
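The remaining points in the list above (concatenation, slicing, methods, immutability) can be sketched on the same string. All methods used here are standard str methods.

```python
s = "Some0String"

print(s[:4] + " " + s[5:])    # slicing and concatenation: Some String
print(s.upper())              # methods return new strings: SOME0STRING
print(s.replace("0", " "))    # Some String
print(s.split("0"))           # ['Some', 'String']

# Strings are immutable, so item assignment raises TypeError
try:
    s[4] = " "
except TypeError as err:
    print("cannot modify:", err)
```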

Data Types: dictionaries

Dicts are what python calls “hash tables”

dicts associate keys with values, which can be of (almost) any type
dicts have a length; they are accessed by key, not by position (since Python 3.7 they keep insertion order)
looking up values in dicts is very fast, even if the dict is BIG.
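A minimal sketch of these properties. The dict below is made-up example data (station name mapped to elevation in metres), not part of the original notebook.

```python
# Hypothetical example data: station name -> elevation in metres
elevation = {'alpha': 120, 'beta': 455}

elevation['gamma'] = 980            # add a new key
elevation['alpha'] = 125            # overwrite the value of an existing key

print(len(elevation))               # 3
print('beta' in elevation)          # membership test on keys: True
print(elevation.get('delta', -1))   # default value when the key is missing: -1
```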

[12]:

coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}
coins

[12]:

{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

[13]:

coins['dime']

[13]:

10

Basic Printing

[14]:

print("Simple")

Simple

[15]:

import math
x=16
print("The sqrt of %i is %f" % (x, math.sqrt(x)))
print("The sqrt of {} is {}".format(x, math.sqrt(x)))
print("the sqrt of %(x)i is %(xx)f" % {"x":x, "xx":math.sqrt(x)})

The sqrt of 16 is 4.000000
The sqrt of 16 is 4.0
the sqrt of 16 is 4.000000
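On Python 3.6 and later, formatted string literals (f-strings) are a third option. A short sketch, assuming such an interpreter; the variable root is introduced here only for readability.

```python
import math

x = 16
root = math.sqrt(x)

# Expressions inside {} are evaluated and inserted into the string
print(f"The sqrt of {x} is {root}")        # The sqrt of 16 is 4.0

# A format spec after ':' controls the layout, here two decimal places
print(f"The sqrt of {x} is {root:.2f}")    # The sqrt of 16 is 4.00
```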

Control Flow Statements: if

if statements allow you to do a test, and do something based on the result
else is optional

[18]:

import random
v=random.randint(0,100)
if v < 50:
    print ("small", v)
    print ("another line")
else:
    print ("big", v)
print ("after else")

small 3
another line
after else
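When there are more than two cases, elif chains further tests between if and else. A minimal sketch with a fixed value so the result is reproducible (the thresholds and the name label are arbitrary choices for illustration):

```python
v = 50
if v < 33:
    label = "small"
elif v < 66:          # only tested when the first condition is false
    label = "medium"
else:
    label = "big"
print(label, v)       # medium 50
```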

Control Flow Statements: while

While statements execute one or more statements repeatedly until the test is false

[25]:

import random
count=0
# keep adding a random step between 0 and 10 until count reaches 100
while count<100:
    count=count+random.randint(0,10)
    print (count)

Control Flow Statements: for

For statements take some sort of iterable object and loop once for every value.

[23]:

for fruit in ['apple', 'orange', 'banana']:
    print(fruit)

apple
orange
banana

[38]:

for i in range(3,7):
    print(i)

3
4
5
6

Using `for` loops and `dicts`

If you loop over a dict, you’ll get just keys. Use items() for keys and values.
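A minimal sketch of items(), which yields (key, value) pairs that can be unpacked directly in the for statement. The coins dict is repeated here so the cell runs on its own; total is an illustrative name.

```python
coins = {'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

total = 0
for denom, value in coins.items():   # each item is a (key, value) tuple
    print(denom, value)
    total += value
print("total:", total)               # total: 41
```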

[39]:

for denom in coins:
    print (denom)

penny
nickle
dime
quarter

[25]:

for value in coins.values():
    print (value)

1
5
10
25

Control Flow Statements: altering loops

While and For loops can skip steps (continue) or terminate early (break).

[40]:

for i in range(10):
    if i%2 != 0: continue
    print (i)

0
2
4
6
8

[41]:

for i in range(10):
    if i>5: break
    print (i)

0
1
2
3
4
5

Read from standard input

[43]:

inputstr = input()
inputstr

test

[43]:

'test'

Functions

Functions allow you to write code once and use it many times.

Functions also hide details so code is more understandable.
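As a sketch of both points, the hypothetical function below (rect_area is not part of the original notebook) adds a docstring and a default argument, so callers do not need to know how the square case is handled.

```python
def rect_area(w, h=None):
    """Return the area of a w-by-h rectangle, or of a square if h is omitted."""
    if h is None:
        h = w
    return w * h

print(rect_area(6, 10))      # 60
print(rect_area(4))          # 16, h defaults to w
print(rect_area(h=2, w=3))   # 6, arguments can be passed by name
```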

[44]:

def area(w, h):
    return w*h

area(6, 10)

[44]:

60

Summary of basic elements of Python

4 basic types: int, float, boolean, string
3 complex types: list, dict, tuple
4 control constructs: if, while, for, def

Example 1: File Reformatter

Task: given a file of hundreds or thousands of lines:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10
...

Remove the last 3 letters from the 5th column:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10
...

In this example, we’ll show:

reading lines of a file
parsing and modifying the lines
writing them back out
creating a script to do the above and running it
passing the script the file to modify

Step 1: open the input file

[32]:

import sys
fp=open('badfile.txt')

[33]:

fp

[33]:

<_io.TextIOWrapper name='badfile.txt' mode='r' encoding='UTF-8'>

open() takes a filename and returns a “file pointer” (a file object).

We’ll use that to read from the file.
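A file that is opened should also be closed. One common way, sketched below, is a with block, which closes the file automatically when the block ends. The sketch writes its own small stand-in file (example.txt in a temporary directory, an assumption made so it runs anywhere) instead of relying on badfile.txt.

```python
import os
import tempfile

# Write a small stand-in file so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as out:
    out.write("FCID,Lane\n160212,1\n")

# open() returns a file object; 'with' closes it when the block ends
with open(path) as fp:
    header = fp.readline().strip()

print(header)       # FCID,Lane
print(fp.closed)    # True
```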

Step 2: read the first header line, and print it out

[26]:

import sys
fp=open('badfile.txt')
print (fp.readline().strip())

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project

We’ll call readline() on the file pointer to get a single line from the file (the header line).

strip() removes the newline at the end of the line, along with any other leading or trailing whitespace.

Then we print it.

Step 3: for each remaining line in the file, read the line

[38]:

import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    print(l.strip())

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10

A file pointer is an example of an iterator.

Instead of explicitly calling readline() for each line, we can just loop on the file pointer, getting one line each time.

Since we already read the header, we won’t get that line.
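The same behaviour can be seen without a file on disk. In this sketch io.StringIO stands in for an open text file (an assumption for illustration), and next() plays the role of readline() for the header.

```python
import io

# io.StringIO behaves like an open text file
fp = io.StringIO("header\nrow1\nrow2\n")

header = next(fp).strip()     # consume the first line

rows = []
for line in fp:               # the loop resumes after the header
    rows.append(line.strip())

print(header)                 # header
print(rows)                   # ['row1', 'row2']
```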

Step 4: find the value in the 5th column, and remove last 3 letters

[39]:

import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    flds=l.strip().split(',')
    flds[4]=flds[4][:-3]
    #print(flds)
    print(",".join(flds))

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10

Like before, we strip the return from the line.

We split it into individual elements where we find commas.

The 5th field is referenced by flds[4], since python starts indexing with 0. [:-3] takes all characters of the string until the last 3.
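The remaining steps (writing the lines back out, and running everything as a script that is given the file to modify) could be sketched as follows. The function names trim_index and reformat, the script name fixfile.py and the output file name are all hypothetical, and the sketch assumes every data line has at least five comma-separated fields.

```python
import sys

def trim_index(line, column=4, n=3):
    """Drop the last n characters from one comma-separated field."""
    flds = line.rstrip('\n').split(',')
    flds[column] = flds[column][:-n]
    return ','.join(flds)

def reformat(in_name, out_name):
    """Copy in_name to out_name, trimming the 5th column of every data line."""
    with open(in_name) as fin, open(out_name, 'w') as fout:
        fout.write(fin.readline())            # copy the header unchanged
        for line in fin:
            fout.write(trim_index(line) + '\n')

# Run only when called as a script with an input and an output file name
if __name__ == '__main__' and len(sys.argv) == 3:
    reformat(sys.argv[1], sys.argv[2])
```

Saved as fixfile.py, it would be run as:

$ python fixfile.py badfile.txt goodfile.txt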

Exercise 1: Fibonacci Series

In mathematics, the Fibonacci numbers form an integer sequence, called the Fibonacci sequence, in which every number after the first two is the sum of the two preceding ones.

https://en.wikipedia.org/wiki/Fibonacci_number

[7]:

# This is the well-known Fibonacci series
a, b = 0, 1
while b<2000:
    print (b)
    a, b = b, a+b

Task: write a function to generate a Fibonacci series for a given boundary (any number) and save the output into a list

Exercise 2: Unique pair

Input file

1 2
1 2
3 4
4 5
3 4

Output file

1 2 3
3 4 2
4 5 1

1. Read the file /home/user/ost4sem/exercise/python_intro/pairs.dat
2. Loop through the rows
3. Split the string
4. Count unique pairs
5. Print unique pairs and their count