Introduction to Python
Longzhu Shen Spatial Ecology Jun 2019
Code availability at
wget https://github.com/selvaje/spatial-ecology-codes/blob/master/docs/source/PYTHON/01_Python_Intro.ipynb
Why Python?
Free, portable, easy to learn
Wildly popular, huge and growing community
Intuitive, natural syntax
Ideal for rapid prototyping but also for large applications
Very efficient to write, reasonably efficient to run as is
Can be very efficient (numpy, cython, …)
Huge number of packages (modules)
You can use Python to…
Convert or filter files
Automate repetitive tasks
Compute statistics
Build processing pipelines
Build simple web applications
Perform large numerical computations
Python can be run interactively or as a program
Python Environment
Home
Science Applications
Integrated Environment
IDE
Different ways to run Python
Create a file using editor, then:
$ python myscript.py
Run interpreter interactively
$ python
Basic Data Types: integer, floating point, string, boolean
variables do not need to be declared or typed
integers and floating points can be used together
the same variable can hold different types
[1]:
radius=3
pi=3.14
diam=radius*2
area=pi*(radius**2)
title="fun with strings"
pi='cherry'
delicious=True
print (radius,diam,area,title,pi,delicious)
3 6 28.26 fun with strings cherry True
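The rebinding of pi in the cell above can be checked with the built-in type() function. This is a minimal sketch; the names value, kinds and mixed are only illustrative and do not come from the original notebook.

```python
# The same name can be bound to objects of different types over time.
value = 3             # int
kinds = [type(value).__name__]
value = 3.14          # float
kinds.append(type(value).__name__)
value = 'cherry'      # str
kinds.append(type(value).__name__)
value = True          # bool
kinds.append(type(value).__name__)
print(kinds)

# Mixing an int and a float in arithmetic yields a float.
mixed = 3 * 2.0
print(type(mixed).__name__, mixed)
```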
data type conversion
[10]:
num = [234,2435,243264]
print (num)
num.append(23453)
print (num)
[234, 2435, 243264]
[234, 2435, 243264, 23453]
[11]:
str(num)
[11]:
'[234, 2435, 243264, 23453]'
[12]:
strn = '234'
int(strn)
[12]:
234
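A few more conversions, as a hedged sketch with made-up values: float() and int() convert between numbers and strings, int() on a float truncates toward zero, and a string that does not look like a number raises ValueError.

```python
# Conversions are explicit function calls: int(), float(), str().
as_float = float('2.5')        # string -> float
as_int = int(2.9)              # float -> int, truncates toward zero
as_text = str(42) + ' items'   # int -> string, then concatenation

# A string that does not look like a number raises ValueError.
try:
    int('abc')
    failed = False
except ValueError:
    failed = True

print(as_float, as_int, as_text, failed)
```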
Data Types: lists
Lists are like arrays in other languages, but more flexible
a list can hold items of different data types
lists can be nested inside other lists
[13]:
l=[1,2,3,4,5,6,7,8,9,10]
l[3]
[13]:
4
[16]:
l[5:7]
[16]:
[6, 7]
[19]:
l[:]
[19]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[20]:
#comment
l[2]=3.14
l
[20]:
[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10]
Add to a list
[21]:
l.append(999)
l
[21]:
[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10, 999]
Modify a list
[22]:
l=[1,2,3,4,5,6,7,8,9]
l[2]=[11,12,13]
l
[22]:
[1, 2, [11, 12, 13], 4, 5, 6, 7, 8, 9]
[23]:
l[3:6]=['four to six']
l
[23]:
[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]
joining lists
[14]:
my_string_list = ['apple', 'orange', 'grape']
my_string_list
[14]:
['apple', 'orange', 'grape']
[25]:
additions_to_list = ['pineapple', 'mango']
my_string_list + additions_to_list
[25]:
['apple', 'orange', 'grape', 'pineapple', 'mango']
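Note that + builds a new list and leaves both operands unchanged. To grow an existing list in place, the list method extend() can be used instead. A small sketch with illustrative variable names:

```python
fruits = ['apple', 'orange', 'grape']
extras = ['pineapple', 'mango']

combined = fruits + extras    # new list; fruits is unchanged
print(len(fruits), len(combined))

fruits.extend(extras)         # modifies fruits in place
print(fruits == combined)
```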
Data Types: tuples
Tuples are ‘immutable’ lists, meaning that once they are created they cannot be changed
[15]:
t=(1,2,3,4,5,6,7,8,10)
t
[15]:
(1, 2, 3, 4, 5, 6, 7, 8, 10)
[16]:
t[4:6]
[16]:
(5, 6)
[17]:
t[5]=99
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-62c2311f04ba> in <module>
----> 1 t[5]=99
TypeError: 'tuple' object does not support item assignment
Data Types: strings
Strings are fully featured types in Python.
strings are defined with ' or "
strings cannot be modified
strings can be concatenated and sliced much like lists
strings are objects with lots of useful methods
[10]:
s="Some0String"
s
[10]:
'Some0String'
[9]:
s="int\"s"
s
[9]:
'int"s'
[11]:
s[4]
[11]:
'0'
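The points listed for strings (immutability, slicing and concatenation, methods) can be sketched as follows. The variable names are illustrative only.

```python
s = "Some0String"

# Slicing and concatenation build new strings; s itself is untouched.
head = s[:4]
fixed = s[:4] + " " + s[5:]

# A few commonly used string methods.
upper = s.upper()
parts = "a,b,c".split(",")
joined = "-".join(parts)

# Item assignment fails because strings are immutable.
try:
    s[4] = " "
    modified = True
except TypeError:
    modified = False

print(head, fixed, upper, parts, joined, modified)
```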
Data Types: dictionaries
Dicts (dictionaries) are Python's version of "hash tables"
dicts associate keys with values, which can be of (almost) any type
dicts have length; since Python 3.7 they keep insertion order, but items are looked up by key, not by position
looking up values in dicts is very fast, even if the dict is BIG.
[12]:
coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}
coins
[12]:
{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}
[13]:
coins['dime']
[13]:
10
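Some other common dict operations, sketched with a shortened, made-up version of the coins dict: adding a key, taking the length, testing membership, and looking up with a default through get().

```python
coins = {'penny': 1, 'dime': 10, 'quarter': 25}

coins['half dollar'] = 50         # add a new key
n = len(coins)                    # number of key/value pairs
has_dime = 'dime' in coins        # membership test works on keys
missing = coins.get('euro', 0)    # default value instead of a KeyError

print(n, has_dime, missing)
```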
Basic Printing
[14]:
print("Simple")
Simple
[15]:
import math
x=16
print("The sqrt of %i is %f" % (x, math.sqrt(x)))
print("The sqrt of {} is {}".format(x, math.sqrt(x)))
print("the sqrt of %(x)i is %(xx)f" % {"x":x, "xx":math.sqrt(x)})
The sqrt of 16 is 4.000000
The sqrt of 16 is 4.0
the sqrt of 16 is 4.000000
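On Python 3.6 or newer, formatted string literals (f-strings) are another way to write the same kind of message. A minimal sketch; the .2f format specifier is chosen only for illustration.

```python
import math

x = 16
# Expressions inside {} are evaluated; :.2f keeps two decimal places.
msg = f"The sqrt of {x} is {math.sqrt(x):.2f}"
print(msg)
```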
Control Flow Statements: if
if statements allow you to do a test, and do something based on the result
else is optional
[18]:
import random
v=random.randint(0,100)
if v < 50:
    print ("small", v)
    print ("another line")
else:
    print ("big", v)
print ("after else")
small 3
another line
after else
Control Flow Statements: while
While statements execute one or more statements repeatedly until the test is false
[25]:
import random
count=0
while count<100:
    count=count+random.randint(0,10)
    print (count)
Control Flow Statements: for
For statements take some sort of iterable object and loop once for every value.
[23]:
for fruit in ['apple', 'orange', 'banana']:
    print(fruit)
apple
orange
banana
[38]:
for i in range(3,7):
    print(i)
3
4
5
6
Using for loops and dicts
If you loop over a dict, you’ll get just keys. Use items() for keys and values.
[39]:
for denom in coins:
    print (denom)
penny
nickle
dime
quarter
[25]:
for value in coins.values():
    print (value)
1
5
10
25
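The items() method mentioned above yields (key, value) pairs, which can be unpacked directly in the for statement. A short sketch with a reduced, made-up coins dict:

```python
coins = {'penny': 1, 'dime': 10, 'quarter': 25}

pairs = []
for denom, value in coins.items():
    pairs.append((denom, value))   # keep the pairs for later use
    print(denom, value)
```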
Control Flow Statements: altering loops
While and For loops can skip steps (continue) or terminate early (break).
[40]:
for i in range(10):
    if i%2 != 0: continue
    print (i)
0
2
4
6
8
[41]:
for i in range(10):
    if i>5: break
    print (i)
0
1
2
3
4
5
Read from standard input
[43]:
inputstr = input();
inputstr
test
[43]:
'test'
Functions
Functions allow you to write code once and use it many times.
Functions also hide details so code is more understandable.
[44]:
def area(w, h):
    return w*h
area(6, 10)
[44]:
60
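As one illustration of how a function can hide details, the sketch below adds a docstring and a default argument. The default h=None (meaning "square") is an assumption made for this example and is not part of the original area() function.

```python
def area(w, h=None):
    """Return the area of a w-by-h rectangle; a square if h is omitted."""
    if h is None:
        h = w
    return w * h

rect = area(6, 10)   # both sides given
square = area(4)     # h falls back to w
print(rect, square)
```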
Summary of basic elements of Python
4 basic types: int, float, boolean, string
3 complex types: list, dict, tuple
4 control constructs: if, while, for, def
Example 1: File Reformatter
Task: given a file of hundreds or thousands of lines:
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10
...
Remove the last 3 letters from the 5th column:
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10
...
In this example, we’ll show:
reading lines of a file
parsing and modifying the lines
writing them back out
creating a script to do the above and running it
passing the script the file to modify
Step 1: open the input file
[32]:
import sys
fp=open('badfile.txt')
[33]:
fp
[33]:
<_io.TextIOWrapper name='badfile.txt' mode='r' encoding='UTF-8'>
open() takes a filename and returns a "file pointer" (a file object).
We’ll use that to read from the file.
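The cells in this example never close the file. A common alternative is to open files inside a with block, which closes them automatically. This is a sketch only: it writes a small stand-in file (example.txt in a temporary directory, both made up here) so that it runs without badfile.txt.

```python
import os
import tempfile

# Create a small stand-in file so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as out:
    out.write('header\nrow1\n')

# The with block closes the file when it ends, even if an error occurs.
with open(path) as fp:
    first = fp.readline().strip()

print(first, fp.closed)
```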
Step 2: read the first header line, and print it out
[26]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
We’ll call readline() on the file pointer to get a single line from the file (the header line).
strip() removes the newline at the end of the line, along with any other leading or trailing whitespace.
Then we print it.
Step 3: for each remaining line in the file, read the line
[38]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    print(l.strip())
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10
A file pointer is an example of an iterator.
Instead of explicitly calling readline() for each line, we can just loop on the file pointer, getting one line each time.
Since we already read the header, we won’t get that line.
Step 4: find the value in the 5th column, and remove last 3 letters
[39]:
import sys
fp=open('badfile.txt')
print (fp.readline().strip())
for l in fp:
    flds=l.strip().split(',')
    flds[4]=flds[4][:-3]
    #print(flds)
    print(", ".join(flds))
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212, 1, A1, human, TAAGGCGA-TAGAT, None, N, Eland-rna, Mei, Jon_mix10
160212, 1, A2, human, CGTACTAG-CTCTC, None, N, Eland-rna, Mei, Jon_mix10
160212, 1, A3, human, AGGCAGAA-TATCC, None, N, Eland-rna, Mei, Jon_mix10
160212, 1, A4, human, TCCTGAGC-AGAGT, None, N, Eland-rna, Mei, Jon_mix10
Like before, we strip the newline from the line.
We split it into individual fields wherever we find a comma.
The 5th field is referenced by flds[4], since Python starts indexing at 0. [:-3] takes all characters of the string except the last 3.
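The remaining parts of the task (writing the lines back out, and running the code as a script that is given the file to modify) could look like the sketch below. The function name reformat and the two command-line arguments are assumptions, not part of the original material. The join uses ',' without a space so that the output matches the comma-separated target format shown in the task description.

```python
import sys

def reformat(in_path, out_path):
    """Copy in_path to out_path, trimming the last 3 letters of column 5."""
    with open(in_path) as fin, open(out_path, 'w') as fout:
        fout.write(fin.readline())             # header line, unchanged
        for line in fin:
            flds = line.strip().split(',')
            if len(flds) < 5:                  # skip blank or short lines
                continue
            flds[4] = flds[4][:-3]
            fout.write(','.join(flds) + '\n')

# Run only when called as a script with an input and an output filename.
if __name__ == '__main__' and len(sys.argv) == 3:
    reformat(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as reformat.py, it would be run as: $ python reformat.py badfile.txt fixedfile.txt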
Exercise 1: Fibonacci Series
In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones.
https://en.wikipedia.org/wiki/Fibonacci_number
[7]:
# This is the well-known Fibonacci series
a, b = 0, 1
while b<2000:
    print (b)
    a, b = b, a+b
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
Task: write a function to generate a Fibonacci series for a given boundary (any number) and save the output into a list
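One possible way to approach this task, offered as a sketch rather than the reference solution: wrap the loop above in a function and append to a list instead of printing. The name fib_below and the choice to stop before the boundary (rather than include it) are assumptions.

```python
def fib_below(limit):
    """Return the Fibonacci numbers smaller than limit as a list."""
    series = []
    a, b = 0, 1
    while b < limit:
        series.append(b)
        a, b = b, a + b
    return series

result = fib_below(100)
print(result)
```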
Exercise 2: Unique pair
Input file
1 2
1 2
3 4
4 5
3 4
Output file
1 2 3
3 4 2
4 5 1
1. Read the file /home/user/ost4sem/exercise/python_intro/pairs.dat
2. Loop through the rows
3. Split the string
4. Count unique pairs
5. Print unique pairs and their count
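The looping, splitting and counting steps can be sketched with collections.Counter from the standard library. The rows below are made-up stand-ins held in a list so the block runs without pairs.dat; reading the real file would replace the list with a loop over the opened file.

```python
from collections import Counter

# Hypothetical rows standing in for the lines of the input file.
lines = ['7 8', '7 8', '9 1', '7 8', '9 1', '2 2']

# Split each row into a tuple so it can be used as a Counter key.
counts = Counter(tuple(line.split()) for line in lines)

# Print each unique pair followed by how often it occurred.
for pair, n in counts.items():
    print(pair[0], pair[1], n)
```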