# Introduction to Python

Longzhu Shen, Spatial Ecology, Jun 2019

Video tutorial

The code is available at:

wget https://github.com/selvaje/spatial-ecology-codes/blob/master/docs/source/PYTHON/01_Python_Intro.ipynb


## Why Python?

• Free, portable, easy to learn

• Wildly popular, huge and growing community

• Intuitive, natural syntax

• Ideal for rapid prototyping but also for large applications

• Very efficient to write, reasonably efficient to run as is

• Can be very efficient (numpy, cython, …)

• Huge number of packages (modules)

## You can use Python to…

• Convert or filter files

• Compute statistics

• Build processing pipelines

• Build simple web applications

• Perform large numerical computations

Python can be run interactively or as a program.

## Different ways to run Python

1. Create a file with an editor, then run it:

   $ python myscript.py

2. Run the interpreter interactively:

   $ python

## Basic Data Types: integer, floating point, string, boolean

• variables do not need to be declared or typed

• integers and floating points can be used together

• the same variable can hold different types

[1]:

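radius = 3
pi = 3.14
diam = radius * 2           # int * int gives an int
area = pi * radius ** 2     # ints and floats can be mixed: the result is a float
title = "fun with strings"
pi = 'cherry'               # the same variable can now hold a string
delicious = True
print(radius, diam, area, title, pi, delicious)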

3 6 28.26 fun with strings cherry True

• data types can be converted explicitly, e.g. with str() and int()

[10]:

num = [234, 2435, 243264]
print(num)
num.append(23453)
print(num)

[234, 2435, 243264]
[234, 2435, 243264, 23453]

[11]:

str(num)

[11]:

'[234, 2435, 243264, 23453]'

[12]:

strn = '234'
int(strn)

[12]:

234
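The cells above use str() and int(); float() works the same way. A small sketch with arbitrary example values, including the ValueError raised when a string does not look like the target type:

```python
# Converting between the basic types with built-in functions
n = int("234")          # str -> int
x = float("2.5")        # str -> float
s = str(3.14)           # float -> str
i = int(7.9)            # float -> int truncates toward zero: 7
total = n + x           # int + float gives a float: 236.5

# the string must look like the target type, otherwise ValueError is raised
try:
    int("2.5")
except ValueError as err:
    print("cannot convert:", err)

print(n, x, s, i, total)
```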


## Data Types: lists

• Lists are like arrays in other languages, but more flexible

• they can hold heterogeneous data types

• they can be nested (lists inside lists)

[13]:

l=[1,2,3,4,5,6,7,8,9,10]
l[3]

[13]:

4

[16]:

l[5:7]

[16]:

[6, 7]

[19]:

l[:]

[19]:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

[20]:

# lists are mutable: replace the element at index 2
l[2]=3.14
l

[20]:

[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10]


[21]:

l.append(999)
l

[21]:

[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10, 999]
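Indices can also be negative (counting from the end), and slices accept an optional step. A small sketch on a fresh list (the name `nums` is just for illustration):

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

last = nums[-1]          # negative indices count from the end: 10
tail = nums[-3:]         # the last three elements: [8, 9, 10]
evens = nums[1::2]       # start at index 1, take every 2nd element: [2, 4, 6, 8, 10]
backwards = nums[::-1]   # a reversed copy of the list
size = len(nums)         # number of elements: 10

# a slice stops *before* its end index, so nums[5:7] has 7 - 5 = 2 elements
print(last, tail, evens, backwards, size)
```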


Modify a list

[22]:

l=[1,2,3,4,5,6,7,8,9]
l[2]=[11,12,13]
l

[22]:

[1, 2, [11, 12, 13], 4, 5, 6, 7, 8, 9]

[23]:

l[3:6]=['four to six']
l


[23]:

[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]


Joining lists

[14]:

my_string_list = ['apple', 'orange', 'grape']
my_string_list

[14]:

['apple', 'orange', 'grape']

[25]:

additions_to_list = ['pineapple', 'mango']
my_string_list + additions_to_list

[25]:

['apple', 'orange', 'grape', 'pineapple', 'mango']
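`+` builds a new list and leaves both operands unchanged. To grow a list in place you can use extend(); append() instead adds its argument as one single element. A sketch contrasting the three (variable names are illustrative):

```python
fruits = ['apple', 'orange', 'grape']
extra = ['pineapple', 'mango']

joined = fruits + extra          # new list; fruits itself is unchanged
fruits_ext = list(fruits)        # work on a copy so fruits stays intact
fruits_ext.extend(extra)         # adds each element of extra, in place
fruits_app = list(fruits)
fruits_app.append(extra)         # adds the whole list as ONE element (nesting)

print(joined)
print(fruits_ext)
print(fruits_app)
```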


## Data Types: tuples

• Tuples are ‘immutable’ lists, meaning that once they are created they cannot be changed

[15]:

t=(1,2,3,4,5,6,7,8,10)
t

[15]:

(1, 2, 3, 4, 5, 6, 7, 8, 10)

[16]:

t[4:6]

[16]:

(5, 6)

[17]:

t[5]=99

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-62c2311f04ba> in <module>
----> 1 t[5]=99

TypeError: 'tuple' object does not support item assignment
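Although a tuple cannot be changed, you can build a modified copy (for example via list()), unpack it into variables, or use it as a dictionary key, which a list cannot be. A sketch; the coordinates and the site label are made-up illustration values:

```python
t = (1, 2, 3, 4, 5, 6, 7, 8, 10)

# "modify" by converting to a list, changing it, and converting back
tmp = list(t)
tmp[5] = 99
t2 = tuple(tmp)            # a new tuple; t itself is untouched

# tuple unpacking assigns each element to its own variable
lat, lon = (46.5, 11.3)    # illustrative coordinates

# tuples are hashable, so they can be dict keys (lists cannot)
site_names = {(46.5, 11.3): 'site A'}
print(t2, lat, lon, site_names[(lat, lon)])
```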


## Data Types: strings

Strings are fully featured types in Python.

• strings are delimited with single (') or double (") quotes

• strings are immutable: they cannot be modified in place

• strings can be concatenated with + and sliced much like lists

• strings are objects with many useful methods

[9]:

s="int\"s"   # \" puts a double quote inside a double-quoted string
s

[9]:

'int"s'

[10]:

s="Some0String"
s

[10]:

'Some0String'

[11]:

s[4]

[11]:

'0'
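Because strings are immutable, methods such as upper() or replace() return a new string instead of changing the original. A few commonly used methods on an arbitrary sample text:

```python
s = "spatial ecology with python"

print(s.upper())                      # SPATIAL ECOLOGY WITH PYTHON
words = s.split()                     # split on whitespace -> list of words
print(words)                          # ['spatial', 'ecology', 'with', 'python']
print("-".join(words))                # spatial-ecology-with-python
print(s.replace("python", "Python"))  # returns a new string; s is unchanged
print(s.startswith("spatial"), len(s))
print(s[0:7], s[-6:])                 # slicing works like lists: spatial python

# s[0] = "S" would raise TypeError; build a new string instead
s2 = "S" + s[1:]
print(s2)
```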


## Data Types: dictionaries

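Dicts are Python's built-in hash tables.

• dicts associate keys with values; keys must be hashable (e.g. numbers, strings, tuples), while values can be of any type

• dicts have a length; since Python 3.7 they preserve insertion order, but values are looked up by key, not by position

• looking up a value by key is very fast (constant time on average), even if the dict is big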

[12]:

coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}
coins

[12]:

{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

[13]:

coins['dime']

[13]:

10
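Looking up a missing key with [] raises KeyError; `in` and get() let you check first or supply a default, and assigning to a new key adds it. A sketch that redefines `coins` from above so it runs on its own (the extra keys are illustrative):

```python
coins = {'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

print('dime' in coins)         # True: membership test on the keys
print(coins.get('euro'))       # None: missing key, but no error
print(coins.get('euro', 0))    # 0: explicit default value
coins['half_dollar'] = 50      # add a new key/value pair
del coins['quarter']           # remove a key and its value
print(len(coins), coins)
```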


## Basic Printing

[14]:

print("Simple")

Simple

[15]:

import math
x=16
print("The sqrt of %i is %f" % (x, math.sqrt(x)))
print("The sqrt of {} is {}".format(x, math.sqrt(x)))
print("the sqrt of %(x)i is %(xx)f" % {"x":x, "xx":math.sqrt(x)})

The sqrt of 16 is 4.000000
The sqrt of 16 is 4.0
the sqrt of 16 is 4.000000
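Since Python 3.6, f-strings offer a more compact alternative: expressions go directly inside {}, optionally followed by a format specification after a colon. A short sketch:

```python
import math

x = 16
msg = f"The sqrt of {x} is {math.sqrt(x):.2f}"   # .2f: two decimal places
print(msg)
print(f"The sqrt of {x} is {math.sqrt(x)}")      # no format spec: 4.0
row = f"{'label':>10}|{3.14159:8.3f}|"           # right-aligned in widths 10 and 8
print(row)
```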


## Control Flow Statements: if

• if statements let you test a condition and run different code depending on the result

• else is optional

[18]:

import random
v = random.randint(0, 100)
if v < 50:
    print("small", v)
    print("another line")
else:
    print("big", v)
print("after else")

small 3
another line
after else
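To test more than two cases, chain conditions with elif: Python checks them from top to bottom and runs only the first block whose test is true. A sketch with arbitrary thresholds:

```python
import random

v = random.randint(0, 100)
if v < 25:
    label = "small"
elif v < 75:          # only reached when v >= 25
    label = "medium"
else:                 # reached when v >= 75
    label = "big"
print(label, v)
```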


## Control Flow Statements: while

• While statements execute one or more statements repeatedly until the test is false

[25]:

import random
count = 0
while count < 100:
    count = count + random.randint(0, 10)   # add a random step between 0 and 10
    print(count)



## Control Flow Statements: for

A for statement takes an iterable object (a list, tuple, string, range, file, …) and runs the loop body once for every value.

[23]:

for fruit in ['apple', 'orange', 'banana']:
    print(fruit)

apple
orange
banana

[38]:

for i in range(3, 7):
    print(i)

3
4
5
6
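range() also accepts a step, and enumerate() yields each element together with its position, which saves keeping a counter by hand. A short sketch:

```python
# range(start, stop, step): the stop value is excluded
steps = range(0, 10, 3)
for i in steps:
    print(i)                  # 0, 3, 6, 9 on separate lines

# enumerate() yields (position, element) pairs
fruits = ['apple', 'orange', 'banana']
for idx, fruit in enumerate(fruits):
    print(idx, fruit)         # 0 apple, 1 orange, 2 banana
```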


## Using for loops and dicts

If you loop over a dict, you get just the keys. Use values() for the values only, or items() for keys and values together.

[39]:

for denom in coins:
    print(denom)

penny
nickle
dime
quarter

[25]:

for value in coins.values():
    print(value)

1
5
10
25
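items() returns the keys and values together as (key, value) tuples, which can be unpacked directly in the for statement. A sketch that redefines `coins` so it runs on its own and sums the values:

```python
coins = {'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

total = 0
for denom, value in coins.items():   # unpack each (key, value) pair
    print(denom, value)
    total += value

print("total:", total)               # total: 41
```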


## Control Flow Statements: altering loops

While and For loops can skip steps (continue) or terminate early (break).

[40]:

for i in range(10):
    if i % 2 != 0:
        continue      # skip odd numbers, go to the next iteration
    print(i)

0
2
4
6
8

[41]:

for i in range(10):
    if i > 5:
        break         # leave the loop entirely
    print(i)

0
1
2
3
4
5


Reading input from the keyboard

[43]:

inputstr = input()   # waits for the user to type a line; always returns a string
inputstr

test

[43]:

'test'
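Since input() always returns a string, numbers typed by the user must be converted before doing arithmetic. In this sketch a fixed string stands in for the user's input so that it runs without a keyboard:

```python
text = "42"               # stands in for: text = input("Enter a number: ")
number = float(text)      # convert the str before arithmetic
print(number * 2)         # 84.0
print(text * 2)           # on a string, * means repetition: 4242
```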


## Functions

Functions allow you to write code once and use it many times.

Functions also hide details so code is more understandable.

[44]:

def area(w, h):
    return w * h

area(6, 10)

[44]:

60
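Parameters can have default values, arguments can be passed by name, and a function can return several values at once (packed into a tuple). A sketch extending `area` with a default and adding a hypothetical companion `rect_stats`; the string right after def is a docstring, shown by help():

```python
def area(w, h=1):
    """Return the area of a w-by-h rectangle (h defaults to 1)."""
    return w * h

def rect_stats(w, h):
    """Return the area and the perimeter of a rectangle as a tuple."""
    return w * h, 2 * (w + h)

print(area(6, 10))        # 60
print(area(6))            # 6, uses the default h=1
print(area(h=2, w=5))     # keyword arguments in any order: 10
a, p = rect_stats(6, 10)  # unpack the returned tuple
print(a, p)               # 60 32
```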


## Summary of basic elements of Python

• 4 basic types: int, float, boolean, string

• 3 container types: list, dict, tuple

• 4 control constructs: if, while, for, def

## Example 1: File Reformatter

Task: given a file of hundreds or thousands of lines:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10
...


Remove the last 3 letters from the 5th column:

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10
...


In this example, we’ll show:

• reading lines of a file

• parsing and modifying the lines

• writing them back out

• creating a script to do the above and running it

• passing the script the file to modify

## Step 1: open the input file

[32]:

import sys

[33]:

fp = open('badfile.txt')   # open for reading; mode 'r' (text) is the default
fp

[33]:

<_io.TextIOWrapper name='badfile.txt' mode='r' encoding='UTF-8'>


open() takes a filename and returns a file object (the "file pointer" used below).

We’ll use it to read from the file.

## Step 2: read the first header line, and print it out

[26]:

header = fp.readline().strip()   # read one line and drop its trailing newline
print(header)

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project


We call readline() on the file pointer to get a single line from the file (the header line).

strip() removes the newline at the end of the line (along with any other leading or trailing whitespace).

Then we print it.

## Step 3: for each remaining line in the file, read the line

[38]:

# fp continues after the header line already read in Step 2
for l in fp:
    print(l.strip())

160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10


A file pointer is an example of an iterator.

Instead of explicitly calling readline() for each line, we can just loop on the file pointer, getting one line each time.

## Step 4: find the value in the 5th column, and remove last 3 letters

[39]:

fp = open('badfile.txt')          # reopen: the previous loop consumed the whole file
print(fp.readline().strip())      # copy the header line unchanged
for l in fp:
    flds = l.strip().split(',')   # list of fields
    flds[4] = flds[4][:-3]        # 5th field without its last 3 characters
    print(",".join(flds))         # rejoin with commas, as in the target format
fp.close()

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10


As before, we strip the newline from the line.

We split it into individual fields wherever we find a comma.

The 5th field is referenced by flds[4], since Python starts indexing at 0. [:-3] takes all characters of the string except the last 3.
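The last two points of the plan (turning this into a script and passing it the file to modify) can be sketched as follows. This is one possible version, assuming the script is saved under the hypothetical name `reformat.py`, that the file name arrives on the command line via sys.argv, and that the result goes to standard output so it can be redirected; the helper names `reformat_line` and `reformat_file` are our own:

```python
import sys

def reformat_line(line, column=4, n_chars=3):
    """Drop the last n_chars characters from field `column` (0-based) of a CSV line."""
    flds = line.strip().split(',')
    flds[column] = flds[column][:-n_chars]
    return ','.join(flds)

def reformat_file(path, out=None):
    """Copy the header unchanged, then write every reformatted data line to out."""
    if out is None:
        out = sys.stdout
    with open(path) as fp:               # 'with' closes the file automatically
        out.write(fp.readline())         # header line, newline included
        for line in fp:
            if line.strip():             # skip blank lines, e.g. at the end of the file
                out.write(reformat_line(line) + '\n')

if __name__ == '__main__':
    # usage: python reformat.py badfile.txt > fixed.txt
    for path in sys.argv[1:]:
        reformat_file(path)
```

Using `with` instead of an explicit close() guarantees the file is closed even if an error occurs while processing it.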

## Exercise 1: Fibonacci Series

In mathematics, the Fibonacci numbers are the numbers of the following integer sequence, called the Fibonacci sequence, in which every number after the first two is the sum of the two preceding ones.

https://en.wikipedia.org/wiki/Fibonacci_number

[7]:

# This is the well-known Fibonacci series
a, b = 0, 1
while b < 2000:
    print(b)
    a, b = b, a + b   # new a is the old b, new b is the old a + old b

1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597


Task: write a function that generates the Fibonacci series up to a given bound (any number) and saves the output into a list.
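One possible solution sketch: wrap the loop above in a function (the name `fibonacci` and its parameter `limit` are our choices) that collects the terms smaller than `limit` in a list instead of printing them:

```python
def fibonacci(limit):
    """Return the Fibonacci numbers smaller than limit, starting 1, 1, 2, ..."""
    series = []
    a, b = 0, 1
    while b < limit:
        series.append(b)
        a, b = b, a + b
    return series

fib = fibonacci(2000)
print(fib)
print(len(fib), fib[-1])   # 17 1597
```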

## Exercise 2: Unique pairs

Input file:

1 2
1 2
3 4
4 5
3 4

Output file:

1 2 3
3 4 2
4 5 1

1. Read the file /home/user/ost4sem/exercise/python_intro/pairs.dat
2. Loop through the rows
3. Split the string
4. Count unique pairs
5. Print unique pairs and their count
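A possible sketch for steps 2 to 5, using collections.Counter from the standard library for the counting (a plain dict would also work). The helper name `count_pairs` is hypothetical; reading from the file path given in step 1 is shown in a comment, while the runnable part uses a small in-memory list of rows:

```python
from collections import Counter

def count_pairs(lines):
    """Count how often each whitespace-separated pair occurs in lines."""
    counts = Counter()
    for row in lines:
        fields = row.split()              # "1 2\n" -> ['1', '2']
        if len(fields) == 2:              # ignore blank or malformed rows
            counts[tuple(fields)] += 1    # a tuple can serve as a key
    return counts

# with open('/home/user/ost4sem/exercise/python_intro/pairs.dat') as fp:
#     counts = count_pairs(fp)
counts = count_pairs(["1 2", "1 2", "3 4", "4 5", "3 4"])
for (x, y), n in counts.items():          # pairs in order of first appearance
    print(x, y, n)
```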