{ "cells": [ { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "a20290f5-2a26-4622-9e8b-ce94cc1dcb1a" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction to Python\n", "Longzhu Shen Spatial Ecology Jun 2019\n", "\n", "[Video tutorial](https://youtu.be/j-bmWPJzKMQ)\n", "\n", "Code availability at\n", "\n", " wget https://github.com/selvaje/spatial-ecology-codes/blob/master/docs/source/PYTHON/01_Python_Intro.ipynb" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "494d6527-1db0-4bca-a3a9-ea7d28eed532" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Why Python?\n", "- Free, portable, easy to learn\n", "- Wildly popular, huge and growing community\n", "- Intuitive, natural syntax\n", "- Ideal for rapid prototyping but also for large applications\n", "- Very efficient to write, reasonably efficient to run as is\n", "- Can be very efficient (numpy, cython, ...)\n", "- Huge number of packages (modules)\n" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "12ea1d90-4e9e-435e-9531-27bf0426ae6f" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## You can use Python to...\n", "- Convert or filter files\n", "- Automate repetitive tasks\n", "- Compute statistics\n", "- Build processing pipelines\n", "- Build simple web applications\n", "- Perform large numerical computations\n", "- Python can be run interactively or as a program" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "12ea1d90-4e9e-435e-9531-27bf0426ae6f" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Python Environment \n", "\n", "### Home\n", "- https://www.python.org/\n", "\n", "### Science Applications\n", "- http://www.numpy.org/\n", "- https://www.scipy.org/\n", "\n", "### Integrated Environment\n", "- https://www.anaconda.com/\n", "\n", "### IDE \n", "- https://jupyter.org/" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "f8c8a2ab-9b15-45e6-bb89-f37ae176b4f2" }, 
"slideshow": { "slide_type": "slide" } }, "source": [ "## Different ways to run Python\n", "\n", "1. Create a file using editor, then:\n", "\n", " ```$ python myscript.py```\n", "\n", "1. Run interpreter interactively\n", "\n", " ```$ python ```" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "e675a434-eb22-4d12-a159-d6464ec9d916" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Basic Data Types: _integer, floating point, string, boolean_" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "cf3cb8c6-45f0-48de-b171-598ebf92904f" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "- variables do not need to be declared or typed\n", "- integers and floating points can be used together\n", "- the same variable can hold different types\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbpresent": { "id": "8f67df50-3050-47d2-a5d9-4329a61325fa" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 6 28.26 fun with strings cherry True\n" ] } ], "source": [ "radius=3\n", "pi=3.14\n", "diam=radius*2\n", "area=pi*(radius**2)\n", "title=\"fun with strings\"\n", "pi='cherry'\n", "delicious=True\n", "print (radius,diam,area,title,pi,delicious)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "cf3cb8c6-45f0-48de-b171-598ebf92904f" }, "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "- data type conversion\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[234, 2435, 243264]\n", "[234, 2435, 243264, 23453]\n" ] } ], "source": [ "num = [234,2435,243264] \n", "print (num)\n", "num.append(23453)\n", "print (num)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'[234, 2435, 
243264, 23453]'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "str(num)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "234" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strn = '234'\n", "int(strn)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "01c5e397-ce22-4e60-bece-26892ff66c77" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Data Types: _lists_\n", "\n", "- Lists are like arrays in other languages but with higher flexibility\n", "- heterogeneous data types\n", "- nest lists" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "nbpresent": { "id": "32dcb56e-4dae-4bef-821a-48f9334a6f3d" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l=[1,2,3,4,5,6,7,8,9,10]\n", "l[3]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "nbpresent": { "id": "8dc3a7fd-a5ce-41f2-b336-c9df5e112dfd" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[6, 7]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l[5:7]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "nbpresent": { "id": "fb27453f-b80c-48e1-96d6-d77ec61835cd" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l[:]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "nbpresent": { "id": "6fc8d55d-6c8e-4b5b-ac6f-dc4da49e5dc1" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10]" ] }, "execution_count": 20, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "#comment\n", "l[2]=3.14\n", "l" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "6fc8d55d-6c8e-4b5b-ac6f-dc4da49e5dc1" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Add to a list" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "nbpresent": { "id": "4f14437f-6521-4091-bea7-e3c7a4506478" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3.14, 4, 5, 6, 7, 8, 9, 10, 999]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l.append(999)\n", "l" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "6fc8d55d-6c8e-4b5b-ac6f-dc4da49e5dc1" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Modify a list" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "nbpresent": { "id": "b4655176-1398-4987-8ff0-6f46bd24c9e9" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, [11, 12, 13], 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l=[1,2,3,4,5,6,7,8,9]\n", "l[2]=[11,12,13]\n", "l" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "nbpresent": { "id": "378a93fb-d87e-46da-80c4-ac69fd10baaf" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "l[3:6]=['four to six']\n", "l\n" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "6fc8d55d-6c8e-4b5b-ac6f-dc4da49e5dc1" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "joining lists" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['apple', 'orange', 'grape']" ] }, "execution_count": 14, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "my_string_list = ['apple', 'orange', 'grape']\n", "my_string_list" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['apple', 'orange', 'grape', 'pineapple', 'mango']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "additions_to_list = ['pineapple', 'mango']\n", "my_string_list + additions_to_list" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "82714e0c-2a74-45d8-b7cc-99a09924f080" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Data Types: _tuples_\n", "\n", "- Tuples are **'immutable'** lists, meaning that once they are created they **cannot** be changed" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "nbpresent": { "id": "e772b731-9081-4946-92c9-992f9c176d99" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(1, 2, 3, 4, 5, 6, 7, 8, 10)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t=(1,2,3,4,5,6,7,8,10)\n", "t" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "nbpresent": { "id": "3797ae0a-3333-4e11-b5f8-89569c0d37ba" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(5, 6)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[4:6]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "nbpresent": { "id": "32ca7f19-e063-44ad-bff2-d71eb833af58" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m99\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "t[5]=99" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "a304d336-0970-4929-b6ed-13c646c59725" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Data Types: _strings_\n", "Strings are fully featured types in python.\n", "\n", "- strings are defined with ' or \"\n", "- strings cannot be modified\n", "- strings can be concatenated and sliced much like lists\n", "- strings are objects with lots of useful methods\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbpresent": { "id": "4765d0a2-2a8a-497a-b969-9d6b80486aec" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Some0String'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s=\"Some0String\"\n", "s " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "nbpresent": { "id": "1ccf400f-f3da-4ece-982d-8678acaf3d50" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'int\"s'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s=\"int\\\"s\"\n", "s " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "nbpresent": { "id": "10caeeb9-57bc-4ecc-a962-7b32fb3d076e" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'0'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[4]" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "f4c0903e-8146-4f12-9b0f-f09553cabfaa" }, "slideshow": { "slide_type": "slide" } }, 
"source": [ "## Data Types: _dictionaries_\n", "\n", "Dicts are what python calls \"hash tables\"\n", "\n", "- dicts associate keys with values, which can be of (almost) any type\n", "- dicts have length, but are not ordered\n", "- looking up values in dicts is very fast, even if the dict is BIG.\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbpresent": { "id": "812f1017-5af8-477d-9f6f-ded5379ca997" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}\n", "coins" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "nbpresent": { "id": "cbe769ce-ad16-4805-afe1-fd8aaba865c1" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coins['dime']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Basic Printing\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Simple\n" ] } ], "source": [ "print(\"Simple\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sqrt of 16 is 4.000000\n", "The sqrt of 16 is 4.0\n", "the sqrt of 16 is 4.000000\n" ] } ], "source": [ "import math\n", "x=16\n", "print(\"The sqrt of %i is %f\" % (x, math.sqrt(x)))\n", "print(\"The sqrt of {} is {}\".format(x, math.sqrt(x)))\n", "print(\"the sqrt of %(x)i is %(xx)f\" % {\"x\":x, \"xx\":math.sqrt(x)})" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": 
"138e7c96-d712-484e-8e82-5b21a548b0af" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Control Flow Statements: _if_\n", "\n", "- if statements allow you to do a test, and do something based on the result\n", "- _else_ is optional\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "nbpresent": { "id": "8b9c7e6e-3770-4bec-ab1d-4219b015f840" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "small 3\n", "another line\n", "after else\n" ] } ], "source": [ "import random\n", "v=random.randint(0,100)\n", "if v < 50:\n", " print (\"small\", v)\n", " print (\"another line\")\n", "else:\n", " print (\"big\", v) \n", "print (\"after else\")" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "9e1755ac-fc3a-470f-ad85-81383bcc5548" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Control Flow Statements: _while_\n", "\n", "- While statements execute one or more statements repeatedly until the\n", "test is false" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "nbpresent": { "id": "1209ed01-100c-4249-9790-7ceaf4b8d25f" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "ename": "SyntaxError", "evalue": "'return' outside function (, line 6)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m6\u001b[0m\n\u001b[0;31m return count\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m 'return' outside function\n" ] } ], "source": [ "import random\n", "count=0\n", "while count<100:\n", " count=count+random.randint(0,10)\n", " print (count)\n", " return count\n", " \n", "random.choice(count)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "15c9bfc0-08a7-43e4-bfcc-fcd6ce5810ad" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Control Flow Statements: _for_\n", "\n", "For statements take some sort 
of iterable object and loop once for\n", "every value." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "nbpresent": { "id": "3289621e-dc24-4f9a-a9ca-c4ed005fb5e7" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apple\n", "orange\n", "banana\n" ] } ], "source": [ "for fruit in ['apple', 'orange', 'banana']:\n", " print(fruit)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "nbpresent": { "id": "2e40b847-7de3-426b-93e2-66fa8507cb37" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "4\n", "5\n", "6\n" ] } ], "source": [ "for i in range(3,7):\n", " print(i)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "d4bddfab-61ba-4fb3-91d3-69bc8065c1f5" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Using ```for``` loops and ```dicts```\n", "If you loop over a dict, you'll get just keys. Use items() for keys and values." 
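, "\n", "\n", "A minimal sketch of the `items()` form mentioned here. The dict is repeated so the cell runs on its own, and the loop variable names are only illustrative:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# same dict as defined in the dictionaries section\n", "coins = {'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}\n", "# items() yields (key, value) pairs, which are unpacked in the loop header\n", "for denom, value in coins.items():\n", "    print(denom, value)" 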
] }, { "cell_type": "code", "execution_count": 39, "metadata": { "nbpresent": { "id": "0601b993-3329-4700-a4c1-36e64b485275" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "penny\n", "nickle\n", "dime\n", "quarter\n" ] } ], "source": [ "for denom in coins: \n", " print (denom)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "nbpresent": { "id": "fff88f2e-1c89-460f-8fbd-3bf309bea9b9" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "5\n", "10\n", "25\n" ] } ], "source": [ "for value in coins.values(): \n", " print (value)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "568d1355-af0f-4d7c-8a72-d8eb8e6a3480" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Control Flow Statements: altering loops\n", "While and For loops can skip steps (continue) or terminate early (break)." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "nbpresent": { "id": "aedb70f1-1599-42f5-9ea7-25177e656040" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "2\n", "4\n", "6\n", "8\n" ] } ], "source": [ "for i in range(10):\n", " if i%2 != 0: continue\n", " print (i)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "nbpresent": { "id": "1ad39692-d322-4b4b-8b41-d7705e3ff75c" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n" ] } ], "source": [ "for i in range(10):\n", " if i>5: break\n", " print (i)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "69d16429-24ce-4d40-885c-8847d95c18b0" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Read from standard input" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "test\n" ] }, { "data": { "text/plain": [ "'test'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inputstr = input();\n", "inputstr" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "69d16429-24ce-4d40-885c-8847d95c18b0" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Functions\n", "Functions allow you to write code once and use it many times.\n", "\n", "Functions also hide details so code is more understandable.\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "nbpresent": { "id": "52ea5585-d6b7-4f6d-b565-23d11b1c819a" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "60" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def area(w, h):\n", " return w*h\n", "\n", "area(6, 10) " ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "b1618f18-c6fc-4d71-b0dd-af66de188f3e" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Summary of basic elements of Python\n", "- 4 basic types: int, float, boolean, string\n", "- 3 complex types: list, dict, tuple\n", "- 4 control constructs: if, while, for, def\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "efc2efdd-8beb-49d3-8905-1f73fecb687d" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Example 1: File Reformatter\n", "\n", "Task: given a file of hundreds or thousands of lines:\n", "\n", "```\n", "FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...\n", "160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10\n", "...\n", "```\n", "\n", "Remove the last 3 letters from the 5th column:\n", "\n", "```\n", 
"FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...\n", "160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10\n", "...\n", "```\n", "\n", "In this example, we'll show:\n", "- reading lines of a file\n", "- parsing and modifying the lines\n", "- writing them back out\n", "- creating a script to do the above and running it\n", "- passing the script the file to modify" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "4b900509-1a2e-493a-b91d-6f7733b2eb5c" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 1: open the input file" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "nbpresent": { "id": "9df59d49-02d8-4108-b063-d58b0f01bbf5" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import sys\n", "fp=open('badfile.txt')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "nbpresent": { "id": "4fea7e01-5e99-486d-b238-6275ff3659ce" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "<_io.TextIOWrapper name='badfile.txt' mode='r' encoding='UTF-8'>" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fp" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "e67dde7c-a815-4650-a5db-c68087ea73a4" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Open takes a filename, and returns a ``file pointer''.\n", "\n", "We'll use that to read from the file." 
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "684785a7-608d-4eae-968d-75d8daebdaed" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 2: read the first header line, and print it out" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "nbpresent": { "id": "778bfab0-219a-4ebd-8d36-57f0e55d7641" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project\n" ] } ], "source": [ "import sys\n", "fp=open('badfile.txt')\n", "print (fp.readline().strip())\n", "fp.close()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "2d8dd505-f5bf-4b59-ba48-b4a2d6cd5f57" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "We call `readline()` on the file pointer to get a single line from the file (the header line).\n", "\n", "`strip()` removes the newline at the end of the line.\n", "\n", "Then we print it." 
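, "\n", "\n", "A small self-contained sketch of what `strip()` does, using a made-up header string rather than the real file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# hypothetical line, shaped like what readline() returns\n", "raw = 'FCID,Lane,Sample_ID\\n'\n", "print(repr(raw))          # the trailing newline is still attached\n", "print(repr(raw.strip()))  # strip() drops leading and trailing whitespace, newline included" 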
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "3f85fedf-db72-44af-b673-e3d1b66c67ad" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 3: for each remaining line in the file, read the line" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "nbpresent": { "id": "19f8b102-3960-4f9a-af07-efda4fd07d88" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project\n", "160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10\n" ] } ], "source": [ "import sys\n", "fp=open('badfile.txt')\n", "print (fp.readline().strip())\n", "for l in fp:\n", " print(l.strip())" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "cb0419a7-7c7d-4e8a-88c2-72bbbdcddc2f" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "A file pointer is an example of an iterator.\n", "\n", "Instead of explicitly calling readline() for each line, we can just loop on the file\n", "pointer, getting one line each time.\n", "\n", "Since we already read the header, we\n", "won't get that line." 
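, "\n", "\n", "The same behaviour can be tried without `badfile.txt`. The sketch below uses `io.StringIO` as an in-memory stand-in for an open file; the three-line content is made up for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import io\n", "\n", "# in-memory stand-in for a text file (assumption: made-up content)\n", "fake_file = io.StringIO('header\\nrow1\\nrow2\\n')\n", "print(fake_file.readline().strip())  # consumes the header line\n", "for line in fake_file:               # iteration resumes after the header\n", "    print(line.strip())" 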
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "18f07bd0-4807-400f-b98b-6da00a29a45d" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 4: find the value in the 5th column, and remove last 3 letters\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "nbpresent": { "id": "d0c99a58-e78f-40c6-a2f9-356ba462abb3" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project\n", "160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10\n", "160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10\n" ] } ], "source": [ "import sys\n", "fp=open('badfile.txt')\n", "print (fp.readline().strip())\n", "for l in fp:\n", "    flds=l.strip().split(',')\n", "    flds[4]=flds[4][:-3]\n", "    #print(flds)\n", "    print(\",\".join(flds))" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "798ae994-7b95-4d34-95db-fde9158ff3ba" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "As before, we strip the newline from the line.\n", "\n", "We split it into individual fields at the commas.\n", "\n", "The 5th field is `flds[4]`, since Python starts indexing at 0. The slice `[:-3]` takes every character of the string except the last 3.\n", "\n", "Finally, `\",\".join(flds)` glues the fields back together with commas, matching the input format." 
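, "\n", "\n", "The remaining items from the task list (writing the lines back out, and passing the file names to a script) could be sketched as follows. The function name, the output name `fixedfile.txt` and the script name `reformat.py` are hypothetical; this is one possible arrangement, not the original course script." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def trim_index_column(in_path, out_path):\n", "    # copy in_path to out_path, dropping the last 3 characters of the 5th column\n", "    with open(in_path) as src, open(out_path, 'w') as dst:\n", "        dst.write(src.readline())  # header line is copied unchanged\n", "        for line in src:\n", "            flds = line.strip().split(',')\n", "            flds[4] = flds[4][:-3]\n", "            dst.write(','.join(flds) + '\\n')\n", "\n", "# in a notebook:  trim_index_column('badfile.txt', 'fixedfile.txt')\n", "# in a script (hypothetical reformat.py), pass sys.argv[1] and sys.argv[2] instead:\n", "#   python reformat.py badfile.txt fixedfile.txt" 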
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "efc2efdd-8beb-49d3-8905-1f73fecb687d" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 1: Fibonacci Series\n", "\n", "In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones\n", "\n", "https://en.wikipedia.org/wiki/Fibonacci_number" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "1\n", "2\n", "3\n", "5\n", "8\n", "13\n", "21\n", "34\n", "55\n", "89\n", "144\n", "233\n", "377\n", "610\n", "987\n", "1597\n" ] } ], "source": [ "# This is the well-known Fibonacci series\n", "a, b = 0, 1\n", "while b<2000:\n", " print (b)\n", " a, b = b, a+b" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "efc2efdd-8beb-49d3-8905-1f73fecb687d" }, "slideshow": { "slide_type": "slide" } }, "source": [ "Task: write a function to generate a Fibonacci series for a given boundary (any number) and save the output into a list" ] }, { "cell_type": "raw", "metadata": { "nbpresent": { "id": "efc2efdd-8beb-49d3-8905-1f73fecb687d" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 2: Unique pair \n", "\n", "Input file\n", "1 2\n", "1 2\n", "3 4\n", "4 5 \n", "3 4 \n", "\n", "Output file \n", "1 2 3\n", "3 4 2 \n", "4 5 1\n", "\n", "1. Read the file /home/user/ost4sem/exercise/python_intro/pairs.dat \n", "2. Loop through the rows\n", "3. Split the string\n", "4. Count unique pairs\n", "5. 
Print unique pairs and their count" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "3923476c-75be-4220-b7f4-7881833e8e98" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Sources: this material was adapted from the following sources\n", "- https://github.com/ycrc/Python-Bootcamp\n", "- https://github.com/fpl/geotutorial_basic" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }