Course: Geocomputation and Machine Learning for environmental applications (intermediate level) – 2022

Our 8th year running!

Dates

On-line teaching: April and May 2022 (8 weeks)

  • Every Tuesday, starting on April 5th, 3:00 PM – 5:45 PM UTC (CET 4 PM, EST 10 AM, PST 7 AM)
  • Every Thursday, ending on May 26th, 3:00 PM – 5:45 PM UTC (CET 4 PM, EST 10 AM, PST 7 AM)

In-person: June (1 week) in Matera, Italy (to be confirmed depending on the pandemic situation).

  • June 13th – 17th (to be confirmed at the end of April)

For an idea of the subjects taught, please watch this video: https://youtu.be/1jcZAY-ZJmk

Info

Course programme

In this course, students will be introduced to an array of powerful open-source geocomputation tools and machine learning methodologies in a Linux environment. Students who have never programmed under Linux are expected to reach the stage where they feel confident using very advanced open source data processing routines. Students with a prior programming background will find the course beneficial for improving their modelling and coding proficiency. Our dual teaching aim is to equip attendees with powerful tools and to enable them to continue developing independently afterwards. The acquired skills will be beneficial not only for GIS-related applications, but also for general data processing and applied statistical computing in a number of fields. These skills lay the foundation for career development as a data scientist in the geographic domain.

Trainers:

Course requirements:

The course is aimed at master's or doctoral degree candidates, and at researchers and professionals with an interest in spatio-temporal data analysis and modelling. We also accept undergraduate students. Course participants should have basic computer skills and a strong desire to learn command line tools to process data. A basic knowledge of Python is required and can be acquired before the course by following a self-taught online course (e.g. https://geo-python-site.readthedocs.io/en/latest/ followed by https://autogis-site.readthedocs.io/en/latest/). In addition, we expect participants to have a specific interest in geographical data analysis; prior experience with Geographic Information Systems and basic knowledge of statistics will be helpful. Basic GIS concepts (such as raster and vector data, overlays and buffering) and basic statistical concepts (such as mean, standard deviation and residuals) will be assumed as given. Aside from the on-line lectures, we estimate another 10-14 hours per week of homework, material review, modelling implementation, coding and troubleshooting. Participants need to have their own laptops with a minimum of 8 GB of RAM and 40 GB of free disk space.
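As a quick self-check of the statistical concepts assumed above, the following minimal Python sketch computes a mean, a sample standard deviation and residuals using only the standard library. The temperature values are made-up illustration data, and residuals are taken here simply as deviations from the mean; none of this is taken from the course material.

```python
import statistics

# Hypothetical sample: air temperature (deg C) at five stations (illustration only).
temperatures = [12.0, 14.0, 15.0, 17.0, 22.0]

mean_temp = statistics.mean(temperatures)   # arithmetic mean
std_temp = statistics.stdev(temperatures)   # sample standard deviation (divides by n - 1)

# Residuals: observed value minus the value predicted by the simplest
# possible model, which here is the mean itself.
residuals = [t - mean_temp for t in temperatures]

print(f"mean = {mean_temp:.2f}")
print(f"standard deviation = {std_temp:.2f}")
print("residuals =", [round(r, 2) for r in residuals])
```

If each step of this snippet is easy to follow, the statistical prerequisites are likely covered.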

Academic program:

The proposed course intends to provide students with the opportunity to develop crucial skills required for advanced spatial data processing. Throughout the course students will focus on developing fundamental and independent-learning skills in advanced data processing, a field that is continuously evolving with the availability of increasingly complex data and ongoing technological advancement. A diverse set of complementary and sometimes overlapping tools will be presented to give an overview of the open source software available for spatial data processing. We demonstrate their strengths, weaknesses and key features for various data processing objectives (e.g. modelling, data filtering, queries, GIS analyses, graphics or reporting) and data types. Specifically, we guide students in using these tools and assist them along the steep learning curve of command-line programming. We focus our training on helping students develop independent learning skills and find online help, solutions and strategies, so that they can fix bugs and progress independently with complex data processing problems.

The Academic Programme is divided into three main on-line areas of study, complemented by an optional in-person week:

On-line lectures: (15 min to 1 hour each) Students take part in a series of lectures introducing the basic functioning of tools, theoretical aspects or background information needed for a better understanding of concepts that are subsequently applied in data processing.

On-line tutorials: Students are guided during hands-on sessions where trainers perform data analyses on real case study datasets, so that students can replicate the procedures on their own laptops.

On-line exercises: In addition to tutorials and lectures, students are encouraged to work on their own projects of interest during exercise sessions. Specific tasks are set to help reinforce the newly learned data processing skills. These exercise sessions equip students with the confidence and resources to become independent learners and to effectively address the demands of advanced spatial data processing. Exercises are designed to enhance participants’ programming skills and their understanding of mathematical modelling within the context of GIS and Remote Sensing. The exercises and examples provided are cross-disciplinary in nature. They may cover forestry, landscape planning, predictive modelling and species distribution, mapping, nature conservation, computational social science and other spatially related fields of study. Furthermore, these case studies can be viewed as template procedures and easily adapted to different thematic challenges across disciplines.

In-person week: This session will be offered in Matera, Italy, if at least 10 people are enrolled and subject to Covid-related travel rules. The in-person week aims to foster group collaboration and troubleshooting under the direct supervision of the trainers. During this week, there will be brief talks from the trainers and the students; the remaining time will be dedicated to student project development and assistance. The in-person week is not compulsory but strongly encouraged. Matera is also a very beautiful town for sightseeing. Students will be responsible for purchasing flights and accommodation themselves.

Learning objectives

This course will enable students to further develop their spatio-temporal data processing skills. Most importantly, it will give them proficiency in a fully functional open source operating system with all the requisite software tools. With continuous practice through the weeks, students will become familiar with the command line and cover numerous topics, including:

  • Learning a large suite of existing tools and knowing which ones to employ for project-specific applications.
  • Acquiring confidence in using several command line utilities for spatial data processing under the Linux operating system.
  • Developing data processing skills; and understanding data types, data modelling and data processing techniques.
  • Independent learning, critical thinking and efficient data processing.

Note-taking and Organization: Our previous experience suggests that taking notes and keeping electronic data well organised help students get more out of the class. Your curiosity in searching for new methods, commands and scripting procedures will be fundamental to your overall success.

PC and data storage: You need to have a laptop with 40 GB of free disk space and 8 GB of RAM.

Class materials: All class materials will be available on the www.spatial-ecology.net site, in particular on the documentation pages at https://spatial-ecology.net/docs/build/html/index.html.

Course requirements: Course participants are expected to have intermediate skills in GIS/RS, statistics and Python, and a strong desire to learn GIS using open source tools. We assume participants are interested in geographical data analysis and have prior knowledge of basic calculus and statistics.

Course certification

At the end of the course, attendees will receive a course certificate, subject to successful completion of the course, which requires 100% attendance, homework delivery and a final project presentation (in person in Matera, or online). For university students, course credit approval will be at the discretion of the university concerned.

Timetable (UTC):

  • 3.00 – 3.20 PM Recap and clarification (free-style student supervision and homework solutions)
  • 3.20 – 4.20 PM Lecture
  • 4.20 – 4.35 PM break
  • 4.35 – 5.45 PM Lecture

Summary syllabus

  • OSGeo-live operating system / Linux bash programming
  • AWK – Gnuplot
  • GDAL/OGR geospatial libraries, PKTOOLS, GRASS
  • Basic statistical modelling, residuals analysis, error indices
  • Python for Machine Learning regression/classification in a supervised framework
  • Student project presentations
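
To give a flavour of how the statistical modelling and supervised learning topics in the syllabus fit together, here is a minimal sketch in plain Python: it fits a straight line by ordinary least squares on a training subset, then computes residuals and two common error indices (RMSE and MAE) on held-out samples. The data, the train/test split and the helper names `fit_line` and `error_indices` are hypothetical illustrations, not the course's own material.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def error_indices(observed, predicted):
    """Return residuals, root mean square error and mean absolute error."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return residuals, rmse, mae


# Hypothetical data: elevation (m) and mean annual temperature (deg C).
elevation = [100, 300, 500, 700, 900, 1100, 1300, 1500]
temperature = [14.6, 13.1, 12.2, 10.7, 9.9, 8.3, 7.4, 5.8]

# Supervised framework: fit on training samples, evaluate on held-out ones.
train_idx = [0, 2, 3, 5, 7]
test_idx = [1, 4, 6]
x_train = [elevation[i] for i in train_idx]
y_train = [temperature[i] for i in train_idx]
x_test = [elevation[i] for i in test_idx]
y_test = [temperature[i] for i in test_idx]

intercept, slope = fit_line(x_train, y_train)
predicted = [intercept + slope * x for x in x_test]
residuals, rmse, mae = error_indices(y_test, predicted)

print(f"model: temperature = {intercept:.2f} + {slope:.4f} * elevation")
print("test residuals:", [round(r, 2) for r in residuals])
print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

Evaluating the error indices on samples that were not used for fitting is the key design choice here: it gives an estimate of how the model behaves on unseen data rather than on the data it was tuned to.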

Preliminary course programme

We will finalise the course programme in the coming months. In the meantime, the preliminary course programme gives a good overview of the subjects.