Course: Geocomputation & Machine Learning for Environmental Applications (intermediate level) – 2024

Our 10th year running!

WE ARE REACHING OUR ENROLMENT LIMIT FOR THE COURSE. PLEASE CONTACT US BEFORE PROCEEDING WITH REGISTRATION.

Dates

On-line teaching: April & May 2024 (8 weeks)

  • Lectures begin 02 April and run every Tuesday and Thursday, 3:00 – 5:45 pm UTC (CEST 5 pm, EDT 11 am, PDT 8 am)
  • Catch-up session: Tuesday 30 April at 3:00 – 5:45 pm UTC (CEST 5 pm, EDT 11 am, PDT 8 am)
  • Catch-up session: Thursday 02 May at 3:00 – 5:45 pm UTC (CEST 5 pm, EDT 11 am, PDT 8 am)
  • Lectures resume 07 May and run every Tuesday, Wednesday and Thursday, 11:00 am – 1:45 pm UTC (CEST 1 pm, EDT 7 am, PDT 4 am)
  • Lectures end 30 May at 11:00 am – 1:45 pm UTC (CEST 1 pm, EDT 7 am, PDT 4 am)
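
The local start times quoted above can be double-checked with a short Python sketch. It assumes fixed daylight-saving offsets for April and May 2024 (CEST = UTC+2, EDT = UTC-4, PDT = UTC-7) rather than a full time zone database, and the helper name `local_start_times` is purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight-saving offsets, assumed valid for April-May 2024
# (hard-coded here instead of being looked up in a time zone database).
OFFSETS = {
    "CEST": timezone(timedelta(hours=2), "CEST"),
    "EDT": timezone(timedelta(hours=-4), "EDT"),
    "PDT": timezone(timedelta(hours=-7), "PDT"),
}


def local_start_times(start_utc):
    """Return the session start as an 'HH:MM' string for each listed zone."""
    return {
        name: start_utc.astimezone(tz).strftime("%H:%M")
        for name, tz in OFFSETS.items()
    }


# First block: sessions start at 3:00 pm UTC from 02 April 2024.
april = local_start_times(datetime(2024, 4, 2, 15, 0, tzinfo=timezone.utc))
# Second block: sessions start at 11:00 am UTC from 07 May 2024.
may = local_start_times(datetime(2024, 5, 7, 11, 0, tzinfo=timezone.utc))

print(april)
print(may)
```

Participants in other time zones can add their own offset to the hypothetical `OFFSETS` dictionary to obtain their local start time.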

5-day + 5-day in-person workshop: Matera, Italy

  • In-person week: June 10-14, coding hackathon in Matera, Italy (strongly suggested).
  • In-person week: June 17-21, extended study group in Matera, Italy. Self-organising working group – no class (voluntary).

For reviews of previous editions of the course, please click here for 2022 and here for 2023.

Info

Course programme

On this course, students are introduced to an array of powerful open-source geocomputation tools and machine learning methodologies under the Linux environment. Those with no exposure to programming under Linux are able to gain skills and confidence in using advanced open-source data processing routines. Meanwhile, those with a programming background benefit by enhancing their modelling and coding skills. Our aim is to equip attendees with powerful tools and endow them with the ability to continue independent development. The skills acquired on this course will be beneficial not only for GIS applications but also for general data processing and applied statistical computing in various fields. These skills lay the foundation for career development as a geographic data scientist.

Instructors:

Course requirements:

The course is aimed at masters or doctoral degree candidates, and researchers and professionals with an interest in spatio-temporal data analysis and modelling. We also accept undergraduate students. Course attendees should be motivated by a desire to learn command line tools to process data.

A basic knowledge of Python is expected and can be acquired before joining the course through self-taught online tutorials (e.g. https://geo-python-site.readthedocs.io/en/latest/ followed by https://autogis-site.readthedocs.io/en/latest/). Additionally, we expect participants to have a specific interest in geographical data analyses. Therefore, experience in the use of Geographic Information Systems and basic statistical knowledge are essential. Basic concepts of GIS, such as rasters/vectors, overlays, and buffering, as well as fundamentals of statistics, such as mean, standard deviation, and residuals, are prerequisites for this course. This year, we are offering a dedicated GRASS module* free of charge. A working knowledge of Bash and GDAL is a prerequisite for this module.

* The GRASS module is an initiative in the framework of NSF-funded POSE project TI-2303651: Growing GRASS OSE for Worldwide Access to Multidisciplinary Geospatial Analytics.
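
As a rough self-check of the Python and statistics prerequisites listed above (mean, standard deviation, residuals), the following minimal sketch uses only the Python standard library. The elevation and temperature values are hypothetical, invented purely for illustration, and the straight-line fit is ordinary least squares written out by hand. Applicants who can follow each line are likely to be comfortable with the expected starting level.

```python
from statistics import mean, stdev

# Hypothetical sample (illustration only): elevation in metres and
# mean annual temperature in degrees Celsius at five sites.
elevation = [100.0, 300.0, 500.0, 700.0, 900.0]
temperature = [14.1, 12.8, 11.9, 10.4, 9.3]

# Descriptive statistics: mean and sample standard deviation.
x_bar, y_bar = mean(elevation), mean(temperature)
temp_sd = stdev(temperature)

# Ordinary least-squares fit of temperature = intercept + slope * elevation.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(elevation, temperature))
sxx = sum((x - x_bar) ** 2 for x in elevation)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals: observed value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(elevation, temperature)]

print(f"mean temperature: {y_bar:.2f}, standard deviation: {temp_sd:.2f}")
print(f"slope: {slope:.4f} degrees per metre, intercept: {intercept:.2f}")
print("residuals:", [round(r, 3) for r in residuals])
```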

Overall, 10-15 hours per week are required outside class time for homework, material review, model implementation, coding, and troubleshooting.

Completing the “Geocomputation & Machine Learning for environmental applications” course (attending and presenting the final project) allows participants to enrol directly in the GEO-OPEN-HACK-2024: Big Geospatial Data Hackathon with Open Infrastructure and Tools (advanced level). The hackathon registration must still be filled in.

Academic programme:

The course provides students with the opportunity to develop crucial skills required for advanced spatial data processing. Throughout the course, students will focus on developing fundamental and independent-learning skills in advanced data processing – a field that is continuously evolving with the availability of increasingly complex data and ongoing technological advancement. A diverse set of complementary and sometimes overlapping tools will be presented in an overview of open-source software available for spatial data processing. We demonstrate their strengths, weaknesses and key features for various data processing objectives (e.g. modelling, data filtering, queries, GIS analyses, graphics or reporting) and data types. Specifically, we guide students in using these tools and software, and assist them along the steep learning curve of command-line programming. We focus our training on helping students to find online help, solutions and strategies, so that they can fix bugs and progress independently with complex data processing problems.

The academic programme combines several teaching formats, followed by an in-person workshop:

Online live lectures: (15 min to 1 hour each) Students take part in a series of live lectures introducing the basic functioning of tools, theoretical aspects or background information needed for a better understanding of concepts that are subsequently applied in data processing.

Pre-recorded flipped classroom: (15 min to 1 hour each) In a flipped classroom, students watch pre-recorded lectures online and then meet the teachers in online discussions, where doubts are addressed, live questions are answered, and students receive guidance and feedback from the teacher, thus creating meaningful learning opportunities.

Online tutorials: Students are guided during hands-on sessions where instructors perform data analyses on real case study datasets, allowing the former to replicate the procedures on their own laptops.

Online exercises: In addition to tutorials and lectures, students are encouraged to embark on their respective projects of interest during exercise sessions. Specific tasks are set to help reinforce the newly learned data processing skills. These exercises equip students with the confidence and resources to become independent learners, and to effectively address the demands of advanced spatial data processing. The cross-disciplinary exercises are designed to enhance programming skills and the understanding of mathematical modelling within the context of GIS and Remote Sensing. They may cover forestry, landscape planning, predictive modelling and species distribution, mapping, nature conservation, computational social science and other spatially related fields of study. Furthermore, these case studies can be considered template procedures and are easily adapted to multi-disciplinary challenges.

Home assignments: We assign homework, which is discussed in the following live online seminar. This allows everybody to benefit from question and answer sessions. Assignments are of two different kinds: suggested assignments and compulsory assignments. The suggested assignments encourage heuristics whereas the compulsory assignments address a specific coding challenge, and are designed for problem solving and critical thinking.

Recorded sessions: All lectures are recorded and made available for asynchronous viewing, which is particularly helpful to students working across time zones.

In-person workshop: This session is offered in Matera, Italy if at least 10 people are enrolled. To join this workshop, a deposit of EUR 100 (GBP 88 / USD 108) is required at the point of registration. The deposit is refunded in full when attendees arrive in Matera. The in-person week involves group collaboration and troubleshooting under the direct supervision of the instructors. The schedule is dedicated to project development and assistance in an interactive format between students and instructors. Attendance at this workshop is not compulsory but strongly encouraged. Matera is a beautiful town and also offers a wonderful sightseeing opportunity. Flights, accommodation and other travel costs are the responsibility of the students.

Course administration and learning objectives:

This course enables students to further develop and enhance their spatio-temporal data processing skills. Most importantly, it endows them with proficiency in a fully-functional open source operating system with all the requisite software tools. With continuous practice through the weeks, students become familiar with command lines and cover numerous topics, including:

  • Learning a large suite of existing tools and knowing which ones to employ for project-specific applications.
  • Acquiring confidence in using several command line utilities for spatial data processing under the Linux operating system.
  • Developing data processing skills; and understanding data types, data modelling and data processing techniques.
  • Independent learning, critical thinking and efficient data processing.

Course requirements: Participants are expected to have intermediate skills in GIS/RS and Python, and a strong desire to learn GIS using open-source tools. We assume that participants have an interest in geographical data analyses as well as a basic, working knowledge of calculus and statistics.

Note-taking and organisation:  We urge you to take notes extensively and regularly organise electronic data in order to gain the most from the classes. Curiosity in formulating new research methods, commands and scripting procedures will be fundamental to your overall success.

PC and data storage: A laptop with at least 40 GB of free disk space and 8 GB of RAM is required.

Class materials: All class material will be presented on www.spatial-ecology.net, under https://spatial-ecology.net/docs/build/html/index.html.

Course certification

At the end of the course, attendees will receive a course certificate, subject to successful completion of the course, which requires 100% attendance, homework delivery, and a final project presentation (either in person at Matera or online). For university students, course credit approval will be at the discretion of the university concerned.

Time table:

  • 3:00 – 3:20 pm materials recap and discussion (Q&A and homework solutions)
  • 3:20 – 4:20 pm lecture
  • 4:20 – 4:35 pm break
  • 4:35 – 5:45 pm lecture

Syllabus summary

  • OSGeo-live operating system / Linux Bash programming
  • AWK – Gnuplot
  • GDAL/OGR geospatial libraries, PKTOOLS
  • GRASS – offered in 2024 as a special stand-alone module, free of charge
  • Basic statistical modeling, residuals analysis, error indices
  • Python for Machine Learning regression/classification in a supervised framework
  • Student project presentations

Preliminary course programme 

The finalised course programme will be released in the coming months. Nonetheless, the preliminary course programme offers a helpful overview of the subject matter.