# Aggregate daily time series into monthly
"t.rast.aggregate",
gs.run_command(input="lst_daily",
="lst_monthly",
output="lst_monthly",
basename="average",
method="1 months",
granularity="gran") suffix
Time series gap filling
In this fifth time series tutorials, we will go through an important topic when working with optical remote sensing derived data or products: gaps and gap filling. There are several methods to perform missing data imputation. Here, we’ll only demonstrate the usage of GRASS tools that allow us to perform gap filling in time, also called temporal interpolation. Specifically, we’ll show how to reconstruct missing data using:
- t.rast.gapfill,
- r.hants, and
- r.series.lwr.
This tutorial can be run locally or in Google Colab. However, make sure you install GRASS 8.4+, download the LST sample data and set up your project as explained in the first time series tutorial.
There are different types of gaps that we might want/need to fill when working with time series data: - full maps missing, e.g., a daily time series where some days are missing because product tiles are missing from the archive, or when we want to interpolate from weekly to daily data - (some) maps in the time series have missing data/gaps because of clouds, snow, product quality flags applied, etc.
Full maps missing
For the case of full maps missing, GRASS offers simple linear interpolation in time through the t.rast.gapfill tool. Let’s see an example: we will first aggregate our daily LST time series with a monthly granularity, then make a copy of it using t.copy, unregister a couple of maps here and there with t.unregister, apply a temporal linear interpolation with t.rast.gapfill and compare the results.
# Create a copy
"t.copy",
gs.run_command(input="lst_monthly",
="lst_monthly_copy")
output
# Unregister maps from lst_monthly_copy
# Note that we remove 1, 2 and 3 consecutive maps in different periods of the year
= "lst_monthly_2014_03,lst_monthly_2014_10,lst_monthly_2014_11,lst_monthly_2015_06,lst_monthly_2015_07,lst_monthly_2015_08"
to_unregister
"t.unregister",
gs.run_command(input="lst_monthly_copy",
=to_unregister)
maps
# Check gaps
import pandas as pd
"t.rast.list",
pd.DataFrame(gs.parse_command(input="lst_monthly_copy",
="deltagaps",
methodformat="csv")))
# Fill gaps
"t.rast.gapfill",
gs.run_command(input="lst_monthly_copy",
="gaps")
basename
# Check gaps again and compare values with lst_monthly
= gs.parse_command("t.rast.list",
filled input="lst_monthly_copy",
="name,min,max",
columns="start_time < '2016-01-01'",
whereformat="json")
= gs.parse_command("t.rast.list",
orig input="lst_monthly",
="name,min,max",
columns="start_time < '2016-01-01'",
whereformat="json")
"data"]),
pd.concat([pd.DataFrame(orig["data"]).add_suffix("_filled")], axis=1) pd.DataFrame(filled[
# Plot the two time series for the city of Trento
!g.gui.tplot strds=lst_monthly,lst_monthly_copy \
=4410837,2559852 \
coordinates="Trento daily LST" xlabel="Time" ylabel="LST (C)" \
title=800,500 output=trento_gapfilled.png size
t.unregister allows us to remove maps from the temporal database or from a STRDS without actually removing them from the mapset, i.e., we only remove their timestamp and hence their record in the STRDS object.
Maps with holes
For the case when maps have no data areas (i.e. holes or gaps) because of cloud or cloud shadow masking, or as a result of QA flags application, two GRASS tools can be used, namely r.hants or r.series.lwr. When to use one or the other will mostly depend on the variable that the time series represent (i.e., temperature, NDVI, chlorophyll, etc.) and its granularity (i.e., hourly, daily, weekly, monthly, etc.).
Simulating the holes
To demonstrate the use of r.hants and r.series.lwr with our LST time series, we will simulate the occurrence of clouds by randomly masking different provinces in our study area. This will be a very simplified example for demonstration purposes only. A proper example of how to mask clouds and apply quality assessment flags will be developed in a different series of tutorials.
We clip the provinces vector map with the computational region in order to get the list of polygons’ ids or cat values, that we’ll use as “clouds”.
# Clip Italy provinces to the computational region
"v.clip",
gs.run_command(input="italy_borders_2",
="italy_borders_2_clip",
output="r")
flags
# Get unique categories
= gs.parse_command("v.category",
cats input="italy_borders_2_clip",
="print")
option
= list(cats.keys()) cats
Then we list the maps over which we will create the gaps.
# Get list of monthly maps
= gs.parse_command("t.rast.list",
maps input="lst_monthly",
="name",
columns="comma",
method="u")
flags
= list(maps.keys()) maps
Finally, we actually use 4 random polygons to create holes in each map of the monthly LST time series. Basically, for each map of the series, we apply an inverse mask of 4 random polygons, we overwrite the maps to actually get the holes (i.e., MASK is otherwise virtual), and we remove the mask.
import random
for i in range(len(maps)):
"r.mask",
gs.run_command(="italy_borders_2_clip",
vector=random.sample(cats, 4),
cats="i")
flags=f"{maps[i]} = {maps[i]}")
gs.mapcalc(exp
"r.mask", flags="r") gs.run_command(
Let’s check the holes we created:
= gj.Map()
gaps map="lst_monthly_2017_01")
gaps.d_rast( gaps.show()
Filling the holes
Harmonic analysis of time series (HANTS)
r.hants performs a Harmonic ANalysis of Time Series (HANTS) in order to estimate missing values and identify outliers. This algorithm considers only the most significant frequencies expected to be present in the time profiles (e.g. determined from a preceding Fast Fourier Transform analysis), and applies a least squares curve fitting procedure based on harmonic components (sines and cosines).
The option nf
, number of frequencies, should be carefully chosen. As a rule of thumb, nf should be at least the estimated periodicity plus 3, e.g. for NDVI with an annual cycle (one peak per year), the number of frequencies should be at least 4 when analyzing one year. The number of frequencies should not be too large, either. Otherwise, outliers can no longer be identified because of overfitting. Moreover, the number of frequencies should be smaller than n input maps / 2 if missing values should be reconstructed.
# Install extension
"g.extension", extension="r.hants") gs.run_command(
# Basic usage of r.hants
"r.hants",
gs.run_command(input=maps,
=4,
nf=12) base_period
Other r.hants options and flags to adjust the fit can be found at the tool manual page: https://grass.osgeo.org/grass-stable/manuals/addons/r.hants.html and also within the original publication.
# List filled maps
= gs.list_grouped(type="raster",
hants_maps ="*hants")["italy_LST_daily"] pattern
# Create new time series
"t.create",
gs.run_command(="lst_monthly_hants",
outputtype="strds",
="absolute",
temporaltype="Gap-filled monthly LST",
title="HANTS gap-filled monthly LST - North Italy, 2014-2018") description
# register maps
"t.register",
gs.run_command(="i",
flagsinput="lst_monthly_hants",
type="raster",
=hants_maps,
maps="2014-01-01",
start="1 months") increment
# Print time series info
"t.info", input="lst_monthly_hants") gs.run_command(
Let’s see a plot.
!g.gui.tplot strds=lst_monthly_hants,lst_monthly \
=4410837,2559852 \
coordinates="Trento reconstructed LST" xlabel="Time" ylabel="LST (C)" \
title=800,500 output=trento_hants.png size
It is important to highlight that r.hants will fit a single model to the whole input time series. For multiple years, that might not be the best option because interannual variability will not be accounted for. So, for the case of multiple years, it is recommended to run years separately and then merge the results.
Note that r.hants will reconstruct all cells within the input maps, whether they have gaps or not. If we want to keep the original values, we could patch the original series with the reconstructed result. That could be done as follows:
# Patching
for i, j in zip(maps, hants_maps):
print(i, j)
=f"{j}_patch"
out"r.patch",
gs.run_command(input=[i, j],
=out) output
Local weighted regression
r.series.lwr performs a local weighted regression (LWR) in time in order to estimate missing values and identify outliers. For each observation in the time series, the neighbor values in time are used to estimate a polynomial function that best fits the observations. The values are weighted according to their distance in time to the current observation. Values that are farther away get lower weights. The difference among the weight functions lies in how strongly the current observation is emphasized with respect to its temporal neighbors.
The option order
determines the order of the polynomial function used to fit the observations. An order of 0 is a weighted average, an order of 1 is a linear regression. Recommended is order=2
.
All gaps in the time series are by default interpolated, as long as the time series contains sufficient non-NULL observations. Optionally, the maximum size of gaps to be interpolated can be set with the maxgap
option.
The module uses an adaptive bandwidth to fit the polynomial and searches for: order + 1 + dod valid values around the current observation. The degree of over-determination (dod
) is the user defined number of extra temporal neighbors that should be considered for the estimation of the value at each time step.
Just for comparison purposes, we’ll use the lst_monthly
time series. However, r.series.lwr is known to be more effective when there’s no such a clear cyclic pattern or in smaller granularities, like daily or weekly, when data shows more variation.
# Install extension
"g.extension", extension="r.series.lwr") gs.run_command(
# Run r.series.lwr
"r.series.lwr",
gs.run_command(input=maps,
="_lwr",
suffix=2,
order="tricube") weight
Other r.series.lwr options and flags to adjust the fit can be found at the tool manual page: https://grass.osgeo.org/grass-stable/manuals/addons/r.series.lwr.html.
Let’s create a time series and plot the results.
# List filled maps
= gs.list_grouped(type="raster",
lwr_maps ="*lwr")["italy_LST_daily"] pattern
# Create new time series
"t.create",
gs.run_command(="lst_monthly_lwr",
outputtype="strds",
="absolute",
temporaltype="Gap-filled monthly LST",
title="LWR gap-filled monthly LST - North Italy, 2014-2018") description
# register maps
"t.register",
gs.run_command(="i",
flagsinput="lst_monthly_lwr",
type="raster",
=lwr_maps,
maps="2014-01-01",
start="1 months") increment
# Print time series info
"t.info", input="lst_monthly_hants") gs.run_command(
!g.gui.tplot strds=lst_monthly_lwr,lst_monthly \
=4410837.455830389,2559852.473498233 \
coordinates="Trento reconstructed LST" xlabel="Time" ylabel="LST (C)" \
title=800,500 output=trento_lwr.png size
Note that there’s an overshoot towards negative values because of missing data in the first date. Extrapolation can be avoided by using the i
flag for interpolation only.
Comparison of HANTS and LWR results
There are no major differences among the results of these two reconstructing methods, probably because it is a smoothed time series. But have a look at a comparison of applying HANTS and LWR to a Chlorophyll-a time series with 8-day granularity:
More details can be found in the Filling and Reconstructing time series of the temporal data processing wiki page.
References
- Roerink, G., Menenti, M., Verhoef, W. 2000. Reconstructing cloudfree NDVI composites using Fourier analysis of time series. International Journal of Remote Sensing, 21 (9), 1911-1917. DOI.
- Gebbert, S., Pebesma, E. 2014. TGRASS: A temporal GIS for field based environmental modeling. Environmental Modelling & Software 53, 1-12. DOI.
- Gebbert, S., Pebesma, E. 2017. The GRASS GIS temporal framework. International Journal of Geographical Information Science 31, 1273-1292. DOI.
- Temporal data processing wiki page.
The development of this tutorial was funded by the US National Science Foundation (NSF), award 2303651.