aelab

Introduction

The goal of achieving net zero emissions by 2050 is a significant driver of research focused on understanding greenhouse gas (GHG) flux dynamics and the autotrophic production of ecosystems. This ambitious target highlights the urgent need to explore how different ecosystems contribute to GHG emissions and how they can be managed to reduce these emissions effectively. To support this research, this R package offers a variety of useful tools designed to facilitate the analysis of data related to GHG flux and production in aquatic environments.

GHG flux calculation

Various techniques are used to measure greenhouse gas (GHG) flux, including eddy covariance (EC), the boundary layer method (BLM), and the static chamber method. This study provides R functions specifically designed to analyze data from GHG flux measurements using the static chamber method with a portable gas analyzer.

Load and process the raw data file

First, we need to obtain necessary parameters from the raw data downloaded from the portable gas analyzer. We will remove unnecessary rows and columns and extract the GHG concentrations along with the corresponding date and time recorded by the analyzer using tidy_ghg_analyzer(). Additionally, this function will also eliminate any NaN values in the GHG data. Currently, this function supports raw data downloaded from:
(i) LI-COR Trace Gas Analyzer (LI-7810 and LI-7820). (ii) ABB LGR-ICOS Gas Analyzer (M-GGA-918).

# The provided file is a raw data file downloaded from 
# the LI-COR Trace Gas Analyzer
ghg_data_path <- system.file("extdata", "ch4.xlsx", package = "aelab", mustWork = T)
ch4 <- tidy_ghg_analyzer(ghg_data_path, "ch4")
ch4[c(1:3), ]
##         DATE     TIME      CO2      CH4           date_time
## 1 2023/03/11 07:31:59 799.9406 2999.952 2023-03-11 07:31:59
## 2 2023/03/11 07:32:00 770.1415 2995.596 2023-03-11 07:32:00
## 3 2023/03/11 07:32:01 771.6826 2993.581 2023-03-11 07:32:01

Sometimes, the date and time recorded by the analyzer do not match the actual date and time we recorded in situ. Therefore, we need to convert the date_time column from the previous step to align with the real-life time, if there are any discrepancies:

# The analyzer's time was assumed to be 
# 15 minutes and 30 seconds faster than real time
ch4 <- convert_time(ch4, min = -15, sec = 30)
ch4[c(1:3), c(5:6)]
##             date_time       real_datetime
## 1 2023-03-11 07:31:59 2023-03-11 07:17:29
## 2 2023-03-11 07:32:00 2023-03-11 07:17:30
## 3 2023-03-11 07:32:01 2023-03-11 07:17:31

Calculate the slope

We first need to obtain the slope from the linear regression of GHG concentration changes over time within the chamber to calculate the flux. This can be done by defining the measurement time interval, either from a reference file (see Method 1 below) or directly in R (see Method 2 below).

Method 1: Load from excel

In Excel, enter the date and time when the GHG flux measurement started, then load the file into R. Make sure the column containing the date and time information is formatted in ISO 8601 as YYYY-MM-DD hh:mm:ss.

ref_data_path <- system.file("extdata", "reference.xlsx", package = "aelab", mustWork = T)
ref <- read_excel(ref_data_path)
ref
## # A tibble: 3 × 3
##   real_date           real_time           date_time          
##   <dttm>              <dttm>              <dttm>             
## 1 2023-03-11 00:00:00 1899-12-31 07:32:00 2023-03-11 07:32:00
## 2 2023-03-11 00:00:00 1899-12-31 08:32:00 2023-03-11 08:32:00
## 3 2023-03-11 00:00:00 1899-12-31 09:32:00 2023-03-11 09:32:00

Now, we can perform multiple simple linear regressions on the GHG concentrations, separated by the date_time value in the ref data, using calculate_regression(). The measurement duration, referred to as duration_minutes, is set to a default of 7 minutes, and the number of rows used for the regression, also denoted as duration_minutes, defaults to 300 rows (which is equivalent to 5 minutes). These values can be modified if desired.

When the measurement duration (specified by duration_minutes) exceeds the data used for regression (specified by num_rows), this function automatically selects the rows with the highest R\(^2\) within the defined interval.

In the output tibble, start_time and end_time indicate the time range of the data used for the regression. slope represents the slope, and r_square denotes the R\(^2\) value of the regression. reference_time is the start time of the measurement from the input.

calculate_regression(ch4, ghg = "CH4", reference_time = ref$date_time,
                     duration_minutes = 7, num_rows = 300)
## # A tibble: 3 × 5
##   start_time          end_time              slope r_square reference_time     
##   <chr>               <chr>                 <dbl>    <dbl> <dttm>             
## 1 2023/03/11 07:33:42 2023/03/11 07:38:41 -0.0406    0.671 2023-03-11 07:32:00
## 2 2023/03/11 08:32:00 2023/03/11 08:36:59 -0.207     0.671 2023-03-11 08:32:00
## 3 2023/03/11 09:34:01 2023/03/11 09:39:00  0.104     0.444 2023-03-11 09:32:00

Method 2: Type directly in R

The start time of measurement can also be input directly into the function. Please note that the reference_time must be in POSIXct format.

calculate_regression(ch4, ghg = "CH4", reference_time = as.POSIXct("2023-03-11 07:32:00", tz = "UTC"))
## # A tibble: 1 × 5
##   start_time          end_time              slope r_square reference_time     
##   <chr>               <chr>                 <dbl>    <dbl> <dttm>             
## 1 2023/03/11 07:33:42 2023/03/11 07:38:41 -0.0406    0.671 2023-03-11 07:32:00

Calculate the flux

The equation we used to calculate the GHG flux was derived from Yong et al. (2024):

\[ F = \frac{(S \times V \times c)}{R \times T \times A} \]

where S is the slope obtained from the linear regression of GHG concentration changes over time (ppm s\(^{-1}\)), V is the chamber volume (liters), c is the conversion factor from seconds to hours, R is the ideal gas constant (0.082 L atm K\(^{−1}\) mol\(^{−1}\)), T is the temperature inside the chamber (kelvin), and A is the surface area of the chamber (m\(^2\)).

Therefore, the function calculate_ghg_flux() requires a dataframe with columns for the slope, area, volume, and temperature. Note that the units of the parameters must match those described above, resulting in a flux with units of mmol m\(^{-2}\) d\(^{-1}\).

results_ch4 <- calculate_regression(ch4, ghg = "CH4", reference_time = as.POSIXct("2023-03-11 07:32:00", tz = "UTC"))
flux_ch4 <- data.frame(
    slope = results_ch4$slope,
    area = 1, # in square meter
    volume = 1, # in litre
    temp = 1) # in celcius
calculate_ghg_flux(flux_ch4)
##                                       slope area volume temp         flux
## seq_along(selected_data[[ghg]]) -0.04058922    1      1    1 -0.006499966
##                                         unit
## seq_along(selected_data[[ghg]]) mmol m-2 d-1

NEM calculation

The change in oxygen concentration in the water reflects the balance between photosynthetic production, respiratory consumption, and physical exchange at the water-atmosphere interface. By measuring dissolved oxygen (DO) concentrations at high frequency with in situ data loggers, we can calculate gross primary production (GPP), ecosystem respiration (ER), and net ecosystem metabolism (NEM), as detailed by Staehr et al. (2010).

Load and process the raw data file

First, we need to obtain the necessary parameters from the raw data downloaded from the data loggers. We will remove unnecessary rows and columns and extract the dissolved oxygen concentrations and water temperature, along with the corresponding date and time recorded by the logger using process_hobo(). This function will also eliminate any NA values in the dataframe. The no_hobo input was necessary for further analysis. Currently, it supports raw data downloaded from the Onset HOBO Dissolved Oxygen Data Logger (U26-001).

hobo_data_path <- system.file("extdata", "ex_hobo.csv", package = "aelab")
do <- process_hobo(hobo_data_path, no_hobo = "code_for_logger")
do[c(1:3), ]
##             date_time   do  temp         no_hobo
## 1 2024-04-08 12:00:00 7.69 25.86 code_for_logger
## 2 2024-04-08 12:30:00 7.67 26.02 code_for_logger
## 3 2024-04-08 13:00:00 7.65 26.20 code_for_logger

The air pressure and wind speed at the sampling site should be obtained either manually or from a nearby weather station. The process_weather function tidies the CSV file downloaded from Taiwan’s weather station. In addition to extracting the columns with air pressure and wind speed values, it also duplicates the hourly data to match the half-hourly recorded frequency of the DO concentrations.

weather_data_path <- system.file("extdata", "ex_weather.csv", package = "aelab")
weather <- process_weather(weather_data_path, date = "2024-04-10", zone = "zone_A")
weather[c(1:5), ]
##   pressure_hpa wind_ms           date_time   zone
## 1       1015.2     1.4 2024-04-10 00:30:00 zone_A
## 2       1015.2     1.4 2024-04-10 01:00:00 zone_A
## 3       1015.0     1.6 2024-04-10 01:30:00 zone_A
## 4       1015.0     1.6 2024-04-10 02:00:00 zone_A
## 5       1014.9     1.4 2024-04-10 02:30:00 zone_A

In addition to the previously mentioned parameters, we also needed the water depth, salinity, start and end date and time of the data logger deployment, and the sunrise and sunset times during that period. These parameters can be entered in Excel and then imported into R using process_info() to ensure that all necessary parameters are included. The column zone corresponds to process_weather(), and column no_hobo corresponds to process_hobo().

info_data_path <- system.file("extdata", "info.xlsx", package = "aelab")
info <- process_info(info_data_path)
info
## # A tibble: 1 × 9
##   zone   site   no_hobo depth_m salinity start_date_time     end_date_time      
##   <chr>  <chr>  <chr>     <dbl>    <dbl> <dttm>              <dttm>             
## 1 zone_A site_A code_f…       1       10 2024-04-10 00:00:00 2024-04-10 23:59:00
## # ℹ 2 more variables: sunrise <dttm>, sunset <dttm>

Finally, we can merge the dataframe containing air pressure and wind speed data from process_weather() with the processed data downloaded from the data loggers. Next, we will combine this with the other necessary parameters using the zone and no_hobo columns. We can use the plot_hobo() function to inspect the changes in DO concentrations.

data <- merge(weather, do, by = "date_time")
merged_df <- data %>% 
  inner_join(info, by = c("zone", "no_hobo"))
merged_df[c(1:3), ]
##             date_time pressure_hpa wind_ms   zone   do  temp         no_hobo
## 1 2024-04-10 00:30:00       1015.2     1.4 zone_A 8.28 25.10 code_for_logger
## 2 2024-04-10 01:00:00       1015.2     1.4 zone_A 8.07 25.04 code_for_logger
## 3 2024-04-10 01:30:00       1015.0     1.6 zone_A 7.80 24.98 code_for_logger
##     site depth_m salinity start_date_time       end_date_time
## 1 site_A       1       10      2024-04-10 2024-04-10 23:59:00
## 2 site_A       1       10      2024-04-10 2024-04-10 23:59:00
## 3 site_A       1       10      2024-04-10 2024-04-10 23:59:00
##               sunrise              sunset
## 1 1899-12-31 05:45:00 1899-12-31 18:15:00
## 2 1899-12-31 05:45:00 1899-12-31 18:15:00
## 3 1899-12-31 05:45:00 1899-12-31 18:15:00
plot_hobo(merged_df)

Calculate GPP and ER

GPP and ER can be calculated using calculate_do(). NEM (denoted as nep in the output dataframe) was calculated by GPP + ER.

calculate_do(merged_df)
##     site         no_hobo       date     r_day      gpp        nep
## 1 site_A code_for_logger 2024-04-10 -17.77689 17.56762 -0.2092709

Reference

Staehr, P. A., Bade, D., Van de Bogert, M. C., Koch, G. R., Williamson, C., Hanson, P., Cole, J. J., & Kratz, T. (2010). Lake metabolism and the diel oxygen technique: State of the science. Limnology and Oceanography: Methods, 8(11), 628–644. https://doi.org/10.4319/lom.2010.8.0628 Yong, Z.-J., Lin, W.-J., Lin, C.-W., & Lin, H.-J. (2024). Tidal influence on carbon dioxide and methane fluxes from tree stems and soils in mangrove forests. Biogeosciences, 21(22), 5247–5260. https://doi.org/10.5194/bg-21-5247-2024