UKFE

Introduction

This training material covers the functionality in the UKFE R package. It is an interactive document: you can run the code yourself, and there are exercises where you fill in the gaps.

To fully complete this training, you must have access to RStudio (or another suitable R environment). Some functionality cannot run in this interactive platform alone. Note that you need to install R before installing RStudio.

You must solve and run the code for each exercise in order for later linked exercises to work. Note that answers appear below the code. If a code block writes out multiple results, they appear in the same order as the code that produces them.

Some functions within the UKFE package require internet access to perform API (application programming interface) calls to external websites in order to retrieve data. This functionality is not always available within this interactive document. However, we have included these exercises so that you can run them in your own R environment, where they should operate as intended.

Please note that the Zdists() function is computationally intensive and may occasionally time out when run within this platform. This is not an issue with the function itself. Running it locally in your own R environment should improve performance and reliability.

These exercises have been developed for training purposes, and the code is likely to require editing before being used in practice.

Before doing any analysis, you need to install the UKFE package. This only needs to be done once (you will need to reinstall if the UKFE package is updated or if you update your version of R).

install.packages("UKFE")

Once you have installed the package, this will be saved in your package library. Every time you want to use the package, you need to load it from the library using the following code:

library(UKFE)

This will allow you to use the functions within the package.

It is recommended that you start each analysis with a clean environment in RStudio. This will help prevent data from previous analyses interfering with your current analysis. Ideally, RStudio should be restarted between analyses (you can edit the RStudio Global options so that data from a previous R session is not reloaded when starting RStudio). You can clear objects from the workspace using the following code:

rm(list = ls())

R basics

Before we get into the UKFE package itself, this section covers some R basics that you may find useful.

Numerical operations

In R we can perform arithmetic operations, for example:
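The values below are arbitrary illustrations:

```r
# Addition, subtraction, multiplication and division
2 + 3    # 5
10 - 4   # 6
6 * 7    # 42
20 / 8   # 2.5

# Exponentiation and the remainder (modulo) operator
2^10     # 1024
20 %% 8  # 4
```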

Objects

An object acts as a container for storing data. For example, you can assign some text (called a string) to the object called greeting and then use this variable later:
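For example (the object name greeting follows the text above; the message itself is arbitrary):

```r
# Assign a string to an object called 'greeting' using the assignment operator <-
greeting <- "Hello, UKFE!"

# Use the object later
print(greeting)
```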

In the UKFE package, numeric variables are used a lot:
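For example (CatchmentArea is an illustrative name and value, not a UKFE object):

```r
# Assign a numeric value to an object (e.g. a catchment area in km^2)
CatchmentArea <- 31.2
CatchmentArea  # 31.2
```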

These can then be used in arithmetic operations:
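For example (illustrative names and values):

```r
# A QMED estimate (m3/s) and a growth factor (both illustrative values)
QMED <- 25.6
GrowthFactor <- 1.8

# Multiply them to estimate a peak flow
QMED * GrowthFactor  # 46.08
```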

Vectors

Vectors are one-dimensional arrays that store data. Each element has the same data type. To create one, you can use the c() function, which stands for combine or concatenate. UKFE uses lots of numeric vectors.
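For example (illustrative annual maximum flows):

```r
# Combine five numeric values into a vector
Flows <- c(12.3, 45.6, 23.4, 67.8, 34.5)
Flows
```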

You can perform calculations with vectors:
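For example, operations on a vector are applied element-wise, and many base R functions summarise a whole vector (illustrative values again):

```r
Flows <- c(12.3, 45.6, 23.4, 67.8, 34.5)

# Element-wise arithmetic
Flows * 2   # doubles every element
Flows + 10  # adds 10 to every element

# Summary statistics
max(Flows)     # 67.8
median(Flows)  # 34.5
```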

Data frames

Data frames are structures made up of rows and columns, similar to the structure of a csv file. Each column in a data frame is a vector. Data frames are common in the UKFE package.
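For example (illustrative dates and flows):

```r
# Create a data frame with a Date column and a Flow column
FlowData <- data.frame(Date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
                       Flow = c(12.3, 45.6, 23.4))
FlowData
```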

Subsetting allows you to extract specific parts of a data frame.

Accessing columns

You can use the $ operator or column names inside square brackets.
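For example (recreating the illustrative FlowData data frame so the chunk runs on its own):

```r
FlowData <- data.frame(Date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
                       Flow = c(12.3, 45.6, 23.4))

# Using the $ operator
FlowData$Flow

# Using the column name inside square brackets
FlowData[, "Flow"]
```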

Accessing rows and elements

Use square brackets [row, column]:
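For example (again recreating the illustrative FlowData):

```r
FlowData <- data.frame(Date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
                       Flow = c(12.3, 45.6, 23.4))

# The second row (all columns), returned as a one-row data frame
FlowData[2, ]

# The element in row 2, column 2
FlowData[2, 2]  # 45.6
```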

Selecting multiple rows or columns

Use vectors of indices:
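For example (with the illustrative FlowData):

```r
FlowData <- data.frame(Date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
                       Flow = c(12.3, 45.6, 23.4))

# Rows 1 and 3, all columns
FlowData[c(1, 3), ]

# All rows of column 2
FlowData[, 2]
```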

Or use column names (this example extracts both columns because there are only two columns to start with):
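For example (with the illustrative FlowData, which has only the two columns Date and Flow):

```r
FlowData <- data.frame(Date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
                       Flow = c(12.3, 45.6, 23.4))

# Select columns by name
FlowData[, c("Date", "Flow")]
```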

Functions

Functions are built-in or user-defined operations that take input values (called arguments) and return an output. Arguments are placed inside parentheses after the function name. For example, we can use the mean() function to find the mean of a vector:
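For example:

```r
# The mean of five illustrative flow values
mean(c(12.3, 45.6, 23.4, 67.8, 34.5))  # 36.72
```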

We can also use a variable inside a function:
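For example:

```r
# Store the values in a variable first, then pass the variable to the function
Flows <- c(12.3, 45.6, 23.4, 67.8, 34.5)
mean(Flows)  # 36.72
```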

Some functions take multiple arguments, which can be specified by name for clarity:
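For example, the round() function takes the value to round and the number of decimal places:

```r
# Round pi to two decimal places, naming both arguments for clarity
round(x = pi, digits = 2)  # 3.14
```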

Functions come with help files that tell you more about the function, including explanations of the arguments and outputs.

Functions may have default values for some arguments. If you don’t specify them, R uses the defaults.

You can view a function’s arguments and defaults using:
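For example, the base R args() function prints a function's argument list, including any defaults:

```r
# View the arguments and default values of the round() function
args(round)
```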

You can learn about the functions in the UKFE package using this documentation and the functions’ help files.

Learning resources for R

If you are finding the UKFE documentation difficult to follow, or if you would like to further develop your knowledge of R, there are many resources available to help. The following suggestions are particularly useful for those who are new to R or programming in general.

Online books

  • R for Data Science and R Programming for Data Science are introductory texts, freely available online. They cover the essentials of working with R, including practical examples and exercises.
  • Both books include lists of additional resources in their ‘R Resources’ sections.

Interactive courses

  • DataCamp offers interactive, browser-based courses in R. Some introductory content is available for free.

Webinars and video tutorials

  • Posit Webinars feature recorded sessions on a wide range of topics, from beginner to advanced level.
  • YouTube also hosts many high-quality R tutorials and lecture series. Searching for “R programming for beginners” will return playlists of relevant material.
  • Coursera offers structured R courses from universities such as Johns Hopkins and the University of Washington. These courses are suitable for beginners and are sometimes available for free or with financial aid.

Blogs and tutorials

  • R-bloggers contains tutorials and articles from across the R community. Posts range from beginner-friendly guides to advanced techniques, and often include worked examples and practical applications.

Support and questions

  • If you have questions specifically about this package, you may find it helpful to visit the GitHub Discussions forum [TODO: Add link once set up] associated with the repository. This is a good place to ask questions, report issues, or see if others have encountered similar problems.

  • Stack Overflow is a useful resource for general R-related queries, especially for language syntax, errors, and troubleshooting.

UKFE datasets

There are five datasets that come loaded with the UKFE package. For example, the QMEDData data frame provides catchment descriptors from the NRFA and calculated statistics for sites suitable for QMED and pooling. You can save and retrieve the data as follows:
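For example, using QMEDData (the same pattern works for any of the five datasets; this assumes the UKFE package has been loaded with library(UKFE)):

```r
# Save the QMEDData data frame to an object in your environment
QMED_data <- QMEDData

# View the head (first six rows) of the data
head(QMED_data)
```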

The five available datasets are:

  • AMSP

  • NRFAData

  • QMEDData

  • ThamesPQ

  • UKoutline

You can find out more about a dataset using its help file, which can be accessed as follows:

?AMSP

Alternatively, you can view the package documentation on CRAN: https://cran.r-project.org/web/packages/UKFE/UKFE.pdf

Exercise to explore NRFA Data

# Look at the documentation (help file) for NRFAData (commented out 
# in this solution so that it will not run)
# ?NRFAData

# Save the 'NRFAData' data frame to an object called 'NRFAData' within your R environment
NRFAData <- NRFAData

# Look at the head (first 6 rows) of your data
head(NRFAData)

Note: use a question mark followed by a function or dataset name to access its help file.

Data input functions

There are several functions with names starting with GetData that extract data from the websites of different organisations using their APIs. These are:

  • GetDataEA_QH(): Extracts flow or level data from the Environment Agency’s Hydrology Data Explorer.

  • GetDataEA_Rain(): Extracts rainfall data from the Environment Agency’s Hydrology Data Explorer.

  • GetDataMetOffice(): Extracts regional mean (monthly, seasonal and annual) temperature or rainfall from the UK Met Office. Sunshine duration is also available.

  • GetDataNRFA(): Extracts National River Flow Archive data (daily mean flow or catchment rainfall, AMAX, POT, gaugings and metadata).

  • GetDataSEPA_QH(): Extracts flow or level data from the Scottish Environment Protection Agency.

  • GetDataSEPA_Rain(): Extracts hourly rainfall data from the Scottish Environment Protection Agency.

Exercises

Using the help files, complete the following exercises:

GetDataEA_QH

Use the GetDataEA_QH() function to find the river gauges for the river Tame.

You can access the GetDataEA_QH() help file for more information about how to use the function by running ?GetDataEA_QH.

# Find gauges on the river Tame
GetDataEA_QH(RiverName = "Tame")

Use the GetDataEA_QH() function to find all of the gauges within 10km of a location with latitude 50.6 and longitude -3.9.

You can access the GetDataEA_QH() help file for more information about how to use the function by running ?GetDataEA_QH.

# Find gauges within 10km of a latitude 50.6 and longitude -3.9
GetDataEA_QH(Lat = 50.6, Lon = -3.9, Range = 10)

Use the GetDataEA_QH() function to retrieve all available daily maximum flow data from the gauge with WISKI ID SX67F051.

You can access the GetDataEA_QH() help file for more information about how to use the function by running ?GetDataEA_QH.

# Get all available daily maximum flow data from the Bellever gauge on the 
# East Dart River
BelleverMax <- GetDataEA_QH(WISKI_ID = "SX67F051")
head(BelleverMax)

Use the GetDataEA_QH() function to retrieve 15-minute data for the gauge with WISKI ID SX67F051 (East Dart River at Bellever) from 23rd November 2024 to 25th November 2024.

You can access the GetDataEA_QH() help file for more information about how to use the function by running ?GetDataEA_QH.

# Get 15-minute data from the Bellever for the November 2024 event
BelleverNov2024 <- GetDataEA_QH(WISKI_ID = "SX67F051", From = "2024-11-23", 
                                To = "2024-11-25", Period = "15Mins")
head(BelleverNov2024)

GetDataEA_Rain

Note that the second exercise in this section is not interactive but the code should run in your own RStudio environment.

Use the GetDataEA_Rain() function to find gauges within 10km of a location with latitude 54.5 and longitude -3.2.

You can access the GetDataEA_Rain() help file for more information about how to use the function by running ?GetDataEA_Rain.

# Get information about available rain gauges within a 10km radius of 
# Lat = 54.5, Lon = -3.2
GetDataEA_Rain(Lat = 54.5, Lon = -3.2) 

Use the GetDataEA_Rain() function to get hourly data for December 2015 from the rain gauge with WISKI ID 592463 and then view the head of the resulting data frame.

You can access the GetDataEA_Rain() help file for more information about how to use the function by running ?GetDataEA_Rain.

# Use the WISKI reference for the Honister rain gauge to get some hourly 
# rainfall data for Dec 2015
HonisterDec2015 <- GetDataEA_Rain(WISKI_ID = "592463", Period = "Hourly", 
                                  From = "2015-12-01", To = "2015-12-31") 
head(HonisterDec2015) 

GetDataMetOffice

Note that the exercises in this section are not interactive but the code should run in your own RStudio environment.

Use the GetDataMetOffice() function to get the rainfall data for the UK.

You can access the GetDataMetOffice() help file for more information about how to use the function by running ?GetDataMetOffice.

# Get the rainfall series for the UK
UKRain <-  GetDataMetOffice(Variable = "Rainfall", Region = "UK") 

Use the GetDataMetOffice() function to get mean temperature data for East Anglia.

You can access the GetDataMetOffice() help file for more information about how to use the function by running ?GetDataMetOffice.

# Get mean temperature data for East Anglia
TempEastAnglia <- GetDataMetOffice(Variable = "Tmean", Region = "East_Anglia") 

GetDataNRFA

Use the GetDataNRFA() function to get the concurrent rainfall and mean flow series for site with NRFA ID 15006 (Tay at Ballathie) and then view the head of the data frame.

You can access the GetDataNRFA() help file for more information about how to use the function by running ?GetDataNRFA.

# Get the concurrent rainfall and mean flow series for the Tay at Ballathie 
BallathiePQ <-  GetDataNRFA(ID = 15006, Type = "PQ")
head(BallathiePQ)

Use the GetDataNRFA() function to get the gaugings data at the same location.

You can access the GetDataNRFA() help file for more information about how to use the function by running ?GetDataNRFA.

# Get the gaugings
BallathieGaugings <- GetDataNRFA(ID = 15006, Type = "Gaugings")
# View the head of the gaugings
head(BallathieGaugings)

GetDataSEPA_QH

Use the GetDataSEPA_QH() function to find the gauges on the River Spey.

You can access the GetDataSEPA_QH() help file for more information about how to use the function by running ?GetDataSEPA_QH.

# Find gauges on the River Spey
GetDataSEPA_QH(RiverName = "Spey")

Use the GetDataSEPA_QH() function to retrieve the daily mean flow data from the gauge with ID 37174.

You can access the GetDataSEPA_QH() help file for more information about how to use the function by running ?GetDataSEPA_QH.

# Get all available daily mean flow data from the Boat o Brig gauge on the Spey
SpeyDaily <- GetDataSEPA_QH(StationID = "37174")

# Find the date and time of the highest recorded peak for this gauge
SpeyDaily$DateTime[SpeyDaily$Q == max(SpeyDaily$Q)]

Use the GetDataSEPA_QH() function to extract 15-minute data from the same gauge for the highest recorded peak, using a date range from the day before the peak to the day after.

You can access the GetDataSEPA_QH() help file for more information about how to use the function by running ?GetDataSEPA_QH.

# Get 15-minute data from the Boat o Brig gauge for the highest recorded peak
BoatOBrigAug1970 <- GetDataSEPA_QH(StationID = "37174", From = "1970-08-16", 
                                   To = "1970-08-19", Period = "15Mins")
# View the first 15 rows of the data
head(BoatOBrigAug1970, n = 15)

GetDataSEPA_Rain

Use the GetDataSEPA_Rain() function to retrieve the data from Bannockburn for October 1998 and view the head of the data frame.

You can access the GetDataSEPA_Rain() help file for more information about how to use the function by running ?GetDataSEPA_Rain.

# Get rainfall data from the Bannockburn rain gauge for October 1998
Bannockburn <- GetDataSEPA_Rain(StationName = "Bannockburn", From = "1998-10-01", 
                                To = "1998-10-31")
head(Bannockburn)

Flood Estimation Handbook (FEH) workflows

This section contains several scenarios that cover a range of options within the UKFE package that are available when completing FEH Statistical analyses. Scenarios 1-3 represent gauged catchments and 4-6 represent ungauged catchments.

Scenarios 1-3: Gauged catchments

Scenario 1: Estimate the growth curve for a flood estimation point at NRFA gauge 27081 (enhanced single-site analysis)

This scenario represents an urban gauged catchment in England that is classed as suitable for pooling in the NRFA.

The first step is to retrieve the catchment descriptors for gauge 27081. This can be done using the GetCDs() function, given that the site is in the NRFA. If the site is considered suitable for QMED and pooling, this function extracts the catchment descriptors from the QMEDData data frame; otherwise they are extracted using the NRFA API. Note that if they are brought in from the NRFA API (when not suitable for QMED or pooling), some of the descriptors differ; for example, the gauge location is provided rather than the catchment centroid. A warning message appears when this happens.

The next step is to calculate QMED. This is a gauged catchment so QMED can be estimated as the median of the annual maximum (AMAX) flows. This site is suitable for pooling, so the GetAM() function can be used to extract the AMAX (otherwise the AMImport() function or the GetDataNRFA() function could have been used). The GetAM() and AMImport() functions exclude years classed as rejected in the NRFA Peak Flow Dataset. If using the GetDataNRFA() function, the rejected years need to be manually removed (this will be covered in a later scenario).

Basic maths functions to calculate statistics such as the maximum, minimum, median and mean are available in base R.
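Putting these steps together for this scenario (the object names are illustrative; as in the later scenarios, the flow values are in the second column of the extracted AMAX data):

```r
# Extract the AMAX data for gauge 27081 (rejected years are excluded)
AMAX_27081 <- GetAM(27081)

# Estimate QMED as the median of the AMAX flows
QMED_27081 <- median(AMAX_27081[, 2])
QMED_27081
```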

The next step is to create a pooling group, for which the Pool() function can be used. The first argument is the catchment descriptors at the site of interest. This site is gauged and suitable for pooling, and therefore we want to include this gauge in the pooling group. Rural sites (where URBEXT2000 < 0.03) will be included by default; however, as this site is urban, the gauge will need to be included by setting the iug argument to TRUE. The iug argument stands for ‘include urban gauge’ and will add the closest site in catchment descriptor space (which should be the gauge of interest) to the pooling group regardless of its URBEXT2000 value. Note that, other than the site of interest, pooling group members with URBEXT2000 > 0.03 will still be excluded, although this can be adjusted using the UrbMax argument if required.

Another argument of note in the Pool() function for this scenario is the DeUrb argument. If this is set to TRUE, the L-CV and L-Skew of any site in the pooling group with URBEXT2000 > 0.03 will be de-urbanised, so that all pooling group members represent rural growth curves. This scenario includes an urban gauge, so we can set the DeUrb argument to TRUE. This will de-urbanise the subject site so that it is consistent with the rest of the (otherwise rural) pooling group. If there is no urban site (URBEXT2000 > 0.03) in the pooling group, UKFE will not allow the pooling group L-moments to be de-urbanised. Note that this procedure differs from WINFAP, which applies the de-urbanisation to all sites, even when effectively rural.

The pooling group heterogeneity can be assessed using the H2() function. There are two elements to the output: the first is the value and the second is an interpretation.

For enhanced single-site analysis, the WGaugLcv() and WGaugLSkew() functions can be used to find the gauged weighted L-CV and L-Skew from the pooling group. The only argument is the pooling group.

You can use the DiagPlots() function to create diagnostic plots to aid review of the pooling group:
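A sketch of these pooling-group checks (it recreates the pooling group, as described above, so that it runs on its own; the object names are illustrative):

```r
# Retrieve catchment descriptors and create the pooling group, including the 
# urban gauge (iug = TRUE) and de-urbanising urban sites (DeUrb = TRUE)
CDs_27081 <- GetCDs(27081)
Pool_27081 <- Pool(CDs_27081, iug = TRUE, DeUrb = TRUE)

# Assess the pooling group heterogeneity
H2(Pool_27081)

# Gauged weighted L-CV and L-Skew from the pooling group
WGaugLcv(Pool_27081)
WGaugLSkew(Pool_27081)

# Create diagnostic plots to aid review of the pooling group
DiagPlots(Pool_27081)
```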

The Pool() function does not currently include gauge names in the pooling group so you can either manually check for gauges on the same river or merge your own lookup table with the pooling group. You could use the NRFA metadata by installing the ‘rnrfa’ R package and using its catalogue() function. An example workflow is as follows:

# Load the package
library(rnrfa)

# Extract the NRFA gauge metadata
gauge_metadata <- catalogue()

# Extract just the gauge ID and name columns (first two columns) and convert to 
# a data frame for consistency with the pooling group
gauge_id_name <- data.frame(gauge_metadata[, 1:2])

# Retrieve catchment descriptors 
CDs_27081 <- GetCDs(27081)

# Create a pooling group that includes the gauge at the flood estimation point 
# (urban catchment) and deurbanises the L-CV and L-Skew of urban sites in the 
# group
Pool_27081 <- Pool(CDs_27081, iug = TRUE, DeUrb = TRUE)

# Take a copy of the pooling group (so that the original can be used in later 
# functions)
Pool_27081_copy <- Pool_27081

# Make the gauge IDs in the copy of the pooling group into a column called 'id' 
# to match the NRFA metadata (they are currently row names)
Pool_27081_copy$id <- rownames(Pool_27081)

# Merge the pooling group with the NRFA metadata using the 'id' column to merge 
# and setting the `sort` argument to `FALSE` so that the order of the rows is 
# not changed
Pool_27081_merged <- merge(Pool_27081_copy, gauge_id_name, by = "id", sort = FALSE)
head(Pool_27081_merged)

# Write out the pooling group information to a csv file (this is not run here 
# but you can use similar code). Use the help file to assist. You will need to 
# replace my_filepath with an actual file path ending with ".csv".
# write.csv(Pool_27081_merged, my_filepath, row.names = FALSE)

# Write out a subset of the pooling group containing just the information you 
# want. This example extracts data for a range of columns and rounds some of 
# the numbers to either 3 decimal places or 3 significant figures.
# Create the subset of data
Pool_27081_merged_subset <- 
  data.frame(Station_ID = Pool_27081_merged$id, Station_name = Pool_27081_merged$name,
             Distance = round(Pool_27081_merged$SDM, 3), Years_of_data = Pool_27081_merged$N, 
             QMED_AM = signif(Pool_27081_merged$QMED, 3), LCV = round(Pool_27081_merged$Lcv, 3), 
             LSKEW = round(Pool_27081_merged$LSkew, 3), Discordancy = round(Pool_27081_merged$Discordancy, 3))
# View the subset
Pool_27081_merged_subset
# Write the data to a csv file (not run here but you can use similar code)
# write.csv(Pool_27081_merged_subset, my_filepath, row.names = FALSE)

We can use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions. Note that this function is slightly different from the zdist function described in the science report ‘Improving the FEH statistical procedures for flood frequency estimation, Environment Agency (2008)’ (this is explained in the function’s help file).

This function gives a different answer each time it is run because there is a random simulation element to it. If you want to make your work repeatable, you can set the seed before running the function; this ensures consistent random number generation each time you run the code. Any number can be used in the set.seed() function. The output contains two elements: the first is the Zdist score for each distribution, and the second indicates which distribution has the best fit (the closest absolute value to zero).
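For example, using the pooling group created in the workflow above:

```r
# Set the seed so that the Zdists result is reproducible
set.seed(1)

# Assess goodness-of-fit for a range of distributions
Zdists(Pool_27081)
```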

Once the pooling group has been created, it can be used to estimate the growth curve using the PoolEst() function. For gauged results, the gauged argument needs to be set to TRUE. QMED needs to be supplied using the QMED argument, and the distribution can be selected using the dist argument. If urbanised results are required, the UrbAdj argument should be set to TRUE to apply an urban adjustment to the L-CV and L-skew (and hence the growth curve). If this is the case, either the urban extent or the catchment descriptors need to be supplied. In this example, we include the catchment descriptors.

The results are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF). Two additional columns quantify the 95% confidence interval using methods detailed in ‘Circulation’ edition 152 (the British Hydrological Society magazine).

The second element is the estimated pooled L-CV and L-skew (linear coefficients of variation and skewness). Note that the Pooled L-CV and L-skew reported in these results are different from those from the WGaugLcv() and WGaugLSkew() functions above because they did not have an urban adjustment applied.

The third and fourth elements provide parameters (location, scale and shape if the chosen distribution has three parameters) for the standardised and non-standardised distributions respectively.

The growth curve can be plotted using the EVPool() function. If we set the gauged argument to TRUE, the plot shows the gauged curve based on the pooling group, along with the single-site curve and the observed points. If gauged is TRUE and the AMAX argument is set to NULL, the AMAX from the first site in the pooling group will be plotted (this should be the gauge at the site of interest). The AMAX argument can be used to plot other AMAX series. The distribution is set to generalised logistic by default, so the dist argument only needs to be included if you are using a different distribution (it is included for clarity here). The UrbAdj argument needs to be set to TRUE to apply an urban adjustment to the pooled growth curve (we will set this for consistency with the results from the PoolEst() function); if so, the CDs argument also needs to be included. The plot title can be changed using the Title argument.

The growth curve can be converted to a flood frequency curve by including the QMED argument.
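Drawing the above together for this scenario (a sketch using the pooling group and catchment descriptors created in the workflow earlier; QMED is re-derived here from the AMAX series so the chunk runs on its own):

```r
# Gauged QMED: the median of the AMAX flows for gauge 27081
QMED_27081 <- median(GetAM(27081)[, 2])

# Pooled estimates with the urban adjustment applied (CDs must be supplied)
Results_27081 <- PoolEst(Pool_27081, gauged = TRUE, QMED = QMED_27081,
                         dist = "GenLog", UrbAdj = TRUE, CDs = CDs_27081)
Results_27081

# Plot the growth curve with the urban adjustment applied
EVPool(Pool_27081, gauged = TRUE, dist = "GenLog", UrbAdj = TRUE, CDs = CDs_27081)

# Supply QMED to convert the growth curve to a flood frequency curve
EVPool(Pool_27081, gauged = TRUE, dist = "GenLog", UrbAdj = TRUE, CDs = CDs_27081,
       QMED = QMED_27081, Title = "Flood frequency curve")
```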

Scenario 2: Estimate the growth curve for a flood estimation point at NRFA gauge 90003

Enhanced single-site analysis

We will use a similar workflow to Scenario 1 but for a catchment in Scotland that is not urban. Start by clearing your environment:
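As in the introduction, the environment can be cleared with:

```r
# Clear objects from the workspace
rm(list = ls())
```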

Then extract the catchment descriptors using the GetCDs() function:

Use the GetCDs() function.

# Retrieve catchment descriptors 
CDs_90003 <- GetCDs(90003)
# View the catchment descriptors
CDs_90003

As this is gauged, we can estimate QMED directly from the median of the AMAX data. Use the GetAM() function to extract the AMAX data for NRFA gauge 90003.

Use the GetAM() function.

# Extract the AMAX data and save to an object called 'AMAX_90003'
AMAX_90003 <- GetAM(90003)

# View the head (first six rows) of the AMAX data
head(AMAX_90003)

# Plot the AMAX data
AMplot(AMAX_90003)

Now estimate QMED. Remember that the flow values are in the second column of the extracted AMAX data.

# Estimate QMED
QMED_90003 <- median(AMAX_90003[, 2])
# View the result
QMED_90003

We can now create the pooling group. As this catchment is not urban, we do not have to set the iug argument to TRUE. Its default setting is FALSE, so we do not need to include it, but we can for clarity. If you try setting it to TRUE, you will see a warning saying that the function is treating it as FALSE. Sites with urban catchments are excluded from the pooling group by default, and our catchment is not urban, so we also do not need to de-urbanise the L-CV and L-skew of urban sites in the group by setting the DeUrb argument to TRUE. This argument also has a default of FALSE, but we can include it in our code for clarity.

Use the Pool function to create a pooling group.

# Create a pooling group that includes the gauge at the flood estimation 
# point (not an urban catchment)
Pool_90003 <- Pool(CDs = CDs_90003, iug = FALSE, DeUrb = FALSE)

# View the pooling group
Pool_90003

You can obtain the total number of years of record across the sites in the pooling group by using the sum() function in base R:

# Obtain the total number of years of record in the pooling group
sum(Pool_90003$N)

Assess the pooling group heterogeneity using the H2() function:

Use the H2() function.

# Assess the pooling group heterogeneity
H2(Pool_90003)

Find the gauged weighted L-CV and L-skew from the pooling group:

# Find the gauged weighted L-CV and L-skew from the pooling group
# L-CV
WGaugLcv(Pool_90003)
# L-skew
WGaugLSkew(Pool_90003)

Create diagnostic plots to aid review of the pooling group:

Use the DiagPlots() function.

# Create diagnostic plots to aid review of the pooling group
DiagPlots(Pool_90003)

You note that site 83002 in the pooling group has a short record and an unusually high L-Kurtosis and decide to remove it from the pooling group. You can do this using the exclude argument in the Pool() function, which can be used to exclude one or more sites. The site with the next lowest similarity distance measure will then be added to the pooling group until the total number of gauged record years is at least 500 (or whatever value you set for the N argument, which has a default of 500).

Recreate the pooling group, excluding site 83002, and find the same information relating to the pooling group as above:

# Recreate the pooling group, excluding site 83002
Pool_90003 <- Pool(CDs_90003, iug = FALSE, DeUrb = FALSE, exclude = 83002)

# View the pooling group
Pool_90003

# Find the total number of years
sum(Pool_90003$N)

# Assess the pooling group heterogeneity
H2(Pool_90003)

# Find the gauged weighted L-CV and L-skew from the pooling group
# L-CV
WGaugLcv(Pool_90003)
# L-skew
WGaugLSkew(Pool_90003)

Create diagnostic plots to aid review of the pooling group:

# Create diagnostic plots to aid review of the pooling group
DiagPlots(Pool_90003)

Use the rnrfa package to extract gauge metadata and merge with your pooling group information so that you can look at the gauge names. Note that the code in the exercise assumes that you already have the rnrfa package loaded. This exercise is not interactive but the code can be run in your own RStudio environment.

# Extract the NRFA gauge metadata
gauge_metadata <- catalogue()

# Extract just the gauge ID and name columns (first two columns) and convert 
# to a data frame for consistency with the pooling group
gauge_id_name <- data.frame(gauge_metadata[, 1:2])

# Take a copy of the pooling group (so that the original can be used in 
# later functions)
Pool_90003_copy <- Pool_90003

# Make the gauge IDs in the pooling group into a column called 'id' to match
# the NRFA metadata (they are currently row names)
Pool_90003_copy$id <- rownames(Pool_90003)

# Merge the pooling group with the NRFA metadata using the 'id' column to 
# merge and setting the `sort` argument to `FALSE` so that the order of 
# the rows is not changed
Pool_90003_merged <- merge(Pool_90003_copy, gauge_id_name, by = "id", sort = FALSE)
head(Pool_90003_merged)

# Write out the pooling group information to a csv file if you wish

# Create a subset of the pooling group containing just the information you want. 
Pool_90003_merged_subset <- 
  data.frame(Station_ID = Pool_90003_merged$id, Station_name = Pool_90003_merged$name,
             Distance = round(Pool_90003_merged$SDM, 3), Years_of_data = Pool_90003_merged$N, 
             QMED_AM = signif(Pool_90003_merged$QMED, 3), LCV = round(Pool_90003_merged$Lcv, 3), 
             LSKEW = round(Pool_90003_merged$LSkew, 3), Discordancy = round(Pool_90003_merged$Discordancy, 3))
Pool_90003_merged_subset

# Write the data to a csv file if you wish

We can now use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions:

Use the Zdists() function.

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_90003) 

Use the pooling group to estimate the growth curve using the PoolEst() function with the generalised logistic distribution. Remember that we are treating this as gauged. Set the return periods of interest to be 2, 5, 10, 20, 30, 50, 75, 100, 200, 500 and 1000 using the RP argument.

# Estimate the growth curve from the pooling group without using an urban 
# adjustment (this means that CDs do not need to be provided)
Results <- PoolEst(Pool_90003, gauged = TRUE, QMED = QMED_90003, dist = "GenLog", 
                   UrbAdj = FALSE, RP = c(2, 5, 10, 20, 30, 50, 75, 100, 200, 500, 1000))

# View the results
Results

Extract the pooling group L-moments and 100-year growth factor:

# Extract the pooling group L-moments
PG_Lmoments <- Results[[2]]
PG_Lmoments

# Extract the 100-year growth factor from the first element of Results 
# (remember to check which row and column you need from the Results table)
GF_100yr <- Results[[1]][8, 3]
GF_100yr

You will notice that the L-moments in the results for this scenario are the same as the weighted ones derived earlier using the WGaugLcv() and WGaugLSkew() functions, because neither has an urban adjustment applied.

Plot the growth curve and flood frequency curve using the EVPool() function. Refer back to Scenario 1 for hints.

# Plot the growth curve
EVPool(Pool_90003, AMAX = NULL, gauged = TRUE, dist = "GenLog", UrbAdj = FALSE)

# Plot the flood frequency curve by including the QMED value (to convert
# the growth curve to a flood frequency curve)
EVPool(Pool_90003, AMAX = NULL, gauged = TRUE, dist = "GenLog", UrbAdj = FALSE, 
       QMED = QMED_90003, Title = "Flood frequency curve")

Pooled analysis

So far, we have completed an enhanced single-site analysis by including the gauge. For comparison, we can complete an ungauged pooled analysis at this site by removing the site from the pooling group.

Create a pooling group that does not include the gauge of interest and also excludes gauge 83002, which was found to have a short record length. Keep the other arguments as for the enhanced single-site option.

# Create the pooling group, excluding sites 90003 and 83002
# Hint: You need to create a vector of sites to exclude using the `c()` function
Pool_90003_ungauged <- Pool(CDs_90003, iug = FALSE, DeUrb = FALSE, exclude = c(90003, 83002))

# View the pooling group
Pool_90003_ungauged

# Find the total number of years in the pooling group
sum(Pool_90003_ungauged$N)

# Assess the pooling group heterogeneity
H2(Pool_90003_ungauged)

We can now use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions:

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_90003_ungauged) 

Use the pooling group to estimate the growth curve using the PoolEst() function with the generalised logistic distribution. Remember that we are NOT treating this as gauged. Set the return periods of interest to be 2, 5, 10, 20, 30, 50, 75, 100, 200, 500 and 1000 using the RP argument.

# Estimate the growth curve from the pooling group without using an urban 
# adjustment (this means that CDs do not need to be provided)
Results <- PoolEst(Pool_90003_ungauged, gauged = FALSE, QMED = QMED_90003, 
                   dist = "GenLog", UrbAdj = FALSE, 
                   RP = c(2, 5, 10, 20, 30, 50, 75, 100, 200, 500, 1000))
# View the results
Results

Scenario 3: Estimate the growth curve for a flood estimation point at NRFA gauge 69048 (enhanced single-site analysis)

This scenario represents a non-urban gauged catchment in England that is classed as suitable for QMED in the NRFA.

As for the previous scenarios, we start by clearing our environment:
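
A typical chunk for this step (a minimal sketch using base R, as described in the introduction) is:

```r
# Remove all objects from the current workspace
rm(list = ls())
```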

Then extract the catchment descriptors using the GetCDs() function:

Use the GetCDs() function.

# Retrieve catchment descriptors 
CDs_69048 <- GetCDs(69048)
# View the catchment descriptors
CDs_69048

As this is gauged, we can estimate QMED directly from the median of the AMAX data. This site is suitable for QMED rather than pooling, so we cannot use the GetAM() function to extract the AMAX data (try it and see what happens - you should get an error). If we had downloaded an xml file then we could have used the AMImport() function, but we will instead use GetDataNRFA(). A key point to note about this function is that it does not automatically remove rejected years. Use the help file to fill in the blanks below.

# Extract the AMAX data and save to an object called 'AMAX_69048'
AMAX_69048 <- GetDataNRFA(69048, Type = "AMAX")

# View the AMAX data
AMAX_69048

The function outputs the AMAX record and tells us which water years are classed as rejected. We can then remove these by identifying which rows contain these water years (rows 1 and 21) and removing them as follows:

# Remove 1997 and 2017 (rejected years) from the AMAX data - the AMAX values are in
# the first element of AMAX_69048 so we need to extract the first element of the list
# using double square brackets. Then we remove the rows from this using single square
# brackets. The rejected years are in rows 1 and 21 respectively so we create a 
# vector of the rows to remove and essentially subtract those rows while keeping all 
# columns.
AMAX_69048 <- AMAX_69048[[1]][-c(1, 21), ]

# View AMAX data
AMAX_69048

# Plot the AMAX data using the AMplot function
AMplot(AMAX_69048)

The functions for pooling group formation and estimation rely on the NRFAData data frame that sits within the package. This only contains gauges that are classed as suitable for pooling, so we need to add our gauge to it if we want to include it in an enhanced single-site analysis (for the purposes of this exercise, we will assume that we want to use it). You can add your gauge to both the NRFAData and AMSP data frames using the AddGauge() function. Use the help file for this function to fill in the blanks:

# Prepare gauge data for adding to the relevant data frames
Gauge <- AddGauge(CDs = CDs_69048, AMAX = AMAX_69048, ID = "69048")

# View Gauge
Gauge

# Bind the gauge to NRFAData using the first element of Gauge
NRFAData <- rbind(NRFAData, Gauge[[1]])

# View the last rows of NRFAData using the 'tail' function. You should be able to
# see that your gauge has been added.
tail(NRFAData)

# Bind the gauge to AMSP using the second element of Gauge
AMSP <- rbind(AMSP, Gauge[[2]])

# View the last rows of AMSP using the 'tail' function. You should be able to
# see that your gauge is at the bottom of the dataset.
tail(AMSP)

Note that these changes to the underlying data frames will only apply to your R session and will disappear once you restart R.

Now estimate QMED. Remember that the flow values are in the second column of the extracted AMAX data.

# Estimate QMED
QMED_69048 <- median(AMAX_69048[, 2])
# View the result
QMED_69048

We can now create the pooling group. As this catchment is not urban, we do not have to set the iug argument to be TRUE. Its default setting is FALSE so we do not need to include it at all, but we can for clarity. Sites with urban catchments will be excluded from the pooling group by default and our catchment is not urban so we also do not need to deurbanise the L-CV and L-skew of urban sites in the group by setting the DeUrb argument to TRUE. This argument also has a default of FALSE but we can include it in our code for clarity.

Use the Pool function to create a pooling group.

# Create a pooling group that includes the gauge at the flood estimation 
# point (not an urban catchment)
Pool_69048 <- Pool(CDs_69048, iug = FALSE, DeUrb = FALSE)

# View the pooling group
Pool_69048

Obtain the total number of years of record across the sites in the pooling group by using the sum() function in base R:

# Sum the record length column of the pooling group to obtain the total 
# number of years
sum(Pool_69048$N)

Assess the pooling group heterogeneity using the H2() function:

# Assess the pooling group heterogeneity
H2(Pool_69048)

Create diagnostic plots to aid review of the pooling group:

# Create diagnostic plots to aid review of the pooling group
DiagPlots(Pool_69048)

For this exercise, we will assume that the pooling group does not need any alterations, but you can use methods from the previous scenarios to assess the pooling group. In practice, it looks like at least one gauge should be excluded.

We can now use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions:

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_69048) 

Use the pooling group to estimate the growth curve using the PoolEst() function with the Kappa3 distribution. Remember that we are treating this as gauged. Set the return periods of interest to be 2, 10, 25, 50, 100, 500 and 1000 using the RP argument.

# Estimate the growth curve from the pooling group without using an urban 
# adjustment (this means that CDs do not need to be included)
Results <- PoolEst(Pool_69048, gauged = TRUE, QMED = QMED_69048, dist = "Kappa3", 
                   UrbAdj = FALSE, RP = c(2, 10, 25, 50, 100, 500, 1000))
# View the results
Results

Plot the growth curve using the EVPool() function and then the flood frequency curve.

# Plot the growth curve
EVPool(Pool_69048, AMAX = NULL, gauged = TRUE, dist = "Kappa3", UrbAdj = FALSE)

# Plot the flood frequency curve by including the QMED value
EVPool(Pool_69048, AMAX = NULL, gauged = TRUE, dist = "Kappa3", UrbAdj = FALSE, 
       QMED = QMED_69048, Title = "Flood frequency curve")

Scenarios 4-6: Ungauged catchments

The focus of these scenarios is primarily on estimating QMED using donors. The pooling groups and results are found in the same way as the previous examples. These exercises involve use of xml files that contain catchment descriptor data for ungauged catchments; note that these files contain dummy data and the results should not be used outside of the training exercises.

Scenario 4: Estimate QMED for an ungauged catchment in Wales using one donor and then estimate the growth curve

As for the previous scenarios, we start by clearing our environment:

As this is an ungauged catchment, we need to import the catchment descriptors from an xml file. This can be done using the CDsXML() function:

# Extract catchment descriptors from xml file for Wales catchment
CDs_Wales <- CDsXML("Data/FEH_Catchment_Descriptors_Scenario4.xml")
CDs_Wales

We can use the DonAdj() function to find donor adjustment candidates for QMED. This provides results in order of proximity to the centroid of the subject catchment. There is no filter for urbanisation in this function, so urbanisation should be assessed manually when checking donor suitability. See the help file for other arguments you can set.

Use the DonAdj() function.

# Find donor adjustment candidate sites. Show the first 6 rows.
Donors_Wales <- DonAdj(CDs_Wales, rows = 6)
Donors_Wales

Once you have selected your donor catchment, you can include it when estimating QMED using the QMED() function. We will use 66011 as our donor here. Use the QMED() help file to fill in the blanks.

We will assume that the URBEXT2000 value has already been adjusted for the present day and will therefore not apply an urban expansion factor. As the site is rural we are interested in the final rural QMED results. These are returned by setting the ‘UrbAdj’ argument to ‘FALSE’.

For this exercise we will also adjust the observed QMED values of the donor sites for urbanisation using the ‘DonUrbAdj’ argument, to enable a like-for-like comparison with the catchment descriptor QMED value. Note that this adjustment is equivalent to the “use QMED obs deurbanised” tick box option, which is applied by default in WINFAP. If the donor site is not urban, this adjustment is likely to make minimal difference, though it is standard practice to apply it for consistency.

# Estimate QMED
QMED_Wales <- QMED(CDs = CDs_Wales, Don1 = 66011, uef = FALSE, 
                   UrbAdj = FALSE, DonUrbAdj = TRUE)
QMED_Wales

Note that the QMEDDonEq() function is also available for flexibility. See the help file for this function for an example of how to use this.

Create a pooling group. This is not an urban catchment based on URBEXT2000.

# Create the pooling group
Pool_Wales_ungauged <- Pool(CDs_Wales, iug = FALSE, DeUrb = FALSE)

# View the pooling group
Pool_Wales_ungauged

# Find the total number of years
sum(Pool_Wales_ungauged$N)

# Assess the pooling group heterogeneity
H2(Pool_Wales_ungauged)

For this exercise, we will assume that the pooling group does not require editing.

We can now use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions:

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_Wales_ungauged) 

Use the pooling group to estimate the growth curve using the PoolEst() function with the Kappa3 distribution. This is not an urban catchment so we will not set an urban adjustment and therefore catchment descriptors are not required to be supplied.

Remember that this is ungauged.

# Estimate the growth curve from the pooling group without using an 
# urban adjustment
Results <- PoolEst(x = Pool_Wales_ungauged, gauged = FALSE, QMED = QMED_Wales, 
                   dist = "Kappa3", UrbAdj = FALSE)
# View the results
Results

Plot the growth curve using the EVPool() function and then the flood frequency curve.

# Plot the growth curve
EVPool(Pool_Wales_ungauged, gauged = FALSE, dist = "Kappa3", UrbAdj = FALSE)

# Plot the flood frequency curve by including the QMED value
EVPool(Pool_Wales_ungauged, gauged = FALSE, dist = "Kappa3", UrbAdj = FALSE, 
       QMED = QMED_Wales, Title = "Flood frequency curve")

The plots for the ungauged scenarios show the pooling group single-site curves and the pooled growth and flood frequency curves.

Scenario 5: Estimate QMED for an ungauged catchment in Northern Ireland using two donors then estimate the growth curve

As for the previous scenarios, we start by clearing our environment:

As this is an ungauged catchment, we need to import the catchment descriptors from an xml file. This can be done using the CDsXML() function:

# Extract catchment descriptors from xml file for Northern Ireland catchment
CDs_NI <- CDsXML("Data/FEH_Catchment_Descriptors_Scenario5.xml")
CDs_NI

We can use the DonAdj() function to find donor adjustment candidates for QMED. This provides results in order of proximity to the centroid of the subject catchment. See the help file for other arguments you can set. Note that there is currently a bug when using Northern Ireland catchments, so the donor catchments shown here are not correct; this has been resolved for the upcoming release of the UKFE package.

# Find donor adjustment candidate sites. Show the first 6 rows.
Donors_NI <- DonAdj(CDs_NI, rows = 6)
Donors_NI

We will use 236005 and 203022 as our donors for this exercise. Use the QMED() help file to fill in the blanks. To use multiple donors, you will need to supply them as a vector.

We will assume that the URBEXT2000 value has already been adjusted for the present day and will therefore not apply an urban expansion factor. As the site is rural we are interested in the final rural QMED results which are returned by setting the ‘UrbAdj’ argument to ‘FALSE’.

For this exercise we will also adjust the observed QMED values of the donor sites for urbanisation using the ‘DonUrbAdj’ argument, to enable a like-for-like comparison with the catchment descriptor QMED value.

You will notice that when you use multiple donors, the associated weights for each donor are also returned from the QMED() function; we therefore need to extract only the QMED value in order for later functions to work.

# Estimate QMED using two donors (236005 and 203022)
QMED_NI_all_info <- QMED(CDs = CDs_NI, Don2 = c(236005, 203022), uef = FALSE, 
                         UrbAdj = FALSE, DonUrbAdj = TRUE)
QMED_NI_all_info

# Extract only the QMED value for use in later functions (this is in 
# the QMEDs.adj column)
QMED_NI <- QMED_NI_all_info$QMEDs.adj
QMED_NI

Create a pooling group. This is not an urban catchment based on URBEXT2000. This time we will not include arguments in the Pool() function that are unnecessary because we are happy with the default values. You can see the default values in the help file.

# Create the pooling group
Pool_NI_ungauged <- Pool(CDs_NI)

# View the pooling group
Pool_NI_ungauged

# Find the total number of years
sum(Pool_NI_ungauged$N)

# Assess the pooling group heterogeneity
H2(Pool_NI_ungauged)

For this exercise, we will assume that the pooling group does not require editing.

We can now use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions:

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_NI_ungauged) 

Use the pooling group to estimate the growth curve using the PoolEst() function with the generalised logistic distribution. This is not an urban catchment, so we will not set an urban adjustment, and therefore catchment descriptors do not need to be supplied. This time we will not include arguments in the PoolEst() and EVPool() functions that are unnecessary because we are happy with the default values. You can see the default values in the help file (for example, the gauged and UrbAdj arguments are FALSE by default). It can be helpful to include them for clarity, as in previous scenarios.

Remember that this is ungauged.

# Estimate the growth curve from the pooling group without using an 
# urban adjustment
Results <- PoolEst(x = Pool_NI_ungauged, QMED = QMED_NI, dist = "GenLog")
# View the results
Results

Plot the growth curve using the EVPool() function and then the flood frequency curve.

# Plot the growth curve
EVPool(x = Pool_NI_ungauged, dist = "GenLog")

# Plot the flood frequency curve by including the QMED value
EVPool(x = Pool_NI_ungauged, dist = "GenLog", QMED = QMED_NI, Title = "Flood frequency curve")

The plots for the ungauged scenarios show the pooling group single-site curves and the pooled growth and flood frequency curves.

Scenario 6: Estimate QMED for an ungauged urban catchment in England using two donors then estimate the growth curve

As for the previous scenarios, we start by clearing our environment:

As this is an ungauged catchment, we need to import the catchment descriptors from an xml file. This can be done using the CDsXML() function:

# Extract catchment descriptors from xml file for urban catchment in England
CDs_urban <- CDsXML("Data/FEH_Catchment_Descriptors_Scenario6.xml")
CDs_urban

We can use the DonAdj() function to find donor adjustment candidates for QMED. This provides results in order of proximity to the centroid of the subject catchment. See the help file for other arguments you can set.

# Find donor adjustment candidate sites. Show the first 20 rows.
Donors_urban <- DonAdj(CDs_urban, rows = 20)
Donors_urban

We will use 54034 and 54040 as our donors for this exercise, and will assume that the URBEXT2000 value has already been adjusted for the present day. As the site is urban we are interested in the final urbanised QMED results which are returned by setting the ‘UrbAdj’ argument to ‘TRUE’.

For this exercise we will also adjust the observed QMED values of the donor sites for urbanisation using the ‘DonUrbAdj’ argument, to enable a like-for-like comparison with the catchment descriptor QMED value.

Remember that when you use multiple donors, the associated weights for each donor are also returned from the QMED() function; we therefore need to extract only the QMED value in order for later functions to work.

# Estimate QMED using two donors (54034 and 54040)
QMED_urban_all_info <- QMED(CDs = CDs_urban, Don2 = c(54034, 54040), uef = FALSE, 
                            UrbAdj = TRUE, DonUrbAdj = TRUE)
QMED_urban_all_info

# Extract only the QMED value for use in later functions (this is in the 
# QMEDs.adj column)
QMED_urban <- QMED_urban_all_info$QMEDs.adj
QMED_urban

Create a pooling group using the Pool() function. This site is ungauged and urban so the default settings of FALSE are appropriate for the iug and DeUrb arguments because there will be no urban gauge in the pooling group.

# Create a pooling group
Pool_urban_ungauged <- Pool(CDs_urban)

# View the pooling group
Pool_urban_ungauged

# Assess the pooling group heterogeneity
H2(Pool_urban_ungauged)

You can use the DiagPlots() function to create diagnostic plots to aid review of the pooling group:

# Create diagnostic plots
DiagPlots(Pool_urban_ungauged)

We will assume that the pooling group does not need editing for the purposes of this exercise.

Use the Zdists() function on the pooling group to assess goodness-of-fit for a range of distributions.

# Set the seed to obtain a reproducible result from the Zdists function
set.seed(1)
# Assess goodness-of-fit for a range of distributions
Zdists(Pool_urban_ungauged)

Estimate the growth curve using the PoolEst() function. Remember that this is ungauged and urban; we will apply an urban adjustment to the growth curve, which means that we will also need to supply the catchment descriptors (or the urban extent, but we will supply the catchment descriptors here). Use the Kappa3 distribution.

# Estimate the growth curve from the pooling group using an urban adjustment
Results <- PoolEst(Pool_urban_ungauged, gauged = FALSE, QMED = QMED_urban, dist = "Kappa3", 
                   UrbAdj = TRUE, CDs = CDs_urban)
Results

Plot the growth curve and flood frequency curve using the EVPool() function. The UrbAdj argument needs to be set to TRUE to apply an urban adjustment to the pooled growth curve (we will set this for consistency with the results from the PoolEst() function). If this is the case, the CDs argument also needs including.

# Plot the growth curve
EVPool(Pool_urban_ungauged, gauged = FALSE, dist = "Kappa3", UrbAdj = TRUE, CDs = CDs_urban)

# Plot the flood frequency curve by including the QMED value to use to convert the growth curve to a flood frequency curve
EVPool(Pool_urban_ungauged, gauged = FALSE, dist = "Kappa3", UrbAdj = TRUE, CDs = CDs_urban, 
       QMED = QMED_urban, Title = "Flood frequency curve")

Other useful UKFE functions

There are a range of functions that exist to provide flexibility but may not be used in a standard workflow for FEH Statistical analysis. This section covers some of these additional functions.

Hydrology at speed: Getting to know QuickResults

If you just need a quick set of results for an FEH Statistical analysis which does not involve making a lot of user decisions (but instead uses some default assumptions) then the UKFE package has a quick and easy way of obtaining a range of common results. This starts in a familiar way:

The QuickResults() function can be used to obtain a data frame of return periods and their corresponding peak flow estimates, growth factor estimates, and some uncertainty bounds. It also provides the estimated L-CV and L-skew from the pooling group. If the default return periods of 2, 5, 10, 20, 50, 75, 100, 200, 500, 1000 are chosen then it also returns the distribution parameters for the growth curve, and the distribution parameters for the frequency curve.
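
A minimal default call looks like the following (a sketch using an example gauge; the default settings, including the generalised logistic distribution, are described in the help file):

```r
# Obtain catchment descriptors for an example gauge
cds_27009 <- GetCDs(27009)

# Run QuickResults with its default settings (generalised logistic 
# distribution and a plot; see the help file for the default number of donors)
QuickResults(cds_27009)
```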

However, just because the QuickResults() function offers a one-stop shop for generating results directly from catchment descriptors, this doesn’t mean that it lacks customisable options. From the default example you may be able to guess some of the customisations available via the function’s arguments:

  • dons can be set as 0, 1, or 2 to reflect the number of donors required

  • There are other distributions available that aren’t just the default GenLog; these include GEV, Kappa3, and Gumbel

  • Plotting can be turned off by setting plot = FALSE

However, this isn’t the only customisation available; maybe we’re dealing with a gauged scenario, in which case we can specify the QMED for the gauge:
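
As a hedged sketch of the gauged case (the gauged and QMED argument names here are assumptions based on the PoolEst() function, so check the QuickResults() help file for the exact names):

```r
# Obtain catchment descriptors and an observed QMED for an example gauge
cds_27009 <- GetCDs(27009)
QMED_27009 <- median(GetAM(27009)$Flow)

# Treat the site as gauged and supply the observed QMED
# (argument names assumed; see ?QuickResults)
QuickResults(cds_27009, gauged = TRUE, QMED = QMED_27009)
```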

One other option to consider is that the FUngauged argument can be used to obtain an ungauged estimate by excluding the gauged site (this being the site with the most similar catchment descriptors). The results will be the same as the ungauged option above.

Let’s consolidate this knowledge with the next exercise.

Obtain some quick results for the gauge at Kingston upon Thames (let’s pretend it’s ungauged). Try using the “Gumbel” distribution, with 1 donor, and make sure you don’t forget to output a plot!

QuickResults(cds_39001, dist = "Gumbel", ______ = 1, plot = ______)

Consult the documentation or the above examples for more information about the variables in QuickResults().

# Obtain the catchment descriptors for gauge 39001
cds_39001 <- GetCDs(39001)

# Run QuickResults 
QuickResults(cds_39001, dist = "Gumbel", dons = 1, plot = TRUE)

Just add AMAX: Instant return periods with distribution-specific functions

A wide range of distributions can be applied to an annual maximum (AMAX) flow series to allow the user to estimate quantiles or return periods, e.g. for a single-site analysis. The following example uses the generalised logistic distribution:
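
For example (a sketch using the GenLogAM() function, whose interface also appears in the exercise below):

```r
# Extract AMAX data for an example gauge and keep the flow column
am_27009 <- GetAM(27009)$Flow

# Estimate the 100-year flow using the generalised logistic distribution
GenLogAM(am_27009, RP = 100)

# Estimate the return period associated with a flow of 300 m3/s
GenLogAM(am_27009, q = 300)
```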

Similar workflows follow for the GEVAM(), GumbelAM(), and Kappa3AM() functions.

However, the generalised Pareto distribution function works slightly differently from the others:

Let’s apply this knowledge in the following exercise.

In this exercise we would like to estimate the 1000-year return period flow and the return period of a 450m3/s flow directly from the AMAX data for gauge 27009, using the generalised logistic distribution.

# Use GetAM to extract sample AMAX data
am_27009 <- GetAM(27009)

# Estimate the 1000-year return period flow for this sample data using the 
# generalised logistic distribution
GenLogAM(am_27009$Flow, RP = ______)

# Estimate the RP for a 450m3/s discharge
GenLogAM(am_27009$Flow, q = ______)
# Use GetAM to extract sample AMAX data
am_27009 <- GetAM(27009)

# Estimate the 1000-year return period flow for this sample data using the 
# generalised logistic distribution
GenLogAM(am_27009$Flow, RP = 1000)

# Estimate the RP for a 450m3/s discharge
GenLogAM(am_27009$Flow, q = 450)

Modelling the extremes: Fitting distributions and estimating return periods

There is an alternative workflow to the AM family of functions, which makes the parameters of the distribution available for reporting:

The GenLogPars function allows users a whole range of options for estimating the distribution parameters:

These parameters can then be used to estimate quantiles or return periods. This is similar to the GenLogAM function but with slightly more flexibility as a total workflow when combined with the previous steps:
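
Putting this together (a sketch assuming GenLogPars() and GenLogEst() mirror the GEVPars() and GEVEst() interfaces used in the exercise below):

```r
# Extract AMAX flows for an example gauge
flows_27009 <- GetAM(27009)$Flow

# Estimate the generalised logistic parameters using L-moments 
# (mle = FALSE) so that they are available for reporting
pars <- as.numeric(GenLogPars(flows_27009, mle = FALSE))
pars

# Use the fitted parameters to estimate the 100-year flow
GenLogEst(loc = pars[1], scale = pars[2], shape = pars[3], RP = 100)
```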

A similar workflow exists for the functions created for the GEV, Gumbel, GenPareto, and Kappa3 distributions.

Let’s implement our learning in the next exercise.

We want to find the 100-year return period flow when gauge 27028 is approximated by the GEV distribution. This will involve the following steps:

  • Extract the AMAX data for this gauge

  • Then, estimate the parameters for the GEV distribution, without using maximum likelihood estimation

  • Use these parameters to estimate the 100-year return period flow

# Use GetAM to extract sample AMAX data
am_27028 <- GetAM(27028)

# Subset for just the flow AMAX values
flow_27028 <- am_27028$______

# Estimate the GEV parameters from sample data 
# (do not use maximum likelihood estimation)
gev_pars_27028 <- GEVPars(flow_27028, mle = _______)
gev_pars_27028

# Estimate the 100-year return period flow for this sample data using the 
# GEV distribution
gev_pars_27028 <- as.numeric(gev_pars_27028)
GEVEst(loc = gev_pars_27028[1], scale = gev_pars_27028[2], 
          shape = gev_pars_27028[3], RP = _______)
# Use GetAM to extract sample AMAX data
am_27028 <- GetAM(27028)

# Subset for just the flow AMAX values
flow_27028 <- am_27028$Flow

# Estimate the GEV parameters from sample data 
# (do not use maximum likelihood estimation)
gev_pars_27028 <- GEVPars(flow_27028, mle = FALSE)
gev_pars_27028

# Estimate the 100-year return period flow for this sample data using the 
# GEV distribution
gev_pars_27028 <- as.numeric(gev_pars_27028)
GEVEst(loc = gev_pars_27028[1], scale = gev_pars_27028[2], 
          shape = gev_pars_27028[3], RP = 100)

Taking a moment: Using L-moments in UKFE

Moments before the flood: Finding growth factors from L-moments

It’s very simple to use the Lcv(), LSkew(), LKurt(), and LMoments() functions to obtain the L-moments of a set of sample data. You will see that the LMoments() function includes the preceding functions:
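
For example (a sketch using an example gauge):

```r
# Extract AMAX flows for an example gauge
flows_27009 <- GetAM(27009)$Flow

# Individual L-moment ratios
Lcv(flows_27009)
LSkew(flows_27009)
LKurt(flows_27009)

# The LMoments() function returns the full set in one call
LMoments(flows_27009)
```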

Now that the various L-moments have been calculated for this gauge, we can easily obtain growth factors for a whole range of distributions:
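
For example (a sketch; the order of the L-moment arguments is an assumption by analogy with the GumbelGF() call in the exercise below, so check the help files):

```r
# L-moment ratios for an example gauge
flows_27009 <- GetAM(27009)$Flow
lcv <- Lcv(flows_27009)
lskew <- LSkew(flows_27009)

# 100-year growth factors from two distributions
GenLogGF(lcv, lskew, RP = 100)
GumbelGF(lcv, RP = 100)
```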

The GenLogGF(), GenParetoGF(), GumbelGF(), and Kappa3GF() functions are all similar, except that GumbelGF() does not use L-skew and GenParetoGF() has an extra peaks per year (ppy) setting.

Let’s jump into the next exercise!

Find the L-CV and L-skew for gauge 54906 and then estimate the 1000-year growth factor using the Gumbel distribution.

# Use GetAM to extract sample AMAX data for gauge 54906
am_54906 <- GetAM(54906)$Flow

# Calculate the L-CV and L-skew of the sample
lcv_54906 <- _______(am_54906)
lskew_54906 <- _______(am_54906)

# Estimate the 1000-year growth factor from the L-CV obtained in 
# the previous step using the Gumbel distribution
GumbelGF(lcv_54906, RP = _______)

Remember to use the Lcv and LSkew functions in the right place.

We want the 1000-year growth factor.

# Use GetAM to extract sample AMAX data for gauge 54906
am_54906 <- GetAM(54906)$Flow

# Calculate the L-CV and L-skew of the sample
lcv_54906 <- Lcv(am_54906)
lskew_54906 <- LSkew(am_54906)

# Estimate the 1000-year growth factor from the L-CV obtained in 
# the previous step using the Gumbel distribution
GumbelGF(lcv_54906, RP = 1000)

How’s the water? A deep dive into pooling analysis

This section recaps ideas presented in the FEH scenarios, plus some additional options.

This setup will be a familiar workflow when using the UKFE package:

At this point, there are a range of options available to users. If a user is interested in the heterogeneity of their pooling group then they can leverage the H2() function to calculate this, with an option to calculate the H1 version of the test as well:

It is also very simple to calculate the Z distances, a goodness-of-fit measure for the pooling group:

It is possible that users may be interested in viewing the gauged L-CV and L-skew weights of their pooling group:

A similar workflow is available if the scenario is ungauged, using the WeightsUnLcv() and WeightsUnLSkew() functions.
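
The workflow described in this section can be sketched as follows (using an example gauge):

```r
# Catchment descriptors and pooling group for an example gauge
cds_27009 <- GetCDs(27009)
pool_27009 <- Pool(cds_27009)

# Heterogeneity (H2 only, then with the H1 version of the test as well)
H2(pool_27009)
H2(pool_27009, H1 = TRUE)

# Goodness-of-fit (Z distances) for a range of distributions
set.seed(1)
Zdists(pool_27009)

# Gauged L-CV and L-skew weights for each site in the pooling group
WeightsGLcv(pool_27009)
WeightsGLSkew(pool_27009)

# Ungauged equivalents
WeightsUnLcv(pool_27009)
WeightsUnLSkew(pool_27009)
```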

Let’s recap the learning from this section with another exercise.

In this exercise, we’d like to create a pooling group for gauge 49008 and then find out: how heterogeneous it is, using just the H2 score (set H1 accordingly); which distribution best fits the pooling group; and finally, the gauged L-CV and L-skew weights for each site in the group.

# Get catchment descriptors for gauge 49008
cds_49008 <- GetCDs(49008)

# Create a pooling group for gauge 49008
pool_49008 <- Pool(cds_49008)

# Calculate the heterogeneity measure H2 of the pooling group
H2(pool_49008, H1 = ______)

# Calculate the goodness of fit score for the pooling group
_______(pool_49008)

# View the gauged L-CV and L-skew weights for each site in the pooling group
WeightsGLcv(pool_49008) # L-CV
WeightsGLSkew(pool_49008) # L-skew

Remember, we are only interested in the H2 score so H1 needs to be set accordingly. Consult the earlier examples from this section for a similar workflow.

# Get catchment descriptors for gauge 49008
cds_49008 <- GetCDs(49008)

# Create a pooling group for gauge 49008
pool_49008 <- Pool(cds_49008)

# Calculate the heterogeneity measure H2 of the pooling group
H2(pool_49008, H1 = FALSE)

# Calculate the goodness of fit score for the pooling group
Zdists(pool_49008)

# View the gauged L-CV and L-skew weights for each site in the pooling group
WeightsGLcv(pool_49008) # L-CV
WeightsGLSkew(pool_49008) # L-skew

Moments in the city: Using LcvUrb and LSkewUrb for urban adjustment

We can use the QuickResults function to obtain pooled L-CV and L-skew estimates for gauge 53006 using just the catchment descriptors:

From here, it is straightforward to perform an urban adjustment to the L-CV and L-skew:
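
A sketch of these two steps for gauge 53006 (mirroring the exercise below):

```r
# Pooled L-CV and L-skew for gauge 53006 directly from catchment descriptors
cds_53006 <- GetCDs(53006)
lmom_53006 <- QuickResults(cds_53006, plot = FALSE)[[2]]
lmom_53006

# Urbanise the pooled L-CV and L-skew using the catchment's URBEXT2000
urbext_53006 <- cds_53006$Value[cds_53006$Descriptor == "URBEXT2000"]
LcvUrb(lmom_53006[1], URBEXT2000 = urbext_53006, DeUrb = FALSE)
LSkewUrb(lmom_53006[2], URBEXT2000 = urbext_53006, DeUrb = FALSE)
```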

If we had set the DeUrb argument to TRUE then the de-urbanisation adjustment would have been performed instead.

Let’s explore a similar workflow in the following exercise.

Use QuickResults() to extract some basic estimates for the pooled L-CV and L-skew of gauge 39003 and then urbanise them.

# Get catchment descriptors for gauge 39003
cds_39003 <- GetCDs(39003)

# Extract the pooled L-CV and L-skew of the pooling group
lmom_39003 <- _______(cds_39003, plot = FALSE)[[2]]
lmom_39003

# Urbanise the L-CV and L-skew
lcv_39003 <- LcvUrb(lmom_39003[1], URBEXT2000 = cds_39003$Value[cds_39003$Descriptor == "URBEXT2000"], 
                    DeUrb = _______)
lskew_39003 <- LSkewUrb(lmom_39003[2], URBEXT2000 = cds_39003$Value[cds_39003$Descriptor == "URBEXT2000"], 
                        DeUrb = _______)
lcv_39003
lskew_39003

Make sure you remember to urbanise rather than de-urbanise the L-CV and L-skew.

# Get catchment descriptors for gauge 39003
cds_39003 <- GetCDs(39003)

# Extract the pooled L-CV and L-skew of the pooling group
lmom_39003 <- QuickResults(cds_39003, plot = FALSE)[[2]]
lmom_39003

# Urbanise the L-CV and L-skew
lcv_39003 <- LcvUrb(lmom_39003[1], URBEXT2000 = cds_39003$Value[cds_39003$Descriptor == "URBEXT2000"], DeUrb = FALSE)
lskew_39003 <- LSkewUrb(lmom_39003[2], URBEXT2000 = cds_39003$Value[cds_39003$Descriptor == "URBEXT2000"], DeUrb = FALSE)
lcv_39003
lskew_39003

Going the distance: Getting to know NGRDist

The NGRDist() function is a simple utility for finding the distance (in kilometres) between two British National Grid reference points.
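As a minimal sketch, each point is supplied as a vector of easting and northing; the coordinates below are hypothetical, chosen purely for illustration:

```r
# Distance (in km) between two hypothetical British National Grid points,
# each given as c(easting, northing)
NGRDist(i = c(355000, 400000), j = c(360000, 405000))
```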

Let’s find the Pythagorean (straight-line) distance between the gauge on the Weisdale Burn at Weisdale Mill and the gauge on the Penberth River at Penberth in the following exercise. As this map shows, they’re quite far apart!

Extract the catchment descriptors for the two gauges, then use the values of Easting and Northing to calculate the distance (in km) between them using the NGRDist() function.

Consult the above example for more information about how to use the catchment descriptors in the NGRDist() function.

# Obtain the catchment descriptors for gauges 108001 and 49008 using the GetCDs function
cds_108001 <- GetCDs(108001)
cds_49008 <- GetCDs(49008)

# Calculate the Pythagorean distance between the two sets of coordinates
NGRDist(i = c(cds_108001$Value[cds_108001$Descriptor == "Easting"], 
              cds_108001$Value[cds_108001$Descriptor == "Northing"]), 
        j = c(cds_49008$Value[cds_49008$Descriptor == "Easting"], 
              cds_49008$Value[cds_49008$Descriptor == "Northing"]))

When nothing happens: Adjusting for non-flood years with NonFloodAdj and NonFloodAdjPool

The NonFloodAdj function can be used to calculate L-CV and L-skew while accounting for any non-flood years.
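A minimal sketch of a call (using gauge 39003, which appears elsewhere in this training; the flow values are in the second column of the GetAM output):

```r
# Extract the AMAX series and pass just the flow column to NonFloodAdj
am_39003 <- GetAM(39003)
NonFloodAdj(am_39003[, 2])
```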

A similar workflow follows for the NonFloodAdjPool function, which allows this same adjustment to be made to all, or a subset of, sites in a pooling group:
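A sketch with the default settings, reusing the pooling group for gauge 49008 built earlier in this section:

```r
# Build a pooling group and apply the non-flood year adjustment across it
cds_49008 <- GetCDs(49008)
pool_49008 <- Pool(cds_49008)
pool_49008_adj <- NonFloodAdjPool(pool_49008)
```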

The NonFloodAdjPool function has several options that give the user more control.

If we would like to view the non-flood year statistics before choosing which indices to adjust, or which percentage threshold to use, we can simply set ReturnStats = TRUE as follows:
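A minimal sketch of that call, again using the pooling group for gauge 49008 from earlier in this section:

```r
# View the non-flood year statistics without adjusting the pooling group
cds_49008 <- GetCDs(49008)
pool_49008 <- Pool(cds_49008)
NonFloodAdjPool(pool_49008, ReturnStats = TRUE)
```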

This then allows the user to make an informed decision on the indices they would like to adjust or the non-flood year percentage they would like to allow:

These adjusted L-CV and L-skew values can then be used for deriving growth factors using a range of distributions.

Let’s consolidate our learning with a couple of exercises:

Use the GetAM() function to extract the AMAX series for gauge 24009, then subset this to just the flow values to use as the input to the NonFloodAdj() function.

# Extract the AMAX series for gauge 24009
am_24009 <- GetAM(24009)

# Remove the non-flood years from the AMAX series and calculate the adjusted 
# L-CV and L-skew
NonFloodAdj(am_24009______)

Consult the above examples for more information about how to use the NonFloodAdj() function.

# Extract the AMAX series for gauge 24009
am_24009 <- GetAM(24009)

# Remove the non-flood years from the AMAX series and calculate the adjusted 
# L-CV and L-skew
NonFloodAdj(am_24009[,2])

Let’s explore a potential workflow in the following exercise.

We want to build a pooling group for gauge 11002. We can then use this in the NonFloodAdjPool() function to identify the gauge with the highest non-flood year percentage and then apply the adjustment to just this gauge.

# Obtain the catchment descriptors for gauge 11002
cds_11002 <- GetCDs(11002)

# Build the pooling group for gauge 11002
pool_11002 <- Pool(cds_11002)

# Return the Non Flood stats
NonFloodAdjPool(pool_11002, ReturnStats = TRUE)

# Apply the NonFloodAdjPool function to the index with the highest 
# percentage of non-flood years
pool_11002_nf <- NonFloodAdjPool(pool_11002, Index = _______)

Check which row has the highest value for the PercentNonFlood column and use this as the index in the NonFloodAdjPool() function.

# Obtain the catchment descriptors for gauge 11002
cds_11002 <- GetCDs(11002)

# Build the pooling group for gauge 11002
pool_11002 <- Pool(cds_11002)

# Return the Non Flood stats
NonFloodAdjPool(pool_11002, ReturnStats = TRUE)

# Apply the NonFloodAdjPool function to the index with the highest 
# percentage of non-flood years
pool_11002_nf <- NonFloodAdjPool(pool_11002, Index = c(5))

Uncertainty

The uncertainty of gauged or ungauged pooled estimates can be obtained in the UKFE package via the following workflow:
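A sketch of the ungauged version of this workflow, where the gauge is excluded from its own pooling group and QMED is estimated from catchment descriptors (this mirrors the pattern used in the exercise solution below; gauge 53006 is used here purely as an example):

```r
# Catchment descriptors and an ungauged pooling group for gauge 53006
cds_53006 <- GetCDs(53006)
pool_ung_53006 <- Pool(cds_53006, exclude = 53006)

# Uncertainty intervals for the ungauged pooled estimates
Uncertainty(pool_ung_53006, Gauged = FALSE, qmed = QMED(cds_53006))
```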

This is the most basic example; however, the Uncertainty function offers many more options:

What if we don’t want to use the default generalised logistic distribution? And maybe we want to apply an urban adjustment:

I hear you thinking: what if I want the bootstrapping to be performed by resampling with replacement instead of by simulation from a fitted distribution?

Sometimes the central estimate of the single-site estimate falls outside the intervals of the enhanced single-site estimate. UKFE anticipates this by providing the IncAMest argument, which allows users to expand the confidence interval to include the single-site central estimate. However, we can choose to turn this off if we prefer:

Maybe we’ve been asked for 50% confidence levels, a factorial standard error of 1.2, and no plot?

We can then use the data frame of return periods and their corresponding peak flow estimates generated by the Uncertainty function to estimate distribution parameters using the OptimPars function:
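A sketch of that hand-off, passing the first two columns of the Uncertainty output (return period and central flow estimate) to OptimPars (gauge 53006 is used here purely as an example, with the ungauged workflow shown earlier):

```r
# Uncertainty output for an ungauged pooling group for gauge 53006
cds_53006 <- GetCDs(53006)
pool_ung_53006 <- Pool(cds_53006, exclude = 53006)
unc_53006 <- Uncertainty(pool_ung_53006, Gauged = FALSE,
                         qmed = QMED(cds_53006), Plot = FALSE)

# Fit distribution parameters (generalised logistic by default) to the
# return periods and their corresponding flow estimates
OptimPars(unc_53006[, 1:2])
```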

As with many other functions in the package, the distribution is set to the generalised logistic by default but is easily customised according to the user’s requirements, with GEV, Kappa3, and Gumbel all available.

Let’s try to apply this potential workflow in the following exercise.

Follow the example workflow:

  • Extract the catchment descriptors

  • Use these to form a pooling group, but exclude the gauge itself to obtain the ungauged pooling group

  • Then use this in the Uncertainty() function with the GEV distribution and 90% confidence intervals. Make sure you set the Gauged argument appropriately. Estimate QMED using UKFE’s QMED() function, and don’t output a plot.

  • The output from this can be used as an input for the OptimPars function. Use the same distribution as for the Uncertainty() function for consistency.

# Obtain the catchment descriptors for gauge 203018
cds_203018 <- GetCDs(203018)

# Create the ungauged pooling group for gauge 203018 by excluding it
# from the process
pool_ung_203018 <- Pool(cds_203018, exclude = 203018)

# Quantify the uncertainty of the ungauged pooled results
uncertainty_203018 <- Uncertainty(pool_ung_203018, Dist = "GEV", Conf = ______, 
                                  Gauged = ______, qmed = QMED(cds_203018), Plot = ______)
                                  
# Extract the return periods and their respective flows
flow_est_203018 <- uncertainty_203018[, 1:2]

# Estimate the distribution parameters for gauge 203018
OptimPars(flow_est_203018, dist = ______)

Consult the documentation or the above examples for more information about the variables in Uncertainty() and OptimPars().

# Obtain the catchment descriptors for gauge 203018
cds_203018 <- GetCDs(203018)

# Create the ungauged pooling group for gauge 203018 by excluding it 
# from the process
pool_ung_203018 <- Pool(cds_203018, exclude = 203018)

# Quantify the uncertainty of the ungauged pooled results
uncertainty_203018 <- Uncertainty(pool_ung_203018, Dist = "GEV", Conf = 0.9, 
                                  Gauged = FALSE, qmed = QMED(cds_203018), Plot = FALSE)

# Extract the return periods and their respective flows
flow_est_203018 <- uncertainty_203018[, 1:2]

# Estimate the distribution parameters for gauge 203018
OptimPars(flow_est_203018, dist = "GEV")

Modelling storm response with ReFH

The UKFE package aims to replicate the ReFH model in an easy-to-use but highly customisable function:
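In its simplest form, the function needs only a set of catchment descriptors; a minimal sketch:

```r
# Run the ReFH model with default settings for gauge 203018
cds_203018 <- GetCDs(203018)
ReFH(CDs = cds_203018)
```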

Further customising arguments are available to users:

Suppose we’ve been given the maximum soil moisture capacity, initial soil moisture content, time to peak, baseflow lag, baseflow recharge, event rainfall, catchment area, initial baseflow, the return period for alpha adjustment, and finally, the season. UKFE’s ReFH function can accommodate this as follows:

Now let’s apply this function in the next exercise:

Use the catchment descriptors for gauge 203018 as the input for the ReFH() function with the following customisations:

  • Set the duration as 20

  • Set the plot title as “Design Hydrograph - Site 203018”

  • Set the season to summer

  • Use a return period for alpha adjustment of 0.25 (remember about the alpha argument)

The required arguments are CDs, duration, PlotTitle, season, alpha, and RPa. 

Consider the previous examples and help documentation for what values should be used.

# Obtain the catchment descriptors for gauge 203018
cds_203018 <- GetCDs(203018)

# Run ReFH 
ReFH(
  CDs = cds_203018,
  duration = 20,
  PlotTitle = "Design Hydrograph - Site 203018",
  season = "summer",
  alpha = TRUE,
  RPa = 0.25
)   

There are many more combinations of parameters for you to explore and leverage!

When the catchment goes concrete: Understanding UAF

Although the urban adjustment is available directly within the QMED function, it is also provided as a standalone function for full transparency and reporting purposes. The exercise below shows how the UAF (urban adjustment factor) and PRUAF (percentage runoff urban adjustment factor) can be calculated:

Now apply this in the following exercise:

Use the GetCDs function to obtain the catchment descriptors for NRFA gauge 53006 and then use these catchment descriptors as the input to the UAF() function to obtain the UAF and PRUAF.

# Obtain the catchment descriptors for gauge 53006
cds_53006 <- GetCDs(53006)  

Now use this as the input for the UAF() function.

# Obtain the catchment descriptors for gauge 53006
cds_53006 <- GetCDs(53006)                              

# Calculate the UAF and PRUAF of gauge 53006
UAF(cds_53006)