# fasstr Users Guide

fasstr, the Flow Analysis Summary Statistics Tool for R, is a set of R functions to tidy, summarize, analyze, trend, and visualize streamflow data. This package summarizes continuous daily mean streamflow data into various daily, monthly, annual, and long-term statistics and completes trending and frequency analyses, with outputs in both table and plot formats.

This vignette documents the usage of the many functions and arguments provided in fasstr. It is a high-level companion to the details found in the documentation of each function (see help(package = "fasstr")). You’ll learn how to install the package and a HYDAT database, input data into fasstr functions, add relevant columns and rows to daily data, screen data for outliers and missing dates, calculate and visualize various summary statistics, trend annual flows, and complete volume frequency analyses.

A quick-reference PDF cheat sheet for fasstr functions and arguments is also available for download.

This guide contains the following sections to help understand the usage of the fasstr functions and arguments:

1. Getting Started
2. Flow Data Inputs
3. Function Types and Outputs
4. Data Tidying (fill_* and add_* functions)
5. Data Screening (screen_* functions)
6. Calculating Statistics (calc_* functions)
7. Analyses (compute_* functions)
8. Customizing Functions - Data filtering and options
9. Writing Tables and Plots (write_* functions)

## 1. Getting Started

You can install fasstr directly from CRAN:

install.packages("fasstr")

To install the development version from GitHub, first install the remotes package, then use it to install fasstr:

if(!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("bcgov/fasstr")

Several other packages will be installed with fasstr. These include tidyhydat for downloading Water Survey of Canada hydrometric data, zyp for trending, ggplot2 for creating plots, and tidyr and dplyr for data wrangling and summarizing, amongst others.

To call fasstr functions you can either load the package using the library() function or access a specific function using a double-colon (e.g. fasstr::calc_daily_stats()). fasstr exports the pipe, %>%, so it can be used for tidy workflows.

library(fasstr)

To use the station_number argument of the fasstr functions, you will need to download a Water Survey of Canada HYDAT database to your computer using the following tidyhydat function. The function will save the database on your computer and know where to find it each time you open R or RStudio. Due to the size of the database, it will take several minutes to download.

tidyhydat::download_hydat()

As HYDAT is updated frequently, you may want to update your local copy periodically by rerunning the function above. You can check the local version using the following code:

tidyhydat::hy_version()

## 2. Flow Data Inputs

All functions in fasstr require a daily mean streamflow data set from one or more hydrometric stations. Long-term and continuous data sets are preferred for most analyses, but seasonal and partial data can be used. Note that if partial data sets are used, NA values may be produced for certain statistics. Please see the ‘Handling Missing Dates’ section in Section 8 for more information. Data is provided to each function using one of the following arguments:

• data, as a data frame of daily flow values, or
• station_number, as a character vector of one or more Water Survey of Canada HYDAT station numbers.

### data (and dates, values, and groups)

When using the data argument, you supply a data frame of daily data containing a column of dates (YYYY-MM-DD, in date format), a column of values (mean daily discharge in cubic metres per second, in numeric format), and, optionally, a column of grouping identifiers (character strings of station names or numbers). By default, the functions look for columns named ‘Date’, ‘Value’, and ‘STATION_NUMBER’, respectively, to be compatible with the default HYDAT columns. Columns with other names can be identified using the dates, values, and groups arguments (e.g. values = Yield_mm). The column names do not need to be quoted; both "Date" and Date will select the column called “Date”. Groupings other than station numbers can also be used, for example distinct time periods of a study at a single station (before, during, and after watershed experiment treatments, or before and after the construction of a dam), provided each period is identified in a grouping column. The following is an example of an appropriate data frame with the default column names (STATION_NUMBER is not required):

  STATION_NUMBER       Date Value
1        08NM116 1949-04-01  1.13
2        08NM116 1949-04-02  1.53
3        08NM116 1949-04-03  2.07
4        08NM116 1949-04-04  2.07
5        08NM116 1949-04-05  2.21
6        08NM116 1949-04-06  2.21

The following is an example of a fasstr function call if your daily data frame has the default column names (no need to list them):

calc_longterm_daily_stats(data = flow_data)

The following is an example if your daily data frame has the non-default column names “Stations”, “Dates”, and “Flows”:

calc_longterm_daily_stats(data = flow_data,
                          dates = Dates,
                          values = Flows,
                          groups = Stations)

The data argument is listed first in each function, so in a tidy workflow a flow data frame can be passed to fasstr functions using the pipe operator, %>%, without naming the data frame in the call.

### station_number

Alternatively, you can extract flow data directly from a HYDAT database by listing station numbers in the station_number argument while leaving the data argument blank. Data frames from HYDAT also include ‘Parameter’ and ‘Symbol’ columns. The following is an example of listing stations:

calc_longterm_daily_stats(station_number = "08NM116")
calc_longterm_daily_stats(station_number = c("08NM116", "08NM242"))

This package allows multiple stations (or other groupings) to be analyzed in many of the functions, provided they are identified using the groups column argument (defaults to STATION_NUMBER). If the named grouping column doesn’t exist or is improperly named, then all values listed in the values column will be summarized together.

## 3. Function Types and Outputs

fasstr provides various functions to help in streamflow analyses. They can be generally categorized into the following groups (with more details in the sections below):

• data tidying (to prepare data for analyses; add_* and fill_* functions),
• data screening (to look for outliers and missing data; screen_* functions),
• calculating summary statistics (long-term, annual, monthly and daily statistics; calc_* functions),
• computing analyses (volume frequency analyses and trending; compute_* functions),
• visualizing data (plotting the various statistics; plot_* functions), and
• writing data (to save your data and plots; write_* functions)

### Tibble Data Frames

Functions that produce tables create them as tibble data frames. To facilitate the writing of the fasstr tibbles to a directory as .csv, .xls, or .xlsx files with some functionality of rounding digits, the write_results() function can be used (see section 9 for more information).

### ggplot2 Plots

Functions that produce plots create them as lists of ggplot2 objects. The use of ggplot2 plots allows the user to further customize the plots (axis titles, colours, etc.). All plotting functions produce lists to be consistent with the table naming conventions of fasstr, to allow multiple plots to be created with one function, and to make it easy to save multiple plots to a directory. To assist with saving lists of plots, the provided write_plots() function will save the list of plots directly within a directory or a single PDF document, using the names of the fasstr plot objects (see section 9 for more information). Individual plots can be subsetted from their lists using either the dollar sign, $ (e.g. one_plot <- plots$plotname), or double square brackets, [[ ]] (e.g. one_plot <- plots[["plotname"]] or one_plot <- plots[[1]]).

Some functions produce both tibbles and plots as lists and can be subsequently subsetted as desired.

## 4. Data Tidying Functions

There are several functions that are used to prepare your flow data set for your own analysis. These functions begin with add_ or fill_ and add columns or rows, respectively, to your flow data frame. These functions include:

• fill_missing_dates() - fills in missing dates or dates with no flow values with NA
• add_date_variables() - add year, month, and day of year variables (and water years if selected)
• add_seasons() - add a column of seasons
• add_rolling_means() - add rolling n-day averages (e.g. 7-day rolling average)
• add_basin_area() - add a basin area column to daily flows
• add_daily_volume() - add daily volumetric flows (in cubic metres)
• add_daily_yield() - add daily water yields (in millimetres)
• add_cumulative_volume() - add daily cumulative volumetric flows on an annual basis (in cubic metres)
• add_cumulative_yield() - add daily cumulative water yields on an annual basis (in millimetres)

The functions are set up to easily incorporate the use of the pipe operator:

fill_missing_dates(station_number = "08HA011") %>%
  add_rolling_means(roll_days = 7)
# A tibble: 21,915 x 11
STATION_NUMBER Date       Parameter Value Symbol CalendarYear Month MonthName
<chr>          <date>     <chr>     <dbl> <chr>         <dbl> <dbl> <fct>
1 08HA011        1960-01-01 Flow       62.9 E              1960     1 Jan
2 08HA011        1960-01-02 Flow       58   E              1960     1 Jan
3 08HA011        1960-01-03 Flow       54.9 E              1960     1 Jan
4 08HA011        1960-01-04 Flow       51.3 E              1960     1 Jan
5 08HA011        1960-01-05 Flow       47.3 <NA>           1960     1 Jan
6 08HA011        1960-01-06 Flow       46.7 <NA>           1960     1 Jan
7 08HA011        1960-01-07 Flow       43.9 E              1960     1 Jan
8 08HA011        1960-01-08 Flow       41.9 E              1960     1 Jan
9 08HA011        1960-01-09 Flow       40.8 E              1960     1 Jan
10 08HA011        1960-01-10 Flow       38.5 E              1960     1 Jan
# ... with 21,905 more rows, and 3 more variables: WaterYear <dbl>,
#   DayofYear <dbl>, Q7Day <dbl>

### Filling missing dates

To ensure that analyses do not skip over dates, the fill_missing_dates() function looks for gaps in the dates, adds the missing dates, and fills in their flow values with NA. It does not do any gap filling (by linear interpolation or correlation, for example); it only assigns NA to the missing flow values. It also fills dates to create complete start and end years. For example, if the data starts in April, all dates from January onward will be added with NA flow values. The timing of the year depends on the water_year_start argument. When water_year_start is left blank, the function fills to complete calendar years (Jan-Dec). If water_year_start is set to another month (numeric), it fills to complete water years starting in that month.

Run and compare the following lines to see how missing dates are filled:

# Very gappy (early years):
tidyhydat::hy_daily_flows(station_number = "08NM116")

# Gap filled with NA's
tidyhydat::hy_daily_flows(station_number = "08NM116") %>%
  fill_missing_dates()
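The padding behaviour described above can be sketched in base R without a HYDAT database. This is an illustration only, not fasstr's implementation: fill_dates_sketch() is a hypothetical helper, it handles calendar years only, and the three-row input is made up to contain a gap.

```r
# Hypothetical helper, not part of fasstr: pad a daily record with NA rows
# so that every calendar year it touches runs from Jan 1 to Dec 31.
fill_dates_sketch <- function(flow) {
  years <- as.integer(format(range(flow$Date), "%Y"))
  all_dates <- data.frame(
    Date = seq(as.Date(paste0(years[1], "-01-01")),
               as.Date(paste0(years[2], "-12-31")),
               by = "day")
  )
  # Joining onto the complete date sequence leaves NA where nothing was recorded
  filled <- merge(all_dates, flow, by = "Date", all.x = TRUE)
  filled[order(filled$Date), ]
}

# Illustrative record: starts in April and skips April 3 and 4
gappy <- data.frame(
  Date  = as.Date(c("1949-04-01", "1949-04-02", "1949-04-05")),
  Value = c(1.13, 1.53, 2.21)
)

filled <- fill_dates_sketch(gappy)
nrow(filled)              # one row per day of 1949
sum(is.na(filled$Value))  # days without a recorded flow
```

No values are estimated here; as in the description above, the added rows only carry NA.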

It is best to fill missing dates before using the add_* functions, so that the rows added for missing dates also receive values in the new columns.

### Adding date variables and seasons

The add_date_variables() function adds useful date columns for summarizing data. By default it adds ‘CalendarYear’, ‘Month’ (numeric), ‘MonthName’ (month abbreviation; e.g. Jan), ‘WaterYear’ (year based on the selected water_year_start), and ‘DayofYear’ (the day of the year, counted from the selected water_year_start; 1-365, or 366 in leap years). The month of the start of the water year is chosen using the water_year_start argument, which defaults to “1” for January.

Run and compare the following lines to see how the date columns are added:

# Just calendar year info
add_date_variables(station_number = "08NM116")

# If water years are required starting August (use month number)
add_date_variables(station_number = "08NM116",
                   water_year_start = 8)
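How a water year relates to the calendar date can be sketched in base R. The helper below is hypothetical and rests on a labelled assumption: each water year is named after the calendar year in which it ends. Check the add_date_variables() documentation for the convention your installed version uses.

```r
# Hypothetical helper, not part of fasstr.
# Assumption: a water year is labelled by the calendar year in which it ends.
water_year_sketch <- function(dates, water_year_start = 1) {
  year  <- as.integer(format(dates, "%Y"))
  month <- as.integer(format(dates, "%m"))
  if (water_year_start == 1) return(year)  # water year equals calendar year
  ifelse(month >= water_year_start, year + 1L, year)
}

dates <- as.Date(c("2000-07-31", "2000-08-01", "2001-07-31"))
water_year_sketch(dates)                        # 2000 2000 2001
water_year_sketch(dates, water_year_start = 8)  # 2000 2001 2001
```

With an August start, July 31 and August 1 of the same calendar year fall in different water years, which is why statistics grouped by water year differ from those grouped by calendar year.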

The add_seasons() function adds a column of season identifiers called “Season”. The length of the seasons, in months, is provided using the seasons_length argument. As seasons are groups of whole months, the season length must divide evenly into 12, giving season lengths of 1, 2, 3, 4, 6, or 12 months. The start of the first season coincides with the start month of each year; for example, ‘Jan-Jun’ for 6-month seasons with calendar years, or ‘Dec-Feb’ for 3-month seasons with water years starting in December. Run and compare the following lines to see how season columns are added:

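#  2 seasons starting January
add_seasons(station_number = "08NM116",
            seasons_length = 6)

#  4 seasons starting October
add_seasons(station_number = "08NM116",
            water_year_start = 10,
            seasons_length = 3)

#  4 seasons starting December
add_seasons(station_number = "08NM116",
            water_year_start = 12,
            seasons_length = 3)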
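The way months are grouped into seasons can be sketched with modular arithmetic in base R. season_label_sketch() is a hypothetical helper written for this guide, not the fasstr implementation; it only reproduces the label pattern described above (‘Jan-Jun’, ‘Dec-Feb’).

```r
# Hypothetical helper, not part of fasstr: label each month with its season.
season_label_sketch <- function(month, water_year_start = 1, seasons_length = 3) {
  stopifnot(12 %% seasons_length == 0)  # seasons must divide the year evenly
  # Position of the month within the (water) year, counted from 0
  offset <- (month - water_year_start) %% 12
  # First and last month number of the season containing that month
  first <- (water_year_start - 1 + (offset %/% seasons_length) * seasons_length) %% 12 + 1
  last  <- (first - 1 + seasons_length - 1) %% 12 + 1
  ifelse(first == last,
         month.abb[first],
         paste(month.abb[first], month.abb[last], sep = "-"))
}

# Two 6-month seasons with calendar years
season_label_sketch(c(1, 7), seasons_length = 6)
# "Jan-Jun" "Jul-Dec"

# Four 3-month seasons with water years starting in December
season_label_sketch(c(12, 1, 3), water_year_start = 12, seasons_length = 3)
# "Dec-Feb" "Dec-Feb" "Mar-May"
```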

Rolling means (running means or averages) of daily data can be added using the add_rolling_means() function. A column is added for each number of rolling days, “n”, listed in the roll_days argument. One rolling mean column can be added by listing one number (e.g. roll_days = 7), or multiple columns can be added by listing each one (e.g. roll_days = c(3,7,30)). Each column will be named “Q’n’Day”, where n is the number of days (e.g. Q7Day or Q30Day).

When analyzing data, it is important to know how the rolling mean window is aligned relative to the date. The roll_align argument determines the date to which each rolling mean is assigned:

• roll_align = "right" - the date will have the mean of that date’s flow value and the previous n-1 days
• roll_align = "left" - the date will have the mean of that date’s flow value and the next n-1 days
• roll_align = "center" - the window is centred on the date, with the split depending on whether roll_days is odd or even:
  • odd numbered roll_days - the date will have the mean of that date’s flow value, the (n-1)/2 days before, and the (n-1)/2 days after
  • even numbered roll_days - the date will have the mean of that date’s flow value, the n/2 days after, and the remaining (n/2)-1 days before (i.e. the date is the first of the middle two dates)

# A tibble: 6 x 5
Date       Value Q5Day_left Q5Day_center Q5Day_right
<date>     <dbl>      <dbl>        <dbl>       <dbl>
1 1960-01-01  62.9       54.9         NA          NA
2 1960-01-02  58         51.6         NA          NA
3 1960-01-03  54.9       48.8         54.9        NA
4 1960-01-04  51.3       46.2         51.6        NA
5 1960-01-05  47.3       44.1         48.8        54.9
6 1960-01-06  46.7       42.4         46.2        51.6

Even roll_days example:

# A tibble: 6 x 5
Date       Value Q6Day_left Q6Day_center Q6Day_right
<date>     <dbl>      <dbl>        <dbl>       <dbl>
1 1960-01-01  62.9       53.5         NA          NA
2 1960-01-02  58         50.4         NA          NA
3 1960-01-03  54.9       47.7         53.5        NA
4 1960-01-04  51.3       45.3         50.4        NA
5 1960-01-05  47.3       43.2         47.7        NA
6 1960-01-06  46.7       41.5         45.3        53.5
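The alignment rules can be checked by hand with a small base R sketch. rolling_mean_sketch() is a hypothetical helper, not the code fasstr uses internally; the input is the first ten daily values shown earlier for station 08HA011, and the results agree with the tables above once rounded to one decimal place.

```r
# Hypothetical helper, not part of fasstr: n-day rolling mean with alignment.
rolling_mean_sketch <- function(x, n, align = c("right", "left", "center")) {
  align <- match.arg(align)
  # Number of days before the labelled date that fall inside the window
  before <- switch(align,
                   right  = n - 1,
                   left   = 0,
                   center = ceiling(n / 2) - 1)
  after <- n - 1 - before
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    lo <- i - before
    hi <- i + after
    # Only complete windows get a value; incomplete ones stay NA
    if (lo >= 1 && hi <= length(x)) out[i] <- mean(x[lo:hi])
  }
  out
}

flows <- c(62.9, 58, 54.9, 51.3, 47.3, 46.7, 43.9, 41.9, 40.8, 38.5)

round(rolling_mean_sketch(flows, 5, "left")[1:6], 1)    # compare with Q5Day_left
round(rolling_mean_sketch(flows, 5, "center")[1:6], 1)  # compare with Q5Day_center
round(rolling_mean_sketch(flows, 6, "center")[1:6], 1)  # compare with Q6Day_center
```

The same window of values produces each mean; only the date that the mean is attached to changes with the alignment.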

To add a column of basin areas, for viewing or analyzing, the add_basin_area() function can be used. The basin area will be extracted from HYDAT, if available, under two conditions where the basin_area argument can be left blank:

• if the station_number argument is used
• if your data data frame has a grouping column consisting of HYDAT station numbers

If you would like to apply your own basin area(s) or override the HYDAT areas, use the basin_area argument in one of the following ways:

• for a single station, or to apply the same area to all stations, list a single number (e.g. basin_area = 800)
• for different areas for multiple stations, list the basin area for each station by name (e.g. basin_area = c("08NM116" = 800, "08NM242" = 4))

Run and compare the following lines to see how basin area columns are added:

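# Using the station_number argument or data frame as HYDAT groupings
add_basin_area(station_number = "08NM116")

# Using the basin_area argument
add_basin_area(station_number = "08NM116",
               basin_area = 800)

# Using the basin_area argument with multiple stations
add_basin_area(station_number = c("08NM116", "08NM242"),
               basin_area = c("08NM116" = 800, "08NM242" = 4))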

### Adding daily volumetric discharge or water yields

Converting daily mean discharge into other units can be useful for different analyses. Columns of total daily discharge, converted from the daily mean into volumetric flows (named “Volume_m3”, in cubic metres) or area-based water yields (named “Yield_mm”, in millimetres), can be added using the add_daily_volume() and add_daily_yield() functions, respectively. The volume is the total volume of water per day, and the yield is the total depth of water per day over the upstream drainage basin, so a basin area is required to calculate it. The basin area can be provided using the basin_area argument; if your data has a groups column of HYDAT station numbers, it will automatically be extracted from HYDAT, if available (see adding basin areas above or section 8 for more information).

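# Add a column of converted discharge (m3/s) into volume (m3)
add_daily_volume(station_number = "08NM116")

# Add a column of converted discharge (m3/s) into yield (mm), with HYDAT station groups
add_daily_yield(station_number = "08NM116")

# Add a column of converted discharge (m3/s) into yield (mm), with setting the basin area
add_daily_yield(station_number = "08NM116",
                basin_area = 800)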
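The two conversions are plain unit arithmetic, sketched here in base R. The variable names and the 800 square kilometre basin area are illustrative assumptions rather than values taken from HYDAT.

```r
# Unit conversions only; the basin area is an illustrative value.
seconds_per_day <- 86400
discharge_cms   <- c(1.13, 1.53, 2.07)  # daily mean discharge, m3/s
basin_area_km2  <- 800

# Volume: (m3/s) * (s/day) gives cubic metres per day
volume_m3 <- discharge_cms * seconds_per_day

# Yield: volume (m3) / area (m2) gives a depth in metres; * 1000 converts to mm
yield_mm <- volume_m3 / (basin_area_km2 * 1e6) * 1000

data.frame(discharge_cms, volume_m3, yield_mm)
```

A daily mean of 1.13 cubic metres per second therefore corresponds to 97,632 cubic metres of water that day, or about 0.12 mm spread over an 800 square kilometre basin.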

### Adding annual cumulative daily volumetric flows or water yields

These functions add a running cumulative total of daily flows on an annual basis, as volumetric flows (named “Cumul_Volume_m3”, in cubic metres) or area-based water yields (named “Cumul_Yield_mm”, in millimetres). The cumulative flow for a given day is the sum of the total flows of that day and all previous days within the year (e.g. the Jan 15 cumulative flow is the sum of all daily totals from Jan 1-15). The total restarts each year (based on the starting month), and no values are calculated for a year with missing data, as the total for that year cannot be determined.

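# Add a column of cumulative volumes (m3)
add_cumulative_volume(station_number = "08NM116")

# Add a column of cumulative yield (mm), with HYDAT station number groups
add_cumulative_yield(station_number = "08NM116")

# Add a column of cumulative yield (mm), with setting the basin area
add_cumulative_yield(station_number = "08NM116",
                     basin_area = 800)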
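The restart-each-year and missing-data rules can be sketched in base R with cumsum() applied within each year. This is an illustration on a made-up five-day record, not fasstr's implementation; a real record would cover every day of each year.

```r
# Illustrative five-day record spanning a year boundary, with one missing value
flows <- data.frame(
  Date  = as.Date(c("2000-12-30", "2000-12-31",
                    "2001-01-01", "2001-01-02", "2001-01-03")),
  Value = c(2, 4, 1, NA, 3)  # daily mean discharge, m3/s
)
flows$Year      <- as.integer(format(flows$Date, "%Y"))
flows$Volume_m3 <- flows$Value * 86400  # total volume per day, m3

# Cumulative sum within each year; a year with any missing day gets no values
flows$Cumul_Volume_m3 <- ave(
  flows$Volume_m3, flows$Year,
  FUN = function(v) if (anyNA(v)) rep(NA_real_, length(v)) else cumsum(v)
)
flows
```

The 2000 rows accumulate to 172,800 and then 518,400 cubic metres, while every 2001 row is NA because one of its days is missing.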

### Pipelines

Because the data argument is listed first in each function, the tidying functions can be used within a tidy ‘pipeline’ and their results can be passed on to the other fasstr functions.

fill_missing_dates(station_number = "08NM116") %>%
  add_cumulative_yield()
# A tibble: 25,202 x 19
STATION_NUMBER Date       Parameter Value Symbol CalendarYear Month MonthName
<chr>          <date>     <chr>     <dbl> <chr>         <dbl> <dbl> <fct>
1 08NM116        1949-01-01 Flow         NA <NA>           1949     1 Jan
2 08NM116        1949-01-02 Flow         NA <NA>           1949     1 Jan
3 08NM116        1949-01-03 Flow         NA <NA>           1949     1 Jan
4 08NM116        1949-01-04 Flow         NA <NA>           1949     1 Jan
5 08NM116        1949-01-05 Flow         NA <NA>           1949     1 Jan
6 08NM116        1949-01-06 Flow         NA <NA>           1949     1 Jan
7 08NM116        1949-01-07 Flow         NA <NA>           1949     1 Jan
8 08NM116        1949-01-08 Flow         NA <NA>           1949     1 Jan
9 08NM116        1949-01-09 Flow         NA <NA>           1949     1 Jan
10 08NM116        1949-01-10 Flow         NA <NA>           1949     1 Jan
# ... with 25,192 more rows, and 11 more variables: WaterYear <dbl>,
#   DayofYear <dbl>, Season <fct>, Q3Day <dbl>, Q7Day <dbl>, Q30Day <dbl>,
#   Basin_Area_sqkm <dbl>, Volume_m3 <dbl>, Yield_mm <dbl>,
#   Cumul_Volume_m3 <dbl>, Cumul_Yield_mm <dbl>

## 5. Data Screening Functions

If you are looking at some data for the first time, it may be useful to explore the data quality and availability. The following functions will help to explore the data:

• plot_flow_data() - plot daily mean streamflow
• screen_flow_data() - calculate annual summary and identify missing data
• plot_data_screening() - plot annual summary statistics for data screening
• plot_missing_dates() - plot annual and monthly missing dates

To inspect the entire daily flow data set for gaps, outliers, or changes in flow over time, the plot_flow_data() function will plot all daily data in the data frame. The plot can be filtered by years and dates.

plot_flow_data(station_number = "08NM116")
$Daily_Flows

When plotting multiple stations, a separate plot is produced automatically for each station. However, setting one_plot = TRUE will plot all stations on the same plot.

plot_flow_data(station_number = c("08NM241", "08NM242"),
               one_plot = TRUE)
$Daily_Flows

The screen_flow_data() function provides an overview of the number of flow values per year and each month per year, along with annual minimums, maximums, means, and standard deviations to inspect for outliers in the data.

screen_flow_data(station_number = "08NM116")
# A tibble: 69 x 22
STATION_NUMBER  Year n_days   n_Q n_missing_Q Minimum Maximum  Mean Median
<chr>          <dbl>  <int> <int>       <int>   <dbl>   <dbl> <dbl>  <dbl>
1 08NM116         1949    365   183         182   0.623    49.3  7.77   2.27
2 08NM116         1950    365   183         182   0.623    52.1  7.76   2.07
3 08NM116         1951    365   183         182   0.623    49.3  8.99   3.71
4 08NM116         1952    366   183         183   0.850    50.7 10.3    3.17
5 08NM116         1953    365   183         182   0.340    62.3  8.30   4.56
6 08NM116         1954    365   183         182   0.566    36.2 11.3    5.38
7 08NM116         1955    365   160         205   0.396    34    8.97   4.02
8 08NM116         1956    366   176         190   0.719    38.5  9.04   3.97
9 08NM116         1957    365   170         195   0.680    42.5  8.88   2.44
10 08NM116         1958    365   183         182   0.311    34    6.98   2.32
# ... with 59 more rows, and 13 more variables: StandardDeviation <dbl>,
#   Jan_missing_Q <int>, Feb_missing_Q <int>, Mar_missing_Q <int>,
#   Apr_missing_Q <int>, May_missing_Q <int>, Jun_missing_Q <int>,
#   Jul_missing_Q <int>, Aug_missing_Q <int>, Sep_missing_Q <int>,
#   Oct_missing_Q <int>, Nov_missing_Q <int>, Dec_missing_Q <int>

To visualize the summary data from the screen_flow_data() function, the plot_data_screening() function will plot the annual minimums, maximums, means, and standard deviations.

plot_data_screening(station_number = "08NM116")
$Data_Screening

Use the plot_missing_dates() function to plot the missing dates for each month of each year to check data availability and gaps.

plot_missing_dates(station_number = "08NM116")
$Missing_Dates

## 6. Functions for Calculating Statistics

The majority of the fasstr functions produce statistics over a certain time period, either long-term, annually, monthly, or daily. These statistics are produced using the calc_* functions and can be visualized using their corresponding plot_* functions. The following sections are an overview of these functions.

### Basic Summary Statistics

These functions calculate the means, medians, maximums, minimums, and percentiles (choose using the percentiles argument) of a flow data set:

• calc_longterm_daily_stats() - calculate the long-term and long-term monthly summary statistics based on daily mean flows
• calc_longterm_monthly_stats() - calculate the long-term annual and monthly summary statistics based on monthly mean flows
• calc_annual_stats() - calculate annual summary statistics
• calc_monthly_stats() - calculate annual monthly summary statistics
• calc_daily_stats() - calculate daily summary statistics

These basic statistics can also be viewed using their corresponding plotting functions:

• plot_longterm_daily_stats() - plot the long-term monthly summary statistics based on daily mean flows
• plot_longterm_monthly_stats() - plot the long-term monthly summary statistics based on annual monthly mean flows
• plot_annual_stats() - plot annual summary statistics
• plot_monthly_stats() - plot annual monthly summary statistics
• plot_daily_stats() - plot daily summary statistics

This function produces flow duration curves:

• plot_flow_duration() - plot flow duration curves

These other long-term functions summarize the data over the entire record:

• calc_longterm_mean() - calculate the long-term mean annual discharge
• calc_longterm_percentile() - calculate the long-term percentiles
• calc_flow_percentile() - calculate the percentile rank of a flow value

#### Basic long-term statistics

The long-term calc_ and plot_ functions calculate the long-term and long-term monthly mean, median, maximum, minimum, and percentiles of all daily mean flows.

For calc_longterm_daily_stats(), all daily flow values for a given month over the entire record are summarized together. For the ‘Long-term’ category, it summarizes all flow values over the entire record to determine the mean, median, maximum, minimum, and selected percentiles of daily flows. You can also specify a certain period of months to summarize together (e.g. Jul-Sep flows) using the custom_months argument (listing the months) and label it using the custom_months_label argument (e.g. “Summer Flows”).

calc_longterm_daily_stats(station_number = "08NM116",
                          start_year = 1974)
# A tibble: 13 x 8
STATION_NUMBER Month      Mean Median Maximum Minimum    P10   P90
<chr>          <fct>     <dbl>  <dbl>   <dbl>   <dbl>  <dbl> <dbl>
1 08NM116        Jan        1.14  0.940    9.5    0.160  0.576  1.78
2 08NM116        Feb        1.16  0.960    5.81   0.140  0.542  1.95
3 08NM116        Mar        1.80  1.29    17.5    0.380  0.717  3.55
4 08NM116        Apr        8.19  5.88    53.5    0.505  1.42  18.0
5 08NM116        May       24.4  21.8     80.8    2.55  10.3   40.8
6 08NM116        Jun       22.5  20.3     86.2    0.450  6.20  41.1
7 08NM116        Jul        6.23  3.94    76.8    0.332  1.18  13.7
8 08NM116        Aug        2.18  1.56    22.4    0.427  0.834  4.15
9 08NM116        Sep        2.30  1.60    17.6    0.364  0.771  4.70
10 08NM116        Oct        2.13  1.65    15.2    0.267  0.844  4.25
11 08NM116        Nov        1.92  1.51    11.7    0.260  0.599  3.75
12 08NM116        Dec        1.26  1.07     7.30   0.244  0.541  2.20
13 08NM116        Long-term  6.28  1.81    86.2    0.140  0.710 20.1 

The plot_longterm_daily_stats() function will plot the monthly mean, median, maximum, and minimum values along with selected inner and outer percentile ribbons on one plot. Change the inner and outer percentile ranges using the inner_percentiles and outer_percentiles arguments, remove the maximum and minimum ribbon using include_extremes = FALSE, or add a specific year using add_year.

plot_longterm_daily_stats(station_number = "08NM116",
                          start_year = 1974,
                          inner_percentiles = c(25,75),
                          outer_percentiles = c(10,90))
$Long-term_Daily_Statistics

Similarly, the calc_longterm_monthly_stats() function will calculate the mean, median, maximum, minimum, and percentiles of monthly mean flows from all years. That is, all daily flows for each month of each year are averaged, and the statistics are based on these annual monthly means. The “Annual” row summarizes the mean, median, maximum, minimum, and percentiles of all annual means.

calc_longterm_monthly_stats(station_number = "08NM116",
                            start_year = 1974)
# A tibble: 13 x 8
   STATION_NUMBER Month   Mean Median Maximum Minimum    P10   P90
   <chr>          <fct>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl> <dbl>
 1 08NM116        Jan     1.14  0.972    6.12   0.316  0.625  1.67
 2 08NM116        Feb     1.16  0.965    3.83   0.353  0.600  1.72
 3 08NM116        Mar     1.80   1.44    6.93   0.507  0.843  2.84
 4 08NM116        Apr     8.19   7.73    23.9    1.60   2.88  13.0
 5 08NM116        May     24.4   23.8    45.0    14.0   16.1  32.7
 6 08NM116        Jun     22.5   22.1    48.6    3.15   11.8  35.6
 7 08NM116        Jul     6.23   4.42    25.6   0.921   1.98  12.9
 8 08NM116        Aug     2.18   1.76    10.2   0.872   1.13  3.37
 9 08NM116        Sep     2.30   1.72    8.11   0.700   1.01  4.05
10 08NM116        Oct     2.13   1.82    5.66   0.533   1.02  3.66
11 08NM116        Nov     1.92   1.54    5.41   0.498  0.715  3.38
12 08NM116        Dec     1.26   1.10    3.65   0.450  0.548  2.14
13 08NM116        Annual  6.28   6.26    11.1    2.88   4.37  8.36

The corresponding plot_longterm_monthly_stats() function plots the data, with similar options as plot_longterm_daily_stats().

plot_longterm_monthly_stats(station_number = "08NM116",
                            start_year = 1974)
$Long-term_Monthly_Statistics
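The difference between summarizing pooled daily flows and summarizing monthly means can be seen on a toy example in base R. The six January values below are made up for illustration, and the code is a sketch of the two approaches rather than the fasstr implementation.

```r
# Made-up January flows for two years (three days each, for brevity)
flows <- data.frame(
  Year  = rep(c(2001, 2002), each = 3),
  Value = c(1, 2, 9, 2, 2, 2)
)

# Daily approach: pool every January day from all years, then summarize
daily_stats <- c(Mean = mean(flows$Value), Maximum = max(flows$Value))

# Monthly approach: average each year's January first, then summarize the means
monthly_means <- tapply(flows$Value, flows$Year, mean)
monthly_stats <- c(Mean = mean(monthly_means), Maximum = max(monthly_means))

daily_stats    # Mean 3, Maximum 9
monthly_stats  # Mean 3, Maximum 4
```

The means agree because each year contributes the same number of days, but the extremes and percentiles of monthly means are less spread out than those of daily flows, as in the two tables above.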

#### Basic annual statistics

The calc_annual_stats() and plot_annual_stats() functions calculate the mean, median, maximum, minimum, and percentiles of daily flows for every year of data provided. In calculating, all daily flow values are grouped by year.

calc_annual_stats(station_number = "08NM116",
                  start_year = 1974)
# A tibble: 44 x 8
STATION_NUMBER  Year  Mean Median Maximum Minimum   P10   P90
<chr>          <dbl> <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl>
1 08NM116         1974  8.43   1.34    66     0.447 0.709  33.0
2 08NM116         1975  5.48   1.54    48.7   0.320 0.580  19.6
3 08NM116         1976  8.18   3.84    71.1   0.736 0.884  25.6
4 08NM116         1977  4.38   1.26    36     0.564 0.776  17.2
5 08NM116         1978  6.75   3.28    44.5   0.532 0.828  19.7
6 08NM116         1979  4.40   1.56    43     0.411 0.618  15.9
7 08NM116         1980  5.37   1.88    46.2   0.623 0.793  20.1
8 08NM116         1981  7.67   2.77    60.6   0.398 1.5    22.3
9 08NM116         1982  8.46   2.68    54.5   0.815 1.40   30.3
10 08NM116         1983  7.85   3.13    60.2   0.530 1.44   23.5
# ... with 34 more rows

The percentiles in the plot_annual_stats() function are fully customizable like the calc_ function.

plot_annual_stats(station_number = "08NM116",
                  start_year = 1974)

#### Basic daily statistics

The calc_daily_stats() and plot_daily_stats() functions calculate the mean, median, maximum, minimum, and percentiles of daily flows for each day of the year. For example, for a given day of year (e.g. day 1 (Jan-01) or day 2 (Jan-02)), all flow values for that day from the entire record are summarized together. Only the first 365 days of each year are summarized (the 366th day of leap years is ignored). In calculating, all daily flow values are grouped by day of year.

calc_daily_stats(station_number = "08NM116",
                 start_year = 1974)
# A tibble: 365 x 11
STATION_NUMBER Date  DayofYear  Mean Median Minimum Maximum    P5   P25   P75
<chr>          <chr>     <dbl> <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
1 08NM116        Jan-~         1  1.08  0.970   0.328    2.51 0.540 0.692  1.38
2 08NM116        Jan-~         2  1.05  0.920   0.310    2.26 0.526 0.690  1.35
3 08NM116        Jan-~         3  1.03  0.897   0.290    2    0.524 0.703  1.22
4 08NM116        Jan-~         4  1.04  0.903   0.284    2.52 0.505 0.732  1.29
5 08NM116        Jan-~         5  1.03  0.895   0.302    2.25 0.534 0.709  1.23
6 08NM116        Jan-~         6  1.03  0.876   0.315    2.32 0.519 0.742  1.30
7 08NM116        Jan-~         7  1.06  0.905   0.312    2.80 0.493 0.744  1.20
8 08NM116        Jan-~         8  1.10  0.960   0.314    4    0.514 0.755  1.23
9 08NM116        Jan-~         9  1.11  0.977   0.327    4.20 0.509 0.740  1.33
10 08NM116        Jan-~        10  1.12  0.947   0.334    4.70 0.450 0.707  1.30
# ... with 355 more rows, and 1 more variable: P95 <dbl>

The plot_daily_stats() function will plot the daily mean, median, maximum, and minimum values along with selected inner and outer percentile ribbons on one plot. Change the inner and outer percentile ranges using the inner_percentiles and outer_percentiles arguments, remove the maximum and minimum ribbon using include_extremes = FALSE, or add a specific year using add_year.

plot_daily_stats(station_number = "08NM116",
                 start_year = 1974)
$Daily_Statistics

plot_daily_stats(station_number = "08NM116",
                 start_year = 1974,
                 add_year = 2000)
$Daily_Statistics

#### Flow Duration

Flow duration curves can be produced using the plot_flow_duration() function, where specific months and time periods can be selected:

plot_flow_duration(station_number = "08NM116",
                   start_year = 1974)
$Flow_Duration

plot_flow_duration(station_number = "08NM116",
                   start_year = 1974,
                   months = 7:9,
                   include_longterm = FALSE)
$Flow_Duration

#### Other Long-term Statistics

calc_longterm_mean() calculates the mean of all the daily flows, also known as the long-term mean annual discharge (MAD), and specific percentages of the long-term mean (using the percent_MAD argument).

calc_longterm_mean(station_number = "08NM116",
                   start_year = 1974,
                   percent_MAD = c(5,10,20))
# A tibble: 1 x 5
STATION_NUMBER LTMAD 5%MAD 10%MAD 20%MAD
<chr>          <dbl>   <dbl>    <dbl>    <dbl>
1 08NM116         6.28   0.314    0.628     1.26
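The arithmetic behind the percent-of-MAD columns can be illustrated with a minimal base R sketch. It uses a small synthetic flow vector (not HYDAT data) and hypothetical variable names, and it is not the fasstr implementation; it only shows that each column is the long-term mean multiplied by the requested percentage.

```r
# Synthetic daily mean flows (m3/s); illustrative values only, not HYDAT data
daily_flows <- c(0.5, 0.8, 1.2, 4.0, 12.0, 20.0, 9.0, 3.0, 1.5, 1.0)

# Long-term mean annual discharge: the mean of every daily value
ltmad <- mean(daily_flows, na.rm = TRUE)

# Percentages of the long-term mean, analogous to percent_MAD = c(5, 10, 20)
percent_mad <- c(5, 10, 20)
mad_fractions <- ltmad * percent_mad / 100
names(mad_fractions) <- paste0(percent_mad, "%MAD")

print(ltmad)
print(mad_fractions)
```

The same relationship holds in the output above: 5% of the long-term mean of 6.28 is 0.314.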

calc_longterm_percentile() calculates the selected long-term percentiles of all the daily flow values.

calc_longterm_percentile(station_number = "08NM116",
start_year = 1974,
percentiles = c(25,50,75))
# A tibble: 1 x 4
STATION_NUMBER   P25   P50   P75
<chr>          <dbl> <dbl> <dbl>
1 08NM116         1.03  1.81  5.72
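As a rough illustration of what a long-term percentile represents, the following base R sketch pools all daily values and applies stats::quantile(). The flow vector is synthetic, and the use of quantile()'s default interpolation method is an assumption; fasstr may compute percentiles with different settings.

```r
# Synthetic daily mean flows (m3/s); illustrative values only, not HYDAT data
daily_flows <- c(0.5, 0.8, 1.2, 4.0, 12.0, 20.0, 9.0, 3.0, 1.5, 1.0)

# Percentiles of all daily values pooled together,
# analogous to percentiles = c(25, 50, 75)
percentiles <- c(25, 50, 75)
longterm_pctl <- stats::quantile(daily_flows, probs = percentiles / 100,
                                 na.rm = TRUE, names = FALSE)
names(longterm_pctl) <- paste0("P", percentiles)

print(longterm_pctl)
```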

calc_flow_percentile() calculates the percentile rank of a specified flow value, provided as flow_value. It compares the flow value to all daily flow values to determine the percentile rank.

calc_flow_percentile(station_number = "08NM116",
start_year = 1974,
flow_value = 6.270)
# A tibble: 1 x 2
STATION_NUMBER Percentile
<chr>               <dbl>
1 08NM116              76.3
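One common way to express a percentile rank is the share of observations at or below the value of interest. The base R sketch below uses stats::ecdf() on synthetic flows to show the idea; this definition is an assumption for illustration and may differ in detail from how calc_flow_percentile() ranks values.

```r
# Synthetic daily mean flows (m3/s); illustrative values only, not HYDAT data
daily_flows <- c(0.5, 0.8, 1.2, 4.0, 12.0, 20.0, 9.0, 3.0, 1.5, 1.0)

# Flow value to rank (hypothetical)
flow_value <- 3.5

# Empirical cumulative distribution: fraction of daily flows <= flow_value
flow_ecdf <- stats::ecdf(daily_flows)
percentile_rank <- flow_ecdf(flow_value) * 100

print(percentile_rank)
```

Here six of the ten synthetic values are at or below 3.5, so the rank is 60.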

#### Basic statistics and plotting volumetric and yield flows

The calc_ and plot_ functions will summarize any values provided to the functions, with the default column being ‘Value’. While for fasstr this defaults to daily mean flows, any daily value can be summarized (water level, precipitation amount, etc.) if the methods of analyses are similar for the parameter type. As there are no units presented in the calc_ functions, this should not be a problem for most calculations. However, the plots come with a “Discharge (cms)” y-axis title by default, which can be changed afterwards using ggplot2 functions.

To facilitate plotting daily volume or yield statistics, first add them to your flow data using the add_daily_volume() or add_daily_yield() functions. Then set the values argument to either ‘Volume_m3’ or ‘Yield_mm’ (the columns created by the respective add_* functions), and the discharge axis title will adjust accordingly.

add_daily_volume(station_number = "08NM116") %>%
plot_annual_stats(values = "Volume_m3",
start_year = 1974) 
$Annual_Statistics

add_daily_yield(station_number = "08NM116") %>%
plot_daily_stats(values = "Yield_mm",
start_year = 1974)
$Daily_Statistics

### Cumulative Flow Statistics

Total volumetric or runoff yield flows within a given year can provide important hydrological information on a basin-wide scale. These functions calculate the total volume (in cubic metres) or yield (in millimetres; based on basin size) for a flow data set, at the annual, monthly, or daily cumulative scale.

• calc_annual_cumulative_stats() - calculate annual (and seasonal) cumulative flows
• calc_monthly_cumulative_stats() - calculate cumulative monthly flow statistics
• calc_daily_cumulative_stats() - calculate cumulative daily flow statistics

These statistics can also be viewed using their corresponding plotting functions:

• plot_annual_cumulative_stats() - plot annual and seasonal total flows
• plot_monthly_cumulative_stats() - plot cumulative monthly flow statistics
• plot_daily_cumulative_stats() - plot cumulative daily flow statistics

While these functions default to volumetric flows, using use_yield = TRUE and basin_area arguments will calculate totals in runoff yield. If there is a groups column of HYDAT station numbers, then the function will automatically pull the basin area out of HYDAT if available; otherwise a basin area will be required. Due to the requirements of a complete annual data set to calculate total flows, only years of complete data are used.
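The relationship between discharge, volume, and runoff yield can be sketched in base R as a unit conversion. The flows and the basin area below are hypothetical values chosen for illustration, and the variable names are not fasstr arguments.

```r
# Synthetic daily mean flows (m3/s) and a hypothetical basin area (km2)
flow_cms <- c(1.0, 2.0, 3.0)
basin_area_km2 <- 100

# Daily volume: discharge multiplied by the number of seconds in a day
seconds_per_day <- 86400
volume_m3 <- flow_cms * seconds_per_day

# Runoff yield: volume spread evenly over the basin, expressed as a depth
# (1 km2 = 1e6 m2, 1 m = 1000 mm)
yield_mm <- volume_m3 / (basin_area_km2 * 1e6) * 1000

print(volume_m3)
print(yield_mm)
```

Because yield depends on the basin area, a larger basin with the same discharge produces a smaller yield.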

#### Cumulative annual statistics

The calc_annual_cumulative_stats() function provides the total annual volume or runoff yield (if use_yield = TRUE is used). It totals all flows for a given year in cubic metres.

calc_annual_cumulative_stats(station_number = "08NM116", start_year = 1974)
# A tibble: 44 x 3
STATION_NUMBER  Year Total_Volume_m3
<chr>          <dbl>           <dbl>
1 08NM116         1974      265854182.
2 08NM116         1975      172900397.
3 08NM116         1976      258693177.
4 08NM116         1977      138177100.
5 08NM116         1978      212792574.
6 08NM116         1979      138807734.
7 08NM116         1980      169956317.
8 08NM116         1981      241854163.
9 08NM116         1982      266735721.
10 08NM116         1983      247618080.
# ... with 34 more rows

Setting the logical include_seasons argument to TRUE adds columns of total seasonal flows to the results: two columns for the two-season totals (two six-month periods) and four columns for the four-season totals (four three-month periods). The first season begins in the first month of the year (e.g. Jan for calendar years or Oct for water years starting in October).

calc_annual_cumulative_stats(station_number = "08NM116",
start_year = 1974,
include_seasons = TRUE)
# A tibble: 44 x 9
STATION_NUMBER  Year Total_Volume_m3 Jan-Jun_Volume_m3 Jul-Dec_Volume_m3
<chr>          <dbl>           <dbl>               <dbl>               <dbl>
1 08NM116         1974      265854182.          223662989.           42191194.
2 08NM116         1975      172900397.          136045958.           36854438.
3 08NM116         1976      258693177.          164417817.           94275360.
4 08NM116         1977      138177100.          115279113.           22897987.
5 08NM116         1978      212792574.          146659335.           66133239.
6 08NM116         1979      138807734.          117444383.           21363350.
7 08NM116         1980      169956317.          131126774.           38829542.
8 08NM116         1981      241854163.          165675542.           76178621.
9 08NM116         1982      266735721.          154229097.          112506624.
10 08NM116         1983      247618080.          191691360.           55926720.
# ... with 34 more rows, and 4 more variables: Jan-Mar_Volume_m3 <dbl>,
#   Apr-Jun_Volume_m3 <dbl>, Jul-Sep_Volume_m3 <dbl>, Oct-Dec_Volume_m3 <dbl>

The total volumes for each year can be plotted using the plot_annual_cumulative_stats() function. When using include_seasons = TRUE, two additional plots will be created: one for the two-season totals and one for the four-season totals.

plot_annual_cumulative_stats(station_number = "08NM116",
start_year = 1974) 

#### Cumulative daily statistics

The calc_daily_cumulative_stats() and plot_daily_cumulative_stats() functions calculate the mean, median, maximum, minimum, and percentiles of total cumulative daily flows. For each day of each year, the total volume or runoff yield is determined. Then within a given year, the cumulative total for each day is determined by adding the totals of all previous days (e.g. Jan-01 = Jan-01 total; Jan-02 = Jan-01 + Jan-02 totals, etc.). Then the mean, median, maximum, minimum, and percentiles are calculated for each day of the year from those daily cumulative totals across all years. In interpreting the information, if a given cumulative total is below the mean value, then the cumulative flow is less than average. In other words, less volume has passed through the station than normal at that point in time. Viewing the plot below may help in understanding how this function works. The percentiles in the calc_ function can be changed using the percentiles argument.

calc_daily_cumulative_stats(station_number = "08NM116",
start_year = 1974)
# A tibble: 365 x 11
STATION_NUMBER Date   DayofYear    Mean  Median Minimum Maximum     P5    P25
<chr>          <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 08NM116        Jan-01         1  93151.  83808.  28339.  2.17e5 4.67e4 5.97e4
2 08NM116        Jan-02         2 183942. 160099.  55123.  4.12e5 9.18e4 1.19e5
3 08NM116        Jan-03         3 273110. 236304.  80179.  5.81e5 1.35e5 1.79e5
4 08NM116        Jan-04         4 363318. 313416. 104717.  7.69e5 1.82e5 2.41e5
5 08NM116        Jan-05         5 452302. 388411. 130810.  9.53e5 2.32e5 3.09e5
6 08NM116        Jan-06         6 541123. 466733. 158026.  1.12e6 2.81e5 3.68e5
7 08NM116        Jan-07         7 632968. 546739. 184982.  1.29e6 3.27e5 4.31e5
8 08NM116        Jan-08         8 728307. 623376. 212112.  1.45e6 3.72e5 4.96e5
9 08NM116        Jan-09         9 824635. 712325. 240365.  1.80e6 4.17e5 5.63e5
10 08NM116        Jan-10        10 921179. 803131. 269222.  2.21e6 4.65e5 6.27e5
# ... with 355 more rows, and 2 more variables: P75 <dbl>, P95 <dbl>
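The steps described above (daily volume, running total within each year, then a summary across years for each day) can be sketched in base R. The data frame below holds two synthetic four-day "years"; the column names mirror those in the output for readability, but the code is an illustration under those assumptions rather than the fasstr implementation.

```r
# Two synthetic "years" of four days each (daily mean flow in m3/s)
flows <- data.frame(
  Year = rep(c(2001, 2002), each = 4),
  DayofYear = rep(1:4, times = 2),
  Value = c(1, 2, 3, 4,
            2, 2, 2, 2)
)

# Step 1: daily volume in cubic metres (86400 seconds per day)
flows$Volume_m3 <- flows$Value * 86400

# Step 2: running total within each year
flows$Cumul_Volume_m3 <- ave(flows$Volume_m3, flows$Year, FUN = cumsum)

# Step 3: summarize the cumulative totals across years for each day of year
daily_cumul_mean <- tapply(flows$Cumul_Volume_m3, flows$DayofYear, mean)
daily_cumul_max <- tapply(flows$Cumul_Volume_m3, flows$DayofYear, max)

print(daily_cumul_mean)
print(daily_cumul_max)
```

On day 4, the 2002 total (8 x 86400) sits below the mean (9 x 86400), meaning less water had passed by that date than on average.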

The plot_daily_cumulative_stats() function will plot the daily cumulative total mean, median, maximum, and minimum values along with the 5th, 25th, 75th, and 95th percentiles all on one plot. The percentiles are not customizable for this function.

plot_daily_cumulative_stats(station_number = "08NM116",
start_year = 1974,
use_yield = TRUE) 

#### Annual low-flows

The calc_annual_lowflows() function calculates the annual minimum values of the specified rolling mean durations, along with the day of year and date on which each minimum occurs. Multiple rolling-day durations can be calculated at once.

calc_annual_lowflows(station_number = "08NM116",
start_year = 1974)
# A tibble: 44 x 14
STATION_NUMBER  Year Min_1_Day Min_1_Day_DoY Min_1_Day_Date Min_3_Day
<chr>          <dbl>     <dbl>         <dbl> <date>             <dbl>
1 08NM116         1974     0.447           333 1974-11-29         0.533
2 08NM116         1975     0.320            11 1975-01-11         0.378
3 08NM116         1976     0.736            38 1976-02-07         0.741
4 08NM116         1977     0.564            73 1977-03-14         0.627
5 08NM116         1978     0.532            55 1978-02-24         0.630
6 08NM116         1979     0.411           268 1979-09-25         0.416
7 08NM116         1980     0.623             9 1980-01-09         0.632
8 08NM116         1981     0.398           261 1981-09-18         0.468
9 08NM116         1982     0.815             6 1982-01-06         0.883
10 08NM116         1983     0.530           357 1983-12-23         0.562
# ... with 34 more rows, and 8 more variables: Min_3_Day_DoY <dbl>,
#   Min_3_Day_Date <date>, Min_7_Day <dbl>, Min_7_Day_DoY <dbl>,
#   Min_7_Day_Date <date>, Min_30_Day <dbl>, Min_30_Day_DoY <dbl>,
#   Min_30_Day_Date <date>
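A rolling-mean low flow can be illustrated with base R's stats::filter(). The sketch below assumes a right-aligned 3-day window (each value averages the current day and the two days before it) on a short synthetic series; the window alignment and variable names are assumptions made for the example.

```r
# Synthetic daily mean flows (m3/s) for ten consecutive days
flows <- c(5, 4, 3, 2, 1, 2, 3, 4, 5, 6)

# Right-aligned 3-day rolling mean; the first two days have no full window (NA)
roll_days <- 3
rolling_mean <- as.numeric(
  stats::filter(flows, rep(1 / roll_days, roll_days), sides = 1)
)

# Minimum of the rolling means and the day (index) on which it occurs
min_3_day <- min(rolling_mean, na.rm = TRUE)
min_3_day_index <- which.min(rolling_mean)

print(rolling_mean)
print(min_3_day)
print(min_3_day_index)
```

Note that the 3-day minimum falls one day after the lowest single-day flow, because a right-aligned window reports the mean on its last day. This is why the day-of-year columns can differ between durations.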

The annual low flow values and the day of the low flow values can be plotted, separately, using the plot_annual_lowflows() function.

plot_annual_lowflows(station_number = "08NM116",
start_year = 1974) 
$Annual_Low_Flows

$Annual_Low_Flows_Dates

#### Annual peaks

Similar to calc_annual_lowflows(), calc_annual_peaks() calculates the annual minimum and maximum values, the day of year, and dates of specified rolling mean days.

calc_annual_peaks(station_number = "08NM116",
start_year = 1974)
# A tibble: 44 x 8
STATION_NUMBER  Year Min_1_Day Min_1_Day_DoY Min_1_Day_Date Max_1_Day
<chr>          <dbl>     <dbl>         <dbl> <date>             <dbl>
1 08NM116         1974     0.447           333 1974-11-29          66
2 08NM116         1975     0.320            11 1975-01-11          48.7
3 08NM116         1976     0.736            38 1976-02-07          71.1
4 08NM116         1977     0.564            73 1977-03-14          36
5 08NM116         1978     0.532            55 1978-02-24          44.5
6 08NM116         1979     0.411           268 1979-09-25          43
7 08NM116         1980     0.623             9 1980-01-09          46.2
8 08NM116         1981     0.398           261 1981-09-18          60.6
9 08NM116         1982     0.815             6 1982-01-06          54.5
10 08NM116         1983     0.530           357 1983-12-23          60.2
# ... with 34 more rows, and 2 more variables: Max_1_Day_DoY <dbl>,
#   Max_1_Day_Date <date>

#### Number of days per year outside of normal

The calc_annual_outside_normal() function calculates the number of days per year that are above and below “normal”, where the normal range is typically defined as lying between the 25th and 75th percentiles. The normal limits can be set using the normal_percentiles argument, listing the lower and upper limits of the normal range, respectively (e.g. normal_percentiles = c(25, 75)). The function calculates the lower and upper percentiles for each day of the year over all years, and then counts the days in a given year that fall above or below the daily normal range. Rolling averages can also be used in this function using the roll_days argument.

calc_annual_outside_normal(station_number = "08NM116",
start_year = 1974)
# A tibble: 44 x 5
STATION_NUMBER  Year Days_Below_Normal Days_Above_Normal Days_Outside_Normal
<chr>          <dbl>             <int>             <int>               <int>
1 08NM116         1974                72                77                 149
2 08NM116         1975               138                32                 170
3 08NM116         1976                54               144                 198
4 08NM116         1977               107                 8                 115
5 08NM116         1978                21               114                 135
6 08NM116         1979               144                67                 211
7 08NM116         1980                79                61                 140
8 08NM116         1981                15               203                 218
9 08NM116         1982                35               208                 243
10 08NM116         1983                14               208                 222
# ... with 34 more rows
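The counting logic described above can be sketched in base R: compute the lower and upper percentiles for each day of the year across all years, then count the days in each year that fall outside that range. The data are synthetic (five years of three days each), and the use of quantile()'s default method is an assumption; the sketch is not the fasstr implementation.

```r
# Five synthetic years with three days each (daily mean flow in m3/s)
flows <- data.frame(
  Year = rep(2001:2005, each = 3),
  DayofYear = rep(1:3, times = 5),
  Value = c(1, 10, 5,
            2, 20, 5,
            3, 30, 5,
            4, 40, 5,
            5, 50, 5)
)

# Lower and upper "normal" limits for each day of year, over all years
flows$Lower <- ave(flows$Value, flows$DayofYear,
                   FUN = function(x) quantile(x, 0.25, names = FALSE))
flows$Upper <- ave(flows$Value, flows$DayofYear,
                   FUN = function(x) quantile(x, 0.75, names = FALSE))

# Count days below and above the normal range within each year
days_below <- tapply(flows$Value < flows$Lower, flows$Year, sum)
days_above <- tapply(flows$Value > flows$Upper, flows$Year, sum)
days_outside <- days_below + days_above

print(data.frame(Year = names(days_below),
                 Days_Below_Normal = as.integer(days_below),
                 Days_Above_Normal = as.integer(days_above),
                 Days_Outside_Normal = as.integer(days_outside)))
```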

Each of the above, below, and total days outside of normal can be plotted using the plot_annual_outside_normal() function.

plot_annual_outside_normal(station_number = "08NM116",
start_year = 1974) 

## 7. Functions for Computing Analyses

There are several functions that provide more in-depth analyses. These functions begin with compute_ instead of calc_ and typically produce more than the single tibble data frame of statistics returned by the calc_ functions. Most of these produce a list of objects, consisting of both tibbles and plots. There are three groups of analysis functions: annual trending, annual volume frequency analyses, and a full analysis (of most fasstr functions). There is a separate vignette for each analysis type to provide more information.

### Volume Frequency Analyses

There are five fasstr functions that perform various volume frequency analyses. Frequency analyses are used to determine probabilities of events of certain sizes (typically annual high or low flows). The analyses produce plots of event series and computed quantiles fitted from either Log-Pearson Type III or Weibull probability distributions. See the frequency analysis vignette for more information.

The compute_annual_frequencies() function performs a low-flow (by default) or high-flow (using the use_max = TRUE argument) frequency analysis on annual series of daily flows, or of flows averaged over a selected duration using the roll_days argument. This analysis uses the daily mean lows or highs. The compute_hydat_peak_frequencies() function performs an annual instantaneous low (by default) or high peak frequency analysis. The data argument cannot be used for the HYDAT peak analysis. Both functions output several objects in a list:

1. $Freq_Analysis_Data - Tibble of computed annual minimums (or maximums)
2. $Freq_Plot_Data - Tibble of plotting coordinates used in the frequency plot
3. $Freq_Plot - ggplot2 object of the frequency plot
4. $Freq_Fitting - List of fitdistrplus objects of the fitted distributions

#### Data frame options

An option when working with the functions that produce data frames is to transpose the rows and columns of the data. By default, most functions return results such that there are columns of statistics for each station and time period. See the example here:

calc_longterm_daily_stats(station_number = "08NM116",
start_year = 1980,
end_year = 2010)
# A tibble: 13 x 8
STATION_NUMBER Month      Mean Median Maximum Minimum   P10   P90
<chr>          <fct>     <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl>
1 08NM116        Jan        1.20  0.965    9.5    0.160 0.548  1.85
2 08NM116        Feb        1.15  0.968    4.41   0.140 0.489  1.97
3 08NM116        Mar        1.82  1.38     9.86   0.380 0.720  3.70
4 08NM116        Apr        8.33  6.22    37.9    0.505 1.54  17.8
5 08NM116        May       23.6  20.9     74.4    3.83  9.37  40.8
6 08NM116        Jun       21.3  19.4     84.5    0.450 6.10  38.6
7 08NM116        Jul        6.42  3.94    54.5    0.332 1.02  14.7
8 08NM116        Aug        2.11  1.57    13.3    0.427 0.779  4.21
9 08NM116        Sep        2.21  1.62    14.6    0.364 0.740  4.35
10 08NM116        Oct        2.10  1.65    15.2    0.267 0.803  3.95
11 08NM116        Nov        2.02  1.71    11.7    0.260 0.562  3.78
12 08NM116        Dec        1.31  1.08     7.30   0.342 0.5    2.37
13 08NM116        Long-term  6.14  1.89    84.5    0.140 0.685 19.3 

In some circumstances, however, it may be more convenient to arrange the data such that there is a column for stations (or groupings), a single column listing the statistics, and a column of values for each time period. See the following example, which sets transpose = TRUE.

calc_longterm_daily_stats(station_number = "08NM116",
start_year = 1980,
end_year = 2010,
transpose = TRUE)
# A tibble: 6 x 15
STATION_NUMBER Statistic   Jan   Feb   Mar    Apr   May    Jun    Jul    Aug
<chr>          <fct>     <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1 08NM116        Mean      1.20  1.15  1.82   8.33  23.6  21.3    6.42   2.11
2 08NM116        Median    0.965 0.968 1.38   6.22  20.9  19.4    3.94   1.57
3 08NM116        Maximum   9.5   4.41  9.86  37.9   74.4  84.5   54.5   13.3
4 08NM116        Minimum   0.160 0.140 0.380  0.505  3.83  0.450  0.332  0.427
5 08NM116        P10       0.548 0.489 0.720  1.54   9.37  6.10   1.02   0.779
6 08NM116        P90       1.85  1.97  3.70  17.8   40.8  38.6   14.7    4.21
# ... with 5 more variables: Sep <dbl>, Oct <dbl>, Nov <dbl>, Dec <dbl>,
#   Long-term <dbl>
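For intuition, the reshaping that transpose = TRUE produces can be mimicked in base R on a small excerpt of the table above (the Jan to Mar means and medians). This is only a sketch of the idea using t(); the object names are hypothetical and fasstr's own reshaping may be implemented differently.

```r
# Excerpt of the default layout: one row per month, one column per statistic
stats_long <- data.frame(
  Month = c("Jan", "Feb", "Mar"),
  Mean = c(1.20, 1.15, 1.82),
  Median = c(0.965, 0.968, 1.38)
)

# Transpose the numeric columns so statistics become rows and months columns
transposed <- t(as.matrix(stats_long[, c("Mean", "Median")]))
colnames(transposed) <- stats_long$Month

# Rebuild a data frame with a 'Statistic' column in front
stats_wide <- data.frame(Statistic = rownames(transposed),
                         transposed,
                         row.names = NULL)

print(stats_wide)
```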

#### Plotting options

##### Logarithmic discharge scale

Depending on the plotting function, discharge data will be plotted using a linear or a logarithmic scale by default (depending on the scale of the data). This can be altered using the log_discharge argument. Here is an example of plotting with a linear scale (default log_discharge = FALSE):

plot_annual_stats(station_number = "08NM116",
start_year = 1980,
end_year = 2010)
$Annual_Statistics

Set the discharge scale to be logarithmic (log_discharge = TRUE):

plot_annual_stats(station_number = "08NM116",
start_year = 1980,
end_year = 2010,
log_discharge = TRUE)
$Annual_Statistics

##### Including a standard title on the plot

The logical include_title argument adds the station number (or grouping identifier from the groupings argument), and in some cases the statistics as well. The argument’s default is FALSE.

Example of including a title when plotting (include_title = TRUE):

plot_annual_stats(station_number = "08NM116",
start_year = 1980,
end_year = 2010,
include_title = TRUE)

#### Writing a list of data frames and plots

Some objects produced with this package, mainly by the compute_* functions, are lists containing both data frames and ggplot2 objects. The write_objects_list() function is provided to save all objects within such a list into a designated folder, where the table and plot files are named after the object names. The name of the folder is provided using the folder_name argument. If the folder does not exist, one will be created. The file types for tables and plots are chosen using the table_filetype and plot_filetype arguments, respectively. The plot output size can also be customized with the width, height, units, and dpi arguments, similar to those in ggplot2::ggsave().

The following will save all plots and tables in a folder called “Frequency Analysis” in the working directory:

freq_analysis <- compute_annual_frequencies(station_number = "08NM116")

write_objects_list(list = freq_analysis,
folder_name = "Frequency Analysis",
plot_filetype = "png",
table_filetype = "xlsx")