```
library(bumbl)
library(car)
#> Loading required package: carData
```

Bumblebee colony growth is characterized by an initial exponential growth period when workers are being produced, followed by a switch to production of reproductive individuals (gynes and drones). Reproductive individuals leave the nest, resulting in a decline in colony size. The initial growth rate, the rate or decline, and the time to switching to reproduction may vary in response to environmental conditions (Crone and Williams 2016).

The `bumbl()`

function fits a model, described below and in Crone & Williams (2016), that finds a growth rate (\(\lambda\)), a decline rate (\(\delta\)) , and a switchpoint (\(\tau\). When supplied with a dataset with multiple colonies, a model is fit separately for each colony and these three parameters (as well as an estimated starting colony size and maximum colony size) are returned. In this example, we’ll use the built-in `bombus`

dataset. For more information on this dataset see the help file with `?bombus`

.

Before the switch point, growth (colony weight, \(W_t\)) is defined as:

\[ W_t = \lambda^tW_0 \] After the switchpoint (\(t > \tau\)), it’s defined as:

\[ W_t = \lambda^{\tau}W_{0}\delta^{t-\tau} \]

Where \(\delta\) is a rate of decline. Therefore:

\[ W_t = \begin{cases} \lambda^tW_0 & t\leq\tau \\ \lambda^{\tau}W_{0}\delta^{t-\tau} & t > \tau \end{cases} \]

After log-linearizing this model, it it looks like this:

For \(t<=\tau\):

\[ \ln(W_t) = \ln(W_0) + t\ln(\lambda) \]

For \(t>\tau\)

\[ \ln(W_t) = \ln(W_0) + \tau\ln(\lambda) + (t-\tau)\ln(\delta) \]

`bumbl()`

works by treating this as a generalized linear model with a log-link after creating a new variable, `.post`

which is 0 before \(\tau\) and \(t-\tau\) after \(\tau\). Then the value of \(\tau\) that maximizes likelihood is found and used to fit a final model.

\[ ln(W_t) = \beta_0 + \beta_1t + \beta_2 \textrm{ .post} \]

To clarify, in the above equation:

- \(\beta_0 = \ln(W_0)\)
- \(\beta_1 = \ln(\lambda)\)
- \(\beta_2 = \ln(\delta-\lambda) = \frac{\ln(\delta)}{\ln(\lambda)}\)

```
head(bombus)
#> # A tibble: 6 x 10
#> site colony wild habitat date week mass d.mass floral_resources
#> <fct> <fct> <dbl> <fct> <date> <int> <dbl> <dbl> <dbl>
#> 1 PUT2 9 0.98 W 2003-04-03 0 1910. 0.1 27.8
#> 2 PUT2 9 0.98 W 2003-04-09 1 1940 30.6 27.8
#> 3 PUT2 9 0.98 W 2003-04-15 2 1938 28.6 27.8
#> 4 PUT2 9 0.98 W 2003-04-22 3 1976. 67.1 27.8
#> 5 PUT2 9 0.98 W 2003-05-01 4 2010. 101. 7.96
#> 6 PUT2 9 0.98 W 2003-05-07 5 2143 234. 7.96
#> # … with 1 more variable: cum_floral <dbl>
```

The `bumbl()`

function expects a tidy dataset with a column for some measure of time, and a column for some measure of colony size at minimum. A formula is required, which at minimum should have colony size on the left-hand side, and time on the right-hand side.

```
<- bumbl(bombus, colonyID = colony, t = week, formula = d.mass ~ week)
out #> Warning: Search for optimal switchpoint did not converge for colonyID '32'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '71'. Omitting from results.
head(out)
#> # A tibble: 6 x 7
#> colony converged tau logN0 logLam decay logNmax
#> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 14 TRUE 6.40 4.04 0.295 -0.458 5.83
#> 3 17 TRUE 6.36 3.39 0.407 -0.662 5.83
#> 4 20 TRUE 7.27 2.79 0.194 -0.345 4.14
#> 5 22 TRUE 6.92 2.65 0.280 -0.439 4.57
#> 6 24 TRUE 6.23 4.06 0.167 -0.391 5.06
```

For each colony, we get the maximum likelihood estimate for \(\tau\) (`tau`

), the estimated initial colony weight (\(\ln({W_0})\), `logN0`

) on a log scale, the average colony growth rate (\(\ln(\lambda)\), `logLam`

) on a log scale, the rate of decline after \(\tau\) (\(ln(\delta - \lambda)\), `decay`

), and the estimated maximum weight of each colony, on a log scale (`logNmax`

).

The supplied formula can also include covariates for colony growth. The model coefficients for any additional covariates are included in the output of `bumbl()`

. Here, I’ve included cumulative floral resources as a covariate. Accounting for some covariates, such as time of day, might give better estimates of the switchpoint or growth and decay rates.

```
<- bumbl(bombus, colonyID = colony, t = week,
out2 formula = d.mass ~ week + cum_floral)
#> Warning: Search for optimal switchpoint did not converge for colonyID '20'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '67'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '86'. Omitting from results.
head(out2)
#> # A tibble: 6 x 8
#> colony converged tau logN0 logLam decay logNmax beta_cum_floral
#> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 TRUE 5.56 3.12 0.361 -0.633 5.73 0.0142
#> 2 14 TRUE 5.57 -4.63 -0.381 -0.534 5.78 0.154
#> 3 17 TRUE 8.47 3.99 0.567 -0.545 5.82 -0.0267
#> 4 22 TRUE 7.11 2.87 0.409 -0.491 4.59 -0.00960
#> 5 24 TRUE 6.40 4.04 0.185 -0.386 5.04 -0.00174
#> 6 26 TRUE 7.28 3.34 -0.158 -0.111 5.49 0.0223
```

You may also use count data, such as number of workers entering or exiting a hive, with either a Poisson or negative binomial GLM by supplying the `family`

argument.

As is the case with any GLM, some model diagnostics should be performed before interpreting the results. One way to check that results are sensible, is to plot them. The `plot()`

method for data frames produced by `bumbl()`

(`plot.bumbldf()`

) plots each colony’s observed and estimated growth trajectory as a separate plot. If you only want to display certain results, you can supply a vector of indices or colony ID’s to the `colony`

argument.

```
plot(out, colony = c("17", "104", "20", "24"))
#> Creating plots for 4 colonies...
```

Observed values for colony 20 don’t follow a neat growth and decline shape, and the colony mass was consistently very low. Because of this, we might not trust the value for `tau`

as representing a true switch from workers to reproductives.

There is also an `autoplot()`

method that can be used if you have the `ggplot2`

package installed. It returns a `ggplot`

object which can be modified.

```
library(ggplot2)
<- autoplot(out, colony = c("17", "104", "20", "24")) p
```

`+ theme_bw() p `

If you’d rather get these data and produce your own plots, run `bumbl()`

with `augment = TRUE`

to get fitted values to plot.

Other model diagnostics (plots or statistics) can be obtained from the GLMs fit to each colony. Running `bumbl()`

with `keep.model = TRUE`

produces an output with a list-column containing the GLMs.

```
<- bumbl(bombus, colonyID = colony, t = week, formula = d.mass ~ week, keep.model = TRUE)
out3 #> Warning: Search for optimal switchpoint did not converge for colonyID '32'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '71'. Omitting from results.
head(out3)
#> # A tibble: 6 x 8
#> colony model converged tau logN0 logLam decay logNmax
#> <chr> <list> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 <glm> TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 14 <glm> TRUE 6.40 4.04 0.295 -0.458 5.83
#> 3 17 <glm> TRUE 6.36 3.39 0.407 -0.662 5.83
#> 4 20 <glm> TRUE 7.27 2.79 0.194 -0.345 4.14
#> 5 22 <glm> TRUE 6.92 2.65 0.280 -0.439 4.57
#> 6 24 <glm> TRUE 6.23 4.06 0.167 -0.391 5.06
```

Say, for example, you want to compute an R^{2} value using the `rsq`

package.

```
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:car':
#>
#> recode
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:car':
#>
#> some
library(rsq)
#> Registered S3 methods overwritten by 'lme4':
#> method from
#> cooks.distance.influence.merMod car
#> influence.merMod car
#> dfbeta.influence.merMod car
#> dfbetas.influence.merMod car
#for a single colony
# m <- out3$model[[1]]
# rsq(m)
#for all colonies
.1 <-
out3%>%
out3 mutate(r2 = purrr::map_dbl(model, rsq), .after = model)
.1 %>% filter(colony %in% c("104", "17", "20", "24"))
out3#> # A tibble: 4 x 9
#> colony model r2 converged tau logN0 logLam decay logNmax
#> <chr> <list> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 <glm> 0.977 TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 17 <glm> 0.956 TRUE 6.36 3.39 0.407 -0.662 5.83
#> 3 20 <glm> 0.473 TRUE 7.27 2.79 0.194 -0.345 4.14
#> 4 24 <glm> 0.729 TRUE 6.23 4.06 0.167 -0.391 5.06
```

We can see that colony 20 has a much lower R^{2} value than colonies 104, 17, and 24 which matches our expectations from plotting the observed and fitted values.

See `vignette("nest", package = "tidyr")`

for more about working with list-columns containing models.

Crone, Elizabeth E., and Neal M. Williams. 2016. “Bumble Bee Colony Dynamics: Quantifying the Importance of Land Use and Floral Resources for Colony Growth and Queen Production.” *Ecology Letters* 19 (4): 460–68. https://doi.org/10.1111/ele.12581.