The Analytic Garden: We were almost there

This disease, called COVID 19, will be over much sooner than you think. Christian people all over this country, praying, have overwhelmed it.
~ Kenneth Copeland

With everything going on with COVID, you don't know what's going to happen.
~ Demetrius Andrade

Here are the current 7-day moving averages of cases in the five county area in NY's Capital Distract. The data is from https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv.

It's depressing that despite vaccination we are at approximately at the same case-load as December 2020. In July 202, we were down to essentially no cases. What happened.

Delta happened, school started, the weather got colder, more time indoors, people got tired of masks and social distancing, people refused to get vaccinated, vaccine effectiveness declined. Lots happened. It looks like we might reach the January-February 2021 peak in cases in early 2022.

Here's the code for producing this plot.

#' plot_county_cases - plot case moving averages for selected counties
#'
#' @param state - selected state, New York is default,
#' @param counties - a list of county names. Capital District counties are default.
#' @param lag - number of days for moving average. Default is 7
#'
#' @return a data frame with columns
#'    date
#'    county - selected counties
#'    state - selected state
#'    fips - standard location data, see https://www.census.gov/prod/techdoc/cbp/95-96cd/fips-st.pdf
#'    cases - cases lagged by lag days
#'    avg - moving average
#'    
#' @requires
#' counties must be in state of an error is thrown.

plot_county_cases <- function(state = 'New York',
                              counties = c('Albany', 'Columbia', 'Rensselaer', 'Saratoga', 'Schenectady'),
                              lag = 7) {
  require(tidyverse)
  require(grid)
  require(gridExtra)
  require(zoo)
  require(rlist)
  
  url <- 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
  
  df <- read_csv(url)
  df <- df %>% filter(county %in% {{ counties }}) %>% filter(state == {{ state }})
  
  df2 <- data.frame(date = NULL,
                    county = NULL,
                    state = NULL,
                    daily = NULL,
                    avg = NULL)
  
  plot_list <- list()
  for(county in counties) {
    temp <- df %>% 
      filter(county == {{ county }}) %>% 
      mutate(daily = cases - lag(cases)) %>% 
      mutate(avg = rollapply(daily, lag, mean, align='right', fill = NA)) %>%
      mutate(avg = ifelse(avg >= 0, avg, 0))
    
    p <- ggplot(temp, aes(x=date, y=avg)) + 
      geom_line() +
      labs(title = paste('Cases ', lag, '-day Moving Average ', county, ' County', sep = ''),
           y = 'Cases')
    plot_list <- list.append(plot_list, p)
    
    df2 <- bind_rows(df2, temp)
  }
  
  grid.arrange(grobs = plot_list, ncol=2, top = textGrob(Sys.Date()))
  
  p2 <- ggplot(df2) +
    geom_line(aes(x = date, y = avg, color = county)) +
    ggtitle(paste('Cases', lag, '-day Moving Average', Sys.Date()))
  print(p2)
  
  return(df)
}

With a few small tweaks to thye code above, we can see that the five counties have similar rates when normalized by population.

Here's the modified code.

#' plot_county_cases_pop - plot case moving averages for selected counties normalized by population
#'
#' @param state - selected state, New York is default,
#' @param counties - a list of county names. Capital District counties are default.
#' @param lag - number of days for moving average. Default is 7
#' @param df_pop - a dataframe of county population from https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates/2020-evaluation-estimates/2010s-counties-total.html
#' @param date - an option date string. If NULL, today's date is used. 
#'
#' @return a data frame with columns
#'    date
#'    county - selected counties
#'    state - selected state
#'    fips - standard location data, see https://www.census.gov/prod/techdoc/cbp/95-96cd/fips-st.pdf
#'    cases - cases/100000 lagged by lag days
#'    avg - moving average
#'    
#' @requires
#' counties must be in state of an error is thrown.

plot_county_cases_pop <- function(state = 'New York',
                                  counties = c('Albany', 'Columbia', 'Rensselaer', 'Saratoga', 'Schenectady'),
                                  lag = 7,
                                  df_pop = county_pop,
                                  date = NULL) {
  require(tidyverse)
  require(grid)
  require(gridExtra)
  require(zoo)
  require(rlist)
  
  if(is.null(date)) {
    date = Sys.Date()
  }
  
  url <- 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
  
  df <- read_csv(url)
  df <- df %>% filter(county %in% {{ counties }}) %>% filter(state == {{ state }})
  
  df2 <- data.frame(date = NULL,
                    county = NULL,
                    state = NULL,
                    daily = NULL,
                    avg = NULL)
  
  plot_list <- list()
  for(county in counties) {
    # get population for this county
    population <- df_pop %>%
      filter(STNAME == state & CTYNAME == paste({{ county }}, 'County'))
    pop <- population$POPESTIMATE2020 / 100000
    
    temp <- df %>% 
      filter(county == {{ county }}) %>% 
      mutate(daily = (cases - lag(cases))/pop) %>% 
      mutate(avg = rollapply(daily, lag, mean, align='right', fill = NA)) %>%
      mutate(avg = ifelse(avg >= 0, avg, 0))
    
    p <- ggplot(temp, aes(x=date, y=avg)) + 
      geom_line() +
      labs(title = paste('Cases per 100,000', lag, '-day Moving Average ', county, ' County', sep = ''),
           y = 'Cases')
    plot_list <- list.append(plot_list, p)
    
    df2 <- bind_rows(df2, temp)
  }
  
  grid.arrange(grobs = plot_list, ncol=2, top = textGrob(date))
  
  p2 <- ggplot(df2) +
    geom_line(aes(x = date, y = avg, color = county)) +
    ggtitle(paste('Cases per 100,000', lag, '-day Moving Average', date))
  print(p2)
  
  return(df2)
}

The Analytic Garden

We were almost there

No comments:

Post a Comment

Labels

Contributors

wfmu