Removing Personal Bias From Flu Severity Estimation (a.k.a. Misery Loves Data)

The family got hit pretty hard with the flu right as the Christmas festivities started, and we were all pretty much bed-ridden zombies up until today (2017-01-02). When you’re in the throes of a very bad influenza-like illness (ILI), it’s easy to imagine that you’re a victim of a severe outbreak, especially when others keep telling you that they, too, just had (or still have) the flu or know people who do. Thankfully, I didn’t have to accept this emotional opinion and could turn to the cdcfluview package to see just how this year is measuring up.

Influenza cases are cyclical, and that’s super easy to see with a longer view of the CDC national data:

library(cdcfluview)
library(tidyverse)
library(stringi)

# National ILINet data for the 2010-2011 through 2016-2017 seasons
flu <- get_flu_data("national", years=2010:2016)

# Turn YEAR + WEEK into a date (Monday of that week), then reshape to long form
mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>%
  select(-`AGE 25-64`) %>%
  gather(age_group, count, starts_with("AGE")) %>%
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>%
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(labels=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases",
       title="U.S. National ILI Case Counts by Age Group (2010:2011 flu season through 2016:2017)",
       caption="Source: CDC ILInet via CRAN cdcfluview package") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")
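The week column in that pipeline comes from a small date-parsing trick that is easy to miss. Here is a minimal sketch of it in isolation, using a made-up year and week number (the names `year`, `week_no` and `week_start` are just for illustration):

```r
# Build a "week start" date from a year and a week number, as in the
# pipeline above. The values here are made up for illustration.
year <- 2016
week_no <- 50

# "%U" is the week of the year (weeks start on Sunday) and "%u" is the day
# of the week (1 = Monday), so "2016 50 1" reads as "Monday of week 50".
week_start <- as.Date(sprintf("%s %s 1", year, week_no), format = "%Y %U %u")

print(week_start)
format(week_start, "%u")  # day-of-week number of the parsed date
```

Note that `%U` follows the plain calendar convention, while CDC reports use MMWR weeks, so the plotted dates are probably best read as a close approximation of the reporting week rather than an exact match.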

We can use the same data to zoom in on this season:

# Same pipeline as before, restricted to weeks from July 2016 onward
mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>%
  select(-`AGE 25-64`) %>%
  gather(age_group, count, starts_with("AGE")) %>%
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>%
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  filter(week >= as.Date("2016-07-01")) %>%
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(labels=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases",
       title="U.S. National ILI Case Counts by Age Group (2016:2017 flu season)",
       caption="Source: CDC ILInet via CRAN cdcfluview package") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")

So, things are trending up, but how severe is this year compared to others? Looking at the number or percentage of ILI cases is one way to understand severity; another is to look at the mortality rate. The cdcfluview package has a get_mortality_surveillance_data() function, but it’s region-based and I’m really only looking at national data in this post. A helpful individual pointed out a new CSV file at https://www.cdc.gov/flu/weekly/index.htm#MS, which we can target programmatically and reproducibly (so we don’t have to track filename changes by hand) with:

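library(rvest)

# Find the link to the NCHS mortality CSV on the weekly report page
pg <- read_html("https://www.cdc.gov/flu/weekly/index.htm#MS")
html_nodes(pg, xpath=".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]") %>%
  html_attr("href") -> mort_ref

# The sprintf() assumes the href is site-relative, so the host gets prepended
mort_url <- sprintf("https://www.cdc.gov%s", mort_ref)
df <- readr::read_csv(mort_url)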
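Scraping a link like this is a bit fragile, since the page layout can change without notice. The following is a rough sketch of a more defensive variant, run against an inline HTML snippet instead of the live CDC page so it works offline. The file names inside the snippet are invented for illustration and are not the real CDC paths; swapping `sprintf()` for `xml2::url_absolute()` is my own substitution, not something the CDC page requires.

```r
library(rvest)

# A stand-in for the CDC page. The file names below are invented for
# illustration and are not the real CDC paths.
html <- '<html><body>
  <a href="/flu/weekly/other.pdf">PDF report</a>
  <a href="/flu/weekly/example/NCHS_Data_example.csv">Mortality CSV</a>
</body></html>'

pg <- read_html(html)

# Same XPath as above: anchors whose href mentions both "csv" and "NCHS"
nodes <- html_nodes(pg, xpath = ".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]")
mort_ref <- html_attr(nodes, "href")

# Fail loudly if the link is missing or ambiguous after a page redesign
stopifnot(length(mort_ref) == 1)

# url_absolute() resolves the href against the page URL, whether the href
# is relative or already absolute
mort_url <- xml2::url_absolute(mort_ref, "https://www.cdc.gov/flu/weekly/index.htm")
print(mort_url)
```

The `stopifnot()` check means a changed page produces an immediate error rather than a confusing failure later inside `read_csv()`.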

We can then take a look at the current “outbreak” status (an outbreak being when real-world mortality events exceed the model threshold):

# Expected, Threshold and observed percentages as three lines on one chart
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>%
  select(week, Expected, Threshold, `Percent of Deaths Due to Pneumonia and Influenza`) %>%
  gather(category, percent, -week) %>%
  mutate(percent=percent/100) %>%
  ggplot() +
  geom_line(aes(week, percent, group=category, color=category)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(labels=scales::percent) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Pneumonia & Influenza Mortality",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")
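If you would rather have a list of weeks than eyeball the chart, the same comparison can be done directly on the data. This is a minimal sketch on a toy data frame that borrows the CSV’s column names; the numbers are invented for illustration and are not real surveillance data, and `toy`, `flagged` and `above_threshold` are hypothetical names.

```r
library(dplyr)

# Toy rows with the same column names as the NCHS CSV used above.
# The numbers are invented for illustration, not real surveillance data.
toy <- tibble(
  Year = c(2016, 2016, 2016),
  Week = c(1, 2, 3),
  Expected = c(7.0, 7.1, 7.2),
  Threshold = c(7.4, 7.5, 7.6),
  `Percent of Deaths Due to Pneumonia and Influenza` = c(7.2, 7.8, 7.5)
)

# Flag each week whose observed percentage is above the model threshold
flagged <- mutate(
  toy,
  above_threshold = `Percent of Deaths Due to Pneumonia and Influenza` > Threshold
)

# Keep only the flagged weeks (here, just week 2)
filter(flagged, above_threshold)
```

Running the same `mutate()` and `filter()` on the real `df` should give the weeks where the observed line sits above the threshold line in the chart.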

That view is for all mortality events from both influenza and pneumonia. We can look at the counts for just influenza as well:

# Weekly count of deaths attributed to influenza alone
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>%
  select(week, `Influenza Deaths`) %>%
  ggplot() +
  geom_line(aes(week, `Influenza Deaths`), color=ggthemes::tableau_color_pal()(1)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(labels=scales::comma) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Influenza Mortality (count of mortality events)",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")

It’s encouraging that the overall combined mortality rate is trending downwards and that the number of deaths attributed to influenza alone is very low. Go. Science.

I’ll be adding a function to cdcfluview to retrieve this new data set a bit later in the year.

Hopefully you’ll avoid the flu and enjoy a healthy and prosperous 2017.

