How to make an interactive map of USA with R and leaflet
Dec 13, 2017
Francisco Requena
306 minute read

In a few days, I will give a talk in the R Group Meeting of Madrid. I will show HealthPlot, an open data project that I have been developing during this year. It’s a web application developed with R and Shiny with maps and graphics about different aspects of the public health in USA.

You can find topics such as: languages, immigration, religion, police killings, mean home value… this information has been obtained from public databases of the US Government such as American Community Survey, Centers for Disease Control and Prevention

I went to show some code in my presentation but I decided that would be a better option to write a blog post about it.

In the next post, I will try to reflect what have been my workflow in each dataset analyzed from the search for the data in the databases to get the final map.

1. Get the information

In this case, we will try to find information about Life expectancy in USA. The Institute for Health Metrics and Evaluation from the University of Washingthon is a great resource with information about public health data such as pandemics, health financing, diseases rates… not only about USA but also the rest of the world. For this example, we are going to download the data using this portal.

This data contains information about the Life expectancy in USA by State and County from 1980 to 2014. Besides, it has a value with the difference (in percentage %) of the life expectancy in 1980 and 2014.

2. Cleaning data

The file has a .xlsx format and we will not be able to use read.csv or read.delim functions. For this case, we will use the package XLConnect that allow us in a simpy way to select the data that we are interested in the spreadsheet.

library(XLConnect) # Read data
library(rgdal) # Read shapefile 
library(dplyr) # Clean data
library(leaflet) # Plot maps
life <- readWorksheetFromFile("data/life.xlsx",
                            sheet=1,
                            startRow = 2,
                            endCol = 12)
glimpse(life)
## Observations: 3,196
## Variables: 11
## $ Location                                <chr> "United States", "Alab...
## $ FIPS                                    <dbl> NA, 1, 1001, 1003, 100...
## $ Life.expectancy..1980.                  <chr> "73.75 (73.73, 73.77)"...
## $ Life.expectancy..1985.                  <chr> "74.72 (74.71, 74.74)"...
## $ Life.expectancy..1990.                  <chr> "75.39 (75.38, 75.41)"...
## $ Life.expectancy..1995.                  <chr> "75.86 (75.85, 75.88)"...
## $ Life.expectancy..2000.                  <chr> "76.94 (76.92, 76.95)"...
## $ Life.expectancy..2005.                  <chr> "77.65 (77.61, 77.68)"...
## $ Life.expectancy..2010.                  <chr> "78.82 (78.81, 78.84)"...
## $ Life.expectancy..2014.                  <chr> "79.08 (79.04, 79.11)"...
## $ X..Change.in.Life.Expectancy..1980.2014 <chr> "7.22 (7.16, 7.27)", "...

To create a leaflet map, our dataframe has to have two essential columns, regardless of the data:

1. A code which defines a region/area of our map. In this case, we have the FIPS values where each code represents a State or County of USA. The states will be represented by a value of length 2 (e.g. ‘01’ = Alabama, ‘15’ = Hawaii) and the counties by a value of length 3 (e.g. ‘009’ = Blount County) and the FIPS code of the state to which it belongs. In this case, Blount County is in the state of Alabama, the final FIPS will be: 01+009 (‘01009’).

2. The information that we want to display in the map. In this case we will create 4 maps: Life expectancy (by state and county). Change in Life expectancy (1980-2014) (by state and county). As we have states and counties in the same dataset (FIPS column), we just need two columns more to create 4 maps in total.

life <- life[,c(2,10,11)]
life <- life[-1,]

colnames(life) <- c('GEOID', 'exp_leg', 'change_leg')

# The column with the values will be duplicated. A column  with the 95 % Confidence Interval and the value. The other only with the value.
life$exp <- life$exp_leg
life$change <- life$change_leg

life$exp <- gsub( " *\\(.*?\\) *", "", life$exp) # Delete values between parenthesis
life$change <- gsub( " *\\(.*?\\) *", "", life$change) # Delete values between parenthesis

life$exp <- as.numeric(as.character(life$exp))
life$change <- as.numeric(as.character(life$change))
life$GEOID <- as.character(life$GEOID)

# There are some values that are not correctly converted to character class (eliminate the first zero). In this case, we select those that have a length of 1 or 4 and we add a zero.
life$GEOID <- ifelse(nchar(life$GEOID) == 1 | nchar(life$GEOID) == 4  , paste0('0',life$GEOID),life$GEOID)
glimpse(life)
## Observations: 3,195
## Variables: 5
## $ GEOID      <chr> "01", "01001", "01003", "01005", "01007", "01009", ...
## $ exp_leg    <chr> "75.65 (75.57, 75.72)", "75.67 (75.15, 76.18)", "78...
## $ change_leg <chr> "4.65 (4.48, 4.82)", "5.10 (3.86, 6.43)", "5.41 (4....
## $ exp        <dbl> 75.65, 75.67, 78.08, 75.42, 73.97, 76.16, 73.86, 73...
## $ change     <dbl> 4.65, 5.10, 5.41, 5.46, 3.29, 2.91, 5.70, 2.41, 2.4...

3. Getting shapefile maps

Once we have the data ready, we have to create the shapefile map. In the case of USA the Census Bureau gives the files for the state and county map with difference resolutions. Besides, you can find a lot of different map here such as metropolitan areas.

For this example, I will use the shapefiles maps with a resolution of 20m and with the latest date: 2016. To read the downloaded data, we use the readOGR function from the rgdal package. We will get two shapefiles maps: states map and counties map.

# US MAP - STATES

# The dsn argument is the path of the files and the layer is the name of all files (without the extension).
# Important: If you have downloaded other files (e.g. year 2015) the name of your 
# files will be different and therefore, the layer will be other, check your files.

us.map.state <- readOGR(dsn= './data/map_usa_states', layer = "cb_2016_us_state_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "./data/map_usa_states", layer: "cb_2016_us_state_20m"
## with 52 features
## It has 9 fields
# Remove Alaska(2), Hawaii(15), Puerto Rico (72), Guam (66), Virgin Islands (78), American Samoa (60) Mariana Islands (69), Micronesia (64), Marshall Islands (68), Palau (70), Minor Islands (74)

us.map.state <- us.map.state[!us.map.state$STATEFP %in% c("02", "72", "66", "78", "60", "69","64", "68", "70", "74"),]

# Make sure other outling islands are removed.
us.map.state <- us.map.state[!us.map.state$STATEFP %in% c("81", "84", "86", "87", "89", "71", "76","95", "79"),]

# US MAP - COUNTIES

us.map.county <- readOGR(dsn= './data/map_usa_counties', layer = "cb_2016_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "./data/map_usa_counties", layer: "cb_2016_us_county_20m"
## with 3220 features
## It has 9 fields
# Remove Alaska(2), Hawaii(15), Puerto Rico (72), Guam (66), Virgin Islands (78), American Samoa (60) Mariana Islands (69), Micronesia (64), Marshall Islands (68), Palau (70), Minor Islands (74)

us.map.county <- us.map.county[!us.map.county$STATEFP %in% c("02", "15", "72", "66", "78", "60", "69","64", "68", "70", "74"),]

# Make sure other outling islands are removed.
us.map.county <- us.map.county[!us.map.county$STATEFP %in% c("81", "84", "86", "87", "89", "71", "76","95", "79"),]

4. Life expectancy map

Once we have these two kinds of data:

  1. Dataframe with a code (FIPS) of the location and the values that we will display in the map.
  2. Shapefile map.

We only have two merge both objects using merge function from base. The key in this process is that both objects have to share a column with the same name (‘GEOID’) that we will use in the ‘by’ argument of the merge function.

We don’t have to select those observations which belong to a state or county because the us.map.state object will have a column (‘GEOID’) with the FIPS values of every state, therefore, all observations which belong to a county will be deleted in the leafmap object. In the case of the us.map.county will happen the opposite.

leafmap <- merge(us.map.state, life, by= 'GEOID')

popup_dat <- paste0("<strong>State: </strong>",
                    leafmap$NAME,
                    "<br><strong>Life expectancy with 95% CI : </strong>",
                    leafmap$exp_leg)

pal <- colorNumeric("Spectral", NULL, n = 13)

  leaflet(data = leafmap) %>% 
      addTiles() %>%
      addPolygons(fillColor = ~pal(exp),
                  fillOpacity = 0.8,
                  color = "#BDBDC3",
                  weight = 1,
                  popup = popup_dat) %>%
      addLegend("bottomright", pal = pal, values = ~exp,
                title = "Life expectancy",
                opacity = 1 )
leafmap <- merge(us.map.county, life, by= 'GEOID')

popup_dat <- paste0("<strong>State: </strong>",
                    leafmap$NAME,
                    "<br><strong>Life expectancy with 95% CI : </strong>",
                    leafmap$exp_leg)
pal <- colorQuantile("Spectral", NULL, n = 10)


  leaflet(data = leafmap) %>% 
      addTiles() %>%
      addPolygons(fillColor = ~pal(exp),
                  fillOpacity = 0.8,
                  color = "#BDBDC3",
                  weight = 1,
                  popup = popup_dat)