Let’s now use a more realistic example and visualize a dataset included in the vega_datasets package (https://github.com/vega/vega-datasets.html).

We can import the vega datasets using the altair library.

vega_data = altair::import_vega_data()

Check out the list of the available datasets:

vega_data$list_datasets()

and select the one you want to work with. Here, we are using the dataset for Natural Disasters from Our World in Data.

data_source = vega_data$disasters()

Alternatively, you can load data from a local file using standard R code, or read it from a URL:

data_source = read.csv(url("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv"))
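Note that the printed structure further down shows Entity as a factor. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, so reproducing that output on a recent R version may require stringsAsFactors = TRUE. A minimal sketch, using a few rows copied from the output in this section as inline text in place of the real file (sample_rows is just an illustrative name):

```r
# A few rows in the same layout as disasters.csv; inline text stands in for the file.
csv_text <- "Entity,Year,Deaths
All natural disasters,1900,1267360
All natural disasters,1901,200018
Wildfire,2016,39
Wildfire,2017,75"

# stringsAsFactors = TRUE converts text columns to factors,
# which was the default behaviour before R 4.0.0.
sample_rows <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(sample_rows)
```

The same argument can be passed when reading from a local path or from the URL above.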

After importing the data, we can take a first look using standard R code:

str(data_source)
summary(data_source)
head(data_source); tail(data_source)
> str(data_source)
'data.frame':	803 obs. of  3 variables:
 $ Entity: Factor w/ 11 levels "All natural disasters",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year  : int  1900 1901 1902 1903 1905 1906 1907 1908 1909 1910 ...
 $ Deaths: int  1267360 200018 46037 6506 22758 42970 1325641 75033 1511524 148233 ...
> summary(data_source)
                   Entity         Year          Deaths       
 All natural disasters:117   Min.   :1900   Min.   :      1  
 Earthquake           :111   1st Qu.:1946   1st Qu.:    270  
 Extreme weather      :111   Median :1975   Median :   1893  
 Flood                : 89   Mean   :1969   Mean   :  81213  
 Landslide            : 79   3rd Qu.:1996   3rd Qu.:  10362  
 Epidemic             : 69   Max.   :2017   Max.   :3706227  
 (Other)              :227                                   
> head(data_source)
                 Entity Year  Deaths
1 All natural disasters 1900 1267360
2 All natural disasters 1901  200018
3 All natural disasters 1902   46037
4 All natural disasters 1903    6506
5 All natural disasters 1905   22758
6 All natural disasters 1906   42970
> tail(data_source)
      Entity Year Deaths
798 Wildfire 2012     21
799 Wildfire 2013     35
800 Wildfire 2014     16
801 Wildfire 2015     67
802 Wildfire 2016     39
803 Wildfire 2017     75
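
The head() output jumps from 1903 to 1905, so not every year has a record. One way to check for such gaps, and to count rows per category, with base R is sketched below. The small data frame is rebuilt from the rows printed above so that the snippet runs on its own (sample_rows is an illustrative name); the same calls should work on the full data_source.

```r
# Rows copied from the head() and tail() output above.
sample_rows <- data.frame(
  Entity = c(rep("All natural disasters", 6), rep("Wildfire", 6)),
  Year   = c(1900, 1901, 1902, 1903, 1905, 1906,
             2012, 2013, 2014, 2015, 2016, 2017),
  Deaths = c(1267360, 200018, 46037, 6506, 22758, 42970,
             21, 35, 16, 67, 39, 75)
)

# Number of records per category.
rows_per_entity <- table(sample_rows$Entity)

# Years without a record inside the observed range of one category.
all_rows <- sample_rows[sample_rows$Entity == "All natural disasters", ]
missing_years <- setdiff(seq(min(all_rows$Year), max(all_rows$Year)), all_rows$Year)

print(rows_per_entity)
print(missing_years)
```

Knowing about these gaps is useful later, because a line chart connects neighbouring records directly and will not show that a year is absent.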

We can now make an altair R plot similar to the one in the Python Altair gallery (https://altair-viz.github.io/gallery/natural_disasters.html). For now, we filter the data in R and use the resulting subset to make the chart. In the data transform section we will see how to do the filtering inside the chart specification.


Below is the code to make this plot.

data_source_subset = subset(data_source, data_source$Entity != "All natural disasters") 

chart_disasters = alt$Chart(data_source_subset)$
  mark_circle(
    opacity=0.8,
    stroke='black',
    strokeWidth=1
  )$
  encode(
    x = "Year:O",
    y = "Entity:N",
    color = "Entity:N",
    size = "Deaths:Q"
  )$
  properties(
    height=200,
    width=500
  )

Here, the global properties of the circles are specified inside the mark, while the properties that depend on the data are specified inside the encoding. Using the mark type rect with the color and opacity channels, we can make a heatmap.

chart_disasters = alt$Chart(data_source_subset)$
  mark_rect()$
  encode(
    x = "Entity:O",
    y = "Year:O",
    color = "Entity:N",
    opacity = 'Deaths:Q'
  )$
  properties(
    height=600,
    width=200
  )
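
The encoding strings in these charts use Altair's field:type shorthand, where Q stands for quantitative, O for ordinal, N for nominal and T for temporal. The helper below is purely illustrative: parse_shorthand is a hypothetical name, not part of the altair package, and it only handles the simple field:type form. It just spells out what each string stands for.

```r
# Hypothetical helper: split "field:type" into its parts (not an altair function).
parse_shorthand <- function(shorthand) {
  type_names <- c(Q = "quantitative", O = "ordinal", N = "nominal", T = "temporal")
  parts <- strsplit(shorthand, ":", fixed = TRUE)[[1]]
  if (length(parts) != 2 || !(parts[2] %in% names(type_names))) {
    stop("expected a string such as 'Year:O'")
  }
  list(field = parts[1], type = type_names[[parts[2]]])
}

# Spell out the shorthand strings used in the charts of this section.
for (s in c("Year:O", "Entity:N", "Deaths:Q")) {
  parsed <- parse_shorthand(s)
  cat(s, "->", parsed$field, "as", parsed$type, "\n")
}
```

The type controls how a channel is scaled and drawn, which is why the same column can be treated as ordinal in one chart and quantitative in another.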

Next, using the code below, we can make a time series plot of deaths from all natural disasters from 1900 until 2017.

data_source_subset = subset(data_source, data_source$Entity == "All natural disasters") 

chart_disasters = alt$Chart(data_source_subset)$
  mark_line()$
  encode(
    x='Year:Q',
    y='Deaths:Q',
    tooltip = c("Year", "Deaths")
  )$
  properties(
    height=300,
    width=600
  )

Exercise - Use the color channel to make a time series plot per Entity.

Exercise - Change the field types. What is the result?