Since we are working in R, we can modify the data outside the plot specification and then use the modified dataset in the plot encoding. With the altair package, however, it is sometimes easier to do the calculation inside the plot specification. In this section, we discuss field transforms that can be done inside the encoding. As we have seen from the beginning of this tutorial, the encoding determines the mapping between the channels and the data. We have already used position channels such as x and y, and mark property channels such as color and opacity. To plot the binned version of a quantitative field, we only need to add bin = TRUE to its x position channel. The code below produces a bar chart of the sum of deaths against the binned years.

data_source_subset = subset(data_source, data_source$Entity == "All natural disasters") 

chart_disasters = alt$Chart(data_source_subset)$
  mark_bar()$
  encode(
    x = alt$X("Year:Q", bin = TRUE),
    y = 'sum(Deaths):Q',
    tooltip = 'sum(Deaths):Q'
  )


Mind the difference in the syntax here. We used the long form x = alt$X(), which we saw in the simple bar chart section, so that we can specify the binning inside the encoding. The long form also allows other adjustments, such as those related to the scale or the axis.
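For comparison, here is a minimal base R sketch of the same idea done outside the plot specification: bin the years, then sum the deaths per bin. The toy data frame and the bin width of 10 are assumptions made for illustration; the plot chooses its own bin edges, so the groups will not necessarily match the chart.

```r
# Hypothetical toy data; the real dataset has Entity, Year and Deaths columns
toy = data.frame(
  Year   = c(1900, 1905, 1912, 1918, 1923, 1931),
  Deaths = c(10, 20, 5, 40, 15, 30)
)

# Assumed bin edges of width 10 (chosen by hand for this sketch)
breaks = seq(1900, 1940, by = 10)

# right = FALSE gives left-closed bins such as [1900,1910)
toy$YearBin = cut(toy$Year, breaks = breaks, right = FALSE)

# Sum the deaths within each bin, similar to 'sum(Deaths):Q' over a binned x
binned = aggregate(Deaths ~ YearBin, data = toy, FUN = sum)
print(binned)
```

Doing this in R means maintaining a second data frame, whereas bin = TRUE keeps the original data and the transformation together in the chart.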


Exercise - Check the documentation of the binning parameters https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html and increase the value of the maximum number of bins.


Exercise - Using data_source_subset = subset(data_source, data_source$Entity != "All natural disasters") make a heatmap that shows the count of disasters per year, like the one below.



Another field transformation scales the original field domain to a custom range that we specify. For instance, we can display a quantitative field on a log scale, as shown below.

chart_disasters = alt$Chart("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv")$
  mark_bar()$
  encode(
    x = 'Entity:N',
    y = alt$Y('sum(Deaths):Q', scale=alt$Scale(type='log'))
  )$
  properties(
    height = 300,
    width = 600
  ) 
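To get a feeling for what the log scale does, the following base R sketch compares linear and log positions for a few totals. The numbers are made up for illustration and are not taken from the dataset.

```r
# Hypothetical totals spanning several orders of magnitude
totals = c(Drought = 1e7, Earthquake = 1e6, Flood = 1e5, Landslide = 1e3)

# On a linear axis, each bar height is proportional to the value itself,
# so the smallest total is a tiny fraction of the largest
linear_share = totals / max(totals)

# On a log axis, positions are proportional to log10 of the value,
# so each factor of ten takes up the same distance
log_position = log10(totals)

print(round(linear_share, 4))
print(log_position)
```

Note that a log scale cannot represent zero: log10(0) is -Inf in R, so fields containing zeros need care before they are shown on a log scale.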


Fortunately, not every type of registered disaster occurred in every year from 1900 to 2017. Did you notice that no natural disaster is registered for 1904? Let’s enrich the dataset in R with a variable that flags missing values by year.

data_source = read.csv(url("https://raw.githubusercontent.com/vega/vega-datasets/master/data/disasters.csv")) # original data
Year = seq(1900, 2017, 1) # create year vector 
Entity = sort(rep(unique(data_source$Entity), 118)) # create entity vector
data_mod = cbind.data.frame(Year, Entity) # create dataframe with complete set of year and entity
data_source_modified = merge(data_source, data_mod, by = c("Year", "Entity"), all = TRUE) # full outer merge with original data
data_source_modified[is.na(data_source_modified$Deaths),"Deaths"] = 0 # replace NA with zero
data_source_modified$Missing = ifelse(data_source_modified$Deaths == 0, "1", "0") # create new variable: "1" = missing, "0" = non-missing
str(data_source_modified) # look at the new data structure
rm(Year, Entity, data_mod) # remove objects that are not needed
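The same completion step can be sketched more compactly with expand.grid(), which builds every combination of year and entity directly. The toy data frame below is hypothetical and only stands in for the disasters dataset. This variant also flags the missing rows before filling them with zero, which is a design choice that differs from the code above: it keeps a genuinely recorded zero from being counted as missing.

```r
# Hypothetical toy data standing in for the disasters dataset
toy = data.frame(
  Entity = c("Flood", "Flood", "Drought"),
  Year   = c(1900, 1902, 1901),
  Deaths = c(100, 50, 300),
  stringsAsFactors = FALSE
)

# All combinations of year and entity
grid = expand.grid(
  Year   = seq(1900, 1902, 1),
  Entity = unique(toy$Entity),
  stringsAsFactors = FALSE
)

# Keep every grid row; Deaths is NA where the original has no record
full = merge(grid, toy, by = c("Year", "Entity"), all.x = TRUE)

# Flag the gaps first, then replace NA with zero
full$Missing = ifelse(is.na(full$Deaths), "1", "0")
full$Deaths[is.na(full$Deaths)] = 0

print(full)
```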

Now we can plot the full time series and specify a custom color scale for the presence or absence of the year in the data. The domain of the data is 0 for non-missing and 1 for missing, and the custom range is the two colors of our choice, here black and red. A second scale maps the same domain to two point sizes.

domain_var = c("0", "1")
range_color = c('black', 'red')
range_size = c(50, 150)

data_source_subset = subset(data_source_modified, data_source_modified$Entity == "All natural disasters")

chart_disasters = alt$Chart(data_source_subset)$
  mark_circle(
    opacity = 0.8
  )$
  encode(
    x = 'Year:O',
    y = 'Deaths:Q',
    color = alt$Color('Missing', scale = alt$Scale(domain = domain_var, range = range_color)),
    size = alt$Size('Missing', scale = alt$Scale(domain = domain_var, range = range_size)),
    tooltip = c("Year", "Deaths")
  )$
  properties(
    height = 300,
    width = 600
  )$
  interactive()