Introduction

The volcano3D package enables exploration of probes differentially expressed between three groups. Its main purpose is for the visualisation of differentially expressed genes in a three-dimensional volcano plot. These plots can be converted to interactive visualisations using plotly.

This vignette consists of a case study from the PEAC rheumatoid arthritis project (Pathobiology of Early Arthritis Cohort). The methodology has been published in Lewis, Myles J., et al. ‘Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.’ Cell reports 28.9 (2019): 2455-2470. (DOI: 10.1016/j.celrep.2019.07.091) with an interactive web tool available at https://peac.hpc.qmul.ac.uk.

There are also supplementary vignettes with further information on:

Getting Started

Prerequisites

Install from CRAN

CRAN status

install.packages("volcano3D")

Install from Github

GitHub tag

library(devtools)
install_github("KatrionaGoldmann/volcano3D")

Load the package

library(volcano3D)

Dictionary

Variables used in this vignette:

Variable Definition
contrast the variable by which samples can be split into three groups.
groups the three levels/categories of the contrast variable. These should not contain underscores.
comparison two groups between which a statistical test can be performed. There should be three comparisons total. For the examples outlined in this vignette we look at comparisons: ‘lymphoid-myeloid’, ‘lymphoid-fibroid’ and ‘myeloid-fibroid’.
p p value
FC fold change
padj adjusted p value
suffix the tail word in a column name. In this package it states the statistical parameter (e.g. _logFC is the log FC variable).
prefix the leading word in a column name. In this package it states the statistical test (e.g. LRT is the likelihood ratio test).
polar A polar coordinates object, of S4 class, containing the expression data, sample data, pvalues and polar coordinates.

Example 1. Synovial Gene Data

This vignette will demonstrate the power of this package using a basic example from the PEAC data set. Here we will focus on the synovial data from this cohort.

Using the synovial biopsies from PEAC we can create a polar object for differentially expressed genes. The sample data used in this vignette can be loaded from the volcano3Ddata package.

devtools::install_github("KatrionaGoldmann/volcano3Ddata")
library(volcano3Ddata)
data("syn_data")

Samples in this cohort fall into three pathotype groups:

kable(table(syn_metadata$Pathotype), col.names = c("Pathotype", "Count"))
Pathotype Count
Fibroid 16
Lymphoid 45
Myeloid 20

In this example we are interested in genes that are differentially expressed between each of these groups.

First we will set up a polar object, using the polar_coords function, which uses:

Variable Details

sampledata

(required)

This shows information for each sample in rows and must contain:

  • an ID column: Containing the sample IDs. This must be titled ‘ID’.

  • a contrast column: Indicates which of the three groups each sample belongs to.

contrast

(required)
The column name in sampledata which contains the three-level factor used for contrast

pvalues

(required)

the pvalues data.frame which contains the statistical significance of probes between groups. This contains:

expression

(required)
A data frame or matrix containing the expression data. This is used to calculate z-score and fold change, therefore it should be a normalised expression object such as log transformed or variance stabilised.
groups The groups to be compared (in order). If NULL this defaults to levels(sampledata[, 'contrasts']). These must not contain underscores.
p_col_suffix The suffix of column names with pvalues (default is ‘pvalue’). This must not contain underscores.
padj_col_suffix The suffix of column names with adjusted pvalues (default is ‘padj’). This must not contain underscores. If NULL the adjusted pvalue is calculated using p_col_suffix and pvalue_method.
padjust_method The method to calculate adjusted pvalues if not already provided. Must be one of c(‘holm’, ‘hochberg’, ‘hommel’, ‘bonferroni’, ‘BH’, ‘BY’, ‘fdr’, ‘none’). Default is ‘BH’.
fc_col_suffix The suffix of column names with log(fold change) values (default is ‘logFC’). This must not contain underscores.
multi_group_prefix The prefix for columns containing statistics for a multi-group test (this is typically a likelihood ratio test or ANOVA). Default is NULL. This must not contain underscores.
label_column A column name in pvalues which is to be used to label markers of interest at plotting stage. If NULL the rownames will be used.

We can map the PEAC data to polar coordinates using:

syn_polar <- polar_coords(sampledata = syn_metadata,
                          contrast = "Pathotype",
                          pvalues = syn_pvalues,
                          expression = syn_rld,
                          p_col_suffix = "pvalue",
                          padj_col_suffix = "padj",
                          fc_col_suffix = "log2FoldChange",
                          multi_group_prefix = "LRT",
                          non_sig_name = "Not Significant",
                          significance_cutoff = 0.01,
                          label_column = NULL,
                          fc_cutoff = 0.1)

This creates a polar class object with slots for: sampledata, contrast, pvalues, multi_group_test, expression, polar and non_sig_name. The pvalues slot which should have a data frame with at least two statistics for each comparison - pvalue and adjusted pvalue - and an optional logarithmic fold change statistic. In this case we are including fold change for downstream visualisations:

head(syn_polar@pvalues)
Fibroid_Lymphoid _pvalue Fibroid_Lymphoid _logFC Fibroid_Lymphoid _padj Lymphoid_Myeloid _pvalue Lymphoid_Myeloid _logFC Lymphoid_Myeloid _padj Myeloid_Fibroid _pvalue Myeloid_Fibroid _logFC Myeloid_Fibroid _padj LRT _pvalue LRT _padj label
A2M 0.7775361 -0.0478326 1 0.1563045 -0.2202768 1.0000000 0.1694654 0.2681094 1 0.2739571 0.4353436 A2M
A2ML1 0.0002106 1.6854024 1 0.0000030 -1.9299687 0.0464724 0.6348844 0.2445664 1 0.0000022 0.0000202 A2ML1
A4GALT 0.0000000 1.0248506 0 0.0000192 -0.5646665 0.2927492 0.0055222 -0.4601841 1 0.0000000 0.0000000 A4GALT
A4GNT 0.0238302 -1.1527859 1 0.6691773 -0.1857435 1.0000000 0.0197330 1.3385294 1 0.0732984 0.1583022 A4GNT
AAAS 0.8574281 -0.0245296 1 0.6608837 0.0548697 1.0000000 0.8469982 -0.0303402 1 0.9075226 0.9711813 AAAS
AACS 0.2815302 0.1729592 1 0.0630182 -0.2732040 1.0000000 0.5874802 0.1002448 1 0.1472059 0.2732939 AACS

The sig column in syn_polar@polar allows us to determine relative differences in expression between groups (in this case pathotypes). The ‘+’ indicates which pathotypes are significantly ‘up’ compared to others. For example:

  • genes labelled ‘Lymphoid+’ are significantly up in Lymphoid vs Myeloid and Lymphoid vs Fibroid.

  • genes up in two pathotypes such as ‘Lymphoid+Myeloid+’ are up in both Lymphoid and Myeloid, therefore Lymphoid vs Fibroid and Myeloid vs Fibroid are statistically significant.

  • genes which show no significant difference between pathotypes are classed according to non_sig_name

This gives us:

setNames(data.frame(table(syn_polar@polar$sig)), c("Significance", "Frequency"))
Significance Frequency
Fibroid+ 885
Fibroid+Lymphoid+ 19
Fibroid+Myeloid+ 1504
Lymphoid+ 1793
Lymphoid+Myeloid+ 500
Myeloid+ 119
Not Significant 11415

Volcano Plots

If there is a fold change column previously provided, we can now investigate the comparisons between pathotypes using the volcano_trio function. This creates three ggplot outputs.

syn_plots <- volcano_trio(polar = syn_polar,
                          sig_names = rep(c("not significant","significant"), each=2),
                          colours = rep(c( "slateblue1", "grey60"), each=2),
                          colour_scheme="none",
                          text_size = 9,
                          marker_size=1.5,
                          shared_legend_size = 0.9,
                          label_rows = c("SLAMF6", "PARP16", "ITM2C"),
                          fc_line = FALSE,
                          share_axes = FALSE)

syn_plots$All

Alternatively using the polar significance levels

syn_plots <- volcano_trio(polar = syn_polar,
                          colour_scheme="polar",
                          colours = c('green3', 'cyan',  'blue',
                                     'purple', 'red', 'gold2', 'grey60'),
                          text_size = 9,
                          marker_size=1.5,
                          shared_legend_size = 0.9,
                          label_rows = c("SLAMF6", "PARP16", "ITM2C"),
                          fc_line = FALSE,
                          p_line = FALSE,
                          share_axes = FALSE)

syn_plots$All

or the upregulated group within the standard volcano plot:

syn_plots <- volcano_trio(polar = syn_polar,
                          colour_scheme="upregulated",
                          colours = c('green3', 'cyan',  'blue',
                                     'purple', 'red', 'gold2', 'grey60'),
                          text_size = 9,
                          marker_size=1.5,
                          shared_legend_size = 0.9,
                          label_rows = c("SLAMF6", "PARP16", "ITM2C"),
                          fc_line = FALSE,
                          p_line = FALSE,
                          share_axes = FALSE)

syn_plots$All

Radial Plots

The differential expression can now be visualised on an interactive radar plot using radial_plotly.

Using the hover parameter (implemented v1.0.2) the plotly hover information can be altered by referencing the columns available in and i.e.:

unique(c(colnames(syn_polar@polar), colnames(syn_polar@pvalues)))
##  [1] "Name"                    "Fibroid_axis"           
##  [3] "Lymphoid_axis"           "Myeloid_axis"           
##  [5] "y_zscore"                "x_zscore"               
##  [7] "r_zscore"                "x_fc"                   
##  [9] "y_fc"                    "r_fc"                   
## [11] "angle"                   "angle_degrees"          
## [13] "max_exp"                 "sig"                    
## [15] "label"                   "Fibroid_Lymphoid_pvalue"
## [17] "Fibroid_Lymphoid_logFC"  "Fibroid_Lymphoid_padj"  
## [19] "Lymphoid_Myeloid_pvalue" "Lymphoid_Myeloid_logFC" 
## [21] "Lymphoid_Myeloid_padj"   "Myeloid_Fibroid_pvalue" 
## [23] "Myeloid_Fibroid_logFC"   "Myeloid_Fibroid_padj"   
## [25] "LRT_pvalue"              "LRT_padj"

For example to show all pvalues on hover we can use:

hovertext = "paste(label, 
              '\nL vs M pvalue:', format(Lymphoid_Myeloid_pvalue, digits=3), 
              '\nM vs F pvalue:', format(Myeloid_Fibroid_pvalue, digits=3), 
              '\nL vs F pvalue:', format(Fibroid_Lymphoid_pvalue, digits=3))"

Similarly the labelRows variable allows any markers of interest to be labelled:

radial_plotly(polar = syn_polar,
              label_rows = c("SLAMF6", "PARP16", "ITM2C"),
              hover=hovertext)

By hovering over certain points you can also determine genes for future interrogation.

Similarly we can create a static ggplot image using radial_ggplot:

radial_ggplot(polar = syn_polar,
              label_rows = c("SLAMF6", "FMOD"),
              marker_size = 2.3,
              legend_size = 10) +
  theme(legend.position = "right")

Alternatively a continuous colour scale can be produced by converting the angle to a hsv variable. This angle can be offset by continuous_shift.

radial_ggplot(polar = syn_polar,
              label_rows = c("SLAMF6", "PARP16", "ITM2C"),
              marker_size = 2.3,
              marker_alpha=0.7,
              colour_scale = "continuous",
              continuous_shift=1.33,
              legend_size = 10)

Boxplots

We can then interrogate any one specific variable as a boxplot, to investigate these differences. This is build using either ggplot2 or plotly so can easily be edited by the user to add features. Using plotly:

plot1 <- boxplot_trio(syn_polar,
                      value = "FAM92B",
                      text_size = 7,
                      test = "polar_padj",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"),
                      step_increase = 0.2,
                      plot_method='plotly')

plot2 <- boxplot_trio(syn_polar,
                      value = "SLAMF6",
                      text_size = 7,
                      test = "polar_multi_padj",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"),
                      plot_method='plotly')

plot3 <- boxplot_trio(syn_polar,
                      value = "PARP16",
                      text_size = 7,
                      stat_size=2.5,
                      test = "t.test",
                      levels_order = c("Myeloid", "Fibroid"),
                      box_colours = c("pink", "gold"),
                      plot_method='plotly')

plotly::subplot(plot1, plot2, plot3, titleY=TRUE, margin=0.05)

Or using ggplot

plot1 <- boxplot_trio(syn_polar,
                      value = "FAM92B",
                      text_size = 7,
                      test = "polar_padj",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"),
                      step_increase = 0.1)

plot2 <- boxplot_trio(syn_polar,
                      value = "SLAMF6",
                      text_size = 7,
                      test = "polar_multi_padj",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"))

plot3 <- boxplot_trio(syn_polar,
                      value = "PARP16",
                      text_size = 7,
                      stat_size=2.5,
                      test = "t.test",
                      levels_order = c("Myeloid", "Fibroid"),
                      box_colours = c("pink", "gold"))

ggarrange(plot1, plot2, plot3, ncol=3)

Three Dimensional Volcano Plots

The final thing we can look at is the 3D volcano plot which projects differential gene expression onto cylindrical coordinates.

p <- volcano3D(syn_polar,
               label_rows = c("SLAMF6", "PARP16", "ITM2C"),
               label_size = 10,
               axis_title_offset = 1.3,
               colour_code_labels = F,
               label_colour = "black",
               xy_aspectratio = 1,
               z_aspectratio = 0.9,
               plot_height = 800)

p

We can alter the colour code using the colours parameter. These are assigned in order group1+, group1+group2+, group2+, group2+group3+, group3+, group1+group3+.

p <- volcano3D(syn_polar,
               label_rows = c("SLAMF6", "PARP16", "ITM2C"),
               label_size = 10,
               xy_aspectratio = 1,
               z_aspectratio = 0.9,
               colours = c("grey60", "grey60", "blue",
                           "grey60", "grey60", "grey60"),
               hover_text = "paste0(Name, '\n', sig)",
               plot_height=800)

p

Saving Plotly Plots

Static Images

There are a few ways to save plotly plots as static images. Firstly plotly offers a download button ( ) in the figure mode bar (appears top right). By default this saves images as png, however it is possible to convert to svg, jpeg or webp using:

p %>% plotly::config(toImageButtonOptions = list(format = "svg"))

Alternatively, if orca command-line utility is installed, this can also be used to save plotly images. To install follow the instructions here.

orca(p, "./volcano_3d_synovium.svg", format = "svg")

Interactive HTML

The full plotly objects can be saved to HTML by converting them to widgets and saving with the htmlwidgets package:

htmlwidgets::saveWidget(as_widget(p), "volcano3D.html")

Altering the Plots

Altering the Grid

By default volcano3D generates a grid using 12 spokes. If you wish to override any of these variables it is possible to pass in your own grid. This is also possible for the 3D radial plots where the z elements can be left as NULL.

By manually creating a grid object it is possible to change the tick points on axes as well as the number of radial spokes (n_spokes). The default ticks are calculated from r_vector and z_vector using the pretty function. This can be overwritten by passing tick points in r_axis_ticks and z_axis_ticks.

For example, we can decrease the number of radial spokes to 4, while altering the z axis ticks:

four_grid = polar_grid(r_vector=syn_polar@polar$r_zscore,
                       z_vector=NULL,
                       r_axis_ticks = NULL,
                       z_axis_ticks = c(0, 8, 16, 32),
                       n_spokes = 4)

We can inspect the grid using show_grid() which creates both the polar and cylindrical coordinate system:

p <- show_grid(four_grid)

p$cylindrical

and pass it into the plotting functions:

volcano3D(syn_polar,
          grid = four_grid,
          label_rows = c("SLAMF6", "PARP16", "ITM2C"),
          label_size = 10,
          xy_aspectratio = 1,
          z_aspectratio = 0.9)

For example to extend the radial axis and increase the number of spokes in 2D we can apply:

new_grid = polar_grid(r_vector=NULL,
                      z_vector=-log10(syn_polar@pvalues$LRT_pvalue),
                      r_axis_ticks = c(1, 2, 3),
                      z_axis_ticks = NULL,
                      n_spokes = 24)

Altering the Annotations

To amend or reposition the plotly labels it is possible to use the editable parameter. This allows you to move annotation features (labels), change text including legends and titles. Click on a feature to change it:

p = radial_plotly(polar = syn_polar,
                  label_rows = c("SLAMF6", "PARP16", "ITM2C"))

p %>% config(editable = TRUE)

Altering the Labels

By default the rownames of the pvalues object is used to label markers of interest. This can be altered by amending the label column in the polar object:

syn_polar@polar$label <- paste0(syn_polar@polar$label, "!")

radial_ggplot(polar = syn_polar,
              label_rows = c("SLAMF6", "PARP16", "ITM2C"),
              grid = new_grid,
              marker_size = 2.3,
              legend_size = 10) +
  theme(legend.position = "right")

Example 2. Synovial Modular Data

We can collapse this example into a modular analysis using a list of gene sets. In this example we have used the blood transcript modules curated by Li et. al. in ‘Li, S., Rouphael, N., Duraisingham, S., Romero-Steiner, S., Presnell, S., Davis, C., … & Kasturi, S. (2014). Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nature immunology, 15(2), 195.’. The pvalues were generated using QuSAGE methodology.

Creating polar coordinates

The modular analysis can be loaded through Li_pvalues:

data(Li_pvalues)

syn_mod_polar <- polar_coords(sampledata = syn_metadata,
                              contrast = "Pathotype",
                              pvalues = syn_mod_pvalues,
                              p_col_suffix = "p.value",
                              padj_col_suffix = "q.value",
                              fc_col_suffix = "logFC",
                              multi_group_prefix = NULL,
                              expression = syn_mod,
                              significance_cutoff = 0.01,
                              fc_cutoff = 0.1
)

Volcano Plots

syn_mod_plots <- volcano_trio(polar = syn_mod_polar,
                              label_rows = c("M156.0", "M37.2"),
                              shared_legend_size = 1,
                              sig_names = c("Not Sig",
                                            paste("Padj <", 0.05),
                                            paste("|FC| >", 1),
                                            paste("Padj <", 0.05,
                                                  "&\n|FC| >", 1)),
                              share_axes = FALSE)

syn_mod_plots$All

Radial Plots

radial_plotly(polar = syn_mod_polar,
              label_rows = c("M156.0", "M37.2"))

Or a ggplot static image using radial_ggplot:

radial_ggplot(polar = syn_mod_polar,
              label_rows = c("M156.0", "M37.2"),
              marker_size = 2.7,
              label_size = 5,
              axis_lab_size = 3,
              axis_title_size = 5,
              legend_size = 10)

Boxplots

We can then interrogate specific modules with boxplot_trio:

plot1 <- boxplot_trio(syn_mod_polar,
                      value = "M156.0",
                      test = "wilcox.test",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"))

plot2 <- boxplot_trio(syn_mod_polar,
                      value = "M37.2",
                      test = "wilcox.test",
                      levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
                      box_colours = c("blue", "red", "green3"))

ggpubr::ggarrange(plot1, plot2)


Citation

If you use this package please cite as:

citation("volcano3D")
## 
## To cite package 'volcano3D' in publications use:
## 
##   Katriona Goldmann and Myles Lewis (2020). volcano3D: Interactive
##   Plotting of Three-Way Differential Expression Analysis.
##   https://katrionagoldmann.github.io/volcano3D/index.html,
##   https://github.com/KatrionaGoldmann/volcano3D.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {volcano3D: Interactive Plotting of Three-Way Differential Expression
## Analysis},
##     author = {Katriona Goldmann and Myles Lewis},
##     year = {2020},
##     note = {https://katrionagoldmann.github.io/volcano3D/index.html, https://github.com/KatrionaGoldmann/volcano3D},
##   }

or using:

Lewis, Myles J., et al. ‘Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.’ Cell reports 28.9 (2019): 2455-2470.