The volcano3D package enables exploration of probes differentially expressed between three groups. Its main purpose is for the visualisation of differentially expressed genes in a threedimensional volcano plot. These plots can be converted to interactive visualisations using plotly.
This vignette consists of a case study from the PEAC rheumatoid arthritis project (Pathobiology of Early Arthritis Cohort). The methodology has been published in Lewis, Myles J., et al. ‘Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.’ Cell reports 28.9 (2019): 24552470. (DOI: 10.1016/j.celrep.2019.07.091) with an interactive web tool available at https://peac.hpc.qmul.ac.uk.
There are also supplementary vignettes with further information on:
This vignette will demonstrate the power of this package using a basic example from the PEAC data set. Here we will focus on the synovial data from this cohort.
Using the synovial biopsies from PEAC we can create a polar object for differentially expressed genes. The sample data used in this vignette can be loaded from the volcano3Ddata package.
devtools::install_github("KatrionaGoldmann/volcano3Ddata")
library(volcano3Ddata)
data("syn_data")
Samples in this cohort fall into three pathotype groups:
Pathotype  Count 

Fibroid  16 
Lymphoid  45 
Myeloid  20 
In this example we are interested in genes that are differentially expressed between each of these groups.
The differential expression can be mapped to polar coordinates using the polar_coords function, which has inputs:
Variable  Details 

outcome (required) 
Vector containing threelevel factor indicating which of the three classes each sample belongs to. 
data (required) 
A dataframe or matrix containing data to be compared between the three classes (e.g. gene expression data). Note that variables are in columns, so gene expression data will need to be transposed. This is used to calculate zscore and fold change, so for gene expression count data it should be normalised such as log transformed or variance stabilised count transformation. 
pvals (optional) 
the pvals matrix which contains the statistical significance of probes or attributes between classes. This contains:

padj (optional) 
Matrix containing the adjusted pvalues matching the pvals matrix. 
pcutoff  Cutoff for pvalue significance 
scheme  Vector of colours starting with nonsignificant attributes 
labs 
Optional character vector for labelling classes. Default
NULL leads to abbreviated labels based on levels in
outcome using abbreviate() . A vector of length
3 with custom abbreviated names for the outcome levels can be supplied.
Otherwise a vector length 7 is expected, of the form “ns”, “B+”, “B+C+”,
“C+”, “A+C+”, “A+”, “A+B+”, where “ns” means nonsignificant and A, B, C
refer to levels 1, 2, 3 in outcome , and must be in the
correct order.

The polar_coords()
function can be used to map any
multidimensional data where you are comparing 3 classes. The function
will perform statistical analysis comparing each column or attribute
against the three classes using a group test which can either be a
oneway ANOVA or KruskalWallis test. Then it will perform pairwise
comparisons of the three groups. Alternatively a table of pvalues can
be supplied by the user.
For researchers investigating RNASeq gene expression data, the package provides 2 easy to pipeline functions which enable analysis of count data via the Bioconductor packages ‘DESeq2’ or ‘limma voom’.
Thus we can map the PEAC data to polar coordinates using the
deseq_polar()
function.
library(DESeq2)
library(volcano3Ddata)
data("syn_txi")
syn_metadata$Pathotype < factor(syn_metadata$Pathotype,
levels = c('Lymphoid', 'Myeloid', 'Fibroid'))
# setup initial dataset from Tximport
dds < DESeqDataSetFromTximport(txi = syn_txi,
colData = syn_metadata,
design = ~ Pathotype + Batch + Gender)
# initial analysis run
dds_DE < DESeq(dds)
# likelihood ratio test on 'Pathotype'
dds_LRT < DESeq(dds, test = "LRT", reduced = ~ Batch + Gender, parallel = TRUE)
# create 'volc3d' class object for plotting
res < deseq_polar(dds_DE, dds_LRT, "Pathotype")
# plot 3d volcano plot
volcano3D(res)
An alternative method is to use the voom_polar()
function which uses the ‘limma voom’ RNASeq analysis pipeline.
library(limma)
library(edgeR)
data("syn_txi")
syn_tpm < syn_txi$counts
syn_tpm < syn_tpm[!duplicated(rownames(syn_tpm)), ]
# create 'volc3d' class object for plotting
resl < voom_polar(~0 + Pathotype + Batch + Gender, syn_metadata, syn_tpm)
# plot 3d volcano plot
volcano3D(resl)
Finally we show how you can manually provide data (ideally normally
distributed) to generate a ‘volc3d’ object for plotting using the
polar_coords()
function. If the argument pvals
is not supplied, then polar_coords
will calculate pvalues
for the group test (either oneway ANOVA or KruskalWallis test) and
pairwise tests (ttest or Wilcoxon test). Alternatively the user can
provide a matrix of pvalues using their own statistical method.
# Place Pathotype levels in correct sequence
syn_metadata$Pathotype < factor(syn_metadata$Pathotype,
levels = c('Lymphoid', 'Myeloid', 'Fibroid'))
syn_polar < polar_coords(outcome = syn_metadata$Pathotype,
data = t(syn_rld))
This creates a ‘volc3d’ S4 class object with slots for:
df
, outcome
, data
,
pvals
and padj
. The df
slot is a
list of 2 dataframes. The first dataframe contains polar coordinates for
scaled data, while the 2nd dataframe has polar coordinates for unscaled
data. In gene expression data which is log2 transformed, the polar scale
for the 2nd dataframe equates to log2 fold change.
The lab
column in syn_polar@df[[1]]
allows
us to determine relative differences in expression between groups (in
this case pathotypes). The ‘+’ indicates which pathotypes are
significantly ‘up’ compared to others. For example:
genes labelled ‘L+’ are significantly up in Lymphoid vs other groups
genes up in two pathotypes such as ‘L+M+’ are up in both Lymphoid and Myeloid, therefore Lymphoid vs Fibroid and Myeloid vs Fibroid are statistically significant.
genes which show no significant difference between pathotypes are
classed as ns
This gives us:
setNames(data.frame(table(syn_polar@df[[1]]$lab)), c("Significance", "Frequency"))
Significance  Frequency 

ns  11091 
M+  59 
M+F+  1614 
F+  1049 
L+F+  16 
L+  2074 
L+M+  332 
To subset and view objects of specific significance groups you can
use the significance_subset
function.
pvals_subset < significance_subset(syn_polar,
significance = c("L+", "M+"),
output="pvals")
polar_subset < significance_subset(syn_polar,
significance = c("L+", "M+"),
output="polar")
head(pvals_subset) %>%
kable() %>%
kable_styling(full_width = F)
onewayp  p1  p2  p3  

A4GNT  0.0048730  0.8756236  0.0022831  0.0463533 
AAGAB  0.0053439  0.5366807  0.0018890  0.0326287 
AASDHPPT  0.0039278  0.2234050  0.0018139  0.0857253 
ABCA1  0.0075685  0.3563701  0.0017229  0.0188816 
ABCC3  0.0063489  0.3280569  0.0017382  0.0313977 
ABCC4  0.0001794  0.0553812  0.0000308  0.1019856 
The differential expression can now be visualised on an interactive
radar plot using radial_plotly. The labelRows
argument
allows genes/ attributes of interest to be labelled:
radial_plotly(polar = syn_polar,
label_rows = c("SLAMF6", "PARP16", "ITM2C"),
hover=hovertext)