# Introduction

The volcano3D package enables exploration of probes differentially expressed between three groups. Its main purpose is for the visualisation of differentially expressed genes in a three-dimensional volcano plot. These plots can be converted to interactive visualisations using plotly.

This vignette consists of a case study from the PEAC rheumatoid arthritis project (Pathobiology of Early Arthritis Cohort). The methodology has been published in Lewis, Myles J., et al. ‘Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.’ Cell reports 28.9 (2019): 2455-2470. (DOI: 10.1016/j.celrep.2019.07.091) with an interactive web tool available at https://peac.hpc.qmul.ac.uk.

There are also supplementary vignettes with further information on:

• Using the volcano3D package to perform 2x3-way analysis. In this type of analysis there is a binary factor such as drug response (responders vs non-responders) and a 2nd factor with 3 classes such as a trial with 3 drugs. See here.
• Using the volcano3D package to create and deploy a shiny app. See here.

## Getting Started

### Install from CRAN

install.packages("volcano3D")

### Install from Github

library(devtools)
install_github("KatrionaGoldmann/volcano3D")

library(volcano3D)

# Example 1. Synovial Gene Data

This vignette will demonstrate the power of this package using a basic example from the PEAC data set. Here we will focus on the synovial data from this cohort.

Using the synovial biopsies from PEAC we can create a polar object for differentially expressed genes. The sample data used in this vignette can be loaded from the volcano3Ddata package.

devtools::install_github("KatrionaGoldmann/volcano3Ddata")
library(volcano3Ddata)
data("syn_data")

Samples in this cohort fall into three pathotype groups:

kable(table(syn_metadata$Pathotype), col.names = c("Pathotype", "Count")) Pathotype Count Fibroid 16 Lymphoid 45 Myeloid 20 In this example we are interested in genes that are differentially expressed between each of these groups. The differential expression can be mapped to polar coordinates using the polar_coords function, which has inputs: Variable Details outcome (required) Vector containing three-level factor indicating which of the three classes each sample belongs to. data (required) A dataframe or matrix containing data to be compared between the three classes (e.g. gene expression data). Note that variables are in columns, so gene expression data will need to be transposed. This is used to calculate z-score and fold change, so for gene expression count data it should be normalised such as log transformed or variance stabilised count transformation. pvals (optional) the pvals matrix which contains the statistical significance of probes or attributes between classes. This contains: • the first column is a group test such as one-way ANOVA or Kruskal-Wallis test. • columns 2-4 contain p-values one for each comparison in the sequence A vs B, A vs C, B vs C, where A, B, C are the three levels in sequence in the outcome factor. For gene expression RNA-Seq count data, conduit functions using ‘limma voom’ or ‘DESeq’ pipelines to extract p-values for analysis are provided in functions deseq_polar() and voom_polar(). If p-values are not provided by the user, they can be calculated via the polar_coords() function. padj (optional) Matrix containing the adjusted p-values matching the pvals matrix. pcutoff Cut-off for p-value significance scheme Vector of colours starting with non-significant attributes labs Optional character vector for labelling classes. Default NULL leads to abbreviated labels based on levels in outcome using abbreviate(). A vector of length 3 with custom abbreviated names for the outcome levels can be supplied. Otherwise a vector length 7 is expected, of the form “ns”, “B+”, “B+C+”, “C+”, “A+C+”, “A+”, “A+B+”, where “ns” means non-significant and A, B, C refer to levels 1, 2, 3 in outcome, and must be in the correct order. The polar_coords() function can be used to map any multi-dimensional data where you are comparing 3 classes. The function will perform statistical analysis comparing each column or attribute against the three classes using a group test which can either be a one-way ANOVA or Kruskal-Wallis test. Then it will perform pairwise comparisons of the three groups. Alternatively a table of p-values can be supplied by the user. For researchers investigating RNA-Seq gene expression data, the package provides 2 easy to pipeline functions which enable analysis of count data via the Bioconductor packages ‘DESeq2’ or ‘limma voom’. Thus we can map the PEAC data to polar coordinates using the deseq_polar() function. library(DESeq2) library(volcano3Ddata) data("syn_txi") syn_metadata$Pathotype <- factor(syn_metadata$Pathotype, levels = c('Lymphoid', 'Myeloid', 'Fibroid')) # setup initial dataset from Tximport dds <- DESeqDataSetFromTximport(txi = syn_txi, colData = syn_metadata, design = ~ Pathotype + Batch + Gender) # initial analysis run dds_DE <- DESeq(dds) # likelihood ratio test on 'Pathotype' dds_LRT <- DESeq(dds, test = "LRT", reduced = ~ Batch + Gender, parallel = TRUE) # create 'volc3d' class object for plotting res <- deseq_polar(dds_DE, dds_LRT, "Pathotype") # plot 3d volcano plot volcano3D(res) An alternative method is to use the voom_polar() function which uses the ‘limma voom’ RNA-Seq analysis pipeline. library(limma) library(edgeR) data("syn_txi") syn_tpm <- syn_txi$counts
syn_tpm <- syn_tpm[!duplicated(rownames(syn_tpm)), ]

# create 'volc3d' class object for plotting
resl <- voom_polar(~0 + Pathotype + Batch + Gender, syn_metadata, syn_tpm)

# plot 3d volcano plot
volcano3D(resl)

Finally we show how you can manually provide data (ideally normally distributed) to generate a ‘volc3d’ object for plotting using the polar_coords() function. If the argument pvals is not supplied, then polar_coords will calculate p-values for the group test (either one-way ANOVA or Kruskal-Wallis test) and pairwise tests (t-test or Wilcoxon test). Alternatively the user can provide a matrix of p-values using their own statistical method.

# Place Pathotype levels in correct sequence
syn_metadata$Pathotype <- factor(syn_metadata$Pathotype,
levels = c('Lymphoid', 'Myeloid', 'Fibroid'))

syn_polar <- polar_coords(outcome = syn_metadata$Pathotype, data = t(syn_rld)) This creates a ‘volc3d’ S4 class object with slots for: df, outcome, data, pvals and padj. The df slot is a list of 2 dataframes. The first dataframe contains polar coordinates for scaled data, while the 2nd dataframe has polar coordinates for unscaled data. In gene expression data which is log2 transformed, the polar scale for the 2nd dataframe equates to log2 fold change. The lab column in syn_polar@df[[1]] allows us to determine relative differences in expression between groups (in this case pathotypes). The ‘+’ indicates which pathotypes are significantly ‘up’ compared to others. For example: • genes labelled ‘L+’ are significantly up in Lymphoid vs other groups • genes up in two pathotypes such as ‘L+M+’ are up in both Lymphoid and Myeloid, therefore Lymphoid vs Fibroid and Myeloid vs Fibroid are statistically significant. • genes which show no significant difference between pathotypes are classed as ns This gives us: setNames(data.frame(table(syn_polar@df[[1]]$lab)), c("Significance", "Frequency"))
Significance Frequency
ns 11091
M+ 59
M+F+ 1614
F+ 1049
L+F+ 16
L+ 2074
L+M+ 332

To subset and view objects of specific significance groups you can use the significance_subset function.

pvals_subset <- significance_subset(syn_polar,
significance = c("L+", "M+"),
output="pvals")

polar_subset <- significance_subset(syn_polar,
significance = c("L+", "M+"),
output="polar")

kable() %>%
kable_styling(full_width = F)
onewayp p1 p2 p3
A4GNT 0.0048730 0.8756236 0.0022831 0.0463533
AAGAB 0.0053439 0.5366807 0.0018890 0.0326287
AASDHPPT 0.0039278 0.2234050 0.0018139 0.0857253
ABCA1 0.0075685 0.3563701 0.0017229 0.0188816
ABCC3 0.0063489 0.3280569 0.0017382 0.0313977
ABCC4 0.0001794 0.0553812 0.0000308 0.1019856

The differential expression can now be visualised on an interactive radar plot using radial_plotly. The labelRows argument allows genes/ attributes of interest to be labelled:
radial_plotly(polar = syn_polar,
hover=hovertext)