Tutorial : Transcriptome-Wide Differential Expression Analysis with DiffSegR • DiffSegR

Data

In this tutorial we use an RNA-Seq dataset (SRA046998) generated to study the differences in Arabidopsis thaliana chloroplast RNA metabolism between a control (wt) and a null mutant of the chloroplast-targeted \(3'\to5'\) exoribonuclease polynucleotide phosphorylase (pnp1-1). In comparison to control plants, this mutant overaccumulate RNA fragments that are mainly located outside of the annotated genic areas and the RNA-Seq data have been extensively validated using molecular techniques (Castandet et al. 2013; Hotto et al. 2015). Both conditions contain two biological replicates. Only a diluted set of reads overlapping positions 71950 to 78500 on chloroplast genome has been kept as a minimal dataset example.

Overview

After setting up the necessary working environnement and loading the software, we are going through each step of a classical differential expression analysis along the genome with DiffSegR:

Definition of the experiment and calculation of coverages from bam files ;
Summarize the differential transcription landscape, i.e. (i) identify segments along the genome in the per-base \(\log_2\)-FC and (ii) count reads overlapping them ;
Use the counts as measures of the expression abundance of the segments for differential expression analysis ;
Annotate the differential expressed regions (DERs) ;
Visualize the DERs.

Setup

First, we create a working directory that will host all the files produced by DiffSegR during the analysis (coverages, results).

working_directory <- "./DIFFSEGR_TEST"
dir.create(working_directory, showWarnings = FALSE)

Then, we save sample information that we will use for this tutorial in a data.frame. In practice, one can create the working directory and write the text file by hand. The sample information file needs to include the following columns:

sample : identifier for the sample ;
condition : identifier for the condition ;
replicate : identifier for the replicate ;
bam : path to the bam file (.bam) ;
isPairedEnd : logical indicating if the library contain paired-end reads or not (TRUE/FALSE) ;
strandSpecific : integer indicating if a strand-specific analysis should be performed. 0 (unstranded), 1 (stranded) and 2 (reversely stranded). WARNING: All samples must be consistently stranded or non-stranded. Ensure uniformity across your dataset to avoid analysis errors.

Note: Coverages are computed internally by DiffSegR from bam files and saved in rds format at the location specified by the user (coverage column).

#- create sample information table --------------------------------------------#
sample_info <- data.frame(
  sample    = c("pnp1_1_1", "pnp1_1_2", "wt_1", "wt_2"),
  condition = rep(c("pnp1_1", "wt"), each = 2),
  replicate = rep(1:2,2),
  bam       = sapply(
    c("pnp1_1_1_ChrC_71950_78500.bam", 
      "pnp1_1_2_ChrC_71950_78500.bam",
      "wt_1_ChrC_71950_78500.bam",
      "wt_2_ChrC_71950_78500.bam"
    ),
    function(bam) system.file("extdata", bam, package = "DiffSegR")
  ),
  isPairedEnd    = rep(FALSE, 4),
  strandSpecific = rep(1, 4)
)

#- display sample information table -------------------------------------------#
knitr::kable(sample_info, row.names = FALSE)

sample	condition	replicate	bam	isPairedEnd	strandSpecific
pnp1_1_1	pnp1_1	1	/tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam	FALSE	1
pnp1_1_2	pnp1_1	2	/tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam	FALSE	1
wt_1	wt	1	/tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam	FALSE	1
wt_2	wt	2	/tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam	FALSE	1

Analysis

Now, we present a basic use of the main functions of the DiffSegR R package. Our focus is on analyzing the sequence stretch from positions 71950 to 78500 in the chloroplast genome (ChrC). This locus corresponds to the polycistronic gene cluster from psbB to petD. To optimize processing, we’ll allocate 4 logical cores for examining this locus. Users have the flexibility to adjust this setting based on their available computational resources. Additionally, it’s possible to concurrently analyze multiple genomic regions of interest (see our multi-loci analysis tutorial). By doing so, users can efficiently manage the total number of threads for the analysis, as well as how many threads are assigned to each locus. Thus, the ratio of the total number of threads to the number assigned per locus determines the number of locus analyzed simultaneously.

library(DiffSegR)
nb_threads_tot = 4
nb_threads_locus = 4

Step 1: Definition of the experiment and calculation of coverages from bam files

newExperiment() imports relevant data about the RNA-Seq samples and a specified locus (e.g., ChrC: 71950 - 78500).

data <- newExperiment(
  sampleInfo   = sample_info,
  loci         = data.frame(
    seqid      = "ChrC", 
    chromStart = 71950, 
    chromEnd   = 78500,
    locusID    = "psbB_petD"
  ),
  referenceCondition = "wt",
  otherCondition     = "pnp1_1",
  nbThreads          = nb_threads_tot,
  nbThreadsByLocus   = nb_threads_locus,
  coverage           = working_directory
)

print(data)
#> $sampleInfo
#>                                 sample condition replicate
#> pnp1_1_1_ChrC_71950_78500.bam pnp1_1_1    pnp1_1         1
#> pnp1_1_2_ChrC_71950_78500.bam pnp1_1_2    pnp1_1         2
#> wt_1_ChrC_71950_78500.bam         wt_1        wt         1
#> wt_2_ChrC_71950_78500.bam         wt_2        wt         2
#>                                                                                                                  bam
#> pnp1_1_1_ChrC_71950_78500.bam /tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam
#> pnp1_1_2_ChrC_71950_78500.bam /tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam
#> wt_1_ChrC_71950_78500.bam         /tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam
#> wt_2_ChrC_71950_78500.bam         /tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam
#>                               isPairedEnd strandSpecific
#> pnp1_1_1_ChrC_71950_78500.bam       FALSE              1
#> pnp1_1_2_ChrC_71950_78500.bam       FALSE              1
#> wt_1_ChrC_71950_78500.bam           FALSE              1
#> wt_2_ChrC_71950_78500.bam           FALSE              1
#> 
#> $referenceCondition
#> [1] "wt"
#> 
#> $otherCondition
#> [1] "pnp1_1"
#> 
#> $loci
#> GRanges object with 1 range and 1 metadata column:
#>       seqnames      ranges strand |     locusID
#>          <Rle>   <IRanges>  <Rle> | <character>
#>   [1]     ChrC 71950-78500      * |   psbB_petD
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $nbThreads
#> [1] 4
#> 
#> $nbThreadsByLocus
#> [1] 4
#> 
#> $stranded
#> [1] TRUE
#> 
#> $coverageDir
#> [1] "./DIFFSEGR_TEST"

coverage() generates coverage profiles from BAM files for the locus of interest. When reads are stranded (stranded = TRUE), it creates a separate coverage profile for each strand across replicates of the biological conditions being compared. By default, coverage is calculated using a geometric mean of the 5’ and 3’ ends of aligned reads (coverageType = "average"). If the coverage is too sparse, one may calculate the coverage based on the entire read by using the argument coverageType = "fullLength".

coverage(
  data         = data,
  coverageType = "average"
)
#> 
#>  > Calculating coverage of locus ChrC 71950-78500 (ID : psbB_petD) ...
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.18.0
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : counted                                          ||
#> ||   Min overlapping bases : 1                                                ||
#> ||          Read reduction : to 3' end                                        ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid11345 ...         ||
#> ||    Features : 13102                                                        ||
#> ||    Meta-features : 13102                                                   ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24249 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21098 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17531 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14066 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.18.0
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : counted                                          ||
#> ||   Min overlapping bases : 1                                                ||
#> ||          Read reduction : to 5' end                                        ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid11345 ...         ||
#> ||    Features : 13102                                                        ||
#> ||    Meta-features : 13102                                                   ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24251 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21088 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17520 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14060 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#>  > Finished in 1.25298714637756 secs

To enhance efficiency, users have the option to pre-process by subsetting BAM files to the locus of interest, which can notably reduce running times (refer to subsettingBams parameter in ?DiffSegR::coverage).

coverage(
  data           = data,
  coverageType   = "average",
  subsettingBams = TRUE
)

Setp 2: Summarizing the differential transcription landscape

segmentationLFC() transforms coverage profiles into the per-base \(\log_2\)-FC (one by strand). The reference biological condition specified by the user is set to the denominator (referenceCondition = "wt"). segmentationLFC() then calls fpopw::Fpop_w(), a changepoint detection in the mean of a Gaussian signal approach, on the per-base \(\log_2\)-FC of both strands to retrieve the segments boundaries (changepoints). The hyperparameter \(\alpha\) specified by the user controls the number of returned segments. The number of changepoints is a decreasing function of \(\alpha\). By default alpha=2. The segments are stored as a GRanges object.

features <- segmentationLFC(
  data  = data, 
  alpha = 2
)
#> 
#>  > Segmenting the log2-FC of locus ChrC 71950-78500 (ID : psbB_petD) ...
#> 
#>  > Finished in 0.33150053024292 secs

counting() calls Rsubread::featureCounts() which assigns to the segments the mapped reads from each replicate of each biological condition. A read is allowed to be assigned to more than one feature if it is found to overlap with more than one feature. Upon completion, Rsubread::featureCounts() generates a count matrix. Finally, counting() returns the segments and associated count matrix as a SummarizedExperiment object.

SExp <- counting(
  data     = data, 
  features = features
)
#> 
#>  > Counting ...
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.18.0
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : counted                                          ||
#> ||   Min overlapping bases : 1                                                ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid11345 ...         ||
#> ||    Features : 69                                                           ||
#> ||    Meta-features : 69                                                      ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24260 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21100 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17533 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14070 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#>  > Finished in 0.166458129882812 secs

Between positions 71950 and 78500 of chloroplast genome, segmentationLFC() returns 17 segments on the forward (+) strand and 52 segments on the reverse (-) strand. Segment coordinates and associated counts can be retrieved using respectively SummarizedExperiment::rowRanges and SummarizedExperiment::assay accessors.

#- number of segments found ---------------------------------------------------# 
SummarizedExperiment::strand(SExp)
#> factor-Rle of length 69 with 2 runs
#>   Lengths: 17 52
#>   Values :  +  -
#> Levels(3): + - *

#- display first five segments ------------------------------------------------#
segment_coordinates <- as.data.frame(SummarizedExperiment::rowRanges(SExp))
knitr::kable(segment_coordinates[1:5, colnames(segment_coordinates)%in%c(
  "seqnames",
  "start",
  "end",
  "width",
  "strand")
])

seqnames	start	end	width	strand
ChrC	71950	72255	306	+
ChrC	72256	72402	147	+
ChrC	72403	74038	1636	+
ChrC	74039	74305	267	+
ChrC	74306	74841	536	+

#- display counts associated to first five segments ---------------------------#
counts <- SummarizedExperiment::assay(SExp)
knitr::kable(counts[1:5,])

	pnp1_1_1	pnp1_1_2	wt_1	wt_2
ChrC_71950_72255_plus	143	139	84	84
ChrC_72256_72402_plus	649	665	326	293
ChrC_72403_74038_plus	13203	11731	10849	8366
ChrC_74039_74305_plus	155	132	98	104
ChrC_74306_74841_plus	2337	1996	1589	1218

Step 3: Differential expression analysis (DEA)

dea() function takes as input the recently built SummarizedExperiment object and calls DESeq2 to test for each line, i.e. each segment, the difference in expression between the two compared biological conditions. Here we control all the resulting p-values using a Benjamini–Hochberg procedure at level significanceLevel = 1e-2.

Note: By default sample-specific size factors are computed by DESeq2. They can be overwritten by the user. In this example, sample-specific size factors have the same weight sizeFactors = rep(1,4).

dds <- dea(
  SExp              = SExp,
  design            = ~ condition,
  sizeFactors       = rep(1,4),
  significanceLevel = 1e-2
)
#> 
#>  > differential expression analysis ...
#> renaming the first element in assays to 'counts'
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> 
#>  > multiple testing correction ...

Regions differentially expressed (DERs) are set to DER = TRUE. On this example, dea() returns 39 DERs.

#- number of DERs -------------------------------------------------------------#
DERs <- dds[SummarizedExperiment::mcols(dds)$DER,]
length(DERs)
#> [1] 39

#- display first five DERs by position ----------------------------------------#
DERs <- as.data.frame(SummarizedExperiment::rowRanges(DERs))
knitr::kable(DERs[1:5, colnames(DERs)%in%c(
  "seqnames",
  "start",
  "end",
  "width",
  "strand",
  "log2FoldChange",
  "padj")
])

seqnames	start	end	width	strand	log2FoldChange	padj
ChrC	72256	72402	147	+	1.0859541	0.0000025
ChrC	74306	74841	536	+	0.6263374	0.0037268
ChrC	74842	74938	97	+	1.0248663	0.0000195
ChrC	74939	75471	533	+	1.2142538	0.0000001
ChrC	76882	77054	173	+	1.7448836	0.0000000

Step 4: Annotate the DERs

annotateNearest() relies on user specified annotations in GFF3 or GTF format to annotate the DERs found during the DEA. Seven classes of labels translate the relative positions of the DER and its closest annotation(s): antisense; upstream; downstream; inside; overlapping 3’; overlapping 5’; overlapping 5’, 3’.

exons_path <- system.file(
 "extdata", 
 "TAIR10_ChrC_exon.gff3", 
 package = "DiffSegR"
)
annotated_DERs <- annotateNearest(
  features = SummarizedExperiment::rowRanges(dds),   
  annotations = exons_path,
  select      = c("seqnames", "start", "end", "width", "strand", 
     "description", "distance", "Parent", "log2FoldChange", "pvalue"),
  orderBy     = "pvalue",
  outputDirectory = working_directory,
)
#> 
#>  > Loading annotations from /tmp/Rtmpmuyfvl/temp_libpath1d15a968813/DiffSegR/extdata/TAIR10_ChrC_exon.gff3 ...
#> 
#>  > Annotate differential expressed regions ...
#> 
#>  > A report can be found in ./DIFFSEGR_TEST/annotated_DERs.txt

knitr::kable(annotated_DERs, row.names = FALSE, digits = 3)

seqnames	start	end	width	strand	description	distance	Parent	log2FoldChange	pvalue
ChrC	77964	78112	149	+	downstream	291	ATCG00730.1	4.642	0.000
ChrC	77964	78112	149	+	antisense	0	ATCG00740.1	4.642	0.000
ChrC	76882	77054	173	+	upstream	143	ATCG00730.1	1.745	0.000
ChrC	77740	77963	224	+	downstream	67	ATCG00730.1	2.319	0.000
ChrC	77740	77963	224	+	antisense	0	ATCG00740.1	2.319	0.000
ChrC	77055	77192	138	+	upstream	5	ATCG00730.1	1.929	0.000
ChrC	74939	75471	533	+	downstream	92	ATCG00720.1	1.214	0.000
ChrC	78113	78214	102	+	downstream	440	ATCG00730.1	3.919	0.000
ChrC	78113	78214	102	+	antisense	0	ATCG00740.1	3.919	0.000
ChrC	72256	72402	147	+	overlapping 5’	0	ATCG00680.1	1.086	0.000
ChrC	74001	74019	19	-	downstream	229	ATCG00700.1	3.807	0.000
ChrC	74020	74023	4	-	downstream	225	ATCG00700.1	3.747	0.000
ChrC	73994	74000	7	-	downstream	248	ATCG00700.1	3.858	0.000
ChrC	74281	74283	3	-	inside	0	ATCG00700.1	-1.820	0.000
ChrC	74306	74309	4	-	inside	0	ATCG00700.1	-2.022	0.000
ChrC	74301	74302	2	-	inside	0	ATCG00700.1	-1.965	0.000
ChrC	74842	74938	97	+	overlapping 3’	0	ATCG00720.1	1.025	0.000
ChrC	74024	74035	12	-	downstream	213	ATCG00700.1	3.550	0.000
ChrC	74284	74300	17	-	inside	0	ATCG00700.1	-1.735	0.000
ChrC	74303	74304	2	-	inside	0	ATCG00700.1	-1.920	0.000
ChrC	74305	74305	1	-	inside	0	ATCG00700.1	-1.920	0.000
ChrC	74036	74039	4	-	downstream	209	ATCG00700.1	4.585	0.000
ChrC	74269	74273	5	-	inside	0	ATCG00700.1	-1.699	0.000
ChrC	74040	74077	38	-	downstream	171	ATCG00700.1	4.544	0.000
ChrC	74334	74338	5	-	inside	0	ATCG00700.1	-2.121	0.000
ChrC	74267	74268	2	-	inside	0	ATCG00700.1	-1.675	0.000
ChrC	74274	74277	4	-	inside	0	ATCG00700.1	-1.651	0.000
ChrC	74278	74280	3	-	inside	0	ATCG00700.1	-1.651	0.000
ChrC	74310	74333	24	-	inside	0	ATCG00700.1	-1.882	0.000
ChrC	74339	74339	1	-	inside	0	ATCG00700.1	-2.185	0.000
ChrC	74264	74265	2	-	inside	0	ATCG00700.1	-1.631	0.000
ChrC	74340	74347	8	-	inside	0	ATCG00700.1	-2.066	0.000
ChrC	74266	74266	1	-	inside	0	ATCG00700.1	-1.585	0.000
ChrC	74348	74349	2	-	inside	0	ATCG00700.1	-2.452	0.000
ChrC	74078	74083	6	-	downstream	165	ATCG00700.1	4.143	0.000
ChrC	74078	74083	6	-	antisense	0	ATCG00690.1	4.143	0.000
ChrC	73966	73993	28	-	downstream	255	ATCG00700.1	3.755	0.000
ChrC	73692	73953	262	-	downstream	295	ATCG00700.1	2.566	0.000
ChrC	73692	73953	262	-	antisense	0	ATCG00680.1	2.566	0.000
ChrC	73954	73965	12	-	downstream	283	ATCG00700.1	4.358	0.001
ChrC	74261	74263	3	-	inside	0	ATCG00700.1	-1.409	0.001
ChrC	74084	74205	122	-	downstream	43	ATCG00700.1	4.285	0.001
ChrC	74084	74205	122	-	antisense	0	ATCG00690.1	4.285	0.001
ChrC	74306	74841	536	+	overlapping 5’, 3’	0	ATCG00710.1	0.626	0.002
ChrC	74306	74841	536	+	overlapping 5’	0	ATCG00720.1	0.626	0.002
ChrC	74306	74841	536	+	antisense	0	ATCG00700.1	0.626	0.002
ChrC	74256	74260	5	-	inside	0	ATCG00700.1	-1.301	0.005

Step 5: Visualizing the DERs

exportResults() saves for both strands DERs, not-DERs, segmentation, coverages and log2-FC per-base in formats readable (BEDGRAPH, GFF3) by genome viewers like the Integrative Genome Viewer (IGV) (Robison et al. 2011). For IGV, exportResults() also creates an IGV sessiong in XML format that allows loading all tracks in one click.

The tracks from top to bottom stand for:

(log2-Cov +) the mean of coverages on the log scale for the forward strand in both biological conditions of interest. Here, the blue line and the red line stand for wt and pnp1-1 conditions, respectively ;
(log2-FC +) the per-base \(\log_2\)-FC between pnp1-1 (numerator) and wt (denominator) for the forward strand. The straight horizontal line is the zero indicator. When the per-base \(\log_2\)-FC is above or under the zero indicator line, it suggests an up-regulation and an down-regulation in pnp1-1 compare to wt, respectively. Over the per-base \(\log_2\)-FC is displayed the changepoints positions (vertical blue lines) and the mean of each segment (horizontal blue lines that connect two changepoints) ;
(DiffSegR +) the results of the DEA on the forward strand. Up-regulated regions appear in green. Down-regulated regions appear in purple. Not-DERs appear in grey ;
(annotations) The features provided by the user for interpetations.

Symmetrically, the rest of the tracks correspond to the same data on the reverse strand.

Note :

ChrC_path  <- system.file(
 "extdata", 
 "TAIR10_ChrC.fa", 
 package = "DiffSegR"
)
exportResults(
 data            = data,
 features        = SummarizedExperiment::rowRanges(dds),
 sizeFactors     = DESeq2::sizeFactors(dds),
 outputDirectory = working_directory,
 genome          = ChrC_path,
 annotation      = exons_path,
 select          = c("log2FoldChange", "pvalue", "padj")
) 
#> Loading required package: rtracklayer
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:DiffSegR':
#> 
#>     coverage
#> Loading required package: GenomeInfoDb
#> 
#>  > Export results from locus ChrC 71950-78500 (ID : psbB_petD) ...

DiffSegR output between positions 76,990 to 78,248 on the chloroplast genome. The session is loaded in IGV 2.17.2 for Linux.

In collaboration projects, users will typically send the directory containing all tracks to collaborators. To load the session in IGV, the collaborator must update the paths for the genome and annotations in the XML file. This step may be necessary for each IGV session produced by a multi-loci analysis. To streamline this process, collaborators can modify all necessary paths directly from their R environment using the setPaths() function.

setPaths(
  directory = "collaborator_igv_session(s)_directory",
  genome    = "collaborator_genome_path",
  annotations = "collaborator_annotations_path"
)

References

Castandet B, Hotto AM, Fei Z, Stern DB. Strand-specific RNA sequencing uncovers chloroplast ribonuclease functions. FEBS Lett 587(18), 3096-101 (2013) doi:10.1016/j.febslet.2013.08.004.

Hotto AM, Castandet B, Gilet L, Higdon A, Condon C, Stern DB. Arabidopsis chloroplast mini-ribonuclease III participates in rRNA maturation and intron recycling. Plant Cell 27(3), 724-40 (2015) doi:10.1105/tpc.114.134452.

James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011) doi:10.1038/nbt.1754.