Skip to contents

Data

In this tutorial we use an RNA-Seq dataset (SRA046998) generated to study the differences in Arabidopsis thaliana chloroplast RNA metabolism between a control (wt) and a null mutant of the chloroplast-targeted \(3'\to5'\) exoribonuclease polynucleotide phosphorylase (pnp1-1). In comparison to control plants, this mutant overaccumulate RNA fragments that are mainly located outside of the annotated genic areas and the RNA-Seq data have been extensively validated using molecular techniques (Castandet et al. 2013; Hotto et al. 2015). Both conditions contain two biological replicates. Only a diluted set of reads overlapping positions 71950 to 78500 on chloroplast genome has been kept as a minimal dataset example.

Overview

After setting up the necessary working environnement and loading the software, we are going through each step of a classical differential expression analysis along the genome with DiffSegR:

  1. Computation of coverages and per-base \(\log_2\)-FC from bam files ;
  2. Summarize the differential transcription landscape, i.e. (i) identify segments along the genome in the per-base \(\log_2\)-FC and (ii) count reads overlapping them ;
  3. Use the counts as measures of the expression abundance of the segments for differential expression analysis ;
  4. Annotate the differential expressed regions (DERs) ;
  5. Visualize the DERs.

Setup

First, we create a working directory that will host all the files produced by DiffSegR during the analysis (coverages, results).

working_directory <- "./DIFFSEGR_TEST"
dir.create(working_directory, showWarnings = FALSE)

Then, we save sample information that we will use for this tutorial in a text file. In practice, one can create the working directory and write the text file by hand. The sample information file needs to include the following columns:

  1. sample : identifier for the sample ;
  2. condition : identifier for the condition ;
  3. replicate : identifier for the replicate ;
  4. bam : path to the bam file (.bam) ;
  5. coverage : path to the coverage file (.rds).

Note: Coverages are computed internally by DiffSegR from bam files and saved in rds format at the location specified by the user (coverage).

#- create sample information table --------------------------------------------#
sample_info <- data.frame(
  sample    = c("pnp1_1_1", "pnp1_1_2", "wt_1", "wt_2"),
  condition = rep(c("pnp1_1", "wt"), each = 2),
  replicate = rep(1:2,2),
  bam       = sapply(
    c("pnp1_1_1_ChrC_71950_78500.bam", 
      "pnp1_1_2_ChrC_71950_78500.bam",
      "wt_1_ChrC_71950_78500.bam",
      "wt_2_ChrC_71950_78500.bam"
    ),
    function(bam) system.file("extdata", bam, package = "DiffSegR")
  ),
  coverage  = file.path(
    working_directory,
    paste0(c("pnp1_1_1", "pnp1_1_2", "wt_1", "wt_2"), ".rds")
  )
)
#- save sample information table ----------------------------------------------# 
write.table(
  sample_info, 
  file.path(working_directory, "sample_info.txt")
)
#- display sample information table -------------------------------------------#
knitr::kable(sample_info, row.names = FALSE)
sample condition replicate bam coverage
pnp1_1_1 pnp1_1 1 /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_1.rds
pnp1_1_2 pnp1_1 2 /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_2.rds
wt_1 wt 1 /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam ./DIFFSEGR_TEST/wt_1.rds
wt_2 wt 2 /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam ./DIFFSEGR_TEST/wt_2.rds

Analysis

Now, we present a basic use of the main functions of the DiffSegR R package. The analysis was carried on 4 logical cores. This parameter can be modified by the user according to the resources at his disposal.

library(DiffSegR)
nb_threads = 4

Step 1: Computation of coverages and per-base \(\log_2\)-FC from bam files

loadData():

  1. builds coverages from bam files within a locus specified by the user. Here we focus on positions 71950 to 78500 of chloroplast genome (ChrC). If the reads are stranded (stranded = TRUE), the function builds one coverage profile per-strand for each replicate of both compared biological conditions. By default the heuristic used to compute coverages is the geometric mean of 5’ and 3’ profiles of the aligned reads (coverageType = "average") ;
  2. transforms coverages into the per-base \(\log_2\)-FC (one by strand). The reference biological condition specified by the user is set to the denominator (referenceCondition = "wt") ;
  3. returns the coverages and the per-base \(\log_2\)-FC as a list of run-length encoding objects.

Note: If coverages have already been computed, set fromBam = FALSE to load them directly from specified rds files.

data <- loadData(
  sampleInfo   = file.path(working_directory,"sample_info.txt"),
  locus        = list(
    seqid      = "ChrC", 
    chromStart = 71950, 
    chromEnd   = 78500
  ),
  referenceCondition = "wt",
  stranded           = TRUE,
  fromBam            = TRUE,
  nbThreads          = nb_threads
)
#> building coverage profiles ...
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.10.4
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : not counted                                      ||
#> ||   Min overlapping bases : 1                                                ||
#> ||          Read reduction : to 3' end                                        ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ...         ||
#> ||    Features : 13102                                                        ||
#> ||    Meta-features : 13102                                                   ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24249 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21098 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17531 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14066 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.10.4
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : not counted                                      ||
#> ||   Min overlapping bases : 1                                                ||
#> ||          Read reduction : to 5' end                                        ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ...         ||
#> ||    Features : 13102                                                        ||
#> ||    Meta-features : 13102                                                   ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24251 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21088 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17520 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14060 (99.9%)                        ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> building log2-FC per-base ...
#> finished in 1.24791789054871 secs

print(data)
#> $coverages
#> $coverages$plus
#> $coverages$plus$pnp1_1_1
#> numeric-Rle of length 6551 with 5255 runs
#>   Lengths:       86        1        3        1 ...        1        1       10
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$plus$pnp1_1_2
#> numeric-Rle of length 6551 with 5209 runs
#>   Lengths:       89        1        2        1 ...        5        2       30
#>   Values : 0.000000 0.732051 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$plus$wt_1
#> numeric-Rle of length 6551 with 4610 runs
#>   Lengths:       46        1       15        1 ...       23        1       14
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$plus$wt_2
#> numeric-Rle of length 6551 with 4601 runs
#>   Lengths:       57        1       26        1 ...       22        1        2
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> 
#> $coverages$minus
#> $coverages$minus$pnp1_1_1
#> numeric-Rle of length 6551 with 526 runs
#>   Lengths:       31        1       68        1 ...        6        1        1
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$minus$pnp1_1_2
#> numeric-Rle of length 6551 with 585 runs
#>   Lengths:      406        1       42        1 ...        2        1        1
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$minus$wt_1
#> numeric-Rle of length 6551 with 463 runs
#>   Lengths:      427        1       83        1 ...        4        1        2
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#> 
#> $coverages$minus$wt_2
#> numeric-Rle of length 6551 with 544 runs
#>   Lengths:      458        1       63        1 ...        1       18        1
#>   Values : 0.000000 0.414214 0.000000 0.414214 ... 0.414214 0.000000 0.732051
#> 
#> 
#> 
#> $log2FoldChange
#> $log2FoldChange$plus
#> numeric-Rle of length 6551 with 6004 runs
#>   Lengths:         46          1         10 ...          1          2
#>   Values :  0.0000000 -0.2500000  0.0000000 ...  -0.250000   0.000000
#> 
#> $log2FoldChange$minus
#> numeric-Rle of length 6551 with 1239 runs
#>   Lengths:           31            1           68 ...            1            1
#>   Values :         0.00         0.25         0.00 ...  5.00000e-01 -3.96241e-01
#> 
#> 
#> $locus
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames      ranges strand
#>          <Rle>   <IRanges>  <Rle>
#>   [1]     ChrC 71950-78500      *
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $stranded
#> [1] TRUE
#> 
#> $sampleInfo
#>                                 sample condition replicate
#> pnp1_1_1_ChrC_71950_78500.bam pnp1_1_1    pnp1_1         1
#> pnp1_1_2_ChrC_71950_78500.bam pnp1_1_2    pnp1_1         2
#> wt_1_ChrC_71950_78500.bam         wt_1        wt         1
#> wt_2_ChrC_71950_78500.bam         wt_2        wt         2
#>                                                                                                                   bam
#> pnp1_1_1_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam
#> pnp1_1_2_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam
#> wt_1_ChrC_71950_78500.bam         /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam
#> wt_2_ChrC_71950_78500.bam         /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam
#>                                                   coverage
#> pnp1_1_1_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_1.rds
#> pnp1_1_2_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_2.rds
#> wt_1_ChrC_71950_78500.bam         ./DIFFSEGR_TEST/wt_1.rds
#> wt_2_ChrC_71950_78500.bam         ./DIFFSEGR_TEST/wt_2.rds
#> 
#> $referenceCondition
#> [1] "wt"

Setp 2: Summarizing the differential transcription landscape

segmentation():

  1. calls fpopw::Fpop_w(), a changepoint detection in the mean of a Gaussian signal approach, on the per-base \(\log_2\)-FC of both strands to retrieve the segments boundaries (changepoints). The hyperparameter \(\alpha\) specified by the user controls the number of returned segments. The number of changepoints is a decreasing function of \(\alpha\). By default alpha=2. The segments are stored as a GenomicRanges object.
  2. calls Rsubread::featureCounts() which assigns to the segments the mapped reads from each replicate of each biological condition. A read is allowed to be assigned to more than one feature if it is found to overlap with more than one feature. Rsubread::featureCounts() ends by building a count matrix.
  3. returns the segments and associated count matrix as a SummarizedExperiment object.
SExp <- segmentation(
  data                   = data, 
  nbThreadsFeatureCounts = nb_threads,
  outputDirectory        = working_directory
)
#> changepoint detection on ChrC 71950 78500 plus ...
#>   > model selection ...
#>   > report available in ./DIFFSEGR_TEST/all_models_plus.pdf
#> changepoint detection on ChrC 71950 78500 minus ...
#>   > model selection ...
#>   > report available in ./DIFFSEGR_TEST/all_models_minus.pdf
#> finished in 0.521576166152954 secs
#> counting ...
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.10.4
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : counted                                          ||
#> ||   Min overlapping bases : 1                                                ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ...         ||
#> ||    Features : 69                                                           ||
#> ||    Meta-features : 69                                                      ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24260 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21100 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17533 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14070 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> finished in 0.125975608825684 secs

Between positions 71950 and 78500 of chloroplast genome, segmentation() returns 17 segments on the forward (+) strand and 52 segments on the reverse (-) strand. Segment coordinates and associated counts can be retrieved using respectively SummarizedExperiment::rowRanges and SummarizedExperiment::assay accessors.

#- number of segments found ---------------------------------------------------# 
SummarizedExperiment::strand(SExp)
#> factor-Rle of length 69 with 2 runs
#>   Lengths: 17 52
#>   Values :  +  -
#> Levels(3): + - *
#- display first five segments ------------------------------------------------#
segment_coordinates <- as.data.frame(SummarizedExperiment::rowRanges(SExp))
knitr::kable(segment_coordinates[1:5, colnames(segment_coordinates)%in%c(
  "seqnames",
  "start",
  "end",
  "width",
  "strand")
]) 
seqnames start end width strand
ChrC 71950 72255 306 +
ChrC 72256 72402 147 +
ChrC 72403 74038 1636 +
ChrC 74039 74305 267 +
ChrC 74306 74841 536 +
#- display counts associated to first five segments ---------------------------#
counts <- SummarizedExperiment::assay(SExp)
knitr::kable(counts[1:5,])
pnp1_1_1 pnp1_1_2 wt_1 wt_2
ChrC_71950_72255_plus 143 139 84 84
ChrC_72256_72402_plus 649 665 326 293
ChrC_72403_74038_plus 13203 11731 10849 8366
ChrC_74039_74305_plus 155 132 98 104
ChrC_74306_74841_plus 2337 1996 1589 1218

Step 3: Differential expression analysis (DEA)

dea() function takes as input the recently built SummarizedExperiment object and calls DESeq2 to test for each line, i.e. each segment, the difference in expression between the two compared biological conditions. Here we control all the resulting p-values using a Benjamini–Hochberg procedure at level significanceLevel = 1e-2.

Note: By default sample-specific size factors are computed by DESeq2. They can be overwritten by the user. In this example, sample-specific size factors have the same weight sizeFactors = rep(1,4).

dds <- dea(
  data              = data,
  SExp              = SExp,
  design            = ~ condition,
  sizeFactors       = rep(1,4),
  significanceLevel = 1e-2
)
#>   > differential expression analysis ...
#> renaming the first element in assays to 'counts'
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#>   >  multiple testing correction ...
#>   >  normalize denoised log2-FC and coverage ...

Regions differentially expressed (DERs) are set to rejectedHypotheses = TRUE. On this example, dea() returns 39 DERs.

#- number of DERs -------------------------------------------------------------#
DERs <- dds[SummarizedExperiment::mcols(dds)$rejectedHypotheses,]
length(DERs)
#> [1] 39
#- display first five DERs by position ----------------------------------------#
DERs <- as.data.frame(SummarizedExperiment::rowRanges(DERs))
knitr::kable(DERs[1:5, colnames(DERs)%in%c(
  "seqnames",
  "start",
  "end",
  "width",
  "strand",
  "log2FoldChange",
  "padj")
])
seqnames start end width strand log2FoldChange padj
ChrC 72256 72402 147 + 1.0859541 0.0000025
ChrC 74306 74841 536 + 0.6263374 0.0037268
ChrC 74842 74938 97 + 1.0248663 0.0000195
ChrC 74939 75471 533 + 1.2142538 0.0000001
ChrC 76882 77054 173 + 1.7448836 0.0000000

Step 4: Annotate the DERs

annotateNearest() relies on user specified annotations in gff3 or gtf format to annotate the DERs found during the DEA. Seven classes of labels translate the relative positions of the DER and its closest annotation(s): antisense; upstream; downstream; inside; overlapping 3’; overlapping 5’; overlapping 5’, 3’.

exons_path <- system.file(
 "extdata", 
 "TAIR10_ChrC_exon.gff3", 
 package = "DiffSegR"
)
annotated_DERs <- annotateNearest(
   data        = data,
   dds         = dds,
   annotations = exons_path,
   select      = c("seqnames", "start", "end", "width", "strand", 
     "description", "distance", "Parent", "log2FoldChange", "pvalue"),
   orderBy     = "pvalue"
)
#>  > loading annotations from /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/TAIR10_ChrC_exon.gff3 ...
#>  > annotate differential expressed regions ...
knitr::kable(annotated_DERs, row.names = FALSE, digits = 3)
seqnames start end width strand description distance Parent log2FoldChange pvalue
ChrC 77964 78112 149 + downstream 291 ATCG00730.1 4.642 0.000
ChrC 77964 78112 149 + antisense 0 ATCG00740.1 4.642 0.000
ChrC 76882 77054 173 + upstream 143 ATCG00730.1 1.745 0.000
ChrC 77740 77963 224 + downstream 67 ATCG00730.1 2.319 0.000
ChrC 77740 77963 224 + antisense 0 ATCG00740.1 2.319 0.000
ChrC 77055 77192 138 + upstream 5 ATCG00730.1 1.929 0.000
ChrC 74939 75471 533 + downstream 92 ATCG00720.1 1.214 0.000
ChrC 78113 78214 102 + downstream 440 ATCG00730.1 3.919 0.000
ChrC 78113 78214 102 + antisense 0 ATCG00740.1 3.919 0.000
ChrC 72256 72402 147 + overlapping 5’ 0 ATCG00680.1 1.086 0.000
ChrC 74001 74019 19 - downstream 229 ATCG00700.1 3.807 0.000
ChrC 74020 74023 4 - downstream 225 ATCG00700.1 3.747 0.000
ChrC 73994 74000 7 - downstream 248 ATCG00700.1 3.858 0.000
ChrC 74281 74283 3 - inside 0 ATCG00700.1 -1.820 0.000
ChrC 74306 74309 4 - inside 0 ATCG00700.1 -2.022 0.000
ChrC 74301 74302 2 - inside 0 ATCG00700.1 -1.965 0.000
ChrC 74842 74938 97 + overlapping 3’ 0 ATCG00720.1 1.025 0.000
ChrC 74024 74035 12 - downstream 213 ATCG00700.1 3.550 0.000
ChrC 74284 74300 17 - inside 0 ATCG00700.1 -1.735 0.000
ChrC 74303 74304 2 - inside 0 ATCG00700.1 -1.920 0.000
ChrC 74305 74305 1 - inside 0 ATCG00700.1 -1.920 0.000
ChrC 74036 74039 4 - downstream 209 ATCG00700.1 4.585 0.000
ChrC 74269 74273 5 - inside 0 ATCG00700.1 -1.699 0.000
ChrC 74040 74077 38 - downstream 171 ATCG00700.1 4.544 0.000
ChrC 74334 74338 5 - inside 0 ATCG00700.1 -2.121 0.000
ChrC 74267 74268 2 - inside 0 ATCG00700.1 -1.675 0.000
ChrC 74274 74277 4 - inside 0 ATCG00700.1 -1.651 0.000
ChrC 74278 74280 3 - inside 0 ATCG00700.1 -1.651 0.000
ChrC 74310 74333 24 - inside 0 ATCG00700.1 -1.882 0.000
ChrC 74339 74339 1 - inside 0 ATCG00700.1 -2.185 0.000
ChrC 74264 74265 2 - inside 0 ATCG00700.1 -1.631 0.000
ChrC 74340 74347 8 - inside 0 ATCG00700.1 -2.066 0.000
ChrC 74266 74266 1 - inside 0 ATCG00700.1 -1.585 0.000
ChrC 74348 74349 2 - inside 0 ATCG00700.1 -2.452 0.000
ChrC 74078 74083 6 - downstream 165 ATCG00700.1 4.143 0.000
ChrC 74078 74083 6 - antisense 0 ATCG00690.1 4.143 0.000
ChrC 73966 73993 28 - downstream 255 ATCG00700.1 3.755 0.000
ChrC 73692 73953 262 - downstream 295 ATCG00700.1 2.566 0.000
ChrC 73692 73953 262 - antisense 0 ATCG00680.1 2.566 0.000
ChrC 73954 73965 12 - downstream 283 ATCG00700.1 4.358 0.001
ChrC 74261 74263 3 - inside 0 ATCG00700.1 -1.409 0.001
ChrC 74084 74205 122 - downstream 43 ATCG00700.1 4.285 0.001
ChrC 74084 74205 122 - antisense 0 ATCG00690.1 4.285 0.001
ChrC 74306 74841 536 + overlapping 5’, 3’ 0 ATCG00710.1 0.626 0.002
ChrC 74306 74841 536 + overlapping 5’ 0 ATCG00720.1 0.626 0.002
ChrC 74306 74841 536 + antisense 0 ATCG00700.1 0.626 0.002
ChrC 74256 74260 5 - inside 0 ATCG00700.1 -1.301 0.005

Step 5: Visualizing the DERs

exportResults() saves for both strands DERs, not-DERs, segmentation, coverages and log2-FC per-base in formats readable (bedGraph, gff3) by genome viewers like the Integrative Genome Viewer (IGV) (Robison et al. 2011). For IGV, exportResults() also creates an IGV sessiong in xml format that allows loading all tracks in one click.

The tracks from top to bottom stand for:

  1. (log2-Cov plus) the mean of coverages on the log scale for the forward strand in both biological conditions of interest. Here, the blue line and the red line stand for wt and pnp1-1 conditions, respectively ;
  2. (log2-FC plus) the per-base \(\log_2\)-FC between pnp1-1 (numerator) and wt (denominator) for the forward strand. The straight horizontal line is the zero indicator. When the per-base \(\log_2\)-FC is above or under the zero indicator line, it suggests an up-regulation and an down-regulation in pnp1-1 compare to wt, respectively. Over the per-base \(\log_2\)-FC is displayed the changepoints positions (vertical blue lines) and the mean of each segment (horizontal blue lines that connect two changepoints) ;
  3. (DiffSegR plus) the results of the DEA on the forward strand. Up-regulated regions appear in green. Down-regulated regions appear in purple. Not-DERs appear in grey ;
  4. (annotations) The features provided by the user for interpetations.

Symmetrically, the rest of the tracks correspond to the same data on the reverse strand.

ChrC_path  <- system.file(
 "extdata", 
 "TAIR10_ChrC.fa", 
 package = "DiffSegR"
)
exportResults(
 data            = data,
 dds             = dds, 
 outputDirectory = working_directory,
 genome          = ChrC_path,
 annotation      = exons_path,
 select          = c("log2FoldChange", "pvalue", "padj")
) 
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
DiffSegR output between positions 76,990 to 78,248 on the chloroplast genome. The session is loaded in IGV 2.12.3 for Linux.

DiffSegR output between positions 76,990 to 78,248 on the chloroplast genome. The session is loaded in IGV 2.12.3 for Linux.

References

Castandet B, Hotto AM, Fei Z, Stern DB. Strand-specific RNA sequencing uncovers chloroplast ribonuclease functions. FEBS Lett 587(18), 3096-101 (2013). doi:10.1016/j.febslet.2013.08.004.

Hotto AM, Castandet B, Gilet L, Higdon A, Condon C, Stern DB. Arabidopsis chloroplast mini-ribonuclease III participates in rRNA maturation and intron recycling. Plant Cell 27(3), 724-40 (2015). doi:10.1105/tpc.114.134452.

James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011). doi:10.1038/nbt.1754.