Tutorial : Transcriptome-Wide Differential Expression Analysis with DiffSegR
Source:vignettes/introductory_tutorial.Rmd
introductory_tutorial.Rmd
Data
In this tutorial we use an RNA-Seq dataset (SRA046998) generated to study the differences in Arabidopsis thaliana chloroplast RNA metabolism between a control (wt) and a null mutant of the chloroplast-targeted \(3'\to5'\) exoribonuclease polynucleotide phosphorylase (pnp1-1). In comparison to control plants, this mutant overaccumulate RNA fragments that are mainly located outside of the annotated genic areas and the RNA-Seq data have been extensively validated using molecular techniques (Castandet et al. 2013; Hotto et al. 2015). Both conditions contain two biological replicates. Only a diluted set of reads overlapping positions 71950 to 78500 on chloroplast genome has been kept as a minimal dataset example.
Overview
After setting up the necessary working environnement and loading the software, we are going through each step of a classical differential expression analysis along the genome with DiffSegR
:
- Computation of coverages and per-base \(\log_2\)-FC from bam files ;
- Summarize the differential transcription landscape, i.e. (i) identify segments along the genome in the per-base \(\log_2\)-FC and (ii) count reads overlapping them ;
- Use the counts as measures of the expression abundance of the segments for differential expression analysis ;
- Annotate the differential expressed regions (DERs) ;
- Visualize the DERs.
Setup
First, we create a working directory that will host all the files produced by DiffSegR
during the analysis (coverages, results).
working_directory <- "./DIFFSEGR_TEST"
dir.create(working_directory, showWarnings = FALSE)
Then, we save sample information that we will use for this tutorial in a text file. In practice, one can create the working directory and write the text file by hand. The sample information file needs to include the following columns:
- sample : identifier for the sample ;
- condition : identifier for the condition ;
- replicate : identifier for the replicate ;
- bam : path to the bam file (.bam) ;
- coverage : path to the coverage file (.rds).
Note: Coverages are computed internally by DiffSegR
from bam files and saved in rds format at the location specified by the user (coverage
).
#- create sample information table --------------------------------------------#
sample_info <- data.frame(
sample = c("pnp1_1_1", "pnp1_1_2", "wt_1", "wt_2"),
condition = rep(c("pnp1_1", "wt"), each = 2),
replicate = rep(1:2,2),
bam = sapply(
c("pnp1_1_1_ChrC_71950_78500.bam",
"pnp1_1_2_ChrC_71950_78500.bam",
"wt_1_ChrC_71950_78500.bam",
"wt_2_ChrC_71950_78500.bam"
),
function(bam) system.file("extdata", bam, package = "DiffSegR")
),
coverage = file.path(
working_directory,
paste0(c("pnp1_1_1", "pnp1_1_2", "wt_1", "wt_2"), ".rds")
)
)
#- save sample information table ----------------------------------------------#
write.table(
sample_info,
file.path(working_directory, "sample_info.txt")
)
#- display sample information table -------------------------------------------#
knitr::kable(sample_info, row.names = FALSE)
sample | condition | replicate | bam | coverage |
---|---|---|---|---|
pnp1_1_1 | pnp1_1 | 1 | /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam | ./DIFFSEGR_TEST/pnp1_1_1.rds |
pnp1_1_2 | pnp1_1 | 2 | /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam | ./DIFFSEGR_TEST/pnp1_1_2.rds |
wt_1 | wt | 1 | /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam | ./DIFFSEGR_TEST/wt_1.rds |
wt_2 | wt | 2 | /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam | ./DIFFSEGR_TEST/wt_2.rds |
Analysis
Now, we present a basic use of the main functions of the DiffSegR
R package. The analysis was carried on 4
logical cores. This parameter can be modified by the user according to the resources at his disposal.
Step 1: Computation of coverages and per-base \(\log_2\)-FC from bam files
- builds coverages from bam files within a locus specified by the user. Here we focus on positions
71950
to78500
of chloroplast genome (ChrC
). If the reads are stranded (stranded = TRUE
), the function builds one coverage profile per-strand for each replicate of both compared biological conditions. By default the heuristic used to compute coverages is the geometric mean of 5’ and 3’ profiles of the aligned reads (coverageType = "average"
) ; - transforms coverages into the per-base \(\log_2\)-FC (one by strand). The reference biological condition specified by the user is set to the denominator (
referenceCondition = "wt"
) ; - returns the coverages and the per-base \(\log_2\)-FC as a list of run-length encoding objects.
Note: If coverages have already been computed, set fromBam = FALSE
to load them directly from specified rds files.
data <- loadData(
sampleInfo = file.path(working_directory,"sample_info.txt"),
locus = list(
seqid = "ChrC",
chromStart = 71950,
chromEnd = 78500
),
referenceCondition = "wt",
stranded = TRUE,
fromBam = TRUE,
nbThreads = nb_threads
)
#> building coverage profiles ...
#>
#> ========== _____ _ _ ____ _____ ______ _____
#> ===== / ____| | | | _ \| __ \| ____| /\ | __ \
#> ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
#> ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
#> ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#> ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
#> Rsubread 2.10.4
#>
#> //========================== featureCounts setting ===========================\\
#> || ||
#> || Input files : 4 BAM files ||
#> || ||
#> || pnp1_1_1_ChrC_71950_78500.bam ||
#> || pnp1_1_2_ChrC_71950_78500.bam ||
#> || wt_1_ChrC_71950_78500.bam ||
#> || wt_2_ChrC_71950_78500.bam ||
#> || ||
#> || Paired-end : no ||
#> || Count read pairs : no ||
#> || Annotation : R data.frame ||
#> || Dir for temp files : . ||
#> || Threads : 4 ||
#> || Level : meta-feature level ||
#> || Multimapping reads : counted ||
#> || Multi-overlapping reads : not counted ||
#> || Min overlapping bases : 1 ||
#> || Read reduction : to 3' end ||
#> || ||
#> \\============================================================================//
#>
#> //================================= Running ==================================\\
#> || ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ... ||
#> || Features : 13102 ||
#> || Meta-features : 13102 ||
#> || Chromosomes/contigs : 1 ||
#> || ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 24260 ||
#> || Successfully assigned alignments : 24249 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 21100 ||
#> || Successfully assigned alignments : 21098 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 17533 ||
#> || Successfully assigned alignments : 17531 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 14070 ||
#> || Successfully assigned alignments : 14066 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Write the final count table. ||
#> || Write the read assignment summary. ||
#> || ||
#> \\============================================================================//
#>
#>
#> ========== _____ _ _ ____ _____ ______ _____
#> ===== / ____| | | | _ \| __ \| ____| /\ | __ \
#> ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
#> ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
#> ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#> ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
#> Rsubread 2.10.4
#>
#> //========================== featureCounts setting ===========================\\
#> || ||
#> || Input files : 4 BAM files ||
#> || ||
#> || pnp1_1_1_ChrC_71950_78500.bam ||
#> || pnp1_1_2_ChrC_71950_78500.bam ||
#> || wt_1_ChrC_71950_78500.bam ||
#> || wt_2_ChrC_71950_78500.bam ||
#> || ||
#> || Paired-end : no ||
#> || Count read pairs : no ||
#> || Annotation : R data.frame ||
#> || Dir for temp files : . ||
#> || Threads : 4 ||
#> || Level : meta-feature level ||
#> || Multimapping reads : counted ||
#> || Multi-overlapping reads : not counted ||
#> || Min overlapping bases : 1 ||
#> || Read reduction : to 5' end ||
#> || ||
#> \\============================================================================//
#>
#> //================================= Running ==================================\\
#> || ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ... ||
#> || Features : 13102 ||
#> || Meta-features : 13102 ||
#> || Chromosomes/contigs : 1 ||
#> || ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 24260 ||
#> || Successfully assigned alignments : 24251 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 21100 ||
#> || Successfully assigned alignments : 21088 (99.9%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 17533 ||
#> || Successfully assigned alignments : 17520 (99.9%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 14070 ||
#> || Successfully assigned alignments : 14060 (99.9%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Write the final count table. ||
#> || Write the read assignment summary. ||
#> || ||
#> \\============================================================================//
#> building log2-FC per-base ...
#> finished in 1.24791789054871 secs
print(data)
#> $coverages
#> $coverages$plus
#> $coverages$plus$pnp1_1_1
#> numeric-Rle of length 6551 with 5255 runs
#> Lengths: 86 1 3 1 ... 1 1 10
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$plus$pnp1_1_2
#> numeric-Rle of length 6551 with 5209 runs
#> Lengths: 89 1 2 1 ... 5 2 30
#> Values : 0.000000 0.732051 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$plus$wt_1
#> numeric-Rle of length 6551 with 4610 runs
#> Lengths: 46 1 15 1 ... 23 1 14
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$plus$wt_2
#> numeric-Rle of length 6551 with 4601 runs
#> Lengths: 57 1 26 1 ... 22 1 2
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#>
#> $coverages$minus
#> $coverages$minus$pnp1_1_1
#> numeric-Rle of length 6551 with 526 runs
#> Lengths: 31 1 68 1 ... 6 1 1
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$minus$pnp1_1_2
#> numeric-Rle of length 6551 with 585 runs
#> Lengths: 406 1 42 1 ... 2 1 1
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$minus$wt_1
#> numeric-Rle of length 6551 with 463 runs
#> Lengths: 427 1 83 1 ... 4 1 2
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.000000 0.414214 0.000000
#>
#> $coverages$minus$wt_2
#> numeric-Rle of length 6551 with 544 runs
#> Lengths: 458 1 63 1 ... 1 18 1
#> Values : 0.000000 0.414214 0.000000 0.414214 ... 0.414214 0.000000 0.732051
#>
#>
#>
#> $log2FoldChange
#> $log2FoldChange$plus
#> numeric-Rle of length 6551 with 6004 runs
#> Lengths: 46 1 10 ... 1 2
#> Values : 0.0000000 -0.2500000 0.0000000 ... -0.250000 0.000000
#>
#> $log2FoldChange$minus
#> numeric-Rle of length 6551 with 1239 runs
#> Lengths: 31 1 68 ... 1 1
#> Values : 0.00 0.25 0.00 ... 5.00000e-01 -3.96241e-01
#>
#>
#> $locus
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] ChrC 71950-78500 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $stranded
#> [1] TRUE
#>
#> $sampleInfo
#> sample condition replicate
#> pnp1_1_1_ChrC_71950_78500.bam pnp1_1_1 pnp1_1 1
#> pnp1_1_2_ChrC_71950_78500.bam pnp1_1_2 pnp1_1 2
#> wt_1_ChrC_71950_78500.bam wt_1 wt 1
#> wt_2_ChrC_71950_78500.bam wt_2 wt 2
#> bam
#> pnp1_1_1_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_1_ChrC_71950_78500.bam
#> pnp1_1_2_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/pnp1_1_2_ChrC_71950_78500.bam
#> wt_1_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_1_ChrC_71950_78500.bam
#> wt_2_ChrC_71950_78500.bam /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/wt_2_ChrC_71950_78500.bam
#> coverage
#> pnp1_1_1_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_1.rds
#> pnp1_1_2_ChrC_71950_78500.bam ./DIFFSEGR_TEST/pnp1_1_2.rds
#> wt_1_ChrC_71950_78500.bam ./DIFFSEGR_TEST/wt_1.rds
#> wt_2_ChrC_71950_78500.bam ./DIFFSEGR_TEST/wt_2.rds
#>
#> $referenceCondition
#> [1] "wt"
Setp 2: Summarizing the differential transcription landscape
- calls
fpopw::Fpop_w()
, a changepoint detection in the mean of a Gaussian signal approach, on the per-base \(\log_2\)-FC of both strands to retrieve the segments boundaries (changepoints). The hyperparameter \(\alpha\) specified by the user controls the number of returned segments. The number of changepoints is a decreasing function of \(\alpha\). By defaultalpha=2
. The segments are stored as a GenomicRanges object. - calls
Rsubread::featureCounts()
which assigns to the segments the mapped reads from each replicate of each biological condition. A read is allowed to be assigned to more than one feature if it is found to overlap with more than one feature.Rsubread::featureCounts()
ends by building a count matrix. - returns the segments and associated count matrix as a
SummarizedExperiment
object.
SExp <- segmentation(
data = data,
nbThreadsFeatureCounts = nb_threads,
outputDirectory = working_directory
)
#> changepoint detection on ChrC 71950 78500 plus ...
#> > model selection ...
#> > report available in ./DIFFSEGR_TEST/all_models_plus.pdf
#> changepoint detection on ChrC 71950 78500 minus ...
#> > model selection ...
#> > report available in ./DIFFSEGR_TEST/all_models_minus.pdf
#> finished in 0.521576166152954 secs
#> counting ...
#>
#> ========== _____ _ _ ____ _____ ______ _____
#> ===== / ____| | | | _ \| __ \| ____| /\ | __ \
#> ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
#> ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
#> ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#> ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
#> Rsubread 2.10.4
#>
#> //========================== featureCounts setting ===========================\\
#> || ||
#> || Input files : 4 BAM files ||
#> || ||
#> || pnp1_1_1_ChrC_71950_78500.bam ||
#> || pnp1_1_2_ChrC_71950_78500.bam ||
#> || wt_1_ChrC_71950_78500.bam ||
#> || wt_2_ChrC_71950_78500.bam ||
#> || ||
#> || Paired-end : no ||
#> || Count read pairs : no ||
#> || Annotation : R data.frame ||
#> || Dir for temp files : . ||
#> || Threads : 4 ||
#> || Level : meta-feature level ||
#> || Multimapping reads : counted ||
#> || Multi-overlapping reads : counted ||
#> || Min overlapping bases : 1 ||
#> || ||
#> \\============================================================================//
#>
#> //================================= Running ==================================\\
#> || ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49780 ... ||
#> || Features : 69 ||
#> || Meta-features : 69 ||
#> || Chromosomes/contigs : 1 ||
#> || ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 24260 ||
#> || Successfully assigned alignments : 24260 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 21100 ||
#> || Successfully assigned alignments : 21100 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 17533 ||
#> || Successfully assigned alignments : 17533 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam... ||
#> || Strand specific : stranded ||
#> || Single-end reads are included. ||
#> || Total alignments : 14070 ||
#> || Successfully assigned alignments : 14070 (100.0%) ||
#> || Running time : 0.00 minutes ||
#> || ||
#> || Write the final count table. ||
#> || Write the read assignment summary. ||
#> || ||
#> \\============================================================================//
#> finished in 0.125975608825684 secs
Between positions 71950
and 78500
of chloroplast genome, segmentation()
returns 17 segments on the forward (+) strand and 52 segments on the reverse (-) strand. Segment coordinates and associated counts can be retrieved using respectively SummarizedExperiment::rowRanges
and SummarizedExperiment::assay
accessors.
#- number of segments found ---------------------------------------------------#
SummarizedExperiment::strand(SExp)
#> factor-Rle of length 69 with 2 runs
#> Lengths: 17 52
#> Values : + -
#> Levels(3): + - *
#- display first five segments ------------------------------------------------#
segment_coordinates <- as.data.frame(SummarizedExperiment::rowRanges(SExp))
knitr::kable(segment_coordinates[1:5, colnames(segment_coordinates)%in%c(
"seqnames",
"start",
"end",
"width",
"strand")
])
seqnames | start | end | width | strand |
---|---|---|---|---|
ChrC | 71950 | 72255 | 306 | + |
ChrC | 72256 | 72402 | 147 | + |
ChrC | 72403 | 74038 | 1636 | + |
ChrC | 74039 | 74305 | 267 | + |
ChrC | 74306 | 74841 | 536 | + |
#- display counts associated to first five segments ---------------------------#
counts <- SummarizedExperiment::assay(SExp)
knitr::kable(counts[1:5,])
pnp1_1_1 | pnp1_1_2 | wt_1 | wt_2 | |
---|---|---|---|---|
ChrC_71950_72255_plus | 143 | 139 | 84 | 84 |
ChrC_72256_72402_plus | 649 | 665 | 326 | 293 |
ChrC_72403_74038_plus | 13203 | 11731 | 10849 | 8366 |
ChrC_74039_74305_plus | 155 | 132 | 98 | 104 |
ChrC_74306_74841_plus | 2337 | 1996 | 1589 | 1218 |
Step 3: Differential expression analysis (DEA)
dea()
function takes as input the recently built SummarizedExperiment
object and calls DESeq2
to test for each line, i.e. each segment, the difference in expression between the two compared biological conditions. Here we control all the resulting p-values using a Benjamini–Hochberg procedure at level significanceLevel = 1e-2
.
Note: By default sample-specific size factors are computed by DESeq2
. They can be overwritten by the user. In this example, sample-specific size factors have the same weight sizeFactors = rep(1,4)
.
dds <- dea(
data = data,
SExp = SExp,
design = ~ condition,
sizeFactors = rep(1,4),
significanceLevel = 1e-2
)
#> > differential expression analysis ...
#> renaming the first element in assays to 'counts'
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> > multiple testing correction ...
#> > normalize denoised log2-FC and coverage ...
Regions differentially expressed (DERs) are set to rejectedHypotheses = TRUE
. On this example, dea()
returns 39 DERs.
#- number of DERs -------------------------------------------------------------#
DERs <- dds[SummarizedExperiment::mcols(dds)$rejectedHypotheses,]
length(DERs)
#> [1] 39
#- display first five DERs by position ----------------------------------------#
DERs <- as.data.frame(SummarizedExperiment::rowRanges(DERs))
knitr::kable(DERs[1:5, colnames(DERs)%in%c(
"seqnames",
"start",
"end",
"width",
"strand",
"log2FoldChange",
"padj")
])
seqnames | start | end | width | strand | log2FoldChange | padj |
---|---|---|---|---|---|---|
ChrC | 72256 | 72402 | 147 | + | 1.0859541 | 0.0000025 |
ChrC | 74306 | 74841 | 536 | + | 0.6263374 | 0.0037268 |
ChrC | 74842 | 74938 | 97 | + | 1.0248663 | 0.0000195 |
ChrC | 74939 | 75471 | 533 | + | 1.2142538 | 0.0000001 |
ChrC | 76882 | 77054 | 173 | + | 1.7448836 | 0.0000000 |
Step 4: Annotate the DERs
annotateNearest()
relies on user specified annotations in gff3 or gtf format to annotate the DERs found during the DEA. Seven classes of labels translate the relative positions of the DER and its closest annotation(s): antisense; upstream; downstream; inside; overlapping 3’; overlapping 5’; overlapping 5’, 3’.
exons_path <- system.file(
"extdata",
"TAIR10_ChrC_exon.gff3",
package = "DiffSegR"
)
annotated_DERs <- annotateNearest(
data = data,
dds = dds,
annotations = exons_path,
select = c("seqnames", "start", "end", "width", "strand",
"description", "distance", "Parent", "log2FoldChange", "pvalue"),
orderBy = "pvalue"
)
#> > loading annotations from /tmp/RtmpXaWfVn/temp_libpathc03c318e5828/DiffSegR/extdata/TAIR10_ChrC_exon.gff3 ...
#> > annotate differential expressed regions ...
knitr::kable(annotated_DERs, row.names = FALSE, digits = 3)
seqnames | start | end | width | strand | description | distance | Parent | log2FoldChange | pvalue |
---|---|---|---|---|---|---|---|---|---|
ChrC | 77964 | 78112 | 149 | + | downstream | 291 | ATCG00730.1 | 4.642 | 0.000 |
ChrC | 77964 | 78112 | 149 | + | antisense | 0 | ATCG00740.1 | 4.642 | 0.000 |
ChrC | 76882 | 77054 | 173 | + | upstream | 143 | ATCG00730.1 | 1.745 | 0.000 |
ChrC | 77740 | 77963 | 224 | + | downstream | 67 | ATCG00730.1 | 2.319 | 0.000 |
ChrC | 77740 | 77963 | 224 | + | antisense | 0 | ATCG00740.1 | 2.319 | 0.000 |
ChrC | 77055 | 77192 | 138 | + | upstream | 5 | ATCG00730.1 | 1.929 | 0.000 |
ChrC | 74939 | 75471 | 533 | + | downstream | 92 | ATCG00720.1 | 1.214 | 0.000 |
ChrC | 78113 | 78214 | 102 | + | downstream | 440 | ATCG00730.1 | 3.919 | 0.000 |
ChrC | 78113 | 78214 | 102 | + | antisense | 0 | ATCG00740.1 | 3.919 | 0.000 |
ChrC | 72256 | 72402 | 147 | + | overlapping 5’ | 0 | ATCG00680.1 | 1.086 | 0.000 |
ChrC | 74001 | 74019 | 19 | - | downstream | 229 | ATCG00700.1 | 3.807 | 0.000 |
ChrC | 74020 | 74023 | 4 | - | downstream | 225 | ATCG00700.1 | 3.747 | 0.000 |
ChrC | 73994 | 74000 | 7 | - | downstream | 248 | ATCG00700.1 | 3.858 | 0.000 |
ChrC | 74281 | 74283 | 3 | - | inside | 0 | ATCG00700.1 | -1.820 | 0.000 |
ChrC | 74306 | 74309 | 4 | - | inside | 0 | ATCG00700.1 | -2.022 | 0.000 |
ChrC | 74301 | 74302 | 2 | - | inside | 0 | ATCG00700.1 | -1.965 | 0.000 |
ChrC | 74842 | 74938 | 97 | + | overlapping 3’ | 0 | ATCG00720.1 | 1.025 | 0.000 |
ChrC | 74024 | 74035 | 12 | - | downstream | 213 | ATCG00700.1 | 3.550 | 0.000 |
ChrC | 74284 | 74300 | 17 | - | inside | 0 | ATCG00700.1 | -1.735 | 0.000 |
ChrC | 74303 | 74304 | 2 | - | inside | 0 | ATCG00700.1 | -1.920 | 0.000 |
ChrC | 74305 | 74305 | 1 | - | inside | 0 | ATCG00700.1 | -1.920 | 0.000 |
ChrC | 74036 | 74039 | 4 | - | downstream | 209 | ATCG00700.1 | 4.585 | 0.000 |
ChrC | 74269 | 74273 | 5 | - | inside | 0 | ATCG00700.1 | -1.699 | 0.000 |
ChrC | 74040 | 74077 | 38 | - | downstream | 171 | ATCG00700.1 | 4.544 | 0.000 |
ChrC | 74334 | 74338 | 5 | - | inside | 0 | ATCG00700.1 | -2.121 | 0.000 |
ChrC | 74267 | 74268 | 2 | - | inside | 0 | ATCG00700.1 | -1.675 | 0.000 |
ChrC | 74274 | 74277 | 4 | - | inside | 0 | ATCG00700.1 | -1.651 | 0.000 |
ChrC | 74278 | 74280 | 3 | - | inside | 0 | ATCG00700.1 | -1.651 | 0.000 |
ChrC | 74310 | 74333 | 24 | - | inside | 0 | ATCG00700.1 | -1.882 | 0.000 |
ChrC | 74339 | 74339 | 1 | - | inside | 0 | ATCG00700.1 | -2.185 | 0.000 |
ChrC | 74264 | 74265 | 2 | - | inside | 0 | ATCG00700.1 | -1.631 | 0.000 |
ChrC | 74340 | 74347 | 8 | - | inside | 0 | ATCG00700.1 | -2.066 | 0.000 |
ChrC | 74266 | 74266 | 1 | - | inside | 0 | ATCG00700.1 | -1.585 | 0.000 |
ChrC | 74348 | 74349 | 2 | - | inside | 0 | ATCG00700.1 | -2.452 | 0.000 |
ChrC | 74078 | 74083 | 6 | - | downstream | 165 | ATCG00700.1 | 4.143 | 0.000 |
ChrC | 74078 | 74083 | 6 | - | antisense | 0 | ATCG00690.1 | 4.143 | 0.000 |
ChrC | 73966 | 73993 | 28 | - | downstream | 255 | ATCG00700.1 | 3.755 | 0.000 |
ChrC | 73692 | 73953 | 262 | - | downstream | 295 | ATCG00700.1 | 2.566 | 0.000 |
ChrC | 73692 | 73953 | 262 | - | antisense | 0 | ATCG00680.1 | 2.566 | 0.000 |
ChrC | 73954 | 73965 | 12 | - | downstream | 283 | ATCG00700.1 | 4.358 | 0.001 |
ChrC | 74261 | 74263 | 3 | - | inside | 0 | ATCG00700.1 | -1.409 | 0.001 |
ChrC | 74084 | 74205 | 122 | - | downstream | 43 | ATCG00700.1 | 4.285 | 0.001 |
ChrC | 74084 | 74205 | 122 | - | antisense | 0 | ATCG00690.1 | 4.285 | 0.001 |
ChrC | 74306 | 74841 | 536 | + | overlapping 5’, 3’ | 0 | ATCG00710.1 | 0.626 | 0.002 |
ChrC | 74306 | 74841 | 536 | + | overlapping 5’ | 0 | ATCG00720.1 | 0.626 | 0.002 |
ChrC | 74306 | 74841 | 536 | + | antisense | 0 | ATCG00700.1 | 0.626 | 0.002 |
ChrC | 74256 | 74260 | 5 | - | inside | 0 | ATCG00700.1 | -1.301 | 0.005 |
Step 5: Visualizing the DERs
exportResults()
saves for both strands DERs, not-DERs, segmentation, coverages and log2-FC per-base in formats readable (bedGraph, gff3) by genome viewers like the Integrative Genome Viewer (IGV) (Robison et al. 2011). For IGV, exportResults()
also creates an IGV sessiong in xml format that allows loading all tracks in one click.
The tracks from top to bottom stand for:
- (log2-Cov plus) the mean of coverages on the log scale for the forward strand in both biological conditions of interest. Here, the blue line and the red line stand for wt and pnp1-1 conditions, respectively ;
- (log2-FC plus) the per-base \(\log_2\)-FC between pnp1-1 (numerator) and wt (denominator) for the forward strand. The straight horizontal line is the zero indicator. When the per-base \(\log_2\)-FC is above or under the zero indicator line, it suggests an up-regulation and an down-regulation in pnp1-1 compare to wt, respectively. Over the per-base \(\log_2\)-FC is displayed the changepoints positions (vertical blue lines) and the mean of each segment (horizontal blue lines that connect two changepoints) ;
- (DiffSegR plus) the results of the DEA on the forward strand. Up-regulated regions appear in green. Down-regulated regions appear in purple. Not-DERs appear in grey ;
- (annotations) The features provided by the user for interpetations.
Symmetrically, the rest of the tracks correspond to the same data on the reverse strand.
ChrC_path <- system.file(
"extdata",
"TAIR10_ChrC.fa",
package = "DiffSegR"
)
exportResults(
data = data,
dds = dds,
outputDirectory = working_directory,
genome = ChrC_path,
annotation = exons_path,
select = c("log2FoldChange", "pvalue", "padj")
)
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#> dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#> grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
References
Castandet B, Hotto AM, Fei Z, Stern DB. Strand-specific RNA sequencing uncovers chloroplast ribonuclease functions. FEBS Lett 587(18), 3096-101 (2013). doi:10.1016/j.febslet.2013.08.004.
Hotto AM, Castandet B, Gilet L, Higdon A, Condon C, Stern DB. Arabidopsis chloroplast mini-ribonuclease III participates in rRNA maturation and intron recycling. Plant Cell 27(3), 724-40 (2015). doi:10.1105/tpc.114.134452.
James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011). doi:10.1038/nbt.1754.