Skip to contents

Before using this tutorial, we recommend that you read through our introductory tutorial.

Overview

In this tutorial we explore more advanced uses of DiffSegR:

  1. How to explore a range of segmentation models ?
  2. How to select a subset of segments whose absolute \(\log_2\)-FC passes a threshold while keeping statistical guarantees on the true discovery proportion in this subset ? (Post hoc inference)

Exploring more segmentations

After visual inspection, one can disagree with the segmentation of the \(\log_2\)-FC proposed by DiffSegR. It can happen when spurious changepoints, resulting from an over-segmentation of the \(\log_2\)-FC, seems to be added. In this case it is often useful to explore several different segmentations in order to choose a more suitable hyperparameter \(\alpha\) value.

To explore all reachable segmentations between 1 to Kmax segments, one can pass segmentNeighborhood = TRUE and specify the value of Kmax to segmentation(). This step can be time consuming. A grid search approach is also implemented in DiffSegR (see gridSearch and alphas parameters of segmentation()).

SExp <- segmentation(
  data                   = data, 
  nbThreadsFeatureCounts = nb_threads,
  outputDirectory        = working_directory,
  segmentNeighborhood    = TRUE,
  Kmax = 50,
  alpha = 4
)
#> changepoint detection on ChrC 71950 78500 plus ...
#>   > exploring models using segment neighborhood ...
#>   > model selection ...
#>   > report available in ./DIFFSEGR_TEST/all_models_plus.pdf
#> changepoint detection on ChrC 71950 78500 minus ...
#>   > exploring models using segment neighborhood ...
#>   > model selection ...
#>   > report available in ./DIFFSEGR_TEST/all_models_minus.pdf
#> finished in 0.618608951568604 secs
#> counting ...
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.10.4
#> 
#> //========================== featureCounts setting ===========================\\
#> ||                                                                            ||
#> ||             Input files : 4 BAM files                                      ||
#> ||                                                                            ||
#> ||                           pnp1_1_1_ChrC_71950_78500.bam                    ||
#> ||                           pnp1_1_2_ChrC_71950_78500.bam                    ||
#> ||                           wt_1_ChrC_71950_78500.bam                        ||
#> ||                           wt_2_ChrC_71950_78500.bam                        ||
#> ||                                                                            ||
#> ||              Paired-end : no                                               ||
#> ||        Count read pairs : no                                               ||
#> ||              Annotation : R data.frame                                     ||
#> ||      Dir for temp files : .                                                ||
#> ||                 Threads : 4                                                ||
#> ||                   Level : meta-feature level                               ||
#> ||      Multimapping reads : counted                                          ||
#> || Multi-overlapping reads : counted                                          ||
#> ||   Min overlapping bases : 1                                                ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Load annotation file .Rsubread_UserProvidedAnnotation_pid49626 ...         ||
#> ||    Features : 28                                                           ||
#> ||    Meta-features : 28                                                      ||
#> ||    Chromosomes/contigs : 1                                                 ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_1_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 24260                                                ||
#> ||    Successfully assigned alignments : 24260 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file pnp1_1_2_ChrC_71950_78500.bam...                          ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 21100                                                ||
#> ||    Successfully assigned alignments : 21100 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_1_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 17533                                                ||
#> ||    Successfully assigned alignments : 17533 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Process BAM file wt_2_ChrC_71950_78500.bam...                              ||
#> ||    Strand specific : stranded                                              ||
#> ||    Single-end reads are included.                                          ||
#> ||    Total alignments : 14070                                                ||
#> ||    Successfully assigned alignments : 14070 (100.0%)                       ||
#> ||    Running time : 0.00 minutes                                             ||
#> ||                                                                            ||
#> || Write the final count table.                                               ||
#> || Write the read assignment summary.                                         ||
#> ||                                                                            ||
#> \\============================================================================//
#> finished in 0.126221656799316 secs

The figures showing the path of all reachable segmentations up to Kmax segments of the log2-FC on reverse and forward strands are accessible in the working directory under the name all_models_minus.rds and all_models_plus.rds.

A good practice is to select a robust segmentation. Visually it is equivalent to pick the elbow of this curve. In this example we can select \(\log(\alpha)=2.1\) or \(\alpha=4.28\) symbolized by the red line. Setting alpha=4 in segmentation() leads to 11 segments on the reverse strand.

Path of all reachable segmentations up to 50 segments of the log2-FC on reverse strand. Corresponding alpha values on the log scale can be read on X-axis.

Path of all reachable segmentations up to 50 segments of the log2-FC on reverse strand. Corresponding alpha values on the log scale can be read on X-axis.

Post hoc inference

If the result of the DEA does not correspond to what the user expected, they may tend to snoop in the data and eventually select a subset of segments of interest. For instance it is a common usage to define a threshold on the absolute \(\log_2\)-FC per-segment. In this case DiffSegR will declare differentially expressed the largest set of segments with an absolute \(\log_2\)-FC > threshold and a true discovery proportion lower bound choose by the user (postHoc_tdpLowerBound). This post-hoc lower bound is obtained by controlling the joint error rate (JER) using the Simes family of thresholds (Blanchard et al. 2020).

To use post hoc inference one have to specify :

  1. its decision rule, i.e a predicate function that returns a single TRUE or FALSE if the segment on which it is applied meets the conditions defined in the predicate. The criteria used in the predicate have to be defined for each segment in SummarizedExperiment::mcols(dds). Here, predicate = function(x) 2^abs(x$log2FoldChange)>2 ;
  2. the minimum true discovery proportion among the subset of segments declared differentially expressed. Here, postHoc_tdpLowerBound = 0.05 ;
  3. the significance level of the test procedure. Here, postHoc_significanceLevel = 0.05.
dds <- dea(
  data                      = data,
  SExp                      = SExp,
  design                    = ~ condition,
  sizeFactors               = rep(1,4),
  predicate                 = function(x) 2^abs(x$log2FoldChange)>2,
  postHoc_significanceLevel = 0.05,
  postHoc_tdpLowerBound     = 0.95
)
#>   > differential expression analysis ...
#> renaming the first element in assays to 'counts'
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#>   >  multiple testing correction ...
#>   >  normalize denoised log2-FC and coverage ...
#>   >  postHoc selection ...
#> 21 rejected hypotheses with at least 0.952380952380952 of true positives.

#- check fold-change of differentially expressed segments ---------------------#
DERs <- dds[SummarizedExperiment::mcols(dds)$rejectedHypotheses,]
2^abs(SummarizedExperiment::mcols(DERs)$log2FoldChange)
#>  [1]  2.122779  2.034771  2.320207  3.351678  3.807408  4.991526 24.968253
#>  [8]  8.352940  2.833332  7.733332 14.499984 15.714271 17.666617 19.499909
#> [15]  3.318839  3.119999  3.903844  3.730767  4.062498  3.686272  4.174997

References

Blanchard, G., Neuvial, P. and Roquain, E. Post hoc confidence bounds on false positives using reference families. Ann. Statist 48(3), 1281-1303 (2020). doi.org/10.1214/19-AOS1847.