Summaries

Dan MacLean

2018-03-21

The atacr package provides functions for quick summaries of your data. For an overview, call summary():

summary(counts)
## ATAC-seq experiment of 2 treatments in 6 samples
##  Treatments: control,treatment 
##  Samples: control_001,control_002,control_003,treatment_001,treatment_002,treatment_003 
##  Bait regions used: 50 
##  Total Windows: 100 
##  
##  On/Off target read counts:
##           sample off_target on_target percent_on_target
## 1   control_001          0   1472324               100
## 2   control_002          0   1706695               100
## 3   control_003          0   2025406               100
## 4 treatment_001          0   1643446               100
## 5 treatment_002          0   1406754               100
## 6 treatment_003          0   1443139               100 
##  Quantiles: 
##  $bait_windows
##     control_001 control_002 control_003 treatment_001 treatment_002
## 1%        14.45        1.47       17.43         37.17         18.92
## 5%        26.80        6.25       35.30         66.85         35.60
## 95%    94587.95   129606.45   142283.75     130083.65     118031.55
## 99%   147815.24   207795.95   346273.00     212651.53     172249.70
##     treatment_003
## 1%          20.68
## 5%          45.85
## 95%     141042.80
## 99%     162228.25
## 
## $non_bait_windows
##     control_001 control_002 control_003 treatment_001 treatment_002
## 1%           NA          NA          NA            NA            NA
## 5%           NA          NA          NA            NA            NA
## 95%          NA          NA          NA            NA            NA
## 99%          NA          NA          NA            NA            NA
##     treatment_003
## 1%             NA
## 5%             NA
## 95%            NA
## 99%            NA
##  
##  Read depths:
##           sample off_target on_target
## 1   control_001         NA  29446.48
## 2   control_002         NA  34133.90
## 3   control_003         NA  40508.12
## 4 treatment_001         NA  32868.92
## 5 treatment_002         NA  28135.08
## 6 treatment_003         NA  28862.78

The output shows the on- and off-target hit counts, the count quantiles, and the mean read depths for each sample.
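As a rough check on how these figures relate to each other, the sketch below recomputes two columns from the on-target totals printed in the summary() output above. Dividing each on-target total by 50 (the number of bait regions reported) reproduces the read depth column, but whether atacr computes it this way internally is an assumption. The vector `example_counts` is made up for illustration, and only shows how the four quantile levels can be requested in base R.

```r
# On-target and off-target totals copied from the summary() output above.
on_target <- c(
  control_001   = 1472324,
  control_002   = 1706695,
  control_003   = 2025406,
  treatment_001 = 1643446,
  treatment_002 = 1406754,
  treatment_003 = 1443139
)
off_target <- setNames(rep(0, length(on_target)), names(on_target))

# Share of reads that fall in bait windows.
percent_on_target <- 100 * on_target / (on_target + off_target)

# Assumption: mean depth = on-target total / number of bait windows (50).
n_bait_windows <- 50
mean_depth <- on_target / n_bait_windows
print(round(mean_depth, 2))

# The same four quantile levels, requested on made-up counts (hypothetical data).
example_counts <- c(3, 18, 250, 4100, 52000, 131000)
q <- quantile(example_counts, probs = c(0.01, 0.05, 0.95, 0.99))
print(names(q))
```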

The count distributions across the bait and non-bait windows by sample can be plotted quickly with coverage_summary().

coverage_summary(counts)
## Picking joint bandwidth of 0.358

Diagnostic plots

You can inspect the coverage in a given subset of the data, for example the raw (unlogged) counts in the bait windows, with plot_counts().

plot_counts(counts, which = "bait_windows", log10 = FALSE)
## Picking joint bandwidth of 11200

Low counts in windows

The number of windows below a coverage threshold in each sample can be seen with windows_below_coverage_threshold_plot(). You can set the lower and upper bounds of the threshold range with the from and to arguments.

windows_below_coverage_threshold_plot(counts, from = 5, to = 25)
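The quantity behind this kind of plot can be sketched in base R. The example below uses a simulated count matrix (`counts_mat`, hypothetical, not an atacr object) and counts, for each threshold from 5 to 25, how many windows per sample fall strictly below it. Whether atacr treats the threshold as strict or inclusive is an assumption here.

```r
set.seed(1)

# Hypothetical count matrix: 50 windows (rows) by 3 samples (columns).
counts_mat <- matrix(
  rnbinom(150, mu = 20, size = 1),
  nrow = 50,
  dimnames = list(NULL, c("s1", "s2", "s3"))
)

# Thresholds mirroring from = 5, to = 25.
thresholds <- 5:25

# For each threshold, count the windows per sample with fewer reads than it.
# The result has one row per sample and one column per threshold.
below <- sapply(thresholds, function(thr) colSums(counts_mat < thr))
colnames(below) <- thresholds

print(below[, c("5", "15", "25")])
```

Each row of `below` can only stay level or rise as the threshold grows, so a sample whose row climbs much faster than the others has many thinly covered windows.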

MA plots

MA plots of each sample's counts versus the median count across all samples can be displayed with ma_plot(). They help to highlight odd-looking experiments and extreme outliers. By default this uses the bait_windows data, but you can set the which argument to use other subsets, e.g. non_bait_windows.

ma_plot(counts)
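For readers unfamiliar with MA plots, here is a minimal base R sketch of the usual M and A values, taking the per-window median across samples as the reference. The matrix `counts_mat` is simulated, and the exact definitions atacr uses (log base, handling of zero counts) are assumptions rather than something taken from the package.

```r
set.seed(2)

# Hypothetical counts: 50 windows by 4 samples; +1 avoids log2(0).
counts_mat <- matrix(
  rpois(200, lambda = 50) + 1,
  nrow = 50,
  dimnames = list(NULL, paste0("s", 1:4))
)

# Reference profile: the median count of each window across all samples.
ref <- apply(counts_mat, 1, median)

log_counts <- log2(counts_mat)
log_ref <- log2(ref)

# M: log ratio of sample to reference; A: mean of the two log values.
# log_ref has one entry per window and is recycled down each column.
M <- log_counts - log_ref
A <- (log_counts + log_ref) / 2

print(round(head(cbind(A = A[, "s1"], M = M[, "s1"])), 3))
```

Calling `plot(A[, "s1"], M[, "s1"])` would then draw the MA plot for one sample; points far from M = 0 are windows where that sample departs from the median profile.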

Per chromosome plots

These are bar charts of coverage at the windows across the chromosomes (seqnames) provided in the data.

plot_count_by_chromosome(counts)
## Warning: Expected 3 pieces. Additional pieces discarded in 300 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

The simulated data here are spread randomly across the chromosomes.
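A related by-chromosome view can be sketched in base R by grouping window counts on their chromosome label. The `seqnames` vector and `counts_mat` matrix below are simulated stand-ins, and the per-chromosome totals are only an illustrative summary, not necessarily what plot_count_by_chromosome() draws.

```r
set.seed(3)

# Hypothetical windows: one chromosome label per window, counts per sample.
seqnames <- sample(paste0("chr", 1:5), 100, replace = TRUE)
counts_mat <- matrix(
  rpois(300, lambda = 30),
  nrow = 100,
  dimnames = list(NULL, c("s1", "s2", "s3"))
)

# Per-window counts of one sample, grouped by chromosome.
s1_by_chrom <- split(counts_mat[, "s1"], seqnames)

# Total counts per chromosome for every sample.
per_chrom <- rowsum(counts_mat, group = seqnames)

print(per_chrom)
```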

Sample comparison plots

A matrix of correlations between counts in the samples can be plotted with the sample_correlation_plot() function. In this plot the colour and size scale of the dots represents the Pearson correlation coefficient. Pairwise comparisons with p < 0.05 have a blank space.

sample_correlation_plot(counts)
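The numbers underlying such a plot can be reproduced with base R. This is a sketch on simulated counts (`counts_mat` is hypothetical), using cor() for the Pearson matrix and cor.test() for the p-value of a single pair; it does not claim to be the code path atacr itself follows.

```r
set.seed(4)

# Hypothetical counts: s1 and s2 share a common signal, s3 is independent.
shared <- rpois(50, lambda = 40)
counts_mat <- cbind(
  s1 = shared + rpois(50, lambda = 5),
  s2 = shared + rpois(50, lambda = 5),
  s3 = rpois(50, lambda = 40)
)

# Pearson correlation between every pair of samples.
r <- cor(counts_mat, method = "pearson")

# p-value for one pairwise comparison.
p_s1_s2 <- cor.test(counts_mat[, "s1"], counts_mat[, "s2"],
                    method = "pearson")$p.value

print(round(r, 3))
```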

A PCA plot that clusters the most similar samples can also be generated using the sample_pca_plot() function.

sample_pca_plot(counts)
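To show what a sample-level PCA involves, the sketch below runs prcomp() from base R on a simulated count matrix. The log2(count + 1) transform and the choice to centre without scaling are assumptions made for this example, and `counts_mat` is hypothetical data rather than an atacr object.

```r
set.seed(5)

# Hypothetical counts: 100 windows by 6 samples, named like the example data.
sample_names <- paste0(rep(c("control_", "treatment_"), each = 3), "00", 1:3)
counts_mat <- matrix(
  rpois(600, lambda = 50),
  nrow = 100,
  dimnames = list(NULL, sample_names)
)

# Transpose so that samples are rows; each sample then gets a PC score.
pca <- prcomp(t(log2(counts_mat + 1)), center = TRUE, scale. = FALSE)

# Coordinates of each sample on the first two principal components.
scores <- pca$x[, 1:2]

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(scores, 3))
```

Plotting the two columns of `scores` against each other gives the familiar PCA scatter, in which replicates of the same treatment would be expected to sit close together in real data.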