Introductions
In this package, we provide e-value for four DMR (differentially methylated region) detection tools (MethylKit, Metilene, BiSeq and DMRfinder) and general purpose. For DMR
(methylKit
, biseq
, DMRfinder
or metilene
), the met-evalue calculation is conducted by the metevalue.[DMR]
function.
DMR | Method | methyrate Example | Method.output Example |
---|---|---|---|
MethylKit | metevalue.methylKit |
data(demo_methylkit_methyrate) |
data(demo_methylkit_met_all) |
BiSeq | metevalue.biseq |
data(demo_biseq_methyrate) |
data(demo_biseq_DMR) |
DMRfinder | metevalue.DMRfinder |
data(demo_DMRfinder_rate_combine) |
data(demo_DMRfinder_DMRs) |
Metilene | metevalue.metilene |
data(demo_metilene_input) |
data(demo_metilene_out) |
Two routines are supported to calculate the combined e-value:
- Call by files: Here the
files
include the outputs of givenDMR
packages and its corresponding e-value of each regions; - Call by R data frames: Here the
R data frames
are correspondingdata.frame
objects.
Details are as follows.
Call by files
We design the metevalue.[DMR]
function to accept similar parameter patterns:
metevalue.[DMR](# Output file name of [DMR]
methyrate, # Output file name of [DMR] with e-value of each region
[DMR].output, adjust.methods = "BH", # Adjust methods of e-value
sep = "\t", # seperator, default is the TAB key.
bheader = FALSE # A logical value indicating whether the [DMR].output file
# contains the names of the variables as its first line
)
Here [DMR]
could be one of methylKit
, biseq
, DMRfinder
or metilene
.
Call by R data frames
We provide the evalue_buildin_var_fmt_nm
and varevalue.metilene
function to handle the general DMR e-value calculation in DNA methylation studies:
# Here `[DMR]` coudle be one of `methylKit`, `biseq`, `DMRfinder` or `metilene`.
= "[DMR]"
method_in_use = evalue_buildin_var_fmt_nm(
result # Data frame of the methylation rate
methyrate, # Data frame of output data corresponding to the
DMR_evalue_output, # "method" option
method = method_in_use) # DMR: "metilene", "biseq", "DMRfinder" or "methylKit"
= list(a = result$a,
result b = result$b,
a_b = evalue_buildin_sql(result$a, result$b, method = method_in_use))
= varevalue.metilene(result$a, result$b, result$a_b) result
Replace [DMR]
to one of methylKit
, biseq
, DMRfinder
or metilene
accordingly.
Notice: for different
[DMR]
, thedata.frame
schemas are different!!! Check the R help document for details. Check the Demo data section for details. # Examples
MethylKit Example
methylKit is a R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing.
Currently, metevalue
package supports the e-value analytics of the methylKit
output file.
library(metevalue)
####Simulation Data ####
set.seed(1234)
# methylKit is a Bioconductor package
library(methylKit)
=list( system.file("extdata",
file.list"test1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"test2.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control2.myCpG.txt", package = "methylKit") )
# read the files to a methylRawList object: myobj
=methRead(file.list,
myobjsample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG"
)
=unite(myobj, destrand=FALSE)
meth<- getData(meth)[,seq(6,ncol(meth),3)]
meth.C <- getData(meth)[,seq(7,ncol(meth),3)]
meth.T <- meth.C/(meth.C + meth.T)
mr = getData(meth)[,1:2]
chr_pos = data.frame(chr_pos,mr)
methyrate names(methyrate) = c('chr', 'pos', rep('g1',2), rep('g2',2))
<-tileMethylCounts(myobj)
region<-unite(region,destrand=F)
meth<-calculateDiffMeth(meth)
myDiff<-getMethylDiff(myDiff,type="all")
met_all
= tempfile(c("rate_combine", "methylKit_DMR_raw"))
example_tempfiles tempdir()
write.table(methyrate, file=example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table (met_all, file=example_tempfiles[2], sep ="\t", row.names =F, col.names =T, quote =F)
evalue.methylKit
function could be used to tackle the problem.
= metevalue.methylKit(example_tempfiles[1], example_tempfiles[2], bheader = T)
result str(result)
Alternatively, one could use the build-in functions to derive functions which avoid the file operation:
= evalue_buildin_var_fmt_nm(methyrate, met_all, method="methylKit")
result = list(a = result$a,
result b = result$b,
a_b = evalue_buildin_sql(result$a, result$b, method="methylKit"))
= varevalue.metilene(result$a, result$b, result$a_b)
result str(result)
BiSeq Example
First, we load the methylation data at CpG site levels from ‘BiSeq’ package. Then we clustered CpG sites into DMRs using ‘BiSeq’.
library(BiSeq)
library(dplyr)
data(rrbs)
<- rawToRel(rrbs)
rrbs.rel <- methLevel(rrbs.rel)
methyrate <- data.frame(methyrate)
methyrate = cbind(rows = as.numeric(row.names(methyrate)), methyrate)
methyrateq = data.frame(rows = as.numeric(row.names(methyrate)), rowRanges(rrbs))
methypos = left_join(methypos, methyrateq)
methyrate = methyrate[,c(2,3,7:16)]
methyrate names(methyrate) <- c('chr','pos',rep('g1',5),rep('g2',5))
<- clusterSites(object = rrbs,perc.samples = 3/4,min.sites = 20,max.dist = 100)
rrbs.clust.unlim
clusterSitesToGR(rrbs.clust.unlim)
<- totalReads(rrbs.clust.unlim) > 0
ind.cov
<- quantile(totalReads(rrbs.clust.unlim)[ind.cov])
quant <- limitCov(rrbs.clust.unlim, maxCov = quant)
rrbs.clust.lim <- predictMeth(object = rrbs.clust.lim)
predictedMeth
<- predictedMeth[, colData(predictedMeth)$group == "test"]
test<- predictedMeth[, colData(predictedMeth)$group == "control"]
control <- rowMeans(methLevel(test))
mean.test <- rowMeans(methLevel(control))
mean.control
<- betaRegression(formula = ~group,link = "probit",object = predictedMeth,type = "BR")
betaResults <- makeVariogram(betaResults)
vario <- smoothVariogram(vario, sill = 0.9)
vario.sm
<- estLocCor(vario.sm)
locCor <- testClusters(locCor)
clusters.rej <- trimClusters(clusters.rej)
clusters.trimmed <- findDMRs(clusters.trimmed,max.dist = 100,diff.dir = TRUE)
DMRs
= tempfile(c('rate_combine', 'BiSeq_DMR'))
example_tempfiles write.table(methyrate, example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table(DMRs, example_tempfiles[2], quote=F, row.names = F,col.names = F, sep = '\t')
Finally, we added E-values and adjusted E-values as additional columns to the output file of ‘BiSeq’.BiSeq_evalue
function could be used to tackle the problem.
<- data.frame(DMRs)
hh = evalue_buildin_var_fmt_nm(rate_combine, hh, method="biseq")
result = list(a = result$a, b = result$b,
result a_b = evalue_buildin_sql(result$a, result$b,method="biseq"))
= varevalue.metilene(result$a, result$b, result$a_b)
result str(result)
DMRfinder Example
Given the input file
rate_combine_DMRfinder
: a file containing methylation rates at each CpG siteDMRfinder_DMR
: the output file from ‘DMRfinder’
<- read.table("rate_combine_DMRfinder", header = T)
rate_combine head(rate_combine)
<- read.table("DMRfinder_DMR", header = T)
DMRs head(DMRs)
Adding E-values and adjusted E-values as additional columns to file ‘DMRfinder_DMR’
<- evalue.DMRfinder('rate_combine_DMRfinder', 'DMRfinder_DMR')
result head(result)
Alternatively, function varevalue.metilene
can also provides e-value and adjusted e-value.
= evalue_buildin_var_fmt_nm(rate_combine, DMRs, method="DMRfinder")
result = list(a = result$a,
result b = result$b,
a_b = evalue_buildin_sql(result$a, result$b, method="DMRfinder"))
= varevalue.metilene(result$a, result$b, result$a_b)
result head(result)
Metilene Example
Given
metilene.input
: the input file ofMetilene
containing methylation rates at each CpG sitemetilene.out
: the output file ofMetilene
<- read.table("metilene.input", header = T)
input head(input)
<- read.table("metilene.out", header = F)
out head(out)
Adding E-values and adjusted E-values as additional columns to metilene.out
<- evalue.metilene('metilene.input', 'metilene.out')
result head(result)
Alternatively, function varevalue.metilene
can also provides e-value and adjusted e-value.
= evalue_buildin_var_fmt_nm(input, out, method="metilene")
result = list(a = result$a,
result b = result$b,
a_b = evalue_buildin_sql(result$a, result$b, method="metilene"))
= varevalue.metilene(result$a, result$b, result$a_b)
result head(result)
Misc
Demo data
Demo data for different metevalue.[DMR]
functions are listed in the section.
Input Data Examples: MethylKit
methyrate Example
chr | pos | g1 | g1 | g2 | g2 |
---|---|---|---|---|---|
chr21 | 9853296 | 0.5882353 | 0.8048048 | 0.8888889 | 0.8632911 |
chr21 | 9853326 | 0.7058824 | 0.7591463 | 0.8750000 | 0.7493404 |
- chr: string Chromosome
- pos: int Position
- g1~g2: methylation rate data in groups, repeat 8 times. Notice that there are “NaN” within the feature columns.
methylKit.output Example
chr | start | end | strand | pvalue | qvalue | meth.diff |
---|---|---|---|---|---|---|
chr21 | 9927001 | 9928000 | * | 0 | 0 | -34.07557 |
chr21 | 9944001 | 9945000 | * | 0 | 0 | -40.19089 |
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- strand: Strand
- pvalue: The adjusted p-value based on BH method in MWU-test
- qvalue: cutoff for qvalue of differential methylation statistic
- methyl.diff: The difference between the group means of methylation level
Input Data Examples: BiSeq
methyrate Example
chr | pos | g1 | g1 | g1 | g1 | g1 | g2 | g2 | g2 | g2 | g2 |
---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 870425 | 0.8205128 | 1 | 0.7 | NaN | NaN | 0.3125 | 0.7419355 | 0.2461538 | 0.1794872 | 0.2413793 |
chr1 | 870443 | 0.8461538 | 1 | 0.7 | NaN | NaN | 0.3750 | 0.3225806 | 0.2923077 | 0.0512821 | 0.2413793 |
- chr: Chromosome
- pos: int Position
- g1~g2: methylation rate data in groups
biseq.output Example
seqnames | start | end | width | strand | median.p | median.meth.group1 | median.meth.group2 | median.meth.diff |
---|---|---|---|---|---|---|---|---|
chr1 | 872369 | 872616 | 248 | * | 0.0753559 | 0.9385462 | 0.8666990 | 0.0710524 |
chr1 | 875227 | 875470 | 244 | * | 0.0000026 | 0.5136315 | 0.1991452 | 0.2942668 |
- seqnames: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- width: The range of the corresponding region
- strand: Strand
- median.p: The median of p-values in the corresponding region
- median.meth.group1 : The median of methylation level for the corresponding segment of group 1
- median.meth.group2 : The median of methylation level for the corresponding segment of group 2
- median.meth.diff: The median of the difference between the methylation level
Input Data Examples: DMRfinder
methyrate Example
chr | pos | g1 | g1.1 | g2 | g2.1 |
---|---|---|---|---|---|
chr1 | 202833315 | 0 | 0.0000000 | 0 | 0 |
chr1 | 202833323 | 1 | 0.8095238 | 1 | 1 |
- chr: Chromosome
- pos: int Position
- g1~g2: methylation rate data in groups
DMRfinder.output Example
chr | start | end | CpG | Control.mu | Exptl.mu | Control.Exptl.diff | Control.Exptl.pval |
---|---|---|---|---|---|---|---|
chr8 | 25164078 | 25164102 | 3 | 0.9241646 | 0.7803819 | -0.1437827 | 0.0333849 |
chr21 | 9437432 | 9437538 | 14 | 0.7216685 | 0.1215506 | -0.6001179 | 0.0000000 |
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- CpG: The number of CpG sites within the corresponding region
- Control.mu: The average methylation rate in control group
- Expt1.mu: The average methylation rate in experiment group
- Control.Expt1.diff: The methylation difference between control and experiment groups
- Control.Expt1.pval: P-value based on Wald-test
Input Data Examples: DMRfinder
methyrate Example
chrom | pos | g1 | g1.1 | g1.2 | g1.3 | g1.4 | g1.5 | g1.6 | g1.7 | g2 | g2.1 | g2.2 | g2.3 | g2.4 | g2.5 | g2.6 | g2.7 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr21 | 9437433 | 0.9285714 | NA | 0.7222222 | 0.75 | 1 | 0.6666667 | 1 | 0.8695652 | 0.0000000 | 0 | 0 | 0 | 0.0000000 | 0.0 | NA | 0.00 |
chr21 | 9437445 | 1.0000000 | NA | 0.9444444 | 0.75 | 1 | 0.6666667 | 0 | 0.8695652 | 0.6111111 | 0 | 0 | 0 | 0.7333333 | 0.6 | NA | 0.75 |
- chr: Chromosome
- pos: int Position
- g1~g2: methylation rate data in groups
metilene.output Example
chr | start | end | q-value | methyl.diff | CpGs | p | p2 | m1 | m2 |
---|---|---|---|---|---|---|---|---|---|
chr21 | 9437432 | 9437540 | 0 | 0.610989 | 26 | 0 | 0 | 0.73705 | 0.12606 |
chr21 | 9708982 | 9709189 | 0 | 0.475630 | 28 | 0 | 0 | 0.58862 | 0.11299 |
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p : p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2