Title: | Variable Length Markov Chains ('VLMC') Models |
---|---|
Description: | Functions, Classes & Methods for estimation, prediction, and simulation (bootstrap) of Variable Length Markov Chain ('VLMC') Models. |
Authors: | Martin Maechler [aut, cre] |
Maintainer: | Martin Maechler <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.4-4 |
Built: | 2024-10-31 16:32:55 UTC |
Source: | https://github.com/cran/VLMC |
Simple conversion functions for discrete data (e.g., time series),
between 0:k
integers and single letter characters.
alpha2int(x, alpha) int2alpha(i, alpha)
alpha2int(x, alpha) int2alpha(i, alpha)
x |
character vector of single letters. |
alpha |
the alphabet, as one character string. |
i |
integer vector of numbers in |
alpha2int(x,*)
returns an integer
vector of the same
length as x
, consisting of values from 0:k
where
k + 1
is the length of the alphabet, nchar(alpha)
.
int2alpha(i,*)
returns a vector of single letter
character
of the same length as i
.
vlmc
, and
int2char()
and its inverse, char2int()
,
both working with multi-character strings instead of vectors of single
characters; further, alphabet
.
alphabet <- "abcdefghijk" (ch <- sample(letters[1:10], 30, replace = TRUE)) (ic <- alpha2int(ch, alphabet)) stopifnot(int2alpha(ic, alphabet) == ch)
alphabet <- "abcdefghijk" (ch <- sample(letters[1:10], 30, replace = TRUE)) (ic <- alpha2int(ch, alphabet)) stopifnot(int2alpha(ic, alphabet) == ch)
Return the alphabet in use, as a vector of “characters”.
alphabet(x, ...) ## S3 method for class 'vlmc' alphabet(x, ...)
alphabet(x, ...) ## S3 method for class 'vlmc' alphabet(x, ...)
x |
any R object, currently only available for
|
... |
potential further arguments passed to and from methods. |
a character
vector, say r
, with length equal to
the alphabet size. Currently, typically all r[i]
are strings
of just one character.
alpha2int
for conversion to and from integer
codings.
data(bnrf1) vb <- vlmc(bnrf1EB, cutoff = 5) alphabet(vb) # |--> "a" "c" "g" "t"
data(bnrf1) vb <- vlmc(bnrf1EB, cutoff = 5) alphabet(vb) # |--> "a" "c" "g" "t"
This is a method for the as.dendrogram
generic function
## S3 method for class 'vlmc' as.dendrogram(object, ...)
## S3 method for class 'vlmc' as.dendrogram(object, ...)
object |
a |
... |
further arguments passed to and from methods. |
An object of class dendrogram
, i.e. a nested list
described on that page.
as.dendrogram
, plot.dendrogram
.
data(presidents) dpr <- factor(cut(presidents, c(0,45,70,100)), exclude=NULL)# NA = 4th level (vlmc.pres <- vlmc(dpr)) draw(vlmc.pres) (dv.dpr <- as.dendrogram(vlmc.pres)) str(dv.dpr) str(unclass(dv.dpr)) plot(dv.dpr, type ="tr", nodePar = list(pch=c(1,16), cex = 1.5)) ## Artificial example f1 <- c(1,0,0,0) ; f2 <- rep(1:0, 2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = 0.1)) (dvlmc <- as.dendrogram(vlmc.dt1c01)) str(dvlmc) ## not so useful: plot(dvlmc, nodePar= list(pch=c(1,16))) ## complete disaster: plot(dvlmc, type ="tr", nodePar= list(pch=c(1,16))) ## but this is not (yet) so much better (want the same angles to left ## and right!! plot(dvlmc, type ="tr", nodePar = list(pch=c(1,16)), center=TRUE, main = format(vlmc.dt1c01$call)) mtext(paste("dt1 =", gsub(" ","",deparse(dt1,width=100))))
data(presidents) dpr <- factor(cut(presidents, c(0,45,70,100)), exclude=NULL)# NA = 4th level (vlmc.pres <- vlmc(dpr)) draw(vlmc.pres) (dv.dpr <- as.dendrogram(vlmc.pres)) str(dv.dpr) str(unclass(dv.dpr)) plot(dv.dpr, type ="tr", nodePar = list(pch=c(1,16), cex = 1.5)) ## Artificial example f1 <- c(1,0,0,0) ; f2 <- rep(1:0, 2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = 0.1)) (dvlmc <- as.dendrogram(vlmc.dt1c01)) str(dvlmc) ## not so useful: plot(dvlmc, nodePar= list(pch=c(1,16))) ## complete disaster: plot(dvlmc, type ="tr", nodePar= list(pch=c(1,16))) ## but this is not (yet) so much better (want the same angles to left ## and right!! plot(dvlmc, type ="tr", nodePar = list(pch=c(1,16)), center=TRUE, main = format(vlmc.dt1c01$call)) mtext(paste("dt1 =", gsub(" ","",deparse(dt1,width=100))))
Two gene DNA data “discrete time series”,
bnrf1EB
the BNRF1 gene from the Epstein-Barr virus,
bnrf1HV
the BNRF1 gene from the herpes virus.
data(bnrf1)
data(bnrf1)
The EB sequence is of length
3954, whereas the HV has
3741 nucleotides.
Both are R factor
s with the four levels
c("a","c","g","t")
.
Martin Maechler (original packaging for R).
See the references; data used to be at
https://anson.ucdavis.edu/~shumway/tsa.html
, and are now available
in CRAN package astsa, e.g., bnrf1ebv
.
Shumway, R. and Stoffer, D. (2000) Time Series Analysis and its Applications. Springer Texts in Statistics.
data(bnrf1) bnrf1EB[1:500] table(bnrf1EB) table(bnrf1HV) n <- length(bnrf1HV) table(t = bnrf1HV[-1], "t-1" = bnrf1HV[-n]) plot(as.integer(bnrf1EB[1:500]), type = "b") ## Simplistic gene matching: percent.eq <- sapply(0:200, function(i) 100 * sum(bnrf1EB[(1+i):(n+i)] == bnrf1HV))/n plot.ts(percent.eq)
data(bnrf1) bnrf1EB[1:500] table(bnrf1EB) table(bnrf1HV) n <- length(bnrf1HV) table(t = bnrf1HV[-1], "t-1" = bnrf1HV[-n]) plot(as.integer(bnrf1EB[1:500]), type = "b") ## Simplistic gene matching: percent.eq <- sapply(0:200, function(i) 100 * sum(bnrf1EB[(1+i):(n+i)] == bnrf1HV))/n plot.ts(percent.eq)
Compute the Deviance, i.e., - 2 log[likelihood(*)] of a fitted VLMC object. The log-likelihood is also known as “entropy”.
## S3 method for class 'vlmc' deviance(object, ...)
## S3 method for class 'vlmc' deviance(object, ...)
object |
typically the result of |
... |
possibly further arguments (none at the moment). |
A number, the deviance, i.e., .
where the log.likelihood is really what we currently have
as
entropy()
.
Martin Maechler
example(vlmc) deviance(vlmc.pres) devianceR <- function(object) { dn <- dimnames(pr <- predict(object)) -2 * sum(log(pr[cbind(2:nrow(pr), match(dn[[1]][-1], dn[[2]]))])) } all.equal(deviance(vlmc.pres), devianceR(vlmc.pres), tol = 1e-14)
example(vlmc) deviance(vlmc.pres) devianceR <- function(object) { dn <- dimnames(pr <- predict(object)) -2 * sum(log(pr[cbind(2:nrow(pr), match(dn[[1]][-1], dn[[2]]))])) } all.equal(deviance(vlmc.pres), devianceR(vlmc.pres), tol = 1e-14)
Draws a vlmc
object, typically the result of
vlmc(.)
, to the R console,
using one line per node.
draw(x, ...) ## S3 method for class 'vlmc' draw(x, kind = 3, flag = TRUE, show.hidden = 0, cumulative = TRUE, delta = cumulative, debug = FALSE, ...)
draw(x, ...) ## S3 method for class 'vlmc' draw(x, kind = 3, flag = TRUE, show.hidden = 0, cumulative = TRUE, delta = cumulative, debug = FALSE, ...)
x |
typically the result of |
kind |
integer code for the “kind of drawing”, in {0,1,2,3}. |
flag |
logical; .. |
integer code; if not |
|
cumulative |
logical indicating if the cumulative counts should
be shown for nonterminal nodes; the ‘delta’s can only be computed
from the cumulative counts, i.e., |
delta |
logical indicating if delta,
i.e. |
debug |
logical; if |
... |
(potentially more arguments) |
.............
.............
Note that the counts internally are stored “non-cumulatively”,
i.e., as difference counts which is useful for likelihood (ratio)
computations. In the internal C code, the difference counts are
originally computed by the comp_difference()
function after tree
generation. draw(*, cumulative = TRUE)
internally calls the C
function cumulate()
for the cumulative sums.
nothing is returned.
Martin Maechler
vlmc
.
example(vlmc) draw(vlmc.dt1c01) draw(vlmc.dt1c01, flag = FALSE) draw(vlmc.dt1c01, kind = 1) draw(vlmc.dt1) draw(vlmc.dt1, show = 3) draw(vlmc.dt1, cumulative = FALSE)
example(vlmc) draw(vlmc.dt1c01) draw(vlmc.dt1c01, flag = FALSE) draw(vlmc.dt1c01, kind = 1) draw(vlmc.dt1) draw(vlmc.dt1, show = 3) draw(vlmc.dt1, cumulative = FALSE)
Utility for converting a vlmc
state ID to the
corresponding context. Of rare interest to the average user.
id2ctxt(id, m=nchar(alpha), alpha=NULL)
id2ctxt(id, m=nchar(alpha), alpha=NULL)
id |
integer, a context ID such as optionally returned by
|
m |
integer, the alphabet length. Defaults to
|
alpha |
alphabet string |
a list (if alpha
is not specified) or character vector of the
same length as id
, giving the
context (as integer vector or single string) of the corresponding id
predict.vlmc(*, type = "ID")
.
id2ctxt(c(2,3,5,9), alpha = "Ab") str(id2ctxt(c(2,3,5,9), 2))
id2ctxt(c(2,3,5,9), alpha = "Ab") str(id2ctxt(c(2,3,5,9), 2))
Simple conversion utilities for character to integer conversion and vice versa.
int2char(i, alpha) char2int(x, alpha)
int2char(i, alpha) char2int(x, alpha)
i |
integer vectors, typically in |
alpha |
character string with several letters, representing the alphabet. |
x |
character string, typically with letters from |
int2char()
gives a string (length 1 character) with as many
characters as length(i)
, by 0-indexing into the alphabet
alpha
.
char2int()
gives an integer vector of length nchar(x)
of integer codes according to alpha
(starting at 0 !).
int2alpha()
(which is used by int2char
)
and its inverse, int2alpha()
, both working with vectors
of single characters instead of multi-character strings.
char2int("vlmc", paste(letters, collapse="")) int2char(c(0:3, 3:1), "abcd") int2char(c(1:0,3,3), "abc") # to eat ;-)
char2int("vlmc", paste(letters, collapse="")) int2char(c(0:3, 3:1), "abcd") int2char(c(1:0,3,3), "abc") # to eat ;-)
Compute the log-likelihood or “entropy” of a fitted
vlmc
object. This is a method for the
generic logLik
.
entropy(object) ## S3 method for class 'vlmc' logLik(object, ...) entropy2(ivlmc1, ivlmc2, alpha.len = ivlmc1[1])
entropy(object) ## S3 method for class 'vlmc' logLik(object, ...) entropy2(ivlmc1, ivlmc2, alpha.len = ivlmc1[1])
object |
typically the result of |
ivlmc1 , ivlmc2
|
two |
alpha.len |
positive integer specifying the alphabet length. |
... |
(potentially more arguments; required by generic) |
The logLik.vlmc()
method computes the log likelihood for a fitted
vlmc
object. entropy
is an alias for
logLik
for reasons of back compatibility.
entropy2
is less clear ... ... [[[ FIXME ]]] ... ...
a negative number, in some contexts typically further divided by
log(x$alpha.len)
.
Note that the logLik
method is used by the default method of
the AIC
generic function (from R version 1.4.x), and
hence provides AIC(object)
for vlmc objects. Also, since vlmc
version 1.3-13, BIC()
works as well.
Martin Maechler
deviance.vlmc
,
vlmc
, draw.vlmc
.
dd <- cumsum(rpois(999, 1.5)) %% 10 (vd <- vlmc(dd)) entropy(vd)# the bare number logLik(vd) logLik(vdL <- vlmc(dd, cutoff = 3)) entropy2(vd $vlmc.vec, vdL$vlmc.vec) ## AIC model selection: f1 <- c(1,0,0,0) # as in example(vlmc) f2 <- rep(1:0,2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) AIC(print(vlmc(dt1))) AIC(print(vlmc(dt1, cutoff = 2.6))) AIC(print(vlmc(dt1, cutoff = 0.4)))# these two differ ``not really'' AIC(print(vlmc(dt1, cutoff = 0.1))) ## Show how to compute it from the fitted conditional probabilities : logLikR <- function(x) { dn <- dimnames(pr <- predict(x)) sum(log(pr[cbind(2:nrow(pr), match(dn[[1]][-1], dn[[2]]))])) } all.equal( logLikR(vd), c(logLik (vd)), tol=1e-10) # TRUE, they do the same ## Compare different ones: [cheap example]: example(draw) for(n in ls()) if(is.vlmc(get(n))) { vv <- get(n) cat(n,":",formatC(logLik(vv) / log(vv$alpha.len), format= "f", wid=10),"\n") }
dd <- cumsum(rpois(999, 1.5)) %% 10 (vd <- vlmc(dd)) entropy(vd)# the bare number logLik(vd) logLik(vdL <- vlmc(dd, cutoff = 3)) entropy2(vd $vlmc.vec, vdL$vlmc.vec) ## AIC model selection: f1 <- c(1,0,0,0) # as in example(vlmc) f2 <- rep(1:0,2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) AIC(print(vlmc(dt1))) AIC(print(vlmc(dt1, cutoff = 2.6))) AIC(print(vlmc(dt1, cutoff = 0.4)))# these two differ ``not really'' AIC(print(vlmc(dt1, cutoff = 0.1))) ## Show how to compute it from the fitted conditional probabilities : logLikR <- function(x) { dn <- dimnames(pr <- predict(x)) sum(log(pr[cbind(2:nrow(pr), match(dn[[1]][-1], dn[[2]]))])) } all.equal( logLikR(vd), c(logLik (vd)), tol=1e-10) # TRUE, they do the same ## Compare different ones: [cheap example]: example(draw) for(n in ls()) if(is.vlmc(get(n))) { vv <- get(n) cat(n,":",formatC(logLik(vv) / log(vv$alpha.len), format= "f", wid=10),"\n") }
Amount of daily rainfall in Melbourne, Australia, 1981-1990, measured in millimeters. The amounts are integers with many zeros and three days of more than 500mm rain.
data(OZrain)
data(OZrain)
A time-series of length 3653 with the amount of daily rainfall in mm.
Because of the two leap years 1984 and '88, we have constructed it
with ts(*, start=1981, frequency=365.25,
end = 1981+ (3653 - 1)/365.25)
.
There must be one extra observation since for the ten years with two leap years, there are only 3652 days. In 61 out of 100 days, there's no rain.
‘rainfall.dat’ in Rob J. Hyndman's Time Series Data Library, currently available at https://pkg.yangzhuoranyang.com/tsdl/
originally, Australian Bureau of Meteorology, https://www.abs.gov.au.
data(OZrain) (n <- length(OZrain)) ## should be 1 more than ISOdate(1990,12,31) - ISOdate(1981, 1,1)## but it's 2 .. has.rain <- OZrain > 0 summary(OZrain[has.rain])# Median = 18, Q3 = 50 table(rain01 <- as.integer(has.rain)) table(rain4c <- cut(OZrain, c(-.1, 0.5, 18.5, 50.1, 1000))) AIC(v1 <- vlmc(rain01))# cutoff = 1.92 AIC(v00 <- vlmc(rain01, cut = 1.4)) AIC(v0 <- vlmc(rain01, cut = 1.5)) hist(OZrain) hist(OZrain, breaks = c(0,1,5,10,50,1000), xlim = c(0,100)) plot(OZrain, main = "Rainfall 1981-1990 in Melbourne") plot(OZrain, log="y", main = "Non-0 Rainfall [LOG scale]") lOZ <- lowess(log10(OZrain[has.rain]), f= .05) lines(time(OZrain)[has.rain], 10^lOZ$y, col = 2, lwd = 2)
data(OZrain) (n <- length(OZrain)) ## should be 1 more than ISOdate(1990,12,31) - ISOdate(1981, 1,1)## but it's 2 .. has.rain <- OZrain > 0 summary(OZrain[has.rain])# Median = 18, Q3 = 50 table(rain01 <- as.integer(has.rain)) table(rain4c <- cut(OZrain, c(-.1, 0.5, 18.5, 50.1, 1000))) AIC(v1 <- vlmc(rain01))# cutoff = 1.92 AIC(v00 <- vlmc(rain01, cut = 1.4)) AIC(v0 <- vlmc(rain01, cut = 1.5)) hist(OZrain) hist(OZrain, breaks = c(0,1,5,10,50,1000), xlim = c(0,100)) plot(OZrain, main = "Rainfall 1981-1990 in Melbourne") plot(OZrain, log="y", main = "Non-0 Rainfall [LOG scale]") lOZ <- lowess(log10(OZrain[has.rain]), f= .05) lines(time(OZrain)[has.rain], 10^lOZ$y, col = 2, lwd = 2)
Compute predictions on a fitted VLMC object
for each (but the first) element of another discrete time series.
Computes by default a matrix of prediction probabilities. The argument
type
allows other predictions such as the most probable
"class"
or "response"
, the context length (tree
"depth"
), or an "ID"
of the corresponding context.
## S3 method for class 'vlmc' predict(object, newdata, type = c("probs", "class","response", "id.node", "depth", "ALL"), se.fit = FALSE, allow.subset = TRUE, check.alphabet=TRUE, ...) ## S3 method for class 'vlmc' fitted(object, ...)
## S3 method for class 'vlmc' predict(object, newdata, type = c("probs", "class","response", "id.node", "depth", "ALL"), se.fit = FALSE, allow.subset = TRUE, check.alphabet=TRUE, ...) ## S3 method for class 'vlmc' fitted(object, ...)
object |
typically the result of |
newdata |
a discrete “time series”, a numeric, character or
factor, as the |
type |
character indicating the type of prediction required,
options given in the Usage secion above, see also the
Value section below. The default |
se.fit |
a switch indicating if standard errors are required.
|
allow.subset |
logical; if |
check.alphabet |
logical; if |
... |
(potentially further arguments) required by generic. |
Depending on the type
argument,
"probs" |
an
|
"class" , "response"
|
the corresponding most probable value of Y[];
as |
"id.node" |
an (integer) “ID” of the current context (= node of the tree represented VLMC). |
"depth" |
the context length, i.e., the depth of the
Markov chain, at the current observation (of |
"ALL" |
an object of class
which has its own print method ( |
The predict
method and its possible arguments may still be
developed, and we are considering to return the marginal
probabilities instead of NA
for the first value(s).
The print
method print.predict.vlmc
uses
fractions
from package MASS to display
the probabilities , for
, as these are rational
numbers, shown as fractions of integers.
vlmc
and residuals.vlmc
. For
simulation, simulate.vlmc
.
f1 <- c(1,0,0,0) f2 <- rep(1:0,2) (dt2 <- rep(c(f1,f1,f2,f1,f2,f2,f1),2)) (vlmc.dt2c15 <- vlmc(dt2, cutoff = 1.5)) draw(vlmc.dt2c15) ## Fitted Values: all.equal(predict(vlmc.dt2c15, dt2), predict(vlmc.dt2c15)) (pa2c15 <- predict(vlmc.dt2c15, type = "ALL")) ## Depth = context length ([1] : NA) : stopifnot(nchar(pa2c15 $ ctxt)[-1] == predict(vlmc.dt2c15, type = "depth")[-1]) same <- (ff1 <- pa2c15 $ fitted) == (ff2 <- int2alpha(predict(vlmc.dt2c15, type ="response"), alpha="01")) which(!same) #-> some are different, since max.col() breaks ties at random! ndt2 <- c(rep(0,6),f1,f1,f2) predict(vlmc.dt2c15, ndt2, "ALL") (newdt2 <- sample(dt2, 17)) pm <- predict(vlmc.dt2c15, newdt2, allow.subset = TRUE) summary(apply(pm, 1, sum))# all 1 predict(vlmc.dt2c15, newdt2, type = "ALL") data(bnrf1) (vbnrf <- vlmc(bnrf1EB)) (pA <- predict(vbnrf, bnrf1EB[1:24], type = "ALL")) pc <- predict(vbnrf, bnrf1EB[1:24], type = "class") pr <- predict(vbnrf, bnrf1EB[1:24], type = "resp") stopifnot(as.integer (pc[-1]) == 1 + pr[-1], as.character(pc[-1]) == strsplit(vbnrf$alpha,NULL)[[1]][1 + pr[-1]]) ##-- Example of a "perfect" fit -- just for illustration: ## the default, thresh = 2 doesn't fit perfectly(i=38) (vlmc.dt2c0th1 <- vlmc(dt2, cutoff = 0, thresh = 1)) ## "Fitted" = "Data" (but the first which can't be predicted): stopifnot(dt2[-1] == predict(vlmc.dt2c0th1,type = "response")[-1])
f1 <- c(1,0,0,0) f2 <- rep(1:0,2) (dt2 <- rep(c(f1,f1,f2,f1,f2,f2,f1),2)) (vlmc.dt2c15 <- vlmc(dt2, cutoff = 1.5)) draw(vlmc.dt2c15) ## Fitted Values: all.equal(predict(vlmc.dt2c15, dt2), predict(vlmc.dt2c15)) (pa2c15 <- predict(vlmc.dt2c15, type = "ALL")) ## Depth = context length ([1] : NA) : stopifnot(nchar(pa2c15 $ ctxt)[-1] == predict(vlmc.dt2c15, type = "depth")[-1]) same <- (ff1 <- pa2c15 $ fitted) == (ff2 <- int2alpha(predict(vlmc.dt2c15, type ="response"), alpha="01")) which(!same) #-> some are different, since max.col() breaks ties at random! ndt2 <- c(rep(0,6),f1,f1,f2) predict(vlmc.dt2c15, ndt2, "ALL") (newdt2 <- sample(dt2, 17)) pm <- predict(vlmc.dt2c15, newdt2, allow.subset = TRUE) summary(apply(pm, 1, sum))# all 1 predict(vlmc.dt2c15, newdt2, type = "ALL") data(bnrf1) (vbnrf <- vlmc(bnrf1EB)) (pA <- predict(vbnrf, bnrf1EB[1:24], type = "ALL")) pc <- predict(vbnrf, bnrf1EB[1:24], type = "class") pr <- predict(vbnrf, bnrf1EB[1:24], type = "resp") stopifnot(as.integer (pc[-1]) == 1 + pr[-1], as.character(pc[-1]) == strsplit(vbnrf$alpha,NULL)[[1]][1 + pr[-1]]) ##-- Example of a "perfect" fit -- just for illustration: ## the default, thresh = 2 doesn't fit perfectly(i=38) (vlmc.dt2c0th1 <- vlmc(dt2, cutoff = 0, thresh = 1)) ## "Fitted" = "Data" (but the first which can't be predicted): stopifnot(dt2[-1] == predict(vlmc.dt2c0th1,type = "response")[-1])
This is an auxiliary function which recursively displays (prints) the
integer result vector of a vlmc
fit.
prt.vvec(v, nalph, pad=" ")
prt.vvec(v, nalph, pad=" ")
v |
typically |
nalph |
alphabet size; typically |
pad |
character, to be used for padding |
summary.vlmc
which uses prt.vvec
.
example(vlmc) str(vv <- vlmc.dt1$vlmc) prt.vvec(vv[-1], n = 2) prt.vvec(vv[-1], n = 2, pad = " | ")
example(vlmc) str(vv <- vlmc.dt1$vlmc) prt.vvec(vv[-1], n = 2) prt.vvec(vv[-1], n = 2, pad = " | ")
Plots the residuals of a fitted VLMC model against the contexts, i.e., produces a boxplot of residuals for all contexts used in the model fit.
This has proven to be useful function, and the many optional arguments allow quite a bit of customization. However, the current implementation is somewhat experimental and the defaults have been chosen from only a few examples.
RCplot(x, r2 = residuals(x, "deviance")^2, alphabet = x$alpha, lab.horiz = k <= 20, do.call = TRUE, cex.axis = if (k <= 20) 1 else if (k <= 40) 0.8 else 0.6, y.fact = if (.Device == "postscript") 1.2 else 0.75, col = "gray70", xlab = "Context", main = NULL, med.pars = list(col = "red", pch = 12, cex = 1.25 * cex.axis), ylim = range(0, r2, finite = TRUE), ...)
RCplot(x, r2 = residuals(x, "deviance")^2, alphabet = x$alpha, lab.horiz = k <= 20, do.call = TRUE, cex.axis = if (k <= 20) 1 else if (k <= 40) 0.8 else 0.6, y.fact = if (.Device == "postscript") 1.2 else 0.75, col = "gray70", xlab = "Context", main = NULL, med.pars = list(col = "red", pch = 12, cex = 1.25 * cex.axis), ylim = range(0, r2, finite = TRUE), ...)
x |
an R object of class |
r2 |
numeric vector, by default of squared deviance residuals of
|
alphabet |
the alphabet to use for labeling the contexts,
via |
lab.horiz |
logical indicating if the context labels should be written horizontally or vertically. |
do.call |
logical indicating if the |
cex.axis |
the character expansion for axis labeling, see also
|
y.fact |
numeric factor for expanding the space to use for the
context labels (when |
col |
color used for filling the boxes. |
xlab |
x axis label (with default). |
main |
main title to be used, |
med.pars |
graphical parameters to be used for coding of medians that are almost 0. |
ylim |
y range limits for plotting. |
... |
further arguments to be passed to |
Invisibly, a list with components
k |
the number of contexts (and hence box plots) used. |
fID |
a factor (as used in the interncal call to
|
rp |
a list as resulting from the above call to |
Martin Maechler
Mächler M. and Bühlmann P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. J. Computational and Graphical Statistics 2, 435–455.
summary.vlmc
for other properties of a VLMC model.
example(vlmc) RCplot(vlmc.pres) RCplot(vlmc.dt1c01)## << almost perfect fit (0 resid.)
example(vlmc) RCplot(vlmc.pres) RCplot(vlmc.dt1c01)## << almost perfect fit (0 resid.)
Compute residuals of a fitted vlmc
object.
This is yet a matter of research and may change in the future.
## S3 method for class 'vlmc' residuals(object, type = c("classwise", "deviance", "pearson", "working", "response", "partial"), y = object$y, ...)
## S3 method for class 'vlmc' residuals(object, type = c("classwise", "deviance", "pearson", "working", "response", "partial"), y = object$y, ...)
object |
typically the result of |
type |
The type of residuals to compute, defaults to
|
y |
discrete time series with respect to which the residuals are to be computed. |
... |
possibly further arguments (none at the moment). |
If type = "classwise"
(the default), a numeric matrix of dimension
of values
where the indicator
is 1 iff
y[i] == a[j]
and a
is the alphabet (or levels) of
y
, and are the elements of the estimated (1-step
ahead) predicted probabilities,
p <- predict(object)
.
Hence, for each , the only positive residual stands for the
observed class.
For all other type
s, the result is
a numeric vector of the length of the original time-series (with first
element NA
).
For type = "deviance"
,
where
is the predicted probability for the i-th
observation which is the same as
above (now
assuming
).
The sum of the squared deviance residuals is the deviance of
the fitted model.
Martin Maechler
vlmc
,deviance.vlmc
, and
RCplot
for a novel residual plot.
example(vlmc) rp <- residuals(vlmc.pres) stopifnot(all(abs(apply(rp[-1,],1,sum)) < 1e-15)) matplot(seq(presidents), rp, ylab = "residuals", type="l") ## ``Tukey-Anscombe'' (the following is first stab at plot method): matplot(fitted(vlmc.pres), rp, ylab = "residuals", xaxt = "n", type="b", pch=vlmc.pres$alpha) axis(1, at = 0:(vlmc.pres$alpha.len-1), labels = strsplit(vlmc.pres$alpha,"")[[1]]) summary(rd <- residuals(vlmc.pres, type = "dev")) rd[1:7] ## sum of squared dev.residuals === deviance : all.equal(sum(rd[-1] ^ 2), deviance(vlmc.pres))
example(vlmc) rp <- residuals(vlmc.pres) stopifnot(all(abs(apply(rp[-1,],1,sum)) < 1e-15)) matplot(seq(presidents), rp, ylab = "residuals", type="l") ## ``Tukey-Anscombe'' (the following is first stab at plot method): matplot(fitted(vlmc.pres), rp, ylab = "residuals", xaxt = "n", type="b", pch=vlmc.pres$alpha) axis(1, at = 0:(vlmc.pres$alpha.len-1), labels = strsplit(vlmc.pres$alpha,"")[[1]]) summary(rd <- residuals(vlmc.pres, type = "dev")) rd[1:7] ## sum of squared dev.residuals === deviance : all.equal(sum(rd[-1] ^ 2), deviance(vlmc.pres))
Simulate from fitted VLMC model – basis of the VLMC bootstrap
## S3 method for class 'vlmc' simulate(object, nsim = 1, seed = NULL, n, n.start = 64 * object$size[["context"]], integer.return = FALSE, keep.RSeed = TRUE, ...)
## S3 method for class 'vlmc' simulate(object, nsim = 1, seed = NULL, n, n.start = 64 * object$size[["context"]], integer.return = FALSE, keep.RSeed = TRUE, ...)
object |
typically the result of |
nsim , n
|
non-negative integer, giving the length of the result.
Note that |
seed |
random seed initializer; see |
n.start |
the number of initial values to be discarded (because of initial effects). |
integer.return |
logical; if |
keep.RSeed |
logical indicating if the seed should be stored with
the result (as ‘required’ by the generic
|
... |
(potentially further arguments for other |
The .Random.seed
is used and updated as with other random
number generation routines such as rbinom
.
Note that if you want to simulate from a given start sequence
x0
, you'd use predict.vlmc(x, x0, type= "response")
— actually not quite yet.
A "simulate.vlmc"
object, basically a vector of length
nsim
. Either integer
or character
,
depending on the integer.return
argument, see above. Further,
if keep.RSeed
was true (as by default), a "seed"
attribute
with the random seed at the start of the simulation, for reproducibility.
Martin Maechler
vlmc
and predict.vlmc
.
example(vlmc) simulate(vlmc.dt1, 100) simulate(vlmc.dt1c01, 100, int = TRUE) # n.start = 0: 1st few observations will resemble the data simulate(vlmc.dt1c01, 20, n.start=0, int = TRUE)
example(vlmc) simulate(vlmc.dt1, 100) simulate(vlmc.dt1c01, 100, int = TRUE) # n.start = 0: 1st few observations will resemble the data simulate(vlmc.dt1c01, 20, n.start=0, int = TRUE)
Compute (and print) a summary of a vlmc
object which is
typically the result of vlmc(..)
.
## S3 method for class 'vlmc' summary(object, ...) ## S3 method for class 'summary.vlmc' print(x, digits = getOption("digits"), vvec.printing = FALSE, ...)
## S3 method for class 'vlmc' summary(object, ...) ## S3 method for class 'summary.vlmc' print(x, digits = getOption("digits"), vvec.printing = FALSE, ...)
object |
an R object of class |
x |
an R object of class |
digits |
integer giving the number of significant digits for printing numbers. |
vvec.printing |
logical indicating if the |
... |
potentially further arguments [Generic]. |
summary.vlmc()
returns an object of class "summary.vlmc"
for which there's a print method. It is basically a list containing
all of object
, plus additionally
confusion.table |
the symmetric contingency table of data vs fitted. |
depth.stats |
statistics of Markov chain depth along the data;
currently just |
R2 |
the |
data(bnrf1) vb <- vlmc(bnrf1EB) svb <- summary(vb) svb
data(bnrf1) vb <- vlmc(bnrf1EB) svb <- summary(vb) svb
Fit a Variable Length Markov Chain (VLMC) to a discrete time series,
in basically two steps:
First a large Markov Chain is generated containing (all if
threshold.gen = 1
) the context states of the time series. In
the second step, many states of the MC are collapsed by pruning
the corresponding context tree.
Currently, the “alphabet” may contain can at most 26 different “character”s.
vlmc(dts, cutoff.prune = qchisq(alpha.c, df=max(.1,alpha.len-1),lower.tail=FALSE)/2, alpha.c = 0.05, threshold.gen = 2, code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE, dump = 0, ctl.dump = c(width.ct = 1+log10(n), nmax.set = -1) ) is.vlmc(x) ## S3 method for class 'vlmc' print(x, digits = max(3, getOption("digits") - 3), ...)
vlmc(dts, cutoff.prune = qchisq(alpha.c, df=max(.1,alpha.len-1),lower.tail=FALSE)/2, alpha.c = 0.05, threshold.gen = 2, code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE, dump = 0, ctl.dump = c(width.ct = 1+log10(n), nmax.set = -1) ) is.vlmc(x) ## S3 method for class 'vlmc' print(x, digits = max(3, getOption("digits") - 3), ...)
dts |
a discrete “time series”; can be a numeric, character or factor. |
cutoff.prune |
non-negative number; the cutoff used for pruning;
defaults to half the |
alpha.c |
number in (0,1) used to specify |
threshold.gen |
integer |
code1char |
logical; if true (default), the data |
y |
logical; if true (default), the data |
debug |
logical; should debugging info be printed to stderr. |
quiet |
logical; if true, don't print some warnings. |
dump |
integer in |
ctl.dump |
integer of length 2, say |
x |
a fitted |
digits |
integer giving the number of significant digits for printing numbers. |
... |
potentially further arguments [Generic]. |
A "vlmc"
object, basically a list with components
nobs |
length of data series when fit. (was named |
threshold.gen , cutoff.prune
|
the arguments (or their defaults). |
alpha.len |
the alphabet size. |
alpha |
the alphabet used, as one string. |
size |
a named integer vector of length (>=) 4, giving characteristic sizes of the fitted VLMC. Its named components are
|
vlmc.vec |
integer vector, containing (an encoding of) the fitted VLMC tree. |
y |
if |
call |
the |
Set cutoff = 0, thresh = 1
for getting a “perfect fit”,
i.e. a VLMC which perfectly re-predicts the data (apart from the first
observation). Note that even with cutoff = 0
some pruning may
happen, for all (terminal) nodes with =0.
Martin Maechler
Buhlmann P. and Wyner A. (1998) Variable Length Markov Chains. Annals of Statistics 27, 480–513.
Mächler M. and Bühlmann P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. J. Computational and Graphical Statistics 2, 435–455.
Mächler M. (2004) VLMC — Implementation and R interface; working paper.
draw.vlmc
,
entropy
, simulate.vlmc
for “VLMC bootstrapping”.
f1 <- c(1,0,0,0) f2 <- rep(1:0,2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) (vlmc.dt1 <- vlmc(dt1)) vlmc(dt1, dump = 1, ctl.dump = c(wid = 3, nmax = 20), debug = TRUE) (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1)) data(presidents) dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level levels(dpres)#-> make the alphabet -> warning vlmc.pres <- vlmc(dpres, debug = TRUE) vlmc.pres ## alphabet & and its length: vlmc.pres$alpha stopifnot( length(print(strsplit(vlmc.pres$alpha,NULL)[[1]])) == vlmc.pres$ alpha.len ) ## You now can use larger alphabets (up to 95) letters: set.seed(7); it <- sample(40, 20000, replace=TRUE) v40 <- vlmc(it) v40 ## even larger alphabets now give an error: il <- sample(100, 10000, replace=TRUE) ee <- tryCatch(vlmc(il), error= function(e)e) stopifnot(is(ee, "error"))
f1 <- c(1,0,0,0) f2 <- rep(1:0,2) (dt1 <- c(f1,f1,f2,f1,f2,f2,f1)) (vlmc.dt1 <- vlmc(dt1)) vlmc(dt1, dump = 1, ctl.dump = c(wid = 3, nmax = 20), debug = TRUE) (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1)) data(presidents) dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level levels(dpres)#-> make the alphabet -> warning vlmc.pres <- vlmc(dpres, debug = TRUE) vlmc.pres ## alphabet & and its length: vlmc.pres$alpha stopifnot( length(print(strsplit(vlmc.pres$alpha,NULL)[[1]])) == vlmc.pres$ alpha.len ) ## You now can use larger alphabets (up to 95) letters: set.seed(7); it <- sample(40, 20000, replace=TRUE) v40 <- vlmc(it) v40 ## even larger alphabets now give an error: il <- sample(100, 10000, replace=TRUE) ee <- tryCatch(vlmc(il), error= function(e)e) stopifnot(is(ee, "error"))
Character string, giving the version number (and date) of the VLMC package.
vlmc.version ## Not run: [1] "VLMC 1.3-14; after $Date: 2014/06/03 08:05:21 $ UTC" ## End(Not run)
vlmc.version ## Not run: [1] "VLMC 1.3-14; after $Date: 2014/06/03 08:05:21 $ UTC" ## End(Not run)
Compute the tree representation of a "vlmc"
object as R
list
.
vlmctree(x) ## S3 method for class 'vtree' str(object, ...) .vvec2tree(vv, k, chk.lev)
vlmctree(x) ## S3 method for class 'vtree' str(object, ...) .vvec2tree(vv, k, chk.lev)
x , object
|
typically the result of |
vv |
integer vector encoding the fitted vlmc, typically
|
k |
integer, the alphabet size. |
chk.lev |
integer internally used for consistency checking. |
... |
further arguments passed to or from methods. |
.vvec2tree
is the internal (recursive) function building up the
tree.
str.vtree
is a method for the generic str
function and typically for the output of vlmctree()
. For each
node, it gives the “parenting level” in braces and the counts.
A list
of class
"vtree"
representing the tree structure recursively.
Each “node” of the tree is itself a list with components
level |
length-2 integer giving the level in {0,1,...},
counted from the root (which is |
count |
integer vector of length |
total |
equals to |
child |
a list (of length |
Martin Maechler
vlmc
.
data(presidents) dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level (vlmc.prc1 <- vlmc(dpres, cut = 1, debug = TRUE)) str(vv.prc1 <- vlmctree(vlmc.prc1))
data(presidents) dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level (vlmc.prc1 <- vlmc(dpres, cut = 1, debug = TRUE)) str(vv.prc1 <- vlmctree(vlmc.prc1))