Title: | Supervised Clustering of Predictor Variables Such as Genes |
---|---|
Description: | Methodology for supervised grouping aka "clustering" of potentially many predictor variables, such as genes etc, implementing algorithms 'PELORA' and 'WILMA'. |
Authors: | Marcel Dettling [aut], Martin Maechler [aut, cre] |
Maintainer: | Martin Maechler |
License: | GPL-3 |
Version: | 1.1-2 |
Built: | 2024-12-12 04:10:50 UTC |
Source: | https://github.com/mmaechler/supclust |
These are not to be called by the user.
Yields the coefficients of the penalized logistic regression model that is fitted by pelora with its groups of predictor variables (genes) as input.
## S3 method for class 'pelora'
coef(object, ...)
object |
an R object of |
... |
further arguments passed to and from methods. |
A numeric vector of length noc + 1, giving the penalized logistic regression coefficients for the intercept and the noc groups and/or single variables identified by pelora.
Marcel Dettling
pelora, also for references.
## Running the examples of Pelora's help page
example(pelora, echo = FALSE)
coef(fit)
The four functions nnr (nearest neighbor rule), dlda (diagonal linear discriminant analysis), logreg (logistic regression) and aggtrees (aggregated trees) are used for binary classification with the cluster representatives of Wilma's output.
dlda(xlearn, xtest, ylearn)
nnr(xlearn, xtest, ylearn)
logreg(xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)
xlearn |
Numeric matrix of explanatory variables ( |
xtest |
A numeric matrix of explanatory variables ( |
ylearn |
Numeric vector of length |
nnr implements the 1-nearest-neighbor rule with the Euclidean distance function. dlda is linear discriminant analysis, using the restriction that the covariance matrix is diagonal with equal variance for all predictors. logreg is default logistic regression. aggtrees fits a default stump (a classification tree with two terminal nodes) by rpart for every predictor variable and uses majority voting to determine the final classifier.
Numeric vector with one entry per test observation, containing the predicted class labels. The class labels are coded by 0 and
1.
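The 1-nearest-neighbor rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the code behind nnr: the function name nnr_sketch is hypothetical, and ties in distance are resolved by taking the first closest learning observation.

```r
## Hypothetical base-R sketch of a 1-nearest-neighbor rule with Euclidean
## distance; this is NOT the implementation used by supclust's nnr().
nnr_sketch <- function(xlearn, xtest, ylearn) {
  apply(xtest, 1, function(z) {
    ## Squared Euclidean distance from test case z to every learning case
    d2 <- rowSums(sweep(xlearn, 2, z)^2)
    ## Label of the closest learning case (the first one on ties)
    ylearn[which.min(d2)]
  })
}

## Toy data: two well-separated groups of learning cases
xlearn <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
ylearn <- c(0, 0, 1, 1)
xtest  <- rbind(c(0.2, 0.4), c(5.2, 4.8))
pred   <- nnr_sketch(xlearn, xtest, ylearn)
```

Each test case simply inherits the label of its closest learning case, so the result has one 0/1 entry per row of xtest.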
Marcel Dettling
See those in wilma.
## Generating random learning data: 20 observations and 10 variables (clusters)
set.seed(342)
xlearn <- matrix(rnorm(200), nrow = 20, ncol = 10)
## Generating random test data: 8 observations and 10 variables (clusters)
xtest <- matrix(rnorm(80), nrow = 8, ncol = 10)
## Generating random class labels for the learning data
ylearn <- as.numeric(runif(20) > 0.5)
## Predicting the class labels for the test data
nnr(xlearn, xtest, ylearn)
dlda(xlearn, xtest, ylearn)
logreg(xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)
Yields the fitted values, i.e., the centroids of the (gene) groups that have been identified by pelora.
## S3 method for class 'pelora'
fitted(object, ...)
object |
An R object of |
... |
Further arguments passed to and from methods. |
Numeric matrix of fitted values (with cases in rows and the noc group centroids in columns).
Marcel Dettling
pelora, also for references.
## Running the examples of Pelora's help page
example(pelora, echo = FALSE)
fitted(fit)
Yields the fitted values, i.e., the centroids of the (gene) clusters that have been found by wilma.
## S3 method for class 'wilma'
fitted(object, ...)
object |
An R object of |
... |
further arguments passed to and from methods. |
Numeric matrix of fitted values (with cases in rows and the noc group centroids in columns).
Marcel Dettling
wilma, also for references.
## Running the examples of Wilma's help page
example(wilma, echo = FALSE)
fitted(fit)
Part of the training set of the famous AML/ALL-leukemia dataset from the Whitehead Institute. It has been reduced to 250 genes, about half of which are very informative for classification, whereas the other half was chosen randomly.
data(leukemia)
Contains three R objects: the expression (38 x 250) matrix leukemia.x, the associated binary (0,1) response variable leukemia.y, and the associated 3-class response variable leukemia.z with values in 0,1,2.
Marcel Dettling
Originally at http://www.genome.wi.mit.edu/MPR/ (which is no longer a valid URL).
First published in Golub et al. (1999). Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286, 531–538.
data(leukemia, package="supclust")
str(leukemia.x)
str(leukemia.y)
str(leukemia.z)
op <- par(mfrow = 1:2)
plot(leukemia.x[,56], leukemia.y)
plot(leukemia.x[,174], leukemia.z)
par(op)
For a set of observations grouped into two classes (for example, expression values of a gene), the margin function measures the size of the gap between the classes. This is the distance between the observation of response class zero having the highest value and the observation of response class one having the lowest value.
margin(x, resp)
x |
Numeric vector of length |
resp |
Numeric vector of length |
A numeric value, the margin. A positive margin indicates perfect separation of the response classes, whereas a negative margin
means imperfect separation.
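The sign convention of the margin can be illustrated with a small base-R sketch. It assumes, in line with the statement that a positive value indicates perfect separation, that the margin is the smallest class-1 value minus the largest class-0 value. The name margin_sketch is hypothetical and the code is not taken from the package.

```r
## Hypothetical sketch, NOT supclust's margin(): assumes the margin is
## min(values of class 1) - max(values of class 0).
margin_sketch <- function(x, resp) {
  min(x[resp == 1]) - max(x[resp == 0])
}

x    <- c(1.0, 2.0, 3.5, 5.0)
resp <- c(0, 0, 1, 1)

## Class 0 holds the low values: the classes do not overlap
m_separated <- margin_sketch(x, resp)
## Labels reversed: class 1 now holds the low values
m_reversed  <- margin_sketch(x, rev(resp))
```

With the labels reversed the margin becomes negative, which is why sign-flipping the variables first matters for this statistic.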
Marcel Dettling
See those in wilma.
wilma; score is the second statistic that is used there.
data(leukemia, package="supclust")
op <- par(mfrow=c(1,3))
plot(leukemia.x[,69],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,69], leukemia.y),2)))
## Sign-flipping is very important
plot(leukemia.x[,161],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,161], leukemia.y),2)))
x <- sign.flip(leukemia.x, leukemia.y)$flipped.matrix
plot(x[,161],leukemia.y)
title(paste("Margin = ", round(margin(x[,161], leukemia.y),2)))
par(op)
Performs selection and supervised grouping of predictor variables in large (microarray gene expression) datasets, with an option for simultaneous classification. Works in a greedy forward strategy and optimizes the binomial log-likelihood, based on estimated conditional probabilities from penalized logistic regression analysis.
pelora(x, y, u = NULL, noc = 10, lambda = 1/32, flip = "pm", standardize = TRUE, trace = 1)
x |
Numeric matrix of explanatory variables ( |
y |
Numeric vector of length |
u |
Numeric matrix of additional (clinical) explanatory variables
( |
noc |
Integer, the number of clusters that should be searched for on the data. |
lambda |
Real, defaults to 1/32. Rescaled penalty parameter that
should be in |
flip |
Character string, describing a method how the |
standardize |
Logical, defaults to |
trace |
Integer >= 0; when positive, the output of the internal
loops is provided; |
pelora returns an object of class "pelora". The functions print and summary are used to obtain an overview of the variables (genes) that have been selected and the groups that have been formed. The function plot yields a two-dimensional projection into the space of the first two group centroids that pelora found. The generic function fitted returns the fitted values, which are the cluster representatives. coef returns the penalized logistic regression coefficients for each of the predictors. Finally, predict is used for classifying test data with Pelora's internal penalized logistic regression classifier on the basis of the (gene) groups that have been found.
An object of class "pelora" is a list containing:
genes |
A list of length |
values |
A numerical matrix with dimension |
y |
Numeric vector of length |
steps |
Numerical vector of length |
lambda |
The rescaled penalty parameter. |
noc |
The number of clusters that has been searched for on the data. |
px |
The number of columns (genes) in the |
flip |
The method that has been chosen for sign-flipping the
|
var.type |
A factor with |
crit |
A list of length |
signs |
Numerical vector of length |
samp.names |
The names of the samples (rows) in the
|
gene.names |
The names of the variables (columns) in the
|
call |
The function call. |
Marcel Dettling
Marcel Dettling (2003). Extracting Predictive Gene Groups from Microarray Data and Combining them with Clinical Variables. https://stat.ethz.ch/Manuscripts/dettling/presentation3.pdf
Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering of Genes. Genome Biology, 3(12): research0069.1-0069.15, doi:10.1186/gb-2002-3-12-research0069.
Marcel Dettling and Peter Bühlmann (2004). Finding Predictive Gene Groups from Microarray Data. Journal of Multivariate Analysis 90, 106–131, doi:10.1016/j.jmva.2004.02.012.
wilma for another supervised clustering technique.
## Working with a "real" microarray dataset
data(leukemia, package="supclust")
## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)
## Fitting Pelora
fit <- pelora(leukemia.x, leukemia.y, noc = 3)
## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
coef(fit)
## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")
## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", noc = c(1,2,3))
predict(fit, newdata = xN, type = "pro", noc = c(1,3))
## Fitting Pelora such that the first 70 variables (genes) are not grouped
fit <- pelora(leukemia.x[, -(1:70)], leukemia.y, leukemia.x[,1:70])
## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
coef(fit)
## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")
## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70])
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], "cla", noc = 1:10)
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], type = "pro")
Yields a projection of the cases (for example, gene expression profiles) into the space of the first two gene group centroids that were identified by pelora.
## S3 method for class 'pelora'
plot(x, main = "2-Dimensional Projection Pelora's output", xlab = NULL, ylab = NULL, col = seq(x$yvals), ...)
x |
An R object of |
main |
A character string, giving the title of the plot. |
xlab |
A character string, giving the annotation of the
|
ylab |
A character string, giving the annotation of the
|
col |
A numeric vector of length 2, coding the colors that will be used for plotting the class labels. |
... |
Further arguments passed to and from methods. |
Marcel Dettling
pelora, also for references.
## Running the examples of Pelora's help page
example(pelora, echo = FALSE)
plot(fit)
Yields a projection of the cases (for example, gene expression profiles) into the space of the first two gene group centroids that were identified by wilma.
## S3 method for class 'wilma'
plot(x, xlab = NULL, ylab = NULL, col = seq(x$yvals), main = "2-Dimensional Projection of Wilma's Output", ...)
x |
an R object of |
xlab |
character string, giving the annotation of the
|
ylab |
character string, giving the annotation of the
|
col |
a numeric vector of length 2, coding the colors that will be used for plotting the class labels. |
main |
a character string, giving the title of the plot. |
... |
Further arguments passed to and from methods. |
Marcel Dettling
wilma, also for references.
## Running the examples of Wilma's help page
example(wilma, echo = FALSE)
plot(fit)
Yields fitted values, predicted class labels and conditional probability estimates for training and test data, which are based on the gene groups pelora found, and on its internal penalized logistic regression classifier.
## S3 method for class 'pelora'
predict(object, newdata = NULL, newclin = NULL, type = c("fitted", "probs", "class"), noc = object$noc, ...)
object |
An R object of |
newdata |
Numeric matrix with the same number of explanatory
variables as the original |
newclin |
Numeric matrix with the same number of additional
(clinical) explanatory variables as the original |
type |
Character string, describing whether fitted values
|
noc |
Integer, saying with how many clusters the fitted values, probability estimates or class labels should be determined. Also numeric vectors are allowed as an argument. The output is then a numeric matrix with fitted values, probability estimates or class labels for a multiple number of clusters. |
... |
Further arguments passed to and from methods. |
If newdata = NULL, then the in-sample fitted values, probability estimates and class label predictions are returned.
The form of the result depends on whether noc is a single number or a numeric vector. In the first case, a numeric vector is returned, which contains fitted values for noc clusters, or probability estimates/class label predictions with noc clusters. In the latter case, a numeric matrix with length(noc) columns is returned, each column containing fitted values for noc clusters, or probability estimates/class label predictions with noc clusters.
Marcel Dettling
pelora, also for references.
## Working with a "real" microarray dataset
data(leukemia, package="supclust")
## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)
## Fitting Pelora
fit <- pelora(leukemia.x, leukemia.y, noc = 3)
## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")
## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", noc = c(1,2,3))
predict(fit, newdata = xN, type = "pro", noc = c(1,3))
## Fitting Pelora such that the first 70 variables (genes) are not grouped
fit <- pelora(leukemia.x[, -(1:70)], leukemia.y, leukemia.x[,1:70])
## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")
## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70])
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], "cla", noc = 1:10)
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], type = "pro")
Yields fitted values or predicted class labels for training and test data, which are based on the supervised gene clusters wilma found, and on a choice of four different classifiers: the nearest-neighbor rule, diagonal linear discriminant analysis, logistic regression and aggregated trees.
## S3 method for class 'wilma'
predict(object, newdata = NULL, type = c("fitted", "class"), classifier = c("nnr", "dlda", "logreg", "aggtrees"), noc = object$noc, ...)
object |
an R object of |
newdata |
numeric matrix with the same number of explanatory
variables as the original |
type |
character string describing whether fitted values
|
classifier |
character string specifying which classifier should
be used. Choices are |
noc |
integer specifying how many clusters the fitted values or class label predictions should be determined. Also numeric vectors are allowed as an argument. The output is then a numeric matrix with fitted values or class label predictions for a multiple number of clusters. |
... |
further arguments passed to and from methods. |
If newdata = NULL, then the in-sample fitted values or class label predictions are returned.
The form of the result depends on whether noc is a single number or a numeric vector. In the first case, a numeric vector is returned, which contains fitted values for noc clusters, or class label predictions with noc clusters. In the latter case, a numeric matrix with length(noc) columns is returned, each column containing fitted values for noc clusters, or class label predictions with noc clusters.
Marcel Dettling
wilma, also for references, and for the four classifiers nnr, dlda, logreg, aggtrees.
## Working with a "real" microarray dataset
data(leukemia, package="supclust")
## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)
## Fitting Wilma
fit <- wilma(leukemia.x, leukemia.y, noc = 3, trace = 1)
## Fitted values and class predictions for the training data
predict(fit, type = "cla")
predict(fit, type = "fitt")
## Predicting fitted values and class labels for test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", classifier = "nnr", noc = c(1,2,3))
predict(fit, newdata = xN, type = "cla", classifier = "dlda", noc = c(1,3))
predict(fit, newdata = xN, type = "cla", classifier = "logreg")
predict(fit, newdata = xN, type = "cla", classifier = "aggtrees")
Yields an overview of the type, size and final criterion value of the predictor variables that were selected by pelora.
## S3 method for class 'pelora'
print(x, digits = getOption("digits"), details = FALSE, ...)
x |
an R object of |
digits |
the number of digits that should be printed. |
details |
logical, defaults to |
... |
Further arguments passed to and from methods. |
Marcel Dettling
pelora, also for references.
## Running the examples of Pelora's help page
example(pelora, echo = FALSE)
print(fit)
Yields an overview of the size and the final criterion values of the clusters that were selected by wilma.
## S3 method for class 'wilma'
print(x, ...)
x |
An R object of |
... |
Further arguments passed to and from methods. |
Marcel Dettling
wilma, also for references.
## Running the examples of Wilma's help page
example(wilma, echo = FALSE)
print(fit)
For a set of observations grouped into two classes (for example, expression values of a gene), the score function measures the separation of the classes. It can be interpreted as counting, for each observation having response zero, the number of observations of response class one that are smaller, and summing up these counts.
score(x, resp)
x |
Numeric vector of length |
resp |
Numeric vector of length |
A numeric value, the score. The minimal score is zero, the maximal score is the product of the number of samples in class 0 and class 1. Values near the minimal or maximal score indicate good separation, whereas an intermediate score
means poor separation.
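The counting interpretation of the score can be written out directly in base R. The following is a sketch under assumptions rather than the package's implementation: the name score_sketch is hypothetical, and tied values are simply not counted.

```r
## Hypothetical sketch, NOT supclust's score(): counts the pairs
## (class-0 observation, class-1 observation) in which the class-1 value
## is the smaller one. Ties are not counted in this sketch.
score_sketch <- function(x, resp) {
  x0 <- x[resp == 0]
  x1 <- x[resp == 1]
  ## outer() builds an n0 x n1 logical table: TRUE where x0[i] > x1[j]
  sum(outer(x0, x1, ">"))
}

x <- c(1, 2, 3, 4, 5)
s_min <- score_sketch(x, c(0, 0, 0, 1, 1))  # class 1 holds the high values
s_max <- score_sketch(x, c(1, 1, 0, 0, 0))  # class 1 holds the low values
s_mid <- score_sketch(c(1, 3, 5, 2, 4), c(0, 0, 0, 1, 1))  # interleaved
```

With three class-0 and two class-1 observations the maximal score is 3 * 2 = 6. The interleaved example lands in the middle of the range, which corresponds to poor separation.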
Marcel Dettling
wilma, also for references; margin is the second statistic that is used there.
data(leukemia, package="supclust")
op <- par(mfrow=c(1,3))
plot(leukemia.x[,69],leukemia.y)
title(paste("Score = ", score(leukemia.x[,69], leukemia.y)))
## Sign-flipping is very important
plot(leukemia.x[,161],leukemia.y)
title(paste("Score = ", score(leukemia.x[,161], leukemia.y)))
x <- sign.flip(leukemia.x, leukemia.y)$flipped.matrix
plot(x[,161],leukemia.y)
title(paste("Score = ", score(x[,161], leukemia.y)))
par(op)
Computes the empirical correlation of each predictor variable (gene) in the x-matrix with the response y, and multiplies its values by (-1) if the empirical correlation has a negative sign. For gene expression data, this amounts to treating under- and overexpression symmetrically. After the sign change, low (expression) values point towards response class 0 and high (expression) values point towards class 1.
sign.change(x, y)
x |
Numeric matrix of explanatory variables ( |
y |
Numeric vector of length |
Returns a list containing:
x.new |
The sign-flipped |
signs |
Numeric vector of length |
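The correlation-based flipping rule can be sketched in a few lines of base R. This is an illustration only: sign_change_sketch is a hypothetical name, the returned signs are assumed to be +1 or -1 per column, and the package's own function may differ in detail.

```r
## Hypothetical sketch, NOT supclust's sign.change(): flips every column
## of x whose empirical correlation with y is negative.
sign_change_sketch <- function(x, y) {
  ## Empirical correlation of every column of x with the response y
  r <- as.vector(cor(x, y))
  ## Assumed coding: -1 for flipped columns, +1 otherwise
  signs <- ifelse(r < 0, -1, 1)
  ## Multiply column j by signs[j]
  list(x.new = sweep(x, 2, signs, "*"), signs = signs)
}

y <- c(0, 0, 0, 1, 1, 1)
x <- cbind(up   = c(1, 2, 1, 4, 5, 4),   # high values in class 1
           down = c(9, 8, 9, 3, 2, 3))   # high values in class 0
res <- sign_change_sketch(x, y)
```

After the change, both columns correlate positively with y, so high values point towards class 1 in each of them.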
Marcel Dettling
pelora, also for references, as well as for older methodology, wilma and sign.flip.
data(leukemia, package="supclust")
op <- par(mfrow=c(1,3))
plot(leukemia.x[,69],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,69], leukemia.y),2)))
## Sign-flipping is very important
plot(leukemia.x[,161],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,161], leukemia.y),2)))
x <- sign.change(leukemia.x, leukemia.y)$x.new
plot(x[,161],leukemia.y)
title(paste("Margin = ", round(margin(x[,161], leukemia.y),2)))
par(op)
Computes the score for each predictor variable (gene) in the x-matrix, and multiplies its values by (-1) if its score is greater than or equal to half of the maximal score. For gene expression data, this amounts to treating under- and overexpression symmetrically. After the sign-flip procedure, low (expression) values point towards response class 0 and high (expression) values point towards class 1.
sign.flip(x, y)
x |
Numeric matrix of explanatory variables ( |
y |
Numeric vector of length |
Returns a list containing:
flipped.matrix |
The sign-flipped |
signs |
Numeric vector of length |
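The score-based flipping rule can be sketched as follows. The code is a base-R illustration under assumptions, not the package's implementation: sign_flip_sketch is a hypothetical name, the score is computed by pair counting without tie handling, and signs are assumed to be coded as +1 or -1.

```r
## Hypothetical sketch, NOT supclust's sign.flip(): flips every column
## whose score is at least half of the maximal score n0 * n1.
sign_flip_sketch <- function(x, y) {
  n0 <- sum(y == 0)
  n1 <- sum(y == 1)
  ## Score per column: number of (class 0, class 1) pairs in which the
  ## class-1 value is the smaller one
  scores <- apply(x, 2, function(g) sum(outer(g[y == 0], g[y == 1], ">")))
  ## Assumed coding: -1 for flipped columns, +1 otherwise
  signs <- ifelse(scores >= n0 * n1 / 2, -1, 1)
  list(flipped.matrix = sweep(x, 2, signs, "*"), signs = signs)
}

y <- c(0, 0, 0, 1, 1, 1)
x <- cbind(up   = c(1, 2, 1, 4, 5, 4),   # score 0: left unchanged
           down = c(9, 8, 9, 3, 2, 3))   # score 9 (maximal): flipped
res <- sign_flip_sketch(x, y)
## Applying the rule a second time should flip nothing
again <- sign_flip_sketch(res$flipped.matrix, y)
```

In this toy example the second pass leaves every column untouched, because all scores are now below half of the maximum.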
Marcel Dettling
wilma, also for the references, and score, as well as for a newer methodology, pelora and sign.change.
data(leukemia, package="supclust")
op <- par(mfrow=c(1,3))
plot(leukemia.x[,69],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,69], leukemia.y),2)))
## Sign-flipping is very important
plot(leukemia.x[,161],leukemia.y)
title(paste("Margin = ", round(margin(leukemia.x[,161], leukemia.y),2)))
x <- sign.flip(leukemia.x, leukemia.y)$flipped.matrix
plot(x[,161],leukemia.y)
title(paste("Margin = ", round(margin(x[,161], leukemia.y),2)))
par(op) # reset
Standardizes each column (gene) of the x-matrix to zero mean and unit variance. This function is not to be called by the user; the standardization is handled internally in pelora.
standardize.genes(exmat)
exmat |
Numeric matrix of explanatory variables ( |
Returns a list containing:
x |
The standardized |
means |
Numeric vector of length |
sdevs |
Numeric vector of length |
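Column-wise standardization to zero mean and unit variance can be sketched in base R as shown here. The name standardize_sketch is hypothetical, and the sketch assumes the usual sample standard deviation (denominator n - 1) as computed by sd(); the internal function may use a different convention.

```r
## Hypothetical sketch, NOT supclust's standardize.genes(): centers each
## column and scales it by its sample standard deviation.
standardize_sketch <- function(exmat) {
  means <- colMeans(exmat)
  sdevs <- apply(exmat, 2, sd)
  ## Subtract the column mean, then divide by the column standard deviation
  x <- sweep(sweep(exmat, 2, means, "-"), 2, sdevs, "/")
  list(x = x, means = means, sdevs = sdevs)
}

set.seed(1)
exmat <- matrix(rnorm(40, mean = 3, sd = 2), nrow = 10, ncol = 4)
std <- standardize_sketch(exmat)
```

Keeping means and sdevs alongside the standardized matrix makes it possible to apply the same transformation to new data later.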
Marcel Dettling
pelora, also for the references.
Yields detailed information about the variables (genes) that have been selected, and how they were grouped.
## S3 method for class 'pelora'
summary(object, digits, ...)
object |
an R object of |
digits |
The number of digits that should be printed. |
... |
Further arguments passed to and from methods. |
Marcel Dettling
pelora, also for references.
## Running the examples of Pelora's help page
example(pelora, echo = FALSE)
summary(fit)
Yields detailed information about the variables (genes) that have been selected, and how they were clustered.
## S3 method for class 'wilma'
summary(object, ...)
object |
An R object of |
... |
Further arguments passed to and from methods. |
Marcel Dettling
wilma, also for references.
## Running the examples of Wilma's help page
example(wilma, echo = FALSE)
summary(fit)
Performs supervised clustering of predictor variables for large (microarray gene expression) datasets. Works in a greedy forward strategy and optimizes a combination of the Wilcoxon and Margin statistics for finding the clusters.
wilma(x, y, noc, genes = NULL, flip = TRUE, once.per.clust = FALSE, trace = 0)
x |
Numeric matrix of explanatory variables ( |
y |
Numeric vector of length |
noc |
Integer, the number of clusters that should be searched for on the data. |
genes |
Defaults to |
flip |
Logical, defaults to |
once.per.clust |
Logical, defaults to |
trace |
Integer >= 0; when positive, the output of the internal
loops is provided; |
wilma returns an object of class "wilma". The functions print and summary are used to obtain an overview of the clusters that have been found. The function plot yields a two-dimensional projection into the space of the first two clusters that wilma found. The generic function fitted returns the fitted values, which are the cluster representatives. Finally, predict is used for classifying test data on the basis of Wilma's clusters with either the nearest-neighbor rule, diagonal linear discriminant analysis, logistic regression or aggregated trees.
An object of class "wilma" is a list containing:
clist |
A list of length |
steps |
Numerical vector of length |
y |
Numeric vector of length |
x.means |
A list of length |
noc |
Integer, the number of clusters that has been searched for on the data. |
signs |
Numerical vector of length |
Marcel Dettling
Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering of Genes. Genome Biology, 3(12): research0069.1-0069.15, doi:10.1186/gb-2002-3-12-research0069.
score, margin, and for a newer methodology, pelora.
## Working with a "real" microarray dataset
data(leukemia, package="supclust")
## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)
## Fitting Wilma
fit <- wilma(leukemia.x, leukemia.y, noc = 3, trace = 1)
## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
## Fitted values and class predictions for the training data
predict(fit, type = "cla")
predict(fit, type = "fitt")
## Predicting fitted values and class labels for test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", classifier = "nnr", noc = c(1,2,3))
predict(fit, newdata = xN, type = "cla", classifier = "dlda", noc = c(1,3))
predict(fit, newdata = xN, type = "cla", classifier = "logreg")
predict(fit, newdata = xN, type = "cla", classifier = "aggtrees")