Contents
Introduction
Here we present the R package RUCova, a novel method designed to address confounding factors such as heterogeneous cell size and staining efficiency in mass cytometry data. RUCova removes unwanted covariance using multivariate linear regression based on Surrogates of Unwanted Covariance (SUCs), and Principal Component Analysis (PCA).
RUCova comprises two major steps:
First, it fits a multivariate model for each measured marker (m) across cells (i) from samples (ji) with respect to the surrogates of sources of unwanted covariance (SUC, →xi). Samples ji can be either different cell lines, perturbations, conditions, metacells (clusters), or even batches.
Second, it eliminates such dependency by assigning the residuals ϵ of the model as the new modified expression of the marker. The fit can be expressed as:
ymi(→xi,ji)=Om(ji)+Sm(→xi,ji)⏟M(→xi,ji)+ϵmi , where Sm(→xi,ji) describes the slope of the fit and Om(ji) the intercept or offset.
The predictors are the surrogates of sources unwanted covariance (SUC, →x), which can be either specific markers as proxies of the confounding factors or the principal components (PCs) derived from a Principal Component Analysis (PCA) performed on such markers.
RUCova offers 3 different uni- or multi-variate linear models to describe the relationship between marker expression and SUC: (1) M1(→xi): simple, (2) M2(→xi,ji): offset, and (3) M3(→xi,ji): interaction. In this vignette you can find a detailed description of these methods and their differences.
The RUCova method eliminates the dependency of each measured marker on the SUCs by computing the model’s residuals and the intercept as the revised expression for each marker (ym∗i).:
y∗mi(ji)=Om1(ji)+ϵmi, where ϵmi are the residuals of the model.
In this vignette, we guide you step-by-step through it so you can make good use this package and reliably analyse single-cell mass cytometry data.
Citation
If you use RUCova in published research, please cite:
RUCova: Removal of Unwanted Covariance in mass cytometry data Rosario Astaburuaga-García, Thomas Sell, Samet Mutlu, Anja Sieber, Kirsten Lauber, Nils Blüthgen. Bioinformatics 2024; doi: https://doi.org/10.1101/2024.05.24.595717
Installation
To install the stable release of RUCova from Bioconductor (recommended), run:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("RUCova")
To install the latest development version directly from GitHub, use:
remotes::install_github("molsysbio/RUCova@devel", force = TRUE)
library(RUCova)
library(ggplot2)
library(tidyr)
library(tibble)
library(dplyr)
library(SingleCellExperiment)
theme_set(theme_classic())
Input data
Let’s first go through each input of RUCova.
sce
Is a SingleCellExperiment
object containing an assay
(e.g.: “counts”) with mass cytometry data of single-cell marker signals [rows = markers, columns = cells] in linear scale. This data should be clean, meaning without calibration beads, debris, doublets, and dead cells, and single-cells are demultiplexed (important if you want to adapt the linear fits to the samples). Metadata of the cells (eg.: samples, treatment, etc) are stored in colData
.
In this example we offer a mass cytometry data set in form of a tibble consisting of 8 Head-and-Neck Squamous Cell Carcinoma (HNSCC) lines in irradiated (10 Gy) and control (0 Gy) conditions. It corresponds to the data in Figure 2 and Figure 3 of the manuscript, but subsampled to 500 cells per line and condition.
data <- RUCova::HNSCC_data
data <- as.data.frame(data)
colnames(data) <- make.names(colnames(data))
head(data)
rownames(data) <- 1:dim(data)[1]
data <- data |> mutate(cell_id = 1:n())
We convert our data into a SingleCellExperiment
to ensure interoperability across Bioconductor packages. We also add an id per cell to be able to do a cell-wise signals value modification.
sce <- SingleCellExperiment(
assays = list(counts = t(data[,4:48])),
rowData = DataFrame(measured_channels = colnames(data[,4:48])),
colData = DataFrame(cell_id = data$cell_id,
line = data$line,
dose = data$dose),
)
name_assay_before
A string specifying the name of the assay containing the data before applying RUCova. In this example, name_assay_before = "counts"
.
markers
A vector of markersm,for which we want to regress out the unwanted covariance. In this example:
m <- c("pH3","IdU","Cyclin_D1","Cyclin_B1", "Ki.67","pRb","pH2A.X","p.p53","p.p38","pChk2","pCDC25c","cCasp3","cPARP","pAkt","pAkt_T308","pMEK1.2","pERK1.2","pS6","p4e.BP1","pSmad1.8","pSmad2.3","pNFkB","IkBa", "CXCL1","Lamin_B1", "pStat1","pStat3", "YAP","NICD")
The character variables in the vectors x
(surrogates) and m
(markers) must be findable as column names in the tibble data
.
SUCs
Since cell volume and labeling efficiency cannot be directly measured with mass cytometry, we use four urrogates of sources of nwanted ovariance (SUCs):
→xi=[mean DNAi,mean highest BCi,total ERKi,pan Akti]
Mean DNA: Mean value of normalised iridium channels
mean DNAi=Ir191norm,i+Ir193norm,i2, where Ir191norm and Ir193norm are the percentile-normalised intensities (e.g., to the 95th percentile) of the iridium intercalators serving as DNA stains for each celli.
The RUCova function called RUCova::calc_mean_DNA
applies the asinh()
function and adjusts the transformed distributions of the iridium isotopes (Ir191 and Ir193) by matching a specific percentile q
to take then the mean value of these two signals per cell. The function returns a vector with the mean DNA signals in linear scale, as it applies the inverse transformation sinh()
.
dna_channels <- c("DNA_191Ir", "DNA_193Ir")
sce <- RUCova::calc_mean_DNA(sce, name_assay = "counts", dna_channels, q = 0.95)
Mean BC: Mean value of the highest (used) barcoding isotopes per cell:
mean highest BCi=1NBC∑NBCk=1BCi,k,norm, whereBCnormare the percentile-normalised intensities of the barcoding isotopes across all cells ( to the 95th percentile),NBCis the number of barcoding isotopes used per cell (e.g.:NBC= 3 for the Fluidigm kit of Palladium isotopes), andBCkis the barcoding isotope with the k-th highest signal in celli, meaning the isotope was used in that cell to barcode it.
The RUCova function called RUCova::calc_mean_BC
works in two steps. First, it applies the asinh()
function and adjusts the transformed distributions of the barcoding isotopes by matching a specific percentile q
. Then, it looks at the signals of each isotope for each cell and picks the top n_bc
signals used for barcoding. The function returns a vector with the mean BC signals in linear scale, as it applies the inverse transformation sinh()
.
In the following example the Cell-ID 20-Plex Pd Barcoding Kit was used, in addition to Platinum 194 and 198, where 4 out of 8 isotopes are mixed to form one barcode:
bc_channels <- c("Pd102Di", "Pd104Di", "Pd105Di", "Pd106Di", "Pd108Di", "Pd110Di",
"Dead_cells_194Pt", "Dead_cells_198Pt")
sce <- RUCova::calc_mean_BC(sce, name_assay = "counts", bc_channels, n_bc = 4, q = 0.95)
pan Akt and total ERK
The surrogates pan Akt and total ERK are markers included in our mass cytometry panel.
The selection of SUCs should be strategically aligned with the biological system, experimental framework, and specific markers involved. In this investigation, our mass cytometry panel is primarily focused on the MAPK pathway and its interactions. Consequently, selecting total ERK and pan Akt as SUCs is deemed suitable. Nonetheless, we are aware that these markers may not always be optimal in other scenarios due to possible variations. Alternatively, markers such as metabolic enzymes like GAPDH present a viable option. Although GAPDH was not tested in our study, we recognize its importance and frequent use as a control in western blots.
Now we define our vector of SUCs as:
x = c("total_ERK", "pan_Akt", "mean_DNA", "mean_BC")
PCA
SUCs can also be linearly transformed into a new coordinate system by applying Principal Component Analysis. By doing this, users can decide the extent of unwanted correlation to be removed by using eg.: only PC1 as a predictive variable in the univariate model, or PC1 to PC2, or all PCs equivalent to taking all surrogates as predictive variables (more examples below).
pca <- t(assay(sce,"counts")) |>
as_tibble() |>
select(all_of(x)) |>
mutate_all(asinh) |>
mutate_all(scale) |>
prcomp()
This PCA can be stored in the SingleCellExperiment object as follows:
reducedDim(sce, "PCA") <- pca$x
name_reduced_dim
Name of the dimensionality reduction data stored under reducedDim()
. In this example is "PCA
. This is important if the regression is desired to be fitted as a function of principal components or other reduced dimensions.
apply_asinh_SUCs
Apply (TRUE
) or not (FALSE
) asinh transformation to the SUCs. TRUE
if SUCs are the measured surrogates (x), FALSE
if SUCs are PCs.
model
A character: simple
, offset
or interaction
defining the linear model to be used. Here we offer a quick description of each model and their rationale.
M1: Simple
Consists of one fit per measured markerymiacross the entire data set.
M1(→xi)=βm⏟=Om1+NSUC∑p=1αmp⋅xi,p⏟=Sm1(→xi) , where βm0 is the intercept and αmp is the slope coefficient for each predictor or SUC p.
This is suitable when the data set includes one cell line/organoid/biological system with multiple perturbations, and the goal is to evaluate the treatment effect across these perturbations. We recommend this model when the relationship between marker abundance and the source of confounding factors (slope) is theoretically thought to be consistent across the data set.
M2: Offset
Consists of one fit per measured marker ymi and sample ji. The fits for the samples share the same slope, while differing in the intercept (offset term Om2(ji)). Samples ji can be either different cell lines, perturbations, conditions, metacells (clusters), or even batches.
M2(→xi,ji)=βmji⏟=Om2(ji)+NSUC∑p=1αmp⋅xi,p⏟=Sm2(→xi)
The offset - intercept - term Om can either represent desired variation between samples (keep_offset = TRUE
) or unwanted variation (keep_offset = FALSE
). If the offset is considered wanted, the output from the offset model will closely resemble that of the simple model. However, the offset model becomes particularly important when the variation between samples is unwanted, such as when different samples represent batches and the goal is to remove batch effects.
M3: Interaction
Consists of one fit per measured marker ymi and sample ji. The fits for the samples can have different slopes (interaction term Sm3(→xi,ji)) and intercepts (offset term Om3(ji)).
M3(→xi,ji)=βmji⏟=Om3(ji)+NSUC∑p=1αmji,p⋅xi,p⏟=Sm3(→xi,ji)
This model is ideal for datasets with different cell types (e.g., immune cells) or when comparing treatment effects across different cell lines with varying relationships between marker abundance and the source of unwanted covariance (e.g., cell size). Similarly, when removing batch effects is desired, fitting an interaction model per batch and removing the offset between fits can be effective (keep_offset = FALSE
).
col_name_sample
A character indicating the column name in data
defining each sample (cell lines, batches, treatments, etc). Clarifying examples are below.
center_SUCs
A character across_samples
or per_sample
defining how to zero-center the SUCs.
across samples
SUCs are zero-centered across all samples before applying the linear model. This is recommended when the relationship (e.g.: logFC) between samples is an unwanted covariance and it is thought to be affected by cell size or staining efficiency.
per sample
SUCs are zero-centered per sample applying the linear model. This is recommended when the relationship (e.g.: logFC) between samples is a desired covariance and is not affected by cell size or staining efficiency (more conservative)
keep_offset
As written above, the RUCova method eliminates the dependency of each measured marker on the SUCs by computing the model’s residuals and the intercept as the revised expression for each marker (ym∗i). If the offset between samples Om(ji is a desired covariance (e.g.: cell lines), then keep_offset = TRUE
, and the modified value of the marker is described by:
y∗mi(ji)=Om1(ji)+ϵmi
, where ϵmi are the residuals of the model.
If the offset between samples is an unwanted covariance (e.g.: batches), then keep_offset = FALSE
, and the modified value of the marker is:
y∗mi(ji)=ϵmi
## name_assay_after
A string specifying the name of the assay containing the data after applying RUCova.
Zero values
RUCova fits the chosen linear model including the zero values. By removing the correlation with the SUCs, zeros will get a value accordingly. Hence, RUCova addresses the issue of zero-values in mass cytometry data by assigning non-zero values while simultaneously removing such covariance. Examples below will further clarify this point.
Output data
The output of the function rucova
is a SingleCellExperiment containing the same information as the input SingleCellExperiment, plus:
Additional assay named name_assay_after
containing the modified marker intensity values in linear scale.
A metadata list named model_
concatenated with name_assay_after
accesible via metadata(sce)
. This list contains:
All the input variables: name_assay_before
, markers
, SUCs
, name_reduced_dim
, apply_asinh_SUCs
, model
, col_name_sample
, center_SUCs
, keep_offset
, `` (to remind you how you obtained this result).
model_formula
: the lm
model formula applied to each marker to regress-out surrogates of sources of unwanted covariance.
model_coefficients
: intercept Om(ji) and slope αmp of the fit for surrogate p and marker m (and sample j if applicable). This values are for dummy variables. For meaningful coefficients, refer to effective_coefficients
.
model_residuals
: the residuals of the fit ϵmi.
adj_r2
: adjusted R-squared or coefficient of determination to evaluate the goodness of fit for each marker. It quantify the proportion of the variance in the dependent variable (marker) that is explained by the independent variables (surrogates) in the model. The adjusted R-squared is adjusted by the number of independent variables used to predict the target variable. This is done to account for the automatic increase of R2 values when extra explanatory variables are added to the model. By analysing the R2adj,we can determine whether adding new variables increases the model fit.
R2adj=(1−(1−SSresSStot⏟R2))⋅n−1n−q−1
where SSres=∑iϵ2i is the residual sum of squares,ϵi are the residuals of the model, SStot=∑i(yi−¯y) is the total sum of squares, ¯y is the mean value of the marker,nis the sample size (total number of cells i) and q is the number of explanatory variables in the model.
stand_slopes
: We standardised the slope coefficients in order to make them comparable. In the case of the interaction model, where the slopes αm depend on samples ji and the SUC p, we standardised the slope by multiplying it by the standard deviation of the corresponding SUC in each sample j ( σxji,p ) and dividing it by the standard deviation of the marker m in sample j (σymji) :
αm∗ji,xi,p=αmji,xi,p⋅σxji,pσymji
Examples
M1: Simple model
As we have multiple cancer cell lines in the current data set, we run the simple model in only one cell line and across control and treated condition.
sce_Cal33 <- sce[,sce$line == "Cal33"]
Let’s give a look at the pearson correlation coefficients between markers before applying RUCova (symmetric matrix).
RUCova offers the option of returning the correlation coefficients on a matrix or directly the heatmap, which is automatically plotted and can also be stored:
matrix_corr <- RUCova::compare_corr(sce_Cal33[c(m,x)])
heatmap_corr <- RUCova::heatmap_compare_corr(sce_Cal33[c(m,x)])
RUCova::heatmap_compare_corr(sce_Cal33[c(m,x)])

Remove correlations
Let’s say you’re not convinced yet by the package, so you want to be conservative and remove correlations to all SUCs but not the logFC between conditions. In this case, center_SUCs = "per_sample"
.
Now we can compare the new Pearson correlation coefficients calculated from the assay counts_simple_persample
(after RUCova, upper triangle) to the original coefficients from the assay ``counts``` (lower triangle).
heatmap_compare_corr(sce_Cal33, name_assay_before = "counts", name_assay_after = "counts_simple_persample")

Log fold-changes between irradiated and control condition are kept (positive means higher in irradiated).
FC_before <- t(assay(sce_Cal33,"counts")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
mutate(data = "before RUCova")
FC_after <- t(assay(sce_Cal33,"counts_simple_persample")) |>
as.tibble() |> cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
ungroup() |>
mutate(data = "simple all, per sample")
rbind(FC_before,FC_after) |>
ggplot(aes(x = logFC, y = marker, fill = data)) +
geom_col(position = "dodge")

As radiation changes the cell volume, we think differences in protein intensities between treated and control are confounded. Hence, we want to remove any difference that correlates with the SUCs (center_SUCs = "across_sample"
).
heatmap_compare_corr(sce_Cal33, name_assay_before = "counts", name_assay_after = "counts_simple_acrosssamples")

Log fold-changes between irradiated and control condition are modified accordingly (positive means higher in irradiated).
FC_before <- t(assay(sce_Cal33,"counts")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
mutate(data = "before RUCova")
FC_after <- t(assay(sce_Cal33,"counts_simple_acrosssamples")) |>
as.tibble() |> cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
ungroup() |>
mutate(data = "simple all, per sample")
rbind(FC_before,FC_after) |>
ggplot(aes(x = logFC, y = marker, fill = data)) +
geom_col(position = "dodge")

Let’s imagine you want to be conservative and only remove correlations between markers and PC1 (of SUCs).
pca_cal33 <- t(assay(sce_Cal33,"counts")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
select(x) |>
mutate_all(asinh) |>
mutate_all(scale) |>
prcomp()
Calculate and plot the variance explained by each PC:
tibble(perc = as.numeric(pca_cal33$sdev^2/sum(pca_cal33$sdev^2))*100,
PC = 1:length(pca_cal33$sdev)) |>
ggplot(aes(x = PC, y = perc, label = round(perc,1))) +
geom_col() +
geom_label()

Check the loadings of each PC:
as.data.frame(pca_cal33$rotation) |>
rownames_to_column("x") |>
pivot_longer(names_to = "PC", values_to = "loadings", -x) |>
ggplot(aes(x = loadings, y = x)) +
geom_col() +
facet_wrap(~PC, nrow = 1)

In this example, PC1 has positive loadings. Meaning PC1 will positively correlate with the markers, which is intuitive if we think of it as the cell size. In case for your data set, PC1 has negative loadings, you can just the direction for a more intuitive analysis:
pca_cal33$x |> as.data.frame() |> mutate(PC1 = -PC1)
Add the PCA to the sce object under the name “PCA”:
name_reduced_dim = "PCA"
reducedDim(sce_Cal33, name_reduced_dim) <- pca_cal33$x
Then, SUCs= "PC1"
and apply_asinh_SUCs = FALSE
, as asinh transformation is not necessary on PCs (it was applied on SUCs before PCA). This applies to all models.
If we regress-out any PCs and want to check the correlation coefficient, it is important we specify now the name for the heatmap function to include it: ``name_reduced_dim = “PCA”```.
heatmap_compare_corr(sce_Cal33, name_assay_before = "counts", name_assay_after = "counts_simple_PC1", name_reduced_dim = "PCA")

Log fold-changes between irradiated and control condition are modified accordingly (positive means higher in irradiated).
FC_before <- t(assay(sce_Cal33,"counts")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
mutate(data = "before RUCova")
FC_after <- t(assay(sce_Cal33,"counts_simple_PC1")) |>
as.tibble() |> cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
ungroup() |>
mutate(data = "simple all, per sample")
rbind(FC_before,FC_after) |>
ggplot(aes(x = logFC, y = marker, fill = data)) +
geom_col(position = "dodge")

Adjusted R-squared
Let’s compare the adjusted R² per marker (goodness of fit, see complete definition above).
r2_a <- metadata(sce_Cal33)$model_counts_simple_acrosssamples$adjr2 |> mutate(model = "simple all, across samples")
r2_b <- metadata(sce_Cal33)$model_counts_simple_persample$adjr2|> mutate(model = "simple all, per sample")
r2_c <- metadata(sce_Cal33)$model_counts_simple_PC1$adjr2 |> mutate(model = "simple PC1, across sample")
rbind(rbind(r2_a,r2_b),r2_c) |>
ggplot(aes(x = adj_r_squared, y = marker, fill = model)) +
geom_col(position = "dodge")

In general, across all markers, the adjusted R^2 is higher when using all SUCs as predictive variables (instead of just PC1) and when SUCs are center across samples (residuals are smaller).
rbind(rbind(r2_a,r2_b),r2_c) |>
ggplot(aes(y = adj_r_squared, x = model, color = model)) +
geom_boxplot()

The decision of which model parameters to use should be based on: (1) the extent of variance to remove (all SUCs or some PCs) and (2) whether to keep or not the FC between samples.
Standardised slopes
We can also compare the slopes for each marker in each model.
slope_a <- metadata(sce_Cal33)$model_counts_simple_acrosssamples$stand_slopes |> mutate(model = "simple all, across samples")
slope_b <- metadata(sce_Cal33)$model_counts_simple_persample$stand_slopes |> mutate(model = "simple all, per sample")
slope_c <- metadata(sce_Cal33)$model_counts_simple_PC1$stand_slopes |> mutate(model = "simple PC1, across sample")
rbind(slope_a, slope_b, slope_c) |>
mutate(coef_key = factor(coef_key, levels = c("mean_DNA", "mean_BC","pan_Akt", "total_ERK", "PC1"))) |>
ggplot(aes(x = stand_value, y = marker, fill = model)) +
geom_col(position = "dodge") +
facet_wrap(~coef_key, nrow = 1)

M2: Offset model
Here we just show one example: Let’s say different samples (dose) within Cal33 are different batches, and we would like to adjust for it in addition to removing unwanted correlations with all SUCs. Then keep_offset = FALSE
.
sce_Cal33 <- RUCova::rucova(sce = sce_Cal33,
name_assay_before = "counts",
markers = m,
SUCs = x,
apply_asinh_SUCs = TRUE,
keep_offset = FALSE,
model = "offset",
center_SUCs = "per_sample",
col_name_sample = "dose",
name_assay_after = "counts_offset_persample")
Fold-change between conditions (“dose”) becomes zero:
FC_before <- t(assay(sce_Cal33,"counts")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
mutate(data = "before RUCova") |>
ungroup()
FC_after <- t(assay(sce_Cal33,"counts_offset_persample")) |>
as.tibble() |>
cbind(colData(sce_Cal33)) |>
mutate_at(vars(x,m), asinh) |>
pivot_longer(names_to = "marker", values_to = "value", c(x,m)) |>
group_by(marker) |>
summarise(logFC = mean(value[dose=="10Gy"])-mean(value[dose=="0Gy"])) |>
ungroup() |>
mutate(data = "offset all, per sample")
rbind(FC_before, FC_after) |>
ggplot(aes(x = logFC, y = marker, fill = data)) +
geom_col(position = "dodge")

Different intercepts between samples:
metadata(sce_Cal33)$model_counts_offset_persample$eff_coefficients |>
filter(surrogate == FALSE) |>
ggplot(aes(x = eff_value, y = marker, fill = sample)) +
geom_col(position = "dodge") +
xlab("eff_value (intercept)")

M3: Interaction model
Here we just show one example, the least conservative: we eliminate differences between cell lines that correlate with the SUCs (center_SUCs = "across_samples"
) and apply one fit per cell line (model = "interaction"
). Then:
sce <- RUCova::rucova(sce = sce,
name_assay_before = "counts",
markers = m,
SUCs = x,
apply_asinh_SUCs = TRUE,
model = "interaction",
center_SUCs = "across_samples",
col_name_sample = "line",
name_assay_after = "counts_interaction_persample")
Different slopes for different cell lines:
metadata(sce)$model_counts_interaction_persample$eff_coefficients |>
filter(surrogate != FALSE) |>
ggplot(aes(x = eff_value, y = marker, fill = sample)) +
geom_col(position = "dodge") +
facet_wrap(~surrogate) +
xlab("eff_value (slope)")
