- `lemur` now automatically inserts the variables from the design formula into the `group_by` argument of `find_de_neighborhoods` (thanks Katha for pushing for this feature).
- The formula parsing automatically detects global variables and adds them to the `colData`. This avoids problems with the random test/training assignment.
- Duplicate column names in `colData` are no longer allowed.
- Require `harmony` version >= 1.2.0 (thanks Maija for reporting the problem).
- Make the `predict` function faster and less memory-intensive for subsetted fits.
- Speed up the internal function `get_groups`.
- Gracefully handle duplicated column names in `colData(fit)`.
- Give a better error message in `test_de` if `cond(..)` is used for a fit that was not specified with a design formula (thanks @MaximilianNuber for reporting).
- Bug fix in the subsetting logic affecting `predict` and `test_de`. The problem occurred if a `fit` object was subsetted with indices or gene names that changed the order, and it resulted in predictions in the wrong order.
- Submission to Bioconductor, thus the jump in version number.
- Adjusted internal code to handle breaking changes in `harmony` v1.0.0.
- Multiple small fixes to comply with Bioconductor guidelines (see Bioconductor/Contributions#3152).
- Instead of `include_complement`, the `find_de_neighborhoods` function gains an `add_diff_in_diff` argument. If it is `TRUE`, the function calculates the difference between the DE results inside and outside the neighborhood.
- Rename the `indices` column to `neighborhood` and store a list of cell-name vectors in it in the output of `find_de_neighborhoods`.
- Enforce unique column and row names.
- Make the neighborhoods more consistent: (1) include cells which are connected to many cells inside the neighborhood, (2) exclude cells from the neighborhood which are not well connected to the other cells in the neighborhood.
- Add a `control_parameters` argument to `find_de_neighborhoods`.
- Add `BiocNeighbors` as a dependency.
- Detect problematic neighborhoods and skip them.
- Replace `test_data_cell_size_factors` with `size_factor_method`, which is more flexible. Setting `size_factor_method = "ratio"` uses the size factor method described in the original DESeq paper.
- Fix a bug in `find_de_neighborhoods` that accidentally included additional zeros in each neighborhood pseudobulk. The test should have more power now.
- Expose the `min_neighborhood_size` argument in `find_de_neighborhoods`.
- Add a `test_data_cell_size_factors` argument to `find_de_neighborhoods`, which is useful if the function is called with a subsetted `fit` argument.
- Improve the alignment functions: simplify the algorithm, find a linear approximation to Harmony's steps, and include an intercept.
- Avoid calling private methods from `harmony`.
- Convert character columns in `colData` to factors to avoid problems when dividing the data into test and training data.
- Fix a bug in `find_de_neighborhoods` where I didn't embrace an argument with `{{ }}`.
- Remove the `BiocNeighbors` dependency.
- Minor bug fix in `find_de_neighborhoods`: the function threw an error if `alignment_design != design`.
- Better error messages if `find_de_neighborhoods` is called without having called `test_de` before.
- Change the defaults for `find_de_neighborhoods`: increase the `ridge_penalty` and add a `min_neighborhood_size = 10` argument to avoid creating very small neighborhoods.
- Add a new `test_fraction` argument to the `lemur()` function. It automatically defines a random hold-out dataset that is not used in the fitting step. This hold-out data is used to infer the differential expression of the neighborhoods in `find_de_neighborhoods`. This change addresses the double-dipping problem: previously, it was left to the user to provide an independent matrix for the `find_de_neighborhoods` function.
- As a consequence of these changes, the structure of `lemur_fit` objects has changed. They gain three new fields called `fit$test_data`, `fit$training_data`, and `fit$is_test_data`.
- The order and names of the arguments for `find_de_neighborhoods` have changed.
- Remove the `alignment_method` field from `lemur_fit` objects, as it was not used for anything.
- Rename the `alignment_template` argument of `align_by_template` to `template`.
- Tweak algorithm for alignment to take cluster sizes into account during optimization
- Change in the alignment model. Previously, the method tried to align cells using rotations and/or stretching; however, it could not represent reflections! To fix this, I now allow arbitrary linear transformations, where $R(x) = (I + \sum_k x_k V_k)^{-1}$. The new alignment is more flexible and easier to infer. The downside is that the term inside the parentheses can be singular, which would lead to an error.
- Skip the iteration step: first infer the centering and then infer the latent space. Previously, I iterated between these steps, but that either didn't add anything or actually degraded the results.
- Set `center = FALSE` in `find_base_point`. Centering the data before fitting the base point caused problems and made the data look less integrated in some cases.
- Remove the ambient PCA step. It was originally conceived as a performance optimization; however, it had detrimental effects on the inference. It has been skipped by default for a few versions, so removing it should not change the inference.
- Add `linear_coefficient_estimator` to give more flexibility in how, or whether, the conditions are centered.
- Reduce the `minimum_cluster_membership` default to `0.001` in `align_harmony` to make it more sensitive.
- Make `test_global` an internal function again until it has been tested more extensively.
- Remove the `base_point` argument from `lemur()`. It wasn't used anyway.
- Refactor `find_de_neighborhoods`: the function can now combine the results of different directions, selection criteria, and pseudobulk tests (on counts or continuous values). To implement this, I changed the names of the arguments and added parameters.
- Remove many superfluous method generics and only provide access via `$`.
- Fix documentation warnings
- Rename class from 'lemur_fit_obj' to 'lemur_fit'
- Store `contrast` in `lemur_fit` after calling `test_de`.
- Add an option to fit a count model in `find_de_neighborhoods` with edgeR.
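The `add_diff_in_diff` entry compares the DE results inside and outside a neighborhood. The notation below is introduced here for illustration only and is not lemur's. Let $\delta_g^{\text{in}}$ be the estimated difference in expression of gene $g$ between two conditions $A$ and $B$ among the cells inside neighborhood $N$, and let $\delta_g^{\text{out}}$ be the same estimate among the cells outside $N$. Both could be, for example, log fold changes. The difference-in-differences is then:

$$
\Delta_g = \delta_g^{\text{in}} - \delta_g^{\text{out}}
$$

A $\Delta_g$ close to zero means the condition effect is about the same inside and outside the neighborhood. A large $|\Delta_g|$ means the effect is specific to the neighborhood.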
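The `size_factor_method = "ratio"` entry refers to the median-of-ratios size factors from the original DESeq paper. Each sample's factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across samples. The pure-Python sketch below illustrates that estimator. The function name `ratio_size_factors` is hypothetical, and this is not lemur's implementation.

```python
import math
import statistics


def ratio_size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts -- list of genes, each a list of non-negative counts (one per sample)
    """
    n_samples = len(counts[0])
    # The geometric mean is zero for genes with any zero count, so skip them
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has a zero count in at least one sample")
    # Log of each gene's geometric mean across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j's count to the gene's geometric mean, per gene
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        # Taking the median in log space and exponentiating gives the median ratio
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors
```

For example, the counts `[[10, 20], [5, 10], [100, 200]]` (three genes, two samples) give factors of about 0.707 and 1.414, because the second sample has twice the sequencing depth of the first. Genes with a zero in any sample are excluded, which is a choice of this sketch.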
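The `test_fraction` entry describes holding part of the data out of the fit so that `find_de_neighborhoods` can test on independent cells. Below is a minimal sketch of one way to draw such a split. It assumes a uniform random assignment of a fixed number of cells. The helper `assign_test_cells` is hypothetical, and its boolean output only plays a role analogous to `fit$is_test_data`. It does not reproduce lemur's actual assignment.

```python
import random


def assign_test_cells(n_cells, test_fraction, seed=None):
    """Return one boolean per cell: True if the cell is held out as test data.

    Hypothetical helper: exactly round(n_cells * test_fraction) cells are drawn
    uniformly at random without replacement; all other cells are training data.
    """
    if not 0.0 <= test_fraction < 1.0:
        raise ValueError("test_fraction must be in [0, 1)")
    rng = random.Random(seed)  # seeded generator for a reproducible split
    n_test = round(n_cells * test_fraction)
    test_idx = set(rng.sample(range(n_cells), n_test))
    return [i in test_idx for i in range(n_cells)]
```

Under this scheme, the fitting step would use only the cells flagged `False`. The neighborhood DE tests would use only the cells flagged `True`, which avoids double dipping.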
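The alignment-model entry defines $R(x) = (I + \sum_k x_k V_k)^{-1}$ and notes that the matrix inside the parentheses can be singular. The pure-Python sketch below does three things:

- builds that matrix for a covariate vector `x` and basis matrices `V_k`;
- inverts it with Gauss-Jordan elimination;
- raises an error in the singular case.

The function name `alignment_matrix` and the `1e-12` singularity tolerance are assumptions for illustration, not part of lemur.

```python
def alignment_matrix(x, V, tol=1e-12):
    """Return R(x) = (I + sum_k x_k V_k)^{-1} as a list of rows.

    x   -- list of K covariate values x_k
    V   -- list of K square n-by-n matrices V_k (each a list of rows), K >= 1
    tol -- pivots with absolute value below this are treated as zero
    """
    if not V or len(x) != len(V):
        raise ValueError("x and V must be non-empty and have the same length")
    n = len(V[0])
    # M = I + sum_k x_k V_k
    M = [[(1.0 if i == j else 0.0) + sum(xk * Vk[i][j] for xk, Vk in zip(x, V))
          for j in range(n)]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [M | I]
    aug = [M[i] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("I + sum_k x_k V_k is singular, so R(x) is undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds M^{-1}
    return [row[n:] for row in aug]
```

Consider `x = [1.0]` and `V_1 = diag(1, -2)`. The matrix inside the parentheses is then `diag(2, -1)`, so `R(x) = diag(0.5, -1)`. That transformation includes a reflection along the second axis, which a pure rotation-and-stretching model cannot express. With `V_1 = diag(-1, 0)` the same call raises the singularity error.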