Tips for Running Renoir Effectively¶
This page collects practical guidance for getting the most out of Renoir across different spatial transcriptomics technologies and experimental designs. Each tip references the exact parameters you need to adjust.
NOTE: We are currently adding more tips to this page to improve tool performance.
1. Choosing the Optimal Neighborhood Size¶
The neighborhood definition is set in compute_neighborhood_scores via the
technology, use_radius, and radius parameters. The right choice depends on
your platform’s resolution and the biological scale of the signalling you expect.
Visium (55 µm spots, ~6 spot diameter)¶
Visium spots are large and sparsely packed. The default hexagonal ring neighborhood (the 6 immediately adjacent spots) is a good starting point and requires no extra parameters:
neighborhood_scores = Renoir.compute_neighborhood_scores(
...,
single_cell=False, # spot-level mode
use_radius=False, # use default hex-ring neighbor definition
)
If the signal looks too local (e.g., domains are only 1–2 spots wide), expand to a wider neighborhood by switching to radius mode:
neighborhood_scores = Renoir.compute_neighborhood_scores(
...,
single_cell=False,
use_radius=True,
radius=200, # 200 coordinate units — check your dataset's coordinate scale
)
Visium HD (8 µm bins)¶
Visium HD produces much denser data. Because individual bins are small, a larger radius is needed to capture biologically meaningful neighborhoods. Start with a moderate value and increase if domains appear fragmented:
neighborhood_scores = Renoir.compute_neighborhood_scores(
...,
single_cell=False,
use_radius=True,
radius=50, # in coordinate units — adjust based on your dataset's scale
)
CosMx / Xenium / MERSCOPE (single-cell resolution)¶
For single-cell platforms, enable single_cell=True and set radius in
the same coordinate units as your AnnData’s spatial coordinates:
neighborhood_scores = Renoir.compute_neighborhood_scores(
...,
single_cell=True,
use_radius=True,
radius=150, # in coordinate units — check obsm['spatial'] for your dataset's scale
)
Important:
radiusis always in the same units as the X, Y coordinates stored in your AnnData (typicallyobsm['spatial']). These units differ between datasets and platforms. Always inspect your coordinates before choosing a radius value:import pandas as pd coords = pd.DataFrame(neighborhood_scores.obsm['spatial'], columns=['x', 'y']) print(coords.describe()) # check the range to understand the coordinate scale
General rule of thumb:
Technology |
Neighborhood mode |
|
|---|---|---|
Visium |
|
|
Visium HD |
|
|
CosMx |
|
|
Xenium |
|
|
MERSCOPE |
|
|
Tip: If you are unsure, run
downstream_analysisat two or three radii and compare the spatial coherence of the resulting domains. Domains that look biologically sensible (contiguous regions matching known tissue compartments) are a good sign that the neighborhood size is appropriate.
2. Creating Your Own Curated Ligand–Receptor and Ligand–Target Pair Lists¶
Renoir accepts custom pair lists via the ligand_receptor_path and pairs_path
arguments in compute_neighborhood_scores. Both are simply CSV files with
specific column names.
Ligand–receptor pairs (ligand_receptor_path)¶
The file must have at least two columns named ligand and receptor:
ligand,receptor
TGFB1,TGFBR1
TGFB1,TGFBR2
IL6,IL6R
VEGFA,FLT1
VEGFA,KDR
Sources to build from:
CellChat DB — curated, literature-backed
OmniPath — large, multi-resource aggregate
NATMI — what the tutorials use
CellPhoneDB — includes multi-subunit complexes
To filter to your biology of interest (e.g., only cytokine signalling):
import pandas as pd
lr = pd.read_csv('All_human_lrpairs.csv')
# Keep only cytokine-related ligands (example gene list)
cytokines = ['IL6', 'IL1B', 'TNF', 'IFNG', 'CXCL10', 'CCL2']
lr_cytokine = lr[lr['ligand'].isin(cytokines)]
lr_cytokine.to_csv('cytokine_lrpairs.csv', index=False)
Ligand–target pairs (pairs_path)¶
The file must have columns named ligand and target:
ligand,target
IL6,STAT3
IL6,MYC
TGFB1,SNAI1
VEGFA,HIF1A
How to generate a ranked list:
The bundled top-N files (top_10_target_opt_both_ordered.csv,
top_100_target_opt_both_ordered.csv) are derived from NicheNet’s regulatory
potential matrix. To build your own ranked list from NicheNet:
import pandas as pd
# Load the full NicheNet regulatory potential matrix
# (available at https://zenodo.org/record/3260758)
reg = pd.read_csv('ligand_target_matrix.csv', index_col=0)
# Keep only your ligands of interest
ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA']
reg_subset = reg.loc[ligands_of_interest]
# For each ligand, keep the top-N targets by regulatory potential score
top_n = 20
rows = []
for ligand in reg_subset.index:
top_targets = reg_subset.loc[ligand].nlargest(top_n)
for target, score in top_targets.items():
rows.append({'ligand': ligand, 'target': target, 'score': score})
custom_pairs = pd.DataFrame(rows).sort_values('score', ascending=False)
custom_pairs.to_csv('my_ligand_target_pairs.csv', index=False)
Tip: Start with a smaller, focused pair list (top 10–20 pairs per ligand) rather than top 100+. Downstream clustering is faster and domains are often more interpretable when the signal is not diluted by low-scoring pairs.
3. Generating Your Own Ligand–Target Regulatory Potential Scores¶
The ligand_target_regulatory_potential object used by ligand_ranking is a
precomputed dictionary (or DataFrame) mapping each ligand to regulatory potential
scores across its top target genes. There are three ways to produce this.
Option A — Use the NicheNet matrix directly (recommended)¶
The NicheNet team publish a precomputed human and mouse matrix on Zenodo (record 3260758). Load it and convert to the format Renoir expects:
import pandas as pd, pickle
# Download from: https://zenodo.org/record/3260758
reg = pd.read_csv('ligand_target_matrix.csv', index_col=0) # ligands × targets
# Keep only the top 500 targets per ligand (matching the bundled file)
top_500 = reg.apply(lambda row: row.nlargest(500), axis=1)
# Save as a pickle
with open('top_500_target_opt_both_scores.pkl', 'wb') as f:
pickle.dump(top_500, f)
Option B — Use an existing regulatory database¶
Rather than inferring a gene regulatory network from data (which is unreliable in practice), it is better to derive regulatory potentials from a curated transcription factor–target database. Two well-maintained options are:
DoRothEA — curated TF–target regulons with confidence levels (A–E); available for human and mouse.
CollecTRI — a comprehensive signed TF–target network compiled from literature and ChIP-seq data.
Both can be accessed via the decoupler Python package and converted to the
format Renoir expects:
import decoupler as dc
import pandas as pd, pickle
# Load DoRothEA regulons (human, confidence levels A and B only)
dorothea = dc.get_dorothea(organism='human', levels=['A', 'B'])
# dorothea has columns: 'source' (TF), 'target', 'weight', 'confidence'
# Map TFs to their upstream ligands using your ligand-receptor table
# (i.e., if a receptor activates a TF, then the ligand -> TF -> target chain
# gives you the ligand's regulatory potential over that target)
lr = pd.read_csv('All_human_lrpairs.csv') # columns: ligand, receptor
# Build a ligand -> target score table via receptor -> TF -> target
rows = []
for _, row in lr.iterrows():
ligand, receptor = row['ligand'], row['receptor']
# Find targets regulated by TFs known to be downstream of this receptor
# (requires a receptor -> TF mapping from e.g. OmniPath kinase-substrate)
tf_targets = dorothea[dorothea['source'] == receptor]
for _, t in tf_targets.iterrows():
rows.append({'ligand': ligand, 'target': t['target'], 'score': t['weight']})
reg_potential = pd.DataFrame(rows)
reg_potential = reg_potential.groupby(['ligand', 'target'])['score'].max().unstack(fill_value=0)
with open('dorothea_reg_potential.pkl', 'wb') as f:
pickle.dump(reg_potential, f)
Option C — Restrict the bundled matrix to your ligands of interest¶
If you only care about a subset of ligands, slice the bundled pickle to reduce
memory and computation during ligand_ranking:
import pickle, pandas as pd
with open('top_500_target_opt_both_scores.pkl', 'rb') as f:
reg = pickle.load(f)
ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA', 'CXCL10']
reg_subset = reg.loc[[l for l in ligands_of_interest if l in reg.index]]
with open('subset_reg_potential.pkl', 'wb') as f:
pickle.dump(reg_subset, f)
4. Defining Communication Domains¶
Renoir offers two strategies for identifying communication domains. Which one to use depends on whether you want data-driven discovery or hypothesis-driven annotation.
Strategy A — Data-driven Leiden clustering (downstream_analysis)¶
Use this when you have no prior knowledge of how many regions exist or where they are. Renoir reduces the score matrix to pathway PCs and clusters with Leiden:
neighbscore_copy, pcs = Renoir.downstream_analysis(
neighborhood_scores,
ltpair_clusters=pathways,
resolution=0.6, # key parameter — see below
return_cluster=True,
return_pcs=True,
)
Tuning resolution:
The effect of resolution is highly data-dependent and no single range applies universally. From experiments across datasets, the following has been observed as a rough guide for Visium data:
Resolution |
Typical outcome (Visium) |
|---|---|
0.1 – 0.3 |
3–5 broad domains (tumor / stroma / immune) |
0.4 – 0.7 |
5–10 mid-grain domains — good starting point |
0.8 – 1.5 |
10+ fine-grain domains; risk of over-fragmentation |
For high-resolution single-cell platforms (CosMx, Xenium, MERSCOPE), the optimal resolution is often much lower. Values below 0.1 have been used successfully in practice — the much higher cell density means that even a very low resolution produces a meaningful number of domains. Start at 0.05 and increase gradually.
The best approach regardless of platform is to sweep a range and evaluate visually:
for res in [0.3, 0.5, 0.8, 1.0]:
copy, _ = Renoir.downstream_analysis(
neighborhood_scores,
ltpair_clusters=pathways,
resolution=res,
return_cluster=True,
)
sc.pl.spatial(copy, color='leiden', title=f'resolution={res}', size=1.4)
Strategy B — Cell-type-informed clustering (spot_v_spot)¶
Use spot_v_spot when cell-type co-localisation matters as much as signalling
similarity. It combines the L–T score matrix with pairwise cosine similarity of
cell-type abundances, so domains reflect both what signalling is happening and
which cell types are co-localised:
rd.spot_v_spot(
neighborhood_scores,
celltype,
resolution=0.8,
ltpair_clusters=pathways,
pdf_path='spot_v_spot_output.pdf',
)
Strategy C — User-defined regions of interest¶
If you already know which spots belong to a region of interest (e.g., from
pathologist annotations, RCTD output, or manual selection in Loupe Browser),
assign the labels directly to obs['leiden'] and skip clustering entirely:
import pandas as pd
# Load your manual annotations: a CSV with columns 'barcode' and 'region'
annotations = pd.read_csv('manual_annotations.csv', index_col='barcode')
# Assign to the neighborhood scores object
neighborhood_scores.obs['leiden'] = annotations.loc[
neighborhood_scores.obs_names, 'region'
].astype('category')
# Now run DE and ligand ranking as normal — Renoir treats the labels the same
# regardless of whether they came from clustering or manual annotation
sc.tl.rank_genes_groups(neighborhood_scores, 'leiden', method='wilcoxon')
Tip: Manual annotations and Leiden clusters can be mixed. Annotate the regions you understand (tumor core, necrotic zone) and let Leiden cluster the rest — then merge the two
obscolumns before runningligand_ranking.
5. Choosing Cell Types and Providing Custom Markers for Ligand Ranking¶
ligand_ranking has two parameters that give you fine-grained control over
which cell types are analysed and which ligand–target pairs are used as the
ranking signal.
Controlling which cell types are included (domain_celltypes)¶
Option 1 — Top-N by abundance (default):
fig = Renoir.ligand_ranking(
...,
domain_celltypes=['top', 5], # top 5 most abundant cell types in the domain
)
Option 2 — Explicit cell type list:
Pass a list of cell-type names to restrict analysis to only those types, regardless of their abundance in the domain. This is useful when you have a biological hypothesis (e.g., “I only care about T cell – tumour interactions”):
fig = Renoir.ligand_ranking(
...,
domain_celltypes=['Cancer Basal SC', 'T cells CD8+', 'Macrophage'],
)
The cell-type names must match the column names in your celltype AnnData
exactly (check with celltype.var_names or celltype.to_df().columns).
Controlling the ranking signal (markers)¶
The markers parameter defines which ligand–target pairs are used to score
each ligand’s predicted activity in the domain.
Option 1 — Top-N DE marker pairs (default):
fig = Renoir.ligand_ranking(
...,
markers={'top': 100}, # use the top 100 DE pairs of the domain
)
Increase top if you want a broader ranking signal; decrease it to focus on
only the most domain-specific pairs.
Option 2 — User-defined cell-type marker genes:
Instead of using DE pairs from the domain, you can supply your own curated marker genes per cell type as a dictionary. The keys are cell-type names and the values are lists of marker genes. Renoir uses these to identify which ligands best explain the activity of those markers within the domain:
# Define known marker genes per cell type
custom_markers = {
'Cancer Basal SC': ['KRT5', 'KRT14', 'TP63', 'CDH3'],
'T cells CD8+': ['CD8A', 'CD8B', 'GZMB', 'PRF1', 'IFNG'],
'Macrophage': ['CD68', 'CD163', 'MRC1', 'CSF1R'],
'CAFs myCAF-like': ['ACTA2', 'FAP', 'POSTN', 'THY1'],
}
fig = Renoir.ligand_ranking(
...,
markers=custom_markers,
)
This is particularly useful when:
The domain of interest is small and
rank_genes_groupsreturns few DE pairs.You have strong prior knowledge about which cell types are biologically relevant in the domain and want to anchor the ranking to known biology.
You want results that are directly comparable across datasets or studies, where DE-derived markers would differ due to batch or composition differences.
The marker gene names must match the gene names in your scRNA-seq reference
AnnData (SC.var_names).
Filtering ligands by receptor expression (receptor_exp)¶
receptor_exp sets the minimum fraction of spots in the domain that must
express a ligand’s receptor. Raise it to be more stringent (only ligands the
domain is definitely listening to), or lower it to capture weak but potentially
important signals:
# Strict: receptor must be expressed in ≥10% of domain spots (default)
fig = Renoir.ligand_ranking(..., receptor_exp=0.1)
# Permissive: receptor expressed in ≥1% of domain spots
fig = Renoir.ligand_ranking(..., receptor_exp=0.01)
Tip: If
ligand_rankingreturns very few ligands, it is usually becausereceptor_expis too high for your data. Try lowering it to0.01and check whether the additional ligands make biological sense before committing to the lower threshold.
Quick Reference¶
Goal |
Parameter |
Where |
|---|---|---|
Wider neighborhood |
|
|
Single-cell mode |
|
|
Fewer, broader domains (Visium) |
|
|
More, finer domains (Visium) |
|
|
High-res platforms (CosMx etc.) |
|
|
Cell-type-aware domains |
use |
— |
Manual region annotations |
assign |
— |
Focus on specific cell types |
|
|
DE-driven ranking signal |
|
|
Custom marker gene ranking |
|
|
Strict receptor filter |
|
|
Permissive receptor filter |
|
|