Tips for Running Renoir Effectively¶

This page collects practical guidance for getting the most out of Renoir across different spatial transcriptomics technologies and experimental designs. Each tip references the exact parameters you need to adjust.

NOTE: We are currently adding more tips to this page to improve tool performance.

1. Choosing the Optimal Neighborhood Size¶

The neighborhood definition is set in compute_neighborhood_scores via the technology, use_radius, and radius parameters. The right choice depends on your platform’s resolution and the biological scale of the signalling you expect.

Visium (55 µm spots, ~6 spot diameter)¶

Visium spots are large and sparsely packed. The default hexagonal ring neighborhood (the 6 immediately adjacent spots) is a good starting point and requires no extra parameters:

neighborhood_scores = Renoir.compute_neighborhood_scores(
    ...,
    single_cell=False,   # spot-level mode
    use_radius=False,    # use default hex-ring neighbor definition
)

If the signal looks too local (e.g., domains are only 1–2 spots wide), expand to a wider neighborhood by switching to radius mode:

neighborhood_scores = Renoir.compute_neighborhood_scores(
    ...,
    single_cell=False,
    use_radius=True,
    radius=200,         # 200 coordinate units — check your dataset's coordinate scale
)

Visium HD (8 µm bins)¶

Visium HD produces much denser data. Because individual bins are small, a larger radius is needed to capture biologically meaningful neighborhoods. Start with a moderate value and increase if domains appear fragmented:

neighborhood_scores = Renoir.compute_neighborhood_scores(
    ...,
    single_cell=False,
    use_radius=True,
    radius=50,          # in coordinate units — adjust based on your dataset's scale
)

CosMx / Xenium / MERSCOPE (single-cell resolution)¶

For single-cell platforms, enable single_cell=True and set radius in the same coordinate units as your AnnData’s spatial coordinates:

neighborhood_scores = Renoir.compute_neighborhood_scores(
    ...,
    single_cell=True,
    use_radius=True,
    radius=150,         # in coordinate units — check obsm['spatial'] for your dataset's scale
)

Important: radius is always in the same units as the X, Y coordinates stored in your AnnData (typically obsm['spatial']). These units differ between datasets and platforms. Always inspect your coordinates before choosing a radius value:
import pandas as pd
coords = pd.DataFrame(neighborhood_scores.obsm['spatial'], columns=['x', 'y'])
print(coords.describe())  # check the range to understand the coordinate scale

General rule of thumb:

Technology	Neighborhood mode	`single_cell`
Visium	`use_radius=False` (hex ring default)	`False`
Visium HD	`use_radius=True`, tune radius to your coordinate scale	`False`
CosMx	`use_radius=True`, tune radius to your coordinate scale	`True`
Xenium	`use_radius=True`, tune radius to your coordinate scale	`True`
MERSCOPE	`use_radius=True`, tune radius to your coordinate scale	`True`

Tip: If you are unsure, run downstream_analysis at two or three radii and compare the spatial coherence of the resulting domains. Domains that look biologically sensible (contiguous regions matching known tissue compartments) are a good sign that the neighborhood size is appropriate.

2. Creating Your Own Curated Ligand–Receptor and Ligand–Target Pair Lists¶

Renoir accepts custom pair lists via the ligand_receptor_path and pairs_path arguments in compute_neighborhood_scores. Both are simply CSV files with specific column names.

Ligand–receptor pairs (`ligand_receptor_path`)¶

The file must have at least two columns named ligand and receptor:

ligand,receptor
TGFB1,TGFBR1
TGFB1,TGFBR2
IL6,IL6R
VEGFA,FLT1
VEGFA,KDR

Sources to build from:

CellChat DB — curated, literature-backed
OmniPath — large, multi-resource aggregate
NATMI — what the tutorials use
CellPhoneDB — includes multi-subunit complexes

To filter to your biology of interest (e.g., only cytokine signalling):

import pandas as pd

lr = pd.read_csv('All_human_lrpairs.csv')

# Keep only cytokine-related ligands (example gene list)
cytokines = ['IL6', 'IL1B', 'TNF', 'IFNG', 'CXCL10', 'CCL2']
lr_cytokine = lr[lr['ligand'].isin(cytokines)]
lr_cytokine.to_csv('cytokine_lrpairs.csv', index=False)

Ligand–target pairs (`pairs_path`)¶

The file must have columns named ligand and target:

ligand,target
IL6,STAT3
IL6,MYC
TGFB1,SNAI1
VEGFA,HIF1A

How to generate a ranked list:

The bundled top-N files (top_10_target_opt_both_ordered.csv, top_100_target_opt_both_ordered.csv) are derived from NicheNet’s regulatory potential matrix. To build your own ranked list from NicheNet:

import pandas as pd

# Load the full NicheNet regulatory potential matrix
# (available at https://zenodo.org/record/3260758)
reg = pd.read_csv('ligand_target_matrix.csv', index_col=0)

# Keep only your ligands of interest
ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA']
reg_subset = reg.loc[ligands_of_interest]

# For each ligand, keep the top-N targets by regulatory potential score
top_n = 20
rows = []
for ligand in reg_subset.index:
    top_targets = reg_subset.loc[ligand].nlargest(top_n)
    for target, score in top_targets.items():
        rows.append({'ligand': ligand, 'target': target, 'score': score})

custom_pairs = pd.DataFrame(rows).sort_values('score', ascending=False)
custom_pairs.to_csv('my_ligand_target_pairs.csv', index=False)

Tip: Start with a smaller, focused pair list (top 10–20 pairs per ligand) rather than top 100+. Downstream clustering is faster and domains are often more interpretable when the signal is not diluted by low-scoring pairs.

3. Generating Your Own Ligand–Target Regulatory Potential Scores¶

The ligand_target_regulatory_potential object used by ligand_ranking is a precomputed dictionary (or DataFrame) mapping each ligand to regulatory potential scores across its top target genes. There are three ways to produce this.

Option A — Use the NicheNet matrix directly (recommended)¶

The NicheNet team publish a precomputed human and mouse matrix on Zenodo (record 3260758). Load it and convert to the format Renoir expects:

import pandas as pd, pickle

# Download from: https://zenodo.org/record/3260758
reg = pd.read_csv('ligand_target_matrix.csv', index_col=0)  # ligands × targets

# Keep only the top 500 targets per ligand (matching the bundled file)
top_500 = reg.apply(lambda row: row.nlargest(500), axis=1)

# Save as a pickle
with open('top_500_target_opt_both_scores.pkl', 'wb') as f:
    pickle.dump(top_500, f)

Option B — Use an existing regulatory database¶

Rather than inferring a gene regulatory network from data (which is unreliable in practice), it is better to derive regulatory potentials from a curated transcription factor–target database. Two well-maintained options are:

DoRothEA — curated TF–target regulons with confidence levels (A–E); available for human and mouse.
CollecTRI — a comprehensive signed TF–target network compiled from literature and ChIP-seq data.

Both can be accessed via the decoupler Python package and converted to the format Renoir expects:

import decoupler as dc
import pandas as pd, pickle

# Load DoRothEA regulons (human, confidence levels A and B only)
dorothea = dc.get_dorothea(organism='human', levels=['A', 'B'])
# dorothea has columns: 'source' (TF), 'target', 'weight', 'confidence'

# Map TFs to their upstream ligands using your ligand-receptor table
# (i.e., if a receptor activates a TF, then the ligand -> TF -> target chain
# gives you the ligand's regulatory potential over that target)
lr = pd.read_csv('All_human_lrpairs.csv')  # columns: ligand, receptor

# Build a ligand -> target score table via receptor -> TF -> target
rows = []
for _, row in lr.iterrows():
    ligand, receptor = row['ligand'], row['receptor']
    # Find targets regulated by TFs known to be downstream of this receptor
    # (requires a receptor -> TF mapping from e.g. OmniPath kinase-substrate)
    tf_targets = dorothea[dorothea['source'] == receptor]
    for _, t in tf_targets.iterrows():
        rows.append({'ligand': ligand, 'target': t['target'], 'score': t['weight']})

reg_potential = pd.DataFrame(rows)
reg_potential = reg_potential.groupby(['ligand', 'target'])['score'].max().unstack(fill_value=0)

with open('dorothea_reg_potential.pkl', 'wb') as f:
    pickle.dump(reg_potential, f)

Note: A full receptor → TF mapping requires an additional signalling resource such as OmniPath (via omnipath Python package) or SignaLink. The NicheNet matrix (Option A) already integrates all of these layers internally, which is why it remains the recommended starting point.

Option C — Restrict the bundled matrix to your ligands of interest¶

If you only care about a subset of ligands, slice the bundled pickle to reduce memory and computation during ligand_ranking:

import pickle, pandas as pd

with open('top_500_target_opt_both_scores.pkl', 'rb') as f:
    reg = pickle.load(f)

ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA', 'CXCL10']
reg_subset = reg.loc[[l for l in ligands_of_interest if l in reg.index]]

with open('subset_reg_potential.pkl', 'wb') as f:
    pickle.dump(reg_subset, f)

4. Defining Communication Domains¶

Renoir offers two strategies for identifying communication domains. Which one to use depends on whether you want data-driven discovery or hypothesis-driven annotation.

Strategy A — Data-driven Leiden clustering (`downstream_analysis`)¶

Use this when you have no prior knowledge of how many regions exist or where they are. Renoir reduces the score matrix to pathway PCs and clusters with Leiden:

neighbscore_copy, pcs = Renoir.downstream_analysis(
    neighborhood_scores,
    ltpair_clusters=pathways,
    resolution=0.6,        # key parameter — see below
    return_cluster=True,
    return_pcs=True,
)

Tuning resolution:

The effect of resolution is highly data-dependent and no single range applies universally. From experiments across datasets, the following has been observed as a rough guide for Visium data:

Resolution	Typical outcome (Visium)
0.1 – 0.3	3–5 broad domains (tumor / stroma / immune)
0.4 – 0.7	5–10 mid-grain domains — good starting point
0.8 – 1.5	10+ fine-grain domains; risk of over-fragmentation

For high-resolution single-cell platforms (CosMx, Xenium, MERSCOPE), the optimal resolution is often much lower. Values below 0.1 have been used successfully in practice — the much higher cell density means that even a very low resolution produces a meaningful number of domains. Start at 0.05 and increase gradually.

The best approach regardless of platform is to sweep a range and evaluate visually:

for res in [0.3, 0.5, 0.8, 1.0]:
    copy, _ = Renoir.downstream_analysis(
        neighborhood_scores,
        ltpair_clusters=pathways,
        resolution=res,
        return_cluster=True,
    )
    sc.pl.spatial(copy, color='leiden', title=f'resolution={res}', size=1.4)

Strategy B — Cell-type-informed clustering (`spot_v_spot`)¶

Use spot_v_spot when cell-type co-localisation matters as much as signalling similarity. It combines the L–T score matrix with pairwise cosine similarity of cell-type abundances, so domains reflect both what signalling is happening and which cell types are co-localised:

rd.spot_v_spot(
    neighborhood_scores,
    celltype,
    resolution=0.8,
    ltpair_clusters=pathways,
    pdf_path='spot_v_spot_output.pdf',
)

Strategy C — User-defined regions of interest¶

If you already know which spots belong to a region of interest (e.g., from pathologist annotations, RCTD output, or manual selection in Loupe Browser), assign the labels directly to obs['leiden'] and skip clustering entirely:

import pandas as pd

# Load your manual annotations: a CSV with columns 'barcode' and 'region'
annotations = pd.read_csv('manual_annotations.csv', index_col='barcode')

# Assign to the neighborhood scores object
neighborhood_scores.obs['leiden'] = annotations.loc[
    neighborhood_scores.obs_names, 'region'
].astype('category')

# Now run DE and ligand ranking as normal — Renoir treats the labels the same
# regardless of whether they came from clustering or manual annotation
sc.tl.rank_genes_groups(neighborhood_scores, 'leiden', method='wilcoxon')

Tip: Manual annotations and Leiden clusters can be mixed. Annotate the regions you understand (tumor core, necrotic zone) and let Leiden cluster the rest — then merge the two obs columns before running ligand_ranking.

5. Choosing Cell Types and Providing Custom Markers for Ligand Ranking¶

ligand_ranking has two parameters that give you fine-grained control over which cell types are analysed and which ligand–target pairs are used as the ranking signal.

Controlling which cell types are included (`domain_celltypes`)¶

Option 1 — Top-N by abundance (default):

fig = Renoir.ligand_ranking(
    ...,
    domain_celltypes=['top', 5],   # top 5 most abundant cell types in the domain
)

Option 2 — Explicit cell type list:

Pass a list of cell-type names to restrict analysis to only those types, regardless of their abundance in the domain. This is useful when you have a biological hypothesis (e.g., “I only care about T cell – tumour interactions”):

fig = Renoir.ligand_ranking(
    ...,
    domain_celltypes=['Cancer Basal SC', 'T cells CD8+', 'Macrophage'],
)

The cell-type names must match the column names in your celltype AnnData exactly (check with celltype.var_names or celltype.to_df().columns).

Controlling the ranking signal (`markers`)¶

The markers parameter defines which ligand–target pairs are used to score each ligand’s predicted activity in the domain.

Option 1 — Top-N DE marker pairs (default):

fig = Renoir.ligand_ranking(
    ...,
    markers={'top': 100},   # use the top 100 DE pairs of the domain
)

Increase top if you want a broader ranking signal; decrease it to focus on only the most domain-specific pairs.

Option 2 — User-defined cell-type marker genes:

Instead of using DE pairs from the domain, you can supply your own curated marker genes per cell type as a dictionary. The keys are cell-type names and the values are lists of marker genes. Renoir uses these to identify which ligands best explain the activity of those markers within the domain:

# Define known marker genes per cell type
custom_markers = {
    'Cancer Basal SC': ['KRT5', 'KRT14', 'TP63', 'CDH3'],
    'T cells CD8+':    ['CD8A', 'CD8B', 'GZMB', 'PRF1', 'IFNG'],
    'Macrophage':      ['CD68', 'CD163', 'MRC1', 'CSF1R'],
    'CAFs myCAF-like': ['ACTA2', 'FAP', 'POSTN', 'THY1'],
}

fig = Renoir.ligand_ranking(
    ...,
    markers=custom_markers,
)

This is particularly useful when:

The domain of interest is small and rank_genes_groups returns few DE pairs.
You have strong prior knowledge about which cell types are biologically relevant in the domain and want to anchor the ranking to known biology.
You want results that are directly comparable across datasets or studies, where DE-derived markers would differ due to batch or composition differences.

The marker gene names must match the gene names in your scRNA-seq reference AnnData (SC.var_names).

Filtering ligands by receptor expression (`receptor_exp`)¶

receptor_exp sets the minimum fraction of spots in the domain that must express a ligand’s receptor. Raise it to be more stringent (only ligands the domain is definitely listening to), or lower it to capture weak but potentially important signals:

# Strict: receptor must be expressed in ≥10% of domain spots (default)
fig = Renoir.ligand_ranking(..., receptor_exp=0.1)

# Permissive: receptor expressed in ≥1% of domain spots
fig = Renoir.ligand_ranking(..., receptor_exp=0.01)

Tip: If ligand_ranking returns very few ligands, it is usually because receptor_exp is too high for your data. Try lowering it to 0.01 and check whether the additional ligands make biological sense before committing to the lower threshold.

Quick Reference¶

Goal	Parameter	Where
Wider neighborhood	`use_radius=True, radius=N` (N in your coordinate units)	`compute_neighborhood_scores`
Single-cell mode	`single_cell=True`	`compute_neighborhood_scores`
Fewer, broader domains (Visium)	`resolution=0.2`	`downstream_analysis` / `spot_v_spot`
More, finer domains (Visium)	`resolution=1.0`	`downstream_analysis` / `spot_v_spot`
High-res platforms (CosMx etc.)	`resolution=0.05` or lower	`downstream_analysis` / `spot_v_spot`
Cell-type-aware domains	use `spot_v_spot` instead of `downstream_analysis`	—
Manual region annotations	assign `obs['leiden']` directly	—
Focus on specific cell types	`domain_celltypes=['CellA', 'CellB']`	`ligand_ranking`
DE-driven ranking signal	`markers={'top': 100}`	`ligand_ranking`
Custom marker gene ranking	`markers={'CellType': ['gene1', 'gene2']}`	`ligand_ranking`
Strict receptor filter	`receptor_exp=0.1`	`ligand_ranking`
Permissive receptor filter	`receptor_exp=0.01`	`ligand_ranking`