# Tips for Running Renoir Effectively This page collects practical guidance for getting the most out of Renoir across different spatial transcriptomics technologies and experimental designs. Each tip references the exact parameters you need to adjust. > **NOTE:** We are currently adding more tips to this page to improve tool performance. --- ## 1. Choosing the Optimal Neighborhood Size The neighborhood definition is set in `compute_neighborhood_scores` via the `technology`, `use_radius`, and `radius` parameters. The right choice depends on your platform's resolution and the biological scale of the signalling you expect. ### Visium (55 µm spots, ~6 spot diameter) Visium spots are large and sparsely packed. The default hexagonal ring neighborhood (the 6 immediately adjacent spots) is a good starting point and requires no extra parameters: ```python neighborhood_scores = Renoir.compute_neighborhood_scores( ..., single_cell=False, # spot-level mode use_radius=False, # use default hex-ring neighbor definition ) ``` If the signal looks too local (e.g., domains are only 1–2 spots wide), expand to a wider neighborhood by switching to radius mode: ```python neighborhood_scores = Renoir.compute_neighborhood_scores( ..., single_cell=False, use_radius=True, radius=200, # 200 coordinate units — check your dataset's coordinate scale ) ``` ### Visium HD (8 µm bins) Visium HD produces much denser data. Because individual bins are small, a larger radius is needed to capture biologically meaningful neighborhoods. Start with a moderate value and increase if domains appear fragmented: ```python neighborhood_scores = Renoir.compute_neighborhood_scores( ..., single_cell=False, use_radius=True, radius=50, # in coordinate units — adjust based on your dataset's scale ) ``` ### CosMx / Xenium / MERSCOPE (single-cell resolution) For single-cell platforms, enable `single_cell=True` and set `radius` in the same coordinate units as your AnnData's spatial coordinates: ```python neighborhood_scores = Renoir.compute_neighborhood_scores( ..., single_cell=True, use_radius=True, radius=150, # in coordinate units — check obsm['spatial'] for your dataset's scale ) ``` > **Important:** `radius` is always in the same units as the X, Y coordinates > stored in your AnnData (typically `obsm['spatial']`). These units differ > between datasets and platforms. Always inspect your coordinates before > choosing a radius value: > > ```python > import pandas as pd > coords = pd.DataFrame(neighborhood_scores.obsm['spatial'], columns=['x', 'y']) > print(coords.describe()) # check the range to understand the coordinate scale > ``` **General rule of thumb:** | Technology | Neighborhood mode | `single_cell` | |---|---|---| | Visium | `use_radius=False` (hex ring default) | `False` | | Visium HD | `use_radius=True`, tune radius to your coordinate scale | `False` | | CosMx | `use_radius=True`, tune radius to your coordinate scale | `True` | | Xenium | `use_radius=True`, tune radius to your coordinate scale | `True` | | MERSCOPE | `use_radius=True`, tune radius to your coordinate scale | `True` | > **Tip:** If you are unsure, run `downstream_analysis` at two or three radii > and compare the spatial coherence of the resulting domains. Domains that look > biologically sensible (contiguous regions matching known tissue compartments) > are a good sign that the neighborhood size is appropriate. --- ## 2. Creating Your Own Curated Ligand–Receptor and Ligand–Target Pair Lists Renoir accepts custom pair lists via the `ligand_receptor_path` and `pairs_path` arguments in `compute_neighborhood_scores`. Both are simply CSV files with specific column names. ### Ligand–receptor pairs (`ligand_receptor_path`) The file must have at least two columns named `ligand` and `receptor`: ``` ligand,receptor TGFB1,TGFBR1 TGFB1,TGFBR2 IL6,IL6R VEGFA,FLT1 VEGFA,KDR ``` **Sources to build from:** - [CellChat DB](https://github.com/sqjin/CellChat) — curated, literature-backed - [OmniPath](https://omnipathdb.org/) — large, multi-resource aggregate - [NATMI](https://github.com/asrhou/NATMI) — what the tutorials use - [CellPhoneDB](https://www.cellphonedb.org/) — includes multi-subunit complexes To filter to your biology of interest (e.g., only cytokine signalling): ```python import pandas as pd lr = pd.read_csv('All_human_lrpairs.csv') # Keep only cytokine-related ligands (example gene list) cytokines = ['IL6', 'IL1B', 'TNF', 'IFNG', 'CXCL10', 'CCL2'] lr_cytokine = lr[lr['ligand'].isin(cytokines)] lr_cytokine.to_csv('cytokine_lrpairs.csv', index=False) ``` ### Ligand–target pairs (`pairs_path`) The file must have columns named `ligand` and `target`: ``` ligand,target IL6,STAT3 IL6,MYC TGFB1,SNAI1 VEGFA,HIF1A ``` **How to generate a ranked list:** The bundled top-N files (`top_10_target_opt_both_ordered.csv`, `top_100_target_opt_both_ordered.csv`) are derived from NicheNet's regulatory potential matrix. To build your own ranked list from NicheNet: ```python import pandas as pd # Load the full NicheNet regulatory potential matrix # (available at https://zenodo.org/record/3260758) reg = pd.read_csv('ligand_target_matrix.csv', index_col=0) # Keep only your ligands of interest ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA'] reg_subset = reg.loc[ligands_of_interest] # For each ligand, keep the top-N targets by regulatory potential score top_n = 20 rows = [] for ligand in reg_subset.index: top_targets = reg_subset.loc[ligand].nlargest(top_n) for target, score in top_targets.items(): rows.append({'ligand': ligand, 'target': target, 'score': score}) custom_pairs = pd.DataFrame(rows).sort_values('score', ascending=False) custom_pairs.to_csv('my_ligand_target_pairs.csv', index=False) ``` > **Tip:** Start with a smaller, focused pair list (top 10–20 pairs per ligand) > rather than top 100+. Downstream clustering is faster and domains are often > more interpretable when the signal is not diluted by low-scoring pairs. --- ## 3. Generating Your Own Ligand–Target Regulatory Potential Scores The `ligand_target_regulatory_potential` object used by `ligand_ranking` is a precomputed dictionary (or DataFrame) mapping each ligand to regulatory potential scores across its top target genes. There are three ways to produce this. ### Option A — Use the NicheNet matrix directly (recommended) The NicheNet team publish a precomputed human and mouse matrix on Zenodo (record 3260758). Load it and convert to the format Renoir expects: ```python import pandas as pd, pickle # Download from: https://zenodo.org/record/3260758 reg = pd.read_csv('ligand_target_matrix.csv', index_col=0) # ligands × targets # Keep only the top 500 targets per ligand (matching the bundled file) top_500 = reg.apply(lambda row: row.nlargest(500), axis=1) # Save as a pickle with open('top_500_target_opt_both_scores.pkl', 'wb') as f: pickle.dump(top_500, f) ``` ### Option B — Use an existing regulatory database Rather than inferring a gene regulatory network from data (which is unreliable in practice), it is better to derive regulatory potentials from a curated transcription factor–target database. Two well-maintained options are: - **[DoRothEA](https://saezlab.github.io/dorothea/)** — curated TF–target regulons with confidence levels (A–E); available for human and mouse. - **[CollecTRI](https://github.com/saezlab/CollecTRI)** — a comprehensive signed TF–target network compiled from literature and ChIP-seq data. Both can be accessed via the `decoupler` Python package and converted to the format Renoir expects: ```python import decoupler as dc import pandas as pd, pickle # Load DoRothEA regulons (human, confidence levels A and B only) dorothea = dc.get_dorothea(organism='human', levels=['A', 'B']) # dorothea has columns: 'source' (TF), 'target', 'weight', 'confidence' # Map TFs to their upstream ligands using your ligand-receptor table # (i.e., if a receptor activates a TF, then the ligand -> TF -> target chain # gives you the ligand's regulatory potential over that target) lr = pd.read_csv('All_human_lrpairs.csv') # columns: ligand, receptor # Build a ligand -> target score table via receptor -> TF -> target rows = [] for _, row in lr.iterrows(): ligand, receptor = row['ligand'], row['receptor'] # Find targets regulated by TFs known to be downstream of this receptor # (requires a receptor -> TF mapping from e.g. OmniPath kinase-substrate) tf_targets = dorothea[dorothea['source'] == receptor] for _, t in tf_targets.iterrows(): rows.append({'ligand': ligand, 'target': t['target'], 'score': t['weight']}) reg_potential = pd.DataFrame(rows) reg_potential = reg_potential.groupby(['ligand', 'target'])['score'].max().unstack(fill_value=0) with open('dorothea_reg_potential.pkl', 'wb') as f: pickle.dump(reg_potential, f) ``` > **Note:** A full receptor → TF mapping requires an additional signalling > resource such as [OmniPath](https://omnipathdb.org/) (via `omnipath` Python > package) or [SignaLink](https://signalink.org/). The NicheNet matrix (Option A) > already integrates all of these layers internally, which is why it remains the > recommended starting point. ### Option C — Restrict the bundled matrix to your ligands of interest If you only care about a subset of ligands, slice the bundled pickle to reduce memory and computation during `ligand_ranking`: ```python import pickle, pandas as pd with open('top_500_target_opt_both_scores.pkl', 'rb') as f: reg = pickle.load(f) ligands_of_interest = ['IL6', 'TGFB1', 'VEGFA', 'CXCL10'] reg_subset = reg.loc[[l for l in ligands_of_interest if l in reg.index]] with open('subset_reg_potential.pkl', 'wb') as f: pickle.dump(reg_subset, f) ``` --- ## 4. Defining Communication Domains Renoir offers two strategies for identifying communication domains. Which one to use depends on whether you want data-driven discovery or hypothesis-driven annotation. ### Strategy A — Data-driven Leiden clustering (`downstream_analysis`) Use this when you have no prior knowledge of how many regions exist or where they are. Renoir reduces the score matrix to pathway PCs and clusters with Leiden: ```python neighbscore_copy, pcs = Renoir.downstream_analysis( neighborhood_scores, ltpair_clusters=pathways, resolution=0.6, # key parameter — see below return_cluster=True, return_pcs=True, ) ``` **Tuning `resolution`:** The effect of resolution is highly data-dependent and no single range applies universally. From experiments across datasets, the following has been observed as a rough guide **for Visium data**: | Resolution | Typical outcome (Visium) | |---|---| | 0.1 – 0.3 | 3–5 broad domains (tumor / stroma / immune) | | 0.4 – 0.7 | 5–10 mid-grain domains — good starting point | | 0.8 – 1.5 | 10+ fine-grain domains; risk of over-fragmentation | For **high-resolution single-cell platforms** (CosMx, Xenium, MERSCOPE), the optimal resolution is often much lower. Values below 0.1 have been used successfully in practice — the much higher cell density means that even a very low resolution produces a meaningful number of domains. Start at 0.05 and increase gradually. The best approach regardless of platform is to sweep a range and evaluate visually: ```python for res in [0.3, 0.5, 0.8, 1.0]: copy, _ = Renoir.downstream_analysis( neighborhood_scores, ltpair_clusters=pathways, resolution=res, return_cluster=True, ) sc.pl.spatial(copy, color='leiden', title=f'resolution={res}', size=1.4) ``` ### Strategy B — Cell-type-informed clustering (`spot_v_spot`) Use `spot_v_spot` when cell-type co-localisation matters as much as signalling similarity. It combines the L–T score matrix with pairwise cosine similarity of cell-type abundances, so domains reflect both what signalling is happening and which cell types are co-localised: ```python rd.spot_v_spot( neighborhood_scores, celltype, resolution=0.8, ltpair_clusters=pathways, pdf_path='spot_v_spot_output.pdf', ) ``` ### Strategy C — User-defined regions of interest If you already know which spots belong to a region of interest (e.g., from pathologist annotations, RCTD output, or manual selection in Loupe Browser), assign the labels directly to `obs['leiden']` and skip clustering entirely: ```python import pandas as pd # Load your manual annotations: a CSV with columns 'barcode' and 'region' annotations = pd.read_csv('manual_annotations.csv', index_col='barcode') # Assign to the neighborhood scores object neighborhood_scores.obs['leiden'] = annotations.loc[ neighborhood_scores.obs_names, 'region' ].astype('category') # Now run DE and ligand ranking as normal — Renoir treats the labels the same # regardless of whether they came from clustering or manual annotation sc.tl.rank_genes_groups(neighborhood_scores, 'leiden', method='wilcoxon') ``` > **Tip:** Manual annotations and Leiden clusters can be mixed. Annotate the > regions you understand (tumor core, necrotic zone) and let Leiden cluster the > rest — then merge the two `obs` columns before running `ligand_ranking`. --- ## 5. Choosing Cell Types and Providing Custom Markers for Ligand Ranking `ligand_ranking` has two parameters that give you fine-grained control over which cell types are analysed and which ligand–target pairs are used as the ranking signal. ### Controlling which cell types are included (`domain_celltypes`) **Option 1 — Top-N by abundance (default):** ```python fig = Renoir.ligand_ranking( ..., domain_celltypes=['top', 5], # top 5 most abundant cell types in the domain ) ``` **Option 2 — Explicit cell type list:** Pass a list of cell-type names to restrict analysis to only those types, regardless of their abundance in the domain. This is useful when you have a biological hypothesis (e.g., "I only care about T cell – tumour interactions"): ```python fig = Renoir.ligand_ranking( ..., domain_celltypes=['Cancer Basal SC', 'T cells CD8+', 'Macrophage'], ) ``` The cell-type names must match the column names in your `celltype` AnnData exactly (check with `celltype.var_names` or `celltype.to_df().columns`). ### Controlling the ranking signal (`markers`) The `markers` parameter defines which ligand–target pairs are used to score each ligand's predicted activity in the domain. **Option 1 — Top-N DE marker pairs (default):** ```python fig = Renoir.ligand_ranking( ..., markers={'top': 100}, # use the top 100 DE pairs of the domain ) ``` Increase `top` if you want a broader ranking signal; decrease it to focus on only the most domain-specific pairs. **Option 2 — User-defined cell-type marker genes:** Instead of using DE pairs from the domain, you can supply your own curated marker genes per cell type as a dictionary. The keys are cell-type names and the values are lists of marker genes. Renoir uses these to identify which ligands best explain the activity of those markers within the domain: ```python # Define known marker genes per cell type custom_markers = { 'Cancer Basal SC': ['KRT5', 'KRT14', 'TP63', 'CDH3'], 'T cells CD8+': ['CD8A', 'CD8B', 'GZMB', 'PRF1', 'IFNG'], 'Macrophage': ['CD68', 'CD163', 'MRC1', 'CSF1R'], 'CAFs myCAF-like': ['ACTA2', 'FAP', 'POSTN', 'THY1'], } fig = Renoir.ligand_ranking( ..., markers=custom_markers, ) ``` This is particularly useful when: - The domain of interest is small and `rank_genes_groups` returns few DE pairs. - You have strong prior knowledge about which cell types are biologically relevant in the domain and want to anchor the ranking to known biology. - You want results that are directly comparable across datasets or studies, where DE-derived markers would differ due to batch or composition differences. The marker gene names must match the gene names in your scRNA-seq reference AnnData (`SC.var_names`). ### Filtering ligands by receptor expression (`receptor_exp`) `receptor_exp` sets the minimum fraction of spots in the domain that must express a ligand's receptor. Raise it to be more stringent (only ligands the domain is definitely listening to), or lower it to capture weak but potentially important signals: ```python # Strict: receptor must be expressed in ≥10% of domain spots (default) fig = Renoir.ligand_ranking(..., receptor_exp=0.1) # Permissive: receptor expressed in ≥1% of domain spots fig = Renoir.ligand_ranking(..., receptor_exp=0.01) ``` > **Tip:** If `ligand_ranking` returns very few ligands, it is usually because > `receptor_exp` is too high for your data. Try lowering it to `0.01` and check > whether the additional ligands make biological sense before committing to the > lower threshold. --- ## Quick Reference | Goal | Parameter | Where | |---|---|---| | Wider neighborhood | `use_radius=True, radius=N` (N in your coordinate units) | `compute_neighborhood_scores` | | Single-cell mode | `single_cell=True` | `compute_neighborhood_scores` | | Fewer, broader domains (Visium) | `resolution=0.2` | `downstream_analysis` / `spot_v_spot` | | More, finer domains (Visium) | `resolution=1.0` | `downstream_analysis` / `spot_v_spot` | | High-res platforms (CosMx etc.) | `resolution=0.05` or lower | `downstream_analysis` / `spot_v_spot` | | Cell-type-aware domains | use `spot_v_spot` instead of `downstream_analysis` | — | | Manual region annotations | assign `obs['leiden']` directly | — | | Focus on specific cell types | `domain_celltypes=['CellA', 'CellB']` | `ligand_ranking` | | DE-driven ranking signal | `markers={'top': 100}` | `ligand_ranking` | | Custom marker gene ranking | `markers={'CellType': ['gene1', 'gene2']}` | `ligand_ranking` | | Strict receptor filter | `receptor_exp=0.1` | `ligand_ranking` | | Permissive receptor filter | `receptor_exp=0.01` | `ligand_ranking` |