TAN-POLARITY v4: A Pre-Validation Framework Specification for Tumour-Associated Neutrophil Polarisation Signal Assessment in Hepatocellular Carcinoma
Tumour-associated neutrophils (TANs) in hepatocellular carcinoma (HCC) span a continuous activation spectrum from anti-tumour antigen-presenting states to pro-tumour angiogenic and immunosuppressive states [Grieshaber-Bouyer et al., Nature Communications, 2021; Antuamwine et al., Immunological Reviews, 2023]. We present TAN-POLARITY v4, a pre-validation composite scoring framework producing a continuous 0–100 Polarisation Signal Score (PSS). This version makes four changes relative to v3, motivated by specific peer critique. First, domain weights are now derived using standard error (SE)-based inverse-variance weighting, with SE extracted from published 95% confidence intervals via SE = (ln(HR_upper) − ln(HR_lower)) / (2 × 1.96). Where no published CI is available, the domain is flagged as "low-precision" and assigned a conservative weight floor. The result of this honest calculation is that NLR dominates at 63% of total weight, reflecting the reality that it is the only domain with a large-sample, multi-study meta-analytic HR estimate; all other domain weights are smaller because the underlying evidence is correspondingly less precise. This finding is documented not as a failure but as an accurate representation of the current evidentiary state. Second, the collinearity discount γ for the Angiogenic–Neutrophil Axis is replaced with a sensitivity analysis across γ ∈ {0.00, 0.10, 0.20, 0.30, 0.40}, with tabulated PSS consequences for each scenario, since no published ρ(NLR, serum VEGF) in HCC patients exists and a point estimate is therefore unjustified. Third, a formal validation protocol is specified in full, comprising (a) a partial proxy validation design using the publicly available TCGA-LIHC dataset (n=377; VEGFA mRNA, CIBERSORT neutrophil enrichment scores, and OS data available via the GDC portal), with explicit documentation of the limitations of mRNA proxies versus serum measurements, and (b) a prospective multi-centre validation design (n=580); neither has yet been executed. Fourth, all outputs carry explicit uncertainty: the γ sensitivity range and a Monte Carlo 95% confidence interval over the continuous inputs.
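The CI-to-SE conversion described above can be sketched in a few lines of Python. This is a minimal illustration using the NLR figures quoted in the reference list (HR = 1.55, 95% CI [1.39, 1.75]); it is not part of the skill file itself.

```python
import math

def se_from_ci(hr_lower: float, hr_upper: float) -> float:
    """SE of ln(HR) recovered from a published 95% CI:
    SE = (ln(HR_upper) - ln(HR_lower)) / (2 * 1.96)."""
    return (math.log(hr_upper) - math.log(hr_lower)) / (2 * 1.96)

# NLR domain: HR = 1.55 [1.39, 1.75], n = 9,952 (Peng J et al., BMC Cancer 2025)
se = se_from_ci(1.39, 1.75)
precision = 1.0 / se ** 2  # inverse-variance precision, ~289 in the derivation table

print(round(se, 4), round(precision, 1))  # → 0.0588 289.7
```

The precision of ~289 for NLR versus the imputed floor of 4.0 for the CI-less molecular domains is what drives the 63% weight dominance reported above.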
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
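Before the full script, the γ sensitivity idea in a self-contained sketch. The constants and sigmoid parameters are copied from the skill file below; the NLR/VEGF inputs are arbitrary illustrative values, not patient data.

```python
import math

# Shares of NLR and VEGF inside the Angiogenic-Neutrophil Axis (ANA),
# copied from the skill file (ALPHA_ANA / BETA_ANA).
ALPHA, BETA = 0.805, 0.195

def f_nlr(x: float) -> float:
    """NLR sigmoid: midpoint 3.3, slope 1.02 (as documented in the script)."""
    return 100.0 / (1.0 + math.exp(-1.02 * (x - 3.3)))

def f_vegf(x: float) -> float:
    """Serum VEGF sigmoid: midpoint 270 pg/mL, normalised slope 2.58."""
    return 100.0 / (1.0 + math.exp(-2.58 * (x - 270.0) / 270.0))

def g_ana(nlr: float, vegf: float, gamma: float) -> float:
    """ANA joint score with collinearity discount gamma."""
    fn, fv = f_nlr(nlr), f_vegf(vegf)
    return ALPHA * fn + BETA * fv - gamma * fn * fv / 100.0

# Tabulate the ANA score across the gamma sensitivity range for one
# illustrative high-risk input pair (NLR = 5.7, VEGF = 415 pg/mL).
for gamma in (0.00, 0.10, 0.20, 0.30, 0.40):
    print(f"gamma={gamma:.2f}  g_ANA={g_ana(5.7, 415.0, gamma):5.1f}")
```

Because γ enters only through the interaction term, the score falls monotonically as the discount grows; this spread is what the full script reports as the PSS sensitivity range.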
#!/usr/bin/env python3
"""
TAN-POLARITY v4: Pre-Validation Framework Specification for TAN
Polarisation Signal Assessment in HCC.
Version 4 changes from v3:
1. SE-based inverse-variance weights replacing Dq multiplier
NLR now dominates at ~63% — honest reflection of evidence landscape
2. gamma (collinearity discount) replaced by sensitivity analysis
   compute_tan_polarity_v4() returns a PSS for each gamma in GAMMA_RANGE
3. No validation performed — validation protocol specified in Section 5
4. Explicit uncertainty outputs: gamma sensitivity range + Monte Carlo CI
Key references:
- Peng J et al. BMC Cancer 2025: NLR HR=1.55 [1.39,1.75], n=9,952 (precision=289)
- Nomogram Front Oncol 2023 (n=481): VEGF HR=2.552 (precision est. ~32.7)
- Wu Y et al. Cell 2024: HLA-DR+ TAN best-prognosis state, HCC n=357
- Meng Y, Ye F, Nie P et al. J Hepatol 2023: CD10+ALPL+ anti-PD-1 resistance
- Teo J et al. JEM 2025: MASH SiglecF-hi TANs
- Shen XT et al. Exp Hematol Oncol 2024: cirrhotic-ECM immunosuppressive NETs
- Grieshaber-Bouyer R et al. Nat Commun 2021: neutrotime spectrum
- Guo J et al. PMC3555251 2013: VEGF median 285 pg/mL
- Poon RTP et al. Ann Surg Oncol 2004: VEGF cutoff 240 pg/mL
- Jost-Brinkmann F et al. APT 2023: NLR cutoff 3.2 in atezo/bev
- Di D et al. PMC12229162 2025: NLR>=5 HAIC cohort
- Finn RS et al. NEJM 2020;382:1894: IMbrave150
- Singal AG et al. Nat Rev Clin Oncol 2023;20:864: epidemiology
- Kusumanto YH et al. Angiogenesis 2003;6:283: neutrophils as VEGF source
- Leslie J et al. Gut 2022;71:2523: CXCR2 MASH-HCC
- Antuamwine BB et al. Immunol Rev 2023;314:250: N1/N2 limitations
- Horvath L et al. Trends Cancer 2024;10:457: beyond binary
- Li et al. Front Immunol 2023 fimmu.2023.1215745: ICI-HCC validated model
- Fridlender ZG et al. Cancer Cell 2009;16:183: N1/N2 paradigm
- Chen J, Feng W, Sun M et al. Gastroenterology 2024;167:264: TGF-b/SOX18
"""
from __future__ import annotations
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
# ─────────────────────────────────────────────────────────────────────────────
# SE-based inverse-variance weights
# Derived from: w_d = Precision_d * ln(HR_d) / sum(Precision_d' * ln(HR_d'))
# See Section 3.1 for full derivation table.
# ─────────────────────────────────────────────────────────────────────────────
DOMAIN_EVIDENCE = {
# (ln_HR, SE_ln_HR, precision, ci_source)
"nlr": (0.438, 0.0588, 289.0, "Published 95% CI: Peng J et al. BMC Cancer 2025"),
"vegf": (0.937, 0.175, 32.7, "CI estimated from HR=2.552, p<0.001, n=481"),
"hla_dr": (0.600, 0.150, 44.4, "CI estimated from HCC n=357 in Wu Y et al. Cell 2024"),
"tgfb": (0.588, 0.500, 4.0, "No CI published: imputed floor 4.0"),
"aetiology": (0.501, 0.500, 4.0, "No CI published: imputed floor 4.0"),
"cd10_alpl": (0.742, 0.500, 4.0, "No CI published: imputed floor 4.0"),
"nets": (0.559, 0.500, 4.0, "No CI published: HR approximated"),
"gmcsf": (0.438, 0.500, 4.0, "No CI published: imputed floor 4.0"),
}
# Compute precision-weighted products for all 8 domains
_raw_products = {k: v[0] * v[2] for k, v in DOMAIN_EVIDENCE.items()}
_total_product = sum(_raw_products.values())
# NLR and VEGF merge into ANA; split their combined weight by their relative products
_nlr_share = _raw_products["nlr"] / (_raw_products["nlr"] + _raw_products["vegf"]) # 0.805
_vegf_share = _raw_products["vegf"] / (_raw_products["nlr"] + _raw_products["vegf"]) # 0.195
ALPHA_ANA = round(_nlr_share, 3) # 0.805 — NLR's share inside g_ANA
BETA_ANA = round(_vegf_share, 3) # 0.195 — VEGF's share inside g_ANA
# Categorical domain weights (normalised)
WEIGHTS_CAT = {k: round(_raw_products[k] / _total_product, 4)
for k in ("hla_dr", "tgfb", "aetiology", "cd10_alpl", "nets", "gmcsf")}
# ANA weight = (NLR product + VEGF product) / total
W_ANA_RAW = (_raw_products["nlr"] + _raw_products["vegf"]) / _total_product  # ~0.806
# Gamma sensitivity range
GAMMA_RANGE = [0.00, 0.10, 0.20, 0.30, 0.40]
# ─────────────────────────────────────────────────────────────────────────────
# Sigmoid transformations (parameters derived from published cutoff distributions)
# ─────────────────────────────────────────────────────────────────────────────
def f_nlr(nlr: float) -> float:
"""
NLR → 0–100. f(x) = 100/(1+exp(-1.02*(x-3.3)))
x0=3.3: median of 10 published HCC NLR cutoffs. k=1.02: f(5.0)=85.
"""
return 100.0 / (1.0 + math.exp(-1.02 * (nlr - 3.3)))
def f_vegf(vegf: float) -> float:
"""
Serum VEGF → 0–100. f(x)=100/(1+exp(-2.58*(x-270)/270))
x0=270 pg/mL: cluster centre of published cutoffs 225-285. k=2.58: f(125)=20.
"""
return 100.0 / (1.0 + math.exp(-2.58 * (vegf - 270.0) / 270.0))
def g_ana(nlr: float, vegf: float, gamma: float) -> float:
"""
ANA joint function with collinearity discount gamma.
g = alpha*f_nlr + beta*f_vegf - gamma*(f_nlr*f_vegf/100)
alpha/beta are proportional to NLR/VEGF precision-weighted products.
gamma: collinearity discount; reported as sensitivity range since no
published rho(NLR, VEGF) in HCC patients exists.
Range [0, 0.40] justified in Section 3.3.
"""
fn, fv = f_nlr(nlr), f_vegf(vegf)
return ALPHA_ANA * fn + BETA_ANA * fv - gamma * (fn * fv / 100.0)
# ─────────────────────────────────────────────────────────────────────────────
# Categorical transformations (unchanged from v3; literature-anchored)
# ─────────────────────────────────────────────────────────────────────────────
def f_tgfb(s: str) -> float:
return {"absent": 5.0, "mild": 30.0, "moderate": 60.0, "active": 88.0}.get(s, 30.0)
def f_aetiology(s: str) -> float:
return {"viral": 10.0, "formerly_viral_cirrhosis": 40.0,
"alcohol": 45.0, "cryptogenic": 55.0, "mash": 88.0}.get(s, 45.0)
def f_cd10_alpl(s: str) -> float:
return {"absent": 0.0, "not_documented": 0.0, "low": 30.0,
"elevated": 72.0, "high": 90.0}.get(s, 0.0)
def f_nets(level: str, cith3: bool) -> float:
base = {"normal": 10.0, "mild": 28.0, "elevated": 62.0, "high": 75.0}.get(level, 10.0)
return min(base + (7.0 if cith3 else 0.0), 100.0)
def f_hla_dr(s: str) -> float:
"""Inversely scored: higher HLA-DR+ = lower pro-tumour contribution."""
return {"absent": 82.0, "low": 52.0, "present": 26.0, "high": 5.0}.get(s, 52.0)
def f_gmcsf(s: str) -> float:
return {"absent": 5.0, "mild": 38.0, "elevated": 78.0}.get(s, 5.0)
@dataclass
class TANPatientV4:
nlr: float = 2.5
vegf_pg_ml: float = 200.0
tgfb_signal: str = "absent"
hcc_aetiology: str = "viral"
cd10_alpl_signal: str = "absent"
net_marker_level: str = "normal"
cith3_positive: bool = False
hla_dr_signal: str = "absent"
gmcsf_signal: str = "absent"
@dataclass
class TANResultV4:
pss_by_gamma: Dict[float, float] # {gamma: PSS}
pss_default: float # PSS at gamma=0.20
pss_range: Tuple[float, float] # (min, max) across gamma range
ci_lower: float # Monte Carlo 95% CI (continuous inputs, gamma=0.20)
ci_upper: float
domains: List[dict]
weight_note: str
collinearity_note: str
limitations: List[str] = field(default_factory=list)
def compute_tan_polarity_v4(patient: TANPatientV4,
n_sims: int = 5000,
seed: int = 42) -> TANResultV4:
cat_scores = {
"tgfb": f_tgfb(patient.tgfb_signal),
"aetiology": f_aetiology(patient.hcc_aetiology),
"cd10_alpl": f_cd10_alpl(patient.cd10_alpl_signal),
"nets": f_nets(patient.net_marker_level, patient.cith3_positive),
"hla_dr": f_hla_dr(patient.hla_dr_signal),
"gmcsf": f_gmcsf(patient.gmcsf_signal),
}
cat_weighted = sum(WEIGHTS_CAT[k] * v for k, v in cat_scores.items())
pss_by_gamma: Dict[float, float] = {}
for g in GAMMA_RANGE:
ana = g_ana(patient.nlr, patient.vegf_pg_ml, g)
        # gamma acts only through the interaction term inside g_ana; the
        # domain weight W_ANA_RAW itself is held fixed across the range.
pss = min(100.0, W_ANA_RAW * ana + cat_weighted)
pss_by_gamma[g] = round(pss, 1)
pss_default = pss_by_gamma[0.20]
pss_range = (min(pss_by_gamma.values()), max(pss_by_gamma.values()))
# Monte Carlo at gamma=0.20 only (categorical inputs not perturbed)
rng = random.Random(seed)
sims = []
for _ in range(n_sims):
nlr_p = max(0.1, patient.nlr * (1 + rng.gauss(0, 0.12)))
vegf_p = max(10.0, patient.vegf_pg_ml * (1 + rng.gauss(0, 0.13)))
ana_p = g_ana(nlr_p, vegf_p, 0.20)
sims.append(min(100.0, W_ANA_RAW * ana_p + cat_weighted))
sims.sort()
ci_lower = round(sims[int(0.025 * n_sims)], 1)
ci_upper = round(sims[int(0.975 * n_sims)], 1)
domains = [
{"name": "ANA (NLR+VEGF)",
"f_nlr": round(f_nlr(patient.nlr), 1),
"f_vegf": round(f_vegf(patient.vegf_pg_ml), 1),
"g_ana_gamma020": round(g_ana(patient.nlr, patient.vegf_pg_ml, 0.20), 1),
"w_ana": round(W_ANA_RAW, 3),
"weighted_gamma020": round(W_ANA_RAW * g_ana(patient.nlr, patient.vegf_pg_ml, 0.20), 2),
"precision_nlr": DOMAIN_EVIDENCE["nlr"][2],
"precision_vegf": DOMAIN_EVIDENCE["vegf"][2]},
] + [
{"name": k, "raw": round(v, 1), "weight": WEIGHTS_CAT[k],
"weighted": round(WEIGHTS_CAT[k] * v, 3),
"precision": DOMAIN_EVIDENCE[k][2],
"ci_status": DOMAIN_EVIDENCE[k][3]}
for k, v in cat_scores.items()
]
weight_note = (
"Weights derived from SE-based inverse-variance method: "
"w_d = (Precision_d * ln(HR_d)) / sum(Precision_d' * ln(HR_d')). "
f"NLR precision={DOMAIN_EVIDENCE['nlr'][2]:.0f} (published 95% CI, n=9,952). "
"All other molecular domains use imputed floor precision=4.0 "
"(no published CI available). This reflects the actual evidence landscape: "
"NLR dominates because it has the best-evidenced HR, not because it is "
"biologically more important than the molecular domains."
)
collinearity_note = (
f"ANA collinearity sensitivity: PSS ranges from {pss_range[0]:.1f} "
f"(gamma=0.40, strong discount) to {pss_range[1]:.1f} (gamma=0, no discount). "
f"Range span = {pss_range[1]-pss_range[0]:.1f} points. "
"No published rho(NLR, serum VEGF) in HCC exists; gamma is not estimable "
"as a point value. Report PSS as a range until quantified."
)
limitations = [
"MODEL UNVALIDATED: PSS has not been tested against patient-level OS, PFS, "
"or ICI response data. The 0-100 scale is clinically uninterpretable without "
"calibration against real outcomes.",
"WEIGHT DOMINANCE: NLR accounts for ~63% of total weight under SE-based "
"weighting. Molecular TAN domains contribute 1-14% each. Adding molecular "
"data changes PSS by at most ~10 points; the model is currently dominated "
"by NLR and HLA-DR+ when measured by evidence precision.",
"GAMMA UNCERTAINTY: The collinearity discount is not quantifiable from "
"current literature. PSS should be reported as a range, not a point value.",
"SCENARIOS ARE RECONSTRUCTIONS: Demonstration scenarios are derived from "
"cohort profile descriptions in published papers, not independent patient data.",
"VALIDATION PROTOCOL: Section 5 specifies a prospective validation design "
"(n=580, multi-centre) and a partial TCGA-LIHC proxy analysis. Neither has "
"been executed. This framework is not ready for clinical application.",
]
return TANResultV4(pss_by_gamma=pss_by_gamma, pss_default=pss_default,
pss_range=pss_range, ci_lower=ci_lower, ci_upper=ci_upper,
domains=domains, weight_note=weight_note,
collinearity_note=collinearity_note, limitations=limitations)
def print_result_v4(result: TANResultV4, label: str):
print("\n" + "=" * 80)
print(label)
print("=" * 80)
print(f"PSS (gamma=0.20): {result.pss_default:.1f} / 100")
print(f"PSS sensitivity range (gamma 0–0.40): {result.pss_range[0]:.1f} – {result.pss_range[1]:.1f}")
print(f"95% CI (MC, continuous inputs, gamma=0.20): [{result.ci_lower:.1f}, {result.ci_upper:.1f}]")
print(f"\nGamma sensitivity:")
for g, pss in result.pss_by_gamma.items():
print(f" gamma={g:.2f} → PSS={pss:.1f}")
print(f"\nWeight note: {result.weight_note}")
print(f"\nCollinearity note: {result.collinearity_note}")
print("\nDomain decomposition:")
d = result.domains[0]
print(f" ANA: f_NLR={d['f_nlr']:.1f}, f_VEGF={d['f_vegf']:.1f}, "
f"g_ANA(g=0.20)={d['g_ana_gamma020']:.1f}, w={d['w_ana']:.3f}, "
f"wtd={d['weighted_gamma020']:.2f}")
print(f" NLR precision={d['precision_nlr']:.0f}, VEGF precision={d['precision_vegf']:.1f}")
for dom in result.domains[1:]:
print(f" {dom['name']:14s}: raw={dom['raw']:5.1f}, w={dom['weight']:.4f}, "
f"wtd={dom['weighted']:.3f}, precision={dom['precision']:.1f}")
print("\n*** LIMITATIONS ***")
for lim in result.limitations:
print(f" ! {lim}")
def demo():
scenarios = [
("Scenario 1 — Responder profile [Jost-Brinkmann F et al. APT 2023]",
TANPatientV4(nlr=2.1, vegf_pg_ml=195.0, tgfb_signal="absent",
hcc_aetiology="viral", cd10_alpl_signal="absent",
net_marker_level="normal", cith3_positive=False,
hla_dr_signal="present", gmcsf_signal="absent")),
("Scenario 2 — MASH poor-prognosis [Meng Y, Zhu X et al. 2024 + Teo J et al. JEM 2025]",
TANPatientV4(nlr=5.7, vegf_pg_ml=415.0, tgfb_signal="active",
hcc_aetiology="mash", cd10_alpl_signal="elevated",
net_marker_level="elevated", cith3_positive=True,
hla_dr_signal="absent", gmcsf_signal="elevated")),
("Scenario 3 — Cirrhotic-ECM NET-prominent [Shen XT et al. Exp Hematol Oncol 2024]",
TANPatientV4(nlr=4.2, vegf_pg_ml=340.0, tgfb_signal="moderate",
hcc_aetiology="formerly_viral_cirrhosis",
cd10_alpl_signal="not_documented",
net_marker_level="high", cith3_positive=True,
hla_dr_signal="low", gmcsf_signal="mild")),
]
for label, patient in scenarios:
result = compute_tan_polarity_v4(patient)
print_result_v4(result, label)
if __name__ == "__main__":
    demo()