# Component Separation

Component separation uses scattering-covariance statistics as differentiable
morphological priors to separate mixed physical fields. Because the statistics
encode multi-scale, cross-orientation correlations, they capture morphological
*shape* beyond what a simple power spectrum can describe — making them powerful
priors for separating physically distinct components.

This workflow corresponds to `Remove_CMB.ipynb` in `demo-foscat-pangeo-eosc`.

---

## The problem

Given an observed mixture:

$$d = s_1 + s_2 + \cdots + s_n$$

find component maps $\hat{s}_1, \ldots, \hat{s}_n$ such that:

- $\hat{s}_1 + \cdots + \hat{s}_n \approx d$ (mixture consistency)
- $\Phi(\hat{s}_i) \approx \Phi_i^\text{ref}$ (each component matches its expected morphology)

where $\Phi_i^\text{ref}$ are reference statistics for each component (from a
training set, a physical model, or a prior observation).

This is an underdetermined inverse problem: FOSCAT regularises it with scattering
statistics rather than with smoothness or sparsity priors.

---

## General workflow

```
  Observed mixture d = s₁ + s₂  (known)
        │
        ▼  optimise over (ŝ₁, ŝ₂)
  ┌──────────────────────────────────────────────────────────┐
  │  L_mix   = ‖ŝ₁ + ŝ₂ − d‖²          (mixture fidelity)  │
  │  L_stat₁ = ‖Φ(ŝ₁) − Φ(s₁_ref)‖²    (morphology prior)  │
  │  L_stat₂ = ‖Φ(ŝ₂) − Φ(s₂_ref)‖²    (morphology prior)  │
  └──────────────────────────────────────────────────────────┘
        │
        ▼
  Synthesis([L_mix, L_stat₁, L_stat₂]).run(x0)
        │
        ▼
  ŝ₁*, ŝ₂*  — separated components
```

---

## Example: CMB-like background removal

```python
import numpy as np
import foscat.scat_cov as sc
from foscat.Synthesis import Loss, Synthesis

nside = 64
npix  = 12 * nside**2

# --- input data ---
mixture   = np.load("observed_mixture.npy")   # d = foreground + CMB
cmb_ref   = np.load("cmb_simulations.npy")    # ensemble of CMB realisations
fg_ref    = np.load("foreground_ref.npy")     # reference foreground map

# --- operator ---
scat_op = sc.funct(KERNELSZ=5, NORIENT=4, nstep_max=4, all_type='float64')

# compute reference statistics (average over CMB ensemble if available)
cmb_stat = scat_op.eval(cmb_ref)
fg_stat  = scat_op.eval(fg_ref)

# --- optimise over (ŝ_fg, ŝ_cmb) jointly ---
# pack both fields into a single vector: x = [ŝ_fg | ŝ_cmb]
def mixture_loss(x, scat_op, args):
    d = args[0]
    s_fg  = x[:npix]
    s_cmb = x[npix:]
    residual = (s_fg + s_cmb - d) ** 2
    return scat_op.backend.bk_mean(residual)

def fg_stat_loss(x, scat_op, args):
    ref = args[0]
    s_fg = x[:npix]
    stat = scat_op.eval(s_fg)
    return stat.reduce_mean_batch((stat - ref) ** 2)

def cmb_stat_loss(x, scat_op, args):
    ref = args[0]
    s_cmb = x[npix:]
    stat = scat_op.eval(s_cmb)
    return stat.reduce_mean_batch((stat - ref) ** 2)

loss_mix  = Loss(mixture_loss,  scat_op, mixture)
loss_fg   = Loss(fg_stat_loss,  scat_op, fg_stat)
loss_cmb  = Loss(cmb_stat_loss, scat_op, cmb_stat)

# initialise: foreground = mixture, CMB = zero
x0 = np.concatenate([mixture, np.zeros(npix)])

solver = Synthesis([loss_mix, loss_fg, loss_cmb], eta=0.01)
result = solver.run(x0, NUM_EPOCHS=500, EVAL_FREQUENCY=20)

fg_hat  = result[:npix]
cmb_hat = result[npix:]
```

---

## Weighting losses

The three losses above have different scales. If one dominates, weight them
explicitly by scaling your loss function return values:

```python
def cmb_stat_loss(x, scat_op, args):
    ref, weight = args
    s_cmb = x[npix:]
    stat = scat_op.eval(s_cmb)
    return weight * stat.reduce_mean_batch((stat - ref) ** 2)

loss_cmb = Loss(cmb_stat_loss, scat_op, cmb_stat, 0.5)  # weight = 0.5
```

A good heuristic: at the first iteration, print each loss component separately
and set weights so they are all $O(1)$.

---

## Gradient masking for partial-sky separation

When the observation covers only part of the sky, freeze pixels outside the
survey footprint and only update the internal region:

```python
grd_mask = survey_mask.astype(np.float64)
grd_mask_full = np.tile(grd_mask, 2)   # one mask per component

result = solver.run(x0, NUM_EPOCHS=500, grd_mask=grd_mask_full)
```

---

## Practical notes

- **Reference quality.** The quality of separation depends critically on the
  reference statistics $\Phi_i^\text{ref}$. Averaging over an ensemble of
  simulations gives better priors than a single reference map.

- **Amplitude ambiguity.** Scattering statistics are not amplitude-preserving by
  default. If the amplitude ratio between components matters, add a power-spectrum
  constraint (e.g. via `healpy.anafast`) as an additional loss term.

- **Number of iterations.** Component separation typically needs more iterations
  than simple synthesis (500–1000 epochs). The loss may decrease slowly if the
  two components are morphologically similar.

- **Initialisation.** Initialising with the observed mixture for the first
  component and zeros for the second often converges faster than random noise.