← Макалаларга кайтуу
March 14, 2026
5 мүн окуу

Signal Correlation: How Many Pairs to Monitor

Signal Correlation: How Many Pairs to Monitor
#algotrading
#correlation
#diversification
#portfolio
#orchestration
#crypto

You launch a strategy on 10 crypto pairs: BTC/USDT, ETH/USDT, SOL/USDT, AVAX/USDT, and six more alts. The logic seems bulletproof: if the strategy is active 5% of the time on one pair, then across 10 pairs at least one should be active 10.9510=40%1 - 0.95^{10} = 40\% of the time. A fourfold increase in utilization.

In practice, utilization turns out to be 15-16%, not 40%. Your 10 pairs behave like 3. Capital sits idle, fill_efficiency drops, and the effective portfolio return ends up three times lower than projected.

The reason is signal correlation. And in cryptocurrencies, it is catastrophically high.

The Illusion of Diversification in Crypto

The illusion of diversification in cryptocurrency markets

In traditional finance, diversification works because Apple stock and an oil ETF react to different factors. In the cryptocurrency market, everything is different.

BTC is the dominant factor. When Bitcoin drops 5%, ETH drops 6-8%, SOL drops 8-12%, altcoins drop 10-20%. Daily return correlations in the crypto market are consistently above 0.6, and during panic they approach 1.0.

But for us — algo traders — what matters is not price correlation, but signal correlation. If the strategy is based on momentum and BTC triggers an entry signal, there is a high probability that ETH and SOL will trigger an analogous signal in the same minute. All pairs enter long simultaneously, all exit simultaneously. Ten positions — but essentially one bet.

Why 10 Pairs ≠ 10x Diversification

10 correlated pairs collapse into 3 effective ones

Formal Statement

Suppose a strategy on each of NN pairs is active pp fraction of the time. If signals were completely independent, the probability that at least one pair is active:

P(1 active)=1(1p)NP(\geq 1\ \text{active}) = 1 - (1 - p)^N

For Strategy B (p=0.05p = 0.05, N=10N = 10):

P(1)=10.9510=10.598740.1%P(\geq 1) = 1 - 0.95^{10} = 1 - 0.5987 \approx 40.1\%

But signals are not independent. Cryptocurrencies move in sync — meaning signals emerge and extinguish in clusters.

Correlation Turns 10 Pairs Into 3

The intuition is this: if 10 pairs are correlated, they carry information from not 10 independent sources but, say, 3-4. We formalize this through effective_N:

Neff=NCfN_{eff} = \frac{N}{C_f}

where CfC_f is the correlation factor, reflecting the average pairwise signal correlation. When Cf=1C_f = 1 the pairs are fully independent; when Cf=NC_f = N they are identical.

For crypto pairs, a typical Cf3C_f \approx 3. Then:

Neff=1033.3N_{eff} = \frac{10}{3} \approx 3.3

P(1)=10.953.310.844=15.6%P(\geq 1) = 1 - 0.95^{3.3} \approx 1 - 0.844 = 15.6\%

Not 40%, but 15.6%. A 2.5x difference. Fill efficiency drops accordingly, and with it the effective return of the entire portfolio (see PnL per Active Time).

Correlation in Crypto Markets

Crypto signal correlation matrix heatmap

BTC as the Dominant Factor

The crypto market has a pronounced factor structure. BTC explains 60-80% of the daily return variance for most altcoins. This is clearly visible through PCA (Principal Component Analysis):

import numpy as np
from sklearn.decomposition import PCA

def analyze_crypto_factor_structure(returns_matrix: np.ndarray, pair_names: list) -> dict:
    """
    PCA analysis of the factor structure of crypto returns.

    Args:
        returns_matrix: returns matrix [n_days x n_pairs]
        pair_names: list of pair names
    """
    pca = PCA()
    pca.fit(returns_matrix)

    explained = pca.explained_variance_ratio_
    cumulative = np.cumsum(explained)

    print("Factor structure:")
    for i, (var, cum) in enumerate(zip(explained[:5], cumulative[:5])):
        print(f"  PC{i+1}: {var:.1%} variance (cumulative: {cum:.1%})")

    loadings = pca.components_[0]
    print("\nPC1 loadings (BTC factor):")
    for name, load in sorted(zip(pair_names, loadings), key=lambda x: -abs(x[1])):
        print(f"  {name}: {load:.3f}")

    return {
        "explained_variance": explained,
        "n_effective_factors": int(np.searchsorted(cumulative, 0.90)) + 1,
        "pc1_loadings": dict(zip(pair_names, loadings)),
    }

Typical result for a portfolio of 10 crypto pairs:

Component Explained Variance Cumulative
PC1 (BTC) 65% 65%
PC2 12% 77%
PC3 8% 85%
PC4 5% 90%
PC5-PC10 10% 100%

Four factors explain 90% of variance. Out of 10 pairs, no more than 4 are "independent."

Signal Correlation vs. Price Correlation

Here is an important nuance. Price correlation and signal correlation are different things. BTC and ETH prices are correlated at 0.85, but the signals of a specific strategy may be correlated at 0.95 or at 0.50 — depending on the entry logic.

Example: an RSI overbought/oversold strategy. RSI on BTC crosses 30 (oversold) — enter long. ETH at the same moment may also be oversold (signal correlation ~0.90). Or it may not be, if ETH was falling more slowly (signal correlation ~0.40).

The correct approach is to measure the correlation of signals specifically, not price series:

import numpy as np
from itertools import combinations

def signal_correlation_matrix(
    signals: dict,  # {pair: np.array of 0/1 per minute}
    method: str = "pearson",
) -> np.ndarray:
    """
    Calculate the signal correlation matrix (binary: 0 = flat, 1 = in position).

    Args:
        signals: dictionary {pair_name: binary_signal_array}
        method: correlation method ("pearson", "jaccard")
    """
    pairs = sorted(signals.keys())
    n = len(pairs)
    corr_matrix = np.ones((n, n))

    for i, j in combinations(range(n), 2):
        s_i = signals[pairs[i]]
        s_j = signals[pairs[j]]

        if method == "pearson":
            corr = np.corrcoef(s_i, s_j)[0, 1]
        elif method == "jaccard":
            intersection = np.sum(s_i & s_j)
            union = np.sum(s_i | s_j)
            corr = intersection / union if union > 0 else 0
        else:
            raise ValueError(f"Unknown method: {method}")

        corr_matrix[i, j] = corr
        corr_matrix[j, i] = corr

    return corr_matrix, pairs


def estimate_correlation_factor(corr_matrix: np.ndarray) -> float:
    """
    Estimate correlation_factor from the signal correlation matrix.

    correlation_factor = 1 + (N-1) * mean_pairwise_correlation

    When correlation is 0 → C_f = 1 (all independent).
    When correlation is 1 → C_f = N (all identical).
    """
    n = corr_matrix.shape[0]
    upper_triangle = corr_matrix[np.triu_indices(n, k=1)]
    mean_corr = np.mean(upper_triangle)

    correlation_factor = 1 + (n - 1) * mean_corr
    return correlation_factor

Temporal Correlation: Calm vs. Panic

Temporal correlation regimes: calm market vs. panic

Correlation is not static. During calm periods, BTC and alts can diverge — ETH rises on Ethereum news, SOL on Solana news. In a crisis, everything collapses into a single factor: risk-on/risk-off.

def rolling_correlation_factor(
    signals: dict,
    window_days: int = 30,
    step_days: int = 7,
) -> list:
    """
    Rolling correlation_factor to detect regime changes.
    """
    pairs = sorted(signals.keys())
    minutes_per_day = 1440
    window = window_days * minutes_per_day
    step = step_days * minutes_per_day

    total_minutes = len(signals[pairs[0]])
    results = []

    for start in range(0, total_minutes - window, step):
        end = start + window
        window_signals = {p: signals[p][start:end] for p in pairs}

        corr_matrix, _ = signal_correlation_matrix(window_signals)
        cf = estimate_correlation_factor(corr_matrix)

        results.append({
            "start_minute": start,
            "end_minute": end,
            "correlation_factor": cf,
            "effective_n": len(pairs) / cf,
        })

    return results

Typical picture for 10 crypto pairs:

Market Regime Average Signal Correlation CfC_f NeffN_{eff}
Sideways (low vol) 0.15-0.25 2.4-3.3 3.0-4.2
Uptrend 0.25-0.40 3.3-4.6 2.2-3.0
Downtrend 0.30-0.50 3.7-5.5 1.8-2.7
Panic (crash) 0.60-0.90 6.4-9.1 1.1-1.6

During panic, 10 pairs compress to 1-2 effective ones. Precisely when diversification is needed most, it vanishes. This is the crypto analogue of the classic "correlations go to 1 in a crisis."

effective_N: The Key Concept

Effective N: dimensionality reduction from correlated pairs

Formula and Derivation

The idea of effective_N is borrowed from statistics, where effective sample size accounts for autocorrelation of observations. For our purposes:

Neff=N1+(N1)ρˉN_{eff} = \frac{N}{1 + (N - 1) \cdot \bar{\rho}}

where ρˉ\bar{\rho} is the average pairwise signal correlation. Simplified notation:

Neff=NCf,Cf=1+(N1)ρˉN_{eff} = \frac{N}{C_f}, \quad C_f = 1 + (N - 1) \cdot \bar{\rho}

Properties:

  • When ρˉ=0\bar{\rho} = 0: Cf=1C_f = 1, Neff=NN_{eff} = N — full independence
  • When ρˉ=1\bar{\rho} = 1: Cf=NC_f = N, Neff=1N_{eff} = 1 — all pairs are identical
  • When ρˉ=0.25\bar{\rho} = 0.25 and N=10N = 10: Cf=3.25C_f = 3.25, Neff=3.08N_{eff} = 3.08

How to Estimate correlation_factor from Data

In practice, there are three approaches:

1. From the signal correlation matrix (exact).

Run the strategy on all pairs, obtain binary signals (0/1 for each minute), build the correlation matrix, compute CfC_f using the formula above.

2. From PCA of price returns (approximate).

If signals strongly depend on price dynamics (momentum, mean-reversion), you can estimate NeffN_{eff} as the number of PCA components explaining 90% of variance.

3. From asset class heuristics (rough).

Asset Class Typical CfC_f
Crypto (top-10) 2.5-4.0
Crypto (with DeFi/memecoins) 2.0-3.0
Forex (majors) 1.5-2.5
Stocks (single sector) 2.0-3.5
Stocks (cross-sector) 1.2-1.8

For a crypto portfolio of BTC, ETH, SOL, AVAX, MATIC, DOGE, DOT, LINK, UNI, ATOM, a safe estimate is Cf3C_f \approx 3.

Slot Utilization Modeling

Orchestrator slot utilization dashboard

Formula for P(1 active)P(\geq 1\ \text{active})

The base formula accounting for correlation:

P(1 active)=1(1p)NeffP(\geq 1\ \text{active}) = 1 - (1 - p)^{N_{eff}}

Table for different strategies and number of pairs (Cf=3C_f = 3):

Strategy pp (trading time) 5 pairs (Neff=1.7N_{eff}=1.7) 10 pairs (Neff=3.3N_{eff}=3.3) 20 pairs (Neff=6.7N_{eff}=6.7) 50 pairs (Neff=16.7N_{eff}=16.7)
Strategy B 5% 8.2% 15.6% 29.1% 58.0%
Strategy A 15% 23.6% 41.8% 65.9% 92.8%
Strategy C 45% 67.1% 89.0% 98.8% ~100%

For Strategy B with 5% activity, you need 50 pairs just to have at least one active position half the time. And that doesn't even account for the fact that 50 crypto pairs are more strongly correlated than 10.

Multi-Slot Orchestrator

A real orchestrator manages multiple slots simultaneously. If you have 5 slots and 10 pairs, utilization is calculated differently:

utilization=min(E[active],max_slots)max_slots\text{utilization} = \frac{\min(E[\text{active}], \text{max\_slots})}{\text{max\_slots}}

E[active]=NeffpE[\text{active}] = N_{eff} \cdot p

def estimate_fill_efficiency(
    trading_time_pct: float,
    n_pairs: int,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Analytical estimate of fill efficiency for a multi-slot orchestrator.

    Args:
        trading_time_pct: fraction of active time for one strategy on one pair
        n_pairs: number of trading pairs
        correlation_factor: signal correlation coefficient
        max_slots: maximum number of simultaneous positions

    Returns:
        dict with utilization metrics
    """
    effective_n = n_pairs / correlation_factor

    p_at_least_one = 1 - (1 - trading_time_pct) ** effective_n

    expected_active = effective_n * trading_time_pct

    utilization = min(expected_active, max_slots) / max_slots

    fill_efficiency = min(p_at_least_one, utilization)

    return {
        "effective_n": effective_n,
        "p_at_least_one": p_at_least_one,
        "expected_active": expected_active,
        "utilization": utilization,
        "fill_efficiency": fill_efficiency,
    }


configs = [
    ("Strategy B, 10 pairs, 1 slot", 0.05, 10, 3.0, 1),
    ("Strategy B, 10 pairs, 3 slots", 0.05, 10, 3.0, 3),
    ("Strategy B, 30 pairs, 1 slot", 0.05, 30, 3.0, 1),
    ("Strategy A, 10 pairs, 1 slot", 0.15, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 1 slot", 0.45, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 5 slots", 0.45, 10, 3.0, 5),
]

for name, p, n, cf, slots in configs:
    result = estimate_fill_efficiency(p, n, cf, slots)
    print(f"{name}:")
    print(f"  N_eff = {result['effective_n']:.1f}")
    print(f"  P(≥1 active) = {result['p_at_least_one']:.1%}")
    print(f"  E[active] = {result['expected_active']:.2f}")
    print(f"  fill_efficiency = {result['fill_efficiency']:.1%}")
    print()

Expected output:

Strategy B, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 15.6%

Strategy B, 10 pairs, 3 slots:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 5.6%

Strategy B, 30 pairs, 1 slot:
  N_eff = 10.0
  P(≥1 active) = 40.1%
  E[active] = 0.50
  fill_efficiency = 40.1%

Strategy A, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 41.8%
  E[active] = 0.50
  fill_efficiency = 41.8%

Strategy C, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 89.0%

Strategy C, 10 pairs, 5 slots:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 30.0%

Note: Strategy B with 3 slots and 10 pairs shows fill_efficiency of 5.6%. Three slots are pointless when the expected number of active pairs is only 0.17. Slots should be allocated proportionally to expected load.

Simulation from Real Data

The analytical model is an approximation. For accurate estimation, you need simulation on real signals:

import numpy as np

def simulate_fill_efficiency(
    all_signals: dict,       # {(strategy, pair): [(entry_min, exit_min), ...]}
    max_slots: int = 10,
    test_period_minutes: int = 750 * 24 * 60,  # 750 days
    priority_fn=None,        # priority function for position selection
) -> dict:
    """
    Simulate real slot load of the orchestrator.

    For each minute: count how many pairs want to enter a position,
    and how many slots are actually occupied (accounting for the limit).

    Args:
        all_signals: signals by pairs and strategies
        max_slots: maximum number of simultaneous positions
        test_period_minutes: length of the test period in minutes
        priority_fn: if None — FIFO; otherwise — ranking function
    """
    demand_timeline = np.zeros(test_period_minutes, dtype=np.int32)
    capped_timeline = np.zeros(test_period_minutes, dtype=np.int32)

    for signals in all_signals.values():
        for entry_min, exit_min in signals:
            if entry_min < test_period_minutes:
                end = min(exit_min, test_period_minutes)
                demand_timeline[entry_min:end] += 1

    capped_timeline = np.minimum(demand_timeline, max_slots)

    total_demand = np.sum(demand_timeline)
    total_filled = np.sum(capped_timeline)
    time_with_any_active = np.sum(demand_timeline > 0)

    fill_efficiency = np.mean(capped_timeline) / max_slots
    demand_fill_ratio = total_filled / total_demand if total_demand > 0 else 0
    time_utilization = time_with_any_active / test_period_minutes

    slot_distribution = {}
    for s in range(max_slots + 1):
        slot_distribution[s] = np.mean(capped_timeline == s)

    return {
        "fill_efficiency": fill_efficiency,
        "demand_fill_ratio": demand_fill_ratio,
        "time_utilization": time_utilization,
        "avg_demand": np.mean(demand_timeline),
        "avg_filled": np.mean(capped_timeline),
        "slot_distribution": slot_distribution,
        "overflow_pct": np.mean(demand_timeline > max_slots),
    }

Simulation on real data often shows even lower utilization than the analytical estimate, because it accounts for temporal clustering of signals: all pairs enter simultaneously in a cluster, creating overflow, and then all go silent, creating a void.

How Many Pairs to Monitor? Diminishing Returns Analysis

Effective N and diminishing returns curve

The key question: at what NN does adding one more pair stop noticeably increasing fill_efficiency?

import numpy as np

def diminishing_returns_analysis(
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_pairs: int = 100,
    target_utilization: float = 0.80,
) -> dict:
    """
    Diminishing returns analysis from adding new pairs.
    """
    results = []
    target_n = None

    for n in range(1, max_pairs + 1):
        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        marginal = 0
        if n > 1:
            prev_eff = (n - 1) / correlation_factor
            prev_p = 1 - (1 - trading_time_pct) ** prev_eff
            marginal = p_active - prev_p

        results.append({
            "n_pairs": n,
            "n_effective": n_eff,
            "p_at_least_one": p_active,
            "marginal_gain": marginal,
        })

        if target_n is None and p_active >= target_utilization:
            target_n = n

    return {
        "results": results,
        "target_n_for_utilization": target_n,
    }


analysis_b = diminishing_returns_analysis(0.05, correlation_factor=3.0, target_utilization=0.80)
print(f"Strategy B: need {analysis_b['target_n_for_utilization']} pairs for 80% P(≥1)")

for r in analysis_b["results"]:
    if r["n_pairs"] in [1, 3, 5, 10, 20, 30, 50, 80]:
        print(f"  N={r['n_pairs']:3d}: N_eff={r['n_effective']:.1f}, "
              f"P(≥1)={r['p_at_least_one']:.1%}, "
              f"marginal={r['marginal_gain']:.2%}")

Results for Strategy B (p=0.05p = 0.05, Cf=3C_f = 3):

NN pairs NeffN_{eff} P(1)P(\geq 1) Marginal gain
1 0.3 1.7%
3 1.0 5.0% +1.7%
5 1.7 8.2% +1.6%
10 3.3 15.6% +1.4%
20 6.7 29.1% +1.1%
30 10.0 40.1% +0.9%
50 16.7 58.0% +0.6%
80 26.7 74.5% +0.4%

For Strategy B, reaching 80% single-slot utilization is impossible even with 100 pairs (you need ~96 pairs). This is a fundamental limitation: a strategy with 5% trading time is not suited for single-slot operation — it needs a portfolio approach with multiple strategies.

For Strategy A (p=0.15p = 0.15, Cf=3C_f = 3):

NN pairs NeffN_{eff} P(1)P(\geq 1) Marginal gain
5 1.7 23.6%
10 3.3 41.8% +3.3%
20 6.7 65.9% +2.1%
30 10.0 80.3% +1.2%

Strategy A reaches 80% utilization at ~30 pairs. Marginal gain at the 30th pair is only +1.2%.

For Strategy C (p=0.45p = 0.45, Cf=3C_f = 3):

NN pairs NeffN_{eff} P(1)P(\geq 1)
3 1.0 45.0%
5 1.7 67.1%
10 3.3 89.0%
15 5.0 95.0%

Strategy C with 45% trading time reaches 90% utilization at just 10 pairs. Adding more is pointless.

Edge Degradation Across Pairs

Edge degradation across trading pairs

There is another factor that limits the number of pairs: edge degradation. A strategy developed and optimized on BTC/USDT may perform worse on less liquid alts.

Causes of degradation:

  • Liquidity: slippage on DOGE/USDT is several times higher than on BTC/USDT
  • Spreads: less liquid pairs have wider bid-ask spreads
  • Microstructure: order book patterns differ between pairs
  • Manipulation: low-liquidity alts are susceptible to pump-and-dump
def edge_decay_analysis(
    strategy_results: dict,  # {pair: {"pnl_per_day": float, "n_trades": int}}
    min_trades: int = 30,
) -> list:
    """
    Rank pairs by edge accounting for degradation.
    """
    ranked = []
    for pair, metrics in strategy_results.items():
        if metrics["n_trades"] < min_trades:
            continue
        ranked.append({
            "pair": pair,
            "pnl_per_day": metrics["pnl_per_day"],
            "n_trades": metrics["n_trades"],
            "sharpe": metrics.get("sharpe", 0),
        })

    ranked.sort(key=lambda x: x["pnl_per_day"], reverse=True)

    cumulative_pnl = []
    running_sum = 0
    for i, r in enumerate(ranked):
        running_sum += r["pnl_per_day"]
        avg = running_sum / (i + 1)
        cumulative_pnl.append({
            "n_pairs": i + 1,
            "last_added": r["pair"],
            "last_pnl_per_day": r["pnl_per_day"],
            "avg_pnl_per_day": avg,
        })

    return cumulative_pnl

Typical picture:

# pairs Last Added PnL/day of last Average PnL/day
1 BTC/USDT 0.89% 0.89%
2 ETH/USDT 0.82% 0.86%
3 SOL/USDT 0.71% 0.81%
5 AVAX/USDT 0.55% 0.73%
8 DOT/USDT 0.31% 0.61%
10 DOGE/USDT 0.12% 0.53%

Adding the 10th pair lowers the portfolio's average PnL/day. By the 8th pair, the edge is already half that of the best. A balance is needed between fill_efficiency (grows with the number of pairs) and average edge (falls).

Optimal Number of Pairs: Unified Model

Optimal pair count: fill efficiency vs. average edge intersection

We combine fill_efficiency and edge decay into a single metric — expected portfolio PnL per day:

Portfolio PnL/day=avg_edge(N)×fill_efficiency(N)×365\text{Portfolio PnL/day} = \text{avg\_edge}(N) \times \text{fill\_efficiency}(N) \times 365

def optimal_pairs_count(
    pair_edges: list,           # PnL/day in descending order: [0.89, 0.82, 0.71, ...]
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Find the optimal number of pairs that maximizes portfolio PnL.
    """
    best_n = 0
    best_score = 0
    results = []

    for n in range(1, len(pair_edges) + 1):
        avg_edge = np.mean(pair_edges[:n])

        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        expected_active = n_eff * trading_time_pct
        utilization = min(expected_active, max_slots) / max_slots
        fill_eff = min(p_active, utilization)

        portfolio_score = avg_edge * fill_eff * 365

        results.append({
            "n_pairs": n,
            "avg_edge": avg_edge,
            "fill_efficiency": fill_eff,
            "portfolio_annualized": portfolio_score,
        })

        if portfolio_score > best_score:
            best_score = portfolio_score
            best_n = n

    return {
        "optimal_n": best_n,
        "optimal_score": best_score,
        "results": results,
    }


edges = [0.89, 0.82, 0.71, 0.65, 0.55, 0.48, 0.40, 0.31, 0.22, 0.12,
         0.08, 0.05, 0.02, -0.01, -0.05]

opt = optimal_pairs_count(edges, trading_time_pct=0.15, correlation_factor=3.0)
print(f"Optimal number of pairs: {opt['optimal_n']}")
print(f"Portfolio annualized: {opt['optimal_score']:.1f}%")

for r in opt["results"]:
    print(f"  N={r['n_pairs']:2d}: avg_edge={r['avg_edge']:.2f}%, "
          f"fill_eff={r['fill_efficiency']:.1%}, "
          f"portfolio={r['portfolio_annualized']:.1f}%")

The optimum is typically found at the point where the marginal fill_efficiency from adding a pair no longer compensates for the decline in average edge. For a typical crypto portfolio:

  • Strategy B (5% time): optimum at 8-12 pairs
  • Strategy A (15% time): optimum at 6-10 pairs
  • Strategy C (45% time): optimum at 4-6 pairs

A paradox: the strategy with the lowest trading time benefits from the most pairs, yet fill_efficiency still remains low. The solution is not more pairs, but combining with other strategies (see Combo Strategies).

Cross-Pair Diversification: Strategies for Reducing Correlation

Cross-strategy diversification network

If you cannot increase the number of pairs indefinitely, you can reduce CfC_f — that is, increase signal diversity.

Strategy 1: Mix of Liquid and DeFi Tokens

BTC, ETH, BNB are very strongly correlated. But UNI (DEX), AAVE (lending), CRV (stablecoins) may have their own drivers. Adding DeFi tokens reduces the average ρˉ\bar{\rho} from 0.35 to 0.20-0.25:

Cf=1+9×0.20=2.8(vs. 3.25 at ρˉ=0.25)C_f = 1 + 9 \times 0.20 = 2.8 \quad \text{(vs. 3.25 at } \bar{\rho} = 0.25\text{)}

Strategy 2: Different Strategies on the Same Pairs

Instead of 10 pairs with one strategy — 5 pairs with two different strategies. If the strategies are based on different principles (momentum vs. mean-reversion), their signals can be anti-correlated:

  • Momentum enters long when price rises
  • Mean-reversion enters long when price falls

ρˉcross-strategy<0    Cf<1    Neff>N\bar{\rho}_{\text{cross-strategy}} < 0 \implies C_f < 1 \implies N_{eff} > N

This is the only way to get Neff>NN_{eff} > N — use strategies with negative signal correlation.

Strategy 3: Spot vs. Futures

Funding rate arbitrage and spot trading have a different correlation structure. Adding arbitrage strategies to the portfolio significantly reduces the overall CfC_f, because arbitrage by definition exploits divergences, not convergences.

Practical Recommendations by Strategy Type

High-Frequency Low-Time Strategies (trading time < 10%)

Typical representative: Strategy B (5% time, 38 trades over 750 days).

  • Number of pairs: 10-15 (optimum for edge/fill balance)
  • Problem: fill_efficiency is low even with 15 pairs (~20-25%)
  • Solution: mandatory combination with other strategies in the orchestrator
  • CfC_f for crypto: 2.5-3.5
  • Monitoring: rolling CfC_f to adapt pair count to market regime

Medium-Time Strategies (trading time 10-30%)

Typical representative: Strategy A (15% time, 418 trades over 750 days).

  • Number of pairs: 6-10
  • Fill_efficiency at 10 pairs: ~40%
  • Solution: combining 2-3 such strategies fills 80%+ of the time
  • CfC_f for crypto: 2.5-3.5
  • Focus: select pairs with maximum edge, don't chase quantity

High-Time Strategies (trading time > 30%)

Typical representative: Strategy C (45% time).

  • Number of pairs: 4-6
  • Fill_efficiency at 6 pairs: ~80%
  • Problem: overflow — multiple pairs active simultaneously, but fewer slots
  • Solution: increase max_slots or add pair prioritization
  • CfC_f for crypto: 2.5-4.0 (higher due to long positions overlapping crises)

Summary Table

Parameter Strategy B (5%) Strategy A (15%) Strategy C (45%)
Recommended NN 10-15 6-10 4-6
NeffN_{eff} at Cf=3C_f=3 3.3-5.0 2.0-3.3 1.3-2.0
Fill eff. (1 slot) 15-23% 32-42% 77-89%
Combination needed? Mandatory Desirable No
Bottleneck Few signals Balance Overflow

Connection to Other Metrics in the Series

This article is the eleventh in the "Backtests Without Illusions" series. Signal correlation directly impacts the metrics from previous articles:

  • PnL per Active Time: fill_efficiency is the key multiplier in the effective return formula. If you overestimate fill_efficiency by ignoring correlation, your portfolio PnL forecast will be overly optimistic.

  • Funding rates: with high correlation, positions open simultaneously — and funding costs grow linearly with the number of slots. Overflow + funding = accelerated capital burn.

  • Funding rate arbitrage: arbitrage strategies are a natural diversifier that reduces the portfolio's CfC_f. Their signals are weakly correlated with momentum and mean-reversion strategies.

  • Combo strategies (next article): how to assemble a portfolio of strategies with different pp and CfC_f to achieve 90%+ utilization. Cascade orchestration accounts for signal correlation when assigning priorities.

Conclusion

Diversification in crypto is not about the number of pairs. 10 correlated pairs yield the effect of 3-4 independent ones. During panic, even fewer.

Four takeaways:

  1. Calculate effective_N, not N. For crypto pairs Cf3C_f \approx 3. Ten pairs are ~3.3 effective ones. Plan fill_efficiency based on NeffN_{eff}, not NN.

  2. Measure signal correlation, not price correlation. Price correlation is a proxy, not a substitute. Run the strategy on all pairs and compute the binary signal correlation matrix.

  3. Account for edge degradation. More pairs means lower average PnL/day. The optimum is at the point where marginal fill_efficiency from a new pair still compensates for the edge decline.

  4. Reduce CfC_f, don't increase NN. Combining different strategies on the same pairs is more effective than one strategy on more pairs. Cross-strategy diversification can yield Neff>NN_{eff} > N.

Correlation factor is the hidden variable that determines how realistic your utilization and return forecasts are. Ignoring it means building a portfolio on illusions.


Useful Links

  1. Markowitz, H. — Portfolio Selection (1952)
  2. López de Prado — Advances in Financial Machine Learning: Denoising and Detoning
  3. Ledoit, O. & Wolf, M. — Honey, I Shrunk the Sample Covariance Matrix (2004)
  4. Laloux, L. et al. — Noise Dressing of Financial Correlation Matrices (1999)
  5. Cont, R. — Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues
  6. Ernest Chan — Algorithmic Trading: Portfolio Construction and Risk Management
  7. Rebonato, R. & Jäckel, P. — The Most General Methodology for Creating a Valid Correlation Matrix

Citation

@article{soloviov2026signalcorrelation,
  author = {Soloviov, Eugen},
  title = {Signal Correlation: How Many Pairs to Monitor},
  year = {2026},
  url = {https://marketmaker.cc/ru/blog/post/signal-correlation-pairs},
  version = {0.1.0},
  description = {Why 10 crypto pairs don't provide 10x diversification, how to calculate effective\_N via correlation\_factor, and how many pairs you really need to monitor for 80-90\% orchestrator slot utilization.}
}
blog.disclaimer

MarketMaker.cc Team

Сандык изилдөөлөр жана стратегия

Telegram-да талкуулоо
Newsletter

Рынктан бир кадам алдыда болуңуз

AI соода аналитикасы, рынок талдоолору жана платформа жаңылыктары үчүн биздин жаңылыктар бюллетенине жазылыңыз.

Биз сиздин купуялыгыңызды урматтайбыз. Каалаган убакта жазылымдан чыга аласыз.