# LTR Offline Trainer (RankSVM-like)

This trainer learns ranking weights from labeled query-document data and exports a **multi-profile** `rank_weights.conf` used by `CalcPageRank`.

## Input CSV

Required columns:

- `query_id`
- `url`
- `level`
- `title_len`
- `text_len`
- `label` (higher = better)

Optional column:

- `intent` (or any other column named with `--intent-col`): groups rows by profile so that a separate expert is trained for each one.
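
Before a long training run, it can help to check that the CSV header matches the columns listed above. The following is a minimal sketch of such a check, not part of the trainer itself; the helper name `check_columns` and the sample row are illustrative assumptions.

```python
import csv
import io

# Required columns as listed in this README.
REQUIRED_COLUMNS = ("query_id", "url", "level", "title_len", "text_len", "label")


def check_columns(csv_file, intent_col="intent"):
    """Return (missing_required_columns, has_intent_column) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    return missing, intent_col in header


# Hypothetical sample data, used only to exercise the helper.
sample = io.StringIO(
    "query_id,url,level,title_len,text_len,label,intent\n"
    "q1,https://example.com/a,1,42,1800,2,blog\n"
)
missing, has_intent = check_columns(sample)
print("missing:", missing, "intent column present:", has_intent)
```

For a real dataset, pass an open file instead of the in-memory sample, for example `open(path, newline="")`.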

## Train

```bash
python3 tools/ltr/train_rank_ltr.py \
  --input /path/to/ltr_dataset.csv \
  --intent-col intent \
  --runs 200 \
  --train-ratio 0.8 \
  --c 1.0 \
  --out-conf rank_weights.conf
```

Outputs:

- `rank_weights.conf` (runtime multi-profile config)
- `tools/ltr/ltr_report.json` (default + per-intent Monte Carlo summary, including `rows`/`queries`/`pairs`)
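
The `pairs` figure in the report suggests a pairwise formulation, which is how RankSVM-style methods usually work: within each query, every pair of documents with different labels yields one training example built from the difference of their feature vectors. The sketch below illustrates that general transformation under this assumption; it is not taken from `train_rank_ltr.py`, and `build_pairs` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

FEATURES = ("level", "title_len", "text_len")


def build_pairs(rows):
    """Build (better - worse) feature difference vectors within each query."""
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["query_id"]].append(row)

    pairs = []
    for docs in by_query.values():
        for a, b in combinations(docs, 2):
            if a["label"] == b["label"]:
                continue  # equal labels express no preference
            better, worse = (a, b) if a["label"] > b["label"] else (b, a)
            pairs.append([float(better[f]) - float(worse[f]) for f in FEATURES])
    return pairs


# Hypothetical rows: three documents for q1, one for q2.
rows = [
    {"query_id": "q1", "level": 1, "title_len": 40, "text_len": 1500, "label": 2},
    {"query_id": "q1", "level": 3, "title_len": 10, "text_len": 300, "label": 1},
    {"query_id": "q1", "level": 2, "title_len": 25, "text_len": 900, "label": 1},
    {"query_id": "q2", "level": 1, "title_len": 30, "text_len": 700, "label": 0},
]
pairs = build_pairs(rows)
print(len(pairs), "pairs")
```

A query with a single document, or with identical labels throughout, contributes no pairs, so the number of pairs can be much smaller than the number of rows suggests.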

## Config format

The generated config is INI-like:

```ini
[default]
w_level=...
w_title=...
w_text=...
std_level=...
std_title=...
std_text=...

[blog]
...

[ecommerce]
...
```

The `[default]` section is always produced and serves as the fallback when no other profile applies.
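
To inspect a generated file outside the runtime, the format can be read with Python's standard `configparser`. This is a sketch for inspection only: the numeric values are made up, and `load_profiles` is a hypothetical helper, not the loader used by `rank.h`.

```python
import configparser

# Made-up values; a real file is produced by the trainer.
SAMPLE_CONF = """
[default]
w_level=-0.5
w_title=0.25
w_text=0.125
std_level=1.0
std_title=10.0
std_text=500.0

[blog]
w_level=-0.25
w_title=0.5
w_text=0.25
std_level=1.0
std_title=12.0
std_text=800.0
"""


def load_profiles(text):
    """Parse the INI-like config into {profile_name: {key: float}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: {key: float(value) for key, value in parser[name].items()}
        for name in parser.sections()
    }


profiles = load_profiles(SAMPLE_CONF)
# Unknown profiles fall back to the default section.
weights = profiles.get("news", profiles["default"])
print(sorted(profiles), weights["w_level"])
```

Note that `configparser` reserves the uppercase name `DEFAULT`; the lowercase `[default]` used here is parsed as an ordinary section.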

## Runtime integration

`rank.h` loads all sections once and picks a profile for each host:

- Manual mapping from `host_intent(host_id, profile, confidence)` if available.
- Heuristic fallback from host URL/title signals (with optional DB cache in `host_intent_cache`).
- Safe fallback chain: selected profile -> `default` -> hardcoded defaults.
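
The selection order above can be sketched as follows. This is an illustrative Python model, not the code in `rank.h`: the keyword rule in `heuristic_intent`, the placeholder `HARDCODED_DEFAULTS`, and the function names are all assumptions, and the cache and `confidence` handling are left out.

```python
# Placeholder values; the real hardcoded defaults are not given in this README.
HARDCODED_DEFAULTS = {"w_level": 0.0, "w_title": 0.0, "w_text": 0.0}


def heuristic_intent(url, title):
    """Hypothetical keyword rule standing in for the URL/title heuristic."""
    text = f"{url} {title}".lower()
    if "blog" in text:
        return "blog"
    if "shop" in text or "cart" in text:
        return "ecommerce"
    return None


def select_weights(host_id, url, title, manual_map, profiles):
    """Return (profile_name, weights) following the fallback chain."""
    # 1. Manual mapping wins when present; 2. otherwise use the heuristic.
    profile = manual_map.get(host_id) or heuristic_intent(url, title)
    # 3. Selected profile -> default -> hardcoded defaults.
    if profile in profiles:
        return profile, profiles[profile]
    if "default" in profiles:
        return "default", profiles["default"]
    return "hardcoded", HARDCODED_DEFAULTS


profiles = {"default": {"w_level": -0.5}, "blog": {"w_level": -0.25}}
manual_map = {1: "blog"}
print(select_weights(1, "https://a.example", "", manual_map, profiles)[0])
print(select_weights(2, "https://shop.example", "", manual_map, profiles)[0])
```

The second call shows why the chain matters: the heuristic picks `ecommerce`, but no such section was loaded, so the lookup falls through to `default`.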

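The scoring engine is unchanged: it still computes the same linear score, applies Monte Carlo perturbation, and reports the stability metric. Only the weights fed into it now depend on the selected profile.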
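
As a rough mental model of that pipeline, the sketch below scores one document with a linear combination, repeats the scoring with randomly perturbed weights, and summarizes how much the score moves. Several parts are assumptions, since this README does not define them: the noise is Gaussian with an explicit `sigma` parameter (the role of the `std_*` keys is not specified here), and the stability formula `1 / (1 + stddev)` is a stand-in rather than the metric used by `CalcPageRank`.

```python
import random
import statistics


def linear_score(weights, features):
    """Weighted sum over the features named in `weights`."""
    return sum(weights[name] * features[name] for name in weights)


def perturbed_scores(weights, features, sigma, trials, seed=0):
    """Re-score with Gaussian noise added to each weight (assumed noise model)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        noisy = {name: w + rng.gauss(0.0, sigma) for name, w in weights.items()}
        scores.append(linear_score(noisy, features))
    return scores


def stability(scores):
    """Stand-in metric: 1.0 when scores never move, lower as they spread out."""
    return 1.0 / (1.0 + statistics.pstdev(scores))


# Hypothetical weights and features, keyed by feature name.
weights = {"level": -0.5, "title_len": 0.25, "text_len": 0.125}
features = {"level": 2, "title_len": 4, "text_len": 8}

base = linear_score(weights, features)
noisy_scores = perturbed_scores(weights, features, sigma=0.1, trials=200)
print(base, stability(noisy_scores))
```

With `sigma=0` every trial reproduces the base score and the stand-in metric is exactly 1.0; larger noise lowers it.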

Supporting SQL files:

- `sql/host_intent.sql`: optional bootstrap for the manual mapping.
- `sql/host_intent_cache_ops.sql`: operational cache invalidation and debug snippets.
- `sql/host_rank_metrics_dashboard.sql`: dashboard and analytics queries for `host_rank_metrics`.
