finn-mcp/refactor.md

# PRD: finn-mcp v2

## Current State (from codebase + DB inspection)

### What already works
- **SQLite database** (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
- **Hash-aware caching architecture** is designed (see `cache.py` docstring)
- **Transport scoring** is implemented (`score_transport` uses lat/lng from Eiendom.no)
- **`listing_description`** is stored in the `FinnAd` model
- **`finn_analyze_unit_images`** downloads, resizes to 1024px, returns as `ImageContent` — Claude sees images directly

### Critical bugs discovered
- **Analysis cache is dead.** `analysis_cache` table has **0 rows**. Every search recomputes scoring from scratch.
- **`content_hash` is NULL on every row** in `finn_ads`, `eiendom_units`, `similar_units` — 100% NULL across 427 rows. The `_compute_deps_hash` function therefore returns a deterministic hash of empty strings on every call.
- Schema dump shows `, content_hash TEXT)` appended — column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
- **Only 36 of 222 ads** have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
- **Search page cache** (`cache_meta`): all rows expired on May 16. The 60-min TTL is far too short.

### Known design problems
- **`feedback.py` is a stub** — all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
- No `price_history` table.
- No `search_runs` table with finnkodes per search.
- **`listing_description` is actively stripped** in `_slim_listing()` in `mcp_server.py`.
- **`detail_limit`** means only N listings get full Eiendom.no analysis — the rest are unscored.
- **No batch analysis** — analyzing 46 listings requires 46 sequential MCP calls.
- **12 tools**, 7 of which are internal plumbing.
- **Cache TTLs are far too short** — 24h on listing data forces full re-fetch on day-2 repeat searches.

---

## Goals

1. **Fix the broken cache first** — current cache promises nothing and delivers nothing
2. **Long-lived caching** with smart freshness checks — listing structural data doesn't change, treat it accordingly
3. **6 tools** — one per user intent
4. **Batch analysis** — analyze many listings in one call
5. **Persistent enrichment** — missing tables, feedback implementation
6. **Output matches intent** — each tool returns only what is relevant
7. **`listing_description` available** for AI interpretation in `finn_analyze_ad`

---

## Architecture

### Caching strategy (revised)

Listings on FINN rarely change structurally once posted: address, area, construction year, property type, description, and the `eiendom_unit_code` mapping are all stable. What does change is price, sale status, and days on market (DOM). Treat structural data as effectively immutable, and check price and status separately and cheaply.

**Two-tier model:**

```
┌────────────────────────────────────────────────────────────────┐
│  STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
│  - finn_ads.payload (description, area, year, etc.)            │
│  - eiendom_units.payload (lat, lng, property_type, etc.)       │
│  - similar_units.payload (completed sales — immutable)         │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│  VOLATILE DATA (short TTL, cheap refresh)                       │
│  - price, status, days_on_market                                │
│  - eiendom_units.estimated_selling_price                        │
└────────────────────────────────────────────────────────────────┘
```

### Cache TTLs (revised)

| Data | TTL | Refresh strategy |
|------|-----|-----------------|
| FINN ad structural | **30 days** | Full refetch only |
| FINN ad price/status | **6 hours** | Lightweight check; falls back to full refetch if price or status changed |
| Eiendom.no unit structural | **30 days** | Full refetch only |
| Eiendom.no estimate | **7 days** | Refresh on access |
| Similar units (sold comps) | **60 days** | Immutable rows; new rows appear over time |
| Search pages | **6 hours** | Content-hash check, only re-scrape if list actually changed |
| Analysis result | **Never expires** | Invalidated by `deps_hash` change |

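**Lightweight price/status check:** A FINN ad page has a stable URL. Either issue a HEAD request and compare response validators such as `ETag` or `Last-Modified` (assuming FINN serves them; a HEAD response itself carries no price), or scrape only the small `price_widget` block. Both are much cheaper than fetching the full ad page. If price and status are unchanged, bump `last_verified_at`; if either changed, do a full refetch.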
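
The decision at the end of that check can be sketched as a pure function. This is a minimal illustration, not the project's code: `VolatileSnapshot` and `plan_refresh` are hypothetical names, and how the observed values are obtained (HEAD or `price_widget` scrape) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolatileSnapshot:
    """Volatile fields of one ad (hypothetical shape, not the FinnAd model)."""
    total_price: Optional[int]
    sale_status: Optional[str]


def plan_refresh(cached: VolatileSnapshot, observed: VolatileSnapshot) -> str:
    """Decide what to do after the cheap check.

    Returns "bump_verified" when nothing volatile changed (only
    last_verified_at needs updating), otherwise "full_refetch".
    """
    if cached == observed:
        return "bump_verified"
    return "full_refetch"
```

Keeping the comparison separate from the fetch makes it easy to unit-test the cache behaviour without touching the network.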

### Database schema changes

```sql
-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at
-- which tracks when we last did a full refetch.

-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
    finnkode    TEXT PRIMARY KEY,
    verdict     TEXT NOT NULL,  -- 'liked' | 'disliked' | 'maybe' | 'visited'
    notes       TEXT,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);

-- New: price history (append-only)
CREATE TABLE price_history (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode     TEXT NOT NULL,
    total_price  INTEGER,
    asking_price INTEGER,
    sale_status  TEXT,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);

-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    search_url  TEXT NOT NULL,
    finnkodes   TEXT NOT NULL,  -- JSON array
    created_at  TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);

-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
```
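
Because `ALTER TABLE ... ADD COLUMN` fails if the column already exists, the migration may need to be idempotent. The following is a sketch under that assumption, using only the standard `sqlite3` module; `column_exists` and `migrate` are hypothetical helpers, and only part of the schema above is shown for brevity.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def migrate(conn: sqlite3.Connection) -> None:
    """Apply additive schema changes; safe to run more than once."""
    if not column_exists(conn, "finn_ads", "last_verified_at"):
        conn.execute("ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS price_history (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            finnkode     TEXT NOT NULL,
            total_price  INTEGER,
            asking_price INTEGER,
            sale_status  TEXT,
            recorded_at  TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_price_history_finnkode_recorded
            ON price_history(finnkode, recorded_at);
        CREATE INDEX IF NOT EXISTS idx_finn_ads_verified
            ON finn_ads(last_verified_at);
        """
    )
    conn.commit()
```

Running `migrate` twice against the same database should leave the schema unchanged the second time.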

---

## Tools (v2) — 6 total

### 1. `finn_analyze_search`

**Intent:** Ranked list of all listings in this search.

```typescript
Input:
  search_url: string
  refresh?: boolean      // force re-fetch even if cache is valid
  max_pages?: number     // default 5

Output:
  total: number
  cache_status: {
    listings_from_cache: number
    listings_refreshed: number
    listings_freshly_scraped: number
  }
  listings: Array<{
    finnkode, rank, score, url, address, district,
    area_m2, bedrooms, floor, construction_year,
    total_price, common_costs, shared_debt, sqm_price,
    price_vs_estimate,  // negative = below estimate
    market_placement, dom, categories, risks
  }>
```

**Behaviour:** Returns ALL scraped listings, not limited by `detail_limit`. Listings without enrichment get `score: null`. Lazy enrichment is triggered by `finn_analyze_ad`.

### 2. `finn_analyze_ad`

**Intent:** Deep-dive into one or more specific listings.

```typescript
Input:
  finnkode: string | string[]   // single or batch
  refresh?: boolean             // bypass cache

Output:
  // Single string input → single object
  // Array input → array of objects in same order
  finnkode: string
  url: string
  address: string
  listing_description: string   // ← INCLUDED for AI interpretation
  score: {
    total: number
    breakdown: Record<string, number>
    nearby_transit: { tbane: [...], trikk: [...] }
  }
  price: {
    total, asking, shared_debt, common_costs, sqm_price,
    estimate, estimate_lower, estimate_upper,
    vs_estimate, market_placement
  }
  property: {
    type, ownership, area_m2, bedrooms, floor,
    construction_year, has_balcony, has_elevator, has_garage
  }
  market: {
    dom, sale_status, avg_comp_sqm_price, comp_count,
    comps: Array<{address, usable_area, floor, construction_year,
                  selling_price, sqm_price, days_on_market, finalized_at}>  // top 15
  }
  price_history: Array<{ total_price, asking_price, recorded_at }>
  categories: string[]
  risks: string[]
  cache_age: {
    structural_days: number    // age of last full refetch
    price_hours: number        // age of last price verification
  }
```

**Batch behaviour:** Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have `{finnkode, error: "..."}` shape.
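
A minimal sketch of that batch behaviour, assuming a per-ad coroutine is available. `analyze_batch`, the `analyze_one` callback, and the `concurrency` default are illustrative assumptions; only the 50-item limit, input ordering, and error shape come from the spec above.

```python
import asyncio
from typing import Any, Awaitable, Callable

MAX_BATCH = 50  # limit stated in the spec above


async def analyze_batch(
    finnkodes: list[str],
    analyze_one: Callable[[str], Awaitable[dict[str, Any]]],
    concurrency: int = 8,  # assumed value, not from the spec
) -> list[dict[str, Any]]:
    if len(finnkodes) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} finnkodes per call")
    sem = asyncio.Semaphore(concurrency)

    async def guarded(code: str) -> dict[str, Any]:
        async with sem:  # cap the number of in-flight lookups
            try:
                return await analyze_one(code)
            except Exception as exc:
                # Failed lookups keep their slot, in the documented error shape.
                return {"finnkode": code, "error": str(exc)}

    # asyncio.gather returns results in the order of its arguments.
    return await asyncio.gather(*(guarded(c) for c in finnkodes))
```

Catching the exception inside `guarded` means one failing listing does not cancel the rest of the batch.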

### 3. `finn_analyze_unit_images`

**Intent:** Visual assessment — condition, views, room feel.

Unchanged from current implementation. Returns `ImageContent` blocks, not URLs.

```typescript
Input:
  unit_code: string
  max_images?: number   // default 8
```

### 4. `finn_get_new_ads_since_last_run`

**Intent:** What has changed since I last checked this search?

```typescript
Input:
  search_url: string

Output:
  new_ads: Array<{finnkode, address, score, total_price, categories, url}>
  removed_ads: Array<{finnkode, address}>
  changed_ads: Array<{
    finnkode, address,
    changes: Array<{field, from, to}>   // typically price/status
  }>
  since: string  // ISO timestamp of previous run
```
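
The diff itself could be computed roughly as follows. This sketch assumes each run can be resolved to a mapping of finnkode to a small snapshot of the listing (for example from `search_runs.finnkodes` joined with stored ad data); `diff_runs` and `TRACKED_FIELDS` are hypothetical names.

```python
from typing import Any

# Assumed set of compared fields ("typically price/status").
TRACKED_FIELDS = ("total_price", "sale_status")


def diff_runs(
    previous: dict[str, dict[str, Any]],
    current: dict[str, dict[str, Any]],
) -> dict[str, list]:
    """Compare two runs, each given as {finnkode: listing snapshot}."""
    new_ads = [{"finnkode": k, **v} for k, v in current.items() if k not in previous]
    removed_ads = [{"finnkode": k, **v} for k, v in previous.items() if k not in current]
    changed_ads = []
    for code, now in current.items():
        before = previous.get(code)
        if before is None:
            continue  # already reported as new
        changes = [
            {"field": f, "from": before.get(f), "to": now.get(f)}
            for f in TRACKED_FIELDS
            if before.get(f) != now.get(f)
        ]
        if changes:
            changed_ads.append(
                {"finnkode": code, "address": now.get("address"), "changes": changes}
            )
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```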

### 5. `finn_save_feedback`

**Intent:** Save my verdict on a listing.

```typescript
Input:
  finnkode: string
  verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
  notes?: string

Output:
  ok: boolean
  finnkode: string
  verdict: string
```
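
Since `user_feedback` uses `finnkode` as its primary key, saving a verdict is naturally an upsert. A possible shape for the replacement of the `feedback.py` stub, assuming SQLite 3.24+ (for `ON CONFLICT ... DO UPDATE`) and a hypothetical `save_feedback` function:

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

VERDICTS = {"liked", "disliked", "maybe", "visited"}


def save_feedback(
    conn: sqlite3.Connection,
    finnkode: str,
    verdict: str,
    notes: Optional[str] = None,
) -> dict:
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO user_feedback (finnkode, verdict, notes, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(finnkode) DO UPDATE SET
            verdict    = excluded.verdict,
            notes      = excluded.notes,
            updated_at = excluded.updated_at
        """,
        (finnkode, verdict, notes, now, now),
    )
    conn.commit()
    return {"ok": True, "finnkode": finnkode, "verdict": verdict}
```

On a repeat call for the same listing, `created_at` is preserved and only `verdict`, `notes`, and `updated_at` change.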

### 6. `finn_get_shortlist`

**Intent:** Show me reviewed listings, or find similar to one I liked.

```typescript
Input:
  verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
  find_similar_to?: string  // finnkode — return listings similar to this
  min_score?: number
  limit?: number            // default 10

Output:
  listings: Array<{
    finnkode, address, score, total_price,
    verdict?, notes?, categories, url
  }>
```

---

## Tools removed

| Tool | Reason |
|------|--------|
| `finn_build_unit_vector` | Internal impl detail |
| `finn_decode_unit_vector` | Debug utility, no user value |
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
| `finn_get_ad` | Raw fetch without scoring — `analyze_ad` covers it |
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
| `finn_get_similar_units` | Takes unit_vector directly, internal |
| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |

12 → 6 tools. No user intent is lost. Batch use case now native via `analyze_ad`.

---

## Workflows & optimizations

### Lazy enrichment on demand
`analyze_search` returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have `score: null`. First `analyze_ad(finnkode)` call enriches and caches. Next `analyze_search` shows the now-cached score. Eliminates `detail_limit` as a user-facing parameter.

### Background freshness check
On `analyze_search` cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.

### Re-score without refetch
Scoring weights are configurable. If the user changes weights, re-score from cached `finn_ads` + `eiendom_units` + `similar_units` without any network calls. Invalidates `analysis_cache` only, not raw data.
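
One way to get this behaviour is to fold the scoring weights into the dependency hash, so a weight change invalidates `analysis_cache` without touching raw rows. The sketch below is an assumption about how that could work, not the existing `_compute_deps_hash`; `deps_hash_sketch` is a hypothetical name. It also refuses to hash missing inputs, which would have surfaced the NULL `content_hash` problem instead of silently hashing empty strings.

```python
import hashlib
import json
from typing import Mapping, Optional


def deps_hash_sketch(
    content_hashes: Mapping[str, Optional[str]],
    weights: Mapping[str, float],
) -> str:
    """Hash the raw-data hashes together with the scoring weights."""
    missing = [name for name, h in content_hashes.items() if not h]
    if missing:
        raise ValueError(f"missing content_hash for: {missing}")
    blob = json.dumps(
        {"inputs": dict(content_hashes), "weights": dict(weights)},
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

A cached analysis is then valid only while its stored hash equals the hash recomputed from current raw data and current weights.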

### Price drop detection
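`price_history` table enables a price-drop filter such as `finn_get_shortlist(price_dropped_since: timestamp)`, which would surface listings whose price dropped recently. This parameter is not yet part of the `finn_get_shortlist` input spec above and would need to be added. It builds on the append-only `price_history` writes introduced in Phase 2.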
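
A rough sketch of the lookup behind such a filter, assuming `recorded_at` holds ISO 8601 timestamps in a single consistent format and timezone so that string comparison matches chronological order. `price_drops_since` is a hypothetical helper.

```python
import sqlite3


def price_drops_since(conn: sqlite3.Connection, since_iso: str) -> list[tuple[str, int, int]]:
    """Return (finnkode, baseline_price, latest_price) for ads that got cheaper.

    The baseline is the last price recorded before `since_iso`, or the first
    recorded price if the ad has no earlier row.
    """
    rows = conn.execute(
        "SELECT finnkode, total_price, recorded_at FROM price_history "
        "WHERE total_price IS NOT NULL ORDER BY finnkode, recorded_at"
    ).fetchall()
    baseline: dict[str, int] = {}
    latest: dict[str, int] = {}
    for code, price, recorded_at in rows:
        if recorded_at < since_iso or code not in baseline:
            baseline[code] = price
        latest[code] = price  # rows are time-ordered, so the last one wins
    return [(c, baseline[c], latest[c]) for c in latest if latest[c] < baseline[c]]
```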

### Cache warming on save_feedback
When `verdict='liked'`, pre-fetch similar units in background. Next `find_similar_to=finnkode` call is instant.

### Batch enrichment via parallel Eiendom.no
Enrichment through `analyze_ad` currently handles one ad per call, sequentially. Parallel batching of up to N ads at a time via `asyncio.gather` already exists in `analyze_search`; use the same pattern in `analyze_ad(finnkode: string[])`.

### Cache inspection
Internal-only — useful for debugging. Add a `--cache-status` CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
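
A possible starting point for that command, covering only row counts and NULL-hash rows. The `--db` flag, its default path, and the function names are assumptions for illustration; the tables are assumed to have a `content_hash` column as described above.

```python
import argparse
import json
import sqlite3
from typing import Optional

HASHED_TABLES = ("finn_ads", "eiendom_units", "similar_units")


def cache_status(conn: sqlite3.Connection) -> dict[str, dict[str, int]]:
    """Row count and number of NULL content_hash rows per table."""
    report = {}
    for table in HASHED_TABLES:
        total, null_hashes = conn.execute(
            f"SELECT COUNT(*), SUM(content_hash IS NULL) FROM {table}"
        ).fetchone()
        # SUM over an empty table is NULL, so fall back to 0.
        report[table] = {"rows": total, "null_hash_rows": null_hashes or 0}
    return report


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="finn-mcp cache inspection (sketch)")
    parser.add_argument("--cache-status", action="store_true")
    parser.add_argument("--db", default="data/finn.sqlite")
    args = parser.parse_args(argv)
    if args.cache_status:
        conn = sqlite3.connect(args.db)
        try:
            print(json.dumps(cache_status(conn), indent=2))
        finally:
            conn.close()
```

Calling `main(["--cache-status"])` from an entry point would print the report as JSON; oldest/newest `fetched_at` and missing `eiendom_unit_code` counts could be added in the same way.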

---

## Output principles

**Never in any tool response:**
- `unit_vector` / raw Eiendom.no vector
- `unit_images` URL lists (use `finn_analyze_unit_images`)
- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
- `lat` / `lng` coordinates

**`listing_description`:**
- **Not** in `finn_analyze_search` — too long, 77 × 500 words = noise
- **Yes** in `finn_analyze_ad` — AI needs it to interpret risk flags, clauses, edge cases
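
These rules could be enforced in one place with a recursive filter. The sketch below is illustrative only: `slim_response` is a hypothetical helper, not the existing `_slim_listing()`, and the key names are taken from the lists above.

```python
from typing import Any

# Keys that must never appear in any tool response (from the list above).
FORBIDDEN_KEYS = frozenset({
    "unit_vector", "unit_images",
    "fetched_at", "detail_fetched_at", "computed_at",
    "lat", "lng",
})


def slim_response(payload: Any, *, include_description: bool) -> Any:
    """Return a copy of `payload` without forbidden keys, at any nesting depth."""
    drop = set(FORBIDDEN_KEYS)
    if not include_description:
        drop.add("listing_description")

    def walk(value: Any) -> Any:
        if isinstance(value, dict):
            return {k: walk(v) for k, v in value.items() if k not in drop}
        if isinstance(value, list):
            return [walk(v) for v in value]
        return value

    return walk(payload)
```

Under this sketch, `finn_analyze_search` would call it with `include_description=False` and `finn_analyze_ad` with `include_description=True`.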

---

## Migration plan

### Phase 0 — Fix the broken cache (BLOCKER)

Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.

- [ ] **Audit the running deployment.** Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
- [ ] **Backfill content_hash for existing rows.** Compute from stored payloads.
- [ ] **Fix `ensure_eiendom_unit_code` persistence.** Only 36/222 ads have `eiendom_unit_code` in their payload — verify the mutation reaches `save_finn_ad` before serialisation.
- [ ] **Verify `save_analysis` actually fires.** Add unit test confirming analysis_cache row count increases after `analyze_ad` call. Currently 0 rows after 222 ad fetches.
- [ ] **Add CLI cache-status command** for ongoing visibility.
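
The backfill step might look like the sketch below. It assumes each table stores its payload as JSON text in a `payload` column and is an ordinary rowid table. The hashing scheme (SHA-256 over canonical JSON) is an assumption: the real backfill must reuse whatever `cache.py` computes on write, otherwise every row will look changed on its next save.

```python
import hashlib
import json
import sqlite3

TABLES = ("finn_ads", "eiendom_units", "similar_units")


def payload_hash(payload_text: str) -> str:
    """SHA-256 over key-sorted, compact JSON (assumed scheme)."""
    canonical = json.dumps(
        json.loads(payload_text), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backfill_content_hash(conn: sqlite3.Connection) -> int:
    """Fill content_hash where it is NULL; returns the number of rows updated."""
    updated = 0
    for table in TABLES:
        rows = conn.execute(
            f"SELECT rowid, payload FROM {table} WHERE content_hash IS NULL"
        ).fetchall()
        for rowid, payload in rows:
            conn.execute(
                f"UPDATE {table} SET content_hash = ? WHERE rowid = ?",
                (payload_hash(payload), rowid),
            )
            updated += 1
    conn.commit()
    return updated
```

A second run should report zero updated rows, which doubles as a check that writes are now populating the column.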

**Success criteria:**
- `analysis_cache` populated after any `analyze_search` run
- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`

### Phase 1 — Longer cache TTLs + freshness model

- [ ] Update `config.py` TTLs (see table above)
- [ ] Add `last_verified_at` column to `finn_ads`
- [ ] Implement lightweight price/status check (HEAD or `price_widget` scrape)
- [ ] On cache hit, kick off async refresh if `last_verified_at` is stale
- [ ] Update `_is_fresh` logic to use TTL only on `last_verified_at`, not `fetched_at`
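
One possible reading of the resulting freshness rules, assuming the 30-day structural TTL is measured from `fetched_at` and the 6-hour volatile TTL from `last_verified_at` (falling back to `fetched_at` when the ad was never re-verified). `classify_freshness` and its return values are hypothetical; the real `_is_fresh` may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STRUCTURAL_TTL = timedelta(days=30)  # from the TTL table
VOLATILE_TTL = timedelta(hours=6)    # from the TTL table


def classify_freshness(
    fetched_at: datetime,
    last_verified_at: Optional[datetime],
    now: datetime,
) -> str:
    """Return "serve", "serve_and_verify", or "full_refetch"."""
    if now - fetched_at > STRUCTURAL_TTL:
        return "full_refetch"
    verified = last_verified_at or fetched_at
    if now - verified > VOLATILE_TTL:
        # Answer from cache now; check price/status in the background.
        return "serve_and_verify"
    return "serve"
```

For example, an ad fetched 28 days ago and never re-verified yields `"serve_and_verify"`, while one fetched an hour ago yields `"serve"`.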

**Success criteria:**
- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
- Same listing fetched today: returns from cache, no network call
- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis

### Phase 2 — Missing tables and stub implementations

- [ ] Create `user_feedback`, `price_history`, `search_runs` tables
- [ ] Implement `feedback.py` — replace all TODO stubs with DB writes
- [ ] Populate `price_history` on every `save_finn_ad` call (append-only)
- [ ] Populate `search_runs` on every `analyze_search` call

**Success criteria:**
- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
- `finn_get_new_ads_since_last_run` returns real diff from last run
- `price_history` populated when a re-fetched ad has changed price

### Phase 3 — Output payload cleanup (no breaking tool changes)

- [ ] Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
- [ ] Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
- [ ] Add `price_history` and `cache_age` to `analyze_ad` response
- [ ] Add `price_vs_estimate` and `cache_status` to `analyze_search` response

**Success criteria:**
- `finn_analyze_search` on 30 listings: < 50KB
- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including

### Phase 4 — Consolidate to 6 tools + batch (breaking change)

- [ ] Remove the 9 redundant tools from `mcp_server.py`
- [ ] Update `finn_analyze_ad` to accept `string | string[]` — single or batch
- [ ] Add `find_similar_to` parameter to `finn_get_shortlist`
- [ ] Always include comps in `analyze_ad` — drop `include_eiendom_no` / `include_similar_units` flags
- [ ] Migrate all `test_mcp_integration.py` tests to new tool surface

**Success criteria:**
- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
- All existing use cases covered by 6 tools

### Phase 5 — Lazy enrichment + workflow additions

- [ ] `analyze_search` returns all scraped listings, not just `detail_limit` count
- [ ] Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
- [ ] Background warm-up on `save_feedback(liked)` → pre-fetch similar units
- [ ] Re-score endpoint (or flag) that rebuilds scores from cached raw data

**Success criteria:**
- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no

---

## Success metrics

| Metric | Now | Target |
|--------|-----|--------|
| Number of tools | 12 | 6 |
| `content_hash` populated rows | 0% | 100% |
| `analysis_cache` row count after search | 0 | matches analyzed_listings |
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
| Feedback actually persisted | No (stub) | Yes |
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
| Repeat search within TTL window | Full recompute | 0 network calls, < 1s |
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |