feat(refactor): Document refactoring progress and phases in markdown

feat(scripts): Add backfill script for content_hash in cache tables feat(scripts): Create recompute script for analysis_cache population test(tests): Implement comprehensive tests for analysis module functions fix(tests): Update CLI tests to assert errors on stderr instead of stdout fix(tests): Adjust MCP integration tests to pass context parameter correctly fix(tests): Modify service tests to return hash on save functions for consistency
2026-05-29 15:16:57 +00:00
parent 5b772b2ae5
commit 55d93894ac
18 changed files with 1457 additions and 60 deletions
@@ -0,0 +1,416 @@
+# PRD: finn-mcp v2
+
+## Current State (from codebase + DB inspection)
+
+### What already works
+- **SQLite database** (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
+- **Hash-aware caching architecture** is designed (see `cache.py` docstring)
+- **Transport scoring** is implemented (`score_transport` uses lat/lng from Eiendom.no)
+- **`listing_description`** is stored in the `FinnAd` model
+- **`finn_analyze_unit_images`** downloads, resizes to 1024px, returns as `ImageContent` — Claude sees images directly
+
+### Critical bugs discovered
+- **Analysis cache is dead.** `analysis_cache` table has **0 rows**. Every search recomputes scoring from scratch.
+- **`content_hash` is NULL on every row** in `finn_ads`, `eiendom_units`, `similar_units` — 100% NULL across 427 rows. The `_compute_deps_hash` function therefore returns a deterministic hash of empty strings on every call.
+- Schema dump shows `, content_hash TEXT)` appended — column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
+- **Only 36 of 222 ads** have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
+- **Search page cache** (`cache_meta`) all rows expired May 16 — 60-min TTL is far too short.
+
+### Known design problems
+- **`feedback.py` is a stub** — all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
+- No `price_history` table.
+- No `search_runs` table with finnkodes per search.
+- **`listing_description` is actively stripped** in `_slim_listing()` in `mcp_server.py`.
+- **`detail_limit`** means only N listings get full Eiendom.no analysis — the rest are unscored.
+- **No batch analysis** — analyzing 46 listings requires 46 sequential MCP calls.
+- **12 tools**, 7 of which are internal plumbing.
+- **Cache TTLs are far too short** — 24h on listing data forces full re-fetch on day-2 repeat searches.
+
+---
+
+## Goals
+
+1. **Fix the broken cache first** — current cache promises nothing and delivers nothing
+2. **Long-lived caching** with smart freshness checks — listing structural data doesn't change, treat it accordingly
+3. **6 tools** — one per user intent
+4. **Batch analysis** — analyze many listings in one call
+5. **Persistent enrichment** — missing tables, feedback implementation
+6. **Output matches intent** — each tool returns only what is relevant
+7. **`listing_description` available** for AI interpretation in `finn_analyze_ad`
+
+---
+
+## Architecture
+
+### Caching strategy (revised)
+
+Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, eiendom_unit_code mapping — all stable. What changes: price, sale status, DOM. Treat structural data as effectively immutable; check price/status separately and cheaply.
+
+**Two-tier model:**
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│  STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
+│  - finn_ads.payload (description, area, year, etc.)            │
+│  - eiendom_units.payload (lat, lng, property_type, etc.)       │
+│  - similar_units.payload (completed sales — immutable)         │
+└────────────────────────────────────────────────────────────────┘
+┌────────────────────────────────────────────────────────────────┐
+│  VOLATILE DATA (short TTL, cheap refresh)                       │
+│  - price, status, days_on_market                                │
+│  - eiendom_units.estimated_selling_price                        │
+└────────────────────────────────────────────────────────────────┘
+```
+
+### Cache TTLs (revised)
+
+| Data | TTL | Refresh strategy |
+|------|-----|-----------------|
+| FINN ad structural | **30 days** | Full refetch only |
+| FINN ad price/status | **6 hours** | Lightweight check, falls back to full refetch if status changed |
+| Eiendom.no unit structural | **30 days** | Full refetch only |
+| Eiendom.no estimate | **7 days** | Refresh on access |
+| Similar units (sold comps) | **60 days** | Immutable rows; new rows appear over time |
+| Search pages | **6 hours** | Content-hash check, only re-scrape if list actually changed |
+| Analysis result | **Never expires** | Invalidated by `deps_hash` change |
+
+**Lightweight price/status check:** A FINN ad page has a stable URL. Fetch headers only (HEAD) or scrape the small `price_widget` block — much cheaper than the full ad page. If price unchanged, bump `last_verified_at`; if changed, full refetch.
+
+### Database schema changes
+
+```sql
+-- Add to finn_ads
+ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
+-- Tracks when we last confirmed price/status, separate from fetched_at
+-- which tracks when we last did a full refetch.
+
+-- New: user feedback (replaces feedback.py stubs)
+CREATE TABLE user_feedback (
+    finnkode    TEXT PRIMARY KEY,
+    verdict     TEXT NOT NULL,  -- 'liked' | 'disliked' | 'maybe' | 'visited'
+    notes       TEXT,
+    created_at  TEXT NOT NULL,
+    updated_at  TEXT NOT NULL
+);
+
+-- New: price history (append-only)
+CREATE TABLE price_history (
+    id           INTEGER PRIMARY KEY AUTOINCREMENT,
+    finnkode     TEXT NOT NULL,
+    total_price  INTEGER,
+    asking_price INTEGER,
+    sale_status  TEXT,
+    recorded_at  TEXT NOT NULL
+);
+CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);
+
+-- New: search runs (for finn_get_new_ads_since_last_run)
+CREATE TABLE search_runs (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    search_url  TEXT NOT NULL,
+    finnkodes   TEXT NOT NULL,  -- JSON array
+    created_at  TEXT NOT NULL
+);
+CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);
+
+-- Indexes for stale-detection scans
+CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
+CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
+```
+
+---
+
+## Tools (v2) — 6 total
+
+### 1. `finn_analyze_search`
+
+**Intent:** Ranked list of all listings in this search.
+
+```typescript
+Input:
+  search_url: string
+  refresh?: boolean      // force re-fetch even if cache is valid
+  max_pages?: number     // default 5
+
+Output:
+  total: number
+  cache_status: {
+    listings_from_cache: number
+    listings_refreshed: number
+    listings_freshly_scraped: number
+  }
+  listings: Array<{
+    finnkode, rank, score, url, address, district,
+    area_m2, bedrooms, floor, construction_year,
+    total_price, common_costs, shared_debt, sqm_price,
+    price_vs_estimate,  // negative = below estimate
+    market_placement, dom, categories, risks
+  }>
+```
+
+**Behaviour:** Returns ALL scraped listings, not limited by `detail_limit`. Listings without enrichment get `score: null`. Lazy enrichment is triggered by `finn_analyze_ad`.
+
+### 2. `finn_analyze_ad`
+
+**Intent:** Deep-dive into one or more specific listings.
+
+```typescript
+Input:
+  finnkode: string | string[]   // single or batch
+  refresh?: boolean             // bypass cache
+
+Output:
+  // Single string input → single object
+  // Array input → array of objects in same order
+  finnkode: string
+  url: string
+  address: string
+  listing_description: string   // ← INCLUDED for AI interpretation
+  score: {
+    total: number
+    breakdown: Record<string, number>
+    nearby_transit: { tbane: [...], trikk: [...] }
+  }
+  price: {
+    total, asking, shared_debt, common_costs, sqm_price,
+    estimate, estimate_lower, estimate_upper,
+    vs_estimate, market_placement
+  }
+  property: {
+    type, ownership, area_m2, bedrooms, floor,
+    construction_year, has_balcony, has_elevator, has_garage
+  }
+  market: {
+    dom, sale_status, avg_comp_sqm_price, comp_count,
+    comps: Array<{address, usable_area, floor, construction_year,
+                  selling_price, sqm_price, days_on_market, finalized_at}>  // top 15
+  }
+  price_history: Array<{ total_price, asking_price, recorded_at }>
+  categories: string[]
+  risks: string[]
+  cache_age: {
+    structural_days: number    // age of last full refetch
+    price_hours: number        // age of last price verification
+  }
+```
+
+**Batch behaviour:** Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have `{finnkode, error: "..."}` shape.
+
+### 3. `finn_analyze_unit_images`
+
+**Intent:** Visual assessment — condition, views, room feel.
+
+Unchanged from current implementation. Returns `ImageContent` blocks, not URLs.
+
+```typescript
+Input:
+  unit_code: string
+  max_images?: number   // default 8
+```
+
+### 4. `finn_get_new_ads_since_last_run`
+
+**Intent:** What has changed since I last checked this search?
+
+```typescript
+Input:
+  search_url: string
+
+Output:
+  new_ads: Array<{finnkode, address, score, total_price, categories, url}>
+  removed_ads: Array<{finnkode, address}>
+  changed_ads: Array<{
+    finnkode, address,
+    changes: Array<{field, from, to}>   // typically price/status
+  }>
+  since: string  // ISO timestamp of previous run
+```
+
+### 5. `finn_save_feedback`
+
+**Intent:** Save my verdict on a listing.
+
+```typescript
+Input:
+  finnkode: string
+  verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
+  notes?: string
+
+Output:
+  ok: boolean
+  finnkode: string
+  verdict: string
+```
+
+### 6. `finn_get_shortlist`
+
+**Intent:** Show me reviewed listings, or find similar to one I liked.
+
+```typescript
+Input:
+  verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
+  find_similar_to?: string  // finnkode — return listings similar to this
+  min_score?: number
+  limit?: number            // default 10
+
+Output:
+  listings: Array<{
+    finnkode, address, score, total_price,
+    verdict?, notes?, categories, url
+  }>
+```
+
+---
+
+## Tools removed
+
+| Tool | Reason |
+|------|--------|
+| `finn_build_unit_vector` | Internal impl detail |
+| `finn_decode_unit_vector` | Debug utility, no user value |
+| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
+| `finn_get_ad` | Raw fetch without scoring — `analyze_ad` covers it |
+| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
+| `finn_get_similar_units` | Takes unit_vector directly, internal |
+| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
+| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
+| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |
+
+12 → 6 tools. No user intent is lost. Batch use case now native via `analyze_ad`.
+
+---
+
+## Workflows & optimizations
+
+### Lazy enrichment on demand
+`analyze_search` returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have `score: null`. First `analyze_ad(finnkode)` call enriches and caches. Next `analyze_search` shows the now-cached score. Eliminates `detail_limit` as a user-facing parameter.
+
+### Background freshness check
+On `analyze_search` cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.
+
+### Re-score without refetch
+Scoring weights are configurable. If the user changes weights, re-score from cached `finn_ads` + `eiendom_units` + `similar_units` without any network calls. Invalidates `analysis_cache` only, not raw data.
+
+### Price drop detection
+`price_history` table enables `finn_get_shortlist(price_dropped_since: timestamp)` — surface listings that dropped price recently. Built on existing append-only writes.
+
+### Cache warming on save_feedback
+When `verdict='liked'`, pre-fetch similar units in background. Next `find_similar_to=finnkode` call is instant.
+
+### Batch enrichment via parallel Eiendom.no
+Current enrichment is sequential per ad. Parallel-batch up to N at a time via `asyncio.gather` already exists in `analyze_search` — use the same pattern in `analyze_ad(finnkode: string[])`.
+
+### Cache inspection
+Internal-only — useful for debugging. Add a `--cache-status` CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
+
+---
+
+## Output principles
+
+**Never in any tool response:**
+- `unit_vector` / raw Eiendom.no vector
+- `unit_images` URL lists (use `finn_analyze_unit_images`)
+- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
+- `lat` / `lng` coordinates
+
+**`listing_description`:**
+- **Not** in `finn_analyze_search` — too long, 77 × 500 words = noise
+- **Yes** in `finn_analyze_ad` — AI needs it to interpret risk flags, clauses, edge cases
+
+---
+
+## Migration plan
+
+### Phase 0 — Fix the broken cache (BLOCKER)
+
+Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.
+
+- [ ] **Audit the running deployment.** Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
+- [ ] **Backfill content_hash for existing rows.** Compute from stored payloads.
+- [ ] **Fix `ensure_eiendom_unit_code` persistence.** Only 36/222 ads have `eiendom_unit_code` in their payload — verify the mutation reaches `save_finn_ad` before serialisation.
+- [ ] **Verify `save_analysis` actually fires.** Add unit test confirming analysis_cache row count increases after `analyze_ad` call. Currently 0 rows after 222 ad fetches.
+- [ ] **Add CLI cache-status command** for ongoing visibility.
+
+**Success criteria:**
+- `analysis_cache` populated after any `analyze_search` run
+- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
+- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`
+
+### Phase 1 — Longer cache TTLs + freshness model
+
+- [ ] Update `config.py` TTLs (see table above)
+- [ ] Add `last_verified_at` column to `finn_ads`
+- [ ] Implement lightweight price/status check (HEAD or `price_widget` scrape)
+- [ ] On cache hit, kick off async refresh if `last_verified_at` is stale
+- [ ] Update `_is_fresh` logic to use TTL only on `last_verified_at`, not `fetched_at`
+
+**Success criteria:**
+- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
+- Same listing fetched today: returns from cache, no network call
+- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis
+
+### Phase 2 — Missing tables and stub implementations
+
+- [ ] Create `user_feedback`, `price_history`, `search_runs` tables
+- [ ] Implement `feedback.py` — replace all TODO stubs with DB writes
+- [ ] Populate `price_history` on every `save_finn_ad` call (append-only)
+- [ ] Populate `search_runs` on every `analyze_search` call
+
+**Success criteria:**
+- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
+- `finn_get_new_ads_since_last_run` returns real diff from last run
+- `price_history` populated when a re-fetched ad has changed price
+
+### Phase 3 — Output payload cleanup (no breaking tool changes)
+
+- [ ] Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
+- [ ] Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
+- [ ] Add `price_history` and `cache_age` to `analyze_ad` response
+- [ ] Add `price_vs_estimate` and `cache_status` to `analyze_search` response
+
+**Success criteria:**
+- `finn_analyze_search` on 30 listings: < 50KB
+- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including
+
+### Phase 4 — Consolidate to 6 tools + batch (breaking change)
+
+- [ ] Remove the 9 redundant tools from `mcp_server.py`
+- [ ] Update `finn_analyze_ad` to accept `string | string[]` — single or batch
+- [ ] Add `find_similar_to` parameter to `finn_get_shortlist`
+- [ ] Always include comps in `analyze_ad` — drop `include_eiendom_no` / `include_similar_units` flags
+- [ ] Migrate all `test_mcp_integration.py` tests to new tool surface
+
+**Success criteria:**
+- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
+- All existing use cases covered by 6 tools
+
+### Phase 5 — Lazy enrichment + workflow additions
+
+- [ ] `analyze_search` returns all scraped listings, not just `detail_limit` count
+- [ ] Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
+- [ ] Background warm-up on `save_feedback(liked)` → pre-fetch similar units
+- [ ] Re-score endpoint (or flag) that rebuilds scores from cached raw data
+
+**Success criteria:**
+- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
+- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
+- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no
+
+---
+
+## Success metrics
+
+| Metric | Now | Target |
+|--------|-----|--------|
+| Number of tools | 12 | 6 |
+| `content_hash` populated rows | 0% | 100% |
+| `analysis_cache` row count after search | 0 | matches analyzed_listings |
+| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
+| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
+| Feedback actually persisted | No (stub) | Yes |
+| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
+| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
+| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
+| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
+| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
+| FINN ad structural TTL | 24h | 30 days |