# PRD: finn-mcp v2
## Current State (from codebase + DB inspection)
### What already works
- **SQLite database** (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
- **Hash-aware caching architecture** is designed (see `cache.py` docstring)
- **Transport scoring** is implemented (`score_transport` uses lat/lng from Eiendom.no)
- **`listing_description`** is stored in the `FinnAd` model
- **`finn_analyze_unit_images`** downloads, resizes to 1024px, returns as `ImageContent` — Claude sees images directly
### Critical bugs discovered
- **Analysis cache is dead.** `analysis_cache` table has **0 rows**. Every search recomputes scoring from scratch.
- **`content_hash` is NULL on every row** in `finn_ads`, `eiendom_units` and `similar_units`: 100% NULL across 427 rows. `_compute_deps_hash` therefore hashes empty strings and returns the same value on every call, so it cannot detect when any input changes.
- The schema dump shows `, content_hash TEXT)` appended to the table definitions, which means the column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
- **Only 36 of 222 ads** have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
- **Search page cache** (`cache_meta`): all rows expired on May 16; the 60-minute TTL is far too short.
### Known design problems
- **`feedback.py` is a stub** — all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
- No `price_history` table.
- No `search_runs` table with finnkodes per search.
- **`listing_description` is actively stripped** from tool output by `_slim_listing()` in `mcp_server.py`.
- **`detail_limit`** means only N listings get full Eiendom.no analysis — the rest are unscored.
- **No batch analysis** — analyzing 46 listings requires 46 sequential MCP calls.
- **12 tools**, 7 of which are internal plumbing.
- **Cache TTLs are far too short** — 24h on listing data forces full re-fetch on day-2 repeat searches.
---
## Goals
1. **Fix the broken cache first** — current cache promises nothing and delivers nothing
2. **Long-lived caching** with smart freshness checks — listing structural data doesn't change, treat it accordingly
3. **6 tools** — one per user intent
4. **Batch analysis** — analyze many listings in one call
5. **Persistent enrichment** — missing tables, feedback implementation
6. **Output matches intent** — each tool returns only what is relevant
7. **`listing_description` available** for AI interpretation in `finn_analyze_ad`
---
## Architecture
### Caching strategy (revised)
Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description and the eiendom_unit_code mapping are all stable. What changes: price, sale status and DOM (days on market). Treat structural data as effectively immutable, and check price/status separately and cheaply.
**Two-tier model:**
```
┌──────────────────────────────────────────────────────────────────┐
│ STRUCTURAL DATA (long TTL, full refetch only when invalidated)   │
│ - finn_ads.payload (description, area, year, etc.)               │
│ - eiendom_units.payload (lat, lng, property_type, etc.)          │
│ - similar_units.payload (completed sales, immutable)             │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ VOLATILE DATA (short TTL, cheap refresh)                         │
│ - price, status, days_on_market                                  │
│ - eiendom_units.estimated_selling_price                          │
└──────────────────────────────────────────────────────────────────┘
```
### Cache TTLs (revised)
| Data | TTL | Refresh strategy |
|------|-----|-----------------|
| FINN ad structural | **30 days** | Full refetch only |
| FINN ad price/status | **6 hours** | Lightweight check, falls back to full refetch if status changed |
| Eiendom.no unit structural | **30 days** | Full refetch only |
| Eiendom.no estimate | **7 days** | Refresh on access |
| Similar units (sold comps) | **60 days** | Immutable rows; new rows appear over time |
| Search pages | **6 hours** | Content-hash check, only re-scrape if list actually changed |
| Analysis result | **Never expires** | Invalidated by `deps_hash` change |
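**Lightweight price/status check:** A FINN ad page has a stable URL, so freshness can be checked without a full refetch. Either issue a HEAD request (useful only if the response carries a validator such as `ETag` or `Last-Modified`) or scrape only the small `price_widget` block instead of parsing the full ad page. If price and status are unchanged, bump `last_verified_at`; if either changed, do a full refetch.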
### Database schema changes
```sql
-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at,
-- which tracks when we last did a full refetch.

-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
    finnkode   TEXT PRIMARY KEY,
    verdict    TEXT NOT NULL,  -- 'liked' | 'disliked' | 'maybe' | 'visited'
    notes      TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);

-- New: price history (append-only)
CREATE TABLE price_history (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode     TEXT NOT NULL,
    total_price  INTEGER,
    asking_price INTEGER,
    sale_status  TEXT,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);

-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    search_url TEXT NOT NULL,
    finnkodes  TEXT NOT NULL,  -- JSON array
    created_at TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);

-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
```
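SQLite has no `ADD COLUMN IF NOT EXISTS`. Running the `ALTER TABLE` again on an already-migrated database therefore fails with a duplicate-column error. Here is a sketch of an idempotent runner using the standard-library `sqlite3` module. The `migrate_v2` and `add_column_if_missing` names are hypothetical. The runner checks `PRAGMA table_info` before altering the table and uses `IF NOT EXISTS` for the new tables and indexes.

```python
import sqlite3

# Same tables and indexes as the schema above, made safe to re-run.
V2_SCHEMA = """
CREATE TABLE IF NOT EXISTS user_feedback (
    finnkode   TEXT PRIMARY KEY,
    verdict    TEXT NOT NULL,
    notes      TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS price_history (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode     TEXT NOT NULL,
    total_price  INTEGER,
    asking_price INTEGER,
    sale_status  TEXT,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_price_history_finnkode_recorded
    ON price_history(finnkode, recorded_at);
CREATE TABLE IF NOT EXISTS search_runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    search_url TEXT NOT NULL,
    finnkodes  TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_search_runs_url_created
    ON search_runs(search_url, created_at);
CREATE INDEX IF NOT EXISTS idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX IF NOT EXISTS idx_eiendom_units_fetched ON eiendom_units(fetched_at);
"""


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, decl: str) -> None:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # Table/column names come from fixed call sites, never from user input.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


def migrate_v2(conn: sqlite3.Connection) -> None:
    # The column must exist before idx_finn_ads_verified is created.
    add_column_if_missing(conn, "finn_ads", "last_verified_at", "TEXT")
    conn.executescript(V2_SCHEMA)  # executescript commits any pending transaction first
    conn.commit()
```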
---
## Tools (v2) — 6 total
### 1. `finn_analyze_search`
**Intent:** Ranked list of all listings in this search.
```typescript
Input:
  search_url: string
  refresh?: boolean     // force re-fetch even if cache is valid
  max_pages?: number    // default 5

Output:
  total: number
  cache_status: {
    listings_from_cache: number
    listings_refreshed: number
    listings_freshly_scraped: number
  }
  listings: Array<{
    finnkode, rank, score, url, address, district,
    area_m2, bedrooms, floor, construction_year,
    total_price, common_costs, shared_debt, sqm_price,
    price_vs_estimate,   // negative = below estimate
    market_placement, dom, categories, risks
  }>
```
**Behaviour:** Returns ALL scraped listings, not limited by `detail_limit`. Listings without enrichment get `score: null`. Lazy enrichment is triggered by `finn_analyze_ad`.
### 2. `finn_analyze_ad`
**Intent:** Deep-dive into one or more specific listings.
```typescript
Input:
  finnkode: string | string[]   // single or batch
  refresh?: boolean             // bypass cache

Output:
  // Single string input → single object
  // Array input → array of objects in same order
  finnkode: string
  url: string
  address: string
  listing_description: string   // ← INCLUDED for AI interpretation
  score: {
    total: number
    breakdown: Record<string, number>
    nearby_transit: { tbane: [...], trikk: [...] }
  }
  price: {
    total, asking, shared_debt, common_costs, sqm_price,
    estimate, estimate_lower, estimate_upper,
    vs_estimate, market_placement
  }
  property: {
    type, ownership, area_m2, bedrooms, floor,
    construction_year, has_balcony, has_elevator, has_garage
  }
  market: {
    dom, sale_status, avg_comp_sqm_price, comp_count,
    comps: Array<{address, usable_area, floor, construction_year,
                  selling_price, sqm_price, days_on_market, finalized_at}>  // top 15
  }
  price_history: Array<{ total_price, asking_price, recorded_at }>
  categories: string[]
  risks: string[]
  cache_age: {
    structural_days: number   // age of last full refetch
    price_hours: number       // age of last price verification
  }
```
**Batch behaviour:** Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have `{finnkode, error: "..."}` shape.
### 3. `finn_analyze_unit_images`
**Intent:** Visual assessment — condition, views, room feel.
Unchanged from current implementation. Returns `ImageContent` blocks, not URLs.
```typescript
Input:
  unit_code: string
  max_images?: number   // default 8
```
### 4. `finn_get_new_ads_since_last_run`
**Intent:** What has changed since I last checked this search?
```typescript
Input:
  search_url: string

Output:
  new_ads: Array<{finnkode, address, score, total_price, categories, url}>
  removed_ads: Array<{finnkode, address}>
  changed_ads: Array<{
    finnkode, address,
    changes: Array<{field, from, to}>   // typically price/status
  }>
  since: string   // ISO timestamp of previous run
```
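The diff can be computed from the previous `search_runs.finnkodes` array and the current scrape. This sketch makes a few assumptions. `last_known` is a hypothetical finnkode → snapshot map, for example built from `finn_ads` plus the latest `price_history` row. Only price and status are tracked. `since` would come from the previous row's `created_at`.

```python
import json

TRACKED_FIELDS = ("total_price", "sale_status")


def diff_since_last_run(prev_finnkodes_json: str, current: list[dict],
                        last_known: dict[str, dict]) -> dict:
    """Compare this run's listings with the previous search_runs row."""
    prev = json.loads(prev_finnkodes_json)  # search_runs.finnkodes is a JSON array
    prev_set = set(prev)
    current_codes = {ad["finnkode"] for ad in current}

    new_ads = [ad for ad in current if ad["finnkode"] not in prev_set]
    removed_ads = [
        {"finnkode": code, "address": last_known.get(code, {}).get("address")}
        for code in prev
        if code not in current_codes
    ]
    changed_ads = []
    for ad in current:
        old = last_known.get(ad["finnkode"])
        if ad["finnkode"] not in prev_set or old is None:
            continue  # new listing, or nothing recorded to compare against
        changes = [
            {"field": f, "from": old.get(f), "to": ad.get(f)}
            for f in TRACKED_FIELDS
            if old.get(f) != ad.get(f)
        ]
        if changes:
            changed_ads.append({"finnkode": ad["finnkode"], "address": ad.get("address"),
                                "changes": changes})
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```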
### 5. `finn_save_feedback`
**Intent:** Save my verdict on a listing.
```typescript
Input:
  finnkode: string
  verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
  notes?: string

Output:
  ok: boolean
  finnkode: string
  verdict: string
```
### 6. `finn_get_shortlist`
**Intent:** Show me reviewed listings, or find similar to one I liked.
```typescript
Input:
  verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
  find_similar_to?: string   // finnkode; return listings similar to this one
  min_score?: number
  limit?: number             // default 10

Output:
  listings: Array<{
    finnkode, address, score, total_price,
    verdict?, notes?, categories, url
  }>
```
---
## Tools removed
| Tool | Reason |
|------|--------|
| `finn_build_unit_vector` | Internal impl detail |
| `finn_decode_unit_vector` | Debug utility, no user value |
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
| `finn_get_ad` | Raw fetch without scoring — `analyze_ad` covers it |
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
| `finn_get_similar_units` | Takes unit_vector directly, internal |
| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |
12 → 6 tools. No user intent is lost. Batch use case now native via `analyze_ad`.
---
## Workflows & optimizations
### Lazy enrichment on demand
`analyze_search` returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have `score: null`. First `analyze_ad(finnkode)` call enriches and caches. Next `analyze_search` shows the now-cached score. Eliminates `detail_limit` as a user-facing parameter.
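The division of labour between the two tools can be illustrated with an in-memory stand-in for `analysis_cache`. `LazyScorer` and `enrich_and_score` are hypothetical. `analyze_search` only reads the cache, while `analyze_ad` fills it on the first request for a listing.

```python
from typing import Callable


class LazyScorer:
    def __init__(self, enrich_and_score: Callable[[str], float]):
        self._enrich_and_score = enrich_and_score  # hypothetical Eiendom.no lookup + scoring
        self._scores: dict[str, float] = {}        # stands in for analysis_cache

    def search_view(self, finnkodes: list[str]) -> list[dict]:
        # analyze_search: every listing is returned; unenriched ones get score None (null).
        return [{"finnkode": c, "score": self._scores.get(c)} for c in finnkodes]

    def analyze_ad(self, finnkode: str) -> float:
        # analyze_ad: enrich on first request, then serve from cache.
        if finnkode not in self._scores:
            self._scores[finnkode] = self._enrich_and_score(finnkode)
        return self._scores[finnkode]
```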
### Background freshness check
On `analyze_search` cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.
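This is a sketch of the fire-and-forget refresh with `asyncio`, assuming hypothetical `is_stale` and `refresh_one` callables. The response is returned before any refresh runs. Each task is held in a set because the event loop keeps only weak references to tasks, and an unreferenced task can disappear mid-execution.

```python
import asyncio
from typing import Awaitable, Callable


async def _refresh_all(codes: list[str], refresh_one: Callable[[str], Awaitable[None]]) -> None:
    # return_exceptions=True collects failures instead of raising them out of the task.
    await asyncio.gather(*(refresh_one(c) for c in codes), return_exceptions=True)


async def analyze_search_cached(listings: list[dict], is_stale: Callable[[dict], bool],
                                refresh_one: Callable[[str], Awaitable[None]],
                                pending: set) -> list[dict]:
    stale = [ad["finnkode"] for ad in listings if is_stale(ad)]
    if stale:
        task = asyncio.create_task(_refresh_all(stale, refresh_one))
        pending.add(task)                        # keep a strong reference
        task.add_done_callback(pending.discard)  # drop it once finished
    return listings  # served straight from cache; the refresh benefits the next call
```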
### Re-score without refetch
Scoring weights are configurable. If the user changes weights, re-score from cached `finn_ads` + `eiendom_units` + `similar_units` without any network calls. Invalidates `analysis_cache` only, not raw data.
### Price drop detection
The `price_history` table enables `finn_get_shortlist(price_dropped_since: timestamp)`, which surfaces listings whose price dropped recently. It is built on the append-only `price_history` writes, so no extra fetching is needed.
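A price-drop query over `price_history` could compare each listing's latest recorded price with its last price at or before `since`. This sketch relies on SQLite's documented bare-column behaviour with a single `MAX()` aggregate. It also assumes `recorded_at` values are ISO-8601 strings in one timezone, so they sort lexicographically. `price_drops` is a hypothetical helper. Listings first seen after `since` have no baseline and are excluded.

```python
import sqlite3

PRICE_DROPS_SQL = """
WITH latest AS (
    -- SQLite bare columns: with a single MAX() aggregate, total_price
    -- comes from the row holding the maximum recorded_at.
    SELECT finnkode, total_price, MAX(recorded_at) AS last_at
    FROM price_history
    GROUP BY finnkode
),
baseline AS (
    SELECT finnkode, total_price, MAX(recorded_at) AS last_at
    FROM price_history
    WHERE recorded_at <= :since
    GROUP BY finnkode
)
SELECT l.finnkode, b.total_price AS old_price, l.total_price AS new_price
FROM latest AS l
JOIN baseline AS b ON b.finnkode = l.finnkode
WHERE l.total_price < b.total_price
ORDER BY b.total_price - l.total_price DESC
"""


def price_drops(conn: sqlite3.Connection, since: str) -> list[tuple[str, int, int]]:
    """Listings whose latest price is below their last price at or before `since`."""
    return conn.execute(PRICE_DROPS_SQL, {"since": since}).fetchall()
```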
### Cache warming on save_feedback
When `verdict='liked'`, pre-fetch similar units in the background so the next `find_similar_to=finnkode` call is served from cache.
### Batch enrichment via parallel Eiendom.no
Per-ad enrichment is currently sequential. `analyze_search` already enriches up to N ads concurrently with `asyncio.gather`; `analyze_ad(finnkode: string[])` should reuse the same pattern.
### Cache inspection
Internal-only — useful for debugging. Add a `--cache-status` CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
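This is a sketch of the report behind the proposed CLI command. It assumes that all three raw tables have `payload`, `fetched_at` and `content_hash` columns, that `finn_ads.payload` is a JSON object, and that the bundled SQLite includes the JSON1 functions. `cache_status` is a hypothetical name. Wiring it to a `--cache-status` flag is left out.

```python
import sqlite3

RAW_TABLES = ("finn_ads", "eiendom_units", "similar_units")


def cache_status(conn: sqlite3.Connection) -> dict:
    report: dict = {}
    for table in RAW_TABLES:  # fixed whitelist, so interpolating the name is safe
        total, null_hash, oldest, newest = conn.execute(
            f"SELECT COUNT(*), SUM(content_hash IS NULL), MIN(fetched_at), MAX(fetched_at) "
            f"FROM {table}"
        ).fetchone()
        report[table] = {
            "rows": total,
            "null_hash": null_hash or 0,  # SUM() over zero rows is NULL
            "oldest_fetched_at": oldest,
            "newest_fetched_at": newest,
        }
    # Ads whose stored payload never got an eiendom_unit_code.
    report["finn_ads"]["missing_eiendom_unit_code"] = conn.execute(
        "SELECT COUNT(*) FROM finn_ads WHERE json_extract(payload, '$.eiendom_unit_code') IS NULL"
    ).fetchone()[0]
    report["analysis_cache"] = {
        "rows": conn.execute("SELECT COUNT(*) FROM analysis_cache").fetchone()[0]
    }
    return report
```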
---
## Output principles
**Never in any tool response:**
- `unit_vector` / raw Eiendom.no vector
- `unit_images` URL lists (use `finn_analyze_unit_images`)
- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
- `lat` / `lng` coordinates
**`listing_description`:**
- **Not** in `finn_analyze_search`: too long (77 listings × ~500 words ≈ 38,500 words of noise)
- **Yes** in `finn_analyze_ad` — AI needs it to interpret risk flags, clauses, edge cases
---
## Migration plan
### Phase 0 — Fix the broken cache (BLOCKER)
Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.
- [ ] **Audit the running deployment.** Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
- [ ] **Backfill content_hash for existing rows.** Compute from stored payloads.
- [ ] **Fix `ensure_eiendom_unit_code` persistence.** Only 36/222 ads have `eiendom_unit_code` in their payload — verify the mutation reaches `save_finn_ad` before serialisation.
- [ ] **Verify `save_analysis` actually fires.** Add a unit test confirming that the `analysis_cache` row count increases after an `analyze_ad` call. There are currently 0 rows after 222 ad fetches.
- [ ] **Add CLI cache-status command** for ongoing visibility.
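This is a backfill sketch for the content_hash item above. It assumes `content_hash` is a SHA-256 of the stored payload text, and `backfill_content_hash` is a hypothetical helper. In practice it must reuse exactly the hash function `cache.py` applies on write, or backfilled rows will never match freshly written ones. The key column for `eiendom_units` and `similar_units` is also an assumption.

```python
import hashlib
import sqlite3


def compute_content_hash(payload: str) -> str:
    # Assumption: must be the same function cache.py uses when saving rows.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def backfill_content_hash(conn: sqlite3.Connection, table: str, key_column: str) -> int:
    """Fill NULL content_hash values from stored payloads; returns rows updated."""
    # table/key_column come from fixed call sites, never from user input.
    rows = conn.execute(
        f"SELECT {key_column}, payload FROM {table} "
        f"WHERE content_hash IS NULL AND payload IS NOT NULL"
    ).fetchall()
    conn.executemany(
        f"UPDATE {table} SET content_hash = ? WHERE {key_column} = ?",
        [(compute_content_hash(payload), key) for key, payload in rows],
    )
    conn.commit()
    return len(rows)  # re-running is a no-op once every row has a hash
```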
**Success criteria:**
- `analysis_cache` populated after any `analyze_search` run
- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`
### Phase 1 — Longer cache TTLs + freshness model
- [ ] Update `config.py` TTLs (see table above)
- [ ] Add `last_verified_at` column to `finn_ads`
- [ ] Implement lightweight price/status check (HEAD or `price_widget` scrape)
- [ ] On cache hit, kick off async refresh if `last_verified_at` is stale
- [ ] Update `_is_fresh` so the price/status TTL is measured from `last_verified_at` rather than `fetched_at` (the structural TTL still uses `fetched_at`)
**Success criteria:**
- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
- Same listing fetched today: returns from cache, no network call
- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis
### Phase 2 — Missing tables and stub implementations
- [ ] Create `user_feedback`, `price_history`, `search_runs` tables
- [ ] Implement `feedback.py` — replace all TODO stubs with DB writes
- [ ] Populate `price_history` on every `save_finn_ad` call (append-only)
- [ ] Populate `search_runs` on every `analyze_search` call
**Success criteria:**
- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
- `finn_get_new_ads_since_last_run` returns real diff from last run
- `price_history` populated when a re-fetched ad has changed price
### Phase 3 — Output payload cleanup (no breaking tool changes)
- [ ] Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
- [ ] Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
- [ ] Add `price_history` and `cache_age` to `analyze_ad` response
- [ ] Add `price_vs_estimate` and `cache_status` to `analyze_search` response
**Success criteria:**
- `finn_analyze_search` on 30 listings: < 50KB
- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including
### Phase 4 — Consolidate to 6 tools + batch (breaking change)
- [ ] Remove the 9 redundant tools from `mcp_server.py`
- [ ] Update `finn_analyze_ad` to accept `string | string[]` — single or batch
- [ ] Add `find_similar_to` parameter to `finn_get_shortlist`
- [ ] Always include comps in `analyze_ad` — drop `include_eiendom_no` / `include_similar_units` flags
- [ ] Migrate all `test_mcp_integration.py` tests to new tool surface
**Success criteria:**
- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
- All existing use cases covered by 6 tools
### Phase 5 — Lazy enrichment + workflow additions
- [ ] `analyze_search` returns all scraped listings, not just `detail_limit` count
- [ ] Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
- [ ] Background warm-up on `save_feedback(liked)` → pre-fetch similar units
- [ ] Re-score endpoint (or flag) that rebuilds scores from cached raw data
**Success criteria:**
- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no
---
## Success metrics
| Metric | Now | Target |
|--------|-----|--------|
| Number of tools | 12 | 6 |
| `content_hash` populated rows | 0% | 100% |
| `analysis_cache` row count after search | 0 | Equals the number of analyzed listings |
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
| Feedback actually persisted | No (stub) | Yes |
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |