55d93894ac
feat(scripts): Add backfill script for content_hash in cache tables feat(scripts): Create recompute script for analysis_cache population test(tests): Implement comprehensive tests for analysis module functions fix(tests): Update CLI tests to assert errors on stderr instead of stdout fix(tests): Adjust MCP integration tests to pass context parameter correctly fix(tests): Modify service tests to return hash on save functions for consistency
416 lines
18 KiB
Markdown
416 lines
18 KiB
Markdown
# PRD: finn-mcp v2
|
||
|
||
## Current State (from codebase + DB inspection)
|
||
|
||
### What already works
|
||
- **SQLite database** (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
|
||
- **Hash-aware caching architecture** is designed (see `cache.py` docstring)
|
||
- **Transport scoring** is implemented (`score_transport` uses lat/lng from Eiendom.no)
|
||
- **`listing_description`** is stored in the `FinnAd` model
|
||
- **`finn_analyze_unit_images`** downloads, resizes to 1024px, returns as `ImageContent` — Claude sees images directly
|
||
|
||
### Critical bugs discovered
|
||
- **Analysis cache is dead.** `analysis_cache` table has **0 rows**. Every search recomputes scoring from scratch.
|
||
- **`content_hash` is NULL on every row** in `finn_ads`, `eiendom_units`, `similar_units` — 100% NULL across 427 rows. The `_compute_deps_hash` function therefore returns a deterministic hash of empty strings on every call.
|
||
- Schema dump shows `, content_hash TEXT)` appended — column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
|
||
- **Only 36 of 222 ads** have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
|
||
- **Search page cache** (`cache_meta`) all rows expired May 16 — 60-min TTL is far too short.
|
||
|
||
### Known design problems
|
||
- **`feedback.py` is a stub** — all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
|
||
- No `price_history` table.
|
||
- No `search_runs` table with finnkodes per search.
|
||
- **`listing_description` is actively stripped** in `_slim_listing()` in `mcp_server.py`.
|
||
- **`detail_limit`** means only N listings get full Eiendom.no analysis — the rest are unscored.
|
||
- **No batch analysis** — analyzing 46 listings requires 46 sequential MCP calls.
|
||
- **12 tools**, 7 of which are internal plumbing.
|
||
- **Cache TTLs are far too short** — 24h on listing data forces full re-fetch on day-2 repeat searches.
|
||
|
||
---
|
||
|
||
## Goals
|
||
|
||
1. **Fix the broken cache first** — current cache promises nothing and delivers nothing
|
||
2. **Long-lived caching** with smart freshness checks — listing structural data doesn't change, treat it accordingly
|
||
3. **6 tools** — one per user intent
|
||
4. **Batch analysis** — analyze many listings in one call
|
||
5. **Persistent enrichment** — missing tables, feedback implementation
|
||
6. **Output matches intent** — each tool returns only what is relevant
|
||
7. **`listing_description` available** for AI interpretation in `finn_analyze_ad`
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
### Caching strategy (revised)
|
||
|
||
Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, eiendom_unit_code mapping — all stable. What changes: price, sale status, DOM. Treat structural data as effectively immutable; check price/status separately and cheaply.
|
||
|
||
**Two-tier model:**
|
||
|
||
```
|
||
┌────────────────────────────────────────────────────────────────┐
|
||
│ STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
|
||
│ - finn_ads.payload (description, area, year, etc.) │
|
||
│ - eiendom_units.payload (lat, lng, property_type, etc.) │
|
||
│ - similar_units.payload (completed sales — immutable) │
|
||
└────────────────────────────────────────────────────────────────┘
|
||
┌────────────────────────────────────────────────────────────────┐
|
||
│ VOLATILE DATA (short TTL, cheap refresh) │
|
||
│ - price, status, days_on_market │
|
||
│ - eiendom_units.estimated_selling_price │
|
||
└────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### Cache TTLs (revised)
|
||
|
||
| Data | TTL | Refresh strategy |
|
||
|------|-----|-----------------|
|
||
| FINN ad structural | **30 days** | Full refetch only |
|
||
| FINN ad price/status | **6 hours** | Lightweight check, falls back to full refetch if status changed |
|
||
| Eiendom.no unit structural | **30 days** | Full refetch only |
|
||
| Eiendom.no estimate | **7 days** | Refresh on access |
|
||
| Similar units (sold comps) | **60 days** | Immutable rows; new rows appear over time |
|
||
| Search pages | **6 hours** | Content-hash check, only re-scrape if list actually changed |
|
||
| Analysis result | **Never expires** | Invalidated by `deps_hash` change |
|
||
|
||
**Lightweight price/status check:** A FINN ad page has a stable URL. Fetch headers only (HEAD) or scrape the small `price_widget` block — much cheaper than the full ad page. If price unchanged, bump `last_verified_at`; if changed, full refetch.
|
||
|
||
### Database schema changes
|
||
|
||
```sql
|
||
-- Add to finn_ads
|
||
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
|
||
-- Tracks when we last confirmed price/status, separate from fetched_at
|
||
-- which tracks when we last did a full refetch.
|
||
|
||
-- New: user feedback (replaces feedback.py stubs)
|
||
CREATE TABLE user_feedback (
|
||
finnkode TEXT PRIMARY KEY,
|
||
verdict TEXT NOT NULL, -- 'liked' | 'disliked' | 'maybe' | 'visited'
|
||
notes TEXT,
|
||
created_at TEXT NOT NULL,
|
||
updated_at TEXT NOT NULL
|
||
);
|
||
|
||
-- New: price history (append-only)
|
||
CREATE TABLE price_history (
|
||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||
finnkode TEXT NOT NULL,
|
||
total_price INTEGER,
|
||
asking_price INTEGER,
|
||
sale_status TEXT,
|
||
recorded_at TEXT NOT NULL
|
||
);
|
||
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);
|
||
|
||
-- New: search runs (for finn_get_new_ads_since_last_run)
|
||
CREATE TABLE search_runs (
|
||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||
search_url TEXT NOT NULL,
|
||
finnkodes TEXT NOT NULL, -- JSON array
|
||
created_at TEXT NOT NULL
|
||
);
|
||
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);
|
||
|
||
-- Indexes for stale-detection scans
|
||
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
|
||
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
|
||
```
|
||
|
||
---
|
||
|
||
## Tools (v2) — 6 total
|
||
|
||
### 1. `finn_analyze_search`
|
||
|
||
**Intent:** Ranked list of all listings in this search.
|
||
|
||
```typescript
|
||
Input:
|
||
search_url: string
|
||
refresh?: boolean // force re-fetch even if cache is valid
|
||
max_pages?: number // default 5
|
||
|
||
Output:
|
||
total: number
|
||
cache_status: {
|
||
listings_from_cache: number
|
||
listings_refreshed: number
|
||
listings_freshly_scraped: number
|
||
}
|
||
listings: Array<{
|
||
finnkode, rank, score, url, address, district,
|
||
area_m2, bedrooms, floor, construction_year,
|
||
total_price, common_costs, shared_debt, sqm_price,
|
||
price_vs_estimate, // negative = below estimate
|
||
market_placement, dom, categories, risks
|
||
}>
|
||
```
|
||
|
||
**Behaviour:** Returns ALL scraped listings, not limited by `detail_limit`. Listings without enrichment get `score: null`. Lazy enrichment is triggered by `finn_analyze_ad`.
|
||
|
||
### 2. `finn_analyze_ad`
|
||
|
||
**Intent:** Deep-dive into one or more specific listings.
|
||
|
||
```typescript
|
||
Input:
|
||
finnkode: string | string[] // single or batch
|
||
refresh?: boolean // bypass cache
|
||
|
||
Output:
|
||
// Single string input → single object
|
||
// Array input → array of objects in same order
|
||
finnkode: string
|
||
url: string
|
||
address: string
|
||
listing_description: string // ← INCLUDED for AI interpretation
|
||
score: {
|
||
total: number
|
||
breakdown: Record<string, number>
|
||
nearby_transit: { tbane: [...], trikk: [...] }
|
||
}
|
||
price: {
|
||
total, asking, shared_debt, common_costs, sqm_price,
|
||
estimate, estimate_lower, estimate_upper,
|
||
vs_estimate, market_placement
|
||
}
|
||
property: {
|
||
type, ownership, area_m2, bedrooms, floor,
|
||
construction_year, has_balcony, has_elevator, has_garage
|
||
}
|
||
market: {
|
||
dom, sale_status, avg_comp_sqm_price, comp_count,
|
||
comps: Array<{address, usable_area, floor, construction_year,
|
||
selling_price, sqm_price, days_on_market, finalized_at}> // top 15
|
||
}
|
||
price_history: Array<{ total_price, asking_price, recorded_at }>
|
||
categories: string[]
|
||
risks: string[]
|
||
cache_age: {
|
||
structural_days: number // age of last full refetch
|
||
price_hours: number // age of last price verification
|
||
}
|
||
```
|
||
|
||
**Batch behaviour:** Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have `{finnkode, error: "..."}` shape.
|
||
|
||
### 3. `finn_analyze_unit_images`
|
||
|
||
**Intent:** Visual assessment — condition, views, room feel.
|
||
|
||
Unchanged from current implementation. Returns `ImageContent` blocks, not URLs.
|
||
|
||
```typescript
|
||
Input:
|
||
unit_code: string
|
||
max_images?: number // default 8
|
||
```
|
||
|
||
### 4. `finn_get_new_ads_since_last_run`
|
||
|
||
**Intent:** What has changed since I last checked this search?
|
||
|
||
```typescript
|
||
Input:
|
||
search_url: string
|
||
|
||
Output:
|
||
new_ads: Array<{finnkode, address, score, total_price, categories, url}>
|
||
removed_ads: Array<{finnkode, address}>
|
||
changed_ads: Array<{
|
||
finnkode, address,
|
||
changes: Array<{field, from, to}> // typically price/status
|
||
}>
|
||
since: string // ISO timestamp of previous run
|
||
```
|
||
|
||
### 5. `finn_save_feedback`
|
||
|
||
**Intent:** Save my verdict on a listing.
|
||
|
||
```typescript
|
||
Input:
|
||
finnkode: string
|
||
verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
|
||
notes?: string
|
||
|
||
Output:
|
||
ok: boolean
|
||
finnkode: string
|
||
verdict: string
|
||
```
|
||
|
||
### 6. `finn_get_shortlist`
|
||
|
||
**Intent:** Show me reviewed listings, or find similar to one I liked.
|
||
|
||
```typescript
|
||
Input:
|
||
verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
|
||
find_similar_to?: string // finnkode — return listings similar to this
|
||
min_score?: number
|
||
limit?: number // default 10
|
||
|
||
Output:
|
||
listings: Array<{
|
||
finnkode, address, score, total_price,
|
||
verdict?, notes?, categories, url
|
||
}>
|
||
```
|
||
|
||
---
|
||
|
||
## Tools removed
|
||
|
||
| Tool | Reason |
|
||
|------|--------|
|
||
| `finn_build_unit_vector` | Internal impl detail |
|
||
| `finn_decode_unit_vector` | Debug utility, no user value |
|
||
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
|
||
| `finn_get_ad` | Raw fetch without scoring — `analyze_ad` covers it |
|
||
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
|
||
| `finn_get_similar_units` | Takes unit_vector directly, internal |
|
||
| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
|
||
| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
|
||
| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |
|
||
|
||
12 → 6 tools. No user intent is lost. Batch use case now native via `analyze_ad`.
|
||
|
||
---
|
||
|
||
## Workflows & optimizations
|
||
|
||
### Lazy enrichment on demand
|
||
`analyze_search` returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have `score: null`. First `analyze_ad(finnkode)` call enriches and caches. Next `analyze_search` shows the now-cached score. Eliminates `detail_limit` as a user-facing parameter.
|
||
|
||
### Background freshness check
|
||
On `analyze_search` cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.
|
||
|
||
### Re-score without refetch
|
||
Scoring weights are configurable. If the user changes weights, re-score from cached `finn_ads` + `eiendom_units` + `similar_units` without any network calls. Invalidates `analysis_cache` only, not raw data.
|
||
|
||
### Price drop detection
|
||
`price_history` table enables `finn_get_shortlist(price_dropped_since: timestamp)` — surface listings that dropped price recently. Built on existing append-only writes.
|
||
|
||
### Cache warming on save_feedback
|
||
When `verdict='liked'`, pre-fetch similar units in background. Next `find_similar_to=finnkode` call is instant.
|
||
|
||
### Batch enrichment via parallel Eiendom.no
|
||
Current enrichment is sequential per ad. Parallel-batch up to N at a time via `asyncio.gather` already exists in `analyze_search` — use the same pattern in `analyze_ad(finnkode: string[])`.
|
||
|
||
### Cache inspection
|
||
Internal-only — useful for debugging. Add a `--cache-status` CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
|
||
|
||
---
|
||
|
||
## Output principles
|
||
|
||
**Never in any tool response:**
|
||
- `unit_vector` / raw Eiendom.no vector
|
||
- `unit_images` URL lists (use `finn_analyze_unit_images`)
|
||
- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
|
||
- `lat` / `lng` coordinates
|
||
|
||
**`listing_description`:**
|
||
- **Not** in `finn_analyze_search` — too long, 77 × 500 words = noise
|
||
- **Yes** in `finn_analyze_ad` — AI needs it to interpret risk flags, clauses, edge cases
|
||
|
||
---
|
||
|
||
## Migration plan
|
||
|
||
### Phase 0 — Fix the broken cache (BLOCKER)
|
||
|
||
Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.
|
||
|
||
- [ ] **Audit the running deployment.** Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
|
||
- [ ] **Backfill content_hash for existing rows.** Compute from stored payloads.
|
||
- [ ] **Fix `ensure_eiendom_unit_code` persistence.** Only 36/222 ads have `eiendom_unit_code` in their payload — verify the mutation reaches `save_finn_ad` before serialisation.
|
||
- [ ] **Verify `save_analysis` actually fires.** Add unit test confirming analysis_cache row count increases after `analyze_ad` call. Currently 0 rows after 222 ad fetches.
|
||
- [ ] **Add CLI cache-status command** for ongoing visibility.
|
||
|
||
**Success criteria:**
|
||
- `analysis_cache` populated after any `analyze_search` run
|
||
- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
|
||
- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`
|
||
|
||
### Phase 1 — Longer cache TTLs + freshness model
|
||
|
||
- [ ] Update `config.py` TTLs (see table above)
|
||
- [ ] Add `last_verified_at` column to `finn_ads`
|
||
- [ ] Implement lightweight price/status check (HEAD or `price_widget` scrape)
|
||
- [ ] On cache hit, kick off async refresh if `last_verified_at` is stale
|
||
- [ ] Update `_is_fresh` logic to use TTL only on `last_verified_at`, not `fetched_at`
|
||
|
||
**Success criteria:**
|
||
- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
|
||
- Same listing fetched today: returns from cache, no network call
|
||
- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis
|
||
|
||
### Phase 2 — Missing tables and stub implementations
|
||
|
||
- [ ] Create `user_feedback`, `price_history`, `search_runs` tables
|
||
- [ ] Implement `feedback.py` — replace all TODO stubs with DB writes
|
||
- [ ] Populate `price_history` on every `save_finn_ad` call (append-only)
|
||
- [ ] Populate `search_runs` on every `analyze_search` call
|
||
|
||
**Success criteria:**
|
||
- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
|
||
- `finn_get_new_ads_since_last_run` returns real diff from last run
|
||
- `price_history` populated when a re-fetched ad has changed price
|
||
|
||
### Phase 3 — Output payload cleanup (no breaking tool changes)
|
||
|
||
- [ ] Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
|
||
- [ ] Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
|
||
- [ ] Add `price_history` and `cache_age` to `analyze_ad` response
|
||
- [ ] Add `price_vs_estimate` and `cache_status` to `analyze_search` response
|
||
|
||
**Success criteria:**
|
||
- `finn_analyze_search` on 30 listings: < 50KB
|
||
- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including
|
||
|
||
### Phase 4 — Consolidate to 6 tools + batch (breaking change)
|
||
|
||
- [ ] Remove the 9 redundant tools from `mcp_server.py`
|
||
- [ ] Update `finn_analyze_ad` to accept `string | string[]` — single or batch
|
||
- [ ] Add `find_similar_to` parameter to `finn_get_shortlist`
|
||
- [ ] Always include comps in `analyze_ad` — drop `include_eiendom_no` / `include_similar_units` flags
|
||
- [ ] Migrate all `test_mcp_integration.py` tests to new tool surface
|
||
|
||
**Success criteria:**
|
||
- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
|
||
- All existing use cases covered by 6 tools
|
||
|
||
### Phase 5 — Lazy enrichment + workflow additions
|
||
|
||
- [ ] `analyze_search` returns all scraped listings, not just `detail_limit` count
|
||
- [ ] Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
|
||
- [ ] Background warm-up on `save_feedback(liked)` → pre-fetch similar units
|
||
- [ ] Re-score endpoint (or flag) that rebuilds scores from cached raw data
|
||
|
||
**Success criteria:**
|
||
- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
|
||
- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
|
||
- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no
|
||
|
||
---
|
||
|
||
## Success metrics
|
||
|
||
| Metric | Now | Target |
|
||
|--------|-----|--------|
|
||
| Number of tools | 12 | 6 |
|
||
| `content_hash` populated rows | 0% | 100% |
|
||
| `analysis_cache` row count after search | 0 | matches analyzed_listings |
|
||
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
|
||
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
|
||
| Feedback actually persisted | No (stub) | Yes |
|
||
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
|
||
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
|
||
| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
|
||
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
|
||
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
|
||
| FINN ad structural TTL | 24h | 30 days | |