feat(refactor): Document refactoring progress and phases in markdown

feat(scripts): Add backfill script for content_hash in cache tables

feat(scripts): Create recompute script for analysis_cache population

test(tests): Implement comprehensive tests for analysis module functions

fix(tests): Update CLI tests to assert errors on stderr instead of stdout

fix(tests): Adjust MCP integration tests to pass context parameter correctly

fix(tests): Modify service tests to return hash on save functions for consistency
Ole
2026-05-29 15:16:57 +00:00
parent 5b772b2ae5
commit 55d93894ac
18 changed files with 1457 additions and 60 deletions
@@ -0,0 +1,416 @@
# PRD: finn-mcp v2
## Current State (from codebase + DB inspection)
### What already works
- **SQLite database** (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
- **Hash-aware caching architecture** is designed (see `cache.py` docstring)
- **Transport scoring** is implemented (`score_transport` uses lat/lng from Eiendom.no)
- **`listing_description`** is stored in the `FinnAd` model
- **`finn_analyze_unit_images`** downloads, resizes to 1024px, returns as `ImageContent` — Claude sees images directly
### Critical bugs discovered
- **Analysis cache is dead.** `analysis_cache` table has **0 rows**. Every search recomputes scoring from scratch.
- **`content_hash` is NULL on every row** in `finn_ads`, `eiendom_units`, `similar_units` — 100% NULL across 427 rows. The `_compute_deps_hash` function therefore returns a deterministic hash of empty strings on every call.
- Schema dump shows `, content_hash TEXT)` appended — column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
- **Only 36 of 222 ads** have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
- **Search page cache** (`cache_meta`): all rows expired on May 16. The 60-min TTL is far too short.
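To illustrate why NULL `content_hash` values disable the analysis cache, here is a minimal sketch. The real `_compute_deps_hash` is not shown in this document, so `compute_deps_hash` below is a hypothetical stand-in that assumes the function hashes the joined content hashes of its inputs and treats NULL as an empty string.

```python
import hashlib
from typing import Optional


def compute_deps_hash(*content_hashes: Optional[str]) -> str:
    """Hypothetical stand-in for _compute_deps_hash (assumed behaviour)."""
    joined = "|".join(h or "" for h in content_hashes)  # NULL collapses to ""
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two different listings whose ad, unit and comps hashes are all NULL:
listing_a = compute_deps_hash(None, None, None)
listing_b = compute_deps_hash(None, None, None)

# With populated hashes, a changed ad payload yields a different deps hash:
before = compute_deps_hash("ad-v1", "unit-v1", "comps-v1")
after = compute_deps_hash("ad-v2", "unit-v1", "comps-v1")

print(listing_a == listing_b)  # the two listings are indistinguishable
print(before != after)         # a payload change is detectable
```

Under this assumption the deps hash carries no information while the inputs are NULL, so it can neither tell listings apart nor signal that a payload changed.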
### Known design problems
- **`feedback.py` is a stub** — all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
- No `price_history` table.
- No `search_runs` table with finnkodes per search.
- **`listing_description` is actively stripped** in `_slim_listing()` in `mcp_server.py`.
- **`detail_limit`** means only N listings get full Eiendom.no analysis — the rest are unscored.
- **No batch analysis** — analyzing 46 listings requires 46 sequential MCP calls.
- **12 tools**, 7 of which are internal plumbing.
- **Cache TTLs are far too short** — 24h on listing data forces full re-fetch on day-2 repeat searches.
---
## Goals
1. **Fix the broken cache first** — current cache promises nothing and delivers nothing
2. **Long-lived caching** with smart freshness checks — listing structural data doesn't change, treat it accordingly
3. **6 tools** — one per user intent
4. **Batch analysis** — analyze many listings in one call
5. **Persistent enrichment** — missing tables, feedback implementation
6. **Output matches intent** — each tool returns only what is relevant
7. **`listing_description` available** for AI interpretation in `finn_analyze_ad`
---
## Architecture
### Caching strategy (revised)
Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, and the `eiendom_unit_code` mapping are all stable. What changes: price, sale status, and days on market (DOM). Treat structural data as effectively immutable; check price/status separately and cheaply.
**Two-tier model:**
```
┌────────────────────────────────────────────────────────────────┐
│ STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
│ - finn_ads.payload (description, area, year, etc.) │
│ - eiendom_units.payload (lat, lng, property_type, etc.) │
│ - similar_units.payload (completed sales — immutable) │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│ VOLATILE DATA (short TTL, cheap refresh) │
│ - price, status, days_on_market │
│ - eiendom_units.estimated_selling_price │
└────────────────────────────────────────────────────────────────┘
```
### Cache TTLs (revised)
| Data | TTL | Refresh strategy |
|------|-----|-----------------|
| FINN ad structural | **30 days** | Full refetch only |
| FINN ad price/status | **6 hours** | Lightweight check, falls back to full refetch if status changed |
| Eiendom.no unit structural | **30 days** | Full refetch only |
| Eiendom.no estimate | **7 days** | Refresh on access |
| Similar units (sold comps) | **60 days** | Immutable rows; new rows appear over time |
| Search pages | **6 hours** | Content-hash check, only re-scrape if list actually changed |
| Analysis result | **Never expires** | Invalidated by `deps_hash` change |
**Lightweight price/status check:** A FINN ad page has a stable URL, so a price check does not need the full ad page. Either issue a HEAD request (useful only if the response headers reveal that the page changed) or scrape the small `price_widget` block. If price and status are unchanged, bump `last_verified_at`; if either changed, do a full refetch.
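The verify step can be sketched as a small function with the network calls injected. Everything here is illustrative: `CachedAd`, `Snapshot`, `verify_ad`, `fetch_snapshot` and `full_refetch` are hypothetical names, and how the snapshot is obtained (HEAD or `price_widget` scrape) is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class CachedAd:
    finnkode: str
    total_price: Optional[int]
    sale_status: Optional[str]
    fetched_at: datetime        # last full refetch
    last_verified_at: datetime  # last cheap price/status confirmation


@dataclass
class Snapshot:
    total_price: Optional[int]
    sale_status: Optional[str]


def verify_ad(
    ad: CachedAd,
    fetch_snapshot: Callable[[str], Snapshot],  # cheap check (assumed)
    full_refetch: Callable[[str], CachedAd],    # expensive path (assumed)
    now: Optional[datetime] = None,
) -> CachedAd:
    """Bump last_verified_at if nothing changed, otherwise refetch fully."""
    now = now or datetime.now(timezone.utc)
    snap = fetch_snapshot(ad.finnkode)
    if snap.total_price == ad.total_price and snap.sale_status == ad.sale_status:
        ad.last_verified_at = now  # fetched_at is deliberately left untouched
        return ad
    return full_refetch(ad.finnkode)
```

Keeping `fetched_at` untouched on the cheap path is what lets `cache_age` report structural age and price age separately.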
### Database schema changes
```sql
-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at
-- which tracks when we last did a full refetch.
-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
finnkode TEXT PRIMARY KEY,
verdict TEXT NOT NULL, -- 'liked' | 'disliked' | 'maybe' | 'visited'
notes TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
-- New: price history (append-only)
CREATE TABLE price_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
finnkode TEXT NOT NULL,
total_price INTEGER,
asking_price INTEGER,
sale_status TEXT,
recorded_at TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);
-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
search_url TEXT NOT NULL,
finnkodes TEXT NOT NULL, -- JSON array
created_at TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);
-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
```
---
## Tools (v2) — 6 total
### 1. `finn_analyze_search`
**Intent:** Ranked list of all listings in this search.
```typescript
Input:
search_url: string
refresh?: boolean // force re-fetch even if cache is valid
max_pages?: number // default 5
Output:
total: number
cache_status: {
listings_from_cache: number
listings_refreshed: number
listings_freshly_scraped: number
}
listings: Array<{
finnkode, rank, score, url, address, district,
area_m2, bedrooms, floor, construction_year,
total_price, common_costs, shared_debt, sqm_price,
price_vs_estimate, // negative = below estimate
market_placement, dom, categories, risks
}>
```
**Behaviour:** Returns ALL scraped listings, not limited by `detail_limit`. Listings without enrichment get `score: null`. Lazy enrichment is triggered by `finn_analyze_ad`.
### 2. `finn_analyze_ad`
**Intent:** Deep-dive into one or more specific listings.
```typescript
Input:
finnkode: string | string[] // single or batch
refresh?: boolean // bypass cache
Output:
// Single string input → single object
// Array input → array of objects in same order
finnkode: string
url: string
address: string
listing_description: string // ← INCLUDED for AI interpretation
score: {
total: number
breakdown: Record<string, number>
nearby_transit: { tbane: [...], trikk: [...] }
}
price: {
total, asking, shared_debt, common_costs, sqm_price,
estimate, estimate_lower, estimate_upper,
vs_estimate, market_placement
}
property: {
type, ownership, area_m2, bedrooms, floor,
construction_year, has_balcony, has_elevator, has_garage
}
market: {
dom, sale_status, avg_comp_sqm_price, comp_count,
comps: Array<{address, usable_area, floor, construction_year,
selling_price, sqm_price, days_on_market, finalized_at}> // top 15
}
price_history: Array<{ total_price, asking_price, recorded_at }>
categories: string[]
risks: string[]
cache_age: {
structural_days: number // age of last full refetch
price_hours: number // age of last price verification
}
```
**Batch behaviour:** Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have `{finnkode, error: "..."}` shape.
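A minimal sketch of the batch contract described above, assuming a per-listing coroutine is available. `analyze_many`, `analyze_one` and the `concurrency` default are hypothetical; only the 50-item cap, input ordering and the error shape come from this spec.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

MAX_BATCH = 50  # cap taken from the spec above


async def analyze_many(
    finnkodes: List[str],
    analyze_one: Callable[[str], Awaitable[Dict[str, Any]]],  # assumed helper
    concurrency: int = 8,  # illustrative default, not from the spec
) -> List[Dict[str, Any]]:
    """Analyze a batch in parallel and return results in input order."""
    if len(finnkodes) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} finnkodes per call")
    sem = asyncio.Semaphore(concurrency)

    async def guarded(code: str) -> Dict[str, Any]:
        async with sem:  # bound the number of concurrent lookups
            try:
                return await analyze_one(code)
            except Exception as exc:
                # A failed lookup becomes an error entry instead of
                # failing the whole batch.
                return {"finnkode": code, "error": str(exc)}

    # asyncio.gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*(guarded(code) for code in finnkodes))
```

Catching the exception inside `guarded` rather than relying on `return_exceptions=True` keeps each failed entry tied to its finnkode.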
### 3. `finn_analyze_unit_images`
**Intent:** Visual assessment — condition, views, room feel.
Unchanged from current implementation. Returns `ImageContent` blocks, not URLs.
```typescript
Input:
unit_code: string
max_images?: number // default 8
```
### 4. `finn_get_new_ads_since_last_run`
**Intent:** What has changed since I last checked this search?
```typescript
Input:
search_url: string
Output:
new_ads: Array<{finnkode, address, score, total_price, categories, url}>
removed_ads: Array<{finnkode, address}>
changed_ads: Array<{
finnkode, address,
changes: Array<{field, from, to}> // typically price/status
}>
since: string // ISO timestamp of previous run
```
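The `new_ads` and `removed_ads` parts of this output follow from a set difference over the `finnkodes` JSON array stored in `search_runs`. The sketch below assumes that storage format; `diff_runs` is a hypothetical helper, and `changed_ads` is left out because it would additionally need a price/status comparison.

```python
import json
from typing import Dict, List


def diff_runs(previous_finnkodes_json: str, current: List[str]) -> Dict[str, List[str]]:
    """Compare the previous run's stored JSON array with the current finnkodes."""
    previous: List[str] = json.loads(previous_finnkodes_json)
    prev_set, cur_set = set(previous), set(current)
    return {
        # Preserve list order so output is stable between calls.
        "new_ads": [code for code in current if code not in prev_set],
        "removed_ads": [code for code in previous if code not in cur_set],
    }


result = diff_runs('["111", "222", "333"]', ["222", "333", "444"])
print(result)
```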
### 5. `finn_save_feedback`
**Intent:** Save my verdict on a listing.
```typescript
Input:
finnkode: string
verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
notes?: string
Output:
ok: boolean
finnkode: string
verdict: string
```
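One possible way to back this tool with the `user_feedback` table is an upsert keyed on `finnkode`. This is a sketch rather than the actual `feedback.py` implementation: `save_feedback` is a hypothetical function, and the `ON CONFLICT` clause assumes SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, Optional

VERDICTS = {"liked", "disliked", "maybe", "visited"}


def save_feedback(
    conn: sqlite3.Connection,
    finnkode: str,
    verdict: str,
    notes: Optional[str] = None,
) -> Dict[str, Any]:
    """Insert or update the single feedback row for a listing."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO user_feedback (finnkode, verdict, notes, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(finnkode) DO UPDATE SET
            verdict = excluded.verdict,
            notes = excluded.notes,
            updated_at = excluded.updated_at
        """,
        (finnkode, verdict, notes, now, now),
    )
    conn.commit()
    return {"ok": True, "finnkode": finnkode, "verdict": verdict}
```

Note that in this sketch a later call without `notes` clears earlier notes; whether that is desirable is a product decision this document does not settle.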
### 6. `finn_get_shortlist`
**Intent:** Show me reviewed listings, or find similar to one I liked.
```typescript
Input:
verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
find_similar_to?: string // finnkode — return listings similar to this
min_score?: number
limit?: number // default 10
Output:
listings: Array<{
finnkode, address, score, total_price,
verdict?, notes?, categories, url
}>
```
---
## Tools removed
| Tool | Reason |
|------|--------|
| `finn_build_unit_vector` | Internal impl detail |
| `finn_decode_unit_vector` | Debug utility, no user value |
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
| `finn_get_ad` | Raw fetch without scoring — `analyze_ad` covers it |
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
| `finn_get_similar_units` | Takes unit_vector directly, internal |
| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |
12 → 6 tools. No user intent is lost. Batch use case now native via `analyze_ad`.
---
## Workflows & optimizations
### Lazy enrichment on demand
`analyze_search` returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have `score: null`. First `analyze_ad(finnkode)` call enriches and caches. Next `analyze_search` shows the now-cached score. Eliminates `detail_limit` as a user-facing parameter.
### Background freshness check
On `analyze_search` cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.
### Re-score without refetch
Scoring weights are configurable. If the user changes weights, re-score from cached `finn_ads` + `eiendom_units` + `similar_units` without any network calls. Invalidates `analysis_cache` only, not raw data.
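As a rough illustration, re-scoring can be a pure function over cached component scores. The actual scoring formula is not given in this document; the weighted sum, the `rescore` name and the component names below are assumptions.

```python
from typing import Mapping


def rescore(breakdown: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum over cached component scores; unweighted components count as 0."""
    return sum(value * weights.get(name, 0.0) for name, value in breakdown.items())


# Component scores as they might sit in the cache (illustrative values).
cached_breakdown = {"transport": 0.8, "price": 0.5, "size": 0.6}

old_total = rescore(cached_breakdown, {"transport": 1.0, "price": 1.0, "size": 1.0})
new_total = rescore(cached_breakdown, {"transport": 2.0, "price": 1.0, "size": 0.0})
print(old_total, new_total)
```

Because the function touches only cached values, a weight change costs no network calls; only the stored analysis result needs replacing.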
### Price drop detection
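The `price_history` table enables a filter such as `finn_get_shortlist(price_dropped_since: timestamp)` to surface listings whose price dropped recently. Note that `price_dropped_since` is not yet part of the `finn_get_shortlist` input schema, and the append-only writes it relies on are introduced in Phase 2.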
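A sketch of how drops could be detected from `price_history`, using the table definition from the schema section. `price_drops_since` is a hypothetical helper, the sample rows are made up, and the string comparison assumes every `recorded_at` value uses the same ISO 8601 format.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE price_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        finnkode TEXT NOT NULL,
        total_price INTEGER,
        asking_price INTEGER,
        sale_status TEXT,
        recorded_at TEXT NOT NULL
    )"""
)
conn.executemany(
    "INSERT INTO price_history (finnkode, total_price, recorded_at) VALUES (?, ?, ?)",
    [
        ("111", 5_000_000, "2026-05-01T00:00:00+00:00"),
        ("111", 4_800_000, "2026-05-20T00:00:00+00:00"),  # dropped
        ("222", 3_000_000, "2026-05-01T00:00:00+00:00"),
        ("222", 3_000_000, "2026-05-21T00:00:00+00:00"),  # unchanged
        ("333", 4_000_000, "2026-05-25T00:00:00+00:00"),  # no baseline
    ],
)


def price_drops_since(conn: sqlite3.Connection, since: str) -> List[Tuple[str, int, int]]:
    """Return (finnkode, baseline_price, latest_price) for ads that got cheaper."""
    rows = conn.execute(
        "SELECT finnkode, total_price, recorded_at FROM price_history "
        "WHERE total_price IS NOT NULL ORDER BY finnkode, recorded_at, id"
    ).fetchall()
    baseline, latest = {}, {}
    for finnkode, price, recorded_at in rows:
        if recorded_at < since:
            baseline[finnkode] = price  # last price seen before the cutoff
        latest[finnkode] = price        # last price seen overall
    return [
        (code, baseline[code], latest[code])
        for code in sorted(latest)
        if code in baseline and latest[code] < baseline[code]
    ]


drops = price_drops_since(conn, "2026-05-10T00:00:00+00:00")
print(drops)
```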
### Cache warming on save_feedback
When `verdict='liked'`, pre-fetch similar units in background. Next `find_similar_to=finnkode` call is instant.
### Batch enrichment via parallel Eiendom.no
Enrichment in `analyze_ad` is currently sequential per ad. `analyze_search` already batches up to N ads in parallel via `asyncio.gather`; use the same pattern in `analyze_ad(finnkode: string[])`.
### Cache inspection
Internal-only — useful for debugging. Add a `--cache-status` CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
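The report could be computed along these lines for `finn_ads`. This is a sketch under assumptions: `cache_status` is a hypothetical function, the columns `payload`, `content_hash` and `fetched_at` are taken from names used elsewhere in this document, and `payload` is assumed to be a JSON object stored as text.

```python
import json
import sqlite3
from typing import Any, Dict


def cache_status(conn: sqlite3.Connection) -> Dict[str, Any]:
    """Summarise cache health for finn_ads (assumed column layout)."""
    rows = conn.execute(
        "SELECT payload, content_hash, fetched_at FROM finn_ads"
    ).fetchall()
    fetched = [fetched_at for _, _, fetched_at in rows if fetched_at]
    return {
        "rows": len(rows),
        "null_hash_rows": sum(1 for _, h, _ in rows if h is None),
        "missing_unit_code": sum(
            1 for payload, _, _ in rows
            if not json.loads(payload).get("eiendom_unit_code")
        ),
        # ISO timestamps in a single format sort correctly as strings.
        "oldest_fetched_at": min(fetched, default=None),
        "newest_fetched_at": max(fetched, default=None),
    }
```

Parsing the payload in Python avoids depending on SQLite's JSON functions being compiled into the local build.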
---
## Output principles
**Never in any tool response:**
- `unit_vector` / raw Eiendom.no vector
- `unit_images` URL lists (use `finn_analyze_unit_images`)
- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
- `lat` / `lng` coordinates
**`listing_description`:**
- **Not** in `finn_analyze_search` — too long, 77 × 500 words = noise
- **Yes** in `finn_analyze_ad` — AI needs it to interpret risk flags, clauses, edge cases
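These rules can be enforced with one recursive filter applied just before a response is returned. The sketch below is not the existing `_slim_listing()`; `slim_response` and `FORBIDDEN_KEYS` are hypothetical names, with the key list copied from the bullets above.

```python
from typing import Any

FORBIDDEN_KEYS = {
    "unit_vector", "unit_images",
    "fetched_at", "detail_fetched_at", "computed_at",
    "lat", "lng",
}


def slim_response(value: Any, keep_description: bool) -> Any:
    """Recursively drop internal fields; optionally drop listing_description."""
    if isinstance(value, dict):
        return {
            key: slim_response(item, keep_description)
            for key, item in value.items()
            if key not in FORBIDDEN_KEYS
            and (keep_description or key != "listing_description")
        }
    if isinstance(value, list):
        return [slim_response(item, keep_description) for item in value]
    return value
```

Under this scheme `finn_analyze_search` would call it with `keep_description=False` and `finn_analyze_ad` with `keep_description=True`.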
---
## Migration plan
### Phase 0 — Fix the broken cache (BLOCKER)
Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.
- [ ] **Audit the running deployment.** Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
- [ ] **Backfill content_hash for existing rows.** Compute from stored payloads.
- [ ] **Fix `ensure_eiendom_unit_code` persistence.** Only 36/222 ads have `eiendom_unit_code` in their payload — verify the mutation reaches `save_finn_ad` before serialisation.
- [ ] **Verify `save_analysis` actually fires.** Add unit test confirming analysis_cache row count increases after `analyze_ad` call. Currently 0 rows after 222 ad fetches.
- [ ] **Add CLI cache-status command** for ongoing visibility.
**Success criteria:**
- `analysis_cache` populated after any `analyze_search` run
- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`
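The backfill step could look roughly like the sketch below. The hashing scheme used by `cache.py` is not shown here, so `payload_hash` (SHA-256 over canonical JSON) is an assumption; a real backfill must reuse whatever function the write path uses, or the hashes will never match on later writes.

```python
import hashlib
import json
import sqlite3

TABLES = {"finn_ads", "eiendom_units", "similar_units"}


def payload_hash(payload_json: str) -> str:
    """Assumed scheme: SHA-256 of the payload re-serialised with sorted keys."""
    canonical = json.dumps(
        json.loads(payload_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backfill_content_hash(conn: sqlite3.Connection, table: str) -> int:
    """Fill content_hash for rows where it is NULL; return rows updated."""
    if table not in TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unexpected table: {table!r}")
    rows = conn.execute(
        f"SELECT rowid, payload FROM {table} WHERE content_hash IS NULL"
    ).fetchall()
    conn.executemany(
        f"UPDATE {table} SET content_hash = ? WHERE rowid = ?",
        [(payload_hash(payload), rowid) for rowid, payload in rows],
    )
    conn.commit()
    return len(rows)
```

Running it a second time should update zero rows, which makes it a convenient check for the third success criterion.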
### Phase 1 — Longer cache TTLs + freshness model
- [ ] Update `config.py` TTLs (see table above)
- [ ] Add `last_verified_at` column to `finn_ads`
- [ ] Implement lightweight price/status check (HEAD or `price_widget` scrape)
- [ ] On cache hit, kick off async refresh if `last_verified_at` is stale
- [ ] Update `_is_fresh` logic to use TTL only on `last_verified_at`, not `fetched_at`
**Success criteria:**
- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
- Same listing fetched today: returns from cache, no network call
- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis
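The first two criteria can be expressed as a small decision function. This is one interpretation, not the existing `_is_fresh`: `freshness_action` and `Action` are hypothetical names, and the sketch keeps the 30-day structural TTL on `fetched_at` alongside the 6-hour check on `last_verified_at`, both taken from the TTL table.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

STRUCTURAL_TTL = timedelta(days=30)  # from the TTL table
PRICE_TTL = timedelta(hours=6)       # from the TTL table


class Action(Enum):
    SERVE = "serve from cache"
    SERVE_AND_VERIFY = "serve from cache, verify price in background"
    REFETCH = "full refetch"


def freshness_action(
    fetched_at: datetime, last_verified_at: datetime, now: datetime
) -> Action:
    """Decide how to answer a cache lookup for one listing."""
    if now - fetched_at > STRUCTURAL_TTL:
        return Action.REFETCH
    if now - last_verified_at > PRICE_TTL:
        return Action.SERVE_AND_VERIFY
    return Action.SERVE


now = datetime(2026, 5, 29, 12, 0, tzinfo=timezone.utc)
old = now - timedelta(days=28)
print(freshness_action(old, old, now))  # fetched 28 days ago, never re-verified
print(freshness_action(now, now, now))  # fetched today
```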
### Phase 2 — Missing tables and stub implementations
- [ ] Create `user_feedback`, `price_history`, `search_runs` tables
- [ ] Implement `feedback.py` — replace all TODO stubs with DB writes
- [ ] Populate `price_history` on every `save_finn_ad` call (append-only)
- [ ] Populate `search_runs` on every `analyze_search` call
**Success criteria:**
- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
- `finn_get_new_ads_since_last_run` returns real diff from last run
- `price_history` populated when a re-fetched ad has changed price
### Phase 3 — Output payload cleanup (no breaking tool changes)
- [ ] Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
- [ ] Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
- [ ] Add `price_history` and `cache_age` to `analyze_ad` response
- [ ] Add `price_vs_estimate` and `cache_status` to `analyze_search` response
**Success criteria:**
- `finn_analyze_search` on 30 listings: < 50KB
- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including
### Phase 4 — Consolidate to 6 tools + batch (breaking change)
- [ ] Remove the 9 redundant tools from `mcp_server.py`
- [ ] Update `finn_analyze_ad` to accept `string | string[]` — single or batch
- [ ] Add `find_similar_to` parameter to `finn_get_shortlist`
- [ ] Always include comps in `analyze_ad` — drop `include_eiendom_no` / `include_similar_units` flags
- [ ] Migrate all `test_mcp_integration.py` tests to new tool surface
**Success criteria:**
- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
- All existing use cases covered by 6 tools
### Phase 5 — Lazy enrichment + workflow additions
- [ ] `analyze_search` returns all scraped listings, not just `detail_limit` count
- [ ] Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
- [ ] Background warm-up on `save_feedback(liked)` → pre-fetch similar units
- [ ] Re-score endpoint (or flag) that rebuilds scores from cached raw data
**Success criteria:**
- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no
---
## Success metrics
| Metric | Now | Target |
|--------|-----|--------|
| Number of tools | 12 | 6 |
| `content_hash` populated rows | 0% | 100% |
| `analysis_cache` row count after search | 0 | matches analyzed_listings |
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
| Feedback actually persisted | No (stub) | Yes |
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
| Repeat search within TTL window | Full recompute | 0 network calls, < 1s |
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |