PRD: finn-mcp v2
Current State (from codebase + DB inspection)
What already works
- SQLite database (`data/finn.sqlite`) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
- Hash-aware caching architecture is designed (see `cache.py` docstring)
- Transport scoring is implemented (`score_transport` uses lat/lng from Eiendom.no)
- `listing_description` is stored in the `FinnAd` model
- `finn_analyze_unit_images` downloads, resizes to 1024px, returns as `ImageContent`, so Claude sees images directly
Critical bugs discovered
- Analysis cache is dead. `analysis_cache` table has 0 rows. Every search recomputes scoring from scratch.
- `content_hash` is NULL on every row in `finn_ads`, `eiendom_units`, `similar_units`: 100% NULL across 427 rows. The `_compute_deps_hash` function therefore returns a deterministic hash of empty strings on every call.
- Schema dump shows `, content_hash TEXT)` appended, so the column was added via `ALTER TABLE` after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
- Only 36 of 222 ads have `eiendom_unit_code` populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
- Search page cache (`cache_meta`): all rows expired May 16. The 60-min TTL is far too short.
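To make the NULL-hash failure concrete, here is a minimal sketch of how a dependency hash degenerates when every input is NULL. `compute_deps_hash` is a hypothetical stand-in; the real `_compute_deps_hash` in `cache.py` may combine its inputs differently.

```python
import hashlib


def compute_deps_hash(*content_hashes):
    """Hypothetical stand-in for _compute_deps_hash: combine dependency hashes."""
    # NULL (None) hashes collapse to empty strings, as described above.
    joined = "|".join(h or "" for h in content_hashes)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# With every content_hash NULL, two unrelated listings get the same deps hash,
# so the hash can never signal that an underlying payload changed.
listing_a = compute_deps_hash(None, None, None)
listing_b = compute_deps_hash(None, None, None)
assert listing_a == listing_b

# Once content_hash is populated, a changed ad payload changes the deps hash.
before = compute_deps_hash("ad-hash-1", "unit-hash-1", "comps-hash-1")
after = compute_deps_hash("ad-hash-2", "unit-hash-1", "comps-hash-1")
assert before != after
```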
Known design problems
- `feedback.py` is a stub: all three functions are `# TODO`, nothing is persisted. No `user_feedback` table.
- No `price_history` table.
- No `search_runs` table with finnkodes per search.
- `listing_description` is actively stripped in `_slim_listing()` in `mcp_server.py`.
- `detail_limit` means only N listings get full Eiendom.no analysis; the rest are unscored.
- No batch analysis: analyzing 46 listings requires 46 sequential MCP calls.
- 12 tools, 7 of which are internal plumbing.
- Cache TTLs are far too short — 24h on listing data forces full re-fetch on day-2 repeat searches.
Goals
- Fix the broken cache first — current cache promises nothing and delivers nothing
- Long-lived caching with smart freshness checks — listing structural data doesn't change, treat it accordingly
- 6 tools — one per user intent
- Batch analysis — analyze many listings in one call
- Persistent enrichment — missing tables, feedback implementation
- Output matches intent — each tool returns only what is relevant
- `listing_description` available for AI interpretation in `finn_analyze_ad`
Architecture
Caching strategy (revised)
Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, eiendom_unit_code mapping — all stable. What changes: price, sale status, DOM. Treat structural data as effectively immutable; check price/status separately and cheaply.
Two-tier model:
┌────────────────────────────────────────────────────────────────┐
│ STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
│ - finn_ads.payload (description, area, year, etc.) │
│ - eiendom_units.payload (lat, lng, property_type, etc.) │
│ - similar_units.payload (completed sales — immutable) │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│ VOLATILE DATA (short TTL, cheap refresh) │
│ - price, status, days_on_market │
│ - eiendom_units.estimated_selling_price │
└────────────────────────────────────────────────────────────────┘
Cache TTLs (revised)
| Data | TTL | Refresh strategy |
|---|---|---|
| FINN ad structural | 30 days | Full refetch only |
| FINN ad price/status | 6 hours | Lightweight check, falls back to full refetch if status changed |
| Eiendom.no unit structural | 30 days | Full refetch only |
| Eiendom.no estimate | 7 days | Refresh on access |
| Similar units (sold comps) | 60 days | Immutable rows; new rows appear over time |
| Search pages | 6 hours | Content-hash check, only re-scrape if list actually changed |
| Analysis result | Never expires | Invalidated by deps_hash change |
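Lightweight price/status check: A FINN ad page has a stable URL. Either issue a HEAD request and compare validators such as ETag or Last-Modified (only useful if FINN returns them, which is unverified), or scrape the small price_widget block. Both are much cheaper than fetching and parsing the full ad page. If the price is unchanged, bump last_verified_at; if it changed, do a full refetch.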
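The two-tier decision can be sketched as a pure function. This is an illustration under assumptions, not the existing `_is_fresh` logic: `refresh_action` and its return labels are hypothetical, and the TTL constants simply mirror the table above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STRUCTURAL_TTL = timedelta(days=30)  # FINN ad structural data
PRICE_TTL = timedelta(hours=6)       # FINN ad price/status


def refresh_action(
    fetched_at: datetime,
    last_verified_at: Optional[datetime],
    now: datetime,
) -> str:
    """Return 'full_refetch', 'verify_price', or 'use_cache' (hypothetical labels)."""
    if now - fetched_at > STRUCTURAL_TTL:
        return "full_refetch"
    # A full refetch also confirms price/status, so take the later timestamp.
    last_check = max(fetched_at, last_verified_at) if last_verified_at else fetched_at
    if now - last_check > PRICE_TTL:
        return "verify_price"
    return "use_cache"


now = datetime(2025, 6, 30, 12, 0, tzinfo=timezone.utc)
print(refresh_action(now - timedelta(days=28), None, now))  # verify_price
print(refresh_action(now - timedelta(hours=1), None, now))  # use_cache
```

In this sketch "verify_price" means the cached listing is still served and only the cheap check runs, which matches the Phase 1 success criteria.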
Database schema changes
-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at
-- which tracks when we last did a full refetch.
-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
finnkode TEXT PRIMARY KEY,
verdict TEXT NOT NULL, -- 'liked' | 'disliked' | 'maybe' | 'visited'
notes TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
-- New: price history (append-only)
CREATE TABLE price_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
finnkode TEXT NOT NULL,
total_price INTEGER,
asking_price INTEGER,
sale_status TEXT,
recorded_at TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);
-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
search_url TEXT NOT NULL,
finnkodes TEXT NOT NULL, -- JSON array
created_at TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);
-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
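A minimal sketch of how the two new write paths might look with Python's built-in `sqlite3` module. The helper names `save_feedback` and `record_price` are hypothetical. The sketch assumes that `price_history` gets a new row only when price or status differs from the latest recorded row; appending unconditionally on every save is the other reading of "append-only".

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE user_feedback (
    finnkode TEXT PRIMARY KEY,
    verdict TEXT NOT NULL,
    notes TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE price_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode TEXT NOT NULL,
    total_price INTEGER,
    asking_price INTEGER,
    sale_status TEXT,
    recorded_at TEXT NOT NULL
);
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def save_feedback(conn, finnkode, verdict, notes=None):
    """Insert or update the single feedback row for a listing (hypothetical helper)."""
    now = _now()
    conn.execute(
        """
        INSERT INTO user_feedback (finnkode, verdict, notes, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(finnkode) DO UPDATE SET
            verdict = excluded.verdict,
            notes = excluded.notes,
            updated_at = excluded.updated_at
        """,
        (finnkode, verdict, notes, now, now),
    )
    conn.commit()


def record_price(conn, finnkode, total_price, asking_price, sale_status):
    """Append a row only if price/status differs from the latest one. Returns True if written."""
    last = conn.execute(
        "SELECT total_price, asking_price, sale_status FROM price_history "
        "WHERE finnkode = ? ORDER BY id DESC LIMIT 1",
        (finnkode,),
    ).fetchone()
    if last == (total_price, asking_price, sale_status):
        return False
    conn.execute(
        "INSERT INTO price_history "
        "(finnkode, total_price, asking_price, sale_status, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (finnkode, total_price, asking_price, sale_status, _now()),
    )
    conn.commit()
    return True
```

The upsert keeps `created_at` from the first verdict and only moves `updated_at`, which is why the table carries both columns. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.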
Tools (v2) — 6 total
1. finn_analyze_search
Intent: Ranked list of all listings in this search.
Input:
search_url: string
refresh?: boolean // force re-fetch even if cache is valid
max_pages?: number // default 5
Output:
total: number
cache_status: {
listings_from_cache: number
listings_refreshed: number
listings_freshly_scraped: number
}
listings: Array<{
finnkode, rank, score, url, address, district,
area_m2, bedrooms, floor, construction_year,
total_price, common_costs, shared_debt, sqm_price,
price_vs_estimate, // negative = below estimate
market_placement, dom, categories, risks
}>
Behaviour: Returns ALL scraped listings, not limited by detail_limit. Listings without enrichment get score: null. Lazy enrichment is triggered by finn_analyze_ad.
2. finn_analyze_ad
Intent: Deep-dive into one or more specific listings.
Input:
finnkode: string | string[] // single or batch
refresh?: boolean // bypass cache
Output:
// Single string input → single object
// Array input → array of objects in same order
finnkode: string
url: string
address: string
listing_description: string // ← INCLUDED for AI interpretation
score: {
total: number
breakdown: Record<string, number>
nearby_transit: { tbane: [...], trikk: [...] }
}
price: {
total, asking, shared_debt, common_costs, sqm_price,
estimate, estimate_lower, estimate_upper,
vs_estimate, market_placement
}
property: {
type, ownership, area_m2, bedrooms, floor,
construction_year, has_balcony, has_elevator, has_garage
}
market: {
dom, sale_status, avg_comp_sqm_price, comp_count,
comps: Array<{address, usable_area, floor, construction_year,
selling_price, sqm_price, days_on_market, finalized_at}> // top 15
}
price_history: Array<{ total_price, asking_price, recorded_at }>
categories: string[]
risks: string[]
cache_age: {
structural_days: number // age of last full refetch
price_hours: number // age of last price verification
}
Batch behaviour: Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have {finnkode, error: "..."} shape.
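The batch contract above (bounded parallelism, input order preserved, per-item error objects) can be sketched with `asyncio.gather`. `analyze_many`, the `analyze_one` callable, and the `concurrency` default are assumptions for illustration only.

```python
import asyncio

MAX_BATCH = 50  # per-call limit stated above


async def analyze_many(finnkodes, analyze_one, concurrency=8):
    """Run analyze_one for each finnkode in parallel; results follow input order."""
    if len(finnkodes) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} finnkodes per call")
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(finnkode):
        async with semaphore:
            try:
                return await analyze_one(finnkode)
            except Exception as exc:
                # A failed lookup becomes an error object instead of failing the batch.
                return {"finnkode": finnkode, "error": str(exc)}

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(f) for f in finnkodes))


async def fake_analyze(finnkode):
    """Stand-in for the real per-ad analysis."""
    if finnkode == "bad":
        raise LookupError("not found")
    return {"finnkode": finnkode, "score": 1}


print(asyncio.run(analyze_many(["a", "bad", "c"], fake_analyze)))
```

Catching exceptions inside the guarded coroutine, rather than passing `return_exceptions=True` to `gather`, keeps the error shape in one place.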
3. finn_analyze_unit_images
Intent: Visual assessment — condition, views, room feel.
Unchanged from current implementation. Returns ImageContent blocks, not URLs.
Input:
unit_code: string
max_images?: number // default 8
4. finn_get_new_ads_since_last_run
Intent: What has changed since I last checked this search?
Input:
search_url: string
Output:
new_ads: Array<{finnkode, address, score, total_price, categories, url}>
removed_ads: Array<{finnkode, address}>
changed_ads: Array<{
finnkode, address,
changes: Array<{field, from, to}> // typically price/status
}>
since: string // ISO timestamp of previous run
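A sketch of the diff behind this tool, assuming each run can be reduced to a snapshot mapping finnkode to a few tracked fields. `search_runs` as specified stores only finnkodes, so price and status values would have to come from `finn_ads` or `price_history`. `diff_runs` and `TRACKED_FIELDS` are hypothetical names.

```python
TRACKED_FIELDS = ("total_price", "sale_status")  # assumed volatile fields


def diff_runs(previous, current):
    """Compare two {finnkode: {field: value}} snapshots of the same search."""
    new_ads = [
        {"finnkode": k, "address": current[k].get("address")}
        for k in sorted(current.keys() - previous.keys())
    ]
    removed_ads = [
        {"finnkode": k, "address": previous[k].get("address")}
        for k in sorted(previous.keys() - current.keys())
    ]
    changed_ads = []
    for k in sorted(current.keys() & previous.keys()):
        changes = [
            {"field": f, "from": previous[k].get(f), "to": current[k].get(f)}
            for f in TRACKED_FIELDS
            if previous[k].get(f) != current[k].get(f)
        ]
        if changes:
            changed_ads.append(
                {"finnkode": k, "address": current[k].get("address"), "changes": changes}
            )
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```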
5. finn_save_feedback
Intent: Save my verdict on a listing.
Input:
finnkode: string
verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
notes?: string
Output:
ok: boolean
finnkode: string
verdict: string
6. finn_get_shortlist
Intent: Show me reviewed listings, or find similar to one I liked.
Input:
verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
find_similar_to?: string // finnkode — return listings similar to this
min_score?: number
limit?: number // default 10
Output:
listings: Array<{
finnkode, address, score, total_price,
verdict?, notes?, categories, url
}>
Tools removed
| Tool | Reason |
|---|---|
| `finn_build_unit_vector` | Internal impl detail |
| `finn_decode_unit_vector` | Debug utility, no user value |
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in analyze_ad |
| `finn_get_ad` | Raw fetch without scoring; analyze_ad covers it |
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
| `finn_get_similar_units` | Takes unit_vector directly, internal |
| `finn_analyze_ad_against_comps` | Absorbed into analyze_ad (comps always included) |
| `finn_compare_ads` | Absorbed into analyze_ad(finnkode: string[]) |
| `finn_find_similar_to_liked_ad` | Absorbed into get_shortlist(find_similar_to=finnkode) |
12 → 6 tools. No user intent is lost. Batch use case now native via analyze_ad.
Workflows & optimizations
Lazy enrichment on demand
analyze_search returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have score: null. First analyze_ad(finnkode) call enriches and caches. Next analyze_search shows the now-cached score. Eliminates detail_limit as a user-facing parameter.
Background freshness check
On analyze_search cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.
Re-score without refetch
Scoring weights are configurable. If the user changes weights, re-score from cached finn_ads + eiendom_units + similar_units without any network calls. Invalidates analysis_cache only, not raw data.
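One way to get this behaviour is to fold the scoring weights into the analysis cache key, so a weight change misses `analysis_cache` while the raw tables stay untouched. The sketch below assumes a simple weighted-sum score; `analysis_key`, `score`, and the feature names are hypothetical and not taken from the codebase.

```python
import hashlib
import json


def analysis_key(content_hashes, weights):
    """Cache key that changes when any dependency hash or any weight changes."""
    payload = json.dumps({"deps": content_hashes, "weights": weights}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def score(features, weights):
    """Weighted sum over cached features; no network access involved."""
    breakdown = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    return {"total": sum(breakdown.values()), "breakdown": breakdown}


deps = ["ad-hash", "unit-hash", "comps-hash"]
features = {"transport": 4.0, "price": 1.5}  # derived from cached rows

old_weights = {"transport": 0.5, "price": 2.0}
new_weights = {"transport": 1.0, "price": 2.0}

# Same raw data, different weights: different key, so only the analysis is redone.
assert analysis_key(deps, old_weights) != analysis_key(deps, new_weights)
print(score(features, new_weights)["total"])  # 7.0
```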
Price drop detection
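The price_history table would enable a finn_get_shortlist(price_dropped_since: timestamp) filter that surfaces listings whose price dropped recently. This parameter is not yet part of the finn_get_shortlist input listed under Tools. It builds on the append-only price_history writes introduced in Phase 2.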
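A sketch of how drops could be read back from `price_history`. `price_drops_since` is a hypothetical helper, and it assumes `recorded_at` values are ISO 8601 strings in one consistent timezone and format, so that string comparison matches chronological order.

```python
import sqlite3


def price_drops_since(conn, since_iso):
    """Return {finnkode: {'from', 'to', 'recorded_at'}} for the latest drop at or after since_iso."""
    rows = conn.execute(
        "SELECT finnkode, total_price, recorded_at FROM price_history "
        "WHERE total_price IS NOT NULL "
        "ORDER BY finnkode, recorded_at, id"
    ).fetchall()
    drops = {}
    previous_price = {}
    for finnkode, price, recorded_at in rows:
        prev = previous_price.get(finnkode)
        # A drop is a row whose price is lower than the row recorded just before it.
        if prev is not None and price < prev and recorded_at >= since_iso:
            drops[finnkode] = {"from": prev, "to": price, "recorded_at": recorded_at}
        previous_price[finnkode] = price
    return drops
```

The existing `idx_price_history_finnkode_recorded` index covers the `ORDER BY finnkode, recorded_at` part of this scan.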
Cache warming on save_feedback
When verdict='liked', pre-fetch similar units in background. Next find_similar_to=finnkode call is instant.
Batch enrichment via parallel Eiendom.no
Current enrichment is sequential per ad. Parallel-batch up to N at a time via asyncio.gather already exists in analyze_search — use the same pattern in analyze_ad(finnkode: string[]).
Cache inspection
Internal-only — useful for debugging. Add a --cache-status CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
Output principles
Never in any tool response:
- `unit_vector` / raw Eiendom.no vector
- `unit_images` URL lists (use `finn_analyze_unit_images`)
- Internal timestamps (`fetched_at`, `detail_fetched_at`, `computed_at`)
- `lat`/`lng` coordinates

`listing_description`:
- Not in `finn_analyze_search`: too long, 77 × 500 words = noise
- Yes in `finn_analyze_ad`: AI needs it to interpret risk flags, clauses, edge cases
Migration plan
Phase 0 — Fix the broken cache (BLOCKER)
Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.
- Audit the running deployment. Compare the deployed `cache.py` to the source we have. Hashes are NULL in DB despite source code populating them; find the divergence.
- Backfill content_hash for existing rows. Compute from stored payloads.
- Fix `ensure_eiendom_unit_code` persistence. Only 36/222 ads have `eiendom_unit_code` in their payload; verify the mutation reaches `save_finn_ad` before serialisation.
- Verify `save_analysis` actually fires. Add unit test confirming analysis_cache row count increases after `analyze_ad` call. Currently 0 rows after 222 ad fetches.
- Add CLI cache-status command for ongoing visibility.
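The backfill step could look roughly like the sketch below. It assumes `payload` holds JSON text and that a SHA-256 over canonicalised JSON is an acceptable hash. The backfilled value must match whatever the write path in `cache.py` computes, so treat `content_hash` here as a placeholder for that function. Rows are addressed by SQLite's implicit `rowid` to avoid guessing each table's key column.

```python
import hashlib
import json
import sqlite3

ALLOWED_TABLES = {"finn_ads", "eiendom_units", "similar_units"}


def content_hash(payload_json: str) -> str:
    """Hash canonical JSON so key order and whitespace do not affect the result."""
    canonical = json.dumps(json.loads(payload_json), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backfill_content_hash(conn: sqlite3.Connection, table: str) -> int:
    """Fill content_hash for rows where it is NULL. Returns the number of rows updated."""
    if table not in ALLOWED_TABLES:
        # Table names cannot be bound as SQL parameters, so whitelist them.
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"SELECT rowid, payload FROM {table} WHERE content_hash IS NULL"
    ).fetchall()
    for rowid, payload in rows:
        conn.execute(
            f"UPDATE {table} SET content_hash = ? WHERE rowid = ?",
            (content_hash(payload), rowid),
        )
    conn.commit()
    return len(rows)
```

Running it a second time should report zero updated rows, which doubles as a quick check that the write path has started populating hashes.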
Success criteria:
- `analysis_cache` populated after any `analyze_search` run
- Repeat `analyze_search` within TTL window: zero network calls, sub-second response
- All `content_hash` columns populated across `finn_ads`, `eiendom_units`, `similar_units`
Phase 1 — Longer cache TTLs + freshness model
- Update `config.py` TTLs (see table above)
- Add `last_verified_at` column to `finn_ads`
- Implement lightweight price/status check (HEAD or `price_widget` scrape)
- On cache hit, kick off async refresh if `last_verified_at` is stale
- Update `_is_fresh` logic to use TTL only on `last_verified_at`, not `fetched_at`
Success criteria:
- Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
- Same listing fetched today: returns from cache, no network call
- Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis
Phase 2 — Missing tables and stub implementations
- Create `user_feedback`, `price_history`, `search_runs` tables
- Implement `feedback.py`: replace all TODO stubs with DB writes
- Populate `price_history` on every `save_finn_ad` call (append-only)
- Populate `search_runs` on every `analyze_search` call
Success criteria:
- `finn_save_feedback` writes to DB; `finn_get_shortlist(verdict=...)` returns it
- `finn_get_new_ads_since_last_run` returns real diff from last run
- `price_history` populated when a re-fetched ad has changed price
Phase 3 — Output payload cleanup (no breaking tool changes)
- Stop stripping `listing_description` in `_slim_listing()` for `analyze_ad`
- Remove `unit_images`, `unit_vector`, internal timestamps from `analyze_ad` response
- Add `price_history` and `cache_age` to `analyze_ad` response
- Add `price_vs_estimate` and `cache_status` to `analyze_search` response
Success criteria:
- `finn_analyze_search` on 30 listings: < 50KB
- `finn_analyze_ad` per listing: < 8KB excluding description, < 12KB including
Phase 4 — Consolidate to 6 tools + batch (breaking change)
- Remove the 9 redundant tools from `mcp_server.py`
- Update `finn_analyze_ad` to accept `string | string[]` (single or batch)
- Add `find_similar_to` parameter to `finn_get_shortlist`
- Always include comps in `analyze_ad`; drop `include_eiendom_no` / `include_similar_units` flags
- Migrate all `test_mcp_integration.py` tests to new tool surface
Success criteria:
- `finn_analyze_ad(["a", "b", "c"])`: one round trip, parallel internal fetch
- All existing use cases covered by 6 tools
Phase 5 — Lazy enrichment + workflow additions
- `analyze_search` returns all scraped listings, not just `detail_limit` count
- Listings without enrichment get `score: null`, enriched on first `analyze_ad` call
- Background warm-up on `save_feedback(liked)` → pre-fetch similar units
- Re-score endpoint (or flag) that rebuilds scores from cached raw data
Success criteria:
- `analyze_search` on 77-result search: all 77 returned, no `detail_limit` truncation
- Subsequent `analyze_ad` on a previously-unenriched listing: enriches + caches + returns
- Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no
Success metrics
| Metric | Now | Target |
|---|---|---|
| Number of tools | 12 | 6 |
| `content_hash` populated rows | 0% | 100% |
| `analysis_cache` row count after search | 0 | matches analyzed_listings |
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
| Feedback actually persisted | No (stub) | Yes |
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |