finn-mcp/refactor.md
T
ole 55d93894ac feat(refactor): Document refactoring progress and phases in markdown
feat(scripts): Add backfill script for content_hash in cache tables

feat(scripts): Create recompute script for analysis_cache population

test(tests): Implement comprehensive tests for analysis module functions

fix(tests): Update CLI tests to assert errors on stderr instead of stdout

fix(tests): Adjust MCP integration tests to pass context parameter correctly

fix(tests): Modify service tests to return hash on save functions for consistency
2026-05-29 15:16:57 +00:00


PRD: finn-mcp v2

Current State (from codebase + DB inspection)

What already works

  • SQLite database (data/finn.sqlite) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
  • Hash-aware caching architecture is designed (see cache.py docstring)
  • Transport scoring is implemented (score_transport uses lat/lng from Eiendom.no)
  • listing_description is stored in the FinnAd model
  • finn_analyze_unit_images downloads, resizes to 1024px, returns as ImageContent — Claude sees images directly

Critical bugs discovered

  • Analysis cache is dead. analysis_cache table has 0 rows. Every search recomputes scoring from scratch.
  • content_hash is NULL on every row in finn_ads, eiendom_units, similar_units — 100% NULL across 427 rows. The _compute_deps_hash function therefore returns a deterministic hash of empty strings on every call.
  • Schema dump shows , content_hash TEXT) appended — column was added via ALTER TABLE after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
  • Only 36 of 222 ads have eiendom_unit_code populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
  • Every row in the search page cache (cache_meta) expired on May 16. The 60-minute TTL is far too short.

Known design problems

  • feedback.py is a stub — all three functions are # TODO, nothing is persisted. No user_feedback table.
  • No price_history table.
  • No search_runs table with finnkodes per search.
  • listing_description is actively stripped in _slim_listing() in mcp_server.py.
  • detail_limit means only N listings get full Eiendom.no analysis — the rest are unscored.
  • No batch analysis — analyzing 46 listings requires 46 sequential MCP calls.
  • 12 tools, 7 of which are internal plumbing.
  • Cache TTLs are far too short — 24h on listing data forces full re-fetch on day-2 repeat searches.

Goals

  1. Fix the broken cache first. The current cache is designed but stores nothing reusable
  2. Long-lived caching with smart freshness checks — listing structural data doesn't change, treat it accordingly
  3. 6 tools — one per user intent
  4. Batch analysis — analyze many listings in one call
  5. Persistent enrichment — missing tables, feedback implementation
  6. Output matches intent — each tool returns only what is relevant
  7. listing_description available for AI interpretation in finn_analyze_ad

Architecture

Caching strategy (revised)

Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, and the eiendom_unit_code mapping are all stable. What changes: price, sale status, and days on market (DOM). Treat structural data as effectively immutable; check price/status separately and cheaply.

Two-tier model:

┌────────────────────────────────────────────────────────────────┐
│  STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
│  - finn_ads.payload (description, area, year, etc.)            │
│  - eiendom_units.payload (lat, lng, property_type, etc.)       │
│  - similar_units.payload (completed sales — immutable)         │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│  VOLATILE DATA (short TTL, cheap refresh)                       │
│  - price, status, days_on_market                                │
│  - eiendom_units.estimated_selling_price                        │
└────────────────────────────────────────────────────────────────┘

Cache TTLs (revised)

| Data | TTL | Refresh strategy |
|---|---|---|
| FINN ad structural | 30 days | Full refetch only |
| FINN ad price/status | 6 hours | Lightweight check, falls back to full refetch if status changed |
| Eiendom.no unit structural | 30 days | Full refetch only |
| Eiendom.no estimate | 7 days | Refresh on access |
| Similar units (sold comps) | 60 days | Immutable rows; new rows appear over time |
| Search pages | 6 hours | Content-hash check, only re-scrape if list actually changed |
| Analysis result | Never expires | Invalidated by deps_hash change |

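Lightweight price/status check: A FINN ad page has a stable URL. Either send a HEAD request or scrape only the small price_widget block; both are much cheaper than fetching the full ad page. A HEAD request returns no body, so it can only signal a change if the response carries a validator header (for example ETag or Last-Modified) that changes with the ad, which would need to be confirmed against FINN. If the price is unchanged, bump last_verified_at; if it changed, do a full refetch.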
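The two-tier freshness decision can be sketched as a small pure function. This is a minimal illustration, not the project's `_is_fresh`: the function name `freshness_action` and the constant names are hypothetical, and the TTL values are taken from the table above.

```python
from datetime import datetime, timedelta
from typing import Optional

# TTL values from the revised table; the constant names are illustrative.
STRUCTURAL_TTL = timedelta(days=30)
VOLATILE_TTL = timedelta(hours=6)


def freshness_action(
    fetched_at: datetime,
    last_verified_at: Optional[datetime],
    now: datetime,
) -> str:
    """Return 'refetch', 'verify', or 'fresh' for one cached finn_ads row."""
    if now - fetched_at > STRUCTURAL_TTL:
        return "refetch"  # structural data is too old: do a full refetch
    # A row that was never verified counts as verified at fetch time.
    verified = last_verified_at or fetched_at
    if now - verified > VOLATILE_TTL:
        return "verify"  # serve from cache, then run the lightweight check
    return "fresh"  # serve from cache, no network call
```

Keeping the decision free of I/O makes the Phase 1 success criteria easy to express as unit tests with fixed timestamps.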

Database schema changes

-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at
-- which tracks when we last did a full refetch.

-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
    finnkode    TEXT PRIMARY KEY,
    verdict     TEXT NOT NULL,  -- 'liked' | 'disliked' | 'maybe' | 'visited'
    notes       TEXT,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);

-- New: price history (append-only)
CREATE TABLE price_history (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode     TEXT NOT NULL,
    total_price  INTEGER,
    asking_price INTEGER,
    sale_status  TEXT,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);

-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    search_url  TEXT NOT NULL,
    finnkodes   TEXT NOT NULL,  -- JSON array
    created_at  TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);

-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
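SQLite's `ALTER TABLE ... ADD COLUMN` fails if the column already exists, so a migration that may run more than once needs a guard. The helper below is a hedged sketch using only the standard `sqlite3` module; the name `add_column_if_missing` is hypothetical and not part of the existing codebase.

```python
import sqlite3


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, decl: str
) -> bool:
    """Add a column unless it already exists. Returns True if it was added.

    `table`, `column` and `decl` are interpolated into SQL, so they must be
    trusted constants from the migration script, never user input.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True
```

For the new tables and indexes, `CREATE TABLE IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` give the same re-runnable behaviour without a helper.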

Tools (v2) — 6 total

1. finn_analyze_search

Intent: Ranked list of all listings in this search.

Input:
  search_url: string
  refresh?: boolean      // force re-fetch even if cache is valid
  max_pages?: number     // default 5

Output:
  total: number
  cache_status: {
    listings_from_cache: number
    listings_refreshed: number
    listings_freshly_scraped: number
  }
  listings: Array<{
    finnkode, rank, score, url, address, district,
    area_m2, bedrooms, floor, construction_year,
    total_price, common_costs, shared_debt, sqm_price,
    price_vs_estimate,  // negative = below estimate
    market_placement, dom, categories, risks
  }>

Behaviour: Returns ALL scraped listings, not limited by detail_limit. Listings without enrichment get score: null. Lazy enrichment is triggered by finn_analyze_ad.
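The PRD does not say where unscored listings fall in the ranking. One plausible choice, shown here purely as an assumption, is to rank scored listings by descending score and place `score: null` listings after them in their scraped order. The function name `rank_listings` is hypothetical.

```python
from typing import Any


def rank_listings(listings: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order listings by score (highest first), unscored ones last."""
    ordered = sorted(
        listings,
        # False sorts before True, so scored listings come first;
        # sorted() is stable, so unscored listings keep their input order.
        key=lambda ad: (ad["score"] is None, -(ad["score"] or 0.0)),
    )
    return [{**ad, "rank": i} for i, ad in enumerate(ordered, start=1)]
```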

2. finn_analyze_ad

Intent: Deep-dive into one or more specific listings.

Input:
  finnkode: string | string[]   // single or batch
  refresh?: boolean             // bypass cache

Output:
  // Single string input → single object
  // Array input → array of objects in same order
  finnkode: string
  url: string
  address: string
  listing_description: string   // ← INCLUDED for AI interpretation
  score: {
    total: number
    breakdown: Record<string, number>
    nearby_transit: { tbane: [...], trikk: [...] }
  }
  price: {
    total, asking, shared_debt, common_costs, sqm_price,
    estimate, estimate_lower, estimate_upper,
    vs_estimate, market_placement
  }
  property: {
    type, ownership, area_m2, bedrooms, floor,
    construction_year, has_balcony, has_elevator, has_garage
  }
  market: {
    dom, sale_status, avg_comp_sqm_price, comp_count,
    comps: Array<{address, usable_area, floor, construction_year,
                  selling_price, sqm_price, days_on_market, finalized_at}>  // top 15
  }
  price_history: Array<{ total_price, asking_price, recorded_at }>
  categories: string[]
  risks: string[]
  cache_age: {
    structural_days: number    // age of last full refetch
    price_hours: number        // age of last price verification
  }

Batch behaviour: Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have {finnkode, error: "..."} shape.
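The batch contract (input order preserved, per-item error objects, bounded parallelism) can be sketched with `asyncio.gather` and a semaphore. This is an illustration under assumptions: `analyze_batch`, the `concurrency` default, and the stand-in `fake_analyze_one` are hypothetical names, not existing functions.

```python
import asyncio
from typing import Awaitable, Callable

MAX_BATCH = 50  # limit stated in the batch behaviour above


async def analyze_batch(
    finnkodes: list[str],
    analyze_one: Callable[[str], Awaitable[dict]],
    concurrency: int = 8,  # assumed value; tune against Eiendom.no limits
) -> list[dict]:
    if len(finnkodes) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} finnkodes per call")
    sem = asyncio.Semaphore(concurrency)

    async def guarded(finnkode: str) -> dict:
        async with sem:
            try:
                return await analyze_one(finnkode)
            except Exception as exc:
                # A failed lookup becomes an error object, not a failed batch.
                return {"finnkode": finnkode, "error": str(exc)}

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(code) for code in finnkodes))


async def fake_analyze_one(finnkode: str) -> dict:
    """Stand-in for the real per-ad analysis, used only to exercise the sketch."""
    if finnkode == "bad":
        raise RuntimeError("not found")
    return {"finnkode": finnkode, "score": {"total": 1}}
```

Catching the exception inside each task, rather than passing `return_exceptions=True` to `gather`, keeps the failing finnkode attached to its error.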

3. finn_analyze_unit_images

Intent: Visual assessment — condition, views, room feel.

Unchanged from current implementation. Returns ImageContent blocks, not URLs.

Input:
  unit_code: string
  max_images?: number   // default 8

4. finn_get_new_ads_since_last_run

Intent: What has changed since I last checked this search?

Input:
  search_url: string

Output:
  new_ads: Array<{finnkode, address, score, total_price, categories, url}>
  removed_ads: Array<{finnkode, address}>
  changed_ads: Array<{
    finnkode, address,
    changes: Array<{field, from, to}>   // typically price/status
  }>
  since: string  // ISO timestamp of previous run
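The diff itself is a comparison of two snapshots keyed by finnkode. The sketch below assumes the caller has already loaded the previous and current listing summaries (how they are loaded from search_runs and finn_ads is left open); `diff_runs` and `WATCHED_FIELDS` are hypothetical names.

```python
from typing import Any

# Fields whose changes are reported; the PRD names price and status.
WATCHED_FIELDS = ("total_price", "sale_status")


def diff_runs(
    previous: dict[str, dict[str, Any]],
    current: dict[str, dict[str, Any]],
) -> dict[str, list]:
    """Compare two snapshots keyed by finnkode."""
    new_ads = [ad for code, ad in current.items() if code not in previous]
    removed_ads = [
        {"finnkode": code, "address": ad.get("address")}
        for code, ad in previous.items()
        if code not in current
    ]
    changed_ads = []
    for code, ad in current.items():
        old = previous.get(code)
        if old is None:
            continue  # already reported as new
        changes = [
            {"field": f, "from": old.get(f), "to": ad.get(f)}
            for f in WATCHED_FIELDS
            if old.get(f) != ad.get(f)
        ]
        if changes:
            changed_ads.append(
                {"finnkode": code, "address": ad.get("address"), "changes": changes}
            )
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```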

5. finn_save_feedback

Intent: Save my verdict on a listing.

Input:
  finnkode: string
  verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
  notes?: string

Output:
  ok: boolean
  finnkode: string
  verdict: string
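Because `user_feedback.finnkode` is the primary key, saving a verdict is naturally an upsert. The following is a sketch only: it assumes the `user_feedback` schema above and SQLite 3.24 or newer (required for `ON CONFLICT ... DO UPDATE`), and the function name `save_feedback` is illustrative rather than the actual feedback.py signature.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

VERDICTS = {"liked", "disliked", "maybe", "visited"}


def save_feedback(
    conn: sqlite3.Connection,
    finnkode: str,
    verdict: str,
    notes: Optional[str] = None,
) -> dict:
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO user_feedback (finnkode, verdict, notes, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(finnkode) DO UPDATE SET
            verdict = excluded.verdict,
            notes = excluded.notes,
            updated_at = excluded.updated_at
        """,
        (finnkode, verdict, notes, now, now),
    )
    conn.commit()
    return {"ok": True, "finnkode": finnkode, "verdict": verdict}
```

On conflict only `verdict`, `notes` and `updated_at` change, so `created_at` keeps the time of the first verdict.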

6. finn_get_shortlist

Intent: Show me reviewed listings, or find similar to one I liked.

Input:
  verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
  find_similar_to?: string  // finnkode — return listings similar to this
  min_score?: number
  limit?: number            // default 10

Output:
  listings: Array<{
    finnkode, address, score, total_price,
    verdict?, notes?, categories, url
  }>

Tools removed

| Tool | Reason |
|---|---|
| finn_build_unit_vector | Internal impl detail |
| finn_decode_unit_vector | Debug utility, no user value |
| finn_resolve_eiendom_unit | Internal mapping, runs automatically in analyze_ad |
| finn_get_ad | Raw fetch without scoring; analyze_ad covers it |
| finn_get_eiendom_unit | Raw Eiendom.no fetch, internal |
| finn_get_similar_units | Takes unit_vector directly, internal |
| finn_analyze_ad_against_comps | Absorbed into analyze_ad (comps always included) |
| finn_compare_ads | Absorbed into analyze_ad(finnkode: string[]) |
| finn_find_similar_to_liked_ad | Absorbed into get_shortlist(find_similar_to=finnkode) |

12 → 6 tools. No user intent is lost. Batch use case now native via analyze_ad.


Workflows & optimizations

Lazy enrichment on demand

analyze_search returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have score: null. First analyze_ad(finnkode) call enriches and caches. Next analyze_search shows the now-cached score. Eliminates detail_limit as a user-facing parameter.

Background freshness check

On analyze_search cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.

Re-score without refetch

Scoring weights are configurable. If the user changes weights, re-score from cached finn_ads + eiendom_units + similar_units without any network calls. Invalidates analysis_cache only, not raw data.
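For a weight change to invalidate only analysis_cache, the dependency hash has to cover both the raw-data content hashes and the weights. The sketch below is an assumption about how that could look, not the existing `_compute_deps_hash`; the name `deps_hash_sketch` is hypothetical. It also refuses NULL content hashes, which avoids the bug described earlier where every ad hashes to the same value.

```python
import hashlib
import json
from typing import Optional, Sequence


def deps_hash_sketch(
    content_hashes: Sequence[Optional[str]], weights: dict
) -> str:
    """Hash the inputs an analysis depends on: raw-data hashes plus weights."""
    if any(h is None for h in content_hashes):
        # Refuse to hash missing inputs; otherwise every ad shares one hash.
        raise ValueError("content_hash is NULL; backfill before caching analysis")
    canonical = json.dumps(
        {"inputs": sorted(content_hashes), "weights": weights},
        sort_keys=True,  # key order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, changing a weight yields a new hash and a cache miss in analysis_cache, while finn_ads, eiendom_units and similar_units rows are untouched.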

Price drop detection

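The price_history table would enable a price_dropped_since filter on finn_get_shortlist, e.g. finn_get_shortlist(price_dropped_since: timestamp), to surface listings whose price dropped recently. This parameter is not yet part of the finn_get_shortlist input spec above. It builds on the append-only price_history writes introduced in Phase 2.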
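A price drop can be derived from price_history by comparing the last price recorded before a cutoff with the latest price recorded since. The sketch below assumes the price_history schema above and that `recorded_at` values are ISO 8601 strings in one consistent format and timezone, so that string comparison matches chronological order. `price_drops_since` is a hypothetical name.

```python
import sqlite3


def price_drops_since(
    conn: sqlite3.Connection, since: str
) -> list[tuple[str, int, int]]:
    """Return (finnkode, price_before, price_now) for ads that got cheaper."""
    rows = conn.execute(
        "SELECT finnkode, total_price, recorded_at FROM price_history "
        "WHERE total_price IS NOT NULL ORDER BY finnkode, recorded_at, id"
    ).fetchall()
    history: dict[str, list[tuple[str, int]]] = {}
    for finnkode, price, recorded_at in rows:
        history.setdefault(finnkode, []).append((recorded_at, price))

    drops = []
    for finnkode, entries in history.items():
        before = [price for ts, price in entries if ts < since]
        after = [price for ts, price in entries if ts >= since]
        # Compare the last price before the cutoff with the latest one since.
        if before and after and after[-1] < before[-1]:
            drops.append((finnkode, before[-1], after[-1]))
    return drops
```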

Cache warming on save_feedback

When verdict='liked', pre-fetch similar units in background. Next find_similar_to=finnkode call is instant.

Batch enrichment via parallel Eiendom.no

Current enrichment is sequential per ad. analyze_search already runs parallel batches of up to N ads at a time via asyncio.gather; use the same pattern in analyze_ad(finnkode: string[]).

Cache inspection

Internal-only — useful for debugging. Add a --cache-status CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
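A first version of the report could be a single aggregate query per table. This sketch assumes each of the three cache tables has `content_hash` and `fetched_at` columns (the PRD confirms `fetched_at` only for finn_ads and eiendom_units); the function name `cache_status` is hypothetical, and counting missing eiendom_unit_codes is omitted because the payload layout is not specified here.

```python
import sqlite3

# Fixed table names from the PRD; safe to interpolate into SQL.
CACHE_TABLES = ("finn_ads", "eiendom_units", "similar_units")


def cache_status(conn: sqlite3.Connection) -> dict[str, dict]:
    """Row counts, NULL-hash counts and fetched_at range per cache table."""
    report = {}
    for table in CACHE_TABLES:
        total, null_hashes, oldest, newest = conn.execute(
            f"SELECT COUNT(*), SUM(content_hash IS NULL), "
            f"MIN(fetched_at), MAX(fetched_at) FROM {table}"
        ).fetchone()
        report[table] = {
            "rows": total,
            # SUM over an empty table is NULL, so fall back to 0.
            "null_content_hash": null_hashes or 0,
            "oldest_fetched_at": oldest,
            "newest_fetched_at": newest,
        }
    return report
```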


Output principles

Never in any tool response:

  • unit_vector / raw Eiendom.no vector
  • unit_images URL lists (use finn_analyze_unit_images)
  • Internal timestamps (fetched_at, detail_fetched_at, computed_at)
  • lat / lng coordinates
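One way to enforce the exclusion list above is a single recursive filter applied to every tool response just before it is returned. This is a sketch under assumptions: the key names mirror the list above, while `FORBIDDEN_KEYS` and `strip_internal` are hypothetical names, not existing helpers.

```python
from typing import Any

# Keys from the "never in any tool response" list above.
FORBIDDEN_KEYS = frozenset(
    {
        "unit_vector",
        "unit_images",
        "fetched_at",
        "detail_fetched_at",
        "computed_at",
        "lat",
        "lng",
    }
)


def strip_internal(value: Any) -> Any:
    """Return a copy of a JSON-like value without internal-only keys."""
    if isinstance(value, dict):
        return {
            key: strip_internal(item)
            for key, item in value.items()
            if key not in FORBIDDEN_KEYS
        }
    if isinstance(value, list):
        return [strip_internal(item) for item in value]
    return value
```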

listing_description:

  • Not in finn_analyze_search — too long, 77 × 500 words = noise
  • Yes in finn_analyze_ad — AI needs it to interpret risk flags, clauses, edge cases

Migration plan

Phase 0 — Fix the broken cache (BLOCKER)

Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.

  • Audit the running deployment. Compare the deployed cache.py to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
  • Backfill content_hash for existing rows. Compute from stored payloads.
  • Fix ensure_eiendom_unit_code persistence. Only 36/222 ads have eiendom_unit_code in their payload — verify the mutation reaches save_finn_ad before serialisation.
  • Verify save_analysis actually fires. Add unit test confirming analysis_cache row count increases after analyze_ad call. Currently 0 rows after 222 ad fetches.
  • Add CLI cache-status command for ongoing visibility.
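The backfill step could look roughly like this. It is a sketch under two loud assumptions: that `payload` is stored as JSON text, and that hashing the canonicalised JSON with SHA-256 matches what the write path in cache.py does. If cache.py hashes differently, the backfill must call that function instead, or backfilled and freshly written rows will disagree. `payload_hash` and `backfill_content_hash` are hypothetical names.

```python
import hashlib
import json
import sqlite3


def payload_hash(payload_text: str) -> str:
    """SHA-256 of the payload with keys sorted, so key order is irrelevant."""
    canonical = json.dumps(
        json.loads(payload_text),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backfill_content_hash(conn: sqlite3.Connection, table: str) -> int:
    """Fill NULL content_hash values in one table. Returns rows updated.

    `table` must be a trusted constant; it is interpolated into SQL.
    Uses rowid, so it does not apply to WITHOUT ROWID tables.
    """
    rows = conn.execute(
        f"SELECT rowid, payload FROM {table} WHERE content_hash IS NULL"
    ).fetchall()
    for rowid, payload in rows:
        conn.execute(
            f"UPDATE {table} SET content_hash = ? WHERE rowid = ?",
            (payload_hash(payload), rowid),
        )
    conn.commit()
    return len(rows)
```

Running it a second time should report zero updated rows, which is a cheap check that the backfill is complete.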

Success criteria:

  • analysis_cache populated after any analyze_search run
  • Repeat analyze_search within TTL window: zero network calls, sub-second response
  • All content_hash columns populated across finn_ads, eiendom_units, similar_units

Phase 1 — Longer cache TTLs + freshness model

  • Update config.py TTLs (see table above)
  • Add last_verified_at column to finn_ads
  • Implement lightweight price/status check (HEAD or price_widget scrape)
  • On cache hit, kick off async refresh if last_verified_at is stale
  • Update _is_fresh so the volatile (6-hour) TTL is checked against last_verified_at rather than fetched_at; fetched_at is compared only with the 30-day structural TTL

Success criteria:

  • Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
  • Same listing fetched today: returns from cache, no network call
  • Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis

Phase 2 — Missing tables and stub implementations

  • Create user_feedback, price_history, search_runs tables
  • Implement feedback.py — replace all TODO stubs with DB writes
  • Populate price_history on every save_finn_ad call (append-only)
  • Populate search_runs on every analyze_search call

Success criteria:

  • finn_save_feedback writes to DB; finn_get_shortlist(verdict=...) returns it
  • finn_get_new_ads_since_last_run returns real diff from last run
  • price_history populated when a re-fetched ad has changed price

Phase 3 — Output payload cleanup (no breaking tool changes)

  • Stop stripping listing_description in _slim_listing() for analyze_ad
  • Remove unit_images, unit_vector, internal timestamps from analyze_ad response
  • Add price_history and cache_age to analyze_ad response
  • Add price_vs_estimate and cache_status to analyze_search response

Success criteria:

  • finn_analyze_search on 30 listings: < 50KB
  • finn_analyze_ad per listing: < 8KB excluding description, < 12KB including

Phase 4 — Consolidate to 6 tools + batch (breaking change)

  • Remove the 9 redundant tools from mcp_server.py
  • Update finn_analyze_ad to accept string | string[] — single or batch
  • Add find_similar_to parameter to finn_get_shortlist
  • Always include comps in analyze_ad — drop include_eiendom_no / include_similar_units flags
  • Migrate all test_mcp_integration.py tests to new tool surface

Success criteria:

  • finn_analyze_ad(["a", "b", "c"]): one round trip, parallel internal fetch
  • All existing use cases covered by 6 tools

Phase 5 — Lazy enrichment + workflow additions

  • analyze_search returns all scraped listings, not just detail_limit count
  • Listings without enrichment get score: null, enriched on first analyze_ad call
  • Background warm-up on save_feedback(liked) → pre-fetch similar units
  • Re-score endpoint (or flag) that rebuilds scores from cached raw data

Success criteria:

  • analyze_search on 77-result search: all 77 returned, no detail_limit truncation
  • Subsequent analyze_ad on a previously-unenriched listing: enriches + caches + returns
  • Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no

Success metrics

| Metric | Now | Target |
|---|---|---|
| Number of tools | 12 | 6 |
| content_hash populated rows | 0% | 100% |
| analysis_cache row count after search | 0 | matches analyzed_listings |
| eiendom_unit_code populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| listing_description available to AI | No | Yes (in finn_analyze_ad) |
| Feedback actually persisted | No (stub) | Yes |
| finn_analyze_search payload (30 ads) | ~215KB | < 50KB |
| finn_analyze_ad payload per ad | ~40KB | < 12KB |
| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
| Listings unscored due to detail_limit | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |