ole 55d93894ac feat(refactor): Document refactoring progress and phases in markdown

feat(scripts): Add backfill script for content_hash in cache tables

feat(scripts): Create recompute script for analysis_cache population

test(tests): Implement comprehensive tests for analysis module functions

fix(tests): Update CLI tests to assert errors on stderr instead of stdout

fix(tests): Adjust MCP integration tests to pass context parameter correctly

fix(tests): Modify service tests to return hash on save functions for consistency

2026-05-29 15:16:57 +00:00

PRD: finn-mcp v2

Current State (from codebase + DB inspection)

What already works

SQLite database (data/finn.sqlite) with row counts: 222 finn_ads, 149 eiendom_units, 56 similar_units
Hash-aware caching architecture is designed (see cache.py docstring)
Transport scoring is implemented (score_transport uses lat/lng from Eiendom.no)
listing_description is stored in the FinnAd model
finn_analyze_unit_images downloads, resizes to 1024px, returns as ImageContent — Claude sees images directly

Critical bugs discovered

Analysis cache is dead. analysis_cache table has 0 rows. Every search recomputes scoring from scratch.
content_hash is NULL on every row in finn_ads, eiendom_units, similar_units — 100% NULL across 427 rows. The _compute_deps_hash function therefore returns a deterministic hash of empty strings on every call.
Schema dump shows , content_hash TEXT) appended — column was added via ALTER TABLE after data already existed. Either the running deployment doesn't populate it on writes, or no backfill migration was run.
Only 36 of 222 ads have eiendom_unit_code populated in the stored payload. Enrichment is failing or the resolved unit code isn't being persisted back to the ad row.
Search page cache (cache_meta) all rows expired May 16 — 60-min TTL is far too short.
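
To illustrate why NULL content hashes disable invalidation, here is a minimal sketch. The real `_compute_deps_hash` is not shown in this document, so `compute_deps_hash`, the `|` separator, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib


def compute_deps_hash(*content_hashes):
    """Hypothetical stand-in for _compute_deps_hash: digest of dependency hashes."""
    # NULL (None) hashes are treated as empty strings, as described above.
    joined = "|".join(h or "" for h in content_hashes)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two unrelated listings whose ad, unit, and comps hashes are all NULL.
listing_a = compute_deps_hash(None, None, None)
listing_b = compute_deps_hash(None, None, None)

# The same listing once hashes are populated: a changed ad payload changes the key.
listing_a_backfilled = compute_deps_hash("ad-v1", "unit-v1", "comps-v1")
listing_a_after_change = compute_deps_hash("ad-v2", "unit-v1", "comps-v1")
```

With NULL inputs every listing maps to the same key, so the key can never signal that a dependency changed. Once the hashes are populated, any change in a dependency yields a different key.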

Known design problems

feedback.py is a stub — all three functions are # TODO, nothing is persisted. No user_feedback table.
No price_history table.
No search_runs table with finnkodes per search.
listing_description is actively stripped in _slim_listing() in mcp_server.py.
detail_limit means only N listings get full Eiendom.no analysis — the rest are unscored.
No batch analysis — analyzing 46 listings requires 46 sequential MCP calls.
12 tools, 7 of which are internal plumbing.
Cache TTLs are far too short — 24h on listing data forces full re-fetch on day-2 repeat searches.

Goals

Fix the broken cache first: analysis_cache is empty and every content_hash is NULL, so nothing is reused between runs today
Long-lived caching with smart freshness checks — listing structural data doesn't change, treat it accordingly
6 tools — one per user intent
Batch analysis — analyze many listings in one call
Persistent enrichment — missing tables, feedback implementation
Output matches intent — each tool returns only what is relevant
listing_description available for AI interpretation in finn_analyze_ad

Architecture

Caching strategy (revised)

Listings don't fundamentally change on FINN once posted. Address, area, year, property type, description, and the eiendom_unit_code mapping are all stable. What changes is price, sale status, and days on market (DOM). Treat structural data as effectively immutable; check price/status separately and cheaply.

Two-tier model:

┌────────────────────────────────────────────────────────────────┐
│  STRUCTURAL DATA (long TTL, full refetch only when invalidated)│
│  - finn_ads.payload (description, area, year, etc.)            │
│  - eiendom_units.payload (lat, lng, property_type, etc.)       │
│  - similar_units.payload (completed sales — immutable)         │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│  VOLATILE DATA (short TTL, cheap refresh)                       │
│  - price, status, days_on_market                                │
│  - eiendom_units.estimated_selling_price                        │
└────────────────────────────────────────────────────────────────┘

Cache TTLs (revised)

| Data | TTL | Refresh strategy |
|---|---|---|
| FINN ad structural | 30 days | Full refetch only |
| FINN ad price/status | 6 hours | Lightweight check, falls back to full refetch if status changed |
| Eiendom.no unit structural | 30 days | Full refetch only |
| Eiendom.no estimate | 7 days | Refresh on access |
| Similar units (sold comps) | 60 days | Immutable rows; new rows appear over time |
| Search pages | 6 hours | Content-hash check, only re-scrape if list actually changed |
| Analysis result | Never expires | Invalidated by `deps_hash` change |
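
The TTL values above could be expressed in configuration roughly as follows. This is a sketch only: the dictionary keys and the `is_fresh` helper are illustrative names, not the contents of the project's config.py or its `_is_fresh` function.

```python
from datetime import datetime, timedelta, timezone

# TTL values copied from the table above; the keys are illustrative names.
CACHE_TTLS = {
    "finn_ad_structural": timedelta(days=30),
    "finn_ad_price_status": timedelta(hours=6),
    "eiendom_unit_structural": timedelta(days=30),
    "eiendom_estimate": timedelta(days=7),
    "similar_units": timedelta(days=60),
    "search_pages": timedelta(hours=6),
    "analysis_result": None,  # never expires; invalidated by a deps_hash change
}


def is_fresh(kind, timestamp_iso, now=None):
    """Return True if a row of the given kind is still within its TTL."""
    ttl = CACHE_TTLS[kind]
    if ttl is None:
        return True  # no time-based expiry
    if timestamp_iso is None:
        return False  # never fetched or never verified
    now = now or datetime.now(timezone.utc)
    return now - datetime.fromisoformat(timestamp_iso) <= ttl
```

Timestamps are assumed to be timezone-aware ISO 8601 strings; mixing naive and aware values would raise a `TypeError` on subtraction.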

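Lightweight price/status check: A FINN ad page has a stable URL. Scrape only the small price_widget block, or send a HEAD request if the server returns a validator such as ETag or Last-Modified (a HEAD response has no body, so it cannot reveal the price itself). Either is much cheaper than processing the full ad page. If price and status are unchanged, bump last_verified_at; if either changed, do a full refetch.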
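
The decision step of the lightweight check can be sketched as a pure function, separate from however the price is actually obtained. `CachedAd` and `apply_price_check` are hypothetical names; the fetching and parsing of the price_widget block is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedAd:
    """Hypothetical view of the volatile fields stored for one ad."""
    finnkode: str
    total_price: Optional[int]
    sale_status: Optional[str]
    last_verified_at: Optional[str]


def apply_price_check(cached, observed_price, observed_status, now_iso):
    """Return 'verified' if nothing changed, otherwise 'refetch'."""
    unchanged = (
        cached.total_price == observed_price
        and cached.sale_status == observed_status
    )
    if unchanged:
        # Cheap path: only the verification timestamp moves forward.
        cached.last_verified_at = now_iso
        return "verified"
    # Expensive path: the caller performs a full refetch and invalidates analysis.
    return "refetch"


ad = CachedAd("1001", 5_000_000, "active", None)
first = apply_price_check(ad, 5_000_000, "active", "2026-05-29T12:00:00+00:00")
second = apply_price_check(ad, 4_800_000, "active", "2026-05-29T18:00:00+00:00")
```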

Database schema changes

-- Add to finn_ads
ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT;
-- Tracks when we last confirmed price/status, separate from fetched_at
-- which tracks when we last did a full refetch.

-- New: user feedback (replaces feedback.py stubs)
CREATE TABLE user_feedback (
    finnkode    TEXT PRIMARY KEY,
    verdict     TEXT NOT NULL,  -- 'liked' | 'disliked' | 'maybe' | 'visited'
    notes       TEXT,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);

-- New: price history (append-only)
CREATE TABLE price_history (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    finnkode     TEXT NOT NULL,
    total_price  INTEGER,
    asking_price INTEGER,
    sale_status  TEXT,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX idx_price_history_finnkode_recorded ON price_history(finnkode, recorded_at);

-- New: search runs (for finn_get_new_ads_since_last_run)
CREATE TABLE search_runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    search_url  TEXT NOT NULL,
    finnkodes   TEXT NOT NULL,  -- JSON array
    created_at  TEXT NOT NULL
);
CREATE INDEX idx_search_runs_url_created ON search_runs(search_url, created_at);

-- Indexes for stale-detection scans
CREATE INDEX idx_finn_ads_verified ON finn_ads(last_verified_at);
CREATE INDEX idx_eiendom_units_fetched ON eiendom_units(fetched_at);
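
SQLite's ALTER TABLE ... ADD COLUMN fails if the column already exists, so a migration that may run more than once needs a guard. The following is a sketch of one way to apply a subset of the statements above idempotently; `column_exists` and `migrate` are illustrative helpers, and the stand-in `finn_ads` table has only the columns needed for the demo.

```python
import sqlite3


def column_exists(conn, table, column):
    """Check PRAGMA table_info, whose second field is the column name."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def migrate(conn):
    """Apply part of the schema change; safe to call repeatedly."""
    if not column_exists(conn, "finn_ads", "last_verified_at"):
        conn.execute("ALTER TABLE finn_ads ADD COLUMN last_verified_at TEXT")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_feedback (
               finnkode    TEXT PRIMARY KEY,
               verdict     TEXT NOT NULL,
               notes       TEXT,
               created_at  TEXT NOT NULL,
               updated_at  TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_finn_ads_verified "
        "ON finn_ads(last_verified_at)"
    )
    conn.commit()


# Demo against an in-memory database with a stand-in finn_ads table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finn_ads (finnkode TEXT PRIMARY KEY, payload TEXT)")
migrate(conn)
migrate(conn)  # second run is a no-op rather than an error
```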

Tools (v2) — 6 total

1. `finn_analyze_search`

Intent: Ranked list of all listings in this search.

Input:
  search_url: string
  refresh?: boolean      // force re-fetch even if cache is valid
  max_pages?: number     // default 5

Output:
  total: number
  cache_status: {
    listings_from_cache: number
    listings_refreshed: number
    listings_freshly_scraped: number
  }
  listings: Array<{
    finnkode, rank, score, url, address, district,
    area_m2, bedrooms, floor, construction_year,
    total_price, common_costs, shared_debt, sqm_price,
    price_vs_estimate,  // negative = below estimate
    market_placement, dom, categories, risks
  }>

Behaviour: Returns ALL scraped listings, not limited by detail_limit. Listings without enrichment get score: null. Lazy enrichment is triggered by finn_analyze_ad.

2. `finn_analyze_ad`

Intent: Deep-dive into one or more specific listings.

Input:
  finnkode: string | string[]   // single or batch
  refresh?: boolean             // bypass cache

Output:
  // Single string input → single object
  // Array input → array of objects in same order
  finnkode: string
  url: string
  address: string
  listing_description: string   // ← INCLUDED for AI interpretation
  score: {
    total: number
    breakdown: Record<string, number>
    nearby_transit: { tbane: [...], trikk: [...] }
  }
  price: {
    total, asking, shared_debt, common_costs, sqm_price,
    estimate, estimate_lower, estimate_upper,
    vs_estimate, market_placement
  }
  property: {
    type, ownership, area_m2, bedrooms, floor,
    construction_year, has_balcony, has_elevator, has_garage
  }
  market: {
    dom, sale_status, avg_comp_sqm_price, comp_count,
    comps: Array<{address, usable_area, floor, construction_year,
                  selling_price, sqm_price, days_on_market, finalized_at}>  // top 15
  }
  price_history: Array<{ total_price, asking_price, recorded_at }>
  categories: string[]
  risks: string[]
  cache_age: {
    structural_days: number    // age of last full refetch
    price_hours: number        // age of last price verification
  }

Batch behaviour: Up to 50 finnkodes per call. Internal parallelism, single MCP round-trip. Returns array in input order; failed lookups have {finnkode, error: "..."} shape.
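
The batch contract described above (input order preserved, per-item error objects, bounded parallelism) can be sketched with asyncio. `analyze_one` is a hypothetical placeholder for the real single-ad analysis, and the concurrency limit of 8 is an assumed value.

```python
import asyncio

MAX_BATCH = 50        # limit stated above
MAX_CONCURRENCY = 8   # assumed value, not taken from the codebase


async def analyze_one(finnkode):
    """Hypothetical placeholder for the real single-ad analysis."""
    await asyncio.sleep(0)
    if not finnkode.isdigit():
        raise ValueError(f"invalid finnkode: {finnkode}")
    return {"finnkode": finnkode, "score": {"total": 0.0}}


async def analyze_batch(finnkodes):
    """Analyze many ads concurrently; results come back in input order."""
    if len(finnkodes) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} finnkodes per call")
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def guarded(finnkode):
        async with semaphore:
            try:
                return await analyze_one(finnkode)
            except Exception as exc:
                # A failed lookup becomes an error object instead of failing the batch.
                return {"finnkode": finnkode, "error": str(exc)}

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(code) for code in finnkodes))


results = asyncio.run(analyze_batch(["123", "abc", "456"]))
```

Catching exceptions inside `guarded` keeps one bad finnkode from cancelling the rest of the batch.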

3. `finn_analyze_unit_images`

Intent: Visual assessment — condition, views, room feel.

Unchanged from current implementation. Returns ImageContent blocks, not URLs.

Input:
  unit_code: string
  max_images?: number   // default 8

4. `finn_get_new_ads_since_last_run`

Intent: What has changed since I last checked this search?

Input:
  search_url: string

Output:
  new_ads: Array<{finnkode, address, score, total_price, categories, url}>
  removed_ads: Array<{finnkode, address}>
  changed_ads: Array<{
    finnkode, address,
    changes: Array<{field, from, to}>   // typically price/status
  }>
  since: string  // ISO timestamp of previous run
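
The diff itself is a set comparison between two runs. The sketch below assumes each run is available as a mapping from finnkode to a few stored fields; since search_runs stores only the finnkode list, the price and status values would have to come from finn_ads or price_history. `diff_runs` and the sample data are illustrative.

```python
def diff_runs(previous, current):
    """Compare two runs given as {finnkode: {'address', 'total_price', 'sale_status'}}."""
    new_ads = [
        {"finnkode": code, **current[code]}
        for code in current if code not in previous
    ]
    removed_ads = [
        {"finnkode": code, "address": previous[code]["address"]}
        for code in previous if code not in current
    ]
    changed_ads = []
    for code in current:
        if code not in previous:
            continue
        changes = [
            {"field": field, "from": previous[code].get(field), "to": current[code].get(field)}
            for field in ("total_price", "sale_status")
            if previous[code].get(field) != current[code].get(field)
        ]
        if changes:
            changed_ads.append(
                {"finnkode": code, "address": current[code]["address"], "changes": changes}
            )
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}


previous_run = {
    "1001": {"address": "Example St 1", "total_price": 5_000_000, "sale_status": "active"},
    "1002": {"address": "Example St 2", "total_price": 4_200_000, "sale_status": "active"},
}
current_run = {
    "1002": {"address": "Example St 2", "total_price": 4_100_000, "sale_status": "active"},
    "1003": {"address": "Example St 3", "total_price": 6_000_000, "sale_status": "active"},
}
result = diff_runs(previous_run, current_run)
```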

5. `finn_save_feedback`

Intent: Save my verdict on a listing.

Input:
  finnkode: string
  verdict: 'liked' | 'disliked' | 'maybe' | 'visited'
  notes?: string

Output:
  ok: boolean
  finnkode: string
  verdict: string
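
A possible shape for the write behind this tool, assuming the user_feedback table defined earlier. Because finnkode is the primary key, a second verdict for the same listing updates the row rather than adding one. `save_feedback` here is a sketch, not the existing feedback.py code, and the ON CONFLICT upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone

VALID_VERDICTS = {"liked", "disliked", "maybe", "visited"}


def save_feedback(conn, finnkode, verdict, notes=None):
    """Insert or update the single feedback row for a listing."""
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """INSERT INTO user_feedback (finnkode, verdict, notes, created_at, updated_at)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT(finnkode) DO UPDATE SET
               verdict = excluded.verdict,
               notes = excluded.notes,
               updated_at = excluded.updated_at""",
        (finnkode, verdict, notes, now, now),
    )
    conn.commit()
    return {"ok": True, "finnkode": finnkode, "verdict": verdict}


# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_feedback (
           finnkode TEXT PRIMARY KEY, verdict TEXT NOT NULL, notes TEXT,
           created_at TEXT NOT NULL, updated_at TEXT NOT NULL)"""
)
save_feedback(conn, "1001", "maybe")
response = save_feedback(conn, "1001", "visited", notes="bright, noisy street")
stored = conn.execute("SELECT verdict, notes FROM user_feedback").fetchall()
```

The upsert leaves created_at untouched on update, which INSERT OR REPLACE would not.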

6. `finn_get_shortlist`

Intent: Show me reviewed listings, or find similar to one I liked.

Input:
  verdict?: 'liked' | 'disliked' | 'maybe' | 'visited'
  find_similar_to?: string  // finnkode — return listings similar to this
  min_score?: number
  limit?: number            // default 10

Output:
  listings: Array<{
    finnkode, address, score, total_price,
    verdict?, notes?, categories, url
  }>

Tools removed

| Tool | Reason |
|---|---|
| `finn_build_unit_vector` | Internal impl detail |
| `finn_decode_unit_vector` | Debug utility, no user value |
| `finn_resolve_eiendom_unit` | Internal mapping, runs automatically in `analyze_ad` |
| `finn_get_ad` | Raw fetch without scoring; `analyze_ad` covers it |
| `finn_get_eiendom_unit` | Raw Eiendom.no fetch, internal |
| `finn_get_similar_units` | Takes unit_vector directly, internal |
| `finn_analyze_ad_against_comps` | Absorbed into `analyze_ad` (comps always included) |
| `finn_compare_ads` | Absorbed into `analyze_ad(finnkode: string[])` |
| `finn_find_similar_to_liked_ad` | Absorbed into `get_shortlist(find_similar_to=finnkode)` |

12 → 6 tools. No user intent is lost. Batch use case now native via analyze_ad.

Workflows & optimizations

Lazy enrichment on demand

analyze_search returns all scraped listings immediately with whatever data is cached. Listings without Eiendom.no enrichment have score: null. First analyze_ad(finnkode) call enriches and caches. Next analyze_search shows the now-cached score. Eliminates detail_limit as a user-facing parameter.

Background freshness check

On analyze_search cache hit, kick off async refresh of any items older than the volatile-data TTL (6h price check). User gets immediate response from cache; next call benefits from refreshed data.

Re-score without refetch

Scoring weights are configurable. If the user changes weights, re-score from cached finn_ads + eiendom_units + similar_units without any network calls. Invalidates analysis_cache only, not raw data.
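
As a rough illustration, re-scoring can be a pure function over the cached per-category breakdown. The linear weighted sum, the category names, and the `rescore` helper are all assumptions; the actual scoring formula is not described in this document.

```python
def rescore(breakdown, weights):
    """Weighted sum over cached sub-scores (linear weighting is an assumption)."""
    return sum(weights.get(name, 0.0) * value for name, value in breakdown.items())


# Sub-scores as they might sit in the cached raw data; values are made up.
cached_breakdown = {"transport": 0.8, "price": 0.5, "condition": 0.6}

default_total = rescore(
    cached_breakdown, {"transport": 1.0, "price": 1.0, "condition": 1.0}
)
transit_heavy_total = rescore(
    cached_breakdown, {"transport": 3.0, "price": 1.0, "condition": 1.0}
)
```

No network call is involved: only the weights change, so only the derived analysis needs to be recomputed.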

Price drop detection

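A price_history table would enable a price-drop filter on finn_get_shortlist, for example price_dropped_since: timestamp, to surface listings whose price dropped recently. This parameter is not part of the finn_get_shortlist input listed above and would need to be added. It builds on the append-only price_history writes introduced in Phase 2.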
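
A price-drop lookup over price_history might look like the sketch below. It defines a drop as the latest recorded price being lower than the one recorded just before it; that definition, the `price_drops_since` helper, and the sample rows are assumptions. Timestamps are compared as strings, which only works if they all share one ISO 8601 format and timezone.

```python
import sqlite3
from itertools import groupby


def price_drops_since(conn, since_iso):
    """Return listings whose most recent price record is a drop at or after since_iso."""
    rows = conn.execute(
        "SELECT finnkode, total_price, recorded_at FROM price_history "
        "WHERE total_price IS NOT NULL ORDER BY finnkode, recorded_at"
    ).fetchall()
    drops = []
    for finnkode, group in groupby(rows, key=lambda row: row[0]):
        history = list(group)
        if len(history) < 2:
            continue  # a single record cannot show a change
        (_, prev_price, _), (_, last_price, last_at) = history[-2], history[-1]
        if last_at >= since_iso and last_price < prev_price:
            drops.append({"finnkode": finnkode, "from": prev_price, "to": last_price})
    return drops


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE price_history (
           id INTEGER PRIMARY KEY AUTOINCREMENT, finnkode TEXT NOT NULL,
           total_price INTEGER, asking_price INTEGER, sale_status TEXT,
           recorded_at TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO price_history (finnkode, total_price, recorded_at) VALUES (?, ?, ?)",
    [
        ("1001", 5_000_000, "2026-05-01T00:00:00+00:00"),
        ("1001", 4_800_000, "2026-05-20T00:00:00+00:00"),
        ("1002", 4_000_000, "2026-05-01T00:00:00+00:00"),
        ("1002", 4_000_000, "2026-05-21T00:00:00+00:00"),
    ],
)
drops = price_drops_since(conn, "2026-05-15T00:00:00+00:00")
```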

Cache warming on save_feedback

When verdict='liked', pre-fetch similar units in background. Next find_similar_to=finnkode call is instant.

Batch enrichment via parallel Eiendom.no

Enrichment through analyze_ad is currently sequential, one ad per call. analyze_search already parallelises up to N requests at a time via asyncio.gather; use the same pattern in analyze_ad(finnkode: string[]).

Cache inspection

Internal-only — useful for debugging. Add a --cache-status CLI command (not an MCP tool) that reports row counts, oldest/newest fetched_at, NULL-hash rows, missing eiendom_unit_codes.
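
The report could be assembled from a few aggregate queries. In this sketch, `cache_status` and `ads_missing_unit_code` are hypothetical helpers, and the column names (payload, content_hash, fetched_at) are assumed from how the tables are described in this document.

```python
import json
import sqlite3


def cache_status(conn, table):
    """Row count, NULL-hash count, and fetched_at range for one cache table."""
    total, null_hashes, oldest, newest = conn.execute(
        f"SELECT COUNT(*), SUM(content_hash IS NULL), "
        f"MIN(fetched_at), MAX(fetched_at) FROM {table}"
    ).fetchone()
    return {
        "table": table,
        "rows": total,
        "null_hash_rows": null_hashes or 0,  # SUM is NULL on an empty table
        "oldest": oldest,
        "newest": newest,
    }


def ads_missing_unit_code(conn):
    """Count ads whose stored JSON payload lacks eiendom_unit_code."""
    rows = conn.execute("SELECT payload FROM finn_ads").fetchall()
    return sum(1 for (payload,) in rows if not json.loads(payload).get("eiendom_unit_code"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE finn_ads (finnkode TEXT PRIMARY KEY, payload TEXT, "
    "content_hash TEXT, fetched_at TEXT)"
)
conn.executemany(
    "INSERT INTO finn_ads VALUES (?, ?, ?, ?)",
    [
        ("1001", json.dumps({"eiendom_unit_code": "U-1"}), None, "2026-05-10T00:00:00+00:00"),
        ("1002", json.dumps({"area_m2": 60}), None, "2026-05-16T00:00:00+00:00"),
    ],
)
report = cache_status(conn, "finn_ads")
missing = ads_missing_unit_code(conn)
```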

Output principles

Never in any tool response:

unit_vector / raw Eiendom.no vector
unit_images URL lists (use finn_analyze_unit_images)
Internal timestamps (fetched_at, detail_fetched_at, computed_at)
lat / lng coordinates

listing_description:

Not in finn_analyze_search — too long, 77 × 500 words = noise
Yes in finn_analyze_ad — AI needs it to interpret risk flags, clauses, edge cases

Migration plan

Phase 0 — Fix the broken cache (BLOCKER)

Nothing else delivers value until this is fixed. The current cache stores nothing reusable across sessions.

Audit the running deployment. Compare the deployed cache.py to the source we have. Hashes are NULL in DB despite source code populating them — find the divergence.
Backfill content_hash for existing rows. Compute from stored payloads.
Fix ensure_eiendom_unit_code persistence. Only 36/222 ads have eiendom_unit_code in their payload — verify the mutation reaches save_finn_ad before serialisation.
Verify save_analysis actually fires. Add unit test confirming analysis_cache row count increases after analyze_ad call. Currently 0 rows after 222 ad fetches.
Add CLI cache-status command for ongoing visibility.
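
The backfill step could be sketched as below, assuming each table stores its payload as JSON text in a payload column. `content_hash` and `backfill` are illustrative helpers; in practice the hash must be computed exactly the way cache.py computes it on writes, otherwise backfilled rows would look changed on their next save.

```python
import hashlib
import json
import sqlite3


def content_hash(payload_text):
    """Hash canonical JSON so key order and whitespace do not affect the digest."""
    canonical = json.dumps(json.loads(payload_text), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backfill(conn, table):
    """Fill content_hash for rows where it is NULL; returns the number of rows updated."""
    rows = conn.execute(
        f"SELECT rowid, payload FROM {table} WHERE content_hash IS NULL"
    ).fetchall()
    for rowid, payload in rows:
        conn.execute(
            f"UPDATE {table} SET content_hash = ? WHERE rowid = ?",
            (content_hash(payload), rowid),
        )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finn_ads (finnkode TEXT PRIMARY KEY, payload TEXT, content_hash TEXT)")
conn.executemany(
    "INSERT INTO finn_ads (finnkode, payload) VALUES (?, ?)",
    [
        ("1001", '{"area_m2": 60, "year": 1998}'),
        ("1002", '{"year": 1998, "area_m2": 60}'),  # same data, different key order
    ],
)
updated = backfill(conn, "finn_ads")
remaining = conn.execute("SELECT COUNT(*) FROM finn_ads WHERE content_hash IS NULL").fetchone()[0]
hashes = [h for (h,) in conn.execute("SELECT content_hash FROM finn_ads ORDER BY finnkode")]
```

Running `backfill` a second time updates nothing, since it only selects rows that are still NULL.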

Success criteria:

analysis_cache populated after any analyze_search run
Repeat analyze_search within TTL window: zero network calls, sub-second response
All content_hash columns populated across finn_ads, eiendom_units, similar_units

Phase 1 — Longer cache TTLs + freshness model

Update config.py TTLs (see table above)
Add last_verified_at column to finn_ads
Implement lightweight price/status check (HEAD or price_widget scrape)
On cache hit, kick off async refresh if last_verified_at is stale
Update _is_fresh so the 6-hour volatile TTL is evaluated against last_verified_at rather than fetched_at; fetched_at governs only the 30-day structural TTL

Success criteria:

Listing fetched 28 days ago, never re-verified: returns from cache, triggers async verify
Same listing fetched today: returns from cache, no network call
Price changed since last fetch: detected by lightweight check, triggers full refetch + invalidates analysis

Phase 2 — Missing tables and stub implementations

Create user_feedback, price_history, search_runs tables
Implement feedback.py — replace all TODO stubs with DB writes
Populate price_history on every save_finn_ad call (append-only)
Populate search_runs on every analyze_search call

Success criteria:

finn_save_feedback writes to DB; finn_get_shortlist(verdict=...) returns it
finn_get_new_ads_since_last_run returns real diff from last run
price_history populated when a re-fetched ad has changed price

Phase 3 — Output payload cleanup (no breaking tool changes)

Stop stripping listing_description in _slim_listing() for analyze_ad
Remove unit_images, unit_vector, internal timestamps from analyze_ad response
Add price_history and cache_age to analyze_ad response
Add price_vs_estimate and cache_status to analyze_search response

Success criteria:

finn_analyze_search on 30 listings: < 50KB
finn_analyze_ad per listing: < 8KB excluding description, < 12KB including

Phase 4 — Consolidate to 6 tools + batch (breaking change)

Remove the 9 redundant tools from mcp_server.py
Update finn_analyze_ad to accept string | string[] — single or batch
Add find_similar_to parameter to finn_get_shortlist
Always include comps in analyze_ad — drop include_eiendom_no / include_similar_units flags
Migrate all test_mcp_integration.py tests to new tool surface

Success criteria:

finn_analyze_ad(["a", "b", "c"]): one round trip, parallel internal fetch
All existing use cases covered by 6 tools

Phase 5 — Lazy enrichment + workflow additions

analyze_search returns all scraped listings, not just detail_limit count
Listings without enrichment get score: null, enriched on first analyze_ad call
Background warm-up on save_feedback(liked) → pre-fetch similar units
Re-score endpoint (or flag) that rebuilds scores from cached raw data

Success criteria:

analyze_search on 77-result search: all 77 returned, no detail_limit truncation
Subsequent analyze_ad on a previously-unenriched listing: enriches + caches + returns
Scoring weight change re-runs analysis without re-fetching FINN or Eiendom.no

Success metrics

| Metric | Now | Target |
|---|---|---|
| Number of tools | 12 | 6 |
| `content_hash` populated rows | 0% | 100% |
| `analysis_cache` row count after search | 0 | matches analyzed_listings |
| `eiendom_unit_code` populated in stored ads | 36/222 (16%) | ~95% (resale only) |
| `listing_description` available to AI | No | Yes (in `finn_analyze_ad`) |
| Feedback actually persisted | No (stub) | Yes |
| `finn_analyze_search` payload (30 ads) | ~215KB | < 50KB |
| `finn_analyze_ad` payload per ad | ~40KB | < 12KB |
| Repeat search within 1 week | Full recompute | 0 network calls, < 1s |
| Listings unscored due to `detail_limit` | 47 of 77 | 0 (lazy enrichment) |
| Batch analyze 10 ads | 10 round-trips | 1 round-trip |
| FINN ad structural TTL | 24h | 30 days |
