# PRD: finn-eiendom-mcp — Personal Real Estate Scout
> Private, self-hosted property analysis platform built around a FINN scraper, an Eiendom.no enrichment layer, a scoring engine, and a SQLite cache. Exposed through three coordinated entry points: a **Python library** (`finn_eiendom`), an **MCP server** (FastMCP, stdio + optional HTTP), and a **CLI** (`finn-eiendom`). The Python library is the source of truth — MCP and CLI are thin, parallel front ends over the same service layer.
---
## 1. Summary
`finn-eiendom-mcp` analyzes a FINN real-estate search URL and returns a ranked shortlist of properties enriched with Eiendom.no estimates, comparable recently-sold units, scoring, risk flags, and broker questions. The same domain code powers:
1. **MCP tools** for Claude Desktop / AI clients / n8n / agents.
2. **A CLI** for terminal-driven manual analysis and shell scripting.
3. **A Python library** that tests and notebooks can call directly.
```text
FINN search URL
→ listings (search cards)
→ FINN details
→ Eiendom.no enrichment (unit search + unit detail)
→ unit_vector (built locally)
→ similar-units / comps
→ scoring + categorization
→ shortlist + risks + next steps + broker questions
```
This is a **private, low-frequency decision-support tool**. Not a SaaS, not a crawler, not a bidding tool, not legal/technical/financial advice.

---
## 2. Why three entry points
| Layer | Audience | Transport | Purpose |
| ---------------- | ------------------------------------- | -------------------- | ----------------------------------------------------------------------------------- |
| Python library | tests, notebooks, custom scripts | in-process | Source of truth. Pure functions + async I/O. No global state beyond SQLite path. |
| MCP server       | Claude Desktop, n8n, AI agents        | stdio + Streamable HTTP | LLM-driven analysis, shortlisting, broker prep.                                   |
| CLI              | terminal, cron, ad-hoc debugging      | stdio                | Quick checks, smoke tests, scripted runs, demonstrations of new behavior.           |

The architectural rule: **all three layers call the same service functions**. MCP tools and CLI commands are thin wrappers around `service.py`, so new behavior is added once, in the service layer, and becomes available through every entry point.

---
## 3. User context & preferences
The user and their partner are searching for a home in the Oslo area, with a budget of roughly 9-12 MNOK depending on total monthly cost, rental/hybel potential, and property quality. Important preferences:
* Good location and quality of life.
* Enough space and strong floor plan.
* Minimum 2 bedrooms, preferably more.
* Balcony, terrace, views, sun, sea/nature proximity.
* Hybel/rental potential or flexible layout.
* Willing to renovate themselves if the price is right.
* Renovation need is **not** automatically negative.
* Strong interest in **bargain candidates** where competition may be lower due to older standard or poor presentation.
* Avoid uncontrolled technical/legal risk: moisture, rot, illegal hybel, unapproved changes, severe TG3, unclear housing-association finances.
---
## 4. Problem
FINN search results are not ranked by the user's actual decision criteria. Manually triaging dozens of listings is slow and inconsistent. The current process lacks:
* Automated extraction of FINN search and listing data.
* Linking FINN listings to structured Eiendom.no units.
* Price evaluation against Eiendom.no estimates and comparable sales.
* Similar-property discovery from listings the user already likes.
* Consistent scoring of price, location, layout, risk, renovation upside, hybel potential.
* Local history of seen listings, changes, scores, and feedback.
* Integration with AI clients and shell tooling.
---
## 5. Goals
The system shall:
1. Accept a FINN real estate search URL via library, MCP tool, or CLI command.
2. Parse FINN search pages and extract listing cards, URLs, and finnkoder.
3. Fetch FINN listing detail pages and parse into a structured `FinnAd`.
4. Normalize Norwegian numbers, areas, currencies, dates, URLs.
5. Resolve each FINN URL to an Eiendom.no `unitCode` and fetch the unit detail.
6. Build a base64url-encoded `unit_vector` from unit detail and fetch similar-units / comps.
7. Score each listing using FINN data, Eiendom.no estimates, comps, user preferences, and risk signals.
8. Return a ranked shortlist with reasons, risks, next steps, and broker questions.
9. Cache HTML, JSON, parsed ads, units, comps, scores, and feedback in SQLite.
10. Detect new/removed/changed listings between runs of the same search URL.
11. Store user feedback (`liked`, `rejected`, `interesting`, `risk`, `viewing_candidate`, etc.) and surface it in subsequent runs.
12. Expose all of the above through MCP tools, CLI commands, and Python functions with consistent semantics.
13. Run locally in a project-local virtualenv. Docker is supported but optional.
---
## 6. Non-goals
MVP shall not:
* Crawl all of FINN or Eiendom.no.
* Bypass rate limits, bot protection, authentication, or access controls.
* Bulk-harvest or redistribute data.
* Contact brokers automatically.
* Place bids automatically.
* Interpret full PDF condition reports.
* Provide official valuation, legal advice, technical inspection, or mortgage advice.
* Expose a public SaaS service.
* Build a web UI.
---
## 7. Primary use cases
| ID | Use case | Description |
| ---- | ----------------------------- | ------------------------------------------------------------------------------------ |
| UC1 | Analyze FINN search | Paste a FINN search URL → ranked shortlist with reasons/risks/next steps. |
| UC2 | Find bargain candidates | Surface listings with renovation need or weak presentation that may be underpriced. |
| UC3 | Separate renovation from risk | Treat cosmetic renovation as upside; flag technical/legal risk. |
| UC4 | Compare listings | Side-by-side comparison of multiple finnkoder. |
| UC5 | Save feedback | Mark listings as liked, rejected, interesting, risk, viewing candidate, etc. |
| UC6 | Find new listings since last run | Show new/removed/changed listings vs the prior run of the same search URL. |
| UC7 | Broker questions | Generate concrete questions based on risks, deviations, hybel status, comps. |
| UC8 | Eiendom.no enrichment | Add estimates, coordinates, area, rooms, floor, year, market data. |
| UC9 | Price fairness | Classify price as cheap / fair / expensive vs estimate and comps. |
| UC10 | Similar to liked | Find properties similar to listings the user has explicitly liked. |
| UC11 | Comparable sales | Fetch similar recently sold units to support valuation and bargain scoring. |
---
## 8. Inputs
Supported inputs across all three layers:
* FINN search URL.
* FINN listing URL.
* Finnkode (string of digits).
* List of finnkoder.
* Eiendom.no `unitCode`.
* Eiendom.no `unit_vector` (base64url string).
* User feedback verdict + notes.
* Optional scoring/preference overrides (JSON or env).

Example FINN search URL:
```text
https://www.finn.no/realestate/homes/search.html?bbox=...&area_from=60&min_bedrooms=2&price_collective_to=12000000&...
```
---
## 9. External endpoints
### 9.1 FINN HTML
Not JSON. Parse HTML, cache aggressively, run at low frequency.
| Method | URL pattern | Purpose |
| ------ | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| GET | `https://www.finn.no/realestate/homes/search.html?{query_params}` | Parse search result cards, listing URLs, finnkoder. |
| GET | `https://www.finn.no/realestate/homes/search.html?{query_params}&page={N}` | Pagination. |
| GET | `https://www.finn.no/realestate/homes/ad.html?finnkode={finnkode}` | Parse listing detail page. |
| GET | `{calendar_ics_url_from_listing_html}` | Optional: parse viewing times (prefer parsing from listing HTML first). |

Important search params: `bbox`, `location`, `area_from`, `area_to`, `price_collective_to`, `price_collective_from`, `min_bedrooms`, `facilities`, `floor_navigator`, `lifecycle`, `page`, `stored-id`.
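
As a minimal sketch of the pagination rule, the helper below rewrites only the `page` parameter and keeps every other query parameter of the user's URL, including repeated keys. `search_page_urls` is a hypothetical name, `max_pages` stands in for `FINN_MAX_SEARCH_PAGES`, and it assumes FINN numbers result pages from 1.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def search_page_urls(search_url: str, max_pages: int) -> list[str]:
    """Return the search URL for pages 1..max_pages, keeping all other params."""
    parts = urlsplit(search_url)
    # keep_blank_values preserves empty params exactly as the user supplied them;
    # any existing "page" is dropped so it can be replaced below.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode(params + [("page", str(page))])
        urls.append(urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment)))
    return urls
```

Because `urlencode` re-encodes values, characters such as the commas in `bbox` come back percent-encoded (`%2C`); that FINN accepts this form is an assumption worth checking once.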
### 9.2 Eiendom.no
Real JSON API. Used for enrichment, valuation, and similar-units.
#### 9.2.1 Resolve FINN listing → Eiendom.no unitCode
```
GET https://api.eiendom.no/api/v1/geodata/units/search/?search={url_encoded_finn_listing_url_or_address}
```
Returns:
```json
{
"units": [
{
"unitCode": "c-gxw-xmyum-s2a",
"address": "Gunnar Schjelderups v. 11D H0502, Oslo",
"geometry": { "type": "Point", "coordinates": [10.77, 59.95] }
}
],
"summary": { "totalUnitsFound": 1, "totalCitiesFound": 1 }
}
```
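
A sketch of the request URL and of picking the `unitCode` from the response shape above. The helper names are hypothetical, and treating anything other than exactly one hit as unresolved is a design assumption (such listings could fall through to `manual_review_required`).

```python
from typing import Optional
from urllib.parse import quote

UNIT_SEARCH_URL = "https://api.eiendom.no/api/v1/geodata/units/search/"


def unit_search_url(finn_url_or_address: str) -> str:
    # safe="" percent-encodes "/", ":", "?" and "=" inside the FINN URL as well.
    return f"{UNIT_SEARCH_URL}?search={quote(finn_url_or_address, safe='')}"


def pick_unit_code(response: dict) -> Optional[str]:
    """Return the unitCode only when the search is unambiguous (exactly one unit)."""
    units = response.get("units") or []
    if len(units) != 1:
        return None  # zero hits or several candidates: leave for manual review
    return units[0].get("unitCode")
```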
#### 9.2.2 Fetch unit detail
```
GET https://api.eiendom.no/api/v1/geodata/units/{unitCode}/
```
Important response fields: `unitCode`, `address`, `unitName`, `streetAddress`, `postalName`, `registrationCode`, `geometry.coordinates`, `specification.{propertyType, floor, rooms, constructionYear, usableArea}`, `valuation.{estimatedSellingPrice, estimatedSellingPriceLower, estimatedSellingPriceUpper}`, `latestMarketData.{listingPrice, monthlyCosts, squareMeterPrice, daysOnMarket, saleStatus, marketPlacementScore}`.
#### 9.2.3 Build `unit_vector` (local, not HTTP)
A local encoding step that runs before the similar-units request. The payload is generated from unit detail data:
```json
{
"lon": 10.7803,
"lat": 59.9287,
"ptype": "APARTMENT",
"floor": 8,
"rooms": 5,
"built": 2005,
"area": 80,
"price": 8491082
}
```
Encoding: `unit_vector = base64url_without_padding(msgpack(payload))`.
Library functions (in `eiendom_no.py` only):
* `build_unit_vector(unit) -> str`
* `decode_unit_vector(unit_vector) -> dict`
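
A minimal sketch of the two library functions as they might look in `eiendom_no.py`. It assumes the payload fields map from the unit detail response as noted in the comments (in particular, that `price` comes from `valuation.estimatedSellingPrice`, which is not confirmed) and that the third-party `msgpack` package is installed. `unit_vector_payload` and the `b64url_*` helpers are hypothetical names. The `=` padding must be restored before decoding because Python's base64 decoder rejects unpadded input.

```python
import base64
from typing import Any


def unit_vector_payload(unit: dict) -> dict[str, Any]:
    """Map an Eiendom.no unit detail response to the payload shape shown above."""
    lon, lat = unit["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
    spec = unit.get("specification") or {}
    return {
        "lon": lon,
        "lat": lat,
        "ptype": spec.get("propertyType"),
        "floor": spec.get("floor"),
        "rooms": spec.get("rooms"),
        "built": spec.get("constructionYear"),
        "area": spec.get("usableArea"),
        # ASSUMPTION: price is the Eiendom.no estimate, not the FINN asking price.
        "price": (unit.get("valuation") or {}).get("estimatedSellingPrice"),
    }


def b64url_encode_nopad(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(text: str) -> bytes:
    # urlsafe_b64decode rejects unpadded input, so restore the stripped "=" first.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def build_unit_vector(unit: dict) -> str:
    import msgpack  # third-party; imported lazily so the helpers above work without it

    return b64url_encode_nopad(msgpack.packb(unit_vector_payload(unit)))


def decode_unit_vector(unit_vector: str) -> dict:
    import msgpack

    return msgpack.unpackb(b64url_decode_nopad(unit_vector))
```

Python dicts keep insertion order and `msgpack.packb` writes map entries in iteration order, so the keys are packed in the order listed in the payload example.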
#### 9.2.4 Fetch similar-units
```
GET https://api.eiendom.no/api/v1/geodata/units/similar/?unit_vector={unit_vector}
```
Returns a list of comparable units with `unitCode`, `address`, `geometry`, `specification`, and `marketData.{listingPrice, jointDebt, monthlyCosts, sellingPrice, squareMeterPrice, daysOnMarket, saleStatus, finalizedAt}`.
`listing_status` (RECENTLY_SOLD / FOR_SALE / CURRENT) is implemented as a **local filter** over the returned `marketData.saleStatus` and `finalizedAt`. Only pass it to the API if later experimentation confirms server-side support.
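
A sketch of the local `listing_status` filter. The concrete values are assumptions that need checking against real responses: a completed sale is taken to be `saleStatus == "SOLD"`, `finalizedAt` is taken to be ISO 8601, and "recently" is an illustrative 180-day window. `CURRENT` is rejected until its semantics are confirmed; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# ASSUMPTIONS, not confirmed against the API.
SOLD_STATUS = "SOLD"
RECENT_WINDOW = timedelta(days=180)


def _finalized_at(unit: dict) -> Optional[datetime]:
    raw = (unit.get("marketData") or {}).get("finalizedAt")
    if not raw:
        return None
    dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def filter_by_listing_status(
    units: list[dict], listing_status: str, now: Optional[datetime] = None
) -> list[dict]:
    now = now or datetime.now(timezone.utc)
    kept = []
    for unit in units:
        status = (unit.get("marketData") or {}).get("saleStatus")
        finalized = _finalized_at(unit)
        if listing_status == "RECENTLY_SOLD":
            keep = status == SOLD_STATUS and finalized is not None and now - finalized <= RECENT_WINDOW
        elif listing_status == "FOR_SALE":
            keep = status != SOLD_STATUS and finalized is None
        else:  # "CURRENT" stays unsupported until its meaning is confirmed
            raise ValueError(f"unsupported listing_status: {listing_status}")
        if keep:
            kept.append(unit)
    return kept
```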
### 9.3 Optional Hjemla (disabled by default)
```
GET https://consumer-service-hjemla-prod.propcloud.no/public/market/address-list
```
Params: `marketType`, `period`, `marketStates`, `unittypes`, bbox (`swLat`, `neLat`, `swLng`, `neLng`), `limit`, `randomize`.
Useful for bbox-level market snapshots. Disabled in MVP via `HJEMLA_ENABLED=false`.
### 9.4 MCP server endpoint
stdio is the default. Optional Streamable HTTP on `POST http://{host}:8010/mcp`. Operational endpoints when running HTTP: `GET /health`, `GET /version`, `GET /debug/config`.

---
## 10. Functional requirements
### 10.1 FINN search extraction
Fetch and parse FINN search pages. Extract and deduplicate by finnkode. Support pagination via `page=N` and respect `FINN_MAX_SEARCH_PAGES`. Search-card fields when available: finnkode, URL, title, address/area, area, asking_price, total_price, common_costs, ownership_type, property_type, bedrooms, floor, viewing time, broker.
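
A sketch of extracting finnkoder from listing URLs and deduplicating cards collected across pages. The card shape (a dict with a `finnkode` key) and the function names are hypothetical; per section 17.2 the finnkode parsing itself would live in `parser.py`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

_FINNKODE = re.compile(r"[0-9]+")


def finnkode_from_url(url: str) -> Optional[str]:
    """'.../ad.html?finnkode=462400360' -> '462400360'; None if absent or not all digits."""
    values = parse_qs(urlsplit(url).query).get("finnkode", [])
    if values and _FINNKODE.fullmatch(values[0]):
        return values[0]
    return None


def dedupe_cards(cards: list[dict]) -> list[dict]:
    """Keep the first card per finnkode, preserving search order across pages."""
    seen: set[str] = set()
    unique: list[dict] = []
    for card in cards:
        code = card.get("finnkode")
        if code and code not in seen:  # cards without a finnkode are dropped
            seen.add(code)
            unique.append(card)
    return unique
```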
### 10.2 FINN listing detail extraction
Fetch and parse individual listing pages. Fields when available: finnkode, URL, title, address, postal_area, district, property_type, ownership_type, asking_price, total_price, shared_debt, common_costs, fees, municipal_fees, BRA/BRA-i/BRA-e/BRA-b, P-room, rooms, bedrooms, floor, construction_year, energy_rating, heating, balcony/terrace, elevator, parking/garage, viewings, listing_description, broker_name, broker_company, document_links.
### 10.3 Normalization
* Norwegian formatted numbers: `7 200 991 kr` → `7200991`.
* Areas: `77 m²` → `77`.
* Dates/viewings → ISO 8601.
* URLs → absolute.
* Missing values → `null`.
* Finnkode and Eiendom.no unitCode as strings.
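
A minimal sketch of the number and area rules, assuming FINN groups thousands with an ordinary space, a no-break space (U+00A0), or a narrow no-break space (U+202F), and uses a comma as the decimal separator. `parse_number` is a hypothetical name. The pattern uses `[0-9]` rather than `\d` because `\d` also matches non-ASCII decimal digits.

```python
import re
from typing import Optional, Union

# Thousands separators: space, no-break space, narrow no-break space.
_SEP_RE = re.compile("[ \u00a0\u202f]")
# A digit, then digits or separators, then an optional ",<decimals>" part.
_NUMBER = re.compile("[0-9][0-9 \u00a0\u202f]*(?:,[0-9]+)?")


def parse_number(text: Optional[str]) -> Optional[Union[int, float]]:
    """'7 200 991 kr' -> 7200991, '77 m²' -> 77, '77,5 m²' -> 77.5, no digits -> None."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    value = float(_SEP_RE.sub("", match.group()).replace(",", "."))
    return int(value) if value.is_integer() else value
```

The function reads the first number in the string, so it should be given one field at a time (for example the text of the price cell), not a whole card.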
### 10.4 Eiendom.no enrichment
Enabled by default. Flow: FINN listing URL → unit search → `unitCode` → unit detail → structured market data. Store: unit_code, address, coordinates, registration code, property_type, floor, rooms, construction_year, usable_area, estimated_selling_price + lower/upper, latest market data (listing_price, sqm_price, monthly_costs, days_on_market, sale_status), market_placement, raw JSON.
If enrichment fails, the analysis continues with FINN data only and marks enrichment as `unavailable`.
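
A sketch of the fallback, assuming enrichment is exposed as a single async callable; `enrich_or_unavailable`, the `enrich` parameter, and the result keys are hypothetical. Catching `Exception` keeps network, HTTP, and parsing failures from aborting the run, while task cancellation (a `BaseException` since Python 3.8) still propagates.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def enrich_or_unavailable(
    finn_url: str, enrich: Callable[[str], Awaitable[dict]]
) -> dict[str, Any]:
    """Run Eiendom.no enrichment; on any failure, report 'unavailable' instead of raising."""
    try:
        data = await enrich(finn_url)
    except Exception as exc:  # network, HTTP status, parsing, or "no unit found" errors
        return {"status": "unavailable", "reason": f"{type(exc).__name__}: {exc}", "data": None}
    return {"status": "ok", "reason": None, "data": data}


if __name__ == "__main__":

    async def _failing_enrich(url: str) -> dict:
        raise TimeoutError("eiendom.no timed out")

    # {'status': 'unavailable', 'reason': 'TimeoutError: eiendom.no timed out', 'data': None}
    print(asyncio.run(enrich_or_unavailable("https://www.finn.no/...", _failing_enrich)))
```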
### 10.5 Similar-units / `unit_vector`
Required functions: `build_unit_vector(unit)`, `decode_unit_vector(unit_vector)`, `get_similar_units(unit_vector, listing_status)`. Supported listing statuses: `RECENTLY_SOLD` (default for comps), `FOR_SALE` (active recommendations), `CURRENT` (if confirmed). Similar-unit fields when available: unit_code, address, coordinates, property_type, floor, rooms, construction_year, area, listing_price, selling_price, shared_debt, common_costs, sqm_price, days_on_market, sale_status, finalized_at, raw JSON.
### 10.6 Cache and history
SQLite. Default TTLs:
| Data | Default TTL |
| -------------------- | ----------------------: |
| Search results       |           30-60 minutes |
| FINN listing details |              6-24 hours |
| Eiendom.no unit data | 24 hours |
| Similar-units | 24 hours |
| Feedback/history | Permanent until deleted |
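
A sketch of the TTL check, assuming `fetched_at` is stored as a timezone-aware ISO 8601 string (for example from `datetime.now(timezone.utc).isoformat()`). Using the upper end of each range is an illustrative choice and the key names are hypothetical; in the project these values would come from `config.py`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Upper end of each default range above; None means "never expires".
DEFAULT_TTLS: dict[str, Optional[timedelta]] = {
    "search_results": timedelta(minutes=60),
    "finn_ad": timedelta(hours=24),
    "eiendom_unit": timedelta(hours=24),
    "similar_units": timedelta(hours=24),
    "feedback": None,
}


def is_fresh(kind: str, fetched_at: str, now: Optional[datetime] = None) -> bool:
    """True while a row fetched at `fetched_at` (aware ISO 8601) is within its TTL."""
    ttl = DEFAULT_TTLS[kind]
    if ttl is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - datetime.fromisoformat(fetched_at) < ttl
```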
### 10.7 Feedback
Verdict vocabulary: `liked`, `rejected`, `interesting`, `bargain_candidate`, `risk_object`, `viewing_candidate`, `viewed`, `too_expensive`, `too_small`, `too_far_out`, `too_high_risk`, `likes_location`, `likes_layout`, `dislikes_area`. Stored permanently. `liked` listings are used as seeds for similar-to-liked recommendations. Feedback can be used as a soft scoring signal.
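
A sketch of validating verdicts against the vocabulary above, so that a typo in a verdict could be rejected before anything is stored. The `Verdict` enum and `parse_verdict` are hypothetical names, and normalizing case and surrounding whitespace is an assumption.

```python
from enum import Enum


class Verdict(str, Enum):
    LIKED = "liked"
    REJECTED = "rejected"
    INTERESTING = "interesting"
    BARGAIN_CANDIDATE = "bargain_candidate"
    RISK_OBJECT = "risk_object"
    VIEWING_CANDIDATE = "viewing_candidate"
    VIEWED = "viewed"
    TOO_EXPENSIVE = "too_expensive"
    TOO_SMALL = "too_small"
    TOO_FAR_OUT = "too_far_out"
    TOO_HIGH_RISK = "too_high_risk"
    LIKES_LOCATION = "likes_location"
    LIKES_LAYOUT = "likes_layout"
    DISLIKES_AREA = "dislikes_area"


def parse_verdict(raw: str) -> Verdict:
    """Normalize user input (' Liked ' -> Verdict.LIKED) and reject unknown verdicts."""
    try:
        return Verdict(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(v.value for v in Verdict)
        raise ValueError(f"unknown verdict {raw!r}; expected one of: {allowed}") from None
```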
### 10.8 Diffs between runs
For a normalized search URL, the system shall compare finnkoder against the previous run and report `new_ads`, `removed_ads`, and `changed_ads` (price, common costs, status). Optionally re-fetch only new or changed details.
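
A sketch of the diff, assuming each run is available as a mapping from finnkode to a dict of tracked fields; the names in `TRACKED_FIELDS` and `diff_runs` are hypothetical stand-ins for price, common costs, and status. Finnkoder are sorted only to make the output deterministic.

```python
from typing import Any

TRACKED_FIELDS = ("total_price", "common_costs", "status")  # hypothetical field names


def diff_runs(previous: dict[str, dict[str, Any]], current: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Compare two runs keyed by finnkode and report new, removed and changed ads."""
    new_ads = sorted(current.keys() - previous.keys())
    removed_ads = sorted(previous.keys() - current.keys())
    changed_ads = []
    for code in sorted(current.keys() & previous.keys()):
        changes = {
            field: {"old": previous[code].get(field), "new": current[code].get(field)}
            for field in TRACKED_FIELDS
            if previous[code].get(field) != current[code].get(field)
        }
        if changes:
            changed_ads.append({"finnkode": code, "changes": changes})
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```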

---
## 11. Scoring and classification
### 11.1 Score model (clamped to 0-100)
| Category | Range |
| ------------------------------------- | ----: |
| Economy / total cost                  |     0-20 |
| Eiendom.no estimate / market position |     0-20 |
| Comparable sales / similar-units      |     0-20 |
| Location                              |     0-15 |
| Layout and potential                  |     0-20 |
| Outdoor space / view / sun            |     0-15 |
| Hybel / rental potential              |     0-10 |
| Renovation / bargain upside           |     0-15 |
| Technical / legal risk                | -20 to 0 |
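
A sketch of the clamping, using the component names from the `scores` table in section 13. The positive caps add up to 135 (20 + 20 + 20 + 15 + 20 + 15 + 10 + 15), so the final clamp to 100 is what keeps strong listings on the 0-100 scale. Treating a missing component as 0 is an assumption, and `CAPS` and `total_score` are hypothetical names.

```python
# (low, high) per component, mirroring the table above.
CAPS = {
    "economy": (0, 20),
    "market_position": (0, 20),
    "comparable_sales": (0, 20),
    "location": (0, 15),
    "layout": (0, 20),
    "outdoor": (0, 15),
    "rental_potential": (0, 10),
    "renovation": (0, 15),
    "risk": (-20, 0),
}


def total_score(components: dict[str, float]) -> float:
    """Clamp each component to its range, sum them, then clamp the total to 0-100."""
    total = 0.0
    for name, (low, high) in CAPS.items():
        value = components.get(name, 0.0)  # missing data contributes 0 in this sketch
        total += min(max(value, low), high)
    return min(max(total, 0.0), 100.0)
```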
### 11.2 Categories
`bargain_candidate`, `safe_candidate`, `lifestyle_candidate`, `hybel_candidate`, `renovation_candidate`, `similar_to_liked`, `comparable_sale_match`, `risk_object`, `too_expensive`, `not_interesting`, `manual_review_required`.
### 11.3 Bargain candidate logic
A listing may be a bargain candidate when several of these are true: low sqm price vs comps, listing price below estimate, price near lower estimate interval, sqm price below similar recently sold, older standard / renovation need / weak presentation, strong underlying location/layout, suitable size, risk appears controllable.
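
A sketch of the "several of these are true" rule as a simple count. The flag names mirror the list above; the threshold of 3 and the names `BargainSignals`, `bargain_signals_met`, and `is_bargain_candidate` are hypothetical. Returning the names of the met signals gives the shortlist something concrete to show as reasons.

```python
from dataclasses import dataclass, fields


@dataclass
class BargainSignals:
    # One flag per condition above; computing each flag is left to scoring.py.
    low_sqm_price_vs_comps: bool = False
    listing_price_below_estimate: bool = False
    price_near_lower_estimate: bool = False
    sqm_price_below_recently_sold: bool = False
    older_standard_or_weak_presentation: bool = False
    strong_location_or_layout: bool = False
    suitable_size: bool = False
    risk_appears_controllable: bool = False


MIN_SIGNALS = 3  # hypothetical threshold for "several"


def bargain_signals_met(signals: BargainSignals) -> list[str]:
    """Names of the conditions that hold, in declaration order."""
    return [f.name for f in fields(signals) if getattr(signals, f.name)]


def is_bargain_candidate(signals: BargainSignals) -> bool:
    return len(bargain_signals_met(signals)) >= MIN_SIGNALS
```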
### 11.4 Renovation logic
Renovation need is not automatically negative.
* **Opportunity:** older standard, modernization need, renovation object, cosmetic wear, outdated kitchen/surfaces, weak presentation, layout improvement potential.
* **Risk:** moisture, rot, mold, drainage issues, load-bearing concerns, illegal/unapproved changes, non-approved hybel, serious electrical/wet-room deviations, TG3 with high cost or safety implications.
### 11.5 Hybel / rental logic
* **Positive:** hybel, rental unit, separate entrance, extra bathroom/kitchenette, basement/sokkel, secondary section, stated rental income.
* **Risk:** not approved, not applied for, not building-reported, only "disposable room", not approved for permanent residence, board approval required.

Each listing is classified as one of: documented legal hybel / possible hybel potential / unclear or risky hybel / not relevant.
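
A deliberately crude keyword sketch of the four-way classification, run on the lower-cased listing description. The Norwegian phrase lists are illustrative assumptions rather than a vetted vocabulary, substring matching will miss many phrasings and can misfire, and `classify_hybel` is a hypothetical name. Risk phrases are checked before approval phrases so that "ikke godkjent utleiedel" is not read as documented.

```python
# Illustrative phrases only; a real list needs tuning against actual listings.
HYBEL_INDICATORS = ("hybel", "utleiedel", "utleieenhet", "separat inngang", "sokkel", "leieinntekt")
RISK_PHRASES = ("ikke godkjent", "ikke omsøkt", "ikke byggemeldt", "styregodkjenning")
DOCUMENTED_PHRASES = ("godkjent utleiedel", "godkjent hybel", "godkjent som egen boenhet")


def classify_hybel(description: str) -> str:
    text = description.lower()
    if not any(word in text for word in HYBEL_INDICATORS):
        return "not_relevant"
    # Risk first: "ikke godkjent utleiedel" also contains "godkjent utleiedel".
    if any(phrase in text for phrase in RISK_PHRASES):
        return "unclear_risky_hybel"
    if any(phrase in text for phrase in DOCUMENTED_PHRASES):
        return "documented_legal_hybel"
    return "possible_hybel_potential"
```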
### 11.6 Market and comparable outputs
Market estimate: `market_score`, `price_vs_estimate_pct`, `price_position` (`below_estimate` / `within_estimate_range` / `above_estimate` / `unknown`), `sqm_price_position` (`cheap` / `normal` / `expensive` / `unknown`).
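
A sketch of the market-position outputs. Whether the Eiendom.no estimate should be compared with FINN's asking price or its total price (which includes shared debt) is not settled here, so the caller chooses. The ±10% band for `sqm_price_position` and the function names are hypothetical.

```python
from typing import Optional

Number = Optional[float]


def price_vs_estimate_pct(price: Number, estimate: Number) -> Optional[float]:
    """Signed % difference; positive means the price is above the estimate."""
    if not price or not estimate:
        return None
    return round((price - estimate) / estimate * 100, 1)


def price_position(price: Number, lower: Number, upper: Number) -> str:
    """Place a price relative to the estimate interval [lower, upper]."""
    if price is None or lower is None or upper is None:
        return "unknown"
    if price < lower:
        return "below_estimate"
    if price > upper:
        return "above_estimate"
    return "within_estimate_range"


def sqm_price_position(sqm_price: Number, reference: Number, band_pct: float = 10.0) -> str:
    """'cheap' / 'normal' / 'expensive' vs a reference sqm price (e.g. the comps average)."""
    if not sqm_price or not reference:
        return "unknown"
    delta_pct = (sqm_price - reference) / reference * 100
    if delta_pct < -band_pct:
        return "cheap"
    if delta_pct > band_pct:
        return "expensive"
    return "normal"
```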
Comparable: `comparable_score`, `comps_count`, `avg_selling_price`, `median_selling_price` (where possible), `avg_sqm_price`, `sqm_price_delta_pct`, `price_delta_pct`, `confidence` (`low` / `medium` / `high`).
Comparable risk flags: too few comps, comps too far away, large differences in area/rooms/floor/year, old sale dates, low overall confidence.
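
A sketch of the comparable statistics over comps that carry both a selling price and a sqm price. Only the comp count feeds `confidence` here, with hypothetical cut-offs of 4 and 8; distance, size differences, and sale age from the flags above would refine it. The per-comp field names and `comparable_summary` are assumptions.

```python
from statistics import mean, median
from typing import Optional


def comparable_summary(listing_sqm_price: Optional[float], comps: list[dict]) -> dict:
    """Summarize sold comps that carry both 'selling_price' and 'sqm_price'."""
    sold = [c for c in comps if c.get("selling_price") and c.get("sqm_price")]
    if not sold:
        return {"comps_count": 0, "confidence": "low"}
    prices = [c["selling_price"] for c in sold]
    avg_sqm = mean([c["sqm_price"] for c in sold])
    delta = round((listing_sqm_price - avg_sqm) / avg_sqm * 100, 1) if listing_sqm_price else None
    count = len(sold)
    return {
        "comps_count": count,
        "avg_selling_price": mean(prices),
        "median_selling_price": median(prices),
        "avg_sqm_price": avg_sqm,
        "sqm_price_delta_pct": delta,
        # Hypothetical cut-offs; only the count is considered in this sketch.
        "confidence": "high" if count >= 8 else "medium" if count >= 4 else "low",
    }
```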

---
## 12. Technical architecture
```text
AI client / Claude Desktop / n8n / agent ← MCP layer
FastMCP (stdio | streamable HTTP)
User in a terminal ← CLI layer
finn-eiendom CLI (typer)
Python tests / notebooks / custom scripts ← Library layer
import finn_eiendom
──────── all three above share ────────
finn_eiendom.formatting ← render_* for json/markdown/table
finn_eiendom.service ← orchestration: get_or_fetch, analyze_*
finn_eiendom.analysis ← shortlist + summary building
search / ad / eiendom_no / scoring / feedback
finn_eiendom.cache (SQLite) ← html, json, ads, units, comps, scores, feedback
finn_eiendom.http (httpx) ← delay, retry, user-agent
FINN HTML + Eiendom.no JSON (+ optional Hjemla)
```
### 12.1 Module layout
```text
finn_eiendom/
  __init__.py
  config.py        # env / defaults / TTLs
  models.py        # Pydantic v2 models
  parser.py        # number/area/date/URL/finnkode normalization
  http.py          # async HTTP with delay, retry, user-agent
  cache.py         # SQLite schema + persistence
  search.py        # FINN search HTML parsing + pagination
  ad.py            # FINN listing HTML parsing
  eiendom_no.py    # unit search/detail, unit_vector, similar-units
  scoring.py       # score model + classifications
  feedback.py      # verdicts + soft preference signal
  analysis.py      # orchestration + shortlist + summary
  service.py       # get_or_fetch_* + thin facade for MCP and CLI
  formatting.py    # render_* helpers shared by MCP and CLI
  mcp_server.py    # FastMCP wrappers around service
  cli.py           # typer-based CLI wrappers around service
  __main__.py      # python -m finn_eiendom → CLI entry
```
### 12.2 Layering rules
* `mcp_server.py` and `cli.py` are **thin**. They translate inputs to service calls and format outputs via `formatting.py`.
* `service.py` orchestrates cache + fetch. Every read should consult the cache first; every fresh fetch should write back.
* `analysis.py` orchestrates the full shortlist run: search → details → enrichment → comps → scoring → summary.
* Domain modules (`search`, `ad`, `eiendom_no`, `scoring`, `feedback`) are pure or only depend on `http`/`cache`.
* No layer above the service may call `httpx` or `sqlite3` directly.
---
## 13. Data model
SQLite. Existing schema already implements `finn_ads`, `eiendom_units`, `similar_units`, and `cache_meta`. MVP additions: `search_runs`, `scores`, `feedback`.
```sql
CREATE TABLE finn_ads (
finnkode TEXT PRIMARY KEY,
url TEXT,
payload TEXT NOT NULL, -- JSON-serialized FinnAd
fetched_at TEXT NOT NULL
);
CREATE TABLE eiendom_units (
unit_code TEXT PRIMARY KEY,
payload TEXT NOT NULL, -- JSON-serialized EiendomUnit
fetched_at TEXT NOT NULL
);
CREATE TABLE similar_units (
id INTEGER PRIMARY KEY AUTOINCREMENT,
unit_code TEXT NOT NULL,
listing_status TEXT NOT NULL,
payload TEXT NOT NULL, -- JSON array of SimilarUnit
fetched_at TEXT NOT NULL
);
CREATE TABLE cache_meta (
key TEXT PRIMARY KEY, -- e.g. search_page:{url}, search_cards:{url}
value TEXT NOT NULL,
expires_at TEXT
);
CREATE TABLE search_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
search_url TEXT NOT NULL,
normalized_url TEXT NOT NULL,
created_at TEXT NOT NULL,
total_found INTEGER,
total_parsed INTEGER,
total_scored INTEGER,
result_json TEXT -- shortlist snapshot
);
CREATE TABLE scores (
id INTEGER PRIMARY KEY AUTOINCREMENT,
finnkode TEXT NOT NULL,
search_run_id INTEGER,
total_score REAL,
economy REAL,
market_position REAL,
comparable_sales REAL,
location REAL,
layout REAL,
outdoor REAL,
rental_potential REAL,
renovation REAL,
risk REAL,
categories_json TEXT,
explanation_json TEXT,
created_at TEXT NOT NULL
);
CREATE TABLE feedback (
id INTEGER PRIMARY KEY AUTOINCREMENT,
finnkode TEXT NOT NULL,
verdict TEXT NOT NULL,
notes TEXT,
created_at TEXT NOT NULL
);
```
---
## 14. MCP design
### 14.1 Tools
All tool names use the `finn_` prefix to avoid collisions when the server runs alongside others.
| Tool | Purpose | Read-only |
| ------------------------------------- | ---------------------------------------------------------------- | :-------: |
| `finn_analyze_search` | Analyze a FINN search URL and return a ranked shortlist. | yes |
| `finn_get_ad` | Fetch structured data for one finnkode. | yes |
| `finn_compare_ads` | Compare multiple listings side by side. | yes |
| `finn_save_feedback` | Store feedback/verdict/notes. | no |
| `finn_get_shortlist` | Fetch stored shortlist from a search run. | yes |
| `finn_get_new_ads_since_last_run` | Detect new/removed/changed listings vs the previous run. | yes |
| `finn_resolve_eiendom_unit` | Map FINN URL → Eiendom.no `unitCode`. | yes |
| `finn_get_eiendom_unit` | Fetch Eiendom.no unit detail by `unitCode`. | yes |
| `finn_enrich_ad` | Combine FINN listing and Eiendom.no enrichment. | yes |
| `finn_build_unit_vector` | Build a base64url `unit_vector` from a `unitCode`. | yes |
| `finn_decode_unit_vector` | Decode a `unit_vector` for inspection/debugging. | yes |
| `finn_get_similar_units` | Fetch comps/recommendations from `unit_vector`. | yes |
| `finn_find_similar_to_liked_ad` | Find properties similar to a listing the user has liked. | yes |
| `finn_analyze_ad_against_comps` | Evaluate one listing against `RECENTLY_SOLD` comps. | yes |

All read-only tools set `readOnlyHint=True, destructiveHint=False, openWorldHint=True`. `finn_save_feedback` sets `readOnlyHint=False, destructiveHint=False, idempotentHint=False`.
### 14.2 Tool input schemas (Pydantic v2)
```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class AnalyzeSearchInput(BaseModel):
    search_url: str = Field(..., description="Full FINN search URL")
    max_pages: int = Field(default=3, ge=1, le=10)
    detail_limit: int = Field(default=20, ge=1, le=100)
    include_details: bool = True
    include_eiendom_no: bool = True
    include_similar_units_for_shortlist: bool = False
    response_format: Literal["json", "markdown"] = "json"


class GetAdInput(BaseModel):
    finnkode: str = Field(..., pattern=r"^\d+$")
    force_refresh: bool = False
    include_eiendom_no: bool = True
    include_similar_units: bool = False


class ResolveUnitInput(BaseModel):
    finn_url: str


class GetUnitInput(BaseModel):
    unit_code: str
    force_refresh: bool = False


class BuildUnitVectorInput(BaseModel):
    unit_code: str


class DecodeUnitVectorInput(BaseModel):
    unit_vector: str


class SimilarUnitsInput(BaseModel):
    unit_vector: str
    listing_status: Literal["RECENTLY_SOLD", "FOR_SALE", "CURRENT"] = "RECENTLY_SOLD"
    force_refresh: bool = False


class FindSimilarToLikedInput(BaseModel):
    finnkode: str
    mode: Literal["recommendations", "comps"] = "recommendations"
    listing_status: Literal["RECENTLY_SOLD", "FOR_SALE", "CURRENT"] = "FOR_SALE"


class AnalyzeAgainstCompsInput(BaseModel):
    finnkode: str
    listing_status: Literal["RECENTLY_SOLD"] = "RECENTLY_SOLD"


class SaveFeedbackInput(BaseModel):
    finnkode: str
    verdict: str
    notes: Optional[str] = None


class CompareAdsInput(BaseModel):
    finnkoder: List[str] = Field(..., min_length=2, max_length=10)
    include_eiendom_no: bool = True
    include_comps: bool = True
```
### 14.3 Tool response convention
Every tool body wraps execution in try/except and returns a JSON string. Errors return:
```python
return json.dumps({"error": True, "code": "<error_code>", "message": str(e)})
```
Returning an error envelope instead of raising keeps exceptions from escaping into the MCP protocol layer and lets the LLM react to recoverable failures.
When `response_format="markdown"`, return human-readable formatted text instead of JSON — produced by `formatting.py`, never inline.
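
A sketch of a wrapper `mcp_server.py` could apply to every tool body, assuming tools are async functions returning JSON-serializable data. The `json_tool` decorator and the exception-to-code mapping are hypothetical; the envelope matches the one above. `functools.wraps` copies the name and docstring and sets `__wrapped__`, so `inspect.signature` still reports the original parameters.

```python
import asyncio
import functools
import json
from typing import Any, Awaitable, Callable

# Hypothetical mapping from exceptions raised by service.py to stable error codes.
ERROR_CODES: dict[type[BaseException], str] = {
    ValueError: "invalid_input",
    LookupError: "not_found",  # also covers KeyError and IndexError
    TimeoutError: "upstream_timeout",
}


def json_tool(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[str]]:
    """Wrap an async tool body so it always returns a JSON string and never raises."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return json.dumps(await func(*args, **kwargs), ensure_ascii=False, default=str)
        except Exception as exc:
            code = next((c for t, c in ERROR_CODES.items() if isinstance(exc, t)), "internal_error")
            return json.dumps({"error": True, "code": code, "message": str(exc)})

    return wrapper


if __name__ == "__main__":

    @json_tool
    async def finn_get_ad(finnkode: str) -> dict:
        if not finnkode.isdigit():
            raise ValueError("finnkode must contain only digits")
        return {"finnkode": finnkode}

    # {"error": true, "code": "invalid_input", "message": "finnkode must contain only digits"}
    print(asyncio.run(finn_get_ad("abc")))
```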
### 14.4 Resources
```text
finn://preferences/current
finn://search-runs/latest
finn://search-runs/{id}
finn://ads/{finnkode}
finn://ads/{finnkode}/enriched
finn://shortlist/latest
finn://feedback/{finnkode}
finn://eiendom-units/{unitCode}
finn://eiendom-units/{unitCode}/similar/{listingStatus}
```
### 14.5 Prompts
* `evaluate_property_for_user`
* `compare_properties_for_user`
* `refine_search_from_feedback`
* `find_more_like_this`

The evaluation prompt template covers: category, score, short assessment, why it is interesting, Eiendom.no estimate, comparable sales, main risks, bargain potential, questions for the broker, and whether to view it.
### 14.6 Entry point
```python
# finn_eiendom/mcp_server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("finn_eiendom_mcp")

# ... tools defined here ...


def main() -> None:
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
```
`pyproject.toml`:
```toml
[project.scripts]
finn-eiendom-mcp = "finn_eiendom.mcp_server:main"
finn-eiendom = "finn_eiendom.cli:app"
```
---
## 15. CLI design
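Built with `typer`. Every analysis and data command maps 1:1 to a service function, with the same parameters, defaults, and outputs. The operational commands (`cache`, `serve`, `config`, `doctor`, `version`) are CLI-only.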
### 15.1 Commands
```text
finn-eiendom analyze-search <url> [--max-pages 3] [--detail-limit 20] [--no-details] [--no-eiendom] [--with-similar] [--format json|markdown|table]
finn-eiendom get-ad <finnkode> [--force-refresh] [--no-eiendom] [--with-similar] [--format ...]
finn-eiendom compare <finnkode...> [--no-eiendom] [--no-comps] [--format ...]
finn-eiendom save-feedback <finnkode> <verdict> [--notes "..."]
finn-eiendom shortlist [--run-id ID] [--limit 10] [--format ...]
finn-eiendom diff <url> [--format ...] ← new / removed / changed
finn-eiendom resolve-unit <finn_url>
finn-eiendom get-unit <unit_code> [--force-refresh]
finn-eiendom enrich-ad <finnkode> [--with-similar]
finn-eiendom build-vector <unit_code>
finn-eiendom decode-vector <unit_vector>
finn-eiendom similar-units <unit_vector> [--status RECENTLY_SOLD|FOR_SALE|CURRENT]
finn-eiendom similar-to-liked <finnkode> [--mode recommendations|comps] [--status ...]
finn-eiendom analyze-against-comps <finnkode>
finn-eiendom cache stats | clear | clear-html | clear-json
finn-eiendom serve [--transport stdio|http] [--host 127.0.0.1] [--port 8010]
finn-eiendom config show | path
finn-eiendom doctor ← run a few smoke checks: cache reachable, eiendom.no reachable, finn reachable
finn-eiendom version
```
### 15.2 Output formats
* `--format json` — full structured output (default for piping into `jq`).
* `--format markdown` — same data, human-readable.
* `--format table` — concise terminal table (for `analyze-search`, `compare`, `shortlist`, `diff`).

All three are produced by `finn_eiendom.formatting`. CLI never formats inline.
### 15.3 Examples
```bash
# Triage a search live
finn-eiendom analyze-search 'https://www.finn.no/realestate/homes/search.html?location=...' --format table
# Drill into one listing
finn-eiendom get-ad 462400360 --format markdown
# Compare two finalists
finn-eiendom compare 462400360 461153194 --format markdown
# Mark a listing as liked, then ask for similar
finn-eiendom save-feedback 462400360 liked --notes "great layout, check fellesgjeld"
finn-eiendom similar-to-liked 462400360
# Operate the MCP server in HTTP mode for n8n
finn-eiendom serve --transport http --port 8010
```
### 15.4 CLI implementation pattern
```python
# finn_eiendom/cli.py
import asyncio

import typer

from . import formatting, service

app = typer.Typer(no_args_is_help=True, add_completion=False)


@app.command()
def analyze_search(
    url: str,
    max_pages: int = 3,
    detail_limit: int = 20,
    no_details: bool = typer.Option(False, "--no-details"),
    no_eiendom: bool = typer.Option(False, "--no-eiendom"),
    with_similar: bool = typer.Option(False, "--with-similar"),
    output_format: str = typer.Option("json", "--format", help="json | markdown | table"),
) -> None:
    """Analyze a FINN search URL and print the ranked shortlist."""
    # Typer commands are synchronous; asyncio.run drives the async service call.
    result = asyncio.run(
        service.analyze_search(
            search_url=url,
            max_pages=max_pages,
            detail_limit=detail_limit,
            include_details=not no_details,
            include_eiendom_no=not no_eiendom,
            include_similar_units_for_shortlist=with_similar,
        )
    )
    typer.echo(formatting.render_shortlist(result, output_format))
```
CLI commands are wrappers — no business logic, no rendering. If you need to add behavior, it goes in `service.py` and gets a matching MCP tool. If you need to change rendering, edit `formatting.py`.

---
## 16. Service layer
The keystone of the architecture.
```python
# finn_eiendom/service.py — public surface
async def get_or_fetch_ad(finnkode: str, force_refresh: bool = False) -> FinnAd: ...
async def get_or_fetch_eiendom_unit(unit_code: str, force_refresh: bool = False) -> Optional[EiendomUnit]: ...
async def get_or_fetch_similar_units(unit_code: str, listing_status: str = "RECENTLY_SOLD", force_refresh: bool = False) -> list[SimilarUnit]: ...
async def analyze_search(search_url: str, *, max_pages=3, detail_limit=20, include_details=True, include_eiendom_no=True, include_similar_units_for_shortlist=False) -> dict: ...
async def analyze_ad(finnkode: str, *, include_eiendom_no=True, include_similar_units=False) -> dict: ...
async def analyze_ad_against_comps(finnkode: str, listing_status: str = "RECENTLY_SOLD") -> dict: ...
async def find_similar_to_liked(finnkode: str, *, mode="recommendations", listing_status="FOR_SALE") -> dict: ...
async def compare_ads(finnkoder: list[str], *, include_eiendom_no=True, include_comps=True) -> dict: ...
async def resolve_eiendom_unit_from_finn_url(finn_url: str) -> Optional[EiendomUnit]: ...
async def build_unit_vector_for_unit_code(unit_code: str) -> dict: ...
def decode_unit_vector_to_dict(unit_vector: str) -> dict: ...
def save_feedback(finnkode: str, verdict: str, notes: Optional[str] = None) -> dict: ...
def get_shortlist(run_id: Optional[int] = None, limit: int = 10) -> dict: ...
def get_new_ads_since_last_run(search_url: str) -> dict: ...
```
Every function:
1. Opens its own SQLite connection via `cache.init_db(FINN_CACHE_PATH)`.
2. Reads from cache first, with TTLs from `config.py`.
3. On cache miss (or `force_refresh=True`), calls the relevant fetch function in `ad.py` / `eiendom_no.py`.
4. Writes the fresh result back to the cache.
5. Returns a typed model or dict, never `None` unexpectedly — failures raise with clear messages.
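
A sketch of steps 1 to 4 for FINN ads, using the `finn_ads` table from section 13. To stay self-contained it takes the connection and the fetch coroutine as parameters and runs SQL inline, whereas the rules above put SQL in `cache.py` and have each service function open its own connection; the body of `init_db`, `AD_TTL`, and the `fetch` parameter are assumptions. Fetch errors propagate unchanged, which matches step 5.

```python
import asyncio
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Awaitable, Callable

AD_TTL = timedelta(hours=24)  # stand-in for the TTL config.py would provide


def init_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS finn_ads ("
        "finnkode TEXT PRIMARY KEY, url TEXT, payload TEXT NOT NULL, fetched_at TEXT NOT NULL)"
    )
    return conn


async def get_or_fetch_ad(
    conn: sqlite3.Connection,
    finnkode: str,
    fetch: Callable[[str], Awaitable[dict]],
    force_refresh: bool = False,
) -> dict:
    now = datetime.now(timezone.utc)
    if not force_refresh:
        row = conn.execute(
            "SELECT payload, fetched_at FROM finn_ads WHERE finnkode = ?", (finnkode,)
        ).fetchone()
        if row and now - datetime.fromisoformat(row[1]) < AD_TTL:
            return json.loads(row[0])  # steps 1-2: cache hit within TTL
    ad = await fetch(finnkode)  # step 3: miss, stale row, or forced refresh
    conn.execute(
        "INSERT OR REPLACE INTO finn_ads (finnkode, url, payload, fetched_at) VALUES (?, ?, ?, ?)",
        (finnkode, ad.get("url"), json.dumps(ad), now.isoformat()),
    )
    conn.commit()  # step 4: write back
    return ad


if __name__ == "__main__":

    async def _fake_fetch(finnkode: str) -> dict:
        return {"finnkode": finnkode, "url": f"https://www.finn.no/realestate/homes/ad.html?finnkode={finnkode}"}

    print(asyncio.run(get_or_fetch_ad(init_db(":memory:"), "462400360", _fake_fetch)))
```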
---
## 17. Code ownership and anti-duplication
This section is the constitution. Everything else flexes; this does not. The goal is one home for every piece of logic and one obvious answer to "where does this go?".
### 17.1 The single-home rule
Every piece of logic has exactly one home. If you're tempted to add it in two places, you're wrong about one of them — push it down a layer and call it from both.
### 17.2 Decision table — "where does this go?"
| Concern | Lives in | Never in |
| -------------------------------------------------- | --------------------------------- | -------------------------------------------------------------- |
| Parsing FINN search HTML | `search.py` | `mcp_server`, `cli`, `analysis`, `scripts` |
| Parsing FINN listing HTML | `ad.py` | `mcp_server`, `cli`, `analysis`, `scripts` |
| Norwegian number / date / URL / finnkode normalization | `parser.py` | inline anywhere — if you write a regex twice, extract it |
| HTTP requests, retry, delay, user-agent | `http.py` | `search` / `ad` / `eiendom_no` using `httpx` directly |
| SQLite reads/writes | `cache.py` | every other module — go through cache helpers |
| Eiendom.no unit search / unit detail | `eiendom_no.py` | `ad`, `search`, `analysis` (call eiendom_no, don't reimplement)|
| `unit_vector` encode / decode | `eiendom_no.py` | `mcp_server`, `cli` (call it; don't pack msgpack inline) |
| Similar-units fetching + local filtering | `eiendom_no.py` | `analysis`, `service` (call `get_similar_units`) |
| Score components | `scoring.py` | `analysis` (use `score_ad`), `mcp_server`, `cli` |
| Category assignment | `scoring.py` (`classify_ad`) | `analysis`, `mcp_server`, `cli` |
| Feedback storage + retrieval | `feedback.py` | `mcp_server`, `cli`, `analysis` |
| "Get from cache, else fetch, else save" | `service.py` (`get_or_fetch_*`) | `mcp_server`, `cli`, `analysis` (always go through service) |
| Shortlist + summary assembly | `analysis.py` | `mcp_server`, `cli` |
| End-to-end orchestration (search → shortlist) | `service.py` (`analyze_search`) | `mcp_server`, `cli` (they just call it) |
| MCP tool definitions + annotations | `mcp_server.py` | `service`, `cli` |
| MCP error wrapping `{"error": True, ...}` | `mcp_server.py` only | `service` (which raises), `cli` (which has its own exit codes) |
| CLI command definitions + Typer plumbing | `cli.py` | `service`, `mcp_server` |
| Output formatting (json / markdown / table) | `formatting.py` | inline in `mcp_server.py` or `cli.py` |
| Env-var defaults | `config.py` | hardcoded anywhere |
| Pydantic models | `models.py` | redefined locally; subclass only if needed |
### 17.3 Layering invariants
The dependency graph is acyclic and points downward:
```
cli.py ────────┐
               ├──> service.py ──> analysis.py ──> search / ad / eiendom_no / scoring / feedback
mcp_server.py ─┤                                   │
               │                                   ├──> parser.py
               │                                   └──> http.py / cache.py
               └──> formatting.py
```
Hard rules:
* `mcp_server.py` and `cli.py` are **siblings** and never call each other.
* Neither MCP nor CLI imports from `search`, `ad`, `eiendom_no`, `scoring`, `feedback`, `cache`, or `http`. They import from `service`, `models`, `formatting`, and `config` only (plus the standard library and their own framework, `mcp` or `typer`).
* `service.py` does not import from `mcp_server` or `cli`.
* `analysis.py` does not open SQLite connections directly — it goes through `cache.py` functions.
* `search.py`, `ad.py`, `eiendom_no.py` do not open SQLite directly — they call cache helpers passed in or imported from `cache.py`.
* Nothing except `http.py` uses `httpx` directly. If `import httpx` appears anywhere else, move it.
* Nothing except `cache.py` uses `sqlite3` directly.
* Nothing except `parser.py` defines Norwegian-text regexes.
### 17.4 Anti-duplication checklist
Before merging any change, ask:
1. Is this logic already implemented somewhere? (`grep` the function name and obvious keywords.)
2. If I'm copy-pasting from another file, am I about to duplicate behavior that should live in one shared function?
3. Can a new caller use an existing `service.py` function instead of writing its own orchestration?
4. Is the same Pydantic field defined in two models? If yes, factor out a base model.
5. Am I formatting output in two places (CLI + MCP)? Move it to `formatting.py`.
6. Am I opening a SQLite connection outside `cache.py`? Move it.
7. Am I building an httpx call outside `http.py`? Move it.
8. Am I writing a Norwegian-number / area / finnkode regex outside `parser.py`? Move it.
9. Am I adding an env-var lookup outside `config.py`? Move it.
10. Did I add a new behavior with only one front end (MCP or CLI)? If it should exist in both, the service function is missing.
### 17.5 Examples — what NOT to do
**Bad:** MCP tool reaches into `ad.py` directly.
```python
# ❌ in mcp_server.py
from .ad import fetch_ad_details

@mcp.tool()
async def finn_get_ad(finnkode, force_refresh=False):
    ad = await fetch_ad_details(finnkode)  # bypasses cache!
    return ad.model_dump_json()
```
**Good:** MCP tool goes through `service.py`.
```python
# ✅ in mcp_server.py
from .service import get_or_fetch_ad

@mcp.tool()  # `mcp` is this module's FastMCP instance
async def finn_get_ad(finnkode, force_refresh=False):
    ad = await get_or_fetch_ad(finnkode, force_refresh=force_refresh)
    return ad.model_dump_json()
```
**Bad:** CLI formats output inline that MCP also needs.
```python
# ❌ in cli.py
def _render_shortlist_markdown(result): ... # 80 lines of formatting
# later in mcp_server.py, the same 80 lines copy-pasted
```
**Good:** Shared formatter.
```python
# ✅ in finn_eiendom/formatting.py
def render_shortlist(result: dict, fmt: str) -> str: ...
# cli.py and mcp_server.py both call render_shortlist(result, fmt)
```
**Bad:** Service inlines parsing or HTTP.
```python
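# ❌ in service.py
import httpx  # only http.py may import httpx
from bs4 import BeautifulSoup  # only search.py / ad.py may parse HTML

async def get_or_fetch_ad(finnkode, force_refresh=False):
    html = await httpx.AsyncClient().get(url)  # http belongs in http.py
    soup = BeautifulSoup(html.text, "html.parser")  # parsing belongs in ad.py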
```
**Good:** Service delegates.
```python
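# ✅ in service.py
from . import ad as ad_module
from . import cache
from .config import FINN_CACHE_PATH, FINN_CACHE_TTL_AD_HOURS

async def get_or_fetch_ad(finnkode, force_refresh=False):
    conn = cache.init_db(FINN_CACHE_PATH)
    if not force_refresh:
        # Serve the cached row if it is younger than the TTL.
        cached = cache.get_finn_ad(conn, finnkode, ttl_hours=FINN_CACHE_TTL_AD_HOURS)
        if cached:
            return cached
    # Cache miss or forced refresh: ad.py fetches and parses, cache.py persists.
    ad = await ad_module.fetch_ad_details(finnkode)
    cache.save_finn_ad(conn, ad)
    return ad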
```
### 17.6 The shared `formatting.py` module
Output formatting (JSON / markdown / table) is shared between CLI (`--format`) and MCP (`response_format`). Centralize all renderers here:
```python
# finn_eiendom/formatting.py
from .models import EiendomUnit, FinnAd, SimilarUnit

def render_ad(ad: FinnAd, fmt: str) -> str: ...
def render_shortlist(result: dict, fmt: str) -> str: ...
def render_comparison(result: dict, fmt: str) -> str: ...
def render_diff(result: dict, fmt: str) -> str: ...
def render_similar_units(units: list[SimilarUnit], fmt: str) -> str: ...
def render_unit(unit: EiendomUnit, fmt: str) -> str: ...
def render_score_breakdown(scores: dict, fmt: str) -> str: ...
```
CLI and MCP both call these. Neither has its own renderer. `fmt` accepts `"json"`, `"markdown"`, `"table"` (only where table makes sense). Unsupported values raise `ValueError` with a list of supported formats.
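A minimal sketch of one renderer and the shared format check follows. It assumes shortlist items are dicts with hypothetical `title` and `total_score` keys; the real shapes come from `models.py` and `analysis.py`. Passing the supported tuple into the hypothetical `_check_format` helper lets a renderer with no sensible table form accept only `("json", "markdown")`.

```python
import json
from typing import Any

SUPPORTED_FORMATS = ("json", "markdown", "table")


def _check_format(fmt: str, supported: tuple[str, ...]) -> None:
    """Raise ValueError that names every supported format."""
    if fmt not in supported:
        raise ValueError(f"Unsupported format {fmt!r}; supported: {', '.join(supported)}")


def render_shortlist(result: dict[str, Any], fmt: str) -> str:
    _check_format(fmt, SUPPORTED_FORMATS)
    if fmt == "json":
        # ensure_ascii=False keeps Norwegian characters (æ, ø, å) readable.
        return json.dumps(result, ensure_ascii=False, indent=2)
    items = result.get("shortlist", [])  # hypothetical item keys: title, total_score
    if fmt == "markdown":
        return "\n".join(
            f"{rank}. **{item.get('title', '?')}** Score {item.get('total_score', '?')}/100"
            for rank, item in enumerate(items, start=1)
        )
    # fmt == "table": fixed-width columns for terminals.
    lines = [f"{'#':>3}  {'Score':>5}  Title"]
    lines += [
        f"{rank:>3}  {item.get('total_score', '?'):>5}  {item.get('title', '?')}"
        for rank, item in enumerate(items, start=1)
    ]
    return "\n".join(lines)
```

The three checks in §25.5 (JSON round trip, score in markdown, `ValueError` on unsupported formats) map directly onto this shape.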
### 17.7 Adding a new feature — the checklist
For any new tool / command / behavior:
1. **Decide the home.** Use the table in §17.2.
2. **Write the service function** in `service.py` (or extend `analysis.py` if it's pure orchestration of existing services).
3. **Add a test** for the service function in `tests/test_service.py`.
4. **Add the MCP tool** in `mcp_server.py` — thin wrapper, `response_format` aware.
5. **Add the CLI command** in `cli.py` — thin wrapper, `--format` aware.
6. **Add formatter** in `formatting.py` if output is non-trivial.
7. **Add a test** for the MCP tool registration in `tests/test_mcp_server.py`.
8. **Add a test** for the CLI command in `tests/test_cli.py`.
9. **Update docs** — README and the relevant `.github/instructions/*.md` if new patterns are introduced.
If step 4 or 5 needs more than ~20 lines, you've put logic in the wrong layer. Push it down.
### 17.8 Acceptable duplication
A few small repetitions are tolerated to keep boundaries clean:
* Trivial `model_dump()` / `model_dump_json()` calls at MCP and CLI boundaries.
* `try/except → format error` blocks at each MCP tool (kept identical via a helper if it grows).
* Pydantic input schema declarations at each MCP tool (they document the tool).
Anything beyond a handful of lines is duplication and goes into a helper.
---
## 18. Workflows
### A. Analyze FINN search
```
Input: FINN search URL
Steps:
1. Normalize URL.
2. Check search-page cache (TTL 60min).
3. Fetch page 1, parse cards.
4. If max_pages > 1, fetch page 2..N.
5. Deduplicate by finnkode.
6. Record a search_run.
7. Pre-score from card data.
8. Select top N for detail fetch.
9. Run workflow B for each.
10. Score + classify each.
11. Sort by total score.
12. Persist scores; persist shortlist snapshot.
13. Return shortlist + summary.
```
### B. Fetch and parse FINN listing
```
Input: finnkode
Steps:
1. Build https://www.finn.no/realestate/homes/ad.html?finnkode={n}.
2. Check finn_ads cache (TTL 24h).
3. Fetch HTML, parse with ad.scrape_ad().
4. Normalize numbers/areas/dates via parser.py.
5. save_finn_ad().
Output: FinnAd.
```
### C. Eiendom.no enrichment
```
Input: FINN listing URL or finnkode
Steps:
1. Build full FINN URL.
2. Cache check on unit search.
3. eiendom_no.search_unit_from_finn_url().
4. Pick best match.
5. Save unitCode on the ad.
6. Cache check on unit detail.
7. eiendom_no.get_unit(unitCode).
8. save_eiendom_unit().
9. Compute FINN-vs-Eiendom.no mismatch warnings.
Output: EiendomUnit + mismatch list (or unavailable).
```
### D. Build unit_vector
```
Input: EiendomUnit
Steps:
1. Extract lon/lat from geometry.
2. propertyType → ptype.
3. floor / rooms / constructionYear / usableArea.
4. Choose price: listingPrice → estimatedSellingPrice → FINN total_price.
5. msgpack.packb + urlsafe_b64encode (strip "=").
6. Persist unit_vector on eiendom_units.
Output: unit_vector + payload.
```
### E. Fetch similar-units / comps
```
Input: unitCode, listing_status=RECENTLY_SOLD
Steps:
1. Load EiendomUnit; ensure unit_vector exists.
2. Cache check on similar_units.
3. eiendom_no.get_similar_units(unit_vector).
4. Normalize and filter locally:
   RECENTLY_SOLD → saleStatus=SOLD and finalizedAt is set
   FOR_SALE      → saleStatus=FORSALE
5. Compute summary: count, avg/median selling price, avg sqm price, avg DOM.
6. save_similar_units().
Output: similar_units[] + comps_summary + confidence.
```
### F. Score property
```
Input: FinnAd, EiendomUnit, similar_units, user_prefs, feedback
Steps:
1. economy / market / comparable / location / layout / outdoor / hybel / renovation / risk.
2. Clamp total to 0–100.
3. Assign categories.
4. Build explanation: why_interesting, risks, next_steps, broker_questions.
Output: scores dict + categories + summary.
```
### G. Find similar to liked
```
Input: finnkode with verdict=liked
Steps:
1. Load FinnAd.
2. Ensure Eiendom.no enrichment + unit_vector.
3. Fetch similar-units (prefer FOR_SALE).
4. Score candidates against user preferences.
5. Return ranked recommendations.
```
### H. Analyze one listing against comps
```
Input: finnkode
Steps:
1. workflow B → enrich (C) → comps (E, RECENTLY_SOLD).
2. Compare listing price vs comp avg/median; sqm price vs comp avg.
3. Compute confidence and classify cheap/fair/expensive.
Output: price_position, sqm_price_position, comparable_score, confidence, comps_summary, warnings.
```
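The following sketches the step 2 and 3 classification. It assumes a hypothetical ±3% "fair" band around the comps reference. With the §19.1 numbers (93,500 vs an average of 98,000 NOK per m²), the gap is about -4.6%, which this band labels `cheap`. Confidence handling is left out because its rules are not specified yet.

```python
def price_position(
    listing_value: float,
    comp_reference: float,
    band: float = 0.03,  # hypothetical +/-3% "fair" zone
) -> tuple[str, float]:
    """Classify a price or NOK/m² against a comps reference (average or median).

    Returns (label, relative_gap), where relative_gap = listing / reference - 1.
    """
    if comp_reference <= 0:
        raise ValueError("comp_reference must be positive")
    gap = listing_value / comp_reference - 1.0
    if gap < -band:
        return "cheap", gap
    if gap > band:
        return "expensive", gap
    return "fair", gap
```

The same function covers both comparisons in step 2: call it once with total price vs comp average/median and once with NOK/m² vs comp average.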
### I. Detect new / removed / changed listings
```
Input: FINN search URL
Steps:
1. workflow A (no detail fetch needed).
2. Compare finnkoder against previous search_run for same normalized_url.
3. For changed ads, diff price/common_costs/status.
4. Optionally workflow B on new + changed only.
Output: new_ads[], removed_ads[], changed_ads[].
```
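Steps 2 and 3 reduce to set operations on finnkoder plus a field-by-field comparison. This sketch assumes each run is available as a `{finnkode: fields}` mapping. It uses hypothetical field names `total_price`, `common_costs` and `status`.

```python
from typing import Any

Fields = dict[str, Any]
TRACKED_FIELDS = ("total_price", "common_costs", "status")  # hypothetical names


def diff_runs(
    previous: dict[str, Fields],
    current: dict[str, Fields],
    tracked: tuple[str, ...] = TRACKED_FIELDS,
) -> dict[str, list[Any]]:
    """Compare two search runs keyed by finnkode."""
    new_ads = sorted(current.keys() - previous.keys())
    removed_ads = sorted(previous.keys() - current.keys())
    changed_ads = []
    for finnkode in sorted(current.keys() & previous.keys()):
        before, after = previous[finnkode], current[finnkode]
        # Record (old, new) only for tracked fields whose value actually changed.
        changes = {
            field: (before.get(field), after.get(field))
            for field in tracked
            if before.get(field) != after.get(field)
        }
        if changes:
            changed_ads.append({"finnkode": finnkode, "changes": changes})
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```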
### J. Feedback loop
```
Input: finnkode + verdict + notes
Steps:
1. INSERT into feedback.
2. Update ad status.
3. If verdict=liked: mark as seed for similar-to-liked recommendations.
4. If verdict=rejected: store rejection reason.
5. Future analyses use feedback as a soft preference signal.
```
### K. Compare multiple listings
```
Input: finnkoder[]
Steps:
1. workflow B + C for each.
2. Optionally workflow E.
3. Build comparison table.
4. Identify winners by category: best value / lifestyle / hybel / bargain / safest / highest risk / most overpriced.
Output: comparison_table + winners_by_category + recommendation + risks + broker_questions.
```
---
## 19. Output formats
### 19.1 Shortlist item
```text
1. [Title/address] Score 84/100
   Category: Bargain candidate
   Price: 7,200,000 total / 77 m² / 93,500 NOK per m²
   Eiendom.no: Estimate 7,650,000 / range 6,900,000–8,400,000
   Comps: 12 similar recently sold / avg 98,000 NOK per m²
   Why interesting:
   - Good size for price.
   - Balcony and view.
   - Renovation need may reduce competition.
   - Flexible layout.
   - Price looks low vs estimate and comps.
   Risks:
   - Check wet rooms in condition report.
   - Common costs need review.
   - Hybel potential is not documented.
   - Comparable confidence is medium.
   Next steps:
   - Open listing.
   - Read condition report.
   - Check FINN vs Eiendom.no mismatches.
   - Ask broker about planned cost increases.
   - Consider viewing.
```
### 19.2 Analysis summary
```text
Analyzed 83 listings.
Fetched details for 20.
Eiendom.no-enriched 18.
Fetched similar-units for 7 shortlisted listings.
Shortlisted 8.
Best bargain candidate: ...
Best safe candidate: ...
Best hybel candidate: ...
Best price vs estimate: ...
Best price vs comps: ...
Highest risk: ...
Most overpriced: ...
```
---
## 20. Configuration
| Variable | Default | Purpose |
| ----------------------------------------- | -------------------------------: | -------------------------------- |
| `FINN_CACHE_PATH` | `data/finn.sqlite` | SQLite DB path |
| `FINN_MAX_SEARCH_PAGES` | `3` | Max search pages |
| `FINN_DETAIL_LIMIT` | `20` | Max detailed listings per run |
| `FINN_REQUEST_DELAY_SECONDS` | `2` | Delay between FINN requests |
| `FINN_USER_AGENT` | `personal-finn-eiendom-analyzer/0.1` | HTTP User-Agent |
| `FINN_CACHE_TTL_SEARCH_MINUTES` | `60` | Search cache TTL |
| `FINN_CACHE_TTL_AD_HOURS` | `24` | Listing cache TTL |
| `EIENDOM_NO_ENABLED` | `true` | Enable Eiendom.no enrichment |
| `EIENDOM_NO_BASE_URL` | `https://api.eiendom.no/api/v1` | API base URL |
| `EIENDOM_NO_CACHE_TTL_HOURS` | `24` | Unit/similar cache TTL |
| `EIENDOM_NO_REQUEST_DELAY_SECONDS` | `1` | Delay between Eiendom.no calls |
| `EIENDOM_NO_SIMILAR_UNITS_ENABLED` | `true` | Enable similar-units |
| `EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS` | `RECENTLY_SOLD` | Default comps status |
| `HJEMLA_ENABLED` | `false` | Enable optional Hjemla API |
| `LOG_LEVEL` | `INFO` | Logging level |
| `MCP_TRANSPORT` | `stdio` | `stdio` or `streamable_http` |
| `MCP_HTTP_HOST` | `127.0.0.1` | Streamable HTTP bind |
| `MCP_HTTP_PORT` | `8010` | Streamable HTTP port |
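Below is a minimal sketch of `config.py` that turns the table into module-level constants, parsed once at import. One constant per variable matches how §17.5 refers to `FINN_CACHE_PATH` and `FINN_CACHE_TTL_AD_HOURS`. The `env_*` helpers are hypothetical. `MCP_TRANSPORT` is validated against the two values the table lists.

```python
import os
from collections.abc import Mapping


def env_str(key: str, default: str, env: Mapping[str, str] = os.environ) -> str:
    return env.get(key, default)


def env_int(key: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    raw = env.get(key)
    return default if raw is None else int(raw)


def env_float(key: str, default: float, env: Mapping[str, str] = os.environ) -> float:
    raw = env.get(key)
    return default if raw is None else float(raw)


def env_bool(key: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    raw = env.get(key)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"{key}={raw!r} is not a boolean")


def env_choice(key: str, default: str, allowed: set[str], env: Mapping[str, str] = os.environ) -> str:
    value = env.get(key, default)
    if value not in allowed:
        raise ValueError(f"{key}={value!r}; expected one of {sorted(allowed)}")
    return value


# Defaults mirror the configuration table.
FINN_CACHE_PATH = env_str("FINN_CACHE_PATH", "data/finn.sqlite")
FINN_MAX_SEARCH_PAGES = env_int("FINN_MAX_SEARCH_PAGES", 3)
FINN_DETAIL_LIMIT = env_int("FINN_DETAIL_LIMIT", 20)
FINN_REQUEST_DELAY_SECONDS = env_float("FINN_REQUEST_DELAY_SECONDS", 2.0)
FINN_USER_AGENT = env_str("FINN_USER_AGENT", "personal-finn-eiendom-analyzer/0.1")
FINN_CACHE_TTL_SEARCH_MINUTES = env_int("FINN_CACHE_TTL_SEARCH_MINUTES", 60)
FINN_CACHE_TTL_AD_HOURS = env_int("FINN_CACHE_TTL_AD_HOURS", 24)
EIENDOM_NO_ENABLED = env_bool("EIENDOM_NO_ENABLED", True)
EIENDOM_NO_BASE_URL = env_str("EIENDOM_NO_BASE_URL", "https://api.eiendom.no/api/v1")
EIENDOM_NO_CACHE_TTL_HOURS = env_int("EIENDOM_NO_CACHE_TTL_HOURS", 24)
EIENDOM_NO_REQUEST_DELAY_SECONDS = env_float("EIENDOM_NO_REQUEST_DELAY_SECONDS", 1.0)
EIENDOM_NO_SIMILAR_UNITS_ENABLED = env_bool("EIENDOM_NO_SIMILAR_UNITS_ENABLED", True)
EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS = env_str("EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS", "RECENTLY_SOLD")
HJEMLA_ENABLED = env_bool("HJEMLA_ENABLED", False)
LOG_LEVEL = env_str("LOG_LEVEL", "INFO")
MCP_TRANSPORT = env_choice("MCP_TRANSPORT", "stdio", {"stdio", "streamable_http"})
MCP_HTTP_HOST = env_str("MCP_HTTP_HOST", "127.0.0.1")
MCP_HTTP_PORT = env_int("MCP_HTTP_PORT", 8010)
```

Because values are read at import time, tests that need different settings are easier to write against the helpers, which accept an `env` mapping, than by mutating `os.environ` after import.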
---
## 21. Deployment
The default runtime is a project-local virtualenv. Docker is supported but optional.
### 21.1 Local install (default)
```bash
# in the project root
uv venv # or: python3.12 -m venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
# now available:
finn-eiendom --help # CLI
finn-eiendom-mcp # MCP server over stdio
finn-eiendom serve --transport http --port 8010 # MCP server over HTTP
pytest # tests
ruff check . # lint
```
For a global CLI install:
```bash
uv tool install .
# or
pipx install .
```
### 21.2 Claude Desktop integration (stdio)
`~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "finn-eiendom": {
      "command": "/Users/ole/code/finn-mcp/.venv/bin/finn-eiendom-mcp",
      "args": [],
      "env": {
        "FINN_CACHE_PATH": "/Users/ole/code/finn-mcp/data/finn.sqlite",
        "EIENDOM_NO_ENABLED": "true"
      }
    }
  }
}
```
Or, with `uv`, pointing it at the project root via `--directory`:
```json
{
  "mcpServers": {
    "finn-eiendom": {
      "command": "uv",
      "args": ["--directory", "/Users/ole/code/finn-mcp", "run", "finn-eiendom-mcp"]
    }
  }
}
```
### 21.3 Docker Compose (optional)
```yaml
services:
  finn-eiendom-mcp:
    build: .
    container_name: finn-eiendom-mcp
    restart: unless-stopped
    ports:
      - "8010:8010"
    environment:
      FINN_CACHE_PATH: /data/finn.sqlite
      EIENDOM_NO_ENABLED: "true"
      EIENDOM_NO_SIMILAR_UNITS_ENABLED: "true"
      MCP_TRANSPORT: streamable_http
      MCP_HTTP_HOST: 0.0.0.0
      MCP_HTTP_PORT: "8010"
    volumes:
      - ./data:/data
    command: ["finn-eiendom", "serve", "--transport", "http", "--host", "0.0.0.0", "--port", "8010"]
```
### 21.4 Dockerfile
```dockerfile
FROM python:3.12-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
&& rm -rf /var/lib/apt/lists/*
COPY pyproject.toml .
COPY finn_eiendom ./finn_eiendom
RUN pip install --no-cache-dir .
EXPOSE 8010
CMD ["finn-eiendom-mcp"]
```
---
## 22. MVP scope
### Must have
* Local venv install (`uv venv` + `pip install -e ".[dev]"`).
* Python core package with all modules listed in §12.1.
* `service.py` with `get_or_fetch_*` helpers.
* `formatting.py` shared between CLI and MCP.
* SQLite cache/history (existing schema retained, `search_runs` + `scores` + `feedback` added).
* FastMCP server with all tools in §14.1 except `finn_compare_ads` (deferred to "should have").
* CLI with all commands in §15.1 except `serve --transport http` and `cache clear-*` variants (deferred).
* FINN search + listing extraction.
* Eiendom.no enrichment enabled by default.
* `unit_vector` build + decode.
* Similar-units/comps with local filtering.
* Scoring on all nine components with category assignment.
* Feedback storage.
* Shortlist output with reasons, risks, next steps, broker questions.
* Pydantic v2 models with `model_config` (no v1 `Config`).
* HTTP retry on 5xx in addition to connection errors.
* MCP entry-point registered in `pyproject.toml`.
* README + `.github/instructions/*.md` describing the architecture and ownership rules.
### Should have
* Pagination.
* Price per m² across the board.
* Component score breakdown in output.
* Generated broker questions.
* `finn_get_new_ads_since_last_run` / `finn-eiendom diff`.
* `finn_compare_ads` / `finn-eiendom compare`.
* Feedback-based scoring adjustment.
* `finn_find_similar_to_liked_ad` / `finn-eiendom similar-to-liked`.
* CLI `--format markdown` + `--format table`.
* CLI `serve --transport http`.
* CLI `cache stats|clear|clear-html|clear-json`.
### Later
* Web UI / dashboard.
* n8n workflow templates.
* PDF condition-report analysis.
* Geocoding / travel-time / sun / noise overlays.
* Push notifications.
* Price-drop monitoring.
* LLM-based listing-text scoring.
* Optional Hjemla integration.
---
## 23. Roadmap
### Phase 0 — Spike (largely done)
* Parse one FINN search result, extract finnkoder, parse 3–5 listings.
* Resolve FINN URL → Eiendom.no `unitCode`, fetch unit detail, generate `unit_vector`, fetch similar-units with `RECENTLY_SOLD`.
### Phase 1 — Core MVP (mostly done)
* Stable parser, SQLite cache, Eiendom.no enrichment, similar-units/comps, basic scoring.
* Fixture-based tests for parsers, cache, scoring.
### Phase 2 — MCP / CLI MVP (this PRD)
* Replace FastAPI with FastMCP stdio server.
* Add `service.py` and `formatting.py`.
* Add `cli.py` (typer) and `__main__.py`.
* Wire MCP tools and CLI commands into the service + formatting layers.
* Pydantic v2 `model_config` cleanup.
* HTTP retry on 5xx.
* New tests: `tests/test_service.py`, expanded `tests/test_mcp_server.py`, new `tests/test_cli.py`, new `tests/test_http.py`, new `tests/test_formatting.py`, new `tests/test_architecture.py`.
* Switch from Docker-only workflow to local venv as default; keep Docker as an optional packaging path.
### Phase 3 — Personal scoring v2
* Tighter user-preference weights, stronger bargain/risk/hybel logic, better confidence handling, generated broker questions.
### Phase 4 — Agent / workflow
* Cron / scheduled runs, diff notifications, n8n templates, Slack/Discord output.
### Phase 5 — Dashboard
* React/TanStack UI for shortlist, feedback, comps, history.
---
## 24. Acceptance criteria
### A1. MCP server
Given a fresh local venv install, `finn-eiendom-mcp` starts via `mcp.run(transport="stdio")` without error. Running `mcp dev finn_eiendom/mcp_server.py` shows all tools listed in §14.1.
### A2. CLI
Given `pip install -e .`, `finn-eiendom --help` lists every command in §15.1. Each command runs end-to-end against cached fixtures with no live network calls and produces JSON, markdown, or table output as requested via `formatting.py`.
### A3. Search analysis
Given a valid FINN search URL, `service.analyze_search()` returns a ranked shortlist sorted by total score, with at least the fields: `summary`, `shortlist`, `search_url`. Cards are deduplicated by finnkode. Identical reruns within the search-cache TTL are served from cache.
### A4. Listing detail
Given a valid finnkode, `service.get_or_fetch_ad()` returns a `FinnAd` with at least `finnkode`, `url`, `title`, `address`, `total_price`, `area_m2`, `listing_description`. Missing fields are `None`, not raised. Subsequent calls within the TTL hit the cache.
### A5. Feedback
Given a finnkode and verdict, `service.save_feedback()` writes a `feedback` row. `liked` verdicts are surfaced by `service.find_similar_to_liked()`.
### A6. Eiendom.no enrichment
Given a FINN listing URL, the system resolves a `unitCode`, fetches the unit detail, stores estimate / coordinates / area / rooms / year / market data, and uses them in scoring. Enrichment failures degrade gracefully — the `eiendom_unit` field is `None` in the result, no exception escapes the service.
### A7. Similar-units
Given a `unitCode`, the system builds (or loads) a cached `unit_vector`, calls similar-units with the requested `listing_status`, returns structured comps, caches the result, and emits a comps summary with count, average price, average sqm price.
### A8. Pydantic v2
`FinnAd`, `EiendomUnit`, `SimilarUnit` use `model_config = ConfigDict(...)`. No `class Config:` blocks remain.
### A9. HTTP retry
`HTTPClient.get()` retries 5xx responses with exponential backoff (`1s, 2s, 4s`) up to `retries` attempts, and surfaces 4xx as `httpx.HTTPStatusError` immediately.
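Below is one way to meet A9, sketched without a network dependency. The caller supplies `send`, which returns an `httpx.Response` or anything with `status_code` and `raise_for_status()`. The sketch reads "up to `retries` attempts" as one initial request plus `retries` retries. The function name and parameters are hypothetical, not the existing `HTTPClient` API.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Protocol


class ResponseLike(Protocol):
    """The two httpx.Response members this sketch relies on."""

    status_code: int

    def raise_for_status(self) -> object: ...


async def get_with_retry(
    send: Callable[[], Awaitable[ResponseLike]],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], Awaitable[object]] = asyncio.sleep,
) -> ResponseLike:
    """One attempt plus up to `retries` retries on 5xx or connection errors.

    Waits base_delay * 2**n between attempts: 1s, 2s, 4s with the defaults.
    Non-5xx responses go through raise_for_status() at once, so a 4xx is
    surfaced immediately instead of being retried.
    """
    for attempt in range(retries + 1):
        is_last = attempt == retries
        try:
            response = await send()
        except retry_on:
            if is_last:
                raise
        else:
            if response.status_code < 500 or is_last:
                response.raise_for_status()  # raises for non-success, e.g. 4xx or the final 5xx
                return response
        await sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable: the last attempt always returns or raises")
```

Inside `http.py`, this could wrap `lambda: client.get(url)` with `retry_on=(httpx.TransportError,)`. httpx connection and timeout errors subclass `httpx.TransportError` rather than the built-in `ConnectionError`. Injecting `sleep` keeps `tests/test_http.py` fast.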
### A10. No-duplication / architecture invariants
A static check (`tests/test_architecture.py`) verifies:
* No `import httpx` outside `finn_eiendom/http.py`.
* No `import sqlite3` outside `finn_eiendom/cache.py`.
* No `BeautifulSoup` import outside `finn_eiendom/search.py` or `finn_eiendom/ad.py`.
* No `msgpack` import outside `finn_eiendom/eiendom_no.py`.
* `mcp_server.py` only imports from `service`, `formatting`, `models`, `config`, and stdlib + `mcp`.
* `cli.py` only imports from `service`, `formatting`, `models`, `config`, and stdlib + `typer`.
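The following sketches how `tests/test_architecture.py` could enforce these rules with the standard-library `ast` module. It assumes a flat `finn_eiendom/` package beside `tests/`. It checks only the four library bans and the relative-import allowlist for the two front ends, not the "stdlib + `mcp`/`typer`" half of the rule. Absolute self-imports such as `from finn_eiendom.ad import fetch_ad_details` would slip through.

```python
import ast
from pathlib import Path

# Package -> the only finn_eiendom files allowed to import it (A10).
# BeautifulSoup is imported from the `bs4` package.
ALLOWED_IMPORTERS: dict[str, set[str]] = {
    "httpx": {"http.py"},
    "sqlite3": {"cache.py"},
    "bs4": {"search.py", "ad.py"},
    "msgpack": {"eiendom_no.py"},
}
FRONT_ENDS = {"mcp_server.py", "cli.py"}
FRONT_END_ALLOWED = {"service", "formatting", "models", "config"}


def import_violations(filename: str, source: str) -> list[str]:
    """Return human-readable violations of the A10 import rules for one file."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            packages = [node.module.split(".")[0]]
        elif isinstance(node, ast.ImportFrom) and node.level == 1 and filename in FRONT_ENDS:
            # `from .service import x` -> "service"; `from . import cache` -> "cache"
            targets = [node.module] if node.module else [a.name for a in node.names]
            problems += [
                f"{filename} imports .{t}" for t in targets if t.split(".")[0] not in FRONT_END_ALLOWED
            ]
            continue
        else:
            continue
        problems += [
            f"{filename} imports {pkg}"
            for pkg in packages
            if pkg in ALLOWED_IMPORTERS and filename not in ALLOWED_IMPORTERS[pkg]
        ]
    return problems


def test_architecture_invariants() -> None:
    package = Path(__file__).resolve().parents[1] / "finn_eiendom"
    problems = [
        problem
        for path in sorted(package.glob("*.py"))
        for problem in import_violations(path.name, path.read_text(encoding="utf-8"))
    ]
    assert not problems, problems
```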
### A11. Tooling
`ruff check .` returns zero issues. `pytest` passes. `mypy --strict finn_eiendom` passes (or is documented as a known gap).
---
## 25. Test strategy
### 25.1 Unit tests
* `tests/test_parser.py` — number/date/URL/finnkode normalization.
* `tests/test_search.py` — FINN search HTML → cards.
* `tests/test_ad.py` — FINN listing HTML → FinnAd.
* `tests/test_eiendom_no.py` — unit search/detail/similar JSON parsers, `unit_vector` encode/decode.
* `tests/test_scoring.py` — all scoring components + classifier.
* `tests/test_cache.py` — read/write/TTL behavior.
### 25.2 Service tests (new)
* `tests/test_service.py`
  * `test_get_or_fetch_ad_uses_cache`
  * `test_get_or_fetch_ad_fetches_when_cache_miss`
  * `test_get_or_fetch_ad_force_refresh`
  * `test_analyze_search_with_fixtures`
  * `test_find_similar_to_liked_uses_liked_feedback`
### 25.3 MCP tests
* `tests/test_mcp_server.py`
  * `test_mcp_server_has_correct_tools`
  * `test_finn_decode_unit_vector_returns_json`
  * `test_finn_analyze_search_handles_error`
### 25.4 CLI tests (new)
Use Typer's `CliRunner`.
* `tests/test_cli.py`
  * `test_cli_help`
  * `test_cli_analyze_search_table_format`
  * `test_cli_get_ad_json_format`
  * `test_cli_save_feedback_persists_row`
  * `test_cli_decode_vector`
### 25.5 Formatting tests (new)
* `tests/test_formatting.py`
  * `test_render_shortlist_json_roundtrips`
  * `test_render_shortlist_markdown_contains_score`
  * `test_render_unsupported_format_raises_valueerror`
### 25.6 HTTP tests (new)
Use `respx`.
* `tests/test_http.py`
  * `test_get_retries_on_500`
  * `test_get_raises_on_404`
  * `test_post_delay_applied`
### 25.7 Architecture tests (new)
* `tests/test_architecture.py` — static import-graph checks listed in A10.
### 25.8 Manual / smoke tests
* `finn-eiendom doctor` runs.
* Real FINN URL run; compare top-3 with manual judgment.
* Save 5 feedback rows; rerun; verify scoring shift.
* Mark one ad liked; run `similar-to-liked`; sanity-check candidates.
---
## 26. Logging, safety, compliance
Log: start/end of analysis, pages/listings/details fetched, Eiendom.no enrichments attempted/found/failed, similar-units attempted/found/failed, cache hits/misses, parse errors, request errors, debug-level scoring details.
Safety / compliance:
* Private, low-frequency, user-triggered use only.
* Configurable request delays and User-Agent.
* Cache aggressively to minimize requests.
* No public redistribution of FINN/Eiendom.no data.
* No public exposure without auth — prefer LAN / Tailscale / reverse proxy.
* Scores, estimates, and comps are decision support, not official valuation, legal, or technical advice.
* stdio MCP servers must log to **stderr only** (`logging.basicConfig(stream=sys.stderr, ...)`).
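A minimal sketch of the stderr rule follows: configure the root logger once, before the server starts reading stdin. The function name is hypothetical. `force=True` (Python 3.8+) removes any root handlers set up earlier.

```python
import logging
import sys


def configure_logging(level: str = "INFO") -> None:
    """Send all log records to stderr; stdout carries the stdio JSON-RPC stream."""
    logging.basicConfig(
        stream=sys.stderr,
        level=level.upper(),  # logging accepts upper-case level names such as "DEBUG"
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace root handlers configured earlier (e.g. one writing to stdout)
    )
```

The stdio entry point would call `configure_logging(LOG_LEVEL)` and then `mcp.run(transport="stdio")`. The same reasoning rules out `print()` in tool code, since it writes to stdout.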
---
## 27. Risks & mitigations
| Risk | Impact | Mitigation |
| ------------------------------------ | ---------------------- | -------------------------------------------- |
| FINN HTML changes | Parser breaks | Fixture tests, resilient selectors |
| Eiendom.no API/JSON changes | Enrichment/comps break | JSON fixtures, graceful fallback |
| Unit-vector format changes | Similar-units breaks | Unit tests, fall back to cache, mark unavailable |
| Too many requests | Blocking / unwanted load | Delay, cache, low-frequency use |
| Bad scoring | Poor recommendations | Explain score and uncertainty |
| Legal/technical interpretation wrong | Bad decisions | Present as broker questions, not facts |
| User overtrusts score | Missed risks | Always show risks and next steps |
| Public MCP exposure | Misuse | LAN / Tailscale / auth-only |
| stdio server writes to stdout        | Corrupts JSON-RPC framing | Configure logging to stderr; architecture test |
| Duplication of logic | Drift between MCP/CLI/library | Code-ownership table + architecture tests |
---
## 28. Open questions
1. Should `service.py` open one shared `sqlite3.Connection` per process or one per call? (current code opens per call — fine but worth measuring.)
2. Store raw HTML permanently or only parsed output? Default: only parsed, raw HTML under TTL.
3. How aggressively to detail-fetch in `analyze_search`? Default: top 20 cards.
4. Hardcode scoring weights or expose via YAML / env? Default: hardcoded for MVP; YAML in Phase 3.
5. Should feedback affect scoring in MVP, or only be stored? Default: stored only; soft signal in Phase 3.
6. Multiple scoring profiles (lifestyle / bargain / hybel / safe)? Default: single profile in MVP.
7. Permanently store Eiendom.no data or TTL only? Default: TTL only; review later.
8. How to handle FINN-vs-Eiendom.no mismatches (area, price)? Default: store both, surface as warning, never silently overwrite.
9. Which `listing_status` values does similar-units accept server-side? Verify in spike before relying on it.
10. Should recommendations use only `liked` listings, or also high-scoring listings without feedback? Default: liked only.
11. Should `serve --transport http` ship in MVP? Default: yes for cron/n8n users; stdio still default for Claude Desktop.
---
## 29. First implementation plan (Phase 2)
Step by step, each step independently mergeable.
1. **Switch dev workflow to local venv.** Update `AGENTS.md`, `copilot-instructions.md`, `python.instructions.md`, `tests.instructions.md`. Add `clean-code.instructions.md`, `cli.instructions.md`, and `docs.instructions.md`.
2. **Pydantic v2 cleanup** — replace `class Config` with `model_config = ConfigDict(...)` in `models.py`. Add roundtrip test.
3. **Service layer** — create `finn_eiendom/service.py` with `get_or_fetch_*` and orchestration helpers. Add `tests/test_service.py`.
4. **Formatting layer** — create `finn_eiendom/formatting.py` with all `render_*` helpers. Add `tests/test_formatting.py`.
5. **HTTP retry** — extend `HTTPClient.get()` with 5xx retry + exponential backoff. Add `tests/test_http.py`.
6. **Replace FastAPI with FastMCP** — rewrite `finn_eiendom/mcp_server.py` against `service.py` + `formatting.py`. Add stdio `main()`. Add `[project.scripts]` entry `finn-eiendom-mcp`. Expand `tests/test_mcp_server.py`.
7. **CLI** — create `finn_eiendom/cli.py` (typer) and `finn_eiendom/__main__.py`. Add `[project.scripts]` entry `finn-eiendom`. Add `tests/test_cli.py`.
8. **Diff workflow** — implement `search_runs` table + `service.get_new_ads_since_last_run` + matching MCP tool + CLI `diff` command.
9. **Compare workflow** — implement `service.compare_ads` + MCP tool + CLI `compare` command.
10. **Similar-to-liked** — implement `service.find_similar_to_liked` + MCP tool + CLI `similar-to-liked` command.
11. **Architecture tests**`tests/test_architecture.py` enforcing A10.
12. **README + Claude Desktop config** — document install paths for both CLI and MCP using local venv.
Definition of done for the whole phase:
* [ ] `finn-eiendom-mcp` boots over stdio with all tools listed.
* [ ] `finn-eiendom --help` lists every command in §15.1.
* [ ] `pytest` is green, including new `test_service.py`, `test_cli.py`, `test_http.py`, `test_formatting.py`, `test_architecture.py`.
* [ ] `ruff check .` is clean.
* [ ] README documents Claude Desktop config and a CLI quickstart using local venv.
* [ ] All acceptance criteria in §24 pass.
---
## 30. Final product statement
> **Build a compact, private, self-hosted property analysis platform whose source of truth is a typed Python library, and whose user-facing surfaces are (a) an MCP server for LLM agents, (b) a CLI for terminals and cron, and (c) a Python API for tests and notebooks. All three share the same service layer, the same formatting layer, and the same SQLite cache.**
The MVP does one thing well:
> **FINN search in → relevant property candidates out, enriched with Eiendom.no estimates, similar-units, explanation, risk, and next steps.**