Ole
2026-05-16 06:54:17 +00:00
commit 1399f61c1a
44 changed files with 6746 additions and 0 deletions
@@ -0,0 +1,18 @@
FINN_CACHE_PATH=/data/finn.sqlite
FINN_MAX_SEARCH_PAGES=3
FINN_DETAIL_LIMIT=20
FINN_REQUEST_DELAY_SECONDS=2
FINN_CACHE_TTL_SEARCH_MINUTES=60
FINN_CACHE_TTL_AD_HOURS=24
FINN_USER_AGENT=personal-finn-eiendom-analyzer/0.1
EIENDOM_NO_ENABLED=true
EIENDOM_NO_BASE_URL=https://api.eiendom.no/api/v1
EIENDOM_NO_CACHE_TTL_HOURS=24
EIENDOM_NO_REQUEST_DELAY_SECONDS=1
EIENDOM_NO_SIMILAR_UNITS_ENABLED=true
EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS=RECENTLY_SOLD
LOG_LEVEL=DEBUG
MCP_HOST=0.0.0.0
MCP_PORT=8000
@@ -0,0 +1,181 @@
# Copilot instructions for finn-eiendom-mcp
This project is a private, self-hosted Python platform for analyzing FINN real-estate listings. It consists of three coordinated parts, a library and two thin front ends over it:
1. A **Python library** (`finn_eiendom`), the source of truth.
2. An **MCP server** (FastMCP, stdio plus optional HTTP) defined in `finn_eiendom/mcp_server.py`.
3. A **CLI** (`finn-eiendom`) defined in `finn_eiendom/cli.py`.
Both front ends are backed by the same `service.py`, `formatting.py`, `cache.py`, and `models.py`. Code lives in exactly one place and is called from both front ends. See `PRD.md` §17 for the full ownership rules; that section is the constitution.
---
## Source of truth
Read in this order:
1. `PRD.md` — product and architecture, especially §17.
2. `PROJECT.md` — module map.
3. `AGENTS.md` — workflow.
4. `.github/instructions/*.md` — per-topic rules.
---
## Module layout
```
finn_eiendom/
config.py # env vars, defaults, TTLs
models.py # Pydantic v2 models
parser.py # number/area/date/URL/finnkode normalization
http.py # async HTTP (httpx) with delay + retry + user-agent
cache.py # SQLite (sqlite3) schema + persistence
search.py # FINN search HTML parsing + pagination
ad.py # FINN listing HTML parsing
eiendom_no.py # Eiendom.no unit search/detail, unit_vector, similar-units
scoring.py # score model + classifications
feedback.py # verdicts + soft preference signal
analysis.py # orchestration + shortlist + summary
service.py # get_or_fetch_* + thin facade for MCP and CLI
formatting.py # render_* helpers shared by MCP and CLI
mcp_server.py # FastMCP wrappers around service.py
cli.py # typer-based CLI wrappers around service.py
__main__.py # python -m finn_eiendom → CLI entry
```
---
## The five hard rules
Enforced by `tests/test_architecture.py`:
1. **`mcp_server.py` and `cli.py` are siblings.** They never import from each other. Both import only from `service`, `formatting`, `models`, `config`, the stdlib, and their own framework (`mcp` plus `pydantic` for input schemas, or `typer`).
2. **`service.py` is the only orchestrator.** Nothing above it touches HTTP or SQLite directly.
3. **`httpx` lives only in `http.py`.**
4. **`sqlite3` lives only in `cache.py`.**
5. **Output formatting lives only in `formatting.py`.** Never inline in CLI or MCP tool bodies.
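Rule 1 (and rules 3 and 4 as they apply to the front ends) is mechanical enough to check with the standard-library `ast` module. A minimal sketch of the kind of check `tests/test_architecture.py` could run; the helper names and the exact package lists are assumptions for illustration, not the real test file:

```python
import ast

# Local modules the two front ends may import (rule 1); any other sibling module is off-limits.
ALLOWED_LOCAL_FOR_FRONT_ENDS = frozenset({"service", "formatting", "models", "config"})
# Packages that belong to lower layers only (http.py, cache.py, eiendom_no.py, search.py/ad.py).
FORBIDDEN_EXTERNAL_FOR_FRONT_ENDS = frozenset({"httpx", "sqlite3", "msgpack", "bs4"})


def imported_modules(source: str) -> tuple[set[str], set[str]]:
    """Split a module's imports into (absolute top-level packages, relative sibling modules)."""
    external: set[str] = set()
    local: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            external.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                external.add(node.module.split(".")[0])
            elif node.module:  # e.g. `from .cache import connect`
                local.add(node.module.split(".")[0])
            else:  # e.g. `from . import service, formatting`
                local.update(alias.name for alias in node.names)
    return external, local


def front_end_violations(source: str) -> list[str]:
    """Return readable rule violations for the source of mcp_server.py or cli.py."""
    external, local = imported_modules(source)
    violations = [
        f"forbidden package: {name}"
        for name in sorted(external & FORBIDDEN_EXTERNAL_FOR_FRONT_ENDS)
    ]
    violations += [
        f"forbidden local module: {name}"
        for name in sorted(local - ALLOWED_LOCAL_FOR_FRONT_ENDS)
    ]
    return violations
```

A real test would feed it the text of `finn_eiendom/cli.py` and `finn_eiendom/mcp_server.py` and assert an empty list. Parsing with `ast` instead of grepping avoids false hits on import statements that appear inside strings or docstrings.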
---
## Development workflow — local venv
Default runtime is a project-local virtualenv. Docker is supported for packaging but optional for development.
```bash
uv venv # or: python3.12 -m venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
# from now on:
pytest
ruff check .
ruff format .
mypy finn_eiendom
finn-eiendom --help
finn-eiendom-mcp # stdio MCP server
```
**Never** install packages globally. **Never** add a dependency without updating `pyproject.toml`.
---
## Coding rules
* Python 3.12+.
* Pydantic v2 with `model_config = ConfigDict(...)`. No v1 `class Config:` blocks.
* Type hints on every function signature.
* Async I/O for all network and DB code paths through `service.py`.
* Dependency injection for HTTP/cache clients in tests.
* Small, focused functions. One job per function. See `clean-code.instructions.md`.
* Errors raise with actionable messages; the MCP boundary translates them to `{"error": True, "code": ..., "message": ...}`.
* stdio MCP servers log to **stderr only**.
---
## Code ownership — the short version
| Concern | Lives in |
| -------------------------------------- | ------------------------------ |
| FINN search HTML parsing | `search.py` |
| FINN listing HTML parsing | `ad.py` |
| Norwegian number / area / URL regexes | `parser.py` |
| HTTP fetching + retry + delay | `http.py` |
| SQLite reads / writes | `cache.py` |
| Eiendom.no unit search/detail/comps | `eiendom_no.py` |
| `unit_vector` encode/decode (msgpack) | `eiendom_no.py` |
| Scoring + classification | `scoring.py` |
| Feedback storage | `feedback.py` |
| Cache-aware orchestration | `service.py` (`get_or_fetch_*`)|
| Shortlist + summary assembly | `analysis.py` |
| End-to-end runs | `service.py` (`analyze_search`)|
| MCP tool definitions | `mcp_server.py` |
| CLI command definitions | `cli.py` |
| Output rendering | `formatting.py` |
| Env-var defaults | `config.py` |
| Pydantic models | `models.py` |
Full table with "never lives in" column is in `PRD.md` §17.2.
---
## Adding a feature
1. Decide the home using the table above (and `PRD.md` §17.2).
2. Implement in `service.py` (or `analysis.py` if pure orchestration).
3. Add a service-level test.
4. Add a thin MCP tool — `response_format`-aware.
5. Add a thin CLI command — `--format`-aware.
6. Add a renderer in `formatting.py`.
7. Test MCP and CLI registration.
8. Update PRD and instruction docs.
If the MCP tool body or CLI command body grows past ~20 lines, push logic down to `service.py`.
---
## Documentation lookups — use context7
When uncertain about an external library API (FastMCP, Pydantic v2, Typer, httpx, msgpack, pytest-asyncio, respx, BeautifulSoup), call the **`context7` MCP server** *before* writing code. Don't rely on training-data memory.
```
context7:resolve-library-id → library_id
context7:query-docs(library_id, topic) → authoritative snippets
```
Details in `.github/instructions/docs.instructions.md`.
---
## Clean code is a hard requirement
See `clean-code.instructions.md`. DRY, single-responsibility, descriptive names, type hints, no dead code, comments explain why not what. If duplication slips in, the right answer is to extract it — not to copy the second instance.
---
## Product behavior
The MVP does one thing well:
```
FINN search URL in
→ relevant property candidates out
→ enriched with Eiendom.no estimates
→ similar-units / comps
→ explanations
→ risks
→ next steps
→ broker questions
```
Always explain:
* why a property is interesting,
* price vs estimate,
* price vs comparable sales,
* renovation upside,
* hybel / rental potential,
* technical / legal risks,
* uncertainty / confidence,
* next questions for the broker.
Scores and estimates are decision support, not advice. Surface uncertainty, never hide it.
@@ -0,0 +1,150 @@
---
name: Clean code rules
description: Best-practice standards for all production and test code
applyTo: "**/*.py"
---
# Clean code rules
These rules apply everywhere — every module, every function, every test. They are intentionally opinionated. If a rule conflicts with the architecture rules in `PRD.md` §17, the architecture rules win. If it conflicts with another best practice here, pick the one that produces the simpler, more readable result.
## Single responsibility
* One job per function. If a function name needs "and" to describe it, it's two functions.
* One job per module. `parser.py` parses. `cache.py` caches. `formatting.py` formats. Don't mix.
* One job per class. We rarely need classes outside Pydantic models, dataclasses, and the `HTTPClient`. Avoid OO for OO's sake.
## Function size
* Aim for under **30 lines** of body.
* Past **50 lines** it's a code smell — extract helpers.
* If you've got more than **3 levels of nesting**, the function wants splitting (extract the inner block into a helper named after what it does).
## Naming
* Names describe **intent**, not implementation. `get_or_fetch_ad`, not `process_ad`. `render_shortlist_markdown`, not `format2`.
* Verbs for actions (`fetch_`, `parse_`, `score_`, `render_`).
* Nouns for data (`FinnAd`, `EiendomUnit`, `shortlist`).
* Boolean variables / parameters read as predicates: `force_refresh`, `include_eiendom_no`, `is_recently_sold`. Not `flag`, not `do_thing`.
* Avoid abbreviations except those well-established in the domain (`url`, `ad`, `nok`, `bra`, `sqm`).
* Norwegian terms stay Norwegian when they're domain vocabulary (`hybel`, `fellesgjeld`, `finnkode`). Don't translate `finnkode` to `finn_code` — it's a proper noun.
## Type hints
Required on every function signature, including private helpers. Mypy in strict mode is the goal.
```python
# ❌
def parse(html, base_url=None):
    ...


# ✅
def parse(html: str, base_url: str | None = None) -> FinnAd | None:
    ...
```
Use modern syntax: `X | None` over `Optional[X]`, `list[int]` over `List[int]`, `dict[str, Any]` over `Dict[str, Any]`.
## Comments
* Comments explain **WHY**, never **WHAT**. The code already says what.
* If a comment is needed to explain *what* a line does, the line wants renaming or extracting.
* Use docstrings for public functions, classes, and modules. One-line summary, blank line, optional details and examples.
* No commented-out code. Delete it. Git remembers.
* No `# TODO` without a date or issue reference. `# TODO(2026-05): replace once Eiendom.no confirms ...` is fine.
## DRY — Don't Repeat Yourself
If you write the same logic, regex, SQL, or format string **twice**, extract it. The decision table in `PRD.md` §17.2 tells you where it belongs.
The pre-merge anti-duplication checklist (from `PRD.md` §17.4):
1. Is this logic already implemented somewhere? (`grep` the function name and obvious keywords.)
2. If I'm copy-pasting from another file, am I about to duplicate behavior that should live in one shared function?
3. Can a new caller use an existing `service.py` function instead of writing its own orchestration?
4. Is the same Pydantic field defined in two models? Factor out a base model.
5. Am I formatting output in two places (CLI + MCP)? Move it to `formatting.py`.
6. Am I opening a SQLite connection outside `cache.py`? Move it.
7. Am I building an httpx call outside `http.py`? Move it.
8. Am I writing a Norwegian-number / area / finnkode regex outside `parser.py`? Move it.
9. Am I adding an env-var lookup outside `config.py`? Move it.
10. Did I add a new behavior with only one front end (MCP or CLI)? If it should exist in both, the service function is missing.
A small amount of duplication is acceptable to keep boundaries clean — see `PRD.md` §17.8. Past a handful of lines, extract.
## Errors
* **Fail loudly** with actionable messages.
```python
# ❌
raise ValueError("bad input")
# ✅
raise ValueError(f"Unknown listing_status {status!r}; expected one of {VALID_LISTING_STATUSES}")
```
* **No silent failures.** `except Exception: pass` is forbidden. Catch the specific exception, log it, and either recover or re-raise.
* **Service raises; MCP wraps.** Service functions raise normal exceptions. The MCP tool boundary translates them into `{"error": True, "code": ..., "message": ...}`. CLI lets typer handle non-zero exits.
* **Graceful degradation is explicit.** If Eiendom.no enrichment fails, return a result with `eiendom_unit=None` and a warning, not a silently-missing field.
## State
* No global mutable state. Module-level values are limited to configuration loaded from env in `config.py` and immutable named constants.
* No module-level caches (dicts, lists) that mutate. Use `cache.py` if you need persistence.
* Pass dependencies in (HTTP clients, DB connections) for testability.
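One way to reconcile "no global mutable state" with env-driven configuration is a frozen dataclass built from the environment. A minimal sketch covering a few variables from the example env file in this commit; the `Settings` class, `load_settings`, and the choice of fields are assumptions, and the fallback defaults simply mirror that example file:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    cache_path: Path
    max_search_pages: int
    request_delay_seconds: float
    cache_ttl_search_minutes: int
    eiendom_no_enabled: bool


def _parse_bool(raw: str) -> bool:
    """Accept the usual truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build an immutable Settings object from environment variables."""
    return Settings(
        cache_path=Path(env.get("FINN_CACHE_PATH", "/data/finn.sqlite")),
        max_search_pages=int(env.get("FINN_MAX_SEARCH_PAGES", "3")),
        request_delay_seconds=float(env.get("FINN_REQUEST_DELAY_SECONDS", "2")),
        cache_ttl_search_minutes=int(env.get("FINN_CACHE_TTL_SEARCH_MINUTES", "60")),
        eiendom_no_enabled=_parse_bool(env.get("EIENDOM_NO_ENABLED", "true")),
    )
```

Passing a plain dict as `env` keeps tests independent of the real process environment.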
## Dead code
* No commented-out code.
* No unused imports (ruff catches these — fix them, don't add `# noqa`).
* No unused parameters (use `_` or remove).
* No `if False:` blocks "for later".
* Functions and classes that aren't called anywhere — delete them. Git keeps history.
## Magic numbers and strings
Anything that influences behavior and isn't self-explanatory belongs in `config.py` (env-controlled) or as a named module-level constant near the top of the file.
```python
# ❌
if days > 90:
    confidence = "low"

# ✅
COMPS_STALE_AFTER_DAYS = 90
if days > COMPS_STALE_AFTER_DAYS:
    confidence = "low"
```
URLs, timeouts, retries, TTLs, status codes — never inline.
## Imports
* Standard library first, third-party second, local last, separated by blank lines.
* Ruff's `I` rules sort and group these — run `ruff check . --fix`.
* No wildcard imports.
* No relative imports above one level (`from ..thing import x` is a smell; refactor).
* Each module's allowed import set is enforced by `tests/test_architecture.py`.
## Tests are first-class code
Same rules. Same type hints. Same naming. Same DRY. If a fixture is used in three test files, it goes in `conftest.py`. If three tests share a setup, factor it into a fixture.
## Reviewing your own change before commit
A 60-second self-review:
1. Did I add a function that already exists somewhere? (`grep` it.)
2. Did I bypass `service.py`, `http.py`, `cache.py`, or `formatting.py`?
3. Is everything typed?
4. Did I leave a `print()`, `breakpoint()`, or commented-out block behind?
5. Does the test for this change actually fail without the change?
6. Did I update `PRD.md` or the relevant instruction file if I changed an architectural rule?
## When in doubt about a library API
Use the `context7` MCP server instead of guessing. See `docs.instructions.md`. Training-data memory of `pydantic.field_validator`, `typer.Option`, `mcp.tool` annotations, or `httpx.AsyncClient` is unreliable — they all change between versions.
@@ -0,0 +1,158 @@
---
name: CLI rules
description: Rules for the typer-based finn-eiendom CLI
applyTo: "finn_eiendom/cli.py,finn_eiendom/__main__.py"
---
# CLI rules
The CLI is a **thin wrapper** over `service.py`. It is a sibling of `mcp_server.py` — they never call each other and they share the same underlying service functions. Every CLI command maps 1:1 to a service function with the same parameters and defaults.
## Framework
Built with [`typer`](https://typer.tiangolo.com/). One `typer.Typer` app:
```python
# finn_eiendom/cli.py
import asyncio

import typer

from . import formatting, service

app = typer.Typer(no_args_is_help=True, add_completion=False)
```
Entry points in `pyproject.toml`:
```toml
[project.scripts]
finn-eiendom-mcp = "finn_eiendom.mcp_server:main"
finn-eiendom = "finn_eiendom.cli:app"
```
Plus `finn_eiendom/__main__.py`:
```python
from .cli import app

if __name__ == "__main__":
    app()
```
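So `python -m finn_eiendom ...` works even without the `finn-eiendom` console script, as long as the package is importable (for example, when run from the repository root).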
## Command body shape
```python
@app.command()
def analyze_search(
    url: str,
    max_pages: int = 3,
    detail_limit: int = 20,
    no_details: bool = typer.Option(False, "--no-details"),
    no_eiendom: bool = typer.Option(False, "--no-eiendom"),
    with_similar: bool = typer.Option(False, "--with-similar"),
    output_format: str = typer.Option("json", "--format"),  # avoids shadowing builtin format()
) -> None:
    """Analyze a FINN search URL and return a ranked shortlist."""
    result = asyncio.run(
        service.analyze_search(
            search_url=url,
            max_pages=max_pages,
            detail_limit=detail_limit,
            include_details=not no_details,
            include_eiendom_no=not no_eiendom,
            include_similar_units_for_shortlist=with_similar,
        )
    )
    typer.echo(formatting.render_shortlist(result, output_format))
```
Rules:
* The command body has at most three sections: option parsing (handled by typer), one `service.<function>` call, one `typer.echo(formatting.render_<thing>(result, format))`.
* If the body has more than ~20 lines, the logic belongs in `service.py`.
* No `print()` — use `typer.echo()` for stdout, `typer.echo(..., err=True)` for stderr.
* No business logic, no rendering, no SQLite, no HTTP, no parsing.
## Formats
Every command that produces structured output accepts `--format`:
* `--format json` (default) — full structured output, pipeable into `jq`.
* `--format markdown` — human-readable.
* `--format table` — terminal table (only where it makes sense: `analyze-search`, `compare`, `shortlist`, `diff`).
All three render paths are produced by `formatting.py`. Never format inline in `cli.py`. Unsupported values raise `ValueError` with a list of supported formats — typer surfaces this as a non-zero exit.
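A sketch of how `formatting.py` might route the `--format` value and raise the `ValueError` described above. The renderer names and the plain-dict result shape (`{"ads": [{"finnkode": ..., "score": ...}]}`) are hypothetical; real renderers would take the Pydantic result models:

```python
import json
from collections.abc import Callable
from typing import Any


def _shortlist_json(result: dict[str, Any]) -> str:
    return json.dumps(result, ensure_ascii=False, indent=2)


def _shortlist_markdown(result: dict[str, Any]) -> str:
    lines = ["| Finnkode | Score |", "| --- | --- |"]
    lines += [f"| {ad['finnkode']} | {ad['score']} |" for ad in result["ads"]]
    return "\n".join(lines)


def _shortlist_table(result: dict[str, Any]) -> str:
    rows = [f"{ad['finnkode']:<12}{ad['score']:>6}" for ad in result["ads"]]
    return "\n".join([f"{'FINNKODE':<12}{'SCORE':>6}", *rows])


# One lookup table per renderable thing; its keys are the supported formats.
_SHORTLIST_RENDERERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "json": _shortlist_json,
    "markdown": _shortlist_markdown,
    "table": _shortlist_table,
}


def render_shortlist(result: dict[str, Any], output_format: str) -> str:
    """Render a shortlist result in the requested output format."""
    renderer = _SHORTLIST_RENDERERS.get(output_format)
    if renderer is None:
        raise ValueError(
            f"Unsupported format {output_format!r}; "
            f"expected one of {sorted(_SHORTLIST_RENDERERS)}"
        )
    return renderer(result)
```

Deriving the error message from the dict keys keeps the list of supported formats in one place, so adding a renderer cannot leave the message stale.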
## Commands
```text
finn-eiendom analyze-search <url> [--max-pages 3] [--detail-limit 20] [--no-details] [--no-eiendom] [--with-similar] [--format ...]
finn-eiendom get-ad <finnkode> [--force-refresh] [--no-eiendom] [--with-similar] [--format ...]
finn-eiendom compare <finnkode...> [--no-eiendom] [--no-comps] [--format ...]
finn-eiendom save-feedback <finnkode> <verdict> [--notes "..."]
finn-eiendom shortlist [--run-id ID] [--limit 10] [--format ...]
finn-eiendom diff <url> [--format ...]
finn-eiendom resolve-unit <finn_url>
finn-eiendom get-unit <unit_code> [--force-refresh]
finn-eiendom enrich-ad <finnkode> [--with-similar]
finn-eiendom build-vector <unit_code>
finn-eiendom decode-vector <unit_vector>
finn-eiendom similar-units <unit_vector> [--status RECENTLY_SOLD|FOR_SALE|CURRENT]
finn-eiendom similar-to-liked <finnkode> [--mode recommendations|comps] [--status ...]
finn-eiendom analyze-against-comps <finnkode>
finn-eiendom cache stats | clear | clear-html | clear-json
finn-eiendom serve [--transport stdio|http] [--host 127.0.0.1] [--port 8010]
finn-eiendom config show | path
finn-eiendom doctor
finn-eiendom version
```
Sub-command groups (`cache`, `config`) use `typer.Typer` sub-apps:
```python
cache_app = typer.Typer(help="Cache management")
app.add_typer(cache_app, name="cache")


@cache_app.command("stats")
def cache_stats() -> None:
    """Show cache statistics."""
    stats = asyncio.run(service.get_cache_stats())
    typer.echo(formatting.render_cache_stats(stats, "json"))
```
## Async glue
Service functions are async; CLI commands are sync. Always use `asyncio.run(service.<function>(...))` at the call boundary. Don't sprinkle `async def` across CLI commands — typer expects sync handlers.
## Exit codes
* `0` — success.
* `1` — runtime error (raised exception in service).
* `2` — usage error (typer's default for bad options).
Let exceptions propagate from `service.py` and rely on typer's default handling. Only catch where you want a more specific exit code or message.
## What stays out of cli.py
* `import httpx`, `import sqlite3`, `import msgpack` — never.
* `from .ad import ...`, `from .search import ...`, `from .eiendom_no import ...`, `from .scoring import ...`, `from .cache import ...`, `from .http import ...` — never.
* Inline formatting logic — goes in `formatting.py`.
* MCP imports (no `from .mcp_server import ...`).
Allowed imports in `cli.py`:
```python
import asyncio
import json
import sys

import typer

from . import config, formatting, service
from .models import EiendomUnit, FinnAd, SimilarUnit  # only for type hints
```
`tests/test_architecture.py` enforces this.
## When uncertain about typer
Use `context7` instead of guessing:
```
context7:resolve-library-id → "fastapi/typer" or similar
context7:query-docs(id, "Typer sub-apps and option groups")
```
See `docs.instructions.md`.
@@ -0,0 +1,118 @@
---
name: Documentation lookups via context7 MCP
description: How and when to use the context7 MCP server for library documentation
applyTo: "**/*.py,**/*.md,**/*.toml,**/*.yaml,**/*.yml"
---
# Documentation lookups — use context7
When you are uncertain about a library's API, **call the `context7` MCP server before writing code**. Do not rely on training-data memory. Pydantic, FastMCP, Typer, httpx, and pytest all evolve quickly; what was true two releases ago is often wrong now.
## When to use context7
Use it **before** writing code involving any of these:
* **FastMCP / MCP Python SDK** — `@mcp.tool()` signatures, `ToolAnnotations`, `mcp.run(transport=...)`, resource and prompt decorators, server lifecycle, streamable-HTTP setup.
* **Pydantic v2** — `BaseModel`, `Field`, `ConfigDict`, `model_validator`, `field_validator`, `model_dump` / `model_dump_json`, discriminated unions, `Annotated[...]` with validators.
* **Typer** — `Typer()` apps, `typer.Option`, `typer.Argument`, sub-apps via `add_typer`, callbacks, exit codes, testing with `CliRunner`.
* **httpx** — `AsyncClient`, timeouts, transports, retries, `Response` API.
* **respx** — mocking httpx, `respx.mock`, `route.mock`, match patterns.
* **msgpack** — packing/unpacking, type extensions, raw vs string mode.
* **base64** — `urlsafe_b64encode`, padding handling.
* **pytest** / **pytest-asyncio** — fixtures, parametrize, async tests, markers, `tmp_path`, `monkeypatch`.
* **BeautifulSoup** / **lxml** — selectors, parser flavors, element traversal.
* **typer.testing.CliRunner** — invoking apps, asserting on stdout/stderr/exit codes.
Use it **also** when:
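* A test fails or warns in a way that points to Pydantic v1 vs v2 confusion, for example `AttributeError: 'MyModel' object has no attribute 'model_dump'` (v1 installed) or a `PydanticDeprecatedSince20` warning from calling `.dict()` on a v2 model.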
* You see a `DeprecationWarning` from a third-party library and aren't sure of the modern replacement.
* You're about to copy a code pattern from memory that feels "old".
## When NOT to use it
* Pure Python stdlib (`json`, `pathlib`, `dataclasses`, `typing`) — these are stable and well-known.
* Project-internal modules — read the source.
* Generic programming questions ("what's a list comprehension") — use your own knowledge.
* FINN / Eiendom.no API behavior — these are not in context7. Use fixtures from prior runs in `tests/fixtures/` and the endpoint notes in `PRD.md` §9.
## How to use it
Two-step pattern:
### 1. Resolve the library ID
```
context7:resolve-library-id(query="fastmcp")
context7:resolve-library-id(query="pydantic")
context7:resolve-library-id(query="typer")
```
Returns the canonical library ID (e.g. `pydantic/pydantic`, `fastapi/typer`). Pick the most-starred / official-looking match.
### 2. Query the docs
```
context7:query-docs(
    context7CompatibleLibraryID="pydantic/pydantic",
    topic="field validators v2 mode after",
    tokens=3000,
)
```
* **Keep the topic focused.** "Pydantic v2 field validators with mode=after on Optional[str]" beats "Pydantic validation".
* **Cap tokens** to roughly what you need (1500 to 4000 is usually plenty). The default is fine for most calls.
* **Use library-specific terminology** in the topic — "discriminator field" for Pydantic, "tool annotations" for FastMCP, "sub-apps" for Typer.
### Worked examples
**Q: How do I declare a FastMCP tool with read-only annotations?**
```
context7:resolve-library-id(query="modelcontextprotocol python sdk")
context7:query-docs(context7CompatibleLibraryID="<resolved id>",
                    topic="FastMCP @mcp.tool ToolAnnotations readOnlyHint")
```
**Q: How do I write a Pydantic v2 model_validator that runs after field validation?**
```
context7:resolve-library-id(query="pydantic")
context7:query-docs(context7CompatibleLibraryID="pydantic/pydantic",
                    topic="model_validator mode='after' v2")
```
**Q: How do I mock an async httpx POST with respx?**
```
context7:resolve-library-id(query="respx")
context7:query-docs(context7CompatibleLibraryID="<resolved id>",
                    topic="respx mock async httpx POST json body")
```
**Q: How do I add a Typer sub-app for `cache` commands?**
```
context7:resolve-library-id(query="typer")
context7:query-docs(context7CompatibleLibraryID="<resolved id>",
                    topic="Typer add_typer sub-application command groups")
```
## After the lookup
* Cite or summarize what you found in a code comment **only when** the snippet documents a non-obvious API choice — otherwise the code is enough.
* If context7 returns nothing useful, fall back to:
1. The library's official docs site.
2. The library's repo `README` / `examples/`.
3. The smallest possible spike (a 5-line script in the venv) to verify behavior.
## Anti-patterns
* **Don't** invent a method signature from memory and hope. If you're not 100% sure of an API, look it up.
* **Don't** copy patterns from old Stack Overflow answers without verifying — Pydantic, FastMCP, and Typer all had breaking changes recently.
* **Don't** silence a warning instead of fixing the deprecation. Look up the modern API.
* **Don't** query context7 for FINN or Eiendom.no — those endpoints aren't in any public docs index. Use `tests/fixtures/` and `PRD.md` §9.
## Network configuration note
`context7` is configured as a connected MCP server in this environment. If a call fails with a connection error, surface it clearly — don't fall back to guessing.
@@ -0,0 +1,192 @@
---
name: MCP rules
description: Rules for FastMCP tools, resources, and prompts
applyTo: "finn_eiendom/mcp_server.py,finn_eiendom/**/*mcp*.py"
---
# MCP server rules
The MCP server is a **thin wrapper** over `service.py`. It owns:
* Tool registration with `@mcp.tool()` and annotations.
* Pydantic input schemas (these double as tool documentation).
* Error wrapping at the protocol boundary.
* JSON / markdown response formatting via `formatting.py`.
It does **not** own:
* Parsing, scraping, scoring, cache, or HTTP fetching logic.
* SQLite or `httpx` access.
* Any orchestration of "check cache, else fetch, else save" — that's `service.py`.
## Server bootstrap
```python
# finn_eiendom/mcp_server.py
import logging
import sys

from mcp.server.fastmcp import FastMCP

# stdio transport: logs must go to stderr so stdout stays a clean JSON-RPC channel.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger(__name__)

mcp = FastMCP("finn_eiendom_mcp")

# ... tools registered here ...


def main() -> None:
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
```
stdio servers **must** log to stderr only — anything on stdout breaks the JSON-RPC frame.
## Tool naming
All tools use the `finn_` prefix so they don't collide with tools from other MCP servers loaded in the same client (such as Claude Desktop):
* `finn_analyze_search`
* `finn_get_ad`
* `finn_compare_ads`
* `finn_save_feedback`
* `finn_get_shortlist`
* `finn_get_new_ads_since_last_run`
* `finn_resolve_eiendom_unit`
* `finn_get_eiendom_unit`
* `finn_enrich_ad`
* `finn_build_unit_vector`
* `finn_decode_unit_vector`
* `finn_get_similar_units`
* `finn_find_similar_to_liked_ad`
* `finn_analyze_ad_against_comps`
## Tool body shape
Every tool body looks like this:
```python
@mcp.tool(
    annotations=ToolAnnotations(
        title="Analyze a FINN search URL",
        readOnlyHint=True,
        destructiveHint=False,
        openWorldHint=True,
    )
)
async def finn_analyze_search(input: AnalyzeSearchInput) -> str:
    """Analyze a FINN search URL and return a ranked shortlist."""
    try:
        result = await service.analyze_search(
            search_url=input.search_url,
            max_pages=input.max_pages,
            detail_limit=input.detail_limit,
            include_details=input.include_details,
            include_eiendom_no=input.include_eiendom_no,
            include_similar_units_for_shortlist=input.include_similar_units_for_shortlist,
        )
        return formatting.render_shortlist(result, input.response_format)
    except Exception as e:
        log.exception("finn_analyze_search failed")
        return json.dumps({
            "error": True,
            "code": type(e).__name__,
            "message": str(e),
        })
```
Notes:
* Every tool delegates to `service.<function>` in one call.
* Every tool wraps in try/except and returns the error envelope as a JSON string.
* Output rendering goes through `formatting.py`, never inline.
* If the tool body needs more than ~20 lines, logic has leaked out of the service layer — push it back down.
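Because every tool repeats the same `except` block, the envelope itself is a candidate for one shared helper. A minimal sketch; the name `error_envelope` is hypothetical, and where it should live (for example `formatting.py`, since it renders output) is a judgment call against `PRD.md` §17.2:

```python
import json
import logging

log = logging.getLogger(__name__)


def error_envelope(tool_name: str, exc: Exception) -> str:
    """Log a failed tool call with its traceback and return the JSON error envelope."""
    log.error("%s failed", tool_name, exc_info=exc)
    return json.dumps({"error": True, "code": type(exc).__name__, "message": str(exc)})
```

A tool's `except Exception as e:` branch then shrinks to `return error_envelope("finn_analyze_search", e)`, which keeps the envelope shape identical across all tools.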
## Input schemas
Every tool has a Pydantic v2 input model. Schemas live with the tool in `mcp_server.py` (they document the tool to LLM clients). Reuse from `models.py` only when the same shape is also a domain object — otherwise keep them as tool-local input types.
```python
class AnalyzeSearchInput(BaseModel):
    search_url: str = Field(..., description="Full FINN search URL")
    max_pages: int = Field(default=3, ge=1, le=10)
    detail_limit: int = Field(default=20, ge=1, le=100)
    include_details: bool = True
    include_eiendom_no: bool = True
    include_similar_units_for_shortlist: bool = False
    response_format: Literal["json", "markdown"] = "json"
```
## Annotations
Set the right hints:
* Read-only tools (most of them): `readOnlyHint=True, destructiveHint=False, openWorldHint=True`.
* `finn_save_feedback`: `readOnlyHint=False, destructiveHint=False, idempotentHint=False`.
## Response format
Tools accept a `response_format` parameter (`"json"` or `"markdown"`):
* `"json"`: return `formatting.render_<thing>(result, "json")`, the full structured result serialized as JSON.
* `"markdown"`: return `formatting.render_<thing>(result, "markdown")`.
Errors are always returned as the JSON error envelope regardless of `response_format`.
## What stays out of mcp_server.py
* `import httpx` — never.
* `import sqlite3` — never.
* `from .ad import ...`, `from .search import ...`, `from .eiendom_no import ...`, `from .scoring import ...`, `from .cache import ...`, `from .http import ...` — never. Go through `service`.
* Output formatting logic — goes in `formatting.py`.
* Cache management — goes in `service.py`.
Allowed imports in `mcp_server.py`:
```python
import json
import logging
import sys
from typing import Literal

from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations
from pydantic import BaseModel, Field

from . import config, formatting, service
from .models import EiendomUnit, FinnAd, SimilarUnit  # only if needed for type hints
```
`tests/test_architecture.py` enforces this.
## Resources and prompts
When you add resources or prompts, they follow the same rule: thin wrappers over `service.py` and `formatting.py`. Resources:
```
finn://preferences/current
finn://search-runs/latest
finn://search-runs/{id}
finn://ads/{finnkode}
finn://ads/{finnkode}/enriched
finn://shortlist/latest
finn://feedback/{finnkode}
finn://eiendom-units/{unitCode}
finn://eiendom-units/{unitCode}/similar/{listingStatus}
```
Prompts: `evaluate_property_for_user`, `compare_properties_for_user`, `refine_search_from_feedback`, `find_more_like_this`.
## When uncertain about FastMCP
Use `context7` for FastMCP / MCP SDK questions instead of guessing:
```
context7:resolve-library-id → "modelcontextprotocol/python-sdk" or similar
context7:query-docs(id, "FastMCP tool annotations") → snippets
```
See `docs.instructions.md`.
## Transports
* Default: stdio. `finn-eiendom-mcp` is the entry point.
* Optional: Streamable HTTP via `finn-eiendom serve --transport http --port 8010`. Path: `POST /mcp`. Operational endpoints: `GET /health`, `GET /version`, `GET /debug/config`.
* Keep tools transport-agnostic. No request/response shape depends on the transport.
@@ -0,0 +1,80 @@
---
name: Python project rules
description: Python conventions for the FINN/Eiendom MCP server
applyTo: "**/*.py"
---
# Python conventions
## Runtime
* Python **3.12+**.
* Project-local virtualenv at `.venv/` (created by `uv venv` or `python3.12 -m venv .venv`).
* All commands run inside the activated venv.
* Editable install: `uv pip install -e ".[dev]"` (or `pip install -e ".[dev]"`).
* Never install packages globally; never use `sudo pip`; never mutate host Python.
* Add new dependencies to `pyproject.toml` in the same change that uses them.
## Language
* Use Python 3.12 syntax. Prefer `X | None` over `Optional[X]`, `list[int]` over `List[int]`, structural pattern matching where it actually helps.
* **Type hints on every function signature**, including private helpers. `mypy --strict finn_eiendom` is the target.
* Async-first for I/O. Sync code is fine for parsing, scoring, and cache access (SQLite).
* Pydantic v2 for all structured domain models, with `model_config = ConfigDict(...)`. No v1 `class Config:` blocks.
## Prefer
* Small, pure functions for parsing, normalization, and scoring.
* Explicit return types and explicit exceptions.
* Dependency injection for HTTP clients and DB connections in tests (pass `client` / `conn` as args; let services own the defaults).
* Domain names from the PRD (`FinnAd`, `EiendomUnit`, `SimilarUnit`, `analyze_search`, `get_or_fetch_ad`).
* `dataclass` for internal value objects that don't cross the API boundary; Pydantic for anything serialized or validated.
## Avoid
* Global mutable state (module-level dicts as caches, etc.). The only allowed module-level state is configuration loaded from env in `config.py`.
* Hardcoded URLs, credentials, or paths outside `config.py`, and unnamed magic numbers anywhere (use `config.py` or a named module-level constant).
* `httpx` imports anywhere except `finn_eiendom/http.py`.
* `sqlite3` imports anywhere except `finn_eiendom/cache.py`.
* `BeautifulSoup` imports anywhere except `finn_eiendom/search.py` and `finn_eiendom/ad.py`.
* `msgpack` imports anywhere except `finn_eiendom/eiendom_no.py`.
* Scraping, scoring, cache, or HTTP fetching logic inside MCP tool or CLI command bodies.
* Direct network calls in unit tests — use `respx` and fixtures.
* `print()` for logging — use the `logging` module. stdio MCP server logs go to **stderr only**.
* Bare `except:` or `except Exception: pass` — catch the specific exception or let it propagate.
## External fetches
All external fetches must support:
* Configurable request delay (`FINN_REQUEST_DELAY_SECONDS`, `EIENDOM_NO_REQUEST_DELAY_SECONDS`).
* Cache lookup before fetch.
* Retry on 5xx with exponential backoff (`1s, 2s, 4s`).
* Graceful failure that returns `None` or empty rather than raising, when the caller can degrade.
* Structured logging at INFO for success, WARNING for retry, ERROR for final failure.
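
A minimal sketch of the retry rule, assuming a hypothetical `get_with_retry` helper inside `http.py`. It wraps any async `send()` that returns an object with a `status_code` (an `httpx.Response` qualifies). `sleep` is injected so tests can record the backoff delays instead of waiting. The request delay and the INFO/WARNING/ERROR logging are left out to keep it short.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Protocol, TypeVar


class HasStatus(Protocol):
    status_code: int


R = TypeVar("R", bound=HasStatus)

RETRYABLE_STATUSES = frozenset({500, 502, 503, 504})


async def get_with_retry(
    send: Callable[[], Awaitable[R]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> R:
    """Call `send`; on a retryable 5xx, wait base_delay * 2**n (1s, 2s, 4s) and try again."""
    response = await send()
    for attempt in range(retries):
        if response.status_code not in RETRYABLE_STATUSES:
            return response  # success, or a 4xx the caller should raise on immediately
        await sleep(base_delay * 2**attempt)
        response = await send()
    # Retries exhausted: return the last response so the caller can raise or degrade.
    return response
```

A 4xx comes back from the first call without any sleep. The caller can then call `response.raise_for_status()` and surface `httpx.HTTPStatusError` immediately.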
## Best practices
* **Single responsibility per function.** If a function name needs "and" to describe it, it's two functions.
* **Function length:** aim for under 30 lines. Past 50 lines it's a code smell — extract helpers.
* **Nesting depth:** if you've got more than 3 levels of nesting, the function wants splitting.
* **Naming:** `get_or_fetch_ad`, not `process_ad`. Verbs for actions, nouns for data. Avoid abbreviations except those well-known in the domain (`url`, `ad`, `nok`).
* **DRY:** if you write the same logic, regex, SQL, or format string twice, extract it. The decision table in `PRD.md` §17.2 tells you where it belongs.
* **Comments explain WHY**, not WHAT. The code already says what.
* **Errors are loud:** raise with actionable messages (`f"Unknown listing_status {status!r}; expected one of {VALID_STATUSES}"`). The MCP boundary wraps them as `{"error": True, ...}`.
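
A sketch of both halves of the error rule, using hypothetical names. The first half is a validator that raises with an actionable message. The second is a boundary helper that serializes any exception into the `{"error": True, "code": ..., "message": ...}` envelope. Using the exception class name as `code` is an assumption, and the two statuses are only an illustrative subset of `VALID_STATUSES`.

```python
import json

VALID_STATUSES = ("FOR_SALE", "RECENTLY_SOLD")  # illustrative subset


def validate_listing_status(status: str) -> str:
    if status not in VALID_STATUSES:
        raise ValueError(f"Unknown listing_status {status!r}; expected one of {VALID_STATUSES}")
    return status


def error_envelope(exc: Exception) -> str:
    """Serialize an exception at the MCP boundary, the one place errors stop propagating."""
    return json.dumps({"error": True, "code": type(exc).__name__, "message": str(exc)})
```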
## When uncertain about a library API
Use the `context7` MCP server **before** writing code:
1. `context7:resolve-library-id` with the package name → canonical library ID.
2. `context7:query-docs` with that ID + focused topic.
See `docs.instructions.md`. Don't guess from training memory — Pydantic, FastMCP, and Typer all change.
## Tooling
* `ruff check .` — lint. Target Python 3.12. Active rules: `E F I UP B SIM`.
* `ruff format .` — format. Line length 100.
* `mypy --strict finn_eiendom` — type-check.
* `pytest` — run the full suite.
@@ -0,0 +1,199 @@
---
name: Test rules
description: Testing conventions for parser, cache, scoring, service, MCP, CLI, and architecture
applyTo: "tests/**/*.py"
---
# Test rules
## Runtime
Tests run in the project-local `.venv`. From the project root with the venv activated:
```bash
pytest # full suite
pytest tests/test_service.py -v # one file
pytest -k "shortlist" # one keyword
pytest --lf # rerun last failures
```
`asyncio_mode = "auto"` (a pytest-asyncio option) is set in `[tool.pytest.ini_options]`, so `async def` tests run without an `@pytest.mark.asyncio` decorator.
## Never do live network calls
No real HTTP in unit tests. Mock with `respx` (sits in front of `httpx.AsyncClient`):
```python
import httpx
import respx

from finn_eiendom import http as http_module
from tests.fixtures import SAMPLE_FINN_SEARCH_HTML  # loader constant from tests/fixtures.py


@respx.mock
async def test_finn_search_fetch_uses_user_agent() -> None:
    route = respx.get("https://www.finn.no/realestate/homes/search.html").mock(
        return_value=httpx.Response(200, html=SAMPLE_FINN_SEARCH_HTML)
    )
    client = http_module.HTTPClient(user_agent="test-agent")
    resp = await client.get("https://www.finn.no/realestate/homes/search.html")
    assert resp.status_code == 200
    assert route.calls.last.request.headers["user-agent"] == "test-agent"
```
## Fixtures
Fixture-driven testing for parsers and APIs:
* FINN search HTML → `tests/fixtures/finn_search.html`.
* FINN listing HTML → `tests/fixtures/finn_ad_*.html`.
* Eiendom.no unit search JSON → `tests/fixtures/eiendom_unit_search.json`.
* Eiendom.no unit detail JSON → `tests/fixtures/eiendom_unit_detail.json`.
* Eiendom.no similar-units JSON → `tests/fixtures/eiendom_similar.json`.
Loader helpers live in `tests/fixtures.py` (e.g. `SAMPLE_FINN_SEARCH_HTML`, `SAMPLE_EIENDOM_UNIT_JSON`). Put new fixture files in `tests/fixtures/` and their loaders in `tests/fixtures.py`. Don't inline large strings in test files.
## Test layout
```
tests/
fixtures/ # raw HTML / JSON inputs
fixtures.py # loader helpers
conftest.py # shared pytest fixtures (tmp DB, http client, etc.)
test_parser.py # number/area/date/URL/finnkode normalization
test_search.py # FINN search HTML → cards
test_ad.py # FINN listing HTML → FinnAd
test_eiendom_no.py # unit search/detail/similar JSON, unit_vector encode/decode
test_scoring.py # all scoring components + classifier
test_cache.py # SQLite read/write/TTL
test_http.py # retry on 5xx, raise on 4xx, delay applied (new)
test_service.py # get_or_fetch_*, analyze_* (new)
test_formatting.py # render_* json/markdown/table (new)
test_mcp_server.py # tool registration + error envelope (expanded)
test_cli.py # typer CliRunner (new)
test_architecture.py # import-graph invariants (new)
```
## What to test per category
### Parsers (`test_parser`, `test_search`, `test_ad`, `test_eiendom_no`)
* Missing fields → `None`, not exception.
* Norwegian number formats: `7 200 991 kr`, `kr 7 200 991`, `7.200.991`.
* URL normalization (relative → absolute).
* Finnkode extraction from various URL shapes.
* Area parsing: `77 m²`, `77m2`, `77 kvm`.
* Price parsing (asking vs total vs shared debt).
* Eiendom.no JSON edge cases: empty `units`, missing `valuation`, missing `latestMarketData`.
### Unit vectors (`test_eiendom_no`)
* msgpack encoding + base64url without padding.
* Decode roundtrip.
* Missing optional fields (floor, rooms, built).
* Both lon/lat orderings handled.
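
The unpadded base64url step is the part most likely to break a roundtrip. Here is a stdlib-only sketch of it, using hypothetical helper names. In `eiendom_no.py` the bytes would come from `msgpack.packb(...)` and go back through `msgpack.unpackb(...)`. The field layout of the vector itself is not shown.

```python
import base64


def b64url_encode_nopad(raw: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(token: str) -> bytes:
    """Inverse of b64url_encode_nopad: restore padding to a multiple of 4, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)
```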
### Scoring (`test_scoring`)
* Each component in isolation.
* Total clamped to 0–100.
* Risk penalties applied (negative range).
* Bargain classification triggers on the expected signal mix.
* Hybel classification: documented / possible / unclear / not relevant.
* Explainability: explanation list non-empty when score is non-trivial.
### Cache (`test_cache`)
* Read after write returns same object.
* TTL expiry returns `None`.
* JSON roundtrip preserves all fields.
* `init_db` is idempotent on existing DBs.
### HTTP (`test_http`)
* Retries on 500/502/503/504 with backoff (count exactly N retries).
* Raises immediately on 404 / 4xx.
* Applies `request_delay` between calls.
* Honors `user_agent`.
### Service (`test_service`)
The service tests are the heart of the suite. They cover orchestration end-to-end against fixtures.
* `test_get_or_fetch_ad_uses_cache`: second call hits cache, no HTTP.
* `test_get_or_fetch_ad_fetches_when_cache_miss`: first call hits HTTP, then writes cache.
* `test_get_or_fetch_ad_force_refresh`: `force_refresh=True` bypasses cache.
* `test_analyze_search_with_fixtures`: full run from search HTML → shortlist.
* `test_find_similar_to_liked_uses_liked_feedback`: only seeds from `liked` verdicts.
Use a tmp SQLite DB via the `tmp_path` pytest fixture:
```python
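from pathlib import Path

import pytest


@pytest.fixture
def tmp_db(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    db_path = tmp_path / "finn.sqlite"
    # Only effective if config.py reads the env var after this point; if it caches
    # FINN_CACHE_PATH at import time, patch that module attribute as well.
    monkeypatch.setenv("FINN_CACHE_PATH", str(db_path))
    return db_path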
```
### Formatting (`test_formatting`)
* `render_shortlist(result, "json")` is parseable JSON and roundtrips.
* `render_shortlist(result, "markdown")` contains the score and at least one risk.
* `render_<thing>(result, "xml")` raises `ValueError` listing supported formats.
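
A sketch of the behavior these tests assert. It assumes a hypothetical `_check_format` helper shared by every `render_*` function and a `result` dict with an `ads` list. The markdown and table layouts are placeholders. `fmt` is typed as `str` here so the `"xml"` case can be exercised.

```python
import json
from typing import Any

SUPPORTED_FORMATS = ("json", "markdown", "table")


def _check_format(fmt: str) -> None:
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")


def render_shortlist(result: dict[str, Any], fmt: str) -> str:
    _check_format(fmt)  # fail before doing any work
    ads = result["ads"]
    if fmt == "json":
        return json.dumps(result, ensure_ascii=False, indent=2)
    if fmt == "markdown":
        return "\n".join(
            f"- **{ad['title']}**: score {ad['score']}; risks: {', '.join(ad['risks']) or 'none'}"
            for ad in ads
        )
    # fmt == "table": fixed-width columns for terminal output
    rows = [f"{ad['score']:>5}  {ad['title']}" for ad in ads]
    return "\n".join(["score  title", *rows])
```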
### MCP (`test_mcp_server`)
* `test_mcp_server_has_correct_tools` — all 14 `finn_*` tool names registered.
* `test_finn_decode_unit_vector_returns_json` — happy path.
* `test_finn_analyze_search_handles_error` — error envelope shape: `{"error": True, "code": ..., "message": ...}`.
Use the `mcp` SDK's testing helpers; don't spawn a subprocess.
### CLI (`test_cli`)
Use Typer's `CliRunner`:
```python
from typer.testing import CliRunner

from finn_eiendom.cli import app

runner = CliRunner()


def test_cli_help() -> None:
    result = runner.invoke(app, ["--help"])
    assert result.exit_code == 0
    assert "analyze-search" in result.stdout
```
Patch `service.<function>` with `monkeypatch` so CLI tests don't exercise the full stack — that's covered by `test_service.py`.
### Architecture (`test_architecture`)
Static checks of the module dependency graph:
* No `import httpx` outside `finn_eiendom/http.py`.
* No `import sqlite3` outside `finn_eiendom/cache.py`.
* No `BeautifulSoup` import outside `search.py` and `ad.py`.
* No `msgpack` import outside `eiendom_no.py`.
* `mcp_server.py` only imports from `service`, `formatting`, `models`, `config`, `mcp`, stdlib, `pydantic`.
* `cli.py` only imports from `service`, `formatting`, `models`, `config`, `typer`, stdlib.
* `service.py` does not import from `mcp_server` or `cli`.
Implementation: walk `.py` files under `finn_eiendom/` with `ast`, collect imports, assert allowed sets per module.
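
A minimal sketch of that walk for the single-owner rules. It records the top-level package of every absolute `import` and `from ... import`; relative imports are project-internal and are skipped. It then reports `file:line` for each import made outside its owner. `BeautifulSoup` is imported from the `bs4` package. The allow-lists for `mcp_server.py` and `cli.py` would be a second check in the same style. `OWNERS`, `imported_modules`, and `check_package` are hypothetical names.

```python
import ast
from pathlib import Path

# Third-party module -> the only files allowed to import it (mirrors the rules above).
OWNERS: dict[str, set[str]] = {
    "httpx": {"http.py"},
    "sqlite3": {"cache.py"},
    "bs4": {"search.py", "ad.py"},
    "msgpack": {"eiendom_no.py"},
}


def imported_modules(source: str) -> list[tuple[str, int]]:
    """Return (top-level module, line number) for every absolute import in `source`."""
    found: list[tuple[str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend((alias.name.split(".")[0], node.lineno) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append((node.module.split(".")[0], node.lineno))
    return found


def ownership_violations(filename: str, source: str) -> list[str]:
    return [
        f"{filename}:{lineno} imports {module}, which only {sorted(OWNERS[module])} may import"
        for module, lineno in imported_modules(source)
        if module in OWNERS and filename not in OWNERS[module]
    ]


def check_package(package_dir: Path) -> list[str]:
    violations: list[str] = []
    for path in sorted(package_dir.rglob("*.py")):
        violations += ownership_violations(path.name, path.read_text(encoding="utf-8"))
    return violations
```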
## Best practices
* One assertion per test (or per closely related group). When a long test fails, it is hard to tell which behavior broke.
* Test names describe the behavior: `test_get_or_fetch_ad_uses_cache_within_ttl`.
* Use `monkeypatch` for env vars and `tmp_path` for files. No `os.environ` mutation.
* No `time.sleep` — use `freezegun` if a test depends on time, or refactor the code under test to take a `now` parameter.
* No "smoke tests" that ping real servers — those go under a separately-marked `pytest -m live` suite and are not part of CI.
## When uncertain about test tooling
Use `context7` for pytest, respx, freezegun, or Typer testing:
```
context7:resolve-library-id → "pytest-dev/pytest" / "lundberg/respx"
context7:query-docs(id, "respx mock httpx async post")
```
See `docs.instructions.md`.
@@ -0,0 +1,33 @@
# Python
__pycache__/
*.py[cod]
*.egg-info/
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
htmlcov/
# Virtualenvs
.venv/
venv/
# uv
# uv.lock
# Env
.env
.env.local
# Data/cache
data/*.sqlite
data/*.sqlite-*
data/*.db
data/*.db-*
# Editor
.DS_Store
.idea/
# Logs
*.log
@@ -0,0 +1,10 @@
{
  "recommendations": [
    "github.copilot",
    "github.copilot-chat",
    "ms-python.python",
    "charliermarsh.ruff",
    "ms-azuretools.vscode-docker",
    "tamasfe.even-better-toml"
  ]
}
@@ -0,0 +1,8 @@
{
  "servers": {
    "context7": {
      "type": "http",
      "url": "https://mcp.context7.com/mcp",
    },
  },
}
@@ -0,0 +1,23 @@
{
  "python.defaultInterpreterPath": ".venv/bin/python",
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,
  "python.testing.pytestArgs": [
    "tests"
  ],
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "ruff.enable": true,
  "chat.instructionsFilesLocations": {
    ".github/instructions": true
  },
  "github.copilot.chat.codeGeneration.useInstructionFiles": true,
  "files.exclude": {
    "**/__pycache__": true,
    "**/.pytest_cache": true,
    "**/.mypy_cache": true,
    "**/.ruff_cache": true
  }
}
@@ -0,0 +1,178 @@
# AGENTS.md — Workflow for AI agents on finn-eiendom-mcp
This is the master doc for any AI agent (Claude, Copilot, Cursor, etc.) working in this repo. Read this first, then the more specific files it references.
---
## Read order
Before changing code, read:
1. **`PRD.md`** — what we're building and why. Especially §17 ("Code ownership and anti-duplication") — that section is the constitution.
2. **`PROJECT.md`** — module map.
3. This file — workflow.
4. The relevant `.github/instructions/*.md`:
   * `python.instructions.md` — Python conventions.
   * `mcp.instructions.md` — MCP tool rules.
   * `cli.instructions.md` — CLI command rules.
   * `tests.instructions.md` — testing conventions.
   * `clean-code.instructions.md` — best practices and DRY enforcement.
   * `docs.instructions.md` — when and how to use the **context7** MCP server for library documentation.
If something in code contradicts the PRD, the PRD wins. If you change behavior, update both the PRD and the relevant instruction file in the same change.
---
## Runtime — local venv (default)
This project runs in a project-local virtualenv. Docker is supported for packaging but is not required for development.
### One-time setup
```bash
# from the project root
uv venv # or: python3.12 -m venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
```
Python **3.12+** is required.
### Daily commands
All commands are run inside the activated `.venv`:
```bash
pytest # tests
ruff check . # lint
ruff format . # format
mypy finn_eiendom # type-check
finn-eiendom --help # CLI entrypoint
finn-eiendom-mcp # MCP server (stdio)
finn-eiendom serve --transport http --port 8010 # MCP server (HTTP)
```
### Never
* Install packages globally (`pip install ...` outside a venv).
* Use `sudo pip`.
* Mutate the host Python.
* Add dependencies without updating `pyproject.toml`.
### Adding a dependency
```bash
uv pip install <package> # ad-hoc, then:
# edit pyproject.toml to record it
uv pip install -e ".[dev]" # reinstall in editable mode
```
---
## Architecture in one screen
```
cli.py (typer)          mcp_server.py (FastMCP)    ← thin, parallel front ends
         \                    /
          \                  /
               service.py                          ← orchestration: get_or_fetch, analyze_*
               analysis.py                         ← shortlist + summary
   search / ad / eiendom_no / scoring / feedback
            parser / http / cache
    FINN HTML + Eiendom.no JSON + SQLite
```
`formatting.py` sits next to `service.py` and is shared by both CLI and MCP for `json`, `markdown`, and `table` rendering.
**The single-home rule:** every piece of logic has exactly one home. If you're tempted to add it in two places, you're wrong about one — push it down a layer and call it from both. See `PRD.md` §17.2 for the full ownership table.
---
## The five hard rules
These are non-negotiable. Architecture tests in `tests/test_architecture.py` enforce them.
1. **`mcp_server.py` and `cli.py` are siblings.** They never call each other. Both call only `service`, `formatting`, `models`, and `config`.
2. **`service.py` is the only place that combines cache + fetch.** Nothing above it touches HTTP or SQLite directly.
3. **`httpx` lives in `http.py`. Nowhere else.**
4. **`sqlite3` lives in `cache.py`. Nowhere else.**
5. **Output formatting lives in `formatting.py`.** No inline rendering in CLI or MCP tool bodies.
If you have to break one of these to ship a feature, the feature is wrong — fix the design first.
---
## Adding a feature — the checklist
For any new tool / command / behavior:
1. Decide the home using the table in `PRD.md` §17.2.
2. Write the function in `service.py` (or extend `analysis.py` if it's pure orchestration).
3. Add a test in `tests/test_service.py`.
4. Add a thin MCP tool in `mcp_server.py`; it must be `response_format`-aware.
5. Add a thin CLI command in `cli.py`; it must be `--format`-aware.
6. Add the renderer in `formatting.py` if output is non-trivial.
7. Add tests in `tests/test_mcp_server.py` and `tests/test_cli.py`.
8. Update `PRD.md` and any affected `.github/instructions/*.md`.
If steps 4 or 5 need more than ~20 lines, logic has leaked out of the service layer. Push it back down.
---
## Clean code
See `.github/instructions/clean-code.instructions.md`. Highlights:
* Type hints everywhere.
* Functions stay small; one job per function.
* Names describe intent (`get_or_fetch_ad`, not `process`).
* Comments explain **why**, never **what** the code already says.
* DRY: if you write the same regex / SQL / format string twice, extract it.
* Errors fail loudly with actionable messages. No silent `except: pass`.
* No dead code, no commented-out blocks left in the tree.
---
## Documentation lookups — use context7
When uncertain about a library's API (FastMCP decorators, Pydantic v2 validators, Typer command patterns, httpx async, msgpack, pytest-asyncio, respx, BeautifulSoup selectors, etc.), **use the `context7` MCP server**. Do not guess from training-data memory.
Pattern (full details in `.github/instructions/docs.instructions.md`):
1. `context7:resolve-library-id` with the library name → get the canonical ID.
2. `context7:query-docs` with that ID + a focused topic.
Use context7 *before* writing the code, not after a test fails. If context7 returns nothing useful, search the library's official docs, then write the smallest possible spike to verify.
---
## Safety and compliance
* Private, low-frequency use only.
* Respect FINN / Eiendom.no rate limits and bot protection.
* Cache aggressively; never bulk-harvest.
* stdio MCP servers log to **stderr only** — anything on stdout breaks the JSON-RPC frame.
* Scores and estimates are decision support, never legal / technical / financial advice.
---
## Implementation order (Phase 2)
Follow `PRD.md` §29 step-by-step. Each step is independently mergeable:
1. Switch dev workflow to local venv + update instruction files (this change).
2. Pydantic v2 cleanup.
3. Service layer + tests.
4. Formatting layer + tests.
5. HTTP retry on 5xx + tests.
6. Replace FastAPI with FastMCP stdio server.
7. CLI with typer.
8. Diff workflow.
9. Compare workflow.
10. Similar-to-liked.
11. Architecture tests.
12. README + Claude Desktop config.
@@ -0,0 +1,384 @@
# IMPLEMENTATION.md — Phase 2 build runbook
How to drive Phase 2 (the 12 steps in `PRD.md` §29) to completion using an AI agent. Each step has its own kickoff prompt, files affected, and "done" criteria. Run them in order. Each step is independently mergeable.
---
## 0. Pre-flight
Before starting step 1:
1. `ls -la`
2. **Venv is healthy.** From the project root:
```bash
source .venv/bin/activate
pytest -x # green except for any pre-existing FastMCP-related skips
ruff check . # zero issues
```
3. **Docs are in place.** Re-confirm `PRD.md` §17 (code ownership) is current — every step below references it.
If any of these fail, stop and fix before proceeding.
---
## How to use this runbook
For each step:
1. Create a feature branch: `git checkout -b feat/phase2-step-<N>-<slug>` off `chore/cleanup-phase-2-prep`.
2. Open a fresh agent chat with repo access. Paste the kickoff prompt verbatim.
3. Let the agent propose, implement, and test. Push back where it skips tests or violates §17.
4. When all "done" boxes are checked, merge into `chore/cleanup-phase-2-prep`.
5. Move to the next step.
Each kickoff prompt assumes the agent reads PRD.md, AGENTS.md, and the relevant instruction files first — that's encoded in the prompt.
After step 12, merge `chore/cleanup-phase-2-prep` into `main`.
---
## Step 1 — Dev workflow already switched to local venv
This step is **done** by the time `CLEANUP.md` is merged. The instruction files and `AGENTS.md` already use local venv. Sanity check:
```bash
source .venv/bin/activate
which finn-eiendom 2>/dev/null || echo "expected: not yet installed; entry points come in steps 6 and 7"
ruff check . # zero issues
pytest -x # green (allow mcp_server failures)
```
Move on.
---
## Step 2 — Pydantic v2 cleanup
### Kickoff prompt
> Read **PRD.md** (especially §17 code ownership and A8 acceptance criterion), **`.github/instructions/python.instructions.md`**, and **`.github/instructions/clean-code.instructions.md`**.
>
> Implement Phase 2 step 2: convert every Pydantic model in `finn_eiendom/models.py` from v1 (`class Config:`) to v2 (`model_config = ConfigDict(...)`). Use `context7:query-docs` on `pydantic/pydantic` if you're not sure of the v2 syntax — don't guess.
>
> Add `tests/test_models.py` with a JSON roundtrip test per model.
>
> Run `ruff check .`, `ruff format .`, and `pytest tests/test_models.py -v` before declaring done.
### Files
* `finn_eiendom/models.py` (edit)
* `tests/test_models.py` (new)
### Done when
* `grep -rn "class Config:" finn_eiendom/` produces zero output.
* `pytest tests/test_models.py` is green.
* Existing tests still pass.
---
## Step 3 — Service layer
### Kickoff prompt
> Read **PRD.md** §16 (Service layer) and §17 (code ownership), **`.github/instructions/python.instructions.md`** and **`.github/instructions/clean-code.instructions.md`**.
>
> Create `finn_eiendom/service.py` with the public surface listed in PRD §16: `get_or_fetch_ad`, `get_or_fetch_eiendom_unit`, `get_or_fetch_similar_units`, `analyze_search`, `analyze_ad`, `analyze_ad_against_comps`, `find_similar_to_liked`, `compare_ads`, `resolve_eiendom_unit_from_finn_url`, `build_unit_vector_for_unit_code`, `decode_unit_vector_to_dict`, `save_feedback`, `get_shortlist`, `get_new_ads_since_last_run`.
>
> Each function:
> 1. Opens its own SQLite connection via `cache.init_db(FINN_CACHE_PATH)`.
> 2. Reads cache first with TTLs from `config.py`.
> 3. On miss or `force_refresh=True`, calls the fetcher in `ad.py` / `eiendom_no.py`.
> 4. Writes the fresh result back.
> 5. Returns a typed model or dict.
>
> Do not duplicate behavior from `analysis.py` — delegate to it. Add `tests/test_service.py` covering the five service tests listed in PRD §25.2.
### Files
* `finn_eiendom/service.py` (new)
* `tests/test_service.py` (new)
* `tests/conftest.py` (may need a `tmp_db` fixture if it doesn't exist)
### Done when
* `pytest tests/test_service.py` is green.
* `service.py` imports only from `models`, `config`, `cache`, `analysis`, `ad`, `eiendom_no`, `feedback`, `scoring`, stdlib.
* No `import httpx` or `import sqlite3` outside their owners.
---
## Step 4 — Formatting layer
### Kickoff prompt
> Read **PRD.md** §17.6 (shared formatting module) and §19 (output formats), **`.github/instructions/clean-code.instructions.md`**.
>
> Create `finn_eiendom/formatting.py` with these renderers (signatures in PRD §17.6): `render_ad`, `render_shortlist`, `render_comparison`, `render_diff`, `render_similar_units`, `render_unit`, `render_score_breakdown`, plus `render_cache_stats` for the CLI cache subcommand.
>
> Each renderer accepts `(payload, fmt: Literal["json","markdown","table"]) -> str`. Unsupported formats raise `ValueError` listing supported options. Table rendering only applies where it makes sense (shortlist, comparison, diff, similar-units).
>
> Add `tests/test_formatting.py` covering the three tests listed in PRD §25.5.
### Files
* `finn_eiendom/formatting.py` (new)
* `tests/test_formatting.py` (new)
### Done when
* `pytest tests/test_formatting.py` is green.
* `render_*` is the *only* place that formats output. No inline rendering anywhere else (verified by reading diffs of steps 6 and 7).
---
## Step 5 — HTTP retry on 5xx
### Kickoff prompt
> Read **PRD.md** A9 (acceptance criterion), **`.github/instructions/python.instructions.md`**.
>
> Extend `HTTPClient.get()` in `finn_eiendom/http.py` to retry on 5xx responses (500/502/503/504) with exponential backoff `1s, 2s, 4s`, making up to `retries` retries after the first attempt (default 3, so at most 4 requests). Surface 4xx as `httpx.HTTPStatusError` immediately. Apply the existing `request_delay` between any two calls.
>
> If you're unsure about `httpx` retry semantics or `respx` test patterns, use `context7`.
>
> Add `tests/test_http.py` covering the three tests listed in PRD §25.6 using `respx`.
### Files
* `finn_eiendom/http.py` (edit)
* `tests/test_http.py` (new)
### Done when
* `pytest tests/test_http.py` is green.
* `httpx` imports remain confined to `http.py`.
---
## Step 6 — Replace FastAPI with FastMCP
### Kickoff prompt
> Read **PRD.md** §14 (MCP design — every tool and input schema), §17 (code ownership), and **`.github/instructions/mcp.instructions.md`** end-to-end.
>
> Rewrite `finn_eiendom/mcp_server.py` from scratch:
> - Use `from mcp.server.fastmcp import FastMCP`.
> - Configure stderr-only logging.
> - Register all 14 tools listed in PRD §14.1 with the `finn_` prefix.
> - Each tool body has the shape in `mcp.instructions.md` §"Tool body shape": one `service.<function>` call, one `formatting.render_*` call, try/except returning the JSON error envelope.
> - Input schemas as in PRD §14.2.
> - Annotations: `readOnlyHint=True` for all except `finn_save_feedback`.
> - `main()` calls `mcp.run(transport="stdio")`.
> - Add `finn-eiendom-mcp = "finn_eiendom.mcp_server:main"` to `[project.scripts]` in `pyproject.toml`.
>
> If unsure about FastMCP annotations or transport options, use `context7:query-docs` on the MCP Python SDK.
>
> Rewrite `tests/test_mcp_server.py` to cover the three tests in PRD §25.3. Use the SDK's testing helpers — do not spawn a subprocess.
>
> Verify: `finn-eiendom-mcp` boots over stdio, `mcp dev finn_eiendom/mcp_server.py` lists all 14 tools.
### Files
* `finn_eiendom/mcp_server.py` (full rewrite)
* `tests/test_mcp_server.py` (full rewrite)
* `pyproject.toml` (edit `[project.scripts]`)
### Done when
* `mcp_server.py` imports only `service`, `formatting`, `models`, `config`, stdlib, `mcp`, `pydantic`.
* All 14 tools registered.
* `pytest tests/test_mcp_server.py` is green.
* `grep -rn "FastAPI" finn_eiendom/` is empty.
---
## Step 7 — CLI
### Kickoff prompt
> Read **PRD.md** §15 (CLI design — every command and option) and **`.github/instructions/cli.instructions.md`** end-to-end.
>
> Create `finn_eiendom/cli.py` with a `typer.Typer` app exposing all commands in PRD §15.1, plus `finn_eiendom/__main__.py` that calls the app. Add to `pyproject.toml`:
> ```toml
> [project.scripts]
> finn-eiendom = "finn_eiendom.cli:app"
> ```
>
> Each command:
> - Translates options into a `service.<function>` call.
> - Calls `formatting.render_*(result, format)` and `typer.echo(...)`.
> - No business logic, no inline rendering.
> - Body under ~20 lines.
>
> Sub-app for `cache` (stats/clear/clear-html/clear-json) and `config` (show/path). `serve` accepts `--transport stdio|http` and dispatches to `mcp_server.main()` or the HTTP transport.
>
> If unsure about Typer sub-apps or `CliRunner`, use `context7`.
>
> Add `tests/test_cli.py` covering the five tests in PRD §25.4 using `typer.testing.CliRunner`. Mock `service.*` with `monkeypatch` — do not exercise the full stack here, that's `test_service.py`.
### Files
* `finn_eiendom/cli.py` (new)
* `finn_eiendom/__main__.py` (new)
* `tests/test_cli.py` (new)
* `pyproject.toml` (edit)
### Done when
* `finn-eiendom --help` lists every command in PRD §15.1.
* `cli.py` imports only `service`, `formatting`, `models`, `config`, stdlib, `typer`.
* `pytest tests/test_cli.py` is green.
---
## Step 8 — Diff workflow (new / removed / changed)
### Kickoff prompt
> Read **PRD.md** §10.8, §13 (search_runs table), workflow I in §18, and **`.github/instructions/clean-code.instructions.md`**.
>
> Implement:
> 1. `search_runs` and `scores` tables in `cache.py` (use existing migration pattern).
> 2. `service.get_new_ads_since_last_run(search_url)` that compares against the previous run for the same `normalized_url` and returns `{new_ads, removed_ads, changed_ads}` with price/common_costs/status diffs on changed.
> 3. `finn_get_new_ads_since_last_run` MCP tool.
> 4. `finn-eiendom diff <url>` CLI command.
> 5. `formatting.render_diff(result, fmt)`.
>
> Add tests covering: empty previous-run case, all-new case, mixed new+removed+changed case.
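
The comparison in item 2 can be a pure function over two runs. Here is a sketch, assuming each run is already loaded as a mapping from finnkode to a snapshot dict; that storage shape is an assumption. It treats a missing previous run as empty, so every ad counts as new. Whether that is the right answer for the empty-previous-run test is a decision to take from the PRD.

```python
from typing import Any

TRACKED_FIELDS = ("price", "common_costs", "status")


def diff_runs(
    previous: dict[str, dict[str, Any]], current: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    new_ads = sorted(current.keys() - previous.keys())
    removed_ads = sorted(previous.keys() - current.keys())
    changed_ads: list[dict[str, Any]] = []
    for finnkode in sorted(current.keys() & previous.keys()):
        before, after = previous[finnkode], current[finnkode]
        changes = {
            field: {"old": before.get(field), "new": after.get(field)}
            for field in TRACKED_FIELDS
            if before.get(field) != after.get(field)
        }
        if changes:
            changed_ads.append({"finnkode": finnkode, "changes": changes})
    return {"new_ads": new_ads, "removed_ads": removed_ads, "changed_ads": changed_ads}
```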
### Done when
* The three new tests pass.
* MCP and CLI both expose the same behavior with identical defaults.
---
## Step 9 — Compare workflow
### Kickoff prompt
> Read **PRD.md** workflow K in §18 and §14.2 (`CompareAdsInput`).
>
> Implement `service.compare_ads(finnkoder, include_eiendom_no=True, include_comps=True)` returning a comparison table + winners by category (best value / lifestyle / hybel / bargain / safest / highest risk / most overpriced).
>
> Wire `finn_compare_ads` MCP tool and `finn-eiendom compare <finnkode...>` CLI command. Add `formatting.render_comparison`. Tests for service and CLI.
### Done when
* `finn-eiendom compare 462400360 461153194 --format markdown` produces a readable comparison.
* Service test covers the winners-by-category logic.
---
## Step 10 — Similar-to-liked
### Kickoff prompt
> Read **PRD.md** workflow G in §18 and `FindSimilarToLikedInput` in §14.2.
>
> Implement `service.find_similar_to_liked(finnkode, mode, listing_status)`:
> 1. Load FinnAd; verify `feedback` has `verdict=liked` for this finnkode.
> 2. Ensure Eiendom.no enrichment + unit_vector exist.
> 3. Fetch similar-units (prefer `FOR_SALE` for recommendations, `RECENTLY_SOLD` for comps).
> 4. Score candidates against user preferences.
> 5. Return ranked recommendations.
>
> Wire MCP tool and CLI command. Tests covering: no liked feedback raises clear error; happy path returns ranked list.
### Done when
* `finn-eiendom similar-to-liked 462400360` returns ranked candidates when the listing has a liked verdict, and a clear error otherwise.
---
## Step 11 — Architecture tests
### Kickoff prompt
> Read **PRD.md** A10 (architecture acceptance criterion) and §17.3 (layering invariants).
>
> Create `tests/test_architecture.py` that walks every `.py` file under `finn_eiendom/` with `ast`, collects all `import` and `from X import Y` statements, and asserts the layering invariants in PRD A10:
> - No `httpx` outside `http.py`.
> - No `sqlite3` outside `cache.py`.
> - No `BeautifulSoup` outside `search.py` / `ad.py`.
> - No `msgpack` outside `eiendom_no.py`.
> - `mcp_server.py` and `cli.py` import only from the allowed set.
> - `service.py` never imports `mcp_server` or `cli`.
>
> Use `@pytest.mark.parametrize` so each invariant is its own test case and a failure shows which module violated which rule. Failures should print the offending import line and module.
### Done when
* `pytest tests/test_architecture.py` is green.
* Deliberately introducing a violation (e.g. `import httpx` in `service.py`) makes a test fail with a clear message.
---
## Step 12 — README + Claude Desktop config + final verification
### Kickoff prompt
> Read **PRD.md** §21 (deployment), §22 (MVP scope), §24 (all acceptance criteria), **README.md** and **USAGE.md**.
>
> Update `README.md` and `USAGE.md` so every command, env var, and Claude Desktop snippet matches what was actually built in steps 1–11. Verify with the user's exact paths.
>
> Run the full A1–A11 acceptance check:
>
> - A1: `finn-eiendom-mcp` boots over stdio; `mcp dev finn_eiendom/mcp_server.py` lists all 14 tools.
> - A2: `finn-eiendom --help` lists every §15.1 command; each command runs against fixtures.
> - A3–A9: matching service tests pass.
> - A10: `pytest tests/test_architecture.py` is green.
> - A11: `ruff check .` is clean; `pytest` is fully green; `mypy --strict finn_eiendom` passes or is documented as a gap.
>
> Report any failures with specific file/line references — don't paper over them.
### Files
* `README.md` (edit to match reality)
* `USAGE.md` (edit to match reality)
### Done when
* All 11 acceptance criteria in PRD §24 pass.
* README + USAGE quickstart examples actually work end-to-end on a fresh clone.
---
## Definition of done for the whole phase
Merge `chore/cleanup-phase-2-prep` into `main` when **every** box is checked:
* [ ] All 12 steps merged in order.
* [ ] `finn-eiendom-mcp` boots over stdio with all 14 tools.
* [ ] `finn-eiendom --help` lists every command in PRD §15.1.
* [ ] `pytest` is green, including the new `test_service.py`, `test_cli.py`, `test_http.py`, `test_formatting.py`, `test_models.py`, `test_architecture.py`.
* [ ] `ruff check .` is clean.
* [ ] `mypy --strict finn_eiendom` passes or has a documented exception list.
* [ ] `README.md` and `USAGE.md` quickstart examples work on a fresh clone in under 5 minutes.
* [ ] Claude Desktop config in USAGE.md is verified to work against your installation.
---
## When a step blocks
If a step blocks on an unclear requirement:
1. Re-read the relevant PRD section.
2. Check `PRD.md` §28 (open questions) — the answer may be a deferred decision.
3. If still unclear, write the question down, pick the simplest interpretation, mark it `# TODO(<date>): revisit <question>` in code, and move on.
If a step blocks on a library question (FastMCP, Pydantic v2, Typer, httpx, msgpack, respx):
1. Use `context7` — see `.github/instructions/docs.instructions.md`.
2. If context7 returns nothing useful, write the smallest possible spike in `scratch/` to verify behavior. Keep it out of git (add `scratch/` to `.gitignore` if it isn't listed).
If a step blocks on §17 (code ownership) — i.e. it feels like the right answer requires putting logic in the "wrong" place:
1. Stop.
2. Re-read PRD §17.2 (decision table) and §17.3 (layering invariants).
3. Ask whether the service layer is actually missing a function. Usually it is.
4. Add the missing service function instead of bending the layering.
+47
@@ -0,0 +1,47 @@
.PHONY: help venv install dev test test-fast lint format typecheck check clean serve mcp doctor
PYTHON ?= python3.12
VENV ?= .venv
BIN = $(VENV)/bin
help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-12s\033[0m %s\n", $$1, $$2}'
venv: ## Create the local virtualenv
	uv venv $(VENV) 2>/dev/null || $(PYTHON) -m venv $(VENV)
	@echo "Activate with: source $(BIN)/activate"
install: venv ## Install the package (editable) with dev extras
	uv pip install --python $(BIN)/python -e ".[dev]" 2>/dev/null || $(BIN)/pip install -e ".[dev]"
dev: install ## Alias for install
test: ## Run the full test suite
	$(BIN)/pytest
test-fast: ## Run tests, fail fast, verbose
	$(BIN)/pytest -x -v
lint: ## Lint with ruff
	$(BIN)/ruff check .
format: ## Auto-format with ruff
	$(BIN)/ruff format .
typecheck: ## Static type-check with mypy
	$(BIN)/mypy finn_eiendom
check: lint typecheck test ## Run lint + typecheck + tests
clean: ## Remove caches and build artifacts
	rm -rf .pytest_cache .ruff_cache .mypy_cache build dist *.egg-info
	find . -type d -name __pycache__ -prune -exec rm -rf {} +
serve: ## Start the MCP server over HTTP on port 8010
	$(BIN)/finn-eiendom serve --transport http --port 8010
mcp: ## Start the MCP server over stdio
	$(BIN)/finn-eiendom-mcp
doctor: ## Smoke-check the install
	$(BIN)/finn-eiendom doctor
+1556
File diff suppressed because it is too large
+162
@@ -0,0 +1,162 @@
# PROJECT.md — module map
The repo at a glance. For the why and the rules, read [`PRD.md`](PRD.md) §12 and §17. For the workflow, read [`AGENTS.md`](AGENTS.md).
---
## Source tree
```
finn-mcp/
├── pyproject.toml
├── Makefile
├── README.md ← user-facing overview
├── USAGE.md ← full user guide
├── PRD.md ← product spec + architecture (§17 = constitution)
├── PROJECT.md ← this file
├── AGENTS.md ← workflow for AI agents and contributors
├── CLEANUP.md ← pre-Phase-2 cleanup runbook
├── IMPLEMENTATION.md ← Phase 2 build runbook (12 steps)
├── .github/
│ ├── copilot-instructions.md
│ └── instructions/
│ ├── python.instructions.md
│ ├── mcp.instructions.md
│ ├── cli.instructions.md
│ ├── tests.instructions.md
│ ├── clean-code.instructions.md
│ └── docs.instructions.md ← context7 lookup rules
├── finn_eiendom/ ← the package
│ ├── __init__.py
│ ├── __main__.py ← python -m finn_eiendom → CLI
│ ├── config.py ← env vars, defaults, TTLs
│ ├── models.py ← Pydantic v2 models
│ ├── parser.py ← Norwegian number/area/URL/finnkode normalization
│ ├── http.py ← async httpx client w/ retry + delay
│ ├── cache.py ← SQLite schema + persistence
│ ├── search.py ← FINN search HTML parsing
│ ├── ad.py ← FINN listing HTML parsing
│ ├── eiendom_no.py ← Eiendom.no unit search/detail, unit_vector, comps
│ ├── scoring.py ← score model + classifications
│ ├── feedback.py ← verdicts + soft preference signal
│ ├── analysis.py ← shortlist + summary assembly
│ ├── service.py ← get_or_fetch_* + thin facade for MCP and CLI
│ ├── formatting.py ← render_* helpers (json/markdown/table) — shared by MCP and CLI
│ ├── mcp_server.py ← FastMCP wrappers around service.py
│ └── cli.py ← typer wrappers around service.py
├── tests/
│ ├── conftest.py
│ ├── fixtures.py
│ ├── fixtures/ ← HTML + JSON samples
│ ├── test_parser.py
│ ├── test_search.py
│ ├── test_ad.py
│ ├── test_eiendom_no.py
│ ├── test_scoring.py
│ ├── test_cache.py
│ ├── test_http.py ← retry + delay behavior
│ ├── test_service.py ← get_or_fetch_* + analyze_*
│ ├── test_formatting.py ← render_* roundtrips
│ ├── test_models.py ← Pydantic v2 roundtrips
│ ├── test_mcp_server.py ← tool registration + error envelope
│ ├── test_cli.py ← Typer CliRunner
│ └── test_architecture.py ← import-graph invariants (PRD A10)
└── data/ ← gitignored; SQLite cache lives here
└── finn.sqlite
```
---
## Module responsibilities
Single-home rule: every concern lives in exactly one module. See `PRD.md` §17.2 for the full table.
| Module | Owns | Imports allowed |
| --------------- | --------------------------------------------------------------------- | ---------------------------------------------------------- |
| `config.py` | env-var loading, defaults, TTL constants | stdlib |
| `models.py` | Pydantic v2 models | stdlib, `pydantic` |
| `parser.py` | Norwegian text normalization (numbers, dates, URLs, finnkode) | stdlib |
| `http.py` | async `httpx.AsyncClient`, retry on 5xx, delay, user-agent | stdlib, `httpx` |
| `cache.py` | SQLite schema, reads, writes, TTL | stdlib, `sqlite3`, `models` |
| `search.py` | FINN search HTML → cards (BeautifulSoup) | stdlib, `bs4`, `parser`, `http`, `cache`, `models` |
| `ad.py` | FINN listing HTML → `FinnAd` (BeautifulSoup) | stdlib, `bs4`, `parser`, `http`, `cache`, `models` |
| `eiendom_no.py` | Eiendom.no unit search/detail, unit_vector, similar-units (msgpack) | stdlib, `msgpack`, `http`, `cache`, `models` |
| `scoring.py` | 9 score components, total clamping, category classifier | stdlib, `models` |
| `feedback.py` | feedback storage and retrieval | stdlib, `cache`, `models` |
| `analysis.py` | shortlist + summary assembly | stdlib, `search`, `ad`, `eiendom_no`, `scoring`, `feedback`|
| `service.py` | cache-aware orchestration; the only place that combines fetch + cache | stdlib, `config`, `cache`, `analysis`, `ad`, `eiendom_no`, `feedback`, `scoring`, `models` |
| `formatting.py` | render_* helpers (json/markdown/table) | stdlib, `models` |
| `mcp_server.py` | FastMCP tool definitions, error wrapping, stdio/HTTP entry | stdlib, `mcp`, `pydantic`, `service`, `formatting`, `config`, `models` |
| `cli.py` | typer command definitions, --format dispatch | stdlib, `typer`, `service`, `formatting`, `config`, `models` |
`mcp_server.py` and `cli.py` are siblings — they never import each other. `service.py` never imports `mcp_server` or `cli`. `tests/test_architecture.py` enforces all of this.
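`tests/test_architecture.py` itself isn't shown here. As a rough illustration of how such a check can work, here is a minimal sketch using the stdlib `ast` module, assuming each module is inspected as source text keyed by its module name. `FORBIDDEN`, `imported_modules`, and `violations` are hypothetical names, not the repo's actual test code.

```python
import ast

# Forbidden (importer, imported) edges from the sibling rule above.
FORBIDDEN = {
    ("mcp_server", "cli"),
    ("cli", "mcp_server"),
    ("service", "mcp_server"),
    ("service", "cli"),
}


def imported_modules(source: str, package: str = "finn_eiendom") -> set[str]:
    """Return the package-internal modules imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            if node.level > 0:
                if node.module:
                    found.add(node.module.split(".")[0])  # from .service import x
                else:
                    found.update(a.name for a in node.names)  # from . import cache
            elif node.module == package:
                found.update(a.name for a in node.names)  # from finn_eiendom import cache
            elif node.module and node.module.startswith(package + "."):
                found.add(node.module.split(".")[1])  # from finn_eiendom.cli import app
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.startswith(package + "."):
                    found.add(alias.name.split(".")[1])  # import finn_eiendom.http
    return found


def violations(sources: dict[str, str]) -> list[tuple[str, str]]:
    """List every (module, imported) pair that breaks the layering rules."""
    bad: list[tuple[str, str]] = []
    for module, source in sorted(sources.items()):
        for target in sorted(imported_modules(source)):
            if (module, target) in FORBIDDEN:
                bad.append((module, target))
    return bad


# Toy sources: the last one breaks the sibling rule.
sources = {
    "service": "from .cache import get\nfrom . import analysis\n",
    "cli": "from .service import analyze_search\nfrom .formatting import render\n",
    "mcp_server": "from finn_eiendom.cli import app\n",
}
print(violations(sources))  # [('mcp_server', 'cli')]
```

A real test would read the files under `finn_eiendom/` instead of inline strings and assert that `violations(...)` is empty.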
---
## Entry points
Defined in `pyproject.toml`:
```toml
[project.scripts]
finn-eiendom-mcp = "finn_eiendom.mcp_server:main"
finn-eiendom = "finn_eiendom.cli:app"
```
So you have:
* `finn-eiendom-mcp` — MCP server over stdio (what Claude Desktop calls).
* `finn-eiendom` — CLI with all subcommands.
* `python -m finn_eiendom` — same as `finn-eiendom` (via `__main__.py`).
* `import finn_eiendom` — the library, for tests and notebooks.
---
## Dependency graph
```
 cli.py                                 mcp_server.py
    │                                         │
    ├────────────> formatting.py <────────────┤
    │                                         │
    └─────────────> service.py <──────────────┘
                         │
                         ↓
                    analysis.py
                         │
    ┌────────┬───────────┼────────────┬────────────┐
    ↓        ↓           ↓            ↓            ↓
search.py  ad.py   eiendom_no.py  scoring.py  feedback.py
    │        │           │            │            │
    └────────┴───────────┼────────────┴────────────┘
                         ↓
parser.py  http.py  cache.py  models.py  config.py
```
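Bottom layer: `parser.py`, `http.py`, `cache.py`, `models.py`, `config.py`. Each depends only on the stdlib plus at most one third-party library (`httpx` for `http.py`, `pydantic` for `models.py`); `cache.py` additionally imports `models`.
Arrows point from the importing module to the imported one. The table above lists the exact edges; for example, `service.py` also imports `ad`, `eiendom_no`, `scoring`, and `feedback` directly. The graph is acyclic and every import points downward: no module may import from a layer above it.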
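The "acyclic, points downward" property can be checked mechanically. A sketch using the stdlib `graphlib`, with the internal import lists transcribed from the module table above (stdlib and third-party imports omitted); `IMPORTS` and `layer_order` are illustrative names, not part of the package.

```python
from graphlib import CycleError, TopologicalSorter

# Internal imports per module, transcribed from the responsibilities table.
IMPORTS: dict[str, set[str]] = {
    "config": set(),
    "models": set(),
    "parser": set(),
    "http": set(),
    "cache": {"models"},
    "search": {"parser", "http", "cache", "models"},
    "ad": {"parser", "http", "cache", "models"},
    "eiendom_no": {"http", "cache", "models"},
    "scoring": {"models"},
    "feedback": {"cache", "models"},
    "analysis": {"search", "ad", "eiendom_no", "scoring", "feedback"},
    "service": {
        "config", "cache", "analysis", "ad", "eiendom_no", "feedback", "scoring", "models",
    },
    "formatting": {"models"},
    "mcp_server": {"service", "formatting", "config", "models"},
    "cli": {"service", "formatting", "config", "models"},
}


def layer_order(imports: dict[str, set[str]]) -> list[str]:
    """Return modules bottom-up; raises CycleError if any import points back up."""
    # TopologicalSorter maps node -> predecessors, so each module's imports
    # are emitted before the module itself.
    return list(TopologicalSorter(imports).static_order())


order = layer_order(IMPORTS)
print(order.index("models") < order.index("service") < order.index("cli"))  # True

# An upward edge (cache -> service) closes a loop and is rejected.
try:
    layer_order({**IMPORTS, "cache": {"models", "service"}})
except CycleError as exc:
    print("cycle:", exc.args[1])  # the offending cycle, as a list of modules
```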
---
## Where to add things
| You want to… | Add it to… |
| ----------------------------------------- | --------------------------------------- |
| Parse a new FINN field | `ad.py` or `search.py` + `models.py` |
| Add a new score component | `scoring.py` |
| Add a new env var | `config.py` |
| Add a new MCP tool | `mcp_server.py` (after `service.py`) |
| Add a new CLI command | `cli.py` (after `service.py`) |
| Change how something renders | `formatting.py` |
| Add a new orchestration / workflow | `service.py` (then add MCP + CLI) |
| Speak to a new external API | new module next to `eiendom_no.py` |
| Add a new SQLite table | `cache.py` |
For anything else — read `PRD.md` §17.2 and §17.7.
+160
@@ -0,0 +1,160 @@
# finn-eiendom-mcp
> **Private, self-hosted property analysis platform for Norwegian real estate.** Analyzes FINN listings, enriches with Eiendom.no estimates, scores against personal preferences, and surfaces bargain candidates, hybel potential, renovation upside, and risk flags. Exposed through an MCP server, a CLI, and a Python library — all sharing one service layer.
This is a **personal tool**. Not a SaaS, not a crawler, not legal/financial advice. Run locally, low frequency, your own data.
---
## What it does
```
FINN search URL → ranked shortlist of homes
with reasons, risks, comps, broker questions
```
Specifically:
* Parses FINN search and listing pages.
* Resolves each listing to an Eiendom.no `unitCode` for valuation and similar-units.
* Builds a `unit_vector` and fetches recently-sold comparables.
* Scores 9 components (economy, market position, comps, location, layout, outdoor, hybel, renovation, risk).
* Classifies as *bargain*, *safe*, *hybel*, *renovation*, *lifestyle*, or *risk*.
* Caches everything in SQLite; remembers what you've liked or rejected.
* Detects new / removed / changed listings between runs.
---
## Three ways to use it
| Surface | When you want… | Entry point |
| --------------- | -------------------------------------------------------------- | ----------------------- |
| **CLI** | Quick triage in a terminal, scripting, cron | `finn-eiendom ...` |
| **MCP server** | Claude Desktop, n8n, AI agents — conversational analysis | `finn-eiendom-mcp` |
| **Python lib** | Tests, notebooks, custom scripts | `import finn_eiendom` |
All three call the same underlying `service.py` — same defaults, same semantics, same results.
---
## Quick start
### Requirements
* Python **3.12+**
* `uv` (recommended) or `pip`
### Install
```bash
git clone <your-repo-url> finn-mcp
cd finn-mcp
uv venv # or: python3.12 -m venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
```
### First run (CLI)
```bash
# Triage a FINN search
finn-eiendom analyze-search 'https://www.finn.no/realestate/homes/search.html?location=...' --format table
# Drill into one listing
finn-eiendom get-ad 462400360 --format markdown
# Mark a listing as liked
finn-eiendom save-feedback 462400360 liked --notes "great layout, check fellesgjeld"
# Find similar properties to liked listings
finn-eiendom similar-to-liked 462400360
```
### First run (Claude Desktop)
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or the equivalent on Linux:
```json
{
"mcpServers": {
"finn-eiendom": {
"command": "/absolute/path/to/finn-mcp/.venv/bin/finn-eiendom-mcp",
"env": {
"FINN_CACHE_PATH": "/absolute/path/to/finn-mcp/data/finn.sqlite",
"EIENDOM_NO_ENABLED": "true"
}
}
}
}
```
Restart Claude Desktop. Then in any chat:
> Analyze this FINN search and shortlist the top 5 for a couple in Oslo with a 9–12 MNOK budget, willing to renovate, prefer hybel potential:
> `https://www.finn.no/realestate/homes/search.html?location=...`
For deep usage — every command, every MCP tool, every env var — see [`USAGE.md`](USAGE.md).
---
## Architecture in one screen
```
    CLI (typer)            MCP server (FastMCP)   ← thin, parallel front ends
             \                   /
                \             /
                  service.py                      ← cache + fetch orchestration
                       │
                  analysis.py                     ← shortlist + summary
                       │
 search / ad / eiendom_no / scoring / feedback
                       │
        parser / http / cache (SQLite)
                       │
          FINN HTML + Eiendom.no JSON
```
`formatting.py` lives next to `service.py` and is shared by both CLI and MCP for JSON / markdown / table rendering.
**Key rule:** CLI and MCP are siblings. They never call each other. Both call the same `service.py` functions. See [`PRD.md`](PRD.md) §17 for the full code-ownership constitution.
---
## Project documents
Read in this order depending on what you're doing:
| If you want to… | Read |
| ------------------------------------- | --------------------------------------------------- |
| Use the tool | This README, then [`USAGE.md`](USAGE.md) |
| Understand the design | [`PRD.md`](PRD.md), especially §1, §12, §17 |
| Contribute / extend / hack on it | [`AGENTS.md`](AGENTS.md), then [`PROJECT.md`](PROJECT.md), then `.github/instructions/*.md` |
| Run the cleanup pass on the repo | [`CLEANUP.md`](CLEANUP.md) |
| Build out unfinished features | [`IMPLEMENTATION.md`](IMPLEMENTATION.md) |
---
## Status
* **Phase 0 (spike):** done.
* **Phase 1 (core MVP):** mostly done.
* **Phase 2 (MCP + CLI):** in progress — driven by [`IMPLEMENTATION.md`](IMPLEMENTATION.md).
* **Phase 3+ (scoring v2, agent workflows, dashboard):** future.
---
## Safety and compliance
* Private, low-frequency, user-triggered use only. No public deployment.
* Configurable request delays (`FINN_REQUEST_DELAY_SECONDS`, `EIENDOM_NO_REQUEST_DELAY_SECONDS`) — defaults are conservative.
* Aggressive caching to minimize external requests.
* No bypassing of rate limits, bot protection, authentication, or access controls.
* No public redistribution of FINN or Eiendom.no data.
* Scores, estimates, and comparable sales are **decision support, not advice**. Don't substitute this for a real broker, lawyer, or technical inspector.
---
## License / use
Personal project. Not for redistribution. Don't expose the MCP HTTP transport on a public interface — keep it on LAN, Tailscale, or behind auth.
+503
@@ -0,0 +1,503 @@
# USAGE.md — finn-eiendom user guide
How to use the tool day-to-day. Covers installation, every CLI command, every MCP tool, Claude Desktop integration, common workflows, environment variables, and troubleshooting.
For the why and the architecture, see [`README.md`](README.md) and [`PRD.md`](PRD.md).
---
## 1. Installation
### Requirements
* Python **3.12 or newer** (check with `python3 --version`)
* `uv` (recommended) or `pip`
* macOS, Linux, or WSL2 on Windows
### Install
```bash
git clone <your-repo-url> finn-mcp
cd finn-mcp
# Option A: uv (preferred — fast)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Option B: pip
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
Verify:
```bash
finn-eiendom --help
finn-eiendom-mcp --help # may exit immediately in stdio mode; that's fine
finn-eiendom doctor # smoke-checks cache, FINN, Eiendom.no reachability
```
### Updating
```bash
git pull
source .venv/bin/activate
uv pip install -e ".[dev]"
```
If `pyproject.toml` added dependencies, the last command picks them up.
### Global install (optional)
If you want `finn-eiendom` available system-wide without activating the venv:
```bash
uv tool install .
# or
pipx install .
```
---
## 2. First-time setup
### Set up the data directory
```bash
mkdir -p data
```
SQLite cache lives there at `data/finn.sqlite` by default. Override with `FINN_CACHE_PATH` if you want it elsewhere.
### Optional: environment file
Create `.env` in the project root for your usual settings:
```bash
FINN_CACHE_PATH=data/finn.sqlite
FINN_MAX_SEARCH_PAGES=3
FINN_DETAIL_LIMIT=20
EIENDOM_NO_ENABLED=true
EIENDOM_NO_SIMILAR_UNITS_ENABLED=true
LOG_LEVEL=INFO
```
See §7 for the full list of variables.
### Verify
```bash
finn-eiendom doctor
```
This pings the cache, reaches FINN once, reaches Eiendom.no once, and reports any failures.
---
## 3. CLI reference
Every command runs inside the activated venv.
### 3.1 Analyze a FINN search
```bash
finn-eiendom analyze-search '<finn-search-url>' [options]
```
| Option | Default | Purpose |
| ------------------- | ------- | ---------------------------------------------------------- |
| `--max-pages N` | `3` | Pages of search results to fetch. |
| `--detail-limit N` | `20` | How many listings to detail-fetch from the result set. |
| `--no-details` | off | Skip detail fetches; use only search-card data. |
| `--no-eiendom` | off | Skip Eiendom.no enrichment. |
| `--with-similar` | off | Fetch similar-units / comps for shortlisted listings. |
| `--format FMT` | `json` | `json`, `markdown`, or `table`. |
Examples:
```bash
# Triage in the terminal
finn-eiendom analyze-search 'https://www.finn.no/realestate/homes/search.html?location=0.20061&min_bedrooms=2&price_collective_to=12000000' --format table
# Full JSON for piping into jq
finn-eiendom analyze-search '<url>' --format json | jq '.shortlist[].title'
# Detailed run with comps
finn-eiendom analyze-search '<url>' --detail-limit 30 --with-similar --format markdown
```
### 3.2 Drill into one listing
```bash
finn-eiendom get-ad <finnkode> [options]
```
| Option | Default | Purpose |
| ------------------- | ------- | -------------------------------------------------- |
| `--force-refresh` | off | Bypass the 24h cache and refetch. |
| `--no-eiendom` | off | Skip Eiendom.no enrichment. |
| `--with-similar` | off | Fetch similar-units / comps. |
| `--format FMT` | `json` | `json` or `markdown`. |
```bash
finn-eiendom get-ad 462400360 --format markdown
finn-eiendom get-ad 462400360 --force-refresh --with-similar
```
### 3.3 Compare listings
```bash
finn-eiendom compare <finnkode> <finnkode> [<finnkode>...] [options]
```
| Option | Default | Purpose |
| ---------------- | ------- | -------------------------------------- |
| `--no-eiendom` | off | Skip Eiendom.no enrichment. |
| `--no-comps` | off | Skip similar-units / comps. |
| `--format FMT` | `json` | `json`, `markdown`, or `table`. |
```bash
finn-eiendom compare 462400360 461153194 --format markdown
finn-eiendom compare 462400360 461153194 459333210 --format table
```
Up to 10 finnkoder per call.
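The CLI's argument handling isn't shown here. A minimal sketch of validating the finnkode list before it reaches the service layer, assuming duplicates are dropped rather than rejected and that a finnkode is a string of ASCII digits (both assumptions). `validate_finnkoder` is a hypothetical helper, separate from `parser.normalize_finnkode`.

```python
MAX_COMPARE = 10  # "Up to 10 finnkoder per call"
MIN_COMPARE = 2   # the usage line takes at least two finnkoder


def validate_finnkoder(codes: list[str]) -> list[str]:
    """Strip, dedupe (keeping first occurrence), and bound-check finnkoder."""
    cleaned: list[str] = []
    for code in codes:
        code = code.strip()
        if not (code.isascii() and code.isdigit()):
            raise ValueError(f"not a finnkode: {code!r}")
        if code not in cleaned:
            cleaned.append(code)
    if not MIN_COMPARE <= len(cleaned) <= MAX_COMPARE:
        raise ValueError(
            f"compare needs {MIN_COMPARE}-{MAX_COMPARE} distinct finnkoder, got {len(cleaned)}"
        )
    return cleaned


print(validate_finnkoder(["462400360", " 461153194 ", "462400360"]))
# ['462400360', '461153194']
```

Deduplicating quietly keeps an accidental repeat on the command line from producing a confusing side-by-side of a listing with itself.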
### 3.4 Feedback
```bash
finn-eiendom save-feedback <finnkode> <verdict> [--notes "..."]
```
Verdict vocabulary: `liked`, `rejected`, `interesting`, `bargain_candidate`, `risk_object`, `viewing_candidate`, `viewed`, `too_expensive`, `too_small`, `too_far_out`, `too_high_risk`, `likes_location`, `likes_layout`, `dislikes_area`.
```bash
finn-eiendom save-feedback 462400360 liked --notes "balcony, view, check wet rooms"
finn-eiendom save-feedback 461153194 rejected --notes "too far from city center"
```
`liked` verdicts feed the `similar-to-liked` command.
### 3.5 New / removed / changed listings
```bash
finn-eiendom diff '<finn-search-url>' [--format FMT]
```
Compares the current search results against the previous run for the same normalized URL and reports new finnkoder, removed finnkoder, and changed listings (price, common costs, status).
```bash
finn-eiendom diff '<url>' --format table
```
Useful as a daily cron:
```bash
0 9 * * * cd /path/to/finn-mcp && .venv/bin/finn-eiendom diff 'https://www.finn.no/...' --format markdown >> diff.log
```
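The comparison `diff` performs can be sketched as set arithmetic over finnkoder plus a field-by-field check on the listings present in both runs. This assumes each run is stored as a mapping from finnkode to the tracked fields; the real cache schema in `cache.py` may differ, and `diff_runs`, `SearchDiff`, and the field names `price`, `common_costs`, `status` are illustrative.

```python
from dataclasses import dataclass

# Fields whose change marks a listing as "changed" (price, common costs, status).
TRACKED_FIELDS = ("price", "common_costs", "status")


@dataclass(frozen=True)
class SearchDiff:
    new: list[str]
    removed: list[str]
    changed: dict[str, dict[str, tuple[object, object]]]  # finnkode -> field -> (old, new)


def diff_runs(
    previous: dict[str, dict[str, object]], current: dict[str, dict[str, object]]
) -> SearchDiff:
    """Compare two runs of the same normalized search URL."""
    new = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed: dict[str, dict[str, tuple[object, object]]] = {}
    for code in sorted(current.keys() & previous.keys()):
        fields = {
            field: (previous[code].get(field), current[code].get(field))
            for field in TRACKED_FIELDS
            if previous[code].get(field) != current[code].get(field)
        }
        if fields:
            changed[code] = fields
    return SearchDiff(new=new, removed=removed, changed=changed)


prev = {
    "462400360": {"price": 8_900_000, "status": "active"},
    "461153194": {"price": 7_500_000, "status": "active"},
}
curr = {
    "462400360": {"price": 8_500_000, "status": "active"},
    "459333210": {"price": 9_200_000, "status": "active"},
}
result = diff_runs(prev, curr)
# result.new == ["459333210"], result.removed == ["461153194"]
# result.changed == {"462400360": {"price": (8_900_000, 8_500_000)}}
```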
### 3.6 Shortlist history
```bash
finn-eiendom shortlist [--run-id ID] [--limit N] [--format FMT]
```
Without `--run-id`, returns the latest saved shortlist.
### 3.7 Eiendom.no commands
```bash
finn-eiendom resolve-unit '<finn-listing-url>' # find unitCode for a FINN listing
finn-eiendom get-unit <unit_code> [--force-refresh] # fetch unit detail
finn-eiendom enrich-ad <finnkode> [--with-similar] # FINN + Eiendom.no combined
finn-eiendom build-vector <unit_code> # build the base64url unit_vector
finn-eiendom decode-vector <unit_vector> # decode for inspection
finn-eiendom similar-units <unit_vector> [--status RECENTLY_SOLD|FOR_SALE|CURRENT]
```
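For inspecting a vector by hand, the base64url layer can be peeled off with the stdlib. A sketch assuming the vector may arrive without `=` padding (common for URL parameters, but an assumption here); the decoded bytes would then go to `msgpack.unpackb` from the `msgpack` package. The helper names are hypothetical, not `eiendom_no.py`'s API.

```python
import base64


def b64url_decode(vector: str) -> bytes:
    """Decode a base64url string, restoring '=' padding if it was stripped."""
    padded = vector + "=" * (-len(vector) % 4)
    return base64.urlsafe_b64decode(padded)


def b64url_encode(payload: bytes) -> str:
    """Encode bytes as unpadded base64url."""
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


raw = bytes([0x82, 0xA1, 0x61, 0x01])  # arbitrary example bytes
token = b64url_encode(raw)
print(b64url_decode(token) == raw)  # True
```

Restoring padding in the decoder means it accepts both padded and unpadded vectors, so it doesn't matter which form a copy-pasted URL parameter happens to be in.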
### 3.8 Find similar to liked
```bash
finn-eiendom similar-to-liked <finnkode> [--mode recommendations|comps] [--status STATUS]
```
The listing must have a `liked` feedback row. Defaults to `mode=recommendations`, `status=FOR_SALE` — i.e. find active listings similar to this one. Use `--mode comps --status RECENTLY_SOLD` to get comparable sales instead.
### 3.9 Price analysis against comps
```bash
finn-eiendom analyze-against-comps <finnkode>
```
Returns `price_position` (`below_estimate` / `within_range` / `above_estimate`), `sqm_price_position` (`cheap` / `normal` / `expensive`), `comparable_score`, and a `confidence` label.
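The `price_position` labels suggest an interval comparison. A minimal sketch, assuming the valuation is available as a low/high estimate range; the actual inputs and any tolerance band used in `scoring.py` aren't documented here, and `price_position` as written is illustrative. `sqm_price_position` could be classified analogously against a comps-derived price per m².

```python
from typing import Literal

PricePosition = Literal["below_estimate", "within_range", "above_estimate"]


def price_position(asking_price: int, estimate_low: int, estimate_high: int) -> PricePosition:
    """Place an asking price relative to an estimate interval [low, high]."""
    if estimate_low > estimate_high:
        raise ValueError("estimate_low must not exceed estimate_high")
    if asking_price < estimate_low:
        return "below_estimate"
    if asking_price > estimate_high:
        return "above_estimate"
    return "within_range"  # the interval is inclusive at both ends


print(price_position(8_000_000, 8_500_000, 9_500_000))  # below_estimate
```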
### 3.10 Cache management
```bash
finn-eiendom cache stats # row counts and TTL summary
finn-eiendom cache clear # purge everything except feedback
finn-eiendom cache clear-html # only purge raw HTML
finn-eiendom cache clear-json # only purge raw JSON
```
Feedback is never purged by `cache clear` — feedback is permanent until explicitly deleted via SQL.
### 3.11 MCP server
```bash
finn-eiendom serve # stdio (default)
finn-eiendom serve --transport http --port 8010 # HTTP for n8n / multi-client
```
In HTTP mode the server listens on `http://127.0.0.1:8010/mcp` with operational endpoints `GET /health`, `GET /version`, `GET /debug/config`.
There's also a shorthand `finn-eiendom-mcp` that starts stdio mode directly — that's what Claude Desktop calls.
### 3.12 Misc
```bash
finn-eiendom config show # print resolved configuration
finn-eiendom config path # print SQLite cache path
finn-eiendom doctor # smoke checks
finn-eiendom version
```
---
## 4. MCP tools (for Claude Desktop / n8n / agents)
All tools use the `finn_` prefix. They mirror the CLI commands 1:1 — same defaults, same semantics.
| Tool | Purpose |
| ------------------------------------- | ---------------------------------------------------------------- |
| `finn_analyze_search` | Analyze a FINN search URL and return a ranked shortlist. |
| `finn_get_ad` | Fetch structured data for one finnkode. |
| `finn_compare_ads` | Compare multiple listings side by side. |
| `finn_save_feedback` | Store feedback/verdict/notes. |
| `finn_get_shortlist` | Fetch a stored shortlist from a previous run. |
| `finn_get_new_ads_since_last_run` | Detect new / removed / changed listings. |
| `finn_resolve_eiendom_unit` | Map FINN URL → Eiendom.no `unitCode`. |
| `finn_get_eiendom_unit` | Fetch Eiendom.no unit detail by `unitCode`. |
| `finn_enrich_ad` | Combine FINN listing + Eiendom.no enrichment. |
| `finn_build_unit_vector` | Build a `unit_vector` from a `unitCode`. |
| `finn_decode_unit_vector` | Decode a `unit_vector` for inspection. |
| `finn_get_similar_units` | Fetch comps / recommendations. |
| `finn_find_similar_to_liked_ad` | Find properties similar to one you liked. |
| `finn_analyze_ad_against_comps` | Evaluate a listing against `RECENTLY_SOLD` comps. |
Every tool accepts a `response_format` parameter (`"json"` or `"markdown"`). Errors come back as `{"error": true, "code": "<ExceptionName>", "message": "..."}`.
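One way to produce that error envelope is a decorator around async tool functions. This sketch is plain stdlib and deliberately independent of FastMCP; how `mcp_server.py` actually wires this up isn't shown here, `error_envelope` is a hypothetical name, and the `finn_get_ad` below is a stand-in, not the real tool.

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable
from typing import Any


def error_envelope(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Wrap an async tool so exceptions come back as the documented error dict."""

    @functools.wraps(fn)  # keep the tool's name, docstring, and signature metadata
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:  # deliberately broad: a tool should not raise
            return {"error": True, "code": type(exc).__name__, "message": str(exc)}

    return wrapper


@error_envelope
async def finn_get_ad(finnkode: str) -> dict[str, Any]:
    # Stand-in body; a real tool would call into service.py.
    if not finnkode.isdigit():
        raise ValueError(f"invalid finnkode: {finnkode!r}")
    return {"finnkode": finnkode}


print(asyncio.run(finn_get_ad("abc")))
# {'error': True, 'code': 'ValueError', 'message': "invalid finnkode: 'abc'"}
```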
---
## 5. Claude Desktop setup
### Config file
* macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
* Linux: `~/.config/Claude/claude_desktop_config.json`
### Direct entry-point (recommended)
```json
{
"mcpServers": {
"finn-eiendom": {
"command": "/absolute/path/to/finn-mcp/.venv/bin/finn-eiendom-mcp",
"env": {
"FINN_CACHE_PATH": "/absolute/path/to/finn-mcp/data/finn.sqlite",
"EIENDOM_NO_ENABLED": "true",
"EIENDOM_NO_SIMILAR_UNITS_ENABLED": "true",
"LOG_LEVEL": "INFO"
}
}
}
}
```
The `command` **must** be the absolute path to the venv's `finn-eiendom-mcp` binary. Don't rely on `$PATH` here — Claude Desktop doesn't inherit your shell environment.
### Alternative: via `uv`
```json
{
"mcpServers": {
"finn-eiendom": {
"command": "uv",
"args": ["run", "finn-eiendom-mcp"],
"cwd": "/absolute/path/to/finn-mcp"
}
}
}
```
### Verify
1. Restart Claude Desktop.
2. Look for `finn-eiendom` in the MCP servers indicator (usually a hammer icon).
3. Ask in any chat: *"Use the finn-eiendom server to analyze this search: ..."*
If it doesn't show up, check the Claude Desktop logs:
* macOS: `~/Library/Logs/Claude/mcp-server-finn-eiendom.log`
* Linux: `~/.local/share/Claude/logs/mcp-server-finn-eiendom.log`
Anything the server writes to stdout corrupts the JSON-RPC stream, so the server must log only to stderr.
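In Python this mostly comes down to configuring `logging` with an explicit stderr stream and never calling `print()` in server code. A sketch; reading `LOG_LEVEL` mirrors the variable in §7, but whether `config.py` does exactly this is an assumption, and `configure_logging` is an illustrative name.

```python
import logging
import os
import sys


def configure_logging() -> None:
    """Send all log records to stderr so stdout stays free for JSON-RPC frames."""
    logging.basicConfig(
        stream=sys.stderr,
        level=os.environ.get("LOG_LEVEL", "INFO").upper(),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers installed earlier
    )


configure_logging()
logging.getLogger("finn_eiendom").info("server starting")  # goes to stderr
```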
---
## 6. Common workflows
### 6.1 Daily triage
```bash
# Morning routine
finn-eiendom diff 'https://www.finn.no/...' --format table
# Detail-fetch only what's new or changed
finn-eiendom analyze-search 'https://www.finn.no/...' --detail-limit 10 --format markdown
```
### 6.2 Weekly deep dive in Claude Desktop
> Read my latest finn-eiendom shortlist and group the top 10 by category (bargain / safe / hybel / lifestyle). For each, summarize the three most important risks and the three most important broker questions.
### 6.3 Pre-viewing prep
```bash
# Mark candidates for viewing
finn-eiendom save-feedback 462400360 viewing_candidate --notes "Saturday 14:00"
# Get the full data + comps
finn-eiendom get-ad 462400360 --with-similar --format markdown > viewing_prep_462400360.md
```
Then in Claude Desktop:
> Read the saved markdown for finnkode 462400360 and prepare a viewing checklist: wet rooms to inspect, common-costs questions, hybel-approval question, neighbor questions.
### 6.4 Comparing finalists
```bash
finn-eiendom compare 462400360 461153194 459333210 --format markdown > finalists.md
```
### 6.5 Build a recommendation set from liked properties
```bash
# After you've liked a few
finn-eiendom save-feedback 462400360 liked
finn-eiendom save-feedback 461153194 liked
# Get recommendations similar to each
finn-eiendom similar-to-liked 462400360 --mode recommendations --status FOR_SALE
finn-eiendom similar-to-liked 461153194 --mode recommendations --status FOR_SALE
```
---
## 7. Environment variables
| Variable | Default | Purpose |
| ----------------------------------------- | -------------------------------: | -------------------------------- |
| `FINN_CACHE_PATH` | `data/finn.sqlite` | SQLite DB path |
| `FINN_MAX_SEARCH_PAGES` | `3` | Max search pages per analyze |
| `FINN_DETAIL_LIMIT` | `20` | Max detail fetches per analyze |
| `FINN_REQUEST_DELAY_SECONDS` | `2` | Seconds between FINN requests |
| `FINN_USER_AGENT` | `personal-finn-eiendom-analyzer/0.1` | HTTP User-Agent |
| `FINN_CACHE_TTL_SEARCH_MINUTES` | `60` | Search cache TTL |
| `FINN_CACHE_TTL_AD_HOURS` | `24` | Listing cache TTL |
| `EIENDOM_NO_ENABLED` | `true` | Enable Eiendom.no enrichment |
| `EIENDOM_NO_BASE_URL` | `https://api.eiendom.no/api/v1` | API base URL |
| `EIENDOM_NO_CACHE_TTL_HOURS` | `24` | Unit/similar cache TTL |
| `EIENDOM_NO_REQUEST_DELAY_SECONDS` | `1` | Seconds between Eiendom.no calls |
| `EIENDOM_NO_SIMILAR_UNITS_ENABLED` | `true` | Enable similar-units |
| `EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS` | `RECENTLY_SOLD` | Default comps status |
| `HJEMLA_ENABLED` | `false` | Enable optional Hjemla API |
| `LOG_LEVEL` | `INFO` | Log level |
| `MCP_TRANSPORT` | `stdio` | `stdio` or `streamable_http` |
| `MCP_HTTP_HOST` | `127.0.0.1` | HTTP bind address |
| `MCP_HTTP_PORT` | `8010` | HTTP port |
Set them in `.env`, in your shell, or in the Claude Desktop `env` block per §5.
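How `config.py` parses these isn't shown here. A minimal sketch for a handful of the variables, assuming a frozen dataclass and a permissive truthy-string rule for booleans (both assumptions); it doesn't cover loading `.env`, and `Settings` and `from_env` are illustrative names. Passing the environment in explicitly keeps it easy to test.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    cache_path: str = "data/finn.sqlite"
    max_search_pages: int = 3
    detail_limit: int = 20
    finn_request_delay_seconds: float = 2.0
    eiendom_no_enabled: bool = True
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read a subset of the variables above, falling back to their defaults."""
        return cls(
            cache_path=env.get("FINN_CACHE_PATH", cls.cache_path),
            max_search_pages=int(env.get("FINN_MAX_SEARCH_PAGES", cls.max_search_pages)),
            detail_limit=int(env.get("FINN_DETAIL_LIMIT", cls.detail_limit)),
            finn_request_delay_seconds=float(
                env.get("FINN_REQUEST_DELAY_SECONDS", cls.finn_request_delay_seconds)
            ),
            eiendom_no_enabled=_as_bool(env.get("EIENDOM_NO_ENABLED", "true")),
            log_level=env.get("LOG_LEVEL", cls.log_level).upper(),
        )


print(Settings.from_env({"FINN_MAX_SEARCH_PAGES": "5"}).max_search_pages)  # 5
```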
---
## 8. Troubleshooting
### Claude Desktop doesn't see the server
1. The `command` path must be absolute and point at the venv's binary.
2. Check `~/Library/Logs/Claude/mcp-server-finn-eiendom.log` (macOS) for a Python traceback.
3. The server **must not** write to stdout — any `print()` in the code breaks JSON-RPC. If you're hacking on it and see a frame parse error, that's the cause.
4. Restart Claude Desktop after config changes (`Cmd+Q`, not just close the window).
### "Module not found" when running CLI
The venv isn't activated, or the package isn't installed in editable mode.
```bash
source .venv/bin/activate
uv pip install -e ".[dev]"
```
### Eiendom.no enrichment is `unavailable`
This is graceful degradation when:
* The FINN URL can't be matched to a `unitCode` (rare, but happens for unusual addresses).
* Eiendom.no rate-limited or returned 5xx.
* The unit was deleted from Eiendom.no's index.
Check the log for the warning. The listing analysis continues without enrichment.
### Similar-units returns nothing
* Verify `EIENDOM_NO_SIMILAR_UNITS_ENABLED=true`.
* The `unit_vector` might be empty / malformed — check `finn-eiendom decode-vector <unit_vector>`.
* Try `--status FOR_SALE` if `RECENTLY_SOLD` is sparse, or vice versa.
### Slow first run
The first analyze run fills the cache; subsequent runs are much faster. You can lower `FINN_REQUEST_DELAY_SECONDS` and `EIENDOM_NO_REQUEST_DELAY_SECONDS` if you're impatient, but don't drop them too low: the delays, like the cache, exist to keep request volume to FINN and Eiendom.no polite.
### Stale results
Cache TTLs:
* Search: 60 minutes
* FINN listing: 24 hours
* Eiendom.no unit: 24 hours
* Similar-units: 24 hours
Force a refresh with `--force-refresh` on `get-ad` or `get-unit`, or wipe with `finn-eiendom cache clear`.
### `pytest` fails after pulling new changes
```bash
source .venv/bin/activate
uv pip install -e ".[dev]" # re-sync dependencies
pytest -x # find the first failure
```
If a test fails with a network-related error, that's a bug — tests should never hit the network. Report it.
---
## 9. What this tool is not
* Not a public API. Don't expose the HTTP transport on the open internet.
* Not financial, legal, or valuation advice. Scores and estimates are decision support.
* Not a bidding agent. It will never contact a broker or place a bid for you.
* Not a crawler. Use it for the searches you'd be manually browsing anyway — at your own pace.
* Not a substitute for a real condition report (`tilstandsrapport`), a real lawyer, or a real broker.
---
## 10. Getting help
* [`README.md`](README.md) — overview
* [`PRD.md`](PRD.md) — full product spec and architecture
* [`AGENTS.md`](AGENTS.md) — workflow rules for contributors
* [`.github/instructions/*.md`](.github/instructions/) — per-topic conventions
For bugs, open an issue in the repo with: the exact command run, the full traceback or unexpected output, the version (`finn-eiendom version`), and a redacted FINN URL if relevant.
+36
@@ -0,0 +1,36 @@
"""FINN Real Estate MCP Server - Private property analysis platform."""
__version__ = "0.1.0"
__author__ = "FINN Scout"
from . import ad, analysis, cache, config, eiendom_no, scoring, search
from .http import HTTPClient
from .models import EiendomUnit, FinnAd, FinnSearchCard, SimilarUnit, UnitVector
from .parser import (
extract_finnkode_from_url,
normalize_area,
normalize_finnkode,
normalize_number,
normalize_price,
)
__all__ = [
"config",
"FinnAd",
"FinnSearchCard",
"EiendomUnit",
"SimilarUnit",
"UnitVector",
"normalize_price",
"normalize_area",
"normalize_number",
"normalize_finnkode",
"extract_finnkode_from_url",
"HTTPClient",
"ad",
"analysis",
"cache",
"eiendom_no",
"scoring",
"search",
]
+193
@@ -0,0 +1,193 @@
"""FINN listing detail scraping and normalization."""
import logging
import re
from datetime import UTC, datetime
from bs4 import BeautifulSoup
from .http import HTTPClient
from .models import FinnAd
from .parser import (
clean_text,
extract_finnkode_from_url,
normalize_area,
normalize_finnkode,
normalize_number,
normalize_price,
text_to_bool,
)
logger = logging.getLogger(__name__)
FINN_AD_URL_TEMPLATE = "https://www.finn.no/realestate/homes/ad.html?finnkode={}"
async def fetch_ad(finnkode: str, client: HTTPClient | None = None) -> str:
"""Fetch FINN listing HTML by finnkode."""
client = client or HTTPClient(request_delay_seconds=0.0)
url = FINN_AD_URL_TEMPLATE.format(finnkode)
response = await client.get(url)
return response.text
def _load_property_map(soup: BeautifulSoup) -> dict[str, str]:
properties: dict[str, str] = {}
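    for dt in soup.find_all("dt"):
        # Pair each <dt> with the <dd> that directly follows it, not by page position.
        dd = dt.find_next_sibling(["dt", "dd"])
        if dd is None or dd.name != "dd":
            continue
        key = clean_text(dt.get_text()) or ""
        value = clean_text(dd.get_text()) or ""
        properties[key.lower()] = value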
return properties
def _get_data_testid_value(soup: BeautifulSoup, testid: str) -> str | None:
node = soup.select_one(f'[data-testid="{testid}"]')
if not node:
return None
return clean_text(node.get_text(" ", strip=True))
def _strip_labelled_text(text: str | None, labels: list[str]) -> str | None:
if not text:
return None
for label in labels:
if text.lower().startswith(label.lower()):
return clean_text(text[len(label) :])
return text
def _extract_floor_from_text(text: str | None) -> str | None:
if not text:
return None
match = re.search(r"(\d+)\s*\.?\s*etasje", text, re.IGNORECASE)
if match:
return f"{match.group(1)}. etasje"
return None
def _clean_description(text: str | None) -> str | None:
if not text:
return None
cleaned = re.sub(r"(?i)^om boligen", "", text).strip()
cleaned = re.sub(r"(?i)^beskrivelse", "", cleaned).strip()
return clean_text(cleaned)
def _load_feature_text(soup: BeautifulSoup) -> str:
return _get_data_testid_value(soup, "object-facilities") or ""
def _extract_description(soup: BeautifulSoup) -> str | None:
node = soup.select_one('[data-testid="om boligen"]') or soup.select_one(".description")
if not node:
return None
paragraphs = [clean_text(p.get_text()) for p in node.select("p") if clean_text(p.get_text())]
if paragraphs:
return "\n".join(paragraphs)
return _clean_description(node.get_text(" ", strip=True))


def scrape_ad(html: str, url: str | None = None) -> FinnAd:
    """Scrape a FINN listing HTML page into a FinnAd model."""
    soup = BeautifulSoup(html, "html.parser")
    title_node = soup.select_one("h1")
    broker_name = soup.select_one(".broker-name")
    properties = _load_property_map(soup)
    feature_text = _load_feature_text(soup).lower()
    finnkode = normalize_finnkode(extract_finnkode_from_url(url or "")) or ""
    address = _get_data_testid_value(soup, "object-address") or properties.get("adresse")
    district = _get_data_testid_value(soup, "local-area-name") or properties.get("område")
    ownership_type = _strip_labelled_text(
        _get_data_testid_value(soup, "info-ownership-type"), ["Eieform", "Eiendomstype"]
    ) or properties.get("eierform")
    property_type = _strip_labelled_text(
        _get_data_testid_value(soup, "info-property-type"), ["Boligtype", "Eiendomstype"]
    ) or properties.get("eiendomstype")
    asking_price = normalize_price(
        properties.get("prisantydning") or _get_data_testid_value(soup, "pricing-incicative-price")
    )
    total_price_value = normalize_price(
        properties.get("totalpris") or _get_data_testid_value(soup, "pricing-total-price")
    )
    shared_debt = normalize_price(
        properties.get("fellesgjeld") or _get_data_testid_value(soup, "pricing-joint-debt")
    )
    common_costs = normalize_number(
        properties.get("felles utgifter")
        or _get_data_testid_value(soup, "pricing-common-monthly-cost")
    )
    area_m2 = normalize_area(
        properties.get("boligareal")
        or _get_data_testid_value(soup, "info-usable-i-area")
        or _get_data_testid_value(soup, "info-usable-area")
    )
    rooms = normalize_number(properties.get("rom") or _get_data_testid_value(soup, "info-rooms"))
    bedrooms = normalize_number(
        properties.get("soverom") or _get_data_testid_value(soup, "info-bedrooms")
    )
    floor = (
        properties.get("etasje")
        or _extract_floor_from_text(title_node.get_text() if title_node else "")
        or _get_data_testid_value(soup, "info-floor")
    )
    construction_year = normalize_number(
        properties.get("byggeår") or _get_data_testid_value(soup, "info-construction-year")
    )
    energy_rating = properties.get("energimerking")
    heating = properties.get("oppvarming")
    has_balcony = text_to_bool(properties.get("balkonger/terrasser")) or "balkong" in feature_text
    has_terrace = "terrasse" in feature_text
    has_elevator = text_to_bool(properties.get("heis")) or "heis" in feature_text
    has_parking = (
        bool(properties.get("parkering/garasje"))
        or "parkering" in feature_text
        or "garasje" in feature_text
    )
    # The `.broker-name` node's text is stored as broker_company; broker_name stays unset.
    broker_company = None
    if broker_name:
        broker_company = clean_text(broker_name.get_text())
    listing_description = _extract_description(soup)
    ad = FinnAd(
        finnkode=finnkode,
        url=url or "",
        title=clean_text(title_node.get_text()) if title_node else None,
        address=address,
        postal_area=properties.get("postnummer"),
        district=district,
        property_type=property_type,
        ownership_type=ownership_type,
        asking_price=asking_price,
        total_price=total_price_value,
        shared_debt=shared_debt,
        common_costs=common_costs,
        municipal_fee=normalize_number(properties.get("kommunale avgifter")),
        other_fees=normalize_number(properties.get("andre utgifter")),
        area_m2=area_m2,
        rooms=rooms,
        bedrooms=bedrooms,
        floor=floor,
        construction_year=construction_year,
        energy_rating=energy_rating,
        heating=heating,
        has_balcony=has_balcony,
        has_terrace=has_terrace,
        has_elevator=has_elevator,
        has_parking=has_parking,
        listing_description=listing_description,
        broker_name=None,
        broker_company=broker_company,
        detail_fetched_at=None,
    )
    return ad


async def fetch_ad_details(finnkode: str, client: HTTPClient | None = None) -> FinnAd:
    """Fetch FINN listing HTML and return a parsed FinnAd object."""
    html = await fetch_ad(finnkode, client=client)
    ad = scrape_ad(html, url=FINN_AD_URL_TEMPLATE.format(finnkode))
    ad.detail_fetched_at = datetime.now(UTC)
    return ad
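`scrape_ad` falls back to a `<dt>`/`<dd>` label map built by `_load_property_map`, which pairs the two tag lists by position with `zip()`. As a minimal stdlib-only sketch (no BeautifulSoup), the class below instead pairs each `<dd>` with the most recent `<dt>`, so a label without a value does not shift the pairs after it. `DefinitionListParser` and the sample HTML are illustrative assumptions, not FINN's actual markup.

```python
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs into a dict keyed by lowercased label."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}
        self._current: str | None = None  # "dt" or "dd" while inside one, else None
        self._buffer: list[str] = []
        self._label: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) is still collected while in a dt/dd.
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "dt":
            self._label = text.lower()
        elif self._label is not None:
            # Pair this <dd> with the most recent <dt>, then wait for the next label.
            self.properties[self._label] = text
            self._label = None
        self._current = None


html = """
<dl>
  <dt>Prisantydning</dt><dd>4 250 000 kr</dd>
  <dt>Boligareal</dt><dd>62 m²</dd>
  <dt>Etasje</dt><dd>3</dd>
</dl>
"""
parser = DefinitionListParser()
parser.feed(html)
print(parser.properties)
# {'prisantydning': '4 250 000 kr', 'boligareal': '62 m²', 'etasje': '3'}
```

The lowercased keys match the lookups `scrape_ad` performs, such as `properties.get("boligareal")`.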
@@ -0,0 +1,175 @@
"""Orchestration for FINN search + Eiendom.no enrichment + scoring."""

import logging

from . import ad as ad_module
from . import cache, eiendom_no, scoring, search
from .config import (
    FINN_CACHE_PATH,
    FINN_CACHE_TTL_AD_HOURS,
    FINN_DETAIL_LIMIT,
    FINN_MAX_SEARCH_PAGES,
)
from .models import EiendomUnit, FinnAd, SimilarUnit

logger = logging.getLogger(__name__)


def _normalize_description(text: str | None) -> str:
    return text.lower() if text else ""


def _build_ad_summary(
    ad: FinnAd,
    enriched: EiendomUnit | None,
    similar_units: list[SimilarUnit],
    scores: dict,
    categories: list[str],
) -> dict:
    description = _normalize_description(ad.listing_description)
    reasons = []
    risks = []
    next_steps = [
        "Open the FINN listing and condition report.",
        "Review the Eiendom.no estimate and comparable sales.",
        "Ask the broker about renovation status and approvals.",
    ]
    if enriched and enriched.estimated_selling_price and ad.total_price:
        upper = enriched.estimated_selling_price_upper
        if ad.total_price < enriched.estimated_selling_price:
            reasons.append("Listing price is below Eiendom.no estimate.")
        elif upper is not None and ad.total_price <= upper:
            # Guard the upper bound: comparing an int with None would raise TypeError.
            reasons.append("Price sits within the local estimate range.")
        else:
            reasons.append("Listing price is above the estimate range.")
    else:
        reasons.append("Eiendom.no enrichment is unavailable or incomplete.")
    if "utsikt" in description or ad.has_balcony or ad.has_terrace:
        reasons.append("Outdoor space or view potential is positive.")
    if "hybel" in description or "leie" in description:
        reasons.append("Potential hybel/rental opportunity is mentioned.")
    if "potensial" in description or "renover" in description:
        reasons.append("Renovation or improvement potential is highlighted.")
    if scores.get("risk", 0.0) < 0:
        risks.append("Risk flags are detected in description or metadata.")
    if ad.common_costs and ad.common_costs > 5000:
        risks.append("Common costs are relatively high and should be reviewed.")
    if enriched and enriched.sale_status and enriched.sale_status.upper() != "FOR_SALE":
        risks.append("Eiendom.no sale status does not indicate an active sale.")
    if not enriched:
        risks.append("Missing Eiendom.no data increases uncertainty.")
    if not any("Eiendom.no" in step for step in next_steps):
        next_steps.append("Verify the property on Eiendom.no and reconcile any mismatches.")
    if similar_units:
        next_steps.append("Review the comparable units and average sqm prices.")
    else:
        next_steps.append("Comparable sales are unavailable; treat valuation with caution.")
    return {
        "why_interesting": reasons,
        "risks": risks,
        "next_steps": next_steps,
        "shortlist_reason": ", ".join(reasons[:3])
        if reasons
        else "Review details and seller disclosures.",
    }


async def analyze_ad(
    finn_ad: FinnAd,
    unit_code: str | None = None,
) -> dict:
    """Enrich a FinnAd and compute score summary."""
    conn = cache.init_db(FINN_CACHE_PATH)
    enriched: EiendomUnit | None = None
    similar_units: list[SimilarUnit] = []
    if unit_code:
        enriched = cache.get_eiendom_unit(conn, unit_code)
        if enriched is None:
            try:
                enriched = await eiendom_no.enrich_ad_with_eiendom_no(finn_ad, unit_code)
            except Exception as exc:
                # unit_code may be a finnkode placeholder that Eiendom.no does not know;
                # score the ad without enrichment instead of aborting the whole search.
                logger.warning("Eiendom.no enrichment failed for %s: %s", unit_code, exc)
                enriched = None
            if enriched is not None:
                cache.save_eiendom_unit(conn, enriched)
    if enriched and enriched.unit_vector:
        similar_units = cache.get_similar_units(conn, enriched.unit_code, "RECENTLY_SOLD")
        if not similar_units:
            try:
                similar_units = await eiendom_no.get_similar_units(enriched.unit_vector)
            except Exception as exc:
                logger.warning("Eiendom.no similar-units lookup failed: %s", exc)
                similar_units = []
            if similar_units:
                cache.save_similar_units(conn, enriched.unit_code, "RECENTLY_SOLD", similar_units)
    scores = scoring.score_ad(finn_ad, enriched, similar_units)
    categories = scoring.classify_ad(scores)
    summary = _build_ad_summary(finn_ad, enriched, similar_units, scores, categories)
    # mode="json" turns datetimes into ISO strings so the result can go straight to json.dumps().
    result = {
        "finnkode": finn_ad.finnkode,
        "title": finn_ad.title,
        "address": finn_ad.address,
        "score": scores,
        "categories": categories,
        "summary": summary,
        "eiendom_unit": enriched.model_dump(mode="json") if enriched else None,
        "similar_units": [unit.model_dump(mode="json") for unit in similar_units],
    }
    cache.save_finn_ad(conn, finn_ad)
    conn.close()
    return result


async def analyze_search(
    search_url: str,
    max_pages: int = FINN_MAX_SEARCH_PAGES,
    fetch_details: bool = True,
    detail_limit: int = FINN_DETAIL_LIMIT,
    include_eiendom_no: bool = True,
    client=None,
    use_cache: bool = True,
) -> dict:
    """Analyze a FINN search URL and enrich matching listings."""
    conn = cache.init_db(FINN_CACHE_PATH)
    cards = await search.fetch_search_pages(
        search_url,
        max_pages=max_pages,
        client=client,
        use_cache=use_cache,
    )
    results = []
    enriched_count = 0
    if fetch_details:
        for card in cards[:detail_limit]:
            finn_ad = cache.get_finn_ad(conn, card.finnkode, ttl_hours=FINN_CACHE_TTL_AD_HOURS)
            if finn_ad is None:
                finn_ad = await ad_module.fetch_ad_details(card.finnkode, client=client)
            unit_code = None
            if include_eiendom_no:
                try:
                    matched_unit = await eiendom_no.search_unit_from_finn_url(card.url)
                except Exception as exc:
                    logger.warning("Eiendom.no unit search failed: %s", exc)
                    matched_unit = None
                unit_code = (
                    matched_unit.unit_code
                    if matched_unit
                    else eiendom_no.resolve_unit_from_finn_url(card.url)
                )
            result = await analyze_ad(finn_ad, unit_code=unit_code)
            if result.get("eiendom_unit"):
                enriched_count += 1
            results.append(result)
    conn.close()
    results.sort(key=lambda item: item["score"].get("total", 0.0), reverse=True)
    return {
        "search_url": search_url,
        "search_cards": [card.model_dump(mode="json") for card in cards],
        "analysis": results,
        "summary": {
            "total_listings": len(cards),
            "analyzed_listings": len(results),
            "eiendom_enriched": enriched_count,
        },
    }
@@ -0,0 +1,243 @@
"""SQLite cache and persistence for FINN and Eiendom.no data."""

import json
import logging
import sqlite3
from datetime import UTC, datetime, timedelta
from pathlib import Path
from typing import Any

from .config import FINN_CACHE_PATH
from .models import EiendomUnit, FinnAd, FinnSearchCard, SimilarUnit

logger = logging.getLogger(__name__)


def get_connection(path: str | None = None) -> sqlite3.Connection:
    db_path = str(path or FINN_CACHE_PATH)
    if db_path != ":memory:":
        # sqlite3.connect() does not create missing parent directories (e.g. "data/").
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path, detect_types=sqlite3.PARSE_DECLTYPES)
    conn.row_factory = sqlite3.Row
    return conn


def init_db(path: str | None = None) -> sqlite3.Connection:
    conn = get_connection(path)
    cursor = conn.cursor()
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS finn_ads (
            finnkode TEXT PRIMARY KEY,
            url TEXT,
            payload TEXT NOT NULL,
            fetched_at TEXT NOT NULL
        )
        """
    )
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS eiendom_units (
            unit_code TEXT PRIMARY KEY,
            payload TEXT NOT NULL,
            fetched_at TEXT NOT NULL
        )
        """
    )
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS similar_units (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            unit_code TEXT NOT NULL,
            listing_status TEXT NOT NULL,
            payload TEXT NOT NULL,
            fetched_at TEXT NOT NULL
        )
        """
    )
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS cache_meta (
            key TEXT PRIMARY KEY,
            value TEXT NOT NULL,
            expires_at TEXT
        )
        """
    )
    conn.commit()
    return conn


def cache_get(conn: sqlite3.Connection, key: str) -> dict[str, Any] | list[Any] | None:
    """Return the cached JSON payload for `key`, or None if it is missing or expired."""
    cursor = conn.cursor()
    cursor.execute("SELECT value, expires_at FROM cache_meta WHERE key = ?", (key,))
    row = cursor.fetchone()
    if not row:
        return None
    expires_at = row["expires_at"]
    if expires_at and datetime.fromisoformat(expires_at) < datetime.now(UTC):
        # Expired entries are deleted lazily, on the first read after expiry.
        cursor.execute("DELETE FROM cache_meta WHERE key = ?", (key,))
        conn.commit()
        return None
    return json.loads(row["value"])


def cache_set(
    conn: sqlite3.Connection,
    key: str,
    payload: dict[str, Any] | list[Any],
    ttl_hours: int | None = None,
    ttl_minutes: int | None = None,
) -> None:
    """Store a JSON-serializable payload; ttl_minutes takes precedence over ttl_hours."""
    expires_at = None
    if ttl_minutes is not None:
        expires_at = (datetime.now(UTC) + timedelta(minutes=ttl_minutes)).isoformat()
    elif ttl_hours is not None:
        expires_at = (datetime.now(UTC) + timedelta(hours=ttl_hours)).isoformat()
    cursor = conn.cursor()
    cursor.execute(
        "INSERT OR REPLACE INTO cache_meta (key, value, expires_at) VALUES (?, ?, ?)",
        (key, json.dumps(payload), expires_at),
    )
    conn.commit()


def _is_fresh(fetched_at: str, ttl_hours: int | None) -> bool:
    if ttl_hours is None:
        return True
    return datetime.fromisoformat(fetched_at) >= datetime.now(UTC) - timedelta(hours=ttl_hours)


def save_search_page(
    conn: sqlite3.Connection,
    url: str,
    html: str,
    ttl_minutes: int = 60,
) -> None:
    cache_set(conn, f"search_page:{url}", {"html": html}, ttl_minutes=ttl_minutes)


def get_search_page(conn: sqlite3.Connection, url: str) -> str | None:
    payload = cache_get(conn, f"search_page:{url}")
    if not payload:
        return None
    return payload.get("html")


def save_search_cards(
    conn: sqlite3.Connection,
    url: str,
    cards: list[FinnSearchCard],
    ttl_minutes: int = 60,
) -> None:
    cache_set(
        conn,
        f"search_cards:{url}",
        [card.model_dump(mode="json") for card in cards],
        ttl_minutes=ttl_minutes,
    )


def get_search_cards(conn: sqlite3.Connection, url: str) -> list[FinnSearchCard]:
    payload = cache_get(conn, f"search_cards:{url}")
    if not payload:
        return []
    return [FinnSearchCard.model_validate(item) for item in payload]


def save_finn_ad(conn: sqlite3.Connection, ad: FinnAd) -> None:
    cursor = conn.cursor()
    payload = ad.model_dump(mode="json")
    cursor.execute(
        "INSERT OR REPLACE INTO finn_ads (finnkode, url, payload, fetched_at) VALUES (?, ?, ?, ?)",
        (
            ad.finnkode,
            ad.url,
            json.dumps(payload),
            ad.detail_fetched_at.isoformat()
            if ad.detail_fetched_at
            else datetime.now(UTC).isoformat(),
        ),
    )
    conn.commit()


def get_finn_ad(
    conn: sqlite3.Connection, finnkode: str, ttl_hours: int | None = None
) -> FinnAd | None:
    cursor = conn.cursor()
    cursor.execute("SELECT payload, fetched_at FROM finn_ads WHERE finnkode = ?", (finnkode,))
    row = cursor.fetchone()
    if not row:
        return None
    if ttl_hours is not None and not _is_fresh(row["fetched_at"], ttl_hours):
        return None
    return FinnAd.model_validate(json.loads(row["payload"]))


def save_eiendom_unit(conn: sqlite3.Connection, unit: EiendomUnit) -> None:
    cursor = conn.cursor()
    cursor.execute(
        "INSERT OR REPLACE INTO eiendom_units (unit_code, payload, fetched_at) VALUES (?, ?, ?)",
        (unit.unit_code, json.dumps(unit.model_dump(mode="json")), unit.fetched_at.isoformat()),
    )
    conn.commit()


def get_eiendom_unit(
    conn: sqlite3.Connection,
    unit_code: str,
    ttl_hours: int | None = None,
) -> EiendomUnit | None:
    cursor = conn.cursor()
    cursor.execute(
        "SELECT payload, fetched_at FROM eiendom_units WHERE unit_code = ?",
        (unit_code,),
    )
    row = cursor.fetchone()
    if not row:
        return None
    if ttl_hours is not None and not _is_fresh(row["fetched_at"], ttl_hours):
        return None
    return EiendomUnit.model_validate(json.loads(row["payload"]))


def save_similar_units(
    conn: sqlite3.Connection,
    unit_code: str,
    listing_status: str,
    similar_units: list[SimilarUnit],
) -> None:
    cursor = conn.cursor()
    payload = json.dumps([item.model_dump(mode="json") for item in similar_units])
    cursor.execute(
        (
            "INSERT INTO similar_units"
            " (unit_code, listing_status, payload, fetched_at)"
            " VALUES (?, ?, ?, ?)"
        ),
        (unit_code, listing_status, payload, datetime.now(UTC).isoformat()),
    )
    conn.commit()


def get_similar_units(
    conn: sqlite3.Connection,
    unit_code: str,
    listing_status: str,
    ttl_hours: int | None = None,
) -> list[SimilarUnit]:
    cursor = conn.cursor()
    cursor.execute(
        (
            "SELECT payload, fetched_at FROM similar_units"
            " WHERE unit_code = ? AND listing_status = ?"
            " ORDER BY id DESC LIMIT 1"
        ),
        (unit_code, listing_status),
    )
    row = cursor.fetchone()
    if not row:
        return []
    if ttl_hours is not None and not _is_fresh(row["fetched_at"], ttl_hours):
        return []
    return [SimilarUnit.model_validate(item) for item in json.loads(row["payload"])]
@@ -0,0 +1,30 @@
"""Configuration and environment variables."""

import os
from pathlib import Path

# Cache and database
FINN_CACHE_PATH = os.getenv("FINN_CACHE_PATH", str(Path("data/finn.sqlite")))

# FINN scraping settings
FINN_MAX_SEARCH_PAGES = int(os.getenv("FINN_MAX_SEARCH_PAGES", "3"))
FINN_DETAIL_LIMIT = int(os.getenv("FINN_DETAIL_LIMIT", "20"))
FINN_REQUEST_DELAY_SECONDS = float(os.getenv("FINN_REQUEST_DELAY_SECONDS", "2"))
FINN_USER_AGENT = os.getenv("FINN_USER_AGENT", "personal-finn-eiendom-analyzer/0.1")
FINN_CACHE_TTL_SEARCH_MINUTES = int(os.getenv("FINN_CACHE_TTL_SEARCH_MINUTES", "60"))
FINN_CACHE_TTL_AD_HOURS = int(os.getenv("FINN_CACHE_TTL_AD_HOURS", "24"))

# Eiendom.no API settings
EIENDOM_NO_ENABLED = os.getenv("EIENDOM_NO_ENABLED", "true").lower() == "true"
EIENDOM_NO_BASE_URL = os.getenv("EIENDOM_NO_BASE_URL", "https://api.eiendom.no/api/v1")
EIENDOM_NO_REQUEST_DELAY_SECONDS = float(os.getenv("EIENDOM_NO_REQUEST_DELAY_SECONDS", "1"))
EIENDOM_NO_CACHE_TTL_HOURS = int(os.getenv("EIENDOM_NO_CACHE_TTL_HOURS", "24"))
EIENDOM_NO_SIMILAR_UNITS_ENABLED = (
    os.getenv("EIENDOM_NO_SIMILAR_UNITS_ENABLED", "true").lower() == "true"
)
EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS = os.getenv(
    "EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS", "RECENTLY_SOLD"
)

# Logging
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
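The boolean settings above compare `os.getenv(...).lower() == "true"`, so values such as `1` or `yes` are read as false. Below is a sketch of a hypothetical `env_bool` helper (not part of `config.py`) that accepts common spellings, falls back to the default when the variable is unset or blank, and fails fast on anything else.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default  # unset or blank: use the default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


os.environ["EIENDOM_NO_ENABLED"] = "yes"
print(env_bool("EIENDOM_NO_ENABLED", default=True))  # True
os.environ["EIENDOM_NO_ENABLED"] = "0"
print(env_bool("EIENDOM_NO_ENABLED", default=True))  # False
```

It would be used as `EIENDOM_NO_ENABLED = env_bool("EIENDOM_NO_ENABLED", True)`. Raising on unrecognised values surfaces a typo at startup instead of silently disabling a feature.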
@@ -0,0 +1,236 @@
"""Eiendom.no enrichment, unit vector, and similar units client."""

import base64
import logging
from typing import Any

import msgpack

from .config import (
    EIENDOM_NO_BASE_URL,
    EIENDOM_NO_ENABLED,
    EIENDOM_NO_REQUEST_DELAY_SECONDS,
    EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS,
)
from .http import HTTPClient
from .models import EiendomUnit, SimilarUnit, UnitVector
from .parser import extract_finnkode_from_url, normalize_finnkode

logger = logging.getLogger(__name__)


def _extract_coordinates(geometry: dict) -> tuple[float | None, float | None]:
    """Return (lon, lat) from a GeoJSON-style geometry, whose coordinates are [lon, lat]."""
    if not isinstance(geometry, dict):
        return None, None
    coords = geometry.get("coordinates") or []
    if isinstance(coords, (list, tuple)) and len(coords) >= 2:
        return coords[0], coords[1]
    return None, None


def parse_eiendom_unit_json(unit_data: dict) -> EiendomUnit:
    # `or {}` also covers keys that are present but null in the JSON.
    geometry = unit_data.get("geometry") or {}
    lon, lat = _extract_coordinates(geometry)
    specification = unit_data.get("specification") or {}
    valuation = unit_data.get("valuation") or {}
    market = unit_data.get("latestMarketData") or {}
    return EiendomUnit(
        unit_code=unit_data.get("unitCode", ""),
        address=unit_data.get("address") or unit_data.get("streetAddress"),
        lat=lat or unit_data.get("lat"),
        lng=lon or unit_data.get("lon"),
        property_type=specification.get("propertyType") or unit_data.get("propertyType"),
        floor=specification.get("floor") or unit_data.get("floor"),
        rooms=specification.get("rooms") or unit_data.get("rooms"),
        construction_year=specification.get("constructionYear")
        or unit_data.get("constructionYear"),
        usable_area=specification.get("usableArea") or unit_data.get("usableArea"),
        estimated_selling_price=valuation.get("estimatedSellingPrice")
        or unit_data.get("estimatedSellingPrice"),
        estimated_selling_price_lower=valuation.get("estimatedSellingPriceLower")
        or unit_data.get("estimatedSellingPriceLower"),
        estimated_selling_price_upper=valuation.get("estimatedSellingPriceUpper")
        or unit_data.get("estimatedSellingPriceUpper"),
        listing_price=market.get("listingPrice") or unit_data.get("listingPrice"),
        listing_sqm_price=market.get("squareMeterPrice")
        or unit_data.get("listingSquareMeterPrice"),
        common_costs=market.get("monthlyCosts")
        or market.get("commonCosts")
        or unit_data.get("commonCosts"),
        days_on_market=market.get("daysOnMarket") or unit_data.get("daysOnMarket"),
        sale_status=market.get("saleStatus") or unit_data.get("saleStatus"),
        market_placement_score=market.get("marketPlacementScore")
        or unit_data.get("marketPlacementScore"),
    )


def parse_similar_units_json(response_data: dict) -> list[SimilarUnit]:
    units: list[SimilarUnit] = []
    for item in response_data.get("units") or []:
        geometry = item.get("geometry") or {}
        lon, lat = _extract_coordinates(geometry)
        specification = item.get("specification") or {}
        market = item.get("marketData") or {}
        units.append(
            SimilarUnit(
                unit_code=item.get("unitCode", ""),
                address=item.get("address"),
                lat=lat or item.get("lat"),
                lng=lon or item.get("lon"),
                property_type=specification.get("propertyType") or item.get("propertyType"),
                floor=specification.get("floor") or item.get("floor"),
                rooms=specification.get("rooms") or item.get("rooms"),
                construction_year=specification.get("constructionYear")
                or item.get("constructionYear"),
                usable_area=specification.get("usableArea") or item.get("usableArea"),
                listing_price=market.get("listingPrice") or item.get("listingPrice"),
                selling_price=market.get("sellingPrice") or item.get("sellingPrice"),
                shared_debt=market.get("jointDebt") or item.get("sharedDebt"),
                common_costs=market.get("monthlyCosts") or item.get("commonCosts"),
                sqm_price=market.get("squareMeterPrice") or item.get("squareMeterPrice"),
                days_on_market=market.get("daysOnMarket") or item.get("daysOnMarket"),
                sale_status=market.get("saleStatus") or item.get("saleStatus"),
                finalized_at=item.get("finalizedAt") or market.get("finalizedAt"),
                listing_status=item.get("listingStatus", "RECENTLY_SOLD"),
            )
        )
    return units


def build_unit_vector(unit: EiendomUnit) -> str:
    """Build a base64url-encoded unit_vector from EiendomUnit data."""
    payload = UnitVector(
        lon=unit.lng or 0.0,
        lat=unit.lat or 0.0,
        ptype=unit.property_type or "APARTMENT",
        floor=unit.floor,
        rooms=unit.rooms,
        built=unit.construction_year,
        area=unit.usable_area,
        price=unit.listing_price or unit.estimated_selling_price,
    )
    packed = msgpack.packb(payload.model_dump(), use_bin_type=True)
    encoded = base64.urlsafe_b64encode(packed).decode("utf-8").rstrip("=")
    return encoded


def decode_unit_vector(vector_str: str) -> dict:
    """Decode a base64url unit_vector for debugging."""
    padding = 4 - (len(vector_str) % 4)
    if padding != 4:
        vector_str += "=" * padding
    packed = base64.urlsafe_b64decode(vector_str.encode("utf-8"))
    return msgpack.unpackb(packed, raw=False)


async def search_unit_from_finn_url(
    finn_url: str,
    client: HTTPClient | None = None,
) -> EiendomUnit | None:
    if not EIENDOM_NO_ENABLED or not finn_url:
        logger.info("Eiendom.no unit search is disabled or finn_url is empty")
        return None
    client = client or HTTPClient(
        base_url=EIENDOM_NO_BASE_URL,
        request_delay_seconds=EIENDOM_NO_REQUEST_DELAY_SECONDS,
    )
    response = await client.get(
        "/geodata/units/search/",
        params={"search": finn_url},
    )
    data = response.json()
    units = data.get("units", [])
    if not units:
        return None
    return parse_eiendom_unit_json(units[0])


async def get_unit(
    unit_code: str,
    client: HTTPClient | None = None,
) -> EiendomUnit | None:
    if not EIENDOM_NO_ENABLED:
        logger.info("Eiendom.no enrichment is disabled")
        return None
    client = client or HTTPClient(
        base_url=EIENDOM_NO_BASE_URL,
        request_delay_seconds=EIENDOM_NO_REQUEST_DELAY_SECONDS,
    )
    path = f"/geodata/units/{unit_code}/"
    response = await client.get(path)
    data = response.json()
    if not isinstance(data, dict):
        return None
    # Accept either {"units": [...]} or a bare unit object.
    units = data.get("units") or []
    if units:
        return parse_eiendom_unit_json(units[0])
    if data.get("unitCode"):
        return parse_eiendom_unit_json(data)
    return None


async def get_eiendom_unit(
    unit_code: str,
    client: HTTPClient | None = None,
) -> EiendomUnit | None:
    return await get_unit(unit_code, client=client)


async def get_similar_units(
    unit_vector: str,
    listing_status: str = EIENDOM_NO_SIMILAR_UNITS_DEFAULT_STATUS,
    client: HTTPClient | None = None,
) -> list[SimilarUnit]:
    if not EIENDOM_NO_ENABLED:
        logger.info("Eiendom.no similar-units disabled")
        return []
    client = client or HTTPClient(
        base_url=EIENDOM_NO_BASE_URL,
        request_delay_seconds=EIENDOM_NO_REQUEST_DELAY_SECONDS,
    )
    response = await client.get(
        "/geodata/units/similar/",
        params={"unit_vector": unit_vector},
    )
    data = response.json()
    units = parse_similar_units_json(data)
    # listing_status is applied client-side; it is not sent to the API.
    listing_status = (listing_status or "").upper()
    if listing_status == "RECENTLY_SOLD":
        units = [
            unit
            for unit in units
            if unit.sale_status and unit.sale_status.upper() == "SOLD" and unit.finalized_at
        ]
    elif listing_status == "FOR_SALE":
        units = [
            unit for unit in units if unit.sale_status and unit.sale_status.upper() == "FORSALE"
        ]
    return units


def resolve_unit_from_finn_url(finn_url: str) -> str | None:
    """Fallback resolver: return the FINN URL's finnkode as a placeholder unit identifier.

    The finnkode is not an Eiendom.no unitCode. analyze_search only uses it
    when the Eiendom.no unit search returns no match.
    """
    if not finn_url:
        return None
    return normalize_finnkode(extract_finnkode_from_url(finn_url)) or None


async def enrich_ad_with_eiendom_no(
    ad: Any,
    unit_code: str | None = None,
    client: HTTPClient | None = None,
) -> EiendomUnit | None:
    if not unit_code:
        return None
    unit = await get_eiendom_unit(unit_code, client=client)
    if unit is None:
        return None
    unit.unit_vector = build_unit_vector(unit)
    return unit
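`build_unit_vector` msgpack-encodes the `UnitVector` fields, base64url-encodes the bytes, and strips the `=` padding. `decode_unit_vector` adds the padding back before decoding. The sketch below performs the same base64url round trip but uses stdlib `json` in place of msgpack, so its output is not compatible with Eiendom.no vectors. It is meant only to illustrate the padding handling. The expression `-len(s) % 4` gives the number of missing `=` characters directly. The field names come from `UnitVector`, and the values are made up.

```python
import base64
import json


def encode_vector(payload: dict) -> str:
    # json stands in for msgpack here; only the base64url + padding handling matches.
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_vector(vector: str) -> dict:
    # Restore the stripped "=" padding: -len % 4 is 0, 1, 2 or 3 (never 4).
    padded = vector + "=" * (-len(vector) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


payload = {"lon": 10.75, "lat": 59.91, "ptype": "APARTMENT", "rooms": 3, "area": 62}
vector = encode_vector(payload)
assert "=" not in vector
print(decode_vector(vector) == payload)  # True
```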
@@ -0,0 +1,122 @@
"""HTTP client with retries, delays, and error handling."""

import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)


class HTTPClient:
    """HTTP client with configurable retries, delays, and timeout."""

    def __init__(
        self,
        base_url: str = "",
        user_agent: str = "personal-finn-eiendom-analyzer/0.1",
        request_delay_seconds: float = 0.0,
        retries: int = 1,
        timeout_seconds: float = 30.0,
    ):
        """
        Initialize HTTP client.

        Args:
            base_url: Base URL for relative request paths.
            user_agent: User-Agent header value.
            request_delay_seconds: Delay before each request (to be respectful).
            retries: Extra attempts after a 5xx response or a connection/transport error.
            timeout_seconds: Request timeout.
        """
        self.base_url = base_url
        self.user_agent = user_agent
        self.request_delay_seconds = request_delay_seconds
        # Stored directly: the old code read a private attribute of an unused
        # httpx transport and therefore always fell back to 1 retry.
        self.retries = max(0, retries)
        self.timeout = httpx.Timeout(timeout_seconds)
        self.last_request_time: float | None = None

    async def get(self, url: str, **kwargs) -> httpx.Response:
        """
        Make async GET request with delay, retries, and error handling.

        Raises:
            httpx.HTTPStatusError if the final status is 4xx or 5xx.
            httpx.RequestError if the request still fails after all retries.
        """
        return await self._request("GET", url, **kwargs)

    async def post(self, url: str, **kwargs) -> httpx.Response:
        """Make async POST request with delay, retries, and error handling."""
        return await self._request("POST", url, **kwargs)

    def _get_retries(self) -> int:
        """Return the configured number of retry attempts."""
        return self.retries

    async def _request(self, method: str, url: str, **kwargs) -> httpx.Response:
        """Send a request; retry 5xx responses and transport errors with backoff."""
        # Copy the headers so the caller's dict is not mutated.
        headers = dict(kwargs.pop("headers", None) or {})
        headers.setdefault("User-Agent", self.user_agent)
        retries = self._get_retries()
        for attempt in range(retries + 1):
            await self._apply_delay()
            async with httpx.AsyncClient(
                timeout=self.timeout,
                base_url=self.base_url if not url.startswith("http") else "",
            ) as client:
                try:
                    response = await client.request(method, url, headers=headers, **kwargs)
                except httpx.RequestError as e:
                    if attempt < retries:
                        logger.warning("%s %s failed (%s); retrying", method, url, e)
                        await asyncio.sleep(2**attempt)
                        continue
                    logger.error("Request failed for %s: %s", url, e)
                    raise
            if response.status_code >= 500 and attempt < retries:
                # Back off 1s, 2s, 4s, ... before retrying a server error; 4xx is not retried.
                await asyncio.sleep(2**attempt)
                continue
            try:
                response.raise_for_status()
            except httpx.HTTPStatusError as e:
                logger.error("HTTP %s for %s", e.response.status_code, url)
                raise
            logger.debug("%s %s -> %s", method, url, response.status_code)
            return response
        raise RuntimeError("unreachable: the retry loop always returns or raises")

    async def _apply_delay(self):
        """Apply delay between requests if configured."""
        if self.request_delay_seconds > 0:
            await asyncio.sleep(self.request_delay_seconds)
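`HTTPClient` sleeps the full `request_delay_seconds` before every request and retries with a `2**attempt` second backoff. Below is a minimal asyncio sketch of an elapsed-aware throttle, which waits only for the remainder of the interval since the previous request, combined with exponential-backoff retries. `Throttle`, `with_retries`, and the simulated `ConnectionError` are hypothetical stand-ins, not the module's API.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class Throttle:
    """Enforce a minimum interval between requests (hypothetical helper)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: float | None = None
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Holding the lock while sleeping keeps concurrent callers spaced out too.
        async with self._lock:
            if self._last is not None:
                remaining = self.min_interval - (time.monotonic() - self._last)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last = time.monotonic()


async def with_retries(call: Callable[[], Awaitable[T]], retries: int, base_delay: float = 1.0) -> T:
    """Retry `call` on ConnectionError, sleeping base, 2*base, 4*base, ... between attempts."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except ConnectionError:
            if attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")


async def demo() -> tuple[float, int]:
    throttle = Throttle(min_interval=0.2)
    attempts = 0

    async def flaky() -> str:
        nonlocal attempts
        attempts += 1
        await throttle.wait()
        if attempts < 3:
            raise ConnectionError("simulated failure")
        return "ok"

    start = time.monotonic()
    result = await with_retries(flaky, retries=3, base_delay=0.01)
    assert result == "ok"
    return time.monotonic() - start, attempts


elapsed, attempts = asyncio.run(demo())
print(attempts)  # 3
```

Because the backoff sleep counts toward the interval, the throttle only tops up the difference. By contrast, the fixed delay in `_apply_delay` always adds the full interval.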
@@ -0,0 +1,160 @@
"""FastMCP stdio server for FINN real estate analysis and Eiendom.no enrichment."""

import json
import logging

from mcp.server.fastmcp import FastMCP

from .analysis import analyze_search
from .eiendom_no import (
    build_unit_vector,
    decode_unit_vector,
    get_similar_units,
    get_unit,
    search_unit_from_finn_url,
)
from .service import get_or_fetch_ad, get_or_fetch_eiendom_unit

logger = logging.getLogger(__name__)

mcp = FastMCP("finn_eiendom_mcp")


@mcp.tool(
    description=(
        "Analyze a FINN.no real estate search URL. Scrapes listing cards,"
        " fetches details, enriches with Eiendom.no data, scores, and ranks."
    )
)
async def finn_analyze_search(
    search_url: str,
    max_pages: int = 3,
    detail_limit: int = 20,
    include_details: bool = True,
    include_eiendom_no: bool = True,
) -> str:
    """Analyze a FINN search URL and return ranked listing results."""
    try:
        result = await analyze_search(
            search_url,
            max_pages=max_pages,
            fetch_details=include_details,
            detail_limit=detail_limit,
            include_eiendom_no=include_eiendom_no,
        )
        return json.dumps(result)
    except Exception as e:
        logger.error("Error analyzing search: %s", e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description=(
        "Fetch full detail for a FINN listing by finnkode."
        " Checks cache first; use force_refresh=True to bypass."
    )
)
async def finn_get_ad(finnkode: str, force_refresh: bool = False) -> str:
    """Fetch FINN ad details by finnkode."""
    try:
        ad = await get_or_fetch_ad(finnkode, force_refresh=force_refresh)
        return ad.model_dump_json()
    except Exception as e:
        logger.error("Error fetching ad %s: %s", finnkode, e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description="Resolve an Eiendom.no unit_code from a FINN listing URL. "
    "Returns unit_code, address, lat, lng or an error if not found."
)
async def finn_resolve_eiendom_unit(finn_url: str) -> str:
    """Resolve Eiendom.no unit from FINN URL."""
    try:
        unit = await search_unit_from_finn_url(finn_url)
        if unit is None:
            return json.dumps(
                {
                    "error": True,
                    "message": "Eiendom.no unit could not be resolved from FINN URL",
                }
            )
        return json.dumps(
            {
                "unit_code": unit.unit_code,
                "address": unit.address,
                "lat": unit.lat,
                "lng": unit.lng,
            }
        )
    except Exception as e:
        logger.error("Error resolving unit from %s: %s", finn_url, e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description="Fetch full Eiendom.no unit data by unit_code. Checks SQLite cache (24h TTL)."
)
async def finn_get_eiendom_unit(unit_code: str, force_refresh: bool = False) -> str:
    """Fetch Eiendom.no unit details by unit_code."""
    try:
        unit = await get_or_fetch_eiendom_unit(unit_code, force_refresh=force_refresh)
        if unit is None:
            return json.dumps({"error": True, "message": "Eiendom.no unit not found"})
        return unit.model_dump_json()
    except Exception as e:
        logger.error("Error fetching unit %s: %s", unit_code, e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description="Fetch comparable recently-sold or for-sale units from Eiendom.no using a "
    "base64-encoded unit vector. Returns list of similar units with sale prices."
)
async def finn_get_similar_units(unit_vector: str, listing_status: str = "RECENTLY_SOLD") -> str:
    """Fetch similar units from Eiendom.no."""
    try:
        units = await get_similar_units(unit_vector, listing_status)
        # mode="json" serializes any datetime fields as ISO strings, which json.dumps needs.
        return json.dumps([unit.model_dump(mode="json") for unit in units])
    except Exception as e:
        logger.error("Error fetching similar units: %s", e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description="Build a base64-encoded unit vector for a given Eiendom.no unit_code. "
    "The vector is used as input to finn_get_similar_units."
)
async def finn_build_unit_vector(unit_code: str) -> str:
    """Build unit vector for Eiendom.no unit."""
    try:
        unit = await get_unit(unit_code)
        if unit is None:
            return json.dumps({"error": True, "message": "Eiendom.no unit not found"})
        return json.dumps({"unit_code": unit.unit_code, "unit_vector": build_unit_vector(unit)})
    except Exception as e:
        logger.error("Error building unit vector for %s: %s", unit_code, e)
        return json.dumps({"error": True, "message": str(e)})


@mcp.tool(
    description="Decode a base64 unit vector into human-readable JSON (lat, lon, property type, "
    "floor, rooms, construction year, area, price)."
)
def finn_decode_unit_vector(unit_vector: str) -> str:
    """Decode unit vector to readable format."""
    try:
        result = decode_unit_vector(unit_vector)
        return json.dumps(result)
    except Exception as e:
        logger.error("Error decoding unit vector: %s", e)
        return json.dumps({"error": True, "message": str(e)})


def main() -> None:
    """Run the FastMCP stdio server."""
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
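Every tool above repeats the same `try`/`except` that returns `{"error": True, "message": ...}`. Below is a sketch of a decorator that centralises that envelope. `json_tool` and the demo tool are hypothetical. `functools.wraps` sets `__wrapped__`, so `inspect.signature()` reports the original parameters. Whether FastMCP's schema generation honours that is an assumption to verify before stacking the decorator under `@mcp.tool`.

```python
import asyncio
import functools
import json
import logging
from collections.abc import Awaitable, Callable
from typing import Any

logger = logging.getLogger(__name__)


def json_tool(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[str]]:
    """Serialize a tool's result to JSON, or return the {"error": True, ...} envelope."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            result = await func(*args, **kwargs)
            # default=str stops datetimes and other non-JSON types from raising here.
            return json.dumps(result, default=str)
        except Exception as exc:
            logger.error("Error in %s: %s", func.__name__, exc)
            return json.dumps({"error": True, "message": str(exc)})

    return wrapper


@json_tool
async def finn_decode_demo(unit_vector: str) -> dict:
    # Hypothetical tool body used only to exercise the decorator.
    if not unit_vector:
        raise ValueError("unit_vector is empty")
    return {"unit_vector": unit_vector, "length": len(unit_vector)}


print(asyncio.run(finn_decode_demo("abc")))  # {"unit_vector": "abc", "length": 3}
print(asyncio.run(finn_decode_demo("")))  # {"error": true, "message": "unit_vector is empty"}
```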
@@ -0,0 +1,128 @@
"""Pydantic models for FINN ads and Eiendom.no units."""

from datetime import UTC, datetime

from pydantic import BaseModel, ConfigDict, Field


class FinnSearchCard(BaseModel):
    """FINN search result card (minimal fields from search listing)."""

    finnkode: str
    url: str
    title: str | None = None
    address: str | None = None
    area_m2: int | None = None
    asking_price: int | None = None
    total_price: int | None = None
    common_costs: int | None = None
    property_type: str | None = None
    ownership_type: str | None = None
    bedrooms: int | None = None
    floor: str | None = None
    broker_company: str | None = None
class FinnAd(BaseModel):
"""FINN listing detail with all available fields."""
finnkode: str
url: str
title: str | None = None
address: str | None = None
postal_area: str | None = None
district: str | None = None
property_type: str | None = None
ownership_type: str | None = None
asking_price: int | None = None
total_price: int | None = None
shared_debt: int | None = None
common_costs: int | None = None
municipal_fee: int | None = None
other_fees: int | None = None
area_m2: int | None = None
rooms: int | None = None
bedrooms: int | None = None
floor: str | None = None
construction_year: int | None = None
energy_rating: str | None = None
heating: str | None = None
has_balcony: bool | None = None
has_terrace: bool | None = None
has_elevator: bool | None = None
has_parking: bool | None = None
has_garage: bool | None = None
listing_description: str | None = None
broker_name: str | None = None
broker_company: str | None = None
first_seen_at: datetime = Field(default_factory=lambda: datetime.now(UTC))
last_seen_at: datetime = Field(default_factory=lambda: datetime.now(UTC))
detail_fetched_at: datetime | None = None
eiendom_unit_code: str | None = None
model_config = ConfigDict(serializers={datetime: lambda v: v.isoformat()})
class EiendomUnit(BaseModel):
"""Eiendom.no unit detail with market data."""
unit_code: str
address: str | None = None
lat: float | None = None
lng: float | None = None
property_type: str | None = None
floor: int | None = None
rooms: int | None = None
construction_year: int | None = None
usable_area: int | None = None
estimated_selling_price: int | None = None
estimated_selling_price_lower: int | None = None
estimated_selling_price_upper: int | None = None
listing_price: int | None = None
listing_sqm_price: int | None = None
common_costs: int | None = None
days_on_market: int | None = None
sale_status: str | None = None
market_placement_score: str | None = None
unit_vector: str | None = None
fetched_at: datetime = Field(default_factory=lambda: datetime.now(UTC))
model_config = ConfigDict(serializers={datetime: lambda v: v.isoformat()})
class SimilarUnit(BaseModel):
"""Eiendom.no similar unit (comp) result."""
unit_code: str
address: str | None = None
lat: float | None = None
lng: float | None = None
property_type: str | None = None
floor: int | None = None
rooms: int | None = None
construction_year: int | None = None
usable_area: int | None = None
listing_price: int | None = None
selling_price: int | None = None
shared_debt: int | None = None
common_costs: int | None = None
sqm_price: int | None = None
days_on_market: int | None = None
sale_status: str | None = None
finalized_at: datetime | None = None
listing_status: str = Field(default="RECENTLY_SOLD")
model_config = ConfigDict(serializers={datetime: lambda v: v.isoformat() if v else None})
class UnitVector(BaseModel):
"""Unit vector payload for similar-units API."""
lon: float
lat: float
ptype: str # property type: APARTMENT, HOUSE, etc.
floor: int | None = None
rooms: int | None = None
built: int | None = None # construction year
area: int | None = None # usable area
price: int | None = None # listing or estimated price
+88
@@ -0,0 +1,88 @@
"""Normalization and parsing helpers."""

import re


def normalize_price(price_str: str | None) -> int | None:
    """
    Normalize Norwegian formatted price to integer.

    Example: "7 200 991 kr" -> 7200991
    """
    if not price_str:
        return None
    # Remove "kr" and spaces, keep only digits
    normalized = re.sub(r"[^\d]", "", price_str)
    try:
        return int(normalized) if normalized else None
    except ValueError:
        return None


def normalize_area(area_str: str | None) -> int | None:
    """
    Normalize area string to integer.

    Example: "77 m²" -> 77
    """
    if not area_str:
        return None
    # str.split() with no arguments also splits on no-break spaces (U+00A0), which
    # replace(" ", "") would leave in place, truncating "1\u00a0234 m²" to 1.
    cleaned = "".join(area_str.split())
    match = re.search(r"(\d+(?:[.,]\d+)?)", cleaned)
    if match:
        value = match.group(1).replace(",", ".")
        try:
            return int(float(value))
        except ValueError:
            return None
    return None


def normalize_number(num_str: str | None) -> int | None:
    """
    Normalize Norwegian formatted number to integer.

    Handles text like "3 500 kr/mnd" and "7,2". When a comma is present it is the
    decimal separator and dots are thousands separators; otherwise dots are dropped.
    """
    if not num_str:
        return None
    # Keep only digits and separators; this also removes spaces and "kr/mnd".
    cleaned = re.sub(r"[^\d,\.]", "", num_str)
    if "," in cleaned:
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        cleaned = cleaned.replace(".", "")
    try:
        return int(float(cleaned)) if cleaned else None
    except ValueError:
        return None


def normalize_finnkode(finnkode: str | None) -> str | None:
    """Normalize finnkode to string, strip whitespace."""
    if not finnkode:
        return None
    return str(finnkode).strip()


def extract_finnkode_from_url(url: str) -> str | None:
    """
    Extract finnkode from FINN URL.

    Example: https://www.finn.no/realestate/homes/ad.html?finnkode=462400360 -> 462400360
    """
    match = re.search(r"finnkode=(\d+)", url)
    if match:
        return match.group(1)
    return None


def text_to_bool(text: str | None) -> bool:
    """Return True only if text exactly matches ja/yes/true/1/y, ignoring case."""
    if not text:
        return False
    return text.lower() in ("ja", "yes", "true", "1", "y")


def clean_text(text: str | None) -> str | None:
    """Clean and normalize text: strip, collapse whitespace."""
    if not text:
        return None
    cleaned = " ".join(text.split())
    return cleaned if cleaned else None
+146
@@ -0,0 +1,146 @@
"""Scoring engine for FINN listings enriched with Eiendom.no data."""

import logging
from typing import Any

from .models import EiendomUnit, SimilarUnit

logger = logging.getLogger(__name__)


def _clamp(value: float, min_value: float, max_value: float) -> float:
    return max(min_value, min(max_value, value))


def score_market_position(unit: EiendomUnit | None) -> float:
    """Score listing price against Eiendom.no's estimate (0-20; lower ratio scores higher)."""
    # A missing or zero estimate cannot be compared (and would divide by zero).
    if unit is None or not unit.estimated_selling_price or unit.listing_price is None:
        return 0.0
    ratio = unit.listing_price / unit.estimated_selling_price
    if ratio <= 0.9:
        return 20.0
    if ratio <= 1.0:
        return 16.0 + (1.0 - ratio) * 40.0
    if ratio <= 1.1:
        return 12.0 - (ratio - 1.0) * 40.0
    return 5.0


def score_economy(ad: Any, unit: EiendomUnit | None) -> float:
    if ad.total_price is None:
        return 0.0
    if unit and unit.estimated_selling_price:
        ratio = ad.total_price / unit.estimated_selling_price
        if ratio <= 0.95:
            return 20.0
        if ratio <= 1.0:
            return 15.0
        if ratio <= 1.05:
            return 10.0
        return 6.0
    if ad.asking_price and ad.total_price <= ad.asking_price:
        return 12.0
    return 8.0


def score_comparable_sales(listings: list[SimilarUnit], listing_price: int | None) -> float:
    if not listings or listing_price is None:
        return 0.0
    selling_prices = [unit.selling_price for unit in listings if unit.selling_price]
    if not selling_prices:
        return 0.0
    average = sum(selling_prices) / len(selling_prices)
    ratio = listing_price / average
    score = (1.0 - abs(ratio - 1.0)) * 20.0
    return float(_clamp(score, 0.0, 20.0))


def score_location(address: str | None, district: str | None) -> float:
    if not address and not district:
        return 0.0
    if district and "oslo" in district.lower():
        return 15.0
    if address and "oslo" in address.lower():
        return 12.0
    return 7.0


def score_layout_and_potential(description: str | None, rooms: int | None) -> float:
    score = 0.0
    if rooms and rooms >= 4:
        score += 10.0
    if description and "potensial" in description.lower():
        score += 8.0
    return float(_clamp(score, 0.0, 20.0))


def score_outdoor_and_view(description: str | None) -> float:
    if not description:
        return 0.0
    score = 5.0 if "utsikt" in description.lower() or "balkong" in description.lower() else 0.0
    return float(_clamp(score, 0.0, 15.0))


def score_rental_potential(description: str | None) -> float:
    if not description:
        return 0.0
    score = 10.0 if "hybel" in description.lower() or "leie" in description.lower() else 0.0
    return score


def score_renovation_upside(description: str | None, asking_price: int | None) -> float:
    score = 0.0
    if description and "renover" in description.lower():
        score += 10.0
    if asking_price and asking_price > 0:
        score += 5.0
    return float(_clamp(score, 0.0, 15.0))


def score_risk(description: str | None, unit: EiendomUnit | None) -> float:
    if unit is None:
        return -10.0
    if description and "usikker" in description.lower():
        return -10.0
    return 0.0


def score_ad(
    ad: Any, unit: EiendomUnit | None, similar_units: list[SimilarUnit]
) -> dict[str, float]:
    scores = {
        "economy": score_economy(ad, unit),
        "market_position": score_market_position(unit),
        "comparable_sales": score_comparable_sales(
            similar_units, ad.total_price or ad.asking_price
        ),
        "location": score_location(ad.address, ad.district),
        "layout": score_layout_and_potential(ad.listing_description, ad.rooms),
        "outdoor": score_outdoor_and_view(ad.listing_description),
        "rental_potential": score_rental_potential(ad.listing_description),
        "renovation": score_renovation_upside(ad.listing_description, ad.asking_price),
        "risk": score_risk(ad.listing_description, unit),
    }
    scores["total"] = float(_clamp(sum(scores.values()), 0.0, 100.0))
    return scores


def classify_ad(scores: dict[str, float]) -> list[str]:
    categories: list[str] = []
    total = scores.get("total", 0.0)
    if total >= 70:
        categories.append("bargain_candidate")
    if total >= 60:
        categories.append("safe_candidate")
    if 50 <= total < 70:
        categories.append("lifestyle_candidate")
    if scores.get("renovation", 0.0) >= 8:
        categories.append("renovation_candidate")
    if scores.get("rental_potential", 0.0) >= 5:
        categories.append("hybel_candidate")
    if scores.get("risk", 0.0) < 0:
        categories.append("risk_object")
    if total < 30:
        categories.append("not_interesting")
    if 30 <= total < 60:
        categories.append("manual_review_required")
    return categories
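To make the threshold rules above easier to read, the sketch below copies the banding from `score_market_position` and the total-score branches of `classify_ad` into standalone functions. The names are hypothetical, and the sketch assumes positive prices. Two properties are easy to miss in the source. First, market-position points drop from 16 to 12 just above a ratio of 1.0. Second, the total-based labels overlap, so a total of 75 counts as both a bargain and a safe candidate.

```python
def market_position_points(listing_price: int, estimated_price: int) -> float:
    """Standalone copy of score_market_position's bands (0-20 points)."""
    ratio = listing_price / estimated_price
    if ratio <= 0.9:
        return 20.0
    if ratio <= 1.0:
        return 16.0 + (1.0 - ratio) * 40.0  # falls from 20 to 16 as ratio -> 1.0
    if ratio <= 1.1:
        return 12.0 - (ratio - 1.0) * 40.0  # restarts at 12, falls to 8 at 1.1
    return 5.0


def total_labels(total: float) -> list[str]:
    """Standalone copy of classify_ad's total-score branches (component labels omitted)."""
    labels: list[str] = []
    if total >= 70:
        labels.append("bargain_candidate")
    if total >= 60:
        labels.append("safe_candidate")
    if 50 <= total < 70:
        labels.append("lifestyle_candidate")
    if total < 30:
        labels.append("not_interesting")
    if 30 <= total < 60:
        labels.append("manual_review_required")
    return labels


if __name__ == "__main__":
    estimate = 7_650_000  # estimatedSellingPrice from the Eiendom.no fixture
    for asking in (6_500_000, 7_200_000, 7_650_000, 8_000_000, 9_000_000):
        print(asking, round(market_position_points(asking, estimate), 1))
    for total in (75, 65, 55, 25):
        print(total, total_labels(total))
```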
+194
@@ -0,0 +1,194 @@
"""FINN search scraping and parsing."""

import logging
import re

from bs4 import BeautifulSoup

from . import cache
from .config import FINN_CACHE_TTL_SEARCH_MINUTES
from .http import HTTPClient
from .models import FinnSearchCard
from .parser import (
    clean_text,
    extract_finnkode_from_url,
    normalize_area,
    normalize_finnkode,
    normalize_number,
    normalize_price,
)

logger = logging.getLogger(__name__)


async def fetch_search_page(url: str, client: HTTPClient | None = None) -> str:
    """Fetch a FINN search page HTML."""
    client = client or HTTPClient(request_delay_seconds=0.0)
    response = await client.get(url)
    return response.text


async def fetch_search_page_cached(
    url: str,
    client: HTTPClient | None = None,
    conn: cache.sqlite3.Connection | None = None,
    use_cache: bool = True,
) -> str:
    """Fetch a FINN search page with optional SQLite caching."""
    client = client or HTTPClient(request_delay_seconds=0.0)
    conn = conn or cache.init_db()
    if use_cache:
        cached_html = cache.get_search_page(conn, url)
        if cached_html:
            logger.debug("Using cached search page: %s", url)
            return cached_html
    html = await fetch_search_page(url, client=client)
    cache.save_search_page(conn, url, html, ttl_minutes=FINN_CACHE_TTL_SEARCH_MINUTES)
    return html


def extract_ad_links(html: str) -> list[str]:
    """Extract listing URLs from FINN search HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for article in soup.select("article.listing-card, article.sf-search-ad"):
        anchor = article.select_one("a[href*='finnkode']")
        if anchor and anchor.get("href"):
            links.append(clean_text(anchor.get("href")) or "")
    return links


def _extract_int_from_text(text: str, pattern: str) -> int | None:
    match = re.search(pattern, text, re.I)
    if match:
        return normalize_number(match.group(1))
    return None


def _extract_area_from_text(text: str) -> int | None:
    matches = re.findall(r"(\d+(?:[.,]\d+)?)\s*(?:m²|m2|kvm)", text, re.I)
    if matches:
        return normalize_area(matches[-1])
    return None


def _extract_price_from_text(text: str, label: str) -> int | None:
    pattern = rf"{label}[:\s]*([\d\s]+kr)"
    match = re.search(pattern, text, re.I)
    if match:
        return normalize_price(match.group(1))
    return None


def extract_search_cards(html: str) -> list[FinnSearchCard]:
    """Parse FINN search HTML and return a list of FinnSearchCard objects."""
    logger.debug("Extracting FINN search cards")
    soup = BeautifulSoup(html, "html.parser")
    cards: list[FinnSearchCard] = []
    for card in soup.select("article.listing-card, article.sf-search-ad"):
        data_id = card.get("data-id")
        anchor = card.select_one("a[href*='finnkode']")
        url = anchor.get("href") if anchor else ""
        finnkode = normalize_finnkode(data_id or extract_finnkode_from_url(url))
        if not finnkode:
            logger.debug("Skipping card with missing finnkode")
            continue
        title_elem = card.select_one(".title, h2.sf-realestate-heading, a.sf-search-ad-link")
        address_elem = card.select_one(".location, .sf-realestate-location")
        area_elem = card.select_one(".area")
        price_elem = card.select_one(".price")
        common_costs_elem = card.select_one(".common-costs")
        bedrooms_elem = card.select_one(".bedrooms")
        property_type_elem = card.select_one(".property-type")
        ownership_type_elem = card.select_one(".ownership-type")
        broker_elem = card.select_one(".broker-company")
        card_text = clean_text(card.get_text(" ") or "")
        bedrooms = None
        if bedrooms_elem:
            bedrooms = normalize_number(bedrooms_elem.get_text())
        elif card_text:
            bedrooms = _extract_int_from_text(card_text, r"(\d+)\s*soverom")
        common_costs = None
        if common_costs_elem:
            common_costs = normalize_number(common_costs_elem.get_text())
        elif card_text:
            common_costs = _extract_int_from_text(
                card_text, r"(?:Fellesutg|Felleskost(?:er)?)[^\d]*(\d+[\d\s]*)kr"
            )
        total_price = None
        if price_elem:
            total_price = normalize_price(price_elem.get_text())
        if not total_price and card_text:
            total_price = _extract_price_from_text(card_text, r"Totalpris")
        if not total_price and card_text:
            first_price_match = re.search(r"([\d\s]+kr)", card_text)
            if first_price_match:
                total_price = normalize_price(first_price_match.group(1))
        area_m2 = None
        if area_elem:
            area_m2 = normalize_area(area_elem.get_text())
        elif card_text:
            area_m2 = _extract_area_from_text(card_text)
        card_data = FinnSearchCard(
            finnkode=finnkode,
            url=url or "",
            title=clean_text(title_elem.get_text()) if title_elem else None,
            address=clean_text(address_elem.get_text()) if address_elem else None,
            area_m2=area_m2,
            asking_price=None,
            total_price=total_price,
            common_costs=common_costs,
            property_type=clean_text(property_type_elem.get_text()) if property_type_elem else None,
            ownership_type=clean_text(ownership_type_elem.get_text())
            if ownership_type_elem
            else None,
            bedrooms=bedrooms,
            floor=None,
            broker_company=clean_text(broker_elem.get_text()) if broker_elem else None,
        )
        cards.append(card_data)
        logger.debug("Parsed FINN search card %s", finnkode)
    return cards


def find_next_page_url(html: str) -> str | None:
    """Return the FINN search next page URL if present."""
    soup = BeautifulSoup(html, "html.parser")
    next_link = soup.select_one("a[rel='next']")
    if next_link and next_link.get("href"):
        return clean_text(next_link.get("href"))
    return None


async def fetch_search_pages(
    start_url: str,
    max_pages: int = 1,
    client: HTTPClient | None = None,
    use_cache: bool = True,
) -> list[FinnSearchCard]:
    """Fetch paginated FINN search pages and parse search cards."""
    client = client or HTTPClient(request_delay_seconds=0.0)
    conn = cache.init_db()
    url = start_url
    all_cards: list[FinnSearchCard] = []
    for _ in range(max_pages):
        html = await fetch_search_page_cached(url, client=client, conn=conn, use_cache=use_cache)
        all_cards.extend(extract_search_cards(html))
        next_url = find_next_page_url(html)
        if not next_url:
            break
        url = next_url
        logger.debug("Following next page link: %s", url)
    return all_cards
+35
@@ -0,0 +1,35 @@
"""Service layer for cache-aware fetching of FINN ads and Eiendom.no units."""

import logging

from .ad import fetch_ad_details
from .cache import get_eiendom_unit as get_cached_eiendom_unit
from .cache import get_finn_ad, init_db, save_eiendom_unit, save_finn_ad
from .config import FINN_CACHE_PATH
from .eiendom_no import get_unit
from .models import EiendomUnit, FinnAd

logger = logging.getLogger(__name__)


async def get_or_fetch_ad(finnkode: str, force_refresh: bool = False) -> FinnAd:
    """Get FinnAd from cache or fetch fresh. Never returns None."""
    conn = init_db(FINN_CACHE_PATH)
    ad = None if force_refresh else get_finn_ad(conn, finnkode, ttl_hours=24)
    if ad is None:
        ad = await fetch_ad_details(finnkode)
        save_finn_ad(conn, ad)
    return ad


async def get_or_fetch_eiendom_unit(
    unit_code: str, force_refresh: bool = False
) -> EiendomUnit | None:
    """Get EiendomUnit from cache or fetch fresh."""
    conn = init_db(FINN_CACHE_PATH)
    unit = None if force_refresh else get_cached_eiendom_unit(conn, unit_code, ttl_hours=24)
    if unit is None:
        unit = await get_unit(unit_code)
        if unit is not None:
            save_eiendom_unit(conn, unit)
    return unit
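Both service functions follow the cache-aside pattern. Unless `force_refresh` is set, they return a fresh cached row; otherwise they fetch, store, and return. The synchronous stdlib sketch below implements the same flow with an explicit TTL check. It uses a hypothetical `kv` table rather than the real schema in `cache.py`, and like `get_or_fetch_eiendom_unit` it never caches a `None` result.

```python
import json
import sqlite3
import time
from collections.abc import Callable
from typing import Any


def init_kv(conn: sqlite3.Connection) -> None:
    """Create the hypothetical key/value cache table used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )


def get_or_fetch(
    conn: sqlite3.Connection,
    key: str,
    fetch: Callable[[str], Any],
    ttl_seconds: float,
    force_refresh: bool = False,
    now: Callable[[], float] = time.time,
) -> Any:
    """Serve a cached value younger than ttl_seconds; otherwise fetch, store, and return it."""
    if not force_refresh:
        row = conn.execute("SELECT value, fetched_at FROM kv WHERE key = ?", (key,)).fetchone()
        if row is not None and now() - row[1] < ttl_seconds:
            return json.loads(row[0])
    value = fetch(key)
    if value is not None:  # a miss is returned but never cached
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value, fetched_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now()),
        )
        conn.commit()
    return value
```

Injecting `now` keeps TTL expiry testable without sleeping, which is the same idea `test_finn_ad_cache_ttl_expiration` achieves by backdating `detail_fetched_at`.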
+49
@@ -0,0 +1,49 @@
[project]
name = "finn-eiendom-mcp"
version = "0.1.0"
description = "Private FINN and Eiendom.no real estate MCP scout"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "beautifulsoup4>=4.12.0",
    "httpx>=0.27.0",
    "lxml>=5.0.0",
    "mcp[cli]>=1.0.0",
    "msgpack>=1.0.0",
    "pydantic>=2.8.0",
    "pydantic-settings>=2.4.0",
    "python-dotenv>=1.0.0",
]

[project.scripts]
finn-eiendom-mcp = "finn_eiendom.mcp_server:main"

[dependency-groups]
dev = [
    "ipython>=8.0.0",
    "mypy>=1.10.0",
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "respx>=0.21.0",
    "ruff>=0.6.0",
]

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.ruff.lint]
select = ["E", "F", "I", "UP", "B", "SIM"]
ignore = []

[tool.ruff.lint.per-file-ignores]
"tests/fixtures.py" = ["E501"]

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"

[tool.mypy]
python_version = "3.12"
strict = true
plugins = []
+1
@@ -0,0 +1 @@
"""Test fixtures and utilities."""
+236
@@ -0,0 +1,236 @@
"""Fixture data for testing without hitting live APIs."""
SAMPLE_FINN_SEARCH_HTML = """
<!DOCTYPE html>
<html lang="no">
<head><title>FINN.no - Leiligheter til salgs</title></head>
<body>
<div class="listings">
<article class="listing-card" data-id="462400360">
<a href="https://www.finn.no/realestate/homes/ad.html?finnkode=462400360" class="listing-link">
<div class="title">Flott 3-roms i Ferner</div>
<div class="meta">
<span class="area">77 m²</span>
<span class="price">7 200 991 kr</span>
<span class="price-per-sqm">93 500 kr/m²</span>
</div>
</a>
<div class="details">
<span class="bedrooms">3</span>
<span class="location">Grünerløkka, Oslo</span>
<span class="common-costs">3 500 kr/mnd</span>
</div>
</article>
<article class="listing-card" data-id="460784945">
<a href="https://www.finn.no/realestate/homes/ad.html?finnkode=460784945" class="listing-link">
<div class="title">Leilighet med potensial - må renoveres</div>
<div class="meta">
<span class="area">65 m²</span>
<span class="price">6 500 000 kr</span>
<span class="price-per-sqm">100 000 kr/m²</span>
</div>
</a>
<div class="details">
<span class="bedrooms">2</span>
<span class="location">Sagene, Oslo</span>
<span class="common-costs">2 800 kr/mnd</span>
</div>
</article>
</div>
</body>
</html>
"""
SAMPLE_FINN_SEARCH_HTML_NEW = """
<!DOCTYPE html>
<html lang="no">
<head><title>FINN.no - Leiligheter til salgs</title></head>
<body>
<div class="listings">
<article class="relative isolate sf-search-ad card card--cardShadow">
<div class="col-span-2 p-16 grid sm:grid-cols-2">
<h2 class="h4 mb-0 col-span-2 mt-12 sm:mt-24 sf-realestate-heading">IDYLLISKE ILADALEN - Lekker 3-roms loftsleilighet fra 2016 | Privat, solrik takterrasse | Peis | Gulvareal på 77kvm | Sentralt, men rolig</h2>
<a href="https://www.finn.no/realestate/homes/ad.html?finnkode=462880791" class="sf-search-ad-link s-text!">IDYLLISKE ILADALEN - Lekker 3-roms loftsleilighet fra 2016 | Privat, solrik takterrasse | Peis | Gulvareal på 77kvm | Sentralt, men rolig</a>
<div class="mt-4 sf-line-clamp-2 sm:order-first sm:text-right sm:mt-0 sm:ml-16 sf-realestate-location">Lofotgata 4B, Oslo</div>
<div class="col-span-2 mt-16 flex justify-between sm:mt-4 sm:block space-x-12 font-bold">62 m² 6 750 000 kr</div>
<div class="col-span-2 sm:flex sm:items-baseline sm:justify-between">Totalpris: 7 253 377 kr ∙ Fellesutg.: 7 067 kr ∙ Andel ∙ Leilighet ∙ 2 soverom</div>
</div>
</article>
</div>
</body>
</html>
"""
SAMPLE_FINN_LISTING_HTML = """
<!DOCTYPE html>
<html lang="no">
<head><title>Flott 3-roms i Ferner - FINN.no</title></head>
<body>
<div class="listing-details">
<div class="heading">
<h1>Flott 3-roms i Ferner</h1>
<div class="price">Totalpris: 7 200 991 kr</div>
</div>
<div class="properties">
<dl>
<dt>Adresse</dt>
<dd>Fernerveien 42, 0554 Oslo</dd>
<dt>Område</dt>
<dd>Grünerløkka</dd>
<dt>Postnummer</dt>
<dd>0554</dd>
<dt>Eierform</dt>
<dd>Eierbolig</dd>
<dt>Eiendomstype</dt>
<dd>Leilighet</dd>
<dt>Prisantydning</dt>
<dd>7 200 000 kr</dd>
<dt>Totalpris</dt>
<dd>7 200 991 kr</dd>
<dt>Fellesgjeld</dt>
<dd>0 kr</dd>
<dt>Felles utgifter</dt>
<dd>3 500 kr/mnd</dd>
<dt>Boligareal</dt>
<dd>77 m²</dd>
<dt>Rom</dt>
<dd>4</dd>
<dt>Soverom</dt>
<dd>3</dd>
<dt>Etasje</dt>
<dd>4. etasje</dd>
<dt>Byggeår</dt>
<dd>2005</dd>
<dt>Energimerking</dt>
<dd>C</dd>
<dt>Oppvarming</dt>
<dd>Fjernvarme</dd>
<dt>Balkonger/terrasser</dt>
<dd>Ja, balkonger</dd>
<dt>Heis</dt>
<dd>Ja</dd>
<dt>Parkering/garasje</dt>
<dd>Privat parkering</dd>
</dl>
</div>
<div class="description">
<h2>Beskrivelse</h2>
<p>Flott beliggenhet med fin utsikt over Oslo. Moderne kjøkken og bad.</p>
<p>Klar til visning!</p>
</div>
<div class="broker">
<div class="broker-info">
<span class="broker-name">Meglerhuset AS</span>
<span class="broker-contact">Telefon: 21 00 00 00</span>
</div>
</div>
</div>
</body>
</html>
"""
SAMPLE_FINN_LISTING_HTML_NEW = """
<!DOCTYPE html>
<html lang="no">
<head><title>Romslig 5-roms i 5.etasje med heisadkomst</title></head>
<body>
<div data-testid="object-details">
<h1>Romslig 5-roms i 5.etasje med heisadkomst | 2 hybler | 4 balkonger | Ingen dokavgift!</h1>
<span data-testid="object-address">Hegdehaugsveien 3, 0352 Oslo</span>
<span data-testid="local-area-name">Homansbyen</span>
<section data-testid="pricing-details">
<div data-testid="pricing-incicative-price">Prisantydning10 900 000 kr</div>
<div data-testid="pricing-total-price"><dt>Totalpris</dt><dd>10 986 901 kr</dd></div>
<div data-testid="pricing-joint-debt"><dt>Fellesgjeld</dt><dd>76 911 kr</dd></div>
<div data-testid="pricing-common-monthly-cost"><dt>Felleskost/mnd.</dt><dd>12 011 kr</dd></div>
</section>
<section data-testid="key-info">
<div data-testid="info-property-type">BoligtypeLeilighet</div>
<div data-testid="info-ownership-type">EieformAndel</div>
<div data-testid="info-bedrooms">Soverom2</div>
<div data-testid="info-rooms">Rom5</div>
<div data-testid="info-construction-year">Byggeår1938</div>
<div data-testid="info-usable-i-area">Internt bruksareal124 m² (BRA-i)</div>
</section>
<section data-testid="object-facilities">FasiliteterBalkong/TerrasseParkettHeis</section>
<section data-testid="om boligen">
<h2>Om boligen</h2>
<p>Her bor du med kort vei til daglige behov og offentlig transport.</p>
</section>
</div>
</body>
</html>
"""
SAMPLE_EIENDOM_UNIT_JSON = {
    "units": [
        {
            "unitCode": "c-gxw-xmyum-s2a",
            "address": "Fernerveien 42, 0554 Oslo",
            "municipality": "Oslo",
            "lat": 59.9287,
            "lon": 10.7803,
            "propertyType": "APARTMENT",
            "floor": 4,
            "rooms": 4,
            "constructionYear": 2005,
            "usableArea": 77,
            "estimatedSellingPrice": 7650000,
            "estimatedSellingPriceLower": 6900000,
            "estimatedSellingPriceUpper": 8400000,
            "listingPrice": 7200000,
            "listingSquareMeterPrice": 93500,
            "commonCosts": 3500,
            "daysOnMarket": 12,
            "saleStatus": "FOR_SALE",
            "marketPlacementScore": "ABOVE_AVERAGE",
            "similarUnitCount": 12,
            "averageSquareMeterPrice": 98000,
        }
    ]
}
SAMPLE_EIENDOM_SIMILAR_UNITS_JSON = {
    "units": [
        {
            "unitCode": "c-recent-1",
            "address": "Birketveien 10, 0554 Oslo",
            "lat": 59.9290,
            "lon": 10.7810,
            "propertyType": "APARTMENT",
            "floor": 3,
            "rooms": 3,
            "constructionYear": 2004,
            "usableArea": 75,
            "listingPrice": 7100000,
            "sellingPrice": 7050000,
            "sharedDebt": 0,
            "commonCosts": 3400,
            "squareMeterPrice": 94000,
            "daysOnMarket": 18,
            "saleStatus": "SOLD",
            "finalizedAt": "2024-05-01",
        },
        {
            "unitCode": "c-recent-2",
            "address": "Sommers gate 5, 0554 Oslo",
            "lat": 59.9280,
            "lon": 10.7820,
            "propertyType": "APARTMENT",
            "floor": 2,
            "rooms": 4,
            "constructionYear": 2006,
            "usableArea": 80,
            "listingPrice": 7400000,
            "sellingPrice": 7350000,
            "sharedDebt": 0,
            "commonCosts": 3600,
            "squareMeterPrice": 91875,
            "daysOnMarket": 22,
            "saleStatus": "SOLD",
            "finalizedAt": "2024-04-28",
        },
    ]
}
+45
@@ -0,0 +1,45 @@
from finn_eiendom.ad import scrape_ad
from tests.fixtures import SAMPLE_FINN_LISTING_HTML, SAMPLE_FINN_LISTING_HTML_NEW


def test_scrape_ad():
    ad = scrape_ad(
        SAMPLE_FINN_LISTING_HTML,
        url="https://www.finn.no/realestate/homes/ad.html?finnkode=462400360",
    )
    assert ad.finnkode == "462400360"
    assert ad.title == "Flott 3-roms i Ferner"
    assert ad.address == "Fernerveien 42, 0554 Oslo"
    assert ad.area_m2 == 77
    assert ad.asking_price == 7200000
    assert ad.total_price == 7200991
    assert ad.common_costs == 3500
    assert ad.rooms == 4
    assert ad.bedrooms == 3
    assert ad.floor == "4. etasje"
    assert ad.construction_year == 2005
    assert ad.energy_rating == "C"
    assert ad.heating == "Fjernvarme"
    assert "Moderne kjøkken" in ad.listing_description
    assert ad.broker_company == "Meglerhuset AS"


def test_scrape_ad_new_structure():
    ad = scrape_ad(
        SAMPLE_FINN_LISTING_HTML_NEW,
        url="https://www.finn.no/realestate/homes/ad.html?finnkode=455978973",
    )
    assert ad.finnkode == "455978973"
    assert ad.title.startswith("Romslig 5-roms i 5.etasje")
    assert ad.address == "Hegdehaugsveien 3, 0352 Oslo"
    assert ad.property_type == "Leilighet"
    assert ad.ownership_type == "Andel"
    assert ad.asking_price == 10900000
    assert ad.total_price == 10986901
    assert ad.common_costs == 12011
    assert ad.area_m2 == 124
    assert ad.rooms == 5
    assert ad.bedrooms == 2
    assert ad.construction_year == 1938
    assert ad.floor == "5. etasje"
    assert "kort vei" in ad.listing_description.lower()
+71
@@ -0,0 +1,71 @@
import tempfile
from datetime import UTC, datetime, timedelta
from pathlib import Path

from finn_eiendom.cache import (
    get_eiendom_unit,
    get_finn_ad,
    get_search_page,
    get_similar_units,
    init_db,
    save_eiendom_unit,
    save_finn_ad,
    save_search_page,
    save_similar_units,
)
from finn_eiendom.models import EiendomUnit, FinnAd, SimilarUnit


def test_cache_roundtrip():
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = Path(tmpdir) / "cache.sqlite"
        conn = init_db(str(db_path))
        ad = FinnAd(finnkode="1234", url="https://example.com", title="Test")
        save_finn_ad(conn, ad)
        loaded_ad = get_finn_ad(conn, "1234")
        assert loaded_ad is not None
        assert loaded_ad.finnkode == "1234"
        assert loaded_ad.url == "https://example.com"
        unit = EiendomUnit(unit_code="abc", address="Oslo")
        save_eiendom_unit(conn, unit)
        loaded_unit = get_eiendom_unit(conn, "abc")
        assert loaded_unit is not None
        assert loaded_unit.address == "Oslo"
        comps = [
            SimilarUnit(unit_code="x1"),
            SimilarUnit(unit_code="x2"),
        ]
        save_similar_units(conn, "abc", "RECENTLY_SOLD", comps)
        loaded_comps = get_similar_units(conn, "abc", "RECENTLY_SOLD")
        assert len(loaded_comps) == 2
        assert loaded_comps[0].unit_code == "x1"


def test_search_page_cache_roundtrip():
    with tempfile.TemporaryDirectory() as tmpdir:
        conn = init_db(str(Path(tmpdir) / "cache.sqlite"))
        html = "<html><body>search page</body></html>"
        url = "https://www.finn.no/realestate/homes/search.html"
        save_search_page(conn, url, html, ttl_minutes=5)
        loaded_html = get_search_page(conn, url)
        assert loaded_html == html


def test_finn_ad_cache_ttl_expiration():
    with tempfile.TemporaryDirectory() as tmpdir:
        conn = init_db(str(Path(tmpdir) / "cache.sqlite"))
        ad = FinnAd(
            finnkode="1234",
            url="https://example.com",
            title="Test",
            detail_fetched_at=datetime.now(UTC) - timedelta(hours=2),
        )
        save_finn_ad(conn, ad)
        expired_ad = get_finn_ad(conn, "1234", ttl_hours=1)
        assert expired_ad is None
+44
@@ -0,0 +1,44 @@
from finn_eiendom.eiendom_no import (
    build_unit_vector,
    decode_unit_vector,
    parse_eiendom_unit_json,
    parse_similar_units_json,
    resolve_unit_from_finn_url,
)
from tests.fixtures import (
    SAMPLE_EIENDOM_SIMILAR_UNITS_JSON,
    SAMPLE_EIENDOM_UNIT_JSON,
)


def test_parse_eiendom_unit_json():
    unit = parse_eiendom_unit_json(SAMPLE_EIENDOM_UNIT_JSON["units"][0])
    assert unit.unit_code == "c-gxw-xmyum-s2a"
    assert unit.address == "Fernerveien 42, 0554 Oslo"
    assert unit.estimated_selling_price == 7650000
    assert unit.listing_sqm_price == 93500


def test_unit_vector_roundtrip():
    unit = parse_eiendom_unit_json(SAMPLE_EIENDOM_UNIT_JSON["units"][0])
    vector = build_unit_vector(unit)
    decoded = decode_unit_vector(vector)
    assert isinstance(decoded, dict)
    assert decoded["ptype"] == "APARTMENT"
    assert decoded["area"] == 77
    assert decoded["price"] == 7200000
    assert decoded["lon"] == unit.lng


def test_parse_similar_units_json():
    comps = parse_similar_units_json(SAMPLE_EIENDOM_SIMILAR_UNITS_JSON)
    assert len(comps) == 2
    assert comps[0].unit_code == "c-recent-1"
    assert comps[1].selling_price == 7350000


def test_resolve_unit_from_finn_url():
    unit_code = resolve_unit_from_finn_url(
        "https://www.finn.no/realestate/homes/ad.html?finnkode=462400360"
    )
    assert unit_code == "462400360"
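The Eiendom.no payloads use camelCase keys, while `EiendomUnit` uses snake_case. The assertions above also imply two outright renames: `lon` is compared against `unit.lng`, and `listingSquareMeterPrice` (93500) surfaces as `listing_sqm_price`. The sketch below uses an explicit key map whose output could be passed as `EiendomUnit(**mapped)`. The table and `rename_fields` are hypothetical, not the contents of `parse_eiendom_unit_json`, and every pair other than the two renames assumes a plain camelCase to snake_case mapping.

```python
from typing import Any

# Hypothetical key map: lon -> lng and listingSquareMeterPrice -> listing_sqm_price
# are inferred from the tests; the other pairs assume a direct camel -> snake rename.
UNIT_FIELDS = {
    "unitCode": "unit_code",
    "address": "address",
    "lat": "lat",
    "lon": "lng",
    "propertyType": "property_type",
    "floor": "floor",
    "rooms": "rooms",
    "constructionYear": "construction_year",
    "usableArea": "usable_area",
    "estimatedSellingPrice": "estimated_selling_price",
    "estimatedSellingPriceLower": "estimated_selling_price_lower",
    "estimatedSellingPriceUpper": "estimated_selling_price_upper",
    "listingPrice": "listing_price",
    "listingSquareMeterPrice": "listing_sqm_price",
    "commonCosts": "common_costs",
    "daysOnMarket": "days_on_market",
    "saleStatus": "sale_status",
    "marketPlacementScore": "market_placement_score",
}


def rename_fields(raw: dict[str, Any], fields: dict[str, str]) -> dict[str, Any]:
    """Rename known API keys to model field names; drop unknown keys (e.g. municipality)."""
    return {field: raw[key] for key, field in fields.items() if key in raw}


if __name__ == "__main__":
    sample = {
        "unitCode": "c-gxw-xmyum-s2a",
        "lon": 10.7803,
        "listingSquareMeterPrice": 93500,
        "municipality": "Oslo",
    }
    print(rename_fields(sample, UNIT_FIELDS))
```

An explicit table keeps the renames in one place and silently ignores payload fields the model does not declare, such as `similarUnitCount`.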
+83
@@ -0,0 +1,83 @@
"""Tests for HTTP client retry logic."""

import httpx
import pytest
import respx

from finn_eiendom.http import HTTPClient


@pytest.mark.asyncio
async def test_get_retries_on_500():
    """Test that HTTPClient retries on 500 errors and succeeds on second attempt."""
    client = HTTPClient(request_delay_seconds=0.0, retries=2)
    with respx.mock:
        route = respx.get("https://example.com/api")
        route.side_effect = [
            httpx.Response(500, text="Server Error"),
            httpx.Response(200, text="Success"),
        ]
        response = await client.get("https://example.com/api")
        assert response.status_code == 200


@pytest.mark.asyncio
async def test_get_raises_on_404():
    """Test that HTTPClient raises on 4xx errors immediately."""
    client = HTTPClient(request_delay_seconds=0.0, retries=2)
    with respx.mock:
        respx.get("https://example.com/api").mock(return_value=httpx.Response(404))
        with pytest.raises(httpx.HTTPStatusError) as exc_info:
            await client.get("https://example.com/api")
        assert exc_info.value.response.status_code == 404


@pytest.mark.asyncio
async def test_get_retries_on_502_bad_gateway():
    """Test that HTTPClient retries on 502 Bad Gateway."""
    client = HTTPClient(request_delay_seconds=0.0, retries=2)
    with respx.mock:
        route = respx.get("https://example.com/api")
        route.side_effect = [
            httpx.Response(502, text="Bad Gateway"),
            httpx.Response(200, text="Success"),
        ]
        response = await client.get("https://example.com/api")
        assert response.status_code == 200


@pytest.mark.asyncio
async def test_post_retries_on_503():
    """Test that HTTPClient retries POST on 503 Service Unavailable."""
    client = HTTPClient(request_delay_seconds=0.0, retries=2)
    with respx.mock:
        route = respx.post("https://example.com/api")
        route.side_effect = [
            httpx.Response(503, text="Service Unavailable"),
            httpx.Response(201, json={"success": True}),
        ]
        response = await client.post("https://example.com/api", json={"test": "data"})
        assert response.status_code == 201


@pytest.mark.asyncio
async def test_get_eventually_fails_on_persistent_500():
    """Test that HTTPClient gives up after max retries."""
    client = HTTPClient(request_delay_seconds=0.0, retries=1)
    with respx.mock:
        respx.get("https://example.com/api").mock(return_value=httpx.Response(500))
        with pytest.raises(httpx.HTTPStatusError) as exc_info:
            await client.get("https://example.com/api")
        assert exc_info.value.response.status_code == 500
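These tests pin down a retry policy: retry 5xx responses, fail immediately on 4xx, and give up once retries run out. The dependency-free sketch below implements that policy over a fake `send` coroutine that returns only a status code. `StatusError` is a hypothetical stand-in for `httpx.HTTPStatusError`, and the exponential backoff is an assumption. It also assumes `retries` counts extra attempts after the first one, which is consistent with the tests but not confirmed by them.

```python
import asyncio
from collections.abc import Awaitable, Callable


class StatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError, carrying only the status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


async def send_with_retries(
    send: Callable[[], Awaitable[int]],
    retries: int,
    backoff_seconds: float = 0.0,
) -> int:
    """Retry 5xx up to `retries` extra times; raise at once on 4xx; return other statuses."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    status = 0
    for attempt in range(retries + 1):
        status = await send()
        if 400 <= status < 500:
            raise StatusError(status)  # client errors are not retried
        if status < 400:
            return status
        if attempt < retries:
            # Exponential backoff between attempts: 1x, 2x, 4x, ... backoff_seconds.
            await asyncio.sleep(backoff_seconds * 2**attempt)
    raise StatusError(status)  # still 5xx after the last attempt
```

Counting `retries` as extra attempts means `retries=0` still makes one request, which avoids an off-by-one where a client configured for zero retries would never send anything.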
+69
@@ -0,0 +1,69 @@
"""Tests for the MCP server tools."""

import json

from finn_eiendom.mcp_server import (
    finn_decode_unit_vector,
    mcp,
)


def test_mcp_server_has_correct_tools():
    """Assert that the MCP server has all expected tools."""
    import asyncio

    async def check_tools():
        tools = await mcp.list_tools()
        tool_names = {tool.name for tool in tools}
        expected_tools = {
            "finn_analyze_search",
            "finn_get_ad",
            "finn_resolve_eiendom_unit",
            "finn_get_eiendom_unit",
            "finn_get_similar_units",
            "finn_build_unit_vector",
            "finn_decode_unit_vector",
        }
        assert expected_tools.issubset(tool_names), f"Missing tools: {expected_tools - tool_names}"

    asyncio.run(check_tools())


def test_finn_decode_unit_vector_returns_json():
    """Test that finn_decode_unit_vector returns valid JSON with expected keys."""
    from unittest.mock import patch

    test_vector = {
        "lon": 10.7,
        "lat": 59.9,
        "ptype": "APARTMENT",
        "floor": 3,
        "rooms": 3,
        "built": 2000,
        "area": 80,
        "price": 5000000,
    }

    with patch("finn_eiendom.mcp_server.decode_unit_vector", return_value=test_vector):
        result = finn_decode_unit_vector("dGVzdA==")

    data = json.loads(result)
    assert "lon" in data
    assert "lat" in data
    assert "ptype" in data
    assert data["lat"] == 59.9
    assert data["lon"] == 10.7


def test_finn_decode_unit_vector_error_handling():
    """Test that finn_decode_unit_vector handles errors gracefully."""
    from unittest.mock import patch

    with patch(
        "finn_eiendom.mcp_server.decode_unit_vector", side_effect=Exception("decode failed")
    ):
        result = finn_decode_unit_vector("invalid")

    data = json.loads(result)
    assert data.get("error") is True
    assert "message" in data
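The error-handling test expects `finn_decode_unit_vector` to return a JSON object with `"error": true` and a `"message"` instead of raising. One way to get that behavior uniformly across tools is a small wrapper, sketched below with the standard library. The decorator name `json_tool` is hypothetical, and so is the format `decode_vector_tool` decodes: base64-encoded JSON is assumed purely for illustration, and the project's actual unit-vector encoding is not shown here and may differ.

```python
import base64
import functools
import json
from typing import Any, Callable


def json_tool(func: Callable[..., Any]) -> Callable[..., str]:
    """Wrap a tool so it always returns a JSON string, even when it fails."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return json.dumps(func(*args, **kwargs), ensure_ascii=False)
        except Exception as exc:  # report every failure as data, not a crash
            return json.dumps({"error": True, "message": str(exc)})

    return wrapper


@json_tool
def decode_vector_tool(encoded: str) -> dict:
    # Assumption for illustration only: the vector is base64-encoded JSON.
    return json.loads(base64.b64decode(encoded, validate=True))
```

Returning the error as a JSON payload keeps the tool's return type a string in both paths, which is what the test checks with `json.loads(result)`.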
@@ -0,0 +1,45 @@
from finn_eiendom.parser import (
    clean_text,
    extract_finnkode_from_url,
    normalize_area,
    normalize_finnkode,
    normalize_number,
    normalize_price,
)


def test_normalize_price():
    assert normalize_price("7 200 991 kr") == 7200991
    assert normalize_price("1 234") == 1234
    assert normalize_price(None) is None


def test_normalize_area():
    assert normalize_area("77 m²") == 77
    assert normalize_area("100,5 m²") == 100
    assert normalize_area("") is None


def test_normalize_number():
    assert normalize_number("3 500 kr/mnd") == 3500
    assert normalize_number("7,2") == 7
    assert normalize_number("1.234") == 1234
    assert normalize_number(None) is None


def test_normalize_finnkode():
    assert normalize_finnkode(" 462400360 ") == "462400360"
    assert normalize_finnkode(None) is None


def test_extract_finnkode_from_url():
    assert (
        extract_finnkode_from_url("https://www.finn.no/realestate/homes/ad.html?finnkode=462400360")
        == "462400360"
    )
    assert extract_finnkode_from_url("https://www.finn.no/realestate/homes/ad.html") is None


def test_clean_text():
    assert clean_text(" Hello world \n") == "Hello world"
    assert clean_text(None) is None
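The parser tests encode a few Norwegian number conventions: spaces group thousands (scraped HTML often uses non-breaking spaces for this), a dot can also group thousands (`"1.234"` is 1234), a comma starts a decimal part that is truncated (`"100,5 m²"` is 100), and trailing units such as `kr` or `kr/mnd` are ignored. The standard-library sketch below satisfies those expectations. The names `to_int`, `finnkode_from_url`, and `collapse_whitespace` are hypothetical; this is not the project's `parser.py`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Digits, optionally with dot thousands separators, then an optional comma decimal part.
_NUMBER = re.compile(r"[0-9][0-9.]*(?:,[0-9]+)?")


def to_int(text: Optional[str]) -> Optional[int]:
    """Parse the first number in `text`, truncating any comma decimals."""
    if text is None:
        return None
    # Drop all Unicode whitespace (incl. U+00A0) so "7 200 991" becomes one token.
    compact = re.sub(r"\s+", "", text)
    match = _NUMBER.search(compact)
    if match is None:
        return None
    integer_part = match.group(0).split(",")[0].replace(".", "")
    return int(integer_part)


def finnkode_from_url(url: str) -> Optional[str]:
    """Return the `finnkode` query parameter, or None when the URL has none."""
    values = parse_qs(urlparse(url).query).get("finnkode")
    return values[0].strip() if values else None


def collapse_whitespace(text: Optional[str]) -> Optional[str]:
    """Trim both ends and squeeze internal whitespace runs to a single space."""
    if text is None:
        return None
    return " ".join(text.split())
```

Stripping every whitespace character before matching is a deliberate simplification: it joins digit groups correctly, but it would also merge two separate numbers in one string, so it only suits fields that hold a single value.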
@@ -0,0 +1,22 @@
from finn_eiendom.models import EiendomUnit, FinnAd
from finn_eiendom.scoring import classify_ad, score_ad


def test_score_ad_and_classify():
    ad = FinnAd(
        finnkode="462400360",
        url="https://www.finn.no/realestate/homes/ad.html?finnkode=462400360",
        title="Flott 3-roms i Ferner",
    )
    unit = EiendomUnit(
        unit_code="c-gxw-xmyum-s2a",
        estimated_selling_price=7650000,
        listing_price=7200000,
        property_type="APARTMENT",
        usable_area=77,
        rooms=4,
    )
    scores = score_ad(ad, unit, [])
    assert scores["market_position"] >= 0
    categories = classify_ad(scores)
    assert isinstance(categories, list)
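The scoring test builds a unit whose `estimated_selling_price` (7,650,000) sits above its `listing_price` (7,200,000) and only checks that `market_position` is non-negative. For that fixture the asking price is (7,650,000 - 7,200,000) / 7,650,000 ≈ 5.88 % below the estimate. A plausible way to turn such a gap into a bounded score is sketched below; the linear mapping, the clamp to 0..100, and the `full_marks_at` threshold are all hypothetical and not taken from `scoring.py`.

```python
from typing import Optional


def price_gap_pct(estimated: Optional[int], listing: Optional[int]) -> Optional[float]:
    """Percent by which the asking price sits below the estimate (negative = above)."""
    if not estimated or listing is None:
        return None  # no estimate (or a zero one) means no meaningful gap
    return (estimated - listing) / estimated * 100


def market_position_score(gap_pct: Optional[float], full_marks_at: float = 10.0) -> float:
    """Map a price gap linearly onto 0..100; `full_marks_at` is a hypothetical knob."""
    if gap_pct is None:
        return 0.0
    return max(0.0, min(100.0, gap_pct / full_marks_at * 100))
```

Keeping the gap as `None` when the estimate is missing lets a caller tell "no data" apart from "priced exactly at the estimate", even though both end up with a score of 0 here.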
@@ -0,0 +1,38 @@
from finn_eiendom.search import extract_ad_links, extract_search_cards
from tests.fixtures import SAMPLE_FINN_SEARCH_HTML, SAMPLE_FINN_SEARCH_HTML_NEW


def test_extract_search_cards():
    cards = extract_search_cards(SAMPLE_FINN_SEARCH_HTML)
    assert len(cards) == 2
    assert cards[0].finnkode == "462400360"
    assert cards[0].url.endswith("finnkode=462400360")
    assert cards[0].area_m2 == 77
    assert cards[0].total_price == 7200991
    assert cards[0].common_costs == 3500
    assert cards[1].bedrooms == 2


def test_extract_search_cards_new_format():
    cards = extract_search_cards(SAMPLE_FINN_SEARCH_HTML_NEW)
    assert len(cards) == 1
    assert cards[0].finnkode == "462880791"
    assert cards[0].url.endswith("finnkode=462880791")
    assert cards[0].address == "Lofotgata 4B, Oslo"
    assert cards[0].area_m2 == 62
    assert cards[0].total_price == 7253377
    assert cards[0].common_costs == 7067
    assert cards[0].bedrooms == 2


def test_extract_ad_links():
    links = extract_ad_links(SAMPLE_FINN_SEARCH_HTML)
    assert len(links) == 2
    assert "finnkode=462400360" in links[0]
    assert "finnkode=460784945" in links[1]


def test_extract_ad_links_new_format():
    links = extract_ad_links(SAMPLE_FINN_SEARCH_HTML_NEW)
    assert len(links) == 1
    assert "finnkode=462880791" in links[0]
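`extract_ad_links` is expected to return one link per listing, in page order, for both the old and the new search markup. A markup-agnostic way to do that is to collect every anchor whose `href` carries a `finnkode` query parameter, as in this standard-library sketch. The class and function names are hypothetical, and deduplicating by finnkode is an assumption (a result card may link to the same ad from both its image and its title); the project's `search.py` may use a different parser entirely.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import parse_qs, urljoin, urlparse


class AdLinkCollector(HTMLParser):
    """Collect absolute ad URLs that carry a finnkode, first occurrence only."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self._seen: Set[str] = set()

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        finnkode = parse_qs(urlparse(url).query).get("finnkode", [None])[0]
        if finnkode and finnkode not in self._seen:
            self._seen.add(finnkode)
            self.links.append(url)


def extract_links(html: str, base_url: str = "https://www.finn.no") -> List[str]:
    collector = AdLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Keying on the query parameter rather than on CSS classes is what makes this kind of extractor survive a markup redesign, which is the situation the `_new_format` tests guard against.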
@@ -0,0 +1,97 @@
"""Tests for the service layer (cache-aware fetching)."""

from unittest.mock import patch

import pytest

from finn_eiendom.models import EiendomUnit, FinnAd
from finn_eiendom.service import get_or_fetch_ad, get_or_fetch_eiendom_unit


@pytest.mark.asyncio
async def test_get_or_fetch_ad_uses_cache():
    """Test that get_or_fetch_ad returns cached ad without fetching."""
    mock_ad = FinnAd(finnkode="123", url="http://example.com")
    with (
        patch("finn_eiendom.service.init_db"),
        patch("finn_eiendom.service.get_finn_ad", return_value=mock_ad) as mock_get,
        patch("finn_eiendom.service.fetch_ad_details") as mock_fetch,
    ):
        result = await get_or_fetch_ad("123")
    assert result.finnkode == "123"
    mock_get.assert_called_once()
    mock_fetch.assert_not_called()


@pytest.mark.asyncio
async def test_get_or_fetch_ad_fetches_when_cache_miss():
    """Test that get_or_fetch_ad fetches when cache is empty."""
    mock_ad = FinnAd(finnkode="123", url="http://example.com")
    with (
        patch("finn_eiendom.service.init_db"),
        patch("finn_eiendom.service.get_finn_ad", return_value=None),
        patch("finn_eiendom.service.fetch_ad_details", return_value=mock_ad) as mock_fetch,
        patch("finn_eiendom.service.save_finn_ad") as mock_save,
    ):
        result = await get_or_fetch_ad("123")
    assert result.finnkode == "123"
    mock_fetch.assert_called_once_with("123")
    mock_save.assert_called_once()


@pytest.mark.asyncio
async def test_get_or_fetch_ad_force_refresh():
    """Test that force_refresh=True bypasses cache."""
    mock_ad = FinnAd(finnkode="123", url="http://example.com")
    with (
        patch("finn_eiendom.service.init_db"),
        patch("finn_eiendom.service.get_finn_ad", return_value=mock_ad) as mock_get,
        patch("finn_eiendom.service.fetch_ad_details", return_value=mock_ad) as mock_fetch,
        patch("finn_eiendom.service.save_finn_ad") as mock_save,
    ):
        result = await get_or_fetch_ad("123", force_refresh=True)
    assert result.finnkode == "123"
    mock_get.assert_not_called()
    mock_fetch.assert_called_once_with("123")
    mock_save.assert_called_once()


@pytest.mark.asyncio
async def test_get_or_fetch_eiendom_unit_uses_cache():
    """Test that get_or_fetch_eiendom_unit returns cached unit without fetching."""
    mock_unit = EiendomUnit(unit_code="test-code")
    with (
        patch("finn_eiendom.service.init_db"),
        patch("finn_eiendom.service.get_cached_eiendom_unit", return_value=mock_unit) as mock_get,
        patch("finn_eiendom.service.get_unit") as mock_fetch,
    ):
        result = await get_or_fetch_eiendom_unit("test-code")
    assert result.unit_code == "test-code"
    mock_get.assert_called_once()
    mock_fetch.assert_not_called()


@pytest.mark.asyncio
async def test_get_or_fetch_eiendom_unit_fetches_when_cache_miss():
    """Test that get_or_fetch_eiendom_unit fetches when cache is empty."""
    mock_unit = EiendomUnit(unit_code="test-code")
    with (
        patch("finn_eiendom.service.init_db"),
        patch("finn_eiendom.service.get_cached_eiendom_unit", return_value=None),
        patch("finn_eiendom.service.get_unit", return_value=mock_unit) as mock_fetch,
        patch("finn_eiendom.service.save_eiendom_unit") as mock_save,
    ):
        result = await get_or_fetch_eiendom_unit("test-code")
    assert result.unit_code == "test-code"
    mock_fetch.assert_called_once_with("test-code")
    mock_save.assert_called_once()
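The service tests describe a cache-aside lookup: read the cache first, fetch and save on a miss, and skip the cache read entirely when `force_refresh=True`. The sketch below reproduces that control flow with an in-memory dict instead of the SQLite cache. `CachedLookup` and `fetch_count` are hypothetical; the dict plays the role of the cache functions the tests patch out (`get_finn_ad`, `save_finn_ad`), and the injected `fetch` plays the role of `fetch_ad_details`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Generic, Optional, TypeVar

V = TypeVar("V")


class CachedLookup(Generic[V]):
    """Look in the store first; fetch and store the value on a miss."""

    def __init__(self, fetch: Callable[[str], Awaitable[V]]) -> None:
        self._fetch = fetch
        self._store: Dict[str, V] = {}
        self.fetch_count = 0  # how many times the slow source was hit

    async def get(self, key: str, force_refresh: bool = False) -> V:
        # force_refresh skips the cache read entirely, mirroring the
        # `mock_get.assert_not_called()` expectation in the tests.
        if not force_refresh:
            cached: Optional[V] = self._store.get(key)
            if cached is not None:
                return cached
        value = await self._fetch(key)
        self.fetch_count += 1
        # Save after every fetch so the next plain call is a cache hit.
        self._store[key] = value
        return value
```

A real implementation would also honor the TTLs from `config.py` before trusting a cached row; that check is left out here to keep the control flow visible.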