finn-mcp/.github/instructions/tests.instructions.md

---
name: Test rules
description: Testing conventions for parser, cache, scoring, service, MCP, CLI, and architecture
applyTo: "tests/**/*.py"
---

# Test rules

## Runtime

Tests run in the project-local `.venv`. From the project root with the venv activated:

```bash
pytest                                 # full suite
pytest tests/test_service.py -v        # one file
pytest -k "shortlist"                  # one keyword
pytest --lf                            # rerun last failures
```

`pytest-asyncio` is configured under `[tool.pytest.ini_options]` with `asyncio_mode = "auto"`, so `async def` tests run without an `@pytest.mark.asyncio` decorator.

## Never do live network calls

No real HTTP in unit tests. Mock with `respx` (sits in front of `httpx.AsyncClient`):

```python
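import httpx
import respx

from finn_eiendom import http as http_module
from tests.fixtures import SAMPLE_FINN_SEARCH_HTML  # loader helper; import path is an assumption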

@respx.mock
async def test_finn_search_fetch_uses_user_agent():
    route = respx.get("https://www.finn.no/realestate/homes/search.html").mock(
        return_value=httpx.Response(200, html=SAMPLE_FINN_SEARCH_HTML)
    )
    client = http_module.HTTPClient(user_agent="test-agent")
    resp = await client.get("https://www.finn.no/realestate/homes/search.html")
    assert resp.status_code == 200
    assert route.calls.last.request.headers["user-agent"] == "test-agent"
```

## Fixtures

Fixture-driven testing for parsers and APIs:

* FINN search HTML → `tests/fixtures/finn_search.html`.
* FINN listing HTML → `tests/fixtures/finn_ad_*.html`.
* Eiendom.no unit search JSON → `tests/fixtures/eiendom_unit_search.json`.
* Eiendom.no unit detail JSON → `tests/fixtures/eiendom_unit_detail.json`.
* Eiendom.no similar-units JSON → `tests/fixtures/eiendom_similar.json`.

Loader helpers live in `tests/fixtures.py` (e.g. `SAMPLE_FINN_SEARCH_HTML`, `SAMPLE_EIENDOM_UNIT_JSON`). Add new fixtures there instead of inlining large strings in test files.

## Test layout

```
tests/
  fixtures/                # raw HTML / JSON inputs
  fixtures.py              # loader helpers
  conftest.py              # shared pytest fixtures (tmp DB, http client, etc.)
  test_parser.py           # number/area/date/URL/finnkode normalization
  test_search.py           # FINN search HTML → cards
  test_ad.py               # FINN listing HTML → FinnAd
  test_eiendom_no.py       # unit search/detail/similar JSON, unit_vector encode/decode
  test_scoring.py          # all scoring components + classifier
  test_cache.py            # SQLite read/write/TTL
  test_http.py             # retry on 5xx, raise on 4xx, delay applied     (new)
  test_service.py          # get_or_fetch_*, analyze_*                     (new)
  test_formatting.py       # render_* json/markdown/table                  (new)
  test_mcp_server.py       # tool registration + error envelope            (expanded)
  test_cli.py              # Typer CliRunner                               (new)
  test_architecture.py     # import-graph invariants                       (new)
```

## What to test per category

### Parsers (`test_parser`, `test_search`, `test_ad`, `test_eiendom_no`)

* Missing fields → `None`, not an exception.
* Norwegian number formats: `7 200 991 kr`, `kr 7 200 991`, `7.200.991`.
* URL normalization (relative → absolute).
* Finnkode extraction from various URL shapes.
* Area parsing: `77 m²`, `77m2`, `77 kvm`.
* Price parsing (asking vs total vs shared debt).
* Eiendom.no JSON edge cases: empty `units`, missing `valuation`, missing `latestMarketData`.
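
The number-format cases above lend themselves to a table-driven test. A minimal sketch follows; `parse_nok` is a hypothetical stand-in written only for this example, not the project's parser, so swap in the real function from the parser module.

```python
import re


def parse_nok(text):
    """Hypothetical stand-in parser: return the amount as int, or None."""
    if not text:
        return None
    digits = re.sub(r"\D", "", text)  # drop "kr", spaces, non-breaking spaces, dots
    return int(digits) if digits else None


# Every spelling of the same amount should normalize to the same integer.
PRICE_CASES = [
    ("7 200 991 kr", 7_200_991),
    ("kr 7 200 991", 7_200_991),
    ("7.200.991", 7_200_991),
    ("7\u00a0200\u00a0991 kr", 7_200_991),  # non-breaking spaces (assumed to occur in scraped HTML)
    ("", None),       # missing field -> None, not an exception
    ("Solgt", None),  # text without any digits
]


def test_parse_nok_handles_norwegian_formats():
    for raw, expected in PRICE_CASES:
        assert parse_nok(raw) == expected, raw
```

Under pytest the loop would usually become `@pytest.mark.parametrize`, so each case is reported separately.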

### Unit vectors (`test_eiendom_no`)

* msgpack encoding + base64url without padding.
* Decode roundtrip.
* Missing optional fields (floor, rooms, built).
* Both lon/lat orderings handled.
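
The base64url-without-padding case is easy to get wrong on decode, because the stripped `=` characters must be restored first. A minimal sketch with two loud assumptions: `json` stands in for `msgpack` so the snippet runs on the standard library alone, and `encode_vector`, `decode_vector`, and the field names are hypothetical, not the project's API.

```python
import base64
import json


def encode_vector(fields):
    # Assumption: json replaces msgpack here purely to avoid a dependency.
    raw = json.dumps(fields, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_vector(token):
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def test_unit_vector_roundtrip_without_padding():
    # Hypothetical fields; None models a missing optional value.
    fields = {"lat": 59.91, "lon": 10.75, "area": 77, "rooms": None}
    token = encode_vector(fields)
    assert "=" not in token
    assert decode_vector(token) == fields
```

Running the roundtrip over payloads of different lengths exercises all three padding remainders.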

### Scoring (`test_scoring`)

* Each component in isolation.
* Total clamped to 0–100.
* Risk penalties applied (negative range).
* Bargain classification triggers on the expected signal mix.
* Hybel classification: documented / possible / unclear / not relevant.
* Explainability: explanation list non-empty when score is non-trivial.
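
The clamping and penalty bullets can be pinned down with three small tests. This is only a sketch: `total_score` and the component names are hypothetical stand-ins, assuming the total is a plain sum of components clamped to 0–100.

```python
def total_score(components):
    """Hypothetical stand-in: sum the component scores, clamp to 0-100."""
    return max(0, min(100, sum(components.values())))


def test_total_is_clamped_to_upper_bound():
    assert total_score({"price": 80, "location": 70}) == 100


def test_risk_penalties_cannot_push_total_below_zero():
    assert total_score({"price": 10, "risk": -40}) == 0


def test_total_passes_through_inside_range():
    assert total_score({"price": 30, "location": 25, "risk": -5}) == 50
```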

### Cache (`test_cache`)

* Read after write returns same object.
* TTL expiry returns `None`.
* JSON roundtrip preserves all fields.
* `init_db` is idempotent on existing DBs.
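
A rough sketch of how these four checks might look against an in-memory database. The schema and the `put` / `get` helpers are assumptions made for this example, and `init_db` here is a stand-in for the real function. Passing `now` explicitly keeps the TTL check free of `time.sleep`.

```python
import json
import sqlite3


def init_db(conn):
    # IF NOT EXISTS is what makes repeated calls safe on an existing DB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
    )


def put(conn, key, value, now):
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (key, json.dumps(value), now),
    )


def get(conn, key, ttl, now):
    row = conn.execute(
        "SELECT payload, stored_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > ttl:
        return None  # unknown key or expired entry
    return json.loads(row[0])


def test_cache_roundtrip_ttl_and_idempotent_init():
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    put(conn, "ad:1", {"price": 7200991, "area": None}, now=1000.0)
    init_db(conn)  # second call must neither raise nor drop data
    assert get(conn, "ad:1", ttl=60, now=1030.0) == {"price": 7200991, "area": None}
    assert get(conn, "ad:1", ttl=60, now=1061.0) is None
```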

### HTTP (`test_http`)

* Retries on 500/502/503/504 with backoff (count exactly N retries).
* Raises immediately on 4xx (e.g. 404), with no retry.
* Applies `request_delay` between calls.
* Honors `user_agent`.
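
To make "count exactly N retries" concrete, here is a dependency-free sketch. `get_with_retry`, `make_sender`, and `HTTPStatusError` are hypothetical stand-ins for the client's retry loop, not the project's `HTTPClient`; the point is only the shape of the assertions.

```python
RETRYABLE = {500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type for this sketch."""


def get_with_retry(send, url, max_retries=3, sleep=lambda seconds: None):
    """Stand-in retry loop: `send(url)` returns a status code.

    `sleep` is injectable so tests can record delays instead of waiting.
    """
    for attempt in range(max_retries + 1):
        status = send(url)
        if status < 400:
            return status
        if status not in RETRYABLE or attempt == max_retries:
            raise HTTPStatusError(status)
        sleep(2 ** attempt)  # exponential backoff: 1, 2, 4, ...


def make_sender(statuses):
    """Fake transport that replays `statuses` and records each call."""
    calls = []

    def send(url):
        calls.append(url)
        return statuses[len(calls) - 1]

    return send, calls


def test_retries_5xx_then_succeeds():
    send, calls = make_sender([503, 502, 200])
    delays = []
    assert get_with_retry(send, "https://example.invalid/", sleep=delays.append) == 200
    assert len(calls) == 3
    assert delays == [1, 2]


def test_404_raises_without_retry():
    send, calls = make_sender([404, 200])
    try:
        get_with_retry(send, "https://example.invalid/")
    except HTTPStatusError:
        pass
    else:
        raise AssertionError("expected HTTPStatusError")
    assert len(calls) == 1
```

In the real suite the same counts would presumably be asserted against `respx` routes (for example via `route.call_count`) rather than a hand-rolled sender.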

### Service (`test_service`)

The service tests are the heart of the suite. They cover orchestration end-to-end against fixtures.

* `test_get_or_fetch_ad_uses_cache` — second call hits cache, no HTTP.
* `test_get_or_fetch_ad_fetches_when_cache_miss` — first call hits HTTP, then writes cache.
* `test_get_or_fetch_ad_force_refresh` — `force_refresh=True` bypasses cache.
* `test_analyze_search_with_fixtures` — full run from search HTML → shortlist.
* `test_find_similar_to_liked_uses_liked_feedback` — only seeds from `liked` verdicts.

Use a tmp SQLite DB via the `tmp_path` pytest fixture:

```python
import pytest

@pytest.fixture
def tmp_db(tmp_path, monkeypatch):
    db_path = tmp_path / "finn.sqlite"
    monkeypatch.setenv("FINN_CACHE_PATH", str(db_path))
    return db_path
```

### Formatting (`test_formatting`)

* `render_shortlist(result, "json")` is parseable JSON and roundtrips.
* `render_shortlist(result, "markdown")` contains the score and at least one risk.
* `render_<thing>(result, "xml")` raises `ValueError` listing supported formats.
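
A sketch of the three checks above. The `render_shortlist` below is a hypothetical stand-in (it supports only two formats for brevity), and the result shape is an assumption, not the project's model.

```python
import json

SUPPORTED = ("json", "markdown")  # the stand-in omits "table" for brevity


def render_shortlist(result, fmt):
    """Hypothetical stand-in renderer, not the project's implementation."""
    if fmt == "json":
        return json.dumps(result)
    if fmt == "markdown":
        risks = "\n".join(f"- {risk}" for risk in result["risks"])
        return f"**Score: {result['score']}**\n\n{risks}"
    raise ValueError(f"unsupported format {fmt!r}; supported: {', '.join(SUPPORTED)}")


RESULT = {"score": 72, "risks": ["high shared debt"]}  # assumed result shape


def test_json_output_roundtrips():
    assert json.loads(render_shortlist(RESULT, "json")) == RESULT


def test_markdown_contains_score_and_a_risk():
    text = render_shortlist(RESULT, "markdown")
    assert "72" in text and RESULT["risks"][0] in text


def test_unknown_format_lists_supported_formats():
    try:
        render_shortlist(RESULT, "xml")
    except ValueError as exc:
        assert all(name in str(exc) for name in SUPPORTED)
    else:
        raise AssertionError("expected ValueError")
```

With pytest available, the last test would more idiomatically use `pytest.raises(ValueError)`.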

### MCP (`test_mcp_server`)

* `test_mcp_server_has_correct_tools` — all 14 `finn_*` tool names registered.
* `test_finn_decode_unit_vector_returns_json` — happy path.
* `test_finn_analyze_search_handles_error` — error envelope shape: `{"error": True, "code": ..., "message": ...}`.

Use the `mcp` SDK's testing helpers; don't spawn a subprocess.

### CLI (`test_cli`)

Use Typer's `CliRunner`:

```python
from typer.testing import CliRunner
from finn_eiendom.cli import app

runner = CliRunner()

def test_cli_help():
    result = runner.invoke(app, ["--help"])
    assert result.exit_code == 0
    assert "analyze-search" in result.stdout
```

Patch `service.<function>` with `monkeypatch` so CLI tests don't exercise the full stack — that's covered by `test_service.py`.

### Architecture (`test_architecture`)

Static checks of the module dependency graph:

* No `import httpx` outside `finn_eiendom/http.py`.
* No `import sqlite3` outside `finn_eiendom/cache.py`.
* No `BeautifulSoup` import outside `search.py` and `ad.py`.
* No `msgpack` import outside `eiendom_no.py`.
* `mcp_server.py` only imports from `service`, `formatting`, `models`, `config`, `mcp`, stdlib, `pydantic`.
* `cli.py` only imports from `service`, `formatting`, `models`, `config`, `typer`, stdlib.
* `service.py` does not import from `mcp_server` or `cli`.

Implementation: walk `.py` files under `finn_eiendom/` with `ast`, collect imports, assert allowed sets per module.
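
One way the `ast` walk could look, as a sketch rather than the project's actual test. `imported_modules` and `forbidden_imports` are hypothetical helper names, and the sources are passed in as a dict so the example runs without touching the file system.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported anywhere in `source`.

    Relative imports (`from . import x`) are skipped in this sketch, so it
    covers the third-party rules only, not the intra-package ones.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def forbidden_imports(sources, module, allowed_files):
    """List files that import `module` but are not in `allowed_files`."""
    return sorted(
        name
        for name, source in sources.items()
        if module in imported_modules(source) and name not in allowed_files
    )


def test_httpx_only_imported_in_http_module():
    # In-memory sources for illustration; "service.py" violates the rule on purpose.
    sources = {
        "http.py": "import httpx\n",
        "cache.py": "import sqlite3\n",
        "service.py": "import json\nfrom httpx import AsyncClient\n",
    }
    assert forbidden_imports(sources, "httpx", {"http.py"}) == ["service.py"]
    assert forbidden_imports(sources, "sqlite3", {"cache.py"}) == []
```

A real test would presumably build `sources` by reading each file from `Path("finn_eiendom").rglob("*.py")` and assert that every forbidden list is empty.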

## Best practices

* One assertion per test (or per closely related group). Long tests die in painful ways.
* Test names describe the behavior: `test_get_or_fetch_ad_uses_cache_within_ttl`.
* Use `monkeypatch` for env vars and `tmp_path` for files. No `os.environ` mutation.
* No `time.sleep` — use `freezegun` if a test depends on time, or refactor the code under test to take a `now` parameter.
* No "smoke tests" that ping real servers in the default suite. Those belong in a separately marked suite run with `pytest -m live` and are not part of CI.

## When uncertain about test tooling

Use `context7` for pytest, respx, freezegun, or Typer testing:

```
context7:resolve-library-id   →  "pytest-dev/pytest" / "lundberg/respx"
context7:query-docs(id, "respx mock httpx async post")
```

See `docs.instructions.md`.