Ole
2026-05-16 06:54:17 +00:00
commit 1399f61c1a
44 changed files with 6746 additions and 0 deletions
@@ -0,0 +1,181 @@
# Copilot instructions for finn-eiendom-mcp
This project is a private, self-hosted Python platform for analyzing FINN real-estate listings. It exposes the same code through three coordinated surfaces: a library and two front ends built on top of it.
1. A **Python library** (`finn_eiendom`) — source of truth.
2. An **MCP server** (FastMCP, stdio + optional HTTP) over `finn_eiendom/mcp_server.py`.
3. A **CLI** (`finn-eiendom`) over `finn_eiendom/cli.py`.
All three share the same `service.py`, `formatting.py`, `cache.py`, and `models.py`: each piece of logic lives in exactly one place and is called from both front ends (the MCP server and the CLI). See `PRD.md` §17 for the full ownership rules; that section is the constitution.
---
## Source of truth
Read in this order:
1. `PRD.md` — product and architecture, especially §17.
2. `PROJECT.md` — module map.
3. `AGENTS.md` — workflow.
4. `.github/instructions/*.md` — per-topic rules.
---
## Module layout
```
finn_eiendom/
config.py # env vars, defaults, TTLs
models.py # Pydantic v2 models
parser.py # number/area/date/URL/finnkode normalization
http.py # async HTTP (httpx) with delay + retry + user-agent
cache.py # SQLite (sqlite3) schema + persistence
search.py # FINN search HTML parsing + pagination
ad.py # FINN listing HTML parsing
eiendom_no.py # Eiendom.no unit search/detail, unit_vector, similar-units
scoring.py # score model + classifications
feedback.py # verdicts + soft preference signal
analysis.py # orchestration + shortlist + summary
service.py # get_or_fetch_* + thin facade for MCP and CLI
formatting.py # render_* helpers shared by MCP and CLI
mcp_server.py # FastMCP wrappers around service.py
cli.py # typer-based CLI wrappers around service.py
__main__.py # python -m finn_eiendom → CLI entry
```
---
## The five hard rules
Enforced by `tests/test_architecture.py`:
1. **`mcp_server.py` and `cli.py` are siblings.** They never import from each other. Both import only from `service`, `formatting`, `models`, `config`, stdlib, and their own framework (`mcp` / `typer`).
2. **`service.py` is the only orchestrator.** Nothing above it touches HTTP or SQLite directly.
3. **`httpx` lives only in `http.py`.**
4. **`sqlite3` lives only in `cache.py`.**
5. **Output formatting lives only in `formatting.py`.** Never inline in CLI or MCP tool bodies.
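The contents of `tests/test_architecture.py` are not shown here, but import rules like 1, 3, and 4 can be checked statically. The following is a minimal sketch using the stdlib `ast` module; the `FORBIDDEN` mapping and the helper names `imported_names` and `violations` are hypothetical, not the project's actual test code.

```python
import ast

# Hypothetical encoding of rules 1, 3 and 4: file name -> names it must not import.
FORBIDDEN: dict[str, set[str]] = {
    "mcp_server.py": {"cli", "httpx", "sqlite3"},
    "cli.py": {"mcp_server", "httpx", "sqlite3"},
    "service.py": {"httpx", "sqlite3"},
}


def imported_names(source: str) -> set[str]:
    """Collect every dotted segment and imported name from import statements."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.update(alias.name.split("."))
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # None for "from . import x"
                names.update(node.module.split("."))
            names.update(alias.name for alias in node.names)
    return names


def violations(filename: str, source: str) -> set[str]:
    """Return the forbidden names that `source` imports, if any."""
    return imported_names(source) & FORBIDDEN.get(filename, set())
```

A real test would presumably read each file under `finn_eiendom/` and assert that `violations(...)` is empty. The check is deliberately strict: it also flags an imported symbol that merely shares a forbidden name.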
---
## Development workflow — local venv
Default runtime is a project-local virtualenv. Docker is supported for packaging but optional for development.
```bash
uv venv # or: python3.12 -m venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" # or: pip install -e ".[dev]"
# from now on:
pytest
ruff check .
ruff format .
mypy finn_eiendom
finn-eiendom --help
finn-eiendom-mcp # stdio MCP server
```
**Never** install packages globally. **Never** add a dependency without updating `pyproject.toml`.
---
## Coding rules
* Python 3.12+.
* Pydantic v2 with `model_config = ConfigDict(...)`. No v1 `class Config:` blocks.
* Type hints on every function signature.
* Async I/O for every network and DB code path that runs through `service.py`.
* Dependency injection for HTTP/cache clients in tests.
* Small, focused functions. One job per function. See `clean-code.instructions.md`.
* Errors raise with actionable messages; the MCP boundary translates them to `{"error": True, "code": ..., "message": ...}`.
* stdio MCP servers log to **stderr only**.
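The error-translation rule above can be sketched as a decorator. This is an illustration under assumptions, not the project's implementation: `ServiceError`, `translate_errors`, and `get_listing` are hypothetical names, and how such a wrapper composes with FastMCP's own tool decorator is not covered here.

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable
from typing import Any


class ServiceError(Exception):
    """Hypothetical base error carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def translate_errors(
    tool: Callable[..., Awaitable[dict[str, Any]]],
) -> Callable[..., Awaitable[dict[str, Any]]]:
    """Wrap an async tool so a ServiceError becomes an error payload."""

    @functools.wraps(tool)
    async def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return await tool(*args, **kwargs)
        except ServiceError as exc:
            return {"error": True, "code": exc.code, "message": str(exc)}

    return wrapper


@translate_errors
async def get_listing(finnkode: str) -> dict[str, Any]:
    # Stand-in for a call into service.py.
    if not finnkode.isdigit():
        raise ServiceError(
            "invalid_finnkode", f"finnkode must be numeric, got {finnkode!r}"
        )
    return {"finnkode": finnkode}


if __name__ == "__main__":
    print(asyncio.run(get_listing("abc")))
```

Only `ServiceError` is translated in this sketch; unexpected exceptions still propagate, which keeps genuine bugs visible.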
---
## Code ownership — the short version
| Concern | Lives in |
| -------------------------------------- | ------------------------------ |
| FINN search HTML parsing | `search.py` |
| FINN listing HTML parsing | `ad.py` |
| Norwegian number / area / URL regexes | `parser.py` |
| HTTP fetching + retry + delay | `http.py` |
| SQLite reads / writes | `cache.py` |
| Eiendom.no unit search/detail/comps | `eiendom_no.py` |
| `unit_vector` encode/decode (msgpack) | `eiendom_no.py` |
| Scoring + classification | `scoring.py` |
| Feedback storage | `feedback.py` |
| Cache-aware orchestration | `service.py` (`get_or_fetch_*`)|
| Shortlist + summary assembly | `analysis.py` |
| End-to-end runs | `service.py` (`analyze_search`)|
| MCP tool definitions | `mcp_server.py` |
| CLI command definitions | `cli.py` |
| Output rendering | `formatting.py` |
| Env-var defaults | `config.py` |
| Pydantic models | `models.py` |
The full table, including a "never lives in" column, is in `PRD.md` §17.2.
---
## Adding a feature
1. Decide the home using the table above (and `PRD.md` §17.2).
2. Implement in `service.py` (or `analysis.py` if pure orchestration).
3. Add a service-level test.
4. Add a thin MCP tool — `response_format`-aware.
5. Add a thin CLI command — `--format`-aware.
6. Add a renderer in `formatting.py`.
7. Test MCP and CLI registration.
8. Update PRD and instruction docs.
If the MCP tool body or CLI command body grows past ~20 lines, push logic down to `service.py`.
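The steps above can be sketched as follows. All names are hypothetical, plain functions stand in for the FastMCP tool and the Typer command, and a dataclass stands in for a Pydantic model so the example stays stdlib-only. It shows how thin both wrappers become once the service function and the renderer exist.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal

Format = Literal["markdown", "json"]


@dataclass(frozen=True)
class ListingSummary:
    finnkode: str
    title: str
    price_nok: int


def summarize_listing(finnkode: str) -> ListingSummary:
    """Stand-in for a service.py function; returns hard-coded sample data."""
    return ListingSummary(finnkode=finnkode, title="Sample listing", price_nok=4_500_000)


def render_listing(summary: ListingSummary, fmt: Format) -> str:
    """Stand-in for a formatting.py renderer shared by both front ends."""
    if fmt == "json":
        return json.dumps(asdict(summary), ensure_ascii=False)
    return f"**{summary.title}** ({summary.finnkode}): {summary.price_nok:,} NOK"


def mcp_listing_tool(finnkode: str, response_format: Format = "markdown") -> str:
    """What an MCP tool body reduces to: call the service, then render."""
    return render_listing(summarize_listing(finnkode), response_format)


def cli_listing_command(finnkode: str, fmt: Format = "markdown") -> str:
    """What a CLI command body reduces to: the same two calls."""
    return render_listing(summarize_listing(finnkode), fmt)
```

Both wrappers produce identical output for the same input, which makes it cheap to assert MCP/CLI parity in tests.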
---
## Documentation lookups — use context7
When uncertain about an external library API (FastMCP, Pydantic v2, Typer, httpx, msgpack, pytest-asyncio, respx, BeautifulSoup), call the **`context7` MCP server** *before* writing code. Don't rely on training-data memory.
```
context7:resolve-library-id → library_id
context7:query-docs(library_id, topic) → authoritative snippets
```
Details in `.github/instructions/docs.instructions.md`.
---
## Clean code is a hard requirement
See `clean-code.instructions.md`. DRY, single-responsibility, descriptive names, type hints, no dead code, and comments that explain why, not what. If duplication slips in, the right answer is to extract it, not to keep a second copy.
---
## Product behavior
The MVP does one thing well:
```
FINN search URL in
→ relevant property candidates out
→ enriched with Eiendom.no estimates
→ similar-units / comps
→ explanations
→ risks
→ next steps
→ broker questions
```
Always explain:
* why a property is interesting,
* price vs estimate,
* price vs comparable sales,
* renovation upside,
* hybel / rental potential,
* technical / legal risks,
* uncertainty / confidence,
* next questions for the broker.
Scores and estimates are decision support, not advice. Surface uncertainty, never hide it.
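As one illustration of "price vs estimate" with uncertainty stated explicitly, a helper along these lines could produce the explanation text. This is a sketch under assumptions: the function name, the wording, and the representation of confidence as a number between 0 and 1 are all hypothetical and not taken from `scoring.py` or `formatting.py`.

```python
def price_vs_estimate(asking_nok: int, estimate_nok: int, confidence: float) -> str:
    """Describe the asking price relative to an estimate, always stating confidence."""
    if estimate_nok <= 0:
        raise ValueError("estimate_nok must be positive")
    delta_pct = (asking_nok - estimate_nok) / estimate_nok * 100
    if delta_pct > 0:
        relation = f"{delta_pct:.1f}% above"
    elif delta_pct < 0:
        relation = f"{-delta_pct:.1f}% below"
    else:
        relation = "equal to"
    return f"Asking price is {relation} the estimate (confidence {confidence:.2f})."


if __name__ == "__main__":
    print(price_vs_estimate(4_500_000, 5_000_000, 0.6))
```

The confidence value is part of the sentence itself rather than an optional field, so a caller cannot render the comparison while leaving the uncertainty out.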