A data toolkit for Icelandic public data, built on Claude Code skills

February 27, 2026

I've been building a repo that teaches Claude Code about Icelandic public data sources. Not a library, not a framework — a collection of skill files, Python scripts, and accumulated knowledge about how to extract data from official APIs that were never designed to be easy.

The repo is icelandic-data. Each data source gets a skill file (.claude/skills/{source}.md) that documents API endpoints, series codes, encoding quirks, and classification gotchas. Alongside it, a Python script fetches and cleans the data. Clone the repo, run setup.sh, and Claude Code can query 16+ Icelandic data sources.

Why skills, not notebooks

The traditional approach to data work is Jupyter notebooks or R scripts — sequential, manual, hard to reuse. The skill-based approach is different: you document the data source once, then interact with it conversationally. "What's the Central Bank policy rate history?" doesn't require remembering which API endpoint to call or how the SDMX response is structured. The skill file knows.

This makes the repo useful for research that crosses data sources. Property prices from HMS, interest rates from the Central Bank, building permits from Planitor, company financials from the Tax Authority — joining these used to mean a week of API spelunking. Now it's a conversation.

What it actually produces

The outputs vary. Sometimes it's a quick DuckDB query piped to a gist. Sometimes it's a self-contained HTML report with Chart.js charts that I export to PDF for sharing. A few examples of what the toolkit has produced:

Fuel market oligopoly analysis — Three conglomerates control Iceland's fuel retail. By combining Gasvaktin's daily price scrapes (10 years of data), annual reports from skatturinn.is (extracted from PDFs via Docling), and Nordic comparisons from public filings, the report showed Icelandic fuel retailers extracting 4-6x the gross margin per liter compared to Nordic peers. The price spread between competitors is 5-10 ISK/L — textbook parallel pricing.
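To make the spread figure concrete, here is a minimal sketch of the calculation: for each day, take the highest and lowest price across companies and subtract. The prices below are invented for illustration (they are not Gasvaktin data), and treating each company as having one price per day is a simplifying assumption.

```python
from statistics import mean

# Hypothetical daily pump prices in ISK per liter, keyed by company code.
# These numbers are made up for illustration, not taken from Gasvaktin.
daily_prices = {
    "2025-01-06": {"n1": 312.9, "ol": 310.4, "or": 305.7, "ao": 304.9},
    "2025-01-07": {"n1": 313.9, "ol": 311.4, "or": 306.7, "ao": 305.9},
    "2025-01-08": {"n1": 313.9, "ol": 311.9, "or": 307.2, "ao": 305.9},
}


def daily_spread(prices_by_company):
    """Return the gap between the most and least expensive company."""
    values = prices_by_company.values()
    return max(values) - min(values)


# Spread per day, rounded to one decimal to hide float noise.
spreads = {day: round(daily_spread(p), 1) for day, p in daily_prices.items()}
average_spread = round(mean(spreads.values()), 1)
```

A spread that stays narrow and stable while the price level moves is the pattern the report calls parallel pricing. The sketch only measures the spread; it says nothing about why it is narrow.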

Insurance margin comparison — Four companies, combined ratios of 89-99% vs Nordic peers at 81-86%. The interesting finding: the gap isn't in claims (loss ratios are similar) but in expense ratios and underwriting discipline. Profits come from investment income, not insurance. When markets dip, underwriting losses surface immediately. This required pulling IFRS 17 figures from PDF annual reports, which Docling handles well.
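For readers who don't work with insurance numbers: the combined ratio is the loss ratio plus the expense ratio, each measured against premiums, and anything under 100% is an underwriting profit. A small sketch of that arithmetic follows. The inputs are made-up figures chosen only to land inside the ranges quoted above; they are not any company's reported numbers.

```python
def combined_ratio(claims, expenses, premiums):
    """Return (loss_ratio, expense_ratio, combined_ratio) in percent."""
    loss = 100 * claims / premiums
    expense = 100 * expenses / premiums
    return loss, expense, loss + expense


# Hypothetical inputs: similar claims, very different expenses.
icelandic = combined_ratio(claims=70.0, expenses=25.0, premiums=100.0)
nordic = combined_ratio(claims=69.0, expenses=15.0, premiums=100.0)

# Underwriting margin is whatever is left below 100%.
icelandic_margin = 100 - icelandic[2]
nordic_margin = 100 - nordic[2]
```

With these illustrative inputs the loss ratios differ by one point while the combined ratios differ by eleven, which is the shape of the gap described above.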

Reykjavik dust and PM10 — Every spring, studded tires grind asphalt into breathable dust. The report joined air quality monitoring data (57 stations, hourly readings via UST's API), vehicle registration data from Samgöngustofa, and Reykjavik's own studded tire usage surveys. Oslo charges a daily fee for studded tires, and Stockholm saw 20% PM10 drops from bans. Reykjavik's dust binding program ran out of supplies in February 2025.

Winter services procurement — Who does Reykjavik pay to clear snow? By querying Opin Fjármál (vendor-level municipal spending, ~94k rows/year), the report tracked outsourcing patterns, cost trends, and which contractors dominate.

The hard parts

Icelandic data sources have character. Statistics Iceland uses PX-Web with Icelandic variable names. The Central Bank speaks SDMX. Reykjavik has both a CKAN portal and a PX-Web instance. The Tax Authority requires Playwright to navigate — no API, just a website with session state.
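As an illustration of what a skill file saves you from remembering, here is a rough sketch of a PX-Web query using only the standard library. PX-Web tables are queried by POSTing a JSON body that lists variable codes and the value codes you want. The variable codes in the example (`Ár`, `Kyn`) and the `table_url` argument are placeholders I made up; the real codes come from each table's metadata, and this is not the repo's actual fetch script.

```python
import json
import urllib.request


def build_pxweb_query(selections, response_format="json"):
    """Build a PX-Web query body from {variable_code: [value_codes]}."""
    return {
        "query": [
            {"code": code, "selection": {"filter": "item", "values": list(values)}}
            for code, values in selections.items()
        ],
        "response": {"format": response_format},
    }


def fetch_pxweb(table_url, selections):
    """POST the query to a table endpoint and return the decoded JSON."""
    body = json.dumps(build_pxweb_query(selections)).encode("utf-8")
    request = urllib.request.Request(
        table_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # utf-8-sig tolerates a leading byte order mark if the server sends one.
        return json.loads(response.read().decode("utf-8-sig"))


# Hypothetical variable codes; note that they can be Icelandic words.
payload = build_pxweb_query({"Ár": ["2023", "2024"], "Kyn": ["0"]})
```

Calling `fetch_pxweb` needs the full API URL of a specific table, which is exactly the kind of detail the skill file records.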

PDF annual reports are the worst and most valuable source. Docling (IBM's extraction library) handles table extraction well, but financial statements need interpretation — which line is revenue, which is EBITDA, what's the fiscal year. The financials skill chains Docling extraction with Claude interpretation to produce structured JSON.

Some quirks that the skills capture: skatturinn.is lists two reports per year for public companies (parent and consolidated). Hagar uses a February-January fiscal year. VÍS consolidated data lives under the "Skagi" entity, not the operating company. Gasvaktin's Git repo uses company codes (n1, ol, or, ao) that don't match anything official.
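Quirks like these are easy to encode once they are written down. The sketch below shows one way to do it: a helper that maps a date to a fiscal year given the month the year starts in, and a lookup for companies whose consolidated figures are filed under a different entity. The function name, the lookup table, and the convention of labelling a fiscal year by the calendar year it ends in are all my assumptions for illustration, not the repo's code.

```python
from datetime import date

# Consolidated figures may be filed under a different entity name.
# The single entry here comes from the quirk described above.
REPORTING_ENTITY = {"VÍS": "Skagi"}


def fiscal_year(day, start_month=1):
    """Label the fiscal year containing `day` by the calendar year it ends in."""
    if start_month == 1:
        return day.year  # fiscal year equals calendar year
    return day.year + 1 if day.month >= start_month else day.year


def reporting_entity(company):
    """Return the entity to search for when pulling consolidated reports."""
    return REPORTING_ENTITY.get(company, company)


# A fiscal year running February to January, as described for Hagar.
january = fiscal_year(date(2024, 1, 15), start_month=2)
february = fiscal_year(date(2024, 2, 1), start_month=2)
```

Here a date in January 2024 still belongs to the year that started in February 2023, so it is labelled 2024, while 1 February 2024 opens the year labelled 2025.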

Not a portable skill library

The agent skills ecosystem is moving toward distributable, standalone skills: Cloudflare publishes theirs at cloudflare/skills, and there's a formal spec at agentskills.io. My repo isn't that. The skills reference co-located scripts, assume uv and duckdb are installed, and expect processed data to exist locally. It's a toolkit, not a package. The skills and scripts work as a unit.

This is fine. Not everything needs to be npm-installable. Sometimes the right unit of sharing is "clone the repo."

Setup

git clone https://github.com/jokull/icelandic-data
cd icelandic-data
./setup.sh  # installs jq, duckdb, uv
uv sync     # Python dependencies

Then open it in Claude Code and start asking questions.
