New · Forge Suite · v0.1

ForgeData

The data-intelligence layer of the Forge suite

Ask for geospatial data by intent and get back the best-fit dataset, ranked by coverage and resolution with access cost as the deciding tiebreaker, across thirteen heterogeneous source types. ForgeGIS computes, ForgeMind reasons, ForgeGIS Studio renders; ForgeData decides which data to use and how to get it.

13 → 1
heterogeneous source types behind one query interface
49
MCP tools over stdio JSON-RPC — the same surface ForgeMind and Studio consume
47
spatial operations bound to data requirements in the routing registry
5
access tiers, cheapest to most expensive — self-calibrating on measured latency

What it is, why it matters, how it's built differently

Three short answers for the technical evaluator.

What it is

ForgeData sits beneath the suite and owns the cross-cutting problem of data discovery and access. It keeps one catalog — what you can query and the recipes that refresh it — and routes every request to the best-fit dataset across local files and remote services alike, with no hardcoded paths and no per-project glue.

Why it matters

Most platforms make the customer build the routing layer themselves: bespoke catalog tables, hand-maintained config, hardcoded paths nobody wants to own. ForgeData ships that layer in the box, so the suite works against a realistic, scattered data landscape on day one rather than after an integration project.

How it's built differently

The catalog is the configuration — the data-source recipes live in the same GeoPackage file as the inventory, so nothing drifts out of sync, and API keys stay in the environment, never the file. Routing is fidelity-first: coverage, then resolution, then extent-fit, with access cost as the deciding tiebreaker.

Cost-aware routing

ForgeData's core differentiator. Name an operation and an area of interest, and find_optimal_for_operation returns a ranked, costed, deduplicated list of catalog entries that can serve it — with no hardcoded paths.

How candidates are ranked

  1. AOI coverage — descending. The dataset covering more of your bbox wins.
  2. Resolution — ascending. When coverage ties, finer pixels win.
  3. Extent-fit — descending. The footprint closest to the AOI size wins.
  4. Access tier — final tiebreaker only. Locality breaks an otherwise-perfect tie; it never overrides coverage or resolution.
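The four rules above amount to a lexicographic sort. The sketch below is one way to express that ordering, not ForgeData's actual implementation: the `Candidate` class, its field names, and the sample entries are all hypothetical, and ties are treated as exact equality for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical catalog entry; field names are illustrative only."""
    name: str
    coverage: float      # fraction of the AOI bbox covered, 0.0 to 1.0
    resolution_m: float  # pixel size in metres; smaller is finer
    extent_fit: float    # 0.0 to 1.0; higher means footprint is closer to AOI size
    tier_rank: int       # 0 = cheapest access tier


def rank(candidates):
    """Sort fidelity-first; the access tier only breaks otherwise-exact ties."""
    return sorted(
        candidates,
        key=lambda c: (-c.coverage, c.resolution_m, -c.extent_fit, c.tier_rank),
    )


candidates = [
    Candidate("local_30m", 1.0, 30.0, 0.4, 1),
    Candidate("remote_10m", 1.0, 10.0, 0.4, 3),
    Candidate("local_10m_partial", 0.8, 10.0, 0.9, 0),
    Candidate("local_10m", 1.0, 10.0, 0.4, 1),
]
ranked = [c.name for c in rank(candidates)]
```

In this sample the cheapest entry, `local_10m_partial`, ranks last because it covers less of the AOI, while the tier decides only between the two full-coverage 10 m entries.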

Five access tiers, cheapest first

  • LOCAL_HOT: Recently loaded; warm in ForgeGIS CPU/GPU memory.
  • LOCAL_COLD: On-disk local file; fast, but needs I/O.
  • REMOTE_DB: PostGIS; fast point lookup, costlier windowed reads.
  • REMOTE_COG: HTTP / S3 range-queryable raster.
  • REMOTE_WCS: OGC WCS and other on-demand HTTP services.

After the first read of a remote dataset, ForgeData stores the measured wall-clock latency and uses it in place of the static cost formula, so the cost signal sharpens the longer the suite runs.
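A minimal sketch of that self-calibration, assuming a per-dataset latency table that overrides a static per-tier estimate. The tier names come from the list above; the millisecond values, the `CostModel` class, and its method names are invented for illustration and do not reflect the real formula.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """The five tiers, ordered cheapest first."""
    LOCAL_HOT = 0
    LOCAL_COLD = 1
    REMOTE_DB = 2
    REMOTE_COG = 3
    REMOTE_WCS = 4


# Hypothetical static estimates in milliseconds (placeholder values).
STATIC_COST_MS = {
    AccessTier.LOCAL_HOT: 1.0,
    AccessTier.LOCAL_COLD: 20.0,
    AccessTier.REMOTE_DB: 80.0,
    AccessTier.REMOTE_COG: 250.0,
    AccessTier.REMOTE_WCS: 900.0,
}


class CostModel:
    """Prefers a measured latency over the static estimate once one exists."""

    def __init__(self):
        self._measured_ms = {}

    def record(self, dataset_id, latency_ms):
        # Called after a read completes, with the observed wall-clock latency.
        self._measured_ms[dataset_id] = latency_ms

    def cost(self, dataset_id, tier):
        # Fall back to the static per-tier estimate until a measurement exists.
        return self._measured_ms.get(dataset_id, STATIC_COST_MS[tier])


model = CostModel()
before = model.cost("sentinel2_scene", AccessTier.REMOTE_COG)
model.record("sentinel2_scene", 140.0)
after = model.cost("sentinel2_scene", AccessTier.REMOTE_COG)
```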

Thirteen sources, one interface

Thirteen source-type implementations cover the common public and private geospatial sources — all addressable through one uniform query interface. The customer's environment can be a mix; the caller sees one ranked answer.

LOCAL_DIR
A directory of tiles, vectors, or point clouds; optional filesystem-watch auto-pickup.
STAC
SpatioTemporal Asset Catalog collections (e.g. Sentinel-2 on S3).
WCS
OGC Web Coverage Service — on-demand windowed raster.
WFS
OGC Web Feature Service — GeoJSON at read time.
POSTGIS
PostGIS vector + raster with server-side spatial filtering.
ARCGIS_REST
Esri FeatureServer / ImageServer layers.
OVERPASS
OpenStreetMap features via the Overpass API (tile-grid).
OPENTOPO_API
OpenTopography elevation (tile-grid; daily rate limits).
NOAA_API
NOAA datasets (tile-grid).
COPERNICUS_CDS
Copernicus Climate Data Store (single-request, hash-named output).
TIGER_ROADS
TIGER/Line All Roads — named-road lookup and bbox queries.
TIGER_ADDRFEAT
TIGER/Line address-range edges — address geocoding.
TERRARIUM
Mapzen Terrarium terrain tiles (no-API-key elevation bootstrap).

What makes the answers trustworthy

The same correctness discipline the rest of the Forge suite is built on, applied to data.

Configuration as data

One portable catalog file

All state — datasets and the recipes that refresh them — lives in one standard GeoPackage. No companion sources.yml to drift out of sync, and no secrets baked in: the catalog stores only the environment-variable name that holds a key. Safe to copy, ship to a field laptop or air-gapped site, or commit — and readable in QGIS or DB Browser.
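To illustrate the "name, not secret" idea, the sketch below resolves a key at read time from the environment. The `api_key_env` field and the recipe dictionary shape are assumptions made for this example, not the catalog's actual schema.

```python
import os


def resolve_api_key(recipe):
    """Look up a key by the environment-variable name stored in the recipe.

    `recipe` is a hypothetical dict; only the variable NAME is stored in it,
    never the secret itself.
    """
    env_name = recipe.get("api_key_env")
    if env_name is None:
        return None  # source needs no key
    key = os.environ.get(env_name)
    if key is None:
        raise KeyError(f"environment variable {env_name} is not set")
    return key


# The recipe is safe to copy or commit: it contains no secret.
recipe = {"source_type": "OPENTOPO_API", "api_key_env": "OPENTOPO_API_KEY"}
```

Copying the catalog to another machine then carries the recipe along, while the key itself has to be supplied by that machine's environment.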

Refuses, never fabricates

Structured, classified errors

Every tool failure returns a structured envelope with a specific, machine-readable reason — aoi_out_of_coverage, windowed_read_unsupported, bbox_invalid — never an empty success or the nearest wrong tile. For coverage gaps it even names the registered source whose sync could acquire the area.
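One possible shape for such an envelope is sketched below. The reason codes are the ones named above; the field names (`ok`, `reason`, `message`, `suggested_source`) and both helper functions are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ErrorEnvelope:
    """Hypothetical error shape; only the reason codes come from the text."""
    ok: bool
    reason: str
    message: str
    suggested_source: Optional[str] = None


def check_bbox(bbox):
    """Return None for a valid bbox, else a bbox_invalid envelope as a dict."""
    min_x, min_y, max_x, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        return asdict(ErrorEnvelope(
            ok=False,
            reason="bbox_invalid",
            message="min must be less than max on both axes",
        ))
    return None


def coverage_gap(source_name):
    """Build an aoi_out_of_coverage envelope naming a source that could sync the area."""
    return asdict(ErrorEnvelope(
        ok=False,
        reason="aoi_out_of_coverage",
        message="no catalog entry covers the requested area",
        suggested_source=source_name,
    ))
```

A caller can branch on `reason` without parsing free text, and a coverage gap carries enough information to trigger a sync.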

Honest geometry, one seam

Open water, handled correctly

DEMs flatten open water to a literal 0 m that reads as solid ground and corrupts slope, viewshed, and line-of-sight. ForgeData recodes over-water cells to NoData at one shared read seam — geometry-driven and source-agnostic across SRTM, Copernicus, 3DEP, Terrarium, and remote WCS — so every consumer inherits honest terrain. A strict no-op until a water layer is present.
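The recode can be pictured with a small grid. In this sketch the DEM and water mask are plain nested lists and the `NODATA` sentinel is an assumed value; the real read seam presumably works on raster windows, which are not modeled here.

```python
NODATA = -9999.0  # hypothetical sentinel; the actual NoData value is not stated


def mask_open_water(dem, water_mask=None, nodata=NODATA):
    """Recode cells flagged as water to NoData, driven by geometry, not by value.

    With no water layer the input is returned unchanged (a strict no-op).
    """
    if water_mask is None:
        return dem
    return [
        [nodata if is_water else z for z, is_water in zip(dem_row, mask_row)]
        for dem_row, mask_row in zip(dem, water_mask)
    ]


dem = [
    [0.0, 5.0],
    [0.0, 0.0],
]
water = [
    [True, False],
    [False, True],
]
masked = mask_open_water(dem, water)
```

The bottom-left cell is 0 m on land, so it keeps its elevation: only the mask decides what is water, never the elevation value.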

Gets faster with use

Derived-product memory

When ForgeGIS computes a slope raster over an area, the result registers back into the catalog as a derived product, pinning its parent, operation, parameters, and content hash. The next request that needs the same raster finds it in the catalog instead of recomputing it, so workflows speed up over the life of a long-running project.
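A rough sketch of the lookup side of this idea, assuming a derived product is keyed by a hash of its parent, operation, and parameters. The key scheme and the `DerivedMemory` class are assumptions for illustration; the content hash of the result itself is not modeled.

```python
import hashlib
import json


def derived_key(parent_id, operation, params):
    """Stable key for a derived product; parameter order does not matter."""
    payload = json.dumps(
        {"parent": parent_id, "op": operation, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DerivedMemory:
    """In-memory stand-in for derived-product entries in the catalog."""

    def __init__(self):
        self._products = {}
        self.computations = 0

    def get_or_compute(self, parent_id, operation, params, compute):
        key = derived_key(parent_id, operation, params)
        if key not in self._products:
            # First request: run the computation and register the result.
            self._products[key] = compute()
            self.computations += 1
        return self._products[key]


memory = DerivedMemory()
first = memory.get_or_compute("dem_tile_42", "slope", {"units": "degrees"}, lambda: "slope_raster")
second = memory.get_or_compute("dem_tile_42", "slope", {"units": "degrees"}, lambda: "slope_raster")
```

The second call returns the registered product without invoking `compute` again.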

Where ForgeData sits in the suite

The routing work could in principle live inside ForgeGIS, ForgeMind, or Studio. It deliberately does not — each of those has a sharp job, and folding data-intelligence concerns into any of them would blur that boundary. ForgeData is the home for the cross-cutting problem, so it can evolve independently of every consumer.

ForgeGIS
Computes
GPU-accelerated geospatial compute engine — raster, terrain, spectral, point clouds.
ForgeMind
Reasons
The GIS Intent Compiler — natural-language orchestration across tools.
ForgeGIS Studio
Renders
Browser-facing UI — catalog browser, source management, raster preview.
ForgeData
Decides the data
Catalog, cost routing, sync, derived-product memory, window extraction.

ForgeMind and Studio talk to ForgeData over MCP, the same stdio JSON-RPC surface any third-party MCP client could use. ForgeData consumes ForgeGIS as an in-process Java library for the actual data reads, so windowed-raster extraction and elevation queries happen with no network hop. Underlying sources stay exactly where they are.
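For a sense of what a third-party client would send, the sketch below builds a single MCP `tools/call` request as a newline-delimited JSON-RPC 2.0 message, the framing MCP's stdio transport uses. The tool name comes from this page; the argument names `operation` and `bbox` and the sample coordinates are guesses, not the tool's documented schema.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so one trailing newline frames the message.
    return json.dumps(message) + "\n"


line = tools_call_request(
    1,
    "find_optimal_for_operation",
    # Hypothetical argument names; check the tool's input schema before use.
    {"operation": "slope", "bbox": [-105.3, 39.9, -105.1, 40.1]},
)
```

A client would write `line` to the server process's stdin and read the matching response, identified by the same `id`, from its stdout.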

Who it's for

One layer, three readers. Each one-pager leads with the parts of ForgeData that matter most to that audience.

Downloads

All ForgeData collateral. Direct download — no email gate, no form wall.

A note on licensing. ForgeData is bundled with the Forge suite and is not sold separately. Its license is not yet finalized; nothing here should be read as a grant of rights. Contact the maintainer before redistributing ForgeData or any catalog bundle.

Talk to us about ForgeData

Evaluation access, suite licensing, and partnership inquiries — we read every email. An evaluation build is available to qualified teams on request.

rich@seaglassfoundry.com