Prophet
Predictive Land Intelligence
Predictive modelling platform that learns historical subdivision and land use patterns to forecast future tenure changes and property value at scale. Built on Dagster, GeoPandas, FastAPI, and MapLibre GL.
Land doesn't change randomly.
Subdivision patterns, rezoning sequences, and tenure transitions follow learnable trajectories. What if a model trained on decades of cadastral history could tell you what a parcel will become?
The Insight
Every piece of land has a history. Before a greenfield estate becomes 400 residential lots, it was a single rural holding. Before that rural holding was rezoned, the corridor around it changed first — infrastructure extended, adjacent parcels consolidated, planning overlays shifted.
These transitions aren’t random. They follow patterns that repeat across decades and geographies. A parcel’s neighbours, its zoning trajectory, the timing of surrounding subdivisions, the sequence of ownership changes — these are features in a prediction problem that nobody is modelling at scale.
The data exists. Australian state governments maintain cadastral registries with historical snapshots going back years. Every lot plan, every boundary change, every status transition is recorded. But it sits in static GIS layers that nobody treats as training data.
Prophet treats it as training data.
Architecture
Prophet is a three-layer system: a Dagster-orchestrated pipeline that converts raw cadastral vectors into a cellular grid, a FastAPI tile server that serves that grid as pre-computed MVTs, and a React/MapLibre GL frontend that renders temporal comparisons in the browser.
┌───────────────────────────────────────────────────────────────────────┐
│                         PROPHET ARCHITECTURE                          │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  DAGSTER PIPELINE             FASTAPI SERVER        REACT FRONTEND    │
│  ────────────────             ──────────────        ──────────────    │
│                                                                       │
│  source_metadata              GET /perspectives     MapLibre GL       │
│        │                        /{id}/mvt/{tick}    VectorTileSource  │
│  source_vector_data             /0/{z}/{x}/{y}.mvt        │           │
│        │                              │             setUrl() on       │
│  base_cadastre_blob_grid ──→  pickle.load(mvt)      tick change       │
│        │                              │                   │           │
│  blob_matrix_mvt_tiles        application/vnd.      Zustand store     │
│  (MultiPartition:             mapbox-vector-tile    manages state     │
│   time × tile)                                                        │
│                                                                       │
│  Partitions:                  Serves ~300 pre-      Temporal          │
│  T2020, T2021 ×               computed tiles        scrubbing via     │
│  z12-z14 Brisbane tiles       per time period       timeline UI       │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
The Blob Matrix
The core data structure is a spatial grid we call the blob matrix. Instead of comparing vector boundaries across time (computationally expensive and geometrically unstable when parcels split, merge, or shift), Prophet rasterises each cadastral snapshot onto a fixed cellular grid aligned to coordinate-space origins.
The grid generation algorithm aligns cells to (0,0) in the projected CRS, guaranteeing that grids produced from different spatial extents but the same cell size are perfectly aligned for temporal stacking:
import numpy as np
# Grid alignment to coordinate origin: snap the extent outward to whole
# multiples of cell_size so every grid shares the same (0, 0) anchor.
xmin_aligned = np.floor(xmin / cell_size) * cell_size
ymin_aligned = np.floor(ymin / cell_size) * cell_size
xmax_aligned = np.ceil(xmax / cell_size) * cell_size
ymax_aligned = np.ceil(ymax / cell_size) * cell_size
# Lower-left x and y coordinates of each grid column and row
cols = np.arange(xmin_aligned, xmax_aligned, cell_size)
rows = np.arange(ymin_aligned, ymax_aligned, cell_size)
Each cell is a Shapely box(x, y, x + cell_size, y + cell_size) with a deterministic ID derived from its grid coordinates: cell_{col}_{row}. A 10-metre grid over the Brisbane cadastral extent produces tens of thousands of cells per time period.
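The alignment property can be checked with a small, dependency-free sketch. It assumes that cell IDs are built from integer column and row indices counted from the origin (the text only says IDs derive from grid coordinates). The function name aligned_grid and the two extents are illustrative, not taken from the pipeline.

```python
import math

def aligned_grid(xmin, ymin, xmax, ymax, cell_size):
    """Return {cell_id: (x0, y0, x1, y1)} for a grid snapped to the CRS origin."""
    # Snap the extent outward to whole multiples of cell_size.
    col_start = math.floor(xmin / cell_size)
    row_start = math.floor(ymin / cell_size)
    col_stop = math.ceil(xmax / cell_size)
    row_stop = math.ceil(ymax / cell_size)

    cells = {}
    for col in range(col_start, col_stop):
        for row in range(row_start, row_stop):
            x0, y0 = col * cell_size, row * cell_size
            # Assumption: IDs use integer indices counted from (0, 0).
            cells[f"cell_{col}_{row}"] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells

# Two different extents, same cell size (hypothetical coordinates).
grid_a = aligned_grid(3.0, 7.0, 41.0, 38.0, cell_size=10)
grid_b = aligned_grid(25.0, 12.0, 58.0, 49.0, cell_size=10)

# Cells present in both grids carry the same ID and the same geometry.
shared = sorted(set(grid_a) & set(grid_b))
```

Deriving coordinates from integer indices, rather than stepping by a floating-point cell size, is one way to keep IDs stable when extents differ between snapshots.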
The cellularisation step then tags each grid cell with the attributes of every parcel it intersects via gpd.sjoin(). A single cell that overlaps multiple parcels appears multiple times in the output — one row per grid-cell/parcel intersection. The geometry stays as the original grid cell (not the intersection polygon), making the output directly suitable for vector tile encoding.
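The shape of that output can be illustrated without GeoPandas. The sketch below is a stand-in for the gpd.sjoin() call, not the pipeline's code: axis-aligned rectangles replace real parcel polygons, the lot_plan values are made up, and the function names are hypothetical. It behaves like an inner join on an intersects test, so cells that touch no parcel are dropped.

```python
def intersects(a, b):
    """True if two (x0, y0, x1, y1) boxes share any point, edges included."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cellularize(cells, parcels):
    """Return one row per intersecting (cell, parcel) pair, keeping cell geometry."""
    rows = []
    for cell_id, cell_box in cells.items():
        for parcel in parcels:
            if intersects(cell_box, parcel["bbox"]):
                rows.append({
                    "cell_id": cell_id,
                    "geometry": cell_box,  # the grid cell, not the overlap
                    "lot_plan": parcel["lot_plan"],
                })
    return rows

cells = {
    "cell_0_0": (0, 0, 10, 10),
    "cell_1_0": (10, 0, 20, 10),
    "cell_2_0": (20, 0, 30, 10),
}
parcels = [
    {"lot_plan": "1RP100", "bbox": (2, 2, 14, 8)},   # spans two cells
    {"lot_plan": "2RP100", "bbox": (15, 1, 19, 9)},  # sits inside one cell
]
rows = cellularize(cells, parcels)
```

Here cell_1_0 appears twice, once per parcel, while cell_2_0 does not appear at all.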
This representation enables three things that raw cadastral vectors don’t:
Temporal alignment — Identical grid IDs across years produce a clean time-series per cell, regardless of how the underlying parcel boundaries changed. A cell that was tagged with one rural lot plan in 2020 and three residential lot plans in 2025 carries that transition as a structured attribute delta.
ML-ready features — Each cell’s attribute history across time periods is a feature vector. The grid is the training dataset.
Tile-native storage — The grid cells map directly to MVT features with zero additional geometry processing.
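To make the temporal alignment and feature-vector ideas concrete, here is a minimal sketch that turns per-period join rows into a per-cell history. The row layout (cell_id and lot_plan keys), the lot plan values, and both function names are assumptions for illustration. Counting distinct lot plans is only one possible feature.

```python
from collections import defaultdict

def cell_history(rows_by_period):
    """Map cell_id -> {period: sorted lot plans} from per-period join rows."""
    history = defaultdict(dict)
    for period, rows in rows_by_period.items():
        grouped = defaultdict(set)
        for row in rows:
            grouped[row["cell_id"]].add(row["lot_plan"])
        for cell_id, plans in grouped.items():
            history[cell_id][period] = sorted(plans)
    return dict(history)

def parcel_count_vector(history, cell_id, periods):
    """Feature vector: distinct lot plans tagging the cell in each period."""
    return [len(history.get(cell_id, {}).get(p, [])) for p in periods]

# Hypothetical rows: one rural lot in T2020, three residential lots in T2021.
rows_by_period = {
    "T2020": [{"cell_id": "cell_4_7", "lot_plan": "12SP200"}],
    "T2021": [
        {"cell_id": "cell_4_7", "lot_plan": "101SP300"},
        {"cell_id": "cell_4_7", "lot_plan": "102SP300"},
        {"cell_id": "cell_4_7", "lot_plan": "103SP300"},
    ],
}
history = cell_history(rows_by_period)
vector = parcel_count_vector(history, "cell_4_7", ["T2020", "T2021"])
```

Because the cell ID is identical in both periods, the jump from one lot plan to three is visible directly in the vector, with no geometric matching of old and new parcel boundaries.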
Dagster Pipeline
The data pipeline uses Dagster’s asset-based orchestration with multi-dimensional partitioning. Every transformation is an auditable, reproducible asset.
The critical design decision is the MultiPartitionsDefinition that combines time and tile dimensions:
from dagster import MultiPartitionsDefinition, StaticPartitionsDefinition
# One partition per annual cadastral snapshot
time_partitions = StaticPartitionsDefinition(["T2020", "T2021"])
# Generate tile partition keys for Brisbane (z12-z14)
tile_keys = get_brisbane_tile_keys(zooms=range(12, 15))
tile_partitions = StaticPartitionsDefinition(tile_keys)
# Cross the two dimensions: one partition per (time, tile) pair
time_tile_partitions = MultiPartitionsDefinition({
    "time": time_partitions,
    "tile": tile_partitions,
})
The tile key generation uses mercantile to enumerate all tiles in the Brisbane bounding box across zoom levels 12-14, producing ~150 tile keys per zoom level. Combined with 2 time partitions, the pipeline manages ~300+ discrete MVT assets, each independently materializable.
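The enumeration step can be sketched with the standard slippy-map tile formula instead of mercantile, so the example needs no third-party packages. The bounding box below is a rough, hypothetical stand-in for the Brisbane extent. The z_x_y key format and the function names are assumptions, since the text does not show what get_brisbane_tile_keys() returns.

```python
import math
from itertools import product

def tile_index(lon, lat, zoom):
    """Return the (x, y) index of the slippy-map tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_keys(west, south, east, north, zooms):
    """Enumerate every tile that touches the bounding box at each zoom level."""
    keys = []
    for z in zooms:
        x_min, y_min = tile_index(west, north, z)  # top-left tile
        x_max, y_max = tile_index(east, south, z)  # bottom-right tile
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                keys.append(f"{z}_{x}_{y}")  # assumed key format
    return keys

# Hypothetical bounding box, not the pipeline's configured extent.
keys = tile_keys(152.9, -27.6, 153.2, -27.3, zooms=range(12, 15))

# A multi-partition set is the cross product of its dimensions.
partition_keys = list(product(["T2020", "T2021"], keys))
```

The number of tiles covering a fixed bounding box grows with each zoom level, so the higher zooms contribute most of the partition keys.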
The asset DAG flows:
| Asset | Partitioning | Input | Output |
|---|---|---|---|
| source_metadata | Per source | YAML config | Source registry |
| source_vector_data | Per source × time | ArcGIS FeatureServer | GeoDataFrame (EPSG:3857) |
| base_cadastre_blob_grid | Per time | Source vectors | Tagged cellular grid |
| blob_matrix_mvt_tiles | Time × tile | Blob grid | MVT bytes |
The blob grid asset takes raw parcel vectors, creates the aligned grid via create_grid(), then performs the spatial join via cellularize_parcels(). The MVT asset then clips the grid to each tile’s Web Mercator bounds, reprojects to EPSG:4326, encodes features with mapbox_vector_tile.encode() at 4096 extent with 256-unit buffer, and persists the bytes via Dagster’s filesystem IO manager.
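The tile-bounds and buffer arithmetic in the MVT step can be sketched as follows. This is an illustration under stated assumptions, not the asset's code: it selects cells by bounding box rather than clipping geometries, it skips the reprojection and encoding steps, and the function names and sample cells are hypothetical. A 256-unit buffer on a 4096-unit extent corresponds to one sixteenth of the tile width on each side.

```python
import math

HALF_WORLD_M = math.pi * 6378137.0  # half the EPSG:3857 world width, in metres

def tile_bounds_3857(z, x, y):
    """Return (xmin, ymin, xmax, ymax) of a slippy-map tile in EPSG:3857."""
    size = 2 * HALF_WORLD_M / (2 ** z)
    xmin = -HALF_WORLD_M + x * size
    ymax = HALF_WORLD_M - y * size  # tile rows count down from the top
    return (xmin, ymax - size, xmin + size, ymax)

def select_cells(cells, z, x, y, extent=4096, buffer_units=256):
    """Keep cells whose box touches the tile bounds grown by the buffer."""
    xmin, ymin, xmax, ymax = tile_bounds_3857(z, x, y)
    pad = (xmax - xmin) * buffer_units / extent
    return {
        cell_id: box
        for cell_id, box in cells.items()
        if box[2] >= xmin - pad and box[0] <= xmax + pad
        and box[3] >= ymin - pad and box[1] <= ymax + pad
    }

# Hypothetical cells around the corner of tile z=1, x=1, y=0.
cells = {
    "inside": (100.0, 100.0, 110.0, 110.0),
    "in_buffer": (-1000.0, 100.0, -990.0, 110.0),
    "outside": (-2_000_000.0, 100.0, -1_999_990.0, 110.0),
}
selected = select_cells(cells, z=1, x=1, y=0)
```

Including features that fall inside the buffer lets neighbouring tiles overlap slightly, which helps avoid visible seams at tile edges.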
New years of cadastral data process incrementally. The pipeline ingests annual snapshots as they’re published and extends the training window without reprocessing history.
Tile Serving
FastAPI serves pre-computed MVTs at sub-5ms response times. The endpoint follows standard slippy map conventions:
GET /perspectives/{perspective_id}/mvt/{tick}/0/{z}/{x}/{y}.mvt
→ Content-Type: application/vnd.mapbox-vector-tile
Each tile is stored as a pickled bytes object by Dagster’s filesystem IO manager, keyed by the multi-partition path {tile_partition}/{time_partition}. The server deserialises and returns raw bytes: no runtime geometry processing, no database queries, no spatial computation at serve time.
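The body of such a handler reduces to a path lookup and an unpickle. The sketch below is framework-free so it runs on its own. The directory layout under the storage root, the z_x_y tile key format, and the load_tile name are assumptions for illustration. Only the {tile_partition}/{time_partition} ordering and the content type come from the description above.

```python
import pickle
import tempfile
from pathlib import Path

MVT_MEDIA_TYPE = "application/vnd.mapbox-vector-tile"

def load_tile(storage_root, asset_name, tick, z, x, y):
    """Return (body, media_type), or None when the tile was never materialised."""
    # Assumed layout: <root>/<asset>/<tile_partition>/<time_partition>
    path = Path(storage_root) / asset_name / f"{z}_{x}_{y}" / tick
    if not path.is_file():
        return None
    with path.open("rb") as fh:
        # Unpickling is only acceptable because the pipeline wrote this file.
        body = pickle.load(fh)
    return body, MVT_MEDIA_TYPE

# Demo: write one fake tile the way an IO manager might, then read it back.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "blob_matrix_mvt_tiles" / "12_3787_2371" / "T2020"
    target.parent.mkdir(parents=True)
    target.write_bytes(pickle.dumps(b"\x1a\x00"))  # placeholder bytes, not a real MVT

    hit = load_tile(root, "blob_matrix_mvt_tiles", "T2020", 12, 3787, 2371)
    miss = load_tile(root, "blob_matrix_mvt_tiles", "T2021", 12, 3787, 2371)
```

In a web handler, the None case would typically map to an empty or 404 response and the tuple to a response with the given media type.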
This is the key performance insight: all spatial computation happens once during pipeline materialisation. The tile server is a glorified file server with content-type headers.
Frontend
The React frontend uses MapLibre GL JS for rendering and Zustand for state management. The critical interaction is temporal scrubbing — switching between time periods to observe cadastral change.
When the user selects a different time period, the frontend calls VectorTileSource.setUrl() on the existing MapLibre source, swapping the tile URL template to point at the new time partition. MapLibre handles cache invalidation and tile re-fetching automatically. The map doesn’t reinitialise; only the tile data changes.
const mvtUrl = getMvtTileUrl(perspectiveId, selectedTick);
(source as VectorTileSource).setUrl(mvtUrl);
The Zustand store manages source selection, tick selection, map viewport state, and a viewport size constraint system that calculates approximate viewport dimensions in metres (via Haversine) to enforce modeller size limits from server-side configuration.
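The viewport constraint arithmetic can be sketched as below. It is written in Python to match the other examples rather than the frontend's TypeScript. The function names, the choice to measure width at the mid-latitude, and the 5,000 m limit are all assumptions, not values from the application's configuration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def viewport_size_m(west, south, east, north):
    """Approximate viewport width and height in metres."""
    mid_lat = (south + north) / 2
    width = haversine_m(west, mid_lat, east, mid_lat)
    height = haversine_m(west, south, west, north)
    return width, height

def within_limit(bounds, max_side_m):
    """True when neither viewport side exceeds the configured limit."""
    width, height = viewport_size_m(*bounds)
    return max(width, height) <= max_side_m

small_ok = within_limit((153.0, -27.5, 153.01, -27.49), max_side_m=5000)
large_ok = within_limit((152.0, -28.0, 154.0, -27.0), max_side_m=5000)
```

The first viewport is roughly a kilometre on each side and passes. The second spans about two degrees of longitude and fails.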
An inverse-bbox overlay (a polygon covering the entire world except the source extent) provides visual context for the dataset boundary. The overlay uses a GeoJSON polygon with an outer ring at [-180.1, -90.1] to [180.1, 90.1] and an inner ring cut from the source’s bounding box.
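A polygon with a hole of this kind can be built as plain GeoJSON. In the sketch below the outer ring uses the coordinates given above, while the function name and the sample bounding box are hypothetical. Ring winding follows the GeoJSON convention of a counter-clockwise exterior and a clockwise hole.

```python
import json

def inverse_bbox_feature(west, south, east, north):
    """Return a GeoJSON Feature covering the world except the given bbox."""
    outer = [  # counter-clockwise exterior ring, slightly larger than the world
        [-180.1, -90.1],
        [180.1, -90.1],
        [180.1, 90.1],
        [-180.1, 90.1],
        [-180.1, -90.1],
    ]
    hole = [  # clockwise interior ring cut from the source extent
        [west, south],
        [west, north],
        [east, north],
        [east, south],
        [west, south],
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [outer, hole]},
    }

# Hypothetical source extent.
feature = inverse_bbox_feature(152.9, -27.6, 153.2, -27.3)
geojson_text = json.dumps(feature)
```

The resulting feature can be added to the map as a GeoJSON source and styled with a semi-transparent fill so that everything outside the dataset boundary is dimmed.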
The application ships as an Electron desktop app for cross-platform delivery.
Why This Matters
The Prediction Gap
Property markets are valued on comparable sales and discounted cash flows. Both methods are backward-looking. They tell you what land was worth; they don’t tell you what it will become.
The highest-value information in property development is knowing which land will transition — from rural to residential, from low-density to high-density, from fragmented ownership to consolidated parcels. Developers who identify these transitions early acquire at agricultural prices and sell at residential prices. The margin between those two numbers is where billions of dollars are made.
Today, identifying transition candidates requires local knowledge, planning contacts, and years of experience driving corridors. It doesn’t scale. It can’t be backtested. It’s invisible to institutional capital.
Prophet makes it visible, quantifiable, and scalable.
What the Model Learns
By training on historical cadastral time-series, Prophet identifies patterns like:
Subdivision precursors — A rural parcel surrounded by recently subdivided lots, within 2km of an infrastructure corridor extension, with a planning overlay change in the last 3 years, has a quantifiable probability of subdividing within the next 5 years.
Rezoning trajectories — Parcels don’t rezone in isolation. They follow spatial sequences. A rezoning event in one area predicts subsequent rezonings in adjacent areas with measurable lead times.
Tenure transition signals — Consolidation of adjacent parcels by a single entity, changes in lot status, and shifts in ownership tenure duration are precursors to development activity.
Value inflection points — The moment a parcel’s probability of transition crosses a threshold, its residual value (current use) diverges from its potential value (highest and best use). Prophet identifies these inflection points.
The Stack
| Layer | Technology | Role |
|---|---|---|
| Pipeline | Dagster | Multi-partition asset orchestration with full observability |
| Geospatial | GeoPandas, Shapely, Mercantile | Grid generation, spatial joins, tile enumeration |
| Storage | GeoParquet, pickle | Columnar geospatial persistence, MVT byte cache |
| Serving | FastAPI | Pre-computed MVT tile serving (<5ms) |
| Rendering | MapLibre GL JS | GPU-accelerated vector tile visualisation |
| State | Zustand | Reactive store with viewport constraint system |
| Desktop | Electron | Cross-platform packaging |
| Config | YAML | Source registry and modeller parameter management |
Market
Prophet operates at the intersection of predictive analytics and property:
| Segment | Size | Growth |
|---|---|---|
| Geospatial Analytics | $14.1B by 2030 | 14.2% CAGR |
| PropTech | $33.6B by 2030 | 16.8% CAGR |
| Property Data & Valuations | $8.7B by 2028 | 11.3% CAGR |
Property developers making land acquisition decisions. Infrastructure funds evaluating corridor investments. Local government forecasting housing supply and infrastructure load. Institutional investors seeking quantified, backtestable exposure to land transitions. Valuation firms needing data-driven highest-and-best-use assessments.
Relationship to Starling
Prophet is the physical-world intelligence complement to Starling, our financial signal intelligence platform.
The thesis is the same: public data, properly modelled over time, contains latent predictive signal that markets haven’t priced.
| Dimension | Starling | Prophet |
|---|---|---|
| Domain | Equities | Land and property |
| Signal source | Forum predictions | Cadastral history |
| Intelligence | Who predicts accurately | What land will become |
| Processing | NLP + backtesting | Spatial ML + temporal grids |
| Output | Ranked trading signals | Tenure and value forecasts |
| Alpha | Before consensus forms | Before transitions happen |
| Pipeline | PostgreSQL event-driven | Dagster asset-based |
Roadmap
Phase 1 (Current): Temporal Visualisation Engine — POC complete. Blob matrix pipeline ingests QLD cadastral snapshots from ArcGIS FeatureServer, generates time-partitioned cellular grids across zoom levels 12-14, and serves interactive MVT tiles via FastAPI. The data foundation is proven.
Phase 2: Multi-Source Enrichment — Overlay zoning layers, planning scheme amendments, ownership transfer records, and transaction data onto the temporal grid. Each additional signal dimension increases model expressiveness.
Phase 3: Pattern Training — Train temporal models on the enriched grid. Identify subdivision precursors, rezoning sequences, and tenure transition signatures. Backtest against held-out time periods.
Phase 4: Prediction API — Expose parcel-level predictions via API. Subscription model by coverage region and prediction horizon.
Phase 5: National Scale — Extend to all Australian states. Each new jurisdiction adds training data and validates model generalisability.
Prophet is in private development. Contact blake@drksci.com for a technical demonstration or investment discussion.