Project

Prophet

Predictive Land Intelligence

Predictive modelling platform that learns historical subdivision and land use patterns to forecast future tenure changes and property value at scale. Built on Dagster, GeoPandas, FastAPI, and MapLibre GL.

Spatial Intelligence
POC Complete · Python · Dagster · GeoPandas · Shapely · FastAPI · MapLibre GL · React · TypeScript · Zustand · Electron · GeoParquet · Mercantile

The Thesis

Land doesn't change randomly.

Subdivision patterns, rezoning sequences, and tenure transitions follow learnable trajectories. What if a model trained on decades of cadastral history could tell you what a parcel will become?

The Insight

Every piece of land has a history. Before a greenfield estate becomes 400 residential lots, it was a single rural holding. Before that rural holding was rezoned, the corridor around it changed first — infrastructure extended, adjacent parcels consolidated, planning overlays shifted.

These transitions aren’t random. They follow patterns that repeat across decades and geographies. A parcel’s neighbours, its zoning trajectory, the timing of surrounding subdivisions, the sequence of ownership changes — these are features in a prediction problem that nobody is modelling at scale.

The data exists. Australian state governments maintain cadastral registries with historical snapshots going back years. Every lot plan, every boundary change, every status transition is recorded. But it sits in static GIS layers that nobody treats as training data.

Prophet treats it as training data.

Architecture

Prophet is a three-layer system: a Dagster-orchestrated pipeline that converts raw cadastral vectors into a cellular grid, a FastAPI tile server that serves that grid as pre-computed MVTs, and a React/MapLibre GL frontend that renders temporal comparisons in the browser.

┌─────────────────────────────────────────────────────────────────┐
│                       PROPHET ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DAGSTER PIPELINE              FASTAPI SERVER    REACT FRONTEND │
│  ───────────────              ──────────────    ──────────────  │
│                                                                 │
│  source_metadata               GET /perspectives  MapLibre GL   │
│       │                        /{id}/mvt/{tick}   VectorTileSource│
│  source_vector_data            /0/{z}/{x}/{y}.mvt     │         │
│       │                             │             setUrl() on   │
│  base_cadastre_blob_grid ──→  pickle.load(mvt)    tick change   │
│       │                             │                  │        │
│  blob_matrix_mvt_tiles        application/vnd.    Zustand store │
│  (MultiPartition:              mapbox-vector-tile  manages state│
│   time × tile)                                                  │
│                                                                 │
│  Partitions:                   Serves ~300 pre-   Temporal      │
│  T2020, T2021 ×               computed tiles     scrubbing via  │
│  z12-z14 Brisbane tiles        per time period    timeline UI   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The Blob Matrix

The core data structure is a spatial grid we call the blob matrix. Instead of comparing vector boundaries across time (computationally expensive and geometrically unstable when parcels split, merge, or shift), Prophet rasterises each cadastral snapshot onto a fixed cellular grid aligned to coordinate-space origins.

The grid generation algorithm aligns cells to (0,0) in the projected CRS, guaranteeing that grids produced from different spatial extents but the same cell size are perfectly aligned for temporal stacking:

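import numpy as np

# Grid alignment to coordinate origin: snap the extent outward to whole
# multiples of cell_size so every grid shares the same (0, 0) lattice.
xmin_aligned = np.floor(xmin / cell_size) * cell_size
ymin_aligned = np.floor(ymin / cell_size) * cell_size
xmax_aligned = np.ceil(xmax / cell_size) * cell_size
ymax_aligned = np.ceil(ymax / cell_size) * cell_size

# Lower-left corner coordinates of each cell column and row
cols = np.arange(xmin_aligned, xmax_aligned, cell_size)
rows = np.arange(ymin_aligned, ymax_aligned, cell_size)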

Each cell is a Shapely box(x, y, x + cell_size, y + cell_size) with a deterministic ID derived from its grid coordinates: cell_{col}_{row}. A 10-metre grid over the Brisbane cadastral extent produces tens of thousands of cells per time period.
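As a minimal, dependency-free sketch of that alignment property: plain tuples stand in for the Shapely boxes, and treating col and row as integer offsets from the origin is an assumption about how cell_{col}_{row} is derived, not a detail confirmed above.

```python
import math

def aligned_cells(xmin, ymin, xmax, ymax, cell_size):
    """Return {cell_id: (x0, y0, x1, y1)} for an origin-aligned grid covering the extent."""
    # Snap the extent outward to whole cell indices measured from (0, 0).
    col_start = math.floor(xmin / cell_size)
    col_stop = math.ceil(xmax / cell_size)
    row_start = math.floor(ymin / cell_size)
    row_stop = math.ceil(ymax / cell_size)
    cells = {}
    for col in range(col_start, col_stop):
        for row in range(row_start, row_stop):
            x0, y0 = col * cell_size, row * cell_size
            # Assumed ID scheme: integer column and row offsets from the origin.
            cells[f"cell_{col}_{row}"] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells

# Two different, partially overlapping extents with the same cell size.
grid_a = aligned_cells(103.0, 48.0, 131.0, 72.0, 10.0)
grid_b = aligned_cells(117.0, 55.0, 149.0, 88.0, 10.0)
shared = sorted(set(grid_a) & set(grid_b))
```

Because both grids snap to multiples of cell_size, every cell in the overlap carries the same ID and the same bounds in each grid, which is what makes stacking across time periods safe.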

The cellularisation step then tags each grid cell with the attributes of every parcel it intersects via gpd.sjoin(). A single cell that overlaps multiple parcels appears multiple times in the output — one row per grid-cell/parcel intersection. The geometry stays as the original grid cell (not the intersection polygon), making the output directly suitable for vector tile encoding.
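The shape of that join output can be illustrated without GeoPandas. The sketch below is a simplification, not the cellularize_parcels() implementation: parcels are reduced to bounding rectangles, the lot plan values are made up, and the overlap test is stricter than the default intersects predicate of gpd.sjoin().

```python
def overlaps(a, b):
    """True if two (x0, y0, x1, y1) rectangles share interior area.

    Strict inequalities mean rectangles that only touch along an edge do not
    match, unlike the default 'intersects' predicate of gpd.sjoin().
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def cellularize(cells, parcels):
    """Emit one row per (grid cell, parcel) overlap, keeping the cell geometry."""
    rows = []
    for cell_id, cell_box in cells.items():
        for parcel in parcels:
            if overlaps(cell_box, parcel["bounds"]):
                rows.append({
                    "cell_id": cell_id,
                    "geometry": cell_box,  # the grid cell, not the intersection
                    "lot_plan": parcel["lot_plan"],
                })
    return rows

cells = {"cell_0_0": (0, 0, 10, 10), "cell_1_0": (10, 0, 20, 10)}
parcels = [
    {"lot_plan": "1RP100", "bounds": (2, 2, 8, 8)},   # hypothetical parcel inside one cell
    {"lot_plan": "2RP100", "bounds": (6, 1, 15, 9)},  # hypothetical parcel spanning both cells
]
tagged = cellularize(cells, parcels)
```

Here cell_0_0 appears twice in the output, once per overlapping parcel, while cell_1_0 appears once.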

This representation enables three things that raw cadastral vectors don’t:

  1. Temporal alignment — Identical grid IDs across years produce a clean time-series per cell, regardless of how the underlying parcel boundaries changed. A cell that was tagged with one rural lot plan in 2020 and three residential lot plans in 2025 carries that transition as a structured attribute delta.

  2. ML-ready features — Each cell’s attribute history across time periods is a feature vector. The grid is the training dataset.

  3. Tile-native storage — The grid cells map directly to MVT features with zero additional geometry processing.
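The temporal alignment point can be made concrete with a small sketch. It assumes each period's grid is available as rows with a cell_id and a lot_plan field (the field names and lot plan values are illustrative), and computes the attribute delta per cell between two periods.

```python
from collections import defaultdict

def lot_plans_by_cell(rows):
    """Group the lot plans tagged onto each cell for one time period."""
    by_cell = defaultdict(set)
    for row in rows:
        by_cell[row["cell_id"]].add(row["lot_plan"])
    return by_cell

def cell_deltas(rows_before, rows_after):
    """Return {cell_id: {"removed": [...], "added": [...]}} for cells that changed."""
    before = lot_plans_by_cell(rows_before)
    after = lot_plans_by_cell(rows_after)
    deltas = {}
    for cell_id in sorted(set(before) | set(after)):
        old, new = before.get(cell_id, set()), after.get(cell_id, set())
        if old != new:
            deltas[cell_id] = {"removed": sorted(old - new), "added": sorted(new - old)}
    return deltas

t2020 = [
    {"cell_id": "cell_5_7", "lot_plan": "1RP100"},
    {"cell_id": "cell_6_7", "lot_plan": "1RP100"},
]
t2021 = [
    {"cell_id": "cell_5_7", "lot_plan": "11SP200"},
    {"cell_id": "cell_5_7", "lot_plan": "12SP200"},
    {"cell_id": "cell_6_7", "lot_plan": "1RP100"},
]
deltas = cell_deltas(t2020, t2021)
```

Only cell_5_7 shows up in deltas: its single lot plan was replaced by two new ones, while cell_6_7 is unchanged and therefore omitted.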

Dagster Pipeline

The data pipeline uses Dagster’s asset-based orchestration with multi-dimensional partitioning. Every transformation is an auditable, reproducible asset.

The critical design decision is the MultiPartitionsDefinition that combines time and tile dimensions:

from dagster import MultiPartitionsDefinition, StaticPartitionsDefinition

time_partitions = StaticPartitionsDefinition(["T2020", "T2021"])

# Generate tile partition keys for Brisbane (z12-z14)
tile_keys = get_brisbane_tile_keys(zooms=range(12, 15))
tile_partitions = StaticPartitionsDefinition(tile_keys)

time_tile_partitions = MultiPartitionsDefinition({
    "time": time_partitions,
    "tile": tile_partitions
})

The tile key generation uses mercantile to enumerate all tiles in the Brisbane bounding box across zoom levels 12-14, producing ~150 tile keys per zoom level. Combined with 2 time partitions, the pipeline manages ~300+ discrete MVT assets, each independently materialisable.
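For readers unfamiliar with tile enumeration, the arithmetic behind it can be sketched in plain Python. This is not get_brisbane_tile_keys(): the bounding box, the key format, and the function names below are all assumptions for illustration, and mercantile provides the same enumeration via mercantile.tiles().

```python
import math

# Hypothetical bounding box (west, south, east, north); the real extent is not given above.
BRISBANE_BBOX = (152.9, -27.6, 153.2, -27.3)

def tile_index(lon, lat, zoom):
    """Slippy-map tile (x, y) containing a lon/lat point (valid away from the antimeridian and poles)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_keys(west, south, east, north, zooms):
    """Enumerate every tile touching the bbox at each zoom, as 'z-x-y' strings."""
    keys = []
    for z in zooms:
        x_min, y_min = tile_index(west, north, z)  # north-west corner: smallest x and y
        x_max, y_max = tile_index(east, south, z)  # south-east corner: largest x and y
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                keys.append(f"{z}-{x}-{y}")  # assumed key format
    return keys

keys = tile_keys(*BRISBANE_BBOX, zooms=range(12, 15))
per_zoom = {z: len(tile_keys(*BRISBANE_BBOX, zooms=[z])) for z in range(12, 15)}
time_keys = ["T2020", "T2021"]
total_partitions = len(time_keys) * len(keys)  # one MVT asset partition per (time, tile) pair
```

Each zoom level splits every tile into four, so the per-zoom counts grow roughly fourfold from z12 to z14 rather than staying constant; the multi-partition count is simply the product of the two dimensions.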

The asset DAG flows:

Asset                    Partitioning       Input                 Output
-----------------------  -----------------  --------------------  ------------------------
source_metadata          Per source         YAML config           Source registry
source_vector_data       Per source × time  ArcGIS FeatureServer  GeoDataFrame (EPSG:3857)
base_cadastre_blob_grid  Per time           Source vectors        Tagged cellular grid
blob_matrix_mvt_tiles    Time × tile        Blob grid             MVT bytes

The blob grid asset takes raw parcel vectors, creates the aligned grid via create_grid(), then performs the spatial join via cellularize_parcels(). The MVT asset then clips the grid to each tile’s Web Mercator bounds, reprojects to EPSG:4326, encodes features with mapbox_vector_tile.encode() at 4096 extent with 256-unit buffer, and persists the bytes via Dagster’s filesystem IO manager.
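To show what a 4096 extent with a 256-unit buffer means in practice, here is a simplified sketch of the tile-local coordinate arithmetic. It works directly in Web Mercator metres and does not reproduce the asset's reprojection step or the internals of mapbox_vector_tile.encode(); the function names are illustrative.

```python
import math

ORIGIN_SHIFT = math.pi * 6378137.0  # half the Web Mercator world width, in metres

def tile_bounds_3857(z, x, y):
    """Return (left, bottom, right, top) of a slippy-map tile in EPSG:3857 metres."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)
    left = -ORIGIN_SHIFT + x * size
    top = ORIGIN_SHIFT - y * size
    return left, top - size, left + size, top

def to_tile_coords(px, py, bounds, extent=4096):
    """Quantise a projected point to integer tile-local coordinates (y pointing down)."""
    left, bottom, right, top = bounds
    tx = round((px - left) / (right - left) * extent)
    ty = round((top - py) / (top - bottom) * extent)
    return tx, ty

def in_buffered_tile(tx, ty, extent=4096, buffer=256):
    """True if a tile-local point lies within the tile plus its buffer margin."""
    return -buffer <= tx <= extent + buffer and -buffer <= ty <= extent + buffer

world = tile_bounds_3857(0, 0, 0)
centre = to_tile_coords(0.0, 0.0, world)
top_left = to_tile_coords(world[0], world[3], world)
```

The buffer keeps geometry that slightly overhangs the tile edge, so cells on a tile boundary render without seams when adjacent tiles are drawn side by side.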

New years of cadastral data process incrementally. The pipeline ingests annual snapshots as they’re published and extends the training window without reprocessing history.

Tile Serving

FastAPI serves pre-computed MVTs at sub-5ms response times. The endpoint follows standard slippy map conventions:

GET /perspectives/{perspective_id}/mvt/{tick}/0/{z}/{x}/{y}.mvt
→ Content-Type: application/vnd.mapbox-vector-tile

Each tile is stored as a pickled bytes object by Dagster’s filesystem IO manager, keyed by the multi-partition path {tile_partition}/{time_partition}. The server deserialises and returns raw bytes — no runtime geometry processing, no database queries, no spatial computation at serve time.

This is the key performance insight: all spatial computation happens once during pipeline materialisation. The tile server is a glorified file server with content-type headers.
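The lookup at the heart of such a server can be sketched with the standard library alone. The directory layout below (an asset folder containing {tile_partition}/{time_partition} files) follows the description above, but the folder name, key format, and function names are assumptions; a FastAPI route would wrap load_tile() and return the bytes with the MVT media type, or a 404 when it returns None.

```python
import pickle
import tempfile
from pathlib import Path

MVT_MEDIA_TYPE = "application/vnd.mapbox-vector-tile"
ASSET_DIR = "blob_matrix_mvt_tiles"  # assumed folder name, taken from the asset name

def tile_path(storage_root, tile_key, tick):
    """Build the on-disk path for one (tile, time) partition."""
    return Path(storage_root) / ASSET_DIR / tile_key / tick

def load_tile(storage_root, tile_key, tick):
    """Return the pre-computed MVT bytes, or None if the partition was never materialised."""
    path = tile_path(storage_root, tile_key, tick)
    if not path.is_file():
        return None
    with path.open("rb") as handle:
        payload = pickle.load(handle)  # only safe for files the pipeline itself wrote
    if not isinstance(payload, bytes):
        raise TypeError(f"expected pickled bytes at {path}")
    return payload

# Demo against a temporary directory standing in for the IO manager's storage.
with tempfile.TemporaryDirectory() as root:
    target = tile_path(root, "12-3787-2371", "T2020")
    target.parent.mkdir(parents=True)
    target.write_bytes(pickle.dumps(b"\x1a\x00"))
    hit = load_tile(root, "12-3787-2371", "T2020")
    miss = load_tile(root, "12-3787-2371", "T2021")
```

Since unpickling executes arbitrary code for untrusted input, this pattern is only appropriate when the tile store is written exclusively by the pipeline.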

Frontend

The React frontend uses MapLibre GL JS for rendering and Zustand for state management. The critical interaction is temporal scrubbing — switching between time periods to observe cadastral change.

When the user selects a different time period, the frontend calls VectorTileSource.setUrl() on the existing MapLibre source, swapping the tile URL template to point at the new time partition. MapLibre handles cache invalidation and tile re-fetching automatically. The map doesn’t reinitialise; only the tile data changes.

const mvtUrl = getMvtTileUrl(perspectiveId, selectedTick);
(source as VectorTileSource).setUrl(mvtUrl);

The Zustand store manages source selection, tick selection, map viewport state, and a viewport size constraint system that calculates approximate viewport dimensions in metres (via Haversine) to enforce modeller size limits from server-side configuration.
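The viewport constraint reduces to a small amount of arithmetic. The frontend itself is TypeScript; the sketch below shows the same calculation in Python for consistency with the pipeline code, with hypothetical function names, sample bounds, and limits.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def viewport_size_m(west, south, east, north):
    """Approximate viewport width (measured at mid-latitude) and height in metres."""
    mid_lat = (south + north) / 2
    width = haversine_m(west, mid_lat, east, mid_lat)
    height = haversine_m(west, south, west, north)
    return width, height

def within_limit(bounds, max_side_m):
    """True if neither viewport dimension exceeds the configured limit."""
    width, height = viewport_size_m(*bounds)
    return max(width, height) <= max_side_m

view = (153.02, -27.48, 153.03, -27.47)  # hypothetical viewport, roughly 1 km across
width_m, height_m = viewport_size_m(*view)
allowed = within_limit(view, max_side_m=2000.0)
rejected = within_limit(view, max_side_m=500.0)
```

Measuring width at the mid-latitude is one reasonable approximation; at city scale the difference between the top and bottom edges of the viewport is negligible.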

An inverse-bbox overlay (a polygon covering the entire world except the source extent) provides visual context for the dataset boundary. The overlay uses a GeoJSON polygon with an outer ring at [-180.1, -90.1] to [180.1, 90.1] and an inner ring cut from the source’s bounding box.
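A polygon with a hole is enough to express this overlay. The sketch below builds such a GeoJSON feature in Python; the function name and the sample extent are illustrative, while the outer ring coordinates are the ones stated above.

```python
# Outer ring slightly larger than the world, wound counter-clockwise.
OUTER_RING = [
    [-180.1, -90.1],
    [180.1, -90.1],
    [180.1, 90.1],
    [-180.1, 90.1],
    [-180.1, -90.1],
]

def inverse_bbox_feature(west, south, east, north):
    """GeoJSON feature covering everything except the given bounding box."""
    # The inner ring is wound clockwise so it is treated as a hole.
    hole = [
        [west, south],
        [west, north],
        [east, north],
        [east, south],
        [west, south],
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [OUTER_RING, hole]},
    }

overlay = inverse_bbox_feature(152.9, -27.6, 153.2, -27.3)  # hypothetical source extent
rings = overlay["geometry"]["coordinates"]
```

Rendered as a semi-transparent fill layer, the feature dims everything outside the dataset while leaving the source extent untouched.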

The application ships as an Electron desktop app for cross-platform delivery.

Why This Matters

The Prediction Gap

Property markets are valued on comparable sales and discounted cash flows. Both methods are backward-looking. They tell you what land was worth; they don’t tell you what it will become.

The highest-value information in property development is knowing which land will transition — from rural to residential, from low-density to high-density, from fragmented ownership to consolidated parcels. Developers who identify these transitions early acquire at agricultural prices and sell at residential prices. The margin between those two numbers is where billions of dollars are made.

Today, identifying transition candidates requires local knowledge, planning contacts, and years of experience driving corridors. It doesn’t scale. It can’t be backtested. It’s invisible to institutional capital.

Prophet makes it visible, quantifiable, and scalable.

What the Model Learns

By training on historical cadastral time-series, Prophet identifies patterns like:

Subdivision precursors — A rural parcel surrounded by recently subdivided lots, within 2km of an infrastructure corridor extension, with a planning overlay change in the last 3 years, has a quantifiable probability of subdividing within the next 5 years.

Rezoning trajectories — Parcels don’t rezone in isolation. They follow spatial sequences. A rezoning event in one area predicts subsequent rezonings in adjacent areas with measurable lead times.

Tenure transition signals — Consolidation of adjacent parcels by a single entity, changes in lot status, and shifts in ownership tenure duration are precursors to development activity.

Value inflection points — The moment a parcel’s probability of transition crosses a threshold, its residual value (current use) diverges from its potential value (highest and best use). Prophet identifies these inflection points.
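As an illustration of how the subdivision-precursor pattern could become a grid feature, the sketch below computes the share of a cell's neighbours whose lot count rose between two periods. This is a hypothetical feature definition, not Prophet's model: the function names, the eight-neighbour window, and the input format (cell ID mapped to number of distinct lot plans) are all assumptions.

```python
def neighbour_ids(cell_id):
    """IDs of the eight cells surrounding a cell named cell_{col}_{row}."""
    _, col, row = cell_id.split("_")
    col, row = int(col), int(row)
    return [
        f"cell_{col + dc}_{row + dr}"
        for dc in (-1, 0, 1)
        for dr in (-1, 0, 1)
        if (dc, dr) != (0, 0)
    ]

def neighbour_subdivision_share(cell_id, lots_before, lots_after):
    """Fraction of known neighbours whose distinct lot count increased."""
    known = [n for n in neighbour_ids(cell_id) if n in lots_before and n in lots_after]
    if not known:
        return 0.0
    subdivided = sum(1 for n in known if lots_after[n] > lots_before[n])
    return subdivided / len(known)

# Distinct lot plans per cell in two periods (made-up values).
lots_2020 = {"cell_5_5": 1, "cell_4_5": 1, "cell_6_5": 1, "cell_5_6": 1, "cell_5_4": 1}
lots_2021 = {"cell_5_5": 1, "cell_4_5": 3, "cell_6_5": 2, "cell_5_6": 1, "cell_5_4": 1}
share = neighbour_subdivision_share("cell_5_5", lots_2020, lots_2021)
```

In this toy data the centre cell has not changed, but half of its known neighbours have subdivided, which is the kind of signal the precursor pattern describes.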

The Stack

Layer       Technology                      Role
----------  ------------------------------  ----------------------------------------------------------
Pipeline    Dagster                         Multi-partition asset orchestration with full observability
Geospatial  GeoPandas, Shapely, Mercantile  Grid generation, spatial joins, tile enumeration
Storage     GeoParquet, pickle              Columnar geospatial persistence, MVT byte cache
Serving     FastAPI                         Pre-computed MVT tile serving (<5ms)
Rendering   MapLibre GL JS                  GPU-accelerated vector tile visualisation
State       Zustand                         Reactive store with viewport constraint system
Desktop     Electron                        Cross-platform packaging
Config      YAML                            Source registry and modeller parameter management

Market

Prophet operates at the intersection of predictive analytics and property:

Segment                     Size            Growth
--------------------------  --------------  ----------
Geospatial Analytics        $14.1B by 2030  14.2% CAGR
PropTech                    $33.6B by 2030  16.8% CAGR
Property Data & Valuations  $8.7B by 2028   11.3% CAGR

Property developers making land acquisition decisions. Infrastructure funds evaluating corridor investments. Local government forecasting housing supply and infrastructure load. Institutional investors seeking quantified, backtestable exposure to land transitions. Valuation firms needing data-driven highest-and-best-use assessments.

Relationship to Starling

Prophet is the physical-world intelligence complement to Starling, our financial signal intelligence platform.

The thesis is the same: public data, properly modelled over time, contains latent predictive signal that markets haven’t priced.

Dimension      Starling                 Prophet
-------------  -----------------------  ---------------------------
Domain         Equities                 Land and property
Signal source  Forum predictions        Cadastral history
Intelligence   Who predicts accurately  What land will become
Processing     NLP + backtesting        Spatial ML + temporal grids
Output         Ranked trading signals   Tenure and value forecasts
Alpha          Before consensus forms   Before transitions happen
Pipeline       PostgreSQL event-driven  Dagster asset-based

Roadmap

Phase 1 (Current): Temporal Visualisation Engine — POC complete. Blob matrix pipeline ingests QLD cadastral snapshots from ArcGIS FeatureServer, generates time-partitioned cellular grids across zoom levels 12-14, and serves interactive MVT tiles via FastAPI. The data foundation is proven.

Phase 2: Multi-Source Enrichment — Overlay zoning layers, planning scheme amendments, ownership transfer records, and transaction data onto the temporal grid. Each additional signal dimension increases model expressiveness.

Phase 3: Pattern Training — Train temporal models on the enriched grid. Identify subdivision precursors, rezoning sequences, and tenure transition signatures. Backtest against held-out time periods.

Phase 4: Prediction API — Expose parcel-level predictions via API. Subscription model by coverage region and prediction horizon.

Phase 5: National Scale — Extend to all Australian states. Each new jurisdiction adds training data and validates model generalisability.


Prophet is in private development. Contact blake@drksci.com for a technical demonstration or investment discussion.