Prophet

The Thesis

Land doesn't change randomly.

Subdivision patterns, rezoning sequences, and tenure transitions follow learnable trajectories. What if a model trained on decades of cadastral history could tell you what a parcel will become?

The Insight

Every piece of land has a history. Before a greenfield estate becomes 400 residential lots, it was a single rural holding. Before that rural holding was rezoned, the corridor around it changed first — infrastructure extended, adjacent parcels consolidated, planning overlays shifted.

These transitions aren’t random. They follow patterns that repeat across decades and geographies. A parcel’s neighbours, its zoning trajectory, the timing of surrounding subdivisions, the sequence of ownership changes — these are features in a prediction problem that nobody is modelling at scale.

The data exists. Australian state governments maintain cadastral registries with historical snapshots going back years. Every lot plan, every boundary change, every status transition is recorded. But it sits in static GIS layers that nobody treats as training data.

Prophet treats it as training data.

Architecture

Prophet is a three-layer system: a Dagster-orchestrated pipeline that converts raw cadastral vectors into a cellular grid, a FastAPI tile server that serves that grid as pre-computed MVTs, and a React/MapLibre GL frontend that renders temporal comparisons in the browser.

┌─────────────────────────────────────────────────────────────────┐
│                       PROPHET ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DAGSTER PIPELINE              FASTAPI SERVER    REACT FRONTEND │
│  ───────────────              ──────────────    ──────────────  │
│                                                                 │
│  source_metadata               GET /perspectives  MapLibre GL   │
│       │                        /{id}/mvt/{tick}   VectorTileSource│
│  source_vector_data            /0/{z}/{x}/{y}.mvt     │         │
│       │                             │             setUrl() on   │
│  base_cadastre_blob_grid ──→  pickle.load(mvt)    tick change   │
│       │                             │                  │        │
│  blob_matrix_mvt_tiles        application/vnd.    Zustand store │
│  (MultiPartition:              mapbox-vector-tile  manages state│
│   time × tile)                                                  │
│                                                                 │
│  Partitions:                   Serves ~300 pre-   Temporal      │
│  T2020, T2021 ×               computed tiles     scrubbing via  │
│  z12-z14 Brisbane tiles        per time period    timeline UI   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The Blob Matrix

The core data structure is a spatial grid we call the blob matrix. Instead of comparing vector boundaries across time (computationally expensive and geometrically unstable when parcels split, merge, or shift), Prophet rasterises each cadastral snapshot onto a fixed cellular grid aligned to coordinate-space origins.

The grid generation algorithm aligns cells to (0,0) in the projected CRS, guaranteeing that grids produced from different spatial extents but the same cell size are perfectly aligned for temporal stacking:

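import numpy as np

# Grid alignment to coordinate origin: snap the extent outward to the nearest
# multiples of cell_size so every grid with this cell size shares (0, 0).
# xmin, ymin, xmax, ymax are the snapshot bounds in the projected CRS.
xmin_aligned = np.floor(xmin / cell_size) * cell_size
ymin_aligned = np.floor(ymin / cell_size) * cell_size
xmax_aligned = np.ceil(xmax / cell_size) * cell_size
ymax_aligned = np.ceil(ymax / cell_size) * cell_size

# Lower-left corner coordinates of each column and row of cells
cols = np.arange(xmin_aligned, xmax_aligned, cell_size)
rows = np.arange(ymin_aligned, ymax_aligned, cell_size)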

Each cell is a Shapely box(x, y, x + cell_size, y + cell_size) with a deterministic ID derived from its grid coordinates: cell_{col}_{row}. A 10-metre grid over the Brisbane cadastral extent produces tens of thousands of cells per time period.
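As a minimal sketch of why origin alignment matters, the snippet below rebuilds the idea with the standard library only, using plain tuples in place of Shapely boxes. The helper name aligned_cells and the choice of col and row as integer multiples of cell_size counted from (0, 0) are assumptions for illustration; the text above only states that IDs take the form cell_{col}_{row}.

```python
import math


def aligned_cells(xmin, ymin, xmax, ymax, cell_size):
    """Return {cell_id: (x0, y0, x1, y1)} for a grid snapped to the CRS origin.

    Hypothetical helper: col and row are assumed to be integer multiples of
    cell_size counted from (0, 0), which may differ from the real ID scheme.
    """
    col_start = math.floor(xmin / cell_size)
    col_stop = math.ceil(xmax / cell_size)
    row_start = math.floor(ymin / cell_size)
    row_stop = math.ceil(ymax / cell_size)

    cells = {}
    for col in range(col_start, col_stop):
        for row in range(row_start, row_stop):
            x0, y0 = col * cell_size, row * cell_size
            cells[f"cell_{col}_{row}"] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells


# Two snapshots with different extents but the same cell size.
grid_a = aligned_cells(103.0, 58.0, 131.0, 79.0, cell_size=10)
grid_b = aligned_cells(96.0, 61.0, 127.0, 84.0, cell_size=10)

# Cells covered by both extents carry the same ID and the same bounds.
shared = sorted(set(grid_a) & set(grid_b))
```

Because both grids are snapped to the same origin, any cell present in both snapshots has identical bounds in each, which is the property temporal stacking relies on.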

The cellularisation step then tags each grid cell with the attributes of every parcel it intersects via gpd.sjoin(). A single cell that overlaps multiple parcels appears multiple times in the output — one row per grid-cell/parcel intersection. The geometry stays as the original grid cell (not the intersection polygon), making the output directly suitable for vector tile encoding.
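The one-row-per-intersection shape of the output can be illustrated without GeoPandas. The sketch below is a simplified stand-in for the spatial join, not the pipeline's code: parcels are reduced to axis-aligned rectangles, the helper names tag_cells and overlaps are hypothetical, the lot_plan attribute and its values are made up, and only overlaps with positive area are counted (the default "intersects" predicate of gpd.sjoin() also matches geometries that merely touch).

```python
def overlaps(a, b):
    """True when two (x0, y0, x1, y1) rectangles share positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def tag_cells(cells, parcels):
    """Emit one row per cell/parcel overlap, keeping the cell's own geometry."""
    rows = []
    for cell_id, cell_box in cells.items():
        for parcel in parcels:
            if overlaps(cell_box, parcel["bounds"]):
                rows.append({
                    "cell_id": cell_id,
                    "geometry": cell_box,  # the grid cell, not the intersection
                    "lot_plan": parcel["lot_plan"],
                })
    return rows


cells = {
    "cell_0_0": (0, 0, 10, 10),
    "cell_1_0": (10, 0, 20, 10),
}
parcels = [
    {"lot_plan": "1RP100", "bounds": (0, 0, 12, 10)},   # spans both cells
    {"lot_plan": "2RP100", "bounds": (12, 0, 20, 10)},  # second cell only
]
rows = tag_cells(cells, parcels)
```

Here cell_1_0 appears twice in rows, once per parcel it overlaps, and both rows carry the full cell rectangle as geometry.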

This representation enables three things that raw cadastral vectors don’t:

Temporal alignment — Identical grid IDs across years produce a clean time-series per cell, regardless of how the underlying parcel boundaries changed. A cell that was tagged with one rural lot plan in 2020 and three residential lot plans in 2025 carries that transition as a structured attribute delta.
ML-ready features — Each cell’s attribute history across time periods is a feature vector. The grid is the training dataset.
Tile-native storage — The grid cells map directly to MVT features with zero additional geometry processing.
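To make the "structured attribute delta" concrete, here is a small sketch that compares two snapshots cell by cell. It assumes each snapshot is a list of join rows with cell_id and lot_plan keys; those field names, the helper names, and the sample lot plans are illustrative assumptions rather than the pipeline's schema.

```python
from collections import defaultdict


def lots_per_cell(rows):
    """Collapse join rows into {cell_id: set of lot plans}."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["cell_id"]].add(row["lot_plan"])
    return grouped


def cell_deltas(before_rows, after_rows):
    """Report, per cell, which lot plans were added or removed between snapshots."""
    before, after = lots_per_cell(before_rows), lots_per_cell(after_rows)
    deltas = {}
    for cell_id in sorted(set(before) | set(after)):
        old, new = before.get(cell_id, set()), after.get(cell_id, set())
        if old != new:
            deltas[cell_id] = {
                "added": sorted(new - old),
                "removed": sorted(old - new),
                "count_change": len(new) - len(old),
            }
    return deltas


# Hypothetical snapshots: one rural lot becomes three residential lots.
snapshot_2020 = [
    {"cell_id": "cell_5_7", "lot_plan": "10RP500"},
    {"cell_id": "cell_5_8", "lot_plan": "11RP500"},
]
snapshot_2025 = [
    {"cell_id": "cell_5_7", "lot_plan": "1SP900"},
    {"cell_id": "cell_5_7", "lot_plan": "2SP900"},
    {"cell_id": "cell_5_7", "lot_plan": "3SP900"},
    {"cell_id": "cell_5_8", "lot_plan": "11RP500"},
]
deltas = cell_deltas(snapshot_2020, snapshot_2025)
```

Only cell_5_7 appears in deltas; the unchanged cell drops out, so the result is a compact record of where and how the cadastre moved between the two periods.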

Dagster Pipeline

The data pipeline uses Dagster’s asset-based orchestration with multi-dimensional partitioning. Every transformation is an auditable, reproducible asset.

The critical design decision is the MultiPartitionsDefinition that combines time and tile dimensions:

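from dagster import MultiPartitionsDefinition, StaticPartitionsDefinition

# One partition per annual cadastral snapshot
time_partitions = StaticPartitionsDefinition(["T2020", "T2021"])

# Generate tile partition keys for Brisbane (z12-z14)
tile_keys = get_brisbane_tile_keys(zooms=range(12, 15))
tile_partitions = StaticPartitionsDefinition(tile_keys)

# Cross the two dimensions: one MVT asset partition per (time, tile) pair
time_tile_partitions = MultiPartitionsDefinition({
    "time": time_partitions,
    "tile": tile_partitions
})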

The tile key generation uses mercantile to enumerate all tiles in the Brisbane bounding box across zoom levels 12-14, producing ~150 tile keys per zoom level. Combined with 2 time partitions, the pipeline manages ~300+ discrete MVT assets, each independently materialisable.
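The enumeration step can be sketched with the standard slippy-map tile formula, which is the same arithmetic mercantile performs when listing tiles for a bounding box. Everything specific here is an assumption: the bounding box is a rough placeholder rather than the project's Brisbane extent, the "{z}_{x}_{y}" key format is hypothetical, and this simplified version does not replicate mercantile's handling of edge cases such as a box edge lying exactly on a tile boundary.

```python
import math
from itertools import product


def tile_xy(lon, lat, zoom):
    """Slippy-map tile indices of the tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_keys(west, south, east, north, zooms):
    """List hypothetical 'z_x_y' keys for every tile touching the bounding box."""
    keys = []
    for z in zooms:
        x_min, y_min = tile_xy(west, north, z)  # tile y grows southwards
        x_max, y_max = tile_xy(east, south, z)
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                keys.append(f"{z}_{x}_{y}")
    return keys


# Placeholder extent (west, south, east, north), not the project's real bounds.
BBOX = (152.9, -27.6, 153.2, -27.3)
keys = tile_keys(*BBOX, zooms=range(12, 15))

# Each (time, tile) pair corresponds to one independently materialised asset.
partitions = list(product(["T2020", "T2021"], keys))
```

The length of partitions is always the number of time partitions multiplied by the number of tile keys, which is how the multi-partition definition arrives at its asset count.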

The asset DAG flows:

Asset	Partitioning	Input	Output
`source_metadata`	Per source	YAML config	Source registry
`source_vector_data`	Per source × time	ArcGIS FeatureServer	GeoDataFrame (EPSG:3857)
`base_cadastre_blob_grid`	Per time	Source vectors	Tagged cellular grid
`blob_matrix_mvt_tiles`	Time × tile	Blob grid	MVT bytes

The blob grid asset takes raw parcel vectors, creates the aligned grid via create_grid(), then performs the spatial join via cellularize_parcels(). The MVT asset then clips the grid to each tile’s Web Mercator bounds, reprojects to EPSG:4326, encodes features with mapbox_vector_tile.encode() at 4096 extent with 256-unit buffer, and persists the bytes via Dagster’s filesystem IO manager.
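For readers unfamiliar with what "4096 extent with 256-unit buffer" means, the sketch below shows the underlying arithmetic: a tile's Web Mercator bounds are computed from its z/x/y address, and coordinates are scaled into an integer range of 0 to 4096 across the tile. This is an illustration only. It stays in Web Mercator metres, keeps y pointing up, and does not reproduce the pipeline's reprojection step or call mapbox_vector_tile; the function names are hypothetical.

```python
ORIGIN_SHIFT = 20037508.342789244  # half the Web Mercator world width, in metres


def tile_bounds_3857(z, x, y):
    """Web Mercator bounds (xmin, ymin, xmax, ymax) of slippy-map tile z/x/y."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)
    xmin = -ORIGIN_SHIFT + x * size
    ymax = ORIGIN_SHIFT - y * size  # tile rows are counted from the top
    return xmin, ymax - size, xmin + size, ymax


def quantize(mx, my, bounds, extent=4096):
    """Scale a Web Mercator point into integer tile units (y pointing up)."""
    xmin, ymin, xmax, ymax = bounds
    qx = round((mx - xmin) / (xmax - xmin) * extent)
    qy = round((my - ymin) / (ymax - ymin) * extent)
    return qx, qy


def in_buffered_tile(qx, qy, extent=4096, buffer=256):
    """True when a quantised point lies inside the tile plus its buffer."""
    return -buffer <= qx <= extent + buffer and -buffer <= qy <= extent + buffer


world = tile_bounds_3857(0, 0, 0)
centre = quantize(0.0, 0.0, world)
corner = quantize(world[0], world[1], world)
```

The buffer exists so that geometry slightly outside the tile edge is retained, which avoids visible seams where features cross tile boundaries.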

New years of cadastral data are processed incrementally: the pipeline ingests each annual snapshot as it is published and extends the training window without reprocessing history.

Tile Serving

FastAPI serves pre-computed MVTs at sub-5ms response times. The endpoint follows standard slippy map conventions:

GET /perspectives/{perspective_id}/mvt/{tick}/0/{z}/{x}/{y}.mvt
→ Content-Type: application/vnd.mapbox-vector-tile

Each tile is stored as a pickled bytes object by Dagster’s filesystem IO manager, keyed by the multi-partition path {tile_partition}/{time_partition}. The server deserialises and returns raw bytes — no runtime geometry processing, no database queries, no spatial computation at serve time.

This is the key performance insight: all spatial computation happens once during pipeline materialisation. The tile server is a glorified file server with content-type headers.
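A minimal sketch of the serve-time lookup, using only the standard library, might look like the following. The directory layout (storage root, then asset name, then tile key, then time key) is an assumption based on the path described above, and the function names, tile key, and payload bytes are placeholders. The FastAPI route itself is omitted; it would wrap the returned bytes in a response with the MVT media type.

```python
import pickle
import tempfile
from pathlib import Path

MVT_MEDIA_TYPE = "application/vnd.mapbox-vector-tile"


def tile_path(root, asset_name, tile_key, time_key):
    """Assumed on-disk layout: <root>/<asset>/<tile_partition>/<time_partition>."""
    return Path(root) / asset_name / tile_key / time_key


def load_tile(root, asset_name, tile_key, time_key):
    """Return the stored MVT bytes, or None when the partition was never materialised."""
    path = tile_path(root, asset_name, tile_key, time_key)
    if not path.is_file():
        return None
    with path.open("rb") as fh:
        payload = pickle.load(fh)  # only safe for files the pipeline itself wrote
    if not isinstance(payload, bytes):
        raise TypeError(f"expected pickled bytes at {path}")
    return payload


# Demonstration against a throwaway directory with placeholder tile bytes.
with tempfile.TemporaryDirectory() as root:
    target = tile_path(root, "blob_matrix_mvt_tiles", "12_3788_2371", "T2020")
    target.parent.mkdir(parents=True)
    target.write_bytes(pickle.dumps(b"fake-mvt-bytes"))

    body = load_tile(root, "blob_matrix_mvt_tiles", "12_3788_2371", "T2020")
    missing = load_tile(root, "blob_matrix_mvt_tiles", "12_0_0", "T2020")
```

A route handler would return a 404 when load_tile gives None and otherwise send body with MVT_MEDIA_TYPE as the content type. Because pickle can execute code on load, this pattern is only appropriate when the tile store is written exclusively by the trusted pipeline.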

Frontend

The React frontend uses MapLibre GL JS for rendering and Zustand for state management. The critical interaction is temporal scrubbing — switching between time periods to observe cadastral change.

When the user selects a different time period, the frontend calls VectorTileSource.setUrl() on the existing MapLibre source, swapping the tile URL template to point at the new time partition. MapLibre handles cache invalidation and tile re-fetching automatically. The map doesn’t reinitialise; only the tile data changes.

const mvtUrl = getMvtTileUrl(perspectiveId, selectedTick);
(source as VectorTileSource).setUrl(mvtUrl);

The Zustand store manages source selection, tick selection, map viewport state, and a viewport size constraint system that calculates approximate viewport dimensions in metres (via Haversine) to enforce modeller size limits from server-side configuration.
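The viewport constraint reduces to a Haversine distance calculation. The sketch below shows that arithmetic in Python for consistency with the pipeline examples; the frontend would express the same logic in TypeScript. The function names, the mean Earth radius constant, and the idea of a single max_side_m limit are assumptions, since the shape of the server-side configuration is not described here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, a common approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def viewport_size_m(west, south, east, north):
    """Approximate viewport width (at the middle latitude) and height in metres."""
    mid_lat = (south + north) / 2
    width = haversine_m(west, mid_lat, east, mid_lat)
    height = haversine_m(west, south, west, north)
    return width, height


def within_limit(bounds, max_side_m):
    """True when neither viewport side exceeds the hypothetical size limit."""
    width, height = viewport_size_m(*bounds)
    return max(width, height) <= max_side_m


one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
small_view = within_limit((153.00, -27.50, 153.02, -27.48), max_side_m=5_000)
large_view = within_limit((152.0, -28.0, 154.0, -26.0), max_side_m=5_000)
```

Measuring width along the middle latitude is a deliberate simplification: it is accurate enough for a size guard and avoids computing separate widths for the top and bottom edges.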

An inverse-bbox overlay (a polygon covering the entire world except the source extent) provides visual context for the dataset boundary. The overlay uses a GeoJSON polygon with an outer ring at [-180.1, -90.1] to [180.1, 90.1] and an inner ring cut from the source’s bounding box.
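The overlay geometry described above can be sketched as a GeoJSON polygon with one hole. It is shown in Python to match the other examples; in the frontend the same structure would be a plain object literal. The function name and the sample bounding box are illustrative assumptions. Ring winding follows the GeoJSON convention of a counterclockwise exterior and a clockwise hole.

```python
def inverse_bbox_feature(west, south, east, north):
    """GeoJSON Feature covering the world except the given bounding box."""
    outer = [  # counterclockwise exterior ring, padded just past the world bounds
        [-180.1, -90.1],
        [180.1, -90.1],
        [180.1, 90.1],
        [-180.1, 90.1],
        [-180.1, -90.1],
    ]
    hole = [  # clockwise interior ring cut from the source extent
        [west, south],
        [west, north],
        [east, north],
        [east, south],
        [west, south],
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [outer, hole]},
    }


# Placeholder extent, not the project's actual source bounds.
overlay = inverse_bbox_feature(152.9, -27.6, 153.2, -27.3)
rings = overlay["geometry"]["coordinates"]
```

Styling the resulting feature with a semi-transparent fill dims everything outside the dataset while leaving the source extent untouched.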

The application ships as an Electron desktop app for cross-platform delivery.

Why This Matters

The Prediction Gap

Property markets are valued on comparable sales and discounted cash flows. Both methods are backward-looking. They tell you what land was worth; they don’t tell you what it will become.

The highest-value information in property development is knowing which land will transition — from rural to residential, from low-density to high-density, from fragmented ownership to consolidated parcels. Developers who identify these transitions early acquire at agricultural prices and sell at residential prices. The margin between those two numbers is where billions of dollars are made.

Today, identifying transition candidates requires local knowledge, planning contacts, and years of experience driving corridors. It doesn’t scale. It can’t be backtested. It’s invisible to institutional capital.

Prophet makes it visible, quantifiable, and scalable.

What the Model Learns

By training on historical cadastral time-series, Prophet identifies patterns like:

Subdivision precursors — A rural parcel surrounded by recently subdivided lots, within 2km of an infrastructure corridor extension, with a planning overlay change in the last 3 years, has a quantifiable probability of subdividing within the next 5 years.

Rezoning trajectories — Parcels don’t rezone in isolation. They follow spatial sequences. A rezoning event in one area predicts subsequent rezonings in adjacent areas with measurable lead times.

Tenure transition signals — Consolidation of adjacent parcels by a single entity, changes in lot status, and shifts in ownership tenure duration are precursors to development activity.

Value inflection points — The moment a parcel’s probability of transition crosses a threshold, its residual value (current use) diverges from its potential value (highest and best use). Prophet identifies these inflection points.

The Stack

Layer	Technology	Role
Pipeline	Dagster	Multi-partition asset orchestration with full observability
Geospatial	GeoPandas, Shapely, Mercantile	Grid generation, spatial joins, tile enumeration
Storage	GeoParquet, pickle	Columnar geospatial persistence, MVT byte cache
Serving	FastAPI	Pre-computed MVT tile serving (<5ms)
Rendering	MapLibre GL JS	GPU-accelerated vector tile visualisation
State	Zustand	Reactive store with viewport constraint system
Desktop	Electron	Cross-platform packaging
Config	YAML	Source registry and modeller parameter management

Market

Prophet operates at the intersection of predictive analytics and property:

Segment	Size	Growth
Geospatial Analytics	$14.1B by 2030	14.2% CAGR
PropTech	$33.6B by 2030	16.8% CAGR
Property Data & Valuations	$8.7B by 2028	11.3% CAGR

Target users include property developers making land acquisition decisions; infrastructure funds evaluating corridor investments; local governments forecasting housing supply and infrastructure load; institutional investors seeking quantified, backtestable exposure to land transitions; and valuation firms needing data-driven highest-and-best-use assessments.

Relationship to Starling

Prophet is the physical-world intelligence complement to Starling, our financial signal intelligence platform.

The thesis is the same: public data, properly modelled over time, contains latent predictive signal that markets haven’t priced.

Dimension	Starling	Prophet
Domain	Equities	Land and property
Signal source	Forum predictions	Cadastral history
Intelligence	Who predicts accurately	What land will become
Processing	NLP + backtesting	Spatial ML + temporal grids
Output	Ranked trading signals	Tenure and value forecasts
Alpha	Before consensus forms	Before transitions happen
Pipeline	PostgreSQL event-driven	Dagster asset-based

Roadmap

Phase 1 (Current): Temporal Visualisation Engine — POC complete. Blob matrix pipeline ingests QLD cadastral snapshots from ArcGIS FeatureServer, generates time-partitioned cellular grids across zoom levels 12-14, and serves interactive MVT tiles via FastAPI. The data foundation is proven.

Phase 2: Multi-Source Enrichment — Overlay zoning layers, planning scheme amendments, ownership transfer records, and transaction data onto the temporal grid. Each additional signal dimension increases model expressiveness.

Phase 3: Pattern Training — Train temporal models on the enriched grid. Identify subdivision precursors, rezoning sequences, and tenure transition signatures. Backtest against held-out time periods.

Phase 4: Prediction API — Expose parcel-level predictions via API. Subscription model by coverage region and prediction horizon.

Phase 5: National Scale — Extend to all Australian states. Each new jurisdiction adds training data and validates model generalisability.

Prophet is in private development. Contact blake@drksci.com for a technical demonstration or investment discussion.