GDAL raster processing

Process every raster tile, not a pretty subset

In this example we:

  • Process 2,000 Sentinel-2 tiles.
  • Read red and near-infrared bands from S3.
  • Compute NDVI and write a report with per-tile stats.

The first tile usually works. The full region is where missing bands, bad nodata values, CRS surprises, and requester-pays mistakes show up.

Step 1: Make one task per tile

The input is just a list of tile ids.

SRC_BUCKET = "sentinel-s2-l2a"
DST_BUCKET = "my-ndvi-outputs"

with open("sentinel_tiles.txt") as f:
    tile_ids = [line.strip() for line in f if line.strip()]

Step 2: Compute NDVI in the worker

The worker reads both bands, computes NDVI, and returns summary stats.

def compute_ndvi(tile_id: str) -> dict:
    import boto3, numpy as np, rasterio
    from rasterio.io import MemoryFile

    s3 = boto3.client("s3", region_name="eu-central-1")
    def read_band(band: str):
        key = f"tiles/{tile_id}/{band}.jp2"
        body = s3.get_object(Bucket=SRC_BUCKET, Key=key, RequestPayer="requester")["Body"].read()
        with MemoryFile(body) as mem, mem.open() as src:
            return src.read(1).astype("float32"), src.profile

    red, profile = read_band("B04")
    nir, _ = read_band("B08")
    ndvi = (nir - red) / (nir + red + 1e-6)
    profile.update(driver="GTiff", dtype="float32", count=1, compress="DEFLATE", tiled=True)
    return {"tile_id": tile_id, "mean_ndvi": float(ndvi.mean()), "pixels": int(ndvi.size)}

Step 3: Run the tiles

Each tile gets two CPUs and enough RAM for the bands.

from burla import remote_parallel_map

results = remote_parallel_map(compute_ndvi, tile_ids, func_cpu=2, func_ram=8, grow=True)
pd.DataFrame(results).to_csv("ndvi_report.csv", index=False)

What's the point?

A pretty subset can produce a convincing map and still miss the data-quality problem.

For geospatial work, I want one task to own one tile, scene, or chip group. Keep the source reads and output writes inside the worker. Return enough stats that the report can catch suspicious tiles before they quietly enter a model.