Burla demo · April 2026

1.1 million Airbnbs,
looked at all at once.

Every public Airbnb listing in Inside Airbnb's open data dump, 1.4M photos we could pull from public listing pages, and every one of 50M reviews. CLIP, YOLOv8, and Claude Haiku running in parallel on Burla, in 10.8 hours of wall time.

[Live stat counters: listings, photos scraped, CLIP-scored, reviews scored, peak workers, GPU detections, hours of wall time, total spend]

Public Inside Airbnb dumps for the listings and reviews. Photo URLs scraped from public listing pages. Correlations use bootstrap 95% CIs with n ≥ 100 per bucket. Click any photo to expand it, any review to read it in full. Demand proxy is reviews_per_month, not actual bookings.

Every flagged listing on a map

Each dot is a listing flagged by one of the photo detectors below, color-coded by category. Drag, zoom, click for the listing.

Worst TV placements in 1.1M Airbnbs

Each photo here was flagged twice: YOLOv8 detected a TV in the upper half of the frame, and CLIP independently scored it high against “a TV mounted above a fireplace.” Click any photo for the full image and the listing.
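Under the hood the geometric half of that check is just YOLOv8 box coordinates. A minimal sketch with an illustrative confidence threshold (the demo's exact thresholds and batching aren't published):

tv_check.py (sketch)
from ultralytics import YOLO

yolo = YOLO("yolov8n.pt")  # pretrained COCO model; includes a "tv" class

def tv_in_upper_half(image_path: str) -> bool:
    # True if a confident TV detection has its box center in the top half.
    result = yolo(image_path, verbose=False)[0]
    height = result.orig_shape[0]
    for box in result.boxes:
        if result.names[int(box.cls)] == "tv" and float(box.conf) > 0.5:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            if (y1 + y2) / 2 < height / 2:
                return True
    return False

# A photo lands on this page only when BOTH signals fire:
# tv_in_upper_half(path) and a high CLIP score against
# "a TV mounted above a fireplace".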


Messiest photos a host actually posted

Top by CLIP score against “a messy cluttered room with stuff everywhere.” Click a photo to expand it. We are not naming the listings; the link is there if you really need to.
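The scoring itself is the step elided in score_photos.py below: cosine similarity between CLIP's image and text embeddings. A minimal single-photo sketch using standard open_clip calls (batching and the Burla fan-out omitted):

clip_score.py (sketch)
import torch, open_clip
from PIL import Image

model, _, prep = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

@torch.no_grad()
def clip_score(image_path: str, prompt: str) -> float:
    # Cosine similarity between the photo and the text prompt.
    img = prep(Image.open(image_path)).unsqueeze(0)
    img_feat = model.encode_image(img)
    txt_feat = model.encode_text(tokenizer([prompt]))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return float(img_feat @ txt_feat.T)

# This page ranks by:
# clip_score(path, "a messy cluttered room with stuff everywhere")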


The most plant-maximalist Airbnbs

Ranked by CLIP score against “a room full of houseplants,” combined with YOLOv8 potted-plant counts.
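The plant count comes from the same YOLOv8 pass as the TV check, just filtered to a different class. How the demo weighted count against CLIP score isn't specified, so the blend at the bottom is only one plausible choice:

plant_count.py (sketch)
import math
from ultralytics import YOLO

yolo = YOLO("yolov8n.pt")  # pretrained COCO model; includes a "potted plant" class

def potted_plant_count(image_path: str) -> int:
    # Count confident potted-plant detections in one listing photo.
    result = yolo(image_path, verbose=False)[0]
    return sum(
        1 for box in result.boxes
        if result.names[int(box.cls)] == "potted plant" and float(box.conf) > 0.4
    )

# Illustrative blend, NOT the demo's exact formula:
# rank = clip_score(path, "a room full of houseplants") * math.log1p(potted_plant_count(path))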


Funniest reviews from 50 million

A 3-tier funnel: regex over every review, embedding clustering on the top 200k, Claude Haiku on the top 10k. Click any card to read the full review.
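Tier 1 is the only stage that has to touch all 50M reviews, so it is deliberately dumb and cheap. A sketch of what such a regex pass looks like (the demo's actual patterns aren't published; these are stand-ins):

tier1_regex.py (sketch)
import re

# Stand-in patterns; a real list would be tuned on samples of actual reviews.
FUNNY_HINTS = [re.compile(p, re.IGNORECASE) for p in [
    r"\bhilarious\b",
    r"\bnever laughed so hard\b",
    r"\bplot twist\b",
    r"\b(raccoon|goat|rooster|possum)\b",
    r"\b10/10 would\b",
]]

def tier1_score(review: str) -> int:
    # Cheap "maybe funny" proxy: count of pattern hits in one review.
    return sum(1 for pat in FUNNY_HINTS if pat.search(review))

# Keep the top ~200k by score for tier 2 (embedding clustering),
# then send the top ~10k to Claude Haiku for the final cut.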

How it ran on Burla

Burla ships a Python function plus its imports to a fleet of containers running the standard python:3.12 image, and the orchestrator streams the results back as a single iterator. The shared filesystem at ./shared is GCSFuse-mounted to a GCS bucket and visible from every worker, which is what makes this kind of slice-and-merge pipeline cheap.
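In practice, slice-and-merge means every worker writes its own parquet shard under ./shared and no worker coordinates with any other; the client concatenates at the end. A sketch with illustrative paths:

shards.py (sketch)
import glob
import pyarrow as pa
import pyarrow.parquet as pq

def write_shard(rows: list[dict], shard_id: int) -> None:
    # Runs on a worker: lands in the GCSFuse-mounted bucket via ./shared.
    table = pa.Table.from_pylist(rows)
    pq.write_table(table, f"./shared/airbnb/scores/shard_{shard_id:05d}.parquet")

def merge_shards() -> pa.Table:
    # Runs on the client after the fleet finishes: one cheap concat.
    return pa.concat_tables(
        pq.read_table(p) for p in sorted(glob.glob("./shared/airbnb/scores/*.parquet"))
    )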

At peak, Burla had 1,000 containers cooperating on the photo pipeline and another 1,000 chewing through 50M reviews. The whole 1.4M-photo index moved through CLIP in roughly 24 minutes once the fleet was warm.

The CPU-bound stages (photo download, CLIP scoring, and review tier 1) each peaked at 1,000 concurrent containers. GPU stages topped out at 30 workers (T4 quota); Claude Haiku ran 200-wide.

Full writeup is on GitHub. Burla docs are at docs.burla.dev.

score_photos.py
from burla import remote_parallel_map

def score_batch(args: ScoreArgs) -> list[dict]:
    # imports run on the worker, not the client
    import torch, open_clip, pyarrow.parquet as pq

    model, _, prep = open_clip.create_model_and_transforms(
        "ViT-B-32",
        pretrained="laion2b_s34b_b79k",
        cache_dir="./shared/airbnb/clip_weights",  # cached once on the shared mount
    )
    # read row group, encode, score against prompts,
    # write a parquet shard back to ./shared.

results = remote_parallel_map(
    score_batch,
    batch_args,               # ~6,000 row-group batches
    func_cpu=2, func_ram=8,
    max_parallelism=1000,     # 1k concurrent at peak
    grow=True, spinner=True,
)

Does any of this actually predict bookings?

For each idea below, we sort all 1.1M listings into a few groups (like “darkest photos” vs “brightest photos”) and check whether the more popular ones really do land in one group. Two of four ideas held up. Two did not.
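Mechanically, each card is quantile bucketing plus the bootstrap 95% CI from the note up top. A sketch with assumed column and file names (photo_brightness and listing_features.parquet are illustrative; reviews_per_month is the demand proxy named earlier):

buckets.py (sketch)
import numpy as np
import pandas as pd

def bootstrap_ci(values: np.ndarray, n_boot: int = 2000, seed: int = 0):
    # Bootstrap 95% CI for the mean: the tick and band on each card.
    rng = np.random.default_rng(seed)
    means = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    return values.mean(), np.percentile(means, 2.5), np.percentile(means, 97.5)

df = pd.read_parquet("./shared/airbnb/listing_features.parquet")  # assumed path
df["bucket"] = pd.qcut(df["photo_brightness"], 4,
                       labels=["darkest", "dim", "bright", "brightest"])
for bucket, group in df.groupby("bucket", observed=True):
    mean, lo, hi = bootstrap_ci(group["reviews_per_month"].dropna().to_numpy())
    print(f"{bucket}: mean={mean:.2f}  95% CI [{lo:.2f}, {hi:.2f}]  n={len(group):,}")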

How to read these cards

The bar shows how popular the average listing in a group is. The tick is our best guess; the wider band is the range we are confident covers the real number (the bootstrap 95% CI mentioned up top).
n = 240.5K: n is how many listings ended up in that group (240.5K means 240,500). Bigger groups give us tighter, more trustworthy bars.
ACCEPTED / REJECTED: Accepted means the bands in the card are clearly separated, so the groups really are different. Rejected means the bands overlap, so we cannot tell the groups apart.