Burla demo · April 2026

1.1 million Airbnbs,
looked at all at once.

Every public Airbnb listing in Inside Airbnb's open data dump, 1.4M photos we could pull from public listing pages, and every one of 50M reviews. CLIP, YOLOv8, and Claude Haiku running in parallel on Burla, in 10.8 hours of wall time.

[Live stat counters: listings, photos scraped, CLIP-scored, reviews scored, peak workers, GPU detections, hours of wall time, total spend]

Public Inside Airbnb dumps for the listings and reviews. Photo URLs scraped from public listing pages. Correlations use bootstrap 95% CIs with n ≥ 100 per bucket. Click any photo to expand it, any review to read it in full. Demand proxy is reviews_per_month, not actual bookings.

Every flagged listing on a map

Each dot is a listing flagged by one of the photo detectors below, color-coded by category. Drag, zoom, click for the listing.

Worst TV placements in 1.1M Airbnbs

Each photo here was flagged twice: YOLOv8 detected a TV in the upper half of the frame, and CLIP independently scored it high against “a TV mounted above a fireplace.” Click any photo for the full image and the listing.
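Under the hood the geometric half of that check is just YOLOv8 box coordinates. A minimal sketch with an illustrative confidence threshold (the demo's exact thresholds and batching aren't published):

tv_check.py (sketch)
from ultralytics import YOLO

yolo = YOLO("yolov8n.pt")  # pretrained COCO model; includes a "tv" class

def tv_in_upper_half(image_path: str) -> bool:
    # True if a confident TV detection has its box center in the top half.
    result = yolo(image_path, verbose=False)[0]
    height = result.orig_shape[0]
    for box in result.boxes:
        if result.names[int(box.cls)] == "tv" and float(box.conf) > 0.5:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            if (y1 + y2) / 2 < height / 2:
                return True
    return False

# A photo lands on this page only when BOTH signals fire:
# tv_in_upper_half(path) and a high CLIP score against
# "a TV mounted above a fireplace".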


Messiest photos a host actually posted

Top by CLIP score against “a messy cluttered room with stuff everywhere.” Click a photo to expand it. We are not naming the listings; the link is there if you really need to.
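The scoring itself is the step elided in score_photos.py below: cosine similarity between CLIP's image and text embeddings. A minimal single-photo sketch using standard open_clip calls (batching and the Burla fan-out omitted):

clip_score.py (sketch)
import torch, open_clip
from PIL import Image

model, _, prep = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

@torch.no_grad()
def clip_score(image_path: str, prompt: str) -> float:
    # Cosine similarity between the photo and the text prompt.
    img = prep(Image.open(image_path)).unsqueeze(0)
    img_feat = model.encode_image(img)
    txt_feat = model.encode_text(tokenizer([prompt]))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return float(img_feat @ txt_feat.T)

# This page ranks by:
# clip_score(path, "a messy cluttered room with stuff everywhere")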


The most plant-maximalist Airbnbs

Ranked by CLIP score against “a room full of houseplants,” combined with YOLOv8 potted-plant counts.
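The plant count comes from the same YOLOv8 pass as the TV check, just filtered to a different class. How the demo weighted count against CLIP score isn't specified, so the blend at the bottom is only one plausible choice:

plant_count.py (sketch)
import math
from ultralytics import YOLO

yolo = YOLO("yolov8n.pt")  # pretrained COCO model; includes a "potted plant" class

def potted_plant_count(image_path: str) -> int:
    # Count confident potted-plant detections in one listing photo.
    result = yolo(image_path, verbose=False)[0]
    return sum(
        1 for box in result.boxes
        if result.names[int(box.cls)] == "potted plant" and float(box.conf) > 0.4
    )

# Illustrative blend, NOT the demo's exact formula:
# rank = clip_score(path, "a room full of houseplants") * math.log1p(potted_plant_count(path))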


Funniest reviews from 50 million

A 3-tier funnel: regex over every review, embedding clustering on the top 200k, Claude Haiku on the top 10k. Click any card to read the full review.
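Tier 1 is the only stage that has to touch all 50M reviews, so it is deliberately dumb and cheap. A sketch of what such a regex pass looks like (the demo's actual patterns aren't published; these are stand-ins):

tier1_regex.py (sketch)
import re

# Stand-in patterns; a real list would be tuned on samples of actual reviews.
FUNNY_HINTS = [re.compile(p, re.IGNORECASE) for p in [
    r"\bhilarious\b",
    r"\bnever laughed so hard\b",
    r"\bplot twist\b",
    r"\b(raccoon|goat|rooster|possum)\b",
    r"\b10/10 would\b",
]]

def tier1_score(review: str) -> int:
    # Cheap "maybe funny" proxy: count of pattern hits in one review.
    return sum(1 for pat in FUNNY_HINTS if pat.search(review))

# Keep the top ~200k by score for tier 2 (embedding clustering),
# then send the top ~10k to Claude Haiku for the final cut.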

How it ran on Burla

Burla ships a Python function plus its imports to a fleet of containers running the standard python:3.12 image, and the orchestrator streams the results back as a single iterator. The shared filesystem at ./shared is GCSFuse-mounted to a GCS bucket and visible from every worker, which is what makes this kind of slice-and-merge pipeline cheap.
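In practice, slice-and-merge means every worker writes its own parquet shard under ./shared and no worker coordinates with any other; the client concatenates at the end. A sketch with illustrative paths:

shards.py (sketch)
import glob
import pyarrow as pa
import pyarrow.parquet as pq

def write_shard(rows: list[dict], shard_id: int) -> None:
    # Runs on a worker: lands in the GCSFuse-mounted bucket via ./shared.
    table = pa.Table.from_pylist(rows)
    pq.write_table(table, f"./shared/airbnb/scores/shard_{shard_id:05d}.parquet")

def merge_shards() -> pa.Table:
    # Runs on the client after the fleet finishes: one cheap concat.
    return pa.concat_tables(
        pq.read_table(p) for p in sorted(glob.glob("./shared/airbnb/scores/*.parquet"))
    )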

At peak, Burla had 1,000 containers cooperating on the photo pipeline and another 1,000 chewing through 50M reviews. The whole 1.4M-photo index moved through CLIP in roughly 24 minutes once the fleet was warm.

The CPU-bound stages (photo download, CLIP scoring, and review tier 1) each peaked at 1,000 concurrent containers. GPU stages topped out at 30 workers (T4 quota); Claude Haiku ran 200-wide.

Full writeup is on GitHub. Burla docs are at docs.burla.dev.

score_photos.py
from burla import remote_parallel_map

def score_batch(args: ScoreArgs) -> list[dict]:
    # imports run on the worker, not the client
    import torch, open_clip, pyarrow.parquet as pq

    model, _, prep = open_clip.create_model_and_transforms(
        "ViT-B-32",
        pretrained="laion2b_s34b_b79k",
        cache_dir="./shared/airbnb/clip_weights",  # cached once on the shared mount
    )
    # read row group, encode, score against prompts,
    # write a parquet shard back to ./shared.

results = remote_parallel_map(
    score_batch,
    batch_args,               # ~6,000 row-group batches
    func_cpu=2, func_ram=8,
    max_parallelism=1000,     # 1k concurrent at peak
    grow=True, spinner=True,
)

Does any of this actually predict bookings?

For each idea below, we sort all 1.1M listings into a few groups (like “darkest photos” vs “brightest photos”) and check whether the more popular ones really do land in one group. Two of four ideas held up. Two did not.
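Mechanically, each card is quantile bucketing plus the bootstrap 95% CI from the note up top. A sketch with assumed column and file names (photo_brightness and listing_features.parquet are illustrative; reviews_per_month is the demand proxy named earlier):

buckets.py (sketch)
import numpy as np
import pandas as pd

def bootstrap_ci(values: np.ndarray, n_boot: int = 2000, seed: int = 0):
    # Bootstrap 95% CI for the mean: the tick and band on each card.
    rng = np.random.default_rng(seed)
    means = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    return values.mean(), np.percentile(means, 2.5), np.percentile(means, 97.5)

df = pd.read_parquet("./shared/airbnb/listing_features.parquet")  # assumed path
df["bucket"] = pd.qcut(df["photo_brightness"], 4,
                       labels=["darkest", "dim", "bright", "brightest"])
for bucket, group in df.groupby("bucket", observed=True):
    mean, lo, hi = bootstrap_ci(group["reviews_per_month"].dropna().to_numpy())
    print(f"{bucket}: mean={mean:.2f}  95% CI [{lo:.2f}, {hi:.2f}]  n={len(group):,}")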

How to read these cards

The bar shows how popular the average listing in a group is. The tick is our best guess; the wider band is the range we are confident covers the real number (the bootstrap 95% CI mentioned up top).
n = 240.5K: n is how many listings ended up in that group (240.5K means 240,500). Bigger groups give us tighter, more trustworthy bars.
ACCEPTED / REJECTED: Accepted means the bands in the card are clearly separated, so the groups really are different. Rejected means the bands overlap, so we cannot tell the groups apart.