Burla demo · April 2026

Every Airbnb,
looked at all at once.

Every public listing in Inside Airbnb's open dump, 119 cities, 4 quarterly snapshots. We scored 1.7M photos with CLIP (a model that turns an image into a vector you can compare to a text prompt), shortlisted the most suspicious ones, and had Claude Haiku Vision double-check each shortlist. We also scored every review and reranked the weirdest 12K with Haiku. Everything was parallelized on Burla: a single dynamic cluster scaled to ~1.7K CPU workers for photo download and CLIP, while 20 A100 GPUs on the same cluster handled the embedding-cluster work in parallel.

--Listings
--Photos scraped
--Reviews scored
--CLIP-scored
--GPU detections
--Peak workers

Listings, reviews, and calendars come straight from public Inside Airbnb dumps. The findings cards below use bootstrap 95% confidence intervals on each listing's 365-night calendar occupancy (how booked a listing is over the next year, our demand proxy). Click any photo to expand it. Click any review to read it in full.
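Roughly how the demand proxy comes out of the raw data, as a sketch against the standard Inside Airbnb calendar.csv columns (the exact cleaning steps aren't shown, and "unavailable" nights lump guest-booked and host-blocked together):

# Demand proxy: % of the next 365 calendar nights marked unavailable for each listing.
import pandas as pd

cal = pd.read_csv("calendar.csv.gz", parse_dates=["date"])  # Inside Airbnb calendar dump
occupancy_pct = (
    cal.assign(booked=cal["available"].eq("f"))   # "f" = that night can't be booked
       .groupby("listing_id")["booked"]
       .mean()                                    # fraction of the 365 nights
       .mul(100)
       .rename("occupancy_pct")
)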

Every flagged listing on a map

Each dot is a listing flagged by one of the Haiku-validated photo detectors below, color-coded by category. Drag, zoom, click for the listing.

Listings with drug-den vibes

CLIP shortlisted “messy room” candidates, then Claude Haiku Vision kept only the ones that look less like an Airbnb and more like an opium den. Bare bulb, mattress on the floor, peeling walls, you can almost smell it through the photo.
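Roughly how the CLIP shortlisting works, sketched for a single photo (the prompt wording and file path are illustrative, not the exact prompts used):

# Compare one photo to a "messy" prompt vs. a neutral one with open_clip.
import torch, open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

prompts = ["a filthy room with a bare mattress on the floor and peeling walls",
           "a clean, tidy bedroom"]

with torch.no_grad():
    text = model.encode_text(tokenizer(prompts))
    text /= text.norm(dim=-1, keepdim=True)
    img = model.encode_image(preprocess(Image.open("photo.jpg")).unsqueeze(0))
    img /= img.norm(dim=-1, keepdim=True)
    sims = (img @ text.T).squeeze(0)  # cosine similarity to each prompt

# Photos where the "messy" prompt wins by the widest margin go on the Haiku shortlist.
messy_margin = (sims[0] - sims[1]).item()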

--

The most hectic kitchens

CLIP shortlisted “messy room” candidates, then Claude Haiku Vision said the photo is genuinely a chaotic kitchen, not just a small one.

--

Cats and dogs Claude said are actually real

CLIP shortlisted pet-shaped candidates from 1.7M photos, then Claude Haiku Vision said “yes, that is a real cat or dog.” Paintings, throw pillows, and rugs that looked vaguely animal-shaped were rejected.

--

Worst TV placements across every public Airbnb

CLIP shortlisted “TV mounted way too high” candidates from 1.7M photos, then Claude Haiku Vision confirmed each one as either above-fireplace or unusually-high.

--

Funniest reviews from 50 million

A 3-tier funnel: regex on every review, embedding cluster on the top 200K, Claude Haiku on the top 12K. Filter by category, city, or year, or just type any word to search. Click any card to read it in full.
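Tier 1 is the cheap lexical pass. A sketch with made-up patterns (the real list is longer and tuned per category):

# Tier 1: score every review by how many "funny" patterns fire, keep the top slice.
import re

FUNNY_PATTERNS = [
    r"\bnever again\b",
    r"\bcall(ed)? the police\b",
    r"\bworst (night|stay|airbnb)\b",
    r"!{3,}",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in FUNNY_PATTERNS]

def tier1_score(comment: str) -> int:
    # One point per distinct pattern that matches; embeddings break ties in tier 2.
    return sum(1 for p in COMPILED if p.search(comment))

reviews = [
    ("r1", "A raccoon got in, the host called the police, never again!!!"),
    ("r2", "Great location, clean, would stay again."),
]
shortlist = sorted(reviews, key=lambda r: tier1_score(r[1]), reverse=True)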

--

How it ran on Burla

Burla is a high-performance parallel processing library for data teams that iterate quickly. You write a Python function, you call remote_parallel_map, and it runs across a cluster with a shared filesystem mounted at ./shared. No Docker, no Kubernetes, no orchestration glue.
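A hello-world-sized sketch of that call shape, with a toy function and toy inputs (resource numbers borrowed from the real jobs below):

# Square 100 numbers across the cluster; every worker sees the same ./shared mount.
from burla import remote_parallel_map

def work(x):
    with open(f"./shared/square_{x}.txt", "w") as f:   # lands on the shared filesystem
        f.write(str(x * x))
    return x * x

results = remote_parallel_map(work, list(range(100)), func_cpu=1, func_ram=4)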

For this run, a single dynamic cluster scaled CPU workers up to ~1.7K for photo download and CLIP scoring, while the same cluster ran 20 A100 GPUs for embedding-cluster work in parallel with the CPU jobs. Claude Haiku validation ran on top, rate-limited to 64 workers.

-- concurrent workers at peak across photo download, CLIP scoring, and review tier-1. 20 A100 GPUs ran in parallel on the same cluster, while CPU jobs kept going.

Full writeup is on GitHub.
Burla docs are at docs.burla.dev.

# s02b: download every photo URL, score with CLIP,
# write parquet shards to ./shared. 6K batches.
from burla import remote_parallel_map
import open_clip

def score_batch(args):
    # Load CLIP once per worker; weights are cached on the shared filesystem.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k",
        cache_dir="./shared/clip_weights",
    )
    # Elided step: download photos -> encode -> cosine vs PROMPTS -> write parquet shard.
    shard, n_ok = score_and_write_shard(args, model, preprocess)
    return {"shard": shard, "n_ok": n_ok}

remote_parallel_map(
    score_batch, batch_args,
    func_cpu=2, func_ram=8,
    max_parallelism=1000,   # 1k concurrent at peak
    grow=True,
)
# s04 tier 2: embed top 200K reviews with SBERT,
# one parquet shard per worker on ./shared.
from burla import remote_parallel_map
from sentence_transformers import SentenceTransformer

def embed_batch(args):
    model = SentenceTransformer(
        "all-MiniLM-L6-v2",
        cache_folder="./shared/sbert",
    )
    rows = read_slice(
        args.input_path, args.row_start, args.row_end,
    )
    vecs = model.encode(
        rows["comments"].tolist(), batch_size=128,
    )
    write_shard(args.output_root, rows, vecs)
    return {"n_ok": len(rows)}

remote_parallel_map(
    embed_batch, embed_args,
    func_cpu=2, func_ram=8, max_parallelism=200,
    grow=True,
)
# s05c: Haiku Vision double-checks the CLIP
# shortlists. Rate-limited at 64 workers.
from burla import remote_parallel_map
import anthropic, json

def validate_pet(args):
    client = anthropic.Anthropic()
    rows = []
    for url, listing_id in args.batch:
        msg = client.messages.create(
            model="claude-haiku-4-5", max_tokens=200,
            messages=pet_prompt(fetch(url)),
        )
        verdict = json.loads(msg.content[0].text)
        rows.append({"listing_id": listing_id, **verdict})
    write_shard(args.output_path, rows)
    return {"n_ok": len(rows)}

remote_parallel_map(
    validate_pet, pet_batches,
    func_cpu=2, func_ram=8, max_parallelism=64,
    grow=True,
)

Does any of this actually predict demand?

For each idea below, we sort every listing into a few groups (like “darkest photos” vs “brightest photos”) and check whether the higher-occupancy listings really do land in one group. We accept an idea only when no two groups' confidence intervals overlap.

How to read these cards

The bar shows the median % booked over the next 365 nights for each group, our demand proxy. Further right means more booked. The tick is our best guess; the wider band is the range we're confident covers the real number.
n = 240.5K: n is how many listings ended up in that group; 240.5K means 240,500. Bigger groups give us tighter, more trustworthy bars.
ACCEPTED / REJECTED: Accepted means the bars in the card are clearly separated, so the groups really are different. Rejected means the bars overlap, so we cannot tell the groups apart.
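In code, the accept/reject rule is roughly this sketch (group names, the 2,000-resample count, and the data are illustrative):

# Bootstrap a 95% CI on each group's median occupancy; accept only if no two CIs overlap.
import numpy as np

def bootstrap_median_ci(occupancy_pct, n_boot=2000, seed=0):
    # occupancy_pct: one "% of next 365 nights booked" value per listing in the group.
    rng = np.random.default_rng(seed)
    values = np.asarray(occupancy_pct)
    medians = [np.median(rng.choice(values, size=len(values), replace=True))
               for _ in range(n_boot)]
    return np.percentile(medians, [2.5, 97.5])

def accepted(groups):
    # groups: {"darkest photos": [occupancy %...], "brightest photos": [...], ...}
    cis = {name: bootstrap_median_ci(vals) for name, vals in groups.items()}
    names = list(cis)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            (lo_a, hi_a), (lo_b, hi_b) = cis[names[i]], cis[names[j]]
            if lo_a <= hi_b and lo_b <= hi_a:   # intervals overlap -> can't separate
                return False
    return True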