A Burla demo · 1,000 CPUs · … reviews parsed

Every Amazon review is a cry for help.
We distilled the loudest ones.

We streamed 275 GB of raw Amazon reviews across 34 categories through a Burla cluster with 1,000 workers in parallel, and ranked them by seven flavors of unhinged: profanity, screaming, punctuation bombs, short-brutal one-liners, full-blown rants, five-star obscene, and five-star silent.

Flip Unhinged Mode to swap the wall for the rawest content we found. Every f-bomb, censored slur, and full-caps meltdown three map-reduce passes could surface. Nothing sanitized. Reader discretion strongly advised.

Wall of Rants → Wall of Fucked Up → Read the findings

…reviews parsed

…reviews w/ profanity

…categories

…global profanity rate

1,000 Burla workers

275 GB data streamed

Methodology

How the distillation works. Reproducible on any Burla cluster.

Data

McAuley-Lab / Amazon-Reviews-2023, the raw JSONL review files (275 GB across 34 categories, 2023 snapshot). Streamed directly from Hugging Face via HTTP Range requests, no local download.
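A minimal sketch of what a single ranged read could look like, using only the Python standard library. The helper names `range_header` and `fetch_range` are made up for illustration and the URL is whatever file you point it at; this is not the project's actual fetch code.

```python
import urllib.request


def range_header(start: int, end_exclusive: int) -> dict:
    """Build a Range header for the half-open span [start, end_exclusive).

    HTTP byte ranges are inclusive on both ends, so the last byte
    requested is end_exclusive - 1.
    """
    return {"Range": f"bytes={start}-{end_exclusive - 1}"}


def fetch_range(url: str, start: int, end_exclusive: int) -> bytes:
    """Fetch one byte range of a remote file without downloading the rest."""
    request = urllib.request.Request(url, headers=range_header(start, end_exclusive))
    with urllib.request.urlopen(request) as response:
        # A server that honors the range replies 206 Partial Content.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

Checking for status 206 matters: a server that ignores the header replies 200 with the whole file, which would silently defeat the chunking.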

Pipeline

Plan. Split each review file into ~500 MB byte-range chunks (~550 chunks total).
Scatter. Dispatch the chunks to the Burla cluster (up to 1,000 parallel workers).
Map. Each worker streams its byte range, aligns it to line boundaries, parses the JSONL, scores each review on seven signals, and keeps the top-K per signal in heaps. It writes one JSON file per chunk.
Reduce. One Burla worker per category merges all shard outputs into one per-category JSON (heaps merged, rating histograms summed).
Analyze. Client builds findings + Wall + per-category pages.
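The Plan, Map, and Reduce steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: all function names are hypothetical, the file is held in memory as `data` rather than streamed, and the line-alignment rule (a line belongs to the chunk that contains its first byte) is one common convention that may differ from the real pipeline.

```python
import heapq
from collections import Counter


def plan_chunks(file_size, chunk_bytes=500 * 1024 * 1024):
    """Plan: split a file into (start, end_exclusive) byte ranges of ~500 MB."""
    return [(start, min(start + chunk_bytes, file_size))
            for start in range(0, file_size, chunk_bytes)]


def lines_in_range(data, start, end):
    """Map: yield each line whose first byte lies in [start, end).

    A chunk that starts mid-line skips ahead to the next line start;
    the last line of a chunk may run past `end` so it is read whole.
    """
    pos = start
    if start > 0:
        # Search from start - 1 so a line beginning exactly at `start` is kept.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return
        pos = newline + 1
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield data[pos:stop]
        pos = stop + 1


def push_top_k(heap, score, item, k):
    """Map: keep only the k highest-scoring items in a min-heap."""
    entry = (score, item)
    if len(heap) < k:
        heapq.heappush(heap, entry)
    elif entry > heap[0]:
        heapq.heapreplace(heap, entry)  # evict the current lowest entry


def merge_top_k(heaps, k):
    """Reduce: merge per-chunk heaps into one top-k list, best first."""
    return heapq.nlargest(k, (entry for heap in heaps for entry in heap))


def merge_histograms(histograms):
    """Reduce: sum per-chunk rating histograms."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)
    return total
```

Because every chunk applies the same ownership rule, each review is parsed exactly once even though chunk boundaries fall at arbitrary byte offsets.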

Scoring

Rule-based, no LLM: word lists for strong/medium/mild profanity, caps-ratio for screaming, consecutive-exclamation-run for punctuation bombs, length × rage for rants, short × profanity for one-liner brutality, rating × profanity for five-star obscene, rating × length-zero for five-star silent.
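A few of these signals could be computed along the following lines. Treat this as a sketch: the two-word `MILD` lexicon is a stand-in for the real word lists, the `rating` and `text` field names are assumed from the dataset's review records, and the exact weighting behind products such as "length × rage" is not given here, so only the simpler signals are shown.

```python
import re

MILD = {"crap", "damn"}  # hypothetical stand-in for the real word lists


def caps_ratio(text):
    """Screaming: share of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0


def longest_exclamation_run(text):
    """Punctuation bombs: length of the longest run of consecutive '!'."""
    return max((len(run) for run in re.findall(r"!+", text)), default=0)


def profanity_hits(text, lexicon=MILD):
    """Profanity: number of tokens that appear in the lexicon."""
    return sum(token in lexicon for token in re.findall(r"[a-z']+", text.lower()))


def score(review):
    """Score one parsed review dict on a subset of the signals."""
    text = review.get("text") or ""
    five_star = review.get("rating") == 5
    hits = profanity_hits(text)
    return {
        "profanity": hits,
        "screaming": caps_ratio(text),
        "punctuation_bomb": longest_exclamation_run(text),
        "five_star_obscene": hits if five_star else 0,
        "five_star_silent": int(five_star and not text.strip()),
    }
```

Each signal is a cheap pure function of one review, which is what lets a worker score its whole chunk in a single streaming pass.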

Unhinged Mode (three-pass pipeline)

The first pass surfaced too many soft "crap" reviews, so we re-ran the full corpus (275 GB, 571 M reviews) on 1,000 CPUs in parallel, twice more. Pass two targeted a hard-profanity lexicon (~14 strong roots) and produced 496,668 hits. Pass three added a censor-aware lexicon (asterisk, symbol, and zero-width variants) plus a categorized slur taxonomy (racial, homophobic, ableist, xenophobic, gendered), and classified each hit as deploy / quote_crit / reclaim / ambiguous. Results were then rescored locally to drop proper-noun and idiom false positives (TV shows, pet breeds, brand names) and to boost angry rants about physical products. The two hard passes (two and three) were merged and deduped into the single Unhinged corpus you see when the toggle is on.
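To give a feel for what "censor-aware" matching might involve, here is a small sketch. The roots are mild placeholders, and the set of censor symbols and zero-width characters is an assumption rather than the lexicon actually used in pass three.

```python
import re

# Zero-width characters sometimes inserted to dodge filters (assumed set).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
# Symbols commonly used to mask letters (assumed set).
CENSOR = r"[*#@$%!\-_]"


def censor_pattern(root):
    """Match a root whose letters after the first may be replaced by symbols."""
    first, *rest = root
    body = "".join(f"(?:{re.escape(c)}|{CENSOR})" for c in rest)
    return re.compile(rf"\b{re.escape(first)}{body}", re.IGNORECASE)


def find_hits(text, roots):
    """Return the roots found in text, in plain or censored form."""
    normalized = text.translate(ZERO_WIDTH)  # delete zero-width characters
    return [root for root in roots if censor_pattern(root).search(normalized)]
```

For example, `find_hits("what a d*mn mess", ["damn", "crap"])` returns `["damn"]`. A pattern this loose over-matches, which is consistent with the later local rescoring step that drops false positives.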

Source

Open source on GitHub. Reproduce this on your own cluster in ~15 minutes with one burla run scale.py.

Warning

These are real reviews written by real people on Amazon. Unhinged Mode unlocks content that includes profanity, racial slurs, homophobic slurs, ableist slurs, and descriptions of regret. Slur tokens are auto-redacted with category badges. The raw strings are never displayed. If that's not your thing, leave the toggle off, or close this tab.

Credits

Dataset: McAuley Lab, UCSD. Pipeline: Burla. UI: plain HTML + CSS + JS, zero frameworks.
