A Burla demo · 1,000 CPUs · … reviews parsed
We streamed 275 GB of raw Amazon reviews across 34 categories through a Burla cluster with 1,000 workers in parallel, and ranked them by seven flavors of unhinged: profanity, screaming, punctuation bombs, short-brutal one-liners, full-blown rants, five-star obscene, and five-star silent.
Flip Unhinged Mode to swap the wall for the rawest content we found. Every f-bomb, censored slur, and full-caps meltdown three map-reduce passes could surface. Nothing sanitized. Reader discretion strongly advised.
Loading…
34 categories, ranked by how many reviews contain profanity. Click one to see the top 30 most fucked-up reviews in that category.
Nine ways Amazon reviews go off the rails.
How the distillation works. Reproducible on any Burla cluster.
McAuley-Lab / Amazon-Reviews-2023, the raw JSONL review files (275 GB across 34 categories, 2023 snapshot). Streamed directly from Hugging Face via HTTP Range requests, no local download.
Rule-based, no LLM: word lists for strong/medium/mild profanity, caps-ratio for screaming, consecutive-exclamation-run for punctuation bombs, length × rage for rants, short × profanity for one-liner brutality, rating × profanity for five-star obscene, rating × length-zero for five-star silent.
The first pass surfaced too many soft "crap" reviews, so we re-ran the full
275 GB / 571 M review corpus on 1,000 CPUs in parallel, twice more.
Pass two targeted a hard-profanity lexicon (~14 strong roots) and produced
496,668 hits. Pass three added a censor-aware lexicon
(asterisk, symbol, and zero-width variants) plus a categorized slur taxonomy
(racial, homophobic, ableist, xenophobic, gendered), and classified each hit
as deploy / quote_crit / reclaim /
ambiguous. Results were rescored locally to drop proper-noun and
idiom false positives (TV shows, pet breeds, brand names) and boost angry
physical-product rants. The two hard passes were merged and deduped into the
single Unhinged corpus you see when the toggle is on.
Open source on GitHub.
Reproduce this on your own cluster in ~15 minutes with one burla run scale.py.
These are real reviews written by real people on Amazon. Unhinged Mode unlocks content that includes profanity, racial slurs, homophobic slurs, ableist slurs, and descriptions of regret. Slur tokens are auto-redacted with category badges. The raw strings are never displayed. If that's not your thing, leave the toggle off, or close this tab.
Dataset: McAuley Lab, UCSD. Pipeline: Burla. UI: plain HTML + CSS + JS, zero frameworks.