The best articles on Wikipedia are the ones that have that juicy “Controversy” heading. If I open an article, and see that heading, I go straight there every single time.
So I got to thinking: which articles have the biggest controversy sections? Which articles are mostly about controversy?
Fortunately, the wonderful people at Wikimedia provide us with all of the necessary tools to visualise this (sorry, the graph may take a little while to load on slow connections and is quite resource-heavy once it has loaded):
You can view a full screen version of the graph here.
First, I downloaded one of the full backups of the English-language version of Wikipedia from https://dumps.wikimedia.org/backup-index.html (“enwiki” is the one I grabbed).
After unzipping this 58GB behemoth of an XML file, I parsed through it and extracted the <page> elements whose latest revision contained a <text> element that contained a “Controversy” heading.
Then I took all of those articles and calculated the size (in characters) of the controversy section and divided it by the full size (in characters) of the article.
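The ratio itself is simple, but where a section "ends" is a judgment call. The sketch below is one hedged interpretation rather than my exact code: it assumes a controversy section runs from its heading to the next heading of the same or a higher level (so nested subsections count towards it), and that the heading line itself is included in the character count. `controversy_ratio` and `HEADING_LINE` are illustrative names.

```python
import re

# Matches one wikitext heading line; group 1 is the run of leading '=' signs
# (its length is the heading level) and group 2 is the heading title.
HEADING_LINE = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*=+[ \t]*$", re.MULTILINE)


def controversy_ratio(wikitext):
    """Return the share of characters that sit inside controversy sections."""
    if not wikitext:
        return 0.0
    headings = list(HEADING_LINE.finditer(wikitext))
    section_chars = 0
    i = 0
    while i < len(headings):
        match = headings[i]
        if "controvers" not in match.group(2).lower():
            i += 1
            continue
        level = len(match.group(1))
        end = len(wikitext)  # default: the section runs to the end of the article
        j = i + 1
        while j < len(headings):
            if len(headings[j].group(1)) <= level:
                end = headings[j].start()  # same or higher level closes the section
                break
            j += 1
        section_chars += end - match.start()
        i = j  # skip nested headings so they are not counted twice
    return section_chars / len(wikitext)
```

For example, an article made of a 6-character intro, a 22-character controversy section, and a 16-character closing section would score 22 / 44 = 0.5.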
I tried to do some categorisation of the articles (person, place, event, etc.) but wasn’t able to come up with anything that looked accurate enough to publish.
The raw data can be found here if you want to have a play with it. :)