Artificial Intelligence Against Corruption

Corruption is a global problem that costs governments around the world 1 trillion dollars every year, according to the IMF. Can new technologies make this a problem of the past?

Photo by Possessed Photography on Unsplash

Corruption grows when accountability is low: it is hard to imagine politicians abusing their power for personal gain if they knew for certain that they would be caught and punished. This is why improving accountability is a winning strategy for fighting corruption, and Artificial Intelligence technology can help us do that.

Whether we realize it or not, AI technologies that spot wrongdoing are already all around us. Credit card companies, for example, have been using them for years: if your card is used on unfamiliar websites, to buy unusual products, in a price range that is out of line with your normal behavior, the company’s AI models are likely to flag the transaction as suspicious. And they do so incredibly fast, for millions and millions of transactions every day.

Photo by Bermix Studio on Unsplash

All of this is possible because credit card companies have the data necessary to train machines to do the job. With enough data, the same technologies could be used to fight corruption.

The Data Out There

Data availability is evidently crucial for the potential of the technology to be realized. Countries with a combination of high corruption and high availability of data would therefore make great case studies to explore AI’s potential for change.

In a research report published in 2019, Oxford Insights pointed out that such a group of countries exists: Brazil, Colombia, Mexico and Russia, among others, rank well in terms of open data and have high levels of perceived corruption.

But how easy is it to actually find and use the data to create a Machine Learning model from the ground up?

To see for myself, I decided to build my very own fraud detection model with data from Brazil and see what kind of results I would get.

Rio De Janeiro, Photo by Mariano Diaz on Unsplash

Brazil is ranked 8th best in the Global Open Data Index, and scores just 35 out of 100 in the Corruption Perceptions Index according to

Proof of Concept

The first step in this project was to decide which type of corruption to focus on. After some initial research I decided to use political campaigns’ financial data from the 2018 Brazilian Elections and try to predict which candidates would be investigated for financial infractions.

All the data I used can be found at Tribunal Superior Eleitoral websites such as this one and this one.

Photo by Omid Kashmari on Unsplash

The data covered candidates’ personal information, assets, donations received, expenditures and election results. It also included a list of investigations related to financial infractions in the 2018 elections, without the verdicts. This list is how the model would be told who might have committed infractions.

The final dataset had over 26 thousand campaigns listed, 8.5 thousand of which had been subjected to an investigation related to their finances.

The Model

The Machine Learning model I set out to build would take in 70% of all this data to learn from, and then apply what it learned on the remaining 30%, which it had never seen before.
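As a rough illustration of that 70/30 split (not the code used for this project), here is a minimal sketch in plain Python. The function name `train_test_split`, the fixed seed and the list of ID numbers standing in for the campaign records are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)                    # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for the campaign records: plain ID numbers.
campaigns = list(range(1000))
train, test = train_test_split(campaigns)
print(len(train), len(test))  # 700 300
```

Shuffling before splitting matters: if the file happened to be sorted (by state or by party, say), cutting off the last 30% without shuffling would give the model a test set that looks nothing like what it learned from.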

What the model does is find the patterns associated with the known cases of financial infractions, using a collection of decision trees.

Example of a Decision Tree

To illustrate this, let’s think of an example. Let’s say that a significant portion of the infraction cases relate to candidates who received the majority of their funds from a single source and spent most of that money on general services. In this case, the model might conclude that this is a suspicious pattern.

It would then incorporate this reasoning into a decision tree: if the majority of funds comes from a single source, check whether most of it was spent on general services and, if it was, flag the campaign.
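Written out as code, the hypothetical tree above is nothing more than two nested checks. This is only a sketch: the function name, the argument names and the 0.5 cut-offs are assumptions chosen for illustration, not values learned from the real data.

```python
def flag_campaign(top_donor_share, general_services_share):
    """Return True if a campaign matches the hypothetical suspicious pattern.

    Both arguments are fractions between 0 and 1. The 0.5 cut-offs are
    illustrative assumptions, not thresholds taken from a trained model.
    """
    if top_donor_share > 0.5:                 # majority of funds from one source?
        if general_services_share > 0.5:      # most of it spent on general services?
            return True                       # both conditions hold: flag it
    return False

print(flag_campaign(0.8, 0.7))  # True
print(flag_campaign(0.8, 0.2))  # False
print(flag_campaign(0.3, 0.9))  # False
```

A trained model would pick the questions and the cut-offs itself, and would combine many such trees rather than rely on a single one.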

Note that this hypothetical example doesn’t relate to specific rules or laws; it is just a pattern. Training an ML model does not mean feeding the rule book to the machine, which is a good thing. Campaign finance regulation is, in every nation, a complex subject with many particularities. If I had to learn it all to build a Machine Learning model, I might never do it.

In the end, the model uses all the data it was given and figures out the set of decisions that will lead to the best classification results possible.
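To give a feel for how a tree can "figure out" a decision on its own, here is a small sketch that searches for the best cut-off on a single feature. It scores each candidate cut-off with Gini impurity, which is one common criterion and an assumption on my part, since the exact criterion is not stated here. The data is made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure, 0.5 = evenly mixed)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Find the cut-off on one feature that yields the purest two groups."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weighted average impurity of the two groups after the split
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up toy data: share of funds from the largest donor, and whether
# the campaign was investigated (1) or not (0).
top_donor_share = [0.10, 0.20, 0.35, 0.40, 0.60, 0.75, 0.80, 0.95]
investigated = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(top_donor_share, investigated))  # (0.4, 0.0)
```

On this toy data the search lands on 0.4, the cut-off that separates the two groups perfectly. Real data is never that clean, which is why a tree keeps asking further questions and why several trees are combined.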


After a few weeks of work, these were the results of the final model:

  • It was correct 92.1% of the time when asked to classify each campaign as suspicious or not
  • Out of all cases the model classified as suspicious, 91% had been investigated by the authorities
  • Out of all actual cases of suspected infractions, the model flagged 84.3% as suspicious
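In machine learning terms, these three numbers are the model's accuracy, precision and recall, in that order. The sketch below shows how each one is computed from a list of true labels and a list of predictions. The ten labels are made up for the example and have nothing to do with the real results above.

```python
def classification_metrics(actual, predicted):
    """Compute (accuracy, precision, recall) for lists of 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # flagged, never investigated
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # investigated, but missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly left alone
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Made-up labels: 1 = investigated / flagged, 0 = not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))  # (0.8, 0.75, 0.75)
```

Looking at all three matters because the classes are unbalanced: a model that never flagged anyone would still be "correct" most of the time, but its recall would be zero.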

Better results could be achieved with more advanced models and more time invested in such a project. However, as a proof of concept the results make it clear that machine learning can be a useful tool in combating corruption.

Ok, But So What?

Even in a short period of time, a Machine Learning model can perform significantly better than a single person would. And it means that I don't have to be a lawyer to do something about corruption in my city or country.

Obviously, programming an ML model is not for everyone. But when it comes to investigating and fighting corruption, a local newspaper might find it easier to hire a computer geek than a specialist in election law, and the geek can do other stuff too.

My point is, by allowing the public to conduct research using open data, AI models could provide watchdogs, investigative journalists and civil society with a valuable tool to increase accountability in politics. Corruption will still be a problem in the long term, but new technological frontiers provide hope that we are up to the challenge.

If you are interested in checking out the details of this model, you can get it here.

Data Scientist, Economist with a background in Banking