Deep Learning for Beginners

Ethical implications of manipulating data

Manipulating data carries ethical implications and risks that you need to be aware of. We live in a world where many deep learning models have to be corrected by re-training after they are found to be biased or unfair. That is very unfortunate; you want to be a person who practices responsible AI and produces carefully thought-out models.

When manipulating data, be careful about removing outliers just because you think they are hurting your model's performance. Sometimes, outliers represent information about protected groups or minorities, and removing them perpetuates unfairness and introduces bias toward the majority groups. Avoid removing outliers unless you are absolutely sure that they are errors, such as those caused by faulty sensors or human mistakes.
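One way to act on this advice is to check which groups your outlier rule would affect before you delete anything. The following is a minimal sketch, not a prescribed method: it assumes Tukey's IQR rule as the outlier criterion, and the function names, the toy `incomes` feature, and the `groups` labels are all hypothetical.

```python
from collections import Counter
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def flagged_share_by_group(values, groups, k=1.5):
    """Return, for each group, the fraction of its rows flagged as outliers."""
    flags = iqr_outlier_flags(values, k)
    totals = Counter(groups)
    flagged = Counter(g for g, is_out in zip(groups, flags) if is_out)
    return {g: flagged[g] / totals[g] for g in totals}


# Hypothetical toy data: one numeric feature and one group label per row.
incomes = [30, 32, 31, 29, 33, 30, 31, 32, 95, 98]
groups = ["a"] * 8 + ["b"] * 2

shares = flagged_share_by_group(incomes, groups)
print(shares)  # {'a': 0.0, 'b': 1.0}
```

In this toy case, every row flagged as an outlier belongs to group `b`, so dropping the outliers would erase that group from the data entirely. A result like this is a signal to investigate the rows, not to remove them.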

Be careful of the way you transform the distribution of the data. Altering the distribution is fine in most cases, but if you are dealing with demographic data, you need to pay close attention to what you are transforming.

When dealing with demographic information such as gender, encoding female and male as 0 and 1 can be risky if proportions matter: you need to be careful not to promote an equality (or inequality) that does not reflect the reality of the community that will use your models. The exception is when the current reality shows unlawful discrimination, exclusion, and bias. In that case, your models (and the data they are based on) should not reflect that reality, but the lawful reality that the community wants. That is, you prepare good data to create models that do not perpetuate societal problems, but that reflect the society we want to become.
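To make the point about proportions concrete, you can compare the group proportions in your dataset against the proportions you intend the data to reflect. This is only an illustrative sketch: the function names, the labels, and the `reference` proportions are hypothetical, and choosing the right reference (the community's actual makeup, or a lawful target) is a judgment that code cannot make for you.

```python
from collections import Counter


def group_proportions(labels):
    """Return the fraction of rows that belong to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {g: c / total for g, c in counts.items()}


def proportion_gaps(labels, reference):
    """Return observed minus reference proportion for each reference group."""
    observed = group_proportions(labels)
    return {g: observed.get(g, 0.0) - p for g, p in reference.items()}


# Hypothetical dataset labels and hypothetical target proportions.
labels = ["female"] * 3 + ["male"] * 7
reference = {"female": 0.5, "male": 0.5}

gaps = proportion_gaps(labels, reference)
for group, gap in gaps.items():
    # A negative gap means the group is under-represented relative to the reference.
    print(f"{group}: {gap:+.2f}")
```

Running the same check before and after any resampling or transformation step shows whether that step moved the data toward or away from the proportions you meant it to have.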