A Design Science Research to Correct Inherent Biases in Natural Language Applications

Emergent Research Forum (ERF)

Jasmin Manseau
Queen's University
jasmin.manseau@queensu.ca

Ikenna Mbuko
Queen's University
ikenna.mbuko@queensu.ca

Abstract

Developing natural language applications such as chatbots and intelligent assistants like Alexa, Siri, and Cortana is currently a significant undertaking for many organizations seeking greater customer engagement. These applications rely on natural language models that learn from human-generated content, and this content is often subject to systematic and pervasive biases such as race and gender stereotypes. These biases can be transferred to natural language processing applications, which have been found to behave erratically in some instances. This unpredictability is concerning given the increasing reliance humans place on the recommendations provided by these applications: there is a risk of circulating false information that can mislead users or amplify biases. Using design science research, this study proposes to investigate the impact of machine learning on human biases, such as gender and race biases, that have been systematically present since the emergence of the human corpus.

Keywords

Design science, natural language, NLP, biases.

Introduction

As humans divert and automate interactions toward machine communication, systematic biases embedded in learning datasets pose an issue. Organizations are rushing to create chatbots and AI applications to support customers and to develop natural language applications that handle an increasing number of tasks. It is well established that biases such as gender and race stereotypes exist in language and can be amplified by these applications, because algorithms trained on historical data may reproduce and magnify the systematic biases present in that data. Researchers studying biases and systematic issues in the datasets used to train algorithms have highlighted several disconcerting examples that emphasize the ambivalent nature of the technology, such as flawed decision-making in judicial systems (Angwin et al. 2016) and in the recruitment of candidates (Datta, Tschantz, and Datta 2015; Lambrecht and Tucker 2018).

Some of these faulty or clearly problematic applications made headlines with their public failures. The Department of Homeland Security inaccurately identified air marshals and young children as potential terrorist threats with automated data-matching algorithms (Jacoby 2016). AI applications have defamed individuals, spread fake news, and exhibited racist behaviours (Crawford 2016; Dormehl 2014; Meyer 2016; Silverman 2016). An image software application developed by Google inaccurately labelled individuals with black complexions as gorillas (Kasperkevic 2015), while software developed by Nikon inaccurately labelled individuals of Asian descent as blinking (Rose 2010). Although mislabelling images may appear harmless, the issue becomes more profound as algorithmic processes are integrated into daily routines. It remains difficult for computers to understand the finely nuanced points of human language or the complicated meanings of truth. In the end, humans may be left with applications that are meant to help but that could mislead, provide inaccurate information, or behave in blatantly unacceptable ways.
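As a brief illustration of how such corpus-derived biases can be observed, the following minimal sketch (assuming the gensim library and its publicly downloadable "glove-wiki-gigaword-100" pretrained word vectors, which are not part of this study's artefact) compares the similarity of a few occupation words to gendered pronouns; a consistent gap suggests a stereotyped association learned from the training text alone.

```python
# Minimal sketch: probing gender-stereotyped associations in word embeddings
# learned from human-generated text. Assumes the gensim library and its
# downloadable GloVe vectors ("glove-wiki-gigaword-100").
import gensim.downloader as api

# Load publicly available GloVe vectors trained on Wikipedia + Gigaword.
model = api.load("glove-wiki-gigaword-100")

occupations = ["nurse", "engineer", "homemaker", "programmer"]
for word in occupations:
    # Cosine similarity to gendered pronouns; the gap between the two
    # indicates a gendered association absorbed purely from the corpus.
    sim_she = model.similarity(word, "she")
    sim_he = model.similarity(word, "he")
    print(f"{word:12s} she={sim_she:.3f}  he={sim_he:.3f}  gap={sim_she - sim_he:+.3f}")
```

On typical pretrained vectors, words such as "nurse" tend to sit closer to "she" while "engineer" sits closer to "he", reflecting stereotypes present in the underlying text rather than any explicit design choice.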