"Generative AI models...have captured the minds of the public and inspired widespread adoption. Yet, these models contain known racial, gender, and class stereotypes and biases from their training data and other structural factors, which downstream into model outputs. Marginalized groups are the most negatively affected by these biases."
Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248
Whilst there is great potential in generative AI, it is important to engage with the technology critically, particularly in terms of its outputs and how they are produced. Understanding how AI functions is critical to understanding its possibilities, its limitations and its problems. Before turning to AI, though, it is worth considering the technologies that preceded it and how they function.
It's important to note that the technologies we used before the advent of ChatGPT, and continue to use, were not without their problems. Conducting effective literature searches online has always been an important skill to develop, and it requires a good understanding of how search engines function. An effective, high-quality literature search requires an understanding of the various search operators and how to use them. Some aspects of the process, however, remain a mystery. For example, does the user really know how relevance ranking actually works, or why one article is rated as more "relevant" than another?
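The question about relevance ranking can be made concrete with a toy example. The scoring scheme below (raw term frequency) is a deliberately naive stand-in for real ranking algorithms, and the documents are invented; the point is simply that "relevance" is the product of design choices, and changing the scoring changes which result comes out on top.

```python
# A toy relevance ranker. The scoring scheme (raw term frequency) is a
# design choice: change the weights and the "most relevant" result changes.
# All documents and queries below are made-up examples.

def score(query, document):
    """Count how many times each query term appears in the document."""
    terms = query.lower().split()
    words = document.lower().split()
    return sum(words.count(t) for t in terms)

def rank(query, documents):
    """Return documents ordered from most to least 'relevant'."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

docs = [
    "search engines rank pages by relevance",
    "relevance relevance relevance",          # keyword stuffing wins under raw TF
    "how ranking algorithms order results",
]

print(rank("relevance ranking", docs)[0])  # "relevance relevance relevance"
```

Note that under this scheme a keyword-stuffed page beats a genuinely informative one: the ranking reflects the scoring rule, not some neutral notion of quality.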
In her book, Algorithms of Oppression, Safiya Noble explores how search engines reinforce racism in their outputs. She argues:
"Search does not merely present pages but structures knowledge, and the results retrieved in a commercial search engine create their own particular material reality. Ranking is itself information that also reflects the political, social, and cultural values of the society that search engines operate within..."
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
Search engines do not exist outside of society; they are a product of our society and are therefore susceptible to the same biases and cultural values as the society in which they exist. Search engines are not neutral. And neither is generative AI.
The emergence of generative AI has raised many questions about education and research. As with any other resource, it needs to be engaged with critically, not just accepted and incorporated into one's work. One of the most critical aspects of generative AI that needs to be understood is how it produces its outputs and what that means for us as receivers of those outputs.
"The AI that we have here and now? It’s racist. And ablest. And sexist. And riddled with assumptions. And that includes large language models like the one #ChatGPT is built on." (Clarke Grey, 2023)
Generative AI is trained on large datasets, often scraped from the internet. As a result, its outputs can replicate the biases inherent in the training data. As with all forms of data processing, the data that comes out is dependent on the data that goes in.
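A deliberately simple sketch can illustrate the "data in, data out" point. The corpus below is invented and intentionally skewed; the "model" does nothing more than mirror the frequencies it saw during training, which is exactly how skew in training data resurfaces in outputs.

```python
from collections import Counter

# Hypothetical, deliberately skewed "training data": job titles paired with
# a gendered pronoun, mimicking a biased corpus scraped from the web.
training_data = [
    ("doctor", "he"), ("doctor", "he"), ("doctor", "he"), ("doctor", "she"),
    ("nurse", "she"), ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
]

def completion_probabilities(job):
    """The 'model' simply mirrors pronoun frequencies seen during training."""
    counts = Counter(pronoun for j, pronoun in training_data if j == job)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

print(completion_probabilities("doctor"))  # {'he': 0.75, 'she': 0.25}
```

Real language models are vastly more complex, but the underlying dynamic is the same: a 3:1 skew in the corpus becomes a 3:1 skew in what the model considers likely.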
In 2020, Abeba Birhane and Vinay Prabhu audited two popular datasets (Prabhu & Birhane, 2020). The first, "80 Million Tiny Images", was an MIT dataset used to teach machine learning systems to recognise people and objects; it was full of racist slurs and offensive labels. In another dataset, ImageNet, they found pornographic content: images of real people that had been scraped from the internet without their explicit consent. After the study was published, MIT apologised and removed the dataset.
In 2023, Leonardo Nicoletti and Dina Bass (Nicoletti & Bass, 2023), in a report for Bloomberg Technology, investigated the outputs of Stable Diffusion, a deep learning, text-to-image model. Nicoletti and Bass used Stable Diffusion to generate thousands of images related to job titles and crime. Their analysis found that image sets generated for high-paying jobs were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like “fast-food worker” and “social worker.” When it came to prompts related to crime, more than 80% of the images generated depicted people with darker skin tones. When Nicoletti and Bass compared the outputs with data collected by the US Bureau of Labor Statistics, they found that Stable Diffusion had "skewed representations of reality when it comes to women with darker skin".
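The shape of this kind of audit can be sketched in a few lines: count how often an attribute appears in a model's generated outputs, then compare that share against a real-world baseline. The figures below are invented placeholders for illustration, not Nicoletti and Bass's actual data.

```python
# A sketch of the kind of audit Nicoletti and Bass describe: compare the
# distribution of an attribute in model outputs against a real-world baseline.
# The counts and baseline below are invented, not Bloomberg's figures.

def representation_skew(generated_counts, baseline_share):
    """Share of generated images with the attribute, minus the baseline share."""
    total = sum(generated_counts.values())
    generated_share = generated_counts["darker_skin"] / total
    return generated_share - baseline_share

# e.g. 810 of 1,000 generated images for a crime-related prompt showed darker
# skin tones, against a hypothetical 50% baseline share:
skew = representation_skew({"darker_skin": 810, "lighter_skin": 190}, 0.50)
print(f"{skew:+.0%}")  # +31%
```

A positive skew means the model over-represents the attribute relative to the baseline; a model that matched reality would score close to zero.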
At present, there is little clarity around what happens to the data entered into generative AI tools, so it is wise to be cautious about what you enter into them. Anything you enter into tools such as ChatGPT could be used to help develop the underlying large language model (LLM) and may be used to develop future products. As a result, there are a few key things you should do if you choose to use generative AI tools.
So-called "hallucinations" are factually incorrect or misleading responses produced by generative AI. Each iteration of ChatGPT has produced fewer hallucinations, but they had not been eradicated at the time of writing (September 2023). It is always important to interrogate the information you receive from generative AI tools, whether from a chatbot or from a search engine that uses artificial intelligence behind the scenes.
Karim Lakhani, Harvard Business School Professor and Chair of the D^3 (Digital, Data, Design) Institute at Harvard, suggests the following steps for careful prompting to minimise hallucinations and misinformation when interacting with ChatGPT or other generative AI tools:
(Lakhani, 2023)
Clarke Grey, B. (2023). Losing the Plot: From the Dream of AI to Performative Equity – TRU Digital Detox. Retrieved 25 September 2023, from https://digitaldetox.trubox.ca/losing-the-plot-from-the-dream-of-ai-to-performative-equity/
Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248
Lakhani, K. (2023). How Can We Counteract Generative AI’s Hallucinations? Digital Data Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations/
Nicoletti, L., & Bass, D. (2023, August 22). Humans Are Biased. Generative AI Is Even Worse. Bloomberg.Com. https://www.bloomberg.com/graphics/2023-generative-ai-bias/
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
Prabhu, V. U., & Birhane, A. (2020). Large image datasets: A pyrrhic win for computer vision? (arXiv:2006.16923). arXiv. https://doi.org/10.48550/arXiv.2006.16923