Generative AI models...have captured the minds of the public and inspired widespread adoption. Yet, these models contain known racial, gender, and class stereotypes and biases from their training data and other structural factors, which downstream into model outputs. Marginalized groups are the most negatively affected by these biases.
[Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248]
Whilst there is some potential in using generative AI, it is important to ensure you engage with the technology critically, particularly in terms of its outputs and how they have been formulated. Understanding how AI functions is critical in understanding its possibilities, its limitations and problems. One of the most critical aspects of generative AI is an understanding of how it produces the data it outputs and what that means for us as the consumer of these outputs.
On this page we'll take a look at a few of the key issues associated with Generative AI:
"The AI that we have here and now? It’s racist. And ablest. And sexist. And riddled with assumptions. And that includes large language models like the one #ChatGPT is built on."
[Clarke Grey, 2023]
Generative AI is trained by large datasets often scraped from the internet. As a result of this, the outputs can replicate the biases that are inherent in the data that goes into the training. In common with all forms of data processing, the data that comes out is dependent on the data that goes in.
In 2020, Abeba Birhane and Vinay Prabhu audited two popular data sets (Prabhu & Birhane, 2020). The first, 80 Million Tiny Images, was an MIT set that was used to teach machine learning systems how to recognise people and objects. The set was full of racist slurs and offensive labels. In another set, ImageNet, they found pornographic content which did not require explicit consent because they were scraped from the internet. After publishing the study, MIT apologised and removed the set.
In 2023, Leonardo Nicoletti and Dina Bass (Nicoletti & Bass, 2023), in a report for Bloomberg Technology, investigated the outputs of Stable Diffusion, a deep learning, text-to-image model. Nicoletti and Bass used Stable Diffusion to generate thousands of images related to job titles and crime. Their analysis found that image sets generated for high-paying jobs were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like “fast-food worker” and “social worker.” When it came to prompts related to crime, more than 80% of the images generated were of people with a darker skin. When Nicoletti and Bass compared the outputs with the data collected by the US Bureau of Labor Statistics, they found that Stable Diffusion had "skewed representations of reality when it comes to women with darker skin".
Furthermore, as the Queer in AI group have noted, the outputs can also have a serious, material impact on the LGBTQ+ community, in terms of hate speech and the erasure of non-binary genders and identities. They argue that some of the harms to the queer community "can be traced to large language models (LLMs) trained on datasets containing hate speech and censored queer words, leading search systems to avoid queer content and content moderation systems to more often tag it as suspect. LLMs also overwhelmingly fail to account for non-binary genders and pronouns, contributing to erasure of these identities" (Queerinai et al, 2023).
Even when products are built with the intention of addressing bias in the training data, they simply do not provide a permanent solution to the deeper problem. Note, for example, the attempts by Google with Gemini in 2024 (Kleinman, 2024). Ultimately, as Gillard argues, "algorithms inevitably perpetuate one kind of bias or another" (Gillard, 2024).
"AI-enabled surveillance systems, in conjunction with surveillance of online spaces such as dating apps by states, corporations, and even individuals have outed queer people, compromising their privacy and safety."
[Queerinai, 2023]
At present there is a lack of clarity regarding the data that is entered into generative AI tools, how it is secured and how it is used. Consequently, it is wise to be cautious when entering data into any generative AI tool, if you choose to use them. This is particularly the case when it comes to custom GPTs - software that individuals or groups have built using the GPT engine. In 2024, researchers found that they had a 100% success rate for file leakage and 97% for system prompt extraction using simple prompts (prompt extraction refers to using prompts to extract data from the tool) (Yu et al, 2024).
Even Chat GPT itself isn't immune to data leaks (few online tools are). In March 2023, an outage led to the leakage of personal data, including "active user’s first and last name, email address, payment address, credit card type and the last four digits (only) of a credit card number, and credit card expiration date" (although fortunately not the full credit card number) (Open AI, 2023).
Furthermore, in a report published by the Mozilla Foundation (Caltrider et al, 2024), a range of issues were revealed regarding so-called romantic chatbots. The researchers found that 90% of the romantic chatbots failed to meet Mozilla's minimum security standards, 90% may share or sell personal data and 54% won't let users delete their personal data.
There are a few key things you should do to limit the risks if you choose to use generative AI tools.
So-called "hallucinations" are essentially factually incorrect or misleading responses by generative AI. Each iteration of ChatGPT results in fewer and fewer hallucinations, however, they have not been eradicated at the time of writing [July 2024]. It is always important to interrogate the information you receive from generative AI tools, whether that be from a chat or from a search engine that utilises artificial intelligence behind the scenes.
Karim Lakhani, Harvard Business School Professor and Chair of the D^3 (Digital, Data, Design) Institute at Harvard, suggests the following steps to minimise hallucinations and misinformation when interacting with ChatGPT or other generative AI tools through careful prompting:
(Lakhani, 2023)
The problems with generative AI aren't solely limited to your immediate engagement with the tool. There are also a range of issues one needs to consider regarding how it is trained.
In 2024, partners in the Climate Action Against Disinformation coalition, including Greenpeace and Friends of the Earth, published a report on the risks that generative AI poses to the climate crisis (Climate Action Against Disinformation coalition, 2024). Noting the International Energy Agency's estimates that "electricity consumption from data centres, artificial intelligence (AI) and the cryptocurrency sector could double by 2026" (IEA, 2024), the report argues that such growth would lead to an 80% increase in global carbon emissions.
It is also important to note the hidden labour that is used to train generative AI tools. Due to the need to train the large language models, cheap labour is used to manually sort data and code data to develop AI tools (Hao & Hernandez, 2022), something Mary Gray describes as "ghost work", work where the human labour is concealed and it's "just a matter of software working its magic" (Hao, 2022).
It is also important to acknowledge the issues around intellectual property associated with the content generated by artificial intelligence. As noted above, large language models are trained on data that leads to the creation of its outputs. This means that copyrighted content is ultimately used to train AI and is then used to generate "new" content, leading to lawuits claiming that "generative AI art tools violate copyright law by scraping artists’ work from the web without their consent" (Vincent, 2023). Ultimately, generative AI has an "intellectual property problem" (Appel et al, 2023), alongside issues around bias, privacy, misinformation, the environment and labour.
Appel, G., Neelbauer, J., & Schweidel, D. A. (2023). Generative AI Has an Intellectual Property Problem. Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
Caltrider, J., Rykov, M., & MacDonald, Z. (2024). Happy Valentine’s Day! Romantic AI Chatbots Don’t Have Your Privacy at Heart.
https://foundation.mozilla.org/en/privacynotincluded/articles/happy-valentines-day-romantic-ai-chatbots-dont-have-your-privacy-at-heart/
Clarke Grey, B. (2023). Losing the Plot: From the Dream of AI to Performative Equity – TRU Digital Detox. https://digitaldetox.trubox.ca/losing-the-plot-from-the-dream-of-ai-to-performative-equity/
Climate Action Against Disinformation coalition. (2024). Report: Artificial Intelligence A Threat to Climate Change, Energy Usage and Disinformation. Friends of the Earth. https://foe.org/news/ai-threat-report/
Gilliard, C. (2024). The Deeper Problem With Google’s Racially Diverse Nazis. The Atlantic. https://www.theatlantic.com/technology/archive/2024/02/google-gemini-diverse-nazis/677575/
Hao, K. (2019). The AI gig economy is coming for you. MIT Technology Review. Retrieved 9 July 2024, from https://www.technologyreview.com/2019/05/31/103015/the-ai-gig-economy-is-coming-for-you/
Hao, K., & Hernández, A. P. (2022). How the AI industry profits from catastrophe. MIT Technology Review. Retrieved 9 July 2024, from https://www.technologyreview.com/2022/04/20/1050392/ai-industry-appen-scale-data-labels/
International Energy Agency. (2024). Executive summary – Electricity 2024 – Analysis. IEA. https://www.iea.org/reports/electricity-2024/executive-summary
Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248
Kleinman, Z. (2024). Why Google’s ‘woke’ AI problem won’t be an easy fix. BBC News. https://www.bbc.com/news/technology-68412620
Lakhani, K. (2023). How Can We Counteract Generative AI’s Hallucinations? Digital Data Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations/
Nicoletti, L., & Bass, D. (2023). Humans Are Biased. Generative AI Is Even Worse. Bloomberg.Com. https://www.bloomberg.com/graphics/2023-generative-ai-bias/
OpenAI. (2023). March 20 ChatGPT outage: Here’s what happened. https://openai.com/index/march-20-chatgpt-outage/
Prabhu, V. U., & Birhane, A. (2020). Large image datasets: A pyrrhic win for computer vision? (arXiv:2006.16923). arXiv. https://doi.org/10.48550/arXiv.2006.16923
Queerinai, O. O., Ovalle, A., Subramonian, A., Singh, A., Voelcker, C., Sutherland, D. J., Locatelli, D., Breznik, E., Klubicka, F., Yuan, H., J, H., Zhang, H., Shriram, J., Lehman, K., Soldaini, L., Sap, M., Deisenroth, M. P., Pacheco, M. L., Ryskina, M., … Stark, L. (2023). Queer In AI: A Case Study in Community-Led Participatory AI. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1882–1895. https://doi.org/10.1145/3593013.3594134
Vincent, J. (2023). AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit. The Verge. https://www.theverge.com/2023/1/16/23557098/generative-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart
Banner image c/o merlinlightpainting on Pixabay. https://pixabay.com/photos/magical-woman-fantasy-creative-6046020/