LibGuides: Artificial Intelligence (AI): Using AI Critically

Using AI Critically

Generative AI models...have captured the minds of the public and inspired widespread adoption. Yet, these models contain known racial, gender, and class stereotypes and biases from their training data and other structural factors, which downstream into model outputs. Marginalized groups are the most negatively affected by these biases.

[Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248]

Whilst there is some potential in using generative AI, it is important to ensure you engage with the technology critically, particularly in terms of its outputs and how they have been formulated. Understanding how AI functions is critical in understanding its possibilities, its limitations and problems. One of the most critical aspects of generative AI is an understanding of how it produces the data it outputs and what that means for us as the consumer of these outputs.

On this page we'll take a look at a few of the key issues associated with Generative AI:

Bias

Privacy

Hallucinations

Climate, Labour and Copyright

Bias

"The AI that we have here and now? It’s racist. And ablest. And sexist. And riddled with assumptions. And that includes large language models like the one #ChatGPT is built on."

[Clarke Grey, 2023]

Generative AI is trained by large datasets often scraped from the internet. As a result of this, the outputs can replicate the biases that are inherent in the data that goes into the training. In common with all forms of data processing, the data that comes out is dependent on the data that goes in.

In 2020, Abeba Birhane and Vinay Prabhu audited two popular data sets (Prabhu & Birhane, 2020). The first, 80 Million Tiny Images, was an MIT set that was used to teach machine learning systems how to recognise people and objects. The set was full of racist slurs and offensive labels. In another set, ImageNet, they found pornographic content which did not require explicit consent because they were scraped from the internet. After publishing the study, MIT apologised and removed the set.

In 2023, Leonardo Nicoletti and Dina Bass (Nicoletti & Bass, 2023), in a report for Bloomberg Technology, investigated the outputs of Stable Diffusion, a deep learning, text-to-image model. Nicoletti and Bass used Stable Diffusion to generate thousands of images related to job titles and crime. Their analysis found that image sets generated for high-paying jobs were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like “fast-food worker” and “social worker.” When it came to prompts related to crime, more than 80% of the images generated were of people with a darker skin. When Nicoletti and Bass compared the outputs with the data collected by the US Bureau of Labor Statistics, they found that Stable Diffusion had "skewed representations of reality when it comes to women with darker skin".

Furthermore, as the Queer in AI group have noted, the outputs can also have a serious, material impact on the LGBTQ+ community, in terms of hate speech and the erasure of non-binary genders and identities. They argue that some of the harms to the queer community "can be traced to large language models (LLMs) trained on datasets containing hate speech and censored queer words, leading search systems to avoid queer content and content moderation systems to more often tag it as suspect. LLMs also overwhelmingly fail to account for non-binary genders and pronouns, contributing to erasure of these identities" (Queerinai et al, 2023).

Even when products are built with the intention of addressing bias in the training data, they simply do not provide a permanent solution to the deeper problem. Note, for example, the attempts by Google with Gemini in 2024 (Kleinman, 2024). Ultimately, as Gillard argues, "algorithms inevitably perpetuate one kind of bias or another" (Gillard, 2024).

Privacy

"AI-enabled surveillance systems, in conjunction with surveillance of online spaces such as dating apps by states, corporations, and even individuals have outed queer people, compromising their privacy and safety."

[Queerinai, 2023]

At present there is a lack of clarity regarding the data that is entered into generative AI tools, how it is secured and how it is used. Consequently, it is wise to be cautious when entering data into any generative AI tool, if you choose to use them. This is particularly the case when it comes to custom GPTs - software that individuals or groups have built using the GPT engine. In 2024, researchers found that they had a 100% success rate for file leakage and 97% for system prompt extraction using simple prompts (prompt extraction refers to using prompts to extract data from the tool) (Yu et al, 2024).

Even Chat GPT itself isn't immune to data leaks (few online tools are). In March 2023, an outage led to the leakage of personal data, including "active user’s first and last name, email address, payment address, credit card type and the last four digits (only) of a credit card number, and credit card expiration date" (although fortunately not the full credit card number) (Open AI, 2023).

Furthermore, in a report published by the Mozilla Foundation (Caltrider et al, 2024), a range of issues were revealed regarding so-called romantic chatbots. The researchers found that 90% of the romantic chatbots failed to meet Mozilla's minimum security standards, 90% may share or sell personal data and 54% won't let users delete their personal data.

There are a few key things you should do to limit the risks if you choose to use generative AI tools.

Check the privacy policy of any tools that you use. It's very tempting to not bother reading the privacy policy of online tools, they can be long and very dry to read. However, you should always know what you are agreeing to before you use the product. The privacy polocy for OpenAI (the organisation behind ChatGPT), for example, is available here and outlines how they use personal information. If no privacy policy is available, then avoid using the product.
Be careful with what you share. As you don't know how the service will utilise the data you enter or how the data will be stored, it's wise to be cautious about the data you provide. There are already examples of chat data being exposed online, not to mention the example of romantic chatbots not allowing you to delete your personal data (see above), so do not share any sensitive or private data with any generative AI tool.
Check privacy settings. Once you have logged in and are accessing the tools, head to your settings and ensure you are happy with the security and privacy settings. OpenAI provides some information here, and you can adjust your settings by clicking on your email in the bottom left corner, then Settings then Data Controls. For other tools, look for Settings or Profile or similar.

Hallucinations

So-called "hallucinations" are essentially factually incorrect or misleading responses by generative AI. Each iteration of ChatGPT results in fewer and fewer hallucinations, however, they have not been eradicated at the time of writing [July 2024]. It is always important to interrogate the information you receive from generative AI tools, whether that be from a chat or from a search engine that utilises artificial intelligence behind the scenes.

Karim Lakhani, Harvard Business School Professor and Chair of the D^3 (Digital, Data, Design) Institute at Harvard, suggests the following steps to minimise hallucinations and misinformation when interacting with ChatGPT or other generative AI tools through careful prompting:

Request sources or evidence. When asking for factual information, specifically request reliable sources or evidence to support the response. For example, you can ask, “What are the sources for that information?” or “Can you provide evidence to support your answer?” This can encourage the model to provide more reliable and verifiable information.
Use multiple prompts or iterative refinement. If the initial response from the model seems dubious or insufficient, try rephrasing or providing additional prompts to get a more accurate or comprehensive answer. Iterative refinement of the conversation can help in obtaining better results.
Ask for explanations or reasoning. Instead of simply asking for a direct answer, ask the model to explain its reasoning or provide a step-by-step explanation. This can help uncover any potential flaws or biases in the generated response.
Double-check information independently. Don’t solely rely on the model’s responses. Take the responsibility to fact-check and verify the information independently using trusted sources or references. Cross-referencing information can help identify and correct any misinformation generated by the model.
Address biases by increasing multiple perspectives. Generative AI models are ultimately human-made, and therefore reflect pre-existing biases which may lead to unintended impacts. Instead of asking, “Is this response biased?,” we can assume that the answer is “Yes.” Our response requires ethical considerations in prompting the use of outputs. In order to evaluate generated responses for accuracy and fairness, we must become increasingly aware blind spots, of which perspectives may not be represented, and to both value and seek multiple perspectives.

(Lakhani, 2023)

Climate, Labour and Copyright

The problems with generative AI aren't solely limited to your immediate engagement with the tool. There are also a range of issues one needs to consider regarding how it is trained.

In 2024, partners in the Climate Action Against Disinformation coalition, including Greenpeace and Friends of the Earth, published a report on the risks that generative AI poses to the climate crisis (Climate Action Against Disinformation coalition, 2024). Noting the International Energy Agency's estimates that "electricity consumption from data centres, artificial intelligence (AI) and the cryptocurrency sector could double by 2026" (IEA, 2024), the report argues that such growth would lead to an 80% increase in global carbon emissions.

It is also important to note the hidden labour that is used to train generative AI tools. Due to the need to train the large language models, cheap labour is used to manually sort data and code data to develop AI tools (Hao & Hernandez, 2022), something Mary Gray describes as "ghost work", work where the human labour is concealed and it's "just a matter of software working its magic" (Hao, 2022).

It is also important to acknowledge the issues around intellectual property associated with the content generated by artificial intelligence. As noted above, large language models are trained on data that leads to the creation of its outputs. This means that copyrighted content is ultimately used to train AI and is then used to generate "new" content, leading to lawuits claiming that "generative AI art tools violate copyright law by scraping artists’ work from the web without their consent" (Vincent, 2023). Ultimately, generative AI has an "intellectual property problem" (Appel et al, 2023), alongside issues around bias, privacy, misinformation, the environment and labour.

References

Appel, G., Neelbauer, J., & Schweidel, D. A. (2023). Generative AI Has an Intellectual Property Problem. Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem

Caltrider, J., Rykov, M., & MacDonald, Z. (2024). Happy Valentine’s Day! Romantic AI Chatbots Don’t Have Your Privacy at Heart.

https://foundation.mozilla.org/en/privacynotincluded/articles/happy-valentines-day-romantic-ai-chatbots-dont-have-your-privacy-at-heart/

Clarke Grey, B. (2023). Losing the Plot: From the Dream of AI to Performative Equity – TRU Digital Detox. https://digitaldetox.trubox.ca/losing-the-plot-from-the-dream-of-ai-to-performative-equity/

Climate Action Against Disinformation coalition. (2024). Report: Artificial Intelligence A Threat to Climate Change, Energy Usage and Disinformation. Friends of the Earth. https://foe.org/news/ai-threat-report/

Gilliard, C. (2024). The Deeper Problem With Google’s Racially Diverse Nazis. The Atlantic. https://www.theatlantic.com/technology/archive/2024/02/google-gemini-diverse-nazis/677575/

Hao, K. (2019). The AI gig economy is coming for you. MIT Technology Review. Retrieved 9 July 2024, from https://www.technologyreview.com/2019/05/31/103015/the-ai-gig-economy-is-coming-for-you/

Hao, K., & Hernández, A. P. (2022). How the AI industry profits from catastrophe. MIT Technology Review. Retrieved 9 July 2024, from https://www.technologyreview.com/2022/04/20/1050392/ai-industry-appen-scale-data-labels/

International Energy Agency. (2024). Executive summary – Electricity 2024 – Analysis. IEA. https://www.iea.org/reports/electricity-2024/executive-summary

Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248

Kleinman, Z. (2024). Why Google’s ‘woke’ AI problem won’t be an easy fix. BBC News. https://www.bbc.com/news/technology-68412620

Lakhani, K. (2023). How Can We Counteract Generative AI’s Hallucinations? Digital Data Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations/

Nicoletti, L., & Bass, D. (2023). Humans Are Biased. Generative AI Is Even Worse. Bloomberg.Com. https://www.bloomberg.com/graphics/2023-generative-ai-bias/

OpenAI. (2023). March 20 ChatGPT outage: Here’s what happened. https://openai.com/index/march-20-chatgpt-outage/

Prabhu, V. U., & Birhane, A. (2020). Large image datasets: A pyrrhic win for computer vision? (arXiv:2006.16923). arXiv. https://doi.org/10.48550/arXiv.2006.16923

Queerinai, O. O., Ovalle, A., Subramonian, A., Singh, A., Voelcker, C., Sutherland, D. J., Locatelli, D., Breznik, E., Klubicka, F., Yuan, H., J, H., Zhang, H., Shriram, J., Lehman, K., Soldaini, L., Sap, M., Deisenroth, M. P., Pacheco, M. L., Ryskina, M., … Stark, L. (2023). Queer In AI: A Case Study in Community-Led Participatory AI. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1882–1895. https://doi.org/10.1145/3593013.3594134

Vincent, J. (2023). AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit. The Verge. https://www.theverge.com/2023/1/16/23557098/generative-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart

Image credit

Banner image c/o merlinlightpainting on Pixabay. https://pixabay.com/photos/magical-woman-fantasy-creative-6046020/