
Artificial Intelligence (AI)

A resource guide on using Artificial Intelligence (AI) critically for literature searching and research.


Using AI Critically

"Generative AI models...have captured the minds of the public and inspired widespread adoption. Yet, these models contain known racial, gender, and class stereotypes and biases from their training data and other structural factors, which downstream into model outputs. Marginalized groups are the most negatively affected by these biases."

(Kidd & Birhane, 2023)

Whilst generative AI has great potential, it is important to engage with the technology critically, particularly in terms of its outputs and how they are produced. Understanding how AI functions is critical to understanding its possibilities, its limitations and its problems. Before reflecting on AI, though, it is worth reflecting on pre-existing technologies and how they function.

Search Engines

It's important to note that the technology we used before the advent of ChatGPT, and continue to use, was not without its problems. Conducting effective literature searching online has always been an important skill to develop, and it requires a good understanding of how search engines function. An effective, high-quality literature search requires an understanding of the various search operators and how to use them. Some aspects of the process, however, remain a mystery. For example, do users really know how relevance ranking works, or why one article is rated as more "relevant" than another?
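For example, a structured search on this topic might combine phrase searching, Boolean operators and truncation, as in the illustrative query below (the exact syntax varies between databases, so check the help pages of the one you are using):

  ("artificial intelligence" OR "generative AI") AND (bias* OR stereotyp*) AND ("literature search" OR "information retrieval")

Here the quotation marks keep a phrase together, OR broadens each concept to include synonyms, AND combines the separate concepts, and the asterisk truncates a word stem so that variant endings (bias, biases, biased) are all retrieved.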

In her book, Algorithms of Oppression, Safiya Noble explores how search engines reinforce racism in their outputs. She argues:

"Search does not merely present pages but structures knowledge, and the results retrieved in a commercial search engine create their own particular material reality. Ranking is itself information that also reflects the political, social, and cultural values of the society that search engines operate within..."

(Noble, 2018)

Search engines do not exist outside of society; they are a product of our society and are therefore susceptible to the same biases and cultural values as the society in which they exist. Search engines are not neutral. And neither is generative AI.


Considerations When Using Generative AI

The emergence of generative AI has raised many questions about education and research. As with any other resource, it needs to be engaged with critically, not simply accepted and incorporated into one's work. One of the most important things to understand about generative AI is how it produces its outputs and what that means for us as recipients of those outputs.

Bias

"The AI that we have here and now? It’s racist. And ableist. And sexist. And riddled with assumptions. And that includes large language models like the one #ChatGPT is built on."

(Clarke Grey, 2023)

Generative AI is trained on large datasets, often scraped from the internet. As a result, its outputs can replicate the biases inherent in the training data. As with all forms of data processing, the data that comes out depends on the data that goes in.

In 2020, Abeba Birhane and Vinay Prabhu audited two popular datasets (Prabhu & Birhane, 2020). The first, "80 Million Tiny Images", was an MIT dataset used to teach machine learning systems to recognise people and objects; it was full of racist slurs and offensive labels. In the second, ImageNet, they found pornographic images of people that had been scraped from the internet without the subjects' explicit consent. After the study was published, MIT apologised and withdrew the dataset.

In 2023, Leonardo Nicoletti and Dina Bass, in a report for Bloomberg Technology, investigated the outputs of Stable Diffusion, a deep-learning text-to-image model (Nicoletti & Bass, 2023). They used Stable Diffusion to generate thousands of images related to job titles and crime. Their analysis found that the image sets generated for high-paying jobs were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like “fast-food worker” and “social worker”. For prompts related to crime, more than 80% of the generated images were of people with darker skin. When Nicoletti and Bass compared the outputs with data collected by the US Bureau of Labor Statistics, they found that Stable Diffusion had "skewed representations of reality when it comes to women with darker skin".

Privacy

At present, there is little clarity around what happens to the data entered into generative AI tools. Consequently, it is wise to be cautious when entering data into any generative AI tool, if you choose to use one. Anything you enter into tools such as ChatGPT could be used to help develop its underlying large language model (LLM) and may be used to develop future products. With that in mind, there are a few key things you should do if you choose to use generative AI tools.

  1. Check the privacy policy of any tools that you use. It's very tempting not to bother reading the privacy policy of online tools. However, you should always know what you are agreeing to before you use the product. OpenAI (the organisation behind ChatGPT), for example, publishes a privacy policy on its website that outlines how it uses personal information. If no privacy policy is available, then avoid using the product.
  2. Be careful with what you share. As you don't know how the service will utilise the data you enter or how the data will be stored, it's wise to be cautious about the data you provide. There are already examples of chat data being exposed online, so do not share any sensitive or private data with any generative AI tool.
  3. Check privacy settings. Once you have logged in and are accessing the tools, head to your settings and ensure you are happy with the security and privacy settings. In ChatGPT, for example, you can adjust your settings by clicking on your email in the bottom left corner, then Settings, then Data Controls. For other tools, look for Settings, Profile or similar.

Hallucinations

So-called "hallucinations" are factually incorrect or misleading responses generated by generative AI. Each iteration of ChatGPT has produced fewer hallucinations, but they had not been eradicated at the time of writing (September 2023). It is always important to interrogate the information you receive from generative AI tools, whether from a chat interface or from a search engine that uses artificial intelligence behind the scenes.

Karim Lakhani, Harvard Business School Professor and Chair of the D^3 (Digital, Data, Design) Institute at Harvard, suggests the following steps to minimise hallucinations and misinformation when interacting with ChatGPT or other generative AI tools through careful prompting:

  1. Request sources or evidence. When asking for factual information, specifically request reliable sources or evidence to support the response. For example, you can ask, “What are the sources for that information?” or “Can you provide evidence to support your answer?” This can encourage the model to provide more reliable and verifiable information.
  2. Use multiple prompts or iterative refinement. If the initial response from the model seems dubious or insufficient, try rephrasing or providing additional prompts to get a more accurate or comprehensive answer. Iterative refinement of the conversation can help in obtaining better results (see the example exchange after this list).
  3. Ask for explanations or reasoning. Instead of simply asking for a direct answer, ask the model to explain its reasoning or provide a step-by-step explanation. This can help uncover any potential flaws or biases in the generated response.
  4. Double-check information independently. Don’t solely rely on the model’s responses. Take the responsibility to fact-check and verify the information independently using trusted sources or references. Cross-referencing information can help identify and correct any misinformation generated by the model.
  5. Address biases by seeking multiple perspectives. Generative AI models are ultimately human-made, and therefore reflect pre-existing biases which may lead to unintended impacts. Instead of asking, “Is this response biased?”, we can assume that the answer is “Yes”. Evaluating generated responses for accuracy and fairness requires us to become increasingly aware of our blind spots, to notice which perspectives may not be represented, and to both value and seek out multiple perspectives.

(Lakhani, 2023)
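Putting these steps into practice, an iterative exchange with a generative AI tool might look something like the following (illustrative prompts only, not a fixed script):

  Initial prompt: "Summarise the main criticisms of generative AI in education. What are the sources for each claim?"
  Follow-up: "Can you explain, step by step, the reasoning behind your second point?"
  Follow-up: "Which perspectives might be missing from this summary?"

Any sources the tool cites should then be verified independently, as generative AI tools can produce plausible-looking but non-existent references.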

References

Clarke Grey, B. (2023). Losing the plot: From the dream of AI to performative equity. TRU Digital Detox. Retrieved 25 September 2023, from https://digitaldetox.trubox.ca/losing-the-plot-from-the-dream-of-ai-to-performative-equity/

Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248

Lakhani, K. (2023). How Can We Counteract Generative AI’s Hallucinations? Digital Data Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations/

Nicoletti, L., & Bass, D. (2023, August 22). Humans are biased. Generative AI is even worse. Bloomberg. https://www.bloomberg.com/graphics/2023-generative-ai-bias/

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.

Prabhu, V. U., & Birhane, A. (2020). Large image datasets: A pyrrhic win for computer vision? (arXiv:2006.16923). arXiv. https://doi.org/10.48550/arXiv.2006.16923