How Open is Open? Transparency and Accountability in Open-Source LLMs


The field of Natural Language Processing (NLP) has been transformed by the advent of Large Language Models (LLMs) and the tools built on them, such as ChatGPT. Companies deploy LLMs in a variety of ways, including customer service, sentiment analysis, and marketing, while researchers explore their use in areas such as NLP, psychology, and linguistics.
However, despite its widespread use, ChatGPT, the popular face of LLMs, has major drawbacks. It is known to generate factually incorrect responses, often called hallucinations, and to exhibit a variety of biases, many of which can be traced to the data used to build the model. Despite its name, OpenAI, the company behind ChatGPT, does not release its models as open source, making it impossible to examine the data sources used to train the models that generate the output.

The presentation will begin by explaining the concept of 'attention', the breakthrough idea that made ChatGPT possible. We'll explore how large language models are constructed, emphasizing the need for genuine openness and examining the implications of models that do not fully disclose their data sources and algorithms. But ChatGPT is not the only model with these problems.
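
For readers who want a concrete picture of 'attention' before the talk, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017): softmax(QK^T / sqrt(d_k)) V. It is written in plain Python with small lists standing in for matrices. The helper names (softmax, matmul, transpose, scaled_dot_product_attention) are illustrative choices for this sketch, not any library's API, and the sketch is not how production models implement the operation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix product: (n x k) times (k x p) gives (n x p).
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    # Each query row is compared with every key row; scores are scaled by sqrt(d_k).
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Softmax turns each row of scores into weights that sum to 1.
    weights = [softmax(row) for row in scaled]
    # The output for each query is a weighted average of the value rows.
    return matmul(weights, v), weights


if __name__ == "__main__":
    query = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    output, weights = scaled_dot_product_attention(query, keys, values)
    print("weights:", weights)
    print("output:", output)
```

In this toy example the query matches the first key more closely, so the first value row receives the larger weight. Real LLMs wrap this core operation in learned projection matrices, multiple attention heads, and masking, all of which the sketch omits for clarity.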
A recent report on AI software (Nolan, 2023) found that numerous models claiming to be open source fail to provide clear details about their training data and underlying algorithms. The audience will gain insight into why knowing the origin of training data is vital and how hidden data sources can introduce biases and reinforce inequalities, with real-world consequences.
When LLMs billed as open source keep their algorithms proprietary, it becomes difficult to evaluate and scrutinize how they operate, which undermines accountability. Attendees will learn how proprietary algorithms can hinder the identification and correction of algorithmic biases. Fairness, accountability, and responsible AI are central themes, and attendees will gain a deeper understanding of the risks posed by models that are not as open as they claim to be.

For those interested in building applications with open LLMs, we'll outline open-source alternatives and techniques for building traceability into their software.

The presentation concludes by advocating for genuine transparency and accountability in open-source LLMs. Attendees will leave with a call to action, encouraging them to support projects that adhere to open-source principles in both word and spirit.

Reference: Nolan, Michael. "Llama and ChatGPT Are Not Open-Source." IEEE Spectrum, July 2023.

Ballroom F
Saturday, March 16, 2024 - 12:30 to 13:30