When the term "open source" was first applied to Large Language Models (LLMs), it created some confusion. Open source was originally applied to the source code of computer programs, and LLMs are not source code. What would make an LLM open source? While the Open Source Initiative (OSI) created its own definition of open source generative AI, many in the open source community objected to that definition because it doesn't require the disclosure of training data.
Rather than discuss that topic, this session will focus on open source applications released under good ol' open source licenses that complement LLMs and actually make generative AI useful.
An obvious role for open source is in the area of vector databases. LLMs and their companion embedding models represent text as vectors with many dimensions, and a database to store and retrieve those vectors efficiently is a requirement for most LLM-based applications. Open source has a long history of providing industry-leading database solutions, and the same will happen here.
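As a rough illustration of what a vector database does, here is a toy in-memory store built on cosine similarity. It is a minimal sketch, not a real database: open source projects in this space add persistence, indexing, and approximate nearest-neighbor search on top of the same basic idea.

```python
# Toy in-memory vector store: illustrative only, not a real database.
import numpy as np

class ToyVectorStore:
    def __init__(self):
        self.vectors = []   # embedding for each stored item
        self.payloads = []  # the text (or metadata) each vector represents

    def add(self, vector, payload):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.payloads.append(payload)

    def search(self, query, k=3):
        """Return the k stored payloads most similar to the query vector."""
        q = np.asarray(query, dtype=float)
        scores = [
            float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
            for v in self.vectors
        ]
        best = np.argsort(scores)[::-1][:k]  # highest cosine similarity first
        return [(self.payloads[i], scores[i]) for i in best]
```

A production system would swap this class for one of the open source vector databases discussed in the session, but the store/search shape of the interface stays the same.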
Another role for open source lies in Retrieval-Augmented Generation (RAG). RAG allows users to enrich LLMs with their own custom data. For example, one common use of RAG is to take a project's documentation and use it to create an agent that can answer user questions.
In order to fully utilize RAG, that user data has to be broken up into "chunks" that can be consumed by the software. Once again open source steps in with a number of "chunkers" to help automate this process.
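Here is a minimal sketch of what a chunker does, assuming the simplest possible strategy of fixed-size, overlapping character chunks. Real open source chunkers split on headings, sentences, or code structure, and the prompt format shown is just one plausible way to combine retrieved chunks with a user's question.

```python
# Naive "chunker" plus prompt assembly for a RAG workflow (sketch only).

def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping some overlap for context
    return chunks

def build_prompt(question, retrieved_chunks):
    """Assemble an augmented prompt: retrieved documentation plus the question."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer the question using only the documentation below.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

In a full pipeline, the chunks would be embedded, stored in a vector database, and the most relevant ones retrieved at question time before building the prompt.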
One of the most powerful features of Linux/Unix operating systems is the idea of a "pipe," where the output of one process becomes the input of another. Expect open source to play a similarly huge role in generative AI through the creation of great, special-purpose tools that combine to create something greater than the sum of their parts.
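To make the analogy concrete, here is a small sketch of the pipe idea applied to generative AI tooling: small, single-purpose functions composed into a pipeline. The stage names in the commented example (load_docs, chunk, embed, index) are hypothetical placeholders, not any particular project's API.

```python
# Compose small tools the way a shell composes commands with "|".
from functools import reduce

def pipeline(*stages):
    """Chain stages left to right, like `cmd1 | cmd2 | cmd3` in a shell."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Hypothetical usage: each stage takes the previous stage's output as input.
# ingest = pipeline(load_docs, chunk_text, embed, index)
# ingest("docs/")
```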



