How Do We Avoid Polluting Our LLMs?
The rise of Large Language Models
Large Language Models (LLMs) have quickly captured widespread attention, even joining the conversation at family dinners, thanks to their ability to engage users with compelling, impactful results that evolve based on user input. LLMs’ power comes from their ability to distill information and make meaningful connections that lead to recommended content for users to consume.
One commonly discussed defect of LLMs is their tendency to produce ‘hallucinations’: outputs that assert a relationship between two pieces of information that does not exist in the real world.
The problem of hallucinations in LLMs
Hallucinations are unexpected outputs, often triggered by edge cases in data and relationships. Beyond random manifestations in the data, there is the risk of user input leading to hallucinations. This input, whether intentional or unintentional, becomes a form of ‘pollution’ in the model: interactions by users that lead to adverse outputs from the LLM and the applications built on it.
Pollution in LLMs
What is LLM pollution (and how do we manage it)?
Pollution in our models can come from many sources, including incorrect or incomplete training data, user interaction in a low-resource language, and incorrect domain-specific terminology, and it can lead to unexpected behavior such as hallucinations or other incorrect results. To keep our models from being polluted by these and other sources, our organization must first define what constitutes pollution by setting standards for model outcomes, language usage, responses, and accuracy levels.
Once our policies are defined, we have multiple options for where and how we monitor for and correct pollution within our model training and deployment processes:
- Monitor User Inputs – User inputs to LLMs can lead to new associations being created for content. We must log, monitor, and periodically review user input, evaluating it against expected patterns of user interaction and appropriate use of terms, including whether acronyms apply in our specific domain or industry.
- Monitor Training Data Completeness – The completeness of our training data plays a significant role in the resulting accuracy of our LLMs. As we train our LLMs, both generic and industry-specific, we must ensure that we do not inadvertently create gaps in subsets of domain knowledge or skipped relationships in common semantic models.
- Monitor Output Drift – At its simplest, we must regularly ask our LLMs the same set of questions to ensure that the output does not drift outside our policies and standards for accuracy. This process should be automated to run at regular intervals and be triggered by events such as large increases in users, new domains of data being added through training, or shifts in industry norms that lead to different applications of common terms.
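The user-input monitoring described above can be sketched with a simple review function. This is a minimal illustration, not a production implementation: the approved acronym list and the flagged patterns are hypothetical stand-ins for the terminology standards and misuse patterns an organization would define in its own policies.

```python
import re

# Hypothetical domain vocabulary; in practice this would come from the
# organization's own terminology standards.
APPROVED_ACRONYMS = {"LLM", "API", "KPI"}

# Hypothetical patterns of adversarial input worth flagging for review.
FLAG_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

def review_user_input(text: str) -> dict:
    """Review one logged user input: flag suspicious patterns and collect
    unrecognized acronyms for periodic human review."""
    findings = {"suspicious": [], "unknown_acronyms": []}
    for pattern in FLAG_PATTERNS:
        if pattern.search(text):
            findings["suspicious"].append(pattern.pattern)
    # Treat any all-caps token of two or more letters as a candidate acronym.
    for token in re.findall(r"\b[A-Z]{2,}\b", text):
        if token not in APPROVED_ACRONYMS:
            findings["unknown_acronyms"].append(token)
    return findings
```

A review like this would run over the input log at regular intervals, with flagged entries routed to a human reviewer rather than acted on automatically.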
LLMs evolve faster than other forms of analytical models because they interact directly with users, and that interaction can intersect with training data in unpredictable ways. Through both intentional and unintentional user actions, we can end up with polluted models that produce incorrect and incomplete results, including hallucinations.
Through clear policies that drive the implementation of quality controls, model monitoring, and alerting, we can catch pollution before it manifests in ways that break business processes or erode user trust.