e-LLM | Honeycomb

e-LLM

Large language models(LLMs) are prone to confidently hallucinate as well as behave differently depending on training practices and data.

To combat LLM hallucinations and biases in LLM training data, Honeycomb AI introduces a novel algorithm: Ensemblic-distributed Large Language Models(e-LLM).

Our preliminary research and experimentation has shown e-LLM to be more consistent and performant than traditional single-LLM approaches for certain tasks such as detecting prompt injection attempts.

Coupled with traditional semantic-search filtering, e-LLM is the state-of-the-art for prompt injection detection and prevention.

To tackle prompt injection, we propose a coupled algorithm which uses e-LLM as a fail-safe for whenever traditional semantic search algorithms pass.

Traditional semantic search algorithms are mostly deterministic & open-source, resulting in higher confidence and consistency in performance. By only passing the prompt through e-LLM as a fail-safe, we avoid unnecessary resource and time usage.

At the LLM stage, we pass the prompt through the LLMs in parallel to increase speed. Furthermore, the input prompt is wrapped in an injection-safe wrapper to reduce injection success.

We observe a positive relationship between the number of LLMs and prompt injection detection, albeit with diminishing returns as well as increased resource usage.

e-LLM + Semantic Search

e-LLM Perceptron + Semantic Search

Sum

Sigmoid

x1w1

x2w2

x3w3

xnwn

In further experimentation, we propose a slight modification of the e-LLM + Semantic Search algorithm, replacing the meta LLM with a single-layer perceptron.

The main benefits of this modification are the increase in speed, decrease in resource usage, and ability of granular fine-tuning of LLM weighting.

Similar results are observed: a positive relationship between the number of LLMs and prompt injection detection.

We should note that the benefit of granular fine-tuning supposes the need for knowledgeable supervision which may be a negative.

e-LLM + Semantic Search + Anomaly dETECTION

xnwn

x3w3

x2w2

x1w1

Sum

Sigmoid

We further propose monitoring the output & behavior of the LLM's using an anomaly detector.

With the presumption that a prompt injection attack is successful against at least one of the LLM's despite the injection-wrapper, we would observe a deviation from the expected output: a classification digit of 0 or 1.

Even more so, we would likely observe a slight deviation in the LLM's compute time.

This approach not only provides an additional fail-safe mechanism, but it also handles the edge case of the perceptron not being able to provide a classification.