@@ -5,7 +5,7 @@ AIMon helps developers build, ship, and monitor LLM Apps more confidently and re
 ✨ **Join our community on [Slack](https://join.slack.com/t/generativeair/shared_invite/zt-2jab62lsj-xM9a_s~Qweu8lf3YS2cANg)**
 
 <div align="center">
-<img src="images/aimon-rely-image.png" alt="AIMon" width="650" height="350">
+<img src="images/aimon-rely-image.png" alt="AIMon" width="325" height="175">
 </div>
 
 ## Metrics Supported
@@ -81,9 +81,18 @@ A few key takeaways:
 Overall, AIMon is 10 times cheaper, 4 times faster, and close to or even **better than GPT-4** on the benchmarks
 making it a suitable choice for both offline and online detection of hallucinations.
 
-<div align="center">
-<img src="images/hallucination-benchmarks.png" alt="Hallucination Benchmarks">
-</div>
+| Metric | Aimon Rely v1 | GPT-4 Turbo (LLM-as-a-judge) |
+|---------------------------------------------------------------|------------------------|----------------------------------|
+| Context Length | 32,000 | **128,000** |
+| TRUE Dataset Precision/Recall | 0.808 / 0.922 | **0.810 / 0.926** |
+| SummaC (test) Balanced Accuracy | **0.778** | 0.756 |
+| SummaC (test) AUC | **0.809** | 0.780 |
+| AnyScale Ranking Test for Hallucinations Accuracy | 0.665 | **0.741** |
+| AnyScale Ranking Test for Hallucinations Rel. Accuracy | 0.804 | **0.855** |
+| Avg. Latency | **417ms** | 1800ms |
+| Cost (15M tokens across all benchmark datasets) excluding free tier | **$15** | $158 |
+| Fully Hosted | :white_check_mark: | :white_check_mark: |
+| Explainability | **Automatic sentence-level Scores** | Detailed reasoning with additional prompt engineering |
 
 ### Benchmarks on other Detectors
 
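The "10 times cheaper, 4 times faster" summary in the context lines can be checked against the cost and latency rows of the added table. The following is a minimal sketch that uses only the figures from that table; the dictionary and function names are illustrative assumptions and are not part of any AIMon tooling.

```python
# Figures copied from the benchmark table in the diff above.
# All names here are illustrative only; they are not part of any AIMon API.
AIMON = {"cost_usd": 15.0, "latency_ms": 417.0}
GPT4_TURBO = {"cost_usd": 158.0, "latency_ms": 1800.0}


def ratio(baseline: float, candidate: float) -> float:
    """Return how many times larger the baseline value is than the candidate."""
    return baseline / candidate


cost_ratio = ratio(GPT4_TURBO["cost_usd"], AIMON["cost_usd"])
latency_ratio = ratio(GPT4_TURBO["latency_ms"], AIMON["latency_ms"])

print(f"cost ratio: {cost_ratio:.1f}x")        # 158 / 15
print(f"latency ratio: {latency_ratio:.1f}x")  # 1800 / 417
```

Both ratios agree with the rounded figures in the README text: roughly 10.5x on cost over the 15M-token benchmark set, and roughly 4.3x on average latency.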