Advancing from RAG to Agent Systems: Upgrading from Gen AI 1.5 to 2.0



Over the past year, there has been significant progress in developing solutions based on generative AI foundation models. Initially, most applications used large language models (LLMs), but now multi-modal models that can understand and generate images and videos have emerged, giving rise to the term “foundation model” (FM) as a more accurate descriptor.

These FMs have become more attuned to users’ preferences by learning from feedback and predicting the next tokens that produce responses people find appealing. The generative AI community has also discovered that output formatting matters, with some practitioners reporting that YAML-structured prompts perform better than JSON. This understanding has led to “prompt engineering” techniques, in which carefully crafted prompts guide models toward desired response styles.
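As a minimal sketch of this idea, the function below builds a prompt that tells the model exactly which YAML fields to return instead of free-form prose. The template, field names, and wording are illustrative assumptions, not a real API or a prompt from the original text.

```python
# Prompt-engineering sketch: steer a model toward a fixed, machine-parseable
# response style by spelling out the format in the prompt itself.
# Everything here is an illustrative assumption, not a vendor API.

def build_extraction_prompt(document: str) -> str:
    """Ask the model to answer in YAML rather than free-form prose,
    which some practitioners find models follow more reliably than JSON."""
    return (
        "Extract the key facts from the document below.\n"
        "Respond ONLY with YAML using exactly these fields:\n"
        "title: <one-line title>\n"
        "summary: <two-sentence summary>\n"
        "entities:\n"
        "  - <entity name>\n"
        "\n"
        f"Document:\n{document}\n"
    )

prompt = build_extraction_prompt("Acme Corp reported record revenue in Q3.")
print(prompt.splitlines()[1])  # prints the formatting instruction line
```

The same template can be reused across documents, making downstream parsing of model responses predictable.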

There have been notable improvements in LLM capabilities, particularly in processing larger amounts of information. State-of-the-art models can now handle context windows of up to 1 million tokens, roughly a full-length college textbook. This lets users supply the entire relevant context alongside their questions and obtain accurate answers. For instance, complex legal, medical, or scientific texts can be processed, with reported accuracy of around 85% on relevant entrance exams in these fields.

Furthermore, technology has been developed that leverages LLMs to retrieve text by concept rather than by keyword. New embedding models, such as titan-v2, gte, or cohere-embed, convert diverse sources of text into “vectors” learned from correlations in large datasets. These vectors can be queried in databases or specialized systems, like turbopuffer, LanceDB, and Qdrant, to retrieve similar texts. Such systems have proven successful in scaling up to 100 million multi-page documents, albeit with some performance limitations.
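The mechanics of this kind of retrieval can be sketched without any real embedding model or vector database: represent each document as a vector and return the documents whose vectors are most similar to the query vector. The 3-dimensional vectors below are made-up placeholders; a production system would use a learned embedding model and an index such as the ones named above.

```python
import math

# Concept-based retrieval sketch. Real systems use learned embedding models
# (e.g. titan-v2, gte, cohere-embed) and vector databases; the tiny vectors
# here are fabricated purely to illustrate the mechanics.

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "index": document text -> pretend embedding vector.
index = {
    "Contract law basics": [0.9, 0.1, 0.0],
    "Heart surgery outcomes": [0.1, 0.9, 0.2],
    "Clinical trial design": [0.2, 0.8, 0.3],
}

def retrieve(query_vector, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

# A query vector pointing "toward" the medical documents retrieves them first.
print(retrieve([0.15, 0.85, 0.25]))
# → ['Heart surgery outcomes', 'Clinical trial design']
```

The design point is that similarity is computed in the learned vector space, so documents sharing a concept rank highly even when they share no keywords with the query.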

Despite these advancements, scaling LLM-based applications in production remains a complex endeavor. Many factors, including security, scaling, latency, cost optimization, and data/response quality, must be addressed to optimize these systems.

Looking ahead, the next evolution in generative AI is the development of agent-based systems that integrate multi-modal models and reasoning engines, which are typically LLMs. These systems break down problems into steps and select AI-enabled tools to execute each step, leveraging the results as context for subsequent steps. This approach enables more flexible and complex solutions. For example, medical agent systems can access electronic health records, imaging data, genetic data, clinical trials, medications, and biomedical literature to generate detailed responses and aid in informed decision-making by clinicians.
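The loop at the heart of such a system can be sketched in a few lines: execute a plan of tool calls, feeding each step's result into the next. In a real agent the plan and tool choices would come from the reasoning LLM at runtime; here both the plan and the tools are hard-coded stand-ins, not any actual agent framework's API.

```python
# Minimal agent-loop sketch: a reasoning engine decomposes a task into
# steps, selects a tool for each step, and passes each result forward as
# context for the next step. Plan and tools are hypothetical stand-ins
# for what an LLM would generate dynamically.

def search_records(query: str, previous: str) -> str:
    # Stand-in for querying e.g. electronic health records.
    return f"records matching '{query}'"

def summarize(instruction: str, previous: str) -> str:
    # Stand-in for an LLM call that condenses the previous step's result.
    return f"summary of ({previous})"

TOOLS = {"search_records": search_records, "summarize": summarize}

def run_agent(plan):
    """Execute (tool_name, argument) steps in order; each tool receives
    the previous step's result as extra context."""
    previous = ""
    for tool_name, arg in plan:
        previous = TOOLS[tool_name](arg, previous)
    return previous

# A fixed two-step plan standing in for an LLM-generated one.
final = run_agent([
    ("search_records", "patient 42 imaging"),
    ("summarize", "condense the findings"),
])
print(final)  # → summary of (records matching 'patient 42 imaging')
```

Chaining results this way is what lets agent systems handle problems no single LLM call could: each tool narrows or enriches the context before the next step runs.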

However, without careful optimization, these systems can be expensive to run, since they require numerous LLM calls that consume large numbers of tokens. Ongoing LLM optimization work, including hardware improvements (e.g., Nvidia Blackwell, AWS Inferentia), framework enhancements (e.g., Mojo), and cloud optimizations (e.g., AWS spot instances), must be integrated with these solutions to control costs.
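A back-of-the-envelope calculation shows why agent runs add up: cost scales roughly with calls × tokens per call × price per token. The per-1k-token prices below are hypothetical placeholders, not any vendor's actual rates.

```python
# Cost sketch for a multi-step agent run. Prices are hypothetical
# placeholders; substitute your provider's real rates.

def agent_run_cost(num_calls, avg_input_tokens, avg_output_tokens,
                   input_price_per_1k=0.003, output_price_per_1k=0.015):
    """Estimated dollar cost of one agent run: every step is an LLM call,
    and agent steps often carry large accumulated contexts as input."""
    per_call = (avg_input_tokens / 1000) * input_price_per_1k \
             + (avg_output_tokens / 1000) * output_price_per_1k
    return num_calls * per_call

# A 20-step agent run with 8k-token contexts costs ~$0.63 per run at
# these assumed rates; at thousands of runs per day, optimization pays off.
print(round(agent_run_cost(num_calls=20,
                           avg_input_tokens=8000,
                           avg_output_tokens=500), 2))  # → 0.63
```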

As organizations continue to mature in their use of LLMs, the focus will shift to obtaining high-quality outputs (tokens) quickly and cost-effectively. Finding a partner with real-world experience running and optimizing gen AI-backed solutions in production will be crucial in navigating this evolving landscape.

In conclusion, the progress in generative AI, from LLMs to multi-modal models and agent-based systems, opens up transformative opportunities. However, it is essential to manage the costs associated with these advancements and continuously optimize LLM-based applications to achieve the desired outcomes.
