July 29, 2024: What does Llama-3.1 mean for agents?

Tool calling makes Meta's new models fully agent-capable, AgentOps comes to AutoGen, and more

Jul 29, 2024

🔍 Spotlight

Meta’s much-anticipated Llama-3.1 models are out, bringing new features that make the open-source large language models easier to use for building agentic systems.

Excitement for the 3.1 generation, the latest in Meta’s Llama series of open-source LLMs, has been building ever since the release of Llama-3 in April of this year, particularly after it was announced that the largest would be a 405 billion parameter model, comparable in size and performance to flagship closed-source LLMs such as GPT-4 and Claude Opus. Released on July 23 (perhaps as a birthday present to your humble author), Llama-3.1 comes in three sizes (8, 70, and 405 billion parameters) and incorporates a slew of new features, such as a much longer context length of 128,000 tokens (versus the previous 8,192), multilingual capabilities, and a license that even allows its outputs to be used to train other LLMs.

One enhancement, however, is particularly critical to agent builders: tool calling. A feature which already comes standard with OpenAI’s and Anthropic’s models, tool calling allows the LLM to invoke pre-defined functions to perform actions which would otherwise be difficult or impossible for a language model, such as searching the web, solving mathematical problems, or executing Python code. The new Llama models come with three built-in functions, one for each of these tasks, as well as the ability for users to define their own custom functions and equip the LLM with them.
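For readers who have not wired up tool calling before, here is a minimal sketch of the pattern in Python. It is not Meta's implementation: the `tool` decorator, the `TOOLS` registry, the `dispatch` helper, and the two example tools are all hypothetical names, and the JSON shape (`name` plus `parameters`) is an assumed format, since each model and serving stack defines its own. No model is actually called; a hard-coded string stands in for the LLM's output.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Return the sum of two numbers."""
    return a + b


@tool
def word_count(text: str) -> int:
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call and run the matching registered tool.

    Assumes the model replies with {"name": ..., "parameters": {...}};
    this format is an illustration, not a documented standard.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("parameters", {}))


# Stand-in for text an LLM might generate when it decides to use a tool.
fake_model_output = '{"name": "add", "parameters": {"a": 2, "b": 3}}'
print(dispatch(fake_model_output))
```

In a real agent loop, the value returned by `dispatch` would be fed back to the model as a new message so that it can compose its final answer. The explicit registry is a deliberate choice: the model can only trigger functions the builder has opted in.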

With this essential piece in place, Meta appears to be leaning into agents as a use case, mentioning them several times in the announcement and concurrently releasing Llama Agentic System, a Python package designed to facilitate agent building using the new models. Third-party providers recognized the significance of Llama-3.1’s tool capabilities as well, with the LLM orchestration platform LangChain pointing out that they bring fully local agents, run on one’s own hardware, closer to reality. Until now, capable tool-using agents had been feasible only with models from closed-source providers such as OpenAI and Anthropic.

While the AI agent field is evolving rapidly and unpredictably, the release of Llama’s latest generation undoubtedly represents a significant step forward in the democratization of agent capabilities, bringing the power of agentic systems to smaller enterprises and individual builders on their own terms, rather than those of large LLM providers.

📰 News

AutoGen teams up with AgentOps to enhance agent observability

“AgentOps” has become increasingly common lingo in the same way that MLOps and LLMOps did before it, and a startup of the same name has positioned itself as a leader in the field. Now, AgentOps’ tech has been integrated into Microsoft’s AutoGen agent framework as its official observability tool.

Agent hackathon winners announced

The AI Agents 2.0 hackathon, held July 20-21 in San Francisco by agent startups MultiOn and the aforementioned AgentOps, announced its winning entries. These included tools for identifying relevant business trends, personal assistants for maintaining focus and applying for jobs, a real-time debate fact-checker, and more.

AgentGPT founders raise funds for new web-scraping agent startup

AgentGPT—along with AutoGPT, BabyAGI, and others—was part of the original wave of AI agents which provided the field’s first proofs-of-concept in spring 2023. Now, its builders are back with a new startup called Reworkd. Backed by funding from Paul Graham, Y Combinator, and others, it aims to build a new generation of agents which can intelligently scrape the web.

Flowise adds Sequential Agent Workflows

Flowise, the popular low-code GenAI and agent framework, is out with a major update adding a feature called “Sequential Agent Workflows”, which enables agents to exert control over the sequence in which flow blocks are executed.

Vijil raises $6 million to audit agent outputs

One of the most critical risks of AI agents is that they will produce false, toxic, or even dangerous outputs. The new startup Vijil has raised $6M to build out its platform, which provides suites of automated tests that let agent builders verify the safety of their agents’ generations.

LLM-based agent discovers new exoplanets?

Identifying exoplanets in astronomical data involves a complex series of steps which require expert human judgment and do not lend themselves easily to machine learning alone. The author of this piece claims to have used an agentic system based on fine-tuned Llama-3 models to identify a new exoplanet candidate, though the piece does not link to any corroborating paper or data.

🧪 Research

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

The eagerly awaited technical report for OpenDevin, an open-source rival to Cognition’s Devin software engineering agent, is out. OpenDevin utilizes a multi-agent system capable of browsing the web, running code, and using a command line interface to perform code creation and repair.

Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems

Agent-E is a new web-browsing agent which combines hierarchical planning with a novel method for removing noise from webpages to achieve SOTA results on the WebVoyager benchmark, even without multimodality. The agent is built with the AutoGen framework and won plaudits from AutoGen’s creator Chi Wang.

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

The authors of this paper introduce Recursive IntroSpEction (RISE), a method for enabling an LLM to fine-tune itself by generating responses to a prompt, scoring the answers, and using the results as training data for further fine-tuning.

TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON

TaskGen is a modular agent system which functions by recursively delegating each task to a function or another agent, saving on token costs. It achieves impressive results across multiple types of agent tasks, including maze navigation, web browsing, and the MATH dataset of mathematical reasoning problems.

🛠️ Useful stuff

A roundup of useful agent and RAG apps

A curated list of LLM-based apps, including AI agents and RAG applications.

A cookbook for building a standard genAI application

Agent builders often find that the systems they build converge on similar architectures. The author of this piece abstracts this lesson into a general scheme for an agent with guardrails, database access, model routing, and more, presenting a highly detailed walkthrough on how to build the described system.

💡 Analysis

Why agents are the next frontier of generative AI

A new report by McKinsey gives an overview of AI agents, aimed particularly at business leaders, and enumerates some of the most promising use cases. The authors present a bullish take on the future of agents, while acknowledging that the technology is still in an early stage of development.

AI agents will outnumber humans, Zuckerberg predicts

In a new interview, Meta founder and CEO Mark Zuckerberg predicted that AI agents would eventually grow to outnumber humans, with each business represented by its own agent in the same way that each currently has an email address and a website.

Building AI Agents
