Mistral’s New Le Chat and the Canvas Interface
Mistral upgrades Le Chat with a canvas interface, web search, and image generation, and introduces Pixtral Large. Improved AI interfaces can lead us to frictionless AI.
Introduction
AI is as much a pursuit of seamless interaction between human and machine intelligence, as it is of frontier science that makes machines reason better. – Mistral AI
Despite recent news about upcoming AI models stalling in their capabilities, AI systems are advancing quickly on many fronts. Every week brings news of more AI progress, better AI models, and new AI application features. The latest example is Mistral’s double-header announcement:
Mistral AI announced an updated Le Chat interface that includes web search, document and image understanding, a canvas interface for code editing and ideation, image generation (with Flux Pro), and task agents.
Mistral AI announced Pixtral Large, a state-of-the-art (on the MathVista, DocVQA, and VQAv2 benchmarks) 124B open-weights multimodal AI model, built on the Mistral Large 2 LLM.
We’ll share more about Mistral’s Le Chat interface and AI interfaces, but before we dive into that, let’s consider AI interfaces in the context of overall AI system capabilities.
The three tiers of AI: Reasoning, RAG, and Interfaces
To understand AI progress, we need to look at all AI system capabilities, not just AI reasoning. It’s helpful to categorize those capabilities in a manner similar to general software application architectures. AI applications are built on the three pillars of reasoning, knowledge, and interfaces.
Memory, computation, and communication are the fundamental activities of information processing performed on data. That is, you can store and recall data from a memory store, transform or process data through a computation, or communicate data from one interface to another. All IT systems can be built on those three categories.
This concept is well known in software architecture as the three-tier architecture. The three tiers:
The presentation tier, or user interface.
The application tier, where data is processed.
The data tier, where application data is stored and managed.
What does it look like to apply this triad to AI applications or an AI agent system?
Presentation layer – AI interfaces: An IT system communicates between its own components and with external systems via interfaces. Interfaces handle both input and output, which for AI applications includes inputs in different modalities and outputs in text or other modalities.
Data processing layer – AI reasoning: The LLM or AI model itself performs the processing, transforming a prompt input into a helpful response. This core AI model capability depends on the level of reasoning and the types of data that can be processed, i.e., the modalities of text, audio, image, and video. The level of reasoning relates to how well the model can correctly understand queries and requests and synthesize well-reasoned responses.
Memory and data layer – AI knowledge retrieval: This layer relates to data, knowledge, or memory. An AI model’s capability in this area is based on the knowledge and data that the AI can access, either as a part of the AI model’s own context and internal parameter representations, or via capabilities like RAG that bring external knowledge sources into the AI model context window.
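To make the triad concrete, here is a minimal sketch in Python of how the three layers might be separated in an AI application. The names (call_llm, KnowledgeLayer, and so on) are hypothetical placeholders rather than any vendor’s actual API, and the keyword-overlap retrieval stands in for a real embedding-based index.

```python
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call (e.g., a hosted LLM endpoint).
    return f"[model response to a {len(prompt)}-character prompt]"


@dataclass
class KnowledgeLayer:
    """Data tier: retrieves external knowledge (e.g., via RAG) for the model."""
    documents: list[str]

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Toy keyword-overlap retrieval; a real system would use embeddings and a vector index.
        words = query.lower().split()
        return sorted(self.documents,
                      key=lambda d: -sum(w in d.lower() for w in words))[:k]


class ReasoningLayer:
    """Application tier: the AI model transforms a prompt plus context into a response."""
    def answer(self, query: str, context: list[str]) -> str:
        prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
        return call_llm(prompt)


class InterfaceLayer:
    """Presentation tier: accepts user input and returns the output
    (plain text here; it could equally be voice, an image canvas, or a code editor)."""
    def __init__(self, reasoning: ReasoningLayer, knowledge: KnowledgeLayer):
        self.reasoning = reasoning
        self.knowledge = knowledge

    def handle(self, user_input: str) -> str:
        context = self.knowledge.retrieve(user_input)
        return self.reasoning.answer(user_input, context)


# Wiring the three tiers together:
app = InterfaceLayer(
    reasoning=ReasoningLayer(),
    knowledge=KnowledgeLayer(documents=["Pixtral Large is a multimodal model.",
                                        "Le Chat added a canvas interface."]),
)
print(app.handle("What did Mistral add to Le Chat?"))
```

The point of the separation is that each tier can improve independently: a better retrieval backend, a stronger model, or a richer interface can each be swapped in without rebuilding the others.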
AI applications improve as their ability to extract, refine, reason over, and communicate knowledge improves. We have advanced AI capabilities on all three fronts of reasoning, RAG, and interfaces:
Improved reasoning with test-time compute, chain-of-thought prompting, and the o1 model.
Improved knowledge retrieval with advanced RAG methods, larger context windows, and connections from AI models to external knowledge stores that expand knowledge capabilities.
Improved AI interfaces that are frictionless and as close to the user as possible, going beyond the chatbot to outputs in artifacts, editable applications, or a canvas. Added modalities such as voice, audio, and vision increase AI model usability and reach.
There are many ways to improve AI models and applications that go beyond scaling. All these avenues are being pursued to make AI better, which together could be as important for useful AI as achieving AGI.
The Canvas AI interface
While the text-based chatbot interface – the ‘killer app’ for AI – is flexible and able to handle a broad range of uses, it is not the best interface for many use cases. Users want AI to come to where they are in their workflow and have the AI interact within the user’s work environment.
For coding, that’s putting AI in a code editor; for AI image generation, the best output mode is an editable image canvas; for conversational AI, a voice-mode interface is most convenient. The best interface and modality will depend on the type of workflow and the specific application.
The canvas is a capable general interface for AI that for many applications is more useful than the chat interface.
Mistral’s canvas is not novel. Anthropic pioneered the feature with Claude Artifacts, released in June, then OpenAI followed up with ChatGPT Canvas in October. The Mistral Canvas confirms this new interface paradigm and improves on it. From Mistral:
No longer restricted to bidirectional conversations, Canvas enables you to directly modify, edit or transform content aided by the powerful reasoning capabilities of LLMs.
Mistral improves on the original artifacts by supporting an incremental iteration loop of creating and modifying code or written output in place.
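Conceptually, a canvas wraps the model in an edit loop: the current work product is sent back with each new instruction and replaced in place. The sketch below illustrates that idea under that assumption; it is not Mistral’s actual implementation, and call_llm is a hypothetical stand-in for a real model API.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call.
    return prompt  # echo, for illustration only


def canvas_session(initial_request: str, edit_instructions: list[str]) -> str:
    """Create a draft, then revise it in place with each follow-up instruction."""
    draft = call_llm(f"Create the following work product:\n{initial_request}")
    for instruction in edit_instructions:
        draft = call_llm(
            "Here is the current draft:\n"
            f"{draft}\n\n"
            "Revise it according to this instruction and return the full updated draft:\n"
            f"{instruction}"
        )
    return draft
```

The user sees only the evolving draft, not the prompt plumbing, which is what makes the canvas feel like editing a document rather than holding a conversation.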
Increasing Utility by becoming an Agent
Mistral not only brought in the Canvas interface, they also added AI agent features that match, and in some cases exceed, the features now in Claude and ChatGPT:
Web search with citations. Much as Perplexity and now ChatGPT Search do, Mistral Le Chat will search the web to help answer knowledge-based queries.
Ability to analyze and summarize large, complex documents (PDFs) and images, using Pixtral Large to interpret image information.
Task agents. Mistral’s task agent is currently quite simple; it automates invoking the same prompt repeatedly.
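In that spirit, a task agent can be thought of as little more than a stored prompt applied repeatedly, on a schedule or over a batch of inputs. The following sketch is a hypothetical illustration of that idea, not Mistral’s implementation; call_llm and run_task_agent are made-up names.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call.
    return f"[response to: {prompt[:40]}...]"


def run_task_agent(prompt_template: str, inputs: list[str]) -> list[str]:
    """Apply one stored prompt to each input and collect the results."""
    return [call_llm(prompt_template.format(item=item)) for item in inputs]


# Example: summarize a batch of documents with a single saved prompt.
summaries = run_task_agent(
    "Summarize this document in three bullet points:\n{item}",
    ["first document text ...", "second document text ..."],
)
```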
Conclusion - Frictionless AI
In past computer eras, software applications required the user to type in exact commands and enter correct switches and parameters without error to get a correct output. Software interfaces were unforgiving and brittle. The GUI and SaaS interfaces improved things, but complex systems required complex interfaces.
AI offers possibilities for truly intelligent and therefore flexible yet powerful interfaces. A frictionless interface is about making AI technology as friendly, helpful, and accommodating as possible. With the right AI interfaces and features, we can get to AI with frictionless interfaces - frictionless AI.
Mistral’s offering confirms some trends in AI development that could lead us towards frictionless AI:
Multi-modality: The best modality, be it voice or text, will be seamless to use. Vision is an important feature, and now with Pixtral Large, Mistral joins the Gemini, GPT-4o, and Claude models in having multi-modal vision capabilities.
Canvas interface with iteration: A canvas interface where the user can edit the work product via a prompt enables an iterative feedback loop, supporting step-by-step improvement of work products such as AI-generated code, webpages, or data visualizations. Iteration also helps the AI interpret user intent more clearly and generate more accurate, refined output.
Frontier models with robust prompt understanding: With existing AI models, prompt engineering is needed to get the desired output. But as AI models improve, they understand user direction better, allowing for less exacting prompting; even a short prompt can convey what the user needs. A more intelligent AI will be more robust to imperfect input prompts. The ultimate would be an AI that ‘reads your mind’ by knowing your prior preferences. As with iteration, it will need feedback over time to adjust to users’ real intentions.
Web search and knowledge grounding: The ability to leverage external knowledge with RAG and web search extends the LLM’s knowledge grounding beyond what the model can recall from its own parameters. The AI model becomes far less subject to hallucinations and therefore much more reliable, a vital feature for knowledge-based queries. A minimal sketch of this grounding pattern follows this list.
Agents and integrations: The ability of the AI to use and control computers enables the automation of many tasks and workflows. With more integrated AI applications and AI agents, you can insert the AI application into your workflow to replace human tasks with automation.
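As referenced above, here is a minimal sketch of the web-search grounding pattern: retrieve sources, number them, and ask the model to answer only from those sources with citations. The functions web_search and call_llm are hypothetical placeholders for a real search API and model endpoint.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call.
    return "[grounded answer citing sources like [1] and [2]]"


def web_search(query: str) -> list[dict]:
    # Stand-in for a real search API; returns title/url/snippet records.
    return [{"title": "Example result", "url": "https://example.com", "snippet": "..."}]


def grounded_answer(question: str) -> str:
    """Retrieve web results, number them as sources, and ask the model to cite them."""
    results = web_search(question)
    sources = "\n".join(
        f"[{i}] {r['title']} ({r['url']}): {r['snippet']}"
        for i, r in enumerate(results, start=1)
    )
    prompt = (
        "Answer the question using only the sources below, citing them as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```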
Thanks to Mistral’s latest announcement, multi-modality with vision, web search, canvas interfaces, and agentic integration are becoming more widespread, to the point where they are table stakes for a frontier AI model. This is good news; improving AI interfaces and features will lead us to frictionless AI, and frictionless AI is more valuable AI.