Summary
Last week, the first “AI Engineer Summit” was held in San Francisco, and it brought together startup founders, ML & AI engineers, open source evangelists, hackers, and AI researchers to share ideas on building the AI ecosystem and AI applications.
This review was delayed from last week (see my personal note here), but I feel it is important to take note of this event, for one reason: AI Engineers are the people building the AI applications. If you want to know what the earth-shattering AI applications of tomorrow will be, look to what AI Engineers are creating today - what problems they are solving, what creative ideas they are pursuing, and so on.
The AI Engineer Summit served as a helpful guide for understanding the state of play of the AI Application stack and the AI ecosystem. You could break it into a few categories:
AI Coding Tools: AI tools that make software engineers and AI engineers more productive
AI Systems Infrastructure: The components that build out the AI ecosystem
AI Applications and Agents: The AI applications and agents themselves - the end products built on top of the rest of the stack
Concepts for building better AI systems: This covers AI UX.
AI Coding Tools for AI Engineers
Codium: Their pitch is a tool for the whole software development process, not just a code-writing tool. The solution needs to be like GANs: a generative component paired with a critic component. Their demo showed running tests, then reviewing and correcting them, inside their system.
Replit: Replit makes a web-based coding runtime environment. They built an early AI capability into their system, called Ghostwriter. Their CEO declared “AI has redefined software creation,” claiming the AI-enhanced engineer will be 10x-1000x more productive in the next decade. They announced ModelFarm, which gives access to models inside their IDE.
Replit also announced an updated open-source AI coding model: replit-code-v1.5-3b. This repl-tuned version was trained on about 1 trillion tokens and achieved 36% on HumanEval Python, impressive for such a small model, beating CodeLlama 7B and last year's davinci model, which is roughly 50 times its size. They used a finding from the paper "Scaling Data-constrained Language Models": in a data-constrained regime, repeating tokens up to 4 times is nearly as good as unique data, with diminishing returns beyond that. So they repeated their data 4 times when training the model, producing excellent results relative to the parameter count.
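As a rough illustration of that finding (this is not Replit's actual training pipeline, and the token counts are made up), the data-budgeting logic looks something like this:

```python
# Minimal sketch of data budgeting in a data-constrained regime: repeat the
# available corpus up to ~4 times to reach a target token budget, following the
# "Scaling Data-constrained Language Models" finding that up to ~4 passes of
# repetition is nearly as good as fresh data, with diminishing returns beyond that.

def training_token_budget(corpus_tokens: int, target_tokens: int, max_repeats: int = 4) -> int:
    """Return how many tokens to train on given a limited unique corpus."""
    # Cap the budget at max_repeats passes over the unique data; beyond that,
    # extra repetition is mostly wasted compute.
    return min(target_tokens, corpus_tokens * max_repeats)

# Illustrative numbers: ~250B unique tokens and a 1T-token budget -> ~1T tokens (4 passes).
print(training_token_budget(corpus_tokens=250_000_000_000, target_tokens=1_000_000_000_000))
```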
AI Systems Infrastructure
The open source framework LangChain is perhaps the most important AI ecosystem component that helps build out AI applications. LangChain’s CEO and founder spoke on the latest in both LangChain and LangSmith, which is a testing and monitoring environment built on LangChain.
Roboflow provides developer-friendly vision inference. They have open-sourced their inference server, and it can serve up models like CLIP and SAM that can be used for a variety of computer vision and image recognition tasks.
The Bloke spoke on running LLMs locally and on LLM quantization, including llm.rs, a Rust library similar to llama.cpp: “llm is an ecosystem of Rust libraries for working with large language models - it's built on top of the fast, efficient GGML library for machine learning.”
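The talk itself was about Rust tooling; as a rough Python analogue (using the llama-cpp-python bindings, which load the same kind of GGML/GGUF quantized checkpoints, and with a hypothetical local model path), running a quantized model locally looks something like this:

```python
# Illustrative sketch using llama-cpp-python, which wraps llama.cpp and loads
# quantized GGML/GGUF models on local hardware. The model path is hypothetical;
# substitute any quantized checkpoint you have downloaded (e.g. a Q4 quant).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,   # context window
    n_threads=8,  # CPU threads for inference
)

output = llm(
    "Explain LLM quantization in one sentence.",
    max_tokens=64,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```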
PromptHub.us provides templates for prompts, and gave a talk on prompting for better results (a short sketch of these patterns follows the list):
Use an emotional prompt, such as "This is very important to me,” to get more helpful responses.
Use an “according to” modifier to avoid hallucinations, e.g. “according to wikipedia.”
Use multiple personas.
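A minimal sketch of these three patterns as prompt strings; the question and the personas are placeholders, and any LLM client can consume the resulting prompts:

```python
# Sketch of the prompting patterns from the talk; question and personas are placeholders.
question = "Summarize the key risks of deploying an LLM-based support bot."

# 1. Emotional prompt: adding an emotional stake tends to elicit more helpful effort.
emotional_prompt = f"{question} This is very important to me."

# 2. "According to" modifier: anchoring to a source helps avoid hallucinated answers.
grounded_prompt = f"According to Wikipedia, {question}"

# 3. Multiple personas: ask for several expert perspectives in one pass.
persona_prompt = (
    "You are three experts: a security engineer, a support lead, and a lawyer. "
    f"Each of you answers in turn: {question}"
)

for prompt in (emotional_prompt, grounded_prompt, persona_prompt):
    print(prompt, end="\n\n")
```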
Retrieval Augmented Generation and Vector Databases
Chroma is a provider of vector databases, which give LLM-based AI systems access to data and memory. They spoke on challenges in Retrieval Augmented Generation, or RAG. Some points they made:
Which embedding model? Information retrieval benchmarks may not fit use cases, so use human feedback.
You want the retrieval to get all relevant information, but not irrelevant information.
How to chunk the data in the vector database? You need to consider semantic content, natural structure, and context length. NLTK, LangChain, and LlamaIndex support chunking and information hierarchies (a minimal chunking sketch follows this list).
Is the retrieval result relevant? If not, then use reranking and human feedback.
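Here is a hedged sketch of that chunk-and-retrieve loop, using LangChain's text splitter and a local Chroma collection; the file name, chunk sizes, and query are illustrative:

```python
# Hedged sketch: chunk a document with LangChain's text splitter, then index the
# chunks in a local Chroma collection. Chunk size and overlap values are illustrative.
import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = open("handbook.txt").read()  # hypothetical source document

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk; tune to semantic structure and context length
    chunk_overlap=50,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_text(text)

client = chromadb.Client()
collection = client.create_collection(name="handbook")
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)

# Retrieve only a few nearest chunks, to get relevant information without flooding
# the LLM with irrelevant context.
results = collection.query(query_texts=["What is the vacation policy?"], n_results=3)
print(results["documents"][0])
```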
Jerry Liu of LlamaIndex spoke on paradigms for inserting knowledge into LLMs, specifically covering issues of RAG and LLM fine-tuning. As with the Chroma talk, the RAG discussion was about improving upon the basic concept of RAG to get better results (response quality) than ‘naive’ RAG. Bad retrieval is due to low precision, low recall or outdated information. Ways to improve include:
Changing chunk sizes: It is not obvious that more retrieved tokens mean better results.
Metadata filtering, where you include structured information such as page numbers or document titles. Raw semantic search has low precision; metadata makes for an implied structured query and improves results.
Retrieve on smaller chunks, but give the LLM more information: for example, embed a small chunk that references its larger parent chunk, then hand the parent to the LLM at synthesis time (a sketch follows below).
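A generic Python sketch of that "small-to-big" pattern combined with metadata filtering; this is not LlamaIndex's actual API, and the Chunk structure and dot-product ranking are illustrative:

```python
# Generic sketch of "small-to-big" retrieval: embed and search small child chunks,
# but hand the LLM the larger parent chunk, optionally pre-filtering on metadata
# such as document title or page number.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str            # small chunk that gets embedded and searched
    parent_text: str     # larger parent passage returned to the LLM
    metadata: dict       # e.g. {"title": ..., "page": ...}
    embedding: list[float]

def retrieve(query_emb: list[float], chunks: list[Chunk],
             metadata_filter: dict | None = None, top_k: int = 3) -> list[str]:
    # 1. Metadata filter first: raw semantic search alone is low precision.
    candidates = [
        c for c in chunks
        if not metadata_filter
        or all(c.metadata.get(k) == v for k, v in metadata_filter.items())
    ]
    # 2. Rank the small chunks by similarity (plain dot product for simplicity).
    candidates.sort(
        key=lambda c: sum(a * b for a, b in zip(query_emb, c.embedding)),
        reverse=True,
    )
    # 3. Return the *parent* text so the LLM sees fuller context at synthesis time.
    return [c.parent_text for c in candidates[:top_k]]
```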
The flavor of the talks is that we are still exploring how to use RAG effectively; methods and design practices that make RAG more effective and efficient are being developed right now.
Supabase: They are a provider of Postgres databases, and their AI-relevant product is pgvector, the vector embedding extension for Postgres. They spoke about how they built this feature to become part of the AI stack.
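A hedged sketch of what using pgvector from Python looks like; the table, column names, embedding dimension, and connection string are hypothetical, while the vector type and the <-> distance operator come from the pgvector extension itself:

```python
# Illustrative pgvector usage via psycopg2. Table/column names and the DSN are
# placeholders; `vector(1536)` and the `<->` (L2 distance) operator are pgvector features.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/app")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    );
""")

# Nearest-neighbour search: order rows by distance to a query embedding.
query_embedding = [0.01] * 1536  # stand-in for a real embedding
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;",
    (vector_literal,),
)
for (content,) in cur.fetchall():
    print(content)
```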
AI UX - AI User Experience
Amelia Wattenberger of Adept spoke on the issue of AI UX. AI interfaces tend to have two goals, automation and augmentation, and augmentation is composed of smaller automations. How do we use AI to help automate decision processes? Arguing that “chatbots are not the future,” she used the example of how Google Maps hides and exposes information seamlessly to highlight the importance of UI in expressing the value of AI.
Fixie: Fixie builds the Fixie Platform and offers AI.JSX for creating reactive LLM-based apps. AI.JSX is open source and provides React integration for high-performance LLM interfaces. For example, you can perform RAG with vector databases in 10 lines of code. They presented a “Dr Donuts” chatbot demo showing how AI.JSX supports real-time voice conversations, grounded on documents and data.
Low-Code AI System Builders
Entrypoint AI: They presented a no-code fine-tuning platform for AI models, where you can do basic fine-tuning of OpenAI and other models and create custom AI models without writing code. You need to identify the tasks, import examples, and so on. They present this as a useful platform for creating narrow custom model types for business use cases: blog writer, lead qualifier, issue prioritizer, data normalizer, email scrubber, fraud detector.
n8n provides “workflow automation for technical people” and now has an n8n LangChain integration: a low-code platform to build an LLM app in minutes with LangChain. You can integrate tailored AI functionality with custom datasets via LangChain and combine it with 398 pre-built connectors for automated workflows.
AI Agents
One of the first and most viral of the AI Agents to pop up this year was AutoGPT. AutoGPT gave a presentation that was mainly a recap of their viral success, which was stunning: AutoGPT hit 150k stars on GitHub, has 462 contributors, and has an online community of 47,000. In just six months, it has attracted hundreds of developers building on it and thousands of hackers trying it out.
Their stated goal is to make “AutoGPT the heart of the open-source agent ecosystem.” The vision is to use AI agents for menial mental tasks - “In this world, we are all AI Engineers.” They also announced that Redpoint Ventures has invested $12 million in AutoGPT; they were emphatic about keeping AutoGPT open source and are working with the AI Engineer Foundation.
One way to avoid reinventing the wheel is to standardize and unify common protocols. The AI Engineer Foundation project agentprotocol.ai aims to standardize a common REST API and schema for agent systems.
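To make that concrete, here is an illustrative sketch of a minimal agent server; this is not the official Agent Protocol reference implementation, and the endpoint paths and schema are a paraphrase of the kind of task/step interface such a protocol standardizes:

```python
# Illustrative sketch: a minimal FastAPI server exposing task/step endpoints of the
# kind an agent protocol standardizes, so different front-ends can drive any
# compliant agent. Paths and fields here are assumptions, not the official spec.
from uuid import uuid4
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
tasks: dict[str, dict] = {}  # in-memory store for the sketch

class TaskRequest(BaseModel):
    input: str  # natural-language instruction for the agent

@app.post("/ap/v1/agent/tasks")
def create_task(body: TaskRequest) -> dict:
    task_id = str(uuid4())
    tasks[task_id] = {"task_id": task_id, "input": body.input, "steps": []}
    return tasks[task_id]

@app.post("/ap/v1/agent/tasks/{task_id}/steps")
def execute_step(task_id: str) -> dict:
    # A real agent would plan and act here; the sketch just records a step.
    step = {"step_id": str(uuid4()), "output": f"echo: {tasks[task_id]['input']}"}
    tasks[task_id]["steps"].append(step)
    return step
```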
Flo Crivello of lindy.ai pitched the idea of personalized AI Agents.
AI Agents show great promise, but aren’t quite ready to take on many real tasks. The chasm from ‘interesting toy’ to ‘useful production tool’ hasn’t been crossed, but there are many people working on the problem and progress seems to be made weekly.
AI Project Hacker Advice
Hassan El Mghari, who goes by nutlope on X, has done a number of great AI projects that went viral, including roomGPT, which reached 2 million users. His advice on how to make AI magic happen:
KISS: The simplest apps can do really, really well, because most people don't know how to use AI. Downscope the project to an MVP (minimum viable product). Launch early and then iterate.
Don't build or fine-tune your own AI models. Use the latest off-the-shelf AI models.
AI UX matters. Make it visually appealing; he spends 80% of his time on UI. Use Next.js for the frontend, Tailwind for styling, and Vercel for deploying the apps.
Make it free and open source.
Summary
This first AI Engineer Summit didn't cover the full gamut of what's going on in the ever-growing AI ecosystem, but it did cover some of the most important areas and brought together some of the most important players. Likewise, this review doesn't do justice to many excellent talks that I don't have space to cover.
Missing were the big foundation AI model makers, like OpenAI and Google, but they didn’t need to be there. This was about the AI Engineers who build on top of the LLMs to make applications, and the tools and infrastructure they need to build AI applications. In any case, we will find out in early November about OpenAI’s plans for AI developers.
The keynotes from Simon Willison and Swyx that book-ended the summit both painted a picture of AI being in a very special moment, the start of something very big. The AI Revolution has just begun.