Adding Multi-Modal Support to a Chatbot Without Rebuilding Backend
This case study shows how we integrated images, audio, and documents into a chatbot with Acontext. Unified session management and message storage reduced a 5–7 day build to just one day.

TL;DR
We added multi-modal support (images, audio, and documents) to a chatbot using Acontext, which simplified session management, message storage, and multi-format message conversion. What would have taken us 5-7 days to implement from scratch was reduced to 1 day with Acontext. In this post, we break down the process, from integrating Acontext for session management to enabling multi-modal capabilities, showing how this approach can save developers significant time and effort.
The Goal: Enabling Multi-Modal Support in a Chatbot
The objective was to add support for images, audio, and documents to the chatbot, while ensuring that the conversation history and user sessions were handled properly. These features not only enhance the chatbot's interaction capabilities but also demand a backend that can handle a variety of data types and provide seamless storage and retrieval.
Challenges We Faced:
- Message Format Conversion: Different LLM providers, such as OpenAI, Anthropic, and Gemini, use different message formats, which complicates integration (see the format sketch after this list).
- Multi-Modal Handling: Managing images, audio, and documents across various platforms required writing custom handling code for each format.
- Session Management: Rolling our own session management meant dealing with databases, edge cases, and migrations.
- Token Management: Truncating conversation history to stay within token limits while preserving important context was a tricky problem to solve.
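To make the format problem concrete, here is roughly how the same image-bearing user message looks in OpenAI's and Anthropic's chat formats. The field names follow each provider's public API, but treat this as an illustrative sketch rather than a complete reference:

```typescript
// One user turn with text plus an image, expressed in two providers' formats.
// Illustrative only - consult each provider's docs for the authoritative schema.

// OpenAI Chat Completions style
const openaiMessage = {
  role: "user",
  content: [
    { type: "text", text: "What is in this picture?" },
    { type: "image_url", image_url: { url: "data:image/png;base64,iVBORw0..." } },
  ],
};

// Anthropic Messages style
const anthropicMessage = {
  role: "user",
  content: [
    { type: "text", text: "What is in this picture?" },
    { type: "image", source: { type: "base64", media_type: "image/png", data: "iVBORw0..." } },
  ],
};
```

Keeping both shapes (plus Gemini's) in sync by hand is exactly the conversion work we wanted to avoid.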
How Acontext Simplified the Process
Step 1: Setting Up Basic Session Management
The first task was to set up session management. Acontext made it easy by handling session storage automatically. Here’s what we did:
- Set up the Acontext client: We integrated the Acontext client into the project to manage message persistence and session creation.
- Replaced in-memory message handling: Instead of manually managing message history, we used Acontext sessions to automatically store and retrieve user messages (a minimal sketch follows this list; the full route handler is in Step 1.4).
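In code, the core of Step 1 boils down to two calls, both of which appear in the full route handler later in this post. A minimal sketch, assuming a `userId` is already available from your auth layer:

```typescript
import { acontextClient } from "@/utils/acontext/client";

// Inside an async request handler; `userId` comes from your auth layer.
const session = await acontextClient.sessions.create({ user: userId });

// Persist each turn; here the message is stored in OpenAI chat format.
await acontextClient.sessions.storeMessage(
  session.id,
  { role: "user", content: "Hello!" },
  { format: "openai" }
);
```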
Step 2: Adding Multi-Modal Support
With basic session management in place, we added multi-modal support for images, audio, and documents:
- File upload UI: We built a simple user interface to upload images, audio, and documents.
- Convert files to base64: The files were encoded as base64, making them compatible with the message format we needed for OpenAI (illustrated right after this list).
- Store multi-modal messages: Acontext handled the storage and conversion of multi-modal content, ensuring compatibility with multiple AI models.
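The base64 step is plain browser code. The FileUpload component in Step 2.1 does this with `FileReader`; an equivalent sketch using `arrayBuffer` looks like this:

```typescript
// Browser-side helper: encode a selected File as base64 for the message payload.
// Equivalent to the FileReader approach used by the FileUpload component below.
async function fileToBase64(file: File): Promise<string> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}
```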
Time Estimate for Integration
| Task | Time |
| --- | --- |
| Basic integration | ~2-3 hours |
| Multi-modal support | ~3-4 hours |
| Testing & debugging | ~2 hours |
| Total | ~1 day |
Challenges Without a Backend vs. Acontext Approach
Here's a comparison of what we would have had to do manually versus how Acontext simplified each step:
| Challenge | Without Acontext | With Acontext |
| --- | --- | --- |
| Message Format Conversion | Each LLM provider (OpenAI, Anthropic, Gemini) uses a different message format, requiring manual conversion between them. This is tedious and error-prone. | Acontext handles format conversion automatically, so messages can be stored in one format (e.g., OpenAI) and retrieved in another (e.g., Anthropic). |
| Multi-Modal Complexity | Handling images, audio, and documents requires building different JSON structures for each LLM, making the integration complex and prone to mistakes. | Acontext simplifies multi-modal support by handling the storage and format conversion of base64-encoded images, audio, and documents. |
| Session Persistence | Rolling your own session management means building and maintaining databases, handling edge cases, and dealing with data migrations. | Acontext provides built-in session management, automatically storing messages and creating sessions without custom database schemas. |
| Token Management | Truncating conversation history while maintaining context coherence requires careful implementation, as each provider has different token limits. | Acontext provides edit_strategies with token_limit, making it easy to truncate history while preserving context automatically. |
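The token-management row is the one we have not shown real code for in this post, so treat the following as a hypothetical sketch of the idea only: the method name and option keys below are assumptions, not the documented Acontext API.

```typescript
// HYPOTHETICAL sketch - illustrates the edit_strategies / token_limit idea only.
// The actual method and option names may differ; check the Acontext docs.
const history = await acontextClient.sessions.getMessages(session.id, {
  format: "openai",
  edit_strategies: [{ type: "token_limit", max_tokens: 8000 }],
});

// `history` would then be passed to the LLM call already trimmed to the token budget.
```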
Possible Workloads Without Acontext
| Step | Time |
| --- | --- |
| Database schema design | 4-6 hours |
| Message storage API | 6-8 hours |
| Multi-modal format handling | 8-12 hours |
| Multi-provider conversion | 10-15 hours |
| Token management | 4-6 hours |
| Testing & edge cases | 6-8 hours |
| Total | 5-7 days |
Key Takeaway:
What would have taken 5-7 days to build manually (session management, format conversion, multi-modal handling) was reduced to 1 day with Acontext. The biggest win was not having to worry about message format differences between LLM providers — just store and retrieve, and Acontext handles the rest.
Step-by-Step Breakdown
Current Message Flow
User Input → API /api/chat → OpenAI (streamText) → Stream Response
↓
Memobase stores messages → Extract user profiles/events

- Text-only messages - No support for images, audio, or documents
- No session persistence - Messages are not persisted in Acontext format
Target Message Flow
User Input (text/image/audio/document)
↓
API /api/chat
↓
┌───────────────────────────────────────┐
│ 1. Get/Create Acontext Session │
│ 2. Store user message to Acontext │
│ 3. Call OpenAI with multi-modal input │
│ 4. Store assistant response │
└───────────────────────────────────────┘
↓
Stream Response + Update UI
Detailed Migration Steps
Phase 1: Basic Integration (Replace Message Storage)
Step 1.1: Install Dependencies
pnpm add @acontext/acontext
Step 1.2: Create Acontext Client
New file: utils/acontext/client.ts
import { AcontextClient } from '@acontext/acontext';
export const acontextClient = new AcontextClient({
apiKey: process.env.ACONTEXT_API_KEY!
});
Step 1.3: Add Environment Variables
Modify file: .env.example and .env
# Acontext Configuration
ACONTEXT_API_KEY=sk-ac-your-api-key
Step 1.4: Update Chat API for Acontext
Modify file: app/api/chat/route.ts
import { openai } from "@/lib/openai";
import { jsonSchema, streamText } from "ai";
import { createClient } from "@/utils/supabase/server";
import { acontextClient } from "@/utils/acontext/client";
export const maxDuration = 30;
export async function POST(req: Request) {
const supabase = await createClient();
const { data, error } = await supabase.auth.getUser();
if (error || !data?.user) {
return new Response("Unauthorized", { status: 401 });
}
try {
const { messages, tools, sessionId } = await req.json();
// 1. Get or create Acontext Session
let session;
if (sessionId) {
session = { id: sessionId };
} else {
session = await acontextClient.sessions.create({
user: data.user.id
});
}
// 2. Store user message to Acontext
const lastUserMessage = messages[messages.length - 1];
await acontextClient.sessions.storeMessage(session.id, lastUserMessage, {
format: 'openai'
});
// 3. Build system prompt
const systemPrompt = `You're Memobase Assistant, a helpful assistant that demonstrates the capabilities of Memobase Memory.`;
// 4. Call LLM
const result = streamText({
model: openai(process.env.OPENAI_MODEL!),
messages,
system: systemPrompt,
tools: Object.fromEntries(
Object.entries<{ parameters: unknown }>(tools).map(([name, tool]) => [
name,
{ parameters: jsonSchema(tool.parameters!) },
])
),
});
// 5. Store assistant response after completion
result.text.then(async (text) => {
if (text) {
await acontextClient.sessions.storeMessage(session.id, {
role: 'assistant',
content: text
}, { format: 'openai' });
}
});
return result.toDataStreamResponse({
headers: {
"x-session-id": session.id,
},
});
} catch (error) {
console.error(error);
return new Response("Internal Server Error", { status: 500 });
}
}
Phase 2: Multi-modal Support (OpenAI)
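Before the step-by-step code, this is the shape of the OpenAI-format multi-modal user message that Phase 2 works toward. The `content` array is built by the helper in Step 2.2; the values here are illustrative:

```typescript
// Target: one user turn combining text, an image, and audio in OpenAI chat format.
// The buildMultimodalContent helper in Step 2.2 produces this content array.
const multimodalUserMessage = {
  role: "user",
  content: [
    { type: "text", text: "Summarize this screenshot and voice note." },
    { type: "image_url", image_url: { url: "data:image/png;base64,iVBORw0...", detail: "auto" } },
    { type: "input_audio", input_audio: { data: "UklGRi...", format: "wav" } },
  ],
};
```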
Step 2.1: Create File Upload Component
New file: components/file-upload.tsx
"use client";
import { useRef } from "react";
import { Button } from "@/components/ui/button";
import { ImageIcon, FileIcon, MicIcon } from "lucide-react";
export type AttachmentType = 'image' | 'audio' | 'document';
export interface Attachment {
type: AttachmentType;
base64: string;
mimeType: string;
filename: string;
}
interface FileUploadProps {
onFileSelect: (attachment: Attachment) => void;
disabled?: boolean;
}
export function FileUpload({ onFileSelect, disabled }: FileUploadProps) {
const imageInputRef = useRef<HTMLInputElement>(null);
const audioInputRef = useRef<HTMLInputElement>(null);
const docInputRef = useRef<HTMLInputElement>(null);
const handleFileChange = async (
e: React.ChangeEvent<HTMLInputElement>,
type: AttachmentType
) => {
const file = e.target.files?.[0];
if (!file) return;
const reader = new FileReader();
reader.onload = () => {
const base64 = (reader.result as string).split(',')[1];
onFileSelect({
type,
base64,
mimeType: file.type,
filename: file.name,
});
};
reader.readAsDataURL(file);
// Reset input
e.target.value = '';
};
return (
<div className="flex gap-1">
<input
ref={imageInputRef}
type="file"
accept="image/png,image/jpeg,image/gif,image/webp"
className="hidden"
onChange={(e) => handleFileChange(e, 'image')}
/>
<input
ref={audioInputRef}
type="file"
accept="audio/wav,audio/mp3,audio/webm"
className="hidden"
onChange={(e) => handleFileChange(e, 'audio')}
/>
<input
ref={docInputRef}
type="file"
accept=".pdf"
className="hidden"
onChange={(e) => handleFileChange(e, 'document')}
/>
<Button
variant="ghost"
size="icon"
disabled={disabled}
onClick={() => imageInputRef.current?.click()}
title="Upload image"
>
<ImageIcon className="h-4 w-4" />
</Button>
<Button
variant="ghost"
size="icon"
disabled={disabled}
onClick={() => audioInputRef.current?.click()}
title="Upload audio"
>
<MicIcon className="h-4 w-4" />
</Button>
<Button
variant="ghost"
size="icon"
disabled={disabled}
onClick={() => docInputRef.current?.click()}
title="Upload document"
>
<FileIcon className="h-4 w-4" />
</Button>
</div>
);
}
Step 2.2: Create Multi-modal Message Builder
New file: lib/multimodal.ts
import type { Attachment } from "@/components/file-upload";
/**
* Build OpenAI-format multi-modal message content
*/
export function buildMultimodalContent(
text: string,
attachments?: Attachment[]
): string | Array<{ type: string; [key: string]: any }> {
// If no attachments, return plain text
if (!attachments || attachments.length === 0) {
return text;
}
const content: Array<{ type: string; [key: string]: any }> = [];
// Add text part
if (text) {
content.push({ type: 'text', text });
}
// Add attachment parts
for (const attachment of attachments) {
switch (attachment.type) {
case 'image':
content.push({
type: 'image_url',
image_url: {
url: `data:${attachment.mimeType};base64,${attachment.base64}`,
detail: 'auto'
}
});
break;
case 'audio':
content.push({
type: 'input_audio',
input_audio: {
data: attachment.base64,
format: attachment.mimeType.split('/')[1] || 'wav'
}
});
break;
case 'document':
// Note: OpenAI doesn't natively support PDF in chat
// Store in Acontext for reference, but convert to text description
content.push({
type: 'text',
text: `[Attached document: ${attachment.filename}]`
});
break;
}
}
return content;
}
/**
* Build complete OpenAI-format message with attachments
*/
export function buildUserMessage(
text: string,
attachments?: Attachment[]
) {
return {
role: 'user' as const,
content: buildMultimodalContent(text, attachments)
};
}
Step 2.3: Update Chat API for Multi-modal
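The route below receives messages that were already assembled with the Step 2.2 helper on the client. For reference, a hypothetical attachment (shaped like the `Attachment` type from Step 2.1) flows through it like this:

```typescript
import { buildUserMessage } from "@/lib/multimodal";
import type { Attachment } from "@/components/file-upload";

// Hypothetical attachment, shaped like the output of the FileUpload component.
const imageAttachment: Attachment = {
  type: "image",
  base64: "iVBORw0...",
  mimeType: "image/png",
  filename: "screenshot.png",
};

// Produces { role: 'user', content: [{ type: 'text', ... }, { type: 'image_url', ... }] }
const userMessage = buildUserMessage("What is in this screenshot?", [imageAttachment]);
```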
Modify file: app/api/chat/route.ts
import { openai } from "@/lib/openai";
import { jsonSchema, streamText } from "ai";
import { createClient } from "@/utils/supabase/server";
import { acontextClient } from "@/utils/acontext/client";
export const maxDuration = 30;
export async function POST(req: Request) {
const supabase = await createClient();
const { data, error } = await supabase.auth.getUser();
if (error || !data?.user) {
return new Response("Unauthorized", { status: 401 });
}
try {
const { messages, tools, sessionId } = await req.json();
// 1. Get or create Acontext Session
let session;
if (sessionId) {
session = { id: sessionId };
} else {
session = await acontextClient.sessions.create({
user: data.user.id
});
}
// 2. Store user message to Acontext (supports multi-modal)
const lastUserMessage = messages[messages.length - 1];
await acontextClient.sessions.storeMessage(session.id, lastUserMessage, {
format: 'openai'
});
// 3. Build system prompt
const systemPrompt = `You're Memobase Assistant, a helpful assistant that demonstrates the capabilities of Memobase Memory.`;
// 4. Call OpenAI (GPT-4o supports vision)
const result = streamText({
model: openai(process.env.OPENAI_MODEL!), // Use gpt-4o for multi-modal
messages,
system: systemPrompt,
tools: tools ? Object.fromEntries(
Object.entries<{ parameters: unknown }>(tools).map(([name, tool]) => [
name,
{ parameters: jsonSchema(tool.parameters!) },
])
) : undefined,
});
// 5. Store assistant response after completion
result.text.then(async (text) => {
if (text) {
await acontextClient.sessions.storeMessage(session.id, {
role: 'assistant',
content: text
}, { format: 'openai' });
}
});
const lastMessage = Array.isArray(lastUserMessage.content)
? lastUserMessage.content.find((c: any) => c.type === 'text')?.text || ''
: lastUserMessage.content;
return result.toDataStreamResponse({
headers: {
"x-session-id": session.id,
"x-last-user-message": encodeURIComponent(lastMessage),
},
});
} catch (error) {
console.error(error);
return new Response("Internal Server Error", { status: 500 });
}
}
Step 2.4: Update Frontend Page
Modify file: app/page.tsx (key changes)
// Add state for session and attachments
const [sessionId, setSessionId] = useState<string | null>(null);
const [attachments, setAttachments] = useState<Attachment[]>([]);
// Update runtime config
const runtime = useChatRuntime({
api: `${process.env["NEXT_PUBLIC_BASE_PATH"] || ""}/api/chat`,
body: {
sessionId,
},
onResponse: (response) => {
if (response.status !== 200) return;
// Get session ID from response
const newSessionId = response.headers.get("x-session-id");
if (newSessionId && !sessionId) {
setSessionId(newSessionId);
}
const message = response.headers.get("x-last-user-message") || "";
lastUserMessageRef.current = decodeURIComponent(message);
},
// ... rest of config
});
// Clear attachments after sending
const handleSend = () => {
setAttachments([]);
};
In short, Acontext eliminates backend complexity, saving you time on tasks such as session management, format conversion, and multi-modal handling.
Save yourself days of work: Try Acontext now.