How to Build an AI Chatbot From Scratch: A Step-by-Step Guide
Building an AI chatbot is one of the best ways to understand how modern AI applications work under the hood. In this tutorial, we will build a fully functional chatbot with streaming responses, conversation memory, and a clean UI — then deploy it to production.
By the end, you will have a chatbot that rivals the basic functionality of ChatGPT's interface, running on your own infrastructure with your own API key.
Architecture Overview
Before writing code, let us map out what we are building:
┌─────────────┐ HTTP/SSE ┌──────────────┐ API Call ┌─────────────┐
│ React UI │ ───────────────▶ │ Node.js API │ ──────────────▶ │ LLM API │
│ (Frontend) │ ◀─────────────── │ (Backend) │ ◀────────────── │ (Claude/ │
│ │ Streamed tokens │ │ Streamed tokens │ OpenAI) │
└─────────────┘ └──────────────┘ └─────────────┘
│
▼
┌──────────────┐
│ In-Memory │
│ Conversation│
│ Store │
└──────────────┘
The stack: React frontend, Express.js backend, and either the Anthropic or OpenAI API for the language model. We will use Server-Sent Events (SSE) for streaming.
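The SSE wire format is simple enough to sketch before we build anything: each event is a data: line followed by a blank line. A minimal helper (the name sseFrame is our own, not part of any library) shows the kind of frames the backend will emit:

```javascript
// Builds one Server-Sent Events frame: a "data: <json>" line plus the
// blank line that terminates the event. (Helper name is our own.)
function sseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// On the server we would call:
//   res.write(sseFrame({ type: 'token', content: 'Hi' }));
console.log(sseFrame({ type: 'token', content: 'Hi' }));
// → data: {"type":"token","content":"Hi"}
```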
Step 1: Choose Your Model API
You have two primary options for the LLM backend:
Anthropic Claude API — Excellent for nuanced, longer-form responses. Claude's system prompts are powerful for shaping chatbot personality. The API uses a messages-based format that maps cleanly to chat interfaces.
OpenAI GPT API — The most widely documented option. GPT-4o provides fast, capable responses. The Chat Completions API is straightforward.
For this tutorial, we will use the Anthropic Claude API, but the architecture works identically with OpenAI — you only swap out the API call in one function.
Get your API key: Sign up at console.anthropic.com, create a project, and generate an API key. Store it securely — never commit it to version control.
Step 2: Set Up the Backend
Initialize a Node.js project and install dependencies:
mkdir ai-chatbot && cd ai-chatbot
npm init -y
npm install express cors @anthropic-ai/sdk dotenv uuid
Create your environment file:
# .env
ANTHROPIC_API_KEY=sk-ant-your-key-here
PORT=3001
Now build the Express server. Create server.js:
import express from 'express';
import cors from 'cors';
import Anthropic from '@anthropic-ai/sdk';
import { randomUUID } from 'crypto';
import 'dotenv/config';
const app = express();
app.use(cors());
app.use(express.json());
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
// In-memory conversation store
const conversations = new Map();
const SYSTEM_PROMPT = `You are a helpful, knowledgeable assistant.
You give clear, concise answers and ask clarifying questions
when a request is ambiguous. You format responses with markdown
when it improves readability.`;
app.listen(process.env.PORT || 3001, () => {
console.log(`Server running on port ${process.env.PORT || 3001}`);
});
This gives us a running server with the Anthropic client initialized and a Map to store conversation histories.
Step 3: Build the Chat Endpoint with Streaming
The key to a responsive chatbot is streaming. Instead of waiting for the entire response to generate (which can take 10-30 seconds for long answers), we stream tokens to the frontend as they are produced. Add this endpoint to server.js:
app.post('/api/chat', async (req, res) => {
const { message, conversationId } = req.body;
// Get or create conversation
const convId = conversationId || randomUUID();
if (!conversations.has(convId)) {
conversations.set(convId, []);
}
const history = conversations.get(convId);
// Add user message to history
history.push({ role: 'user', content: message });
// Set up SSE headers
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Send conversation ID first
res.write(`data: ${JSON.stringify({ type: 'id', conversationId: convId })}\n\n`);
try {
let fullResponse = '';
const stream = anthropic.messages.stream({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
system: SYSTEM_PROMPT,
messages: history,
});
stream.on('text', (text) => {
fullResponse += text;
res.write(`data: ${JSON.stringify({ type: 'token', content: text })}\n\n`);
});
stream.on('finalMessage', () => {
// Save assistant response to history
history.push({ role: 'assistant', content: fullResponse });
res.write(`data: ${JSON.stringify({ type: 'done' })}\n\n`);
res.end();
});
stream.on('error', (error) => {
console.error('Stream error:', error);
res.write(`data: ${JSON.stringify({ type: 'error', message: error.message })}\n\n`);
res.end();
});
} catch (error) {
console.error('API error:', error);
res.write(`data: ${JSON.stringify({ type: 'error', message: 'Failed to generate response' })}\n\n`);
res.end();
}
});
Let us break down what this does: the .stream() method returns an event emitter that fires text events as tokens arrive, a finalMessage event when the full response is complete (where we persist the assistant's reply to the history), and an error event if the request fails. Each event is forwarded to the browser as an SSE frame, so the frontend can tell tokens apart from control messages.
Step 4: Add Conversation Management
Users need to start new conversations and retrieve existing ones. Add these endpoints:
// List conversations (returns IDs and first message preview)
app.get('/api/conversations', (req, res) => {
const list = [];
for (const [id, messages] of conversations) {
if (messages.length > 0) {
list.push({
id,
preview: messages[0].content.substring(0, 80),
messageCount: messages.length,
lastUpdated: Date.now(), // placeholder; a real store would track this per conversation
});
}
}
res.json(list);
});
// Get full conversation history
app.get('/api/conversations/:id', (req, res) => {
const history = conversations.get(req.params.id);
if (!history) {
return res.status(404).json({ error: 'Conversation not found' });
}
res.json({ id: req.params.id, messages: history });
});
// Delete a conversation
app.delete('/api/conversations/:id', (req, res) => {
conversations.delete(req.params.id);
res.json({ success: true });
});
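As promised in Step 1, switching the backend to OpenAI only means changing the streaming call in /api/chat. A hedged sketch of the translation (the model name and the exact streaming loop are assumptions to verify against OpenAI's current docs):

```javascript
// Maps our history plus system prompt onto an OpenAI Chat Completions
// request body. OpenAI takes the system prompt as the first message
// rather than a separate `system` field.
function toOpenAIRequest(history, systemPrompt) {
  return {
    model: 'gpt-4o', // assumed model name
    stream: true,
    max_tokens: 4096,
    messages: [{ role: 'system', content: systemPrompt }, ...history],
  };
}

// With the official SDK, the streaming loop looks roughly like:
//   const stream = await openai.chat.completions.create(toOpenAIRequest(history, SYSTEM_PROMPT));
//   for await (const chunk of stream) {
//     const text = chunk.choices[0]?.delta?.content ?? '';
//     if (text) { /* write the token to the SSE response as before */ }
//   }
```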
Step 5: Build the Chat UI
For the frontend, create a React application. We will keep it focused on the chat functionality:
npm create vite@latest client -- --template react
cd client
npm install
Replace src/App.jsx with the chat interface:
import { useState, useRef, useEffect } from 'react';
import './App.css';
function App() {
const [messages, setMessages] = useState([]);
const [input, setInput] = useState('');
const [isStreaming, setIsStreaming] = useState(false);
const [conversationId, setConversationId] = useState(null);
const messagesEndRef = useRef(null);
const scrollToBottom = () => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
};
useEffect(() => { scrollToBottom(); }, [messages]);
const sendMessage = async () => {
if (!input.trim() || isStreaming) return;
const userMessage = input.trim();
setInput('');
setMessages(prev => [...prev, { role: 'user', content: userMessage }]);
setIsStreaming(true);
// Add empty assistant message that we will stream into
setMessages(prev => [...prev, { role: 'assistant', content: '' }]);
try {
const response = await fetch('http://localhost:3001/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userMessage,
conversationId,
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
// Note: this assumes each read() delivers whole "data: ..." lines; a more
// robust client would buffer partial frames across chunks.
const lines = chunk.split('\n').filter(line => line.startsWith('data: '));
for (const line of lines) {
const data = JSON.parse(line.slice(6));
if (data.type === 'id') {
setConversationId(data.conversationId);
} else if (data.type === 'token') {
setMessages(prev => {
const updated = [...prev];
const last = updated[updated.length - 1];
last.content += data.content;
return updated;
});
} else if (data.type === 'error') {
console.error('Stream error:', data.message);
}
}
}
} catch (error) {
console.error('Request failed:', error);
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1].content = 'Sorry, something went wrong. Please try again.';
return updated;
});
} finally {
setIsStreaming(false);
}
};
const handleKeyDown = (e) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
sendMessage();
}
};
return (
<div className="chat-container">
<header className="chat-header">
<h1>AI Chatbot</h1>
<button onClick={() => { setMessages([]); setConversationId(null); }}>
New Chat
</button>
</header>
<div className="messages">
{messages.map((msg, i) => (
<div key={i} className={`message ${msg.role}`}>
<div className="message-content">{msg.content}</div>
</div>
))}
<div ref={messagesEndRef} />
</div>
<div className="input-area">
<textarea
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
placeholder="Type your message..."
rows={1}
disabled={isStreaming}
/>
<button onClick={sendMessage} disabled={isStreaming || !input.trim()}>
{isStreaming ? '...' : 'Send'}
</button>
</div>
</div>
);
}
export default App;
Step 6: Handle Edge Cases
A production chatbot needs to handle several things that tutorials often skip.
Token Limit Management
Conversation histories grow indefinitely, but the API has a context window limit. Add a function to trim old messages when the conversation gets too long:
function trimHistory(messages, maxTokenEstimate = 150000) {
// Rough estimate: 1 token ≈ 4 characters
const estimateTokens = (msgs) =>
msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
while (messages.length > 2 && estimateTokens(messages) > maxTokenEstimate) {
// Remove the oldest user-assistant pair, keeping the first message for context
messages.splice(1, 2);
}
return messages;
}
Call trimHistory(history) before passing messages to the API. This preserves the first message (which often sets context) while removing older exchanges from the middle.
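To see the trimming behavior concretely, here is a runnable demo. It repeats trimHistory so the snippet is self-contained, and uses an artificially small token budget to force trimming:

```javascript
function trimHistory(messages, maxTokenEstimate = 150000) {
  // Rough estimate: 1 token ≈ 4 characters
  const estimateTokens = (msgs) =>
    msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  while (messages.length > 2 && estimateTokens(messages) > maxTokenEstimate) {
    messages.splice(1, 2); // drop the oldest pair after the first message
  }
  return messages;
}

// Nine messages of ~100 estimated tokens each (~900 total).
const history = [
  { role: 'user', content: 'Set the scene.'.padEnd(400, '.') },
  ...Array.from({ length: 8 }, (_, i) => ({
    role: i % 2 === 0 ? 'assistant' : 'user',
    content: `Message ${i}`.padEnd(400, '.'),
  })),
];

const trimmed = trimHistory(history, 500); // tiny budget to force trimming
console.log(trimmed.length);                              // 5
console.log(trimmed[0].content.startsWith('Set the'));    // true
```

Two pairs are spliced out of the middle until the estimate fits, while the context-setting first message survives.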
Rate Limiting
Protect your API key from abuse with basic rate limiting:
import rateLimit from 'express-rate-limit';
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 20, // 20 requests per minute per IP
message: { error: 'Too many requests. Please wait a moment.' },
});
app.use('/api/chat', limiter);
Graceful Error Recovery
When the API returns errors — rate limits, overloaded servers, invalid requests — your chatbot should not just crash. The streaming error handler we built earlier catches API-level errors, but you should also handle network timeouts:
const stream = anthropic.messages.stream({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
system: SYSTEM_PROMPT,
messages: trimHistory(history),
}).on('error', (error) => {
if (error.status === 429) {
res.write(`data: ${JSON.stringify({
type: 'error',
message: 'Rate limited. Please wait 30 seconds and try again.'
})}\n\n`);
} else {
res.write(`data: ${JSON.stringify({
type: 'error',
message: 'An error occurred. Please try again.'
})}\n\n`);
}
res.end();
});
Step 7: Add Markdown Rendering
AI responses frequently contain markdown — code blocks, lists, headers, bold text. Rendering raw markdown in the browser looks terrible. Add a markdown renderer to the frontend:
cd client
npm install react-markdown remark-gfm rehype-highlight
Update the message display component:
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import rehypeHighlight from 'rehype-highlight';
// Inside the messages map:
<div className="message-content">
{msg.role === 'assistant' ? (
<ReactMarkdown remarkPlugins={[remarkGfm]} rehypePlugins={[rehypeHighlight]}>
{msg.content}
</ReactMarkdown>
) : (
msg.content
)}
</div>
This gives you GitHub-flavored markdown with syntax-highlighted code blocks. The visual improvement is dramatic — responses with code snippets, tables, or structured lists become actually readable.
Step 8: Deploy to Production
For deployment, we need to combine the frontend and backend into a single deployable unit.
Build the Frontend
cd client
npm run build
This creates a dist/ folder with static files.
Serve Static Files from Express
Add this to your server.js, after your API routes:
import path from 'path';
import { fileURLToPath } from 'url';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
// Serve the built React app
app.use(express.static(path.join(__dirname, 'client', 'dist')));
// Catch-all: serve index.html for client-side routing
app.get(/.*/, (req, res) => { // regex path; Express 5 no longer accepts a bare '*'
res.sendFile(path.join(__dirname, 'client', 'dist', 'index.html'));
});
Deploy to a Cloud Provider
Railway or Render (simplest): Push your repo to GitHub, connect it to Railway or Render, set the ANTHROPIC_API_KEY environment variable, and deploy. Both platforms detect Node.js automatically and handle the rest.
Docker (most portable):
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
RUN cd client && npm ci && npm run build
EXPOSE 3001
CMD ["node", "server.js"]
Build and run: docker build -t chatbot . && docker run -p 3001:3001 --env-file .env chatbot
Production Checklist
Before going live, verify these items:
- Environment variables are set on the hosting platform, not hardcoded
- CORS is restricted to your actual domain instead of allowing all origins
- Rate limiting is configured appropriately for your expected traffic
- HTTPS is enabled (most platforms handle this automatically)
- Error logging is connected to a service like Sentry or LogTail so you catch issues in production
- Conversation cleanup — add a TTL to your conversation store so old conversations are deleted after 24 hours, or switch to Redis for persistent storage with built-in expiration
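The TTL cleanup from the last checklist item can be sketched without Redis. This assumes we change the store to hold { messages, updatedAt } objects instead of bare message arrays (a shape of our own choosing):

```javascript
const CONVERSATION_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Deletes conversations that have not been touched within the TTL.
// Assumes each store entry carries an `updatedAt` timestamp (our addition).
function sweepExpired(store, now = Date.now(), ttl = CONVERSATION_TTL_MS) {
  for (const [id, conv] of store) {
    if (now - conv.updatedAt > ttl) store.delete(id);
  }
  return store.size;
}

// On the server, run the sweep hourly:
// setInterval(() => sweepExpired(conversations), 60 * 60 * 1000);
```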
Going Further
This chatbot is functional but intentionally minimal. Here are high-impact improvements worth implementing:
Persistent storage. Replace the in-memory Map with PostgreSQL or Redis. This lets conversations survive server restarts and enables multi-server deployments.
Authentication. Add user accounts so conversations are private. A simple JWT-based auth system works well. Libraries like passport.js or lucia-auth handle the heavy lifting.
File uploads. Claude's API supports image inputs. Add a file upload endpoint that converts images to base64 and includes them in the messages array. This enables vision-based conversations.
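A sketch of the message shape for an image turn (the helper name is ours; the base64 source format follows Anthropic's Messages API content blocks, but verify against the current docs):

```javascript
// Wraps a base64-encoded image and a question into one user message
// using the Messages API content-block format.
function imageMessage(base64Data, mediaType, question) {
  return {
    role: 'user',
    content: [
      {
        type: 'image',
        source: { type: 'base64', media_type: mediaType, data: base64Data },
      },
      { type: 'text', text: question },
    ],
  };
}

// Usage: push into the same history array before calling messages.stream():
// history.push(imageMessage(fs.readFileSync('photo.png').toString('base64'),
//                           'image/png', 'What is in this photo?'));
```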
System prompt customization. Let users configure the chatbot's personality. Store system prompts per conversation and let users modify them through a settings panel.
Streaming markdown. Our current implementation re-renders the full markdown on every token. For smoother performance, look into incremental markdown parsing libraries that only process new content.
The core architecture we built — SSE streaming, conversation state management, and a clean separation between frontend and backend — scales cleanly as you add these features. Each improvement is additive rather than requiring a rewrite, which is the sign of a solid foundation.