
[System Design] AI Chatbot

Requirements

Functional Requirements

  1. Chat
    1. Persistent Chat History
    2. Business Knowledge Chat
    3. Multiple Chat Windows
  2. Natural Language Understanding and Reply
  3. Features
    1. Task Detection
    2. Tool Use
    3. Long/short-term user behavior memory
    4. Web Search

Non-Functional Requirements

  1. Scalability
    1. 100M Users
    2. (More?)
  2. High Availability
    1. 99.99% SLA
  3. Low Latency
    1. <1s response latency

Architecture

  1. Single service (monolith)
  2. Frontend/backend separation
  3. Microservices

(architecture diagram)

Feature Implementation

  1. Chat
    1. Persistent Chat History ⇒ RDB / NoSQL / Object Storage (multi-media storage)
    2. Business Knowledge Chat ⇒ RAG (retrieval-augmented generation)
  2. Natural Language Understanding and Reply ⇒ (rule-based NLP ⇒ LLM)
  3. Features
    1. Task Detection ⇒ LLM
    2. Tool Use ⇒ MCP? LLM tool/function calling (see the sketch after this list)
    3. Long/short-term user behavior memory ⇒ DB
    4. Web Search ⇒ web search tool
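A minimal sketch of task detection plus tool dispatch, assuming a caller-supplied `llm_complete(system, user)` helper and an illustrative `web_search` stub; a real implementation would use the model provider's native function-calling API or MCP.

```python
import json

# Illustrative tool registry; web_search here is a stub, not a real integration.
TOOLS = {
    "web_search": lambda query: f"search results for: {query}",
}

PLANNER_PROMPT = (
    "Decide whether the user request needs a tool. "
    'Reply with JSON: {"tool": "<name>", "args": {...}} or {"tool": null}.'
)

def answer(user_message: str, llm_complete) -> str:
    """Task detection + optional tool call + final reply, all driven by the LLM."""
    plan = json.loads(llm_complete(PLANNER_PROMPT, user_message))
    if plan.get("tool") in TOOLS:
        tool_output = TOOLS[plan["tool"]](**plan.get("args", {}))
        # Feed the tool result back so the model can compose the final answer.
        return llm_complete(
            "Answer the user using this tool output.",
            f"{user_message}\n\nTool output: {tool_output}",
        )
    return llm_complete("Answer the user directly.", user_message)
```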

Persistent Chat History

Table Design

CREATE TABLE chat_tab (
  chat_id BIGINT UNSIGNED PRIMARY KEY,
  user_id BIGINT UNSIGNED,            -- owner; indexed for per-user chat list
  title   CHAR(16),
  c_time  BIGINT UNSIGNED,            -- created at (epoch seconds)
  m_time  BIGINT UNSIGNED,            -- last modified at (epoch seconds)
  INDEX idx_user_id (user_id)
);

CREATE TABLE message_tab (
  message_id BIGINT UNSIGNED PRIMARY KEY,
  chat_id    BIGINT UNSIGNED,         -- indexed for per-chat message load
  user_id    BIGINT UNSIGNED,
  content    TEXT,                    -- avg ~500 bytes, up to ~10,000
  c_time     BIGINT UNSIGNED,         -- created at (epoch seconds)
  m_time     BIGINT UNSIGNED,         -- last modified at (epoch seconds)
  INDEX idx_chat_id (chat_id)
);

Storage

One chat ≈ 10 messages (5 from the user, 5 from the bot), avg content length 500 bytes.

Per chat: 50 bytes (chat_tab row) + 600 bytes × 10 (message_tab rows) ≈ 6 KB ⇒ round up to 10 KB.

100M users, 20% DAU, 10 chats per active user per day:

100M × 20% × 10 × 10 KB = 2 TB / day

365 × 2 TB ≈ 730 TB / year
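The same estimate as a quick back-of-envelope script (same assumptions as above; 10 KB is the rounded-up per-chat size):

```python
users = 100_000_000
dau_ratio = 0.20
chats_per_active_user_per_day = 10
bytes_per_chat = 10 * 1000          # ~6 KB actual, rounded up to 10 KB

daily_bytes = users * dau_ratio * chats_per_active_user_per_day * bytes_per_chat
print(f"daily:  {daily_bytes / 1e12:.1f} TB")        # 2.0 TB/day
print(f"yearly: {daily_bytes * 365 / 1e12:.0f} TB")  # 730 TB/year
```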

DB Sharding

chat_tab by user_id

message_tab by chat_id

DB Partitioning

by user_id?
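A sketch of the resulting shard routing (hash-mod placement; `NUM_SHARDS` and the stable-hash choice are illustrative assumptions):

```python
import zlib

NUM_SHARDS = 64  # illustrative

def _stable_hash(key: int) -> int:
    # Use a hash that is stable across processes/languages (not Python's hash()).
    return zlib.crc32(str(key).encode())

def chat_tab_shard(user_id: int) -> int:
    # chat_tab is sharded by user_id, so listing a user's chats hits one shard.
    return _stable_hash(user_id) % NUM_SHARDS

def message_tab_shard(chat_id: int) -> int:
    # message_tab is sharded by chat_id, so loading one conversation hits one shard.
    return _stable_hash(chat_id) % NUM_SHARDS
```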

Chat

API Interface

/api/v1/chat

request_body {
  user_id,
  chat_id (optional),
  messages,
  stream
}

response_body {
  chat_id,
  messages
}
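A minimal FastAPI-style sketch of this endpoint (the framework choice and the `create_chat` / `generate_reply` helpers are assumptions; non-streaming path only):

```python
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    user_id: int
    chat_id: Optional[int] = None    # omitted => start a new chat
    messages: list[dict]             # [{"role": "user", "content": "..."}]
    stream: bool = False             # streaming path omitted in this sketch

class ChatResponse(BaseModel):
    chat_id: int
    messages: list[dict]

@app.post("/api/v1/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    chat_id = req.chat_id or create_chat(req.user_id)     # hypothetical: allocate chat row
    reply = generate_reply(chat_id, req.messages)          # hypothetical: LLM call + persist history
    return ChatResponse(chat_id=chat_id, messages=req.messages + [reply])
```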

Connection Type

  1. HTTP polling
  2. HTTP long polling (one long-lived HTTP connection)
  3. WebSocket
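For option 3, a sketch of streaming LLM tokens over a WebSocket (FastAPI again; `stream_llm_tokens` is a placeholder for the model's streaming API):

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/chat")
async def chat_ws(ws: WebSocket):
    await ws.accept()
    while True:
        user_message = await ws.receive_text()
        # stream_llm_tokens: hypothetical async generator yielding LLM output tokens.
        async for token in stream_llm_tokens(user_message):
            await ws.send_text(token)     # client renders tokens as they arrive
        await ws.send_text("[DONE]")      # end-of-message sentinel
```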

QPS

One chat ≈ 10 messages, i.e. 5 user requests per chat.

100M users, 20% DAU, 10 chats per active user per day.

Avg QPS:

100M × 20% × 10 × 5 = 1B requests/day ⇒ 1B / 86,400 s ≈ 1B / 100K ≈ 10K/sec

Peak (assume 5× average):

50K/sec
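The same numbers as a quick check:

```python
dau = 100_000_000 * 0.20
requests_per_day = dau * 10 * 5      # 10 chats/day, 5 user messages per chat ⇒ 1B/day
avg_qps = requests_per_day / 86_400  # ≈ 11.6K; ~10K/sec with the 100K-seconds-per-day shortcut
peak_qps = 5 * avg_qps               # ≈ 50-60K/sec
print(f"avg ≈ {avg_qps:,.0f} QPS, peak ≈ {peak_qps:,.0f} QPS")
```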

Latency

  1. 200 ms client-server network latency
  2. 100 ms server processing
  3. 600 ms LLM TTFT (time to first token)

Total ≈ 900 ms, within the <1 s budget.

High Availability

  • load balancer
  • cache

Trade-offs

Large model vs. small model

  • Cost vs. accuracy / inference quality

API provider vs. local deployment

  • Cost vs. latency / efficiency