Creating an AI Chat Assistant with Streamlit and OpenRouter

Building an AI-powered web application takes surprisingly little code when you combine Streamlit (a Python framework for rapid web app development) with OpenRouter (a unified API gateway for many language models). The result is a flexible chat application that can switch between models such as GPT-4, Claude 3, and Llama 3 by changing only the model name sent with each request.

What is OpenRouter?

OpenRouter acts as a unified API endpoint for multiple large language models (LLMs). Instead of managing separate API keys, endpoints, and request formats for OpenAI, Anthropic, Meta, and others, OpenRouter provides a single interface to access them all.

Benefits of Using OpenRouter

Model Agnostic: Switch between GPT-4, Claude, Llama, and more with one parameter change
Cost Optimization: Choose models based on price/performance needs
Simplified Integration: One API endpoint, one authentication method
Fallback Options: Automatically switch to alternative models if one is down
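
As a rough sketch of the "one parameter change" and fallback points above, the helper below builds a request body in which only the model field differs between providers. The build_payload name is hypothetical, and the "models" list of fallbacks is an assumption based on OpenRouter's model-routing option, so check the current API reference before relying on it.

```python
from typing import Dict, List, Optional


def build_payload(
    messages: List[Dict[str, str]],
    model: str,
    fallbacks: Optional[List[str]] = None,
) -> dict:
    """Build a chat-completions request body for the given model."""
    payload = {"model": model, "messages": messages}
    if fallbacks:
        # Assumption: OpenRouter tries these models, in order, when the
        # primary model returns an error or is unavailable.
        payload["models"] = list(fallbacks)
    return payload


question = [{"role": "user", "content": "Hello"}]

# Same messages, different provider: only the model string changes.
gpt_payload = build_payload(question, "openai/gpt-3.5-turbo")
claude_payload = build_payload(
    question,
    "anthropic/claude-3-haiku",
    fallbacks=["meta-llama/llama-3-8b-instruct"],
)
```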

Prerequisites

Before we begin, ensure you have the following:

Python installed (3.8+)
Streamlit 1.24 or later (required for st.chat_message and st.chat_input): pip install streamlit
Requests: pip install requests
OpenRouter API Key: Get it at openrouter.ai

Building the Chat Application

Let's build a complete chat interface that sends messages to any OpenRouter-supported model.

Complete Code

import streamlit as st
import requests
import json

# App title
st.title("OpenRouter Streamlit Chatbot")

# Sidebar for API Key and Model Selection
with st.sidebar:
    st.header("Settings")
    api_key = st.text_input("OpenRouter API Key", type="password")
    model = st.selectbox(
        "Select Model",
        [
            "openai/gpt-3.5-turbo",
            "anthropic/claude-3-haiku",
            "meta-llama/llama-3-8b-instruct"
        ]
    )
    st.markdown("[Get API Key](https://openrouter.ai)")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("Ask me anything..."):
    if not api_key:
        st.warning("Please enter your OpenRouter API Key in the sidebar.")
        st.stop()

    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # API Request configuration
    response_url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "HTTP-Referer": "http://localhost:8501",  # Optional: For ranking on OpenRouter
        "X-Title": "OpenRouter Streamlit App",
    }
    data = {
        "model": model,
        "messages": st.session_state.messages
    }

    # Call OpenRouter API
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = requests.post(
                response_url,
                headers=headers,
                data=json.dumps(data),
                timeout=30  # Avoid hanging forever if the API is unreachable
            )

            if response.status_code == 200:
                content = response.json()['choices'][0]['message']['content']
                st.markdown(content)
                st.session_state.messages.append({
                    "role": "assistant",
                    "content": content
                })
            else:
                st.error(f"Error: {response.status_code} - {response.text}")
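
One fragile spot in the code above is the lookup response.json()['choices'][0]['message']['content'], which raises an exception if the body has any other shape. A more defensive version might look like the sketch below. The extract_reply name is hypothetical, and the code assumes that failures are reported under an "error" key carrying a "message" field.

```python
from typing import Optional, Tuple


def extract_reply(body: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (content, error_message); exactly one of the two is None."""
    # Assumption: error bodies look like {"error": {"message": "..."}}.
    error = body.get("error")
    if error:
        message = error.get("message") if isinstance(error, dict) else str(error)
        return None, message or "Unknown error"

    choices = body.get("choices") or []
    if not choices:
        return None, "Response contained no choices"

    content = (choices[0].get("message") or {}).get("content")
    if content is None:
        return None, "Response contained no message content"
    return content, None
```

In the app, the result could be unpacked as content, error = extract_reply(response.json()), showing st.error(error) when error is set.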

How It Works

1. Session State Management

Streamlit's st.session_state preserves chat history across app re-runs:

if "messages" not in st.session_state:
    st.session_state.messages = []

2. Dynamic Model Selection

The sidebar dropdown lets users switch models on-the-fly:

model = st.selectbox("Select Model", [
    "openai/gpt-3.5-turbo",
    "anthropic/claude-3-haiku",
    "meta-llama/llama-3-8b-instruct"
])

3. OpenRouter API Call

The core integration sends the entire message history to maintain context:

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "HTTP-Referer": "http://localhost:8501",
    "X-Title": "OpenRouter Streamlit App",
}
data = {
    "model": model,
    "messages": st.session_state.messages
}
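
Because the whole history is sent on every request, the payload grows with each turn. One simple way to bound it, sketched here with a hypothetical trim_history helper and an arbitrary default limit, is to keep any system messages plus only the most recent exchanges.

```python
from typing import Dict, List


def trim_history(
    messages: List[Dict[str, str]], max_messages: int = 20
) -> List[Dict[str, str]]:
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # rest[-0:] would return the whole list, so handle zero explicitly.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent
```

The request body would then use "messages": trim_history(st.session_state.messages), while the full history stays in session state for display.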

Running the Application

Save the code as app.py
Run the app: streamlit run app.py
Open your browser at http://localhost:8501
Enter your OpenRouter API key in the sidebar
Start chatting!

Key Considerations

Streaming Responses

For a better user experience, enable real-time token streaming so the AI appears to "type" its response:

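# "stream": True asks OpenRouter to stream; stream=True stops requests from buffering
response = requests.post(
    response_url,
    headers=headers,
    json={**data, "stream": True},
    stream=True
)

placeholder = st.empty()
full_text = ""
for chunk in response.iter_lines():
    line = chunk.decode("utf-8")
    if not line.startswith("data: "):
        continue  # Skip blank keep-alive lines and SSE comments
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # End-of-stream marker
    choices = json.loads(payload).get("choices") or []
    if choices:
        full_text += choices[0].get("delta", {}).get("content") or ""
        placeholder.markdown(full_text)

Note that two separate switches are involved: the "stream" field in the request body tells OpenRouter to stream, while the stream=True argument only changes how requests reads the response. OpenRouter sends the stream as Server-Sent Events (SSE): each event is a line beginning with "data: " that carries a JSON chunk, and the stream ends with "data: [DONE]".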
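
As a sketch of the SSE parsing step on its own, the generator below takes raw lines (as produced by response.iter_lines()) and yields text fragments. Keeping it free of network and UI code makes it easy to test with canned input. The iter_sse_tokens name is hypothetical, and the chunk layout (choices[0].delta.content) is assumed to follow the OpenAI-style streaming format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Skip keep-alive blanks and SSE comments (lines starting with ":").
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # End-of-stream marker
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        token = (choices[0].get("delta") or {}).get("content")
        if token:
            yield token


sample = [
    b": OPENROUTER PROCESSING",
    b"",
    b'data: {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_sse_tokens(sample)))  # prints Hello
```

On Streamlit 1.31 or later, a generator like this can likely be handed straight to st.write_stream, which renders the fragments as they arrive.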

Model Selection Strategy

OpenRouter supports hundreds of models. Here's a quick guide:

Use Case	Recommended Model
General purpose	`openai/gpt-4`
Cost-effective	`meta-llama/llama-3-8b-instruct`
Creative writing	`anthropic/claude-3-sonnet`
Code generation	`google/gemini-pro`
Fast responses	`anthropic/claude-3-haiku`

Browse the full list at openrouter.ai/models.
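
If you would rather let users pick a use case than a raw model ID, the table can be expressed as a lookup. This is only a sketch: the model IDs are copied from the table above and may change over time, and the model_for helper and its default are hypothetical.

```python
# Use case -> model ID, copied from the table above.
MODEL_BY_USE_CASE = {
    "General purpose": "openai/gpt-4",
    "Cost-effective": "meta-llama/llama-3-8b-instruct",
    "Creative writing": "anthropic/claude-3-sonnet",
    "Code generation": "google/gemini-pro",
    "Fast responses": "anthropic/claude-3-haiku",
}

DEFAULT_MODEL = MODEL_BY_USE_CASE["General purpose"]


def model_for(use_case: str) -> str:
    """Return the model ID for a use case, or the default if it is unknown."""
    return MODEL_BY_USE_CASE.get(use_case, DEFAULT_MODEL)
```

In the sidebar, st.selectbox("Use case", list(MODEL_BY_USE_CASE)) would supply the use_case argument.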

Security Best Practices

Never hardcode API keys in your application. Use one of these approaches:

Option 1: Streamlit Secrets (Local Development)

Create .streamlit/secrets.toml:

[openrouter]
api_key = "your-api-key-here"

Access it in code:

api_key = st.secrets["openrouter"]["api_key"]

Option 2: Environment Variables (Production)

import os
api_key = os.environ.get("OPENROUTER_API_KEY")

Set it before running:

export OPENROUTER_API_KEY="your-key"
streamlit run app.py
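
To combine this with the sidebar field from the main app, one option is to prefer the environment variable and fall back to whatever the user typed. The resolve_api_key helper below is a hypothetical sketch of that precedence, not a required pattern.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    sidebar_value: str = "",
    env: Mapping[str, str] = os.environ,
    var_name: str = "OPENROUTER_API_KEY",
) -> Optional[str]:
    """Prefer the environment variable; fall back to the sidebar input."""
    key = env.get(var_name, "").strip() or sidebar_value.strip()
    return key or None  # None means no usable key was found
```

The env parameter exists so the function can be tested with a plain dictionary instead of the real environment.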

OAuth Option

OpenRouter supports OAuth, allowing users to connect their own OpenRouter accounts instead of entering API keys manually. This is ideal for public-facing applications where you don't want to manage keys.
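
OAuth flows of this kind usually rely on PKCE (RFC 7636), where the app generates a random verifier and sends a hash of it as a challenge. The verifier and challenge functions below follow that RFC. The authorization URL and its query parameter names are assumptions about OpenRouter's flow and should be confirmed against its OAuth documentation; all function names are hypothetical.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier() -> str:
    """Random URL-safe string; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, as defined in RFC 7636."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_auth_url(callback_url: str, verifier: str) -> str:
    """Build the URL the user is sent to in order to authorize the app."""
    # Assumption: endpoint and parameter names; verify before use.
    query = urlencode({
        "callback_url": callback_url,
        "code_challenge": code_challenge_s256(verifier),
        "code_challenge_method": "S256",
    })
    return f"https://openrouter.ai/auth?{query}"
```

The verifier must be kept (for example in st.session_state) until the user returns, because it is needed to exchange the returned code for a key.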

Performance Optimization

Use st.spinner: Always wrap API calls with loading indicators

Timeout Handling: Add request timeouts to prevent hanging

response = requests.post(url, headers=headers, json=data, timeout=30)

Error Retries: Implement exponential backoff for failed requests
Caching: Use @st.cache_data to reuse results for repeated identical queries
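
To illustrate the caching idea without depending on Streamlit, here is a plain-dictionary memo rather than st.cache_data itself. It shows the key point: only an identical model and message list counts as a cache hit. The names cache_key and cached_reply are hypothetical, and fetch stands in for whatever function actually calls the API.

```python
import hashlib
import json
from typing import Callable, Dict, List

_reply_cache: Dict[str, str] = {}


def cache_key(model: str, messages: List[dict]) -> str:
    """Stable hash of the model ID and the exact message list."""
    blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def cached_reply(
    model: str,
    messages: List[dict],
    fetch: Callable[[str, List[dict]], str],
) -> str:
    """Call fetch only when this exact request has not been seen before."""
    key = cache_key(model, messages)
    if key not in _reply_cache:
        _reply_cache[key] = fetch(model, messages)
    return _reply_cache[key]
```

Keep in mind that a cached reply is returned verbatim, so users who repeat a question will not see the variation that sampling would normally produce.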

Enhanced Version with Error Handling

Here's an improved API call with a request timeout, retries with exponential backoff on rate limits, and basic error reporting:

import time

import requests
import streamlit as st


def call_openrouter(messages, model, api_key, max_retries=3):
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": 1000
    }

    for attempt in range(max_retries):
        try:
            response = requests.post(
                url,
                headers=headers,
                json=payload,
                timeout=30
            )

            if response.status_code == 200:
                return response.json()['choices'][0]['message']['content']
            elif response.status_code == 429:
                st.warning("Rate limited. Please wait...")
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                st.error(f"Error: {response.status_code}")
                return None

        except requests.exceptions.Timeout:
            st.warning("Request timed out. Retrying...")
            time.sleep(2 ** attempt)  # Back off before the next attempt
        except requests.exceptions.RequestException as e:
            st.error(f"Request failed: {e}")
            return None

    st.error("Max retries exceeded")
    return None
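
The retry loop above sleeps for exactly 2 ** attempt seconds. When many clients are rate limited at the same moment, a common refinement is to cap the delay and randomize it ("jitter") so the retries do not all arrive together. The backoff_delays helper below is a hypothetical sketch of that variant, not part of the function above, and its defaults are arbitrary.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 3,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delay before each retry, drawn from [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(max_retries)
    ]
```

Inside the loop, time.sleep(2 ** attempt) could then be replaced by sleeping for the delay at index attempt.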

Deployment Options

Once your chat app is ready, deploy it to share with the world:

Streamlit Community Cloud: Free hosting at share.streamlit.io
Hugging Face Spaces: Free tier with custom models
Heroku: Scalable PaaS option
AWS/GCP/Azure: Full control with cloud providers

Next Steps

Add File Uploads: Enable PDF/text analysis with st.file_uploader
Memory Management: Implement conversation summarization for long chats
Multi-Modal: Integrate image analysis with vision models
Custom Prompts: Add system prompts for specialized behaviors
Analytics: Track usage patterns and model performance
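
As a starting point for the custom prompts idea, a system message can be placed in front of the chat history just before the request is sent. The with_system_prompt helper below is a hypothetical sketch; it returns a new list so the history shown in the UI stays free of system messages.

```python
from typing import Dict, List


def with_system_prompt(
    messages: List[Dict[str, str]], system_prompt: str
) -> List[Dict[str, str]]:
    """Return a new list with one system message ahead of the chat history."""
    # Drop any existing system messages so there is exactly one.
    history = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": system_prompt}] + history
```

The request body would then use "messages": with_system_prompt(st.session_state.messages, "You are a concise assistant.").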

Conclusion

Combining Streamlit's rapid prototyping with OpenRouter's model flexibility gives you a powerful toolkit for building AI applications. You can experiment with different models, optimize for cost or performance, and deploy a polished web app—all in under 100 lines of Python.

The beauty of this approach? Switching from GPT-4 to Claude 3 to Llama 3 is as simple as changing a dropdown selection. No code rewrites, no multiple SDK integrations—just pure experimentation.

Have you built AI apps with Streamlit and OpenRouter? Share your projects and tips in the comments below!