Bifrost Overview


Core Concepts

Bifrost Is an AI Gateway

Bifrost is a high-performance AI gateway that sits between your application and LLM providers.

The official documentation describes Bifrost as "a high-performance AI gateway that unifies 20+ providers behind a single OpenAI-compatible API."

The key idea is that application code no longer has to handle the differences between providers such as OpenAI, Anthropic, Bedrock, Vertex AI, and Azure OpenAI itself.

For a Python backend, this means LLM calls are not scattered across a bundle of provider-specific SDK calls; they are consolidated behind a single boundary, the Bifrost Gateway.

What the 11µs Overhead Means

According to the official Overview, in a sustained benchmark at 5,000 requests/sec, Bifrost adds only 11µs of overhead per request.

Read this figure as: in an environment where LLM response latency ranges from about one second to tens of seconds, the extra cost of the gateway layer itself is very small.

It does not mean that the total response time is 11µs.

Actual latency is determined by the provider API, the network, model inference time, whether streaming is used, tool calls, and whether the request hits the cache.

Operational documentation should use this figure only as evidence that the added cost of introducing the gateway is unlikely to become a bottleneck.
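A quick back-of-the-envelope calculation makes the proportion concrete. The sketch below assumes end-to-end LLM latencies of 1, 5, and 30 seconds (illustrative values, not measurements) and computes what share of the total the documented 11µs would represent. It deliberately ignores the extra network hop between the application and the gateway, which in a real deployment may cost more than the gateway's own processing.

```python
# Share of end-to-end latency attributable to the gateway's own processing.
# 11 µs is the documented per-request overhead at 5,000 RPS; the LLM latencies
# below are illustrative assumptions, not benchmark results.
GATEWAY_OVERHEAD_S = 11e-6


def overhead_share(llm_latency_s: float, overhead_s: float = GATEWAY_OVERHEAD_S) -> float:
    """Return the fraction of total latency contributed by the gateway."""
    return overhead_s / (llm_latency_s + overhead_s)


for latency in (1.0, 5.0, 30.0):
    print(f"LLM latency {latency:>4.0f} s -> gateway share {overhead_share(latency):.6%}")
```

Even at the low end of one second, the gateway's share stays on the order of a thousandth of a percent, which is why the figure is best read as "not the bottleneck" rather than as a latency number.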

What 20+ Provider Integration Means

Bifrost resolves provider and model names at the gateway and routes each request accordingly.

For example, if you use a model name that includes a provider prefix, such as openai/gpt-4o-mini, the gateway can send the request to that model on the OpenAI provider.

When you use a bare model name, the provider can be resolved through the Model Catalog.

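Because of this structure, operational policies can be applied centrally, and backend applications do not have to implement provider-specific authentication, fallback, rate limiting, key distribution, or log collection themselves.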
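The provider-prefix and Model Catalog lookup described above can be pictured with a small sketch. This is not Bifrost's resolution code: `resolve_model` and the `MODEL_CATALOG` dictionary are hypothetical stand-ins, and the real catalog is maintained by Bifrost. The sketch only illustrates the decision order: an explicit `provider/` prefix wins, and a bare name falls back to a catalog lookup.

```python
# Hypothetical stand-in for Bifrost's Model Catalog: maps bare model names to a
# provider. The entries are examples only; the real catalog lives in Bifrost.
MODEL_CATALOG = {
    "gpt-4o-mini": "openai",
    "claude-3-5-sonnet-20241022": "anthropic",
}


def resolve_model(model: str, catalog: dict = MODEL_CATALOG) -> tuple[str, str]:
    """Return (provider, model) for a prefixed or bare model name.

    Illustrative only; this is not Bifrost's actual resolution logic.
    """
    # Split on the first "/" only, so model names that contain slashes survive.
    provider, sep, name = model.partition("/")
    if sep:
        # Explicit prefix, e.g. "openai/gpt-4o-mini" -> ("openai", "gpt-4o-mini")
        return provider, name
    try:
        # Bare name: look the provider up in the catalog.
        return catalog[model], model
    except KeyError:
        raise ValueError(f"cannot resolve provider for bare model name {model!r}") from None
```

Splitting on the first `/` only means a name like `"provider/org/model"` resolves to provider `provider` and model `org/model`.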

Features That Matter First in OSS

When running the OSS edition, the features to look at first are Drop-in Replacement, Automatic Fallbacks, Load Balancing, Virtual Keys, Routing, Budget & Rate Limits, MCP Tool Filtering, Semantic Caching, Observability, Prometheus Metrics, OpenTelemetry, and Custom Plugins.

Each of these may look like a standalone feature, but in operation they work together.

For example, provider outages are mitigated with Fallbacks, per-key rate limits are spread out with Load Balancing, per-customer usage is controlled with Virtual Keys and Budgets, and problems are analyzed with Logging/Observability.
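The division of labor between Fallbacks and Load Balancing can be modeled in a few lines. The sketch below is a conceptual model only, not Bifrost's implementation or configuration format: `Target`, `ProviderError`, `pick_key`, `call_with_fallback`, and the `send` callable are all hypothetical. It tries targets in order (fallback) and, within each target, picks an API key by weight (load balancing).

```python
import random
from dataclasses import dataclass


@dataclass
class Target:
    provider: str
    model: str


class ProviderError(Exception):
    """Raised by the hypothetical `send` call on an outage or rate limit."""


def pick_key(keys: dict[str, float], rng: random.Random) -> str:
    """Weighted random choice over a key pool, e.g. {"key-a": 0.7, "key-b": 0.3}."""
    names = list(keys)
    return rng.choices(names, weights=[keys[n] for n in names], k=1)[0]


def call_with_fallback(targets, key_pools, send, rng=None):
    """Try each target in order; within a target, choose a key by weight.

    `send(target, key)` returns a response or raises ProviderError.
    Conceptual model of gateway behavior, not Bifrost's code.
    """
    rng = rng or random.Random()
    errors = []
    for target in targets:
        key = pick_key(key_pools[target.provider], rng)  # load balancing
        try:
            return send(target, key)
        except ProviderError as exc:
            # Record the failure and fall through to the next target (fallback).
            errors.append(f"{target.provider}/{target.model}: {exc}")
    raise ProviderError("all targets failed: " + "; ".join(errors))
```

With Bifrost, these policies live in the gateway configuration, so application code never contains a loop like this; the sketch only makes the behavior easier to reason about.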

Where Bifrost Fits

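| Aspect | Implementing it yourself without Bifrost | After adopting Bifrost |
|---|---|---|
| Provider integration | Branch on OpenAI/Anthropic/Bedrock SDKs directly in app code | Register providers in the gateway; the app makes a single API call |
| Failure handling | Implement try/except and retry logic in every service | Manage provider/model fallback as a central policy |
| Rate limit handling | Track per-key usage in the app | Spread load with key pool, weight, and failover policies |
| Cost control | Each service needs its own token/cost aggregation logic | Centralized through Virtual Keys, Budgets, and Logging |
| Observability | Collect app logs and provider responses separately | Collect request/response logs, metrics, and tracing at the gateway |
| Agent/MCP | Configure a separate tool registry per agent | Centralize tool discovery/execution through the MCP Gateway |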

Hands-On Examples

1. Verify That the Gateway Acts as a Single API Boundary

The example below assumes that the Bifrost Gateway is running locally on port 8080 and that an OpenAI provider is registered.

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Explain Bifrost in one sentence."}
    ]
  }'
```

With this call, the application does not need to hold the OpenAI API key itself.

Provider keys and routing policies live in the Bifrost configuration, and the application only calls the gateway.
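For code paths where pulling in an SDK is not an option, the same call can be made with the standard library alone. The sketch below reproduces the curl request with `urllib`; the helper names `build_chat_request` and `send` are hypothetical, and the URL assumes the local setup described above. The response is decoded as OpenAI-compatible JSON.

```python
import json
import urllib.request

# Assumption: Bifrost runs locally on port 8080 with an OpenAI provider registered.
# Note that no provider API key is sent from the application.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(model: str, prompt: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({
        "model": model,  # provider-prefixed, e.g. "openai/gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running gateway, `send(build_chat_request("openai/gpt-4o-mini", "Explain Bifrost in one sentence."))["choices"][0]["message"]["content"]` would return the answer text.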

2. Abstracting the Gateway in a Python Backend

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # Bifrost's OpenAI-compatible route
    api_key="dummy-key",  # the real provider key is managed by Bifrost
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise backend assistant."},
        {"role": "user", "content": "What problem does an AI gateway solve?"},
    ],
)

print(response.choices[0].message.content)
```

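Hardcoding `localhost` in every client makes it easy for environments to drift apart. A small helper can build the client arguments in one place; the environment variable names `BIFROST_BASE_URL` and `BIFROST_CLIENT_KEY` below are hypothetical, not Bifrost settings, and the defaults reproduce the values used in the example above.

```python
import os

# Defaults match the local example; the env var names are hypothetical.
DEFAULT_GATEWAY = "http://localhost:8080/openai"


def bifrost_client_kwargs(env=None) -> dict:
    """Return keyword arguments for openai.OpenAI(...) pointed at Bifrost.

    The api_key is a placeholder: real provider keys stay in Bifrost's
    configuration, so the application never needs them.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("BIFROST_BASE_URL", DEFAULT_GATEWAY),
        "api_key": env.get("BIFROST_CLIENT_KEY", "dummy-key"),
    }
```

Application code would then create the client with `client = OpenAI(**bifrost_client_kwargs())`, so moving the gateway only changes deployment configuration.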

3. Pinning the Call Site in a FastAPI Service

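```python
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


# One client for the whole service, pointed at the Bifrost gateway.
llm = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key",  # provider keys live in Bifrost, not in the service
)


@app.post("/chat")
def chat(req: ChatRequest):
    # Sync endpoint: FastAPI runs it in a worker thread, so the blocking SDK call is fine.
    result = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": req.message}],
    )
    return {"answer": result.choices[0].message.content}
```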

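The key point of this pattern is that the FastAPI service does not need to know about provider outages, key failover, cost limits, or tracing policies. The service code only makes the LLM call, and operational policies are tuned at the gateway.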
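Taking the pattern one step further, the service can depend on a narrow interface instead of the SDK client, so tests never need a running gateway. In the sketch below, `ChatBackend`, `BifrostChatBackend`, `FakeChatBackend`, and `answer_question` are hypothetical names; the adapter assumes an OpenAI-compatible client like the `llm` object above.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Anything that can turn a user message into an answer."""

    def complete(self, message: str) -> str: ...


class BifrostChatBackend:
    """Adapter around an OpenAI-compatible client pointed at Bifrost."""

    def __init__(self, client, model: str = "gpt-4o-mini") -> None:
        self._client = client  # e.g. OpenAI(base_url=..., api_key="dummy-key")
        self._model = model

    def complete(self, message: str) -> str:
        result = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": message}],
        )
        return result.choices[0].message.content


class FakeChatBackend:
    """Test double: records calls and returns a canned answer, no network."""

    def __init__(self, answer: str) -> None:
        self.answer = answer
        self.calls: list[str] = []

    def complete(self, message: str) -> str:
        self.calls.append(message)
        return self.answer


def answer_question(backend: ChatBackend, message: str) -> dict:
    """Service logic that only knows it can ask a backend for a completion."""
    return {"answer": backend.complete(message)}
```

In FastAPI, the backend can be supplied through a dependency function and swapped in tests with `app.dependency_overrides`, which keeps the route handler unaware of whether Bifrost is reachable.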

Advanced Usage

1. Consolidating LangGraph Model Nodes Behind the Gateway

When several LangGraph nodes call an LLM, provider-specific SDK configuration tends to spread across the nodes. With Bifrost, you can unify LangGraph's model-call boundary at the gateway and move fallback and budget limits into Virtual Key/provider policies.

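```python
from openai import OpenAI

bifrost_client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key",  # provider keys stay in Bifrost
)


def call_llm_for_node(state: dict) -> dict:
    completion = bifrost_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=state["messages"],
    )
    reply = completion.choices[0].message
    # Return the new message as a one-item list of plain dicts so that an
    # appending reducer (e.g. Annotated[list, operator.add]) can merge it into
    # state, and the next call can send the history back to the API unchanged.
    return {
        "messages": [{"role": "assistant", "content": reply.content}],
    }
```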

With this setup, the LangGraph graph is concerned with state transitions and tool orchestration, while provider operational policies stay in Bifrost.
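When several nodes share one gateway-backed client, their updates are typically merged with a reducer: in LangGraph, a state key annotated with `Annotated[list, operator.add]` has each node's returned list concatenated onto the existing value. The sketch below shows two nodes sharing one completion function; `make_node`, `run_sequentially`, and the `complete` callable are hypothetical, and `run_sequentially` only mimics a linear graph rather than using LangGraph's execution engine.

```python
import operator
from typing import Annotated, Callable, TypedDict


class ChatState(TypedDict):
    # In LangGraph, this annotation makes the graph merge node updates
    # by list concatenation instead of overwriting the key.
    messages: Annotated[list, operator.add]


# Hypothetical signature: OpenAI-style message dicts in, reply text out.
# In a real graph this would wrap bifrost_client.chat.completions.create(...).
Complete = Callable[[list], str]


def make_node(complete: Complete, system_prompt: str) -> Callable[[ChatState], dict]:
    """Build a node that calls the shared gateway-backed `complete` function."""

    def node(state: ChatState) -> dict:
        # The system prompt is added per call and not stored in state.
        messages = [{"role": "system", "content": system_prompt}, *state["messages"]]
        reply = complete(messages)
        # Return only the new message; the reducer appends it to the history.
        return {"messages": [{"role": "assistant", "content": reply}]}

    return node


def run_sequentially(state: ChatState, nodes) -> ChatState:
    """Stand-in for a linear graph: apply each node, merge with operator.add."""
    for node in nodes:
        update = node(state)
        state = {"messages": operator.add(state["messages"], update["messages"])}
    return state
```

Because every node goes through the same `complete` function, swapping the model or moving to another gateway route is a single change, and fallback or budget policies remain in Bifrost.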

2. Splitting Operational Responsibilities Around the Gateway

In self-hosted deployments, it is important to divide responsibilities clearly between the application team and the platform team.

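| Responsibility | Application team | Platform/Gateway team |
|---|---|---|
| Prompt/Chain | Business logic, LangGraph state, response quality | Shared policy templates, safe defaults |
| Provider selection | States baseline model requirements | Provider registration, fallback chain, key pool |
| Cost | Per-feature budget requirements | Virtual Keys, budgets, usage logs |
| Failure handling | Graceful degradation UX | Health checks, failover, observability |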
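On the application side, the "graceful degradation UX" row usually means catching a failed gateway call and returning a degraded response instead of an error. A minimal sketch, assuming a `complete(message)` callable that wraps the gateway client (hypothetical, as is `DEGRADED_ANSWER`): provider failover remains the gateway's job, and the app only handles the case where the gateway itself cannot produce an answer.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical fallback text shown to users when the gateway cannot answer.
DEGRADED_ANSWER = "The assistant is temporarily unavailable. Please try again shortly."


def answer_or_degrade(complete, message: str) -> dict:
    """Application-side graceful degradation.

    Provider failover happens inside the gateway. If the call still fails
    (gateway unreachable, all fallbacks exhausted, or a limit rejected the
    request), return a degraded response instead of propagating the error.
    """
    try:
        return {"answer": complete(message), "degraded": False}
    except Exception:  # narrow this to your client's errors, e.g. openai.APIError
        logger.exception("LLM call through the gateway failed")
        return {"answer": DEGRADED_ANSWER, "degraded": True}
```

The `degraded` flag lets the UI show a retry hint while the platform team investigates through the gateway's logs and metrics.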