Usage Guide

Basic Usage
1. Python Client
2. curl Commands
Test Client Usage
Response Handling
1. Synchronous Mode
2. Asynchronous Mode
Best Practices

Basic Usage

Python Client

from openai import OpenAI

client = OpenAI(
    api_key="dummy_openai_api_key",  # Any string works as the key
    base_url="http://localhost:8080/v1"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

curl Commands

Chat completion request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
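For environments without the OpenAI SDK, the same request can be built with Python's standard library. This is a minimal sketch that assumes the server runs at the default localhost:8080 address used throughout this guide. The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: the local server from this guide, reachable at localhost:8080.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(content: str, model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build the same POST request as the curl command above, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("Hello!"))["choices"][0]["message"]["content"]` should return the reply text. This assumes the server returns the same OpenAI-style response shape that the Python client example reads.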

Check batch status:

curl http://localhost:8080/v1/batches/batch_123

List all batches:

curl http://localhost:8080/v1/batches
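Both batch endpoints are plain GET requests, so a small standard-library wrapper can cover them. This is a sketch under the same localhost:8080 assumption. The structure of the returned JSON is not documented here, so the helper only decodes it. `batches_url` and `get_json` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080/v1"  # assumption: default local server


def batches_url(batch_id: Optional[str] = None) -> str:
    """URL for one batch (GET /v1/batches/{id}) or for the full list (GET /v1/batches)."""
    return f"{BASE_URL}/batches/{batch_id}" if batch_id else f"{BASE_URL}/batches"


def get_json(url: str):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json(batches_url("batch_123"))` mirrors the status check above, and `get_json(batches_url())` mirrors the list call.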

Test Client Usage

The included Python test client, client.py, wraps these calls in a command-line interface:

# Send a chat completion request
python client.py --api chat_completions --content "Write a joke"

# Check specific batch status
python client.py --api status_single_batch --batch_id batch_123

# List all batches
python client.py --api status_all_batches

# List only completed batches
python client.py --api status_all_batches --status_filter completed
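The flags above suggest an argparse interface along the following lines. This is a hypothetical reconstruction for orientation only. The flag names and the three `--api` values come from the commands above. Whether `--api` is required, and the help text, are assumptions and are not taken from client.py.

```python
import argparse

# Hypothetical: flag names come from the commands above; everything else is assumed.
API_CHOICES = ["chat_completions", "status_single_batch", "status_all_batches"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Test client for the local server")
    parser.add_argument("--api", required=True, choices=API_CHOICES,
                        help="Which operation to run")
    parser.add_argument("--content", help="Message text for chat_completions")
    parser.add_argument("--batch_id", help="Batch ID for status_single_batch")
    parser.add_argument("--status_filter",
                        help="With status_all_batches, list only batches in this status")
    return parser
```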

Response Handling

Synchronous Mode

# Blocks until the completion is ready, then returns it
response = client.chat.completions.create(...)
print(response.choices[0].message.content)

Asynchronous Mode

# Returns immediately with batch ID
response = client.chat.completions.create(...)
batch_id = response.id

# Check status later
status = client.batches.retrieve(batch_id)
print(f"Status: {status.batch.status}")
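In asynchronous mode, "check status later" usually means polling until the batch finishes. This is a minimal polling sketch. The terminal status set is an assumption: "completed" appears in this guide, while "failed", "expired", and "cancelled" follow OpenAI Batch API naming and may differ on this server. `wait_for_batch` is a hypothetical helper. It takes a status-fetching callable, so it does not depend on the exact response shape.

```python
import time
from typing import Callable

# Assumption: statuses that mean the batch will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(get_status: Callable[[], str], interval: float = 5.0,
                   timeout: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status() every `interval` seconds until it returns a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} s")
        sleep(interval)
```

With the client from the example above, this might be called as `wait_for_batch(lambda: client.batches.retrieve(batch_id).batch.status)`, reusing the same attribute path as that example.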

Best Practices

Request Batching
- Group similar requests together
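- Choose the batch window size to fit your latency budget: a longer window collects more requests per batch, but each request waits longer
- Consider request volume: at low volume, a long window adds delay without filling batches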
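The window trade-off can be seen in a small illustration of time-window grouping. This is a generic sketch of the idea, not this server's batching code. The names `group_by_window` and `max_size` are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_by_window(requests: Sequence[Tuple[float, T]], window: float,
                    max_size: int) -> List[List[T]]:
    """Split (arrival_time, request) pairs, sorted by time, into batches.

    A batch closes when a request arrives more than `window` seconds after
    the batch's first request, or when the batch already holds `max_size` items.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    start = 0.0
    for arrived, request in requests:
        if current and (arrived - start > window or len(current) >= max_size):
            batches.append(current)
            current = []
        if not current:
            start = arrived  # the first request opens a new window
        current.append(request)
    if current:
        batches.append(current)
    return batches
```

For arrivals at 0.0, 0.5, 1.0, 2.5, and 2.6 seconds with a 1-second window, the first three requests share one batch and the last two form a second. A larger window would merge them, but the first request would wait longer.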

Error Handling

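import openai

try:
    response = client.chat.completions.create(...)
except openai.APIConnectionError as e:  # server unreachable
    print(f"Connection error: {e}")
except openai.APIStatusError as e:  # non-2xx HTTP response
    print(f"HTTP {e.status_code}: {e}")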

Monitoring
- Use the batch monitor tool
- Track batch statuses
- Monitor cache hits/misses
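The status and cache checks above reduce to simple tallies. This is a minimal sketch. It assumes each batch record is a dict with a "status" key, which this guide does not confirm. It also assumes you already have hit and miss counts, whose source the guide does not specify. Both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def status_counts(batches: Iterable[Mapping[str, str]]) -> Counter:
    """Count batches per status, e.g. from a decoded batch list."""
    return Counter(batch["status"] for batch in batches)


def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0
```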
