Advanced Features

Table of contents

  1. Serving Modes
    1. Synchronous Mode
    2. Asynchronous Mode
    3. Cache-Only Mode
  2. Caching System
    1. Cache Configuration
    2. Cache Operations
  3. Batch Recovery
    1. Automatic Recovery
    2. Manual Recovery
  4. Advanced Monitoring
    1. Custom Polling Intervals
    2. Progress Tracking
  5. Performance Tuning
    1. Batch Window Optimization
    2. MongoDB Optimization

Serving Modes

Synchronous Mode

export CLIENT_SERVING_MODE=sync  # Default
  • Blocks until response is available
  • Similar to standard OpenAI API
  • Best for low-volume scenarios

Asynchronous Mode

export CLIENT_SERVING_MODE=async
  • Returns immediately with submission confirmation
  • Suitable for high-volume applications
  • Requires separate status checking

Cache-Only Mode

export CLIENT_SERVING_MODE=cache
  • Serves only cached responses
  • No new API calls
  • Processes pending batches

Caching System

Cache Configuration

  • Automatic request hashing
  • MongoDB-based storage
  • Cross-session persistence

Cache Operations

# Cache hit example
response1 = client.chat.completions.create(...)
response2 = client.chat.completions.create(...)  # Same request returns cached response

Batch Recovery

Automatic Recovery

  • Detects interrupted batches
  • Resumes processing on restart
  • Updates original requesters

Manual Recovery

# Check dangling batches
python client.py --api status_all_batches --status_filter not_completed

Advanced Monitoring

Custom Polling Intervals

export COLLECT_BATCH_STATS_POLLING_MAX_INTERVAL_SECONDS=600

Progress Tracking

  • Real-time completion statistics
  • Request counts monitoring
  • Error tracking

Performance Tuning

Batch Window Optimization

# Adjust batch collection window
export COLLATE_BATCHES_FOR_DURATION_IN_MS=3000  # 3 seconds

MongoDB Optimization

  • Index management
  • Connection pooling
  • Query optimization