
Building a Content Moderation Pipeline with AI

Every platform with user-generated content needs moderation. AI classifiers can handle roughly 95% of moderation decisions automatically and flag the remaining 5% for human review. Here's how to build a production moderation pipeline.

The Moderation Categories

Classify content into categories based on your platform's policies:

CATEGORIES = {
    "clean": "Safe for all audiences",
    "spam": "Unsolicited promotional content",
    "profanity": "Contains offensive language",
    "hate_speech": "Targeted harassment or hate speech",
    "violence": "Graphic violence or threats",
    "adult": "Sexual or NSFW content",
    "suspicious": "Potential fraud or manipulation"
}
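
The same category names end up in both the classifier prompt and the validation step, so one option is to derive both from this dictionary instead of maintaining them by hand. The sketch below shows that idea; `build_system_prompt` and `VALID_CATEGORIES` are illustrative names, not part of any library.

```python
CATEGORIES = {
    "clean": "Safe for all audiences",
    "spam": "Unsolicited promotional content",
    "profanity": "Contains offensive language",
    "hate_speech": "Targeted harassment or hate speech",
    "violence": "Graphic violence or threats",
    "adult": "Sexual or NSFW content",
    "suspicious": "Potential fraud or manipulation",
}


def build_system_prompt(categories):
    """Render a classifier prompt from a {name: description} mapping."""
    lines = [f"- {name}: {desc}" for name, desc in categories.items()]
    return (
        "You are a content moderation classifier.\n"
        "Classify the following text into one of these categories:\n"
        + "\n".join(lines)
        + "\n\nRespond ONLY with the category name. No explanation."
    )


# Hypothetical helper set: reuse it wherever a model reply is validated.
VALID_CATEGORIES = frozenset(CATEGORIES)
```

Adding or renaming a category then only requires a change in one place.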

Text Moderation with Classifiers

import openai
import os

client = openai.OpenAI(
    api_key=os.environ.get("CELUXE_API_KEY"),
    base_url="https://api.celuxe.shop/v1"
)

SYSTEM_PROMPT = """You are a content moderation classifier.
Classify the following text into one of these categories:
- clean: Safe for all audiences
- spam: Promotional or unsolicited content
- profanity: Contains offensive language
- hate_speech: Harassment or hate speech
- violence: Threats or graphic content
- adult: Sexual content
- suspicious: Fraud or manipulation

Respond ONLY with the category name. No explanation."""

def moderate_text(text):
    response = client.chat.completions.create(
        model="deepseek-chat",  # Cheap, fast, accurate for classification
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text}
        ],
        max_tokens=20,
        temperature=0
    )
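    category = (response.choices[0].message.content or "").strip().lower()
    # Validate the response. Fail closed: an unrecognized answer is returned as
    # "unknown" rather than "clean", so callers can send it to human review
    # instead of approving it.
    valid = ["clean", "spam", "profanity", "hate_speech", "violence", "adult", "suspicious"]
    return category if category in valid else "unknown"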
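
Models sometimes add capitalization, quotes, or a trailing period even when told to respond with only the category name. A small normalizer can absorb that noise before validation. This is a minimal sketch: `normalize_category` and the `"unknown"` fallback label are assumptions made for illustration, and the fallback is meant to be routed to human review by whatever calls it.

```python
import re

VALID_CATEGORIES = {
    "clean", "spam", "profanity", "hate_speech",
    "violence", "adult", "suspicious",
}


def normalize_category(raw, fallback="unknown"):
    """Map raw model output to a known category, or to `fallback`."""
    if not raw:
        return fallback
    # Lowercase, then turn runs of spaces and hyphens into underscores.
    cleaned = re.sub(r"[\s\-]+", "_", raw.strip().lower())
    # Drop quotes, periods, and any other stray characters.
    cleaned = re.sub(r"[^a-z_]", "", cleaned).strip("_")
    return cleaned if cleaned in VALID_CATEGORIES else fallback
```

For example, `normalize_category("Hate Speech.")` yields `"hate_speech"`, while a full sentence such as `"I think this is spam"` falls through to `"unknown"`.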

Batch Moderation for High Volume

def moderate_batch(texts, batch_size=100):
    """Moderate many texts efficiently by classifying several per request."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        # Combine the batch into a single classification request
        combined = "\n---\n".join(f"[{j}] {t}" for j, t in enumerate(batch))

        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nClassify each item. Format: [0]=category, [1]=category, ..."},
                {"role": "user", "content": combined}
            ],
            # Scale the output budget with the batch: each "[n]=category"
            # entry takes several tokens
            max_tokens=16 * len(batch),
            temperature=0
        )
        # Parse the reply into one category per input text
        results.extend(parse_batch_response(response, len(batch)))
    return results
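
The batch function relies on `parse_batch_response`, which is not defined in the snippet. One possible shape for it is sketched below, assuming the model replies in the `[0]=category, [1]=category` format requested in the prompt. The `parse_batch_text` helper and the `"unknown"` fallback label are assumptions made for this example; any item the model skips or mislabels gets the fallback so it can go to human review.

```python
import re
from types import SimpleNamespace

VALID_CATEGORIES = {
    "clean", "spam", "profanity", "hate_speech",
    "violence", "adult", "suspicious",
}

# Matches entries such as "[0]=spam" or "[12] = hate_speech".
ENTRY_PATTERN = re.compile(r"\[(\d+)\]\s*=\s*([a-z_]+)", re.IGNORECASE)


def parse_batch_text(text, expected_count, fallback="unknown"):
    """Turn '[0]=clean, [1]=spam' into a list with one category per item."""
    results = [fallback] * expected_count
    for index, category in ENTRY_PATTERN.findall(text or ""):
        position = int(index)
        category = category.lower()
        # Ignore out-of-range indices and labels outside the known set.
        if position < expected_count and category in VALID_CATEGORIES:
            results[position] = category
    return results


def parse_batch_response(response, expected_count):
    """Extract the reply text from a chat completion and parse it."""
    return parse_batch_text(response.choices[0].message.content, expected_count)


# Stand-in for a real API response, used only to demonstrate the parser.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="[0]=clean, [1]=spam, [2]=bogus"))]
)
print(parse_batch_response(fake_response, 4))
# → ['clean', 'spam', 'unknown', 'unknown']
```

Keying results by the index the model echoes back, instead of by reply order, means a dropped or reordered entry cannot shift labels onto the wrong text.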

The Pipeline Architecture

def moderation_pipeline(user_content, user_id=None):
    # 1. Pre-check: blocklist (fastest, check first)
    if contains_blocked_words(user_content):
        return {"status": "rejected", "reason": "blocklist_match", "action": "auto_reject"}
    
    # 2. AI classification
    category = moderate_text(user_content)
    
    # 3. Action based on category
    if category == "clean":
        return {"status": "approved"}
    elif category in ["spam", "profanity"]:
        return {"status": "flagged", "action": "human_review"}
    elif category in ["hate_speech", "violence", "adult", "suspicious"]:
        return {"status": "rejected", "action": "auto_remove", "reason": category}
    else:
        return {"status": "flagged", "action": "human_review"}
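
The pipeline calls `contains_blocked_words` without showing it. A minimal sketch of such a pre-check follows, assuming a small in-memory list of terms; the terms themselves are hypothetical placeholders, and `compile_blocklist` is a name invented for this example. It matches whole words or phrases case-insensitively, so a blocked term embedded in a longer word does not trigger a rejection.

```python
import re

# Hypothetical placeholder terms; a real list would come from your policies.
BLOCKED_TERMS = ["free crypto giveaway", "buy followers"]


def compile_blocklist(terms):
    """Build one case-insensitive regex matching any term as a whole word or phrase."""
    if not terms:
        # An empty lookahead never matches, so an empty list blocks nothing.
        return re.compile(r"(?!)")
    # Longest terms first so longer phrases win over their prefixes.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


BLOCKLIST_PATTERN = compile_blocklist(BLOCKED_TERMS)


def contains_blocked_words(text):
    """Return True if any blocked term appears in the text."""
    return BLOCKLIST_PATTERN.search(text) is not None
```

Compiling the pattern once at import time keeps the per-message cost to a single regex search.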

Production Tips

  • Pre-check blocklists first: Simple string matching is roughly 1000× faster than an LLM call, so check blocklists before calling the model.
  • Use the cheapest model: DeepSeek V3 is about 9× cheaper than GPT-4o and performs comparably on classification.
  • Human review queue: Route flagged content to a review system. Log all decisions for audit trails.
  • Feedback loop: When human reviewers override AI decisions, use that feedback to improve your classifier.
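
The last two tips can be sketched together: write every decision to an append-only log, then pull out the cases where a reviewer disagreed with the classifier. This is an illustrative outline only. The record fields and the function names `log_decision` and `collect_overrides` are assumptions, and a production system would likely use a database or queue instead of a JSON-lines file.

```python
import io
import json
from datetime import datetime, timezone


def log_decision(log_file, content_id, ai_category, action, reviewer_category=None):
    """Append one moderation decision as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content_id": content_id,
        "ai_category": ai_category,
        "action": action,
        "reviewer_category": reviewer_category,  # None until a human reviews it
    }
    log_file.write(json.dumps(record) + "\n")
    return record


def collect_overrides(log_lines):
    """Return reviewed records where the human label differs from the AI label."""
    overrides = []
    for line in log_lines:
        record = json.loads(line)
        reviewed = record["reviewer_category"]
        if reviewed is not None and reviewed != record["ai_category"]:
            overrides.append(record)
    return overrides


log = io.StringIO()  # stands in for a real append-only log file
log_decision(log, "post-1", "spam", "human_review", reviewer_category="clean")
log_decision(log, "post-2", "profanity", "human_review", reviewer_category="profanity")
log_decision(log, "post-3", "clean", "approved")
overrides = collect_overrides(log.getvalue().splitlines())
print([r["content_id"] for r in overrides])  # → ['post-1']
```

The override records are the useful part of the feedback loop: they can serve as few-shot examples in the prompt or as an evaluation set when comparing models.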

Start Building for Free

Access 30+ AI models through Celuxe's unified API. DeepSeek V3 classification costs less than $0.001 per 1,000 calls.

Get Your API Key →

Celuxe Team

Engineering and product team at Celuxe.