Every platform with user-generated content needs moderation. An AI classifier can handle the large majority of content automatically, flagging the uncertain remainder for human review. Here's how to build a production moderation pipeline.
The Moderation Categories
Classify content into categories based on your platform's policies:
CATEGORIES = {
    "clean": "Safe for all audiences",
    "spam": "Unsolicited promotional content",
    "profanity": "Contains offensive language",
    "hate_speech": "Targeted harassment or hate speech",
    "violence": "Graphic violence or threats",
    "adult": "Sexual or NSFW content",
    "suspicious": "Potential fraud or manipulation"
}
Text Moderation with Classifiers
import os

import openai

client = openai.OpenAI(
    api_key=os.environ.get("CELUXE_API_KEY"),
    base_url="https://api.celuxe.shop/v1"
)
SYSTEM_PROMPT = """You are a content moderation classifier.
Classify the following text into one of these categories:
- clean: Safe for all audiences
- spam: Promotional or unsolicited content
- profanity: Contains offensive language
- hate_speech: Harassment or hate speech
- violence: Threats or graphic content
- adult: Sexual content
- suspicious: Fraud or manipulation
Respond ONLY with the category name. No explanation."""
def moderate_text(text):
    response = client.chat.completions.create(
        model="deepseek-chat",  # cheap, fast, and accurate enough for classification
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text}
        ],
        max_tokens=20,
        temperature=0
    )
    category = response.choices[0].message.content.strip().lower()
    # Validate the response. Fail closed: an unrecognized answer becomes
    # "unknown", which the pipeline routes to human review rather than
    # silently auto-approving it as "clean".
    return category if category in CATEGORIES else "unknown"
Batch Moderation for High Volume
def moderate_batch(texts, batch_size=100):
    """Moderate many texts efficiently by classifying them in batches."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Combine the batch into a single classification request
        combined = "\n---\n".join(f"[{j}] {t}" for j, t in enumerate(batch))
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nClassify each item. Format: [0]=category, [1]=category, ..."},
                {"role": "user", "content": combined}
            ],
            max_tokens=200,
            temperature=0
        )
        # Map the model's "[i]=category" answers back onto the batch
        results.extend(parse_batch_response(response, len(batch)))
    return results
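The `parse_batch_response` helper isn't shown above. Here is a minimal sketch, assuming the model answers in the requested `[0]=category, [1]=category` format; any item it skips or garbles defaults to "unknown" so it falls through to human review instead of being approved.

```python
import re

def parse_batch_response(response, batch_count):
    """Parse '[0]=clean, [1]=spam, ...' into a list of category strings.

    Missing or unparseable items default to "unknown" so the pipeline
    routes them to human review rather than auto-approving them.
    """
    text = response.choices[0].message.content
    categories = ["unknown"] * batch_count
    for index, category in re.findall(r"\[(\d+)\]\s*=\s*(\w+)", text):
        i = int(index)
        if i < batch_count:
            categories[i] = category.strip().lower()
    return categories
```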
The Pipeline Architecture
def moderation_pipeline(user_content, user_id=None):
    # 1. Pre-check: blocklist (fastest, so check it first)
    if contains_blocked_words(user_content):
        return {"status": "rejected", "reason": "blocklist_match", "action": "auto_reject"}
    # 2. AI classification
    category = moderate_text(user_content)
    # 3. Action based on category
    if category == "clean":
        return {"status": "approved"}
    elif category in ["spam", "profanity"]:
        return {"status": "flagged", "action": "human_review"}
    elif category in ["hate_speech", "violence", "adult", "suspicious"]:
        return {"status": "rejected", "action": "auto_remove", "reason": category}
    else:
        # Unknown category (e.g. a malformed model response): fail closed
        return {"status": "flagged", "action": "human_review"}
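The `contains_blocked_words` pre-check can be as simple as one compiled regex. A minimal sketch, with a hypothetical `BLOCKED_WORDS` set standing in for whatever list your platform maintains; word boundaries keep substrings of innocent words from matching:

```python
import re

# Hypothetical blocklist; in production, load this from config or a database.
BLOCKED_WORDS = {"buy followers", "free crypto"}

# Precompile one case-insensitive alternation with word boundaries.
_BLOCKLIST_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)

def contains_blocked_words(text):
    """Return True if the text contains any blocklisted phrase."""
    return _BLOCKLIST_RE.search(text) is not None
```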
Production Tips
- Pre-check blocklists first: Simple string matching is ~1000× faster than an LLM call. Check blocklists before calling the LLM.
- Use the cheapest model: DeepSeek V3 is roughly 9× cheaper than GPT-4o for classification and performs comparably on simple category labels.
- Human review queue: Route flagged content to a review system. Log all decisions for audit trails.
- Feedback loop: When human reviewers override AI decisions, use that feedback to improve your classifier.
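The feedback loop can start as a simple audit record comparing the AI's label with the reviewer's. A minimal sketch with a hypothetical `ModerationDecision` record and `override_rate` metric; a real system would persist these to a database:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModerationDecision:
    content_id: str
    ai_category: str
    human_category: Optional[str] = None  # filled in when a reviewer weighs in

    @property
    def overridden(self):
        # True when a human reviewed the item and disagreed with the AI
        return self.human_category is not None and self.human_category != self.ai_category

def override_rate(log):
    """Fraction of human-reviewed decisions where the reviewer disagreed
    with the AI. A rising rate signals the prompt or model needs retuning."""
    reviewed = [d for d in log if d.human_category is not None]
    if not reviewed:
        return 0.0
    return sum(d.overridden for d in reviewed) / len(reviewed)
```

Tracking this per category tells you which labels the classifier gets wrong most often, which is exactly where to focus prompt or model improvements.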
Start Building for Free
Access 30+ AI models through Celuxe's unified API. DeepSeek V3 classification costs less than $0.001 per 1,000 calls.
Get Your API Key →