Every platform with user-generated content needs moderation. An AI classifier can handle the large majority of content automatically, flagging the uncertain remainder for human review. Here's how to build a production moderation pipeline.
The Moderation Categories
Classify content into categories based on your platform's policies:
CATEGORIES = {
    "clean": "Safe for all audiences",
    "spam": "Unsolicited promotional content",
    "profanity": "Contains offensive language",
    "hate_speech": "Targeted harassment or hate speech",
    "violence": "Graphic violence or threats",
    "adult": "Sexual or NSFW content",
    "suspicious": "Potential fraud or manipulation"
}
Text Moderation with Classifiers
import os

import openai

client = openai.OpenAI(
    api_key=os.environ.get("CELUXE_API_KEY"),
    base_url="https://api.celuxe.shop/v1"
)
SYSTEM_PROMPT = """You are a content moderation classifier.
Classify the following text into one of these categories:
- clean: Safe for all audiences
- spam: Promotional or unsolicited content
- profanity: Contains offensive language
- hate_speech: Harassment or hate speech
- violence: Threats or graphic content
- adult: Sexual content
- suspicious: Fraud or manipulation
Respond ONLY with the category name. No explanation."""
def moderate_text(text):
    response = client.chat.completions.create(
        model="deepseek-chat",  # cheap, fast, and accurate enough for classification
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text}
        ],
        max_tokens=20,
        temperature=0
    )
    category = response.choices[0].message.content.strip().lower()
    # Validate the response. Fail closed: an unrecognized answer becomes
    # "unknown", which the pipeline routes to human review rather than
    # silently auto-approving it as "clean".
    return category if category in CATEGORIES else "unknown"
Batch Moderation for High Volume
def moderate_batch(texts, batch_size=100):
    """Moderate many texts efficiently by classifying them in batches."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Combine the batch into a single classification request
        combined = "\n---\n".join(f"[{j}] {t}" for j, t in enumerate(batch))
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nClassify each item. Format: [0]=category, [1]=category, ..."},
                {"role": "user", "content": combined}
            ],
            max_tokens=200,
            temperature=0
        )
        # Map the model's "[i]=category" answers back onto the batch
        results.extend(parse_batch_response(response, len(batch)))
    return results
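The `parse_batch_response` helper isn't shown above. Here is a minimal sketch, assuming the model answers in the requested `[0]=category, [1]=category` format; any item it skips or garbles defaults to "unknown" so it falls through to human review instead of being approved.

```python
import re

def parse_batch_response(response, batch_count):
    """Parse '[0]=clean, [1]=spam, ...' into a list of category strings.

    Missing or unparseable items default to "unknown" so the pipeline
    routes them to human review rather than auto-approving them.
    """
    text = response.choices[0].message.content
    categories = ["unknown"] * batch_count
    for index, category in re.findall(r"\[(\d+)\]\s*=\s*(\w+)", text):
        i = int(index)
        if i < batch_count:
            categories[i] = category.strip().lower()
    return categories
```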
The Pipeline Architecture
def moderation_pipeline(user_content, user_id=None):
    # 1. Pre-check: blocklist (fastest, so check it first)
    if contains_blocked_words(user_content):
        return {"status": "rejected", "reason": "blocklist_match", "action": "auto_reject"}
    # 2. AI classification
    category = moderate_text(user_content)
    # 3. Action based on category
    if category == "clean":
        return {"status": "approved"}
    elif category in ["spam", "profanity"]:
        return {"status": "flagged", "action": "human_review"}
    elif category in ["hate_speech", "violence", "adult", "suspicious"]:
        return {"status": "rejected", "action": "auto_remove", "reason": category}
    else:
        # Unknown category (e.g. a malformed model response): fail closed
        return {"status": "flagged", "action": "human_review"}
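The `contains_blocked_words` pre-check can be as simple as one compiled regex. A minimal sketch, with a hypothetical `BLOCKED_WORDS` set standing in for whatever list your platform maintains; word boundaries keep substrings of innocent words from matching:

```python
import re

# Hypothetical blocklist; in production, load this from config or a database.
BLOCKED_WORDS = {"buy followers", "free crypto"}

# Precompile one case-insensitive alternation with word boundaries.
_BLOCKLIST_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)

def contains_blocked_words(text):
    """Return True if the text contains any blocklisted phrase."""
    return _BLOCKLIST_RE.search(text) is not None
```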
Production Tips
- Pre-check blocklists first: Simple string matching is ~1000× faster than an LLM call. Check blocklists before calling the LLM.
- Use the cheapest model: DeepSeek V3 is roughly 9× cheaper than GPT-4o for classification and performs comparably on simple category labels.
- Human review queue: Route flagged content to a review system. Log all decisions for audit trails.
- Feedback loop: When human reviewers override AI decisions, use that feedback to improve your classifier.
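The feedback loop can start as a simple audit record comparing the AI's label with the reviewer's. A minimal sketch with a hypothetical `ModerationDecision` record and `override_rate` metric; a real system would persist these to a database:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModerationDecision:
    content_id: str
    ai_category: str
    human_category: Optional[str] = None  # filled in when a reviewer weighs in

    @property
    def overridden(self):
        # True when a human reviewed the item and disagreed with the AI
        return self.human_category is not None and self.human_category != self.ai_category

def override_rate(log):
    """Fraction of human-reviewed decisions where the reviewer disagreed
    with the AI. A rising rate signals the prompt or model needs retuning."""
    reviewed = [d for d in log if d.human_category is not None]
    if not reviewed:
        return 0.0
    return sum(d.overridden for d in reviewed) / len(reviewed)
```

Tracking this per category tells you which labels the classifier gets wrong most often, which is exactly where to focus prompt or model improvements.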
Start Building for Free
Access 30+ AI models through Celuxe's unified API. DeepSeek V3 classification costs less than $0.001 per 1,000 calls.
Get Your API Key →