Spam, sorted by type.

Instantly know whether a message is legitimate or falls into one of five categories of spam — promotion, phishing, scam, SEO boost, clickbait.


French message spam classification — 5 types.

French spam, precisely classified

French forms, comments, and shared inboxes are constantly hit by promotion blasts, phishing attempts, SEO link stuffing, clickbait, and other low-effort pitches that waste time and distort analytics.

This model focuses on one thing — understanding French spam by type.
It separates ham (legitimate messages) from five distinct spam families: promotion, phishing, scam, SEO boost, and clickbait.

No SaaS dependency, no data leakage — just local AI you control, tuned for real French traffic.

What changes when you deploy the AI model (or train from the dataset)

The model classifies each French message as ham or as one of five spam types: promotion, phishing, scam, SEO boost, or clickbait.
You gain instant visibility, better automation, and measurable progress without changing your existing tools.

1. Cleaner inbox, faster response

Spam gets intercepted before it reaches your CRM or shared inbox.
Your team only handles legitimate messages, already sorted and scored.
Fewer duplicates, fewer false leads, faster triage.

2. Measurable accuracy, real insights

Track the share of each spam family week over week.
Run A/B tests on thresholds and monitor macro-F1 to prove balanced performance across ham and all spam types.
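Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare family such as scam counts as much as ham. A minimal pure-Python sketch, assuming gold and predicted labels use the six label strings from the dataset; the helper name `macro_f1` is hypothetical, not part of the shipped scripts:

```python
# Label strings as listed for the dataset.
LABELS = ["ham", "promotion", "phishing", "scam", "seo_boost", "clickbait"]


def macro_f1(gold: list[str], pred: list[str], labels: list[str] = LABELS) -> float:
    """Unweighted mean of per-class F1 (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A class that never appears scores 0, so evaluate on a split containing every class.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

Comparing this number across threshold variants in an A/B test shows whether a change helps one family at the expense of another.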

3. Smarter rules, lower moderation costs

Routing becomes predictable:
phishing → quarantine, promotion → soft block, clickbait → review, ham → publish or route.
Less manual moderation, cleaner data pipelines, and fewer errors over time.
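Routing rules like these map onto a lookup table. A minimal sketch: the phishing, promotion, clickbait and ham actions follow the rules above, the scam (quarantine) and seo_boost (review) actions follow the integration scenarios on this page, and sending unknown labels to review is our own assumption:

```python
from enum import Enum


class Action(Enum):
    QUARANTINE = "quarantine"
    SOFT_BLOCK = "soft_block"
    REVIEW = "review"
    PUBLISH = "publish"


# Predicted label -> action.
ROUTES = {
    "phishing": Action.QUARANTINE,
    "scam": Action.QUARANTINE,
    "promotion": Action.SOFT_BLOCK,
    "seo_boost": Action.REVIEW,
    "clickbait": Action.REVIEW,
    "ham": Action.PUBLISH,
}


def route(label: str) -> Action:
    """Return the action for a predicted label; unknown labels go to review (assumption)."""
    return ROUTES.get(label, Action.REVIEW)
```

Keeping the table as data rather than nested conditionals makes each routing decision easy to audit and change per deployment.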

Trusted by automation teams.

01. Native to French spam patterns. Trained and validated on authentic and synthetic French messages, covering every major spam behavior.

02. Granular decisions, seamless integration. Each message returns clear probabilities for ham and all five spam families.
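For a fine-tuned sequence classifier, these six probabilities are typically the softmax of the model's six output logits. A pure-Python sketch, assuming the label order below; the real order comes from the label mapping shipped with the model, so read it from there rather than hard-coding it:

```python
import math

# Assumed label order; check the model's label mapping before relying on it.
LABELS = ["ham", "promotion", "phishing", "scam", "seo_boost", "clickbait"]


def probabilities(logits: list[float], labels: list[str] = LABELS) -> dict[str, float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    if len(logits) != len(labels):
        raise ValueError("one logit per label expected")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}
```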

03. Proven reliability and full ownership. Built on CamemBERT-v2 and audited with macro-F1 on a stratified French corpus.

Built to classify smarter

Native to the French web: this multi-class spam classifier is trained and validated on authentic and synthetic French messages (emails, forms, comments, reviews) and evaluated on a stratified hold-out with macro-F1, which weights every spam type equally.
It detects not only spam vs. ham, but five distinct spam families: promotion, phishing, scam, SEO boost, and clickbait.

Start with conservative thresholds, monitor early quarantines, then fine-tune per channel once you know your clean-traffic profile.
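One way to express per-channel thresholds is a small config plus a gate: the top label triggers an automatic action only when its probability reaches the channel's threshold, otherwise the message goes to human review. A sketch under stated assumptions; the channel names and threshold values are hypothetical starting points, not recommendations from the model's documentation:

```python
# Hypothetical thresholds: start high (conservative), lower per channel over time.
DEFAULT_THRESHOLD = 0.9
CHANNEL_THRESHOLDS = {"contact_form": 0.9, "comments": 0.8}


def decide(probs: dict[str, float], channel: str) -> tuple[str, bool]:
    """Return (top label, whether it is confident enough to act on automatically)."""
    label = max(probs, key=probs.get)
    threshold = CHANNEL_THRESHOLDS.get(channel, DEFAULT_THRESHOLD)
    return label, probs[label] >= threshold
```

Messages that come back with `False` are the ones worth logging and reviewing during onboarding; their volume per channel indicates whether a threshold can safely be lowered.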

How it works

Feed any French text—from contact forms, inboxes, or comment systems—into the model.

Each message returns a probability for ham and for each spam family, letting you automate precise actions: block phishing attempts, review promotion posts, hide SEO-boost spam, downrank clickbait headlines.
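A minimal batch-scoring sketch with Hugging Face Transformers, assuming the model files have been downloaded to a local directory and that the model config carries the label mapping (`id2label`). The directory path and function names are hypothetical; the heavy imports sit inside the functions so the pure helper works without PyTorch installed:

```python
def load_classifier(model_dir: str):
    """Load tokenizer and model once; model_dir is a hypothetical local path."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def score_messages(texts: list[str], tokenizer, model) -> list[dict[str, float]]:
    """Return one {label: probability} dict per input text."""
    import torch

    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    id2label = model.config.id2label  # label mapping saved with the weights
    return [{id2label[i]: float(p) for i, p in enumerate(row)} for row in probs]


def top_label(scores: dict[str, float]) -> str:
    """Most probable label for one message."""
    return max(scores, key=scores.get)
```

For example, `tokenizer, model = load_classifier("./camembert-v2-spam-fr")` followed by `score_messages(["Votre colis est bloqué, cliquez ici"], tokenizer, model)` returns one dictionary per message with a probability for each label; the exact label names depend on the mapping shipped with the model. Loading once and reusing the pair avoids reloading weights on every request.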

You define your own thresholds, store predictions for analytics or audits, and schedule drift checks as volumes change.
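Tracking label shares per period and comparing consecutive periods is one simple way to schedule drift checks. The sketch below uses total variation distance (half the sum of absolute share differences); this metric and the 0.10 alert level are our assumptions, not part of the product:

```python
from collections import Counter


def label_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of messages per predicted label for one period (e.g. one week)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def share_drift(previous: dict[str, float], current: dict[str, float]) -> float:
    """Total variation distance: 0 means identical label mixes, 1 means disjoint."""
    keys = set(previous) | set(current)
    return 0.5 * sum(abs(previous.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


def needs_review(previous: dict[str, float], current: dict[str, float], limit: float = 0.10) -> bool:
    """Flag a period whose label mix moved more than `limit` (hypothetical default)."""
    return share_drift(previous, current) > limit
```

A jump in drift can mean a new spam campaign or a shift in legitimate traffic; either way it is a cue to review stored predictions before adjusting thresholds.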

Who it’s for

Security & IT — Stop phishing and scam emails before they hit user inboxes.

Marketing & Partnerships — Automatically quarantine promotion messages or SEO-boost submissions that flood your forms.

Community & Content Teams — Filter clickbait and promo posts while letting genuine comments auto-publish.

Data & Engineering — Deploy locally, monitor macro-F1, and retain full control over thresholds, logs, and updates — no cloud dependency, full GDPR compliance.

What you receive

Choose the Dataset to train or adapt your own classifier, or the Model to deploy instantly.

  • Dataset (JSONL) — Line-delimited French corpus labeled with ham, promotion, phishing, scam, seo_boost, and clickbait; includes documentation, label dictionary, and evaluation split.
  • Model (Transformers-ready) — Fine-tuned CamemBERT-v2 weights, tokenizer, label mapping, and quickstart scripts for inference or batch scoring.
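A minimal loader sketch for the dataset, assuming each line is a JSON object with `text` and `type` fields and that `type` uses the six label strings listed above. The function names and the file name `spam_fr.jsonl` are hypothetical:

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Label strings as listed for the dataset.
LABELS = {"ham", "promotion", "phishing", "scam", "seo_boost", "clickbait"}


def read_records(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (text, type) pairs from JSONL lines, skipping blank lines."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        label = record["type"]
        if label not in LABELS:
            raise ValueError(f"line {number}: unexpected label {label!r}")
        yield record["text"], label


def label_counts(lines: Iterable[str]) -> Counter:
    """Count records per label, e.g. to check class balance before training."""
    return Counter(label for _, label in read_records(lines))
```

Usage: open the file with `open("spam_fr.jsonl", encoding="utf-8")` and pass the file object to `label_counts`; since a file iterates line by line, the corpus never has to be loaded into memory at once.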

Integration scenarios

Affiliate link moderation

Monitor partner feeds or user-submitted URLs. SEO-boost posts are flagged and held for review; promotion content gets a soft block. Ham passes automatically, which helps keep your ads relevant.

Email intake triage

Before routing to a shared inbox, classify inbound messages. Phishing and scam emails are quarantined, while promotion goes to marketing review. Ham threads reach support instantly, reducing response delay.

Ad-content screening

Score each submitted ad or banner copy. Clickbait titles are down-ranked and promotion spam is paused pending moderation. Ham campaigns go live without delay, protecting user trust and platform quality.

Formats & pricing

Two paths to clean French traffic.
Train with the dataset or deploy the ready-to-use model: same six-label taxonomy, different level of control.

AI Model

€30

Deploy instantly with a ready-to-use multi-class spam classifier.

Includes

  • Hugging Face compatible
  • Fine-tuned multi-class spam classifier
  • Documentation (example scripts)

Delivery

  • Secure download.
  • Perpetual license for one project.

Dataset — JSONL

€80

Ideal if you want to train or adapt your own classifier.

Includes

  • Line‑delimited JSONL (text, type)
  • Total records: 28,200
  • Documentation

Delivery

  • Secure download.
  • Commercial or research license.

Frequently asked questions

What spam families are included?

Exactly five: promotion, phishing, scam, seo_boost, clickbait, plus ham.

A concise label dictionary is provided.

We report macro-F1 on a stratified hold-out split. You can reproduce this with the provided split recipe or maintain your own weekly check.
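If you maintain your own check instead of the provided recipe, a stratified hold-out keeps each label's proportion roughly the same in both parts. A pure-Python sketch, not the shipped split recipe; the 20% hold-out fraction and the seed are arbitrary choices:

```python
import random
from collections import defaultdict


def stratified_split(
    records: list[tuple[str, str]], holdout: float = 0.2, seed: int = 13
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split (text, label) pairs so each label keeps roughly the same share in both parts."""
    by_label: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * holdout)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Fixing the seed means the same records land in the hold-out every week, so macro-F1 changes reflect the model or thresholds rather than a reshuffled test set.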

The model runs locally on CPU or GPU and can be exported to ONNX, so no data needs to leave your infrastructure.

Training includes ham balancing to reduce over-blocking. We still recommend logging predictions and reviewing edge cases during onboarding.

Absolutely. Start from the model or train from the dataset. Both routes are documented and quick to try.

No. It’s channel-agnostic: forms, comments, in-app feedback, CRM imports, partner feeds, and more.

Because it yields reasons, not just a yes/no. Type-aware automation is simpler to govern and more useful for analytics.

READY TO DEPLOY

Instant clarity on French spam.

Deploy the model to classify messages immediately, or use the dataset to adapt the classifier to your domain.

Start small, then refine as you grow.
