No spam.
Just signal.

Classify any French message as Spam or Ham
instantly and at scale.

GET THE MODEL GET THE DATASET

French message spam detection — Ham / Spam

French spam,
no frills

French forms, comments and shared inboxes get flooded by bots and low-effort pitches. That wastes your team's time, skews analytics and buries real leads.
English-trained filters miss FR-specific patterns (“gagnez X € par jour”, promo-code blasts, SEO link stuffing) and sometimes over-block legitimate messages.

This product does one job well: decide spam vs ham in French.
Pick the JSONL dataset to train in-house, or deploy the ready-to-use model on your servers.

No new SaaS, no data leaving your stack—just a threshold you control.

What changes when you deploy the AI model
(or train from the dataset)

Add the model to your forms, comments, or shared inbox, or train your own from the dataset. The filter blocks spam before it reaches users, lets legitimate messages in French through, and provides you with a simple threshold that you can adjust over time.

1

Faster triage

Spam gets stopped before it reaches your CRM or shared inbox. Your team only sees ham (legitimate) messages, sorted faster, with fewer duplicates and noise.

2

Measurable progress

You can track the share of spam week over week, run safe A/B tests on thresholds, and report a macro-F1 that everyone can understand (balanced performance across spam/ham).

3

Lower-cost automation

Routing rules become simple:
spam → quarantine or silent drop;
ham → CRM or team collaboration tool.
Less manual moderation means fewer false leads to clean up later.

Why teams choose it

01

French-native: trained and evaluated on synthetic French texts from the outset, rather than adapted after the fact.

02

Simple integration: a single probability and a single label (spam or ham). Trigger thresholds that you control.

03

Extensible: start now with the binary model; you can then add our spam type model (phishing, promotion, scam, etc.) if you need greater granularity.

Built for French.
Built to block.

French-native from the start: the classifier is trained and validated on French messages, then checked on a stratified hold-out with macro-F1 so each class matters equally. It runs on-device (CPU/GPU) for privacy and low latency, and can be exported to ONNX (an open model format) if you need it. Start with a safe threshold, review a small quarantine at first, then tighten once you see clean traffic.

How it works

Enter plain text in French from forms, comments, or a shared inbox. The model returns spam/ham with a calibrated probability. You choose the threshold per channel (e.g., stricter for public comments, more flexible for partner forms), record predictions for audits, and schedule simple drift checks as volumes or sources change. No new SaaS, just a decision step in your pipeline.
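The per-channel threshold step can be sketched as follows. This is a minimal illustration, not the shipped quickstart: the channel names, the threshold values and the `decide` helper are all assumptions, and it presumes the model has already returned a spam probability between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical per-channel thresholds: a message is labelled spam when its
# spam probability is at or above the channel's threshold.
# Assumption: a lower threshold is "stricter" (more messages are held).
THRESHOLDS = {
    "public_comments": 0.50,
    "contact_form": 0.70,
    "partner_form": 0.90,
}
DEFAULT_THRESHOLD = 0.80  # used for channels not listed above


@dataclass(frozen=True)
class Decision:
    channel: str
    spam_probability: float
    threshold: float
    label: str  # "spam" or "ham"


def decide(spam_probability: float, channel: str) -> Decision:
    """Turn a probability into a label using the channel's threshold."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    threshold = THRESHOLDS.get(channel, DEFAULT_THRESHOLD)
    label = "spam" if spam_probability >= threshold else "ham"
    return Decision(channel, spam_probability, threshold, label)
```

Returning the threshold alongside the label means each logged decision records which setting produced it, which helps when you audit predictions after a threshold change.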

Who it’s for

Support & Sales retain genuine leads, while bots are filtered out from the outset.
Community/Content auto-approve ham before publishing and hold spam for review.
Data/IT deploy privately, monitor macro-F1 over time, and keep full control of logs, thresholds and upgrades—without sending data to third-party APIs.

What you receive

Choose the Dataset to train and adapt (line-delimited JSONL with labels, schema notes, suggested 85/15 split, checksums, and a short evaluation template). Or ship now with the Model (fine-tuned weights, tokenizer, exact label mapping, and quickstart scripts for batch or real-time). Both options are designed to coexist—own the data, run the model.
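If you start from the dataset, loading it and producing the suggested 85/15 split might look like the sketch below. It assumes each line is a JSON object with `text` and `label` keys. The function names, the seed and the file path you pass in are illustrative, not part of the delivered package.

```python
import json
import random
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def stratified_split(records, eval_fraction=0.15, seed=42):
    """Split records into (train, evaluation), keeping the label ratio."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, evaluation = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_eval = round(len(group) * eval_fraction)
        evaluation.extend(group[:n_eval])
        train.extend(group[n_eval:])
    rng.shuffle(train)
    rng.shuffle(evaluation)
    return train, evaluation
```

Splitting per label keeps the spam/ham ratio the same in both parts, so a macro-F1 computed on the evaluation part is comparable from one run to the next.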

Integration scenarios

Contact-form firewall

Classify each submission server-side. Spam is quietly quarantined; ham becomes a lead and notifies Sales. Start with a safe threshold and a daily digest, then tighten.
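A server-side hook for this scenario could be sketched as below. Everything here is hypothetical: `spam_probability_of` stands in for whatever function wraps your deployed model, and the in-memory `leads` and `quarantine` lists stand in for your CRM and quarantine store.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory sinks; in practice these would be a CRM client
# and a quarantine table or mailbox.
leads: List[Dict] = []
quarantine: List[Dict] = []


def handle_submission(
    text: str,
    spam_probability_of: Callable[[str], float],
    threshold: float = 0.8,
) -> str:
    """Classify one form submission server-side and route it."""
    probability = spam_probability_of(text)
    entry = {"text": text, "spam_probability": probability}
    if probability >= threshold:
        quarantine.append(entry)  # held quietly, never hard-deleted
        return "quarantined"
    leads.append(entry)  # a real integration would also notify Sales here
    return "lead"


def daily_digest(max_items: int = 20) -> str:
    """Summarise quarantined messages so a person can spot false positives."""
    lines = [f"{len(quarantine)} message(s) in quarantine"]
    for entry in quarantine[:max_items]:
        preview = entry["text"][:60]
        lines.append(f"- {entry['spam_probability']:.2f}  {preview}")
    return "\n".join(lines)
```

Passing the scorer in as a callable keeps the routing logic testable without loading model weights.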

Pre-publish comment check

Before publishing, classify comments/reviews. Ham auto-posts to keep the flow; spam is held for batch moderation. Use stricter thresholds on public pages.

Spam trend analysis

Run periodic scans on stored messages to measure spam share over time. Detect sudden spikes, new patterns, or channels most affected. Useful for reporting and anti-bot strategy updates.
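One way to measure spam share over time is sketched below. It assumes you have stored each prediction as a `(date, label)` pair. The function names and the 15-point jump used to flag a spike are illustrative choices, not recommendations from the product documentation.

```python
from collections import Counter
from datetime import date


def weekly_spam_share(predictions):
    """Return {(iso_year, iso_week): spam share} from (date, label) pairs."""
    totals, spam = Counter(), Counter()
    for day, label in predictions:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if label == "spam":
            spam[week] += 1
    return {week: spam[week] / totals[week] for week in sorted(totals)}


def find_spikes(shares, jump=0.15):
    """Flag weeks whose spam share rose by more than `jump` over the previous week."""
    weeks = list(shares)
    return [
        current
        for previous, current in zip(weeks, weeks[1:])
        if shares[current] - shares[previous] > jump
    ]
```

Running the same computation per channel would show which forms or inboxes are most affected.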

Formats & pricing

Use it your way — train with the dataset
or deploy the ready model.

AI Model

€30

Best when you need results now, on‑prem or private cloud.

Includes

  • Hugging Face compatible
  • Fine-tuned binary classifier (spam / ham)
  • Documentation (example scripts)

Delivery

  • Secure download
  • Perpetual license for a project
GET THE MODEL

Dataset — JSONL

€150

Best when you want to train, adapt, and audit.

Includes

  • Line‑delimited JSONL (text, label)
  • Total records: 56,400
  • Documentation

Delivery

  • Secure download
  • Commercial or research license
GET THE DATASET

Frequently asked questions

Will it block legitimate French messages by mistake?

Any spam filter can produce false positives. We bias the model toward “ham unless clearly spam,” and we recommend you start with a conservative threshold and a quarantine rather than hard delete. Review a weekly sample at first; most teams converge on a safe threshold in days.

Can I run it offline, on my own infrastructure?

Yes. Both the dataset and the model are designed for offline use. You can deploy on laptop, server, or container, on CPU or GPU. No external API is required.

How do I check performance on my own data?

Use the macro-F1 we provide as a baseline, then sample 200–300 of your own messages (balanced by channel) and compute the same metric. If your domain is very specific (e.g., classifieds, job boards), fine-tune on a small in-house subset for an extra boost.
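To compute the same metric on your own labelled sample, macro-F1 is the unweighted mean of the per-class F1 scores. The sketch below uses plain Python and assumes the label strings are `"ham"` and `"spam"`; check the delivered label mapping before relying on that.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, labels=("ham", "spam")):
    """Unweighted mean of per-class F1, so each class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = [f1_for_class(y_true, y_pred, label) for label in labels]
    return sum(scores) / len(scores)
```

Because each class contributes equally, a filter that simply predicts ham everywhere scores poorly even when spam is rare in your sample.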

Does it identify the type of spam?

This product is binary (spam/ham). If you need categories, add our Spam Type model later (phishing, promo, scam, SEO boost, clickbait). Many teams start binary and move to multi-class once routing rules mature.

Where should I integrate it?

Server-side is simplest. For forms, call the model after validation and before creating the lead. For comments, check before publishing. For the inbox, run a scheduled task. We provide minimal code examples and suggestions for using this model in our resources.

What about privacy and data protection?

Messages never leave your infrastructure. You run the model where you already process user data. Keep logs locally for auditing purposes. If you use the dataset, remove personally identifiable information from your own training logs as you normally would.

Can I combine it with CAPTCHAs, honeypots, or rate limits?

Yes, and you should. Rate limits, honeypots, and CAPTCHAs eliminate the cheapest bot traffic. The model then handles more sophisticated and language-specific spam. This combination reduces both false positives and false negatives.
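A layered check might be ordered as in the sketch below: cheap tests first, the model last. The honeypot field name `website_confirm`, the `RateLimiter` class, the `screen` function and the `spam_probability_of` callable are all hypothetical. Adapt them to your own form and model wrapper.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter: at most `max_requests` per window per client."""

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()  # forget requests that left the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def screen(form, client_id, limiter, spam_probability_of, threshold=0.8, now=None):
    """Run cheap checks first; call the model only if they pass."""
    if form.get("website_confirm"):  # hypothetical hidden honeypot field
        return "rejected:honeypot"
    if not limiter.allow(client_id, now):
        return "rejected:rate_limit"
    if spam_probability_of(form.get("message", "")) >= threshold:
        return "quarantined"
    return "accepted"
```

Ordering the checks this way means the model only scores traffic that already passed the inexpensive filters.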

READY TO DEPLOY

Instant clarity for French inboxes.

Deploy the model to filter messages immediately, or use the dataset to tailor spam detection to your domain.

Start small, then refine as you grow.

GET THE MODEL GET THE DATASET