How to Deploy a Spam Detection Model Locally (Spam/Ham) with Python
This step-by-step guide shows how to deploy a local spam detection model (spam/ham) in Python with Hugging Face Transformers on CPU—no server, no cloud.
You will load the model from a local model/ folder and run predictions on 10 French sample texts to verify that inference works end-to-end.
What you need
Before you start, make sure you have the following in place:
- Python 3.11+
- A virtual environment (recommended)
- The model files in a local folder (e.g. ./model/)
Your model/ folder must contain the model files (real example, inference side):
- config.json
- model.safetensors
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.txt
Recommended structure
Keep the model files in a dedicated model/ folder at the project root, next to test_spam_ham.py. This makes the default path work out of the box and keeps the tutorial copy/paste friendly.
- model/ contains the packaged Hugging Face artifacts.
- requirements.txt pins the Python dependencies for a reproducible install.
- test_spam_ham.py is the single entrypoint you run from the project root.
If you later change the layout, you can pass the correct path with --model-dir.
kinoux-spam-ham-local/
  model/
    ... model files ...
  requirements.txt
  test_spam_ham.py
Installation
From the project root, create a requirements.txt file:
transformers>=4.40,<5.0
torch
safetensors
Then install:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Note: safetensors is required when the model is shipped as .safetensors.
Immediate test script (10 reviews/messages)
Create test_spam_ham.py at the project root.
This script:
- uses CPU only (no GPU required)
- loads the tokenizer + model from ./model/
- runs 10 French texts
- prints per-label scores (in %)
test_spam_ham.py
#!/usr/bin/env python3
"""Local sanity check for a Kinoux spam/ham sequence classification model.

- Inputs are expected to be French.
- Code and output are intentionally in English.

This script runs 10 sample reviews/messages and prints per-label probabilities.
"""
from __future__ import annotations

import argparse
import json
import time
from pathlib import Path
from typing import Iterable

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

DEFAULT_MODEL_DIR = Path(__file__).resolve().parent / "model"

SAMPLE_FRENCH_TEXTS: list[str] = [
    "Livraison rapide et produit conforme, rien à signaler.",
    "Très déçu : emballage abîmé, mais le service client a répondu vite.",
    "Qualité correcte pour le prix, je recommanderai probablement.",
    "Cliquez sur mon profil pour gagner de l'argent facilement, offre limitée.",
    "Le produit fonctionne bien, mais la notice mériterait d'être plus claire.",
    "Super boutique ! J'ai reçu un code promo après l'achat, merci.",
    "Investissement garanti, 400% en 7 jours : écrivez-moi en privé.",
    "Très bon rapport qualité/prix, je vais en prendre un second.",
    "Lien secret pour récupérer des cadeaux gratuits, dépêchez-vous.",
    "Service impeccable, réponse en moins d'une heure et échange facile.",
]


def load_model(model_dir: Path, device: torch.device):
    """Load tokenizer + model from a local directory."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.to(device)
    model.eval()
    # Map logits indices to readable labels
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
    return tokenizer, model, labels


def predict_single(
    text: str,
    tokenizer,
    model,
    labels: list[str],
    device: torch.device,
    max_length: int,
) -> dict[str, float]:
    """Return per-label probabilities in percent (0-100)."""
    batch = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=max_length,
    )
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.inference_mode():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)[0].detach().to("cpu").tolist()
    return {label: round(score * 100.0, 2) for label, score in zip(labels, probs)}


def iter_texts(
    texts: Iterable[str],
    tokenizer,
    model,
    labels: list[str],
    device: torch.device,
    max_length: int,
) -> None:
    for idx, text in enumerate(texts, start=1):
        scores = predict_single(text, tokenizer, model, labels, device, max_length)
        best_label = max(scores, key=scores.get)
        best_score = scores[best_label]
        print(f"Sample #{idx}")
        print(f"Text      : {text}")
        print(f"Top label : {best_label} ({best_score}%)")
        print(f"Scores    : {json.dumps(scores, ensure_ascii=False)}")
        print()


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run a local sanity check on the packaged Kinoux spam/ham model"
    )
    parser.add_argument(
        "--model-dir",
        default=str(DEFAULT_MODEL_DIR),
        help="Directory containing the fine-tuned model files",
    )
    parser.add_argument(
        "--max-length",
        type=int,
        default=256,
        help="Tokenizer max_length parameter",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=10,
        help="How many sample texts to run",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    model_dir = Path(args.model_dir).resolve()
    if not model_dir.exists():
        raise SystemExit(f"Model directory not found: {model_dir}")

    device = torch.device("cpu")
    print(f"Device    : {device}")
    print(f"Model dir : {model_dir}")
    print()

    tokenizer, model, labels = load_model(model_dir, device)
    texts = SAMPLE_FRENCH_TEXTS[: max(0, args.limit)]
    if len(texts) == 0:
        raise SystemExit("No texts to run (limit is 0)")

    start = time.time()
    iter_texts(texts, tokenizer, model, labels, device, args.max_length)
    elapsed = time.time() - start
    print(f"Total latency: {elapsed:.3f}s for {len(texts)} samples")


if __name__ == "__main__":
    main()
Run the test
From the project root:
python test_spam_ham.py
Useful options:
python test_spam_ham.py --max-length 128
python test_spam_ham.py --limit 10
python test_spam_ham.py --model-dir ./model
What the script does
Device selection
The script runs on CPU on purpose. This keeps the tutorial predictable: if you can run Python, you can run this model, with no GPU drivers, CUDA toolkits, or platform-specific setup.
In practice, this line forces CPU:
device = torch.device("cpu")
Then, when we create tensors (the tokenized inputs), we move them to the same device with:
v.to(device)
Why it matters: PyTorch will error if the model is on CPU but the inputs are on another device (or vice versa). The “model + inputs on the same device” rule is a common beginner pitfall.
Loading the tokenizer and the model
A text model cannot read raw strings directly. It needs two components:
- Tokenizer: turns text into numbers (token IDs) the model understands.
- Model: takes those numbers and outputs a prediction.
That is why the script loads both from the same model/ directory:
- AutoTokenizer.from_pretrained(model_dir) loads tokenization rules and vocabulary.
- AutoModelForSequenceClassification.from_pretrained(model_dir) loads the trained weights.
Then we call:
model.eval()
This switches the model to inference mode (it disables training-only behaviors like dropout). For beginners: always call eval() when you are doing predictions.
Finally, we build the list of human-readable labels with:
model.config.id2label
This mapping tells us which output index corresponds to which label (for example 0 -> ham, 1 -> spam, depending on how the model was trained). The script uses it to print results with label names instead of numeric indices.
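To make this concrete, here is what that mapping looks like and how the script turns a winning index into a name. The {0: "ham", 1: "spam"} mapping is an assumed example; your model.config.id2label may be ordered differently:

```python
# Hypothetical id2label mapping; check model.config.id2label for the real one.
id2label = {0: "ham", 1: "spam"}

# Build the ordered label list the same way load_model() does.
labels = [id2label[i] for i in range(len(id2label))]

# Convert an argmax index (the position of the highest logit) into a name.
predicted_index = 1
print(labels[predicted_index])  # -> "spam" with this example mapping
```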
Inference and probabilities
When you call the model, the output is not a direct “percentage” yet. The model returns logits:
- logits are raw scores, one score per label
- the highest logit corresponds to the predicted label (softmax preserves the ordering)
To make this easier to interpret, we convert logits into probabilities with softmax:
- probabilities are normalized scores between 0 and 1
- they sum to 1 across labels (or 100% once converted to percentages)
That is what this part does:
logits = model(**batch).logits
probs = torch.softmax(logits, dim=-1)
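If you want to see the math without PyTorch, softmax is easy to reproduce with the standard library. The logits below are made up for illustration, not values from the model:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability;
    # this shifts every term equally and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a two-label model.
probs = softmax([-1.2, 3.4])
print(probs)  # the higher logit always gets the higher probability
```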
Then the script prints:
- Top label: the label with the highest probability
- Scores: a full dict like { "spam": 97.12, "ham": 2.88 }
Beginner note: a “high percentage” is a confidence signal, not a guarantee. The best way to trust it is to validate on your own real messages and, later, choose a decision threshold (for example “treat as spam only if spam ≥ 95%”).
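Such a threshold rule fits in a few lines. The 95% cutoff and the "spam" label name below are assumptions to adapt to your own model and your tolerance for false positives:

```python
def is_spam(scores: dict[str, float], threshold: float = 95.0) -> bool:
    """Treat a message as spam only when its spam score clears the threshold.

    `scores` has the same shape the script prints,
    e.g. {"spam": 97.12, "ham": 2.88} (percentages).
    """
    return scores.get("spam", 0.0) >= threshold

print(is_spam({"spam": 97.12, "ham": 2.88}))   # True: 97.12 >= 95
print(is_spam({"spam": 80.50, "ham": 19.50}))  # False: likely spam, but below the bar
```

Raising the threshold trades missed spam for fewer false positives; the right value comes from testing on your own labeled messages.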
Production notes
- The model expects French. If you feed it English or mixed-language text, accuracy may degrade, so validate on your real data.
- Decision threshold: probabilities are useful, but in production you usually trigger actions from a threshold (e.g. spam >= 95%).
- Long texts: if inputs exceed max_length, they will be truncated. The right max_length depends on your model and your use case.
Next steps
Once this local test is validated, the next logical steps are:
- expose a POST /predict endpoint with FastAPI
- dockerize
- add logs + timeouts + batching
With these steps, you move from a local check to a production-ready building block that can be integrated into any back-office or moderation pipeline.
