How to Deploy a Spam Detection Model Locally (Spam/Ham) with Python
This step-by-step guide shows how to deploy a local spam detection model (spam/ham) in Python with Hugging Face Transformers on CPU—no server, no cloud.
You will load the model from a local model/ folder and run predictions on 10 French sample texts to verify that inference works end-to-end.
What you need
Before you start, make sure you have the following in place:
- Python 3.11+
- A virtual environment (recommended)
- The model files in a local folder (e.g. ./model/)
Your model/ folder must contain the model files (real example, inference side):
- config.json
- model.safetensors
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.txt
Recommended structure
Keep the model files in a dedicated model/ folder at the project root, next to test_spam_ham.py. This makes the default path work out of the box and keeps the tutorial copy/paste friendly.
- model/ contains the packaged Hugging Face artifacts.
- requirements.txt pins the Python dependencies for a reproducible install.
- test_spam_ham.py is the single entrypoint you run from the project root.
If you later change the layout, you can pass the correct path with --model-dir.
kinoux-spam-ham-local/
  model/
    ... model files ...
  requirements.txt
  test_spam_ham.py
Installation
From the project root, create a requirements.txt file:
transformers>=4.40,<5.0
torch
safetensors
Then install:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Note: safetensors is required when the model is shipped as .safetensors.
Immediate test script (10 reviews/messages)
Create test_spam_ham.py at the project root.
This script:
- uses CPU only (no GPU required)
- loads the tokenizer + model from ./model/
- runs 10 French texts
- prints per-label scores (in %)
test_spam_ham.py
#!/usr/bin/env python3
"""Local sanity check for a Kinoux spam/ham sequence classification model.

- Inputs are expected to be French.
- Code and output are intentionally in English.

This script runs 10 sample reviews/messages and prints per-label probabilities.
"""
from __future__ import annotations

import argparse
import json
import time
from pathlib import Path
from typing import Iterable

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

DEFAULT_MODEL_DIR = Path(__file__).resolve().parent / "model"

SAMPLE_FRENCH_TEXTS: list[str] = [
    "Livraison rapide et produit conforme, rien à signaler.",
    "Très déçu : emballage abîmé, mais le service client a répondu vite.",
    "Qualité correcte pour le prix, je recommanderai probablement.",
    "Cliquez sur mon profil pour gagner de l'argent facilement, offre limitée.",
    "Le produit fonctionne bien, mais la notice mériterait d'être plus claire.",
    "Super boutique ! J'ai reçu un code promo après l'achat, merci.",
    "Investissement garanti, 400% en 7 jours : écrivez-moi en privé.",
    "Très bon rapport qualité/prix, je vais en prendre un second.",
    "Lien secret pour récupérer des cadeaux gratuits, dépêchez-vous.",
    "Service impeccable, réponse en moins d'une heure et échange facile.",
]


def load_model(model_dir: Path, device: torch.device):
    """Load tokenizer + model from a local directory."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.to(device)
    model.eval()
    # Map logits indices to readable labels
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
    return tokenizer, model, labels


def predict_single(
    text: str,
    tokenizer,
    model,
    labels: list[str],
    device: torch.device,
    max_length: int,
) -> dict[str, float]:
    """Return per-label probabilities in percent (0-100)."""
    batch = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=max_length,
    )
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.inference_mode():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)[0].detach().to("cpu").tolist()
    return {label: round(score * 100.0, 2) for label, score in zip(labels, probs)}


def iter_texts(
    texts: Iterable[str],
    tokenizer,
    model,
    labels: list[str],
    device: torch.device,
    max_length: int,
) -> None:
    for idx, text in enumerate(texts, start=1):
        scores = predict_single(text, tokenizer, model, labels, device, max_length)
        best_label = max(scores, key=scores.get)
        best_score = scores[best_label]
        print(f"Sample #{idx}")
        print(f"Text      : {text}")
        print(f"Top label : {best_label} ({best_score}%)")
        print(f"Scores    : {json.dumps(scores, ensure_ascii=False)}")
        print()


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run a local sanity check on the packaged Kinoux spam/ham model"
    )
    parser.add_argument(
        "--model-dir",
        default=str(DEFAULT_MODEL_DIR),
        help="Directory containing the fine-tuned model files",
    )
    parser.add_argument(
        "--max-length",
        type=int,
        default=256,
        help="Tokenizer max_length parameter",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=10,
        help="How many sample texts to run",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    model_dir = Path(args.model_dir).resolve()
    if not model_dir.exists():
        raise SystemExit(f"Model directory not found: {model_dir}")

    device = torch.device("cpu")
    print(f"Device    : {device}")
    print(f"Model dir : {model_dir}")
    print()

    tokenizer, model, labels = load_model(model_dir, device)
    texts = SAMPLE_FRENCH_TEXTS[: max(0, args.limit)]
    if len(texts) == 0:
        raise SystemExit("No texts to run (limit is 0)")

    start = time.time()
    iter_texts(texts, tokenizer, model, labels, device, args.max_length)
    elapsed = time.time() - start
    print(f"Total latency: {elapsed:.3f}s for {len(texts)} samples")


if __name__ == "__main__":
    main()
Run the test
From the project root:
python test_spam_ham.py
Useful options:
python test_spam_ham.py --max-length 128
python test_spam_ham.py --limit 10
python test_spam_ham.py --model-dir ./model
What the script does
Device selection
The script runs on CPU on purpose. This keeps the tutorial predictable: if you can run Python, you can run this model, with no GPU drivers, CUDA toolkits, or platform-specific setup.
In practice, this line forces CPU:
device = torch.device("cpu")
Then, when we create tensors (the tokenized inputs), we move them to the same device with:
v.to(device)
Why it matters: PyTorch will error if the model is on CPU but the inputs are on another device (or vice versa). The “model + inputs on the same device” rule is a common beginner pitfall.
Loading the tokenizer and the model
A text model cannot read raw strings directly. It needs two components:
- Tokenizer: turns text into numbers (token IDs) the model understands.
- Model: takes those numbers and outputs a prediction.
That is why the script loads both from the same model/ directory:
- AutoTokenizer.from_pretrained(model_dir) loads tokenization rules and vocabulary.
- AutoModelForSequenceClassification.from_pretrained(model_dir) loads the trained weights.
Then we call:
model.eval()
This switches the model to inference mode (it disables training-only behaviors like dropout). For beginners: always call eval() when you are doing predictions.
Finally, we build the list of human-readable labels with:
model.config.id2label
This mapping tells us which output index corresponds to which label (for example 0 -> ham, 1 -> spam, depending on how the model was trained). The script uses it to print results with label names instead of numeric indices.
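To make this concrete, here is what that mapping looks like and how the script turns a winning index into a name. The {0: "ham", 1: "spam"} mapping is an assumed example; your model.config.id2label may be ordered differently:

```python
# Hypothetical id2label mapping; check model.config.id2label for the real one.
id2label = {0: "ham", 1: "spam"}

# Build the ordered label list the same way load_model() does.
labels = [id2label[i] for i in range(len(id2label))]

# Convert an argmax index (the position of the highest logit) into a name.
predicted_index = 1
print(labels[predicted_index])  # -> "spam" with this example mapping
```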
Inference and probabilities
When you call the model, the output is not a direct “percentage” yet. The model returns logits:
- logits are raw scores, one score per label
- the highest logit corresponds to the predicted label (softmax preserves the ordering)
To make this easier to interpret, we convert logits into probabilities with softmax:
- probabilities are normalized scores between 0 and 1
- they sum to 1 across labels (or 100% once converted to percentages)
That is what this part does:
logits = model(**batch).logits
probs = torch.softmax(logits, dim=-1)
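If you want to see the math without PyTorch, softmax is easy to reproduce with the standard library. The logits below are made up for illustration, not values from the model:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability;
    # this shifts every term equally and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a two-label model.
probs = softmax([-1.2, 3.4])
print(probs)  # the higher logit always gets the higher probability
```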
Then the script prints:
- Top label: the label with the highest probability
- Scores: a full dict like { "spam": 97.12, "ham": 2.88 }
Beginner note: a “high percentage” is a confidence signal, not a guarantee. The best way to trust it is to validate on your own real messages and, later, choose a decision threshold (for example “treat as spam only if spam ≥ 95%”).
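Such a threshold rule fits in a few lines. The 95% cutoff and the "spam" label name below are assumptions to adapt to your own model and your tolerance for false positives:

```python
def is_spam(scores: dict[str, float], threshold: float = 95.0) -> bool:
    """Treat a message as spam only when its spam score clears the threshold.

    `scores` has the same shape the script prints,
    e.g. {"spam": 97.12, "ham": 2.88} (percentages).
    """
    return scores.get("spam", 0.0) >= threshold

print(is_spam({"spam": 97.12, "ham": 2.88}))   # True: 97.12 >= 95
print(is_spam({"spam": 80.50, "ham": 19.50}))  # False: likely spam, but below the bar
```

Raising the threshold trades missed spam for fewer false positives; the right value comes from testing on your own labeled messages.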
Production notes
- The model expects French. If you feed it English or mixed-language text, accuracy may degrade, so validate on your real data.
- Decision threshold: probabilities are useful, but in production you usually trigger actions from a threshold (e.g. spam >= 95%).
- Long texts: if inputs exceed max_length, they will be truncated. The right max_length depends on your model and your use case.
Next steps
Once this local test is validated, the next logical steps are:
- expose a POST /predict endpoint with FastAPI
- dockerize
- add logs + timeouts + batching
With these steps, you move from a local check to a production-ready building block that can be integrated into any back-office or moderation pipeline.
