This step-by-step guide shows how to deploy a local spam detection model (spam/ham) in Python with Hugging Face Transformers on CPU—no server, no cloud.
You will load the model from a local model/ folder and run predictions on 10 French sample texts to verify that inference works end-to-end.
What you need
Before you start, make sure you have the following in place:
- Python 3.11+
- A virtual environment (recommended)
- The model files in a local folder (e.g. ./model/)
Your model/ folder must contain the following files (a real-world example of what inference needs):
- config.json
- model.safetensors
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.txt
Recommended structure
Keep the model files in a dedicated model/ folder at the project root, next to test_spam_ham.py. This makes the default path work out of the box and keeps the tutorial copy/paste friendly.
- model/ contains the packaged Hugging Face artifacts.
- requirements.txt pins the Python dependencies for a reproducible install.
- test_spam_ham.py is the single entrypoint you run from the project root.
If you later change the layout, you can pass the correct path with --model-dir.
kinoux-spam-ham-local/
  model/
    ... model files ...
  requirements.txt
  test_spam_ham.py
Installation
From the project root, create a requirements.txt file:
transformers>=4.40,<5.0
torch
safetensors
Then install:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Note: safetensors is required when the model is shipped as .safetensors.
Immediate test script (10 reviews/messages)
Create test_spam_ham.py at the project root.
This script (a minimal sketch follows the list below):
- uses CPU only (no GPU required)
- loads the tokenizer + model from ./model/
- runs 10 French texts
- prints per-label scores (in %)
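For reference, here is a minimal sketch of what test_spam_ham.py could look like; it is not the article's exact script. The two French samples are placeholders for your own 10 messages, and the --model-dir, --max-length and --limit defaults are assumptions you can adapt.

import argparse

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder French samples -- replace them with your own 10 messages
SAMPLE_TEXTS = [
    "Félicitations, vous avez gagné un iPhone ! Cliquez ici pour le récupérer.",
    "Bonjour, pouvez-vous m'envoyer la facture de la commande de mars ?",
]

def main() -> None:
    parser = argparse.ArgumentParser(description="Local spam/ham smoke test")
    parser.add_argument("--model-dir", default="./model")
    parser.add_argument("--max-length", type=int, default=256)
    parser.add_argument("--limit", type=int, default=10)
    args = parser.parse_args()

    # CPU on purpose: no GPU drivers or CUDA setup required
    device = torch.device("cpu")

    tokenizer = AutoTokenizer.from_pretrained(args.model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(args.model_dir)
    model.to(device)
    model.eval()  # inference mode: disables dropout and other training-only behavior

    id2label = model.config.id2label  # e.g. {0: "ham", 1: "spam"}

    texts = SAMPLE_TEXTS[: args.limit]
    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=args.max_length,
        return_tensors="pt",
    )
    batch = {k: v.to(device) for k, v in batch.items()}  # inputs on the same device as the model

    with torch.no_grad():  # no gradients needed for inference
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)

    for text, row in zip(texts, probs):
        scores = {id2label[i]: round(float(p) * 100, 2) for i, p in enumerate(row)}
        top = max(scores, key=scores.get)
        print(f"Text: {text}")
        print(f"  Top label: {top}")
        print(f"  Scores: {scores}")

if __name__ == "__main__":
    main()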
Run the test
From the project root:
python test_spam_ham.py
Useful options:
python test_spam_ham.py --max-length 128
python test_spam_ham.py --limit 10
python test_spam_ham.py --model-dir ./model
What the script does
Device selection
The script runs on CPU on purpose. This keeps the tutorial predictable: if you can run Python, you can run this model, with no GPU drivers, CUDA toolkits, or platform-specific setup.
In practice, this line forces CPU:
device = torch.device("cpu")
Then, when we create tensors (the tokenized inputs), we move them to the same device with:
v.to(device)
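In the script, that typically happens inside a dict comprehension over the tokenizer output (an assumed pattern, consistent with the sketch above):

batch = {k: v.to(device) for k, v in batch.items()}  # every input tensor moves to the same device as the model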
Why it matters: PyTorch will error if the model is on CPU but the inputs are on another device (or vice versa). The “model + inputs on the same device” rule is a common beginner pitfall.
Loading the tokenizer and the model
A text model cannot read raw strings directly. It needs two components:
- Tokenizer: turns text into numbers (token IDs) the model understands.
- Model: takes those numbers and outputs a prediction.
That is why the script loads both from the same model/ directory:
- AutoTokenizer.from_pretrained(model_dir) loads tokenization rules and vocabulary.
- AutoModelForSequenceClassification.from_pretrained(model_dir) loads the trained weights.
Then we call:
model.eval()
This switches the model to inference mode (it disables training-only behaviors like dropout). For beginners: always call eval() when you are doing predictions.
Finally, we build the list of human-readable labels with:
model.config.id2label
This mapping tells us which output index corresponds to which label (for example 0 -> ham, 1 -> spam, depending on how the model was trained). The script uses it to print results with label names instead of numeric indices.
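For instance, you can turn that mapping into an index-ordered list of label names (a small sketch using standard transformers config attributes):

labels = [model.config.id2label[i] for i in range(model.config.num_labels)]  # index-ordered label names
print(labels)  # e.g. ["ham", "spam"], depending on how the model was trained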
Inference and probabilities
When you call the model, the output is not a direct “percentage” yet. The model returns logits:
- logits are raw scores, one score per label
- the label with the highest logit is the predicted label (softmax does not change that ranking)
To make this easier to interpret, we convert logits into probabilities with softmax:
- probabilities are normalized scores between 0 and 1
- they sum to 1 across labels (or 100% once converted to percentages)
That is what this part does:
logits = model(**batch).logits
probs = torch.softmax(logits, dim=-1)
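A tiny standalone example with made-up logits shows the effect of softmax:

import torch

logits = torch.tensor([2.0, -1.0])    # raw scores for two labels
probs = torch.softmax(logits, dim=-1)
print(probs)                          # tensor([0.9526, 0.0474]) -> about 95.3% vs 4.7%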
Then the script prints:
- Top label: the label with the highest probability
- Scores: a full dict like { "spam": 97.12, "ham": 2.88 }
Beginner note: a “high percentage” is a confidence signal, not a guarantee. The best way to trust it is to validate on your own real messages and, later, choose a decision threshold (for example “treat as spam only if spam ≥ 95%”).
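Such a threshold check could look like this (a sketch; it assumes the label mapping contains a "spam" entry):

SPAM_THRESHOLD = 0.95  # tune this on your own validation data

spam_index = model.config.label2id["spam"]   # position of the "spam" label
spam_prob = float(probs[0][spam_index])
if spam_prob >= SPAM_THRESHOLD:
    print("treat as spam")
else:
    print("treat as ham (or route to human review)")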
Production notes
- The model expects French. If you feed it English or mixed-language text, accuracy may degrade, so validate on your real data first.
- Decision threshold: probabilities are useful, but in production you usually trigger actions from a threshold (e.g. spam >= 95%).
- Long texts: if inputs exceed max_length, they will be truncated. The right max_length depends on your model and your use case.
Next steps
Once this local test is validated, the next logical steps are:
- expose a POST /predict endpoint with FastAPI (a minimal sketch follows this list)
- dockerize
- add logs + timeouts + batching
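As a preview of the first step, here is a minimal sketch of a POST /predict endpoint. It is an assumption, not part of this tutorial's code, and it requires adding fastapi, uvicorn and pydantic to your dependencies; the path and max_length are illustrative.

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "./model"

# Load everything once at startup, on CPU, exactly like the local test
device = torch.device("cpu")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).to(device)
model.eval()

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    batch = tokenizer(req.text, truncation=True, max_length=256, return_tensors="pt")
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)[0]
    scores = {model.config.id2label[i]: round(float(p) * 100, 2) for i, p in enumerate(probs)}
    return {"top_label": max(scores, key=scores.get), "scores": scores}

You would run it with uvicorn, for example uvicorn app:app --host 0.0.0.0 --port 8000 if the file is saved as app.py.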
With these steps, you move from a local check to a production-ready building block that can be integrated into any back-office or moderation pipeline.
