You've trained a model that achieves 94% accuracy on your validation set. You save the
.pkl file, close your notebook, and... now what? The gap between a working
Jupyter notebook and a model that actually serves predictions in production is exactly
what MLOps was built to bridge. This MLOps tutorial for beginners
walks you through the four essential steps (packaging, containerization, deployment, and
monitoring) so you can ship your first model with confidence.
By the end of this guide, you'll have a fully functional model inference API running on AWS Elastic Beanstalk or Google Cloud Run, complete with health checks, logging, and a simple monitoring dashboard. No prior DevOps experience required.
Why MLOps Matters (Even for Solo Projects)
MLOps isn't just for enterprise teams with dedicated infrastructure engineers. If you've ever tried to share a model with a colleague or redeploy after a data update, you've felt the pain that MLOps solves. At its core, MLOps is a set of practices that make your ML workflow reproducible, scalable, and maintainable. For a solo practitioner or small team, that means:
- Reproducibility: Pin dependencies and versions so your model runs the same way every time.
- Portability: Containerize your model so it runs on your laptop, a VM, or a Kubernetes cluster without changes.
- Observability: Know when your model is serving bad predictions or when latency spikes.
- Iteration speed: Deploy updates in minutes instead of days.
This guide focuses on the minimum viable MLOps pipeline: just enough to get your model into production without unnecessary complexity. Let's dive in.
What You'll Need (Prerequisites)
Before we start, make sure you have these basics ready:
- Python 3.9+ installed locally
- A trained ML model (we'll use a scikit-learn pipeline as an example)
- Docker Desktop installed and running
- An AWS account (free tier) or a Google Cloud account (free tier with $300 credit)
- Basic familiarity with the command line
If you don't have a model ready, you can use the sample Iris classifier we provide in our Iris ML Pipeline project; that project pairs perfectly with this deployment guide.
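If you'd rather train a quick stand-in model yourself, a minimal training script might look like the sketch below. It assumes scikit-learn is installed and uses its built-in Iris loader; the specific pipeline (scaler plus logistic regression) is illustrative, not the one from the linked project.

```python
# Sketch of a training script: fits a simple Iris classifier and saves model.pkl.
import pickle
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Persist the whole trained pipeline as a single artifact
Path("model").mkdir(exist_ok=True)
with open("model/model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Pickling the entire pipeline (scaler included) matters: at inference time you want one object that accepts raw features, not a model that silently expects pre-scaled input.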
Step 1: Package Your Model with a Consistent Interface
The first step in any MLOps tutorial for beginners is learning to treat
your model as a service, not a notebook cell. We'll create a simple Python module that
loads the model and exposes a predict function.
Create the project structure
Organize your code so it's clean and importable:
```
ml-deployment/
├── model/
│   ├── __init__.py
│   ├── train.py       # your training script (already run)
│   └── model.pkl      # the trained model artifact
├── app/
│   ├── __init__.py
│   ├── predict.py     # inference wrapper
│   └── schemas.py     # input/output validation
├── requirements.txt
├── Dockerfile
└── README.md
```
Write the inference wrapper
Create `app/predict.py` with a clean interface. Note the `lru_cache` decorator: without it, the model would be deserialized from disk on every single prediction call.

```python
import pickle
from functools import lru_cache
from pathlib import Path

import numpy as np

MODEL_PATH = Path(__file__).parent.parent / "model" / "model.pkl"

@lru_cache(maxsize=1)
def load_model():
    # Cached so the pickle is only read once per process
    with open(MODEL_PATH, "rb") as f:
        return pickle.load(f)

def predict(features: np.ndarray) -> np.ndarray:
    model = load_model()
    return model.predict(features)
```
This separation of concerns means you can test the prediction logic independently, swap models without touching the API layer, and version your model artifacts. For a complete example with input validation, check our Python API Design Patterns tutorial.
Step 2: Containerize with Docker
Containerization guarantees that your model runs identically in development, staging, and production. Docker is the industry standard, and it's surprisingly easy to set up for ML models.
Write a Dockerfile
Create a Dockerfile in your project root:
```dockerfile
# Use an official Python runtime as base
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Expose port 8080 (Cloud Run / EB default)
EXPOSE 8080

# Run the API server
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8080"]
```
We use uvicorn to serve a FastAPI app (or Flask, whichever you prefer). FastAPI gives us automatic OpenAPI docs and input validation with Pydantic, which is a huge win for MLOps.
Tip: `python:3.10-slim` is ~120 MB, while the full `python:3.10` image is over 800 MB. For GPU inference, use `nvidia/cuda:12.2-runtime-ubuntu22.04` as your base.
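The Dockerfile installs from `requirements.txt`, which this guide doesn't show. A plausible minimal version is sketched below; the exact pins are illustrative, so replace them with whatever versions you actually trained with.

```
fastapi==0.110.0
uvicorn[standard]==0.29.0
scikit-learn==1.4.2
numpy==1.26.4
```

One pin really matters: the scikit-learn version must match the one that produced `model.pkl`, because pickled estimators are not guaranteed to load across scikit-learn versions.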
Build and test locally
```shell
docker build -t ml-iris-api .
docker run -p 8080:8080 ml-iris-api
```
Visit http://localhost:8080/docs and you should see the Swagger UI.
Test a prediction with sample data. If it works locally, the same image will behave the same way in the cloud.
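Assuming a JSON `/predict` endpoint (the route name and payload shape here are assumptions, not part of this guide), a quick smoke test from a second terminal might look like:

```shell
# Send one Iris sample to the running container
curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
```

A JSON response (rather than an HTML error page) tells you the container, server, and model artifact are all wired together correctly.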
Step 3: Deploy to AWS or GCP
This is where your model becomes a real production service. I'll show you both AWS and GCP options so you can choose what fits your workflow.
Option A: Deploy to AWS Elastic Beanstalk
Elastic Beanstalk abstracts away the underlying infrastructure (EC2, load balancer, auto-scaling) and lets you deploy via the CLI or console. Best for teams already invested in AWS.
- Install the EB CLI and run `eb init` in your project folder.
- Set the platform to "Docker" (EB detects your Dockerfile automatically).
- Create an environment: `eb create ml-iris-env --single` (single instance for testing).
- Deploy: `eb deploy`. EB builds your Docker image and deploys it.
- Open the app: `eb open`. Your API is live.
Elastic Beanstalk also gives you a health monitoring URL and basic logs out of the box. For a deeper walkthrough, see our AWS ML Deployment project.
Option B: Deploy to Google Cloud Run
Cloud Run is a fully managed serverless container platform. It's perfect for ML inference because it scales to zero when not in use (saving money) and scales up under load.
- Enable the Cloud Run API and install the `gcloud` CLI.
- Build your image with Cloud Build: `gcloud builds submit --tag gcr.io/<PROJECT_ID>/ml-iris-api`
- Deploy to Cloud Run:

```shell
gcloud run deploy ml-iris-api \
  --image gcr.io/<PROJECT_ID>/ml-iris-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 512Mi \
  --timeout 60
```

- You'll get an HTTPS URL like `https://ml-iris-api-xxxx-uc.a.run.app`.
Cloud Run includes built-in logging (Cloud Logging), request metrics, and automatic TLS. No infrastructure to manage. It's my personal recommendation for most first deployments.
Step 4: Basic Monitoring & Logging
Deployment isn't the finish line; it's the starting line. Without monitoring, you're flying blind. Here's the minimal monitoring setup every model should have:
Health checks & readiness probes
Add a