You've trained a model that achieves 94% accuracy on your validation set. You save the
.pkl file, close your notebook, and... now what? The gap between a working
Jupyter notebook and a model that actually serves predictions in production is exactly
what MLOps was built to bridge. This MLOps tutorial for beginners
walks you through the four essential steps (packaging, containerization, deployment, and
monitoring) so you can ship your first model with confidence.
By the end of this guide, you'll have a fully functional model inference API running on AWS Elastic Beanstalk or Google Cloud Run, complete with health checks, logging, and a simple monitoring dashboard. No prior DevOps experience required.
Why MLOps Matters (Even for Solo Projects)
MLOps isn't just for enterprise teams with dedicated infrastructure engineers. If you've ever tried to share a model with a colleague or redeploy after a data update, you've felt the pain that MLOps solves. At its core, MLOps is a set of practices that make your ML workflow reproducible, scalable, and maintainable. For a solo practitioner or small team, that means:
- Reproducibility: Pin dependencies and versions so your model runs the same way every time.
- Portability: Containerize your model so it runs on your laptop, a VM, or a Kubernetes cluster without changes.
- Observability: Know when your model is serving bad predictions or when latency spikes.
- Iteration speed: Deploy updates in minutes instead of days.
This guide focuses on the minimum viable MLOps pipeline: just enough to get your model into production without unnecessary complexity. Let's dive in.
What You'll Need (Prerequisites)
Before we start, make sure you have these basics ready:
- Python 3.9+ installed locally
- A trained ML model (we'll use a scikit-learn pipeline as an example)
- Docker Desktop installed and running
- An AWS account (free tier) or a Google Cloud account (free tier with $300 credit)
- Basic familiarity with the command line
If you don't have a model ready, you can use the sample Iris classifier we provide in our Iris ML Pipeline project; that project pairs perfectly with this deployment guide.
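If you'd rather train a quick stand-in model yourself, a minimal training script might look like the sketch below. It assumes scikit-learn is installed and uses its built-in Iris loader; the specific pipeline (scaler plus logistic regression) is illustrative, not the one from the linked project.

```python
# Sketch of a training script: fits a simple Iris classifier and saves model.pkl.
import pickle
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Persist the whole trained pipeline as a single artifact
Path("model").mkdir(exist_ok=True)
with open("model/model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Pickling the entire pipeline (scaler included) matters: at inference time you want one object that accepts raw features, not a model that silently expects pre-scaled input.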
Step 1: Package Your Model with a Consistent Interface
The first step in any MLOps tutorial for beginners is learning to treat
your model as a service, not a notebook cell. We'll create a simple Python module that
loads the model and exposes a predict function.
Create the project structure
Organize your code so it's clean and importable:
```
ml-deployment/
├── model/
│   ├── __init__.py
│   ├── train.py       # your training script (already run)
│   └── model.pkl      # the trained model artifact
├── app/
│   ├── __init__.py
│   ├── predict.py     # inference wrapper
│   └── schemas.py     # input/output validation
├── requirements.txt
├── Dockerfile
└── README.md
```
Write the inference wrapper
Create `app/predict.py` with a clean interface. Note the `lru_cache` decorator: without it, the model would be deserialized from disk on every single prediction call.

```python
import pickle
from functools import lru_cache
from pathlib import Path

import numpy as np

MODEL_PATH = Path(__file__).parent.parent / "model" / "model.pkl"

@lru_cache(maxsize=1)
def load_model():
    # Cached so the pickle is only read once per process
    with open(MODEL_PATH, "rb") as f:
        return pickle.load(f)

def predict(features: np.ndarray) -> np.ndarray:
    model = load_model()
    return model.predict(features)
```
This separation of concerns means you can test the prediction logic independently, swap models without touching the API layer, and version your model artifacts. For a complete example with input validation, check our Python API Design Patterns tutorial.
Step 2: Containerize with Docker
Containerization guarantees that your model runs identically in development, staging, and production. Docker is the industry standard, and it's surprisingly easy to set up for ML models.
Write a Dockerfile
Create a Dockerfile in your project root:
```dockerfile
# Use an official Python runtime as base
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Expose port 8080 (Cloud Run / EB default)
EXPOSE 8080

# Run the API server
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8080"]
```
We use uvicorn to serve a FastAPI app (or Flask, whichever you prefer). FastAPI gives us automatic OpenAPI docs and input validation with Pydantic, which is a huge win for MLOps.
Tip: `python:3.10-slim` is ~120 MB, while the full `python:3.10` image is over 800 MB. For GPU inference, use `nvidia/cuda:12.2-runtime-ubuntu22.04` as your base.
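The Dockerfile installs from `requirements.txt`, which this guide doesn't show. A plausible minimal version is sketched below; the exact pins are illustrative, so replace them with whatever versions you actually trained with.

```
fastapi==0.110.0
uvicorn[standard]==0.29.0
scikit-learn==1.4.2
numpy==1.26.4
```

One pin really matters: the scikit-learn version must match the one that produced `model.pkl`, because pickled estimators are not guaranteed to load across scikit-learn versions.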
Build and test locally
```shell
docker build -t ml-iris-api .
docker run -p 8080:8080 ml-iris-api
```
Visit http://localhost:8080/docs and you should see the Swagger UI.
Test a prediction with sample data. If it works locally, the same image will behave the same way in the cloud.
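Assuming a JSON `/predict` endpoint (the route name and payload shape here are assumptions, not part of this guide), a quick smoke test from a second terminal might look like:

```shell
# Send one Iris sample to the running container
curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
```

A JSON response (rather than an HTML error page) tells you the container, server, and model artifact are all wired together correctly.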
Step 3: Deploy to AWS or GCP
This is where your model becomes a real production service. I'll show you both AWS and GCP options so you can choose what fits your workflow.
Option A: Deploy to AWS Elastic Beanstalk
Elastic Beanstalk abstracts away the underlying infrastructure (EC2, load balancer, auto-scaling) and lets you deploy via the CLI or console. Best for teams already invested in AWS.
- Install the EB CLI and run `eb init` in your project folder.
- Set the platform to "Docker" (EB detects your Dockerfile automatically).
- Create an environment: `eb create ml-iris-env --single` (single instance for testing).
- Deploy: `eb deploy`. EB builds your Docker image and deploys it.
- Open the app: `eb open`. Your API is live.
Elastic Beanstalk also gives you a health monitoring URL and basic logs out of the box. For a deeper walkthrough, see our AWS ML Deployment project.
Option B: Deploy to Google Cloud Run
Cloud Run is a fully managed serverless container platform. It's perfect for ML inference because it scales to zero when not in use (saving money) and scales up under load.
- Enable the Cloud Run API and install the `gcloud` CLI.
- Build your image with Cloud Build: `gcloud builds submit --tag gcr.io/<PROJECT_ID>/ml-iris-api`
- Deploy to Cloud Run:

```shell
gcloud run deploy ml-iris-api \
  --image gcr.io/<PROJECT_ID>/ml-iris-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 512Mi \
  --timeout 60
```

- You'll get an HTTPS URL like `https://ml-iris-api-xxxx-uc.a.run.app`.
Cloud Run includes built-in logging (Cloud Logging), request metrics, and automatic TLS. No infrastructure to manage. It's my personal recommendation for most first deployments.
Step 4: Basic Monitoring & Logging
Deployment isn't the finish line; it's the starting line. Without monitoring, you're flying blind. Here's the minimal monitoring setup every model should have:
Health checks & readiness probes
Add a