This example shows how to deploy a classifier trained on the famous iris data set using scikit-learn.
- Create a Python file `trainer.py`.
- Use scikit-learn's `LogisticRegression` to train your model.
- Add code to pickle your model (you can use other serialization libraries such as joblib).
- Upload it to S3 (boto3 will need access to valid AWS credentials).
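Pickling a fitted scikit-learn estimator is a straightforward round-trip. As a quick local sanity check (no S3 involved), you can verify that serialization preserves the model's predictions exactly:

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Fit a small model on the iris data set.
iris = load_iris()
model = LogisticRegression(max_iter=1000)
model.fit(iris.data, iris.target)

# Serialize and deserialize in memory, then confirm the restored
# estimator predicts identically to the original.
restored = pickle.loads(pickle.dumps(model))
assert (restored.predict(iris.data) == model.predict(iris.data)).all()
print("round-trip OK")
```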
```python
# trainer.py
import pickle

import boto3
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train the model
iris = load_iris()
data, labels = iris.data, iris.target
training_data, test_data, training_labels, test_labels = train_test_split(data, labels)
model = LogisticRegression(solver="lbfgs", multi_class="multinomial")
model.fit(training_data, training_labels)
accuracy = model.score(test_data, test_labels)
print("accuracy: {:.2f}".format(accuracy))

# Upload the model
pickle.dump(model, open("model.pkl", "wb"))
s3 = boto3.client("s3")
s3.upload_file("model.pkl", "my-bucket", "sklearn/iris-classifier/model.pkl")
```

Run the script locally:
```bash
# Install scikit-learn and boto3
$ pip3 install scikit-learn boto3

# Run the script
$ python3 trainer.py
```

- Create another Python file `predictor.py`.
- Add code to load and initialize your pickled model.
- Add a prediction function that accepts a sample and returns a prediction from your model.
```python
# predictor.py
import pickle

import numpy as np

model = None
labels = ["setosa", "versicolor", "virginica"]


def init(model_path, metadata):
    global model
    model = pickle.load(open(model_path, "rb"))


def predict(sample, metadata):
    measurements = [
        sample["sepal_length"],
        sample["sepal_width"],
        sample["petal_length"],
        sample["petal_width"],
    ]
    label_id = model.predict(np.array([measurements]))[0]
    return labels[label_id]
```

Create a `requirements.txt` file to specify the dependencies needed by `predictor.py`. Cortex will automatically install them into your runtime once you deploy:
```text
# requirements.txt
numpy
```

You can skip dependencies that are pre-installed to speed up the deployment process. Note that `pickle` is part of the Python standard library, so it doesn't need to be included.
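Before deploying, you can exercise the `predict` interface locally. A minimal sketch that substitutes a hypothetical `StubModel` for the pickled classifier, so the input handling can be checked without a trained artifact on disk:

```python
import numpy as np

labels = ["setosa", "versicolor", "virginica"]


class StubModel:
    # Hypothetical stand-in for the pickled classifier: it always
    # predicts class 0 ("setosa"), which is enough to test the
    # request-handling logic without a real model file.
    def predict(self, X):
        return np.zeros(len(X), dtype=int)


model = StubModel()


def predict(sample, metadata):
    # Same body as the predict function in predictor.py.
    measurements = [
        sample["sepal_length"],
        sample["sepal_width"],
        sample["petal_length"],
        sample["petal_width"],
    ]
    label_id = model.predict(np.array([measurements]))[0]
    return labels[label_id]


print(predict({"sepal_length": 5.2, "sepal_width": 3.6,
               "petal_length": 1.4, "petal_width": 0.3}, {}))  # -> setosa
```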
Create a `cortex.yaml` file and add the configuration below. A `deployment` specifies a set of resources that are deployed together. An `api` provides a runtime for inference and makes our `predictor.py` implementation available as a web service that can serve real-time predictions:
```yaml
# cortex.yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
```

`cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on your Cortex cluster:
```bash
$ cortex deploy

creating classifier api
```

Track the status of your deployment using `cortex get`:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            -

endpoint: http://***.amazonaws.com/iris/classifier
```

The output above indicates that one replica of the API was requested and is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
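You can also call the endpoint programmatically. A minimal Python client sketch using only the standard library (the `classify` helper and the placeholder endpoint are illustrative, not part of Cortex):

```python
import json
import urllib.request


def classify(endpoint, sample):
    # POST the sample as JSON to the API and return the decoded response.
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


sample = {"sepal_length": 5.2, "sepal_width": 3.6,
          "petal_length": 1.4, "petal_width": 0.3}

# With a live endpoint, substitute your actual host:
# classify("http://<your-endpoint>/iris/classifier", sample)
print(json.dumps(sample))
```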
We can use `curl` to test our prediction service:
```bash
$ curl http://***.amazonaws.com/iris/classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'

"setosa"
```

Add a `tracker` to your `cortex.yaml` and specify that this is a classification model:
```yaml
# cortex.yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  tracker:
    model_type: classification
```

Run `cortex deploy` again to perform a rolling update to your API with the new configuration:
```bash
$ cortex deploy

updating classifier api
```

After making more predictions, your `cortex get` command will show information about your API's past predictions:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           16s           28ms

class         count
setosa        8
versicolor    2
virginica     4
```

This model is fairly small, but larger models may require more compute resources. You can configure this in your `cortex.yaml`:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G
```

You could also configure GPU compute here if your cluster supports it. Adding compute resources may help reduce your inference latency. Run `cortex deploy` again to update your API with this configuration:
```bash
$ cortex deploy

updating classifier api
```

Run `cortex get` again:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           16s           24ms

class         count
setosa        8
versicolor    2
virginica     4
```

If you trained another model and want to A/B test it with your previous model, simply add another `api` to your configuration and specify the new model:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: another-classifier
  predictor:
    path: predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/another-model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G
```

Run `cortex deploy` to create the new API:
```bash
$ cortex deploy

creating another-classifier api
```

`cortex deploy` is declarative, so the classifier API is unchanged while another-classifier is created:

```bash
$ cortex get --watch

api                  status   up-to-date   available   requested   last update
classifier           live     1            1           1           5m
another-classifier   live     1            1           1           8s
```

First, implement `batch-predictor.py` with a `predict` function that can process an array of samples:
```python
# batch-predictor.py
import pickle

import numpy as np

model = None
labels = ["setosa", "versicolor", "virginica"]


def init(model_path, metadata):
    global model
    model = pickle.load(open(model_path, "rb"))


def predict(sample, metadata):
    # Here, sample is a list of samples; build one row per sample.
    measurements = [
        [s["sepal_length"], s["sepal_width"], s["petal_length"], s["petal_width"]] for s in sample
    ]
    label_ids = model.predict(np.array(measurements))
    return [labels[label_id] for label_id in label_ids]
```

Next, add the `api` to `cortex.yaml`:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: another-classifier
  predictor:
    path: predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/another-model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: batch-classifier
  predictor:
    path: batch-predictor.py
  model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  compute:
    cpu: 0.5
    mem: 1G
```

Run `cortex deploy` to create the batch API:
```bash
$ cortex deploy

creating batch-classifier api
```

`cortex get` should show all three APIs now:

```bash
$ cortex get --watch

api                  status   up-to-date   available   requested   last update
classifier           live     1            1           1           10m
another-classifier   live     1            1           1           5m
batch-classifier     live     1            1           1           8s
```

We can send an array of samples to the batch API:

```bash
$ curl http://***.amazonaws.com/iris/batch-classifier \
    -X POST -H "Content-Type: application/json" \
    -d '[
          {
            "sepal_length": 5.2,
            "sepal_width": 3.6,
            "petal_length": 1.5,
            "petal_width": 0.3
          },
          {
            "sepal_length": 7.1,
            "sepal_width": 3.3,
            "petal_length": 4.8,
            "petal_width": 1.5
          },
          {
            "sepal_length": 6.4,
            "sepal_width": 3.4,
            "petal_length": 6.1,
            "petal_width": 2.6
          }
        ]'

["setosa","versicolor","virginica"]
```

Run `cortex delete` to spin down your APIs:
```bash
$ cortex delete

deleting classifier api
deleting another-classifier api
deleting batch-classifier api
```

Running `cortex delete` will free up cluster resources and allow Cortex to scale down to the minimum number of instances you specified during cluster installation. It will not spin down your cluster.
Any questions? Chat with us.