Building Microsoft ML+Python App Container

This is a quick post about building a Microsoft Machine Learning container that can be used with Azure App Service to host a front-end web app written in Python that uses Microsoft’s ML libraries. The full documentation may be found here: https://docs.microsoft.com/en-us/machine-learning-server/what-is-machine-learning-server. Additionally, if you’re just looking to install the client libraries on Linux, that information can be found here: https://docs.microsoft.com/en-us/machine-learning-server/install/python-libraries-interpreter#install-python-libraries-on-linux.

However, if you plan to build a container and host it in Azure, particularly Azure App Service, there are a couple of things you’ll want to do in addition to simply installing the ML package.  This post will focus on getting the image size down and ensuring dependencies are installed.

A few problems arise if one simply creates a Dockerfile that includes a RUN command such as:

# ML Server Python
RUN apt-get -y update \
&& apt-get install -y apt-transport-https wget \
&& wget https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb \
&& dpkg -i packages-microsoft-prod.deb \
&& rm -f packages-microsoft-prod.deb \
&& apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893 \
&& apt-get -y update \
&& apt-get install -y microsoft-mlserver-packages-py-9.3.0

The problems are:

  1. The bits are likely in the wrong location
  2. The dependencies for them may not be installed in your desired location
  3. SIZE matters

If you’ve already got a front-end app that uses ML services deployed with Microsoft ML Server, then I highly suggest that you use the direct API calls and Swagger capability to implement your application’s client-side interface. However, if you’ve gone down the path of using DeployClient() to discover service endpoints and make the calls, then you will need the Microsoft ML client libraries installed. Again, I strongly suggest using Swagger and the requests package to create your client, but if you have to move forward with DeployClient() for now, here are my suggestions.
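For comparison, here is a minimal sketch of what the direct-call route could look like with the requests package. The host, service name, version, and input parameters are hypothetical placeholders. The /login and /api/<service>/<version> paths and the access_token field are assumptions about how ML Server exposes its REST API, so check them against the Swagger file generated for your own service.

```python
"""Sketch of calling an ML Server web service without DeployClient()."""


def login_request(host, username, password):
    """Build the URL and JSON body for the token request (assumed /login path)."""
    url = host.rstrip("/") + "/login"
    body = {"username": username, "password": password}
    return url, body


def consume_request(host, service, version, token, inputs):
    """Build the URL, headers, and JSON body for one service call."""
    url = "{}/api/{}/{}".format(host.rstrip("/"), service, version)
    headers = {"Authorization": "Bearer " + token}
    return url, headers, inputs


def call_service(host, username, password, service, version, inputs):
    """Log in, then POST the inputs to the service and return the parsed JSON."""
    import requests  # imported here so the builders above work without it

    login_url, login_body = login_request(host, username, password)
    resp = requests.post(login_url, json=login_body, timeout=30)
    resp.raise_for_status()
    token = resp.json()["access_token"]  # assumed field name

    url, headers, body = consume_request(host, service, version, token, inputs)
    resp = requests.post(url, headers=headers, json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

A client built this way needs only requests in the image, so none of the ML Server package handling described in the rest of this post would be required.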

Thanks to Nick Reith for his help explaining and illustrating the staged build for me. To kick things off, let’s start the Dockerfile by setting up stage 0:

FROM tiangolo/uwsgi-nginx-flask:python3.6 as stage0
#distro info
RUN cat /etc/issue
RUN pip install --upgrade pip \
&& apt-get update \
&& apt-get install -y apt-utils
# ML Server Python
RUN apt-get -y update \
&& apt-get install -y apt-transport-https wget \
&& wget https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb \
&& dpkg -i packages-microsoft-prod.deb \
&& rm -f packages-microsoft-prod.deb \
&& apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893 \
&& apt-get -y update \
&& apt-get install -y microsoft-mlserver-packages-py-9.3.0

Note that I’m using tiangolo’s uwsgi-nginx-flask container. This helps because I don’t have to worry about all of the initial configuration of uwsgi and nginx. I will cover its usage and optimization in a subsequent post. For now, let’s focus on the Python and ML library setup in the image.

In the code block above, note that the ML libraries are set up in a single RUN instruction. By default, it installs the packages into the folder /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages. However, we need to move the required packages from that location into our own Python runtime environment. The good news is that we don’t need all of them. The bad news is that they have a number of dependencies that we must install. Also, installing the packages drives the image size up to about 5.6 GB, which is a pretty large container. The real problem with the size shows up at deployment time: depending on the size of the App Service instance I used, App Service would take as much as 13 minutes to pull the image and get it running. This is definitely not desirable for iterative work or scale operations. So, let’s reduce that footprint.

After the packages are installed completely, we’ll copy the ones we know that we need in order to use the DeployClient() as a client-side object for calling service endpoints.

#copy the ML libraries into the python path for the app to tmp folder to move later
RUN mkdir /tmp/mldeps
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/adal* /tmp/mldeps/
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/liac* /tmp/mldeps/
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/azureml* /tmp/mldeps/

In the lines above I create a temporary holding location and recursively copy the adal, liac, and azureml folders into it in preparation for the next stage. The next stage starts from the same base image that was used previously. The files that were copied from the Microsoft ML install into a temp location in stage 0 are then copied into a temp location in this stage.

#start next stage
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Copy ml packages and app files from previous install and discard the rest
COPY --from=stage0 /tmp/mldeps /tmp/mldeps
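Since the stage 0 copies rely on wildcards, it can be worth checking what the adal*, liac*, and azureml* patterns actually match before trusting the result. The following is a small sketch of such a check, assuming it is run with a Python interpreter inside the stage 0 image; the function name is my own and not part of any Microsoft package.

```python
import glob
import os

# The same wildcard patterns used by the cp commands in stage 0.
PATTERNS = ("adal*", "liac*", "azureml*")


def matched_entries(site_packages, patterns=PATTERNS):
    """Return the sorted base names in site_packages matching any pattern."""
    names = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(site_packages, pattern)):
            names.add(os.path.basename(path))
    return sorted(names)


if __name__ == "__main__":
    src = "/opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages"
    for name in matched_entries(src):
        print(name)
```

If a package you expect is not in the printed list, the wildcard is not picking it up and it needs its own copy line.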

We can’t copy them directly to the site-packages location, because if we do that before updating pip and installing dependencies we’ll get several dependency version errors. Thus, getting the files into the proper location in this stage follows the sequence [copy from stage 0 to temp] -> [update pip] -> [install dependencies] -> [move to final location] -> [delete temp folder].

#must upgrade and install python package dependencies for ML packages before moving over ML packages
RUN pip install --upgrade pip \
&& apt-get update \
&& apt-get install -y apt-utils

#dependencies needed for azureml packages
RUN pip install dill PyJWT cryptography
# add needed packages for front-end app
RUN pip install dash dash-core-components dash-html-components dash_dangerously_set_inner_html pandas requests

#move ML packages into place
RUN cp -r /tmp/mldeps/* /usr/local/lib/python3.6/site-packages/

#remove the temp holding directory
RUN rm -r /tmp/mldeps
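Once the packages are in place, a quick import check can confirm that the copied libraries and their dependencies are visible to the image’s Python. This is only a sketch: the module list is my assumption based on the packages named above (PyJWT is imported as jwt), and find_spec only confirms that a top-level module can be located, not that DeployClient() will work end to end.

```python
import importlib.util

# Top-level module names assumed from the copy and pip install steps above.
REQUIRED = ("adal", "azureml", "dill", "jwt", "cryptography")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main():
    """Fail loudly if any required module is missing."""
    missing = missing_modules()
    if missing:
        raise SystemExit("Missing modules: " + ", ".join(missing))
    print("All required modules found")
```

Calling main() from a script run as a final build step would make the image build fail early instead of the app failing at startup in App Service.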

Other than a couple more lines for setting up the ports, that’s about it. The result is an image that drops from 5.6 GB to about 1.3 GB, as can be seen in my image repo.

C:\Users\jofultz\Documents\Visual Studio 2017\Projects\App%20Service%20Linux%20Python%20App>docker images
REPOSITORY      TAG      IMAGE ID       CREATED       SIZE
ml-py-reduced   latest   f248747aedad   6 hours ago   1.31GB
ml-py           latest   f308579a7ac8   6 hours ago   5.66GB

Keeping the size down allows the image to be pulled and made operational in a much shorter timeframe. For ease of reading I’ve kept the operations mostly discrete, but if you want fewer image layers you can combine a number of the RUN statements. For reference, here is the full Dockerfile for building the reduced image:

FROM tiangolo/uwsgi-nginx-flask:python3.6 as stage0

#distro info
RUN cat /etc/issue

RUN pip install --upgrade pip \
&& apt-get update \
&& apt-get install -y apt-utils

# ML Server Python
RUN apt-get -y update \
&& apt-get install -y apt-transport-https wget \
&& wget https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb \
&& dpkg -i packages-microsoft-prod.deb \
&& rm -f packages-microsoft-prod.deb \
&& apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893 \
&& apt-get -y update \
&& apt-get install -y microsoft-mlserver-packages-py-9.3.0

#copy the ML libraries into the python path for the app to tmp folder to move later
RUN mkdir /tmp/mldeps
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/adal* /tmp/mldeps/
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/liac* /tmp/mldeps/
RUN cp -r /opt/microsoft/mlserver/9.3.0/runtime/python/lib/python3.5/site-packages/azureml* /tmp/mldeps/

#start next stage
FROM tiangolo/uwsgi-nginx-flask:python3.6

# Copy ml packages and app files from previous install and discard the rest
COPY --from=stage0 /tmp/mldeps /tmp/mldeps

#must upgrade and install python package dependencies for ML packages before moving over ML packages
RUN pip install --upgrade pip \
&& apt-get update \
&& apt-get install -y apt-utils

#dependencies needed for azureml packages
RUN pip install dill PyJWT cryptography
# add needed packages for front-end app
RUN pip install dash dash-core-components dash-html-components dash_dangerously_set_inner_html pandas requests

#move ML packages into place
RUN cp -r /tmp/mldeps/* /usr/local/lib/python3.6/site-packages/

#remove the temp holding directory
RUN rm -r /tmp/mldeps

ENV LISTEN_PORT=80

EXPOSE 80

COPY /app /app
