Writing Dockerfiles for Data Scientists and Machine Learning Engineers

Mathanraj Sharma
May 18, 2021
Image credit: https://www.freepik.com/vectorpouch

As Data Scientists and ML Engineers, we focus mostly on building successful models and are less concerned with how we are going to deploy them in real-world production systems.

Why do we need Docker?

Things are simple when we only have to set up Python environments: tools like Anaconda and venv (with pip) make it easy to share an environment as an environment.yml or requirements.txt file so that others can reproduce it. But what if we have configured a few more libraries, or used TensorRT or other C++ libraries for fast inference?
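For the pure-Python case, a minimal sketch of what a requirements.txt captures might look like the following. It assumes Python 3.9+ and uses only the standard library's importlib.metadata to list every installed distribution as a pinned name==version line, roughly what `pip freeze` produces; `pinned_requirements` is a hypothetical helper name, and in practice `pip freeze > requirements.txt` is the usual tool. Note what it cannot capture: TensorRT, CUDA or any other native library installed outside pip never shows up in the output, which is exactly the gap described above.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Hypothetical helper: list installed Python distributions as 'name==version' pins.

    Roughly mirrors `pip freeze`, but unlike pip it does not skip packaging
    tools such as pip itself or treat editable installs specially.
    Native (non-pip) libraries like TensorRT are invisible to it.
    """
    pins = set()  # a set collapses exact duplicates found on several sys.path entries
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect stdout to a file to save them as requirements.txt.
    print("\n".join(pinned_requirements()))
```

Running it as `python freeze_sketch.py > requirements.txt` (the file name is hypothetical) yields a file that `pip install -r requirements.txt` can consume on another machine, but only for the Python layer of the environment.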

As the system grows, we may install new dependencies and change the environment's configuration. Missing one of those steps can break the entire system. Sometimes it works perfectly on our machine but not on someone else's or on the server. Wouldn't it be great if we could pack and ship our development environment?

That’s where Docker comes into play. With Docker, you can share your development environment as a Docker image, so other developers can run that image and recreate your environment with a single command.

Mathanraj Sharma

Machine Learning Engineer at H2O.ai | Maker | Developer | Dev Blogger | Like my stories? Buy me a ☕ https://ko-fi.com/mathanraj_sharma