On the importance of using Docker 🐳

Jun 29, 2020



I have always talked about the importance of adopting an agile methodology in analytics projects.

With the fast pace of data acquisition and the immense volume of data points collected every hour in some domains, being able to iterate and deploy quickly has become a vital necessity.





I wanted to share some thoughts on an important set of technologies that many organizations use to automate, deploy, and run robust, stable data products.


I am fairly certain you have heard of Docker: that big blue smiling whale 🐳

Below, I will boil down in simple terms what it is and why I believe it is an important tool in the field of data analytics.

What is Docker?


Simply put, Docker is an open-source platform built around the container paradigm. It saves developers hours of work every year that would otherwise be lost to the repetitive chores of setting up environments, installing operating systems and other applications. That makes them more efficient and frees up precious time for higher-value tasks.

It allows an application and its dependencies to be packaged in a virtualized container, so the application can be deployed in other environments without the hassle of thinking about compatibility and runtime issues.


Docker makes sure the application can be launched in an isolated environment, which opens up the possibility of deploying it on premises, in a public or private cloud, and so on.

Docker is not the equivalent of a virtual machine, because it does not require the user to worry about setting up a guest OS and its dependencies. Instead, its technology relies on the Linux kernel and the host system's resources (CPU, memory, storage, etc.) to provide an isolated environment for the application.
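
To make this concrete, here is a minimal sketch of a Dockerfile that packages a small Python analytics script. The base image python:3.8-slim and the file names requirements.txt and analysis.py are illustrative choices of mine, not anything imposed by Docker:

# Dockerfile -- package the script and its dependencies into one image
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY analysis.py .
CMD ["python", "analysis.py"]

Anyone with Docker installed can build and run this image without installing Python or any of the libraries on the host itself.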



Benefits of using Docker


Several benefits can be gained by using Docker in an enterprise context. It goes without saying that the success of this technology comes as no surprise, considering the gains it allows us to have:


Flexibility & Portability:


One of the major pain points for IT professionals working with many different applications is the extreme complexity of guaranteeing that those applications are smoothly deployed and run in their production environment.

This complexity doubles when the company maintains separate environments for development, testing and production. It is not always possible to keep these environments identical, and sometimes upgrading the Python version on a server can have disastrous consequences for other applications that rely on the older version.

Thanks to its technology, Docker lets organizations worry far less about this issue and gives its users real flexibility and portability. The core system and its dependencies are not affected by any single application: each Docker container has its own requirements installed inside the container.

This makes spinning up a Docker container and deploying it anywhere a piece of cake! You don't need to check whether the application dev team has made sure the exact system requirements are respected, and the dev team is free to focus on building working software without the hassle of prerequisite installation and version compatibility.
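
As an illustration of how little ceremony this takes, here is a sketch using the hypothetical analytics-demo image from the Dockerfile above; the registry address myregistry.example.com is a placeholder, not a real service:

# Build the image once, from the Dockerfile sketched earlier.
docker build -t analytics-demo .

# Run it locally -- or on any other machine that has Docker installed.
docker run --rm analytics-demo

# Push it to a registry so an on-prem server or a cloud VM can pull the exact same image.
docker tag analytics-demo myregistry.example.com/analytics-demo:1.0
docker push myregistry.example.com/analytics-demo:1.0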



Efficiency & Scalability:


It is hard to overstate how much efficiency Docker brings with all of its features. Recovering the time that would otherwise be spent verifying and installing prerequisites is a major gain: no more hours lost to inadequate configuration, compatibility errors and all the other issues we have all faced.

Docker's isolation also lets developers pick and choose their technology stack and include the most suitable version of any software in their application. If you find one programming language more user friendly than another, or if you are more productive working in R than in Python, Docker gives you the possibility of using the tools that work best for you.

And because launching an application with Docker takes just a couple of commands (which can be automated), being able to spin up a container in a few seconds is a major scalability feature: when demand is very high you can scale an application up, and scale it back down when demand drops.
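
As a rough sketch of what that automation can look like with Docker Compose, assuming a docker-compose.yml that defines a hypothetical worker service for the application:

# Scale the hypothetical "worker" service up to five containers when demand spikes...
docker-compose up -d --scale worker=5

# ...and back down to one when it drops.
docker-compose up -d --scale worker=1

At larger scale this job is usually handed to an orchestrator such as Kubernetes or Docker Swarm, but the principle stays the same.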


Security:


Lastly, and perhaps most importantly, security is one of Docker's lesser-known strengths. By design, Docker ensures that every application running inside a container is isolated from other applications and containers, even when they are deployed on the same system or server.

Every container is completely segregated and isolated, and from an architecture point of view the system administrator keeps full control over the flow of data and the interactions between the different systems.
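
To give an idea of the knobs an administrator has at run time, here is a sketch reusing the hypothetical analytics-demo image from earlier; the network name, container name and resource limits are illustrative values I chose, not defaults:

# Create a private network so only containers attached to it can talk to each other.
docker network create analytics-net

# Run the container on that network, cap its memory and CPU,
# and mount its filesystem read-only.
docker run -d --name etl-job --network analytics-net \
  --memory 512m --cpus 1.0 --read-only \
  analytics-demo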



Final thoughts:


I believe that data scientists and software engineers are better off focusing on improving their code and building efficient, user-friendly applications. Time lost setting up infrastructure and dealing with compatibility issues is, for me, a lost investment.

In today's extremely competitive world, using Docker is a must. It has become a sort of filter criterion for me when evaluating how efficient and competitive a company is, based on its technology stack and whether or not it uses this type of technology.

I should mention that container-based solutions are not limited to Docker; there are various other tools that can substitute for it.

It goes without saying that some people have raised the disadvantages of Docker in certain contexts. I am fully aware that different situations call for an adapted solution or tool.

Docker may have a few drawbacks, but I believe that for a well-structured organization, taking advantage of this tool while keeping its constraints in mind (and doing something about them) is the answer here.


Feel free to share your thoughts about your experience with Docker in the comment section! If you need any advice about how to use Docker in Data Science applications, hit me up on Twitter!