Data Science Models, Containers, Kubernetes: A Primer for Beginners
A senior architect at Saxony Partners, Yasir Bashir works with our clients to ensure that their technology investments are aligned with their business goals. Yasir has deep expertise related to AI, data science, machine learning, automation, and software. He believes that automation is key to solving many of the inefficiencies plaguing the American healthcare system. He periodically shares thought leadership related to data science. In this post, he introduces us to the basics of data science models and their maintenance.
Healthcare firms are increasingly relying on data science models to solve business problems – but this reliance has created some problems of its own.
These models – many of which leverage machine learning and/or artificial intelligence – depend on specific versions of libraries, packages, and frameworks. Maintaining those exact versions is a prerequisite for model deployment. This is fairly easy to do when working with a single model.
But what happens when multiple models are deployed in the same environment – each requiring a different version of the same library, package, or framework? This is when things can get complicated.
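To make the conflict concrete, consider two models pinned to different versions of the same libraries (the package names and version numbers below are purely hypothetical):

```text
# model_a/requirements.txt
scikit-learn==0.24.2
pandas==1.2.4

# model_b/requirements.txt
scikit-learn==1.3.0
pandas==2.0.1
```

Installed into a single shared environment, only one version of each package can win – and whichever model depends on the other version quietly breaks.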
In order to solve business problems using data science, everything depends on the continuous iteration of models. The faster they can be deployed, the better. The more accessible they are, the better. The way to ensure fast deployment and accessibility is to leverage DevOps.
A combination of software development and information technology operations (the functional definition of the term), DevOps has become a critical component of the data science lifecycle. It allows teams to easily deploy and move models through different environments by building Continuous Integration (CI) and Continuous Deployment (CD) automation.
This need for effective CI/CD has been a primary driver of the rise in usage of containers – as well as container management tools.
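As a rough illustration of what such automation looks like, here is a minimal CI sketch using GitHub Actions – one of many CI/CD tools – that tests a model and builds a container image on every push to the main branch (the file paths, registry, and image name are assumptions for illustration):

```yaml
# .github/workflows/model-ci.yml (hypothetical pipeline)
name: model-ci
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Validate the model code before packaging it
      - name: Run model tests
        run: |
          pip install -r requirements.txt
          pytest tests/
      # Package the model and its dependencies into a container image
      - name: Build container image
        run: docker build -t registry.example.com/model-a:${{ github.sha }} .
```

A real pipeline would typically also push the image to a registry and trigger a deployment, but the shape is the same: every change is automatically tested and packaged.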
Containers are lightweight, standalone, executable packages that include everything needed to run a model, regardless of the environment: application and service code, supporting files, the runtime, libraries, and key settings and configurations. By utilizing containers, models can run across all environments, application development becomes faster, and library version conflicts are eliminated.
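A container image is typically described by a Dockerfile. A minimal sketch for a Python model service might look like this (the base image, file names, and entry point are assumptions, not a prescribed setup):

```dockerfile
# Pin the runtime so every environment uses the same interpreter
FROM python:3.10-slim

WORKDIR /app

# Install the exact library versions the model was built against
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model code and start the service
COPY . .
CMD ["python", "serve_model.py"]
```

Because each model ships with its own pinned dependencies inside its own image, two models that need conflicting library versions can run side by side on the same host.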
Much in the same way that individual models need containers to ensure they operate correctly in varied environments, deployments of multiple containers need a layer of coordination of their own. This practice is called container management, and the most commonly used container management tool is Kubernetes.
Kubernetes is a platform that automates the deployment, scaling, and management of containers. It assists in the ongoing orchestration of containerized applications, automates the deployment pipeline, and abstracts away physical and virtual infrastructure requirements.
With Kubernetes, you can deploy containers several times per day without retooling. It allows for manually triggered and/or automated deployments – as well as centralized monitoring and management of your containerized applications.
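For example, a Kubernetes Deployment manifest along these lines would run three replicas of a containerized model and restart them automatically if they fail (the names, image, and port are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-a
spec:
  replicas: 3                # Kubernetes keeps three copies running
  selector:
    matchLabels:
      app: model-a
  template:
    metadata:
      labels:
        app: model-a
    spec:
      containers:
        - name: model-a
          image: registry.example.com/model-a:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
```

Rolling out a new model version then becomes a matter of updating the image tag – Kubernetes handles replacing the old containers with the new ones.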
Data science can absolutely be leveraged to solve your business problems – but keeping data, model code, and processes in sync relies on CI/CD and containers to achieve those benefits. Fortunately, the data strategy experts at Saxony Partners are here to assist. We help clients do more with their data by leveraging technology and strategy to improve outcomes, reduce inefficiencies, and drive performance.
Have questions about data strategy? Reach out to our team today: