Big Data

Big Data Platform for Machine Learning and Data Analysis

Client

Company specializing in medical devices that produces and distributes endoscopes and surgical tools.

Technologies

Azure Kubernetes Service, Terraform, Azure DevOps Pipelines, ingress controller, cert-manager, network policies, ArgoCD, Helm charts, Velero

Background

The client needs to store vast amounts of data in various formats from multiple sources in a unified data platform, accessible to the organization's machine learning engineers and data scientists for developing machine learning applications and performing data analysis. The platform is built on the Azure cloud and handles pictures and videos from operation recordings, as well as sensor stream data from various operating devices. This data is annotated, pre-processed with various data tools, and then used for model training, which is crucial for building new machine learning services such as cancer detection and patient anonymization.

Challenge

The client's big data platform encompasses various Azure services alongside in-house-developed applications and data applications. Its central components are Azure storage accounts for big data storage, an Azure Machine Learning workspace for model training, and several web applications for data management and monitoring. Before Datics DevOps engineers joined the project, the entire data platform infrastructure had been built manually: most applications were hosted on Azure App Service, and each application's deployment and setup were performed by hand. This made the process time-consuming, increased the risk of misconfiguration, and lengthened bug-fixing times. In addition, the security of deployed applications had not been given adequate consideration, and data backup was another weak point.

Solution

To address these challenges, we, the Datics DevOps engineers, took on the following major tasks:
• Designed the entire infrastructure following Azure best practices and implemented it as Infrastructure as Code (IaC) with Terraform and Azure DevOps pipelines. This automated approach establishes a single source of truth for the data platform and greatly reduces deployment effort across environments.
• Centralized application management on Azure Kubernetes Service (AKS). All applications were containerized and their builds automated, with the Docker images stored securely in Azure Container Registry.
• Implemented GitOps with ArgoCD and Helm charts to continuously compare the live application state against the desired state declared in the Git repository, enabling reliable, repeatable deployments.
• Created backup strategies for the AKS cluster using Velero, with backup data persisted to Azure storage accounts.
• Implemented network security. On the AKS side, we set up SSL/TLS for applications via an ingress controller and cert-manager, and enabled a web application firewall (WAF) on the ingress controller to filter and monitor HTTP traffic. We also used Kubernetes network policies to isolate internal from external traffic in the cluster, and network security groups to define security rules for subnets and VMs. To secure the storage accounts, we configured network firewalls and restricted access to the specific private subnets where AKS and the Azure Machine Learning workspace are deployed.
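As a sketch of the IaC setup described above, an Azure DevOps pipeline driving Terraform might look roughly like this (the repository layout, backend key, and variable-file naming are illustrative assumptions, not the actual project configuration):

```yaml
# azure-pipelines.yml — minimal illustrative sketch, not the project's real pipeline.
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

steps:
  # Initialize Terraform with a remote state backend (key name is a placeholder).
  - script: terraform init -backend-config="key=dataplatform.tfstate"
    displayName: Terraform init
    workingDirectory: infra

  # The same pipeline provisions dev/stage/prod by swapping a per-environment
  # variable file, selected via the ENVIRONMENT pipeline variable (assumed name).
  - script: terraform plan -var-file="$(ENVIRONMENT).tfvars" -out=tfplan
    displayName: Terraform plan
    workingDirectory: infra

  - script: terraform apply -auto-approve tfplan
    displayName: Terraform apply
    workingDirectory: infra
```

Reusing one pipeline definition with per-environment variable files is what keeps a single source of truth across dev, stage, and prod.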
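The GitOps setup can be illustrated with a minimal ArgoCD `Application` manifest. The application name, repository URL, and chart path below are placeholders; the `automated` sync policy with `selfHeal` is what makes ArgoCD detect and revert live state that drifts from Git:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: data-portal                 # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/platform/charts.git   # placeholder repository
    targetRevision: main
    path: charts/data-portal
    helm:
      valueFiles: [values-prod.yaml]
  destination:
    server: https://kubernetes.default.svc
    namespace: data-portal
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes back to the Git-declared state
```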
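A Velero backup strategy along these lines could be declared as a `Schedule` resource; the cron expression and retention period here are illustrative values, not the project's actual policy:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"            # run every night at 02:00
  template:
    includedNamespaces: ["*"]      # back up all namespaces
    snapshotVolumes: true          # include persistent volume snapshots
    ttl: 720h0m0s                  # retain backups for 30 days
```

With Azure as the storage provider, Velero writes these backups to a blob container in a storage account.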
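The traffic-isolation idea can be sketched as a Kubernetes `NetworkPolicy` that admits ingress to an application namespace only from the ingress controller's namespace (the namespace names are assumptions; the `kubernetes.io/metadata.name` label is set automatically on namespaces by Kubernetes):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller-only
  namespace: data-portal           # hypothetical application namespace
spec:
  podSelector: {}                  # applies to every pod in this namespace
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
```

All other inbound traffic to pods in the namespace is denied once a policy selects them, which is how external and internal traffic can be kept apart inside the cluster.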

Result

With the help of Datics DevOps engineers, the data platform infrastructure is now streamlined and can be recreated with a single pipeline trigger. The same pipeline configuration provisions the dev, stage, and prod environments, freeing up valuable time for the team. Applications are deployed and monitored centrally with ArgoCD, and developers can use the ArgoCD UI for troubleshooting. The platform now meets the key security requirements identified in the penetration test report, and data backup strategies are in place, monitored via Azure monitoring services.
