
UpLink Azure Migration
Overview
I am the Senior SRE on the UpLink team during the migration of DataSnipper's UpLink platform from AWS to Azure, implementing modern cloud-native architecture and comprehensive monitoring solutions.
The UpLink Azure Migration project was a comprehensive initiative to move the UpLink application, formerly hosted on AWS, into Microsoft Azure.
My primary goals were to:
- Standardize all infrastructure as code for consistency and repeatability
- Establish automated CI/CD pipelines for both frontend and backend components
- Centralize observability through custom dashboards
- Reduce operational toil and improve deployment velocity
By leveraging Azure’s native services, we enabled the team to deploy changes faster, gain end-to-end visibility, and operate within the West Europe region in addition to our existing East US presence.
Key Features
- Modular Terraform Modules: Network, compute, storage, and RBAC components packaged for reuse
- Automated Pipelines: End-to-end GitHub Action pipelines for build, test, and deploy of frontend and backend
- Live Dashboards: Custom Azure Monitor workbooks surfacing SLIs/SLOs for Container Apps, Document Intelligence, and OpenAI calls
- Secrets Management: Centralized secret injection via Key Vault-backed service connections
Infrastructure as Code
- Terraform Modules: Developed reusable Terraform modules for Azure Container Apps, Document Intelligence services, and Azure Service Bus integration
- Environment Consistency: Ensured identical infrastructure across development, staging, and production environments
- Security Best Practices: Implemented Azure Key Vault integration for secrets management and RBAC for access control
CI/CD Pipeline Design
- End-to-End Automation: Built comprehensive CI/CD pipelines for both backend microservices and frontend applications
- Multi-Environment Deployment: Automated deployments across development, staging, and production with proper approval gates
Monitoring & Observability
- Azure Dashboards: Created comprehensive dashboards for every microservice with key SLIs and SLOs
- Metrics Collection: Implemented CPU, memory, error rate, and latency monitoring for all services
- Alerting: Set up intelligent alerting based on service health and performance thresholds
Technical Stack
- Infrastructure: Azure Container Apps, Azure Document Intelligence, Azure Service Bus, OpenAI Services
- IaC: Terraform
- CI/CD: Azure DevOps, GitHub Actions
- Monitoring: Azure Log Analytics, Dashboards, and Alerts
- Security: Azure Key Vault
Implementation Details
We began by architecting a robust infrastructure foundation in Terraform, defining a landing zone complete with virtual networks, subnets, and network security groups. To promote consistency and reuse, we modularized key components, such as Container Apps environments and Azure Key Vault instances, and configured remote state storage in Azure Storage.
Next, we crafted dedicated GitHub Action pipelines in Azure DevOps for both the React frontend and Python backend. Each pipeline orchestrates Terraform plan/apply tasks alongside build, test, and deploy stages. We also layered in manual approval gates for production deployments to enforce compliance and reduce risk.
For end-to-end visibility, we instrumented our microservices with the Application Insights SDK, enabling custom events and metrics collection. Log Analytics queries track error rates, request latency, and resource utilization, which we surface through Azure Dashboards featuring real-time charts and drill-down logs.
Security was baked in at every layer: all secrets live in Azure Key Vault and are accessed via managed identities to maintain least-privilege access. Role-based access controls on resource groups and pipeline service principals ensure only authorized personnel can trigger deployments or view sensitive data.
Finally, the new Azure environment was deployed in parallel with AWS, and new DNS records and urls were provisioned after automated integration tests validated feature parity. When the migration is successfully completed, we decommissioned AWS resources to eliminate configuration drift and capture cost savings.
Challenges and Solutions
One of the biggest hurdles we faced was that none of the UpLink team had prior Azure experience. To bridge this gap, we enrolled in Microsoft’s Fast Track program, which paired us with Azure engineers for weekly coaching sessions. During these meetings we validated our landing-zone design, refined our Terraform modules, and gradually built confidence in key services like Container Apps, Service Bus, and Key Vault.
Coordinating work between our SRE team in Amsterdam and the UpLink engineers on the U.S. East Coast introduced another layer of complexity. We overcame this by establishing a rotating “core hours” schedule that guaranteed at least three overlapping work hours each weekday, volunteered to work earlier than my scheduled time to meet with them while another SRE from the Amsterdam team volunteered to stay later,and by running asynchronous design reviews in shared Notion pages and Azure DevOps/GitHub pull requests to keep everyone aligned despite the time-zone gap.
Working with the Azure Terraform provider proved challenging; early versions exhibited erratic behaviors and intermittent errors as we defined complex networking and RBAC modules. To address this, we contributed bug reports upstream, implemented retries and workarounds in our wrapper scripts, and locked our provider version until stability improvements were released. This ensured our Terraform runs became predictable and reliable.
Results
- Zero Downtime Migration: Successfully migrated production services without any customer impact
- Data Sovereignty: European customers now have their data hosted within their borders
- Improved Performance: 30% reduction in average response times
- Enhanced Reliability: 99.9% uptime achieved through improved monitoring and alerting
- Cost Optimization: 25% reduction in infrastructure costs through Azure's pricing model
Lessons Learned
The migration highlighted the importance of:
- Comprehensive testing in staging environments
- Detailed monitoring and observability from day one
- Clear communication with stakeholders throughout the process
- Having rollback plans for every deployment