DevOps Engineer | Azure | CI/CD | Big Data Platforms

MANIGANDAN B

Automation, Reliability, and Production Stability at Enterprise Scale

4+ years of experience designing, automating, and supporting enterprise deployment and data platforms using Azure DevOps, Linux, Hadoop, and cloud-based tooling. I specialize in CI/CD pipeline engineering, production reliability, and operational automation in environments where uptime, release stability, and SLA performance directly affect the business.

50% Faster Releases
99.8% Pipeline Success
35% Less Alert Noise
8 FTE/wk Automation Savings

About

I am a DevOps Engineer with 4+ years of experience in Big Data and enterprise platform environments. Currently based in Utrecht, I support a Netherlands mortgage platform where I own CI/CD workflows, production operations, and reliability improvements across Linux, Hadoop, and Azure DevOps ecosystems. My focus is automation, stability, and reducing manual operational effort in high-availability environments.

Who I Am

I'm a DevOps engineer who believes that reliability is a business enabler. I've spent 4+ years helping teams move faster without burning out on-call engineers. I do this by automating toil, building observability that actually catches problems early, and creating deployment workflows that teams trust.

What I enjoy solving: Reducing the friction between "developers want to ship" and "operators need stability." Whether the problem is pipeline bottlenecks, alert noise, or environment consistency, I design systems that scale and don't need constant manual care.

Current role: Platform and DevOps Engineer at Infosys (Utrecht, Netherlands), supporting critical mortgage platform operations across 40+ Linux servers, Azure DevOps pipelines, and enterprise monitoring stacks.

Looking for: Platform Engineer or Senior DevOps roles where I can own reliability end-to-end, mentor teams on SRE practices, and drive infrastructure standardization in fast-moving environments.

Experience

Platform and DevOps Engineer | Infosys Limited | Jan 2022 - Present
Utrecht / Amersfoort, Netherlands (current, 6 months onsite); Chennai, India (earlier tenure)

Current Project

I currently support the OPK project in a financial-grade modernization and reliability program. My role spans three delivery streams with both hands-on execution and production ownership.

Featured Projects

Case studies focused on problem, solution, architecture, and measurable results.

Observability and Incident Response

Unified Observability Platform

Problem: Monitoring was fragmented across 4 stacks (including Grafana, Splunk, and Dynatrace), incident detection took 15+ minutes, cross-platform correlation was manual, and 30% of alerts were false positives.

My Role as Platform and DevOps Engineer: Led observability consolidation for critical Linux workloads and designed the alerting model used by on-call teams.

Solution: Built a unified monitoring layer with shared dashboards, centralized logs, intelligent alert correlation, and runbook-linked incident context.

Tools: Grafana, Splunk, Dynatrace, Azure Monitor, Fluentd, Prometheus, AlertManager, Python

Impact: Faster detection, lower noise, better on-call efficiency, and more consistent incident handling for SLA-sensitive systems.

-25% Detection Time
-18% Recovery Time
99.9% SLA Uptime
-35% Alert Noise
Grafana, Observability, Incident Response, SRE

Automation and Toil Reduction

Runbook Automation & Alert Correlation

Problem: 200+ daily alerts (70% false positives), manual runbooks causing 2+ hour response times, inconsistent incident recovery, and 8 FTE hours/day spent on triage.

My Role as Platform and DevOps Engineer: Drove the automation design for repetitive operational tasks and standardized first-response workflows for support teams.

Solution: Built a Python correlation engine, Bash runbooks, PagerDuty integration, and controlled auto-remediation to reduce manual triage effort.

Tools: Python, Bash, PagerDuty, Splunk, Git, Ansible, AlertManager

Impact: Lower alert fatigue, faster recovery, improved first-time resolution, and measurable reduction in operational toil.

-35% Alert Volume
-67% Response Time
+45% 1st Resolution
8 FTE/week saved
Python, Automation, Incident Mgmt, Toil Reduction
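
To illustrate the idea behind alert correlation, here is a minimal Python sketch that groups alerts sharing a host and service when they fire close together in time. It is not the production engine described above: the `Alert` fields, the `correlate` function name, and the 5-minute window are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert shape; real alert payloads will differ per tool.
    host: str
    service: str
    fired_at: datetime
    message: str


def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts with the same host and service into incidents.

    An alert joins the current incident if it fires within `window`
    of the previous alert for that host and service; otherwise it
    starts a new incident. The window value is an assumption.
    """
    by_key = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        by_key[(alert.host, alert.service)].append(alert)

    incidents = []
    for key_alerts in by_key.values():
        current = [key_alerts[0]]
        for alert in key_alerts[1:]:
            if alert.fired_at - current[-1].fired_at <= window:
                current.append(alert)
            else:
                incidents.append(current)
                current = [alert]
        incidents.append(current)
    return incidents
```

Each returned incident is a list of related alerts, so an on-call engineer would be paged once per incident rather than once per alert.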

DevOps and Platform Skills

Hands-on tooling used to improve delivery speed, platform stability, and operational efficiency.

Cloud and DevOps Delivery

Azure DevOps, Jenkins, Git, and YAML pipelines used to standardize releases, enforce gates, and improve deployment reliability across multi-environment delivery flows.

Big Data Platforms

Hadoop, Hive, and Cloudera operations supporting data platform stability, job execution reliability, cluster health, and enterprise production support.

Automation and Scripting

Python, Bash, Ansible, and AWX used to reduce manual intervention, automate runbooks, standardize configuration, and cut repeat operational effort.

Monitoring and Reliability

Grafana, Splunk, Dynatrace, Azure Monitor, and Prometheus applied to detect issues earlier, reduce alert noise, and support RCA and SLA adherence.

Infrastructure and Configuration

Linux administration, Terraform, ARM templates, and configuration management used to keep environments consistent and reduce drift-related incidents.

Incident Management and Support

Production support, ITIL workflows, escalation handling, postmortems, and stakeholder coordination in systems with 99.9% uptime expectations.

Security and Compliance

Release validation, secrets handling, audit logging, and compliance checks integrated into delivery pipelines for safer production changes.

Certifications

Education

Languages

Activities and Interests

AI Lab

I use AI in practical DevOps and platform workflows to reduce response time, improve communication quality, and increase release confidence.

AIOps, LLM-Assisted RCA, Prompt Engineering, Copilot for Delivery, Automation Agents

Incident AI

Incident Summary Copilot

Converts noisy incident timelines into concise stakeholder-ready summaries with clear impact, current status, and next actions.

Reliability AI

RCA Draft Assistant

Speeds up first-draft RCA creation by organizing failure sequence, probable causes, evidence points, and prevention actions.

Change AI

Change Risk Brief Generator

Produces pre-deployment risk briefs from change details so teams can plan rollback, validation, and communication before production rollout.

Delivery AI

AI Workflow Trainer for Project Execution

Built and trained AI assistants using project runbooks and delivery patterns to generate faster implementation guidance, improve consistency, and keep execution timelines on track.

Prompt Studio

A real use case that shows how I frame AI prompts in operations.

Sample Prompt

Summarize this incident timeline in 5 bullets with impact, current status, suspected cause, mitigation already applied, and next action for stakeholders.

Expected Outcome

A clean and business-ready incident update in less than 1 minute.
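
As a rough sketch of how the sample prompt above could be assembled from raw timeline entries, the Python helper below joins an instruction with a list of timestamped notes. The function name `build_incident_prompt`, the `(timestamp, note)` input shape, and the "Timeline:" layout are assumptions for illustration; no particular AI service or API is implied.

```python
def build_incident_prompt(timeline_events, bullets=5):
    """Build a summary prompt from (timestamp, note) pairs.

    The instruction text mirrors the sample prompt; the timeline
    formatting is a hypothetical choice.
    """
    instruction = (
        f"Summarize this incident timeline in {bullets} bullets with impact, "
        "current status, suspected cause, mitigation already applied, "
        "and next action for stakeholders."
    )
    lines = [f"- {timestamp}: {note}" for timestamp, note in timeline_events]
    return instruction + "\n\nTimeline:\n" + "\n".join(lines)


if __name__ == "__main__":
    # Made-up example entries, for demonstration only.
    events = [
        ("10:02", "Disk usage alert fired on a data node"),
        ("10:15", "Old job logs cleared and service restarted"),
    ]
    print(build_incident_prompt(events))
```

Keeping the instruction fixed and varying only the timeline makes the summaries easier to compare from one incident to the next.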

Contact

Open to Platform and DevOps opportunities across the Netherlands and Europe. Currently in Utrecht. Visa sponsorship required.

Location

Utrecht, Netherlands

Phone (Netherlands)

+31 617886316

Phone (India)

+91 8056353767