Senior Software Engineer, DevOps

Remote, USA | Full-time | Posted 2025-07-27

As a Senior Software DevOps Engineer, you will lead the design, implementation, and evolution of telemetry pipelines and DevOps automation that enable next-generation observability for distributed systems. You will blend a deep understanding of OpenTelemetry architecture with strong DevOps practices to build a reliable, high-performance, self-service observability platform across hybrid cloud environments (AWS and Azure). Your mission: empower engineering teams with actionable insights through rich metrics, logs, and traces, while championing automation and innovation at every layer.

WHAT YOU WILL BE DOING

  • Observability Strategy & Implementation

  • Architect and manage scalable observability solutions using OpenTelemetry (OTel), encompassing:

  • Collectors: Design and deploy OTel Collectors (agent/gateway modes) for ingesting and exporting telemetry across services.

  • Instrumentation: Guide teams on auto/manual instrumentation for services (metrics, traces, and logs).

  • Export Pipelines: Build telemetry pipelines to route data to backends like Grafana, Prometheus, Loki, New Relic, and Azure Monitor.

  • Processors & Extensions: Leverage OTel processors (batching, filtering, resource detection) and extensions for advanced enrichment and routing.
  • DevOps Automation & Platform Reliability

  • Own the CI/CD experience using GitLab Pipelines, integrating infrastructure automation with Terraform, Docker, and scripting in Bash and Python.
  • Build resilient and reusable infrastructure-as-code modules across AWS and Azure ecosystems.
  • Manage containerized workloads, registries, secrets, and secure cloud-native deployments with best practices.
  • Cloud-Native Enablement

  • Develop observability blueprints for cloud-native apps across AWS (ECS, EC2, VPC, IAM, CloudWatch) and Azure (AKS, App Services, Monitor).
  • Optimize cost and performance of telemetry pipelines while ensuring SLA/SLO adherence for observability services.
  • Monitoring, Dashboards, and Alerting

  • Build and maintain intuitive, role-based dashboards in tools such as Grafana and New Relic, enabling real-time visibility into service health, business KPIs, and SLOs.
  • Implement alerting best practices (noise reduction, deduplication, alert grouping) integrated with incident management systems.
  • Innovation & Technical Leadership

  • Drive cross-team observability initiatives that reduce MTTR and elevate engineering velocity.
  • Champion innovation projects, including self-service observability onboarding, log/metric reduction strategies, AI-assisted root cause detection, and more.
  • Mentor engineering teams on instrumentation, telemetry standards, and operational excellence.

WHAT YOU BRING

  • 10+ years of experience in DevOps, Site Reliability Engineering, or Observability roles.
  • Deep expertise with OpenTelemetry, including Collector configurations, receivers/exporters (OTLP, HTTP, Prometheus, Loki), and semantic conventions.
  • Proficient in GitLab CI/CD, Terraform, Docker, and scripting (Python, Bash, Go).
  • Strong hands-on experience with AWS and Azure services, cloud automation, and cost optimization.
  • Proficiency with observability backends: Grafana, New Relic, Prometheus, Loki, or equivalent APM/log platforms.
  • Passion for building automated, resilient, and scalable telemetry pipelines.
  • Excellent documentation and communication skills to drive adoption and influence engineering culture.

NICE TO HAVE

  • Certifications in AWS, Azure, or Terraform.
  • Experience with OpenTelemetry SDKs in Go, Java, or Node.js.
  • Familiarity with SLO management, error budgets, and observability-as-code approaches.
  • Exposure to event streaming (Kafka, RabbitMQ), Elasticsearch, Vault, and Consul.