Reliability Engineer

Boston, MA · Job # 8056BK

Our client company is looking for a top-notch, hands-on SENIOR RELIABILITY ENGINEER to lead a small and talented infrastructure engineering team.

The production systems are hosted on AWS and run a large Ruby on Rails web application and a handful of smaller services written in Ruby, Node.js, and Java.

Technologies currently being used include:

  • Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
  • Postgres, Redis, Memcached, Elasticsearch
  • Chef, ServerSpec, Terraform, New Relic, Datadog, Sumo Logic, and Test Kitchen

In this critical role, you would:

  • Design, build and maintain the core infrastructure for the company
  • Actively manage the backlog for the infrastructure team, and coach and mentor the other SREs on the team
  • Help increase developer productivity and get to true continuous delivery
  • Develop operational and security standards and champion operational excellence and secure coding practices
  • Partner with engineering teams closely to educate and consult
  • Participate in solution design for new features, products, systems, and tooling
  • Debug complex problems across the whole stack
  • Continually monitor application/system performance and costs, generate actionable insights and either implement or advocate for them
  • Participate in on-call rotations, along with every member of the engineering team
  • Ruthlessly eliminate repetitive manual tasks and recurring errors
  • Ensure the company always employs best-of-breed tooling for all its infrastructure and automation needs
  • Collaboratively plot a course for the maturation and growth of the company’s infrastructure
  • Participate in (and sometimes run point on) handling production incidents
  • Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling

This role might be that rare opportunity if you:

  • Love building tooling and infrastructure to help developers be more productive
  • Love eliminating repetitive manual tasks through automation
  • Have a healthy appreciation of what it means to work in production
  • Have solid Unix command line and systems chops
  • Have experience with substantial, distributed SaaS or eCommerce systems
  • Can point to a solid track record of success leading small-to-medium infrastructure teams