UpGuard - Site Reliability Engineer

Mountain View, California, United States


At UpGuard, our Platform team handles scale, deployment, uptime, monitoring, and infrastructure for both our cloud and enterprise appliance customers. We build autonomous, self-healing clusters of systems using distributed consensus protocols and containers. Our internal tools are built with open-source projects like CoreOS, etcd, Docker, Fleet, and Kubernetes. We follow a strong release process and collaborate with the Engineering and Product teams. We've built continuous integration and delivery mechanisms (DevOps), and we often test the resilience of our systems with live host reboots in production. We have experience building systems that scale and work across datacenter regions. We write code, so the ideal candidate will have experience in both systems and software development.

Our goal is to create an SRE team that incorporates many of the attributes that Google describes in O'Reilly's "Site Reliability Engineering" book. We are looking for candidates who are fast learners, great communicators (both within and outside the team), and strong troubleshooters, and who always strive to build better systems.



Minimum qualifications:

Preferred qualifications:

