SRE fits right at the crossroads of IT operations, support and software engineering. SRE serves as the perfect blend of skills to tighten the relationship between IT and developers – leading to shorter feedback loops, better collaboration and more reliable software.
Department: Operations
Project Location(s): San Jose, Costa Rica
Objectives of this role
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Provide primary operational support and engineering for multiple large-scale distributed software applications
Required skills and qualifications
- Bachelor’s degree (or equivalent) in computer science or related discipline
- Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, Yarn)
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
Responsibilities
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service-level objectives