Site Reliability Engineer (W/M)

Posted on: 08.01.2020
The Ecole polytechnique fédérale de Lausanne (EPFL) is one of the most dynamic university campuses in Europe and ranks among the top 20 universities worldwide. EPFL employs 6,000 people supporting the three main missions of the institution: education, research and innovation. The EPFL campus offers an exceptional working environment at the heart of a community of 16,000 people, including over 10,000 students and 3,500 researchers from 120 different countries.




Your mission :

The EPFL Blue Brain Project (BBP), situated on the Campus Biotech in Geneva, Switzerland, applies advanced neuroinformatics, data analytics, high-performance computing and simulation-based approaches to the challenge of understanding the structure and function of the mammalian brain in health and disease. The BBP provides the community with regular releases of data, models and tools to accelerate neuroscience discovery and clinical translation through open science and global collaboration.

Main duties and responsibilities include :

BBP’s Core Services section is now looking for an experienced Site Reliability Engineer (W/M) to work on BBP’s High Performance Computing (HPC) and other mission-critical IT systems. This opportunity presents you with a chance to:



  • Ensure the reliability of our critically important IT services, e.g. by implementing SRE best practices for availability, performance, utilisation, change management, emergency response and capacity planning

  • Develop monitoring, logging and metrics tools to manage risk

  • Automate IT processes - in order to get rid of toil, technical debt and manual work - using modern software engineering practices

  • Ensure reliable product launches and system upgrades on our IT platforms such as HPC, virtualization, containerization and storage, using modern software development, configuration management, CI/CD and infrastructure-as-code approaches

  • Contribute to IT security, e.g. by establishing modern and clever system update/upgrade methodologies

Your profile :

What you'll need to succeed:



  • Deep understanding of IT operations using software engineering practices

  • Practical & recent hands-on experience with full-lifecycle config management, provisioning and CI tools (e.g. Puppet, Git, Jenkins, Foreman)

  • Practical & deep experience of using Linux (e.g. RedHat/CentOS, Ubuntu) in server environments

  • Extensive knowledge of monitoring infrastructure (e.g. Icinga, Prometheus, ELK)

  • Experience in programming and scripting (e.g. Python, Ruby, shell)

  • Understanding of networking fundamentals (e.g. HTTPS, DNS, TCP/IP & load balancing) with ability to implement changes and diagnose issues with routing, network protocols, subnets and DNS

  • Knowledge of industry best practices to run secure infrastructure

We count as an advantage any experience with:



  • Administering HPC clusters, cloud and container platforms (e.g. OpenStack, OpenShift/Kubernetes)

  • Administering storage systems (e.g. NetApp, GPFS, CEPH)

  • Streamlining processes, along with an interest in process development

You:



  • Dislike cutting corners and sweeping technical debt under the rug

  • Have a Bachelor's or Master's degree in computer science, or equivalent work experience

  • Are experienced in working in a collaborative and multi-cultural environment

  • Have excellent interpersonal and communication skills, written and oral

  • Are a self-starter, fast learner and eager to expand your domains of expertise

  • Have a proven ability to work both independently and in team-based environments

  • Are fluent in English (written and spoken)

We offer :



  • An internationally recognized research project using state-of-the-art HPC infrastructure

  • A dynamic, inter-disciplinary and international working environment

  • An opportunity to get your hands dirty with new technologies as they emerge

  • Great colleagues and excellent coffee

Start date :

As soon as possible

Term of employment :

Unlimited (CDI)

Duration :

Negotiable : CDI or 1 year CDD (renewable)

Remark :

Only candidates who apply through the EPFL website or our partner Jobup’s website will be considered.