Site Reliability Engineer (W/M)

Posted on: 23.04.2020

The Ecole polytechnique fédérale de Lausanne (EPFL) is one of the most dynamic university campuses in Europe and ranks among the top 20 universities worldwide. The EPFL employs 6,000 people supporting the three main missions of the institution: education, research and innovation. The EPFL campus offers an exceptional working environment at the heart of a community of 16,000 people, including over 10,000 students and 3,500 researchers from 120 different countries.

Your mission :

The EPFL Blue Brain Project (BBP), situated on the Campus Biotech in Geneva, Switzerland, applies advanced high-performance computing to the challenge of understanding the structure and function of the mammalian brain in health and disease. We are now looking for an experienced Site Reliability Engineer (W/M) to work on our High-Performance Computing (HPC) and other mission-critical IT systems.

Main duties and responsibilities include :

We offer, among others, the following challenges:

Ensuring reliable product launches and upgrades on our 1000+ node HPC cluster, Spectrum Scale, Ceph, OpenStack, OpenShift, VMware and NetApp with the help of modern software development, configuration management, CI/CD and infrastructure-as-code approaches

Improving the reliability of our critical IT services by implementing SRE best practices, e.g. for availability, performance, utilisation, emergency response and capacity planning

Developing monitoring, logging and metrics tools to embrace risks

Automating IT processes to get rid of toil, technical debt and manual work, using modern software engineering practices

Contributing to IT security e.g. by establishing clever update & patching methodologies.

Your profile :

We expect you to have strong experience in the following areas:

Linux (e.g. RedHat/CentOS, Ubuntu) in production server environments

Physical, virtualized and containerized infrastructure

Network concepts (e.g. IP routing, DNS, HA)

Configuration & provisioning tools (e.g. Puppet, Ansible, Foreman)

Programming and scripting (e.g. Python, Ruby, bash).

Experience in the following areas would be an advantage:

Implementing / operating large-scale storage systems, filesystems and data archiving

Operating large scale, Linux-based hardware infrastructure

Operating data centre networks built on Ethernet or InfiniBand

Operating HPC systems and software (e.g. Slurm, cluster managers)

Architecting, implementing & monitoring secure IT infrastructure

Stakeholder relationships, team leadership & management.

Our desired candidate would have:

Bachelor's or Master's degree in computer science, or equivalent work experience

Detail-oriented, cautious & professional working practices

Experience managing and completing large scale IT projects

Experience with, e.g., kernel performance tuning and debugging of complex IT problems

Interest in improving IT operations, IT processes and SRE best practices

Experience working in collaborative and multi-cultural environments

Proven ability to work both independently and in team-based environments

Fluent communication in English (written and spoken).

We offer :

An internationally recognized research project using state-of-the-art HPC infrastructure

A dynamic, inter-disciplinary and international working environment in picturesque Geneva

An opportunity to get your hands dirty with new technologies as they emerge.

Start date :

As soon as possible

Term of employment :

Unlimited (CDI)

Duration :

Negotiable. 1 year CDD (renewable) or CDI

Contact :

Please provide your CV and a cover letter (in English) in PDF format.

Remark :

Only candidates who apply through the EPFL website or our partner Jobup’s website will be considered.

Details

Employer

EPFL - Ecole Polytechnique Fédérale de Lausanne

Location

Genève

Region

GE

Company size

Medium-sized company