AVP/SVP, Site Reliability Engineer/ Lead (Java Development, Observability, Resiliency), Banking

Employer: Charterhouse Partnership Singapore, EA Licence No: 16S8066
Location: Singapore, Singapore
Salary: Highly Competitive
Closing date: Aug 12, 2022

Job Function: Other
Industry Sector: Finance - General
Employment Type: Full Time
Education: Bachelors

Min. 6 years of development experience (for AVP level) and min. 12 years of development experience (for SVP level)
3 years hands-on experience (in JAVA/J2EE, Spring Boot, JavaScript, SQL/PostgreSQL, Microservices) + 2 years hands-on experience (in any of the following technologies: Red Hat OpenShift/Kubernetes, Docker, Kafka, ELK, Redis, and DevOps tools such as Jenkins, Bitbucket, JIRA)
Practical experience in maintaining large-scale distributed systems architectures, hybrid cloud/on-premise environments, and event-driven or event stream systems (e.g. distributed storage, scheduling, big data computing systems)

As the AVP, Site Reliability Engineer, you will sit in the Enterprise SRE Team (Architect & Development) and assist the development team in tuning applications and configurations for critical systems so that they comply with the non-functional requirements (NFR) before going live in production, and ensure that performance recommendations are part of the change request process. You will drive thorough performance analysis of microservices code using single-user code profiling techniques.

You will also define critical performance KPIs, set alert rules and roll out monitoring dashboards for Production with timely reporting to stakeholders, and automate manual tasks related to performance monitoring, alerting, analysis, reporting, capacity planning, etc. to improve application observability, resiliency and operational efficiency.

You will work closely with solution architects and the application development team to ensure adherence to best practices in design and coding with respect to SRE principles. You will monitor, troubleshoot and analyse application and underlying infrastructure performance issues as part of performance engineering exercises, and derive gold-configuration parameters.

In addition, you will ensure appropriate governance of framework usage across multiple delivery streams, enhance the framework's capabilities to meet upcoming requirements, participate in and contribute to resiliency validation exercises, and prepare proper reports for stakeholders.

Or

As SVP, Site Reliability Engineer Lead, you will be responsible for leading and building a team of software/system engineers (including team recruitment, new talent training, system operation/maintenance/coordination and team culture building); developing a long-term technical plan with a clear implementation path and milestones; continuously ensuring the competitiveness of the team and technology; designing and implementing software platforms and monitoring frameworks for efficient, automated and intelligent event-driven / service-oriented architecture governance; and monitoring, troubleshooting and analysing application and underlying infrastructure performance issues as part of performance engineering exercises to derive gold-configuration parameters.

You are expected to set up the processes needed for efficient execution and advocate good engineering practices, including formulating process specifications and plans for access, configuration, disaster recovery and fault handling for all critical paths of the operating platform, and promoting the evolution of business architecture design by reducing customer anxiety.

You will work with the bank's infrastructure and software development teams to ensure service reliability and uptime appropriate to the needs of users, along with fast iterations of improvement. This includes working with the system development team to ensure system reliability throughout the entire life cycle, from system design to launch (cradle to grave); with solution architects and the application development team to ensure adherence to best practices in design and coding with respect to SRE and CRE principles; and with other business teams to improve cross-team coordination and ensure continuous improvement and optimization of business flows. You will assist the development team in tuning applications and configurations for critical systems so that they comply with the NFR before going live in production, and ensure that performance recommendations are part of the change request process.

You will also identify opportunities for continuous improvement across the full lifecycle of a large distributed system (i.e. design, development, configuration, testing, deployment, monitoring and operations). You will continuously evolve automated operation and maintenance facilities and platforms (automating manual tasks related to performance monitoring, alerting, analysis, reporting, capacity planning, etc. to improve application observability, resiliency and operational efficiency), ensure appropriate governance of framework usage across multiple delivery streams, and enhance the framework's capabilities to meet upcoming requirements.

You will drive thorough performance analysis of microservices code using single-user code profiling techniques, participate in and contribute to resiliency validation exercises and prepare proper reports for stakeholders, and define critical performance KPIs, set alert rules and roll out monitoring dashboards for Production with timely reporting to stakeholders.

To qualify, individuals must possess:

For AVP, Site Reliability Engineer

- Degree with min. 6 years of development experience (3 years hands-on experience in JAVA/J2EE, Spring Boot, JavaScript, SQL/PostgreSQL, and 2 years hands-on experience in any of the following technologies: Red Hat OpenShift/Kubernetes, Docker, Kafka, ELK, Redis, and DevOps tools such as Jenkins, Bitbucket, JIRA).

Must have:

- Development experience in JAVA/J2EE, Spring Boot, JavaScript, Microservices, etc

- Experience in assisting a bank in establishing reliability and performance

- Strong analytical and problem-solving skills with good interpersonal and communication skills.

- Positive attitude towards continuous learning.

Good to have:

- Hands-on experience in Chaos Engineering

- Hands-on experience in application monitoring with Grafana, Kibana, Prometheus, AppDynamics or Dynatrace

For SVP, Site Reliability Engineer Lead

- Min. 12 years of development experience (3-5 years hands-on experience in Python, JAVA/J2EE, Spring Boot, JavaScript, SQL/PostgreSQL writing maintainable, testable code, and 2 years hands-on experience in any of the following technologies: Red Hat OpenShift/Kubernetes, Docker, Kafka, ELK, Redis, and DevOps tools such as Jenkins, Bitbucket, JIRA)

- Practical experience in maintaining large-scale distributed systems architectures, hybrid cloud/on-premise environments, and event-driven or event stream systems (e.g. distributed storage, scheduling, big data computing systems) is preferred.

Must have:

- Development experience in JAVA/J2EE, Spring Boot, JavaScript, Microservices, etc

- Experience in project and team management.

- Systematic thinking about operations and maintenance, with the ability to balance tactical and strategic approaches. Familiar with Linux systems and networking.

- Familiarity with Helm / Terraform

- Positive attitude towards continuous learning, with a passion for software development and close attention to optimizing existing systems, building infrastructure and reducing or eliminating toil through automation.

Good to have:

- Hands-on experience in application monitoring with Grafana, Kibana, Prometheus, AppDynamics or Dynatrace

- Hands-on experience in Chaos Engineering

- R&D experience is a bonus

Please reach out to Vyon Ng at 69500385 or VyonN@charterhouse.com.sg for a confidential discussion.

Only successful candidates will be notified.

EA License no.: 16S8066 | Reg no.: R1110857
