AVP, Site Reliability Lead, Branch & Self-serviced Banking, Consumer Banking Group Technology, Technology & Operations
DBS Bank Limited
Location: Singapore, Singapore
Type: Full Time
Internal Number: 18510512
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
This position is for a Site Reliability Engineer responsible for developing and implementing the processes needed to improve application and system reliability, along with operational support. The position comprises an approximately equal focus on software development and operations. The role will also develop software to automate operational processes and contribute code to the shared engineering backlog.
Engage with both the development and support teams throughout the life cycle to help build for reliability. Develop software to automate manual operational work. The workload for the position is multifaceted and includes:
Collaborate closely with development and application support teams throughout the SDLC to maintain and improve the service against established Service Level Objectives by applying software engineering principles.
Take responsibility for the availability, performance, change management, monitoring, and capacity management of the services supported.
Manage and troubleshoot business-critical incidents, conduct post-mortems, and ensure permanent closure of the incidents.
Analyse patterns of production incidents, develop permanent remediation plans, and implement automation through software engineering to prevent future incidents.
Manage the split of effort between manual operational work and engineering work.
Work with partner organizations and vendors to provide solutions to current business issues.
Participate in a shift model covering 24x7x365 support.
Bachelor of Science degree or equivalent experience.
Proven experience with cloud platforms (AWS, PCF) is preferred. AWS certification is preferred.
3+ years working with configuration management and CI/CD tools (SonarQube, Fortify, NexusIQ).
5+ years of scripting/software experience (Bash, Python, Java and Perl).
Familiarity and working experience with DevOps testing and release techniques (e.g. A/B testing, blue/green deployments and canary releases).
Working knowledge of DevOps tools/technologies (Docker, Kubernetes, OpenShift and Fabric8) is preferred.
Basic knowledge of database technologies (e.g. MariaDB/MySQL).
Strong understanding of Linux security best practices.
Extensive experience in application/system/network performance and availability monitoring (e.g. Grafana, Vizceral, Tivoli, Splunk).
Solid knowledge of Apache/WebLogic and MQ.
Working knowledge of cloud engineering for private and public cloud.
Proven technical leadership experience, including the ability to quickly understand an issue, troubleshoot it efficiently to a detailed level and direct swift resolution.
Ability to adapt to a dynamic work environment.
Strong ability to take ownership of issues and drive resolution across teams.
Assertive personality with a drive to improve the environment.
Effective written and verbal communication skills.
Ability to develop strong client relationships and partner with technology engineering teams.