Citigroup Site Reliability Engineer – Data & Platform in Mississauga, Ontario
Now is an extremely exciting time to join a newly formed group within Citi. The Institutional Clients Group - Engineering and Architecture Practice (EAP) is responsible for defining and building core architecture and technology strategy for the ICG.
This position is in the Kafka-as-a-Service team, which sits under Common Platform Engineering (CPE). CPE is a department within the EAP group whose mission is to provide engineering for common platform capabilities in ICG, to engineer solutions that codify the firm's data strategy into frameworks & tools, and to define 'Common Product' standards that ensure efficient adoption of common components.
We are looking for an SRE with a software engineering background who is passionate about running large-scale, multi-tenant distributed data systems for customers who expect a very high level of availability. In this role, you will be responsible for the availability, performance, monitoring, emergency response, and capacity planning of the data systems.
If you love the hum of big data systems, thinking about how to make them run as smoothly as possible, and want to have a big influence on the architecture plus operational design points of the systems, then you will fit right in. Your solutions will be leveraged by tens of thousands of developers across Citi supporting applications used by hundreds of thousands of internal and client users.
What you’ll be doing:
Design & build observability solutions for distributed systems
Contribute to the continuous automation of toil, and drive & evangelize the four key DORA metrics
Establish Service Level Objectives for core services, monitor their Service Level Indicators, and implement error-budget based alerting
Help the operations team by building solutions that allow them to identify and resolve health issues in the data systems as quickly as possible
Automate the deployment of infrastructure and applications for data systems such as Kafka
Support the rapid growth of the platform by expanding its deployment strategy to OpenShift and managed cloud Kubernetes environments (AWS EKS, Google GKE)
Design and implement service improvements for performance & security, relentlessly improve reliability and facilitate effective incident response, mitigation & resolution
Write and review technical documents, including design, requirements, and process documentation
Advocate for a culture of platform automation, with an obsession for an everything-as-code approach
What we are looking for:
4+ years’ experience in Site Reliability Engineering, creating scalable and highly reliable systems
Strong fundamentals in distributed systems design and operation with experience building automation to operate large-scale data systems
Experience designing & implementing observability solutions for data systems to enable a holistic view of system health
Strong understanding of modern site reliability engineering practices and ability to apply them to improve the reliability of systems
Experience creating, deploying, and managing the lifecycle of containerised applications on Kubernetes
Experience in an agile development environment with modern programming languages such as any of the following: Python, Golang, Java, Kotlin, Scala or similar
What gives you an edge:
Experience working with distributed systems and stream processing solutions; hands-on experience with Apache Kafka is highly desirable
Strong grasp of DevSecOps practices and ability to contribute to improving systems reliability, quality, and time-to-market
Experience designing and implementing multiple automated deployment pipelines at both the application and infrastructure levels. Ideally, you would have experience with Ansible and Terraform on multiple projects
Experience working with the HashiCorp tool set, specifically Vault for secrets management and Consul for service discovery
Experience deploying applications and infrastructure into the cloud
Citi Canada is an equal opportunity employer. Accordingly, we will make accommodations to respond to the needs of people with disabilities (including, without limitation, physical and mental health disabilities) during the recruitment process and otherwise in accordance with law. Individuals who view themselves as Aboriginals, members of visible minority or racialized communities, and people with disabilities are encouraged to apply.
Citi is an equal opportunity and affirmative action employer.
Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm).
View the "EEO is the Law" poster (https://www.dol.gov/sites/dolgov/files/ofccp/regs/compliance/posters/pdf/eeopost.pdf). View the EEO is the Law Supplement (https://www.dol.gov/sites/dolgov/files/ofccp/regs/compliance/posters/pdf/OFCCP_EEO_Supplement_Final_JRF_QA_508c.pdf).
View the EEO Policy Statement (http://citi.com/citi/diversity/assets/pdf/eeo_aa_policy.pdf) .
View the Pay Transparency Posting (https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf)
Citi is an equal opportunity and affirmative action employer. Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.