Experience Inc. Jobs

Job Information

The Hartford Reliability Engineering Coach (Remote) in Chicago, Illinois

Staff Reliability Engineer - IE07KE

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Central Reliability and Automation team is looking for a driven and highly motivated Staff Reliability Engineer Coach to join the team. In this role you will be responsible for designing and maintaining a given IT solution for the CI/CD pipeline, the observability suite (monitoring/alerting/logging tools and processes) and the automation suite consumed by REs and Software Engineers. The Site Reliability Engineer will work with the consumers and stakeholders of the solution to define functional and non-functional requirements for the service. Leveraging Open Source or Commercial Off-the-Shelf (COTS) products, they will design, build and maintain the solution to meet current and future demand. They will apply key SRE tenets across the life-cycle of the solution.

A prerequisite for the role is a “build-to-manage”, problem-solving and innovative mindset applied to the design, build, test, deployment, change and maintenance of services, drawing on deep engineering expertise. Key measures of success include service stability, effective delivery and environment instrumentation, deployment quality, technical debt reduction, asset resiliency, risk/security compliance, cost efficiency, proactive and preventative maintenance mechanisms, and top-quartile operating norms. The Senior Site Reliability Engineer will actively contribute to sustained advancement of the SRE practice within and beyond a given area of responsibility.

Responsibilities:

  • Influence and design architecture, infrastructure, standards and methods for large-scale cloud systems

  • Engage in and improve the software development life-cycle through CI/CD; improve the build-to-deployment process to establish greater reliability and a sustainable release process; oversee release gating; establish deployment metrics (DORA)

  • Monitor and develop SLOs and SLIs across the customer user journey; advise on SLAs; establish error budgets

  • Build observability and custom monitoring tool integrations; introduce telemetry to support SLOs

  • Automate system scalability and continually work to improve system resiliency, performance and efficiency; recommend design changes for improved reliability

  • Deploy software using high-availability practices such as rolling, blue-green or canary releases

  • Provide mentorship to reliability engineering squads under a consistent framework for the Development, Testing and Alerting processes

  • Practice sustainable incident response through blameless RCA and postmortems

  • Advise on performance testing and capacity planning

  • Communicate proactively with colleagues and formally present work product outcomes and risk analysis to product team and management.

  • Follow the Agile/Scrum working methodologies

  • Establish dashboarding for monitoring capabilities and metrics

Qualifications:

  • 7+ years of experience in related field

  • 3+ years of experience in languages such as Python, Ruby, Bash, Perl

  • BS degree in Engineering, Computer Science, or equivalent practical experience

  • Experience in monitoring infrastructure and application service level objectives to ensure functional and performance objectives are met

  • Experience in implementing service dashboards for monitoring, objectives, and metrics

  • Experience developing and/or administering software in AWS cloud infrastructure

  • System administration skills, including automation and orchestration of environments using Terraform or CloudFormation and configuration management

  • Demonstrable cross-functional knowledge with systems, storage, networking, security and databases

  • Experience with container orchestration tools and container management (Docker, Kubernetes, etc.)

  • Proficiency with continuous integration and continuous delivery tooling and practices

  • Strong analytical and troubleshooting skills; Experience with runbooks

Preferred Qualifications:

  • Expertise designing, analyzing and troubleshooting large-scale distributed systems.

  • Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive

  • Experience in implementing Infrastructure as code

  • Experience building software and maintaining systems in a highly secure, regulated or compliant industry

  • Experience and passion for working within a DevSecOps team culture

Additional Details:

  • Must be authorized to work in the US without company sponsorship.

  • This role can have a Hybrid or Remote work arrangement. Candidates who live near one of our office locations are expected to work in an office 3 days a week (Tuesday through Thursday). Candidates who do not live near an office will have a remote work arrangement, with the expectation of coming into an office as business needs arise.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$126,160 - $189,240

Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age

About Us (https://www.thehartford.com/about-us) | Culture & Employee Insights (https://www.thehartford.com/careers/employee-stories) | Diversity, Equity and Inclusion (https://www.thehartford.com/about-us/corporate-diversity) | Benefits (https://www.thehartford.com/careers/benefits)

Human achievement is at the heart of what we do.

We believe that with the right encouragement and support, people are capable of achieving amazing things.

We put our belief into action by ensuring individuals and businesses are well protected, and by going even further – making an impact in ways that go beyond an insurance policy.

Nearly 19,000 employees use their unique talents in careers that span a variety of disciplines – from developing the latest technology to creating and promoting our products to evaluating future financial risks.

We’re also committed to programs that drive education and support volunteerism, which put human beings first. We do it because it’s the right thing to do, and because when our customers, communities and employees succeed, we all do.
