Insight Global Site Reliability Engineer in Dearborn, Michigan, United States

Job Description

Insight Global is looking for an SRE to join a large automotive client. The Bedrock Customer Success and SRE team is responsible for ensuring that customers derive maximum value from the platform. This team acts as the primary point of contact for customers, helping them onboard, adopt, and optimize their use of the platform's offerings. The team also works to ensure the stability of the platform that hosts the cloud applications powering connected vehicle experiences.

* Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization

* Proactively identify stability risks & work with engineering leadership to establish appropriate mitigation plans

* Architect, design, and develop automation that reduces toil and improves the recoverability, availability, latency, and scalability of supported applications, with an understanding of MTTD (Mean Time to Detection) and MTTR (Mean Time to Resolution)

* Maintain a knowledge repository of standard operating procedures, release checklists, and runbooks for incident recovery

* Run a production environment by monitoring availability and taking a holistic view of system health.

* Develop, improve, and operate the deployment and orchestration of a complex distributed system

* Improve reliability, quality, and time-to-market of our suite of software solutions

* Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve

* Provide primary operational and engineering support for multiple large, distributed software applications

* Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation

* Collaborate with development teams to design, build, and operate scalable and resilient software systems

* Automate build, deployment, monitoring, and incident response processes

* Perform root cause analysis of production incidents and implement preventive measures

* Conduct performance analysis and optimization of the system

* Ensure compliance with security and regulatory standards

* Implement and maintain disaster recovery processes

* Provide technical guidance and mentorship to other team members

* Participate in an on-call rotation for incident response and support

This role can be worked remotely from Dearborn, Michigan, or Palo Alto, California.

Compensation:

$60/hour to $72/hour

Exact compensation may vary based on several factors, including skills, experience, and education.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/

Skills and Requirements

* Strong background in software development and systems administration

* 5-6 years of experience maintaining and developing multi-tier applications with Golang, Java, SQL/NoSQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes (K8s)

* Understanding of gRPC and RESTful APIs and of microservices platforms

* 4-5 years of experience with Application Performance Monitoring (APM) and other monitoring tools, such as Grafana Cloud, Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog, and PagerDuty

* Strong experience working with product and development teams to establish error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and using those budgets effectively to ensure maximum domain availability and uptime

* Experience solving complex architecture, design, and business problems by simplifying, optimizing, and removing bottlenecks

* Bachelor's degree in Computer Science or equivalent experience
