Job Information

Oracle Principal Diagnostics Engineer – Cloud Platforms in Santa Clara, California

Job Description

Oracle Hardware Platform Development Engineering is seeking a highly driven Principal Diagnostics Engineer to join our team. In this role, you will be central to the testing, debugging, and optimization of Oracle’s rapidly growing cloud server platforms. Your expertise in diagnostic software development and system-level debugging will help drive the reliability and efficiency of Oracle’s next-generation cloud infrastructure.

Career Level - IC4

Responsibilities

As a Principal Diagnostics Engineer, your responsibilities will include, but are not limited to:

  • Designing and implementing state-of-the-art diagnostic software for our upcoming generation of cloud servers and systems, which is crucial for testing server subsystems and swiftly identifying potential issues.

  • Evaluating system architectures and proposing diagnostic implementation strategies that ensure robust testing, scalability, and a user-friendly environment.

  • Participating in platform definition, analysis, and bring-up, ensuring seamless integration of diagnostics into the development lifecycle.

  • Collaborating with in-house engineering teams, including system architects, firmware developers, and validation engineers, to define and refine diagnostics features.

  • Integrating new diagnostic functionality, fixing bugs, debugging issues, and delivering solutions to mitigate problems.

  • Supporting development program managers with technical assessments and planning, offering expert insights into diagnostic requirements.

  • Working closely with third-party component suppliers, partners, and internal teams, including hardware/software development, quality assurance, cloud orchestration, and security compliance, to enhance system reliability.

  • Participating in hardware platform security evaluations, ensuring diagnostic coverage for potential vulnerabilities.

  • Providing guidance to internal Oracle teams on system diagnostics, failure analysis, and monitoring strategies, supporting large-scale cloud deployments.

  • Assisting Oracle Cloud and Support teams in root-cause analysis of hardware/software failures, leveraging lab replication, remote debugging, and in-depth telemetry analysis.

  • Partnering with Oracle manufacturing teams to ensure that hardware is secure and rigorously validated, meeting deployment standards for Cloud customers.

Required Skills:

  • Proven expertise in C programming and embedded system software development

  • Skilled in developing software tools to test CPU, GPU, memory, storage, and networking components, diagnose hardware failures, detect overheating, monitor abnormal power consumption, and collect telemetry data.

  • Good understanding of CPU/GPU architectures and chipset programming, plus expertise in modern server architectures, including x86 and ARM-based platforms, and knowledge of multi-vendor system integrations.

  • Familiar with NVIDIA and AMD GPU architectures

  • Strong understanding of, and experience running, firmware and system diagnostics tools using BMC firmware, UEFI/BIOS, and Linux tools. Skilled in scripting to customize tests.

  • Experience with early-stage platform bring-up, including prototype GPU, CPU, and memory subsystem debugging, as well as firmware and platform-level diagnostics.

  • Strong communication skills, capable of clearly articulating technical challenges across engineering teams and succinctly explaining issues and solutions to executive leadership.

  • Creative and analytical mindset, with the ability to isolate issues to their root cause and devise effective, timely, and scalable solutions.

  • Three to five years’ experience in embedded systems/diagnostics software development for server platforms

  • BS/MS

Preferred Skills:

  • Experience with and understanding of the latest high-speed buses and interconnects used in modern compute and AI platforms. Familiarity with their startup connectivity and operational robustness.

  • Demonstrated knowledge of low-level system component interfaces, including, but not limited to:

    • High-speed: PCIe, CXL

    • Low-speed: SPI, eSPI, I2C (incl. SMBus, PMBus), LPC, etc.

  • Demonstrated knowledge of memory technologies such as DRAM, DDR, EEPROM, and cache

  • Demonstrated knowledge of storage types: NVMe, SATA, SAS; HDD vs. SSD

  • Experience installing the NVIDIA toolkit and hands-on experience using CUDA

  • Familiar with FPGA intercommunication, telemetry sensors, VRDs

  • Experience programming in multi-core environments

  • Multiprocessing and multithreaded programming

  • Experience with memory test and storage test algorithms

  • Hands-on experience in Python programming and Bash scripting

  • Familiarity with OpenBMC and Open Compute Project concepts

Disclaimer:

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $97,500 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  1. Medical, dental, and vision insurance, including expert medical opinion

  2. Short-term disability and long-term disability

  3. Life insurance and AD&D

  4. Supplemental life insurance (Employee/Spouse/Child)

  5. Health care and dependent care Flexible Spending Accounts

  6. Pre-tax commuter and parking benefits

  7. 401(k) Savings and Investment Plan with company match

  8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  9. 11 paid holidays

  10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  11. Paid parental leave

  12. Adoption assistance

  13. Employee Stock Purchase Plan

  14. Financial planning and group legal

  15. Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

About Us

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.

When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.

We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.

Disclaimer:

Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

* Which includes being a United States Affirmative Action Employer
