Job Information

Microsoft Corporation System Engineer 2 in Bangalore, India

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 of Microsoft’s online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally, with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

Cloud AI & Advanced Systems Engineering (CAASE) is responsible for expanding Microsoft’s Cloud Infrastructure to enable Microsoft’s mission to empower every person and every organization on the planet to achieve more. The CAASE team is instrumental in delivering world-class, innovative hardware at scale to ensure a high-quality experience for the millions of Microsoft Azure customers. This is an excellent opportunity to work on hyperscale challenges and cutting-edge technologies. #CAASE #AZURE #Cloud

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the CAASE team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability of Microsoft cloud hardware. We are looking for seasoned engineers with a dedicated passion for customer-focused solutions, and with the insight and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.

We are looking for a System Engineer to join the team.

#azurehwjobs #CAASE #SCHIE #AI

Responsibilities

  • Collaborate with architecture, silicon engineering, firmware, hardware design, hardware validation, OS (operating systems), manufacturing, and customer teams to build state-of-the-art AI, compute, storage, networking, and accelerator hardware solutions.

  • Plan and lead system debug activities. Work with cross-organization teams to define pre-silicon platform bring-up, test, and validation execution. Own and drive platform bring-up with the SoC, including test and validation plans and their execution.

  • Lead cross-functional and cross-organization work groups to deliver innovative solutions and solve complex problems.

  • Analyze new interfaces and subsystems to develop integration plans, analyze power efficiency, debug integration issues, and provide recommendations. 

  • Define system behavior and concept of operations for the platform to ensure compatibility with Microsoft Azure datacenter software, serviceability, telemetry, and customer expectations. 

  • Perform NUDD (new, unique, different, and difficult) technology and feature analysis and provide risk assessment and mitigations. 

  • Drive technical requirements and ensure the solution is flexible and scalable across the full (HW/FW/SW) stack. 

  • Enable platform- and solution-level discussions, influence the architecture of the product, and deliver on product goals across quality, reliability, and performance.

  • Collaborate with internal, external, and open-source partners to onboard innovative technologies in a seamless manner. 

Qualifications

Required Qualifications

  • B.Tech/MS in Electrical/Computer/Electronics Engineering or related degree

  • 7+ years of relevant experience in server systems/platforms design and/or validation for enterprise or cloud market segments, in compute and/or AI systems/platforms design and development.

  • 5+ years of hands-on experience in cloud-grade front-end and back-end network architecture and implementation.

  • Experience in post-silicon validation, platform bring-up, system integration, functional validation, and server platform validation.

  • Good grasp of Ethernet: physical, data link, and network layers; congestion control; QoS; and traffic classes.

  • Understanding of Clos networks, routing protocols (BGP, ECMP), lossless networks, and congestion handling (DCQCN, PFC, CBFC).

  • Understanding of NPU architecture and its relation to network performance, such as bandwidth, RTT latencies, and packet size diversity.

  • Understanding of networking hardware: QSFP-DD cables, DACs, AECs, cable backplanes, NICs, PHYs, and switches.

  • Ability to define validation test cases to qualify the end-to-end network across functionality, performance, and scale testing.

  • Ability to troubleshoot network issues at multiple layers: physical, data link, network, and protocol.

  • Great to have: understanding of AI networks, network collectives, traffic profiles in AI networks, and Ultra Ethernet.

  • Experience in platform-level test architecture and use of debug tools such as Lauterbach, Arium, ARM JTAG tools, debug emulators, or equivalent.

  • Experience debugging complex system-level issues, with the ability to root-cause and identify potential fixes down to board hardware, signal integrity, CPLD/FPGA, thermal, firmware, and OS components, is required.

  • Programming Skills: Perl / Python / Shell Scripting.

  • Excellent communication skills (verbal and written) to interface with cross-functional technical teams within and/or outside the organization.

Preferred Qualifications

  • Experience in evaluating off-the-shelf OEM hardware designs, HW/FW/OS interactions, platform config trade-offs, performance tuning, and optimizations is required.

  • Knowledge of high-volume silicon (SoCs, GPUs, or FPGAs), compute, storage, manufacturing, and deployment.  

  • In-depth experience with operating systems (Windows and/or Linux), system firmware (BIOS, BMC), and system security (hardware and software).  

  • Functional knowledge of secure boot, attestation, and FW update and recovery on server platform architectures.

  • Advanced troubleshooting and debugging skills. Familiarity with networking, power, rack device management, and remote access environments.

  • Experience debugging complex system-level issues, with the ability to root-cause and identify potential fixes down to board hardware, signal integrity, CPLD, thermal, firmware, and OS components, is required.

  • Understanding of AI/ML workloads and how to validate software stacks, such as TensorFlow.

  • Strong verbal and written communication and presentation skills. 

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).
