Job Information
Netflix Distributed Systems Engineer (L5) - Compute Runtime in USA, United States
Netflix is one of the world’s leading entertainment services with 278 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
The Role
Netflix has been on the leading edge of cloud adoption since migrating to AWS 15 years ago and runs one of the largest cloud footprints. The Cloud Engineering organization exists to manage that massive scale, constantly innovating to increase the fleet-wide agility, efficiency, and reliability of the Netflix cloud infrastructure while solving scale problems that we are the first to hit. We build, operate, and maintain Compute, Network, and Storage services so that developers at Netflix can rely on foundational building blocks when entertaining hundreds of millions of customers globally.
About the Team
The Compute Runtime team is responsible for the data plane runtime environment for our Kubernetes-based orchestrator, which handles millions of container launches per day. We also provide the base OS and system services to hundreds of thousands of EC2 instances. We thrive on solving complex problems and love sharing our learnings with our fellow engineers. Here is a short sample: “Debugging a FUSE deadlock in the Linux kernel” (https://netflixtechblog.com/debugging-a-fuse-deadlock-in-the-linux-kernel-c75cd7989b6d), “Investigation of a Cross-regional Network Performance Issue” (https://netflixtechblog.medium.com/investigation-of-a-cross-regional-network-performance-issue-422d6218fdf1), and “Talking IPv4 to IPv6 without NAT” (https://www.youtube.com/watch?v=igJLKyP1lFk&t=9821s).
About the Role
We are seeking a highly skilled and accomplished engineer with demonstrable experience in evolving large-scale infrastructure systems and container runtimes on Linux. The ideal candidate will combine experience leading innovative solutions across functional teams with hands-on development experience in AWS/cloud, Linux user-space, networking, GPUs, and Kubernetes.
Key Responsibilities
Technical Delivery: Use your expertise to significantly advance the state of Netflix’s compute offerings for our single and multi-tenant partners.
Strategic Planning: Evolve our infrastructure to meet Netflix’s business objectives around Streaming, Live events, and Gaming.
Project Management: Lead your own team and cross-functional teams to deliver highly ambiguous, open-ended projects, enforcing each stage of the Software Development Lifecycle framework.
Operational Excellence: Contribute to the ever-improving operational standards of our large-scale global services by applying engineering best practices and providing first-class on-call support.
Performance: Identify and resolve performance bottlenecks in the Linux networking stack and resource isolation components to optimize network traffic and minimize noisy neighbor issues for containers.
System Integration: Integrate Linux OS changes with user-space applications and container runtime, ensuring seamless operation within the Netflix ecosystem.
Presentation: Deliver write-ups, blog posts, and presentations at conferences such as Linux Plumbers and eBPF Summit to represent our Netflix engineering teams.
You will excel in this role with…
4+ years of experience evolving Compute infrastructure for an organization and 8+ years of software engineering experience.
Technical expertise in:
Distributed systems at scale, preferably on AWS
Linux application development and related package managers
Go, Java, or C/C++
Containers & runtimes-as-a-service
Linux performance debugging
Basic networking concepts
Demonstrable experience delivering multiple strategic and ambiguous projects at scale.
Leading and influencing teams of 10+ peer engineers.
Excellent presentation, communication, and collaboration skills.
We are even more excited about…
Container Performance and Container Stack Contributions
Familiarity with ML/AI concepts
Knowledge of GPU architecture, CUDA, and workload optimizations
AMI Management
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity of thought and background builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
Job is open for no less than 7 days and will be removed when the position is filled.