Principal Machine Learning Engineer, Distributed vLLM Inference
Company: Red Hat
Location: Boston
Posted on: April 4, 2026
Job Description:
Job Summary

At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading developers and maintainers of the vLLM and llm-d projects, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments. As a Principal Machine Learning Engineer focused on distributed vLLM infrastructure in the llm-d project, you will collaborate with our team to tackle the most pressing challenges in scalable inference systems and Kubernetes-native deployments. Your work on distributed systems and cloud infrastructure will directly impact enterprise AI deployments. You would be joining the core team behind 2025's most popular open source project on GitHub! If you want to solve challenging technical problems in distributed systems and cloud-native infrastructure the open-source way, this is the role for you. Join us in shaping the future of AI!

What you will do
- Develop and maintain distributed inference infrastructure leveraging Kubernetes APIs, operators, and the Gateway API Inference Extension for scalable LLM deployments.
- Create system components in Go and/or Rust that integrate with the vLLM project and manage distributed inference workloads.
- Design and implement KV cache-aware routing and scoring algorithms to optimize memory utilization and request distribution in large-scale inference deployments.
- Enhance the resource utilization, fault tolerance, and stability of the inference stack.
- Contribute to the design, development, and testing of inference optimization algorithms.
- Actively participate in technical design discussions and propose innovative solutions to complex challenges.
- Provide timely and constructive code reviews.
- Mentor and guide fellow engineers, fostering a culture of continuous learning and innovation.

What you will bring

- Strong proficiency in Python and Go, plus at least one of Rust or C++.
- Experience with cloud-native Kubernetes service mesh technologies such as Istio, Cilium, Envoy (WASM filters), and CNI.
- A solid understanding of Layer 7 networking, HTTP/2, gRPC, and the fundamentals of API gateways and reverse proxies.
- Working knowledge of high-performance networking protocols and technologies, including UCX, RoCE, InfiniBand, and RDMA, is a plus.
- Excellent communication skills, capable of interacting effectively with both technical and non-technical team members.
- A bachelor's or master's degree in computer science, computer engineering, or a related field.

The following are considered a plus

- Experience with the Kubernetes ecosystem, including core concepts, custom APIs, operators, and the Gateway API Inference Extension for GenAI workloads.
- Experience with GPU performance benchmarking and profiling tools like NVIDIA Nsight, or with distributed tracing libraries and techniques like OpenTelemetry.
- A Ph.D. in an ML-related domain is a significant advantage.

The salary range for this position is
$189,600.00 - $312,730.00. Actual offer will be based on your
qualifications.

Pay Transparency

Red Hat determines compensation
based on several factors including but not limited to job location,
experience, applicable skills and training, external market value,
and internal pay equity. Annual salary is one component of Red
Hat’s compensation package. This position may also be eligible for
bonus, commission, and/or equity. For positions with Remote-US
locations, the actual salary range for the position may differ
based on location but will be commensurate with job duties and
relevant work experience.

About Red Hat

Red Hat is the world’s
leading provider of enterprise open source software solutions,
using a community-powered approach to deliver high-performing
Linux, cloud, container, and Kubernetes technologies. Spread across
40 countries, our associates work flexibly across work
environments, from in-office, to office-flex, to fully remote,
depending on the requirements of their role. Red Hatters are
encouraged to bring their best ideas, no matter their title or
tenure. We're a leader in open source because of our open and
inclusive environment. We hire creative, passionate people ready to
contribute their ideas, help solve complex problems, and make an
impact.

Benefits

- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- Health Savings Account - high-deductible medical plan
- Retirement 401(k) with employer match
- Paid time off and holidays
- Paid parental leave plans for all new parents
- Leave benefits including disability, paid family medical leave, and paid military leave
- Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!

Note: These benefits are only applicable to full-time, permanent associates at Red Hat located in the United States.
Inclusion at Red Hat

Red Hat’s culture is built on the open source
principles of transparency, collaboration, and inclusion, where the
best ideas can come from anywhere and anyone. When this is
realized, it empowers people from different backgrounds,
perspectives, and experiences to come together to share ideas,
challenge the status quo, and drive innovation. Our aspiration is
that everyone experiences this culture with equal opportunity and
access, and that all voices are not only heard but also celebrated.
We hope you will join our celebration, and we welcome and encourage
applicants from all the beautiful dimensions that compose our
global village.

Equal Opportunity Policy (EEO)

Red Hat is proud to
be an equal opportunity workplace and an affirmative action
employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender
identity, national origin, ancestry, citizenship, age, veteran
status, genetic information, physical or mental disability, medical
condition, marital status, or any other basis prohibited by law.
Red Hat does not seek or accept unsolicited resumes or CVs from
recruitment agencies. We are not responsible for, and will not pay,
any fees, commissions, or any other payment related to unsolicited
resumes or CVs except as required in a written contract between Red
Hat and the recruitment agency or party requesting payment of a
fee. Red Hat supports individuals with disabilities and provides
reasonable accommodations to job applicants. If you need assistance
completing our online job application, email
application-assistance@redhat.com . General inquiries, such as
those regarding the status of a job application, will not receive a
reply.