
Introduction
Aditya Bhatia is a technology leader, engineer, and innovator whose career spans some of the most influential companies in the industry, including Splunk (Cisco), Apple, and Yahoo. With over 15 years of experience, Aditya has established himself as an expert in cloud infrastructure, AI/ML platforms, and distributed systems. His work has been instrumental in designing and scaling resilient, high-performance cloud architectures that power mission-critical applications. Aditya has a proven track record of solving complex technical challenges and delivering impactful solutions.
In this Q&A, Aditya shares insights into his journey, his passion for leveraging technology to unlock human potential, and his vision for the future of AI and automation.
Q: Can you share a bit about your background and how you got started in technology?
Aditya Bhatia: Sure!
I've always been fascinated by technology and its ability to solve complex problems. My journey started with a B.Tech in Computer Science from Vellore Institute of Technology, where I built a strong foundation in software engineering. Eager to dive deeper, I pursued an M.S. in Computer Science at New York University, focusing on distributed systems and machine learning.
My professional career began at Yahoo, where I worked on cloud services and automation, leading initiatives that improved CI/CD efficiency by 70%. This experience gave me a deep appreciation for scalable infrastructure and operational efficiency. From there, I joined Apple, where I contributed to the Siri TTS (Text-to-Speech) team, designing distributed machine learning frameworks that significantly improved voice model evaluation and training speeds.
Currently, as a technology leader for cloud and AI automation and infrastructure at Splunk (now a Cisco company), I lead a team focused on building distributed workflow orchestration systems on Kubernetes, ensuring high performance and reliability for thousands of customers. Over the years, I've been fortunate to drive multi-million-dollar cost-saving initiatives, mentor engineers, and contribute to AI-driven operational automation.
Beyond my corporate roles, I am actively engaged in the global technology community. I am a Senior Member of IEEE and an active participant in ACM (Association for Computing Machinery) and AAAI (Association for the Advancement of Artificial Intelligence). These affiliations allow me to stay at the forefront of AI, cloud computing, and automation trends, contribute to research discussions, and collaborate with experts worldwide.
In addition to my technical contributions, I have a deep passion for mentorship and knowledge sharing. I have served as a judge and mentor at hackathons like Cal Hacks, and I regularly write and speak on topics related to Kubernetes, AI infrastructure, and distributed computing. Looking back, my journey has been defined by a relentless drive to innovate, simplify complexity, and empower the next generation of engineers—and I look forward to continuing this mission in the years ahead.
Q: What drives your passion for AI, automation, and cloud infrastructure?
Aditya: I believe technology is a tool to unlock human potential. AI and automation are not just about improving efficiency; they're about enabling creativity, innovation, and growth.
For me, the passion for AI, automation, and cloud infrastructure comes from the power of technology to simplify complexity and drive innovation. I've always been drawn to solving challenging problems at scale, and these fields allow me to do just that.
AI: Unlocking Human Potential
AI isn't just about automating tasks; it's about enhancing human creativity and decision-making. My work at Apple on distributed machine learning frameworks for Siri TTS showed me firsthand how AI can transform user experiences. For example, I worked on distributed AI/ML voice training for multilingual applications, which had a tangible global impact because the resulting models became the voice of Siri. The ability to optimize large-scale AI models, improve training efficiency, and make AI systems more reliable and scalable is something that excites me.
Automation: Reducing Toil & Increasing Efficiency
I believe automation is key to reducing operational toil and allowing engineers to focus on innovation instead of repetitive tasks. At Splunk, I lead initiatives that automate cloud deployments, optimize Kubernetes workloads, and improve infrastructure resilience, ensuring that thousands of customers can operate seamlessly without worrying about system failures. Knowing that I can design solutions that reduce manual intervention by 80% or improve system reliability with self-healing mechanisms is incredibly rewarding.
Cloud Infrastructure: Building Scalable, Resilient Systems
Cloud infrastructure excites me because it's the backbone of modern digital transformation. Whether it's scaling AI workloads, optimizing GPU clusters for machine learning, or building distributed workflow orchestration systems, I love architecting solutions that handle massive amounts of data efficiently and securely. At Splunk, I've led projects that improved cloud performance by 100% for 5,000+ customers and saved millions in operational costs—all by designing more scalable and resilient cloud platforms.
At the core of it all, my passion is driven by impact—the ability to build systems that empower businesses, reduce complexity, and enable engineers to do their best work. Whether it's through mentoring, writing, or leading engineering teams, I strive to push the boundaries of what's possible in AI, automation, and cloud infrastructure.
Q: What are some of your most significant career achievements?
Aditya: One of my most significant career achievements has been leading the development of a distributed workflow orchestration system on Kubernetes at Splunk, which improved cloud performance by 100% for over 5,000 customers. This initiative enhanced scalability, resilience, and operational efficiency, ensuring seamless automation and self-healing mechanisms for mission-critical workloads. In addition to this, I spearheaded multi-million-dollar cost-saving initiatives, optimizing cloud infrastructure to reduce resource waste and improve efficiency, resulting in over $3M in annual savings for Splunk.
At Apple, I contributed to the Siri TTS (Text-to-Speech) team, where I designed a distributed machine learning framework that improved AI model evaluation efficiency by 90% and cut voice model training time by 50%. This work helped scale Apple's multilingual AI voice models, enabling faster and more efficient development cycles. Similarly, at Splunk, I led cross-functional automation initiatives that reduced customer delivery time by 80% and implemented FedRAMP IL2 automation, increasing security and compliance efficiency by 90%.
Beyond technical innovation, I take immense pride in my role as a mentor and leader, having helped multiple engineers earn promotions and grow into leadership roles. I also established best practices for hiring and onboarding, which reduced onboarding time by 80%, making it easier for new engineers to ramp up quickly. My commitment to thought leadership and knowledge sharing extends to writing technical articles on Kubernetes, distributed AI, and cloud resilience, as well as serving as a judge and mentor at hackathons like Cal Hacks. Each of these achievements reflects my passion for building scalable, resilient systems, automating complex processes, and empowering engineers to reach their full potential.
Q: How do you approach leadership in such a fast-paced, innovative field?
Aditya: I approach leadership in a fast-paced, innovative field by focusing on technical excellence, mentorship, and fostering a culture of collaboration and continuous learning. For me, leadership isn't just about driving results—it's about empowering engineers, creating an environment where innovation thrives, and ensuring that teams have the support they need to tackle complex challenges effectively.
In rapidly evolving domains like AI, cloud infrastructure, and automation, staying ahead requires constant learning and adaptability. I lead by example, staying hands-on with emerging technologies while encouraging my team to experiment, challenge assumptions, and find creative solutions. Whether it's building scalable cloud platforms, optimizing Kubernetes workloads, or automating AI-driven systems, I believe in giving engineers the autonomy and guidance to drive impactful innovation.
Mentorship is at the core of my leadership approach. I've had the opportunity to mentor and coach engineers who have gone on to earn promotions and leadership roles, and I take pride in helping others develop the skills to solve problems independently. I also place a strong emphasis on collaborative decision-making, ensuring that everyone on the team has a voice in shaping technical roadmaps, best practices, and long-term strategies.
Being a Senior Member of IEEE and an active participant in ACM and AAAI has also been instrumental in my growth as a leader. These memberships allow me to stay connected with cutting-edge research, exchange ideas with industry experts, and continuously learn from pioneers in the field. Engaging with these communities helps me bring fresh insights into my work, ensuring that my team remains ahead of the curve in adopting emerging technologies and best practices.
Ultimately, I see leadership as a force multiplier—it's not just about what I can accomplish as an individual but about elevating those around me, enabling teams to work more efficiently, and driving innovation that has a lasting impact.
Q: In your blog and conference contributions, you emphasize digital resilience. How can enterprises build a more resilient AI-driven infrastructure in a world increasingly vulnerable to system failures?
Aditya: As AI workloads scale rapidly across the industry, system failures are inevitable, so digital resilience is a key factor that can make or break a business. Enterprises investing in AI-driven infrastructure must ensure that their systems are fault-tolerant, scalable, and capable of recovering from failures gracefully. I've explored this topic extensively in my research paper, Fault-Tolerant Distributed ML Frameworks for GPU Clusters: A Comprehensive Review, as well as on my Medium blog and my website, where I discuss key strategies for making AI infrastructure more resilient to failures.
AI models aren't just computationally expensive; they are also fragile. A single GPU failure can cause hours of training time to be lost if there are no proper checkpointing mechanisms in place. In my research paper, I discuss at length how distributed training strategies allow AI systems to recover from node failures, memory leaks, and hardware crashes without restarting from scratch.
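To make the checkpointing idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method from the research paper: the toy "training" update, the JSON checkpoint format, and the names save_checkpoint, load_checkpoint, and train are all assumptions made for this example. It shows a loop that saves its state periodically and, after a simulated failure, resumes from the last saved step instead of from zero.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the target.

    The rename is atomic, so a crash mid-write cannot corrupt the
    previous checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "weight": 0.0}


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Toy training loop (hypothetical); `fail_at` simulates a hardware failure."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated GPU failure")
        state["weight"] += 0.5  # stand-in for a real gradient update
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

With a checkpoint every 2 steps and a failure injected at step 5, a second call to train picks up from step 4, so only one step of work is repeated. A real distributed job would also need to save optimizer state and coordinate checkpoints across workers, which this sketch leaves out.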
In my Medium blog, I outline how Kubernetes-based AI workloads face new challenges in multi-cluster, multi-cloud deployments. Applications built on deep learning models such as LLMs need high compute, resilient data pipelines, and reliable networks, but each of these dependencies also adds points of failure. To manage these risks, it is critical to invest in observability, tracing, and alerting so that failures are detected quickly and remediated through automation. For example, chaos testing of AI models, which intentionally introduces failures in staging environments, helps verify that the infrastructure is resilient before it reaches production.
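As a rough illustration of the chaos-testing idea, the Python sketch below injects failures into a stand-in model call and checks that a simple retry policy absorbs them. Everything here is hypothetical and chosen for the example: InjectedFault, with_chaos, call_with_retry, and predict are not taken from the blog or from any chaos-engineering tool, and a real staging test would inject faults at the infrastructure level (for instance, by terminating pods or nodes) rather than in-process.

```python
class InjectedFault(Exception):
    """Raised by the chaos wrapper to imitate a dependency failure."""


def with_chaos(func, failures):
    """Wrap `func` so that its first `failures` calls raise InjectedFault."""
    remaining = {"count": failures}

    def wrapper(*args, **kwargs):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)

    return wrapper


def call_with_retry(func, attempts, *args, **kwargs):
    """Call `func`, retrying on InjectedFault for up to `attempts` total tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func(*args, **kwargs)
        except InjectedFault as error:
            last_error = error
    raise last_error


def predict(x):
    """Stand-in for a model-serving call (hypothetical)."""
    return x * 2
```

For example, call_with_retry(with_chaos(predict, 2), 3, 21) survives two injected failures and returns the normal result on the third try, while three injected failures against the same three-attempt budget surface the error. That second case is the kind of gap a chaos test is meant to expose before production.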
Companies that prioritize AI resilience will be the ones that scale efficiently, reduce downtime, and build AI systems that succeed.
Q: What is your overarching mission when it comes to technology?
Aditya: My overarching mission in technology is to simplify complexity, drive innovation, and empower engineers to build scalable, resilient systems that have a lasting impact. I believe technology should not only solve problems but also enable creativity, reduce operational toil, and unlock new possibilities for businesses and individuals alike.
At the core of my work, I focus on building scalable AI, automation, and cloud infrastructure that enhances performance, reliability, and efficiency. Whether it's optimizing Kubernetes workloads, designing fault-tolerant distributed machine learning frameworks, or automating mission-critical cloud operations, my goal is to create intelligent, self-sustaining systems that drive efficiency at scale.
Beyond technical innovation, I am deeply passionate about mentorship and knowledge sharing. I see my role not just as a builder of technology but as a mentor and leader who helps engineers grow, push boundaries, and realize their full potential. Through writing, speaking engagements, and hands-on mentorship, I strive to inspire the next generation of engineers to think bigger and innovate boldly.
Ultimately, I want to leverage AI and automation to eliminate inefficiencies, enhance decision-making, and shape the future of cloud computing. Whether it's through reducing operational toil, improving infrastructure scalability, or fostering a culture of continuous learning, my mission is to create technology that is not just powerful but accessible, impactful, and transformative.
I often say, "Technology should simplify the complex, not complicate the simple."
Closing
Aditya Bhatia's journey from a curious student to a seasoned industry leader reflects his unwavering dedication to innovation, mentorship, and driving measurable impact through technology. With deep expertise in AI, automation, and scalable cloud infrastructure, he has played a pivotal role in advancing distributed systems, optimizing large-scale machine learning frameworks, and enhancing cloud resilience. His work not only influences cutting-edge technological advancements but also serves as an inspiration for the next generation of engineers and innovators, shaping the future of these rapidly evolving fields.
To learn more about Aditya's work and connect with him, visit his website or LinkedIn profile.
LinkedIn: www.linkedin.com/in/aditya-nyu
Website: https://adityabhatia.com/