Tomas Montvilas is the Chief Commercial Officer at Oxylabs, a market-leading web intelligence acquisition platform. He has held various leadership roles in product and business development, sales, and marketing for over ten years. Tomas' extensive knowledge of the big data market and experience overseeing AI-based solutions development helps Oxylabs' clients leverage public web data as they embark on the inevitable process of digital transformation.
Oxylabs has just launched its new AI-driven solution, OxyCopilot. Do you consider AI to be a new strategic direction for your company?
For years, Oxylabs' vision was to lead the web data acquisition industry through a strong ethical stance and focus on constant innovation. The market is extremely dynamic and volatile today, and for us, staying ahead of the competition means properly investing in R&D, failing fast, learning faster, and constantly looking for ways to improve our product offerings. Currently, AI is the main technological disruptor in the global digital economy, so we will definitely continue experimenting with it.
OxyCopilot is the culmination of years of work in the fields of data acquisition, AI, and machine learning (ML). We established our AI/ML board of advisers back in 2020, gathering top-level experts with experience in organizations such as NASA, Google, and MIT. The board aims to help our employees put complex technological ideas into action. In 2021, we introduced our first ML model. In the following years, we secured around 15 patents for AI and ML–related technologies.
AI and web scraping have a mutually beneficial relationship—most AI systems today rely on large amounts of web data for model training, and advancements in web scraping technology have made it possible to collect such data. On the other hand, web scraping involves a lot of routine tasks that can be automated with the help of AI and ML, thus increasing speed and scalability. The breakthrough in large language models (LLMs) opened up new possibilities for many tech companies, including Oxylabs.
What is the idea behind OxyCopilot—what problem have you been trying to solve?
Most websites today have dynamic layouts, which poses significant difficulties when collecting data. Moreover, the data itself (for example, product information on e-commerce sites) might be presented in different ways on the same website. As a result, developers might spend up to 40 hours per week building parsers and fixing broken parsing pipelines. It is a mundane and costly task, so we started looking for ways to automate it a few years ago.
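To make that maintenance burden concrete, here is a minimal, hypothetical sketch (Python standard library only) of the kind of hand-written parser that silently breaks the moment a site's layout changes; the class names and markup are illustrative, not any client's real pipeline:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Extracts text from <span class="price"> — and only that exact markup."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # Hard-coded assumption: the price always lives in <span class="price">.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price and self.price is None:
            self.price = data.strip()
            self.in_price = False

def extract_price(html: str):
    parser = PriceParser()
    parser.feed(html)
    return parser.price

# Works on the layout the parser was written for...
old_layout = '<div><span class="price">$19.99</span></div>'
# ...but returns nothing after a routine redesign renames the class.
new_layout = '<div><span class="product-cost">$19.99</span></div>'
```

Multiply this fragility across hundreds of targets and layout variants, and the "up to 40 hours per week" figure becomes easy to believe.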
At that time, our ML engineers and data scientists worked on an AI-based adaptive parser. It was already clear that AI offers untapped potential; however, we struggled with common AI-model-related challenges, such as manual data labeling. The breakthrough happened when LLMs, such as GPT-4, became widely accessible. In OxyCopilot, the LLM covers the semantic understanding part, which was previously a hard nut to crack for our developers.
OxyCopilot is easy to use, even for junior developers, since they can enter natural language prompts and instantly get code examples for API requests and data parsers—think of it as ChatGPT for scraping professionals. OxyCopilot allows them to use our web data collection platform, Web Scraper API, without spending hours reading documentation and manually debugging parsers. The model can recognize very complex parsing patterns and even extract nested information.
So, the first problem we are solving here is efficiency—we help our clients save costly development hours spent on structured web data collection and shift their focus to other data management tasks. A developer survey we conducted with Censuswide in August 2024 in the UK and the US showed that building and maintaining data parsers is the second biggest challenge that haunts businesses collecting public web data—it was mentioned by 49% of respondents.
The second problem OxyCopilot is solving is infrastructure-related costs—57% of the survey respondents pointed out that maintaining the necessary infrastructure is the biggest business cost associated with data parsing. OxyCopilot works as a part of Web Scraper API, an auto-scalable platform that handles all web data collection steps; thus, our clients no longer need to maintain any infrastructure on their side. We simply deliver the data they need to a cloud platform they prefer.
AI is a hype word today, and many tech companies, including various web scraping services, use it to enhance their positioning. How do you see yourself in terms of competition; what does Oxylabs bring to the table that others do not?
Most web scraping providers offer small products or specific features they brand as AI-driven. We offer an all-in-one web scraping platform—the Web Scraper API—that covers everything from ML-driven proxy management and web unblocking to OxyCopilot-powered data parsing. It acts as a gateway to any website, from e-commerce marketplaces to travel sites or any other target from which a client collects public web data.
Some providers offer similar products to OxyCopilot; however, they are built in a way that requires calling LLMs for every request, making it a slow and costly process. Our technology uses a different logic, which makes it lean and cost-effective. So, what we bring to the table is, again, an innovation-driven approach, which is our way of standing out in the industry—by putting clients' needs first and searching for unorthodox solutions.
Since Oxylabs offers an entire data collection infrastructure, including proxies, we put a lot of emphasis on ethical proxy sourcing. We do it because we believe in it; still, we also noticed it is a strong competitive advantage—over the years, ethical practices have become increasingly important to our clients, especially enterprise-grade companies. Also, we have to compete for top talent on the market, and a strong ethical stance is a critical asset in creating a motivating work environment.
Last year, Fast Company recognized Oxylabs as one of the best places for innovators. How do you motivate your team to come up with innovative ideas?
First of all, we put a lot of focus on patent policy—patent applications make up the lion's share of our R&D investments. In 2024, our portfolio surpassed 100 patents globally. To incentivize employees, we offer Inventor's bonuses: anyone who comes up with a top-notch idea can be rewarded.
Despite the company's size, we don't have a corporate mindset, and that's another reason why innovative ideas can flourish. We encourage informal knowledge sharing; for example, our tech teams have quarterly "innovation mining" meetings where they catch up on patenting and innovation-related matters.
Giving back to the community is an aspect that some professionals, especially senior ones, also find motivating. Just recently, we released an LLM-based tool that will be available on an open-source basis. It allows HTML to be parsed automatically simply by describing a Pydantic model of the desired output.
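To illustrate the idea of schema-driven parsing, here is a minimal, hypothetical sketch. A standard-library dataclass stands in for the Pydantic model, and `parse_html` is an illustrative helper that fakes the LLM-backed extraction step, not the released tool's actual interface:

```python
from dataclasses import dataclass, fields
import re

# The user only describes the desired output shape; no extraction logic.
@dataclass
class Product:
    title: str
    price: str

def parse_html(html: str, model):
    """Stand-in for the LLM-backed extractor: the real tool maps each schema
    field to matching page content. A trivial attribute lookup keeps this
    sketch runnable."""
    values = {}
    for f in fields(model):
        match = re.search(rf'data-{f.name}="([^"]*)"', html)
        values[f.name] = match.group(1) if match else ""
    return model(**values)

page = '<div data-title="Mug" data-price="$4.50"></div>'
item = parse_html(page, Product)
```

The appeal of this pattern is that the schema, not the page structure, becomes the contract: when a site's layout changes, the user's code does not.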
Of course, none of these motivators would probably bring the desired results alone—to innovate, organizations need the right people. We have a large team of data scientists, ML engineers, experienced web scraping experts, and developers specializing in different languages and tasks. The most direct path to innovation is through people with the right level of expertise and a creative mindset.
Lastly, any interesting plans for the future?
In general, we will keep working on AI and ML-driven web unblocking. The fight against AI-powered anti-scraping measures is endless. It poses many challenges for businesses collecting publicly available web data, including cybersecurity companies that have to outsmart cybercriminals who use anti-scraping solutions to block threat intelligence efforts.