Understanding Scalability When Collecting Public Web Data for Business Purposes

Interview with Žydrūnas Tamašauskas, Chief Technology Officer at Oxylabs

As the chief technology officer (CTO) at Oxylabs, Žydrūnas Tamašauskas is responsible for a vast infrastructure that makes large-scale public web data collection possible. Balancing operational smoothness and quality with cost-effectiveness is a major part of this responsibility. We spoke with Žydrūnas about optimizing data collection to get the most value from available resources.

Scalability emerges as the key component of this optimization. According to Žydrūnas, it is crucial that businesses understand the scale of the data collection task at hand to produce desired results and manage costs.

Žydrūnas will expand on these topics in his presentation "Ensuring Scalability in Data Collection: Key Components, Challenges, and Advancements" at OxyCon 2024. The flagship conference of the public web data gathering industry will be held online on September 25th this year. Registration for the free conference is now open. The conference promises to inform everyone, from AI developers to cybersecurity companies, about the effective extraction of web data. Meanwhile, Žydrūnas' presentation seems perfect for tech leaders looking to innovate while staying on a budget or cutting costs.

Žydrūnas, please tell us a bit about your career path and how it led to being in charge of such a specific technology as web scraping solutions.

I have always been interested in servers, networks, and the Internet. My first work experience and degree relate to network administration. Back then, hardly anyone knew about web scraping or paid much attention to proxies.

I had worked for one company for quite a long time when I first encountered big data and became interested in it. Later, working with a few companies that deal with big data expanded my field knowledge and gave me a deeper understanding of client needs and how to answer them with tech solutions. That allowed me to move into project management, then to head product development, and finally led to an invitation from Oxylabs, where I am responsible for the entire tech infrastructure and its cost management.

How is overseeing proxy and web data collection technology different from your previous experience?

It is definitely more challenging. The scale of infrastructure and operations was never so grand in my previous experience. The challenges to data gathering on such a scale are complex and thus require complex solutions.

Scale and scalability are the focus of your upcoming presentation at OxyCon. What makes this topic important?

Many clients of data collection solutions providers have little to no idea about the scale of operations their use case requires and the difference that it makes. In fact, by scaling infrastructure or the volume of data one way or another, you can save a lot of money. The fundamental thing is to understand the balance between the two. You can do a lot with few resources or a little with a lot.

We ask our clients about what they want to achieve. To create value with the collected data, they need to manage the scale on their side and figure out how it will be processed. Some of their initial goals may be unfeasible, given the scale. Thus, we advise our clients on choosing the right scale to maximize the value they get from the data. To sum up, scaling is crucial for budget management and future planning, and OxyCon will allow this message to reach a broader audience.
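As an editorial illustration of why knowing the scale matters (not part of the interview, and not an Oxylabs tool), the minimal sketch below turns a daily page target into rough capacity figures. The function name, parameters, and sample numbers are all hypothetical, and the estimate assumes a steady request rate spread evenly over 24 hours.

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_scale(pages_per_day, avg_seconds_per_request, avg_kb_per_page):
    """Back-of-the-envelope capacity estimate.

    All inputs are hypothetical planning figures supplied by the reader.
    """
    # Average request rate if the work is spread evenly over the day.
    requests_per_second = pages_per_day / SECONDS_PER_DAY
    # Little's law: requests in flight = arrival rate * time per request.
    concurrent_requests = math.ceil(requests_per_second * avg_seconds_per_request)
    # Raw storage needed per day, using 1 GB = 1,000,000 KB.
    storage_gb_per_day = pages_per_day * avg_kb_per_page / 1_000_000
    return {
        "requests_per_second": requests_per_second,
        "concurrent_requests": concurrent_requests,
        "storage_gb_per_day": storage_gb_per_day,
    }


if __name__ == "__main__":
    # Hypothetical target: 864,000 pages per day, 2 s per request, 100 KB per page.
    print(estimate_scale(864_000, 2.0, 100))
```

Changing any one input shows how quickly the required concurrency and storage move with the scale of the goal, which is the balance the answer above describes.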

So, what should companies consider when estimating the correct scale for their data collection goals? What does optimal scaling depend on?

What matters is the technology you use and the environment of your data operations. Specifically, do you control this environment? Is it on-premise or cloud-based? Regarding technology, the hardware and software tools and programming languages make a difference to your scaling options. For example, reaching a grand scale of web data gathering is usually impossible without using proxies.

The other two things to consider are the types of data you want to collect and data platforms, in other words, where you will put the data you collect. This goes for all kinds of data collection, not just web scraping.

The problem is that there are a great many options for how you can conduct your data collection. Many businesses still struggle to appreciate this. From our side, we can reach whatever scale the client wants. Our infrastructure opens all options. Thus, it is crucial that clients work out what they want to achieve and what means are available on their side.

The web data collection industry seems to be evolving fast along with the technological, regulatory, and economic circumstances surrounding it. What trends and developments will shape the industry's future?

Data security is already a major concern and will remain so. Data platforms play a crucial role here. If you want to gather a lot of data, think in advance about how to store it securely.

Additionally, a shift toward decentralization should increase the speed of web scraping, allowing big data to be processed faster. This is especially important to AI tool developers and other companies that need real-time big data.

Do you have any advice for businesses facing this future?

Generally, businesses should look for opportunities to optimize the costs related to web data gathering. For example, by utilizing no-code tools for data gathering, companies can save system resources and the valuable time of software developers while getting the same results. Meanwhile, we offer software development kits (SDKs) for certain programming languages that help you integrate directly with our services without having to work out the integration manually.

Scalability also plays an important role here. Access to a scalable infrastructure allows flexibility in choosing the key data collection features, such as speed. Thus, if you know your business needs, you are free to optimize the costs.

Thank you. To hear Žydrūnas take a deep dive into this topic, register for OxyCon and tune in on September 25th.

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.