According to Gartner's latest forecast, global spending on information technology is projected to reach $5.3 trillion in 2024, marking 7.5% year-over-year growth. Notably, investment in data center systems is expected to rise by as much as 24%. As IT infrastructure becomes increasingly complex and distributed, the risk of unintentional disruptions grows with it.
A striking example of the risks that come with IT complexity is the global outage caused by CrowdStrike in July 2024, when a faulty update conflicted with the Windows operating system and brought down 8.5 million computers worldwide.
As IT systems grow more intricate, demand is rising for IT architects, the professionals capable of designing and configuring these systems. One such expert is Anton Davidovskii, a solutions architect at Broadcom Inc. (owner of VMware Inc., a global leader in virtualization software). With over 20 years of experience, Anton has delivered projects for European and American corporations across various industries, overseeing initiatives worth hundreds of millions of dollars.
Anton holds two VMware Certified Design Expert (VCDX) certifications and a Cisco Certified DevNet Expert certification. Worldwide, fewer than 60 people have achieved two VCDX certifications, while less than 1% of networking professionals hold the DevNet Expert certification.
In this interview, Anton Davidovskii shares insights into his most significant projects, discusses IT system design, explores strategies for selecting vendors and technologies, and offers advice on how to avoid "breaking" complex IT infrastructures.
Would you start by explaining what an IT architect does?
An IT architect is a specialist responsible for designing large-scale, complex, and distributed IT systems. This can include everything from networks of data centers to mobile communication systems for operators with thousands of towers, or the infrastructure of international banks. Today, this role is in high demand across nearly all medium and large businesses, as companies can no longer rely on simple off-the-shelf solutions.
An IT architect develops a system design that aligns with the client's goals, business requirements, and various constraints. This design integrates multiple elements of IT infrastructure, including servers, databases, networks, and applications.
The architect also selects technologies and vendors while preparing a testing methodology for them. They analyze the possibilities for integrating new tools with existing systems, considering compatibility, scalability, costs, and other factors. Additionally, they ensure the system's architecture prioritizes security and disaster resilience. Importantly, they document all aspects so that other team members can effectively maintain, improve, and upgrade the infrastructure.
What skills are essential for such a specialist?
A strong technical and analytical foundation, but that's not all. Well-developed soft skills are also crucial. On one hand, these skills are necessary for communicating with the client's representatives, helping the architect to understand their requirements and consolidate diverse ideas into an effective technical solution. On the other hand, soft skills are important for conveying the client's wishes to the team and coordinating their actions.
With over 20 years of experience in IT architecture, what projects have you worked on during this time?
I've worked on a wide variety of projects. One notable example was for a company that provided heat and electricity to an entire region. Our goal was to ensure the continuous operation of their IT systems, which involved building a distributed data center and guaranteeing the availability and security of information. We also needed to implement technical solutions for automatically transferring applications to an alternate site in case of an unforeseen event.
To achieve this, we utilized advanced tools, conducted extensive testing, and met our target metrics for fault tolerance, performance, and recovery speed during catastrophic events. As a result, the client no longer experiences the downtime that previously led to significant losses.
My team also developed an automated access rights assignment system for the largest bank in Russia and Eastern Europe, with a project budget exceeding $100 million. We built a new private cloud based on VMware Cloud Foundation and migrated more than ten thousand of the bank's virtual machines and services. Ultimately, we deployed remote access infrastructure for over 30,000 users across all bank branches, resulting in faster and more efficient services.
In another project, we collaborated with the telecommunications company Telia, which at the time had over 24 million users across the Nordic and Baltic regions. Our goal was to create a unified horizontal digital platform for 4G and 5G core network functions. The project was completed successfully, resulting in the deployment of the VMware Telco Cloud Platform across 24 sites in Denmark, Estonia, Finland, Lithuania, Norway, and Sweden. Today, this platform handles up to 50% of the operator's total network traffic.
With your experience implementing large-scale projects across various industries, are there any industry-specific nuances you've encountered?
The primary nuances are related to regulation. For example, the financial sector faces strict requirements for data security and protection, including compliance with the Payment Card Industry Data Security Standard (PCI DSS), which was developed with the involvement of Visa, MasterCard, American Express, JCB, and Discover. Additionally, there are data privacy regulations like GDPR in Europe and CCPA in the U.S.
Each industry has its own requirements, especially in potentially hazardous sectors like oil extraction and refining.
How does this affect your work as an IT architect?
From the idea phase to implementation, I need to ensure that the technical solution complies with regulatory requirements, which directly influences system design. This means that legal norms must be analyzed at the project's initial and most critical stage—gathering and analyzing client requirements.
I prefer a methodology that starts with evaluating requirements, assumptions, constraints, and risks. Requirements are what must be accomplished, and gathering them can be complex, especially within large client organizations that involve many stakeholders and approval stages. In one of our European projects, for instance, there were 19 vice presidents and various working groups, each with their own often conflicting ideas that we needed to consider.
Constraints are factors that impact the feasibility of a project and can be regulatory, technical, or organizational. For example, a client may be unwilling to change certain business processes to accommodate a new solution.
Assumptions are hypotheses about the IT infrastructure that should ideally be confirmed or disproven. Finally, risks are events that could negatively affect the desired outcome, and there are various methodologies for addressing them. Some risks can even be disregarded if their likelihood or impact is minimal.
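To make the risk side of this taxonomy concrete, here is a minimal sketch of a risk register in Python. The field names, the likelihood-times-impact score, and the cutoff threshold are illustrative assumptions rather than part of any formal methodology:

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical project risk register."""
    description: str
    likelihood: float  # estimated probability, 0.0-1.0
    impact: float      # relative severity, 0.0-1.0

    def score(self) -> float:
        # A common simple heuristic: expected severity = likelihood * impact.
        return self.likelihood * self.impact


# Illustrative threshold below which a risk might be consciously accepted.
NEGLIGIBLE = 0.05

risks = [
    Risk("Key vendor ships a critical update late", likelihood=0.30, impact=0.40),
    Risk("Power feed fails during the migration window", likelihood=0.02, impact=0.90),
]

for r in sorted(risks, key=Risk.score, reverse=True):
    action = "track and mitigate" if r.score() >= NEGLIGIBLE else "accept and disregard"
    print(f"{r.score():.3f}  {action}: {r.description}")
```

Ranking risks this way is exactly what allows some of them to be set aside: the second entry above scores low despite its high impact, because its likelihood is minimal.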
After gathering the data, it's essential to reconcile the requirements and constraints to find a compromise that satisfies all stakeholders. For instance, if regulations require that certain data be stored separately from other information, I won't be able to fulfill the client's request to consolidate all data. Clear communication about these limitations is important. Only once this understanding is established can we move on to implementation.
What strategies do you use to make the requirements-gathering process as efficient as possible?
The first step is to obtain clear definitions from the client. Often, clients don't fully understand their own needs, so it's the architect's responsibility to dig deeper and uncover the underlying issues. This usually means getting into low-level details: a company may have established processes that top management overlooks, and a new technical solution could disrupt them, potentially leading to client dissatisfaction.
For example, it's common for two different departments to manage servers and data storage systems, communicating through a ticketing system. However, if we aim to implement a popular technology like converged storage, where servers also function as storage nodes, the client will need to reorganize their processes and decide who takes responsibility. Sometimes the client finds a compromise without our involvement as implementers. In other cases, the client may decide against adopting the technology, even if testing shows promising results.
What rules do you follow to optimize communication?
The most important rule is to document every meeting and confirm the record afterward: what was discussed and what decisions were made. If any disagreements surface, it's essential to meet again.
Beyond that, the approach can vary depending on the situation. Flexibility is crucial. If a client suddenly wants to make changes to the project, you can either refuse or agree for an additional fee. Alternatively, you might find ways to meet 70–80% of the new requirements with minimal effort. In some cases, you may need to agree to all changes.
There was an interesting incident with one of our clients: after a meeting, they called my boss and asked for my termination, all because I tried to stop the client from making a decision that would harm them. It became an internal meme in our company. In some situations, persistence is necessary.
What is your approach to design and technology selection after gathering and analyzing requirements?
There are many design methodologies that aid in IT architecture development, such as Zachman or TOGAF. However, it's important to recognize that no methodology or framework is a rigid set of rules. For simpler projects, these frameworks may not even be necessary.
From a technical implementation perspective, there is a foundational core that is universally accepted. New technologies can often be integrated into this foundation. For example, when building a classic data center for a large enterprise, virtualization technologies will be essential. The key questions then become which components to use and how to configure them effectively.
How do you approach vendor selection?
The vendor selection process involves several steps. First, we evaluate the requirements for the supplier. For a data storage system, for instance, it's important to assess load profiles, required performance, storage capacity, fault tolerance, and any additional needs, such as data replication.
The second step is to present these requirements to potential vendors. Each category has a standard set of core vendors, and the set differs by category: networking equipment has its own established suppliers, servers another, and storage a third.
There are also more niche players with innovative, rapidly evolving products. However, the heavyweight vendors often hold an advantage when competition comes down to price: they can provide most of the required solutions at a competitive cost while offering unified support. Plus, they often maintain spare parts warehouses, allowing them to deliver replacement equipment within hours in case of a failure, and sometimes that speed matters more than an extra 10% in performance.
The third step is testing. While suppliers can provide their own test results, these are often not comparable. As an architect, my job is to determine the key parameters to focus on for maximum objectivity. We might find that System A meets 8 out of 10 criteria and costs X, while System B meets 9 criteria but costs 2X. Based on this analysis, the client will make the decision.
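The comparison described here is essentially a weighted decision matrix. Below is a small sketch of one; the criteria, weights, and relative prices are invented for illustration and are not from any real evaluation:

```python
# Hypothetical decision matrix for two storage systems. All criteria,
# weights, and prices below are invented for illustration.

criteria_weights = {
    "iops_under_client_load": 3,
    "usable_capacity": 2,
    "fault_tolerance": 3,
    "synchronous_replication": 2,
    "unified_support": 1,
    "spare_parts_within_hours": 1,
}

# Which criteria each system met during testing.
test_results = {
    "System A": {"iops_under_client_load": True, "usable_capacity": True,
                 "fault_tolerance": True, "synchronous_replication": False,
                 "unified_support": True, "spare_parts_within_hours": True},
    "System B": {c: True for c in criteria_weights},
}

relative_price = {"System A": 1.0, "System B": 2.0}  # the "X vs 2X" scenario

total = sum(criteria_weights.values())
for name, met in test_results.items():
    score = sum(w for c, w in criteria_weights.items() if met[c])
    print(f"{name}: {score}/{total} weighted points, "
          f"{score / relative_price[name]:.1f} points per unit of cost")
```

The point of the exercise is not the arithmetic itself but forcing every criterion and its weight into the open, so that the client's final decision rests on comparable numbers.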
Of course, this represents the ideal scenario. In practice, clients have specific requests—such as a long-standing relationship with a particular vendor—and may be hesitant to switch.
Modern IT systems are highly complex. What methodologies can be employed to avoid breaking them?
GitOps is an approach to change management that treats the Git version control system as the single source of truth. The main idea is to store all configuration files, Kubernetes manifests, and other such artifacts in Git. Developers propose changes through pull requests, and GitOps delivery tools like Argo CD or Flux track those changes and automatically apply them to the target environment.
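The reconciliation idea at the heart of GitOps is simple enough to sketch. The toy Python below uses in-memory dictionaries to stand in for the Git repository and the live environment; real reconcilers such as Argo CD and Flux do the same thing, far more robustly, against actual Git and Kubernetes:

```python
# Toy illustration of GitOps reconciliation. The "repo" dict stands in for
# manifests stored in Git (the single source of truth) and "cluster" for the
# live environment; both are simplifications of what Argo CD or Flux manage.

repo = {
    "deployment/web": {"image": "web:1.4", "replicas": 3},
    "configmap/web": {"LOG_LEVEL": "info"},
}

cluster = {
    "deployment/web": {"image": "web:1.3", "replicas": 3},  # drifted from Git
}

def reconcile_once(desired: dict, live: dict) -> None:
    """Converge the live state toward the desired state declared in Git."""
    for name, manifest in desired.items():
        if live.get(name) != manifest:
            print(f"drift in {name}: applying the state declared in Git")
            live[name] = manifest
    for name in list(live):
        if name not in desired:  # removed from Git, so pruned from the cluster
            print(f"{name} is no longer in Git: pruning")
            del live[name]

# A merged pull request updates the repo; the next reconcile pass applies it.
reconcile_once(repo, cluster)
assert cluster == repo
```

The important property is the direction of the flow: changes enter through reviewed pull requests, and the environment is pulled toward Git rather than edited by hand.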
While this methodology helps improve deployment processes and automate updates, it is fundamentally a technical tool. One could plan all the processes in Excel, although it would be more time-consuming. The key isn't merely using GitOps; it's about establishing effective change management practices overall.
Here's a simple example: imagine an organization has three network equipment "boxes" that carry a heavy load of tasks. If one fails, it could halt the company's operations for several hours. Changes to such infrastructure must therefore be managed very carefully: scheduled for low-load periods and backed by a reliable rollback plan.
Now, picture a service with 100 identical components. In this case, you can safely make changes to one component, conduct load testing, and then apply those settings to the rest of the system.
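This staged approach is essentially a canary rollout, and the pattern fits in a few lines. In the sketch below, apply_config and passes_load_test are hypothetical stand-ins for real deployment and testing steps:

```python
import random

def apply_config(component: str, config: str) -> None:
    # Stand-in for a real configuration push.
    print(f"applied {config} to {component}")

def passes_load_test(component: str) -> bool:
    # Stand-in for a real load test; here it fails 5% of the time at random.
    return random.random() > 0.05

def rollout(components: list[str], config: str, previous: str) -> None:
    canary, rest = components[0], components[1:]
    apply_config(canary, config)           # change exactly one component first
    if not passes_load_test(canary):
        apply_config(canary, previous)     # roll back; the rest never saw it
        print("canary failed under load; rollout aborted")
        return
    for component in rest:                 # only then touch the other 99
        apply_config(component, config)

rollout([f"node-{i:03d}" for i in range(100)], "config-v2", previous="config-v1")
```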
In general, most problems related to element failure in IT systems stem from human error. Someone might apply the wrong configuration, which then spreads throughout the system. Often, the companies experiencing these problems are already using GitOps. This highlights that GitOps isn't a cure-all.