The following talk was originally presented at the Devworld 2024 conference at the RAI in Amsterdam. With thanks to the conference organisers.

Today I’m going to talk about one of the biggest tech transformation projects that I’ve ever been involved with in my career. The place? Booking.com. The mission? To help everyone experience the world. The programming language? Perl. Let’s rewind the tape. The year is 1996. Crash Bandicoot, Independence Day, the Fugees.

Bookings.nl was founded, which would ultimately become Booking.com, one of the biggest travel platforms in the world. Nowadays, we sell flights, rental cars, you name it, but back then, it was all about accommodations. And this talk is specifically about the accommodations area of the business.

Booking was really successful because it offered an enormous selection behind a single storefront, making it difficult to compete on supply. And it had A/B testing built into its DNA, reducing customer friction and making it difficult to compete on usability. What could go wrong?

The Camel in the Room

The programming language of choice in this area of the world circa 1996 was Perl. Now today, that would be an odd choice for an ecommerce site, but back then, it was perfectly standard. Perl is an excellent programming language — if you are an advanced programmer working on a small team in a start-up.

Perl is great for:

Small teams of expert programmers
Carefully following agreed conventions
Holding each other accountable

But Perl in my view is not a good option for:

Many large teams of programmers of different levels
When there is a lack of clear conventions or examples to follow
In an environment without code review or close oversight

The second factor was rapid growth. Booking’s business grew astronomically, and our workforce grew with it. When I joined Booking at the end of 2016, there were 50 new engineers joining per month.

I’ve given you a sense that something was about to go wrong. But what?

Difficult to change

It became difficult to make changes. Well, it was easy to make one change in one place. But it became difficult — and dangerous — to make holistic changes, because logic was replicated and reimplemented all over the place.

Difficult to understand

The code itself did not attempt to model business rules or data structures. This made it hard to know how the application should work, whether it was working correctly, or whether it merely appeared to be working.

Product diverges

The product began to diverge, especially across platforms. Developers’ work on one platform was typically not reusable on others, and each platform (desktop web, mobile web, Android and iOS) required its own bespoke implementation, effectively quadrupling the work.

Difficult to hire and retain

The fourth issue was related to talent. Whereas most companies can hire for the specific skills they need, Booking was compelled to hire polyglots: people who already knew many languages and for whom ‘learning Perl was not a hard blocker’.

To tackle these issues and more, Booking launched a large-scale tech modernisation programme. It’s time to change the wings while the plane is flying.

The goals of the programme were, and I paraphrase:

Mobile first: releasing cross-platform, overarching services and APIs, modern tech stack
Increase velocity: faster lead time, better team productivity through reducing waste
Team ownership: clear product, page, component ownership, and operational excellence supported by metrics

Now, I’m going to share with you my top tips from being a part of this journey at Booking, where I was managing this programme for the Property page and the Booking process pages. The Property page is the most trafficked page in the core funnel. The Booking process is the most critical.

A central team, a target architecture

How does one initiate a programme like this in a large organisation? The obvious answer is to spin up a team of handpicked individuals with the specific mandate to drive the programme forward.

Identify pilot use cases and teams

The first goal of the team is to publish a target architecture whitepaper with broad buy-in.

Keep it simple, silly

Initially, the work was split across many different teams and stakeholders, to propose different tiers of the solution: one team for front-end, one team for back-end, and so on. This resulted in a massive amount of documentation, which was simply overwhelming. The tip is to keep the central team small and the whitepaper concise. Use diagrams supported by a short written narrative, and aim for a 20% solution that covers 80% of the use cases.

Too much philosophising and not enough feedback meant that a lot of up-front work was done before it was validated against actual customer needs. The tip here is to consider the early adopter teams as internal customers. It is much better to share early, learn and iterate, than to work in isolation, deliver a big bang, and hope it hits the mark.

Take it off the shelf

The company had a history of not-invented-here syndrome. Favour industry-standard solutions: most operational issues came from proprietary technologies in the stack. If a critical part of your stack is going to be proprietary, be prepared to fund its operational support. Otherwise, your central tech enablement team may end up mired in operations.

Approach tech transformation top-down, bottom-up and side-on.

Top-down mandate

The programme is dead in the water without top-down sponsorship from senior leaders. Unless the initiative is prioritised as a headline strategic goal for the company, something else will invariably come along and take precedence.

Bottom-up buy-in

The way engineers worked needed to be re-aligned with the future vision. This meant shifting the company’s culture, and setting a high bar of engineering expectations — building the right thing, the right way, and moving towards the north-star architecture, as we develop the product.

We cannot expect people to succeed if we don’t give them the training and tools they need. The company espoused the value of “Learn forever”, encouraging team members to challenge the status quo, be champions of change, and push their skills to the next level. We provided team members with training, mentorship and tailored assignments to help them grow. With so much changing for people, this was a massive departure from the start-up mentality of “hack, hack, don’t look back”, and it can result in friction that needs to be addressed. I will briefly tell three stories about people.

Promotion case

We had a strong developer who believed he was ready to be promoted. But he had to learn new skills for the new stack and become a champion of change. This was a bitter pill for him to swallow. Explaining the ‘why’, providing mentorship with diverse perspectives, and assigning him tasks on which to grow helped him become an expert and ultimately get that promotion.

Strong developer conflict

We had another strong developer who found himself in conflict with the central team. I discovered that he had also left his previous team because he did not want to work on tech modernisation. It is an unfortunate fact that some of the best “move fast and break stuff” developers prefer not to work on large-scale engineering projects. It’s up to us as managers to make the most of their skills anyway and make sure they can contribute effectively.

Product manager says ‘no’

My group product manager counterpart was dead-set against tech modernisation. Now, it was his responsibility to ensure that we were “Thinking customer first”, and to challenge engineering. What I’d highlight here is that engineering leadership buy-in is insufficient; buy-in from product management and all the other crafts is critical too. In order to raise the bar and support people sufficiently, it was also necessary to introduce engineering managers, a role that did not previously exist in the company.

The company created a career framework for engineers that highlighted the necessity of skills like architecture and systems design. This fed into the performance and promotion processes. Engineers who were doing well but had not yet learned or applied these skills had a gap to address, which was considered during promotions, for example. These changes in expectations had to be communicated carefully to each developer, who then needed support from management: a development plan to acquire these skills, and carefully chosen assignments (“assignmentology”) to give them increasing exposure and opportunities to apply them.

How does one even begin to lay out the scope of work needed for a complex multi-year programme that is going to touch all areas of the business? It has to be broken down somehow. Approaching teams and asking them to ‘replatform their areas’ was too vague to be effective.

Break things down by pages/screens

We started by looking at one page at a time. We introduced a “stewardship model of ownership” for the Booking Process pages and for the Property Page. We published an ownership model that said: We are responsible for these pages overall: how they work, how they look and feel — but we do not own everything on the page. We can point you to the owners and help with centralised coordination. We then broke the pages down into visual components. So we made a massive Miro board of the Property page, for example, and blew it up into all the different permutations on all the platforms. We gave each of the components a simple id which could then be used in further communication to ensure that we were talking about the same thing with stakeholders.

The Miro board was then cross-referenced with a giant Google sheet, which detailed information about each component; most importantly, who the owner was, and the status of replatforming. This breakdown became the master sheet to understand project progress and completion. There were more than 140 unique UI components identified on the Property page, for example. This enabled us to ask teams specifically for plans to replatform components “A, B and C”, a much more actionable request than “hey, go away and replatform things”. This approach proved wildly popular in the company and began to be used for all kinds of projects that I hadn’t even considered.
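The master sheet is essentially a table of component id, owner and replatforming status. Here is a minimal sketch of how such a table turns into per-team asks and a list of unowned components. The row shape is an assumption: `SheetRow`, its field names and its status values are hypothetical and are not the actual columns we used.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SheetRow:
    """One row of a hypothetical master sheet (field names are assumptions)."""
    component_id: str       # the simple id shown on the Miro board
    owner: Optional[str]    # owning team, or None if nobody owns it yet
    status: str             # hypothetical values: "legacy", "in progress", "done"


def asks_per_team(rows: List[SheetRow]) -> Dict[str, List[str]]:
    """Group owned components that are not yet done by their owning team."""
    asks: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row.owner is not None and row.status != "done":
            asks[row.owner].append(row.component_id)
    return {team: sorted(ids) for team, ids in asks.items()}


def unowned(rows: List[SheetRow]) -> List[str]:
    """Components that need an owner before anyone can be asked to replatform them."""
    return sorted(row.component_id for row in rows if row.owner is None)


rows = [
    SheetRow("A", "team-x", "legacy"),
    SheetRow("B", "team-x", "in progress"),
    SheetRow("C", "team-x", "legacy"),
    SheetRow("D", "team-y", "done"),
    SheetRow("E", None, "legacy"),
]
print(asks_per_team(rows))  # {'team-x': ['A', 'B', 'C']}
print(unowned(rows))        # ['E']
```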


Definition of done

Talking about UI components helped anchor the conversations with collaborators. But it led to the misunderstanding that we were talking simply about the front-end. Oh no. The entire supporting structure, the platform, if you will, that underlies and powers the UI components, also needed to be shifted onto the new stack. So we put forward a technical definition of done, based on the target architecture.

For us, this meant:

Data services used: Cassandra + Kafka + Java
Business logic: Java services for back-end business logic across platforms, hosted on Kubernetes
Federated GraphQL to connect across business domains
React-based micro-front-ends with server-side rendering and module federation
And our in-house design system called Booking UI
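One way to make a definition of done like this unambiguous is to treat it as a checklist that a component either fully satisfies or does not. The sketch below uses hypothetical criterion names, one for each item in the list above. It is illustrative only and is not tooling we actually ran.

```python
from typing import List, Set

# Hypothetical criterion names, one per item in the definition of done.
DEFINITION_OF_DONE = (
    "data_on_cassandra_kafka_java",
    "business_logic_in_java_on_kubernetes",
    "exposed_via_federated_graphql",
    "react_micro_frontend_with_ssr_and_module_federation",
    "built_with_booking_ui",
)


def missing_criteria(met: Set[str]) -> List[str]:
    """Return the criteria a component has not met yet, in checklist order."""
    return [criterion for criterion in DEFINITION_OF_DONE if criterion not in met]


def is_done(met: Set[str]) -> bool:
    """A component is done only when every criterion is met."""
    return not missing_criteria(met)


# A component with a new front-end on top of the legacy back-end is not done.
front_end_only = {
    "react_micro_frontend_with_ssr_and_module_federation",
    "built_with_booking_ui",
}
print(is_done(front_end_only))              # False
print(missing_criteria(front_end_only)[0])  # data_on_cassandra_kafka_java
```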

And here are the most controversial decisions that I made on this project:

Each line item assumed to be equal size; with >100 components, the differences would average out
No posting of partial progress: with more than 100 components, partial progress on any single component is less than 1% of the total, a rounding error. Either it’s done or it isn’t.
Insist that platforms be tackled simultaneously, to maximise the return on the investment in building context and to prevent further platform divergence.
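Here is a minimal sketch of the progress roll-up that shows why equal weighting and binary completion are workable. It assumes four platform names and a hypothetical `Component` record. A component counts only once it is done on every platform, and with 140 components each one is worth 100/140 ≈ 0.71% of the total.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Assumed platform names; a component only counts when all of them are done.
PLATFORMS = ("desktop_web", "mobile_web", "android", "ios")


@dataclass
class Component:
    component_id: str
    done_on: Set[str] = field(default_factory=set)

    def is_done(self) -> bool:
        # Binary status: partial progress counts as zero.
        return all(platform in self.done_on for platform in PLATFORMS)


def completion_percent(components: List[Component]) -> float:
    """Every component carries equal weight, so progress is done / total."""
    if not components:
        return 0.0
    done = sum(1 for component in components if component.is_done())
    return 100.0 * done / len(components)


components = [Component(f"C{i}") for i in range(140)]
for component in components[:14]:
    component.done_on = set(PLATFORMS)
components[14].done_on = {"desktop_web", "mobile_web", "android"}  # iOS still pending

print(completion_percent(components))     # 10.0
print(round(100.0 / len(components), 2))  # 0.71, the weight of a single component
```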

The biggest breakthrough in the project happened when my good friend and colleague Nan de Rosa joined as Principal Technical Programme Manager. With the work breakdown, progress overview and definition of done in place, she was well positioned to reach out across the company and help each team align their roadmap with the tech modernisation programme.

She would explain the ‘why’ of the project to the teams, outline at a high-level what they needed to do, and would work closely with me and the central team to address questions or concerns.

There was one problem, though: it turned out that few teams owned anything, and most components on the page were unowned. Phase one was about finding owners.

The company helped here in a few ways.

A value of ‘own it’ was introduced, and tooling called the ‘Ownership tool’ was created, which at least socialised the concept of owning things (even if it ironically did not support page or component ownership).

There were two further insights from my colleagues that I found useful:

You don’t need to work on everything that you own (Helena Berkhout). It makes sense for teams to own the aspects of the application that they most need to develop on.

We encouraged teams to own everything end-to-end that they needed to develop their product freely, minimising dependencies. We split out services and databases, moving from a central service team model to a DevOps model of “you build it, you run it”. This was supported by the company’s shift to DevOps, which brought SLAs for services, PagerDuty for alerting, and a 24/7 on-call rotation with after-hours compensation.

A critical enabler to this project was the ability to run the old and new stacks side-by-side.

Now, this has major drawbacks.

It’s slower, obviously: you need an additional piece to switch between the two systems
It’s more complicated, obviously: you have two systems in play
And engineers find it unsatisfying. The earliest attempts at this programme tried to recreate entire pages from scratch instead, and they did not work out.

There are also tremendous advantages.

Product development does not stall; it can continue
Shipping early and often — getting immediate feedback on small increments of modernisation
We can measure the precise impacts, positive or negative, of each incremental unit of modernisation
We can be agile, and reprioritise modernisation work as product priorities shift, critical in a multi-year project
Always possible to roll back
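The switch between the two stacks can be as small as a per-component toggle. The sketch below is a hypothetical in-process version. `ComponentRouter` and its methods are illustrative names and do not describe our actual infrastructure. In a real system the toggle would more likely live in an experiment or feature-flag service, but the shape is the same: both implementations stay registered, and rolling back is just flipping the toggle.

```python
from typing import Callable, Dict, Optional

Renderer = Callable[[dict], str]


class ComponentRouter:
    """Hypothetical switch serving each component from the legacy or the new stack."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Renderer] = {}
        self._modern: Dict[str, Renderer] = {}
        self._use_modern: Dict[str, bool] = {}

    def register(self, component_id: str, legacy: Renderer,
                 modern: Optional[Renderer] = None) -> None:
        self._legacy[component_id] = legacy
        if modern is not None:
            self._modern[component_id] = modern
        # New registrations start on the legacy stack.
        self._use_modern.setdefault(component_id, False)

    def switch_to_modern(self, component_id: str) -> None:
        if component_id not in self._modern:
            raise KeyError(f"no new-stack implementation for {component_id}")
        self._use_modern[component_id] = True

    def roll_back(self, component_id: str) -> None:
        # The legacy implementation is still registered, so rollback is a flag flip.
        self._use_modern[component_id] = False

    def render(self, component_id: str, context: dict) -> str:
        if self._use_modern.get(component_id, False):
            return self._modern[component_id](context)
        return self._legacy[component_id](context)


router = ComponentRouter()
router.register(
    "reviews",
    legacy=lambda ctx: f"legacy reviews for {ctx['hotel']}",
    modern=lambda ctx: f"modern reviews for {ctx['hotel']}",
)
print(router.render("reviews", {"hotel": "h1"}))  # legacy reviews for h1
router.switch_to_modern("reviews")
print(router.render("reviews", {"hotel": "h1"}))  # modern reviews for h1
router.roll_back("reviews")
print(router.render("reviews", {"hotel": "h1"}))  # legacy reviews for h1
```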

Especially on a multi-year programme, it’s critical to be able to reprioritise and re-sequence.

Perhaps the biggest mistake I made early on in this project was allowing it to be positioned as an engineering project. The biggest tip I can give you is to work hand-in-glove with your product, design and business counterparts.

Don’t lift and shift

Interrogate the product function. Rather than ‘lifting and shifting’, revisit from first principles what the actual value of each component is to the customer and the business, and aim to retain the useful 20% of functionality that gives 80% of the benefit.

Validate business value through block-outs

Many items began by blocking out the UI component in question, to determine whether it was actually useful for customers or beneficial to the business. In many cases, it was not.

A block-out is a really simple A/B test where you just remove the component in question for some customers and measure the difference compared to the other customers.
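A block-out needs two things: a stable way to split customers into two groups, and a render path that omits the component for one of them. The sketch below assumes hash-based bucketing on a visitor id with a 50/50 split. `assign_variant` and `visible_components` are hypothetical helpers. A real experimentation platform would also handle exposure logging and metric collection, which are left out here.

```python
import hashlib
from typing import List, Sequence


def assign_variant(experiment: str, visitor_id: str) -> str:
    """Deterministically place a visitor in 'control' or 'blockout' (roughly 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    return "blockout" if digest[0] % 2 else "control"


def visible_components(components: Sequence[str], experiment: str,
                       blocked_component: str, visitor_id: str) -> List[str]:
    """Drop the blocked-out component for visitors in the 'blockout' group only."""
    if assign_variant(experiment, visitor_id) == "blockout":
        return [c for c in components if c != blocked_component]
    return list(components)


page = ["gallery", "reviews", "facilities", "map"]
variant = assign_variant("blockout_facilities", "visitor-42")
print(variant, visible_components(page, "blockout_facilities", "facilities", "visitor-42"))
```

Hashing the experiment name together with the visitor id keeps each visitor in the same group for the life of the experiment. It also makes assignments independent across experiments.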

Invest generously in discovery

I already mentioned the codebase had low discoverability. Allow plenty of time to investigate the code and get feedback from product. The number one cause of delays on this project was insufficient time invested upfront in discovery. On the other hand, we never once regretted allocating extra time for it.

Revisit and modernise the design

Since our teams are building up the context around the component anyway, let’s take the time to revisit and modernise the design.

Agree on and stick to an acceptable cost

Agree on an acceptable cost with your business partners and stick to it. Whereas A/B testing is usually used to detect a significant improvement, it can also verify that a change stays within an acceptable cost. This is seen as taking one step backwards in order to take two steps forwards.
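A common way to check “within an acceptable cost” statistically is a one-sided non-inferiority test. You agree a margin up front, then accept the change only if the lower confidence bound on the conversion difference (treatment minus control) stays above minus that margin. The sketch below uses a normal approximation for two proportions. The margin, the 5% significance level and the function name are assumptions for illustration, not the exact procedure Booking used.

```python
import math
from statistics import NormalDist


def within_acceptable_cost(conversions_control: int, visitors_control: int,
                           conversions_treatment: int, visitors_treatment: int,
                           margin: float, alpha: float = 0.05) -> bool:
    """One-sided non-inferiority check on conversion rates.

    `margin` is the largest absolute drop in conversion rate we agreed to accept,
    e.g. 0.005 for half a percentage point.
    """
    p_c = conversions_control / visitors_control
    p_t = conversions_treatment / visitors_treatment
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / visitors_control
                   + p_t * (1 - p_t) / visitors_treatment)
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = (p_t - p_c) - z * se
    return lower_bound > -margin


# A tiny dip that is clearly inside the agreed cost:
print(within_acceptable_cost(5000, 100_000, 4990, 100_000, margin=0.005))  # True
# A one-point drop is well outside it:
print(within_acceptable_cost(5000, 100_000, 4000, 100_000, margin=0.005))  # False
```

A margin that is too tight for the available traffic makes even harmless changes fail this check, so agree on the margin and the experiment size together.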

Working agreement regarding new tech stack

We also had to agree on some ways of working: essentially, to favour working on the new stack, and backport as soon as possible otherwise.

Which component should be modernised next? You could go top-to-bottom, or pick components at random. But I’d suggest prioritising based on the product roadmap: in other words, replatforming first those components that will soon require development to achieve the product vision.

Those are the ones where the benefits and feedback will be most immediate. It’s a two-step process: Laying the foundation, then building on top of it.
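Prioritising by roadmap can be expressed as a simple sort. This sketch assumes each component record carries a hypothetical `next_planned_work` field: the roadmap quarter in which product development on it is planned, or None if nothing is planned. It is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoadmapItem:
    component_id: str
    replatformed: bool
    next_planned_work: Optional[int]  # hypothetical: roadmap quarter index, None if nothing planned


def replatforming_order(items: List[RoadmapItem]) -> List[str]:
    """Order pending components so that those needed soonest by the roadmap come first."""
    pending = [item for item in items if not item.replatformed]
    return [
        item.component_id
        for item in sorted(
            pending,
            # Unplanned components sort last; ties are broken by id for a deterministic order.
            key=lambda item: (item.next_planned_work is None,
                              item.next_planned_work or 0,
                              item.component_id),
        )
    ]


items = [
    RoadmapItem("A", replatformed=True, next_planned_work=1),
    RoadmapItem("B", replatformed=False, next_planned_work=3),
    RoadmapItem("C", replatformed=False, next_planned_work=1),
    RoadmapItem("D", replatformed=False, next_planned_work=None),
    RoadmapItem("E", replatformed=False, next_planned_work=1),
]
print(replatforming_order(items))  # ['C', 'E', 'B', 'D']
```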


Abstract

As the world’s leading online travel agency, Booking.com grew rapidly, but its systems were bursting at the seams. The largest Perl deployment on Earth became rigid and risky to change. Learn how the company re-imagined its core systems for innovation and a best-in-class customer experience. This presentation explains how Booking introduced a service-oriented architecture based on Java, GraphQL and React. Attendees will leave with unique insights into implementing a large-scale tech modernisation programme at a Fortune 500 company.

Ronen Agranat Consulting

Tech modernisation at Booking