The most incredible thing is someone thought it was a good idea to write an engineering blog post about a team who so screwed up the spec for a simple static website that they needed 20 engineers and exotic tech to build it.
It can be ok to admit you did something stupid, but don't brag about it like it's an accomplishment.
>However, due to the caching refresh cycle, CMS updates were taking many seconds to show up on the website. The issue made it problematic for content editors to preview their modifications and got progressively worse as the amount of content grew, resulting in delays lasting tens of seconds.
This seems superficial. Why not have a local-only CMS site to preview changes for the fast feedback loop, and then you only have to dump the text to prod?
>got progressively worse as the amount of content grew, resulting in delays lasting tens of seconds.
This is like the only legit concern to justify redoing this, but even then, it was still only taking seconds to a minute.
Was just about to say this. There are many local-first, open source CMSs, so the cost to customize them (or just build a plugin) to edit locally and publish remotely would be way less than this infra. What am I missing?
Many expensive big ego engineers that want to feel useful with PMs to match.
aka what happens when you promote based on metrics rather than actual product sense
The common solution is to spin up a dedicated DNS hostname called something like "preview.www.netflix.com" and turn off all caching when users go via that path. Editors and reviewers use that, and that's... it. Solved!
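For anyone wondering what "turn off all caching when users go via that path" amounts to, here is a minimal sketch assuming a Java servlet stack (Servlet 4.0+) and a made-up preview hostname; in practice this would more likely live in CDN/edge configuration.

```java
// Hypothetical sketch: a filter that disables downstream caching whenever a request
// arrives via a dedicated preview host, so editors always see the freshest CMS
// content. The hostname is a placeholder.
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class PreviewCacheBypassFilter implements Filter {
    private static final String PREVIEW_HOST = "preview.www.example.com"; // placeholder

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        if (PREVIEW_HOST.equalsIgnoreCase(request.getServerName())) {
            // No caching anywhere on the preview path: not in the CDN, not in the browser.
            response.setHeader("Cache-Control", "no-store, no-cache, must-revalidate");
            response.setHeader("Pragma", "no-cache");
        }
        chain.doFilter(req, res);
    }
}
```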
But I need to build a beautiful system with global scale!!
They inflated the problem of “make content preview faster” for a small number of users to “make a fast global cache system”. That’s promotion material for you
A solution as simple as this is not easy to miss. On the other hand, to be fair, it's hard to know what other considerations were involved in the review and design process. Someone had to present a reasonable rationale to go in a certain direction.
> what other considerations were involved
"DNS domains are managed by another team."
Having dealt with similar architectures, I have a hypothesis on how this library (Hollow) emerged and evolved.
In the beginning there was a need for a low-latency Java in-process database (or near cache). Soon intolerable GC pauses pushed them off the Java heap. Next they saw the memory consumption balloon: the object graph is gone and each object has to hold copies of referenced objects, all those Strings, Dates etc. Then the authors came up with ordinals, which are... I mean why not call them pointers? (https://hollow.how/advanced-topics/#in-memory-data-layout)
That is wild speculation ofc. And I don't mean to belittle the authors' effort: the really complex part is making the whole thing perform outside of simple use cases.
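To make the ordinal point concrete, here is a toy illustration of the general idea only (not Hollow's actual layout): records kept in flat, deduplicated arrays that reference each other by integer index instead of by object reference.

```java
// Toy illustration of ordinals: integer indexes into flat, deduplicated arrays
// replace the object graph. Hollow's real layout is bit-packed into compact byte
// arrays; boxed lists are used here only for readability.
import java.util.*;

class FlatMovieStore {
    // String pool: each distinct string is stored once and referred to by ordinal.
    private final List<String> strings = new ArrayList<>();
    private final Map<String, Integer> stringOrdinals = new HashMap<>();

    // "Movie" records as parallel columns of ordinals/primitives.
    private final List<Integer> titleOrdinal = new ArrayList<>();
    private final List<Integer> releaseYear = new ArrayList<>();

    int internString(String s) {
        return stringOrdinals.computeIfAbsent(s, k -> {
            strings.add(k);
            return strings.size() - 1;
        });
    }

    int addMovie(String title, int year) {
        titleOrdinal.add(internString(title)); // duplicate titles share a single entry
        releaseYear.add(year);
        return titleOrdinal.size() - 1;        // the new movie's own ordinal
    }

    String title(int movieOrdinal) {
        return strings.get(titleOrdinal.get(movieOrdinal));
    }

    int year(int movieOrdinal) {
        return releaseYear.get(movieOrdinal);
    }
}
```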
I’m a bit underwhelmed by the quality of articles coming out of Netflix. 100 million records per entity is nothing for Redis — even without the RAW Hollow-style compression techniques used (bit-packing, dedup, dict encoding is pretty standard stuff) [1].
Framing this as a hard-scaling problem (Tudum seems mostly static, please cmiiw if that's not the case) feels like pure resume-driven engineering. Makes me wonder: what stage was this system at that they felt the need to build this?
[1] https://hollow.how/raw-hollow-sigmod.pdf
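For readers unfamiliar with the "standard stuff" being referenced, bit-packing is roughly the sketch below. This is purely illustrative, not how Hollow or Redis actually implement it, and it assumes write-once slots and a width under 64 bits.

```java
// Toy bit-packing: store fixed-width integers tightly inside a long[] instead of
// one int (or boxed Integer) per value. Assumes 1 <= bitsPerValue < 64, values
// that fit in bitsPerValue bits, and slots written only once.
class BitPackedIntArray {
    private final long[] words;
    private final int bitsPerValue;

    BitPackedIntArray(int size, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.words = new long[(size * bitsPerValue + 63) / 64];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);   // which long the value starts in
        int offset = (int) (bitPos & 63);  // bit offset within that long
        words[word] |= value << offset;
        if (offset + bitsPerValue > 64) {
            // The value straddles two longs; spill the high bits into the next one.
            words[word + 1] |= value >>> (64 - offset);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long result = words[word] >>> offset;
        if (offset + bitsPerValue > 64) {
            result |= words[word + 1] << (64 - offset);
        }
        return result & ((1L << bitsPerValue) - 1);
    }
}
```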
Am I naive thinking this infra is overblown for a read-only content website?
As much as this website could be very trafficked I have the feeling they are overcomplicating their infra, for little gains. Or at least, I wouldn't expect it to end up with an article written about it.
Reading the article I got the impression the big challenge is doing "personalization" of the content at scale.
If it were "just" static pages, served the same to everyone, then it's pretty straightforward to implement even at the >300m users scale Netflix operates at. If you need to serve >300m _different_ pages, each built in real-time with a high-bar p95 SLO then I can see it getting complicated pretty quickly in a way that could conceivably justify this level of "over engineering".
To be honest though, I know very little about this problem beyond my intuition, so someone could tell me the above is super easy!
What's the point of building & caching static pages if every single user gets their own static page... The number of users who will request each page is 1?
I don't think that's what they're doing. They seemingly cache some common templates and then fill in the dynamic placeholders per-request. So, their new architecture with some sort of distributed in-memory cache allows for doing that much more efficiently than, presumably, doing the same queries over the network. The way I understand it, it's essentially just a fancy SSR.
> cache some common templates and then fill in the dynamic placeholders per-request
Otherwise known as server-side includes (SSI)
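A minimal sketch of that "cache the shared template, fill the placeholders per-request" pattern; the {{name}} syntax and hard-coded template are made up for illustration, and a real system would use a proper template engine behind the same shape of API.

```java
// Toy version of template caching plus per-request placeholder substitution.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateRenderer {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)}}");

    // Templates are loaded once and reused for every request.
    private final Map<String, String> templateCache = new ConcurrentHashMap<>();

    String render(String templateName, Map<String, String> perUserData) {
        String template = templateCache.computeIfAbsent(templateName, this::loadTemplate);
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Substitute each {{placeholder}} with this request's data.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    perUserData.getOrDefault(m.group(1), "")));
        }
        m.appendTail(out);
        return out.toString();
    }

    private String loadTemplate(String name) {
        // Stand-in for fetching the shared, rarely-changing markup (CMS, disk, etc.).
        return "<h1>Hello {{name}}</h1><p>Recommended for you: {{recommendation}}</p>";
    }
}
```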
> >300m users scale Netflix operates at
They are not talking about the Netflix steaming service.
https://www.netflix.com/tudum
This is a site they are talking about which is very similar to a WordPress powered PR blog.
So, is this a read-only candidate? Or is there something very nuanced about it? I've never even used this site.
It’s basically a company blog.
So it could have just been a SSG'd website...
Yup. Content that may be read-only from a user's perspective might get updated by internal services. When those updates don't need to be reflected immediately, a CDN or similar works fine for delivering the content to a user.
When changes to data are important, a library like Hollow can be pretty magical. Your internal data model is always up to date across all your compute instances, and scaling up horizontally doesn't require additional data/infrastructure work.
We were processing a lot of data with different requirements: big data processed by a pipeline - NoSQL, financial/audit/transactional - relational, changes every 24hrs or so but has to be delivered to the browser fast - CDN, low latency - Redis, no latency - Hollow.
Of course there are tradeoffs between keeping a centralized database in memory (Redis) and distributing the data in memory on each instance (Hollow). There could be cases where Hollow hasn't sync'd yet, so the data could be different across compute instances. In our case, it didn't matter for the data we kept in Hollow.
> When those updates don't need to be reflected immediately, a CDN or similar works fine for delivering the content to a user.
What leads you to believe a CDN is a solution to a problem of delivering personalized content to a specific user?
Personalization could include showing the same cached content to many people. The personalization here could refer to which content is served from the CDN to which users.
> Personalization could include showing the same cached content to many people.
That's not the problem.
The main requirement is to allow editors to preview their changes without having to wait multiple seconds due to caching refresh cycles.
I can’t imagine the number of editors is that large? Are we talking about hundreds or thousands? Taking a look at the site there can’t be that many users that need this low latency preview capability.
When new people join your team and learn your infrastructure, I bet they often ask ”why is this so complicated? It’s just a <insert simple thing here>.”
And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”
This is a static blog site with lists about popular Netflix content.
Yes, many times there are very valid reasons for complexity, but my suspicion is that this is the result of 20+ very talented engineers+designers+pms being put in charge of building a fairly basic static site. Of course you are going to get something overengineered.
Tudum has some personalization so I'm not sure it's just a static blog site.
I can't say whether or not it's complicated enough to justify this level of engineering. It could just be some engineers building something cool because they can, like you said.
> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”
Sometimes it’s “It was made many months/years ago in circumstances and with views that no longer seem to apply (possibly by people no longer there). Sadly, the cost of switching to something else would be significant and nobody wants to take on the risk of rewriting it because it works.”
It depends.
Engineers and product managers fall in love with their dreams a little too easily. A working system is a wonderful thing.
Then again, sometimes it only “works” by waking people up with pages multiple times a night.
> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”
All problems are trivial once you ignore all real-world constraints.
> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”
It's 500 MB of text. A phone could serve that at the scale we're talking about here, which is a PR blog, not netflix.com.
They're struggling with "reading" from a database that is also used for "writes", a terribly difficult situation that no commercial database engine has ever solved in the past.
Meanwhile they have a complex job engine to perform complex tasks such as joining two strings together to form a complete URL instead of just a relative path.
This is pure, unadulterated architecture astronaut arrogance.
PS: This forum, right here, supports editing at a much higher level of concurrency. It also has personalised content that is visible to users with low latency. Etc, etc... All implemented without CQRS, Kafka, and so forth.
Yes but HN is not Yahoo scale
This Netflix blog isn’t yahoo scale either
Doing weird pointlessly complicated stuff on a niche area of your website is a not entirely ridiculous way to try out new things and build new skills I guess.
Most of the tech infrastructure out there is over engineered. At least based on my experience.
I remember being interested in their architecture when I attended re:Invent in 2018. I went to four separate Netflix talks given by four separate people with wildly different titles, teams and responsibilities. The talks had different titles indicating a variety of topics covered. Two of these talks weren't even obviously/outwardly Netflix-focused from the description -- they were just talks supposedly covering something I was curious about.
All four speakers ran the exact same slide deck with a different intro slide. All four speakers claimed the same responsibility for the same part of the architecture.
I was livid. I also stopped attending talks in person entirely because of this, outside of smaller more focused events.
I don't know because I have not been to AWS re:Invent. But from what I have seen at my workplace, a trip to this event is the equivalent of a corporate junket for mid-level developers who happen to be the manager's favorite.
> As much as this website could be very trafficked I have the feeling they are overcomplicating their infra, for little gains.
This sort of personal opinion reads like a cliche in software development circles: some rando casually does a drive-by system analysis, cares nothing about requirements or constraints, and proceeds to apply simplistic judgement in broad strokes.
And this is then used as a foundation to go on a rant regarding complexity.
This adds nothing of value to any conceivable discussion.
characterizing netflix as a "read-only" website is incredibly shortsighted. you have:
- a constantly changing library across constantly changing licensing regions available in constantly changing languages
- collaborative filtering with highly personalized recommendation lists, some of which you just know has gotta be hand-tuned by interns for hyper-demographic-specific region splits
- the incredible amounts of logistics and layers upon layers of caching to minimize centralized bandwidth to serve that content across wildly different network profiles
i think that even the single-user case has mind boggling complexity, even if most of it boils down to personalization and infra logistics.
This blog about an architecture change is about the Tudum website specifically, not the whole of Netflix.
> characterizing netflix as a "read-only" website is incredibly shortsighted considering:
The people talking about "read-only" didn't even bother to read the overview of the system they are criticizing. They are literally talking out of wilful ignorance.
But here we are.
> constantly changing languages
Wouldn't that be nice!
NetFlix still only supports 4 or 5 subtitle languages.
Their billions of dollars of fancy-pants infrastructure just doesn't scale to more than half a dozen of so text files.
From an outsider's perspective Tudum does seem to be an extremely simple site... But maybe they have complicated use cases for it? I'm also not convinced it merits this level of complexity.
I’m gonna take a wild guess: the actual problem they’re engineering around is the “cloud” part of the diagram (that the “Page Construction Service” talks to)
There is probably some hilariously convoluted requirement to get traffic routed/internal API access. So this thing has to run in K8s or whatever, and they needed a shim to distribute the WordPress page content to it.
Alternative idea: the actual problem they’re engineering around is their developers CVs
Having to run in k8s doesn't change that much, the description of a whole Cassandra + Kafka stack to deliver the ingestion of articles already says there's a lot more architecture astronaut-ing going on than simply deployment.
I cannot imagine why you'd need a reactive pipeline built on top of Kafka and Cassandra to deliver some fanservice articles through a CMS, perhaps some requirement about international teams needing to deliver tailored content to each market but even with that it seems quite overblown.
In the end it will be a technical solution to an organisational issue, some parts of their infrastructure might be quite rigid and there are teams working around that instead of with that...
Probably because they were available, and now they’ve done the equivalent of throwing redis on their containers. My main point is, the blog content is probably the “easy part” and whatever is consuming it is the “hard part”
Not naive, but perhaps missing that the army of enterprise Java developers Netflix employs needs to justify their large salaries by creating complex architecture to handle future needs.
I mean, this is the company that invented microservices so....
Microservices for _organizational_ challenges.
Lots of people think microservices = performance gains only.
It’s not. It’s mainly for organizational efficiency. You can’t be blocked from deploying fixes or features. Always be shipping. Without breaking someone else’s code.
> Lots of people think microservices = performance gains only.
I don't think I ever heard that. Who claims that microservice architectures are for performance gains?
You’d be surprised; I heard it a lot during interviews I conducted. I think it’s mostly people that have not worked with them.
In fact, it's opposed to performance in many cases. And to dev velocity for small teams.
very fair point - but there are valid use cases for microservices.
> As much as this website could be very trafficked I have the feeling they are overcomplicating their infra,
That is because they are, and it seems that since they're making billions and are already profitable, they're less likely to change / optimize anything.*
Netflix is stuck with many Java technologies with all their fundamental issues and limitations. Whenever they need to 'optimize' any bottlenecks, their solution somehow is to continue over-engineering their architecture for even the tiniest offerings (other than their flagship website).
There is little reason for any startup to copy this architectural monstrosity just for attention on their engineering blog post for little to no advantage whatsoever.
* Unless you are not profitable, infra costs continue to increase, the quality of service is sluggish, or it is urgent to do so.
> There is little reason for any startup to copy this architectural monstrosity
This is the only reasonable take in your rant, but the reasoning is off for even this. They have little reason, because they will never hit the scale Netflix operates at. In the very very odd chance they do, they will have ample money to care about it.
We've already seen a decade of cargo-culting around the microservices hype popularized by Netflix, and we don't need yet another over-engineered solution which, given the results of this one, delivers little value or gain toward solving the problem.
But many startups will die trying it anyway.
Isn't Tudum mostly a static site? It must be a great project to try out cool stuff on, with a near zero chance of that cool stuff making it to the main product and having a significant impact on customers. Most of the traffic probably comes from bots.
Handy to have [0] [1].
[0] https://github.com/Netflix/hollow
[1] https://hollow.how/raw-hollow-sigmod.pdf
> Hollow employs compression techniques
Given how prominently 'compression' gets mentioned I was excited to learn more, but it looks like it simply amounts to using GZIPInputStream/GZIPOutputStream within the blob APIs, or am I missing something...?
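For context, wrapping blob reads/writes in GZIPOutputStream/GZIPInputStream is only a few lines of JDK code, which is why it hardly qualifies as a headline "compression technique" on its own. This is a generic sketch, not Hollow's actual blob API.

```java
// Generic JDK sketch of gzipping/gunzipping a blob.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BlobGzip {
    static byte[] compress(byte[] blob) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(blob);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // Java 9+
        }
    }
}
```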
Holy shit the amount of overcomplications to serve simple HTML and CSS. Someone really has to justify their job security to be pulling shit like this, or they really gotta be bored.
If anyone can _legitimately_ justify this, please do, I'd love to hear it.
And don't go "booohooo at scale" because I work at scale and am 100% not sure what problem this is solving that can't just be solved with a simpler solution.
Also this isn't "Netflix scale", Tudum is way less popular.
> Tudum is way less popular
I hadn't even heard of it until today.
You don't know what you're missing :-)
I thought the article was about some internal Netflix piece of infra (well ... it is in some way) but it really is some website for some annual event ... wow.
This has to be one of the most over-engineered websites out there.
And it’s still slow on my phone (and I’m not a logged in user)
https://news.ycombinator.com/item?id=44950901
My guess
> Holy shit the amount of overcomplications to serve simple HTML and CSS.
If you read the overview, Tudum has to support content update events that target individual users and need to be templated.
How do you plan on generating said HTML and CSS?
If you answer something involving a background job, congratulations you're designing Tudum all over again. Now wait for opinionated drive-by critics to criticize your work as overcomplicated and resume-driven development.
I hear "content update events that target individual users and need to be templated" and immediately rule out any approach involving a background job.
- Reddit doesn't use background jobs to render a new home page for every user after every update.
- Facebook doesn't use background jobs to render a new feed for every user after every update.
- Hacker News doesn't use background jobs to render a new feed for every user after every update.
Why? Because we have no guarantee a user will access the site on a particular day, or after a particular content update, so rendering every content update for every user is immediately the wrong approach. It guarantees a lot of work will be thrown away. The sensible way to do this is to render the page on demand, when (and IF) a user requests it.
Doing N*M work where N=<# of users> and M=<# of page updates> sure seems like the wrong approach when just doing N work where N=<# of times a user requests a page> is an option.
There's lots of less exotic approaches that work great for this basic problem:
- Traditional Server-Side Rendering. This approach is so common that basically every language has a framework for this.
- Single-Page Applications. If you have a lot of content that only needs to be updated sometimes, why not do the templating in the user's browser?
- Maybe just use WordPress? It already supports user accounts and customization.
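To illustrate the render-on-demand argument above, here is a rough sketch where pages are built only when requested and memoized per (user, page, content version), so the work done tracks actual traffic rather than users times updates. ContentStore and PageRenderer are hypothetical stand-ins, not Tudum's real components.

```java
// Rough sketch of render-on-demand with memoization keyed by (user, page, version).
// A real cache would also be bounded and evicted.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OnDemandPageService {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ContentStore content;   // exposes current content plus a version number
    private final PageRenderer renderer;  // turns (page, user, content) into HTML

    OnDemandPageService(ContentStore content, PageRenderer renderer) {
        this.content = content;
        this.renderer = renderer;
    }

    String getPage(String userId, String pageId) {
        long version = content.currentVersion(pageId);
        String key = userId + ":" + pageId + ":" + version;
        // Rendered at most once per (user, page, version), and only if actually requested.
        return cache.computeIfAbsent(key,
                k -> renderer.render(pageId, userId, content.get(pageId, version)));
    }
}

interface ContentStore {
    long currentVersion(String pageId);
    Object get(String pageId, long version);
}

interface PageRenderer {
    String render(String pageId, String userId, Object contentSnapshot);
}
```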
Don’t forget the part where people not even a little exposed to the massive tech infrastructure at hand and the local skill pool make WAGs about what is and isn’t cheap.
Kafka might seem like extra work compared with not Kafka, but if it’s already set up and running and the entire team is using it elsewhere, suddenly it’s free.
>How do you plan on...
This isn't an engineering problem. It's a PM+Eng Lead failing to talk to each other problem.
If you need 20 engineers and exotic tech for what should be a simple static site or WordPress site, you're doing it wrong.
> How do you plan on generating said HTML and CSS?
Is this not what SSR html - via any backend language/framework - has been doing since forever?
Come on. These guys are Avengers: Infinity War against simplicity.
CQRS wasn’t the issue, and an in-memory object store is overkill.
The 60-second cache expiry was the issue. This is why the only sane cache policy is cache forever. Use content hashes as cache keys.
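A small sketch of the "cache forever, key by content hash" idea: the asset URL embeds a hash of its bytes, so cached copies never go stale and changed content simply gets a new URL. The URL scheme and header choice are illustrative (HexFormat needs Java 17+).

```java
// Content-addressed URLs: hash the bytes, embed the hash in the key, cache forever.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ContentAddressedUrls {
    static String hashedUrl(String path, byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
        String hash = HexFormat.of().formatHex(digest).substring(0, 16);
        // e.g. /assets/article-42.html -> /assets/article-42.<hash>.html
        int dot = path.lastIndexOf('.');
        return path.substring(0, dot) + "." + hash + path.substring(dot);
    }

    // Safe to cache indefinitely, because any change in content changes the key.
    static String cacheControlForHashedAssets() {
        return "public, max-age=31536000, immutable";
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] body = "<h1>Tudum article</h1>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hashedUrl("/assets/article-42.html", body));
        System.out.println(cacheControlForHashedAssets());
    }
}
```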
This brings back old memories, from when they released the first version of Hollow (2016, I think) and I started writing a port in .NET because I thought it would be useful for some projects I was working on. I don't remember why I stopped; I guess I realized it was too much work, plus working at a low level with C# was quite a pain...
Mom the astronauts are back at it again
I don’t normally comment on technical complexity at scale, but having used Tudum, it is truly mind boggling why they need this level of complexity and architecture for what is essentially a WP blog equivalent.
https://netflixtechblog.com/netflix-tudum-architecture-from-...
Concerning that their total uncompressed data size including full history is only 520MB and they built this complex distributed system rather than, say, rsyncing an sqlite database. It sounds like only Netflix staff write to the database.
I don't know any of the details, but they seem to have moved a lot of their internal stuff to Hollow.
So maybe it's just an attempt at unification of the tech stack, rather than a concrete need. But Hollow itself is definitely heavily used internally; e.g., see its whitepaper.
“Too dumb”? How did that get past everyone?
"Tudum" is Netflix's signature sound that plays within the app. They also use it as a name for things like their events, etc.
That’s the sound of Kevin Spacey’s ring knocking on the desk in House of Cards
Funny how they cancelled him but not the sound :-)
If it fits in memory, why choose CQRS with Kafka in the first place?
Money can fix skill issue and buy happiness.
I find this really confusing, and feel like I'm missing some things. Overall the impression I get from this post is of an application optimized in very weird places, while being very unoptimized in others.
A few thoughts:
1) RAW Hollow sounds like it has very similar properties to Mnesia (the distributed database that comes with the Erlang Runtime System), although Mnesia is more focused on fast transactions while RAW Hollow seems more focused on read performance.
2) It seems some of this architecture was influenced by the presence of a 3rd-party CMS. I wonder how much impact this had on the overall design, and I would like to know some more about the constraints it imposed.
> The write part of the platform was built around the 3rd-party CMS product, and had a dedicated ingestion service for handling content update events, delivered via a webhook.
> ...
> For instance, when the team publishes an update, the following steps must occur:
>
> 1) Call the REST endpoint on the 3rd party CMS to save the data.
> 2) Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
What? You call the CMS and then the CMS calls you back? Why? What is the actual function of this 3rd-party CMS? I had the impression it may be some kind of editor tool, but then why would Tudum be making calls out to it?
3) What is the actual daily active user count, and how much customization is possible for the users? Is it just basic theming and interests or something more? When I look through the Tudum site, it seems like it is just connected to the user's Netflix account. I'm assuming the personalization is fairly simple like theming, favorited shows, etc.
> Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix.
It's unclear to me if this is users signing up for Tudum, the number of unique monthly visitors, the number of page views, or something else. I'm assuming that it is monthly active users, and that those users generally already have netflix accounts.
4) An event-driven architecture feels odd for this sort of access and use pattern. I don't understand what prevents using a single driving database, like postgres, in a more traditional pattern. By my count just the data from the CMS is also duplicated in the Hollow datastore and, implicitly, the generated pages. Of course when you duplicate data you create synchronization problems and latency. That is the nature of computing. I have always preferred to, instead, stick with just a few active copies of relevant data when practical.
> Storing three years’ of unhydrated data requires only a 130MB memory footprint — 25% of its uncompressed size in an Iceberg table!
Compressed, uncompressed, this is a comically small amount of data. High end Ryzen processors almost have this much L3 CACHE!!
As near as I can tell, writes only flow one way in this system so I don't even know if RAW Hollow needs strong read-after-write consistency. It seems like writes flow from the CMS, into RAW Hollow, and then onto the Page Builder nodes. So how does this provide anything that a postgres read-replica wouldn't?
5) Finally, the most confusing part - are they pre-building every page for every user? That seems ridiculous, but it is difficult to square some of the requirements without such a thing. If you can render a page in 400 milliseconds then congratulations, you are in the realm of most good SSR applications. There is no need to pre-build these pages, so why not just render them on demand and save a ton of computation?
Overall this is a perplexing post. I don't understand why a lot of these decisions were made, and the solution seems very over-complicated for the problem as described.