Comparison of centralized and decentralized networks

Last year, YouTube tried to lock some features, like 4K video, behind a subscription service, and the market did not bear it. Now Google will start deleting accounts (and content) that haven’t been active in 2 years. A few years ago, WhatsApp servers were facing a crisis because of the sheer number of “good morning” messages sent by Indians. All of this points to one thing: it is not feasible for a platform to store user-generated content for free.

This is a problem now and, as time goes on and user-generated content becomes more complex, it will only get worse. What we need is a way for storage capacity to scale at the same rate as users and their consumption. The only way to do this is by using the distributed peer-to-peer storage technology behind torrents.

These companies make a lot of money, they can probably afford the storage costs

That was the bet. When companies like YouTube start, they know that they will have a lot of user-generated content, and that it will translate to huge storage costs. The bet is that monetizing all this data will cover the cost of storing it. For something like Twitter or Facebook, this still holds, but only because their content is largely text, which is orders of magnitude smaller than visual media like images or videos.

But when the content is mostly visual, the bet does not seem to hold. Consider the countless competitors YouTube had that couldn’t bear the storage costs. Think about how cash flow problems shut down Vine.

You might point to YouTube itself as a counterexample, but my hunch is that it’s actually in the red because of its storage costs and is heavily subsidized by the fact that its parent company provides its storage. Look at how desperately they’re trying to increase revenue: dumping more ads than users will bear, trying to produce original content for premium members, taking larger and larger cuts of content creators’ ad revenue, and introducing draconian content moderation to reduce the risk of advertisers leaving. They’ve even tried to fight their users on ads by having Chrome neuter ad blockers, and by experimenting with blocking ad blockers themselves. Getting into an arms race with your own users isn’t a sign that the cash flow projections are looking good.

Obviously, text-based content looks good from a financial perspective, but that’s right now. Will it remain like that? And even if it does, will text always be our primary medium of communication? Look at how fast technology is progressing and consider the sort of sci-fi world we might be living in in the near future. I feel like we’ll have far more complex mediums of communication and entertainment than text by then, and as long as companies bet on user-generated content in those mediums, they will face this exact same issue.

Don’t these platforms deserve a premium to store content that users consume?

No. You can’t squeeze users by selling all their data without giving them a sniff of it, and then reduce them to no more than eyeballs to view ads, and then also demand they pay for the privilege. I won’t even dignify this with a proper section.

The platforms could always monetize the content better

Maybe, but I don’t think so.

We have seen that even when YouTube squeezes the users and creators as much as it can, it doesn’t earn enough to feel secure. That’s why it tried to lock features behind premium. And ad blockers aren’t just a bug to be squashed; they are a sign that the market will not bear the amount of squeezing that YouTube is putting it through. There is an upper limit to how much advertising users will bear, and ad blockers are a downward pressure that only pops up when that equilibrium is violated. No content platform will be able to fight it for long.

Another revenue stream could be monetizing the user-generated content as training data for ML/AI. Before LLMs, I would have said that AI can’t process unstructured data like user content cheaply enough to leverage a database as large as YouTube's. But LLMs can.

On the other hand, I think the free ride of stealing user data is not going to last forever. Platforms have been getting away with a very nice swindle where they extract a lot of value from user data for training ML/AI and pass none of that value back on to the user (in fact, it’s used to squeeze the user further with targeted ads). That’s a racket, but I don’t think it will be sustainable. People will get more savvy about all the weird value streams in tech, and which ones they should demand to be a part of.

What that will mean is that either companies will have to pay to exploit user data, or users will short-circuit the training data value stream by using distributed LLMs. Either way, it’s not going to help platforms cover their data storage costs.

So what is the future of user generated content storage?

Torrents.

More specifically, the technology behind torrents: the BitTorrent protocol. Once you're done stealing entertainment media, look into how it’s actually structured and you’ll see that it’s genius.

You have a normal client-server relationship, but the clients store all the data. The protocol splits files into pieces for transfer, but clients usually end up holding entire files because they need to consume them as well (what’ll you do with half a movie?). The server, or “tracker”, only exists to know which client has what data, so that when another client needs it, it can ask the tracker and be pointed to the clients that have it. The “leeching” client can then pull the data, in parallel, from all the “seeding” clients it was told about.
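Here's a minimal sketch of that tracker idea in Python. It's a toy, not the real BitTorrent tracker protocol: the `Tracker` class, its method names, the content ID, and the peer addresses are all made up for illustration. The point is only that the server holds a lookup table and never touches the content itself.

```python
from collections import defaultdict


class Tracker:
    """Toy tracker: knows which peers hold which content, stores no content."""

    def __init__(self):
        # content_id -> set of peer addresses currently seeding that content
        self._seeders = defaultdict(set)

    def announce(self, content_id, peer):
        """A peer tells the tracker it holds a copy of content_id."""
        self._seeders[content_id].add(peer)

    def withdraw(self, content_id, peer):
        """A peer tells the tracker it no longer holds content_id."""
        self._seeders[content_id].discard(peer)

    def lookup(self, content_id):
        """A leeching peer asks who it can fetch content_id from."""
        return sorted(self._seeders.get(content_id, ()))


# Hypothetical peers and content ID, purely for illustration.
tracker = Tracker()
tracker.announce("video-123", "10.0.0.5:6881")
tracker.announce("video-123", "10.0.0.9:6881")
peers = tracker.lookup("video-123")
print(peers)
```

The leeching client would then connect to the returned peers directly; the tracker is out of the picture once it has answered the lookup.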

With this model of data storage, a central server can offload all the data storage to the clients and focus on the far less expensive tasks of tracking, load balancing, and securing space for endangered content. Meanwhile, the clients can simply stream content directly from each other. In fact, having the content as far out to the edge as possible would make it an extremely effective CDN. Content that is heavily consumed in a particular geographic location will tend to be stored at that location, minimizing latency.
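The two cheap jobs left for the server can be sketched roughly like this. Both functions are hypothetical, as are the `min_copies` threshold and the sample numbers: one flags "endangered" content that has too few live copies, and the other picks the closest seeders for a client based on round-trip times the client is assumed to have measured.

```python
def endangered(seeder_counts, min_copies=3):
    """Content IDs with fewer live copies than the (assumed) safety threshold."""
    return sorted(cid for cid, n in seeder_counts.items() if n < min_copies)


def nearest_seeders(latency_ms, k=2):
    """The k seeders with the lowest measured round-trip time, closest first."""
    return sorted(latency_ms, key=latency_ms.get)[:k]


# Made-up numbers for illustration only.
at_risk = endangered({"video-a": 5, "video-b": 1, "video-c": 2})
closest = nearest_seeders({"tokyo": 180.0, "pune": 12.0, "mumbai": 25.0})
print(at_risk, closest)
```

Neither job involves moving or storing the content, which is why the server's costs stay small compared to hosting everything itself.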

This creates a data storage system that is flexible, allows for redundancies, and distributes costs and autonomy to the users in a way that both scales with usage and is palatable.

But users are paying anyway

Yes, but it’s cheaper.

In this model, users don’t have to pay a subscription. They have to “pay” by making some of their local storage available to the collective pool of storage space that they themselves benefit from. This creates a self-policing behavior where users understand that they are only damaging people like themselves by not contributing back to the shared resource. The fact that this works is demonstrated by how private tracker groups kick out members who leech but don’t seed.
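The kind of rule a private tracker enforces can be sketched as a share-ratio check. This is an illustration under my own assumptions, not any particular tracker's policy: the `Member` class, the `freeloaders` function, and the 0.5 minimum ratio are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    uploaded_bytes: int
    downloaded_bytes: int

    @property
    def ratio(self):
        # A member who has downloaded nothing can't be a freeloader yet.
        if self.downloaded_bytes == 0:
            return float("inf")
        return self.uploaded_bytes / self.downloaded_bytes


def freeloaders(members, min_ratio=0.5):
    """Names of members who take far more than they give back."""
    return [m.name for m in members if m.ratio < min_ratio]


# Made-up members for illustration only.
members = [
    Member("leech", uploaded_bytes=10, downloaded_bytes=100),
    Member("fair", uploaded_bytes=100, downloaded_bytes=100),
    Member("newcomer", uploaded_bytes=0, downloaded_bytes=0),
]
flagged = freeloaders(members)
print(flagged)
```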

And the actual cost is going to be very manageable because storage space is something people have a lot of just lying around. People already buy and maintain high-quality storage for personal use, and most of it is either unused or filled with unnecessary artifacts people forget to delete. Giving up 200GB out of the 1-2TB their laptop came with will be a small price to pay for the peace of not being pelted with ads for third parties or the service's own premium tier.

The cost of utilizing all the decentralized storage available in the world is far, far lower than setting up and expanding centralized storage.
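To see how the pool grows with the user base, here is some back-of-envelope arithmetic. It uses the 200GB per-user figure from above; the replication factor of 3 and the user counts are my own assumptions, picked only to make the numbers easy to follow. It says nothing about cost, only about capacity scaling linearly with users.

```python
def pooled_capacity_tb(users, contributed_gb=200, replication=3):
    """Usable pool size in TB when every file is kept on `replication` peers."""
    return users * contributed_gb / replication / 1000


# Hypothetical user counts; doubling the users doubles the usable pool.
small_pool = pooled_capacity_tb(3000)
large_pool = pooled_capacity_tb(6000)
print(small_pool, large_pool)
```

With these assumptions, 3,000 users yield 200TB of usable space and 6,000 users yield 400TB, so capacity tracks the number of people using the service without anyone buying a data center.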

But how does the platform make money?

That's the neat part: it doesn't.

The reason platforms like YouTube deserve to monetize their platform is that they offer the storage and bandwidth necessary to deliver the content to users. Once users provide those resources for themselves, there is no basis for anyone to monetize this. It's a "by the people, for the people" type of value proposition.

In fact, because the data model is completely decentralized, a central entity can't impose any control whatsoever, as there is no choke point to block when content needs to be withheld. This is less a startup idea and more an open source project: the application implements the logic, some users host it as servers, and the end users of each server supply the valuable storage and bandwidth. If a server does anything its end users don't like, they can just take their storage resources to another one. That means no ads, no upsells, no censorship, and no moderation.

No moderation? Won't it get taken over by neo-nazis and pedophiles?

Almost definitely. I don't really have a compelling rebuttal here.

This is an issue with all social media, not just this data storage model. YouTube itself has huge problems with providing a platform for far-right extremists and pedophiles, but it at least has some slight chance of mitigating that by exploiting its centralized control. The same scaling problem that makes it unfeasible for YouTube to pay for storage will also make it unfeasible for it to moderate its platform effectively, but a completely unmoderated platform will not be able to do that at all.

But this is an issue with users abusing the tech, not a problem with the tech itself. I can see a lot of users policing the platform in a decentralized manner by ostracizing this sort of unsavory content. I can imagine filters being implemented that disallow content from being hosted, based on simple exact-match filters, ML models, or even more powerful multimodal LLM-based engines.
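The simplest of those options, an exact-match filter, could look something like this. It's a sketch under my own assumptions: the `ExactMatchFilter` class and its methods are hypothetical, and it uses SHA-256 from Python's standard `hashlib` so that peers only need to share a list of digests rather than the unwanted content itself.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """SHA-256 hex digest used as the fingerprint of a piece of content."""
    return hashlib.sha256(data).hexdigest()


class ExactMatchFilter:
    """Refuses to host content whose digest is on a shared blocklist."""

    def __init__(self, blocked_digests=()):
        self._blocked = set(blocked_digests)

    def block(self, data: bytes):
        self._blocked.add(content_digest(data))

    def allows(self, data: bytes) -> bool:
        return content_digest(data) not in self._blocked


# Placeholder byte strings standing in for real files.
content_filter = ExactMatchFilter()
content_filter.block(b"known bad file")
print(content_filter.allows(b"known bad file"))
print(content_filter.allows(b"known bad file, re-encoded"))
```

Changing even one byte of a file changes its digest, so an exact-match filter is trivially evaded by re-encoding. That is the gap the ML and LLM-based options would have to cover.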

The "refugees" from these platforms would just spin up their own instance that's tolerant of their nonsense, but that's true of any internet tech. You can do that with websites right now; look at 4chan.

I think it's a good tool for modeling data storage, with validation from existing technologies and online social systems. Ignoring it because of abuse potential would be like banning kitchen knives because they can be murder weapons.

It's a good idea and we should give it a go. It might be the only way forward as our data generation and consumption outpaces our centralized storage capabilities.

Decentralized, eh? Sounds like a job for the blockchain

Absolutely not.

Blockchain technology is very clever and offers a very promising solution to the digital double-spending problem, but I will not go into the numerous ways its purpose has been bastardized by people who don’t understand its limitations. Blockchain gets shoehorned into a lot of applications where it absolutely does not make sense, and this is one of them. So no blockchain.

TL;DR

  • For services hosting user-generated content, centralized hosting may not be feasible
  • YouTube tried to charge for high-res videos and Google will delete old accounts
  • What will be feasible is decentralized storage
  • The blockchain will not be useful here, so don’t try
  • BitTorrent will be able to store content in a cheap, flexible, and scalable manner
  • Centralized monetization won’t be possible here, so the users will have to invest in this themselves
  • Content moderation will be more difficult, as it is for all federated communities