
Internet chaos as Cloudflare goes down.

I bet it's DNS again. It's way more complicated than I could hope to explain.
Maybe, plus a bad config on their servers. I'm seeing CORS failures on their crap-challenge pages. Frankly, it's a good thing very few sites in my country use them. (And AFAIK no e-shops do. CF's incompetence with their "protection" would be costly as it is; with outages like this, it could be downright company-ending.)
 
X and ChatGPT
Wow, those two are biggies. I have another app or two which are having issues. And now it turns out a company I wasn't even aware of is responsible for 20% of internet traffic? That's why I still don't believe in keeping anything personal (like pictures or documents) in storage on the web, unless it's only a backup.
 
Wow, those two are biggies. I have another app or two which are having issues.
Apollohoax.net is down too. The webmaster there recently added a Cloudflare human authenticator to cut down on bot traffic.

...CF's incompetence with their "protection" would be costly as it is; with outages like this, it could be downright company-ending.)
Indeed. My friend runs an ISP that offers boutique hosting; it's where I host my Apollo website. Some of his customers have 15-minute SLAs. That means a site he hosts for them can be unavailable for at most 15 minutes, for pretty much any reason, after which he has to pay them $100 a minute until service is restored. When one of those sites goes down, it's an all-hands-on-deck exercise.
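The penalty arithmetic is simple enough to sketch. This minimal Python version assumes the terms described above (15 minutes of grace, then $100 per minute) and additionally assumes that every started minute of overrun is billed in full, which the post doesn't actually say. `sla_penalty` is a hypothetical name.

```python
import math


def sla_penalty(outage_minutes: float, grace_minutes: int = 15,
                rate_per_minute: int = 100) -> int:
    """Dollars owed for a single outage under a simple grace-period SLA."""
    # Minutes beyond the grace period; never negative.
    overrun = max(0.0, outage_minutes - grace_minutes)
    # Assumption: every started minute of overrun is billed in full.
    return math.ceil(overrun) * rate_per_minute
```

For example, a 75-minute outage costs 60 billable minutes, i.e. $6,000, while anything up to 15 minutes costs nothing.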
 
So no great loss, then...

In fact, a major improvement in the overall factual accuracy of the internet with those two gone...
Can we make it permanent????
No loss at all that I can see. Has zero effect on any of my online activity.
 
I was going to give this one of those emoji/smiley reaction things, but I couldn't pick between 'ha ha' and 'wow', and what I really wanted was an 'oh, we're ◊◊◊◊◊◊', which we don't have (also: I'm not sure how that could be rendered in emoji form, not without giving poor Otto a stroke, anyway).
In the end I posted this rambling bollocks instead.
 
I was going to give this one of those emoji/smiley reaction things, but I couldn't pick between 'ha ha' and 'wow', and what I really wanted was an 'oh, we're ◊◊◊◊◊◊', which we don't have (also: I'm not sure how that could be rendered in emoji form, not without giving poor Otto a stroke, anyway).
In the end I posted this rambling bollocks instead.
Worth it.
 
Isn't it always?
Hmmm, in my experience about 5-10% of technical issues are genuinely hardware induced, excluding hardware problems caused by human stupidity (wrong equipment, stretching cables at ankle height, turning off cooling, et cetera).
 
From that report:
This time around the company plans to do four things:
  • Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input
  • Enabling more global kill switches for features
  • Eliminating the ability for core dumps or other error reports to overwhelm system resources
  • Reviewing failure modes for error conditions across all core proxy modules
And, of course, NONE of those solutions include anything like "TEST ALL UPGRADES IN A CONTROLLED ENVIRONMENT BEFORE RELEASING THEM!!"
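The second item, global kill switches, is easy to picture in code. Below is a minimal Python sketch, not Cloudflare's implementation: the `KillSwitches` registry, the `bot_scoring` feature name and the choice to pass traffic through unscored when the feature is killed are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitches:
    """Hypothetical global registry of features that operators can switch off."""
    disabled: set[str] = field(default_factory=set)

    def kill(self, feature: str) -> None:
        # Flip the switch: the feature is treated as off everywhere that checks it.
        self.disabled.add(feature)

    def is_enabled(self, feature: str) -> bool:
        return feature not in self.disabled


def classify_request(request: dict, switches: KillSwitches) -> str:
    # With the (hypothetical) bot-scoring feature killed, traffic is passed
    # through unscored instead of every request failing.
    if not switches.is_enabled("bot_scoring"):
        return "pass-through"
    return "blocked" if request.get("looks_like_bot") else "allowed"
```

The point of the design is that turning a misbehaving feature off is a data change that takes effect immediately, with no new code deployment.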
 
Hey, it's not that easy: you need to get hold of an old-style mini-spotlight bulb, as an LED one doesn't produce enough heat.
I have been asked in the past to make a replacement 'bulb' for lava lamps (a mate actually collects them, lol; there's no accounting for some people's tastes...), and it really isn't that hard to make them up, for lava lamps or indeed any application that needs a 'hot' lamp: some older egg incubators, reptile cages, etc. all used bulbs as a heat source in the past...

You can buy 'bulb bases' online readily. If it's just a heat source that's needed, an appropriately rated resistor does the job fine; for cases where both heat and light are needed, a resistor coupled with an LED does it. It's a couple of minutes' job to make up a 'replacement' bulb/heat source. The parts are readily available online; hell, you can even buy the hand tool to do it (although you can do it without one), or even full-on production-line machines if you want to 'make your own' in volume...
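Choosing the "appropriately rated resistor" comes down to P = V²/R. Here is a minimal Python sketch, assuming a low-voltage supply; the 12 V and 24 W figures in the example are illustrative only, and the 2× power-rating margin is a common rule of thumb rather than anything from the post. `heater_resistor` is a hypothetical helper.

```python
def heater_resistor(supply_volts: float, target_watts: float,
                    safety_factor: float = 2.0) -> tuple[float, float]:
    """Return (resistance in ohms, minimum resistor power rating in watts).

    From P = V**2 / R it follows that R = V**2 / P. The rating is multiplied
    by a safety factor so the resistor is not run at its limit.
    """
    if supply_volts <= 0 or target_watts <= 0:
        raise ValueError("supply voltage and target power must be positive")
    resistance = supply_volts ** 2 / target_watts
    min_rating = target_watts * safety_factor
    return resistance, min_rating
```

For example, `heater_resistor(12, 24)` gives 6 Ω with a rating of at least 48 W. If you round to the nearest standard resistor value, recompute the actual dissipation as V²/R.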
 
I generally like Cloudflare. They're easy to use, and they're a perfect upstream for my Pi-hole. This is a pretty big blunder, though.
 
Yep. They do apologise, but from the outset they write as if the change to the database were a natural event that just happened.
This has more detail.


Changes to the database are actually made often, as new threats are detected and analysed; it's more of a continual process. For reliability and performance, the memory for the rules is allocated only once, when each monitoring task starts. That means only a fixed number of rules can fit in that space.

Their code for detecting and reacting to a larger-than-permitted number of rules is not well written. It simply fails hard without providing helpful diagnostics. Logic to deal gracefully with a larger and more complex set of rules than permitted was never implemented.
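A minimal Python sketch of the behaviour the post is asking for, assuming a rule table whose storage is allocated once at start-up: an oversized rule set is rejected with a descriptive error, and the rules already loaded stay in place. `RuleTable` and `RuleLimitExceeded` are hypothetical names, not Cloudflare's code.

```python
class RuleLimitExceeded(Exception):
    """Raised when a rule set would not fit in the preallocated table."""


class RuleTable:
    """Fixed-capacity rule storage, allocated once when the task starts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._slots = [None] * capacity  # allocated once, never grown
        self.count = 0

    def load(self, rules: list) -> None:
        if len(rules) > self.capacity:
            # Say exactly what went wrong and leave the current rules in place,
            # rather than crashing with no useful diagnostics.
            raise RuleLimitExceeded(
                f"rule set has {len(rules)} entries; table capacity is {self.capacity}"
            )
        for i in range(self.capacity):
            self._slots[i] = rules[i] if i < len(rules) else None
        self.count = len(rules)

    def active(self) -> list:
        return self._slots[: self.count]
```

The caller still has to decide what to do with the exception, but at least it gets a clear message and an intact table to keep using.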
 
Questions such as "Does the database query return the kind of result the programmer expected?" seem like something that should have been tested in a development environment and could easily be tested in a staging environment, not the kind of thing that needs to be thrown into production with fingers crossed. Presumably this is something my software team would have caught in review: all code in our critical applications must be approved by two senior software engineers before it can be accepted into the version control system. I know hindsight is 20/20, but programming constructs that on their face seem to produce an unrecoverable error are exactly the kind of red flag our team notices.
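The kind of check the post has in mind can be sketched in a few lines of Python, assuming the query returns `(feature_name, type)` rows. The row shape, `check_feature_rows` and `max_rows` are hypothetical illustrations, not anything from Cloudflare's report.

```python
from collections import Counter


def check_feature_rows(rows, max_rows):
    """Sanity-check (name, type) rows returned by a query before using them.

    Returns a list of human-readable problems; an empty list means the
    result looks like what the code expects.
    """
    problems = []
    if len(rows) > max_rows:
        problems.append(f"query returned {len(rows)} rows; expected at most {max_rows}")
    # The same feature appearing twice suggests the query is reading more than intended.
    counts = Counter(name for name, _type in rows)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        problems.append("duplicate feature names: " + ", ".join(dupes))
    return problems
```

Run in a staging environment or as a pre-publish gate, a non-empty result stops the bad file before it ships.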

Programming that responds to errors in input data (which would include there being too much of that data) by aborting the program doesn't seem well thought out for a critical, ongoing process. I can immediately think of several programming techniques to mitigate this, but it boils down to not allowing an unhandled exceptional condition in production code. I agree with the presenter in the video: it largely doesn't matter what language you use or how it models program exceptions.
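One such mitigation, keeping the last known-good configuration when a new one fails validation, might look like this minimal Python sketch. `ConfigHolder` and the `validate` callback are hypothetical; the idea is that a bad reload is logged and refused rather than allowed to abort the process.

```python
import logging

logger = logging.getLogger("config_reload")


class ConfigHolder:
    """Keeps serving the last known-good config when a reload is bad."""

    def __init__(self, initial, validate):
        validate(initial)  # a bad config at startup should still fail loudly
        self.active = initial
        self._validate = validate

    def reload(self, candidate) -> bool:
        try:
            self._validate(candidate)
        except Exception as exc:  # deliberately broad: bad input is rejected, never fatal
            logger.error("rejected new config, keeping the previous one: %s", exc)
            return False
        self.active = candidate
        return True
```

Here `validate` is whatever check fits the data (a size limit, a schema check); it only needs to raise on bad input.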
 
The hardest part for Cloudflare is that they've let the cat out of the bag, too. When flaws like these come to light, they signal to hackers what kinds of flaws the company is prone to and how its processes handle data. Cloudflare had better be in panic mode right now.

They always seemed like they had it together. The sheer time involved in managing traffic to 1.1.1.1 when that project took off had to be crazy.
 
Not every aspect of a global scale production system can be tested outside of that system.
These are billion-dollar companies. They can build big enough environments to test all their distributable components.

Also, decent testing should involve edge conditions such as query response overloads. "What happens if this query returns the whole database? Is that possible, and how? Is that technically a bad situation? How do we handle that? How might we prevent that?" Etc.
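Those edge-condition questions translate directly into unit tests. A minimal Python sketch using the standard `unittest` module; `build_feature_file`, `MAX_FEATURES` (set to 200 here purely for illustration) and the decision to collapse duplicate rows are all assumptions.

```python
import unittest

MAX_FEATURES = 200  # hypothetical fixed limit, for illustration only


def build_feature_file(rows):
    """Hypothetical builder: turn query rows into a feature list, enforcing the limit."""
    names = list(dict.fromkeys(rows))  # drop duplicate rows, keep first-seen order
    if len(names) > MAX_FEATURES:
        raise ValueError(f"{len(names)} features exceeds limit of {MAX_FEATURES}")
    return names


class FeatureFileEdgeCases(unittest.TestCase):
    def test_duplicated_rows_collapse(self):
        rows = [f"f{i}" for i in range(10)] * 2  # every row returned twice
        self.assertEqual(len(build_feature_file(rows)), 10)

    def test_whole_database_raises_clear_error(self):
        rows = [f"f{i}" for i in range(100_000)]  # "the query returned everything"
        with self.assertRaises(ValueError):
            build_feature_file(rows)

    def test_empty_result(self):
        self.assertEqual(build_feature_file([]), [])
```

Saved as a module, this runs with `python -m unittest`.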
 
One thing Cloudflare could do is roll out any change to one customer organisation first and see what happens. If it works, do a few more, then repeat until the change is fully deployed. Worst case, only a few organisations go down for a few minutes until the change is backed out.
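The phased rollout suggested here can be sketched in Python as follows, assuming callbacks to apply a change, roll it back and health-check a single organisation. `staged_rollout` and the default wave sizes of 1, 10 and 100 are hypothetical choices.

```python
from typing import Callable, Sequence


def staged_rollout(
    orgs: Sequence[str],
    apply: Callable[[str], None],
    rollback: Callable[[str], None],
    healthy: Callable[[str], bool],
    wave_sizes: Sequence[int] = (1, 10, 100),
) -> bool:
    """Roll a change out in growing waves; if anything looks unhealthy, back it all out.

    wave_sizes are cumulative targets: first 1 org, then 10, then 100, then everyone.
    Returns True if the change reached every org, False if it was rolled back.
    """
    changed: list[str] = []
    for target in sorted(set(wave_sizes)) + [len(orgs)]:
        for org in orgs[len(changed):target]:
            apply(org)
            changed.append(org)
        # Check every org changed so far before widening the rollout.
        if not all(healthy(org) for org in changed):
            for org in reversed(changed):
                rollback(org)
            return False
    return True
```

If the tenth organisation breaks, only those first ten ever see the change, which is exactly the "worst case" described in the post.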
 