
Where are all the Google search results?

Checkmite · Skepticifimisticalationist · Joined: Jun 7, 2002 · Messages: 29,007 · Location: Gulf Coast
I recently watched a video on YouTube called "The Dead Internet Theory 2". It is, as you might expect from the title, a continuation of a longer thesis centered around the notion that most of the internet isn't "real", in the sense that most of the content you can find that presents itself as the interests, opinions, and decisions of actual people using the internet was actually produced by machines for various purposes - chief among them the ubiquitous "public manipulation" by governments and the corporate sector, of course, but also for less personal purposes, like fabricating entire customer bases for products and services that aren't actually consumed by anyone.

In this post I'm not really interested in the larger thesis so much as the part of it that is the focus of this second video, which you can watch here if you want:



It focuses on the possibility that the claimed power and reach of Google's internet search capability is a deliberate lie.

As a supporting argument for this claim, about five and a half minutes into the video the author conducts an experiment by doing a Google search for the generic term "pizza". As shown in the video, Google initially reports at the top of the results page that its engine has found 1,050,000,000 results in just over a second for "pizza". The author proceeds to click through the results pages, each of which contains 10 results, save the first page which only has 9.

As he proceeds through the pages, at certain points the number of reported results at the top curiously changes, increasing or decreasing by hundreds of millions at a time, until he reaches the last page of results - page 35, where the same message that a page ago had said "Page 34 of about 1,340,000,000 results", now says "Page 35 of about 348 results".

He notes at this point that at the bottom of the final page is a disclaimer, where Google says it has "omitted some entries" that were "very similar" to the 348 already displayed, and provides a link whereupon he can "repeat the search with the omitted results included". The author proceeds to do exactly that, and during the new search a whopping 1.9 billion results are initially reported this time; but he is only able to browse the results up to page 53 - for a grand total of 529 actually-displayed results. There is no means to proceed further - Google simply will not give any more pages of results, even though it claims there are nearly 2 billion more to be seen.
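For concreteness, the paging experiment could in principle be scripted. The sketch below is a rough illustration only: it assumes Google still honors the long-standing q, start, and filter=0 URL parameters, serves 10 results per page to a plain HTTP client, and marks each result title with an <h3> tag (a crude heuristic); real scraping may well be blocked, rate-limited, or against the terms of service.

Code:
import re
import requests

def count_retrievable_results(query: str, include_omitted: bool = True) -> int:
    # Walk the result pages 10 at a time until Google stops serving any,
    # counting <h3> title tags as a crude proxy for "results on this page".
    total = 0
    start = 0
    while True:
        params = {"q": query, "start": start}
        if include_omitted:
            # the "repeat the search with the omitted results included" link
            params["filter"] = "0"
        page = requests.get("https://www.google.com/search",
                            params=params,
                            headers={"User-Agent": "Mozilla/5.0"},
                            timeout=10).text
        hits = len(re.findall(r"<h3", page))
        if hits == 0:
            break          # no more pages served
        total += hits
        start += 10        # next page
    return total

# If the video's observation holds, count_retrievable_results("pizza") should
# come back in the low hundreds, nowhere near the billions reported on page 1.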

The author of the video concludes that Google's reporting of billions of results, which certainly sounds reasonable for a search query as broad and basic as "pizza", is an utterly baseless and false claim, intended to make users think the search engine is expansive, when in fact it can only provide a little over 500 actual results for this search. Of course, central to the author's overall thesis is that it isn't merely a case of Google's search engine being far weaker than advertised, but rather that, contrary to what most people expect as a consequence of what they've been led to believe about the internet, the content actually isn't there to be found - it doesn't exist at all.

I submit that the author's conclusions are misguided.

I decided to replicate his experiment for myself. I chose a different, but still very broad and generic term, "snow". Like with the pizza search, no further (manually-submitted) context. Google, find me everything you can on the internet that has to do with snow!

For my search, Google reports 8.86 billion results, and that report doesn't change at all until we reach "page 28 of about 276 results", along with the link to repeat the search including omitted results. With the "omitted" results "included", Google reports...8.33 billion results, until it cuts my search off at "Page 47 of about 462 results".

This validates the search done in the video; and like the search done in the video, the result is absurd and dismissible on its face. It is self-evident that there are more than 462 pages on the internet that have the word "snow" in them.

We can verify this even on Google itself; I took a step the author did not and did a new search, this time for "snow Cleveland". Google sensibly reports 60.9 million results instead of billions, and initially puts me at "page 10 of about 100 results" with omissions, "page 43 of about 426 results" without. Compare this to the 462 results returned for just "snow", hardly any of which mentioned Cleveland, and it becomes obvious there are a lot of results in my second search which were not included in the first, even though they logically should have been.

This does not prove that Google's claims about finding billions of results for certain searches AREN'T lies intended to impress users. But it does prove that Google isn't stopping at only 500 or so results just because that's literally all it can find or access; the truth is, Google WILL NOT display all of the search results it finds for your query over a certain amount, omitting the vast majority of them even when you've clicked a link telling it specifically not to do that.

The author's conclusions are wrong-headed; but his observations are valid. Why does Google bother to tell you that it has found millions or billions of results from a search query when it is clearly programmed to never actually return more than a few hundred, and to cut any search off that exceeds that limit?
 
...snip...

The author's conclusions are wrong-headed; but his observations are valid. Why does Google bother to tell you that it has found millions or billions of results from a search query when it is clearly programmed to never actually return more than a few hundred, and to cut any search off that exceeds that limit?

Two reasons I can think of - one) it is a nerdily cool number, and two) marketing.
 
By letting you know that the actual number of results is vastly more than you would ever have enough time to examine even if they showed them all, they are really telling you that your search terms are too broad and that you are not even going to be able to see a representative sample of the actual results.

(In my opinion the original claim belongs in the conspiracy theories forum.)
 
Two reasons I can think of - one) it is a nerdily cool number, and two) marketing.

I am open to believing that the number of results that Google reports at the top of its search pages is exaggerated, maybe even by several orders of magnitude.

Although, it doesn't answer the question of why Google, independently of that number, just stops linking to results after only a few hundred are displayed. There doesn't seem to be any kind of technical limitation in play. It appears to be quite arbitrary and, more inexplicably, the number is different for each search. Why stop at 529 results for "pizza" but only 426 results for "snow", especially when the latter search term is initially reported as having several times "more" results than the former? You would expect a hard limit to cut off after a round number, like an even 500 or so.
 
I wonder, given how tenuous some of the later results are in relation to the query, whether there is a cutoff for relevance. If you put in "snow," it may report that there are millions of hits in which there is not only snow, but something relating to it, and then something relating to something relating, down to a multi-page essay that has a footnote that could be interpreted to relate to snow.

How right the guess is might be determined by how often the last entry on the last page is relevant. I can also imagine that in some cases when a search is said to return some enormous number of hits, it is not doing the whole search, but referring to some database that's already been compiled from previous searches.

This is all a guess on my part, but I would imagine that if every search returned millions of results, it would clog up the servers pretty quickly.
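If that guess is right, the mechanics might look something like the toy sketch below: the headline count reflects everything that matched at all, but only hits above some relevance floor are ever served. The threshold value and the scores here are invented for illustration, not anything Google has published.

Code:
RELEVANCE_FLOOR = 0.35   # hypothetical minimum score worth showing

def servable_results(scored_hits):
    """scored_hits: (relevance_score, url) pairs, already sorted best-first."""
    served = []
    for score, url in scored_hits:
        if score < RELEVANCE_FLOOR:
            break          # everything past here only "relates to something relating"
        served.append(url)
    return served

# The engine can still report len(scored_hits) as the headline number while
# only ever serving the short list above the floor.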
 
By letting you know that the actual number of results is vastly more than you would ever have enough time to examine even if they showed them all, they are really telling you that your search terms are too broad and that you are not even going to be able to see a representative sample of the actual results.

I'm not sure I can agree with that. If the concern was that your search terms are too broad, and they intend you try a narrower search, they could include a note to that effect, suggesting you include more terms.

Google doesn't do that because they don't necessarily need you to use a narrower search. They have search routines (aka, "the algorithm") that are expressly designed to use metadata like your location, previous search history, and things it tries to deduce about you from that data like your age and interests, to contextualize your searches, effectively adding additional search terms on your behalf under the hood as it were. That's why searching for "pizza" and nothing else gives you first and foremost a list of pizza restaurants closest to you (or at least closest to the IP address that's connecting to Google), a far more narrow and specific set of results than what you actually asked for.
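As a loose illustration of that "adding terms under the hood" idea - the signals, weights, and the example profile below are purely hypothetical, not Google's actual algorithm:

Code:
def contextualize(query: str, user_profile: dict) -> list[str]:
    # Hypothetical sketch of implicit query augmentation: a bare query gets
    # expanded with context the engine already holds about the user.
    implicit_terms = []
    if location := user_profile.get("location"):
        implicit_terms.append(location)              # "pizza" -> "pizza <your city>"
    implicit_terms.extend(user_profile.get("recent_interests", [])[:2])
    return [query, *implicit_terms]

# contextualize("pizza", {"location": "Cleveland", "recent_interests": ["delivery"]})
# -> ["pizza", "Cleveland", "delivery"]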

But even if their attitude was "after THIS many results if they haven't found what they're looking for, our algorithm obviously doesn't have enough information and they need to help with more specific search terms", there's still no reason why they can't simply say that explicitly, and there's no reason why the "magic number" for ending the search results is different for every search.
 
Billions of search results in a couple of seconds is simply impossible. There is no way Google could search enough websites in that time to get that many results for any words except perhaps "and" or "is".

What they do is maintain some search trees, and a billion hits must be some kind of estimate - that is, the size of the branch containing that word. Once you really want to see the websites, you get actual results... or something. But why they do it in that way is beyond me.
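In toy form, that idea might look like the sketch below - the index layout and the counting are illustrative assumptions, not Google's real data structures: the headline count is cheap precomputed metadata, and only a small ranked slice is ever materialized.

Code:
class ToyIndex:
    def __init__(self):
        # term -> list of (doc_id, score), kept sorted best-first
        self.postings = {}

    def estimated_hits(self, term):
        # The headline "about N results": just the stored posting-list size.
        # No document is fetched or ranked to produce this number.
        return len(self.postings.get(term, []))

    def top_results(self, term, k=10):
        # What a results page actually shows: a small ranked slice.
        return [doc_id for doc_id, _score in self.postings.get(term, [])[:k]]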

Hans
 
Although, it doesn't answer the question of why Google, independently of that number, just stops linking to results after only a few hundred are displayed. There doesn't seem to be any kind of technical limitation in play.
No technical limit at play?

Could you explain what you're expecting Google to do that would not have technical limits?

My understanding of the behavior you want is to endlessly poke through all the millions of hits that are stored in their index. How does Google do that in a way that is free of technical cost? And, then, I would also expect you want some degree of de-duplication, right? And probably you want some semblance of most relevant first? That all requires that something somewhere is remembering your result.

Isn't it obvious they can't remember that for you? Isn't it obvious it isn't even productive to figure out what page 10 is going to be when you haven't even clicked on a second page of results yet?

Let's take a case that seems really straight forward. You do a search, it returns 40 pages of results. You click on page 1 today. A month from now you click on page 22. So what is page 22? Is it what page 22 was supposed to be a month ago and they remembered it for you all this time? Is it what page 22 is supposed to be today and they need to figure out the 21 pages ahead of it to deliver it to you?

The obvious constraint here is they can't remember the complete result list of every search everyone does, nor can they reliably recreate an exact duplicate of every prior search result in the future, nor can they transmit the complete list of results to you in a page of practical size.
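To put the "page 22 a month later" point in concrete terms, here is a minimal sketch, with a stand-in candidate list and scoring, assuming nothing is remembered between requests: serving page N means re-selecting the top N*10 from scratch and discarding all but the last 10.

Code:
import heapq

PAGE_SIZE = 10

def serve_page(candidates, page):
    """candidates: (relevance_score, url) pairs matching the query."""
    needed = page * PAGE_SIZE
    # Top-k selection: the work grows with how deep the requested page is,
    # and everything above the final 10 entries is thrown away afterwards.
    top = heapq.nlargest(needed, candidates)
    return [url for _score, url in top[needed - PAGE_SIZE:needed]]

# Page 1 needs the top 10; page 50 needs the top 500 recomputed in full,
# and the ordering can differ if the index changed between the two requests.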
 
No technical limit at play?

Could you explain what you're expecting Google to do that would not have technical limits?

My understanding of the behavior you want is to endlessly poke through all the millions of hits that are stored in their index. How does Google do that in a way that is free of technical cost? And, then, I would also expect you want some degree of de-duplication, right? And probably you want some semblance of most relevant first? That all requires that something somewhere is remembering your result.

Well no; as I said, you only even get as far as you do by clicking a link Google itself offers to "repeat your search with too-similar results not omitted". So obviously for the purposes of such an experiment I'm okay with the duplicated results.

But the thing is, even acknowledging that that problem - of result duplication, or nearly-identical results and so forth - exists, if we're looking at the situation objectively there's simply no way we can be bumping up against that problem as a practical matter of fact for the kinds of searches we're talking about. There cannot be only 529 unique web pages referencing "pizza", and all the other billions of Google's results are just duplicates, reposts, or tangential but ultimately irrelevant uses of the word pizza. It's not possible.

And this is taking into account the fact, which we've already established, that Google is making certain presumptions and automatically refining your broad search terms. As we can demonstrate, when you search for the term "pizza" with no other context, Google presumes that what you're really looking for is a pizza restaurant, and serves you those results arranged by some combination of proximity to the user and popularity. But even on the last page, 53, the "end of the results", you're still getting unique listings for just pizza restaurants - and why wouldn't you be? According to one source I found, there are over 78,000 individual pizza restaurants in the United States alone, and only around 15,000 of those belong to the big chains where you'd expect a few thousand locations to share a single website. So no, it's not a problem of running out of results that are unique or relevant to the search.

I do agree there is probably some kind of judgment in place over how many results is too many for Google to serve a given user; but it really does seem to be a completely arbitrary number, not something forced by trying to avoid duplication or irrelevant matches.

As for the technical limits - again, I'm not really sure why delivering 47 pages with the word "snow" on them is within the server's technical capability but delivering 48 pages threatens the integrity of the system. It would make sense from a standpoint of "well a line has to be drawn somewhere" - that's logical, and reasonable - but why is that line at 48 pages for "snow" but 54 pages for "pizza"? That's an awfully flexible "limit".
 
As for the technical limits - again, I'm not really sure why delivering 47 pages with the word "snow" on them is within the server's technical capability but delivering 48 pages threatens the integrity of the system. It would make sense from a standpoint of "well a line has to be drawn somewhere" - that's logical, and reasonable - but why is that line at 48 pages for "snow" but 54 pages for "pizza"? That's an awfully flexible "limit".

The limit might be something important to Google but not obvious to the user, such as only spending a maximum of so many milliseconds of server time per query.
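A minimal sketch of that idea, with an invented budget, candidate stream, and scoring: if collection stops when a per-query time budget runs out, the cutoff naturally lands at a different, non-round count for every query.

Code:
import time

TIME_BUDGET_SECONDS = 0.200   # hypothetical per-query budget

def collect_results(candidate_stream, score):
    deadline = time.monotonic() + TIME_BUDGET_SECONDS
    scored = []
    for doc in candidate_stream:
        if time.monotonic() > deadline:
            break                          # the cutoff is wherever the clock says
        scored.append((score(doc), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# A query whose candidates are cheap to score gets further through the stream
# before the deadline than an expensive one, so "snow" and "pizza" would stop
# at different depths.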
 
But the thing is, even acknowledging that that problem - of result duplication, or nearly-identical results and so forth - exists, if we're looking at the situation objectively there's simply no way we can be bumping up against that problem as a practical matter of fact for the kinds of searches we're talking about. There cannot be only 529 unique web pages referencing "pizza", and all the other billions of Google's results are just duplicates, reposts, or tangential but ultimately irrelevant uses of the word pizza. It's not possible.
I wasn't suggesting that there are only 529 unique pizza places or anything even remotely like that. Here's a simple question addressing the point that I am actually trying to communicate: If a single web page appears in Google's distributed index 18,000 times because it's got 400 references on a page that keeps changing, would you want to see those 18,000 repetitions of the same page?
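A minimal sketch of that collapsing step, using an invented canonicalization rule (drop the query string and fragment, lowercase the host) rather than whatever Google really does:

Code:
from urllib.parse import urlsplit

def canonical(url):
    # Simplified canonicalization: drop query string and fragment,
    # lowercase the host, trim a trailing slash.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"

def collapse_duplicates(index_hits):
    seen, unique = set(), []
    for url in index_hits:
        key = canonical(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

# 18,000 index entries that all point at one changing page collapse to a
# single displayed result.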

As for the technical limits - again, I'm not really sure why delivering 47 pages with the word "snow" on them is within the server's technical capability but delivering 48 pages threatens the integrity of the system.
Did you get anything out of the last part of my post explaining what the technical issues are with even page 10?
 
Is Google the only one involved in this "Dead Internet Theory" or are Bing and Yahoo in on the conspiracy? Seems like an easy theory to prove or disprove. I could design a search engine and find out I guess.

I can't think of a time where I couldn't find something I was looking for with a search engine.

And I am happy if Google is filtering out pizza search results such as:

"Pizza sucks!"

"No it doesn't!"

"Yes it epicly actually does, bra-heim!"

"No way, pizza rules!!!"

"Pizza drools!!!"

"Beyotch!!!"

"Karen!"
 
I try to avoid Google these days, and use Duck Duck Go, which gets around the issue by simply not posting how many results it's found. Nor how many pages of results it will return. You simply get a page and "more results" at the bottom.

By the way, I just read that DDG's Android app not only does not track, but disables tracking in a number of other apps.
 
I wasn't suggesting that there are only 529 unique pizza places or anything even remotely like that. Here's a simple question addressing the point that I am actually trying to communicate: If a single web page appears in Google's distributed index 18,000 times because it's got 400 references on a page that keeps changing, would you want to see those 18,000 repetitions of the same page?

Certainly not - but, again, we can't possibly be having to contend with the constraints imposed by that specific kind of problem after only 500 returned results. I understand the problem you're describing, but it simply doesn't seem relevant to my question.

Did you get anything out of the last part of my post explaining what the technical issues are with even page 10?

No; I was not able to completely understand what you were trying to say in the latter half of your post, or how it related to my questions. I don't think I ever said or implied that I was expecting Google to save the results of a search I conduct today for months on end until the next time I decide to revisit it. I expect that Google conducts a new search each time I ask it to.
 
Is Google the only one involved in this "Dead Internet Theory" or are Bing and Yahoo in on the conspiracy?

I'm not really interested in the conspiracy theory itself - I only really described it as context for the video, which is where I first discovered this quirk of Google's search engine that is the thing I'm really interested in. The theory seems to stem from a couple of very common inference fallacies that conspiracy-minded people often make - the assumption that all consequences as a rule are intended, and the assumption that changes observed among different members of a set must have a common cause. Basically, the author notes that the internet of 2021, and how people consume and interact with it, is vastly different from the internet of, say, 1996; he judges that those changes have ultimately been harmful or at least detrimental to internet users and the world in general, and concludes that all of those changes must have happened because someone positively wanted to cause that harm and took action for the purpose of fulfilling that goal.

I guess you can watch the videos if you really want to know the ins and outs of the claim; but that discussion, as someone noted earlier, is probably better suited for the Conspiracy Theories subforum.
 
Certainly not - but, again, we can't possibly be having to contend with the constraints imposed by that specific kind of problem after only 500 returned results. I understand the problem you're describing, but it simply doesn't seem relevant to my question.
How could the problem I describe not manifest immediately?

No; I was not able to completely understand what you were trying to say in the latter half of your post, or how it related to my questions. I don't think I ever said or implied that I was expecting Google to save the results of a search I conduct today for months on end until the next time I decide to revisit it. I expect that Google conducts a new search each time I ask it to.

Then how are you thinking that page 500 will make any sense in relation to page 1 in the behavior you seem to want? What mechanism do you think they can use to give a page 500 that meets the behavior you seem to want?

To avoid bogging this conversation down, could you explain something? At the bottom of a Google search page, after you've searched, it gives you (usually underlined) page numbers that you can click on. Do you know what those page numbers are in HTML terms?
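Roughly speaking - and the exact markup and parameters are assumptions that change over time - those page numbers are ordinary links whose href re-submits the same query with a different start offset, i.e. each page is a brand-new request rather than a saved continuation of the first one:

Code:
from urllib.parse import urlencode

def page_link(query, page_number, per_page=10):
    offset = (page_number - 1) * per_page
    return "https://www.google.com/search?" + urlencode({"q": query, "start": offset})

# page_link("pizza", 22) -> "https://www.google.com/search?q=pizza&start=210"
# which is roughly what an <a href="...">22</a> pagination link points at.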
 
Is Google the only one involved in this "Dead Internet Theory" or are Bing and Yahoo in on the conspiracy? Seems like an easy theory to prove or disprove. I could design a search engine and find out I guess.

I can't think of a time where I couldn't find something I was looking for with a search engine.

And I am happy if Google is filtering out pizza search results such as:

"Pizza sucks!"

"No it doesn't!"

"Yes it epicly actually does, bra-heim!"

"No way, pizza rules!!!"

"Pizza drools!!!"

"Beyotch!!!"

"Karen!"

Yep.

Google gives you what it considers the "best results".

It does this with Google Images too.
 
