18 March 2005

Feedster's Rafer: Which universe are you living in?

When I read things like this, I wonder in which universe someone resides. Based on the public information I have, there is no basis for this statement, nor am I convinced there is any real data to back up the following comment:

Rafer's Comment

From the data that Feedster can gather publicly, Google, Yahoo, and Technorati historically crawl web pages (HTML) and largely ignore RSS feeds as unique sources of data for their search indices. Their crawlers look at the same web pages that you and I and other human beings do. A lot of the work they do from that point is to remove cosmetic information and redundancy to only include the information that is accretive to their relevancy metrics.


Archiving

If this is true, I find it somewhat surprising. I've got archived Blogspot posts that Google has picked up but that neither Feedster nor Technorati has found. So I'm more than slightly skeptical that Technorati crawls anything other than current content. Again, this isn't a new issue for Technorati; it is well known, so there is no basis to suddenly state that Technorati [is or is not better or worse] relative to Feedster. There's no new information.

I'm just surprised that, given what David Sifry has apparently left unchallenged and what I've come to accept [that there are no historical changes to Technorati in the deep archives], Feedster has reached a conclusion 180 degrees opposite.

I conclude there is no news, just a smokescreen. Perhaps someone has been taking lessons from Bob Wyman?

Feeds are pinged

Second, to suggest that these services "ignore RSS feeds" seems slightly strange. All this time I've been pinging PingOMatic, and my Atom feed is showing up just fine. How can one "ignore an RSS feed" and at the same time get that feed to register? Answer: one can't. This statement makes no sense.
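For readers unfamiliar with pinging, here is a minimal Python sketch of what a ping actually carries. It assumes the common `weblogUpdates.ping` XML-RPC method, which takes only a blog name and a blog URL. It also assumes `http://rpc.pingomatic.com/` as the PingOMatic endpoint. `build_ping` and `send_ping` are illustrative helper names, not part of any service's API.

```python
import xmlrpc.client

# Assumed PingOMatic XML-RPC endpoint; verify before relying on it.
PINGOMATIC_ENDPOINT = "http://rpc.pingomatic.com/"


def build_ping(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping call as an XML-RPC request body.

    The basic ping carries only two strings: the blog's name and its URL.
    """
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = PINGOMATIC_ENDPOINT):
    """Send the ping over the network and return the server's response struct."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(blog_name, blog_url)
```

The ping itself contains only the name and URL. The receiving service still has to fetch the blog or its feed to see what actually changed.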

Further, if Feedster did use Google for stats, there would be no problem with incorrect links. But there are problems. The links have historically not matched, and then there were changes, updates, and revisions.

It looks as though the issue is either whether the service can reliably take in existing information from an outside service, or whether there's an internal problem with the service. I'm not convinced the interface necessarily works correctly. However, this isn't a problem with the Google-Technorati interface but something going on within Technorati.

However, whether Technorati [does or does not] have a problem [a] internally or [b] reading Google is unrelated to whether [c] Technorati reads the feed. Technorati gets pinged when I ping PingOMatic, so I don't see any basis for saying that feeds are ignored.

Updates not picked up

Third, I'm not convinced that the services' web crawlers look at web pages or do anything with updates. They simply take a ping from PingOMatic, and that's it. They don't crawl anything on the web; if they did, why wouldn't they update their old content after the ping?

Besides, the purpose of having a ping was to get rid of these bots. Or are we suggesting that, despite the ping, the bots are needed as a backup? Wow. I find this incredible and, bluntly, unpersuasive.

If I update an old file, the services don't register the change, just the original ping. Deeply archived content remains un-updated [not changed in the service] until the user does substantial work, both republishing it and making sure the feed actually shows up in the aggregator.

I may republish new content on the same original site, and the feed will get updated, but I see no evidence that Technorati then goes back and updates the original information in its archive. It may do so, but I'm not convinced that updates to deeply archived data are registered or recorded.
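One way to picture the update problem: a service would have to re-fetch an archived page and compare it with what it stored before. A ping for the blog's front page does not, by itself, say which old post changed. The following Python sketch is hypothetical. `Archive` and `ingest` are illustrative names, and SHA-256 content hashing is an assumed change-detection method. It is not a description of how Technorati actually works.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a post's text so later versions can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Archive:
    """Hypothetical index keyed by post URL, storing the last-seen content."""

    def __init__(self):
        self.entries = {}  # url -> (hash, text)

    def ingest(self, url: str, text: str) -> str:
        """Record a fetched post and return 'new', 'updated', or 'unchanged'."""
        new_hash = content_hash(text)
        old = self.entries.get(url)
        if old is None:
            self.entries[url] = (new_hash, text)
            return "new"
        if old[0] != new_hash:
            # Only detected because this specific URL was fetched again.
            self.entries[url] = (new_hash, text)
            return "updated"
        return "unchanged"
```

A service that stores only the first version and never re-fetches the archived URL will never reach the "updated" branch.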

Multi-platform, multi-channel comparisons

If Technorati compared two inputs [one from Google and one from PingOMatic], it might have grounds to suspect a problem. But I'm not convinced this kind of cross-platform comparison happens at either Technorati or PubSub.

It could happen. But if it does happen at Technorati, then it remains to be explained why multiple pings go in and get reported, yet the links do not show up, while at the same time there's historical data in Google that doesn't match Technorati.

If Technorati did compare its own data with Google's, alarms would be going off. That there continues to be "no problem" [no alarms going off, content still being dropped, and reports that things are fine] suggests that any checks against Google [if they exist] are not compared with the inputs from PingOMatic.

Of course, perhaps we can learn from PubSub's problem: its apparent failure to quickly recognize that a multi-channel, cross-platform check was not actually being exercised during QA testing, yet was apparently reported as a successful test.
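The cross-check described above could be as simple as a set comparison. This hypothetical Python sketch assumes a service has two lists of post URLs, one from pings and one from a search-engine crawl. It raises an alarm when the share of URLs seen by only one channel exceeds an arbitrary 5% threshold. `cross_check` and the threshold are illustrative, not any service's actual QA check.

```python
def cross_check(ping_urls, crawl_urls, threshold=0.05):
    """Compare URLs from the ping stream with URLs from a crawl.

    Returns the URLs each channel missed and whether the mismatch rate
    (URLs seen by only one channel / all URLs seen) exceeds the threshold.
    """
    pinged = set(ping_urls)
    crawled = set(crawl_urls)
    missing_from_pings = crawled - pinged   # e.g. archived posts only the crawler found
    missing_from_crawl = pinged - crawled   # e.g. pinged posts that never got indexed
    union = pinged | crawled
    mismatch = len(missing_from_pings) + len(missing_from_crawl)
    gap_ratio = mismatch / len(union) if union else 0.0
    return {
        "missing_from_pings": sorted(missing_from_pings),
        "missing_from_crawl": sorted(missing_from_crawl),
        "alarm": gap_ratio > threshold,
    }
```

If a check like this were run and its result actually inspected, the mismatches between pinged content and Google's archives would show up as alarms rather than as "no problem."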

Redundancy exists

Fourth, I'm not convinced that the services remove redundancy. Rather, the services fail to display the content despite the ping and the open blogs. I've seen the same Blogspot posts reported multiple times: once via an Atom feed, again via a 2RSS feed, and a third time via a FeedBurner feed.

If the services removed redundancy, we would still have to explain why some aggregators [that use the same PingOMatic that Technorati relies upon] also report the same feeds three times, as if they were distinct.

Again, the services take the pings and report the content, but do not necessarily strip out duplicates based on content; at most they do so based on URIs [maybe]. Whether there's a cross-platform check on content is doubtful. I think they look only at URIs, not content.

Moreover, this assumes that the URIs are correct and link to content. That has yet to be proven; it shouldn't simply be assumed.
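To illustrate the difference between URI-based and content-based de-duplication, here is a minimal Python sketch. The three entries stand in for the same post arriving via an Atom feed, a 2RSS feed, and a FeedBurner feed. The URIs are placeholders. Hashing normalized title and body text is one assumed way to fingerprint content, not what any particular service does.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source_uri: str  # where the aggregator got this copy of the post
    title: str
    body: str


def dedupe_by_uri(entries):
    """Keep the first entry for each distinct source URI."""
    seen, kept = set(), []
    for e in entries:
        if e.source_uri not in seen:
            seen.add(e.source_uri)
            kept.append(e)
    return kept


def _fingerprint(e: Entry) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences match.
    normalized = " ".join((e.title + " " + e.body).lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_by_content(entries):
    """Keep the first entry for each distinct content fingerprint."""
    seen, kept = set(), []
    for e in entries:
        fp = _fingerprint(e)
        if fp not in seen:
            seen.add(fp)
            kept.append(e)
    return kept


# Placeholder URIs standing in for the same post delivered through three feeds.
entries = [
    Entry("http://example.blogspot.com/atom.xml#post-1", "Feeds are pinged", "Same post text."),
    Entry("http://2rss.example/example#post-1", "Feeds are pinged", "Same  post text. "),
    Entry("http://feeds.example/feedburner/example#post-1", "Feeds are pinged", "same post text."),
]
```

Keyed on URI, all three copies survive, which matches the triple listings described above. Keyed on a content fingerprint, they collapse to one.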

Feedster's gaps

As a parting shot: if Feedster decides to exclude the bloggers from its pings [or whatever it wants to do], it's no skin off my nose. For the most part, Feedster doesn't capture my content. Despite the drops and gaps in Technorati, I'd use Technorati before Feedster.

Technorati and PingOMatic seem to have gotten the issues resolved, even sending personnel over to PingOMatic. Other than spewing apparent nonsense, what's Feedster really doing about these integration issues with both PingOMatic and the evolving FeedMesh?

Technorati's David Sifry has been responsive on these issues. These blogs have been up for months. Why wait until now to say something? You've had five [count them, 5] months to come up with a story, and this is the best you can do? Give me a break!

Oh, that's right. Scott, you use Feedster to monitor who is talking about Feedster.

Ladies and Gentlemen, I present to you the most excellent CEO of Technorati! Way to go, Dave! Another example of a great CEO continuing to provide superior customer service. Do we need any other examples? I don't think so.

Translation: I don't bother checking Feedster or PubSub for content anymore; I have no basis to estimate what percentage of the content doesn't get picked up. I search with Google first, then Technorati/Blogdigger.

Phishing

As for the phishing/spamming problem that Feedster has, you're free to review the various proposals regarding Gmail and tailor them to Feedster.

Also, a secondary check could be set up: once the services get the feed's ping, there could be a second round of confirming checks based on IP, content, and the source of the feed to ensure the input to Feedster was bona fide. In my view, this could be explored as a way to ensure the content was not spam before posting it, or even before letting it pass out of PingOMatic/FeedMesh.
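A rough Python sketch of such a second-round check follows. Everything in it is hypothetical: `confirm_ping`, the keyword blocklist, and the allowlist of addresses trusted to ping for a given blog are assumptions for illustration. A keyword screen is also far cruder than a real spam filter.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real service would use a maintained list or a classifier.
SPAM_TERMS = ("casino", "cheap pills", "payday loan")


def confirm_ping(pinged_url, feed_link, sender_ip, trusted_ips, content,
                 spam_terms=SPAM_TERMS):
    """Second-round check on a ping before its content is posted.

    pinged_url  -- blog URL carried in the ping
    feed_link   -- the site link declared inside the fetched feed
    sender_ip   -- address the ping arrived from
    trusted_ips -- assumed allowlist of addresses trusted to ping for this blog
    content     -- text of the fetched entry
    Returns a list of rejection reasons; an empty list means the ping passes.
    """
    reasons = []
    # Source check: the feed should point back to the same site the ping named.
    if urlparse(pinged_url).hostname != urlparse(feed_link).hostname:
        reasons.append("feed does not point back to the pinged blog")
    # IP check: the ping should come from an address trusted for this blog.
    if sender_ip not in trusted_ips:
        reasons.append("ping came from an untrusted address")
    # Content check: a naive, case-insensitive keyword screen.
    lowered = content.lower()
    hits = [term for term in spam_terms if term in lowered]
    if hits:
        reasons.append("spam terms found: " + ", ".join(hits))
    return reasons
```

Only pings that pass all three checks would go on to be posted or relayed.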

In my view, Feedster has larger problems than simply whether there are comment spammers. There is content that's getting missed. I couldn't care less whether Feedster stopped working with Blogger.

Bottom line

Rafer and Mud are in two different universes. It should be no surprise that Feedster [a] hasn't fixed its missed-content problem, or [b] is thinking about cutting off one of its sources of content: its analysis is flawed, as the comments above show.

Run! We can only speculate that there's more nonsense on the way. There appears to be an integration issue between PingOMatic and Feedster that Feedster either isn't aware of or has not resolved. Scott, are you related to Bob Wyman?