Tuesday, November 3, 2009

The Attack Of The Clones

One noticeable spam problem became visible in late 2009, thanks to yet another Blogger error.

This error left a lot of bloggers (some legitimate and others actually spammers) reporting both confusion and indignation.

The confusion showed up in two seemingly separate complaints.

My blog seems to be locked, but when I fill out the CAPTCHA, I get
Your blog is not locked, and does not need review.
What do I do, now?

and

My blog is not spam! Honestly, how could Blogger legitimately call my blog spam?

Both of these reports, echoed by dozens of bloggers and spammers, show two sides of the same story.

The first report was typically about blogs that were rightfully locked - yet a Blogger error produced the message

Your blog is not locked, and does not need review.

The second report exemplifies the despair and frustration felt by thousands of honest people who simply want to publish to their blogs. There are dozens of similar reports, and some dissimilar ones too.

Note: This isn't fiction - I'm not stealing content from the Tom Clancy "Net Force" novels. This is real.

Spammers publish new blogs, constantly, using botnets.

To keep just one or two splogs active (blogs containing actual hacking, porn, and / or spam content), spammers rely on botnets. Each splog publisher ("splogger") may control thousands of hijacked computers, and uses them to publish thousands of blogs each day.

A spammer starts with empty blogs, which do not appear spammy.

A splogger starts by simply publishing empty blogs, constantly, all day long. You may see the empty blogs sometime, if you go "Next Blog" surfing. They have a title, an archives gadget in the sidebar, and maybe one or two posts with random garbage. They also have a Profile gadget, with a link to the "owner", and each "owner" has 10 blogs.

He then adds random content, scraped from the web - maybe, from your blog.

The splogger next takes one legitimate blog - in this case, your blog. He scrapes content from your blog, takes a dozen or so of his empty blogs, adds your content to each, and republishes each blog. Now, there are a dozen clones of your blog, plus your original, in BlogSpot.

He then adds spammy content, to a few of his blogs.

Next, the splogger takes a couple of his recently created clones, adds the payload - hacking, porn, and / or spam content, or possibly, links to other blogs or websites containing the payload - and republishes his now active splogs.

Each time, he uses just a couple of blogs (cloned from your blog), and keeps the others in reserve. He has dozens of other splogs with active content out there too, using content scraped from other legitimate blogs. The clones of your blog, joining the clones of other blogs, are now active members of his splog farm.

As Blogger deletes discovered spammy blogs, the spammer uses more like yours.

As the splogs with active content are detected and removed by Blogger, the splogger simply activates other reserves, placing payload into each, in turn. By the time all of the reserves that are clones of your blog are used up, the well-designed spam blog farm will have clones of still more legitimate blogs, ready to be activated.
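The replacement pattern described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function `run_farm`, the blog names, and the "oldest active splog is removed first" rule are all hypothetical, and nothing here reflects how any real splog farm or Blogger process is implemented.

```python
from collections import deque


def run_farm(reserves: list[str], active_slots: int, takedowns: int) -> tuple[list[str], list[str]]:
    """Simulate takedowns against a farm that refills from its reserves.

    Hypothetical model: each takedown removes the oldest active splog,
    and the splogger immediately activates the next reserve, if any.
    Returns (still_active, deleted).
    """
    pool = deque(reserves)
    # The splogger activates only a couple of blogs to begin with.
    active = [pool.popleft() for _ in range(min(active_slots, len(pool)))]
    deleted: list[str] = []

    for _ in range(takedowns):
        if not active:
            break
        # Blogger detects and removes one active splog.
        deleted.append(active.pop(0))
        # The splogger replaces it from the reserve pool.
        if pool:
            active.append(pool.popleft())

    return active, deleted


if __name__ == "__main__":
    clones = [f"clone-{i}" for i in range(1, 6)]
    active, deleted = run_farm(clones, active_slots=2, takedowns=3)
    print("still active:", active)
    print("deleted:", deleted)
```

In this toy run, three takedowns remove three splogs, yet the number of active splogs stays the same, because each removal is answered from the reserves.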

A spam blog has a distinctive life cycle.

Considering this process, we see the 5 stage life cycle of each blog in the splog farm.

  1. Empty (just published).
  2. Reserve (previously empty, republished with scraped content added).
  3. Active (previously reserve, republished with payload added).
  4. Detected (previously active, locked by the Blogger Spam Classification bot).
  5. Deleted.

The plan here is that each splog will pass through each stage in the life cycle, in proper sequence. Since the splogger publishes thousands of splogs daily, if Blogger were simply to lock and then delete the active splogs, the splogger would just activate reserve splogs, as needed. This is good project management, by the sploggers.
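The five stages can be written down as a simple ordered state machine. This is a minimal sketch for illustration only; the names `Stage`, `advance`, and `life_cycle` are hypothetical, and the strict one-step-at-a-time rule is an assumption taken from the "proper sequence" described above.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five stages of a splog, numbered as in the list above."""
    EMPTY = 1      # just published
    RESERVE = 2    # scraped content added
    ACTIVE = 3     # payload added
    DETECTED = 4   # locked by the spam classifier
    DELETED = 5


def advance(stage: Stage) -> Stage:
    """Move a splog to the next stage, in proper sequence."""
    if stage is Stage.DELETED:
        raise ValueError("a deleted splog has no next stage")
    return Stage(stage + 1)


def life_cycle() -> list[Stage]:
    """Walk one splog from EMPTY through to DELETED."""
    stage = Stage.EMPTY
    seen = [stage]
    while stage is not Stage.DELETED:
        stage = advance(stage)
        seen.append(stage)
    return seen


if __name__ == "__main__":
    print(" -> ".join(s.name for s in life_cycle()))
```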

Blogger shortens the life cycle, using fuzzy blog identification.

Blogger aims to shorten the splog life cycle. When the anti-spam bot detects an active splog, Blogger searches its database of suspected splogs, finds similar blogs with no active content (the reserves), and locks the reserves too.

Unfortunately, when Blogger locks the reserve splogs, they are also going to lock your blog. Fuzzy spam detection techniques can't tell the difference between your blog, and the clones, because there is no difference.
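To see why a similarity check cannot separate the original from its clones, here is a minimal sketch using word shingles and Jaccard similarity. This is a stand-in technique chosen for illustration; Blogger's actual classifier is not described here, and the function names, the sample texts, and the 0.7 threshold are all assumptions.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lock_similar(active_splog: str, candidates: dict[str, str],
                 threshold: float = 0.7) -> list[str]:
    """Names of blogs whose content resembles a detected active splog.

    The threshold is an arbitrary value picked for this example.
    """
    return sorted(name for name, text in candidates.items()
                  if jaccard(active_splog, text) >= threshold)


if __name__ == "__main__":
    your_post = "one two three four five six seven eight nine ten eleven twelve"
    blogs = {
        "your-blog": your_post,
        "reserve-clone": your_post,  # scraped copy, no payload yet
        "unrelated": "a completely different blog about gardening and tomatoes",
    }
    active = your_post + " buy cheap pills"  # a clone with payload added
    print(lock_similar(active, blogs))
```

Because the reserve clone and the original score identically against the active splog, any threshold that catches the clone catches the original too.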

If a spammer clones your blog, your blog will look just like a spam blog.

A successful clone is an indistinguishable replica of the original blog. Your blog looks like one of the reserve splogs. This leaves you, and bloggers like you, reporting

My blog is not spam! Honestly, how could Blogger legitimately call my blog spam?

If the spam blogs went untouched, the search engines would penalise your blog.

When you look at this differently, though, you see that Blogger is actually helping you. Were your blog, and the clones, to remain in BlogSpot undisturbed, the search engines would see the clones, and levy a huge duplicated content penalty on all aliases of the content - including your original blog.

If you are able to declare your blog as legitimate to Blogger, Blogger can later delete the sploggy clones - and the search engines will, hopefully, continue to index your blog, as legitimate and unique content.

In 2012, Blogger took the Next Step, and made "Next Blog" a less productive launching ground for splog farms. With blog owners carelessly deleting and renaming their blogs, however, the blog clone farm continues as a useful model for spam distribution.

Elm0D

Author & Editor
