How to check your new dedicated server?

September 16th, 2007 by Harun Yayli | 2 Comments »

You’ve just bought a new dedicated server. What should you do before moving in?
Check what was promised against what you were given!

  1. Check the CPU
    # dmesg | grep "CPU:"
    CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2813.54-MHz 686-class CPU)
  2. Memory
    # dmesg | grep memory
    real memory = 1072627712 (1022 MB)
    avail memory = 1040084992 (991 MB)
  3. Hard Disk
    # di -th | grep Total
    Total 104.4G 1.2G 94.8G 9%
  4. Number of IPs (I’ve masked my IPs with ?)
    # ifconfig -a | grep inet
    inet ??.???.??.?? netmask 0xffffffe0 broadcast ??.???.?.?
    inet ??.???.??.?? netmask 0xffffffe0 broadcast ??.???.?.?
    inet ??.???.??.?? netmask 0xffffffe0 broadcast ??.???.?.?
    inet 127.0.0.1 netmask 0xff000000
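
The hex netmask in that ifconfig output is easier to compare with what the provider promised once it is in dotted or prefix form. Here is a minimal sketch, assuming a shell with 64-bit arithmetic such as sh or bash; the function name hexmask_info is made up for this example. Note that the mask only tells you the size of the subnet you sit in, not how many of its addresses are actually assigned to you.

```shell
#!/bin/sh
# Convert a hex netmask (as printed by BSD ifconfig) into dotted-quad form,
# a CIDR prefix length, and the number of addresses in the block.
# hexmask_info is a hypothetical helper name, not a standard tool.
hexmask_info() {
    mask=$(( $1 ))      # e.g. 0xffffffe0 becomes 4294967264
    dotted="$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"

    # Count the set bits to get the prefix length.
    bits=0
    m=$mask
    while [ "$m" -gt 0 ]; do
        bits=$(( bits + (m & 1) ))
        m=$(( m >> 1 ))
    done

    # Addresses in the block, including network and broadcast addresses.
    size=$(( 1 << (32 - bits) ))
    echo "$dotted /$bits $size"
}

hexmask_info 0xffffffe0   # prints: 255.255.255.224 /27 32
```

So 0xffffffe0 is a /27, a block of 32 addresses shared with whoever else is on that subnet.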

Hack a GOV for black hat SEO

August 11th, 2007 by Harun Yayli | No Comments »

It was weird to see a keyword that I regularly watch pop up on a .gov site.

Some blackhats, hoping to get the page rank benefits of a .gov site, hacked the forum of the site and placed a gateway-like page. I guess the search results come from Google as well.

Check it out yourself.
nevadacityca.gov

PS: the keyword I was looking for was the name of one of my sites, not Cialis lol.

Immunity on Duplicate Content

June 5th, 2007 by Harun Yayli | No Comments »

I’m trying to understand how Google handles duplicate content, and wondering why some sites are luckier than others, almost having immunity from duplicate content issues.
Some collective sites like answers.com seem to be immune to the duplicate content penalties.
See this page about Smokey Bear (PR4). Correct me if I’m wrong, but all I’m seeing is a rip of the Wikipedia page (PR6) along with lots and lots of advertisements.
Who can really say that this page benefits surfers when the Wikipedia page is exactly the same thing? I guess answers.com is taking this duplication tactic through the roof. Google is also helping them drive traffic by hard-linking to their content as the “definition” reference.
I’m expecting no comments from Matt Cutts :), just thinking out loud.

Missing categories in Google’s Directory

June 4th, 2007 by Harun Yayli | No Comments »

I was checking the Free Content category on DMOZ.org and saw that it had been updated. I also tried to check the Google version. Big surprise! The category does not exist in Google’s DMOZ copy.

I’m not sure if this is on purpose, but it’s most likely that Google is also moderating the biggest moderated directory on the net. Hehe, that’s a good one.

See it yourself (I’m not sure how long these links will stay alive)
Dmoz link : updated around June 4th
Google’s Link: doesn’t exist as of June 5th

I’ve also found that if you search for a URL that exists in that DMOZ category, Google returns a result, but the link is simply a four-oh-four (404).
See the first link.

Last notes: Yahoo’s directory doesn’t have the category either. I kind of understand that, since the category is under the Business branch and appearing under that branch is a paid service at Yahoo.

Webmaster’s World Gave me a good laugh - sitemaps again

May 6th, 2007 by Harun Yayli | No Comments »

I was reading a post at webmaster’s world, and started laughing after reading the opener:

Are Scrapers Exploiting the sitemap.xml file?

Many people seem to posting saying after adding sitemaps they are suffering a problem with content. Could sitemaps.xml be beeing abused. Is the new content title and meta tag scraped before the sitemap is submitted to google by sitemap generators?

Read more…

Full text index and other complex indexes together

April 30th, 2007 by Harun Yayli | 1 Comment »

Always remember, if you’re using MyISAM tables, KEEP THE INDEX AS SMALL AS POSSIBLE.

That means your full text index will be huge, and for faster queries you don’t want to keep it in the same table as your other types of index.

Simply create another table to hold the text fields and the full text indexes, and left join it. It works way faster for me on a table with 230K rows.

The data takes about 500MB and the indexes about 700MB. I don’t think I over-indexed the table; the delicate business logic needs those indexes. Even a simple one-field query (with a normal index on that field) was taking more than 30 seconds. With the full text indexes moved out of the table, it now takes 1 second.
Wow, a good thing to remember. A mental note to myself again.
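
To make the split concrete, here is a rough sketch of what such a layout could look like. Every table and column name (items, items_text, status and so on) is invented for illustration and is not my real schema. The shell function only prints the MySQL statements, so you can read them first and then pipe them into the mysql client yourself.

```shell
#!/bin/sh
# Print MySQL statements for a split-table layout: a narrow table with the
# ordinary indexes, and a side table with the text columns and the full
# text index. All names here are hypothetical.
split_table_sql() {
    cat <<'SQL'
-- Narrow table: only the columns used for filtering, with small indexes.
CREATE TABLE items (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    status  TINYINT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_status (status)
) ENGINE=MyISAM;

-- Side table: the text columns and their (large) full text index.
CREATE TABLE items_text (
    item_id INT UNSIGNED NOT NULL,
    title   VARCHAR(255) NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (item_id),
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=MyISAM;

-- Simple lookups filter on the narrow table and join the text in.
SELECT i.id, t.title
FROM items AS i
LEFT JOIN items_text AS t ON t.item_id = i.id
WHERE i.status = 1;
SQL
}

split_table_sql
```

Something like `split_table_sql | mysql your_database` would apply it, where your_database is a placeholder. Whether you see the same speedup depends on your data and queries, so measure before and after.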

More sitemap issues!

April 19th, 2007 by Harun Yayli | No Comments »

It’s a jungle out there!
Now I’ve realized that Yahoo is indexing the sitemap itself, directly in the search results!!!
This is ridiculous.
Check this link: the last entry is from my sitemap.

Ask.com and Their bot!

April 18th, 2007 by Harun Yayli | No Comments »

After their initiative to put sitemaps into the robots.txt, I recently posted about how to identify robots and decide whether to serve them the sitemap or not. I believe it’s extremely important for webmasters to protect themselves from site scrapers. A sitemap in the robots.txt is like a highway sign pointing to the easy way to scrape a site.
With this idea in mind, I also added a small snippet to my sitemap that checks whether the bot is from a company I’m willing to serve the sitemap to.
Read more…

Sitemaps in the robots.txt Happy Harvesting

April 11th, 2007 by Harun Yayli | No Comments »

I’ve just read the news on the Google Webmaster’s blog about ask.com supporting Sitemaps.org’s sitemap format.
This is really great news for all the people who like to be crawled faster and more accurately.
For me, the more interesting part of this news is sitemaps.org’s proposal to include sitemaps in the robots.txt.

You simply add a line to your robots.txt saying:

Sitemap: <sitemap_location>
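
If you want to script it, here is a minimal sketch of appending that line from the shell. The function name add_sitemap_line, the file path and the example.com URL are all placeholders of mine, and it assumes your robots.txt already ends with a newline.

```shell
#!/bin/sh
# Append a Sitemap line to a robots.txt file unless the exact same line
# is already there. add_sitemap_line is a hypothetical helper name.
add_sitemap_line() {
    robots_file=$1
    sitemap_url=$2
    # -x matches whole lines only, -F treats the URL as plain text.
    grep -qxF "Sitemap: $sitemap_url" "$robots_file" 2>/dev/null ||
        printf 'Sitemap: %s\n' "$sitemap_url" >> "$robots_file"
}

# Example call (placeholder path and URL):
# add_sitemap_line ./robots.txt "http://www.example.com/sitemap.xml"
```

Running it twice with the same URL leaves a single Sitemap line in the file.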

This part is really cool, but for site harvesters it’s an unbelievable tool. You are handing over the key to your site: web harvesters can crawl it really easily, because you’ve probably put all of your site’s pages into your sitemap.

Sounds like a good plan in an ideal world. With all the cloakers and content scrapers you must be really smart not to be ripped apart.

My suggestion is to know who you’re serving the sitemap to. Currently Google, Yahoo and Ask support this sitemaps.xml, and no other site has any business with it.
Here is a simple check you can add at the beginning of your sitemap script:

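<?php
    // Return true only if the client IP passes a reverse DNS lookup
    // against the whitelist and a matching forward DNS lookup.
    function botIsAllowed($ip){
        // Get the reverse DNS of the IP. gethostbyaddr() returns the
        // unmodified IP on lookup failure and false on malformed input.
        $host = gethostbyaddr($ip);
        if ($host === false || $host === $ip){
            return false;
        }
        $host = strtolower($host);
        $botDomains = array('.inktomisearch.com',
                            '.googlebot.com',
                            '.ask.com',
                      );

        // Check whether the reverse DNS name ends with a whitelisted domain.
        foreach($botDomains as $bot){
            if (substr($host, -strlen($bot)) === $bot){
                // Forward check: the hostname must resolve back to the same IP.
                $qip = gethostbyname($host);
                return ($qip === $ip);
            }
        }
        return false;
    }

    if (!botIsAllowed($_SERVER['REMOTE_ADDR'])){
        echo "Banned!";
        exit;
    }
?>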

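I’m sure everyone can get the idea of reverse DNS and forward DNS checking: look up the hostname for the IP, make sure it belongs to a crawler you trust, then resolve that hostname again and make sure it points back to the same IP.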
If I missed any decent site that uses the sitemaps let me know.

Note: If you’re still using static sitemaps (!) you can just include the xml after the code.
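
One detail of the whitelist check deserves a closer look: the domain has to match at the end of the hostname, not just appear somewhere in it, otherwise anyone who controls their own reverse DNS could slip through. Here is a small shell sketch of just that suffix test, using the same three domains as the PHP whitelist. The function name host_is_allowed and the sample hostnames are only illustrations, and this covers the name check alone, not the DNS lookups.

```shell
#!/bin/sh
# Succeed only when the hostname ends with a whitelisted crawler domain.
# host_is_allowed is a hypothetical helper; it does no DNS lookups.
host_is_allowed() {
    case "$1" in
        *.inktomisearch.com|*.googlebot.com|*.ask.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative hostnames: the second one merely contains a trusted domain.
host_is_allowed "crawl-66-249-66-1.googlebot.com" && echo "allowed"
host_is_allowed "googlebot.com.evil.example" || echo "rejected"
```

The leading dot in each pattern matters too: without it, a name like fakegooglebot.com would pass.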

Goatse Can Get You Jailtime in the US, ouch!

April 9th, 2007 by Harun Yayli | No Comments »

I’ve just read a post saying that the goatse man can get you jail time in the US if you post it on a board or site under a fake title. I’ll not link to the infamous photo here, sorry :)

Here is the related US code:
Read more…