Archive for the 'Internet' Category

IMDB’s weighted rank

on Sunday, July 28th, 2013

I was cleaning old files from my computer and bumped into this note:

weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the top 50
(currently 200)
C = the mean vote across the whole report
(currently 6.9)

The above (from years ago) is what used to be the weighted ranking formula for IMDB's top X movies.
Needed to note this somewhere before deleting the file.
There you have it.
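
For what it's worth, here is the formula as a minimal PHP sketch (the sample numbers are made up, not real IMDB data):

<?php
// Weighted rank, as defined above: WR = (v/(v+m))*R + (m/(v+m))*C
function weightedRank($R, $v, $m = 200, $C = 6.9) {
    return ($v / ($v + $m)) * $R + ($m / ($v + $m)) * $C;
}

// e.g. a movie rated 8.5 by 900 voters gets pulled toward the mean of 6.9:
echo weightedRank(8.5, 900); // roughly 8.21
?>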

Designing for Social Traction

on Monday, September 28th, 2009

An extremely nice approach. It's a good tool for teaching people about the social web.

Source

CPM and Ajax (a.k.a New Metrics)

on Sunday, January 27th, 2008

I was trying not to write about Facebook, but it has come to the point where it's a good example for my post.

Classic web advertising with banners (and in some cases text links) is still paid for by impressions. Take the banners in the left column of Facebook pages: every time a user changes a page, an ad is shown, and the advertiser is charged per 1,000 impressions. Everybody knows there is nothing interesting about this.

Some weeks ago, Facebook changed their photo gallery to an Ajax photo gallery.
When you viewed a photo and started browsing to the next ones, an Ajax call fetched the new photo and showed it on the very same page, without a refresh. Since there was no refresh, the rest of the page, including the banner on the left, never changed.

Facebook recently switched back to their old way: one photo is one page view again. I wondered why, and it was really obvious. They had lost major page views with just this change. On the old system, people were rapidly viewing photos, spending (in most cases) no more than 15 seconds per photo. For an album of 20 photos, that was an easily generated 20 page views for them. The new system slowed down the page views, but users were now spending more time on the same page with the same banner. Still, by the old metric, that's 1 page view for 20 photos.
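
To make the arithmetic concrete, here is a toy PHP sketch of the two counting models (my own illustration, not anything from Facebook's actual code):

<?php
// Toy model of billable banner impressions for one photo album.
// Classic gallery: every photo is a full page load, so the banner
// re-renders each time and each photo counts as an impression.
// Ajax gallery: only the initial load renders the banner; the rest
// of the photos arrive via XMLHttpRequest on the same page.
function impressionsForAlbum($photoCount, $isAjaxGallery) {
    return $isAjaxGallery ? 1 : $photoCount;
}

echo impressionsForAlbum(20, false); // 20 impressions (classic gallery)
echo "\n";
echo impressionsForAlbum(20, true);  // 1 impression (Ajax gallery)
?>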
Read more…

4C + V + P = ? and why?

on Thursday, November 15th, 2007

I was reviewing the formula 4C + V + P = Web 3.0

This is what I think:
Read more…

Hack a GOV for black hat SEO

on Saturday, August 11th, 2007

It was weird when I saw a keyword that I've been regularly watching pop up on a .gov site.

Some blackhats, hoping to get the PageRank benefits of a .gov site, hacked the site's forum and placed a gateway-like page there. I guess the search results come from Google as well.

Check it out yourself.
nevadacityca.gov

PS: the keyword I was looking for was the name of one of my sites, not cealis lol.

Missing categories in Google’s Directory

on Monday, June 4th, 2007

I was checking the Free Content category on DMOZ.org and saw that it had been updated. I also tried to check the Google version. Big surprise! The category does not exist in Google's DMOZ copy.

I'm not sure if this is on purpose, but most likely Google is also moderating the biggest moderated directory on the net. Hehe, that's a good one.

See it yourself (I'm not sure how long these links will stay alive):
DMOZ link: updated around June 4th
Google's link: doesn't exist as of June 5th

I've also found that if you search for a URL that exists in a certain DMOZ category, Google returns a result, but the link is simply a 404.
See the first link.

One last note: Yahoo's directory doesn't have the category either. I can kind of understand that, since the category is under the Business branch and appearing under that branch is a paid service at Yahoo.

WebmasterWorld gave me a good laugh – sitemaps again

on Sunday, May 6th, 2007

I was reading a post at WebmasterWorld and started laughing after reading the opener:

Are Scrapers Exploiting the sitemap.xml file?

Many people seem to be posting saying that after adding sitemaps they are suffering a problem with content. Could sitemaps.xml be being abused? Is the new content title and meta tag scraped before the sitemap is submitted to Google by sitemap generators?

Read more…

Ask.com and Their bot!

on Wednesday, April 18th, 2007

After their initiative to put sitemaps into the robots.txt, I recently posted about how to identify robots and decide whether or not to serve them the sitemap. I believe it's extremely important for webmasters to protect themselves from site scrapers; a sitemap in the robots.txt is like a highway sign pointing the easy way to scrape a site.
With this idea in mind, I also added a small snippet to my sitemap that checks whether the bot comes from a company I want to serve the sitemap to.
Read more…

Sitemaps in the robots.txt – Happy Harvesting

on Wednesday, April 11th, 2007

I've just read the Google Webmaster blog post about the news of Ask.com supporting Sitemaps.org's sitemap format.
This is really great news for all the people who like to be crawled faster and more accurately.
For me, the more interesting part of this news is sitemaps.org's proposal to include sitemaps in the robots.txt.

You simply add a line to your robots.txt saying:

Sitemap: <sitemap_location>
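
For instance (with example.com standing in for your own domain):

Sitemap: http://www.example.com/sitemap.xml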

This part is really cool, but for site harvesters it's an unbelievable tool. You hand over the keys to your site, and web harvesters can crawl it really easily, because you've probably put all of your site's pages into your sitemap.

Sounds like a good plan in an ideal world. With all the cloakers and content scrapers out there, you must be really smart not to be ripped apart.

My suggestion is to know who you're serving the sitemap to. Currently only Google, Yahoo and Ask are supporting this sitemap format; no other site has anything to do with it.
Here is a simple check you can add at the beginning of your sitemap script:

<?php
// Decide whether a visitor's IP belongs to a search engine bot we trust.
function botIsAllowed($ip) {
    // Reverse DNS: get the hostname behind the IP.
    $host = strtolower(gethostbyaddr($ip));
    $botDomains = array('.inktomisearch.com', // Yahoo (Slurp)
                        '.googlebot.com',     // Google
                        '.ask.com',           // Ask
                       );

    // Check whether the hostname ends with a whitelisted bot domain.
    foreach ($botDomains as $bot) {
        if (strpos(strrev($host), strrev($bot)) === 0) {
            // Forward DNS: confirm the hostname resolves back to the same IP.
            $qip = gethostbyname($host);
            return ($qip == $ip);
        }
    }
    return false;
}

if (!botIsAllowed($_SERVER['REMOTE_ADDR'])) {
    echo "Banned!";
    exit;
}
?>

I'm sure everyone gets the idea of reverse DNS plus forward DNS checking.
If I missed any decent site that uses sitemaps, let me know.

Note: if you're still using static sitemaps (!), you can just include the XML after the code.
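
For example, a minimal sitemap.php along those lines might look like this (file names are hypothetical; assume the botIsAllowed() function above is saved as botcheck.php):

<?php
// Hypothetical sitemap.php: run the bot check first, then hand the
// pre-generated static sitemap only to verified search engine bots.
require 'botcheck.php'; // defines botIsAllowed() from the snippet above

if (!botIsAllowed($_SERVER['REMOTE_ADDR'])) {
    echo "Banned!";
    exit;
}

header('Content-Type: application/xml');
readfile('sitemap.xml'); // the static sitemap file on disk
?>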

Goatse Can Get You Jailtime in the US, ouch!

on Monday, April 9th, 2007

I've just read a post saying that posting the goatse man on a board or site with a fake title can get you jail time in the US. I'll not link the infamous photo here, sorry :)

Here is the related US code:
Read more…