Monthly Archives: December 2013

Stackoverflow – lessons learned

Lessons Learned

This is a mix of lessons taken from Jeff and Joel and comments from their posts.

  • If you’re comfortable managing servers, then buy them. The two biggest problems with renting were:
    1) the insane cost of memory and disk upgrades
    2) the fact that they [hosting providers] really couldn’t manage anything.

  • Make larger one time up front investments to avoid recurring monthly costs which are more expensive in the long term.

  • Update all network drivers. Performance went from 2x slower to 2x faster.

  • Upgrading to 48GB RAM required upgrading MS Enterprise edition.

  • Memory is incredibly cheap. Max it out for almost free performance. At Dell, for example, upgrading from 4G memory to 128G is $4378.

  • Stack Overflow copied a key part of the Wikipedia database design. This turned out to be a mistake which will need massive and painful database refactoring to fix. The refactorings will be to avoid excessive joins in a lot of key queries. This is the key lesson from giant multi-terabyte table schemas (like Google’s BigTable) which are completely join-free. This is significant because Stack Overflow’s database is almost completely in RAM and the joins still exact too high a cost.

  • CPU speed is surprisingly important to the database server. Going from 1.86 GHz, to 2.5 GHz, to 3.5 GHz CPUs causes an almost linear improvement in typical query times. The exception is queries which don’t fit in memory.

  • When renting hardware nobody pays list price for RAM upgrades unless you are on a month-to-month contract.

  • The bottleneck is the database 90% of the time.

  • At low server volume, the key cost driver is not rackspace, power, bandwidth, servers, or software; it is NETWORKING EQUIPMENT. You need a gigabit network between your DB and Web tiers. Between the cloud and your web server, you need firewall, routing, and VPN devices. The moment you add a second web server, you also need a load balancing appliance. The upfront cost of these devices can easily be 2x the cost of a handful of servers.

  • EC2 is for scaling horizontally; that is, you can split up your work across many machines (a good idea if you want to be able to scale). It makes even more sense if you need to scale on demand (adding and removing machines as load increases or decreases).

  • Scaling out is only frictionless when you use open source software. Otherwise scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.

  • RAID-10 is awesome in a heavy read/write database workload.

  • Separate application and database duties so each can scale independently of the other. Databases scale up and the applications scale out.

  • Applications should keep state in the database so they scale horizontally by adding more servers.

  • The problem with a scale up strategy is a lack of redundancy. A cluster adds more reliability, but is very expensive when the individual machines are expensive.

  • Few applications can scale linearly with the number of processors. Locks will be taken which serializes processing and ends up reducing the effectiveness of your Big Iron.

  • With larger form factors like 7U, power and cooling become critical issues. Using something between 1U and 7U might be easier to make work in your data center.

  • As you add more and more database servers, the SQL Server license costs can be outrageous. So by starting with scale up and gradually moving to scale out with non-open source software, you can end up in a world of financial hurt.
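The point about locks serializing work on big multi-processor machines can be illustrated with Amdahl's law. The law is not named in the post; it is used here as a standard model, and the serial fractions below are made-up values, not measurements from Stack Overflow.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work
    (e.g. time spent holding locks) cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def print_scaling_table(serial_fraction: float) -> None:
    """Print the theoretical speedup for a few processor counts."""
    for cpus in (1, 2, 4, 8, 16, 32):
        speedup = amdahl_speedup(serial_fraction, cpus)
        print(f"{cpus:>2} CPUs -> at most {speedup:.2f}x")


# Hypothetical workload: 10% of the time is spent in serialized sections.
print_scaling_table(0.10)
```

Under this model, even a small serialized share caps the benefit of adding processors, which is one way to read the "Big Iron" remark in the list.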

    Copied from



CodeIgniter – subqueries

On my latest project at work, built with CodeIgniter, I needed to use subqueries: a select inside another select.

For that I had to download a subquery library for CodeIgniter’s active record class. It lets you use active record methods to create subqueries in SQL queries. It supports SELECT, JOIN, FROM (and other statements, I guess). It also supports subqueries inside subqueries.
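To show what "a select inside another select" means in plain SQL, here is a small sketch using Python's built-in sqlite3 module instead of CodeIgniter. It does not use the subquery library mentioned above, and the `orders` table and its rows are invented for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (name TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ana", 10.0), ("rui", 30.0), ("eva", 50.0)],
    )
    return conn


def orders_above_average(conn: sqlite3.Connection) -> list:
    """Return the orders whose total is above the average total.

    The inner SELECT computes the average; the outer SELECT filters on it.
    """
    query = """
        SELECT name, total
        FROM orders
        WHERE total > (SELECT AVG(total) FROM orders)
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


print(orders_above_average(build_demo_db()))
```

A subquery library for an active record class would presumably generate SQL of this shape from chained method calls.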

Load balancing – nginx and HAProxy

I had a scalability problem at work. With about 3k customers, an average of 500 to 600 concurrent sessions, and a middle-aged all-in-one server, I had to balance HTTP and HTTPS connections across at least 2 servers without changing the main IP address used by the client to start the application.



At the beginning, I evaluated Nginx, a well-established HTTP server (known for its outstanding capability at serving static resources) that can also proxy requests to a pool of servers. I left this road quickly when I noticed that the proxy capability of Nginx does not (yet) support sticky sessions. In fact, when session affinity is required, Nginx can only route connections using the source IP address as the selection key, losing the weight-based round-robin capability (which I need because my servers differ in strength).


Looking for a balancer that supports weighted backends as well as session affinity, I found HAProxy, which is considered the de facto solution for this kind of problem. Again, while evaluating this solution, I found something that bothered me: HAProxy, in its stable version, can balance only HTTP connections; it does not support HTTPS, which I need for the next version of my app.
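The combination described here, weighted round-robin for new sessions plus affinity for existing ones, can be sketched in a few lines. This is a toy model of the idea, not how Nginx or HAProxy implement it; the server names, weights, and session ids are hypothetical.

```python
import itertools
from typing import Dict


class WeightedStickyBalancer:
    """Toy balancer: weighted round-robin for new sessions,
    sticky routing for sessions that were already assigned."""

    def __init__(self, weights: Dict[str, int]) -> None:
        # A server with weight 2 appears twice in the rotation.
        pool = [name for name, weight in weights.items() for _ in range(weight)]
        if not pool:
            raise ValueError("at least one server with a positive weight is required")
        self._rotation = itertools.cycle(pool)
        self._sessions: Dict[str, str] = {}

    def route(self, session_id: str) -> str:
        """Return the backend for a session, assigning one on first sight."""
        if session_id not in self._sessions:
            self._sessions[session_id] = next(self._rotation)
        return self._sessions[session_id]


# Hypothetical setup: the stronger server takes twice as many new sessions.
balancer = WeightedStickyBalancer({"big-server": 2, "small-server": 1})
for sid in ("s1", "s2", "s3", "s1"):
    print(sid, "->", balancer.route(sid))
```

A real balancer usually keeps affinity in a cookie or a shared table rather than in process memory, so this only shows why the two features do not have to conflict.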

Read more at:





HAProxy is really just a load balancer/reverse proxy. Nginx is a Webserver that can also function as a reverse proxy.

Here are some differences:

HAProxy:

  • Does TCP as well as HTTP proxying (SSL added from 1.5-dev12)
  • More rate limiting options
  • The author answers questions here on Server Fault 😉

Nginx:

  • Supports SSL directly
  • Is also a caching server

At Stack Overflow we mainly use HAProxy with nginx for SSL offloading so HAProxy is my recommendation.

Read more at:


“Nginx is becoming the standard for front end load balancing for many high traffic sites and this helps.”

I’ve used nginx as a load balancer, and it’s not pretty. All of the nginx load-balancing modules I’ve used, or seen used (I can think of at least five off the top of my head), have fallen apart under load, or not balanced intelligently, or just been plain bad. The modules I’ve used also tend to be hard to instrument, which makes working out exactly why they’re failing a bit of an adventure. I’m sure (I’d hope, at least) the load balancing modules in nginx have improved over time, but I’d still be very wary of it. In short: nginx is a kick-arse webserver, and I highly recommend it for that purpose, but as a load balancer I’d find something else.

My preference is for IPVS almost everywhere, as it runs at the IP layer and completely avoids all the ugly problems you just can’t avoid with a proxy. If you do feel the need to use a proxy, though, I would strongly recommend HAProxy over nginx. There are (narrow) circumstances in which a proxy is the best solution for the job, and I think HAProxy is the best of the bunch.

Read more at:



An example of how to configure HAProxy and Nginx

Counting the lines of files in a folder, recursively

To estimate the price of IonCube online encryption, I had to count the number of lines in my PHP files inside a specific folder and its subfolders.

Grabbed from “PHP – How to count lines of code in an application”, original by ircmaxell and updated by jasondavis.

Thanks, Stack Overflow!

class Line_Counter
{
    private $filepath;

    public function __construct($filepath)
    {
        $this->filepath = $filepath;
    }

    // Returns an array of "file path => line count" for the given extensions.
    public function countLines($extensions = array('php'))
    {
        $files = array(); // initialised so an empty result is still an array
        $it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($this->filepath));
        foreach ($it as $file)
        {
            // if ($file->isDir() || $file->isDot())
            if ($file->isDir())
            {
                continue; // skip directories, count only regular files
            }
            $parts = explode('.', $file->getFilename());
            $extension = end($parts);
            if (in_array($extension, $extensions))
            {
                $files[$file->getPathname()] = count(file($file->getPathname()));
            }
        }
        return $files;
    }

    // Prints the per-file line counts.
    public function showLines()
    {
        echo '<pre>';
        print_r($this->countLines());
        echo '</pre>';
    }

    // Sums the line counts of all matching files.
    public function totalLines()
    {
        return array_sum($this->countLines());
    }
}

// Get all files with line count for each into an array
$loc = new Line_Counter('.');
$loc->showLines();

echo '<br><br> Total Lines of code: ';
echo $loc->totalLines();


FFMpeg Benchmark – Effect of Threads and Bitrate on Image Quality


If you have been following along, we have recently been doing a series on command line video tools. Here I review a recent ffmpeg benchmark I performed. After reviewing the documentation and encoding a few sample videos, my questions were as follows:

  • How long does each of the various ffmpeg presets take for h264 encoding?
  • How much faster is the encoding with multiple threads?
  • For a given variable bitrate, does a given preset make a difference in image quality?
  • What is the optimal preset for ffmpeg?
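The first two questions could be explored with a small timing harness like the one below. It is a sketch under assumptions: the benchmark's actual commands are not shown in the post, so the input file name, the default bitrate, the thread counts, and the output naming are all hypothetical. The ffmpeg options used (`-c:v libx264`, `-preset`, `-b:v`, `-threads`, `-an`, `-y`) are standard ones.

```python
import subprocess
import time
from typing import Dict, List, Sequence, Tuple

# x264 presets from fastest to slowest; which ones the benchmark used is not stated.
PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow"]


def build_command(src: str, preset: str, threads: int,
                  bitrate: str = "1000k") -> List[str]:
    """Build one ffmpeg command line for an h264 encode (video only)."""
    out = f"out_{preset}_{threads}t.mp4"  # hypothetical output naming
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-preset", preset,
            "-b:v", bitrate, "-threads", str(threads),
            "-an", out]


def time_encode(cmd: List[str]) -> float:
    """Run one encode and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(src: str, presets: Sequence[str] = PRESETS,
              thread_counts: Sequence[int] = (1, 4)) -> Dict[Tuple[str, int], float]:
    """Time every preset/thread combination for one source file."""
    results = {}
    for preset in presets:
        for threads in thread_counts:
            results[(preset, threads)] = time_encode(
                build_command(src, preset, threads))
    return results
```

Calling `benchmark("sample.mp4")` on a machine with ffmpeg installed would return a dictionary of timings keyed by preset and thread count. Comparing image quality across presets needs a separate measurement and is not covered by this sketch.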




CodeIgniter, PHP source code not compiled

I’ve uploaded a CodeIgniter application from my localhost, which runs Apache, to a server running Nginx.
It works perfectly on my localhost and on another server with Apache.
It’s under a subdomain; the domain and the other subdomains run PHP without problems.
This CI application doesn’t start, and the PHP source is returned without being executed.

This is what I get on /var/log/nginx/error.log:

2013/12/05 14:50:31 [error] 20139#0: *1 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Class 'M_website' not found in /home/webroot/ on line 303" while reading upstream, client:, server:, request: "GET /websites HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: ""

Why the problem?

CI files were starting with

<?

and not with

<?php


I had to edit /etc/php5/fpm/php.ini, set short_open_tag from Off to On, and restart php-fpm.

service php5-fpm restart


This is how I solved it… simple issue to solve.
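Before changing the setting, it can help to know which files depend on short tags. The following is a rough sketch of such a check, assuming the application lives under a directory you pass in; the function names are made up for the example, and the regular expression is naive (it would also flag a `<?` that appears inside a string).

```python
import re
from pathlib import Path
from typing import List

# "<?" that is not the start of "<?php", "<?=" or "<?xml".
# "<?=" is left out because it works regardless of short_open_tag since PHP 5.4.
SHORT_TAG = re.compile(r"<\?(?!php|=|xml)", re.IGNORECASE)


def has_short_open_tag(source: str) -> bool:
    """Return True if the PHP source appears to use the short open tag."""
    return SHORT_TAG.search(source) is not None


def find_short_tag_files(root: str) -> List[Path]:
    """List the .php files under `root` that seem to rely on short tags."""
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if has_short_open_tag(text):
            hits.append(path)
    return hits
```

Running `find_short_tag_files("application")` from the project root would list the candidates, which could then either be rewritten to use `<?php` or left alone with short_open_tag enabled.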