Monthly Archives: July 2015

Handling 1 Billion requests a week with Symfony2

Some say that Symfony2, like every complex framework, is slow. Our answer is that everything depends on you ;-) In this post, we’ll reveal some software architecture details of a Symfony2-based application handling more than 1 000 000 000 requests every week.

(..)

Stack architecture

Application

All traffic goes to HAProxy, which distributes it to the application servers.

In front of each application instance sits a Varnish reverse proxy.

We keep Varnish on every application server to maintain high availability without a single point of failure (SPOF). Routing all traffic through a single Varnish would be riskier. Having separate Varnish instances lowers the cache hit rate, but we’re OK with that: we needed availability over performance, and as you can see from the numbers, even performance isn’t a problem ;)
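
The article doesn’t include application code, but as a rough sketch (not from the original post), this is how a Symfony2 controller could mark a response as publicly cacheable so the per-server Varnish instance can serve repeated requests without touching PHP; the controller, service name, and 60-second lifetime are all made up:

<?php
// Hypothetical Symfony2 controller action – not from the original article.
// Service name and the 60-second lifetime are placeholders.

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\JsonResponse;

class ResourceController extends Controller
{
    public function showAction($id)
    {
        $data = $this->get('app.resource_repository')->find($id);

        $response = new JsonResponse($data);
        $response->setPublic();          // Cache-Control: public – shared caches may store it
        $response->setSharedMaxAge(60);  // Cache-Control: s-maxage=60 – honoured by Varnish

        return $response;
    }
}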

Application’s server configuration:

  • Xeon [email protected], 64GB RAM, SATA
  • Apache2 (we don’t even use nginx)
  • PHP 5.4.X running as PHP-FPM, with APC

Data storage

We use Redis and MySQL for storing data. Their numbers are also quite big:

  • Redis:
    • 15 000 hits/sec
    • 160 000 000 keys
  • MySQL:
    • over 400 GB of data
    • 300 000 000 records

We use Redis both as persistent storage (for the most-used resources) and as a cache layer in front of MySQL. The ratio of stored data to typical cache data is high – we keep more than 155 000 000 persistent keys and only 5 000 000 cache keys. So you really can use Redis as a primary data store :-)

Redis is configured in a master-slave setup. That way we achieve HA – during an outage we can quickly swap the master node with one of the slaves. It’s also needed for administrative tasks like upgrades: while upgrading nodes we can promote a new master, then upgrade the previous one, and finally switch them back.

We’re still waiting for a production-ready Redis Cluster, which will bring features like automatic failover (and even manual failover, which is great e.g. for upgrading nodes). Unfortunately, no official release date has been given.

MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources. All tables are InnoDB, and most queries are simple SELECT ... WHERE id = {ID} statements returning a single row. We haven’t noticed any performance problems with this setup yet.
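
The post doesn’t show this code either, but a cache-aside lookup along that Redis > MySQL path could look roughly like the sketch below, assuming the phpredis extension and PDO; the table name, key prefix, and TTL are hypothetical:

<?php
// Rough cache-aside sketch (not from the original post) – phpredis + PDO.
// Table name, key prefix and TTL are hypothetical.

function findResource(Redis $redis, PDO $db, $id)
{
    $key = 'resource:' . $id;

    // 1) Try Redis first – it acts as primary store / cache.
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);
    }

    // 2) Fall back to MySQL: the simple "SELECT ... WHERE id = {ID}" case.
    $stmt = $db->prepare('SELECT * FROM resources WHERE id = ? LIMIT 1');
    $stmt->execute(array($id));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row !== false) {
        // 3) Write it back to Redis with a TTL so the next hit skips MySQL.
        $redis->setex($key, 3600, json_encode($row));
    }

    return $row;
}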

In contrast to the Redis setup, MySQL runs in a master-master configuration, which besides high availability gives us better write performance (that’s not an issue with Redis, as you’re unlikely to exhaust its performance capabilities ;-) )

 

Read all about it @ http://labs.octivi.com/handling-1-billion-requests-a-week-with-symfony2/

 

PHP fetching webpage – file_get_contents – failed to open stream: Redirection limit reached, aborting

In my new venture, I need to fetch some webpages – public webpages…
On one domain, the index page was fetched and parsed without any problem, but all the other pages weren’t returning any HTML; in fact, cURL wasn’t even returning an error.

First, the site was protected against any fetch without a user agent defined.
Even after I’d worked that out, I still wasn’t getting any HTML source…

I decided to use file_get_contents to see if I got any error, since cURL wasn’t returning one…
And I got it:

failed to open stream: Redirection limit reached, aborting
2015/07/29 14:28:22 [error] 25586#0: *6458641 FastCGI sent in stderr: "PHP message: PHP Warning: file_get_contents(http://www.domain.com/page/): failed to open stream: Redirection limit reached, aborting in /home/webroot/worker/testes.php on line 5" while reading response header from upstream, client: 84.91.69.69, server: www.gipsy.digitalwhores.net, request: "GET /worker/testes.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "gipsy.digitalwhores.net"

After a few searches I was able to solve it.
This is my PHP cURL function:

function getPage($url)
{
    $useragent   = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
    $timeout     = 120;
    $dir         = dirname(__FILE__);
    // One cookie jar per client IP; the cookies/ directory must exist and be writable
    $cookie_file = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);  // read cookies from the jar
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);   // write cookies back to the jar
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, "");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

    $content = curl_exec($ch);
    if (curl_errno($ch)) {
        echo 'error:' . curl_error($ch);
        $content = false;
    }
    curl_close($ch); // always close the handle, even on error

    return $content;
}

The solution was allowing cookies (the CURLOPT_COOKIEFILE / CURLOPT_COOKIEJAR options above) – without a cookie jar the site apparently kept redirecting until it saw its session cookie, which is what hit the redirection limit.
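
For completeness, a trivial usage sketch of the function above (example.com is just a placeholder):

// Hypothetical usage of getPage() – example.com is a placeholder.
$html = getPage('http://www.example.com/page/');

if ($html !== false) {
    echo 'Fetched ' . strlen($html) . " bytes\n";
}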

 

Shared it here http://stackoverflow.com/questions/12164196/warning-file-get-contents-failed-to-open-stream-redirection-limit-reached-ab/31704183#31704183

 

algolia – hosted cloud search as a service

 


 

Found Algolia the other day on https://cdnjs.com/.
Looks cool, so I’m going to try it out and see how it works…
They have a blog where they post some interesting articles about the service…

Moreover, Algolia is very easy to implement on your website as the company opted for a SaaS strategy. It means that you can implement the company’s search engine for database objects in just a few lines of code thanks to its hosted API, feed the service with JSON-formatted data, and customize it to your needs. After that, your users can start searching right away. They will interact with Algolia’s servers without ever leaving your site. With 12 different data centers across the world, Algolia tries to make the experience as responsive as possible for its users.

Source: http://techcrunch.com/2015/05/20/algolia-grabs-18-3-million-from-accel-for-its-search-api-on-steroids/
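
As a rough illustration (not from the article, and assuming the Algolia PHP API client of that era, algolia/algoliasearch-client-php), feeding and querying an index could look something like this – the application ID, API key, index name and records are all made up:

<?php
// Hypothetical example using Algolia's PHP API client.
// Application ID, API key, index name and records are placeholders.
require __DIR__ . '/vendor/autoload.php';

$client = new \AlgoliaSearch\Client('YourApplicationID', 'YourAdminAPIKey');
$index  = $client->initIndex('products');

// Feed the service with JSON-formatted data (each object becomes a searchable record).
$index->addObjects(array(
    array('objectID' => '1', 'name' => 'Red shoes',  'price' => 49),
    array('objectID' => '2', 'name' => 'Blue shirt', 'price' => 19),
));

// Query from the backend; on a website the JS client would hit Algolia's servers directly.
$results = $index->search('red');
print_r($results['hits']);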


Some more readings

 

In Portugal we have a word similar to Algolia… and it isn’t good!


/home/jail is not a safe jail, check ownership and permissions.

My jailed user wasn’t connecting to the server via SFTP….
Had to see what was going on!!

root@digitalwhores:/home# tail -f /var/log/auth.log

auth.log looked like this…

Jul 23 19:47:55 digitalwhores systemd-logind[580]: New session 1307 of user sftpuser.
Jul 23 19:47:55 digitalwhores jk_chrootsh[18961]: path /home/jail is group writable
Jul 23 19:47:55 digitalwhores jk_chrootsh[18961]: path /home/jail is writable for others
Jul 23 19:47:55 digitalwhores jk_chrootsh[18961]: abort, /home/jail is not a safe jail, check ownership and permissions.
I had to chmod the folder /home/jail to 0755.
Even then the user wasn’t able to connect… so what was auth.log saying?
Jul 23 19:50:07 digitalwhores jk_chrootsh[19034]: abort, path /home/jail/./home/sftpu is group writable, set option 'relax_home_group_permissions' to relax this check
I had to chmod the folder /home/jail/home/sftpu to 0755 as well.
In short, I recommend setting these folders to 0755:
chmod 0755 /home
chmod 0755 /home/jail
chmod 0755 /home/jail/home
chmod 0755 /home/jail/home/**USERS**

 

 

Building a Fast Data Front End for Hadoop

Screencast: Building a Fast Data Front End for Hadoop.

Date: This event took place live on June 24 2015
Presented by: John Hugg
Duration: Approximately 60 minutes.
Cost: Free

Description:

Massive increases in both the volume and velocity of data have led to the development of interactive, real-time applications on fast streaming data. These fast data applications are often the front ends to Big Data (data at rest) and require integration between Fast + Big. To provide maximum value they require a data pipeline with the ability to compute real-time analytics on fast moving data, to make transactional decisions against state, and ultimately deliver data at high speeds to long-term Hadoop-based analytics stores like Cloudera, Hortonworks and MapR.

The new challenge is building applications that tap fast data and seamlessly integrate with the value contained in data stores — combining machine learning and dynamic processing. A variety of approaches are employed including Apache Storm, Spark Streaming, Samza, and in-memory database technology.

During this webcast you will learn:

  • The pros and cons of the various approaches used to create fast data applications
  • The pros and cons of Lambda and Kappa architectures compared to more traditional approaches
  • The tradeoffs and advantages surrounding the resurgence of ACID and SQL
  • How integration with the Hadoop ecosystem can reduce latency and improve transactional intelligence

About John Hugg, Founding Engineer

John Hugg is a Software Developer at VoltDB. He has spent his entire career working with databases and information management. In 2008, he was lured away from a Ph.D. program by Mike Stonebraker to work on what became VoltDB. As the first engineer on the product, he liaised with a team of academics at MIT, Yale, and Brown who were building H-Store, VoltDB’s research prototype. Then he helped build the world-class engineering team at VoltDB to continue development of the open source and commercial products.