upstream sent too big header while reading response header from upstream

While I was running some scripts from my new project, PHP would break from time to time… I went to the error.log and saw the following error:

2015/08/02 19:42:19 [error] 25586#0: *8735692 upstream sent too big header while reading response header from upstream, client: 84.91.69.69, server: www.flow.domain.com, request: "GET /worker/?action=run HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "flow.domain.com", referrer: "http://flow.domain.com/worker/?action=flushall"

I had to edit my nginx domain.conf and add the last two lines of the block below (fastcgi_buffers and fastcgi_buffer_size)!

location ~ \.php$ {
 try_files $uri =404;
 fastcgi_split_path_info ^(.+\.php)(.*)$;
 fastcgi_pass unix:/var/run/php5-fpm.sock;
 fastcgi_index index.php;
 include fastcgi_params;
 fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
 access_log off;
 fastcgi_buffers 16 16k;
 fastcgi_buffer_size 32k;
}

OK!
Save it and restart nginx!
Should solve it! :)
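
For context: nginx logs this error when the first part of the FastCGI response – the headers – doesn't fit into fastcgi_buffer_size (4k or 8k by default, depending on the platform). Here's a minimal sketch of a PHP script that would trip the default buffers (the header name is made up for illustration):

<?php
// Hypothetical repro: a ~16 KB response header (e.g. a fat debug header
// or a huge cookie) overflows nginx's default FastCGI header buffer and
// triggers "upstream sent too big header"; with fastcgi_buffer_size 32k
// it passes through fine.
header('X-Debug-Trace: ' . str_repeat('x', 16 * 1024));
echo 'OK';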

redis, some readings…

Storing hundreds of millions of simple key-value pairs in Redis

http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value

The Instagram Architecture Facebook Bought For A Cool Billion Dollars

http://highscalability.com/blog/2012/4/9/the-instagram-architecture-facebook-bought-for-a-cool-billio.html

The Architecture Twitter Uses To Deal With 150M Active Users, 300K QPS, A 22 MB/S Firehose, And Send Tweets In Under 5 Seconds

http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html

Using Redis as a Secondary Index for MySQL (sorted sets)

http://code.flickr.net/2013/03/26/using-redis-as-a-secondary-index-for-mysql/

Highly Available Real Time Push Notifications and You

http://code.flickr.net/2012/12/12/highly-available-real-time-notifications/

Handling 1 Billion requests a week with Symfony2

Some say that Symfony2, like every complex framework, is slow. Our answer is that everything depends on you ;-) In this post, we'll reveal some software architecture details of a Symfony2-based application running more than 1 000 000 000 requests every week.

(..)

Stack architecture

Application

All traffic goes to HAProxy, which distributes it to the application servers.

In front of each application instance sits a Varnish reverse proxy.

We keep Varnish on every application server to maintain high availability, without a single point of failure (SPOF). Distributing traffic through a single Varnish would be riskier. Having separate Varnish instances lowers the cache hit rate, but we're OK with that. We needed availability over performance, though as you can see from the numbers, even performance isn't a problem ;)

Application’s server configuration:

  • Xeon, 64GB RAM, SATA
  • Apache2 (we don't even use nginx)
  • PHP 5.4.X running as PHP-FPM, with APC

Data storage

We use Redis and MySQL for storing data. The numbers from them are also quite big:

  • Redis:
    • 15 000 hits/sec
    • 160 000 000 keys
  • MySQL:
    • over 400 GB of data
    • 300 000 000 records

We use Redis both as persistent storage (for the most-used resources) and as a cache layer in front of MySQL. The ratio of storage data to typical cache data is high – we store more than 155 000 000 persistent keys and only 5 000 000 cache keys. So in fact you can use Redis as a primary data store :-)
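
To make that split concrete, here's a minimal sketch (mine, not the article's) of the two kinds of keys living in the same Redis, assuming the phpredis extension; the key names are made up:

<?php
// Persistent key vs. cache key in one Redis instance (phpredis).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Persistent key: no TTL, Redis is the primary store for this resource.
$redis->set('resource:42', json_encode(array('name' => 'example')));

// Cache key: same instance, but with a one-hour expiry, fronting MySQL.
$redis->setex('cache:resource:42', 3600, json_encode(array('name' => 'example')));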

Redis is configured in a master-slave setup. That way we achieve HA – during an outage we're able to quickly promote one of the slaves to be the new master. It's also needed for administrative tasks like upgrades: while upgrading nodes we can elect a new master, then upgrade the previous one, and finally switch them back.
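
The switch itself can be as simple as this sketch (my illustration, not Octivi's tooling, assuming the phpredis extension; the IPs are made up):

<?php
// Promote a slave to master, then repoint the old master (phpredis).
$slave = new Redis();
$slave->connect('10.0.0.2', 6379);
$slave->slaveof();                       // SLAVEOF NO ONE: becomes the new master

$oldMaster = new Redis();
$oldMaster->connect('10.0.0.1', 6379);
$oldMaster->slaveof('10.0.0.2', 6379);   // now replicates from the new master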

We’re still waiting for production-ready Redis Cluster which will give features like automatic-failover (and even manual failover which is great for e.g. upgrading nodes). Unfortunately there isn’t any official release date given.

MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources. All tables are InnoDB, and most queries are simple SELECT ... WHERE id = {ID} lookups returning a single result. We haven't noticed any performance problems with such a setup yet.
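
That read path could look like this sketch (my own illustration; the table, key names and connections are hypothetical):

<?php
// Redis -> MySQL read-through: try the cache, fall back to a single-row
// primary-key SELECT, then warm the cache. Names are hypothetical.
function findResource(Redis $redis, PDO $db, $id) {
    $key = 'cache:resource:' . (int) $id;

    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);     // Redis hit
    }

    $stmt = $db->prepare('SELECT * FROM resources WHERE id = ?');
    $stmt->execute(array((int) $id));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row) {
        $redis->set($key, json_encode($row));  // non-expiring resource, no TTL
    }
    return $row ? $row : null;
}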

In contrast to the Redis setup, MySQL runs in a master-master configuration, which besides High Availability gives us better write performance (that's not a problem in Redis, as you will most likely never exhaust its performance capabilities ;-) )

 

Read all about it @ http://labs.octivi.com/handling-1-billion-requests-a-week-with-symfony2/

 

PHP fetching webpage – file_get_contents – failed to open stream: Redirection limit reached, aborting

In my new venture, I need to fetch some webpages – public webpages…
On one domain, the index page was fetched and parsed without any problem, but all the other pages weren't returning any HTML; in fact, cURL wasn't even returning an error.

First, the site was protected against fetches without a user agent defined.
After I worked around that, I still wasn't getting any HTML source…

I decided to use file_get_contents to see if I got any error, since cURL wasn't returning one…
And I got it:

failed to open stream: Redirection limit reached, aborting
2015/07/29 14:28:22 [error] 25586#0: *6458641 FastCGI sent in stderr: "PHP message: PHP Warning: file_get_contents(http://www.domain.com/page/): failed to open stream: Redirection limit reached, aborting in /home/webroot/worker/testes.php on line 5" while reading response header from upstream, client: 84.91.69.69, server: www.gipsy.digitalwhores.net, request: "GET /worker/testes.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "gipsy.digitalwhores.net"
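
That's why file_get_contents is handy for debugging here: it warns loudly where cURL can stay silent. A sketch using PHP's standard http stream-context options (the URL is a placeholder):

<?php
// Standard http stream-context options; the URL is a placeholder.
$context = stream_context_create(array(
    'http' => array(
        'user_agent'    => 'Mozilla/5.0',  // some sites refuse requests without one
        'max_redirects' => 5,              // fail fast so a redirect loop shows up
        'ignore_errors' => true,           // return the body even on 4xx/5xx
    ),
));
$html = file_get_contents('http://www.domain.com/page/', false, $context);
print_r($http_response_header);            // inspect the Location: chain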

After a few searches I was able to solve it.
This is my PHP cURL function:

function getPage($url) {
    $useragent   = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
    $timeout     = 120;
    $dir         = dirname(__FILE__);
    $cookie_file = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Cookie support: these two options are what fixed the redirect loop.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

    $content = curl_exec($ch);
    if (curl_errno($ch)) {
        echo 'error:' . curl_error($ch);
        $content = false;
    }
    curl_close($ch); // always release the handle (the original returned before closing it)

    return $content;
}

The solution was enabling cookie support (CURLOPT_COOKIEFILE / CURLOPT_COOKIEJAR): the site apparently keeps redirecting until it sees its cookies being accepted, so without a cookie jar the fetch hits the redirection limit.
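
Usage is then simply (placeholder URL):

<?php
// With the cookie jar in place the redirect chain terminates
// and the HTML comes back.
$html = getPage('http://www.domain.com/page/');
if ($html !== false) {
    echo strlen($html) . " bytes fetched\n";
}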

 

Shared it here http://stackoverflow.com/questions/12164196/warning-file-get-contents-failed-to-open-stream-redirection-limit-reached-ab/31704183#31704183

 

algolia – hosted cloud search as a service

 


 

Found Algolia the other day on https://cdnjs.com/.
Looks cool; I'll try it out to see how it works…
They have a blog where they post some interesting articles about the service…

Moreover, Algolia is very easy to implement on your website as the company opted for a SaaS strategy. It means that you can implement the company’s search engine for database objects in just a few lines of code thanks to its hosted API, feed the service with JSON-formatted data, and customize it to your needs. After that, your users can start searching right away. They will interact with Algolia’s servers without ever leaving your site. With 12 different data centers across the world, Algolia tries to make the experience as responsive as possible for its users.

Source: http://techcrunch.com/2015/05/20/algolia-grabs-18-3-million-from-accel-for-its-search-api-on-steroids/
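
As a taste of those "few lines of code", here's a sketch of mine (assuming the official algoliasearch-client-php package; the app ID, key and records are placeholders):

<?php
// Sketch only: class and method names from the algoliasearch-client-php
// library; credentials and records are placeholders.
require 'vendor/autoload.php';

$client = new \AlgoliaSearch\Client('YourApplicationID', 'YourAPIKey');
$index  = $client->initIndex('articles');

// Feed the service JSON-formatted data...
$index->addObjects(array(
    array('objectID' => '1', 'title' => 'Handling 1 billion requests a week'),
    array('objectID' => '2', 'title' => 'Using Redis as a secondary index'),
));

// ...and query it; results come back as an array with a 'hits' key.
$results = $index->search('redis');
print_r($results['hits']);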


Some more readings

 

In Portugal we have a word similar to Algolia… and it isn’t good!
