September 30, 2011
Background:
I inherited a legacy version of the Hadoop stack (0.20.2) with HBase 0.20.3 running on top.
As this is quite an old and unsupported version, we obviously wanted to upgrade.
The main problem is that the system is already in use in production, which increases the risk of any upgrade. In addition, the installed instance was compiled and patched manually, so an in-place upgrade was out of the question.
For the new cluster, installed side by side with the old one, I chose the “Cloudera Distribution including Apache Hadoop (CDH3)”, which is very easy to install and set up via aptitude, and which prepares all the standardized pieces such as config file locations and init scripts.
This should also hopefully enable relatively painless updates in the future.
Sematext already has a nice blog post on the subject of various HBase backup options, so I won’t go over them in too much detail.
All I will say is that the built-in CopyTable MR job is out of the question since it only works with two clusters of the same major version of HBase. Otherwise life would be much easier…
Continue Reading »
tags: backup, cloudera, hadoop, hbase, python, thrift
posted in scripts by tom | No Comments
March 27, 2011
This is just a short guide on how to find the main offenders in case of web server hammering.
Sample of eventual output:
netstat -natp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n | tail
25 195.150.23.130
25 67.222.164.140
28 95.34.20.117
31 72.45.232.204
34 209.56.4.6
36 64.27.200.208
106 50.17.245.114
112 209.234.226.230
247 216.242.75.236
283 184.106.21.219
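For anyone who would rather post-process the netstat output in a script, here is a minimal Python sketch of the same counting idea. It is not the method from the full post: the function name `top_remote_ips` and its parameters are made up for illustration, and unlike the `grep :80` above it only counts rows whose local address ends in the given port, and it skips listening sockets.

```python
from collections import Counter


def top_remote_ips(netstat_output, port=80, limit=10):
    """Count connections per remote IP for sockets bound to the given local port.

    Expects text in the layout printed by `netstat -nat` (hypothetical helper,
    not part of any library).
    """
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign state ...
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        local, remote = fields[3], fields[4]
        # Skip other ports and listening sockets (foreign address ends in ":*").
        if not local.endswith(suffix) or remote.endswith(":*"):
            continue
        # Drop the remote port, keep the address.
        counts[remote.rsplit(":", 1)[0]] += 1
    # Ascending by count, keeping only the last `limit` rows, like `sort -n | tail`.
    return sorted(counts.items(), key=lambda item: item[1])[-limit:]
```

The output of `netstat -nat` can be fed in via `subprocess.check_output(["netstat", "-nat"], text=True)`; the worst offenders end up at the bottom of the returned list, just like in the shell version.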
Continue Reading »
tags: apache, awk, bash, cut, grep, netstat, nginx, one-liners, scripts, server, sort, uniq
posted in one-liners by tom | No Comments
March 22, 2011

Holy Crap
Nothing technically significant there – but since this is the first time I’ve been exposed to such traffic – here is a shout out to people who referred here.
Thanks Ruby Weekly and thanks to the Ruby Show.
Another special thanks to the commenters here and on reddit, who provided some very interesting technical insights. It is always humbling and informative to be peer reviewed by such experienced people.
The funny thing is that I’m not even a Ruby guy – I guess several people found the system side of things interesting as well.
Glad to be of help in any case, and will do my best to add interesting content to the site.
tags: graph, meta, ruby, thanks
posted in meta by tom | No Comments
March 10, 2011
Ruby, the ever-so-popular scripting language, and Rails, one of the leading web application frameworks, are prone to serious performance issues if not served efficiently.
Enter unicorn, a high-performance replacement for mongrel and such.
And of course, my go-to tool for all things HTTP: nginx.
This is a short step-by-step detailed guide to achieving the following:
- RoR application listening on 2007 via unicorn
- nginx listening on 80 forwarding all dynamic requests to the unicorn, and serving static files.
Ruby and nginx are compiled from source to provide the latest versions (1.9.2-p136 and 0.8.54 respectively) rather than the ones available via apt (for Ubuntu 10.10 that means nginx 0.7.67 and Ruby 1.8).
Continue Reading »
tags: /etc/init.d/, bash, gem, nginx, rails, rake, RoR, ruby, scripts, sys-v, Ubuntu, unicorn
posted in scripts by tom | 7 Comments
March 1, 2011
Ok so it all started like this:
A bunch of web servers, all the same model (Dell R710), all functioning properly. However, two of them are faster than the others (40% lower response time AND lower load average).
Average connections to all the machines was identical.
- Hardware: Identical
- Software: Identical (to eliminate this as an issue, a disk was removed from the “fast” server RAID and placed in the “slow” server as a primary drive, and the array was rebuilt with the exact same data)
So it wasn’t a hardware issue (as far as the specs and components could show) and it wasn’t a software issue (the applications and code were the same on both machines).
Just a 40% performance discrepancy…
Continue Reading »
tags: /etc/init.d/, bash, chkconfig, daemon, dell, for loop, power management, server
posted in hardware by tom | No Comments
February 11, 2010
Imagine the following scenario:
cd /etc
chown root.root * -R
OOPS
You have just destroyed the server.
This, and other similar mishaps (e.g. chmod 777 / -R), occur more often than one would imagine.
If you are lucky like me, you have access to more than one Linux server, and in most cases these servers are similar enough to allow a quick restoration of the many file permissions you just destroyed.
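To make the idea concrete, here is a rough Python sketch of one way to do it. This is an illustration under stated assumptions, not the script from the full post: record mode, uid and gid for everything under a directory on the healthy server, then re-apply that snapshot on the broken one. The function names `snapshot_permissions` and `apply_permissions` are invented for this example, and the sketch assumes numeric uids and gids match on both machines.

```python
import os
import stat


def snapshot_permissions(root):
    """Map each path under root (relative to root) to its (mode, uid, gid)."""
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)
            # Symlinks are skipped; chmod on them would touch the target instead.
            if stat.S_ISLNK(info.st_mode):
                continue
            rel = os.path.relpath(full, root)
            snapshot[rel] = (stat.S_IMODE(info.st_mode), info.st_uid, info.st_gid)
    return snapshot


def apply_permissions(root, snapshot, change_owner=True):
    """Re-apply a snapshot under root; return relative paths that do not exist there."""
    missing = []
    for rel, (mode, uid, gid) in snapshot.items():
        full = os.path.join(root, rel)
        if not os.path.lexists(full) or os.path.islink(full):
            missing.append(rel)
            continue
        if change_owner:
            # Changing ownership requires root. Done before chmod because
            # chown may clear setuid/setgid bits.
            os.chown(full, uid, gid)
        os.chmod(full, mode)
    return missing
```

The snapshot is a plain dictionary, so it can be dumped with `json` on the healthy server and loaded on the damaged one. Paths reported as missing are worth reviewing by hand, since they mark places where the two servers differ.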
Continue Reading »
tags: accidentally, bash, chmod, chown, permissions, scripts, sync
posted in scripts by tom | No Comments
January 11, 2010
It has become quite commonplace for high-traffic sites to require more than one web server.
Here is a small script in charge of synchronizing the server configuration files (/etc/httpd in this example).
In this case, I will be using Apache as the web server, simply because of its prevalence and popularity. The idea is the same for any other server, though, and the script could probably be easily edited to fit your web server of choice (nginx, lighttpd or what have you).
Continue Reading »
tags: apache, bash, cluster, scripts
posted in scripts by tom | No Comments
January 9, 2010
As promised, here is a short example of the Daemon in operation.
Continue Reading »
tags: daemon, example, mysql, PHP, python, scripts, XML
posted in scripts by tom | No Comments
January 5, 2010
The problem that this program was designed to solve is a rapidly changing MySQL table in use by a high-traffic website.
For instance, stock quotes on the front page of a bank’s or investment firm’s site.
The data is constantly updated by a service on the backend, and is requested by some ajax widget or by an impatient client repeatedly refreshing the page (F5-F5-F5-F5-F5).
In order to prevent hammering of the database with SELECT queries, the script creates a static page every few seconds, so that users get a relatively up-to-date version of the data (near-live data = good enough).
I chose python for the application, with bash as a wrapper for the daemon init script.
The MySQL data is retrieved with the help of the MySQLdb library (known as python-mysqldb in aptitude repos), also used in a previous script of mine.
The init script is a chkconfig-compatible script, although it could be easily modified into a Debian/Ubuntu-style script as well.
For your convenience, all the source files are available in a tarball, in addition to the actual code posted here.
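As a quick illustration of the "regenerate a static page every few seconds" idea, here is a minimal Python 3 sketch. It is not the daemon from the tarball: `fetch_rows` is a hypothetical callable standing in for the MySQLdb query, and `render_quotes`, `write_atomically` and `run` are names made up for this example. The one design point it tries to show is writing to a temporary file and renaming it over the old page, so that the web server never serves a half-written file.

```python
import html
import os
import tempfile
import time


def render_quotes(rows):
    """Turn (symbol, price) rows into a small HTML fragment."""
    items = "".join(
        "<li>%s: %.2f</li>" % (html.escape(symbol), price) for symbol, price in rows
    )
    return "<ul>%s</ul>" % items


def write_atomically(path, content):
    """Write content to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def run(fetch_rows, path, interval=5.0, iterations=None):
    """Regenerate the page every `interval` seconds; loop forever if iterations is None."""
    done = 0
    while iterations is None or done < iterations:
        write_atomically(path, render_quotes(fetch_rows()))
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

With this shape, the database sees one query per interval regardless of how many visitors hit F5, and `fetch_rows` is the only part that needs to know about MySQL.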
Continue Reading »
tags: /etc/init.d/, bash, chkconfig, daemon, init, mysql, python, scripts, sys-v
posted in scripts by tom | No Comments
November 30, 2009
Don’t ask me why scripting in batch is still necessary on the verge of the year 2010, however, what can you do when some clients still use Windows to host their MySQL servers…
REM Simple MySQL backup script per database
@echo off
REM Set some variables
set mysqlcmd="C:\Program Files\MySQL\MySQL Server 5.0\bin\mysql.exe"
set mysqlpwd=amazingsecretsamplepassword
set mysqlconnect=%mysqlcmd% -u root --password=%mysqlpwd%
set mysqldumper="C:\Program Files\MySQL\MySQL Server 5.0\bin\mysqldump.exe" -v -u root --password=%mysqlpwd%
set backupdir="d:\backup_mysql"
set logfile="d:\installs\backup_script\backup_log"
REM Loop over list of databases and dump
date /t > %logfile%
time /t >> %logfile%
echo Starting Script Run >> %logfile%
for /f %%i in ('"%mysqlconnect% -e "show databases" --skip-column-names"') do (
    echo ---------------------------------------------- >> %logfile%
    date /t >> %logfile%
    time /t >> %logfile%
    echo Now handling %%i >> %logfile%
    %mysqldumper% %%i > %backupdir%\%%i.sql 2>> %logfile%
)
As this was a very quick and dirty script, I won’t go into detail, as it should be quite self-explanatory. I think the only interesting part here is the stderr redirection to the logfile, which actually works in Windows just as one would expect!
See you soon!
tags: batch, mysql, scripts
posted in scripts by tom | 2 Comments