Friday, August 28, 2009

Drizzle and the Gearman logging plug-in

Disclaimer:
This blog post is about things I did on my own time; it is not endorsed by my employer.

I have been meaning to look at Gearman for a long time, but I just couldn't find any project where I could use it.

Well, that was true until a couple of weeks ago, when I started to put together Drizzle, the Gearman logging plug-in, Perl, and the Enterprise Monitor.

As I was finishing the agent.pl script, I thought it would be a good idea to split it into at least two components: one that just collects the queries, and another that processes the log entries (replacing literals with "?", grouping queries by query text, etc.).
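To make the processing step concrete, here is a minimal sketch of replacing literals with "?" and grouping queries by their normalized text. It is written in Python rather than Perl and is not the code from agent.pl; the names `normalize` and `group_queries` and the regular expression are assumptions made for illustration, and the pattern only covers single-quoted strings and plain numbers.

```python
import re
from collections import Counter

# Assumption: literals are single-quoted strings (with backslash or
# doubled-quote escapes) or bare integers/decimals. Real SQL has more forms.
_LITERAL = re.compile(r"'(?:[^'\\]|\\.|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(query):
    """Replace string and numeric literals with '?' and collapse whitespace."""
    return " ".join(_LITERAL.sub("?", query).split())


def group_queries(queries):
    """Count how many times each normalized query text appears."""
    return Counter(normalize(q) for q in queries)


if __name__ == "__main__":
    sample = [
        "SELECT * FROM t1 WHERE id = 10",
        "SELECT * FROM t1 WHERE id = 42",
        "SELECT * FROM t1 WHERE name = 'bob'",
    ]
    for text, count in group_queries(sample).items():
        print(count, text)
```

Identifiers such as `t1` are left alone because the numeric pattern requires a word boundary on both sides of the number.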

That is when I realized this was my chance to look at Gearman! My first thought was to keep using the regular query logging plug-in.
But there is already a Gearman logging plug-in, and I was curious about how it worked.

A quick Google search returned very little information, but I did find the doxygen docs, and reading the code was fairly straightforward.

By reading the code, I found out that the plug-in registers the function drizzlelog with the Gearman Job server, and that it passes the same string that the query logging plug-in sends to the log file.

The next step was to find a "hello world" Perl + Gearman example, and I found a sample client and a sample worker. They almost worked out of the box, but I got this error:

Can't call method "syswrite" on an undefined value at /Library/Perl/5.8.8/Gearman/Taskset.pm line 201.

A quick Google search turned up an example where the port was appended to the host. I then added port 4730 to the job server address in worker.pl and client.pl, and it all worked as expected.
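The fix amounts to always giving the job server address as host:port. A small sketch of that idea, in Python rather than the Perl used by worker.pl and client.pl: the helper name `with_port` is hypothetical, 4730 is the port used in this post, and IPv6 addresses are not handled.

```python
DEFAULT_GEARMAN_PORT = 4730  # the port used in this post


def with_port(server, default_port=DEFAULT_GEARMAN_PORT):
    """Return 'host:port', appending the default port when none is given.

    Hypothetical helper for illustration; bare IPv6 addresses are not supported.
    """
    host, sep, port = server.rpartition(":")
    if sep and host and port.isdigit():
        return server  # already has an explicit port
    return f"{server}:{default_port}"


if __name__ == "__main__":
    print(with_port("127.0.0.1"))
    print(with_port("localhost:4730"))
```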

Once I got the simple example working, I added most of the agent.pl code to the worker.pl script, made a few small changes, and added comments. I was done!

The Gearman logging plug-in sends query logs to the job server, and the job server hands each one to the worker, which does the actual processing.
In the end, the service manager ends up with all the information related to the queries that go to the Drizzle server.

Layout.
In this initial version, one worker cannot handle jobs for more than one Drizzle server. This is not a Gearman limitation: when I wrote the script, there was no way to tell the worker which Drizzle server was sending the log entry.

And that was an excellent excuse to add a few more fields to the Gearman logging plug-in. (That patch has already been approved and will soon be part of the main Drizzle source.)



worker-1 handles requests for drizzled-1 and worker-2 handles jobs for drizzled-2. I am already looking into ways to change this.

Where is the code?
As usual, I posted the worker.pl script on MySQL Forge.

How do I start the worker?
Like this:

$ DEBUG=1 perl worker.pl --serveruuid="22222222-5555-5555-5555-222222222211" \
--serverhostuuid="ssh:{11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:21}" \
--serverdisplayname="Main-Drizzle-web2"
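For readers who want to see how options like these might be read, here is a rough Python stand-in for the worker's command-line handling. It is not the Perl code in worker.pl: the option names and the DEBUG variable come from the command above, while treating all three options as required and DEBUG=1 as "debug on" are assumptions.

```python
import argparse
import os


def parse_worker_args(argv=None):
    """Parse the three identifying options shown in the command above."""
    parser = argparse.ArgumentParser(description="Gearman log worker (sketch)")
    # Assumption: all three options are mandatory.
    parser.add_argument("--serveruuid", required=True)
    parser.add_argument("--serverhostuuid", required=True)
    parser.add_argument("--serverdisplayname", required=True)
    args = parser.parse_args(argv)
    # Assumption: debugging is switched on by DEBUG=1 in the environment.
    args.debug = os.environ.get("DEBUG") == "1"
    return args


if __name__ == "__main__":
    opts = parse_worker_args()
    print(opts.serverdisplayname, "debug" if opts.debug else "quiet")
```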



How do I start the client?
In this case, the Gearman client is the Drizzle plug-in, so all you need to do is add these lines to your drizzled.cnf:

$ cat /etc/drizzle/drizzled.cnf
[drizzled]
logging_gearman_host = 127.0.0.1
logging_gearman_enable = true


Restart the Drizzle server and you are ready to go (well, you also need the MySQL Enterprise Monitor).

Final Note.
I was amazed at how easy it was to get it all working. I will keep looking for other projects where I could use Gearman.

2 comments:

  1. This is very cool! Thanks for hacking all this together and the patches back to Drizzle. I'm glad to see folks starting to see the power of Gearman for this type of processing. It makes this kind of analysis much lighter weight since it doesn't need to ever hit disk.

  2. Thanks Eric, I'm really enjoying hacking around Drizzle and Gearman.

