Newsfeeds
Planet MySQL
Planet MySQL - http://www.planetmysql.org/

  • Micro-benchmarking pthread_cond_broadcast()
    In my work on group commit for MariaDB, I have the following situation: a group of threads is going to participate in group commit. This means that one of the threads, called the group leader, will run an fsync() for all of them while the other threads wait. Once the group leader is done, it needs to wake up all of the other threads.

    The obvious way to do this is to have the group leader call pthread_cond_broadcast() on a condition that the other threads are waiting for with pthread_cond_wait():

        bool wakeup= false;
        pthread_cond_t wakeup_cond;
        pthread_mutex_t wakeup_mutex;

        Waiter:
          pthread_mutex_lock(&wakeup_mutex);
          while (!wakeup)
            pthread_cond_wait(&wakeup_cond, &wakeup_mutex);
          pthread_mutex_unlock(&wakeup_mutex);
          // Continue processing after group commit is now done.

        Group leader:
          pthread_mutex_lock(&wakeup_mutex);
          wakeup= true;
          pthread_cond_broadcast(&wakeup_cond);
          pthread_mutex_unlock(&wakeup_mutex);

    Note the association of the condition with a mutex. This association is inherent in the way pthread condition variables work. The mutex must be locked when calling into pthread_cond_wait(), and will be obtained again before the call returns. (Check the man page for pthread_cond_wait() for details).

    Now, when I think about how these condition variables work, something strikes me as somewhat odd. The idea is that the broadcast signals every waiting thread to wake up. However, because of the associated mutex, only one thread will actually be able to wake up; this thread will obtain a lock on the mutex, and all other to-be-awoken threads will now have to wait for this mutex! Only after the first thread releases the mutex will the next thread wake up holding it; then, after that thread releases it, the third thread can wake up, and so on. So if we have, say, 100 threads waiting, the last one will have to wait for the first 99 threads to each be scheduled and each release the mutex, one after the other in a completely serialised fashion.
    But what I really want is to just let them all run at once in parallel (or at least as many as my machine has spare cores for). There is another way to achieve this: use a separate condition and mutex for each thread, and have the group leader signal each one individually:

        Waiter:
          pthread_mutex_lock(&me->wakeup_mutex);
          while (!me->wakeup)
            pthread_cond_wait(&me->wakeup_cond, &me->wakeup_mutex);
          pthread_mutex_unlock(&me->wakeup_mutex);

        Group leader:
          for waiter in <all waiters>
            pthread_mutex_lock(&waiter->wakeup_mutex);
            waiter->wakeup= true;
            pthread_cond_signal(&waiter->wakeup_cond);
            pthread_mutex_unlock(&waiter->wakeup_mutex);

    This way, every waiter is free to start running as soon as it is woken up by the leader; no waiters have to wait for one another. This seems advantageous, especially as the number of cores increases (rumours are that 48 core machines are becoming commodity).

    "Seems" advantageous. But is it really? Let us micro-benchmark it. For this, I start up 5000 threads. Each thread goes to wait on a condition, either a single shared one, or a distinct one in each thread. The main program then signals the threads to wake up, either with a single pthread_cond_broadcast(), or with one pthread_cond_signal() per thread. Each thread records the time it woke up, and the main program collects these times and computes how long it took between starting to signal the condition(s) and the wakeup of the last thread. (Here is the full C source code for the test program).

    I ran the program on an Intel quad Core i7 with hyperthreading enabled, the most parallel machine I have easy access to. The results are as follows:

        pthread_cond_broadcast(): 46.9 msec
        pthread_cond_signal():    17.6 msec

    Conclusion: pthread_cond_broadcast() is slower, as I speculated. I would expect the effect to be more pronounced on systems with more cores; it would be interesting if readers with access to such systems could try the test program and comment below on the results.
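    The measurement described above can be sketched roughly as follows. This is not the author's test program, only a minimal illustration under stated assumptions: the names wake_slot, waiter, waiter_main, ms_between and wakeup_latency_ms are hypothetical, the "ready" counter used to make sure all threads are waiting before the clock starts is my own addition, and CLOCK_MONOTONIC is assumed to be a suitable time source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper types; in shared mode all waiters use slot 0. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool wakeup;
} wake_slot;

typedef struct {
    wake_slot *slot;       /* condition this thread waits on */
    atomic_int *ready;     /* counts threads that have taken their slot mutex */
    struct timespec woke;  /* time at which the thread was free to continue */
} waiter;

static void *waiter_main(void *arg)
{
    waiter *w = arg;
    pthread_mutex_lock(&w->slot->mutex);
    /* Announce readiness while holding the mutex: the leader cannot lock
       this slot until pthread_cond_wait() has released it. */
    atomic_fetch_add(w->ready, 1);
    while (!w->slot->wakeup)
        pthread_cond_wait(&w->slot->cond, &w->slot->mutex);
    pthread_mutex_unlock(&w->slot->mutex);
    clock_gettime(CLOCK_MONOTONIC, &w->woke);
    return NULL;
}

static double ms_between(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Returns the time in msec from the start of signalling until the last
   waiter has woken up. per_thread selects one condition per thread
   (signal) instead of a single shared condition (broadcast). */
double wakeup_latency_ms(int nthreads, bool per_thread)
{
    int nslots = per_thread ? nthreads : 1;
    wake_slot *slots = calloc(nslots, sizeof *slots);
    waiter *waiters = calloc(nthreads, sizeof *waiters);
    pthread_t *tids = calloc(nthreads, sizeof *tids);
    atomic_int ready = 0;
    assert(slots && waiters && tids);

    for (int i = 0; i < nslots; i++) {
        pthread_mutex_init(&slots[i].mutex, NULL);
        pthread_cond_init(&slots[i].cond, NULL);
    }
    for (int i = 0; i < nthreads; i++) {
        waiters[i].slot = &slots[per_thread ? i : 0];
        waiters[i].ready = &ready;
        int rc = pthread_create(&tids[i], NULL, waiter_main, &waiters[i]);
        assert(rc == 0);
    }
    while (atomic_load(&ready) < nthreads)
        sched_yield();  /* wait until every thread has reached its wait loop */

    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < nslots; i++) {
        pthread_mutex_lock(&slots[i].mutex);
        slots[i].wakeup = true;
        if (per_thread)
            pthread_cond_signal(&slots[i].cond);
        else
            pthread_cond_broadcast(&slots[i].cond);
        pthread_mutex_unlock(&slots[i].mutex);
    }

    double worst = 0.0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        double ms = ms_between(start, waiters[i].woke);
        if (ms > worst)
            worst = ms;
    }
    for (int i = 0; i < nslots; i++) {
        pthread_cond_destroy(&slots[i].cond);
        pthread_mutex_destroy(&slots[i].mutex);
    }
    free(tids);
    free(waiters);
    free(slots);
    return worst;
}
```

    Calling wakeup_latency_ms(5000, false) and wakeup_latency_ms(5000, true) from a main() would correspond to the two cases in the text, provided the system allows that many threads. The numbers will depend on the machine, kernel and libc, so a single run of this sketch should not be expected to reproduce the figures above.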

  • My Opinion on NoSQL DBs
    I'll let the following express my opinion about NoSQL and..

  • Why MySQL replication is better than mysqlbinlog for recovery
    You have a backup, and you have the binary logs between that backup and now. You need to do point-in-time recovery (PITR) for some reason. What do you do? The traditional answer is “restore the backup and then use mysqlbinlog to apply the binary logs.” But there’s a much better way to do it.

    The better way is to set up a server instance with no data, and load the binary logs into it. I call this a “binlog server.” Then restore your backup and start the server as a replication slave of the binlog server. Let the roll-forward of the binlogs happen through replication, not through the mysqlbinlog tool.

    Why is this better? Because replication is a more tested way of applying binary logs to a server. The results are much more likely to be correct, in my opinion. Plus, replication is easier and more convenient to use. You can do nice things like START SLAVE UNTIL, skip statements, stop and restart without having to figure out where you left off, and so on.

    Replication also has the ability to correctly reproduce more types of changes than mysqlbinlog does. Try this with statement-based replication:

        insert into tbl(col) values(connection_id());

    That’ll work just fine through replication, because the SQL thread on the slave will change its connection ID to match the original. It won’t work through mysqlbinlog.

    Related posts:
      • MySQL disaster recovery by promoting a slave
      • Progress on High Performance MySQL Backup and Recovery chapter
      • How MySQL replication got out of sync
      • High Performance MySQL, Second Edition: Backup and Recovery
      • How to make MySQL replication reliable

  • Cassandra and Ganglia
    I finally got some time to do some house cleaning. One of my nagging low-hanging-fruit jobs was to stop using jconsole as my monitor, so I created a ganglia script to graph what is above. In the image above I am showing all the Cassandra servers and their total row read stages completed in the last hour as a gauge. In essence I am graphing the delta of the change between ganglia script runs.

    How I have it set up is: all data exposed by JMX to produce tpstats and cfstats is graphed via ganglia. The pattern for each graph is as follows:

        cass_{stat_class}_{key}

        stat_class - tpc, tpp, tpa mean complete, pending, active respectively
        key - would be message deserialization, for instance.

    For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For instance below:

    If you’re interested in the scripts, I'll send them to you or put them up on code.google.com. They are written in OOP Perl and take the same approach to packaging as the Maatkit toolkit for MySQL by Xaprb and crew does (it puts all the "classes" in the same file as the application).

    GmetricDelegate is the parent package. GmetricCassandra extends GmetricDelegate and overloads getData, and also defines what is an absolute stat versus a gauge. Following the same pattern, I also have GmetricInnoDB, GmetricMySQL and so on.

    Then on each server I run

        /usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra

    This then talks to Ganglia through gmetric to report the stats.

  • MySQL Cluster: 5 Steps to Getting Started, then 5 More to Scale for the Web
    Join us for a live and interactive webinar session where we will demonstrate how to start an evaluation of the MySQL Cluster database in 5 easy steps, and then how to expand your deployment for web & telecoms-scale services.

    Just register here: http://www.mysql.com/news-and-events/web-seminars/display-566.html

    Getting Started will describe how to:
      • Get the software
      • Install it
      • Configure it
      • Run it
      • Test it

    Scaling for HA and the web will describe how to:
      • Review the requirements for a HA configuration
      • Install the software on more servers
      • Update & extend the configuration from a single host to 4
      • Roll out the changes
      • On-line scaling to add further nodes

    When: Wednesday, September 08, 2010
      • 09:00 Pacific time (America)
      • 11:00 Central time (America)
      • 12:00 Eastern time (America)
      • 16:00 UTC
      • 17:00 Western European time

    The presentation will be approximately 45 minutes long, followed by Q&A.
