Winter has finally arrived. That makes it seem like I have been waiting for it to come. I think I was. Sometimes I miss having a proper, cold winter. When snow stays on the ground until April. I want to be able to dig snow forts in drifts with my kids. Lately I've been playing a lot with the kids. Building Thomas the Train track routes and bridges. I get disappointed when the track gets messed up, it can take a lot of time to maximize the use of the table top.
We are just back from a 2 week Christmas holiday. During the holiday, Julene and I left the kids with our parents and took a holiday-in-a-holiday 4 day trip to Arizona to have some time to ourselves. We took a Grand Canyon tour which was amazing, and also did a desert tour where we were able to drive our own tom car (an Israeli made off-road vehicle).
Now we're back and getting back into the swing of things. It took a while but we seem to be getting back into the swing of things.
I recently had an issue where one of my MySQL slaves repeatedly error'd on replication due to key collisions. The replication type was row based, which is much more strict that statement. In fact, if it had been statement based, a lot of these errors wouldn't have presented themselves and the slave would have continued on happily becoming more and more inconsistent.,
Because of the huge dataset and the speed of recovery required, I did not want to rebuild the entire database. I wanted to restore only the couple tables that were causing issues.
What I wanted to do was
- stop replication on a good slave and the problem slave at the same point
- dump the table (from the good slave, obviously)
- drop and import the dumped tables on the problem slave
- restart replication
Fortunately this is achievable with by stopping the slave to be fixed and then a minute or so later the source database, capturing the binlog and position and then issuing the following on the target database:
START SLAVE UNTIL MASTER_LOG_FILE='binlog.000001', MASTER_LOG_POS=123456;
SELECT MASTER_POS_WAIT('binlog.000001', 123456);
At this point, both databases should have had their slave processes halted at the same execution point and the dump and restore outlined above can be done. This reduced what would have been a 5 hour database copy into 15 minutes. Hopefully this will save someone else some time too.
Cloning a MySQL slave is typically very straight forward.
- Execute
stop slaveon the donor slave and capture the slave status information - Stop mysql on the donor
- Copy the database files from the donor to the new slave
- Start MySQL on the new slave
- Execute the change master statement to start the new slave's replication process
- Start mysql on the donor and allow replication to catch up
Simple right? It is, if you don't run into the scenario I managed to hit. Show slave status gives you a lot of information like this:
Master_Host: 10.10.10.10
Master_User: repl_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.002644
Read_Master_Log_Pos: 1015419943
Relay_Log_File: relay-log.000257
Relay_Log_Pos: 68175060
Relay_Master_Log_File: binlog.002643
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 887594041
Relay_Log_Space: 2448352803
This information is then matched into a change master statement. The most important values here are MASTER_LOG_FILE and MASTER_LOG_POS which tell the slave thread where to start replicating from. This will be the point where the donor slave was stopped before the copy. Now, you'd think that the master log file would correspond with the Master_Log_File value from show slave status. It doesn't. You want to use Relay_Master_Log_File. Most often, these log files are the same - if replication isn't lagging. But even in instances where the slave is not behind there is a small chance you'll catch it as the IO thread is getting the next binlog file. Now if you use the log file you think is correct, you'll be setting your replication to start too far ahead. Now, you may get lucky like me and the position won't exist in the binlog you set, and you'll get a replication IO error in my case, 1236 "Exceeded max_allowed_packet", instructing you to set the max_allowed_packet larger on the master which is misleading in this case.
The CHANGE MASTER statement for the above slave status should be
CHANGE MASTER TO MASTER_LOG_FILE='binlog.002643' MASTER_LOG_POS=887594041The log position is the value from
Exec_Master_Log_Pos.
Hopefully this will keep you from bludgeoning your forehead on your desk as I nearly did.
My task was to dump our database at work and build a new database cluster. Now our database isn't trivial. It's large. Nearing a terabyte large. A mysqldump took 14 hours and I figured that the subsequent import would be double that at most. I decided to go ahead without any optimization of the process. It should fit into the allowed time frame and I had a lot of other work to do.
I checked the import before going to bed two days later and it was finally nearing the end. The following morning however, when the import was somewhere around the 60 hour mark, it died on an innodb deadlock error.
Now for the redo. This time, I'm going to optimize this process. Instead of one large dump file, I broke the export to one file per table with this handy two step job
1. Get list of tables
mysql -u user -ppasswd dbname -e "show tables" > list_of_tables.txt
2. Iterate through the table list and dump to file
cat list_of_tables.txt | while read -r line; do mysqldump -h dbhost -u user -ppasswd dbname "$line" > "$line".sql; done
I had done this process in a single step previously, but in this case, there were a few tables I wanted to exclude so I dumped the table list to file first so I could edit it. For even more speed, you could break the list into multiple files and run parallel processes. I let it go over night so this was fine for me.
Fortunately, I ran this dump the night before just in case of a failure and I have all the SQL ready to go - I'm ready to import. This time I will split the tables into separate parallel jobs as the import is CPU bound and I want to get more processors involved. Break the table list into multiple text files being sure to distribute the largest tables evenly. You can then start 4 mysql import processes feeding the table names and appending ".sql" to them.
I am scripting this so I can track the process easily
#!/bin/bash
FILE=$1 # file listing tables passed in as argument
HOST=dbhost
USER=user
PASSWD=passwd
DB=dbname
SQL_PATH=/path/to/sql
# Loop through file names
if [ -f $FILE ]
then
echo "restoring from $FILE"
for f in `cat $FILE`
do
echo restoring $f
mysql -h $HOST -u $USER -p$PASSWD $DB < $SQL_PATH/$f.sql
done
fi
Then start about 4 of these with
./db_import.sh table_list1.txt > results1.txt &
./db_import.sh table_list2.txt > results2.txt &
./db_import.sh table_list3.txt > results3.txt &
./db_import.sh table_list4.txt > results4.txt &
Tail the result files to track the import progress. I've learned my lesson. Smaller files. More processes. Now if I have a failure part way through, I can deduce which table(s) failed and continue on.
Tags
ubuntu (11) web development (7) porsche (2) life (24) travel (3) work (11) linux (20) test (2) london (1) photography (3) cars (4) new year (1) holidays (1) birthday (2) weather (1) abby (1) qos (1) css (3) fire (1) bow island (2) firewall (1) parenthood (1) diy (3) laptop (2) chart (1) cooling (1) politics (2) harper (1) media (1) modx (1) development (16) pie (1) microsoft (1) sharepoint (1) wine (1) video editor (1) canada (1) wii (1) parenting (3) shoelaces (1) christmas (1) pictures (2) unix (1) time (1) motorcycle (8) google (2) picasa (1) windows (2) joel (1) twitter (1) technology (1) deadline (1) (6) triumph (3) tiger (1) holiday (1) apache (1) mysql (7) tip (1) mechanics (1) server (1) smartphone (1) android (5) vm (1) ted (1) music (1) house (2) sql (1) postgres (1) vmware (1) camping (1) replication (2) genius (19) svn (1)


Steven Reimer