Monday, July 30, 2007
MySQL Lock Contention
Lock contention can be a serious performance issue with MySQL when reads and writes happen concurrently on the same table. BlogScope has tables with around 100 million rows, stored in the MyISAM storage engine. The usual workload consists of 2-3 updates (mainly INSERTs) per second plus several SELECT operations. Since MyISAM provides only table-level locking, every operation locks the complete table. Although SELECT operations can run in parallel, it turns out that a single long-running SELECT can block the complete application. After investigating why one SELECT was blocking other SELECTs, I found the explanation in the MySQL docs:
A client issues a SELECT that takes a long time to run. Another client then issues an UPDATE on the same table. This client waits until the SELECT is finished. Another client issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, and for the first SELECT to finish.
This basically means that if there is any long-running SELECT query (e.g., a sequential scan), the whole application will just PAUSE for hours. It also means that you cannot take backups (using mysqldump): not only will processes wanting to write be blocked, but so will those that just read. Fortunately, a fix is available: give SELECT higher priority than UPDATE. In that case, the second SELECT statement in the scenario above executes before the UPDATE statement, and does not need to wait for the first SELECT to finish. To make UPDATEs low priority, add the following to the [mysqld] section of /etc/my.cnf
low_priority_updates=1
and execute the following
set global LOW_PRIORITY_UPDATES=1;
While making updates low priority fixes the locking issue, I don't really like it: if the load is high enough, an UPDATE may never actually get a chance to execute. In my opinion, the best solution is to have a replica server: a master MySQL server where all the updates take place, and a slave that serves all the reads. Once we install our new hardware, I will set up a replica server for BlogScope as well.
Labels: mysql, performance
Monday, July 23, 2007
Accessing files over SFTP in Java
We have a lot of data files that need to be copied to each of the machines we want to run our code on. Further, when these data files are updated, they need to be updated on all the machines, which means a developer has to spend a lot of time just copying files around. I wrote a simple solution: the latest versions of all the data files are maintained on a central server accessible via SSH, and the Java program transparently copies a remote data file to the local machine on demand, after comparing the last modification times of the local and remote copies.
For accessing files over SFTP, we use Apache Commons VFS along with Jsch. These libraries (especially Commons VFS) are not well documented, so I am posting some code snippets from our code that document the API.
The first code snippet demonstrates the API for copying a file from a remote location to the local machine:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemOptions;
import org.apache.commons.vfs.VFS;
import org.apache.commons.vfs.impl.DefaultFileSystemManager;
import org.apache.commons.vfs.provider.sftp.SftpFileSystemConfigBuilder;

/**
 * Copies a remote file to the local filesystem.
 */
public static void copyRemoteFile(String host, String user,
        String password, String remotePath, String localPath)
        throws IOException {
    // first, turn off strict host key checking
    FileSystemOptions fsOptions = new FileSystemOptions();
    SftpFileSystemConfigBuilder.getInstance().setStrictHostKeyChecking(
            fsOptions, "no");
    // now get the default filesystem manager (cast so we can close it)
    DefaultFileSystemManager fsManager = (DefaultFileSystemManager) VFS
            .getManager();
    // the URI is of the form sftp://user:pass@host/remotepath
    String uri = "sftp://" + user + ":" + password + "@" + host
            + "/" + remotePath;
    // get a file object representing the remote file
    FileObject fo = fsManager.resolveFile(uri, fsOptions);
    // open an input stream from the remote file
    BufferedInputStream is = new BufferedInputStream(
            fo.getContent().getInputStream());
    // open an output stream to the local file
    OutputStream os = new BufferedOutputStream(
            new FileOutputStream(localPath));
    // do the copying, byte by byte
    int c;
    while ((c = is.read()) != -1) {
        os.write(c);
    }
    os.close();
    is.close();
    // close the file object
    fo.close();
    // NOTE: if you close the filesystem manager, you won't be able to
    // use VFS again in the same VM. If you wish to copy multiple files,
    // make fsManager static, initialize it once, and close it just
    // before the process exits.
    fsManager.close();
    System.out.println("Finished copying the file");
}
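As an aside, the copy loop above moves one byte per call; even with buffering, reading into a fixed-size byte array is noticeably faster for large data files. Here is a minimal, standalone sketch of that variant using only java.io (the StreamCopy class name is mine, not part of either library):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // copies every byte from in to out, up to 8 KB at a time
    public static void copy(InputStream in, OutputStream out)
            throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello sftp".getBytes("UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out);
        System.out.println(out.toString("UTF-8"));
    }
}
```

In copyRemoteFile, the same loop would run over the stream obtained from fo.getContent().getInputStream() and the local FileOutputStream.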
Unfortunately, the Commons VFS API does not provide a way to check the last modification time of a remote file, so I had to write that part against the Jsch API directly. Below are snippets that connect a session and return the last modification time in seconds:
import java.io.IOException;

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;
import com.jcraft.jsch.SftpATTRS;
import com.jcraft.jsch.SftpException;
import com.jcraft.jsch.UserInfo;

/**
 * Returns an SFTP session connected using the Jsch library.
 */
public static Session connectSFTP(final String host, final String user,
        final String pass) throws JSchException {
    JSch jsch = new JSch();
    Session session = jsch.getSession(user, host, 22);
    // a minimal UserInfo that answers "yes" to the host key prompt;
    // the password itself is supplied via session.setPassword() below
    session.setUserInfo(new UserInfo() {
        public String getPassphrase() {
            return null;
        }
        public String getPassword() {
            return null;
        }
        public boolean promptPassphrase(String string) {
            return false;
        }
        public boolean promptPassword(String string) {
            return false;
        }
        public boolean promptYesNo(String string) {
            return true;
        }
        public void showMessage(String string) {
        }
    });
    session.setPassword(pass);
    session.connect();
    return session;
}

/**
 * Returns the last modification time of a remote file, in seconds
 * since the epoch.
 */
public static int getLastModificationTime(String host, String user,
        String password, String remotePath) throws IOException,
        JSchException, SftpException {
    Session session = connectSFTP(host, user, password);
    ChannelSftp chan = (ChannelSftp) session.openChannel("sftp");
    chan.connect();
    SftpATTRS attrs = chan.lstat(remotePath);
    int time = attrs.getMTime();
    chan.disconnect();
    session.disconnect();
    return time;
}
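To tie this back to the on-demand sync described at the top: the local copy needs refreshing when the remote mtime (seconds, as returned by getLastModificationTime above) is newer than the local file's File.lastModified() (milliseconds, and 0 for a missing file). A small standalone sketch of just that comparison; the isLocalStale helper name and the data.bin file name are my own, for illustration:

```java
import java.io.File;

public class SyncCheck {
    /**
     * Returns true if the local copy is missing or older than the
     * remote file. Remote mtime is in seconds (SftpATTRS.getMTime());
     * local mtime is in milliseconds (File.lastModified(), which
     * returns 0 when the file does not exist).
     */
    public static boolean isLocalStale(long localMtimeMillis,
            long remoteMtimeSeconds) {
        return remoteMtimeSeconds * 1000L > localMtimeMillis;
    }

    public static void main(String[] args) {
        File local = new File("data.bin");
        long remoteSeconds = 1185000000L; // from getLastModificationTime
        if (isLocalStale(local.lastModified(), remoteSeconds)) {
            System.out.println("local copy is stale; fetch it");
            // copyRemoteFile(host, user, password, remotePath, "data.bin");
        }
    }
}
```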
I hope that this code is useful to others. Please leave a comment if you see any error or have a suggestion.