Log4j file rotation: Difference between revisions

From EggeWiki
mNo edit summary
mNo edit summary
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
If you have an application writing to a log file using log4j, under what circumstances could the log rotation cause the hosting process to fail?   
If you have an application writing to a log file using log4j, under what circumstances could the log rotation in '''DailyRollingFileAppender''' cause the hosting process to fail?   


When it failed to rotate, you may see a message like:
When it failed to rotate, you may see a message like:
<geshi>
<syntaxhighlight lang="java5">
Caused by: java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING
Caused by: java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING
         at java.nio.charset.CharsetEncoder.throwIllegalStateException(CharsetEncoder.java:941)
         at java.nio.charset.CharsetEncoder.throwIllegalStateException(CharsetEncoder.java:941)
Line 23: Line 23:
         at org.jboss.logging.Logger.info(Logger.java:189)
         at org.jboss.logging.Logger.info(Logger.java:189)
         at org.jboss.system.server.ServerImpl.shutdown(ServerImpl.java:654)
         at org.jboss.system.server.ServerImpl.shutdown(ServerImpl.java:654)
</geshi>
</syntaxhighlight>


I spent some time investigation this today, and here are my findings.  First, lets take a look at the log4j which rotates the log file.
I spent some time investigation this today, and here are my findings.  First, lets take a look at the log4j which rotates the log file.


http://www.docjar.com/html/api/org/apache/log4j/DailyRollingFileAppender.java.html
http://www.docjar.com/html/api/org/apache/log4j/DailyRollingFileAppender.java.html
<geshi lang="java5">
<syntaxhighlight lang="java5">
   320      File file = new File(fileName);
   320      File file = new File(fileName);
   321      boolean result = file.renameTo(target);
   321      boolean result = file.renameTo(target);
Line 45: Line 45:
   334        errorHandler.error("setFile("+fileName+", false) call failed.");
   334        errorHandler.error("setFile("+fileName+", false) call failed.");
   335      }
   335      }
</geshi>
</syntaxhighlight>


OK.  So it looks like log4j will attempt to rename the file, and if it fails, it'll continue to log to the original file. We'll see some message like this send to stderr:
OK.  So it looks like log4j will attempt to rename the file, and if it fails, it'll continue to log to the original file. We'll see some message like this send to stderr:
Line 55: Line 55:


https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/native/java/io/UnixFileSystem_md.c?rev=257&view=markup
https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/native/java/io/UnixFileSystem_md.c?rev=257&view=markup
<geshi lang="c">
<syntaxhighlight lang="c">
JNIEXPORT jboolean JNICALL
JNIEXPORT jboolean JNICALL
Java_java_io_UnixFileSystem_rename0(JNIEnv *env, jobject this,
Java_java_io_UnixFileSystem_rename0(JNIEnv *env, jobject this,
Line 71: Line 71:
     return rv;
     return rv;
}
}
</geshi>
</syntaxhighlight>


OK.  This is pretty simple.  So how could we cause rename to break?  Well here's a basic test harness.
OK.  This is pretty simple.  So how could we cause rename to break?  Well here's a basic test harness.


<geshi lang="java5">
<syntaxhighlight lang="java5">
while (!Thread.currentThread().isInterrupted()) {
while (!Thread.currentThread().isInterrupted()) {
  File a = new File("a.txt");
  File a = new File("a.txt");
Line 90: Line 90:
  Thread.yield();
  Thread.yield();
}
}
</geshi>
</syntaxhighlight>


When I run this on Windows, it's pretty easy to get the program to fail.  I just need to run '''tail''' against it.  It takes a couple of tries because the file is only there some of the time, but once you get the file, the test harness will start to print errors.
When I run this on Windows, it's pretty easy to get the program to fail.  I just need to run '''tail''' against it.  It takes a couple of tries because the file is only there some of the time, but once you get the file, the test harness will start to print errors.
Line 99: Line 99:


If I look at my tail program, I notice that's it's handle to the file has gone stale.
If I look at my tail program, I notice that's it's handle to the file has gone stale.
<geshi lang="bash">
<syntaxhighlight lang="bash">
>> pfiles 69
>> pfiles 69
69:    tail -f a.txt
69:    tail -f a.txt
Line 110: Line 110:
       /devices/pseudo/pts@0:1
       /devices/pseudo/pts@0:1
   0: Stale NFS file handle
   0: Stale NFS file handle
</geshi>
</syntaxhighlight>


This is normal for a SAN, but slightly different for a Netapp mount.  Running on the Netapp, it REALLY, REALLY hard to get a stale NFS handle.  It generally takes something like a network interruption to get a stale handle on the netapp.  Here's the same test on a Netapp:
This is normal for a SAN, but slightly different for a Netapp mount.  Running on the Netapp, it REALLY, REALLY hard to get a stale NFS handle.  It generally takes something like a network interruption to get a stale handle on the netapp.  Here's the same test on a Netapp:
   
   
<geshi lang="bash">
<syntaxhighlight lang="bash">
$ pfiles 27297  
$ pfiles 27297  
27297:  tail -f a.txt
27297:  tail -f a.txt
Line 129: Line 129:
   3: S_IFREG mode:0644 dev:284,1 ino:17193816 uid:5294 gid:200 size:0
   3: S_IFREG mode:0644 dev:284,1 ino:17193816 uid:5294 gid:200 size:0
       O_RDONLY|O_NONBLOCK|O_LARGEFILE
       O_RDONLY|O_NONBLOCK|O_LARGEFILE
</geshi>
</syntaxhighlight>


The last device is my file.  With the netapp, it will keep a temporary file in existence until my process releases it's handle.  Here's the file I have a lock on:
The last device is my file.  With the netapp, it will keep a temporary file in existence until my process releases it's handle.  Here's the file I have a lock on:
Line 140: Line 140:
So, I've shown that you can break a program by tailing it in another process on Windows, but so not on Solaris.  We'll I'll try some more tests to see if I can get Solaris to break. I've created this test harness for log4j and a corresponding log4j.xml file.  
So, I've shown that you can break a program by tailing it in another process on Windows, but so not on Solaris.  We'll I'll try some more tests to see if I can get Solaris to break. I've created this test harness for log4j and a corresponding log4j.xml file.  


<geshi lang="java5">
<syntaxhighlight lang="java5">
public static void main(String[] args) {
public static void main(String[] args) {
  Logger log = Logger.getLogger(AppendIt.class);
  Logger log = Logger.getLogger(AppendIt.class);
Line 153: Line 153:
  }
  }
}
}
</geshi>
</syntaxhighlight>


<geshi lang="xml">
<syntaxhighlight lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
Line 181: Line 181:


</log4j:configuration>
</log4j:configuration>
</geshi>
</syntaxhighlight>


I've tried by above 'tail' commands, running the test harness on all three file system, but still no luck.
I've tried by above 'tail' commands, running the test harness on all three file system, but still no luck.
Line 237: Line 237:


Lessons learned - on Windows don't try to tail a file unless your in a dev environment.  In Solaris you can tail a file, but don't zip it up, and try not to delete it.
Lessons learned - on Windows don't try to tail a file unless your in a dev environment.  In Solaris you can tail a file, but don't zip it up, and try not to delete it.
{{Template:728x90_as}}


== See Also ==
== See Also ==

Latest revision as of 18:55, 10 December 2011

If you have an application writing to a log file using log4j, under what circumstances could the log rotation in DailyRollingFileAppender cause the hosting process to fail?

When it failed to rotate, you may see a message like:

Caused by: java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING
        at java.nio.charset.CharsetEncoder.throwIllegalStateException(CharsetEncoder.java:941)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:537)
        at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java:384)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:146)
        at java.io.OutputStreamWriter.write(OutputStreamWriter.java:204)
        at java.io.Writer.write(Writer.java:126)
        at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:47)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:301)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:358)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:159)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
        at org.apache.log4j.Category.callAppenders(Category.java:203)
        at org.apache.log4j.Category.forcedLog(Category.java:388)
        at org.apache.log4j.Category.log(Category.java:853)
        at org.jboss.logging.Log4jLoggerPlugin.info(Log4jLoggerPlugin.java:161)
        at org.jboss.logging.Logger.info(Logger.java:189)
        at org.jboss.system.server.ServerImpl.shutdown(ServerImpl.java:654)

I spent some time investigation this today, and here are my findings. First, lets take a look at the log4j which rotates the log file.

http://www.docjar.com/html/api/org/apache/log4j/DailyRollingFileAppender.java.html

  320       File file = new File(fileName);
  321       boolean result = file.renameTo(target);
  322       if(result) {
  323         LogLog.debug(fileName +" -> "+ scheduledFilename);
  324       } else {
  325         LogLog.error("Failed to rename ["+fileName+"] to ["+scheduledFilename+"].");
  326       }
  327   
  328       try {
  329         // This will also close the file. This is OK since multiple
  330         // close operations are safe.
  331         this.setFile(fileName, false, this.bufferedIO, this.bufferSize);
  332       }
  333       catch(IOException e) {
  334         errorHandler.error("setFile("+fileName+", false) call failed.");
  335       }

OK. So it looks like log4j will attempt to rename the file, and if it fails, it'll continue to log to the original file. We'll see some message like this send to stderr:

log4j:ERROR Failed to rename [test.log] to [test.log.2008-09-26-13-35].

How does File#renameTo work inside the JVM. We'll it looks like it just makes a system call with the two paths.

https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/native/java/io/UnixFileSystem_md.c?rev=257&view=markup

JNIEXPORT jboolean JNICALL
Java_java_io_UnixFileSystem_rename0(JNIEnv *env, jobject this,
                                    jobject from, jobject to)
{
    jboolean rv = JNI_FALSE;

    WITH_FIELD_PLATFORM_STRING(env, from, ids.path, fromPath) {
        WITH_FIELD_PLATFORM_STRING(env, to, ids.path, toPath) {
            if (rename(fromPath, toPath) == 0) {
                rv = JNI_TRUE;
            }
        } END_PLATFORM_STRING(env, toPath);
    } END_PLATFORM_STRING(env, fromPath);
    return rv;
}

OK. This is pretty simple. So how could we cause rename to break? Well here's a basic test harness.

while (!Thread.currentThread().isInterrupted()) {
 File a = new File("a.txt");
 if (!a.exists()) {
  a.createNewFile();
 }
 File b = new File("b.txt");
 if (b.exists()) {
  b.delete();
 }
 boolean ok = a.renameTo(b);
 if (!ok)
  System.err.println("Failed to rename a to b");
 Thread.yield();
}

When I run this on Windows, it's pretty easy to get the program to fail. I just need to run tail against it. It takes a couple of tries because the file is only there some of the time, but once you get the file, the test harness will start to print errors.

OK. The interesting bit is when we run the test harness on Solaris. On my Solaris test system, I have three different file systems - local disk, Netapp, and Solaris NFS. They're all a litte different in how the handle file locks.

I run my test harness on Solaris on the Solaris NFS, but I can't get the test harness to break by using 'tail'.

If I look at my tail program, I notice that's it's handle to the file has gone stale.

>> pfiles 69
69:     tail -f a.txt
  Current rlimit: 256 file descriptors
   1: S_IFCHR mode:0620 dev:295,0 ino:12582918 uid:4911 gid:7 rdev:24,1
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /devices/pseudo/pts@0:1
   2: S_IFCHR mode:0620 dev:295,0 ino:12582918 uid:4911 gid:7 rdev:24,1
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /devices/pseudo/pts@0:1
   0: Stale NFS file handle

This is normal for a SAN, but slightly different for a Netapp mount. Running on the Netapp, it REALLY, REALLY hard to get a stale NFS handle. It generally takes something like a network interruption to get a stale handle on the netapp. Here's the same test on a Netapp:

$ pfiles 27297 
27297:  tail -f a.txt
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:5294 gid:7 rdev:24,4
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /devices/pseudo/pts@0:4
   1: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:5294 gid:7 rdev:24,4
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /devices/pseudo/pts@0:4
   2: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:5294 gid:7 rdev:24,4
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /devices/pseudo/pts@0:4
   3: S_IFREG mode:0644 dev:284,1 ino:17193816 uid:5294 gid:200 size:0
      O_RDONLY|O_NONBLOCK|O_LARGEFILE

The last device is my file. With the netapp, it will keep a temporary file in existence until my process releases it's handle. Here's the file I have a lock on:

$ ls -ltra .nfs* -rw-r--r-- 1 user lehman 0 2008-09-26 14:43 .nfsCF00C

The Netapp has basically renamed the file instead of deleting it, allow my process to continue to use it until the process exists.

So, I've shown that you can break a program by tailing it in another process on Windows, but so not on Solaris. We'll I'll try some more tests to see if I can get Solaris to break. I've created this test harness for log4j and a corresponding log4j.xml file.

public static void main(String[] args) {
 Logger log = Logger.getLogger(AppendIt.class);
 long i = 0;
 while(!Thread.currentThread().isInterrupted()) {
  log.info(i++);
  long now = System.currentTimeMillis();
  // gobs of silent logging
  while(System.currentTimeMillis() - now < 1000) {
   log.debug(i++);
  }
 }
}
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">

   <!-- A time/date based rolling appender -->
   <appender name="FILE" class="org.apache.log4j.DailyRollingFileAppender">
      <param name="File" value="test.log"/>
      <param name="Append" value="true"/>

      <!-- Rollover every minute -->
      <param name="DatePattern" value="'.'yyyy-MM-dd-HH-mm"/>

      <layout class="org.apache.log4j.PatternLayout">
         <!-- The default pattern: Date Priority [Category] Message\n -->
         <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
      </layout>
   </appender>

   <root>
      <priority value="INFO"/>
      <appender-ref ref="FILE"/>
   </root>

</log4j:configuration>

I've tried by above 'tail' commands, running the test harness on all three file system, but still no luck.

Deleting the output file causes log4j to output this error: log4j:ERROR Failed to rename [test.log] to [test.log.2008-09-26-13-35]. but then continue without incident.

OK... now I'll try to gzip the file being written to.

log4j:ERROR Failed to rename [test.log] to [test.log.2008-09-26-13-35].
log4j:ERROR Could not close org.apache.log4j.helpers.QuietWriter@1c9b9ca
java.io.IOException: Stale NFS file handle
        at java.io.FileOutputStream.close0(Native Method)
        at java.io.FileOutputStream.close(FileOutputStream.java:279)
        at sun.nio.cs.StreamEncoder$CharsetSE.implClose(StreamEncoder.java:431)
        at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:160)
        at java.io.OutputStreamWriter.close(OutputStreamWriter.java:222)
        at java.io.FilterWriter.close(FilterWriter.java:92)
        at org.apache.log4j.FileAppender.closeFile(FileAppender.java:184)
        at org.apache.log4j.DailyRollingFileAppender.rollOver(DailyRollingFileAppender.java:313)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:353)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.info(Category.java:666)
        at AppendIt.main(AppendIt.java:13)
log4j:ERROR Failed to rename [test.log] to [test.log.2008-09-26-13-42].
Exception in thread "main" java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING_END
        at java.nio.charset.CharsetEncoder.throwIllegalStateException(CharsetEncoder.java:941)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:537)
        at sun.nio.cs.StreamEncoder$CharsetSE.flushLeftoverChar(StreamEncoder.java:358)
        at sun.nio.cs.StreamEncoder$CharsetSE.implClose(StreamEncoder.java:414)
        at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:160)
        at java.io.OutputStreamWriter.close(OutputStreamWriter.java:222)
        at java.io.FilterWriter.close(FilterWriter.java:92)
        at org.apache.log4j.FileAppender.closeFile(FileAppender.java:184)
        at org.apache.log4j.FileAppender.reset(FileAppender.java:339)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:284)
        at org.apache.log4j.DailyRollingFileAppender.rollOver(DailyRollingFileAppender.java:331)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:353)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.info(Category.java:666)
        at AppendIt.main(AppendIt.java:13)

Cool - this crashed my test program. In JBoss this would likely cause the running thread to exit. It probably dies silently, as it can't log the problem it's having. In my test harness, the main thread will write to stderr if it has an exception.

Lessons learned - on Windows don't try to tail a file unless your in a dev environment. In Solaris you can tail a file, but don't zip it up, and try not to delete it.

See Also

JBoss log4j rotation