Beta Update

Posted by Prashant Deva on March 7, 2011

We have just released an update to the Chronon beta.
This update contains many bug fixes and resolves several severe crashing issues.
We recommend that everybody upgrade their Chronon installation.

We have also extended the beta expiry from March 15 to March 25.

Please note that recordings made with previous betas are incompatible with this beta release.

 

Choosing what to record – Part 2: Configuration

Posted by Prashant Deva on March 4, 2011

As described in the previous post, the Chronon recorder allows you to choose what to include and exclude from recording. We really wanted to keep the configuration required to use Chronon to an absolute minimum. After all, who likes to go through boring manuals reading about all the various configuration parameters?

Even though the recorder and debugger fully support method level selection of code, for the sake of simple configuration we decided to allow only class level granularity for choosing which code to include/exclude from recording. We have seen that this suffices for almost all situations. If this does become an issue and lots of people need method level granularity, we will add config file support for it in the future. You can read more about configuration details here.

All that said, as far as the Eclipse plugin is concerned, we really wanted people to just install Chronon and hit the ‘blue’ record button without worrying about any kind of configuration details. To achieve this, the Eclipse plugin needs to select the right set of classes to include/exclude so that the debugger works as the user expects, right off the bat. By default, the Eclipse plugin selects the classes to include/exclude as follows:

  • All classes in the src folders of the project being run and its dependent projects are included.
  • Classes in the libraries of the projects, including JDK classes, are not included.

Thus the plugin will include all the classes in ‘your’ code but exclude all the classes in 3rd party libraries. This achieves the original goal of the include/exclude mechanism, which is to record only the relevant pieces of code and leave out all the code that you don’t care about, which usually resides in 3rd party libraries.
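To make the default selection concrete, here is a minimal sketch of a class level include/exclude filter. The class name RecordingFilter, the ".**" package-prefix syntax and the example package names are all assumptions made for illustration; they are not Chronon's actual pattern format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class level filter; the pattern syntax is an assumption, not Chronon's. */
final class RecordingFilter {
    private final List<String> includes;
    private final List<String> excludes;

    RecordingFilter(List<String> includes, List<String> excludes) {
        this.includes = new ArrayList<>(includes);
        this.excludes = new ArrayList<>(excludes);
    }

    /** A pattern is either an exact class name or a package prefix ending in ".**". */
    private static boolean matches(String pattern, String className) {
        if (pattern.endsWith(".**")) {
            // Drop the "**" but keep the trailing '.', e.g. "com.myapp."
            String prefix = pattern.substring(0, pattern.length() - 2);
            return className.startsWith(prefix);
        }
        return pattern.equals(className);
    }

    /** A class is recorded only if an include pattern matches and no exclude pattern does. */
    boolean shouldRecord(String className) {
        boolean included = includes.stream().anyMatch(p -> matches(p, className));
        boolean excluded = excludes.stream().anyMatch(p -> matches(p, className));
        return included && !excluded;
    }
}
```

With includes of "com.myapp.**" and nothing excluded, com.myapp.Main would be recorded while java.util.ArrayList would not, which mirrors the 'your code in, libraries out' default.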

However, we knew that the default mechanism wouldn’t work for everyone: some people might want to record some libraries, or exclude some of their own code from recording. So we put in UI mechanisms which allow you to easily choose what to include/exclude from recording.

The include/exclude of classes can be configured in 2 places:

  1. Launch Configuration
  2. Project Configuration

Launch Configuration

[Screenshot: launch configuration]

You can choose the include/exclude classes per launch configuration. Note that if you choose to configure include/exclude here, it will override any similar setting in project configuration.
That is, the include/exclude config, if defined in the launch config, will be the only one used for that launch.

Project Configuration

[Screenshots: project configuration, automatic tab and manual tab]

We even allow configuration per project. The automatic tab allows you to visually select which classes to include/exclude. Note that this tab doesn’t show any libraries your project uses. This is intentional, since this tab really is all about ‘simplicity’. If you want to include libraries or some unusual pattern, you can click on the ‘manual’ tab and specify the exact patterns to include/exclude whatever you want.

 

Chronon Beta : The what, who and why

Posted by Prashant Deva on February 13, 2011

So today we finally made the long awaited Chronon beta open to all.

Since this is an engineering blog, I aim to clarify who the beta is meant for and under what scenarios it should be used.

The current beta is meant strictly for internal use. We would love for you to try it out in scenarios like development, support, QA, etc. However, we do not see this release being used in production scenarios where performance is an extremely high priority.

This is a beta in the true sense: we really want you to evaluate the robustness of the recorder and the time travelling debugger. We want to hear from you if the recorder crashes under some circumstances, and what level of performance you would like it to reach. Similarly, for the time travelling debugger, we really want to hear your opinion on whether it helps you find the root causes of bugs faster than a traditional debugger.

Keep in mind, ‘beta’ means things will be a little incomplete and rough around the edges. For one, the current Eclipse plugin doesn’t have great J2EE integration (it will be there in the final release).

Even though we have tried to keep the UI of the debugger similar to that of a traditional debugger, when you are in hardcore debugging mode you will realize how different it actually is. We know that this new UI will cause some surprises for new users. Thus we have added a bunch of videos showcasing some scenarios of how Chronon should be used. This blog will also be updated very frequently from now on to tell you about all our design decisions and to have a nice discussion with everyone before we reach the final 1.0 release.

 

Choosing what to record – Part 1: Controlling Performance

Posted by Prashant Deva on February 2, 2011

Consider this line of code which reads in the contents of a file.

byte[] contents = readFile(fileName);

The method readFile() in this case could belong to some third party library, the JDK or may even be a system call.

As far as debugging our application is concerned, we are not worried about what happens inside this method. And that is because we never wrote it in the first place. It is entirely possible that we may not even have the source code to this method. The only thing we care about when we debug our program is what the arguments and return value to this method were, which in this case would be the file name and the returned byte array.

Thus the central observations are:

  1. You cannot debug what you/your company didn’t write.
  2. Even if you could, you probably cannot change the faulty code because it is controlled by third parties.

We utilize this observation inside the Chronon recorder to achieve performance. The recorder is designed to record only the code of ‘your’ program.  For calls to all methods that reside in a third party library, or the JDK or native method calls, we only record the arguments and return values of those methods, since that is all that is needed to debug your program.

[Figure: choosing what to record]
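As a rough illustration of recording only the arguments and return value of an unrecorded call, here is a small sketch. CallSiteRecorder and its recordCall method are hypothetical names invented for this example, and the in-memory string log merely stands in for whatever the real recorder stores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: logs only what crosses the boundary of an unrecorded call. */
final class CallSiteRecorder {
    final List<String> log = new ArrayList<>();

    /** Runs the unrecorded call, then notes its argument and return value at the call site. */
    <A, R> R recordCall(String method, A argument, Function<A, R> unrecordedCall) {
        R result = unrecordedCall.apply(argument); // nothing inside this call is recorded
        log.add(method + "(" + describe(argument) + ") -> " + describe(result));
        return result;
    }

    /** Byte arrays are printed element by element; everything else uses String.valueOf. */
    private static String describe(Object value) {
        return value instanceof byte[] ? Arrays.toString((byte[]) value) : String.valueOf(value);
    }
}
```

For the readFile example, a call such as recorder.recordCall("readFile", fileName, name -> readFile(name)) would leave one log entry with the file name and the returned bytes, and nothing about what happened inside readFile().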

This also allows you to control the exact impact the recorder has on the performance of your program. For example, you may have a million line J2EE application that spends most of its time waiting for the database or inside third party libraries or webservers. In this case the performance impact of the recorder will be extremely low, since the time spent in ‘recorded code’ is very low. This is also the reason why most web apps can get away with using platforms like Ruby, Python or PHP, all of which are much slower than Java: the time spent in that piece of code is very little.

On the other hand, you can have a 20 line program where all it does is some massive calculation in a tight loop. In this case almost all the time is spent in recorded code, so the impact on performance is much larger.

Of course, since this is the first version of Chronon and it is not meant for deployment in heavy duty 24×7 production scenarios, the latter case of performance impact is not such a huge problem. That said, we do have plans to dramatically decrease the performance impact of recording in upcoming releases, without the need to exclude code from recording.

In the next few posts I will describe how to tell Chronon what to and what not to record and the consequences of excluding code from recording when using the debugger.

Shutting down can be tough….

Posted by Prashant Deva on December 7, 2010

In my last post I talked about ‘flusher threads’ which constantly ‘flush’ the recorded data from the memory buffer and persist it to disk.

However, due to the latencies in IO between memory and hard disk, it is entirely possible that your program terminates while there is still some data left in the memory buffer which hasn’t been persisted.

To solve this issue, we register a ‘shutdown hook’ with the JVM which essentially keeps your program alive for a little while even after it terminates, so that we can persist all the leftover data. That is why you may see messages like these on your console when your program shuts down while it’s running with Chronon.

[Screenshot: console output]
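For readers who haven't used one, a shutdown hook is registered through Runtime.getRuntime().addShutdownHook(). The sketch below shows the general shape only; ShutdownFlush, its buffer and flushRemaining() are made-up names, and the 'flush' just counts events instead of writing anything to disk.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: drain leftover recording data when the JVM shuts down. */
final class ShutdownFlush {
    /** Stand-in for the in-memory buffer of recorded data. */
    static final Queue<String> buffer = new ConcurrentLinkedQueue<>();

    /** Drains whatever is left; a real recorder would persist it to disk here. */
    static int flushRemaining() {
        int flushed = 0;
        while (buffer.poll() != null) {
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        // The JVM starts this thread during an orderly shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            System.out.println("Flushed " + flushRemaining() + " leftover events")));

        buffer.add("event-1");
        buffer.add("event-2");
        // main returns here; the hook runs before the JVM exits.
    }
}
```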

However using the shutdown hook opens a whole can of worms of its own.

Since the JVM does not impose any ordering on how shutdown hooks are run, we essentially ‘stop’ recording at this point. Thus if you have any custom shutdown hooks of your own, those will not be recorded.

There are also issues where the shutdown hook does not run, in which case the recording would be considered corrupt.

Some of the cases when the shutdown hook will run are:

  • Program finishes execution normally.
  • Program calls System.exit().
  • Ctrl+C is used to kill the program.
  • An uncaught exception terminates the program.
  • The unix ‘kill’ command is used to terminate the program.

Cases when the shutdown hook does not run, thus leaving the recording in an invalid state:

  • Program calls Runtime.halt().
  • Unix command ‘kill -9’ is used to terminate the program.
  • The ‘End Process’ option of the Windows Task Manager is used to terminate the program.
  • Internal JVM crashes or crashes inside the native code.
  • Any other program sends a SIGKILL signal on a Unix machine or makes the TerminateProcess call on a Windows machine.

That said, most of our users will be using Eclipse to launch and terminate their programs, so I really focused on making that use case always produce a valid recording.

Eclipse by itself unfortunately doesn’t seem to be of any help in this case since the default red ‘terminate’ button sends a SIGKILL signal thus terminating the JVM instantly without waiting for any shutdown hooks to run. Thus if you use the ‘red’ button to terminate your programs running with Chronon, the recording will always be invalid.

[Screenshot: console toolbar with the red terminate button]

So I went ahead and added a ‘blue’ button next to the ‘red’ terminate button.

[Screenshot: console toolbar with the blue button]

Pushing the ‘blue’ button will make sure the shutdown hook runs and you always get a valid recording. The blue button is active only when a program is running with Chronon enabled and it is what you should always use to stop your programs from within Eclipse.

You can still use the ‘red’ button when, say, you were just experimenting with something and no bugs appeared, or for some other reason you just don’t care about the recording. But we recommend making the blue button your default for stopping programs from now on.

Design and Architecture of the Chronon Recorder

Posted by Prashant Deva on December 1, 2010

The Chronon recorder had directly opposing goals – to collect as much data about your program as possible, while at the same time having the least possible impact on it.
In this post I will try to describe some of the design and architectural decisions I made to achieve that.

Design

The prime design goals of the Chronon recorder were:

  1. Scalable
  2. Minimum impact on application responsiveness
  3. Universal

Scalability was higher on the list than raw performance. The reason was that with a scalable implementation, even if you hit a performance wall with the recorder, you can always upgrade your machine or reconfigure the recorder to continue recording.

To achieve this scalability, we made the following assumptions about how hardware is progressing:

  1. CPU cores will keep increasing in number and go down in cost.
  2. Memory is cheap.
  3. Due to point 2 above, 64 bit computing is becoming the standard.
  4. CPU cores aren’t getting any faster.

The recorder runs as a Java agent and instruments the bytecode of your Java program in memory, so you do not need to make any changes to your code.
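For context, a Java agent is a jar whose manifest names a Premain-Class and which is loaded with the JVM's -javaagent option; the JVM then hands it an Instrumentation object before main() runs. The skeleton below uses only the standard java.lang.instrument API. The class names and the rule for which classes to skip are assumptions for illustration, and the transformer deliberately changes nothing, since the actual instrumentation Chronon performs isn't described here.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/** Hypothetical agent entry point; named by Premain-Class in the agent jar's manifest. */
final class SketchAgent {
    /** Called by the JVM before the application's main() when -javaagent is used. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }
}

/** Sees every class as it is loaded, but leaves the bytecode untouched. */
final class PassThroughTransformer implements ClassFileTransformer {
    /** Hypothetical rule: skip JDK classes and classes without a name. */
    static boolean wouldInstrument(String internalName) {
        return internalName != null
            && !internalName.startsWith("java/")
            && !internalName.startsWith("sun/");
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses '/' separators, e.g. "com/myapp/Main".
        if (!wouldInstrument(className)) {
            return null; // null tells the JVM to keep the original bytes
        }
        // A real recorder would return rewritten bytes here, typically produced
        // with a bytecode library. This sketch makes no changes.
        return null;
    }
}
```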

By universal, I mean that the recorder should be able to record any Java app, whether it’s a J2EE app, a Swing/SWT app or any other kind of application.
It should also be platform independent, in line with the Java philosophy of ‘Write Once, Run Anywhere’. Thus you can record on one platform, say a Mac, and play back on any other platform, say Windows.

How it all works

The recorder works as follows to achieve its design goals:

  • The work done in the instrumented threads of your application is kept to a minimum. This is done to ensure minimum impact on the responsiveness of your application.
  • The recording data that is generated by the application threads is stored in a buffer in memory.
  • ‘Flusher Threads’ keep reading chunks of data from this buffer, do some processing on it and save it to disk in a highly packed format, thereby essentially ‘flushing’ the data generated by the recorder.

[Figure: Chronon recorder architecture]
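The buffer-and-flusher arrangement described above is a producer/consumer setup. Here is a minimal sketch of that general pattern using a BlockingQueue; FlusherSketch and everything in it are illustrative names, the 'events' are plain longs, and the flushers only count events where a real recorder would pack them and write them to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: application threads enqueue, flusher threads drain. */
final class FlusherSketch {
    private static final long POISON = Long.MIN_VALUE; // tells one flusher to stop

    private final BlockingQueue<Long> buffer = new LinkedBlockingQueue<>();
    private final List<Thread> flushers = new ArrayList<>();
    final AtomicLong flushedCount = new AtomicLong();

    FlusherSketch(int flusherThreads) {
        for (int i = 0; i < flusherThreads; i++) {
            Thread t = new Thread(this::drain, "flusher-" + i);
            t.setDaemon(true);
            t.start();
            flushers.add(t);
        }
    }

    /** Called on application threads: enqueue and return, nothing else. */
    void record(long event) {
        buffer.add(event);
    }

    /** Body of each flusher thread: take events until a stop marker arrives. */
    private void drain() {
        try {
            while (true) {
                long event = buffer.take();
                if (event == POISON) {
                    return;
                }
                // A real recorder would process the event and write it to disk here.
                flushedCount.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Queues one stop marker per flusher, then waits for all of them to finish. */
    void shutdown() {
        for (int i = 0; i < flushers.size(); i++) {
            buffer.add(POISON);
        }
        for (Thread t : flushers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

The point of the split is that record() does almost no work on the application thread, while the expensive part runs on whatever cores the flushers get.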

So how do our assumptions about hardware help with this? Let’s take a look:

  • If you have more cores than your application has threads, the recorder will do most of its work on the spare cores, inside the flusher threads, and have minimum impact on the performance of your program. This also gives you a hint on how many flusher threads you should use. If you have a single threaded application and a quad core processor, you can tell Chronon to use 3 flusher threads. Similarly, if the application has 2 threads and the CPU has 4 cores, use 2 flusher threads to make use of those 2 extra cores.
  • Now there are always going to be applications which use more threads than the number of cores, or which generate data way faster than it can be flushed out. This is where assumption 2 comes in. If you have enough memory, the generated data has a place to sit while it waits to get flushed out. It is for these cases that we recommend using a 64 bit machine.
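The rule of thumb for flusher threads in the first point above boils down to 'spare cores'. As a tiny sketch (FlusherCount and suggest are hypothetical names, and the floor of one thread is my own assumption for the case where there are no spare cores):

```java
/** Hypothetical helper for the 'spare cores' rule of thumb. */
final class FlusherCount {
    /** Cores not used by application threads, but never fewer than one flusher. */
    static int suggest(int cores, int applicationThreads) {
        return Math.max(1, cores - applicationThreads);
    }
}
```

On the machine at hand, the core count can be obtained with Runtime.getRuntime().availableProcessors(), so suggest(Runtime.getRuntime().availableProcessors(), 1) would give a starting value for a single threaded application.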

What about all the Garbage Collector (GC) issues with using all that extra memory?

It is a well known fact that current JVMs don’t handle heap sizes above 2 GB very well. It is possible that, if you have an extremely computationally intensive program, Chronon does use that much memory, or that your application is already reaching the 2 GB limit and Chronon makes it go over that.

To solve this issue we use custom memory management. Thus even if the data generated by Chronon goes a little high, it won’t have a heavy impact on the GC. It is common to see a 2-3 GB Chronon heap shrink to a few hundred megabytes in the blink of an eye, which would ordinarily take many seconds or minutes without our custom memory management. That said, in most development scenarios, the heap sizes won’t get anywhere near that high.

But even that ain’t enough…

But even after all this you may run into hardware assumption 4 above. This happens when, even though you have enough cores and memory, the application threads of your program are doing so much work that even the small overhead of the recorder directly on those threads affects the responsiveness of your application.

For these situations, Chronon allows you to specify any part of your program to be excluded from recording. So if you have a portion of your program which is doing some heavy computation and which you know doesn’t have any errors, or you just don’t care about examining it for now, you can exclude it from recording.

We won’t go into details of how to configure this right now, but suffice it to say that any part that is excluded runs with absolutely zero overhead, just like it would run without the recorder. For calls to these ‘unrecorded’ methods, we will record just the input arguments and the return value at the call site, which is usually enough information for debugging purposes.

 
