Extending the Tungsten Replicator Core JS Filter Functionality

Tungsten Replicator has a really cool feature in that we can filter data as it goes past on the wire.
The replicator itself is written entirely in Java and writing filters for it is not as straightforward as it looks, which is why the much better feature is just to use the JavaScript mechanism and write filters using that tool instead. I’ll save the details for how you can write filters to process and massage data for another time, but right now I wanted to find a good way of improving that JavaScript environment.
There are filters, for example, where I want to be able to load JSON option and configuration files, or write out JSON versions of information, and plenty more.
Mozilla’s Rhino JS environment is what is used to provide the internal JS environment for running filters. The way this is supported is that rather than creating a Rhino JS environment that can do whatever it wants, instead, we create a JS instance specifically for executing the required functions within the filter. One of these instances is created for each filter that is configured in the system (and each batch instance too).
The reason we do this is because for each filter, we want each transaction event that appears in the THL log to get executed through the JS instance where the filter() function in the JS filter is executed with a single argument, the event data.
The limitation of this model is that we dont get the full Rhino environment because we execute the JS function directly, so certain top level items and functions like load() or require(), or utilities like JSON.stringify() are not available. We could do that by changing the way we do the configuration, but that could start to get messy quickly, while also complicating the security aspects of how we execute these components.
There are some messy ways in which we could get round this, but in the end, because I also wanted to add some general functionality into the filters system shared across all JS instances, I chose instead to just load a set of utility functions, written in JavaScript, into the JS instance for the filter. The wonderful thing about JS is that we can write all of the functions in JS, even for classes, methods and functions that aren’t provided elsewhere.
So I chose the path of least resistance, which means loading and executing a core JS file before loading and executing the main filter JS so that. We can place into that JS file all of the utility functions we want to be available to all of the filters.
So, to enable this the first thing we do is update the core Java code when we load the filter JS to load our core utility JS first. That occurs in replicator/src/java/com/continuent/tungsten/replicator/filter/JavaScriptFilter.java, within the prepare() function which is where we instantiate the JS environment based on the code.
String coreutilssrc = properties.getString(“replicator.filter.coreutils”);

// Import the standard JS utility script first
try
 {
 // Read and compile the core script functions
 BufferedReader inbase = new BufferedReader(new FileReader(coreutilssrc));
 script = jsContext.compileReader(inbase, scriptFile, 0, null);
 inbase.close();
 script.exec(jsContext, scope);
 }
catch (IOException e)
 {
 throw new ReplicatorException("Core utility library file not found: "
 + coreutilssrc, e);
 }
catch (EvaluatorException e)
 {
 throw new ReplicatorException(e);
 }

This is really straightforward, we obtain the path to the core utilities script from the configuration file (we’ll look at how we define that later), and then compile that within the jsContext object, where our JavaScript is being executed. We add some sensible error checking, but otherwise this is simple.
It’s important to note that this is designed to load that core file *before* the main filter file just in case we want to use anything in there.
Next, that configuration line, we can add into a default config by creating a suitable ‘template’ file for tpm, which we do by creating the file replicator/samples/conf/filters/default/coreutils.tpl. I’ve put it into the filters section because it only applies to filter environments.
The content is simple, it’s the line with the location of our core utility script:

# Defines the core utility script location
replicator.filter.coreutils=${replicator.home.dir}/support/filters-javascript/coreutils.js

And lastly, we need the script itself, replicator/support/filters-javascript/coreutils.js :

// Core utility JavaScript and functions for use in filters
//
// Author: MC Brown (9af05337@opayq.com)
// Simulate the load() function to additional external JS scripts
function load(filename) {
    var file = new java.io.BufferedReader(new java.io.FileReader(new java.io.File(filename)));
    var sb = "";
    while((line = file.readLine()) != null)
        {
            sb = sb + line + java.lang.System.getProperty("line.separator");
        }
    eval(sb);
}
// Read a file and evaluate it as JSON, returning the evaluated portion
function readJSONFile(path)
{
    var file = new java.io.BufferedReader(new java.io.FileReader(new java.io.File(path)));
    var sb = "";
    while((line = file.readLine()) != null)
        {
            sb = sb + line + java.lang.System.getProperty("line.separator");
        }
    jsonval = eval("(" + sb + ")");
    return jsonval;
}
// Class for reoncstituing objects into JSON
JSON = {
    parse: function(sJSON) { return eval('(' + sJSON + ')'); },
    stringify: (function () {
      var toString = Object.prototype.toString;
      var isArray = Array.isArray || function (a) { return toString.call(a) === '[object Array]'; };
      var escMap = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f', '\n': '\\n', '\r': '\\r', '\t': '\\t'};
      return function stringify(value) {
        if (value == null) {
          return 'null';
        } else if (typeof value === 'number') {
          return isFinite(value) ? value.toString() : 'null';
        } else if (typeof value === 'boolean') {
          return value.toString();
        } else if (typeof value === 'object') {
          if (typeof value.toJSON === 'function') {
            return stringify(value.toJSON());
          } else if (isArray(value)) {
            var res = '[';
            for (var i = 0; i < value.length; i++)
              res += (i ? ', ' : '') + stringify(value[i]);
            return res + ']';
          } else if (toString.call(value) === '[object Object]') {
            var tmp = [];
            for (var k in value) {
              if (value.hasOwnProperty(k))
                tmp.push(stringify(k) + ': ' + stringify(value[k]));
            }
            return '{' + tmp.join(', ') + '}';
          }
        }
        return '"' + value.toString() + '"';
      };
    })()
  };

For the purposes of validating my process, there are three functions:

load() – which loads an external JS file and executes it, so that we can load other JS scripts and libraries.
readJSONFile() – which loads a JSON file and returns it as a JSON object.
JSON class – which does two things, one is provides JSON.parse() method for parsing strings as JSON objects into JS objects and the other is JSON.stringify() which will turn a JS object back into JSON

Putting all of this together gives you a replicator where we now have some useful functions to make writing JavaScript filters easier. I’ve pushed all of this up into my fork of the Tungsten Replicator code here: https://github.com/mcmcslp/tungsten-replicator/tree/jsfilter-enhance
Now, one final note. Because of the way load() works, in terms of running an eval() on the code to import it, it does mean that there is one final step to make functions useful. To explain what I mean, let’s say you’ve written a new JS filter using the above version of the replicator.
In your filter you include the line:

load("/opt/continuent/share/myreallyusefulfunctions.js");

Within that file, you define a function called runme():

function runme()
{
     logger.info("I'm a bit of text");
}

Now within myreallyusefulfunctions.js I can call that function fine:

runme();

But from within the JS filter, runme() will raise an unknown function error. The reason is that we eval()‘d the source file within the load() function, and so it’s context is wrong.
We can fix that within myreallyusefulfunctions.js by exporting the name explicitly:

if (runme.name) this[runme.name] = runme;

This points the parent namespace to the runme() in this context, and we put that at the end of myreallyusefulfunctions.js script and everything is fine.
I’m lazy, and I haven’t written a convenient function for it, but I will in a future blog.
Now we’ve got this far, let’s start building some useful JS functions and functionality to make it all work nicely…