Chapter 19 Machine Learning

In this chapter, we build a package that uses V8 to wrap ml.js, a library that brings machine learning to JavaScript. The library covers quite a few models; we only include one: linear regression. It is an interesting example because it reveals some of the difficulties one is likely to run into when creating packages with V8.

19.1 Dependency

We start by creating a package and adding the V8 package as a dependency.
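A minimal sketch of this setup, assuming the usethis package is used for scaffolding (any other way of creating a package works just as well). The package name ml is the one used later in the chapter.

```r
# create the package skeleton in a new "ml" directory
usethis::create_package("ml")

# run from within the new package: adds V8 to Imports in DESCRIPTION
usethis::use_package("V8")
```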

Then we create the inst directory in which we place the ml.js file downloaded from the CDN.
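This step might look as follows. The URL and version below are assumptions and should be checked against the ml.js README, and the file name ml.min.js is only illustrative.

```r
# files under inst/ are copied to the package root on installation
dir.create("inst", showWarnings = FALSE)

# assumed CDN address: verify the URL and version in the ml.js README
ml_js_url <- "https://www.lactame.com/lib/ml/4.0.0/ml.min.js"

download.file(ml_js_url, destfile = file.path("inst", "ml.min.js"))
```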

With the dependency downloaded, one can start working on the R code. First, a new V8 context needs to be created and the ml.js file needs to be imported into it.
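One way to do this is in the .onLoad hook, so that the context exists as soon as the package is loaded. The sketch below assumes the downloaded file is stored as inst/ml.min.js and that the context is kept in a package-level object named ml; both names are illustrative choices.

```r
# R/zzz.R
# placeholder, replaced by the V8 context when the package loads
ml <- NULL

.onLoad <- function(libname, pkgname) {
  # create the JavaScript context
  ml <<- V8::v8()

  # locate the bundled file in the installed package and import it
  ml_js <- system.file("ml.min.js", package = pkgname, mustWork = TRUE)
  ml$source(ml_js)
}
```

The `<<-` assignment works here because .onLoad runs before the namespace bindings are locked.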

19.2 Simple Regression

The “simple linear regression” of ml.js consists of a simple function that takes two arrays. We can thus create an R function that takes two vectors, x and y, and runs the regression.
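A first version of that function could be sketched as below. It assumes a V8 context named ml was created when the package loaded and that ml.js exposes its models on a global named ML (as in ML.SimpleLinearRegression); both are assumptions to verify against the library's documentation.

```r
#' Simple linear regression
#'
#' @param x,y Numeric vectors of equal length.
#'
#' @export
ml_simple_lm <- function(x, y) {
  # copy the R vectors into the JavaScript context as arrays
  ml$assign("x", x)
  ml$assign("y", y)

  # fit the model; `ML` is the global assumed to be exposed by ml.js
  ml$eval("var regression = new ML.SimpleLinearRegression(x, y);")

  # bring the fitted model back to R, serialised through JSON
  ml$get("regression")
}
```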

Then we can document and load the package to test the function.
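Assuming devtools is used for development, this could look like the following, with the built-in cars dataset serving as example data.

```r
# generate the documentation and NAMESPACE, then load the package
devtools::document()
devtools::load_all()

# regress stopping distance on speed
ml_simple_lm(cars$speed, cars$dist)
```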

This works but has a few issues; namely, running two or more regressions will overwrite the variable regression in the V8 context. Let us demonstrate by implementing a function to predict.
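A naive prediction function might be sketched as follows. The name ml_predict is hypothetical, and the code assumes the V8 context is stored in an object named ml.

```r
#' Predict from the last fitted model
#'
#' @param x Numeric value(s) to predict for.
#'
#' @export
ml_predict <- function(x) {
  # calls regression.predict(x) on the single global model
  ml$call("regression.predict", x)
}
```

Note that the function takes no model argument: it can only ever reach whatever the JavaScript variable regression currently holds.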

We then document and load the package, run two regressions in a row, and observe the issue. Unlike in R, where lm returns the model, the model we created only exists in JavaScript: ml_simple_lm does not return it. Therefore, our prediction function cannot distinguish between models, unlike R's predict, which takes the model as its first argument and runs the prediction on it.

This implementation is indeed very dangerous: predictions silently come from whichever model was fitted last.
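The issue could be reproduced along these lines, assuming the prediction function is named ml_predict (a hypothetical name) and devtools is used to load the package.

```r
devtools::document()
devtools::load_all()

# first model: stopping distance as a function of speed
ml_simple_lm(cars$speed, cars$dist)

# second model, fitted on unrelated data, overwrites `regression`
ml_simple_lm(1:10, runif(10))

# intended for the cars model, but answered by the second one
ml_predict(15)
```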

The package ml currently under construction should emulate R in that respect: ml_simple_lm should return the model, which should be usable with the predict function. To do so, we need to track regressions internally in V8 so that ml_simple_lm returns a proxy of the model, which predict can then use to run predictions on the intended model.

In order to track and store regressions internally, we are going to declare an empty array when the package is loaded.
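As a sketch, the array can be declared at the end of .onLoad. This assumes the context object is named ml and the library file is bundled as inst/ml.min.js; the array name regressions is the one used in the text.

```r
# R/zzz.R
ml <- NULL

.onLoad <- function(libname, pkgname) {
  ml <<- V8::v8()
  ml$source(system.file("ml.min.js", package = pkgname, mustWork = TRUE))

  # empty JavaScript array that will hold every fitted model
  ml$eval("var regressions = [];")
}
```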

Then, one can track regressions by creating an R object that is incremented every time ml_simple_lm runs; it can be used to build a variable name in the JavaScript regressions array declared in .onLoad. This variable name must be stored in the object we intend to return so that the predict method we will create later on can access the model and run predictions. Finally, in order to declare a new method on the predict function, we need to return an object bearing a unique class. Below we use mlSimpleRegression.
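One possible implementation is sketched below. The counter lives in an environment (here named state, a hypothetical name) because bindings in a package namespace are locked once the package is loaded, so a plain variable could not be incremented. The context name ml and the ML global are assumptions as well.

```r
# package-level state; environments stay mutable after the namespace is locked
state <- new.env()
state$count <- 0L

#' Simple linear regression
#'
#' @param x,y Numeric vectors of equal length.
#'
#' @export
ml_simple_lm <- function(x, y) {
  # increment the counter and derive the model's slot in `regressions`
  state$count <- state$count + 1L
  # JavaScript arrays are zero-indexed
  address <- sprintf("regressions[%d]", state$count - 1L)

  ml$assign("x", x)
  ml$assign("y", y)

  # store the fitted model at its own address instead of a shared variable
  ml$eval(sprintf("%s = new ML.SimpleLinearRegression(x, y);", address))

  # proxy object: the address lets predict() find the right model later
  structure(
    list(address = address, model = ml$get(address)),
    class = "mlSimpleRegression"
  )
}
```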

Then one can implement the predict method for mlSimpleRegression. The function uses the address of the model to run the JavaScript predict method on that object.
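A minimal sketch of such a method, assuming the returned object stores the model's location in an element named address (for instance "regressions[0]") and that the V8 context is named ml:

```r
#' Predict from a simple linear regression
#'
#' @param object An object of class mlSimpleRegression.
#' @param newdata Numeric value(s) to predict for.
#' @param ... Ignored.
#'
#' @export
predict.mlSimpleRegression <- function(object, newdata, ...) {
  # e.g. evaluates regressions[0].predict(newdata) in the V8 context
  ml$call(paste0(object$address, ".predict"), newdata)
}
```

Because the address travels with the R object, two models fitted one after the other no longer interfere with each other.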

We can then build and load the package to see it in action.
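For instance, assuming devtools is used and the package exports ml_simple_lm together with a predict method for its return value:

```r
devtools::document()
devtools::load_all()

# two models fitted in a row, each kept in its own R object
model_cars <- ml_simple_lm(cars$speed, cars$dist)
model_rand <- ml_simple_lm(1:10, runif(10))

# each prediction runs on the model passed as first argument
predict(model_cars, 15)
predict(model_rand, 15)
```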

19.3 Exercises

There are many JavaScript libraries that would make great additions to the R ecosystem; try integrating one of those below. They are simple yet exciting and thus provide ideal first forays into what this part of the book explained.