Getting started with elasticsearch and Express.js

Elasticsearch is being rapidly developed. It started as a very simple search engine and has become a super-monster capable of so many things, while still delivering very fast results. It is now at version 2.0, with over 550 new commits in the 2.1 branch and 500 new commits in the 2.2 branch.

Apart from searching, one of the simplest yet most powerful features elasticsearch has to offer is fuzzy auto-complete. You might call it typeahead; it depends on how cool you are. To get fast results I can only suggest using the completion suggester feature of elasticsearch, which I will also use in this short tutorial.

The first question is: why is it even needed? The quick answer is speed. The completion suggester is extremely fast. The full answer can be read at the suggester's doc page.
The second question would probably be: why should I use elasticsearch for such a simple task? The answer is, again, speed. elasticsearch is built for such tasks: text analytics in (almost) real time. If you are already using elasticsearch for serving your search results, why not use it for other tasks as well? And if you are not using it, why not use it for your search results as well?

In this tutorial I will be using elasticsearch 2.0.0 and Express.js 4 to deliver a simple JSON response with text suggestions. The code can be found in this GitHub repository and uses ES5 syntax. I would have used the import/export syntax (which I really like) BUT it is not yet available everywhere, and I don't want to use TypeScript or Babel for a simple tutorial.

Getting started

First, let's create an Express app! I will be using the express.js generator.

npm install -g express-generator

express ./autocompleter  
cd autocompleter

npm install

Now the application is ready to be launched. But there is nothing to see!

Let's prepare elasticsearch. Download elasticsearch and unpack it somewhere in your file system. Then run the following (on Windows, use bin/elasticsearch.bat instead):

cd locationOfElasticsearch  
bin/elasticsearch

Doing that will start elasticsearch using the default parameters (by default it listens on port 9200 on localhost; we'll need that later).

Now, I'll add the elasticsearch npm package to the express.js app I created before:

npm install elasticsearch --save

which will add the following line in package.json:

"elasticsearch": "^9.0.2"

The elasticsearch module

I'll create an elasticsearch module that will be imported where needed. First, I'll create elasticsearch.js:

var elasticsearch = require('elasticsearch');

var elasticClient = new elasticsearch.Client({  
    host: 'localhost:9200',
    log: 'info'
});

var indexName = "randomindex";

/**
* Delete an existing index
*/
function deleteIndex() {  
    return elasticClient.indices.delete({
        index: indexName
    });
}
exports.deleteIndex = deleteIndex;

/**
* create the index
*/
function initIndex() {  
    return elasticClient.indices.create({
        index: indexName
    });
}
exports.initIndex = initIndex;

/**
* check if the index exists
*/
function indexExists() {  
    return elasticClient.indices.exists({
        index: indexName
    });
}
exports.indexExists = indexExists;

OK. Now we can create an elasticsearch index, delete it, and check whether it exists. You might notice that each function returns the result of the client call. This is because the elasticsearch JS client is async and based on promises. The client supports callbacks as well, but promises are so much more convenient! Using promises, I can use the following syntax to reset the index completely:

indexExists().then(function (exists) {  
  if (exists) { 
    return deleteIndex(); 
  } 
}).then(initIndex);

Simple and clean.
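
To see why this chain also works when the index does not exist yet, here is a minimal sketch that runs without elasticsearch. The makeFakeIndex helper and the resetIndex name are hypothetical stand-ins I made up for illustration; they only mimic the three functions above with an in-memory flag.

```javascript
// Hypothetical in-memory stand-in for indexExists/deleteIndex/initIndex.
// It records the order of calls instead of talking to elasticsearch.
function makeFakeIndex(initiallyExists) {
  var state = { exists: initiallyExists, calls: [] };
  return {
    state: state,
    indexExists: function () {
      state.calls.push('exists');
      return Promise.resolve(state.exists);
    },
    deleteIndex: function () {
      state.calls.push('delete');
      state.exists = false;
      return Promise.resolve();
    },
    initIndex: function () {
      state.calls.push('create');
      state.exists = true;
      return Promise.resolve();
    }
  };
}

// Same chain as above: delete only if the index exists, then always create.
// When the index is missing, the first callback returns undefined and the
// chain simply continues with the next then().
function resetIndex(es) {
  return es.indexExists().then(function (exists) {
    if (exists) {
      return es.deleteIndex();
    }
  }).then(function () {
    return es.initIndex();
  });
}

var fake = makeFakeIndex(true);
resetIndex(fake).then(function () {
  console.log(fake.state.calls.join(' -> ')); // exists -> delete -> create
});
```

Starting the fake with makeFakeIndex(false) skips the delete step, which is exactly the behaviour I want from the real chain.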

The next thing I need to do is prepare the index and its mapping. elasticsearch needs to know how to map the data that is going to be stored in it, in order to serve results faster. The data I will store is a simple document with a title and content. To do that, I will use the putMapping function. This function will be added at the end of elasticsearch.js:

function initMapping() {  
    return elasticClient.indices.putMapping({
        index: indexName,
        type: "document",
        body: {
            properties: {
                title: { type: "string" },
                content: { type: "string" },
                suggest: {
                    type: "completion",
                    analyzer: "simple",
                    search_analyzer: "simple",
                    payloads: true
                }
            }
        }
    });
}
exports.initMapping = initMapping;

A mapping of type "document" (the name can be anything: "book", "newsarticle", etc.) will be created, defining the properties: title and content are strings, and the suggest field is a completion suggester type with a simple analyzer.

Now let's create the addDocument function that will add a document to the index:

function addDocument(document) {  
    return elasticClient.index({
        index: indexName,
        type: "document",
        body: {
            title: document.title,
            content: document.content,
            suggest: {
                input: document.title.split(" "),
                output: document.title,
                payload: document.metadata || {}
            }
        }
    });
}
exports.addDocument = addDocument;

What's going on here?
I am adding a document of type "document", with the three fields I have defined in the mapping: title, content and suggest. Suggest has three fields that I am defining:

input - what should be used for the auto-complete analysis. I am tokenizing the document's title into an array of strings. Let's say the document's title is "The Hitchhikers guide to the galaxy". The array would look like this: ["The", "Hitchhikers", "guide", "to", "the", "galaxy"]. This way I will get this title suggested to me if I enter "hit" or "the" or "gala". If I simply put the whole title as input, it would only suggest this title if I entered "the h" or something like that.
output - the output that will be given, that is, the data sent back in response to the request.
payload - an extra object with payload data, for example the document's id or metadata. This object will be sent together with the output. Helpful when you define your own IDs.
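
The effect of tokenizing the title can be sketched without elasticsearch. The buildSuggest and wouldSuggest functions below are hypothetical helpers for illustration only. In particular, wouldSuggest is a crude simulation of prefix matching with lowercasing, not how elasticsearch implements the completion suggester internally.

```javascript
// Builds the suggest object the same way addDocument does.
function buildSuggest(document) {
  return {
    input: document.title.split(' '),
    output: document.title,
    payload: document.metadata || {}
  };
}

// Simplified stand-in for the suggester: true if any input
// starts with the typed text, ignoring case.
function wouldSuggest(inputs, typed) {
  var needle = typed.toLowerCase();
  return inputs.some(function (input) {
    return input.toLowerCase().indexOf(needle) === 0;
  });
}

var title = 'The Hitchhikers guide to the galaxy';
var tokenized = buildSuggest({ title: title }).input;

console.log(tokenized);
// [ 'The', 'Hitchhikers', 'guide', 'to', 'the', 'galaxy' ]
console.log(wouldSuggest(tokenized, 'hit'));  // true
console.log(wouldSuggest(tokenized, 'gala')); // true
console.log(wouldSuggest([title], 'hit'));    // false: the whole title starts with "The"
console.log(wouldSuggest([title], 'the h'));  // true
```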

Good. We are now ready to add some documents. What's missing is the function that delivers the suggestions upon request.

function getSuggestions(input) {  
    return elasticClient.suggest({
        index: indexName,
        type: "document",
        body: {
            docsuggest: {
                text: input,
                completion: {
                    field: "suggest",
                    fuzzy: true
                }
            }
        }
    })
}
exports.getSuggestions = getSuggestions;

This function receives a text input and sends it to the completion suggester of elasticsearch. The fuzzy: true part will correct simple spelling mistakes; true means elasticsearch decides the level of fuzziness itself. You could also set the fuzziness explicitly, which I still need to figure out exactly. So far, the automatic fuzziness has worked wonderfully for me.
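
As far as I can tell from the completion suggester documentation, the fuzzy option can also be an object with a fuzziness setting instead of just true. Below is a hedged sketch of how the request body could be built either way. The buildSuggestBody helper is a hypothetical name of mine, and I have not verified the object form against every elasticsearch version, so treat it as an assumption to check against the docs.

```javascript
// Builds the body passed to elasticClient.suggest().
// fuzziness is optional: when omitted, fall back to fuzzy: true
// and let elasticsearch pick the fuzziness on its own.
function buildSuggestBody(input, fuzziness) {
  // Assumption: the object form { fuzziness: N } is accepted by the suggester.
  var fuzzy = (fuzziness === undefined) ? true : { fuzziness: fuzziness };
  return {
    docsuggest: {
      text: input,
      completion: {
        field: 'suggest',
        fuzzy: fuzzy
      }
    }
  };
}

console.log(JSON.stringify(buildSuggestBody('hit')));
// {"docsuggest":{"text":"hit","completion":{"field":"suggest","fuzzy":true}}}
console.log(JSON.stringify(buildSuggestBody('hit', 1)));
// {"docsuggest":{"text":"hit","completion":{"field":"suggest","fuzzy":{"fuzziness":1}}}}
```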

Integrating with Express.js

I am now ready with elasticsearch. Moving to Express.js!
First, I am creating an endpoint for the document routes. I create a new file, documents.js in ./routes , that looks something like this:

var express = require('express');  
var router = express.Router();

var elastic = require('../elasticsearch');

/* GET suggestions */
router.get('/suggest/:input', function (req, res, next) {  
  elastic.getSuggestions(req.params.input)
    .then(function (result) { res.json(result); })
    .catch(next); // pass elasticsearch errors to the Express error handler
});

/* POST document to be indexed */
router.post('/', function (req, res, next) {  
  elastic.addDocument(req.body)
    .then(function (result) { res.json(result); })
    .catch(next); // pass elasticsearch errors to the Express error handler
});

module.exports = router;

Using promises, I am routing the results from elasticsearch to the res.json(data) function. This will output the JSON delivered from elasticsearch straight to the user.
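
If you would rather not expose the raw elasticsearch response, you could trim it down before sending it. Here is a minimal sketch, assuming the response shape shown in the sample output later in this post (a docsuggest array whose entries carry an options array). The extractSuggestions name is hypothetical.

```javascript
// Collects only the suggested texts from a suggest response.
// Missing keys are tolerated so an empty result yields an empty array.
function extractSuggestions(result) {
  var entries = (result && result.docsuggest) || [];
  var texts = [];
  entries.forEach(function (entry) {
    (entry.options || []).forEach(function (option) {
      texts.push(option.text);
    });
  });
  return texts;
}

// Shortened copy of the kind of response the suggest endpoint returns.
var sample = {
  docsuggest: [{
    text: 'Pra',
    offset: 0,
    length: 3,
    options: [
      { text: 'The Internet Is a Playground', score: 1, payload: { titleLength: 28 } },
      { text: 'The Pragmatic Programmer', score: 1, payload: { titleLength: 24 } }
    ]
  }]
};

console.log(extractSuggestions(sample));
// [ 'The Internet Is a Playground', 'The Pragmatic Programmer' ]
```

In the route this would become res.json(extractSuggestions(result)) instead of res.json(result).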

Don't forget to add these lines to app.js (right after the other already-defined routes, or maybe instead of them :-) ):

var documents = require('./routes/documents');  
//......
app.use('/documents', documents);

Right after that, let's initialize elasticsearch. First I check if the index exists. If it does, I delete it. This is NOT the right way to do it in production, of course. This is just for the demo :). Then I create the index and add the mapping.
In app.js I will add the following (right after the last app.use):

var elastic = require('./elasticsearch');  
elastic.indexExists().then(function (exists) {  
  if (exists) {
    return elastic.deleteIndex();
  }
}).then(function () {
  return elastic.initIndex().then(elastic.initMapping).then(function () {
    //Add a few titles for the autocomplete
    //elasticsearch offers a bulk functionality as well, but this is for a different time
    var promises = [
      'Thing Explainer',
      'The Internet Is a Playground',
      'The Pragmatic Programmer',
      'The Hitchhikers Guide to the Galaxy',
      'Trial of the Clone'
    ].map(function (bookTitle) {
      return elastic.addDocument({
        title: bookTitle,
        content: bookTitle + " content",
        metadata: {
          titleLength: bookTitle.length
        }
      });
    });
    return Promise.all(promises);
  });
});
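
The comment in the code above mentions the bulk functionality. As a rough sketch of what that could look like: the bulk API expects a flat array that alternates an action line and the document itself. The buildBulkBody helper below is a hypothetical name of mine, and passing its result to the client as elasticClient.bulk({ body: ... }) is my assumption about how it would be wired up. I have not run it against a live cluster.

```javascript
// Builds the alternating action/document array for a bulk index request.
function buildBulkBody(indexName, typeName, documents) {
  var body = [];
  documents.forEach(function (doc) {
    // Action line: index this document into the given index and type.
    body.push({ index: { _index: indexName, _type: typeName } });
    // Document line: same shape that addDocument sends.
    body.push({
      title: doc.title,
      content: doc.content,
      suggest: {
        input: doc.title.split(' '),
        output: doc.title,
        payload: doc.metadata || {}
      }
    });
  });
  return body;
}

var bulkBody = buildBulkBody('randomindex', 'document', [
  { title: 'Thing Explainer', content: 'Thing Explainer content' },
  { title: 'Trial of the Clone', content: 'Trial of the Clone content' }
]);

console.log(bulkBody.length); // 4 (two action lines, two documents)
console.log(JSON.stringify(bulkBody[0]));
// {"index":{"_index":"randomindex","_type":"document"}}

// Assumed usage, replacing the map + Promise.all above:
// elasticClient.bulk({ body: bulkBody });
```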

Now I have an initialized elasticsearch index, and a few titles added. Assuming elasticsearch is already running, start the app and see what results you are getting from a request to http://localhost:3000/documents/suggest/hit :

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "docsuggest": [{
    "text": "hit",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "The Hitchhikers Guide to the Galaxy",
      "score": 1,
      "payload": {
        "titleLength": 35
      }
    }]
  }]
}

Or http://localhost:3000/documents/suggest/Pra:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "docsuggest": [{
    "text": "Pra",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "The Internet Is a Playground",
      "score": 1,
      "payload": {
        "titleLength": 28
      }
    }, {
      "text": "The Pragmatic Programmer",
      "score": 1,
      "payload": {
        "titleLength": 24
      }
    }]
  }]
}

Pretty neat, huh? Notice that "Pra" also returns "The Internet Is a Playground". This is due to the fuzzy search: "Pra" is similar to "Pla". Searching for "Prag" will only return "The Pragmatic Programmer".
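
The "similar" part can be made concrete with edit distance, the number of single-character changes needed to turn one string into another. The sketch below is a plain Levenshtein implementation for illustration, not the code elasticsearch runs. My understanding is that the automatic fuzziness tolerates about one edit for short inputs like these, which would explain the results, but that is an assumption on my side.

```javascript
// Classic Levenshtein distance, keeping only the previous row in memory.
function editDistance(a, b) {
  var prev = [];
  var i, j;
  for (j = 0; j <= b.length; j++) {
    prev[j] = j; // distance from the empty string
  }
  for (i = 1; i <= a.length; i++) {
    var curr = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance('pra', 'pla'));   // 1: one substitution, close enough
console.log(editDistance('prag', 'play')); // 2: too far for "Playground"
console.log(editDistance('prag', 'prag')); // 0: exact prefix of "Pragmatic"
```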

That's it. This was a very simple introduction to elasticsearch using Express.js. This can (and SHOULD!) be extended. I haven't even added the search functionality (well, that's just another function in elasticsearch.js. Not even a long one!). Maybe in a different post.

Again, the entire code is available in this GitHub repository. You can play around with it as much as you want.

Just a note - this blog post is really for me - I just started getting into the elasticsearch-Node.js combination and I find documenting what I do to be very helpful for remembering! If I made a mistake or two, please correct me!