How to Write Geospatial Queries to MongoDB by Setting Up a Ride-Hailing App

Author: Elle J
Updated: Jul. 9, 2020
Date: Aug. 20, 2019
Time to Read: 16 min

Prerequisites

  • Node.js (basic knowledge)
  • Mongoose / MongoDB (basic knowledge)

MongoDB

Node.js

backend

Ever wondered how to query for the nearest hotels, restaurants, or drivers? Let's say we're building an application where users anywhere in the world can find a driver close to where they make the request, who can pick them up and drive them from point A to point B. To accomplish this, we can write database queries based on geometry and geographic data.

This article will introduce one solution to writing so-called geospatial queries to the NoSQL database MongoDB in Node.js.

The source code can be found at gitlab.com/ellej/demo-ride-hailing-app.

Understanding the Task at Hand

MongoDB can interpret the geometry in a query either on a flat surface or on a sphere. Let’s say we need to calculate the distance between two people located just a few blocks away from each other. Calculated on a flat surface, the result would most likely not differ much from the actual distance on our spherical Earth. But as the distance between the two increases, so does the inaccuracy of such a calculation.
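To get a feel for this, here is a minimal sketch comparing a flat-surface estimate with a spherical one. It is only an illustration and not how MongoDB computes distances internally: the flat model (an equirectangular projection), the mean Earth radius, and the approximate London coordinates are all assumptions made for the example.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters (an approximation)
const toRad = (deg) => (deg * Math.PI) / 180;

// Flat-surface estimate: project both points onto a plane
// (equirectangular projection) and apply Pythagoras.
function flatDistance([lng1, lat1], [lng2, lat2]) {
  const meanLat = toRad((lat1 + lat2) / 2);
  const x = toRad(lng2 - lng1) * Math.cos(meanLat);
  const y = toRad(lat2 - lat1);
  return Math.hypot(x, y) * EARTH_RADIUS_M;
}

// Spherical estimate: great-circle distance using the haversine formula.
function sphericalDistance([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// All points are [lng, lat]. The first pair is a few blocks apart,
// the second pair is an ocean apart (London's position is approximate).
const fewBlocks = [[-71.057161, 42.359985], [-71.068316, 42.356479]];
const acrossOcean = [[-71.057161, 42.359985], [-0.1276, 51.5072]];

for (const [from, to] of [fewBlocks, acrossOcean]) {
  const flat = flatDistance(from, to);
  const sphere = sphericalDistance(from, to);
  console.log(
    `flat: ${flat.toFixed(0)} m, sphere: ${sphere.toFixed(0)} m, ` +
    `gap: ${Math.abs(flat - sphere).toFixed(0)} m`
  );
}
```

Running this shows the two estimates agreeing closely for the nearby pair and drifting apart for the distant one.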

Therefore, for queries that rely on calculations based on geographical data, we need to tell MongoDB to make those calculations as if on a sphere. This is done by creating an index of type “2dsphere” on the database collection, rather than the flat-surface alternative “2d”; we will look at how to create the index shortly. Additionally, the data has to follow a certain format called a GeoJSON object, which will be introduced in the next section. With this kind of index in place, MongoDB can execute these queries very efficiently.

Before jumping into what a GeoJSON object is, let’s quickly remind ourselves of what the Uber-like app should do for us. Users should be able to click on a button and then have a driver ready to pick them up as soon as possible. In a little more detail: when the user clicks on the button, the application should find all registered drivers within a specified radius of the user who are currently available. Those drivers should be asked whether or not they can go to the user’s location and drive them to another location, after which one driver accepts the request.

The question remains, how do we retrieve the location and what do we do with it? Well, MongoDB uses longitude (East-West position) and latitude (North-South position) coordinates internally which, when specified, should always be listed with longitude first and latitude second as such: [lng, lat]. Since this is geographical data, it should exist within the GeoJSON object. Let’s get to it.

The GeoJSON Object and the 2dsphere Index

If you want to store any kind of data pertaining to geography, it should be stored as a GeoJSON object. What makes an object GeoJSON is the existence of a field named type and a field named coordinates. There are a number of different GeoJSON object types that we can specify in the type field such as Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. For instance, a LineString could represent a road, a Polygon could represent a building, a neighborhood, or a state, and a Point could represent the position of one of our registered drivers. The second required field that I mentioned, the coordinates field, holds the object’s coordinates (as an array). Since we will be working with Point types, the coordinates will be two numbers (the longitude and latitude values).

The GeoJSON standard:

// <field> can be anything (e.g. "location")
// <GeoJSON type> can be any of MongoDB's available types
// <coordinates> must be the coordinates (preferably an array)

<field>: { type: <GeoJSON type>, coordinates: <coordinates> }

Example of a GeoJSON Point:

{
  type: "Point",
  coordinates: [-63.607857, 50.786823]
}
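Since the [lng, lat] order is easy to get wrong, a small helper can build the Point and reject values outside the valid ranges. This is just a sketch; toGeoJsonPoint is a hypothetical name and not part of MongoDB or Mongoose.

```javascript
// Hypothetical helper: builds a GeoJSON Point from a longitude and a latitude.
// Longitude must be within [-180, 180] and latitude within [-90, 90].
function toGeoJsonPoint(lng, lat) {
  if (!Number.isFinite(lng) || lng < -180 || lng > 180) {
    throw new RangeError(`Longitude must be between -180 and 180, got ${lng}`);
  }
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError(`Latitude must be between -90 and 90, got ${lat}`);
  }
  return { type: 'Point', coordinates: [lng, lat] };
}

console.log(toGeoJsonPoint(-63.607857, 50.786823));

// Swapping the arguments is caught whenever the longitude
// does not also happen to be a valid latitude.
try {
  toGeoJsonPoint(50.786823, -163.607857);
} catch (err) {
  console.log(err.message);
}
```

Note that a swapped pair such as (42.36, -71.06) still passes this check, because both numbers are valid in either position, so the helper only catches some mistakes.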

Assuming you have worked with database schemas, our application’s drivers collection (analogous to a table in relational databases) will have a driverSchema with a top-level property called location (or whatever you’d like). This property will hold a subdocument, the GeoJSON object, which we will define with its own schema. (Mongoose is used as the ODM on top of MongoDB.)

// /models/driver.js

const mongoose = require('mongoose');
const Schema = mongoose.Schema;

// a GeoJSON object consists of a "type" and "coordinates" field
const geoJsonSchema = new Schema({
  type: {
    type: String,
    default: 'Point'
  },
  // since we will only be dealing with Point types,
  // the coordinates should be an array of numbers
  // (other types may need nested arrays)
  coordinates: {
    type: [Number],
    required: true
  }
});

const driverSchema = new Schema({
  // the field with the GeoJSON object
  // (this field needs to be top-level, not nested)
  location: geoJsonSchema,

  // other fields (simplified)
  name: {
    type: String,
    minlength: 5,
    maxlength: 50,
    required: true
  },
  phone: {
    type: String,
    minlength: 5,
    maxlength: 50,
    required: true
  },
  isAvailable: {
    type: Boolean,
    default: false
  },
  car: {
    numSeats: {
      type: Number,
      min: 2,
      max: 10,
      required: true
    }
  }
});

// create a 2dsphere index on the field holding the GeoJSON data
driverSchema.index({ location: '2dsphere' });

const Driver = mongoose.model('Driver', driverSchema);

module.exports = Driver;

Notice that we created a “2dsphere” index on the drivers collection. The MongoDB function used for creating indexes is actually createIndex(), but since we are using Mongoose we instead declare the index on the schema with index(). If you would rather create the index via the mongo shell, have a look at the following examples.

The createIndex method standard:

db.collection.createIndex({ <field> : <index type> })

Example A:

db.drivers.createIndex({ "location" : "2dsphere" })

Example B:

db.drivers.createIndex({ "location.coordinates" : "2dsphere" })

The <field> specifies where to place the index, and for geospatial indexes the <index type> should be the string “2d” or “2dsphere”. In Example A we place an index on the location field, whereas in Example B we instead place it on the coordinates field inside of location. As long as the 2dsphere index is on either a field holding a GeoJSON object or a coordinate pair, the operation will succeed.

Keep in mind that (a) if you are dropping your collections when running tests, make sure to recreate this index before each test, and (b) if you are using Mongoose 5 on top of the MongoDB driver, Mongoose calls a deprecated function by default when trying to create the index (Mongoose 6 and later always use createIndex() and no longer support this option), so with Mongoose 5 make sure to set useCreateIndex to true in one of two ways:

mongoose.set('useCreateIndex', true);
// or
mongoose.connect(DB_URI, { useCreateIndex: true });

Querying the Database for Nearby Drivers

Having created the driver model, we are now ready to build an API handling the geospatial query for when the user clicks on the button to find a nearby driver. To have MongoDB execute logic within the database (rather than retrieving the data and performing the calculations on the server), such as outputting all available drivers close to the user’s location, we can use MongoDB’s operators, specifically $geoNear. This operator is used as one stage of an aggregation pipeline. In short, the documents in an aggregation pipeline pass through a sequence of stages, each of which may filter or transform them, and the output of the last stage is returned as the (aggregated) result. For instance, our aggregate function will have two stages: $geoNear for filtering available drivers within a certain distance, and $limit for limiting the result to a specific number of drivers. Keep in mind that $geoNear always has to be the first stage in the pipeline.

The aggregate() function with the $geoNear standard:

db.collection.aggregate([ { $geoNear: { <geoNear options> } } ])

As you can see, we should specify geoNear options. These options determine what will be done in this stage of the pipeline. There are several options, two of which are required: near, the point to calculate the distance from (represented here as a GeoJSON Point); and distanceField, the field on the output document in which to place the calculated distance (this field will be created in the output and does not have to exist on the schema). Some of the other options are included in the example below; more can be found in the MongoDB docs (see resources at the end).

// example

db.drivers.aggregate([
  // 1st stage
  {
    $geoNear: {
      near: {
        type: "Point",
        coordinates: [ 42.359985, -71.057161 ]
      },
      distanceField: "dist.calculated",
      maxDistance: 10000,   // in meters
      query: {
        isAvailable: true,
        "car.numSeats": { $gte: 4 }
      },
      spherical: true
    }
  },
  // 2nd stage
  { $limit: 10 }
])

In the above query, we limit the result to drivers within a 10,000-meter (10-kilometer or 6-mile) radius using maxDistance, and only select the drivers where the field isAvailable is set to true and the car has four or more seats, both using query (which follows the regular read operation query syntax). Setting spherical to true forces MongoDB to use spherical geometry when calculating the distance, whereas if set to false (the default) it only uses spherical geometry for 2dsphere indexes and planar geometry for 2d indexes. Even though we are using 2dsphere already, it is useful to know about this option. (If the collection has multiple geospatial indexes, you also need to add a key option to the $geoNear stage specifying which indexed field path to use.) The output of $geoNear will be sorted in order of nearest to farthest.
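The options described above can also be assembled by a small function, which keeps the pipeline easy to reuse and to test without a database connection. This is a minimal sketch: buildNearbyDriversPipeline and its parameter names and default values are hypothetical, while the stage options themselves are the ones from the example.

```javascript
// Hypothetical helper: returns the two-stage pipeline ($geoNear, then $limit).
function buildNearbyDriversPipeline({
  lng,
  lat,
  minSeats = 1,
  maxDistanceMeters = 10000,
  limit = 10
}) {
  return [
    {
      // $geoNear has to be the first stage in the pipeline
      $geoNear: {
        near: { type: 'Point', coordinates: [lng, lat] }, // longitude first
        distanceField: 'dist.calculated',
        maxDistance: maxDistanceMeters,
        query: { isAvailable: true, 'car.numSeats': { $gte: minSeats } },
        spherical: true
      }
    },
    { $limit: limit }
  ];
}

const pipeline = buildNearbyDriversPipeline({ lng: -71.057161, lat: 42.359985, minSeats: 4 });
console.log(JSON.stringify(pipeline, null, 2));
```

The returned array could then be passed to aggregate() on the collection or on a Mongoose model.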

Getting the Application Up and Running

Taking what we now know, let’s first set up what we need to get the application up and running and then focus on implementing the API. Open a terminal window and create a directory called ride_hailing_app.

mkdir ride_hailing_app
cd ride_hailing_app

Initialize a Node project and install the libraries Express (a web framework for Node.js) and Mongoose (this assumes you have Node.js installed on your machine). To see what versions were used in this project see the package.json file in my GitLab repository.

npm init -y
npm i express mongoose

The folder structure for this project will be:

ride_hailing_app/
|--- models/
|      |--- driver.js
|--- routes/
|      |--- drivers.js
|--- app.js
|--- package.json

In our app.js in the root directory we will start by connecting to the database, configuring what route handler should handle requests to /api/drivers, setting up some (simple) error handling, and having it listen on a port.

// app.js

const mongoose = require('mongoose');
const express = require('express');
const driversRouter = require('./routes/drivers');  // to be implemented

const app = express();

// select and connect to the database
const db = process.env.NODE_ENV === 'test'
  ? 'mongodb://localhost/my_ride_hailing_app_test'
  : 'mongodb://localhost/my_ride_hailing_app';

// make sure to set "useCreateIndex" to true
mongoose
  .connect(db, {
    useCreateIndex: true,
    useNewUrlParser: true,
    useFindAndModify: false,
    useUnifiedTopology: true
  })
  .then(() => console.log(`Connected to ${db}...`))
  .catch((err) => {
    console.error(`Error connecting to ${db}: ${err.message}`);
    process.exit(1);  // a non-zero exit code signals the failure
  });

// parse incoming JSON objects
app.use(express.json());

// have the driversRouter handle all requests to /api/drivers
app.use('/api/drivers', driversRouter);

// handle requests to invalid endpoints
app.use((req, res, next) => {
  const err = new Error(`Route could not be found: ${req.url}`);
  err.status = 404;
  next(err);
});

// handle errors from the request processing pipeline
app.use((err, req, res, next) => {
  res
    .status(err.status || 500)
    .send({ error: err.message || 'Something went wrong...' });
});

// listen for requests
const port = process.env.PORT || 3000;
app.listen(port, () => console.log(`Server running on port ${port}...`));

Focusing on the /routes/drivers.js module, this will be a RESTful API, so when an HTTP GET request hits the /api/drivers endpoint we need some way of retrieving the user’s location sent by the client. GET requests conventionally do not carry a body, so instead of sending the coordinates in the request body we grab the information from the URL string. You might have worked with, or at least seen, a query string before (the string following a question mark in the URL), but if not, have a look at an example:

// everything following the question mark is a query string
http://www.example.com/products/jackets?color=blue&sort_by=price

Thus, whenever location data is included in the URL it might look something like this:

// longitude and latitude values being sent as a query string
http://www.myridehailingapp.com/api/drivers?lng=70&lat=50 

Our application is using Express to set up our web server, so Express will parse the query string for us and store all the properties onto an object called query which exists on the req object (or whatever we choose to call the request variable). In our route handler we can now access the lng and lat properties through req.query, which in our case represent the longitude and latitude coordinates (the parameter names themselves are arbitrary). The user should also send a numSeats value to select the minimum number of seats for the car.
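To see roughly what ends up on that object, the sketch below parses a query string with Node's built-in URL class. Express uses its own parser, so this is only an approximation of req.query, but it shows the shape of the result and that every value arrives as a string.

```javascript
// Approximating req.query with Node's built-in URL class
// (Express itself uses a different parser internally).
const url = new URL('http://www.myridehailingapp.com/api/drivers?lng=70&lat=50&numSeats=4');
const query = Object.fromEntries(url.searchParams);

console.log(query);            // { lng: '70', lat: '50', numSeats: '4' }
console.log(typeof query.lng); // string
```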

The main focus is on the GET request since that is where we make use of the geospatial query. POST, PUT, and DELETE requests will also be handled, albeit in a simplified way, so that we can try out the application more easily.

// /routes/drivers.js

const express = require('express');
const router = express.Router();
const Driver = require('../models/driver');

router.get('/', (req, res, next) => {
  Driver
    .aggregate([
      {
        $geoNear: {
          near: {
            type: 'Point',
            coordinates: [
              parseFloat(req.query.lng),  // longitude first
              parseFloat(req.query.lat)   // latitude second
            ]
          },
          distanceField: 'dist.calculated',
          maxDistance: 10000,
          query: {
            isAvailable: true,
            'car.numSeats': { $gte: parseInt(req.query.numSeats, 10) }
          },
          spherical: true
        }
      },
      { $limit: 10 }
    ])
    .then(nearbyDrivers => res.send(nearbyDrivers))
    .catch(err => next(err));
});

router.post('/', (req, res, next) => {
  Driver
    .create(req.body)
    .then(driver => res.status(201).send(driver))
    .catch(err => next(err));
});

router.put('/:id', (req, res, next) => {
  Driver
    .findByIdAndUpdate(req.params.id, req.body, { new: true })
    .then(driver => {
      if (!driver)
        return next({
          status: 404,
          message: 'There is no driver with the given ID.'
        });

      res.send(driver);
    })
    .catch(err => next(err));
});

router.delete('/:id', (req, res, next) => {
  Driver
    .findOneAndDelete({_id: req.params.id})
    .then(driver => {
      if (!driver)
        return next({
          status: 404,
          message: 'There is no driver with the given ID.'
        });
  
      res.send(driver);
    })
    .catch(err => next(err));
});

module.exports = router;

Note that the only user input validation occurring is through the Mongoose schema validators that we defined (only applied on the POST method), and that we are currently not implementing any authentication or authorization, which are important aspects of building more secure applications. If you want to expand on this application with login and signup endpoints, or if you just want to learn more about ways in which to implement security in Node.js, feel free to read another article of mine: The Nitty Gritty of Authentication & Authorization Using Bcrypt and JSON Web Tokens.

Revisiting our MongoDB query using the aggregation pipeline, have a look at the coordinates field on the near geoNear option. When Express parses the query string, the values are stored as strings and not as numbers. In order to convert the string into a floating point number we make use of the built-in JavaScript function parseFloat(). Forgetting this part will leave a nasty bug in our code.

// DON'T
coordinates: [ req.query.lng, req.query.lat ]

// DO
coordinates: [ parseFloat(req.query.lng), parseFloat(req.query.lat) ]
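Going one step further, parseFloat() returns NaN for missing or malformed input, so it can be worth checking the parsed values before they reach the query. Below is one possible sketch; parseDriverQuery and its error messages are hypothetical and not part of the project's source code.

```javascript
// Hypothetical helper: turns the raw req.query strings into validated numbers.
// Returns { value } on success or { error } with a message on failure.
function parseDriverQuery(query) {
  const lng = parseFloat(query.lng);
  const lat = parseFloat(query.lat);
  const numSeats = parseInt(query.numSeats, 10);

  // comparisons with NaN are always false, so NaN fails each check
  if (!(lng >= -180 && lng <= 180)) {
    return { error: 'lng must be a number between -180 and 180.' };
  }
  if (!(lat >= -90 && lat <= 90)) {
    return { error: 'lat must be a number between -90 and 90.' };
  }
  if (!(numSeats >= 1)) {
    return { error: 'numSeats must be a positive integer.' };
  }
  return { value: { lng, lat, numSeats } };
}

console.log(parseDriverQuery({ lng: '42.359985', lat: '-71.057161', numSeats: '4' }));
console.log(parseDriverQuery({ lng: 'abc', lat: '50', numSeats: '4' }));

// Possible use at the top of the GET handler (an assumption, not the original code):
//   const { error, value } = parseDriverQuery(req.query);
//   if (error) return next({ status: 400, message: error });
```

Passing an object with status and message to next() fits the error handler in app.js, which reads exactly those two properties.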

The server response will include the nearest available drivers meeting the specified criteria. Below is an example of a returned document.

// example of a returned document
{
  "_id": "5f034eb9623d3b191ca2e89f",
  "isAvailable": true,
  "name": "Stephen",
  "phone": "617-123-4567",
  "__v": 0,
  "location": {
    "type": "Point",
    "coordinates": [42.356479, -71.068316],
    "_id": "5f035b4b8c93c62a0c7d8547"
  },
  "car": {
    "numSeats": 5
  },
  "dist": {
    "calculated": 1248.20466096759
  }
}

Dummy Data for Testing

Illustrating how to test the application is a bit out of scope for this article. However, provided further below are five JSON objects you may use when creating the drivers (also available in my GitLab repo). I recommend trying out Postman for hitting the HTTP endpoints.

Let's open the terminal and start the server by running the following command from the project's root directory:

node app.js

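The dummy data below is based on coordinates around Boston, MA. Note that each pair is written latitude first, so MongoDB, which expects [lng, lat], reads these points as lying elsewhere on the globe. The distances between them still fit the scenarios described in the comments, but for genuine Boston positions each pair would need to be swapped (for example, [-71.068316, 42.356479]), along with the lng and lat values in the GET request further down. Make a POST request for each driver to the following endpoint: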

// POST
localhost:3000/api/drivers

Dummy data for creating drivers:

// driver meets criteria
// (is within distance, is available, has 4+ seats)
{
  "name": "Stephen",
  "phone": "617-123-4567",
  "location": {
    "type": "Point",
    "coordinates": [42.356479, -71.068316]
  },
  "isAvailable": true,
  "car": {
    "numSeats": 5
  }
}

// driver meets criteria
// (is within distance, is available, has 4+ seats)
{
  "name": "Melinda",
  "phone": "617-123-4567",
  "location": {
    "type": "Point",
    "coordinates": [42.348072, -71.062814]
  },
  "isAvailable": true,
  "car": {
    "numSeats": 4
  }
}

// driver does not meet criteria
// (is within distance, has 4+ seats, but is not available)
{
  "name": "Billy",
  "phone": "617-123-4567",
  "location": {
    "type": "Point",
    "coordinates": [42.339352, -71.052860]
  },
  "isAvailable": false,
  "car": {
    "numSeats": 5
  }
}

// driver does not meet criteria
// (is available, has 4+ seats, but is not within distance)
{
  "name": "Holly",
  "phone": "617-123-4567",
  "location": {
    "type": "Point",
    "coordinates": [42.259639, -71.144717]
  },
  "isAvailable": true,
  "car": {
    "numSeats": 5
  }
}

// driver does not meet criteria
// (is within distance, is available, but does not have 4+ seats)
{
  "name": "Christopher",
  "phone": "617-123-4567",
  "location": {
    "type": "Point",
    "coordinates": [42.365866, -71.076821]
  },
  "isAvailable": true,
  "car": {
    "numSeats": 2
  }
}

Now, let’s say a user needs a driver with a car having at least four seats. You would make a GET request to the following endpoint with this query string:

// GET
localhost:3000/api/drivers?lng=42.359985&lat=-71.057161&numSeats=4

Looking Ahead

When testing the functionality through Postman, the coordinates were hardcoded. Clearly, we instead ought to locate the user’s actual position. There are various services for gathering geolocation, but there is also the Web API called Geolocation API which can be used directly in the browser (over HTTPS only). It is accessed through navigator.geolocation.getCurrentPosition() and will ask the user for permission to use that data. (This would then have to be done on the frontend.)
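As a rough sketch of that frontend step, the function below turns a position object from the Geolocation API into a request URL for our endpoint. The name buildDriversUrl and the base URL are assumptions made for illustration; the browser call is guarded so the snippet also runs outside a browser, where it simply does nothing.

```javascript
// Hypothetical helper: builds the GET URL from a GeolocationPosition-like object.
function buildDriversUrl(position, numSeats, baseUrl = 'http://localhost:3000') {
  const { longitude, latitude } = position.coords;
  const params = new URLSearchParams({
    lng: String(longitude),
    lat: String(latitude),
    numSeats: String(numSeats)
  });
  return `${baseUrl}/api/drivers?${params}`;
}

// In a browser, ask for the user's position (this triggers a permission prompt).
if (typeof navigator !== 'undefined' && navigator.geolocation) {
  navigator.geolocation.getCurrentPosition(
    (position) => console.log(buildDriversUrl(position, 4)),
    (err) => console.error(`Could not get the position: ${err.message}`)
  );
}
```

The resulting URL could then be requested with fetch() or any HTTP client on the frontend.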

Another topic you may research for future development is how to continuously update the database with the drivers’ coordinates. Web sockets, for instance, allow for real-time communication between client and server and could serve as a good candidate for sending updated positions every few seconds.

Well, that was fun! Hopefully you got some valuable information that you can apply to your own project. I bet you will find MongoDB’s geospatial indexing and its supported operators useful for lots of interesting ideas. For instance, you can determine the current neighborhood of a user with $geoIntersects, the number of hotels in that neighborhood with $geoWithin (or within a circular area by combining $geoWithin with $centerSphere), and all restaurants within a certain distance similar to how we just did it or by using $nearSphere.
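As a small taste of one of these, here is a sketch of a $geoWithin filter using $centerSphere. Unlike maxDistance in our $geoNear stage, $centerSphere takes its radius in radians, which is the distance divided by the Earth's radius (roughly 6378.1 km, the value used in MongoDB's documentation). The helper name withinRadiusFilter is hypothetical.

```javascript
const EARTH_RADIUS_KM = 6378.1; // approximate equatorial radius of the Earth

// Hypothetical helper: builds a filter matching documents whose GeoJSON
// <field> lies within radiusKm of the point [lng, lat].
function withinRadiusFilter(field, lng, lat, radiusKm) {
  return {
    [field]: {
      $geoWithin: {
        // $centerSphere expects [ [lng, lat], radius in radians ]
        $centerSphere: [[lng, lat], radiusKm / EARTH_RADIUS_KM]
      }
    }
  };
}

console.log(JSON.stringify(withinRadiusFilter('location', -71.057161, 42.359985, 2)));

// Possible use with a hypothetical Hotel model (not part of this project):
//   Hotel.countDocuments(withinRadiusFilter('location', lng, lat, 2));
```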

Keep Coding and Stay Curious! :)
