Analytica REST Interface in 5 Steps

This example shows the 5 steps from http://www.analytica.com/learn/5steps/ using the REST interface.

Server Installation

First, you need to install the Analytica server. Please download the installer for your operating system and install the server. Once installed, start it up. In the following we assume that the server is running on localhost.

The REST interface of the Analytica server can be invoked in several ways. The simplest is to type the URL into a browser’s address bar. Alternatively, you can write a client in your favorite programming language or use a generic client tool that lets you enter REST calls.

Connecting the Sample Database

In order to use the server, you first need to connect the server to a MongoDB database. For this example we’ll use a database called ‘so’ (StackOverflow) at datasets.analytica.com:43657. User name is ‘analytica’ and password is ‘analytica’.

The REST call to accomplish this is

http://localhost:8123/connect/datasets.analytica.com/43657/analytica/analytica/so/MongoDB
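If you would rather assemble the connect call in a program than type it by hand, the following is a minimal Python sketch. It assumes the segment order host/port/user/password/database/type, which is inferred only from the example URL above. The helper name connect_url is hypothetical, and whether the server decodes percent-encoded segments is also an assumption.

```python
from urllib.parse import quote

BASE = "http://localhost:8123"  # server address assumed throughout this example


def connect_url(host, port, user, password, database, db_type="MongoDB"):
    """Assemble a connect URL; segment order is inferred from the example above."""
    parts = [host, str(port), user, password, database, db_type]
    # Percent-encode each segment so that a character such as '/' in a
    # password cannot be mistaken for a path separator.
    return BASE + "/connect/" + "/".join(quote(p, safe="") for p in parts)


url = connect_url("datasets.analytica.com", 43657, "analytica", "analytica", "so")
print(url)
```

For the sample database this prints the same URL as shown above.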

0. Exploring the Types

To explore the schema, several REST calls retrieve the types for the database ‘so’. The top-level types can be retrieved through

http://localhost:8123/describe/so

This REST call shows all the collections available inside ‘so’. To see the types of the collection ‘users’, send

http://localhost:8123/describe/so.users

This shows all properties of the documents in ‘users’. As the output shows, all properties of users are of scalar types.

1. What are the maximum, minimum, and average reputation scores for users? [Calculating properties for a collection]

We’ll start by calculating the maximum, minimum, and average reputation for our Stack Overflow users. To do that, we need to tell Analytica to add calculated properties to our ‘so’ database. In Analytica, this is called a SET operation. The REST call to determine the maximum reputation is

http://localhost:8123/set/so.maxreputation/max(users.Reputation)

After the call the new calculated property is available and can be retrieved with a GET operation

http://localhost:8123/get/so.maxreputation

Its type can be retrieved through

http://localhost:8123/describe/so.maxreputation

We can similarly calculate the average reputation and the minimum reputation:

http://localhost:8123/set/so.avgreputation/average(users.Reputation)
http://localhost:8123/set/so.minreputation/min(users.Reputation)
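The three SET calls share the shape /set/target/expression, so they can be generated from a small table. This is a sketch only: the helper set_url and the dictionary AGGREGATES are hypothetical names, and the URL shape is taken from the calls above rather than from a formal specification.

```python
BASE = "http://localhost:8123"  # assumed server address

# Property name -> expression, exactly as used in the calls above.
AGGREGATES = {
    "maxreputation": "max(users.Reputation)",
    "avgreputation": "average(users.Reputation)",
    "minreputation": "min(users.Reputation)",
}


def set_url(target, expression):
    """Build a set URL of the form /set/<target>/<expression>."""
    return f"{BASE}/set/{target}/{expression}"


set_urls = [set_url(f"so.{name}", expr) for name, expr in AGGREGATES.items()]
for u in set_urls:
    print(u)
```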

The max, min, and average functions are just a few of the functions you can apply to your MongoDB data. Analytica has over 100 different mathematical, statistical, and string manipulation functions [if you need additional functions, just ask! info@analytica.com].

2. Displaying data about our users

Analytica accesses any fields we have in our databases, collections, and documents using what are called GET commands. For example, if we want to see the maximum reputation score we calculated for users, we need to execute

http://localhost:8123/get/so.maxreputation

This will look for the ‘so’ database, and then return the ‘maxreputation’ field contained in that database.

GET operations work on either calculated properties or properties from your database. For example, you can run

http://localhost:8123/get/so.users.DisplayName

This returns the ‘DisplayName’ for all users.
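A GET call can also be issued from a small client instead of a browser. The sketch below uses only the Python standard library. The helper names get_url and fetch are hypothetical, and because this example does not specify the response format, the body is simply returned as text (UTF-8 is assumed).

```python
from urllib.request import urlopen

BASE = "http://localhost:8123"  # assumed server address


def get_url(path):
    """Build a get URL for a dotted path such as so.users.DisplayName."""
    return f"{BASE}/get/{path}"


def fetch(url, timeout=10):
    """Send the request and return the raw response body as text.

    The response format is not documented here, so no parsing is attempted.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires a running server with the 'so' database connected):
#   print(fetch(get_url("so.maxreputation")))
#   print(fetch(get_url("so.users.DisplayName")))
```

The same fetch helper would work for the describe and set calls, since they are all plain URL requests.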

The types of the calculated properties can be retrieved through REST also, for example

http://localhost:8123/describe/so.maxreputation

3. Are users grumpy?! [Adding a property to each document in a collection]

A key action on the StackOverflow site is upvoting and downvoting. Let’s say we want to see whether community members are downvoting more than they are upvoting (probably not a good thing!). To do this, let’s create a property for each user. We’ll call this a user’s ‘grumpiness’ score. To calculate the grumpiness for each user, we have to add a property to each user profile document in the ‘so.users’ collection:

http://localhost:8123/set/so.users.grumpiness/UpVotes-DownVotes

In ‘so.users.grumpiness’, ‘grumpiness’ is the name of the property, and ‘so.users’ identifies the documents that will contain the calculated property. You can read this as: for each document in ‘so.users’, create a calculated property called ‘grumpiness’.

The second part defines the value ‘so.users.grumpiness’ will contain. In this case, we’re calculating a user’s grumpiness score as the difference between the number of ‘UpVotes’ and ‘DownVotes’ they have cast.

Finally, ‘set’ registers the definition with the Analytica server. When the user now retrieves the ‘so.users’ schema, it will include grumpiness for each of the users. If a user’s grumpiness score is negative, that user has cast more downvotes than upvotes!

We can see the grumpiness of each user using

http://localhost:8123/get/so.users.grumpiness

and we can compute the average grumpiness by

http://localhost:8123/set/so.avggrumpiness/average(users.grumpiness)

Looking at the result

http://localhost:8123/get/so.avggrumpiness

we can see that the average grumpiness is 10.59 – so our Stack Overflow users are not very grumpy!
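To make the difference between a per-document property and a collection-level property concrete, here is a local Python sketch of the same two calculations. The three sample documents are hypothetical and are not taken from the real ‘so.users’ data. The code only mirrors what the SET expressions describe and says nothing about how the server evaluates them.

```python
# Hypothetical sample documents; the real 'so.users' data is not reproduced here.
users = [
    {"DisplayName": "alice", "UpVotes": 30, "DownVotes": 5},
    {"DisplayName": "bob", "UpVotes": 2, "DownVotes": 10},
    {"DisplayName": "carol", "UpVotes": 12, "DownVotes": 12},
]

# Counterpart of set/so.users.grumpiness/UpVotes-DownVotes:
# one derived value per document.
for user in users:
    user["grumpiness"] = user["UpVotes"] - user["DownVotes"]

# Counterpart of set/so.avggrumpiness/average(users.grumpiness):
# one derived value for the whole collection.
avg_grumpiness = sum(u["grumpiness"] for u in users) / len(users)

print([u["grumpiness"] for u in users])  # [25, -8, 0]
print(avg_grumpiness)
```

In this sample only ‘bob’ has a negative score, meaning more downvotes than upvotes.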

4. Where are our users located? [Grouping and ordering]

To figure out where our users are located, the data first has to be grouped by the ‘Location’ property; then we can count how many users are in each location. The group-by call is as follows, and it adds a new collection to the database ‘so’:

http://localhost:8123/set/so.bylocation/group(users.by(Location))

This results in a new collection, ‘bylocation’. The ‘group’ function groups the users in the ‘users’ collection. The ‘by’ function defines the property used for grouping. Each document in the new collection contains a property called ‘Location’ (which we grouped by) and an array of documents, corresponding to the documents of all users with that location. Exploring the type tree, you will see that calculated properties (such as grumpiness) are carried over into the new grouped documents.

Since the users are grouped by location, it is possible to count up the users for each location.

http://localhost:8123/set/so.bylocation.numusers/count(users)

This will create a ‘numusers’ property in each of our ‘bylocation’ documents which contains the number of users for that location.

If we’d like to see the top locations for our users, the last thing to do is to order the locations by our ‘numusers’ property:

http://localhost:8123/set/so.toplocations/orderdesc(bylocation.by(numusers))

This command tells Analytica to create a new ordering as a collection called ‘toplocations’. The ‘orderdesc’ function tells Analytica to order a collection in descending order (so the location with the most users comes first), while the ‘by’ function tells Analytica which property to use for ordering.

Retrieving the top locations is done by

http://localhost:8123/get/so.toplocations

Retrieving just the ‘numusers’ is done by

http://localhost:8123/get/so.toplocations.numusers
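The group, count, and order steps can be mirrored locally to see what shape each intermediate collection takes. The following Python sketch uses hypothetical sample documents. The field name ‘users’ for the grouped array is an assumption based on the count(users) call above.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for 'so.users'.
users = [
    {"DisplayName": "alice", "Location": "Berlin"},
    {"DisplayName": "bob", "Location": "London"},
    {"DisplayName": "carol", "Location": "Berlin"},
    {"DisplayName": "dave", "Location": "Paris"},
    {"DisplayName": "erin", "Location": "Berlin"},
    {"DisplayName": "frank", "Location": "London"},
]

# group(users.by(Location)): one document per distinct Location,
# each holding the array of matching user documents.
groups = defaultdict(list)
for user in users:
    groups[user["Location"]].append(user)
bylocation = [{"Location": loc, "users": members} for loc, members in groups.items()]

# count(users): add 'numusers' to every grouped document.
for doc in bylocation:
    doc["numusers"] = len(doc["users"])

# orderdesc(bylocation.by(numusers)): largest groups first.
toplocations = sorted(bylocation, key=lambda d: d["numusers"], reverse=True)

print([(d["Location"], d["numusers"]) for d in toplocations])
# [('Berlin', 3), ('London', 2), ('Paris', 1)]
```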

5. Analyzing subsets of data

The last thing we’re going to do is deal with the fact that Stack Overflow has a lot of anonymous users, indicated by their ‘DisplayName’ being set to ‘userXXXX’, where ‘XXXX’ is a random number. We can filter these out by using a SELECT function on our collection. Running select(users.where(not(contains(DisplayName,"user")))) will select only those documents that do not contain the string ‘user’ in the ‘DisplayName’. We can then assign those to a new calculated collection, e.g.

http://localhost:8123/set/so.namedusers/select(users.where(not(contains(DisplayName,"user"))))

will create a new calculated collection called ‘namedusers’ which only contains users that don’t have the string ‘user’ in their ‘DisplayName’.

We can also use the results of the select as input to another function such as

http://localhost:8123/set/so.anonusercount/count(select(users.where(contains(DisplayName,"user"))))

This will just set a property called ‘anonusercount’ on ‘so’ which counts the number of anonymous users (those that have ‘user’ in their ‘DisplayName’).
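Two details of these calls are worth sketching in Python. The first is the filter itself, mirrored locally on hypothetical sample documents. The second is that the expression contains double quotes, which a browser normally percent-encodes for you but a hand-written client has to encode itself. That the server accepts the encoded form (%22) is an assumption.

```python
from urllib.parse import quote

BASE = "http://localhost:8123"  # assumed server address

# Hypothetical sample documents standing in for 'so.users'.
users = [
    {"DisplayName": "alice"},
    {"DisplayName": "user4821"},
    {"DisplayName": "bob"},
    {"DisplayName": "user17"},
]

# Local counterpart of select(users.where(not(contains(DisplayName,"user")))).
namedusers = [u for u in users if "user" not in u["DisplayName"]]
# Local counterpart of count(select(users.where(contains(DisplayName,"user")))).
anonusercount = sum(1 for u in users if "user" in u["DisplayName"])

# Percent-encode the double quotes while keeping parentheses, commas,
# and dots readable.
expression = 'count(select(users.where(contains(DisplayName,"user"))))'
encoded = quote(expression, safe="(),.")
url = f"{BASE}/set/so.anonusercount/{encoded}"

print(len(namedusers), anonusercount)  # 2 2
print(url)
```

Note that a substring test like this also matches a name such as ‘superuser’, and the local check is case-sensitive. Whether Analytica’s ‘contains’ is case-sensitive is not stated in this example.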

If you’ve gotten this far, congratulations! You now know how to use the REST API of the Analytica server. You can now connect up to your own MongoDB database and explore it.