User accounts can be found in many applications and every company is usually interested in user account analysis – at least in a very basic level of analysis.
This analytics recipe of the Analytica cookbook shows some approaches to user account analysis and provides some actual analysis within and across user accounts.
Starting Point and Assumptions
A few assumptions are being made in this recipe:
- One User – One Document. Each user’s data is collected and stored in one document. As the user updates their data over time, either the data items are updated or their changes are collected over time.
- Basic Static Profile Data. Users have static properties like name, location, or status that do not change very frequently. And, usually it is not necessary to track and to keep the changes over time; updates are usually in-place.
- User Activities. Users have their interesting activities stored within their documents. We introduce a few examples of user activities that might be interesting; but each application is different; so these are only examples. An example of activities are votes of users, their login behavior, the number of comments or suggestions they make. User activities are often logged for further analysis.
- Varying Schema. Depending on the application, all user documents have the exact same schema, while other might have a varying schema, i.e., some user documents have a property that others do not have. For example, a user might not have a status property until one is actively set by the application.
These are the assumptions being made in the following recipe; in your enterprise, you might have different or additional assumptions, of course.
Sample User Account Data Set
The analysis examples are based on the following initial documents (stored in a database called cookbook, and a collection called ‘uam’ for User Account Management). The documents will be extended over the course of the recipe in order to add properties of specific interest for analysis.
{"firstName": "Alice",
"mi": "B.",
"lastName": "Chen"}
{"firstName": "Bob",
"lastName": "Mill"}
User-Specific Analysis
User-specific analysis is analysis being done within a user document. This means that calculated properties are being added to user documents. The input to the expressions that calculate the user-specific value can be from inside or outside that users’ document.
Adding a Calculated Property as Combination of Base Properties
To start, let’s add a ‘fullName’ to each user (using the shell for illustration):
-->set cookbook.uam.fullName = concatenate(firstName,lastName)
Success.
-->get cookbook.uam.fullName
{
"fullName" : "AliceChen"
}
{
"fullName" : "BobMill"
}
-->
This interaction shows that a new property ‘fullName’ has been added to each user. From now on, the calculated property ‘fullName’ can be used as if it was coming directly from the ‘cookbook’ database.
Conditional Logic in Expressions based on Property Values
The first more interesting situation is the fact that only one user has a middle initial (called ‘mi’); the other user does not have that property. So even the two initial simple documents already have two different schemas. What if the ‘fullName’ should include the middle initial? In order to accomplish that a more elaborate expression is necessary that conditionally includes the middle initial:
-->set cookbook.uam.fullName = xlif(mi = null, concatenate(firstName, lastName),
concatenate(firstName, mi, lastName))
Success.
-->get cookbook.uam.fullName
{
"fullName" : "AliceB.Chen"
}
{
"fullName" : "BobMill"
}
-->
A missing property can be checked for ‘null’ as Analytica adds a missing property implicitly with the value ‘null’.
Filtering Anonymous and Named Users based on Predicates
Some web site support anonymous users. Usually there is some convention that indicates in the user’s representation if the user is actually a known user or an anonymous user.
The users are extended with a boolean property called ‘isAnonymous’ and those have a fictitious last name that follows the scheme: ‘user’ appended by a number, e.g. user25.
The extended document set is:
{"firstName": "Alice",
"mi": "B.",
"lastName": "Chen"}
{"firstName": "Bob",
"lastName": "Mill"}
{"firstName": "",
"lastName": "user25",
"isAnonymous": true}
As a side remark, the fact that ‘firstName’ was chosen to be set to the empty string was helpful as the definition of ‘fullName’ does not have to change. If in case of anonymous users the ‘firstName’ would have been omitted, the definition of ‘fullName’ would have to check for ‘null’ for the ‘firstName’ also.
Anonymous users add a third schema that has to be taken into consideration. As for the filtering, the collection of ‘anonymousUam’ is defined as
-->set cookbook.anonymousUam = select(cookbook.uam.where(isAnonymous = true))
Success.
-->get cookbook.anonymousUam
{
"anonymousUam" : [
{
"_id" : "50c680d6a6d56aaec1dae2c8",
"firstName" : "",
"lastName" : "user25",
"isAnonymous" : true,
"fullName" : "user25"
}]
}
-->
This effectively created a calculated collection with only those users that have the property ‘isAnonymous’ set to ‘true’. But how to create another calculated collection with only the non-anonymous users? A structured way could be:
-->set cookbook.uam.isNotAnonymous = xlif(exists(isAnonymous, null), true, false) Success. -->
With this additional property ‘isNotAnonymous’ the named users can now be identified and a separate calculated collection ‘namedUam’ can be defined.
-->set cookbook.namedUam = select(cookbook.uam.where(isNotAnonymous = true))
Success.
-->get cookbook.namedUam
{
"namedUam" : [
{
"_id" : "50c680d6a6d56aaec1dae2c6",
"firstName" : "Alice",
"mi" : "B.",
"lastName" : "Chen",
"fullName" : "AliceB.Chen",
"isNotAnonymous" : true
},
{
"_id" : "50c680d6a6d56aaec1dae2c7",
"firstName" : "Bob",
"lastName" : "Mill",
"fullName" : "BobMill",
"isNotAnonymous" : true
}]
}
-->
Analysis of Login Behavior as Aggregation over Sub-Collections
Often users exhibit behavior (like voting, scoring points, writing blogs, etc.) that is tracked by the application. In case of user account management, one related activity is the number of logins and the number of attempts per login. In this recipe, this activity is used as example.
{"firstName": "Alice",
"mi": "B.",
"lastName": "Chen",
"logins": [
{
"date": "12/1/2012",
"attempts": 1
},
{
"date": "12/2/2012",
"attempts": 3
},
{
"date": "12/3/2012",
"attempts": 4
}
]}
{"firstName": "Bob",
"lastName": "Mill",
"logins": [
{
"date": "12/4/2012",
"attempts": 1
},
{
"date": "12/5/2012",
"attempts": 1
}
]}
{"firstName": "",
"lastName": "user25",
"isAnonymous": true}
For each user, the number of logins and the average of the login attempts will be added as calculated properties:
-->set cookbook.uam.noLogins = count(logins) Success. -->
-->set cookbook.uam.avgAttempts = average(logins.attempts) Success. -->
This concludes the user-specific analysis. The various additions of calculated properties have shown how to use the base properties coming from the database and how to add additional calculated properties providing insight into the user behavior. The resulting data set is quite expressive with the additional information:
-->get cookbook.uam
{
"uam" : [
{
"_id" : "50c7794da6d56aaec1dae2cf",
"firstName" : "Alice",
"mi" : "B.",
"lastName" : "Chen",
"logins" : [
{
"date" : "12/1/2012",
"attempts" : 1
},
{
"date" : "12/2/2012",
"attempts" : 3
},
{
"date" : "12/3/2012",
"attempts" : 4
}],
"fullName" : "AliceB.Chen",
"isNotAnonymous" : true,
"noLogins" : 3,
"avgAttempts" : 2.66666666666667
},
{
"_id" : "50c7794da6d56aaec1dae2d0",
"firstName" : "Bob",
"lastName" : "Mill",
"logins" : [
{
"date" : "12/4/2012",
"attempts" : 1
},
{
"date" : "12/5/2012",
"attempts" : 1
}],
"fullName" : "BobMill",
"isNotAnonymous" : true,
"noLogins" : 2,
"avgAttempts" : 1
},
{
"_id" : "50c7794da6d56aaec1dae2d1",
"firstName" : "",
"lastName" : "user25",
"isAnonymous" : true,
"fullName" : "user25",
"isNotAnonymous" : false,
"noLogins" : null,
"avgAttempts" : null
}]
}
-->
User-Global Analysis
User-global analysis is analysis being done across the user base in order to determine more global properties of the whole user account community as opposed to analysis within a single user. In the following, several analysis of the whole set of user accounts are performed, making use of base as well as calculated properties.
Counting of Documents in Base and Calculated Collections
An initial question every business has is the number of user accounts. In this case we immediately distinguish between names and anonymous user counts. Since the values are across users, they should not be added to individual users, but on the database level.
-->set cookbook.namedUserCount = count(cookbook.namedUam)
Success.
-->get cookbook.namedUserCount
{
"namedUserCount" : 2
}
-->
This calculated property called ‘namedUserCount’ is added at the database level. Therefore, in order to access it, only the database has to be named, and no collection. Counting anonymous users follows the same approach:
-->set cookbook.anonymousUserCount = count(cookbook.anonymousUam)
Success.
-->get cookbook.anonymousUserCount
{
"anonymousUserCount" : 1
}
-->
The total user count can be computed in two different ways. One is to count all users, or to add up the anonymous and named users. The latter is done this way:
-->set cookbook.totalUserCount = sum(cookbook.namedUserCount, cookbook.anonymousUserCount)
Success.
-->get cookbook.totalUserCount
{
"totalUserCount" : 3
}
-->
Aggregation of Document Sub-Collections
With these aggregations established, it is now possible to display ratios and charts for these to provide an overview of the overall numbers.
The total number of logins is done like this:
-->set cookbook.totalNumberLogins = sum(cookbook.uam.count(logins))
Success.
-->get cookbook.totalNumberLogins
{
"totalNumberLogins" : 5
}
-->
The total number of login attempts is aggregated like this:
-->set cookbook.totalNumberAttempts = sum(cookbook.uam.logins.attempts)
Success.
-->get cookbook.totalNumberAttempts
{
"totalNumberAttempts" : 10
}
-->
At this point quite a number of aggregates has been established across the total user account population and those can now be used for reporting.
Meta data Retrieval
With all the document-specific and cross-document aggregates being established in addition to the calculated collections, it would be interesting to actually retrieve a list of all what has been added to the base collections and properties. This can be done as follows:
-->list
{
"calculatedPropertyType" : "document",
"calculatedPropertyName" : "cookbook.uam.fullName",
"calculatedPropertyQuery" : "xlif(eq(mi,null),concatenate(firstName,lastName),concatenate(firstName,mi,lastName))"
}
{
"calculatedPropertyType" : "document",
"calculatedPropertyName" : "cookbook.uam.isNotAnonymous",
"calculatedPropertyQuery" : "xlif(exists(isAnonymous,null),true,false)"
}
{
"calculatedPropertyType" : "document",
"calculatedPropertyName" : "cookbook.uam.noLogins",
"calculatedPropertyQuery" : "count(logins)"
}
{
"calculatedPropertyType" : "document",
"calculatedPropertyName" : "cookbook.uam.avgAttempts",
"calculatedPropertyQuery" : "average(logins.attempts)"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.namedUserCount",
"calculatedPropertyQuery" : "count(cookbook.namedUam)"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.anonymousUserCount",
"calculatedPropertyQuery" : "count(cookbook.anonymousUam)"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.totalUserCount",
"calculatedPropertyQuery" : "sum(cookbook.namedUserCount,cookbook.anonymousUserCount)"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.totalNumberLogins",
"calculatedPropertyQuery" : "sum(cookbook.uam.count(logins))"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.totalNumberAttempts",
"calculatedPropertyQuery" : "sum(cookbook.uam.logins.attempts)"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.anonymousUam",
"calculatedPropertyQuery" : "select(cookbook.uam.xlwhere(eq(isAnonymous,true)))"
}
{
"calculatedPropertyType" : "database",
"calculatedPropertyName" : "cookbook.namedUam",
"calculatedPropertyQuery" : "select(cookbook.uam.xlwhere(eq(isNotAnonymous,true)))"
}
-->
This list shows each and every single calculated property and collection. The first entry indicates if the property was added on a database or document level. The second property shows the name and context. The third shows the computation expression.
With the ‘list’ command it is therefore possible to get an overview of all the additional data that is available for further processing.
Summary
This recipe has shown several approaches to aggregate data on a document and database level in context of user account management. It demonstrated different types of aggregates and filtering on different levels of documents, their properties and sub-collections.
In context of specific applications and enterprises, most likely additional properties are available in user accounts, however, the principles of analytics and filtering apply in the same way.
