<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Analytica</title>
	<atom:link href="http://www.analytica.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.analytica.com</link>
	<description></description>
	<lastBuildDate>Sat, 18 May 2013 00:16:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Phased Analytics based on Data Staging</title>
		<link>http://www.analytica.com/2013/02/22/phased-analytics-based-on-data-staging/</link>
		<comments>http://www.analytica.com/2013/02/22/phased-analytics-based-on-data-staging/#comments</comments>
		<pubDate>Sat, 23 Feb 2013 00:17:20 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Calculated Property]]></category>
		<category><![CDATA[Data Staging]]></category>
		<category><![CDATA[Phased Analytics]]></category>
		<category><![CDATA[saveas()]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=1346</guid>
		<description><![CDATA[Analytica provides the concept of calculated properties that are used to augment the documents from the base database. This is the mechanism for dynamically adding analysis-relevant properties to documents. These additional properties exist within Analytica, but<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/02/22/phased-analytics-based-on-data-staging/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>Analytica provides the concept of calculated properties that are used to augment the documents from the base database. This is the mechanism for dynamically adding analysis-relevant properties to documents. These additional properties exist within Analytica, but they are not written back to the base database, so they cannot have any adverse effects on it.</p>
<p>For example, a customer management database records sales and returns. Some analytics require the total sales of a given customer. A calculated property can add a &#8220;total_sales&#8221; property to each customer that adds up all of the customer&#8217;s sales and subtracts all of the customer&#8217;s returns. Once the &#8220;total_sales&#8221; property is established, the total sales of every customer are known.</p>
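<p>As a rough illustration of what the &#8220;total_sales&#8221; calculated property computes, the following Python sketch augments copies of the customer documents in memory. The document shape (a &#8220;sales&#8221; list and a &#8220;returns&#8221; list of amounts per customer) and the function name add_total_sales are assumptions made for this example; they are not Analytica&#8217;s API or storage format.</p>
<pre>
import copy

# Hypothetical customer documents; the "sales" and "returns" fields are assumed.
customers = [
    {"name": "A", "sales": [100, 250], "returns": [50]},
    {"name": "B", "sales": [80], "returns": []},
]


def add_total_sales(docs):
    """Return augmented copies of docs; the base documents stay untouched."""
    augmented = []
    for doc in docs:
        new_doc = copy.deepcopy(doc)
        # total_sales = sum of all sales minus sum of all returns
        new_doc["total_sales"] = sum(doc.get("sales", [])) - sum(doc.get("returns", []))
        augmented.append(new_doc)
    return augmented


result = add_total_sales(customers)
print([d["total_sales"] for d in result])  # [300, 80]
</pre>
<p>Working on copies mirrors the point above: the augmented documents carry the new property, while the base documents are left as they were.</p>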
<h3>Calculated Property Sharing</h3>
<p>When several analysts work in an enterprise, each of them might need the &#8220;total_sales&#8221; property for their specific analysis.</p>
<p>The worst approach is for each analyst to create their own definition of &#8220;total_sales&#8221;. This is sub-optimal because some analysts might subtract the returns from &#8220;total_sales&#8221; while others might not. While &#8220;total_sales&#8221; is kept simple here, in a real situation it might be a lot more complex, which easily leads to disagreement among the analysts about how to define it properly.</p>
<p>A much better approach is to define the calculated property &#8220;total_sales&#8221; once and share this definition with all analysts. In this case, everyone follows the same definition and everyone gets the correctly computed value for each customer.</p>
<h3>Phased Analytics</h3>
<p>Even if the definition of a calculated property is shared, each analyst still has to apply it, which takes computational effort. Furthermore, analysts might apply it against the base document set at different times. In this case, each analyst might see a different value of &#8220;total_sales&#8221; for the same customer, depending on how much buying and returning activity takes place in between.</p>
<p>While all analysts share the correct definition, they do not necessarily share the same base document set. To overcome this problem, it is possible to share a common state of the analytics data. In a first phase, the calculated properties that are important for more than one analyst are computed. Once this has happened, the augmented document set is shared with the analysts in a second phase.</p>
<p>In this case, all analysts see the same set of customers, and every customer has the &#8220;total_sales&#8221; property. The analysts now work with the same document set, in which the common calculated properties are already present.</p>
<p>While the example has two phases, more phases are of course possible. In each phase, a set of calculated properties is computed and the result is made available.</p>
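<p>The phases can be pictured as a pipeline in which each phase applies its calculated properties to the output of the previous phase. The Python sketch below is a hypothetical model of that idea: the names run_phases, phase1 and phase2 are invented here, and each calculated property is modeled as a plain function from a document to a value. It does not reflect how Analytica schedules phases internally.</p>
<pre>
import copy


def run_phases(base_docs, phases):
    """Apply each phase's calculated properties in turn; keep every phase's output."""
    outputs = []
    docs = copy.deepcopy(base_docs)
    for phase in phases:
        for doc in docs:
            for name, compute in phase.items():
                doc[name] = compute(doc)
        # Freeze a copy of this phase's result so later phases cannot change it.
        outputs.append(copy.deepcopy(docs))
    return outputs


base = [{"customer": "A", "sales": 300, "returns": 50}]

# Phase 1: properties shared by all analysts.
phase1 = {"total_sales": lambda d: d["sales"] - d["returns"]}
# Phase 2: a property built on top of the shared one.
phase2 = {"big_customer": lambda d: d["total_sales"] > 200}

shared, personal = run_phases(base, [phase1, phase2])
print(shared[0])  # {'customer': 'A', 'sales': 300, 'returns': 50, 'total_sales': 250}
print(personal[0]["big_customer"])  # True
</pre>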
<h3>Data Staging</h3>
<p>For efficiency reasons it might be necessary to materialize the calculated properties once they are computed in a given phase. The materialization writes the collections of documents (including the calculated properties) into a staging database. Analytica provides the command &#8220;saveas()&#8221; for this. Each phase has a corresponding staging database.</p>
<p>From an IT perspective, this avoids repeatedly computing the calculated properties; from an analytics perspective, it looks as if the &#8220;right&#8221; base documents were available. Analysts now access the staged data as if it were a base document set.</p>
<p>This approach gives IT a mechanism to prepare data for analysis, while analysts face less data-preparation complexity and are free to focus on the more involved analytics.</p>
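<p>Materialization can be thought of as writing a frozen copy of a phase&#8217;s document collections into a separate staging database. The sketch below models the staging databases as a Python dictionary; the function save_as is a hypothetical stand-in for the role saveas() plays and does not reproduce that command&#8217;s actual parameters or behavior.</p>
<pre>
import copy

# Hypothetical staging area: staging database name -> collection name -> documents.
staging = {}


def save_as(docs, database, collection):
    """Materialize docs (including calculated properties) into a staging database."""
    staging.setdefault(database, {})[collection] = copy.deepcopy(docs)


phase1_output = [{"customer": "A", "total_sales": 250}]
save_as(phase1_output, "stage1", "customers")

# Later changes to the in-memory documents do not affect the staged copy.
phase1_output[0]["total_sales"] = 0
print(staging["stage1"]["customers"][0]["total_sales"])  # 250
</pre>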
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/02/22/phased-analytics-based-on-data-staging/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computation based on Order of Documents</title>
		<link>http://www.analytica.com/2013/02/14/computation-based-on-order-of-documents/</link>
		<comments>http://www.analytica.com/2013/02/14/computation-based-on-order-of-documents/#comments</comments>
		<pubDate>Thu, 14 Feb 2013 19:25:03 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Concepts]]></category>
		<category><![CDATA[index()]]></category>
		<category><![CDATA[Ordered Documents]]></category>
		<category><![CDATA[row()]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=1313</guid>
		<description><![CDATA[In many cases it is necessary to compute a property value based on the value of a property in a subsequent document, for example, to show growth over time. Analytica provides several functions that can be of help. One function<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/02/14/computation-based-on-order-of-documents/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>In many cases it is necessary to compute a property value based on the value of a property in a subsequent document, for example, to show growth over time.</p>
<p>Analytica provides several functions that can help. The first is order(), which establishes an order among a set of documents (otherwise, documents are processed in their natural order). The second is row(), which determines the index of a document. The third is index(), which supports index-based access to documents.</p>
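<p>To make the roles of the three functions concrete, here is a small Python model of them. The helpers order_docs, add_row_numbers and doc_at_index, as well as the &#8220;time&#8221; field used for ordering, are hypothetical stand-ins chosen for this sketch; Analytica&#8217;s actual signatures may differ. As in the result shown later in this post, row numbers start at 1.</p>
<pre>
def order_docs(docs, key):
    """Like order(): establish an explicit order instead of the natural order."""
    return sorted(docs, key=lambda d: d[key])


def add_row_numbers(docs):
    """Like row(): give each document its 1-based position in the current order."""
    return [dict(doc, index=i) for i, doc in enumerate(docs, start=1)]


def doc_at_index(i, docs):
    """Like index(): return the document whose index is i, or None if there is none."""
    for doc in docs:
        if doc["index"] == i:
            return doc
    return None


readings = [{"time": 3, "temp": 65}, {"time": 1, "temp": 50}, {"time": 2, "temp": 55}]
numbered = add_row_numbers(order_docs(readings, "time"))
print(doc_at_index(2, numbered))  # {'time': 2, 'temp': 55, 'index': 2}
print(doc_at_index(4, numbered))  # None
</pre>
<p>Ordering before numbering matters: in natural order, index 2 would point at the document with temperature 50 instead.</p>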
<h3>Example Document Set</h3>
<p>Sensor data consists of the measurements taken by sensors; a sensor might, for example, measure the temperature at a location. The following document set shows a few sensor measurements. Assume that they are stored in a database named &#8216;sensor&#8217;, in a collection named &#8216;data&#8217;:</p>
<pre>{"sensor":1, "temp":50}
{"sensor":1, "temp":55}
{"sensor":1, "temp":65}
{"sensor":1, "temp":90}</pre>
<h3>Example Analytics Rules</h3>
<p>The goal of the analysis is for each sensor data document to contain its value and the difference to the next value. The difference of the last document should be &#8216;null&#8217;, since there is no next value.</p>
<p>First, an index property is added to each document:</p>
<pre>set(sensor.data.index, 
    row())</pre>
<p>Then, the temperature of the next document in order is added:</p>
<pre>set(sensor.data.nexttemp, 
    propertyvalue("temp", index(index + 1, sensor.data)))</pre>
<p>Finally, the difference is computed for those documents where &#8216;nexttemp&#8217; has a value; where &#8216;nexttemp&#8217; is &#8216;n/a&#8217;, the difference is set to null:</p>
<pre>set(sensor.data.tempdiff, 
    xlif(nexttemp = "n/a", null, temp - nexttemp))</pre>
<h3>Result</h3>
<p>After the calculated properties are established, the result document set looks like the following:</p>
<pre>{"sensor":1,"temp":50,"index":1,"nexttemp":55,"tempdiff":-5}
{"sensor":1,"temp":55,"index":2,"nexttemp":65,"tempdiff":-10}
{"sensor":1,"temp":65,"index":3,"nexttemp":90,"tempdiff":-25}
{"sensor":1,"temp":90,"index":4,"nexttemp":"n/a","tempdiff":null}</pre>
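<p>The three rules can be mimicked in plain Python to check the arithmetic of the result above. This is only a model of the described semantics: the &#8220;n/a&#8221; marker and None (for null) mirror the output shown, and the code does not call Analytica.</p>
<pre>
docs = [{"sensor": 1, "temp": t} for t in (50, 55, 65, 90)]

# Rule 1: index = row(), numbering documents from 1 in natural order.
for row, doc in enumerate(docs, start=1):
    doc["index"] = row

by_index = {doc["index"]: doc for doc in docs}

# Rule 2: nexttemp = temp of the document with index + 1, or "n/a" if none exists.
for doc in docs:
    nxt = by_index.get(doc["index"] + 1)
    doc["nexttemp"] = nxt["temp"] if nxt is not None else "n/a"

# Rule 3: tempdiff = temp - nexttemp, or null (None) when there is no next value.
for doc in docs:
    doc["tempdiff"] = None if doc["nexttemp"] == "n/a" else doc["temp"] - doc["nexttemp"]

print([doc["tempdiff"] for doc in docs])  # [-5, -10, -25, None]
</pre>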
<h3>Summary</h3>
<p>Analytica supports the computation of values based on an ordered document set. The example has shown how to compute the regular values as well as how to handle a corner case (the last document, which has no successor). While the example shows a basic approach, more complex scenarios can be built on this foundation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/02/14/computation-based-on-order-of-documents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schema Transformation: Null Values</title>
		<link>http://www.analytica.com/2013/01/30/schema-transformation-null-values/</link>
		<comments>http://www.analytica.com/2013/01/30/schema-transformation-null-values/#comments</comments>
		<pubDate>Wed, 30 Jan 2013 19:21:31 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Calculated Property]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[null]]></category>
		<category><![CDATA[Schema Transformation]]></category>
		<category><![CDATA[Schema-less Documents]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=1243</guid>
		<description><![CDATA[Indicating that a property does not have a value can be done in several ways in schema-free databases. In the context of JSON, it is possible to omit the property or set the property value to &#8220;null&#8221;. How can documents be transformed so<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/01/30/schema-transformation-null-values/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>Indicating that a property does not have a value can be done in several ways in schema-free databases. In the context of JSON, it is possible to</p>
<ul>
<li>omit the property</li>
<li>set the property value to &#8220;null&#8221;</li>
</ul>
<p>How can documents be transformed so that they actually have a value? The following shows Analytica&#8217;s approach for those cases where such a transformation is required.</p>
<h3>Test Document Set</h3>
<p>The following dialog establishes the test document set:</p>
<pre>&gt;mongo
MongoDB shell version: 2.2.0
connecting to: test
&gt; use nullValues
switched to db nullValues
&gt; db.nv.save({"a": null})
&gt; db.nv.save({"a":1})
&gt; db.nv.save({"c":2})
&gt; db.nv.find()
{ "_id" : ObjectId("51096c45c905109d93d892d4"), "a" : null }
{ "_id" : ObjectId("51096c4bc905109d93d892d5"), "a" : 1 }
{ "_id" : ObjectId("51096d2ac905109d93d892d6"), "c" : 2 }
&gt;</pre>
<p>The collection &#8220;nv&#8221; has three documents. Two of them contain a property &#8220;a&#8221;, and one of those two sets &#8220;a&#8221; to &#8220;null&#8221;. The third document does not contain &#8220;a&#8221; at all.</p>
<h3>Transformed Schema</h3>
<p>The goal is to add a calculated property &#8220;b&#8221; that has the value of &#8220;a&#8221; if &#8220;a&#8221; is not null, and zero (0) otherwise. The schema transformation therefore adds a property &#8220;b&#8221; that represents the transformed schema.</p>
<p>The calculated property &#8220;b&#8221; is defined as follows (Analytica Shell syntax):</p>
<pre>--&gt;set nullValues.nv.b = xlif(a=null, 0, a)</pre>
<p>This expression adds a property &#8220;b&#8221; to every document; its value is either zero (0) or the original value of &#8220;a&#8221;.</p>
<p>But what about documents that do not have &#8220;a&#8221; at all? Analytica interprets the absence of a property as &#8220;null&#8221;, so documents that lack the property altogether are covered as well and receive the value zero (0) for &#8220;b&#8221;.</p>
<p>The documents now look like this in the Analytica server (Analytica Shell syntax):</p>
<pre>--&gt;get nullValues.nv
{
    "nv" : [
        {
            "_id" : "51096c45c905109d93d892d4",
            "a" : null,
            "b" : 0
        },
        {
            "_id" : "51096c4bc905109d93d892d5",
            "a" : 1,
            "b" : 1
        },
        {
            "_id" : "51096d2ac905109d93d892d6",
            "c" : 2,
            "b" : 0
        }]
}
--&gt;</pre>
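<p>The same transformation can be modeled in Python to show why the third document is covered: reading a missing key with dict.get returns None, which is then treated exactly like an explicit null. This is a sketch of the semantics described above, not Analytica code, and the function name add_b is made up for the example.</p>
<pre>
docs = [{"a": None}, {"a": 1}, {"c": 2}]


def add_b(doc):
    """b = a if a is not null, otherwise 0; a missing "a" counts as null."""
    a = doc.get("a")  # None both for "a": null and for a missing "a"
    return dict(doc, b=0 if a is None else a)


print([add_b(d)["b"] for d in docs])  # [0, 1, 0]
</pre>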
<h3>Benefit of Schema Transformation</h3>
<p>Effectively, the above approach makes the schema more uniform across documents. This is the classical space &#8211; time trade-off.</p>
<p>The additional property requires storage space, but it makes all analytics expressions easier, since they do not have to distinguish whether a property has a value or is &#8220;null&#8221;. Otherwise, every analytics expression would have to include the if-then-else clause. The above approach places the if-then-else once in the calculated property, and from then on all analytics expressions can assume a valid value.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/01/30/schema-transformation-null-values/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics: &#8220;Query &#8211; Result&#8221; Approaches Lack Conceptual Power</title>
		<link>http://www.analytica.com/2013/01/23/analytics-query-result-approaches-lack-conceptual-power/</link>
		<comments>http://www.analytica.com/2013/01/23/analytics-query-result-approaches-lack-conceptual-power/#comments</comments>
		<pubDate>Wed, 23 Jan 2013 19:23:39 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Calculated Property]]></category>
		<category><![CDATA[Conceptual Consistency]]></category>
		<category><![CDATA[Data Consistency]]></category>
		<category><![CDATA[Reuse]]></category>
		<category><![CDATA[Snapshot]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=1088</guid>
		<description><![CDATA[Traditional Approach &#8220;Traditional Analytics&#8221; follows the query &#8211; result approach. For example, a query is formulated determining an aggregation over a subset of the base data to be analyzed, and the system returns the requested aggregation as a query result.<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/01/23/analytics-query-result-approaches-lack-conceptual-power/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<h3>Traditional Approach</h3>
<p>&#8220;Traditional Analytics&#8221; follows the query &#8211; result approach. For example, a query is formulated that determines an aggregation over a subset of the base data to be analyzed, and the system returns the requested aggregation as a query result.</p>
<p>The client program that issues the query receives the result and will further process it. If the client needs several aggregations, then the client has to issue several queries and will receive several results in return.</p>
<p>Other clients that need the same aggregations, or a subset of them, also have to issue queries and process the returned results.</p>
<h3>What&#8217;s Wrong With That?</h3>
<p>What&#8217;s wrong with clients running queries to obtain results? From a naive viewpoint, nothing, as this is what clients do. From an analytics viewpoint, however, the approach has several problems:</p>
<ul>
<li><strong>Result Consistency</strong>. Clients running the same queries at different times might obtain different results. Unless snapshot semantics are implemented to ensure that all clients see the same state of the data set, inconsistency is unavoidable.</li>
<li><strong>Conceptual Consistency</strong>. If several clients need the same aggregations, each might implement them in its own queries. These queries might be semantically equivalent, but they might not be. If a query has to compute a concept (like revenue), chances are that different clients implement the concept of &#8216;revenue&#8217; differently, depending on their understanding of it.</li>
<li><strong>Query Reuse</strong>. In traditional systems, it is not possible to structure queries into reusable queries or reusable parts of queries. If a query has to be reused, this is done by code duplication, which leads to change management problems.</li>
<li><strong>Query Management and Supervision</strong>. Query systems do not keep an inventory of the queries being executed; consequently, it is impossible to manage and supervise changes to query definitions and their execution.</li>
</ul>
<p>In short, an organization following this approach relies on its clients (that is, its software engineers or data scientists) to get it right. The system itself has no means to support consistent query definition, management, and supervision.</p>
<h3>Analytica&#8217;s Approach</h3>
<p>Analytica addresses all of these issues in its approach and implementation in a structured and conceptually clear way.</p>
<h6>Result Consistency</h6>
<p>Analytica ensures result consistency through its analytics snapshot semantics. The base data set being analyzed is taken as a &#8216;snapshot&#8217;, and all analytics are executed against this snapshot. So no matter how many clients access the Analytica server, they all see the same data set in the same state.</p>
<p>The snapshot can be reset, in the sense that a new snapshot is taken. From that point on, all client requests operate on the new snapshot.</p>
<h6>Conceptual Consistency</h6>
<p>In Analytica, the concept of a query does not exist. Instead, Analytica augments and extends the conceptual model presented by the base data set with additional concepts. This mechanism is called &#8216;calculated properties&#8217;. Calculated properties are added to the base data set as if they had been part of it from the beginning.</p>
<p>For example, an aggregation becomes a new concept in the base data set. Clients that need the aggregation access this additional property. Therefore, all clients use the same definition of that aggregation as well as the same value.</p>
<p>Calculated properties ensure conceptual consistency.</p>
<h6>Query Reuse</h6>
<p>Calculated properties are defined by a computation. The computation specifies how the value is derived, and once computed, the value is stored. When a client accesses a calculated property, the stored value is returned if it already exists; otherwise, the value is computed on the snapshot for the first time.</p>
<p>Reuse is achieved because the computation of one calculated property can refer to other calculated properties. Therefore, concepts that are not in the base data set can be added one by one and reused to derive the properties that finally make up the analytics result.</p>
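<p>A minimal Python model of this behavior: each calculated property is a named computation that may refer to other calculated properties, and its value is computed on first access and stored afterwards. The class CalculatedProperties, its methods, and the order documents are hypothetical; they only illustrate compute-once-then-reuse semantics over a fixed snapshot.</p>
<pre>
class CalculatedProperties:
    """Hypothetical registry of calculated properties over a fixed snapshot."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.definitions = {}
        self.values = {}
        self.computations = 0  # how often a definition was actually evaluated

    def define(self, name, compute):
        """Register a computation; it receives the registry to reach other properties."""
        self.definitions[name] = compute

    def get(self, name):
        """Return the stored value, computing it on first access only."""
        if name not in self.values:
            self.computations += 1
            self.values[name] = self.definitions[name](self)
        return self.values[name]


orders = [{"amount": 100, "returned": False}, {"amount": 40, "returned": True}]
props = CalculatedProperties(orders)
props.define("gross", lambda p: sum(o["amount"] for o in p.snapshot))
props.define("returns", lambda p: sum(o["amount"] for o in p.snapshot if o["returned"]))
# "revenue" reuses the two properties above instead of repeating their logic.
props.define("revenue", lambda p: p.get("gross") - p.get("returns"))

print(props.get("revenue"), props.get("revenue"), props.computations)  # 100 100 3
</pre>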
<h6>Query Management and Supervision</h6>
<p>Calculated properties are artifacts supported by Analytica. As such they are explicitly defined at the interface of Analytica. Internally they are explicitly represented and managed.</p>
<p>As a consequence, it is possible to list all calculated properties. This allows the analytics team and its engineers and data scientists to examine the existing definitions at any time.</p>
<p>Since all calculated properties are explicit, Analytica determines their dependencies and executes them in dependency order. Clients therefore do not have to manage the definitions and their correct execution order themselves.</p>
<p>Change management is applied automatically. When a calculated property is changed, Analytica determines whether any reuse dependencies are broken and reports this to the engineer or data scientist.</p>
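<p>Determining an execution order from declared dependencies is a topological sort. The sketch below uses Python&#8217;s standard graphlib module (Python 3.9 or later) on hypothetical property definitions, and also reports references to properties that are not defined, which is one way a broken reuse dependency could be detected. It is an illustration, not Analytica&#8217;s implementation.</p>
<pre>
from graphlib import TopologicalSorter

# Hypothetical definitions: property name -> calculated properties it refers to.
definitions = {
    "gross": set(),
    "returns": set(),
    "revenue": {"gross", "returns"},
    "revenue_per_customer": {"revenue", "customer_count"},
}


def broken_references(defs):
    """Return (property, missing reference) pairs for undefined dependencies."""
    return sorted(
        (name, dep) for name, deps in defs.items() for dep in deps if dep not in defs
    )


def execution_order(defs):
    """Order properties so that every property runs after the ones it refers to."""
    return list(TopologicalSorter(defs).static_order())


print(broken_references(definitions))  # [('revenue_per_customer', 'customer_count')]
order = execution_order(definitions)
print(order.index("revenue") > order.index("gross"))  # True
</pre>
<p>TopologicalSorter raises graphlib.CycleError if definitions refer to each other in a cycle, which is another kind of broken dependency worth reporting.</p>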
<h3>Summary</h3>
<p>With its novel approach to analytics, Analytica addresses the issues and problems of traditional query &#8211; result approaches in a consistent manner.</p>
<p>Emphasis is placed on conceptual and data consistency, so that organizations using Analytica can arrive at better and more consistent analytics of their base data sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/01/23/analytics-query-result-approaches-lack-conceptual-power/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytica and Snapshot Semantics for Reporting</title>
		<link>http://www.analytica.com/2013/01/16/analytica-and-snapshot-semantics-for-reporting/</link>
		<comments>http://www.analytica.com/2013/01/16/analytica-and-snapshot-semantics-for-reporting/#comments</comments>
		<pubDate>Wed, 16 Jan 2013 19:57:36 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Analytics consistency]]></category>
		<category><![CDATA[Snapshot]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=1041</guid>
		<description><![CDATA[In many cases, analytics is performed on a transactional data set that is constantly changing due to end user activities or B2B connections. The rate of change might be small or large, depending on the application system, the date and<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/01/16/analytica-and-snapshot-semantics-for-reporting/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>In many cases, analytics is performed on a transactional data set that is constantly changing due to end user activities or B2B connections. The rate of change might be small or large, depending on the application system, the date and time and other factors.</p>
<p>Analytics operations access the transactional data set to retrieve the required data. Once the data is retrieved, the analytics operations take place. In a slightly different mode, the analytics operations are interleaved with the retrieval of the transactional data.</p>
<h3>Analytics Result Inconsistency</h3>
<p>Analytics operations might take longer than the interval between transactional data updates, and a desired report might require several analytics operations. In principle, if nothing special is done, each analytics operation sees a different state of the transactional data set. This in turn means that their results can be inconsistent with respect to each other.</p>
<p>For example, if one analytics operation counts the number of on-line users (users logged in), and another analytics operation adds up the total time the currently on-line users have spent on-line, then these two numbers will be inconsistent on a highly active web site, as the second operation might add times of users that the first one did not see.</p>
<h3>Analytica Snapshots</h3>
<p>Analytica addresses inconsistency explicitly by implementing snapshot semantics from the analytics viewpoint. Analytica fetches the required transactional data before starting any analytics operation. As a result, all analytics operations take place on a fixed, unchanging transactional data set.</p>
<p>No matter how many analytics operations take place, they all share the same version of the transactional data set, so their results are consistent with each other (provided that the analytics rules are correct).</p>
<h3>Snapshot Refresh</h3>
<p>Snapshot semantics introduce the problem of data staleness. Analytica addresses this by letting clients tell it to refresh the transactional data set and re-run the analytics operations on the newly acquired data. (The REST API call for this is executeallsetstatements().)</p>
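<p>The effect of snapshot semantics and refresh can be sketched in Python using the on-line users example: every analytics operation reads the same frozen copy of the transactional data until a refresh takes a new copy. The class Snapshot, its refresh method and the session data are hypothetical; refresh only plays the role that executeallsetstatements() is described as playing and says nothing about that call&#8217;s actual interface.</p>
<pre>
import copy


class Snapshot:
    """Hypothetical snapshot of a live, changing transactional data set."""

    def __init__(self, live):
        self.live = live
        self.refresh()

    def refresh(self):
        # Take a new copy; all later analytics operations see this state.
        self.data = copy.deepcopy(self.live)


# Live data: currently on-line users and their session minutes.
live_sessions = [{"user": "u1", "minutes": 10}, {"user": "u2", "minutes": 5}]
snap = Snapshot(live_sessions)

online_users = len(snap.data)
live_sessions.append({"user": "u3", "minutes": 7})  # a new user logs in meanwhile
total_minutes = sum(s["minutes"] for s in snap.data)
print(online_users, total_minutes)  # 2 15 (consistent with each other)

snap.refresh()
print(len(snap.data), sum(s["minutes"] for s in snap.data))  # 3 22
</pre>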
<h3>Life Cycle</h3>
<p>With the ability to snapshot the transactional data set and refresh it at any time, customers can now implement the analytics life cycle they require. For example,</p>
<ul>
<li>A customer can refresh his analysis every 2 hours</li>
<li>A customer can refresh his analysis every morning at 6am so that all employees starting their workday see fresh and identical data</li>
<li>A customer can provide an environment where end users can refresh their own data, but not more than once an hour</li>
</ul>
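<p>The third policy, self-service refresh limited to once an hour, amounts to a simple rate limit. The following Python sketch assumes a hypothetical RefreshPolicy wrapper around whatever triggers the refresh, with an injectable clock so the behavior can be checked without waiting an hour.</p>
<pre>
import time


class RefreshPolicy:
    """Allow a refresh only if min_interval seconds have passed since the last one."""

    def __init__(self, refresh, min_interval=3600, clock=time.monotonic):
        self.refresh = refresh
        self.min_interval = min_interval
        self.clock = clock
        self.last = None

    def request(self):
        """Run the refresh and return True, or return False if it is too early."""
        now = self.clock()
        if self.last is None or now - self.last >= self.min_interval:
            self.refresh()
            self.last = now
            return True
        return False


fake_now = [0.0]
refreshes = []
policy = RefreshPolicy(lambda: refreshes.append(fake_now[0]), clock=lambda: fake_now[0])

print(policy.request())  # True  (first refresh)
fake_now[0] = 1800.0
print(policy.request())  # False (only 30 minutes later)
fake_now[0] = 3600.0
print(policy.request())  # True  (an hour after the first refresh)
</pre>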
<p>These are only a few examples of how the ability to snapshot and refresh provides a customizable analytics experience while preserving data consistency.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/01/16/analytica-and-snapshot-semantics-for-reporting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Analytics over a REST API?</title>
		<link>http://www.analytica.com/2013/01/10/why-analytics-over-a-rest-api/</link>
		<comments>http://www.analytica.com/2013/01/10/why-analytics-over-a-rest-api/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 17:18:34 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[REST API]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=985</guid>
		<description><![CDATA[Analytica is making all its functionality available over a REST API. Why is that important? Cool and Interesting Technology From a distributed computing viewpoint, REST is cool technology due to the programming language and platform independence, paired with the availability<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/01/10/why-analytics-over-a-rest-api/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>Analytica is making all its functionality available over a REST API. Why is that important?</p>
<h3>Cool and Interesting Technology</h3>
<p>From a distributed computing viewpoint, REST is cool technology because of its programming language and platform independence, paired with its availability on most platforms. While this is not specifically relevant to Analytica per se, it is important from an architecture and infrastructure perspective.</p>
<h3>Integration into Existing Infrastructure</h3>
<p>Because of its wide availability, REST is a good mechanism for integrating analytics functionality into any existing infrastructure. Analytica does not force a separate technology stack to be created; it is designed to be integrated into an existing infrastructure. While Analytica provides end user clients, they are not the only avenue to access analytics functionality. REST is equally powerful, so an explicit decision can be made to integrate Analytica into the existing infrastructure.</p>
<h3>Programming Language Independence</h3>
<p>Most current programming languages support REST invocations (over the HTTP protocol). In addition, the data structures are represented in JSON, which is a plain-text format. As a result, Analytica can be used from any programming language, and developers do not have to learn a new programming language at all in order to use Analytica.</p>
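<p>As an example of that language independence, Python&#8217;s standard library alone is enough to talk to a REST API that exchanges JSON. In the sketch below, the base URL, the path /collections and the request body are placeholders invented for illustration, not Analytica&#8217;s documented endpoints; the request is only built, not sent, and the JSON parsing is shown on a made-up response body.</p>
<pre>
import json
import urllib.request

# Placeholder endpoint; not a documented Analytica URL.
BASE_URL = "http://localhost:8080/analytica"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request to the hypothetical endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("/collections", {"database": "nullValues", "collection": "nv"})
print(req.get_method(), req.full_url)  # POST http://localhost:8080/analytica/collections

# A JSON response body can be parsed with the standard library as well.
body = '{"nv": [{"_id": "51096c4bc905109d93d892d5", "a": 1, "b": 1}]}'
print(json.loads(body)["nv"][0]["b"])  # 1
</pre>
<p>Against a real server, the request would be sent with urllib.request.urlopen(req) and the response body read and passed to json.loads in the same way.</p>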
<h3>Multiple Consistent User Interfaces</h3>
<p>The REST API provides not only programming language independence, but also end user client independence. Analytica itself provides several clients (Excel, Shell, iPhone App). Since all these end user clients are based on the REST API, their functionality is consistent across these user interfaces. Furthermore, customers of Analytica can build their own custom user interface clients without running the risk of inconsistencies or missing out on functionality.</p>
<h3>Existing User Interface Integration</h3>
<p>Many companies already have various dashboards and analytics interfaces. With Analytica providing native access to their NoSQL databases, these companies can integrate Analytica functionality into their existing user interface infrastructure, making the analytics functionality seamless for their end users. REST is a great way to accomplish this integration.</p>
<h3>Single Data Representation Paradigm</h3>
<p>Analytica operates natively on the JSON representation of document-oriented databases. In addition, the result of analytics rules is itself represented in JSON. The REST API returns all results as JSON documents (without exception). This means that Analytica provides a consistent data representation paradigm and the continued processing of analytics results in JSON is guaranteed.</p>
<h3>Development Assurance and Efficiency</h3>
<p>Last, but not least, a REST API makes the interactions and the data transfer (both invocation and result) explicit. This supports the development process greatly as during development and debugging it is possible to explicitly look at the communication patterns and details.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/01/10/why-analytics-over-a-rest-api/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtual Collections: Document Selection for Analytics</title>
		<link>http://www.analytica.com/2013/01/03/virtual-collections-document-selection-for-analytics/</link>
		<comments>http://www.analytica.com/2013/01/03/virtual-collections-document-selection-for-analytics/#comments</comments>
		<pubDate>Thu, 03 Jan 2013 21:50:46 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Database Query]]></category>
		<category><![CDATA[Document Selection]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Virtual Collection]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=873</guid>
		<description><![CDATA[The document-oriented data model suggests collecting all properties for a concept or object within one document. In addition, the application system decides on the properties it needs in order to provide the required functionality to end users. Furthermore, documents<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2013/01/03/virtual-collections-document-selection-for-analytics/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>The document-oriented data model suggests collecting all properties for a concept or object within one document. In addition, the application system decides which properties it needs in order to provide the required functionality to end users. Furthermore, documents representing the same concept (e.g., user accounts) are usually stored in one collection.</p>
<p>From the perspective of analysis, however, only a subset of the documents of a collection might be needed and others are irrelevant. For example, when analyzing user accounts, only named user accounts might be interesting, but not anonymous user accounts.</p>
<p>Operating on all documents is inefficient if only a subset of them is relevant. From an analytics perspective, only relevant documents should be part of the analyzed data set, so document selection is required to increase efficiency. In addition, if only the relevant documents are available, the analytics rules become simpler, as they do not have to exclude the irrelevant ones.</p>
<h3>Document Selection</h3>
<p>Removing the documents not needed for analysis is called document selection: only the documents relevant for analysis are made available to analytics, and all others are withheld. Instead of the documents of the base collections, the selected documents are accessed.</p>
<p>An initial approach in Analytica is a two-step approach:</p>
<ul>
<li>For each base collection, create a calculated collection that only contains the relevant documents.</li>
<li>Define the analysis on the calculated collections.</li>
</ul>
<p>While valid in principle, this approach is inefficient because Analytica has to create and maintain calculated collections to achieve the document selection.</p>
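<p>A minimal JavaScript sketch can illustrate the cost of the two-step approach. It assumes an in-memory array as a stand-in for a base collection. &#8216;materialize&#8217; is a hypothetical helper, not part of Analytica. The point is that the calculated collection is a copy: it has to be stored, and it has to be re-created whenever the base collection changes.</p>

```javascript
// Hypothetical stand-in for the base collection of user accounts.
const uam = [
  { login: "alice", NamedUser: true },
  { login: "anon-17", NamedUser: false },
];

const isNamed = (doc) => doc.NamedUser === true;

// Step 1 of the two-step approach: copy the relevant documents into a
// separate calculated collection, which needs its own storage.
function materialize(base, predicate) {
  return base.filter(predicate).map((doc) => ({ ...doc }));
}

let namedUsers = materialize(uam, isNamed);

// A later insert into the base collection is not reflected in the copy...
uam.push({ login: "bob", NamedUser: true });
console.log(namedUsers.length); // 1 (stale)

// ...until the calculated collection is re-created, repeating the work.
namedUsers = materialize(uam, isNamed);
console.log(namedUsers.length); // 2
```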
<p>A more suitable approach is Virtual Collections.</p>
<h3>Virtual Collections</h3>
<p>Virtual collections are a concept introduced by Analytica. They are defined within Analytica and are available as if they were defined directly in the database (hence the name &#8216;virtual&#8217; collection). However, virtual collections are not expressed as calculated properties; instead, they are expressed as native database queries.</p>
<p>For example, the following virtual collection selects named users from a collection containing user accounts (Analytica shell syntax):</p>
<pre>setvc namedUsers AccountManagementDB uam "{NamedUser: true}"</pre>
<p>The command &#8216;setvc&#8217; has four parameters:</p>
<ol>
<li><strong>Virtual collection name</strong>. The first parameter is the name of the virtual collection being created. In the above example this is &#8216;namedUsers&#8217;. Once this definition is added into the system, a collection called &#8216;namedUsers&#8217; is available as if it were coming directly from the database.</li>
<li><strong>Database name</strong>. The second parameter is the name of the database that contains the base collection. The virtual collection also belongs to this database. This is not true containment, since the virtual collection is not stored back into the database; by naming convention, however, the virtual collection belongs to that database.</li>
<li><strong>Base collection name</strong>. The third parameter is the base collection from which the documents of the virtual collection are selected.</li>
<li><strong>Native query</strong>. The last parameter is the native database query (using the query language of the database). This query is performed in order to select the documents for the virtual collection. In the above example, only those documents are selected where the property &#8216;NamedUser&#8217; has the value true.</li>
</ol>
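<p>The following is a rough sketch of the idea, not Analytica&#8217;s implementation. A virtual collection is modeled as a stored definition whose query is evaluated against the base collection when the collection is accessed. Several parts are assumptions made for illustration:</p>
<ul>
<li>the in-memory &#8216;databases&#8217; object;</li>
<li>the &#8216;getCollection&#8217; helper;</li>
<li>the equality-only matcher.</li>
</ul>
<p>The query is passed as an object rather than as the string used by the shell. Against a real MongoDB server, the same selection would be the native query db.uam.find({NamedUser: true}).</p>

```javascript
// Hypothetical in-memory stand-in for the database and its base collection.
const databases = {
  AccountManagementDB: {
    uam: [
      { _id: 1, login: "alice", NamedUser: true },
      { _id: 2, login: "anon-17", NamedUser: false },
      { _id: 3, login: "bob", NamedUser: true },
    ],
  },
};

// Only definitions are registered; no documents are copied.
const virtualCollections = {};

// Mirrors the four parameters of the shell command 'setvc'.
function setvc(vcName, dbName, baseCollection, query) {
  virtualCollections[dbName + "." + vcName] = { baseCollection, query };
}

// Simplified matcher: top-level equality conditions only, e.g. {NamedUser: true}.
function matches(doc, query) {
  return Object.entries(query).every(([key, value]) => doc[key] === value);
}

// A virtual collection is resolved by running its query on the base
// collection; any other name is looked up as an ordinary collection.
function getCollection(dbName, name) {
  const vc = virtualCollections[dbName + "." + name];
  if (!vc) {
    return databases[dbName][name];
  }
  return databases[dbName][vc.baseCollection].filter((doc) => matches(doc, vc.query));
}

setvc("namedUsers", "AccountManagementDB", "uam", { NamedUser: true });
console.log(getCollection("AccountManagementDB", "namedUsers").map((d) => d.login));
// [ 'alice', 'bob' ]
```

<p>Because only the definition is stored, nothing has to be copied or kept in sync. Documents that do not match the query are never handed to the analysis.</p>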
<p>It is possible to define as many virtual collections as needed in Analytica; there is no restriction.</p>
<h3>Operational Efficiency and Analysis Simplification</h3>
<p>Virtual collections provide two major benefits:</p>
<ul>
<li><strong>Operational efficiency</strong>. Operational efficiency is increased as only those documents are made visible to Analytica that are truly needed. The documents that are not needed will not be processed by Analytica at all.</li>
<li><strong>Analysis simplification</strong>. Since the documents are already selected by means of the virtual collection, the selection does not have to be made within the analytics expressions. This makes the expressions a lot simpler and easier to understand.</li>
</ul>
<p>In summary, the benefits of virtual collections are significant: whenever a meaningful selection is possible, virtual collections are the right approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2013/01/03/virtual-collections-document-selection-for-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Accessing MongoDB serverStatus, dbStats and collStats</title>
		<link>http://www.analytica.com/2012/12/10/accessing-mongodb-serverstatus-dbstats-and-collstats/</link>
		<comments>http://www.analytica.com/2012/12/10/accessing-mongodb-serverstatus-dbstats-and-collstats/#comments</comments>
		<pubDate>Mon, 10 Dec 2012 21:54:26 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Concepts]]></category>
		<category><![CDATA[CollStats]]></category>
		<category><![CDATA[DatabaseStats]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Operational Analytics]]></category>
		<category><![CDATA[ServerStats]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=662</guid>
		<description><![CDATA[Aside from analyzing business or application data, sometimes it is important to analyze the behavior of a MongoDB server, database or individual collections. Special Operational Collections In order to support users in accomplishing this, Analytica makes key operational data sets<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2012/12/10/accessing-mongodb-serverstatus-dbstats-and-collstats/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>Aside from analyzing business or application data, sometimes it is important to analyze the behavior of a MongoDB server, database or individual collections.</p>
<h3>Special Operational Collections</h3>
<p>To support users in accomplishing this, Analytica makes key operational data sets available as special collections within each database it connects to. There are three special collections for each MongoDB database:</p>
<ul>
<li>__serverstats__</li>
<li>__dbstats__</li>
<li>__collstats__</li>
</ul>
<p>&#8216;__serverstats__&#8217; exposes the status of the MongoDB server, &#8216;__dbstats__&#8217; exposes the database status and &#8216;__collstats__&#8217; exposes the status of the collections of the database.</p>
<p>The content of these collections is the result of the respective MongoDB commands exposed at the server, database and collection level (serverStatus, dbStats and collStats). MongoDB itself does not make these data available as collections. To make them usable for analytics, Analytica accesses the data sets programmatically and exposes them as the special collections.</p>
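<p>The following is a minimal sketch of how command results could be exposed as the three special collections. The function name is hypothetical. It assumes the raw results have already been fetched from the server; in the mongo shell they correspond to db.serverStatus(), db.stats() and each collection&#8217;s stats() method. The server and the database each yield one document, while the collection statistics yield one document per collection.</p>

```javascript
// Wraps raw command results as the three special collections.
// Assumption: the results have already been fetched from the server; in the
// mongo shell they correspond to db.serverStatus(), db.stats() and the
// stats() method of each collection (e.g. db.People.stats()).
function buildOperationalCollections(serverStatus, dbStats, collStatsList) {
  return {
    __serverstats__: [serverStatus], // one document for the server
    __dbstats__: [dbStats], // one document for the database
    __collstats__: collStatsList, // one document per collection
  };
}

// Abbreviated results, taken from the example output below.
const ops = buildOperationalCollections(
  { host: "THEFORCE", version: "2.2.0", ok: 1 },
  { db: "GD", collections: 3, objects: 10, ok: 1 },
  [
    { ns: "GD.People", count: 6, ok: 1 },
    { ns: "GD.system.indexes", count: 1, ok: 1 },
  ]
);
console.log(Object.keys(ops)); // [ '__serverstats__', '__dbstats__', '__collstats__' ]
console.log(ops.__collstats__.length); // 2
```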
<h3>Operational Collection Content</h3>
<p>The following shows an example for each of the three operational collections. The system is connected to a database &#8216;GD&#8217;, and the contents of the three collections are shown for &#8216;GD&#8217;.</p>
<h5>Serverstats</h5>
<p>The following is the content of &#8216;__serverstats__&#8217;:</p>
<pre>--&gt;get GD.__serverstats__
{
    "__serverstats__" : [
        {
            "host" : "THEFORCE",
            "version" : "2.2.0",
            "process" : "mongod",
            "pid" : 1612,
            "uptime" : 385506,
            "uptimeMillis" : 385505975,
            "uptimeEstimate" : 203152,
            "localTime" : "12/17/2012 12:00:00 AM",
            "locks" : [
                {
                    "dot" : [
                        {
                            "timeLockedMicros" : [
                                {
                                    "R" : 5972047,
                                    "W" : 73724293
                                }],
                            "timeAcquiringMicros" : [
                                {
                                    "R" : 15774116,
                                    "W" : 2729949
                                }]
                        }],
                    "GD" : [
                        {
                            "timeLockedMicros" : [
                                {
                                    "r" : 929843,
                                    "w" : 0
                                }],
                            "timeAcquiringMicros" : [
                                {
                                    "r" : 8610,
                                    "w" : 0
                                }]
                        }]
                }],
            "globalLock" : [
                {
                    "totalTime" : 385505975000,
                    "lockTime" : 73724293,
                    "currentQueue" : [
                        {
                            "total" : 0,
                            "readers" : 0,
                            "writers" : 0
                        }],
                    "activeClients" : [
                        {
                            "total" : 0,
                            "readers" : 0,
                            "writers" : 0
                        }]
                }],
            "mem" : [
                {
                    "bits" : 64,
                    "resident" : 44,
                    "virtual" : 14202,
                    "supported" : true,
                    "mapped" : 7023,
                    "mappedWithJournal" : 14046
                }],
            "connections" : [
                {
                    "current" : 1,
                    "available" : 19999
                }],
            "extra_info" : [
                {
                    "note" : "fields vary by platform",
                    "page_faults" : 665747,
                    "usagePageFileMB" : 215,
                    "totalPageFileMB" : 32720,
                    "availPageFileMB" : 26461,
                    "ramMB" : 16361
                }],
            "indexCounters" : [
                {
                    "note" : "not supported on this platform"
                }],
            "backgroundFlushing" : [
                {
                    "flushes" : 3414,
                    "total_ms" : 120661,
                    "average_ms" : 35.3429994141769,
                    "last_ms" : 10,
                    "last_finished" : "12/17/2012 12:00:00 AM"
                }],
            "cursors" : [
                {
                    "totalOpen" : 0,
                    "clientCursors_size" : 0,
                    "timedOut" : 0
                }],
            "network" : [
                {
                    "bytesIn" : 1903842,
                    "bytesOut" : 49941988,
                    "numRequests" : 22259
                }],
            "opcounters" : [
                {
                    "insert" : 1943,
                    "query" : 95048,
                    "update" : 17,
                    "delete" : 0,
                    "getmore" : 1702,
                    "command" : 9097
                }],
            "asserts" : [
                {
                    "regular" : 0,
                    "warning" : 3,
                    "msg" : 0,
                    "user" : 12,
                    "rollovers" : 0
                }],
            "writeBacksQueued" : false,
            "dur" : [
                {
                    "commits" : 30,
                    "journaledMB" : 0,
                    "writeToDataFilesMB" : 0,
                    "compression" : 0,
                    "commitsInWriteLock" : 0,
                    "earlyCommits" : 0,
                    "timeMs" : [
                        {
                            "dt" : 3060,
                            "prepLogBuffer" : 0,
                            "writeToJournal" : 0,
                            "writeToDataFiles" : 0,
                            "remapPrivateView" : 0
                        }]
                }],
            "recordStats" : [
                {
                    "accessesNotInMemory" : 2243,
                    "pageFaultExceptionsThrown" : 194,
                    "GD" : [
                        {
                            "accessesNotInMemory" : 15,
                            "pageFaultExceptionsThrown" : 14
                        }]
                }],
            "ok" : 1
        }]
}
--&gt;</pre>
<p>One property is called &#8220;dot&#8221;; it was renamed by Analytica. The original property name is &#8220;.&#8221;; however, MongoDB does not allow dots in property names. To make the data available and to allow users to define analytics on it, Analytica renamed &#8220;.&#8221; to &#8220;dot&#8221;.</p>
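<p>The following is a minimal sketch of such a renaming. It assumes the renaming is applied recursively to the raw command result before the data are exposed. The function name &#8216;renameDotKeys&#8217; is hypothetical, and only the exact property name &#8220;.&#8221; is mapped.</p>

```javascript
// Recursively renames the property "." (not storable in MongoDB) to "dot".
// Assumption: other property names are left untouched.
function renameDotKeys(value) {
  if (Array.isArray(value)) {
    return value.map(renameDotKeys);
  }
  if (value === null || typeof value !== "object") {
    return value; // numbers, strings, booleans and null are kept as-is
  }
  const result = {};
  for (const [key, child] of Object.entries(value)) {
    result[key === "." ? "dot" : key] = renameDotKeys(child);
  }
  return result;
}

// Excerpt of the 'locks' section as returned by the server.
const locks = {
  ".": { timeLockedMicros: { R: 5972047, W: 73724293 } },
  GD: { timeLockedMicros: { r: 929843, w: 0 } },
};
console.log(Object.keys(renameDotKeys(locks))); // [ 'dot', 'GD' ]
```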
<h5>Dbstats</h5>
<p>The content of &#8216;__dbstats__&#8217; is:</p>
<pre>--&gt;get GD.__dbstats__
{
    "__dbstats__" : [
        {
            "db" : "GD",
            "collections" : 3,
            "objects" : 10,
            "avgObjSize" : 624.4,
            "dataSize" : 6244,
            "storageSize" : 36864,
            "numExtents" : 3,
            "indexes" : 1,
            "indexSize" : 8176,
            "fileSize" : 67108864,
            "nsSizeMB" : 16,
            "ok" : 1
        }]
}
--&gt;</pre>
<h5>Collstats</h5>
<p>The content of &#8216;__collstats__&#8217; is:</p>
<pre>--&gt;get GD.__collstats__
{
    "__collstats__" : [
        {
            "ns" : "GD.People",
            "count" : 6,
            "size" : 6084,
            "avgObjSize" : 1014,
            "storageSize" : 28672,
            "numExtents" : 1,
            "nindexes" : 1,
            "lastExtentSize" : 28672,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 8176,
            "indexSizes" : [
                {
                    "_id_" : 8176
                }],
            "ok" : 1
        },
        {
            "ns" : "GD.system.indexes",
            "count" : 1,
            "size" : 64,
            "avgObjSize" : 64,
            "storageSize" : 4096,
            "numExtents" : 1,
            "nindexes" : 0,
            "lastExtentSize" : 4096,
            "paddingFactor" : 1,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 0,
            "indexSizes" : [],
            "ok" : 1
        }]
}
--&gt;</pre>
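<p>The statistics are internally consistent, which makes them convenient for derived metrics. In the output above, avgObjSize is derived from the other fields:</p>
<ul>
<li>for the database, it equals dataSize divided by objects (6244 / 10 = 624.4);</li>
<li>for each collection, it equals size divided by count (6084 / 6 = 1014).</li>
</ul>
<p>The following sketch recomputes these values from an excerpt of the documents. It also adds a storage utilization ratio (size / storageSize); this ratio is an illustrative metric, not a field MongoDB reports.</p>

```javascript
// Excerpts of the __dbstats__ and __collstats__ documents shown above.
const dbstats = { db: "GD", objects: 10, dataSize: 6244, storageSize: 36864, avgObjSize: 624.4 };
const collstats = [
  { ns: "GD.People", count: 6, size: 6084, storageSize: 28672, avgObjSize: 1014 },
  { ns: "GD.system.indexes", count: 1, size: 64, storageSize: 4096, avgObjSize: 64 },
];

// Database level: average document size is data size divided by object count.
const dbAvg = dbstats.dataSize / dbstats.objects;
console.log(dbAvg === dbstats.avgObjSize); // true

// Collection level: the same relation, plus the share of allocated storage
// that actually holds data (an illustrative derived metric).
for (const c of collstats) {
  const avg = c.size / c.count;
  const utilization = c.size / c.storageSize;
  console.log(c.ns, avg === c.avgObjSize, utilization.toFixed(3));
}
```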
<h3>Operational Analytics</h3>
<p>As it turns out, &#8216;__serverstats__&#8217; and &#8216;__dbstats__&#8217; contain exactly one document each, while &#8216;__collstats__&#8217; contains one document per collection. Because Analytica implements the customary snapshot semantics for analytics, the content of the three operational collections does not change unless the user explicitly refreshes it.</p>
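<p>The snapshot semantics can be sketched as a small wrapper class. The class and the simulated data source are hypothetical and only show the behavior described above:</p>
<ul>
<li>the first read takes a snapshot;</li>
<li>later reads return that snapshot unchanged;</li>
<li>only an explicit refresh fetches new data.</li>
</ul>

```javascript
// Hypothetical wrapper implementing snapshot semantics for a collection.
class SnapshotCollection {
  constructor(loader) {
    this.loader = loader; // returns the current documents from the source
    this.snapshot = null; // no snapshot taken yet
  }

  // The first read takes a snapshot; later reads return it unchanged.
  get() {
    if (this.snapshot === null) {
      this.snapshot = this.loader();
    }
    return this.snapshot;
  }

  // Only an explicit refresh fetches the current data again.
  refresh() {
    this.snapshot = this.loader();
    return this.snapshot;
  }
}

// Simulated source: every fetch reports a larger uptime.
let uptime = 385506;
const serverstats = new SnapshotCollection(() => [{ uptime: uptime++ }]);

console.log(serverstats.get()[0].uptime); // 385506
console.log(serverstats.get()[0].uptime); // 385506 (same snapshot)
console.log(serverstats.refresh()[0].uptime); // 385507
```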
<p>For operational analytics of the MongoDB server and databases, a dynamic or time-series-based analysis is of course desirable. To accomplish this, the status data need to be collected at regular intervals so that trends and aggregates of the operational data can be determined.</p>
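<p>Counters such as opcounters in serverStatus are cumulative since server start, so trends are usually derived from differences between consecutive samples. The sketch below assumes a list of samples collected at regular intervals. The first sample reuses values from the output above; the later ones are invented for illustration. A server restart resets the counters and would produce a negative difference; the sketch does not handle that case.</p>

```javascript
// Hypothetical samples of __serverstats__, taken at regular intervals.
// 'uptime' is in seconds; 'opcounters.query' is a cumulative counter.
const samples = [
  { uptime: 385506, opcounters: { query: 95048 } },
  { uptime: 385566, opcounters: { query: 95648 } },
  { uptime: 385626, opcounters: { query: 95948 } },
];

// Cumulative counters become meaningful as deltas between consecutive samples.
function queryRates(series) {
  return series.slice(1).map((current, i) => {
    const previous = series[i]; // the sample right before 'current'
    const seconds = current.uptime - previous.uptime;
    return (current.opcounters.query - previous.opcounters.query) / seconds;
  });
}

console.log(queryRates(samples)); // [ 10, 5 ] queries per second
```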
<p>Analytica is going to provide a monitoring application that collects the operational data at regular intervals specified by the user. The collected data will make time-series-based operational analytics of the MongoDB server and databases possible. Stay tuned for a cookbook entry on this specific topic soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2012/12/10/accessing-mongodb-serverstatus-dbstats-and-collstats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Document &#8216;Uniformification&#8217;</title>
		<link>http://www.analytica.com/2012/12/09/document-uniformification/</link>
		<comments>http://www.analytica.com/2012/12/09/document-uniformification/#comments</comments>
		<pubDate>Sun, 09 Dec 2012 20:16:20 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[Analytica]]></category>
		<category><![CDATA[Schema-less Documents]]></category>
		<category><![CDATA[Uniformification]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=651</guid>
		<description><![CDATA[A contrasting situation is building up between the &#8216;schema-less&#8217; approach of document-oriented NoSQL databases and JSON-native analytics tools like Analytica. Documents that do not have to comply to a global document schema are being regarded as a plus in terms<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2012/12/09/document-uniformification/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>A tension is building up between the &#8216;schema-less&#8217; approach of document-oriented NoSQL databases and JSON-native analytics tools like Analytica.</p>
<p>Documents that do not have to comply with a global document schema are regarded as a plus for application system programming. In the context of analysis, however, the fact that each document might have its own schema can be quite problematic when computing aggregates.</p>
<h3>Example of a Problem introduced by different Schemas</h3>
<p>Let&#8217;s look at two documents (stored in a collection called &#8216;het&#8217; for heterogeneous):</p>
<pre>{ "_id" : ObjectId("50c65a7cb4dbabc20a115510"), 
  "x" : "d1", 
  "v" : 5 }
{ "_id" : ObjectId("50c65a8cb4dbabc20a115511"), 
  "y" : "d2" }</pre>
<p>For these two documents, let&#8217;s compute the number of documents, the total of &#8216;v&#8217; and avg(v). In MongoDB a single aggregation query is sufficient to compute all three values:</p>
<pre>db.het.aggregate(
    { $group : {
        _id : 1,
        noDocs: { $sum : 1},
        total: { $sum : "$v" },
        average: { $avg : "$v" }
    }}
);</pre>
<p>The result (copied from the shell) is:</p>
<pre>{"result" : [
        {
                "_id" : 1,
                "noDocs" : 2,
                "total" : 5,
                "average" : 5
        }
],
"ok" : 1}</pre>
<p>This example clearly shows results that would have been regarded as &#8216;wrong&#8217; in a relational world, but what about a schema-less situation? Is the average of 5 correct or wrong?</p>
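<p>To see where the average of 5 comes from, the following plain JavaScript sketch mimics the semantics of the $group stage above:</p>
<ul>
<li>$sum: 1 counts every document;</li>
<li>$sum and $avg on &#8220;$v&#8221; only consider documents in which &#8216;v&#8217; holds a number.</li>
</ul>
<p>The ObjectIds are abbreviated to strings, and the function name is hypothetical.</p>

```javascript
// The two documents of the 'het' collection (ObjectIds simplified to strings).
const het = [
  { _id: "50c65a7cb4dbabc20a115510", x: "d1", v: 5 },
  { _id: "50c65a8cb4dbabc20a115511", y: "d2" },
];

// Mimics the $group stage: noDocs counts every document, while total and
// average only use the documents that actually have a numeric 'v'.
function groupLikeMongo(docs) {
  const values = docs.map((d) => d.v).filter((v) => typeof v === "number");
  const total = values.reduce((a, b) => a + b, 0);
  return {
    _id: 1,
    noDocs: docs.length,
    total,
    average: values.length ? total / values.length : null,
  };
}

console.log(groupLikeMongo(het)); // { _id: 1, noDocs: 2, total: 5, average: 5 }
```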
<p>We could regard the average as correct in a schema-less world and be done at this point. However, if the computation of the average has to take missing properties into account, several alternative paths can be taken:</p>
<ul>
<li>Functions that assume schema-less documents</li>
<li>Build explicit schema-aware expressions</li>
<li>General &#8216;Uniformification&#8217; of documents</li>
<li>Analytica&#8217;s Uniformification approach</li>
</ul>
<p>The alternatives are discussed next.</p>
<h3>Functions for schema-less Documents</h3>
<p>One path is to change the formula used to compute averages. The average is no longer based on the number of values; instead, the number of documents is passed to the average() function separately from the values. So instead of average(&lt;set of values&gt;) it is average(&lt;set of values&gt;,&lt;number of documents&gt;).</p>
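<p>The following is a minimal sketch of such a two-argument average(). It assumes the values have already been extracted from the documents. The function is illustrative, not an existing library call.</p>

```javascript
// Two-argument average(): the divisor is the number of documents, passed
// separately, not the number of values that happen to be present.
function average(values, numberOfDocuments) {
  const sum = values.reduce((a, b) => a + b, 0);
  return sum / numberOfDocuments;
}

// For the 'het' example: one value of v (5), but two documents.
console.log(average([5], 2)); // 2.5
console.log(average([5], 1)); // 5 (one value per document, the classic result)
```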
<p>This approach basically means that all analytics functions have to be redefined based on the assumption that each document can potentially have its own schema.</p>
<h3>Explicit Schema-aware Expressions</h3>
<p>Instead of rewriting all functions, it is possible to build the appropriate expressions directly. For example, the average can be computed as the sum of the values divided by the number of documents. Instead of asking MongoDB to compute the average, the application computes it itself from MongoDB&#8217;s sum and count results (&#8216;total&#8217; and &#8216;noDocs&#8217;): 5 / 2 = 2.5.</p>
<p>In this alternative, the author of an expression has to know that the documents do not follow the same schema. The author also has to be aware of all variations that can exist in order to write the proper expressions.</p>
<h3>General &#8216;Uniformification&#8217; Approach</h3>
<p>An alternative to changing the functions is to make all documents follow the same schema. This suggests a two-step approach:</p>
<ul>
<li>Make the schema of all documents in a collection uniform</li>
<li>Perform analytics on the assumption of uniform document schemas</li>
</ul>
<p>While this yields a homogeneous schema, in principle the uniformification may have to be repeated every time a document is inserted, updated or deleted in order to take the changes into account.</p>
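<p>The following sketches the general approach, assuming all documents of a collection fit in memory. &#8216;uniformify&#8217; is a hypothetical name. The function collects the union of all property names and fills in the missing ones with a default value. The default is a parameter, so it could be the empty string instead of null. As noted above, the function would have to be rerun whenever documents change.</p>

```javascript
// General uniformification: every document gets every property that occurs
// in any document of the collection; missing ones receive a default value.
function uniformify(docs, defaultValue = null) {
  const allKeys = new Set();
  for (const doc of docs) {
    Object.keys(doc).forEach((k) => allKeys.add(k));
  }
  return docs.map((doc) => {
    const result = {};
    for (const key of allKeys) {
      result[key] = key in doc ? doc[key] : defaultValue;
    }
    return result;
  });
}

// The 'het' documents with simplified ids.
const het = [
  { _id: 1, x: "d1", v: 5 },
  { _id: 2, y: "d2" },
];
console.log(uniformify(het)[1]); // { _id: 2, x: null, v: null, y: 'd2' }
```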
<h3>Analytica&#8217;s Uniformification Approach</h3>
<p>Analytica performs an implicit uniformification that covers many cases and might be sufficient in general: it implicitly adds missing properties. If Analytica knows about a property that is part of one document but not of another, it adds that property to the other document with the value &#8216;null&#8217;. For the above example, this approach yields the following result (using Analytica&#8217;s shell):</p>
<pre>--&gt;executexl average(uniform.het.v)
{
 "average" : 2.5
}
--&gt;</pre>
<p>What is going on here is quite involved. The average() function adds up all values of &#8216;v&#8217; using a sum() function that knows that &#8216;null&#8217; does not contribute to the result. However, the average() function itself counts all documents, including those where &#8216;v&#8217; has the value &#8216;null&#8217;. So it arrives at a value of 2.5 without requiring any explicit transformation of the documents.</p>
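<p>The behavior described above can be reproduced in a few lines of JavaScript. This sketch only mirrors the description; it is not Analytica&#8217;s code. It assumes the documents have already been uniformified with null.</p>

```javascript
// Documents after implicit uniformification: the missing 'v' is present as null.
const uniformHet = [
  { x: "d1", v: 5, y: null },
  { x: null, v: null, y: "d2" },
];

// sum() as described above: null does not contribute to the result.
function sum(values) {
  return values.reduce((acc, v) => (v === null ? acc : acc + v), 0);
}

// average() counts all values, including null, so the divisor is the
// number of documents rather than the number of non-null values.
function average(values) {
  return sum(values) / values.length;
}

console.log(average(uniformHet.map((d) => d.v))); // 2.5
```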
<p>While this approach covers many cases, it might not solve all issues around heterogeneity, for example, when the value of a missing property should be assumed to be the empty string instead of &#8216;null&#8217;.</p>
<h3>Discussion</h3>
<p>NoSQL databases with schema-less documents are an interesting and productive approach from an application system development perspective. From an analytics perspective, however, they can prove challenging, as this post outlined.</p>
<p>However, Analytica, through its understanding of the document structure, supports many heterogeneous situations &#8216;out of the box&#8217;. This significantly reduces the need for explicit transformations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2012/12/09/document-uniformification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Global vs. Document-Local Calculated (&#8216;Virtual&#8217;) Properties</title>
		<link>http://www.analytica.com/2012/12/05/global-vs-document-local-calculated-virtual-properties/</link>
		<comments>http://www.analytica.com/2012/12/05/global-vs-document-local-calculated-virtual-properties/#comments</comments>
		<pubDate>Thu, 06 Dec 2012 01:32:12 +0000</pubDate>
		<dc:creator>chbussler</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Concepts]]></category>
		<category><![CDATA[Calculated Property]]></category>
		<category><![CDATA[Universe]]></category>
		<category><![CDATA[Virtual Property]]></category>

		<guid isPermaLink="false">http://www.analytica.com/?p=593</guid>
		<description><![CDATA[Analytica supports the concept of calculated properties, aka, virtual properties. The name &#8216;calculated&#8217; comes from the fact that the values are not retrieved from the database, but are dynamically calculated. &#8216;Virtual&#8217; has the same co-notation, i.e., not coming from the<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://www.analytica.com/2012/12/05/global-vs-document-local-calculated-virtual-properties/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
			<content:encoded><![CDATA[<p>Analytica supports the concept of calculated properties, a.k.a. virtual properties. The name &#8216;calculated&#8217; comes from the fact that the values are not retrieved from the database but are dynamically calculated. &#8216;Virtual&#8217; has the same connotation, i.e., not coming from the database.</p>
<p>Calculated properties are a major concept of Analytica since they support storing computed values in documents. In terms of analytics, calculated properties are used to compute and store values that are relevant for reporting, e.g., aggregations, subsets of documents, and so on.</p>
<h3>Document-Local Virtual Properties</h3>
<p>A first example was introduced in an earlier post on this blog: <a title="http://www.analytica.com/2012/11/30/nosql-native-reporting-on-json-data-analytica-and-mongodb/" href="http://www.analytica.com/2012/11/30/nosql-native-reporting-on-json-data-analytica-and-mongodb/" target="_blank">http://www.analytica.com/2012/11/30/nosql-native-reporting-on-json-data-analytica-and-mongodb/</a>. The example shows how to compute, for each known player, the total score of all games the player played.</p>
<p>It is called &#8216;document-local&#8217; because each document representing a player is extended by a property &#8216;totalscore&#8217; whose value is specific to that player.</p>
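<p>A document-local calculated property can be sketched as a function applied to each document separately. The player data below are hypothetical. They follow the players, games, sessions and score structure used later in this post.</p>

```javascript
// Hypothetical 'players' documents: players -> games -> sessions -> score.
const players = [
  {
    name: "p1",
    games: [
      { sessions: [{ score: 100 }, { score: 250 }] },
      { sessions: [{ score: 300 }] },
    ],
  },
  {
    name: "p2",
    games: [{ sessions: [{ score: 712500 }] }],
  },
];

// Document-local calculated property: each player gets its own 'totalscore',
// computed only from the sessions inside that player's document.
function withTotalScore(player) {
  const scores = player.games.flatMap((game) => game.sessions.map((s) => s.score));
  const totalscore = scores.reduce((a, b) => a + b, 0);
  return { ...player, totalscore };
}

console.log(players.map(withTotalScore).map((p) => p.totalscore)); // [ 650, 712500 ]
```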
<h3>Document-Global Virtual Properties</h3>
<p>Sometimes it is necessary to compute values across documents, but not for each document. For example, an interesting use case is the maximum score any player has achieved. How would that be computed? And, more importantly, where would that be stored as a property so that its value can be retrieved?</p>
<p>Analytica introduces the notion of virtual properties that can be added at the database level, across all documents in all collections. The following dialog on the shell shows the approach:</p>
<pre>--&gt;connect localhost 27017 --- --- gms MongoDB
Connected data sources:
 DataSource: gms (Type: MongoDB, Host: localhost, Port: 27017)</pre>
<p>The first command connects to a database called &#8216;gms&#8217;. This database has a collection called &#8216;players&#8217;. Each document in this collection has a sub-collection &#8216;games&#8217;, each of which has a sub-collection of &#8216;sessions&#8217;. Each session records the achieved &#8216;score&#8217;.</p>
<pre>--&gt;set gms.maxScore = max(players.games.sessions.score)
 Success.</pre>
<p>The second command is a &#8216;set&#8217; statement. It creates a virtual property called &#8216;maxScore&#8217; and stores in it the maximum score across all players, their games and sessions.</p>
<pre>--&gt;get gms.maxScore
 {
 "maxScore" : 712500
 }
--&gt;</pre>
<p>In order to retrieve the value of the virtual property &#8216;maxScore&#8217;, the &#8216;get&#8217; command is issued asking for the &#8216;maxScore&#8217; value stored at the database level. The result is 712500.</p>
<p>This example has shown that it is not only possible to compute values across documents in collections, but also to store them at a document-global level.</p>
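<p>The following sketches what the set statement computes. It assumes an in-memory database object with a separate map for database-level properties. &#8216;collect&#8217; is a hypothetical helper that follows a dotted path through nested arrays. The player data are invented, chosen so that the maximum matches the value shown above.</p>

```javascript
// Hypothetical in-memory database 'gms': a 'players' collection plus a map
// for properties stored at the database level.
const gms = {
  properties: {},
  players: [
    { name: "p1", games: [{ sessions: [{ score: 100 }, { score: 250 }] }] },
    { name: "p2", games: [{ sessions: [{ score: 712500 }] }] },
  ],
};

// Follows a dotted path such as "games.sessions.score" through nested arrays
// and returns all values found at its end.
function collect(docs, path) {
  return path.split(".").reduce(
    (current, key) => current.flatMap((item) => item[key]),
    docs
  );
}

// Equivalent in spirit to: set gms.maxScore = max(players.games.sessions.score)
gms.properties.maxScore = Math.max(...collect(gms.players, "games.sessions.score"));
console.log(gms.properties.maxScore); // 712500
```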
<p>Now let&#8217;s also add a virtual property &#8216;minScore&#8217;:</p>
<pre>--&gt;set gms.minScore = min(players.games.sessions.score)
Success.
--&gt;get gms.minScore
{
    "minScore" : 100
}
--&gt;</pre>
<p>At this point, there are two virtual properties defined at the database level.</p>
<h3>Database-Global Virtual Properties</h3>
<p>What if values have to be computed across databases? Analytica has a solution for this case also. The root concept is called &#8216;Universe&#8217;. All databases are inside a Universe and therefore the Universe is the place to store virtual properties that are global across databases.</p>
<p>The following shell dialog shows how virtual properties are defined at the level of the Universe. Since only one database is connected in this example, we use its values.</p>
<pre>--&gt;set UNIVERSE.minPlusMax = sum(gms.minScore,gms.maxScore)
Success.
--&gt;get UNIVERSE.minPlusMax
{
    "minPlusMax" : 712600
}
--&gt;</pre>
<p>Now the Universe has a virtual property that contains the sum of the two values that are on the level of the database &#8216;gms&#8217;.</p>
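<p>The scoping can be sketched as nested property maps, one per level. The structure and the &#8216;getProperty&#8217; helper are hypothetical. The two database-level values are the ones computed above.</p>

```javascript
// Hypothetical scope hierarchy: the Universe holds databases, and each level
// stores its own virtual properties.
const universe = {
  properties: {},
  databases: {
    gms: { properties: { minScore: 100, maxScore: 712500 } },
  },
};

// Reads a database-level property given a name such as "gms.maxScore".
function getProperty(name) {
  const [dbName, prop] = name.split(".");
  return universe.databases[dbName].properties[prop];
}

// Equivalent in spirit to: set UNIVERSE.minPlusMax = sum(gms.minScore,gms.maxScore)
universe.properties.minPlusMax = getProperty("gms.minScore") + getProperty("gms.maxScore");
console.log(universe.properties.minPlusMax); // 712600
```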
<h3>Summary</h3>
<p>It is possible in Analytica to store virtual properties not only inside the document (and its sub-collections), but also across collections on the database level as well as across databases on the Universe level. This supports a natural and easy way to scope virtual properties and their values as needed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.analytica.com/2012/12/05/global-vs-document-local-calculated-virtual-properties/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
