Filtering for Latest Data in a Universe

Summary

This article looks at a reporting requirement where we need to restrict data to the latest data, for example, to only show the latest account balance or the latest action item for work streams in a project.

Our solution is to implement a series of predefined conditions one for each dimension that we need to filter for latest data by.

Requirement

Consider the following table that is displaying a list of accounts, the account type and the account balance for consecutive months.

Account Number Account Type Balance Date Account Balance
10023409 Savings 01-Jan-09 12,332
10023409 Savings 01-Feb-09 13,543
10023409 Savings 01-Mar-09 15,228
10026701 Deposit 01-Jan-09 987
10026701 Deposit 01-Feb-09 1,234
10026701 Deposit 01-Mar-09 823
10042303 Savings 01-Jan-09 1,000
10042303 Savings 01-Feb-09 1,100
10042303 Savings 01-Mar-09 1,200
10028706 Deposit 01-Jan-09 80
10028706 Deposit 01-Feb-09 -40
10028706 Deposit 01-Mar-09 60

The business have asked that in our BusinessObjects universe we provide a mechanism to display latest account balance by either account or by account type.

Technical Details

In SQL you would write code similar to,

SELECT
  account_number, account_balance
FROM
  fact_accounts
WHERE
  balance_date IN
  (SELECT max(balance_date) as MaxDate
   FROM   fact_accounts fa
   WHERE  fa.account_number = account_number
   GROUP BY fa.account_number)

The above will display each account and the balance for the latest date – 1 March 2009. Similarly the following query will display account balance but this time by account type,

SELECT
  account_number, account_balance
FROM
  fact_accounts
WHERE
  account_date IN
  (SELECT max(account_date) as MaxDate
   FROM   fact_accounts fa
   WHERE  fa.account_number = account_number
   GROUP BY fa.account_number)

This technique where we have a subquery within the where clause that joins back to the main query is known as a correlated subquery and is supported by the majority of leading database vendors including Oracle, SQL Server, DB2 and My SQL.
More information can be found at wikipedia or from IBM.

Our challenge is now how best to implement this in a universe.

Solution

The solution is to use predefined queries and the key part is to know that we need to create a predefined condition for each dimension that we need the latest account balance for. That is in our example above we need a predefined condition that filters for latest account balance by account number and another predefined condition that filters for latest account balance by account type.

It would be ideal if we could just create one predefined query called say ‘Latest Data’ that the users can add to their report query and this then will automatically filter for latest data. However this is not possible because we still need to know in what context do the users mean ‘Latest Data’ for, that is, do they want latest data by account type or by account number.

In the example above we had two contexts (and I don’t mean universe context here) for latest data: namely latest account balance by account number and latest account balance by account type. As such we need to create the following two predefined conditions in our Universe.

Name Latest Balance by Account Number
Description Will filter the results to display latest balance for each account
Where Clause
Balance_Date IN (
  SELECT  MAX(fab.Balance_Date) AS MaxDate
  FROM    FACT_ACCOUNT_BAL AS fab
  WHERE   (fab.Account_Number = Account_Number)
  GROUP BY fab.Account_Number )
Name Latest Balance by Account Type
Description Will filter the results to display latest balance for each account type
Where Clause
Balance_Date IN (
  SELECT  MAX(fab.Balance_Date) AS MaxDate
  FROM    FACT_ACCOUNT_BAL AS fab
  WHERE   (fab.Account_Type  = Account_Type)
  GROUP BY fab.Account_Type )

Then we can then use either of these in our queries.

The screenshot below illustrates a query in Web Intelligence using this predefined condition. (Click for larger image)

Web Intelligence Query Panel

Web Intelligence Query Panel using a predefined condition

The following is the SQL generated by the above query,

SELECT
  FACT_ACCOUNT_BAL.Account_Number,
  FACT_ACCOUNT_BAL.Balance_Date,
  FACT_ACCOUNT_BAL.Balance
FROM
  FACT_ACCOUNT_BAL
WHERE
  ( Balance_Date IN (
    SELECT MAX(fab.Balance_Date) AS MaxDate
    FROM   FACT_ACCOUNT_BAL AS fab
    WHERE  (fab.Account_Number = Account_Number)
    GROUP BY fab.Account_Number)  )

And the screenshot below is of a simple table displaying the results of this query.

Screenshot of query result in Web Intelligence

Screenshot of the result in a Web Intelligence document

Conclusion

Our requirement was to add a mechanism to a BusinessObjects universe that allows users to easily filter a query so that it only displays the latest data. We saw that in order to fulfil this requirement we needed to understand the context of the meaning of “latest data” and we can then create predefined conditions that filter for latest data by each required context.

In the example above Account Balance is a special kind of measure know as “semi-additive”. These are measures that don’t aggregate uniformly across all dimensions. In this case we are logically able to add account balances across different accounts or account types but we can’t add account balances over time.

BusinessObjects has made some progress in the latest release (XI3.1) where we can delegate that a measure is aggregated using a database query rather than by the report. What would be ideal however would be a mechanism to say how a measure is aggregated depending on which dimension (object or maybe class) is used. For example if our dimension is account type then the aggregation is “sum” however if it is by time then it is “max” based on dimension value or maybe “none”.

Semi-additive measures can be difficult to handle and hopefully above highlights one method of handling at least time base semi-additive measures.

This article was published on November 16, 2009 by Al Gulland.

3 comments on “Filtering for Latest Data in a Universe

  1. KFonMurphi

    You’re perfectly right when you say that aggregation should depend on the dimension. I’m encountering this problem regularly.

    The notion of “analysis axis” which contains dimensions is not there, although hierarchies do exist.

    Hopefully there will be some progress in XI4, but I’m not holding my breath.

  2. agulland Post author

    Hi, thanks for your comment! I’m also eagerly awaiting next release and would love to see much more data analysis features added to Web Intelligence!

  3. Pingback: correlated subqueries in webi/designer | SES

Leave a Reply