Research Blog - Customer Intelligence

Readers of this blog may be interested to know about a handy guide to enterprise reporting that I've put together.

Most organisations publish reports on a range of metrics - customers, employees, transactions, production, contacts etc. I noticed there was a real dearth of vendor-neutral information about management reporting, so I thought I'd fill the gap.

The guide can be used as a checklist for project management, discussing the elements of report design, publication requirements and the all-important user experience. There's also info about the technical architecture and functional requirements for reporting systems.

The last few sections provide an overview of how to articulate the way reporting initiatives create value for your organisation, and how to assess the quality of reporting systems as seen by your users.

All in all, a reasonably comprehensive sales-free guide to enterprise reporting.

Something I've been noticing lately is the proposition that one can drive up data quality by denying users the ability to complete their task with fields left blank. You've probably seen this on a lot of web apps - if you leave a form field blank some JavaScript validation code will pipe up and complain. It's also widespread in call-centre (and CRM) applications. My theory is that this "lock-out" approach is used only when application designers (or owners) are deeply suspicious of the people entering data (staff or customers) and cannot align the incentives of the data enterers and the organisation as a whole.

But does this actually work? A (possibly apocryphal) story that abounds in the data quality literature is something I call the parable of the "broken legs". It goes something like this:

A leading health-care organisation was doing some analysis of patient claims data. They found that some 80% of claims made were for broken legs (or fractures to the lower limbs). They tested a number of theories - perhaps the plan was popular with skiers? - before they figured out what was happening. When the patients made a claim, the call-centre staff had to select from a drop-list the appropriate ailment or they could not advance to the next screen. They resented this step since it added a large delay (and their bonuses were linked to promptness). However, the call-centre staff soon realised that they could merely select the first item in the list and proceed. That first item was "broken legs". Here endeth the lesson.

So, I ask the question: under what circumstances is it a good idea to insist on non-blank fields? Or, equivalently, when is it a good idea to allow blank fields?

Here's the setup:

Assume that you presently allow blank fields, and are deciding whether to add a control to ban them.

A field value can be in one of three states: Correct (C), Blank (B) or Rubbish (R). Note that Rubbish means the value is valid, but randomly generated (perhaps drawn from the same distribution as the Correct values).

Presently, your data is split across these three types in proportion [C,B,R].

The idea is that putting in a blank-field check will turn some of your Blanks into Corrects. Of course, you'll also create some extra Rubbishes too. Let's think about the relative costs involved.

Firstly, the cost of a Correct entry is 0. Next, let's set the cost of a Rubbish entry as 1 "penalty unit". (As we see later, it doesn't really matter what the absolute cost is in dollars.) I argue that the cost of a Blank (c) is somewhere between 0 and 1: at least with Blanks you know where the dodgy data is so you can (presumably) work around it, or retrospectively fix it. With Rubbish though, you don't even know how bad the problem is.

Proposition: 0 <= c <= 1
"Proof": Assume c > 1, then each Blank could be replaced with a Rubbish entry driving down the total cost. Since this doesn't happen in practice, no one believes that c > 1.

[If Blanks are in fact costing you more than Rubbish, please get in touch with me via email and I will happily - for a modest fee - make the substitution for you in your data set.]

NB: c < 0 implies a negative cost, ie you're making money on a Blank. This is perverse and indicates something is wrong with how you're using information.

The astute reader will say "Ah! But sometimes Blanks are substituted in a process known as imputation." True. But the imputed values are not Rubbish, they are usually mode, mean or other average (or likely) values.

OK, so we've established a rough-and-ready estimate of the relative cost of a Blank (c). Now, we need to guesstimate the proportion of Blanks that will be converted into Corrects. Naively, people may hope this is 100%. As the parable of the "broken legs" shows, this is not the case. So, we set q as the proportion going from Blank -> Correct, and (1-q) as the proportion going from Blank -> Rubbish.

Now, the gain is going to be the expected benefits net of costs. (Here we assume there is no switching cost of putting the check in.) The benefit is going to be the proportion of Blanks converted to Corrects times the relative cost of Blanks. The cost will be the proportion of Blanks converted to Rubbish times the extra cost of a Rubbish over a Blank.

G = [benefit] - [cost]
  = [q * c] - [(1-q) * (1-c)]
  = qc - [1 - q - c + qc]
  = q + c - 1

So, G has a maximum of +1 (when q = c = 1, since 1 + 1 - 1 = +1) and a minimum of -1 (when q = c = 0, since 0 + 0 - 1 = -1). If G = 0 it means you're at the critical (or break-even) point.

So here's a simple decision model: it's worthwhile removing the blank-field option if and only if q + c > 1.
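For those who prefer code to algebra, here is a minimal Python sketch of the rule. The function names (relative_gain, should_ban_blanks) are my own labels for illustration, and the inputs are whatever guesstimates of q and c you are prepared to defend.

```python
def relative_gain(q: float, c: float) -> float:
    """Gain per former Blank, in penalty units (1 unit = cost of one Rubbish value)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("q and c must both lie in [0, 1]")
    benefit = q * c               # Blanks that become Correct each save c
    cost = (1.0 - q) * (1.0 - c)  # Blanks that become Rubbish each cost an extra (1 - c)
    return benefit - cost         # algebraically equal to q + c - 1


def should_ban_blanks(q: float, c: float) -> bool:
    """Decision rule: ban blank fields only if the gain is strictly positive."""
    return relative_gain(q, c) > 0.0


if __name__ == "__main__":
    # Made-up inputs, purely for illustration.
    print(relative_gain(0.8, 0.5), should_ban_blanks(0.8, 0.5))
```

Writing the benefit and cost terms separately, rather than returning q + c - 1 directly, keeps the code lined up with the derivation so either term can be adjusted if your situation differs.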

Aside: If the switching cost (ie developing new specs, building the code, testing it, deploying it, training and documentation etc) is not negligible, then you'll need to know the absolute costs. To do this, let C be the absolute cost of a Rubbish value (in dollars), B be the proportion of your data that is Blank and S be the switching cost.

GA = [B * q * c * C] - [B * (1-q) * (1-c) * C] - S
= BC * [q + c - 1] - S

And the decision rule becomes "implement if and only if BC[q + c - 1] > S".
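The version with switching costs can be sketched the same way. The function name absolute_gain is hypothetical, the symbols B, C, q, c and S are used exactly as defined in the aside, and the dollar figures in the example are invented.

```python
def absolute_gain(B: float, C: float, q: float, c: float, S: float) -> float:
    """Net gain GA = B*C*(q + c - 1) - S, using the symbols from the aside.

    B: proportion of the data that is Blank
    C: absolute cost of Rubbish (dollars)
    q: proportion of Blanks that become Correct once blanks are banned
    c: cost of a Blank relative to a Rubbish value
    S: switching cost (dollars)
    """
    return B * C * (q + c - 1.0) - S


def should_implement(B: float, C: float, q: float, c: float, S: float) -> bool:
    """Implement the blank-field check only if the net gain is positive."""
    return absolute_gain(B, C, q, c, S) > 0.0


if __name__ == "__main__":
    # Invented numbers: 20% Blanks, $100,000 at stake, optimistic q and c, $5,000 to build.
    print(round(absolute_gain(0.2, 100_000, 0.7, 0.6, 5_000), 2))
    print(should_implement(0.2, 100_000, 0.25, 0.5, 5_000))
```

Note that when q + c < 1 the first term is already negative, so no switching cost, however small, can rescue the case.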

However, in most cases the simpler case suffices (q + c > 1) and we only need to ask: what are reasonable values for q and c?

Well, my hunch is that for most cases q < 25%, particularly where users are losing the option of entering a blank. This may seem low, but ask yourself how many people would think like this:

Ah, I see I can no longer leave this field blank. I must now select an option from this list. Management has not made this change to frustrate me. They've offered me an opportunity to improve myself. Sure, I could just put in any old thing, but this is my chance to stand up, make a difference and Do The Right Thing.

Clearly, not many people would react like this. The figure of 80% bandied about in the broken legs parable suggests that q<25% is about right.

Of course, you'll need to test this yourself with some field experiments. Here's what NOT to do: pick your five best employees or most diligent customers. Get them to use the new system for the first time while their supervisor, boss and entire board of management look over their shoulder, for five minutes max. Declare it a success when q=100%. Don't repeat six weeks later once staff have figured out the work-arounds.

Now, what's reasonable for c ie cost of a Blank as a proportion of cost of a Rubbish value? I would suggest in general c<50% ie a Blank costs at most half as much as a Rubbish. Why so low? Well, as mentioned above, a Blank has at least the benefit of being obviously wrong so some contingent action may be taken (eg data processed differently or re-acquired). The quality problems will have a high visibility, making it easier to get management support for fixing them. You can also analyse where the data quality problems are, where they're coming from and why users are reluctant to use that particular field. This is potentially very valuable. Also, it doesn't screw up averages and other stats about the data set as a whole.

Based on these hunches and the "q + c > 1" decision rule:

0.25 + 0.5 < 1

so, in general, it's not a good idea to disallow blanks. My recommendation for application designers is:

Notify users that they have left a field blank, but allow them to proceed to the next step. If a client demands that fields are completed before users can proceed, tell them the parable of the broken legs. If they persist, ask them to estimate values of q and c (off the top of their head) and use the above decision rule. In the event of a dispute, flip it around to "what values of q and c make it worthwhile?" and check these values for reasonableness. As a last resort, go and look for empirical evidence of q and c in your particular context to settle it.
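The "flip it around" step is just the break-even line q + c = 1 read in each direction. A small sketch, with hypothetical helper names, that prints the q a client would have to believe in for a few made-up values of c:

```python
def break_even_q(c: float) -> float:
    """Conversion rate q at which banning blanks exactly breaks even (G = 0)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return 1.0 - c


def break_even_c(q: float) -> float:
    """Relative Blank cost c at which banning blanks exactly breaks even (G = 0)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return 1.0 - q


if __name__ == "__main__":
    for c in (0.1, 0.25, 0.5, 0.75):
        print(f"c = {c:.2f}: banning blanks pays only if q > {break_even_q(c):.2f}")
```

The output gives the client a concrete claim to defend: if they accept that a Blank costs half as much as Rubbish, they must also believe that more than half of the users will enter the correct value once locked out.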

Of course, you'll need to drag out the back of your own envelope and do the sums yourself - but I hope this is a good start.

As is traditional with this near-defunct blog, I'll begin by remarking: Gee, it's been a year since I last posted! That must be a record! Not like my other blogs, which receive more frequent attention.

Now, with the formalities out of the way, I can proceed to today's topic: Information Quality Measurement. David Loshin has an article in DM Review about this very fraught topic. (Along with Tom Redman, David Loshin is an IQ guru-practitioner worth reading.)

The article lays out some aspects of creating IQ metrics, following the usual top-down Key Performance Indicator approach. That's fair enough; most companies will find this well inside their comfort zone. It goes on to:
  • list some generally desirable characteristics of business performance measures,
  • show what that means for IQ specifically,
  • link that - abracadabra! - to the bottom-line.

As far as a list of desirable characteristics goes, it's not bad. But then, there's no reason to think it's any better than any other list one might draw up. For me, a list like this is Good if you can show that each item is necessary (ie if it weren't there, the list would be deficient), and that no other items are required (ie it's exhaustive). I don't think that has been achieved in this case.

In any case, this approach makes sense but is hampered by not considering how these numbers are to be used by the organisation. Are the metrics diagnostic in intent (ie to help find out where the problems are)? Or perhaps to figure out how improvements in IQ would create value across the company?

My own research based on a series of interviews - forthcoming when I kick this #$%! PhD thesis out the door - suggests that IQ managers are well aware of where the problems are and what will be required to fix them. Through sampling, pilots and benchmarking they seem reasonably confident about what improvements are likely too. I would question the usefulness of measures such as "% fields missing" as a diagnostic tool for all but the most rudimentary analysis. What they're crying out for is ammo - numbers they can take to their boss, the board or investment panel to argue for funding. Which leads to the next point.

"The need for business relevance" could perhaps be better explained as pricing quality. That is, converting points on an arbitrary scale into their cash equivalents. This is a very tall order: promotions and job-retention must hinge on them if they are to have meaning. Even management bonuses and Service Level Agreements will be determined (in part) by these scales. In effect, these scales become a form of currency within the organisation.

Now, what manager (or indeed supplier) is going to be happy about a bright young analyst (or battle-hardened super-consultant) sitting down at a spreadsheet and defining their penalty/bonus structure? Management buy-in is essential and if they're skeptical or reluctant then it is unlikely to work. If you try to force an agreement you risk getting the wrong (ie achievable!) metrics in place, which can be worse than having no KPIs at all. There is a huge literature on what economists call the principal-agent problem: how do owners write contracts with managers that avoid perverse incentives, without being hammered by monitoring costs?

But suppose these problems have been overcome for functional managers (in eg. the credit and marketing units). These people own the processes that consume information (decision processes) and so should value high-quality information, right? Why not get these people to price the quality of their information? But what's high-quality for one is low-quality for another.

Plus, they know that information is (usually) a shared resource. It's possible to imagine a credit manager, when asked to share the costs of improvements to the source systems, holding out saying "no, we don't need that level of quality" knowing full-well that the marketing manager will still shell out for 100% of the expense - with the benefits flowing onto the sneaky credit manager. (This is where it gets into the realm of Game Theory.)

So, what would help here is having an objective measure of relevance. That way, quality points could be converted into cash in a reasonably transparent way. But how do you objectively measure relevance? Well, another tidbit from my research: relevance is not a property of the data. It is a property of the decision-process. If you want to understand (and quantify and price) the relevance of some information, then staring at the database will tell you nothing. Even seeing how well it corresponds to the real-world won't help. You need to see how it's used. And for non-discretional users (eg. hard-coded decision-makers like computers and call-centre staff) the relevance is constant regardless of any changes to the correctness.

In light of this, doesn't it make sense to:
  1. Identify the valuable decision-making processes.
    This could include credit scoring (mortgage/no mortgage), marketing campaigns (offer/no offer), fraud detection (investigate/ignore) and so on.

  2. Price the possible mistakes arising from each process.
    Eg. giving a Platinum Card to a bad debtor. Don't forget the opportunity costs such as missing an upsell candidate.

  3. Score the relevance of each dataset of interest (eg. attribute) to that mistake.
    Some attributes will have no bearing at all; for others, the decision largely hinges on it.

  4. Measure the informativeness of the attribute to the real-world value.
    What can I find out about the real-world value just by inspection of the data? This is a statistical question, best asked of a communications engineer ;-)

The first two tasks would be undertaken as part of the functional units' KPI process, and tell us how much money is at stake with the various processes. The last two could be undertaken by the IS unit (governed, perhaps, by the IS steering committee made up of stakeholders). The resulting scores - stake, relevance and informativeness - could be used as the basis of prioritising different quality initiatives. It could also help develop a charge-back model for the information producers to serve their (internal) customers.
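To make the three scores concrete, here is a rough Python sketch under some loud assumptions. I use normalised mutual information between the recorded and real-world values as a stand-in for "informativeness", and I combine the scores as a simple product; both are illustrative choices rather than a settled method, and all the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def informativeness(recorded, actual):
    """Assumed measure: mutual information I(recorded; actual) / H(actual), in [0, 1]."""
    h_actual = entropy(actual)
    if h_actual == 0:
        return 1.0  # the real-world value never varies, so there is nothing left to learn
    joint = entropy(list(zip(recorded, actual)))
    return (entropy(recorded) + h_actual - joint) / h_actual


def priority(stake, relevance, info):
    """Assumed ranking score: dollars at stake, weighted by relevance and by how
    much the attribute currently fails to tell us about the real world."""
    return stake * relevance * (1.0 - info)


if __name__ == "__main__":
    # Made-up audit sample: what the database says versus what is actually true.
    recorded = ["good", "good", "bad", "good", "bad", "good"]
    actual = ["good", "bad", "bad", "good", "bad", "good"]
    info = informativeness(recorded, actual)
    print(info, priority(stake=250_000, relevance=0.6, info=info))
```

The recorded/actual pairs would come from the kind of sampling exercise IQ managers already run; the stake and relevance figures come from the functional units.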

Two questions: how do you score relevance and informativeness? My conference paper (abstract below) gives some hints. Next: will corporate managers (IT, finance, marketing) accept this approach? For that, I'll be doing a focus group next month. Stay tuned.

I am pleased to say that my first academic paper has gone through the rigorous double-blind review process and has been accepted for a conference!

The conference is the 2004 IFIP International Conference on Decision Support Systems with the conference theme Decision Support in an Uncertain World. It is held this year in Prato, Italy, an hour from Florence. I am excited to be invited to present my work a stone's throw from the birth of the Renaissance!

I've also been accepted to participate in the conference's doctoral consortium, where I will get to workshop my research project and network with academic luminaries from my field. This will happen from June 30 to July 4, which means I get to avoid another dreary Melbourne winter.

The paper can be found on my research documents page under dss2004. The abstract is provided below.

An Information-Theoretic Model of Customer Information Quality

Gregory Hill

School of Business Systems, Monash University
Melbourne, Australia


The definition of measures of Information Quality (IQ) is an important part of Information Systems research and practice. A model is presented for organisational processes that classify a large number of customers into a relatively small number of partitions (or segments) using attributes in a database. The model is built upon concepts from the IQ literature, but uses Information Theory to define appropriate measures. These measures allow business analysts to assess different aspects of the processes, with a view to understanding the impact of IQ upon business performance. Based upon this understanding and the organisational context, IQ treatment proposals may be constructed, refined and evaluated. In addition to specifying the model and measures, an illustrative example is presented.


Keywords: Information Quality Assessment, Information Value, Information Theory, Customer Relationship Management

One area related to my research that is of little concern to me is privacy. You know, Big Brother is watching and all that. I've heard academics at conferences get up and talk about the universality of privacy as a human right. Bullshit! You only have to travel briefly in other parts of the world to realise what a culturally-specific and loaded concept "privacy" is.

On the other hand, some of those hard-core economists from what's often called the "Chicago School" promote the idea that privacy is inefficient. That is, privacy is when you attempt to misrepresent your demand to the market. The corresponding loss of efficiency on the supply side reduces the public good ergo individual consumers should pay a premium to retain privacy.

Another way of looking at it is that by revealing our demand to the market so it can organise itself better we get an efficiency discount - the price without revelation is the "true price". Generally, if you signal future intent in a way that the market can use to organise itself (eg. flatten supply chains, reduce various kinds of risk) you will be paid a dividend. For consumers, this means that if you buy in bulk or pre-order your purchases, you can expect to get it cheaper.

I have an alternative theory about why consumers value privacy, based on information-theoretic considerations. As humans, we're increasingly defined uniquely by our individual consumption choices. Hence the need to have "our" brands, labels, sports teams and other preferences expressed by execution: no point being a "Chanel No. 5" girl if you can't buy it. On the other hand, herd instinct and economies of scale mean that we are encouraged to consume what those around us do. How do we break this contradiction? Product lifecycles, constant innovation, fashion and marketing niches allow us to differentiate. "Sure, I wore cargos - when they first came out".

There seem to be two topics in Information Systems (IS) research that everyone seems tired of talking about: information vs data, and value vs quality. As such, there is not a lot of agreement about how these terms are used. For example, it is very common to read such things as "throughout this article, the terms 'data' and 'information' will be used interchangeably". To my mind, this is like using 'heat' and 'temperature' interchangeably: acceptable in everyday life, but not if you're a physicist or refrigeration mechanic. They are two different words for a good reason - they are two different concepts. There may be disagreement about how they differ (information theory as compared with semiotics), but we should at least recognise that there is a difference! But, I've got a feeling I'm going to ride this hobby-horse to hell ...

The second term-pair is the object of this text: the concepts of quality and value. They are both seen as "good things" and are somehow related yet quite distinct. I will start off by describing how they are (or seem to be) used and some shortcomings. Then, I will present my conceptual framework of these terms, culminating in the synthetic notion of "value of quality".

Both "value" and "quality" are concepts used for assessment of past, present or future outcomes. While modern management rhetoric is replete with both of them, we sometimes find them being used without any great distinction. The situation is not dissimilar to when people use the terms "efficiency", "efficacy" and "effectiveness" as synonyms: they are different for a reason and the expressive vocabulary of middle and strategic management is weakened when the distinctions are lost. As such, it is worth pondering the nature of these concepts, and suggesting some guidelines for use.

No doubt there are hundreds of sources on this topic, but my simple, almost child-like, probing of Google didn't turn up anything addressing exactly this issue. When people use "value" I think they often have in mind a market price, whereas the term "quality" seems to be associated with customer feedback. That is, the concepts are "operationalised" by the procedure used to determine them (eg a financial transaction or survey response). At a more abstract level, value is defined as "utility" while "fitness for purpose" is a frequent level-headed response to the question "what is quality?". Some people seem to attach religious overtones to it. The definition of quality as "reasonably fit for that purpose" seems to be applicable in the Commonwealth legislation, and may have equivalents in other common law countries.

"Utility" is a very abstract concept, rooted in psychology, economics and philosophy. In practice, it is measured in currency using the net present value of a stream of costs and revenue (ie discounted cashflow). There are two other value-measuring methods. "Historical cost" (what you paid for the item) and "market value" (how much it would cost to replace by going to the market; or how much you could sell it for in the market). These may be preferred by accoutants and tax planners for different purposes. Experimentally, "willing to pay" (WTP) and "willing to accept" (WTA) are often used: these are the prices in cash you would be willing to pay/accept to acquire/dispose of the item. It's interesting to note that due to the "endowment effect", people typically price items they already have three times higher than identical ones they don't have.

So the supposition is that the terms are defined by how they are elicited; this often leads to the assumption that value is quantitative while quality is qualitative. I disagree that this is the distinction: quite often quality is measured subjectively through Likert scales and objectively through defect rates, down-time and precision. Another observation is that quality seems to crop up more in public-sector and community organisational jargon, while value crops up more in the corporate and business world. Visiting the supermarket, we're left with the impression that "value" and "quality" are codewords for "discount" and "premium". Perhaps this is merely what advertisers would like them to mean to retail consumers. One last distinction: people sometimes refer to quality as being subjective and value as objective, somehow tying up the use of these concepts with the positivist/interpretivist methodologies war. I think that both concepts are sufficiently different that they can contribute to discussion in both camps.

Now I would like to present my own modest attempt at explaining the difference. "Quality" is to do with "satisfaction", while "value" is to do with "preference". Please allow me to elaborate: assume that you have an itch that needs scratching. The quality of someone's scratching is intrinsic to the experience itself. It is only meaningful to talk about the value of the scratching relative to an alternative course of action, such as rubbing up against a wall. So satisfaction is an internal sensation or experience, while value is the expression of a preference by behaviour. To elicit the value of something, you have to have an item (or experience), a person to perform the evaluation and an alternative. At the end of the observation, you can determine the preference. By extension you can repeat the procedure with a number of items and generate a ranking, or partially ordered preference function. This is the basic approach taken by von Neumann and Morgenstern in their axiomatisation of preferences for lotteries in their expected utility theory.

So quality is a conceptualisation of the "fit" of the item (or experience) for the purpose at hand. It seems like quality must be a Boolean notion - something either is or isn't quality. Or can there be degrees of quality? If so, how can it be agreed upon within a group, since quality is inherently a subjective sensation of satisfaction? This seems to be the starting point of the "quality movement": defining the dimensions along which quality can deteriorate, and developing instruments to measure the extent.

Value, in comparison, is relative to the alternatives and thus is operationalised and expressed as a ranking. In the case where the alternatives are a small set of discrete behaviours, observation of the subject will only reveal a small amount about the value they ascribe. However, we can make the "granularity" of the choice-set arbitrarily small using currency, and get as precise a ranking as we like by conducting WTA/WTP experiments. That is, when a subject expresses a ranking of preferences for $392 cash over a surfboard over $390 cash, we can label this relative preference for the surfboard with the shorthand of "price".
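One way to picture such an experiment is as a bisection over cash offers: keep halving the gap between an amount the subject rejects in favour of the item and an amount they take instead of it. The sketch below is only an illustration of that idea, not a description of real experimental protocol. The function name, the prefers_cash callback and the simulated subject (with a hidden price of $391) are all assumptions.

```python
def elicit_price(prefers_cash, low, high, step=1):
    """Bracket the subject's price for an item between two cash amounts.

    prefers_cash(amount) -> True if the subject would take `amount` in cash
    rather than the item. Assumes the subject prefers the item to `low` and
    prefers `high` to the item, and that preferences are consistent.
    """
    while high - low > step:
        mid = (low + high) // 2
        if prefers_cash(mid):
            high = mid  # cash wins, so the price lies below mid
        else:
            low = mid   # item wins, so the price lies at or above mid
    return low, high


if __name__ == "__main__":
    # Simulated subject who values the surfboard at a hidden $391.
    subject = lambda amount: amount > 391
    print(elicit_price(subject, low=0, high=1024, step=2))  # (390, 392)
```

Each question halves the bracket, so narrowing a $1,024 range down to $2 takes nine comparisons. Real subjects are noisier than this simulated one, which is part of why WTA and WTP figures can disagree.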

Here is a diagram illustrating the difference between quality and value. The yellow block is the pre-existing need, and the quality of the item is portrayed as how well it fits in the slot. Thus, the amount of daylight could be interpreted as an (objective) measure of the quality. The value, in this case, is the ranking expressed by the consumer over the three alternatives.

As we've seen, by introducing cash alternatives we can assign a price to the quality. Note that this price may not be related to the geometry of the fit at all, and in general it would have to be experimentally determined with preference experiments. However, in some circumstances a consumer would agree to use a rule, or formula, to analytically derive their price (to accept or to pay), which necessitates an objective measure. In physical markets, this measure is most often quantity (amount or count). In non-physical markets it is typically risk or probability. This approach is also important in organisations, where decision-making is distributed and accountability requires that decision-makers can justify their valuations.

Next, we consider the use of these terms. Earlier I remarked that quality and value enjoy a loose public/private sector dichotomy. I suspect that the above framework may explain this. In the business world, the phrase "value-added" is used to death. To impart some meaning, I choose to interpret it as the "value gap" between your offering and that of your nearest competitor. This is because businesses understand that consumers make choices: the goal is to be picked first. You only want to win the auction by $1: winning by $100 is a waste. After all, it's the ranking that matters, not the score. Hence, businesses are focused on shaping their value proposition to the customer relative to each customer's alternative choices (ie the competition, substitutes or the "do-nothing" option).

This approach doesn't pan out too well in the absence of alternatives. Many public and community sector organisations perceive themselves to be in this position. There's nothing to rank against if you're a monopoly provider of an essential service. It's not reasonable to ask what the next best alternative to air is, or what price you are willing to accept to have it taken from you. As for individual consumers, I suspect that they are more interested in the quality of the ambulance service than its value. (Still, some people will for different reasons prefer to hail a taxi cab to take them to hospital.)

There is one point of view where the value of such services is considered: that of the government. They are concerned with valuation in addition to quality because they must make resource allocation decisions across disparate activities. To decide if money should come out of roads and go into child protection, they consider the impact upon the quality of transport and safety, respectively, to citizens. We in turn express our view of the quality of the government's efforts at this task via polling of the electorate. Note that it is only during an actual vote that we express our value of the government: our behaviour reveals our preferences.

So people operating in a market focus on value. Quality is important to the extent that it influences value. If you are a supplier currently coming second in the auction, there are two ways you can improve your value (ie preference ranking). You can increase your (perceived) quality or decrease that of your competitor in first place. An interesting question is: would you prefer a change X in your quality, or a change Y in your competitor's quality? A rational supplier would answer this by considering the value of each change, ie the effect on consumer ranking, and choose the one that maximised the expected utility. By introducing cash bundles (or lotteries) and asking the supplier to rank them, we could determine the price of X and Y, ie whereabouts they fit on the ranking from $0, $1, ... , $infinity. Further, if we could describe a change X by some objective measure of degree (say, parameter t), then we could experimentally create a price curve of X(t) by determining where each of X(1), X(2) etc fit.

This is how we measure the "value of quality": it is the supplier's estimate of the consumer's preferences for realising satisfaction. To recap:

In the beginning was the need, or requirement or "itch" of a consumer. They seek things to satisfy this need (or scratch the itch) from a set of suppliers. They rank the offerings and make a selection, and then experience satisfaction associated with the outcome. The consumer can assess this experience using two concepts:

  • Quality: the subjective satisfaction experienced by the consumer. It is elicited via qualitative feedback, Likert scales or agreement on indirect objective measures such as number of defects and probability of failure. Quality can be decomposed into dimensions which can be measured and compared.

  • Value: the relative preference of the consumer for that alternative. It is elicited via observation of ranking of alternatives in experiments. If a range of cash bundles are introduced as alternatives, it is possible to determine the consumer's Willing to Accept/Pay prices for each alternative.

It only makes sense to talk about value relative to at least one alternative, otherwise no preference can be discerned. In a competitive market suppliers are interested in their value (ie relative standing) to all their consumers. They will compete on (perceived) quality to the extent that it will improve their value in the eyes of the consumer. It may be more effective to change the (perceived) quality of their competitors. Suppliers are interested in the sensitivity of consumer valuation to perceived quality; that is, the value of quality.

A possible approach is to use qualitative research methods to determine quality dimensions and surveys to determine scalings and calibration. Then, determine the subjective value of alternatives by conducting WTA/WTP experiments. Fit a model that estimates the consumer's value for any set of quality inputs. For a given auction, determine the cheapest quality dimensions to improve to win by $1. If it's cheaper than winning the bid, go for it.

In practice, quality meets value at the point of Service Level Agreements (SLAs). Typically, you find a quality measure associated with a value measure (bonus/penalty). Due to the organisational and legal requirements, objective measures are used. While the consumer could in theory fine-tune the bonus/penalty structure to work for the quality definitions the supplier comes up with, this isn't practical. Instead, they agree on quality dimensions and scales, and associated payments, via often very simple formulae. Further, the objectivity requirement means they often use simple technical measures. These crude constraints mean that for any point on the quality scale, it's likely that at least one party will feel aggrieved by the corresponding bonus/penalty. This can quickly destroy goodwill and lead to "playing to the numbers" instead of focusing on subjective quality.

In situations where there are no alternatives (such as a monopoly provider of essential services or an internal supplier), it is not possible to talk meaningfully about the value of the good or service. However, it is possible to describe the value of quality by considering the preference for alternative possible quality changes. Candidate quality changes are often parameterised, such as failure rate or response time, to allow the comparisons (for reporting and budgetary allocations) to be analytic. This approach is at the heart of modern quality management practice. It is exemplified by the way we run specialist security services and professional sports teams.

By seeing quality as subjective satisfaction and value as relative preference, we can understand the role they play in organisational decision-making, and where we're likely to encounter each. It's also useful for understanding the role of quality and quality measures in competitive markets.

Just sneaked in under a year since last blog entry. It's also timely to review progress since the last entry. Briefly, I got confirmed in my PhD by the Committee. I am now going through Ethics approval and hope to start collecting data next month. I have a plan to finish this in two years, so I figure I'm about 3-6 months behind the ideal, which is about on target for most candidates.

More importantly, I have a much clearer view of what I want to do with my research, and how to get there. I also have a stronger grasp on this IS discipline and what it is about. (This is my cue to start my rant.) It amazes me the things people research in this field - and what they leave alone. The concept of "information" for one. It seems to me that the "systems" side of things gets about 95% of the spotlight. People seem strangely reluctant to tackle "information". I regularly come into contact with researchers and gurus who state that "data and information are interchangeable terms", or "information is data in context". Well, you'd expect that from a researcher who studies the work of SAP installers (ie interested in organisation, not content), but I even see it from the leading lights in my own sub-discipline of Information Quality (or Data Quality).

"Data" is to do with symbols. "Information" is to do with uncertainty. They are not the same thing.

Yes, "Information Science" (or what I think of as "Information Theory") has very little to do with "Information Systems". This must be quite a puzzle to people outside the field, perhaps arguing by analogy with "Chemistry" and "Chemical Engineering". But, I would be surprised if 10% of the people turning up to an IS conference could tell you anything at all about Information Theory. Engineers and economists are big on Information Theory, yet IS ignores it. This is a curious state of affairs, but I have a very crude explanation. Suppose you're interested in computers; if you're also interested in, and capable of handling, mathematical concepts, you do Computer Science. If you're not, you do Information Systems, and acquire a psychological or sociological reference discipline. I'm sure this does injustice to some researchers, but is probably broadly true. This situation impoverishes both fields. (I think that CS people are broadening their perspectives, especially the AI oriented researchers, via Cognitive Science.)

New record for longest-time-since-blogging - over two months! Well, to be fair, there was Christmas and I went to South East Asia for 5 weeks.

Last Friday I had my doctoral consortium, where I presented some of my views. Seemed to go fairly successfully: there was a mixture of healthy controversy and curiosity from the academics.

I met with my supervisor Graeme today, in preparation for a meeting with the industry sponsor, Bill, next Tuesday. It all seems to be ticking along nicely. The above-linked slides are the best formulation yet of the topic, and I'm moving closer to having the research questions elucidated. I believe the key is the idea of information as a discriminator - information is that which allows us to separate, filter, distinguish and group. A central strategy for many organisations is to treat people differently, and there are numerous value models that explain the benefits of such differentiation. Information, then, is to be assessed on its ability to do such discrimination. Ie customer segments are valuable to the extent that they allow differentiated treatment of customers, and information is valuable to the extent that it supports this segmentation activity. In order to measure this, we have to break out my favourite tool - entropy.

I like this formulation because it pushes the value measuring problem onto an existing marketing area associated with CLV, CVA, LVT etc. It also naturally accommodates the difference between information at different levels (ie "average customer wants to spend $10" and "Robert Smith wants to spend $10").
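One way to make "information as a discriminator" concrete is to measure how much a segmentation reduces the entropy of some customer outcome. The sketch below is a standard information-gain calculation, not a worked-out method from my research; the segment labels, outcomes and record layout are all made up for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(records):
    """records: list of (segment, outcome) pairs.

    Returns H(outcome) - H(outcome | segment) in bits, i.e. how much
    knowing the segment reduces uncertainty about the outcome.
    """
    n = len(records)
    h_before = entropy(Counter(outcome for _, outcome in records).values())
    by_segment = {}
    for segment, outcome in records:
        by_segment.setdefault(segment, Counter())[outcome] += 1
    h_after = sum(
        (sum(c.values()) / n) * entropy(c.values()) for c in by_segment.values()
    )
    return h_before - h_after


# A segmentation that separates the outcomes perfectly...
perfect = [("A", "yes"), ("A", "yes"), ("B", "no"), ("B", "no")]
# ...and one that tells us nothing about the outcome.
useless = [("A", "yes"), ("A", "no"), ("B", "yes"), ("B", "no")]

gain_perfect = information_gain(perfect)
gain_useless = information_gain(useless)
```

Under this reading, a segment is only as valuable as the uncertainty it removes about something the organisation would act on differently.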

My current approach is the gedanken experiment - I'm constructing a hypothetical model around Krusty Burger seeking to upsell fries, where it knows the cost and benefits of asking, and either being accepted or rejected. This is useful for me to sort out some ideas and look at generalisability.
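A minimal sketch of the upsell wager, assuming the only quantities that matter are the probability of acceptance, the margin gained on acceptance, the cost of a rejection (annoyance, delay) and the cost of asking at all. Every number here is invented.

```python
def expected_payoff_of_asking(p_accept, gain_if_accepted, loss_if_rejected, cost_of_asking=0.0):
    """Expected payoff of making the upsell offer once."""
    return (
        p_accept * gain_if_accepted
        - (1 - p_accept) * loss_if_rejected
        - cost_of_asking
    )


def break_even_probability(gain_if_accepted, loss_if_rejected, cost_of_asking=0.0):
    """Acceptance probability at which asking and not asking pay the same (zero)."""
    return (loss_if_rejected + cost_of_asking) / (gain_if_accepted + loss_if_rejected)


# Hypothetical figures: 50c margin on fries, 10c of goodwill lost on a
# rejection, 2c of server time to ask.
payoff = expected_payoff_of_asking(0.3, 0.50, 0.10, 0.02)
threshold = break_even_probability(0.50, 0.10, 0.02)
should_ask = payoff > 0
```

With these figures, asking is worthwhile whenever the chance of acceptance is above the break-even threshold, which is where information about the customer starts to earn its keep.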

This leads to my next point: I need to be careful with my research that it doesn't piss off the stakeholders. The reason is that I'm more comfortable with taking on risk than my supervisor or industry sponsor. I'm also happy to stay away from industry and then emerge from my Ivory Tower after a couple of years and say "You should do it this way" and pretty much leave it at that. I think I missed the boat on that as a viable research option (sometime in the 1970s). That said, I'm pretty sure I don't want to gain any "deep insight" into a particular case study and analyse the political machinations and generally humans-as-social-animals approach (people react to situations depending on how they think it affects their careers - surprise, surprise!). Nor do I want to canvass "best-practices" from a dozen organisations. AFAIC, if people want that they can pay Gartner for it like everyone else.

OK - here's a hypothesis, no, more of an analogy: Options (calls and puts) are second order transactions. They're transactions about transactions, and they involve a shift in the time dimension and a capping or limiting in the value dimension. Similarly, we can have decisions about decisions; we can decide today that "I will make a decision 6 months hence" or "No matter what, Phil won't be deciding the issue". These are second order, or metadecisions. We can also make contingent decisions: "I will review your salary in 6 months. If revenue hasn't increased, you will not be getting an increase." No doubt a large chunk of what we mean by "manage" could be described as decisions about decisions about ... ad infinitum.

From an information-theoretic point of view, what is going on here? Well, to some extent we're creating options, and to another extent we're eliminating options. For the salary-review example, the manager has decided to remove the option of "increase" (implicitly leaving only "stay the same" or "decrease") contingent on some variable. Perhaps an approach is to enumerate all possible decision outcomes, and assign a probability of it being selected (from the point of view of the manager). Eg. "increase", "constant" and "decrease" are all equally likely. Hence, we can look at the entropy of the decision space, D:

H(D) = E[-log p(D)]

Obviously, the selection "increase" hinges on a random variable, R, that relates to revenue and the decision rule. By comparing entropy before and after certain events, we are measuring the change in decision selection entropy NOT as a measure of information - but intelligence. The events that lead to a change in entropy (or propensity to decide a certain way) would fall into three types:

1) Change in option structure (eg. merging, eliminating, creating) "I've been told I can't give you a decrease, regardless of revenue".
2) Change in decision rules (eg. contingency) "If revenue hasn't increased by 10%, you won't be getting an increase".
3) Change in parameters (eg. variable uncertainty) "Revenue will remain constant with 95% certainty".
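To make the before-and-after comparison concrete, here is a small sketch of the salary-review example using the first type of event (an option is eliminated). It assumes the three outcomes start out equally likely, as above, and that once "decrease" is ruled out the remaining two are equally likely; that redistribution is my assumption for illustration.

```python
import math


def decision_entropy(probabilities):
    """Shannon entropy, in bits, of a distribution over decision outcomes."""
    return -sum(p * math.log2(p) for p in probabilities.values() if p > 0)


# From the manager's point of view, before anything is known.
before = {"increase": 1 / 3, "constant": 1 / 3, "decrease": 1 / 3}

# Type 1 event: the option structure changes, "decrease" is off the table.
after_option_removed = {"increase": 0.5, "constant": 0.5, "decrease": 0.0}

h_before = decision_entropy(before)               # log2(3) bits
h_after = decision_entropy(after_option_removed)  # 1 bit
entropy_change = h_before - h_after
```

The other two event types would show up the same way: a new decision rule or a sharper estimate of revenue changes the probabilities, and the entropy is recomputed.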

Generally speaking, people like having options and will pay money to keep their options open. However, markets like people to relinquish options, so that they can operate more efficiently through planning and risk-sharing. For example, renting is (or should be) dearer than taking out a mortgage. Or if you promise to give Target all your business, you should get a modest discount. Basically, you help out the market, and the market kicks some back your way.

If options are valuable (and freely traded in secondary markets), why then, would managers eliminate them? (Partition their decision space.) Why would they knowingly in advance reduce the courses of action available to them? First guess: they rarely do it. Most managers I've dealt with are extremely reluctant to do this, and don't want to see targets or similar on their product reports. No one wants their business case coming back to bite them on the bum.

Second guess: it's a communication thing. Specifically, it's a negotiation thing. The motivation for telling your staff about the salary review, and the fact that it's tied to revenue, is to create an incentive. The manager thinks that her staff will work better (ie increase revenues) knowing this: they will act differently, ie make different decisions. The existence of this decision rule in the manager's head is a decision variable in the head of the staff. Thus, it falls into the domain of "threats and promises".

Third guess: it's a communication/negotiation thing between the manager and her boss/company. "See, I'm managing my staff for performance - please give me my bonus".

Where does this leave us? Perhaps a measurable and testable (ie normative/positivist) theory of decision-making could provide us with a basis for arguing what the effects of decision rules and parameters are. By linking these effects to money via Utility Theory, we could subsume the question of "what resources should I expend on changing my decision selection propensities?" into general Utility Theory (microeconomics and game theory). This then, might be of help to people when managing their information and improving the quality of decisions, and hence increase social welfare.

Wow - this time a month's delay. That's a new record!

Papers: I'm reading a set of papers by John Mingers, a leading thinker in the fundamental questions underpinning systems theories, including information systems. Particularly, I'm reading about information and meaning, and how autopoiesis can help explain this. This is definitely not for the faint-hearted, and in fact, reading this and related material makes me think that to really participate in this dialogue you'd need to have spent some time in universities - preferably Californian - during the late 60s, if you know what I mean. The most understandable (to me) idea I've picked up so far is that "information is the propositional content of a sign". This is related to my concept of information ("information is the change in uncertainty of a proposition"), but in a way that's not entirely clear to me.

I'm also reading selected papers from ECIS 2000, particularly those dealing with economic analyses of information, such as operation of markets, and those dealing with customer operations, such as data mining.

Seminars: Last Tuesday I attended an industry seminar on creating value from Clickstream Analytics. It was a bit disappointing: in a nutshell SAS has put out a web log analyser, and the National Museum of Australia has started to analyse its web logs. Welcome to 1997.

This afternoon I attended a seminar by Prof Lotfi Zadeh, a particularly famous researcher from the electrical engineering discipline who crossed over into computer science, but now appears to be heading fully into cognitive science (he developed fuzzy logic and possibility theory amongst other things). His seminar was on precisiated natural language. The idea is that traditional analytical tools like calculus, predicate logic and probability theory are too limited in their ability to express propositions ("Robert is very honest", "it is hot today"). So he is promoting an approach to allow one to do formal computation on propositions by imposing constraints on them: it's a way of formally reasoning as you would with logic ("all men are mortal"), except you can incorporate "loose" terms such as "usually" and "about". In essence, it's a formalisation of the semantics in natural language: somewhat of a Holy Grail. I'm pretty sure he turned off a lot of linguists with his use of mathematics - good, I say.

People: I've met a few times with Graeme. We're currently looking at two issues: 1) The ongoing intellectual property issue; 2) the 1-pager for Bill Nankervis (industry sponsor).

1) The University wants me to sign over all my IP rights to it. This causes me some concern, as it is the University's policy for post-grads to own their IP, except if they're in industry partnerships. My goal is to "open source" my research so that I (and anyone else) can criticise, use, extend and teach it to others. In academia, this is done through publishing papers and theses. As sole and exclusive owners of the IP in perpetuity, the University can do what it likes: sell it, bury it, whatever. This makes me uncomfortable, as Australian universities - sadly - are under enormous funding pressure as the government weans them off public money. I'm in ongoing negotiations about how to best ensure that this state of affairs doesn't impact on my agenda.

2) I'm having to narrow and refine my research question further. I've tried coming at it from the top down, so now I'm trying from the bottom up:

The Satisfaction Wager: I Bet You Want Fries With That. A Game-Theoretic Approach to Anticipating Customer Requirements.

It seems that even when you get away from thinking about business and information technology, and start thinking about customers and information, I still read a lot of authors who talk in business-centric terms, about organisation functions: billing, sales, marketing, product development and so on. I'm trying to think within a paradigm of information about customers and hence ask "what sort of information does an organisation need about its customers to satisfy them?". I map it out from the point of view of what an organisation does with customer requirements.

  • Fulfilling Customer Requirements. Eg. customer contact history, service request/order state. Function: Operations.

  • Anticipating Customer Requirements. Eg. customer demand, changes in circumstances, channel preferences. Function: Planning/Sales/Marketing.

  • Creating Customer Requirements. Eg. customer opinions, market expectations. Function: Development/PR.

I'm not entirely sure what a customer requirement is: there's a lot of literature around requirements engineering/analysis, but I think this is from a point of view of developing systems for use by an organisation that operates on customers. I'm talking here about something that looks more like a value proposition.

Anyway, grouping it this way might be a way into understanding the value of different types of information. For example, the "fulfilling" side of things is concerned with the "database world", records, tables, lists of details. The stakes for correctness are high: If you install ADSL in the unit next door, you won't get 80% of the money. The "anticipating" side is the "statistical world", where you deal with guesses: if someone gets onto a marketer's campaign target list and turns out to not be interested, it's not the end of the world. Finally, the "creating" side is where we deal with extremely fuzzy concepts of information to do with perceptions and opinions, such as "unfavourable" news articles, endorsements, sponsorship and the whole "branding" and "reputation" thing.

This latter category is definitely out of scope for me: I think I'll focus my efforts on the interaction between the database world and the statistical world. Hence the facetious title above: if you stood in a Maccas and observed the "up-sell" process, how would you assign value to the information involved? Ie what resources (risks) should you expend (accept) to (potentially) acquire what information? Is current order information enough? Does it help if you have historical information? How much difference does having an individual customer's history make, compared to a history of similar customers (segment history)? Does information about the customer's appearance matter? (Eg. compare in-shop with drive-through.) What about information pertaining to future behaviour (compare take-away with eat-in)? Lastly, what about the interaction of information about the McWorker? The time of day? The location? Etc.
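One of those questions, how much difference segment-level information makes compared with a single pooled view of customers, can at least be framed as a calculation. The sketch below assumes the server can always decline to ask (payoff zero), and compares the best policy using one pooled acceptance rate against the best policy per segment. The segment shares, acceptance rates, gain and loss are all hypothetical.

```python
def payoff_of_asking(p_accept, gain, loss):
    """Expected payoff of making the offer to a customer with acceptance probability p_accept."""
    return p_accept * gain - (1 - p_accept) * loss


def best_payoff(p_accept, gain, loss):
    """Asking is optional, so the best achievable payoff is never below zero."""
    return max(0.0, payoff_of_asking(p_accept, gain, loss))


def value_of_segment_information(segments, gain, loss):
    """segments: list of (share_of_customers, acceptance_probability).

    Returns the extra expected payoff per customer from deciding segment
    by segment rather than applying one decision to everyone.
    """
    pooled_p = sum(share * p for share, p in segments)
    without_info = best_payoff(pooled_p, gain, loss)
    with_info = sum(share * best_payoff(p, gain, loss) for share, p in segments)
    return with_info - without_info


# Two equally sized hypothetical segments with very different appetites.
distinct = [(0.5, 0.8), (0.5, 0.1)]
# Two segments that behave identically: the segmentation adds nothing.
identical = [(0.5, 0.8), (0.5, 0.8)]

value_distinct = value_of_segment_information(distinct, gain=1.0, loss=1.0)
value_identical = value_of_segment_information(identical, gain=1.0, loss=1.0)
```

The same comparison could in principle be run with individual histories in place of segments, which is where the question of how much finer-grained information is worth would get interesting.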

Once again, it's been a couple of weeks since I've blogged. I'll quickly highlight - in reverse chronological order - the people, seminars and texts before going into a lengthy ramble about ... stuff.

People: I met with my supervisor Graeme this morning, and had a quick discussion about the spectrum of formality surrounding business decision making. See the below ramble. Last Monday I had lunch with Dr. Bob Warfield - former manager from Telstra and now something of a role model or mentor for me - and Dr. Peter Sember, data miner and machine learning colleague from Telstra's Research Labs. We discussed my research, industry news and gossip and collaboration prospects.

The Friday before I re-introduced myself to Dr. Tim van Gelder, a lecturer I had in a cognitive philosophy subject a few years ago. We discussed Tim's projects to do with critical thinking, and his consultancy, and possible synergies with my own research and practice in business intelligence. While there are similarities - the goal is a "good decision" - there are differences: I'm looking at the relationships between inputs to a decision (information and decision rules) and outcomes; he's looking at the process itself and ensuring that groups of people don't make reasoning "mistakes".

Seminars: I've attended two since last blog. The first one was on a cognitive engineering framework, and its application to the operational workflow analysis of the Australian Defence Force's AWACS service. (This is where I bumped into Tim.)

The second one was on the "Soft-Systems Methodology" being used as an extension to an existing methodology ("Whole of Chain") for improving supply chains. SSM looked to me like de-rigoured UML or similar. I'm not sure what value it was contributing to the existing method (I asked what their measures of success were, and they didn't have any), but they had quotes from a couple of workshop participants who thought it was helpful. So I figure that's their criterion: people accept it. They didn't report on whether or not some people thought it unhelpful. They didn't talk about proportions of people who responded favourably, and unfavourably, and then compare with people who participated in the "reference" scheme (ie without SSM). In short, since I wasn't bowled over by the obvious and self-evident benefits of their scheme, and they gave me no reason to think that it meets other people's needs better than existing schemes, I'm not buying it.

I have to confess I'm still getting my head around IS research.

Book: I read half of, but then lost (dammit!), a text on Decision Support Systems. It was about 10 years old, but had papers going back to the 60s in it! I don't have the title at hand, but Graeme's going to try and score another copy.

I've also discovered a promising text by Stuart MacDonald entitled Information for Innovation. This is the first text I've read that talks about the economics of INFORMATION as opposed to IT. (I read some lecture notes and readings on "information economics", but found it to be an argument for why organisations shouldn't apply traditional cost/benefit analyses to IT capex.) It's quite clear that information is unlike anything else we deal with, is extremely important in determining our quality of life and yet it is surprisingly poorly understood. I would like to make a contribution in this area, and I'm starting to think that Shannon's insights have yet to be fully appreciated.

Ramble: I've been thinking that to drill-down on a topic, I'm going to have to purge areas of interest. For example, some months ago I realised that I was only going to look at "intelligence" (as opposed to "content" - see below). Now, I'm thinking I need to focus on formal decision processes. Allow me to explain ...

There's a spectrum of formality with respect to decision-making. Up one end, the informal end, we have the massively complex strategic decisions which are made by groups of people, using a limitless range of information, with an implied set of priorities and unspoken methods. Example: the board's weekend workshop to decide whether or not to spin-off a business unit.

Up the other - formal - end, we have extremely simple decisions which are made by machines, using a defined set of information, with explicit goals and rules to achieve them. Example: the system won't let you use your phone because you didn't pay your bill.

The idea is that decisions can be delegated to other people - or even machines - if they are characterised sufficiently well for the delegator to be comfortable with the level of discretion the delegatee may have to employ. The question of what becomes formalised, and what doesn't, is probably tied up with many things (eg politics), but I think a key one is "repeatability". At some point, organisations will "hard-code" their decisions as organisational processes. At other times, decision-makers will step in and resume decision-making authority from the organisation process (for example, celebrities don't get treated like you or me).

I'm thinking that for each process, you could imagine a "slider" control that sets how much decision-making is formalised, and how much is informal. This "slider" might have half a dozen states, relating to process functions:
  • Documenting: Maintaining the authoritative process map

  • Recording: Maintaining the authoritative current state of the process

  • Controlling: Driving/executing the process map, changing current state and prompting people where necessary

  • Designing: Building, testing and deploying new or modified processes based on experience or simulation

  • Commissioning: Determining if new or modified processes are required, and the goals, parameters and resources of the process

The more informal the decision, the more you'd need to look at group-think phenomena, cognitive biases, tacit knowledge and other fuzzy issues best left to the psychologists. I'm thinking that the formal or explicit processes are going to lend themselves best to my style of positivist analysis.

So in that sense, I'm inclined to look at metrics, and their role in decision-making for business processes (customer), service level agreements (supplier), and key performance indicators (staff). Typically, these things are parameterised models in that the actual specific numbers used are not "built into it". For example, a sales person can have a KPI as part of their contract, and the structure and administration of this KPI is separate from the target of "5 sales per day": it would be just as valid with "3" or "7" instead. Why, then, "5"? That is obviously a design aspect of the process.
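The separation between a KPI's structure and its target can be shown directly. In this sketch the class name and fields are hypothetical; the point is only that the rule "met if daily sales reach the target" is fixed, while the target itself is a parameter somebody had to choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailySalesKPI:
    """A parameterised KPI: the rule is fixed, the target is a design choice."""

    target_per_day: int  # the parameter; the structure below does not depend on its value

    def is_met(self, sales_today: int) -> bool:
        return sales_today >= self.target_per_day


# The same structure is equally valid with different targets.
lenient = DailySalesKPI(target_per_day=3)
current = DailySalesKPI(target_per_day=5)
strict = DailySalesKPI(target_per_day=7)

results = [kpi.is_met(5) for kpi in (lenient, current, strict)]
```

Nothing in the structure says why the target should be 5 rather than 3 or 7, which is exactly the design question.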

Perhaps if these processes are measurably adding value (eg. the credit-assessment process stops the organisation losing money on bad debtors), then it is reasonable to talk about the value of the metrics (both general-thresholds and instance-measures) in light of how they affect the performance of the process? If the process is optimised by the selection and use of appropriate metrics, then those metrics have value.
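As a rough illustration of that idea, take a credit-assessment process that approves applicants whose score meets a threshold, and define the value of the threshold as the improvement over approving everyone. The applicants, scores, profit and loss figures below are invented, and using "approve everyone" as the baseline is my assumption.

```python
def process_profit(applicants, threshold, profit_if_repaid, loss_if_default):
    """Total profit when every applicant scoring at or above threshold is approved."""
    total = 0.0
    for score, repays in applicants:
        if score >= threshold:
            total += profit_if_repaid if repays else -loss_if_default
    return total


def value_of_threshold(applicants, threshold, profit_if_repaid, loss_if_default):
    """Improvement in process profit relative to approving every applicant."""
    baseline = process_profit(applicants, float("-inf"), profit_if_repaid, loss_if_default)
    return process_profit(applicants, threshold, profit_if_repaid, loss_if_default) - baseline


# Hypothetical history: (credit score, whether the customer repaid).
applicants = [(700, True), (650, True), (600, False), (550, True), (500, False)]

value_at_620 = value_of_threshold(applicants, 620, profit_if_repaid=100, loss_if_default=300)
value_at_540 = value_of_threshold(applicants, 540, profit_if_repaid=100, loss_if_default=300)
```

On this toy data the stricter threshold is worth more, but the interesting part is that both the general threshold and the individual scores only have value through what the process does with them.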

While I'm not sure about this, I think it's easier than performing a similar analysis on the value of an executive's decisions.

This issue: People, books, seminar and more ramblings.

People: I caught up with xxxxx xxxxxxxx (name deleted on request 25/10/2006) and Joan Valdez, both former colleagues from Telstra days. They are now working in the CRM part of Telstra On Air (data systems). Also, I've been in touch with Dr. Peter Sember, a data miner from Telstra's New Wave Innovation labs. We worked on some projects to do with search query analysis and web mining, so I'm keen to collaborate with him again. Lastly, at the seminar (below), I forced myself upon Brigitte Johnson and Peter Davenport - more Telstra CRM people, but higher up and in Retail. I'm keen to let them know about my research, and look for opportunities there too.

Book: Peter Weill & Marianne Broadbent's book ("Leveraging the New Infrastructure"). This is far more scholarly than Larry P. English's book (below), probably due to the different perspective, purpose and audience. Hell, they even quote Aristotle!

The gist of their approach is to identify IT (and communications) expenditure over the last decade or more for a large number of companies (in excess of two dozen) in different industries. They then compare business outcomes (including competitive positioning) over the same period. By breaking down the spend into eg. firm-wide infrastructure and local business unit, they're able to discern IT strategies (eg. utility, enablement) and see how well they align with business strategy. From this, they draw a set of maxims (in the Aristotelean sense) by which organisations can manage their IT investments.

This seems very sensible. But, both the strength and the weakness of this approach is that it treats IT as a capital investment program, as part of an organisation's portfolio of assets. Throughout the book, you get a feeling that they might as well be talking about office space. I've not yet found any discussion about the value of information, separate from the capital items within which it resides. That is, it's very technology focussed. Also, the "money goes in, money comes out later" black-box thing has yet to shed any light (for me) on the fundamental question of WHERE IS THE VALUE? The approach might be useful for benchmarking, and would be useful for people responsible for managing investments in ALL the organisation's activities, in that it puts IT expenditure on an even footing with office space and stationery. But, I still have a sense that something is missing ...

So while I'm on a roll, I've got some more comments about Larry P. English's book. This guy is - I'm sure he won't consider this defaming - a quality fanatic. He is relentless in his pursuit of quality and I think that this is a good thing. I wish that my phone company and bank had the benefits of some quality fanatics and gurus. But his approach/advice leaves me thinking that the hard bit, the interesting bit, isn't being addressed. By that, I mean that it appears he assumes you already have rock-solid immutable business requirements handed down on a stone tablet. For example, (and I'll paraphrase here):

Suppose you have three customers of your data, and two want 99% accuracy and one wants 99.99% - then you must give them 99.99%

After working as a business analyst and supplier of information to decision-makers, I flinched when I read that. Honed by my corporate experience, my first reaction was "are they willing to PAY for 99.99% ?" - closely followed by "what's 99.99% WORTH to them?" followed by "what would they do with 99.99% that they WOULDN'T DO with 99%?". I think that this step - what you'd call the information requirements analysis - is what Larry's missing. (Behind every information requirement is an implicit decision model.) And this step is the gist of where this research is headed. Quality without consideration of value isn't helpful in the face of scarce resources when prioritisation needs to occur.
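The "what's it WORTH?" question can be turned into back-of-envelope arithmetic once a decision model is made explicit. The sketch below uses the crudest possible one: every data error causes exactly one wrong decision at a fixed cost. That model, the volumes and the dollar figures are all assumptions for illustration, not anything from English's book.

```python
def worth_of_accuracy_upgrade(decisions_per_year, cost_per_wrong_decision,
                              accuracy_now, accuracy_target):
    """Annual saving from fewer wrong decisions, assuming one wrong decision per data error."""
    errors_avoided = decisions_per_year * (accuracy_target - accuracy_now)
    return errors_avoided * cost_per_wrong_decision


# Hypothetical consumer: 100,000 decisions a year, $5 lost per wrong one.
worth = worth_of_accuracy_upgrade(100_000, 5.0, accuracy_now=0.99, accuracy_target=0.9999)

# Hypothetical price the supplier would charge for the extra accuracy.
price_of_upgrade = 20_000.0
worth_paying_for = worth > price_of_upgrade
```

If the consumer would make the same decisions either way, the cost per wrong decision is effectively zero and so is the worth of the upgrade, whatever the requirement says.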

Seminar: This morning I attended an industry learning seminar put on by Priority Learning. It was about CRM Analytics. There were talks by ING and Telstra on their experiences implementing this, and SAS and Acxiom on the how and why. The main message I took away from this was that ROI is not the last word in why you'd want to do this. Both ING and Telstra feel it has been and will be worthwhile, but neither could show ROI (as yet).

There was the usual mish-mash of definitions ("what is intelligence anyway?"), the usual vendor-hype/buyer-scepticism, the "this is not a technology - it's a way of life" talk, the usual biases (SAS flogging tools, Acxiom flogging data) - in short, an industry seminar! I'm glad though, that my supervisor Graeme was able to come as it has given him a better view of where the CRM/Customer Intelligence practice is, and the questions that my research is asking.

Re: the practice side. There seems to be an emerging consensus that there is an Analytic component, and an Operational component, usually entwined in some sort of perpetual embrace, possibly with a "Planning" or "Learning" phase thrown in. This was made explicit through the use of diagrams. This is par for the course, though from what I've seen I have to say I like the Net.Genesys model better.

One aspect I found interesting was the implicit dichotomy inherent in "data" (or information, or intelligence - the terms are used interchangeably): facts about customers, and facts about facts about customers. The former is typically transactions embedded in ER models that reflect business processes. The latter is typically parameters of models that reflect business processes.

Consider the example of a website information system (clickstream log). Here's two "first order" customer facts:

"Greg Hill", "11/3/01 14:02", "homepage.html"
"John Smith", "11/3/01 14:02", "inquiry.html"

Here's two "second order" customer facts:

"50% of customers are Greg Hill"
"100% of times are 11/3/01 14:02"

The former only makes sense (semantically) in the context of an ER model, or a grammar of some type (the logical view). The latter only makes sense in the context of a statistical model (quantitative view). Certainly the same business process can be modelled with say UML, or a Markov Model, and then "populated/parameterised" with the measurements. They will have different use (and value) to different decision-makers - a call centre worker would probably rather the fact "Greg Hill has the title Mr" - if they were planning on calling me. A market analyst would probably prefer the fact "12% of customers have a title of Dr" - if they were planning an outbound campaign.
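The step from first-order to second-order facts in the clickstream example is just aggregation, which a few lines can show. The tuple layout (customer, timestamp, page) mirrors the records above; the helper name is made up.

```python
from collections import Counter

# First-order facts: one record per page request.
clickstream = [
    ("Greg Hill", "11/3/01 14:02", "homepage.html"),
    ("John Smith", "11/3/01 14:02", "inquiry.html"),
]


def proportions(records, field_index):
    """Second-order facts: the share of records taking each value of one field."""
    counts = Counter(record[field_index] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


customer_share = proportions(clickstream, 0)  # share of records per customer
time_share = proportions(clickstream, 1)      # share of records per timestamp
```

Going from records to proportions is easy and lossy; going back the other way is not possible, which is part of why the two views serve different decision-makers.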

But what does information in one domain tell us about information in the other? How does information move back-and-forth between these two quite different views? How does that improve decision-making? How does that generate value?

One last ramble: modelling decisions. To date, I've been thinking that the value of customer information lies in the decisions that it drives, in creating and eliminating options for decision-makers (like above). But nearly all the examples presented today involve implicit modelling of the decision-making of customers: When do they decide to churn? Which product will they be up-sold to? Which channel do they want to be reached through? That is, we're talking about making decisions about other people's decisions: "If I decide to make it free today, then no one will decide to leave tomorrow".

Modelling the mental states of other people involves having a Theory of Mind (a pet interest of mine from cognitive philosophy). Hence, if you take the view that communication is the process of changing our perception of the uncertainty residing in another person's mind ("we don't know anyone else - all we have access to is our own models of them" etc), then marketing really is a dialogue. With yourself. This raises the question: do autistic people - who allegedly lack a Theory of Mind - make particularly bad marketers? Does anyone even know what makes for a particularly bad marketer?

So, putting it together, I'm modelling decision-making about decision-making by looking at facts about facts about customers and how this relates to decision-making about facts about customers.

I need a lie down.

Wow - it's certainly been a while since I've contributed to this blog. First, some texts.

I've been reading Larry P. English's "Information Quality" book. It's not very scholarly and seems to be loaded with good advice, but it lacks a grasp on data, information, intelligence, representation etc, i.e. the interesting and difficult theoretical stuff. He sets up a sequence ("data" -> "information" -> "knowledge" -> "wisdom") and more or less says that each one is the former one plus context. Whatever that means. Anyway, I'm sure the book would be useful for data custodians or managers of corporate information systems, but I expect it would have limited use to decision-makers and business users, as well as researchers.

Also been going through an introductory book on "Accounting Concepts for Managers". I figure that a lot of accounting concepts and jargon have found their way into information systems, as evidenced by Dr. Moody's paper which proposes historical cost as the basis for the value of information (see below). While I respectfully disagree, it has motivated me to pick up more of these ideas.

Lastly, in the meeting with Graeme this morning we discussed Daniel's paper further, and information economics in general. I got a lead into this area from Mary Sandow-Quirk (Don Lamberton's work) and I'll also chase up Mingers' papers on semiotics. We seem to agree that the value of information lies in its use, and in an organisational context that means decisions. Hence, I've got a book on "Readings in Decision Support System", which will tie in with Graeme's work on Data Tagging for decision outcomes (see below).
We also discussed further our proposed joint paper on SLAs for customer data and analytics, and our plan to put together a "newsletter" document every month or so for Bill Nankervis, our industry sponsor.

I read a Paul Davies book on biogenesis, or the origins of life. A lot of his arguments revolved around complex systems, and the emergence of biological information. The ideas - genes as syntax, emergent properties of semantics, evolution as a (non-teleological) designer - are similar to Douglas Hofstadter's classic "Gödel, Escher, Bach: An Eternal Golden Braid" book. Davies' book, though, is nowhere near as playful, lively or interesting. Still, it had some good material on the information/entropy front, which got me to thinking about "information systems" in general, and my inability to define one.

Here's a proposed definition set:
A System is an object that can occupy one of an enumerable (possibly infinite) set of states, and manipulations exist that can cause transitions between these states.

A State is a unique configuration of physical properties.

There are two kinds of systems: Artefacts are those systems with an intentional (teleological) design. Emergent systems are everything else.

An Information system is no different from any other system - any system capable of occupying states and transitioning between them can be used to represent or process information. Some properties make certain systems more or less suitable for information systems (ie it's a quality difference).

The key one is that the effort required to maintain a certain state should be the same for all states. This means that - via Bayes' Rule - the best explanation for why a particular state is observed is that it was the same state someone left it in. (This is getting tantalisingly close to the entropy/information/semiotics cluster of ideas.)

For example, a census form has two adjacent boxes labelled "Male" and "Female", and a tick in one is just as easy to perform/maintain as a tick in the other. On the other hand, if you were to signify "I'm hungry" by balancing a pencil on its end, and "I'm full" by laying it on its side, you'd go a long time between meals. Hence, box-ticking makes for a higher-quality information system than pencil-balancing. The change in significance per effort-expended is maximised. The down-side - errors creep in due to noise. (And they say Shannon has no place in Information Systems theory!)
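The Bayes' Rule argument can be sketched in a few lines of Python. Everything numeric here is an assumption I've made up for illustration: a 50/50 prior over what someone intended, a tick that survives with probability 0.99 whichever box it is in, and a pencil that stays balanced on its end with probability 0.01. The function name `posterior_left_as_observed` is likewise hypothetical.

```python
def posterior_left_as_observed(prior, stay_prob, observed):
    """P(system was left in `observed` | system is now in `observed`).

    prior:     {state: probability someone left the system in that state}
    stay_prob: {state: probability a system left in that state is still in it}
    Assumes a two-state system, so a state that fails to persist
    flips to the other one.
    """
    likelihood = {}
    for state in prior:
        if state == observed:
            likelihood[state] = stay_prob[state]
        else:
            likelihood[state] = 1.0 - stay_prob[state]
    # Bayes' Rule: posterior = prior * likelihood / evidence
    evidence = sum(prior[s] * likelihood[s] for s in prior)
    return prior[observed] * likelihood[observed] / evidence


# Census boxes: both states are equally easy to maintain (assumed 0.99).
tick = posterior_left_as_observed(
    prior={"male": 0.5, "female": 0.5},
    stay_prob={"male": 0.99, "female": 0.99},
    observed="male",
)

# Pencil: the balanced state almost never survives (assumed 0.01).
pencil = posterior_left_as_observed(
    prior={"hungry_upright": 0.5, "full_flat": 0.5},
    stay_prob={"hungry_upright": 0.01, "full_flat": 1.0},
    observed="full_flat",
)

print(f"tick box: {tick:.3f}")     # close to 1: the tick is good evidence
print(f"flat pencil: {pencil:.3f}")  # close to 0.5: tells us almost nothing
```

Under these made-up numbers, seeing a tick makes it 99% likely that the tick is what was intended, while seeing the pencil on its side barely shifts us from the 50/50 prior. That is one way of reading "quality" for a system used to carry information.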

Another view: a system is a set of possible representations. The greater our uncertainty at "design time", the bigger the set of representations it can maintain. As we apply layer after layer of syntax, we are in effect restricting the set of states available. For example, if we have a blank page that can represent any text, we may restrict it to only accept English text. And then only sentences. And then only propositions. And then only Aristotelean syllogisms. We're eliminating possible representations.

By excluding physical states, we're decreasing the entropy, which according to the Second Law of Thermodynamics ("entropy goes up") means that we're pushing it outside the system (ie it's open). The mathematically-inclined would say that the amount of information introduced is equal to the amount of entropy displaced.
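As a rough sketch of that last claim, assume every allowed state is equally likely, so the (Shannon) entropy is just the log of the number of states. The information introduced by a layer of syntax is then the drop in that log. The three-character field and the two alphabets below are hypothetical examples of mine, and this is Shannon entropy over representations rather than a thermodynamic calculation.

```python
import math


def information_from_restriction(states_before, states_after):
    """Bits of information introduced by shrinking the set of allowed states.

    Assumes all states are equally likely, so entropy = log2(number of states).
    """
    if not 0 < states_after <= states_before:
        raise ValueError("restriction must keep between 1 and states_before states")
    return math.log2(states_before) - math.log2(states_after)


# Hypothetical example: a three-character field that could hold any byte value
# (256 options per character) is restricted to lowercase letters (26 options).
any_bytes = 256 ** 3
lowercase_only = 26 ** 3

bits = information_from_restriction(any_bytes, lowercase_only)
print(f"{bits:.2f} bits introduced by the lowercase-only rule")
```

Each further layer (only English words, only sentences, and so on) would shrink the state count again, and the bits simply add up across layers.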

Then, at "use time", the system is "populated" and our uncertainty lies in knowing which state in the subset of valid ones is "correct". At different levels of syntax, we could define equivalences between certain states. One idea I'm kicking around is that this equivalence or isomorphism is the key to the problem of semantics (or the emergence of meaning). More reading to do!

I've been thinking about the value people ascribe to information (as per my thesis!) and I'm of the view that, from a value perspective, there are two broad categories:

  • Content

  • Intelligence

Here, content refers to some strange and inexplicable mechanism whereby people appreciate some experience. This can be music, a video, web page, newspaper article, phone conversation, opera etc. In the affluent West, where food and shelter are assured, it is the reason we get up in the morning.

We can measure content in a variety of ways - most obviously duration of experience (time). Generally, the longer the experience, the more we value it. Other measures relate to quality (eg. if the sound is scratchy, the print hard to read or the video requires subtitles then we may value it less). It's hard for me to see a unifying theory for valuing this kind of thing, as it is very subjective. Yet, we do it every day: what's a CD worth, a movie ticket, a phone call etc? In the information age, we are continually valuing content.

What about entropy (mathematical information) measures? I recall a former housemate of mine - a PhD student in communications engineering and applied maths - joked that the entropy in a Bollywood Indian movie approaches zero, since the plot/characters/dialogue/score etc is all completely predictable. Since entropy requires a (parametric) model, what would that be for movies? This is a weird question, and one that I will stay well away from in my research. I suspect that this analysis is properly the domain of a branch of philosophy that deals with aesthetics.

The other category was intelligence. By this, I don't mean it in a directly cognitive sense. I mean it in a sense that, historically, stemmed from the military. So, "I" as in "CIA", not as in "IQ" or "AI". Hence, "Business Intelligence" is about producing actionable information. That is, information on which you are required to make a decision and act.

For example, if customer numbers don't reach a certain threshold at a particular moment in time, then the product is exited. This decision rule is the model, and the customer count is the parameter. Often, the decision rule is more valuable than the actual metric. This confirms a long-held piece of wisdom: questions are more valuable than answers.

The appropriate measure for intelligence, then, is the extent to which you acted differently. For business intelligence, it is the financial consequences of this action. The idea of entropy (mathematical information) can be applied to measuring the uncertainty in the decision itself. For example, suppose there are two options: take the high road, take the low road. Initially, each is equally likely to be chosen, or acted upon (50%). If some event causes that to shift (20% / 80%), then the change in probabilities can be related to the influence of that event on the decision. That change in decision can have a value ascribed to it using regular decision analysis. It seems reasonable to me to ascribe that value to the change in probabilities resulting from that event: the value of the intelligence.
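Here is a minimal Python sketch of the high road/low road example, using Shannon entropy in bits as the measure of uncertainty in the decision. The probabilities are the ones from the paragraph above; the function name `decision_entropy` is just my label for the standard entropy formula.

```python
import math


def decision_entropy(probabilities):
    """Shannon entropy (in bits) of a probability distribution over options."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Options with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


before = decision_entropy([0.5, 0.5])  # high road / low road equally likely
after = decision_entropy([0.2, 0.8])   # after the event shifts the odds
reduction = before - after

print(f"uncertainty before: {before:.3f} bits")
print(f"uncertainty after:  {after:.3f} bits")
print(f"reduction:          {reduction:.3f} bits")
```

The event removes roughly 0.28 of the original 1 bit of uncertainty about which road gets taken. Putting a dollar figure on that shift would still need payoffs for each road, which is where the decision analysis comes in.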

I plan to look at options pricing theory (including "real options" analysis). This is a school of thought that links concepts of value, uncertainty (risk) and time with decisions, and is typically applied to investment decisions, specifically, the futures and derivatives markets. It can be applied to a much wider range of decision-making too.

In setting up a "content/intelligence dichotomy" it's interesting to consider the boundary. For example, "news" feels like intelligence, but is it? I am happy to receive headlines on my mobile phone via GSM, but I don't actually do anything differently: news of a celebrity's passing doesn't prompt me to do anything I wouldn't have done anyway. Yet I value it anyway, so it's content. What about politics? Voting is compulsory (well, turning up is anyway). What about weather reports? For other cities? Things to keep in mind as I stumble along ...

Last week, I reviewed a paper by a former PhD candidate of Graeme's - Daniel Moody (with Peter Walsh). They work at Simsion Bowles & Associates. The paper was presented at ECIS '99 and is called "Measuring the Value of Information: An Asset Valuation Approach". As the title suggests, it is very much in line with my thesis topic. The thrust of the paper is that organisations should treat information as a type of asset, and it should be accorded the same accounting principles as regular assets. The paper goes on to highlight the ways that information is different from regular assets, and hence how it should be treated differently.

The paper suggests seven "Laws" about information-as-an-asset, and proposes that (historical) cost accounting should be the basis for determining the value. This was done without reference to information economics. While I disagree with both the approach and the conclusions/recommendations in this paper, I am given great heart to see that there is a dearth of research in this area. I'm confident that this is a thesis-sized problem!

I am also pleased to finally see an IS researcher cite Shannon's fundamental analysis of "information" - even if I disagree with the conclusion. I'm puzzled, though, that the whole Sveiby/KM thing wasn't mentioned at all. (There was a passing mention of "information registers" but that was it.)

In other news, Graeme and I met with our industry sponsor - Bill Nankervis (Telstra/Retail/Technology Services/..?.../Information Management). I met with Bill a couple of times before while I was still a Telstra employee, but this was our first meeting as researcher/sponsor. We discussed some of Telstra's current goals and issues with regard to information value and data quality, and I'm confident that there is a strong alignment between my work experience, my thesis and Bill's objectives and approach.

Had the regular weekly supervision session with Graeme. Today we discussed the relationship between theories and frameworks, especially in light of Weber's 1st chapter and Dr. Hitchman's seminar (below). Mostly we looked at Graeme's paper on "The Impact of Data Quality Tagging on Decision Outcomes". The main feedback I had was the idea that people will use pre-existing knowledge about the decision task to infer data quality when they aren't presented with any explicitly. In the terms of Graeme's semiotic framework, the social-level "leaks" into the semantic-level. One approach - potentially underway - to control this is to use completely contrived decision tasks totally unfamiliar to the subjects. Also, I'm curious about how the tagging (quality metadata) of accuracy relates to "traditional" measures of uncertainty such as variance and entropy. Lastly, it seems that this research is heading towards exploring the relationships between data quality and decision quality. Ie consensus, time taken, confidence etc seem to be attributes of a decision, and teasing out the significance of data quality constructs on these outcomes would be a whole field of research in itself.

The other topic we discussed was the idea for a joint paper on Service Level Agreements for outsourced customer information. This would be an application of Graeme's framework to the question of how to construct, negotiate and implement an SLA for the provision of customer information services. I think this is quite topical as while CRM is taking off, organisations are shying away from the underlying data warehousing infrastructure. The paper would involve researching ideas of information-as-a-service and service quality theories and my own experiences as a practitioner. The motivation would be to show that data quality issues are a business problem, and can't be contained solely to the IT department. While it's not the main thrust of my thesis, it would be a nice introduction to the "trade" aspects of the research process (ethics, reviews, peer assessment, publication etc).

Lastly, there was a stack of actions for Graeme, involving chasing up information from various people (industry co-sponsor and former PhD student). I've borrowed two books: "Leveraging the New Infrastructure" (Weill and Broadbent) and "Quality Information and Knowledge" (Huang, Lee and Wang).

This morning we had a seminar from Dr. Stephen Hitchman on "Data Muddelling". In essence, he was saying that the IS academy has lost its way and is failing practitioners in this subject area. That is, the program of seeking a sound basis for data modelling in various philosophies is a waste of tax-payers' resources and that, if anything, we should be looking at the work of Edward De Bono.

I'm not sure that I accept that my role as an IS researcher is to ensure that everything I do is of immediate relevance to practitioners. Academic research is risky, and involves longer time scales. Low-risk, quick-delivery research can be directly funded by the beneficiaries, and there are a number of organisations who will take this on. This is part of the "division of labour" of IS research.

That said, Stephen's provocative stance has failed to dissuade me from finishing the introduction to Ron Weber's monograph on "The Ontological Foundations of Information Systems".

Last Friday, there was a seminar on "decision intelligence". I was keen to go, but unexpected family business whisked me away. After reading the abstract (below), I think while it may have been of general interest, it probably wasn't related to my research domain. It would, however, be of extreme interest and relevance to people working in large, complex and dynamic organisations, who are required to lobby somewhat-fickle decision-makers.

Predicting people's policymaking styles

Dr Ray Wyatt

School of Anthropology, Geography and Environmental Studies, University of Melbourne


Rather than "decision support", the focus is on "decision intelligence" for policymaking. This involves anticipating what policies different kinds of people are likely to favor. Such anticipation enables us to guess how much any proposed policy is likely to be accepted within the community - a consideration that can be just as vital for its ultimate success as any amount of logical, empirical or analytical "support". Therefore, this presentation begins by looking at the planning literature and at the decision-making literature for clues as to how to anticipate people's policy choices. But on finding very few, a radically different approach is outlined. It uses the speaker's own self-improving, advice-giving software which collects enough knowledge, about its past users' decision-making styles, to identify what policymaking criteria different sorts of people tend to emphasize. Such people-specific emphases will be outlined. They should help all professionals, everywhere, to foreshadow the community acceptance of any policy within any problem domain.

This is the website of one Karl-Erik Sveiby: He appears to be a leading researcher and practitioner - even pioneer - of the field of knowledge management. He has some interesting ideas on valuing non-tangible assets, and some very sensible things to say about organisational performance metrics. While his Intangible Asset Monitor is similar to ideas encapsulated in the Balanced Score Card methodology, he is at pains to point out the differences.

I wish I'd caught his seminar in my department last semester, but, the ".au" suggests he might be back.

Uh oh - a week's gone by without any blog postings. Hardly the point. Okay, a quick review then. I've been having regular weekly meetings with my supervisor, Graeme Shanks. So far, the discussion is around two topics: 1) the nature of research in the IS discipline and 2) Graeme's research in data quality. Of the former, I've been reading papers on IS research approaches (experiments, case studies, action research, conceptual studies etc) and stages (theory building, theory testing, and the difference between scholarship and research).

Of the latter, I've been getting across Graeme's approach, based on semiotic theory - the use of signs and symbols to convey knowledge. There may be collaboration opportunities to apply this framework to some of my professional work in defining and negotiating Service Level Agreements with Application Service Providers, who primarily provide data and reports. While this isn't the thrust of my research, it might prove to be an interesting and useful (ie publishable) area.

The main gist, though, is on the value of information. This is no doubt related to the quality of data - probably through the notion of quality as "fitness for purpose". To that end, this week I'm looking into a text on the "Ontological Foundations of Information Systems" (Weber) and reviewing another of Graeme's papers on the role of quality in decision outcomes. I will also begin in earnest a look into information economics. I've attended some lectures on Game Theory, which, along with Decision Theory, will probably be a formalist way in.

I'm mindful of the relevance vs rigour aspects of this though, as I expect that models of how entities make decisions bear little resemblance to what people actually do in organisations. I think, generally, the benefits of a model lie in what is left out as much as anything.

This is my first post to a blog. I plan to post links and commentary to this blog as a journal of my research. I guess the audience is me (in the future) and friends, colleagues and well-wishers who have a passing interest in this topic, a web connection, and too much spare time. Hopefully, this will lend some legitimacy to my web browsing.

So first off, here's my homepage. I'm a PhD candidate in the Information Systems Department, working on an industry-sponsored research project with Telstra Corporation on, well, something to do with the value of customer intelligence.