
Chapter 5 - Conceptual Study

5.1        Summary

This chapter is a conceptual study of Information Quality (IQ) undertaken in order to develop a framework for IQ valuation. It evaluates and synthesises concepts from theoretical reference disciplines (including Information Theory, semiotics and decision theory) from the Literature Review (Chapter 3), motivated by the requirements from the practitioner Context Interviews (Chapter 4).

As part of the Design Science methodology, this constitutes artefact design, where the artefact is a framework comprising a conceptual model, measures and methods for analysing the quality of information in Customer Relationship Management (CRM) processes.

The outcome is a target framework for valuing IQ improvements, with a view to organisational uses including business case development, performance evaluation and inter-organisational agreements. Subsequent chapters will evaluate this framework for rigour, relevance and usefulness.

5.2       Practical Requirements

This section addresses the context, intended use and goals of the framework under development. These are motivated by the findings from the practitioner interviews (Chapter 4), which identified a gap in Information Systems (IS) practice when it comes to IQ. The interviews also provided insights into the problem domain, for which a properly-conceived framework may prove useful. The role of design in the framework development is through the selection, composition and evaluation of abstract theoretical concepts for practical ends.

Thus, the development of this framework (including evaluation) is a Design Science research project. The creation of organisation-specific information-value models by business analysts is also a design science activity, in the sense that it involves the development of an artefact (model). However, the development, use and evaluation of these concrete models is not within the scope of this research project. The framework for creating such models is the object of analysis.

 

Figure 7 Use of the Designed Artefact in Practice

5.2.1         Organisational Context

The following points are a distillation of the analysis of this organisational context from the practitioner interviews (Chapter 4). Large organisations which maintain complex relationships with customers rely on high-quality customer information to operate successfully. To acquire and manage this information requires significant expenditure on systems, including capital and operational items. These can include information technology, staff training, auditing and testing activities, customer communication and vendor and supplier management.

In these organisations, significant resources are typically allocated to projects on a competitive funding basis, whereby projects compete for access to a capital budget, assessed by an investment review panel or similar decision-making body. The project owner develops a business case for the expenditure of resources, couched in investment terms and supported by financial models and metrics. Return on Investment and Net Present Value (along with other discounted cash flow approaches) are the most commonly used metrics.

While IQ is recognised by organisations as being important, it is difficult for IQ projects to compete for access to resources when the business case cannot be articulated. The problem stems from an inability to quantify the impact of IQ improvement, and to express this impact financially. As a result, there is likely to be significant and widespread under-investment in IQ, contributing to inefficiencies in resource allocation, customer dissatisfaction and competitive risks.

In some instances, IQ projects are approved through the support of a sufficiently senior executive, relying on judgement. In addition to the potential for misallocation, this can also undermine confidence in the organisation’s capital investment program, characterised by one executive as “the squeaky wheel gets the oil”.

This investment problem does not arise from a lack of IQ measures: practitioners are aware of and use a number of measures to describe their systems. Nor is it due to a lack of financial sophistication among practitioners, with many managers, analysts and consultants experienced in preparing IS business cases. The key gap identified lies in conceptually linking the quality of customer information to financial outcomes in a way that supports organisational decision-making. Specifically, the two must be linked in a way that allows consideration of alternative courses of action: diagnosis of key value drivers, speculative “what-if” scenario testing and the evaluation and prioritisation of interventions.

5.2.2        Purpose

The framework is designed to be used by analysts to prepare a model of how IQ impacts on outcomes in customer processes. This model is to be used for organisational decision-making, primarily business case development. Further, the model may be useful for designing and setting service level agreements (for suppliers) and key performance indicators (for staff). In each case, an understanding of how IQ impacts on financial outcomes will better align the interests of managers with those of the organisation.

The use of the framework to develop such a model will also help the organisation better understand its own information supply chain. That is, it will foster an understanding of how customer information is used to create value within the organisation, the relative importance of different information elements for different decision tasks and the true costs associated with low-quality customer information.

Based on the extensive practitioner interviews, it seems that managers and analysts “close to the data” generally have firm views on where the IQ problems lie and how to go about fixing them. From this perspective, the development of such information value models is not seen as supporting their decision-making about diagnostics or intervention design; rather, it’s a means of communicating the problems and opportunities to key decision-makers “higher up” in the organisation.

These models can also help further the shared understanding between the owners of customer processes (“business”) and the managers of the supporting infrastructure (“IT”). By addressing the so-called alignment problem, prioritisation of work and planning should be facilitated, as well as improvements in working relationships.

5.2.3         Outputs

The framework, as an artefact in itself, is instantiated as a collection of concepts, formulae, measures and tasks for describing and modelling aspects of customer processes. As such, it is necessarily abstract and highly theoretical. The framework is to be used by analysts to prepare an information value model tailored to the target organisation.

This output model has a number of components that describe and link:

1.        Information elements.

2.       Customer processes.

3.        Quality interventions.

4.       Organisational outcomes.

Depending on the scope of the analytic effort, these may be mapped at a level of great detail or more superficially, by focusing on just the key aspects of the organisation.

As a bridging model spanning IT, operations and finance, the terms and quantities should be familiar to professionals in those areas, where possible.

5.2.4        Process

The framework is employed to produce information value models for the organisation. There are precedents for developing such artefacts within large organisations that invite comparison. For example, on the business side, most organisations produce and use cash flow models of their business activities. These capture the (expected) flow of cash over time from customers and to suppliers across different organisational units and are used to support planning and evaluation activities. On the IS side, many organisations conduct data modelling, where they document (and sometimes mandate) enterprise-wide definitions of entities, relationships and processes. Another example is the statistical or data mining models developed to support some aspect of operations, such as logistics, credit or marketing.

In each case, these artefacts are valuable organisational assets. They require a non-trivial effort to generate and a commitment from the organisation to adhere to or use them to realise that value. Further, they require ongoing maintenance or review to allow for changes in the operating environment or the organisation’s strategy. Lastly, they can be shared across organisational units (such as subsidiaries) or even outside the organisation, with trusted partners.

In common with these types of models, the proposed information value models would follow a similar high-level lifecycle of scoping, development, deployment and maintenance. Responsibilities and resources could be allocated in a similar fashion to other enterprise projects. Expertise would be required from the information management function, the customer process owners and the finance unit.

The theoretical concepts that underpin the model are, necessarily, complicated and not widely available. This is because the key to the model lies in the quantification of information, which inherently demands the use of comparatively advanced statistical techniques. However, a thorough understanding of these concepts should not be required to construct and interpret models.

In terms of technology, the models themselves can be expressed using spreadsheets or similar programmable calculation environments; no specialised software or hardware is required. As artefacts, these models would represent the distillation of knowledge of how the organisation acquires and uses customer information to create value. The effective sharing and security of such an asset must also be carefully managed.

5.3        Theoretical Basis

The framework under development must be grounded on a sound theoretical basis. This is because, for the artefact to be useful, it must generate models that describe what they purport to describe: the relationship between IQ in customer processes and organisational value.

This study draws on four reference theories, discussed in detail during the literature review (Chapter 3). As I have previously argued (Hill 2004), these reference theories provide sufficient conceptual and quantitative rigour for modelling information and value.

·         Semiotics. This is the formal study of signs and symbols and provides an over-arching hierarchy for organising discussion of data and information.

·         Ontological Model of IQ. This maps the relationship between information systems and the external world they are intended to represent.

·         Information Theory. This mathematical theory is used to quantify the amounts of information within the models.

·         Information Economics. This theory is used to value the use of information for decision-making.

With the exception of the Ontological Model, these theories have their own long-standing traditions and conventions and have been applied to a wide variety of situations. In this context, semiotics has been used to tackle IQ from a purely conceptual perspective (Shanks and Darke 1998), while the Ontological Model (Wand and Wang 1996) is a rigorous general theory of IQ. The Ontological Model comprises the semantic level in the Semiotic Framework for Information Quality (Price and Shanks 2005a). The semiotic framework provides the starting point for this analysis.

The inclusion of Information Theory (Shannon and Weaver 1949) is necessitated by the practitioner requirement for quantification of IQ. Information Theory has enjoyed widespread success in this task in other applied disciplines, such as communications engineering, psychology, economics and genetics (Cover and Thomas 2005). Further, Information Economics has been included to enable practitioners to explain and characterise their models in financial terms, an identified gap in accepted methods for valuing IQ.

5.3.1         Semiotics

Semiotics, the study of signs and symbols in the most abstract sense, is a philosophical discipline that underpins linguistics, critical theory and related fields. At the core of the modern theory is the concept of a sign (Chandler 2007). This is a very general notion: a sign could be a traffic light used to control the flow of traffic, an article of clothing worn in a particular way or text written on a sheet of paper. Semiotics, in the Peircean tradition, is the study of the triadic relations between the sign’s (physical) representation, its referent (intended meaning) and interpretation (received meaning). As Price and Shanks note:

Informally, these three components can be described as the form, meaning, and use of a sign. Relations between these three aspects of a sign were further described by Morris as syntactic (between sign representations), semantic (between a representation and its referent), and pragmatic (between the representation and the interpretation) semiotic levels. Again, informally, these three levels can be said to pertain to the form, meaning, and use of a sign respectively. (Price and Shanks 2005a, p218)

The authors use this stratification into syntax (form), semantics (meaning) and pragmatics (use) as an organising principle for collating and rationalising a number of commonly-used IQ goals and criteria. The syntactic level is the domain of integrity constraints and conformance rule checking. Here, I focus on the semantic level (correspondence between the information system and the external world) and the pragmatic level (the impact of the information system upon organisational decision-making).

The reason for this is two-fold: firstly, the semantic level subsumes the syntactic in the sense that flaws in the syntactic level will manifest in the semantic level. For example, a syntactic problem like a malformed expression of a date (“2005-20-a3”) will result in a semantic (meaning) problem. The second reason is due to the scope of the study. With the emphasis on organisational value, the framework focuses on how meaningful customer information translates into action.

Following the Semiotic Framework for IQ, the semantic level is analysed in terms of the earlier Ontological Model for IQ to derive the criteria. However, here the Ontological Model is quantified using Information Theory and extended to include the pragmatic level. This is further analysed through an economic analysis of the (value) impact of information upon decision-making and action-taking within customer processes.

5.3.2         Ontological Model

This model of IQ, proposed in 1996 by Wand and Wang, is a clear expression of the relation between an information system and the external world it purports to represent. It is a rigorous and theoretically sound approach to analysing this relation, based upon the idea of “states of nature” (Wand and Wang 1996). In this model, both the information system (IS) and the external world (EW) are taken as two (related) sub-systems of the physical world, each of which is governed by laws and must assume precisely one state (out of many) at every point in time. The model captures the essential nature of an IS in that the IS “tracks” the EW in some significant way. An IS user must be able to infer the underlying state of the EW based on observing only the IS. Based on this key insight, the authors proceed to establish the technical conditions under which this is possible.

Figure 8 Ontological Model (a) perfect (b) flawed.

 

The simple examples here illustrate the concept. The columns of circles represent the five possible states of interest, forming a state-space σ. “EW” refers to the external world, which is always in exactly one of these five states. In the context of customer processes, the state could be the customer’s preferred title of address (eg “Mr”, “Ms”, “Dr” and “Prof”). At any point in time, each customer has precisely one title – not two, not zero. The purpose of the IS is to capture this by providing five states and maintaining a mapping between the possible external world states and the internal states of the IS. Wand and Wang refer to this process as representation (rep). The inverse – interpretation (int) – is the process of inspecting the IS state and inferring what the original external world state is.

So, for (a) this works perfectly: all states of interest are captured and the IS represents the EW perfectly. However, for (b), a number of deficiencies or flaws have been introduced. Firstly, there is a “missing” state: the fifth EW state does not have a corresponding state in the IS. This is a design problem. Secondly, there is ambiguity during interpretation, arising from a representation problem. Observing that the IS is in the second state does not conclusively inform us of the original EW state: it could have been the first or second state, since both could result in the IS being in the second state. Based on similar considerations, the Ontological Model identifies four possible deficiencies in the mapping between the external world and the information system.

·         Incompleteness. An EW state of interest cannot be represented by the IS.

·         Ambiguity. An IS state maps to more than one EW state.

·         Incorrectness. An EW state maps to an IS state such that the inverse mapping cannot recover the original EW state.

·         Meaninglessness. An IS state that does not map to a valid EW state.
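To make these definitions concrete, a short Python sketch follows (mine, not part of the original model): it encodes a hypothetical representation mapping for the preferred-title example and checks for each deficiency in turn.

```python
# A minimal sketch of the four deficiency checks, using hypothetical state sets
# based on the preferred-title example; not drawn from Wand and Wang's paper.

EW_STATES = {"Mr", "Ms", "Dr", "Prof", "Fr"}
IS_STATES = {"Mr", "Ms", "Dr", "N/A"}

# rep maps each representable EW state to the IS state used to record it.
rep = {"Mr": "Mr", "Ms": "Ms", "Dr": "Dr", "Prof": "Dr"}   # "Fr" cannot be represented

# Incompleteness: EW states of interest that cannot be represented in the IS.
incomplete = EW_STATES - rep.keys()

# Ambiguity: IS states that map back to more than one EW state on interpretation.
inverse = {}
for ew, is_state in rep.items():
    inverse.setdefault(is_state, set()).add(ew)
ambiguous = {s for s, ews in inverse.items() if len(ews) > 1}

# Meaninglessness (approximated here): IS states that no EW state maps to.
meaningless = IS_STATES - set(rep.values())

# Incorrectness is operational rather than structural: it arises when the IS
# state actually recorded does not allow the original EW state to be recovered.

print(incomplete)    # {'Fr'}
print(ambiguous)     # {'Dr'}
print(meaningless)   # {'N/A'}
```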

Note that the Price and Shanks Semiotic Framework adds the concept of “redundancy” as a deficiency, on the grounds that multiple IS states mapping to a single EW state introduces the potential for other IQ problems (Price and Shanks 2005a). However, subsequent focus groups suggest that practitioners do not necessarily regard redundancy as an issue, equating it with the notion of “replication” (Price and Shanks 2005b). This suggests the practitioners did not understand the distinction between multiple instances of a data set (that is, replication in the database sense) and multiple equivalent possible states in an IS.

In the example used here, making a copy of the data set is redundancy in the informal “replication” sense. This may or may not be a good idea, as the practitioners reported. However, in terms of the Ontological Model under discussion here, redundancy would be adding a sixth state, “Mister”, which is semantically equivalent to “Mr”. Regardless of which one is chosen, it will always be possible to infer what the EW state is. Since these two are semantically equivalent, the states can be treated as one “merged state” by the IS, without loss.

In general, if one state is a perfect synonym for another state under all conditions and possible uses, then its existence cannot introduce semantic errors. If there is a meaningful distinction to be made between them, then it is not a redundant state and its inclusion is a valid design choice. Accordingly, non-redundancy is not required here as a semantic criterion.

Also, the original definition of “completeness” used by Wand and Wang was restricted to mappings where the EW state cannot be represented by the IS (as above), indicating a design problem. The modified definition in the Semiotic Framework states that it arises where the EW is not represented, expanding to include operational problems such as “when a data entry clerk manually entering data into the IS accidentally omits an entry” (Price and Shanks 2005a).

Here, “missing data” is regarded as an operational correctness problem. If the IS is in the “null” state (that is, it accepts the empty field) such that it is not possible to infer the original EW state, then it meets the definition of being incorrect.

Consider the example of customers’ preferred titles above. If a customer is filling in an enquiry form displaying the five options and ticks multiple boxes, then this is ambiguous. If a customer leaves no box ticked, then this is a valid (but incorrect) state of the information system (enquiry form). If there is no option for “Fr” when this is a state of interest, then the enquiry form is incomplete. The completeness (or otherwise) of the relationship between the states is determined by the possibilities (design), not the actuality (operation).

Using the term “completeness” to characterise the expressive power of a language has a long history in theoretical computer science and meta-mathematics (for example, Gödel’s Second Incompleteness Theorem). The definition of incompleteness does not need to be expanded to incorporate the informal meaning of “missing data”, as long as the context makes it clear.

From the earlier discussion of semiotics, it is clear that this Ontological Model falls in the semantic level as it addresses the meaning of signs. That is, the relationship between the representation (IS) and its referent (EW). The four criteria for semantic quality are that the relationship is complete, unambiguous, correct and meaningful. Departing from the Semiotic Framework, the model adopted here uses the original definition of completeness and excludes the additional concept of redundancy.

Conceptually, these definitions are clear, concise and sufficient for explaining the semantic deficiencies of an IS. However, not all flaws are equivalent and it is not clear how best to quantify them for comparisons. The next section proposes how Information Theory can be used for this purpose.

5.3.3         Information Theory

The Ontological Model of IQ provides a very clear basis for modelling the relationship between an IS and the EW. The definitions for semantic quality criteria that arise from it are logical, not quantitative. That is, a mapping is either complete or it is not, correct or it is not. The definitions do not allow for degrees of ambiguity or grades of meaninglessness. In practice, no mapping is going to be perfect in all criteria, so this raises the issue of how to compare deficient (potential) EW/IS mappings.

Most obviously, we could count the deficiencies. Using these logical definitions, it may be possible to identify one IS as missing three states (incompleteness) while another is missing four. Or one IS has four ambiguous states while another has six. This simple counting strategy may work with one criterion (such as completeness), but it’s not clear how this approach could be adapted to make comparisons across multiple criteria, for example, to make design trade-offs.

Further, consider two EW/IS mappings of the same external world, both with a (different) single meaningless state. On a naïve counting basis, these might be regarded as equivalently deficient. Suppose that, in operation, the first IS never gets into its meaningless state while the second one is frequently found in it. It seems reasonable at a common-sense level to infer that the second deficiency is worse than the first.

Lastly, comparison between mappings when the underlying EW is different is also fraught. If one mapping has 25% of its states incorrect and another has 20%, is the first necessarily worse? What if the former has hundreds of states while the latter has just ten? What is needed is a reliable, objective and quantitative method for assessing and comparing the semantic quality of the EW/IS relationship.

The most natural approach to try is Information Theory. Developed by Shannon in the context of communications engineering[5] after World War II, it has evolved into a significant body of rigorous mathematical research, underpinning a range of applied activities spanning economics to linguistics to genetics (Cover and Thomas 2005).

Building on earlier mathematical ideas from Fisher, Hartley and Nyquist, Shannon showed how to quantify the amount of information conveyed in a message through a channel (Shannon and Weaver 1949). His first key insight was that a message is a selection (choice) from a range of possible alternatives. When the same selection is made by the sender (source) and the recipient (receiver), the message is deemed to have been communicated. Shannon’s second key insight was that information is the reduction of uncertainty. As a result of receiving the message, the recipient’s beliefs about the world change. (In semiotic terms, this process is called semiosis.)

Shannon’s remarkable achievement was to develop and present a unified and coherent model to quantify both of these insights: the amount of information in a source and the amount conveyed in a channel. As his basic model is isomorphic to the Ontological Model outlined above, this approach is applicable here too.

Figure 9 Simplified Source/Channel Model proposed by Shannon

In this simplified model (the channel encoding and decoding stages have been omitted), the sender (W) selects one of five possible messages (w1, w2 … w5). The receiver (X) must determine which of the five possibilities was sent. If both W and X agree, then the message was successfully transmitted. If X selects a different message, then we say the message was garbled. In the case of the “noisy channel”, (b) above, garbling is indicated by the dashed arrow. For example, if w1 was sent, then either x1 or x2 could be received. Conversely, if x2 is received, it could be the result of either w1 or w2 being sent.

At a conceptual level, the specific medium of transmission (the channel) does not matter; it only matters whether the source and receiver agree on what was sent. This is purely a function of the probabilities of a particular message being garbled. Mathematically, a channel is characterised as a transition matrix, where the elements are the conditional probabilities. For the case of a perfect channel (Figure 9a), the transition matrix is the identity matrix ie ones on the diagonal and zeroes elsewhere.

Figure 10 Channel as a Transition Matrix
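For illustration only (this is not part of the framework), the transition-matrix view is straightforward to express in code. The sketch below constructs two hypothetical five-state channels of the kind shown in Figure 9: a perfect channel and a noisy one in which w1 is sometimes garbled into x2 (the 0.9/0.1 probabilities are invented).

```python
import numpy as np

# Rows are sent states (W), columns are received states (X); each entry is
# taken here as the conditional probability of receiving x_j given w_i was sent.

T_perfect = np.eye(5)                      # (a) perfect channel: identity matrix

T_noisy = np.eye(5)                        # (b) noisy channel: w1 garbled into x2
T_noisy[0, 0], T_noisy[0, 1] = 0.9, 0.1    # illustrative probabilities only

# Each row must sum to one: exactly one state is always received.
assert np.allclose(T_perfect.sum(axis=1), 1.0)
assert np.allclose(T_noisy.sum(axis=1), 1.0)
```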

For a particular set of messages to be sent, some channels will be better than others, with none being better than the perfect case (a). (That is, more of the information in the source will be able to reach the receiver.) The figure of merit for assessing different channels is called the mutual information, or transinformation.

But how much information is in the source initially? Information Theory states that the amount of information is not determined by the number of symbols in the message, but by how likely it is that each message is selected. When all options are equally likely, the amount of information in the source is at a maximum. Based on mathematical arguments, Shannon presents the entropy function as the appropriate function to quantify the amount of uncertainty (Shannon and Weaver 1949). The amount of information in the source, W, is given by the entropy (or self-information):

H(W) = – Σi p(wi) log2 p(wi)

= – [p(w1) log2 p(w1) + p(w2) log2 p(w2) + … + p(w5) log2 p(w5)]

This quantity reaches a maximum when all source symbols w1 through w5 are equally likely to be sent (ie with a 20% chance); the maximum is log2 5 ≈ 2.32 bits[6]. The derivation below uses logarithm laws to show this.

H(W) = – Σi (1/5) log2 (1/5)

= – 5 × (1/5) × log2 (1/5)

= log2 5

So what is the meaning of “2.32 bits”? One natural interpretation is that it must take, on average, at least 2.32 well-formed “Yes/No” questions to deduce which of the five messages was being sent. At the other extreme, if the only message ever sent was w3 (probability of one, all the others are probability zero), then the amount of information in W is zero. (Here, the convention that 0 log 0 = 0 is adopted, following arguments by limit.) This satisfies our intuition that a deterministic source must have zero information.
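This calculation is easy to verify numerically. A minimal sketch (in Python, with numpy; the function name is mine) is:

```python
import numpy as np

def entropy(p) -> float:
    """Shannon entropy in bits, adopting the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.2, 0.2, 0.2, 0.2, 0.2]))   # uniform source: ~2.32 bits
print(entropy([0.0, 0.0, 1.0, 0.0, 0.0]))   # deterministic source: 0 bits
```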

Using this definition of entropy, we can also define the mutual information of the transition matrix p(W=wi|X=xj), as used to characterise the channel:

I(W;X) = H(W) – H(W|X)

In words, the mutual information between two random variables W and X is the uncertainty about W minus the uncertainty about W given X. It is the difference between the uncertainty about W before observing X and afterwards: that is, how much of the uncertainty about one variable (W) is resolved by observing a second variable (X). (The conditional entropy H(W|X) is the uncertainty that is “left over”.) Mutual information reaches a maximum of H(W) when H(W|X) = 0. That is, when observing X is sufficient to extinguish all uncertainty about W.

As with the correlation co-efficient, if W and X are statistically independent then their mutual information must be zero (observing X cannot tell us anything about W).

Armed with these definitions, I can systematically and objectively quantify the Ontological Model of IQ. The essence is to conceive of the external world as being the source and the information system as the receiver. The external world sends information about its state to the information system (representation), while users of the information system are able to infer the external world’s state by inspecting the information system (interpretation). The semantic quality of the IS is determined by how well it mirrors the EW. Now, with Information Theory, I can quantify this as follows.

Firstly, the source messages in W (w1 through w5) correspond to the states of the EW, while the received messages in X (x1 through x5) correspond to the states of the IS. The probabilities of these states occurring are known a priori, perhaps through historical observation. The transition matrix, T, describes the probabilities of each state being garbled (that is, incorrect) upon receipt. The amount of information in W is the entropy in W, H(W). This is the amount of uncertainty in W resolved upon observing W (hence the term “self-information”). The amount of information in X is similarly defined as H(X). The measure of semantic quality of the information system is defined as the normalised mutual information between W and X, which is dubbed here “fidelity”:

F = I(W;X) / H(W)

= 1 – H(W|X) / H(W)

Conveniently, this measure ranges from 0% to 100%. When F=0%, it implies “perfect failure”, that is, H(W|X) = H(W) so that observing the IS tells us nothing at all about the EW. (The IS is formally useless.) When F=100%, it implies “perfect information” and H(W|X)=0 so that observing the IS reduces our uncertainty about the EW to zero. (The IS is formally infallible.) Real ISs are likely to be somewhere within this range.
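As a numerical illustration (again a sketch of mine, not a prescribed implementation), fidelity can be computed directly from a prior over the EW states and a channel transition matrix:

```python
import numpy as np

def fidelity(p_w, T) -> float:
    """Normalised mutual information F = I(W;X) / H(W).

    p_w : prior distribution over the EW states.
    T   : transition matrix with rows giving p(X = x_j | W = w_i).
    """
    p_w = np.asarray(p_w, dtype=float)
    joint = p_w[:, None] * np.asarray(T, dtype=float)   # p(W = w_i, X = x_j)
    p_x = joint.sum(axis=0)

    def H(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    mutual_info = H(p_w) + H(p_x) - H(joint.ravel())    # I(W;X)
    return mutual_info / H(p_w)

p_w = np.full(5, 0.2)                   # uniform prior over five EW states
print(fidelity(p_w, np.eye(5)))         # perfect channel: F = 1.0
T_noisy = np.eye(5)
T_noisy[0, 0], T_noisy[0, 1] = 0.9, 0.1
print(fidelity(p_w, T_noisy))           # noisy channel: F a little below 1.0
```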

The name “fidelity” is chosen here because it means “faithfulness” (from the Latin fides), in the sense of how “faithfully” the IS tracks the EW. In using the IS as a proxy for the EW, users are trusting the IS to be a “faithful” substitute. It is in this sense that audiophiles use the terms “hi-fi” and “lo-fi” (high fidelity and low fidelity, respectively).

This measure captures the operational semantic criteria of ambiguity and incorrectness. An increase in ambiguity entails a dispersion of the probabilities in the transition matrix, resulting in a decrease in fidelity. Similarly, an increase in incorrectness entails moving probabilities off the main diagonal of the transition matrix, also decreasing fidelity. It is also possible to trade the two off against each other – decreasing ambiguity while increasing incorrectness, or vice versa – with the net effect captured by the single fidelity measure.

The design semantic criteria of completeness and meaningfulness relate to the presence of all, and only, the states in the IS needed to capture the states of interest in the EW. Suppose there is an “extra” state in the EW (w6, implying incompleteness). If there is a zero probability of this state being occupied, then it – literally – doesn’t count. If there is a non-zero chance of the EW getting into this “extra” state then the IS must be in some state (because it is always in precisely one state), which, by definition, must be the wrong one. Hence, incompleteness during design necessarily results in incorrectness. An “extra” EW state may map to just one IS state (making it consistently and predictably incorrect), or it may map probabilistically to a range of IS states (making it ambiguous as well as incorrect).

Suppose instead that there was an “extra” state in the IS (x6, implying meaninglessness). As above, a zero probability of the IS ever getting into this state means that it is the same as that state not existing. However, if it is ever occupied then, again by definition, the EW must be in some other state, which also results in incorrectness. Furthermore, with the IS in a meaningless state, the EW cannot consistently be in the same state – if it were then it would no longer be meaningless because it would become perfectly synonymous with another meaningful (yet redundant) IS state.

So the magnitude of design errors – incompleteness and meaninglessness – can be assessed by their effect (if any) during operation. The way they are manifested is through ambiguity and incorrectness which, as shown, are characterised by the fidelity measure. Therefore fidelity is an appropriate general measure of semantic information quality.

With the semantic level of IQ quantified in this way, it is now possible to tackle the pragmatic level, which is concerned with the use of information. First, we need to understand how information is “used” and how we can quantify that. For that, I turn to the analysis of decision-making and the value of information as formulated in the discipline of information economics.

5.3.4         Information Economics

The concept of “value” as a criterion, measure or goal in IQ research has often been poorly understood. This difficulty is explicitly acknowledged in the Semiotic Framework for IQ:

[V]aluable relates to the overall worth or importance of the data with respect to the use of that data. Of all the quality criteria listed, this is the most problematic in that it has inter-dependencies with all of the other quality criteria. That is, data which is not highly rated with respect to other criteria (e.g. not complete, not reliable) will necessarily be less valuable as a result. However, in accord with most information quality researchers, we believe that a comprehensive understanding of quality requires the inclusion of such a criterion, sometimes termed value-added or value.

In essence, the quality criterion valuable acts as a generic place-holder for those aspects of quality specific to a given application rather than universally applicable. Thus, other than replacing the generic quality criterion with the appropriate domain-specific terms for each individual application, the only other option is its inclusion despite the resulting inter-dependencies. The problems and significance of this particular quality criterion has not, to our knowledge, previously been acknowledged in the literature. (Price and Shanks 2005a, p222)

The distinction between value and quality is an important but difficult one. In particular, whether value is a quality characteristic or quality is a determinant of value is difficult to reconcile. In both cases, it’s clear that some sort of comparison is being made. Here, it is proposed that what differs is the comparator: a quality assessment is made using an “ideal” as the yardstick, whereas a value assessment compares against (potential) alternatives.

Consider the example of something mundane, like ice-cream. When determining the quality of an ice-cream, a consumer may compare a range of characteristics of the ice-cream (flavour, texture, quantity, safety etc) against an “ideal” ice-cream, which is conceivable but does not exist. This ideal will vary from person to person, and perhaps even across time. The quality assessment of an ice-cream is expressed in terms of shortcomings against these criteria, perhaps as a star-rating, a percentage score or text description (review).

By contrast, a value assessment involves ranking the ice-cream against candidate alternatives in terms of how much utility (satisfaction, pleasure) the consumer derives from it. (In the formalism of Utility Theory, this ranking is called a partially-ordered preference function.) The specific reasons why the ice-cream is ranked in its position are not considered, and they too vary from person to person and over time. This ranking process can be operationalised experimentally by using observations of people’s behaviour to reveal their preferences.

These two kinds of assessments have their own strengths and weaknesses, and may be useful in different situations. Both have subjective elements, such as the weighting given to quality criteria or the specific ranking of an alternative. Both have objective elements too: the quantity of ice-cream, for example, can be measured objectively. So too can people’s preference for a vanilla over cabbage flavoured ice-cream.

Quality assessments have two advantages over value assessments. Firstly, quality assessments can be used to gain insights into which aspects or characteristics consumers care about the most. With ice-cream, for instance, by explicitly setting up a list of criteria and evaluating each in turn, it is easier to see how changes could be made to improve the product. In this sense, quality is more important for design activities where trade-offs must be made.

The second advantage is for situations when no comparable alternative is available. A nation’s tax system is, in general, not discretionary and taxpayers do not have the option of using another one. Assessing the value of such a system (and hence, their preferences for tax systems) is going to be fraught when no such choice is possible. (Governments, which do have such a choice, may examine the value of their tax system.)

Value assessments, however, do have advantages of their own. Primarily, value assessments – being more abstract – allow comparisons between unalike things. A child faced with “You can have two more rides on the merry-go-round or an ice-cream” will determine which is more valuable to them. For business, the most powerful aspect of this is the comparison with cash amounts, which allows pricing of goods and services. If someone ranks their preferences for a $10 note, an ice-cream and a $5 note in that order, we can conclude they value the ice-cream at between $5 and $10. Using smaller and smaller amounts of cash, we could drill down to a specific price (cash equivalent) at which they are indifferent. This quantity of cash is their price[7].

This difference in valuations between buyers and sellers is what drives transactions, and hence markets, and explains the concept of “value-added”. If the seller is willing to accept $6 for the ice-cream while the buyer is willing to pay $9, this $3 gap is the surplus (the gains from trade) created by the transaction. In general, a proportion of this $3 will go to both parties; the specific amount depends on market conditions such as competition and regulation.

I use this distinction to address the pragmatic quality of information and the value of information.

While the Semiotic Framework conceives of the pragmatic level as incorporating quality criteria such as useful, relevant and valuable, it does not prescribe a quantitative model for measurement of these constructs. Instead, the authors propose the use of consumer-centric, context-specific instruments such as surveys to understand this dimension (Price et al. 2008). In contrast to the semantic level, the pragmatic one is necessarily subjective.

This approach is not sufficient for the purpose of this framework. During the practitioner interviews, practitioners reported that they understood where “the points of pain” were in their systems and – broadly – how to fix them. What they said they required was a solid business case that credibly stated, in particular, the benefits side of the equation. They were adamant that organisational funding processes dictated that this had to be quantitative, expressed in financial terms and commensurate with other capital projects. In essence, this is a call for a de-contextualisation of the information systems and associated customer processes, where all factors are reduced to future expected cash flows.

As a measurement approach, pricing like this could be characterised as subjective-quantitative. It is quantitative, in the sense that real ordinal values are used to describe the phenomenon and logic and mathematics drive the calculations. However, it is also subjective in that, at root, preferences are innate and ultimately cannot be explicated. In this way, prices differ from measurements of natural phenomena in science and engineering. For example, the price of a single share in a listed company may be subject to a huge amount of mathematical analysis and modelling yet it is determined by the aggregate opinions of thousands or millions of actors. This is not to say the prices are entirely arbitrary or set on a whim, but constitute a shared understanding.

Prices may ultimately be subjective, but if an organisation has an agreed valuation method for final outcomes, then intermediate prices may be derived objectively (in the sense that different people can arrive at the same answer). For example, suppose a retailer has a standard method for valuing stock in various locations (warehouses, retail shops etc). These prices may be set using a mixture of estimated market prices from competitors and suppliers, precedent, custom and judgement. But given these (subjective) prices, it is possible to price objectively the costs and benefits of changes to the retailer’s logistical system, such as changing shipping frequencies, trucking routes or warehouse capacity. It is in this sense that the quality of information can be objectively and quantitatively valued.

Information Economics, as broadly conceived, examines the value of information in widely different contexts. The starting point is that information is both costly and – potentially – beneficial. People can be observed behaving in ways that suggest they have preferences for different information for different tasks. Managers, for instance, will expend organisational resources on acquiring information in the belief that the benefits outweigh the costs.

Approaches to quantifying and valuing information are incorporated into microeconomics, which deals with supply and demand, individual decision-making and Utility Theory. In the von Neumann and Morgenstern game-theoretic reformulation of neo-classical microeconomic theory (Neumann and Morgenstern 2004), very general assumptions are made about how rational people deal with uncertainty. Specifically, the Expected Utility Hypothesis assumes that (groups of) people, when faced with multiple possible outcomes, will assign a utility (“benefit”) to each outcome and then weight each utility by the probability of its occurrence.

This “weighting” approach to risk can be described as a “rational gambler’s perspective”, in that it involves calculating the probabilities and pay-offs for possible outcomes. Indeed, it was in that context that it was first proposed by Daniel Bernoulli in 1738. The Expected Utility Hypothesis is a normative theory of behaviour and, while it stacks up quite well in practice (Lawrence 1999), more sophisticated descriptive theories take into account nuances of irrationality and cognitive biases and other deviations from this ideal. One such alternative is Prospect Theory (Kahneman and Tversky 1979).

Another key contribution of Information Economics is its definitions of semantic and pragmatic information. These definitions correspond with those described in the Semiotic Framework for IQ, albeit in a more formal mathematical sense. In particular, a message or event contains semantic information if and only if it changes someone’s beliefs; while a message or event contains pragmatic information if and only if it changes someone’s actions.

Consider this explanation from an Information Economics perspective:

Pragmatic information involves the application of the statistical [semantic] information[8]; it concerns the potential impact of the statistical information on choice and pay-off in a specific decision problem. This distinction separates nouns commonly associated with statistical attributes of information, such as coherence, format, and accuracy, from pragmatic attributes such as relevance, completeness, and timeliness. Statistical information affects what the individual knows; pragmatic information affects what the individual does. (Lawrence 1999, p5).

It is clear that improving semantic information quality is necessary, but not sufficient, for improving pragmatic information quality. For example, if you know your friend’s birth date, then finding out the precise hour of his birth (thus reducing ambiguity) does not have any bearing on your decision about when to send a birthday card. Further, improving pragmatic information quality is necessary, but not sufficient, for increasing information value.

So knowing something “for its own sake” is not valuable. This is clearly a very narrow view of value and much of our information – especially entertainment, news and gossip – would not qualify as valuable by this definition[9]. However, information economists argue that information that does not result in a changed decision may still prove valuable if it reveals something to the decision-maker about their information sources (Lawrence 1999). So a rumour may not be immediately useful, but if it later proves correct, then the source may be seen as more credible and hence the decision-maker is more likely to act on future information from that source.

The elements of the information economic model (as used here) include:

·         A set of states of nature, with a probability distribution over them.

·         A set of outcomes, with pay-offs associated with each.

·         A decision-maker, with a defined utility function (for risk-aversion and time-preferences).

·         A set of options, from which the decision-maker can select.

If I can extend the semantic-level Ontological Model for IQ (meaning) to include these elements, then I have a natural extension for the pragmatic level (use) that lends itself to a quantitative valuation. The point of common contact is the state-based probabilistic view of nature and the distinction between semantic and pragmatic information quality.
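As a structural illustration only (the class and field names below are mine, not part of the framework), these four elements, together with the Expected Utility Hypothesis discussed earlier, can be sketched as follows:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class DecisionSetting:
    states: Sequence[str]                 # states of nature
    state_probs: Sequence[float]          # probability distribution over the states
    options: Sequence[str]                # actions available to the decision-maker
    payoffs: Sequence[Sequence[float]]    # pay-off for each (option, state) pair
    utility: Callable[[float], float]     # DM's utility function (risk/time preferences)

    def expected_utility(self, option_index: int) -> float:
        """Expected Utility Hypothesis: probability-weighted utility of the pay-offs."""
        row = self.payoffs[option_index]
        return sum(p * self.utility(x) for p, x in zip(self.state_probs, row))

# Hypothetical usage: a risk-neutral DM choosing between two options.
setting = DecisionSetting(
    states=["rain", "sun"],
    state_probs=[0.3, 0.7],
    options=["take umbrella", "leave it"],
    payoffs=[[0.0, -1.0], [-5.0, 0.0]],
    utility=lambda x: x,
)
print(setting.expected_utility(0))   # -0.7
print(setting.expected_utility(1))   # -1.5
```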

Figure 11 Augmented Ontological Model

Here, I have extended the familiar Ontological Model to incorporate decision-making. It’s represented here as a process that maps states of the external world (or its proxy, the IS) to an action. This function is the object of study in Decision Theory (and related disciplines like Game Theory), in a general sense. Theories from Management Science and Operations Research play a role in developing the particular decision functions used in practice. The precise means for how such functions are designed and implemented is not the concern of this study. It suffices to say that such functions exist, are used as a matter of course and can be specified.

As before, there is a state-space σ defining a set of states of interest. The external world state is described as a probability distribution, W, over σ while the IS is a probability distribution X over σ as well. These two random variables are related by the transition matrix T, representing the communication process.

I now augment this with another state-space, π, describing a set of possible actions. Depending on the task, this set could comprise “yes” and “no” (at a minimum), or there may be a hundred actions, each indicating a price (from $1 to $100) that the decision-maker (or “DM”) may bid at an auction. The only requirement is that the options are mutually exclusive – one and only one eventuates.

Two probability distributions are defined over this state-space. The first, Y = [y1, y2, …, yn], is the action chosen by the DM. The second, Z = [z1, z2, …, zn], has the same cardinality as Y. Its interpretation is, informally, the optimal action; that is, the action that the DM would have preferred, given full and perfect information. In the same way that W represents the “correct” state of the world, Z represents the “correct” action.

The combination of Y and Z – the realisation process – defines the possible outcomes from the decision task. The realisation matrix, R, enumerates the possible outcomes and assigns a probability to each occurrence, conditional on the decision taken. From the DM’s point of view, Y and Z should be in constant agreement as this means the DM is making the “correct” decision each time. This case is described by the identity matrix.

Figure 12 (a) Perfect and (b) Imperfect Realisation

Note that the matrices in Figure 12 are expressed in terms of conditional probabilities p(Z=zi|Y=yj), which in this form constitute a row-stochastic Markov matrix (Lawrence 1999). When expressed as joint probabilities p(Z=zi, Y=yj), such matrices are referred to as “confusion tables” in the Decision Theory and Machine Learning literature. In any case, for a given Y = [y1, y2, …, yn], I can derive the joint distribution from the conditional by multiplying the ith row by yi.

In these examples, the first realisation process is “perfect”, in the sense that whenever y1 or y2 or y3 is chosen, the optimal decision is z1, z2 or z3, respectively. The second matrix describes an imperfect realisation process: sometimes, the DM makes sub-optimal choices. For example, when the DM chooses y1, 20% of the time it eventuates that z2 is the optimal choice. The DM would have been better off choosing y2.

Analogously to T, the transition matrix that characterises the communication process at the semantic level, R is a transition matrix that characterises the realisation process. I can define measures on this matrix that evaluate how well the DM is performing. Indeed, that is precisely what well-known statistical measures such as χ2 (the chi-square statistic) and ρ (Pearson’s rho for correlation) do (Neuman 2000). In addition, for the binary case, the usual Information Retrieval measures apply (precision and recall, the latter also known as sensitivity in other contexts), as do more sophisticated approaches, like lift charts and ROC analysis, in Machine Learning and data mining (Hand 1997).

These intuitively appealing measures only apply in the binary case. A more general approach must allow for multiple action possibilities. In this situation, I can use Information Theory to quantitatively score classifier performance (Kononenko and Bratko 1991). The idea is to look at the amount of uncertainty about Z, and what Y tells us about Z, using their mutual information:

I(Y;Z) = H(Z) – H(Z|Y)

This can form the basis of a performance metric – the average information score – that can be used to compare classifier performance on the same decision task (for example, using different algorithms) and also between decision tasks, since it takes into account how “hard” different tasks are. For example, detecting pregnancy in a maternity hospital is a pretty trivial decision task: simply labelling all female patients “pregnant” will (presumably) get you to at least 95%. Detecting pregnancy in the general population, however, is not so simple. The use of information-theoretic measures takes into account these “prior probabilities”.
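A small sketch of this idea is given below. It computes I(Y;Z) from a joint confusion table of hypothetical counts; it is not the full average information score of Kononenko and Bratko, which works at the level of individual predictions.

```python
import numpy as np

def mutual_information(joint) -> float:
    """I(Y;Z) = H(Z) - H(Z|Y), computed from a table of joint counts for (Y, Z)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()        # normalise counts into probabilities

    def H(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    H_y = H(joint.sum(axis=1))         # marginal over Y (rows)
    H_z = H(joint.sum(axis=0))         # marginal over Z (columns)
    H_yz = H(joint.ravel())
    return H_y + H_z - H_yz            # equals H(Z) - H(Z|Y)

# Hypothetical confusion table: rows are the chosen action Y,
# columns the optimal action Z, entries are observed counts.
confusion = np.array([[80, 10,  0],
                      [ 5, 90,  5],
                      [ 0, 15, 95]])
print(mutual_information(confusion))
```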

While this does extend the Ontological Model into the pragmatic level in a way that allows quantification, it does not meet the goal of providing a valuation measure. For that, I must add one additional element: pay-offs.

The pay-offs are defined as changes to the DM’s utility (or net satisfaction), expressed as equivalent cash amounts. Note the definition of the pay-offs as cash-equivalents of a change in utility, rather than as cash, to take into account the time preferences (discount rates) and attitudes to risk. However, it is not clear how to operationalise the “change in utility” measure, as it is an abstract and essentially introspective construct. This is where the inherently subjective nature of valuation comes into play; it may be possible to derive these cash-equivalent amounts analytically or experimentally in some cases, but generally the quanta are a matter of judgement.

For the purposes of this model, the pay-offs are expressed as costs “relative to perfection” (that is, really penalties), in keeping with the established custom in the Machine Learning and Information Economics literature (Lawrence 1999). This is because using the case of perfect information results in zero cost whereas imperfect information results in a positive cost. It should be noted some research indicates that practitioners prefer to think of decisions in terms of costs and benefits (Chauchat et al. 2001; Drummond and Holte 2006). In practice, rather than zeroing the scale at perfect information, it may be more palatable to zero it at the current level of performance, so that changes can be assessed as costs or benefits, depending on direction. In any case, since the two methods are interchangeable, I stick with the cost-based system.

Figure 13 Pay-off Matrix using the Cost-based approach. All units are dollars.

In this example, a “correct” decision attracts a $0 penalty. However, deciding on the third option (Y=y3) when the second was optimal (Z=z2) results in a penalty of $3 (Π3,2). Inspecting the columns, we see that Y=y2 is the most “risky” option – penalties for mistakes range up to $11. By contrast, penalties for Y=y3 range from $0 to $3. A rational DM would take this into account when choosing between y2 and option y3. Indeed, the advice from the machine learning community is to always use pay-offs to evaluate performance, where they’re available and applicable (Hand 1997).

In order to compute the expected cost of imperfect information (or, equivalently, the expected value of perfect information), I invoke the Expected Utility Hypothesis as follows. For Y = [0.2, 0.5, 0.3], I derive the joint probability distribution from R, the realisation matrix, and do entry-wise multiplication with the pay-off matrix, Π:

E[cost] = Σi Σj p(Y=yi, Z=zj) × Πi,j
Hence, in this case, the expected cost of the imperfect realisation process is $1.40. A rational DM would never spend more than this amount to improve their decision-making, since there is no way the cost could be recovered. In this sense, the expected value of perfect information represents a ceiling on the price a DM would pay for improvements.
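Since the matrices themselves appear only in the figures, the sketch below uses hypothetical values for R and Π that are consistent with the fragments described in the surrounding text (the 20% garbling of y1, the $3 penalty Π3,2, the $0–$11 range for y2 and the $0–$3 range for y3); with these invented values the calculation happens to reproduce the $1.40 result.

```python
import numpy as np

Y = np.array([0.2, 0.5, 0.3])          # probability of choosing each action

# Hypothetical realisation matrix R: row i gives p(Z = z_j | Y = y_i).
R = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.1, 0.2, 0.7]])

# Hypothetical pay-off (penalty) matrix in dollars; correct decisions cost $0.
PI = np.array([[ 0.0,  4.0,  7.0],
               [11.0,  0.0,  9.0],
               [ 2.0,  3.0,  0.0]])

joint = Y[:, None] * R                          # p(Y = y_i, Z = z_j): scale row i by y_i
expected_cost = float(np.sum(joint * PI))       # entry-wise multiply, then sum
print(f"Expected cost of imperfect realisation: ${expected_cost:.2f}")   # $1.40
```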

I now consider the elements that go into making up this cost. From the definitions of the realisation process and the pay-offs, I note that the costs are accrued when the ith action selected from Y is not optimal: that is, when Z = zj with j ≠ i. I describe three sources, using the illustrative example of a mortgage approval process within a bank.

In this scenario, the semantic space σ is very large: it is the space of all possible situations applicants might be in, including age, income bracket, post code, employment status and so on. The pragmatic space π is the space of all possible actions the bank could take – y1 is “accept application” and y2 is “refuse application”. This second state-space is much smaller than the first.

From the bank’s perspective, W is a probability distribution, since they don’t know the true situation of each applicant, but they do have information a priori. The quality of semantic information characterises how well, on average, the distribution X informs the bank about distribution W. When the external world state and the IS state differ, this is called an error.

Y is the action the bank takes for each applicant and Z is the optimal action, given hindsight, for that applicant. The bank uses a decision function, D, to map applicants to actions. For any applicant’s situation (drawn from the semantic space σ), the bank will select an action from π. Naturally the bank will wish to ensure that its chosen action is as close to optimal as possible. When this does not occur (for example, accepting the application when the optimal action would have been to reject it), this is called a mistake. In contrast to an error, which is a wrong belief about the external world, a mistake is a wrong action. A mistake attracts a penalty to the bank, as specified in the pay-off matrix.

Under this model, there are three sources of mistakes: the quality of information, the decision function and residual domain uncertainty. The first arises when an error results in a mistake. That is, a mis-characterisation of the EW by the IS (an incorrect mapping, in ontological terms) causes the decision-maker to select the wrong action. This is illustrated below.

Figure 14 Costly Information Quality Defect

Here, an applicant should have been in the first state (w1), but was mis-represented in the IS as being in the fourth state (garbled into x4). As a result, the DM refused their application (mapped them into y2). In fact, applicants in the first state should have been approved (z1 was optimal). Not all garbling events in the communication process will result in a mistake. In this example, had the applicant been mis-represented as being in x2 instead of x1, the decision function would have correctly mapped them into y1. In this instance, it would have been a semantic defect (error) but not a pragmatic defect (mistake).

The second source of mistakes is the decision function itself. Real decision-making processes will make mistakes even when presented with perfect information. This could be due to inherent limitations in the algorithms employed or defects in their implementation. Indeed, many researchers and practitioners involved in fields as diverse as Management Science, Operations Research and Computer Science focus entirely on improvements to decision-making, where perfect information is often assumed.

The third and final source of mistakes is “residual domain uncertainty”, which captures the notion that, despite all the information and algorithms in the world, some decision tasks are always subject to an unavoidable element of chance. This means there is a “ceiling” level of performance innate in a decision task, which cannot be bested by any amount of information quality improvements or algorithmic enhancements.

Figure 15 Breakdown of Sources of Costly Mistakes

While these sources of mistakes generate real costs that are borne by the organisation, I am here only interested in those arising from deficiencies with information quality. For example, spending money on fine-tuning the mortgage approval business rules – perhaps informed by benchmarking against comparable institutions – may be a worthwhile project. But this does not directly help practitioners formulate a business case for information quality improvement. As a consequence, I need to modify the model under development to only account for the cost of mistakes introduced by IQ, not all mistakes.

To do this, I introduce a new construct, Y*, which substitutes for Z. Recall that Z is the “optimal action”, perhaps taken with the wisdom of hindsight in the case of mortgage approvals. Given deficiencies in the business logic and the inherent vagaries of human behaviour, it is setting a very high bar. Instead, I define Y* as the action that would have been taken had perfect information been used. Using a function notation D(·) to describe the decision-making process:

Y = D(X) The actual action taken, using imperfect information.

Y* = D(W) The ideal action, using perfect information.

This more attainable comparator is used to compare the decision made with imperfect information (Y) with the decision made with perfect information (Y*), using the same real-world decision function. Thus, it only measures shortcomings in IQ, not algorithms. If Y* and Z are very different from each other, it suggests a serious problem with the decision function. (That is, the decision function consistently cannot produce the optimal answer even when presented with perfect information.)
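
A minimal sketch of this comparison is given below, assuming a toy deterministic decision rule and hypothetical field names; the point is only that the same function D is applied both to the IS view (X) and to the audited external-world view (W), so any disagreement is attributable to information quality rather than to the decision logic.

```python
# Minimal sketch (assumed record layout and rule): the same deterministic
# decision function D is applied to the IS view (X) and to the audited
# external-world view (W); disagreements are mistakes attributable to IQ.
def decide(record):
    """A hypothetical deterministic mortgage rule, D."""
    if record["income"] >= 50_000 and not record["bankrupt"]:
        return "approve"
    return "reject"

is_view    = {"income": 30_000, "bankrupt": False}   # X: what the IS records
world_view = {"income": 65_000, "bankrupt": False}   # W: the audited truth

y      = decide(is_view)     # Y  = D(X), decision actually taken
y_star = decide(world_view)  # Y* = D(W), decision under perfect information

iq_mistake = (y != y_star)   # True here: the error in X was actionable
print(y, y_star, iq_mistake)
```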

This distinction between Z and Y* also addresses the pathological situation of an error improving the decision, due to a deficiency in the decision function or the vagaries of human nature. Any complete economic analysis of error removal must include the costs introduced by new mistakes introduced when errors are removed. An information system that degrades in performance as errors are removed may seem entirely anomalous and unlikely. However, in situations where source systems are known to be compromised but fixing the problems there is not possible, it may be expected that the decision function is instead “tweaked” to address these issues. In such cases, where the “downstream” applications compensate for “upstream” data problems, fixing the source data may lead to a deterioration in decision performance. By using Y* as the comparator, and not Z, these types of deficiencies are excluded from the analysis.

Recall that V was the expected pay-off using R (the mapping between Y and Z) and the pay-off matrix, Π. I now define R* as the realisation process mapping Y and Y*, so V* is the product of R* and Π. Whereas V measured the value of removing all mistakes, V* measures the value of removing all errors, so V ≥ V*, with equality if and only if all mistakes are due to errors.

Figure 16 Revised Augmented Ontological Model

 
 


Of course, as discussed, this is not likely to be the case for real systems: mistakes will creep in even with perfect information. Similarly, since not all errors will lead to mistakes it is possible to have an imperfect IS that performs as well as one with perfect information.

Based on this discussion, I define the concept of actionability. An error is actionable if and only if it results in a different action being taken. In terms of the above model, an erroneous X (doesn’t correspond to W) is actionable if and only if it results in a mistaken Y (doesn’t correspond to Y*). It follows that inactionable errors (or changes) must be worthless.

For deterministic decision functions, a change of state (through error or correction) will either always change a decision or never change it; it is not probabilistic. However, I can generalise this concept of actionability to situations where the change applies to a class of states and hence may be characterised as probabilistic, in the sense that, at the time of the change of state, there is uncertainty as to whether the decision will also change. A measure of this uncertainty about the impact of a change can be used to operationalise the general concept of relevance.

The concept of relevance is defined in a number of fields dealing with information and decisions, especially law and economics. In both cases, the specific test for relevance is whether or not the information has the potential (or tendency) to induce changes in the probabilities. From a legal perspective, “relevance” is considered part of the rules of evidence. For example, section 55(1) of the Uniform Evidence Act 1995 (Cth) defines relevance:

The evidence that is relevant in a proceeding is evidence that, if it were accepted, could rationally affect (directly or indirectly) the assessment of the probability of the existence of a fact in issue in the proceeding.

In economics, John Maynard Keynes formalised this notion. He explicitly defined a fact or piece of evidence as being irrelevant to an assertion if and only if the probability of the assertion is the same with and without the evidence (Keynes 1923). More formally, for an existing set of knowledge k, new evidence e is irrelevant to assertion x if and only if P(x|k) = P(x|k&e).

In terms of the Semiotic Framework, relevance is identified as a pragmatic level criterion. Since improving the semantic quality of irrelevant information cannot – by definition – have any bearing or impact on decision-making, it must be worthless. However, it may not be possible to know at the outset whether an improvement will change a decision. A probabilistic measure of relevance, as suggested above, captures the inherently statistical nature of such relations between information and decisions.

5.4       Components

This section outlines the components of the framework, describing them conceptually and mathematically. This allows us to analyse customer information quality with a view to quantifying and valuing the impact of interventions designed to improve IQ.

Here, I restrict my analysis to situations involving data-driven customer processes: a large number of customers are considered in turn and are partitioned into a small number of subsets for differentiated treatment based on the application of business rules to their attributes. This could apply to common data-driven decision-making functions such as direct marketing, credit scoring, loan approvals, fraud detection, segmentation, churn prediction and other classification and prediction tasks.

Customer attributes include demographics (date of birth, gender, marital status, location), socio-economic details (education, employment, income, assets), product history (details of which products were purchased or used), contact history (inbound or outbound contacts) or third party “overlays” such as credit assessments, legal status (for example, bankruptcy or immigration) or other market transactions.

5.4.1         Communication

The communication process models how well the external world is represented by the internal IS. In this case, I assume a customer database comprising C customers. Conceptually, each customer is represented by a row in a table (record) where each attribute is represented by a column. More formally, each customer has an external world individual state, denoted ce, for the eth customer. This state can be decomposed into a attributes, such that the customer semantic space, σ, is given by:

σ = A1 × A2 × … × Aa

While some attributes have a small range of possible values (for example, gender may be just {male, female}), others may be very large. For continuous-valued attributes like income, I assume “binning” (conversion into a discrete attribute through the use of appropriately sized intervals).

To give a sense of the dimensions involved, this framework anticipates the number of attributes (columns) to be in the order of ten to fifty, while the number of customers (rows) is in the thousands to hundreds of thousands.

The quality of the representation of the external world – the semantic information quality – can be quantified by asking “How much uncertainty is removed by observing the IS?” I can put this on a mathematical footing by defining a probability distribution, W, over σ that describes the external world. Similarly, I define another probability distribution, X, over σ that describes the IS. Applying Information Theory yields the previously described fidelity measure, φ:

φ = I(W;X) / H(W) = [H(W) − H(W|X)] / H(W)

This expression – ranging from 0% to 100% – captures the amount of information communicated about the external world to the internal IS representation of it. It reaches 100% when H(W|X) = 0; that is, the uncertainty in the external world state given the IS state is zero (knowing X is always sufficient for knowing W). It reaches 0% when H(W|X) = H(W), which implies that knowing X tells us nothing about W.

By extension, I can define fidelity at attribute level too, here for the eth attribute:

φe = I(We;Xe) / H(We) = [H(We) − H(We|Xe)] / H(We)

I use We (and Xe) to denote the value of W (and X) on attribute Ae. Note that in general these φe will not all add up to φ, as there is “side information” amongst the attributes. For example, knowing something about a person’s education will, on average, reduce my uncertainty about their income. For the case where all the attributes are perfectly statistically independent, the φe will add to φ.

The interpretation of the fidelity measure (the normalised mutual information score) is as the proportion of the uncertainty (in bits) that is removed. In this sense, the measure takes into account how “hard” it is to “guess” the right answer and, in doing so, makes comparisons between different attributes more meaningful than simple “accuracy” (percentage correct, or the total of P(W=i, X=i) over all i).

For example, consider the two attributes A1 and A2 and their respective channel matrices, C1 and C2, expressed as joint rather than conditional probabilities:

Note that both attributes are binary valued, but that while the first is balanced (a 55%-45% split) the second is quite unbalanced (a 10%-90% split). In more concrete terms, A1 might be gender (with, say, 55% female) while A2 might be a deceased flag (with 90% alive).

Both attributes are 85% correct; that is, 15% of customers are in error (the diagonal adds to 0.85). However, their fidelity measures are quite different, with the fidelity of A1 considerably higher than that of A2. The interpretation is that it is more difficult (ie easier to be wrong) to represent gender than deceased status. In fact, given that 90% of customers in this group are alive, simply labelling all customers “not deceased” will achieve 90% correctness. Therefore, the deceased attribute is performing quite poorly, as reflected in its comparatively low fidelity.

From an information-theoretic perspective, fidelity measures the incremental amount of uncertainty removed, given the initial amount of uncertainty. These two attributes have different amounts of uncertainty in the first place (gender has 0.99 bits of uncertainty while deceased has only 0.47 bits), so the removal of uncertainty has a different impact on the score.

By taking into account the inherent “difficulty” (statistically-speaking) of capturing some attribute, I can make a better assessment of how well the information system is representing that attribute. The fidelity measure does this better than correctness or probability of error.
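
The following sketch computes the per-attribute fidelity for two such binary attributes. The marginal splits (55%-45% and 10%-90%) and the 85% diagonal agreement come from the example above, but the off-diagonal cells of the joint distributions are assumed for illustration, so the resulting fidelity values are indicative only.

```python
import numpy as np

def fidelity(joint_wx):
    """phi = I(W;X)/H(W) = [H(W) - H(W|X)]/H(W), from a joint pmf P(W, X)."""
    p = np.asarray(joint_wx, dtype=float)
    p_w = p.sum(axis=1)                      # marginal of the external world, W
    p_x = p.sum(axis=0)                      # marginal of the IS state, X
    def H(dist):
        dist = dist[dist > 0]
        return -(dist * np.log2(dist)).sum()
    h_w_given_x = H(p.flatten()) - H(p_x)    # H(W|X) = H(W,X) - H(X)
    return (H(p_w) - h_w_given_x) / H(p_w)

# Hypothetical joint pmfs (rows = W, cols = X): marginals and the 0.85 diagonal
# match the text, but the off-diagonal split is assumed.
gender   = [[0.475, 0.075],
            [0.075, 0.375]]
deceased = [[0.050, 0.050],
            [0.100, 0.800]]

print(f"fidelity(gender)   = {fidelity(gender):.2f}")    # noticeably higher
print(f"fidelity(deceased) = {fidelity(deceased):.2f}")  # comparatively low
```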

5.4.2        Decision-making

This framework is intended to be applied to a range of customer processes that rely on customer information. These data-driven decisions are made by considering each customer, one at a time, and applying business rules or logic to attributes of the customer to make a determination about how the organisation will treat that customer in future. Whether it’s approving a loan, making a special sales offer or assigning them to a demographic marketing segment, each customer is mapped onto one action (treatment, offer, label etc) from a small set of possible actions.

Depending on the context, this process can be described as classification, segmentation, partitioning, prediction or selection. The function that performs this task could be implemented formally, perhaps as a decision-tree (with a hierarchy of IF … THEN … ELSE logic) or regression model, or informally, as with a flow chart or customer interaction script.

The study and analysis of algorithms and techniques for making the decision is not within the scope of IQ improvements. Instead, for analytical purposes, I only require that such a function for mapping customers is deterministic. This means that the same action is produced for an identical input customer, each time. The decision function should not change (or “learn”) over time, there is no discretion, judgement or randomness in the function and the order in which customers are presented should not matter. That is, if the list of customers is randomised and then re-presented to the decision function, each customer should be mapped to exactly the same action as in the first run.

While the determinism requirement excludes judgement-based customer processes[10], it still covers a large number of situations that rely on discretionless customer processes, such as found in direct marketing and credit management.

I define a probability distribution, Y* over the set of possible actions, π; these are the actions produced by the decision function, D, if perfect information is presented to it. Another probability distribution over π, Y, is the actual decision produced by D if given the imperfect information in the information system. I express this as:

Y* = D(W)
Y = D(X)

I can now ask “how much information is needed to make a decision, on average?” The answer is: “enough to remove the initial uncertainty in which action will be taken (indecision)”. Suppose that a decision task requires assigning customers into two “buckets”: the first will receive a special offer, the second will not. A decision-tree is built that uses customer attributes to make the determination. This function, D, assigns 30% of customers to receiving the offer, while 70% won’t. The indecision in this task is given by:

H(Y) = −0.3 log 0.3 − 0.7 log 0.7 ≈ 0.88 bits

This means that, prior to the decision function being applied, there is 0.88 bits of uncertainty about what the decision will be. Afterwards, the uncertainty is 0 since, by definition, D is a deterministic function:

H(Y|X) = 0

So, the mutual information between Y and X is H(Y). This can be seen from the formula for mutual information:

I(X;Y) = H(Y) − H(Y|X) = H(Y) − 0 = H(Y)

Intuitively, there must be less uncertainty on average about what action to take than there is about the state of the customer. In other words, there’s less information in the decision than the description. Therefore, H(Y) ≤ H(X), since to suppose otherwise would entail creating information “out of thin air”, violating the Data Processing Theorem (Cover and Thomas 2005).
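
A minimal sketch of the indecision calculation for the 30%/70% offer split described above:

```python
import math

def H(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Indecision for the 30%/70% offer split.
indecision = H([0.3, 0.7])
print(f"H(Y) = {indecision:.2f} bits")        # ~0.88 bits

# For a deterministic decision function, H(Y|X) = 0, so I(X;Y) = H(Y) - H(Y|X) = H(Y).
h_y_given_x = 0.0
mutual_information = indecision - h_y_given_x
print(f"I(X;Y) = {mutual_information:.2f} bits")
```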

That is the case for the complete customer state, X. I can examine the situation where just one attribute is revealed and ask “How does knowing attribute Xe change our indecision?” If that knowledge has precisely no effect on the decision, then I can say (following Keynes) that that attribute is irrelevant. In general, I’d expect it to have some effect on the indecision, otherwise it would not be included in the decision function. I define the measure influence to describe the effect of the eth attribute on the final decision:

Influence(Xe) = I(Y;Xe) / H(Y) = [H(Y) − H(Y|Xe)] / H(Y)

It is the normalised mutual information between the decision Y and the eth attribute, Xe. It ranges from 0% to 100%. The former occurs when H(Y|Xe) = H(Y) (that is, telling us Xe does not change our assessment of the decision) while the latter occurs when H(Y|Xe) = 0 (that is, telling us Xe removes all uncertainty about the decision, ie Xe completely “drives” the decision).

Consider the direct marketing example above. Initially, any given customer has a 30% chance of being selected for the special offer. Suppose that I am told that the customer is female and no other details. Given the way the decision function works, the odds now shift to a 35% chance of making the offer (and hence a 65% chance of not making the offer). This suggests that gender has a modest bearing on the decision. If, instead, the deceased attribute showed that a customer was deceased, then the probability of making the offer drops from 30% to 0%. This suggests that deceased has a powerful effect on the decision – at least for those 10% of cases where the customer is marked “deceased”. The influence measure formalises this idea.

By definition, the mutual information is symmetric:

I(Y;Xe) = I(Xe;Y), that is, H(Y) − H(Y|Xe) = H(Xe) − H(Xe|Y)

So the influence measure can be defined in terms of uncertainty about the attribute value given the decision:

Influence(Xe) = [H(Xe) − H(Xe|Y)] / H(Y)

This bi-directionality can be illustrated with the direct marketing example. Initially, we have that any given customer has a 10% chance of being deceased. If we find out that a particular customer made it onto the special offer list, our assessment of this unfortunate status changes to 0%. This means that knowing something about the decision tells us something about the customer attributes. In fact, the decision contains precisely as much information about the attribute as the attribute contains about the decision.

Either way, the individual influence scores for each attribute will not add to 100%; this is because of redundancy in “side information” between attributes. If all attributes were statistically independent, then the influence scores would add to unity.

Influence measures the relationship between customer attributes and decisions. As such, it is characterised by the decision-making function itself and not the relationship between the IS and the external world. In other words, an influential attribute will be influential regardless of its semantic correctness.

It’s also worth sounding a note of caution about attribute influence on decision outcomes. Similarly to correlation, influence does not imply causation. For example, a segmentation task may make extensive use of the postcode attribute, resulting in a high influence score. The suburb attribute would have a very similar score (since it is highly correlated to postcode), yet is never used by the decision function. Hence, any changes to suburb will not result in changes in the decision – they are not causally related.

The influence measure, defined here as the normalised mutual information between the decision and the attribute, quantifies the degree of relevance the attribute has on the decision.

5.4.3         Impact

The impact of customer IQ on customer processes lies in the decisions made. At the semantic level, IQ deficiencies result in misrepresentation of the external world (error ie W ≠ X). Pragmatically, this is only a deficiency if this error results in a different decision than would have been made with perfect IQ (mistake ie Y ≠ Y*). Whether or not an error will become a mistake depends on how the decision function uses the information to arrive at a decision.

Formally, the goal is to bring Y and Y* as close to agreement as is economically justified. The straightforward measures of agreement rely on comparing rates of false positives and false negatives: sensitivity/specificity, recall/precision, lift and ROC analysis, depending on the domain. As measures, they suffer from two drawbacks. Firstly, they only work in cases with binary “yes/no” (“approve/reject” or “offer/no offer”) decisions. They do not scale well to situations involving more than two actions (eg “platinum/gold/standard/reject”).

The second issue is that they do not address “prior probabilities”. Akin to the discussion above regarding the inherent “difficulty” of stating different customer attributes, it is more difficult to do well in a situation where the possible decisions are split 50%-50% than one that is 99%-1%. With the former, there is more uncertainty (or, here, indecision). For example, a fraud detection application that simply reports “no fraud” in all cases will be 99% correct – if the underlying rate of fraud is 1%.

One response is to quantify how close Y* and Y are by using an information-theoretic measure, the mutual information I(Y*;Y). The argument is that if the actual decisions made (Y) are a good predictor of the ones made under perfect information (Y*) then I can be confident that the decision function is operating close to the ideal. This approach takes into account non-binary decisions and deals with the “prior probability” issue.

However, if information about the cost structures of mistakes is available, this should be used. Such information – expressed as the pay-off matrix Π – allows us to describe the impact of IQ on the process in financial terms. This has two advantages. Firstly, it allows us to compare IQ impact across different processes in a fair way. Secondly, it allows IQ impact to be assessed against a range of other costs borne by the organisation.

Determining the pay-off matrices for customer processes is a non-trivial task. Theoretically, the goal is to express the costs in terms of expected utility. This can be expressed in terms of the cash equivalent, where I assume a risk-neutral decision maker (and hence a constant marginal utility of wealth). Following accepted practice, the Net Present Value (NPV) is a reasonable proxy. As found in the practitioner interviews (Chapter 4), it is familiar and widely used in large organisations, including by IS managers. The discount rate should be chosen to comply with the organisation’s financial norms and standards. If necessary, more sophisticated variations could be employed. For example, the Weighted Average Cost of Capital (WACC) could be used to take into account the different costs associated with funding capital with equity versus debt.

To build up the pay-off matrix for a customer process, a number of objective and subjective factors must be brought together. Details of the true cost of, say, mistakenly issuing a customer with a “gold” (premium) credit card when they should have been given a standard one will, in general, depend on objective factors including:

·         the number of customers processed,

·         the frequency with which the process is run,

·         the time horizon for the process,

·         the discount rate,

·         fixed costs associated with operating the process.

However, the major factor is the subjective value placed on the mistake by the organisation. This cost includes lost revenue opportunities, additional financial and non-financial risks and damage to reputation and goodwill. In practical terms, this assessment is likely to be made by a sufficiently senior manager.

A properly constructed pay-off matrix for a customer process allows managers to understand the magnitude of the cost associated with each kind of mistake. The second part of the equation is the frequency with which these mistakes occur. This is captured by the realisation matrix R*, which is the joint probability distribution between Y and Y*.

Consider the example of a mortgage process for a financial services organisation. Suppose there are three possible decisions to make: approve, partner and reject. Approve means the mortgage is granted, partner means the applicant is referred to a partner organisation specialising in “low-doc” or “sub-prime” loans and reject is an outright rejection of the application.

After analysis of the objective and subjective costs, the organisation has the following pay-off matrix (expressed as total future expected costs per customer, discounted to current dollars):

Hence, approving an applicant that should have been rejected incurs the largest cost of $5000. By contrast, rejecting an applicant that should have been approved incurs a more modest cost of $1500, presumably in lost fees and other income.

The realisation of the process is given by the matrix R*:

This tells us that the organisation makes the right decision (or, at least, the one that would have been made with perfect information) 75% of the time (0.2+0.3+0.25). For the other 25% of customers, some mistakes will be made.

Using the Expected Utility criterion, the expected cost (per customer) is given by the scalar product of these two matrices:

E[cost] = Σi Σj R*i,j × Πi,j = $290

Note that in this example, the cost of mistakenly accepting applications that should have been referred to the partner constitutes the largest cost element ($105), more than double the cost of the next biggest mistake. This is despite having a moderate penalty ($1500) and reflects the relatively high frequency of its occurrence.

The interpretation of the amount of $290 is that it is the maximum amount a rational decision-maker would pay to acquire perfect information for a given customer. Multiplying this amount by the number of customers that go through a process (N) and the number of times each customer is processed (f) yields the stake of the process:

Stake = N × f × Σi Σj R*i,j × Πi,j

In terms of correctly handling the time dimension, some adjustment should be made to the frequency to discount the time value of money. For example, for a marketing process with a static pool of 100,000 customers that is run six times a year over four years, the resulting 2.4 million “use-instances” should not all be given equal weight. Cash flows arising in the fourth year should be discounted using the organisation’s internal discount rate. Further, a more detailed analysis should take into account the natural departure and arrival of customers over the four year period in question.
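
A sketch of this discounted Stake calculation is given below, using the 100,000-customer, six-runs-a-year, four-year scenario above. The per-use expected cost (taken from the $290 mortgage illustration) and the 8% discount rate are placeholder assumptions.

```python
# Sketch of a discounted Stake calculation: a static pool of 100,000 customers,
# processed six times a year for four years. The per-use expected cost ($290)
# and the 8% discount rate are assumptions for illustration.
N = 100_000            # customers per run
runs_per_year = 6
years = 4
cost_per_use = 290.0   # expected cost of imperfect information, per customer per run
discount_rate = 0.08   # organisation's internal discount rate (assumed)

stake = 0.0
for year in range(1, years + 1):
    annual_cost = N * runs_per_year * cost_per_use
    stake += annual_cost / (1 + discount_rate) ** year   # discount each year's use-instances

print(f"Discounted stake over {years} years: ${stake:,.0f}")
```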

Stake is a measure of the total cost introduced into a customer process by using less-than-perfect customer information. As such, it provides an upper bound on the value of any improvements to customer information; a rational decision-maker would never pay more than the stake to improve a process since it would be impossible to ever recover more than the stake by just improving the quality of information.

5.4.4        Interventions

The goal of the framework is not just to understand the costs imposed on an organisation by poor customer information quality. Rather, it is to appraise the financial implications of intervening in customer IQ with a view to improving it. Conceptually, the approach is to model these interventions as investments: an initial expenditure of resources followed by a (risky) financial return in the form of increased revenue or decreased costs.

Modelling IQ interventions in this way is useful for two reasons. Firstly, it allows for comparison between competing IQ initiatives, ensuring that the most valuable ones are selected. Secondly, it allows IQ interventions to be justified against the same organisational standards applied to non-IQ initiatives, so that such projects can be approved and funded.

An abstract approach to modelling IQ interventions is required, since IQ interventions can be radically different in mechanism (though not necessarily in effect). Typically, large organisations faced with customer IQ problems have a large range of options for tackling the problem, combining technical and managerial elements. Some examples include:

 

·         change design or layout of customer forms,

·         change business rules or definitions,

·         change process flow,

·         insert quality checks in processes,

·         re-work ETL (extract, transform, load) scripts in databases,

·         modify underlying data model to enforce integrity constraints,

·         train staff involved in handling data,

·         change management performance criteria or bonus/penalty structure,

·         re-negotiate Service Level Agreements with data suppliers,

·         audit, review and improve application code,

·         implement specialised data quality software for consistency checking,

·         rationalise the information systems in the organisation,

·         employ extra staff or outsource manual data validation,

·         purchase new data sources from data brokers (eg Dun and Bradstreet),

·         adopt new standards (eg XML) for handling data,

·         introduce new management methodologies (eg Six Sigma) for quality assurance.

All of these activities are costly, time-consuming and risky. Moreover, they require significant support from a range of people within the organisation and from external partners. The idea is to characterise these interventions in terms of their impact upon customer information quality and, hence, customer process outcomes. When these outcomes can be assigned a financial magnitude, I have an objective basis for comparing these very different candidate interventions.

Formally, I define an IQ intervention as a change in the IS state, X, into X’. In terms of the customer process model, the change is induced on one or more attributes of X. The decision function, D, is applied to the new IS state, X’, resulting in a new action, Y’. This is represented schematically as follows:

Figure 18 Model of IQ Intervention

The top half of this model proceeds as usual: the external world state, W, is communicated to the IS state, X, via process C. The decision function, D, is applied to X, resulting in the decision Y. The realisation process R* relates this decision with the optimal decision, Y*.

The bottom half introduces the intervention process, T. This process maps the IS state, X, into a new state X’. The intent is to achieve an IS state that is a better representation of the external world, W. When the regular decision function D is re-applied to this new state, a new decision, Y’, may result. If X’ is a better representation of W than X, then Y’ will be a better decision than Y.

In terms of comparing different interventions, it is natural to prefer the ones that will have a bigger impact. Any particular intervention could have a number of possible effects on an individual customer attribute:

·         No effect. This means the intervention “agrees” with IS ie the new value is the same as the old value. Hence, no change in the decision and no change in value.

·         Inactionable correction. A change of the IS state at the semantic level, but no change in the decision and hence value.

·         Actionable correction. A change of the IS state at the semantic level and the pragmatic (decision) level, corresponding to a change in value.

·         Valuable correction. An “actionable correction” resulting in a positive change in value.

This leaves open the possibility of an actionable correction with negative value – a situation that arises when an intervention actually makes a process perform worse. This may still be warranted, if overall the intervention has a net positive impact. This situation is analogous to a public inoculation programme, where the downside of some individuals’ allergic reactions is offset by the community benefit to eliminating the disease.

However, in general, it is not possible to anticipate whether a particular change of state will be valuable or not, or, indeed, be actionable or not. This must be done by running the decision function over the changed state and seeing if it produces a different decision.
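
The sketch below illustrates this re-running step for a single record, classifying a proposed correction into the categories above. The decision rule, field names and the two-action pay-off table (a simplification of the three-action mortgage example) are assumptions for illustration only.

```python
# Sketch: classify the effect of a single-record correction by re-running the
# decision function (rule, field names and pay-offs are illustrative assumptions).
def classify_correction(record, corrected, decide, y_star, payoff):
    """Classify a correction per the categories above.

    decide  : the deterministic decision function D
    y_star  : the decision D would make with perfect information for this customer
    payoff  : dict mapping (action_taken, ideal_action) -> penalty in dollars
    """
    if corrected == record:
        return "no effect", 0.0
    y_before, y_after = decide(record), decide(corrected)
    if y_after == y_before:
        return "inactionable correction", 0.0
    # Positive value means the correction moved the decision closer to Y*.
    value = payoff[(y_before, y_star)] - payoff[(y_after, y_star)]
    label = "valuable correction" if value > 0 else "actionable correction, negative value"
    return label, value

decide = lambda r: "approve" if r["income"] >= 50_000 else "reject"   # assumed rule
payoff = {("approve", "approve"): 0, ("reject", "approve"): 1500,
          ("approve", "reject"): 5000, ("reject", "reject"): 0}
print(classify_correction({"income": 30_000}, {"income": 65_000},
                          decide, y_star="approve", payoff=payoff))
```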

In this way, it is possible to value a particular intervention process, T, by examining its effect on the decision outcomes. An intervention that results in precisely the same set of decisions being made (ie Y’ = Y) is economically useless even if it corrected errors in the original communication process. This is the same distinction between semantic quality improvement and pragmatic quality improvement that motivated the earlier definition of actionability:

An Information Quality intervention is actionable if and only if the change in IS state results in a changed decision.

Clearly, any intervention that does not change the IS state (ie X=X’) is not actionable and inactionable interventions are, by definition, worthless. From a design perspective, the goal is to efficiently identify interventions that are most likely to have an impact on decision-making, especially high-value decisions.

The value of an IQ intervention, T, is given by the difference in costs between the baseline position (no intervention) and the situation with the intervention. Using the Net Present Value model for the stake discussed above, I have the Yield:

Yield(T) = S − S’

Where S is the baseline stake (cost of imperfect information) and S’ is the stake under the intervention. So, for intervention T on the pth process, I have:

Yieldp(T) = Np × fp × Σi,j ( R*i,j − R’*i,j ) × Πi,j

where R’* denotes the realisation matrix relating the revised decisions Y’ to Y*.

In words, the value of any intervention for a particular process is the difference in the contingency matrices multiplied by a suitably-scaled pay-off matrix. However, the shared nature of information resources within most organisations means that a change in a source information system, such as a data warehouse or master data repository, will impact across a number of “downstream” processes. The total value is simply the value of the intervention summed over each process within the scope of the analysis:

Yield(T) = Σp∈P Yieldp(T)

This metric is the key “performance measure” for evaluating a proposed intervention, as it quantifies the financial impact of that intervention across the processes of interest. The economically optimal choice is the subset of candidate interventions that maximises this quantity. However, the intervention design task involves more than a simple brute-force search of all possible interventions. Estimating the model parameters is itself likely to be expensive and prone to error. Changes to the source information system will have different impacts on different processes, which themselves vary in their worth to the organisation. A rational approach to designing and evaluating information quality interventions requires not only that the final intervention is itself “good”, but that the process that led to it is reasonably efficient and transparent.

A “map” describing at the outset the stages of design, estimation and evaluation of competing options will help give confidence in the final result. A value-led approach will prevent wasting time and resources on investigating futile, inconsequential or insignificant interventions, as well as offering guidance on how to modify or extend interventions to increase their value. Further, a transparent and objective method of appraising interventions may go some way to assuaging concerns about special-interests in joint-funding of projects.

5.5        Usage

This section outlines the sequence of steps to design and evaluate an Information Quality intervention. It proceeds by undertaking a value-driven analysis of both the opportunities for improvements (technical capacity) and areas of greatest need (business requirements) using the mathematical metrics defined on the constructs described above.

The scenario targeted for adoption of the method has the following elements:

·         a single information source (such as a data warehouse, operational datastore or similar),

·         comprising a set of customer records, one for each customer,

·         with each record having a number of attributes, including demographic and transactional data,

·         used by a number of customer decision processes to partition, segment, classify, predict, allocate or label on a per-customer basis,

·         where multiple candidate information quality improvement interventions are to be evaluated.

Note that the “single information source” does not have to be a single physical table or even database; a network of inter-linked information systems acting in concert to produce an abstract view of the customer suffices.

An example scenario could be a financial services firm where a customer database, augmented by demographic data from an external supplier, is used periodically by a set of marketing, campaign management, customer relationship management applications and credit scoring processes. The enterprise may be considering the relative merits of training contact centre staff on data entry, purchasing a data cleansing tool or integrating customer data from a subsidiary insurance business.

The goals of the method are:

·         Effectiveness. The initiatives recommended by the method should be provably near-optimal in terms of value creation.

·         Efficiency. The method should produce an analysis using a minimal amount of resources, including time and expertise.

·         Feasibility. The computing requirements, availability of data, degree of theoretical understanding and disruption to IS operations should be within acceptable limits.

·         Transparency. The constructs, metrics and steps employed should be intelligible and reasonable to IS professionals, and perceived as being unbiased and aligned to organisation-wide interests.

Rather than evaluating proposed interventions – that is, asking “What can we fix?” – the method proceeds by asking “What needs to be fixed?”. Only then do we ask “What is the best way to fix it?” This approach avoids expending effort on well-designed interventions that address IQ problems of comparatively little economic significance.

The method has two phases. First, it starts with a wide scope of possible problem areas and narrows it through successive iterations of data collection and analysis using the performance metrics. (See Figure 19 below.) Second, candidate interventions are evaluated in terms of costs and benefits, providing an assessment of their value in terms consistent with the organisation’s requirements for formal decision-making. The same metrics can be used to track and review the implementation phase of interventions.

Figure 19 Overview of Method

5.5.1         Organisational Processes

A consequence of taking an organisation-wide view of customer IQ improvement is that a dollar saved from Marketing is worth the same as a dollar saved from Sales. As such, all organisational processes that rely on customer information to create value should be included within the scope. For many organisations, this is a potentially large number of processes spanning numerous information sources and organisational processes. Rather than a full audit of all processes, the Stake metric can be used to prioritise those processes that are likely to have business significance.

The factors are:

·         N. The number of customers that are processed. This can be expressed as a percentage of the customer base for comparative purposes.

·         f. The frequency with which the process is run. This can be expressed as an annual rate or, if the analysis has a fixed time horizon (such as four years), then the rate over that interval.

·         R*. The realisation matrix. Depending on the nature of the project, these probabilities may be expressed as business performance measures like lift or default rates.

·         Π. The pay-off matrix. The dollar amounts associated with the success or failure of a decision process may not be readily available on a per-customer basis. In such cases, a suitable business owner may be able to nominate approximate amounts based on an understanding of the business cost structures and internal accounting model.

At this point, the organisational processes are ranked according to Stake. Therefore, the absolute dollar amounts and probabilities are not so important. Estimating these factors is likely to be difficult, so the analyst should proceed by trying to eliminate processes as soon as possible. For example, it is a waste of effort to conduct interviews with senior managers to ascertain particular values of Π for some process only to find out that it applies to a fraction of 1% of the customer base.

Another approach is to estimate a range (minimum value to maximum value) for these parameters. The product of the minimum values gives a lower bound, the product of the maximum values gives an upper bound and the product of the mean values gives a mean estimate. The upper bound on a process’s Stake can be used to eliminate a process from further consideration. The lower bound can be used as a conservative measure for inclusion of a process on the list.
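
A minimal sketch of this interval-based screening is shown below; the process names, parameter ranges and materiality threshold are all hypothetical.

```python
# Sketch of interval-based screening. Each process gets (low, high) estimates for
# N, f and the expected per-use cost of imperfect information (the R*-and-Pi term);
# all figures below are hypothetical placeholders.
processes = {
    "direct marketing": {"N": (50_000, 80_000), "f": (4, 6), "cost": (0.5, 3.0)},
    "credit scoring":   {"N": (10_000, 15_000), "f": (1, 2), "cost": (50.0, 300.0)},
    "churn prediction": {"N": (500, 1_000),     "f": (1, 1), "cost": (1.0, 5.0)},
}

threshold = 100_000   # assumed materiality threshold in dollars

for name, p in processes.items():
    lower = p["N"][0] * p["f"][0] * p["cost"][0]   # product of minimum estimates
    upper = p["N"][1] * p["f"][1] * p["cost"][1]   # product of maximum estimates
    keep = "eliminate" if upper < threshold else "keep"
    print(f"{name}: stake in [${lower:,.0f}, ${upper:,.0f}] -> {keep}")
```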

The output is a list of the most valuable organisational processes that rely on customer information, ranked by their Stake. (This metric can be interpreted as the amount of value lost due to poor IQ.)

5.5.2        Decision-Making Functions

The organisational processes under analysis rely on different customer attributes to different degrees and not all attributes are used in all processes. This step identifies the attributes that matter most using the Influence metric.

The components here are:

·         Y. The probability distribution over the decision.

·         Xe. The probability distribution over the eth attribute.

·         H(·). The entropy (uncertainty) function.

The Influence of a particular attribute on a particular decision-making function is computed without regard to the inner workings of the function. It can be described solely by inspection of the joint frequency of occurrence between inputs (IS value) and outputs (process decision). To take a tiny illustrative example, consider the Influence of a binary-valued attribute (gender) on a binary-valued decision (marketing offer):

P(Xgender, Y) | Yes  | No
Male          | 0.25 | 0.1
Female        | 0.05 | 0.6

Table 11 Example of Attribute Influence On a Decision

In this example, the customer base is 35% male and 65% female and the marketing offer is made to 30% of customers (Yes=30%). Using the Influence formula, I compute:

Influence(Xgender) = [H(Y) − H(Y|Xgender)] / H(Y) = (0.88 − 0.56) / 0.88 ≈ 0.37

So in this case, the gender attribute has roughly a one-third bearing (≈37%) on the decision. This indicates how much uncertainty about the decision is removed when the gender is known. For a randomly selected customer, there is a 30% chance that the decision function will classify them as receiving an offer. However, upon finding out the customer is female, this drops to around an 8% chance. (Had they been male, the probability would have increased to around 71%.) In this sense, the attribute is considered quite influential.

As this calculation does not rely on having “true” external world values or “correct” decisions, it is readily available and cheap to perform using existing data. The only requirement is for a suitable query language (such as SQL or even XQuery) or a very simple spreadsheet.
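
For illustration, the sketch below computes the Influence of gender from the joint frequencies in Table 11, normalising by H(Y) as defined earlier; it produces a figure of roughly one third.

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint frequencies from Table 11: P(X_gender, Y), rows Male/Female, cols Yes/No.
joint = {("male", "yes"): 0.25, ("male", "no"): 0.10,
         ("female", "yes"): 0.05, ("female", "no"): 0.60}

p_y = {"yes": 0.30, "no": 0.70}       # marginal over the decision
p_x = {"male": 0.35, "female": 0.65}  # marginal over gender

h_y = H(p_y.values())
# H(Y | X_gender) = sum over x of P(x) * H(Y | X = x)
h_y_given_x = sum(p_x[x] * H([joint[(x, y)] / p_x[x] for y in p_y]) for x in p_x)

influence = (h_y - h_y_given_x) / h_y   # normalised by H(Y), per the definition above
print(f"Influence(gender) = {influence:.0%}")   # roughly a third
```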

Using this metric, the most influential attributes for the high-stake processes can be identified and ranked. To calculate the aggregate effect of an attribute upon a set of organisational processes, I can adopt the following heuristic:

The Importance, M, of attribute a is the product of its Influence and Stake, summed over the set of processes P of interest:

Ma = Σp∈P ( Influencea,p × Stakep )

Since the percentage values of the Influence do not add to unity, weighting the Stakes in this way results in a figure of merit that is useful only as a guide and should not be taken as a real financial measure. However, it does go some way to providing a sense of the financial importance of each attribute aggregated across the set of processes of interest.

The output of this step is a list of attributes that are most influential on decision-making in high-value processes, ranked by Importance.

5.5.3         Information System Representation

Using the short-list of important attributes, the analyst proceeds to examine the opportunities for improvement. The first place to look is the areas with the poorest information quality: the assumption is that the greatest scope for improvement (and hence financial gain) lies with those attributes that perform the worst.

As discussed, the raw “error rate” is not a good measure for improvement, owing to the issue of prior probabilities. Instead, the Fidelity metric (on a per-attribute basis) is a fairer way to compare attributes with one another:

φe = I(We;Xe) / H(We) = [H(We) − H(We|Xe)] / H(We)

The Fidelity metric is a “gap measure” in the sense that it provides a normalised description of how far the reality falls short of the ideal. As such, it gives a sense of the comparative “improvability” of each attribute under examination.

Naturally, sourcing the true “external world” value of a customer attribute is expensive and difficult – if it were not this entire exercise would not be required! – so collecting these values should be done sparingly. In a practical sense, this is achieved by:

·         Identifying a suitable source. Depending on the attribute, this could be done through finding an authoritative source (such as an applicable government registry) or direct customer confirmation. In some instances, it may be sufficient to find another trusted source system or industry benchmark source.

·         Sampling the customer base. Since computation of the Fidelity metric requires only a probability distribution, only a sub-set of customers need to be audited. The question “How many customers do I need in my sample?” is answered by “Enough to be confident that the attributes are ranked in correct order.” That is to say, it may not be necessary to estimate the metric for each attribute to a high degree of confidence as long as there is confidence in their rank[11].

Based on the trusted source and sample, the Fidelity metric for each attribute is computed and a new attribute ranking is produced. This new ranking can be used to eliminate attributes that, while important in the sense outlined above, are already of high-quality. This means the short-list of important attributes is further reduced to just those where significant improvement is both warranted and feasible.

5.5.4        Information Quality Interventions

The next step is to examine candidate IQ improvement interventions. These may have been proposed in advance of this analysis, or new ones may have emerged from the earlier analysis of organisational processes, decision-making functions and the representational effectiveness of the IS.

The short-list of attributes is the starting point. Interventions that address these are more likely to be (comparatively) valuable than ones that address attributes not on the list, since these attributes are both important and improvable. The value of a particular intervention, T, can be computed using the Yield formula:

Yield(T) = Σp∈P Np × fp × Σi,j ( R*p,i,j − R’*p,i,j ) × Πp,i,j

This requires computing two Realisation matrices across the set of processes of interest. The first relates the prior (actual) decisions Y to the decisions with perfect information, Y*. The second relates the revised decisions Y’ (after the intervention) to Y*. In general, this can only be achieved by applying the intervention (that is, correcting the customer data) and “re-running” it through the same decision processes and comparing the rate at which decisions change. This is likely to be expensive, time-consuming and possibly disruptive to operations since it uses operational resources like storage space, network bandwidth and CPU cycles.

Similarly to computing Fidelity, only estimates of overall probabilities are required so sampling will help reduce the cost of the exercise. However, before this is undertaken across all candidate interventions, some may be eliminated beforehand. Recall that for any intervention on a particular customer attribute, there will be a proportion of instances in which the corrected value “disagrees” with the original value.

Mathematically, I define this proportion as:

τT = P(Xe ≠ X’e) = 1 − Σi P(Xe = i, X’e = i)

(This metric is called Traction, since it characterises the degree to which an intervention actually changes the status quo.)

Not all of these interventions will be actionable, that is, result in a changed decision. Further, not all actionable changes will have a positive impact on value. In general, we might expect a trade-off between the proportion of customer records that are changed and whether or not the change is beneficial. An intervention with a large Traction may be termed aggressive, while an intervention that focuses on ensuring all changes are beneficial might be described as cautious.

I can estimate the Traction for an intervention without “re-running” the customer records through the decision function. Therefore, it is a comparatively cheap metric, given that a sample of the intervention is available. Taking a conservative approach, candidate interventions with low Traction can be eliminated if they have a low value even when assumed to be maximally cautious ie every change results in the maximum positive impact. This can be calculated by picking the maximum from each processes’ pay-off matrix and multiplying it by τ. An example illustrates:

Suppose the Region attribute is under review. This attribute, Xregion, has four states: north, south, east and west. The intervention, T, involves replacing values in Region with those from a database held by a recently-acquired subsidiary business with (presumably) better geo-coding, X’region.


I can compare the two by sampling the joint probability mass functions:

The Traction for this intervention is given by:

τT = 1 − Σi P(Xregion = i, X’region = i) = 1 − 0.81 = 0.19

This means that 19% of customer records will be changed. Some of those changes will be actionable, some won’t. Some of those actionable changes will have a positive impact on value (ie improve the decision) while the remainder will have a negative impact.

Suppose I consider an alternative intervention, T’, on the same attribute, via an internal consistency check with those customers with a current street address:


Again, I compare the two by sampling the joint probability mass functions:

The Traction for this second intervention is given by:

τT’ = 1 − Σi P(Xregion = i, X’region = i) = 1 − 0.97 = 0.03

So the first intervention has a Traction of 19% while the second has only 3%. While this doesn’t necessarily mean the former is preferred, it does suggest that the latter could be dropped from the short-list for further (expensive) evaluation on the grounds that, even if it were perfect, it could not have more than a 3% impact on any given process. Of course, if further evaluation revealed that the first intervention was pathological (that is, introduced more errors than it removed), then the second could be revisited.
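
A sketch of the Traction calculation and the conservative screening step is given below. The 4×4 joint distributions are hypothetical, constructed only so that their off-diagonal mass matches the 19% and 3% figures above, and the $5000 maximum penalty is taken from the mortgage illustration as an assumed worst case.

```python
import numpy as np

def traction(joint_x_xprime):
    """Traction = P(X != X'): one minus the diagonal mass of the sampled joint pmf."""
    p = np.asarray(joint_x_xprime, dtype=float)
    return 1.0 - np.trace(p)

# Hypothetical joint pmfs for Region (north/south/east/west) against each
# intervention's proposed values; only the off-diagonal totals are taken from the text.
T_subsidiary = np.array([
    [0.20, 0.02, 0.01, 0.01],
    [0.02, 0.25, 0.02, 0.01],
    [0.01, 0.02, 0.18, 0.02],
    [0.02, 0.02, 0.01, 0.18],
])
T_consistency = np.array([
    [0.24,  0.005, 0.00,  0.005],
    [0.005, 0.29,  0.005, 0.00 ],
    [0.00,  0.005, 0.22,  0.00 ],
    [0.005, 0.00,  0.00,  0.22 ],
])

max_payoff = 5000.0   # largest per-customer penalty across the processes of interest (assumed)
for name, pmf in [("subsidiary geo-coding", T_subsidiary), ("consistency check", T_consistency)]:
    tau = traction(pmf)
    print(f"{name}: traction = {tau:.0%}, best-case impact <= ${tau * max_payoff:,.0f} per customer")
```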

The short-listed candidate interventions can now be examined in more detail. Again, a sampling approach is used to estimate the expected benefit of the intervention. To implement this, a sample of “corrected” customer records is fed into the decision-making function and the outputs (decisions) compared with original outputs. The differences in decisions are scaled by the pay-off matrix for each process and aggregated. The total expected value of each intervention (yield) is then computed:

Yield(T) = Σp∈P Np × fp × Σi,j ( R*p,i,j − R’*p,i,j ) × Πp,i,j

The interventions can then be ranked by their expected value. At this point, proposed interventions can be combined, disaggregated or more closely targeted. For example, fixing date-of-birth and postcode may both show significant benefits, enough to justify implementation. However, when combined, they may yield even higher returns than singly through decision synergy. Alternatively, the yield may improve if the intervention is targeted to the top 25% of customers, rather than applied across the entire customer-base.
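
A minimal sketch of the per-process Yield estimate from sampled realisation matrices is shown below; the two-action pay-off matrix, the before/after realisation estimates and the N and f values are placeholders.

```python
import numpy as np

def process_yield(R_before, R_after, payoff, N, f):
    """Per-process yield: reduction in expected penalty, scaled by use-instances.

    R_before, R_after : joint pmfs of (decision, decision-with-perfect-information),
                        estimated from samples before and after the intervention
    payoff            : per-customer penalty matrix, Pi
    """
    cost_before = float((np.asarray(R_before) * payoff).sum())
    cost_after = float((np.asarray(R_after) * payoff).sum())
    return N * f * (cost_before - cost_after)

# Hypothetical two-action example (offer / no offer); all figures are placeholders.
payoff = np.array([[0.0, 8.0],
                   [25.0, 0.0]])
R_before = np.array([[0.20, 0.12],
                     [0.08, 0.60]])
R_after  = np.array([[0.25, 0.07],
                     [0.03, 0.65]])

total = process_yield(R_before, R_after, payoff, N=100_000, f=6)
print(f"Estimated yield for this process: ${total:,.0f}")
```

Summing this quantity over all processes in scope gives the total Yield used to rank the candidate interventions.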

Formerly-eliminated proposals can be revived if the short-listed interventions show a lower-than-expected benefit. If this occurs, then the “re-run” and subsequent analysis can be repeated in order to increase the value realised by the entire set of interventions.

5.6       Conclusion

This chapter presents a framework for valuing improvements to the quality of customer information. It comprises a mathematical model, grounded in semiotic and economic theory and used to derive performance measures, and a method for systematically analysing the value opportunities in customer IQ improvements.

The framework responds to practitioner requirements to build robust, testable business cases to support improvements in IQ based on cash-flows. These financial measures are necessary for IS practitioners and business managers to communicate the value of such initiatives and influence existing organisational resourcing processes.

The target users of the framework are analysts within the organisation engaged in a value-led, technology-agnostic analysis exercise. The method uses iterative elimination of proposals to focus on high-value opportunities and seeks to minimise wasted effort on irrelevant, ineffective or low-stakes interventions. It also takes into account the costs of acquiring information about the performance of different aspects of the model.

The output is a set of interventions which optimises the overall expected financial yield from the organisation’s customer processes. This does not capture intangible and elusive “soft” benefits (improvements to morale, forecasting and planning, reputation and so on), so it represents a lower bound on the value of the interventions. However, it is a hard lower bound that is more acceptable to financial controllers, enabling IQ projects to compete with a range of IS and non-IS investments.

The key constructs, metrics and steps in the method are outlined below. The analysis relies on the use of a set of candidate interventions, proposed by different business and IS stakeholders, being successively refined and eliminated.


 

 

Step | Construct | Metrics | Description | Resources
1 | Organisational Processes | Stake (Realisation matrix, R) | An audit of organisational processes that rely on customer information is undertaken. | Internal documentation relating to process costs and performance.
2 | Organisational Processes | Stake (Process pay-offs, Π) | For each, the decision outcomes are identified and their pay-offs and penalties estimated. | Value estimates from process owners.
3 | Decision-Making Function | Influence and Importance | For each process, the Influence of each attribute is computed. The aggregate Importance is then derived. | Transaction history of processes, including outcomes.
4 | Information System | Fidelity | A sampling approach is taken to understand how well the IS represents the external world. | An authoritative information source (or surrogate) for attributes of interest.
5 | Quality Interventions | Traction | Sampling of interventions used to gauge magnitude of change on the database. | Intervention applied to representative subset of records.
6 | Quality Interventions | Yield | Promising interventions are “re-run” through process to estimate net financial benefit. | “Revised” records processed and compared with original.

Table 12 Outline of Method for Valuation

From a project management perspective, the scope of the analysis is determined principally by two factors:

·         The number of organisational processes of interest.

·         The number of proposed quality interventions of interest.

Other factors – the number of attributes in the database and the state of existing knowledge – are not directly controllable but are determined by existing conditions. The time taken to prepare the analysis depends on the initial scope and the quality of the outputs, that is, the level of financial rigour and detail. This level is determined by the intended use: a formal cost/benefit analysis for very large projects will be judged to a higher standard than a smaller, informal analysis.

Much of the analysis can be re-used later. For example, the Influence metrics will remain constant as long as the underlying business logic and database definitions don’t change too much. Similarly, Fidelity and Stake will change slowly with respect to business conditions and, once estimated, may only need to be updated periodically to retain their usefulness.

[5] The paper that launched the field was originally called “A Mathematical Theory of Communication” in 1948.

[6] Note that all logarithms used here are base 2, unless otherwise stated. This means the unit for information is bits.

[7] Economic Theory describes two prices: Willing To Pay (WTP, or “buy price”) and Willing To Accept (WTA, or “reservation price”). In general, WTP ≤ WTA, a phenomenon known as the endowment effect.

[8] This author refers to the concept of semantic information as “statistical information”. It does not mean “statistical”, in the sense of “data collected by statisticians” or similar.

[9] Indeed, some branches of economics examine information value from the perspective of social signalling and conceive of information as a positional good, whereby being seen as someone “in the know” confers status.

[10] Of course, this is not to say that judgement and discretion were not used in formulating the business rules and decision parameters in the first place.

[11] There is a well-developed body of knowledge around the practicalities of statistical sampling, especially the determination of sample size; the statistical significance of rankings can be assessed with Spearman’s ρ or Kendall’s τ (Neuman, 2000).

 

