Rating tobacco industry documents for corporate deception and public fraud
Brown, Catherine Gene
Publicly available tobacco industry documents represent a window into an industry that perpetrated corporate deception and fraud that resulted in degraded public health and cost millions of lives. The current study addresses the topic of corporate deception and fraud from a linguistic standpoint, employing corpus methods, text analysis, (critical) discourse analysis and automated computational linguistic methods to assess six automated linguistic indicators of deceptive corporate strategy. These six indicators were mined from an extensive body of research on deception and language. They represent common themes and observations in the literature and include the following: adversarial language, allness and superlative language, deprofiled agency due to overuse of passive constructions, group mentality, cognitive verbs, and strategically ambiguous language. Computer programs were written to detect instances of each linguistic indicator of deceptive corporate strategy in individual documents. Using the Tobacco Documents Corpus, a specialized full-text corpus representative of the entire body of tobacco industry documents, each indicator was assessed separately by source (company of origin), audience affiliation (internal or external to the tobacco industry), decade and audience type (individual or mass recipients). Additionally, internal audience documents were automatically ranked for deceptive corporate strategy using a vector model method. Tobacco control literature has demonstrated that external audience documents are deceptive and fraudulent as a whole. Accordingly, the linguistic benchmark for deception was estimated as the average external audience document. Internal audience documents were then ranked against this benchmark using the vector analysis classification method.
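The benchmark-and-rank idea can be illustrated with a minimal sketch. The abstract does not state how each indicator is computed or which vector similarity measure is used, so everything below is an assumption: the three word lists in the hypothetical `LEXICONS` stand in for only three of the six indicators, each document becomes a vector of indicator rates per 1,000 tokens, the benchmark is the element-wise mean of the external audience vectors, and internal documents are ranked by cosine similarity to that benchmark. The function names (`indicator_vector`, `centroid`, `rank_internal`) are illustrative only.

```python
import math
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicons standing in for three of the six indicators;
# the study's actual indicator definitions are not given in the abstract.
LEXICONS: Dict[str, set] = {
    "allness": {"all", "always", "never", "every", "none", "best", "most"},
    "group": {"we", "us", "our", "industry"},
    "cognitive": {"believe", "think", "know", "understand", "consider"},
}

TOKEN = re.compile(r"[a-z']+")


def indicator_vector(text: str) -> List[float]:
    """Rate of each lexicon's words per 1,000 tokens, in LEXICONS order."""
    tokens = TOKEN.findall(text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty documents
    return [1000.0 * sum(t in words for t in tokens) / n for words in LEXICONS.values()]


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean vector: the assumed 'average external audience document'."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_internal(external: List[str], internal: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank internal documents by similarity to the external-audience benchmark."""
    benchmark = centroid([indicator_vector(t) for t in external])
    scored = [(doc_id, cosine(indicator_vector(t), benchmark)) for doc_id, t in internal.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example texts, not drawn from the Tobacco Documents Corpus.
external_docs = [
    "We believe all our products are the best.",
    "Our industry has always acted responsibly.",
]
internal_docs = {
    "memo1": "We know every study shows harm.",
    "memo2": "Shipment schedule for March.",
}

for doc_id, score in rank_internal(external_docs, internal_docs):
    print(doc_id, round(score, 3))
```

Cosine similarity is used here because it compares the proportions of indicators rather than their absolute rates. This fits the abstract's finding that the indicators are more informative in concert than individually. A Euclidean distance to the centroid would be an equally plausible reading of "vector model method."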
To evaluate the efficacy of the indicators and of the multivariate ranking method, documents from the highest, middle and lowest rankings were assessed by hand using (critical) discourse analytic methods. This analysis partly validated the automatic ranking algorithm. However, statistical tests did not support the hypotheses predicting a higher incidence of the six indicators in external audience documents and in documents from certain sources. Rather, deceptive corporate strategy is better captured by examining potential indicators in concert. The results of the automatic ranking algorithm demonstrate an avenue for quickly organizing documents in a large collection for subsequent discourse analysis.