January 22, 2009


When I was in school I did some empirical studies of what might now be called "citation clouds" -- intra- and intertextual citation patterns in science journals. I was able to build a simple yet robust statistical model in which certain features of an article's citation cloud could be used to predict how often that article would itself be cited in the next few years. Note that the model included no information whatever about the content of the articles in question. I submitted the paper to a leading psychology journal under the title "How to Do Things with Citations." The lone reviewer, who I suspect didn't catch the tacit citation of J. L. Austin in the title, rejected my paper on the basis of unseemly cynicism.


The paper sounds fabulous. I confess to enjoying, though, the justification for rejection. It's so quaint. I actually could never even imagine rejecting something for such a reason; it's quite odd and surprising.


Oh, the reviewer tossed in a few more technical points; e.g., my results didn't agree with someone else's -- even though I explicitly explained why. But bad attitude is what it came down to, especially since in the Discussion section I included a list of empirically-based tips on how you too can jack up your citation rates. I'd cite the rejection letter if I could find it.


What about the hints on jacking up citation rates?


It'll cost ya...


hmmn... I'm not sure of your preferred currency. Gossip? mittens? rubles?


Just think of me fondly from time to time in years to come.

The paper is lost in my archives someplace, but the results confirmed what you might expect. So, if you wanted to persuade potential readers that your article is important without actually going to the trouble of producing high-quality work, what tactics might you employ?

(1) Cast a wide net for potential readers: do this by including a long list of citations. This move signals relevance to a lot of different audiences, plus the long list also conveys erudition, which couldn't hurt.

(2) Show that you're working on a hot topic -- cite mostly recent works, especially those that are already being cited quite frequently by others. You can include some old classics (typically heavily cited, which is a good thing), as long as your "citation half-life" -- the median age of your references -- is short.

(3) Tie your work to recognized paradigms -- cite works that tend to cluster together in others' reference lists. I.e., if researchers who cite paper A also tend to cite paper B, then you should cite both A and B.

I can't remember in what order these variables loaded into the structural model, but together they accounted for more than 50% of the variance in subsequent citation rates, which by social science standards is a very strong result. Of course this study is now dated, it applied to only certain fields, your results may vary, etc. Furthermore, resorting to this sort of manipulation surely demonstrates decline of symbolic efficiency. Still...
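For anyone who wants to see what the three features above might look like in practice, here is a minimal Python sketch. It is not the original model: the function names, the sample years, and the toy reference lists are all hypothetical, and it only computes the raw features (reference count, "citation half-life" as the median age of references, and co-citation counts), not the structural model that combined them.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def citation_half_life(pub_year, ref_years):
    """Median age, in years, of an article's references."""
    return median(pub_year - year for year in ref_years)


def cocitation_counts(reference_lists):
    """Count how often each pair of works shares a reference list."""
    pairs = Counter()
    for refs in reference_lists:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical article published in 2009 with five references,
# four recent and one old classic.
ref_years = [2008, 2007, 2007, 2005, 1962]
features = {
    "n_references": len(ref_years),                    # tactic (1)
    "half_life": citation_half_life(2009, ref_years),  # tactic (2)
}

# Hypothetical reference lists from three other papers, for tactic (3).
others = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
pairs = cocitation_counts(others)
```

With these made-up numbers the single old classic does not drag the half-life up, because the median ignores how extreme the oldest reference is. Pairs with high co-citation counts are the ones tactic (3) says to cite together.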


How's this for some trite, albeit alarming, praise of quantitative analysis of (a) speech:

"One day after the occasion, USA Today offered as an analysis of the [inauguration] speech a list of the words most frequently used, words like America, common, generation, nation, people, today, world. This is exactly the right kind of analysis to perform, for it identifies the location of the speech’s energy in the repetition of key words and the associations forged among them by virtue of that repetition.

In the years to come what USA Today has begun will be expanded and elaborated in a thousand classrooms. Canonization has already arrived."

Stanley Fish in NYT:

M Sowid

Could you have just said "buzzword" or "the way forward" or hell with it... let's talk about the folksyism of that one guy.

Speech and the repetitive nature of speech WITHIN the social structure isn't miraculous nor is it mysterious. I guess that comes with knowledge of more than one language and the requisite "linearity" of various approaches in linguistics.

All the same...it makes for good conversation for some. Speech.

We all merely propagandize. Some are more successful than others. Intensity is in the eye of the most superfluous or occasionally the most accurate.

Is Wordi really just another magnetic poetry game?

Who knows.


Great post.

A link and an implicit counter-argument. Brad Borevitz's work State of the Union (http://stateoftheunion.onetwothree.net) uses word frequency clouds (among other things) to visualise the entire corpus of State of the Union addresses. He has a detailed political rationale for this methodology - using quantitative means to take political language apart. He writes:

"The counting up of words suggests a different sort of reading practice. There is reason to be skeptical of the positivist implications of a statistical analysis of language, but there is also motive to appreciate and explore the current vogue of quantitative methods. There is something compelling in the urge to empirically examine this particular corpus for clues as to how things have gone horribly wrong. Maybe we can no longer bear to listen to the address, or maybe it has become impossible for us to read it. There are certainly few who would be willing to scrutinize all 3000 pages of our legacy of 214 messages from the president. Perhaps counting is a defense against the spell of iconic language." (http://stateoftheunion.onetwothree.net/essay.htm)

I've written on Borevitz and other "data artists" applying similar techniques - which I read as operating against information (http://journal.fibreculture.org/issue11/issue11_whitelaw.html).




All excellent. ktismatics: so very nicely done. Now someone please tell me that Fish's last three paragraphs are ironic.

Martin @ Home

Very interesting. I find tag clouds useful, but would like to see other visualization of text (what about a reverse cloud that shows LEAST used words or unusual combinations).
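A rough idea of what the data behind such a "reverse cloud" could be, sketched in Python. The function name, the tokenizing rule, and the sample sentence are assumptions made for illustration; a real version would need to decide how to handle stop words, punctuation, and ties.

```python
import re
from collections import Counter


def rarest_words(text, n=5):
    """Return the n least frequent words as (word, count) pairs."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    # Ascending by count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]


# Hypothetical sample text.
sample = "the nation and the people and the world today"
rare = rarest_words(sample, 3)
```

In any long text most words occur only once, so the interesting design question is which of the many one-off words to show.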

Using word visualizations is not new (I'm thinking specifically of much of the avant-garde art from the teens to the 50s).

And such things as word frequency counts, adjacencies, etc., have been in the textual analysis of English profs for 100 years (think concordances, think of those textual analyses that "prove" or "disprove" that Bacon or Oxford wrote "Shakespeare").


Very interesting discussion of tag clouds, one that seems to me is really a discussion about meaning-making and how we read. Your discussion is grounded in the idea that meaning via reading is achieved (in English) by reading sentences left to right as we move down the page.

Tag clouds, however, offer a different way of reading, one that asks us to think about rhetorical devices other than how the words, sentences, and paragraphs are ordered. As you write above, we are forced to think about "frequency, proximity, and duration." But we are also forced to think about colors used, number of words included in the tag cloud, order of the words, fonts, and so on, as well as the theories associated with these subjects.

For example, when I see the tag cloud of your Ranciere paper, I am reading not only the relationship between the words and their size, but the scattered layout you chose, the pastel-like font color, the serif font, the number of words that are chosen to be represented, and so on. I wonder what words are missing that you chose not to include. Were there numbers, as well? Why did you choose colors rather than the traditional black text on white background (which I imagine is how the paper was written and how it will appear when published)? Why didn't you shift to a screen-friendly sans-serif font? All of these questions enhance the meaning of a text rather than strip it of meaning, and can lead to wonderfully nuanced discussions of both the tag cloud and the original text.

Because there are so many choices, tag clouds are rhetorical. That is, they structure readings of texts based on the choices an author makes when composing the tag cloud. I write about this briefly (with examples) in terms of Obama's Cairo Speech at: http://bit.ly/zyhJh (scroll down a bit).

Thanks, again, for this interesting discussion, which is challenging me to rethink many of my ideas.

Sherman Dorn

Tag clouds are quasi-analytical impressions. You're right that they (deliberately) strip meaning and semantic relationships from text. They also are horrible representations of quantitative data, for which frequency counts would be more accurate, scannable by the eye, and digestible by the brain.
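To make the comparison concrete, here is a small Python sketch of the plain frequency table meant above: the same counts a tag cloud is built from, printed as an aligned, sorted list. The function name, the tokenizing rule, and the sample text are illustrative assumptions, not anyone's actual tool.

```python
import re
from collections import Counter


def frequency_table(text, top=10):
    """Return the most frequent words as aligned 'word  count' lines."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    rows = counts.most_common(top)
    if not rows:
        return ""
    # Pad every word to the longest one so the counts line up in a column.
    width = max(len(word) for word, _ in rows)
    return "\n".join(f"{word.ljust(width)}  {count}" for word, count in rows)


# Hypothetical sample text.
table = frequency_table("world nation world people world nation")
print(table)
```

The exact numbers are readable at a glance and the ranking is unambiguous, which is the point being made against font-size encoding.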

