
Python – Alternate standard-library implementation of NLTK concordance() that allows saving output

I need the functionality of NLTK’s concordance() for something I’m working on. The alternative was to struggle with downloading NLTK’s components through corporate proxies, add NLTK as a dependency to my project, and still not be able to display the output of concordance() (the best case likely being to jury-rig something using ngrams). It was easier and quicker to just rewrite the functionality of concordance, sans dependencies.
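A minimal sketch of what such a function could look like, using only built-in string methods. The details here are assumptions rather than a definitive implementation: the `width` parameter and its default of 25 characters of context per side are inferred from the sample output further down, matching is a plain case-sensitive substring search, and the left context is padded so the keyword stays in the same column even for matches near the start of the text.

```python
def concordance(text, phrase, width=25):
    """Yield one line per occurrence of `phrase` in `text`.

    Each line holds up to `width` characters of context on either side
    of the match. `width` and its default are assumptions inferred from
    the sample output, not a documented setting.
    """
    if not phrase:
        return  # an empty phrase would match at every position
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)
        # Left context, right-justified so the match always starts at column `width`.
        left = text[max(0, start - width):start].rjust(width)
        # Right context; slicing past the end of the text is harmless.
        right = text[end:end + width]
        yield left + phrase + right
        # Advance by one character so overlapping matches are found too.
        start = text.find(phrase, start + 1)
```

Because it is a generator, the lines can be printed, collected into a list, or written to a file, which is the point of not relying on NLTK's version that only prints.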

Usage is simple:

text = "We must remember that thought is abstraction. In Einstein's metaphor, the relationship between a physical fact and our mental reception of that fact is not like the relationship between beef and beef-broth, a simpler extraction and condensation; rather, as Einstein goes on, it is like the relationship between our overcoat and the ticket given us when we check our overcoat. In other words, human perception involves coding even more than crude sensing. The mesh of language, or of mathematics, or of a school of art, or of any system of human abstracting, gives to our mental constructs the structure, not of the original fact, but of the symbol system into which it is coded, just as a map-maker colors a nation purple not because it is purple but because his code demands it. But every code excludes certain things, blurs other things, and overemphasizes still other things. Nijinski's celebrated leap through the window at the climax of 'Le Spectre d'une Rose' is best coded in the ballet notation system used by choreographers; verbal language falters badly in attempting to convey it; painting or sculpture could capture totally the magic of one instant, but one instant only, of it; the physicist's equation, Force = Mass X Acceleration, highlights one aspect of it missed by all these other codes, but loses everything else about it. Every perception is influenced, formed, and structured by habitual coding habits- mental game habits- of the perceiver."

for line in concordance(text, 'thing'):
    print(line)

The output shows the target keyword or phrase, right down the middle:

ry code excludes certain things, blurs other things, an
tain things, blurs other things, and overemphasizes sti
eremphasizes still other things. Nijinski's celebrated 
er codes, but loses everything else about it. Every per

It also works with phrases.

for line in concordance(text, 'of the'):
    print(line)

Output:

ructs the structure, not of the original fact, but of th
f the original fact, but of the symbol system into which
its- mental game habits- of the perceiver.
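Since concordance() is iterated over rather than printing on its own, saving the output is just a matter of writing each line to a file. A small sketch, where the helper name `save_lines` is hypothetical and chosen only for illustration:

```python
def save_lines(lines, path):
    """Write each string in `lines` to `path`, one per line.

    Returns the number of lines written. `save_lines` is a hypothetical
    helper name, not part of any library.
    """
    count = 0
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
            count += 1
    return count
```

It accepts any iterable of strings, so something like save_lines(concordance(text, 'of the'), 'concordance.txt') should work without building a list first; the file name is only an example.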
