How certain words within text become 'keywords'

   Jul 18, 3:55 pm

Washington, July 18 (ANI): What marks a word as a keyword is not that it appears very frequently in a given text, but that it occurs in greater numbers only at certain points in the text, scientists from the Max Planck Institute for the Physics of Complex Systems have found.

They also discovered that sections of text that are distant from each other are related, in the sense that they preferentially use the same words and letters.

How letters and words correlate with the subject of a text is something Eduardo Altmann and his colleagues from the Max Planck Institute for the Physics of Complex Systems have studied with the help of statistical methods.

The Dresden-based scientists mathematically studied the semantic properties of texts by translating ten different English texts into various codes. One of the chosen texts was the English edition of Leo Tolstoy's "War and Peace".

One example of what the scientists did was to translate the letters in a text into a binary sequence, replacing all vowels with 1 and all consonants with 0. Using additional mathematical functions, the scientists examined different levels of the text, from individual vowels and letters to whole words, each of which had been translated into various codes.
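The vowel-and-consonant coding can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `to_binary_sequence` is hypothetical, and the choices to treat "y" as a consonant and to skip spaces and punctuation are assumptions, since the report does not say how the scientists handled them.

```python
# Assumption: only a, e, i, o, u count as vowels; "y" is treated as a consonant.
VOWELS = set("aeiou")

def to_binary_sequence(text):
    """Map each letter to 1 (vowel) or 0 (consonant).

    Assumption: non-letter characters (spaces, digits, punctuation) are skipped.
    """
    return [1 if ch in VOWELS else 0 for ch in text.lower() if ch.isalpha()]

print(to_binary_sequence("War and Peace"))
# prints [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
```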

In so doing, it was possible to identify repeating patterns within the text as a whole. Such correlation within a text is referred to as long-range correlation. This indicates whether two letters located at arbitrarily distant points in the text are connected with each other.

For example, when we find a letter "W" at a certain point, there is a measurably higher probability that we will find the letter "W" again a few pages later. "Understandably enough, if a certain point in the book talks about war, there is a high probability that the word war will also appear a few pages later. What is surprising is that we also find this higher probability at the level of individual letters," said Altmann.
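One common way to quantify such a connection between distant points is the sample autocorrelation of a coded sequence at a given distance, or lag. The sketch below is a textbook estimator and not necessarily the method the scientists used; the function name `autocorrelation` is hypothetical. A value near zero at large lags suggests no connection, while values that stay clearly above zero as the lag grows point to long-range correlation.

```python
def autocorrelation(seq, lag):
    """Sample autocorrelation of a numeric sequence at the given lag.

    Textbook estimator, used here as an illustration only; the report
    does not state which estimator the scientists applied.
    """
    n = len(seq)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(seq) - 1")
    mean = sum(seq) / n
    variance = sum((x - mean) ** 2 for x in seq) / n
    if variance == 0:
        return 0.0  # a constant sequence carries no correlation information
    covariance = sum(
        (seq[i] - mean) * (seq[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance

# Toy input: a strictly alternating 1/0 sequence repeats every two symbols.
alternating = [1, 0] * 50
print(autocorrelation(alternating, 1))  # neighbours always differ
print(autocorrelation(alternating, 2))  # symbols two apart always match
```

For a real text, the input would be the coded sequence of the whole book and the lag would range up to thousands of characters.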

The scientists found this long-range correlation not only between letters, but also at higher linguistic levels, such as words. At each individual level, the correlation persists across the different texts.

Long-range correlation enables the scientists to draw conclusions about the extent to which certain words are connected to a topic.

Furthermore, the scientists studied what is known as "burstiness", which describes whether a pattern of characters occurs with increased frequency in a passage of text. It shows, for instance, whether a word comes up more often in a certain section of the text. The more frequently a certain word is used in a passage, the more likely it is that that word is representative of a certain subject.
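Burstiness can be made concrete by looking at the gaps between successive occurrences of a word. The sketch below uses a commonly cited coefficient, B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of the gaps. This is a swapped-in illustration, not a measure the report attributes to the scientists, and the function names `word_gaps` and `burstiness` are hypothetical. Values near -1 mean evenly spaced occurrences, while values above 0 mean the word arrives in clusters.

```python
import statistics

def word_gaps(words, target):
    """Distances (in words) between successive occurrences of target."""
    positions = [i for i, w in enumerate(words) if w == target]
    return [b - a for a, b in zip(positions, positions[1:])]

def burstiness(words, target):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the gaps.

    Illustrative measure only; returns None when the word occurs too
    rarely (fewer than three times) for the gaps to be meaningful.
    """
    gaps = word_gaps(words, target)
    if len(gaps) < 2:
        return None
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

# Toy data: "the" is spread evenly, "war" is packed into two clusters.
even_text = ["the", "x"] * 10
clustered_text = ["war"] * 5 + ["x"] * 50 + ["war"] * 5

print(burstiness(even_text, "the"))       # evenly spaced, so B is -1
print(burstiness(clustered_text, "war"))  # clustered, so B is positive
```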

The scientists demonstrated that certain words come up repeatedly throughout a text but are not present in bursts in any given passage. Although these words do exhibit long-range correlation, they are not closely related to the topic at hand.

"Articles are the best examples of these. They come up very frequently in every text, but they are not crucial in conveying a given topic," said Altmann.

Their findings could be used in future to improve Internet search engines, and they could also help to analyse texts and identify plagiarism. (ANI)
