World Knowledge in NLP
In natural language processing (NLP), “world knowledge” refers to the broad understanding of the outside world that is required to use and comprehend language in context. It covers what people learn through education, reading, reflection, conversation, and everyday experience, and it goes beyond the literal meaning of words or the grammatical structure of sentences.
The following summarises the role of world knowledge in NLP:

Necessity for Understanding Language
- Syntax deals with sentence structure and semantics with literal meaning, whereas pragmatics and discourse analysis require understanding what an utterance means in context. That understanding rests largely on real-world experience.
- Humans rely on a great deal of common sense and knowledge of how the world is structured to inform their judgements and guide their everyday activities. For instance, navigation is easier for someone who knows that medieval European cities usually have a church with a conspicuous tower at their centre.
- When producing and understanding language, fluent speakers draw on a wealth of knowledge, including references to everyday situations. Speech is frequently grounded in real-world objects or experiences.
Role in Specific NLP Tasks
Pragmatics and Discourse: World knowledge is essential for tasks such as resolving anaphoric relations, where pronouns or other phrases refer to entities mentioned earlier in the text. For example, recognising that “the disaster” refers to “Hurricane Hugo” requires knowing that hurricanes are disasters.
Question Answering (QA): Knowledge-based QA systems work by translating natural-language questions into queries over structured databases that hold facts about the world. This contrasts with IR-based QA, which relies on text alone, although IR-based QA also makes use of facts retrieved from that text.
Multiword Expressions (MWEs): Understanding MWEs with “pragmatic idiomaticity” requires knowing the particular situations or settings they are tied to, such as “Good morning” being used in the morning.
Word Sense Disambiguation (WSD): Although it is not usually labelled “world knowledge”, knowledge of how words relate to concepts and to other words is used implicitly when determining a word’s correct sense in context. WSD draws on resources such as WordNet, a form of organised lexical knowledge that captures conceptual-semantic information and the links between word senses.
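The “Hurricane Hugo” example under Pragmatics and Discourse can be sketched in a few lines of Python. This is only a toy illustration, not how any particular system works: the IS_A table is hand-written for the example (it is not taken from WordNet), and the function names hypernym_chain and resolve_definite_np are hypothetical. The idea is that a definite noun phrase such as “the disaster” can be linked to the most recent earlier mention whose type falls under it in an IS-A hierarchy.

```python
# Toy IS-A (hypernym) facts, hand-written for illustration only.
IS_A = {
    "hurricane": "natural_disaster",
    "earthquake": "natural_disaster",
    "natural_disaster": "disaster",
    "senator": "person",
}


def hypernym_chain(concept):
    """Return the concept followed by its ancestors in the toy IS-A table."""
    chain = []
    while concept is not None and concept not in chain:
        chain.append(concept)
        concept = IS_A.get(concept)
    return chain


def resolve_definite_np(head_noun, candidates):
    """Pick the most recent earlier mention whose type IS-A head_noun.

    candidates is a list of (mention_text, concept) pairs in textual order.
    Returns None when no candidate is compatible.
    """
    for mention, concept in reversed(candidates):
        if head_noun in hypernym_chain(concept):
            return mention
    return None


# Earlier mentions in a hypothetical text, in order of appearance.
antecedents = [("Senator Smith", "senator"), ("Hurricane Hugo", "hurricane")]
print(resolve_definite_np("disaster", antecedents))
```

Without the IS-A facts the two candidates are indistinguishable to the resolver, which is exactly the gap that world knowledge fills. A real system would combine such knowledge with syntactic and salience cues rather than rely on it alone.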
Computational Representation and Challenges
- Making the vast amount of human knowledge about the world explicit for machines is very difficult.
- Ontologies are one way of capturing world knowledge explicitly, often for a particular domain: they give a formal, explicit specification of the concepts in a domain of discourse and the relations between them. Their goal is to encode knowledge in declarative, machine-processable formal languages (such as variants of first-order logic). They represent a shared understanding of a domain or of the world.
- Projects such as Cyc aim to build “artificial common sense” by compiling large knowledge bases of carefully considered assertions, written in a formal language by human experts over many years. The work is laborious, but it yields knowledge of very high quality.
- WordNet is another large, human-curated resource, containing conceptual-semantic knowledge about words. It is organised into synsets (sets of synonyms) linked by semantic relations such as hypernymy (IS-A) and meronymy (part-whole), and so serves as a database of lexical relations among word senses. Although WordNet focuses on lexical semantics, the relations it records reflect aspects of how people organise their mental knowledge of the world.
- Because real-world knowledge is inherently hard to describe in full, computational models, particularly statistical ones, often struggle to capture its complexity. Hand-coding the world knowledge needed for deep understanding is regarded as extremely difficult.
- Other approaches exist, such as the Normalised Web Distance (NWD), which uses search engines to tap the large amount of implicit, low-quality, unstructured knowledge available on the web.
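To make the Normalised Web Distance in the last item concrete, here is a minimal sketch of the commonly cited formula (the one also used for the Normalised Google Distance). It compares how often two terms occur separately with how often they occur together: NWD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name, parameter names, and page counts below are assumptions made for illustration. No real search engine is queried, and the sketch assumes the total page count N is larger than either term's count.

```python
import math


def normalised_web_distance(f_x, f_y, f_xy, n_pages):
    """Compute NWD from page counts.

    f_x, f_y : number of pages containing each term on its own
    f_xy     : number of pages containing both terms
    n_pages  : total number of indexed pages (assumed larger than f_x and f_y)
    """
    # Terms that never occur, or never co-occur, are treated as infinitely far apart.
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        return math.inf
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))


# Made-up counts, purely for illustration.
print(normalised_web_distance(100, 100, 10, 10_000))
```

With these made-up counts the result is about 0.5. Terms that always appear together score 0, and the score grows as co-occurrence becomes rarer relative to the individual counts, so lower values suggest closer semantic relatedness.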
In short, world knowledge is the external, real-world information that NLP systems must access and apply in order to understand and use natural language fully in context, especially in areas such as pragmatics, discourse, and knowledge-based applications. Representing and using this knowledge computationally remains one of the biggest challenges in the field.