Word Space Model


Distributional semantics

Distributional semantics is the practice of relating linguistic entities (e.g. words, terms, phrases, sentences, and documents) to each other based on their distributional properties. This practice has a long-standing tradition in mathematical approaches to linguistics, originating in the distributional methodology of Zellig Harris. It has since proven both useful and successful for a wide range of natural language processing tasks and applications, including vocabulary acquisition, word categorization, various forms of cognitive modeling, information retrieval, text categorization, and machine translation, to name a few. The major advantages of this approach are that it is completely data-driven and that it requires no external resources. Distributional semantics can model both semantic relations - like that between "hot" and "warm" - and associative relations - like that between "hot" and "coffee" - depending on what kind of distributions one considers:

  • Semantic relations are captured by collecting information on which words co-occur with similar other words.
  • Associative relations are captured by collecting information on which words co-occur with each other.
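The distinction between the two relation types can be illustrated with a small sketch (the mini-corpus and the choice of whole sentences as context windows are assumptions for illustration, not from the article): a direct co-occurrence count captures the associative relation between "hot" and "coffee", while the similarity of two words' co-occurrence profiles captures the semantic relation between "hot" and "warm", even though they never co-occur directly.

```python
import math

# Assumed toy corpus; each sentence serves as one context window.
sentences = [["hot", "coffee"], ["warm", "coffee"], ["hot", "tea"], ["warm", "tea"]]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Words-by-words co-occurrence matrix.
M = [[0] * len(vocab) for _ in vocab]
for s in sentences:
    for a in s:
        for b in s:
            if a != b:
                M[index[a]][index[b]] += 1

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

# Associative relation: "hot" and "coffee" co-occur directly.
print(M[index["hot"]][index["coffee"]])                    # 1
# Semantic relation: "hot" and "warm" never co-occur ...
print(M[index["hot"]][index["warm"]])                      # 0
# ... but they occur with the same other words, so their profiles match.
print(round(cosine(M[index["hot"]], M[index["warm"]]), 6))  # 1.0
```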

Word Space Models

Distributional semantics often favors geometric representations as an implementational framework - hence the name "word spaces". One example of such a framework is the Latent Semantic Analysis (LSA) model, which collects text data in a words-by-documents matrix that is factorized into a reduced-dimensional space using singular value decomposition. This type of model captures predominantly associative relations between words. LSA has been enormously influential and has spawned a large body of literature reporting empirical results in various semantic tasks and applications. The original LSA model has since been improved and augmented in a number of ways, including various alternatives to the singular value decomposition, like non-negative matrix factorization and independent component analysis, as well as the probabilistic Latent Semantic Analysis framework and related topic models like Latent Dirichlet Allocation, which extend LSA with a probabilistic basis that results in a more principled approach with a solid foundation in statistics.
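The LSA pipeline described above can be sketched in a few lines (the toy documents and the reduced dimensionality k=2 are assumptions for illustration; real LSA models use large corpora and typically a few hundred dimensions):

```python
import numpy as np

# Assumed toy document collection.
docs = [
    "hot coffee warm coffee",
    "hot tea warm tea",
    "cold ice cold snow",
]
vocab = sorted({w for d in docs for w in d.split()})
# Words-by-documents count matrix.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD: A ≈ U_k diag(s_k) Vt_k, keeping k dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]  # word representations in the reduced space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

widx = {w: j for j, w in enumerate(vocab)}
# "hot" and "warm" occur in the same documents, so they end up close:
print(round(cosine(word_vectors[widx["hot"]], word_vectors[widx["warm"]]), 3))  # 1.0
```

Because the similarity is computed over document-level distributions, words that appear in the same documents (an associative relation) come out similar, matching the characterization of LSA above.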
Another example of a word space model is the approach introduced by the seminal work of Hinrich Schütze and by the Hyperspace Analogue to Language (HAL) model, in which text data is collected in a words-by-words matrix by noting how often words co-occur within a context window spanning a (normally) small number of word tokens. In contrast to LSA, this type of model captures predominantly semantic relations between words. The basic HAL model has also been improved and augmented in various ways, e.g. by using different sizes of context windows and different weightings of the co-occurrence counts, and by utilizing various dimensionality-reduction techniques to alleviate the computational complexity of the models. Arguably the most important recent development of the HAL-type distributional paradigm came with the BEAGLE model, which introduced a method based on vector convolution for utilizing word order to build the distributional representations.
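A minimal sketch of the windowed words-by-words counting that HAL-type models rest on (the sample sentence and the symmetric, unweighted window of size 2 are assumptions for illustration; the original HAL uses directed, distance-weighted windows):

```python
from collections import defaultdict

# Assumed toy text and window size.
tokens = "the hot coffee was served with the warm tea".split()
window = 2

# Words-by-words co-occurrence counts within the sliding context window.
cooc = defaultdict(lambda: defaultdict(int))
for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if i != j:
            cooc[w][tokens[j]] += 1

print(cooc["hot"]["coffee"])  # 1
print(cooc["coffee"]["the"])  # 1
```

Each row of this matrix is a word's context profile; comparing rows (e.g. with cosine similarity) then surfaces the semantic relations the article attributes to HAL-type models.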

Random Indexing

Random Indexing is a framework designed to provide a more principled approach to distributional semantics than LSA and HAL. The idea is to use sparse high-dimensional random vectors as a basis for accumulating distributional representations incrementally. This methodology ensures that the dimensionality of the representations never increases, regardless of the size of the data. This property makes Random Indexing extremely scalable and considerably more efficient than other frameworks for distributional semantics. Random Indexing can also be used to produce both LSA- and HAL-type models - i.e. it can capture both associative and semantic relations - and it allows for a number of operations that are very useful in distributional modeling of meaning. For example, permutation is used as an operator in the Random Indexing framework to improve on the computationally expensive convolution operation used in the BEAGLE model.
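The accumulation scheme can be sketched as follows (the dimensionality, number of nonzero elements, and window size are toy assumptions, far smaller than in practice, and using `np.roll` as the permutation operator is one simple choice for encoding word order):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, nonzeros, window = 100, 4, 2

def index_vector():
    # Sparse ternary random vector: a few +1/-1 entries, rest zeros.
    v = np.zeros(dim)
    pos = rng.choice(dim, size=nonzeros, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

# Assumed toy text.
tokens = "the hot coffee was served with the warm tea".split()
vocab = sorted(set(tokens))
index_vecs = {w: index_vector() for w in vocab}       # fixed random vectors
context_vecs = {w: np.zeros(dim) for w in vocab}      # accumulated incrementally

# For each occurrence, add the (permuted) index vectors of neighbouring words;
# the shift by relative position plays the role of the permutation operator.
for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if i != j:
            context_vecs[w] += np.roll(index_vecs[tokens[j]], j - i)

# The dimensionality stays fixed no matter how much text is processed.
print(context_vecs["hot"].shape)  # (100,)
```

New text only adds more increments to the existing vectors, which is what makes the dimensionality independent of corpus size.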

-Magnus Sahlgren, 2010
