
Monday, November 21, 2011

Generative learning (model) vs. Discriminative learning (model)

This is a Q/A that Jinwoo found, and it lays things out nicely.. haha

Let's say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y), and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as 'the probability of y given x'.
Here's a really simple example. Suppose you have the following data in the form (x,y):
(1,0), (1,0), (2,0), (2,1)
p(x,y) is
      y=0   y=1
     -----------
x=1 | 1/2   0
x=2 | 1/4   1/4
p(y|x) is
      y=0   y=1
     -----------
x=1 | 1     0
x=2 | 1/2   1/2
If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.
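To check the arithmetic, here is a minimal Python sketch (the variable names p_xy and p_y_given_x are mine, not from the Q/A) that recovers both tables from the four data points:

from collections import Counter

# The four (x, y) pairs from the example above.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

# Joint p(x,y): count each pair and divide by the total number of points.
joint = Counter(data)                    # {(1,0): 2, (2,0): 1, (2,1): 1}
n = len(data)
p_xy = {pair: c / n for pair, c in joint.items()}

# Conditional p(y|x): normalize each pair count by how often its x occurs.
x_counts = Counter(x for x, _ in data)   # {1: 2, 2: 2}
p_y_given_x = {(x, y): c / x_counts[x] for (x, y), c in joint.items()}

print(p_xy)         # {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}
print(p_y_given_x)  # {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5}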
The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes' rule and then used for classification. However, the distribution p(x,y) can also be used for other purposes. For example, you could use p(x,y) to generate likely (x,y) pairs.
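Concretely, the Bayes'-rule step is just a marginalization and a division: p(y|x) = p(x,y) / p(x), where p(x) = Σ_y p(x,y). Here is a small sketch of both uses, assuming the joint table from the example above:

import random

# Joint table p(x,y) from the example above.
p_xy = {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}

# Bayes' rule: p(y|x) = p(x,y) / p(x), where p(x) marginalizes y out.
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p         # p(x=1) = 0.5, p(x=2) = 0.5

p_y_given_x = {(x, y): p / p_x[x] for (x, y), p in p_xy.items()}
print(p_y_given_x)  # {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5}

# The joint also lets you generate likely (x,y) pairs by sampling from it.
pairs, weights = zip(*p_xy.items())
print(random.choices(pairs, weights=weights, k=5))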
From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that. This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it's pretty heavy going. The overall gist is that discriminative models generally outperform generative models in classification tasks.

And reading the passage from the paper below makes it click a bit more.. heh
HMMs are a form of generative model, that defines a joint probability distribution p(X, Y) where X and Y are random variables respectively ranging over observation sequences and their corresponding label sequences. In order to define a joint distribution of this nature, generative models must enumerate all possible observation sequences.
One way of satisfying both these criteria is to use a model that defines a conditional probability p(Y|x) over label sequences given a particular observation sequence x, rather than a joint distribution over both label and observation sequences. Conditional models are used to label a novel observation sequence x* by selecting the label sequence y* that maximizes the conditional probability p(y*|x*). The conditional nature of such models means that no effort is wasted on modeling the observations, and one is free from having to make unwarranted independence assumptions about these sequences; arbitrary attributes of the observation data may be captured by the model, without the modeler having to worry about how these attributes are related.
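The decoding rule in that last paragraph is just an argmax over candidate label sequences. Here is a toy sketch of the idea (the sentence, tags, and probabilities below are invented for illustration; a real CRF would compute p(y|x) from features rather than looking it up in a table):

# Toy conditional model: for one observation sequence x, a distribution
# over candidate label sequences y. All numbers are made up.
p_y_given_x = {
    ("the", "dog", "runs"): {
        ("DET", "NOUN", "VERB"): 0.7,
        ("DET", "VERB", "NOUN"): 0.2,
        ("NOUN", "NOUN", "VERB"): 0.1,
    },
}

def decode(x):
    # Label a novel observation sequence x by selecting the label
    # sequence y* that maximizes p(y|x).
    candidates = p_y_given_x[x]
    return max(candidates, key=candidates.get)

print(decode(("the", "dog", "runs")))  # ('DET', 'NOUN', 'VERB')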
Also, a CRF lecture URL:

http://videolectures.net/cikm08_elkan_llmacrf/