Weka EM covariance - 军军小站|张军博客

Weka EM covariance

description 1:

Dear All,

    I am trying to find out what the minStdDev parameter in the EM clustering algorithm really means. Can anyone help me?

    I have not looked at the code, but I suspect that the minStdDev is used as the first estimate of the covariance of a Gaussian in the mixture model. Am I correct?

    I have found the equations (or perhaps equations similar to the ones) used to calculate the parameters of a Gaussian mixture model in the EM algorithm. There are three, with these functions:

    The first one calculates the probability of each Gaussian.
    The second calculates the mean of each Gaussian.
    The third calculates the covariance matrix of each Gaussian.
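
For reference, the post does not reproduce the three equations. A common textbook form of the updates for a full-covariance Gaussian mixture is sketched below. The symbols are assumptions, not taken from the post: N data points x_i, K Gaussians, mixing weights \pi_k, and responsibilities \gamma_{ik} computed from the current parameters.

```latex
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}

\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, x_i}{\sum_{i=1}^{N} \gamma_{ik}}

\Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}}
                {\sum_{i=1}^{N} \gamma_{ik}}
```

Because \gamma_{ik} depends on \pi_k, \mu_k and \Sigma_k, the iteration needs starting values for all three, which is the point raised in the next paragraph.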

    But this means that, to start off, there has to be an initial guess at the parameters of the Gaussian mixture model, i.e. the probability (weighting factor) of each Gaussian is needed, as are the mean and covariance matrix.

    If I am wrong, how is the EM algorithm initialised, i.e. how is the initial guess at the mixture model arrived at? Does minStdDev play any part in it? Also, is a full covariance matrix calculated in the EM algorithm, or are just the standard deviations or variances calculated, i.e. are right (axis-aligned) elliptical Gaussians used?

     I am guessing that the random number generator is used to pick one or more data points at random as initial values for the means.

    This question really follows up on my previous postings about differences between Mac and PC results when using the EM algorithm, and my worries about the stability of the algorithm. I was (naively) using the default minStdDev value of 1.0E-6. However, after a reply to a previous posting, I have tried scaling the data to lie between -1 and +1, and also to zero mean and unit SD. When I try these scaled data sets, Mac and PC produce the same result. So I realised that I ought to think about the value of minStdDev.
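
The two scalings mentioned here can be sketched per attribute as follows. This is a minimal illustration in plain Python, not the code the poster used; the function names `rescale` and `standardize` are hypothetical, and the population (divide by n) standard deviation is an assumption.

```python
import math

def rescale(column, lo=-1.0, hi=1.0):
    """Linearly map one attribute's values onto [lo, hi]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        # A constant attribute has no range; place it at the midpoint.
        return [(lo + hi) / 2.0 for _ in column]
    return [lo + (hi - lo) * (x - cmin) / (cmax - cmin) for x in column]

def standardize(column):
    """Shift and scale one attribute to zero mean and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if sd == 0.0:
        # A constant attribute cannot be given unit spread.
        return [0.0 for _ in column]
    return [(x - mean) / sd for x in column]

scaled = rescale([0.0, 5.0, 10.0])
z_scores = standardize([1.0, 2.0, 3.0])
```

After either transformation the attribute spreads are of order 1, so a floor such as 1.0E-6 is small relative to the data whatever units the raw values were in.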

      Many thanks for your help in advance.

John Black

description 2:

EM in Weka (Java) is a naive implementation. That is, it treats each
attribute independently of the others given the cluster (much the same
as naive Bayes for classification). Therefore, a full covariance
matrix is not computed, just the means and standard deviations of each
numeric attribute.
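
As an illustration of what "independent given the cluster" means, the density of an instance under one cluster is the product of one-dimensional normal densities, one per numeric attribute. The sketch below is not Weka's code: the function names are hypothetical, and working with log densities (so the product becomes a sum) is a choice made here for the example.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log of the one-dimensional normal density at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def naive_log_density(instance, means, sds):
    """Log density of an instance under one cluster, attributes treated independently.

    Equivalent to a Gaussian with a diagonal covariance matrix whose
    diagonal entries are sds[j] ** 2.
    """
    return sum(log_normal_pdf(x, m, s) for x, m, s in zip(instance, means, sds))

# Two attributes, both sitting exactly at the cluster mean with unit spread.
value = naive_log_density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Only one mean and one standard deviation per attribute per cluster are needed, which is why no off-diagonal covariance terms appear.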

The minStdDev parameter is there simply to help prevent numerical
problems. This can be a problem when multiplying large densities
(arising from small standard deviations) when there are many singleton
or near-singleton values. The standard deviation for a given attribute
will not be allowed to be less than the minStdDev value.
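
The floor and the numerical problem it guards against can be sketched as below. This is a hedged illustration, not Weka's implementation: the function names, the weighted-variance formula and the choice of 60 attributes are all assumptions made for the example.

```python
import math

def weighted_mean_sd(values, weights, min_std_dev=1e-6):
    """Weighted mean and floored standard deviation of one attribute in one cluster."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    # The estimate is never allowed to fall below min_std_dev.
    return mean, max(math.sqrt(var), min_std_dev)

def peak_density(sd):
    """Height of a normal density at its mean; grows as sd shrinks."""
    return 1.0 / (sd * math.sqrt(2.0 * math.pi))

# A cluster holding a single distinct value has zero spread, so the floor applies.
mean, sd = weighted_mean_sd([3.0, 3.0], [1.0, 1.0])

# Multiplying many large densities overflows double precision.
product = 1.0
for _ in range(60):
    product *= peak_density(sd)
```

With the floor at 1e-6 each peak density is roughly 4e5, and the product over 60 attributes exceeds the largest representable double. A larger floor, or data scaled so that realistic spreads sit well above the floor, keeps the product finite.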

EM is initialized with the best result out of 10 executions of
SimpleKMeans (with different seed values).
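
The "best of 10 runs" idea can be sketched with a small stand-alone k-means. This is not SimpleKMeans itself: the function names are hypothetical, and judging "best" by the lowest within-cluster sum of squared errors is an assumption, since the reply does not say which criterion is used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Plain k-means; returns (centers, within-cluster sum of squared errors)."""
    rng = random.Random(seed)
    # Start from k data points picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

def best_of_n_kmeans(points, k, n_runs=10, base_seed=0):
    """Run k-means with n_runs different seeds and keep the lowest-error run."""
    runs = [kmeans(points, k, seed=base_seed + i) for i in range(n_runs)]
    return min(runs, key=lambda run: run[1])

data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, sse = best_of_n_kmeans(data, k=2)
```

The winning centers would then serve as the starting means for EM, which answers the question of where the initial guess comes from.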

Hope this helps.

Cheers,
Mark.
