Comparative evaluation of clustering algorithms
1. Introduction to the algorithms
**Mini Batch K-Means**
Mini Batch K-Means is a variant of K-Means. Unlike K-Means, which uses the full dataset on every iteration, it samples a small batch from the dataset each time. This has little impact on the clustering quality, yet *greatly shortens the computation time*.
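A minimal sketch of this using scikit-learn's `MiniBatchKMeans` on synthetic blob data (all parameter values here are illustrative, not tuned):

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs.
X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# batch_size controls how many samples each iteration draws
# instead of iterating over the whole dataset.
model = MiniBatchKMeans(n_clusters=3, batch_size=100, n_init=3, random_state=0)
labels = model.fit_predict(X)
```

On large datasets this typically converges in a fraction of the time standard `KMeans` needs, at a small cost in inertia.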
**Affinity Propagation**
Affinity Propagation clusters data by passing messages between data points. Unlike K-Means and similar algorithms, *affinity propagation does not require the number of clusters (the k value) to be specified in advance; however, it runs slowly*.
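A short sketch with scikit-learn's `AffinityPropagation`; note that, unlike the K-Means example, no cluster count is passed in (the blob data and parameters are illustrative):

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# No n_clusters argument: the exemplars (cluster centres) emerge
# from the message-passing procedure itself.
model = AffinityPropagation(random_state=0)
labels = model.fit_predict(X)

# Indices of the data points chosen as exemplars.
n_found = model.cluster_centers_indices_.shape[0]
```

The `preference` parameter indirectly controls how many clusters emerge; lower values yield fewer clusters.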
**Mean Shift**
Mean Shift, also known as mean-shift clustering, aims to identify the densest regions of the feature space, and it too is an iterative process. During clustering, the offset mean of an initial centre point is first computed, the point is moved to that offset mean, and the process then continues from the new position until the stopping condition is satisfied. Mean Shift also introduces a kernel function to improve the clustering effect. In addition, *Mean Shift is well applied in fields such as image segmentation and video tracking*.
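The iteration above can be sketched with scikit-learn's `MeanShift`; the kernel radius (bandwidth) is the key parameter, and here it is estimated from the data (the `quantile` value is an illustrative choice):

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# The bandwidth is the radius of the window whose mean each centre
# is shifted towards on every iteration.
bandwidth = estimate_bandwidth(X, quantile=0.2)
model = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = model.fit_predict(X)
```

Like Affinity Propagation, Mean Shift infers the number of clusters itself: each density mode that the shifted centres converge to becomes one cluster.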
**Spectral Clustering**
Spectral Clustering is a fairly common clustering method that evolved from graph theory. It first connects the points in the feature space into a graph: the farther apart two points are, the lower the weight of the edge between them; likewise, the closer they are, the higher the weight. Finally, the graph formed by all the feature points is partitioned so that the total weight of edges between different subgraphs is as low as possible, while the total weight of edges within each subgraph is as high as possible, which achieves the clustering. *The benefits of spectral clustering are that it can identify sample spaces of arbitrary shape and can obtain a globally optimal solution.*
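A sketch of the "arbitrary shape" claim using scikit-learn's `SpectralClustering` on two interleaving half-moons, a non-convex shape where K-Means struggles (the dataset and parameters are illustrative):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaving half-moons: non-convex clusters.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# A nearest-neighbour affinity builds the graph whose cut
# defines the clusters.
model = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                           n_neighbors=10, random_state=0)
labels = model.fit_predict(X)
```

The graph cut follows the connectivity of the moons rather than straight-line distance, so each moon ends up as one cluster.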
**Agglomerative Clustering**
Agglomerative Clustering is a form of hierarchical clustering. The algorithm merges all sample points from the bottom up to form a tree, so it no longer produces a single flat clustering but generates a hierarchy of clusters. Hierarchical clustering determines the similarity relationship between samples by computing the distance between them; in general, the smaller the distance, the higher the similarity. The most similar points and clusters are merged first, and this iterates until a single tree is generated. Since hierarchical clustering involves repeated pairwise computation, its time complexity is relatively high and it runs slowly.
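A minimal sketch with scikit-learn's `AgglomerativeClustering`; the `linkage` criterion decides which pair of clusters is merged at each bottom-up step (parameters here are illustrative):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

# linkage="ward" merges the pair of clusters whose union least
# increases the within-cluster variance; passing n_clusters cuts
# the merge tree at the level with 3 clusters.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
```

Alternatives such as `linkage="average"` or `"complete"` change the merge criterion without changing the overall bottom-up procedure.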
**BIRCH**
BIRCH is the abbreviation of Balanced Iterative Reducing and Clustering using Hierarchies, which is quite a mouthful.
BIRCH introduces the clustering feature tree (CF tree): samples are first grouped into small subclusters, and then clustering is performed among those subclusters. *BIRCH's advantage is that it can complete clustering with only a single scan of the dataset and runs fast, making it especially suitable for large datasets.*
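A short sketch with scikit-learn's `Birch`; the `threshold` and `n_clusters` values are illustrative:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# threshold bounds the radius of each CF-tree subcluster;
# n_clusters sets the final global clustering applied to
# the subclusters after the single scan.
model = Birch(threshold=0.5, n_clusters=3)
labels = model.fit_predict(X)
```

Because the CF tree summarises the data incrementally, `Birch` also supports `partial_fit` for data that arrives in chunks.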
**DBSCAN**
DBSCAN is the abbreviation of Density-Based Spatial Clustering of Applications with Noise, a name that is just as long.
DBSCAN is based on the concept of density: the number of samples contained within a certain neighbourhood of a cluster point must be no less than a given threshold. *The algorithm runs fast and can effectively handle noise points present in the feature space. However, DBSCAN performs poorly on sample collections whose density is unevenly distributed.*
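A sketch with scikit-learn's `DBSCAN` on the same half-moon shape used for spectral clustering; `eps` and `min_samples` encode exactly the density threshold described above (the values are illustrative):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the neighbourhood radius; min_samples is the minimum
# number of points that must fall inside it for a core point.
model = DBSCAN(eps=0.25, min_samples=5)
labels = model.fit_predict(X)

# Points labelled -1 are reported as noise rather than being
# forced into a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Note that no cluster count is supplied: the number of clusters falls out of the density structure of the data.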
You can use the following comparison chart to decide which method to use in practice.
2. An example
Below, we apply each of the clustering methods above to the Iris dataset.
Import the cluster module.
Load the Iris dataset.
Output the clustering image of each method in turn.
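The three steps above can be sketched as follows, assuming scikit-learn and matplotlib are available; the estimator parameters and the output filename `iris_clustering.png` are illustrative choices, not values from the original walkthrough:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn import cluster, datasets

# Load the Iris dataset (150 samples, 4 features).
iris = datasets.load_iris()
X = iris.data

# One estimator per method discussed above.
estimators = {
    "MiniBatchKMeans": cluster.MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0),
    "AffinityPropagation": cluster.AffinityPropagation(random_state=0),
    "MeanShift": cluster.MeanShift(),
    "SpectralClustering": cluster.SpectralClustering(n_clusters=3, random_state=0),
    "AgglomerativeClustering": cluster.AgglomerativeClustering(n_clusters=3),
    "Birch": cluster.Birch(n_clusters=3),
    "DBSCAN": cluster.DBSCAN(eps=0.8, min_samples=5),
}

# Fit each estimator and plot its labels over the first two features.
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
for ax, (name, est) in zip(axes.ravel(), estimators.items()):
    labels = est.fit_predict(X)
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=15)
    ax.set_title(name)
axes.ravel()[-1].axis("off")  # hide the unused eighth subplot
fig.savefig("iris_clustering.png")
```

Only the first two of the four Iris features are plotted, so the pictures are a projection; the estimators themselves are fitted on all four.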
Final output images. Summary: the Iris dataset is not especially well suited to cluster analysis, so the effects are not obvious; it is recommended that you find other datasets to try these methods on. The examples here are only meant to show the characteristics of each clustering method. If you want to put any of these methods into practice, it is also important to master the mathematical principles behind them.