# Overview of Clustering Algorithms (k-Means++/FCM/Agglomerative Hierarchical Clustering/DBSCAN)

## Concepts in Clustering

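• Hierarchical vs. partitional: when clusters are allowed to have sub-clusters, the data is divided level by level and the result is a tree, whose hierarchy is the hierarchy of the clustering. Sub-clusters at the same level do not overlap, and every element belongs to a sub-cluster at some level. A partitional clustering, by contrast, is a single flat division of the data into non-overlapping clusters.
• Exclusive, overlapping, and fuzzy: an exclusive clustering assigns each element to exactly one cluster, while an overlapping clustering lets an element belong to several. In a fuzzy clustering, no element belongs wholly to any single cluster; instead, every element belongs to every cluster with some degree of membership. For any one element, the membership degrees usually sum to 1.
• Complete vs. partial: a complete clustering requires every element to be assigned to a cluster, whereas a partial clustering allows noise, that is, elements that belong to no cluster.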
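
To make the idea of membership degrees concrete, here is a minimal sketch of how fuzzy c-means (FCM) typically assigns them to a single point once the cluster centers are known. The function name `fuzzy_memberships` and the default fuzzifier `m=2.0` are choices made for this illustration and do not come from the original post.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Membership degree of `point` in each cluster (fuzzy c-means formula)."""
    # Euclidean distance from the point to every cluster center.
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on one or more centers is shared among those centers only.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)  # m > 1 controls how fuzzy the assignment is
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the u_i sum to 1 by construction.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

With these two centers, the point at distance 1 from the first and 3 from the second gets memberships of roughly 0.9 and 0.1, which add up to 1.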

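## Types of Clusters

• Well-separated: every element is closer to every other element in its own cluster than to any element in another cluster. On a plot, such clusters are visibly separate.
• Prototype-based: every element is closer to the center of the cluster it belongs to (the centroid of the cluster's elements) than to the center of any other cluster.
• Graph-based: the nodes of the graph are the objects, and the edge weights are distances. The definition parallels the well-separated or prototype-based one, except that edge weights replace an externally defined distance.
• Density-based: a commonly used notion of a cluster, and one of the most widely applicable. The density of the elements determines where the clusters lie. It is usually the choice when noise is present and needs to be identified, or when clusters have irregular shapes.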

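## Commonly Used Clustering Algorithms

• Basic k-means: the k-means algorithm. Its clusters are prototype-based. It applies when the number of clusters is known in advance, expects clusters to be roughly round (globular), and cannot distinguish noise.
• Agglomerative hierarchical clustering: every point starts as its own cluster; the closest clusters are then merged repeatedly until the number of clusters meets the user's requirement.
• DBSCAN: a density-based clustering method that produces a partitional clustering.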
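
As an illustration of the prototype-based idea behind basic k-means, the following is a minimal pure-Python sketch. It assumes the caller supplies the initial centers, so it is plain k-means rather than the k-means++ seeding named in the title, and the function name `kmeans` and its parameters are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centers, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
```

Every point is forced into some cluster, which is why this method cannot set noise aside the way a density-based method such as DBSCAN can.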

# Tutu-Android 清华小图

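• Still fretting over failing to book a study room at the Wentu (文图) library, staying up until midnight every night just to grab one before you can sleep in peace?
• Still making wasted trips, or waiting around in vain, because you didn't know the library had no seats left?
• Always running late for your study-room reservation, risking a no-show and ending up unable to book a study room for a month?
• Don't worry: 清华小图 is here to help :D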

# Building an Image Search Engine with OpenCV

OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform. Adopted all around the world, OpenCV has more than 47 thousand people of user community and estimated number of downloads exceeding 9 million. Usage ranges from interactive art, to mines inspection, stitching maps on the web or through advanced robotics.

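In short, OpenCV (Open Source Computer Vision Library) is computationally efficient and suited to real-time tasks. The library is written in optimized C/C++ and can take full advantage of multi-core processing and hardware acceleration. It has a large technical community and more than 9 million downloads, and it is used very widely, for example in interactive art, mine inspection, and map stitching.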

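## 0. Building an Image Search Engine with Python + OpenCV

• First, OpenCV is an open-source computer vision library that is widely used in computer vision, image processing, and pattern recognition. Its interfaces are safe and easy to use, and its cross-platform support is quite good, which makes it a rare find among image and vision processing libraries.

• Second, Python's syntax is easier to use, close to natural language, and extremely flexible. Although its computational efficiency is not high, for rapid development it far outpaces C++ and other languages. Bringing in Psyco can optimize loops in Python code, narrowing the computational gap with C/C++ to some extent. Image processing also requires a great deal of matrix computation; using NumPy for matrix operations reduces programming clutter, so more effort goes into the matching logic rather than the computational minutiae.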

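# Snake in C++

• First, when food is generated at random, check whether it has landed on a snake segment;
• Second, when checking for collisions, check not only for hitting the outer wall but also for the snake's head hitting its own body.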
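
The two checks above can be sketched as follows. This is an illustration in Python rather than the post's C++, and the names `spawn_food` and `is_collision`, the grid representation as `(x, y)` tuples, and the parameters are all assumptions. Instead of re-rolling until the food misses the snake, the sketch picks directly from the free cells, which gives the same guarantee and cannot loop forever on a full board.

```python
import random


def spawn_food(width, height, snake, rng=random):
    """Return a random cell not occupied by the snake, or None if the board is full."""
    occupied = set(snake)
    free = [(x, y) for x in range(width) for y in range(height)
            if (x, y) not in occupied]
    if not free:
        return None
    return rng.choice(free)


def is_collision(head, body, width, height):
    """True if the head has left the board or overlaps a body segment."""
    x, y = head
    hit_wall = not (0 <= x < width and 0 <= y < height)
    hit_self = head in body  # `body` excludes the head itself
    return hit_wall or hit_self


snake = [(1, 1), (1, 2), (1, 3)]  # head first
food = spawn_food(4, 4, snake)
```

Passing the body without the head to `is_collision` matters: if the head were included, every position would count as a self-collision.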

# Cyclic redundancy check

A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption.

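# A Brief Introduction to the Cyclic Redundancy Check (CRC)

• A CRC is a kind of checksum: the remainder obtained when one data stream is divided by another using modulo-2 binary division (no borrows or carries; XOR takes the place of subtraction).
• The dividend is the binary representation of the message whose checksum is being computed; the divisor is a predefined binary number of length (n+1), usually expressed as the coefficients of a polynomial, the generator polynomial G(X).
• Before dividing, append n zeros to the message, where n is the highest power of the generator polynomial; the redundancy code is n bits long. The zero-padded message is then divided by the binary pattern of G(X) using modulo-2 division rather than ordinary binary division: the divisor is aligned with the highest-order bit, and each subtraction is a borrow-free modulo-2 subtraction, that is, an XOR.
• Once every bit of the dividend has been processed, the remainder is one bit shorter than the divisor. This remainder is the redundancy code; appending it to the message bits forms the CRC codeword.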
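
The steps above can be sketched directly on bit strings. This is a minimal illustration, not an optimized implementation; the function names are hypothetical, and the example message `1101011011` and generator `10011` (that is, G(X) = X^4 + X + 1) are chosen for the demonstration and do not come from the post. The sketch assumes the generator starts with a 1 bit and has at least two bits.

```python
def mod2_remainder(bits, generator):
    """Remainder of modulo-2 division of `bits` by `generator` (both bit strings)."""
    n = len(generator) - 1  # degree of the generator polynomial
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    # Slide the divisor along the dividend, aligned at the current high-order bit.
    for i in range(len(work) - n):
        if work[i]:  # the divisor "goes into" this position
            for j, g in enumerate(gen):
                work[i + j] ^= g  # modulo-2 subtraction is XOR
    return "".join(str(b) for b in work[len(work) - n:])


def crc_codeword(message, generator):
    """Append n zeros, divide, and attach the n-bit remainder to the message."""
    n = len(generator) - 1
    return message + mod2_remainder(message + "0" * n, generator)


def crc_ok(codeword, generator):
    """A received codeword passes the check when its remainder is all zeros."""
    return set(mod2_remainder(codeword, generator)) == {"0"}


codeword = crc_codeword("1101011011", "10011")
print(codeword)                      # 11010110111110
print(crc_ok(codeword, "10011"))     # True
```

Here the remainder is `1110`, so the 10-bit message becomes a 14-bit codeword; flipping any single bit of that codeword makes `crc_ok` return `False`.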

# A Brief Look at the Prisoner's Dilemma

Golden Balls is an amusing British game show. Especially interesting is the final contest which is a version of the Prisoner’s Dilemma.

If you’ve never seen the show, here is how it works. Each of two contestants independently chooses to split or steal the final prize. If both choose split, then the prize is divided evenly. If one chooses split and the other steal, the person who steals gets the entire prize. If both choose steal, however, then both walk away with nothing.

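Here’s the normal form representation of the game, with payoffs listed as (contestant A, contestant B) and P standing for the final prize:

| | B: Split | B: Steal |
| --- | --- | --- |
| A: Split | P/2, P/2 | 0, P |
| A: Steal | P, 0 | 0, 0 |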

How should you play this game?
One contestant had an amazingly brilliant strategy.

# The wrong way to play the game

Contestants are allowed to discuss strategy before picking split or steal.

Both realize that split gives a fair 50 percent share to each side, but each also sees the advantage of back-stabbing and stealing the prize.

The discussion usually involves the following strategy. Each person tries to convince the other person to split, and they promise to do the same.

I discussed an example of this in a previous post: strategy in Golden Balls.
In that episode, both were promising they would split the prize, but then one person decided at the last minute to steal all the money. She said she was not proud of the decision, but she herself did not want to be cheated.

So trying to split the money in a conventional way doesn’t work. Is there a better strategy?

# Loading data from multiple sources with RxJava

Copied from the post “Loading data from multiple sources with RxJava”.

Suppose I have some Data that I query from the network. I could simply hit the network each time I need the data, but caching the data on disk and in memory would be much more efficient.

More specifically, I want a setup that:

• Occasionally performs queries from the network for fresh data.
• Retrieves data as quickly as possible otherwise (by caching network results).

I’d like to present an implementation of this setup using RxJava.

# Basic Pattern

Given an Observable<Data> for each source (network, disk and memory), we can construct a simple solution using two operators, concat() and first().

concat() takes multiple Observables and concatenates their sequences. first() emits only the first item from a sequence. Therefore, if you use concat().first(), it retrieves the first item emitted by multiple sources.

Let’s see it in action:

The key to this pattern is that concat() only subscribes to each child Observable when it needs to. There’s no unnecessary querying of slower sources if data is cached, since first() will stop the sequence early. In other words, if memory returns a result, then we won’t bother going to disk or network. Conversely, if neither memory nor disk has data, it’ll make a new network request.

Note that the order of the source Observables in concat() matters, since it’s checking them one-by-one.
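
The lazy, ordered fallback that concat().first() provides can be imitated outside RxJava. The following is a plain-Python analogue, not RxJava code: each source is modeled as a function returning a list of items, and the names `first_available`, `memory`, `disk`, and `network` are hypothetical. Returning `None` when every source is empty is a choice made for this sketch.

```python
def first_available(sources):
    """Return the first item from the first source that has one, else None."""
    for source in sources:
        # The source is only queried here, when the earlier ones came up empty.
        for item in source():
            return item
    return None


calls = []  # records which sources were actually queried


def memory():
    calls.append("memory")
    return []  # nothing cached in memory


def disk():
    calls.append("disk")
    return ["data cached on disk"]


def network():
    calls.append("network")
    return ["fresh data from the network"]


result = first_available([memory, disk, network])
print(result)  # data cached on disk
print(calls)   # ['memory', 'disk']
```

Because the disk source already had data, the network source was never called, which mirrors the behavior described above. Reordering the list changes which source wins.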
