异度部落格

学习是一种生活态度。

0%

最近准备重温 C++,正好看到有关类和占用空间的问题,于是整理了下。先上代码

#include <iostream>
using namespace std;

//空类型,没有任何成员
class ClassA {

};

//添加构造函数和析构函数
class ClassB {

public:
ClassB() {};
~ClassB() {};

};

//仅包含一个虚构函数
class ClassC {
virtual void fun() {};
};

int main() {

ClassA A;
cout << "ClassA:" << sizeof(A) << endl;

ClassB B;
cout << "ClassB:" << sizeof(B) << endl;

ClassC C;
cout << "ClassC:" << sizeof(C) << endl;

return 0;
}

结果:

ClassA:1
ClassB:1
ClassC:4

【分析】 1)ClassA 里面没有定义任何变量和函数,本应该是 0 的,但是声明变量需要占用一个字节,因此是 1。
2)ClassB 里面仅仅定义了构造函数和析构函数。调用构造函数和析构函数只需要知道地址即可,而这些函数的地址只与类型相关,而与实例无关,编译器不会因为这两个函数在实例中添加任何信息。
3)ClassC 中仅有一个虚函数。C++编译器一旦发现类型中有虚函数,就会为该类型生成虚函数表,并在该类型的每个实例中添加一个指向虚函数表的指针。

PS:编译器 GCC 4.6.2

【试题描述】请实现一个函数,把字符串中的每个空格替换成%20。例如输入"We are happy.",则输出"We%20are%20happy."。(不能使用任何字符串操作函数)

【试题来源】未知

【试题分析】由于不能使用字符串操作函数,因此只能一个字符一个字符的进行处理。 首先先遍历一遍字符串,统计得到所有的空格,那么替换后的字符串长度应 为len(str) + numOfBlank * 2。然后从后往前面修改字符串。算法时间复杂度O(n)。

【参考代码】

#include <cstdio>
#include <cstring>
using namespace std;

void replaceSpace(char str[], int length) {
if(str == NULL) {
return ;
}

int numOfBlank = 0;
int i = 0;
while(str[i] != '\0') {
if(str[i] == ' ') {
numOfBlank++;
}
i++;
}

int indexOriginalString = length - 1;
int indexNewString = length + numOfBlank * 2 - 1;
while(indexOriginalString >= 0) {
if(str[indexOriginalString] == ' ') {
str[indexNewString--] = '0';
str[indexNewString--] = '2';
str[indexNewString--] = '%';
} else {
str[indexNewString--] = str[indexOriginalString];
}
indexOriginalString--;
}
}

int main() {
char str[] = "We are happy.";
//char str[] = " We are happy. ";
//char str[] = "Wearehappy.";
replaceSpace(str, strlen(str));
printf("%s", str);

return 0;
}

【试题描述】输入二叉树的前序遍历和中序遍历的结果,请重建出该二叉树。假设遍历结果中都不包含重复数字。例如输入前序遍历序列{1,2,4,7,3,5,6,8}和中序遍历{4,7,2,1,5,3,8,6},则重建出该二叉树,并以后续遍历输出。

【参考代码】

#include <iostream>
using namespace std;

struct BinaryTreeNode
{
int value;
BinaryTreeNode* left;
BinaryTreeNode* right;
};

BinaryTreeNode* rebulidBinaryTreeCore(int* startPreOrder, int* endPreOrder,
int* startInOrder, int* endInOrder) {

//先序遍历的第一个节点为根节点
//创建根节点
BinaryTreeNode* rootNode = new BinaryTreeNode();
rootNode->value = startPreOrder[0];
rootNode->left = NULL;
rootNode->right = NULL;

if(startPreOrder == endPreOrder) {
if(startInOrder == endInOrder
&& *startPreOrder == *startInOrder) {
return rootNode;
} else {
throw ("Invalid input.");
}
}

//在中序查找根节点
int* rootIndex = startInOrder;
while(rootIndex <= endInOrder && *rootIndex != rootNode->value) {
rootIndex++;
}
if(rootIndex == endInOrder && *rootIndex != rootNode->value) {
throw ("Invalid input.");
}

//重建左子树
int leftTreeLength = rootIndex - startInOrder;
if(leftTreeLength > 0) {
rootNode->left = rebulidBinaryTreeCore(startPreOrder + 1,
startPreOrder + leftTreeLength,
startInOrder,
rootIndex - 1);
}

//重建右子树
if(leftTreeLength < endPreOrder - startPreOrder) {
rootNode->right = rebulidBinaryTreeCore(startPreOrder + leftTreeLength + 1,
endPreOrder,
rootIndex + 1,
endInOrder);
}

return rootNode;
}

BinaryTreeNode* rebulidBinaryTree(int* preOrder, int* inOrder, int n) {
if(preOrder == NULL || inOrder == NULL || n <= 0) {
return NULL;
}

return rebulidBinaryTreeCore(preOrder, preOrder + n -1,
inOrder, inOrder + n - 1);
}

void printPostOrder(BinaryTreeNode* root) {
if(root!= NULL) {
if(root->left != NULL) {
printPostOrder(root->left);
}
if(root->right != NULL) {
printPostOrder(root->right);
}
cout << root->value << " ";
}
return ;
}

int main() {
int preOrder[8] = {1,2,4,7,3,5,6,8};
int inOrder[8] = {4,7,2,1,5,3,8,6};
BinaryTreeNode* root = rebulidBinaryTree(preOrder, inOrder, 8);
printPostOrder(root);
return 0;
}

几乎所有的博客都可以在线阅读,或者通过 RSS 订阅源进行阅读。RSS 订阅源是一个包含博客及其所有文章条目信息的简单的 XML 文档。 程序中使用了 feedparser 第三方模块,可以轻松地从任何 RSS 或 Atom 订阅源中得到标题、链接和文章的条目。完整代码如下:

'''
Created on Jul 14, 2012

@Author: killua
@E-mail: [email protected]
@Homepage: http://www.yidooo.net
@Decriptioin: Counting the words in a Feed

feedparser:feedparser is a Python library that parses feeds in all known formats, including Atom, RSS, and RDF.It runs on Python 2.4 all the way up to 3.2.

dataset: http://kiwitobes.com/clusters/feedlist.txt
You can download feeds from this list. Maybe some feeds you can access in China.
'''

import feedparser
import re

#Get word from feed
def getwords(html):
#Remove all the HTML tags
text = re.compile(r"<[^>]+>").sub('', html)

#Split words by all non-alpha characters
words = re.compile(r"[^A-Z^a-z]+").split(text)

#Convert words to lowercase
wordlist = [word.lower() for word in words if word != ""]

return wordlist

#Returns title and dictionary of word counts for an RSS feed
def getFeedwordcounts(url):
#Parser the feed
d = feedparser.parse(url)
wordcounts = {}

#Loop over all the entries
for e in d.entries:
if 'summary' in e:
summary = e.summary
else:
summary = e.description

words = getwords(e.title + ' ' + summary)
for word in words:
wordcounts.setdefault(word, 0)
wordcounts[word] += 1

return d.feed.title, wordcounts

if __name__ == '__main__':
#count the words appeared in blog
blogcount = {}
wordcounts = {}

feedFile = file('resource/feedlist.txt')
feedlist = [line for line in feedFile.readlines()]

for feedUrl in feedlist:
try:
title, wc = getFeedwordcounts(feedUrl)
wordcounts[title] = wc
for word, count in wc.items():
blogcount.setdefault(word, 0)
if count > 1:
blogcount[word] += 1
except:
print 'Failed to parse feed %s' % feedUrl

wordlist = []
for w, bc in blogcount.items():
frac = float(bc) / len(feedlist)
if frac > 0.1 and frac < 0.5:
wordlist.append(w)

#Write the result to the file
datafile = file('blogdata', 'w')
#Write result's head
datafile.write('Blog')
for word in wordlist:
datafile.write('\t%s' % word)
datafile.write('\n')
#Write results
for blogname, wc in wordcounts.items():
print blogname
datafile.write(blogname)
for word in wordlist:
if word in wc:
datafile.write("\t%d" % wc[word])
else:
datafile.write("\t0")
datafile.write('\n')

2012 Microsoft Intern Hiring Written Test

1. Suppose that a Selection Sort of 80 items has completed 32 iterations of the main loop. How many items are now guaranteed to be in their final spot (never to be moved again)?
A) 16 B) 31 C) 32 D) 39 E) 40

2. Which Synchronization mechanism(s) is/are used to avoid race conditions among processes/threads in operating systems?
A) Mutex B) Mailbox C) Semaphore D) Local procedure call

3. There is a sequence of n numbers 1, 2, 3,.., n and a stack which can keep m numbers at most. Push the n numbers into the stack following the sequence and pop out randomly. Suppose n is 2 and m is 3, the output sequence may be 1, 2 or 2, 1, so we get 2 different sequences. Suppose n is 7 and m is 5, please choose the output sequences of the stack:
A) 1, 2, 3, 4, 5, 6, 7
B) 7, 6, 5, 4, 3, 2, 1
C) 5, 6, 4, 3, 7, 2, 1
D) 1, 7, 6, 5, 4, 3, 2
E) 3, 2, 1, 7, 5, 6, 4

4. What is the result of binary number 01011001 after multiplying by 0111001 and adding 1101110?
A) 0001 0100 0011 1111
B) 0101 0111 0111 0011
C) 0011 0100 0011 0101

5. What is output if you compile and execute the following code?

void main()
{
int i = 11;
int const *p = &i;
p++;
printf("%d", *p);
}
  1. 11 B) 12 C) Garbage value D) Compile error E) None of above

6. Which of following C++ code is correct?
A) int f() { int a = new int(3); return a; } B) int f() { int a[3] = {1, 2, 3}; return a; } C) vector f() { vector v(3); return v; } D) void f(int ret) { int a[3] = {1, 2, 3}; ret = a; return; }

7. Given that the 180-degree rotated image of a 5-digit number is another 5-digit number and the difference between the numbers is 78633, what is the original 5-digit number?
A) 60918 B) 91086 C) 18609 D) 10968 E) 86901

8. Which of the following statements are true?
A) We can create a binary tree from given inorder and preorder traversal sequences.
B) We can create a binary tree from given preorder and postorder traversal sequences.
C) For an almost sorted array, insertion sort can be more effective than Quicksort.
D) Suppose T(n) is the runtime of resolving a problem with n elements, T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + Θ(n) if > 1; so T(n) is Θ(n log n).
E) None of the above.

9. Which of the following statements are true?
A) Insertion sort and bubble sort are not effcient for large data sets.
B) Quick sort makes O(n^2) comparisons in the worst case.
C) There is an array: 7, 6, 5, 4, 3, 2, 1. If using selection sort (ascending), the number of swap operation is 6.
D) Heap sort uses two heap operations: insertion and root deletion.
E) None of above.

10. Assume both x and y are integers, which one of the followings returns the minimum of the two integers?
A) y ^ ((x ^ y) & ~(x < y))
B) y ^(x ^ y)
C) x ^ (x ^ y)
D) (x ^ y) ^ (y ^ x)
E) None of the above

11. The Orchid Pavilion (兰亭集序) is well known as the top of "行书" in history of Chinese literature. The most fascinating sentence "Well I know it is a lie to say that life and death is the same thing, and that longevity and early death make no difference! Alas!" ("周知一死生为虚诞,齐彭殇为妄作。") By counting the characters of the whole content (in Chinese version), the result should be 391 (including punctuation). For these characters written to a text file, please select the possible file size without any data corrupt.
A) 782 bytes in UTF-16 encoding
B) 784 bytes in UTF-16 encoding
C) 1173 bytes in UTF-8 encoding D) 1176 bytes in UTF-8 encoding
E) None of the above

12. Fill the blanks inside class definition

class Test
{
public:
____ int a;
____ int b;
public:
Test::Test(int _a, int _b) : a(_a) {b = _b;}
};
int Test::b;
int _tmain(int argc, __TCHAR *argv[])
{
Test t1(0, 0), t2(1, 1);
t1.b = 10;
t2.b = 20;
printf("%u %u %u %u", t1.a, t1.b, t2.a, t2.b);
}

Running result: 0 20 1 20
A) static/const
B) const/static
C) --/static
D) const static/static
E) None of the above

13. A 3-order B-tree has 2047 key words, what is the maximum height of the tree?
A) 11 B) 12 C) 13 D) 14

14. In C++, which of the following keyword(s) can be used on both a variable and a function?
A) static B) virtual C) extern D) inline E) const

15. What is the result of the following program?

char* f(char *str, char ch)
{
char *it1 = str;
char *it2 = str;
while (*it2 != '\0') {
while (*it2 == ch) {
it2++;
}
*it1++ = *it2++;
}
return str;
}
void main(int argc, char *argv[])
{
char *a = new char[10];
strcpy(a, "abcdcccd");

cout << f(a, 'c');
}
  1. abdcccd B) abdd C) abcc D) abddcccd E) Access Violation

16. Consider the following definition of a recursive function, power, that will perform exponentiation.

int power(int b, int e)
{
if (e == 0) return 1;
if (e %2 == 0) return power (b * b, e / 2);
return b * power(b * b, e / 2);
}

Asymptotically (渐进地) in terms of the exponent e, the number of calls to power that occur as a result of the call power(b, e) is* A) logarithmic B) linear C) quadratic D) exponential

17. Assume a full deck of cards has 52 cards, 2 black suits (spade and club) and 2 red suits (diamond and heart). If you are given a full deck, and a half deck (with 1 red suit and 1 black suit), what's the possibility for each one getting 2 red cards if taking 2 cards?
A) 1/2, 1/2 B) 25/102, 12/50 C) 50/51, 24/25 D) 25/51, 12/25 E) 25/51, 1/2

18. There is a stack and a sequence of n numbers (i.e., 1, 2, 3, ..., n). Push the n numbers into the stack following the sequence and pop out randomly. How many different sequences of the number we may get? Suppose n is 2, the output sequence may be 1, 2 or 2, 1, so we get 2 different sequences.
A) C_2n^n
B) C_2n^n - C_2n^(n + 1)
C) ((2n)!) / (n + 1)n!n!
D) n!
E) None of the above

19. Longest Increasing Subsequence (LIS) means a sequence containing some elements in another sequence by the same order, and the values of elements keeps increasing. For example, LIS of {2, 1, 4, 2, 3, 7, 4, 6} is {1, 2, 3, 4, 6}, and its LIS length is 5. Considering an array with N elements, what is the lowest time and space complexity to get the length of LIS?
A) Time: N^2, Space: N^2
B) Time: N^2, Space: N
C) Time: NlogN, Space: N
D) Time: N, Space: N
E) Time: N, Space: C

20. What is the output of the following piece of C++ code?

#include <iostream>
using namespace std;*
struct Item
{
char c;
Item *next;
};
Item Routine1(Item *x)
{
Item *prev = NULL, *curr = x;
while (curr) {
Item *next = curr->next;
curr->next = prev;
prev = curr;
curr = next;
}
return prev;
}
void Routine2(Item *x)
{
Item *curr = x;
while (curr) {
cout << curr->c << " ";
curr = curr->next;
}
}
void _tmain(void)
{
Item *x,
d = {'d', NULL},
c = {'c', &d},
b = {'b', &c},
a = {'a', &b};*
x = Routine1(&a);
Routine2(x);
}
  1. cbad B) badc C) dbca D) abcd E) dcba

Ubuntu: sudo apt-get install ttf-arphic-uming

Fedora:sudo yum install cjkuni-uming-fonts

然后打开http://web.sanguosha.com/  测试,如果不行请重启电脑。

PS:本人测试后发现安装字体只能解决 Firefox 下面的乱码,但无法解决 Chrome 浏览器下面的问题。

参考资料: http://hi.baidu.com/gerald07/item/fb2234cfbfef9f21e80f2e77 http://hi.baidu.com/siweng23/blog/item/aa9dfcfbd4f4564c242df2ec.html http://blog.csdn.net/devplus/article/details/6347492

协同过滤技术可以分为三类:基于用户(User-based)的协同过滤;基于项目(Item-based)的协同过滤;基于模型(Model-based)的协同过滤。 

基于用户(User-based)的协同过滤

用相似统计的方法得到具有相似爱好或者兴趣的相邻用户,所以称之为以用户为基础(User-based)的协同过滤或基于邻居的协同过滤(Neighbor-based Collaborative Filtering)。方法步骤:

1.收集用户资讯

收集可以代表用户兴趣的资讯。一般的网站系统使用评分的方式或是给予评价,这种方式被称为「主动评分」。另外一种是「被动评分」,是根据用户的行为模式由系统代替用户完成评价,不需要用户直接打分或输入评价资料。电子商务网站在被动评分的资料获取上有其优势,用户购买的商品记录是相当有用的资料。

2.最近邻搜索(Nearest neighbor search, NNS)

以用户为基础(User-based)的协同过滤的出发点是与用户兴趣爱好相同的另一组用户,就是计算两个用户的相似度。例如:寻找n个和A有相似兴趣用户,把他们对M的评分作为A对M的评分预测。一般会根据资料的不同选择不同的演算法,目前较多使用的相似度演算法有Person Correlation Coefficient、Cosine-based Similarity、Adjusted Cosine Similarity。

3.产生推荐结果

有了最近邻集合,就可以对目标用户的兴趣进行预测,产生推荐结果。依据推荐目的的不同进行不同形式的推荐,较常见的推荐结果有Top-N推荐和关联推荐。 Top-N推荐是针对个体用户产生,对每个人产生不一样的结果,例如:透过对A用户的最近邻用户进行统计,选择出现频率高且在A用户的评分项目中不存在的,作为推荐结果。关联推荐是对最近邻用户的记录进行关联规则(association rules)挖掘。

基于项目(Item-based)的协同过滤

以项目为基础的协同过滤方法有一个基本的假设:“能够引起用户兴趣的项目,必定与其之前评分高的项目相似”,透过计算项目之间的相似性来代替用户之间的相似性。方法步骤:

1.收集用户资讯

同以用户为基础(User-based)的协同过滤。

2.针对项目的最近邻搜索

先计算己评价项目和待预测项目的相似度,并以相似度作为权重,加权各已评价项目的分数,得到待预测项目的预测值。例如:要对项目A和项目B进行相似性计算,要先找出同时对A和B打过分的组合,对这些组合进行相似度计算,常用的演算法同以用户为基础(User-based )的协同过滤。

3.产生推荐结果

以项目为基础的协同过滤不用考虑用户间的差别,所以精度比较差。但是却不需要用户的历史资料,或是进行用户识别。对于项目来讲,它们之间的相似性要稳定很多,因此可以离线完成工作量最大的相似性计算步骤,从而降低了线上计算量,提高推荐效率,尤其是在用户多于项目的情形下尤为显著。

基于模型(Model-based)的协同过滤

以用户为基础(User-based)的协同过滤和以项目为基础(Item-based)的协同过滤统称为以记忆为基础(Memory based)的协同过滤技术,他们共有的缺点是资料稀疏,难以处理大资料量影响即时结果,因此发展出以模型为基础的协同过滤技术。以模型为基础的协同过滤(Model-based Collaborative Filtering)是先用历史资料得到一个模型,再用此模型进行预测。以模型为基础的协同过滤广泛使用的技术包括Latent Semantic Indexing、 Bayesian Networks…等,根据对一个样本的分析得到模型。

参考资料:http://gengrenjie.com/2009/04/12/%E5%8D%8F%E5%90%8C%E8%BF%87%E6%BB%A4%E6%89%AB%E7%9B%B2%E7%8F%AD%EF%BC%884%EF%BC%89/

本文构建了一个简单的推荐系统,使用的数据是真实的数据,叫作MovieLens,来自University of Minnesota‘s GroupLens项目组。代码以Python作为实现语言,使用版本为Python2.7。

loadMovieData:用于数据的读取。userData指的是以userId为键构建的电影评分列表。movieData值的是以movieId为键构建的电影评分列表。

euclidean:用于计算Eucidean距离系数

pearson:用于计算Pearson相关系数

getSimilarItems:计算出movieData中每一项相似度最大的n项

getRecommendationsItems:对于某个user取得推荐结果

代码如下:

'''
Created on Jun 30, 2012

@Author: killua
@E-mail: [email protected]
@Homepage: http://www.yidooo.net

Data set download from : http://www.grouplens.org/system/files/ml-100k.zip

MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998.

This data set consists of:
* 100,000 ratings (1-5) from 943 users on 1682 movies.
* Each user has rated at least 20 movies.
* Simple demographic info for the users

u.data -- The full u data set, 100000 ratings by 943 users on 1682 items.
Each user has rated at least 20 movies. Users and items are
numbered consecutively from 1. The data is randomly
ordered. This is a tab separated list of
user id | item id | rating | timestamp.
The time stamps are unix seconds since 1/1/1970 UTC
u.item -- Information about the items (movies); this is a tab separated
list of
movie id | movie title | release date | video release date |
IMDb URL | unknown | Action | Adventure | Animation |
Children's | Comedy | Crime | Documentary | Drama | Fantasy |
Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
Thriller | War | Western |
The last 19 fields are the genres, a 1 indicates the movie
is of that genre, a 0 indicates it is not; movies can be in
several genres at once.
The movie ids are the ones used in the u.data data set.
'''

from math import sqrt

def loadMovieData(path = "./data/"):
"""
Load movie data from u.data and u.item
@param path: Data set path
"""
#Get movie's title
movies = {}
for line in open(path + '/u.item'):
(movieId, movieTitle) = line.split('|')[0:2]
movies[movieId] = movieTitle

#Load Data
movieData = {}
userData = {}
for line in open(path + '/u.data'):
(userId, itemId, rating, timestamp)=line.split('\t')
movieData.setdefault(movies[itemId], {})
movieData[movies[itemId]][userId] = float(rating)
userData.setdefault(userId, {})
userData[userId][movies[movieId]] = float(rating)

return (movieData, userData)

def euclidean(data, p1, p2):
"Calculate Euclidean distance"
distance = sum([pow(data[p1][item]-data[p2][item],2)
for item in data[p1] if item in data[p2]])

return 1.0 / (1 + distance)

def pearson(data, p1, p2):
"Calculate Pearson correlation coefficient"
corrItems = [item for item in data[p1] if item in data[p2]]

n = len(corrItems)
if n == 0:
return 0;

sumX = sum([data[p1][item] for item in corrItems])
sumY = sum([data[p2][item] for item in corrItems])
sumXY = sum([data[p1][item] * data[p2][item] for item in corrItems])
sumXsq = sum([pow(data[p1][item], 2) for item in corrItems])
sumYsq = sum([pow(data[p2][item],2) for item in corrItems])

if sqrt((sumXsq - pow(sumX, 2) / n) * (sumYsq - pow(sumY, 2) / n)) != 0:
pearson = (sumXY - sumX * sumY / n) / sqrt((sumXsq - pow(sumX, 2) / n) * (sumYsq - pow(sumY, 2) / n))
else:
return 0

return pearson

def getSimilarItems(movieData, n = 20, similarity=pearson):
"""
Create a dictionary of items showing which other items they are most similar to.
"""

results = {}
for movie in movieData:
#Get n items which most similar to movie
matches = [(similarity(movieData, movie, otherMovie),otherMovie)
for otherMovie in movieData if movie != otherMovie]
matches.sort()
matches.reverse()
results[movie] = matches[0:n]

return results

def getRecommendationsItems(userData, user, similarItems, n = 10):
"""
Get recommendations items for user
"""
userRatings = userData[user]
itemScores = {}
totalSim = {}

# Loop over items rated by this user
for (item, rating) in userRatings.items():
# Loop over items similar to this one
for (simValue, simItem) in similarItems[item]:
# Ignore if this user has already rated this item
if simItem in userRatings:
continue
# Weighted sum of rating times similarity
itemScores.setdefault(simItem, 0)
itemScores[simItem] += simValue * rating
# Sum of all the similarities
totalSim.setdefault(simItem, 0)
totalSim[simItem] += simValue

# Divide each total score by total weighting to get an average
rankings = [(score / totalSim[item], item) for (item, score) in itemScores.items() if totalSim[item] != 0]
rankings.sort()
rankings.reverse()

return rankings[0:n]

if __name__ == "__main__":

print 'Loading Data...'
movieData, userData = loadMovieData("./movie/")
print 'Get similarItems...'
similarItems = getSimilarItems(movieData, 50, euclidean)
print 'Calculate rankings...'
rankings = getRecommendationsItems(userData, "87", similarItems)

print rankings

Euclidean 距离

定义:欧几里得空间中点 x = (x1,...,xn) 和 y = (y1,...,yn) 之间的距离为

image

两个变量之间的相关系数越高,从一个变量去预测另一个变量的精确度就越高,这是因为相关系数越高,就意味着这两个变量的共变部分越多,所以从其中一个变量的变化就可越多地获知另一个变量的变化。如果两个变量之间的相关系数为 1 或-1,那么你完全可由变量 X 去获知变量 Y 的值。

· 当相关系数为 0 时,X 和 Y 两变量无关系。

· 当 X 的值增大,Y 也增大,正相关关系,相关系数在 0.00 与 1.00 之间

· 当 X 的值减小,Y 也减小,正相关关系,相关系数在 0.00 与 1.00 之间

· 当 X 的值增大,Y 减小,负相关关系,相关系数在-1.00 与 0.00 之间

当 X 的值减小,Y 增大,负相关关系,相关系数在-1.00 与 0.00 之间

相关系数的绝对值越大,相关性越强,相关系数越接近于 1 和-1,相关度越强,相关系数越接近于 0,相关度越弱。

image

实现代码:

from math import sqrt

# A dictionary of movie critics and their ratings of a small
# set of movies
movies = {'Lisa Rose': {'Lady in the Water': 2.5,
'Snakes on a Plane': 3.5,
'Just My Luck': 3.0,
'Superman Returns': 3.5,
'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0,
'Snakes on a Plane': 3.5,
'Just My Luck': 1.5,
'Superman Returns': 5.0,
'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5,
'Snakes on a Plane': 3.0,
'Superman Returns': 3.5,
'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5,
'Just My Luck': 3.0,
'The Night Listener': 4.5,
'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0,
'Snakes on a Plane': 4.0,
'Just My Luck': 2.0,
'Superman Returns': 3.0,
'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0,
'Snakes on a Plane': 4.0,
'The Night Listener': 3.0,
'Superman Returns': 5.0,
'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,
'You, Me and Dupree':1.0,
'Superman Returns':4.0}}

def euclidean(data, p1, p2):
"Calculate Euclidean distance"
distance = sum([pow(data[p1][item]-data[p2][item],2)
for item in data[p1] if item in data[p2]])

return distance

def pearson(data, p1, p2):
"Calculate Pearson correlation coefficient"
corrItems = [item for item in data[p1] if item in data[p2]]

n = len(corrItems)
if n == 0:
return 0;

sumX = sum([data[p1][item] for item in corrItems])
sumY = sum([data[p2][item] for item in corrItems])
sumXY = sum([data[p1][item] * data[p2][item] for item in corrItems])
sumXsq = sum([pow(data[p1][item], 2) for item in corrItems])
sumYsq = sum([pow(data[p2][item],2) for item in corrItems])

pearson = (sumXY - sumX * sumY / n) / sqrt((sumXsq - pow(sumX, 2) / n) * (sumYsq - pow(sumY, 2) / n))
return pearson

【试题描述】 你让工人为你工作7天,给工人的回报是一根金条。金条平分成相连的7段,你必须在每天结束时给他们一段金条,如果只许你两次把金条弄断,你如何给你的工人付费? 分为1、2、4 三段。 Day1:给1

Day2:给2,还1

Day3:给1

Day4:给4,还1、2

Day5:给1,还2

Day6:给2,还1

Day7:给1

【试题描述】请把一盒蛋糕切成8份,分给8个人,但蛋糕盒里还必须留有一份。

将蛋糕均分层8份,取七份给其中七个人,最后一份放入盒中给第八个。

【试题描述】小明一家过一座桥,过桥时是黑夜,所以必须有灯。现在小明过桥要1秒,小明的弟弟要3秒,小明的爸爸要6秒,小明的妈妈要8秒,小明的爷爷要12秒。每次此桥最多可过两人,而过桥的速度依过桥最慢者而定,而且灯在点燃后30秒就会熄灭。问:小明一家如何过桥?

A:小明  1s

B:小明的弟弟  3s

C:小明的爸爸  6s

D:小明的妈妈  8s

E:小明的爷爷  12s 步骤 用时 位置状况 1 小明和小明的弟弟先过去 3s AB—CDE 2 小明的弟弟回来 3s A—BCDE 3 小明的妈妈和爷爷过去 12s ADE—BC 4 小明回来 1s DE—ABC 5 小明和他的爸爸过去 6s ACDE—B 6 小明回来 1s CDE—AB 7 小明和他的弟弟过去 3s ABCDE— 总用时:29s

【试题描述】一群人舞会,上都戴着一帽子。帽子只有黑白两,黑的至少有一个人都能看到其他人帽子的色,却看不到自己的。主持人先大家看看上戴的是什帽子,然后灯,如果有人认为自己戴的是黑帽子,就打自己一个耳光。第一次灯,没有声音。于是再灯,大家再看一遍,仍然雀无声。一直到第三次灯,才有劈劈啪啪打耳光的声音响起。有多少人戴着黑帽子?

本题关键就是至少有一顶黑帽子。假如只有一个人戴黑帽子,那他看到所有人都戴白帽,在第一次关灯时就 应自打耳光,所以应该不止一个人戴黑帽子;如果有两顶黑帽子,第一次两人都只 看到对方头上的黑帽子,不敢确定自己的颜色,但到第二次关灯,这两人应该明白 ,如果自己戴着白帽,那对方早在上一次就应打耳光了,因此自己戴的也是黑帽子 ,于是也会有耳光声响起;可事实是第三次才响起了耳光声,说明全场不止两顶黑 帽,依此类推,应该是关了几次灯,有几顶黑帽。

【试题描述】一楼到十楼的每层电口都放着一颗钻石,石大小不一。你乘坐梯从一楼到十楼,每层都会打一次,只能拿一次石,才能拿到最大的一

第一步:对1到3层的大小进行比较,记住最大的一颗m1。

第二步:4到6层作为参考,将4-6层的与m1作比较,确认最大的一个的平均水平m2。

第三步:在最后4层中当遇到与m2相近时直接拿下

(答案仅供参考)

PS:这个问题类似柏拉图的麦穗问题,属于哲学的范畴了,所以再深入讨论。

【试题描述】U2合唱17内得赶到演唱会,途中必需跨一座,四个人从的同一端出,你得帮助他到达另一端,天色很暗,而他只有一只手筒。一次同最多可以有两人一起过桥,而过桥候必持有手筒,所以就得有人把手去,来回两端。手筒是不能用的方式来传递的。四个人的行速度各不同,若两人同行慢者的速度准。Bono需花1钟过桥Edge需花2钟过桥Adam需花5钟过桥Lay需花10钟过桥。他要如何在17过桥呢?**
姓名 所用时间 简称
Bono 1 min B
Edge 2 min E
Adam 5 min A
Lay 10 min L
步骤 用时 位置状况
1 Bono和Edge过去 Max(1,2)=2min BE—AL
2 Bono回来 1min E—BAL
3 Adam和Lay过去 Max(5,10)= 10min EAL—B
4 Edge回来 2min AL—BE
5 Bono和Edge过去 Max(1,2)=2min BA EL—
总用时:17min

【试题描述】一根不均匀的要用一个小,如何用它来判断半个小?**

只要将绳子两头同时点上即可。

【试题描述】下水道的盖子是的?

参考答案一:相同面积下圆形的井盖最省材料。

参考答案二:圆形井盖可以避免井盖落入下水道中。

*【试题描述】7克、2克砝各一个,天平一只,如何只用些物品三次将140克的分成5090**克各一份?*

第一次:140 = 70 + 70 ,得到两份70g的

第二次:70 = 35 + 35,得到70,35,35

第三次:35 = (7 + 15)+ (2 + 20),得到70,35,15,20

70 + 20 = 90

35 + 15 = 50

*【试题描述】有一15公里的速度离洛杉直奔纽约,另一以第小20公里的速度从纽约开往洛杉。如果有一只,以外30公里的速度和两车现时,从洛杉,碰到另辆车后返回,依次在两来回的行,直道两面相遇,请问只小鸟飞行了多长**距离?*

设New York到LA的距离为S

s = vt = 30km/h S/(15+20)

*【试题描述】你有两个罐子,50球,50球,随机出一个罐子,随机取出一个球放入罐子,怎么给红球最大的中机会?在你的划中,得到红**球的准确几率是多少?*

A罐子里放1个红球,B罐子里放49个红球和蓝球。

选中A罐就等于选中那个红球,几率就是50%,而B罐子也有49/99的几率选到红球。

整个选到红球的几率就是50%+50%*49/99=99/198+49/198=148/198=74/99,

*【试题描述】你有四丸的罐子,丸都有一定的重量,被染的丸是没被染的重量+1.只称量一次,如何判断哪个罐子的污**染了?*

第一个罐子取1颗,第二个罐子取两颗,第三个罐子取3颗,第四个罐子取4颗。称得总重量,然后减去标准重量15,得到n。n为多少就是哪一罐被污染。

*【试题描述】如果你有无多的水,一个3夸脱的和一个5夸脱的提桶,你如何准确称出4**夸脱的水?*

首先用容量3的水桶装满,往容量5的水桶里面装水。然后再取3夸脱的水,将容量5的水桶装满。此时桶里会剩下1夸脱的水。此时,将容量为5的桶清空,将剩下的1夸脱倒过来。再取3夸脱加入即可。

*【试题描述】你有一桶果,其中有黄色,绿色,色三,,上眼睛出同样颜色的两个,抓取同种颜色的两个。抓取多少个就可以确定你肯定有两个同一色的果冻**?*

4个。数量大于颜色,必定有相同的颜色。

*【试题描述】一批1~100全部开关朝上的灯行以下操作凡是1的倍数反方向一次开,2的倍数反方向又一次开关,3的倍数反方向又一次开关……最后为关熄状的灯的编**号。*

10盏是亮的,分别是1 4 9 16 25 36 49 64 81 100;90盏是灭的。

这是因为除了这些平方数以外,其余的任意一个数都能分成不同的两数乘积,质数可以分为1和本身,合数都可以分成若干组乘积(每组两个),因此,这些等都被拉了偶数倍,也就是灭的,平方数因为在被自己的开方数拉是只有一次,所以是奇数次,也就是亮的。随便举两个例子以证明。36分别被1、2、3、4、6、9、12、18、36拉过,共是9次,亮。38分别被1、2、19、38拉过,共是4次,灭。所以10盏是亮的;90盏是灭的

*【试题描述】张圆盘像唱机上的唱样转动这张盘一半是黑色,一半是白色。假你有数量不限的一些感器。要想确定圆盘转动的方向,你需要在它周围摆多少个感器?它们应该放在什么**位置?*

两个。

设两个传感器一个在12点位置(A),一个在1点的位置(B)。

首先利用传感器A的两次变化,估算出转盘转半圈所用的大致时间。然后计时,当B传感器变化的时间较短,证明是由A->B即顺时针。反之为逆时针。

*【试题描述】设时钟到了12点。注意时针和分重叠在一起。在一天之中,时针和分共重叠多少次?你知道它重叠的具体时间吗**?*

22次,即每一小时都会有一次重合(0点和24点视为一次)

分针一小时走360°,时针走30°,也就是说他们的速度差是330°/h。因此第一圈追上的时刻是360/330=(1 + 1/11),即1点5分多点。第二圈追上时刻720/330=2+ 2/11,即2点11分左右。因此得到所有的重叠时刻。

0,1 + 1/11,2 + 2/11,3 + 3/11,4+ 4/11,5 + 5/11,6 + 6/11,7 + 7/11,8 + 8/11,9 + 9/11,10 + 10/11,12,13 + 1/11,14 + 2/11,15 + 3/11,16 + 4/11,17 + 5/11,18 + 6/11,19 + 7/11,20 + 8/11,21 + 9/11,22 + 10/11

【试题描述】一个屋子有一个关闭的)和3盏电灯。屋外有3开关,分3灯相。你可以随意操纵这开关,可一旦你将,就不能变换开关了。确定开关具体管哪灯。**

先在门外打开一个开关,等一会儿,然后关掉它,再另开一个开关,再走到屋内,热而不亮的一个灯泡是第一个开关所控制,亮的便是第二个,不亮又不热的便是第三个开关所控制的了。

*【试题描述】你有8个球,其中一个略微重一些,但是找出个球的惟一方法是将两个球放在天平上比。最少要称多少次才能找出较**重的球?*

两次。

首选,天枰两边各放3个。如果平衡,取剩余两个,天枰上面各放一个可以得到结果;如果不平衡取下较重的三个记为ABC,将AB放在天枰两边,如果不平衡即得到结果,如果平衡C为较重的那个球。

*【试题描述】如果你有两个桶,一个装的是色的料,另一个装的是色的料。你从料桶里舀一杯,倒入料桶,再从料桶里舀一杯倒入蓝颜料桶。两个桶中红蓝颜料的比例哪个更高?通的方式来这**一点。*

假设两个桶里面的颜料各自为10,一杯的容量刚好是1

操作

状态

从蓝色颜料桶里舀一杯,倒入红色颜料桶 10R,1B 9B
再从红色颜料桶里舀一杯倒入蓝颜料桶 (10R + 1B)*(10/11) 9B + (10R + 1B)/11

因此红颜料桶的红蓝比例:10:1,蓝颜料桶红蓝比例:1:10。