Python Data Analysis: Plotting Regression Lines for Functions, Common Statistical Charts, and Scraping and Analyzing Data

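It's been a while again. I've been preparing on and off for the IELTS exam lately, and it finally wrapped up yesterday, though probably only for the time being.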

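Whatever the result turns out to be, it's time to get back to my own field. As it happens, I recently joined a mathematical modeling team that is preparing for a competition a few days from now. As the team's programmer, I ought to review the data analysis tasks that come up most often in mathematical modeling. This post covers a few common data analysis scenarios using third-party Python libraries, presented as a series of examples meant as a starting point.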

Bar Charts and Line Charts

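e.g. Xiao Ming owns a supermarket and is doing his annual review of product sales. The number of units sold for each product and the number of people who like that product are listed in the table below. Make a chart (any chart type) that uses product popularity as the metric, and use color to distinguish how popular each product is.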

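Product      Units sold    People who like the product
Product A    100           50
Product B    200           70
Product C    150           100
Product D    23            10
Product E    30            10
Product F    50            34
Product G    156           56
Product H    32            18
Product I    67            35
Product J    89            45
Table: Supermarket product sales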

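My approach: compute the number of people who do not like each product, use the ratio of people who like it to people who do not as the criterion, and draw a scatter plot colored by that ratio.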

# Scatter plot colored by like/dislike ratio, with a colorbar
import numpy as np
import matplotlib.pyplot as plt

# Units sold for each of the 10 products
purchaseCount = np.array([100, 200, 150, 23, 30, 50, 156, 32, 67, 89])

# Number of people who like each product
likes = np.array([50, 70, 100, 10, 10, 34, 56, 18, 35, 45])

# Like/dislike ratio: likes divided by the people who do not like the product
ratio = likes / (purchaseCount - likes)

# Scatter plot; the color of each point encodes its ratio
plt.scatter(x=purchaseCount, y=likes, c=ratio, cmap="summer")
plt.xlabel("Units sold")
plt.ylabel("People who like the product")

plt.colorbar(label="Like/Dislike Ratio", orientation="horizontal")
plt.show()
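
Since the task leaves the chart type open, the same data can also be drawn as a bar chart, which matches this section's heading more directly. The following is a minimal sketch rather than the only way to do it. It assumes the like/dislike ratio is likes / (sold - likes), as in the approach described above, and the names `sold`, `norm`, and `cmap` are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

products = list("ABCDEFGHIJ")
sold = np.array([100, 200, 150, 23, 30, 50, 156, 32, 67, 89])
likes = np.array([50, 70, 100, 10, 10, 34, 56, 18, 35, 45])

# Like/dislike ratio: people who like the product / people who do not
ratio = likes / (sold - likes)

# Map each ratio to a color on the same "summer" colormap used above
norm = Normalize(vmin=ratio.min(), vmax=ratio.max())
cmap = plt.get_cmap("summer")

fig, ax = plt.subplots()
bars = ax.bar(products, likes, color=cmap(norm(ratio)))
ax.set_xlabel("Product")
ax.set_ylabel("People who like the product")

# A bar chart has no mappable of its own, so build one for the colorbar
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax,
             label="Like/Dislike Ratio")
plt.show()
```

Bar height shows how many people like each product, while the bar color shows how that compares with the number who do not.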

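e.g. Given income and corruption-cost data for a number of countries, plot the two quantities and use color to show corruption cost as a share of total income.

Income and corruption-cost data by country (only part of the data is shown, for reasons of space)

The approach is essentially the same as in the previous problem. Here we simply store each column's data in a list.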

from matplotlib import pyplot as plt
import csv

# Render the minus sign correctly on the axes
plt.rcParams['axes.unicode_minus'] = False

# Two empty lists to hold the x (income) and y (corruption cost) values
x = []
y = []
with open("../corruption (1).csv", 'r', newline='') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    # Note: if the file starts with a header row, skip it first with next(plots)
    for row in plots:
        # Values read from a CSV are strings, so convert them to int
        x.append(int(row[1]))
        y.append(int(row[2]))

# Corruption cost as a share of income (assumes no income value is 0)
ratio = [b / a for a, b in zip(x, y)]

# Draw the scatter plot, colored by that share
plt.scatter(x, y, c=ratio)
plt.colorbar(label="corruption/income ratio", orientation="horizontal")
plt.xlabel('income')
plt.ylabel('corruption')
plt.title('income and corruption')
plt.show()
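
If the CSV file has a header row, `int(row[1])` will fail on the column names. One way around that, sketched below, is `csv.DictReader`, which consumes the header itself and lets you read columns by name. The column names (`country`, `income`, `corruption`) and the three rows are made up purely for illustration, since the real file's layout isn't shown here, and an in-memory string stands in for the file.

```python
import csv
import io

# Made-up sample data standing in for the real CSV file (hypothetical columns)
sample = io.StringIO(
    "country,income,corruption\n"
    "X,1000,50\n"
    "Y,2000,300\n"
    "Z,500,0\n"
)

income, corruption = [], []
for row in csv.DictReader(sample):
    income.append(int(row["income"]))
    corruption.append(int(row["corruption"]))

# Corruption cost as a share of income; guard against a zero income
share = [c / i if i else float("nan") for i, c in zip(income, corruption)]
print(share)
```

The resulting `income`, `corruption`, and `share` lists can be passed to `plt.scatter` in the same way as `x`, `y`, and `ratio` above.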

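Regression Lines for Functions

e.g. Given randomly generated x values and the same number of corresponding y values, fit a function curve to them.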

import numpy as np
import matplotlib.pyplot as plt

# Random x and y values, 50 of each
x = np.random.randint(0, 200, 50)
y = np.random.randint(0, 300, 50)

# Sort x and y independently so the points form an increasing trend to fit
# (this pairs the i-th smallest x with the i-th smallest y)
x1 = np.sort(x)
print(x1)
y1 = np.sort(y)
print(y1)

# Fit a degree-3 polynomial; z1 holds the coefficients, highest degree first
z1 = np.polyfit(x1, y1, 3)
p1 = np.poly1d(z1)
print(z1)
print(p1)

# Plot the points and the fitted curve
plt.scatter(x1, y1)
plt.plot(x1, p1(x1))
plt.xlabel('x')
plt.ylabel('y(x)')
# Show the figure
plt.show()
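
A plot alone doesn't say how well the polynomial fits. One common measure is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. Below is a minimal sketch of that calculation. `r_squared` is a helper name invented for this example, and a fixed random seed is assumed so that the run is repeatable.

```python
import numpy as np

def r_squared(x, y, degree):
    """Fit a polynomial of the given degree and return its R^2 on (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    model = np.poly1d(np.polyfit(x, y, degree))
    ss_res = np.sum((y - model(x)) ** 2)    # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)              # fixed seed, for repeatability
x = np.sort(rng.integers(0, 200, 50))
y = np.sort(rng.integers(0, 300, 50))

for degree in (1, 2, 3):
    print(degree, round(r_squared(x, y, degree), 4))
```

Values closer to 1 mean the curve explains more of the variation in y. Because a higher-degree polynomial can always fit the same points at least as well, R^2 on its own will never suggest choosing a lower degree.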

Similarly, the fitted polynomial can be used to predict the value of y for a given x.

import random
import numpy as np
import matplotlib.pyplot as plt

# Pick 7 distinct random x values and 7 distinct random y values from 1..299
x = random.sample(range(1, 300), 7)
y = random.sample(range(1, 300), 7)

# Fit a degree-2 polynomial to the points
model = np.poly1d(np.polyfit(x, y, 2))

plt.scatter(x, y)

# Draw the polynomial regression curve
myline = np.linspace(1, 300, 150)
plt.plot(myline, model(myline))
plt.show()

# Predict the value at x = 255
predict = str(model(255))
print("when x is 255, value is expected to be " + predict)
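
NumPy's documentation recommends the newer `numpy.polynomial` package over `np.polyfit` and `np.poly1d` for new code, so the same fit-then-predict step might look like the sketch below. The seven data points are made up for illustration (they lie exactly on y = 0.01x^2 + 2), which makes the prediction easy to check by hand.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Made-up points lying exactly on y = 0.01 * x**2 + 2
x = np.array([10, 50, 90, 130, 170, 210, 250], dtype=float)
y = 0.01 * x**2 + 2

# Polynomial.fit rescales x internally for numerical stability
model = Polynomial.fit(x, y, deg=2)

# convert() maps the coefficients back to the original x scale;
# note they are ordered from lowest to highest degree
print(model.convert().coef)

# The fitted object can be called directly to predict
print("when x is 255, value is expected to be", model(255))
```

Note that the coefficient order is the reverse of what `np.polyfit` returns, which is an easy thing to trip over when switching between the two.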

Scraping and Analyzing Data

import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

# A font that can render non-Latin characters in titles (available on macOS)
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Fetch the page
page = requests.get('https://www.imdb.com/chart/top/?ref_=nv_mv_250')
soup = BeautifulSoup(page.content, 'html.parser')

# These selectors depend on the page's markup; if the site changes its layout
# they will return empty lists and need to be updated
# Movie titles
links = soup.select("table tbody tr td.titleColumn a")
# Movie ratings
links1 = soup.select("table tbody tr td.ratingColumn strong")

# Take every 28th entry, then reverse the order for better viewing
names = [anchor.text for anchor in links[::28]][::-1]
# Convert the rating text to float so the y axis is numeric
scores = [float(anchor.text) for anchor in links1[::28]][::-1]

print(names)
print(scores)

fig = plt.figure(figsize=(25, 10))
plt.xticks(fontsize=12)
plt.yticks(fontsize=15)

# Color each point by its rating
plt.scatter(names, scores, c=scores, cmap='viridis')
plt.colorbar(label='rating')
plt.show()

e.g. Scrape movie rating data from a ratings website and show part of it as a scatter plot.
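
To see what the selectors and the `[::28]` slice do without hitting the network, here is a small sketch that runs the same parsing steps on an inline HTML fragment. The fragment is hypothetical: it only imitates the table layout that the selectors above assume, and the titles and ratings are placeholders, not real data. The step is shortened to 2 because the fragment has only four rows.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the layout the selectors expect
html = """
<table><tbody>
  <tr><td class="titleColumn"><a>Movie A</a></td>
      <td class="ratingColumn"><strong>9.2</strong></td></tr>
  <tr><td class="titleColumn"><a>Movie B</a></td>
      <td class="ratingColumn"><strong>9.0</strong></td></tr>
  <tr><td class="titleColumn"><a>Movie C</a></td>
      <td class="ratingColumn"><strong>8.7</strong></td></tr>
  <tr><td class="titleColumn"><a>Movie D</a></td>
      <td class="ratingColumn"><strong>8.5</strong></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
title_links = soup.select("table tbody tr td.titleColumn a")
rating_tags = soup.select("table tbody tr td.ratingColumn strong")

# Keep every 2nd row, then reverse so the lowest-rated entry comes first
names = [a.text for a in title_links[::2]][::-1]
# Convert the rating text to float so it is plotted on a numeric axis
scores = [float(s.text) for s in rating_tags[::2]][::-1]

print(names)
print(scores)
```

Testing the parsing logic on a fixed fragment like this makes it easier to tell whether an empty result from the live site comes from a bug in the code or from a change in the page's markup.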

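The examples above are all basic applications of third-party Python libraries. To become fluent with them and apply them in more depth, you still need to study them systematically.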
