
Analysis and Prediction of Second-Hand House Prices in Shenzhen
Published September 21, 2023 (author: 龚贤永)


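Analysis goals:

1. Using the cleaned price data, identify the feature variables that significantly affect house prices.

2. Once the feature variables are chosen, build a price prediction model for Shenzhen and simulate a hypothetical scenario.

Data preprocessing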

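import os

import pandas as pd

# Data directory (adjust to your own location)
file_path = r"D:\Python数据分析与挖掘实战\深圳二手房价分析\data"

# List every file in the file_path directory
file_name = os.listdir(file_path)

# Read each Excel file and collect the frames in a list
lis = []
for i in file_name:
    file = pd.read_excel(os.path.join(file_path, i))
    lis.append(file)

# Concatenate once at the end (DataFrame.append was removed in pandas 2.0)
df = pd.concat(lis)

# Rename the first column

# The data only contain the unit price per square metre, so add a total-price column
# (area times unit price) to give the analysis one more dimension
df['total_price'] = df['AREA'] * df['per_price']
print(df['total_price'])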

out:
0       632.002890
1       879.995700
2       110.000800
3        93.990400
4       395.998200
           ...
1487    116.000040
1488    119.999383
1489    145.001298
1490    128.999772
1491     80.999928
Name: total_price, Length: 18514, dtype: float64
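As a minimal sketch of the loading step, the snippet below concatenates several per-file frames with pd.concat and derives the total price. The two small frames stand in for the Excel files and their values are made up for illustration; ignore_index=True is an optional choice and is not something the original code is known to use.

```python
import pandas as pd

# Two hypothetical frames standing in for two district files
frames = [
    pd.DataFrame({"AREA": [80.0, 100.0], "per_price": [5.0, 6.5]}),
    pd.DataFrame({"AREA": [60.0], "per_price": [4.0]}),
]

# Concatenate once; ignore_index=True produces a clean 0..n-1 index
df_demo = pd.concat(frames, ignore_index=True)

# Total price = area x unit price
df_demo["total_price"] = df_demo["AREA"] * df_demo["per_price"]
print(df_demo)
```

Without ignore_index=True each file keeps its own row numbers, which would explain why the printed index ends at 1491 although the Series has 18,514 rows.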

# Check for duplicate rows
print(df.duplicated().sum())

out:
0
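The data here contain no duplicates, but when the count is non-zero the usual follow-up is drop_duplicates. A small sketch on made-up rows (the frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical listings; the first two rows are identical
listings = pd.DataFrame({
    "district": ["nanshan", "nanshan", "futian"],
    "AREA": [80.0, 80.0, 95.0],
    "per_price": [9.0, 9.0, 8.0],
})

# duplicated() flags every repeat of an earlier row
n_dup = int(listings.duplicated().sum())

# Keep the first occurrence of each row and renumber the index
deduped = listings.drop_duplicates().reset_index(drop=True)
print(n_dup, len(deduped))
```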

# Map the pinyin district codes to their Chinese names
area_map = {'baoan': '宝安', 'dapengxinqu': '大鹏新区', 'futian': '福田', 'guangming': '光明',
            'longhua': '龙华', 'luohu': '罗湖', 'nanshan': '南山', 'pingshan': '坪山',
            'yantian': '盐田', 'longgang': '龙岗'}
df['district'] = df['district'].apply(lambda x: area_map[x])
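A lookup through a lambda raises a bare KeyError on the first unmapped code. The sketch below shows one possible alternative: check for unmapped codes up front, then use Series.map. The shortened dictionary and the sample codes are illustrative only.

```python
import pandas as pd

# Shortened, illustrative version of the mapping
area_map_demo = {"nanshan": "南山", "futian": "福田"}
codes = pd.Series(["nanshan", "futian", "nanshan"])

# Fail early with a readable message if any code has no mapping
unknown = set(codes.unique()) - set(area_map_demo)
if unknown:
    raise KeyError(f"unmapped district codes: {sorted(unknown)}")

# Series.map applies the dictionary element-wise
names = codes.map(area_map_demo)
print(names.tolist())
```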

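Feature variable analysis

Analysis of the district feature variable

As the figure above shows:

1. Nanshan has the highest average unit price for second-hand homes, and Dapeng New District the lowest.

2. Nanshan has the highest average total price, and Pingshan the lowest.

3. There are 18,514 second-hand listings in total; Luohu has the most, close to 18% of them.

4. In the box plot, the centre of the box shifts noticeably from district to district, which indicates that price is related to district.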
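The per-district figures behind these observations can be computed with one groupby. This is a sketch on made-up numbers, not the author's plotting code; the column names follow the document, the values do not come from the data set.

```python
import pandas as pd

# Hypothetical listings for two districts
demo = pd.DataFrame({
    "district": ["南山", "南山", "罗湖", "罗湖", "罗湖"],
    "per_price": [10.0, 8.0, 6.0, 5.0, 7.0],
    "total_price": [900.0, 700.0, 400.0, 350.0, 450.0],
})

# Mean unit price, mean total price and listing count per district
summary = demo.groupby("district").agg(
    mean_unit_price=("per_price", "mean"),
    mean_total_price=("total_price", "mean"),
    listings=("per_price", "size"),
)

# Share of all listings that falls in each district
summary["share"] = summary["listings"] / len(demo)
summary = summary.sort_values("mean_unit_price", ascending=False)
print(summary)
```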

Analysis of the …m feature variable

As the figure above shows:

1. Listings with 3 halls have the highest average unit price.


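import matplotlib.pyplot as plt

# Scatter plot of area against unit price
plt.scatter(df.AREA, df.per_price, marker='x', color='b', alpha=0.5)
plt.title('Scatter plot of area (AREA) vs. unit price (per_price)')
plt.ylabel("Unit price")
plt.xlabel("Area (square metres)")
plt.show()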
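A scatter plot can be complemented with a correlation coefficient. The sketch below computes the Pearson correlation between area and unit price in two equivalent ways; the five data points are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical area / unit-price pairs
demo = pd.DataFrame({
    "AREA": [50.0, 70.0, 90.0, 120.0, 150.0],
    "per_price": [4.0, 5.0, 5.5, 7.0, 6.5],
})

# Pearson correlation via pandas ...
r = demo["AREA"].corr(demo["per_price"])

# ... and the same quantity via NumPy, as a cross-check
r_np = np.corrcoef(demo["AREA"], demo["per_price"])[0, 1]
print(round(r, 3))
```

A value near zero would suggest little linear relationship even if the scatter plot shows some other structure.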


As the figure above shows:

1. The average unit price fluctuates considerably from floor to floor, so the floor affects the unit price.

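Predicting house prices with machine learning

The analysis above shows that seven features (district, number of rooms, school, floor number, proximity to a subway station, area, number of halls) affect the price, so they are used as the input features of the model.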
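Categorical features such as district have to be turned into numbers before modelling. The encoding step is not shown in the text, so the following is only a sketch that assumes one-hot encoding with pd.get_dummies. The frame names District, Roomnum and Hall are chosen to match the names used in the scenario code later on; the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
data = pd.DataFrame({
    "district": ["南山", "福田", "南山"],
    "roomnum": [3, 2, 3],
    "hall": [2, 1, 3],
    "AREA": [80.0, 60.0, 120.0],
    "per_price": [9.0, 8.0, 10.0],
})

# One indicator column per category value, e.g. district_南山, roomnum_3
District = pd.get_dummies(data["district"], prefix="district", dtype=int)
Roomnum = pd.get_dummies(data["roomnum"], prefix="roomnum", dtype=int)
Hall = pd.get_dummies(data["hall"], prefix="hall", dtype=int)

# Attach the indicators and drop the original categorical columns
data_encoded = pd.concat([data, District, Roomnum, Hall], axis=1)
data_encoded = data_encoded.drop(columns=["district", "roomnum", "hall"])
print(data_encoded.columns.tolist())
```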

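import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Drop the original categorical columns and total_price, which is derived from the target.
# The name of the source frame is truncated in the original; data_new is assumed here.
data_new = data_new.drop(columns=['district', 'hall', 'roomnum', 'C_floor', 'total_price'])

# Separate the features from the label
x = data_new.loc[:, data_new.columns != "per_price"]
fea_imp = x.columns
y = data_new.loc[:, 'per_price']

# Split the data: a random 30% becomes the test set, the rest the training set
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=10, test_size=0.3)
# print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# reshape(-1, 1) means any number of rows, one column
y_train = y_train.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)

# Standardize the data
ss_x = StandardScaler()
ss_y = StandardScaler()

# fit_transform combines fit and transform: it learns the scaling parameters and applies them.
# transform() and fit_transform() both apply a uniform transformation to the data
# (e.g. standardizing to ~N(0,1), rescaling to a fixed interval, normalization, regularization)
x_train = ss_x.fit_transform(x_train)
x_test = ss_x.transform(x_test)

# Mean and standard deviation of the training label
# (the function names were lost in the source; np.mean and np.std are assumed from the variable names)
mean_y = np.mean(y_train)
s_y = np.std(y_train)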

[0] train-rmse:1.04640 test-rmse:1.04475

[65] train-rmse:0.55049 test-rmse:0.57638

[66] train-rmse:0.54969 test-rmse:0.57585

[67] train-rmse:0.54928 test-rmse:0.57555

[68] train-rmse:0.54904 test-rmse:0.57539

[69] train-rmse:0.54829 test-rmse:0.57457

[70] train-rmse:0.54804 test-rmse:0.57442

[71] train-rmse:0.54737 test-rmse:0.57405

[72] train-rmse:0.54685 test-rmse:0.57380

[73] train-rmse:0.54622 test-rmse:0.57343

[74] train-rmse:0.54584 test-rmse:0.57330

[75] train-rmse:0.54572 test-rmse:0.57320

[76] train-rmse:0.54557 test-rmse:0.57312

[77] train-rmse:0.54502 test-rmse:0.57257

[78] train-rmse:0.54446 test-rmse:0.57215

[79] train-rmse:0.54392 test-rmse:0.57191

[80] train-rmse:0.54342 test-rmse:0.57153

[81] train-rmse:0.54309 test-rmse:0.57132

[82] train-rmse:0.54299 test-rmse:0.57130

[83] train-rmse:0.54251 test-rmse:0.57103

[84] train-rmse:0.54239 test-rmse:0.57095

[85] train-rmse:0.54197 test-rmse:0.57077

[86] train-rmse:0.54146 test-rmse:0.57042

[87] train-rmse:0.54137 test-rmse:0.57035

[88] train-rmse:0.54091 test-rmse:0.57010
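If the label was standardized before training, as the starting RMSE close to 1 suggests, the logged RMSE is in units of standard deviations. Multiplying it by the label's standard deviation (s_y in the code above) converts it back to price units. A minimal check of that identity on invented numbers:

```python
import numpy as np

# Hypothetical unit prices and predictions in original units
y = np.array([5.0, 7.0, 9.0, 11.0])
pred = np.array([5.5, 6.0, 9.5, 10.0])

# Standardize both with the label's mean and standard deviation
mean_y, s_y = y.mean(), y.std()
y_std = (y - mean_y) / s_y
pred_std = (pred - mean_y) / s_y

# RMSE in standardized units and in original units
rmse_std = np.sqrt(np.mean((y_std - pred_std) ** 2))
rmse_price = np.sqrt(np.mean((y - pred) ** 2))

# The two differ exactly by the factor s_y
print(np.isclose(rmse_std * s_y, rmse_price))
```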

17    25.0    district_龙岗
10    20.0    district_坪山
11    15.0    district_大鹏新区
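Rows like these can be produced by turning a feature-to-score dictionary into a sorted table. The sketch below reuses the three scores listed above. The dictionary shape mirrors what Booster.get_score returns in XGBoost, but the column names score and feature are assumptions.

```python
import pandas as pd

# Scores taken from the three rows above, deliberately out of order
scores = {
    "district_坪山": 20.0,
    "district_龙岗": 25.0,
    "district_大鹏新区": 15.0,
}

# Build a two-column table and sort by score, highest first
importance = (
    pd.DataFrame({"score": list(scores.values()), "feature": list(scores.keys())})
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(importance)
```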

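import xgboost as xgb

# Plot the 10 most important features by gain
xgb.plot_importance(xg, max_num_features=10, importance_type='gain')
plt.show()

# Hypothetical scenario: make a prediction; x_new holds the new independent variables
'''
Estimate the approximate cost of a home that:
1. is in Nanshan
2. has 3 rooms
3. has an area of about 80 ㎡
4. is near a subway station
5. is in a school district
'''
room = Roomnum[Roomnum['roomnum_3'] == 1].head(1).reset_index(drop=True)
dis = District[District['district_南山'] == 1].head(1).reset_index(drop=True)
hal = Hall[Hall['hall_3'] == 1].head(1).reset_index(drop=True)
x_new1 = pd.concat([room, dis, hal], axis=1)
x_new1['AREA'] = 80
x_new1['floor_num'] = 3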
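Before a scenario row can be scaled and passed to the model, its columns must match the training columns in both name and order. One way to guarantee this, sketched under the assumption that the training column list is available (for example the fea_imp index saved earlier), is DataFrame.reindex. The column list below is a shortened, hypothetical stand-in.

```python
import pandas as pd

# Hypothetical, shortened list of training columns
feature_cols = [
    "AREA", "floor_num",
    "district_南山", "district_福田",
    "roomnum_2", "roomnum_3",
    "hall_3",
]

# The scenario only names the indicators that are switched on
scenario = pd.DataFrame([{
    "AREA": 80, "floor_num": 3,
    "district_南山": 1, "roomnum_3": 1, "hall_3": 1,
}])

# reindex orders the columns like the training data and fills missing indicators with 0
x_new = scenario.reindex(columns=feature_cols, fill_value=0)
print(x_new.iloc[0].to_dict())
```

The aligned row would then go through the same fitted scaler as the training features before prediction.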
