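Published September 21, 2023 (author: 龚贤永)
Shenzhen second-hand housing price analysis and forecast
Analysis goals:
1. Use the cleaned price data to identify the feature variables that have a significant effect on house prices.
2. With those features fixed, build a Shenzhen house price prediction model and simulate a hypothetical scenario.
Data preprocessing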
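import os
import pandas as pd

file_path = "D:Python数据分析与挖掘实战深圳⼆⼿房价分析data"
# List every file in the file_path directory
file_name = os.listdir(file_path)
lis = []
# Read each Excel file and collect the frames in a list
for i in file_name:
    file = pd.read_excel(os.path.join(file_path, i))
    lis.append(file)
# Combine the frames with pd.concat. The alternative of calling df.append(file)
# inside the loop no longer works: DataFrame.append was removed in pandas 2.0.
df = pd.concat(lis)
# Rename the first column
# The data only records the unit price per square meter, so add a total_price
# field (area times unit price) to give the analysis one more dimension
df['total_price'] = df['AREA'] * df['per_price']
print(df['total_price'])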
out:
0 632.002890
1 879.995700
2 110.000800
3 93.990400
4 395.998200
...
1487 116.000040
1488 119.999383
1489 145.001298
1490 128.999772
1491 80.999928
Name: total_price, Length: 18514, dtype: float64
# Check for duplicate rows
print(df.duplicated().sum())
out:
0
area_map={'baoan':'宝安','dapengxinqu':'⼤鹏新区','futian':'福⽥','guangming':'光明',
'longhua':'龙华','luohu':'罗湖','nanshan':'南⼭','pingshan':'坪⼭','yantian':'盐⽥'
,'longgang':'龙岗'}
df['district']=df['district'].apply(lambda x : area_map[x])
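Feature variable analysis
district feature variable analysis
As the figure above shows:
1. Nanshan has the highest average unit price and Dapeng New District the lowest.
2. Nanshan also has the highest average total price, while Pingshan has the lowest.
3. There are 18,514 second-hand listings in total; Luohu has the most, close to 18%.
4. In the box plot the center of the box clearly shifts from district to district, which indicates that price depends on district.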
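The district figures above can be reproduced with a grouped summary. The following is a minimal sketch, assuming a frame with the `district`, `per_price` and `total_price` columns used earlier. The helper name `district_summary` and the toy data are illustrative and do not come from the original analysis.

```python
import pandas as pd


def district_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-district mean unit price, mean total price, listing count and share."""
    summary = df.groupby("district").agg(
        mean_per_price=("per_price", "mean"),
        mean_total_price=("total_price", "mean"),
        listings=("per_price", "size"),
    )
    # Share of all listings that fall in each district
    summary["share"] = summary["listings"] / len(df)
    return summary.sort_values("mean_per_price", ascending=False)


# Made-up rows, only to show the shape of the result
toy = pd.DataFrame({
    "district": ["A", "A", "B", "B", "B"],
    "per_price": [8.0, 10.0, 4.0, 5.0, 6.0],
    "total_price": [800.0, 900.0, 300.0, 400.0, 500.0],
})
result = district_summary(toy)
print(result)
```

A per-district box plot of the same column can be drawn with `df.boxplot(column="per_price", by="district")`.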
m feature variable analysis
As the figure above shows:
1. Listings with 3 halls have the highest average unit price.
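import matplotlib.pyplot as plt
# Scatter plot of unit price against floor area
plt.scatter(df.AREA, df.per_price, marker='x', color='b', alpha=0.5)
plt.title('Scatter plot of area (AREA) and unit price (per_price)')
plt.ylabel("Unit price")
plt.xlabel("Area (square meters)")
plt.show()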
As the figure above shows:
1. The average unit price fluctuates considerably from floor to floor, so the floor affects the unit price.
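Predicting house prices with machine learning
The analysis above shows that seven features (district, number of rooms, school, floor number, proximity to a subway station, area, number of halls) affect house prices, so these features are used as the inputs to the model.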
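Categorical features such as district have to become numeric columns before they reach the model. Column names that appear later in this post, such as `roomnum_3` and `hall_3`, suggest one-hot encoding, although the encoding step itself is not shown. The following is a minimal sketch using `pd.get_dummies`; the toy frame and the variable names `toy`, `categorical` and `encoded` are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows with the same kinds of columns as the listing data
toy = pd.DataFrame({
    "district": ["nanshan", "futian", "nanshan"],
    "roomnum": [3, 2, 3],
    "hall": [2, 1, 2],
    "AREA": [80.0, 60.0, 95.0],
    "per_price": [9.0, 8.0, 10.0],
})

categorical = ["district", "roomnum", "hall"]
# get_dummies replaces each listed column with 0/1 indicator columns
# named "<column>_<value>", and leaves the other columns untouched
encoded = pd.get_dummies(toy, columns=categorical, dtype=int)
print(encoded.columns.tolist())
```

Treating `roomnum` and `hall` as categories rather than numbers lets the model assign each count its own effect, at the cost of more columns.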
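import numpy as np
from sklearn.preprocessing import StandardScaler
# Drop the original categorical columns and total_price.
# data_all is a placeholder name: the source frame's identifier was lost from this listing.
data_new = data_all.drop(columns=['district', 'hall', 'roomnum', 'C_floor', 'total_price'])
# Separate the features from the label
x = data_new.loc[:, data_new.columns != "per_price"]
fea_imp = x.columns  # feature names (assumed reading of a garbled line)
y = data_new.loc[:, 'per_price']
# Split the data: a random 30% becomes the test sample, the rest the training sample
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=10, test_size=0.3)
# print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
# reshape(-1, 1) means any number of rows and one column
y_train = y_train.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
# Standardize the data
ss_x = StandardScaler()
ss_y = StandardScaler()
# fit_transform combines fit and transform: it learns the scaling parameters and applies them.
# transform() only applies parameters that were already fitted, so the test set uses it
# (the processing here is standardization to zero mean and unit variance)
x_train = ss_x.fit_transform(x_train)
x_test = ss_x.transform(x_test)
mean_y = np.mean(y_train)
s_y = np.std(y_train)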
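The mean and standard deviation of the training target are presumably kept so that predictions made on the standardized scale can be converted back to prices. The following is a minimal sketch of that round trip with made-up numbers; the helper `to_price` is a hypothetical name, not part of the original code.

```python
import numpy as np

# Made-up unit prices, shaped as one column like y_train above
y_train = np.array([4.0, 6.0, 8.0, 10.0]).reshape(-1, 1)

mean_y = np.mean(y_train)
s_y = np.std(y_train)  # population standard deviation (ddof=0), the same convention StandardScaler uses

# Standardize: zero mean and unit variance on the training target
y_scaled = (y_train - mean_y) / s_y


def to_price(pred_scaled):
    """Undo the standardization to get back to the original price units."""
    return pred_scaled * s_y + mean_y


print(to_price(y_scaled).ravel())
```

Fitting the scaling on the training target only, and reusing it for the test set and for new predictions, keeps information about the test prices out of training.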
[0] train-rmse:1.04640 test-rmse:1.04475
[65] train-rmse:0.55049 test-rmse:0.57638
[66] train-rmse:0.54969 test-rmse:0.57585
[67] train-rmse:0.54928 test-rmse:0.57555
[68] train-rmse:0.54904 test-rmse:0.57539
[69] train-rmse:0.54829 test-rmse:0.57457
[70] train-rmse:0.54804 test-rmse:0.57442
[71] train-rmse:0.54737 test-rmse:0.57405
[72] train-rmse:0.54685 test-rmse:0.57380
[73] train-rmse:0.54622 test-rmse:0.57343
[74] train-rmse:0.54584 test-rmse:0.57330
[75] train-rmse:0.54572 test-rmse:0.57320
[76] train-rmse:0.54557 test-rmse:0.57312
[77] train-rmse:0.54502 test-rmse:0.57257
[78] train-rmse:0.54446 test-rmse:0.57215
[79] train-rmse:0.54392 test-rmse:0.57191
[80] train-rmse:0.54342 test-rmse:0.57153
[81] train-rmse:0.54309 test-rmse:0.57132
[82] train-rmse:0.54299 test-rmse:0.57130
[83] train-rmse:0.54251 test-rmse:0.57103
[84] train-rmse:0.54239 test-rmse:0.57095
[85] train-rmse:0.54197 test-rmse:0.57077
[86] train-rmse:0.54146 test-rmse:0.57042
[87] train-rmse:0.54137 test-rmse:0.57035
[88] train-rmse:0.54091 test-rmse:0.57010
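The first round's RMSE of about 1.05 is consistent with a target that was standardized before training, in which case the values in this log are in units of standard deviations rather than prices. The following is a minimal sketch, under that assumption, of how such an RMSE is computed and rescaled; the arrays and the value of `s_y` are made up.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up standardized targets and predictions; every error is 0.5 in size
y_true_scaled = [0.0, 1.0, -1.0, 2.0]
y_pred_scaled = [0.5, 0.5, -0.5, 1.5]
err_scaled = rmse(y_true_scaled, y_pred_scaled)

s_y = 2.0  # hypothetical standard deviation of the training prices
err_price = err_scaled * s_y  # the same error expressed in price units
print(err_scaled, err_price)
```

Read this way, the final test RMSE of 0.57010 would correspond to roughly 0.57 standard deviations of the training unit price.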
17 25.0 district_龙岗
10 20.0 district_坪⼭
11 15.0 district_⼤鹏新区
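import xgboost as xgb  # alias assumed; the module prefix was lost from this listing
xgb.plot_importance(xg, max_num_features=10, importance_type='gain')
plt.show()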
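# (3) Hypothetical scenario: make a prediction, where x_new holds the new input values
'''
Estimate the approximate cost of a home that:
1. is in Nanshan district
2. has 3 rooms
3. has an area of about 80 ㎡
4. is near a subway station
5. is in a school district
'''
room = Roomnum[Roomnum['roomnum_3'] == 1].head(1).reset_index(drop=True)
dis = District[District['district_南⼭'] == 1].head(1).reset_index(drop=True)
hal = Hall[Hall['hall_3'] == 1].head(1).reset_index(drop=True)
x_new1 = pd.concat([room, dis, hal], axis=1)
x_new1['AREA'] = 80
x_new1['floor_num'] = 3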
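A scenario row has to contain exactly the columns the model was trained on, in the same order, with every indicator that does not apply set to 0. The following is a minimal sketch of one way to guarantee that with `DataFrame.reindex`. It is an alternative to assembling the row from slices of the dummy frames, and the column list `train_columns` is hypothetical.

```python
import pandas as pd

# Hypothetical training column order; in practice this would come from the feature frame
train_columns = [
    "AREA", "floor_num",
    "roomnum_2", "roomnum_3",
    "district_A", "district_B",
    "hall_2", "hall_3",
]

# Only the values that describe the scenario are spelled out
scenario = pd.DataFrame([{
    "AREA": 80, "floor_num": 3,
    "roomnum_3": 1, "district_B": 1, "hall_3": 1,
}])

# reindex puts the columns in training order and fills the missing indicators with 0
x_new = scenario.reindex(columns=train_columns, fill_value=0)
print(x_new)
```

Before predicting, the row would still need the same scaling that was fitted on the training features, and the prediction would need to be converted back from the standardized scale.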