Published December 24, 2023 (author: 黄海冰)

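var = 'Neighborhood'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
f, ax = plt.subplots(figsize=(8, 6))
fig = sns.boxplot(x=var, y="SalePrice", data=data)
# fig.axis(ymin=0, ymax=800000);
plt.xticks(rotation=90);

Select the 10 features most strongly correlated with the sale price and examine how they correlate with one another.

k = 10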

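# On pandas 2.x, pass numeric_only=True to corr() if df_train still holds string columns
corrmat = df_train.corr()
# Names of the k columns with the largest correlation to SalePrice (SalePrice itself included)
cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(df_train[cols].values.T)
sns.set(font_scale=1.25)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values, cmap='YlGnBu')
plt.show()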
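The top-k selection above can be tried on a small synthetic frame. This is only a sketch: the columns `Area`, `Quality` and `Noise`, the helper `top_k_correlated`, and all values are invented for illustration and are not part of the competition data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_train; every column except SalePrice is hypothetical.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(1500, 300, n)
quality = rng.integers(1, 11, n).astype(float)
noise = rng.normal(0, 1, n)
df = pd.DataFrame({
    "Area": area,
    "Quality": quality,
    "Noise": noise,
    "SalePrice": 100 * area + 20000 * quality + rng.normal(0, 10000, n),
})


def top_k_correlated(frame, target, k):
    """Return the k column names with the largest correlation to `target`.

    The target itself is included, since its self-correlation is 1.
    """
    corrmat = frame.select_dtypes(include="number").corr()
    return corrmat.nlargest(k, target)[target].index


cols = top_k_correlated(df, "SalePrice", 3)
# Correlation matrix restricted to the selected columns, as fed to the heatmap.
cm = np.corrcoef(df[cols].values.T)
print(list(cols))
print(np.round(cm, 2))
```

Note that `nlargest` ranks by signed correlation, so a strongly negative feature would be missed; ranking by absolute value is an alternative if that matters.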

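# Log transform: log(1+x)
train["SalePrice"] = np.log1p(train["SalePrice"])
# Look at the new distribution (distplot is deprecated in recent seaborn releases)
sns.distplot(train['SalePrice'], fit=norm);
# Fitted parameters of the normal distribution
(mu, sigma) = norm.fit(train['SalePrice'])
print('\n mu = {:.2f} and sigma = {:.2f}\n'.format(mu, sigma))
# Plot
plt.legend(['Normal dist. ($\\mu=$ {:.2f} and $\\sigma=$ {:.2f} )'.format(mu, sigma)], loc='best')
plt.ylabel('Frequency')
plt.title('SalePrice distribution')
# Q-Q plot
fig = plt.figure()
res = stats.probplot(train['SalePrice'], plot=plt)
plt.show()

(3): Data preprocessing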
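One point worth checking before further preprocessing: the log(1+x) transform applied to SalePrice has an exact inverse, `np.expm1`, which is what turns predictions made on the log scale back into prices. The sketch below uses a handful of invented prices (not the competition data) to compare skewness before and after the transform and to confirm the round trip.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed prices used only for illustration.
prices = pd.Series([50_000, 80_000, 120_000, 150_000, 200_000, 755_000], dtype=float)

log_prices = np.log1p(prices)    # log(1 + x)
restored = np.expm1(log_prices)  # exp(y) - 1, the inverse of log1p

print("skew before:", round(prices.skew(), 2))
print("skew after: ", round(log_prices.skew(), 2))
print("round trip ok:", np.allclose(restored, prices))
```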

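Remove outliers:

# Drop houses with a very large living area but a low price.
# Note: the 300000 threshold is in raw price units, so apply this filter before any log transform of SalePrice.
train = train.drop(train[(train['GrLivArea']>4000) & (train['SalePrice']<300000)].index)
# Check the graphic again
fig, ax = plt.subplots()
ax.scatter(train['GrLivArea'], train['SalePrice'])
plt.ylabel('SalePrice', fontsize=13)
plt.xlabel('GrLivArea', fontsize=13)
plt.show()

Missing-value handling. First concatenate the training and test sets: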

ntrain = train.shape[0]
ntest = test.shape[0]
y_train = train.SalePrice.values
all_data = pd.concat((train, test)).reset_index(drop=True)
all_data.drop(['SalePrice'], axis=1, inplace=True)
print("all_data size is : {}".format(all_data.shape))
# Percentage of missing values per feature, keeping the 30 worst
all_data_na = (all_data.isnull().sum() / len(all_data)) * 100
all_data_na = all_data_na.drop(all_data_na[all_data_na == 0].index).sort_values(ascending=False)[:30]
missing_data = pd.DataFrame({'Missing Ratio' :all_data_na})
missing_data.head(20)
f, ax = plt.subplots(figsize=(15, 12))
plt.xticks(rotation='90')
sns.barplot(x=all_data_na.index, y=all_data_na)
plt.xlabel('Features', fontsize=15)
plt.ylabel('Percent of missing values', fontsize=15)
plt.title('Percent missing data by feature', fontsize=15)
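The concatenate-then-measure pattern above can be checked on toy data. In this sketch the two small frames and the helper `missing_ratio` are invented for illustration; only the column names echo the dataset. It also shows how the stored `ntrain` lets the combined frame be split back into its training and test parts once the missing values have been handled.

```python
import numpy as np
import pandas as pd

# Toy train/test frames; the values are invented.
train = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 60.0],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "GrLivArea": [1710, 1262, 1786, 1717],
    "SalePrice": [208500, 181500, 223500, 140000],
})
test = pd.DataFrame({
    "LotFrontage": [np.nan, 70.0],
    "PoolQC": [np.nan, "Ex"],
    "GrLivArea": [896, 1329],
})

ntrain = train.shape[0]
y_train = train["SalePrice"].values  # keep the target before dropping it

# Stack both sets so that imputation decisions see every row.
all_data = pd.concat((train, test)).reset_index(drop=True)
all_data = all_data.drop(columns=["SalePrice"])


def missing_ratio(frame):
    """Percentage of missing values per column, largest first, zeros omitted."""
    pct = frame.isnull().mean() * 100
    return pct[pct > 0].sort_values(ascending=False)


ratios = missing_ratio(all_data)
print(ratios)

# After cleaning, the original split can be recovered by position.
train_part = all_data[:ntrain]
test_part = all_data[ntrain:]
```

`isnull().mean()` is equivalent to `isnull().sum() / len(frame)` and avoids the explicit division.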
