Linear regression on the UCI crimes dataset: very poor test score

  Statistics/Machine Learning   Regression Analysis   Python


import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the data
#crimesDF =pd.read_csv("crimes.csv",encoding="ISO-8859-1")
crimesDF =pd.read_csv("communities.csv",encoding="ISO-8859-1")

# Drop the first 6 columns (iloc[:, 6:] keeps column index 6 onward)
print(crimesDF.shape[1]) #128
crimesDF1 = crimesDF.iloc[:, 6:]

# Convert to numeric; values that cannot be parsed become NaN
crimesDF2 = crimesDF1.apply(pd.to_numeric, errors='coerce')

# Impute NaN with 0
crimesDF2 = crimesDF2.fillna(0)

# Select X (feature variables): the first 120 remaining columns
X = crimesDF2.iloc[:, 0:120]

# Set the target: column index 121 (column index 120 is used neither in X nor as y)
y = crimesDF2.iloc[:, 121]
print(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a multivariate regression model
linreg = LinearRegression().fit(X_train, y_train)

# compute and print the R Square
print('R-squared score (training): {:.3f}'.format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'.format(linreg.score(X_test, y_test)))

## R-squared score (training): 0.78
## R-squared score (test): 0.03

The test score is only 0.03. What could be causing this?

 

constant007   2019-06-05 07:39



   2 Answers

This looks like overfitting. You probably have too many variables, which leads to multicollinearity among them.
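
One way to check this diagnosis is to compare plain least squares with a regularized fit. The sketch below is not from the original thread: it uses synthetic data (`make_collinear_data` is a hypothetical helper standing in for communities.csv) made of 120 nearly duplicated columns, and it swaps in `Ridge` with an assumed `alpha=1.0` as one common remedy for multicollinearity. A large gap between training and test R-squared for `LinearRegression` that shrinks under `Ridge` would be consistent with overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split


def make_collinear_data(n_samples=180, n_base=10, n_copies=11, noise=0.01, seed=0):
    """Hypothetical stand-in data: a few informative columns plus near-duplicates."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=(n_samples, n_base))
    # Each copy is the base block plus a tiny perturbation, so columns are
    # almost perfectly correlated with one another.
    copies = [base + noise * rng.normal(size=base.shape) for _ in range(n_copies)]
    X = np.hstack([base] + copies)  # n_base * (n_copies + 1) columns
    y = base @ rng.normal(size=n_base) + rng.normal(size=n_samples)
    return X, y


def compare_ols_and_ridge(X, y, alpha=1.0, seed=0):
    """Return {name: (train R^2, test R^2)} for OLS and Ridge on the same split."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    scores = {}
    for name, model in [("ols", LinearRegression()), ("ridge", Ridge(alpha=alpha))]:
        model.fit(X_train, y_train)
        scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    return scores


if __name__ == "__main__":
    X, y = make_collinear_data()
    for name, (train_r2, test_r2) in compare_ols_and_ridge(X, y).items():
        print("{}: train R^2 = {:.3f}, test R^2 = {:.3f}".format(name, train_r2, test_r2))
```

On the real data, one could pass the `X` and `y` from the question to `compare_ols_and_ridge` and try several values of `alpha`; the value used here is only a placeholder.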


TTesT   2019-06-05 09:50


Try switching to a random forest, then tune its hyperparameters with cross-validation.
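
A minimal sketch of this suggestion, assuming scikit-learn's `RandomForestRegressor` and `GridSearchCV`. The helper name `tune_random_forest`, the parameter grid, and the synthetic data from `make_regression` are illustrative assumptions, not something given in the thread; the grid in particular would need adjusting for the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_random_forest(X, y, param_grid=None, cv=5, n_estimators=100, seed=0):
    """Grid-search a random forest with k-fold CV; return (search, test R^2)."""
    if param_grid is None:
        # Assumed starting grid: tree depth and fraction of features per split.
        param_grid = {"max_depth": [4, 8, None], "max_features": [0.3, 1.0]}
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    search = GridSearchCV(forest, param_grid, cv=cv, scoring="r2")
    # Cross-validation only sees the training split; the test split stays held out.
    search.fit(X_train, y_train)
    # score() evaluates the refitted best estimator with the same R^2 metric.
    return search, search.score(X_test, y_test)


if __name__ == "__main__":
    # Synthetic placeholder data, not the communities.csv file from the question.
    X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                           noise=1.0, random_state=0)
    search, test_r2 = tune_random_forest(X, y)
    print("best params:", search.best_params_)
    print("mean CV R^2 of best params: {:.3f}".format(search.best_score_))
    print("held-out test R^2: {:.3f}".format(test_r2))
```

Comparing `search.best_score_` with the held-out test score gives a rough check that the tuned model is not overfitting the way the linear model in the question did.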


道画师   2019-06-12 20:29


