
Introduction to Machine Learning (3): Gradient Descent


Introduction:

Strictly speaking, gradient descent is not a machine learning algorithm in its own right; it is an optimization algorithm whose goal is to minimize a loss function by search and thereby find the optimal solution. Because it is simple and works well, it is widely used inside machine learning algorithms.

How it works:

Suppose we have a function on a two-dimensional plane like the following (the original post shows a curve with a single lowest point here):

To find that lowest point, we can define an update rule: x = x - η·y′, where y′ is the derivative of y with respect to x. If we apply this rule repeatedly, the point slides downhill and y′ keeps shrinking; at the lowest point y′ = 0, x no longer changes, and we have found the minimum.
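As a minimal sketch of this one-dimensional case (the quadratic curve, starting point, and learning rate below are my own illustrative choices, not from the original post):

def J(x):
    # an example loss curve with its lowest point at x = 2.5
    return (x - 2.5) ** 2 - 1

def dJ(x):
    # derivative of J
    return 2 * (x - 2.5)

eta = 0.1        # learning rate
epsilon = 1e-8   # stop once the loss barely changes
x = 0.0          # arbitrary starting point

while True:
    last_x = x
    x = x - eta * dJ(x)                   # the update rule x = x - eta * y'
    if abs(J(x) - J(last_x)) < epsilon:
        break

print(x)   # ends up very close to 2.5, where the derivative is 0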

Here η is called the learning rate. Clearly, the smaller η is, the more times we have to apply the update rule and the longer it takes to reach the result. But η must not be too large either, or the very first step may jump past the lowest point and the iterates then drift further and further away from it.
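As a hedged illustration of that failure mode, rerunning the sketch above with a learning rate that is too large for this curve (eta = 1.1 is my own choice) makes every step overshoot and the loss grow instead of shrink:

eta = 1.1   # too large for J(x) = (x - 2.5) ** 2 - 1
x = 0.0
for i in range(10):
    x = x - eta * dJ(x)      # each step lands further from the minimum
    print(i, x, J(x))        # |x - 2.5| grows by 20% per step, so J keeps increasing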

Gradient descent is not limited to two-dimensional data; it applies just as well to multi-dimensional data. When we generalize to multiple dimensions, the derivative becomes the gradient, the vector of partial derivatives of the loss with respect to each parameter, and each update subtracts η times that gradient vector.

In the multi-dimensional case we can use the multiple linear regression model we learned earlier to compute the result: once the linear regression expressions are rewritten in matrix form, we can implement the whole algorithm in code, as the sketch below shows.
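As a hedged sketch of that matrix form (the tiny synthetic data set is my own, chosen only to sanity-check the formulas; the same expressions reappear inside fit_gd further down): with X_b denoting the data matrix extended by a leading column of ones, the loss is J(theta) = sum((y - X_b.dot(theta)) ** 2) / m and its gradient is grad J(theta) = 2 * X_b.T.dot(X_b.dot(theta) - y) / m.

import numpy as np

# tiny synthetic data set: y = 4 + 3 * x plus a little noise
np.random.seed(0)
x = 2 * np.random.random(size=(100, 1))
y = 4.0 + 3.0 * x[:, 0] + np.random.normal(0.0, 0.1, size=100)

x_b = np.hstack([np.ones((len(x), 1)), x])   # prepend the column of ones
theta = np.zeros(x_b.shape[1])               # start from all zeros

eta = 0.1
for _ in range(1000):
    gradient = x_b.T.dot(x_b.dot(theta) - y) * 2 / len(x_b)   # vectorized gradient
    theta = theta - eta * gradient

print(theta)   # roughly [4, 3]: the intercept and slope are recovered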

Let me also introduce stochastic gradient descent here (the method above is usually called batch gradient descent); its purpose is to speed up the computation. The batch method has to go through all the samples on every step, which costs a lot of time, while experiments show that even if we pick just one random sample per step we still end up near the lowest point, essentially trading a bit of precision for time. To keep these noisy steps from wandering forever, the implementation below also shrinks the learning rate as training goes on instead of keeping it fixed.
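A hedged sketch of those two ingredients, mirroring what fit_random_gd does further down (the constants t0 = 5 and t1 = 50 come from that code; the rest is illustrative and reuses x_b and y from the batch example above):

t0, t1 = 5, 50

def learning_rate(t):
    # decaying step size: relatively large at first, smaller as t grows
    return t0 / (t + t1)

def dJ_sgd(theta, x_b_i, y_i):
    # gradient estimated from a single sample (x_b_i, y_i)
    return x_b_i * (x_b_i.dot(theta) - y_i) * 2

theta = np.zeros(x_b.shape[1])
m = len(x_b)
for t in range(5 * m):                    # a few passes over the data
    i = np.random.randint(m)              # pick one sample at random
    theta = theta - learning_rate(t) * dJ_sgd(theta, x_b[i], y[i])

print(theta)   # should land near [4, 3], just noisier than the batch result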

That is the rough idea; next we implement the algorithm above in Python:

Code:

1. Wrapping the algorithm in a class

# Gradient descent for multiple linear regression, wrapped in a class
import numpy as np
from sklearn.metrics import r2_score


class LinearRegression:

    def __init__(self):
        self.coef_ = None            # feature coefficients
        self.interception_ = None    # intercept term
        self._theta = None           # full parameter vector (intercept + coefficients)

    def fit_normal(self, x_train, y_train):
        """Fit with the closed-form normal equation (no gradient descent)."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        self._theta = np.linalg.inv(x_b.T.dot(x_b)).dot(x_b.T).dot(y_train)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]

    def fit_gd(self, x_train, y_train, eta=0.01, n_iters=1e4):
        """Fit with batch gradient descent."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"

        def loss(theta, x_b, y):
            # mean squared error; report +inf if the value overflows (eta too large)
            try:
                return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)
            except:
                return float("inf")

        def derivative(theta, x_b, y):
            # vectorized gradient of the MSE loss
            return x_b.T.dot(x_b.dot(theta) - y) * 2 / len(x_b)

        def gradient_descent(x_b, y, initial_theta, eta, epsilon=1e-8):
            theta = initial_theta
            i_iters = 0
            while i_iters < n_iters:
                gradient = derivative(theta, x_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                # stop once the loss barely changes between iterations
                if abs(loss(theta, x_b, y) - loss(last_theta, x_b, y)) < epsilon:
                    break
                i_iters += 1
            return theta

        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(x_b.shape[1])
        self._theta = gradient_descent(x_b, y_train, initial_theta, eta, epsilon=1e-8)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_random_gd(self, x_train, y_train, n_iters=5, t0=5, t1=50):
        """Fit with stochastic gradient descent; n_iters is the number of passes over the data."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert n_iters >= 1

        def derivative(theta, x_b_i, y_i):
            # gradient estimated from a single sample
            return x_b_i * (x_b_i.dot(theta) - y_i) * 2

        def random_gradient_descent(x_b, y, initial_theta):
            def learning_rate(t):
                # decaying step size so the iterates settle down over time
                return t0 / (t + t1)

            theta = initial_theta
            m = len(x_b)
            for cur_iter in range(n_iters):
                # shuffle the samples once per pass
                indexes = np.random.permutation(m)
                x_b_new = x_b[indexes]
                y_new = y[indexes]
                for i in range(m):
                    gradient = derivative(theta, x_b_new[i], y_new[i])
                    theta = theta - learning_rate(cur_iter * m + i) * gradient
            return theta

        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(x_b.shape[1])
        self._theta = random_gradient_descent(x_b, y_train, initial_theta)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def predict(self, x_predict):
        assert self.interception_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert x_predict.shape[1] == len(self.coef_), \
            "the feature number of x_predict must match the training data"
        x_b = np.hstack([np.ones((len(x_predict), 1)), x_predict])
        return x_b.dot(self._theta)

    def score(self, x_test, y_test):
        y_predict = self.predict(x_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"

2. Main script:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from LR_GD_class import LinearRegression

# NOTE: load_boston was removed in scikit-learn 1.2, so this script needs an older version
boston = datasets.load_boston()
x = boston.data
y = boston.target
# drop the samples whose target value was capped at 50.0
x = x[y < 50.0]
y = y[y < 50.0]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=666)

# standardize the features so a single learning rate works for all of them
standardScaler = StandardScaler()
standardScaler.fit(x_train)
x_train_stand = standardScaler.transform(x_train)
x_test_stand = standardScaler.transform(x_test)

# batch gradient descent
lin_reg = LinearRegression()
lin_reg.fit_gd(x_train_stand, y_train)
print(lin_reg.score(x_test_stand, y_test))

# stochastic gradient descent
lin_reg1 = LinearRegression()
lin_reg1.fit_random_gd(x_train_stand, y_train, n_iters=50)
print(lin_reg1.score(x_test_stand, y_test))
Original post: https://juejin.cn/post/7096315783583809550

