
Gradient descent principle

(*-*)浩 · Original · 2019-07-09

The three elements of the gradient method idea: starting point, descent direction, and descent step size.


The weight update expression commonly used in machine learning is

W := W − λ·∂L/∂W

where λ is the learning rate. Starting from this formula, this article explains the various "gradient" descent methods in machine learning.
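To make the update rule concrete, here is a minimal Python sketch. The one-dimensional quadratic loss L(W) = (W − 3)² is an illustrative assumption chosen for simplicity, not something from the article:

```python
# Minimal sketch of the update rule W := W - lambda * dL/dW.
# The quadratic loss L(W) = (W - 3)**2 is an illustrative assumption, not from the article.

def grad(w):
    # dL/dW for L(W) = (W - 3)**2
    return 2 * (w - 3)

w = 0.0    # starting point
lr = 0.1   # lambda, the learning rate
for _ in range(50):
    w = w - lr * grad(w)   # the weight update expression

print(w)   # approaches the minimizer W = 3
```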

The objective functions in machine learning are generally convex. What is a convex function?

Due to space limitations, we will not develop that in depth here. Instead, a vivid metaphor: picture the target loss function as a pot, and our task is to find the bottom of the pot. The intuitive idea is to start from some initial point and walk downhill along the direction of the function's gradient at that point (that is, gradient descent). To push the metaphor further, if we think of each move as a displacement, its three complete elements are the step size (how far to move), the direction, and the starting point. The starting point certainly matters and is the key consideration during initialization, but the direction and the step size are what really count: in fact, the differences between the various gradient methods lie precisely in these two.

Suppose the descent direction is the unit vector along the negative gradient,

−(∂L/∂W) / ‖∂L/∂W‖

and the step size is set to a constant Δ. You will then find that when the gradient is large, i.e. far from the optimal solution, W is updated quickly; but when the gradient is small, i.e. close to the optimal solution, W is still updated at the same rate as before. As a result W easily overshoots, moves away from the optimal solution, and then oscillates back and forth around it. Since the gradient is large far from the optimum and small close to it, we let the step size follow this rhythm and replace Δ with λ‖∂L/∂W‖. Multiplying step size and direction, we finally arrive at the formula we are familiar with:

W := W − λ·∂L/∂W

So even though λ itself is a constant, the effective step size λ‖∂L/∂W‖ now changes with how steep or gentle the slope is.
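The effect described above can be seen in a small experiment. This is only an illustrative sketch on an assumed quadratic loss (not from the article): taking a constant step Δ along the unit descent direction leaves W hopping back and forth around the optimum, while replacing Δ with λ‖∂L/∂W‖ gives exactly the familiar update W := W − λ·∂L/∂W, whose effective step shrinks as the slope flattens.

```python
# Compare a constant step size with a gradient-proportional step on L(W) = (W - 3)**2.
# The loss function and the constants below are illustrative assumptions.

def grad(w):
    return 2 * (w - 3)

# 1) Constant step Delta along the unit descent direction: oscillates near the optimum.
w, delta = 0.0, 0.8
for _ in range(20):
    g = grad(w)
    direction = -g / abs(g) if g != 0 else 0.0   # unit descent direction
    w = w + delta * direction                    # fixed step length Delta
print("constant step:", w)   # ends up hopping between 2.4 and 3.2 around W = 3

# 2) Step length lambda * |dL/dW|: equivalent to W := W - lambda * dL/dW.
w, lam = 0.0, 0.1
for _ in range(50):
    w = w - lam * grad(w)                        # effective step shrinks near the optimum
print("gradient-proportional step:", w)          # settles close to W = 3
```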


