
HISTORY OF DEEP LEARNING

Updated: Aug 23, 2021

The history of deep learning can be traced back to 1943, when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain. They used a combination of algorithms and mathematics they called "threshold logic" to mimic the thought process. Since then, deep learning has improved steadily, with only two significant interruptions in its development, both tied to the infamous AI winters.

Henry J. Kelley is credited with developing the basics of a continuous back propagation model in 1960. A simpler version based only on the chain rule was developed by Stuart Dreyfus in 1962. While the concept of back propagation (the backward propagation of errors for training purposes) existed in the early 1960s, it was clumsy and inefficient, and it would not become useful until 1985.

The first efforts to develop deep learning algorithms came in 1965 from Alexey Grigoryevich Ivakhnenko (who developed the Group Method of Data Handling) and Valentin Grigoryevich Lapa (author of Cybernetics and Forecasting Techniques). They used models with polynomial (complicated equation) activation functions, which were then analyzed statistically. From each layer, the statistically best features were selected and passed on to the next layer (a slow, manual process).



In the 1970s, the first AI winter began, the result of promises that could not be kept. The resulting lack of funding limited both deep learning and AI research. Fortunately, there were individuals who carried on the research without funding.

The first "convolutional neural networks" were used by Kunihiko Fukushima. Fukushima designed neural networks with multiple convolutional and pooling layers. In 1979, he developed an artificial neural network called the Neocognitron, which used a hierarchical, multilayered design. This design allowed the computer to "learn" to recognize visual patterns. The networks resembled modern versions, but were trained with a reinforcement strategy of recurring activation in multiple layers, which gained strength over time. Additionally, Fukushima's design allowed important features to be adjusted manually by increasing the "weight" of certain connections.

Many of his concepts continue to be used. The use of top-down connections and new learning methods has allowed a variety of neural networks to be realized. When more than one pattern is presented at the same time, the Selective Attention Model can separate and recognize individual patterns by shifting its attention from one to the other (the same process many of us use when multitasking). A modern Neocognitron can not only identify patterns with missing information (for example, an incomplete number 5), but can also complete the image by adding the missing information. This could be described as "inference."

Back propagation, the use of errors in training deep learning models, evolved significantly in 1970, when Seppo Linnainmaa wrote his master's thesis, which included FORTRAN code for back propagation. Unfortunately, the concept was not applied to neural networks until 1985, when Rumelhart, Williams, and Hinton showed that back propagation in a neural network could provide "interesting" distributed representations. Philosophically, this discovery brought to light within cognitive psychology the question of whether human understanding relies on symbolic logic (computationalism) or on distributed representations (connectionism). In 1989, Yann LeCun provided the first practical demonstration of back propagation at Bell Labs. He combined convolutional neural networks with back propagation to read "handwritten" digits. This system was eventually used to read the numbers on handwritten checks.
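
To make the mechanics concrete, here is a minimal sketch of back propagation applying the chain rule to a tiny one-hidden-layer network with sigmoid activations and a squared-error loss. It is an illustration in NumPy with made-up sizes and values, not code from LeCun's or Rumelhart's systems.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input vector (3 features)
y = np.array([[1.0]])                # target output
W1 = 0.5 * rng.normal(size=(4, 3))   # hidden-layer weights (4 hidden units)
W2 = 0.5 * rng.normal(size=(1, 4))   # output-layer weights

# Forward pass
h = sigmoid(W1 @ x)                  # hidden activations
y_hat = sigmoid(W2 @ h)              # network output
loss = 0.5 * float((y_hat - y) ** 2)
print("loss before update:", loss)

# Backward pass: propagate the error layer by layer using the chain rule
delta2 = (y_hat - y) * y_hat * (1 - y_hat)  # error at the output's pre-activation
grad_W2 = delta2 @ h.T                      # gradient of the loss w.r.t. W2
delta1 = (W2.T @ delta2) * h * (1 - h)      # error passed back to the hidden layer
grad_W1 = delta1 @ x.T                      # gradient of the loss w.r.t. W1

# One gradient-descent step
learning_rate = 0.5
W2 -= learning_rate * grad_W2
W1 -= learning_rate * grad_W1
```

Repeating the forward and backward passes over many labeled examples is, in essence, what it means to "train" a network with back propagation.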

This period also saw the second AI winter (1985 to the early 1990s), which likewise affected research on neural networks and deep learning. Various overly optimistic individuals had overestimated the "immediate" potential of Artificial Intelligence, breaking expectations and angering investors. The anger was so intense that the phrase Artificial Intelligence reached pseudoscience status. Fortunately, some people continued to work on AI and deep learning, and significant progress was made. In 1995, Corinna Cortes and Vladimir Vapnik developed the support vector machine (a system for mapping and recognizing similar data). LSTM (long short-term memory) for recurrent neural networks was developed by Sepp Hochreiter and Jürgen Schmidhuber in 1997.



The next important evolutionary step for deep learning came in 1999, when computers started to become faster at processing data and GPUs (graphics processing units) were developed. Faster processing, with GPUs processing images, increased computational speeds 1,000-fold over a 10-year span. During this time, neural networks began to compete with support vector machines. While a neural network could be slow compared to a support vector machine, neural networks offered better results using the same data. Neural networks also have the advantage of continuing to improve as more training data is added.

Around the year 2000, the vanishing gradient problem appeared. It was discovered that the "features" (lessons) formed in the lower layers were not being learned by the upper layers, because no learning signal reached those layers. This was not a fundamental problem for all neural networks, only for those using gradient-based learning methods. The source of the problem turned out to be certain activation functions. A number of activation functions condensed their input, in turn reducing the output range in a somewhat chaotic fashion. This produced large areas of input mapped over an extremely small output range. In those areas, a large change in the input is reduced to a small change in the output, resulting in a vanishing gradient. The two solutions used to address this problem were layer-by-layer pre-training and the development of long short-term memory.
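
A small illustrative sketch of the effect (the depth, weights, and input below are assumptions, not taken from any particular study): each sigmoid layer contributes a derivative factor of at most 0.25 times its weight, so with modest weights the product of those factors, and hence the gradient reaching the lower layers, shrinks rapidly with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
depth = 30        # number of stacked sigmoid layers (illustrative)
x = 0.5           # a scalar input
grad = 1.0        # running product of per-layer derivatives (the backward-pass factor)

for _ in range(depth):
    w = rng.normal()
    a = sigmoid(w * x)
    # Each layer multiplies the gradient by sigmoid'(w*x) * w = a * (1 - a) * w,
    # which is small because the sigmoid derivative never exceeds 0.25.
    grad *= a * (1 - a) * w
    x = a

print(abs(grad))  # typically shrinks toward zero as depth grows
```

This is also part of why the rectified linear units mentioned below helped: their derivative is 1 for positive inputs, so they do not repeatedly squash the error signal.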

A 2001 research report by the META Group (now called Gartner) described the challenges and opportunities of data growth as three-dimensional: increasing data volume, increasing data velocity, and an increasing range of data sources and types. It was a call to prepare for the oncoming onslaught of big data.

In 2009, Fei-Fei Li, an artificial intelligence professor at Stanford, launched ImageNet, a free database of more than 14 million labeled images. The internet is, and was, full of unlabeled images, but labeled images were needed to "train" neural networks. Professor Li said: "Our vision was that big data would change the way machine learning works. Data drives learning."

By 2011, the speed of GPUs had increased significantly, making it possible to train convolutional neural networks "without" layer-by-layer pre-training. With the increased speed of computing, deep learning turned out to have significant advantages in efficiency and speed. One example is AlexNet, a convolutional neural network whose architecture won several international competitions in 2011 and 2012. Rectified linear units were used to increase speed, along with dropout.

Also in 2012, Google Brain released the results of an unusual project known as the Cat Experiment. The free-spirited project explored the difficulties of "unsupervised learning." Deep learning normally uses "supervised learning," meaning the convolutional neural network is trained using labeled data (think images from ImageNet). With unsupervised learning, a convolutional neural network is given unlabeled data and is then asked to seek out recurring patterns.



The Cat Experiment used a neural network spread across 1,000 computers. Ten million "unlabeled" images were taken randomly from YouTube and shown to the system, and the training software was then allowed to run. At the end of the training, one neuron in the highest layer was found to respond strongly to images of cats. "We also found a neuron that responds very strongly to human faces," said Andrew Ng, the project's founder. Unsupervised learning remains an important goal in the field of deep learning.

The Cat Experiment works about 70% better than its predecessors at processing unlabeled images. However, it recognized less than 16% of the objects used for training, and it did even worse with objects that were rotated or moved.

Currently, the processing of Big Data and the evolution of artificial intelligence depend on Deep Learning. Deep learning is still developing and needs creative ideas.
