R-GB: Do you claim that the assumption of mathematics and other sciences that there are very few and simple rules that govern the world is wrong?
V-V: I believe that it is wrong. As I mentioned before, the (low-dimensional) problem "forest" has a perfect solution, but it is not simple and you cannot obtain this solution using 15,000 examples.
R-GB: Maybe it is because learning from examples is too limited?
V-V: It is limited, but it is not too limited. I want to stress another point: you can get a very good decision rule, but it is a very complicated function. It can be like a fractal. I believe that in many cases we have these kinds of decision rules. But nonetheless, we can make empirical inferences. In many cases, to make empirical inference, we do not need to have a general decision rule; we can do it using different techniques. That is why empirical inference is an interesting problem. It is not a function estimation problem that has been known in statistics since the time of Gauss. Now a very exciting time has come when people try to find new ways to attack complex worlds. These ways are based on a new philosophy of inference.
In classical philosophy there are two principles that explain the generalization phenomenon. One is Occam's razor and the other is Popper's falsifiability. It turns out that, using machine learning arguments, one can show that neither of them is very good and that one can generalize while violating these principles. There are other justifications for inference.
R-GB: What are the main challenges that machine learning should address?
V-V: The main challenge is to attack complex worlds.
R-GB: What do you think are the main accomplishments of machine learning?
V-V: First of all, machine learning has had a tremendous influence both on modern intelligent technology and on modern methods of inference. Ten years ago, when statisticians did not buy our arguments, they did not do very well in solving high-dimensional problems. They introduced some heuristics, but these did not work well. Now they have adopted the ideas of statistical learning theory, and this is an important achievement.
Machine learning theory started in the early 1960s with the Perceptron of Rosenblatt and the Novikoff theorem about the Perceptron algorithm. The development of these works led to the construction of a learning theory. In 1963, Alexey Chervonenkis and I introduced an algorithm for pattern recognition based on the optimal hyperplane. We proved the consistency of this algorithm using uniform convergence arguments and got a bound for its accuracy. Generalization of these results led to the VC theory.
From a conceptual point of view, the most important part of VC theory is the necessary and sufficient conditions for learnability, not just sufficient conditions (bounds). These conditions are based on capacity concepts. There are three capacity measures: the first is entropy, the second is the growth function, and the last is the VC dimension. The VC dimension is the crudest description of capacity. The best measure is the VC entropy, which is different from the classical entropy. The necessary and sufficient condition for a given probability measure states that the ratio of the entropy to the number of examples must go to zero. What happens if it goes to a value a which is not zero? Then one can prove that there exists in the space X a subspace X0 with probability measure a, such that the subset of training vectors that belong to this subspace can be separated in all possible ways. This means that you cannot generalize. This also means that if you have to choose a good function from an admissible set of functions, you cannot avoid VC-type reasoning.
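[Editorial illustration, not part of the interview.] The condition that the ratio of the entropy to the number of examples must go to zero can be made concrete with a small sketch. It assumes two toy function classes that do not appear in the interview: threshold rules on the real line, and a hypothetical "memorizer" class that can realize every labeling of distinct points. The function names are illustrative. The script counts the labelings (dichotomies) each class induces on a sample and divides the logarithm of that count by the sample size. For distinct points the count does not depend on the particular sample, so here it stands in for the expectation used in the definition of the VC entropy.

```python
import math


def count_threshold_dichotomies(points):
    """Count distinct labelings of `points` produced by rules x -> [x > t]."""
    xs = sorted(set(points))
    # One candidate threshold below all points, one between each pair of
    # neighbours, and one above all points covers every distinct labeling.
    cuts = [xs[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    cuts += [xs[-1] + 1.0]
    return len({tuple(int(x > t) for x in points) for t in cuts})


def count_memorizer_dichotomies(points):
    """Hypothetical class that realizes every labeling of distinct points."""
    return 2 ** len(set(points))


def capacity_ratio(n_dichotomies, n_examples):
    """ln(number of dichotomies) divided by the number of examples."""
    return math.log(n_dichotomies) / n_examples


if __name__ == "__main__":
    for l in (10, 100, 1000):
        sample = [i / l for i in range(l)]  # l distinct points in [0, 1)
        r_thr = capacity_ratio(count_threshold_dichotomies(sample), l)
        r_mem = capacity_ratio(count_memorizer_dichotomies(sample), l)
        print(l, round(r_thr, 4), round(r_mem, 4))
```

Under these assumptions the threshold class yields l + 1 dichotomies on l distinct points, so its ratio shrinks as l grows, while the memorizer class yields 2^l dichotomies and its ratio stays at ln 2. The second case is the situation described above, in which the training vectors can be separated in all possible ways and generalization is not possible.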
R-GB: What do you think about the bounds on uniform convergence? Are they as good as we can expect them to be?
V-V: They are O.K. However, the main problem is not the bound. There are conceptual questions and technical questions. From a conceptual point of view, you cannot avoid uniform convergence arguments; it is a necessity. One can try to improve the bounds, but that is a technical problem. My concern is that machine learning is not only about technical things; it is also about philosophy: what is the science of the complex world about? Improving the bound is an extremely interesting problem from a mathematical point of view. But even if you get a better bound, it will not help to attack the main problem: what to do in complex worlds?