2. Introduction

2.1. References to start with

We will not follow one text-book. However, it will be good to have a couple of standard references to fall back on for a literature search:

  • Two such standard references on machine learning are [russell_artificial_2010] and [bishop_pattern_2007];
  • Another reference on this topic with a more mathematical focus is [mohri_foundations_2012].

Furthermore, here are a couple of references which will be helpful for the Python implementations we are about to write during this course:

  • Official Python 3 documentation [python_development_team_python_2016]. Have a look at the Tutorial chapter.

  • The Scipy lecture notes [scipy_development_team_scipy_2016] include tutorials on scientific use of Python. The two Python modules will use most often are:

    • Numpy: Its documentation can be found here [numpy_development_team_numpy_2016].
    • Matplotlib: Its documentation can be found here [matplotlib_development_team_matplotlib_2016]. A good starting point to explore its features is to take a look at the gallery of examples.
Homework
  1. If you haven’t already done so, install a Python development environment. Depending on what operating system you run, there are several options:

    • All platforms: Anaconda provide a quite painless way to install everything you need for a first start;
    • MacOS: Python 2 is already installed. However, to install Python 3, IPython, and for easier installation of additional modules with pip, I recommend to either also install Anaconda or use Homebrew;
    • Linux: Python 2 and 3 should be installed by default. Install additional modules with pip.
  2. If you are not already expert enough in using the modules Numpy and Matplotlib, attend to the references above and learn their basic usage.

2.2. Vocabulary and Overview

Before we dive into the topic let us settle for a common vocabulary and get an idea about what we would like to achieve with our study. To put machine learning in perspective we shall start very broadly with a discussion of artificial intelligence taken from the very recommendable introduction in [russell_artificial_2010] and afterwards put machine learning and the aims of this lecture in perspective:

  • What is artificial intelligence (AI)?

    • For thousands of years, we have tried to understand how we think;

    • AI even attempts to go a step further:

      • it attempts not just to understand but also to build intelligent entities.
    • AI is one of the newest fields in science and engineering (started after World War II);

    • Possible definitions:

      • Acting humanly:

        “The art of creating machines that perform functions that require intelligence when performed by people.”

        —Kurzweil, 1990

        “The study of how to make computers do things at which, at the moment, people are better.”

        —Rich and Knight, 1991

      • Thinking humanly:

        “The exciting new effort to make computers think […] machines with minds, in the full and literal sense.”

        —Haugeland, 1985

        “[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning […]”

        —Bellman, 1978

      • Acting rationally:

        “Computational Intelligence is the study of the design of intelligent agents.”

        —Poole et al., 1998

        “AI […] is concerned with intelligent behavior in artifacts.”

        —Nilsson, 1998

      • Thinking rationally:

        “The study of mental faculties through the use of computational models.”

        —Charniak and McDermott, 1985

        “The study of the computations that make it possible to perceive, reason, and act.”

        —Winston, 1992

    • Acting humanly: The Turing test approach

      • In 1950 Turning [turing_i.computing_1950] devised a test to provide a satisfactory operational definition of intelligence;

      • A computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer;

      • The computer needs to posses the following features:

        • natural language processing: to communicate in, e.g., English
        • knowledge representation: to store the information
        • automated reasoning: to use the stored information, to answer question, and to draw conclusions
        • machine learning: to adapt to new circumstances, and extrapolate and detect patterns
      • The Turing test remains relevant even today but less from the engineering and more from the philosophical stance, i.e., in the following spirit (taken from [russell_artificial_2010]):

        • The quest for “artificial flight” succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics.
        • Aeronautical engineering texts do not define the goal of their field as making “machines that fly so exactly like pigeons that they can fool even other pigeons.”

      Homework

      Read Turning’s paper [turing_i.computing_1950] and put it in context with the field of artificial intelligence.

    • Thinking humanly: The cognitive modeling approach

      • To tell whether a program “thinks like a human” we need to learn what humanely thinking is:

        • Observe our thoughts as the go by;
        • Observe persons during a task
        • Psychological / Neuroscientific experiments
      • For our endeavor, however, it will be good practice to keep the fields such as cognitive science, neuroscience, psychology and philosophy separated as long as it is possible.

    • Thinking rationally: The “laws of thought” approach

      • Use of logical reasoning and argumentation, for example:

        • deduction: A general rule applied to a particular case implies a trivial result.

          Input Implication Example
          RULE   On a planet its sun rises every day.
          CASE   We are on a planet.
          RESULT The sun rose every day.

          The realm of mathematics.

        • induction: From a trivial result in a particular case we hope to infer the general rule.

          Input Implication Example
          RESULT   The sun rose every day.
          CASE   We are on a planet.
          RULE On a planet its sun rises every day.

          The realm of science.

        • abduction: From a general rule and a trivial result we hope to infer the particular case.

          Input Implication Example
          RULE   On a planet its sun rises every day.
          RESULT   The sun rose every day.
          CASE We are on a planet.

          More seldomly used.

      Homework

      1. Discuss which type of reasoning can most readily be learned by a computer.

      2. Discuss why the other two examples of reasonings are more difficult to implement.

      3. Observe yourself in discussions:

        1. What type of arguments do you use? Analogies, examples, experience, interpretations, faith, etc.

        2. Benchmark them w.r.t. deduction and come clean with our dilemma:

          • Read Plato’s Apology 21a-d:

          “‘When he was forty, there came a curious but crucial episode which changed Socrates’ whole life. What happened shall be told in the words which, by Plato’s account, he himself used at his trial [by which time Socrates was 70 years old (Apology 17d)]. ‘Everyone here, I think, knows Chaerephon,’ he said, ‘he has been a friend of mine since we were boys together; and he is a friend of many of you too. So you know the eager impetuous fellow he is. Well, one day he went to Delphi, and there he had the impudence to put this question – do not jeer, gentlemen, at what I am going to say – he asked, ‘Is anyone wiser than Socrates?’ And the Pythian priestess answered, ‘No one.’ Well, I was fully aware that I knew absolutely nothing. So what could the god mean? for gods cannot tell lies. For some time I was frankly puzzled to get at his meaning; but at last I embarked on my quest. I went to a man with a high reputation for wisdom – I would rather not mention his name; he was one of the politicians – and after some talk together it began to dawn on me that, wise as everyone thought him and wise as he thought himself, he was not really wise at all. I tried to point this out to him, but then he turned nasty, and so did others who were listening; so I went away, but with this reflection that anyhow I was wiser than this man; for, though in all probability neither of us knows anything, he thought he did when he did not, whereas I neither knew anything nor imagined I did.’”

          —Plato, Apology 21a-d, Translation by C.E. Robinson

        3. Argue then how we can even learn.

          • Read Meno by Socrates at least starting from this passage:

          Meno: And how will you enquire, Socrates, into that which you do not know? What will you put forth as the subject of enquiry? And if you find what you want, how will you ever know that this is the thing which you did not know?

          Socrates: I know, Meno, what you mean; but just see what a tiresome dispute you are introducing. You argue that man cannot enquire either about that which he knows, or about that which he does not know; for if he knows, he has no need to enquire; and if not, he cannot; for he does not know the, very subject about which he is to enquire.”

          —Plato, Meno, Translation by B. Jowett

    • Acting rationally: The rational agent approach

      • Create agents (e.g., computer progams) that operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals.

      • Making correct inferences (as in the “laws of thought” approach) is the extreme case of being a rational agent.

      • In many situations, however, correct inferences are not possible, e.g.:

        • insufficient understanding of the environment;
        • not enough input data to base a decission on.
      • A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.

      We will go down this road:

      • The standard of rationality is mathematically well defined and completely general.

      • We may exploit this, spell out specific designs, and check how they perform in certain environments. For instance:

        • The “probably approximatly correct (PAC)” framework
      • Human behavior, on the other hand, is well adapted for one specific environment and is defined by, well, the sum total of all the things that humans do.

    Homework

    In the introducion of [russell_artificial_2010] the following questions are put forward:

    1. “Surely computers cannot be intelligent—they can do only what their programmers tell them.” Is the latter statement true, and does it imply the former?
    2. Surely animals cannot be intelligent—they can do only what their genes tell them.” Is the latter statement true, and does it imply the former?
    3. “Surely animals, humans, and computers cannot be intelligent—they can do only what their constituent atoms are told to do by the laws of physics.” Is the latter statement true, and does it imply the former?

    What is your opinion?

  • Influences:

    • Philosophy:

      • Can formal rules be used to draw valid conclusions?
      • How does the mind arise from a physical brain?
      • Where does knowledge come from?
      • How does knowledge lead to action?
    • Mathematics:

      • What are the formal rules to draw valid conclusions?
      • What can be computed?
      • How do we reason with uncertain information?
    • Neuroscience:

      • How do brains process information?
    • Psychology

      • How do humans and animals think and act?
    • Computer engineering

      • How can we build an efficient computer – the artifact that we want to charge with intelligence?
    • Control theory and cybernetics:

      • How can artifacts operate under their own control?
    • Linguistics:

      • How does language relate to thought?
    • And finally economics:

      • How should we make decisions so as to maximize payoff?
      • How should we do this when others may not go along?
      • How should we do this when the payoff may be far in the future?
  • A breakdown of historical periods:

    • 1943–1955: The gestation of artificial intelligence

      • Model of artificial neurons by Warren McCulloch and Walter Pitts in 1943 – see [mcculloch_logical_1943];
      • Turning gave lectures on AI as soon as 1947.
    • 1956: The birth of artificial intelligence

      • John McCarthy convinced Marvin Minsky, Claude Shannon, and Nathaniel Rochester to help him bring together U.S. researchers interested in automata theory, neural nets, and the study of intelligence at a two-month workshop in Dartmouth.
    • 1952–1969: Early enthusiasm, great expectations

      • First problem solvers, game players, theorem provers;
      • John McCarthy referred to this period as the “Look, Ma, no hands!” era;
      • Creation of LISP;
      • Perceptron by Frank Rosenblatt in 1958 [rosenblatt_perceptron:_1958];
      • Adalines (adaptive linear neuron) by Bernie Widrow and Marcian Hoff in 1960 [widrow_adapting_1960];
    • 1966–1973: A dose of reality

      • Try and error – combinatorial explosion;
      • Lack of computational resources.
    • 1969–1979: Knowledge-based systems: The key to power?

      • Algorithms using of domain-specific knowledge instead of general-purpose solvers;
      • Expert systems for medical diagnosis;
      • Incorporation of uncertainty.
    • 1980–present: AI becomes an industry

      • Optimization of logistics;
      • Sudden boom but only few projects lived up to their expectations;
      • AI winter.
    • 1986–present: The return of neural networks

      • The back-propagation algorithm for training neural networks was reinvented in [rumelhart_learning_1986];
    • 1987–present: AI adopts the scientific method

      • Hidden Markov models;
      • Bayesian networks;
    • 1995–present:

      • The Internet pushes the development of intelligent (?) agents, e.g.:

        • chatbots
        • recommender systems
        • aggregates
      • Access to computation resources at sufficient speed.

      • The big data age: Huge amount of labeled training data available, e.g.:

        • Dictionaries
        • Word corpora on different topics
        • Wordnets
        • Wikipedia
        • Google
      • Founders of AI discontent with current state:

        • AI should return to its roots of striving for, in Herbert Simon’s words, “machines that think, that learn and that create.”
      • State of the art – some examples:

        • Spam fighting: Most adaptation done my machine learning algorithms
        • Speech recognition: Siri,
        • Face recognition: Facebook, Apple Photos, Google Photos
        • Game playing: IBM’s deep blue chess player against world champion Garry Kasparov
        • Autonomous planning and scheduling: NASA’s mars rover
        • Robotic vehicles: Tesla’s self-driven car
        • Machine Translation: Google Translate

      Homework

      1. Discuss the difference between understanding and knowing – take as an example the repeating phenomena of the sun rise discussed above.
      2. From this perspective, discuss why big data is certainly a great resource to have to advance the field of AI but by itself will most likely disappoint us – take for example the human genom.

Note

2016-10-19

Now that we have an overview where about we are, let us discuss the direction of our study. Also we might disappoint the founders such as Simon as we will focus solely on machine learning. However, in our defense we may claim that in whichever direction AI might develop, machine learning will at least be a extremely important stepping stone if not even stay an integral part in the field:

  • What is machine learning (ML)?

    • ML is a subfield of AI:

      “[Machine learning] gives computers the ability to learn without being explicitly programmed”

      —Samuel, 1959

      Put differently, one seeks “soft” algorithms which to some extend can adapt themselves to a certain type of task instead of consisting merely of hard-coded logic.

    • Dealing with large amounts of data:

      • structuring data
      • finding correlation
      • classification of data
      • pattern recognition
      • data compression
      • data driven decision
      • adaptation of tasks to data
      • extrapolation / prediction
    • Types:

      1. Supervised learning

        “Soft” algorithms which are supposed to infer the designated task by inspection of appropriate training data.

        Scheme of supervised learning.

        Examples:

        • Classification: prediction of discreet classes, e.g.:

        • Regression: prediction of continuous parameters, e.g.:

          • energy consumption according to learned user behavior
          • prediction of a trend according to a given history
      2. Unsupervised learning

        Structuring data into clusters without detailed prior knowledge.

        Example of cluster analysis.

        Ficticious example of 2d data points. The color indicates a relation between the data points. From these relations the shaded regions may be inferred by an unsupervised machine learning algorithm. This may be useful when looking for coarse, structural properties of a datat set. (source).

        Examples:

        • Wordnets: Relationships between words of a natural language;
        • Cross-references between documents;
        • Data compression and dimensionality reduction.
      3. Reinforcement learning

        An agent (machine learning program + artifact) learns to fulfill a certain task by, e.g., trial and error. Learning is facilitated by the ability to observe the environment and receive feedback depending on the actions.

        Reinforcement learning scheme.

        Examples:

        • Movement of a robot in unknown terrain or under varying conditions;
        • Getting high-scores in Atari games like Google Deepmind [mnih_human-level_2015].

Our main focus in this short course will lie on supervised learning using neural networks.