On Fourth Generation Adaptive Networks
From StoneHome
Generational separation has proven a useful, if relatively arbitrary, way to categorize approaches and complexities involved in a task. Probably the best known generational metaphor is that pertaining to programming languages, eventually leading to the well-known if badly understood acronym <acronym title="Fourth Generation Language">4GL</acronym>, indicating a language with a strong awareness of and integration with a data hosting system external to the software (typically a database). I believe that applying a similar perspective to the state of the art in AI development could provide significant procedural insight into a field which is currently driven mostly by the work of a few brilliant individuals, rather than a large group of good-enough engineers. As anyone familiar with the distinction can tell you, this leads to a field with incredible insight, but which isn't very rigorous, whose documentation leaves much to speculation, and whose design criteria are understood murkily at best. Nearly every capital-g good language started out this way; I'm certainly not criticizing the approach. Large groups of good-enough engineers generally do not pioneer fields.
That said, it is time for these tools to come to the masses, and in a Big Way ™. It is my belief that this should have occurred in the mid-80s; the field of NNs and other primitive AI technologies was reasonably well understood by then, and end-user commodity computing was powerful enough to get real use from such tools. Indeed, a number of companies did start with the intention of providing NNs, Expert Systems and other AI tools to end-users, some seeing their clientele as programmers, others as the average Joe. None of these businesses took off in a big way; even in industry, where many AI tools have become Big Business, AI is virtually always specialized, developed in-house, and absolutely proprietary (generally it's also absolutely locked to a single view of data, impeding the application of later insights without reformulation, an expensive, difficult and fallible process).
This is ridiculous.
So What Does Captain Rants-a-Lot Want?
There was a time when this was also quite common in software development (and it still is, in the case of some programmers who are either too stubborn to modernize or who doubt the benefits of newer techniques). Using C/C++ as an example, before the provision of classes, a person wasn't a Real Programmer unless they'd reinvented the map and the doubly linked list and the red-black tree a half dozen times each. With classes, most major compiler vendors began providing basic behavioral libraries including a string, a stack, a vector, a queue, lists both singly and doubly linked, and if you were lucky, possibly also a map, a set, one or a few kinds of tree, et cetera. Implementations varied by vendor, and suddenly one wasn't a Real Programmer unless one knew the benefits and detriments of each compiler's library, and was usage-familiar with at least the major cases. As the STL came into play and compilers began to conform to a real standard for this behavior, the definition of Real Programmer shifted to people intensely familiar with the standard, able to design software which would do things The Good Way ™ on a variety of platforms, relying on the implementation of the STL to conform to certain rules.
It is my belief that the current state of AI Research is in that first phase: cowboy superheroes who can implement complex internal structures and algorithms. Brooks observes that most programming paradigms have a complexity ceiling; he suggests that the complexity ceiling for assembly tends to work out to about 60-80kLoc of assembly. He suggests that the move to higher-level languages and higher-level structures allows more complex software by burying complexity at the cost of a bit of overhead. Inheritance, multiple inheritance, class encapsulation, template specialization and RTTI are each good examples of this; each has its own unique cost style, and each can bring new approaches to software which otherwise wouldn't be possible.
It is common for some programmers to suggest that complex language-level structural tools invite bloat, or that they're hideously expensive compared to a local implementation. Whereas specialization can bring gains, occasionally significant gains, those gains are generally far smaller than believed; people unfamiliar with standard tools frequently make significant usage errors which badly hurt efficiency and, unaware of their mistake, blame the tools for the problem. Excepting intensely used code, such as kernel code, interrupt handlers, database cores, 3d cores and sound synthesizers, where single cycle improvements actually matter, the amount of time spent on specialization is generally
Haters
Also frequent is the suggestion that they're valueless based on a Turing-equivalence or an all-languages-are-eventually-machine-code basis, both of which are relatively asinine; simply because one can use a simpler system to implement something more complex doesn't mean that the more complex thing, as a child of the simpler thing, is inherently valueless. A frequent alternate view of the machine code argument is that all things in C++ can be written in assembly. That you can implement polymorphism in a language which doesn't already have it doesn't mean that to have it at language level is useless. Arguably, having language-level support is far preferable, as it provides a single mechanism implementation which is compatible, well-tested, well-understood and efficient. Single programmers rarely approach the sophistication of a rudimentary C++ implementation, let alone that of a refined compiler. Furthermore, a compiler can recognize and optimize a language-level mechanism as such, whereas a hand-coded imitation generally looks like any other code to the compiler and gets no such treatment.
Much more telling than these opinions on design, though, is the widespread and profound success of STL-based tools. Given the bulk of newer code wholly reliant on the STL, and the number of users who frequently claim that their work would be beyond them without the STL, it is reasonable to suggest in the absence of better criteria that the STL is a Good Thing. Another such indicator is the STL's wide adoption despite utter lack of advertising or large-vendor promotion. Whereas many technologies' successes are debatably attributable to employers' vulnerability to marketing and general lack of time or resources for appropriate technology research, I remain unaware of such an argument which can be made for the STL (though I'm open to non-silly suggestions). Arguably, given the competition and its vigorous promotion - notably MFC, ATL and .NET - I find it quite interesting that by most accounts the STL is the de facto C++ toolkit. That said, there is also the argument that C/C++ programmers are more strongly focussed on portability than those of most other languages, which would give the STL, arguably the only major well-defined well-implemented portable structural toolkit, a major edge. Still, even if the portability issue is what drives the STL, then it can be argued that the STL is still the successful toolkit because of its focus on an area which other toolkits have a vested interest in ignoring, and that leads to the possible belief that portability is crucial. So, what would happen if we had something STL-ish for AI?
C++ tools? Wasn't this about AI?
The basis of the STL is absolute and concrete definitions. STL-focussed programmers frequently hold The C++ Standard as if it were a religious book, and many (myself included) can quote stretches of it with section number much in the way that priests quote psalms, lawyers precedents and engineers physical constants and formulae. Programmers are famous for "holy wars," a term reflecting the intensity with which we hold and debate beliefs, even those which are well understood as opinions rather than fact. Case in point, it is frequent for bitter, weeks-long arguments to erupt between adherents of the benefits of certain text editors or certain online games over their competition, occasionally leaving long-held grudges in their wake. Whereas some attribute this to John Gabriel's Greater Internet Fuckwad Theory, others see it as a social (and socially inept) evolutionary process for tool replacement. Many older text editors, once defended as if temples, have now fallen by the wayside; indeed, of the early set, only vi and emacs seem to maintain live descendants.
That said, some things are absolute. Just as various branches of core religious groups debate semantics and interpretations of archaic texts, some things are left crystal clear. For example, to my knowledge there is no branch of the Christian church which debates the interpretations or respect of the Ten Commandments, heresies and debates about Satanism as a descendant of Christianity aside. The same can be said for certain core principles of most religions. In the group of C++ programmers, that absolute and undebatable set of definitions is exceptionally large, well-understood and thoroughly defined. The ISO C++ standard is one of the most detailed and specific CS documents I've ever seen. The standard is therefore an exceptionally powerful tool: a programmer familiar with it can rely on any of a huge list of things being true. POSIX extensions provide another such tool. Standard libraries provide another such tool. In many ways, the standard, the standard library, and very-well-defined widely-implemented language extensions like the POSIX C++ extensions can be seen in terms of an interface, from the OOP point of view. The programmer knows they're there, knows their significance, can expect certain things of them, and doesn't really care about their guts.
Yes, a Real Programmer can write a red-black tree, and with time tune it to a specific case and beat an STL hash. But, really, why bother? In almost every real-world case the efficiency benefit enjoyed by such a tool is fantastically small and cannot justify the programmer's time, effort, or the bugs that come with new code. To wit, Microsoft, which has almost certainly the largest proprietary toolkit and arguably the largest set of frequent-use specialty applications for containers, still uses the STL extensively, even in time-important code. The wise programmer learns from Microsoft's money decisions, and both time and effort are money.
Some Definitions
I am largely uncomfortable with the terminology I have learned from the books which I've read. It's generally speculative, ill-defined, and in contrast with other books. Unfortunately, they've also taken most of the good words. I will not burden you, the reader, with the need to separate general-use terms from local-use terms; therefore, some of the terminology I use is driven away from what seems like the sensible word due to prior differing usage by other writers. I will add other definitions as I go, but three are immediately critical.
That said, I think that a stronger toolkit for describing AI techniques and tactics would benefit the field immensely. Much as Design Patterns (hereafter referred to by its common nickname - Gang of Four, or GoF) immediately changed the way I thought about software and design, I believe a strong method of concrete discussion within AI would allow stronger and clearer communication, documentation, and therefore faster and more accurate design, better understanding of results, and therefore faster (and quite possibly broader) progress.
Adaptive Network
As a pedant, I object to the term "neural network," on the basis that these tools bear no more similarity to an actual neural system than a metaphor built on a hundred-year-old and relatively obsolete view of the mechanics of the brain, and as the common design of the networks involves a fantastically different approach to connectivity. It can be argued that the brain is extremely modular, partially redundant and able to change its interconnections (such as in the case of stroke victims relearning behaviors) and therefore far more similar to dynamic object-oriented systems.
I am far more comfortable with the largely forgotten engineering term "massively parallel adaptive filter," as these network tools are literally mathematical filters, are generally honed through an adaptation by evaluation reaction system (be it deterministic, random, heuristic, hybrid or otherwise,) are parallel, and in non-trivial cases can become quite massive. Unfortunately, the term is both cumbersome and generally unfamiliar, and the acronym is kind of silly. I'll settle for the relatively cleaner "adaptive networks," because 'network' is a reasonable term for the layout and is well understood, and because 'adaptive' in my opinion is a far better description of the network's actual mechanisms than 'neural' is. I'm discarding the term 'reactive' both because it's cumbersome and because it's (rarely) not the case.
Selection
I find the terms "learning," "training" and "evolution" disingenuous and misleading.
Not Learning
Real learning, provided sufficient data, isn't vulnerable to local minima and maxima - in fact, real learning frequently helps surmount local optima by analogy, exception, elimination, inference and occasionally by lit curiosity or stimulated intuition. Learning denotes that some understanding of a system is coming about: that attributes, characteristics and mechanisms of the target are coming to light, and that expectations of what will happen given a new untested situation are being formed. It can be argued therefore that early animist religions - excepting, of course, the potential sole correct religion - are a form of learning; they helped us categorize and provide early, rough models for understanding the world around us. No, sickness isn't the result of demon possession, but yes, sickness does come from the places that demons of sickness were believed to have lived: cesspools, primitive mortuaries, corpses and carcasses, contaminated rivers believed to be evil, faeces, and people thought to be unvirtuous (and therefore a carrier for demons) which were actually carriers of embattled diseases (tuberculosis, giardia, typhoid and two of the three diseases underlying bubonic plague frequently make carriers rather than fatalities).
It is also the case that learning can be unrolled when it's found to be incorrect. As our beliefs regarding disease grew more complex and provided closer to accurate results, individuals began to suspect the truth - infestation by tiny competitive organisms. As our beliefs regarding physics grew stronger, we came to reject first terracentrism then heliocentrism, and are now teetering on rejecting the notions that ours are the only dimensions, universes, and even directions through time. This is coming about because of an actual understanding of the underlying data, combined with the careful inspection of the ramifications of errors between the current conceptual model and experimental data. A neural network does not suddenly make the leap of intuition that provides for paradigm shifts like Newtonian mechanics, relativity, quantum mechanics, superstring theory, n-brane topology and whatever's coming next.
Not Training
Training is a similar, though more difficult to unravel, problem. Training implies that a person is taking the time to coax a trainee to behave in a specific fashion. That's not what's actually happening with neural networks, even when dealing with networks which have been created in a fashion to conform output to example data with known results. There isn't a fundamental understanding of what's going on; a network simply gets weighted through some progression to a series of equations which model the outputs for the inputs. It's important to understand that modelling is not training. Training is based on actual understanding; modelling is simply finding another, well-defined system which matches what is known about the abstract system. To that end, any transformation on the behavior which involves actual understanding results in failure; for example, with a person trained to avoid red walls in a maze and a network which has been stimulated to find the lowest cost traversal of a path where red borders are expensive and there's a guaranteed 0-cost solution, discounting human error you should get the same answer from each, given a problem with exactly one correct answer.
However, given an answer which could only have been caused by one problem - say, "tell me where the red walls are given that the path taken was this" - the network is utterly useless. It cannot even be adapted to the task with an extra piece of code, unless one proposes brute-force searching through the network's responses, at which point the utility and efficiency of the network are utterly lost and the network becomes a liability. Given a not-terribly-difficult problem, the human can usually work it out. This is the difference between training and what a neural network really does: training, an observation of models, behavior, and the right and wrong things to do, can be adapted by breaking up criteria and re-applying them elsewhere; model equations, which are simply blind observations of coefficients which happen to map data, cannot.
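The asymmetry can be made concrete with a toy. This is a minimal sketch under loud assumptions: `forward` is a hypothetical stand-in for an oriented network (a real one would be a pile of weights, not a one-liner), and the maze is reduced to a row of junctions, each with a red wall on either the left or the right. The only way to ask the backwards question is to ask the forwards question about every possible layout until one matches.

```python
from itertools import product


def forward(red_on_left):
    """Hypothetical stand-in for an oriented network: wall layout in, path out.

    At each junction the path goes right ("R") if the red wall is on the
    left, and left ("L") otherwise.
    """
    return tuple("R" if red else "L" for red in red_on_left)


def invert_by_search(path):
    """Recover the layout by running the forward model on every candidate."""
    tried = 0
    for layout in product((False, True), repeat=len(path)):
        tried += 1
        if forward(layout) == path:
            return layout, tried
    return None, tried


# "Tell me where the red walls are, given that the path taken was this."
layout, tried = invert_by_search(("R", "L", "R", "R"))
```

The number of candidate layouts doubles with every junction added, which is the point: the forward mapping holds nothing that can be pulled apart and reused in the other direction.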
Not Evolution
One gets one's best argument from 'evolution,' because whether evolution applies is largely based on viewpoints. Evolution is a complex process involving not only adapting to one's environment, but also adapting to the changes that prior adaptations - yours or those of others - have made. In my opinion, evolution is not the action of one species, but rather of the largest applicable biosphere. Case examples abound: the cane toad and den rabbit imported to Australia, kudzu to the mid-Atlantic states, Africanized bees and Asian red ants. It is common that members of a larger biosphere - say, a continental system - have had more and better varied pressures than inhabitants of a smaller biosphere - say, a small isolated island - and therefore have stronger survival mechanisms. European pigs utterly obliterated the indigenous ecosystems of many Hawaiian and Indonesian islands when agriculture was considered more important than biodiversity.
I do believe that the system being discussed provides an apt metaphor for neural networks. However, I believe we're aiming a bit too high. A good analogue for a power plant turbine is a jet turbine, but not the whole jet plane. The process being discussed isn't evolution. It's selection. Darwin actually used the term selection (well, natural selection, but adaptive networks aren't natural) almost exclusively with regards to evolution, and modern use by biologists suggests that the action taken by a single species in regards to a single or a set of pressures is selection. Evolution is the process of that species' selections, and then other species' responsive selections, and so forth. Selection is the local action; evolution is the large-view ongoing process.
Adaptive networks generally don't have large-scope training. There's a single problem they're apportioned to, which does not change, and the comparative success of their peers doesn't alter their approach in the search for niches. In evolution, even a single problem generally has many solutions at varying levels competing at all times; a carnivore predator is competing with other styles of hunter, scavengers, small animals and bacteria, and after the predator has moved on a number of other niche-exploiters move in, such as calcivores, plants and even occasional smaller creatures wanting defensible homes in a skeleton. By comparison, adaptive networks each get exactly the same resources: everyone gets one copy of the problem, nobody competes, nobody has any value in finding solutions to the parts of the problem which other networks haven't fully exploited. There are examples of computed evolution, but they're exceedingly rare, extremely processing intensive, and generally do not solve single problems well, making them difficult to harness and of comparatively low utility.
Therefore, I shall use the term 'selection,' and define it as carefully as I am able.
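As a rough illustration of what I mean by selection (a sketch, not a definition), here is about the smallest loop I can write that has the properties described above. Everything in it is an assumption made for the example: the names `error` and `select`, the population size, the mutation spread and the toy data. Note what is missing: every candidate is scored alone against the same unchanging problem, and no candidate's score depends on what its peers are doing.

```python
import random


def error(weights, examples):
    """Squared error of a weighted-sum filter over (inputs, wanted) pairs."""
    return sum(
        (sum(w * x for w, x in zip(weights, inputs)) - wanted) ** 2
        for inputs, wanted in examples
    )


def select(examples, size=20, generations=100, spread=0.5, seed=0):
    """Keep the best quarter each round and refill with perturbed copies."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(size)]
    history = []
    for _ in range(generations):
        # Each candidate gets its own copy of the one fixed problem.
        population.sort(key=lambda w: error(w, examples))
        history.append(error(population[0], examples))
        survivors = population[: size // 4]
        # Survivors carry over untouched; the rest are noisy copies of them.
        children = [
            [w + rng.gauss(0, spread) for w in rng.choice(survivors)]
            for _ in range(size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=lambda w: error(w, examples))
    return best, history


# Hypothetical fixed problem: outputs follow 2*a - 1*b.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
best, history = select(examples)
```

Because the survivors are carried over unchanged, the best error in `history` can only hold or fall from one round to the next. Nothing here responds to anyone else's adaptations, which is why I'd call it selection and not evolution.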
Orientation
This is going to piss a bunch of people off. I don't believe that what we currently have - expert systems, neural networks, and so forth - are artificial intelligence tools. I believe they're building blocks of something earlier. What I'm saying is analogous to the observation that addition is not a calculus tool, but rather part of something earlier - arithmetic. This is not to say that addition isn't used by calculus; calculus uses members of many earlier building blocks, such as algebra, trigonometry, and in accord with the example, arithmetic. However, things which are part of calculus are integration, summation, differentiation and so forth. Note that things beyond calculus often use calculus' tools; still, though integration is used in linear algebra, it is no more a part of linear algebra than addition is part of calculus.
This Ain't Calc
With that hedging firmly in place, let me repeat myself, so I sound less combative. I don't believe that current tools are AI tools. I believe they're parts of earlier building blocks towards machine cognition. To extend the analogy, we are currently developing arithmetic, and I'm just saying "no, that's not calculus." I don't have the answers to what's that far ahead. There is still the need to describe a task, rather than the specific mechanisms against which a network has been selected. In current parlance, a network which is selected for various topics underlying chess - board position, mobility, king safety, material (piece loss,) and so on - can be said to have learned chess or to have been trained for chess. Because it's just a system of equations which have been massaged to match existing data, I will use the term "oriented towards." A long, thin rock doesn't learn to face the current; it is oriented towards the current. Iron filings don't learn their angles in a magnetic field; they are oriented towards the field. Soldiers don't learn where a target is at a given moment; they orient themselves to the target. If you give them a new current, a new magnetic field, a new target, they aren't learning something new; they're just orienting to another instance. You don't learn how to do something every time you do it with different core data; you just re-orient.
You don't learn something new when you start feeding a perfectly good network crap data, either. The network is just re-oriented. It's not unlearning, or learning wrong things; it's just orienting to different data. It's orienting to what you fed it. That's not wrong. Because it hasn't actually learned anything, you can't say "okay, now back to what we were doing before." It's not oriented that way anymore. Nothing was learned; those values were simply pushed into place by the dynamics and pressures of a system. You aren't learning how to deal with rain from a new angle when the storm changes; you just reorient your umbrella. Your network isn't learning a new game just because you're feeding it data from a different game; it's just reorienting to the new data. Nothing is stored. Nothing can be separated and recombined.
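To put a number on "nothing is stored," here is a deliberately tiny sketch: a one-weight network, oriented by repeatedly nudging the weight toward whatever data it is shown. The function name `orient`, the rate, the pass count and both data sets are assumptions invented for the example.

```python
def orient(weight, examples, rate=0.1, passes=200):
    """Nudge a single weight so that weight * x approaches the wanted output."""
    for _ in range(passes):
        for x, wanted in examples:
            weight += rate * (wanted - weight * x) * x
    return weight


game_a = [(1.0, 2.0), (2.0, 4.0)]    # hypothetical data following y = 2x
game_b = [(1.0, -3.0), (2.0, -6.0)]  # hypothetical data following y = -3x

after_a = orient(0.0, game_a)             # settles near 2
after_a_then_b = orient(after_a, game_b)  # settles near -3
only_b = orient(0.0, game_b)              # also settles near -3
```

A weight that went through game A and then game B ends up indistinguishable from one that only ever saw game B. There is no "back to what we were doing before" to ask for; the only way back is to orient on game A all over again.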
Adaptive Filtration
On that semantic justification, I propose setting aside the notion that we're working on AI at all, and identifying what we're really working on. "This isn't calculus, it's arithmetic." Well, this isn't AI. I'd like to call what we're doing right now Adaptive Filtration: we have a filter which turns data of form A into data of form B, and we adjust the filter to get the sorts of results we want, by looking at results to counterexamples.
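For the concretely minded, a filter of this kind fits in a dozen lines. This is a minimal sketch, not anybody's canonical implementation: the names `apply_filter` and `orient`, the rate, the pass count and the toy data are all assumptions. The filter is a plain weighted sum, and the adjustment is the least-mean-squares rule: after each known example, move every weight in proportion to the error and to its own input.

```python
def apply_filter(weights, sample):
    """Turn data of form A (a list of numbers) into data of form B (one number)."""
    return sum(w * x for w, x in zip(weights, sample))


def orient(weights, examples, rate=0.1, passes=500):
    """Adjust the filter against (input, wanted output) examples."""
    weights = list(weights)
    for _ in range(passes):
        for sample, wanted in examples:
            error = wanted - apply_filter(weights, sample)
            # Least-mean-squares update: error times input, scaled by the rate.
            weights = [w + rate * error * x for w, x in zip(weights, sample)]
    return weights


# Hypothetical known data, generated by the rule B = 2*a0 - 1*a1.
examples = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
    ([2.0, 1.0], 3.0),
]
weights = orient([0.0, 0.0], examples)
unseen = apply_filter(weights, [3.0, 2.0])  # new data of form A
```

On this noise-free data the weights settle very close to 2 and -1, so the unseen input comes out close to 4. Nothing in the loop knows or cares what the numbers mean.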
The television knobs example, common in AI texts, provides an extremely convenient metaphor here. Let's say you have a cheap TV from the 80s, without the potential to replace it. You dropped it on the ground, which jostled all of the knobs off of your settings, and broke the piece of plastic holding all of the controls in place. Because you've been inside the TV before, you recognize that the position of the crack means that as soon as you turn your TV to your favorite station, all the TV setting knobs are going to fall off, and there'll be no way to change anything.
Yeah yeah, it's a stupid setup, but it gives me a situation where you have to get the settings as close to correct as possible on useless but correct data before switching to the data you actually want.
So, your favorite station is 13, because you're in about 80% of the USA and that's what PBS is on. Of course, all the lower channels are network TV, giving you a choice between reality shows, court muckraking and talk TV. Being a sapient individual, you don't want to watch that, but given the damage to the TV, that's the data you have to train your settings against. So, gritting your teeth to some Judge Judy, you start screwing with the color, the contrast, the V-Hold. You get the settings in a nice place, then you switch to your channel. Sure enough, the knobs fall off; you're stuck with your settings. But, you did a pretty good job. They're doing a Nova episode on space, so you can see that you didn't get the dark colors ('cause there's a lot of black in space) quite dark enough, because there's almost no black on Judge Judy to test against. But, the color balance is good, and it's generally bright enough, so you get a good-ish picture.
That's basically what most modern neural networks really do. They take a bunch of known data, adjust their internal knobs until the known data's results match its own results as closely as can be found, and then apply those same knobs to other new data and get results. That's not learning. You're not learning how color works when you try to adjust the color on a TV. You're just finding the right combination of values for an instance of a general problem. Granted, the knob sets of these networks are rather more complex, and the knobs interact - they don't have separate effects; tweaking one tweaks others - but the principle is essentially thus.
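The broken TV can be played out in code. This is a toy under stated assumptions, not a model of any real television: the `display` function, its three knobs (contrast, brightness, and a shadow knob which only affects signal levels below 0.25), the starting positions and both sets of signal levels are invented for the sketch. The knobs are set by the crudest method going: turn one, keep the turn only if the picture gets better, and use finer turns once nothing helps.

```python
def display(knobs, level):
    """Hypothetical TV: contrast pivots around mid-gray, shadow touches only darks."""
    contrast, brightness, shadow = knobs
    shown = contrast * (level - 0.5) + 0.5 + brightness
    if level < 0.25:
        shown += shadow * (0.25 - level)
    return shown


def picture_error(knobs, levels):
    """Mean squared gap between what is shown and what was broadcast."""
    return sum((display(knobs, v) - v) ** 2 for v in levels) / len(levels)


def twiddle(knobs, levels, step=0.1, rounds=200):
    """Turn one knob at a time; keep a turn only if the picture improves."""
    knobs = list(knobs)
    for _ in range(rounds):
        improved = False
        for i in range(len(knobs)):
            for turn in (step, -step):
                trial = list(knobs)
                trial[i] += turn
                if picture_error(trial, levels) < picture_error(knobs, levels):
                    knobs, improved = trial, True
        if not improved:
            step /= 2  # nothing helped at this size; try finer turns
    return knobs


judge_judy = [0.3, 0.4, 0.5, 0.6, 0.7]  # mid-tones only, nothing near black
nova = [0.0, 0.05, 0.1, 0.2, 0.5]       # mostly the black of space

knobs = twiddle([0.6, 0.2, 0.5], judge_judy)  # jostled knobs, set against Judge Judy
```

Against the Judge Judy levels the picture error drops to practically nothing, but the shadow knob never moves from where the fall left it, because nothing in that data is dark enough for it to matter. Switch to the Nova levels and the blacks come out gray. The knobs weren't wrong for the data they were set against; they were just never oriented toward this data.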
And the upshot?
While everyone else is making neural networks and training them to learn about models and trends, I'll be making adaptive networks and orienting them towards data sets. The actions are the same, the code is the same, the results are the same, but the viewpoint is different. When respoken in these simpler more concrete terms, the actions suddenly seem far less dramatic, the results far less profound - and using them as simple tools becomes more reasonable both alone and in groups.
Description of Need
We get a lot from the formalization of systems. The Principia Mathematica - the Russell and Whitehead one, not the Newton one - is a good example; Gödel incompleteness probably wouldn't have been understood without it or something similar. You see other examples in Peano's axioms, the Turing machine and Turing completeness, various points in psychology (popularly known ones like Freud and Jung, or less well-known but better examples like the DSM-IV and the modern delineations of disorder by affectation), and so forth. There are probably good examples in more diverse fields, but my depth of knowledge of the history of science is framed to rather specific topics.
