Journal of Evolution and Technology – published by the Institute for Ethics and Emerging Technologies

ISSN 1541-0099, 22(1) – December 2011

Nine Ways to Bias Open-Source AGI Toward Friendliness

Ben Goertzel and Joel Pitt
Novamente LLC

Journal of Evolution and Technology – Vol. 22 Issue 1 – December 2011 – pgs xx-yy

Abstract

While it seems unlikely that any method of guaranteeing human-friendliness ("Friendliness") on the part of advanced Artificial General Intelligence (AGI) systems will be possible, this doesn't mean the only alternatives are throttling AGI development to safeguard humanity, or plunging recklessly into the complete unknown. Without denying the presence of a certain irreducible uncertainty in such matters, it is still sensible to explore ways of biasing the odds in a favorable way, such that newly created AI systems are significantly more likely than not to be Friendly. Several potential methods of effecting such biasing are explored here, with a particular but non-exclusive focus on those that are relevant to open-source AGI projects, and with illustrative examples drawn from the OpenCog open-source AGI project. Issues regarding the relative safety of open versus closed approaches to AGI are discussed, and then nine techniques for biasing AGIs in favor of Friendliness are presented:

1. Engineer the capability to acquire integrated ethical knowledge.
2. Provide rich ethical interaction and instruction, respecting developmental stages.
3. Develop stable, hierarchical goal systems.
4. Ensure that the early stages of recursive self-improvement occur relatively slowly and with rich human involvement.
5. Tightly link AGI with the Global Brain.
6. Foster deep, consensus-building interactions between divergent viewpoints.
7. Create a mutually supportive community of AGIs.
8. Encourage measured co-advancement of AGI software and AGI ethics theory.
9. Develop advanced AGI sooner not later.
In conclusion, and related to the final point, we advise the serious co-evolution of functional AGI systems and AGI-related ethical theory as soon as possible, before we have so much technical infrastructure that parties relatively unconcerned with ethics are able to rush ahead with brute force approaches to AGI development.

1. Introduction

Artificial General Intelligence (AGI), like any technology, carries both risks and rewards. One science fiction film after another has highlighted the potential dangers of AGI, lodging the issue deep in our cultural awareness. Hypothetically, an AGI with superhuman intelligence and capability could dispense with humanity altogether and thus pose an "existential risk" (Bostrom 2002). In the worst case, an evil but brilliant AGI, programmed by some cyber Marquis de Sade, could consign humanity to unimaginable tortures (perhaps realizing a modern version of the medieval Christian imagery of hell).

On the other hand, the potential benefits of powerful AGI also go literally beyond human imagination. An AGI with massively superhuman intelligence and a positive disposition toward humanity could provide us with truly dramatic benefits, through the application of superior intellect to scientific and engineering challenges that befuddle us today. Such benefits could include a virtual end to material scarcity via advancement of molecular manufacturing, and also force us to revise our assumptions about the inevitability of disease and aging (Drexler 1986). Advanced AGI could also help individual humans grow in a variety of directions, including directions leading beyond our biological legacy, leading to massive diversity in human experience, and hopefully a simultaneous enhanced capacity for openmindedness and empathy.
Eliezer Yudkowsky introduced the term "Friendly AI" to refer to advanced AGI systems that act with human benefit in mind (Yudkowsky 2001). Exactly what this means has not been specified precisely, though informal interpretations abound. Goertzel (2006a) has sought to clarify the notion in terms of three core values of "Joy, Growth and Freedom." In this view, a Friendly AI would be one that advocates individual and collective human joy and growth, while respecting the autonomy of human choice.

Some (for example, de Garis 2005) have argued that Friendly AI is essentially an impossibility, in the sense that the odds of a dramatically superhumanly intelligent mind worrying about human benefit are vanishingly small, drawing parallels with humanity's own exploitation of less intelligent systems. Indeed, in our daily life, questions such as the nature of consciousness in animals, plants, and larger ecological systems are generally considered merely philosophical, and only rarely lead to individuals making changes in outlook, lifestyle or diet. If Friendly AI is impossible for this reason, then the best options for the human race would presumably be either to avoid advanced AGI development altogether, or else to fuse with AGI before the disparity between its intelligence and humanity's becomes too large, so that beings-originated-as-humans can enjoy the benefits of greater intelligence and capability. Some may consider sacrificing their humanity an undesirable cost. The concept of humanity, however, is not a stationary one, and can be viewed as sacrificed only from our contemporary perspective of what humanity is.
With our cell phones, massively connected world, and the inability to hunt, it's unlikely that we would seem human – the same species – to the humanity of the past. Just like an individual's self, the self of humanity will inevitably change, and as we do not usually mourn losing our identity of a decade ago to our current self, our current concern for what we may lose may seem unfounded in retrospect.

Others, such as Waser (2008), have argued that Friendly AI is essentially inevitable, linking greater intelligence with greater cooperation. Waser adduces evidence from evolutionary and human history in favor of this point, along with more abstract arguments such as the economic viability of cooperation over not cooperating.

Omohundro (2008) has argued that any advanced AI system will very likely demonstrate certain "basic AI drives," such as desiring to be rational, to self-protect, to acquire resources, and to preserve and protect its utility function and avoid counterfeit utility; these drives, he suggests, must be taken carefully into account in formulating approaches to Friendly AI.

Yudkowsky (2006) discusses the possibility of creating AGI architectures that are in some sense "provably Friendly" – either mathematically, or else by very tight lines of rational verbal argument. However, several possibly insurmountable challenges face such an approach. First, proving mathematical results of this nature would likely require dramatic advances in multiple branches of mathematics. Second, such a proof would require a formalization of the goal of "Friendliness," which is a subtler matter than it might seem (Legg 2006; Legg 2006a),
as formalization of human morality has vexed moral philosophers for quite some time. Finally, it is unclear the extent to which such a proof could be created in a generic, environment-independent way – but if the proof depends on properties of the physical environment, then it would require a formalization of the environment itself, which runs up against various problems related to the complexity of the physical world, not to mention the current lack of a complete, consistent theory of physics.

The problem of formally or at least very carefully defining the goal of Friendliness has been considered from a variety of perspectives. Among a list of fourteen objections to the Friendly AI concept, with suggested answers to each, Sotala (2011) includes the issue of friendliness being a vague concept. A primary contender for this role is the concept of "Coherent Extrapolated Volition" (CEV) suggested by Yudkowsky (2004), which roughly equates to the extrapolation of the common values shared by all people when at their best. Many subtleties arise in specifying this concept – e.g. if Bob Jones is often possessed by a strong desire to kill all Martians, but he deeply aspires to be a nonviolent person, then the CEV approach would not rate "killing Martians" as part of Bob's contribution to the CEV of humanity. Resolving inconsistencies in aspirations and desires, and the different temporal scales involved for each, is another non-trivial problem.

One of the authors, Goertzel (2010), has proposed a related notion of Coherent Aggregated Volition (CAV), which eschews some subtleties of extrapolation, and instead seeks a reasonably compact, coherent, and consistent set of values that is close to the collective value-set of humanity. In the CAV
approach, "killing Martians" would be removed from humanity's collective value-set because it's assumedly uncommon and not part of the most compact/coherent/consistent overall model of human values, rather than because of Bob Jones's aspiration to nonviolence.

More recently we have considered that the core concept underlying CAV might be better thought of as CBV, or "Coherent Blended Volition": CAV seems to be easily misinterpreted as meaning the average of different views, which was not the original intention. The CBV terminology clarifies that the CBV of a diverse group of people should not be thought of as an average of their perspectives, but as something more analogous to a "conceptual blend" (Fauconnier and Turner 2002) – incorporating the most essential elements of their divergent views into a whole that is overall compact, elegant and harmonious. The subtlety here (to which we shall return below) is that for a CBV blend to be broadly acceptable, the different parties whose views are being blended must agree to some extent that enough of the essential elements of their own views have been included.

Multiple attempts at axiomatization of human values have also been attempted. In one case this is done with a view toward providing near-term guidance to military robots (Arkin (2009)'s excellent though chillingly-titled book Governing Lethal Behavior in Autonomous Robots). However, there are reasonably strong arguments that human values (and similarly a human's language and perceptual classification) are too complex and multifaceted to be captured in any compact set of formal logical rules. Wallach and Allen (2010) have made this point eloquently, and argued for the necessity of fusing top-down (e.g.
formal logic based) and bottom-up (e.g. self-organizing learning based) approaches to machine ethics.

1.1 Modes of AGI development

Other sociological considerations also arise. For example, it is sometimes argued that the risk from highly-advanced AGI going morally awry on its own may be less than that of moderately-advanced AGI being used by a human being to advocate immoral ends. This possibility gives rise to questions about the ethical value of various practical paths of AGI development, for instance: Should AGI be developed in a top-secret installation by a select group of individuals – individuals selected for a combination of technical and scientific brilliance, moral uprightness, or any other qualities deemed relevant (a "closed approach")? Or should it be developed in the open, in the manner of open-source software projects like Linux (an "open approach")? The open approach allows the collective intelligence of the world to participate more fully – but also potentially allows unscrupulous elements of the human race to take some of the publicly-developed AGI concepts and tools private, then privately develop them into AGIs with selfish or evil purposes in mind. Is there some meaningful intermediary between these extremes? Should governments regulate AGI, with Friendliness in mind (as advocated carefully by e.g. Hibbard (2002))? Or will this just cause AGI development to move to the handful of countries with more liberal policies? Or will it cause development to move underground, where nobody can see the dangers developing?

Clearly, there are many subtle and interwoven issues at play here, and it may take an AGI beyond human intelligence to unravel and understand them all thoroughly. Our goal here is more modest: to explore the question of how to militate in favor of positive, Friendly outcomes. Some of our suggestions are fairly generic, but others are reliant on the assumption of an open rather than closed approach.
The open approach is currently followed in our own AGI project, hence its properties are those we're most keen on exploring.

While we would love to be proven wrong on this, our current perspective is that provably, or otherwise guarantee-ably, Friendly AI is not achievable. On the face of it, achieving strong certainty about the future behaviors of beings massively more generally intelligent and capable than ourselves seems somewhat implausible. Again, we are aiming at a more modest goal – to explore ways of biasing the odds, and creating AI systems that are significantly more likely than not to be Friendly.

While the considerations presented here are conceptually fairly generic, we will frequently elaborate them using the example of the OpenCog (Goertzel et al. 2010a; Hart and Goertzel 2008) AGI framework on which we are currently working, and the specific OpenCog applications now under development, including game AI, robotics, and natural language conversation.

2. Is open or closed AGI development safer?

We will not seek here to argue rigorously that the open approach to AGI is preferable to the closed approach. Rather, our goal here is to explore ways to make AGI more probably Friendly, with a non-exclusive focus on open approaches. We do believe intuitively that the open approach is probably preferable, but our reasons are qualitative, and we recognize there are also qualitative arguments in the opposite direction. Before proceeding further, we will briefly sketch some of the reasons for our intuition on this.
First, we have a strong skepticism about self-appointed elite groups that claim they know what's best for everyone (even if they are genuine saints), and a healthy respect for the power of collective intelligence and the Global Brain (Heylighen 2007), which the open approach is ideal for tapping. On the other hand, we also understand the risk of terrorist groups or other malevolent agents forking an open source AGI project and creating something terribly dangerous and destructive.

Balancing these factors against each other rigorously is impossible, due to the number of assumptions currently involved. For instance, nobody really understands the social dynamics by which open technological knowledge plays out in our current world, let alone hypothetical future scenarios. Right now there exists open knowledge about many very dangerous technologies, and there exist many terrorist groups, yet these groups fortunately make scant use of these technologies. The reasons why appear to be essentially sociological – the people involved in terrorist groups tend not to be the ones who have mastered the skills of turning public knowledge of cutting-edge technologies into real engineered systems. While it's easy to observe this sociological phenomenon, we certainly have no way to estimate its quantitative extent from first principles. We don't really have a strong understanding of how safe we are right now, given the technological knowledge available via the Internet, textbooks, and so forth. Relatively straightforward threats such as nuclear proliferation remain confusing, even to the experts.

The open approach allows for various benefits of open source software development to be applied, such as Linus's law (Raymond 2000): "Given enough eyeballs, all bugs are shallow." Software development practice has taught us that in the closed approach it's very hard to get the same level of critique as one obtains on a public, open codebase.
At a conceptual level of development, a closed approach also avoids making it possible for external theorists to find specific flaws in a design. Discussing the theoretical basis for Friendliness design is all very well, but implementing and designing a system that conforms to that design is another.

Keeping powerful AGI and its development locked up by an elite group doesn't really provide reliable protection against malevolent human agents either. History is rife with such situations going awry, such as the leadership of the group being subverted, brute force being inflicted by some outside party, or a member of the elite group defecting to some outside group in the interest of personal power, reward, or internal group disagreements. There are many things that can go wrong in such situations, and the confidence of any particular group that it is immune to such issues cannot be taken very seriously.

Clearly, neither the open nor closed approach qualifies as a panacea.

3. The (unlikely) prospect of government controls on AGI development

Given the obvious long-term risks associated with AGI development, is it feasible that governments might enact legislation intended to stop AGI from being developed? Surely government regulatory bodies would slow down the progress of AGI development in order to enable measured development of accompanying ethical tools, practices, and understandings? This however seems unlikely, for the following reasons.

Let us consider two cases separately. First, there is the case of banning AGI research after an "AGI Sputnik" moment has occurred. We define an AGI Sputnik moment as a technological achievement that makes the short- to medium-term possibility of highly functional and useful human-level AGI broadly evident to the public and policy makers, bringing it out of the realm of science fiction to reality. Second, we might choose to ban it before such a moment has happened.
After an AGI Sputnik moment, even if some nations chose to ban AGI technology due to the perceived risks, others would probably proceed eagerly with AGI development because of the wide-ranging perceived benefits. International agreements are difficult to reach and enforce, even for extremely obvious threats like nuclear weapons and pollution, so it's hard to envision that such agreements would come rapidly in the case of AGI. In a scenario where some nations ban AGI while others do not, it seems the slow speed of international negotiations would contrast with the rapid speed of development of a technology in the midst of revolutionary breakthrough. While worried politicians sought to negotiate agreements, AGI development would continue, and nations would gain increasing competitive advantage from their differential participation in it.

The only way it seems feasible for such an international ban to come into play would be if the AGI Sputnik moment turned out to be largely illusory, because the path from the moment to full human-level AGI turned out to be susceptible to severe technical bottlenecks. If AGI development somehow slowed after the AGI Sputnik moment, then there might be time for the international community to set up a system of international treaties similar to what we now have to control nuclear weapons research. However, we note that the nuclear weapons research ban is not entirely successful – and that nuclear weapons development and testing tend to have large physical impacts that are remotely observable by foreign nations. On the other hand, if a nation decides not to cooperate with an international AGI ban, this would be much more difficult for competing nations to discover.

An unsuccessful attempt to ban AGI research and development could end up being far riskier than no ban.
An international AGI ban that was systematically violated in the manner of current international nuclear weapons bans would have the effect of shifting AGI development from cooperating developed nations to "rogue nations," thus slowing down AGI development somewhat, but also perhaps decreasing the odds of the first AGI being developed in a manner that is concerned with ethics and Friendly AI. Thus, subsequent to an AGI Sputnik moment, the overall value of AGI will be too obvious for AGI to be effectively banned, and monitoring AGI development would be next to impossible.

The second option is an AGI ban earlier than the AGI Sputnik moment – before it's too late. This also seems infeasible, for the following reasons:

• Early stage AGI technology will supply humanity with dramatic economic and quality of life improvements, as narrow AI does now. Distinguishing narrow AI from AGI from a government policy perspective would also be prohibitively difficult.

• If one nation chose to enforce such a slowdown as a matter of policy, the odds seem very high that other nations would explicitly seek to accelerate their own progress on AI/AGI, so as to reap the ensuing differential economic benefits.

To make the point more directly, the prospect of any modern government seeking to put a damper on current real-world narrow-AI technology seems remote and absurd. It's hard to imagine the US government forcing a roll-back from modern search engines like Google and Bing to more simplistic search engines like 1997 AltaVista on the basis that the former embody natural language processing technology that represents a step along the path to powerful AGI.
Wall Street firms (that currently have powerful economic influence on the US government) will not wish to give up their AI-based trading systems, at least not while their counterparts in other countries are using such systems to compete with them on the international currency futures market. Assuming the government did somehow ban AI-based trading systems, how would this be enforced? Would a programmer at a hedge fund be stopped from inserting some more effective machine learning code in place of the government-sanctioned linear regression code? The US military will not give up their AI-based planning and scheduling systems, as otherwise they would be unable to utilize their military resources effectively. The idea of the government placing an IQ limit on the AI characters in video games, out of fear that these characters might one day become too smart, also seems absurd. Even if the government did so, hackers worldwide would still be drawn to release "mods" featuring their own smart AIs inserted illicitly into games; and one might see a subculture of pirate games with illegally smart AI.

"Okay, but all these examples are narrow AI, not AGI!" you may argue. "Banning AI that occurs embedded inside practical products is one thing; banning autonomous AGI systems with their own motivations and self-autonomy and the ability to take over the world and kill all humans is quite another!"

Note though that the professional AI community does not yet draw a clear border between narrow AI and AGI. While we do believe there is a clear qualitative conceptual distinction, we would find it hard to embody this distinction in a rigorous test for distinguishing narrow AI systems from "proto-AGI systems" representing dramatic partial progress toward human-level AGI. At precisely what level of intelligence would you propose to ban a conversational natural language search interface, an automated call center chatbot, or a house-cleaning robot?
How would you distinguish rigorously, across all areas of application, a competent non-threatening narrow-AI system from something with sufficient general intelligence to count as part of the path to dangerous AGI?

A recent workshop of a dozen AGI experts, oriented largely toward originating such tests, failed to come to any definitive conclusions (Adams et al. 2010), recommending instead that a looser mode of evaluation be adopted, involving qualitative synthesis of multiple rigorous evaluations obtained in multiple distinct scenarios. A previous workshop with a similar theme, funded by the US Naval Research Office, came to even less distinct conclusions (Laird et al. 2009). The OpenCog system is explicitly focused on AGI rather than narrow AI, but its various learning modules are also applicable as narrow AI systems, and some of them have largely been developed in this context.

In short, there's no rule for distinguishing narrow AI work from proto-AGI work that is sufficiently clear to be enshrined in government policy, and the banning of narrow AI work seems infeasible, as the latter is economically and humanistically valuable, tightly interwoven with nearly all aspects of the economy, and nearly always non-threatening in nature. Even in the military context, the biggest use of AI is in relatively harmless-sounding contexts such as back-end logistics systems, not in frightening applications like killer robots.

Surveying history, one struggles to find good examples of advanced, developed economies slowing down development of any technology with a nebulous definition, obvious wide-ranging short to medium term economic benefits, and rich penetration into multiple industry sectors, due to reasons of speculative perceived long-term risks.
Nuclear power research is an example where government policy has slowed things down, but here the perceived economic benefit is relatively modest, the technology is restricted to one sector, the definition of what's being banned is very clear, and the risks are immediate rather than speculative. More worryingly, nuclear weapons research and development continued unabated for years, despite the clear threat it posed.

In summary, we submit that, due to various aspects of the particular nature of AGI and its relation to other technologies and social institutions, it is very unlikely to be explicitly banned, either before or after an AGI Sputnik moment. If one believes the creation of AGI to be technically feasible, then the more pragmatically interesting topic becomes how to most effectively manage and guide its development.

4. Nine ways to bias AGI toward Friendliness

There is no way to guarantee that advanced AGI, once created and released into the world, will behave according to human ethical standards. There is irreducible risk here, and in a sense it is a risk that humanity has been moving towards, at accelerating speed, ever since the development of tools, language, and culture. However, there are things we can do to bias the odds in favor of ethically positive AGI development. The degree of biasing that can be achieved seems impossible to estimate quantitatively, and any extrapolation from human history to a future populated by agents with significantly transhuman general intelligence has an obvious strong risk of being profoundly flawed. Nevertheless, it behooves us to do our best to bias the outcome in a positive direction, and the primary objective of this paper is to suggest some potential ways to do so.
4.1 Engineer the capability to acquire integrated ethical knowledge

First of all, if we wish our AGI systems to behave in accordance with human ethics, we should design them to be capable of the full range of human ethical understanding and response. As reviewed in Goertzel and Bugaj (2008) and Goertzel (2009b), human ethical judgment relies on the coordination and integration of multiple faculties. One way to think about this is to draw connections between the multiple types of human memory (as studied in cognitive psychology and cognitive neuroscience) and multiple types of ethical knowledge and understanding. To wit:

• Episodic memory corresponds to the process of ethically assessing a situation based on similar prior situations.

• Sensorimotor memory corresponds to "mirror neuron" (Rizzolatti and Craighero 2004) type ethics, where you feel another person's feelings via mirroring their physiological emotional responses and actions.

• Declarative memory corresponds to rational ethical judgment.

• Procedural memory corresponds to "ethical habit": learning by imitation and reinforcement to do what is right, even when the reasons aren't well articulated or understood.

• Attentional memory corresponds to the existence of appropriate patterns guiding one to pay adequate attention to ethical considerations at appropriate times.

• Intentional memory corresponds to ethical management of one's own goals and motivations (e.g. when do the ends justify the means?).
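As a loose illustration of the taxonomy above, the integration of multiple memory-based faculties into a single ethical judgment can be sketched as a toy aggregation of per-faculty "votes." This is a minimal sketch only; all function names, data shapes, and weights here are hypothetical illustrations, not part of any actual AGI system:

```python
# Toy sketch: an ethical judgment that integrates several memory types.
# All names and numbers are hypothetical, for illustration only.

def episodic_vote(situation, past_cases):
    """Assess a situation by analogy to similar prior situations (episodic memory)."""
    similar = [c for c in past_cases if c["context"] == situation["context"]]
    if not similar:
        return 0.5  # no precedent: neutral
    return sum(c["approval"] for c in similar) / len(similar)

def declarative_vote(situation, rules):
    """Rational judgment from explicit ethical rules (declarative memory)."""
    verdicts = [rule(situation) for rule in rules]
    return sum(verdicts) / len(verdicts) if verdicts else 0.5

def integrated_judgment(situation, past_cases, rules, habit_bias=0.5,
                        weights=(0.4, 0.4, 0.2)):
    """Blend episodic, declarative, and procedural (habitual) assessments,
    mirroring the claim that mature ethics balances multiple memory types."""
    w_ep, w_de, w_pr = weights
    return (w_ep * episodic_vote(situation, past_cases)
            + w_de * declarative_vote(situation, rules)
            + w_pr * habit_bias)

past = [{"context": "sharing", "approval": 1.0},
        {"context": "sharing", "approval": 0.8}]
rules = [lambda s: 1.0 if s.get("harms_no_one") else 0.0]
score = integrated_judgment({"context": "sharing", "harms_no_one": True}, past, rules)
```

The point of the sketch is only structural: no single faculty's verdict decides the outcome, and removing any one of them changes the judgment, which is one way of reading the paper's claim that the faculties are not completely independent.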
We argue that an ethically mature mind, human or AGI, should balance all these kinds of ethics, although none are completely independent of the others.

How these memory types relate to ethical behavior and understanding depends somewhat on the cognitive architecture in question. For instance, it is straightforward to identify each of these memory types in the OpenCog architecture, and articulate therein their intuitive relationship to ethical behavior and understanding:

• Episodic memory: Through placing OpenCog in ethical scenarios with a teacher agent that provides feedback on choices, and with OpenCog's goal system initially biased to seek approval from the teacher.

• Sensorimotor memory: Knowledge is usually contextually represented within the OpenCog AtomSpace (a weighted hypergraph-like knowledge base). A perceptual interface that takes on the role of mirror neurons may activate contexts representing another's emotional state, causing that context to move into the attentional focus of OpenCog. In this way, OpenCog becomes sensitive to the emotional state of other agents it has interacted with and modelled the world view of. Then, through induction or pattern mining, these changes in emotional state can be mapped on to new agents that the AI is unfamiliar with.

• Declarative memory: Declarative ethical knowledge may be embedded as a seed within the OpenCog AtomSpace, or built from data mining episodic memory for patterns learned during ethical teaching. This knowledge can then be reasoned about using probabilistic logic to make ethical decisions in novel situations.

• Procedural memory: The development of new schema can be based on previous experience.
Schema that have previously been evaluated in the same or similar ethical scenarios can be used to guide the construction of new program trees.

• Attentional memory: OpenCog has networks of attention that can implicitly store attentional memories. These memories form from observation of temporal patterns of knowledge access, and their relative importance to goal fulfillment. Once formed they degrade slowly, and may provide resilience against potentially unethical replacements if the system is initially taught ethical behavior (Goertzel et al. 2010).

• Intentional memory (memory regarding goals and subgoals): OpenCog expresses explicit goals declaratively using uncertain logic, but also expresses implicit goals using "maps" recording habitual patterns of activity, created and stored via attentional memory.

Also worth noting in this context is the theory of "Stages of Ethical Development in Artificial General Intelligence Systems" presented in Goertzel and Bugaj (2008). This theory integrates, among other aspects, Kohlberg's (1981) theory of logical ethical judgment (focused on justice and declarative knowledge) and Gilligan's (1982) theory of empathic ethical judgment (focused on interpersonal relationships formed from episodic and sensorimotor memory). In this integrated theory, as shown in Tables 1, 2, and 3 (see Appendix), it is asserted that, to pass beyond childish ethics to the "mature" stage of ethical development, a deep and rich integration of the logical and empathic approaches to ethics is required.
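The attentional-memory point above, that slowly decaying attentional patterns lend resilience to early ethical training, can be illustrated with a toy model. The sketch below is loosely inspired by OpenCog's attention allocation, but the class names, the "sti" value, and the decay constant are all invented assumptions, not the actual OpenCog API.

```python
# Toy model of attentional memory with slow decay (illustrative only).
class Atom:
    def __init__(self, name: str, sti: float = 0.0):
        self.name = name
        self.sti = sti  # short-term importance

class AttentionBank:
    DECAY = 0.99  # slow decay: well-reinforced patterns persist

    def __init__(self):
        self.atoms = {}

    def stimulate(self, name: str, amount: float) -> None:
        self.atoms.setdefault(name, Atom(name)).sti += amount

    def step(self) -> None:
        # One decay cycle: every atom loses a small fraction of importance.
        for atom in self.atoms.values():
            atom.sti *= self.DECAY

    def attentional_focus(self, threshold: float = 1.0):
        return {a.name for a in self.atoms.values() if a.sti >= threshold}

bank = AttentionBank()
# Repeated ethical teaching builds up importance for a pattern...
for _ in range(100):
    bank.stimulate("consider-others-wellbeing", 0.1)
# ...while a one-off event gets only a single burst of stimulation.
bank.stimulate("one-off-distraction", 1.5)
# Fifty decay cycles later, the well-taught pattern remains in the
# attentional focus; the one-off event has faded below threshold.
for _ in range(50):
    bank.step()
```

The point of the slow multiplicative decay is that a pattern reinforced many times during teaching retains enough importance to resist displacement long after the teaching stops, whereas an isolated stimulus does not.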
Here we suggest a slight modification to this idea: to pass to the mature stage of ethical development, a deep and rich integration of the ethical approaches associated with the six main types of memory systems is required. Of course, there are likely to be other valuable perspectives founded on different cognitive models, and this is an area wide open for further exploration, both conceptually and empirically.

4.2 Provide rich ethical interaction and instruction, respecting developmental stages

Of course, a cognitive architecture with the capability to exercise the full richness of human ethical behavior and understanding is not enough; there next arises the question of how to fill this cognitive architecture with appropriate "ethical content." Just as human ethics are considered a combination of nature and nurture, so we should expect for AGI systems. AGI systems are learning systems by definition, and human values are complex and best conveyed via a combination of methods in order that they become well grounded. In Goertzel (2009a) the memory types listed in the previous section are associated with different common modes of human communication:

• Sensorimotor memory / depictive communication: an agent creates some sort of (visual, auditory, etc.) construction to show another agent, with the goal of causing the other agent to experience phenomena similar to what they would experience upon encountering some particular entity in the shared environment.

• Episodic memory / dramatic communication: an agent creates an evocation of specific scenes or episodes, in order to evoke particular real or imagined episodes in the other agent's mind.

• Declarative memory / linguistic communication: communication using language whose semantics are largely (though not necessarily wholly) interpretable based on the mutually experienced world.

• Procedural memory / demonstrative communication: an agent carries out a set of actions in the world, and the other agent is able to imitate these actions, or instruct another agent as to how to imitate them.

• Attentional memory / indicative communication: one agent points to some part of the world, or delimits some interval of time, and another agent is able to interpret the meaning.

• Intentional memory / intentional communication: an agent explicitly communicates to another agent what its goal is in a certain situation (in humans this relates closely to mirror neuron activity; Rizzolatti and Craighero 2004).

Our suggestion is that AGIs should be provided with ample ethical instruction using all of the above communication modalities. During this instruction, respect for modern thinking about progressive education will be important. Among this thinking is that it is important to tailor ethical instruction to the student's stage of cognitive and ethical development.
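As a minimal illustration of what "instruction using all of the above communication modalities" might mean operationally, the following sketch checks an ethical-teaching lesson plan for modality coverage. The lesson list and function names are invented for illustration; they are not part of any OpenCog tooling.

```python
# Curriculum check: does an ethical-teaching lesson plan exercise all six
# communication modalities associated above with the memory types?
MODALITIES = {"depictive", "dramatic", "linguistic",
              "demonstrative", "indicative", "intentional"}

lessons = [
    ("show a picture of an unfair split of resources", "depictive"),
    ("re-enact a playground dispute with the trainee", "dramatic"),
    ("state the sharing rule in words", "linguistic"),
    ("demonstrate dividing items evenly", "demonstrative"),
    ("point at the agent who was left out", "indicative"),
    ("explain the teacher's goal for the exercise", "intentional"),
]

def missing_modalities(lessons) -> set:
    """Return the modalities not yet covered by the lesson plan."""
    return MODALITIES - {modality for _, modality in lessons}

# An instruction plan is complete only when no modality is missing.
assert missing_modalities(lessons) == set()
```

Dropping any one lesson type leaves its modality uncovered, which under the argument above would leave the corresponding memory system's ethical knowledge ungrounded.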
Instructions on the abstract nature of justice are not likely to be helpful to an AGI that hasn't yet learned the practicalities of sharing with its peers. At that early stage, abstract ethical instructions would constitute ungrounded declarative knowledge, and the AGI system would have a hard time grounding them and integrating them with its overall world view and with all the different forms of memory available to it. By contrast, after an AGI has learned some of the everyday aspects of justice, including the balance of justice with empathy in everyday life, and once it has also become familiar with the application of abstract principles to other aspects of ordinary life, it will be well poised to appreciate abstract ethical principles and their utility in making difficult decisions; it will be able to understand the abstract nature of justice in a richer and more holistic way.

More concretely, to make just a few obvious points:

1. The teacher(s) should be observed to follow their own ethical principles, in a variety of contexts that are meaningful to the AGI. Without this, declarative memory may clash with episodic (or other) memory types. However, at the same time, perceived inconsistencies in the behavior of the teacher may hint at subtleties in human ethics of which the AGI was not previously aware. In such a case, questioning the teacher about the discrepancy may refine the AGI's understanding.

2. The system of ethics must be relevant to the AGI's life context, and embedded within its understanding of the world. Without this, episodic memories may not be sufficiently similar to new situations to engage an ethical action or response when they should.

3. Ethical principles must be grounded both in theory-of-mind thought experiments (emphasizing logical coherence)
and in real-life situations in which the ethical trainee is required to make a moral judgment and is rewarded or reproached by the teacher(s). The feedback should also include explanatory augmentations to the teachings, conveying the reason for a particular decision on the part of the teacher.

For example, in our current application of OpenCog to control intelligent game characters, we intend to have human players take the role of the teacher in a shared sandbox environment. The AGI can not only interact with the teacher through dialogue and action, but can also observe the teacher interacting with other humans and AGIs, including how they are rewarded or chastised. Initially, teaching should occur for each embodiment option: each game world in which an AGI has a virtual avatar, and each robotic body available to the AGI. Eventually, a sufficient corpus of varied episodic knowledge will allow the AGI to extract commonalities between embodied instances, which, in turn, will encourage commensurability between them.

4.3 Create stable, hierarchy-dominated goal systems

One aspect of cognitive architecture is especially closely associated with ethical issues: goals and motivations. This is an area where, we suggest, the best path to creating highly ethical AGI systems may be to deviate from human cognitive architecture somewhat. Some may perceive this as a risky assertion, since, after all, the human cognitive architecture is moderately well understood, whereas any new direction will bring with it additional uncertainties. However, the ethical weaknesses of the human cognitive architecture are a
