[Journal article][Full-length article]


Alien coding

Authors:
Thibault Gauthier;Miroslav Olšák;Josef Urban;

Year: 2023

Article number: 109009
Publisher: Elsevier BV


Abstract:

We introduce a self-learning algorithm for synthesizing programs that provide explanations for OEIS sequences. The algorithm starts from scratch, initially generating programs at random. Then it runs many iterations of a self-learning loop that interleaves (i) training neural machine translation to learn the correspondence between sequences and the programs discovered so far, and (ii) proposing many new programs for each OEIS sequence with the trained neural machine translator. The algorithm discovers on its own programs for more than 78000 OEIS sequences, sometimes developing unusual programming methods. We analyze its behavior and the invented programs in several experiments.

Introduction

"Galileo once said, 'Mathematics is the language of Science.' Hence, facing the same laws of the physical world, alien mathematics must have a good deal of similarity to ours." – R. Hamming, Mathematics on a Distant Planet [12]

"Certainly, let us learn proving, but also let us learn guessing." – G. Polya, Mathematics and Plausible Reasoning [25]

Proposing explanations for abstract patterns is one of the main occupations of mathematicians. Polya famously argued that educated guessing is as important a skill as theorem proving [25]. Integer sequences are perhaps the most common kind of mathematical pattern. In this work, we describe a self-learning system – a positive feedback loop between learning, search and verification – that can, in 190 iterations, discover from scratch programs and explanations for more than 78000 sequences from the On-Line Encyclopedia of Integer Sequences (OEIS) [29].

Such positive feedback loops between learning, search and verification are, in our opinion, among the most interesting objects of study in today's Artificial Intelligence (AI). In automated theorem proving and reasoning, such loops – typically interleaving deductive search with learning its guidance – have been researched for over 15 years in several contexts [33], [36], [16], [15], often leading to large self-improvements of the theorem-proving systems. However, guessing and conjecturing [26], [35], [21], [22] are still relatively unexplored skills and subsystems in today's mainstream automated theorem provers (ATPs).

Today, perhaps the most widely known and cited examples of self-improvement combining learning and search come from games such as Go and Chess, which were already proposed by Turing as an interesting abstract playground for developing machine intelligence [32]. An early example of such a reinforcement learning (RL) system is MENACE [20], which can learn to play noughts and crosses on its own and much faster than brute-forcing all possible solutions. Recently, with a residual network as the machine learner, AlphaGo Zero [28] has learned to play Go better than professional players using only self-play. In this process, the system discovered many effective moves that went against 3000 years of human wisdom: the early 3-3 invasion was considered a bad opening move but is now frequently used by professionals. Similarly, the system presented here is capable of discovering on its own possibly unusual (alien) mathematical formulas and programs without any human guidance, inventing, for example, over 20 different definitions of (pseudo-)primes (Table C.3).

Ultimately, RL systems still suffer from some human biases, not least because their different search and learning modules were coded by humans. Inspired by biological evolution, "AI life" [18] attempts to remove almost all human bias.
It only sets the rules (the "physics") of the simulated world and lets the programs evolve freely. A famous example of such a system is Conway's Game of Life [7]. Since such systems are open-ended, they have yet to be turned into useful tools; indeed, it is even a hard problem to detect when such systems invent useful programs. In this work, we use the OEIS setting to experiment with two different high-level heuristics that guide the overall evolution of the system that invents explanations: Occam's razor (the law of parsimony), i.e., the length of the explanations, and efficiency, i.e., the time it takes to evaluate an invented explanation as a program that takes integers as inputs.

Occam's razor has been one of the most important heuristics guiding scientific research in general. In a machine learning setting, it can be seen as a form of regularization. A mathematical proof of this principle, relying on Bayesian reasoning and assumptions about the computability of our universe, is provided by Solomonoff's theory of inductive inference [30]. There is, however, an interesting tradeoff between short explanations and their efficiency when used as declarative programs and building blocks of more complex programs. A short but inefficient program may be too expensive to evaluate, preventing us from verifying whether it really is an explanation for a longer integer sequence, and thus from further learning from such an explanation. In this work, we therefore use a combined fitness function for our growing and evolving population of programs: a program is among the fittest if it is either the shortest one or the most efficient one.

Overview and contributions. Our approach relies on a self-learning loop to learn how to find programs generating integer sequences. It alternates between the three phases represented in Fig. 1. During the search phase, our machine learning model synthesizes programs for integer sequences; in this work, we predominantly use neural machine translation (NMT) as the machine learning component. For each OEIS sequence, the NMT model trained on previous examples typically creates 240 candidate programs using beam search. In the first iteration of this loop, programs are randomly constructed. Then, during the checking phase, the millions of proposed programs are checked to see whether they generate their target sequence or any other OEIS sequence. The smallest and fastest programs generating an OEIS sequence are kept to produce the training examples. In the learning phase, NMT trains on these examples to translate the "solved" OEIS sequences into the best program(s) generating them. This updates the weights of the NMT network, which influences the next search phase. Each iteration of the self-learning loop leads to the discovery of more solutions, as well as to the optimization of the existing solutions. One possible shape of this loop is sketched below.
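As a rough illustration of this loop, the following is a minimal sketch under assumed interfaces, not the authors' implementation: propose_programs, check_program, program_size and train_nmt are hypothetical helpers standing in for the NMT beam search, the program evaluator, the size measure and the training step.

```python
# Minimal sketch of the search / check / learn loop described above.
# The helpers propose_programs, check_program, program_size and train_nmt
# are hypothetical placeholders, not the authors' actual implementation.

def self_learning_loop(oeis, model, iterations=190, beam_width=240):
    shortest = {}  # seq_id -> (program, size) of the smallest verified program
    fastest = {}   # seq_id -> (program, runtime) of the most efficient one

    for _ in range(iterations):
        # Search phase: the trained translator proposes candidate programs
        # for every OEIS sequence via a wide beam search.
        for seq_id, seq in oeis.items():
            for prog in model.propose_programs(seq, beam_width):
                # Checking phase: does the candidate really generate the
                # sequence, and how long does its evaluation take?
                ok, runtime = check_program(prog, seq)
                if not ok:
                    continue
                size = program_size(prog)
                if seq_id not in shortest or size < shortest[seq_id][1]:
                    shortest[seq_id] = (prog, size)
                if seq_id not in fastest or runtime < fastest[seq_id][1]:
                    fastest[seq_id] = (prog, runtime)

        # Learning phase: retrain the translator to map each solved sequence
        # to the best (smallest and fastest) programs found for it so far.
        examples = [(oeis[s], p) for s, (p, _) in shortest.items()]
        examples += [(oeis[s], p) for s, (p, _) in fastest.items()]
        model = train_nmt(examples)

    return shortest, fastest
```

The sketch keeps the two winners per sequence independently, so a short-but-slow program and a long-but-fast one can both become training examples. In the actual system a candidate is also credited to any other OEIS sequence it happens to generate, a detail this simplification omits.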
This work builds mostly on our previous work in [10]. Our contributions are the following:
- The tree neural network is replaced by a relatively fast encoder-decoder NMT network (a bidirectional LSTM with attention), and the previously used MCTS search is replaced by a relatively wide beam search during the NMT decoding.
- Our objective function lets us collect both the smallest and the fastest programs for each sequence, instead of only the smallest programs as in [10].
- We introduce local and global definitions to the programming language. These definitions are created automatically, and the newly defined symbols (tokens) can be used by the NMT network to build more complex programs.
- Finally, our longest run finds solutions for more than 78000 OEIS sequences in 190 iterations, and all our experiments together have so far produced solutions for 84587 OEIS sequences. This is more than three times the number (27987) invented in our first experiments [10].

The rest of this paper is structured as follows. Section 2 describes in more detail the basic components of our system and our self-learning loop for the discovery of mathematical explanations: the OEIS datasets, our formula (programming) language including local and global definitions, and the checking phase, which implements the language of our expressions and their efficient evaluation. In Section 3, we describe the NMT network, its parameters, and the different training techniques such as continuous training and combined training; these methods contribute significantly to the final performance of the NMT runs. In Section 4, we do side experiments with the original tree neural network [10], optimizing its different parameters such as the embedding size and the search strategy; the final experiment of this section relies on the best parameters and is run for 500 iterations (instead of 25 in [10]) to create a strong baseline. In Section 5, we give details about our long-running experiments with the NMT network; we observe how they improve over the TNN baseline and analyze how introducing local macros and global macros affects the performance. Finally, in Section 6, we analyze the solutions produced during these runs and their evolution over time; we also measure the generalization performance of the smallest and the fastest solutions and show that the smallest solutions generalize better.

Section snippets

Components. In this section, we give a technical description of the components of our system: the OEIS datasets, the programming language and its representations, and the checking phase. […]

OEIS synthesis as an NMT task. Neural networks have in the last decade become competitive in language modeling and machine translation tasks, leading to applications in many areas. In particular, recurrent neural networks (RNNs) with attention [5] and transformers [37] have recently been applied to mathematical and symbolic tasks such as rewriting [24], autoformalization [38] and the synthesis of mathematical conjectures and proof steps. Many of these tasks are naturally formulated as sequence-to-sequence translation tasks. […]
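To make the sequence-to-sequence framing concrete, here is a hedged sketch of how a solved sequence and its program could be encoded as a source/target token pair for the translator. The digit-level tokenization, the <sep> separator and the prefix-notation program "* x x" are illustrative assumptions; this excerpt does not state the exact encoding used by the authors.

```python
# Hedged sketch: turning a solved OEIS sequence and its program into a
# source/target pair for a sequence-to-sequence (NMT) model.  The concrete
# tokenization (digit-level source tokens, a <sep> separator, a prefix-notation
# program) is assumed for illustration and is not taken from the paper.

def make_translation_example(terms, program_tokens, max_terms=16):
    # Source side: the first terms of the sequence, each integer spelled out
    # digit by digit so the source vocabulary stays small, with a separator.
    src = []
    for t in terms[:max_terms]:
        src.extend(list(str(t)) + ["<sep>"])
    # Target side: the program as a flat token sequence (prefix notation here).
    tgt = list(program_tokens)
    return " ".join(src), " ".join(tgt)

# Example: A000290 (the squares) paired with a hypothetical program "* x x".
src, tgt = make_translation_example([0, 1, 4, 9, 16, 25], ["*", "x", "x"])
# src == "0 <sep> 1 <sep> 4 <sep> 9 <sep> 1 6 <sep> 2 5 <sep>"
# tgt == "* x x"
```

During the search phase the trained model would then be decoded on the source side alone, with a wide beam producing many candidate target programs per sequence.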
Experiments with a tree neural network. In our previous work [10], a tree neural network (TNN) serves as the machine learning model. In the following, we test how varying a range of parameters influences the self-learning process. To test the limits of the TNN, we run the TNN-guided learning loop for 500 generations instead of the original 25; this last experiment of the section also provides a baseline for the NMT experiments. Varying parameters: we investigate the effect of three different parameters: the TNN embedding size, the […]

Basic run (nmt0). In the basic NMT run (nmt0), we run the loop in the way that is most similar to the TNN run. In particular, the checking phase is interleaved with training a single NMT model (model 0), which is then used for the search phase, implemented as NMT inference using a wide beam search. As in the TNN experiment, we start with the initial random iteration, which yields 3771 solutions. Then we run 100 iterations of the NMT-based learn-generate-check […]

Analysis of the results. We provide a statistical analysis of the 78118 solutions found during the nmt1 run and discuss the details of some techniques developed by our system. More information on the nmt runs is available in our anonymous repository. For some sequences, its subdirectory contains our analysis of the evolution and proliferation of important programs such as […]

Related work. The closest related recent work is [8], done in parallel to our project. The goal there is similar to ours, but their general approach is focused on training a single model using supervised learning techniques on synthetic data. In contrast, our approach is based on reinforcement learning: we start from scratch and keep training new models as we discover new OEIS solutions. The programs generated in [8] are recurrence relations defined by analytical formulas. Our language seems to be more […]

Conclusion. As of January 25, 2023, all of the runs have together invented from scratch solutions for 84587 OEIS sequences. This is more than three times the number (27987) invented in our first experiments [10] and is due to several improvements. We have started to collect both the small and the fast solutions and to learn jointly from them. We have seen that this gradually leads to a large speedup of the fast programs as the population of programs evolves. This likely allows the invention of further solutions […]

Declaration of Competing Interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements. We thank Chad Brown, David Cerna, Hugo Cisneros, Tom Hales, Barbora Hudcova, Jan Hula, Mikolas Janota, Tomas Mikolov, Jelle Piepenbrock, and Martin Suda for discussions, comments and suggestions. This work was partially supported by the CTU Global Postdoc funding scheme (TG), Czech Science Foundation project 20-06390Y (TG), ERC-CZ project POSTMAN no. LL1902 (TG), Amazon Research Awards (TG, JU), EU ICT-48 2020 project TAILOR no. 952215 (JU), and the European Regional Development Fund under the […]
References (39)

R.J. Solomonoff, A formal theory of inductive inference. Part I, Inf. Control (1964)
Marcin Andrychowicz et al., Hindsight experience replay, Adv. Neural Inf. Process. Syst. (2017)
Jasmin Christian Blanchette et al., Nitpick: a counterexample generator for higher-order logic based on a relational model finder
Lukas Bulwahn, The new Quickcheck for Isabelle - random, exhaustive and symbolic testing under one roof
Natasha Butt et al., Program synthesis for integer sequence generation
Jan Chorowski et al., Attention-based models for speech recognition
Koen Claessen et al., Automating inductive proofs using theory exploration
John Conway, The game of life, Sci. Am. (1970)
Stéphane d'Ascoli et al., Deep symbolic regression for recurrent sequences
Kevin Ellis et al., DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning
Thibault Gauthier et al., Learning program synthesis for integer sequences from scratch
Richard Wesley Hamming, Mathematics on a distant planet, Am. Math. Mon. (1998)
Edvard K. Holden et al., Heterogeneous heuristic optimisation and scheduling for first-order theorem proving
Jan Jakubův et al., Hierarchical invention of theorem proving strategies, AI Commun. (2018)
Jan Jakubův et al., Hammering Mizar by learning clause guidance
Cezary Kaliszyk et al., Reinforcement learning of theorem proving
Christian Krause, LODA: an assembly language, a computational model, and a distributed tool for mining programs
Christopher G. Langton, Artificial Life: An Overview (1997)
[…]



Keywords:

None provided


Journal
International Journal of Approximate Reasoning
ISSN: 0888-613X
From: Elsevier BV