Development Tutorial Genome
This document discusses the implementation of the cInstruction and Genome classes.
The cInstruction class is used to represent a single instruction within a genome. The private portion of this class consists of a single number that uniquely identifies the type of instruction, and the public section has a number of helper methods that allow us to work with that number.
    class cInstruction {
    private:
      unsigned char m_operand;

    public:
      // Constructors and Destructor...
      cInstruction() : m_operand(0) { ; }
      cInstruction(const cInstruction& inst) { *this = inst; }
      explicit cInstruction(int in_op) { SetOp(in_op); }
      ~cInstruction() { ; }

      // Accessors...
      int GetOp() const { return static_cast<int>(m_operand); }
      void SetOp(int in_op) { assert(in_op < 256); m_operand = in_op; }

      // Operators...
      void operator=(const cInstruction& inst) { m_operand = inst.m_operand; }
      bool operator==(const cInstruction& inst) const { return (m_operand == inst.m_operand); }
      bool operator!=(const cInstruction& inst) const { return !(operator==(inst)); }

      // Some extra methods to convert to and from alpha-numeric symbols...
      char GetSymbol() const;
      void SetSymbol(char symbol);
    };
As stated above, the only private datum is a numerical value that identifies this instruction. The name m_operand borrows loosely from assembly-language terminology: strictly speaking, the command name in an assembly language is the opcode, and the arguments that accompany it are the operands. In Avida, the commands have no arguments, so each one consists of this single value. The data type used for it is unsigned char. A char is an 8-bit number on essentially every platform, so when it is unsigned it represents a number from 0 to 255. As Avida is currently implemented, we are therefore limited to 256 distinct instructions. To the outside world, we treat the instruction operand like an integer (GetOp() and SetOp() both work with int), so it would be easy to modify this class were we ever to need more than 256 instructions in the set. The only reason to limit it to 8 bits internally (rather than the typical 32 of an int) is to save memory when we have a very large number of instructions throughout a population. Instructions already make up over half of all the memory resources used by Avida.
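The memory argument can be made concrete with a small sketch. NarrowInst, WideInst, and ArrayBytes below are hypothetical stand-ins written for this illustration; they are not part of Avida. The sketch only compares how many bytes a contiguous array of instructions would need with 8-bit storage versus int storage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are not Avida classes.
struct NarrowInst { unsigned char op; };  // 8-bit storage, as cInstruction uses
struct WideInst   { int op; };            // what an int-backed variant would store

// Bytes needed to hold `count` instructions of type Inst in a contiguous array.
template <typename Inst>
constexpr std::size_t ArrayBytes(std::size_t count) {
  return count * sizeof(Inst);
}

// sizeof(unsigned char) is 1 by definition; sizeof(int) is platform-dependent
// (commonly 4), so the exact saving varies by platform.
static_assert(sizeof(unsigned char) == 1, "guaranteed by the C++ standard");
static_assert(sizeof(WideInst) >= sizeof(NarrowInst), "int is never narrower than char");
```

For example, ArrayBytes<NarrowInst>(1000000) and ArrayBytes<WideInst>(1000000) give the footprint of one million instructions under each representation; on a platform with a 4-byte int, the second is four times the first.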
The public methods begin with the GetOp() and SetOp() methods, which are standard accessors. Next, we have a collection of methods that begin with the word 'operator'. These are used to define how the corresponding symbols should be treated when applied to objects of this class. For example, the method operator==() is called when we try to compare an object of type cInstruction to another. We have full control over the definition of this method, just like any other.
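As a minimal sketch of how the operator methods are invoked, the stand-in type below (DemoInst, a hypothetical simplification of cInstruction) defines the same two comparison operators. Writing a == b in client code is simply another spelling of a.operator==(b).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for cInstruction (illustration only).
struct DemoInst {
  unsigned char op;

  explicit DemoInst(int in_op) : op(static_cast<unsigned char>(in_op)) {}

  // Called whenever two DemoInst objects are compared with ==.
  bool operator==(const DemoInst& other) const { return op == other.op; }

  // Defined in terms of operator==, mirroring the class above.
  bool operator!=(const DemoInst& other) const { return !(*this == other); }
};

// The expression `a == b` compiles to a call to a.operator==(b).
inline bool SameInstruction(const DemoInst& a, const DemoInst& b) {
  return a == b;
}
```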
Finally, we have a pair of methods that convert instructions to and from alphanumeric characters (symbols). These methods are used to print instructions out in a maximally compressed format, and to load them back in. The symbols are assigned in order: the lowercase letters 'a' through 'z', followed by the uppercase letters 'A' through 'Z', and finally the digits '0' through '9'. If there are more than 62 possible instructions, all the rest are assigned a '?' when printed out, and cannot be read back in properly from this format.
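The symbol ordering described above can be sketched as a pair of free functions. This is an illustrative sketch, not the actual bodies of GetSymbol() and SetSymbol(), which are not shown in this document. The names SymbolTable, OpToSymbol, and SymbolToOp are hypothetical, and returning -1 for an unrecognized symbol is an assumption made here to signal that the symbol cannot be read back.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the 62 symbols in the order the text describes.
inline const std::string& SymbolTable() {
  static const std::string table =
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789";
  return table;
}

// Map an instruction number to its symbol; anything outside 0..61 prints as '?'.
inline char OpToSymbol(int op) {
  const std::string& table = SymbolTable();
  if (op < 0 || op >= static_cast<int>(table.size())) return '?';
  return table[static_cast<std::string::size_type>(op)];
}

// Map a symbol back to its instruction number.
// Assumption for this sketch: -1 marks a symbol (such as '?') that has no number.
inline int SymbolToOp(char symbol) {
  const std::string::size_type pos = SymbolTable().find(symbol);
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

Because every instruction numbered 62 or higher collapses to the same '?', the mapping is not invertible beyond the first 62 instructions, which is why such genomes cannot be reloaded from this format.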
A genome is a sequence of instructions. The following class maintains this sequence as an array, and provides a collection of methods to manipulate the total construct.
    class Genome {
    protected:
      Apto::Array<cInstruction> genome;
      int active_size;

    public:
      Genome() { ; }
      explicit Genome(int _size);
      Genome(const Genome& in_genome);
      Genome(const cString& in_string);
      virtual ~Genome();

      virtual void operator=(const Genome& other_genome);
      virtual bool operator==(const Genome& other_genome) const;
      virtual bool operator!=(const Genome& other_genome) const { return !(this->operator==(other_genome)); }
      virtual bool operator<(const Genome& other_genome) const { return AsString() < other_genome.AsString(); }

      cInstruction& operator[](int index) { assert(index >= 0 && index < active_size); return genome[index]; }
      const cInstruction& operator[](int index) const { assert(index >= 0 && index < active_size); return genome[index]; }

      virtual void Copy(int to, int from);

      int GetSize() const { return active_size; }
      cString AsString() const;
    };
The protected variable genome is an array containing an object of type cInstruction at each position. The second variable active_size denotes the number of instructions in this array that are currently being used. The fact that these variables are "protected" instead of "private" means that any class derived from Genome will also have direct access to the variables. In particular, the class cCPUMemory extends Genome, adding methods to alter the array length and new variables to keep track of information about each instruction.
Besides the default constructor, three constructors allow a new Genome object to be specified by a genome length, a previously created genome, or a string, that is, a sequence of symbols representing each instruction in order.
The operators created for manipulating genomes include both assignment (setting one genome equal to another) and comparison (testing to see if two genomes are identical). Additionally, there are two operator[] methods. This means that if you have an object of type Genome, you can index into it to retrieve a single instruction. Thus, if the object was called initial_genome, the statement initial_genome[15] would return the instruction at index 15 in the genome (the sixteenth instruction, since indexing starts at zero). This occurs by calling one of these methods with the appropriate integer. The difference between the two operator[] methods is that one of them is for mutable genomes (i.e. those that can be modified): it returns a reference to the instruction in question, allowing it to be altered. The other is for const genomes, which can never be changed, so it returns a const reference through which the instruction can be read but not modified.
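The two indexing operators can be illustrated with a simplified stand-in. In the sketch below, Inst and MiniGenome are hypothetical reductions of cInstruction and Genome, and std::vector is used in place of Apto::Array; only the const and non-const operator[] pair follows the class shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for cInstruction (illustration only).
struct Inst {
  unsigned char op = 0;
};

// Hypothetical stand-in for Genome; std::vector replaces Apto::Array here.
class MiniGenome {
 public:
  // `size` must be non-negative.
  explicit MiniGenome(int size)
      : genome_(static_cast<std::size_t>(size)), active_size_(size) {}

  // Chosen for non-const objects: the caller may modify the instruction.
  Inst& operator[](int index) {
    assert(index >= 0 && index < active_size_);
    return genome_[static_cast<std::size_t>(index)];
  }

  // Chosen for const objects: the caller may only read the instruction.
  const Inst& operator[](int index) const {
    assert(index >= 0 && index < active_size_);
    return genome_[static_cast<std::size_t>(index)];
  }

  int GetSize() const { return active_size_; }

 private:
  std::vector<Inst> genome_;
  int active_size_;
};

// Takes a const reference, so the const overload of operator[] is selected.
inline int ReadOp(const MiniGenome& g, int index) { return g[index].op; }

// Takes a non-const reference, so the mutable overload is selected.
inline void WriteOp(MiniGenome& g, int index, int op) {
  g[index].op = static_cast<unsigned char>(op);
}
```

Inside ReadOp, an assignment such as g[index].op = 1 would fail to compile, which is the protection the const overload provides.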
The Copy() method is a shortcut to copy an instruction from one position in the genome to another. Because it is virtual, cCPUMemory will later override it (that is, replace it with a newer version) so that the flags associated with the instruction are copied along with it, and other flags are set to mark the instruction as copied for future tests. GetSize() returns the length of the genome, and AsString() returns a string whose symbol at each position corresponds to the instruction at the same position in the genome.
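A rough sketch of how a derived class can replace Copy() follows. BaseGenome and FlaggedGenome are hypothetical names invented for this illustration, and the single "copied" flag is an assumed simplification; the actual flags kept by cCPUMemory are not described in this document. The sketch also shows why the base-class storage is protected: the derived class builds on it directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Genome (illustration only).
class BaseGenome {
 protected:
  std::vector<unsigned char> genome;  // protected: visible to derived classes

 public:
  explicit BaseGenome(std::size_t size) : genome(size, 0) {}
  virtual ~BaseGenome() = default;

  // Base behavior: copy only the instruction itself.
  virtual void Copy(std::size_t to, std::size_t from) {
    assert(to < genome.size() && from < genome.size());
    genome[to] = genome[from];
  }

  unsigned char Get(std::size_t i) const { return genome.at(i); }
  void Set(std::size_t i, unsigned char op) { genome.at(i) = op; }
};

// Hypothetical derived class that tracks one extra flag per instruction.
class FlaggedGenome : public BaseGenome {
  std::vector<bool> copied;  // assumed flag: true once a position was copied into

 public:
  explicit FlaggedGenome(std::size_t size)
      : BaseGenome(size), copied(size, false) {}

  // Replaces the base version: copy the instruction, then mark the target.
  void Copy(std::size_t to, std::size_t from) override {
    BaseGenome::Copy(to, from);
    copied[to] = true;
  }

  bool IsCopied(std::size_t i) const { return copied.at(i); }
  std::size_t Length() const { return genome.size(); }  // direct protected access
};
```

Because Copy() is virtual, calling it through a BaseGenome reference that refers to a FlaggedGenome still runs the derived version, so code written against the base class picks up the flag handling automatically.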