
Development Tutorial Genome

Matthew Andres Moreno edited this page May 13, 2023 · 1 revision

This document discusses the implementation of the cInstruction and Genome classes.

 

The cInstruction Class

This class is used to represent a single instruction within a genome. The private portion of this class consists of a single number that uniquely identifies the type of instruction, and the public section has a number of helper methods that allow us to work with that number.

class cInstruction {
private:
  unsigned char m_operand;

public:
  // Constructors and Destructor...
  cInstruction() : m_operand(0) { ; }
  cInstruction(const cInstruction& inst) { *this = inst; }
  explicit cInstruction(int in_op) { SetOp(in_op); }
  ~cInstruction() { ; }

  // Accessors...
  int GetOp() const { return static_cast<int>(m_operand); }
  void SetOp(int in_op) { assert(in_op < 256); m_operand = in_op; }

  // Operators...
  void operator=(const cInstruction& inst) { m_operand = inst.m_operand; }
  bool operator==(const cInstruction& inst) const { return (m_operand == inst.m_operand); }
  bool operator!=(const cInstruction& inst) const { return !(operator==(inst)); }

  // Some extra methods to convert to and from alphanumeric symbols...
  char GetSymbol() const;
  void SetSymbol(char symbol);
};

As stated above, the only private datum is a numerical value that identifies this instruction. The name m_operand is used here for the command itself; note that conventional assembly terminology calls the command name the opcode and reserves "operand" for its arguments. Most assembly languages have both a command name and arguments associated with each full command. In Avida, the commands have no arguments, and hence each consists of just this single value. The data type used for this is unsigned char. A char is an 8-bit number on virtually all current platforms, so when it is unsigned, it represents a number from 0 to 255. As Avida is currently implemented, we are therefore limited to 256 distinct instructions. To the outside world, we treat the instruction operand like an integer, so it would be easy to modify this class were we ever to need more than 256 instructions in the set. The only reason to limit it to 8 bits internally (rather than the 32 of a typical int) is to save memory when we have a very large number of instructions throughout a population. Instructions already make up over half of all the memory resources used by Avida.
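To make the memory argument concrete, here is a minimal sketch comparing the storage cost of a one-byte instruction with an int-sized one. ByteInst, IntInst and StorageBytes are hypothetical names used only for this illustration; they are not part of Avida.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not Avida classes: one stores the instruction
// number in a single byte (as cInstruction does), the other in a full int.
struct ByteInst { unsigned char op; };
struct IntInst  { int op; };

// Bytes needed to store `count` instructions of type T contiguously.
template <typename T>
std::size_t StorageBytes(std::size_t count) { return count * sizeof(T); }
```

On a platform with a 32-bit int, StorageBytes<IntInst>(n) is four times StorageBytes<ByteInst>(n), which is the saving the paragraph above refers to.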

The public methods begin with the GetOp() and SetOp() methods, which are standard accessors. Next, we have a collection of methods that begin with the word 'operator'. These are used to define how the corresponding symbols should be treated when applied to objects of this class. For example, the method operator==() is called when we try to compare an object of type cInstruction to another. We have full control over the definition of this method, just like any other.
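As a small illustration of how an operator method is invoked, the sketch below uses a trimmed stand-in for cInstruction (called Inst here, a hypothetical name) and shows that writing a == b runs the same method as calling a.operator==(b) directly.

```cpp
#include <cassert>

// Trimmed, hypothetical stand-in for cInstruction, keeping only what the
// comparison operators need.
class Inst {
  unsigned char m_operand;
public:
  explicit Inst(int in_op) : m_operand(static_cast<unsigned char>(in_op)) {}
  bool operator==(const Inst& other) const { return m_operand == other.m_operand; }
  bool operator!=(const Inst& other) const { return !(*this == other); }
};

// `a == b` is shorthand for `a.operator==(b)`; both forms run the same method.
inline bool SameViaSymbol(const Inst& a, const Inst& b) { return a == b; }
inline bool SameViaCall(const Inst& a, const Inst& b) { return a.operator==(b); }
```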

Finally, we have a pair of methods that convert instructions to and from alphanumeric characters (symbols). These methods are used to print instructions out in a maximally compressed format, and to load them back in. The symbols are assigned in order: the lowercase letters 'a' through 'z', followed by the uppercase letters 'A' through 'Z', and finally the digits '0' through '9'. If there are more than 62 possible instructions, all the rest are assigned a '?' when printed out, and cannot be read back in properly from this format.
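The symbol ordering described above can be sketched as a simple lookup table. This is not the Avida implementation of GetSymbol() and SetSymbol(); OpToSymbol, SymbolToOp and kSymbols are hypothetical names, and returning -1 for an unknown symbol is an assumption made for this sketch.

```cpp
#include <cassert>
#include <string>

// Symbol table in the order the text describes: a-z, A-Z, then 0-9.
static const std::string kSymbols =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789";

// Instruction number -> symbol; numbers outside the table print as '?'.
inline char OpToSymbol(int op) {
  if (op < 0 || op >= static_cast<int>(kSymbols.size())) return '?';
  return kSymbols[static_cast<std::size_t>(op)];
}

// Symbol -> instruction number; returns -1 (an assumption of this sketch)
// for anything not in the table, including '?'.
inline int SymbolToOp(char symbol) {
  const std::size_t pos = kSymbols.find(symbol);
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

Because every instruction numbered 62 or higher maps to the same '?', the reverse lookup cannot tell them apart, which is why such genomes cannot be reloaded from this format.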

 

The Genome Class

A genome is a sequence of instructions. The following class maintains this sequence as an array, and provides a collection of methods to manipulate the total construct.

class Genome
{
protected:
  Apto::Array<cInstruction> genome;
  int active_size;
 
public:
  Genome() { ; }
  explicit Genome(int _size);
  Genome(const Genome& in_genome);
  Genome(const cString& in_string);
  virtual ~Genome();
 
  virtual void operator=(const Genome& other_genome);
  virtual bool operator==(const Genome& other_genome) const;
  virtual bool operator!=(const Genome& other_genome) const { return !(this->operator==(other_genome)); }
  virtual bool operator<(const Genome& other_genome) const { return AsString() < other_genome.AsString(); }
 
  cInstruction& operator[](int index) { assert(index >= 0 && index < active_size);  return genome[index]; }
  const cInstruction& operator[](int index) const { assert(index >= 0 && index < active_size);  return genome[index]; }
 
  virtual void Copy(int to, int from);
 
  int GetSize() const { return active_size; }
  cString AsString() const;
};

The protected variable genome is an array containing an object of type cInstruction at each position. The second variable active_size denotes the number of instructions in this array that are currently being used. The fact that these variables are "protected" instead of "private" means that any class derived from Genome will also have direct access to the variables. In particular, the class cCPUMemory extends Genome, adding methods to alter the array length and new variables to keep track of information about each instruction.
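The effect of "protected" can be sketched with a miniature base class and a derived class that resizes the array directly. SizedGenome and ResizableGenome are hypothetical names, std::vector<int> stands in for Apto::Array<cInstruction>, and the Resize() method is an assumption in the spirit of cCPUMemory rather than its actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of Genome with the same two protected members.
class SizedGenome {
protected:
  std::vector<int> genome;  // stand-in for Apto::Array<cInstruction>
  int active_size;
public:
  explicit SizedGenome(int size)
    : genome(static_cast<std::size_t>(size), 0), active_size(size) {}
  virtual ~SizedGenome() {}
  int GetSize() const { return active_size; }
};

// Hypothetical derived class: because the members are protected rather than
// private, it can change the array length and active_size directly.
class ResizableGenome : public SizedGenome {
public:
  explicit ResizableGenome(int size) : SizedGenome(size) {}
  void Resize(int new_size) {
    genome.resize(static_cast<std::size_t>(new_size), 0);
    active_size = new_size;
  }
};
```

Had the members been private, Resize() would not compile, and the base class would need to expose its own resizing interface.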

Besides the default constructor, three constructors allow a new Genome object to be specified by a genome length, a previously created genome, or a string -- a sequence of symbols representing each instruction in order.
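The three ways of building a genome can be sketched as follows. SymbolGenome is a hypothetical miniature that stores instruction numbers as ints and uses std::string in place of cString; mapping unknown symbols to instruction 0 is an assumption of this sketch, not documented Avida behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of Genome; symbols follow the a-z, A-Z, 0-9 order.
class SymbolGenome {
  std::vector<int> m_ops;
  static const std::string& Symbols() {
    static const std::string table =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return table;
  }
public:
  // From a length: every position starts at instruction 0.
  explicit SymbolGenome(int size) : m_ops(static_cast<std::size_t>(size), 0) {}
  // From a string: one instruction per symbol. Unknown symbols become
  // instruction 0 (an assumption made for this sketch only).
  explicit SymbolGenome(const std::string& text) {
    for (char c : text) {
      const std::size_t pos = Symbols().find(c);
      m_ops.push_back(pos == std::string::npos ? 0 : static_cast<int>(pos));
    }
  }
  // From another genome: the compiler-generated copy constructor suffices here.
  int GetSize() const { return static_cast<int>(m_ops.size()); }
  std::string AsString() const {
    std::string out;
    for (int op : m_ops) out += Symbols()[static_cast<std::size_t>(op)];
    return out;
  }
};
```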

The operators created for manipulating genomes include both assignment (setting one genome equal to another) and comparison (testing to see if two genomes are identical). Additionally, there are two operator[] methods. This means that if you have an object of type Genome, you can index into it to retrieve a single instruction. Thus, if the object was called initial_genome, the statement initial_genome[15] would return the instruction at index 15 (the sixteenth instruction, since indices start at zero). This occurs by calling one of these methods with the appropriate integer. The difference between the two operator[] methods is that one of them is for mutable genomes (i.e. those that can be modified) -- a reference to the instruction in question is returned, allowing it to be altered. The other is for const genomes, which can never be changed, so a const reference is returned: the instruction can be read but not modified.
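The two index operators can be seen side by side in a minimal sketch. IndexedGenome and WriteThenRead are hypothetical names, and std::vector<int> stands in for Apto::Array<cInstruction>; the compiler picks the overload based on whether the object is const.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of Genome with both index operators.
class IndexedGenome {
  std::vector<int> m_insts;
public:
  explicit IndexedGenome(int size) : m_insts(static_cast<std::size_t>(size), 0) {}
  int GetSize() const { return static_cast<int>(m_insts.size()); }

  // Chosen for non-const objects: the returned reference can be assigned to.
  int& operator[](int index) {
    assert(index >= 0 && index < GetSize());
    return m_insts[static_cast<std::size_t>(index)];
  }
  // Chosen for const objects: the returned reference is read-only.
  const int& operator[](int index) const {
    assert(index >= 0 && index < GetSize());
    return m_insts[static_cast<std::size_t>(index)];
  }
};

// Writes through the mutable overload, then reads back through the const one.
inline int WriteThenRead(int index, int op) {
  IndexedGenome genome(20);
  genome[index] = op;                  // non-const operator[]
  const IndexedGenome& view = genome;
  // view[index] = op;                 // would not compile: const operator[]
  return view[index];
}
```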

The Copy() method is a shortcut to copy memory from one position in the genome to another. This method will later be overridden (that is, replaced with a newer version) by cCPUMemory such that the proper flags will be copied with the instruction, and others will be set to mark the instruction as copied for future tests. GetSize() returns the length of the genome, and AsString() returns a string whose symbol at each position corresponds to the instruction at the same position in the genome.
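How a derived class can replace Copy() is sketched below. BaseGenome and FlaggedGenome are hypothetical miniatures, and the single "copied" flag per position is an assumption standing in for whatever flags cCPUMemory actually tracks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of Genome; instructions are plain ints here.
class BaseGenome {
protected:
  std::vector<int> genome;
public:
  explicit BaseGenome(int size) : genome(static_cast<std::size_t>(size), 0) {}
  virtual ~BaseGenome() {}
  virtual void Copy(int to, int from) { genome[to] = genome[from]; }
  void Set(int index, int op) { genome[index] = op; }
  int Get(int index) const { return genome[index]; }
};

// Hypothetical stand-in for cCPUMemory: one "copied" flag per position.
class FlaggedGenome : public BaseGenome {
  std::vector<bool> copied;
public:
  explicit FlaggedGenome(int size)
    : BaseGenome(size), copied(static_cast<std::size_t>(size), false) {}
  void Copy(int to, int from) override {
    BaseGenome::Copy(to, from);  // copy the instruction itself
    copied[to] = true;           // mark the destination as copied
  }
  bool WasCopied(int index) const { return copied[index]; }
};

// Calls Copy() through a base reference; because Copy() is virtual, the
// derived version runs when the object is a FlaggedGenome.
inline void CopyViaBase(BaseGenome& g, int to, int from) { g.Copy(to, from); }
```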
