See this page online at: http://www.bioscienceworld.ca/ThePILEsystem


The PILE system


By Yugo Acimovic

The Pile System was invented and designed by Erez Ellul, with contributions from Miriam Schlip, Peter Krieg and myself. Pile can be seen as a natural evolution of computing, even though it looks more like a radical shift in the way we think about data, procedures and the relationships between them. As with any new invention, opinions vary, from condemnation to strong belief. To me, and to anyone who has spent decades chasing genes and proteins through hundreds of poorly annotated databases, developing systems to represent natural processes, and trying to describe life using arrays, hashes and relational logic, Pile appears like an oasis in a sea of sand.

Through this text I will try to communicate how Pile differs from anything we have seen so far in computer science. To begin, I would like to mention the one feature essential for understanding Pile: the Pile System computes using relations and connections only.

In English, the word “pile” denotes a group or multiplicity of something that is organized naturally in a form that is sometimes easy and sometimes very difficult to describe, depending on the environment and the number of variables acting upon it. A pile of sand is one of the simplest piles to describe because its shape depends on a small number of variables, including gravity, humidity, pressure, grain size and wind.

In order to build a house we will need other piles, such as concrete, blocks, wires, wood, etc. Once we build the house, we can refer to it as a pile, since the house is nothing but a pile of piles organized by the rules defined by an architect and the laws of physics. In order to avoid oversimplification of the pile we are developing, the word “system” will be used to refer to various sorts of piles in the remainder of this text. Since there is no real advantage to using Pile instead of classical object-oriented programming when representing a simple system such as a building, I will gradually introduce more complex problems in which the advantages of Pile become very notable. Since it is impossible to fully describe the Pile System here, I strongly encourage you to read further at http://pilesys.com and http://sciux.com.

Properties of Pile
Pile always starts as a multiplicity of trees, growing and branching out as data is mapped. It grows in unusual ways so that branches start interacting with other branches, and even with roots, so that a set of initial trees evolves into a complex, yet very organized and co-ordinated, network looking more like a bush than a forest. When building more complex systems, we introduce heterogeneous data of different types and meanings simultaneously (for instance, a DNA sequence and its annotation) and want to link the two. In the process, two complex networks are being created, and our structure ends up as a highly connected and co-ordinated network of heterogeneous trees (a network of networks) in which any object (such as a leaf or a branch) knows its predecessors, descendants and external links.

In the case of DNA and annotation, this means that any DNA sequence knows all its subsequences, supersequences and annotation at any point in time.

An organism can be described as a system of dynamic coexisting systems. In almost all organisms (except some RNA viruses), everything starts from the DNA, where all characteristics are encoded and passed from generation to generation. In Pile, the Nucleic Acid is designated the base system, defined as a system that does not contain any other subsystems, i.e., one in which terminal values are basic variables, called primitives in many programming languages (int or char, for example). Input values, regardless of their complexity, are called terminal values (TVs). Any data derived by combining TVs are referred to as “objects” in the system. Every object must have two parents, a mother and a father, although a single object can take the role of both, and a parent can have further children with any other parent in the system. Parents in Pile are referred to as normative (N) and associative (A), so one object can be an N-parent, an A-parent, or both at the same time. This phenomenon is observable in nature in organisms that simultaneously exhibit sexual and asexual reproduction.

In addition to enabling objects to take multiple roles, Pile has every node point both to its parents and to its children. Every object in the system has a unique ID and a position in the structure, both of which are determined by the data it represents. This is a radical advancement because the data themselves can choose the best structure, rather than being forced into structures designed by developers. On the other hand, data representation is not rigid; the designer can manipulate it to meet problem-specific needs. In the Pile world, we refer to this feature as a Self-adjusting Structure (SAS). The best data structure is therefore determined by the data themselves, guided and adapted at a high level by the system designer and developer.
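The parent and child relations described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Pile implementation: the names RelationStore, terminal, relate and expand are hypothetical, and plain dictionary lookups stand in for whatever mechanism Pile actually uses. What it shows is that every non-terminal object has an N-parent and an A-parent, that one object may fill both roles, that every object lists its children, and that an object's ID is fixed by the data it represents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One object in the toy store; terminal values have no parents."""
    ident: int
    value: Optional[str] = None       # set only for terminal values (TVs)
    n_parent: Optional[int] = None    # normative parent ID
    a_parent: Optional[int] = None    # associative parent ID
    children: List[int] = field(default_factory=list)


class RelationStore:
    """Hypothetical store: objects are defined only by their relations."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self._terminals = {}   # value -> ID
        self._pairs = {}       # (N-parent ID, A-parent ID) -> child ID

    def terminal(self, value: str) -> int:
        """Return the ID of a terminal value, creating it on first use."""
        if value not in self._terminals:
            self.nodes.append(Node(ident=len(self.nodes), value=value))
            self._terminals[value] = len(self.nodes) - 1
        return self._terminals[value]

    def relate(self, n_id: int, a_id: int) -> int:
        """Return the child of two parents; the same pair yields the same ID."""
        key = (n_id, a_id)
        if key not in self._pairs:
            child = Node(ident=len(self.nodes), n_parent=n_id, a_parent=a_id)
            self.nodes.append(child)
            self.nodes[n_id].children.append(child.ident)
            if a_id != n_id:  # one object acting as both parents is listed once
                self.nodes[a_id].children.append(child.ident)
            self._pairs[key] = child.ident
        return self._pairs[key]

    def expand(self, ident: int) -> str:
        """Rebuild the data an object represents by walking up to its TVs."""
        node = self.nodes[ident]
        if node.value is not None:
            return node.value
        return self.expand(node.n_parent) + self.expand(node.a_parent)


store = RelationStore()
a, t = store.terminal("A"), store.terminal("T")
at = store.relate(a, t)      # "A" is the N-parent, "T" the A-parent
aa = store.relate(a, a)      # "A" plays both roles
ata = store.relate(at, a)    # a derived object can itself be a parent
print(store.expand(ata))         # ATA
print(store.nodes[a].children)   # [2, 3, 4]
```

Because the child of a given pair is created only once, asking for the same relation again returns the existing object, which is one simple way to read the statement that position is determined by the data.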

Looking at Figure 1, one can easily observe native compression capabilities of Pile when representing DNA sequences, for example. This observation is valid for any highly redundant data set. A computer scientist with a careful eye might question compression capabilities by arguing that keeping records (pointers) from one string to many places will diminish the compression advantages that nonredundancy introduces. In Pile, however, that is not the case, since it does not deal with classical pointers, but carefully crafted combinative pointers (patented CPs).
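The effect of non-redundant storage on a repetitive sequence can be shown with a short Python sketch. It uses ordinary structural sharing (each distinct pair of neighbours is stored once) as a stand-in; it does not reproduce Pile's combinative pointers, and the function names intern and encode are hypothetical.

```python
def intern(ids, key):
    """Return the ID for key, assigning the next free integer on first sight."""
    return ids.setdefault(key, len(ids))


def encode(ids, sequence):
    """Fold a string into one root ID by pairing neighbours level by level.

    Terminal characters and (left, right) ID pairs share a single table,
    so any pair that occurs again reuses the object already stored.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    level = [intern(ids, ch) for ch in sequence]
    while len(level) > 1:
        paired = [intern(ids, (level[i], level[i + 1]))
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # an unpaired last item moves up unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


ids = {}
root = encode(ids, "ACGT" * 64)   # 256 characters of highly redundant input
print(len(ids))                   # 13 stored objects in total
```

The 256-character input needs only 13 stored objects: four terminal characters and nine pair objects. This simple pairing scheme is sensitive to alignment, so a sequence shifted by one position would share far less; it is meant only to make the redundancy argument concrete.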

In addition to knowing its position within one tree (DNA sequence), an object can belong to multiple trees, and a new tree can start anywhere within the old tree’s structure. This allows seamless integration of heterogeneous data, for example, DNA sequence, protein sequence and their annotation.

The strongest advantage of Pile lies in its ability to access physical memory in an associative manner. Say we are looking for the name “Sam Gamgee” in a set of thousands of books in our electronic library. Established algorithms such as Boyer-Moore (a fast string-searching algorithm that preprocesses only the search pattern, not the text being searched) would have to search through millions of lines of text in order to find which book, and which page in that book, contains this name. Pile, on the other hand, will access its address in one operation, because the string “Sam Gamgee” played a role in how the data structure was AutoDesigned when we loaded The Lord of the Rings (LOTR) into our e-book collection.

I have intentionally used a name that many of you are familiar with. Anyone who has watched the LOTR movies, or read the books, knows where to look to find the name instantly. That is how long it takes for Pile to find our query: no time at all. This observation suggests that Pile works in ways similar to our brain, with one important exception: brains forget, and Pile remembers everything! How many of us will make an effort to remember the following short, and important, amino acid sequence: LYAKMIQKLADLRSLNEEHSKQYR (motif-6 of steroid hormone nuclear receptor)? I doubt any of us will, but Pile certainly does.

Pile can access, update and delete data in real time, with speeds unmatched by any indexing method or algorithm known today, all because it is able to mimic associative memory in a very effective way. Here is an example I like to use to describe this extraordinary capability of Pile: Think about your best friend (P1), and a friend common to both of you (P2). If you ask P1 anything about P2, P1 will instantly know to whom you are referring. Pile will do the same. In contrast, current algorithms working on unindexed data would need to search through billions of records.
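The contrast between scanning text at query time and consulting a structure built at load time can be sketched in Python using the “Sam Gamgee” example. The sketch relies on a conventional dictionary of word pairs, which is only an analogy for associative access and says nothing about how Pile itself is built or how fast it is. The library contents are invented placeholder strings, not quotations, and build_index and scan are hypothetical names.

```python
from collections import defaultdict


def build_index(library, phrase_len=2):
    """At load time, map each run of phrase_len words to its (book, page) hits."""
    index = defaultdict(set)
    for book, pages in library.items():
        for page_no, text in enumerate(pages, start=1):
            words = text.split()
            for i in range(len(words) - phrase_len + 1):
                index[" ".join(words[i:i + phrase_len])].add((book, page_no))
    return index


def scan(library, phrase):
    """At query time, read every page of every book looking for the phrase."""
    return {(book, page_no)
            for book, pages in library.items()
            for page_no, text in enumerate(pages, start=1)
            if phrase in text}


# Toy library: placeholder sentences, one string per page.
library = {
    "LOTR": ["Frodo left the Shire", "Sam Gamgee followed him"],
    "Cookbook": ["Boil the potatoes"],
}

index = build_index(library)
print(index.get("Sam Gamgee", set()))   # one dictionary lookup
print(scan(library, "Sam Gamgee"))      # touches every page
```

Both calls report page 2 of the toy LOTR entry. The difference is where the work happens: the scan reads the whole collection for every query, while the index pays that cost once when the book is loaded.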

In conclusion, Pile features scalability, flexibility, compression, integration and speed. Sciux Inc. (Toronto, ON) is developing the system that will demonstrate all these features.