Tuesday, December 20, 2011

Tinkering with evolution: Ecological implications of modular software networks

Evolution of the modular structure of the network of dependencies between packages of the Debian GNU/Linux operating system. Packages are represented by nodes. A green arrow from package i to package j indicates that package i depends on package j, and a red arrow indicates that package i has a conflict with package j. Packages within a module (depicted by a big circle) have many dependencies between themselves and only a few with packages from other modules. During the growth of the operating system, the modular structure of the network of dependencies has increased: (i) the new packages added in successive releases depended mainly on previously existing packages within the same module, and hence, the size of the modules created in earlier releases increased over time; (ii) the number of modules also increased, although the new modules consisted only of a few new packages; and (iii) the relative number of dependencies between packages from different modules decreased. Moreover, the relative number of conflicts between packages from different modules decreased, whereas those within modules increased through the different releases of the operating system.
(PhysOrg.com) -- In the 1960s, Dr. Lawrence J. Fogel introduced what would come to be known as evolutionary programming to the nascent field of Artificial Intelligence in an attempt to produce intelligent software without relying on neural networks modeled on the brain or on heuristic programming based on human expertise. Now, researchers in the Department of Ecology and Evolutionary Biology at Princeton University have shown the inverse – namely, that network theory, when applied to software systems, provides surprising insights into biology, ecology and evolution. Specifically, they explored evolutionary behavior in complex systems by analyzing how the Debian GNU/Linux operating system uses modular code. The researchers found that the way the network becomes more modular over time, across various OS installations, often parallels what is seen in ecological relationships between interacting species.

Lead researcher Miguel A. Fortuna, who worked with Juan A. Bonachela and Prof. Simon A. Levin, Director of Princeton’s Center for BioComplexity, describes the main challenges they encountered in designing and implementing the methods used to analyze the evolution of the OS. “The main difficulty we had was getting, organizing, and storing the data,” says Fortuna. “Notice that the network of interdependent packages of the last release analyzed was composed of more than 100,000 dependencies.” This complexity required that they use Structured Query Language (SQL) for managing databases. “We were very careful when identifying software packages through different releases – sometimes there could be different versions of the same package within the same release due to the improvements made by developers.”
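To make the data-handling problem concrete, here is a minimal sketch of how per-release package and relation data might be stored and queried with SQL. The schema, table names, release labels and package names are all hypothetical; the article does not describe the team's actual database layout. The sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical schema: one row per (release, package name, version),
# and one row per dependency or conflict recorded within a release.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (
    release_name TEXT NOT NULL,
    name         TEXT NOT NULL,
    version      TEXT NOT NULL,
    PRIMARY KEY (release_name, name, version)
);
CREATE TABLE relation (
    release_name TEXT NOT NULL,
    source       TEXT NOT NULL,
    target       TEXT NOT NULL,
    kind         TEXT NOT NULL CHECK (kind IN ('depends', 'conflicts'))
);
""")

# Made-up toy data, not taken from Debian.
conn.executemany(
    "INSERT INTO package VALUES (?, ?, ?)",
    [
        ("r1", "libgui", "1.0"),
        ("r1", "editor", "1.0"),
        ("r2", "libgui", "1.0"),
        ("r2", "libgui", "1.1"),  # two versions of one package in one release
        ("r2", "editor", "2.0"),
        ("r2", "plugin", "0.1"),
    ],
)
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?, ?)",
    [
        ("r1", "editor", "libgui", "depends"),
        ("r2", "editor", "libgui", "depends"),
        ("r2", "plugin", "editor", "depends"),
        ("r2", "plugin", "oldplugin", "conflicts"),
    ],
)

# Count packages per release by name, so multiple versions count once.
package_counts = conn.execute(
    "SELECT release_name, COUNT(DISTINCT name) FROM package "
    "GROUP BY release_name ORDER BY release_name"
).fetchall()

# Count dependencies and conflicts per release.
relation_counts = conn.execute(
    "SELECT release_name, kind, COUNT(*) FROM relation "
    "GROUP BY release_name, kind ORDER BY release_name, kind"
).fetchall()

print(package_counts)
print(relation_counts)
```

Counting distinct names rather than rows is one simple way to keep several versions of the same package in a release from being treated as separate packages.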
While Fortuna notes that quantifying the increase of the code’s modular structure over time was the main insight of their study, he points out that reuse of code and software’s hierarchical structure were suggested by the pioneering work of Ricard V. Solé and Sergi Valverde in the early 2000s. “The interest that our paper has drawn has helped us to discover work on software systems that we did not know about. The idea of using the network of dependencies and conflicts of different releases of Debian as a case study has facilitated the understanding of how code development evolves over time without the need to go deeper into the details of the code itself.”
Another key innovation cited by Fortuna was the team’s use of a very precise method to detect the modular structure of the operating system. “We borrowed an algorithm developed by physicists and widely used in ecology nowadays. In fact, this work has been constantly enriched by an interdisciplinary mixture of ideas from biology and physics.”
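The article does not name the algorithm the team borrowed. As an illustration only, the sketch below computes two quantities commonly used to judge how modular a given partition of a network is: the Newman-Girvan modularity score Q, and the fraction of links that cross module boundaries. It assumes an undirected view of the network (the Debian dependency network is directed), and the toy graph and module labels are invented.

```python
def modularity(edges, module_of):
    """Newman-Girvan modularity Q of a partition, ignoring edge direction.

    Q = sum over modules of (edges inside / total edges)
        - (summed degree of module / (2 * total edges)) ** 2
    """
    total = len(edges)
    within = {}  # module -> number of edges with both ends inside it
    degree = {}  # module -> summed degree of its nodes
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        degree[mu] = degree.get(mu, 0) + 1
        degree[mv] = degree.get(mv, 0) + 1
        if mu == mv:
            within[mu] = within.get(mu, 0) + 1
    return sum(
        within.get(m, 0) / total - (degree[m] / (2 * total)) ** 2
        for m in degree
    )


def cross_module_fraction(edges, module_of):
    """Fraction of edges whose endpoints lie in different modules."""
    crossing = sum(1 for u, v in edges if module_of[u] != module_of[v])
    return crossing / len(edges)


# Invented toy network: two triangles joined by a single link.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
module_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity(edges, module_of)
crossing = cross_module_fraction(edges, module_of)
print(round(q, 4), round(crossing, 4))
```

A module-detection algorithm searches for the partition that maximizes a score such as Q; this sketch only evaluates a partition that is already given.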
The team already has its eye on ways of improving and extending the current experimental design. “The most important follow-up of our study would be the exploration of proprietary software like the Microsoft Windows operating system,” Fortuna comments. “Since Debian is the result of a volunteer effort to create a free operating system, you have the freedom to distribute copies, receive source code, modify the software or use pieces of it in new free programs. The question then becomes, what does the software development pattern look like when the company developing code doesn't offer this freedom to their users? A comparison of the structure of both development strategies would be more than interesting.”


They are also developing a dynamical model to mimic the growth of Debian over time – an effort which, if successful, might let them predict how many packages, dependencies, and conflicts will arise in the next release of the operating system. “An interesting question would be,” he conjectures, “if there are limits to the number of packages that an operating system can offer to the users without jeopardizing its functionality and robustness. Following our analogy with biological evolution, we could ask if there is a limit to biodiversity, that is, to the number of species that can coexist on our planet.”
Regarding potential analogies with evolution and ecology, Fortuna points to macroevolution – that is, speciation and extinction processes – that he sees as being in some ways equivalent to the creation of new packages and the deprecation of those rendered obsolete from one release to the next. “Does the probability of a species becoming extinct depend on how long it’s been on the planet? In other words, are the most ancient species, like crocodiles, the ones with higher risk of extinction? We can formulate the question, which was already explored by Van Valen in the 1970s, by replacing species with software packages. Why do some packages not exist after a subsequent release? Does a new software package created in one of the earliest releases have a high probability to persist over time? What does it depend on? We can calculate these probabilities following the identity of the packages of the Debian operating system through time. The data to do it are available, and we therefore might learn something from software studies that help us answer the biological question – because evolution works as a tinkerer in both cases.”
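The persistence question raised above can be sketched in a few lines, assuming each release is available simply as a set of package names. The function name and the toy releases are hypothetical, and this is only one simple way to frame the calculation, not the team's method.

```python
def persistence_by_cohort(releases):
    """For each release, return the fraction of packages first seen in that
    release that are still present in the final release.

    `releases` is a list of sets of package names, ordered oldest to newest.
    An entry is None when a release introduced no new packages.
    """
    seen = set()
    final = releases[-1]
    fractions = []
    for packages in releases:
        cohort = packages - seen  # packages appearing for the first time
        seen |= packages
        if cohort:
            fractions.append(len(cohort & final) / len(cohort))
        else:
            fractions.append(None)
    return fractions


# Invented toy releases, oldest first.
releases = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e", "f"},
    {"a", "b", "e", "g"},
]

fractions = persistence_by_cohort(releases)
print(fractions)
```

Comparing these fractions across cohorts of different ages is the software analogue of asking whether older species face a higher or lower risk of extinction.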
In relation to the ecological processes, Fortuna illustrates, “When an oceanic island is created, colonization and extinction are the main mechanisms that lead to the establishment of a stable community. This community assembly would be equivalent to the package installation process in a local computer. For example, dependencies and conflicts between packages mimic predator-prey interactions and competitive exclusion relationships, respectively. A predator can colonize the island only if the prey it feeds on is already there.”
In Fortuna’s view, the same thing happens with software packages. “A package can be installed on a computer only if the packages it depends on are already installed. Ecologically similar prey species are going to compete with each other on the island for light and nutrients so that the best competitor is going to displace the others, which can then become extinct. Predators feeding on extinct prey are going to disappear as well. Conflicts between software packages have the same consequences: one package cannot be installed on the computer if it has a conflict with an already installed one, so that those packages depending on it cannot be installed either. This parallelism can help us understand the general principles operating on systems of different nature.”
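The installation rules described above can be written as a small simulation. This is a minimal sketch under simple assumptions: packages arrive in a fixed order, a conflict blocks installation in either direction, and nothing is ever removed. The package names and the helper functions are hypothetical and do not reflect how Debian's package manager actually resolves dependencies.

```python
def can_install(package, installed, depends, conflicts):
    """Return True if `package` can join the set of installed packages."""
    # Every dependency must already be present (the "prey" is on the island).
    deps_met = all(dep in installed for dep in depends.get(package, ()))
    # No installed package may conflict with it, in either direction.
    no_conflict = all(
        other not in conflicts.get(package, ())
        and package not in conflicts.get(other, ())
        for other in installed
    )
    return deps_met and no_conflict


def assemble(arrival_order, depends, conflicts):
    """Try to install packages in the given order; return the installed set."""
    installed = set()
    for package in arrival_order:
        if can_install(package, installed, depends, conflicts):
            installed.add(package)
    return installed


# Invented toy data, not taken from Debian.
depends = {
    "editor": ["libgui"],
    "plugin": ["editor"],
    "altplugin": ["alteditor"],
}
conflicts = {"alteditor": ["editor"]}

order = ["libgui", "editor", "alteditor", "altplugin", "plugin"]
result = assemble(order, depends, conflicts)
print(sorted(result))
```

In this toy run, "alteditor" is excluded because it conflicts with the already installed "editor", and "altplugin" is then excluded because its dependency never arrived, mirroring the predator that disappears along with its prey.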
Reminiscent of AI-based evolutionary programming, Fortuna also says that their work might well lead to improved in silico models of evolutionary biology and population ecology. “Charles Ofria and his lab at Michigan State University are studying evolution by using self-replicating computer programs able to mutate and evolve over time.” The genome of these programs consists of a set of instructions that are executed by the central processing unit (CPU). Some of the mutations imply the insertion of random instructions into the genome. If the mutant program is able to reproduce faster than the others, its genome is going to persist through time.
“It could be interesting to explore to what extent new instructions added to the genome interact with the preexisting ones – that is, whether or not there is a reuse of the genome instructions of these digital organisms and its resemblance with a modular structural pattern,” Fortuna observes. “The interplay between ecology and computer science is much more evident if we take a look at the work developed by Luis Zaman, Ofria's graduate student, who is incorporating host-parasite interactions into these computer programs.”
Looking further afield, Fortuna describes how other models or applications might be targeted using the team’s findings. “The closest study would be the comparison with the development pattern of other GNU/Linux distributions – openSuse, Fedora, Gentoo, and so on – as well as proprietary operating systems like Microsoft Windows and Apple OS X. The information needed to accomplish this task would easily be compiled for the former – but it will be much more difficult to get it for the latter. The algorithms for detecting modular structures are publicly available. There are also powerful free SQL relational database management systems like PostgreSQL and MySQL to store, organize, and manage the information. So,” he concludes, “the bottleneck is once again data availability.”
More information: Evolution of a modular software network, Published online before print November 21, 2011, PNAS December 13, 2011 vol. 108 no. 50 19985-19989, doi: 10.1073/pnas.1115960108
News source: physorg.com
