Fun with GECCO 2014, ALIFE 2014, and Zotero Paper Machines

Posted February 10, 2015 by Emily Dolson in Review

I recently discovered the Paper Machines add-on to Zotero, which allows you to perform visualizations and topic modeling analyses on papers in your Zotero collection. I just so happened to have the complete proceedings of both GECCO 2014 and ALife 2014 kicking around in my Zotero database, so I decided to try comparing them. As a quick background, GECCO, which focuses on Genetic and Evolutionary Computation, and ALife, which focuses on Artificial Life, are the two main computer science* conferences that we in the Devolab tend to go to. There is substantial overlap between these conferences (GECCO has an Artificial Life track, after all), but there are also some fundamental differences in approach and focus.


Word cloud of the most commonly used words in papers presented at GECCO 2014.


Word cloud of the most commonly used words in papers presented at ALife 2014.

The word clouds above do a pretty good job of capturing these differences, I think. Evolutionary Computation tends to be focused on finding solutions to problems, as suggested by the prevalence of words like “solutions”, “objective”, and “search.” As a result, there is also a greater emphasis on developing algorithms, hence the presence of Greek letters. Artificial Life, on the other hand, tends to focus more on understanding how a system as a whole works. This is reflected in the popularity of words like “complexity”, “dynamics”, and “behavior.” Artificial Life, as a field, also tends to care a lot more about biology, which explains the frequency of words like “biological,” “natural,” and “species.”

Paper Machines also lets you create phrase nets: you write a query of the form “x [regular expression] y,” and it draws a map of the words that most often appear as x and y on either side of that expression. Here’s an example of searching for “x and y” in the GECCO proceedings:

Words commonly connected by “and” in the GECCO 2014 proceedings.

As you can see, this reveals a variety of words commonly joined by “and.” Many are familiar concept pairs, such as “time and space”, “exploration and exploitation”, and “theory and practice.” This sort of analysis also turns up frequently cited pairs of authors, such as Lehman and Stanley.
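To make the idea concrete, here is a minimal Python sketch of the “x and y” counting behind a phrase net. This is not how Paper Machines itself is implemented; the function name `phrase_net`, its parameters, and the sample sentences are all made up for illustration.

```python
import re
from collections import Counter

def phrase_net(text, connector=r"and", top_n=5):
    """Count (x, y) word pairs joined by a connector regex.

    Hypothetical helper for illustration only; not part of Paper Machines.
    """
    # Capture one word on each side of the connector.
    pattern = re.compile(
        r"\b(\w+)\s+(?:" + connector + r")\s+(\w+)\b", re.IGNORECASE
    )
    # Lowercase so "Exploration" and "exploration" count as the same word.
    pairs = Counter((x.lower(), y.lower()) for x, y in pattern.findall(text))
    return pairs.most_common(top_n)

# Made-up sample text, not taken from the proceedings.
sample = (
    "We balance exploration and exploitation. "
    "Exploration and exploitation trade off in search. "
    "Theory and practice often differ."
)

edges = phrase_net(sample)
for (x, y), count in edges:
    print(f"{x} -> {y}: {count}")
```

Each (x, y) pair becomes an edge in the net, weighted by its count. Because matches do not overlap, a chain like “a and b and c” only yields the a -> b pair in this sketch.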

You can also use phrase nets to get a very rough summary of some important findings across a set of papers. For instance, a lot of the findings at ALife that I would be interested in involve something “being sufficient to evolve” something else, or “favoring the evolution of” something else, and so on. By stringing these commonly used phrases together into the search pattern, you can create a phrase net such as the following:

A summary of conclusions from ALife 2014 in phrase net form.

Obviously this is an imperfect summary. Clearly some of these words are part of longer phrases (predators -> sophisticated, for instance). Also, many of these phrases may have come from literature review sections rather than conclusions. Still, there are some interesting-sounding connections here (I’m definitely curious about that eavesdropping -> competition connection!), and a number of phrases that might make good jumping-off points. We could get much more precise results with a fuller set of text-mining tools, but that is a topic for a future post.
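For the curious, stringing several connecting phrases into one search pattern might look roughly like the Python sketch below. The connector list, function names, and sample sentences are assumptions for illustration; they are not the exact query I gave Paper Machines.

```python
import re
from collections import Counter

# Hypothetical connector phrases, loosely based on the ones quoted above.
CONNECTORS = [
    "is sufficient to evolve",
    "favors the evolution of",
    "favor the evolution of",
]

def build_pattern(connectors):
    """Join connector phrases into one alternation, allowing flexible spacing."""
    alternatives = "|".join(
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in connectors
    )
    return re.compile(
        r"\b(\w+)\s+(" + alternatives + r")\s+(\w+)\b", re.IGNORECASE
    )

def summarize(text, connectors=CONNECTORS):
    """Count (x, connector, y) triples found in the text."""
    triples = Counter()
    for x, conn, y in build_pattern(connectors).findall(text):
        # Normalize case and whitespace so variants are counted together.
        triples[(x.lower(), " ".join(conn.lower().split()), y.lower())] += 1
    return triples

# Made-up sentences, not drawn from the proceedings.
sample = (
    "Spatial structure favors the evolution of cooperation. "
    "Noise is sufficient to evolve robustness. "
    "Spatial structure favors the evolution of cooperation in our model."
)

triples = summarize(sample)
for (x, conn, y), count in triples.most_common():
    print(f"{x} -[{conn}]-> {y}: {count}")
```

Note that this only captures a single word on each side of the connector, which is exactly why longer phrases get cut off in the net above.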

What do you think? Does this seem useful? Notice any interesting patterns that I didn’t comment on? Does this raise new questions about either of these bodies of text? Anything I should explore in a follow-up post (potentially using R’s tm library)?

Emily Dolson

I’m a doctoral student in the Ofria Lab at Michigan State University, the BEACON Center for Evolution in Action, and the departments of Computer Science and Ecology, Evolutionary Biology, & Behavior. My interests include studying eco-evolutionary dynamics via digital evolution and using evolutionary computation techniques to interpret time series data. I also have a cross-cutting interest in diversity in both biological and computational systems. In my spare time, I enjoy playing board games and the tin whistle.
