Using Citation Gecko to map references for scientific papers

It’s quick and easy to acquire, analyze, and index your papers using the browser-based tool

Alexander Weston, PhD
Geek Culture

Image from Wikimedia Commons.

For me, drafting the introduction of a scientific paper can be one of the most time-consuming aspects of the writing process.

I’m currently drafting a paper for a project I’ve been working on for over a year, and I found myself Googling for a tool that would help me organize my endless scrolling through PubMed articles and, more importantly, keep me from overlooking a key paper in the field (which invariably comes up in reviewers’ comments).

I’m a visual thinker, and was hoping for a graph-based tool which could help me quickly discover and summarize oft-cited papers in the field. I believe Web of Science used to have such a citation mapping tool, but it seems to have gone extinct.

During my search, I found several software tools such as Connected Papers, Inciteful, VOSviewer, and Research Rabbit**. These tools all look powerful, but each of them required me either to download software onto my work computer or to create an account, which was beyond the scope of what I was hoping to accomplish.

I eventually stumbled on Citation Gecko, which is a free, browser-based tool. I found it fun and easy to use; in just a couple of hours, I was able to generate a complete bibliography and even discovered five or six fascinating papers I would otherwise have overlooked.

What follows is a brief, step-by-step tutorial on how I got started using Citation Gecko. I’m showing some screenshots from my own project, an original publication on using deep learning to measure population-based differences in frailty and obesity.

**Note — if you are interested in one of these other tools, which have many more features and may be better suited to long-term research projects, here’s a thorough overview of citation mapping tools.

What is Citation Gecko?

Figure 1. Mapping scientific paper citations using the online tool Citation Gecko. Image by Author.

Citation Gecko is a simple graphing interface that maps the connections between paper references. It’s browser-based, which means you don’t have to download anything or create an account; it takes seconds to get started and has a built-in search tool for discovering new references.

Using Citation Gecko, you can quickly generate a list of papers relevant to your topic, visualize the “similarity” of papers, and identify which articles are generating the most citations in the field.

Once you’ve curated your references, Citation Gecko allows you to export the list in a “.bib” file (bibliography) for easy import into your favorite reference manager.

Getting started with adding papers

Figure 2. Adding seed papers to Citation Gecko. Every good researcher cites themselves first. Image by Author.

When you first open the tool, it will prompt you to add “seed” papers, which are the starting points for the network. Citation Gecko will automatically map the “seed” papers’ references, as well as any newer papers which cite the seeds.

I set my first “seed” as my own recent publication on this topic. I also added four seed papers from the literature. I usually find that there are a handful of key papers which served as inspiration or reference for the methodology or results, and it was relatively easy to pick useful seeds.

Figure 3. Visualizing papers which are “cited by” our yellow seed papers (left) and newer papers which “cite” our seed papers (right). Image by author.

Figure 3 shows what the networks look like after adding five seed papers. Each seed is represented as a yellow dot on the graph. Papers which are linked to a seed by a citation appear as grey dots, connected to the seed with lines. Citation Gecko has two frames of reference: papers which are cited by the seeds (Figure 3, left) and newer papers that cite the seeds (Figure 3, right). “Cited by” papers look backward in time, and “citing” papers look forward in time.

Looking at the plots, you’ll notice there are twenty grey dots in each graph which are larger and connected to multiple yellow dots. These are papers which are connected to multiple seed articles: cited by several seeds in the left graph, or citing several seeds in the right graph. In this particular network, no single paper is cited by every seed, but on the left graph, a total of five papers are cited by three of our five seeds.

How I used Citation Gecko to organize my references

From this deceptively simple interface, you can quickly gain some crucial information.

First, looking at papers which are “cited by” the seeds, you can quickly identify a short-list of the most topical papers in the history of the field. I included every paper cited by two or more seeds in my bibliography, and about 25% of the papers which were cited once.

For example, in Figure 4 (below), there’s one article in the middle which is cited by four of my five seed papers. The paper is “U-Net: Convolutional Networks for Biomedical Image Segmentation” by Olaf Ronneberger and colleagues. Some of you may recognize it: it’s a foundational deep learning paper which has been cited over 40,000 times. In this case, all my seed papers used a similar deep learning methodology based on the U-Net model.
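The “cited by two or more seeds” rule is something Citation Gecko shows visually, but the idea can also be sketched in a few lines of Python. This is only an illustration of the counting logic, not Citation Gecko’s own code or API; the seed names, paper IDs, and function names are all made up for the example.

```python
from collections import Counter

# Hypothetical data: each seed paper mapped to the papers it cites.
# The IDs are placeholders, not real publications.
seed_references = {
    "seed_A": ["paper_1", "paper_2", "paper_3"],
    "seed_B": ["paper_2", "paper_3", "paper_4"],
    "seed_C": ["paper_3", "paper_5"],
}


def count_citing_seeds(seed_refs):
    """Count how many distinct seeds cite each paper."""
    counts = Counter()
    for refs in seed_refs.values():
        # set() ensures a seed is counted at most once per paper
        counts.update(set(refs))
    return counts


def shortlist(seed_refs, min_seeds=2):
    """Return (paper, count) pairs cited by at least `min_seeds` seeds,
    most-cited first, ties broken alphabetically."""
    counts = count_citing_seeds(seed_refs)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(paper, n) for paper, n in ranked if n >= min_seeds]


if __name__ == "__main__":
    for paper, n in shortlist(seed_references):
        print(f"{paper}: cited by {n} seeds")
```

In the graph, these shortlisted papers correspond to the larger grey dots connected to several yellow seeds.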

Second, reviewing papers which “cite” your seeds allows you to quickly get up to date on recent progress on your topic. The list appears to update regularly (there were citations as recent as several months ago), so it’s easy to gather new papers that may be difficult to find by searching because they haven’t garnered any citations yet (in case you’ve been recently scooped!).

If you happen to find a crucial reference you’ve overlooked, you can add it as a seed with a single click, and Citation Gecko will pull in that paper’s citations, allowing a quick review of far-reaching papers. I found this a lot easier than opening endless tabs of PubMed articles.

Figure 4. Papers can be viewed/opened in the left menu. Image by author.

Here’s the process I followed to complete my bibliography:

I kept all the references which were “cited by” at least two seeds, plus about half of the references which were “citing” two or more seeds. This built my bibliography to about thirty papers, which is already typical for my field.

For the remaining majority of papers which were referenced by only one seed, I quickly scrolled through these citations in the leftmost menu. There’s a link to open the citation in Google Scholar to peruse the abstract, and a button to mark the citation as “irrelevant” which will remove it from the list.

I was diligent in removing papers that were irrelevant. For every reference I didn’t want to include in the manuscript, I “deleted” it and removed it from the graph.

It took an hour to sift through about 300 citations in both the “citing” and “cited by” graphs. I ended up removing about 90% of these citations, which left me with a final version of a network which contained about fifty references total, many of them connected.

Finally, I exported the complete list of fifty in a downloadable “.bib” file, which I imported directly into EndNote.

The benefits of using a citation mapping tool

The process I followed of organizing references with Citation Gecko is the reverse of my typical habits. Usually, I’ll quickly draft the introduction based on my own preconceived ideas, and then Google for references which support what I’ve already written.

By starting with the citations first, I was able to start with a much less biased snapshot of the literature based on what other experts have cited, and then build my introduction around that. Not only is this more informative for my readers, but I actually found it easier to write. Rather than staring at a blank page, I started with a list of what others had done, grouped like-minded publications, and briefly summarized the impact in the text.

Throughout this process, I discovered five or six highly relevant papers I’d never read before, including a whole set of older publications on evaluating the impact of contrast on CT muscle density measurements which I had overlooked. In many ways, I like the relatively unbiased process of organizing references solely on the basis of citation; I suspect that Google Scholar, PubMed, and other databases which “recommend” papers tend to reinforce the same topics and journals, just like a filter bubble in social media.

Limitations of the Tool

Citation Gecko is a simple tool (which is why I like it), and that comes with a few limitations which are worth mentioning.

The biggest limitation is that it’s difficult to save sessions. You can export your current session as a “.bib” file, and then re-import it into a new session at a later date. This process worked fine, but there’s plenty of opportunity for error.

Networks seem to work best with five or six seeds and a few hundred papers total. The tool lagged noticeably when I imported a paper with over 100 references. You especially need to be careful with highly cited papers, which could crash the browser. I also struggled to build a compelling map from newer papers, which have fewer citations and therefore fewer connections.

Finally, the list of imported papers did frequently contain references which were either duplicated or unreadable (e.g., the title was missing). I suspect this may be a problem with the import tool; it especially affected papers with “non-standard” reference formats such as conference abstracts or websites. It was not much work to clean these up.
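I did this cleanup by hand, but for a larger export the same checks could be scripted. Below is a minimal sketch, assuming the “.bib” entries have already been loaded into Python dictionaries with a "title" field (the loading step is not shown, and the function names and sample entries are hypothetical). It flags entries with a missing title for manual review and drops repeated titles.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so that
    near-identical titles compare as equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def clean_entries(entries):
    """Split entries into (kept, needs_review).

    Entries with no title go to needs_review. Among the rest, only
    the first entry for each normalized title is kept.
    """
    kept, needs_review, seen = [], [], set()
    for entry in entries:
        title = (entry.get("title") or "").strip()
        if not title:
            needs_review.append(entry)  # unreadable: check by hand
            continue
        key = normalize_title(title)
        if key in seen:
            continue  # duplicate of an earlier entry
        seen.add(key)
        kept.append(entry)
    return kept, needs_review


if __name__ == "__main__":
    # Made-up sample entries for illustration only.
    sample = [
        {"id": "a", "title": "An Example Paper on Muscle Density"},
        {"id": "b", "title": "an example paper on muscle density."},
        {"id": "c", "title": ""},
        {"id": "d"},
    ]
    kept, needs_review = clean_entries(sample)
    print([e["id"] for e in kept])          # entries to keep
    print([e["id"] for e in needs_review])  # entries to inspect manually
```

Matching on normalized titles is a deliberately simple choice; it will miss duplicates whose titles differ in wording, so a quick manual pass is still worthwhile.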

Conclusion

There may be other, more powerful citation mapping tools out there, but Citation Gecko is a quick win. Using Citation Gecko, I was able to generate and export a complete bibliography in about an hour and a half, a huge improvement over my usual habits, which often take multiple days.

I hope this article has been useful to you.

Thanks for reading,

Alex

Alexander Weston, PhD
Geek Culture

Principal data scientist at Mayo Clinic. My views are entirely my own.