TopNPackages

From PCTeXWiki


Which LaTeX packages are most used?

(with some real data on which packages are most used)

The problem

A LaTeX distribution such as TeX Live contains more than 2,000 classes and styles (1). These classes and styles give LaTeX users easy ways to change the appearance of a document and to add useful new commands. Each add-on is stored as a file that your LaTeX system loads when a document asks for it. For example, if your document contains the command

 \usepackage{SIunits} % format scientific units

your LaTeX system will locate the file SIunits.sty and include it as part of your document.
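To make the mapping from commands to files concrete, here is a small script that lists the files a document asks for. It is a minimal sketch and not how any TeX engine works internally: real engines find files through their own search machinery (kpathsea in TeX Live). It assumes simple, single-line `\documentclass` and `\usepackage` commands and ignores `\RequirePackage`. The names `required_files`, `LOAD_RE` and `COMMENT_RE` are hypothetical, chosen for illustration.

```python
import re

# \documentclass[...]{name} or \usepackage[...]{a,b,...} (single-line forms only)
LOAD_RE = re.compile(r"\\(documentclass|usepackage)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")
# A % that is not escaped as \% starts a comment running to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")

def required_files(source: str) -> list[str]:
    """List the .cls and .sty files a LaTeX source asks for, in order."""
    source = COMMENT_RE.sub("", source)
    files = []
    for command, names in LOAD_RE.findall(source):
        # \documentclass loads a class file; \usepackage loads style files.
        ext = ".cls" if command == "documentclass" else ".sty"
        for name in names.split(","):
            if name.strip():
                files.append(name.strip() + ext)
    return files

doc = r"""\documentclass{beamer}
\usepackage[utf8]{inputenc}
\usepackage{SIunits} % format scientific units
% \usepackage{unused}
"""
print(required_files(doc))  # ['beamer.cls', 'inputenc.sty', 'SIunits.sty']
```

The script strips comments before matching. Without that step, the commented-out `\usepackage{unused}` line would be reported as a requirement.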

But before deciding on SIunits, you might have looked in the TeX Catalogue and noticed several packages that format scientific units. Which one should you use? Which ones are the most up to date?

Another way to look at the problem is from the point of view of a LaTeX distribution maintainer who wants to document useful aspects of LaTeX packages. For example, PCTeX and other distributions provide the user with an easy way to configure options in classes and styles. Since it will take some time to document each package, it's best to work first on those that are used most. So again the problem is, which packages are most commonly used?

Current methods

Below are some of the methods LaTeX users can employ to choose packages:

LaTeX guides - If you own a LaTeX guide, it usually includes directions for composing various types of documents, along with sample files or templates that reference LaTeX packages. The index of such a guide is often a good place to find useful packages.

The TeX Catalogue (http://texcatalogue.sarovar.org/) - an online reference to all LaTeX and other packages, with more than 3,200 entries. There is also a useful Topical index of packages.

Comprehensive TeX Archive Network (CTAN, http://ctan.org/) - the complete TeX archive, with an online search tool. This may be difficult to use for the casual LaTeX user.

A real-life way to determine commonly used packages

In mid-2007 Personal TeX, Inc. began a project to offer documentation on the most commonly used LaTeX packages. The first phase of this project added dialogs to the PCTeX GUI that help users configure options for document classes and packages (for example, the article class or the hyperref package) and that provide an easy way to view the available documentation for each class or package.

We needed a way to determine which packages were most used so that we could document those first. Initially, we chose a set of packages that seemed to be the most commonly used. We identified about 20 packages and then provided the documentation users needed to configure their options and display their manuals.

For the next phase of the project we needed to know which of the remaining 2,000+ classes and packages were most worth documenting. We found a way to determine exactly which packages PCTeX users employed most.

PCTeX, like some other LaTeX distributions, uses a just-in-time method to load packages from an online archive. For example, when a user writes the following in a document:

	\documentclass{beamer}

LaTeX first looks for the required file in the local installation on the user's system. If beamer.cls is installed locally, the LaTeX system continues formatting the document. Otherwise, PCTeX automatically downloads this package from an online archive and installs it on the user's system.
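The lookup-then-download logic can be sketched in a few lines. This is a minimal, hypothetical illustration and not PCTeX's code. The `fetch` callable stands in for whatever download client a distribution uses. For simplicity the sketch fetches a single file, whereas a real distribution typically installs the whole package that contains it.

```python
from pathlib import Path
from typing import Callable

def ensure_available(filename: str, local_dir: Path,
                     fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of `filename`, downloading it first if missing.

    `fetch` is a hypothetical stand-in for the distribution's download client:
    it takes a file name and returns the file's contents.
    """
    target = local_dir / filename
    if target.exists():
        # Already installed: formatting can continue immediately.
        return target
    # Not installed: download it once and keep it for later runs.
    data = fetch(filename)
    local_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Passing the downloader in as a parameter keeps the sketch testable without a network. A second request for the same file is served locally and never reaches the archive.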

Each time PCTeX requests a package from its online archive, a log records which package was requested, the date and time of the request, and the user's serial number. Using this data, we were able to determine which packages are used most frequently, how many packages each user requested, and when the packages were requested.
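A log with these three fields is enough to answer the first two questions. The sketch below is illustrative only. The `Request` record and both function names are hypothetical, and the real log format is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Request:
    """One log entry: which package, when, and by which serial number."""
    package: str
    when: datetime
    serial: str

def top_packages(log: list[Request], n: int) -> list[tuple[str, int]]:
    """The n most-requested packages, with their raw request counts."""
    return Counter(r.package for r in log).most_common(n)

def packages_per_user(log: list[Request]) -> dict[str, int]:
    """Number of distinct packages each serial number requested."""
    seen: dict[str, set[str]] = {}
    for r in log:
        seen.setdefault(r.serial, set()).add(r.package)
    return {serial: len(pkgs) for serial, pkgs in seen.items()}
```

`top_packages` counts raw requests. Counting distinct serial numbers per package instead would stop one user who reinstalls repeatedly from inflating a package's rank.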

Some statistics

With the help of statistician Peter Flom, a set of data points was analyzed, and some useful graphs and tables were generated.

The most-requested packages (The top-N)


The number of packages requested per user

When we at PCTeX first implemented the just-in-time package loader, the system logs from our online archive showed that far fewer packages were requested than we had predicted. Peter Flom analyzed a set of data points from 1,013 users and generated the table below. PCTeX is installed with a small set of packages that do not need to be downloaded. Notice that nearly half of the users studied downloaded no packages beyond this basic set: the first row of the table shows 500 users who downloaded 0 packages.

Packages downloaded    Number of users
0                      500
1                      85
2                      37
3                      46
4                      59
5                      47
6                      36
7                      32
8                      23
9                      21
10                     14
11-15                  65
16-20                  41
21-50                  72
51-100                 19
more than 100          11
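A table like the one above comes from bucketing each user's download count into these rows. The sketch below is hypothetical and assumes two inputs: a mapping from serial number to download count, and the full list of studied serial numbers. The full list is needed because users who downloaded nothing never appear in the request log.

```python
from collections import Counter

# Rows of the downloads-per-user table: 0 through 10 individually, then ranges.
ROWS = [str(k) for k in range(11)] + ["11-15", "16-20", "21-50", "51-100", "more than 100"]

def row_label(n: int) -> str:
    """Map one user's download count to its table row."""
    if n <= 10:
        return str(n)
    for low, high in ((11, 15), (16, 20), (21, 50), (51, 100)):
        if low <= n <= high:
            return f"{low}-{high}"
    return "more than 100"  # the table's final "more" row

def download_table(all_serials, downloads_per_user):
    """Count users per row. A serial that never appears in the request log
    downloaded 0 packages, so the full list of users must be supplied."""
    counts = Counter(row_label(downloads_per_user.get(s, 0)) for s in all_serials)
    return [(label, counts[label]) for label in ROWS]
```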

When are packages requested?

The logged timestamps also show when packages were requested. We suspected that users would request a fair number of packages when they first began using PCTeX, because the first documents they formatted would need some additional packages, and that their requests would become less frequent after this initial period. The graph below seems to bear this out.
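One way to tally requests over time is to measure each request relative to the user's own start date. The text does not say what time axis the original graph used. This sketch assumes a user's start can be approximated by their first logged request, although a registration date would be better if one is available. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

def requests_by_week(requests):
    """requests: iterable of (serial, timestamp) pairs.

    Returns a Counter mapping "weeks since this user's first logged request"
    to the number of requests made in that week, summed over all users.
    """
    requests = list(requests)
    first: dict[str, datetime] = {}
    for serial, when in requests:
        if serial not in first or when < first[serial]:
            first[serial] = when
    return Counter((when - first[serial]).days // 7 for serial, when in requests)
```

If the suspicion holds, most of the counts should fall in the lowest week numbers.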

Some problems with the data, and future remedies

We took care in choosing the data to be analyzed. We used the data for single users only, and discarded site license users. If we had included site license users we could not have studied the dynamics of a single user.

We used data from registered users only. PCTeX is available as a 30-day trial, and we did not include data for those who requested a trial.

On the other hand, users who started with the trial and later registered probably downloaded some packages during the trial period. Because that trial-period data was discarded and only the data from after registration was used, such users may appear to have downloaded fewer packages than they actually did, and some of the tables and graphs above may not reflect true user behavior.
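The selection rules above can be written as a simple filter. This is a hypothetical sketch. It assumes each request carries a serial number that can be matched to a license record with a site-license flag and a registration date. Those field names are illustrative, and the real license data is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class License:
    """Hypothetical license record; the field names are illustrative."""
    serial: str
    site_license: bool                 # True for multi-user site licenses
    registered_on: Optional[datetime]  # None if the user never registered

def usable_requests(requests, licenses):
    """Keep (serial, timestamp) requests from registered single-user licenses,
    made on or after registration. Dropping the earlier trial-period requests
    is what introduces the bias discussed above."""
    kept = []
    for serial, when in requests:
        lic = licenses.get(serial)
        if lic is None or lic.site_license or lic.registered_on is None:
            continue
        if when >= lic.registered_on:
            kept.append((serial, when))
    return kept
```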

The data was collected for package bundles only, not for individual classes and styles. For example, the tools bundle consists of over 30 style files. We collected data for tools as a whole, but not for the individual style files it contains, such as afterpage, enumerate, and multicol.
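The consequence of bundle-level logging can be illustrated briefly. This sketch uses a hypothetical, partial mapping that lists only the three tools members named above. It also assumes that a file missing from the mapping is its own bundle. Requests for different style files collapse into a single count for their bundle, so the log cannot tell them apart.

```python
from collections import Counter

# Hypothetical, partial table of which bundle ships which style file.
# Only the tools members named in the text are listed here.
BUNDLE_OF = {
    "afterpage.sty": "tools",
    "enumerate.sty": "tools",
    "multicol.sty": "tools",
}

def logged_name(filename: str) -> str:
    """Name a bundle-level log would record; unlisted files count as their own bundle."""
    return BUNDLE_OF.get(filename, filename.rsplit(".", 1)[0])

def bundle_counts(files):
    """Tally requests the way a bundle-level log sees them."""
    return Counter(logged_name(f) for f in files)

print(bundle_counts(["multicol.sty", "enumerate.sty", "hyperref.sty"]))
# Counter({'tools': 2, 'hyperref': 1})
```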

Conclusions

The data seem to confirm what we, and probably others, suspected: of the 2,000+ LaTeX packages available, only a small number are used frequently. We at PCTeX are continuing the package documentation project based on this data.

Even though the results reflect only PCTeX users, they may be useful for maintainers of archives and other distributions. For example, the online LaTeXpedia project proposed by the Italian TeX User Group could start by offering articles on commonly used packages before embarking on documenting all known packages.


(1) The TeX Live 2007 distribution offers 201 document classes (\documentclass{...}) and 2,133 packages (\usepackage{...}). There are even more in the CTAN collection.