depgrapher: Fast Dependency Analysis for Javascript

28.04.2022 | Arne Babenhauserheide

Where is this module used? Show me a graph of all the modules I import! Visualize the semantic environment of this file!

For these tasks you can write a precise Javascript parser or extract the ast with an existing tool. And if your project is big enough (or too big?) it will be slow. What to do?

You can wait, but that’s annoying. Or you can cache, then you trade speed for cache invalidation, one of the 10 hard problems in software development (with naming being the other).

Or you analyze your requirements and turn to Unix tools for the rescue.

Requirements

What do you really need? That’s in the tasks at the beginning of this article. In short: local graphs of imports.

What do you have? There are imports:

    import 'foo/bar';

And partial imports:

    import {
      baz,
      blue
    } from 'moo';

And dynamic webpack imports:

    import(/* webpackChunkName: "blah" */ 'blah')
      .then(...);

And a few more. What do they have in common?

If you use typical and regular code-style they all map to regex. Too big and annoying regex, but regex nonetheless.

And if you can regex it you can ripgrep.

And if you can ripgrep your project won’t be too big.

The graph

Now that the requirements are clear we assemble simple regular expressions to get all edges and nodes of a graph representing the whole dependency structure of the project. I hear you ask „but didn’t we want just the local area?“ Yes, we did, but ripgrep is fast. And this keeps it simple:

    TMPDIR=$(mktemp -d) # working in a TMPDIR to avoid interfering with anything else
    rg from\ \' | grep -v '#' > "${TMPDIR}"/graphbody
    rg import\ \'".*;" | grep -v '#' >> "${TMPDIR}"/graphbody
    rg -o "import\\(.*webpackChunkName.*'.*'\\)" | grep -v '#' >> "${TMPDIR}"/graphbody

This gives us something like the following:

    foo/bar.js:import { baz, blue } from 'moo';
    foo/bar.js:import(/* webpackChunkName: "blah" */ 'blah')
    moo.js:import 'foo/bar';

(yes, we can have cycles. This is JS. The cycle is in pseudo compile-time.)

Since these are the edges and nodes of a graph, let’s turn them into a graph. The command line tool for that is Graphviz. It expects a format like the following:

    digraph {
        "foo/bar" -> "moo"
        "foo/bar" -> "blah"
        "moo" -> "foo/bar"
    }

depgrapher pseudograph

So we have this huge list of dependencies, one per line, and need to transform it. The tool for that is sed. Or awk, but I know sed better, so I use sed.

First cleanup the lines: Filter out anything we don’t need. I’ll show that only for the clearest case (the direct imports); the details for the others are in depgrapher.in.

    rg from\ \' | grep -v '#' \
      | sed "s/\\.[^.]*:.*from '/ -> /" \ | sed "s/'.*//g" \
      | sed -E 's,([^ ]+)/([^ ]+) -> \./([^ ]+),\1/\2 -> \1/\3,' \
      | sed -E 's,([^ ]+)/([^ ]+)/([^ ]+) -> \../([^ ]+),\1/\2/\3 -> \1/\4,' \
      | sed -E 's,^(.*) -> (.*)$,\2 -> \1,' \
      | sed 's,^,",' | sed 's,$,",' \
      | sed 's, -> ," -> ",' \
      | sed s,cadenza/,,g  > $TMPDIR/graphbody

We’ll need to disentangle that a bit: rg gives us the lines we saw before: basename.ext:from 'filename'.

The first sed call replaces that by basename -> filename.

The second line resolves relative imports in the same directory: ./filename becomes path/to/filename.

The third line resolves relative imports in the parent directory. There is no resolution for imports from the parent’s parent directory. We don’t need that, and one key for quick *nix tool drafting is to do what you need, not what someone might need sometime (or not).

The fourth line ('s,^(.*) -> (.*)$,\2 -> \1,') swaps the arguments to improve the graph layout for sfdp graphs: Graphs are drawn in reverse, and the arrows then pointed backwards so they look as if they were drawn forwards.

Now we have the edges:

    "foo/bar" -> "moo"
    "foo/bar" -> "blah"
    "moo" -> "foo/bar"

For styling we also need the nodes. We can do that with plain grep:

    grep -o '"[^ "]*"$' $TMPDIR/graphbody | sort -u > $TMPDIR/nodes

It’s time to combine them into graphviz format by using a shell-subprocess that automatically concatenates the outputs of the subcommands into one stdout stream:

    (echo 'digraph {'; # the parenthesis starts the subprocess
     echo '  rankdir=RL';
     echo '  graph [overlap=false splines=true]'; # cleaner lines
     echo '  node [shape=none fontsize=21]'; # just show the names
     cat $TMPDIR/nodes;
     echo '  node [fontsize=14]';
     echo '  edge [color="#aaaaaacc" dir="back"]'; # reverse the direction, see above
     cat $TMPDIR/graphbody;
     echo '}' ) > ${TMPDIR}/graph

Now you can graph all the dependencies of your project! Only one call to dot left and we’re done:

    dot -Tsvg -o/tmp/dependencies.svg ${TMPDIR}/graph

depgrapher realgraph

… except that if your project is bigger than foo/bar and baz, this graph tells you nothing (except that you miss spaghetti).

(note also that the non-existent file blah is typeset in a smaller font, so you can distinguish between your own code — or area of interest — and external dependencies)

So we need to make that graph actionable: Focus on a region of interest and highlight what we need. This gets into the next section:

subselect and beautify

We need to focus. Let’s say we want to see only the files imported by or importing moo. The simplest way to do that is to just grep in the edges:

    grep "moo" edges > filtered-edges

To highlight only the target file in the graph, let’s do the same for the nodes:

    grep "moo" nodes > filtered-nodes

So it’s filtered: a beautiful graph:

depgrapher filtered graph

But only for this toy example. In your real code, you might rather have files like cadenza/workbook/condition/date-time-condition/date-time-condition.ts which has 15 imports of similar length. Maybe you only want to see the filenames — like this:

depgraphher filtered basename graph

As before, the tool of choice is sed — generalized so you provide yourself with all its flexibility:

    ARG_GREP='moo'
    ARG_SED='s,"[^ ]*/,",g'
    cat $TMPDIR/graphbody | grep -E -- "${ARG_GREP}" | sed "${ARG_SED}" ;

And with this we are done! We can create useful graphs of the files and the target files, and they are assembled in less than a second — even if the Javascript-part of your codebase has over 200k LOC in over 2k files. And we can filter and export the graph to svg and open it in inkscape to investigate details.

Done — or almost done? Isn’t something missing?

Would you want to recall a huge command from history and teach your colleagues to use it?

Package it into a tool

The final step turns this into a program. We could rewrite it in C or in Rust or in Java with Graal.

Or we can package it as a script. For this, I turn to conf, an old project of mine that can create a instant autotools setup for bash-scripts (and some other languages).

Just initialize a folder, adjust the script, adapt the argument handling and automate away. First we need a name. I called it depgrapher because that’s what it does:

    conf new --vcs git depgrapher # bash is the default language

Now we need an API. I decided to be transparent and raw: The --grep-argument subselects, the --sed-argument beautifies. That way people who know grep and sed know what to expect. And for others it is just how it is. The help output now looks like this:

    $ depgrapher --help
    depgrapher [-h | --help] [-s | --sed <SED_EXPRESSION>] [-g | --grep <GREP>] WORKDIR

    Examples: 
    search for all files connected to paths that contain 'date-time-condition'
        depgrapher --grep 'date-time-condition'
    search for all files connected to paths that contain 'import'
        depgrapher --grep 'import'
    strip some path components:
        depgrapher --grep 'import' --sed 's,workbook/,,g;s,workbook-panel/,,g;s,import/,,g'
    only show the basenames:
        depgrapher --grep 'import' --sed 's,"[^ ]*/,",g'
    search for workbook AND conditions
        depgrapher --grep 'workbook.*conditions|conditions.*workbook'

I won’t repeat the implementation of commandline parsing because you can find it at DisyInformationssysteme/depgrapher, and it is as clean and conventional as I can make it in bash. And because it just does adjustments of the template that conf provides.

Then plug all those commandline commands into depgrapher.in and add a final call to inkscape. At the request of a colleague it later gained the ability to change the viewer. See README.md. Also you can select the graphviz command with --plot CMD, for example to use sfdp for large graphs.

(And yes, the implementation of argument handling is longer than the actual working code. Which isn’t hard, because the working code is just 17 lines. Three of which brimming with ripgrep, grep, sed, and regex.)

Now add your name in configure.ac, commit and push, and you’re done. A nice and small dependency grapher that outperforms most tools out there by doing as little as possible on its own and shelling out to existing tools wherever possible.

And you can install it with autoreconf -i; ./configure; sudo make install.

It’s free to use and adapt on Github under the Apache-2.0 License:

depgrapher - Create partial dependency graphs

A heartfelt thank you to Disy for letting me release the depgrapher as Free Software and for encouraging me to blog about the process!

Have fun depgraphing!

« Hash Collisions in Java Disy Hackathon 2022 »

Disy Tech-Blog

depgrapher: Fast Dependency Analysis for Javascript

Requirements

The graph

subselect and beautify

Package it into a tool