
June 26 2025 version 0.438

lmdExplorer

Overview

LmdExplorer is a clone of midiexplorer (see https://midiexplorer.sourceforge.io/) designed to work with the Lakh MIDI Dataset (LMD-FULL), a collection of over 168,000 midi files that were scraped from the internet. It is a valuable data set for analysis. You can download LMD-FULL as well as md5_to_paths.json from the web page https://colinraffel.com/projects/lmd/.

Compared with the Lakh clean data set (also available from the above site), LMD-FULL is much larger and contains some music genres missing from the smaller collection. Unfortunately, LMD-FULL is less organized and more difficult to deal with.

LmdExplorer also uses the data file associated with the paper MIDICAPS: A large-scale midi dataset with text captions. See https://dorienherremans.com/content/new-dataset-midicaps-large-scale-dataset-caption-annotated-midi-files for access to this paper and the associated datasets. You can get the files associated with this paper from https://huggingface.co/datasets/amaai-lab/MidiCaps/tree/main. The train.json file lists some informative and interpretive features associated with each of the valid files. The beginning of a typical record looks like this:

{"location": "lmd_full/1/12697568a5c06fe9fd0bae8b715f57f0.mid",
 "caption": "A short classical and electronic piece that evokes a 
cinematic atmosphere, featuring a piano, violin, and string ensemble.
The composition is in A major with a 3/4 time signature, progressing
through a chord sequence of A, E7, A, Bm6, and F#m at a moderate tempo
of 104 beats per minute.", "genre": ["classical", "electronic"],
 "genre_prob": [0.2291, 0.219], "mood": ["film", "melodic", "relaxing",
 "epic", "emotional"], "mood_prob": [0.1395, 0.0699, 0.0694, 0.0689, 0.0658],

Incidentally, the midicaps.tar.gz file on the Hugging Face site is a copy of the LMD-FULL dataset, so it may be more convenient to download the dataset from there.
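
As a rough illustration of how such a record can be read, the following Python sketch pairs each genre and mood label with its probability. It is not code from lmdExplorer (which is written in tcl/tk). The record is a shortened copy of the one shown above, and the helper name labelled_probabilities is hypothetical.

```python
import json

# Shortened copy of the record shown above; the remaining fields are omitted.
record_text = '''{"location": "lmd_full/1/12697568a5c06fe9fd0bae8b715f57f0.mid",
 "genre": ["classical", "electronic"], "genre_prob": [0.2291, 0.219],
 "mood": ["film", "melodic", "relaxing", "epic", "emotional"],
 "mood_prob": [0.1395, 0.0699, 0.0694, 0.0689, 0.0658]}'''


def labelled_probabilities(record, key):
    """Pair each label in record[key] with the value at the same
    position in record[key + '_prob'] (hypothetical helper)."""
    return list(zip(record[key], record[key + "_prob"]))


record = json.loads(record_text)
genres = labelled_probabilities(record, "genre")
moods = labelled_probabilities(record, "mood")

for label, prob in genres:
    print(f"{label}: {prob}")
```

The sketch assumes that the label list and the probability list of a record always have the same length, as they do in this example.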

The work with the above datasets should be viewed as experimental. The MIDICAPS attributes were used for comparison with some of the features extracted by midiexplorer.

I recommend that you first try midiexplorer and the Lakh clean dataset before working with lmdExplorer. Many of the features in lmdExplorer are documented only in https://midiexplorer.sourceforge.io/. This web page mainly documents the features that were introduced in lmdExplorer. You can find the above referenced paper by Jan Melechovsky, Abhinaba Roy, and Dorien Herremans in the ISMIR 2024 proceedings (https://ismir.net/conferences/ismir2024.html).

The LMD-FULL dataset presents several unique challenges which we attempt to address in lmdExplorer. One of the most significant problems is that all of the midi files are named after their md5 checksum. A typical file name is 7717b0e2e566e90e3e01497401863e82.mid. Here is how a subfolder appears in midiexplorer.

md5 file names

Though the dataset comes with a translation file, md5_to_paths.json, that links each md5 checksum to the original file names, many of those names are not informative. For example, c/stds9.mid is a typical file name. In other cases, the same checksum may map to 20 or more file names. It is therefore necessary to find other ways to reference these files. Random selection always works.

New Features

Here is an example of the lmdExplorer user interface.

lmd Interface

The interface is divided into 3 horizontal blocks. The top block contains the usual menu items that you see in midiexplorer with a few minor exceptions. The middle block lists the midicaps attributes for the selected LMD-FULL file with a few minor omissions. The lower block contains the features that were extracted by lmdExplorer.

The second line in the third block shows the original file names that were associated with the md5 file name. These were extracted from md5_to_paths.json. In this case, there are two possibilities, and clicking the next button displays the alternate file name. The button labeled "Clean quantization" opens the beat graph plot.

LmdExplorer provides three methods to open a midi file in the LMD-FULL database. If you are willing to choose a random file, then clicking the "random pick" button will work. The second method uses the "search title" button, which replaces the "find" menu button in midiexplorer. Clicking that button produces a new window similar to the one below.

search name window

You enter the keyword to search in the entry box (here "cold") and click the scan button. If you select any entry in the listbox, the corresponding file will open automatically.
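
The idea behind this kind of title search can be sketched in Python as follows. This is only an illustration, not the tcl/tk code used by lmdExplorer. It assumes that md5_to_paths.json maps each md5 checksum to a list of original paths. The function name search_titles is hypothetical, and the checksums and file names in the sample data are made up.

```python
import json


def search_titles(md5_to_paths, keyword):
    """Return (md5, path) pairs whose original path contains the
    keyword, ignoring case, sorted by path (hypothetical helper)."""
    needle = keyword.lower()
    hits = []
    for md5, paths in md5_to_paths.items():
        for path in paths:
            if needle in path.lower():
                hits.append((md5, path))
    return sorted(hits, key=lambda hit: hit[1].lower())


# Made-up miniature stand-in for md5_to_paths.json.
sample = json.loads('''{
  "0123456789abcdef0123456789abcdef": ["c/stds9.mid"],
  "fedcba9876543210fedcba9876543210": ["Rock/Cold_As_Ice.mid", "misc/coldice2.mid"]
}''')

matches = search_titles(sample, "cold")
for md5, path in matches:
    print(md5, path)
```

Because one checksum can map to several original names, the same md5 may appear more than once in the result list.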

The third method uses the "database/search" menu button. It works the same way as in midiexplorer.

The midicaps database file contains the "all_chords" and "all_chords_timestamps" attributes, which do not appear here. They have been combined into a new menu button, "pitch analysis/allchords", which is found in the top block. The all_chords_timestamps are given in seconds. In order to compare the chords with the results presented by the menu button "pitch analysis/chordtext" and by chordgram, it is necessary to convert the time in seconds to a beat number (where a beat is equivalent to a quarter note). The conversion depends upon the tempo setting in the midi file. Some midi files have numerous tempo settings, which complicates this conversion. The allchords function pops up a new window presenting the all_chords list with the times in both seconds and beats.
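
The conversion from seconds to beats can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not the code used by lmdExplorer. It assumes the tempo changes have already been extracted into a list of (start_beat, microseconds_per_beat) pairs, sorted by beat and starting at beat 0. The function name seconds_to_beats and the sample tempo map are hypothetical. When a file has no tempo setting, the midi default of 500000 microseconds per quarter note (120 beats per minute) is used.

```python
def seconds_to_beats(t, tempo_map=None):
    """Convert a time t in seconds to a beat number (quarter notes).

    tempo_map is a list of (start_beat, microseconds_per_beat) pairs
    sorted by start_beat, with the first entry at beat 0.
    """
    if not tempo_map:
        tempo_map = [(0.0, 500000)]  # midi default: 120 beats per minute
    elapsed = 0.0  # seconds elapsed at the start of the current segment
    for i, (start_beat, us_per_beat) in enumerate(tempo_map):
        sec_per_beat = us_per_beat / 1_000_000
        if i + 1 < len(tempo_map):
            next_beat = tempo_map[i + 1][0]
            segment_seconds = (next_beat - start_beat) * sec_per_beat
            if t >= elapsed + segment_seconds:
                # t lies beyond this tempo segment; move to the next one
                elapsed += segment_seconds
                continue
        return start_beat + (t - elapsed) / sec_per_beat


# Hypothetical file: 120 bpm for the first 8 beats, then 60 bpm.
tempo_map = [(0.0, 500000), (8.0, 1000000)]
for seconds in (2.0, 4.0, 6.0):
    print(seconds, "s ->", seconds_to_beats(seconds, tempo_map), "beats")
```

In a real midi file the tempo changes are positioned in ticks, so the start positions would first have to be divided by the file's pulses per quarter note to obtain beats.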

allchords output

The results are still being evaluated.

Installation and Setup

The source code is written in tcl/tk, and therefore you must install tcl/tk version 8.5 or higher in order to run the source code. On a Windows PC, you can download an executable version which has the tcl/tk interpreter embedded.

LmdExplorer is mainly a user interface. It links to numerous free programs that do the work. They include midi2abc, midicopy, abc2midi, abcm2ps, ghostscript, an internet browser, and numerous midi players. Some of these programs, such as midi2abc, midicopy, and abc2midi, are part of the abcmidi package. Details on how to get these programs will be given later. As the installation of this program with its helper executables is nontrivial, more details are given elsewhere; for the time being, we shall assume that you already have this program running on your system or that you are interested in knowing what it does.

There is an important difference between midiexplorer and lmdExplorer. The name of the Lakh midi database folder, lmd_full/, is built into the program. Therefore the root folder specifies the folder where lmd_full is found, and not the path to lmd_full as was done for midiexplorer. Thus if lmd_full is in your home directory (e.g. /home/seymour), the root folder would be /home/seymour. If you were running midiexplorer, you would instead pass the actual path to lmd_full, i.e. /home/seymour/lmd_full.
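
As an illustration of the root folder convention, the following Python sketch builds the full path of a midi file from the root folder and an md5 checksum. It is not taken from lmdExplorer. The function name midi_path is hypothetical, and the rule that the subfolder is the first hexadecimal digit of the checksum is an inference from the "location" field of the MIDICAPS record shown earlier (lmd_full/1/12697568a5c06fe9fd0bae8b715f57f0.mid).

```python
from pathlib import PurePosixPath


def midi_path(root_folder, md5):
    """Build <root>/lmd_full/<first hex digit>/<md5>.mid (hypothetical helper).

    The subfolder rule is inferred from the sample record, not from
    the lmdExplorer source code.
    """
    return PurePosixPath(root_folder) / "lmd_full" / md5[0] / (md5 + ".mid")


# The root folder is the folder that contains lmd_full, not lmd_full itself.
path = midi_path("/home/seymour", "12697568a5c06fe9fd0bae8b715f57f0")
print(path)
```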

LmdExplorer will be looking for two files in the lmd_full folder which do not come with the database. These two json files, midicaps_huggingface_train.json and md5_to_paths.json, are provided with the source code on the lmdExplorer web site. You should download those files and move them into the lmd_full folder on your system. For your information, midicaps_huggingface_train.json is actually called train.json, which you can download from https://huggingface.co/datasets/amaai-lab/MidiCaps/tree/main. I have intentionally changed the file name to something more meaningful in this application. Note that the lmd_full database is downloaded as a gzipped tar file in order to save space. Windows 11 may not provide the software that you need to unpack this file, or the software available may run very slowly. If you need to unpack this file on Windows 11, I recommend that you install 7-zip on your system. This software runs much faster.

LmdExplorer.tcl is a tcl/tk script, so it requires an interpreter to run. The Tcl syntax is somewhat similar to Perl. The language was popular about 20 years ago. Tk is the part of the language designed to create widgets for the user interface. If you do not have tcl/tk on your system, check out https://www.tcl-lang.org/software/tcltk/. I do not recommend building the binaries from the source code unless you are experienced at this.

Unfortunately, there do not seem to be many places where you can get binaries of the Tcl/Tk interpreter for Windows. I found installers on https://sourceforge.net/projects/magicsplat/files/magicsplat-tcl/. Pick one of the tcl-8.6 installers and download it to your system. It will probably go to your Downloads folder. Execute this .msi file. You should eventually see the following window:

magicsplat install

Click on the "advanced" button before proceeding to "Install". In the next frame, I recommend that you install it for all Users.

magicsplat install

This way, the application will be put in the Program Files directory, rather than buried in the AppData/Local/... folder. You may need to browse to this folder later.

magicsplat install

For the rest of the install just follow the defaults.

Magicsplat expects lmdExplorer to have a tkapp extension in order to start up. If you do not see file extensions in the file explorer, you may need to untick the folder option that hides the extensions of known file types.

folder options

Be sure that "Show hidden files" is also ticked. You can rename lmdExplorer.tcl to lmdExplorer.tkapp; however, to maintain compatibility with other operating systems, I prefer to stay with the tcl extension but instead change the file association. To do this, right click on lmdExplorer.tcl and choose "Open with" and then "Choose another application" and then choose "an application on your PC". Then browse to the folder "C:\Program Files\Tcl86\bin" and select wish.exe.

Implementation

LmdExplorer uses the midiexplorer_home folder to store its initialization file (lmdexplorer.ini) and some work files (md5Index.txt and midicapsIndex.txt). Temporary files such as tmp.mid are also written there.

In order to prevent lmdExplorer from using too much RAM, the contents of midicaps_huggingface_train.json and md5_to_paths.json are not stored in memory but read whenever they are needed. The procedure lmdInit is called when the program starts. This procedure loads or creates an index to the midicaps_huggingface_train.json file so that the program has random access to a particular line. It takes about 15 seconds to create the index, so it is saved in a file called midicapsIndex.txt. A similar index is created for md5_to_paths.json, but since that file is much smaller, not much time is saved by reading the index from a disk file. Nevertheless, it is stored in md5Index.txt in order to check that the program is working properly.
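
The general technique of indexing a file by line can be sketched in Python as follows. This is not a translation of lmdInit, whose details are not shown here. It is a minimal sketch assuming the file holds one record per line. The function names build_line_index and read_line are hypothetical, and a small in-memory stream with made-up records stands in for the real json file.

```python
import io


def build_line_index(stream):
    """Return the byte offset at which each line of a binary stream starts."""
    offsets = []
    position = stream.tell()
    for line in iter(stream.readline, b""):
        offsets.append(position)
        position += len(line)
    return offsets


def read_line(stream, offsets, n):
    """Seek directly to line n (counting from 0) and return it as text."""
    stream.seek(offsets[n])
    return stream.readline().decode("utf-8").rstrip("\n")


# Made-up stand-in for a file with one json record per line.
data = io.BytesIO(
    b'{"location": "a.mid"}\n{"location": "b.mid"}\n{"location": "c.mid"}\n'
)
index = build_line_index(data)
third = read_line(data, index, 2)
print(index, third)
```

With a real file, the stream would be opened in binary mode so that the offsets count bytes, and the list of offsets could be written to a text file and reloaded on the next start instead of being rebuilt.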

Future Work

There is unfortunately a lot more work that can be done.

Creating a new database (accessible under database/create database) takes 2000 seconds on my Windows 11 system, but only 700 seconds on my VirtualBox Linux system. The output file MidiDescriptors.txt should not depend upon the operating system, so that it could be distributed on the internet.

The MIDICAPS FACTORS/ all_chords and all_chords_timestamps should produce graphical output so that it could be compared with the view/chordgram function.

It would be useful to also search the lmd_full collection using the MIDICAPS genre, genre_prob values, mood, mood_prob, and possibly other factors.

Migration of midiexplorer to lmdExplorer is not complete. I have not checked some other features under the database menu.

This page was last updated on June 02 2025