June 26 2025 version 0.438
LmdExplorer is a clone of midiexplorer (see https://midiexplorer.sourceforge.io/) designed to work with the Lakh Midi Dataset (LMD-FULL), a collection of over 168,000 midi files that were scraped from the internet. It is a valuable dataset for analysis. You can download LMD-FULL as well as md5_to_paths.json from the web page https://colinraffel.com/projects/lmd/
Compared with the Lakh clean dataset (also available from the above site), LMD-FULL is much larger and contains some music genres missing from the smaller collection. Unfortunately, LMD-FULL is less organized and more difficult to deal with.
LmdExplorer also uses the data file associated with the paper MIDICAPS: A large-scale midi dataset with text captions. See https://dorienherremans.com/content/new-dataset-midicaps-large-scale-dataset-caption-annotated-midi-files for access to this paper and associated datasets. You can get the files associated with this paper from https://huggingface.co/datasets/amaai-lab/MidiCaps/tree/main The train.json file lists some informative and interpretive features associated with each of the valid files. A typical record looks like
{"location": "lmd_full/1/12697568a5c06fe9fd0bae8b715f57f0.mid", "caption": "A short classical and electronic piece that evokes a cinematic atmosphere, featuring a piano, violin, and string ensemble. The composition is in A major with a 3/4 time signature, progressing through a chord sequence of A, E7, A, Bm6, and F#m at a moderate tempo of 104 beats per minute.", "genre": ["classical", "electronic"], "genre_prob": [0.2291, 0.219], "mood": ["film", "melodic", "relaxing", "epic", "emotional"], "mood_prob": [0.1395, 0.0699, 0.0694, 0.0689, 0.0658],
By the way, the midicaps.tar.gz is a copy of the LMD-FULL dataset, and it may be more convenient to download the file from this site.
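To make the record layout concrete, here is a small Python sketch that pairs each genre or mood label with its probability. It assumes train.json holds one JSON record per line and that every `<field>_prob` list is aligned with its `<field>` list, as in the example above. The function name top_labels is hypothetical and is not part of lmdExplorer.

```python
import json

def top_labels(record_line, field="genre"):
    """Return (label, probability) pairs for `field`, most probable first."""
    record = json.loads(record_line)
    labels = record.get(field, [])
    probs = record.get(field + "_prob", [])
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

# Abbreviated copy of the record shown above (caption and mood omitted).
line = ('{"location": "lmd_full/1/12697568a5c06fe9fd0bae8b715f57f0.mid", '
        '"genre": ["classical", "electronic"], "genre_prob": [0.2291, 0.219]}')

for label, prob in top_labels(line):
    print(label, prob)
```

A field that is absent from the record, such as mood in this abbreviated line, simply yields an empty list.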
The work with the above datasets should be viewed as experimental; the MidiCaps attributes were used for comparison with some of the features extracted by midiexplorer.
I recommend that you first try midiexplorer and the Lakh clean dataset before working with lmdExplorer. Many of the features in lmdExplorer are documented only in https://midiexplorer.sourceforge.io/. This web page mainly documents the features that were introduced in lmdExplorer. You can find the above referenced paper by Jan Melechovsky, Abhinaba Roy, and Dorien Herremans in the ISMIR 2024 proceedings. https://ismir.net/conferences/ismir2024.html
The LMD-FULL dataset presents several unique challenges, which we attempt to address in lmdExplorer. One of the most significant problems is that all of the midi files are named after their md5 checksum; 7717b0e2e566e90e3e01497401863e82.mid is a typical file name. Here is how a subfolder appears in midiexplorer.
Though the dataset comes with a translation file, md5_to_paths.json, that links each md5 checksum to the original file names, many of the names are not informative. For example, c/stds9.mid is a typical file name. In other cases, the same checksum may map to 20 or more file names. It is therefore necessary to find other ways to reference these files. Random selection always works; other methods are described below.
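As an illustration of the lookup, the following Python sketch maps an md5-named file back to its original names. It assumes that md5_to_paths.json decodes to a dictionary whose keys are md5 checksums and whose values are lists of original paths. The helper names and the sample entry are invented for the example; the pairing of this checksum with these names is not taken from the dataset.

```python
import os

def md5_of(location):
    """Extract the md5 stem from a path such as lmd_full/7/7717...mid."""
    return os.path.splitext(os.path.basename(location))[0]

def original_names(md5_to_paths, location):
    """Return every original name recorded for the file, or [] if none."""
    return md5_to_paths.get(md5_of(location), [])

# Invented sample; the real table would come from
# json.load(open("md5_to_paths.json")).
sample = {"7717b0e2e566e90e3e01497401863e82": ["c/stds9.mid", "misc/standards9.mid"]}

names = original_names(sample, "lmd_full/7/7717b0e2e566e90e3e01497401863e82.mid")
print(len(names), names)
```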
Here is an example of the lmdExplorer user interface.
The interface is divided into 3 horizontal blocks. The top block contains the usual menu items that you see in midiexplorer with a few minor exceptions. The middle block lists the midicaps attributes for the selected LMD-FULL file with a few minor omissions. The lower block contains the features that were extracted by lmdExplorer.
The second line in the third block shows the original file names that were associated with the md5 file name. These were extracted from md5_to_paths.json. In this case, there are two possibilities, and clicking the next button displays the alternate file name. The button labeled "Clean quantization" accesses the beat graph plot.
LmdExplorer provides three methods to open a midi file in the LMD-FULL database. If you are willing to choose a random file, then clicking the "random pick" button will work. A second method is described here. The "find" menu button in midiexplorer was replaced with the "search title" button. Clicking that button produces a new window similar to the one below.
You enter the keyword to search in the entry box (here "cold") and click the scan button. If you select any entry in the listbox, the corresponding file will open automatically.
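A keyword scan of this kind could be sketched in Python as a case-insensitive substring match over the original path names. This is only an illustration: lmdExplorer is written in Tcl/Tk and its actual matching rules may differ. The function name, the short placeholder checksums, and the sample paths are all made up.

```python
def search_titles(md5_to_paths, keyword):
    """Return sorted (original_path, md5) pairs whose path contains keyword."""
    keyword = keyword.lower()
    hits = []
    for md5, paths in md5_to_paths.items():
        for path in paths:
            if keyword in path.lower():
                hits.append((path, md5))
    return sorted(hits)

# Made-up entries standing in for the contents of md5_to_paths.json.
sample = {
    "md5a": ["Rock/Cold Day.mid", "misc/cd.mid"],
    "md5b": ["x/warm.mid"],
    "md5c": ["y/coldplay.mid"],
}

for path, md5 in search_titles(sample, "cold"):
    print(path, "->", md5)
```

Each hit carries its md5 so that selecting an entry is enough to locate the corresponding file in lmd_full.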
The third method uses the "database/search" menu button. It works the same way as in midiexplorer.
The midicaps database file contains the "all_chords" and "all_chords_timestamps" attributes which do not appear here. They have been combined into a new menu button "pitch analysis/allchords" which is found in the top block. The all_chords_timestamps are given in seconds. In order to compare the chords with the results presented by the menu button "pitch analysis/chordtext" and chordgram, it is necessary to convert the time in seconds to beat number (where a beat is equivalent to a quarter note). The conversion depends upon the tempo setting in the midi file. Some midi files have numerous tempo settings which complicates this conversion. The allchords function pops up a new window presenting the all_chords list and times in seconds and beats.
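The conversion from seconds to beats can be sketched as follows. The sketch assumes the tempo settings have already been reduced to a sorted list of (start_beat, bpm) pairs beginning at beat 0. That representation is an assumption made for the example; a midi file stores tempo as microseconds per quarter note at tick positions, and the code inside lmdExplorer may be organized differently.

```python
def seconds_to_beats(t, tempo_changes):
    """Convert time t in seconds to a beat number (1 beat = 1 quarter note).

    tempo_changes: list of (start_beat, bpm), sorted, first start_beat == 0.
    """
    if not tempo_changes:
        raise ValueError("at least one tempo setting is required")
    elapsed = 0.0  # seconds used up by the tempo segments already passed
    for i, (start_beat, bpm) in enumerate(tempo_changes):
        sec_per_beat = 60.0 / bpm
        if i + 1 < len(tempo_changes):
            next_beat = tempo_changes[i + 1][0]
            segment_seconds = (next_beat - start_beat) * sec_per_beat
            if t >= elapsed + segment_seconds:
                elapsed += segment_seconds
                continue  # t lies beyond this segment
        return start_beat + (t - elapsed) / sec_per_beat

# One tempo: at 120 bpm a beat lasts 0.5 s, so 2 s is beat 4.
print(seconds_to_beats(2.0, [(0, 120)]))
# Two tempos: 8 beats at 120 bpm take 4 s, then 60 bpm gives 1 beat per second.
print(seconds_to_beats(6.0, [(0, 120), (8, 60)]))
```

With a single tempo the result reduces to t * bpm / 60. Every additional tempo setting adds one more segment to walk through, which is why files with many tempo changes complicate the comparison.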
The results are still being evaluated.
The source code is written in tcl/tk, and therefore you must install tcl/tk version 8.5 or higher in order to run the source code. On a Windows PC, you can download an executable version which has the tcl/tk interpreter embedded.
LmdExplorer is mainly a user interface. It links to numerous free programs that do the work. They include midi2abc, midicopy, abc2midi, abcm2ps, ghostscript, an internet browser, and numerous midi players. Some of the programs, such as midi2abc, midicopy, and abc2midi, are part of the abcmidi package. Details on how to get these programs will be given later. As the installation of this program with its helper executables is nontrivial, more details are given elsewhere; however, for the time being we shall assume that you already have this program running on your system or that you are interested in knowing what it does.
There is an important difference between midiexplorer and lmdExplorer. The name of the Lakh midi database folder, lmd_full/, is built into the program. Therefore the root folder specifies the folder where lmd_full is found and not the path to lmd_full as was done for midiexplorer. Thus if lmd_full is in your home directory (e.g. /home/seymour), the root folder would be /home/seymour. If you are running midiexplorer, then you would pass the actual path to lmd_full, i.e. /home/seymour/lmd_full.
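The relation between the root folder and a file's full path can be sketched in Python as follows. It assumes, from the location field of the MidiCaps record (lmd_full/1/1269...mid), that each file sits in a subfolder named after the first character of its md5 checksum. The function name midi_path is hypothetical and is not part of lmdExplorer.

```python
import posixpath

def midi_path(root_folder, md5):
    """Build the path to an LMD-FULL file from the lmdExplorer root folder."""
    # "lmd_full" is fixed, mirroring the name that is built into lmdExplorer.
    return posixpath.join(root_folder, "lmd_full", md5[0], md5 + ".mid")

print(midi_path("/home/seymour", "12697568a5c06fe9fd0bae8b715f57f0"))
```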
LmdExplorer looks for two files in the lmd_full folder that do not come with the database. These two json files, midicaps_huggingface_train.json and md5_to_paths.json, are provided with the source code on the lmdExplorer web site. You should download those files and move them into the lmd_full folder on your system. For your information, midicaps_huggingface_train.json is actually the file train.json from https://huggingface.co/datasets/amaai-lab/MidiCaps/tree/main which I have intentionally renamed to something more meaningful in this application. Note that the lmd_full database is downloaded as a gzipped tar file in order to save space. Windows 11 may not provide the software that you need to unpack this file, or the software available may run very slowly. If you need to unpack this file in the Windows 11 operating system, I recommend that you install 7-zip on your system. This software runs much faster.
LmdExplorer.tcl is a tcl/tk script, so you need to install an interpreter in order to run it. The Tcl syntax is somewhat similar to Perl. The language was popular 20 years ago. Tk is the part of the language designed to create widgets for the user interface. If you do not have tcl/tk on your system, check out https://www.tcl-lang.org/software/tcltk/. I do not recommend building the binaries from the source code unless you are experienced at this.
Unfortunately, there do not seem to be many places where you can get
the binaries for the Tcl/Tk interpreters for Windows. I found installers
on
https://sourceforge.net/projects/magicsplat/files/magicsplat-tcl/.
Pick one of the tcl-8.6 installers and download it to your system.
It will probably go to your Downloads folder. Execute this .msi
file. You should eventually see the following window:
Click on the "advanced" button before proceeding to "Install".
In the next frame, I recommend that you install it for all Users.
This way, the application will be put in the Program Files directory,
rather than buried in the AppData/Local/... folder. You may need to
browse to this folder later.
For the rest of the install just follow the defaults.
Magicsplat expects lmdExplorer to have a tkapp extension in order
to start up. If you do not see file extensions in the file explorer,
you may need to untick the folder option that hides them.
Be sure that "Show hidden files" is also ticked.
You can rename lmdExplorer.tcl to lmdExplorer.tkapp; however, to
maintain compatibility with other operating systems, I prefer to
stay with the tcl extension but instead change the file association.
To do this, right click on lmdExplorer.tcl and choose "Open with"
and then "Choose another application" and then choose "an application
on your PC". Then browse to the folder "C:\Program Files\Tcl86\bin"
and select wish.exe.
LmdExplorer uses the midiexplorer_home folder to store its initialization
file (lmdexplorer.ini) and some work files (md5Index.txt and
midicapsIndex.txt). Temporary files such as tmp.mid are also written
there.
In order to prevent lmdExplorer from using too much RAM, the
contents of midicaps_huggingface_train.json and md5_to_paths.json
are not stored in memory but read whenever they are needed. The procedure
lmdInit is called when the program starts. This procedure loads or
creates an index to the midicaps_huggingface_train.json file so
that the program has random access to a particular line. It takes about
15 seconds to create the index, so it is saved in a file called
midicapsIndex.txt. A similar index is created for md5_to_paths.json,
but since that file is much smaller, not much time is saved by reading
the index from a disk file. Nevertheless, it is stored in md5Index.txt
in order to check that the program is working properly.
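The idea of a line index can be sketched in Python as follows: record the byte offset at which each line starts, then seek directly to the wanted line instead of reading the whole file. This is an illustration under the assumption of one JSON record per line. It is not a translation of lmdInit, and the function names and the three-record demo file are invented.

```python
import json
import os
import tempfile

def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:  # binary mode keeps the offsets exact
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_record(path, offsets, n):
    """Jump straight to line n (0-based) and decode it as JSON."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return json.loads(f.readline())

# Demo on a small temporary file standing in for the large json file.
records = [{"location": "a.mid"}, {"location": "b.mid"}, {"location": "c.mid"}]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    for r in records:
        tmp.write(json.dumps(r) + "\n")

offsets = build_line_index(tmp.name)
third = read_record(tmp.name, offsets, 2)
print(third["location"])  # c.mid
os.remove(tmp.name)
```

Only the list of offsets stays in memory. Writing that list to a text file, one number per line, would play the role that midicapsIndex.txt plays here, so the scan does not have to be repeated on the next start.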
There is unfortunately a lot more work that can be done.
Creating a new database (accessible under database/create database)
takes about 2000 seconds on my Windows 11 system, but only 700 seconds on my
VirtualBox Linux system. The output file MidiDescriptors.txt should not
depend upon the operating system, so that it could be distributed on
the internet.
The MIDICAPS FACTORS/ all_chords and all_chords_timestamps should produce
graphical output so that it could be compared with the view/chordgram
function.
It would be useful to also search the lmd_full collection using the
MIDICAPS genre, genre_prob values, mood, mood_prob, and possibly other
factors.
Migration of midiexplorer to lmdExplorer is not complete. I have
not checked some other features under the database menu.