DeepSig has created a small corpus of standard datasets which can be used for original and reproducible research, experimentation, measurement and comparison by fellow scientists and engineers.
These datasets allow machine learning researchers with new ideas to dive directly into an important technical area without collecting or generating new datasets, and they allow direct comparison with the efficacy of prior work.
All datasets provided by DeepSig Inc. are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License (CC BY-NC-SA 4.0). If an alternative license is needed, please contact us at email@example.com.
Please reference this page or our relevant academic papers when using these datasets.
DeepSig Dataset: RadioML 2018.01A (New)
A heavily validated dataset of 24 digital and analog modulation types, which includes both synthetic simulated channel effects and over-the-air recordings.
This dataset was used for "Over-the-air deep learning based radio signal classification", published 2017 in IEEE Journal of Selected Topics in Signal Processing, which provides additional details and a description of the dataset.
Dataset Download: 2018.01.OSC.0001_1024x2M.h5.tar.gz
Data are stored in HDF5 format as complex floating point values, with 2 million examples, each 1024 samples long.
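As a rough sketch of how a file like this might be explored, the snippet below lists every dataset stored in an HDF5 file and reads a small slice of examples without loading the full file into memory. It assumes the third-party h5py package is installed. The function names, and any file path or dataset name passed to them, are hypothetical placeholders; the internal dataset names are not described on this page, so the listing step is there to discover them.

```python
import h5py


def list_hdf5_datasets(path):
    """Return {dataset_name: (shape, dtype string)} for every dataset in an HDF5 file."""
    found = {}

    def record(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(record)
    return found


def read_examples(path, dataset_name, start, stop):
    """Read examples [start, stop) from one dataset without loading the whole file."""
    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset reads only the requested rows from disk.
        return f[dataset_name][start:stop]
```

Calling list_hdf5_datasets on the extracted .h5 file first, and then read_examples with one of the names it reports, keeps memory use small, which matters for a file holding millions of 1024-sample examples.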
DeepSig Dataset: RadioML 2016.10A
A synthetic dataset, generated with GNU Radio, consisting of 11 modulations (8 digital and 3 analog) at varying signal-to-noise ratios. This dataset was first released at the 6th Annual GNU Radio Conference.
This is a cleaner and more normalized version of the 2016.04C dataset, which it supersedes. The file is formatted as a "pickle" file, which can be opened, for example, in Python using cPickle.load(...).
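The cPickle module exists only in Python 2. A minimal sketch of the equivalent load on Python 3 is shown below, using the standard pickle module with encoding="latin1", which is generally needed to read pickles written by Python 2. The function names are hypothetical, and the summary helper assumes the unpickled object is a dictionary keyed by (modulation, SNR) tuples; that layout is an assumption, not something stated on this page. If the file contains NumPy arrays, NumPy must also be installed for the load to succeed.

```python
import pickle


def load_radioml_pickle(path):
    """Load a pickle file written by Python 2 and return the unpickled object."""
    # encoding="latin1" lets Python 3 decode Python 2 byte strings
    # (and pickled NumPy arrays, if NumPy is installed).
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def summarize_keys(data):
    """Return (sorted modulation names, sorted SNR values).

    Assumes `data` is a dict keyed by (modulation, snr) tuples.
    """
    mods = sorted({key[0] for key in data})
    snrs = sorted({key[1] for key in data})
    return mods, snrs
```

After extracting the archive, pass the path of the extracted pickle file to load_radioml_pickle, then call summarize_keys on the result to see which modulations and SNR values are present. As with any pickle, only load files from a source you trust.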
Signal Generation Software: https://github.com/radioML/dataset
Dataset Download: RML2016.10a.tar.bz2
Larger Version (including AM-SSB): RML2016.10b.tar.bz2
Example Classifier Jupyter Notebook: RML2016.10a_VTCNN2_example.ipynb
DeepSig Dataset: RadioML 2016.04C
A synthetic dataset, generated with GNU Radio, consisting of 11 modulations. This is a variable-SNR dataset with moderate LO drift, light fading, and numerous different labeled SNR increments for use in measuring performance across different signal and noise power scenarios.
This dataset was used for the "Convolutional Radio Modulation Recognition Networks" and "Unsupervised Representation Learning of Structured Radio Communications Signals" papers, found on our Publications Page.
There are three variations within this dataset with the following characteristics and labeling:
Dataset Download: 2016.04C.multisnr.tar.bz2