The DM encodes each signal sample using a single bit, determined by comparing the current sample against the previous one. The bit specifies whether the new sample is higher or lower than the previous one, so it describes only the variation in the signal, not its actual value. The resulting bit stream can be drawn as a staircase that approximates the original signal, where a 1 is a step up and a 0 a step down.
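To make the staircase concrete, here is a minimal sketch of a DM encoder and decoder in Python. The names `dm_encode` and `dm_decode` and the fixed `step` size are illustrative assumptions, not any standard API:

```python
def dm_encode(samples, step=0.25):
    """Emit one bit per sample: 1 = step up, 0 = step down."""
    bits = []
    approx = 0.0  # the staircase the decoder will reconstruct
    for s in samples:
        if s > approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits

def dm_decode(bits, step=0.25):
    """Rebuild the staircase approximation from the bit stream."""
    approx = 0.0
    out = []
    for b in bits:
        approx += step if b else -step
        out.append(approx)
    return out

bits = dm_encode([0.0, 0.3, 0.5, 0.4, 0.2])
print(bits)             # [0, 1, 1, 1, 0]
print(dm_decode(bits))  # [-0.25, 0.0, 0.25, 0.5, 0.25]
```

Note that the decoder never sees the original amplitudes; it can only integrate the up/down decisions, which is exactly why DM carries the variation and not the contents.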
Remember that the number of samples per second is determined by the Nyquist theorem: to preserve the original signal, you must sample it at least twice its highest frequency.
For example, consider a standard 4 kHz voice channel. You need to take 8000 samples per second to preserve the original signal. Given this, DM encodes the signal at 8000 bps, or 8 kbps. On the other hand, PCM with 256 levels (8 bits per sample) would need 8000 * 8 = 64000 bps, or 64 kbps, for the same signal.
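The arithmetic is easy to check in a few lines (illustrative Python, using the figures above):

```python
# Bit-rate comparison for a 4 kHz voice channel.
sample_rate = 2 * 4000        # Nyquist: 8000 samples per second

dm_rate  = sample_rate * 1    # DM:  1 bit per sample          -> 8 kbps
pcm_rate = sample_rate * 8    # PCM: 8 bits (256 levels/sample) -> 64 kbps

print(dm_rate, pcm_rate)      # 8000 64000
```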
It is clear that DM needs to transmit far less data than PCM. But, as you can imagine, this comes at a steep price in quality: because each bit can move the staircase by only one fixed step, DM cannot keep up with signals that change rapidly (such as a shouting voice), a problem known as slope overload, although it might work for monotonous voices. PCM with 256 levels, by contrast, provides very good quality.
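To illustrate slope overload, the sketch below encodes a slow and a fast sine tone with the same one-bit update as above (inlined so the snippet stands alone) and reports the worst-case tracking error. The step size and test frequencies are arbitrary choices for the demo:

```python
import math

def max_error(freq_hz, step=0.2, sample_rate=8000, duration=0.01):
    """Worst |signal - staircase| when DM-encoding a sine tone."""
    n = int(sample_rate * duration)
    samples = [math.sin(2 * math.pi * freq_hz * i / sample_rate)
               for i in range(n)]
    approx, worst = 0.0, 0.0
    for s in samples:
        # The staircase can only move one fixed step per sample, so it
        # climbs at most step * sample_rate units per second.
        approx += step if s > approx else -step
        worst = max(worst, abs(s - approx))
    return worst

print(max_error(200))   # slow tone: small error (~0.2, granular noise)
print(max_error(3000))  # fast tone: large error (~1.2, slope overload)
```

The slow tone stays within one step of the staircase, while the fast tone outruns it almost entirely, which matches the intuition about shouting versus monotonous voices.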
IIRC, DM was once used in the Japanese public telephony network but was soon dropped due to poor results. (Sorry, I couldn't find a reference to back this up.)