AUDIO FILE FORMAT RESOURCE GUIDE (Version 1.1)
by Dave Huizing
1 TABLE OF CONTENTS
2 GENERAL INFORMATION
2.1 Foreword
2.2 Printed Version
2.3 Copyrights
2.4 Disclaimer
2.5 Contributors
3 TX WAVE FORMAT
4 YAMAHA TYPHOON WAVE FILE FORMAT
4.1 DWVW v1.2 compression
4.2 DWVW sample delta bit frame
5 D00
5.1 The D00 header
5.2 The Instrument data
5.3 The SpFX data
5.4 The Arrangement data
5.5 The Sequence data
6 MIDI SAMPLE DUMP STANDARD
6.1 INTRODUCTION
6.2 SPEC: SAMPLE DUMP FORMATS
6.3 SPEC: SAMPLE DUMP MESSAGES
6.4 HANDSHAKING MESSAGES:
6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)
6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)
6.7 SDS OVERVIEW
7 ROL
7.1 Structure of .ROL files
7.2 Notes
8 8SVX
8.1 FORMblock [VHDR]
8.2 FORMblock [BODY]
9 AIFF
10 AU
11 FSM
12 GF1 PATCH
13 S3I
14 UWF
15 WAVE
15.1 RiffBLOCK [data]
15.2 RiffBLOCK [fmt ]
15.3 RiffBLOCK [loop]
16 ZYXEL
17 CREATIVE LABS FILE FORMATS
17.1 Sound Blaster Instrument File Format (SBI)
17.2 Creative Music File Format (CMF)
17.3 The CMF Instrument Block
17.4 The CMF Music Block
17.5 Sound Blaster Instrument Bank File Format (IBK)
18 CREATIVE VOICE (VOC) FILE FORMAT
19 REVISION HISTORY
2 General information
2.1 Foreword
I started compiling this document because I thought there was a need for
it. By surfing all around the web I collected these descriptions and
brought them together here. I plan to keep this document updated, so if
there is a file format description that is not in this document, or you
have any comments on it, please send me an email message at:
stallion@worldonline.nl.
Happy developing,
Dave Huizing
2.2 Printed Version
If you need a printed version send an email.
2.3 Copyrights
Only the title and the compilation are copyrighted by Dave Huizing. As
far as I know all this information is free for use. See the disclaimer
part for more details. All trademarks, technical information and file
extensions belong to their respective owners.
2.4 Disclaimer
This document is provided on an "as is" basis. The information has been
verified as far as possible, but I cannot be held responsible for any
problems caused by use or misuse of the information. Although I do not
think it will happen, I am also not responsible for any damage to any
kind of computer system during or after the use of parts of this
documentation. Use this document at your own risk.
2.5 Contributors
Dave Huizing, stallion@worldonline.nl
    DJ, Producer, DTP designer, etc
muki pakesch, mpakesch@t0.or.at
    Maintainer of the TX16W mailing list
Markus Jönsson, f93-maj@nada.kth.se
    Author of the Awave sample converter
3 TX Wave Format
The file consists of a 32-byte header followed by the actual waveform
(the first 16 bytes only identify the file type). In C syntax the header
would look like this:
    char filetype[6];    /* = "LM8953" */
    char nulls[10];
    char dummy_aeg[6];   /* space for the AEG (never mind this) */
    char format;         /* 0x49 = looped, 0xC9 = non-looped */
    char sample_rate;    /* 1 = 33 kHz, 2 = 50 kHz, 3 = 16 kHz */
    char atc_length[3];  /* I'll get to this... */
    char rpt_length[3];
    char unused[2];      /* set these to null, to be on the safe side */
The "atc_length" and "rpt_length" fields are quite complex. First of all
you should know that there is no such thing as a looping point in a TX
wave. Instead a wave is split into two parts, the attack part and the
repeat part (of course the actual wave data isn't split, this is just a
logical definition). As you might guess, the attack part is played first
and the repeat part is looped until the key is released. Each of these
parts is limited to a maximum of 128k words in length. That is the reason
why waves can't be longer than 256k words (4096 blocks).
The length of a part is stored LSB first (Intel), and only the least
significant _bit_ of the third byte (bit 0) is used (representing the
most significant bit of the length). Are you confused yet? Then hold your
breath. It seems that Yamaha has chosen to squeeze in the sample rate(!)
of the wave in the unused _bits_ of these last bytes. Although they
already have a separate byte for the sample rate, this isn't enough. I
won't go into details on this now (or you would be even more confused).
You only need to know that the possible values are:
    0x06, 0x52 = 33 kHz
    0x10, 0x00 = 50 kHz
    0xF6, 0x52 = 16 kHz
(The first value is located in byte three of "atc_length" and the second
value is located in byte three of "rpt_length".) To wrap it up, this is
the format of the two length fields on a bit level:
                [0]        [1]        [2]
    atc_length  AAAAAAAA   BBBBBBBB   DDDDDDDC
    rpt_length  EEEEEEEE   FFFFFFFF   HHHHHHHG

    A  LSB of the attack length
    B  MSB of the attack length (except for one bit)
    C  the utterly most significant _bit_ of the attack length
    D  the first value of the magic sample rate constant (0x06, 0x10 or
       0xF6)
    E  LSB of the repeat length
    F  MSB of the repeat length (except for one bit)
    G  the utterly most significant _bit_ of the repeat length
    H  the second value of the magic sample rate constant (0x52, 0x00)
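The bit layout above can be turned into a few lines of code. This is only
a sketch in Python: the function name and the dictionary keys are my own
invention, and masking bit 0 out of the third byte to get the "magic"
constant is my reading of the layout, not something confirmed by Yamaha.

```python
def parse_tx_header(header: bytes) -> dict:
    """Decode the 32-byte TX wave header (hypothetical helper)."""
    if len(header) < 32 or header[:6] != b"LM8953":
        raise ValueError("not a TX wave header")

    def part_length(field: bytes) -> int:
        # Two bytes LSB first, plus bit 0 of the third byte as bit 16.
        return field[0] | (field[1] << 8) | ((field[2] & 0x01) << 16)

    # Offsets follow the C listing: 6 + 10 + 6 bytes, then format at 22.
    atc, rpt = header[24:27], header[27:30]
    return {
        "looped": header[22] == 0x49,
        "sample_rate_code": header[23],
        "attack_words": part_length(atc),
        "repeat_words": part_length(rpt),
        # Assumption: the magic constant is the third byte without bit 0.
        "rate_magic": (atc[2] & 0xFE, rpt[2] & 0xFE),
    }
```

A result with rate_magic equal to (0x06, 0x52) would then correspond to
the 33 kHz entry in the list of possible values.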
Now for the most important (and probably most interesting) part: the
waveform data. As you certainly know the TX uses 12-bit sampling
resolution, and this requires some kind of encoding if we are not willing
to waste one fourth of our disk space. Yamaha has chosen to group the
samples two by two, making three bytes of data in the file for each pair.
I'll illustrate this on a nybble level (one letter per hex digit):
    AA CD BB

    A  MSB of the first sample
    B  MSB of the second sample
    C  least significant nybble of the first sample
    D  least significant nybble of the second sample
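To make the "AA CD BB" grouping concrete, here is a minimal Python sketch
that unpacks such byte triples into 12-bit values. The function name is
made up for this example, and I assume that C is the high nybble and D the
low nybble of the middle byte (that is how I read the diagram). The values
are returned as raw unsigned 12-bit numbers, since the text does not say
whether the samples are signed.

```python
def unpack_tx_samples(data: bytes) -> list:
    """Unpack 3-byte groups into pairs of raw 12-bit sample values."""
    samples = []
    # Step through complete 3-byte groups; trailing bytes are ignored.
    for i in range(0, len(data) - 2, 3):
        aa, cd, bb = data[i], data[i + 1], data[i + 2]
        samples.append((aa << 4) | (cd >> 4))    # first sample:  AA + C
        samples.append((bb << 4) | (cd & 0x0F))  # second sample: BB + D
    return samples
```

For instance, the three bytes AB CD EF would give the two values ABCh and
EFDh.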
4 Yamaha Typhoon wave file format
This specification describes the compression algorithm for Typhoon
format
waves. It does not cover the file format, which is AIFF-C. The
documentation for AIFF-C is available at the site ftp.sgi.com in
the
directory /sgi/aiff-c.9.26.91.ps.Z (compressed Postscript file).
4.1 DWVW v1.2 compression
DWVW was invented in 1991 by Magnus Lidstrom and is copyright 1993 by
NuEdge Development. You have the right to use the algorithm freely as
long as you make no false claims on its origin. DWVW is a lossless (or
bit faithful) compression method for digital audio data. Lossless means
that the exact original data will be preserved when compressing and
decompressing.
The compression utilizes the fact that the delta between the sample
points is generally less than the full dynamic width. Each sample point
is subtracted from the previous one and the difference is entropy encoded
in a special format. Therefore the compression works best on low
frequency sounds with low noise ratio, where the difference between each
sample is small.
DWVW can be applied on samples of any bit resolution and with any number
of channels. As opposed to the AIFF standard, sample bits are not "left
justified". Instead the necessary translation should be done when
decompressing. Also, while AIFF interleaves multichannel sounds, DWVW
doesn't, as this complicates compression and decompression.
Each channel follows one another with only a slight break in the bit run.
The first delta for each channel should be put at an even 16-bit word
position. The encoding stores the delta points with only as many bits as
are required (hence the name "variable word width").
Thus, the number of bits used by each delta has to be stored as well.
Since this count varies very little we apply a (simpler) delta encoding
on this information.
To wrap it up, each compressed sample point consists of two values: the
delta from the last sample and the difference in word width of this delta
from the last delta (hereby referred to as "the WWM", the word width
modifier).
Even though the word width modifier is stored first in each delta frame
we will describe the delta information first. The delta is always stored
as an absolute difference (i.e. unsigned) in a variable number of bits.
An extra bit follows that tells the sign (if the delta isn't zero). The
number of bits required for the delta (i.e. the word width) is decided by
the position of the most significant high bit in the absolute value. One
bit less than this is actually stored since the first bit is always high.
For instance, the delta 11 (binary 1011) has a required word width of
four bits, but only the least significant three bits are stored. A zero
delta will have a zero word width and consequently requires neither delta
bits nor sign bit. A delta of one will require only a sign bit.
One special case requires attention. A normal two's complement number's
lowest negative number is one less than the negated highest positive
number. Treating zero as a positive value this gives exactly as many
negative as positive numbers. The delta encoding on the other hand does
not consider zero to be of any sign and does therefore not include the
one extra negative value. If this value is encountered in the delta
stream it is encoded as one greater than it actually is (putting it
within the expressible range of values).
To distinguish it from the next lowest value one extra bit is inserted
after the sign bit. The bit is high for the lowest value and low for the
next lowest value.
For example, a 16-bit two's complement number can be -32768. It would be
encoded as negative 32767 with an extra high bit. The value -32767 would
also be encoded as negative 32767 but with the extra bit low. Of course,
only these two values require the extra bit.
The WWM precedes the delta bits. It is encoded as a series of low bits
(0) terminated by a high bit (1) (in most cases). The count of low bits
tells the modifier amount. If the modifier isn't zero an extra bit
follows that tells the modifier sign. A high bit means negative modifier.
Word width "wraps" at the used bit resolution (new-width =
(original-width + modifier) modulo bit-resolution).
This enables us to go from a small width to a large width by using a
negative modifier. Because of this fact a WWM will never need to be
larger than the sound bit resolution divided by two (rounded downwards).
If the modifier is the maximum the terminating high bit would be
superfluous, so in this case it isn't inserted. (However, the sign bit is
always included, even if the bit resolution is even.)
For encoding, the current word width and sample value should be initially
reset to zero for each channel (the first delta will thus be the sample
value). A compressed channel always starts on an even 16-bit word
boundary. Notice that the best case is one bit per sample, i.e. a
compression ratio of eight times for 8-bit data. This occurs when the
source is a continuous series of zero samples.
4.2 DWVW sample delta bit frame:
    0...   WWM is the count of low bits (can be none)
    1      terminating high bit (if not max WWM)
    ms     WWM sign, high is negative (only on non-zero WWM)
    delta  (word width - 1) sample delta bits (if delta > 1)
    sb     delta sign bit (only on non-zero delta)
    xb     extra bit (only on lowest and next lowest possible delta
           value)
Some encoding examples (the examples all represent extreme situations
with unusually poor compression):

    Bit resolution  16
    Delta           923 (bin 00000011 10011011)
    Current width   1
    New width       10
    Modifier        -7 (mod 16 = 10)
    Yields          0000000 1 1 110011011 0

    Bit resolution  12
    Delta           -2048 (bin 1000 00000000)
    Current width   0
    New width       11
    Modifier        -1 (mod 12 = 11)
    Yields          0 1 1 1111111111 1 1
                    (-2048 is encoded as 2047 with extra bit and negative
                    high)

    Bit resolution  8
    Delta           -12 (bin 11110100, negated 00001100)
    Current width   0
    New width       4
    Modifier        +4
    Yields          0000 0 100 1 (no terminating bit for WWM)
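The three examples above can be reproduced with a small frame encoder.
This is a sketch in Python written from the description in this chapter,
not code from NuEdge; the function name and arguments are made up. One
assumption of mine: when a modifier of exactly half the bit resolution
could be written with either sign, the sketch picks the positive one,
which matches the third example.

```python
def encode_dwvw_frame(delta: int, cur_width: int, bits: int):
    """Return (bit string, new width) for one DWVW delta frame."""
    lowest = -(1 << (bits - 1))
    mag = abs(delta)
    extra = ""
    if delta == lowest:          # lowest value: stored as one greater
        mag -= 1
        extra = "1"
    elif delta == lowest + 1:    # next lowest value shares the magnitude
        extra = "0"
    width = mag.bit_length()     # position of the most significant 1 bit

    # Word width modifier, wrapped so that |mod| <= bits // 2.
    max_mod = bits // 2
    mod = (width - cur_width) % bits
    if mod > max_mod:
        mod -= bits
    out = "0" * abs(mod)
    if abs(mod) != max_mod:      # terminator is dropped at maximum WWM
        out += "1"
    if mod != 0:
        out += "1" if mod < 0 else "0"   # WWM sign, high is negative

    if width > 1:
        out += format(mag, "b")[1:]      # delta without its leading 1
    if mag != 0:
        out += "1" if delta < 0 else "0" # delta sign bit
    return out + extra, width
```

Calling it with (923, 1, 16), (-2048, 0, 12) and (-12, 0, 8) gives the
three bit patterns listed above (without the spaces).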
5 D00
This part describes the D00 music format (used by the AdLib player v4.01
coded by JCH/Vibrants) in more detail than the docs of EdLib (the
respective tracker, also coded by JCH) do. This document assumes that you
already own EdLib and have some experience with it. Also, the
availability of the EdLib docs as well as of the docs for the player
included with EdLib is assumed. You should know some basics about AdLib
programming and data formats (byte, word etc.), and be familiar with the
EdLib structures (Instruments, SpFX etc.) and with hexadecimal notation.
5.1 The D00 header
A description of the D00 header can be found in the player's docs. So I
won't show it again here. But JCH gives very cryptic names to the other
file structures, so I'll call them differently:
    JCH's names         My names
    TPoin tables        Arrangement data
    SeqPointer tables   Sequence data
    Instrument data     Instrument data
    DataInfo text       Song description
    Special tables      SpFX data
Also, I should mention that all the pointers to these tables are meant
relative to the beginning of the D00 file.
5.2 The Instrument data
The instrument data simply consists of all instruments used in the song.
Since the number of instruments is stored nowhere inside the file,
loaders should use the start offset of the next structure for determining
if they have read enough data. The data for each instrument consists of
16 bytes, which occur in the same order as the corresponding bytes in the
EdLib Instrument table:
    xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx
    +------------+ +------------+ |  |  |  |  +---+
     Carrier data  Modulator data |  |  |  |    +Unused
                                  |  |  |  +Hard restart SR value
                                  |  |  +Hard restart timer
                                  |  +Fine-tune
                                  +AM/FM + Feedback
For the exact meaning of these bytes, read the EdLib manual. Note that in
the Carrier and Modulator data the ADSR parts are not stored
word-oriented, but byte-oriented. That means they aren't stored as a word
whose High byte is the AD part and whose Low byte is the SR part
(although the display in EdLib creates that assumption).
Instead they're simply stored as two bytes of which the first one's the
AD part and the second one's the SR part.
5.3 The SpFX data
The SpFX data is stored more or less like the Instrument data, but one
single table entry consists of only 8 bytes arranged like this:
    xxxx xx xx xx xx xxxx   (note xx's are BYTES and xxxx's are WORDS!)
    |    |  |  |  |  +Pointer to next SpFX entry
    |    |  |  |  +Duration of SpFX entry in Frames
    |    |  |  +Modulator Level add
    |    |  +New Modulator level
    |    +Note add value
    +Instrument to use
Again, to really understand the meaning of these parts, you should
read
the EdLib docs.
5.4 The Arrangement data
The arrangement data determines which sequence is to be played on which
channel at which moment and in which way, if you understand what I mean
:) It consists of two parts: The Pointer part and the Data part (I simply
call them that way now :). The Pointer part consists of 16 word pointers
and one endmark (all endmarks are FFFFh, by the way). Only the first nine
pointers are used at the moment: one for each one of the nine AdLib
channels. Each one of these nine pointers points to the part of the Data
part which belongs to its channel. The Data part consists, as you'd have
guessed before, of nine independent arrangement streams. Each one of
these streams has the following format:
First comes a word telling the speed of that stream. Since this
information is stored at the beginning of EVERY stream, I assume
that
every channel may have its own unique speed, and EdLib simply doesn't
support this.
After that, the real arrangement data is stored. This data is organized
like this: If a word below 8000h is read, it's the number of a
sequence to
be played. In that case, the saved transpose data is used.
But if a word 8XYYh is read, with X and YY being any value, the
transpose
data is updated to X and YY (see the EdLib docs for information
on the
meaning of X and YY).
I have found out that the first arrangement entry for an arrangement
stream that contains at least one sequence is always such a command
to set
the
internal transpose data. So no default value is required to be
loaded into
the transpose data before playing. And looping the arrangement
stream
becomes easier.
If the word FFFFh is read, the arrangement stream has arrived at
its
looping point. The word following the FFFFh is an offset into the
arrangement stream telling at which position the stream should
be
restarted. If the word FFFEh is read, the arrangement stream has
reached
its end. Unlike the Loop command (FFFFh), the stream mustn't get
restarted
but halted. Also, there is no word following the FFFEh command.
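The rules for one arrangement stream can be summed up in a short Python
sketch. The function name and the event tuples are hypothetical, and I
assume Intel (little-endian) words because D00 is a PC format; the text
above does not say so explicitly. Words from 9000h up to FFFDh are not
described above, so the sketch just reports them as unknown.

```python
import struct

def walk_arrangement(stream: bytes):
    """Return (speed, events) for one channel's arrangement stream."""
    (speed,) = struct.unpack_from("<H", stream, 0)  # first word is speed
    events = []
    pos = 2
    while pos + 2 <= len(stream):
        (word,) = struct.unpack_from("<H", stream, pos)
        pos += 2
        if word == 0xFFFF:
            # Loop command: the next word is the restart position.
            restart = None
            if pos + 2 <= len(stream):
                (restart,) = struct.unpack_from("<H", stream, pos)
            events.append(("loop", restart))
            break
        elif word == 0xFFFE:
            events.append(("end",))              # halt, nothing follows
            break
        elif 0x8000 <= word <= 0x8FFF:
            # 8XYYh: update the transpose data to X and YY.
            events.append(("transpose", (word >> 8) & 0x0F, word & 0xFF))
        elif word < 0x8000:
            events.append(("sequence", word))    # sequence number to play
        else:
            events.append(("unknown", word))
    return speed, events
```

A player would keep the last "transpose" event around and apply it to
every "sequence" event that follows.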
5.5 The Sequence data
The Sequence data again consists of a pointer part and a data part. But
this time these two parts aren't stored in different parts of the file;
the data part is stored directly after the pointer part. Therefore, a
reference to a specific pattern should be seen as a reference to a word
counted from the beginning of the Sequence data.
This word (e.g. the first word for Pattern 0000h) then points to the
offset of the actual sequence data inside the file. I hope you got my
point...
Then, each sequence is stored as follows: Read a word. If its high byte
is below 20h, then it's a note. Note that RESTs and HOLDs are also
counted as notes. In this case, the low byte can contain the following
values:
    00h = REST
        The high byte tells the number of rests to insert minus one! E.g.
        a REST with a high byte of 01h means "Two RESTs".
    01h - 7Dh = Note
        The value of this note byte tells the amount of halfnotes to add
        to C-0 (e.g. 01h would mean C#0). In this case, the high byte
        tells the number of HOLDs to insert after the note.
    7Fh = HOLD
        The high byte tells the number of HOLDs minus one again!
If the high byte is 20h or above, but below 40h, it's a note again, but
this time with Tienote switched on. The high byte is used as repetition
count again, but don't forget to subtract 20h before evaluating it!!
If the high byte is 40h or above, it's an effect. In this case, the
complete word can simply be interpreted like any EdLib effect (set
instrument, set volume etc.). See the EdLib docs for a list of them. The
note word this effect refers to follows directly after the effect word.
If the read word is FFFFh, it indicates the end of that sequence. In that
case, the next sequence to be played should be determined and loaded and
the first effect/note of it should be played.
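As a rough illustration of the rules above, the following Python sketch
classifies a single sequence word. The function name and the tuples it
returns are my own; the word is assumed to be already read from the file
as a 16-bit number. Low bytes that are not covered by the list above are
reported as unknown rather than guessed.

```python
def classify_sequence_word(word: int):
    """Classify one 16-bit word of D00 sequence data."""
    if word == 0xFFFF:
        return ("end",)                     # end of this sequence
    high, low = word >> 8, word & 0xFF
    if high >= 0x40:
        return ("effect", word)             # interpret like an EdLib effect
    tie = high >= 0x20                      # 20h..3Fh: Tienote switched on
    count = high - 0x20 if tie else high
    if low == 0x00:
        return ("rest", count + 1, tie)     # high byte is count minus one
    if low == 0x7F:
        return ("hold", count + 1, tie)     # high byte is count minus one
    if 0x01 <= low <= 0x7D:
        return ("note", low, count, tie)    # count = HOLDs after the note
    return ("unknown", word)
```

For a "note" result the second element is the number of halfnotes above
C-0 and the third is the number of HOLDs to insert after it.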
6 MIDI SAMPLE DUMP STANDARD
6.1 INTRODUCTION
The MIDI SDS was adopted in January 1986 by the MIDI Manufacturers
Association and the Japanese MIDI Standards Committee. The SDS defines
the standard method for transfer of sound sample data between
MIDI-equipped devices. Sample dumps may be accomplished with either an
'open loop' or 'closed loop' system.
The open loop method simply involves the straight dump of all sample data
from its source to the destination, with no timeouts, packet
acknowledgements, or any other form of handshaking, much as in the manner
of a sysex bulk dump, usually initiated at the source.
The closed loop method allows the use of handshaking messages between the
dump source and destination, and usually places the dump process under
the control of the slave, to allow it time to process the incoming data
as necessary. As with any standard, it cannot be assumed that a device
adheres to it unless the accompanying documentation specifically
indicates it. Even then, it is best to check its conformity with
non-critical data.
6.2 SPEC: SAMPLE DUMP FORMATS
DUMP HEADER: F0 7E cc 01 ss ss ee ff ff ff gg gg gg hh hh hh ii ii ii jj
F7
    cc        channel number
    ss ss     sample number (LSB first)
    ee        sample format (number of significant bits; 8->28)
    ff ff ff  sample period (1/sample rate) in nanoseconds (LSB first)
    gg gg gg  sample length, in words (LSB first)
    hh hh hh  sustain loop start point (word number) (LSB first)
    ii ii ii  sustain loop end point (word number) (LSB first)
    jj        loop type (00:forwards only; 01:alternating)
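Reading the multi-byte fields of the header takes a little care, because
every MIDI data byte only carries seven bits. The Python sketch below
decodes a complete DUMP HEADER message; the function name and dictionary
keys are made up for this example, and the byte offsets are simply counted
from the layout shown above.

```python
def parse_sds_header(msg: bytes) -> dict:
    """Decode a 21-byte SDS DUMP HEADER message (hypothetical helper)."""
    if (len(msg) != 21 or msg[0] != 0xF0 or msg[1] != 0x7E
            or msg[3] != 0x01 or msg[-1] != 0xF7):
        raise ValueError("not an SDS dump header")

    def number(field: bytes) -> int:
        # 7-bit groups, least significant group first.
        value = 0
        for index, byte in enumerate(field):
            value |= (byte & 0x7F) << (7 * index)
        return value

    return {
        "channel": msg[2],
        "sample_number": number(msg[4:6]),
        "significant_bits": msg[6],
        "period_ns": number(msg[7:10]),
        "length_words": number(msg[10:13]),
        "loop_start": number(msg[13:16]),
        "loop_end": number(msg[16:19]),
        "loop_type": msg[19],
    }
```

As a check on the arithmetic: a period of 22675 ns (roughly 44.1 kHz)
would travel as the three bytes 13 31 01.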
DATA PACKET: F0 7E cc 02 kk <120 bytes> mm F7
    cc  channel number
    kk  running packet count (00->7F)
    mm  checksum (XOR of 7E, cc, 02, kk <120 bytes>)
The total size of a data packet is 127 bytes. This is to avoid overflow
of the MIDI input buffer of a device that may want to receive an entire
packet before processing it. A data packet consists of its own header, a
packet number, 120 bytes of data, a checksum, and an EOX. The packet
number begins at 00 and increments with each new packet. It resets to 00
after it reaches 7F, and continues counting.
The packet number is used by the receiver to distinguish between a new
data packet and a resend of a previous packet. The packet number is
followed by 120 bytes of data, which form 60, 40, or 30 words (MSB first
for multiword samples), depending on the length of a single data sample.
Each data byte holds seven bits, with the msb in each byte set to 0, in
order to conform to the requirements of MIDI data transmission.
Information is left justified within the 7-bit bytes, and unused bits are
filled with 0. Example: Assume a data point in the memory of a 16-bit
sampler, with the value 87E5. In binary, that would be:
    1000 0111 1110 0101
and would be encoded as the following MIDI data stream:
    01000011 01111001 00100000
The checksum is the running XOR of all the data after the SYSEX byte, up
to but not including the checksum itself.
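The packing rule and the checksum are easy to get wrong, so here is a
small Python sketch of both. The function names are hypothetical; the
logic follows the description above (left-justified 7-bit bytes, MSB
first, zero padding, XOR over everything between the SYSEX byte and the
checksum).

```python
def pack_sample(value: int, bits: int) -> bytes:
    """Left-justify one sample into 7-bit MIDI data bytes, MSB first."""
    nbytes = -(-bits // 7)                    # ceil(bits / 7)
    shifted = value << (7 * nbytes - bits)    # pad unused low bits with 0
    return bytes((shifted >> (7 * (nbytes - 1 - i))) & 0x7F
                 for i in range(nbytes))

def build_data_packet(channel: int, number: int, data: bytes) -> bytes:
    """Build one 127-byte SDS DATA PACKET from up to 120 data bytes."""
    if len(data) > 120:
        raise ValueError("at most 120 data bytes per packet")
    # Short final packets are filled up with zeroes.
    body = bytes([0x7E, channel & 0x7F, 0x02, number & 0x7F])
    body += data.ljust(120, b"\x00")
    checksum = 0
    for byte in body:                         # running XOR after the F0
        checksum ^= byte
    return b"\xF0" + body + bytes([checksum & 0x7F, 0xF7])
```

With these, pack_sample(0x87E5, 16) yields the bytes 43 79 20, which is
the bit pattern from the example above.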
6.3 SPEC: SAMPLE DUMP MESSAGES
DUMP REQUEST: F0 7E cc 03 ss ss F7
cc
channel number
ss ss
sample number requested (LSB first)
Upon receiving the request, the sampler checks the sample
number to see
if it is within legal range. If it is not,
the request is ignored.
If it is, the sample dump is started. One packet at a time is sent,
under
control of the handshaking messages outlined below.
6.4 HANDSHAKING MESSAGES:
For all below:
    cc  channel number
    pp  packet number
Packet numbers are included in the handshaking messages to accommodate
machines that have the intelligence to re-transmit specific packets after
an entire dump is finished, or if synchronization is lost.
ACK     F0 7E cc 7F pp F7
    Means last packet was received correctly (checksum OK, etc), please
    send next one. Packet number is packet being acknowledged as correct.
NAK     F0 7E cc 7E pp F7
    Means last packet not received correctly, please send again. Packet
    number is packet being rejected.
CANCEL  F0 7E cc 7D pp F7
    Means abort dump immediately. Packet number is packet on which abort
    occurs.
WAIT    F0 7E cc 7C pp F7
    Means pause dump indefinitely, until next message is sent. Allows the
    unit receiving the dump to perform other functions (disk access,
    etc), before receiving the remainder of the dump. The next message it
    sends (e.g. ACK, CANCEL) will determine if the dump continues or
    aborts.
6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)
Once a dump has been requested, either via MIDI or through the front
panel, the DUMP HEADER is sent.
After sending the header, the master must time out for at least two
seconds, to allow the receiver to decide if it will accept this sample
(has enough memory, etc). If it receives a CANCEL within this time, it
should abort immediately.
If it receives an ACK, it will start sending packets immediately. If it
receives a WAIT, it pauses until another message is received, and then
processes that message normally. If nothing is received within the
timeout, an open loop is assumed, and the dump starts with the first
packet.
After sending each packet, the master should time out for at least 20
milliseconds and watch its MIDI In.
If an ACK is received, it sends the next packet immediately. If it
receives a NAK, and the packet number matches the number of the last
packet sent, it resends that packet. If the packet numbers don't match,
and the device is incapable of sending packets out of order, the NAK will
be ignored.
If a WAIT is received, the master should watch its MIDI In port
indefinitely for another ACK, NAK, or CANCEL message, which it should
then process normally.
If no messages are received within 20 milliseconds of the transmission of
a packet, the master may assume an open loop configuration, and send the
next packet.
This process continues until there are fewer than 121 data bytes to send.
The final packet will still consist of 120 bytes, regardless of how many
significant bytes actually remain, and the unused bytes will be filled
with zeroes. The receiver should handshake after receiving the last
packet.
6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)
When receiving a sample dump, a device should keep a running checksum
during reception. If its checksum matches the checksum in the data
packet, it will send an ACK and wait for the next packet.
If it does not match, it will send a NAK containing the number of the
packet that caused the error, and wait for the next packet. If, after
sending a NAK, the packet number of the next packet doesn't match the
previous packet number (the one that was NAK'd), and the unit is not
capable of accepting packets out of order, the error is ignored and the
dump continues as if the checksums had matched.
If a receiver runs out of memory before the dump is completed, it should
send a CANCEL to stop the dump.
6.7 SDS OVERVIEW
DUMP DATA FORMAT: DUMP HEADER
Sysex
ID: Universal Non-Real Time
Channel Number
Sub ID: Header
Sample Number (2 bytes, LSB first)
Sample Format
Sample Period (3 bytes, LSB first)
Sample Length (3 bytes, LSB first)
Sustain Loop Start Point (3 bytes, LSB first)
Sustain Loop End Point (3 bytes, LSB first)
Loop Type
Eox
SAMPLE DUMP DATA FORMAT: DATA PACKET
Sysex
ID: Universal Non-Real Time
Channel Number
Sub ID: Data Packet
Packet Number
Sample Data (120 bytes)
Checksum
Eox
SAMPLE DUMP MESSAGES: DUMP REQUEST
Sysex
ID: Universal Non-Real Time
Channel Number
Sub ID: Dump Request
Sample Number (2 bytes, LSB first)
Eox
SAMPLE DUMP MESSAGES: HANDSHAKING FLAGS:
Sysex
ID: Universal Non-Real Time
Channel Number
Sub ID: ACK or NAK or CANCEL or WAIT
Packet Number
Eox
7 ROL
This part contains details of .ROL files used by AdLib and compatible
cards on the PC. The format is also used by Visual Composer (TM).
7.1 Structure of .ROL files:
    fld #  size (bytes)  type   description
      1         2        int    file version, major
      2         2        int    file version, minor
      3        40        char   unused
      4         2        int    ticks per beat
      5         2        int    beats per measure
      6         2        int    editing scale (Y axis)
      7         2        int    editing scale (X axis)
      8         1        char   unused
      9         1        char   0 = percussive mode
                                1 = melodic mode
     10        90        char   unused
     11        38        char   filler
     12        15        char   filler
     13         4        float  basic tempo
Field 14 indicates the number of times to repeat fields 15 and 16:
    fld #  size (bytes)  type   description
     14         2        int    number of tempo events
     15         2        int    time of events, in ticks
     16         4        float  tempo multiplier (0.01 - 10.0)
The remaining fields (17 to 34) are to be repeated for each of 11
voices:
    fld #  size (bytes)  type   description
     17        15        char   filler
     18         2        int    time (in ticks) of last note +1
Repeat the next two fields (19 and 20) while the summation of field
20 is
less than the value of field 18:
    fld #  size (bytes)  type   description
     19         2        int    note number: 0 => silence;
                                12 to 107 => normal note (you must
                                subtract 60 to obtain the correct value
                                for the sound driver)
     20         2        int    note duration, in ticks
     21        15        char   filler
Field 22 indicates the number of times to repeat fields 23 to 26:
    fld #  size (bytes)  type   description
     22         2        int    number of instrument events
     23         2        int    time of events, in ticks
     24         9        char   instrument name
     25         1        char   filler
     26         2        int    unused
     27        15        char   filler
Field 28 indicates the number of times to repeat fields 29 and 30:
    fld #  size (bytes)  type   description
     28         2        int    number of volume events
     29         2        int    time of events, in ticks
     30         4        float  volume multiplier (0.0 - 1.0)
     31        15        char   filler
Field 32 indicates the number of times to repeat fields 33 and 34:
    fld #  size (bytes)  type   description
     32         2        int    number of pitch events
     33         2        int    time of events, in ticks
     34         4        float  pitch variation (0.0 - 2.0, nominal is 1.0)
7.2 Notes
Fields #1 and #2 should be set to 0 and 4 respectively. Field #10
should
be filled with zeros.
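The fixed part of the file (fields 1 to 13) adds up to 201 bytes, so it
can be read in one go. Below is a minimal Python sketch; the names are
made up, and I assume Intel byte order, 2-byte signed ints and a 4-byte
IEEE float since this is a PC format (the field list itself does not spell
this out).

```python
import struct

# Fields 1-13 in order; "<" means little-endian without padding.
ROL_HEADER = struct.Struct("<hh40shhhhBB90s38s15sf")

def parse_rol_header(data: bytes) -> dict:
    """Decode fields 1-13 of a .ROL file (hypothetical helper)."""
    (major, minor, _unused3, ticks_per_beat, beats_per_measure,
     scale_y, scale_x, _unused8, mode, _unused10, _filler11,
     _filler12, basic_tempo) = ROL_HEADER.unpack_from(data, 0)
    return {
        "version": (major, minor),
        "ticks_per_beat": ticks_per_beat,
        "beats_per_measure": beats_per_measure,
        "editing_scale": (scale_x, scale_y),
        "melodic": mode == 1,          # field 9: 0 = percussive mode
        "basic_tempo": basic_tempo,
    }
```

The tempo events (field 14 onwards) start right after these 201 bytes and
have to be read in a loop, because their number is only known from field
14.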
8 8SVX
The 8SVX files are IFF files used for digital audio data. The format of
the VHDR block is complete guesswork. These files use Motorola byte
order. The 8SVX file format is fixed to 8-bit mono sample data - at least
GoldWave does not support saving files in any other format than 8-bit
mono.
8.1 FORMblock [VHDR]
This is the sample information block. The normal size is 20 bytes.
    OFFSET  Count  TYPE   Description
    0000h       1  dword  Sampling rate of digital data in Hz. This count
                          seems not to be too accurate, at least GoldWave
                          v2.0 creates different rates for Wave and 8SVX
                          files.
    0004h       4  dword  Other data, unknown
8.2 FORMblock [BODY]
This block contains the raw sample data; maybe the usual IFF compression
was used. The details of both the compression and the IFF format are
unknown.
9 AIFF
The Audio Interchange File Format files are digital audio files stored in
the IFF format; the samples are stored in signed PCM. The header block is
[AIFF], different subblocks are:
[AUTH]  The author's information (optional)
[COMM]  This record stores information about the sampled data
    OFFSET  Count  TYPE   Description
    0000h       1  word   number of channels or number of instrument
                          samples ???
    0002h       1  dword  Sample length
    0006h       1  dword  lower frequency
    000Ah       1  dword  maximum frequency
    000Eh       1  dword  ???
[MARK]
[NAME]  The name of the instrument / sample
[SSND]  The stored sample data.
10 AU
The AU files are digital audio files used by the Sun and NeXT
workstations. Further information wanted.
    OFFSET  Count  TYPE   Description
    0000h       4  char   ID='.snd'
    0004h       1  dword  Offset of start of sample
    0008h       1  dword  Length of stored sample
    000Ch       1  dword  Sound encoding:
                           1 - 8-bit ISDN u-law,
                           2 - 8-bit linear PCM (REF-PCM),
                           3 - 16-bit linear PCM,
                           4 - 24-bit linear PCM,
                           5 - 32-bit linear PCM,
                           6 - 32-bit IEEE floating point,
                           7 - 64-bit IEEE floating point,
                          23 - 8-bit ISDN u-law compressed (G.721 ADPCM)
    0010h       1  dword  Sampling rate
    0014h       1  dword  Number of sample channels
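The header table maps directly onto six 32-bit fields. Here is a minimal
Python sketch for reading it; the function name and dictionary keys are
made up. The table does not state the byte order: AU files use big-endian
(Motorola) order, which is what the ">" in the format string selects.

```python
import struct

def parse_au_header(data: bytes) -> dict:
    """Decode the 24-byte AU header (hypothetical helper)."""
    (magic, data_offset, data_size, encoding,
     sample_rate, channels) = struct.unpack_from(">4sIIIII", data, 0)
    if magic != b".snd":
        raise ValueError("not an AU file")
    return {
        "data_offset": data_offset,    # where the sample data starts
        "data_size": data_size,        # length of the stored sample
        "encoding": encoding,          # see the encoding list above
        "sample_rate": sample_rate,
        "channels": channels,
    }
```

The sample data itself would then be data[data_offset : data_offset +
data_size], to be interpreted according to the encoding number.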
11 FSM
The .FSM files are samples to be used for module style music with the
Farandole Composer. Currently only samples of up to 64K length are
supported, although the header reserves a dword for the sample size.
    OFFSET  Count  TYPE   Description
    0000h       4  char   ID='FSM',254
    0004h      32  char   ASCII name of sample
    0024h       3  char   ID=10,13,26
    0027h       1  dword  Length of sample (<=64K)
    0028h       1  byte   Fine tune value for sample (currently
                          unsupported)
    0029h       1  byte   Sample volume (currently unsupported)
    002Ah       1  dword  Start of sample loop
    002Dh       1  dword  End of sample loop. If the sample is not set to
                          loop (see below) this should be set to the end
                          of the sample.
    0032h       1  byte   Sample type, bitmapped
                          0   - 8-bit/16-bit sample
                          1-7 - reserved
    0033h       1  byte   Loop mode ?bit mapped?
                          0-2 - reserved
                          3   - loop off/loop on
                          4-7 - reserved
    0034h       ?  byte   Sample data in signed format
12 GF1 PATCH
The GF1 Patch files are multipart sound files for the Gravis Ultrasound
sound card to emulate MIDI sounds in high quality. Each Patch can consist
of many samples (for example, a string ensemble consists of Violin,
Viola, Cello, Bass) which are played depending on the note to play. A
patch can also contain a part to be played before the loop and a part to
be played after the tone has been released.
    OFFSET  Count  TYPE   Description
    0000h      12  char   ID='GF1PATCH110'
    000Ch      10  char   Manufacturer ID
    0018h      60  char   Description of the contained Instruments or
                          copyright of manufacturer.
    0054h       1  byte   Number of instruments in this patch
    0055h       1  byte   Number of voices for sample
    0056h       1  byte   Number of output channels (1=mono,2=stereo)
    0057h       1  word   Number of waveforms
    0059h       1  word   Master volume for all samples
    005Bh       1  dword  Size of the following data
    0060h      36  byte   reserved
Following this header, the instruments with their headers follow. An
instrument header contains the name and other data about one instrument
contained within the patch.
    OFFSET  Count  TYPE   Description
    0000h       1  word   Instrument number. ?Maybe the MIDI instrument
                          number?. In the Gravis patches, this is 0, in
                          other patches, I found random values.
    0002h      16  char   ASCII name of the instrument.
    0012h       1  dword  Size of the whole instrument in bytes.
    0016h       1  byte   Layers. Needed for whatever.
    0017h      40  byte   reserved
About the patch, I don't know anything. Maybe somebody could enlighten
me.
Each patch record has the following format :
OFFSET  Count  TYPE   Description
0000h   7      char   Wave file name
0007h   1      byte   Fractions
0008h   1      dword  Wave size. Size of the wave digital data
000Ch   1      dword  Start of wave loop
0010h   1      dword  End of wave loop
0012h   1      word   Sample rate of the wave
0014h   1      word   Minimum frequency to play the wave
0016h   1      word   Maximum frequency to play the wave
0018h   1      dword  Original sample rate of the wave data
001Ch   1      int    Fine tune value for the wave
001Eh   1      byte   Stereo balance, values unknown
001Fh   6      byte   Filter envelope rate
0025h   6      byte   Filter envelope offset
002Bh   1      byte   Tremolo sweep
002Ch   1      byte   Tremolo rate
002Dh   1      byte   Tremolo depth
002Fh   1      byte   Vibrato sweep
0030h   1      byte   Vibrato rate
0031h   1      byte   Vibrato depth
0032h   1      byte   Wave data, bitmapped
                        0 - 8/16 bit wave data
                        1 - signed/unsigned data
                        2 - de/enable looping
                        3 - no/has bidirectional looping
                        4 - loop forward/backward
                        5 - Turn envelope sustaining off/on
                        6 - Dis/Enable filter envelope
                        7 - reserved
0033h   1      int    Frequency scale, whatever that means
0035h   1      word   Frequency scale factor
0037h   36     byte   Reserved
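The bitmapped wave data byte can be unpacked with a few masks. The Python
sketch below is only an illustration: the names GF1WaveMode and
describe_wave_mode are made up here, and it assumes that for each "a/b"
pair in the table a cleared bit means the first alternative and a set bit
the second.

```python
from enum import IntFlag


class GF1WaveMode(IntFlag):
    """Bits 0-6 of the wave data byte (hypothetical names)."""
    SIXTEEN_BIT = 0x01      # bit 0: 8/16 bit wave data
    UNSIGNED = 0x02         # bit 1: signed/unsigned data
    LOOPING = 0x04          # bit 2: de/enable looping
    BIDIRECTIONAL = 0x08    # bit 3: no/has bidirectional looping
    BACKWARD = 0x10         # bit 4: loop forward/backward
    SUSTAIN = 0x20          # bit 5: envelope sustaining off/on
    FILTER_ENVELOPE = 0x40  # bit 6: dis/enable filter envelope


def describe_wave_mode(mode_byte):
    # Bit 7 is reserved, so it is masked off before decoding.
    mode = GF1WaveMode(mode_byte & 0x7F)
    return {
        "bits": 16 if GF1WaveMode.SIXTEEN_BIT in mode else 8,
        "signed": GF1WaveMode.UNSIGNED not in mode,
        "looping": GF1WaveMode.LOOPING in mode,
        "bidirectional": GF1WaveMode.BIDIRECTIONAL in mode,
        "backward": GF1WaveMode.BACKWARD in mode,
        "sustain": GF1WaveMode.SUSTAIN in mode,
        "filter_envelope": GF1WaveMode.FILTER_ENVELOPE in mode,
    }
```

For example, describe_wave_mode(0x05) reports a 16-bit, signed, looping
wave under these assumptions.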
13 S3I
This is the Digiplayer/ST3.0 digital sample file format. The sample files
include information about the loop of the instrument. The AdLib
instruments have another format listed below.
OFFSET  Count  TYPE   Description
0000h   1      byte   ID=01h
0001h   12     char   DOS filename
000Dh   1      byte   reserved (0)
000Eh   1      word   Paragraph offset of the raw sample data from
                      beginning of file.
0010h   1      dword  Sample length in bytes
0014h   1      dword  Start of sample loop
0018h   1      dword  End of sample loop
001Ch   1      byte   Playback volume of sample
001Dh   1      byte   ??? "DSK" (meaning unknown)
001Eh   1      byte   Pack type
                        0 - unpacked
                        1 - DP30ADPCM 1
001Fh   1      byte   Flags (bitmapped)
                        0 - loop on/off
                        1 - stereo sample (length bytes for left channel,
                            then another length bytes for right channel!)
                        2 - 16-Bit samples (in Intel byte order)
0020h   1      dword  C2 frequency
0024h   1      dword  reserved
0028h   1      word   reserved
002Ah   1      word   ID=512
002Ch   1      dword  ?? Date of last modification ?? (see table 0009)
0030h   28     char   ASCIIZ Sample name
004Ch   4      char   ID='SCRS'
0050h   ?      byte   Raw sample data
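As a rough illustration of the sample header, here is a Python sketch that
reads the main fields. The function name parse_s3i_sample_header and the
dictionary keys are invented for this example. It assumes Intel (little
endian) byte order throughout, that a paragraph is 16 bytes, and that the
'SCRS' tag sits at 004Ch, directly after the 28-byte name that starts at
0030h.

```python
import struct


def parse_s3i_sample_header(data):
    if data[0] != 0x01:
        raise ValueError("not a digital sample (ID byte is not 01h)")
    if data[0x4C:0x50] != b"SCRS":
        raise ValueError("missing SCRS tag")
    # 000Eh..0023h: paragraph offset, length, loop start, loop end,
    # volume, "DSK", pack type, flags, C2 frequency.
    (para, length, loop_start, loop_end,
     volume, _dsk, pack, flags, c2) = struct.unpack_from("<HIIIBBBBI", data, 0x0E)
    return {
        "filename": data[0x01:0x0D].split(b"\0", 1)[0].decode("ascii", "replace"),
        "name": data[0x30:0x4C].split(b"\0", 1)[0].decode("ascii", "replace"),
        "data_offset": para * 16,  # assumption: 16-byte paragraphs
        "length": length,
        "loop": (loop_start, loop_end) if flags & 0x01 else None,
        "stereo": bool(flags & 0x02),
        "sixteen_bit": bool(flags & 0x04),
        "volume": volume,
        "packed": pack != 0,
        "c2_frequency": c2,
    }
```

Returning None for the loop when flag bit 0 is clear is a design choice of
this sketch, not something the format prescribes.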
Here follows the AdLib instrument format, for which I don't know the
extension:
OFFSET  Count  TYPE   Description
0000h   1      byte   Instrument type
                        2 - melodic instrument
                        3 - bass drum
                        4 - snare drum
                        5 - tom tom
                        6 - cymbal
                        7 - hihat
0001h   12     char   DOS file name
000Dh   3      byte   reserved
0010h   1      byte   Modulator description (bitmapped)
                        0-3 - frequency multiplier
                        4 - scale envelope
                        5 - sustain
                        6 - pitch vibrato
                        7 - volume vibrato
0011h   1      byte   Carrier description (same as modulator)
0012h   1      byte   Modulator miscellaneous (bitmapped)
                        0-5 - 63-volume
                        6 - MSB of levelscale
                        7 - LSB of levelscale
0013h   1      byte   Carrier miscellaneous (same as modulator)
0014h   1      byte   Modulator attack / decay byte (bitmapped)
                        0-3 - Decay
                        4-7 - Attack
0015h   1      byte   Carrier attack / decay byte (same as modulator)
0016h   1      byte   Modulator sustain / release byte (bitmapped)
                        0-3 - Release count
                        4-7 - 15-Sustain
0017h   1      byte   Carrier sustain / release byte (same as modulator)
0018h   1      byte   Modulator wave select
0019h   1      byte   Carrier wave select
001Ah   1      byte   Modulator feedback byte (bitmapped)
                        0 - additive synthesis on/off
                        1-7 - modulation feedback
001Bh   1      byte   reserved
001Ch   1      byte   Instrument playback volume
001Dh   1      byte   ??? "DSK"
001Eh   1      word   reserved
0020h   1      dword  C2 frequency
0024h   12     byte   reserved
0030h   28     char   ASCIIZ Instrument name
004Ch   4      char   ID='SCRI'
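The bitmapped modulator bytes can be taken apart as in the following
Python sketch. The function name decode_modulator and the key names are
hypothetical. It reads "63-volume" and "15-Sustain" as meaning that the
stored value is 63 minus the volume and 15 minus the sustain level, which
is an interpretation of the table, not a confirmed fact.

```python
def decode_modulator(record):
    """Decode modulator bytes 0010h, 0012h, 0014h and 0016h of a record."""
    char, misc, ad, sr = record[0x10], record[0x12], record[0x14], record[0x16]
    return {
        "multiplier": char & 0x0F,           # bits 0-3
        "scale_envelope": bool(char & 0x10),
        "sustain": bool(char & 0x20),
        "pitch_vibrato": bool(char & 0x40),
        "volume_vibrato": bool(char & 0x80),
        "volume": 63 - (misc & 0x3F),        # bits 0-5 hold 63-volume
        # bit 6 is the MSB and bit 7 the LSB of the two-bit levelscale
        "levelscale": (((misc >> 6) & 1) << 1) | ((misc >> 7) & 1),
        "attack": ad >> 4,
        "decay": ad & 0x0F,
        "sustain_level": 15 - (sr >> 4),     # bits 4-7 hold 15-Sustain
        "release": sr & 0x0F,
    }
```

The carrier bytes at 0011h, 0013h, 0015h and 0017h share the same layout,
so the same masks apply to them.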
14 UWF
The UWF files are sample files used by the UltraTracker. Further
information wanted.
OFFSET  Count  TYPE   Description
0000h   32     char   ASCIIZ sample name
0020h   1      char   ID=1Ah
0021h   1      char   ID=10h
0022h   5      char   ID='MUWFB'
0027h   1      char   ID=0
0028h   6      char   Length of sample as ASCII long integer
002Eh   1      word   Length of sample
15 WAVE
The Windows .WAV files are RIFF format files. Some programs expect the fmt
block right behind the RIFF header itself, so your programs should write
out this block as the first block in the RIFF file. The subblocks for the
wave files are:
15.1 RiffBLOCK [data]
This block contains the raw sample data. The necessary information for
playback is contained in the [fmt ] block.
15.2 RiffBLOCK [fmt ]
This block contains the data necessary for playback of the sound files.
Note the trailing blank in the block name "fmt ".
OFFSET  Count  TYPE   Description
0000h   1      word   Format tag
                        1 = PCM (raw sample data)
                        2 etc. for ADPCM, a-Law, u-Law ...
0002h   1      word   Channels (1=mono,2=stereo,...)
0004h   1      dword  Sampling rate
0008h   1      dword  Average bytes per second
                      (=sampling rate*block alignment)
000Ch   1      word   Block alignment (bytes per sample frame,
                      =channels*bytes per sample)
000Eh   1      word   Bits per sample (8/12/16-bit samples)
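To make the relationship between the fields concrete, here is a small
Python sketch that builds a PCM [fmt ] block. The function name
build_fmt_chunk is made up for this example. It assumes little endian
(Intel) byte order, format tag 1, and samples stored in whole bytes, so
the block alignment is channels times bytes per sample and the average
bytes per second is the sampling rate times the block alignment (which
equals sampling rate*channels only for 8-bit samples).

```python
import struct


def build_fmt_chunk(channels, sample_rate, bits_per_sample):
    bytes_per_sample = (bits_per_sample + 7) // 8   # 12-bit rounds up to 2
    block_align = channels * bytes_per_sample
    avg_bytes_per_second = sample_rate * block_align
    body = struct.pack(
        "<HHIIHH",
        1,                      # format tag: PCM
        channels,
        sample_rate,
        avg_bytes_per_second,
        block_align,
        bits_per_sample,
    )
    # Block name (note the trailing blank), body length, then the body.
    return b"fmt " + struct.pack("<I", len(body)) + body
```

For 16-bit stereo at 44100 Hz this gives a block alignment of 4 and
176400 average bytes per second.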
15.3 RiffBLOCK [loop]
This block is for looped samples. Very few programs support this block,
but if your program changes the wave file, it should preserve any unknown
blocks.
OFFSET  Count  TYPE   Description
0000h   1      dword  Start of sample loop
0004h   1      dword  End of sample loop
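One way to preserve unknown blocks is to walk the file block by block and
copy through anything that is not recognised. The Python sketch below
shows the walking part. The function name iter_riff_chunks is invented
here. The outer layout it relies on ("RIFF", a dword length, "WAVE", then
blocks of a 4-character name, a dword length and the data, padded to an
even length) is the general RIFF convention and is not spelled out in the
tables above.

```python
import struct


def iter_riff_chunks(data):
    """Yield (name, payload) for every block in a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        name, size = struct.unpack_from("<4sI", data, pos)
        yield name, data[pos + 8 : pos + 8 + size]
        # Blocks are padded to an even length.
        pos += 8 + size + (size & 1)
```

A program that rewrites the file could loop over these pairs, replace
only the blocks it understands (for example b"loop") and write the others
back unchanged.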
16 ZyXEL
The ZyXEL modems are capable of digitizing speech. The ZFAX software and
answering machine software like VoiceConnect store the sampled data in
these files. The modems are capable of compressing the data down to 19.2k
CPS (ADPCM) and 9.6k CPS (CELP). The algorithms for the compression may be
found in the ZyxelVoc package by N. Igl, but as the firmware on the modems
changes, so might the compression algorithm. Playback on the modem is
always possible. Files are specified by the .ZVD and .ZYX extensions.
OFFSET  Count  TYPE   Description
0000h   5      char   ID='ZyXEL'
0005h   1      byte   02h, ??? format tag
0006h   4      byte   reserved
000Ah   1      word   Compression scheme
                        0 - CELP
                        1 - 2 bit ADPCM
                        2 - 3 bit ADPCM
000Ch   4      byte   reserved
0010h   ?      ????   Raw data. The voice data is just the data received
                      from the U1496 Modem/Fax.
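Reading this header takes only a few lines. The following Python sketch
is illustrative: parse_zyxel_header and ZYXEL_SCHEMES are names invented
here, and the byte order of the compression scheme word is assumed to be
little endian, which the table does not state.

```python
import struct

ZYXEL_SCHEMES = {0: "CELP", 1: "2 bit ADPCM", 2: "3 bit ADPCM"}


def parse_zyxel_header(data):
    if data[:5] != b"ZyXEL":
        raise ValueError("missing ZyXEL id")
    (scheme,) = struct.unpack_from("<H", data, 0x0A)  # assumed little endian
    return {
        "format_tag": data[5],
        "scheme": ZYXEL_SCHEMES.get(scheme, "unknown"),
        "voice_data": data[0x10:],  # raw modem data, left undecoded
    }
```

The voice data is returned untouched, since decoding it depends on the
modem firmware.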
17 Creative Labs File Formats
17.1 Sound Blaster Instrument File Format (SBI)
The SBI format contains the register values for the FM chip to synthesize
an instrument.
Offset   Description
00h-03h  Contains id characters "SBI" followed by byte 1Ah
04h-23h  Instrument name, NULL terminated string
24h      Modulator Sound Characteristic (Mult, KSR, EG, VIB, AM)
25h      Carrier Sound Characteristic
26h      Modulator Scaling/Output Level
27h      Carrier Scaling/Output Level
28h      Modulator Attack/Decay
29h      Carrier Attack/Decay
2Ah      Modulator Sustain/Release
2Bh      Carrier Sustain/Release
2Ch      Modulator Wave Select
2Dh      Carrier Wave Select
2Eh      Feedback/Connection
2Fh-33h  Reserved
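The table maps directly onto a small reader. In the Python sketch below
the names parse_sbi and SBI_FIELDS are hypothetical labels chosen for this
example. The register bytes are returned as plain integers, without
splitting them into their individual bit fields.

```python
SBI_FIELDS = (
    "modulator_characteristic", "carrier_characteristic",
    "modulator_scaling_level", "carrier_scaling_level",
    "modulator_attack_decay", "carrier_attack_decay",
    "modulator_sustain_release", "carrier_sustain_release",
    "modulator_wave_select", "carrier_wave_select",
    "feedback_connection",
)


def parse_sbi(data):
    if data[:4] != b"SBI\x1a":
        raise ValueError("missing SBI id")
    # 04h-23h: instrument name, cut at the first NULL byte.
    name = data[0x04:0x24].split(b"\0", 1)[0].decode("ascii", "replace")
    # 24h-2Eh: the eleven FM register values, in table order.
    registers = dict(zip(SBI_FIELDS, data[0x24:0x2F]))
    return name, registers
```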
17.2 Creative Music File Format (CMF)
The CMF file format consists of 3 blocks: the header block, the instrument
block and the music block.
The CMF Header Block
Offset   Description
00h-03h  Contains id characters "CTMF"
04h-05h  CMF Format Version (MSB = major version, LSB = minor version)
06h-07h  File offset of the instrument block
08h-09h  File offset of the music block
0Ah-0Bh  Clock ticks per quarter note (one beat), default = 120
0Ch-0Dh  Clock ticks per second
0Eh-0Fh  File offset of the music title (0 = none)
10h-11h  File offset of the composer name (0 = none)
12h-13h  File offset of the remarks (0 = none)
14h-23h  Channel-In-Use Table
24h-25h  Number of instruments used
26h-27h  Basic Tempo
28h-?    Title, composer and remarks stored here
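The fixed part of the header (00h-27h) can be read in one step. This
Python sketch is illustrative only: CMF_HEADER and parse_cmf_header are
invented names, and it assumes all two-byte fields are little endian
(Intel) words, so the minor version is the byte at 04h and the major
version the byte at 05h.

```python
import struct

# 4-char id, eight words, 16-byte channel table, two words = 40 bytes.
CMF_HEADER = struct.Struct("<4sHHHHHHHH16sHH")


def parse_cmf_header(data):
    (magic, version, instrument_offset, music_offset, ticks_per_quarter,
     ticks_per_second, title_offset, composer_offset, remarks_offset,
     channels_in_use, instrument_count, basic_tempo) = CMF_HEADER.unpack_from(data, 0)
    if magic != b"CTMF":
        raise ValueError("missing CTMF id")
    return {
        "version": (version >> 8, version & 0xFF),  # (major, minor)
        "instrument_offset": instrument_offset,
        "music_offset": music_offset,
        "ticks_per_quarter": ticks_per_quarter,
        "ticks_per_second": ticks_per_second,
        "title_offset": title_offset,        # 0 = none
        "composer_offset": composer_offset,  # 0 = none
        "remarks_offset": remarks_offset,    # 0 = none
        "channels_in_use": list(channels_in_use),
        "instrument_count": instrument_count,
        "basic_tempo": basic_tempo,
    }
```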
17.3 The CMF Instrument Block
The instrument block contains one 16 byte data structure for each
instrument in the piece. Each record is of the same format as bytes
24h-33h in the SBI file format.
17.4 The CMF Music Block
The music block adheres to the standard MIDI file format, and can have
from 1 to 16 instruments. The PC-GPE file MIDI.TXT contains more
information on this file format.
The music block consists of an alternating sequence of time and MIDI event
records:
dTime
MIDI Event
dTime
MIDI Event
dTime
MIDI Event
........
dTime (delta Time) is the amount of time before the following MIDI event.
MIDI Event is any MIDI channel message.
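Since the music block is said to follow the standard MIDI file format, a
reasonable guess is that dTime is stored as a standard MIDI
variable-length quantity: seven data bits per byte, with the top bit set
on every byte except the last. The Python sketch below reads one such
value. The function name read_delta_time is made up here, and the
encoding is an assumption based on that statement, not something the CMF
description confirms.

```python
def read_delta_time(data, pos):
    """Return (delta_time, next_pos) for the value starting at pos."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # top bit clear marks the last byte
            return value, pos
```

A player would call this, then read the MIDI event that starts at
next_pos, and repeat until the end of the block.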
The CMF file format defines the following MIDI Control Change events:
Control No  Control Data
66h         1-127, used as markers in the music
67h         0 = melody mode, 1 = rhythm mode
68h         0-127, changes the pitch of all following notes upward by the
            given number of 1/128 semitones
69h         0-127, changes the pitch of all following notes downward by
            the given number of 1/128 semitones
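A player might track these controllers in a small state record, as in the
Python sketch below. The function name handle_cmf_control and the state
keys are hypothetical. Whether controllers 68h and 69h replace or add to
an earlier pitch change is not stated above, so this sketch simply
assumes they replace it.

```python
def handle_cmf_control(state, control, value):
    """Update a player state dict for one CMF Control Change event."""
    if control == 0x66:
        state["marker"] = value
    elif control == 0x67:
        state["rhythm_mode"] = (value == 1)
    elif control == 0x68:
        state["pitch_offset"] = value / 128    # semitones upward
    elif control == 0x69:
        state["pitch_offset"] = -value / 128   # semitones downward
    return state
```

Other controller numbers are left untouched, so the function can sit in
front of an ordinary MIDI controller handler.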
In rhythm mode, the last five of the 16 channels are allocated for the
percussion instruments:
Channel  Instrument
12       Bass Drum
13       Snare Drum
14       Tom-Tom
15       Top Cymbal
16       High-hat Cymbal
17.5 Sound Blaster Instrument Bank File Format (IBK)
A bank file is a group of up to 128 instruments.
Offset     Description
00h-03h    Contains id characters "IBK" followed by byte 1Ah
04h-803h   Parameters for 128 instruments, 16 bytes for each instrument in
           the same format as bytes 24h-33h in the SBI format
804h-C83h  Instrument names for 128 instruments, 9 bytes for each
           instrument, each name must be null terminated
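Because both tables use fixed-size records, a bank can be split with
simple slicing. The Python sketch below is an illustration. The function
name parse_ibk is made up, and the parameter records are returned as raw
16-byte strings.

```python
def parse_ibk(data):
    """Return a list of 128 (name, parameter_bytes) pairs."""
    if data[:4] != b"IBK\x1a":
        raise ValueError("missing IBK id")
    instruments = []
    for i in range(128):
        # 16 parameter bytes per instrument, starting at 004h.
        params = data[0x004 + 16 * i : 0x004 + 16 * (i + 1)]
        # 9 name bytes per instrument, starting at 804h.
        raw_name = data[0x804 + 9 * i : 0x804 + 9 * (i + 1)]
        name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        instruments.append((name, params))
    return instruments
```

Each parameter record has the same layout as bytes 24h-33h of an SBI
file, so it can be handed to whatever code decodes those.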
18 Creative Voice (VOC) file format
HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]
byte #  Description
00-12   "Creative Voice File"
13      1A (eof to abort printing of file)
14-15   Offset of first datablock in .voc file (std 1A 00 in Intel
        notation)
16-17   Version number (minor,major) (VOC-HDR puts 0A 01)
18-19   Bitwise complement of Ver. # + 1234h (VOC-HDR puts 29 11)
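The header can be validated as sketched below in Python. The function
name parse_voc_header is invented for this example. The check at bytes
18-19 is computed as the bitwise complement of the version word plus
1234h, truncated to 16 bits, because that is what reproduces the example
bytes given above (version 0A 01 gives 29 11).

```python
import struct


def parse_voc_header(data):
    if data[:0x14] != b"Creative Voice File\x1a":
        raise ValueError("missing VOC id")
    data_offset, version, check = struct.unpack_from("<HHH", data, 0x14)
    if check != (~version + 0x1234) & 0xFFFF:
        raise ValueError("version check word does not match")
    # Returns the first block offset and (major, minor).
    return data_offset, (version >> 8, version & 0xFF)
```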
Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
NOTE: Terminator Block is an exception -- it has only the TYPE byte.
TYPE  Description     Size (3-byte int)  Info
00    Terminator      (NONE)             (NONE)
01    Sound data      2+length of data   *
02    Sound continue  length of data     Voice Data
03    Silence         3                  **
04    Marker          2                  Marker# (2 bytes)
05    ASCII           length of string   null terminated string
06    Repeat          2                  Count# (2 bytes)
07    End repeat      0                  (NONE)
08    Extended        4                  ***
*Sound Info Format:
00      Sample Rate
01      Compression Type
02+     Voice Data
**Silence Info Format:
00-01   Length of silence - 1
02      Sample Rate
***Extended Info Format:
00-01   Time Constant:
          Mono:   65536 - (256000000/sample_rate)
          Stereo: 65536 - (256000000/(2*sample_rate))
02      Pack
03      Mode:
          0 = mono
          1 = stereo
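Walking the data blocks follows directly from the TYPE/SIZE/INFO layout.
The Python sketch below is illustrative: iter_voc_blocks is a made-up
name, and the 3-byte size is assumed to be stored low byte first, like
the other multi-byte values in the header.

```python
def iter_voc_blocks(data, offset=0x1A):
    """Yield (block_type, info_bytes) until the Terminator Block."""
    pos = offset
    while pos < len(data):
        block_type = data[pos]
        if block_type == 0x00:
            # The Terminator Block has only the TYPE byte.
            yield 0x00, b""
            return
        size = int.from_bytes(data[pos + 1 : pos + 4], "little")
        yield block_type, data[pos + 4 : pos + 4 + size]
        pos += 4 + size
```

For a type 01 block, the first two info bytes are the sample rate byte
and the compression type, and the rest is voice data.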
Marker#            Driver keeps the most recent marker in a status byte
Count#             Number of repetitions + 1. Count# may be 1 to FFFE for
                   0 - FFFD repetitions, or FFFF for endless repetitions
Sample Rate        SR byte = 256-(1000000/sample_rate)
Length of silence  in units of sampling cycle
Compression Type   of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels)
                     [interesting this isn't in the developer's manual]
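The two rate formulas are easy to get wrong by a factor of ten, so here
is a short Python sketch of both. The function names are invented for
this example. It assumes integer division with truncation, and it assumes
the same constant 256000000 for mono and stereo in the extended time
constant, with the sample rate multiplied by the number of channels.

```python
def sample_rate_byte(sample_rate):
    """SR byte = 256 - (1000000 / sample_rate)."""
    return 256 - 1000000 // sample_rate


def rate_from_byte(sr_byte):
    """Invert the SR byte; the result is only approximate."""
    return 1000000 // (256 - sr_byte)


def extended_time_constant(sample_rate, channels=1):
    """Time constant for the Extended block (type 08)."""
    return 65536 - 256000000 // (channels * sample_rate)
```

Because of the truncation, a rate of 11025 Hz is stored as byte 166 and
reads back as 11111 Hz.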
19 Revision History
Version 1.0 - First document containing 15 formats
Version 1.1 - 2 More formats added