This FilmmakerIQ Lesson is proudly sponsored by RØDE Microphones. Premium microphones
and audio accessories for studio, live and location recording.
Hi, John Hess from FilmmakerIQ.com - today we’ll get into the basics of digital data
storage from how we count it, how we name it, how we store it, and finally how we make
files from it.
Digital data at its heart is very simple: it’s either on or off - one or zero.
The binary number system that is central to modern computing can actually be traced back
to Gottfried Leibniz and his ‘On the Art of Combination’, published in 1666. Leibniz
was interested in creating a pure mathematical language guided by perfect logic. Later
in his life, the binary system took on a quasi-religious mysticism, with one being
God and zero being the void.
It didn’t work out so well for Leibniz, but when people started building machines
to add - that’s when binary really grew some legs. By allowing only one of two options,
binary offered a level of precision that analog signals could never match.
The smallest bit of information in binary is... the bit. The bit can only express one
of two states - either zero or one. Add a second bit and we have four possible
states - zero zero, zero one, one zero, one one. Add another bit and we have 8 possible
states. Each new bit doubles the number of possible states - to 16, 32, 64, 128 and 256.
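For anyone who wants to check that doubling, the count of states is just 2 raised to the number of bits - a quick sketch (the helper name here is mine, not a standard function):

```python
def states(bits: int) -> int:
    """Number of distinct values that `bits` binary digits can represent."""
    return 2 ** bits

# Each added bit doubles the count: 1 bit -> 2 states, 8 bits -> 256 states.
for n in range(1, 9):
    print(f"{n} bit(s): {states(n)} states")
```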
And now we arrive at our first marker - the byte, which is 8 bits. Coined by IBM engineer
Werner Buchholz in July 1956, a byte historically designated whatever number of bits was
necessary to express a character. That made it a hardware-dependent number with no definitive
standard. Early computers used four-bit or six-bit bytes. When ASCII, a standard for encoding
English text and numbers, came out in 1963, it was a seven-bit system.
So how did we get 8 bits to the byte? During ASCII’s development, IBM introduced the eight-bit
Extended Binary Coded Decimal Interchange Code, which became quite popular. Into the 70s,
eight-bit microprocessors such as the Intel 8008 - the direct predecessor of the 8080
and the 8086, the precursor to the x86 line of processors - popularized the 8-bit standard.
In those early days, computer RAM was labeled by the exact number of bytes it
contained - like 4096, 8192, or 16384. It was generally a power of two because that’s
what played nicely with the processor’s architecture.
But that numbering system wasn’t going to work as memory got bigger and bigger.
In 1960 the International System of Units, abbreviated SI, formalized the metric system
and naming practices for units. Although there is no SI unit for memory, computer manufacturers
began to borrow prefixes like Kilo, Mega, and Giga to describe computer memory and hard
drive space. But there was just one problem:
Some people used the letter K, or kilobyte, to refer to 1024 bytes - that’s 2 to the
10th power. But the SI prefix kilo stands for 1000 units, not 1024. Computer memory
manufacturers, who were tied to the 8-bit architecture of the CPU, simply started using
capital K for kilo as shorthand for 1024 bytes. They would later use M for megabytes
as 1024 to the second power and G for gigabytes as 1024 to the third power.
But hard drive manufacturers took a different tack. There was no rule saying the size of a
hard drive had to be a power of two. In fact the very first commercial
hard drive, the IBM 350, first shipped in June 1956 with 50 physical disk "platters"
containing a whopping 3.75 MB.
So to avoid the extra numbers that binary introduced, hard drive manufacturers used
the SI prefixes in standard decimal. One K wasn’t 1024 bytes but exactly 1000. This
dual definition - 1024 or 1000 - became common practice in the 70s, which is why even
today, when you plug in your brand new 300 gigabyte hard drive, your computer will only
show it as 279.4 GB. It’s not that they’re cheating you; it’s that operating systems
like Windows count in multiples of 1024 while the manufacturers count by 1000s.
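That 279.4 number is easy to reproduce - a sketch of the conversion (function name is mine):

```python
def advertised_to_reported_gb(advertised_gb: float) -> float:
    """Convert a manufacturer's decimal gigabytes to the binary
    gigabytes (really gibibytes) that an OS like Windows reports."""
    bytes_total = advertised_gb * 1000 ** 3   # decimal GB -> bytes
    return bytes_total / 1024 ** 3            # bytes -> binary "GB"

print(round(advertised_to_reported_gb(300), 1))  # 279.4
```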
In 1998, the IEC, a governing body that sets standards for electronics, created new
prefixes to try to clear up the confusion. Basically you take the first two letters of
the SI prefix and add "bi", resulting in sizes like kibibyte, mebibyte, gibibyte and tebibyte.
A kilobyte would keep its SI decimal meaning, while a kibibyte adheres to the binary version.
So we go from kilobytes at 1000 bytes to megabytes at 1 million bytes to gigabytes at one billion
and terabytes at 1 trillion bytes. What’s after that?
Well, to help get a sense of size, let’s compare these data sizes to a regular single-layer
DVD, which holds about 4.7 gigabytes.
If we wanted to burn one terabyte onto DVDs, we would need 213 DVDs. If we stacked them up,
the pile would stand just shy of 1 foot high. For comparison - if we were shooting uncompressed
4K raw, one terabyte would only give us 34 minutes of shooting time.
Let’s use a very conservative 10:1 shooting ratio - that is, for every finished minute
of film, there are 10 minutes of raw footage. A two-hour narrative would eat through almost
35 terabytes. In that case our stack of DVDs would be almost 3 stories high. A more realistic
30:1 shooting ratio would push that stack up to 8 stories.
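The back-of-the-envelope math behind those stacks, using the lesson’s figures (4.7 GB per single-layer DVD, a standard 1.2 mm disc thickness, and 34 minutes of uncompressed 4K RAW per terabyte):

```python
DVD_GB = 4.7            # capacity of a single-layer DVD, decimal GB
DISC_MM = 1.2           # thickness of one disc in millimeters
MINUTES_PER_TB = 34     # uncompressed 4K RAW shooting time per terabyte

def footage_tb(finished_minutes: float, shooting_ratio: float) -> float:
    """Terabytes of raw footage for a finished runtime at a given ratio."""
    return finished_minutes * shooting_ratio / MINUTES_PER_TB

def dvd_stack_feet(terabytes: float) -> float:
    """Height in feet of the DVD stack holding that many decimal TB."""
    discs = terabytes * 1000 / DVD_GB   # TB -> GB -> number of discs
    return discs * DISC_MM / 304.8      # millimeters -> feet

tb = footage_tb(120, 10)                  # two-hour film at a 10:1 ratio
print(round(tb, 1), "TB")                 # roughly 35 TB
print(round(dvd_stack_feet(tb)), "feet")  # roughly 3 stories
```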
You can see why only major productions shoot in uncompressed RAW, but even then it’s
a lot of data to wrangle. Not impossible, since it isn’t actually stored on DVDs, but it’s
still a lot of data.
What’s after terabyte?
The Petabyte - one thousand terabytes.
If we took every single feature film listed on IMDb and compressed each one to fit exactly
on a single 4.7 gig DVD, we would have a little over 1.5 petabytes of information and a stack
of DVDs taller than the Petronas Towers of Kuala Lumpur.
But social media is pumping out even more data. One full year of tweets weighs in at
4 petabytes. On an average day in 2008, Google processed 20 petabytes of information. And
according to a stock report in 2013, Facebook stores over 100 petabytes of status updates,
photos and video. At almost 84,000 feet, our stack of DVDs is now at about the cruising altitude
of the SR-71 Blackbird - at the edge of the stratosphere.
The next step up is the Exabyte - 1000 petabytes. To create one Exabyte our stack climbs up
to 159 miles, the same altitude that chimpanzee Ham reached in an early spaceflight test on
board Mercury-Redstone 2 in January of 1961.
A thousand exabytes is a zettabyte - that’s one sextillion bytes. The size of the entire
web has been put at 4 zettabytes as of 2013 - to put that all on DVDs would require a stack
that’s three times the distance from the Earth to the moon.
Mark Liberman calculated that if we digitized every single word ever spoken as 16kHz 16-bit audio,
we would need 42 zettabytes - our stack now reaches 6 million miles, from which the view of Earth
would look something like this image taken from the Juno spacecraft en route to Jupiter.
And lastly we reach our biggest named data size so far - the yottabyte: one septillion
bytes, or one sextillion kilobytes. If we were to store that on DVDs, our stack
would reach 158 million miles into space - more than the average distance from here
to Mars.
Even if we could get a quantity discount of, say, 10 cents per DVD - a real bargain -
this tower to Mars would cost over $21 trillion and weigh about a third of the moon’s
mass.
In our little thought experiment we used DVDs to store the data. DVDs, along with CDs and
Blu-rays, are optical media. That is, they bounce light, in the form of a laser, off
the surface of the disc. If there is a pit in the surface, the reader sees it as a 0, or
off. A land, and the reader sees it as a 1.
Optical media has been a great way of distributing media, from music to movies, but it’s for
the most part a write-once deal. For storing data that we can work with, like movie assets,
we’ll need something more malleable.
Traditionally this has come in the form of the spinning hard drive disk - one or more
platters coated with a magnetic material which can be written and read with a magnetic head.
This magnetic head looks at the polarity of the material on the disk. If the polarity
of a section remains constant, that bit is read as a zero. If there is a switch in polarity,
a small voltage is created within the magnetic head as it sweeps over the surface. This spike
in voltage, whether it is positive or negative, is read as a one.
The newer type of storage is the Solid State Drive, based on flash memory. First introduced
in 1984 by Toshiba, flash memory is very similar in design to a MOSFET transistor, with
the addition of what’s called a floating gate. You can think of a transistor as a
switch - when we apply a positive voltage to the gate, the electrical field opens up
a channel which allows current to flow between the source and drain. A flash memory cell
adds this floating gate between the control gate and the semiconductor. The floating
gate is isolated with a non-conductive oxide, which means once we put a negative charge
on it, it should hold it indefinitely.
If this floating gate has no negative charge, the transistor will switch on when a certain
positive voltage is applied to the control gate. If the gate has a negative charge - it
will cancel out some of the charge from the control gate which means we have to run a
higher voltage through the control gate in order to open the switch.
So now in order to read this memory cell, we run a voltage that’s somewhere in between.
If the switch opens, there are no electrons stored in that floating gate and therefore
we have a one. If the switch stays closed, that means there are electrons in the floating
gate canceling out the field from our control gate - and we have a zero.
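That read can be sketched as a toy model - the voltage values below are invented for illustration, as real thresholds depend on the process technology:

```python
THRESHOLD_EMPTY = 2.0    # volts to open the switch with no stored charge (made up)
THRESHOLD_CHARGED = 6.0  # volts needed when the floating gate is charged (made up)
READ_VOLTAGE = 4.0       # applied read voltage, between the two thresholds

def read_cell(floating_gate_charged: bool) -> int:
    """Return the stored bit: 1 if the switch opens at the read voltage
    (no trapped electrons), 0 if it stays closed (trapped electrons)."""
    threshold = THRESHOLD_CHARGED if floating_gate_charged else THRESHOLD_EMPTY
    return 1 if READ_VOLTAGE >= threshold else 0

print(read_cell(False))  # no charge: transistor conducts, we read a one
print(read_cell(True))   # trapped electrons raise the threshold, we read a zero
```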
That’s how you read a flash memory cell. But how do we write to it? There are two ways.
The first is quantum tunneling: by applying a large voltage to the gate we
can actually get electrons to quantum tunnel through the oxide into the floating gate.
The other is something called hot electron injection, which again uses high voltage to
give the electrons enough kinetic energy to power through the oxide.
I’ll spare you and myself the details, but both these techniques require higher voltages,
and the oxide layer separating the floating gate eventually becomes damaged by all the electrons
traveling through it. For this reason flash memory can only be written so many times before
it fails.
But the beauty of flash memory is it can be made incredibly small. Remember that crazy
tower of DVDs to Mars? If we instead created a yottabyte using 200 GB microSDXC cards (the
most compact data storage medium available as of this video), we would only need a pile
of cards about one third the size of the Great Pyramid of Giza.
When cameras started going digital, solid state storage was the perfect replacement
as we transitioned away from tape. There were many different formats, from the Panasonic
P2 to Sony’s Memory Stick and SxS systems. Some cameras even use computer SSDs,
but I want to focus on two particular types of flash memory you’ll see in professional
and consumer cameras.
The first is the CF card, or CompactFlash card. First manufactured by SanDisk in 1994,
the CF card is still widely used in photography and video equipment. It’s a very robust
card, although you do have to be a little careful when pushing it into a reader or camera, as
the contact pins bend easily. CF card read speeds are written either as megabytes
per second or as a value followed by an "x". The "x" is a multiple of a base of 150 kilobytes per
second. So 200x means it’s capable of reading at 30 megabytes per second.
A major factor in the read and write speed is the communication protocol: either PIO
(Programmed Input/Output) mode or UDMA (Ultra Direct Memory Access) mode. PIO is for industrial
use; for video and photography you want UDMA. There are several modes - so far 0 to 7 -
with UDMA 7 supporting up to 167 MB/sec. But that’s only what the protocol supports; the
actual speed may be a lot lower.
Very recently a variation of the CF card - the CFast card - came on the market. These cards
use a Serial ATA bus rather than the Parallel ATA bus of the regular CF card. This
enables speeds up to 600 MB/sec, and they are being used in some high-data-rate cameras,
especially those shooting in RAW.
The other popular format I want to talk about is the Secure Digital card, or SD card. These
cards come in four families: Standard-Capacity (SDSC), High-Capacity (SDHC), eXtended-Capacity
(SDXC), and Secure Digital Input Output (SDIO), which is really more of an interface. SD cards
are available in three form factors: standard, mini and micro.
Unlike CF cards, which can be a little confusing about their actual transfer speeds, SD cards
have class ratings that guarantee minimum write speeds. For most HD video applications
you’ll need at least a Class 10 or a UHS Speed Class 1 card, which guarantees 10 MB/sec.
For 4K, look for the little U symbol of UHS Speed Class 3, which guarantees 30 MB/sec.
Of course, these speed recommendations are only suggestions for compressed formats - your
camera may have specific requirements, so it’s worth consulting the manual.
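A minimal sketch of matching those guarantees to a job, using the class figures from above (the lookup table and function are my own framing, not an SD Association API):

```python
# Minimum guaranteed write speeds in MB/s for the ratings mentioned above.
SPEED_CLASS_MB_S = {
    "Class 10": 10,
    "UHS Speed Class 1 (U1)": 10,
    "UHS Speed Class 3 (U3)": 30,
}

def fast_enough(rating: str, required_mb_s: float) -> bool:
    """True if the card's guaranteed minimum meets the requirement."""
    return SPEED_CLASS_MB_S[rating] >= required_mb_s

print(fast_enough("UHS Speed Class 3 (U3)", 30))  # True: fine for 4K
print(fast_enough("Class 10", 30))                # False: HD only
```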
There’s just one final topic I want to briefly cover in this overview of storage. We’ve
been talking about storing all those ones and zeros but when you have a long string
of them - how do you tell where the data starts and ends? How do we delineate one file from
another?
That’s where file systems come into play. There are many different kinds of file systems
and they can differ in structure and logic, properties of speed, flexibility, security,
size and more. There are file systems for optical discs, RAM, tape, you name it.
We could go pretty deep into that rabbit hole but for this discussion let’s only focus
on the disk file systems you’ll likely run into and a little bit about them.
If you’re using an Apple product, your hard drive will be using Apple’s proprietary
HFS+ file system. Windows systems use Microsoft’s proprietary NTFS. Unfortunately
these two file systems don’t play nicely with each other. On Windows you can’t just plug
in a Mac drive and expect it to be readable. Luckily there are software options that essentially
translate between the systems and allow one filesystem to read and write the other.
Flash media will most often be formatted with a File Allocation Table (FAT) system such
as FAT32. A relatively old file system, FAT32 caps the maximum size of a single file
at 4 GB minus 1 byte. Cameras that record onto cards formatted with FAT32
will split up large files at either 2 gigs or 4 gigs in order to stay under the 4 gig limit.
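The limit and the splitting are easy to sketch - the 50 Mbit/s bitrate below is just an illustrative assumption:

```python
FAT32_MAX_BYTES = 2 ** 32 - 1          # FAT32's limit: 4 GiB minus one byte

def chunks_needed(recording_bytes: int, chunk_bytes: int = 4 * 10 ** 9) -> int:
    """Number of files a recording is split into (ceiling division),
    assuming a 4 GB (decimal) chunk size as one common camera choice."""
    return -(-recording_bytes // chunk_bytes)

# One hour at an assumed 50 Mbit/s is about 22.5 GB - six chunks.
one_hour = 50_000_000 // 8 * 3600      # bytes in an hour at 50 Mbit/s
print(chunks_needed(one_hour))
```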
exFAT, which debuted in 2006 from Microsoft, does away with this file-size limitation and is
the default file system for SDXC cards larger than 32 gigabytes. Windows and Mac systems
can both read and write flash memory formatted in FAT32 or exFAT.
We are producing data at an astounding rate, and how that data is stored is a challenge for
today’s computer scientists and engineers. They’ve come up with some amazing
technologies. Data is the lifeblood of the digital filmmaker - it’s key that we protect
it, back it up, and, when we’re finished, archive it, which is quickly becoming its
own challenge all by itself. But first you have to get out there and make something great.
I’m John Hess and I’ll see you at FilmmakerIQ.com