YouTubeDrive: Store files as YouTube videos

4 years ago (github.com)

Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.

The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
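A minimal sketch of that scheme in Python (function names are mine, not YouTubeDrive's; assumes NumPy): pack 3 bits into each cell of a 64x36 grid, one bit per RGB channel, blow each cell up to a 20x20 square for a 1280x720 frame, and decode by averaging each square and thresholding.

```python
import numpy as np

BLOCK = 20          # each data "pixel" becomes a 20x20 square
W, H = 64, 36       # data grid for one 1280x720 frame

def encode_frame(bits):
    """Pack 3 bits per cell (one per RGB channel, values 0 or 255)."""
    bits = np.resize(np.asarray(bits, dtype=np.uint8), (H, W, 3))
    frame = bits * 255                      # 0 or 255 only, nothing in between
    return np.kron(frame, np.ones((BLOCK, BLOCK, 1), dtype=np.uint8))

def decode_frame(frame):
    """Average each 20x20 square and threshold at 128 to recover the bits."""
    cells = frame.reshape(H, BLOCK, W, BLOCK, 3).mean(axis=(1, 3))
    return (cells > 127).astype(np.uint8).ravel()

data = np.random.randint(0, 2, H * W * 3)
assert np.array_equal(decode_frame(encode_frame(data)), data)
```

The averaging step is what makes the big squares robust: a few flipped pixels inside a 20x20 block barely move its mean.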

Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
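For the error-correcting-code extension, a toy example (not part of YouTubeDrive): the classic Hamming(7,4) code encodes 4 data bits into 7 and can correct any single bit-flip per block.

```python
# Hamming(7,4): bit positions 1..7 are p1, p2, d1, p3, d2, d3, d4.
def hamming74_encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Each syndrome bit re-checks one parity group; together they
    # spell out the 1-based position of a flipped bit (0 = no error).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[3] ^= 1                    # simulate one compression-induced bit-flip
assert hamming74_decode(codeword) == [1, 0, 1, 1]
```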

  • One more thing: the choice of Wolfram Mathematica as an implementation language was a deliberate decision on my part. Not for any technical reason -- YouTubeDrive doesn't use any of Mathematica's symbolic math capabilities -- but because I didn't want YouTubeDrive to be too easy for anybody on the internet to download and use, lest I attract unwanted attention from Google. In the eyes of my paranoid freshman self, the fact that YouTubeDrive is somewhat obtuse to install was a feature, not a bug.

    So, feel free to have a look and have a laugh, but don't try to use YouTubeDrive for any serious purpose! This encoding scheme is so horrendously inefficient (on the order of 99% overhead) that the effective bandwidth to and from YouTube is something like one megabyte per minute.

  • As far back as the late 1970s, a surprisingly similar scheme was used to record digital audio to analog video tape. It mostly looks like stripey static, but there was a clear correlation between what happened musically and what happened visually, so in college (late 1980s), when one of my friends came into possession of one of these, we'd keep it on the TV while listening to whole albums. We had a simultaneous epiphany about the encoding scheme during a Jethro Tull flute solo, when the static suddenly became just a few large squares.

    You can see one in action here:

    https://www.youtube.com/watch?v=TSpS_DiijxQ

  • Nice thanks, this answered my biggest question, which was "will it survive compression/re-encoding." (yes it will). Very cool idea!

  • Do you have any idea how many more bits you'd be able to use if you applied any of the encoding transformations?

    • I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.

      4 replies →

Back in the day when file sharing was new, I won two rounds of beer from my friends in university. The first came after I tried what I dubbed "hardcore backups": I tarred, gzipped and PGP'd an archive, slapped an AVI header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, figuring that since hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it in their share even if the file claimed to be corrupt.

It worked a charm.

Second round? A year later, when the archive was still available from umpteen hosts.

For all I know, it still languishes on who knows how many old hard drives...

  • Your story reminds me of a Linus quote.

    "Real men don’t use backups, they post their stuff on a public ftp server and let the rest of the world make copies." -Linus Torvalds

  • You devil! I'm pretty sure I remember running into a file that looked like that and a quick poke around showed it wasn't anything valid.

    Funny how these things work since I'm pretty sure I remember running into it around 2008 (i'm a few years younger).

    I think i just deleted it though since I was suspicious of most strange files back then; I was the nerd who didn't have friends so i used to troll forums for anything i could get my hands on.

  • That's a perfect college CS story. Beer and bastardized files - what a combo!

  • ah hell, you're the one who made my computer crash trying to open that and make me panic? damn you man

Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.

  • That is really interesting. I wonder if there were any other interesting uses of paid programming to solve problems like these around that time?

  • If I were to gamble, I'd say analog TV can store more data. Compression algorithms usually work at, say, a 200:1 compression ratio, and they're extremely destructive: raw 1080p60 in yuv420p is about 187 MB/s, while a decent equivalent video on YouTube is about 1 MB/s.

I remember seeing this first discussed on 4chan's /g/ board as a joke about whether they could abuse YouTube's unlimited upload size; it then escalated into the proof of concept shown in the repo :)

  • This is a tangent. I must have been maybe 15-16 at the time, so somewhere around 20 years ago: One of the first pieces of software I remember building was a POP3 server that served files, that you could download using an email client where they would show up as attachments.

    Incredibly bizarre idea. I'm not sure who I thought would benefit from this. I guess I got swept up in RFC1939 and needed to build... something.

    • At my first job (at the beginning of the millennium) there was a limit on the size of files you could download, something around 5 MB. If you wanted to download something bigger, you had to ask the sysadmins to do it and wait... That was really annoying. So a colleague and I ended up writing a service that could download a file to local storage, chop it into multiple 5 MB attachments, and send multiple emails to the requester.

      After some time the single-file limit was removed, but a daily limit of 100 MB was set up. The trick is that POP3 traffic wasn't counted, so we continued to use our "service".

      3 replies →

    • Apparently e-mail is not very reliable for storing/keeping files. There have been cases where an old email with an attachment would not load correctly because the servers had simply erased the attachment file.

      5 replies →

  • They also experimented with encoding videos and arbitrary files into different kinds of single (still) image formats, some of them able to be uploaded to the same 4chan thread itself, with instructions on how to decode/play it back. Examples:

    https://dpaste.com/HFTKAPM5V

    https://github.com/fangfufu/Converting-Arbitrary-Data-To-Vid...

    https://github.com/rekcuFniarB/file2png

    https://github.com/nzimm/png-stego

    https://github.com/dhilst/pngencoder

    https://github.com/EtherDream/web2img

I only looked at the example video, but is the concept just "big enough pixels"?

Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].

Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.

[1] If that's what video encoding still does, I really have no idea, but you get the point.

  • Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.

    For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files solely based on the browser settings that can tell what OS I'm running.

    Handling this compression protection for so many different codecs is likely not feasible.

    • Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.

      But (almost) nothing prevents YouTube from not serving that particular codec anymore. This still pretty much falls under the "re-encoding" case I mentioned which would make the whole thing brittle anyway.

      But it's indeed cool to think about. 8)

    • How about Fourier transform (or cosine, whichever works best), and keep data as frequency components coefficients? That’s the rough idea behind digital watermarking. It survives image transforms quite well.
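A rough sketch of that idea (assuming SciPy is available; the coefficient choice and strength are arbitrary illustrations, not a tuned scheme): embed one bit per 8x8 block as the sign of a low-frequency DCT coefficient, which tends to survive moderate lossy compression much better than raw pixel values.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit(block, bit, coeff=(2, 1), strength=40.0):
    """Force the sign of one low-frequency DCT coefficient to carry a bit."""
    c = dctn(block.astype(float), norm="ortho")
    c[coeff] = strength if bit else -strength
    return idctn(c, norm="ortho")

def extract_bit(block, coeff=(2, 1)):
    return int(dctn(block.astype(float), norm="ortho")[coeff] > 0)

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))
marked = embed_bit(block, 1)
noisy = marked + rng.normal(0, 5, (8, 8))   # stand-in for mild compression noise
assert extract_bit(noisy) == 1
```

Because the transform is orthonormal, pixel-domain noise of a given variance perturbs each coefficient by roughly the same amount, so a coefficient pushed well away from zero keeps its sign.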

    • Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to re-encode so many versions, then to store all of those versions, and then make all of them accessible anywhere in the world at a moment's notice. Really is such an incredible waste for most YouTube content.

      3 replies →

    • What if you have an ML model that produces a vector from a given image. You have a set of vectors that correspond to bytes - for a simple example you have 256 "anchor vectors" that correspond to any possible byte.

      To compress an arbitrary sequence of bytes: for each byte, you produce an image that your ML model would map to the corresponding anchor vector, and add that image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.

      To decompress the video you simply go frame by frame over the video and send it to your model. Your model produces a vector and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resultant images in the video should look similar, and if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image is intended to correspond to.

      8 replies →
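The decode half of that idea can be sketched without any actual ML model. Here random unit vectors stand in for the embeddings a trained network would produce (all names hypothetical); nearest-anchor lookup still recovers the byte after the embedding drifts:

```python
import numpy as np

rng = np.random.default_rng(42)

# 256 random unit "anchor vectors", one per possible byte value
# (hypothetical stand-ins for a trained model's embeddings).
anchors = rng.normal(size=(256, 64))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)

def nearest_byte(vec):
    """Decode one frame: pick the anchor with the highest cosine similarity."""
    return int(np.argmax(anchors @ (vec / np.linalg.norm(vec))))

# A frame's embedding drifts after recompression; decoding should still hold.
original = anchors[137]
recompressed = original + rng.normal(scale=0.05, size=64)
assert nearest_byte(recompressed) == 137
```

In high dimensions, random anchors are nearly orthogonal, so even fairly large drift leaves the correct anchor as the clear nearest match.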

  • Back in the day, VCRs were commonly used as tape backup devices for data.

    Now studios are using motion-picture film to store data, since it's known to be stable for a century or more.

  • YouTube lets you download your uploaded videos. I've never tested it, but supposedly it's the exact same file you uploaded.[a] It probably wouldn't work with this "tool" as it uses the video ID (so I assume it's downloading what clients see, not the source), but it's an idea for some other variation on this concept.

    [a] That way, in the future, if there’s any improvements to the transcode process that makes smaller files (different codec or whatever), they still have the HQ source

    • They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.

      1 reply →

  • YT might still recompress your video, possibly using proprietary algorithms that are not necessarily DCT based

Could yt-dlp and YouTube Vanced now be hosted on... YouTube?

I wonder how long it'd take for Google to crack down on the system abuse.

Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

  • It's one of those problems that resolves itself.

    The process of creating and using the files is prohibitively unusable and so many better solutions exist that YT doesn't need to worry about it

  • >Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

    If creators start encoding their source and material into their content Google would probably be fine with that because it gives them data but also gives them context for that data.

    Edit: I meant like "director's commentary" and "notes about production" type stuff like you used to see added to DVDs back in the day. Not "using youtube as my personal file storage". Why is this such an unpopular opinion?

    • > If creators start encoding their source material into their files Google would probably be fine with that

      it'd depend, as I don't think people using YT to store files would watch a lot of ads

      1 reply →

    • > If creators start encoding their source material into their files Google would probably be fine with that

      Not true at all, lol. Google has a paid file storage solution. YouTube is for streaming video and that's the activity they expect on that platform. I couldn't imagine any service designed for one format would "probably be fine" with users encoding other files inside of that format.

      1 reply →

This reminds me of an old hacky product that would let you use cheap VHS tapes as backup storage: https://en.wikipedia.org/wiki/ArVid

You would hit Record on a VCR and the computer data would be encoded as video data on the tape.

People are clever.

  • Wow, 2GB on a standard tape. For the time, that's incredibly efficient and cheap.

    • Yeah. Video, even old grainy VHS, had pretty high bandwidth. Even more so with S-VHS, which never became super popular though. (I'm actually wondering whether the 2GB figure was for S-VHS, not VHS. Didn't do the math, and wouldn't be surprised either way.)

      2 replies →

  • This is old school. When I first wrote code back in the Stone Age we used to store our stuff on cassette tape.

  • The Alesis ADAT 8 track digital audio recorders used SVHS tapes as the medium - at the end of the day, it's just a spooled magnetic medium, not hugely different conceptually than a hard drive.

  • That's not really that hacky, audio cassettes were used forever, it's just a tape backup.

  • I remember a similar solution that was marketed in a German mail order catalogue in late 1990s. It could have been Conrad, but I'm not 100% sure. I recall it being a USB peripheral, though. (Maybe I could find more about it in time...)

Reminds me of a guy who stored data in ping messages https://youtu.be/JcJSW7Rprio

  • Back in the day, when protocols were more trusting, we would play games by storing data archives in other people's SMTP queues. Open the connection and send a message to yourself by bouncing it through a remote server, but wait to accept the returning email message until you wanted the data back. As long as you pulled it back in before it timed out on that queue and looped it back out to the remote SMTP queue, you could store several hundred MB (which was a lot of data at the time) in uuencoded chunks spread out across the NSFNET.

  • I watch these things and I begin to realize I'll never be as intelligent as someone like this. It's good to know that no matter how much you've grown, there is always a bigger fish.

    • I agree that there will always be smarter fish, but you can definitely be this smart; it just takes the proper motivation (or weird idea) to wiggle its way into your brain.

  • What part of the video discusses this? :D So far it’s about juggling chainsaws

    Edit: OK, I see where this is going. Lol

This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).

From the README.md:

> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.

> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.

[1]: https://news.ycombinator.com/item?id=6932508

How much data can you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.

  • Back of the envelope estimate:

    4096 x 2160 x 24 x 60 is your theoretical max in bits/second: about 12.7 billion.

    Assume that to counter YouTube's compression we need 16x16 blocks of no more than 256 colors and 15 keyframes/second; that reduces it to

    256 * 135 * 8 * 15 = 4.1 million bits/sec.

    That's not too awful. Ten minutes of this would get you about 300MB of data, which itself might be compressed.
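The same estimate as a quick script, using the assumptions above (4K frames, 16x16 blocks, 256 colors per block, 15 usable frames/second):

```python
# Back-of-envelope capacity of a compression-resistant YouTube encoding.
width, height, fps = 4096, 2160, 15
block, bits_per_block = 16, 8          # 16x16 squares, 256 colors each

blocks = (width // block) * (height // block)   # 256 * 135 = 34,560 blocks
bitrate = blocks * bits_per_block * fps         # ~4.1 Mbit/s
megabytes_per_10min = bitrate * 600 / 8 / 1e6
print(f"{bitrate / 1e6:.1f} Mbit/s, {megabytes_per_10min:.0f} MB per 10 min")
```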

  • “hope you enjoyed this video. btw, the source code used in this tutorial is encoded in the video.”

  • Would storing data as a 15 or 30 FPS QR code "video" be any more useful? At a minimum one would gain a configurable amount of error correction, and you could display it in the corner.

Turns out any site that allows users to submit and retrieve data can be abused in the same way:

- FacebookDrive: "Store files as base64 facebook posts"

- TwitterDrive: "Store files as base64 tweets"

- SoundCloudDrive: "Store files as mp3 audio"

- WikipediaDrive: "Store files in wikipedia article histories"
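The base64 variants all reduce to the same trick: split, encode, post, and reverse it on the way back. A minimal sketch (the 280-character limit is just Twitter's, for illustration):

```python
import base64
import textwrap

def to_posts(data: bytes, limit=280):
    """Split a file into base64 chunks that each fit a post length limit."""
    encoded = base64.b64encode(data).decode()
    return textwrap.wrap(encoded, limit)

def from_posts(posts):
    """Reassemble the original bytes from the posted chunks, in order."""
    return base64.b64decode("".join(posts))

payload = b"any file at all" * 100
posts = to_posts(payload)
assert from_posts(posts) == payload
```

The real cost in all of these schemes is ordering and indexing the chunks, which is why several of the projects mentioned here keep a separate metadata/index entry point.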

  • I wrote one of these as a POC when at AWS to store data sharded across all the free namespaces (think Lambda names), with pointers to the next chunk of data.

    I like to think you could unify all of these into a FUSE filesystem and just mount your transparent multi-cloud remote FS as usual.

    It's inefficient, but free! So you can have as much space as you want. And it's potentially brittle, but free! So you can replicate/stripe the data across as many providers as you want.

    • I was an eng manager on Lambda for a time, and we definitely knew people were doing this, and had plans to cut it out if it ever became a problem. :D

      1 reply →

  • Back in the day when @gmail was famous for its massive free email storage, people wrote scripts to chunk large files and store them as email attachments.

  • My friends and I had a joke called NSABox. It would send data around using words that would attract the attention of the NSA, and you could submit a FOIA request to recover the data. I always found it amusing.

    • There's a feature in Emacs that does that (unsurprisingly.)

      It's called `M-x spook'. It inserts random gibberish that NSA and the Echelon project would've supposedly picked up back in the 90s.

      1 reply →

  • This is pretty tame compared to some actual, practical ones such as https://github.com/apachecn/CDNDrive

    For people who don't read Chinese: it encodes data into ~10 MB blocks as PNGs, then uploads them (together with a metadata/index file as an entry point) to various Chinese social media sites that don't re-compress your images. I know people have used it to store* TBs upon TBs of data on them already.

    *Of course, it would be foolish to think your data is even remotely safe "storing" them this way. But it's a very good solution for sharing large files.

  • Github repos make for a pretty good key-value store.

    It even has a full CRUD API, no need for using libgit.

  • I wonder if we could use this technique in places where the government censors sensitive uploads to streaming sites, like mainland China or North Korea (they do have streaming sites, right?).

    Although for propaganda use, shortwave / sat TV is a much, much simpler way to distribute information to places like that. But I believe it's now hard to get a SW radio there.

  • Reminds me of when I tried to Gmail myself a zip archive, and it was denied because of security reasons iirc. I then tried to base64 it, and it still didn't work, same with base32, until finally base16 did work.

  • I found some pirates uploading videos to Prezi so they get free S3 video hosting.

    • At one point there was a piece of software called deezcloud which exploited Deezer's user uploaded MP3 storage, allowing it to be used as free CDN cloud storage for up to 400GB of files. I don't think it works anymore, and I'm not sure if it ever worked well (I never tried it).

  • I wonder if access permissions would be easier to maintain using Facebook...

    • Until one day your base64 ciphertext just so happens to contain a curse word and you get banned for violating "community standards"

I remember my friend did something like this on an old unix system.

Users were given quotas of 5 MB for their home directory. He discovered that filenames could be quite long, and the number of files was not limited by the quota, so he created a pseudo-filesystem using that knowledge, with a command line tool for listing, storing and retrieving files from it. This was the early 90s.
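That trick can be sketched in a few lines: the payload is base32-encoded into the names of zero-byte files, so it consumes no "content" quota at all. Chunk size and the naming scheme here are illustrative, not what his tool actually did.

```python
import base64
import os
import tempfile

CHUNK = 150  # payload bytes per filename; base32 of 150 bytes stays under
             # the typical 255-character filename limit

def store(data: bytes, directory):
    """Spread data across the *names* of empty files."""
    for i in range(0, len(data), CHUNK):
        name = f"{i // CHUNK:06d}_" + base64.b32encode(data[i:i + CHUNK]).decode()
        open(os.path.join(directory, name), "w").close()   # zero-byte file

def retrieve(directory):
    """Reassemble by sorting on the numeric prefix and decoding the names."""
    parts = sorted(os.listdir(directory))
    return b"".join(base64.b32decode(p.split("_", 1)[1]) for p in parts)

with tempfile.TemporaryDirectory() as d:
    store(b"quota-free storage, early-90s style", d)
    assert retrieve(d) == b"quota-free storage, early-90s style"
```

Base32 rather than base64 keeps the names safe on case-insensitive filesystems and avoids `/` appearing in a filename.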

Years ago when Amazon had unlimited photo storage, you could “hide” gigabytes of data behind a 1px gif (literally concatenated together) so that it wouldn’t count against your quota.

  • They still do if you pay for Prime. I was surprised to see that even RAW files (which are uncompressed and quite large) were uploaded and stored with no issues. Not the same as "hiding" data but might still be possible.

    • In the interest of technical correctness, RAW files are frequently compressed and even lossily compressed. For example, Sony's RAW compression was only lossy until very recent cameras.

      Given that there are the options for uncompressed, lossy compressed and lossless compressed, I'd say RAW files differ in the stage of the data processing where capture is being done and doesn't imply anything about the type of compression.

      What is relevant is that the formats vary widely between manufacturers, camera lines and individual cameras, so unlike JPEG, it's really hard to create a storage service that compresses RAW files further after uploading in a meaningful way. So anything they do needs to losslessly compress the file.

      2 replies →

    • I guess you can store 24 bits of data as the R,G and B components of a pixel of an "image", and store it as a lossless image...

This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.

Seems like a great way to get your account closed for abuse!

  • You'd be surprised how much YouTube lets you upload.

    I've been uploading 2-3 hours of content a day every day for the past few years. On the same account too.

    I have fewer than 10 subscribers lol.

  • You could make it much harder to detect by synthesizing a unique video with a DNN and hiding the data using traditional steganography techniques.

  • Then the whole HN crowd would have enough outrage materials for weeks. Seems like a win-win situation to me.

  • If it becomes prevalent, I think YouTube would do something like slightly randomize the compression in their videos to dissuade this kind of use.

    • isn't the point here that the sub-pixels being produced are so large that it would take a tremendous amount of artifacts to reduce them to an unreadable state?

      in other words; if YTs compression was affecting it so badly that it prevented the data from being re-read, wouldn't that compression scheme render normal video-watching impossible?

Does YouTube store and stream all videos losslessly? How does this work otherwise?

  • No, YouTube is not lossless.

    The video that is created in the example in the README is https://www.youtube.com/watch?v=Fmm1AeYmbNU

    We can see that data is encoded as "pixels" that are quite large, being made up of many actual pixels in the video file. I see quite bad compression artifacts, yet I can clearly make out the pixels that would need to be clear to read the data. It looks like the video was uploaded at 720p (1280x720), but the data is encoded as a 64x36 "pixel" image of 8 distinct colors. So lots of room for lossy compression before it's unreadable.

The code is not too big (a single file), but it requires a paid symbolic language (Mathematica). Can anyone with better Mathematica knowledge explain whether it could be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?

  • Yep! I'm the creator of YouTubeDrive, and there's absolutely nothing in the code that depends on the symbolic manipulation capabilities of Wolfram Mathematica -- you could easily port it to Python, C++, whatever. However, there are two non-technical reasons YouTubeDrive is written in Mathematica:

    (1) I was a freshman in college at the time, and Mathematica is one of the first languages I learned. (My physics classes allowed us to use Mathematica to spare us from doing integrals by hand.)

    (2) I intentionally chose a language that's a bit obtuse to use. I was afraid that I might attract unwanted attention from Google if YouTubeDrive were too easy for anybody to download and run.

Are there any services out there that combine all of these “Store files as XYZ” into some kind of raid config?

Would be interesting if you could treat each service (Youtube, Docs, Reddit, Messenger, etc) as a “disk” and stripe your data across them.

Makes me wonder how many video and image upload sites are now used as easily accessible number stations these days

  • Probably not many. The advantage of plain old-fashioned radio is that the station doesn't keep track of the receivers. Whoever watches a YouTube numbers station is tracked six ways to Sunday.

Rename the project to VideoDrive or something. With the current name Google can get GitHub to take it down on the basis of trademark infringement.

Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....

Back in the 90’s I considered storing my backups as encrypted, steganographically hidden or binary Usenet postings, as a kind of decentralized backup; the postings would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)

This gave me a flashback to VBS on Amiga... Video Backup System: record composite video on a VCR, plus a simple op-amp circuit that would decode black-and-white blobs of video pixels. It could back up floppies at reading speed. Was really impressive until, well, VHS... ;)

Just did a google and saw it had evolved over the years, used only the 1.0 implementation back in the days. For those on another nostalgic trip : http://hugolyppens.com/VBS.html

I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?

What does the OP have against “Google Drive” when seeking file storage via a Google Service?

Horses for courses… this is how we end up with pictures clogging transaction ledgers

Reminds me of the movie Contact where the alien civilization encodes the whole design of a traveling machine inside Olympic telecast video.

The popularity of such projects is the reason for imposing more and more constraints on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to open up hard-to-use systems, e.g. by creating open protocols for closed systems. That would be an actual achievement.

Upload videos as data, then build an app that streams and decodes these files back into videos. Voila, popcorn time.

Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN? This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.

BEWARE: Until they clamp down and delete the files, you lose your data.

Good technical experiment though!

  • Since he's made ready-to-use software, yeah, Google will probably ban this quite quickly...

I suspect people in my office who send everything as a Word attachment with an image, PPT, Excel workbook, etc., embedded, are doing this unknowingly.

There are even Word files I've found that have complete file path notation to ZIP files.

I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.

This reminds me of Blame! where humans are living light rats in the belly of the machine. Lol, also reminds me of the geocities days where we created 50 accounts to upload dragon ball z videos.

I absolutely love this idea. I need to dig more into the code, but it's almost like using Twitter as a 'protocol' and YouTube as storage.

So many ideas are flying to mind. Really creative.

I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).

  • I believe YouTube supports random access, or otherwise you wouldn’t be able to jump around in a video. Youtube-dl also supports resuming downloads in the middle, I believe.

    • True... but I'm guessing the YouTubeDrive 'decoder' needs the whole video to get back anything close to what you put in.

      Otherwise each frame would have to have a ridiculous amount of encoded overhead.

      Ahh, NM cant even see that working.

      edit: Maybe a file table built from the first N frames, one that delivers a frameset/file map ...

      Still nothing like skipping spots in a video. That relies on key frames and time signatures.

      Cool stuff nonetheless...

      1 reply →

I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)

Reminds me of the other post that used Facebook Messenger as a transport layer to get free internet in places where internet access is free only within Facebook's apps.

This seems like something Cicada 3301 would use

I wonder how many random videos like this are floating around that are encoding some super secret data...

I’m thinking maybe we can divide files into pieces, turn each piece into a QR code, and then make each QR code a single frame?

Wasn't there more or less recently on HN something like "Store Data for free in DNS-Records"? Reminds me of this.

Imagine a Raid6 of four youtube 11-digit IDs

Bet google isn't happy with this idea and will definitely try to break it asap

Very cool. I wonder how difficult it would be to present a real, watchable video to the viewer, albeit low quality, with the file embedded steganographically. I think a risk of this tech is that, if it takes off, YT might easily adjust its algorithms to remove unwatchable videos. Leaving a watchable video could grant it more persistence than an obvious data stream.

  • Sure, but the more structure your video has to have, the harder it becomes to hide information steganographically within it. Your information density will become very low, I think.

there was a story on HN a while ago in which someone stored unlimited data in Google Sheets!

I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.

I could seemingly never explain the concept to other developers in a meaningful way, nor did I care enough to code these out myself.

Anyway my quick summary in this is just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.

You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.

That seems to be what this project is: using a fast medium (a video player) to slowly send data over the connection, like physically speaking the contents of other data. But there are some problems with this approach.

Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.

If this weren't the case, we could treat it like a modem connection: just send the raw data and pretend it's a video. However, where I left off on this idea, we appear to be hard-blocked by that YouTube compression.

We can write data to whatever we want and label it as any other file type. (As a side note, video files are also containers, like zip archives, that could be abused to just hold other files.)

But YouTube is an unknown wildcard that changes our compression and thus our data which seems to invalidate all of this.

If we somehow convert an exe to an avi, the YouTube compression seems to hard-block this from working like we want. If we didn't have that barrier, I think we could just upload essentially corrupted videos that double as other file types, provided we could download the raw file directly.

(steganography is a potential work around I haven’t explored yet)

Without these, we're left to just speak the data over a phone, which compresses our voice and in theory could make some sounds hard to tell apart. This leaves us in the battle of which language is best to speak to avoid compression limiting our communication. Is English best? Or Japanese? What about German? Which language is least likely to cause confusion when spoken but is also fast and expressive?

This translates into: what's the best encoding for text, or otherwise for pixels in a video, where data doesn't get lost to compression? Are literal English characters best? What about base64? Or binary? What if we zip it first and then base64 it? What if we convert binary into hex colors? Does that use fewer frames in a video? Will the video preserve all the hex values after YouTube compression?
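
For reference, the scheme the author describes in the top comment (3 bits per 20x20 block on a 1280x720 frame) works out to roughly a megabyte per minute, matching the "one megabyte per minute" figure they quote. A back-of-the-envelope sketch, assuming a 24 fps frame rate (not stated in the comments):

```python
# Throughput of a block-based pixel encoding that survives recompression:
# a 1280x720 frame divided into 20x20 blocks, each block carrying 3 bits
# (one on/off value per R, G, B channel).

def bytes_per_frame(width=1280, height=720, block=20, bits_per_block=3):
    blocks = (width // block) * (height // block)  # 64 * 36 = 2304 blocks
    return blocks * bits_per_block // 8            # 6912 bits = 864 bytes

per_frame = bytes_per_frame()
per_minute = per_frame * 24 * 60  # assumed 24 fps
print(per_frame, per_minute)      # 864 1244160 (~1.2 MB/minute)
```

Shrinking the blocks raises the density but, as the author found with 10x10 squares, also raises the odds that recompression flips a block's color.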

This works on the same principle as the video backup system (VBS) which we used in the 1980's and the early 1990's on our Commodore Amigas: if I remember correctly, one three hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts.

https://www.youtube.com/watch?v=VcBY6PMH0Kg

SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk. To the OS it was just a regular filesystem like any other: tools like ls(1), cp(1), and rm(1) saw no difference, though the latency was of course high.

  • That's how digital audio was originally recorded to tape back in the 1970s and 80s: encode the data into a broadcast video signal and record it using a VCR.

    In the age of $5000 10 MB hard drives, this was the only sensible way to work with the 600+ MB of data needed to master a compact disc.

    That's also where the ubiquitous 44.1 kHz sample rate comes from. It was the fastest sample rate that could be reliably encoded into both NTSC and PAL broadcast signals. (For NTSC: 3 samples per scan line, 245 usable scan lines per field, 60 fields per second = 44,100 samples per second.)
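
    The arithmetic checks out for both standards; a quick sanity check (the PAL figure of 294 usable lines per field is the commonly cited number, not from the comment above):

    ```python
    # 3 audio samples are packed into each usable scan line.
    ntsc = 3 * 245 * 60  # 245 usable lines per field, 60 fields per second
    pal = 3 * 294 * 50   # 294 usable lines per field, 50 fields per second
    print(ntsc, pal)     # 44100 44100
    ```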

  • 130 MB for the whole tape is not a lot. It's roughly floppy-disk throughput, which is probably not a coincidence. However, the basic soldering implies that the rest of the system acted as a big software-defined DAC/ADC.

    A dedicated controller could pack a lot more data, as in hobbyist tape storage systems like ArVid: https://en.wikipedia.org/wiki/ArVid

    • Dedicated controllers were absolutely out of the question because nobody could afford them, which is why Amigas were so popular: a fully multitasking, multimedia computer for 450 DM. That's 225 EUR! Somebody that cost-sensitive wouldn't even consider a dedicated controller; things back then weren't like they are today.

      This was at a time when 3.5" floppy disks were expensive (and hard to come by) and hard drives were between 40 and 60 MB, so 130 MB was quite practical. The floppy drive in the Amiga read and wrote at 11 KB/s.

      And yes, this was a DAC and an ADC in software, with added Reed-Solomon error correction encoding and CRC32. The goal was to be economical. The end price was everything; it had to be as cheap as possible.

  • "one three hour PAL/SECAM VHS tape had a capacity of 130 MB"

    This reminds me of the Danmere Backer.

    "The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts."

    This reminds me of the DIY versions of the Covox Speech Thing: https://hackaday.com/2014/09/29/the-lpt-dac/

Not immediately obvious from the Readme, but does this rely on YT always saving and providing a download of the original, unaltered video file? If not, then it must be encoding the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.