YouTubeDrive: Store files as YouTube videos

4 years ago (github.com)

Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.

The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
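A minimal sketch of that scheme in Python (function names are mine, not YouTubeDrive's; assumes NumPy): pack 3 bits into each cell of a 64x36 grid, one bit per RGB channel, blow each cell up to a 20x20 square for a 1280x720 frame, and decode by averaging each square and thresholding.

```python
import numpy as np

BLOCK = 20          # each data "pixel" becomes a 20x20 square
W, H = 64, 36       # data grid for one 1280x720 frame

def encode_frame(bits):
    """Pack 3 bits per cell (one per RGB channel, values 0 or 255)."""
    bits = np.resize(np.asarray(bits, dtype=np.uint8), (H, W, 3))
    frame = bits * 255                      # 0 or 255 only, nothing in between
    return np.kron(frame, np.ones((BLOCK, BLOCK, 1), dtype=np.uint8))

def decode_frame(frame):
    """Average each 20x20 square and threshold at 128 to recover the bits."""
    cells = frame.reshape(H, BLOCK, W, BLOCK, 3).mean(axis=(1, 3))
    return (cells > 127).astype(np.uint8).ravel()

data = np.random.randint(0, 2, H * W * 3)
assert np.array_equal(decode_frame(encode_frame(data)), data)
```

The averaging step is what makes the big squares robust: a few flipped pixels inside a 20x20 block barely move its mean.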

Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
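For the error-correcting-code extension, a toy example (not part of YouTubeDrive): the classic Hamming(7,4) code encodes 4 data bits into 7 and can correct any single bit-flip per block.

```python
# Hamming(7,4): bit positions 1..7 are p1, p2, d1, p3, d2, d3, d4.
def hamming74_encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Each syndrome bit re-checks one parity group; together they
    # spell out the 1-based position of a flipped bit (0 = no error).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[3] ^= 1                    # simulate one compression-induced bit-flip
assert hamming74_decode(codeword) == [1, 0, 1, 1]
```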

  • One more thing: the choice of Wolfram Mathematica as an implementation language was a deliberate decision on my part. Not for any technical reason -- YouTubeDrive doesn't use any of Mathematica's symbolic math capabilities -- but because I didn't want YouTubeDrive to be too easy for anybody on the internet to download and use, lest I attract unwanted attention from Google. In the eyes of my paranoid freshman self, the fact that YouTubeDrive is somewhat obtuse to install was a feature, not a bug.

    So, feel free to have a look and have a laugh, but don't try to use YouTubeDrive for any serious purpose! This encoding scheme is so horrendously inefficient (on the order of 99% overhead) that the effective bandwidth to and from YouTube is something like one megabyte per minute.

  • As far back as the late 1970s, a surprisingly similar scheme was used to record digital audio to analog video tape. It mostly looks like stripey static, but there was a clear correlation between what happened musically and what happened visually, so in college (late 1980s), when one of my friends came into possession of one of these, we'd keep it on the TV while listening to whole albums. We had a simultaneous epiphany about the encoding scheme during a Jethro Tull flute solo, when the static suddenly became just a few large squares.

    You can see one in action here:

    https://www.youtube.com/watch?v=TSpS_DiijxQ

  • Nice thanks, this answered my biggest question, which was "will it survive compression/re-encoding." (yes it will). Very cool idea!

  • Do you have any idea how many more bits you'd be able to use if you applied any of the encoding transformations?

    • I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.

      4 replies →

Back in the day when file sharing was new, I won two rounds of beer from my friends in university. The first came after I tried what I dubbed "hardcore backups": I tarred, gzipped and PGP'd an archive, slapped an AVI header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, figuring that since hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it in their share even if the file claimed to be corrupt.

It worked a charm.

Second round? A year later, when the archive was still available from umpteen hosts.

For all I know, it still languishes on who knows how many old hard drives...

  • Your story reminds me of a Linus quote.

    "Real men don’t use backups, they post their stuff on a public ftp server and let the rest of the world make copies." -Linus Torvalds

  • You devil! I'm pretty sure I remember running into a file that looked like that and a quick poke around showed it wasn't anything valid.

    Funny how these things work since I'm pretty sure I remember running into it around 2008 (i'm a few years younger).

    I think i just deleted it though since I was suspicious of most strange files back then; I was the nerd who didn't have friends so i used to troll forums for anything i could get my hands on.

  • That's a perfect college CS story. Beer and bastardized files - what a combo!

  • ah hell, you're the one who made my computer crash trying to open that and make me panic? damn you man

Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.

  • That is really interesting. I wonder if there were any other interesting uses of paid programming to solve problems like these around that time?

  • If I were to gamble, I'd say analog TV can store more data. Compression algorithms usually work at, say, a 200:1 compression ratio, and they're extremely destructive: raw 1080p60 in yuv420p is about 187 MB/s, while a decent equivalent video on YouTube is about 1 MB/s.

I remember seeing this first discussed on 4chan's /g/ board as a joke about whether they could abuse YouTube's unlimited upload size; it then escalated into the proof of concept shown in the repo :)

  • This is a tangent. I must have been maybe 15-16 at the time, so somewhere around 20 years ago: One of the first pieces of software I remember building was a POP3 server that served files, that you could download using an email client where they would show up as attachments.

    Incredibly bizarre idea. I'm not sure who I thought would benefit from this. I guess I got swept up in RFC1939 and needed to build... something.

    • At my first job (at the beginning of the millennium) there was a limit on the size of files you could download, something around 5 MB. If you wanted to download something bigger, you had to ask the sysadmins to do it and wait... That was really annoying. So a colleague and I ended up writing a service that could download a file to local storage, chop it into multiple 5 MB attachments, and send multiple emails to the requester.

      After some time the single-file limit was removed, but a daily limit of 100 MB was set up. The trick is that POP3 traffic wasn't counted, so we continued to use our "service".

      3 replies →

    • Apparently e-mail is not very reliable for storing/keeping files. There have been cases where an old email with an attachment would not load correctly because the servers had simply erased the attachment file.

      5 replies →

  • They also experimented with encoding videos and arbitrary files into different kinds of single (still) image formats, some of them able to be uploaded to the same 4chan thread itself, with instructions on how to decode/play it back. Examples:

    https://dpaste.com/HFTKAPM5V

    https://github.com/fangfufu/Converting-Arbitrary-Data-To-Vid...

    https://github.com/rekcuFniarB/file2png

    https://github.com/nzimm/png-stego

    https://github.com/dhilst/pngencoder

    https://github.com/EtherDream/web2img

I only looked at the example video, but is the concept just "big enough pixels"?

Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].

Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.

[1] If that's what video encoding still does, I really have no idea, but you get the point.

  • Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.

    For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files solely based on the browser settings that can tell what OS I'm running.

    Handling this compression protection for so many different codecs is likely not feasible.

    • Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.

      But (almost) nothing prevents YouTube from not serving that particular codec anymore. This still pretty much falls under the "re-encoding" case I mentioned which would make the whole thing brittle anyway.

      But it's indeed cool to think about. 8)

    • How about Fourier transform (or cosine, whichever works best), and keep data as frequency components coefficients? That’s the rough idea behind digital watermarking. It survives image transforms quite well.
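A rough sketch of that idea (assuming SciPy is available; the coefficient choice and strength are arbitrary illustrations, not a tuned scheme): embed one bit per 8x8 block as the sign of a low-frequency DCT coefficient, which tends to survive moderate lossy compression much better than raw pixel values.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_bit(block, bit, coeff=(2, 1), strength=40.0):
    """Force the sign of one low-frequency DCT coefficient to carry a bit."""
    c = dctn(block.astype(float), norm="ortho")
    c[coeff] = strength if bit else -strength
    return idctn(c, norm="ortho")

def extract_bit(block, coeff=(2, 1)):
    return int(dctn(block.astype(float), norm="ortho")[coeff] > 0)

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))
marked = embed_bit(block, 1)
noisy = marked + rng.normal(0, 5, (8, 8))   # stand-in for mild compression noise
assert extract_bit(noisy) == 1
```

Because the transform is orthonormal, pixel-domain noise of a given variance perturbs each coefficient by roughly the same amount, so a coefficient pushed well away from zero keeps its sign.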

    • Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to re-encode so many versions, then to store all of those versions, and then make all of them accessible anywhere in the world at a moment's notice. Really is such an incredible waste for most YouTube content.

      3 replies →

    • What if you have an ML model that produces a vector from a given image. You have a set of vectors that correspond to bytes - for a simple example you have 256 "anchor vectors" that correspond to any possible byte.

      To compress an arbitrary sequence of bytes: for each byte, you produce an image that your ML model would map to the corresponding anchor vector, and add that image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.

      To decompress the video you simply go frame by frame over the video and send it to your model. Your model produces a vector and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resultant images in the video should look similar, and if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image is intended to correspond to.

      8 replies →
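The decode half of that idea can be sketched without any actual ML model. Here random unit vectors stand in for the embeddings a trained network would produce (all names hypothetical); nearest-anchor lookup still recovers the byte after the embedding drifts:

```python
import numpy as np

rng = np.random.default_rng(42)

# 256 random unit "anchor vectors", one per possible byte value
# (hypothetical stand-ins for a trained model's embeddings).
anchors = rng.normal(size=(256, 64))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)

def nearest_byte(vec):
    """Decode one frame: pick the anchor with the highest cosine similarity."""
    return int(np.argmax(anchors @ (vec / np.linalg.norm(vec))))

# A frame's embedding drifts after recompression; decoding should still hold.
original = anchors[137]
recompressed = original + rng.normal(scale=0.05, size=64)
assert nearest_byte(recompressed) == 137
```

In high dimensions, random anchors are nearly orthogonal, so even fairly large drift leaves the correct anchor as the clear nearest match.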

  • Back in the day, VCRs were commonly used as tape backup devices for data.

    Now studios are using motion-picture film to store data, since it's known to be stable for a century or more.

  • YouTube lets you download your uploaded videos. I've never tested it, but supposedly it's the exact same file you uploaded.[a] It probably wouldn't work with this "tool" as it uses the video ID (so I assume it's downloading what clients see, not the source), but it's an idea for some other variation on this concept.

    [a] That way, in the future, if there’s any improvements to the transcode process that makes smaller files (different codec or whatever), they still have the HQ source

    • They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.

      1 reply →

  • YT might still recompress your video, possibly using proprietary algorithms that are not necessarily DCT based

Could yt-dlp and YouTube Vanced now be hosted on... YouTube?

I wonder how long it'd take for Google to crack down on the system abuse.

Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

  • It's one of those problems that resolves itself.

    The process of creating and using the files is prohibitively unusable and so many better solutions exist that YT doesn't need to worry about it

  • >Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

    If creators start encoding their source and material into their content Google would probably be fine with that because it gives them data but also gives them context for that data.

    Edit: I meant like "director's commentary" and "notes about production" type stuff like you used to see added to DVDs back in the day. Not "using youtube as my personal file storage". Why is this such an unpopular opinion?

    • > If creators start encoding their source material into their files Google would probably be fine with that

      it'd depend, as I don't think people using YT to store files would watch a lot of ads

      1 reply →

    • > If creators start encoding their source material into their files Google would probably be fine with that

      Not true at all, lol. Google has a paid file storage solution. YouTube is for streaming video and that's the activity they expect on that platform. I couldn't imagine any service designed for one format would "probably be fine" with users encoding other files inside of that format.

      1 reply →

This reminds me of an old hacky product that would let you use cheap VHS tapes as backup storage: https://en.wikipedia.org/wiki/ArVid

You would hit Record on a VCR and the computer data would be encoded as video data on the tape.

People are clever.

  • Wow, 2GB on a standard tape. For the time, that's incredibly efficient and cheap.

    • Yeah. Video, even old grainy VHS, had pretty high bandwidth. Even more so with S-VHS, which never became super popular though. (I'm actually wondering whether the 2GB figure was for S-VHS, not VHS. Didn't do the math, and wouldn't be surprised either way.)

      2 replies →

  • This is old school. When I first wrote code back in the Stone Age we used to store our stuff on cassette tape.

  • The Alesis ADAT 8 track digital audio recorders used SVHS tapes as the medium - at the end of the day, it's just a spooled magnetic medium, not hugely different conceptually than a hard drive.

  • That's not really that hacky, audio cassettes were used forever, it's just a tape backup.

  • I remember a similar solution that was marketed in a German mail order catalogue in late 1990s. It could have been Conrad, but I'm not 100% sure. I recall it being a USB peripheral, though. (Maybe I could find more about it in time...)

Reminds me of a guy who stored data in ping messages https://youtu.be/JcJSW7Rprio

  • Back in the day, when protocols were more trusting, we would play games by storing data archives in other people's SMTP queues. Open the connection and send a message to yourself by bouncing it through a remote server, but wait to accept the returning email message until you wanted the data back. As long as you pulled it back in before it timed out on that queue and looped it back out to the remote SMTP queue, you could store several hundred MB (which was a lot of data at the time) in uuencoded chunks spread out across the NSFNET.

  • I watch these things and I begin to realize I'll never be as intelligent as someone like this. It's good to know that no matter how much you've grown, there is always a bigger fish.

    • I agree that there will always be smarter fish, but you can definitely be this smart; it just takes the proper motivation (or weird idea) to wiggle its way into your brain.

  • What part of the video discusses this? :D So far it’s about juggling chainsaws

    Edit: OK, I see where this is going. Lol

This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).

From the README.md:

> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.

> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.

[1]: https://news.ycombinator.com/item?id=6932508

How much data can you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.

  • Back of the envelope estimate:

    4096 x 2160 x 24 x 60 is your theoretical max in bits/second: about 12.7 billion.

    Assume that to counter YouTube's compression we need 16x16 blocks of no more than 256 colors and 15 keyframes/second; that reduces it to

    256 * 135 * 8 * 15 = 4.1 million bits/sec.

    That's not too awful. Ten minutes of this would get you about 300MB of data, which itself might be compressed.
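The same estimate as a quick script, using the assumptions above (4K frames, 16x16 blocks, 256 colors per block, 15 usable frames/second):

```python
# Back-of-envelope capacity of a compression-resistant YouTube encoding.
width, height, fps = 4096, 2160, 15
block, bits_per_block = 16, 8          # 16x16 squares, 256 colors each

blocks = (width // block) * (height // block)   # 256 * 135 = 34,560 blocks
bitrate = blocks * bits_per_block * fps         # ~4.1 Mbit/s
megabytes_per_10min = bitrate * 600 / 8 / 1e6
print(f"{bitrate / 1e6:.1f} Mbit/s, {megabytes_per_10min:.0f} MB per 10 min")
```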

  • “hope you enjoyed this video. btw, the source code used in this tutorial is encoded in the video.”

  • Would storing data as a 15 or 30 FPS QR code "video" be any more useful? At a minimum one would gain a configurable amount of error correction, and you could display it in the corner.

Turns out any site that allows users to submit and retrieve data can be abused in the same way:

- FacebookDrive: "Store files as base64 facebook posts"

- TwitterDrive: "Store files as base64 tweets"

- SoundCloudDrive: "Store files as mp3 audio"

- WikipediaDrive: "Store files in wikipedia article histories"
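The base64 variants all reduce to the same trick: split, encode, post, and reverse it on the way back. A minimal sketch (the 280-character limit is just Twitter's, for illustration):

```python
import base64
import textwrap

def to_posts(data: bytes, limit=280):
    """Split a file into base64 chunks that each fit a post length limit."""
    encoded = base64.b64encode(data).decode()
    return textwrap.wrap(encoded, limit)

def from_posts(posts):
    """Reassemble the original bytes from the posted chunks, in order."""
    return base64.b64decode("".join(posts))

payload = b"any file at all" * 100
posts = to_posts(payload)
assert from_posts(posts) == payload
```

The real cost in all of these schemes is ordering and indexing the chunks, which is why several of the projects mentioned here keep a separate metadata/index entry point.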

  • I wrote one of these as a POC when at AWS to store data sharded across all the free namespaces (think Lambda names), with pointers to the next chunk of data.

    I like to think you could unify all of these into a FUSE filesystem and just mount your transparent multi-cloud remote FS as usual.

    It's inefficient, but free! So you can have as much space as you want. And it's potentially brittle, but free! So you can replicate/stripe the data across as many providers as you want.

    • I was an eng manager on Lambda for a time, and we definitely knew people were doing this, and had plans to cut it out if it ever became a problem. :D

      1 reply →

  • Back in the day when @gmail was famous for its massive free email storage, people wrote scripts to chunk large files and store them as email attachments.

  • My friends and I had a joke called NSABox. It would send data around using words that would attract the attention of the NSA, and you could submit a FOIA request to recover the data. I always found it amusing.

    • There's a feature in Emacs that does that (unsurprisingly.)

      It's called `M-x spook'. It inserts random gibberish that NSA and the Echelon project would've supposedly picked up back in the 90s.

      1 reply →

  • This is pretty tame compared to some actual, practical ones such as https://github.com/apachecn/CDNDrive

    For people who don't read Chinese: it encodes data into ~10 MB blocks as PNGs, then uploads them (together with a metadata/index file as an entry point) to various Chinese social media sites that don't re-compress your images. I know people have used it to store* TBs upon TBs of data on them already.

    *Of course, it would be foolish to think your data is even remotely safe "storing" them this way. But it's a very good solution for sharing large files.

  • Github repos make for a pretty good key-value store.

    It even has a full CRUD API, no need for using libgit.

  • I wonder if we could use this technique in places where the government censors sensitive uploads to streaming sites, like mainland China or North Korea (they do have streaming sites, right?).

    Although for propaganda use, shortwave / sat TV is a much, much simpler way to distribute information to places like that. But I believe it's now hard to get a SW radio there.

  • Reminds me of when I tried to Gmail myself a zip archive, and it was denied because of security reasons iirc. I then tried to base64 it, and it still didn't work, same with base32, until finally base16 did work.

  • I found some pirates uploading videos to Prezi so they get free S3 video hosting.

    • At one point there was a piece of software called deezcloud which exploited Deezer's user uploaded MP3 storage, allowing it to be used as free CDN cloud storage for up to 400GB of files. I don't think it works anymore, and I'm not sure if it ever worked well (I never tried it).

  • I wonder if access permissions would be easier to maintain using Facebook...

    • Until one day your base64 ciphertext just so happens to contain a curse word and you get banned for violating "community standards"

I remember my friend did something like this on an old unix system.

Users were given quotas of 5 MB for their home directory. He discovered that filenames could be quite long, and the number of files was not limited by the quota, so he created a pseudo-filesystem using that knowledge, with a command line tool for listing, storing and retrieving files from it. This was the early 90s.
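That trick can be sketched in a few lines: the payload is base32-encoded into the names of zero-byte files, so it consumes no "content" quota at all. Chunk size and the naming scheme here are illustrative, not what his tool actually did.

```python
import base64
import os
import tempfile

CHUNK = 150  # payload bytes per filename; base32 of 150 bytes stays under
             # the typical 255-character filename limit

def store(data: bytes, directory):
    """Spread data across the *names* of empty files."""
    for i in range(0, len(data), CHUNK):
        name = f"{i // CHUNK:06d}_" + base64.b32encode(data[i:i + CHUNK]).decode()
        open(os.path.join(directory, name), "w").close()   # zero-byte file

def retrieve(directory):
    """Reassemble by sorting on the numeric prefix and decoding the names."""
    parts = sorted(os.listdir(directory))
    return b"".join(base64.b32decode(p.split("_", 1)[1]) for p in parts)

with tempfile.TemporaryDirectory() as d:
    store(b"quota-free storage, early-90s style", d)
    assert retrieve(d) == b"quota-free storage, early-90s style"
```

Base32 rather than base64 keeps the names safe on case-insensitive filesystems and avoids `/` appearing in a filename.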

Years ago when Amazon had unlimited photo storage, you could “hide” gigabytes of data behind a 1px gif (literally concatenated together) so that it wouldn’t count against your quota.

  • They still do if you pay for Prime. I was surprised to see that even RAW files (which are uncompressed and quite large) were uploaded and stored with no issues. Not the same as "hiding" data but might still be possible.

    • In the interest of technical correctness, RAW files are frequently compressed and even lossily compressed. For example, Sony's RAW compression was only lossy until very recent cameras.

      Given that there are the options for uncompressed, lossy compressed and lossless compressed, I'd say RAW files differ in the stage of the data processing where capture is being done and doesn't imply anything about the type of compression.

      What is relevant is that the formats vary widely between manufacturers, camera lines and individual cameras, so unlike JPEG, it's really hard to create a storage service that compresses RAW files further after uploading in a meaningful way. So anything they do needs to losslessly compress the file.

      2 replies →

    • I guess you can store 24 bits of data as the R,G and B components of a pixel of an "image", and store it as a lossless image...

This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.

Seems like a great way to get your account closed for abuse!

  • You'd be surprised how much YouTube lets you upload.

    I've been uploading 2-3 hours of content a day every day for the past few years. On the same account too.

    I have fewer than 10 subscribers lol.

  • You could make it much harder to detect by synthesizing a unique video with a DNN and hiding the data using traditional steganography techniques.

  • Then the whole HN crowd would have enough outrage materials for weeks. Seems like a win-win situation to me.

  • If it becomes prevalent, I think YouTube would do something like slightly randomize the compression in their videos to dissuade this kind of use.

    • isn't the point here that the sub-pixels being produced are so large that it would take a tremendous amount of artifacts to reduce them to an unreadable state?

      in other words; if YTs compression was affecting it so badly that it prevented the data from being re-read, wouldn't that compression scheme render normal video-watching impossible?

Does YouTube store and stream all videos losslessly? How does this work otherwise?

  • No, YouTube is not lossless.

    The video that is created in the example in the README is https://www.youtube.com/watch?v=Fmm1AeYmbNU

    We can see that data is encoded as "pixels" that are quite large, being made up of many actual pixels in the video file. I see quite bad compression artifacts, yet I can clearly make out the pixels that would need to be clear to read the data. It looks like the video was uploaded at 720p (1280x720), but the data is encoded as a 64x36 "pixel" image of 8 distinct colors. So lots of room for lossy compression before it's unreadable.

The code is not too big (a single file), but it requires a paid symbolic language (Mathematica). Can anyone with better Mathematica knowledge explain whether it could be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?

  • Yep! I'm the creator of YouTubeDrive, and there's absolutely nothing in the code that depends on the symbolic manipulation capabilities of Wolfram Mathematica -- you could easily port it to Python, C++, whatever. However, there are two non-technical reasons YouTubeDrive is written in Mathematica:

    (1) I was a freshman in college at the time, and Mathematica is one of the first languages I learned. (My physics classes allowed us to use Mathematica to spare us from doing integrals by hand.)

    (2) I intentionally chose a language that's a bit obtuse to use. I was afraid that I might attract unwanted attention from Google if YouTubeDrive were too easy for anybody to download and run.

Are there any services out there that combine all of these “Store files as XYZ” into some kind of raid config?

Would be interesting if you could treat each service (Youtube, Docs, Reddit, Messenger, etc) as a “disk” and stripe your data across them.

Makes me wonder how many video and image upload sites are now used as easily accessible number stations these days

  • Probably not many. The advantage of plain old-fashioned radio is that the station doesn't keep track of the receivers. Whoever watches a YouTube numbers station is tracked six ways to Sunday.

Rename the project to VideoDrive or something. With the current name Google can get GitHub to take it down on the basis of trademark infringement.

Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....

Back in the 90’s I considered storing my backups as encrypted, steganographically hidden or binary Usenet postings, as a kind of decentralized backup; the postings would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)

This gave me a flashback to VBS on Amiga... Video Backup System: record composite video on a VCR, plus a simple op-amp circuit that would decode black-and-white blobs of video pixels. It could back up floppies at reading speed. Was really impressive until, well, VHS... ;)

Just did a google and saw it had evolved over the years, used only the 1.0 implementation back in the days. For those on another nostalgic trip : http://hugolyppens.com/VBS.html

I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?

What does the OP have against “Google Drive” when seeking file storage via a Google Service?

Horses for courses… this is how we end up with pictures clogging transaction ledgers

Reminds me of the movie Contact where the alien civilization encodes the whole design of a traveling machine inside Olympic telecast video.

The popularity of such projects is the reason for imposing more and more constraints on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to open up hard-to-use systems, e.g. by creating open protocols for closed systems. That would be an actual achievement.

Upload videos as data, then build an app that streams and decodes these files back into videos. Voila, popcorn time.

Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN? This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.

BEWARE: Until they clamp down and delete the files, you lose your data.

Good technical experiment though!

  • Since he's made ready-to-use software, yeah, Google will probably ban this quite quickly...

I suspect people in my office who send everything as a Word attachment with an image, PPT, Excel workbook, etc., embedded, are doing this unknowingly.

There are even Word files I've found that have complete file path notation to ZIP files.

I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.

This reminds me of Blame! where humans are living light rats in the belly of the machine. Lol, also reminds me of the geocities days where we created 50 accounts to upload dragon ball z videos.

I absolutely love this idea. I need to dig more into the code, but it's almost like using Twitter as a 'protocol' and YouTube as storage.

So many ideas are flying to mind. Really creative.

I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).

  • I believe YouTube supports random access, or otherwise you wouldn’t be able to jump around in a video. Youtube-dl also supports resuming downloads in the middle, I believe.

    • True... but I'm guessing the YouTubeDrive 'decoder' needs the whole video to get back anything close to what you put in.

      Otherwise each frame would have to have a ridiculous amount of encoded overhead.

      Ahh, NM cant even see that working.

      edit: Maybe a file table built from the first N frames, one that delivers a frameset/file map ...

      Still nothing like skipping spots in a video. That relies on key frames and time signatures.

      Cool stuff nonetheless...

      1 reply →

I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)

Reminds me of the other post that used Facebook Messenger as a transport layer to get free internet in places where internet access is free only within Facebook's apps.

This seems like something Cicada 3301 would use

I wonder how many random videos like this are floating around that are encoding some super secret data...

I’m thinking maybe we can divide files into pieces, turn each piece into a QR code, and then make each QR code a single frame?

Wasn't there more or less recently on HN something like "Store Data for free in DNS-Records"? Reminds me of this.

Imagine a Raid6 of four youtube 11-digit IDs

Bet google isn't happy with this idea and will definitely try to break it asap

Very cool. I wonder how difficult it would be to present a real, watchable video to the viewer, albeit low quality, with the file embedded steganographically. I think a risk of this tech is that, if it takes off, YT might easily adjust its algorithms to remove unwatchable videos. Leaving a watchable video could grant it more persistence than an obvious data stream.

  • Sure, but the more structure your video has to have, the harder it becomes to hide information steganographically within it. Your information density will become very low, I think.

there was a story on HN a while ago in which someone stored unlimited data in Google Sheets!

I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.

I could seemingly never explain the concept to other developers in a meaningful way, nor did I care enough to code these out myself.

Anyway my quick summary in this is just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.

You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.

That seems to be what this project is: using a fast medium (a video player) to slowly send data over the connection, like physically speaking the contents of other data. But there are some problems with this approach.

Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.

If this weren't the case, we could treat it like a modem connection: just send the raw data and pretend it's a video. However, where I left off on this idea, we appear to be hard-blocked by that YouTube compression.

We can write data to whatever we want and label it as any other file type. (As a side note, video files are also containers, like zip archives, that could be abused to just hold other files.)

But YouTube is an unknown wildcard that changes our compression and thus our data which seems to invalidate all of this.

If we somehow convert an exe to an avi, the YouTube compression seems to hard-block this from working like we want. If we didn't have that barrier, I think we could just upload essentially corrupted videos that double as other file types, provided we could download the raw file directly.

(steganography is a potential work around I haven’t explored yet)

Without these, we're left to just speak the data over a phone, which compresses our voice and in theory could make some sounds hard to tell apart. This leaves us in the battle of which language is best to speak to avoid compression limiting our communication. Is English best? Or Japanese? What about German? Which language is least likely to cause confusion when spoken but is also fast and expressive?

This translates into: what's the best encoding for text, or otherwise for pixels in a video, where data doesn't get lost to compression? Are literal English characters best? What about base64? Or binary? What if we zip it first and then base64 it? What if we convert binary into hex colors? Does that use fewer frames in a video? Will the video preserve all the hex values after YouTube compression?
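
For reference, the scheme the author describes in the top comment (3 bits per 20x20 block on a 1280x720 frame) works out to roughly a megabyte per minute, matching the "one megabyte per minute" figure they quote. A back-of-the-envelope sketch, assuming a 24 fps frame rate (not stated in the comments):

```python
# Throughput of a block-based pixel encoding that survives recompression:
# a 1280x720 frame divided into 20x20 blocks, each block carrying 3 bits
# (one on/off value per R, G, B channel).

def bytes_per_frame(width=1280, height=720, block=20, bits_per_block=3):
    blocks = (width // block) * (height // block)  # 64 * 36 = 2304 blocks
    return blocks * bits_per_block // 8            # 6912 bits = 864 bytes

per_frame = bytes_per_frame()
per_minute = per_frame * 24 * 60  # assumed 24 fps
print(per_frame, per_minute)      # 864 1244160 (~1.2 MB/minute)
```

Shrinking the blocks raises the density but, as the author found with 10x10 squares, also raises the odds that recompression flips a block's color.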

This works on the same principle as the video backup system (VBS) which we used in the 1980's and the early 1990's on our Commodore Amigas: if I remember correctly, one three hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts.

https://www.youtube.com/watch?v=VcBY6PMH0Kg

SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk. To the OS it was just a regular filesystem like any other: tools like ls(1), cp(1), and rm(1) saw no difference, though the latency was of course high.

  • That's how digital audio was originally recorded to tape back in the 1970s and 80s: encode the data into a broadcast video signal and record it using a VCR.

    In the age of $5000 10 MB hard drives, this was the only sensible way to work with the 600+ MB of data needed to master a compact disc.

    That's also where the ubiquitous 44.1 kHz sample rate comes from. It was the fastest sample rate that could be reliably encoded into both NTSC and PAL broadcast signals. (For NTSC: 3 samples per scan line, 245 usable scan lines per field, 60 fields per second = 44,100 samples per second.)
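
    The arithmetic checks out for both standards; a quick sanity check (the PAL figure of 294 usable lines per field is the commonly cited number, not from the comment above):

    ```python
    # 3 audio samples are packed into each usable scan line.
    ntsc = 3 * 245 * 60  # 245 usable lines per field, 60 fields per second
    pal = 3 * 294 * 50   # 294 usable lines per field, 50 fields per second
    print(ntsc, pal)     # 44100 44100
    ```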

  • 130 MB for the whole tape is not a lot. It's roughly floppy-disk throughput, which is probably not a coincidence. However, the basic soldering implies that the rest of the system acted as a big software-defined DAC/ADC.

    A dedicated controller could pack a lot more data, as in hobbyist tape storage systems like ArVid: https://en.wikipedia.org/wiki/ArVid

    • Dedicated controllers were absolutely out of the question because nobody could afford them, which is why Amigas were so popular: a fully multitasking, multimedia computer for 450 DM. That's 225 EUR! Somebody that cost-sensitive wouldn't even consider a dedicated controller; things back then weren't like they are today.

      This was at a time when 3.5" floppy disks were expensive (and hard to come by) and hard drives were between 40 and 60 MB, so 130 MB was quite practical. The floppy drive in the Amiga read and wrote at 11 KB/s.

      And yes, this was a DAC and an ADC in software, with added Reed-Solomon error correction encoding and CRC32. The goal was to be economical. The end price was everything; it had to be as cheap as possible.

  • "one three hour PAL/SECAM VHS tape had a capacity of 130 MB"

    This reminds me of the Danmere Backer.

    "The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts."

    This reminds me of the DIY versions of the Covox Speech Thing: https://hackaday.com/2014/09/29/the-lpt-dac/

Not immediately obvious from the Readme, but does this rely on YT always saving and providing a download of the original, unaltered video file? If not, then it must be encoding the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.