Read any good telly recently? How about catching up on your favourite shows on an e-ink reader or tablet? It sounds silly, but sitting back and flicking through Holby City can be a nice, peaceful alternative. OK, we’re reaching a little bit.
The real fun here is learning about video and image manipulation, optical character recognition, generating PDFs in code, and using Python as a powerful scripting language to pull several tools together. We’ll take the raw recording produced by the Raspberry Pi TV HAT and create a PDF document, complete with captions taken from subtitles.
This tutorial was written by PJ Evans and first appeared in The MagPi 80. PJ is a writer, developer, and Milton Keynes Jam wrangler. He has terrible taste in movies.
Comic Book Creator: Get recording
Before starting, make sure you have your Raspberry Pi set up with a TV HAT and Tvheadend installed (see ‘You’ll Need’ list for a helpful link). You will need a recording from Tvheadend (it doesn’t matter what, but maybe the news wouldn’t be the most exciting choice). You can select any programme and record it, then find the recording under ‘Digital Video Recorder’ then ‘Finished Recordings’. From here you can download the file or you can find recordings in /var/lib/hts. Tvheadend records in the original broadcast MPEG‑2 TS format (or ‘transport stream’).
You'll need
- Raspberry Pi TV HAT
- Tvheadend installation
- e-book reader or tablet
Install dependencies
Converting a recording to a PDF takes several discrete stages: video extraction, optical character recognition (OCR), and PDF generation. Not all of this is easily within Python’s reach, so we’ll use Python to manage the process, delegating the ‘heavy lifting’ to some command-line utilities. Their purposes will become apparent as we go through the tutorial. Here’s what you need to do at the command line:
sudo apt update && sudo apt -y upgrade
sudo apt install git python3-pip ffmpeg imagemagick
pip3 install fpdf arrow
Compile and install ccextractor
The utility ‘ccextractor’ is able to extract subtitles from DVB (Digital Video Broadcasting) recordings. Unfortunately, this application is not available in the APT repositories, so we’re going to have to compile it ourselves. We’ll use Git, which we installed in the previous step, to download the source code from its repository. Then we’ll install its dependencies (other programs it relies on) before compiling and installing the app.
cd
git clone https://github.com/CCExtractor/ccextractor.git
sudo apt install -y libglfw3-dev cmake gcc libcurl4-gnutls-dev tesseract-ocr tesseract-ocr-dev libleptonica-dev
cd ccextractor/linux
./build
sudo mv ./ccextractor /usr/local/bin/
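Before moving on, it can be worth confirming that the command-line tools are actually reachable. The following is a minimal sketch, not part of the comical repository: the function name missing_tools is made up for illustration, and the list of tools is simply the set of utilities mentioned in this tutorial.

```python
import shutil

# Command-line tools this tutorial relies on; "mogrify" ships with ImageMagick.
REQUIRED_TOOLS = ["ffmpeg", "mogrify", "tesseract", "ccextractor"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

If anything is reported as missing, revisit the install steps above before running the script.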
Install the script
As this is a series of steps potentially involving hundreds if not thousands of files, we’ve provided a Python script to control the process. It’s a bit on the large side to type in manually, so again we’ll use Git. To get the code on to your Pi, enter the following commands:
cd
git clone https://github.com/mrpjevans/comical.git
You will now have a new directory, comical, containing the script and a few other files we need.
Extract the subtitles
Rather than just run the entire script, which wouldn’t show us much, let’s run it in stages. Make sure you know the path to your recording. We’re using the public domain movie Plan 9 From Outer Space, regarded as one of the worst films ever made. The first job is to extract the subtitles from the video so we can process them and use them as captions.
cd ~/comical
python3 comical.py -i plan9.ts --extract
A folder, plan9.d, is created, containing a PNG image file for each subtitle. An XML file, plan9.xml, contains the timing information for each title.
Cleaning up
So why are our subtitles images? It’s because that’s the European digital broadcast standard. Subtitles in DVB are actually a second video stream. To make use of them, we’ll need to take the PNGs that ccextractor created and perform optical character recognition on them. Currently they’re too small to be recognised accurately by the OCR application Tesseract. So, we’ll use the ImageMagick utility ‘mogrify’ to resize them and greyscale them.
python3 comical.py -i plan9.ts --clean
If you have a look in the directory, you’ll see the subtitles are now large and monochrome.
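To give a feel for what this stage involves, here is a rough sketch of driving mogrify from Python. It is not the code from comical.py: the function names and the 300% scale factor are assumptions chosen for illustration, so treat the numbers as a starting point to experiment with.

```python
import glob
import subprocess


def mogrify_command(png_paths, scale_percent=300):
    """Build a mogrify call that enlarges and greyscales the PNGs in place."""
    return [
        "mogrify",
        "-resize", f"{scale_percent}%",  # enlarge so the glyphs are easier to read
        "-colorspace", "Gray",           # convert to greyscale
        *png_paths,
    ]


def clean_subtitles(directory, scale_percent=300):
    """Run mogrify over every PNG in `directory` (overwrites the files)."""
    pngs = sorted(glob.glob(f"{directory}/*.png"))
    if pngs:
        subprocess.run(mogrify_command(pngs, scale_percent), check=True)
    return pngs
```

Note that mogrify modifies files in place, so keep a copy of the originals if you want to try different settings.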
OCR With Tesseract
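Tesseract is a remarkable utility originally developed by Hewlett-Packard and later open-sourced. Now you have it installed on your Pi, you can use it for many other purposes. To convert an image into text, run the following, replacing the placeholders with your own file names (Tesseract adds ‘.txt’ to the output name itself):
tesseract <image file> <output name>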
Our script reads in every image in the directory and sends it to Tesseract for processing. At this size, you can expect a good level of accuracy from DVB titles.
python3 comical.py -i plan9.ts --ocr
In the same directory you’ll now see a matching ‘.txt’ file for each graphic subtitle.
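The loop over the subtitle images might look something like the sketch below. This is an illustration under assumptions rather than the script's actual code: the function names are invented here, and it relies only on Tesseract's basic command form of an input image followed by an output name.

```python
from pathlib import Path
import subprocess


def tesseract_command(png_path):
    """Build a Tesseract call that writes <name>.txt next to <name>.png."""
    png = Path(png_path)
    output_base = png.with_suffix("")  # Tesseract appends ".txt" itself
    return ["tesseract", str(png), str(output_base)]


def ocr_directory(directory):
    """OCR every PNG in `directory` and return {png name: recognised text}."""
    results = {}
    for png in sorted(Path(directory).glob("*.png")):
        subprocess.run(tesseract_command(png), check=True)
        text_file = png.with_suffix(".txt")
        results[png.name] = text_file.read_text(encoding="utf-8").strip()
    return results
```

Calling ocr_directory("plan9.d") would then return the recognised caption for each subtitle image, which is handy for spotting poor OCR results.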
Extract images
The next part of our script will extract a single still image for each subtitle based on that subtitle’s timestamp. To get the timestamps, we use Python’s built-in XML parsing libraries. For each timestamp, we then ask ffmpeg (a Swiss Army knife for video processing) to extract a still image as a JPEG and save it in a new directory, which in our case is called plan9_process. Each file is named after the time code at which the frame appears. We also copy across the subtitle text with a matching file name.
python3 comical.py -i plan9.ts --images
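As a rough idea of how the timestamps and stills fit together, here is a small sketch. Be aware of the assumptions: the sample XML layout (the spu elements with start and image attributes) is a guess made for illustration and the real file written by ccextractor may differ, and the function names and file-naming scheme are hypothetical rather than taken from comical.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real timing file may use different names.
SAMPLE_XML = """<subpictures>
  <stream>
    <spu start="00:01:02.500" end="00:01:05.000" image="sub0001.png"/>
    <spu start="00:01:07.040" end="00:01:09.000" image="sub0002.png"/>
  </stream>
</subpictures>"""


def subtitle_starts(xml_text):
    """Return (start time, image name) for every <spu> element."""
    root = ET.fromstring(xml_text)
    return [(spu.get("start"), spu.get("image")) for spu in root.iter("spu")]


def still_name(timecode):
    """Turn 00:01:02.500 into a sortable file name such as 00-01-02.500.jpg."""
    return timecode.replace(":", "-") + ".jpg"


def ffmpeg_still_command(video, timecode, out_dir):
    """Build an ffmpeg call that grabs one frame at `timecode` as a JPEG."""
    return [
        "ffmpeg",
        "-ss", timecode,   # seek to the subtitle's start time
        "-i", video,
        "-frames:v", "1",  # write a single video frame
        "-q:v", "2",       # JPEG quality (lower values mean better quality)
        f"{out_dir}/{still_name(timecode)}",
    ]
```

Each command list can be passed to subprocess.run() to produce one JPEG per subtitle.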
A picture worth a thousand words
Have a look in your equivalent of plan9_process. We’ve got everything we need to build our PDF. Right? Well, yes, provided there’s no break in dialogue, which seems unlikely. What about scenes with no subtitles? Again, ffmpeg comes to our rescue. An advanced filter can detect when a significant amount of the screen changes, denoting a scene change. Our script will ask ffmpeg to detect every scene change and then extract further JPEG images, ignoring any that are within a second of a subtitle image.
python3 comical.py -i plan9.ts --detectscenes
python3 comical.py -i plan9.ts --extractscenes
Your _process directory is now populated with the additional images.
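The scene-change logic can be sketched as follows. This is one plausible approach using ffmpeg's select and showinfo filters, not necessarily what comical.py does: the 0.4 threshold, the function names, and the choice to read times from showinfo's log output are all assumptions for illustration.

```python
import re


def scene_detect_command(video, threshold=0.4):
    """Build an ffmpeg call that logs frames whose scene score exceeds `threshold`."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-f", "null", "-",  # decode only; showinfo writes its lines to stderr
    ]


def parse_scene_times(ffmpeg_log):
    """Pull the pts_time values (in seconds) out of showinfo log lines."""
    return [float(t) for t in re.findall(r"pts_time:\s*([0-9.]+)", ffmpeg_log)]


def drop_near_subtitles(scene_times, subtitle_times, min_gap=1.0):
    """Keep scene changes that are at least `min_gap` seconds from every subtitle."""
    return [
        s for s in scene_times
        if all(abs(s - t) >= min_gap for t in subtitle_times)
    ]
```

Raising the threshold yields fewer, more dramatic scene changes; lowering it produces more images to flick through.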
Build it!
The final part of the script will take all the images and text files and convert them into a PDF for you to enjoy.
python3 comical.py -i plan9.ts -o plan9.pdf --build
This part of the script uses the fpdf Python library to lay out each image in a 2×3 grid, adding pages as needed. Where there is a matching subtitle, it is placed below the image. To give the final result a bit more of a graphic novel feel, there is a comic book-style font included in the comical repository which is used by fpdf when rendering text.
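The grid arithmetic is straightforward to reason about on its own. The sketch below assumes an A4 page measured in millimetres, a 10 mm margin, and a grid of two columns by three rows; none of these values are confirmed by the script, and grid_position is a name invented for this example.

```python
def grid_position(index, cols=2, rows=3, page_w=210.0, page_h=297.0, margin=10.0):
    """Return (page number, x, y, cell width, cell height) in mm for image `index`."""
    per_page = cols * rows
    page, slot = divmod(index, per_page)  # which page, and which slot on it
    row, col = divmod(slot, cols)         # fill left to right, then top to bottom
    cell_w = (page_w - 2 * margin) / cols
    cell_h = (page_h - 2 * margin) / rows
    return page, margin + col * cell_w, margin + row * cell_h, cell_w, cell_h
```

With fpdf, the x and y values could be passed to the image() method, with add_page() called each time the page number increases.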
Adjusting fonts
You might find that sometimes, dependent on the subtitle lengths, the captions can overflow; or that the font size isn’t large enough, with too much white space. The script provides a few arguments that can be specified on the command line to help with this:
python3 comical.py -i plan9.ts -o plan9.pdf --build --fontsize 8 --lineheight 5 --offset 68
Here, fontsize sets the size of the font. This needs to be in step with lineheight, which sets the vertical spacing between lines. offset sets the position of the first line of text below the image. The default settings are shown above.
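To see why fontsize and lineheight need to stay in step, consider this small sketch. It assumes that the font size is in points while lineheight and offset are in millimetres, with offset measured from the top of each image cell; the script may use different units, and the function names and the 45-character wrap width are made up for illustration.

```python
import textwrap

POINTS_TO_MM = 25.4 / 72  # one typographic point in millimetres


def caption_lines(text, cell_top, lineheight=5, offset=68, max_chars=45):
    """Return [(y position in mm, line of text)] for a wrapped caption."""
    lines = textwrap.wrap(text, width=max_chars)
    return [(cell_top + offset + i * lineheight, line) for i, line in enumerate(lines)]


def lines_overlap(fontsize=8, lineheight=5):
    """True if the glyph height in mm exceeds the spacing between lines."""
    return fontsize * POINTS_TO_MM > lineheight
```

Under these assumptions, an 8-point font is roughly 2.8 mm tall, so a line height of 5 leaves comfortable spacing, whereas a 16-point font would need a larger line height to avoid overlapping lines.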
Automating and fine-tuning
The comical.py script comes with a number of arguments to control its behaviour. In the tutorial we’ve gone step by step, but you could have just run the following:
python3 comical.py -i plan9.ts -o plan9.pdf --full
This performs every step in one operation. You can also do a pre-build:
python3 comical.py -i plan9.ts -o plan9.pdf --prebuild
This performs every step except building the PDF, so you can remove unwanted images and subtitles and trim the PDF down to the parts that interest you. Delete the unwanted files from your _process directory and then run:
python3 comical.py -i plan9.ts -o plan9.pdf --build
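If you are curious how a staged command-line tool like this can be organised, here is a minimal sketch using Python's argparse module. The flag names match those used in this tutorial, but the structure is an assumption for illustration and is not copied from comical.py.

```python
import argparse

# Pipeline stages in the order this tutorial runs them.
STAGES = ["extract", "clean", "ocr", "images", "detectscenes", "extractscenes", "build"]


def build_parser():
    """Create a parser with one on/off flag per stage plus two shortcuts."""
    parser = argparse.ArgumentParser(description="Sketch of a staged pipeline CLI")
    parser.add_argument("-i", dest="input", required=True, help="recording (.ts)")
    parser.add_argument("-o", dest="output", help="PDF to write")
    for stage in STAGES:
        parser.add_argument(f"--{stage}", action="store_true")
    parser.add_argument("--full", action="store_true", help="run every stage")
    parser.add_argument("--prebuild", action="store_true", help="every stage except build")
    return parser


def selected_stages(argv):
    """Return the stages to run, in pipeline order, for a list of arguments."""
    args = build_parser().parse_args(argv)
    if args.full:
        return list(STAGES)
    if args.prebuild:
        return [s for s in STAGES if s != "build"]
    return [s for s in STAGES if getattr(args, s)]
```

Because the stages are returned in pipeline order regardless of how the flags were typed, a caller can simply loop over the list and run each stage in turn.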
Make it your own
You could regard this project as a bit frivolous, but in the process of putting it together we’ve looked at several cool technologies such as video manipulation and optical character recognition. Examine the script code to see how we use Python to link all these different utilities together and marshal the data flowing between them. Why not see if you can improve on the results? Some ideas include adding filters to the images to give a graphic novel appearance, watching the recordings folder to trigger automatic conversion, creating glitch art, or mashing up different recordings.