Live Coding - Java Coding Exercise: a tool for synchronizing 2 dirs (backup) using SHA comparisons
Link of the Live Coding video lasting 28 minutes
live-coding.mkv
Introduction : Live Coding - Student Exercise Correction
I asked my students to code in Java a “File Synchronisation Tool”, using SHA-1 comparisons.
Basically, after the course on java.util.List, java.util.HashMap and java.io.* , I wanted to give them an exercise to practise what they had learned.
Students level in java are varying from beginners to intermediates. They are 20-21 years old, and working part-time in a company, and part time at university (3 years after french graduate “baccalaureat”)
They had approximatly 1 month to give me their work, as an eclipse source code project, including tests.
Description of the Synchronizing Tool
The “File Synchronisation Tool” must compare recursively 2 directories (aka sourceDir and destinationDir), and compute the list of copy/removal files commands to get the destination directory be in sync with the source directory.
For example, suppose you want to compare source directory “/data/dir1” to destination directory “/data/dir1.backup”, and you have some files added in source “/data/dir1/a/foo-new.txt”, modified “/data/dir1/a/bar-modified.txt”, and deleted “/data/dir1/a/baz-deleted.txt” (does not exists in source dir, only in “/data/dir1.backup/a/baz-deleted.txt”)
The tool ouput should be a list of commands to synchronize from source to dir:
The synchronisation tool must work in a “3-way” algorithm style, by first computing all SHA-1 signatures of source files, then SHA-1 signatures of destination files… then find matches and mismatches between source and destinations.
It could(should) also warn on duplicate file contents found in source, so possibly use symlink on destination instead of wasting disk copy.
It was explicitly stated that the internal datastructure to use was a Map of List, whose key is the SHA-1 of file content, and values are the related paths of all files having the SHA.
As as example, suppose you have a directory with the following content
then the data signature of theses files would be a Map with the following content:
Code Difficulty and Coding Speed Performance
The code is pretty simple, as the following shell command on the correction source code shows it:
The JUnit tests is here:
And the Java code outline is here
Basically, there is a main() to scan the 2 directories and compare the 2 Maps by SHA-1 of List of paths.
The main part takes 151 lines of code, and the JUnit tests 73, which makes a total of 224 lines.
You might wonder that it is not so much, and it should take ~ 30 seconds per line to code, so ~1 hour to get it…
Indeed, it took me ~2 hours the first time, 1 hour the second time… and 35mn the 5th time.
Being fast is possible only because you repeat something you already knows, or knows how to do it… It is just the repetition and training that makes you becoming fast, like every athletes.
Notice that the 3-th and 4-th times, it actually took me more times than before (~1h30mn) because I added more JUnit tests, checked with linux bash command “sha1sum”, and I found a bug in my hexa decimal conversion from byte[] to String… then I learned a simpler way and safer to code it.
The 5th times, I did a small mistake in my code (forgot to scan for deletions before additions/modifications), so the result was almost good, but incorrect if you try to execute it for real.
The video provided is the 5-th one.
If my students were also experimented developper, it should take them also few hours… but for them, it was their first coding of such tool … I received by email the first projects (by asking several times) after more than 15 days, but I think they have spend only few (~3 ?) days on it.
Unfortunatly, only 4 out of 18 students had done something, and the results submitted were FAR FAR from expected, both in term of results, performance optimisations and code readability/quality !!
I had to reply to them several times, to explain what was wrong in their programs.
Basically, the forgot to do a recursive scan of the directory, they only listed 1 level of directory. Then they failed to understand that the SHA1 was computed on the file content, that it should be converted to hexa string (or whatever array class in java implementing equals() and hashCode() ), and that they had to handle the possibility of file content duplicated several times (they used Map<String,String> instead of Map<String,List
More on Live Coding Video Recording ...
It took me hours to compress my video file into a usable, standard, and not “too big” file !!!
I have started with a recorded file in a proprietary format (non standard, not usable for web users..) which was 30Mo.
I could only convert from this format to “.MOV” file which was 1.5 Go (not usable for web users, for other reasons)
I ended it up after several days in a MP4 file of ~ 22Mo … but I was still very disapointed by the result
Obviously “.MOV” format was not adapted… and/or the compression was not done with the correct parameters. Clearly, MOV is adapted to movies (successions of natural photos), and JPEG compression is pretty good at that. But for artificial computer graphics images, it is a pity… and a shame, because such images are extremely simple (only lines, and text font), so they “should be” compressed much much more! Not only the spatial redundncy is high, but also the temporarl one (things changes very slowly on the screens, and only in the regions near the mouse / keyboard input focus).
And because I am curious about everything in computer sciences, I have spent the last few past-days/current-weeks(/incoming-monthes) searching for tools, documentations, research papers on the subject of “screencast video compression” !
I am convinced there is a better solution…
More on this in following post: Screencast Recording & Dedicated Computer Graphics Video Compression