Learning to Code: Week 8 – Recover

6/30/16

Problem Set 4

In ~/workspace/pset4/questions.txt, answer each of the following questions in a sentence or more.

  • What’s stdint.h?
    • A C standard library header that defines exact-width integer types such as uint8_t and int32_t, so integer sizes are the same across different systems.
  • What’s the point of using uint8_t, uint32_t, int32_t, and uint16_t in a program?
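    • They make each integer’s exact width explicit. A plain int or long can be a different size on different systems, but uint8_t is always 8 bits, uint16_t is 16 bits, and uint32_t and int32_t are 32 bits (the u means unsigned). That matters when reading a file format like BMP, whose header fields have fixed sizes.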
  • How many bytes is a BYTE, a DWORD, a LONG, and a WORD, respectively?
    • 1, 4, 4, and 2 bytes, respectively (that’s 8, 32, 32, and 16 bits).
  • What (in ASCII, decimal, or hexadecimal) must the first two bytes of any BMP file be? (Leading bytes used to identify file formats (with high probability) are generally called “magic numbers.”)
    • The file type; it must be BM in ASCII.
    • That’s 0x42 0x4D in hexadecimal, or 66 77 in decimal.
  • What’s the difference between bfSize and biSize?
    • bfSize is the size, in bytes, of the whole bitmap file.
    • biSize is the number of bytes required by the BITMAPINFOHEADER structure itself.
  • What does it mean if biHeight is negative?
    • If biHeight is negative, the bitmap is a top-down DIB and its origin is the upper-left corner.
    • If biHeight is negative, indicating a top-down DIB, biCompression must be either BI_RGB or BI_BITFIELDS. Top-down DIBs cannot be compressed.
  • What field in BITMAPINFOHEADER specifies the BMP’s color depth (i.e., bits per pixel)?
    • biBitCount
  • Why might fopen return NULL in copy.c:37?
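    • If the file can’t be opened: for example, it doesn’t exist, or the program doesn’t have permission to read or create it.
  • Why is the third argument to fread always 1 in our code?
    • The third argument is the number of elements to read, and our code reads one element per call (one BITMAPFILEHEADER, one BITMAPINFOHEADER, or one RGBTRIPLE).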
  • What value does copy.c:70 assign padding if bi.biWidth is 3?
    • int padding = (4 - (bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;
    • RGBTRIPLE is 3 bytes (24 bits), so a 3-pixel scanline is 3 × 3 = 9 bytes.
    • 9 % 4 is 1, and 4 - 1 is 3.
    • 3 % 4 is 3, so padding = 3.
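  • What does fseek do?
    • It moves a file’s position indicator to a new offset, measured from the start of the file (SEEK_SET), the current position (SEEK_CUR), or the end of the file (SEEK_END).
  • What is SEEK_CUR?
    • The value passed as fseek’s third argument to make the offset relative to the current position indicator instead of the start or end of the file.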
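Several of these answers fit together in copy.c’s inner loop. Here is a minimal sketch, assuming typedefs that mirror CS50’s bmp.h (BYTE, DWORD, LONG, WORD, and RGBTRIPLE with its blue, green, red fields); padding_for and read_scanline are hypothetical helper names. It checks the byte sizes at compile time, computes padding with the same formula as copy.c, reads one RGBTRIPLE per fread call (hence the 1), and uses fseek with SEEK_CUR to skip past a scanline’s padding.

```c
#include <stdint.h>
#include <stdio.h>

// Assumed to mirror the typedefs in CS50's bmp.h.
typedef uint8_t  BYTE;
typedef uint32_t DWORD;
typedef int32_t  LONG;
typedef uint16_t WORD;

typedef struct
{
    BYTE rgbtBlue;   // BMP stores each pixel as blue, green, red
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE;

// Sizes in bytes: 1, 4, 4, 2 (8, 32, 32, 16 bits), checked at compile time (C11).
_Static_assert(sizeof(BYTE) == 1 && sizeof(DWORD) == 4, "unexpected size");
_Static_assert(sizeof(LONG) == 4 && sizeof(WORD) == 2, "unexpected size");
_Static_assert(sizeof(RGBTRIPLE) == 3, "RGBTRIPLE should be 3 bytes");

// Bytes of padding so each scanline is a multiple of 4 bytes.
// Width 3: 3 * 3 = 9 bytes, 9 % 4 = 1, 4 - 1 = 3, 3 % 4 = 3.
int padding_for(LONG width)
{
    return (4 - (width * (int) sizeof(RGBTRIPLE)) % 4) % 4;
}

// Reads one scanline of `width` pixels into `row`, then skips its padding.
// Returns the number of pixels actually read.
int read_scanline(FILE *in, RGBTRIPLE *row, LONG width)
{
    int count = 0;
    for (LONG i = 0; i < width; i++)
    {
        // third argument 1: read exactly one RGBTRIPLE per call
        if (fread(&row[i], sizeof(RGBTRIPLE), 1, in) != 1)
            break;
        count++;
    }
    // padding bytes hold no pixel data; move past them from the current position
    fseek(in, padding_for(width), SEEK_CUR);
    return count;
}
```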

Whodunit.c

Implement that same idea in whodunit. Like copy, your program should accept exactly two command-line arguments. And if you run it on the clue image with verdict.bmp as the output file, verdict.bmp should contain a BMP in which Mr. Boddy’s drawing is no longer covered with noise.

  • Once I understood copy.c, this update isn’t so bad. copy.c just reads every pixel in the input file and writes each one to the output file.
  • All that is required is to check the value of the RGBTRIPLE struct BEFORE anything is written to the new file.
  • I will check for any red pixels and change them to white and see what happens.
  • The only problem I’ve run into so far is being confused about what the RGB values for red and white are. Guessing isn’t a good plan when I can just look it up.
  • RED is rgb(255,0,0)
  • WHITE is rgb(255,255,255)
  • Got red to switch to white, and now I know who the person in question is. However, I want to darken the image to make it clearer, and I’m not sure exactly how to do that.
  • I will use the xxd command and see what pixels are blue and darken those.
  • xxd printed out too much crap for me to get a good look, so forget that.
  • Revealed picture is of Rick Astley…again. Look below.
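The red-to-white check can be sketched as a function applied to each pixel before it’s written. This assumes an RGBTRIPLE laid out like CS50’s bmp.h; reveal is a hypothetical name. Because the struct stores blue first, rgb(255,0,0) means rgbtRed is 0xff while rgbtGreen and rgbtBlue are 0x00.

```c
#include <stdint.h>

// Assumed to match CS50's bmp.h: BMP pixels are stored blue, green, red.
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Turns pure red (rgb(255,0,0)) noise into white (rgb(255,255,255)).
// Every other pixel is left unchanged.
void reveal(RGBTRIPLE *pixel)
{
    if (pixel->rgbtRed == 0xff && pixel->rgbtGreen == 0x00 && pixel->rgbtBlue == 0x00)
    {
        pixel->rgbtRed = 0xff;
        pixel->rgbtGreen = 0xff;
        pixel->rgbtBlue = 0xff;
    }
}
```

In copy.c’s pixel loop, the call would sit between the fread and the fwrite of each triple, e.g. reveal(&triple);.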

[Image: BEFORE]

[Image: AFTER]

Whodunit solution here

 

Resize.c

7/1/16

Resize.c continues

  • Continued work on resize. I think I have the loops conceptually correct. I haven’t dealt with padding or the BMP header changes yet.
  • I figured out the header size changes, but my image resizes properly horizontally and not vertically.
  • See below for the 8×8 px smiley scaled by a factor of 4 to 32×32, and how it doesn’t come out right: there is just white space below my squished smiley face.

[Image: the original 8×8 smiley]

[Image: the 32×32 output, squished with white space below]
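For reference, the header arithmetic for a scale factor n can be sketched as below. This assumes 24-bit BMPs, with 14 bytes for BITMAPFILEHEADER and 40 for BITMAPINFOHEADER; the ResizedHeader struct and resize_header function are hypothetical names. The key point is that padding and biSizeImage depend on the new width and height, not the original ones.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical summary of the header fields resize has to update.
typedef struct
{
    int32_t  biWidth;
    int32_t  biHeight;      // may be negative (top-down DIB)
    uint32_t biSizeImage;   // pixel bytes, including padding
    uint32_t bfSize;        // whole file: pixels plus both headers
} ResizedHeader;

// Assumes 3-byte pixels and the usual 14-byte + 40-byte headers.
ResizedHeader resize_header(int32_t width, int32_t height, int n)
{
    ResizedHeader h;
    h.biWidth = width * n;
    h.biHeight = height * n;                       // keep the sign: top-down stays top-down
    int padding = (4 - (h.biWidth * 3) % 4) % 4;   // padding depends on the NEW width
    h.biSizeImage = (uint32_t) ((h.biWidth * 3 + padding) * abs(h.biHeight));
    h.bfSize = h.biSizeImage + 14 + 40;
    return h;
}
```

For the 8×8 smiley scaled by 4, this gives a 32×32 image with no padding, 3072 bytes of pixels, and a 3126-byte file.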

  • Figuring out fseek is a bitch and a half, but this guy gives a really, really good explanation: https://www.reddit.com/r/cs50/comments/3sv0lz/cs50_pset4_resize_i_need_help_with_this_fseek/cx0n0ak
  • Oh damn, I think I finally understand fseek. The position in the original infile keeps moving forward no matter what I’m doing in my output file! So before I run the loop again, I need to move the position back to the beginning of the current scanline in the infile. Otherwise it just reads the next line: I thought it was writing the same pixels again, but it couldn’t, since the position in the infile just kept going and going.

The only thing to be super clear on before you begin is that the “file pointer” is not a pointer in the sense of a “pointer to character” or “pointer to integer”. It is literally an integer that records an offset from the beginning of the file.

Then: When you open a file, the file pointer is normally placed at the beginning of the file. If you open the file for reading, this means that the next byte to be read will be the first byte. If you open the file for writing, this means that all the existing bytes are discarded and the next byte written will go at the beginning of the file.

Every time you read, the bytes are returned to you and the file pointer moves ahead exactly that many bytes.

Every time you write, the bytes are stored in the file and the file pointer moves ahead exactly that many bytes (so the file pointer in a writing file is generally at the end of the file, so the next write will append bytes to the file).

Do I need to remember where the file pointer was, then fseek to its previous location and then fread?

Yes, otherwise you would be reading the next line. If you want to read the same line again, you use ftell() or fgetpos() to remember where you are; then fseek() or fsetpos() to reposition; then start reading again.

Can I just fread from the outfile to copy the scanlines?

You could. You already have code that reads the input scanline and converts to the output scanline, so rewinding and re-reading the input file can be easier. On the other hand, since you just converted a scanline and wrote it to the output file, you could read that back in and write it out (n-1) times.
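The answer’s remember-then-reposition pattern can be sketched as below. copy_repeated is a hypothetical helper that works byte by byte with fgetc and fputc to keep the example small; in resize, the bytes would be a scanline of RGBTRIPLEs.

```c
#include <stdio.h>

// Copies the next `len` bytes of `in` to `out`, `times` times, using ftell to
// remember where they start and fseek to jump back before each re-read.
// When times >= 1, `in` ends up just past those `len` bytes.
// Returns 0 on success, -1 on a seek or read error.
int copy_repeated(FILE *in, FILE *out, long len, int times)
{
    long start = ftell(in);                    // remember where the line begins
    if (start < 0)
        return -1;
    for (int t = 0; t < times; t++)
    {
        if (fseek(in, start, SEEK_SET) != 0)   // jump back to the remembered offset
            return -1;
        for (long i = 0; i < len; i++)
        {
            int c = fgetc(in);                 // each read advances the position by one byte
            if (c == EOF)
                return -1;
            fputc(c, out);
        }
    }
    return 0;
}
```

Called twice with len 3 and times 2 on a file containing ABCDEF, it writes ABCABCDEFDEF.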

  • I think I fucking got it! My picture was coming out all fucking weird after I tried to move the position back to the start of the scanline, but only because I was using the resized output picture’s width instead of the original’s. That threw the position into some weird part of the file, which made my resized picture look like complete crap!
  • It seems to work now!
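Here is a minimal sketch of the per-scanline loop using a relative fseek instead of ftell. It assumes an RGBTRIPLE laid out like CS50’s bmp.h and that `in` starts at the beginning of a scanline; scale_scanline and padding_for are hypothetical names. The rewind moves back by the input scanline’s pixel bytes, which is exactly where using the output width goes wrong. It rewinds before skipping the input padding, so the padding only has to be skipped once.

```c
#include <stdint.h>
#include <stdio.h>

// Assumed to match CS50's bmp.h (blue, green, red; 3 bytes).
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

static int padding_for(int width)
{
    return (4 - (width * (int) sizeof(RGBTRIPLE)) % 4) % 4;
}

// Writes one input scanline n times, repeating each pixel n times, and leaves
// `in` at the start of the next input scanline.
void scale_scanline(FILE *in, FILE *out, int in_width, int n)
{
    int in_pad = padding_for(in_width);
    int out_pad = padding_for(in_width * n);   // output padding uses the NEW width

    for (int copy = 0; copy < n; copy++)
    {
        for (int i = 0; i < in_width; i++)
        {
            RGBTRIPLE triple;
            if (fread(&triple, sizeof(RGBTRIPLE), 1, in) != 1)
                return;
            for (int k = 0; k < n; k++)
                fwrite(&triple, sizeof(RGBTRIPLE), 1, out);
        }
        for (int p = 0; p < out_pad; p++)
            fputc(0x00, out);

        // Rewind by the INPUT scanline's pixel bytes (not the output's) to re-read it.
        if (copy < n - 1)
            fseek(in, -(long) (in_width * sizeof(RGBTRIPLE)), SEEK_CUR);
    }
    fseek(in, in_pad, SEEK_CUR);   // skip the input padding once, after the last copy
}
```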

Resize solution here

 

7/2/16

Vacation in Asheville. Nothing done today. 🛩

7/3/16

Spent a little time in the morning fleshing out what to do for the next part of Pset4, Recover, which recovers JPEGs from a forensic image.

  • Ensure proper usage to accept 2 command-line arguments.
  • I need to read the input file and check if NULL.
  • Open the file and read it 512 B at a time.
  • Iterate through each block and read the first four bytes, checking for a jpg signature.
  • If a jpg signature is found, then I need to rewind 4 bytes, then copy that block into a jpg file.
  • Check the next block of 512 B.
  • If there’s no signature there, copy that block into the jpg.
  • If there is a signature, then close the previous jpg file and start a new one, incrementing the filename: 001, 002, 003, etc.
  • End once I detect EOF.
  • I wonder if I can do the above recursively, so that I keep checking blocks until a jpg signature is found, then copy the contents to the file going back down the stack?
  • Can I create a struct to easily look for a jpg signature?
  • The video overview of the problem helped visualize it.
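A minimal sketch of that plan follows. It assumes the forensic image is a sequence of 512-byte blocks and that a JPEG starts with 0xff 0xd8 0xff followed by a byte from 0xe0 to 0xef; recover and is_jpeg_start are hypothetical names. Reading each whole block into a buffer and inspecting it in memory means there’s no need to rewind 4 bytes. A plain loop is enough, no recursion needed: each block either starts a new JPEG or continues the current one. This sketch numbers files from 000.jpg, so adjust that if your spec starts at 001.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// True if a block begins with a JPEG signature: 0xff 0xd8 0xff, then a fourth
// byte whose high nibble is 0xe (0xe0 through 0xef).
bool is_jpeg_start(const uint8_t *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

// Scans `raw` one 512-byte block at a time and writes each JPEG it finds to
// 000.jpg, 001.jpg, ... in the current directory. Returns the number of JPEGs
// written, or -1 if an output file can't be created.
int recover(FILE *raw)
{
    uint8_t block[BLOCK_SIZE];
    FILE *jpg = NULL;   // the JPEG currently being written, if any
    int count = 0;
    char name[16];

    // fread returns 1 per full block; stop at EOF (a short final block is dropped)
    while (fread(block, BLOCK_SIZE, 1, raw) == 1)
    {
        if (is_jpeg_start(block))
        {
            if (jpg != NULL)
                fclose(jpg);                     // a new signature ends the previous JPEG
            snprintf(name, sizeof name, "%03d.jpg", count);
            jpg = fopen(name, "wb");
            if (jpg == NULL)
                return -1;
            count++;
        }
        if (jpg != NULL)
            fwrite(block, BLOCK_SIZE, 1, jpg);   // blocks before the first signature are skipped
    }
    if (jpg != NULL)
        fclose(jpg);
    return count;
}
```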


7/4/16

Continued working on PSET4 Recover.c

7/5/16

Recover solution posted here!

 

7/6/16

Scrabble is a popular word game where players remove tiles with letters on them from a bag and use them to create words on a board. The total number of tiles as well as the frequency of each letter does not change between games.

For this challenge we will be using the tile set from the English edition, which has 100 tiles total.

Each tile will be represented by the letter that appears on it, with the exception that blank tiles are represented by underscores _.

The tiles already in play are inputted as an uppercase string. For example, if 14 tiles have been removed from the bag and are in play, you would be given an input like this:

AEERTYOXMCNB_S

You should output the tiles that are left in the bag. The list should be in descending order of the quantity of each tile left in the bag, skipping over amounts that have no tiles.

In cases where more than one letter has the same quantity remaining, output those letters in alphabetical order, with blank tiles at the end.

If more tiles have been removed from the bag than possible, such as 3 Qs, you should give a helpful error message instead of printing the list.

  • A guy on Reddit solved it in C, and I couldn’t understand what he did at all, so I asked. He encoded the tile-count table as a string, which is ingenious!
  • Question about these lines:

A char array is really just an array of bytes. To encode the table of letter counts compactly in the source, I’m using a string literal to fill the char array. The advantage is that the counts don’t need to be individually delimited, only the collection as a whole (with quotes: “…”). I could instead have written it as {9, 2, 2, 4, …} but it would make for larger source code.

The downside is that the low-numbered bytes aren’t written so concisely. The first 32 ASCII characters are control characters and generally need special escaping to appear in a string. To compensate, I “encode” it by adding a fixed count to each, equal to the character a. For example, the count for A is 9, and a + 9 == j. To get the real count, I “decode” the count later in the program by subtracting a.

The positions in the array are A through Z. The challenge uses _ to encode spaces, which in ASCII is conveniently located shortly after Z. The counts for characters between these ([, \, ], ^) are padded with space to make them stand out to the human reader. That way _ doesn’t need to be a special case.

You’ll notice that I also subtract A in the program. This is to correct the offset into the array. A - A == 0, B - A == 1, etc. That first loop just reads the characters one at a time, subtracting 1 from their slot in the counts char array. If any of these dip below a (second loop), then we’ve detected the challenge’s error condition.
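The encode/decode trick can be sketched as below. The string is our own reconstruction from the standard English tile distribution (A 9, B 2, C 2, D 4, E 12, ..., Z 1, blank 2), not the poster’s actual code, and starting_count is a hypothetical name. One variation: the padding slots decode to nonsense (' ' - 'a' is negative), so this sketch rejects anything that isn’t A to Z or _ before indexing.

```c
// Starting counts for 'A'..'_' stored as 'a' + count, so 'j' encodes 9.
// Slots 26-29 ('[', '\\', ']', '^') are spaces: not tiles, never decoded.
static const char ENCODED[] = "jccemcdcjbbecgicbgegeccbcb    c";

// Returns the starting count for a tile ('A'-'Z' or '_'), or -1 for anything else.
int starting_count(char tile)
{
    if (!((tile >= 'A' && tile <= 'Z') || tile == '_'))
        return -1;
    // tile - 'A' finds the slot ('A' -> 0, '_' -> 30); subtracting 'a' decodes it
    return ENCODED[tile - 'A'] - 'a';
}
```

For example, starting_count('E') returns 12, and the counts for A through Z plus _ add up to 100.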

  • I am not sure how I would solve it myself. I don’t really know off the top of my head how to match a letter like A to the first element in an array list.

7/7/16

  • Woke up this morning and realized how I could do Challenge #272 by using ASCII values as indexes into an array that represents the Tile Count table found here: http://scrabblewizard.com/scrabble-tile-distribution/
  • Encoding to just the values is essentially what the Reddit guy did, just his seemed more elegant since he only had to use a string rather than an array. Also he didn’t have to deal with the ASCII values between Z and _ like I have to (I think).
  • I spent an hour or so working up the solution and I’m SO CLOSE!
  • What I’ve coded up is 90% of the way there, but the error check for invalid number of tiles removed and printing for the number of tiles if 1 or 0 is fucked up.
  • I am not good enough with GDB to really narrow down the bugs without spending a long time clicking through my loops.
  • Expected output for input “PQAREIOURSTHGWIOAE_”: [image]

  • My output: [image]

  • OK, I thought about it for a second, then realized my stupid mistake. I saw the Reddit guy use sizeof in his first loops, and for some reason I used it too, but sizeof returns the number of bytes in an object, not the number of elements in an array!
  • Not sure how to get the length of an array, so I just hard-coded the exact number in my loops. That almost fixed it, but I realized that when checking for a tile count of 0, I didn’t have another check to see if the character was a valid Scrabble tile.
  • I added in that check and the code works.
  • The final bug I’ve run into is that when an invalid number of tiles has been removed from the bag, I don’t know how to print that error and then end the program. It continues to run.
  • I tried using break in my error if statement, but break only exits a loop or a switch, not an if statement: http://stackoverflow.com/questions/2565659/does-break-work-only-for-for-while-do-while-switch-and-for-if-sta
  • Using exit() seems to be what I want to use. http://stackoverflow.com/questions/2425167/use-of-exit-function
  • Changed exit() to return -1. People think using exit() sucks. Not exactly sure why though.
  • Also I have another bug where sometimes it will throw an error for a blank character and sometimes it won’t. I believe I have some memory allocation issues.
  • It’s not a memory allocation issue; it’s just a really dumb error. In my first loop through the input, I used 31 as the bound when it should have been the length of the input string. Not sure how I missed that.
  • Also I forgot to put a check for valid characters on my second output if statement.
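Pulling those fixes together, here is a minimal sketch using a plain int array as in my approach (not the Reddit string). START, SLOTS, is_tile, tiles_left, print_left, and run are hypothetical names, and the error messages are made up. It gets the array length with sizeof START / sizeof START[0], which only works on a real array in scope, not on a pointer parameter. It loops over strlen of the input, checks that each character is a valid tile before touching the counts, and reports errors by returning a status instead of using break.

```c
#include <stdio.h>
#include <string.h>

// Starting counts indexed by (tile - 'A'): 'A'..'Z', then four unused slots
// for '[', '\\', ']', '^', then '_' (blank).
static const int START[] = {
    9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2,   // A-M
    6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1,    // N-Z
    0, 0, 0, 0,                               // not tiles
    2                                         // '_'
};
// sizeof alone gives bytes; dividing by one element's size gives the count (31).
#define SLOTS ((int) (sizeof START / sizeof START[0]))

static int is_tile(char c) { return (c >= 'A' && c <= 'Z') || c == '_'; }

// Fills left[] with the tiles remaining after `in_play` is removed. Returns 0 on
// success, or the first character that isn't a tile or was removed too often.
int tiles_left(const char *in_play, int left[])
{
    memcpy(left, START, sizeof START);
    size_t n = strlen(in_play);               // loop over the input, not a fixed 31
    for (size_t i = 0; i < n; i++)
    {
        char c = in_play[i];
        if (!is_tile(c) || --left[c - 'A'] < 0)
            return (unsigned char) c;
    }
    return 0;
}

// Prints "quantity: tiles" lines from 12 down to 0, skipping empty quantities.
// Slots run A..Z then '_', so letters come out alphabetically with the blank last.
void print_left(FILE *out, const int left[])
{
    for (int q = 12; q >= 0; q--)
    {
        int printed = 0;
        for (int i = 0; i < SLOTS; i++)
        {
            char tile = (char) ('A' + i);
            if (!is_tile(tile) || left[i] != q)
                continue;
            if (printed == 0)
                fprintf(out, "%d: %c", q, tile);
            else
                fprintf(out, ", %c", tile);
            printed++;
        }
        if (printed > 0)
            fputc('\n', out);
    }
}

// Returns the status main should return: 0 on success, 1 on bad input.
int run(const char *in_play, FILE *out, FILE *err)
{
    int left[SLOTS];
    int bad = tiles_left(in_play, left);
    if (bad != 0)
    {
        if (is_tile((char) bad))
            fprintf(err, "Invalid input: too many %c tiles taken from the bag.\n", bad);
        else
            fprintf(err, "Invalid input: '%c' is not a Scrabble tile.\n", bad);
        return 1;
    }
    print_left(out, left);
    return 0;
}
```

After checking argc, main could simply be return run(argv[1], stdout, stderr);, so the error path ends the program by returning from main. Returning 1 rather than -1 is the usual convention for failure; on Unix-like systems a -1 from main shows up as exit status 255.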

Final solution to this challenge on my gist here!