Craig Rodrigues!

Learning to Code: Week 17 - Problem Set 5 "Mispellings"

TOTAL 🍅  THIS WEEK: 30

Goals For This Week:

  1. Finished “Linked List Problems”
  2. Finish CS50 Problem Set 5 (9/4)
  3. Get 20+ pomodoros (9/5)

September 01, 2016  🍅🍅🍅

void SortedInsert(struct node** headRef, struct node* newNode) {
    struct node* current = *headRef;

    if (*headRef == NULL || ((*headRef)->data) >= newNode->data) {
        newNode->next = *headRef;
        *headRef = newNode;
    }
    else {
        // checking the next node BEFORE I am actually at it.
        while (current->next != NULL && current->next->data < newNode->data) {
            current = current->next;
        }
        // current should be at where we want to insert the node now.
        newNode->next = current->next;
        current->next = newNode;
    }
}
    • Solution to InsertSort() now that the above function works correctly.

void InsertSort(struct node** headRef) {
    // create list and current tracker
    struct node* newList = malloc(sizeof(struct node));
    newList = NULL;

    struct node* current = *headRef;
    struct node* next; // holder of the next node so we can refer back to it

    // traverse original list inserting the nodes into the new list in order
    while (current != NULL) {
        next = current->next;
        SortedInsert(&newList, current);
        current = next;
    }

    // point headRef to new list
    *headRef = newList;
}
    • Write an Append() function that takes two lists, 'a' and 'b', appends 'b' onto the end of 'a', and then sets 'b' to NULL (since it is now trailing off the end of 'a'). Here is a drawing of a sample call to Append(a, b) with the start state in gray and the end state in black. At the end of the call, the 'a' list is {1, 2, 3, 4}, and 'b' list is empty.

image01

      • Random thoughts: Can I just find the end of list A and then have it point to the first node in list B and then just make B’s head NULL? I can’t think of a way to do this without finding the end of A.
      • I must also account for the case when the A list is empty and you just point A to the list of B.

September 02, 2016 🍅🍅

  • This site has coding bootcamp “prep” lessons. May be worth checking out later. https://coderbyte.com/courses/?s=courses#all
  • Problem Set 5 Walkthrough Video
    • Makefile - allows us to compile multiple files at once and pass in the flags we want.
    • Clean - removes exe files and .o files after a seg fault?
    • Dictionary.h - contains the function prototypes
    • Dictionary.c - fill in the functions laid out here
    • Speller.c - not actually editing this at all
    • First function to tackle is LOAD() - loads the dictionary into memory.
    • Second function is SIZE() - returns the number of words in the dictionary loaded.
    • Third function is CHECK() - checks if a word is spelled correctly or not.
    • Fourth function - UNLOAD() - unloads dictionary from memory (and the hash table?)
    • Hash functions are deterministic. Pass a value into the function and always get out the same result. Must also fall into the range of your table (modulo?)
    • Should be well balanced and spread out. Minimize collisions.

image06

This is not a textbook. It is a collection of short stories meant to help you refine your way of thinking about and working on software projects.

You won’t find any neatly packaged prescriptive advice within these pages. Instead,

you will see example after example of the many problems we encounter as practicing developers, and the thought process involved in discovering how to solve them.

To encourage you to read this work in a way that brings out your creative side, I made

you the main character in each of its stories. This will undoubtedly feel a bit weird to

you, and even as I write this introduction -- it feels weird to me, too!

My true hope is that by embedding you in the work, then this book can be more than

yet another stream of stern admonitions flowing down from the hilltops where

Expert Programmers live. Instead, I want you to ask questions like, “If I were in this

situation, would I really do things this way? If not, why not?”

When I think about that first codebase today I want to vomit in my own mouth. I am glad that I no longer have access because I want to deny it ever existed. It was a mess of spaghetti code, and even though we built it quickly, it took a lot longer than it should have.

Ironically, now that I have experience, I think experience is priceless. What’s made me change my mind?

  • Telegraph Academy is a program/bootcamp for POC: http://www.telegraphacademy.com/
    • They also have scholarships for their prep course which can be taken online/streaming (beta).

September 03, 2016  🍅x11

    • More Problem Set 5 Walkthrough Video
      • Tries
      • Valgrind
      • If I find a hash function on the internet then cite it.
    • Speller/Load video (12 minutes)
    • Worked on LOAD() (and the rest of Problem Set 5)
      • Steps
        1. Open the dictionary file.
        2. If dictionary is NULL then return false.
        3. Scan through the dictionary file word by word (WHILE LOOP).
        4. Increment our global size variable to keep track of the number of words.
        5. Use the hash function on the current word to get an index.
        6. Push the word into the linked list at that hashtable index.
        7. If the hashtable[index] is null then put the node there.
        8. If it has something there already (collision) then PUSH() the new node onto the front of that linked list.
      • My biggest problem so far is to be able to effectively test my various parts of this program/functions without needing to complete the whole thing first.
      • I want to be able to test that a dictionary loads into my hash table correctly before I move on. But how?
      • Ok, I was able to check with GDB using the proper breakpoint.
      • My load function works with a hash function that returns 3 for every word and it works to PUSH() all of the new words into the linked list at that index. Working on the Linked List Problems really made it easy to change my PUSH function from there to work here. I have a much better understanding of the pointers and linked list workings now!
      • Next is to get a hash function that doesn’t suck and load the entire small dictionary into a large hash table (500,000 slots).
      • So far my entire program WORKS as far as I can tell, except the very last word in a file isn’t checked....
      • I also need to implement the check for uppercase letters and apostrophes. A word’s case shouldn’t matter when trying to see if something is spelled correctly.
      • Ok it took a bit longer than I expected, but my CHECK() function works now. The problem was dealing with const variables that cannot be changed. Hadn’t ever really dealt with those before so I kept getting errors involving that fact.
      • Right now my speller.c function is quite a bit slower than the staff’s. The check time is 0.40 vs 0.02 for theirs. I assume the hash function is where I’m losing the time, perhaps I am running across a ton of collisions (I never checked to see how many are happening).
      • I changed my hash function to this one mentioned here and now my speller program is WAY faster. From a 0.40 load time to 0.02.
      • And with that I think I’m finished with this. I expected it to take 5 days, but only took 4 hours….wow.
      • I will go back and insert a collision counter and check how many collisions I had before versus the new function.
      • Going back to the function I fixed how to modulo worked because I had it working weirdly and it seems first FIRST function wasn’t as slow as I thought. It still has almost 20k more collisions, but it’s still pretty fast. Much faster than before. No idea what would cause 0.4s load times if I ran modulo outside of the Push function?
      • Also I realize it’s not just about collisions, it about how evenly spread out a hash function is. I am not sure how to accurately find out how spread out a function distributes hashes.
/**
 * dictionary.c
 *
 * Computer Science 50
 * Problem Set 5
 *
 * Implements a dictionary's functionality.
 */

#include <stdbool.h>
#include "dictionary.h"

// global variables
struct node* hashtable[SIZE];
int dicSize = 0;
char word[LENGTH +1];

void Push(struct node** headRef, const char* word) {

    struct node* newNode = malloc(sizeof(struct node));
    strcpy(newNode->word, word);

    if (*headRef == NULL)
        *headRef = newNode;
    else
    {
        newNode->next = *headRef;
        *headRef = newNode;
    }      
}

void PrintList(struct node* head) {

    struct node* current = head;

    while (current != NULL) {
        printf("%s ", current->word);
        current = current->next;
    }
    printf("\n");
}

void DeleteList (struct node** headRef) {

    struct node* current = *headRef;
    struct node* next;

    while (current != NULL) {
        next = current->next;
        free(current);
        current = next;
    }

    *headRef = NULL;
}

/**
 * Hash function. 2nd one found here: https://www.reddit.com/r/cs50/comments/1x6vc8/pset6_trie_vs_hashtable/cf9nlkn
 * 
 * 1st one found here: http://codereview.stackexchange.com/questions/85556/simple-string-hashing-algorithm-implementation
 * 
 */

/** 
int hash(char* word) {
    unsigned int hash = 0;
    while (*word)
        hash = (hash * 10) + *word++ - '0';
    return hash % SIZE;
}
*/

int hash(char* word) {

    unsigned int hash = 0;
    for (int i = 0, n = strlen(word); i < n; i++)
        hash = (hash << 2) ^ word[i];
    return hash % SIZE;
}

/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word)
{
    // convert word to lowercase since all words in dictionary are lowercase
    int len = strlen(word);
    char tempWord[len+1];
    strcpy(tempWord, word);

    for (int i = 0; i < strlen(tempWord); i++) {
        if (tempWord[i] != '\'')
            tempWord[i] = tolower(tempWord[i]);
    }

    // get the hash value of the passed in word
    int index = hash(tempWord);

    // go to that index and search for the word
    struct node* current = hashtable[index];

    // if word is found return true
    while (current != NULL) {
        if (strcmp(current->word, tempWord) != 0) {
            current = current->next;
        }
        else {
            return true;   
        }
    }
    return false;
}

/**
 * Loads dictionary into memory.  Returns true if successful else false.
 */
bool load(const char* dictionary)
{
    // open the dictionary
    FILE* dic = fopen(dictionary, "r");
    if (dic == NULL)
        return false;

    // scan through the dictionary word by word
    while (fscanf(dic, "%s\n", word)!= EOF)
    {
        dicSize++;
        int index = hash(word);
        Push(&hashtable[index], word);
    }
    fclose(dic); // close the dictionary file

    return true;
}

/**
 * Returns number of words in dictionary if loaded else 0 if not yet loaded.
 */
unsigned int size(void)
{
    return dicSize;
}

/**
 * Unloads dictionary from memory.  Returns true if successful else false.
 */
bool unload(void)
{
    // iterate through entire SIZE of the hashtable array
    for (int i = 0; i < SIZE; i++) {
        if (hashtable[i] != NULL) // if something is there then delete the entire list there
            DeleteList(&hashtable[i]);
    }
    return true;
}

September 04, 2016 🍅🍅🍅🍅

  • Worked on questions and surveys for Problem Set 4 and Problem Set 5 and submitted my code.
  • I would like to go back and implement a Trie structure as well.

image04

September 05, 2016 🍅🍅🍅

  • Week 7 - Lecture 1 Video: 53 minutes
  • Harvard had seminars going on that all look really cool. Luckily for me each of them was livestreamed and there is a video! I love the internet.

image03

  • Link to these and all from past years here: https://manual.cs50.net/seminars/
  • Cloud9 IDE has a built-in web server! Time to work with the web! 👍
  • I’ve always just tinkered around with CSS (and HTML), but this lecture has explained a lot more about it.
  • Next up is PHP.

September 06, 2016 🍅🍅🍅🍅

    • Week 7 - Lecture #2 (53m)
      • Starting with PHP.
      • As with HTML and CSS they expect us to be able to figure most of it out on our own now.
      • PHP is not a compiled language, it is interpreted by another program when run. Weird.
      • An associative array can be a hash table easily in php. Dammit. No return types for functions.
      • The price we pay however is in speed. PHP is a bit slower than C since it’s interpreted.
      • Sadf
    • Reddit Post: Coders: What would you tell your younger self?

image02

  • Finished Lecture #2

September 07, 2016 🍅🍅🍅

  • Walkthrough Videos
    • Basic HTML and CSS

image05

Learning to Code: Week 18 - Problem Set 6 "Lookup"

Learning to Code: Week 16 - Linked Lists