‘The Concert Project’: Event Setup (2)

This blog post is dedicated to making sense of the dates of all of these concerts. The date formats are inconsistent and use strange abbreviations. There are also many concerts that run over long weekends or other multi-day stretches, meaning there are multiple dates in a single ‘event’. For now, I’m just trying to find the ‘StartDate’ of each of these shows, which will give me a good idea of when in the year they are happening.



[Embedded notebook: Blog_2_1.ipynb]

A quick print of two 5-row sections of the data gives a view into the different date structures. Entry #2 in the first set took place over the days following Christmas in 2014. Meanwhile, shows in June and July are expressed in a completely different format.



[Embedded notebook: Blog_2_2.ipynb]

Due to some weirdness in GitHub, I am not able to display the output of the code above, so I’ve provided a snapshot of it below. At first glance the code seems to have mostly worked, but on closer inspection it did not: none of the multi-date events were transformed correctly.

[Image: Missing Data]
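To illustrate why a direct conversion can leave gaps like these, here is a minimal sketch in Python. It is not the notebook's code: the sample strings, the two format patterns, and the `naive_parse` helper are all assumptions made up for illustration, and the real 'Dates' column may look different.

```python
from datetime import datetime

# Hypothetical sample strings; the real 'Dates' values may differ.
raw_dates = [
    "Dec. 26-28, 2014",  # multi-day event
    "Mar. 3, 2015",      # single day, "Mon. Day, Year"
    "14-Jun-2015",       # single day, "Day-Month-Year"
]

def naive_parse(text):
    """Try each single-date format in turn; return None if none matches."""
    for fmt in ("%b. %d, %Y", "%d-%b-%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

parsed = [naive_parse(s) for s in raw_dates]

# Anything that came back as None still needs attention.
missing = [s for s, d in zip(raw_dates, parsed) if d is None]
print(missing)
```

In this sketch the two single-date strings parse, while the multi-day string is left unparsed, because a day range such as "26-28" fits neither format.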

To fix this, I use an iterative process to pick apart the pieces of the dates I’m concerned about. First I isolate the Month, the Year, and the first Day listed. I then recombine them into a StartDate variable.
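The idea of isolating the month, first day, and year can be sketched roughly as follows. This is a Python illustration under assumptions, not the notebook's method: the regular expression, the `MONTHS` lookup, and the `start_date` helper are hypothetical, and the pattern assumes a "Mon. Day, Year"-style string with the year at the end.

```python
import re
from datetime import date

# Month abbreviations mapped to month numbers (avoids locale-dependent parsing).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Month name (first three letters captured), the first day listed,
# then anything up to the first four-digit year.
PATTERN = re.compile(r"([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2})\b.*?(\d{4})")

def start_date(text):
    """Return the first date in a 'Mon. Day[-Day], Year' string, else None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    month = MONTHS.get(m.group(1).lower())
    if month is None:
        return None
    try:
        return date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:  # e.g. an impossible day such as Feb. 30
        return None

print(start_date("Dec. 26-28, 2014"))  # multi-day event
print(start_date("14-Jun-2015"))       # Day-Month-Year is not handled here
```

One caveat with this sketch: a range that crosses into a new year would pick up the later year for the start date, so such rows would need separate handling.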



[Embedded notebook: Blog_2_3.ipynb]

[Image: 2_3a]

This method works for dates in the ‘Mon. Day, Year’ format and for those with multiple dates. However, dates in the other ‘Day-Month-Year’ format are not correctly translated, so I use an ifelse statement to fill in the gaps from one method with the results of the other.
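The gap-filling step amounts to a row-by-row choice between two sets of results. Here is a small Python sketch of that logic; the two lists are hypothetical stand-ins for the outputs of the two approaches, not real data from the project.

```python
from datetime import date

# Hypothetical results for the same three rows from each approach.
from_pieces = [date(2014, 12, 26), date(2015, 3, 3), None]   # piecewise method
from_direct = [None, date(2015, 3, 3), date(2015, 6, 14)]    # direct conversion

# Keep the piecewise result when it exists, otherwise fall back to the other.
start_dates = [
    a if a is not None else b
    for a, b in zip(from_pieces, from_direct)
]

print(start_dates)
```

Which method takes priority is a design choice. Here the piecewise result wins because it is the one that handles multi-date events.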

[Image: 2_3b]

With that, our outputs are now complete, and these 10 examples look to have been translated correctly. To get a better idea of whether everything went as expected, I create a table of 100 randomly selected observations, showing the original ‘Dates’ value next to the new ‘StartDate’ value, to spot-check my transformations.
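A spot-check like this can be sketched in Python as below. The rows are placeholder values invented for the example, and the fixed seed is just an assumption to make the sample repeatable.

```python
import random

# Placeholder rows standing in for the real data (values are made up).
rows = [
    {"Dates": f"raw string {i}", "StartDate": f"parsed value {i}"}
    for i in range(500)
]

# A seeded generator gives the same sample on every run.
rng = random.Random(42)
spot_check = rng.sample(rows, k=min(100, len(rows)))

# Print the original and transformed values side by side.
for row in spot_check[:5]:
    print(f"{row['Dates']:<20} -> {row['StartDate']}")
```

Sampling without replacement means no observation is checked twice, and reusing the same seed makes it easy to re-inspect the same rows after changing the transformation.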



[Embedded notebook: Blog_2_4.ipynb]



[Embedded notebook: Blog_2_5.ipynb]

With that, I think the Events data is pretty much done. I rename ‘newdata’ to ‘Events’ and, in order to show snapshots of the data itself, I delete the ‘Prices’ column and replace it with a ‘Num_Prices’ variable indicating how many price categories there were. I don’t know enough about GitHub to display it otherwise; I have a feeling it has something to do with the comma-separated nature of the list of prices. Either way, I don’t think it is a huge loss. In the next post I’ll begin to do some exploratory analysis.
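Swapping ‘Prices’ for a count could look something like this Python sketch. It assumes each ‘Prices’ value is a single comma-separated string, which may not match how the real column is stored; the example rows and the `count_prices` helper are hypothetical.

```python
# Hypothetical rows; the real 'Prices' values may be stored differently.
events = [
    {"Event": "Show A", "Prices": "25.00, 45.00, 65.00"},
    {"Event": "Show B", "Prices": "30.00"},
    {"Event": "Show C", "Prices": ""},
]

def count_prices(prices):
    """Count the non-empty entries in a comma-separated string."""
    return len([p for p in prices.split(",") if p.strip()])

# Drop 'Prices' from each row and store the count in its place.
for row in events:
    row["Num_Prices"] = count_prices(row.pop("Prices"))

print(events)
```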



[Embedded notebook: Blog_2_6.ipynb]