Classification process has begun

User Continent, breakdown of English and Spanish, Verified

The first thing I did today was fix the issue of “Spanish – Mexican” being identified in the languages. In doing this I also found an issue with the “Finnish” language. I resolved both issues, but there still remain some language codes I am unable to identify, for example “msa” and “xx-ls”.

Then I started to break down the dataset into different categories and identify characteristics of the tweeters. I began by identifying what continent the users were on, based on time zone and language spoken. Asher suggested that I use the language codes and their country-specific variants to identify the continent, but I did not do it this way because most users don’t have country-specific keyboards set up on the device they are using. Instead I categorised them by obtaining the time zone each user is in and then cross-referencing this with their keyboard languages and the languages typically spoken in that region. For example, lots of North American time zones are shared with South America, so I bundled all the English speakers into the category “Northern America”, and Spanish and other languages typically spoken in South America into the “South America” category. This allowed me to categorise over 91% of cases, although some of that 91% were marked as unknown because no time zone was provided. Asher then advised that I include Spanish, but not Mexican Spanish, in the European grouped data. I did not do this because of the inaccuracy of country-specific keyboard languages, and I decided to keep those users in the categories they were already in.
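The approach described above could be sketched roughly as follows. This is only a minimal illustration and not the code I actually ran: the time zone labels, the lookup tables and the function name classify_continent are all assumptions made up for the example, and the real dataset contains far more time zones and language codes.

```python
from typing import Optional

# Hypothetical lookup: time zone label -> broad region.
# These labels are assumptions for illustration, not the full list in the dataset.
TIME_ZONE_REGION = {
    "Eastern Time (US & Canada)": "Americas",
    "Central Time (US & Canada)": "Americas",
    "Brasilia": "Americas",
    "London": "Europe",
    "Madrid": "Europe",
}

# Assumed set of base language codes treated as typical of South America.
SOUTH_AMERICAN_LANGUAGES = {"es", "pt"}


def classify_continent(time_zone: Optional[str], language: Optional[str]) -> str:
    """Assign a continent category from a time zone plus a language code."""
    if not time_zone:
        # No time zone provided, so the location stays unknown.
        return "Unknown"
    region = TIME_ZONE_REGION.get(time_zone)
    if region is None:
        return "Unknown"
    if region != "Americas":
        return region
    # Time zones in the Americas are shared between North and South,
    # so the language is used to split them. The country-specific
    # variant (e.g. the "mx" in "es-mx") is deliberately ignored.
    base = (language or "").split("-")[0].lower()
    if base == "en":
        return "Northern America"
    if base in SOUTH_AMERICAN_LANGUAGES:
        return "South America"
    return "Unknown"


print(classify_continent("Eastern Time (US & Canada)", "en"))
print(classify_continent("Eastern Time (US & Canada)", "es-mx"))
print(classify_continent(None, "en"))
```

Dropping the country-specific part of the code reflects the decision above not to rely on country-specific keyboard settings.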

After that I broke down the users who spoke English and Spanish by the continent they are located on.

Asher then advised me to identify the popularity of the users and whether they are verified, as I did with the i360 dataset. I did this and then created the spreadsheet attached to this blog, in which you can view all of today's findings, with some analysis, except the popularity data.

Very happy with today's work; I feel as though I’ve made substantial progress.
