Posts

Showing posts from August, 2015

Analyzing Twitter data with R (part 3: Cleaning & organizing the Data)

In the previous parts, we explained how to set up access to Twitter's API and how to import tweets with a simple R command. In this third part, we will organize and clean the imported data into a good, reusable format. So let's pick up where we left off: having finished the authentication process, we now want to import specific tweets. The searchTwitter command from the twitteR package will do the job for us. We need to create a new object to hold the data we are about to import and assign it the result of our searchTwitter call. In our example, we will query Twitter for tweets containing the hashtag #Tunisia, import 1000 English tweets, and preview the imported data with the head() function:

String manipulation functions in R (part 2)

First part of this series of posts. This is the second part of a series of posts on string and text manipulation in R. Each part picks a few functions, illustrates them with basic examples, and shares them here. I hope that by the end of this series I'll have a good archive of lessons ranging from beginner to expert level. The points we will cover in this part 2 are: character encoding, the length of a string, lowercase and uppercase conversion, basic string comparison, concatenating strings, and extracting sub-strings.

Mathematical Symbols in LaTeX

I have been going through my hard drive organizing some of my files (PDFs, eBooks and articles), and I found a very interesting document about LaTeX math symbols. Please deactivate your ad blocker to be able to see the PDF document! UPDATE: I published this article a few years ago, and it has attracted more attention than any of my other blog posts, so I'd like to thank all of the visitors who shared it. Please take some time to check out the other articles I have shared about Data Science, Machine Learning and Big Data: Data Mining basic and advanced concepts - Part 2 : Describing Data. Here are some of the most interesting parts of the document:

M&M&M: Mean ~ Median ~ Mode: Uncovered

Is it just me, or are people still confusing the 3 M's? I've noticed a common mistake some folks make when talking about statistical concepts: they cannot differentiate between the mean, the mode and the median of a data variable. For statisticians and domain experts, this might seem like a "stupid" topic to talk about, since statistics is a very deep field and these three concepts are just small dots in an ocean of knowledge. But it is very important for those who work with numbers to really understand the ideas behind the basic concepts they see or use every day. However confusing it may seem to some people, an argument that uses the "mean" to justify a decision will be totally wrong if what you were really referring to in your reasoning was the "mode".
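To make the difference concrete, here is a small worked sketch. It is written in Python rather than R and uses only the standard library's `statistics` module; the sample values are made up for illustration and were chosen so that the three measures come out different. The helper name `three_ms` is hypothetical.

```python
from statistics import mean, median, mode

# Made-up sample values, chosen so that mean, median and mode all differ.
values = [1, 2, 2, 3, 4, 7, 9]


def three_ms(data):
    """Return the mean, median and mode of a list of numbers (hypothetical helper)."""
    return {
        "mean": mean(data),      # sum of the values divided by their count
        "median": median(data),  # middle value once the data is sorted
        "mode": mode(data),      # most frequent value
    }


if __name__ == "__main__":
    print(three_ms(values))
    # Adding a single extreme value shows how differently the measures react.
    print(three_ms(values + [100]))
```

For the original list, the mean is 28 / 7 = 4, the median is 3 and the mode is 2. Appending one outlier (100) pushes the mean up to 128 / 8 = 16, while the median only moves to 3.5 and the mode stays at 2. A decision justified by "the typical value" can therefore change drastically depending on which of the three M's you actually meant.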