Posts

Showing posts with the label Tutorials

Why and how I deleted 4000+ connections from LinkedIn

The Procession of the Trojan Horse in Troy by Domenico Tiepolo (1773)

First things first: could someone remind me of the main goal (or goals) of using LinkedIn as a professional social network? While I had several thousand connections and followers, every time I opened LinkedIn to check what was going on (or to reply to that desperate student looking for advice or an internship), I could not fully process what was posted, and much of it was of absolutely no interest to me personally. In this post, I will explain the reasons that led me to delete a big chunk of my LinkedIn network, and how I defined and identified the connections that had to be deleted. As for Facebook, Instagram or even Twitter, every place has its own use. Keeping in touch with close friends and family on Facebook, taking interest in following beloved artists and brands on Instagram, and enjoying the occasional discussions with likeminded peo...

Advanced (Intro) to SAS Macro Programming (Part 3)

The first part and the second part of this introduction covered some basic things every beginner needs to know when using the SAS macro language. In this part we will talk about:
- Dynamically storing a value into a macro variable
- Iterative statements in macro language (next part)
- Conditional statements in macro language (next part)

1- Dynamically storing a value into a macro variable

/* Calculating an average value and storing it in a macro variable */
Proc Means Data = SASHELP.heart noprint;
  Var height;
  Output out = test mean = avg_height;
Run;

Intro to SAS Macro Programming (Part 2)

We saw a first introduction in part 1 of this introduction to SAS macro programming. In this part we will see how to write a complete macro statement and how to use multiple variables in a single macro statement.

Adding parameters to a SAS macro statement:

Syntax: %Macro_name (Value_1, ..., Value_n);

Example 1: In the example below, we add a single parameter to our SAS macro, which calculates basic descriptive statistics using proc means. In this case, the parameter is the table name:

%Macro AVG (Table);
  Proc means data = &table;
  run;
%mend;

And now we will see the result of our macro:

%AVG (SASHELP.Shoes);

This is the output we get from the macro we've just created and executed:

Intro to SAS Macro Programming (Part 1)

Trying different statistical languages and being comfortable with them is a big advantage for today's statisticians and data scientists. This is why I wanted to do this series of posts, which will include (besides R and Python) SAS, SAS Macro, Julia, etc.

What is SAS?

I will try to be as unbiased as possible here. According to its Wikipedia page, SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS has a multitude of procedures specialized in statistical analysis, data manipulation and visualisation. SAS programming consists of writing code composed of SAS Data Steps and SAS Procedures, and executing it. The concepts of parameters and variables are provided by SAS macros.

Analyzing Twitter data with R (part 3: Cleaning & organizing the Data)

In the previous parts we explained how to set up access to Twitter's API and how to import tweets with a simple R command. In this third part we will organize and clean the imported data into a good, reusable format. So let's pick up from where we left off: after finishing the authentication process, we now want to import certain tweets. The command searchTwitter from the package twitteR will do the job for us. We need to create a new object that will contain the data we are about to import and assign it the result of our searchTwitter function. In our example, we will query Twitter for tweets containing the hashtag #Tunisia and import 1000 English tweets. We will then preview the imported data with the head() function:
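
The query described here might be sketched as follows. This is only an illustration, not necessarily the post's own code: it assumes the twitteR package is installed and that authentication has already succeeded in the current session, and the wrapper name fetch_tunisia_tweets is hypothetical.

```r
# Hypothetical helper (not from the original post): wraps the search so that
# nothing contacts Twitter until the function is actually called.
fetch_tunisia_tweets <- function(n = 1000) {
  # searchTwitter() returns a list of status objects for the search string;
  # lang = "en" restricts the results to English tweets.
  tweets <- twitteR::searchTwitter("#Tunisia", n = n, lang = "en")
  # twListToDF() flattens that list into a data frame, one row per tweet.
  twitteR::twListToDF(tweets)
}

# Usage, once authenticated:
# tweets_df <- fetch_tunisia_tweets()
# head(tweets_df)
```

Converting the result to a data frame right away is a design choice for this sketch: it makes the later cleaning and organizing steps ordinary data-frame operations.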

String manipulation functions in R ( part 2)

First part of this series of posts.

This is the second part of a series of posts on string and text manipulation in R. Each time, a few functions are picked and shared here with basic examples. I hope that by the end of this series I'll have a good archive of lessons from beginner level to expert level. The points we will see in this part 2 are:
- character encoding
- length of a string
- lowercase and uppercase conversion
- basic string comparison
- concatenating strings
- extracting sub-strings

character encoding
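
As a quick orientation, the points listed for this part can each be tried with a base R function. The snippet below is a minimal sketch; the example strings and variable names are assumptions chosen for illustration and do not come from the original post.

```r
# Example strings (assumed for illustration)
s <- "Data Science"
accented <- "caf\u00e9"   # written with an escape so the file itself stays ASCII

# Character encoding: inspect the declared encoding, or convert to UTF-8
declared <- Encoding(accented)
as_utf8  <- enc2utf8(accented)

# Length of a string, in characters
n_chars <- nchar(s)

# Lowercase and uppercase conversion
lower <- tolower(s)
upper <- toupper(s)

# Basic string comparison is case-sensitive
same_raw   <- s == "data science"
same_lower <- tolower(s) == "data science"

# Concatenating strings: paste() inserts a separator, paste0() does not
joined       <- paste("Data", "Science")
joined_tight <- paste0("Data", "Science")

# Extracting a sub-string by start and stop position
first_word <- substr(s, 1, 4)
```

Here nchar(s) counts the space as a character, and the two comparisons show why lowering the case first is a common way to compare strings loosely.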

Mathematical Symbols in LaTeX

I have been going through my hard drive and organizing some of the files I have (PDFs, eBooks and articles), and I found a very interesting document about LaTeX math symbols. Please deactivate ad-blockers to be able to see the PDF document!

UPDATE: I published this article a few years ago and it has attracted the most attention of all my blog posts, so I'd like to thank all of the visitors who shared this post. Please take some time to check out other articles I have shared about Data Science, Machine Learning and Big Data: Data Mining basic and advanced concepts - Part 2: Describing Data

Here are some of the most interesting parts of it:

Analyzing Twitter data with R (part 2: Importing Tweets )

Twitter is a magnificent source of very interesting data about the world, trends, products, celebrities, current and past events, etc. This is why I have been interested in analyzing and working with Twitter data for a long time. In part one we learned how to set up an application and get some codes and keys to use later on. In this part of the tutorial, we will look at the ways of searching and importing data from Twitter. The authentication process is very easy. It has mainly 2 parts:
- Entering the keys and secrets you got from Twitter
- Submitting them for authentication to get access to Twitter's API.
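
The two authentication steps above could look roughly like this in R. This is a hedged sketch rather than the post's own code: it assumes the twitteR package and its setup_twitter_oauth() helper (the original tutorial may have used a different handshake), and the function name and the four environment variable names are hypothetical.

```r
# Hypothetical helper: reads the credentials from environment variables
# (variable names are assumptions) and submits them to Twitter.
authenticate_twitter <- function() {
  # Step 1: enter the keys and secrets obtained from Twitter
  keys <- Sys.getenv(c(
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET"
  ))
  if (any(keys == "")) {
    stop("Set all four Twitter credentials as environment variables first.")
  }
  # Step 2: submit them for authentication to get access to the API
  twitteR::setup_twitter_oauth(
    consumer_key    = keys[[1]],
    consumer_secret = keys[[2]],
    access_token    = keys[[3]],
    access_secret   = keys[[4]]
  )
}

# Usage: authenticate_twitter()
```

Reading the secrets from the environment instead of typing them into the script keeps them out of any code that gets shared or published.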

String manipulation functions in R ( part 1)

I wanted to do this quick tutorial because of an observation I made while working with some new R users who struggle with operations that do not involve numbers and statistical calculations. Handling text, images and files of all formats is made possible within R via its numerous packages. This time I started with basic strings, and I will probably add to this series later. We will learn 4 new functions:
- grep and grepl: These 2 functions search for matches of a given string within each element of a character vector. The only difference between them is the output. The first one gives the positions of the elements that match the search; the second one gives a logical result of TRUE or FALSE for every string of the vector.
- gsub: This function substitutes a given string with another in all the strings given as input to the function.
- str_replace: This function replaces first occurrence of a matched pattern in a string...
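
The behaviour described above can be checked on a small character vector. The snippet below is a minimal sketch with an assumed example vector; to stay runnable without extra packages it uses base R's sub() as a stand-in for str_replace (which comes from the stringr package), since both replace only the first match in each string.

```r
# Example vector (assumed for illustration)
fruits <- c("apple", "banana", "pineapple", "cherry")

# grep returns the positions of the elements that match
match_positions <- grep("apple", fruits)

# grepl returns TRUE or FALSE for every element
match_flags <- grepl("apple", fruits)

# gsub replaces every occurrence in every string
all_replaced <- gsub("a", "A", fruits)

# sub replaces only the first occurrence in each string; with stringr
# installed, stringr::str_replace(fruits, "a", "A") is the equivalent call
first_replaced <- sub("a", "A", fruits)
```

Comparing all_replaced with first_replaced on "banana" makes the difference between replacing every match and replacing only the first one easy to see.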

Analyzing Twitter data with R (part 1: connecting to Twitter API )

All "smart" businesses are looking to understand social media trends, to analyze the massive amount of public data available online and to make insightful decisions based on these analyses.

Source: http://www.slashgear.com/twitter-data-grants-introduced-to-offer-select-institutes-data-trove-05315867/

For any statistician or future data scientist freshly graduating from university, it is very important to have certain practical skills in addition to statistical modeling and mathematical knowledge. In this series of posts, I will detail the steps you will need to access Twitter, import data, clean it, analyze it and reach a conclusion based on the data you have extracted. It will be a simple step-by-step tutorial, if you'd like to call it that. This series of posts is intended for students currently taking data science classes, and for anyone interested in the R language and social media in particular. I've been asked to make ...

R & SQL: Simple Data Science with R and SQL

Hello World! The topic of today's article is databases. Data scientists and statisticians work with data every day, but in real-world applications they won't actually use the 50-line text-file datasets provided by teachers in statistical analysis courses. Statisticians work with massive amounts of data. Whether this data is stored in flat files or in databases, the size of the analyzed data will definitely be more than a couple of hundred records. Hence the need for a way to extract data from large tables stored in databases in a simple, intuitive way.

What is SQL?

SQL means Structured Query Language. For me, it has always been the "Simple Query Language". I've always used the term "Simple" to describe how simple it is to learn and use basic SQL functions.

Quick distribution plotting with R

I was asked by new students at the Statistics and Data Analysis School of Tunis (and by other friends) about ways to plot densities and the best software to do that. I will try to give some examples with R on how to plot a density. If you don't know what R is, you can take a look at my old article that explains it very well, with some good R learning resources (here, in French). You know that densities are all about random data, and the first thing that comes to mind is histograms. To start this task, we will first set up the data sample that we will work with, which should be random, and then we will start plotting some nice graphics with the ggplot2 library.

Basic things: histogram and density plots:
- regular histogram
- where is the mean!
- density plot
- overlay of density and histogram
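
The four plots listed above can be sketched with ggplot2 as follows. This is an illustration under stated assumptions, not the post's own code: the sample is 1000 draws from a normal distribution with mean 10 and standard deviation 2, and all object names are hypothetical. It also assumes a ggplot2 version that provides after_stat() (3.3.0 or later).

```r
library(ggplot2)

# Random sample to plot (distribution and size are assumptions)
set.seed(42)
sample_df <- data.frame(value = rnorm(1000, mean = 10, sd = 2))

# Regular histogram
hist_plot <- ggplot(sample_df, aes(x = value)) +
  geom_histogram(bins = 30, fill = "grey70", colour = "white")

# Where is the mean: add a dashed vertical line at the sample mean
hist_mean_plot <- hist_plot +
  geom_vline(xintercept = mean(sample_df$value),
             linetype = "dashed", colour = "red")

# Density plot
density_plot <- ggplot(sample_df, aes(x = value)) +
  geom_density(fill = "steelblue", alpha = 0.3)

# Overlay of density and histogram: the histogram is rescaled to density
# units so that both layers share the same y axis
overlay_plot <- ggplot(sample_df, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey70", colour = "white") +
  geom_density(colour = "steelblue")
```

Each object can be displayed with print(), for example print(overlay_plot). Without the rescaling to density units, the histogram counts would dwarf the density curve in the overlay.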