Learn R in R- Swirl Kazim Topuz, PhD.


1 Learn R in R- Swirl Kazim Topuz, PhD

2 Why R?
R can handle very large datasets
R can automate and calculate much faster than Excel
R source code is reproducible
Community libraries of R source code are available to all
R provides more complex and advanced data visualization
R is free

R can handle very large datasets
Excel is limited in that there are only so many rows and columns per spreadsheet. When you run out of rows or columns, you're forced to move to a new tab or a new file. While needing that many rows or columns is arguably unlikely in most circumstances, there are cases where datasets grow over time and eventually the Excel spreadsheet will not be able to contain all of that data. Bottom line: The Excel spreadsheet is finite and this limits the datasets you can use.

R can automate and calculate much faster than Excel
Point 1 brings us to Point 2: I can't tell you the number of times I've had a gigantic file crash because it contains up to 20 tabs chock-full of data, including a Pivot Table, a tab that contains over 6 years' worth of pricing for 3,000+ products, and countless formulas throughout. The file crashes because Excel can hold a certain amount of data but barely functions when you use it to capacity. This becomes a serious problem when you start losing data because the file cannot be saved once you add any more data to it. Bottom line: R not only handles huge datasets but still runs efficiently while doing so.

R source code is reproducible
Research any number of R advocate blogs and you'll find this point is a big one. R source code can be used repeatedly and with very different datasets in ways that Excel formulas and VBA code cannot. There is statistical source code available that can be applied to any dataset with only a few changes to code and reference data, and that can then be reapplied several times over very easily.
While VBA can run virtually anything R can, it can be much more time consuming, and it is limited in the same ways as Excel. R also has an advantage in that it shows the data and the analysis separately, while Excel shows them together (data within formulas). This allows the user to view the data more clearly to correct any errors or follow the progression of the data. Bottom line: R source code is much easier to reproduce and reuse than Excel formulas or VBA.
Image Credit: trendct.org

Community libraries of R source code are available to all
R has been growing in usage and popularity over the past several years, and with that, the number of users adding new functions to the available packages and libraries has also increased. This gives any R user access not only to basic statistical functions, but also to an increasing number of complex new functions that may be applicable to their data. This creates a community of R users who easily extend their knowledge to other R users who may need a similar solution for their data. Bottom line: R promotes sharing of functions to expand libraries with new and different reproducible statistical functions.

R provides more complex and advanced data visualization
Excel can produce several types of basic graphs once you chop up and select the exact data you want to analyze. R is designed to produce graphs much more easily, without all the pre-graph work, and to provide more types of graphs than you'd ever know what to do with. Take a look here ( ) to see the types of graphs R can create. Of course, Excel is perfectly sufficient when it comes to showing simple, straightforward data analysis, but R can take very complicated data and turn it into a much easier to understand visual representation. Bottom line: R can provide advanced data visualization for more complex datasets.
Image Credit: leondangio.pbworks.com

R is free, Excel is not
I don't think I have to dive too deeply into this point; R can be downloaded by anyone anywhere on any platform (even more platforms than Excel). Bottom line: Everyone loves stuff that's free.

Now, please don't think that just because I wrote this blog I'm going to delete my Excel and never look back. There are plenty of reasons to use Excel over R, just as there are reasons to use any program over any other program. Excel is still a powerful tool for smaller datasets, basic data entry, simpler functions and formulas, and viewing raw data. It would likely benefit all data analysts to have a broad understanding of multiple types of programs that can be used to organize and analyze their own specific type of data. Each dataset is unique and analysis depends entirely on the user and what they're looking to uncover. Don't be afraid to branch out of your Excel comfort zone; there are plenty of interesting programs to explore to improve your data analysis.
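
The reproducibility point can be illustrated with a small sketch: one summary function, written once, is applied unchanged to two different datasets. The helper name `summarize_column` is hypothetical and chosen only for illustration; `mtcars` and `iris` are example datasets that ship with R.

```r
# Hypothetical helper: the same analysis code, reusable on any data frame.
summarize_column <- function(data, column) {
  values <- data[[column]]
  c(n = length(values), mean = mean(values), sd = sd(values))
}

# Apply the identical code to two unrelated built-in datasets.
mpg_summary   <- summarize_column(mtcars, "mpg")
sepal_summary <- summarize_column(iris, "Sepal.Length")

print(round(mpg_summary, 2))
print(round(sepal_summary, 2))
```

Only the data frame and the column name change between the two calls; the analysis code itself stays the same.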

3 Languages for analytics
A new KDnuggets poll shows the growing dominance of four main languages for Analytics, Data Mining, and Data Science: R, SAS, Python, and SQL, used by 91% of data scientists, and a decline in the popularity of other languages, except for Julia and Scala. Salary mostly depends on experience, education, location, industry, and unfortunately, factors such as gender. Also, most data scientists have all three skills and more (R + Python + SQL), so it is hard to assess which one is the most valuable.

4 R Challenges
R cannot do everything; it can drive you crazy
Steep learning curve
Limited support for 3D graphics
Memory intensive; gets slow when you have big data
Packages are contributed, so quality and documentation are not always guaranteed
Customer/technical support is not readily available

5 Me: Professional Teaching Undergraduate Data Analysis (70 students)
Analytics Programming- R (45 students) Analytics Programming- Python (45 students)

6 What do you use for data analysis?
How well do you know R?
Not at all 60%
Package user 36%
Write code/functions 4%
R guru 0%

What do you use for data analysis?
Excel 60% SAS 16% Nothing 8% Python R 4% SPSS 0% Other

7 Student groups MIT 28% MIS/MIT 12% Accounting Economics/MIT 8%
Professional MBA MBA/Finance 4% MIS/BBA MBA/MIT Sport Business Analytics Journalism

8 Analytic programming - R
No programming knowledge required
Basic computer skills needed
Class will be very cumulative
Class will be very interactive
Learn, try, interact

Since this is the first course in R, no programming knowledge is required. We will start with the basic building blocks. The class is very cumulative, so I suggest you follow the order. You can fast-forward these videos, but it will not be as much fun if you speed up too much.

9 Topics!
Exploratory data analysis
  Summarize the data
  Visualize the data
  "ggplot2" package
Getting and cleaning data
  Getting data in and out of R
  Cleaning and tidying data
  Data transformation
  Base package
  "dplyr" package
Functions
  Basic statistical functions
  Writing scripts
  Apply family
Data structures
  Vectors
  Matrices
  Data frames
  Lists
  Factors
Introduction: nuts and bolts
  Why R, Installation
  R: getting started
  Basics
  Navigation

In this introduction to R course you will learn about the basics of R, as well as the most common data structures to store data. By the end of this course, you will know how to write your own functions, create data structures, manipulate them and perform calculations on them to get surprising insights. You will learn 2 commonly used packages, dplyr for data transformation and ggplot2 for data visualization. Finally, in exploratory data analysis, you'll combine visualisation and transformation with your curiosity and scepticism to ask and answer interesting questions about data.
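
As a rough preview of the data structures and the apply family listed above, the following base R sketch builds one object of each kind. All variable names and values are made up for illustration; no extra packages are assumed.

```r
# Vector: ordered values of a single type.
heights <- c(170, 165, 182, 158)

# Matrix: a vector with two dimensions (2 rows, 2 columns, filled by column).
height_matrix <- matrix(heights, nrow = 2)

# Factor: categorical data with a fixed set of levels.
group <- factor(c("a", "b", "a", "b"))

# Data frame: columns of equal length, possibly of different types.
patients <- data.frame(height = heights, group = group)

# List: a container whose elements may differ in type and length.
record <- list(id = 1L, name = "example", values = heights)

# Apply family: sapply() over columns, tapply() over groups.
column_classes <- sapply(patients, class)
mean_by_group  <- tapply(patients$height, patients$group, mean)

print(column_classes)
print(mean_by_group)
```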

10 Course: Pedagogical Approach
Read/view resources prior to class time
  Textbooks/other sources
  PowerPoints
Work through problems during class time
  Pop-up quizzes
  Interactive teaching tool
Assess assimilation of skills through
  Take-home exercises
  Tests
  Project

11 swirl teaches you R programming and data science interactively, at your own pace, and right in the R console!

12 How to Install?
Step 1: Get R
In order to run swirl, you must have R or later installed on your computer.
Step 2 (recommended): Get RStudio
Step 3: Install swirl
> install.packages("swirl")
Step 4: Start swirl
> library("swirl")
> swirl()
Step 5: Install an interactive course
> ?InstallCourses
Step 6: Let's have some fun!
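
Steps 3 and 4 can be sketched as a small script that installs swirl only when it is missing. The helper `missing_packages` is hypothetical, introduced here for illustration; the install and start commands run only in an interactive session and need an internet connection.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install swirl only when it is missing, then load and start it.
if (interactive()) {
  to_install <- missing_packages("swirl")
  if (length(to_install) > 0) install.packages(to_install)

  library(swirl)  # load the package
  swirl()         # start the interactive session
}
```

Checking first avoids reinstalling the package every time the script is run.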

13 Install interactive courses
> library(swirl)
> swirl()
| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did then. If you are new, call yourself something unique.
What shall I call you? Kazim

I have put the course files on GitHub, so let's install our first courses. You can find many more courses on the original swirl website; as I said, our version is specifically designed for you. We will use medical data for the most part. If you have any trouble installing, let the instructor know. Now I will open the interactive tool and show you how to proceed.
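
Installing a course hosted on GitHub might look like the sketch below. It assumes swirl's `install_course_github()` and `install_from_swirl()` functions; the wrapper name `install_class_course` and its two arguments are placeholders, since the transcript does not name the instructor's repository.

```r
# Sketch only: wrapped in a function so that nothing runs on load.
# `github_user` and `course_name` are placeholders, not the instructor's repo.
install_class_course <- function(github_user, course_name) {
  if (!requireNamespace("swirl", quietly = TRUE)) {
    stop("Install swirl first with install.packages(\"swirl\").")
  }
  swirl::install_course_github(github_user, course_name)
}

# A course from the public swirl course repository could be installed with:
# swirl::install_from_swirl("R Programming")
```

After installation, the course should appear in the menu the next time `swirl()` is started.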

14
| Thanks, Kazim. Let's cover a couple of quick housekeeping items before we begin our first lesson. First of all, you
| should know that when you see '...', that means you should press Enter when you are done reading and ready to continue.
|======= | 8%
| Let's begin with basics. To get familiar with R coding environment, start with some basic calculations. R
| console can be used as an interactive calculator too. Type the following in your console and press
| Enter.
>
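
The transcript does not show which expression the lesson asks for, so the following is only an assumed example of using the R console as a calculator; the numbers and variable names are illustrative.

```r
# Arithmetic typed at the prompt is evaluated immediately.
5 + 7

# Store a result in a variable with the assignment operator.
x <- 5 + 7

# Reuse the stored value in further calculations.
y <- x - 3
z <- sqrt(x + 4)

print(c(x = x, y = y, z = z))
```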

15 Course Repository
Beginner
  R Programming: The basics of programming in R
  Data Analysis: Basic ideas in statistics and data visualization
  Mathematical Biostatistics Boot Camp: One- and two-sample t-tests,
  Open Intro: A very basic introduction to statistics, data analysis, and data visualization
Intermediate
  Regression Models: The basics of regression modeling in R
  Getting and Cleaning Data: dplyr, tidyr, lubridate,
Advanced
  Statistical Inference: More..

16 Course Repository
R Programming: The basics of programming in R
| Please choose a lesson, or type 0 to return to course menu.
 1: Basic Building Blocks
 2: Workspace and Files
 3: Sequences of Numbers
 4: Vectors
 5: Missing Values
 6: Subsetting Vectors
 7: Matrices and Data Frames
 8: Logic
 9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics
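
To give a flavor of a few of the lessons in this menu (sequences, missing values, subsetting, vapply), here is a short base R sketch. The values are invented for illustration and are not taken from the swirl lessons themselves.

```r
# Sequences of numbers
one_to_ten <- 1:10
by_halves  <- seq(0, 2, by = 0.5)

# Missing values: NA propagates unless it is removed explicitly.
readings  <- c(4, NA, 6, 8)
mean_all  <- mean(readings)                # NA, because one value is missing
mean_seen <- mean(readings, na.rm = TRUE)  # mean of the observed values

# Subsetting vectors with logical conditions
observed <- readings[!is.na(readings)]
large    <- observed[observed > 5]

# vapply: like sapply, but the return type is declared up front.
squares <- vapply(1:3, function(i) i^2, numeric(1))

print(by_halves)
print(large)
print(squares)
```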

17 Create Your Course
Install swirl and swirlify
Open RStudio (or just plain R if you don't have RStudio) and copy and paste the following command into the console to install everything you need:
> install.packages(c("swirl", "swirlify"))
Start swirlify
Type library(swirlify) at the R prompt to load the package. You'll have to repeat this step every time you restart R or RStudio.
Create a new lesson or edit an existing one
> new_lesson("My Lesson", "My Course")

18 Let's create a new class
> library(swirlify)
> swirlify()
> new_lesson("My First Lesson", "My New Course")
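
The lesson-creation step could be scripted roughly as follows. This is a sketch that assumes swirlify's `new_lesson()`, `wq_message()`, and `add_to_manifest()` functions; the wrapper name `start_first_lesson` is hypothetical, and the function is only defined here, not called, because calling it creates files on disk.

```r
# Sketch only: wrapped in a function so that nothing is created on load.
# The default lesson and course names are the ones used on the slide.
start_first_lesson <- function(lesson = "My First Lesson",
                               course = "My New Course") {
  if (!requireNamespace("swirlify", quietly = TRUE)) {
    stop("Install swirlify first with install.packages(\"swirlify\").")
  }
  # Create the lesson skeleton and make it the current lesson.
  swirlify::new_lesson(lesson, course)
  # Append a first message question to the lesson file.
  swirlify::wq_message(output = "Welcome to my first swirl course!")
  # Register the lesson in the course MANIFEST file.
  swirlify::add_to_manifest()
}
```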

19 Message Questions
Message questions display a string of text in the R console for the student to read. Once the student presses Enter, swirl will move on to the next question. Add a message question using wq_message(). Here's an example message question:
- Class: text
  Output: Welcome to my first swirl course!
The student sees:
| Welcome to my first swirl course!

20 Command Questions
Command questions prompt the student to type an expression into the R console. The CorrectAnswer is entered into the console if the student uses the skip() function. The Hint is displayed to the student if they don't get the question right. The AnswerTests determine whether or not the student answered the question correctly. See the answer testing section for more information. Add a command question using wq_command(). Here's an example command question:
- Class: cmd_question
  Output: Add 2 and 2 together using the addition operator.
  CorrectAnswer: 2 + 2
  AnswerTests: omnitest(correctExpr='2 + 2')
  Hint: Just type
The student sees:
| Add 2 and 2 together using the addition operator.
>
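
To make the idea behind an expression-based answer test more concrete, the sketch below compares what a student typed with the expected expression, and separately compares values. Both helpers (`same_expression`, `same_value`) are hypothetical and are not swirl's implementation of omnitest; they only illustrate the difference between matching an expression and matching a value.

```r
# Hypothetical helper, not swirl's implementation: compare two expressions
# after parsing, so that spacing differences do not matter.
same_expression <- function(typed, correct) {
  normalize <- function(code) deparse(parse(text = code)[[1]])
  identical(normalize(typed), normalize(correct))
}

# Hypothetical helper: compare the value of the typed code with a target.
same_value <- function(typed, expected) {
  isTRUE(all.equal(eval(parse(text = typed)), expected))
}

same_expression("2+2", "2 + 2")  # TRUE: same expression, different spacing
same_expression("4", "2 + 2")    # FALSE: same value, different expression
same_value("2 + 2", 4)           # TRUE: the typed code evaluates to 4
```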

21 Multiple Choice Questions
Multiple choice questions present a selection of options to the student. These options are presented in a different order every time the question is seen. The AnswerChoices should be a semicolon-separated string of choices that the student will have to choose from. Add a multiple choice question using wq_multiple(). Here's an example multiple choice question:
- Class: mult_question
  Output: What is the capital of Canada?
  AnswerChoices: Toronto;Montreal;Ottawa;Vancouver
  CorrectAnswer: Ottawa
  AnswerTests: omnitest(correctVal='Ottawa')
  Hint: This city contains the Rideau Canal
The student sees:
| What is the capital of Canada?
1: Toronto
2: Montreal
3: Ottawa
4: Vancouver
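
The semicolon-separated AnswerChoices string and the shuffled presentation can be mimicked in a few lines of base R. This is an illustrative sketch of the idea, not swirl's own code; the variable names are made up.

```r
# AnswerChoices as stored in the lesson file: one semicolon-separated string.
answer_choices <- "Toronto;Montreal;Ottawa;Vancouver"
correct_answer <- "Ottawa"

# Split the string into individual options.
choices <- strsplit(answer_choices, ";", fixed = TRUE)[[1]]

# Shuffle so the options can appear in a different order on each run.
shuffled <- sample(choices)

# Print a numbered menu similar to the one the student sees.
cat(sprintf("%d: %s", seq_along(shuffled), shuffled), sep = "\n")

# The correct option is found by value, so its position does not matter.
correct_position <- which(shuffled == correct_answer)
```

Because the answer is checked by value rather than by menu number, shuffling the options does not affect grading.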

22 Other Questions! Figure Questions Numerical Questions Script Questions

23 Kazim Topuz ktopuz@ou.edu
Thank You! Kazim Topuz
