Stats and R https://statsandr.com/ Recent content on Stats and R Hugo -- gohugo.io en Mon, 16 Dec 2019 00:00:00 +0000 You can do more for neural networks in R with {kindling} https://statsandr.com/blog/you-can-do-more-for-neural-networks-in-r-with-kindling/ Thu, 19 Feb 2026 00:00:00 +0000 https://statsandr.com/blog/you-can-do-more-for-neural-networks-in-r-with-kindling/ This post has been written in collaboration with Joshua Marie. Why this post matters Neural networks in R are no longer niche. Today, we can choose among: {nnet} for classic, small-scale neural nets, {neuralnet} another classic neural nets package besides {nnet}, {keras} / {keras3} for the Keras API (typically with Python backends such as TensorFlow/JAX/Torch), {torch} for native-R deep learning with explicit model and training control. So why discuss another package? Paper: 'Effectiveness of pneumococcal conjugate vaccines against invasive pneumococcal disease in Vietnamese children prior to national introduction: A matched case-control study' https://statsandr.com/blog/paper-effectiveness-of-pneumococcal-conjugate-vaccines-against-invasive-pneumococcal-disease-in-vietnamese-children-prior-to-national-introduction-a-matched-case-control-study/ Tue, 17 Feb 2026 00:00:00 +0000 https://statsandr.com/blog/paper-effectiveness-of-pneumococcal-conjugate-vaccines-against-invasive-pneumococcal-disease-in-vietnamese-children-prior-to-national-introduction-a-matched-case-control-study/ I am happy to share that an article I contributed to has just been published in Vaccine (Truong et al. 2026): Effectiveness of pneumococcal conjugate vaccines against invasive pneumococcal disease in Vietnamese children prior to national introduction: A matched case-control study 🔗 https://doi.org/10.1016/j.vaccine.2026.128349 Background Streptococcus pneumoniae remains a major cause of severe illness and death in young children worldwide. 
Pneumococcal conjugate vaccines (PCVs) have dramatically reduced invasive pneumococcal disease (IPD) in countries where they are part of national immunization programs. nycOpenData: A unified R interface to NYC Open Data APIs https://statsandr.com/blog/nycopendata-a-unified-r-interface-to-nyc-open-data-apis/ Tue, 27 Jan 2026 00:00:00 +0000 https://statsandr.com/blog/nycopendata-a-unified-r-interface-to-nyc-open-data-apis/ Guest post by Christian Martinez, developer of the nycOpenData package in R. nycOpenData: A unified R interface to NYC Open Data APIs I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from the New York City Open Data platform. The package is designed as part of an open-science and reproducible-research effort, with the goal of lowering the friction between public data and statistical analysis—especially for teaching, exploratory research, and applied civic work. AssociationExplorer: A user-friendly shiny application for exploring associations and visual patterns https://statsandr.com/blog/associationexplorer-a-user-friendly-shiny-application-for-exploring-associations-and-visual-patterns/ Tue, 16 Dec 2025 00:00:00 +0000 https://statsandr.com/blog/associationexplorer-a-user-friendly-shiny-application-for-exploring-associations-and-visual-patterns/ I am pleased to announce the publication of our paper “AssociationExplorer: A user-friendly Shiny application for exploring associations and visual patterns” in the journal SoftwareX, together with the official release of the AssociationExplorer2 R package on CRAN. Both the paper and the software are part of an open-science effort aimed at making exploratory data analysis more accessible to non-technical users. Why AssociationExplorer? Exploring multivariate datasets is now central in social sciences, data journalism, and education. 
Code of Conduct https://statsandr.com/code-of-conduct/ Fri, 05 Dec 2025 00:00:00 +0000 https://statsandr.com/code-of-conduct/ 1. Purpose This Code of Conduct exists to help ensure that this site and its associated community — including interactions on GitHub — remain welcoming, respectful, and constructive for everyone. It applies to comments on the website, GitHub discussions and issues, pull requests, contributor submissions, and any communication related to this project. 2. Our Values We aim to foster a community built on: Respect and kindness — Treat others with empathy, patience, and courtesy. Nonparametric serial interval estimation https://statsandr.com/blog/nonparametric-serial-interval-estimation/ Mon, 18 Aug 2025 00:00:00 +0000 https://statsandr.com/blog/nonparametric-serial-interval-estimation/ Motivation Epidemiological delays inform about the time between two well-defined events related to a disease. The serial interval (SI) of an infectious disease is defined as the time between symptom onset in a primary case (infector) and symptom onset in a secondary case (infectee). It is a widely used epidemiological delay quantity and plays a central role in mathematical/statistical models of disease transmission. There exists a tight link between the reproduction number (average number of secondary infections generated by an infected individual) and the serial interval. Paper: 'Semi-Markov modeling for disease incidence risk and duration' https://statsandr.com/blog/paper-semi-markov-modeling-for-disease-incidence-risk-and-duration/ Mon, 16 Jun 2025 00:00:00 +0000 https://statsandr.com/blog/paper-semi-markov-modeling-for-disease-incidence-risk-and-duration/ I’m happy to share that my latest research paper, “Semi-Markov modeling for disease incidence risk and duration” has been accepted for publication in the journal Biostatistics & Epidemiology (Soetewey et al., 2025). Read the full paper here. 
This work focuses on the use of a Semi-Markov illness-death model to estimate both: The risk of cancer incidence over a future time period The number of years of life lost (YLL) due to cancer, with a focus on loss before age 70 The analysis relies on real-world data from the Belgian Cancer Registry, covering over 160,000 cases of melanoma, thyroid, and female breast cancer diagnosed between 2004 and 2020. Paper: 'Right to be forgotten for mortgage insurance issued to cancer survivors: critical assessment and new proposal' https://statsandr.com/blog/paper-right-to-be-forgotten-for-mortgage-insurance-issued-to-cancer-survivors-critical-assessment-and-new-proposal/ Tue, 05 Nov 2024 00:00:00 +0000 https://statsandr.com/blog/paper-right-to-be-forgotten-for-mortgage-insurance-issued-to-cancer-survivors-critical-assessment-and-new-proposal/ I am happy to announce that our paper entitled “Right to be forgotten for mortgage insurance issued to cancer survivors: critical assessment and new proposal” has been accepted for publication in European Actuarial Journal (Soetewey et al., 2025). In this paper, we propose an alternative method to determine the waiting period opening the right to be forgotten in insurance. This new method is based on a constraint imposed to the premium, which is then transposed into a target on the conditional observed survival. EpiLPS for estimation of incubation times https://statsandr.com/blog/epilps-for-estimation-of-incubation-times/ Thu, 01 Aug 2024 00:00:00 +0000 https://statsandr.com/blog/epilps-for-estimation-of-incubation-times/ Motivation Coarse data Simulated example Real data example References Motivation A group of researchers from the Data Science Institute (DSI) at Hasselt University developed a new statistical model to estimate the incubation period of a pathogenic organism based on coarse data. 
The incubation period of an infectious disease (defined as the time elapsed between infection and the manifestation of first symptoms) is of great importance as it sheds light on the epidemic potential of a disease and helps optimize the length of quarantine periods to freeze transmission. Binary logistic regression in R https://statsandr.com/blog/binary-logistic-regression-in-r/ Tue, 30 Jan 2024 00:00:00 +0000 https://statsandr.com/blog/binary-logistic-regression-in-r/ Introduction Linear versus logistic regression Univariable versus multivariable logistic regression Data Binary logistic regression in R Univariable binary logistic regression Quantitative independent variable Qualitative independent variable Multivariable binary logistic regression Interaction Model selection Quality of a model Validity of the predictions Accuracy Sensitivity and specificity AUC and ROC curve Reporting results {gtsummary} package {finalfit} package Conditions of application Conclusion Introduction Regression is a common tool in statistics to test and quantify relationships between variables. What is the probability that two persons have the same initials? https://statsandr.com/blog/what-is-the-probability-that-two-persons-have-the-same-initials/ Wed, 06 Dec 2023 00:00:00 +0000 https://statsandr.com/blog/what-is-the-probability-that-two-persons-have-the-same-initials/ Introduction How likely is it? For our team For teams of different sizes Verification For our team For teams of different sizes Conclusion Introduction Last week, I joined a team to work on a collaborative project. The team had already been established for a few months, with several scientists working together on the project. For simplicity, they used to sign documents, mention colleagues in emails, etc. 
Introduction to data manipulation in R with {dplyr} https://statsandr.com/blog/introduction-to-data-manipulation-in-r-with-dplyr/ Mon, 27 Nov 2023 00:00:00 +0000 https://statsandr.com/blog/introduction-to-data-manipulation-in-r-with-dplyr/ Introduction Data {dplyr} package Filter observations The pipe operator Extract observations Based on their positions Based on their values Sample observations Sort observations Select variables Rename variables Create or modify variables Summarize observations Identify distinct values Connected operations Group by Number of observations Number of distinct values First, last or nth value If else Case when Conclusion and other resources References Introduction In a previous post, we showed how to manipulate data in R. Paper: 'Impact of a Food Rebalancing Program Associated with Plant-Derived Food Supplements on the Biometric, Behavioral, and Biological Parameters of Obese Subjects' https://statsandr.com/blog/paper-impact-food-rebalancing-program-on-biometric-behavioral-biological-parameters-of-obese-subjects/ Tue, 14 Nov 2023 00:00:00 +0000 https://statsandr.com/blog/paper-impact-food-rebalancing-program-on-biometric-behavioral-biological-parameters-of-obese-subjects/ I am happy to announce that our paper has been accepted for publication in Nutrients (ISSN 2072-6643) (Houben et al. 2023). This study investigates the impact of a food rebalancing program associated with plant-derived food supplements on the biometric, behavioral, and biological parameters of obese subjects. Read more here. Thanks to all co-authors for the great work, and the Nutrients Editorial Office for their guidance throughout this process. We are also thankful to the two anonymous reviewers for their input that has greatly helped shape the paper. 
Pearson, Spearman and Kendall correlation coefficients by hand https://statsandr.com/blog/pearson-spearman-kendall-correlation-by-hand/ Tue, 05 Sep 2023 00:00:00 +0000 https://statsandr.com/blog/pearson-spearman-kendall-correlation-by-hand/ Introduction Data With ties Without ties Correlation coefficients by hand Pearson With and without ties Spearman With ties Without ties Kendall Without ties With ties Verification in R Conclusion Introduction In statistics, a correlation is used to evaluate the relationship between two variables. In a previous post, we showed how to compute a correlation and perform a correlation test in R. How to: one-way ANOVA by hand https://statsandr.com/blog/how-to-one-way-anova-by-hand/ Wed, 30 Aug 2023 00:00:00 +0000 https://statsandr.com/blog/how-to-one-way-anova-by-hand/ Introduction Data and hypotheses ANOVA by hand Overall and group means SSR and SSE ANOVA table Conclusion of the test Conclusion Introduction An ANOVA is a statistical test used to compare a quantitative variable between groups, to determine if there is a statistically significant difference between several population means. In practice, it is usually used to compare three or more groups. Scrape Yahoo search engine results with R https://statsandr.com/blog/scrape-yahoo-search-engine-results-with-r/ Thu, 24 Aug 2023 00:00:00 +0000 https://statsandr.com/blog/scrape-yahoo-search-engine-results-with-r/ Introduction Scraping Yahoo search engine results with R Conclusion Note: This is a guest post by Manthan Koolwal, founder of Scrapingdog. Introduction Web scraping is the process of extracting data from websites. It is usually done in an automated manner to obtain large amounts of data from various websites, without the need to gather data by hand. In a previous post, we introduced this method and illustrated it with a Wikipedia page. 
Two-way ANOVA in R https://statsandr.com/blog/two-way-anova-in-r/ Mon, 19 Jun 2023 00:00:00 +0000 https://statsandr.com/blog/two-way-anova-in-r/ Introduction Data Aim and hypotheses of a two-way ANOVA Assumptions of a two-way ANOVA Variable type Independence Normality Homogeneity of variances Outliers Two-way ANOVA Preliminary analyses Descriptive statistics Plots Two-way ANOVA in R Pairwise comparisons Visualizations Conclusion Introduction The two-way ANOVA (analysis of variance) is a statistical method that makes it possible to evaluate the simultaneous effect of two categorical variables on a quantitative continuous variable. Paper: 'Childhood Bacterial Meningitis Surveillance in Southern Vietnam: Trends and Vaccination Implications from 2012 to 2021' https://statsandr.com/blog/paper-childhood-bacterial-meningitis-surveillance-southern-vietnam/ Mon, 08 May 2023 00:00:00 +0000 https://statsandr.com/blog/paper-childhood-bacterial-meningitis-surveillance-southern-vietnam/ I am happy to announce that a paper I contributed to has been accepted for publication in Open Forum Infectious Diseases (Truong et al. 2023). This study investigates bacterial meningitis among children aged under five years in Southern Vietnam over the last 10 years. Read more here. I hope this paper will, to some extent, be helpful for your research. As always, if you have any question related to the topic covered in this paper, please add it as a comment so other readers can benefit from the discussion. 10 potential career options with a degree in statistics https://statsandr.com/blog/10-potential-career-options-with-a-degree-in-statistics/ Fri, 24 Mar 2023 00:00:00 +0000 https://statsandr.com/blog/10-potential-career-options-with-a-degree-in-statistics/ Introduction What types of jobs are available? 
Statistician Data scientist, data/business analyst, data engineer or machine learning engineer Actuary or actuarial analyst Financial (risk) analyst, investment analyst, financial trader, financial manager or quantitative analyst Business intelligence analyst Operational researcher or quality control analyst Market or survey researcher Economist or econometrician (Freelance) consultant Teacher PhD student Conclusion This post has been written in collaboration with Daniel Williams. Top 10 errors in R and how to fix them https://statsandr.com/blog/top-10-errors-in-r/ Tue, 07 Feb 2023 00:00:00 +0000 https://statsandr.com/blog/top-10-errors-in-r/ Introduction 1. Unmatched parentheses, curly braces, square brackets or quotes 2. Using a function that is not installed or loaded 3. Typos in function, variable, dataset, object or package names 4. Missing, incorrect or misspelled arguments in functions 5. Wrong, inappropriate or inconsistent data types 6. Forgetting the + sign in ggplot2 7. Misunderstanding between = and == 8. Undefined columns selected 9. Problem when importing or using the wrong data file 10. Web scraping in R https://statsandr.com/blog/web-scraping-in-r/ Mon, 16 Jan 2023 00:00:00 +0000 https://statsandr.com/blog/web-scraping-in-r/ Introduction HTML and CSS Web scraping vs. APIs Why does web scraping exist if APIs are so powerful and do exactly the same work? Web scraping in R rvest HTTP GET request Parsing HTML content CSS selector XPath Getting attributes A real application of web scraping in R HTTP GET request Parsing HTML content and getting attributes Analysis on the database To go further Conclusion Note: This post has been written in collaboration with Pietro Zanotta. What is survival analysis? Examples by hand and in R https://statsandr.com/blog/what-is-survival-analysis/ Thu, 22 Dec 2022 00:00:00 +0000 https://statsandr.com/blog/what-is-survival-analysis/ Introduction What is survival analysis? Why do we need special methods for survival analysis? 
Common functions in survival analysis Survival function Cumulative hazard function Hazard function Estimation By hand In R Hypothesis testing Log-rank test By hand In R To go further References Note that this article is inspired by: the lecture notes of Prof. Van Keilegom and my personal notes as teaching assistant for her course entitled “Analysis of Survival and Duration Data” given at UCLouvain the lecture notes of Prof. Google Analytics in R: Review of 2022 https://statsandr.com/blog/review-of-2022/ Fri, 16 Dec 2022 00:00:00 +0000 https://statsandr.com/blog/review-of-2022/ Introduction Prerequisites Analytics Page views over time Page views per month and year Top performing pages Page views by country Page views per day of week Thank you note Introduction It is almost the end of the year, which means it is time to do a review of Stats and R and look back on the past year. This year’s review will be much shorter compared to previous years because there are already a lot of examples in those previous reviews. Paper: 'EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number' https://statsandr.com/blog/paper-epilps-a-fast-and-flexible-bayesian-tool-for-estimation-of-the-time-varying-reproduction-number/ Wed, 19 Oct 2022 00:00:00 +0000 https://statsandr.com/blog/paper-epilps-a-fast-and-flexible-bayesian-tool-for-estimation-of-the-time-varying-reproduction-number/ Introduction Motivation Getting started A simulated example Smoothing the epidemic curve and estimating \(\mathcal{R}_t\) USA hospitalization data Conclusion References Introduction A colleague (and friend) of mine recently published a research paper entitled “EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number” in PLoS Computational Biology. I am not in the habit of sharing research papers to which I did not contribute. How to keep yourself updated with the latest R news? 
https://statsandr.com/blog/how-to-keep-up-to-date-with-the-latest-r-news/ Thu, 13 Oct 2022 00:00:00 +0000 https://statsandr.com/blog/how-to-keep-up-to-date-with-the-latest-r-news/ Introduction How do I keep track? Twitter Newsletters Conclusion Introduction At the end of one of the training sessions I gave on R, a student asked me the following question: How do you keep yourself updated with the latest R news? It is true that R, being open source (meaning that everyone can contribute), is evolving rapidly. One-sample Wilcoxon test in R https://statsandr.com/blog/one-sample-wilcoxon-test-in-r/ Thu, 07 Jul 2022 00:00:00 +0000 https://statsandr.com/blog/one-sample-wilcoxon-test-in-r/ Introduction When? Data How? Combine statistical test and plot Conclusion References Introduction In a previous article, we showed how to do a two-sample Wilcoxon test in R. Remember that there are actually two versions of this test: The Mann-Whitney-Wilcoxon test (also referred to as Wilcoxon rank sum test or Mann-Whitney U test), used to compare two independent samples. This test is the non-parametric version of the Student’s t-test for independent samples. Koh-Lanta 2022: the ambassadors probability problem https://statsandr.com/blog/koh-lanta-2022-ambassadors-probability-problem/ Mon, 16 May 2022 00:00:00 +0000 https://statsandr.com/blog/koh-lanta-2022-ambassadors-probability-problem/ Introduction Before 2022 In 2022 Probabilities computation in R First draw Second draw Third draw Game limited to 3 draws Game limited to 5 draws Game limited to 100 draws Game limited to the number of necessary draws Final winning probabilities Visual representations Coded into a function Conclusion Introduction There is a popular TV show broadcast in France and the French-speaking part of Belgium called “Koh-Lanta”. 
Paper: 'Semi-Markov modeling for cancer insurance' https://statsandr.com/blog/paper-semi-markov-modeling-for-cancer-insurance/ Wed, 06 Apr 2022 00:00:00 +0000 https://statsandr.com/blog/paper-semi-markov-modeling-for-cancer-insurance/ I am happy to announce that our paper entitled “Semi-Markov modeling for cancer insurance” has been accepted for publication in the European Actuarial Journal (Soetewey et al. 2022). Advancements in medicine and biostatistics have already resulted in better access to insurance for people diagnosed with cancer. This materializes into the “right to be forgotten” adopted in several EU member states, granting access to insurance after a waiting period of at most 10 years starting at the end of the successful therapeutic protocol. Kruskal-Wallis test, or the nonparametric version of the ANOVA https://statsandr.com/blog/kruskal-wallis-test-nonparametric-version-anova/ Thu, 24 Mar 2022 00:00:00 +0000 https://statsandr.com/blog/kruskal-wallis-test-nonparametric-version-anova/ Introduction Data Kruskal-Wallis test Aim and hypotheses Assumptions In R Interpretations Post-hoc tests Dunn test Combination of statistical results and plot Summary References Introduction In a previous article, we showed how to do an ANOVA in R to compare three or more groups. Remember that, as for many statistical tests, the one-way ANOVA requires that some assumptions are satisfied in order to be able to use and interpret the results. Stats and R is 2 years old! https://statsandr.com/blog/statsandr-is-2-years-old/ Thu, 16 Dec 2021 00:00:00 +0000 https://statsandr.com/blog/statsandr-is-2-years-old/ Introduction Analytics Users and page views Page views over time Page views per channel Page views per day of week and month of year Page views per month and year Top performing pages Page views by country User engagement by devices Browser information End note Introduction Stats and R was launched exactly two years ago. 
Like last year, I think it is a good time to do a review of the past 12 months by sharing some figures about the audience of the blog. What statistical test should I do? https://statsandr.com/blog/what-statistical-test-should-i-do/ Thu, 02 Dec 2021 00:00:00 +0000 https://statsandr.com/blog/what-statistical-test-should-i-do/ Introduction Flowchart Notes Conclusion Introduction Being a teaching assistant in statistics for students with diverse backgrounds, I have the chance to see what is globally not well understood by students. I have realized that it is usually not a problem for students to do a specific statistical test when they are told which one to use (as long as they have good resources and they have been attentive during classes, of course). Multiple linear regression made simple https://statsandr.com/blog/multiple-linear-regression-made-simple/ Mon, 04 Oct 2021 00:00:00 +0000 https://statsandr.com/blog/multiple-linear-regression-made-simple/ Introduction Simple linear regression: reminder Principle Equation Interpretations of coefficients \(\widehat\beta\) Another interpretation of the intercept Significance of the relationship Correlation does not imply causation Conditions of application Visualizations Multiple linear regression Principle Equation Interpretations of coefficients \(\widehat\beta\) Conditions of application How to choose a good linear model? \(P\)-value associated to the model Coefficient of determination \(R^2\) Parsimony Visualizations To go further Print model’s parameters Automatic reporting Predictions Linear hypothesis tests Overall effect of categorical variables Interaction Summary References Introduction Remember that descriptive statistics is a branch of statistics that allows to describe your data at hand. Running pace calculator in R Shiny https://statsandr.com/blog/running-pace-calculator/ Mon, 15 Mar 2021 00:00:00 +0000 https://statsandr.com/blog/running-pace-calculator/ Introduction Running pace calculator How to use it? 
Code Conclusion Introduction If you are a runner yourself, you are certainly aware of how important preparation is before a race. For the preparation of my first marathon, I used to rely on a training plan. This running plan was great, but an important piece of information was missing: the running pace. Most of the time, the distance and the time were given, but I needed to figure out the pace myself. Hypothesis test by hand https://statsandr.com/blog/hypothesis-test-by-hand/ Wed, 27 Jan 2021 00:00:00 +0000 https://statsandr.com/blog/hypothesis-test-by-hand/ Descriptive versus inferential statistics Motivations and limitations Hypothesis test Why? When? How? Method A: Comparing the test statistic with the critical value Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Finding the critical value Step #4: Concluding and interpreting the results Why don’t we accept \(H_0\)? Method B: Comparing the p-value with the significance level \(\alpha\) Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Computing the p-value Step #4: Concluding and interpreting the results Method C: Comparing the target parameter with the confidence interval Step #1: Stating the null and alternative hypothesis Step #2: Computing the confidence interval Step #3: Concluding and interpreting the results Which method to choose? How to track the performance of your blog in R? 
https://statsandr.com/blog/track-blog-performance-in-r/ Wed, 16 Dec 2020 00:00:00 +0000 https://statsandr.com/blog/track-blog-performance-in-r/ Introduction Prerequisites Analytics Users, page views and sessions Sessions over time Sessions per channel Sessions per day of week Sessions per day and time Sessions per month and year Top performing pages Time-normalized page views Page views by country Browser information User engagement by devices Content Finding topics Content distribution A small note about ads Future plans Thank you note Introduction Stats and R has been launched on December 16, 2019. Paper: 'Waiting period from diagnosis for mortgage insurance issued to cancer survivors' https://statsandr.com/blog/waiting-period-cancer-survivors/ Mon, 23 Nov 2020 00:00:00 +0000 https://statsandr.com/blog/waiting-period-cancer-survivors/ I am happy to announce that our paper entitled “Waiting period from diagnosis for mortgage insurance issued to cancer survivors” has been published in the European Actuarial Journal (Soetewey et al. 2021). Here is a brief summary of it: Massart (2018) testimonial illustrates the difficulties faced by patients having survived cancer to access mortgage insurance securing home loan. Data collected by national registries nevertheless suggest that excess mortality due to some types of cancer becomes moderate or even negligible after some waiting period. ANOVA in R https://statsandr.com/blog/anova-in-r/ Mon, 12 Oct 2020 00:00:00 +0000 https://statsandr.com/blog/anova-in-r/ Introduction Data Aim and hypotheses of ANOVA Underlying assumptions of ANOVA Variable type Independence Normality Equality of variances - homogeneity Another method to test normality and homogeneity Outliers ANOVA Preliminary analyses ANOVA in R Interpretations of ANOVA results What’s next? 
Post-hoc test Issue of multiple testing Post-hoc tests in R and their interpretation Tukey HSD test Dunnett’s test Other p-values adjustment methods Visualization of ANOVA and post-hoc tests on the same plot Summary References Introduction ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population means are different. Why do I have a data science blog? 7 benefits of sharing your code https://statsandr.com/blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/ Wed, 02 Sep 2020 00:00:00 +0000 https://statsandr.com/blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/ #1 Learn by writing #2 Get feedback #3 Personal note to remind my future self #4 Contribute to the open source community #5 Stay humble, stay curious #6 Learn to be less perfectionist and to prioritize #7 Build connections and professional relationships How to start your own blog? Conclusion My blog statsandr.com was launched in December 2019. Although 9 months of writing is a very short period compared to others, I can already say that it’s been an incredible and very enriching adventure! 
Graphics in R with ggplot2 https://statsandr.com/blog/graphics-in-r-with-ggplot2/ Fri, 21 Aug 2020 00:00:00 +0000 https://statsandr.com/blog/graphics-in-r-with-ggplot2/ Introduction Data Basic principles of {ggplot2} Create plots with {ggplot2} Scatter plot Line plot Combination of line and points Histogram Density plot Combination of histogram and densities Dotplot Boxplot Barplot Raincloud plot Further personalization Title and axis labels Axis ticks Log transformations Limits Scales for better axis formats Legend Shape, color, size and transparency Text and labels Smooth and regression lines Facets Themes Interactive plot with {plotly} Combine plots with {patchwork} Flip coordinates Save plot Managing dates Highlight data with {gghighlight} Tip To go further Conclusion Mortgage calculator in R Shiny https://statsandr.com/blog/mortgage-calculator-r-shiny/ Fri, 14 Aug 2020 00:00:00 +0000 https://statsandr.com/blog/mortgage-calculator-r-shiny/ Introduction Mortgage calculator How to use the mortgage calculator? Code of the app Conclusion Introduction I recently moved out and bought my first apartment. Of course, I could not pay it entirely with my own savings, so I had to borrow money from the bank. I visited a couple of banks operating in my country and asked for a mortgage. If you already bought your house or apartment in the past, you know how it goes: the bank analyzes your financial and personal situation and makes an offer based on your propensity to repay the bank. 
Outliers detection in R https://statsandr.com/blog/outliers-detection-in-r/ Tue, 11 Aug 2020 00:00:00 +0000 https://statsandr.com/blog/outliers-detection-in-r/ Introduction Descriptive statistics Minimum and maximum Histogram Boxplot Percentiles Z-scores Hampel filter Statistical tests Grubbs’s test Dixon’s test Rosner’s test Additional remarks Conclusion References Introduction An outlier is a value or an observation that is distant from other observations, that is to say, a data point that differs significantly from other data points. Enderlein (1987) goes even further as the author considers outliers as values that deviate so much from other observations one might suppose a different underlying sampling mechanism. Wilcoxon test in R: how to compare 2 groups under the non-normality assumption? https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/ Sun, 07 Jun 2020 00:00:00 +0000 https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/ Introduction Two different scenarios Independent samples Paired samples Combination of plot and statistical test Independent samples Paired samples Assumption of equal variances Conclusion References Introduction In a previous article, we showed how to compare two groups under different scenarios using the Student’s t-test. The Student’s t-test requires that the distributions follow a normal distribution when in presence of small samples.1 In this article, we show how to compare two groups when the normality assumption is violated, using the Wilcoxon test. How to publish a Shiny app? 
An example with shinyapps.io https://statsandr.com/blog/how-to-publish-shiny-app-example-with-shinyapps-io/ Fri, 29 May 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-publish-shiny-app-example-with-shinyapps-io/ Introduction Prerequisite Step-by-step guide Additional notes Settings of your app Publish your dataset Conclusion Introduction The COVID-19 virus led many people to create interactive apps and dashboards. A reader recently asked me how to publish a Shiny app she had just created. As with a previous article where I show how to upload R code on GitHub, I thought it would be useful to some people to see how I publish my Shiny apps so they could do the same. Correlation coefficient and correlation test in R https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/ Thu, 28 May 2020 00:00:00 +0000 https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/ Introduction Data Correlation coefficient Between two variables Correlation matrix: correlations for all variables Interpretation of a correlation coefficient Visualizations A scatterplot for 2 variables Scatterplots for several pairs of variables Another simple correlation matrix Correlation test For 2 variables For several pairs of variables Combination of correlation coefficients and correlation tests Correlograms Correlation does not imply causation Conclusion References Introduction Correlations between variables play an important role in a descriptive analysis. How to upload your R code on GitHub? 
An example with an R script on MacOS https://statsandr.com/blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/ Sun, 24 May 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/ Introduction Prerequisite Step-by-step guide Additional notes Conclusion Introduction A few days ago, a colleague asked me how to upload some R code on GitHub in order to make it accessible to everyone. 
Due to the lockdown, I could not just go into his office and show him on his computer. So I sent him several screenshots showing, step by step, how to do so. Right before I deleted the screenshots I’d just taken, I thought that perhaps they would be useful for other people, so I wrote this article. Press https://statsandr.com/press/ Sun, 24 May 2020 00:00:00 +0000 https://statsandr.com/press/ Here is a summary of press mentions of the blog: How can we predict the evolution of COVID 19 in Belgium? (UCLouvain: in English & in French) Evolution of COVID-19 hospital admissions in Belgium (LN24) Contact and social profiles Contact me X Bluesky Fosstodon Medium LinkedIn GitHub COVID-19 in Belgium: is it over yet? https://statsandr.com/blog/covid-19-in-belgium-is-it-over-yet/ Fri, 22 May 2020 00:00:00 +0000 https://statsandr.com/blog/covid-19-in-belgium-is-it-over-yet/ Introduction New hospital admissions Overall By period Zooming in Patients in hospitals Patients in intensive care Confirmed cases By province By age group and sex Static Dynamic By age group, sex and province Conclusion Introduction Note 1: The present article was written on May 22, 2020 and has been updated infrequently. The current situation regarding COVID-19 in Belgium may therefore be different from what is presented below. One-proportion and chi-square goodness of fit test https://statsandr.com/blog/one-proportion-and-goodness-of-fit-test-in-r-and-by-hand/ Wed, 13 May 2020 00:00:00 +0000 https://statsandr.com/blog/one-proportion-and-goodness-of-fit-test-in-r-and-by-hand/ Introduction In R Data One-proportion test Assumption of prop.test() and binom.test() Chi-square goodness of fit test Assumptions Does my distribution follow a given distribution? Observed frequencies Expected frequencies Observed vs. 
expected frequencies By hand One-proportion test Verification in R Goodness of fit test Verification in R Conclusion Introduction In a previous article, I presented the Chi-square test of independence in R which is used to test the independence between two categorical variables. A package to download free Springer books during Covid-19 quarantine https://statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/ Sun, 26 Apr 2020 00:00:00 +0000 https://statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/ Update Introduction Installation Download all books at once Create a table of Springer books Download only specific books By title By author By subject Improvements Acknowledgments Conclusion Update The promotion has ended so it is not possible to download the books through R. If you did not download the books in time, you can still have access to them via this link. COVID-19 in Belgium https://statsandr.com/blog/covid-19-in-belgium/ Tue, 31 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/covid-19-in-belgium/ Introduction Top R resources on Coronavirus Coronavirus dashboard for your own country Motivations, limitations and structure of the article Analysis of Coronavirus in Belgium A classic epidemiological model: the SIR model Fitting a SIR model to the Belgium data Reproduction number \(R_0\) Using our model to analyze the outbreak if there was no intervention More summary statistics Additional considerations Ascertainment rates More sophisticated models Modelling the epidemic trajectory using log-linear models Estimating changes in the effective reproduction number \(R_e\) More sophisticated projections Conclusion References Introduction The Novel COVID-19 Coronavirus is still spreading quickly in several countries and it does not seem like it is going to stop anytime soon as the peak has not yet been reached in many countries. 
How to create a simple Coronavirus dashboard specific to your country in R? https://statsandr.com/blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/ Mon, 23 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/ Introduction Top R resources on Coronavirus Coronavirus dashboard: the case of Belgium How to create your own Coronavirus dashboard Additional notes Data Open source Accuracy Publish your dashboard Conclusion Coronavirus dashboard: the case of Belgium Introduction The Novel COVID-19 Coronavirus is the hottest topic right now. Every day, the media and newspapers share the number of new cases and deaths in several countries, try to measure the impacts of the virus on citizens and remind us to stay home in order to stay safe. How to do a t-test or ANOVA for more than one variable at once in R? https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/ Thu, 19 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/ Introduction Perform multiple tests at once Concise and easily interpretable results T-test Additional p-value adjustment methods ANOVA To go even further Update with the {ggstatsplot} package Conclusion References Introduction As part of my teaching assistant position in a Belgian university, students often ask me for some help in their statistical analyses for their master’s thesis. A frequent question is how to compare groups of patients in terms of several quantitative continuous variables. 
Top 100 R resources on COVID-19 Coronavirus https://statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/ Thu, 12 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/ R Shiny apps and dashboards Coronavirus tracker Coronavirus dashboard from the {coronavirus} package Visualization of Covid-19 Cases Modeling COVID-19 Spread vs Healthcare Capacity COVID-19 Data Visualization Platform Coronavirus 10-day forecast Coronavirus (COVID-19) across the world COVID-19 outbreak Flatten the Curve Explore the spread of Covid-19 Governments and COVID-19 Simulating COVID-19 Epidemic in Togo - West Africa Covid-19 Prediction Covid-19 Dashboard Healthcare worker deaths from novel Coronavirus (COVID-19) in the US Covid-19 Hospitalizations in Belgium COVIDMINDER: Where you live matters! How to perform a one-sample t-test by hand and in R: test on one mean https://statsandr.com/blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/ Mon, 09 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/ Introduction Null and alternative hypothesis Hypothesis testing Two versions of the one-sample t-test How to compute the one-sample t-test by hand? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Different underlying distributions for the critical value How to compute the one-sample t-test in R? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Confidence interval Combination of plot and statistical test Scenario 2: variance of the population is unknown Assumptions Conclusion References Introduction After having written an article on the Student’s t-test for two samples (independent and paired samples), I believe it is time to explain in detail how to perform one-sample t-tests by hand and in R. 
The 9 concepts and formulas in probability that every data scientist should know https://statsandr.com/blog/the-9-concepts-and-formulas-in-probability-that-every-data-scientist-should-know/ Tue, 03 Mar 2020 00:00:00 +0000 https://statsandr.com/blog/the-9-concepts-and-formulas-in-probability-that-every-data-scientist-should-know/ What is probability? 1. A probability is always between 0 and 1 2. Compute a probability 3. Complement of an event 4. Union of two events 5. Intersection of two events 6. Independence of two events 7. Conditional probability Bayes’ theorem Example 8. Accuracy measures False negatives False positives Sensitivity Specificity Positive predictive value Negative predictive value 9. Counting techniques Multiplication Example Permutation Example By hand In R Combination Example By hand In R Conclusion What is probability? FAQ - Frequently asked questions https://statsandr.com/faq/ Mon, 02 Mar 2020 00:00:00 +0000 https://statsandr.com/faq/ Who is behind this blog? What is your background? Why did you launch this blog? What technology and theme do you use to write this blog and the articles? I am new to this blog, to R or to statistics, from where can I start? Can I reuse or translate the content of your blog? I would like to replicate an analysis you have done in one of your article, can I have access to the entire code? Student's t-test in R and by hand: how to compare two groups under different scenarios? https://statsandr.com/blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/ Fri, 28 Feb 2020 00:00:00 +0000 https://statsandr.com/blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/ Introduction Null and alternative hypothesis Hypothesis testing Different versions of the Student’s t-test How to compute Student’s t-test by hand? 
Scenario 1: Independent samples with 2 known variances Scenario 2: Independent samples with 2 equal but unknown variances Scenario 3: Independent samples with 2 unequal and unknown variances Scenario 4: Paired samples where the variance of the differences is known Scenario 5: Paired samples where the variance of the differences is unknown How to compute Student’s t-test in R? Correlogram in R: how to highlight the most correlated variables in a dataset https://statsandr.com/blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/ Sat, 22 Feb 2020 00:00:00 +0000 https://statsandr.com/blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/ Introduction Correlation matrix Correlogram Correlation test Code {ggstatsplot} package {lares} package All possible correlations Correlation of one variable against all others Conclusion References Introduction Correlation, often computed as part of descriptive statistics, is a statistical tool used to study the relationship between two variables, that is, whether and how strongly couples of variables are associated. Correlations are measured between 2 variables at a time. Therefore, for datasets with many variables, computing correlations can become quite cumbersome and time consuming. Getting started in R markdown https://statsandr.com/blog/getting-started-in-r-markdown/ Tue, 18 Feb 2020 00:00:00 +0000 https://statsandr.com/blog/getting-started-in-r-markdown/ R Markdown: what, why and how? Before you start Components of a .Rmd file YAML header Code chunks Text Code inside text Highlight text like it is code Images Tables Additional notes and useful resources Conclusion If you have spent some time writing code in R, you probably have heard of generating dynamic reports incorporating R code, R outputs (results) and text or comments. 
The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/ Thu, 13 Feb 2020 00:00:00 +0000 https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/ What is clustering analysis? Application 1: Computing distances Solution k-means clustering Application 2: k-means clustering Data kmeans() with 2 groups Quality of a k-means partition nstart for several initial centers and better stability kmeans() with 3 groups Optimal number of clusters Elbow method Silhouette method Gap statistic method Consensus-based algorithm Visualizations Manual application and verification in R Solution by hand Solution in R Hierarchical clustering Application 3: hierarchical clustering Data Solution by hand Single linkage Complete linkage Average linkage Solution in R Single linkage Optimal number of clusters Complete linkage Average linkage k-means versus hierarchical clustering What’s next? Contribute https://statsandr.com/contribute/ Sat, 08 Feb 2020 00:00:00 +0000 https://statsandr.com/contribute/ Stats and R welcomes guest posts that provide unique insight into statistics and R. How can you contribute? If you want to contribute and write a guest post for statsandr.com, please submit your article through this contribution form. Once your guest post is received, I will review it and inform you about the decision (i.e., accepted, rejected, or accepted with minor changes). Submission rules and guidelines Before submitting your article, please read the following points: An efficient way to install and load R packages https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/ Fri, 31 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/ What is a R package and how to use it? 
Inefficient way to install and load R packages More efficient way Most efficient way {pacman} package {librarian} package Conclusion What is a R package and how to use it? Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/ Wed, 29 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/ What is a normal distribution? Empirical rule Parameters Probabilities and standard normal distribution Areas under the normal distribution in R and by hand Ex. 1 In R By hand Ex. 2 In R By hand Ex. 3 In R By hand Ex. 4 In R By hand Ex. 5 Why is the normal distribution so crucial in statistics? Fisher's exact test in R: independence test for a small sample https://statsandr.com/blog/fisher-s-exact-test-in-r-independence-test-for-a-small-sample/ Tue, 28 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/fisher-s-exact-test-in-r-independence-test-for-a-small-sample/ Introduction Hypotheses Example Data Observed frequencies Expected frequencies Fisher’s exact test in R Conclusion and interpretation Combination of plot and statistical test Conclusion References Introduction After presenting the Chi-square test of independence by hand and in R, this article focuses on the Fisher’s exact test. Independence tests are used to determine if there is a significant relationship between two categorical variables. 
There exist two different types of independence test: Chi-square test of independence by hand https://statsandr.com/blog/chi-square-test-of-independence-by-hand/ Mon, 27 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/chi-square-test-of-independence-by-hand/ Introduction Hypotheses How the test works? Example Observed frequencies Expected frequencies Test statistic Critical value Conclusion and interpretation Introduction Chi-square tests of independence test whether two qualitative variables are independent, that is, whether there exists a relationship between two categorical variables. In other words, this test is used to determine whether the values of one of the 2 qualitative variables depend on the values of the other qualitative variable. Chi-square test of independence in R https://statsandr.com/blog/chi-square-test-of-independence-in-r/ Mon, 27 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/chi-square-test-of-independence-in-r/ Introduction Data Chi-square test of independence in R Conclusion and interpretation Combination of plot and statistical test Introduction This article explains how to perform the Chi-square test of independence in R and how to interpret its results. To learn more about how the test works and how to do it by hand, I invite you to read the article “Chi-square test of independence by hand”. To briefly recap what has been said in that article, the Chi-square test of independence tests whether there is a relationship between two categorical variables. How to create a timeline of your CV in R? https://statsandr.com/blog/how-to-create-a-timeline-of-your-cv-in-r/ Sun, 26 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-create-a-timeline-of-your-cv-in-r/ Introduction Minimal reproducible example How to personalize it Additional note Conclusion Introduction In this article, I show how to create a timeline of your CV in R. A CV timeline illustrates key information about your education, work experiences and extra activities. 
The main advantage of CV timelines compared to regular CVs is that they make you stand out immediately by being visually appealing and easier to scan. RStudio addins, or how to make your coding life easier? https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/ Sun, 26 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/ What are RStudio addins? Installation Addins Esquisse ggThemeAssist Questionr Recoding factors Reordering factors Categorize a numeric variable Remedy Styler Snakecaser ViewPipeSteps Ymlthis Reprex Blogdown Conclusion What are RStudio addins? Although I have been using RStudio for several years, I only recently discovered RStudio addins. Since then, I have been using these addins almost every time I use RStudio. What are RStudio addins? Descriptive statistics in R https://statsandr.com/blog/descriptive-statistics-in-r/ Wed, 22 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/descriptive-statistics-in-r/ Introduction Data Minimum and maximum Range Mean Median First and third quartile Other quantiles Interquartile range Standard deviation and variance Summary Coefficient of variation Mode Correlation Contingency table Mosaic plot Barplot Histogram Boxplot Dotplot Scatterplot Line plot QQ-plot For a single variable By groups Density plot Correlation plot Advanced descriptive statistics {summarytools} package Frequency tables with freq() Cross-tabulations with ctable() Descriptive statistics with descr() Data frame summaries with dfSummary() describeBy() from the {psych} package aggregate() function summaryBy() from {doBy} group_by() and summarise() from {dplyr} Conclusion Introduction This article explains how to compute the main descriptive statistics in R and how to present them graphically. 
Support my work https://statsandr.com/support/ Tue, 21 Jan 2020 00:00:00 +0000 https://statsandr.com/support/ On this blog, I share my knowledge in the form of free articles and tutorials about statistics and R. My goal with the blog is to help people to understand statistical concepts (through examples and in plain English), and to apply them in R. When possible, I also contribute to open source projects on GitHub. All the articles, Shiny apps and code are open source and available to everyone (code available directly in the articles or on GitHub). Tips and tricks in RStudio and R Markdown https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/ Tue, 21 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/ Run code Insert a comment in R and R Markdown Knit a R Markdown document Code snippets Ordered list in R Markdown New code chunk in R Markdown Reformat code RStudio addins {pander} and {report} for aesthetics Extract equation model with {equatiomatic} Print model’s parameters Pipe operator %>% Others Conclusion If you have the chance to work with an experienced programmer, you may be amazed by how fast she can write code. Descriptive statistics by hand https://statsandr.com/blog/descriptive-statistics-by-hand/ Sun, 19 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/descriptive-statistics-by-hand/ Introduction Location versus dispersion measures Location Minimum and maximum Mean Median Odd number of observations Even number of observations Mean vs. median \(1^{st}\) and \(3^{rd}\) quartiles \(q_{0.25}\), \(q_{0.75}\) and \(q_{0.5}\) A note on deciles and percentiles Mode Quantitative variables Qualitative variables Dispersion Range Standard deviation Standard deviation for a population Standard deviation for a sample Variance Variance for a population Variance for a sample Standard deviation vs. What is the difference between population and sample? 
https://statsandr.com/blog/what-is-the-difference-between-population-and-sample/ Sat, 18 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/what-is-the-difference-between-population-and-sample/ Introduction Sample vs. population Why a sample? Representative sample Paired samples Conclusion Introduction People often fail to properly distinguish between population and sample. It is however essential in any statistical analysis, starting from descriptive statistics with different formulas for variance and standard deviation depending on whether we face a sample or a population. Moreover, the branch of statistics called inferential statistics is often defined as the science of drawing conclusions about a population from observations made on a representative sample of that population. A Shiny app for inferential statistics by hand https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/ Wed, 15 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/ A Shiny app for inferential statistics: hypothesis tests and confidence intervals Statistics is divided into four main branches: Descriptive statistics Inferential statistics Predictive analysis Exploratory analysis Descriptive statistics provide a summary of the data; it helps explaining the data in a concise way without losing too much information. Data can be summarized numerically or graphically. See descriptive statistics by hand or in R to learn more about this branch of statistics. A Shiny app for simple linear regression by hand and in R https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/ Wed, 15 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/ A Shiny app to perform simple linear regression (by hand and in R) Simple linear regression is a statistical method to summarize and study relationships between two variables. 
When more than two variables are of interest, it is referred to as multiple linear regression. See this article on linear regression for more details. In this article, we focus only on a Shiny app which allows you to perform simple linear regression by hand and in R: World map of visited countries in R https://statsandr.com/blog/world-map-of-visited-countries-in-r/ Thu, 09 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/world-map-of-visited-countries-in-r/ If, like me, you like traveling as much as R, you might want to draw a world map of the countries you have visited in R. Below is an example with the countries I have visited as of January 2020: A practical guide on optimal asset allocation https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/ Tue, 07 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/ UPDATE: Due to the limit on the number of Shiny apps that can be published on the free shinyapps.io plan, the Shiny app presented below has been unpublished. However, the code can be found on GitHub. Introduction In his book A Random Walk down Wall Street, Burton G. Malkiel advises readers on an optimal asset allocation depending on age. As an amateur investor, I thought it would be useful to develop a Shiny app which depicts his advice for other interested investors. Draw a word cloud with a R Shiny app https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/ Tue, 07 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/ UPDATE: Due to the limit on the number of Shiny apps that can be published on the free shinyapps.io plan, the Shiny app presented below has been unpublished. However, the code can be found on GitHub. Below is a Shiny app to help you draw a word cloud: Word cloud Word clouds are particularly useful as part of text mining analyses. 
Moreover, they are also useful to analyze string and character variables in any dataset (see the different data types in R). How to embed a Shiny app in blogdown? https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/ Tue, 07 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/ Step-by-step guide If you have developed and deployed a Shiny app and would like to embed it in blogdown, follow these steps: create a new post as usual add output: html_document if it is not already included in the YAML metadata insert the following HTML code in the body of the post: <iframe height="800" width="100%" frameborder="no" src="https://antoinesoetewey.shinyapps.io/statistics-201/"> </iframe> You should replace the URL with the URL of your deployed Shiny app (after src=, do not forget that the URL should start with http:// or https:// and should be surrounded by " A guide on how to read statistical tables https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/ Mon, 06 Jan 2020 00:00:00 +0000 https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/ Shiny app to compute probabilities for the main probability distributions Below is a Shiny app to help you read the main statistical tables: Statistics-101 This Shiny app helps you to compute probabilities for the main probability distributions. How to use this app? Open the app via this link Choose the distribution Set the parameter(s) of the distribution (the parameters depend of course on the chosen distribution) Select whether you want to find the lower tail, upper tail or an interval Choose the value of x On the right panel (or below depending on the size of your screen) you will see: Newsletter https://statsandr.com/subscribe/ Tue, 31 Dec 2019 00:00:00 +0000 https://statsandr.com/subscribe/ By subscribing to this newsletter, you will be notified each time a new article is published. 
There is no spam, you can unsubscribe at any time, and your email address will never be shared. Thanks in advance for reading. Data types in R https://statsandr.com/blog/data-types-in-r/ Mon, 30 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/data-types-in-r/ What data types exist in R? Numeric Integer Character Factor Logical Conclusion This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”. What data types exist in R? The 6 most common data types in R are: Numeric Integer Complex Character Factor Logical Datasets in R are often a combination of these 6 different data types. Variable types and examples https://statsandr.com/blog/variable-types-and-examples/ Mon, 30 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/variable-types-and-examples/ Introduction Different types of variables for different types of statistical analysis Big picture Quantitative Discrete Continuous Qualitative Nominal Ordinal Variable transformations From continuous to discrete From quantitative to qualitative Additional notes Misleading data encoding Conclusion Introduction If you happen to work with datasets frequently, you probably know that each row of your dataset represents a different experimental unit (also called observation) and each column represents a different characteristic (called variable): How to create an interactive booklist with automatic Amazon affiliate links in R? 
https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/ Thu, 26 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/ Introduction Requirements Create a booklist Create it in Excel then import it Create it directly in R Make it interactive Add URLs with your affiliate link to the table Extract affiliate link Append the book title and author to make it automatic Add links to the interactive table Final result Conclusion Introduction Booklists are a useful way to share the books you have read and which you recommend to other readers and/or to promote the books you have written. Terms and policies https://statsandr.com/terms/ Wed, 25 Dec 2019 00:00:00 +0000 https://statsandr.com/terms/ This is my personal blog written and edited by me (Antoine Soetewey). Your use of this website, in any and all forms, constitutes an acceptance of these terms and policies. This page is reviewed and revised from time to time. All content provided is for informational purposes only. The articles and posts on this website are my own and do not necessarily represent the positions, strategies, or opinions of my employer or its subsidiaries. 
Data manipulation in R https://statsandr.com/blog/data-manipulation-in-r/ Tue, 24 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/data-manipulation-in-r/ Introduction Vectors Concatenation seq() and rep() Assignment Elements of a vector Type and length Finding the vector type Modifications of type and length Numerical operators Logical operators all() and any() Operations on character strings vector Orders and vectors Factors Creating factors Properties Handling Lists Creating lists Handling Getting details on an object Data frames Line and column names Subset a data frame First or last observations Random sample of observations Based on row or column numbers Based on variable names Based on one or multiple criterion Create a new variable Transform a continuous variable into a categorical variable Sum and mean in rows Sum and mean in column Categorical variables and labels management Recode categorical variables Change reference level Rename variable names Create a data frame manually Merging two data frames Add new observations from another data frame Add new variables from another data frame Missing values Remove NAs Impute NAs Scale Dates and times Dates Times Extraction from dates Exporting and saving Looking for help Conclusion Sitemap https://statsandr.com/sitemap/ Tue, 24 Dec 2019 00:00:00 +0000 https://statsandr.com/sitemap/ A list of all the pages and articles found on the blog. If you cannot find what you are looking for, do not hesitate to contact me. For you robots out there is an XML version available for digesting as well. Pages Home Blog Tags About Contact Subscribe to the newsletter FAQ - Frequently asked questions Contribute - Guest post Support the blog Press Terms and policies Sitemap How to import an Excel file in RStudio? 
https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/ Wed, 18 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/ Introduction Transform an Excel file to a CSV file R working directory Get working directory Set working directory User-friendly method Via the console Via the text editor Import your dataset User-friendly way Via the text editor Import SPSS (.sav) files Conclusion Introduction As we have seen in this article on how to install R and RStudio, R is useful for many kinds of computational tasks and statistical analyses. Contact https://statsandr.com/contact/ Tue, 17 Dec 2019 00:00:00 +0000 https://statsandr.com/contact/ Thanks in advance for contacting me. In order for me to answer you as soon as possible, here are the best communication methods: Due to the increasing number of questions received, responding to each of them by email has become unmanageable and unproductive. Therefore, if you have a question regarding the content of an article, I invite you to add it as a comment at the end of the corresponding article. How to install R and RStudio? https://statsandr.com/blog/how-to-install-r-and-rstudio/ Tue, 17 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/how-to-install-r-and-rstudio/ What is R and RStudio? R RStudio How to install R and RStudio? The main components of RStudio Examples of code Calculator Comments Store and print values Vectors Matrices Generate random values Plot Conclusion Note that this article is inspired by the lecture notes of Prof. Johan Segers and my personal notes as a teaching assistant for his course entitled “Multivariate statistical analysis” given at UCLouvain. About me https://statsandr.com/about/ Mon, 16 Dec 2019 00:00:00 +0000 https://statsandr.com/about/ My name is Antoine Soetewey. I am a postdoctoral researcher in data science and statistics at HEC Liège and UCLouvain Saint-Louis Brussels. 
Before that, I obtained a PhD in statistics at UCLouvain, where my research focused on survival analysis and biostatistical methods applied to cancer patients. In parallel with my research, I teach statistics and probability at the undergraduate and master’s levels as a visiting lecturer at UCLouvain and UNamur. Hello World! https://statsandr.com/blog/hello-world/ Mon, 16 Dec 2019 00:00:00 +0000 https://statsandr.com/blog/hello-world/ This is the first post for the blog Stats and R, just to introduce it. This blog aims to help academics and professionals working with data grasp important concepts in statistics and apply them in R. The goal of this website is to make statistics easy to understand by illustrating concepts with examples and using plain English. When possible, for all statistical concepts covered here, I also write an article on how to apply these concepts in R.