This post was kindly contributed by The DO Loop - go there to comment and to read the full post. When the computation requires column statistics, the SQL procedure is also useful. The computation requires computing the means across rows and down columns, and the student was struggling with implementing the computations in the DATA step. The data shows the hair color and eye color of European children. In the eye-by-hair table, each cell contains three values. The test statistic is
|Published (Last):||28 February 2007|
However, the traditional copy-paste production method is time-consuming and frequently generates typing errors. Currently available statistical tools are still far from ideal because they are difficult to understand and lack flexibility. A user-friendly, dynamic, and flexible tool is needed for researchers to automate the creation of demographic tables. The macro provides optional parameters that allow for the full customization of desired demographic tables.
Demographic information, usually presented in a table and widely used in medical research and population studies, provides a summary of participant characteristics (1,2). A demographic table, usually the first table in a peer-reviewed article on medical research and population studies, is commonly used to describe the population under study and gives the reader a sense of differences in demographic characteristics in the population according to treatment, exposure, or outcome (3).
A demographic table typically contains summary statistics and P values. Summary statistics often include the counts, means, standard deviations (SD), medians, 25th and 75th percentiles [also called interquartile range (IQR)], and ranges (minimum and maximum values) for continuous variables, and frequencies and percentages of subjects for categorical variables (4).
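The summary statistics listed above can be sketched in a few lines. The following is a minimal illustration in Python (not the SAS macro itself); the function names and the small data set are made up for the example, and the quartiles use the "inclusive" interpolation method, which may differ slightly from the default percentile definition in other software.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Summary statistics usually reported for a continuous variable."""
    # Quartiles by the "inclusive" method (an assumption; other definitions exist).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": median,
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


def describe_categorical(values):
    """Return {level: (frequency, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {level: (n, 100.0 * n / total) for level, n in counts.items()}


# Made-up illustrative data, not taken from the article.
ages = [34, 41, 45, 52, 58, 63, 70]
smoking = ["yes", "no", "no", "yes", "no", "no", "no"]

age_stats = describe_continuous(ages)
smoking_stats = describe_categorical(smoking)
print(age_stats)
print(smoking_stats)
```

A table cell such as "mean (SD)" or "n (%)" is then only a matter of formatting these numbers.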
A P value is determined from a statistical test, such as the t-test, F-test, or Chi-square test. Table 1 below shows an example demographic table in clinical trials. However, there are some drawbacks to this process of producing demographic tables. First, it is tedious and time-consuming. Regardless of which software one uses, one must spend a significant amount of time and energy formatting the results to meet the publication requirements.
Second, it is difficult to control the quality and correctness of the results. During this manual copy-paste process, one has to spend a lot of time double-checking for typographical errors. In addition, this traditional copy-paste method does not comply with the concepts of reproducible research (5-7) and literate programming (8) in academia.
Although we have a long way to go before fully reaching the standard of reproducible research (9), we can minimize the use of manual operations by automatically producing demographic tables. Many software engineers, biostatisticians, and medical researchers have attempted to develop command-line interface-based tools that can generate publishable statistical tables directly from research data 10 - However, these tools are still far from optimal because they are either hard to understand or lack flexibility and thus cannot be applied to a wide variety of situations to create demographic tables for academic journals. SAS, one of the most popular statistical software packages, has many procedures for obtaining summary statistics and implementing statistical tests.
However, none of them can directly generate demographic tables that meet publication requirements, such as an American Psychological Association (APA) style table (2). With some upfront coding work, we can combine SAS features to make a compelling tabulating tool for automatically producing demographic tables. Typically, a complete demographic table contains two parts: statistical description and statistical inference.
For a categorical variable, it is sufficient to report the frequency and relative percentage of each category. The statistical inference part contains P values from the appropriate statistical tests. The details on the choice of appropriate statistical tests have been discussed in many books (2,4). The primary purpose of demographic tables is to assess group differences in demographic characteristics of the population.
Therefore, most of the time, the t-test, Wilcoxon rank-sum test, F-test, Kruskal-Wallis test, and Chi-square test would be enough for this purpose. See Table 2 below for more details. In medical research and population studies, with a sufficiently large sample, a statistical test will almost always demonstrate a significant difference, unless there is no effect whatsoever.
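As an illustration of one of the tests named above, the Pearson Chi-square statistic for a categorical variable can be computed directly from the table of counts. This is a minimal Python sketch rather than the macro's SAS implementation; the function name and the 2 x 2 counts are made up, and the P value (which would come from the Chi-square distribution with the returned degrees of freedom) is not computed here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts: rows are two groups, columns are two categories.
table = [[20, 30], [30, 20]]
stat, dof = chi_square_statistic(table)
print(stat, dof)
```

The expected counts use the row and column totals, so the computation needs statistics across rows and down columns at the same time.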
In this situation, the standardized difference would be a useful and straightforward alternative to P values when there are only two groups. Standardized difference scores are intuitive indexes that measure the effect size between two groups.
Unlike P values from the t-test or Wilcoxon rank-sum test, they are independent of sample size. An absolute standardized difference greater than 10 percent is approximately equivalent to a P value less than 0.
This method has been widely used in the literature (18). However, the absolute standardized difference can only be calculated for means or percentages. For medians, the Hodges-Lehmann estimator would be a proper measure. SAS has many functions and procedures for data manipulation, statistical description and inference, and data presentation. However, no procedure is available to produce an APA style demographic table in one step.
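The two effect-size measures discussed above can be sketched as follows. This Python illustration assumes the commonly used definitions, which the text does not spell out: the standardized difference divides the difference in means (or proportions) by a pooled standard deviation built from the average of the two group variances, and the two-sample Hodges-Lehmann estimator is the median of all pairwise differences between the groups. The function names and data are hypothetical.

```python
import math
import statistics


def std_diff_means(x, y):
    """Standardized difference between two group means (assumed pooled-SD form)."""
    pooled_sd = math.sqrt((statistics.variance(x) + statistics.variance(y)) / 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd


def std_diff_proportions(p1, p2):
    """Standardized difference between two group proportions (assumed pooled-SD form)."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimator: median of all differences x_i - y_j."""
    return statistics.median(xi - yj for xi in x for yj in y)


# Made-up illustrative data.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]

d_means = std_diff_means(group_a, group_b)
d_props = std_diff_proportions(0.5, 0.4)
hl = hodges_lehmann_shift(group_a, group_b)
print(d_means, d_props, hl)
```

Under these definitions, an absolute value above 0.10 corresponds to the 10 percent threshold mentioned in the text.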
The most appropriate strategy is to assemble procedures that produce descriptive statistics and P values, as well as other entries in the demographic table, by packing them into a user-friendly SAS macro. A SAS macro is often used to reduce the amount of regular SAS code and provides an efficient way to automate a process. To develop a user-friendly SAS macro that can automatically produce publishable demographic tables, we need to perform at least four steps. First, we use statistical procedures to get descriptive and inferential statistics.
Lastly, we adapt the SAS code snippets into sub-macros, and then put the sub-macros together into a powerful macro that can be reused in the near future.
We can also check the correctness of the data, including the existence of the dataset and variables. If the names of a dataset or variables are incorrectly entered, the macro should return error messages. It can quickly produce demographic tables for both journal articles and statistical reports for clinical trials. This macro has the following features: (I) it is automatic: it can generate a publishable table from raw data with one click; (II) it is complete: it can automatically produce both descriptive statistics for all variables and P values from parametric and non-parametric tests; (III) it is dynamic: with user-specified parameters, it is easy and efficient to set the variable labels, table title, footnote, statistical test, total column (yes or no), percentage type (row or column percentage), page orientation (portrait or landscape), and document format (RTF or PDF), allowing full customization of the desired demographic tables; (IV) it is robust: when we run the macro, it performs error processing.
It will return error messages when the name of a dataset or variable is incorrectly entered. For demographic tables with a single group, four required parameters (data, var, file, and title) must be specified by users; for demographic tables with multiple groups, six required parameters (data, var, grp, grplabel, file, and title) must be specified. The other nine optional parameters can be specified by users or left blank. The detailed demonstration will be given through working examples in the next section.
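The error processing described above (required parameters present, requested variables found in the dataset) can be illustrated with a short sketch. This is Python, not the macro's SAS code, and it is only an assumption about how such checks might be organized: the function name, the dictionary representation of the parameters, and the error messages are all hypothetical. Only the parameter names (data, var, grp, grplabel, file, title) come from the text.

```python
REQUIRED_SINGLE_GROUP = ("data", "var", "file", "title")
REQUIRED_MULTI_GROUP = ("data", "var", "grp", "grplabel", "file", "title")


def check_parameters(params, dataset_columns):
    """Return a list of error messages; an empty list means the call looks valid."""
    errors = []

    # A group variable switches the call to the multiple-group rules.
    multi_group = bool(params.get("grp"))
    required = REQUIRED_MULTI_GROUP if multi_group else REQUIRED_SINGLE_GROUP
    for name in required:
        if not params.get(name):
            errors.append(f"Required parameter '{name}' is missing.")

    # Every requested variable (and the group variable) must exist in the dataset.
    requested = list(params.get("var") or [])
    if multi_group:
        requested.append(params["grp"])
    for column in requested:
        if column not in dataset_columns:
            errors.append(f"Variable '{column}' was not found in the dataset.")

    return errors


# Hypothetical dataset columns and calls, for illustration only.
columns = {"sex", "age", "weight", "smoking"}
ok = check_parameters(
    {"data": "heart", "var": ["age", "weight"], "grp": "sex",
     "grplabel": "Sex", "file": "table1", "title": "Baseline characteristics"},
    columns,
)
bad = check_parameters({"data": "heart", "var": ["age", "hieght"], "file": "table1"}, columns)
print(ok)
print(bad)
```

Collecting all problems into one list, rather than stopping at the first, lets a user correct a misspelled variable and a missing parameter in a single pass.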
Here we illustrate the general principle on how to use it. Suppose the macro ggBaseline. All the source code of SAS macros can be obtained on request at guhongqiu yeah. After the macro has been defined, we can invoke the macro as follows to generate the desired tables.
It contains 5, observations and 17 variables from the Framingham Heart Study. Suppose we want to generate a demographic table with the group variable sex and use P values to evaluate the group differences of three variables: age, weight, and smoking status. Compared to traditional SAS code, the above macro code is clean and concise. Each variable is followed by the associated statistical test and variable label.
Figure 3 shows the resulting table. Each entry in this table is editable and can be easily adapted to meet journal requirements. We can use optional parameters listed in Table 3 to make further customization.
See Figure 4 for the corresponding output. The Hodges-Lehmann estimator will also be given alongside the median (IQR). The output is shown in Figure 5.
Sometimes, we may need to report the population information without group variables, which means that we treat all the subjects as a single group. The following code shows one example of this application. The output is shown in Figure 6. If there are many levels for one categorical variable (for example, zip codes), one may want to reduce the number of levels of this variable by merging some levels together when producing a demographic table.
One can use DATA step statements in SAS to create a new categorical variable and then produce a demographic table based on the new categorical variable. This feature also works for cutting continuous variables into different categories: all we need to do is change the statistical test parameter to CHISQ after defining the format. See Figure 7 for the corresponding output. The macro allows for the quick creation of reproducible and fully customizable tables.
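The two recoding operations described above, merging levels of a categorical variable and cutting a continuous variable into categories, can be sketched as follows. This is a Python illustration of the idea, not the SAS DATA step or format the article uses; the function names, the catch-all "Other" level, the cut points, and the data are all made up.

```python
from bisect import bisect_right


def merge_levels(values, mapping, other="Other"):
    """Map detailed levels to merged levels; unmapped levels fall into `other`."""
    return [mapping.get(value, other) for value in values]


def cut(values, breakpoints, labels):
    """Assign each value to a category; intervals are closed on the left.

    `breakpoints` must be sorted, and `labels` needs one more entry than `breakpoints`.
    """
    if len(labels) != len(breakpoints) + 1:
        raise ValueError("labels must have exactly one more entry than breakpoints")
    return [labels[bisect_right(breakpoints, value)] for value in values]


# Made-up illustrative data.
zip_codes = ["10001", "10002", "94105", "60601"]
regions = merge_levels(
    zip_codes,
    {"10001": "New York", "10002": "New York", "94105": "San Francisco"},
)

ages = [35, 40, 59, 60, 72]
age_groups = cut(ages, [40, 60], ["<40", "40-59", ">=60"])

print(regions)
print(age_groups)
```

Once the new variable is categorical, it is summarized with frequencies and percentages and compared across groups with a test for categorical data, which matches the switch to CHISQ described in the text.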
In addition, it allows users to save tables in two different formats, and thus makes all table layouts easily reproducible and transferable. It can significantly enhance the speed and efficiency of report creation and presentation, and thus save valuable time that can be allocated to other productive tasks.
Conflicts of Interest: The authors have no conflicts of interest to declare.