The Parasitologist

Statistics

How to use R in Parasitology - Schistoma diagnostics

Recently I started using R-computing. The applications of R are just fantastic. In the future I want to share more of my R-code and its applications in the field of parasitology.

In this example I used some data of of some study from a decade ago. After one day I got an order to remove this data from public funded research. The raw data set had to be removed because third parties might misuse it. I guess we have some hot stuff here.

Anyway, i put some noise into the dataset using functions like ‘x <- sample(0:5, 176, replace=T)’ and ‘a <- rnorm(176, sd = 0.6)’. I just want to show the method, not the data.

Suppose you have the diagnostic data of microscopic examinations of feces and urine with the number of Schistosoma eggs counted. In feces it’s Schistoma mansoni eggs and in urine it’s Schistosoma haematobium. You want to compare these values with a the real-time PCR analyses on the same stool samples. S. haematobium PCR will also be performed on stool samples (yes, it’s possible).

Make sure the workingspace is correct and import the data from the file: schistoPCR.txt

setwd("~/Documents/workspace/R-project/SCHISTO")

## import dataset in txt file 
worms <- read.table("~/Documents/workspace/R-project/SCHISTO/schistoPCR2.txt", header=T, row.names = 1)

# to see variables by name
summary(worms)
##      Ct_Sm           Ct_Sh          Ct_PhHV           Kato       
##  Min.   : 0.00   Min.   : 0.00   Min.   :30.42   Min.   :   0.0  
##  1st Qu.: 0.00   1st Qu.: 0.00   1st Qu.:31.92   1st Qu.:   0.0  
##  Median :24.23   Median : 0.00   Median :32.66   Median :  80.0  
##  Mean   :17.89   Mean   :14.98   Mean   :32.68   Mean   : 453.8  
##  3rd Qu.:27.82   3rd Qu.:35.46   3rd Qu.:33.29   3rd Qu.: 340.0  
##  Max.   :38.39   Max.   :45.00   Max.   :37.53   Max.   :9320.0  
##      urine        
##  Min.   :   0.00  
##  1st Qu.:   3.00  
##  Median :   6.00  
##  Mean   :  67.13  
##  3rd Qu.:  40.75  
##  Max.   :1097.90
head(worms)
##   Ct_Sm Ct_Sh Ct_PhHV Kato urine
## 1 38.39     0   34.60  100   4.0
## 2 23.94     0   33.07  340   4.0
## 3 25.31     0   33.45   40   0.0
## 4 31.54     0   34.67    0   0.0
## 5 29.87     0   32.94    0  14.5
## 6 24.35     0   33.64 2160   3.0

Now lets see what the plot looks like when you compare Ct values with Microscopic data.

## correlate Kato & CtSm (which looks like crap)
cor(worms$Ct_Sm, worms$Kato)
## [1] 0.1636928
plot(worms$Ct_Sm, worms$Kato)

It looks awfull. First problem: ‘not-detected’ Ct-values have value zero. It needs to be changed into a ‘weakest’ or ‘highest’ Ct-value in order to keep the negatives in line with the rest of the Ct-values.

Look for the maximum Ct value of CtSm and CtSh. Then replace ‘0’-Ct-values with max value plus 3.3 Ct’s

x <- max(worms[,1:2], na.rm=TRUE)
x
## [1] 45
CtM <- x + 3.3
CtM
## [1] 48.3
worms$Ct_Sm <- replace(worms$Ct_Sm, worms$Ct_Sm == 0, CtM)
worms$Ct_Sh <- replace(worms$Ct_Sh, worms$Ct_Sh == 0, CtM)

Do the same with the negative microscopic values: replace 0-values in Kato and in urine with new minimum values.

worms.sub <- subset(worms$Kato, worms$Kato > 0)
y <- min(worms.sub, na.rm=TRUE)
y
## [1] 20
KatoMn <- y/4
KatoMn
## [1] 5
worms$Kato <- replace(worms$Kato, worms$Kato == 0, KatoMn)

worms.sub <- subset(worms$urine, worms$urine > 0)
z <- min(worms.sub, na.rm=TRUE)
z
## [1] 0.5
urineMn <- z/10
urineMn
## [1] 0.05
worms$urine <- replace(worms$urine, worms$urine == 0, urineMn)

Check the new correlations. Now it looks much better.

cor(worms$Ct_Sm, worms$Kato)
## [1] -0.3706783
cor(worms$Ct_Sh, worms$urine)
## [1] -0.3315389

And let us make some nice plots.

library(ggplot2)
p <- qplot(worms$Ct_Sm, log(worms$Kato), worms)
## Warning: Ignoring unknown parameters: NA
p + geom_smooth(method = "lm")

w <- qplot(worms$Ct_Sh, log(worms$urine), worms)
## Warning: Ignoring unknown parameters: NA
w + geom_smooth(method = "lm")