This user guide accompanies Masnadi F, Armelloni E N, Guicciardi S, Pellini G, Raicevich S, Mazzoldi C, Scanu M, Sabatini L, Tassetti A N, Ferrà C, Grati F, Bolognini L, Domenichetti F, Cacciamani R, Calì F, Polidori P, Fabi G, Luzi F, Giovanardi O, Bernarello V, Pasanisi E, Franceschini G, Breggion C, Bozzetta B, Sambo A, Prioli G, Gugnali A, Piccioni E, Fiori F, Caruso F, Scarcella G, “Relative survival scenarios: an application to undersized common sole (Solea solea L.) in a beam trawl fishery in the Mediterranean Sea”, and is intended to facilitate the use of SOLEA: Survival toOL on a scEnario bAsis.
The tool aids the interpretation of fishing discard survival data by means of a scenario approach.
Starting from the onboard vitality assessment, the fishing condition parameters and a delayed survival experiment, the tool identifies the main mortality drivers occurring within the reference fishery and uses them to define the scenarios on which overall survival is calculated.
Download the input data and R code from here by selecting the green “clone or download” button, and store them in a single folder.
To run the tool, open Data_analysis.R and go to lines 23-29 to set some basic input parameters.
setwd: write the path to the folder where data and code are stored within the brackets. Alternatively, the working directory can be set by pressing Ctrl+Shift+H and navigating to the right folder.
surv_data: with the present code version, please type “Relative” to analyze delayed survival data with the Cox Relative Hazard model (Therneau and Grambsch 2000) and return overall survival in relative terms. Option for future development: by typing “Absolute”, the script will use the Kaplan-Meier model (Kaplan and Meier 1958) to assess delayed mortality and will give overall survival in absolute terms.
n_scenarios: write here how many scenarios you want to display. Note that a large number of scenarios (>4) requires a large amount of data.
censor: set the duration of the captive experiment (hours).
filename: write here the input file name.
setwd("~/CNR/SOS/github/SOLEA")
surv_data<-"Relative" ## Absolute if KM, relative if not
n_scenarios<-as.numeric(4)
censor<-as.numeric(120)
filename<-"Input_data.csv"
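Before running the rest of the script, the settings above can be checked for obvious mistakes. The sketch below is not part of the SOLEA code: `check_settings` is a hypothetical helper, and the conversion of `censor` from hours to days assumes that survival times in the input file are expressed in days, as the column name Survivability_days suggests.

```r
# Hypothetical helper (not part of SOLEA): sanity-check the user settings.
check_settings <- function(surv_data, n_scenarios, censor, filename) {
  stopifnot(
    surv_data %in% c("Relative", "Absolute"),
    is.numeric(n_scenarios), length(n_scenarios) == 1,
    n_scenarios >= 1, n_scenarios == round(n_scenarios),
    is.numeric(censor), length(censor) == 1, censor > 0,
    is.character(filename), grepl("\\.csv$", filename)
  )
  if (n_scenarios > 4) {
    warning("More than 4 scenarios requires a large amount of data.")
  }
  # Experiment duration in days (assumption: survival times are in days).
  invisible(list(censor_days = censor / 24))
}

settings <- check_settings("Relative", 4, 120, "Input_data.csv")
settings$censor_days  # 120 hours expressed in days
```

A wrong option, such as `surv_data = "KM"`, stops with an error instead of failing later in the analysis.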
list.of.packages <- c("tidyverse", "caret","rpart","rpart.plot","e1071","randomForest","survival","survminer","data.table","Boruta")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
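To give an idea of the two models behind the “Relative” and “Absolute” options of surv_data, the sketch below fits both to a small invented data set using the survival package. It is only an illustration of the cited methods, not the code used by SOLEA: the data, the variable names (`time`, `event`, `vitality`) and the choice of vitality class as the only covariate are assumptions.

```r
library(survival)

# Invented toy data: time in days; fish still alive at the end of the
# captive experiment (120 hours = 5 days) are right-censored.
censor_days <- 120 / 24
toy <- data.frame(
  time     = c(0.5, 1, 5, 5,  1.5, 5, 2, 5,  0.5, 1, 3, 5),
  vitality = factor(rep(c("A", "B", "C"), each = 4))
)
toy$event <- as.integer(toy$time < censor_days)  # 1 = death observed

# "Absolute" style: Kaplan-Meier survival per vitality class.
km <- survfit(Surv(time, event) ~ vitality, data = toy)
summary(km, times = censor_days)

# "Relative" style: Cox model; exp(coef) is the hazard of classes B and C
# relative to the reference class A.
cox <- coxph(Surv(time, event) ~ vitality, data = toy)
exp(coef(cox))
```

The Kaplan-Meier fit returns a survival probability for each class, whereas the Cox fit only expresses each class relative to a reference, which is why the two options report survival in absolute and relative terms respectively.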
The tool requires one data file in .csv format (“Input_data.csv”). Each row of the data file must contain the details of a single assessed specimen.
print(summary(Data))
##   Towing_speed      Delta_T            Catch_weight    Air_exposure
##  Min.   :5.500   Min.   :-16.200   Low        :932   Min.   :10.00
##  1st Qu.:6.000   1st Qu.: -9.300   Medium_Low :247   1st Qu.:18.00
##  Median :6.400   Median : -2.010   Medium_High:404   Median :25.00
##  Mean   :6.546   Mean   : -2.888   High       :284   Mean   :28.41
##  3rd Qu.:7.100   3rd Qu.:  2.500                     3rd Qu.:35.00
##  Max.   :7.700   Max.   : 10.660                     Max.   :78.00
##
##  Vitality_class  Towing_duration   Seabed_type  Survivability_days  status
##  A   : 158       Min.   : 29.00    CFM:1259     Min.   :0.500       0:1085
##  B   : 276       1st Qu.: 53.00    CMS:  67     1st Qu.:1.000       1: 782
##  C   : 348       Median : 58.00    CSM: 535     Median :3.000
##  Dead:1085       Mean   : 58.24    IFS:   6     Mean   :2.966
##                  3rd Qu.: 65.00                 3rd Qu.:5.000
##                  Max.   :100.00                 Max.   :5.000
##                                                 NA's   :1635
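The structure shown in the summary can be checked before the analysis. The sketch below builds a three-row toy table with the same column names and verifies that none is missing. All values are invented, and the coding of status (0 for fish dead on board, 1 otherwise) is an assumption based on the matching counts of “Dead” and “0” in the summary.

```r
# Column names taken from the summary output above.
required_cols <- c("Towing_speed", "Delta_T", "Catch_weight", "Air_exposure",
                   "Vitality_class", "Towing_duration", "Seabed_type",
                   "Survivability_days", "status")

# Toy rows with invented values, one specimen per row.
toy <- data.frame(
  Towing_speed       = c(6.0, 6.4, 7.1),
  Delta_T            = c(-9.3, -2.0, 2.5),
  Catch_weight       = factor(c("Low", "Medium_Low", "High"),
                              levels = c("Low", "Medium_Low",
                                         "Medium_High", "High")),
  Air_exposure       = c(18, 25, 35),
  Vitality_class     = factor(c("A", "Dead", "C"),
                              levels = c("A", "B", "C", "Dead")),
  Towing_duration    = c(53, 58, 65),
  Seabed_type        = factor(c("CFM", "CSM", "CFM")),
  Survivability_days = c(5, NA, 1),   # NA: not kept in the captive experiment
  status             = factor(c(1, 0, 1), levels = c(0, 1))
)

# Hypothetical checks: all required columns present, status consistent
# with the vitality class (assumed coding, see the note above).
missing_cols <- setdiff(required_cols, names(toy))
stopifnot(length(missing_cols) == 0)
stopifnot(all((toy$Vitality_class == "Dead") == (toy$status == "0")))
```

The same two checks can be run on the real table after it has been read into `Data`, replacing `toy` with `Data`.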
From this point on, the user does not need to modify the code, except where explicitly stated in this document. Running the code line by line is suggested.
After setting the input parameters correctly, run the code up to line 53.
Collinearity(Data)
## [1] "Check coplot in Figures folder"
Data<-Data %>% dplyr::select(-Towing_speed)
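The internals of Collinearity() are defined in the tool's code and are not reproduced here. As a generic illustration of why a variable such as Towing_speed may be dropped, the sketch below flags pairs of numeric predictors with a high rank correlation. The helper `flag_collinear`, the Spearman method, the 0.7 threshold and the simulated data are all assumptions made for the example.

```r
set.seed(1)
n <- 200
Towing_duration <- runif(n, 30, 100)
toy <- data.frame(
  Towing_duration = Towing_duration,
  # Invented relationship, only to produce one correlated pair for the demo.
  Towing_speed    = 8 - 0.02 * Towing_duration + rnorm(n, sd = 0.1),
  Air_exposure    = runif(n, 10, 78),
  Delta_T         = runif(n, -16, 10)
)

# Hypothetical helper: list variable pairs whose |correlation| exceeds
# the threshold.
flag_collinear <- function(df, threshold = 0.7) {
  cm <- cor(df, method = "spearman")
  cm[upper.tri(cm, diag = TRUE)] <- NA   # keep each pair only once
  idx <- which(abs(cm) > threshold, arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, "row"]],
             var2 = colnames(cm)[idx[, "col"]],
             rho  = cm[idx],
             stringsAsFactors = FALSE)
}

flagged <- flag_collinear(toy)
print(flagged)
```

When a pair is flagged, one of the two variables is removed so that the following steps do not split the importance between redundant predictors.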
This is an automated procedure that analyzes the importance of the remaining variables and excludes the uninformative ones.
## # A tibble: 5 x 2
## name value
## <chr> <fct>
## 1 Delta_T Confirmed
## 2 Catch_weight Confirmed
## 3 Air_exposure Confirmed
## 4 Towing_duration Confirmed
## 5 Seabed_type Confirmed
The random forest (RF) algorithm (Breiman 2001) is run on the selected features. The out-of-bag (OOB) estimate gives the error rate, calculated from the confusion matrix.
##
## Call:
## randomForest(formula = status ~ ., data = db, importance = TRUE, ntree = tree_best, mtry = try_best)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 25.44%
## Confusion matrix:
## 0 1 class.error
## 0 882 203 0.1870968
## 1 272 510 0.3478261
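The error figures in the output above can be reproduced by hand from the confusion matrix, where rows are the observed classes and columns the predicted ones. The sketch below uses base R only; the object names are chosen for the example.

```r
# Confusion matrix copied from the output above
# (rows: observed status, columns: predicted status).
cm <- matrix(c(882, 203,
               272, 510),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed = c("0", "1"),
                             predicted = c("0", "1")))

# Per-class error: share of each observed class that was misclassified.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: share of all specimens that were misclassified.
oob_error <- 1 - sum(diag(cm)) / sum(cm)

round(class_error, 7)      # 0.1870968 0.3478261
round(100 * oob_error, 2)  # 25.44
```

The class with status 1 is misclassified almost twice as often as the class with status 0, which the single OOB figure does not show on its own.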
## # A tibble: 2 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 B s 1.39 0.445 0.104 0.598
## 2 C s 1.48 0.475 0.0895 0.575
## # A tibble: 2 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 B ns -0.333 0.398 0.640 3.04
## 2 C s 1.03 0.420 0.157 0.813
## # A tibble: 2 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 B ns 0.256 0.494 0.294 2.04
## 2 C ns 0.626 0.449 0.222 1.29
## # A tibble: 2 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 B ns 0.611 0.379 0.258 1.14
## 2 C s 1.22 0.485 0.114 0.761
## # A tibble: 2 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 B s 0.537 0.208 0.388 0.879
## 2 C s 1.23 0.226 0.188 0.454
## # A tibble: 3 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A s 0.710 0.0815 0.889 0.567
## 2 B s 0.25 0.108 0.584 0.107
## 3 C ns 0.308 0.128 0.695 0.136
## # A tibble: 3 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A s 0.0833 0.0798 0.544 0.0128
## 2 B ns 0.286 0.0986 0.562 0.145
## 3 C s 0 NaN NA NA
## # A tibble: 3 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A s 0.375 0.121 0.706 0.199
## 2 B ns 0.364 0.145 0.795 0.166
## 3 C ns 0.231 0.117 0.623 0.0855
## # A tibble: 3 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A s 0.703 0.0751 0.867 0.570
## 2 B ns 0.525 0.0790 0.705 0.391
## 3 C ns 0.222 0.139 0.754 0.0655
## # A tibble: 3 x 1
## i$Indicator $sign $coef $se $ul $ll
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A s 0.573 0.0505 0.681 0.482
## 2 B s 0.398 0.0522 0.514 0.308
## 3 C s 0.188 0.0563 0.338 0.104
## [1] "Scenario : 3"
## # A tibble: 1 x 3
## # Groups: IS [1]
## IS Percentage Scenar
## <chr> <dbl> <chr>
## 1 1 0.221 3
## # A tibble: 3 x 3
## Indicator Percentage Scenar
## <fct> <dbl> <chr>
## 1 A 0.289 3
## 2 B 0.258 3
## 3 C 0.454 3
## # A tibble: 1 x 4
## RS upper_ci low_ci Scenario
## <dbl> <dbl> <dbl> <chr>
## 1 0.105 0.132 0.0863 3
## [1] "Scenario : 4"
## # A tibble: 1 x 3
## # Groups: IS [1]
## IS Percentage Scenar
## <chr> <dbl> <chr>
## 1 1 0.546 4
## # A tibble: 3 x 3
## Indicator Percentage Scenar
## <fct> <dbl> <chr>
## 1 A 0.198 4
## 2 B 0.446 4
## 3 C 0.356 4
## # A tibble: 1 x 4
## RS upper_ci low_ci Scenario
## <dbl> <dbl> <dbl> <chr>
## 1 0.273 0.338 0.232 4
## [1] "Scenario : 6"
## # A tibble: 1 x 3
## # Groups: IS [1]
## IS Percentage Scenar
## <chr> <dbl> <chr>
## 1 1 0.437 6
## # A tibble: 3 x 3
## Indicator Percentage Scenar
## <fct> <dbl> <chr>
## 1 A 0.135 6
## 2 B 0.326 6
## 3 C 0.539 6
## # A tibble: 1 x 4
## RS upper_ci low_ci Scenario
## <dbl> <dbl> <dbl> <chr>
## 1 0.198 0.198 0.198 6
## [1] "Scenario : 7"
## # A tibble: 1 x 3
## # Groups: IS [1]
## IS Percentage Scenar
## <chr> <dbl> <chr>
## 1 1 0.716 7
## # A tibble: 3 x 3
## Indicator Percentage Scenar
## <fct> <dbl> <chr>
## 1 A 0.182 7
## 2 B 0.390 7
## 3 C 0.428 7
## # A tibble: 1 x 4
## RS upper_ci low_ci Scenario
## <dbl> <dbl> <dbl> <chr>
## 1 0.335 0.436 0.274 7
## [1] "Scenario : Aggregate"
## # A tibble: 1 x 3
## # Groups: IS [1]
## IS Percentage Scenar
## <chr> <dbl> <chr>
## 1 1 0.419 Aggregate
## # A tibble: 3 x 3
## Indicator Percentage Scenar
## <fct> <dbl> <chr>
## 1 A 0.202 Aggregate
## 2 B 0.353 Aggregate
## 3 C 0.445 Aggregate
## # A tibble: 1 x 4
## RS upper_ci low_ci Scenario
## <dbl> <dbl> <dbl> <chr>
## 1 0.228 0.268 0.194 Aggregate
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Friedman, J H. 2001. “Greedy function approximation: a gradient boosting machine.” Annals of Statistics 29 (5): 1189–1232. https://projecteuclid.org/euclid.aos/1013203451.
ICES. 2014. “Report of the Workshop on Methods for Estimating Discard Survival (WKMEDS). ICES HQ, Copenhagen, Denmark.” ICES CM 2014/ACOM:51. 114pp.
Kaplan, E. L., and Paul Meier. 1958. “Nonparametric Estimation from Incomplete Observations.” Journal of the American Statistical Association 53 (282): 457–81. https://doi.org/10.1080/01621459.1958.10501452.
Therneau, Terry M., and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. Statistics for Biology and Health. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4757-3294-8.
Zuur, Alain F., Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed effects models and extensions in ecology with R. Vol. 58. Statistics for Biology and Health 12. Springer-Verlag New York, NY, USA. https://doi.org/10.1007/978-0-387-87458-6.