VBTree, short for vector binary tree, is a data structure designed for data whose column names are highly structured. Summary tables are often built from such names. In my case, the data came from a series of experiments with 2 data types, 7 temperature conditions, 4 strain rate conditions and 1 deformation rate, so the summary table must contain \(2 \times 7 \times 4 \times 1 = 56\) columns. I extracted the first 50 rows and saved them as ‘datatest’. Let’s see what it looks like:
library(VBTree)
dim(datatest)
#> [1] 50 56
head(datatest[,1:3])
#> Strain-900-0.001-0.6 Stress-900-0.001-0.6 Strain-900-0.01-0.6
#> 1 0.00009 2.81 0.00030
#> 2 0.00026 2.95 0.00052
#> 3 0.00059 3.37 0.00068
#> 4 0.00076 3.37 0.00084
#> 5 0.00093 3.65 0.00112
#> 6 0.00104 3.79 0.00122
colnames(datatest)
#> [1] "Strain-900-0.001-0.6" "Stress-900-0.001-0.6" "Strain-900-0.01-0.6"
#> [4] "Stress-900-0.01-0.6" "Strain-900-0.1-0.6" "Stress-900-0.1-0.6"
#> [7] "Strain-900-1-0.6" "Stress-900-1-0.6" "Strain-950-0.001-0.6"
#> [10] "Stress-950-0.001-0.6" "Strain-950-0.01-0.6" "Stress-950-0.01-0.6"
#> [13] "Strain-950-0.1-0.6" "Stress-950-0.1-0.6" "Strain-950-1-0.6"
#> [16] "Stress-950-1-0.6" "Strain-1000-0.001-0.6" "Stress-1000-0.001-0.6"
#> [19] "Strain-1000-0.01-0.6" "Stress-1000-0.01-0.6" "Strain-1000-0.1-0.6"
#> [22] "Stress-1000-0.1-0.6" "Strain-1000-1-0.6" "Stress-1000-1-0.6"
#> [25] "Strain-1050-0.001-0.6" "Stress-1050-0.001-0.6" "Strain-1050-0.01-0.6"
#> [28] "Stress-1050-0.01-0.6" "Strain-1050-0.1-0.6" "Stress-1050-0.1-0.6"
#> [31] "Strain-1050-1-0.6" "Stress-1050-1-0.6" "Strain-1100-0.001-0.6"
#> [34] "Stress-1100-0.001-0.6" "Strain-1100-0.01-0.6" "Stress-1100-0.01-0.6"
#> [37] "Strain-1100-0.1-0.6" "Stress-1100-0.1-0.6" "Strain-1100-1-0.6"
#> [40] "Stress-1100-1-0.6" "Strain-1150-0.001-0.6" "Stress-1150-0.001-0.6"
#> [43] "Strain-1150-0.01-0.6" "Stress-1150-0.01-0.6" "Strain-1150-0.1-0.6"
#> [46] "Stress-1150-0.1-0.6" "Strain-1150-1-0.6" "Stress-1150-1-0.6"
#> [49] "Strain-1200-0.001-0.6" "Stress-1200-0.001-0.6" "Strain-1200-0.01-0.6"
#> [52] "Stress-1200-0.01-0.6" "Strain-1200-0.1-0.6" "Stress-1200-0.1-0.6"
#> [55] "Strain-1200-1-0.6" "Stress-1200-1-0.6"
Sometimes I need to extract the data at a fixed temperature, while in other circumstances I have to export the data at a fixed strain rate. How can that be done without hard-coding the number of repetitions in a for or while loop?
The main idea is to place all the column names into an array or tensor with dimensions \(2 \times 7 \times 4 \times 1\), so that array or tensor methods become applicable. Because the names repeat regularly in different combinations, it is natural to first split each name into its factors and then store those factors in a data structure that maps correctly between a character vector and an array or tensor. Two intermediate data structures, the double list and the vector binary tree, were designed for this purpose. Here is what they look like:
# Save character vector into chrvec:
chrvec <- colnames(datatest)
unregdl <- chrvec2dl(chrvec) # unregularized double list
print(unregdl) # Purely numeric layers (e.g. layer 2) are not in numeric order, because all elements are treated as character
#> [[1]]
#> [1] "Strain" "Stress"
#>
#> [[2]]
#> [1] "1000" "1050" "1100" "1150" "1200" "900" "950"
#>
#> [[3]]
#> [1] "0.001" "0.01" "0.1" "1"
#>
#> [[4]]
#> [1] "0.6"
#>
#> attr(,"class")
#> [1] "Double.List"
vbt <- dl2vbt(unregdl)
print(vbt) # elements in layer 2 were sorted
#> $tree
#> $tree[[1]]
#> [1] "Strain" "Stress"
#>
#> $tree[[2]]
#> $tree[[2]][[1]]
#> [1] "900" "950" "1000" "1050" "1100" "1150" "1200"
#>
#> $tree[[2]][[2]]
#> $tree[[2]][[2]][[1]]
#> [1] "0.001" "0.01" "0.1" "1"
#>
#> $tree[[2]][[2]][[2]]
#> $tree[[2]][[2]][[2]][[1]]
#> [1] "0.6"
#>
#> $tree[[2]][[2]][[2]][[2]]
#> list()
#>
#>
#>
#>
#>
#> $dims
#> [1] 2 7 4 1
#>
#> attr(,"class")
#> [1] "Vector.Binary.Tree"
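To make the splitting step concrete, here is a minimal base-R sketch of the idea. It is not the package's implementation: `split_layers` is a hypothetical helper, and it assumes that every name uses "-" as its separator and has the same number of fields.

```r
# Hypothetical helper, not part of VBTree: split each name on a separator
# and collect the distinct values found at each position (layer).
split_layers <- function(names_vec, sep = "-") {
  parts <- strsplit(names_vec, sep, fixed = TRUE)
  n_layers <- unique(lengths(parts))
  stopifnot(length(n_layers) == 1)  # every name must have the same depth
  lapply(seq_len(n_layers), function(k) {
    # Values are compared as character strings, so "1000" sorts before "900"
    sort(unique(vapply(parts, `[`, character(1), k)), method = "radix")
  })
}

demo_names <- c("Strain-900-0.001-0.6", "Stress-900-0.001-0.6",
                "Strain-1000-0.01-0.6", "Stress-1000-0.01-0.6")
layers <- split_layers(demo_names)
print(layers)
```

The second layer comes back in character order, which is the same effect seen in the unregularized double list above.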
The column names are thus split into four layers in both the double list and the vector binary tree, with 2, 7, 4 and 1 levels respectively. With these data structures, the full set of names can readily be converted into a tensor or an array, starting from either the double list or the vector binary tree:
ts <- dl2ts(unregdl) # Convert from double list to tensor
print(ts)
#> , , 1, 1
#>
#> I2
#> I1 [,1] [,2] [,3]
#> [1,] "Strain-900-0.001-0.6" "Strain-950-0.001-0.6" "Strain-1000-0.001-0.6"
#> [2,] "Stress-900-0.001-0.6" "Stress-950-0.001-0.6" "Stress-1000-0.001-0.6"
#> I2
#> I1 [,4] [,5] [,6]
#> [1,] "Strain-1050-0.001-0.6" "Strain-1100-0.001-0.6" "Strain-1150-0.001-0.6"
#> [2,] "Stress-1050-0.001-0.6" "Stress-1100-0.001-0.6" "Stress-1150-0.001-0.6"
#> I2
#> I1 [,7]
#> [1,] "Strain-1200-0.001-0.6"
#> [2,] "Stress-1200-0.001-0.6"
#>
#> , , 2, 1
#>
#> I2
#> I1 [,1] [,2] [,3]
#> [1,] "Strain-900-0.01-0.6" "Strain-950-0.01-0.6" "Strain-1000-0.01-0.6"
#> [2,] "Stress-900-0.01-0.6" "Stress-950-0.01-0.6" "Stress-1000-0.01-0.6"
#> I2
#> I1 [,4] [,5] [,6]
#> [1,] "Strain-1050-0.01-0.6" "Strain-1100-0.01-0.6" "Strain-1150-0.01-0.6"
#> [2,] "Stress-1050-0.01-0.6" "Stress-1100-0.01-0.6" "Stress-1150-0.01-0.6"
#> I2
#> I1 [,7]
#> [1,] "Strain-1200-0.01-0.6"
#> [2,] "Stress-1200-0.01-0.6"
#>
#> , , 3, 1
#>
#> I2
#> I1 [,1] [,2] [,3]
#> [1,] "Strain-900-0.1-0.6" "Strain-950-0.1-0.6" "Strain-1000-0.1-0.6"
#> [2,] "Stress-900-0.1-0.6" "Stress-950-0.1-0.6" "Stress-1000-0.1-0.6"
#> I2
#> I1 [,4] [,5] [,6]
#> [1,] "Strain-1050-0.1-0.6" "Strain-1100-0.1-0.6" "Strain-1150-0.1-0.6"
#> [2,] "Stress-1050-0.1-0.6" "Stress-1100-0.1-0.6" "Stress-1150-0.1-0.6"
#> I2
#> I1 [,7]
#> [1,] "Strain-1200-0.1-0.6"
#> [2,] "Stress-1200-0.1-0.6"
#>
#> , , 4, 1
#>
#> I2
#> I1 [,1] [,2] [,3]
#> [1,] "Strain-900-1-0.6" "Strain-950-1-0.6" "Strain-1000-1-0.6"
#> [2,] "Stress-900-1-0.6" "Stress-950-1-0.6" "Stress-1000-1-0.6"
#> I2
#> I1 [,4] [,5] [,6]
#> [1,] "Strain-1050-1-0.6" "Strain-1100-1-0.6" "Strain-1150-1-0.6"
#> [2,] "Stress-1050-1-0.6" "Stress-1100-1-0.6" "Stress-1150-1-0.6"
#> I2
#> I1 [,7]
#> [1,] "Strain-1200-1-0.6"
#> [2,] "Stress-1200-1-0.6"
#>
#> attr(,"class")
#> [1] "tensor"
arr <- vbt2arr(vbt) # Convert from vector binary tree to array
print(arr)
#> , , 1, 1
#>
#> [,1] [,2] [,3]
#> [1,] "Strain-900-0.001-0.6" "Strain-950-0.001-0.6" "Strain-1000-0.001-0.6"
#> [2,] "Stress-900-0.001-0.6" "Stress-950-0.001-0.6" "Stress-1000-0.001-0.6"
#> [,4] [,5] [,6]
#> [1,] "Strain-1050-0.001-0.6" "Strain-1100-0.001-0.6" "Strain-1150-0.001-0.6"
#> [2,] "Stress-1050-0.001-0.6" "Stress-1100-0.001-0.6" "Stress-1150-0.001-0.6"
#> [,7]
#> [1,] "Strain-1200-0.001-0.6"
#> [2,] "Stress-1200-0.001-0.6"
#>
#> , , 2, 1
#>
#> [,1] [,2] [,3]
#> [1,] "Strain-900-0.01-0.6" "Strain-950-0.01-0.6" "Strain-1000-0.01-0.6"
#> [2,] "Stress-900-0.01-0.6" "Stress-950-0.01-0.6" "Stress-1000-0.01-0.6"
#> [,4] [,5] [,6]
#> [1,] "Strain-1050-0.01-0.6" "Strain-1100-0.01-0.6" "Strain-1150-0.01-0.6"
#> [2,] "Stress-1050-0.01-0.6" "Stress-1100-0.01-0.6" "Stress-1150-0.01-0.6"
#> [,7]
#> [1,] "Strain-1200-0.01-0.6"
#> [2,] "Stress-1200-0.01-0.6"
#>
#> , , 3, 1
#>
#> [,1] [,2] [,3]
#> [1,] "Strain-900-0.1-0.6" "Strain-950-0.1-0.6" "Strain-1000-0.1-0.6"
#> [2,] "Stress-900-0.1-0.6" "Stress-950-0.1-0.6" "Stress-1000-0.1-0.6"
#> [,4] [,5] [,6]
#> [1,] "Strain-1050-0.1-0.6" "Strain-1100-0.1-0.6" "Strain-1150-0.1-0.6"
#> [2,] "Stress-1050-0.1-0.6" "Stress-1100-0.1-0.6" "Stress-1150-0.1-0.6"
#> [,7]
#> [1,] "Strain-1200-0.1-0.6"
#> [2,] "Stress-1200-0.1-0.6"
#>
#> , , 4, 1
#>
#> [,1] [,2] [,3]
#> [1,] "Strain-900-1-0.6" "Strain-950-1-0.6" "Strain-1000-1-0.6"
#> [2,] "Stress-900-1-0.6" "Stress-950-1-0.6" "Stress-1000-1-0.6"
#> [,4] [,5] [,6]
#> [1,] "Strain-1050-1-0.6" "Strain-1100-1-0.6" "Strain-1150-1-0.6"
#> [2,] "Stress-1050-1-0.6" "Stress-1100-1-0.6" "Stress-1150-1-0.6"
#> [,7]
#> [1,] "Strain-1200-1-0.6"
#> [2,] "Stress-1200-1-0.6"
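The mapping from layers to an array of names can also be sketched in base R. The block below is only an illustration under the assumption that the layers are already in the desired order; it uses a shortened set of levels and does not call any VBTree function.

```r
# Shortened layers for illustration (assumed to be already regularized)
layers <- list(c("Strain", "Stress"),
               c("900", "950", "1000"),
               c("0.001", "0.01"),
               "0.6")

# expand.grid varies the first layer fastest, which matches the
# column-major order R uses when filling an array.
grid <- expand.grid(layers, stringsAsFactors = FALSE)
name_arr <- array(do.call(paste, c(grid, sep = "-")), dim = lengths(layers))

dim(name_arr)
name_arr[2, 3, 2, 1]  # "Stress" at the 3rd temperature and 2nd strain rate
```

Each index of the array corresponds to one layer, so fixing an index fixes one experimental condition.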
Because the regularized double list, the vector binary tree and the tensor (or array) map onto one another uniquely, a regularized double list is needed to set indices correctly:
regdl <- vbt2dl(vbt)
print(regdl)
#> [[1]]
#> [1] "Strain" "Stress"
#>
#> [[2]]
#> [1] "900" "950" "1000" "1050" "1100" "1150" "1200"
#>
#> [[3]]
#> [1] "0.001" "0.01" "0.1" "1"
#>
#> [[4]]
#> [1] "0.6"
#>
#> attr(,"class")
#> [1] "Double.List"
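The difference between the unregularized and the regularized layer 2 comes down to character versus numeric ordering. A small base-R sketch of that difference follows; `sort_layer` is a hypothetical helper written for this illustration and is not claimed to be how VBTree regularizes a layer.

```r
# Hypothetical helper: order a layer numerically when every element
# parses as a number, otherwise fall back to character order.
sort_layer <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  if (anyNA(num)) sort(x, method = "radix") else x[order(num)]
}

temps <- c("1000", "1050", "1100", "1150", "1200", "900", "950")
sort(temps, method = "radix")  # character order keeps "900" after "1200"
sort_layer(temps)              # numeric order starts at "900"
sort_layer(c("Stress", "Strain"))
```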
It can be seen that the temperatures were saved in layer 2 with 7 levels, while the strain rates were saved in layer 3 with 4 levels. Array methods are now available. For example, to take the ‘Stress’ data (1st layer, 2nd level) across all temperature conditions at a fixed strain rate of 0.01 (3rd layer, 2nd level), execute the following code:
subset1 <- datatest[, arr[2,,2,1]]
head(subset1)
#> Stress-900-0.01-0.6 Stress-950-0.01-0.6 Stress-1000-0.01-0.6
#> 1 3.37 2.39 3.65
#> 2 3.79 2.81 3.65
#> 3 4.07 2.67 3.65
#> 4 4.49 2.95 3.65
#> 5 5.05 3.23 3.93
#> 6 5.61 3.37 3.93
#> Stress-1050-0.01-0.6 Stress-1100-0.01-0.6 Stress-1150-0.01-0.6
#> 1 2.670 0.000 3.091
#> 2 2.669 2.951 3.231
#> 3 2.949 3.370 3.370
#> 4 3.229 3.369 3.509
#> 5 3.509 3.508 3.788
#> 6 3.509 3.367 4.067
#> Stress-1200-0.01-0.6
#> 1 3.231
#> 2 3.511
#> 3 3.650
#> 4 3.790
#> 5 4.069
#> 6 4.208
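The positions used in the index do not have to be counted by hand. As a sketch, assuming the layers are available as a plain list of character vectors, base R's `match` can look them up by level name; the array is rebuilt here with `expand.grid` only so that the block is self-contained.

```r
layers <- list(c("Strain", "Stress"),
               c("900", "950", "1000", "1050", "1100", "1150", "1200"),
               c("0.001", "0.01", "0.1", "1"),
               "0.6")

# Look up positions by level name instead of hard-coding them
i_type <- match("Stress", layers[[1]])
i_rate <- match("0.01", layers[[3]])

# Stand-in for the name array (built without VBTree for this sketch)
grid <- expand.grid(layers, stringsAsFactors = FALSE)
name_arr <- array(do.call(paste, c(grid, sep = "-")), dim = lengths(layers))

cols <- name_arr[i_type, , i_rate, 1]
print(cols)
```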
To draw the Stress-Strain plots automatically at a fixed temperature (1050 for example, the 4th level of the 2nd layer) across all strain rate conditions, try the following code:
xbatch <- arr[1,4,,1]
ybatch <- arr[2,4,,1]
regdl <- arr2dl(arr)
# Draw one Stress-Strain plot per strain rate condition
for (i in seq_along(xbatch)) {
  plot(datatest[,xbatch[i]], datatest[,ybatch[i]], xlab="Strain", ylab="Stress", main=paste("in T=1050, SR=",regdl[[3]][i], sep = ""))
}
Tensors are indexed in the same way as arrays.
For highly customized selections, the vector binary tree becomes useful. Suppose, for example, that I need the Stress-Strain plots for temperatures from 1000 to 1200 at the two strain rates 0.01 and 1. The vector binary tree can be queried through a handmade double list, which can be customized freely. First, let us look at the full vector binary tree:
print(vbt)
#> $tree
#> $tree[[1]]
#> [1] "Strain" "Stress"
#>
#> $tree[[2]]
#> $tree[[2]][[1]]
#> [1] "900" "950" "1000" "1050" "1100" "1150" "1200"
#>
#> $tree[[2]][[2]]
#> $tree[[2]][[2]][[1]]
#> [1] "0.001" "0.01" "0.1" "1"
#>
#> $tree[[2]][[2]][[2]]
#> $tree[[2]][[2]][[2]][[1]]
#> [1] "0.6"
#>
#> $tree[[2]][[2]][[2]][[2]]
#> list()
#>
#>
#>
#>
#>
#> $dims
#> [1] 2 7 4 1
#>
#> attr(,"class")
#> [1] "Vector.Binary.Tree"
The desired elements are the 3rd to the 7th in layer 2, and the 2nd and the 4th in layer 3. We can make two double lists to specify and extract the desired Stress and Strain subsets:
subStrain_dl <- list(1, c(3:7), c(2,4), 1)
subStress_dl <- list(2, c(3:7), c(2,4), 1)
# Query the original vector binary tree
# and save the results as new double lists:
subStrain_dl2 <- advbtinq(vbt, subStrain_dl)
subStress_dl2 <- advbtinq(vbt, subStress_dl)
print(subStrain_dl2)
#> [[1]]
#> [1] "Strain"
#>
#> [[2]]
#> [1] "1000" "1050" "1100" "1150" "1200"
#>
#> [[3]]
#> [1] "0.01" "1"
#>
#> [[4]]
#> [1] "0.6"
#>
#> attr(,"class")
#> [1] "Double.List"
print(subStress_dl2)
#> [[1]]
#> [1] "Stress"
#>
#> [[2]]
#> [1] "1000" "1050" "1100" "1150" "1200"
#>
#> [[3]]
#> [1] "0.01" "1"
#>
#> [[4]]
#> [1] "0.6"
#>
#> attr(,"class")
#> [1] "Double.List"
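Conceptually, such a query picks the requested positions out of each layer. The base-R sketch below mimics that idea on a plain list. It is an assumption-laden illustration, not the implementation of `advbtinq`, and the variable names are made up for this example.

```r
layers <- list(c("Strain", "Stress"),
               c("900", "950", "1000", "1050", "1100", "1150", "1200"),
               c("0.001", "0.01", "0.1", "1"),
               "0.6")
sel <- list(1, 3:7, c(2, 4), 1)  # positions to keep in each layer

# Keep only the selected levels, layer by layer
sub_layers <- Map(function(lv, i) lv[i], layers, sel)

# Rebuild the matching names; the first layer varies fastest
sub_grid <- expand.grid(sub_layers, stringsAsFactors = FALSE)
sub_names <- do.call(paste, c(sub_grid, sep = "-"))
print(sub_names)
```

Swapping the first element of `sel` from 1 to 2 gives the corresponding ‘Stress’ names in the same order.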
xbatch2 <- as.vector(dl2arr(subStrain_dl2))
ybatch2 <- as.vector(dl2arr(subStress_dl2))
print(xbatch2)
#> [1] "Strain-1000-0.01-0.6" "Strain-1050-0.01-0.6" "Strain-1100-0.01-0.6"
#> [4] "Strain-1150-0.01-0.6" "Strain-1200-0.01-0.6" "Strain-1000-1-0.6"
#> [7] "Strain-1050-1-0.6" "Strain-1100-1-0.6" "Strain-1150-1-0.6"
#> [10] "Strain-1200-1-0.6"
print(ybatch2)
#> [1] "Stress-1000-0.01-0.6" "Stress-1050-0.01-0.6" "Stress-1100-0.01-0.6"
#> [4] "Stress-1150-0.01-0.6" "Stress-1200-0.01-0.6" "Stress-1000-1-0.6"
#> [7] "Stress-1050-1-0.6" "Stress-1100-1-0.6" "Stress-1150-1-0.6"
#> [10] "Stress-1200-1-0.6"
The two vectors are in exactly the same order. The next step is the same as in the previous section.
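A possible version of that step is sketched below on toy data, since the original plotting code is not reproduced here. The data frame `toy` and its random values are placeholders, and only two of the ten name pairs are used to keep the example short.

```r
xbatch2 <- c("Strain-1000-0.01-0.6", "Strain-1050-0.01-0.6")
ybatch2 <- c("Stress-1000-0.01-0.6", "Stress-1050-0.01-0.6")

# After dropping the first field, paired names should share their conditions
same_order <- identical(sub("^[^-]+-", "", xbatch2),
                        sub("^[^-]+-", "", ybatch2))
stopifnot(same_order)

# Placeholder data standing in for datatest
set.seed(1)
toy <- as.data.frame(matrix(runif(5 * 4), nrow = 5))
colnames(toy) <- c(xbatch2, ybatch2)

for (i in seq_along(xbatch2)) {
  cond <- strsplit(xbatch2[i], "-", fixed = TRUE)[[1]]  # type, T, SR, 0.6
  plot(toy[, xbatch2[i]], toy[, ybatch2[i]], xlab = "Strain", ylab = "Stress",
       main = paste0("T=", cond[2], ", SR=", cond[3]))
}
```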
It is commonly said that R is relatively slow compared with other popular programming languages, especially when data operations such as melt and reshape are frequent. In my opinion, an efficient logic for data management matters more than clever data-processing tricks. None of the demos above performs any melt, bind or reshape operation on the original data, yet batch processing of the data can still be implemented.
Let's check the sizes of all the objects we used:
# For original data:
object.size(datatest)
#> 31072 bytes
# For tensor and array:
object.size(ts)
#> 6400 bytes
object.size(arr)
#> 5568 bytes
# For vector binary tree:
object.size(vbt)
#> 2080 bytes
# For double list:
object.size(regdl)
#> 1408 bytes
The datatest packaged in VBTree contains only the first 50 rows, purely for demonstration. The full data set has far more than 50 rows. All of these data can be managed in a structured way through VBTree, using an object as small as 1408 bytes.
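The reason the index objects stay small is that they depend only on the column names, not on the number of rows. The following base-R sketch illustrates this with made-up column names and zero-filled data; the exact byte counts depend on the R version and platform, so none are shown.

```r
# Made-up column names standing in for the 56 structured names
nm <- sprintf("V%02d", 1:56)

make_df <- function(n_rows) {
  as.data.frame(matrix(0, nrow = n_rows, ncol = 56,
                       dimnames = list(NULL, nm)))
}
small <- make_df(50)
large <- make_df(5000)

# The name array is the same for both data frames
idx <- array(nm, dim = c(2, 7, 4, 1))

object.size(small)
object.size(large)
object.size(idx)
```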