Multiple Writers Cross Validation

From SeedWiki

Our Data

We collected 10 examples of each lowercase English letter from each of 10 different writers, for a total of ~2600 examples.

CV Method

Train on data from 9 writers and test on the remaining writer, so each of the 10 folds holds out one writer.
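The split described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function name `leave_one_writer_out` and the toy `writer_ids` layout (10 writers, 26 letters, 10 examples each) are assumptions made for the example.

```python
from collections import defaultdict

def leave_one_writer_out(writer_ids):
    """Yield (held_out_writer, train_idx, test_idx), one fold per writer."""
    by_writer = defaultdict(list)
    for idx, writer in enumerate(writer_ids):
        by_writer[writer].append(idx)
    for held_out in sorted(by_writer):
        # Test on every sample of the held-out writer, train on all others.
        test_idx = by_writer[held_out]
        train_idx = [i for w, idxs in sorted(by_writer.items())
                     if w != held_out for i in idxs]
        yield held_out, train_idx, test_idx

# Hypothetical layout: 10 writers x 26 letters x 10 examples = 2600 samples.
writer_ids = [w for w in range(10) for _ in range(26 * 10)]
folds = list(leave_one_writer_out(writer_ids))
```

Splitting by writer rather than by sample keeps every test writer unseen during SVM training, which is what makes the reported accuracy a writer-independent estimate.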

10-fold CV accuracy by decision rule:

| Model | Number of derivatives | 1-vs-rest | 1-vs-rest (using prediction set) | 1-vs-1 (majority vote) |
|---|---|---|---|---|
| Single branch 20-state HMM + linear SVMs | 161 | 75.37% (+/- 11.07) | 79.18% (+/- 10.22) | 80.42% (+/- 9.41) |
| Single branch (14,6)-state HMM + linear SVMs | 147 | 69.83% (+/- 11.94) | 73.41% (+/- 12.33) | 75.22% (+/- 9.90) |
| 2-branch (7,3)-state HMM + linear SVMs | 99 | 61.35% (+/- 8.67) | 70.98% (+/- 10.09) | 68.40% (+/- 8.88) |
| Single branch 30-state HMM + linear SVMs | 215 | 74.71% (+/- 10.87) | 77.45% (+/- 10.81) | 78.45% (+/- 10.13) |
| Single branch (20,10)-state HMM + linear SVMs | 189 | 74.49% (+/- 9.86) | 77.76% (+/- 9.0) | 78.88% (+/- 9.48) |
| 2-branch (15,15)-state HMM + linear SVMs | 159 | 69.17% (+/- 10.44) | 74.64% (+/- 10.45) | 74.52% (+/- 9.76) |
| Single branch 10-state HMM + linear SVMs | 50 | 68.36% (+/- 11.19) | 75.23% (+/- 12.27) | 74.79% (+/- 10.50) |
| Single branch 15-state HMM + linear SVMs | 86 | 72.86% (+/- 11.00) | 76.94% (+/- 10.97) | 75.99% (+/- 11.19) |

Notes

  • 1-vs-rest = Always predicts the label with the highest SVM score.
  • 1-vs-rest (using prediction set) = The prediction set contains every label whose SVM score is within 0.5 of the maximum, i.e. score >= max(SVM scores) - 0.5. A mistake is counted only when the prediction set does not contain the true label.
  • 1-vs-1 = Instead of building N SVMs, this method builds N(N-1)/2 SVMs, one per pair of classes, and the winning class is the one that gets the majority vote.
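The three decision rules in the notes above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy scores, and the tie-breaking behaviour (the first label encountered wins) are not taken from the experiments, and `pairwise_winner` stands in for a trained pairwise SVM.

```python
from collections import Counter
from itertools import combinations

def one_vs_rest(scores):
    """scores: dict mapping label -> SVM score. Return the top-scoring label."""
    return max(scores, key=scores.get)

def prediction_set(scores, margin=0.5):
    """Return every label whose score is within `margin` of the best score."""
    best = max(scores.values())
    return {label for label, s in scores.items() if s >= best - margin}

def one_vs_one(pairwise_winner, labels):
    """Majority vote over all N(N-1)/2 label pairs.

    pairwise_winner(a, b) is a hypothetical stand-in for the pairwise SVM
    and must return either a or b.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(labels, 2))
    return votes.most_common(1)[0][0]

# Toy scores for one sample whose true label is "d".
scores = {"a": 1.2, "d": 0.9, "q": 0.1}
top = one_vs_rest(scores)            # "a": counted as a mistake
candidates = prediction_set(scores)  # {"a", "d"}: contains the true label
voted = one_vs_one(lambda a, b: a if scores[a] >= scores[b] else b,
                   list(scores))
n_pairwise_svms = len(list(combinations(range(26), 2)))  # 26 letters
```

With 26 letters the 1-vs-1 scheme needs 325 pairwise SVMs instead of 26. The toy sample also shows why the prediction-set accuracy can never be lower than plain 1-vs-rest: the set always contains the top-scoring label.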

Q & A

  • What is the structure of the HMM?
    • Single branch N-state is an N-state left-to-right (with skipping) HMM. This type of HMM has only 1 pen-up state.
    • Single branch (M,N)-state is essentially two single-branch HMMs, one with M states and one with N states, joined together. This type of HMM has two pen-up states.
    • 2-branch (M,N)-state is a conjunction of two single branch (M,N)-state HMMs. Each branch is trained independently using a disjoint set of the training data.
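To make the left-to-right (with skipping) topology concrete, here is a minimal sketch that builds a mask of allowed state transitions. It assumes that "skipping" means jumping over at most one state (`max_skip=1`), which this page does not specify, and it leaves out the pen-up states entirely. The function name `left_to_right_mask` is hypothetical.

```python
def left_to_right_mask(n_states, max_skip=1):
    """Return mask[i][j] == True when the transition i -> j is allowed.

    Allowed moves from state i: stay (j == i), advance (j == i + 1),
    or skip up to `max_skip` states (j <= i + 1 + max_skip).
    Backward transitions are never allowed.
    """
    return [[i <= j <= i + 1 + max_skip for j in range(n_states)]
            for i in range(n_states)]

mask = left_to_right_mask(5)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Transition probabilities outside the mask would be fixed at zero during training, so the number of free parameters grows roughly linearly with the number of states.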

--Yoavfreund 12:03, 25 January 2012 (PST) From the test results, the single branch 20-state HMM is the one to use.

  • Did you retrain the HMM for each train/test fold or did you just retrain the SVM?
    • Initially I did retrain the HMM for each train/test fold. However, doing so was very computationally expensive. Each fold took about 4 hours to complete. To get some initial results up quickly, I decided to train HMMs using all data and use them throughout the experiment.

--Yoavfreund 12:03, 25 January 2012 (PST) I think it makes sense for our project to train the HMM using all of the writers and then train a different SVM for each writer. The way you did it here (leaving one writer out) makes sense in terms of generalization to new writers. It would also be interesting to train the SVM using a combination of the data for the writer and the other writers and then test it on the same writer in a different session.

  • What is the distribution of error rates among the 10 users/folds? Is some people's handwriting significantly easier or harder to recognize than others'?
    • Figures (per-writer CV score distributions): Cv score dist1.jpg (single branch 20-state HMM), Cv score dist2.jpg (single branch (14,6)-state HMM), Cv score dist3.jpg (2-branch (7,3)-state HMM).

--Yoavfreund 12:12, 25 January 2012 (PST) Writer 5 is clearly the worst, is that me? More to the point, it would be interesting to see if we can get most of the writers to the competitive 90% level by training the SVM for the particular writer and by teaching the writer how to write confusable letters in a way that they will not be confused.

  • What is the confusion matrix like? Are there more/less confusable character pairs?
    • Here is the confusion matrix from all folds using the 1-vs-rest single branch 20-state HMM + linear SVMs.
   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
a 67  0  0 17  2  2  1  0  0  0  2  0  0  0  0  0  5  0  0  0  1  0  1  0  0  2
b  0 78  0  0  0  0  0  1  1  1  0  0  0  0  0 16  0  2  0  0  0  0  0  0  1  0
c  1  0 83  0  4  2  1  0  0  0  1  0  0  0  0  0  6  0  1  0  0  0  0  1  0  0
d 19  0  3 61  0  0  1  0  0  0  8  0  0  0  0  0  3  0  1  0  2  0  1  0  0  0
e  0  0  8  2 83  3  0  0  0  0  0  0  0  0  2  0  1  0  0  0  1  0  0  0  0  0
f  1  1  1  1  0 65  3  1  3  0  5  0  0  0  1  1  0  1  1  9  0  1  1  0  3  1
g  0  0  0  0  0  0 81  0  1  0  0  0  0  0  0  0  6  0  6  0  0  0  0  0  5  0
h  0  0  0  1  0  0  0 56  0  0  2  0  0 35  0  2  0  2  0  0  0  0  1  1  0  0
i  0  0  1  0  1  4  0  3 55  1  6 13  2  0  0  2  3  0  0  6  0  0  2  1  0  0
j  0  0  0  1  0  4  0  0  2 74  1  4  0  0  0  0  2  0  0  4  0  0  5  0  3  0
k  4  0  0 13  0  0  0  1  4  0 58  1  0  1  0  1  2  2  0  5  0  0  5  0  1  1
l  0  0  0  0  0  4  0  0  6  0  0 84  0  0  0  1  0  0  0  0  0  5  0  0  0  0
m  0  0  0  0  0  0  0  0  2  0  6  0 90  2  0  0  0  0  0  0  0  0  0  0  0  0
n  0  0  0  1  0  0  0 20  0  0  1  0  0 70  0  2  0  1  1  1  2  0  1  0  0  0
o  2  0  2  3  2  0  0  0  0  0  0  0  0  1 87  0  2  0  0  0  0  1  0  0  0  0
p  0 13  0  0  0  1  0  1  0  0  0  0  0  0  0 76  0  2  4  1  0  1  0  1  0  0
q 11  0  0  5  0  1  0  0  0  0  0  0  0  0  1  0 76  1  0  0  1  2  1  1  0  0
r  0  1  0  0  0  0  0  6  1  0  1  0  0  3  0  1  8 72  0  3  0  4  0  0  0  0
s  0  0  0  1  0  1  1  0  0  0  0  0  0  0  0  0  0  2 95  0  0  0  0  0  0  0
t  0  0  0  0  0  6  0  0 12  0  1  0  0  0  0  1  2  3  0 68  0  0  0  0  0  7
u  1  0  0  2  1  0  0  0  0  0  0  0  0  5  0  0  2  0  0  0 85  0  3  0  1  0
v  1  0  0  0  1  0  0  0  1  0  0  2  0  1  1  2  0  6  0  0  0 73  2 10  0  0
w  0  0  0  1  0  0  0  1  0  0  1  0  0  1  0  0  1  0  0  0  5  3 86  1  0  0
x  0  0  0  2  2  1  0  0  0  0  1  2  0  0  0  1  0  0  1  0  0 14  2 68  0  6
y  0  1  1  1  0  2  3  0  3  1  7  0  0  4  0  2  0  0  0  0  2  1  0  0 72  0
z  1  0  0  1  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0 95

--Yoavfreund 12:12, 25 January 2012 (PST) I am happy with this confusion table; n/h and a/d are clearly confusable. This can be improved in the 1-vs-1 scheme by putting more weight on the pairs that are confusable. It can also guide how we teach the user and collect discriminating information (type "nhnhnhnh" or "adadadad").
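One way to pick out the confusable pairs mentioned above is to rank letter pairs by their symmetric confusion count. The sketch below uses only the a, d, h, n entries copied from the confusion matrix. It assumes that rows are true letters and columns are predicted letters (the rows sum to about 100, which is consistent with that reading but is not stated on this page). The function name `confusable_pairs` is hypothetical.

```python
labels = ["a", "d", "h", "n"]

# Excerpt of the confusion matrix: confusion[row][column].
# Assumption: row = true letter, column = predicted letter.
confusion = {
    "a": {"a": 67, "d": 17, "h": 0, "n": 0},
    "d": {"a": 19, "d": 61, "h": 0, "n": 0},
    "h": {"a": 0, "d": 1, "h": 56, "n": 35},
    "n": {"a": 0, "d": 1, "h": 20, "n": 70},
}

def confusable_pairs(confusion, labels):
    """Rank unordered label pairs by errors made in either direction."""
    pairs = []
    for i, x in enumerate(labels):
        for y in labels[i + 1:]:
            pairs.append((confusion[x][y] + confusion[y][x], x, y))
    return sorted(pairs, reverse=True)

ranked = confusable_pairs(confusion, labels)
for count, x, y in ranked[:2]:
    print(f"{x}/{y}: {count} confusions")
```

On this excerpt the two top pairs are h/n (55 confusions) and a/d (36 confusions). A ranking like this could be used to choose which pairwise SVMs get extra weight or which letter pairs to ask the user to practice.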

  • What are the accuracy rates reported for handwriting recognition in the literature?
| Method (UNIPEN lowercase English) | Accuracy (%) |
|---|---|
| QP Gaussian (Kumara et al., 2008) | 89.22 |
| CSDTW (Bahlmann and Burkhardt, 2003) | 90.7 |
| DAG-SVM-GDTW (Bahlmann et al., 2002) | 87.9 |
| HMM (Hu et al., 2000) | 85.9 |
| OnSNT (Ratzlaff, 2003) | 92.1 |
| kNN (Booth et al., 2010) | 85.19 |

--Yoavfreund 12:12, 25 January 2012 (PST) How do they measure accuracy? How do they split the data between train and test?

--Sunsern Cheamanunkul 13:18, 25 January 2012 (PST) Most accuracies reported in the literature are measured using the pre-split train/test sets from the dataset provider. In this case, the dataset is UNIPEN. Note that the accuracies I put up here are from writer-independent experiments. That means the same classifier was used for every user.

UJIpenchar2

We extend the experiment to lowercase English letters from UJIpenchar2. The data were collected from 60 different writers, with 2 examples of each letter per writer.

CV Method

Train on data from 54 writers and test on the other 6 writers.

10-fold CV accuracy by decision rule:

| Model | Number of derivatives | 1-vs-rest | 1-vs-rest (using prediction set) | 1-vs-1 |
|---|---|---|---|---|
| Single branch 20-state HMM + linear SVMs | 161 | 68.78% (+/- 3.71) | 72.05% (+/- 3.51) | 87.05% (+/- 3.64) |
| Single branch 30-state HMM + linear SVMs | 211 | 74.20% (+/- 3.58) | 76.99% (+/- 3.34) | 78.14% (+/- 3.61) |

Notes

  • 1-vs-rest = Always predicts the label with the highest SVM score.
  • 1-vs-rest (using prediction set) = The prediction set contains every label whose SVM score is within 0.5 of the maximum, i.e. score >= max(SVM scores) - 0.5. A mistake is counted only when the prediction set does not contain the true label.
  • 1-vs-1 = Instead of building N SVMs, this method builds N(N-1)/2 SVMs, one per pair of classes, and the winning class is the one that gets the majority vote.