Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D3.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-43] Deeper learning are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [45-178] We present a residual learning framework for easing the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [328-493] We also provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [495-647] On the ImageNet dataset we evaluate residual networks with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [731-801] This result won the first place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.
Doc_A: [948-1071] Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_B: [955-1071] Due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_A: [1073-1293] Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Doc_B: [1073-1299] Deep residual networks are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the first places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D4.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-43] Deeper learning are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [45-178] We present a residual learning framework for easing the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [328-493] We also provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [495-647] On the ImageNet dataset we evaluate residual networks with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [731-801] This result won the first place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.
Doc_A: [948-1071] Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_B: [955-1071] Due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_A: [1073-1293] Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Doc_B: [1073-1299] Deep residual networks are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the first places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Doc_A has 100% similarity with Doc_B.
Doc_B has 57.8947% similarity with Doc_A.

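Note: the asymmetric percentages in the record above are consistent with a per-direction matched-sentence ratio — matched sentences divided by that document's own sentence count. The tool's actual metric is not stated, so the following is only a sketch that reproduces the printed numbers; `directional_similarity` is a hypothetical helper, and Doc_B's total of 19 sentences is inferred from 11/19 ≈ 57.8947%.

```python
def directional_similarity(matched: int, total: int) -> float:
    """Percentage of one document's sentences that found a match in the
    other document. The report prints each direction separately, so the
    two numbers differ whenever the sentence counts differ."""
    return 100.0 * matched / total

# paper_1.txt vs paper_1_D4.txt: all 11 of Doc_A's sentences are matched,
# while Doc_B apparently has 19 sentences, of which 11 are matched.
a_to_b = directional_similarity(11, 11)            # 100.0
b_to_a = round(directional_similarity(11, 19), 4)  # 57.8947
```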

Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D5.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-50] Deeper neural networks are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [859-946] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 81.8182% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D6.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-43] Deeper learning are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [45-178] We present a residual learning framework for easing the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [328-493] We also provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [495-647] On the ImageNet dataset we evaluate residual networks with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [731-801] This result won the first place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 81.8182% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D7.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-43] Deeper learning are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [45-178] We present a residual learning framework for easing the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [328-493] We also provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [495-647] On the ImageNet dataset we evaluate residual networks with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [731-801] This result won the first place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.
Doc_A: [948-1071] Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_B: [955-1071] Due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_A: [1073-1293] Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Doc_B: [1073-1299] Deep residual networks are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the first places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Doc_A has 100% similarity with Doc_B.
Doc_B has 57.8947% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D8.txt

Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [647-741] An ensemble of these residual networks reaches an error rate of 3.57% on the ImageNet test set.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [820-901] We also analyze performance on CIFAR-10 with networks having 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [903-983] The depth of representation is crucial for a variety of visual recognition tasks.
Doc_A: [948-1071] Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset.
Doc_B: [985-1105] Thanks to the extremely deep representations, we achieve a 28% relative improvement on the COCO object detection dataset.
Doc_A: [1073-1293] Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Doc_B: [1107-1334] These deep residual networks are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also took first place in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation tasks.

Doc_A has 45.4545% similarity with Doc_B.
Doc_B has 45.4545% similarity with Doc_A.

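Note: the bracketed ranges printed before each excerpt appear to be inclusive character offsets into the source file — for every quoted span, end − start + 1 equals the excerpt's length. A small check against one span from the records above (purely illustrative; it assumes nothing beyond the text printed in this report):

```python
# Span quoted for Doc_A as [644-724] in the paper_1.txt records.
sentence = ("An ensemble of these residual nets achieves 3.57% error "
            "on the ImageNet test set.")
start, end = 644, 724  # offsets exactly as printed in the report

# Inclusive offsets: the range covers end - start + 1 characters,
# which matches the length of the quoted sentence.
assert end - start + 1 == len(sentence)
```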

Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D3.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-170] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, which is based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [172-418] The method is straightforward to implement, is also computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems which are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D4.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-170] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, which is based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [172-418] The method is straightforward to implement, is also computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems which are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D5.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D6.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-170] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, which is based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [172-418] The method is straightforward to implement, is also computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems which are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D7.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-170] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, which is based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [172-418] The method is straightforward to implement, is also computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems which are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D8.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-171] We present Adam, an algorithm designed for first-order gradient-based optimization of stochastic objective functions, leveraging adaptive estimates of lower-order moments.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [677-898] Theoretical analysis of the algorithm’s convergence properties is also provided, along with a regret bound for the convergence rate, which matches the best-known results within the online convex optimization framework.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1022-1095] Lastly, we introduce AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 37.5% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D3.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-170] We trained a large and deep convolutional neural network to classify 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [314-544] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax layer.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [546-666] To make training faster, we used non-saturating neurons and an efficient GPU implementation of the convolution operation.
Doc_A: [667-827] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective.
Doc_B: [668-833] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be extremely effective.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best enty.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best enty.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D4.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-170] We trained a large and deep convolutional neural network to classify 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [314-544] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax layer.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [546-666] To make training faster, we used non-saturating neurons and an efficient GPU implementation of the convolution operation.
Doc_A: [667-827] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective.
Doc_B: [668-833] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be extremely effective.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Doc_A has 100% similarity with Doc_B.
Doc_B has 42.8571% similarity with Doc_A.
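The two directional percentages in each summary are consistent with a simple containment ratio: the share of one document's sentences that find a match in the other. For example, the 6 matched spans above over a presumed 14 total sentences in paper_3_D4 give 42.8571%, while all 6 sentences of paper_3 match, giving 100%. A minimal sketch of that presumed metric (the function name and the sentence counts are illustrative assumptions, not the tool's actual code):

```python
def directional_similarity(matched: int, total: int) -> float:
    """Percentage of a document's sentences that matched spans in the other document."""
    if total == 0:
        return 0.0
    return 100.0 * matched / total

# 6 of the presumed 14 sentences in paper_3_D4 match paper_3:
print(round(directional_similarity(6, 14), 4))  # -> 42.8571
# all 6 counted sentences of paper_3 match paper_3_D4:
print(directional_similarity(6, 6))             # -> 100.0
```

The same arithmetic reproduces the other summaries (e.g., 9 of paper_4's 10 sentences matched gives the 90% figure), which is why the two directions of a comparison can disagree: each direction uses its own document's sentence count as the denominator.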


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D5.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.

Doc_A has 66.6667% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D6.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-170] We trained a large and deep convolutional neural network to classify 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [314-544] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax layer.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [546-666] To make training faster, we used non-saturating neurons and an efficient GPU implementation of the convolution operation.

Doc_A has 66.6667% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D7.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-170] We trained a large and deep convolutional neural network to classify 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [314-544] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax layer.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [546-666] To make training faster, we used non-saturating neurons and an efficient GPU implementation of the convolution operation.
Doc_A: [667-827] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective.
Doc_B: [668-833] To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be extremely effective.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Doc_A has 100% similarity with Doc_B.
Doc_B has 42.8571% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D8.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-175] We trained a large, deep convolutional neural network to categorize the 1.2 million high-resolution images from the ImageNet LSVRC-2010 competition into 1000 distinct classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [177-317] On the test set, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, significantly surpassing the previous state-of-the-art results.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [319-529] The network, with 60 million parameters and 650,000 neurons, is made up of five convolutional layers, some followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax output.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best enty.
Doc_B: [827-1010] We also submitted a variant of this model in the ILSVRC-2012 competition, achieving a top-5 test error rate of 15.3%, outperforming the second-place entry, which had a rate of 26.2%.

Doc_A has 66.6667% similarity with Doc_B.
Doc_B has 66.6667% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D3.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-199] Random forests are a combination of small tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [201-309] The generalization error for forests converges to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [590-743] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise data.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.
Doc_A: [964-1009] These ideas are also applicable to regression.
Doc_B: [970-1021] These ideas are also applicable to regression tasks.

Doc_A has 90% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D4.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-199] Random forests are a combination of small tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [201-309] The generalization error for forests converges to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [590-743] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise data.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.
Doc_A: [964-1009] These ideas are also applicable to regression.
Doc_B: [970-1021] These ideas are also applicable to regression tasks.

Doc_A has 90% similarity with Doc_B.
Doc_B has 52.9412% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D5.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [195-245] The generalization error for forests converges a.s.
Doc_B: [195-245] The generalization error for forests converges a.s.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [577-587] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.

Doc_A has 70% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D6.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-199] Random forests are a combination of small tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [201-309] The generalization error for forests converges to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [590-743] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise data.

Doc_A has 60% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D7.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-199] Random forests are a combination of small tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [201-309] The generalization error for forests converges to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [590-743] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise data.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.
Doc_A: [964-1009] These ideas are also applicable to regression.
Doc_B: [970-1021] These ideas are also applicable to regression tasks.

Doc_A has 90% similarity with Doc_B.
Doc_B has 52.9412% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D8.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-188] Random forests are ensembles of decision trees where each tree is based on values drawn from a random vector, independently sampled with the same distribution for all trees in the forest.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [190-304] As the number of trees increases, the generalization error of the forest converges almost surely to a fixed limit.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [306-449] The generalization error of a forest of tree classifiers is influenced by both the individual tree strengths and the correlation between them.
Doc_A: [577-587] Freund & R.
Doc_B: [582-592] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [594-733] Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, ***, 148–156), but they are more resilient to noise.

Doc_A has 60% similarity with Doc_B.
Doc_B has 66.6667% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D3.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-138] In this work we investigate the effect of the convolutional neural network depth on its accuracy in the large-scale image recognition task.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [420-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team won the first and the second places in the localisation and classification tracks respectively.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_A: [722-884] We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Doc_B: [722-905] We have made our two best-performing convolutional neural network models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D4.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-138] In this work we investigate the effect of the convolutional neural network depth on its accuracy in the large-scale image recognition task.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [420-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team won the first and the second places in the localisation and classification tracks respectively.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_A: [722-884] We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Doc_B: [722-905] We have made our two best-performing convolutional neural network models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Doc_A has 100% similarity with Doc_B.
Doc_B has 38.4615% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D5.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.

Doc_A has 60% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D6.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-138] In this work we investigate the effect of the convolutional neural network depth on its accuracy in the large-scale image recognition task.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [420-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team won the first and the second places in the localisation and classification tracks respectively.

Doc_A has 60% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D7.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-138] In this work we investigate the effect of the convolutional neural network depth on its accuracy in the large-scale image recognition task.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [420-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team won the first and the second places in the localisation and classification tracks respectively.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_A: [722-884] We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Doc_B: [722-905] We have made our two best-performing convolutional neural network models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Doc_A has 100% similarity with Doc_B.
Doc_B has 38.4615% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D8.txt

Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [421-613] These insights formed the foundation for our submission to the ImageNet Challenge 2014, where our team won first and second places in the localization and classification tracks, respectively.
Doc_A: [722-884] We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Doc_B: [776-937] To support further research on the use of deep visual representations in computer vision, we have made our top two performing ConvNet models publicly accessible.

Doc_A has 40% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D3.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-142] Scikit-learn is a Python library integrating a wide range of machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_A: [498-597] Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Doc_B: [482-585] The source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D4.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-142] Scikit-learn is a Python library integrating a wide range of machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_A: [498-597] Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Doc_B: [482-585] The source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

Doc_A has 100% similarity with Doc_B.
Doc_B has 38.4615% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D5.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.

Doc_A has 40% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.
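The asymmetric percentages in these blocks are consistent with a sentence-level containment measure: each direction divides the number of matched sentences by that document's own sentence total (here, 2 of paper_6.txt's 5 sentences matched → 40%, while both sentences of the shortened D5 variant matched → 100%). A minimal sketch of that metric follows; the function name and the equality-based matching predicate are illustrative assumptions, not the tool's actual implementation.

```python
# Illustrative sketch of the asymmetric sentence-level similarity
# reported above: each direction counts how many of a document's own
# sentences found a match in the other document.

def sentence_similarity(a_sents, b_sents, is_match):
    # Sentences of Doc_A that match at least one sentence of Doc_B.
    a_matched = sum(1 for s in a_sents if any(is_match(s, t) for t in b_sents))
    # Sentences of Doc_B that match at least one sentence of Doc_A.
    b_matched = sum(1 for t in b_sents if any(is_match(s, t) for s in a_sents))
    return (100.0 * a_matched / len(a_sents),
            100.0 * b_matched / len(b_sents))

# Toy check against the paper_6 / paper_6_D5 block: 2 of Doc_A's 5
# sentences match, and both of Doc_B's sentences match.
a = ["s1", "s2", "s3", "s4", "s5"]
b = ["s1", "s2"]
print(sentence_similarity(a, b, lambda x, y: x == y))  # → (40.0, 100.0)
```

In practice a fuzzy predicate (e.g. token overlap above a threshold) would stand in for exact equality, since the paired spans above are paraphrases rather than verbatim copies.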


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D6.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-142] Scikit-learn is a Python library integrating a wide range of machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.

Doc_A has 40% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D7.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-142] Scikit-learn is a Python library integrating a wide range of machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_A: [498-597] Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Doc_B: [482-585] The source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

Doc_A has 100% similarity with Doc_B.
Doc_B has 38.4615% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D8.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-160] Scikit-learn is a Python library that combines a broad selection of cutting-edge machine learning algorithms for medium-scale supervised and unsupervised tasks.

Doc_A has 20% similarity with Doc_B.
Doc_B has 20% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D3.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-152] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks which include an encoder and decoder component.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [154-244] The best performing models also connect the encoder and decoder via an attention mechanism.
Doc_A: [239-392] We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_B: [246-394] We propose a new simple network architecture called Transformer, based on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_A: [394-561] Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Doc_B: [396-564] Experiments on two machine translation tasks show that Transformer is superior in quality while being more parallelizable and requiring significantly less time to train.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D4.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-152] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks which include an encoder and decoder component.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [154-244] The best performing models also connect the encoder and decoder via an attention mechanism.
Doc_A: [239-392] We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_B: [246-394] We propose a new simple network architecture called Transformer, based on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_A: [394-561] Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Doc_B: [396-564] Experiments on two machine translation tasks show that Transformer is superior in quality while being more parallelizable and requiring significantly less time to train.

Doc_A has 100% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D5.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.

Doc_A has 50% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D6.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-152] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks which include an encoder and decoder component.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [154-244] The best performing models also connect the encoder and decoder via an attention mechanism.

Doc_A has 50% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D7.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-152] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks which include an encoder and decoder component.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [154-244] The best performing models also connect the encoder and decoder via an attention mechanism.
Doc_A: [239-392] We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_B: [246-394] We propose a new simple network architecture called Transformer, based on attention mechanisms, dispensing with recurrence and convolutions entirely.
Doc_A: [394-561] Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Doc_B: [396-564] Experiments on two machine translation tasks show that Transformer is superior in quality while being more parallelizable and requiring significantly less time to train.

Doc_A has 100% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D8.txt

Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [127-228] The most effective models also incorporate an attention mechanism to connect the encoder and decoder.

Doc_A has 25% similarity with Doc_B.
Doc_B has 25% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D3.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-88] The support-vector network is a new learning model for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [90-230] The model conceptually implements the following idea: input vectors are non-linearly functions mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [296-396] Special properties of the decision surface ensures high generalization ability of the learning model.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D4.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-88] The support-vector network is a new learning model for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [90-230] The model conceptually implements the following idea: input vectors are non-linearly functions mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [296-396] Special properties of the decision surface ensures high generalization ability of the learning model.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D5.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [226-288] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D6.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-88] The support-vector network is a new learning model for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [90-230] The model conceptually implements the following idea: input vectors are non-linearly functions mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [296-396] Special properties of the decision surface ensures high generalization ability of the learning model.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D7.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-88] The support-vector network is a new learning model for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [90-230] The model conceptually implements the following idea: input vectors are non-linearly functions mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [296-396] Special properties of the decision surface ensures high generalization ability of the learning model.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D8.txt

Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [294-420] The unique characteristics of this decision boundary contribute to the strong generalization ability of the learning machine.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [603-719] The high generalization capability of support-vector networks with polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [722-914] Additionally, we compare the performance of the support-vector network with various classical learning algorithms, all of which were part of a benchmark study on Optical Character Recognition.

Doc_A has 37.5% similarity with Doc_B.
Doc_B has 42.8571% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D3.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-307] We propose a new framework for estimating generative models via an adversarial process, where we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability which is a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [600-717] In the case where G and D are defined by multilayer perceptrons, the entire model can be trained with backpropagation.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D4.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-307] We propose a new framework for estimating generative models via an adversarial process, where we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability which is a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [600-717] In the case where G and D are defined by multilayer perceptrons, the entire model can be trained with backpropagation.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 100% similarity with Doc_B.
Doc_B has 46.6667% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D5.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [391-446] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D6.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-307] We propose a new framework for estimating generative models via an adversarial process, where we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability which is a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] the training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] this framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [600-717] In the case where G and D are defined by multilayer perceptrons, the entire model can be trained with backpropagation.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D7.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-307] We propose a new framework for estimating generative models via an adversarial process, where we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability which is a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [600-717] In the case where G and D are defined by multilayer perceptrons, the entire model can be trained with backpropagation.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 100% similarity with Doc_B.
Doc_B has 46.6667% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D8.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-324] We introduce a new approach for estimating generative models through an adversarial process, where two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample originates from the training data rather than from G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [326-405] The training objective for G is to maximize the likelihood of D making an error.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [461-610] In the space of arbitrary functions for G and D, a unique solution exists, where G recovers the true data distribution, and D outputs 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [612-724] When both G and D are modeled by multilayer perceptrons, the entire system can be trained using backpropagation.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [727-844] No Markov chains or unrolled approximate inference networks are required during either training or sample generation.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [846-995] Experimental results validate the framework, showcasing its potential through both qualitative and quantitative assessments of the generated samples.

Doc_A has 85.7143% similarity with Doc_B.
Doc_B has 85.7143% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D3.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances such as SPPnet and Fast R-CNN reduced the execution time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-684] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection task.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [788-1038] For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_B: [793-1042] For the very deep VGG-16 model, our detection model has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D4.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances such as SPPnet and Fast R-CNN reduced the execution time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-684] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection task.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [788-1038] For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_B: [793-1042] For the very deep VGG-16 model, our detection model has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D5.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.

Doc_A has 75% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D6.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances such as SPPnet and Fast R-CNN reduced the execution time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-684] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection task.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.

Doc_A has 75% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D7.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances such as SPPnet and Fast R-CNN reduced the execution time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-684] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection task.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [788-1038] For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_B: [793-1042] For the very deep VGG-16 model, our detection model has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 100% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D8.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-99] Modern object detection networks rely on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [102-271] Techniques such as SPPnet and Fast R-CNN have significantly reduced the runtime of these networks, revealing the region proposal computation as a performance bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [273-448] In this paper, we present a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, allowing for nearly free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [450-578] An RPN is a fully convolutional network that simultaneously predicts both object bounds and objectness scores for each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate highquality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [580-718] RPNs are trained in an end-to-end manner to produce high-quality region proposals, which are then used by Fast R-CNN for object detection.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [721-847] By using a simple alternating optimization process, RPN and Fast R-CNN can be jointly trained to share convolutional features.
Doc_A: [788-1038] For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_B: [849-1116] For the deep VGG-16 model, our detection system achieves a frame rate of 5fps (including all processing steps) on a GPU, while delivering state-of-the-art object detection performance on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1119-1186] The code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 100% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


