A Statistical Approach to the Composition of the Pentateuch

Yesterday, I came across a few articles by Israeli computer scientists Navot Akiva and Moshe Koppel, which take an empirical, statistical approach to examining the composition of multi-author works, like the Pentateuch. Their methodology is as follows:

First, they developed an algorithm to divide a multi-author text into the portions written by each individual author. Then, they combined text from different single-author biblical works (Jeremiah and Ezekiel in one paper; Jeremiah, Ezekiel, Isaiah 1-33, Proverbs, and Job 3-41 in another) in order to test their algorithm, a validation step that sets their work apart from previous attempts at computer-based statistical source analysis of the Pentateuch. Their algorithm performed quite well, correctly identifying the source of each chunk of their artificial multi-author text.
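To make that validation step concrete, here is a minimal sketch of how such a test could be set up: chunks from texts of known authorship are shuffled into one artificial document, a decomposer splits the chunks into two groups, and the hidden labels are used to score the split. Everything here is an assumption for illustration. The function names, the toy word lists, and especially `toy_two_way_split` (a crude word-overlap grouping) are my own stand-ins and are not the algorithm from the papers.

```python
import random
from itertools import combinations


def make_chunks(words, size):
    """Split a list of words into consecutive chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_mixed_document(texts_by_author, chunk_size, seed=0):
    """Shuffle chunks from known single-author texts into one document.

    Returns (chunks, true_labels); the labels stay hidden from the
    decomposer and are used only to score its output.
    """
    labeled = [(chunk, author)
               for author, words in texts_by_author.items()
               for chunk in make_chunks(words, chunk_size)]
    random.Random(seed).shuffle(labeled)
    return [c for c, _ in labeled], [a for _, a in labeled]


def jaccard(a, b):
    """Vocabulary overlap between two chunks (1.0 means identical word sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def toy_two_way_split(chunks):
    """Stand-in decomposer (NOT the published method): take the two least
    similar chunks as seeds and assign every chunk to the closer seed."""
    seed_a, seed_b = min(combinations(range(len(chunks)), 2),
                         key=lambda p: jaccard(chunks[p[0]], chunks[p[1]]))
    return [0 if jaccard(c, chunks[seed_a]) >= jaccard(c, chunks[seed_b]) else 1
            for c in chunks]


def split_accuracy(predicted, true_labels):
    """Fraction of chunks grouped correctly. Cluster ids are arbitrary, so
    the better of the two cluster-to-author assignments is taken."""
    authors = sorted(set(true_labels))
    if len(authors) != 2:
        raise ValueError("this sketch handles exactly two authors")
    hits = sum(authors[p] == t for p, t in zip(predicted, true_labels))
    return max(hits, len(true_labels) - hits) / len(true_labels)


# Toy input: two invented "authors" with completely disjoint vocabularies.
texts = {
    "author_a": "river stone bridge water reed".split() * 6,
    "author_b": "market coin scale ledger debt".split() * 6,
}
chunks, truth = build_mixed_document(texts, chunk_size=6)
print(split_accuracy(toy_two_way_split(chunks), truth))  # 1.0 on this toy input
```

Real texts share most of their vocabulary, so a grouping this naive would not be expected to work on them; the point of the sketch is only the shape of the test, in which the ground truth is known in advance because the document was assembled by hand.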

Finally, they applied the same algorithm to Genesis, Exodus, Leviticus, and Numbers, testing for only two authors (that is, for a distinction between P and non-P). They report:

We find that our split corresponds to the expert consensus regarding P and non-P for over 90% of the verses in the Pentateuch for which such consensus exists. We have thus been able to largely recapitulate several centuries of painstaking manual labor with our automated method. (Koppel et al. 2011: 1363)
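Note that the percentage in this quotation is taken over the verses for which a consensus exists, not over every verse. A small sketch of that calculation, with hypothetical verse identifiers and made-up labels (none of these are real source assignments, and the function name is my own):

```python
def consensus_agreement(predicted, consensus):
    """Share of consensus verses on which an automated split agrees.

    `consensus` maps a verse id to "P", "non-P", or None (no scholarly
    consensus). Verses marked None are left out of the denominator.
    """
    scored = [ref for ref, label in consensus.items() if label is not None]
    if not scored:
        raise ValueError("no verses with a consensus label")
    agree = sum(predicted[ref] == consensus[ref] for ref in scored)
    return agree / len(scored)


# Invented labels for five placeholder verses; v4 has no consensus.
consensus = {"v1": "P", "v2": "P", "v3": "non-P", "v4": None, "v5": "non-P"}
predicted = {"v1": "P", "v2": "non-P", "v3": "non-P", "v4": "P", "v5": "non-P"}
print(consensus_agreement(predicted, consensus))  # 0.75: 3 of the 4 scored verses
```

The disputed verse `v4` does not count for or against the algorithm, which is how a figure like "over 90%" can be reported without taking a position on contested passages.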

Therefore, assuming that P and non-P are the two major sources of Genesis-Numbers (an assumption they acknowledge as a limitation of their study), Akiva and Koppel have provided independent, empirical verification of the fruits of modern Pentateuchal source criticism.

Lastly, they note that “we offer those instances in which we disagree with the consensus for the consideration of scholars in the field” (Koppel et al. 2011: 1363), though I have not found a publication where these differences are given; I, for one, would be very interested to see their exact results.


Moshe Koppel, Navot Akiva, Idan Dershowitz, Nachum Dershowitz, “Unsupervised Decomposition of a Document into Authorial Components,” pages 1356-1364 in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Stroudsburg, PA: Association for Computational Linguistics, 2011).

Navot Akiva and Moshe Koppel, “Identifying Distinct Components of a Multi-Author Document,” pages 205-209 in Proceedings of the European Intelligence and Security Informatics Conference (EISIC) (Piscataway, NJ: IEEE, 2012).

Navot Akiva and Moshe Koppel, “A Generic Unsupervised Method for Decomposing Multi-Author Documents,” Journal of the American Society for Information Science and Technology 64 (2013): 2256-2264.


