Civil and criminal cases provide direct clues about the relatively serious conflicts in a human society. These conflicts could not be resolved privately or easily through mediation, so the parties resorted to the legal system. The judgment documents publicized by the courts offer us opportunities to analyze the main causes of the conflicts, and we focus on civil disputes in this study.
The analysis of civil cases and the resulting observations are relevant to multiple social issues. Researchers have studied lawsuits to understand litigation strategies (Huang et al., 2010). One may study previous cases to understand whether mediation would be a better choice than litigation (Anderson et al., 2018). An analogous research procedure that relies on Chinese local gazetteers may help us investigate social conflicts in history (Li, 2000).
We aim to understand the main causes that led to civil disputes, and report preliminary results of analyzing labor and employment (L&E, henceforth) litigations and family support (FS, henceforth) litigations. This line of work is possible because the judicial law in Taiwan requires the courts to publicize their judgment documents, except for special cases with specific concerns, and because we can obtain and analyze these open data with computing techniques.
We show the main steps of our work in Figure 1.
Figure 1. Main steps of our work
Court judgment documents
The court judgment documents can be downloaded from the official website maintained by the Judicial Yuan, which governs the courts in Taiwan (note 1). For the L&E litigations, we use 3524 cases dated between 2007 and 2020. There are three tiers of courts, and we use only the documents of the lowest tier, i.e., the local courts. For the FS litigations, we use 1223 cases dated between 2014 and 2020, also from the local courts.
There were many more relevant litigations than the documents that we use in the current study. We chose documents of the local courts because they usually provide more preliminary and direct information about the disputes. In addition, only a portion of the documents for the L&E litigations contain standard segments that summarize the specific issues that the plaintiffs and the defendants were contesting. Without such summaries, it is not easy to identify the contested issues, which are recorded in typically long judgment documents. Hence, we chose only the 3524 instances that provide such summaries. Note that, even with the summaries, it is not easy for computer programs to “understand” the statements. The statements were written for individual cases in natural language (not as keywords, for instance), and the wording often differs for conflicts of the same type.
The documents for the FS litigations may have paragraphs that explain why the plaintiffs were seeking financial assistance from the defendants and why the defendants did not agree with the requests. We employed a keyword-based approach to identify 1223 documents that have such paragraphs for the current study. As discussed above, reasons of the same type might be written in very different styles in the documents.
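A keyword-based selection of this kind might look like the minimal sketch below. It is only an illustration, not the authors' implementation: the cue phrases in `CUE_KEYWORDS` and the function names are hypothetical, and the keywords actually used in the study are not listed in this abstract.

```python
import re

# Hypothetical cue phrases (assumption); the study's real keywords are not given.
CUE_KEYWORDS = ("the plaintiff claims", "the defendant argues")


def select_paragraphs(document, keywords=CUE_KEYWORDS):
    """Return the paragraphs of one document that contain at least one keyword."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return [p for p in paragraphs if any(k in p.lower() for k in keywords)]


def select_documents(documents, keywords=CUE_KEYWORDS):
    """Map each document id to its matching paragraphs, skipping documents with none."""
    selected = {}
    for doc_id, text in documents.items():
        hits = select_paragraphs(text, keywords)
        if hits:
            selected[doc_id] = hits
    return selected
```

Documents without any matching paragraph are dropped, which mirrors the filtering that reduces the full collection to the subset used for analysis.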
Clustering for dispute identification
While we selected the documents with specific sections, we also extracted the statements within those sections. As explained above, we believe that these statements describe the disputes in question. We tried several ways to preprocess the extracted statements, all consisting of typical steps for natural language processing, which we cannot explain in detail in this abstract. We obtained more than 17000 statements and 1223 paragraphs for the L&E and FS litigations, respectively.
We then converted the statements and paragraphs to vectors using both the typical TF-IDF vector-space-model approach (Croft et al., 2010) and the SBERT pretrained model (Reimers and Gurevych, 2019). The hope, as is common in computer science, is that the vectors capture the semantics of the original statements to some extent.
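To make the TF-IDF step concrete, here is a minimal standard-library sketch. It is not the authors' code: it assumes whitespace tokenization (Chinese text would first need word segmentation), uses a smoothed inverse document frequency, and the function names are illustrative. SBERT embeddings are not shown because they require a pretrained model.

```python
import math
from collections import Counter


def tfidf_vectors(statements):
    """Map each statement to a sparse TF-IDF vector stored as a dict."""
    tokenized = [s.lower().split() for s in statements]
    n_docs = len(tokenized)
    # Document frequency: number of statements in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Statements that share terms receive a higher cosine similarity than statements with disjoint vocabularies, which is the property the clustering step relies on.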
We apply clustering (Alpaydin, 2020) to the vectors of statements, hoping that it leads us to identify different types of issues. For this step, we explored well-known clustering tools implemented in scikit-learn (note 2) and the FAISS method of Facebook (Johnson et al., 2017). In essence, we are applying the concept of topic modeling with tools of our choice (Blei, 2012a, 2012b).
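As an illustration of what clustering statement vectors involves, the sketch below implements k-means with a deterministic farthest-first initialization in plain Python. This is an assumption made for illustration: the abstract does not say which clustering algorithm or parameters were used, and the study relied on scikit-learn and FAISS rather than hand-written code.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two dense vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Return (labels, centroids) after running Lloyd's algorithm."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the centroids are too.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The statements assigned to one label can then be read together by a human expert, who decides what type of dispute, if any, the cluster represents.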
Observations and discussion
The results of topic modeling can be useful if we inspect and interpret them with appropriate principles (Ramage et al., 2009; Sievert and Shirley, 2014). For the L&E litigations, we could identify interesting topics that were indicated by the statements about issues in individual clusters. Inspection by human experts certainly takes time, but it is still more efficient than directly reading the complete judgment documents accumulated over the years. The disputes may be caused by different types of compensation/payment problems. Some examples follow.
for retirement benefits
for unlawful or debatable layoff
for body injuries or fatality during worktime
for the salaries and the late-night meals as a result of overtime work
Some clusters contain statements about non-monetary issues or about special types of labor force.
the interpretation of the non-compete clause
the time for paid leave
the disputes for sailors
Once we identify these topics, we can find the lawsuits that share the same or similar combinations of disputes. Since we know when and which local courts handled these litigations, we can analyze and visualize such similar cases spatiotemporally, offering a foundation for social studies. We can also build focused databases by collecting information about similar litigations, thus providing references for future judgments.
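The grouping and counting described here can be sketched as follows. The record layout `(case_id, court, year, topics)`, the topic labels, and the function names are hypothetical choices for illustration and do not reflect the authors' data format.

```python
from collections import Counter, defaultdict


def group_by_topic_combination(cases):
    """Group case ids by the exact set of dispute topics they involve."""
    groups = defaultdict(list)
    for case_id, _court, _year, topics in cases:
        # frozenset makes the combination order-independent and hashable.
        groups[frozenset(topics)].append(case_id)
    return dict(groups)


def count_topic_by_court_and_year(cases, topic):
    """Count the cases involving one topic for every (court, year) pair."""
    counts = Counter()
    for _case_id, court, year, topics in cases:
        if topic in topics:
            counts[(court, year)] += 1
    return counts
```

The `(court, year)` counts are the kind of table that could feed a map or a time series when visualizing similar cases spatiotemporally.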
In contrast, our current achievements for the FS litigations were less impressive. Our clustering approach could detect the involvement of the parents of couples who were fighting for divorce. We could also algorithmically identify cases in which adult children battled over fair shares of the support for their retired parents. The issues behind FS litigations often consist of complex and mixed daily problems, and our current algorithms cannot yet differentiate the core disputes precisely. We report the relatively poor results for the FS litigations to contrast them with our promising results for the L&E litigations in this proposal, and we will certainly continue to refine our approach for the FS litigations.
We are thankful to the reviewers, and shall provide more technical details about our work in the oral presentation as requested. This work was supported in part by the Ministry of Science and Technology of Taiwan.
1. The website of the Judicial Yuan is located at
2. The software tools were implemented in scikit-learn and accessible at
Alpaydin, E. (2020). Introduction to Machine Learning, fourth edition, chapter seven, The MIT Press. (Clustering, a type of machine learning technique, aims to put similar data into a cluster. Given a collection of data, clustering may assign each datum to one of a selected number of clusters.)
Anderson, D. Q., Chua, E. and My, N. T. (2018). How Should the Courts Know Whether Dispute Is Ready and Suitable for Mediation: An Empirical Analysis of The Singapore Courts' Referral of Civil Disputes to Mediation, Harvard Negotiation Law Review, 23(2):265‒318.
Blei, D. M. (2012a). Probabilistic Topic Models, Communications of the ACM, 55(4):77‒84.
Blei, D. M. (2012b). Topic Modeling and Digital Humanities, Journal of Digital Humanities, 2(1).
Croft, W. B., Metzler, D. and Strohman, T. (2010). Search Engines: Information Retrieval in Practice, 241‒247, Pearson.
Huang, K.-C., Chen, K.-P. and Lin, C.-C. (2010). An Empirical Investigation of Settlement and Litigation—The Case of Taiwanese Labor Disputes, Journal of Empirical Legal Studies, 7(4):786‒810.
Johnson, J., Douze, M. and Jégou, H. (2017). Billion-Scale Similarity Search with GPUs, arXiv preprint, arXiv:1702.08734.
Li, B. (2000). Civil Disputes and Law Suits in Huizhou of The Ming Dynasty, Historical Research, Chinese Academy of History, I:94‒105.
Ramage, D., Rosen, E., Chuang, J., Manning, C. D. and McFarland, D. A. (2009). Topic Modeling for the Social Sciences, Proceedings of the Workshop on Applications for Topic Models: Text and Beyond, Twenty-Third Conference on Neural Information Processing Systems.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the Ninth International Joint Conference on Natural Language Processing, 3982‒3992.
Sievert, C. and Shirley, K. E. (2014). LDAvis: A Method for Visualizing and Interpreting Topics, Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 63‒70.
July 25, 2022 - July 29, 2022
Held in Tokyo and remote (hybrid) on account of COVID-19
Conference website: https://dh2022.adho.org/
Contributors: Scott B. Weingart, James Cummings
Series: ADHO (16)