Abstract
Background: Recent trends in machine learning and statistical techniques have revolutionised traditional auditing practices and unveiled new horizons to enhance corporate governance practices.
Objectives: This article proposes a hybrid model by combining statistical techniques, such as Benford’s Law and the Beneish M-Score, with machine learning algorithms to detect fraud. Integration of all the methodologies results in a broad, flexible framework for the identification of irregularities and possible fraudulent activities within financial datasets.
Method: The research addresses how these advanced tools meet the gaps in traditional auditing practices, thus enabling a more refined approach towards fraud detection.
Results: Empirical findings show that this integrated model improves detection rates, thereby strengthening governance structures and promoting transparency within organisations.
Conclusion: Major findings suggest that while machine learning algorithms are effective in improving the identification of complex fraud patterns, statistical methods prove to be effective in preliminary screening.
Contribution: The article ends with a discussion on implications for auditors and corporate governance structures along with future research recommendations and applications by the industry.
Keywords: corporate governance; fraud detection; machine learning; statistical analysis; Benford’s Law; Beneish M-Score; audit.
Introduction
Corporate governance has changed appreciably in recent years, with growing emphasis on transparency, accountability, and ethical values. The shift is most visible in financial reporting and auditing, where the reliability of information matters to stakeholders, regulators, and the community at large. Traditional auditing, which relies largely on rule-driven, manual, or threshold-based methods, has long been an effective means of ensuring financial accuracy and preventing misrepresentation. However, as financial data sets grow in size and complexity and fraud strategies become more sophisticated, traditional methodologies face considerable challenges. Their rigidity can cause them to miss complex and evolving fraud schemes, especially in the contemporary data-rich environment. There is therefore a clear need for more sophisticated solutions that can examine large data sets in detail, recognise subtle patterns, and adapt to new forms of anomalies.
Machine learning and statistical methodologies offer new ways to bridge these gaps in the detection of anomalies and possible fraudulent activity. By drawing on computational power and data science, these technologies go well beyond traditional rule-based detection and permit a more flexible and adaptive framework for identifying fraud. Machine learning algorithms can process high-dimensional datasets to find latent patterns and trends that may not be obvious to a human auditor. Statistical methods, in turn, provide robust frameworks for detecting outliers and estimating the probability of fraudulent activity. Taken together, these instruments form a broad framework that matches the evolving needs of corporate governance and enables auditors to oversee financial activities professionally and effectively.
This research explores a hybrid model of fraud detection intended to improve corporate governance by combining machine learning methods and statistical techniques in the auditing process. Statistical methods, such as Benford’s Law and the Beneish M-Score, are powerful and low-cost pre-screening methods that help auditors sift through voluminous data to identify entries with a high risk of manipulation or fraud. Benford’s Law, set out by Frank Benford in 1938, predicts the expected distribution of leading digits in naturally occurring datasets, which makes it effective in detecting unusual patterns in numeric data. Deviations from the expected digit distribution in financial data sets may therefore indicate potential manipulation or fraud. The Beneish M-Score, on the other hand, measures specific financial ratios related to earnings manipulation and adds another layer of analysis for accounts that may involve fraudulent activity. Because the hybrid model uses these statistical approaches only as an initial filter, it narrows the analytical focus and enables auditors to concentrate on entries that warrant closer inspection, which increases audit efficiency and improves resource allocation.
There has been much interest in the current literature regarding the application of machine learning and statistical techniques to auditing. Amani and Fadlalla (2016) provide a general overview of the applications of data mining in accounting and present how data mining and machine learning approaches can support fraud detection by processing large datasets for irregular patterns and anomalies. Ashtiani and Raahemi (2021) underline the potential of machine learning in analysing big financial datasets to discover hidden fraud patterns that may not be exposed by classical methods. The use of machine learning-based models in fraud detection represents a trend towards more flexible auditing frameworks because these models learn from data and adapt to new patterns, which enhances the efficiency of fraud detection policies in corporate governance.
Within this hybrid approach, statistical methods such as Benford’s Law and the Beneish M-Score serve as useful and inexpensive pre-screening devices that enable auditors to assess large populations of data and highlight the high-risk items. Benford’s Law, introduced by Benford (1938), describes the distribution of leading digits in naturally occurring datasets and is widely used in auditing to detect anomalies that deviate from expected patterns. Studies such as those by Wardaya et al. (2022) and Tammaru and Alver (2016) provide evidence for the application of this law in auditing; they demonstrate its effectiveness in identifying possibly fraudulent transactions based on unusual digit distributions within financial data. The Beneish M-Score complements Benford’s Law in that it examines specific financial ratios associated with earnings manipulation, as explained by Gorenc (2019). When used as preliminary filters in the hybrid model, these statistical methods narrow the scope of analysis and allow auditors to focus on the entries that need the most thorough investigation, thereby enhancing the overall effectiveness of the audit procedure.
Chen et al. (2022) support this perspective by setting out the merits of full-population audits conducted with machine learning methods. Compared with traditional audit sampling, full-population audits use machine learning models to examine vast amounts of data, which improves accuracy in detecting anomalies. This improvement is especially important where fraud has developed into more complex forms that elude traditional detection. In a related study, Adelakun et al. (2024) investigate the incorporation of machine learning models into audit work, focusing on both the benefits and the challenges of their application. The study finds that although machine learning greatly enhances fraud detection, its application requires resolving issues of data quality, regulatory compliance, and auditors’ qualifications.
In conjunction with statistical screening methods, machine learning methods greatly enhance predictive modelling and anomaly detection in fraud detection. Sheu and Liu (2024) describe the application of a naive Bayes classifier to both symmetrical and asymmetrical audit sampling, which makes it more systematic for auditors to spot areas that are susceptible to fraud. Combining several classification methods, such as decision trees and Support Vector Machines (SVMs), makes the fraud detection models developed by auditors both interpretable and accurate. Decision trees follow a rule-based, deterministic approach that helps auditors understand how a decision was reached against prescribed requirements, whereas SVMs are effective in distinguishing between genuine and fraudulent transactions in high-dimensional, complex data. Applying supervised learning methods, such as logistic regression and decision trees, together with unsupervised methods, such as clustering, improves the flexibility of fraud detection models in detecting new fraud strategies. Sufi et al. (2024) examine how nonfinancial disclosures improve predictions of firm performance through machine learning, providing insights into artificial intelligence (AI)-based methods in firm governance. Likewise, Zeng et al. (2020) combine a sparse algorithm with a support vector machine to predict financial distress, which supports the view that machine learning is a viable means of fraud detection. In a similar vein, Rahman et al. (2021) examine auditor choice in relation to firm governance attributes, with a focus on developments in AI in audit methods.
Recent advancements in the field confirm the relevance of these approaches. For example, Arum and Wahyudi (2021) report empirical evidence that audit quality is key in fraud detection, which supports the case for advanced instruments in contemporary audit methods. In parallel, tools used by large audit firms, such as EY Helix and PwC’s GL.ai, apply machine learning to analyse general ledger data in seconds and uncover fraud patterns. Such developments demonstrate that leading audit firms are incorporating AI-based analytical methods into their structures.
The regulatory context is also shifting to accommodate these technologies. Wu (2024) highlights the growing impact of new technologies on firm governance and stresses that comprehensive structures are necessary to achieve accountability and openness. In addition, the proposal for an Artificial Intelligence Act put forward by the European Commission (European Parliament and Council, 2021) underlines the need to guarantee transparency, accountability, neutrality, and ethical standards in AI applications, especially in the auditing and finance sectors. This regulatory initiative emphasises the need for ‘reliable AI’ systems that are auditable, in line with corporate governance objectives of transparency and the mitigation of bias in decisions affected by AI. The proposed framework aims to establish clear standards for AI systems used in critical applications, such as financial auditing, where the consequences of errors could be significant. This is in line with the ethical and regulatory considerations discussed in this article, which encourage the use of fair and transparent AI models for fraud detection and firm governance.
The purpose of this research is therefore to establish the effectiveness of merging machine learning techniques with statistical instruments in developing a more responsive and scalable model for fraud detection. Integrating these diverse methodologies enhances the robustness of traditional auditing methods and aligns with the broader goals of corporate governance: transparency, accountability, and ethical standards in financial reporting. The hybrid model can respond to the complexities of modern financial data, so that organisations can identify probable fraudulent activity, meet regulatory requirements, and satisfy ethical considerations. With this approach, auditors have a wide-ranging set of tools to improve corporate governance mechanisms, promote stakeholder trust, and safeguard the integrity of financial markets.
Through this framework, organisations can improve their auditing processes, strengthen stakeholders’ confidence, and help maintain the integrity of financial markets. The research demonstrates the potential of computational techniques to transform auditing from a traditional, rule-based methodology into an adaptive, anticipatory approach that proactively identifies and mitigates risks. This is a significant development in corporate governance methodologies and provides auditors and organisations with an effective tool for upholding the integrity and transparency of financial reporting in increasingly complicated and dynamic environments that face new fraudulent schemes.
Research methods and design
Statistical analysis as a pre-screening tool
Statistical methodologies form the backbone of this integrated framework, where Benford’s Law and the Beneish M-Score provide the first layer of analysis in identifying fraudulent activities. Benford’s Law analyses the distribution of leading digits in financial data sets and singles out entries that deviate from expected patterns. It rests on the observation that in many naturally occurring data sets, lower leading digits occur more often than higher ones: the digit 1 leads in roughly 30% of values, whereas 9 leads in fewer than 5%. By applying Benford’s Law to financial accounts that are prone to manipulation, such as revenues and expenditures, this research demonstrates the usefulness of statistical anomalies as preliminary pointers towards possible fraud. Benford’s Law is especially useful for large data sets, where manual examination would be impractical, because it provides an inexpensive method of discovering suspicious entries worthy of more thorough investigation.
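The first-digit screening described above can be illustrated with a short sketch. It is not the article's implementation: the function names are hypothetical, and the choice of a chi-square statistic and a mean absolute deviation as deviation measures is an assumption, since the article does not state which test was applied. The expected proportions follow the standard form of Benford's Law, P(d) = log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit proportions under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the first significant digit of a number, or None for zero."""
    value = abs(value)
    if value == 0:
        return None
    # Scientific notation places the first significant digit first, e.g. '4.5...e-02'.
    return int(f"{value:.15e}"[0])


def benford_deviation(amounts):
    """Compare observed first-digit frequencies with Benford's expectation.

    Returns (chi_square, mean_absolute_deviation). Both measures are
    illustrative choices, not ones confirmed by the article.
    """
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero amounts to analyse")
    counts = Counter(digits)
    expected = benford_expected()
    chi_square = sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items()
    )
    mad = sum(abs(counts.get(d, 0) / n - p) for d, p in expected.items()) / 9
    return chi_square, mad
```

With nine digit classes the chi-square statistic has eight degrees of freedom, so a value above roughly 15.5 would reject conformity at the 5% level. Such a result is only a prompt for closer inspection of the account, not evidence of fraud in itself.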
The Beneish M-Score complements Benford’s Law by considering financial ratios that are commonly distorted in fraudulent financial statements. The approach calculates a score from several ratios; two of the most relevant to earnings manipulation are the Days’ Sales in Receivables Index and the Gross Margin Index. Unusual movements in these ratios suggest that a company may be trying to inflate its revenues artificially or understate its expenses to present a healthier financial condition. In this study, the Beneish M-Score adds a further statistical dimension that reinforces the results of Benford’s Law by highlighting specific entries that demonstrate high-risk financial behaviour. Together, these techniques refine the data set and enable auditors to target the entries with a high likelihood of fraud for further examination.
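As an illustration of how such a score is assembled, the sketch below uses the commonly cited eight-variable formulation of the Beneish M-Score. The article does not report which variant or coefficients were applied, so the coefficients, the function names, and the cut-off of about -1.78 mentioned in the comments should be read as assumptions rather than as the study's settings.

```python
def days_sales_receivables_index(receivables, sales, prev_receivables, prev_sales):
    """DSRI: receivables-to-sales ratio this year relative to the prior year."""
    return (receivables / sales) / (prev_receivables / prev_sales)


def gross_margin_index(sales, cogs, prev_sales, prev_cogs):
    """GMI: prior-year gross margin divided by current-year gross margin."""
    prev_margin = (prev_sales - prev_cogs) / prev_sales
    margin = (sales - cogs) / sales
    return prev_margin / margin


def beneish_m_score(dsri, gmi, aqi, sgi, depi, sgai, tata, lvgi):
    """Eight-variable M-Score using the commonly cited published coefficients.

    Assumption: these coefficients are not confirmed as the ones used in
    the article. Scores above about -1.78 are often read as a signal of
    possible earnings manipulation.
    """
    return (
        -4.84
        + 0.920 * dsri   # Days' Sales in Receivables Index
        + 0.528 * gmi    # Gross Margin Index
        + 0.404 * aqi    # Asset Quality Index
        + 0.892 * sgi    # Sales Growth Index
        + 0.115 * depi   # Depreciation Index
        - 0.172 * sgai   # Sales, General and Administrative Expenses Index
        + 4.679 * tata   # Total Accruals to Total Assets
        - 0.327 * lvgi   # Leverage Index
    )
```

A firm whose indices all equal 1 and whose accruals are zero scores -2.48 under this formulation, which falls below the -1.78 cut-off. A rise in the receivables or gross margin index pushes the score upwards, towards the region that would be flagged.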
Research methodology, data collection, and pre-processing
The methodological approach in this study integrates statistical analysis with machine learning methodologies on top of a robust framework of data collection and preprocessing. Financial data for this study were gathered using the Yahoo Finance API. The resulting dataset represents the financial statements of various companies. Its key accounting indicators include revenue, expenses, assets, liabilities, and financial ratios, which are important in both the statistical and the machine learning analyses. The dataset combines firms previously identified as fraudulent with others that have no documented anomalies, which allows for balance during model training and evaluation.
Preprocessing involved three major steps: cleaning the raw financial data to resolve inconsistencies and fill in missing values; systematically detecting outliers, because abnormal values may affect the precision of the machine learning models; and normalising the data to put different features on the same scale, because many algorithms are sensitive to scale. Feature engineering also involved creating new variables from the available data, such as the financial ratios that are indispensable to the Beneish M-Score calculation and contribute to the precision of the model.
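The preprocessing steps can be sketched as follows. The article does not name the specific techniques, so median imputation, the interquartile-range rule for outliers, and min-max scaling are assumptions chosen for illustration, and all function names are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # a constant feature carries no scale information
    return [(v - lowest) / (highest - lowest) for v in values]


# Made-up revenue figures for illustration only (not the study's data).
revenue = [100, 110, None, 105, 95, 10000]
filled = impute_median(revenue)
flags = iqr_outlier_flags(filled)
scaled = min_max_scale([v for v, is_outlier in zip(filled, flags) if not is_outlier])
```

In a fraud setting, flagged outliers would normally be set aside for review and not silently dropped, because an extreme value may itself be the anomaly of interest.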
Machine learning models for enhanced detection
Following data preparation, machine learning algorithms were employed to make the fraud detection system adaptive. The algorithms are drawn from both supervised and unsupervised learning to provide a comprehensive approach to anomaly detection. Among the supervised approaches, decision trees were trained on labelled datasets containing established fraud cases so that the model could identify features and patterns associated with fraudulent behaviour. Decision trees are especially convenient for fraud detection because they are simple to interpret: they give explicit decision rules that auditors can understand and assess. As shown in Table 1, the decision tree model returned an accuracy of 61.18%, indicating a moderate ability to categorise fraudulent entries while providing insight into the characteristics involved in the fraud classification process.
The Support Vector Machine model was also used because it handles high-dimensional feature spaces well, which are common in large, complex financial datasets. As shown in Table 2, the SVM model achieved a moderate accuracy of about 56.87%, although its high recall covers a broad spectrum of fraudulent cases. This recall shows that the SVM model is sensitive to instances of fraud and flags many entries as suspicious. However, it also produced more false positives, meaning that the model at times classified normal entries as fraudulent. This reflects the trade-off between precision and recall; further tuning is required for the model to detect fraud cases reliably while misclassifying fewer normal entries.
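The trade-off between precision and recall can be made concrete with a small sketch that derives the usual metrics from predicted and actual labels. The labels below are invented for illustration and do not reproduce the figures in Tables 1 and 2; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating label 1 as 'fraud' and label 0 as 'normal'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal entry flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal entry passed
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary fraud labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no labels supplied")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: the classifier catches both fraud cases (recall 1.0)
# but also flags two normal entries (precision 0.5).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

A model tuned to miss no fraud will tend towards this pattern: recall stays high while precision and accuracy fall, which is the behaviour reported for the SVM.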
The choice of supervised learning techniques, decision trees and support vector machines, keeps the framework open to both obvious and subtle patterns of fraud. Decision trees offer a clear and traceable decision-making process that auditors can confirm, while SVMs capture more complex patterns, which are especially likely in high-dimensional data. Together, they form a dual-layer machine learning system that complements the initial statistical analysis.
Results
Hybrid model building and interpretation of results
Combining the two layers yields a comprehensive fraud detection model that exploits both methodologies. On the statistical side, Benford’s Law and the Beneish M-Score were used to filter the data and identify entries with a risk profile high enough to merit further investigation. This initial filtering step reduces the volume of data and gives the machine learning models a prefiltered subset to examine. At the machine learning layer, decision trees combined with SVM enabled further data-driven investigation that could classify transactions and detect small anomalies in very complicated data sets.
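The two-stage flow can be outlined in code. This is a structural sketch only: the record fields (`m_score`, `benford_flag`, `dsri`), the -1.78 cut-off, and the placeholder classifier are all hypothetical, and the hand-written rule merely stands in for a trained decision tree or SVM.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Commonly cited M-Score cut-off; an assumption, not a value from the article.
M_SCORE_THRESHOLD = -1.78


def statistical_prescreen(record: Record) -> bool:
    """Stage 1: keep a record if either statistical indicator raises a flag."""
    return record["m_score"] > M_SCORE_THRESHOLD or bool(record["benford_flag"])


def placeholder_classifier(record: Record) -> bool:
    """Stage 2 stand-in for a fitted model; the 1.3 cut-off is arbitrary."""
    return record["dsri"] > 1.3


def hybrid_detect(
    records: List[Record],
    prescreen: Callable[[Record], bool],
    classifier: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Run the pre-screen, then classify only the shortlisted records."""
    shortlisted = [r for r in records if prescreen(r)]
    flagged = [r for r in shortlisted if classifier(r)]
    return shortlisted, flagged


# Made-up records for illustration only.
records = [
    {"id": "A", "m_score": -2.60, "benford_flag": False, "dsri": 1.0},
    {"id": "B", "m_score": -1.20, "benford_flag": False, "dsri": 1.6},
    {"id": "C", "m_score": -2.40, "benford_flag": True, "dsri": 1.1},
]
shortlisted, flagged = hybrid_detect(records, statistical_prescreen, placeholder_classifier)
```

Passing the two stages in as functions keeps the screening rules and the classifier independent, so either layer can be replaced or retuned without changing the other.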
The hybrid model showed considerable corporate governance advantages by offering a scalable, adaptive framework for fraud detection that covers both overarching anomalies and complex patterns. Results indicated that the integrative approach improved fraud detection and gave auditors a robust and flexible tool. The decision tree model is intuitive, and its results make clear to auditors why certain classifications were made, which eases the audit cycle. The superior recall of the SVM model suggests that it captures most fraud cases, but it still requires further refinement to reduce false alarms.
Discussion
Corporate governance implications and industry applications
The impact of this hybrid model extends beyond fraud detection: it also contributes to the larger corporate governance objectives of transparency, accountability, and compliance with regulatory requirements. The framework puts forward a joint machine learning and statistical approach to underpin new standards in corporate governance. For instance, guidelines from regulatory organisations such as the European Commission place strong emphasis on transparency and accountability in AI-enabled decision-making processes. The draft Artificial Intelligence Act underlines that auditable and reliable operation of AI systems is increasingly necessary, especially in essential areas such as financial auditing, where false positives can have significant consequences.
Industry applications, such as EY Helix and PwC’s GL.ai, exemplify the use of advanced analytics in auditing and illustrate its practical benefits. EY Helix applies machine learning methods to general ledger data to identify anomalies and trends as they occur. Similarly, PwC’s GL.ai uses AI to point out possible abnormalities in large volumes of information. As noted by EY (2024), AI applications can support auditors by detecting complex patterns of fraud. These tools illustrate how machine learning and statistical methods are currently applied to enhance the efficiency and accuracy of audits by helping auditors identify high-risk transactions. The proposed hybrid model embeds similar techniques to support a proactive approach to fraud detection, in which auditors can prioritise follow-up and mitigate potential risks.
Conclusion
This article introduces a hybrid model that improves fraud detection in financial auditing by combining statistical analysis with machine learning. The framework uses Benford’s Law and the Beneish M-Score as preliminary screening methods; both are inexpensive, focused, and add minimal complexity to the analysis of the dataset. The machine learning algorithms, decision trees and support vector machines, then take the analysis deeper and make it possible to detect subtle fraud patterns. The methodology therefore covers both superficial anomalies and complex fraud schemes, offering a scalable solution that meets modern corporate governance standards.
The results indicate that such an integrated methodology enhances corporate governance by supporting higher levels of transparency, accountability, and ethical practice in financial reporting. Moreover, because the model can accommodate any leading regulatory framework, such as Generally Accepted Accounting Principles (GAAP) or International Financial Reporting Standards (IFRS), it is applicable to different corporate settings and promotes consistency among fraud detection methodologies. In addition, industry cases such as EY Helix and PwC’s GL.ai demonstrate the workability of machine learning and statistical instruments in practical audits. This article contributes to the development of audit methodologies by offering a model that balances the need for accuracy with the need for flexibility in fraud detection, which further improves the reliability of financial reporting and the confidence of stakeholders.
Acknowledgements
Competing interests
The author declares that he has no financial or personal relationships that may have inappropriately influenced him in writing this article.
Author’s contributions
T.S. is the sole author of this research article.
Ethical considerations
This article followed all ethical standards for research without direct contact with human or animal subjects.
Funding information
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Data availability
The data supporting the findings of this study are publicly available and were accessed using the Yahoo Finance Application Programming Interface (API) (https://pypi.org/project/yfinance/). The dataset includes financial data such as revenue, expenses, assets, liabilities, and financial ratios. There are no restrictions on the availability of the data.
Disclaimer
The views and opinions expressed in this article are those of the author and are the product of professional research. It does not necessarily reflect the official policy or position of any affiliated institution, funder, agency, or that of the publisher. The author is responsible for this article’s results, findings, and content.
References
Adelakun, B., Fatogun, D., Majekodunmi, T., & Adediran, G. (2024). Integrating machine learning algorithms into audit processes: Benefits and challenges. Finance & Accounting Research Journal, 6(6), 1000–1016. https://doi.org/10.51594/farj.v6i6.1233
Amani, F.A., & Fadlalla, A.M. (2016). Data mining applications in accounting: A review of the literature and organizing framework. International Journal of Accounting Information Systems, 24, 32–58. https://doi.org/10.1016/j.accinf.2016.12.004
Arum, E., & Wahyudi, S.T. (2021). Audit quality and fraud detection: Evidence of the internal auditor of Jambi Province. Advances in Economics, Business and Management Research, 177, 7–11. https://doi.org/10.2991/aebmr.k.210616.002
Ashtiani, M., & Raahemi, B. (2021). The potential of machine learning and data mining techniques in analyzing large volumes of financial statements. Journal of Financial Crime, 28(1), 311–331.
Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551–572.
Chen, Y., Wu, Z., & Yan, H. (2022). A full population auditing method based on machine learning. Sustainability, 14(24), 17008. https://doi.org/10.3390/su142417008
European Parliament and Council. (2021). Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain legislative acts of the Union. COM/2021/206 final – 2021/0106 (COD). Office of the European Union.
EY. (2024). How an AI application can help auditors detect fraud. Retrieved from https://www.ey.com/engl/better-begins-with-you/how-an-ai-application-can-help-auditors-detect-fraud
Gorenc, M. (2019). Benford’s Law as a useful tool to determine fraud in financial statements. Management: Journal of Contemporary Management Issues, 14(1), 19–31. https://doi.org/10.26493/1854-4231.14.19-31
Markert, T., Langer, F., & Danos, V. (2022). GAFAI: Proposal of a generalized audit framework for AI. In Informatik 2022: Jahrestagung der Gesellschaft für Informatik e.V. Gesellschaft für Informatik.
Rahman, A., Hassan, R., & Nordin, A. (2021). Auditor choice prediction model using corporate governance and ownership attributes: A machine learning approach. International Journal of Emerging Technology and Advanced Engineering, 11(7), 88–96. https://doi.org/10.46338/ijetae0721_11
Sheu, G., & Liu, N. (2024). Symmetrical and asymmetrical sampling audit evidence using a naive bayes classifier. Symmetry, 16(4), 500. https://doi.org/10.3390/sym16040500
Sufi, T.S., Jan, S., Ahmad, Z., & Anwar, M. (2024). Improving the prediction of firm performance using nonfinancial disclosures: A machine learning approach. Journal of Accounting in Emerging Economies, 14(5), 1223–1251. https://doi.org/10.1108/jaee-07-2023-0205
Tammaru, M., & Alver, L. (2016, December). Application of Benford’s Law for fraud detection in financial statements: Theoretical review. In Proceedings of the 5th International Conference on Accounting, Auditing, and Taxation (ICAAT 2016). Atlantis Press. https://doi.org/10.2991/icaat-16.2016.46
Wardaya, A., Handoko, B.L., Willy, R., & Hendra, E. (2022). Benford’s Law as a tool in detecting financial statement fraud. Journal of Theoretical and Applied Information Technology, 100(14), 5300–5305.
Wu, H. (2024). Exploring the integration of emerging technologies and corporate governance. Advances in Economics, Management, and Political Sciences, 87, 55–69. https://doi.org/10.54254/2754-1169/87/20241021
Zeng, Y., Li, Z., & Zhang, L. (2020). A financial distress prediction model based on sparse algorithm and support vector machine. Mathematical Problems in Engineering, 2020, 1–12. https://doi.org/10.1155/2020/5625271