The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Tika can also help you implement measures to defend against uploads of bogus and malicious files: https://cheatsheetseries.owasp.org/cheatsheets/File_Upload_Cheat_Sheet.html
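As a rough illustration of one such measure, the sketch below accepts an upload only when its detected media type is on an allow-list. The class name UploadTypeCheck and its methods are hypothetical and not part of Tika. The detector is passed in as a plain function so the example compiles without Tika on the classpath.

```java
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper: accepts an upload only if its detected type is on an allow-list. */
final class UploadTypeCheck {
    private final Set<String> allowedTypes;
    // Maps raw file bytes to a media type string such as "application/pdf".
    private final Function<byte[], String> detector;

    UploadTypeCheck(Set<String> allowedTypes, Function<byte[], String> detector) {
        this.allowedTypes = Set.copyOf(allowedTypes);
        this.detector = detector;
    }

    /** Returns true only when the detected media type is explicitly allowed. */
    boolean isAllowed(byte[] content) {
        if (content == null || content.length == 0) {
            return false; // reject empty uploads outright
        }
        String detected = detector.apply(content);
        return detected != null && allowedTypes.contains(detected);
    }
}
```

Assuming tika-core is available, the detector could be the Tika facade's detect(byte[]) overload, for example new UploadTypeCheck(Set.of("application/pdf"), new org.apache.tika.Tika()::detect). Detecting the type from the file content rather than trusting the file name or the client-supplied Content-Type is only one layer; the cheat sheet linked above describes further checks.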
Please be aware of the following: https://support.hcl-software.com/csm?id=kb_article&sysparm_article=KB0124165
Security Bulletin: HCL Notes is affected by an XML External Entity (XXE) vulnerability in Apache Tika (CVE-2025-54988)