LexCognition is an AI-based text analysis tool that runs multiple analyses over text files. It estimates whether a text is AI-generated or human-written, analyzes language complexity, and assesses vocabulary diversity; it can also produce a style-morphed version of the input and generate detailed audit reports. The tool is designed for processing large volumes of text to flag potential AI generation, gauge complexity, and surface other linguistic features.
- Authorship Detection: Determines the likelihood that a text is AI-generated or human-written.
- Language Complexity Analysis: Reports average word length, an overall language complexity rating, and a vocabulary diversity rating.
- Vocabulary Diversity Analysis: Reports sentence count, word count, average sentence length, and related statistics (a rough sketch of how such metrics can be computed follows this list).
- Style Morphing: Transforms the input text into a new style while retaining its original meaning.
- Audit Reporting: Generates detailed reports based on text analysis and version control data.
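The exact algorithms live in the package modules; purely as an illustration of the kinds of metrics reported above, the complexity and diversity analyses could be computed along these lines (function names, thresholds, and labels here are assumptions, not the project's actual implementation):

```python
# Illustrative only: a minimal sketch of the kind of metrics the
# complexity/diversity analyses report. Names and thresholds are
# hypothetical, not LexCognition's actual code.
import re

def analyze_language_complexity(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text)
    avg_word_length = sum(len(w) for w in words) / max(len(words), 1)
    # Type-token ratio as a simple stand-in for vocabulary diversity.
    type_token_ratio = len({w.lower() for w in words}) / max(len(words), 1)
    return {
        "avg_word_length": round(avg_word_length, 1),
        "language_complexity": "Complex Language" if avg_word_length > 5 else "Simple Language",
        "vocabulary_diversity": "High Diversity" if type_token_ratio > 0.5 else "Low Diversity",
    }

def analyze_vocabulary_diversity(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_sentence_length": round(len(words) / max(len(sentences), 1), 1),
    }
```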
```
.
├── analyze_text.py                 # Script for analyzing text
├── lexaudit                        # Directory containing auditing tools and scripts
│   ├── audit_report_generation.py  # Script for generating audit reports
│   ├── change_analysis.py          # Script for analyzing changes
│   └── __init__.py                 # Init file for the lexaudit module
├── lexdetect                       # Directory containing authorship detection tools
├── lexguard                        # Directory for security and privacy tools
├── lexprivacy                      # Directory for privacy analysis tools
├── lexcognition_analytics          # Directory containing analytics scripts
├── process_and_report.py           # Main script to process text files and generate reports
├── reports                         # Directory where generated reports are stored
├── requirements.txt                # Python dependencies
├── test_data                       # Directory containing test text files
└── README.md                       # This file
```
- Python 3.10 or later
- Pip (Python package manager)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/LexCognition.git
  cd LexCognition
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To process and analyze the text files in the `test_data` directory, run:

```bash
python process_and_report.py
```

The script analyzes each text file in `test_data` and generates an authorship detection report, a language complexity analysis, a vocabulary diversity analysis, style-morphed text, and a comprehensive audit report.
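At a high level, the driver walks the `test_data` directory, runs the analyses on each file, and writes the results into `reports`. The sketch below is a simplified, hypothetical version of that loop (the placeholder `analyze` function and the report file naming are assumptions, not the repository's actual API):

```python
# Hypothetical sketch of the processing loop in process_and_report.py;
# the real script and the lexdetect/lexaudit module APIs may differ.
import json
from pathlib import Path

def analyze(text: str) -> dict:
    # Placeholder standing in for the real analyzers (authorship detection,
    # complexity, diversity, style morphing, audit reporting).
    return {"word_count": len(text.split())}

def process_all(test_dir: str = "test_data", report_dir: str = "reports") -> None:
    out_dir = Path(report_dir)
    out_dir.mkdir(exist_ok=True)
    for path in sorted(Path(test_dir).glob("*.txt")):
        print(f"Processing text from {path.name}...")
        results = analyze(path.read_text(encoding="utf-8"))
        report_path = out_dir / f"{path.stem}_report.json"
        report_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
        print(f"Wrote {report_path}")

if __name__ == "__main__":
    process_all()
```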
When you run `process_and_report.py`, you will see output similar to the following:

```
Processing text from example-file.txt...
Authorship Detection: {'score': 0.89, 'is_ai_generated': True, 'summary': 'Highly Likely AI-Generated (Score: 0.89)'}
Language Complexity Analysis: {'avg_word_length': 5.2, 'language_complexity': 'Complex Language', 'vocabulary_diversity': 'High Diversity'}
Vocabulary Diversity Analysis: {'sentence_count': 12, 'word_count': 150, 'avg_sentence_length': 12.5}
Audit Report: {'filename': 'test_data/example-file.txt', 'changes': None, 'analysis_summary': 'Analysis based on version control data'}
```
The audit reports and other analyses are stored in the `reports` directory. Each report is saved as a JSON or plain-text file (depending on the analysis), making it easy to review and share results.
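If you want to post-process the JSON reports programmatically, they can be loaded with the standard library. The file name below only illustrates a possible naming pattern and is not a guaranteed path:

```python
# Load one generated report for review; the exact file name depends on
# which input file was analyzed and how the reports are named.
import json
from pathlib import Path

report_path = Path("reports") / "example-file_report.json"  # assumed name
if report_path.exists():
    report = json.loads(report_path.read_text(encoding="utf-8"))
    for section, result in report.items():
        print(f"{section}: {result}")
else:
    print(f"No report found at {report_path}")
```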
Contributions are welcome! Please submit a pull request with a detailed description of your changes. Ensure that your code follows the existing style and passes all tests.
This project is licensed under the MIT License. See the `LICENSE` file for more details.
For any questions, suggestions, or issues, please open an issue in this repository or contact [your email].