File: //usr/local/CyberCP/lib/python3.10/site-packages/charset_normalizer/__pycache__/api.cpython-310.pyc
o
    �h�X  �                   @  s:  d dl mZ d dlZd dlmZ d dlmZ ddlmZm	Z	m
Z
mZ ddlm
Z
mZmZmZ ddlmZ dd	lmZmZ dd
lmZmZmZmZmZmZmZ e�d�Ze� � Z!e!�"e�#d�� 	
								d2d3d$d%�Z$	
								d2d4d(d)�Z%	
								d2d5d,d-�Z&	
								d6d7d0d1�Z'dS )8�    )�annotationsN)�PathLike)�BinaryIO�   )�coherence_ratio�encoding_languages�mb_encoding_languages�merge_coherence_ratios)�IANA_SUPPORTED�TOO_BIG_SEQUENCE�TOO_SMALL_SEQUENCE�TRACE)�
mess_ratio)�CharsetMatch�CharsetMatches)�any_specified_encoding�cut_sequence_chunks�	iana_name�identify_sig_or_bom�
is_cp_similar�is_multi_byte_encoding�should_strip_sig_or_bom�charset_normalizerz)%(asctime)s | %(levelname)s | %(message)s�   �   皙�����?TF皙�����?�	sequences�bytes | bytearray�steps�int�
chunk_size�	threshold�float�cp_isolation�list[str] | None�cp_exclusion�preemptive_behaviour�bool�explain�language_threshold�enable_fallback�returnr   c
           2      C  s�	  t | ttf�std�t| ����|rtj}
t�t	� t�
t� t| �}|dkrGt�
d� |r;t�t	� t�
|
p9tj� tt| dddg d�g�S |dur]t�td	d
�|�� dd� |D �}ng }|durut�td
d
�|�� dd� |D �}ng }||| kr�t�td|||� d}|}|dkr�|| |k r�t|| �}t| �tk }t| �tk}
|r�t�td�|�� n|
r�t�td�|�� g }|r�t| �nd}|dur�|�|� t�td|� t� }g }g }d}d}d}t� }t� }t| �\}}|du�r|�|� t�tdt|�|� |�d� d|v�r|�d� |t D �]7}|�r$||v�r$�q|�r.||v �r.�q||v �r5�q|�|� d}||k}|�oFt|�}|dv �rX|�sXt�td|� �q|dv �ri|�sit�td|� �qzt|�}W n t t!f�y�   t�td|� Y �qw z9|
�r�|du �r�t"|du �r�| dtd�� n	| t|�td�� |d� nt"|du �r�| n| t|�d� |d�}W n+ t#t$f�y� } zt |t$��s�t�td|t"|�� |�|� W Y d}~�qd}~ww d} |D ]
}!t%||!��r�d}  n�q�| �rt�td||!� �qt&|�sdnt|�|t|| ��}"|�o&|du�o&t|�|k }#|#�r1t�td |� tt|"�d! �}$t'|$d"�}$d}%d}&g }'g }(zLt(| ||"||||||�	D ]=})|'�|)� |(�t)|)||du �ordt|�  k�opd"kn  �� |(d# |k�r�|%d7 }%|%|$k�s�|�r�|du �r� n�qSW n! t#�y� } zt�td$|t"|�� |$}%d}&W Y d}~nd}~ww |&�s�|
�r�|�s�z| td%�d� j*|d&d'� W n# t#�y� } zt�td(|t"|�� |�|� W Y d}~�qd}~ww |(�r�t+|(�t|(� nd}*|*|k�s|%|$k�rJ|�|� t�td)||%t,|*d* d+d,�� |	�rH|dd|d-d.fv �rH|&�sHt| |||g ||d/�}+||k�r>|+}n
|dk�rF|+}n|+}�qt�td0|t,|*d* d+d,�� |�s`t-|�},nt.|�},|,�rst�td1�|t"|,��� g }-|dk�r�|'D ]})t/|)||,�r�d2�|,�nd�}.|-�|.� �q|t0|-�}/|/�r�t�td3�|/|�� t| ||*||/|
du �s�||ddfv �r�|nd|d/�}0|�|0� ||ddfv �r�|*d4k �r�|*dk�r�t�
d5|0j1� |�r�t�t	� t�
|
� t|0g�  S |�|0� t|��r-|du �s||v �r-d|v �r-d|v �r-|�2� }1t�
d5|1j1� |�r&t�t	� t�
|
� t|1g�  S ||k�rNt�
d6|� |�rEt�t	� t�
|
� t|| g�  S �qt|�dk�r�|�s`|�s`|�rft�td7� |�rvt�
d8|j1� |�|� n2|�r~|du �s�|�r�|�r�|j3|j3k�s�|du�r�t�
d9� |�|� n
|�r�t�
d:� |�|� |�r�t�
d;|�2� j1t|�d � nt�
d<� |�r�t�t	� t�
|
� |S )=af  
    Given a raw bytes sequence, return the best possible charsets usable to render str objects.
    If there are no results, it is a strong indicator that the source is binary/not text.
    By default, the process will extract 5 blocks of 512 bytes each to assess the mess and coherence of a given sequence,
    and will give up on a particular code page after 20% of measured mess. Those criteria are customizable at will.
    The preemptive behaviour DOES NOT replace the traditional detection workflow; it prioritizes a particular code page
    but never takes it for granted. It can improve performance.
    You may want to focus your attention on some code pages and/or exclude others; use cp_isolation and cp_exclusion for that
    purpose.
    This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32.
    By default the library does not set up any handler other than the NullHandler. If you choose to set the 'explain'
    toggle to True, it will alter the logger configuration to add a StreamHandler that is suitable for debugging.
    A custom logging format and handler can be set manually.
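
The sampling described in the docstring above (5 blocks of 512 bytes by default) can be illustrated with a small standard-library sketch. This is a simplified illustration under stated assumptions, not the library's implementation: the helper name `plan_chunks` and its `offset` parameter are hypothetical, and only the defaults `steps=5` and `chunk_size=512` come from the docstring.

```python
def plan_chunks(length, steps=5, chunk_size=512, offset=0):
    """Return (start, end) byte ranges to sample from a payload of `length` bytes.

    Hypothetical helper for illustration only; the names and exact rules here
    are assumptions, not the charset_normalizer API.
    """
    if length == 0:
        return []
    # If the payload cannot hold `steps` chunks of `chunk_size` bytes,
    # fall back to inspecting the whole payload as a single chunk.
    if length <= chunk_size * steps:
        steps, chunk_size = 1, length
    # Spread the chunk start positions evenly over the payload.
    stride = int(length / steps)
    return [
        (start, min(start + chunk_size, length))
        for start in range(offset, length, stride)
    ]


if __name__ == "__main__":
    # A 10,000-byte payload is sampled at five evenly spaced positions.
    for start, end in plan_chunks(10_000):
        print(start, end)
```

A non-zero `offset` models skipping a leading SIG/BOM before sampling, which is an assumption made for this sketch.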
    z3Expected object of type bytes or bytearray, got: {}r   z<Encoding detection on empty bytes, assuming utf_8 intention.�utf_8g        F� Nz`cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.z, c                 S  �   g | ]}t |d ��qS �F�r   ��.0�cp� r5   �I/usr/local/CyberCP/lib/python3.10/site-packages/charset_normalizer/api.py�
<listcomp>[   �    zfrom_bytes.<locals>.<listcomp>zacp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.c                 S  r/   r0   r1   r2   r5   r5   r6   r7   f   r8   z^override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.r   z>Trying to detect encoding from a tiny portion of ({}) byte(s).zIUsing lazy str decoding because the payload is quite large, ({}) byte(s).z@Detected declarative mark in sequence. Priority +1 given for %s.zIDetected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.�ascii>   �utf_16�utf_32z\Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.>   �utf_7zREncoding %s won't be tested as-is because detection is unreliable without BOM/SIG.z2Encoding %s does not provide an IncrementalDecoderg    ��A)�encodingz9Code page %s does not fit given bytes sequence at ALL. %sTzW%s is deemed too similar to code page %s and was consider unsuited already. Continuing!zpCode page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.�   �   ���zaLazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %sg     j�@�strict)�errorsz^LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %szc%s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.�d   �   )�ndigitsr:   r;   )�preemptive_declarationz=%s passed initial chaos probing. Mean measured chaos is %f %%z&{} should target any language(s) of {}�,z We detected language {} using {}r   z.Encoding detection: %s is most likely the one.zoEncoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.zONothing got out of the detection process. 
Using ASCII/UTF-8/Specified fallback.z7Encoding detection: %s will be used as a fallback matchz:Encoding detection: utf_8 will be used as a fallback matchz:Encoding detection: ascii will be used as a fallback matchz]Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.z=Encoding detection: Unable to determine any suitable charset.)4�
isinstance�	bytearray�bytes�	TypeError�format�type�logger�level�
addHandler�explain_handler�setLevelr
   �len�debug�
removeHandler�logging�WARNINGr   r   �log�joinr    r   r   r   �append�setr   r
   �addr   r   �ModuleNotFoundError�ImportError�str�UnicodeDecodeError�LookupErrorr   �range�maxr   r   �decode�sum�roundr   r   r   r	   r=   �best�fingerprint)2r   r   r!   r"