From c4e0e1d23d0f73643e54944327decfbca6602a8c Mon Sep 17 00:00:00 2001 From: irebai Date: Wed, 4 Mar 2020 14:53:55 +0100 Subject: [PATCH 001/172] Update test_deployment.sh --- test/test_deployment.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/test_deployment.sh b/test/test_deployment.sh index 79d509d..b1b8d36 100755 --- a/test/test_deployment.sh +++ b/test/test_deployment.sh @@ -1 +1 @@ -curl -X POST "http://localhost:8888/transcribe" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "wavFile=@bonjour.wav;type=audio/wav" +curl -X POST "http://localhost:8888/transcribe" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@bonjour.wav;type=audio/wav" From b42d6e043c7cad6aa67f190587186f694d41033c Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 8 Apr 2020 11:11:24 +0200 Subject: [PATCH 002/172] add LICENCE and RELEASE files --- LICENCE | 661 +++++++++++++++++++++++++++++++++++++++++++++++++++++ RELEASE.md | 2 + 2 files changed, 663 insertions(+) create mode 100644 LICENCE create mode 100644 RELEASE.md diff --git a/LICENCE b/LICENCE new file mode 100644 index 0000000..c39e3a4 --- /dev/null +++ b/LICENCE @@ -0,0 +1,661 @@ + GNU AFFERO GENERAL PUBLIC LICENSE + Version 3, 19 November 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU Affero General Public License is a free, copyleft license for +software and other kinds of works, specifically designed to ensure +cooperation with the community in the case of network server software. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +our General Public Licenses are intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + Developers that use our General Public Licenses protect your rights +with two steps: (1) assert copyright on the software, and (2) offer +you this License which gives you legal permission to copy, distribute +and/or modify the software. + + A secondary benefit of defending all users" freedom is that +improvements made in alternate versions of the program, if they +receive widespread use, become available for other developers to +incorporate. Many developers of free software are heartened and +encouraged by the resulting cooperation. However, in the case of +software used on network servers, this result may fail to come about. +The GNU General Public License permits making a modified version and +letting the public access it on a server without ever releasing its +source code to the public. + + The GNU Affero General Public License is designed specifically to +ensure that, in such cases, the modified source code becomes available +to the community. It requires the operator of a network server to +provide the source code of the modified version running there to the +users of that server. 
Therefore, public use of a modified version, on +a publicly accessible server, gives the public access to the source +code of the modified version. + + An older license, called the Affero General Public License and +published by Affero, was designed to accomplish similar goals. This is +a different license, not a version of the Affero GPL, but Affero has +released a new version of the Affero GPL which permits relicensing under +this license. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU Affero General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. 
A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work"s +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users" Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work"s +users, your or third parties" legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. 
+ + You may convey verbatim copies of the Program"s source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation"s users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. 
If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party"s predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor"s "contributor version". + + A contributor"s "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor"s essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). 
To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient"s use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others" Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Remote Network Interaction; Use with the GNU General Public License. 
+ + Notwithstanding any other provision of this License, if you modify the +Program, your modified version must prominently offer all users +interacting with it remotely through a computer network (if your version +supports such interaction) an opportunity to receive the Corresponding +Source of your version by providing access to the Corresponding Source +from a network server at no charge, through some standard or customary +means of facilitating copying of software. This Corresponding Source +shall include the Corresponding Source for any work covered by version 3 +of the GNU General Public License that is incorporated pursuant to the +following paragraph. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the work with which it is combined will remain governed by version +3 of the GNU General Public License. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU Affero General Public License from time to time. Such new versions +will be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU Affero General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU Affero General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU Affero General Public License can be used, that proxy"s +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. 
+ + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + + If your software can interact with users remotely through a computer +network, you should also make sure that it provides a way for users to +get its source. For example, if your program is a web application, its +interface could display a "Source" link that leads users to an archive +of the code. There are many ways you could offer source, and different +solutions will be better for different programs; see section 13 for the +specific requirements. + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU AGPL, see +. 
diff --git a/RELEASE.md b/RELEASE.md new file mode 100644 index 0000000..cc6807c --- /dev/null +++ b/RELEASE.md @@ -0,0 +1,2 @@ +# 1.0.0 +- First build of LinTO-Platform-stt-standalone-worker \ No newline at end of file From 667e6b9af22cbb479070914ae823cd5c2f57ed25 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 19 May 2020 23:51:51 +0200 Subject: [PATCH 003/172] fix minor bugs and replace swagger docker by a python package --- .envdefault | 6 +-- Dockerfile | 115 +++++++++++++++++++++++-------------------- Jenkinsfile | 51 +++++++++++++++++++ RELEASE.md | 8 ++- docker-compose.yml | 17 ++----- document/swagger.yml | 10 +--- run.py | 109 ++++++++++++++++++++++++---------------- 7 files changed, 193 insertions(+), 123 deletions(-) create mode 100644 Jenkinsfile diff --git a/.envdefault b/.envdefault index 42419d0..2246e24 100644 --- a/.envdefault +++ b/.envdefault @@ -1,7 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -WORKER_PORT=8888 -SERVICE_PORT=2000 - -SWAGGER_PATH=./document -SWAGGER_JSON=/app/swagger/swagger.yml +SWAGGER_PATH=/path/to/swagger/file \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index 988ed0e..83d7300 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ FROM debian:9 -MAINTAINER Ilyes REBAI +LABEL maintainer="irebai@linagora.com" # Install all our dependencies and set some required build changes RUN apt-get update &&\ @@ -10,62 +10,71 @@ RUN apt-get update &&\ python3-dev \ python-pip \ python3-pip \ - autoconf \ - automake \ - unzip \ - bc \ - bzip2 \ - default-jre \ - g++ \ - git \ - gzip \ - libatlas3-base \ - libtool-bin \ - make \ - sox \ - libsox-fmt-all \ - libav-tools \ - subversion \ - vorbis-tools \ - wget \ - zlib1g-dev &&\ - apt-get clean autoclean && \ - apt-get autoremove -y && \ - ln -s /usr/bin/python2.7 /usr/bin/python ; ln -s -f bash /bin/sh + g++ make automake autoconf bzip2 unzip wget sox libtool git subversion zlib1g-dev ca-certificates gfortran patch ffmpeg nano && \ + apt-get clean -ENV BASE_DIR /opt/speech-to-text +## Build kaldi and Clean installation (intel, openfst, src/*) +RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \ + cd /opt/kaldi && \ + cd /opt/kaldi/tools && \ + ./extras/install_mkl.sh && \ + make -j $(nproc) && \ + cd /opt/kaldi/src && \ + ./configure --shared && \ + make depend -j $(nproc) && \ + make -j $(nproc) && \ + mkdir -p /opt/kaldi/src_/lib /opt/kaldi/src_/bin && \ + mv /opt/kaldi/src/base/libkaldi-base.so \ + /opt/kaldi/src/chain/libkaldi-chain.so \ + /opt/kaldi/src/cudamatrix/libkaldi-cudamatrix.so \ + /opt/kaldi/src/decoder/libkaldi-decoder.so \ + /opt/kaldi/src/feat/libkaldi-feat.so \ + /opt/kaldi/src/fstext/libkaldi-fstext.so \ + /opt/kaldi/src/gmm/libkaldi-gmm.so \ + /opt/kaldi/src/hmm/libkaldi-hmm.so \ + /opt/kaldi/src/ivector/libkaldi-ivector.so \ + /opt/kaldi/src/kws/libkaldi-kws.so \ + /opt/kaldi/src/lat/libkaldi-lat.so \ + /opt/kaldi/src/lm/libkaldi-lm.so \ + /opt/kaldi/src/matrix/libkaldi-matrix.so \ + /opt/kaldi/src/nnet/libkaldi-nnet.so \ + /opt/kaldi/src/nnet2/libkaldi-nnet2.so \ + /opt/kaldi/src/nnet3/libkaldi-nnet3.so \ + /opt/kaldi/src/online2/libkaldi-online2.so \ + /opt/kaldi/src/rnnlm/libkaldi-rnnlm.so \ + /opt/kaldi/src/sgmm2/libkaldi-sgmm2.so \ + /opt/kaldi/src/transform/libkaldi-transform.so \ + /opt/kaldi/src/tree/libkaldi-tree.so \ + /opt/kaldi/src/util/libkaldi-util.so \ + /opt/kaldi/src_/lib && \ + mv /opt/kaldi/src/online2bin/online2-wav-nnet2-latgen-faster \ + 
/opt/kaldi/src/online2bin/online2-wav-nnet3-latgen-faster \ + /opt/kaldi/src/latbin/lattice-1best \ + /opt/kaldi/src/latbin/lattice-align-words \ + /opt/kaldi/src/latbin/nbest-to-ctm /opt/kaldi/src_/bin && \ + rm -rf /opt/kaldi/src && mv /opt/kaldi/src_ /opt/kaldi/src && \ + cd /opt/kaldi/src && rm -f lmbin/*.cc lmbin/*.o lmbin/Makefile fstbin/*.cc fstbin/*.o fstbin/Makefile bin/*.cc bin/*.o bin/Makefile && \ + cd /opt/intel/mkl/lib && rm -f intel64/*.a intel64_lin/*.a && \ + cd /opt/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ + rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp +## Install python packages +RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml -# Build kaldi -## Install main libraries -RUN cd /opt && git clone https://github.com/kaldi-asr/kaldi.git && \ - cd /opt/kaldi/tools && make -j$(nproc) +## Create symbolik links +RUN cd /opt/kaldi/src/bin && \ + ln -s online2-wav-nnet2-latgen-faster kaldi-nnet2-latgen-faster && \ + ln -s online2-wav-nnet3-latgen-faster kaldi-nnet3-latgen-faster && \ + ln -s lattice-1best kaldi-lattice-1best && \ + ln -s lattice-align-words kaldi-lattice-align-words && \ + ln -s nbest-to-ctm kaldi-nbest-to-ctm -#Install MKL package -RUN cd /opt/kaldi/tools && \ - extras/install_mkl.sh +# Set environment variables +ENV PATH /opt/kaldi/src/bin:/opt/kaldi/egs/wsj/s5/utils/:$PATH -## Install main functions -RUN cd /opt/kaldi/src && \ - sed -i -e ':a;N;$!ba;s:\\\n::g' Makefile && \ - sed -i -e 's:^SUBDIRS = .*$:SUBDIRS = base matrix util feat tree gmm transform fstext hmm lm decoder lat cudamatrix nnet bin nnet2 nnet3 chain ivector online2:g' -e 's:^MEMTESTDIRS = .*$:MEMTESTDIRS = :g' Makefile && \ - ./configure --shared && make depend -j$(nproc) && make -j$(nproc) && rm */*{.a,.o} - -RUN apt install -y libatlas-dev -RUN pip2 install flask configparser requests flask-cors - -RUN echo "/opt/kaldi/src/lib/" > /etc/ld.so.conf.d/kaldi.conf && \ - echo "/opt/kaldi/tools/openfst/lib/" >> /etc/ld.so.conf.d/kaldi.conf && \ - ldconfig - -COPY Makefile /opt -RUN cd /opt && ./Makefile - -RUN mkdir -p /opt/tmp -RUN cp /opt/kaldi/egs/wsj/s5/utils/int2sym.pl /opt -ENV PATH /opt:$PATH - -WORKDIR $BASE_DIR +WORKDIR /usr/src/speech-to-text COPY run.py . 
-CMD ./run.py +EXPOSE 80 + +CMD python3 ./run.py diff --git a/Jenkinsfile b/Jenkinsfile new file mode 100644 index 0000000..b4bdffc --- /dev/null +++ b/Jenkinsfile @@ -0,0 +1,51 @@ +pipeline { + agent any + environment { + DOCKER_HUB_REPO = "lintoai/linto-platform-stt-standalone-worker" + DOCKER_HUB_CRED = 'docker-hub-credentials' + + VERSION = '' + } + + stages{ + stage('Docker build for master branch'){ + when{ + branch 'master' + } + steps { + echo 'Publishing latest' + script { + image = docker.build(env.DOCKER_HUB_REPO) + VERSION = sh( + returnStdout: true, + script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + ).trim() + + docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + image.push("${VERSION}") + image.push('latest') + } + } + } + } + + stage('Docker build for next (unstable) branch'){ + when{ + branch 'next' + } + steps { + echo 'Publishing unstable' + script { + image = docker.build(env.DOCKER_HUB_REPO) + VERSION = sh( + returnStdout: true, + script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + ).trim() + docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + image.push('latest-unstable') + } + } + } + } + }// end stages +} diff --git a/RELEASE.md b/RELEASE.md index cc6807c..a2826a4 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,2 +1,6 @@ -# 1.0.0 -- First build of LinTO-Platform-stt-standalone-worker \ No newline at end of file +# 1.1.2 +- New features: + - Word timestamp computing + - Response type: plain/text: simple text output and application/json: the transcription and the words timestamp. + - Swagger: integrate swagger in the service using a python package + - Fix minor bugs \ No newline at end of file diff --git a/docker-compose.yml b/docker-compose.yml index b8ee470..8c8e9aa 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -9,19 +9,10 @@ services: volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM + - ${SWAGGER_PATH}:/opt/swagger.yml ports: - - ${WORKER_PORT}:${SERVICE_PORT} + - target: 80 + published: 8888 env_file: .env environment: - - ${SERVICE_PORT} - - swaggerui: - image: swaggerapi/swagger-ui - ports: - - 80:8080 - hostname: swaggerui - volumes: - - ${SWAGGER_PATH}:/app/swagger/ - env_file: .env - environment: - - SWAGGER_JSON + SWAGGER_PATH: /opt/swagger.yml \ No newline at end of file diff --git a/document/swagger.yml b/document/swagger.yml index 9cab647..ebc5c08 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -20,19 +20,13 @@ paths: - "multipart/form-data" produces: - "application/json" + - "text/plain" parameters: - name: "file" in: "formData" - description: "Wave File" + description: "Audio File (wav, mp3, aiff, flac, ogg)" required: true type: "file" - - name: "metadata" - in: "query" - description: "Accepted header" - required: true - type: "string" - enum: [ "Text", "Json" ] - default: "text" responses: 200: description: Successfully transcribe the audio diff --git a/run.py b/run.py index 822ee68..d678e94 100755 --- a/run.py +++ b/run.py @@ -1,33 +1,26 @@ -#!/usr/bin/env python2 +#!/usr/bin/env python3 # -*- coding: utf-8 -*- from flask import Flask, request, abort, Response, json +from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -from os import path -import uuid, os -import configparser -import subprocess -import shlex -import re +import uuid, os, configparser, subprocess, shlex, re, yaml app = Flask(__name__) -CORS(app) - -global busy -busy=0 +# 
Main parameters AM_PATH = '/opt/models/AM' LM_PATH = '/opt/models/LM' -TEMP_FILE_PATH = '/opt/tmp' #/opt/wavs -TEMP_FILE_PATH1= '/opt/models' +TEMP_FILE_PATH = '/opt/tmp' +CONFIG_FILES_PATH = '/opt/config' +SERVICE_PORT=80 +SWAGGER_URL='/api-doc' +if not os.path.isdir(TEMP_FILE_PATH): + os.mkdir(TEMP_FILE_PATH) +if not os.path.isdir(CONFIG_FILES_PATH): + os.mkdir(CONFIG_FILES_PATH) -def dockerId(): - with open('/proc/self/cgroup') as f: - lines = f.readlines() - for l in lines: - if '/docker/' in l: - return l.split('/')[2][:20] def run_shell_command(command_line): try: @@ -36,25 +29,25 @@ def run_shell_command(command_line): output, error = process.communicate() return False, output except OSError as err: - print("OS error: {0}".format(err)) + app.logger.info("OS error: {0}".format(err)) return True, '' except ValueError: - print("data error.") + app.logger.info("data error.") return True, '' except: - print("Unexpected error:", sys.exc_info()[0]) + app.logger.info("Unexpected error:", sys.exc_info()[0]) return True, '' def decode(audio_file,wav_name,do_word_tStamp): # Normalize audio file and convert it to wave format error, output = run_shell_command("sox "+audio_file+" -t wav -b 16 -r 16000 -c 1 "+audio_file+".wav") - if not path.exists(audio_file+".wav"): + if not os.path.exists(audio_file+".wav"): app.logger.info(output) return False, 'Error during audio file conversion!!! Supported formats are wav, mp3, aiff, flac, and ogg.' decode_file = audio_file+".wav" - decode_conf = TEMP_FILE_PATH1+"/online.conf" + decode_conf = CONFIG_FILES_PATH+"/online.conf" decode_mdl = AM_PATH+"/"+AM_FILE_PATH+"/final.mdl" decode_graph = LM_PATH+"/HCLG.fst" decode_words = LM_PATH+"/words.txt" @@ -62,20 +55,27 @@ def decode(audio_file,wav_name,do_word_tStamp): # Decode the audio file + decode_opt =" --min-active="+DECODER_MINACT + decode_opt+=" --max-active="+DECODER_MAXACT + decode_opt+=" --beam="+DECODER_BEAM + decode_opt+=" --lattice-beam="+DECODER_LATBEAM + decode_opt+=" --acoustic-scale="+DECODER_ACWT + + if DECODER_SYS == 'dnn3': - error, output = run_shell_command("kaldi-nnet3-latgen-faster --do-endpointing=false --frame-subsampling-factor="+DECODER_FSF+" --frames-per-chunk=20 --online=false --config="+decode_conf+" --minimize=false --min-active="+DECODER_MINACT+" --max-active="+DECODER_MAXACT+" --beam="+DECODER_BEAM+" --lattice-beam="+DECODER_LATBEAM+" --acoustic-scale="+DECODER_ACWT+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") + error, output = run_shell_command("kaldi-nnet3-latgen-faster --do-endpointing=false --frames-per-chunk=20 --online=false --frame-subsampling-factor="+DECODER_FSF+" --config="+decode_conf+" --minimize=false "+decode_opt+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") elif DECODER_SYS == 'dnn2' or DECODER_SYS == 'dnn': - error, output = run_shell_command("kaldi-nnet2-latgen-faster --do-endpointing=false --online=false --config="+decode_conf+" --min-active="+DECODER_MINACT+" --max-active="+DECODER_MAXACT+" --beam="+DECODER_BEAM+" --lattice-beam="+DECODER_LATBEAM+" --acoustic-scale="+DECODER_ACWT+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" 
ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") + error, output = run_shell_command("kaldi-nnet2-latgen-faster --do-endpointing=false --online=false --config="+decode_conf+" "+decode_opt+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") else: return False, 'The "decoder" parameter of the acoustic model is not supported!!!' - if not path.exists(TEMP_FILE_PATH+"/"+wav_name+".lat"): + if not os.path.exists(TEMP_FILE_PATH+"/"+wav_name+".lat"): app.logger.info(output) return False, 'One or multiple parameters of the acoustic model are not correct!!!' # Normalize the obtained transcription - hypothesis = re.findall('\n'+wav_name+'.*',output) + hypothesis = re.findall('\n'+wav_name+'.*',output.decode('utf-8')) trans=re.sub(wav_name,'',hypothesis[0]).strip() trans=re.sub(r"#nonterm:[^ ]* ", "", trans) trans=re.sub(r" ", " ", " "+trans+" ") @@ -88,7 +88,7 @@ def decode(audio_file,wav_name,do_word_tStamp): error, output = run_shell_command("kaldi-nbest-to-ctm ark:"+TEMP_FILE_PATH+"/"+wav_name+".words "+TEMP_FILE_PATH+"/"+wav_name+".ctm") error, output = run_shell_command("int2sym.pl -f 5 "+decode_words+" "+TEMP_FILE_PATH+"/"+wav_name+".ctm") if not error and output != "": - words = output.split("\n") + words = output.decode('utf-8').split("\n") trans = "" data = {} data["words"] = [] @@ -117,8 +117,14 @@ def transcribe(): global busy busy=1 fileid = str(uuid.uuid4()) - metadata = True if request.args.get('metadata').lower() == 'json' else False + if request.headers.get('accept').lower() == 'application/json': + metadata = True + elif request.headers.get('accept').lower() == 'text/plain': + metadata = False + else: + return 'Not accepted header', 400 + if 'file' in request.files.keys(): file = request.files['file'] file_ext = file.filename.rsplit('.', 1)[-1].lower() @@ -144,20 +150,27 @@ def transcribe(): json_string = json.dumps(out, ensure_ascii=False) return Response(json_string,content_type="application/json; charset=utf-8" ), 200 -@app.route('/check', methods=['GET']) +@app.route('/healthcheck', methods=['GET']) def check(): return '1', 200 -@app.route('/stop', methods=['POST']) -def stop(): - while(busy==1): - continue - subprocess.call("kill 1",shell=True) - return '1', 200 +# Rejected request handlers +@app.errorhandler(405) +def page_not_found(error): + return 'The method is not allowed for the requested URL', 405 -if __name__ == '__main__': - SERVICE_PORT = os.environ['SERVICE_PORT'] +@app.errorhandler(404) +def page_not_found(error): + return 'The requested URL was not found', 404 +if __name__ == '__main__': + if 'SERVICE_PORT' in os.environ: + SERVICE_PORT = os.environ['SERVICE_PORT'] + if 'SWAGGER_PATH' not in os.environ: + exit("You have to provide a 'SWAGGER_PATH'") + + SWAGGER_PATH = os.environ['SWAGGER_PATH'] + #Decoder parameters applied for both GMM and DNN based ASR systems decoder_settings = configparser.ConfigParser() decoder_settings.read(AM_PATH+'/decode.cfg') @@ -174,15 +187,15 @@ def stop(): AM_FINAL_PATH=AM_PATH+"/"+AM_FILE_PATH with open(AM_FINAL_PATH+"/conf/online.conf") as f: values = f.readlines() - with open(TEMP_FILE_PATH1+"/online.conf", 'w') as f: + with open(CONFIG_FILES_PATH+"/online.conf", 'w') as f: for i in values: f.write(i) - f.write("--ivector-extraction-config="+TEMP_FILE_PATH1+"/ivector_extractor.conf\n") + f.write("--ivector-extraction-config="+CONFIG_FILES_PATH+"/ivector_extractor.conf\n") 
f.write("--mfcc-config="+AM_FINAL_PATH+"/conf/mfcc.conf") with open(AM_FINAL_PATH+"/conf/ivector_extractor.conf") as f: values = f.readlines() - with open(TEMP_FILE_PATH1+"/ivector_extractor.conf", 'w') as f: + with open(CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: for i in values: f.write(i) f.write("--splice-config="+AM_FINAL_PATH+"/conf/splice.conf\n") @@ -192,6 +205,18 @@ def stop(): f.write("--diag-ubm="+AM_FINAL_PATH+"/ivector_extractor/final.dubm\n") f.write("--ivector-extractor="+AM_FINAL_PATH+"/ivector_extractor/final.ie") + ### swagger specific ### + swagger_yml = yaml.load(open(SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swaggerui = get_swaggerui_blueprint( + SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + SWAGGER_PATH, + config={ # Swagger UI config overrides + 'app_name': "STT API Documentation", + 'spec': swagger_yml + } + ) + app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) + ### end swagger specific ### #Run server app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) From e53ed3f244244fb67a5d597ad8e2eb60ba28d2ae Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 2 Jun 2020 20:20:33 +0200 Subject: [PATCH 004/172] New stt-worker based on pykaldi package --- Dockerfile | 178 ++++++++++++++++++++++++++++----------------- Jenkinsfile | 19 +++++ Makefile | 14 ---- RELEASE.md | 8 +- run.py | 205 +++++++--------------------------------------------- 5 files changed, 158 insertions(+), 266 deletions(-) delete mode 100755 Makefile diff --git a/Dockerfile b/Dockerfile index 83d7300..dc10fa9 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,80 +1,126 @@ -FROM debian:9 +# Dockerfile for building PyKaldi image from Ubuntu 16.04 image +FROM ubuntu:18.04 LABEL maintainer="irebai@linagora.com" -# Install all our dependencies and set some required build changes -RUN apt-get update &&\ - apt-get install -y \ - python2.7 \ - python3 \ - python-dev \ - python3-dev \ - python-pip \ +# Install necessary system packages +RUN apt-get update \ + && apt-get install -y \ + python3 \ python3-pip \ - g++ make automake autoconf bzip2 unzip wget sox libtool git subversion zlib1g-dev ca-certificates gfortran patch ffmpeg nano && \ - apt-get clean + python2.7 \ + autoconf \ + automake \ + cmake \ + make \ + curl \ + g++ \ + git \ + graphviz \ + libatlas3-base \ + libtool \ + pkg-config \ + sox \ + subversion \ + bzip2 \ + unzip \ + wget \ + zlib1g-dev \ + ca-certificates \ + gfortran \ + patch \ + ffmpeg \ + nano && \ + ln -s /usr/bin/python3 /usr/bin/python && \ + ln -s /usr/bin/pip3 /usr/bin/pip -## Build kaldi and Clean installation (intel, openfst, src/*) -RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \ - cd /opt/kaldi && \ - cd /opt/kaldi/tools && \ - ./extras/install_mkl.sh && \ - make -j $(nproc) && \ - cd /opt/kaldi/src && \ - ./configure --shared && \ - make depend -j $(nproc) && \ - make -j $(nproc) && \ - mkdir -p /opt/kaldi/src_/lib /opt/kaldi/src_/bin && \ - mv /opt/kaldi/src/base/libkaldi-base.so \ - /opt/kaldi/src/chain/libkaldi-chain.so \ - /opt/kaldi/src/cudamatrix/libkaldi-cudamatrix.so \ - /opt/kaldi/src/decoder/libkaldi-decoder.so \ - /opt/kaldi/src/feat/libkaldi-feat.so \ - /opt/kaldi/src/fstext/libkaldi-fstext.so \ - /opt/kaldi/src/gmm/libkaldi-gmm.so \ - /opt/kaldi/src/hmm/libkaldi-hmm.so \ - /opt/kaldi/src/ivector/libkaldi-ivector.so \ - /opt/kaldi/src/kws/libkaldi-kws.so \ - /opt/kaldi/src/lat/libkaldi-lat.so \ - /opt/kaldi/src/lm/libkaldi-lm.so \ - 
/opt/kaldi/src/matrix/libkaldi-matrix.so \ - /opt/kaldi/src/nnet/libkaldi-nnet.so \ - /opt/kaldi/src/nnet2/libkaldi-nnet2.so \ - /opt/kaldi/src/nnet3/libkaldi-nnet3.so \ - /opt/kaldi/src/online2/libkaldi-online2.so \ - /opt/kaldi/src/rnnlm/libkaldi-rnnlm.so \ - /opt/kaldi/src/sgmm2/libkaldi-sgmm2.so \ - /opt/kaldi/src/transform/libkaldi-transform.so \ - /opt/kaldi/src/tree/libkaldi-tree.so \ - /opt/kaldi/src/util/libkaldi-util.so \ - /opt/kaldi/src_/lib && \ - mv /opt/kaldi/src/online2bin/online2-wav-nnet2-latgen-faster \ - /opt/kaldi/src/online2bin/online2-wav-nnet3-latgen-faster \ - /opt/kaldi/src/latbin/lattice-1best \ - /opt/kaldi/src/latbin/lattice-align-words \ - /opt/kaldi/src/latbin/nbest-to-ctm /opt/kaldi/src_/bin && \ - rm -rf /opt/kaldi/src && mv /opt/kaldi/src_ /opt/kaldi/src && \ - cd /opt/kaldi/src && rm -f lmbin/*.cc lmbin/*.o lmbin/Makefile fstbin/*.cc fstbin/*.o fstbin/Makefile bin/*.cc bin/*.o bin/Makefile && \ - cd /opt/intel/mkl/lib && rm -f intel64/*.a intel64_lin/*.a && \ - cd /opt/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ - rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp +# Install necessary Python packages (pykaldi dependencies) +RUN pip install --upgrade pip \ + numpy \ + setuptools \ + pyparsing \ + ninja -## Install python packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml +## Install Protobuf, CLIF, Kaldi and PyKaldi and Clean installation +RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ + && cd /pykaldi/tools \ + && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_kaldi.sh \ + && sed -i "s/\-j 2/-j $(nproc)/g" ./install_clif.sh \ + && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_protobuf.sh \ + && ./check_dependencies.sh \ + && ./install_protobuf.sh \ + && ./install_clif.sh \ + && ./install_kaldi.sh \ + && cd /pykaldi \ + && python setup.py install \ + && rm -rf /pykaldi/CMakeLists.txt \ + /pykaldi/LICENSE \ + /pykaldi/README.md \ + /pykaldi/setup.cfg \ + /pykaldi/setup.py \ + /pykaldi/docker \ + /pykaldi/docs \ + /pykaldi/extras \ + /pykaldi/pykaldi.egg-info \ + /pykaldi/tests \ + /pykaldi/build/CMakeCache.txt \ + /pykaldi/build/bdist.linux-x86_64 \ + /pykaldi/build/build.ninja \ + /pykaldi/build/cmake_install.cmake \ + /pykaldi/build/docs \ + /pykaldi/build/kaldi \ + /pykaldi/build/lib \ + /pykaldi/build/rules.ninja \ + /pykaldi/tools/check_dependencies.sh \ + /pykaldi/tools/clif* \ + /pykaldi/tools/find_python_library.py \ + /pykaldi/tools/install_* \ + /pykaldi/tools/protobuf \ + /pykaldi/tools/use_namespace.sh \ + /pykaldi/tools/kaldi/COPYING \ + /pykaldi/tools/kaldi/INSTALL \ + /pykaldi/tools/kaldi/README.md \ + /pykaldi/tools/kaldi/egs \ + /pykaldi/tools/kaldi/misc \ + /pykaldi/tools/kaldi/scripts \ + /pykaldi/tools/kaldi/windows \ + && mkdir -p /pykaldi/tools/kaldi/src_/lib \ + && mv /pykaldi/tools/kaldi/src/base/libkaldi-base.so \ + /pykaldi/tools/kaldi/src/chain/libkaldi-chain.so \ + /pykaldi/tools/kaldi/src/cudamatrix/libkaldi-cudamatrix.so \ + /pykaldi/tools/kaldi/src/decoder/libkaldi-decoder.so \ + /pykaldi/tools/kaldi/src/feat/libkaldi-feat.so \ + /pykaldi/tools/kaldi/src/fstext/libkaldi-fstext.so \ + /pykaldi/tools/kaldi/src/gmm/libkaldi-gmm.so \ + /pykaldi/tools/kaldi/src/hmm/libkaldi-hmm.so \ + /pykaldi/tools/kaldi/src/ivector/libkaldi-ivector.so \ + /pykaldi/tools/kaldi/src/kws/libkaldi-kws.so \ + /pykaldi/tools/kaldi/src/lat/libkaldi-lat.so \ + 
/pykaldi/tools/kaldi/src/lm/libkaldi-lm.so \ + /pykaldi/tools/kaldi/src/matrix/libkaldi-matrix.so \ + /pykaldi/tools/kaldi/src/nnet/libkaldi-nnet.so \ + /pykaldi/tools/kaldi/src/nnet2/libkaldi-nnet2.so \ + /pykaldi/tools/kaldi/src/nnet3/libkaldi-nnet3.so \ + /pykaldi/tools/kaldi/src/online2/libkaldi-online2.so \ + /pykaldi/tools/kaldi/src/rnnlm/libkaldi-rnnlm.so \ + /pykaldi/tools/kaldi/src/sgmm2/libkaldi-sgmm2.so \ + /pykaldi/tools/kaldi/src/transform/libkaldi-transform.so \ + /pykaldi/tools/kaldi/src/tree/libkaldi-tree.so \ + /pykaldi/tools/kaldi/src/util/libkaldi-util.so \ + /pykaldi/tools/kaldi/src_/lib \ + && rm -rf /pykaldi/tools/kaldi/src && mv /pykaldi/tools/kaldi/src_ /pykaldi/tools/kaldi/src \ + && cd /pykaldi/tools/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ + rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp -## Create symbolik links -RUN cd /opt/kaldi/src/bin && \ - ln -s online2-wav-nnet2-latgen-faster kaldi-nnet2-latgen-faster && \ - ln -s online2-wav-nnet3-latgen-faster kaldi-nnet3-latgen-faster && \ - ln -s lattice-1best kaldi-lattice-1best && \ - ln -s lattice-align-words kaldi-lattice-align-words && \ - ln -s nbest-to-ctm kaldi-nbest-to-ctm +# Install main service packages +RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml # Set environment variables -ENV PATH /opt/kaldi/src/bin:/opt/kaldi/egs/wsj/s5/utils/:$PATH +ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH WORKDIR /usr/src/speech-to-text +COPY tools.py . COPY run.py . EXPOSE 80 -CMD python3 ./run.py +CMD python3 ./run.py \ No newline at end of file diff --git a/Jenkinsfile b/Jenkinsfile index b4bdffc..530e391 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -47,5 +47,24 @@ pipeline { } } } + + stage('Docker build for pykaldi (unstable) branch'){ + when{ + branch 'pykaldi' + } + steps { + echo 'Publishing new Feature branch' + script { + image = docker.build(env.DOCKER_HUB_REPO) + VERSION = sh( + returnStdout: true, + script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + ).trim() + docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + image.push('pykaldi') + } + } + } + } }// end stages } diff --git a/Makefile b/Makefile deleted file mode 100755 index 6b72774..0000000 --- a/Makefile +++ /dev/null @@ -1,14 +0,0 @@ -echo "Compile nnet2_decoder" -g++ -std=c++11 -L/usr/lib/atlas-base/atlas -L/opt/kaldi/tools/openfst/lib -L/opt/kaldi/src/lib -lblas -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lfst -lkaldi-cudamatrix -lkaldi-ivector -I /opt/kaldi/src -I /opt/kaldi/tools/openfst/include /opt/kaldi/src/online2bin/online2-wav-nnet2-latgen-faster.cc -o kaldi-nnet2-latgen-faster /opt/kaldi/src/lib/libkaldi-feat.so /opt/kaldi/src/lib/libkaldi-nnet2.so -lrt -lm -lpthread - -echo "Compile nnet3_decoder" -g++ -std=c++11 -L/usr/lib/atlas-base/atlas -L/opt/kaldi/tools/openfst/lib -L/opt/kaldi/src/lib -lblas -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lfst -lkaldi-cudamatrix -lkaldi-ivector -I /opt/kaldi/src -I /opt/kaldi/tools/openfst/include /opt/kaldi/src/online2bin/online2-wav-nnet3-latgen-faster.cc -o kaldi-nnet3-latgen-faster /opt/kaldi/src/lib/libkaldi-feat.so -lrt -lm 
-lpthread - -echo "Compile lattice-to-1best" -g++ -std=c++11 -L/usr/lib/atlas-base/atlas -L/opt/kaldi/tools/openfst/lib -L/opt/kaldi/src/lib -lblas -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lfst -lkaldi-cudamatrix -lkaldi-ivector -I /opt/kaldi/src -I /opt/kaldi/tools/openfst/include /opt/kaldi/src/latbin/lattice-1best.cc -o kaldi-lattice-1best /opt/kaldi/src/lib/libkaldi-feat.so -lrt -lm -lpthread - -echo "Compile lattice-align-words" -g++ -std=c++11 -L/usr/lib/atlas-base/atlas -L/opt/kaldi/tools/openfst/lib -L/opt/kaldi/src/lib -lblas -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lfst -lkaldi-cudamatrix -lkaldi-ivector -I /opt/kaldi/src -I /opt/kaldi/tools/openfst/include /opt/kaldi/src/latbin/lattice-align-words.cc -o kaldi-lattice-align-words /opt/kaldi/src/lib/libkaldi-feat.so -lrt -lm -lpthread - -echo "Compile nbest-to-ctm" -g++ -std=c++11 -L/usr/lib/atlas-base/atlas -L/opt/kaldi/tools/openfst/lib -L/opt/kaldi/src/lib -lblas -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lfst -lkaldi-cudamatrix -lkaldi-ivector -I /opt/kaldi/src -I /opt/kaldi/tools/openfst/include /opt/kaldi/src/latbin/nbest-to-ctm.cc -o kaldi-nbest-to-ctm /opt/kaldi/src/lib/libkaldi-feat.so -lrt -lm -lpthread diff --git a/RELEASE.md b/RELEASE.md index a2826a4..1d02d63 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,6 +1,2 @@ -# 1.1.2 -- New features: - - Word timestamp computing - - Response type: plain/text: simple text output and application/json: the transcription and the words timestamp. 
- - Swagger: integrate swagger in the service using a python package - - Fix minor bugs \ No newline at end of file +# 2.0.0 +- New ASR engine based on pykaldi package \ No newline at end of file diff --git a/run.py b/run.py index d678e94..88b6649 100755 --- a/run.py +++ b/run.py @@ -4,7 +4,7 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -import uuid, os, configparser, subprocess, shlex, re, yaml +import yaml, os app = Flask(__name__) @@ -16,139 +16,36 @@ SERVICE_PORT=80 SWAGGER_URL='/api-doc' + if not os.path.isdir(TEMP_FILE_PATH): os.mkdir(TEMP_FILE_PATH) if not os.path.isdir(CONFIG_FILES_PATH): os.mkdir(CONFIG_FILES_PATH) +# Environment parameters +if 'SERVICE_PORT' in os.environ: + SERVICE_PORT = os.environ['SERVICE_PORT'] +if 'SWAGGER_PATH' not in os.environ: + exit("You have to provide a 'SWAGGER_PATH'") +SWAGGER_PATH = os.environ['SWAGGER_PATH'] -def run_shell_command(command_line): - try: - command_line_args = shlex.split(command_line) - process = subprocess.Popen(command_line_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) - output, error = process.communicate() - return False, output - except OSError as err: - app.logger.info("OS error: {0}".format(err)) - return True, '' - except ValueError: - app.logger.info("data error.") - return True, '' - except: - app.logger.info("Unexpected error:", sys.exc_info()[0]) - return True, '' - -def decode(audio_file,wav_name,do_word_tStamp): - # Normalize audio file and convert it to wave format - error, output = run_shell_command("sox "+audio_file+" -t wav -b 16 -r 16000 -c 1 "+audio_file+".wav") - if not os.path.exists(audio_file+".wav"): - app.logger.info(output) - return False, 'Error during audio file conversion!!! Supported formats are wav, mp3, aiff, flac, and ogg.' - - - decode_file = audio_file+".wav" - decode_conf = CONFIG_FILES_PATH+"/online.conf" - decode_mdl = AM_PATH+"/"+AM_FILE_PATH+"/final.mdl" - decode_graph = LM_PATH+"/HCLG.fst" - decode_words = LM_PATH+"/words.txt" - decode_words_boundary = LM_PATH+"/word_boundary.int" - - - # Decode the audio file - decode_opt =" --min-active="+DECODER_MINACT - decode_opt+=" --max-active="+DECODER_MAXACT - decode_opt+=" --beam="+DECODER_BEAM - decode_opt+=" --lattice-beam="+DECODER_LATBEAM - decode_opt+=" --acoustic-scale="+DECODER_ACWT - - - if DECODER_SYS == 'dnn3': - error, output = run_shell_command("kaldi-nnet3-latgen-faster --do-endpointing=false --frames-per-chunk=20 --online=false --frame-subsampling-factor="+DECODER_FSF+" --config="+decode_conf+" --minimize=false "+decode_opt+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") - elif DECODER_SYS == 'dnn2' or DECODER_SYS == 'dnn': - error, output = run_shell_command("kaldi-nnet2-latgen-faster --do-endpointing=false --online=false --config="+decode_conf+" "+decode_opt+" --word-symbol-table="+decode_words+" "+decode_mdl+" "+decode_graph+" \"ark:echo "+wav_name+" "+wav_name+"|\" \"scp:echo "+wav_name+" "+decode_file+"|\" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat") - else: - return False, 'The "decoder" parameter of the acoustic model is not supported!!!' - - if not os.path.exists(TEMP_FILE_PATH+"/"+wav_name+".lat"): - app.logger.info(output) - return False, 'One or multiple parameters of the acoustic model are not correct!!!' 
- - - # Normalize the obtained transcription - hypothesis = re.findall('\n'+wav_name+'.*',output.decode('utf-8')) - trans=re.sub(wav_name,'',hypothesis[0]).strip() - trans=re.sub(r"#nonterm:[^ ]* ", "", trans) - trans=re.sub(r" ", " ", " "+trans+" ") - - - # Get the begin and end time stamp from the decoder output - if do_word_tStamp: - error, output = run_shell_command("kaldi-lattice-1best --acoustic-scale="+DECODER_ACWT+" ark:"+TEMP_FILE_PATH+"/"+wav_name+".lat ark:"+TEMP_FILE_PATH+"/"+wav_name+".1best") - error, output = run_shell_command("kaldi-lattice-align-words "+decode_words_boundary+" "+decode_mdl+" ark:"+TEMP_FILE_PATH+"/"+wav_name+".1best ark:"+TEMP_FILE_PATH+"/"+wav_name+".words") - error, output = run_shell_command("kaldi-nbest-to-ctm ark:"+TEMP_FILE_PATH+"/"+wav_name+".words "+TEMP_FILE_PATH+"/"+wav_name+".ctm") - error, output = run_shell_command("int2sym.pl -f 5 "+decode_words+" "+TEMP_FILE_PATH+"/"+wav_name+".ctm") - if not error and output != "": - words = output.decode('utf-8').split("\n") - trans = "" - data = {} - data["words"] = [] - for word in words: - _word = word.strip().split(' ') - if len(_word) == 5: - meta = {} - word = re.sub("","",_word[4]) - word = re.sub("","",_word[4]) - if word != "": - trans = trans+" "+word - meta["word"] = word - meta["stime"] = float(_word[2]) - meta["etime"] = (float(_word[2]) + float(_word[3])) - meta["score"] = float(_word[1]) - data["words"].append(meta) - data["transcription"] = trans.strip() - return True, data - else: - app.logger.info("error during word time stamp generation") - - return True, trans.strip() +def swaggerUI(): + ### swagger specific ### + swagger_yml = yaml.load(open(SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swaggerui = get_swaggerui_blueprint( + SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + SWAGGER_PATH, + config={ # Swagger UI config overrides + 'app_name': "STT API Documentation", + 'spec': swagger_yml + } + ) + app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) + ### end swagger specific ### @app.route('/transcribe', methods=['POST']) def transcribe(): - global busy - busy=1 - fileid = str(uuid.uuid4()) - if request.headers.get('accept').lower() == 'application/json': - metadata = True - elif request.headers.get('accept').lower() == 'text/plain': - metadata = False - else: - return 'Not accepted header', 400 - - - if 'file' in request.files.keys(): - file = request.files['file'] - file_ext = file.filename.rsplit('.', 1)[-1].lower() - file_type = file.content_type.rsplit('/', 1)[0] - if file_type == "audio": - filename = TEMP_FILE_PATH+'/'+fileid+'.'+file_ext - file.save(filename) - b, out = decode(filename,fileid,metadata) - if not b: - busy=0 - return 'Error while file transcription: '+out, 400 - else: - busy=0 - return 'Error while file transcription: The uploaded file format is not supported!!! 
Supported formats are wav, mp3, aiff, flac, and ogg.', 400 - else: - busy=0 - return 'No audio file was uploaded', 400 - - # Delete temporary files - for file in os.listdir(TEMP_FILE_PATH): - os.remove(TEMP_FILE_PATH+"/"+file) - busy=0 - json_string = json.dumps(out, ensure_ascii=False) - return Response(json_string,content_type="application/json; charset=utf-8" ), 200 + return 'Test', 200 @app.route('/healthcheck', methods=['GET']) def check(): @@ -164,60 +61,8 @@ def page_not_found(error): return 'The requested URL was not found', 404 if __name__ == '__main__': - if 'SERVICE_PORT' in os.environ: - SERVICE_PORT = os.environ['SERVICE_PORT'] - if 'SWAGGER_PATH' not in os.environ: - exit("You have to provide a 'SWAGGER_PATH'") - - SWAGGER_PATH = os.environ['SWAGGER_PATH'] + #start SwaggerUI + swaggerUI() - #Decoder parameters applied for both GMM and DNN based ASR systems - decoder_settings = configparser.ConfigParser() - decoder_settings.read(AM_PATH+'/decode.cfg') - DECODER_SYS = decoder_settings.get('decoder_params', 'decoder') - AM_FILE_PATH = decoder_settings.get('decoder_params', 'ampath') - DECODER_MINACT = decoder_settings.get('decoder_params', 'min_active') - DECODER_MAXACT = decoder_settings.get('decoder_params', 'max_active') - DECODER_BEAM = decoder_settings.get('decoder_params', 'beam') - DECODER_LATBEAM = decoder_settings.get('decoder_params', 'lattice_beam') - DECODER_ACWT = decoder_settings.get('decoder_params', 'acwt') - DECODER_FSF = decoder_settings.get('decoder_params', 'frame_subsampling_factor') - - #Prepare config files - AM_FINAL_PATH=AM_PATH+"/"+AM_FILE_PATH - with open(AM_FINAL_PATH+"/conf/online.conf") as f: - values = f.readlines() - with open(CONFIG_FILES_PATH+"/online.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--ivector-extraction-config="+CONFIG_FILES_PATH+"/ivector_extractor.conf\n") - f.write("--mfcc-config="+AM_FINAL_PATH+"/conf/mfcc.conf") - - with open(AM_FINAL_PATH+"/conf/ivector_extractor.conf") as f: - values = f.readlines() - with open(CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--splice-config="+AM_FINAL_PATH+"/conf/splice.conf\n") - f.write("--cmvn-config="+AM_FINAL_PATH+"/conf/online_cmvn.conf\n") - f.write("--lda-matrix="+AM_FINAL_PATH+"/ivector_extractor/final.mat\n") - f.write("--global-cmvn-stats="+AM_FINAL_PATH+"/ivector_extractor/global_cmvn.stats\n") - f.write("--diag-ubm="+AM_FINAL_PATH+"/ivector_extractor/final.dubm\n") - f.write("--ivector-extractor="+AM_FINAL_PATH+"/ivector_extractor/final.ie") - - ### swagger specific ### - swagger_yml = yaml.load(open(SWAGGER_PATH, 'r'), Loader=yaml.Loader) - swaggerui = get_swaggerui_blueprint( - SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' - SWAGGER_PATH, - config={ # Swagger UI config overrides - 'app_name': "STT API Documentation", - 'spec': swagger_yml - } - ) - app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) - ### end swagger specific ### - #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) - + app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) \ No newline at end of file From c8a50267e91e2f9629b52e8cb5c05b3dad96ac38 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 2 Jun 2020 20:24:42 +0200 Subject: [PATCH 005/172] add ASR tools and init ASR engine --- run.py | 5 +++ tools.py | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 119 insertions(+) create mode 100644 tools.py 
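PATCH 005 introduces the pykaldi-based engine in the new tools.py below. For orientation, here is a minimal, self-contained sketch of the standard pykaldi online-nnet3 decoding recipe that this code builds on; the model paths, tuning values and the wav.scp list are placeholders, not the service's actual configuration.

# Sketch only: decode every utterance listed in a Kaldi wav.scp with pykaldi.
# Paths ("online.conf", "final.mdl", "HCLG.fst", "words.txt", "wav.scp") and the
# tuning values are assumptions; the real service reads them from decode.cfg.
from kaldi.asr import NnetLatticeFasterOnlineRecognizer
from kaldi.decoder import LatticeFasterDecoderOptions
from kaldi.nnet3 import NnetSimpleLoopedComputationOptions
from kaldi.online2 import (OnlineEndpointConfig,
                           OnlineNnetFeaturePipelineConfig,
                           OnlineNnetFeaturePipelineInfo,
                           OnlineNnetFeaturePipeline)
from kaldi.util.options import ParseOptions
from kaldi.util.table import SequentialWaveReader

# Feature pipeline configuration (MFCC + i-vectors) read from an online.conf file
feat_opts = OnlineNnetFeaturePipelineConfig()
endpoint_opts = OnlineEndpointConfig()
po = ParseOptions("")
feat_opts.register(po)
endpoint_opts.register(po)
po.read_config_file("online.conf")
feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts)

# Decoder options; in the service these values come from the model's decode.cfg
decoder_opts = LatticeFasterDecoderOptions()
decoder_opts.beam = 13.0
decoder_opts.max_active = 7000
decodable_opts = NnetSimpleLoopedComputationOptions()
decodable_opts.acoustic_scale = 1.0
decodable_opts.frame_subsampling_factor = 3
decodable_opts.frames_per_chunk = 150

asr = NnetLatticeFasterOnlineRecognizer.from_files(
    "final.mdl", "HCLG.fst", "words.txt",
    decoder_opts=decoder_opts,
    decodable_opts=decodable_opts,
    endpoint_opts=endpoint_opts)

for key, wav in SequentialWaveReader("scp:wav.scp"):
    feat_pipeline = OnlineNnetFeaturePipeline(feat_info)
    asr.set_input_pipeline(feat_pipeline)
    feat_pipeline.accept_waveform(wav.samp_freq, wav.data()[0])
    feat_pipeline.input_finished()
    out = asr.decode()
    print(key, out["text"])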
diff --git a/run.py b/run.py index 88b6649..98cb255 100755 --- a/run.py +++ b/run.py @@ -4,6 +4,7 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS +from tools import ASR import yaml, os app = Flask(__name__) @@ -15,6 +16,7 @@ CONFIG_FILES_PATH = '/opt/config' SERVICE_PORT=80 SWAGGER_URL='/api-doc' +asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) if not os.path.isdir(TEMP_FILE_PATH): @@ -64,5 +66,8 @@ def page_not_found(error): #start SwaggerUI swaggerUI() + #Run ASR engine + asr.run() + #Run server app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) \ No newline at end of file diff --git a/tools.py b/tools.py new file mode 100644 index 0000000..7a93dd4 --- /dev/null +++ b/tools.py @@ -0,0 +1,114 @@ +## Kaldi ASR decoder +from kaldi.asr import NnetLatticeFasterOnlineRecognizer +from kaldi.decoder import LatticeFasterDecoderOptions +from kaldi.nnet3 import NnetSimpleLoopedComputationOptions +from kaldi.online2 import (OnlineEndpointConfig, + OnlineIvectorExtractorAdaptationState, + OnlineNnetFeaturePipelineConfig, + OnlineNnetFeaturePipelineInfo, + OnlineNnetFeaturePipeline, + OnlineSilenceWeighting) +from kaldi.util.options import ParseOptions +from kaldi.util.table import SequentialWaveReader +from kaldi.matrix import Matrix, Vector +############## + +## word to CTM +from kaldi.lat.align import (WordBoundaryInfoNewOpts, + WordBoundaryInfo, + word_align_lattice) +from kaldi.lat.functions import compact_lattice_to_word_alignment +from kaldi.asr import NnetRecognizer +import kaldi.fstext as _fst +############## + +## other packages +import configparser, sys +############## + + + +class ASR: + def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): + self.AM_PATH = AM_PATH + self.LM_PATH = LM_PATH + self.CONFIG_FILES_PATH = CONFIG_FILES_PATH + + def run(self): + def loadConfig(self): + #get decoder parameters from "decode.cfg" + decoder_settings = configparser.ConfigParser() + decoder_settings.read(self.AM_PATH+'/decode.cfg') + self.DECODER_SYS = decoder_settings.get('decoder_params', 'decoder') + self.AM_FILE_PATH = decoder_settings.get('decoder_params', 'ampath') + self.DECODER_MINACT = decoder_settings.get('decoder_params', 'min_active') + self.DECODER_MAXACT = decoder_settings.get('decoder_params', 'max_active') + self.DECODER_BEAM = decoder_settings.get('decoder_params', 'beam') + self.DECODER_LATBEAM = decoder_settings.get('decoder_params', 'lattice_beam') + self.DECODER_ACWT = decoder_settings.get('decoder_params', 'acwt') + self.DECODER_FSF = decoder_settings.get('decoder_params', 'frame_subsampling_factor') + + #Prepare "online.conf" + self.AM_PATH=self.AM_PATH+"/"+self.AM_FILE_PATH + with open(self.AM_PATH+"/conf/online.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--ivector-extraction-config="+self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") + f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf") + + #Prepare "ivector_extractor.conf" + with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") + f.write("--cmvn-config="+self.AM_PATH+"/conf/online_cmvn.conf\n") + f.write("--lda-matrix="+self.AM_PATH+"/ivector_extractor/final.mat\n") + 
f.write("--global-cmvn-stats="+self.AM_PATH+"/ivector_extractor/global_cmvn.stats\n") + f.write("--diag-ubm="+self.AM_PATH+"/ivector_extractor/final.dubm\n") + f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") + + # Define online feature pipeline + print("Load decoder config") + loadConfig(self) + feat_opts = OnlineNnetFeaturePipelineConfig() + endpoint_opts = OnlineEndpointConfig() + po = ParseOptions("") + feat_opts.register(po) + endpoint_opts.register(po) + po.read_config_file(self.AM_PATH+"/conf/online.conf") + feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) + + # Construct recognizer + print("Load Decoder model") + decoder_opts = LatticeFasterDecoderOptions() + decoder_opts.beam = float(self.DECODER_BEAM) + decoder_opts.max_active = int(self.DECODER_MAXACT) + decoder_opts.min_active = int(self.DECODER_MINACT) + decoder_opts.lattice_beam = float(self.DECODER_LATBEAM) + decodable_opts = NnetSimpleLoopedComputationOptions() + decodable_opts.acoustic_scale = float(self.DECODER_ACWT) + decodable_opts.frame_subsampling_factor = int(self.DECODER_FSF) + decodable_opts.frames_per_chunk = 150 + asr = NnetLatticeFasterOnlineRecognizer.from_files( + self.AM_PATH+"/final.mdl", self.LM_PATH+"/HCLG.fst", self.LM_PATH+"/words.txt", + decoder_opts=decoder_opts, + decodable_opts=decodable_opts, + endpoint_opts=endpoint_opts) + + + +class Audio: + def __init__(self): + print("start Audio") + + def readAudio(stream,type): + print(type) + + + def transformAudio(): + print("###") + \ No newline at end of file From cde6c9273b3c5da1c0a14256cba3958ea8fa92e6 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 3 Jun 2020 14:43:25 +0200 Subject: [PATCH 006/172] add audio management and add exception --- Dockerfile | 1 + docker-compose.yml | 2 +- run.py | 53 +++++++++++++++++++++++++++++++++++++++------- tools.py | 51 ++++++++++++++++++++++++++++++++------------ 4 files changed, 85 insertions(+), 22 deletions(-) diff --git a/Dockerfile b/Dockerfile index dc10fa9..a7d1a8e 100644 --- a/Dockerfile +++ b/Dockerfile @@ -113,6 +113,7 @@ RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ # Install main service packages RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml +RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git # Set environment variables ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH diff --git a/docker-compose.yml b/docker-compose.yml index 8c8e9aa..cdacdeb 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,7 +5,7 @@ services: stt-worker: container_name: stt-standalone-worker build: . 
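The Dockerfile change above installs libsox-fmt-all together with the pysox package so that uploaded audio can be converted in memory. A small sketch of the conversion that the new Audio.transform method (in the tools.py hunk further below) performs; the input file name and the 16 kHz target rate are assumptions:

import sox

def to_mono_16k(input_path, rate=16000):
    """Convert any sox-readable file to mono, 16-bit samples at the given rate."""
    tfm = sox.Transformer()
    tfm.set_output_format(rate=rate, bits=16, channels=1)
    # build_array runs sox and returns the converted samples as a numpy array
    return tfm.build_array(input_filepath=input_path)

samples = to_mono_16k("example.wav")  # hypothetical test file
print(samples.shape)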
- image: lintoai/linto-platform-stt-standalone-worker + image: lintoai/linto-platform-stt-standalone-worker:pykaldi volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM diff --git a/run.py b/run.py index 98cb255..3ae4e97 100755 --- a/run.py +++ b/run.py @@ -4,8 +4,8 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -from tools import ASR -import yaml, os +from tools import ASR, Audio, Logger +import yaml, os, sox app = Flask(__name__) @@ -14,10 +14,13 @@ LM_PATH = '/opt/models/LM' TEMP_FILE_PATH = '/opt/tmp' CONFIG_FILES_PATH = '/opt/config' -SERVICE_PORT=80 -SWAGGER_URL='/api-doc' +SAVE_AUDIO = False +SERVICE_PORT = 80 +SWAGGER_URL = '/api-doc' asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) - +audio = Audio() +logASR = Logger(app,"ASR") +logAUDIO = Logger(app,"AUDIO") if not os.path.isdir(TEMP_FILE_PATH): os.mkdir(TEMP_FILE_PATH) @@ -47,27 +50,61 @@ def swaggerUI(): @app.route('/transcribe', methods=['POST']) def transcribe(): - return 'Test', 200 + try: + #get response content type + if request.headers.get('accept').lower() == 'application/json': + metadata = True + elif request.headers.get('accept').lower() == 'text/plain': + metadata = False + else: + raise ValueError('Not accepted header') + + #get input file + if 'file' in request.files.keys(): + file = request.files['file'] + file_path = TEMP_FILE_PATH+file.filename.lower() + file_type = file.content_type.rsplit('/', 1)[0] + file.save(file_path) + audio.transform(file_path) + else: + raise ValueError('No audio file was uploaded') + + return 'Test', 200 + except ValueError as error: + return str(error), 400 + except Exception as e: + app.logger.error(e) + return 'Server Error', 500 @app.route('/healthcheck', methods=['GET']) def check(): - return '1', 200 + return '', 200 # Rejected request handlers @app.errorhandler(405) -def page_not_found(error): +def method_not_allowed(error): return 'The method is not allowed for the requested URL', 405 @app.errorhandler(404) def page_not_found(error): return 'The requested URL was not found', 404 +@app.errorhandler(500) +def server_error(error): + app.logger.error(error) + return 'Server Error', 500 + if __name__ == '__main__': #start SwaggerUI swaggerUI() #Run ASR engine asr.run() + #Set Audio Sample Rate + audio.set_sample_rate(asr.get_sample_rate()) + #Set log messages + asr.set_logger(logASR) + audio.set_logger(logAUDIO) #Run server app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) \ No newline at end of file diff --git a/tools.py b/tools.py index 7a93dd4..a14475f 100644 --- a/tools.py +++ b/tools.py @@ -23,10 +23,20 @@ ############## ## other packages -import configparser, sys +import configparser, sys, sox ############## +class Logger: + def __init__(self,app,module): + self.app = app + self.module = module + + def error(self,msg): + self.app.logger.error("["+self.module+"] "+str(msg)) + + def info(self,msg): + self.app.logger.info("["+self.module+"] "+str(msg)) class ASR: def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): @@ -70,7 +80,7 @@ def loadConfig(self): f.write("--global-cmvn-stats="+self.AM_PATH+"/ivector_extractor/global_cmvn.stats\n") f.write("--diag-ubm="+self.AM_PATH+"/ivector_extractor/final.dubm\n") f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") - + # Define online feature pipeline print("Load decoder config") loadConfig(self) @@ -80,8 +90,8 @@ def loadConfig(self): feat_opts.register(po) 
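The reworked /transcribe route above takes a multipart upload in a form field named file and picks the response format from the Accept header. A possible client-side call using the requests library; the host, port mapping and file name are assumptions (the container itself exposes port 80 by default):

import requests

# Hypothetical endpoint and audio file
with open("example.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:80/transcribe",
        headers={"Accept": "text/plain"},   # "application/json" asks for word metadata instead
        files={"file": f},                  # the multipart field must be named "file"
    )
print(resp.status_code, resp.text)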
endpoint_opts.register(po) po.read_config_file(self.AM_PATH+"/conf/online.conf") - feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) - + self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) + # Construct recognizer print("Load Decoder model") decoder_opts = LatticeFasterDecoderOptions() @@ -93,22 +103,37 @@ def loadConfig(self): decodable_opts.acoustic_scale = float(self.DECODER_ACWT) decodable_opts.frame_subsampling_factor = int(self.DECODER_FSF) decodable_opts.frames_per_chunk = 150 - asr = NnetLatticeFasterOnlineRecognizer.from_files( + self.asr = NnetLatticeFasterOnlineRecognizer.from_files( self.AM_PATH+"/final.mdl", self.LM_PATH+"/HCLG.fst", self.LM_PATH+"/words.txt", decoder_opts=decoder_opts, decodable_opts=decodable_opts, endpoint_opts=endpoint_opts) + def get_sample_rate(self): + return self.feat_info.mfcc_opts.frame_opts.samp_freq - + def set_logger(self,log): + self.log = log + class Audio: def __init__(self): - print("start Audio") - - def readAudio(stream,type): - print(type) + self.bit = 16 + self.channels = 1 + self.sr = -1 + + def set_sample_rate(self,sr): + self.sr = sr + def set_logger(self,log): + self.log = log - def transformAudio(): - print("###") - \ No newline at end of file + def transform(self,file_name): + try: + tfm = sox.Transformer() + tfm.set_output_format(rate=self.sr, + bits=self.bit, + channels=self.channels) + self.data = tfm.build_array(input_filepath=file_name) + except Exception as e: + self.log.error(e) + raise ValueError("The uploaded file format is not supported!!!") \ No newline at end of file From e09d76ad9d60529aff20987aa3b681fa61d16028 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 4 Jun 2020 13:01:46 +0200 Subject: [PATCH 007/172] add decode funtion to perform speech-to-text --- Dockerfile | 2 +- RELEASE.md | 4 ++-- run.py | 27 +++++++++++++++------------ tools.py | 30 ++++++++++++++++++++++++------ 4 files changed, 42 insertions(+), 21 deletions(-) diff --git a/Dockerfile b/Dockerfile index a7d1a8e..d881220 100644 --- a/Dockerfile +++ b/Dockerfile @@ -112,7 +112,7 @@ RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml +RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git # Set environment variables diff --git a/RELEASE.md b/RELEASE.md index 1d02d63..30d145e 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,2 +1,2 @@ -# 2.0.0 -- New ASR engine based on pykaldi package \ No newline at end of file +# 2.1.0 +- A fonctional offline ASR engine \ No newline at end of file diff --git a/run.py b/run.py index 3ae4e97..090fc22 100755 --- a/run.py +++ b/run.py @@ -5,9 +5,11 @@ from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS from tools import ASR, Audio, Logger -import yaml, os, sox +import yaml, os, sox, logging app = Flask(__name__) +app.logger.setLevel(logging.DEBUG) + # Main parameters AM_PATH = '/opt/models/AM' @@ -19,8 +21,8 @@ SWAGGER_URL = '/api-doc' asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) audio = Audio() -logASR = Logger(app,"ASR") -logAUDIO = Logger(app,"AUDIO") +asr.set_logger(Logger(app,"ASR")) +audio.set_logger(Logger(app,"AUDIO")) if not os.path.isdir(TEMP_FILE_PATH): os.mkdir(TEMP_FILE_PATH) @@ -48,6 +50,13 @@ def swaggerUI(): app.register_blueprint(swaggerui, 
url_prefix=SWAGGER_URL) ### end swagger specific ### +def getAudio(file): + file_path = TEMP_FILE_PATH+file.filename.lower() + file.save(file_path) + audio.transform(file_path) + if not SAVE_AUDIO: + os.remove(file_path) + @app.route('/transcribe', methods=['POST']) def transcribe(): try: @@ -62,14 +71,12 @@ def transcribe(): #get input file if 'file' in request.files.keys(): file = request.files['file'] - file_path = TEMP_FILE_PATH+file.filename.lower() - file_type = file.content_type.rsplit('/', 1)[0] - file.save(file_path) - audio.transform(file_path) + getAudio(file) + text = asr.decoder(audio) else: raise ValueError('No audio file was uploaded') - return 'Test', 200 + return text, 200 except ValueError as error: return str(error), 400 except Exception as e: @@ -97,14 +104,10 @@ def server_error(error): if __name__ == '__main__': #start SwaggerUI swaggerUI() - #Run ASR engine asr.run() #Set Audio Sample Rate audio.set_sample_rate(asr.get_sample_rate()) - #Set log messages - asr.set_logger(logASR) - audio.set_logger(logAUDIO) #Run server app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) \ No newline at end of file diff --git a/tools.py b/tools.py index a14475f..d2b04d8 100644 --- a/tools.py +++ b/tools.py @@ -23,12 +23,12 @@ ############## ## other packages -import configparser, sys, sox +import configparser, sys, sox, time ############## class Logger: - def __init__(self,app,module): + def __init__(self,app,module=""): self.app = app self.module = module @@ -82,18 +82,18 @@ def loadConfig(self): f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") # Define online feature pipeline - print("Load decoder config") + self.log.info("Load decoder config") loadConfig(self) feat_opts = OnlineNnetFeaturePipelineConfig() endpoint_opts = OnlineEndpointConfig() po = ParseOptions("") feat_opts.register(po) endpoint_opts.register(po) - po.read_config_file(self.AM_PATH+"/conf/online.conf") + po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) # Construct recognizer - print("Load Decoder model") + self.log.info("Load Decoder model") decoder_opts = LatticeFasterDecoderOptions() decoder_opts.beam = float(self.DECODER_BEAM) decoder_opts.max_active = int(self.DECODER_MAXACT) @@ -114,6 +114,21 @@ def get_sample_rate(self): def set_logger(self,log): self.log = log + + def decoder(self,audio): + try: + start_time = time.time() + feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) + self.asr.set_input_pipeline(feat_pipeline) + feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) + feat_pipeline.input_finished() + self.decode = self.asr.decode() + self.log.info("Decode time in seconds: %s" % (time.time() - start_time)) + except Exception as e: + self.log.error(e) + raise ValueError("Decoder failed to transcribe the input audio!!!") + else: + return self.decode["text"] class Audio: def __init__(self): @@ -136,4 +151,7 @@ def transform(self,file_name): self.data = tfm.build_array(input_filepath=file_name) except Exception as e: self.log.error(e) - raise ValueError("The uploaded file format is not supported!!!") \ No newline at end of file + raise ValueError("The uploaded file format is not supported!!!") + + def getDataKaldyVector(self): + return Vector(self.data) \ No newline at end of file From 8ec54b65289466ead42042310a0620631690353b Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 4 Jun 2020 13:15:57 +0200 Subject: [PATCH 008/172] replace class Logger 
by the logger package and configure it to show the lowest loggin level --- run.py | 12 ++++++------ tools.py | 21 ++++----------------- 2 files changed, 10 insertions(+), 23 deletions(-) diff --git a/run.py b/run.py index 090fc22..fb9b476 100755 --- a/run.py +++ b/run.py @@ -4,12 +4,14 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -from tools import ASR, Audio, Logger +from tools import ASR, Audio import yaml, os, sox, logging -app = Flask(__name__) -app.logger.setLevel(logging.DEBUG) +app = Flask("__stt-standelone-worker__") +# Set logger config +logger = logging.getLogger(__name__) +logging.basicConfig(level=logging.DEBUG) # Main parameters AM_PATH = '/opt/models/AM' @@ -21,8 +23,6 @@ SWAGGER_URL = '/api-doc' asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) audio = Audio() -asr.set_logger(Logger(app,"ASR")) -audio.set_logger(Logger(app,"AUDIO")) if not os.path.isdir(TEMP_FILE_PATH): os.mkdir(TEMP_FILE_PATH) @@ -67,7 +67,7 @@ def transcribe(): metadata = False else: raise ValueError('Not accepted header') - + #get input file if 'file' in request.files.keys(): file = request.files['file'] diff --git a/tools.py b/tools.py index d2b04d8..27e0db4 100644 --- a/tools.py +++ b/tools.py @@ -23,23 +23,12 @@ ############## ## other packages -import configparser, sys, sox, time +import configparser, sys, sox, time, logging ############## - -class Logger: - def __init__(self,app,module=""): - self.app = app - self.module = module - - def error(self,msg): - self.app.logger.error("["+self.module+"] "+str(msg)) - - def info(self,msg): - self.app.logger.info("["+self.module+"] "+str(msg)) - class ASR: def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): + self.log = logging.getLogger('__stt-standelone-worker__.ASR') self.AM_PATH = AM_PATH self.LM_PATH = LM_PATH self.CONFIG_FILES_PATH = CONFIG_FILES_PATH @@ -112,9 +101,6 @@ def loadConfig(self): def get_sample_rate(self): return self.feat_info.mfcc_opts.frame_opts.samp_freq - def set_logger(self,log): - self.log = log - def decoder(self,audio): try: start_time = time.time() @@ -129,9 +115,10 @@ def decoder(self,audio): raise ValueError("Decoder failed to transcribe the input audio!!!") else: return self.decode["text"] - + class Audio: def __init__(self): + self.log = logging.getLogger('__stt-standelone-worker__.Audio') self.bit = 16 self.channels = 1 self.sr = -1 From 11354add40408cca9ff1d24980290ec865c61df7 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Sun, 7 Jun 2020 19:52:21 +0200 Subject: [PATCH 009/172] add word timestamp and SttStandelone class to manage the hyperparam --- run.py | 11 +++++--- tools.py | 82 +++++++++++++++++++++++++++++++++++++++++++++----------- 2 files changed, 75 insertions(+), 18 deletions(-) diff --git a/run.py b/run.py index fb9b476..5bacc83 100755 --- a/run.py +++ b/run.py @@ -4,7 +4,7 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -from tools import ASR, Audio +from tools import ASR, Audio, SttStandelone import yaml, os, sox, logging app = Flask("__stt-standelone-worker__") @@ -32,6 +32,8 @@ # Environment parameters if 'SERVICE_PORT' in os.environ: SERVICE_PORT = os.environ['SERVICE_PORT'] +if 'SAVE_AUDIO' in os.environ: + SAVE_AUDIO = os.environ['SAVE_AUDIO'] if 'SWAGGER_PATH' not in os.environ: exit("You have to provide a 'SWAGGER_PATH'") SWAGGER_PATH = os.environ['SWAGGER_PATH'] @@ -61,6 +63,7 @@ def getAudio(file): def transcribe(): 
try: #get response content type + metadata = False if request.headers.get('accept').lower() == 'application/json': metadata = True elif request.headers.get('accept').lower() == 'text/plain': @@ -68,15 +71,17 @@ def transcribe(): else: raise ValueError('Not accepted header') + stt = SttStandelone(asr,metadata) + #get input file if 'file' in request.files.keys(): file = request.files['file'] getAudio(file) - text = asr.decoder(audio) + output = stt.run(audio,asr) else: raise ValueError('No audio file was uploaded') - return text, 200 + return output, 200 except ValueError as error: return str(error), 400 except Exception as e: diff --git a/tools.py b/tools.py index 27e0db4..818a5f6 100644 --- a/tools.py +++ b/tools.py @@ -17,7 +17,8 @@ from kaldi.lat.align import (WordBoundaryInfoNewOpts, WordBoundaryInfo, word_align_lattice) -from kaldi.lat.functions import compact_lattice_to_word_alignment +from kaldi.lat.functions import (compact_lattice_to_word_alignment, + compact_lattice_shortest_path) from kaldi.asr import NnetRecognizer import kaldi.fstext as _fst ############## @@ -40,12 +41,12 @@ def loadConfig(self): decoder_settings.read(self.AM_PATH+'/decode.cfg') self.DECODER_SYS = decoder_settings.get('decoder_params', 'decoder') self.AM_FILE_PATH = decoder_settings.get('decoder_params', 'ampath') - self.DECODER_MINACT = decoder_settings.get('decoder_params', 'min_active') - self.DECODER_MAXACT = decoder_settings.get('decoder_params', 'max_active') - self.DECODER_BEAM = decoder_settings.get('decoder_params', 'beam') - self.DECODER_LATBEAM = decoder_settings.get('decoder_params', 'lattice_beam') - self.DECODER_ACWT = decoder_settings.get('decoder_params', 'acwt') - self.DECODER_FSF = decoder_settings.get('decoder_params', 'frame_subsampling_factor') + self.DECODER_MINACT = int(decoder_settings.get('decoder_params', 'min_active')) + self.DECODER_MAXACT = int(decoder_settings.get('decoder_params', 'max_active')) + self.DECODER_BEAM = float(decoder_settings.get('decoder_params', 'beam')) + self.DECODER_LATBEAM = float(decoder_settings.get('decoder_params', 'lattice_beam')) + self.DECODER_ACWT = float(decoder_settings.get('decoder_params', 'acwt')) + self.DECODER_FSF = int(decoder_settings.get('decoder_params', 'frame_subsampling_factor')) #Prepare "online.conf" self.AM_PATH=self.AM_PATH+"/"+self.AM_FILE_PATH @@ -81,16 +82,22 @@ def loadConfig(self): po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) + # Set metadata parameters + self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq + self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 + self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") + self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") + # Construct recognizer self.log.info("Load Decoder model") decoder_opts = LatticeFasterDecoderOptions() - decoder_opts.beam = float(self.DECODER_BEAM) - decoder_opts.max_active = int(self.DECODER_MAXACT) - decoder_opts.min_active = int(self.DECODER_MINACT) - decoder_opts.lattice_beam = float(self.DECODER_LATBEAM) + decoder_opts.beam = self.DECODER_BEAM + decoder_opts.max_active = self.DECODER_MAXACT + decoder_opts.min_active = self.DECODER_MINACT + decoder_opts.lattice_beam = self.DECODER_LATBEAM decodable_opts = NnetSimpleLoopedComputationOptions() - decodable_opts.acoustic_scale = float(self.DECODER_ACWT) - decodable_opts.frame_subsampling_factor = int(self.DECODER_FSF) + 
decodable_opts.acoustic_scale = self.DECODER_ACWT + decodable_opts.frame_subsampling_factor = self.DECODER_FSF decodable_opts.frames_per_chunk = 150 self.asr = NnetLatticeFasterOnlineRecognizer.from_files( self.AM_PATH+"/final.mdl", self.LM_PATH+"/HCLG.fst", self.LM_PATH+"/words.txt", @@ -109,13 +116,58 @@ def decoder(self,audio): feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) feat_pipeline.input_finished() self.decode = self.asr.decode() + self.text = self.decode['text'] self.log.info("Decode time in seconds: %s" % (time.time() - start_time)) except Exception as e: self.log.error(e) raise ValueError("Decoder failed to transcribe the input audio!!!") - else: - return self.decode["text"] + + def wordTimestamp(self): + try: + _fst.utils.scale_compact_lattice([[1.0, 0],[0, float(self.DECODER_ACWT)]], self.decode['lattice']) + bestPath = compact_lattice_shortest_path(self.decode['lattice']) + _fst.utils.scale_compact_lattice([[1.0, 0],[0, 1.0/float(self.DECODER_ACWT)]], bestPath) + bestLattice = word_align_lattice(bestPath, self.asr.transition_model, self.info, 0) + alignment = compact_lattice_to_word_alignment(bestLattice[1]) + words = _fst.indices_to_symbols(self.symbols, alignment[0]) + self.timestamps={ + "words":words, + "start":alignment[1], + "dur":alignment[2] + } + except Exception as e: + self.log.error(e) + raise ValueError("Decoder failed to create the word timestamps!!!") + +class SttStandelone: + def __init__(self,asr,metadata=False): + self.log = logging.getLogger('__stt-standelone-worker__.SttStandelone') + self.metadata = metadata + def run(self,audio,asr): + asr.decoder(audio) + if self.metadata: + asr.wordTimestamp() + self.formatOutput(asr.timestamps,asr.frame_shift, asr.DECODER_FSF) + return self.output + else: + return asr.text + + def formatOutput(self,timestamps,frame_shift, frame_subsampling): + self.output = {} + text = "" + self.output["words"] = [] + for i in range(len(timestamps["words"])): + if timestamps["words"][i] != "": + meta = {} + meta["word"] = timestamps["words"][i] + meta["begin"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) + meta["end"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) + self.output["words"].append(meta) + text += " "+meta["word"] + self.output["transcription"] = text + + class Audio: def __init__(self): self.log = logging.getLogger('__stt-standelone-worker__.Audio') From 4f11a64374376a3c223d88fff6c82f6c2d9da7c2 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 8 Jun 2020 16:29:19 +0200 Subject: [PATCH 010/172] adapt the asr engine with the multiprocess mode and set the number of processes as external parameter --- .envdefault | 3 +- run.py | 10 +++++- tools.py | 90 ++++++++++++++++++++++++++++++++--------------------- 3 files changed, 65 insertions(+), 38 deletions(-) diff --git a/.envdefault b/.envdefault index 2246e24..80acea5 100644 --- a/.envdefault +++ b/.envdefault @@ -1,3 +1,4 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=/path/to/swagger/file \ No newline at end of file +SWAGGER_PATH=/path/to/swagger/file +NBR_PROCESSES=1 \ No newline at end of file diff --git a/run.py b/run.py index 5bacc83..9810ff3 100755 --- a/run.py +++ b/run.py @@ -6,6 +6,7 @@ from flask_cors import CORS from tools import ASR, Audio, SttStandelone import yaml, os, sox, logging +from time import gmtime, strftime app = Flask("__stt-standelone-worker__") @@ -18,6 +19,7 @@ LM_PATH = '/opt/models/LM' TEMP_FILE_PATH = '/opt/tmp' 
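In the formatOutput helper above, word boundaries come back from the lattice aligner as frame indices, so they are converted to seconds by multiplying with the feature frame shift and the frame-subsampling factor. A short worked example with illustrative values only (10 ms frame shift, subsampling factor 3):

frame_shift = 0.01        # seconds, i.e. frame_shift_ms / 1000
frame_subsampling = 3     # decodable_opts.frame_subsampling_factor

start_frame, dur_frames = 120, 40  # hypothetical alignment output for one word
begin = round(start_frame * frame_shift * frame_subsampling, 2)               # 3.6 s
end = round((start_frame + dur_frames) * frame_shift * frame_subsampling, 2)  # 4.8 s
print(begin, end)  # 3.6 4.8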
CONFIG_FILES_PATH = '/opt/config' +NBR_PROCESSES = 1 SAVE_AUDIO = False SERVICE_PORT = 80 SWAGGER_URL = '/api-doc' @@ -34,6 +36,11 @@ SERVICE_PORT = os.environ['SERVICE_PORT'] if 'SAVE_AUDIO' in os.environ: SAVE_AUDIO = os.environ['SAVE_AUDIO'] +if 'NBR_PROCESSES' in os.environ: + if int(os.environ['NBR_PROCESSES']) > 0: + NBR_PROCESSES = int(os.environ['NBR_PROCESSES']) + else: + exit("You must to provide a positif number of processes 'NBR_PROCESSES'") if 'SWAGGER_PATH' not in os.environ: exit("You have to provide a 'SWAGGER_PATH'") SWAGGER_PATH = os.environ['SWAGGER_PATH'] @@ -62,6 +69,7 @@ def getAudio(file): @app.route('/transcribe', methods=['POST']) def transcribe(): try: + app.logger.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) #get response content type metadata = False if request.headers.get('accept').lower() == 'application/json': @@ -115,4 +123,4 @@ def server_error(error): audio.set_sample_rate(asr.get_sample_rate()) #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=1) \ No newline at end of file + app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=NBR_PROCESSES) \ No newline at end of file diff --git a/tools.py b/tools.py index 818a5f6..48063f5 100644 --- a/tools.py +++ b/tools.py @@ -1,6 +1,7 @@ ## Kaldi ASR decoder from kaldi.asr import NnetLatticeFasterOnlineRecognizer -from kaldi.decoder import LatticeFasterDecoderOptions +from kaldi.decoder import (LatticeFasterDecoderOptions, + LatticeFasterOnlineDecoder) from kaldi.nnet3 import NnetSimpleLoopedComputationOptions from kaldi.online2 import (OnlineEndpointConfig, OnlineIvectorExtractorAdaptationState, @@ -75,18 +76,16 @@ def loadConfig(self): self.log.info("Load decoder config") loadConfig(self) feat_opts = OnlineNnetFeaturePipelineConfig() - endpoint_opts = OnlineEndpointConfig() + self.endpoint_opts = OnlineEndpointConfig() po = ParseOptions("") feat_opts.register(po) - endpoint_opts.register(po) + self.endpoint_opts.register(po) po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) # Set metadata parameters self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 - self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") - self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") # Construct recognizer self.log.info("Load Decoder model") @@ -95,64 +94,83 @@ def loadConfig(self): decoder_opts.max_active = self.DECODER_MAXACT decoder_opts.min_active = self.DECODER_MINACT decoder_opts.lattice_beam = self.DECODER_LATBEAM - decodable_opts = NnetSimpleLoopedComputationOptions() - decodable_opts.acoustic_scale = self.DECODER_ACWT - decodable_opts.frame_subsampling_factor = self.DECODER_FSF - decodable_opts.frames_per_chunk = 150 - self.asr = NnetLatticeFasterOnlineRecognizer.from_files( - self.AM_PATH+"/final.mdl", self.LM_PATH+"/HCLG.fst", self.LM_PATH+"/words.txt", - decoder_opts=decoder_opts, - decodable_opts=decodable_opts, - endpoint_opts=endpoint_opts) + self.decodable_opts = NnetSimpleLoopedComputationOptions() + self.decodable_opts.acoustic_scale = self.DECODER_ACWT + self.decodable_opts.frame_subsampling_factor = self.DECODER_FSF + self.decodable_opts.frames_per_chunk = 150 + + # Load Acoustic and graph models and other files + self.transition_model, self.acoustic_model = 
NnetRecognizer.read_model(self.AM_PATH+"/final.mdl") + graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") + self.decoder_graph = LatticeFasterOnlineDecoder(graph, decoder_opts) + self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") + self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") + del graph, decoder_opts def get_sample_rate(self): - return self.feat_info.mfcc_opts.frame_opts.samp_freq + return self.samp_freq - def decoder(self,audio): + def get_frames(self,feat_pipeline): + rows = feat_pipeline.num_frames_ready() + cols = feat_pipeline.dim() + frames = Matrix(rows,cols) + feat_pipeline.get_frames(range(rows),frames) + return frames[:,:self.feat_info.mfcc_opts.num_ceps], frames[:,self.feat_info.mfcc_opts.num_ceps:] + # return feats + ivectors + + def compute_feat(self,audio): + feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) + feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) + feat_pipeline.input_finished() + return feat_pipeline + + def decoder(self,feats): try: start_time = time.time() - feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) - self.asr.set_input_pipeline(feat_pipeline) - feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) - feat_pipeline.input_finished() - self.decode = self.asr.decode() - self.text = self.decode['text'] + asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, + self.symbols, decodable_opts= self.decodable_opts, endpoint_opts=self.endpoint_opts) + asr.set_input_pipeline(feats) + decode = asr.decode() self.log.info("Decode time in seconds: %s" % (time.time() - start_time)) except Exception as e: self.log.error(e) raise ValueError("Decoder failed to transcribe the input audio!!!") + else: + return decode - def wordTimestamp(self): + def wordTimestamp(self,decode): try: - _fst.utils.scale_compact_lattice([[1.0, 0],[0, float(self.DECODER_ACWT)]], self.decode['lattice']) - bestPath = compact_lattice_shortest_path(self.decode['lattice']) + _fst.utils.scale_compact_lattice([[1.0, 0],[0, float(self.DECODER_ACWT)]], decode['lattice']) + bestPath = compact_lattice_shortest_path(decode['lattice']) _fst.utils.scale_compact_lattice([[1.0, 0],[0, 1.0/float(self.DECODER_ACWT)]], bestPath) - bestLattice = word_align_lattice(bestPath, self.asr.transition_model, self.info, 0) + bestLattice = word_align_lattice(bestPath, self.transition_model, self.info, 0) alignment = compact_lattice_to_word_alignment(bestLattice[1]) words = _fst.indices_to_symbols(self.symbols, alignment[0]) - self.timestamps={ + except Exception as e: + self.log.error(e) + raise ValueError("Decoder failed to create the word timestamps!!!") + else: + return { "words":words, "start":alignment[1], "dur":alignment[2] } - except Exception as e: - self.log.error(e) - raise ValueError("Decoder failed to create the word timestamps!!!") - + class SttStandelone: def __init__(self,asr,metadata=False): self.log = logging.getLogger('__stt-standelone-worker__.SttStandelone') self.metadata = metadata def run(self,audio,asr): - asr.decoder(audio) + feats = asr.compute_feat(audio) + decode = asr.decoder(feats) if self.metadata: - asr.wordTimestamp() - self.formatOutput(asr.timestamps,asr.frame_shift, asr.DECODER_FSF) + timestamps = asr.wordTimestamp(decode) + self.formatOutput(timestamps,asr.frame_shift, asr.decodable_opts.frame_subsampling_factor) return self.output else: - return asr.text - + return decode["text"] + def 
formatOutput(self,timestamps,frame_shift, frame_subsampling): self.output = {} text = "" From b4265164111cab48f156f24854cab3655951c5ef Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 10 Jun 2020 15:58:59 +0200 Subject: [PATCH 011/172] add Speaker diarization feature --- Dockerfile | 10 +- RELEASE.md | 6 +- document/swagger.yml | 13 ++ run.py | 35 +++- tools.py | 416 ++++++++++++++++++++++++++++++++++++++++--- 5 files changed, 442 insertions(+), 38 deletions(-) diff --git a/Dockerfile b/Dockerfile index d881220..6608943 100644 --- a/Dockerfile +++ b/Dockerfile @@ -111,14 +111,18 @@ RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ && cd /pykaldi/tools/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp +# Define the main folder +WORKDIR /usr/src/speech-to-text + # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger -RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git +RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger librosa webrtcvad scipy sklearn +RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git \ + && git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ + && cp /pykaldi/tools/pyBK/diarizationFunctions.py . # Set environment variables ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH -WORKDIR /usr/src/speech-to-text COPY tools.py . COPY run.py . diff --git a/RELEASE.md b/RELEASE.md index 30d145e..818a2d4 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,2 +1,4 @@ -# 2.1.0 -- A fonctional offline ASR engine \ No newline at end of file +# 2.2.0 +- Speaker diarization feature: pyBK package +- Mulithreading feature: Speech decoding and Speaker diarization processes +- Optional parameter: real number of speaker in the audio \ No newline at end of file diff --git a/document/swagger.yml b/document/swagger.yml index ebc5c08..57e818f 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -27,6 +27,19 @@ paths: description: "Audio File (wav, mp3, aiff, flac, ogg)" required: true type: "file" + - name: "nbrSpeaker" + in: "formData" + description: "Number of speakers in the audio" + required: false + type: "number" + default: 1 + - name: "speaker" + in: "formData" + description: "Do speaker diarization" + required: false + type: "string" + enum: [ "Yes", "No" ] + default: "No" responses: 200: description: Successfully transcribe the audio diff --git a/run.py b/run.py index 9810ff3..8ae7b76 100755 --- a/run.py +++ b/run.py @@ -4,7 +4,7 @@ from flask import Flask, request, abort, Response, json from flask_swagger_ui import get_swaggerui_blueprint from flask_cors import CORS -from tools import ASR, Audio, SttStandelone +from tools import ASR, Audio, SpeakerDiarization, SttStandelone import yaml, os, sox, logging from time import gmtime, strftime @@ -24,7 +24,6 @@ SERVICE_PORT = 80 SWAGGER_URL = '/api-doc' asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) -audio = Audio() if not os.path.isdir(TEMP_FILE_PATH): os.mkdir(TEMP_FILE_PATH) @@ -59,7 +58,7 @@ def swaggerUI(): app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) ### end swagger specific ### -def getAudio(file): +def getAudio(file,audio): file_path = TEMP_FILE_PATH+file.filename.lower() file.save(file_path) audio.transform(file_path) @@ -70,6 +69,10 @@ def 
getAudio(file): def transcribe(): try: app.logger.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) + # create main objects + spk = SpeakerDiarization() + audio = Audio(asr.get_sample_rate()) + #get response content type metadata = False if request.headers.get('accept').lower() == 'application/json': @@ -79,13 +82,29 @@ def transcribe(): else: raise ValueError('Not accepted header') - stt = SttStandelone(asr,metadata) + #get speaker parameter + spkDiarization = False + if request.form.get('speaker') != None and (request.form.get('speaker').lower() == 'yes' or request.form.get('speaker').lower() == 'no'): + spkDiarization = True if request.form.get('speaker').lower() == 'yes' else False + #get number of speakers parameter + try: + if request.form.get('nbrSpeaker') != None and spkDiarization and int(request.form.get('nbrSpeaker')) > 0: + spk.set_maxNrSpeakers(int(request.form.get('nbrSpeaker'))) + elif request.form.get('nbrSpeaker') != None and spkDiarization: + raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') + except Exception as e: + app.logger.error(e) + raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') + else: + raise ValueError('Not accepted "speaker" field value (yes|no)') + stt = SttStandelone(metadata,spkDiarization) + #get input file if 'file' in request.files.keys(): file = request.files['file'] - getAudio(file) - output = stt.run(audio,asr) + getAudio(file,audio) + output = stt.run(audio,asr,spk) else: raise ValueError('No audio file was uploaded') @@ -119,8 +138,6 @@ def server_error(error): swaggerUI() #Run ASR engine asr.run() - #Set Audio Sample Rate - audio.set_sample_rate(asr.get_sample_rate()) #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=True, threaded=False, processes=NBR_PROCESSES) \ No newline at end of file + app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) \ No newline at end of file diff --git a/tools.py b/tools.py index 48063f5..05c6c10 100644 --- a/tools.py +++ b/tools.py @@ -24,8 +24,19 @@ import kaldi.fstext as _fst ############## +## Speaker Diarization +from diarizationFunctions import * +import numpy as np +import librosa +from kaldi.ivector import (compute_vad_energy, + VadEnergyOptions) +from kaldi.feat.mfcc import Mfcc, MfccOptions +from kaldi.util.options import ParseOptions +############## + ## other packages import configparser, sys, sox, time, logging +from concurrent.futures import ThreadPoolExecutor ############## class ASR: @@ -127,6 +138,7 @@ def compute_feat(self,audio): def decoder(self,feats): try: start_time = time.time() + self.log.info("Start Decoding: %s" % (start_time)) asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, self.symbols, decodable_opts= self.decodable_opts, endpoint_opts=self.endpoint_opts) asr.set_input_pipeline(feats) @@ -156,44 +168,399 @@ def wordTimestamp(self,decode): "dur":alignment[2] } +class SpeakerDiarization: + def __init__(self): + self.log = logging.getLogger('__stt-standelone-worker__.SPKDiarization') + + ### MFCC FEATURES PARAMETERS + self.frame_length_s=0.025 + self.frame_shift_s=0.01 + self.num_bins=40 + self.num_ceps=40 + self.low_freq=40 + self.high_freq=-200 + ##### + + ### VAD PARAMETERS + self.vad_ops = VadEnergyOptions() + self.vad_ops.vad_energy_mean_scale = 0.9 + self.vad_ops.vad_energy_threshold = 5 + #vad_ops.vad_frames_context = 2 + #vad_ops.vad_proportion_threshold = 0.12 + ##### + + ### Segment + 
self.seg_length = 100 # Window size in frames + self.seg_increment = 100 # Window increment after and before window in frames + self.seg_rate = 100 # Window shifting in frames + ##### + + ### KBM + self.minimumNumberOfInitialGaussians = 1024 # Minimum number of Gaussians in the initial pool + self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation + self.windowLength = 200 # Window length for computing Gaussians + self.kbmSize = 320 # Number of final Gaussian components in the KBM + self.useRelativeKBMsize = 1 # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size + self.relKBMsize = 0.3 # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). + ###### + + ### BINARY_KEY + self.topGaussiansPerFrame = 5 # Number of top selected components per frame + self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys + ###### + + ### CLUSTERING + self.N_init = 16 # Number of initial clusters + self.linkage = 0 # Set to one to perform linkage clustering instead of clustering/reassignment + self.linkageCriterion = 'average' # Linkage criterion used if linkage==1 ('average', 'single', 'complete') + self.metric = 'cosine' # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys + ###### + + ### CLUSTERING_SELECTION + self.metric_clusteringSelection = 'cosine' # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') + self.bestClusteringCriterion = 'elbow' # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering + self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral + self.percentile = 40 + self.maxNrSpeakers = 16 # If known, max nr of speakers in a sesssion in the database. 
This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps + ###### + + ### RESEGMENTATION + self.resegmentation = 1 # Set to 1 to perform re-segmentation + self.modelSize = 6 # Number of GMM components + self.nbIter = 10 # Number of expectation-maximization (EM) iterations + self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames + ###### + + def set_maxNrSpeakers(self,nbr): + self.maxNrSpeakers = nbr + + def compute_feat_Librosa(self,audio): + try: + self.log.info("Start feature extraction: %s" % (time.time())) + if audio.sr == 16000: + self.low_freq=20 + self.high_freq=7600 + data = audio.data/32768 + frame_length_inSample = self.frame_length_s * audio.sr + hop = int(self.frame_shift_s * audio.sr) + NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) + mfccNumpy = librosa.feature.mfcc(y=data, + sr=audio.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop, + fmin=self.low_freq, + fmax=self.high_freq).T + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed when extracting features!!!") + else: + return mfccNumpy + + def compute_feat_KALDI(self,audio): + try: + self.log.info("Start feature extraction: %s" % (time.time())) + po = ParseOptions("") + mfcc_opts = MfccOptions() + mfcc_opts.use_energy = False + mfcc_opts.frame_opts.samp_freq = audio.sr + mfcc_opts.frame_opts.frame_length_ms = self.frame_length_s*1000 + mfcc_opts.frame_opts.frame_shift_ms = self.frame_shift_s*1000 + mfcc_opts.frame_opts.allow_downsample = False + mfcc_opts.mel_opts.num_bins = self.num_bins + mfcc_opts.mel_opts.low_freq = self.low_freq + mfcc_opts.mel_opts.high_freq = self.high_freq + mfcc_opts.num_ceps = self.num_ceps + mfcc_opts.register(po) + + # Create MFCC object and obtain sample frequency + mfccObj = Mfcc(mfcc_opts) + mfccKaldi = mfccObj.compute_features(audio.getDataKaldyVector(), audio.sr, 1.0) + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed while extracting features!!!") + else: + return mfccKaldi + + def computeVAD_WEBRTC(self, audio): + try: + self.log.info("Start VAD: %s" % (time.time())) + data = audio.data/32768 + hop = 30 + va_framed = py_webrtcvad(data, fs=audio.sr, fs_vad=audio.sr, hoplength=hop, vad_mode=0) + segments = get_py_webrtcvad_segments(va_framed,audio.sr) + maskSAD = np.zeros([1,nFeatures]) + for seg in segments: + start=int(np.round(seg[0]/frame_shift_s)) + end=int(np.round(seg[1]/frame_shift_s)) + maskSAD[0][start:end]=1 + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed while voice activity detection!!!") + else: + return maskSAD + + def computeVAD_KALDI(self, audio, feats=None): + try: + self.log.info("Start VAD: %s" % (time.time())) + vadStream = compute_vad_energy(self.vad_ops,feats) + vad = Vector(vadStream) + VAD = vad.numpy() + + ### segmentation + occurence=[] + value=[] + occurence.append(1) + value.append(VAD[0]) + + # compute the speech and non-speech frames + for i in range(1,len(VAD)): + if value[-1] == VAD[i]: + occurence[-1]+=1 + else: + occurence.append(1) + value.append(VAD[i]) + + # filter the speech and non-speech segments that are below 30 frames + i = 0 + while(i < len(occurence)): + if i != 0 and (occurence[i] < 30 or value[i-1] == value[i]): + occurence[i-1] += occurence[i] + del value[i] + del occurence[i] + else: + i+=1 + + # split if and only if the silence is above 50 frames + i = 0 + while(i < len(occurence)): + if 
i != 0 and ((occurence[i] < 30 and value[i] == 0.0) or value[i-1] == value[i]): + occurence[i-1] += occurence[i] + del value[i] + del occurence[i] + else: + i+=1 + + # compute VAD mask + maskSAD = np.zeros(len(VAD)) + start=0 + for i in range(len(occurence)): + if value[i] == 1.0: + end=start+occurence[i] + maskSAD[start:end] = 1 + start=end + else: + start += occurence[i] + + maskSAD = np.expand_dims(maskSAD, axis=0) + except ValueError as v: + self.log.error(v) + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed while voice activity detection!!!") + else: + return maskSAD + + def run(self, audio, feats=None): + try: + def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): + numberOfSpeechFeatures = finalSegmentTable[-1,2].astype(int)+1 + solutionVector = np.zeros([1,numberOfSpeechFeatures]) + for i in np.arange(np.size(finalSegmentTable,0)): + solutionVector[0,np.arange(finalSegmentTable[i,1],finalSegmentTable[i,2]+1).astype(int)]=finalClusteringTable[i] + seg = np.empty([0,3]) + solutionDiff = np.diff(solutionVector)[0] + first = 0 + for i in np.arange(0,np.size(solutionDiff,0)): + if solutionDiff[i]: + last = i+1 + seg1 = (first)*frameshift + seg2 = (last-first)*frameshift + seg3 = solutionVector[0,last-1] + if seg.shape[0] != 0 and seg3 == seg[-1][2]: + seg[-1][1] += seg2 + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg,[seg1,seg2,seg3])) + first = i+1 + last = np.size(solutionVector,1) + seg1 = (first-1)*frameshift + seg2 = (last-first+1)*frameshift + seg3 = solutionVector[0,last-1] + if seg3 == seg[-1][2]: + seg[-1][1] += seg2 + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg,[seg1,seg2,seg3])) + seg = np.vstack((seg,[dur,-1,-1])) + seg[0][0]=0.0 + return seg + + + start_time = time.time() + self.log.info("Start Speaker Diarization: %s" % (start_time)) + if self.maxNrSpeakers == 1: + self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) + return [[0, audio.dur, 1], + [audio.dur, -1, -1]] + if feats == None: + feats = self.compute_feat_KALDI(audio) + nFeatures = feats.shape[0] + maskSAD = self.computeVAD_KALDI(audio,feats) + maskUEM = np.ones([1,nFeatures]) + + mask = np.logical_and(maskUEM,maskSAD) + mask = mask[0][0:nFeatures] + nSpeechFeatures=np.sum(mask) + speechMapping = np.zeros(nFeatures) + #you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style + #so that we don't lose features on the way + speechMapping[np.nonzero(mask)] = np.arange(1,nSpeechFeatures+1) + data=feats[np.where(mask==1)] + del feats + + segmentTable=getSegmentTable(mask,speechMapping,self.seg_length,self.seg_increment,self.seg_rate) + numberOfSegments=np.size(segmentTable,0) + #create the KBM + #set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians + if np.floor((nSpeechFeatures-self.windowLength)/self.minimumNumberOfInitialGaussians) < self.maximumKBMWindowRate: + windowRate = int(np.floor((np.size(data,0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) + else: + windowRate = int(self.maximumKBMWindowRate) + + if windowRate == 0: + raise ValueError('The audio is to short in order to perform the speaker diarization!!!') + + poolSize = np.floor((nSpeechFeatures-self.windowLength)/windowRate) + if self.useRelativeKBMsize: + kbmSize = int(np.floor(poolSize*self.relKBMsize)) + else: + kbmSize = int(self.kbmSize) + + #Training pool of',int(poolSize),'gaussians with a rate 
of',int(windowRate),'frames' + kbm, gmPool = trainKBM(data,self.windowLength,windowRate,kbmSize) + + #'Selected',kbmSize,'gaussians from the pool' + Vg = getVgMatrix(data,gmPool,kbm,self.topGaussiansPerFrame) + + #'Computing binary keys for all segments... ' + segmentBKTable, segmentCVTable = getSegmentBKs(segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) + + #'Performing initial clustering... ' + initialClustering = np.digitize(np.arange(numberOfSegments),np.arange(0,numberOfSegments,numberOfSegments/self.N_init)) + + + #'Performing agglomerative clustering... ' + if self.linkage: + finalClusteringTable, k = performClusteringLinkage(segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) + else: + finalClusteringTable, k = performClustering(speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) + + #'Selecting best clustering...' + if self.bestClusteringCriterion == 'elbow': + bestClusteringID = getBestClustering(self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) + elif self.bestClusteringCriterion == 'spectral': + bestClusteringID = getSpectralClustering(self.metric_clusteringSelection,finalClusteringTable,self.N_init,segmentBKTable,segmentCVTable,k,self.sigma,self.percentile,self.maxNrSpeakers)+1 + + if self.resegmentation and np.size(np.unique(finalClusteringTable[:,bestClusteringID.astype(int)-1]),0)>1: + finalClusteringTableResegmentation,finalSegmentTable = performResegmentation(data,speechMapping, mask,finalClusteringTable[:,bestClusteringID.astype(int)-1],segmentTable,self.modelSize,self.nbIter,self.smoothWin,nSpeechFeatures) + seg = getSegments(self.frame_shift_s,finalSegmentTable, np.squeeze(finalClusteringTableResegmentation), audio.dur) + else: + seg = getSegmentationFile(self.frame_shift_s,segmentTable, finalClusteringTable[:,bestClusteringID.astype(int)-1]) + self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) + except ValueError as v: + self.log.info(v) + return [[0, audio.dur, 1], + [audio.dur, -1, -1]] + except Exception as e: + self.log.error(e) + raise ValueError("Speaker Diarization failed!!!") + else: + return seg + class SttStandelone: - def __init__(self,asr,metadata=False): + def __init__(self,metadata=False,spkDiarization=False): self.log = logging.getLogger('__stt-standelone-worker__.SttStandelone') self.metadata = metadata - - def run(self,audio,asr): + self.spkDiarization = spkDiarization + self.timestamp = True if self.metadata or self.spkDiarization else False + + def run(self,audio,asr,spk): feats = asr.compute_feat(audio) - decode = asr.decoder(feats) - if self.metadata: + mfcc, ivector = asr.get_frames(feats) + if self.spkDiarization: + with ThreadPoolExecutor(max_workers=2) as executor: + thrd1 = executor.submit(asr.decoder, feats) + thrd2 = executor.submit(spk.run, audio, mfcc) + decode = thrd1.result() + spkSeg = thrd2.result() + else: + decode = asr.decoder(feats) + spkSeg = [] + + if self.timestamp: timestamps = asr.wordTimestamp(decode) - self.formatOutput(timestamps,asr.frame_shift, asr.decodable_opts.frame_subsampling_factor) - return self.output + output = self.getOutput(timestamps,asr.frame_shift, asr.decodable_opts.frame_subsampling_factor,spkSeg) + if self.metadata: + return output + else: + return {"text":output["text"]} else: return decode["text"] - def formatOutput(self,timestamps,frame_shift, frame_subsampling): - 
self.output = {} - text = "" - self.output["words"] = [] - for i in range(len(timestamps["words"])): - if timestamps["words"][i] != "": - meta = {} - meta["word"] = timestamps["words"][i] - meta["begin"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) - meta["end"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) - self.output["words"].append(meta) - text += " "+meta["word"] - self.output["transcription"] = text + def getOutput(self,timestamps,frame_shift, frame_subsampling, spkSeg = []): + output = {} + if len(spkSeg) == 0: + text = "" + output["words"] = [] + for i in range(len(timestamps["words"])): + if timestamps["words"][i] != "": + meta = {} + meta["word"] = timestamps["words"][i] + meta["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) + meta["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) + output["words"].append(meta) + text += " "+meta["word"] + output["text"] = text + else: + output["speakers"] = [] + output["text"] = [] + j = 0 + newSpk = 1 + for i in range(len(timestamps["words"])): + if timestamps["words"][i] != "": + if newSpk: + speaker = {} + speaker["speaker_id"] = "spk_"+str(int(spkSeg[j][2])) + speaker["words"] = [] + txtSpk = speaker["speaker_id"]+":" + newSpk = 0 + word = {} + word["word"] = timestamps["words"][i] + word["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) + word["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) + speaker["words"].append(word) + txtSpk += " "+word["word"] + if word["etime"] > spkSeg[j+1][0]: + speaker["btime"] = speaker["words"][0]["btime"] + speaker["etime"] = speaker["words"][-1]["etime"] + output["speakers"].append(speaker) + output["text"].append(txtSpk) + newSpk = 1 + j += 1 + #add the last speaker to the output speakers + speaker["btime"] = speaker["words"][0]["btime"] + speaker["etime"] = speaker["words"][-1]["etime"] + output["speakers"].append(speaker) + output["text"].append(txtSpk) + return output class Audio: - def __init__(self): + def __init__(self,sr): self.log = logging.getLogger('__stt-standelone-worker__.Audio') self.bit = 16 self.channels = 1 - self.sr = -1 - - def set_sample_rate(self,sr): self.sr = sr def set_logger(self,log): @@ -206,6 +573,7 @@ def transform(self,file_name): bits=self.bit, channels=self.channels) self.data = tfm.build_array(input_filepath=file_name) + self.dur = len(self.data) / self.sr except Exception as e: self.log.error(e) raise ValueError("The uploaded file format is not supported!!!") From 0eb3129c529e8934a4bacc4471a9199f89acb264 Mon Sep 17 00:00:00 2001 From: Houpert Date: Fri, 3 Jul 2020 10:01:50 +0200 Subject: [PATCH 012/172] Update README.md --- README.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 47bd234..8e53305 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,17 @@ -# Automatic Speech Recognition - LinSTT +# Linto-Platform-Stt-Standalone-Worker +This service is mandatory in a LinTO platform stack as the main worker for speech to text toolkit. -## LinSTT Generally, Automatic Speech Recognition (ASR) is the task of recognition and translation of spoken language into text. Our ASR system takes advantages from the recent advances in machine learning technologies and in particular deep learning ones (TDNN, LSTM, attentation-based architecture). 
The core of our system consists of two main components: an acoustic model and a decoding graph. A high-performance ASR system relies on an accurate acoustic model as well as a perfect decoding graph. +## Usage +See documentation : [doc.linto.ai](https://doc.linto.ai) + +# Deploy + +With our proposed stack [linto-platform-stack](https://github.com/linto-ai/linto-platform-stack) + +# Develop ## Installation From 761156d314c1699e73014e027ebe6196d6c8d6e2 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 7 Jul 2020 16:29:44 +0200 Subject: [PATCH 013/172] change Swagger service into optional --- run.py | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/run.py b/run.py index 8ae7b76..643a019 100755 --- a/run.py +++ b/run.py @@ -23,6 +23,7 @@ SAVE_AUDIO = False SERVICE_PORT = 80 SWAGGER_URL = '/api-doc' +SWAGGER_PATH = '' asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) if not os.path.isdir(TEMP_FILE_PATH): @@ -40,9 +41,8 @@ NBR_PROCESSES = int(os.environ['NBR_PROCESSES']) else: exit("You must to provide a positif number of processes 'NBR_PROCESSES'") -if 'SWAGGER_PATH' not in os.environ: - exit("You have to provide a 'SWAGGER_PATH'") -SWAGGER_PATH = os.environ['SWAGGER_PATH'] +if 'SWAGGER_PATH' in os.environ: + SWAGGER_PATH = os.environ['SWAGGER_PATH'] def swaggerUI(): ### swagger specific ### @@ -135,9 +135,11 @@ def server_error(error): if __name__ == '__main__': #start SwaggerUI - swaggerUI() + if SWAGGER_PATH != '': + swaggerUI() + #Run ASR engine asr.run() #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) \ No newline at end of file + app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) From d28e02195a828690fd440b91f308ae536827abae Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 8 Jul 2020 16:44:25 +0200 Subject: [PATCH 014/172] fix some bugs related to ASR model loading. 
add word_boundary file generation --- Jenkinsfile | 19 -------- RELEASE.md | 20 +++++++- docker-compose.yml | 4 +- run.py | 18 +++++--- tools.py | 112 ++++++++++++++++++++++++++++++--------------- 5 files changed, 106 insertions(+), 67 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 530e391..b4bdffc 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -47,24 +47,5 @@ pipeline { } } } - - stage('Docker build for pykaldi (unstable) branch'){ - when{ - branch 'pykaldi' - } - steps { - echo 'Publishing new Feature branch' - script { - image = docker.build(env.DOCKER_HUB_REPO) - VERSION = sh( - returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" - ).trim() - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push('pykaldi') - } - } - } - } }// end stages } diff --git a/RELEASE.md b/RELEASE.md index 818a2d4..c7827d5 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,4 +1,22 @@ +# 2.2.1 +- Fix minor bugs +- put SWAGGER_PATH parameter as optional +- Generate the word_boundary file if it not exists + # 2.2.0 - Speaker diarization feature: pyBK package - Mulithreading feature: Speech decoding and Speaker diarization processes -- Optional parameter: real number of speaker in the audio \ No newline at end of file +- Optional parameter: real number of speaker in the audio + +# 2.0.0 +- Reimplement LinTO-Platform-stt-standalone-worker using Pykaldi package + +# 1.1.2 +- New features: + - Word timestamp computing + - Response type: plain/text: simple text output and application/json: the transcription and the words timestamp. + - Swagger: integrate swagger in the service using a python package + - Fix minor bugs + +# 1.0.0 +- First build of LinTO-Platform-stt-standalone-worker \ No newline at end of file diff --git a/docker-compose.yml b/docker-compose.yml index cdacdeb..08c14d0 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,7 +5,7 @@ services: stt-worker: container_name: stt-standalone-worker build: . 
- image: lintoai/linto-platform-stt-standalone-worker:pykaldi + image: lintoai/linto-platform-stt-standalone-worker:latest volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM @@ -15,4 +15,4 @@ services: published: 8888 env_file: .env environment: - SWAGGER_PATH: /opt/swagger.yml \ No newline at end of file + SWAGGER_PATH: /opt/swagger.yml diff --git a/run.py b/run.py index 643a019..8a0f52d 100755 --- a/run.py +++ b/run.py @@ -134,12 +134,16 @@ def server_error(error): return 'Server Error', 500 if __name__ == '__main__': - #start SwaggerUI - if SWAGGER_PATH != '': - swaggerUI() + try: + #start SwaggerUI + if SWAGGER_PATH != '': + swaggerUI() - #Run ASR engine - asr.run() + #Run ASR engine + asr.run() - #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) + #Run server + app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) + except Exception as e: + app.logger.error(e) + exit(e) \ No newline at end of file diff --git a/tools.py b/tools.py index 05c6c10..f8298e6 100644 --- a/tools.py +++ b/tools.py @@ -35,7 +35,7 @@ ############## ## other packages -import configparser, sys, sox, time, logging +import configparser, sys, os, re, sox, time, logging from concurrent.futures import ThreadPoolExecutor ############## @@ -82,41 +82,72 @@ def loadConfig(self): f.write("--global-cmvn-stats="+self.AM_PATH+"/ivector_extractor/global_cmvn.stats\n") f.write("--diag-ubm="+self.AM_PATH+"/ivector_extractor/final.dubm\n") f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") - - # Define online feature pipeline - self.log.info("Load decoder config") - loadConfig(self) - feat_opts = OnlineNnetFeaturePipelineConfig() - self.endpoint_opts = OnlineEndpointConfig() - po = ParseOptions("") - feat_opts.register(po) - self.endpoint_opts.register(po) - po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") - self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) - - # Set metadata parameters - self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq - self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 + + #Prepare "word_boundary.int" if not exist + if not os.path.exists(self.LM_PATH+"/word_boundary.int"): + if os.path.exists(self.AM_PATH+"phones.txt"): + with open(self.AM_PATH+"phones.txt") as f: + phones = f.readlines() + + with open(self.LM_PATH+"/word_boundary.int", "w") as f: + for phone in phones: + phone = phone.strip() + phone = re.sub('^ .*','', phone) + phone = re.sub('^#\d+ .*','', phone) + if phone != '': + id = phone.split(' ')[1] + if '_I ' in phone: + f.write(id+" internal\n") + elif '_B ' in phone: + f.write(id+" begin\n") + elif '_E ' in phone: + f.write(id+" end\n") + elif '_S ' in phone: + f.write(id+" singleton\n") + else: + f.write(id+" nonword\n") - # Construct recognizer - self.log.info("Load Decoder model") - decoder_opts = LatticeFasterDecoderOptions() - decoder_opts.beam = self.DECODER_BEAM - decoder_opts.max_active = self.DECODER_MAXACT - decoder_opts.min_active = self.DECODER_MINACT - decoder_opts.lattice_beam = self.DECODER_LATBEAM - self.decodable_opts = NnetSimpleLoopedComputationOptions() - self.decodable_opts.acoustic_scale = self.DECODER_ACWT - self.decodable_opts.frame_subsampling_factor = self.DECODER_FSF - self.decodable_opts.frames_per_chunk = 150 + else: + raise ValueError('Neither word_boundary.int nor phones.txt exists!!!') - # Load Acoustic and graph models and other files - 
self.transition_model, self.acoustic_model = NnetRecognizer.read_model(self.AM_PATH+"/final.mdl") - graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") - self.decoder_graph = LatticeFasterOnlineDecoder(graph, decoder_opts) - self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") - self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") - del graph, decoder_opts + try: + # Define online feature pipeline + self.log.info("Load decoder config") + loadConfig(self) + feat_opts = OnlineNnetFeaturePipelineConfig() + self.endpoint_opts = OnlineEndpointConfig() + po = ParseOptions("") + feat_opts.register(po) + self.endpoint_opts.register(po) + po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") + self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) + + # Set metadata parameters + self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq + self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 + + # Construct recognizer + self.log.info("Load Decoder model") + decoder_opts = LatticeFasterDecoderOptions() + decoder_opts.beam = self.DECODER_BEAM + decoder_opts.max_active = self.DECODER_MAXACT + decoder_opts.min_active = self.DECODER_MINACT + decoder_opts.lattice_beam = self.DECODER_LATBEAM + self.decodable_opts = NnetSimpleLoopedComputationOptions() + self.decodable_opts.acoustic_scale = self.DECODER_ACWT + self.decodable_opts.frame_subsampling_factor = self.DECODER_FSF + self.decodable_opts.frames_per_chunk = 150 + + # Load Acoustic and graph models and other files + self.transition_model, self.acoustic_model = NnetRecognizer.read_model(self.AM_PATH+"/final.mdl") + graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") + self.decoder_graph = LatticeFasterOnlineDecoder(graph, decoder_opts) + self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") + self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") + del graph, decoder_opts + except Exception as e: + self.log.error(e) + raise ValueError("AM and LM loading failed!!! 
(see logs for more details)") def get_sample_rate(self): return self.samp_freq @@ -130,10 +161,15 @@ def get_frames(self,feat_pipeline): # return feats + ivectors def compute_feat(self,audio): - feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) - feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) - feat_pipeline.input_finished() - return feat_pipeline + try: + feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) + feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) + feat_pipeline.input_finished() + except Exception as e: + self.log.error(e) + raise ValueError("Feature extraction failed!!!") + else: + return feat_pipeline def decoder(self,feats): try: From 2bc6a96815ce388f105a89a743d07f53d3b9d0c3 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 8 Jul 2020 16:45:39 +0200 Subject: [PATCH 015/172] fix RELEASE description --- RELEASE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/RELEASE.md b/RELEASE.md index c7827d5..8712413 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,7 +1,7 @@ # 2.2.1 - Fix minor bugs - put SWAGGER_PATH parameter as optional -- Generate the word_boundary file if it not exists +- Generate the word_boundary file if it does not exist # 2.2.0 - Speaker diarization feature: pyBK package From ff80d207be010d011e0d76e69a424fa478276b3a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 23 Jul 2020 12:24:03 +0200 Subject: [PATCH 016/172] add the generation of the offline image --- Jenkinsfile | 1 + 1 file changed, 1 insertion(+) diff --git a/Jenkinsfile b/Jenkinsfile index b4bdffc..d027c84 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -24,6 +24,7 @@ pipeline { docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push("${VERSION}") image.push('latest') + image.push('offline') } } } From b09040714e8b02c5030681687b9ca6f6d756affc Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 14 Aug 2020 13:05:58 +0200 Subject: [PATCH 017/172] fix minor bugs: set speakerDiarization to False by default, and allow speaker diarization for audio longer than 3 seconds --- run.py | 3 ++- test/bonjour.wav | Bin 53810 -> 38496 bytes tools.py | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/run.py b/run.py index 8a0f52d..ecdbb18 100755 --- a/run.py +++ b/run.py @@ -96,7 +96,8 @@ def transcribe(): app.logger.error(e) raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') else: - raise ValueError('Not accepted "speaker" field value (yes|no)') + if request.form.get('speaker') != None: + raise ValueError('Not accepted "speaker" field value (yes|no)') stt = SttStandelone(metadata,spkDiarization) diff --git a/test/bonjour.wav b/test/bonjour.wav index d82dff97144aaea6e86a655a7246d0a65343479c..f03944e35c448f2226923356f7208d0234a6419a 100644 GIT binary patch literal 38496 zcmaI-1(@5q)&>eYaSVyWoJ=y4WU$QLWt=iIGq=0U%#6Ft%sgdgp0?9rW)3quzRu2> zzdhf7?u{NxvgloFNtU#hlz~0Eb{&2g0ETrO+GXnOMJg@;02I=#Yez;_0{|7^0+S~! 
zn$VOi6U-q9N-#qH2qF~zZ@(tDCj8%}f~leQf_|v%pZ@<_tr{AF`2_h3)%`Oy82|JA z&z6Gaf9HkpLis`NYB;Qk1xx-}=KsD0YyNI6)K16`Que=dYEpxe4*i0$8kAsKO?pjk zFs#WBrTu^VA>?3=;E!18px90tSw--vS@oOYj^Zq;hzr7Kn#=5F!cMAmLL_s@q)1+N{s{mf3Z+{4ML5aepB(ErvJqXA_WMX zbiuHO&);SLMEnzt3Iy@Me_ai)!CJw#LvjpJ8*D$6T7wfr3gU$jLwtr%g7t$`hv@&4 zuMkc!1^f@sp;Ezm!QXGp5I-R)hERgO|36QN#}H*9Nrtd$I17a#w4hJ>4~HS#5Pdb= z1UU`z@EfrP&rjk7U5LjZeIa=Uv4X9J`1mcqAWje|_(1=`tQkEa-a`EN|CL;g^nzUZ z2_K0Pd}{DQ6#eEfgcWK(giwQ36V^~f4H8QJg8YPd3-J@;BgD~fKZF>hB}8FJHZ^EL zE@&h#!7%iM;-R-7=OKyw#tZdP$oKt+Gat!W&;_{+`XPFP<0&Y~;2xl66o#;VqlVD_ z?7tekAZMYG7?NHHD~RAF5xoB*)u09C9dtn|L*tG5Z}0rc6AXl0kl$ec1!)R$7wm!1 zeDoVFB)#7p1!JMNU{2_J2=!bDEjU_2at!eoLJIO$Bh66X{cqkvJO%j*JvB%nj%!Bl z@3C3acQlft8jK(!^aqC_{%Y_5;J?RW4S&C-i2p0q(05Str3a-F`p!bU)o>LILy`?i zlJ*~&hsH=~RE9=Eu&07Nc?mb+AzY;IA{>O1d>mxVN!UqeBSR}0y2&t@?Euf;}1fvVt-T{414E-+M_4g0wpRr8&q&kosU~ArV5H1nD!8E{Jda z*VmJD8Auw9BsB)IT!>rGzcTTYxWQfp0Xl#HD7jB&0b)P`$N@DF0mK5)Kz*PQPzPuU zGz0zu+5;VdPC!qfD-aL#0Qv$0fg!*!U^Flmm;g)$rU5g_a0VI90A`c%slY^FJefBN z7(|xp2lN4Yk+pgN&4I>1y?4#|fEhy!W^tx1}@0li6DM*)+7dBA*N zDX<>c1snqo1E+v1zN5nHnR@fQ(8Ccir?jUm zq+F+DQRvjx)W506sBftbDj#ePP5}3SSHU~rS1=DO1@)i-G=jxo7Wfl<44wzqfP=ug zAP6Q?_fv;cRn!8?HOfqifRY281V)mw&m;B{tq6Z0Auv9G1Rmf6aF73*zpKB*x5$V2 z&Uo8;Q#_MBI`;;51NRTtau>(-!+FBF+&SGj-nobjUppDD#V*{n%&qYhdtQ4l`!4to z;Cli$iB~`>C58GDJWX2y4S{9IBcvtz0BugcM~`4kW}If^F_c&@Y&13nYl}%R3YN|| z!05rS(QnYl(^>Rq=m^w<96{P5rSLi!fsaA0pbFY)T1%Q0JOOqGIpACBHflSng>s*= zl+uKf4QwKNy?rwkzzDZSSnztjU&H zmUfmfi`1gDjItzHWY)*lfi{8NYiBx@t`6?MJ(s-|zDRs@U@@@>m`LeC6@ZCg585qS z9P|(5fV#s=;2W?J?uML3k`O(Dq4m%)=tgu0x*lDEE=RYL@tJ5Bl#iAn=aG4c68Q|T zf`#xqXaWR6muc}dob0j7;9hVOI0qaH&H|@^Gr+arDsU0F1-uBp0kgsUfBWz#nZpA2 zQ|*+_l-a-$;%4A7zR$nOx5K;L^N)L-YnF4qW1D?}ZKL(RMQ843wwNxMrkRGC{x&6; zBF$&be9H^V4(oc`8v9wtJ?A$U(=*Te)>q^=;D&&UumQ=GMN~g^9mt|xpsAsCP$e`1 z*1;!`_UL0&L%&XM%6Q07V3Vs-QNb3<9PTQv&_*T7H?2>lJ$dx6`S2 zRNM8o@7C9rJ?6fq2FBio+4`$GrS7Blp!Sv)*LK&X>DKG}8hRSzO)JeMmccfc{j_tK zTk0+M-NF|Wa>@a!fHn!*0IxwtpaS|CdUM7Na&+y%9M~=<##+rPWOZV1Wxr#a*a(Nm z@v*De-`S_wL)jwsN7h6Z%sR_#!A!*ZW0{OjjAQf)v^Kg1c?YBLVCXCjrY!(n)cNE{ zTSO@$=do9WnphAxi>LU7{`$T_-i4kG?#r&P&T5CxF16RPHL&)yOg1kx%`;9loX{I| z!*mYqM{TN>q8qI<=x*qj8U`9inpT*fS;B2c?Qvv}toC&Bh2y5cL7*+Q80-z*gg+wZ z(aGd@@|>Z;wqqGsUFI5Q6>|{lB1_Ne#=gQXWb-*qIITF1IMEy?=R127yD6K>e#n}} zLRkk%UY=vkuyYKY-k!b@eTxwAAowOkg{IJw!Cv4OYJKV*ii)xp$R!#Qvjf+0ufL6d zuP@!p^N#YIa_d|w*9hkt$1(c@+b3(bCC&WK^x9}L6zMIx_PW#BwpvWHN!oJ1X)vfe;eIM{0L|aNSbrkIZlnQ@EuAllm9UdYa6H{q<{+~(wR3OGMGPdSG;6FF*5CHpFSI2&hOVRdGGWR7J1#2R5c z8F}<3^u_37goX@+FGCPComLFSgO|yXwTAK$Pyl}u4+HeTSp1b=MSdYx?>O%ZkH|CB zecJWS>2QQO+S!NLR#-1tewhhVE7LmTH$$x9q`sd1hwg-Ko9>pbQa3?w)L$~pB7110 zd4gqw^`?#L=;zFK4fO!N6aI+-G4P7g7yL|%g9jk9(C&0L<1(W=_5rIyeihZsPOL+$ zTvi;r7kf4PJUg5HiT#v)jeVBAoc$L$)-JNfu}sViW;5m+tQJqG&tB z2DgF_Lnmo0k#y|ioL~(F&?uLtIu?iF_dX1XNPrI0wyKrbS9R_Sjf;a)-bZ@{plm=<>)}P zEBXYPiL8Ln!|4zK&4ygGAGA&+w@1kJE021PI)qwC5mHe~f8Yhc02IW+Ky+Xo-WTWM z_52CG;l2#-Vee~CktfwX-;KHL&f`wZne0$G&e-?Z@@xgx9P1CuOY>x*lUdy8wD>!ss_6LIx)_wtl@r}+(jTA(qp3aC$wqMiYF((04zQ+IeJx!UwZ z#VCWm33ZWkK!ARi-jVSEYmQCB)GQxw7JCl+9*fW3Mt)~+SQ7G!jA4Gp(ijb~=gck4 zILu5>qpRrq(RHX1os4#($J6)IchdW#5%lxOWrU6dAS$ecn?gIGtI!r&5xF)`rAAZL zlubYiz#@7F%J8#(y>FTCniuu@+){T-cbJ>$n(u1i0$fs8v166PW>2yztn&23G0 zjXdK};}SzB!*fG>1K)VWSjT+V+}t+XW^i zlg~YHGz|tXP|s4<0c(Jp#PvWWUKiJTcldD6aCe?(sOzrNXj@>7vo=2AiI$ zTV1`oD!F1v#q7#KmAfhnD;8IbuUJ&!u4q)%s`{qZt6yn|vAnSyw2K`pU5PG@=d)+5 zhvVDq?c}@c8|#1QWB3P=Yut9+OXvf6#ADLE4eSf_34noBfeLc0U&Jc|)PRWGYyAXz zQNB|l+9=viS~_$Qo)2>oDl!leqVM1w7)Dw^yWt5?1Ly=a7rH=>l_u~4NC0Jkb3rq; z4@FL;QqNK9f%hp_sGF%%sIMr;C@eq}hz+#xzrb7jui?i7EdsBJZTM1OrQ7W+wU4oB 
zthFp=^ElH1<9?$^-&g0Wo>qOK>S{&H@_8j4ii`5Qi*^)?|I=l;yLHp;d$ZKdZv5)o|fJf-b4ko5sDu zZNdZjP53eV?YtD;dLEm%pX1}KVsB+BSTyDsh9AvBv~UqQo=4Hnga1%P6e+R5f5iLL zxz#zyKHDm_>^03aJk#gvcIj4YbE{@my|1h+rN1Rra!tB|C~57Z7oAG+_{^GXcF6U%& zvN)~T?KpSYZdNU}iPf4_#w=wPuZ2csjSXGJ}V{9DsR6|IEA=(5_9A;Jy31a>D@8kWk~#CU~nBfn`a z!a^S+UE$_XG^nB4fI)#a{)Qg5lWyN{8ERr0+}e88vZ|I95#=*Wu@Y_Z>O#11K|U}4 zY%VL;l9iqDF1>l`i{#Ns>>rO3`QM&@>-k;rt1NwY_L@RtS%gkzd+A+B90%Qq8{5a8 z$`z0!sxEgR=QaBRvngvD(~0%NtPBpemf3*ZLylw(W6$RjJh32I&``8Otdu^MoRS12 zplr76h5TsPM@72`uj-?wZ`6&L`?0jRgt+B#`{QQBt&fpMf7Mi}zJ*tXwUVtD=L&Z4 zv$#t*#q2otH*ytT$kwu_ayF8y@l_U&HGtWNF$I}G@_Qdx5P0a@;W^;$=VUpi+KyQm z<}OC6evEc?^@d8K;!OFA(oH4xOU4#mD-;&Y&hzB7%MxaEOkbD!G4)V(Vsd7sMq}r@ru1-~rQb(w* z5p~0lhFzAn6t@(d(GKT~(Ix_5Q=%B1p5Wy?y(mvk#$QPi+tLw=LIn>mfM+GaFN ztxi6lr1+Vbxc8eb@#2q?FQuIFjJkvRh1w8Y zMs-m?QI=8UlovoD@C#T>siYjGGHLtab!Z7A9y`a{%L(UxzgsX) zv|GGY>X9?Utr4L5r}~vTMU$#tqgo!mC#;Taj^v%Nl4s(~WOc*h>D|zuNIaZMn+Lw2 zQo-q@E}KQ_L$4{DD6J`GU<2_GPw-vwh}}ibkM;!X8uK}$(6Ci^MXRgcRlTliVdcrn z2~~k|PubPddL=uGHy0ky&&x^9x|@Ne|DF0QgrybAgn)fxoQSp=V!`jZK z`j(OQx9)Q9HBwVb2DVUkkn?+AY80ulsX!ypkn$7YQ6^Hhks5j;XZ@h_zy_?43L{ELYhj;<@UKI$U#H(@|5Z9-sol+bh1y zmPpg)54HQmA}%^=rM~bjS9WW2fhk*Y5k`ze;Q*Rs&6_r>Rx6tF#rg!L;eLc2Ehe zGp&f$lZHVt@HC_?x|?3j&|<}`IUJC8hX?Sx3v>d$Xqeb5UMFoR8!4Y3)<+>yrYaPQ zSVd{rn6O3iuCipw2eDEt5OIZ<_zZplm&skkK{(6UFIWl8+N4e(V0OT6GXnHfbSp9f z=0V578Pqwz`GC&9!Z+4i;C|wwyCyqNJ0y0OHP2Gl+{ENE;QDpCKI9xst>jcBmAx+I zmUs&%7QD}Ik=G;lVD`PN`PqS7apCzQM~SoSSk)6D7#lSy6i;h(UOGXRz;nPpdwyjIFAK4UtIn0E(mk&}ED%jNXje3=6sAJ<2F! z9K|j(cd-tzYja<5FYvDLcL`PtU7{8epCq2_*|4x{VLKG>6h9OticG~fvVT{H&69VL zjg#IMQ$$wh_v~^pZ8yp<_3fneIZPOrq1D&I~XZ8CkT?MzIM_IG7Ib}yn<)uAIZt=p= zg)a(E7knrvD9kC&EQ_z)Tm4nL*znDC(OSo0bslp&JTv_jfm{NjL{fW#FKBlm8PX5w zjTWNa=zZwV=r+2JKA%w+dxV|G09GdJDEkU$0rxtW&TGZ5El3d579A0{kQ7K_WKYPo zcB^8RVolgB`9b+RxkYwJ+CZ{Yv{l$#*ig8XzluAK-HMgL6f%WaEBYg(IkFa6g;c|T z!B1%qN$XoFHHOlZXonvlx#W6pc;>kCU2NAl*GuOiXNF^g{i)S!?qOp)j`aQh}sMQOqsRsc5dv)eSOjvfQ<|ay@gu z^q%n<@RGn|Ac``cdXqYww7DHd>Y?}01kxH(%IJwzVM{P2)`YYR%q4ZzMa*BUhHRW| z;WXk&1RCLIQGLm3X{eo#4PazVKV`pe}J3D zW{`V{w~Sr%L~s4_q#cGXA^GSM#%Amc z^AQVYEny{LCPpeF9YdHrW+by6)5)64E#)J^N+BZNCYdFBF0Yas<*9OE*gs+4!#Q=Y!Qe!A|y)~G$HZLa&Nou+eXduzk0 z&sM&zSWWJWKa}1qsa0~WX)m_%%db@72@vKQ{jkD)D?zmz- z>Au#1tK_&oO(~%;z&iO8Ud_>RhI7u6yXI-!t-MQo zK;YpY5lj$qBqOA?q*@6ssgP#MKFgG{juM7=swh(QP&h#V^6PU~v%9m8v(_^2GX|3S ztBllUUm=&_B3eUQGZ3WGD0-qzppU=I_sv`6ZRv~go%7!D-t&(3nmsREjhuCC11-;t zRffL$j=BcALpruWXWVTrFt@g>HeWSI8#@>ZwF+%oRqv`DRVmfUT0nnWKS+O8KhDs~ z_(<1NZ_zf?{m`8?FpL9CCFUpAEPJXm&SUbFdf$4v{@(sA_=NzS++&QRvS|u<4l z)17o3xtlz|7|qC~bI5P|0AnQ9pBc@r$7#m(bDQ&fk^B23UNrw1?>n!YzlU$&oB7Ft z`oegjj$E}~@&Dq#=GEnXAy?jP(sI~}@sZSYXQA)W5>gZFj}$>&Xew|4wG;3@(1Dmi z&Rk8&qna7`j=*Sqm*3<)?iP3&y1v-8mTsnlrrPFi?)}FR}%Tvn;tIvGM6lR)b zTw&H5r;|EsjwQibY@KCZV3t|lno~{vOgv+_v4gQ%?=ch@(~L$V!_?g|)KrJaMu&?+b@@GRI9-3kc+Ik-A-E|5-)^-smy`$UA#KLVF{ zH#k>1takD=z*+1X>uh3g<&-+Oc9ngO<+wfBe#>>hIl^|Vd2?$kvz4mFojK)V3$1YO`IFq?J=ZiBu<+oHwvOl%$|!&0zf zW*%F?o=<9b)7Vb_IzC147oW*_Eg)nj# z&w=8pujtRAGO!;ghX&F=Qhy>hq2|=9loP%LPYn13Z-eJk1_f68bNmbaX9911*Bra- zi<~Q6Oj})lcOS=-=wIR<;u`B;<>+p$bR-&oxav8ESk&&P_P)-Jc9tX6Wwn=(e6!8{ z?Fo9b&TlO+#@jAi9@zQzrb_yu!QRmq&~s zjDeAq29(v5-hdPMOnDjD20RAM&~vCBy$!3BHV?kVAmAp9Yb*}RK*Y>l+!h=Qdo6n{ zI+L@F*%^C+Jz_KYsq9|7?bun)YI3fd&cdOw+|lS%fP#tOF35atA9xh17I@&rGzNVI zdKOy55Cf-ZF8q zqiMsT#WW6c6Uw2_;>=^VWcpaEnd9hFnDtmspjz-GMkgkV&F8P=&Z58JF2Z2ec78YT z8KJQb@E+?*8yZNPdluOo+nWHcE*TNqy3SeVs45T`F_~u`|9|p{?ER)P7!&y`;J^W``J&}{&9@6MtIu#FxwU17{>|+ z-_y^*vO-p&A9UC9_3wjViz8lp+7eQ*bD%q=3GJY_Ng#px6@LblQf>fCh{os~=p!YG 
zae``w#WXWz0Dm#i7#+k?QPN0#wJ$o4v4_dy?!nH`C-J<2mgpYtF=!dfL?6kT%ZPy6 zaB{&_jBs+?IFU0%5@Q*BAF0FHN-Q99`JGTNH6J_cya3E)cf?)fnew*4UMfr*7qny3 zp1KYJt?{Xrd;TVP7xzq$!~Mm&JPG`765Is)Hq0^0)5T${kc@=x+Mcg}Mi!N0jr+t(3xSGDIeUgpvV zBJp<~4W%)D-Lr?fg&2%a1?N&a5IWi|uti`Zlnu5B?1N>I(LrUUe*SX0v4v6l82I}D2n z340aoIq?E9<9oo?@DA@Cs4==8??!V|9sVywgmt z=6i-3XSknRw%KOmVb-l4CO+K>8TL5Nn)hn?p1pR+z1Y#rb==j~I^HqI-rJsMN1U52 zS3SATW%hZVg|?=y$?m(ZE!GHUwST8)8D8!k;mP;A0=FnaVi=wX)kS^~IAc2$53m@$ z;C!Snr#oi}w2*1!bU+BEpM8YBk)215afG*%^O)HTzRo?voQo2IH0U$^HTxX;5U!8# zNIQOOMmT*lYyh>$bhLu-Qx_uRC;;3D;(J>J(ugshHOOA5)U^UX=eg(I=fCLN?Y!d< zP&N=_T#sF-*Xn5H;#p=JjyfAT`L^!v&Gu%N$Nqkvg%*`9-V|Zz!>%H=-_+iA4aSsp8GVEy5ydFC8HPZly^3mL1_uq zMc-0)QFB27Dx-U`MzqfmiZw*p%=fHQjKz$d?AeSLj3)f2=odsHoI@|BKjVbK-;qSY z7W5p9F*Sq`sRI>}Qe&gVlvBhzbeOl6{~=ZFE+giEPpv%yC;Tg2E#WVooAzS>&r{pf z3RvKvXsccI?Hw)Qo&zqs>7yynw87cY+R)t8y3sJlP;SL+71fdUedG`@`vR`rb}=#4 zH`aBIIN+Y_jit5>WCS$O4Dzht8N-EafyaZ1@Ky3uLC@eob&x~oDC`953u6pp3_TYe zk40cV82vcmER6kzwUX15)0x$Y+mX?p^%rM0_LR$F=FyJu!r1p|e%2;f7TAn#N7lK+ zi4i~}e1D)}lNnMNCyRL`woRw=a| z%I{XaH1(?Nq}^taR_P35bO$QW8&Y&-M%eAO=Xx-59ETG$Vk&Tew3B=Uw!${lO;6%_ zIOVL?oXh-Xyb)Z07tb0=T9yYed$V?Mhj9ydgN0)RazQ4a5Zn_T;xy$uML2hy)G0`3 zCi18A=5zb8x05z#BYhNko;Cq$LRFF$m<7;u|31%PceLlSbCuQLOtjUqR9n02PE{|k zG}o%C`soanld1++g_nIS8Bs8{Ag`n%?@0FN!v00?%Q{xODr2hox`xIDCfGzJ?SOlo zU)^0C1F5l;B4Pv6h2CK{=O1F#=alkFcq-m@VP8R%NGXmNofC1y2H_5Vq2Pk(sjyP? zUb01ElXR8J<+Y_PBrRoc#1q7F@g>1xzEHS8@PYM`)rGX|GFVEw2W-t41b?SI1@^lK z`X0E`Z0)SWb&GW|meG~=>Y@7N(zR7jDw)NL3XW%A&q&UmpQ+2p$&u!N%|BeyyVPB# z(C*O@%&saJDI(OokVK=9hi@4E?O~7 z4QjztzESS6?zZ-Rh7H=;75ysaS3D~@ULr5ukWb4Qka;?J5@{tooSd4%$d1X1EqGA= zvb=$AqWQLQfEBS9IkLU)0|H8Y+7lgWTf09TPCX|A0Uet50QAqQYj%` zDe?(eLbce*&*LPpxg05XJwKNx;tB*O`63=mG*jGJG)B;qtK+=qo?_l*ZpCVmbFvv8 zPu^>}>MAu)w!~Fcm*kcWFG$Glk*UbIm^>}XpH%tH@Ll;cIWay-m0p?Q&fQm}>#7d#CeagYgtnh|PU3g1HGvyhDD~zEyA-^cim9&s#i1v$}!e^rP!a<@Yq`NJ- zD7`G(Cfz6PE5)P^(Mn;0sIj0}Fo#z~Ujuf4uKA6&r8a?%aBEk(9)tinXec;%V!@v2*q&Zy~{qnbIY{SgY)=I|&*ThhjOP-2$u z7M~Fw5;*zm1$RUpglgdz(FNfL;bE~#@>TdkGDRfhJ!L&({Gb^K*qdRcnCM!0>GPaE znN=x~zgBkrnC<4LR(T3S_>A!~eLYH?N7 zFI}P8?BII`Q}ndm>@8eE?2tw%8${66R1K<#RB$Bw#C6GYfgjjc@~l3C)JZx)_27}z z7SK9SOJhJgX`>;6)Nmd$M{@_T3OH)kWKJP60I3Hw!-u(}%^IzwL{fSwdvN}U^y`@i zGur1I&okuDD7#QHM*FL}pZS2Ti`#==rerc+vSx{ANKdDN+6TC1;!0{i^dyHa@+vy0 z3uB4e(gvB0;u>#i{G~x;{dKiF$HZ%?^~~7-D$O=V(w_}hVKh|D`_sOD;MS~uydv=?7@2hkVd>kV#{L3&=Ak4?e};&;7}9#k$$_$k0Q(Rm-ek6z|TD z%if;JPTi5R=U2znkLih7g@p%7nrQ19epm<87KUw;ua)|xYSCTccJ4#&Bj!5hYVy8d zDS4T@F)bIUL($?R@sYkbugra$+~W_ngqw`oBNZ=-n-|D)h>Vr#hO|ZLW3oFG* zCQLCdufGn+L|QT*vc0@^g0$iYiL+q{0W;+p=DzuTCpUPfAbB zq31KoA63pZiySf>qedZ4(l*mYuts=X8jzj|yQdr+QJ`L_{u*gl7icW%Y7Ivd7U2$W z8kR46A#BJ8nU_!sO$nUzZ*UKDe6Y2*ela->D*eFfBNcZ`n-}PEE~oEF?VRLGy7beU zygKcD_WUBCvX`ljLyu=t=QG~1`Utj&TFB6_tIFHqT~vKFSmdXuFVW^$ckH-Y+v29j z>0{bOw^PfMLP=MCI>v=J5_7zO^S0%v*{J`b>s!^1JZXJd^0DARcGJubDV0eP-&-b* z|6cZ!mG&j8qR?E?-?YfpFz^7{!*Fm`2y(^OWLmjSQLM;`7!~nGE!Qxk-bHa@RB`QN z7_kc?>qnL;&xN%SzvE_O3*iW`D<0|ns;(x<1~O*;G| z;b-2Djj21+CHciA7q!{egB}k>itJ?3`PIVr(nQ$>MWW(v_)gVn6)p0srccb~m`QQp zOSyge_mg2toBeIZm>aV&#E`MVr zPEBO~*q4%>em#F{X*>NO>toMK%1n45Ya#En*eBhlI1_$f^;CU0a$D47tOEm>#`4mh3s;q0Gy5Y^}M%Fv&=R2)TyfBioZ)f7r1hMWnNFyCV%;r z__NM0dGf)uoUFe2gUhR`#WuEENBj-NG0t!s^FK(ulB}?fim%E=5vw91HMKPZ$=e_+ zVr((jV!Fn}N9|VkRnC>=2wt+&&`w}6zLLD-andr$SXo_M(YNGm;h5Y*S#jy&lx4rh z{B$MVPgPCrsvSWgMqP5ly^X9PP|3!yfW(3|j2W89-fx+>%p(9 zN#V(tQ#NG?bFvDCm;1C2Exk$otP`ydqZj8Txpyd*mnqgnbWuOi#7Aw9N{b#C&5mgv z-7RWQWQ_W5_};M9Qdl^UBg3+wbl|xEp<7^|V7BTORUIvTSIEkJpRp%Ro}BTkchbq^ zJ1LJc2Id|tDy;aRkGA*lTmt$*7qCQj51~zbOV(D=O}QchiO{J}t5VfF)T=dn)Xmg( 
z6(eG~vQJorEK9Ukz~)%7qi`M27I@|zLCTG8)KqUSw-%kso0MfrFHa3ik)+&CO-*Y| z-W%v#+_?Ojj%I!19EE37-Dr2_Nd9i|8d)Rdrid<@p_;5Hb+j;=5?vXki2^iT)Z&Q0 z6s=^lBnJgQIFZcbXkTzFQS4K@>RHbkmR5HyTUq4D?USWVYmxded3AEf6js`_%*VMA zMQ_U|>l#^1&NlcnYBMB?^^Ch&_)bz5Hd(nZ;)^;Bub_0?fkCxMLj?EVJ~3tFD%QD=f+0m}yEIm9jT^WQr}7mA)nCdcon+ zuxgt5xMPnukJ1sY#7=NfQ7cKRJSu!i#5Q%FS`;-)vmo-R#;3ljnjOAJxnA~EGDeus z;W0e(mte)PA^QKnAIwOWO1E}5&FMud)zMr zzi5paNvzxaw&EjFo_vF1xpJU#V|bs49pTL*-bWmb7$4p+JXZNbo+Qf=>xH#>1K2`r zFfs%@MeOm9cfYrvw0<=X(LvSw%FV@H3NGe#%wCxplX)nkZ{~@tEqUt-UzeV&jx{CO zpLvHN4BFmGWk?a*F3W_-_mqFv zt+32+?!xn_U6BH22X1>|d+}A-8~I9wEc{XUF4byP2lW^AQFT4_2^ALcN-9m6G>>^t$!c!peE2?+f?mZ_8Pf1!WG+ZnFh;y~78Ccaat>9e1LjuDF5pv22rkuHviWj506W9o{6u7XC22Q}}Y_y|BMz zRLMO-1os?ENWTj0rfPtH{Fl5vJU^Vxo%ieqZG$W`O$GWLq%?aKuPRuOlahgDG)Wts z+Be;l)h}0A#HwQHciP5zr%{GNhp}SrSN;ykZ7Cl1E^IK_Uq>S5tIn#rX=bVRtK1O- zl+Lg!S%LU3{#H&?=5XWzbcymVu+m@0ch^(T9Yb0Q*EsVXb?wb;=S}qt^6GPCSkcNn zeb%S6eQCnf%oHetlG8B%QRy+Q(9CwW4cw(ZL8i0TbH552h`Y#6%Ni+46+n2yh-nd* zRS#8HRSzO`5pR`0!yd@kVh&%&u7$DTrr>L$8P4-*yr8GE8+WIZH;X^GE$%fg0eOIWpI$$^umGzJ=zrL^_*;>fW0}o)seGg8j(D<^F1sqH zDfTJKNPT~aYO<=YnySLX%M^FW-BJPn4*NDe5`IZ32(0wwd#RrB9-eoDx0%0!{BtFT zSVWu(sPLcUUgNQ~j^0u}v$!oso-_+VJBlothsR9rZ=?P%~CNBYb%nRr*aZh&_kV6Uqj9;B;@Kdx4Yh zYVVrksr3H9(|}pvQ|Kbn2+c!Q!4ldLV!XGTou|K5(Y-J_r%l>FDIb!IN&Qnb>9?|r z3hGxltNU8+J70U35Nm03kYf6C);Z21{w>jY={5Ot<>!d*n&*+?$oHD}>apP-`76m< zULEW>`id$d?bR%Ilw-1^ja%Y6?JW$PptOdQ8LQYrZlo|)R4FVFp5bM%X^d60NqDlO zz@RUGSjf#9n&SPrKXK5vo{0(HpZpZ1Zpqq~S6!m2x@~A|yXg5AcmhtNBW$x^uylpe zuX06Aj@8z>T-#RbR$Pmi*_xfoX0ko}9;}YYG_ZGIzHg`Jj%$_sj9cKJg5M%-$ph&x zSvlNjkwg4cc1m6?kCjU#e+#d3Tyz9H>^*6nQF*fHe8%mc^F9xGH}ZAI=PzFEfBE|D zv`zxHgFr@qm;vLGmDt7dg*tNBm*J)fwRS&51rq+nqt5M@strQhxuQXx?5E_^e5co%X zn|qFU@A+pEFnAT&%KXUPD|=tb-^8eC@Y5C1ydj zt@v%uo|MUnXFk?@ed($FvF1_xho_$$ec^m_?sJ=z*zBPtdkk|O$MK&~9rg&`T!}z& zOgUc_sW};`)l@`fM=gztirNy%)8G+1mDA+~;t#x7b^^MOR!p43A9>#Cm~lq?HiEy?5-rOuVWtMfq8A zzaD)V{I>23^ijY2>ifU$2_N5hHvZkwFa1+L=k+b`Y3}W6PAP|9v%~p3@ljb@`50w_ zf=aH8ZNj%He<|063zaJ63dKKRgfv15314zGYz#|=9pDb?2BH~|2|S>fz*Xov=1i_d zG+dsfWNONz%AzmF^oeO0t=804^(QsXoxWHG@yx(y;1&1` zeu`ytE($u!wuZfn_@!LzNw5|>>O)#bNhrJ%c^7sP5`R~Oy-Ro}eo zMaiu^dRC)U{QKiC^WPJ%em?8-6nJcY6#FFZx$yOl4;hIUQby-aENyNWYj2Ck)3z`e z+(x2EnMvLte0PLXHD0qXvVhd8c15p`?i1~fnjhIfb4|5Q*&wX5#3Ja=ea>>wPrwHt zH&stPKvjYp!KTm)qzp~Q#;{s)=)4j9DZ-^fm+*!#MYvZG&2Pus$L`F0#CU=9qvcYZ zfqlL^ZmuiL<})8L?$oDOcdNw8`6XF}H}W^-p3i!dZcA;GT=4Usp96mM{ju}s(3E*; z4YG9w>N1bEmvy7Z67WKUu)5r%!sgNo@_CA};kP5+lbUoF^(Zw@eN6?bhK4UwNW*fa z_e8ORqnsja1$qt=Qhx=``5SqckpH(@<{stF@wE2Y@$SG#>PE1{G6k~)OAut!N*dLSw#?=Jo&5b$4cuWxtrUqIg2 zxkTE3b_BNId-3yryYGRwr>E4#a%{DUE%QuI_3gFyt18Oxl^rN)Su8I+m*1tJGEZ1= zuHZvay;5bxCY{l6+;+jmBG<{kpx%rGau#uM2ME3iRtRnjXA4>j5ApBvr93NV3TH5R zugHwfM&fB+ii%k3|Kdq(x$h8`YTd!6yu=hvH_jtz<(9h=njTzp=#mvyVt-v4&+j-0;Hhw?_` z&u`YD;Dv%m3yvz-+N>~tcmCUX^YY%zEyG)sOYo#>Rd`*vW$5PMO@Z$INxq@(NA7j* zW%wWKIy{Y47rY_dEoWEGj+|DJhjYHcTWPxk=lc7&)9ndfX>wY8ORUd<*#7?ecI~zH zOsKx8GFdjV|Xfuw$}?E+`e%~uxL>6jM85!;x(P>yEm3L z_4Yn<-tk`%dO4>tvOd~1FE8(eyv?~Aav#gBi8e+1MW2d%l2eEG8CT+889BadoMqN4 z-UF##$@lO+TkF(2sVA&u_NVSPe<(CJ=ZNT(ynp071rHZg6}((9sabzKCwzbI1CfgG zoX}pp?K#WW#`hSWSsRAuhTic$$2($|Cj-fb#0!a@$!4j^UN5_~JKBFu@Y(RL$g12Q z^PbOd*{m%8`25a!FGQcqsSZ6Cc*OUMz1Ulvd^Wx=HoI~A{;Bnk?yaevQN6!%NBQ`& zM@mmCxw+)p-2+Re?Os~)e#uWIx0G%w{jmJ9iuqMNYp&fp6KNp!$G%BS^e%N?^i}vD z3|o{PW$9?1a~ttcX7m`?2xy{fp{1?47jdmfGGmYpa)6RaKo@ zHM#1O%1M>S;oX2)%HtZo&Q&V=dxUcnf}^{i5@$ zbBcS1bFMoN?|F}~gZ6`VPaDq!*@)3OVbulRc)X7~G_fpR82>chJ6Vvb@ZPtl zyY>FP!NPFA$V+(t_m|xHxv%C1bI*7d8J=i<4X2+Pr)7 zE1HeXk4HyGehl3hnCqMELP#1jpaAgOsadQv0MB* 
z?^fq*|1*Kwz;nSVfeC?|{hNJ_$R^R;cb9LY??k-u*u!6lyagxXDU({SG4)t-dty{# zY+_Bat@oMT=(Y@2g=a-8_2A0`npO&6qwzw=kbTT54{&28-6o1JNSuzt~<|ZVRyFHd2Owit*@;*d#8P(B90x!Q3*#?TRL`Nf^ z=bV*e=L`wk;Vz+n1V;o``ez`@eGD81I)j}aZl14`zc_GZXlJ-4r!w+kba!-c?z-r7 zJi(Y3{T6Ro=SE|A13MXM7LDV{*4J{L##6uJ@h;ezcq?_JI~MxpEavlt~5BCpM2L}dk4ElqMg8lK%#$Dm+a4P4ok&?*n z$kxadyk-7m&ie4Zp>Klo16Sg0+#h^>;ZLXf7P@ur6n6!(pq%R)<+}#{^{Vd!Uy<({ zl>3o|;x^w3caS^8skD!`2UuOa`sBjIJMmHYOT&o9_x3m6H?6M!-X*ouYL-_|tv;tyjattc12D9n)>Sds>fC>t^BQGK*jy#*OgBy|FZnn3cIRv&1rkCsrzQ% z@Wy_zTjF*ymHbO;S87jcXKHZjwA9Gd8>!Y_)Jj-O?0mO_uY><#|4M(8e@LK9U_s#A z;QPTRLLGC4M_$9azbS8B{$0(A@t)k2W^3}tc>FYK$qJKz&g-{d^#TL10wy>QGr|HlA^RI6N+Ve|T?bMQCQI zHgrFp{~w-n9dZy{8+|MKWppv#^q3iGi8qg)57&iGL;j2NgNp(W;2o7Z-`mLMak@JP zZ$}p7c5*SJv~M?OmU0l{z)qJux%hEB5O>aL~UYaa!hKr*Ul=k9lDt-d?)Oz0;kBb^2_#(7D{{;xuy>*!}Ec>s@P= z)e@_si?#7e`v!Y7_WkX?cD;3yeKMXy|BL;#U2Ok3ZDp@J^PCyZP^Szib+tDjqf$R> z3-Tpx^;UcHy%j*xMfRu;RwwHyWF*Q*j+I-im^IvPvR}rkJRDh#+W5M_Hk;vp#MSO@ z_W=BRm)p@7^5y#apyns{MM%`v-H6$`!?)jqjo$)p+S#^UV=Y3ar^k`OX@J$;s`h@v z`$B`gZl0HV8P8}8Nu83aPkxjfnygB^ndq0;93L6ajZbJgv1xVe-dKlNFqQzCZDLLg z$%tZoVtr#L$BvB+i=7ZVGj>VroLHCG*|Ae%&0@6&-aTN&j*Fdy)~>Ohv5B#Hv9+-Z zWJF5F`ZP^!`o5`4d_nwAiMqs7$@5YhQ}=jf-kH|D$dNVI3c)9DuxH`P{l&;7aK3X1 zY-bz1V5uAS-HO;X)W6B!3BG=P;Dx|vfsX^zu{KW#ycAd)_$`nK>Y)dj&1L ziS-!XHU7qbzrPJ~4B7rXyrHxi@$Yp+h~p9Yu6C=O=bh^upEJuI4y~Po#BpC+4_L=q z-+O=cmZZ9*Rwo}wj!1S$4oE(p+?H&WIv%#^CBI0vO3p~s#>d2ujK`a{G@TnC6<-oR zBQY!SN@5(cEVzjs@yFsWuJ{)<&2C!XRE)QRhQ_~$m*DE~?fCuizr`<%caOKin|}X@ zw@ZvkBoe)oQ<4jk8*!)F1-T$TNzP9$OKwfp!?UKPI(b(kpH_*-JY;9VD%v13SwHJ@ zYmb!&oxT9?YT-)LLA6XrIH^e&rDV%K22Pd*dCu4pB$eMA0EFu{z!aI z{I>Xg@dx4~;yr-MfOy|{|M;NzjQFqd0g1_prHP8fmc&6|mQTkO`(4bEH09=6%}36?g{ zsYTzf?Mkc({gH>|bLSgpgYyHBxx*Re3`gFh8=d={51qx%I%k*j8@7v_N1dCIL+C4b z^G)_ptPPjrE$k1F^O}7ifWOYS(+yy`|n}Q9Ip195H7h|fhNrzL$a;4^^1anr zPumCV&Q4$FdgoMUIdZ9Wu@6`)tz}l!9%9$nu5*#Iz?tJbfcLAnAbQyL=hjT?VPvlx zX)UmJ;xC%Jtpmuyb^y8JMkAkHCu@`U8m@3Yz;`S3(S&bQQqU; z1aL3|ypgfuu@$y20L~ZK*Wel^WXG_od~1Dby^5@LE0IZNr{%YstP*Rr^$zUi z0c5?q4KfVDHS08FNSljf*h8($k#(<~m51%=)(u$W##&>nLC{%&RRsQ?_U^ztZg+TN z!SPg_ukbzw-%mo8vEI#i4)r!~8s5766;Zp<3n6!3$SQ^&S3s^+-V&6rkx^wG_HQ8{ z-o4(9-i6*--s#A0_msB)x$>GJgW$zhU#m5;=l$SK21*0bcA+-}bF4ycLUMHqxbu+N zZ;?0Cdk}NpeK<0n4!+yUwb$%CH`;v6iB|3zY5#CjG#X3tOxO^$N_<2)ikR4Q<3+ z%fR<~@Kfg zyOq$;7WhXYw73)3I@RDV52*D;KF2$($E=s&ssDlvt+%#WyR7Oob}7UfKdjLQ-$+>X zz;GuJSO)KS1F>nM^&tG>Z@~OQ_+MAV?E>VJv>-e4UG7PLtATDc3iD+$bK(wguoXER zH=%4umvulJj}Cj^z?T<8zE8Y)(C(Y?oY`2nXFTG&%u5c5P*^>^H9{U3LLy+DaA1G~jy9h9~5MZYSLP`~kT#&$iBkRWV;?f7s%M z$bNY>JaH6!d>rBtvro=NdBvJxJq^#i4;dzJLHl2!fh*FvGB3i|GlAsM$YtqRw8!0; z_cvTWmH@wM%v*t@-@xnl!0RjMegV*W7uufdy##BTn)ca8yoZ3zqwwCTz-BhE`3OD# z4Vmbrn_$)3A=8fZwh(zWw*s@DAnT84T?Y^S4jNgF-v7p^g|NAgf$hSyM{ET*wTNQ@ zNZk_N)6wc|ortWSXTj8RZfGk1fXo(0LzPG@)R4lNxIj_4olf$}jpI~lo7dqRWf zL6?^u#BGFiJ8boz*8d<rIh*@{Qs@B7j)&MEu zwI*$O>w(HPXl*w*s|1hAY1Kg)i_-|=kx$s@I*egNSOnZZ2Kw_*=EK@W3#&nU0}$Q; zS{0x}-7{-$3X(G0HF8{{micdY!OH2`-y*_%4Gk_#N6bYS&ISocMc_6p$f7nG~8o?Z$Ix)3&h0k#+8>_(s^-DNw3Y#5t&lym97xpNRC&4$FEVAV!sUjxDuFr zkCvZ+ViDx7gVZ*7&Vg^`VjBedO{lE|#>|+@EVW<2ci+X8$Skb2yfT`I@&uyiL~jz# zoTQnNBY<>k^z8sTC&B9e05tj_lJo=GECbSU z^pC)a@stsDAn+WBqd%foZ}jc~A36b?Fgl12rNA>+LF%2Bv;lZAqODDr?||R;=_toY z_aiXcfbWm#BOY%$=(pQ}FC#tkIICCHBiAxBNi&Btvo~vI?G$Pb0QD;LFNQ@jJ2B(w zX6zXQwt*8y@KSI^AFTtj%r>1$(+DCejt42s#_ZHAG1xXE zrw^JAqfdSs)dGw;G95YF1OH=Sx6})@!EDYFeT@c)Jm;cjh&-+QWJHA?K))ACPh3Tv z3RJsbd^>2T1-K(G%*oB1+RTU@!%;FFYhA=n{FxiJ#}dSu|6m+rZ)V@#hxo%-RFd}X zUxCR6Aioy2whH#W0ye!0Weo~)@<9(9fg3UV4Oo_795o;@P1Zif*k6Gw*DIsDjX2+u zzCPUr>!)A%fzJ^@fa@}|K{JvKM)qTt!D;`!7?yP2K@aQ-`(qsB3e@tT#4aS}dcv=h 
z#ykKC=?B5IlthG@XLfGJ8o9zL0oNtawYf^*s^kMY1!>Re2wyn~aiu$O?g`#{0Q(c6 zt)nq>0g$8zOaAL3*bVX81W9?dv@u=enu#@BVSY{{MVxAZbHjfugtp0F3Sh=`{E;Bd zHD20938gg^LOp*cfz~B5FrPRrjJ8d0&jmLvU{_p)+rSQvM4^4slNoDh5pwOsD9g22 zuaT&?QlKHWzV9GUvMTeMp*^%h`#dharHvkq8Cs<+EdpM-3NnT>|2lbMzIMsW&g|vn zkMW3h$GArykkSl#i=vL!D6;ZekJd?z`Opx02T=;%D?AhvF%cIbVMPL3EXaULsWs zAkqqWaV_cy+w7F~>|?PfX6?~(1mb8E)Oim>Pm>joa`2i_u8t^2DeQw+NQ__POx8(8 z5?+hRFG?_g6b6pIqfA0e_Ex5luhlcP;P`A}Sf}{otE=RIaJZT=1@iYa82W zZyE>bW5ijJ7OaT?EmUlR(aH=2vDX`m{K%B}Be83fG7u(>|Y{=7=#vd^0oZDdxPN;GHFX zN3K=rIlKp^#W`t<;F_gtKQ-DS9Z&LsBz=}1!1Y6l{L+KOFRm_vyTmYRp4bR7{A~w% zG`*QCjd(JDKY;g5Tz9#Oh^EC(MIRC$IRodV^;2840eT3r6`W}Sj5)LbVwRU71JUB@ zD>h2&lqeKROG28YY6{Q!`x&~Da6OR-B^syiQRg}{=N5i+FIp#m`a37Qlchv*M@tzb zu4ntN+!gRzn}v4`^a8H1yr*Earbq8jN3SMOmvxU3Ozf4{f8xo)FLl6s7M)q|Tg2X_ zmeD|Er#_9}(|5%3>C56dv;o0Gv`l17(WuTIQp8^5w?IU8@@zCH7;!6wo+3WNs}kNn z@tT6sfGYt1KEZWc?waJvZx8HBFi}KAcl2EPD{Yw(i*_v6xb$O224bZg)6;n@R;+*Jxq1=} zX;fCMNSpIY^d;Kzdn~+H;57rUs`M&>YcJ&%**S_RO6(>s{Ia)>65R<_;*I2+NHc;c zcVa&h9c4w5SZY>U_7oq{Xrwk|Vv%^6T%YJ_M6}KI3TZIn(kgjv#+b{fD}JOAg@1`) zG~qQnJ%=lfTsKixSz#GT$-mgYu712bWT8*7PCQ7~1V(4^3+haMQ;2sL65j+Tx$ZKn zqk6q)h;y^(FEz<^5BZ?ZjU^BfMb`Kv`$$YxO^d$O);KG@hJ|y8g^7PsXVjV?N9=?K zDbNyhHD^1=@kQyD}rJrYpQG+9fS+vS64-i`C`Ljztq2Q zrd*2*Y1j^KD3gaP4T(m)XA_CU&W)ZqFEt_e7!vE~XQB^dE20a|LyGdNSK@D6r^qEa z6zi0TB*?Nq?M1KM)W_utO=i-$#BS6&$r*VQWMthJI?}_mV1o1~GagXeL`y7K&ctp+ zcCm9~IV5lpI_!j=Y$*qIB3joYK|)5E-lQP>3h%KfK8udQWk6^DFW1Y+>ydN5$c&}kRTY9Qi6)BB@8hgmE z#4Pf~UWyUtmZC_}e#ELWl5r%Hi_NBn{bjTiPVS$97o(#XNn}xsYTI_ zp&`iYRyDzM+Lr8zi1g4a8A>i+O04li`ebq=vnvlGjUX!AiU*Rq#ul*#gQ4m|DVfM2 zaaB;I{}5d*f~cO!o;?*8K|}Zvu0_(!W5p;Nb*&Xd$fs~F_DwVd5wUIYe?dxdk`-E! z*6R_~j{25Zh4Ct}2=QGWQ+m<0$|BKHY6TnV!x}*^n|I}0bSnG`|JfW;ej;T!%%rY7 z$ew-VL%kWRQLV~8(@vxx$B1=`Y@#d1DN|%twPD$-JxjD9U6qE&b8VH#DAvkj@l};m zwLoM9BSDsPWzNiFiI#Fii^^^nj1LN*v7+e=VkO)XT`AeeY~x31WbPFu(Sy+k`>1Y2 zL!u?3BVVHrp<%W%R;AY7nKQ5l`!g08&k&mz>k-7$Y^U*~K173Rp)x;vNMD119&xYK z*iR)AixN$!K13@#Gxny|t@5ZQ1PRviL(2S^h);fmE3?(3bXRO-q(N5ElG#O8J!YGk zDHAVBC`D?GO$a8!hs-I6rwOGusy~@oWTJ$MfWg7^kR!P#CK0Y4*$AlYQg3FX7Z_=c z4UwYZOnFkO(jsV?UP4v-P&!%?YgJOww2V`l(rUECb3skU%UGetG1>mAwkMhq9E1 zqv9H0Iv@#JA zx}>aM;Usg*`S>yXnBKybv3wq}RX#Fz=HCj16QQTAJQE(xR%KG%Wa7jAh8H>Fd}`5$ z%SZbBk}BsA>a=S!BdH$x;i!Y6q}LHfREerwWhDCOUk2F14{2~~D04QAuIA3w)R7C} zFk32_UFl`+dCZT@$9^&&_d-Kv=ec~+tR3V|$H`t>4&Mtc)(dyKm3oy=XW+SVA;)at zSosQlq9>Y?b8gwgc!1RC97bc=)Qx-`nT-I?NR=NMCCA+AY@9{=$evrFBm5W|nWIe) zJ<4p&q;9NG=GEDRbKROb4?W7HuhNR7zZVUEPZ~o}#|y8@iR=yXMiMz92Rb8b3`cU5 zh7cR|yThdyBy}F0SEwsR^-7*o+cJ;L%VV(s8G42{(7E0JNxU*ilA`F{_LB%m)=JH zOzwDWEKFzP2sxI%T5t9;j{VHX_$KSjEF!g>bDNEV_EkJgyUvz5gYqT&LwoRCI5V@G z(Hz4bijTIKzCurQBehbabY+D07iz528rkcaj5QqVd|Iz9%8%9`t`DUsGx1Dk5-TwE zI*0ZUn%wG<;aZMNn~qT$W>%>Y3Yo_;Rw*9ZpY)8bz2g=u>G<0NUeRf^84l~iyG1>jK zR?v|%8I@V1y$;_CRzg#@Ixc%2_R|{O6A9rrv&Qt3*@T{2j-md0{R|zB6-~;K(31Ad zZ8p-RZf3|l%cP>TNaxT#(q^deoc&}@ZO!CB>a<<@%2s=uz4p>E!mIEg$DB!!F@3eg z&=G#JW!F|k`cQneJ##N}WYf<+lisW|{A5!2Z}mcjW3uTfmF(JVie?OZ@vEZ_?WgD2 z)J$(Z(srRm+_Xk0NI&hZ^J(wQzM03`pL_Xgjgde(%-)-RtS3!l>(WzcW{;CI!$US5 zIpdZ+C}VcJp=_ucTA8hf)@uK3+%lHy`s}B)4fjG(+LZdCoSOL$KN5;sFC%q4_i`-swU?RwaDLcd*Ae#19CK)I zZIiuhb#9J5+=?=LvIjp(g~w*Jjun0~TiA~u#aho~uUkVYlfJh8UY)jaZ$?Y)|JOfT z3hAx<>6y~cq?L)ZwCl`fpV>dV56`rfv+1$$FK3G5;Uig-IfniD)tNJo%yBlyI;+ej zTT(GJWZvJOowde(6|c+@X5Q>0E)R;kbCOIvj>G{}XalG&Ddtn+a%h_Ka+ lV?EEL_P>o}FKx@5)zIOQoSC{qH9#tvAEBspGh58@{{o1}kAnaJ literal 53810 zcmXV21$-3A)9s#F*CfPnceewAySwY*9PaLNIKkcB4tIC=!=2-hBLYcQ#=F0_`~8{U zPIh;Cdb+EtUcIX7rB%av^@{vWNc)=Y>h>KxDpv#{gyNW14R0nBB9I8uyT_;=7PPSv 
zH!%?p$w8t?9C4wQn^=gG6vI)9lpuvkR$PfE3Rk(zt+@9W|IWYhUp&bDoN)^=cz}Px zH$LtnnMpyk%|WuEPdj?aP72_goFq4iMZcLzSyGf_!F3Z!#vL#Dzx)l?c|1wyDf|O} z#h>sOIKRdFWd4;u<#+jCI1cke{5-#azdQL^ocHm|{5W6CXY+-80$;_q;?|brYyf^R4+u(Ww9Ig0pJ_h$p z!kwXSJMpo6KAtd%_u<3v>dIT-&UU;l&UtwOo}Fjq#dvAF%JAIy}5C4r9;xYIx zC*D=$6?k>r(F=F?z|o3#$1_{un~vze5{~@1BR?;}Yhyf(aHTiKJstDgf%!dzES~d^ zki>V$-9@}4hD4GKd`f}r(s(+q{R7G0N6Qa*|Ba{O+Q0l6|G~fT_xv;3B%_BXklhh} z6q4J|uS4FS(dH*si}P5lQz23WM}Cq(Dv>IrHyK2_kj|tN8AJM!p=2tVK@!O#G7D-m zh72ad$Vf7eOhCIvq&mq%@?w2!;!{`Bopd98aA!+=8bPLn?-`0dx}vXQWByk0@(0L5nHzEy5T~ZtO6~a+~)W_$9aF3Zt6tpoL zdWj_|klB5H5U*|3~H^5V>VLS=E3PxOtXXAwbVE5TW_LSXWf3v&n0^5z_6no6R zvL84COk#w)xDA#Yi^GM3F@xE;g&Ry^7S34&_wo$1`@()PKTBsRIDcW!*+-UyPkv_R zI?G_0VDmM3OW5^5V8<*z2eO`y^Af(9Z^vq$XQ&m^>tR$rX&_G5H%$e@9YbNp>1Zvrq^1P$!L{(KHh^Q<-pF`;UCb-_X@Kngd5x z8iDsEXhm9<=Ehwn>ZK~NQ6Jg`iG@bdJTw;HWkOGceu9PWA=}9kSmZ?5Rc+WqeAzuXVcMb^j z5E^t8T6Gei&*S}N{tkNc4w&^XeDwc&=L~q}kYA_p&XlCWr@jPoU4{qP3Z2}Bdmo{% z`@porVSjoEsI~`Iwhhla22I_Dr_F*loeQh$2doHL?l9={YB;3#_d0nQoBSJ|=^4m#1LQXyGFyceJcC(&hLz@k z532wl*aSA30E@|oarhvKCs@xtd<9>MJ7!|7m*DOVn8y~}c^7N@1ta~5IfrJThLJsh zkw1YqFoT~Jfdy8D9u0<#-+<*j#c`FKgRO3dZOz9>M*s~P!&_9wxeffqP>gsQGJr%M99Y6YNUIW#0%&W- zxB~D}KKLjB$doON2@Rm(A>Th6Pnr$;nT7in;ED4fvn3eW0?2hG`dkA<*h}{PZ}iW} zE8xHptiV3B8i}ztfZTGz4|4EL0+|&AAE=4Bv;>prjH^w-7m5KD1HjQYkjs5&*fmJ* z3B2WTSjjy|>kDx84P^Zlc$x`FRRKP<6c|h-UfFSu1s5hH3DSN6ZTuEyQfFWb$Dng( zp}8qogCyL42fYM<$e|wUfYY>xWClR?12LAq7;A5wdqbABA)V@IQyRGFg*@z#QgN`K zYWS`Qq}C3te#bo_q*w`=EXG=G0$QBHy6wU@(;%;@klGf=XbG+^g8mHxYRmu`h-19cW+r6%F%NrqtMe-Gnz8AvcQ=9~gaor9g6 zg6(Vvn^_FI9Rtf>1^YP%A9MqL>O5rm47lQeZj^z_1q~bv*eT zTCd^DU=NEhuHCSM%^1}myefD`0?xI-JsQCG*TcIu@Vg^{R3Tsc7uv1|YAy?-+zR}i z0|qh$%wiE=3!B`Dag7I}^#-Tu4zAJ=T)qbQdl`7}I35WWP=uERD-wB-1(^s2V8*+2 zmc~B7_kUpjvQ#F64}51onE>8X660wB&)*714c-K7sW*;6U;$(B%+cUgeZZWC@-E>~ z)&)aq1=i9VJf;io>Vj+2fx1h>qrQfj-vdg#gMF)ECRhM+DF}(zhZPONT2F-qj)B*i z0Q;GRb6Z>~3(E*`_g2v6HtBR<`s)vkmFf`1%<{x5}=Zvq-^ z!?6(0>4_)Q1u7MR9CE;d6d?IqXwX;0Hdgq{>_Dfo=z9XRaTl2OA=t|}+&3O0?1y`s zzzR!aJlPR}Mx&Q}7@s-Jn*6}dkI=V|@H_uN>+ZtFPvHCvmi_=1u^MYQ2%fAR{8=xs z;bHKBzk|=N2{dHHSPee zZLk9Hq}yN#E1@6L@oEk>oeg@?37QdN+jYZCb||!eGx{6^U1o0As$>7M=R+00JN%v_LVXA@$lYDG1eongZ*KTL?Mk> zNKu354I<*Xidn5hyAao&h4~zToxXug9zr%7aA!zUSEI#J*wtL1Maa`G$LA&ZW)ULr zy}*jwkY)fr+zQ{E2Oh2zv>;^LKY%sfzyc~mW10b%LlJO2^e_s#-8dXA zW=G#XZg4wBR}3R72pz8jly8rgU4g1SfcB*@uJ6FK$M8J|!m?Tg=_K;WK!AbJlbKk7 zxsc;N%=s$x?mKk(0VMk;R;L*+ik!l4$Rp%K?jVw9!5WtXrc~s`@i_|L$UvcF_KaO& z2iRt|maSwP**2W#uvzRlJBp|z1+hyOUJ!Rpgj`o(ea3-Hgt8eefK4@_Wo4j8A+&1% zO>2qI!@_Gi2VV%EvknxA~T*H0Ue7Y z64-)`{J_}^{ffgK^`ISnG0v5c#A%FqW|)Cw!8Hf&$cu3m!22vf)8@d~Ezq_n;9|GP zd1&LoaE!AeJepnLXt&4*l0mZ4;0+&OP#&RI{dOZIeM)4H9 z;1xK+4y{z`QYz$mK8odWM^h>M^?a$48j{c`wFaW3GSQDrmFTXa2^P;7s#5#vjU~>!#hQScb11edEi3_V|A~R&%^|+D26z< z2+d1H`WR95O4w?DAW{WbKp|L7SFHG?a7=Ro@xftoft&~*e*+l_by1C^0Ve~9piPLF zZHTG!B6c?8T#dG;Rj8Z(B0msaKSFz%L?LcABaYTcURs|1Mmy2IbP%0KhtY9#0Ubb_ z(o!@RqU~hz0)BiPENcj?up4l75IDp-u=-o%3;73eeh7gF0WIQ48s_)~X!r=$dkEg? 
zDm=+k*mZVbY~3(d2(it37$ZX-;3Bx^Q1n|CHd_yU4}o&3ZMl3@pFusV%ohvdp5cklwU|A5{%LQaig&&^;RMSx6MfEDH7 zO?tqV*5MgbaIGm=S7|VclJFM^cvl@GXa-x)6}}_HdqNo42=i?p_7dG7|BB$oR`L}- z_7u2n$V*;^rOkoQo(Y^zgimY^&tDw;(+v%@aSeH>pW!@HGW72hI}7G%M=mN4UMAR8 z1~OCG;1yKp<}S95jbxo!N0z|qv0AJOE5$0ZR;)Rz#2n1SqF6l3#3=seU@&(h5=pMI ztQqSL&v6NCHamDlV?4hX?7j<-;11%X|KRg4!Uybx?u9t&I%we*;KXV8EFHN0COm_K zU@3#)J3_2_HT=d_#O7CVcQUe=WpRYGuO9qHPspklBzPI|^gt@7|MNBo6vk9PM%f?Gk7c+L2f-PHlUSgVVWP7Q;BxK zF$9|4iZ-L|=xiYBA9O8{buz6O!xIfNg|weQj~~eqjC`jOYsPJckH!I5e*> zVt}&nq9w2jp_s2B&RZeLYq0&-_8#s7s&=&`3fYy#U8LN@F4A2C+K}Uc!9QT60Aps)!bwY;5FK?$}AhJ z0E_6uda^%op2k+A&1AIh0n2H{Iyg zXZyh3q1QEN-G}vuKib8%!y}wPM)?_gjaL$LA(||X*-U^}5g` zWiEJ!RJXHcLV761v$GK(6XlCXzD*mH5AJx!WNUTDgiJdor~lq--KRwU{%(_ z;&y`btObu=1Mhnu_<0X`q+`I(zhJK$Al;jIT|n!lIOc_0CSpYfA^z=%XZJ+>FbHEC zixKXEl~2d2tiq}dgx^---9ED0@XF^IW7)7Sg)xW0V6$_8S>wQDCx)~3L(s<#u!xsn zwC51Rp2E0~hU-_K!)SgBW1`^ye`8!5Fw#EgDHqncIDB6L@Y~K{#;gB_G5c`ee$3a$ zivm+SLT6_ndodiD?19L&g>qKo;k_=xPuYNjzX$wS1TMA<`hOko55WJeg+;Bz zy(178mqkPn4Q5sp{INcITts$*51s^fdkoyajd!aN0Zc^O#d!Y|Ic5|5U18c1*s_Qw z(oJ*~eA__SY-w5$42#eY@GTc$)i4p74S3WVn9&*;<{Gpk?pOvNxQrghaeyA9FX?ys zFa1IPrtj!sdXjFYE9g495}tB8`fN>m<2(fHZ49tx2kzL5;{@6!gWVm)`!jS4&NF~U zD{yS03(;;W9ZM(D)pQrVL3h(@^fC=liJFAWLZnazXSW~<0csX90q>lGN|WdZ`h?yC zSKN#FY@z4T|7(ox4!wwRZKA{JXj&h0Xob9;CxZbm0iG19-1Pynwm`O_4s5A3r~Dt_-g_|P z$9UZZJ2rzaq<~9&0^|OK2vKJO_(KppQv=WZ0ffAQIlaX@ky~Mv^^mz5i8&8IMxin6 z-OB$34qriEr@=!0#yrzd87qugP6e88#|U=|Z)BCgWO1 zWPoZz`sJbHjgSopRZ=@a@0LJTgCMPzhz5Fx(Ro8yvqES+8>qbzcmIx2ww6nUc;x@HXhFHgG+ZhaL?)p9z%i4;1Kw7-0-p&J=tz3%Tw^!2GFr#ya45D3;lV z+T=lG_d-$hBXFaK!0k8STq$_}65qWH$F?8v`4bq>doaqI@B};1QzH7AgVze6)Ir1@ zp?G)^yvR@>Pha?#PGI(J;1_CyvjHJK-w?d25SUeQU_>b#Wib9aV6n}^aY0jX*KY7B z-7w=W_`V`qlmr*cfp#U3g|cFIOftAr3~~WxWGjQv-gM}%4}QBi`c1Wvc+Al_|+Xf+1>Mg}%y24l;Q z9DOtBT4!iqM=+cL(B=MUKLD-g!v-I~)}F(nzrunkWM;y=L`W|JoHLZA4effc;&35P zW5!;FP<2~DPREXKVxeCur2GQ*{2Y=<#kjtPBkHS=$x)2$1m<%FsCg1veF2DZBP^?p zV7Z&Zcp5_D?U?;S*hq-WTtJllH%9aXQhkE4J;My2VRWH=DFl8cCu|^;=^hmJY%9PX z7D2aGhcng7VGS#xRcqk|*I^wKq1T;pv;s5l3oYq|_3n!GY=Lj8!&8-oCN+aTH35?k z#l*wm2PeUQE<+#Nkf}KU4sr(Pt$4yj^wkahgfd>O(7r6RtuFL%AT)S3p0pgTLRqC7 z(Epo=B<{j8u7%^?6p{&^w+fCrv^(OB&WO#s!@IWzTPY7;Tm)<-0c<84e7S-sRYsq5bpG^DaE|EOg{Ac!JQUYgqO7uz_?Op`BH_(R% z_~sX?Rnh;mNi%dYKkTCf&JOHndj{`v34N}HRg8w-H-+{$0dlv1uP%mBHAGZU7)ad@ zV_5?|3|02G;>jVl9Mar-VQ%mOn)n?xppd^i4_prQmIzy&hfz<1^@OStb1|1C81W(8 z^#Hp20m$u!mBs;!3WqZit)L^bu!ie_pWDH;}} z_+EfMRD;f>5j$dQI}Pnsx{7Ry1fhj{J-#HLr^DK-M(#sm37HW8|PHizakhF!D<-gQIFQv;tu^{e{$ zTM?0ND7TU;?1h}jcco*Fp=!-J$oLp6`6$j;aK3>x3+iub#A0GOv5h!MoF%RoPlz|fTjE3U8qU|m1NgjOTquqZn~2rK;$pOzLu9z`t#Dr0 zDl8QK5UL0z1TXRq$B}ht2f1aXU$7!Wkzsxg464aLBmY^AWf-@N1x61ezwwX0Lm#C# z(sk{TwnZDL<F-Mk`eVa(C5X(p|qI$L=xmyrLH z?nq~(InosAq(tQ&@+SF?{2h)ihtfn@sq9gbm8@!cb(slaz(Z5@okCUg@cfR0b#+N+z|cI#1oA-c%#C zDq1(~q?W2tJx;H!57YPSFLc2uhp{~|B3N}~49)fZZ7$IKApF2waI28bK0=QEA!Z~I3QLT|ksDdrC}d*`AOfp_Z0lmW z9kJhg`im05BxFIv=Me0|Pvp7+i0xkB+9^bP=MX)ff@a@Do!~O+1#cjq)pRDEg6MNR zGG0{>k4eb4-GVQh0n2O*c9t7hWJ_qc$yiir>06l1ia|dC#FhdwA%n|ko=Y%JMU$BVz#kyh{ zv5**pBa>(oQ-vSGYvH-@NVtK&Z-rD?@eARma8=kQOb|K)7aYPvJiQTQb{Kx!L+--2 z6y-0GUn<4k7&DCQ#xZEbJ#CVfS39J3QvZW4l~mH?^Kzm*82bH(+)VBw_mum|W90Sn zW%;x0RoW>Vm6u8$^-uMdnoXOhrD)^ymwJM6-6#k7+QFEfAYPpb-yV-R{TO{nBf|K! 
zUAQbH39^u1tS$Bv$H9Vki6_NZVv6WAMVV@ux|oKWR+vtk?wG!stmZ1_!RAEsIrDQf zu@tp*wXCFRAZy@ zz_77$Seq5l$xGPtSrk#&JzfxAh{6WEK6f04{UaOHG3ob82ecJaCf!LF*14TuC!vBE6L=xUsSl)7lA)O4k`vP4dnI!aRTaBxnr zWw29lYfuUHlrBi+5BlHe0;yi}j85p7nrrm9>+#wl$A6tMw;z=&EJ0rKP31#cugw zer7&k-fNyrxjWe4=|ppq_Ml60x7 zv{u%XSL%EClE?6zPt+ICx7*4!d5@e&DXkP#6#1dtS1F-p(lRu=VFsdI2ckWMWOkss z*q*I18X1vBEu%0-RnQo1{AaX6p5+igxM)_g}iMi=s`hE207^`s+#E&lVqe!gM8gFdH!g+E*1P#{-u zLGXOAYOq1DX|OMJX-sfW&^Qf}G-5A)m;_BIeGPCDuK*4fair-OjbHG46(*pl7=`CZd1Dpol6FFT9() zZN0_3@m{|t(wpod-YcHPp6;HKo}3<&hkDZ74xDp%8hKWDzIiHm$9NBWUwM|oJg?%=1uia@hKx__&`rN0QaG6sCleW|{} z{$c*t{`!HZfxf{QX`fV2_9&Z_GU_$8xprRDz_}9iPT*br^@_Sh|Dv4+6I-l})0%4q zHHY?(dKIkUp86kTIz!71q`G7rWNY{;a*YOsqNX0^8J3>bJhlV2lJ*VuGLFNJ($4M9 zAI@y9+OB!7ZLW(hpNqI(xOTdlx(+)BIU73jImISV^8J0(Z5!{lt@-08gT##SAC=^Y8`F1b_1R*M$f6|$Klm~ zYS*-_+Mn74tr6JND#$UHo~Tpfsxh49!w%994dP(EQf&z`6j^Ic#^pBKkPy zIF>k8Ip#SQJ3cvLoxeK|J3l##yMA|#aQ)_rbLDZlT<@GSon@R!j!TZ2j*gBIj{Od& zbBOc2GsT(LRnb+z_0`$M`Q1^}G0A?|cFX$K^2+?ulmX_Rib&`N-^V5!WAvfg0Clc% zMJ_Kdk&NK9V5MMmP!2qY7CaAJ3w#Xt0{(y=NDkZx><#n?6oNdL`>XnY_zw66`fB(J z_=@3Z;5+0iJG@K5qSQ0EkU2EpND>w zh5zbc3^m3X`;7a>dn4WO859-VDyTP&!w!ioh^V8GYbZl~^q{Z+xGb3tnMcEOowdf< zD%$Gf)!f$2HU&EM!Isb7)4tfg+5XY)ag=s+a!hiJarAZ6b=VxQ?33*^?Lm0CtF{xi z7dDsucl%a*n!N_DR&+FVd)eT z>pSdQ?fcU=)tBgd;8T1J{D=Hef#m^vaB@(PhDwj5lJa!Fx_>#FeH|<{FkH z%YJJq+dkV{TdK`uPqsa@t+kD}HM0%0O|qS`<+AUy$2h7%K8cQXj+KtF4zJ^iJ<(p? zF4-R2j@n+@y!Q6?d5~p3M}Nl^$79D`$3@2?M?uGI`!QPp%&Cy2m3e?^qj(Qou_hfy zo+0v2WM_u3jg7iJOELa1Qi3_$1jtFiF-o;Up-1-DCsa!@EVy|AOZV=X+=7e`fRH|J>QNarT!a_4^MHs^fjBIi1I zxh$~0a;^rh&aOQ!#bvnsuE(xBt{bjZuEs9GHQ8CxdDfBOm~7u|`vpeOz*5^>&s0RT z3vUSW9PFKu431Yso25ppN0d(B%! z3ycaB4a5h&`S<(h`KS74!OmX#;{)9T>jF0exr0-K7lNt5BG9BuQWklf{8erMEo!Fz zt+v&!Aa<*%kJb-^Y5${t)nDl!arWyGMk`~X@!ZITeH4vQGoFvgCL1a-$B?}phKl1$ z`cZJ2qRrIu-LlGRh1P6=jlBjNB=*+!Gxk`=IEUsK=6vQH;7W3pbKiAGdOV)X?gj4b z?vt*CuI50ePOh`AH?F+y9qugfbQ3*2J+(X*&og&b_fXeKXR0I3{?fL_TGaBxv|l_c zG@~uaAl{d)GM?*p{Wop9npr)lG*(o3s$5Ps$rq)mQZ-2q?hFnJ#s?1s+60;hbpJER zq8SkTx$j@!W?wU39$#T!7vDbLS>I(}im!!#vj2+zD^O`j;C>(nnACs4-jWrlEy-n- zL5O`%E2osNV4HbVk4n@6>R|N&P|B(K)%w~7&8#;BR%hs|jl78TB-R0WnA)hq{*FrL z3gFRD({l4s%T;Tltqqdb(;cafY|awS{LXsL>CPL@pUwo=W>t8)s5#Ybkj@YJxx5DXg$L3@>6z47x)y8}6oM}U?*gwez@^P0Uf#slrI@<6&xR29()t5F1?h- z$}i=dN*mbODOJE1do#S}y;HoMyz{&Rz5Tt0XSnC1`=EQhE5k9s-q{vm zt!Ta?Ruy#YTdd5pu-wLEy_z1-`f0mWL4A!GB!mC0MNHa39t+gY3rw=egQU$tSMV}) z=x=a?%g~(5zHh!{907Q-5&p$EF8fOaW(H0Lo&_w>x8A`skc>n6UHU0iMpV07{v-$G ztl*OE!H|9_QCN?%$XdNtOKH`#%Gw-o%6wo;L-jfOWxa**)+hiCDhU)iMH1-#=cVFmC19u*1 z$Vm4|cV*8JPbO~#Z$+=@-R0@!+3mjS3Id^uIQuy!+K1a_Sl3uqn5&xXVls9Sr6PJI z>?`=&P3EGa=!D>Mo+!5#($QM}V z{|Wv*75=`3ud%PSuf1=J?fh(fEjVhd?{J_O>Qe60XJ%> zEKqhSmlQ#*rcQ(IELEqg!_}$kDpgWjfje%{A`s7RF?wMa*jVI__K>=i3aXIBlw!JV zUSsKDwb_Q+hS_%77TbF`zB!T{ilc+Gk87fP8*HhqyP~_6yS7_&Uw2XWFi5DR+vjTO ze&kN@)b+SMf~UFXhUc>9l4q)Cn5VpFjJtqqsw3Uj$(m*!YpN_Z64ue1WGatmMU4;o zQvIzqT&=BK$H;d9lb=Wxq`Xp9sUFT>gExRYeS)=dG(`lPJ@_H83%Fc55cJRVXTY-` z@G)N{e@CFvT>mfsK-kooz>>hWz|TM*9v5pC=?o*6fcVa5{U4ckoC3sp@f%SlU1A6lKZ z;kMqkBKFpf+s?PZ;WF-jU9Viv;f+qZdbo8KM*FDH=mb!o!%zi z>)wEOx%ZmqtNXj_xO2Z_jlH(bWKA+hn=6>gh}nd?L@}=F-SiWB5o3xTrBU?^@{}i( zgYxs>)WFn0dSH6+68ug#>Aloho-enT4RD!Wl0*6xY%Qfp)1X-ecPXFKN-XA4(n_XRg`x5hVxovBWjYl745 z{LOL1KG1&LR>@Y+`q5n3^pCI}Ik%u78@#`C=_(Rg8K z3{^LFnI}{HvV=9&GZ_`hX47_YHgcb}NdaM_`Lk_=%gj=ryCaFp-thm$1SZZz1gAX~=gQr;MJwvnZR+)9gGmD^A7;3(RB0%U~tl_%`8+ zSjqG|cAu{mP7_(!WtuIdlGDamQbfFHT7m*l=*L>^kYh%NREh zbsyGi810C_7Qxe2*BcoX;BQSvhBjVLHOk=L(ME#4O#fZ~%lM02)ShdPwbt4!Hr|Nk ztMwe%9eYxpYa~*_*RecC8|;)hpigJH$avb&SjUF57JLY653gyWPXz4p)6B-dygQpf z&tl(1eXMArRJlGobKG|<$Fc2}~|RGQmn 
zwnlU05w-W^qN%l+nojGZ_z+YRBCy-~rpXXeSQheuRi*Pu20d!B(=SGXT9JPuSA;@h zC(?>N)-tm{*;OHvIR|^FuhHYQ1a?Z8E5-@y*%Eahto0_(C;TNC)S+CKAL~}}j)-b6 zcd8fF2!5HZ6^06#*$4fpw$sQVb~k0h6E+*Ob=_!*n%)Z2Kpw5v(5o3;gnJf2EUvlL znOX%>(-LW8T0Ny4D{i<TTuz^csI@juC2+&b+#e-)NX-o64EC8YlJn zER~lsJ>YfBc6vgs3!W&LC*zk!KWL2JSl7ksredTWDgh0RyQHO1(^5jHrVe7qd0F#M zaXzVldg2Z}2OViDB9;O*2-qIPF^?EE9P4Jwi{R{)yH>D#+9a}cD zDtcK_rLn3X<-lI7WMR2&o={x9C#gyyVY};=b-mFo zutB;@7no04I&&A>sdZK#QIDmX)k=JFNi|9>EM{?L6TR9*Wr_Z;xrnttEM>EFn~pM1 zG@rL+)2-@pEgHIWz*56;P@g9qREltqW4HaOsh9dJy@|ZdQaN&^W0Jl%I8iCc&swV5 zme5zmBqOu-2k{|CmtZYz910#%PS9u8H?{@*RiSfS7f=3IMm;S0iJ+;ms&W`GZ;_)wKsR}5>H8e0)Hw8_*K(R$8NDtaC30B z^rvveZnphR>#2>TQ&NBoH#fJ=BZv5YZN5K0uVH^+9YRiE18}5K(Y(T(NBn8DB{j81 z`X97}>AdBvQozVZhVo;k&Eizk2>h<5zuHI7CycS=uzQUPic6h_Us%^RuQK1}CAE&q z6>$yu%9fL3hKp3BN7x2BPHRg3GsV(KVVlxS&q|)t58?@$L1*zF%6+{SxrQB*N%RM= zsOQuEHMUyPMC{U4=1U3e6WL;EEtECZXpNPTY_!-_SVoVsXQ+R-(|@3b-gC?XdcA{K08ewP3P@R$t;U51;zwzDJmyq44M-QVS%gJ|m)lBTP&PXSrk621* zYNQy&u{Y$ZX&hDbj{Fh)Ty|PQ>@T)QmEEPUF;>yp)3A=Ur(O3BG&;`B#zb$&o>ywFm1gVVOU8Bf%wu84PS5X;i zj;hsCGE=B06ae2mWbEbnQ5kP2R7Qn44pHP+;R342pHSz!%3F{c@Dy&^lRAVVWTP>i zm0_oNB|`Z}<2-aElQG6WB6)(r~qux;rHN>90 zwy_esWM8mO^aUBqW7$C?2j9U~k!9FP@`k=+Em&y3Yi97eP$fMd;j}t@+dJNeCLw~m zN!kfLOhKWB7$w#sZFwuwoYoUR(0QbpOHj)mjT%^W8Xzy|8ZoP7qIop^hF=q? zWtmD)QG1oBlvs4X%`(JvQ9>BW&n?ZYw~?pKZ>?z=X1-{)S-m!st)ulk^0y&K^)%A!#KFjx{$Y4#s&DT4lSH_Wy#u+=(qta>mu=L+E za-)vX%~#KN)Azm_M4cB6|EvgG1rk#31HPqv33-yxH44Kx8 zs9X0}hTy*fa3h{SqU2K()E9_!?kY$C>ZkPykoPS7#>3zjXm>Fv<~8Loy)i{&X7Q*i zPe;x8gjKZv<{0fbgNk2o$9(59*GKmj&pmg0MDN4hqGzN>M^u{Vbw!Mdm>F3q)3|6` zbkV4l-UQbhYmDg#c457R-lg*5#_#Gbd83pbxSY}R*N>ECDQfDfUnkSDr#sXAX@8{a z=^HY#`s(>J2Omg7l;T=YKgU|4c2J3)!XA?~^qg=~{1QShF+pr0y2UoaIdX-)Hrg6V zh;p}}epN=Tt{jo8$p@rbQblQ*v;%Q!W<+cUP)BSVyzHx)@hGEMV5gKS?~_Z*1LV2N zXyho@Ni=vS5H00|L@sHEw6gjvtSSv=Gzvde$mhV$sV-t zMXjQm4Y8YTH1cYbY(;EwHimsl>DK?OBdtFybu6>ZZ%hSEv&HX1E$Cz__MJV$j>6s0 z-f!4r&{g}4INPi)QHCM%_RBV8S(nQ>knz}$s^17{x^xVY{3|I&?uEMPOF2z0pfpfc zAx0mKda_>?u~RES8>$`C-fDNX>zV|A`CV^gtTi?mbB%RIHvE<*3wAZ;z<$vr)FqN= zJK=y(LPYAq9BaveeCc=lE5|bDeAgED5zj1dwuo~Py(1e${g&xmCSRsknQmqJo~cgs z>FDT~W-*ImJh4w>pT=#9FOvCg{L`4{Ubp?8*p6Ps4(#7p7VU4Taqx98PO2A}o_->A zaB`*OMJXG9j!c`GzBD7h|5)HyaDdcC&aXUHYO1ZZO~!umT1+tyu_T)JnO>PDqCTH$ zS}jfx<_Xgf*(QqbX?Nb*P_;SgIAx67MJgHG;Cr67_t&jo*V6K)&r2)ttIp4Lze;B; z_xB8p_gC@{4m6TVC|0$pdL1#mp_=ub`c3_@Udxz_9d6WUfxQ{|Sh}&sc&MlAw~U?m zz3@ENk`)BXuc9S|`a&I4Z914@EuF3Jtxat8?L8f%oV$?WD(HFSY3E&ptmiaux_4E? zlZdhrJ5UoY8qq#tc0`YewceASrpT#NPd>K`71LqP%npa`ta+{I5+0M;{3|lWepF=r z>KXO9qGMOecWHqXBOM9$K%Td?lpvp$TPo$$mcYC5*ljXWpNV|I2P2AIHmt}mWHFZM zjkK!j8feKmWt$=>Po&(~!x1O>g5LsW)JvlS1p>tbtpalb{{(gi3rgLkJyKrTj2$m$ zmH(7>>hIb-eHtoyHSn9VlptoesSh;S@LE5nPpIBQ(UcRZ)Ci?rERCU zfvZGimXu)9 z*re#>k}0!N#CJ?n0Qyxn@> zV>_=S;%tOBG9q$IM7;>r`^GC{hd^a-M~}~K@kD#tqq;KRJT(2X+RkqzV;;a^fJ(P<1EVgCbqPXnwSL3^6ew4XcmiAfBW!B>!$NbFnI#TtN zb2hVxw3^}2>MN1bU;eQfv(jgzuTHO?PSQI3+?)D7_0OM`f3-`ilu^+?Gk8U=tDQ8i z^Gw)PcnCWR_Xu;u^I})zv|6FI{2h333rzS6dBb$t-H)i=TjBIwTRBfb_VSV194uZM$Sw9AjOn?t0!v5sxBz zM9Pu#qAo<;kD3-$JgQ<;{ip^}MWbFuzKDDnxh=9;M-zCTr7tR!GxUoCy4k%wJItz#E<;^rjlP?yP%U$hp(Z(GgSDV~gsZ4c~%9*ljU zSCO?HDrPc0GkL&+mf7MRU!5)8w>&4kpCZ~uxuZA4G>e@YyC$|)T)FsD@onSn@rUD5 z;yT2Si?1CwDSAa@eNO}DS?gSJ65poPl5N2`{@1=M83`Hrfdb3Y^Q4bXpPKGT|C}}? 
z{e8v@|FPf;`INd=Cp-t@&+XXP{2cq6XJE&(ARZGs3bDdb`~o@#TzCwF)oOR-0zt2T zb;h6R&C>!uAN)M>D>*G8BQ@hcpC@1nJ_dq*lyb@2hLID$8L5hWuBWkU z692agGQ`J?zD9fS#pk+&D&io<*#_+1?uTDhl%-jP%woK0syW%x-4?LlaOUtl@P3V0 z6}dF3Z>FD_6!^jcnZ{)DL~nx~JdNHJeLp%|Oy-z}(fOlWWZDwtipm~Qz!hOTU}`S( z>AYX}--De`^xyK&MulclKn`w@8>>&X0!BTi@S}8{Fi1Fn z-z)CI|HPM>uVrzpF|ss+^(NXrrHVXP`VlM{=V{+dJ;GJz z80oQ8LcT74Rs>jjL%opk2)U&bd^Y}f0tI{iL%Y{=@`=d8-ZTz@OIO7IM>lTI*Ex3sPrUb|w^!uzs0W#fM6+mb?8n&BaWmtx#8-%S$Nw4E zInEuI6x%6!p?A4swG^$O`e-N{1-{zl3pTXq;ElBp;T6O%D=GcGvy7+7qu+1&J&FKY#jeZj$*I* z01|~bDm!~+*o@~|Nv*5aR$pnXVs){z^&xQyt;O1=rsh7V-qg4KXaC|<+#|g`A`VA- zGmVaJ5fg~9#lDLXV_U?|h^-N8iS@J_jokCTXA`AZhznZ1; zl-sG9e>O-}Qy2d-r5{Xhm{Hc}^0my^m%)8;!3)wDrK~nt|H^E_Pt!SbCG!r`3oy;r zrZ>V5WS>(|{m*T5*I%hQm7l@o{;s}2#*_4^zp|t{Q%F^>4l&$J&WF7_>slf8RSm|H*@0AA%KmM%#CjYMU~(ZQFKxYumQn-r8FXU^GJ>s#1suSGV3vh4C1MKtvacYl|!>!xcj5vWVq zWAfNE#nQ=k-u}vQ#cQ$m3ZLJ;+Y+@(a?*cu(u>J5rFfsRTR_{?Yf=ZMK9YKN>dmPu zq;49}Gu76Vzf%lO@grGqlKH-m9gnQVP2LIjV%|m^`ja$tRLHK7y&(}Hnf}ZTdmmmp z@>bNn==((ft&53|9qbH_&xoa)%CyZ=0llI%(V8#4j(a8b`ebitBO=ZRkbMB5mli=lE zgE!4&J8N&?7*6c+P_K?&;~bUkq1L6A!sY{>itgZqR`Kba&136FUykS;UM1{$=-uCC zf35m8<@fB6(V@%#3<;ke`6Bvz?CrSc&ZqIApwxC(4)nDq>{vSMnPV!Crzwj4+1ZH4 zFJum54_juB+dUR+uYm`5hxxAMgS9j3*!R31c&GNQ>zlIKd9 zKGkvPID0_NfTjWE0~Vy(kg{Kjk;&61E9BoOQ97UF_6?Suo*@ZSXdFb!`D>}PE&tm!P- zOoQCHTm|s>JK`dv8%0%(I2HEqpKYPjLjL=mI%IIj+0Zs&m%>X#_KtcKy*Tzy+-qk; zte0j)SPgZhB?hJ~IT&tu`r+wynrgAn=O!pU3BS`;_DfG8E5sXKj9H$&?TeKOQYADX&y@*as| zywcjfxCg{`3(p=B`m@{5RX@A_EcP?+ug)R8{{)96il`F#A5j{8qngI_ic5~=>UK{w zcd$3}@nQY8P-6cipIFacpJ;+_I`66W&enV8G^TFuGeo2fa&|?tpC3CoW@c0@5sPu* zZ^M@VDH2*bWOB&V(D=~U(7AuIhv$xb88s*7d+avnA+xbID$BtE^o9}y)!!(n%vvNyKxMYkGj$!RWV zO6NJ?N|CV0IUaw-tk@*6r(@Oxho`8Y@jD*=uUi`C*n)wC8B(WMYoNq5W6$>0P#BWi0a+#TtuL&SSQYfoEK`yu;tdw+-1F$Z5%XYX>%Z@BMHzluZ=6iqyw zh^2LjZu({N8{xax=Y;nuud-fe92JNpoonxG?`qF&kFW*Ivll$dojpe(aT~s}WbiJA`Kme-PF# z+!?VcYF6~Vm|}6U_`;fD>*ZlR@K8cCydL8{8_Z3tv8<-M>~+2Rvd^T5qk#R1b+9GM zRKc^GwZ|Yvqb$)>&*L)3t%@xkTRJ9V^qk1d5pBXl!!Cp^2(uCUI6HE2bc@*VxS8>p zT;tre(9D05Luoe4fgRmx$!ZiqkK>7$31BC4UhudPyH+q2lnpo=&b^#W(rm@zx`dGu+7u3u0oJ?1ZO#Ysv?jo+12{q&A6GOBoZZh#3<>Io$ z<&EO@aP|B*M=QSQ%9QP69bS1Ip# z-d;Yvd{Ps;ddWA5UlYF-#795#TR`0Sd*4&OhkQ5t*7vRJJB&!+mcIRbFZ!mV_m7Dm z-{)J~=a<(qN1%R2CT3Q2_Nb+iGl>Cu99c8!W>nMY zFVQoJ{BgwAiXBH>(Yx5xai`;YI6pc!#cxa)<0?vyiSg{p_9Ev}HFk6UU{00KHyG{7 zi=FX9*$q{ROeF#A;fcl@zMGwkb8PXp5%y?%bH@V5dq*K8{v!6E9AP)lDzD34!SL{L zuf|>v9D^Jg97pYG?Q?DAY#D7H?42Lh_ttCHS=L0(mFeTwbLV4B|kGMkcw|J4AD=pdSau1NKR*%MG_@v~RaHuywU{x3#ksvE}D^ zsqL0+zpbaOw{4|u8?Oy*Ep07`C75NKVjEzqV9RCuYHepti*5JY{FZ(6jXXiD5w*j& zvmI<~n$VSt!JB=~C}$3&&w=>l>^5qZFeG6adcuBo zYCR{O?{I>ztAMM#s|UNBy12T#_Oi?9qw9q$il~j1c*D=S6R~Go^#zuaA48c|etAk_ z^`3=4qfFAIN~2FjnX;N)WXsA$o~-gjM>HU7Odcc0PG@%NZe_hJsU!AAX3|%uG}AyFJGvKbbwMH;H&SPxjL$WMfdy&W-q|-ox9)xXVPwcQ@?{Vh=)J zasa5hMoaj#H#rgpv%Bs#Jz7J*X5!DN&Q9M36HKClfmid zqCSMVf_0VM$2-b9rFSMr4tr1A0n1&_GFN_AG1rrX*9je64ct+zpH5}}`c`vW%X`)i zAG14m9eeBBuotz3r#W<2&HTYsj1^dx+nIq}jF*1W3U|!mjIc6sNwdWJ_Ihjm$KRB<2`5bGfS&6|Cm# zMlOtum{u#OJ1%5P+1jdZX5Hvli0BP@M{P9q?{n`|0BN@+Nw_9Ti{38!&w*M2xa#Tg=(&m zLZm3T5yLF4BpNB5=`9lSF0|htx?Vt@d__6zQ0t=Qx_!Sfz6ri77)VkXBi&MQa+PHFD24}KfUm?>MLgQz&w zjpV>mguLlOn9 z3^J_)gVInJ<1_oxOOYdR9jLj7oO!3Xc53oijHB(!KCzRrSOnf2~zP@cv5-qNfA?gE!p4z1^atp7iM%`u7}aJ_ zXCEWa%2h@x6j+`?TB zf+cZK*+4QDtN}%?A}co&nXA_I@3CZ>Vxp;-0JXIL| zZD15$c3 zzt(3wij&=;9_={@wiO~1ZhtZ?j6|AbGCw1OPBF5FeSiX=u+JemGO!2!lO4?WeB??6 zB%$)iC{uL}uuJts0=Z5g|Leqk(?ca*oKXh4>_tybf|_gvVC_1NOe^of!QHfBEXO!T zq$`AFq!LTnh&L#PgL1w0;RaZbjqh zO+L61{5lzoyTh-caOex>Zv`4|1M>Em`G36IAMQ=dwF;6u%HjD+9?#|ESf0e1*dpG0 
zh`rJO4}oLK7&nAGQ#r{a;sbqubEiOaDa=uhyL1EB%793Xj7(HL!Pe!V*jKEiXy*GZ z+R%3NpDM`t{ItCR$h`?laKcsDk>qhulrmRW0tu2L(R09ggBhW{NP;)eS{!+2o`J;s z!I6Dj_X+(;1Rtt~hBBJpGe%`Pc{19QWos7g-N*S(M#M~IrvuQf^uPwRG%NW-i|}p$ zd_R_69VN=`GFHt|IBP9!+`(~_$9Ap}%${mx(p2uigJcm=R^ALyS!HysHsqJ;0%8}1 znrx<{oHd&fZwC&~1@E3PSN`BjF8D|_N^;YVKlJ1Rp3Bq5PVJ}UW>q!4UG(WOe4#oB z$)Mqy(9wKo;1c$gD6}pAPfcDfRrvEohE{;~&#kE{TG>U-;-(Q z2rE2?pwXLPdJtR|2uh5Hj)%aXjk3DN2hEI z0%m1iZZQ&*LAuV2*Z}0~3^cZ5wEYU-pP;8pdEA9Ae|btUDl_4p@o+$A+L?&_@*nb%Iu)c zv*<;8FfTpQ)SD6gM7(!#kfS~LA&zSRZe-+_DDeCnSqqh|EP!4XhMRI5(%p;S?lRg> z$Oj{y%Zya*%q$M%KE3F3WA3digAb6UH@HfSrwUgd#u%z@)P5w;DQd&~hh*A^++NIF zHD|{1Az77qKY;lzM2o6`06l0`bF#?|LT>d!ewCyJex{qy(ndxu2uirhm?T5@Qr7y6 zMrOA^$ZqkmsxO9tba7y{+4LMeK{@R{@l7Z+ZD*X6-?|3(D$h6<=DI$_>TO1DcSJuZ zi#)9j7mk9{4|A=b^z{w$`~q~hnc3J+-(E02%B&X8tXmkdU(nwR=wLtepxTB-7}Xw( z{75K4+3*TN3rW!h{ouH|*c9W@e6OHkJOrQ5GBYEY`#NCkAMW*;QB$q7Oi13AoLdV@ zQk8&=jCD5Tzee*kwCX|+jYJDQj+W4c>n5iie#}TlC`VZjQ^8k1;Em&GR6Vg5vvXv{ zPO5=+-X9(D7^rrayM1MpQbRo@psTJ>$SUm49n9ArW@sXV&Z zk(IUr<8~j6Q3X8J?r6(%Z){Oji_*BU2F_^vFsuIjw*x#s2p*bDRJ1wP4|Ftk$VKCW3nV`1UC@A4|L4XtJ4@%PO=#Ke9q~ z+{!Z6eq(5&)uUdOJuMhxU! z@}JECQHOE=Wn5`5b2|nrv>`_+;u3RVLuX_}%OfLtb5;`=w*0Vhn6(=BgI9f%{$nS>GGfpd4+gQ;~{U zjxl&%?KujfKVS?p(C6yVNl$oY0CP5j%xS|7nffohzYw{39xA(m&2bP}yC0k5Ec$F> z@`e}2H&K$+pNwWVI>=ULza~_c2b@d}HKt@1vO-l6wBZLxcAQa|OAZayk2%d8yobxI z%&iX@23_>|4jR%KT78{gzBA^TXmtr@CXc~ye&_&h_~a>6_kkk@{KyMmSAquW8Ys~n zRNVsgZ3DmefLYt<$tY&5wlTZa83AS9R5eY>jt)qYqRezSym=n!I*MZ?QcpD)XBe|K z4@#WR3{2yj0nCE3M0Y^nR;|Vfcu^W5aZ8YSuMFDg2+qBXjK~PDR^*=Tjgd$SZro<1 zcOd~cqg}6~-;bc}htSI_difAkQ3W_*OE_2%N3S!2W_Bp+Ht4Yxv{$}Bwdb)s?o#UEsX~8%8aEf#Kf@7Jew_?Jcc3|DrHmG?M^miBfQ^hsq2>eMd^v}$c z5H6JezlIlH^ZE_m`NnT{xKcTZvolZGq0BePIc10K#L<|xcHurVko)@$j(iU-K4%2p z&<6Z$hLpCZy zc+l0ip$kZ|Hl;l+3=3%qNOuBRzm3N&xZ^cDdaQUea*|uJ1U@NU+mA;C7SkK7bY7Nq*psq!R zZn+3;ZwgvQL$XG;!AsU0O7CsxfWpr4yz7C7r~?*tarD&$w9glK06(C?TJUpc;~({y z(RT1eO|0vBd~Swb+ybq%HFI4XdddpgrRB^3yb4*N*QR{m2~^%qYtKS=2SKf4Q2H)I z4ocg53=OGHT~_Evl@zMbd)fFk!RbP^AtsAq)5vOa99@@ajEy z@CEquiJAHZ<;x!t3|B2D*XJ-ss5^IP#yB;lUmcxZ(kvEa}jS z(J4u8X;oS)O4ee{%51KsN^}R|)8)v)zKmrya(lkS$~|f1*bk!W(MGJ()8w{(;r3!I zGc&W|`}T}=1Mb<2yO+YI&&N@K>-J#wCUL)QaQP8NHwd5I9Ok(IHiUe%s^|8etaQpt zXU3=D$M-gDku=x=*|8UDK@nA$y;|Um?31VDgL;ZZ{eeiyL?A?RdQlJUpaSPr=ggwm zC)sIrGG0q^)Q1|nGb$Zvi8B3@bf*jpnbWPvwDi)@7)oCm+}N3-9D zcC(LMwac(1WpB1d-)IfJ^fPQw)p+cWrlOq6E#cQf@M%igAzeF^K3_n>oq{^|K@nF$ z(hzje)KHmfyOn^`TO0DM8i-ehPr|V&Trm*28;rJh6{OqGoo5?Z)D4-RjTVy}$*|tP zAW@z}ldsWI)6)AgXj_B$_aZdJAT%;%?Og#P&O%D=0K3kChj;LPeIs7uCpL|$!aPC4 z-ov4aPEFymwD8wQuB-ZR@8BpeH08v`wc9XCBhbNA({dC@JBR1-=yii>@e2F`C&1KD zJoFZFr21hYg@dvo-03`5--JJC3jgkfHmK^cb2)ztXn2)gel`&L6&})C#BzgBs*fH* z??f9m#>mDsQ@|BP=ucz%)`nvoS5_wF`MgtY#g)9%F_GCB&)g2ay$O=7 z7`&H*Q8Clfq_jN~S1ktOSLDj=Xv0A0YAt+s8hTYlquWr)8En8HsCuh$H(7!UIcF8|H^mbKd4dJtR z{4p}?FfEyiCMrMtP~){12-_L0ZU6|WdYh_)SPY3>1c{p-MEAj`?@M;UoJa`OTO7s} zcG0)*@O5tRq81d@0qLiTuFYwkaIzW?Rq0J@aDfFp_z5DsLn6MRCHo9La}J&f`TbQ> zuPpMX1oGk^MrIK0TZHsofu!Bb=qS7O3?yGg+EsxzWJKZw(4J(-o5aW$7gZ#_lc(~j z`xz20nByYH5pp#hYH$6+_W|lF&ku5HWUtzITfCb&Zgjosmxcu*_CsyWKG`E{ZFv4!}#s|2<>#fU&;LaT9)h z&bQZjearP@(HOK|krVDK0}83~cM{}HIR8FRZ&kZEknxeH;T3wCjqxiCzqDXPDw!w8;|Jaak z@;*zh{;lS@hx_lR9m@>)zrh%@i=2CnvtRHpWzavt)ztUPM*n1snCXptKcWDkPBRYG z0I$X;p?*W&;KNwW`R1YAgme)6G4a}$dUp0Kqi79`{C@pNZk`itOv-fXUM`Q_&=WF zZT^Z6+i6O~3a~1er{FaaYc46N&K7RSlE--Bo+9NgVWGa|`6P1XIuB|5AJEL-pz*&m zKHUH@l#6&Wh%*r6mG@{i=(U}DpXRRDK#~1e8#}So46@x_fy0E`T_@z?%b{ zvxXK2f-cKImKBW2Vy?2BevIOZeUKKNc-IeoViWy&k6ctewh-iR7>~!?Cm4iN_Twi! 
zKLO3ok~#c{fm4$iyOI1`m8yp`2R)dHPRvwS#ns55^TPtw6CDN=hGf@iN zLO#C$L+8n9*r>9H>l#{h4d$s7)T?^P={YA2bnJ(n`3Jgv2(3sb-DP;LR8v4Y`Vm7% z-wrp=0}IE)<^Q5l2cg^ShMsOiUGW@tc=0QA{S%IooqGccdjg#)KmKY1F`A=URzeFd z1{%qpE(e0tM2F0S2ALH*!Oi+%6uRkKYDxv8^RzvrUho{|! zl^BCe`H01HkvTjDdK?EuZZM;=IrbQjqs;Xtjv$a@H;BCvY+Mh@&&8)OiFsB91KAKW zX~lVP<0~3bE)cCYJl_;l7zW=?fZ~_niCK?zd;z|Did}sLUU|g(hrByw_;Am_(MJrP zQf-q!R*xp}@9}7SyWtJ_oEO4lLlBz~) zC7!FI^GUl`4bqfIVG^q{D)~6CJsdL_{ultywBi4%z&sXBWg7R}giaw}mwe1eL5f3Y zGkee~_QLf?>Bm)2-~#JSr{Mmb=n+CM)xulNS*Or59`m>kx9;Ow>-b)EoOc;LJY#4g zssONybJxSmYe0l`Xd?6J;}+xIYq{oO?k0^%H8f^}9e=Z`bO+U2F|!4baVhv;T0@u2&;K(s&Z-URK)XqcS1}V0X_m!7k&X2&7J!3x`4Zh;x2Kg%gobl?sb;gJIbi8XKYXK+d;;E z3)p>_89M~Buj0y+7$Mnls#nq+3Le5}j7Rd!LJ~~k%6&jw$pqErnZRA<(Ny^#??|7AP|9vvv;`Y#J|ocA z!1DIYN&$G+1^PaLOC_%_A+4XGi9AIod5k7<8qQQ@kTtv>0C6wy%}&GrF~QJ9n}Vke z!JvxxVT4nC(Q_JsPeQ1kJUW9&oef)h1{l8CK=f1S1CP;O@1Z|P!;z=sH5i@|K9YQp zm7EOyCI*>s0li`q_uR|fgE&s2<6P#uL*T^;aO5z0*>yZ5iubv|=UwQTbI=A=6Sg@T zTzL?*04SdeZLv7-GJ^S~@K~r8eRVK%5E|JO`nQ&2J4XoP(eX>nf=w;vL2EPe@9h@Jrgj>jLW`d8{r1^Q4E??_?8r=N@emu1we za^3vM3L$t79sx)XHBh3gjK ztW?M#kK1n4)Qbjd{d{#x&w_rGHjM%wCP`@$rfh$7_+H4y~^C) z=NnaST8)HJm37swnE?fD<}A@bFthsu4Luf%$N?thLQ{5t^{Jpwe*>dZ!H24HAB`u( zjV<;S?)im`jOH+5#jA$XeR`*wHZPEf-#ITOdTJUFG9O1-v?Z+-sNT;|5MclaGYZ^U z4Cc%O_hl!~f@7zGKeEpUf)cI4kPb+pT5xZ6+T=@n649P~$fHt*_FSKH`hYb(IlnWg zpqg4u(XHFV`MvloU9J~*7=}ce%atc_{i#TSt=xAjn06fW+X7-}C04Z{XVQv8$fix8 z*)e`ShIHA_V+Ar~8M5U*+VNW=UXq{-r^Az-gY}vU#D$cn(qSDQRjK39oSKgH$*ED1 z?=zd7hA#RP>&jF2^H|+hF_-TC)PooW^LaWaQR?;i~$51Ksu$y892>c#Z!&V-q3qg+4m~;-BmE{F#k9X4=e#?cR_Pk3{+F4v2&pC z6RvrkcRRV_0)vu=z*X(gT}wghZNV?q1TVp-jM%mTtfXWI1G4fdH_z$u_~yXk_4b5O z#U<7q#|ld}@TWEi&<6hNjgP(`|F6!G*Wl+|pn#$$+JNcp;M6wg$wk41++asqu9^)L zD+G#EL9Z+g78T=NWjuG*$K!Ifm- z)EmRkdxvZd_lSu4L)F9=yoP~ZAJJ8#z@Zp-YH+ijfsfMv*Dz8u;qC6Uurb#x0BRLO z_LK$FnsL8w^s5c6%ENf11*h_JR08Y9!R3-+M?mU}%*YwW_blI>0}YO#r{2MuxXTRe z2EFIem-dWqd;T{Tzrr>{3T@}#TH`v#T&iy3HUsOg!r||riU_pb-&o%XSl4Wk;7A4S zbcd#va7;y;mZl{uQZIC-c2MdBbbx`J)ea7=gRY~Res(A=8Vrdf4n*EU`Eyi< zErJ#Na1iDXI(B0CJsrN1yijIQ^r3tpiVtgmQOHpn7?lJ>@Wl(Hip0`dE1)an;aorT zbO&qB0sJQm+ErTq=Z7{Q3iAE{WrbTYV3d$ol{t4onZv-@=Egcg4JfT3Bc6$I&CZC| zWd>?8+9N>!w#-R8JX9MP!IN;pEATG`k~|AonTC7XXtC-X`?8j#`eje?Y&-=m-=b+e zLwakq&yO?H(dN2_y*U(wk|jG5&g=!A_2cts^rdNx~+g2@;WUwWGr^iE)I#K-cs-NNd81P#a{QS?(Vbf9XdR@&~xztV!R z1sGi~^pQBm3Y#4>IGuUm7sJ4l|7?o#eQu1`b+*1_(} z1I-M;Z!o}ogX-8_EC(#REf&j8(=H^S4;p@UeAerrgwg1i!$4`RY@5lJvKt!B%~j8X zYU%jw_S6K^E}~t|1pl?ZJQ8`)oTDCxR+6ftLsvjQOpczM5=>G(#4k|h2j=nvqb-Rl zKdov@D}o~uYEgwl$&Sv9Uw6Y|=>~;LHywjS>QAe7g0eqYsr!tam^^?V7N{6NE59R-ZX4(iZ*%4+pHx`s6T?3xW(9U9f*8Rpq z|AU#&)%1QXuTyA;{DM{ay(w2v1eU76cY;T{(60KtufcU{@!T1CHUwHxy}V{ndO_$% zaV0;Y=HKL`_d&aLL(g7(n;y!KW-Jto;RDU9=B8@vsm7nQae0BObH5JUrv+{5%P7b< z|F`m_>VPW-Ei=?v?EgNvVo2{&h78UEol6G$;j=VxA19ouRn`#b%*;C{t$9Y99-#|p zh2t}A_`~R1@xI35-4x2-g7(freX6r4B)W_~bOo$A0m|$KyY@qQn?M~!f2w}!Y0kNg zPVvL&sUpYR^w<|JQ&q=&(0K;vKOKnioY7Cfc6Kl>@1SBQ-ao|~vGI>Fa6k=YAp<#L z_}5MV-hhs%255ULMQ1C(Sd4qQc;qEg$ZqR0}0DYka@nBt6e2!%qv!U>*I8+*o zsz?unrbjYMit`u{*RzS7t?lX!f(evgZ@dmPHS%pLO zSu=uU@}2Yqvj(F31#-?##`=okh09)xoFXH<`sT0ew={iH|djnlG)sC`xD<|NNK#k49z=;&~MM@id-QPgZeB z^6fMNWfpL)$$USRtLo0Oe}b@1_R>4*HbK&YKrfrFI{kqU%|z>J z&aXjbXX(=h`nI0C9>6BJ&3GN*{&%3Wr_e^YaVJIR=w4YTeye11KHvA+3kz2SRSs>3eP6``iEHeGW5TrfzXPRYXl0X1kqD4 zyD$Kt8az3D7KF;ilr09VLA#YqK=BN*2GMwj0XcYbUwY~HtPhQsp*D2E%v)h!$ zbJaZ-j+)QxdOY&E*tnLkzc7>}IaH1J)wrH2!wzO1x?xGJXU-4fNjeABL>N&&X{c;p z-P{VhrxFo9>7js>SVxNWDT)o00=b!)s(_jJB?)on5o8$7j@{IPShwofNx94~v1ksN zGE-BwsCf_a_=G7FI}d(BUAv%-lf3SOE*BX7oH1BEN6-%z(1u5NejZ|3y{8>t@Bn>g 
z<^2v;(q&^+;VRy$6a4cSy2N3wq8NjfNMHHpCPL-Ck#&kU)w*E`Xr>UjYeP~cg4VPq z&6Zj?UtFI8E|kTqsIPV4&0;(lsvBPp)KbKc;$${r1swuYAH($taPd`m^|4_oC*^9H zkerG*O%I;ug})-1-9KRW-`x$0O?!y8DBbG=66dU86DcR{bcsPg zeXDBe*_f$va71C=$1rlb!XK{in_j%(+FDCF2rbWHCW84!E5;Fw&l50Ha%C%;?2Wk>N+Y^z6BpHlT|YS%P}uU8PMP|`?c!A+Y4AtIR|cpQq{x+Mtr4yM^fquJ#GLvOI1v_0W?(FVZ*K_iDC6Eo}tS7|$*fNyN`CcxYq|DRx<8gUY3bRTFzAs z^(1hL32yeMB{`tXQnXC_BeH|X`S{!r%~ZydfN(gq3O)Q1oWXxj{sQ8>0V8k1gNk$w=M15uZ2RZ%=t+8hk>79AckP~(@BA(F_>S4~0--F- zVH$8C()dm>1&a51#qU4B(63PKCp55++~XN`$2aImaR{GjgS75j=&47+?0IO6YZ*^P zncU&r@0_Vf96OjD57j8f?GqFdMXQrS36+`89vq6eS6t6xJcCnNi|EKK%d08B!hC&;o7NVnGED$7p7eC&~{z@EnzRP60c1?k4T*5mh@{92O#l;YC_^CI&ub_!oGzhwvF z5A!=R62GG+?lI~F2eIpLJA3Ch8^6CZzcGK|_(raH3spN3Sv>5icbcsh2igC^X>Ms~X;0nUqI{c^c{Mcf8oG)xm1i$&KYG}a9igerap**$rnHQJqNm+x>JQM` zj_^nRezaK0JI^H}WLLFj-D(T^%47Yf30S-`8LAf6Cg zF>v~n1-{74t8Ac8@T(-Ntd(Fea4ai9p^g&cF-i-}fP ziA3Ix&K=0}V$M}m!*X(AOhntBNM)$iWLi2&^wTMxkF&p?Y}-_zD_Btm30GgM8D0|t)bMF-eAdMeq^d_*=22Qy-r>1t%PoTfeY(c`kPaEF1zM? z)>&e$zbu1I%iP^OYs}@X%gK+E#*!Z{FJr!Bu40*vk1>WS3|{a#RS8`yCkG<(ODogiK~#|w^Y=3qGY3GZHlSHIxpKO7EAtbqj% z@Z@h5@f1{6cYqfr%U3Ys2aj7I#1&e*fjg*@`Z#j74+Z^(f`#qD@>;9}S>WFvU_X@s z@KfGEdTc`q4P?X>SKJ(3yB^-e*68DP(fq3!c19+oSRAu_59ufw9E$Y&g@pTrJbli$ zXYr++GdvMXk=e7MBt^B$_jQYN@8(Ja*(bY%j$;&w*O5vga)t-06Y7 zs9ng~4^#z>{tjG!kt->xa}yYF7+r5Qck{>MDr@LPSpJUU>wPkjKC^~fEjB+}9$OWoruj(n;2ff+`CGDU z{j^!_DeS53S?yWu1?{!$L+rEc>+BoIUH#GSO_rZbj;xM+j!KU5j@*tM)Gu%5Xy>Tq zsN<;UNaINBa5!uZN*UVU*kkt+;Xr^zerD z#2~1kCn(^=ZrO?+)`s=4#AKb>1^Vp=xa&JlN22t~2#Fy2h1>%NCa}t_fMqy2iDj%6TVVdvL~X>Q^Pi zrE=DA4s%YYRqM&qea5NE-KO}|97W@MlYuEP{(SrcDhfx%TM{zT^Lh!58HK^r^qkJf zTugYE@IB#Af}8r)ejFLd0@IXgaU)!F$i=>bTOgq-C4$j01_>S-HX zL9X58_&Y~sgGba-JIwD(sVh6!HOe*IHJNjFxsLJuGvv%mB+WDOavh->-A%F_WpWom zax_Cm^mli1cV;%*kUgj=>n-h&9aE8h$B2J05x0hR_t}6^{x{c_`v;@z4G43lQKr60d+L^ z*BB)AD5Up5Jgq^{TOgLQR(9ok9n5vRVpV8WvOZB14Op#|SF8kAPwoj}f6EUe4q1Dg zLdi4o+wH^JYD@G0WqKLOl}Cd>v!UTpJSOoxh+kT;j+&QV_;})|7aqzZiX)W~&8U^% z0H{3--`pvz>{aeDj8RwjFy!iRo?9b%TSEKgk>yQz-;kp*V>lZf-~{oHAz0+nw$qQjj?h(m&1npv^dz4|L|I3{gA(v-U!+Z|cT8qwcm=;{Y zrhm@apYa*!T8~-ny+~x_AvBN)%wK<6Bh8|rJGVQO%xll7oqLa&-U2`V$LAleNDh-b zvAZzs9f%(B5A(hhP3J6;nwO0^u`wfQn5#teJ2U-H!hFPg;=D5WngABh)oIeJkbRlCNzNAS&D-c6?;lkigv<6eUp!NI(CKx-L@PBb4q zYA;v2$?=GJQn`*8496{I;uU8+=ib5mM>S=RF*aBEbcq)I zBr-FO5x07%kjc1bglY@ngRKQ6YM*x=j^gwq1Nur1PZn0Rwa=*^^wt%Nb^!0&LJ{@R zY_t|$iX#vG&&-**_@*?z+&UcEG1iZ>M?sZK@H8An#%T}Bee|Dq=sU6?vtxg^#3mbq z^)(S2Y&+)FCvU5w(8?fpiedei<&g$is#QhVf?*um*M0|$>I^b&HJ0{b z>~+O)%5L9@U*!UiEl9Afe7gdUSqQ%b8J4E<4Jh8`1Df%BB=RrxFbjShFMM5E6aQ>{ zrzrL}hUKbk9!b!!Ke}_30bm-2feeL5Y3nt&-+zc|ORvoI?NnLfgV==Nlf+ z_b4M`!@Hm<7+9!s_YOn7wxCtj2<%yFY+RESnZwZ$ZRVHCKBD~6?fO}@bmol zHU2>PSrj9<$M7RAg{qce%gDo`cPkCw$x_Bdv4tP-u0`QFD2Y}!1}lCiy4n*w(AUtZ zk7N0t#y|atRgaH&td)gK8kE)Wl@}s@i2Sc)%wLDbxeT3i1$y;FbioeTt`*S&3lb?5 zgU9<{bSUk3%7%wZexz1>a|mB}JXy88(G#PI1zJsyv^ts?Dk+WiR@?A0G=NS*pfLG1 zs?d@_XrAi~8@iu)pZOyiZHU=q$wc1PhL+Km8I}o_ZWgDaoZyQlEuYD$c}PW2`5vL#-{WBgi&C5z8W_^*B9=!BbEO8+aIg1DmC$ zWiE9<9$0p8onXsZGHcr{r|<}DX1oXCFG$V^x>=jLkDmRI<0h+Q*NM2+4u@B)Slwqv z?-8xAmG!TsSjpPE(T8ZWCQw&RtYz))sLz~q07C}w>tJY3ac7gTr-yK^vMFdcVkz|H z!gz?1;fMSLy}g8b_MrVP#~vF>q)}I-V_VR!I_nHM4SX*{gi#s1s~wP zJ_HIjVcmNS`%rxR8@|>N4ZlXg)gQ4ZUh%EPxI$iNP&ujc6Vc1g;Ne?L_c_hgsv8H<;B0Lq|6KQ)y4y>3?_A$)$&{+KXYzI+NEbD)eBnHN`Ur{T3FBc1YK(fwW#Gg zIam}GrByU#Oic1+Cmia{v)5Gd8d8K%5nCX=a(Fc_>D3ULhF{F;m$X3*$rg$6?p45XK5Fo8E*6i z2MXX9EMUw=5yr0p`c@NWx-Z92WK&D#wJtt`^o)#l*j>Bk#-BFMT?SWGX>rcC)cp>Sy3J)K6w~}xAau;)&FAYbT%Tg6-lD~ zN)6CsE8~OvhiKJWyej%+GFrkKbe_GS^I_Vo460Wdv5WNarg7h|P>QkBMT7v6c%s_vl47h>2ViT|C0 
z29k?;$OXPD22gjr#K;}z?(dNS4On#;4p0BxGpM~6+V9d7+SVFdHaMjWQmiH1-@@QP z?ZPO?x5`%KgLF{*W*GCWENzOgR7_qe!wc0LteT3X*J_4#?(K&AwP$26o@e3Q9paMz z1EuyDEAxlItkYn_F^;R8xeo+Vrm$r^b{lK|_mH@Ek-=BdSRUfTzJn+G9S9tO#*>Wv zAUW|&Rx~pDR$vXI71>nlv4T;EHI5>z1f(It+JsCG#e4RQYwQAngjt(F#1lqz^jeTm z`ACGNlR%cSjD=!YXMxcR!LMQTXDC?KpO_S757@>(m67Z-(cphraR|j*YGD;78F3I= zS82d%N&|CUBf_B$t1T5-Q%Q{nJ_9ij%5|EXn1>2@v1$=7QH_>nXPqK7k3?ipNoJ1c zdLc&q_H(q2U}p3rH;A<#6?Y8d*+cr172*c=1x_lIulu26w&gq`N>B6MAGZTbun0Dwa-CE}-zkN@Qw%#W3tCS)Xg(J{fGlu8W&9S+;JTjZ zM#`Vkib$Ti+@T;=pkj4Ro?lqs%BgW3uCTJh(4So*8Ms1vu2}(JL=~=5mg^{Yi#WBr zu`)f#*fp^ne%lH{ZlYgn(B{|BFYQg2cS6xaZTN>Q)z;`=1Nh%~<5{2T7|}$9=tCjS zkk;0de~f^8r-0v+u>aPA8S6pe10eTS?y!TA+r%BVU_UGezh-mZ8YI9v&I;n}m3)>I z5dSL!iFUBG#P+QY)-)vcqyg_0x!#Fi{$T`)8{U?3w4*dSVtLMK%>Tw96*hABn-0MN8?&e|Hsktc1Dz(?p2+UD#qwmG-Asu@V+W{9D%;M z0F84YSKh@+@Bmsrm3yvdM7pz99mH?@xZ`9b%oJ9K7csiS_;r#YY36}&l40tlqA#vP zzh|H^WzPGEju}GREW~s?Kz2TcLKXe_24r_bvnLFzTN36gyEDIn*WT!?ihat2lrM`c zRwPe8P^c2taS0@LUF39Cw2KBvoTg|DqtG5Y@U8el-xT87&w_@0lYgD z`)fF7>Re^k)SklnpleMm07b)Vf4X7|3J||G1TLR|b|;j%25Kpz$Q!K4H`sVDu^jJX zIX(dkpQ77EvS#B;rq)!TOG%I=GuB!FSeOfIF&*!V7^|R3Sts_Gl!Hl;Sf#NyZRYP> z;|^%3oDZ9j7K*vk&Jg8V`J1U?F_<_V99+jQ<9V<6e(9tO7{@{Ms~^1@4oc1fKi7k# z|AF!+SP$3^f*;3fdcYlmLGJtD{aZeLVdXjrt;j`<6zmXaQ+SNXu%eDtgopYns7$yf;wpl7-GlodO#7}ssg)wQbK zk=5-QtfF*ccTHyyxg(#-ab9Y(7g32$eCfZr>w9qgm&uR#Gza#dM$5(hVvsOq^BY^lmSeVP8r=~`wko60 z1n#+z)-2#!(qpIdZUHi5C9PBh&?Q5s6Nf9($V2R%_Gv~UowVEJ61tzX+G*guGH+A? zf2x9YCDDZH@ToeH&DF4o3xHEqLCFqaS0~1zEzep_>dt8OK@yFn9WxCqno9o_qq3EG zIK)gSwpjU)6z#p8vC}@eod!w=89pmTNy~#LeR-2HJCbx7j}i1wqx>$4>Q<(J($Gb2$SbcirZpm0>@O*U_MFXXh99*`6JgQ{J zrVd6QDps6x=i)AR$C6GZKZ70T(>suDJ6z^g^h^Mg$I|LP%vc{{|Db;SXbnt*nfDoJffJB{e>{`W zS%W~$Pli6WfmY51ALsEr6lqe0d5S|fx&Sq2V8(Kyr6omTWaTJ~7QGO=cM{k>7Q|Er z4CT>yVW9X|^xAje`%gYU22q}HXpi?EEbSfO?@~}#I~d1-(BnbhFisVQIFSO8T z&Qc~1wdA3Zo8ugsy3knpf21=9an)^T=Npj-%55|mc`$^QP2$r8Z1xUFwu<=7v!hie zAhq8z!)MSIb{U@ViO@tpP@_Nmspt<;koFQDz%tTqi&wn<$GHpmRXd!uN~L^gJ)ykm zP}xziQXVMfz_<_ZDk4`=xV!m9z8z_&=eUM;RnG+#$3WTIDVG_IFphPdXx5@)-8Q1V zqtF+hvF3GySdhIu4zgbJ!);vtGHFl&&PBa^NHRRtngOE{R;}?#HXy?kc{0c2zI}&8QDsMn;jvV|eGgOxk1j))L zWy`98F4O@1rU}|hBf|o(54Kd{`(}LK8mZM8+24`5)IQzL=qOqPQIxsj%yV&N*~HSz zv^QK5OkO_O6xTS;gV*xhYfrs)bZP_+Gg2q$*><={c`p_*e{(_L`CyAyXlC+$5xlMV z1La-P+TUd4hjib68L?r=saE(vYk)wC+AqsPdpDc$sAb@P8RoSBxKS9*w-E1&8F*Ki zYn)@0UPBF=xW4vL&*vViLA|Nmd$=KG`x+}sjnV9t#ZmJziN_dZ-d`!9?C_@z1iy)O zew8a~J>u_r05un(LU|#^cp_L|3wD2FA3}Z47}jLxBFTbjbviWkrO;nPq}M2VJ{TF* z8f4YpLG}0+Qo(^G?aRm(WrTAuM_O-63(|Wd75?Vn&CLAeMG~ZDJ`18zC`0StoO*51 zI7g##C}W&r{FTK<$S{a+hx1t;I7JgGx1RDi9Kx#nj25Py#}Vi~pFx|~XlK$VAA%6t z9isPl@NeEkm)i)+El2ko4o%eJzRD_=1J8(dUnQZv+IvzT8W{x&%>kWe5nV6qSUS;N zbk(0kt3+WLQvlBV4bAmEh;oBFoW~=126TJGr_*59I*?-k6sKH1%Ks)WO;xU50V%0G z_^MH(JvQH=Eajro?woM!Rx_>8T39YTI;t|Fj9S_qk&SOs8>s6~Tb08tJ6uqPD=V{7 z5AM~U6~3X2(iFz4KSz6P<$Q3ADkxZ3D^r|Iq+yFFc3Au9GIHHayxN!-6ERWRh4UWi zqx`M%HH!nT!xbms;`LDaa*$vfuN$C#MU~3~f1mbiC)^jLyK>gv;kXPhDZ_)}Y&Y^y z4ISlmQ0~v+aKSR@SH3#sI6MYF9R@M>@%cagCEQuhLwOyx@hVLv5IP?V?~H@WCxNp) zIFvJ>2G*}KN%|8H;6*f;oqfq^;g^E&R8!9Agaj84^n@3bQD7o%U&*yr(>CFlVyV?n zW!oHprZtc=lzFl}M`vu%V%Vgkq3ym_QQP`nY|Ux z7{Cg62Jm_^ICO{}{>!TT5YIelFR7^@_ss)cjAAUJsSH&DI<^KzPrn4XdnDxW>(oyg5m0qXaL{w zOL(E(X;-NmX{#0YEZjpmbX#+$cIZ4^K`+JO)d9aM^Qd6x`Q`8!XveO9pOtcpt-^#G{Mn||8SP+oZc2gq-Z82|tP diff --git a/tools.py b/tools.py index f8298e6..168bb1a 100644 --- a/tools.py +++ b/tools.py @@ -435,7 +435,7 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): start_time = time.time() self.log.info("Start Speaker Diarization: %s" % (start_time)) - if self.maxNrSpeakers == 1: 
+ if self.maxNrSpeakers == 1 or audio.dur < 3: self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) return [[0, audio.dur, 1], [audio.dur, -1, -1]] From 51f562dc3cf27d8e588003058a3ceadbeca1d1d6 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 17 Aug 2020 14:44:43 +0200 Subject: [PATCH 018/172] update readme --- README.md | 46 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 36 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 8e53305..a2540a8 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ This service is mandatory in a LinTO platform stack as the main worker for speec Generally, Automatic Speech Recognition (ASR) is the task of recognition and translation of spoken language into text. Our ASR system takes advantages from the recent advances in machine learning technologies and in particular deep learning ones (TDNN, LSTM, attentation-based architecture). The core of our system consists of two main components: an acoustic model and a decoding graph. A high-performance ASR system relies on an accurate acoustic model as well as a perfect decoding graph. ## Usage -See documentation : [doc.linto.ai](https://doc.linto.ai) +See documentation : [doc.linto.ai](https://doc.linto.ai/#/services/linstt) # Deploy @@ -42,12 +42,12 @@ Or, download the pre-built image from docker-hub: docker pull lintoai/linto-platform-stt-standalone-worker:latest ``` -NB: You must install docker on your machine. +NOTE: You must install docker on your machine. ## Configuration -The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [Downloads](https://doc.linto.ai/#/services/linstt_download). +The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [linstt download](services/linstt_download). -### Outside LinTO-Platform-STT-Service-Manager +### Outside LinTO-Platform-STT-Service-Manager If you want to use our service alone without LinTO-Platform-STT-Service-Manager, you must `unzip` the files and put the extracted ones in the [shared storage](https://doc.linto.ai/#/infra?id=shared-storage). For example, @@ -72,13 +72,15 @@ mv AM_fr-FR ~/linto_shared/data mv DG_fr-FR_Small ~/linto_shared/data ``` -4- Configure the environment file `.env` included in this repository +4- Rename the default environment file `.envdefault` included in the repository `linto-platform-stt-standalone-worker` and configure it by providing the full path of the following parameters: AM_PATH=/full/path/to/linto_shared/data/AM_fr-FR LM_PATH=/full/path/to/linto_shared/data/DG_fr-FR_Small +5- If you want to use Swagger interface, you need to set the corresponding environment parameter: + SWAGGER_PATH=/full/path/to/swagger/file -NB: if you want to use the visual user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in this repository. Specifically, in the section `host`, specify the adress of the machine in which the service is deployed. 
+NOTE: if you want to use the user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in the repository `linto-platform-stt-standalone-worker`. Specifically, in the section `host`, specify the address of the machine in which the service is deployed. ### Using LinTO-Platform-STT-Service-Manager In case you want to use `LinTO-Platform-STT-Service-Manager`, you need to: @@ -87,9 +89,9 @@ In case you want to use `LinTO-Platform-STT-Service-Manager`, you need to: 2- Create a language model and upload the corresponding decoding graph -3- Configure the environmenet file of this service. +3- Configure the environment file of this service. -For more details, see configuration instruction in [LinTO - STT-Manager](https://doc.linto.ai/#/manager) +For more details, see instructions in [LinTO - STT-Manager](https://doc.linto.ai/#/services/stt_manager) ## Execute In order to run the service alone, you have only to execute: @@ -98,8 +100,9 @@ In order to run the service alone, you have only to execute: cd linto-platform-stt-standalone-worker docker-compose up ``` +Then you can acces it on [localhost:8888](localhost:8888) -To run and manager LinSTT under `LinTO-Platform-STT-Service-Manager` service, you need to create a service first and then to start it. See [LinTO - STT-Manager](services/manager?id=execute) +To run and manager LinSTT under `LinTO-Platform-STT-Service-Manager` service, you need to create a service first and then to start it. See [LinTO - STT-Manager](https://doc.linto.ai/#/services/stt_manager_how2use?id=how-to-use-it) Our service requires an audio file in `Waveform format`. It should has the following parameters: @@ -109,6 +112,8 @@ Our service requires an audio file in `Waveform format`. It should has the follo - microphone: any type - duration: <30 minutes +Other formats are also supported: mp3, aiff, flac, and ogg. + ### Run Example Applications To run an automated test go to the test folder @@ -122,5 +127,26 @@ And run the test script: ./test_deployment.sh ``` -Or use swagger interface to perform your personal test +Or use swagger interface to perform your personal test: localhost:8888/api-doc/ + + + + +#### ** /transcribe ** + +Convert a speech to text + +### Functionality +> `post`
+> Make a POST request +>> Arguments : +>> - **{File} file** : Audio file (file format: wav, mp3, aiff, flac, ogg) +>> - **{Integer} nbrSpeaker (optional)**: Number of speakers engaged in dialog +>> - **{String} speaker (optional)**: Do speaker diarization (yes|no) +> +>> Header : +>> - **{String} Accept**: response content type (text/plain|application/json) +> +> **{text|Json}** : Return the full transcription or a json object with metadata + From 04795c1cb6edac269642ff206a76ccf314e08d3a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 1 Sep 2020 12:04:06 +0200 Subject: [PATCH 019/172] remove extrat words from transcription when using text/plain response --- Jenkinsfile | 1 + tools.py | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/Jenkinsfile b/Jenkinsfile index d027c84..5f464c5 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -44,6 +44,7 @@ pipeline { ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push('latest-unstable') + image.push('offline') } } } diff --git a/tools.py b/tools.py index 168bb1a..92b8dc1 100644 --- a/tools.py +++ b/tools.py @@ -542,7 +542,8 @@ def run(self,audio,asr,spk): else: return {"text":output["text"]} else: - return decode["text"] + text = re.sub(r"#nonterm:[^ ]* ", "", decode["text"]) + return text def getOutput(self,timestamps,frame_shift, frame_subsampling, spkSeg = []): output = {} From 460efb7d19f4f28c750baee619dfabe189fa5c99 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 25 Sep 2020 02:48:57 +0200 Subject: [PATCH 020/172] update stt worker offline --- .gitmodules | 6 + Dockerfile | 173 ++++------ pyBK | 1 + run.py | 148 ++++----- tools.py | 922 ++++++++++++++++++++++------------------------------ vosk-api | 1 + 6 files changed, 515 insertions(+), 736 deletions(-) create mode 100644 .gitmodules create mode 160000 pyBK mode change 100755 => 100644 run.py create mode 160000 vosk-api diff --git a/.gitmodules b/.gitmodules new file mode 100644 index 0000000..9cea8d6 --- /dev/null +++ b/.gitmodules @@ -0,0 +1,6 @@ +[submodule "vosk-api"] + path = vosk-api + url = git@github.com:irebai/vosk-api.git +[submodule "pyBK"] + path = pyBK + url = git@github.com:irebai/pyBK.git diff --git a/Dockerfile b/Dockerfile index 6608943..604bdb7 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,128 +1,81 @@ -# Dockerfile for building PyKaldi image from Ubuntu 16.04 image FROM ubuntu:18.04 LABEL maintainer="irebai@linagora.com" -# Install necessary system packages -RUN apt-get update \ - && apt-get install -y \ - python3 \ +RUN apt-get update &&\ + apt-get install -y \ + python2.7 \ + python3 \ python3-pip \ - python2.7 \ - autoconf \ - automake \ - cmake \ - make \ - curl \ - g++ \ - git \ - graphviz \ - libatlas3-base \ - libtool \ - pkg-config \ - sox \ - subversion \ - bzip2 \ - unzip \ - wget \ - zlib1g-dev \ - ca-certificates \ - gfortran \ - patch \ - ffmpeg \ - nano && \ - ln -s /usr/bin/python3 /usr/bin/python && \ - ln -s /usr/bin/pip3 /usr/bin/pip + git \ + swig \ + nano \ + sox \ + automake wget unzip build-essential libtool zlib1g-dev locales libatlas-base-dev ca-certificates gfortran subversion &&\ + apt-get clean -# Install necessary Python packages (pykaldi dependencies) -RUN pip install --upgrade pip \ - numpy \ - setuptools \ - pyparsing \ - ninja +## Build kaldi and Clean installation (intel, openfst, src/*) +RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \ + cd /opt/kaldi/tools && \ + ./extras/install_mkl.sh && \ + make -j $(nproc) && \ + cd /opt/kaldi/src && 
\ + ./configure --shared && \ + make depend -j $(nproc) && \ + make -j $(nproc) && \ + mkdir -p /opt/kaldi/src_ && \ + mv /opt/kaldi/src/base \ + /opt/kaldi/src/chain \ + /opt/kaldi/src/cudamatrix \ + /opt/kaldi/src/decoder \ + /opt/kaldi/src/feat \ + /opt/kaldi/src/fstext \ + /opt/kaldi/src/gmm \ + /opt/kaldi/src/hmm \ + /opt/kaldi/src/ivector \ + /opt/kaldi/src/kws \ + /opt/kaldi/src/lat \ + /opt/kaldi/src/lm \ + /opt/kaldi/src/matrix \ + /opt/kaldi/src/nnet \ + /opt/kaldi/src/nnet2 \ + /opt/kaldi/src/nnet3 \ + /opt/kaldi/src/online2 \ + /opt/kaldi/src/rnnlm \ + /opt/kaldi/src/sgmm2 \ + /opt/kaldi/src/transform \ + /opt/kaldi/src/tree \ + /opt/kaldi/src/util \ + /opt/kaldi/src/itf \ + /opt/kaldi/src/lib /opt/kaldi/src_ && \ + cd /opt/kaldi && rm -r src && mv src_ src && rm src/*/*.cc && rm src/*/*.o && rm src/*/*.so && \ + cd /opt/intel/mkl/lib && rm -f intel64/*.a intel64_lin/*.a && \ + cd /opt/kaldi/tools && mkdir openfst_ && mv openfst-*/lib openfst-*/include openfst-*/bin openfst_ && rm openfst_/lib/*.so* openfst_/lib/*.la && \ + rm -r openfst-*/* && mv openfst_/* openfst-*/ && rm -r openfst_ -## Install Protobuf, CLIF, Kaldi and PyKaldi and Clean installation -RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ - && cd /pykaldi/tools \ - && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_kaldi.sh \ - && sed -i "s/\-j 2/-j $(nproc)/g" ./install_clif.sh \ - && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_protobuf.sh \ - && ./check_dependencies.sh \ - && ./install_protobuf.sh \ - && ./install_clif.sh \ - && ./install_kaldi.sh \ - && cd /pykaldi \ - && python setup.py install \ - && rm -rf /pykaldi/CMakeLists.txt \ - /pykaldi/LICENSE \ - /pykaldi/README.md \ - /pykaldi/setup.cfg \ - /pykaldi/setup.py \ - /pykaldi/docker \ - /pykaldi/docs \ - /pykaldi/extras \ - /pykaldi/pykaldi.egg-info \ - /pykaldi/tests \ - /pykaldi/build/CMakeCache.txt \ - /pykaldi/build/bdist.linux-x86_64 \ - /pykaldi/build/build.ninja \ - /pykaldi/build/cmake_install.cmake \ - /pykaldi/build/docs \ - /pykaldi/build/kaldi \ - /pykaldi/build/lib \ - /pykaldi/build/rules.ninja \ - /pykaldi/tools/check_dependencies.sh \ - /pykaldi/tools/clif* \ - /pykaldi/tools/find_python_library.py \ - /pykaldi/tools/install_* \ - /pykaldi/tools/protobuf \ - /pykaldi/tools/use_namespace.sh \ - /pykaldi/tools/kaldi/COPYING \ - /pykaldi/tools/kaldi/INSTALL \ - /pykaldi/tools/kaldi/README.md \ - /pykaldi/tools/kaldi/egs \ - /pykaldi/tools/kaldi/misc \ - /pykaldi/tools/kaldi/scripts \ - /pykaldi/tools/kaldi/windows \ - && mkdir -p /pykaldi/tools/kaldi/src_/lib \ - && mv /pykaldi/tools/kaldi/src/base/libkaldi-base.so \ - /pykaldi/tools/kaldi/src/chain/libkaldi-chain.so \ - /pykaldi/tools/kaldi/src/cudamatrix/libkaldi-cudamatrix.so \ - /pykaldi/tools/kaldi/src/decoder/libkaldi-decoder.so \ - /pykaldi/tools/kaldi/src/feat/libkaldi-feat.so \ - /pykaldi/tools/kaldi/src/fstext/libkaldi-fstext.so \ - /pykaldi/tools/kaldi/src/gmm/libkaldi-gmm.so \ - /pykaldi/tools/kaldi/src/hmm/libkaldi-hmm.so \ - /pykaldi/tools/kaldi/src/ivector/libkaldi-ivector.so \ - /pykaldi/tools/kaldi/src/kws/libkaldi-kws.so \ - /pykaldi/tools/kaldi/src/lat/libkaldi-lat.so \ - /pykaldi/tools/kaldi/src/lm/libkaldi-lm.so \ - /pykaldi/tools/kaldi/src/matrix/libkaldi-matrix.so \ - /pykaldi/tools/kaldi/src/nnet/libkaldi-nnet.so \ - /pykaldi/tools/kaldi/src/nnet2/libkaldi-nnet2.so \ - /pykaldi/tools/kaldi/src/nnet3/libkaldi-nnet3.so \ - /pykaldi/tools/kaldi/src/online2/libkaldi-online2.so \ - /pykaldi/tools/kaldi/src/rnnlm/libkaldi-rnnlm.so \ - 
/pykaldi/tools/kaldi/src/sgmm2/libkaldi-sgmm2.so \ - /pykaldi/tools/kaldi/src/transform/libkaldi-transform.so \ - /pykaldi/tools/kaldi/src/tree/libkaldi-tree.so \ - /pykaldi/tools/kaldi/src/util/libkaldi-util.so \ - /pykaldi/tools/kaldi/src_/lib \ - && rm -rf /pykaldi/tools/kaldi/src && mv /pykaldi/tools/kaldi/src_ /pykaldi/tools/kaldi/src \ - && cd /pykaldi/tools/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ - rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp +# Install pyBK (speaker diarization toolkit) +RUN apt install -y software-properties-common && wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && ./llvm.sh 10 && \ + export LLVM_CONFIG=/usr/bin/llvm-config-10 && \ + pip3 install numpy && \ + pip3 install websockets && \ + pip3 install librosa webrtcvad scipy sklearn + +# build VOSK KALDI +COPY vosk-api /opt/vosk-api +RUN cd /opt/vosk-api/python && \ + export KALDI_ROOT=/opt/kaldi && \ + export KALDI_MKL=1 && \ + python3 setup.py install --user --single-version-externally-managed --root=/ # Define the main folder WORKDIR /usr/src/speech-to-text # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger librosa webrtcvad scipy sklearn -RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git \ - && git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ - && cp /pykaldi/tools/pyBK/diarizationFunctions.py . +RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml # Set environment variables ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH +COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py . COPY run.py . 
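For reference, a minimal client sketch for the `/transcribe` route exposed by the image built above, following the README documentation earlier in this series (service reachable on localhost:8888, multipart `file` field, optional `speaker`/`nbrSpeaker` fields, `Accept` header of `text/plain` or `application/json`). It uses the third-party `requests` library, which is not part of this repository, and `audio.wav` is only a placeholder file name:

```python
import requests

# Sketch of a client for the /transcribe endpoint described in the README above.
# Assumptions: the worker listens on localhost:8888 and "audio.wav" is any
# supported input (wav, mp3, aiff, flac, ogg per the README).
with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8888/transcribe",
        files={"file": ("audio.wav", audio_file, "audio/wav")},
        data={"speaker": "yes", "nbrSpeaker": "2"},  # optional diarization fields
        headers={"accept": "application/json"},      # or "text/plain" for raw text
    )

response.raise_for_status()
print(response.json())  # transcription plus word/speaker metadata
```

With `Accept: text/plain` the service returns only the transcription string, which is the branch cleaned up by the `#nonterm` substitution in patch 019 above.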
diff --git a/pyBK b/pyBK new file mode 160000 index 0000000..7738eb7 --- /dev/null +++ b/pyBK @@ -0,0 +1 @@ +Subproject commit 7738eb75dfc65438fbcd0eed9bb6a1f086b4bd6c diff --git a/run.py b/run.py old mode 100755 new mode 100644 index ecdbb18..a95cf47 --- a/run.py +++ b/run.py @@ -2,79 +2,36 @@ # -*- coding: utf-8 -*- from flask import Flask, request, abort, Response, json -from flask_swagger_ui import get_swaggerui_blueprint -from flask_cors import CORS -from tools import ASR, Audio, SpeakerDiarization, SttStandelone -import yaml, os, sox, logging +from vosk import Model, KaldiRecognizer +from tools import WorkerStreaming from time import gmtime, strftime +from gevent.pywsgi import WSGIServer + + + app = Flask("__stt-standelone-worker__") -# Set logger config -logger = logging.getLogger(__name__) -logging.basicConfig(level=logging.DEBUG) - -# Main parameters -AM_PATH = '/opt/models/AM' -LM_PATH = '/opt/models/LM' -TEMP_FILE_PATH = '/opt/tmp' -CONFIG_FILES_PATH = '/opt/config' -NBR_PROCESSES = 1 -SAVE_AUDIO = False -SERVICE_PORT = 80 -SWAGGER_URL = '/api-doc' -SWAGGER_PATH = '' -asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) - -if not os.path.isdir(TEMP_FILE_PATH): - os.mkdir(TEMP_FILE_PATH) -if not os.path.isdir(CONFIG_FILES_PATH): - os.mkdir(CONFIG_FILES_PATH) - -# Environment parameters -if 'SERVICE_PORT' in os.environ: - SERVICE_PORT = os.environ['SERVICE_PORT'] -if 'SAVE_AUDIO' in os.environ: - SAVE_AUDIO = os.environ['SAVE_AUDIO'] -if 'NBR_PROCESSES' in os.environ: - if int(os.environ['NBR_PROCESSES']) > 0: - NBR_PROCESSES = int(os.environ['NBR_PROCESSES']) - else: - exit("You must to provide a positif number of processes 'NBR_PROCESSES'") -if 'SWAGGER_PATH' in os.environ: - SWAGGER_PATH = os.environ['SWAGGER_PATH'] - -def swaggerUI(): - ### swagger specific ### - swagger_yml = yaml.load(open(SWAGGER_PATH, 'r'), Loader=yaml.Loader) - swaggerui = get_swaggerui_blueprint( - SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' - SWAGGER_PATH, - config={ # Swagger UI config overrides - 'app_name': "STT API Documentation", - 'spec': swagger_yml - } - ) - app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) - ### end swagger specific ### - -def getAudio(file,audio): - file_path = TEMP_FILE_PATH+file.filename.lower() - file.save(file_path) - audio.transform(file_path) - if not SAVE_AUDIO: - os.remove(file_path) - +# create WorkerStreaming object +worker = WorkerStreaming() + +# Load ASR models (acoustic model and decoding graph) +worker.log.info('Load acoustic model and decoding graph') +model = Model(worker.AM_PATH, worker.LM_PATH, + worker.CONFIG_FILES_PATH+"/online.conf") + + +# API @app.route('/transcribe', methods=['POST']) def transcribe(): try: - app.logger.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) - # create main objects - spk = SpeakerDiarization() - audio = Audio(asr.get_sample_rate()) - - #get response content type - metadata = False + worker.log.info('[%s] New user entry on /transcribe' % + (strftime("%d/%b/%d %H:%M:%S", gmtime()))) + + metadata = worker.METADATA + nbrSpk = 10 + + # get response content type if request.headers.get('accept').lower() == 'application/json': metadata = True elif request.headers.get('accept').lower() == 'text/plain': @@ -82,69 +39,80 @@ def transcribe(): else: raise ValueError('Not accepted header') - #get speaker parameter + # get speaker parameter spkDiarization = False if request.form.get('speaker') != None and (request.form.get('speaker').lower() == 'yes' or 
request.form.get('speaker').lower() == 'no'): - spkDiarization = True if request.form.get('speaker').lower() == 'yes' else False - #get number of speakers parameter + spkDiarization = True if request.form.get( + 'speaker').lower() == 'yes' else False + # get number of speakers parameter try: if request.form.get('nbrSpeaker') != None and spkDiarization and int(request.form.get('nbrSpeaker')) > 0: - spk.set_maxNrSpeakers(int(request.form.get('nbrSpeaker'))) + nbrSpk = int(request.form.get('nbrSpeaker')) elif request.form.get('nbrSpeaker') != None and spkDiarization: - raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') + raise ValueError( + 'Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') except Exception as e: - app.logger.error(e) - raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') + worker.log.error(e) + raise ValueError( + 'Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') else: if request.form.get('speaker') != None: raise ValueError('Not accepted "speaker" field value (yes|no)') - stt = SttStandelone(metadata,spkDiarization) - - #get input file + # get input file if 'file' in request.files.keys(): file = request.files['file'] - getAudio(file,audio) - output = stt.run(audio,asr,spk) + worker.getAudio(file) + rec = KaldiRecognizer(model, worker.rate, metadata) + response = rec.Decode(worker.data) + if metadata: + obj = rec.GetMetadata() + data = json.loads(obj) + response = worker.process_metadata(data, spkDiarization, nbrSpk) else: raise ValueError('No audio file was uploaded') - return output, 200 + return response, 200 except ValueError as error: return str(error), 400 except Exception as e: - app.logger.error(e) + worker.log.error(e) return 'Server Error', 500 + @app.route('/healthcheck', methods=['GET']) def check(): return '', 200 # Rejected request handlers + + @app.errorhandler(405) def method_not_allowed(error): return 'The method is not allowed for the requested URL', 405 + @app.errorhandler(404) def page_not_found(error): return 'The requested URL was not found', 404 + @app.errorhandler(500) def server_error(error): - app.logger.error(error) + worker.log.error(error) return 'Server Error', 500 + if __name__ == '__main__': try: - #start SwaggerUI - if SWAGGER_PATH != '': - swaggerUI() + # start SwaggerUI + if worker.SWAGGER_PATH != '': + worker.swaggerUI(app) + # Run server - #Run ASR engine - asr.run() + http_server = WSGIServer(('', worker.SERVICE_PORT), app) + http_server.serve_forever() - #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) except Exception as e: - app.logger.error(e) - exit(e) \ No newline at end of file + worker.log.error(e) + exit(e) diff --git a/tools.py b/tools.py index 92b8dc1..8cc3715 100644 --- a/tools.py +++ b/tools.py @@ -1,619 +1,469 @@ -## Kaldi ASR decoder -from kaldi.asr import NnetLatticeFasterOnlineRecognizer -from kaldi.decoder import (LatticeFasterDecoderOptions, - LatticeFasterOnlineDecoder) -from kaldi.nnet3 import NnetSimpleLoopedComputationOptions -from kaldi.online2 import (OnlineEndpointConfig, - OnlineIvectorExtractorAdaptationState, - OnlineNnetFeaturePipelineConfig, - OnlineNnetFeaturePipelineInfo, - OnlineNnetFeaturePipeline, - OnlineSilenceWeighting) -from kaldi.util.options import ParseOptions -from kaldi.util.table import SequentialWaveReader -from kaldi.matrix import Matrix, Vector -############## +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- -## word to CTM -from kaldi.lat.align import 
(WordBoundaryInfoNewOpts, - WordBoundaryInfo, - word_align_lattice) -from kaldi.lat.functions import (compact_lattice_to_word_alignment, - compact_lattice_shortest_path) -from kaldi.asr import NnetRecognizer -import kaldi.fstext as _fst +#  ASR +from vosk import Model, KaldiRecognizer ############## -## Speaker Diarization -from diarizationFunctions import * -import numpy as np +# Speaker Diarization +from pyBK.diarizationFunctions import * import librosa -from kaldi.ivector import (compute_vad_energy, - VadEnergyOptions) -from kaldi.feat.mfcc import Mfcc, MfccOptions -from kaldi.util.options import ParseOptions +import time +import webrtcvad ############## -## other packages -import configparser, sys, os, re, sox, time, logging -from concurrent.futures import ThreadPoolExecutor +# other packages +import configparser +import logging +import os +import re +import json +import yaml +import scipy.io.wavfile +import numpy as np +from flask_swagger_ui import get_swaggerui_blueprint ############## -class ASR: - def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): - self.log = logging.getLogger('__stt-standelone-worker__.ASR') - self.AM_PATH = AM_PATH - self.LM_PATH = LM_PATH - self.CONFIG_FILES_PATH = CONFIG_FILES_PATH + +class WorkerStreaming: + def __init__(self): + # Set logger config + self.log = logging.getLogger("__stt-standelone-worker-streaming__") + logging.basicConfig(level=logging.INFO) + + # Main parameters + self.AM_PATH = '/opt/models/AM' + self.LM_PATH = '/opt/models/LM' + self.TEMP_FILE_PATH = '/opt/tmp' + self.CONFIG_FILES_PATH = '/opt/config' + self.SAVE_AUDIO=False + self.SERVICE_PORT = 80 + self.NBR_THREADS = 100 + self.METADATA = True + self.SWAGGER_URL = '/api-doc' + self.SWAGGER_PATH = '' + + if not os.path.isdir(self.CONFIG_FILES_PATH): + os.mkdir(self.CONFIG_FILES_PATH) + + if not os.path.isdir(self.TEMP_FILE_PATH): + os.mkdir(self.TEMP_FILE_PATH) + + # Environment parameters + if 'NBR_THREADS' in os.environ: + if int(os.environ['NBR_THREADS']) > 0: + self.NBR_THREADS = int(os.environ['NBR_THREADS']) + else: + self.log.warning( + "You must to provide a positif number of threads 'NBR_THREADS'") + if 'SWAGGER_PATH' in os.environ: + self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] + + + # start loading ASR configuration + self.log.info("Create the new config files") + self.loadConfig() + + + def swaggerUI(self, app): + ### swagger specific ### + swagger_yml = yaml.load(open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swaggerui = get_swaggerui_blueprint( + self.SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + self.SWAGGER_PATH, + config={ # Swagger UI config overrides + 'app_name': "STT API Documentation", + 'spec': swagger_yml + } + ) + app.register_blueprint(swaggerui, url_prefix=self.SWAGGER_URL) + ### end swagger specific ### + + + def getAudio(self,file): + file_path = self.TEMP_FILE_PATH+"/"+file.filename.lower() + file.save(file_path) + self.rate, self.data = scipy.io.wavfile.read(file_path) + + if not self.SAVE_AUDIO: + os.remove(file_path) - def run(self): - def loadConfig(self): - #get decoder parameters from "decode.cfg" - decoder_settings = configparser.ConfigParser() - decoder_settings.read(self.AM_PATH+'/decode.cfg') - self.DECODER_SYS = decoder_settings.get('decoder_params', 'decoder') - self.AM_FILE_PATH = decoder_settings.get('decoder_params', 'ampath') - self.DECODER_MINACT = int(decoder_settings.get('decoder_params', 'min_active')) - self.DECODER_MAXACT = int(decoder_settings.get('decoder_params', 'max_active')) - 
self.DECODER_BEAM = float(decoder_settings.get('decoder_params', 'beam')) - self.DECODER_LATBEAM = float(decoder_settings.get('decoder_params', 'lattice_beam')) - self.DECODER_ACWT = float(decoder_settings.get('decoder_params', 'acwt')) - self.DECODER_FSF = int(decoder_settings.get('decoder_params', 'frame_subsampling_factor')) - - #Prepare "online.conf" - self.AM_PATH=self.AM_PATH+"/"+self.AM_FILE_PATH - with open(self.AM_PATH+"/conf/online.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--ivector-extraction-config="+self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") - f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf") - - #Prepare "ivector_extractor.conf" - with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") - f.write("--cmvn-config="+self.AM_PATH+"/conf/online_cmvn.conf\n") - f.write("--lda-matrix="+self.AM_PATH+"/ivector_extractor/final.mat\n") - f.write("--global-cmvn-stats="+self.AM_PATH+"/ivector_extractor/global_cmvn.stats\n") - f.write("--diag-ubm="+self.AM_PATH+"/ivector_extractor/final.dubm\n") - f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") - - #Prepare "word_boundary.int" if not exist - if not os.path.exists(self.LM_PATH+"/word_boundary.int"): - if os.path.exists(self.AM_PATH+"phones.txt"): - with open(self.AM_PATH+"phones.txt") as f: - phones = f.readlines() - - with open(self.LM_PATH+"/word_boundary.int", "w") as f: - for phone in phones: - phone = phone.strip() - phone = re.sub('^ .*','', phone) - phone = re.sub('^#\d+ .*','', phone) - if phone != '': - id = phone.split(' ')[1] - if '_I ' in phone: - f.write(id+" internal\n") - elif '_B ' in phone: - f.write(id+" begin\n") - elif '_E ' in phone: - f.write(id+" end\n") - elif '_S ' in phone: - f.write(id+" singleton\n") - else: - f.write(id+" nonword\n") + # re-create config files + def loadConfig(self): + # load decoder parameters from "decode.cfg" + decoder_settings = configparser.ConfigParser() + if os.path.exists(self.AM_PATH+'/decode.cfg') == False: + return False + decoder_settings.read(self.AM_PATH+'/decode.cfg') + + # Prepare "online.conf" + self.AM_PATH = self.AM_PATH+"/" + \ + decoder_settings.get('decoder_params', 'ampath') + with open(self.AM_PATH+"/conf/online.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--ivector-extraction-config=" + + self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") + f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf\n") + f.write( + "--beam="+decoder_settings.get('decoder_params', 'beam')+"\n") + f.write( + "--lattice-beam="+decoder_settings.get('decoder_params', 'lattice_beam')+"\n") + f.write("--acoustic-scale=" + + decoder_settings.get('decoder_params', 'acwt')+"\n") + f.write( + "--min-active="+decoder_settings.get('decoder_params', 'min_active')+"\n") + f.write( + "--max-active="+decoder_settings.get('decoder_params', 'max_active')+"\n") + f.write("--frame-subsampling-factor="+decoder_settings.get( + 'decoder_params', 'frame_subsampling_factor')+"\n") + f.write("--endpoint.rule2.min-trailing-silence=0.5\n") + f.write("--endpoint.rule3.min-trailing-silence=1.0\n") + f.write("--endpoint.rule4.min-trailing-silence=2.0\n") + + # Prepare 
"ivector_extractor.conf" + with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") + f.write("--cmvn-config="+self.AM_PATH + + "/conf/online_cmvn.conf\n") + f.write("--lda-matrix="+self.AM_PATH + + "/ivector_extractor/final.mat\n") + f.write("--global-cmvn-stats="+self.AM_PATH + + "/ivector_extractor/global_cmvn.stats\n") + f.write("--diag-ubm="+self.AM_PATH + + "/ivector_extractor/final.dubm\n") + f.write("--ivector-extractor="+self.AM_PATH + + "/ivector_extractor/final.ie") + + # Prepare "word_boundary.int" if not exist + if not os.path.exists(self.LM_PATH+"/word_boundary.int") and os.path.exists(self.AM_PATH+"/phones.txt"): + self.log.info("Create word_boundary.int based on phones.txt") + with open(self.AM_PATH+"/phones.txt") as f: + phones = f.readlines() + + with open(self.LM_PATH+"/word_boundary.int", "w") as f: + for phone in phones: + phone = phone.strip() + phone = re.sub('^ .*', '', phone) + phone = re.sub('^#\d+ .*', '', phone) + if phone != '': + id = phone.split(' ')[1] + if '_I ' in phone: + f.write(id+" internal\n") + elif '_B ' in phone: + f.write(id+" begin\n") + elif '_E ' in phone: + f.write(id+" end\n") + elif '_S ' in phone: + f.write(id+" singleton\n") + else: + f.write(id+" nonword\n") + + # TODO: metadata (timestamps, speakers, save audio) + # return at the end of streaming a json object including word-data, speaker-data + # (get frames after the end of decoding) + def process_metadata(self, metadata, spkDiarization, nbrSpk=10): + if metadata is not None and 'words' in metadata and 'features' in metadata: + if not spkDiarization: + del metadata['features'] + del metadata['segments'] + return metadata + + features = metadata['features'] + seg = metadata['segments'] if metadata['segments'] is not None else [] + feats = np.array(features) + feats = np.squeeze(feats) + mask = np.ones(shape=(feats.shape[0],)) + + for pos in seg: + mask[pos-30:pos]=0 + + spk = SpeakerDiarization() + spk.set_maxNrSpeakers(nbrSpk) + spkrs = spk.run(feats,mask) + + speaker = [] + i = 0 + text = "" + for word in metadata['words']: + if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: + text += word["word"] + " " else: - raise ValueError('Neither word_boundary.int nor phones.txt exists!!!') - - try: - # Define online feature pipeline - self.log.info("Load decoder config") - loadConfig(self) - feat_opts = OnlineNnetFeaturePipelineConfig() - self.endpoint_opts = OnlineEndpointConfig() - po = ParseOptions("") - feat_opts.register(po) - self.endpoint_opts.register(po) - po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") - self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) - - # Set metadata parameters - self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq - self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 - - # Construct recognizer - self.log.info("Load Decoder model") - decoder_opts = LatticeFasterDecoderOptions() - decoder_opts.beam = self.DECODER_BEAM - decoder_opts.max_active = self.DECODER_MAXACT - decoder_opts.min_active = self.DECODER_MINACT - decoder_opts.lattice_beam = self.DECODER_LATBEAM - self.decodable_opts = NnetSimpleLoopedComputationOptions() - self.decodable_opts.acoustic_scale = self.DECODER_ACWT - self.decodable_opts.frame_subsampling_factor = self.DECODER_FSF - self.decodable_opts.frames_per_chunk = 150 + 
speaker.append({'spk'+str(int(spkrs[i][2])) : text}) + i+=1 + text="" + speaker.append({'spk'+str(int(spkrs[i][2])) : text}) - # Load Acoustic and graph models and other files - self.transition_model, self.acoustic_model = NnetRecognizer.read_model(self.AM_PATH+"/final.mdl") - graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") - self.decoder_graph = LatticeFasterOnlineDecoder(graph, decoder_opts) - self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") - self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") - del graph, decoder_opts - except Exception as e: - self.log.error(e) - raise ValueError("AM and LM loading failed!!! (see logs for more details)") - - def get_sample_rate(self): - return self.samp_freq - - def get_frames(self,feat_pipeline): - rows = feat_pipeline.num_frames_ready() - cols = feat_pipeline.dim() - frames = Matrix(rows,cols) - feat_pipeline.get_frames(range(rows),frames) - return frames[:,:self.feat_info.mfcc_opts.num_ceps], frames[:,self.feat_info.mfcc_opts.num_ceps:] - # return feats + ivectors - - def compute_feat(self,audio): - try: - feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) - feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) - feat_pipeline.input_finished() - except Exception as e: - self.log.error(e) - raise ValueError("Feature extraction failed!!!") - else: - return feat_pipeline - - def decoder(self,feats): - try: - start_time = time.time() - self.log.info("Start Decoding: %s" % (start_time)) - asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, - self.symbols, decodable_opts= self.decodable_opts, endpoint_opts=self.endpoint_opts) - asr.set_input_pipeline(feats) - decode = asr.decode() - self.log.info("Decode time in seconds: %s" % (time.time() - start_time)) - except Exception as e: - self.log.error(e) - raise ValueError("Decoder failed to transcribe the input audio!!!") - else: - return decode - - def wordTimestamp(self,decode): - try: - _fst.utils.scale_compact_lattice([[1.0, 0],[0, float(self.DECODER_ACWT)]], decode['lattice']) - bestPath = compact_lattice_shortest_path(decode['lattice']) - _fst.utils.scale_compact_lattice([[1.0, 0],[0, 1.0/float(self.DECODER_ACWT)]], bestPath) - bestLattice = word_align_lattice(bestPath, self.transition_model, self.info, 0) - alignment = compact_lattice_to_word_alignment(bestLattice[1]) - words = _fst.indices_to_symbols(self.symbols, alignment[0]) - except Exception as e: - self.log.error(e) - raise ValueError("Decoder failed to create the word timestamps!!!") + metadata["speakers"]=speaker + + # vad = metadata['silweights'] + # weights = np.zeros(shape=(vad[len(vad)-2]+1,)) + # id = [] + # w = [] + # for i in range(0, len(vad), 2): + # id.append(vad[i]) + # w.append(vad[i+1]) + # weights[vad[i]] = vad[i+1] + # self.log.info(id) + # self.log.info(w) + # self.log.info(weights) + + del metadata['features'] + del metadata['segments'] + + return metadata else: - return { - "words":words, - "start":alignment[1], - "dur":alignment[2] - } + return {'speakers': [], 'text': '', 'words': []} + +# def process_metadata_conversation_manager(self, metadata): +# features = metadata['features'] +# seg = metadata['segments'] if metadata['segments'] is not None else [] +# feats = np.array(features) +# feats = np.squeeze(feats) +# mask = np.ones(shape=(feats.shape[0],)) +# +# for pos in seg: +# mask[pos-30:pos]=0 +# +# spk = SpeakerDiarization() +# spk.set_maxNrSpeakers(10) +# spkrs = 
spk.run(feats,mask) +# +# speakers = [] +# text = [] +# i = 0 +# text_ = "" +# words=[] +# if 'words' in metadata: +# for word in metadata['words']: +# if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: +# text_ += word["word"] + " " +# words.append(word) +# else: +# speaker = {} +# speaker["btime"]=words[0]["start"] +# speaker["etime"]=words[len(words)-1]["end"] +# speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) +# speaker["words"]=words +# +# text.append('spk'+str(int(spkrs[i][2]))+' : '+text_) +# speakers.append(speaker) +# +# words=[] +# text_="" +# i+=1 +# +# speaker = {} +# speaker["btime"]=words[0]["start"] +# speaker["etime"]=words[len(words)-1]["end"] +# speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) +# speaker["words"]=words +# +# text.append('spk'+str(int(spkrs[i][2]))+' : '+text_) +# speakers.append(speaker) +# return json.dumps({'speakers': speakers, 'text': text}) +# else: +# return json.dumps({'speakers': [], 'text': '', 'words': []}) + class SpeakerDiarization: def __init__(self): - self.log = logging.getLogger('__stt-standelone-worker__.SPKDiarization') - - ### MFCC FEATURES PARAMETERS - self.frame_length_s=0.025 - self.frame_shift_s=0.01 - self.num_bins=40 - self.num_ceps=40 - self.low_freq=40 - self.high_freq=-200 - ##### + self.log = logging.getLogger( + '__stt-standelone-worker__.SPKDiarization') - ### VAD PARAMETERS - self.vad_ops = VadEnergyOptions() - self.vad_ops.vad_energy_mean_scale = 0.9 - self.vad_ops.vad_energy_threshold = 5 - #vad_ops.vad_frames_context = 2 - #vad_ops.vad_proportion_threshold = 0.12 + # MFCC FEATURES PARAMETERS + self.frame_length_s = 0.025 + self.frame_shift_s = 0.01 + self.num_bins = 40 + self.num_ceps = 40 + self.low_freq = 40 + self.high_freq = -200 ##### - ### Segment - self.seg_length = 100 # Window size in frames - self.seg_increment = 100 # Window increment after and before window in frames - self.seg_rate = 100 # Window shifting in frames + # Segment + self.seg_length = 100 # Window size in frames + self.seg_increment = 100 # Window increment after and before window in frames + self.seg_rate = 100 # Window shifting in frames ##### - ### KBM - self.minimumNumberOfInitialGaussians = 1024 # Minimum number of Gaussians in the initial pool - self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation - self.windowLength = 200 # Window length for computing Gaussians - self.kbmSize = 320 # Number of final Gaussian components in the KBM - self.useRelativeKBMsize = 1 # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size - self.relKBMsize = 0.3 # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). + # KBM + # Minimum number of Gaussians in the initial pool + self.minimumNumberOfInitialGaussians = 1024 + self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation + self.windowLength = 200 # Window length for computing Gaussians + self.kbmSize = 320 # Number of final Gaussian components in the KBM + # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size + self.useRelativeKBMsize = 1 + # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). 
+ self.relKBMsize = 0.3 ###### - ### BINARY_KEY - self.topGaussiansPerFrame = 5 # Number of top selected components per frame - self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys + # BINARY_KEY + self.topGaussiansPerFrame = 5 # Number of top selected components per frame + self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys ###### - ### CLUSTERING - self.N_init = 16 # Number of initial clusters - self.linkage = 0 # Set to one to perform linkage clustering instead of clustering/reassignment - self.linkageCriterion = 'average' # Linkage criterion used if linkage==1 ('average', 'single', 'complete') - self.metric = 'cosine' # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys + # CLUSTERING + self.N_init = 16 # Number of initial clusters + # Set to one to perform linkage clustering instead of clustering/reassignment + self.linkage = 0 + # Linkage criterion used if linkage==1 ('average', 'single', 'complete') + self.linkageCriterion = 'average' + # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys + self.metric = 'cosine' ###### - ### CLUSTERING_SELECTION - self.metric_clusteringSelection = 'cosine' # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') - self.bestClusteringCriterion = 'elbow' # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering - self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral + # CLUSTERING_SELECTION + # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') + self.metric_clusteringSelection = 'cosine' + # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering + self.bestClusteringCriterion = 'elbow' + self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral self.percentile = 40 - self.maxNrSpeakers = 16 # If known, max nr of speakers in a sesssion in the database. This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps + self.maxNrSpeakers = 16 # If known, max nr of speakers in a sesssion in the database. 
This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps ###### - ### RESEGMENTATION - self.resegmentation = 1 # Set to 1 to perform re-segmentation - self.modelSize = 6 # Number of GMM components - self.nbIter = 10 # Number of expectation-maximization (EM) iterations - self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames + # RESEGMENTATION + self.resegmentation = 1 # Set to 1 to perform re-segmentation + self.modelSize = 6 # Number of GMM components + self.nbIter = 10 # Number of expectation-maximization (EM) iterations + self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames ###### - - def set_maxNrSpeakers(self,nbr): - self.maxNrSpeakers = nbr - - def compute_feat_Librosa(self,audio): - try: - self.log.info("Start feature extraction: %s" % (time.time())) - if audio.sr == 16000: - self.low_freq=20 - self.high_freq=7600 - data = audio.data/32768 - frame_length_inSample = self.frame_length_s * audio.sr - hop = int(self.frame_shift_s * audio.sr) - NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) - mfccNumpy = librosa.feature.mfcc(y=data, - sr=audio.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop, - fmin=self.low_freq, - fmax=self.high_freq).T - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed when extracting features!!!") - else: - return mfccNumpy - - def compute_feat_KALDI(self,audio): - try: - self.log.info("Start feature extraction: %s" % (time.time())) - po = ParseOptions("") - mfcc_opts = MfccOptions() - mfcc_opts.use_energy = False - mfcc_opts.frame_opts.samp_freq = audio.sr - mfcc_opts.frame_opts.frame_length_ms = self.frame_length_s*1000 - mfcc_opts.frame_opts.frame_shift_ms = self.frame_shift_s*1000 - mfcc_opts.frame_opts.allow_downsample = False - mfcc_opts.mel_opts.num_bins = self.num_bins - mfcc_opts.mel_opts.low_freq = self.low_freq - mfcc_opts.mel_opts.high_freq = self.high_freq - mfcc_opts.num_ceps = self.num_ceps - mfcc_opts.register(po) - - # Create MFCC object and obtain sample frequency - mfccObj = Mfcc(mfcc_opts) - mfccKaldi = mfccObj.compute_features(audio.getDataKaldyVector(), audio.sr, 1.0) - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed while extracting features!!!") - else: - return mfccKaldi - - def computeVAD_WEBRTC(self, audio): - try: - self.log.info("Start VAD: %s" % (time.time())) - data = audio.data/32768 - hop = 30 - va_framed = py_webrtcvad(data, fs=audio.sr, fs_vad=audio.sr, hoplength=hop, vad_mode=0) - segments = get_py_webrtcvad_segments(va_framed,audio.sr) - maskSAD = np.zeros([1,nFeatures]) - for seg in segments: - start=int(np.round(seg[0]/frame_shift_s)) - end=int(np.round(seg[1]/frame_shift_s)) - maskSAD[0][start:end]=1 - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed while voice activity detection!!!") - else: - return maskSAD - - def computeVAD_KALDI(self, audio, feats=None): - try: - self.log.info("Start VAD: %s" % (time.time())) - vadStream = compute_vad_energy(self.vad_ops,feats) - vad = Vector(vadStream) - VAD = vad.numpy() - - ### segmentation - occurence=[] - value=[] - occurence.append(1) - value.append(VAD[0]) - - # compute the speech and non-speech frames - for i in range(1,len(VAD)): - if value[-1] == VAD[i]: - occurence[-1]+=1 - else: - occurence.append(1) - value.append(VAD[i]) - - # filter the speech and non-speech segments that are below 30 frames - i 
= 0 - while(i < len(occurence)): - if i != 0 and (occurence[i] < 30 or value[i-1] == value[i]): - occurence[i-1] += occurence[i] - del value[i] - del occurence[i] - else: - i+=1 - # split if and only if the silence is above 50 frames - i = 0 - while(i < len(occurence)): - if i != 0 and ((occurence[i] < 30 and value[i] == 0.0) or value[i-1] == value[i]): - occurence[i-1] += occurence[i] - del value[i] - del occurence[i] - else: - i+=1 - - # compute VAD mask - maskSAD = np.zeros(len(VAD)) - start=0 - for i in range(len(occurence)): - if value[i] == 1.0: - end=start+occurence[i] - maskSAD[start:end] = 1 - start=end - else: - start += occurence[i] - - maskSAD = np.expand_dims(maskSAD, axis=0) - except ValueError as v: - self.log.error(v) - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed while voice activity detection!!!") - else: - return maskSAD + def set_maxNrSpeakers(self, nbr): + self.maxNrSpeakers = nbr - def run(self, audio, feats=None): + def run(self, feats, mask): try: def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): - numberOfSpeechFeatures = finalSegmentTable[-1,2].astype(int)+1 - solutionVector = np.zeros([1,numberOfSpeechFeatures]) - for i in np.arange(np.size(finalSegmentTable,0)): - solutionVector[0,np.arange(finalSegmentTable[i,1],finalSegmentTable[i,2]+1).astype(int)]=finalClusteringTable[i] - seg = np.empty([0,3]) + numberOfSpeechFeatures = finalSegmentTable[-1, 2].astype(int)+1 + solutionVector = np.zeros([1, numberOfSpeechFeatures]) + for i in np.arange(np.size(finalSegmentTable, 0)): + solutionVector[0, np.arange( + finalSegmentTable[i, 1], finalSegmentTable[i, 2]+1).astype(int)] = finalClusteringTable[i] + seg = np.empty([0, 3]) solutionDiff = np.diff(solutionVector)[0] first = 0 - for i in np.arange(0,np.size(solutionDiff,0)): + for i in np.arange(0, np.size(solutionDiff, 0)): if solutionDiff[i]: last = i+1 seg1 = (first)*frameshift seg2 = (last-first)*frameshift - seg3 = solutionVector[0,last-1] + seg3 = solutionVector[0, last-1] if seg.shape[0] != 0 and seg3 == seg[-1][2]: seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg,[seg1,seg2,seg3])) + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg, [seg1, seg2, seg3])) first = i+1 - last = np.size(solutionVector,1) + last = np.size(solutionVector, 1) seg1 = (first-1)*frameshift seg2 = (last-first+1)*frameshift - seg3 = solutionVector[0,last-1] + seg3 = solutionVector[0, last-1] if seg3 == seg[-1][2]: seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg,[seg1,seg2,seg3])) - seg = np.vstack((seg,[dur,-1,-1])) - seg[0][0]=0.0 + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg, [seg1, seg2, seg3])) + seg = np.vstack((seg, [dur, -1, -1])) + seg[0][0] = 0.0 return seg - start_time = time.time() - self.log.info("Start Speaker Diarization: %s" % (start_time)) - if self.maxNrSpeakers == 1 or audio.dur < 3: - self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) - return [[0, audio.dur, 1], - [audio.dur, -1, -1]] - if feats == None: - feats = self.compute_feat_KALDI(audio) + nFeatures = feats.shape[0] - maskSAD = self.computeVAD_KALDI(audio,feats) - maskUEM = np.ones([1,nFeatures]) + duration = nFeatures * self.frame_shift_s + + if duration < 5: + return [[0, duration, 1], + [duration, -1, -1]] - mask = np.logical_and(maskUEM,maskSAD) + maskSAD = mask + maskUEM = np.ones([1, nFeatures]) + + mask = np.logical_and(maskUEM, maskSAD) mask 
= mask[0][0:nFeatures] - nSpeechFeatures=np.sum(mask) + nSpeechFeatures = np.sum(mask) speechMapping = np.zeros(nFeatures) - #you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style - #so that we don't lose features on the way - speechMapping[np.nonzero(mask)] = np.arange(1,nSpeechFeatures+1) - data=feats[np.where(mask==1)] + # you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style + # so that we don't lose features on the way + speechMapping[np.nonzero(mask)] = np.arange(1, nSpeechFeatures+1) + data = feats[np.where(mask == 1)] del feats - segmentTable=getSegmentTable(mask,speechMapping,self.seg_length,self.seg_increment,self.seg_rate) - numberOfSegments=np.size(segmentTable,0) - #create the KBM - #set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians + segmentTable = getSegmentTable( + mask, speechMapping, self.seg_length, self.seg_increment, self.seg_rate) + numberOfSegments = np.size(segmentTable, 0) + # create the KBM + # set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians if np.floor((nSpeechFeatures-self.windowLength)/self.minimumNumberOfInitialGaussians) < self.maximumKBMWindowRate: - windowRate = int(np.floor((np.size(data,0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) + windowRate = int(np.floor( + (np.size(data, 0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) else: windowRate = int(self.maximumKBMWindowRate) - + if windowRate == 0: - raise ValueError('The audio is to short in order to perform the speaker diarization!!!') - + #self.log.info('The audio is to short in order to perform the speaker diarization!!!') + return [[0, duration, 1], + [duration, -1, -1]] + poolSize = np.floor((nSpeechFeatures-self.windowLength)/windowRate) - if self.useRelativeKBMsize: + if self.useRelativeKBMsize: kbmSize = int(np.floor(poolSize*self.relKBMsize)) else: kbmSize = int(self.kbmSize) - - #Training pool of',int(poolSize),'gaussians with a rate of',int(windowRate),'frames' - kbm, gmPool = trainKBM(data,self.windowLength,windowRate,kbmSize) - + + # Training pool of',int(poolSize),'gaussians with a rate of',int(windowRate),'frames' + kbm, gmPool = trainKBM( + data, self.windowLength, windowRate, kbmSize) + #'Selected',kbmSize,'gaussians from the pool' - Vg = getVgMatrix(data,gmPool,kbm,self.topGaussiansPerFrame) - + Vg = getVgMatrix(data, gmPool, kbm, self.topGaussiansPerFrame) + #'Computing binary keys for all segments... ' - segmentBKTable, segmentCVTable = getSegmentBKs(segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) - + segmentBKTable, segmentCVTable = getSegmentBKs( + segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) + #'Performing initial clustering... ' - initialClustering = np.digitize(np.arange(numberOfSegments),np.arange(0,numberOfSegments,numberOfSegments/self.N_init)) - - + initialClustering = np.digitize(np.arange(numberOfSegments), np.arange( + 0, numberOfSegments, numberOfSegments/self.N_init)) + #'Performing agglomerative clustering... 
' if self.linkage: - finalClusteringTable, k = performClusteringLinkage(segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) + finalClusteringTable, k = performClusteringLinkage( + segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) else: - finalClusteringTable, k = performClustering(speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) + finalClusteringTable, k = performClustering( + speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) #'Selecting best clustering...' if self.bestClusteringCriterion == 'elbow': - bestClusteringID = getBestClustering(self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) + bestClusteringID = getBestClustering( + self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) elif self.bestClusteringCriterion == 'spectral': - bestClusteringID = getSpectralClustering(self.metric_clusteringSelection,finalClusteringTable,self.N_init,segmentBKTable,segmentCVTable,k,self.sigma,self.percentile,self.maxNrSpeakers)+1 - - if self.resegmentation and np.size(np.unique(finalClusteringTable[:,bestClusteringID.astype(int)-1]),0)>1: - finalClusteringTableResegmentation,finalSegmentTable = performResegmentation(data,speechMapping, mask,finalClusteringTable[:,bestClusteringID.astype(int)-1],segmentTable,self.modelSize,self.nbIter,self.smoothWin,nSpeechFeatures) - seg = getSegments(self.frame_shift_s,finalSegmentTable, np.squeeze(finalClusteringTableResegmentation), audio.dur) + bestClusteringID = getSpectralClustering(self.metric_clusteringSelection, finalClusteringTable, + self.N_init, segmentBKTable, segmentCVTable, k, self.sigma, self.percentile, self.maxNrSpeakers)+1 + + if self.resegmentation and np.size(np.unique(finalClusteringTable[:, bestClusteringID.astype(int)-1]), 0) > 1: + finalClusteringTableResegmentation, finalSegmentTable = performResegmentation(data, speechMapping, mask, finalClusteringTable[:, bestClusteringID.astype( + int)-1], segmentTable, self.modelSize, self.nbIter, self.smoothWin, nSpeechFeatures) + seg = getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze( + finalClusteringTableResegmentation), duration) else: - seg = getSegmentationFile(self.frame_shift_s,segmentTable, finalClusteringTable[:,bestClusteringID.astype(int)-1]) + return None + self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) except ValueError as v: self.log.info(v) - return [[0, audio.dur, 1], - [audio.dur, -1, -1]] + return [[0, duration, 1], + [duration, -1, -1]] except Exception as e: self.log.error(e) - raise ValueError("Speaker Diarization failed!!!") + return None else: return seg - -class SttStandelone: - def __init__(self,metadata=False,spkDiarization=False): - self.log = logging.getLogger('__stt-standelone-worker__.SttStandelone') - self.metadata = metadata - self.spkDiarization = spkDiarization - self.timestamp = True if self.metadata or self.spkDiarization else False - - def run(self,audio,asr,spk): - feats = asr.compute_feat(audio) - mfcc, ivector = asr.get_frames(feats) - if self.spkDiarization: - with ThreadPoolExecutor(max_workers=2) as executor: - thrd1 = executor.submit(asr.decoder, feats) - thrd2 = executor.submit(spk.run, audio, mfcc) - decode = thrd1.result() - spkSeg = 
thrd2.result() - else: - decode = asr.decoder(feats) - spkSeg = [] - - if self.timestamp: - timestamps = asr.wordTimestamp(decode) - output = self.getOutput(timestamps,asr.frame_shift, asr.decodable_opts.frame_subsampling_factor,spkSeg) - if self.metadata: - return output - else: - return {"text":output["text"]} - else: - text = re.sub(r"#nonterm:[^ ]* ", "", decode["text"]) - return text - - def getOutput(self,timestamps,frame_shift, frame_subsampling, spkSeg = []): - output = {} - if len(spkSeg) == 0: - text = "" - output["words"] = [] - for i in range(len(timestamps["words"])): - if timestamps["words"][i] != "": - meta = {} - meta["word"] = timestamps["words"][i] - meta["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) - meta["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) - output["words"].append(meta) - text += " "+meta["word"] - output["text"] = text - else: - output["speakers"] = [] - output["text"] = [] - j = 0 - newSpk = 1 - for i in range(len(timestamps["words"])): - if timestamps["words"][i] != "": - if newSpk: - speaker = {} - speaker["speaker_id"] = "spk_"+str(int(spkSeg[j][2])) - speaker["words"] = [] - txtSpk = speaker["speaker_id"]+":" - newSpk = 0 - word = {} - word["word"] = timestamps["words"][i] - word["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) - word["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) - speaker["words"].append(word) - txtSpk += " "+word["word"] - if word["etime"] > spkSeg[j+1][0]: - speaker["btime"] = speaker["words"][0]["btime"] - speaker["etime"] = speaker["words"][-1]["etime"] - output["speakers"].append(speaker) - output["text"].append(txtSpk) - newSpk = 1 - j += 1 - #add the last speaker to the output speakers - speaker["btime"] = speaker["words"][0]["btime"] - speaker["etime"] = speaker["words"][-1]["etime"] - output["speakers"].append(speaker) - output["text"].append(txtSpk) - return output - - -class Audio: - def __init__(self,sr): - self.log = logging.getLogger('__stt-standelone-worker__.Audio') - self.bit = 16 - self.channels = 1 - self.sr = sr - - def set_logger(self,log): - self.log = log - - def transform(self,file_name): - try: - tfm = sox.Transformer() - tfm.set_output_format(rate=self.sr, - bits=self.bit, - channels=self.channels) - self.data = tfm.build_array(input_filepath=file_name) - self.dur = len(self.data) / self.sr - except Exception as e: - self.log.error(e) - raise ValueError("The uploaded file format is not supported!!!") - - def getDataKaldyVector(self): - return Vector(self.data) \ No newline at end of file diff --git a/vosk-api b/vosk-api new file mode 160000 index 0000000..fec4a1a --- /dev/null +++ b/vosk-api @@ -0,0 +1 @@ +Subproject commit fec4a1ad76a3c2e66bad84acd5cead2070b3d1b6 From 58fbcecea9f25487ab73dfe36411aa58599a6b45 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 25 Sep 2020 03:16:45 +0200 Subject: [PATCH 021/172] update submodules --- .gitmodules | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.gitmodules b/.gitmodules index 9cea8d6..b131dc4 100644 --- a/.gitmodules +++ b/.gitmodules @@ -1,6 +1,6 @@ [submodule "vosk-api"] path = vosk-api - url = git@github.com:irebai/vosk-api.git + url = https://github.com/irebai/vosk-api.git [submodule "pyBK"] path = pyBK - url = git@github.com:irebai/pyBK.git + url = https://github.com/irebai/pyBK.git From bcb5cc208602ebad9877226cab46cacb60b2d965 Mon Sep 17 00:00:00 2001 From: Ilyes 
Rebai Date: Fri, 25 Sep 2020 14:05:33 +0200 Subject: [PATCH 022/172] add audio file exception --- tools.py | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/tools.py b/tools.py index 8cc3715..8f21607 100644 --- a/tools.py +++ b/tools.py @@ -81,13 +81,15 @@ def swaggerUI(self, app): def getAudio(self,file): - file_path = self.TEMP_FILE_PATH+"/"+file.filename.lower() - file.save(file_path) - self.rate, self.data = scipy.io.wavfile.read(file_path) - - if not self.SAVE_AUDIO: - os.remove(file_path) - + try: + file_path = self.TEMP_FILE_PATH+"/"+file.filename.lower() + file.save(file_path) + self.rate, self.data = scipy.io.wavfile.read(file_path) + if not self.SAVE_AUDIO: + os.remove(file_path) + except Exception as e: + raise ValueError('Unsupported audio file! Only WAVE format is supported.') + # re-create config files def loadConfig(self): # load decoder parameters from "decode.cfg" From a671b6cac69cde158631e4d6252ada5443b4adfc Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 30 Sep 2020 16:53:19 +0200 Subject: [PATCH 023/172] update the response format --- document/swagger.yml | 15 +--- run.py | 41 +++------- tools.py | 187 +++++++++++++++++++------------------------ 3 files changed, 92 insertions(+), 151 deletions(-) diff --git a/document/swagger.yml b/document/swagger.yml index 57e818f..8a93b7c 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,22 +24,9 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, aiff, flac, ogg)" + description: "Audio File - Waveform Format" required: true type: "file" - - name: "nbrSpeaker" - in: "formData" - description: "Number of speakers in the audio" - required: false - type: "number" - default: 1 - - name: "speaker" - in: "formData" - description: "Do speaker diarization" - required: false - type: "string" - enum: [ "Yes", "No" ] - default: "No" responses: 200: description: Successfully transcribe the audio diff --git a/run.py b/run.py index a95cf47..f4dc9c2 100644 --- a/run.py +++ b/run.py @@ -28,51 +28,30 @@ def transcribe(): worker.log.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) - metadata = worker.METADATA - nbrSpk = 10 + is_metadata = False + nbrOfSpk = 10 # get response content type if request.headers.get('accept').lower() == 'application/json': - metadata = True + is_metadata = True elif request.headers.get('accept').lower() == 'text/plain': - metadata = False + is_metadata = False else: raise ValueError('Not accepted header') - # get speaker parameter - spkDiarization = False - if request.form.get('speaker') != None and (request.form.get('speaker').lower() == 'yes' or request.form.get('speaker').lower() == 'no'): - spkDiarization = True if request.form.get( - 'speaker').lower() == 'yes' else False - # get number of speakers parameter - try: - if request.form.get('nbrSpeaker') != None and spkDiarization and int(request.form.get('nbrSpeaker')) > 0: - nbrSpk = int(request.form.get('nbrSpeaker')) - elif request.form.get('nbrSpeaker') != None and spkDiarization: - raise ValueError( - 'Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') - except Exception as e: - worker.log.error(e) - raise ValueError( - 'Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') - else: - if request.form.get('speaker') != None: - raise ValueError('Not accepted "speaker" field value (yes|no)') - # get input file if 'file' in request.files.keys(): file = request.files['file'] worker.getAudio(file) - rec = KaldiRecognizer(model, 
worker.rate, metadata) - response = rec.Decode(worker.data) - if metadata: - obj = rec.GetMetadata() - data = json.loads(obj) - response = worker.process_metadata(data, spkDiarization, nbrSpk) + rec = KaldiRecognizer(model, worker.rate, is_metadata) + data_ = rec.Decode(worker.data) + if is_metadata: + data_ = rec.GetMetadata() + data = worker.get_response(data_, is_metadata, is_metadata, nbrOfSpk) else: raise ValueError('No audio file was uploaded') - return response, 200 + return data, 200 except ValueError as error: return str(error), 400 except Exception as e: diff --git a/tools.py b/tools.py index 8f21607..05fe391 100644 --- a/tools.py +++ b/tools.py @@ -39,7 +39,6 @@ def __init__(self): self.SAVE_AUDIO=False self.SERVICE_PORT = 80 self.NBR_THREADS = 100 - self.METADATA = True self.SWAGGER_URL = '/api-doc' self.SWAGGER_PATH = '' @@ -64,7 +63,6 @@ def __init__(self): self.log.info("Create the new config files") self.loadConfig() - def swaggerUI(self, app): ### swagger specific ### swagger_yml = yaml.load(open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) @@ -79,7 +77,6 @@ def swaggerUI(self, app): app.register_blueprint(swaggerui, url_prefix=self.SWAGGER_URL) ### end swagger specific ### - def getAudio(self,file): try: file_path = self.TEMP_FILE_PATH+"/"+file.filename.lower() @@ -167,112 +164,90 @@ def loadConfig(self): else: f.write(id+" nonword\n") - # TODO: metadata (timestamps, speakers, save audio) - # return at the end of streaming a json object including word-data, speaker-data - # (get frames after the end of decoding) - def process_metadata(self, metadata, spkDiarization, nbrSpk=10): - if metadata is not None and 'words' in metadata and 'features' in metadata: - if not spkDiarization: - del metadata['features'] - del metadata['segments'] - return metadata - - - features = metadata['features'] - seg = metadata['segments'] if metadata['segments'] is not None else [] - feats = np.array(features) - feats = np.squeeze(feats) - mask = np.ones(shape=(feats.shape[0],)) - - for pos in seg: - mask[pos-30:pos]=0 - - spk = SpeakerDiarization() - spk.set_maxNrSpeakers(nbrSpk) - spkrs = spk.run(feats,mask) - - speaker = [] - i = 0 - text = "" - for word in metadata['words']: - if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: - text += word["word"] + " " - else: - speaker.append({'spk'+str(int(spkrs[i][2])) : text}) - i+=1 - text="" - speaker.append({'spk'+str(int(spkrs[i][2])) : text}) - - metadata["speakers"]=speaker - - # vad = metadata['silweights'] - # weights = np.zeros(shape=(vad[len(vad)-2]+1,)) - # id = [] - # w = [] - # for i in range(0, len(vad), 2): - # id.append(vad[i]) - # w.append(vad[i+1]) - # weights[vad[i]] = vad[i+1] - # self.log.info(id) - # self.log.info(w) - # self.log.info(weights) - - del metadata['features'] - del metadata['segments'] - - return metadata + # remove extra symbols + def parse_text(self, text): + text = re.sub(r"", "", text) # remove symbol + text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark + text = re.sub(r"' ", "'", text) # remove space after quote ' + text = re.sub(r" +", " ", text) # remove multiple spaces + text = text.strip() + return text + + # Postprocess response + def get_response(self, dataJson, is_metadata, is_spkDiarization, nbrOfSpk): + if dataJson is not None: + data = json.loads(dataJson) + if not is_metadata: + text = data['text'] # get text from response + return self.parse_text(text) + + elif 'words' in data and 'features' in data: + if is_spkDiarization: + # Get Features and spoken segments and clean data + 
features = data['features'] + seg = data['segments'] if data['segments'] is not None else [] + del data['features'] + del data['segments'] + + # Prepare the parameters for SpeakerDiarization input + feats = np.array(features) + feats = np.squeeze(feats) + mask = np.ones(shape=(feats.shape[0],)) + for pos in seg: + mask[pos-30:pos]=0 + + # Do speaker diarization and get speaker segments + spk = SpeakerDiarization() + spk.set_maxNrSpeakers(nbrOfSpk) + spkrs = spk.run(feats,mask) + + # Generate final output data + return self.process_output(data, spkrs) + + del data['features'] + del data['segments'] + return data + else: + return {'speakers': [], 'text': '', 'words': []} else: return {'speakers': [], 'text': '', 'words': []} -# def process_metadata_conversation_manager(self, metadata): -# features = metadata['features'] -# seg = metadata['segments'] if metadata['segments'] is not None else [] -# feats = np.array(features) -# feats = np.squeeze(feats) -# mask = np.ones(shape=(feats.shape[0],)) -# -# for pos in seg: -# mask[pos-30:pos]=0 -# -# spk = SpeakerDiarization() -# spk.set_maxNrSpeakers(10) -# spkrs = spk.run(feats,mask) -# -# speakers = [] -# text = [] -# i = 0 -# text_ = "" -# words=[] -# if 'words' in metadata: -# for word in metadata['words']: -# if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: -# text_ += word["word"] + " " -# words.append(word) -# else: -# speaker = {} -# speaker["btime"]=words[0]["start"] -# speaker["etime"]=words[len(words)-1]["end"] -# speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) -# speaker["words"]=words -# -# text.append('spk'+str(int(spkrs[i][2]))+' : '+text_) -# speakers.append(speaker) -# -# words=[] -# text_="" -# i+=1 -# -# speaker = {} -# speaker["btime"]=words[0]["start"] -# speaker["etime"]=words[len(words)-1]["end"] -# speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) -# speaker["words"]=words -# -# text.append('spk'+str(int(spkrs[i][2]))+' : '+text_) -# speakers.append(speaker) -# return json.dumps({'speakers': speakers, 'text': text}) -# else: -# return json.dumps({'speakers': [], 'text': '', 'words': []}) + + # return a json object including word-data, speaker-data + def process_output(self, data, spkrs): + speakers = [] + text = [] + i = 0 + text_ = "" + words=[] + for word in data['words']: + if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: + text_ += word["word"] + " " + words.append(word) + else: + speaker = {} + speaker["start"]=words[0]["start"] + speaker["end"]=words[len(words)-1]["end"] + speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) + speaker["words"]=words + + text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + speakers.append(speaker) + + words=[word] + text_=word["word"] + " " + i+=1 + + speaker = {} + speaker["start"]=words[0]["start"] + speaker["end"]=words[len(words)-1]["end"] + speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) + speaker["words"]=words + + text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + speakers.append(speaker) + + return {'speakers': speakers, 'text': text} class SpeakerDiarization: From bc939cf3f119a12d04833d76c50b557ea7de1e6c Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 1 Oct 2020 14:01:43 +0200 Subject: [PATCH 024/172] update installation and remove healthcheck API --- Dockerfile | 3 --- Jenkinsfile | 2 -- run.py | 5 ----- 3 files changed, 10 deletions(-) diff --git a/Dockerfile b/Dockerfile index 604bdb7..1c6f518 100644 --- a/Dockerfile +++ b/Dockerfile @@ -72,9 +72,6 @@ WORKDIR /usr/src/speech-to-text # Install main service packages RUN 
pip3 install flask flask-cors flask-swagger-ui gevent pyyaml -# Set environment variables -ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH - COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py . COPY run.py . diff --git a/Jenkinsfile b/Jenkinsfile index 5f464c5..b4bdffc 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -24,7 +24,6 @@ pipeline { docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push("${VERSION}") image.push('latest') - image.push('offline') } } } @@ -44,7 +43,6 @@ pipeline { ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push('latest-unstable') - image.push('offline') } } } diff --git a/run.py b/run.py index f4dc9c2..b548209 100644 --- a/run.py +++ b/run.py @@ -59,13 +59,8 @@ def transcribe(): return 'Server Error', 500 -@app.route('/healthcheck', methods=['GET']) -def check(): - return '', 200 # Rejected request handlers - - @app.errorhandler(405) def method_not_allowed(error): return 'The method is not allowed for the requested URL', 405 From c07e43a4e6002d2269cafb793c350e60a5cf2f2d Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 2 Oct 2020 19:21:29 +0200 Subject: [PATCH 025/172] update audio file reader and change the production server --- Dockerfile | 2 +- run.py | 13 ++++++------- tools.py | 14 ++++++++------ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/Dockerfile b/Dockerfile index 6608943..6fa563b 100644 --- a/Dockerfile +++ b/Dockerfile @@ -115,7 +115,7 @@ RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ WORKDIR /usr/src/speech-to-text # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger librosa webrtcvad scipy sklearn +RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger librosa webrtcvad scipy sklearn gevent RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git \ && git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ && cp /pykaldi/tools/pyBK/diarizationFunctions.py . 
diff --git a/run.py b/run.py index ecdbb18..1195d28 100755 --- a/run.py +++ b/run.py @@ -7,12 +7,13 @@ from tools import ASR, Audio, SpeakerDiarization, SttStandelone import yaml, os, sox, logging from time import gmtime, strftime +from gevent.pywsgi import WSGIServer app = Flask("__stt-standelone-worker__") # Set logger config logger = logging.getLogger(__name__) -logging.basicConfig(level=logging.DEBUG) +logging.basicConfig(level=logging.INFO) # Main parameters AM_PATH = '/opt/models/AM' @@ -61,7 +62,7 @@ def swaggerUI(): def getAudio(file,audio): file_path = TEMP_FILE_PATH+file.filename.lower() file.save(file_path) - audio.transform(file_path) + audio.read_audio(file_path) if not SAVE_AUDIO: os.remove(file_path) @@ -116,10 +117,6 @@ def transcribe(): app.logger.error(e) return 'Server Error', 500 -@app.route('/healthcheck', methods=['GET']) -def check(): - return '', 200 - # Rejected request handlers @app.errorhandler(405) def method_not_allowed(error): @@ -144,7 +141,9 @@ def server_error(error): asr.run() #Run server - app.run(host='0.0.0.0', port=SERVICE_PORT, debug=False, threaded=False, processes=NBR_PROCESSES) + app.logger.info('Server ready for transcription...') + http_server = WSGIServer(('', SERVICE_PORT), app) + http_server.serve_forever() except Exception as e: app.logger.error(e) exit(e) \ No newline at end of file diff --git a/tools.py b/tools.py index 168bb1a..d021a58 100644 --- a/tools.py +++ b/tools.py @@ -37,6 +37,7 @@ ## other packages import configparser, sys, os, re, sox, time, logging from concurrent.futures import ThreadPoolExecutor +import scipy.io.wavfile ############## class ASR: @@ -602,13 +603,14 @@ def __init__(self,sr): def set_logger(self,log): self.log = log - def transform(self,file_name): + def read_audio(self, audio): try: - tfm = sox.Transformer() - tfm.set_output_format(rate=self.sr, - bits=self.bit, - channels=self.channels) - self.data = tfm.build_array(input_filepath=file_name) + data, sr = librosa.load(audio,sr=None) + if sr != self.sr: + self.log.info('Resample audio file: '+str(sr)+'Hz -> '+str(self.sr)+'Hz') + data = librosa.resample(data, sr, self.sr) + data = (data * 32767).astype(np.int16) + self.data = data self.dur = len(self.data) / self.sr except Exception as e: self.log.error(e) From 25b0e0d236b85e74cd8c5c0f8c1f97f41fa71e95 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Sat, 3 Oct 2020 15:52:23 +0200 Subject: [PATCH 026/172] clean code --- .envdefault | 3 +- Dockerfile | 5 +- document/swagger.yml | 15 +- run.py | 101 +----- tools.py | 811 ++++++++++++++++++++++--------------------- 5 files changed, 438 insertions(+), 497 deletions(-) diff --git a/.envdefault b/.envdefault index 80acea5..2246e24 100644 --- a/.envdefault +++ b/.envdefault @@ -1,4 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=/path/to/swagger/file -NBR_PROCESSES=1 \ No newline at end of file +SWAGGER_PATH=/path/to/swagger/file \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index 6fa563b..5e9f2fe 100644 --- a/Dockerfile +++ b/Dockerfile @@ -115,9 +115,8 @@ RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ WORKDIR /usr/src/speech-to-text # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui configparser pyyaml logger librosa webrtcvad scipy sklearn gevent -RUN apt-get install -y libsox-fmt-all && pip3 install git+https://github.com/rabitt/pysox.git \ - && git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ +RUN pip3 install flask 
flask-cors flask-swagger-ui pyyaml librosa gevent +RUN git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ && cp /pykaldi/tools/pyBK/diarizationFunctions.py . # Set environment variables diff --git a/document/swagger.yml b/document/swagger.yml index 57e818f..e763d3b 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,22 +24,9 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, aiff, flac, ogg)" + description: "Audio File (wav, mp3, flac, ogg)" required: true type: "file" - - name: "nbrSpeaker" - in: "formData" - description: "Number of speakers in the audio" - required: false - type: "number" - default: 1 - - name: "speaker" - in: "formData" - description: "Do speaker diarization" - required: false - type: "string" - enum: [ "Yes", "No" ] - default: "No" responses: 200: description: Successfully transcribe the audio diff --git a/run.py b/run.py index 1195d28..e485961 100755 --- a/run.py +++ b/run.py @@ -2,77 +2,24 @@ # -*- coding: utf-8 -*- from flask import Flask, request, abort, Response, json -from flask_swagger_ui import get_swaggerui_blueprint -from flask_cors import CORS -from tools import ASR, Audio, SpeakerDiarization, SttStandelone -import yaml, os, sox, logging +from tools import ASR, SttStandelone from time import gmtime, strftime from gevent.pywsgi import WSGIServer +import os app = Flask("__stt-standelone-worker__") -# Set logger config -logger = logging.getLogger(__name__) -logging.basicConfig(level=logging.INFO) +stt = SttStandelone() -# Main parameters -AM_PATH = '/opt/models/AM' -LM_PATH = '/opt/models/LM' -TEMP_FILE_PATH = '/opt/tmp' -CONFIG_FILES_PATH = '/opt/config' -NBR_PROCESSES = 1 -SAVE_AUDIO = False -SERVICE_PORT = 80 -SWAGGER_URL = '/api-doc' -SWAGGER_PATH = '' -asr = ASR(AM_PATH,LM_PATH, CONFIG_FILES_PATH) +# Load ASR models (acoustic model and decoding graph) +stt.log.info('Load acoustic model and decoding graph') +asr = ASR(stt.AM_PATH, stt.LM_PATH, stt.CONFIG_FILES_PATH) -if not os.path.isdir(TEMP_FILE_PATH): - os.mkdir(TEMP_FILE_PATH) -if not os.path.isdir(CONFIG_FILES_PATH): - os.mkdir(CONFIG_FILES_PATH) -# Environment parameters -if 'SERVICE_PORT' in os.environ: - SERVICE_PORT = os.environ['SERVICE_PORT'] -if 'SAVE_AUDIO' in os.environ: - SAVE_AUDIO = os.environ['SAVE_AUDIO'] -if 'NBR_PROCESSES' in os.environ: - if int(os.environ['NBR_PROCESSES']) > 0: - NBR_PROCESSES = int(os.environ['NBR_PROCESSES']) - else: - exit("You must to provide a positif number of processes 'NBR_PROCESSES'") -if 'SWAGGER_PATH' in os.environ: - SWAGGER_PATH = os.environ['SWAGGER_PATH'] - -def swaggerUI(): - ### swagger specific ### - swagger_yml = yaml.load(open(SWAGGER_PATH, 'r'), Loader=yaml.Loader) - swaggerui = get_swaggerui_blueprint( - SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' - SWAGGER_PATH, - config={ # Swagger UI config overrides - 'app_name': "STT API Documentation", - 'spec': swagger_yml - } - ) - app.register_blueprint(swaggerui, url_prefix=SWAGGER_URL) - ### end swagger specific ### - -def getAudio(file,audio): - file_path = TEMP_FILE_PATH+file.filename.lower() - file.save(file_path) - audio.read_audio(file_path) - if not SAVE_AUDIO: - os.remove(file_path) - @app.route('/transcribe', methods=['POST']) def transcribe(): try: - app.logger.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) - # create main objects - spk = SpeakerDiarization() - audio = Audio(asr.get_sample_rate()) + stt.log.info('[%s] New user entry on /transcribe' % 
(strftime("%d/%b/%d %H:%M:%S", gmtime()))) #get response content type metadata = False @@ -83,30 +30,11 @@ def transcribe(): else: raise ValueError('Not accepted header') - #get speaker parameter - spkDiarization = False - if request.form.get('speaker') != None and (request.form.get('speaker').lower() == 'yes' or request.form.get('speaker').lower() == 'no'): - spkDiarization = True if request.form.get('speaker').lower() == 'yes' else False - #get number of speakers parameter - try: - if request.form.get('nbrSpeaker') != None and spkDiarization and int(request.form.get('nbrSpeaker')) > 0: - spk.set_maxNrSpeakers(int(request.form.get('nbrSpeaker'))) - elif request.form.get('nbrSpeaker') != None and spkDiarization: - raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') - except Exception as e: - app.logger.error(e) - raise ValueError('Not accepted "nbrSpeaker" field value (nbrSpeaker>0)') - else: - if request.form.get('speaker') != None: - raise ValueError('Not accepted "speaker" field value (yes|no)') - - stt = SttStandelone(metadata,spkDiarization) - #get input file if 'file' in request.files.keys(): file = request.files['file'] - getAudio(file,audio) - output = stt.run(audio,asr,spk) + stt.read_audio(file,asr.get_sample_rate()) + output = stt.run(asr, metadata) else: raise ValueError('No audio file was uploaded') @@ -133,16 +61,13 @@ def server_error(error): if __name__ == '__main__': try: - #start SwaggerUI - if SWAGGER_PATH != '': - swaggerUI() - - #Run ASR engine - asr.run() + # start SwaggerUI + if os.path.exists(stt.SWAGGER_PATH): + stt.swaggerUI(app) #Run server app.logger.info('Server ready for transcription...') - http_server = WSGIServer(('', SERVICE_PORT), app) + http_server = WSGIServer(('', stt.SERVICE_PORT), app) http_server.serve_forever() except Exception as e: app.logger.error(e) diff --git a/tools.py b/tools.py index d021a58..cfb6117 100644 --- a/tools.py +++ b/tools.py @@ -1,4 +1,4 @@ -## Kaldi ASR decoder +# Kaldi ASR decoder from kaldi.asr import NnetLatticeFasterOnlineRecognizer from kaldi.decoder import (LatticeFasterDecoderOptions, LatticeFasterOnlineDecoder) @@ -14,17 +14,17 @@ from kaldi.matrix import Matrix, Vector ############## -## word to CTM +# word to CTM from kaldi.lat.align import (WordBoundaryInfoNewOpts, - WordBoundaryInfo, - word_align_lattice) + WordBoundaryInfo, + word_align_lattice) from kaldi.lat.functions import (compact_lattice_to_word_alignment, compact_lattice_shortest_path) from kaldi.asr import NnetRecognizer import kaldi.fstext as _fst ############## -## Speaker Diarization +# Speaker Diarization from diarizationFunctions import * import numpy as np import librosa @@ -34,191 +34,152 @@ from kaldi.util.options import ParseOptions ############## -## other packages -import configparser, sys, os, re, sox, time, logging -from concurrent.futures import ThreadPoolExecutor -import scipy.io.wavfile +# other packages +import configparser, sys, os, re, time, logging, yaml +from flask_swagger_ui import get_swaggerui_blueprint ############## + class ASR: def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): self.log = logging.getLogger('__stt-standelone-worker__.ASR') self.AM_PATH = AM_PATH self.LM_PATH = LM_PATH self.CONFIG_FILES_PATH = CONFIG_FILES_PATH - - def run(self): - def loadConfig(self): - #get decoder parameters from "decode.cfg" - decoder_settings = configparser.ConfigParser() - decoder_settings.read(self.AM_PATH+'/decode.cfg') - self.DECODER_SYS = decoder_settings.get('decoder_params', 'decoder') - self.AM_FILE_PATH = 
decoder_settings.get('decoder_params', 'ampath') - self.DECODER_MINACT = int(decoder_settings.get('decoder_params', 'min_active')) - self.DECODER_MAXACT = int(decoder_settings.get('decoder_params', 'max_active')) - self.DECODER_BEAM = float(decoder_settings.get('decoder_params', 'beam')) - self.DECODER_LATBEAM = float(decoder_settings.get('decoder_params', 'lattice_beam')) - self.DECODER_ACWT = float(decoder_settings.get('decoder_params', 'acwt')) - self.DECODER_FSF = int(decoder_settings.get('decoder_params', 'frame_subsampling_factor')) - - #Prepare "online.conf" - self.AM_PATH=self.AM_PATH+"/"+self.AM_FILE_PATH - with open(self.AM_PATH+"/conf/online.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--ivector-extraction-config="+self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") - f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf") - - #Prepare "ivector_extractor.conf" - with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") - f.write("--cmvn-config="+self.AM_PATH+"/conf/online_cmvn.conf\n") - f.write("--lda-matrix="+self.AM_PATH+"/ivector_extractor/final.mat\n") - f.write("--global-cmvn-stats="+self.AM_PATH+"/ivector_extractor/global_cmvn.stats\n") - f.write("--diag-ubm="+self.AM_PATH+"/ivector_extractor/final.dubm\n") - f.write("--ivector-extractor="+self.AM_PATH+"/ivector_extractor/final.ie") - - #Prepare "word_boundary.int" if not exist - if not os.path.exists(self.LM_PATH+"/word_boundary.int"): - if os.path.exists(self.AM_PATH+"phones.txt"): - with open(self.AM_PATH+"phones.txt") as f: - phones = f.readlines() - - with open(self.LM_PATH+"/word_boundary.int", "w") as f: - for phone in phones: - phone = phone.strip() - phone = re.sub('^ .*','', phone) - phone = re.sub('^#\d+ .*','', phone) - if phone != '': - id = phone.split(' ')[1] - if '_I ' in phone: - f.write(id+" internal\n") - elif '_B ' in phone: - f.write(id+" begin\n") - elif '_E ' in phone: - f.write(id+" end\n") - elif '_S ' in phone: - f.write(id+" singleton\n") - else: - f.write(id+" nonword\n") - - else: - raise ValueError('Neither word_boundary.int nor phones.txt exists!!!') + self.LoadModels() + def LoadModels(self): try: # Define online feature pipeline - self.log.info("Load decoder config") - loadConfig(self) - feat_opts = OnlineNnetFeaturePipelineConfig() - self.endpoint_opts = OnlineEndpointConfig() po = ParseOptions("") - feat_opts.register(po) + + decoder_opts = LatticeFasterDecoderOptions() + self.endpoint_opts = OnlineEndpointConfig() + self.decodable_opts = NnetSimpleLoopedComputationOptions() + feat_opts = OnlineNnetFeaturePipelineConfig() + + + decoder_opts.register(po) self.endpoint_opts.register(po) + self.decodable_opts.register(po) + feat_opts.register(po) + po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") - self.feat_info = OnlineNnetFeaturePipelineInfo.from_config(feat_opts) - + self.feat_info = OnlineNnetFeaturePipelineInfo.from_config( + feat_opts) + # Set metadata parameters self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 + self.acwt = self.decodable_opts.acoustic_scale - # Construct recognizer - self.log.info("Load Decoder model") - decoder_opts = LatticeFasterDecoderOptions() - decoder_opts.beam = 
self.DECODER_BEAM - decoder_opts.max_active = self.DECODER_MAXACT - decoder_opts.min_active = self.DECODER_MINACT - decoder_opts.lattice_beam = self.DECODER_LATBEAM - self.decodable_opts = NnetSimpleLoopedComputationOptions() - self.decodable_opts.acoustic_scale = self.DECODER_ACWT - self.decodable_opts.frame_subsampling_factor = self.DECODER_FSF - self.decodable_opts.frames_per_chunk = 150 - # Load Acoustic and graph models and other files - self.transition_model, self.acoustic_model = NnetRecognizer.read_model(self.AM_PATH+"/final.mdl") + self.transition_model, self.acoustic_model = NnetRecognizer.read_model( + self.AM_PATH+"/final.mdl") graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") - self.decoder_graph = LatticeFasterOnlineDecoder(graph, decoder_opts) - self.symbols = _fst.SymbolTable.read_text(self.LM_PATH+"/words.txt") - self.info = WordBoundaryInfo.from_file(WordBoundaryInfoNewOpts(),self.LM_PATH+"/word_boundary.int") + self.decoder_graph = LatticeFasterOnlineDecoder( + graph, decoder_opts) + self.symbols = _fst.SymbolTable.read_text( + self.LM_PATH+"/words.txt") + self.info = WordBoundaryInfo.from_file( + WordBoundaryInfoNewOpts(), self.LM_PATH+"/word_boundary.int") + + + self.asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, + self.symbols, decodable_opts=self.decodable_opts, endpoint_opts=self.endpoint_opts) del graph, decoder_opts except Exception as e: self.log.error(e) - raise ValueError("AM and LM loading failed!!! (see logs for more details)") + raise ValueError( + "AM and LM loading failed!!! (see logs for more details)") def get_sample_rate(self): return self.samp_freq - def get_frames(self,feat_pipeline): + def get_frames(self, feat_pipeline): rows = feat_pipeline.num_frames_ready() cols = feat_pipeline.dim() - frames = Matrix(rows,cols) - feat_pipeline.get_frames(range(rows),frames) - return frames[:,:self.feat_info.mfcc_opts.num_ceps], frames[:,self.feat_info.mfcc_opts.num_ceps:] + frames = Matrix(rows, cols) + feat_pipeline.get_frames(range(rows), frames) + return frames[:, :self.feat_info.mfcc_opts.num_ceps], frames[:, self.feat_info.mfcc_opts.num_ceps:] # return feats + ivectors - - def compute_feat(self,audio): + + def compute_feat(self, wav): try: feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) - feat_pipeline.accept_waveform(audio.sr, audio.getDataKaldyVector()) + feat_pipeline.accept_waveform(self.samp_freq, wav) feat_pipeline.input_finished() except Exception as e: self.log.error(e) raise ValueError("Feature extraction failed!!!") else: return feat_pipeline - - def decoder(self,feats): + + def decoder(self, feats): try: start_time = time.time() self.log.info("Start Decoding: %s" % (start_time)) - asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, - self.symbols, decodable_opts= self.decodable_opts, endpoint_opts=self.endpoint_opts) - asr.set_input_pipeline(feats) - decode = asr.decode() - self.log.info("Decode time in seconds: %s" % (time.time() - start_time)) + self.asr.set_input_pipeline(feats) + decode = self.asr.decode() + self.log.info("Decode time in seconds: %s" % + (time.time() - start_time)) except Exception as e: self.log.error(e) raise ValueError("Decoder failed to transcribe the input audio!!!") else: return decode - - def wordTimestamp(self,decode): + + def wordTimestamp(self, text, lattice, frame_shift, frame_subsampling): try: - _fst.utils.scale_compact_lattice([[1.0, 0],[0, float(self.DECODER_ACWT)]], decode['lattice']) - 
bestPath = compact_lattice_shortest_path(decode['lattice']) - _fst.utils.scale_compact_lattice([[1.0, 0],[0, 1.0/float(self.DECODER_ACWT)]], bestPath) - bestLattice = word_align_lattice(bestPath, self.transition_model, self.info, 0) + _fst.utils.scale_compact_lattice( + [[1.0, 0], [0, float(self.acwt)]], lattice) + bestPath = compact_lattice_shortest_path(lattice) + _fst.utils.scale_compact_lattice( + [[1.0, 0], [0, 1.0/float(self.acwt)]], bestPath) + bestLattice = word_align_lattice( + bestPath, self.transition_model, self.info, 0) alignment = compact_lattice_to_word_alignment(bestLattice[1]) words = _fst.indices_to_symbols(self.symbols, alignment[0]) + start = alignment[1] + dur = alignment[2] + + output = {} + output["words"] = [] + for i in range(len(words)): + meta = {} + meta["word"] = words[i] + meta["start"] = round(start[i] * frame_shift * frame_subsampling, 2) + meta["end"] = round((start[i]+dur[i]) * frame_shift * frame_subsampling, 2) + output["words"].append(meta) + text += " "+meta["word"] + output["text"] = text + except Exception as e: self.log.error(e) raise ValueError("Decoder failed to create the word timestamps!!!") else: - return { - "words":words, - "start":alignment[1], - "dur":alignment[2] - } + return output + class SpeakerDiarization: - def __init__(self): - self.log = logging.getLogger('__stt-standelone-worker__.SPKDiarization') - - ### MFCC FEATURES PARAMETERS - self.frame_length_s=0.025 - self.frame_shift_s=0.01 - self.num_bins=40 - self.num_ceps=40 - self.low_freq=40 - self.high_freq=-200 + def __init__(self, sample_rate): + self.log = logging.getLogger( + '__stt-standelone-worker__.SPKDiarization') + + # MFCC FEATURES PARAMETERS + self.sr = sample_rate + self.frame_length_s = 0.025 + self.frame_shift_s = 0.01 + self.num_bins = 40 + self.num_ceps = 40 + self.low_freq = 40 + self.high_freq = -200 + if self.sr == 16000: + self.low_freq = 20 + self.high_freq = 7600 ##### - ### VAD PARAMETERS + # VAD PARAMETERS self.vad_ops = VadEnergyOptions() self.vad_ops.vad_energy_mean_scale = 0.9 self.vad_ops.vad_energy_threshold = 5 @@ -226,83 +187,62 @@ def __init__(self): #vad_ops.vad_proportion_threshold = 0.12 ##### - ### Segment - self.seg_length = 100 # Window size in frames - self.seg_increment = 100 # Window increment after and before window in frames - self.seg_rate = 100 # Window shifting in frames + # Segment + self.seg_length = 100 # Window size in frames + self.seg_increment = 100 # Window increment after and before window in frames + self.seg_rate = 100 # Window shifting in frames ##### - ### KBM - self.minimumNumberOfInitialGaussians = 1024 # Minimum number of Gaussians in the initial pool - self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation - self.windowLength = 200 # Window length for computing Gaussians - self.kbmSize = 320 # Number of final Gaussian components in the KBM - self.useRelativeKBMsize = 1 # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size - self.relKBMsize = 0.3 # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). 
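+        # Note: because useRelativeKBMsize = 1 above, the fixed kbmSize = 320 is effectively
+        # ignored and run() recomputes the KBM size as floor(poolSize * relKBMsize)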
+ # KBM + # Minimum number of Gaussians in the initial pool + self.minimumNumberOfInitialGaussians = 1024 + self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation + self.windowLength = 200 # Window length for computing Gaussians + self.kbmSize = 320 # Number of final Gaussian components in the KBM + # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size + self.useRelativeKBMsize = 1 + # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). + self.relKBMsize = 0.3 ###### - ### BINARY_KEY - self.topGaussiansPerFrame = 5 # Number of top selected components per frame - self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys + # BINARY_KEY + self.topGaussiansPerFrame = 5 # Number of top selected components per frame + self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys ###### - ### CLUSTERING - self.N_init = 16 # Number of initial clusters - self.linkage = 0 # Set to one to perform linkage clustering instead of clustering/reassignment - self.linkageCriterion = 'average' # Linkage criterion used if linkage==1 ('average', 'single', 'complete') - self.metric = 'cosine' # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys + # CLUSTERING + self.N_init = 16 # Number of initial clusters + # Set to one to perform linkage clustering instead of clustering/reassignment + self.linkage = 0 + # Linkage criterion used if linkage==1 ('average', 'single', 'complete') + self.linkageCriterion = 'average' + # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys + self.metric = 'cosine' ###### - ### CLUSTERING_SELECTION - self.metric_clusteringSelection = 'cosine' # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') - self.bestClusteringCriterion = 'elbow' # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering - self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral + # CLUSTERING_SELECTION + # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') + self.metric_clusteringSelection = 'cosine' + # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering + self.bestClusteringCriterion = 'elbow' + self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral self.percentile = 40 - self.maxNrSpeakers = 16 # If known, max nr of speakers in a sesssion in the database. This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps + self.maxNrSpeakers = 10 # If known, max nr of speakers in a sesssion in the database. 
This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps ###### - ### RESEGMENTATION - self.resegmentation = 1 # Set to 1 to perform re-segmentation - self.modelSize = 6 # Number of GMM components - self.nbIter = 10 # Number of expectation-maximization (EM) iterations - self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames + # RESEGMENTATION + self.resegmentation = 1 # Set to 1 to perform re-segmentation + self.modelSize = 6 # Number of GMM components + self.nbIter = 10 # Number of expectation-maximization (EM) iterations + self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames ###### - - def set_maxNrSpeakers(self,nbr): - self.maxNrSpeakers = nbr - - def compute_feat_Librosa(self,audio): - try: - self.log.info("Start feature extraction: %s" % (time.time())) - if audio.sr == 16000: - self.low_freq=20 - self.high_freq=7600 - data = audio.data/32768 - frame_length_inSample = self.frame_length_s * audio.sr - hop = int(self.frame_shift_s * audio.sr) - NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) - mfccNumpy = librosa.feature.mfcc(y=data, - sr=audio.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop, - fmin=self.low_freq, - fmax=self.high_freq).T - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed when extracting features!!!") - else: - return mfccNumpy - def compute_feat_KALDI(self,audio): + def compute_feat_KALDI(self, wav): try: - self.log.info("Start feature extraction: %s" % (time.time())) po = ParseOptions("") mfcc_opts = MfccOptions() mfcc_opts.use_energy = False - mfcc_opts.frame_opts.samp_freq = audio.sr + mfcc_opts.frame_opts.samp_freq = self.sr mfcc_opts.frame_opts.frame_length_ms = self.frame_length_s*1000 mfcc_opts.frame_opts.frame_shift_ms = self.frame_shift_s*1000 mfcc_opts.frame_opts.allow_downsample = False @@ -311,51 +251,33 @@ def compute_feat_KALDI(self,audio): mfcc_opts.mel_opts.high_freq = self.high_freq mfcc_opts.num_ceps = self.num_ceps mfcc_opts.register(po) - + # Create MFCC object and obtain sample frequency mfccObj = Mfcc(mfcc_opts) - mfccKaldi = mfccObj.compute_features(audio.getDataKaldyVector(), audio.sr, 1.0) + mfccKaldi = mfccObj.compute_features(wav, self.sr, 1.0) except Exception as e: self.log.error(e) - raise ValueError("Speaker diarization failed while extracting features!!!") + raise ValueError( + "Speaker diarization failed while extracting features!!!") else: return mfccKaldi - - def computeVAD_WEBRTC(self, audio): - try: - self.log.info("Start VAD: %s" % (time.time())) - data = audio.data/32768 - hop = 30 - va_framed = py_webrtcvad(data, fs=audio.sr, fs_vad=audio.sr, hoplength=hop, vad_mode=0) - segments = get_py_webrtcvad_segments(va_framed,audio.sr) - maskSAD = np.zeros([1,nFeatures]) - for seg in segments: - start=int(np.round(seg[0]/frame_shift_s)) - end=int(np.round(seg[1]/frame_shift_s)) - maskSAD[0][start:end]=1 - except Exception as e: - self.log.error(e) - raise ValueError("Speaker diarization failed while voice activity detection!!!") - else: - return maskSAD - - def computeVAD_KALDI(self, audio, feats=None): + + def computeVAD_KALDI(self, feats): try: - self.log.info("Start VAD: %s" % (time.time())) - vadStream = compute_vad_energy(self.vad_ops,feats) + vadStream = compute_vad_energy(self.vad_ops, feats) vad = Vector(vadStream) VAD = vad.numpy() - - ### segmentation - occurence=[] - value=[] + + #  segmentation + occurence = [] + value = [] 
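+            # The loop below run-length encodes the frame-level VAD decisions: value[i] holds the
+            # speech/non-speech label (1.0/0.0) of run i and occurence[i] its length in frames,
+            # e.g. VAD = [1, 1, 1, 0, 0, 1] -> value = [1, 0, 1], occurence = [3, 2, 1]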
occurence.append(1) value.append(VAD[0]) # compute the speech and non-speech frames - for i in range(1,len(VAD)): + for i in range(1, len(VAD)): if value[-1] == VAD[i]: - occurence[-1]+=1 + occurence[-1] += 1 else: occurence.append(1) value.append(VAD[i]) @@ -368,7 +290,7 @@ def computeVAD_KALDI(self, audio, feats=None): del value[i] del occurence[i] else: - i+=1 + i += 1 # split if and only if the silence is above 50 frames i = 0 @@ -378,16 +300,16 @@ def computeVAD_KALDI(self, audio, feats=None): del value[i] del occurence[i] else: - i+=1 - + i += 1 + # compute VAD mask maskSAD = np.zeros(len(VAD)) - start=0 + start = 0 for i in range(len(occurence)): if value[i] == 1.0: - end=start+occurence[i] + end = start+occurence[i] maskSAD[start:end] = 1 - start=end + start = end else: start += occurence[i] @@ -396,225 +318,334 @@ def computeVAD_KALDI(self, audio, feats=None): self.log.error(v) except Exception as e: self.log.error(e) - raise ValueError("Speaker diarization failed while voice activity detection!!!") + raise ValueError( + "Speaker diarization failed while voice activity detection!!!") else: return maskSAD - def run(self, audio, feats=None): + def run(self, wav, dur, feats=None): try: def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): - numberOfSpeechFeatures = finalSegmentTable[-1,2].astype(int)+1 - solutionVector = np.zeros([1,numberOfSpeechFeatures]) - for i in np.arange(np.size(finalSegmentTable,0)): - solutionVector[0,np.arange(finalSegmentTable[i,1],finalSegmentTable[i,2]+1).astype(int)]=finalClusteringTable[i] - seg = np.empty([0,3]) + numberOfSpeechFeatures = finalSegmentTable[-1, 2].astype(int)+1 + solutionVector = np.zeros([1, numberOfSpeechFeatures]) + for i in np.arange(np.size(finalSegmentTable, 0)): + solutionVector[0, np.arange( + finalSegmentTable[i, 1], finalSegmentTable[i, 2]+1).astype(int)] = finalClusteringTable[i] + seg = np.empty([0, 3]) solutionDiff = np.diff(solutionVector)[0] first = 0 - for i in np.arange(0,np.size(solutionDiff,0)): + for i in np.arange(0, np.size(solutionDiff, 0)): if solutionDiff[i]: last = i+1 seg1 = (first)*frameshift seg2 = (last-first)*frameshift - seg3 = solutionVector[0,last-1] + seg3 = solutionVector[0, last-1] if seg.shape[0] != 0 and seg3 == seg[-1][2]: seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg,[seg1,seg2,seg3])) + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg, [seg1, seg2, seg3])) first = i+1 - last = np.size(solutionVector,1) + last = np.size(solutionVector, 1) seg1 = (first-1)*frameshift seg2 = (last-first+1)*frameshift - seg3 = solutionVector[0,last-1] + seg3 = solutionVector[0, last-1] if seg3 == seg[-1][2]: seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg,[seg1,seg2,seg3])) - seg = np.vstack((seg,[dur,-1,-1])) - seg[0][0]=0.0 + elif seg3 and seg2 > 0.3: # and seg2 > 0.1 + seg = np.vstack((seg, [seg1, seg2, seg3])) + seg = np.vstack((seg, [dur, -1, -1])) + seg[0][0] = 0.0 return seg - + start_time = time.time() self.log.info("Start Speaker Diarization: %s" % (start_time)) - if self.maxNrSpeakers == 1 or audio.dur < 3: - self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) - return [[0, audio.dur, 1], - [audio.dur, -1, -1]] + if self.maxNrSpeakers == 1 or dur < 5: + self.log.info("Speaker Diarization time in seconds: %s" % + (time.time() - start_time)) + return [[0, dur, 1], + [dur, -1, -1]] if feats == None: - feats = self.compute_feat_KALDI(audio) + feats = 
self.compute_feat_KALDI(wav) nFeatures = feats.shape[0] - maskSAD = self.computeVAD_KALDI(audio,feats) - maskUEM = np.ones([1,nFeatures]) + maskSAD = self.computeVAD_KALDI(feats) + maskUEM = np.ones([1, nFeatures]) - mask = np.logical_and(maskUEM,maskSAD) + mask = np.logical_and(maskUEM, maskSAD) mask = mask[0][0:nFeatures] - nSpeechFeatures=np.sum(mask) + nSpeechFeatures = np.sum(mask) speechMapping = np.zeros(nFeatures) - #you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style - #so that we don't lose features on the way - speechMapping[np.nonzero(mask)] = np.arange(1,nSpeechFeatures+1) - data=feats[np.where(mask==1)] + # you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style + # so that we don't lose features on the way + speechMapping[np.nonzero(mask)] = np.arange(1, nSpeechFeatures+1) + data = feats[np.where(mask == 1)] del feats - segmentTable=getSegmentTable(mask,speechMapping,self.seg_length,self.seg_increment,self.seg_rate) - numberOfSegments=np.size(segmentTable,0) - #create the KBM - #set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians + segmentTable = getSegmentTable( + mask, speechMapping, self.seg_length, self.seg_increment, self.seg_rate) + numberOfSegments = np.size(segmentTable, 0) + # create the KBM + # set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians if np.floor((nSpeechFeatures-self.windowLength)/self.minimumNumberOfInitialGaussians) < self.maximumKBMWindowRate: - windowRate = int(np.floor((np.size(data,0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) + windowRate = int(np.floor( + (np.size(data, 0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) else: windowRate = int(self.maximumKBMWindowRate) - + if windowRate == 0: - raise ValueError('The audio is to short in order to perform the speaker diarization!!!') - + raise ValueError( + 'The audio is to short in order to perform the speaker diarization!!!') + poolSize = np.floor((nSpeechFeatures-self.windowLength)/windowRate) - if self.useRelativeKBMsize: + if self.useRelativeKBMsize: kbmSize = int(np.floor(poolSize*self.relKBMsize)) else: kbmSize = int(self.kbmSize) - - #Training pool of',int(poolSize),'gaussians with a rate of',int(windowRate),'frames' - kbm, gmPool = trainKBM(data,self.windowLength,windowRate,kbmSize) - + + # Training pool of',int(poolSize),'gaussians with a rate of',int(windowRate),'frames' + kbm, gmPool = trainKBM( + data, self.windowLength, windowRate, kbmSize) + #'Selected',kbmSize,'gaussians from the pool' - Vg = getVgMatrix(data,gmPool,kbm,self.topGaussiansPerFrame) - + Vg = getVgMatrix(data, gmPool, kbm, self.topGaussiansPerFrame) + #'Computing binary keys for all segments... ' - segmentBKTable, segmentCVTable = getSegmentBKs(segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) - + segmentBKTable, segmentCVTable = getSegmentBKs( + segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) + #'Performing initial clustering... ' - initialClustering = np.digitize(np.arange(numberOfSegments),np.arange(0,numberOfSegments,numberOfSegments/self.N_init)) - - + initialClustering = np.digitize(np.arange(numberOfSegments), np.arange( + 0, numberOfSegments, numberOfSegments/self.N_init)) + #'Performing agglomerative clustering... 
' if self.linkage: - finalClusteringTable, k = performClusteringLinkage(segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) + finalClusteringTable, k = performClusteringLinkage( + segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) else: - finalClusteringTable, k = performClustering(speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) + finalClusteringTable, k = performClustering( + speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) #'Selecting best clustering...' if self.bestClusteringCriterion == 'elbow': - bestClusteringID = getBestClustering(self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) + bestClusteringID = getBestClustering( + self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) elif self.bestClusteringCriterion == 'spectral': - bestClusteringID = getSpectralClustering(self.metric_clusteringSelection,finalClusteringTable,self.N_init,segmentBKTable,segmentCVTable,k,self.sigma,self.percentile,self.maxNrSpeakers)+1 - - if self.resegmentation and np.size(np.unique(finalClusteringTable[:,bestClusteringID.astype(int)-1]),0)>1: - finalClusteringTableResegmentation,finalSegmentTable = performResegmentation(data,speechMapping, mask,finalClusteringTable[:,bestClusteringID.astype(int)-1],segmentTable,self.modelSize,self.nbIter,self.smoothWin,nSpeechFeatures) - seg = getSegments(self.frame_shift_s,finalSegmentTable, np.squeeze(finalClusteringTableResegmentation), audio.dur) + bestClusteringID = getSpectralClustering(self.metric_clusteringSelection, finalClusteringTable, + self.N_init, segmentBKTable, segmentCVTable, k, self.sigma, self.percentile, self.maxNrSpeakers)+1 + + if self.resegmentation and np.size(np.unique(finalClusteringTable[:, bestClusteringID.astype(int)-1]), 0) > 1: + finalClusteringTableResegmentation, finalSegmentTable = performResegmentation(data, speechMapping, mask, finalClusteringTable[:, bestClusteringID.astype( + int)-1], segmentTable, self.modelSize, self.nbIter, self.smoothWin, nSpeechFeatures) + seg = getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze(finalClusteringTableResegmentation), dur) else: - seg = getSegmentationFile(self.frame_shift_s,segmentTable, finalClusteringTable[:,bestClusteringID.astype(int)-1]) - self.log.info("Speaker Diarization time in seconds: %s" % (time.time() - start_time)) + seg = getSegmentationFile( + self.frame_shift_s, segmentTable, finalClusteringTable[:, bestClusteringID.astype(int)-1]) + self.log.info("Speaker Diarization time in seconds: %s" % + (time.time() - start_time)) except ValueError as v: self.log.info(v) - return [[0, audio.dur, 1], - [audio.dur, -1, -1]] + return [[0, dur, 1], + [dur, -1, -1]] except Exception as e: self.log.error(e) raise ValueError("Speaker Diarization failed!!!") else: return seg - + + class SttStandelone: - def __init__(self,metadata=False,spkDiarization=False): - self.log = logging.getLogger('__stt-standelone-worker__.SttStandelone') - self.metadata = metadata - self.spkDiarization = spkDiarization - self.timestamp = True if self.metadata or self.spkDiarization else False - - def run(self,audio,asr,spk): - feats = asr.compute_feat(audio) - mfcc, ivector = asr.get_frames(feats) - if self.spkDiarization: - with 
ThreadPoolExecutor(max_workers=2) as executor: - thrd1 = executor.submit(asr.decoder, feats) - thrd2 = executor.submit(spk.run, audio, mfcc) - decode = thrd1.result() - spkSeg = thrd2.result() - else: - decode = asr.decoder(feats) - spkSeg = [] - - if self.timestamp: - timestamps = asr.wordTimestamp(decode) - output = self.getOutput(timestamps,asr.frame_shift, asr.decodable_opts.frame_subsampling_factor,spkSeg) - if self.metadata: - return output - else: - return {"text":output["text"]} - else: - return decode["text"] + def __init__(self): + self.log = logging.getLogger("__stt-standelone-worker-streaming__") + logging.basicConfig(level=logging.INFO) - def getOutput(self,timestamps,frame_shift, frame_subsampling, spkSeg = []): - output = {} - if len(spkSeg) == 0: - text = "" - output["words"] = [] - for i in range(len(timestamps["words"])): - if timestamps["words"][i] != "": - meta = {} - meta["word"] = timestamps["words"][i] - meta["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) - meta["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) - output["words"].append(meta) - text += " "+meta["word"] - output["text"] = text - else: - output["speakers"] = [] - output["text"] = [] - j = 0 - newSpk = 1 - for i in range(len(timestamps["words"])): - if timestamps["words"][i] != "": - if newSpk: - speaker = {} - speaker["speaker_id"] = "spk_"+str(int(spkSeg[j][2])) - speaker["words"] = [] - txtSpk = speaker["speaker_id"]+":" - newSpk = 0 - word = {} - word["word"] = timestamps["words"][i] - word["btime"] = round(timestamps["start"][i] * frame_shift * frame_subsampling,2) - word["etime"] = round((timestamps["start"][i]+timestamps["dur"][i]) * frame_shift * frame_subsampling, 2) - speaker["words"].append(word) - txtSpk += " "+word["word"] - if word["etime"] > spkSeg[j+1][0]: - speaker["btime"] = speaker["words"][0]["btime"] - speaker["etime"] = speaker["words"][-1]["etime"] - output["speakers"].append(speaker) - output["text"].append(txtSpk) - newSpk = 1 - j += 1 - #add the last speaker to the output speakers - speaker["btime"] = speaker["words"][0]["btime"] - speaker["etime"] = speaker["words"][-1]["etime"] - output["speakers"].append(speaker) - output["text"].append(txtSpk) - return output - - -class Audio: - def __init__(self,sr): - self.log = logging.getLogger('__stt-standelone-worker__.Audio') - self.bit = 16 - self.channels = 1 - self.sr = sr - - def set_logger(self,log): - self.log = log - - def read_audio(self, audio): + # Main parameters + self.AM_PATH = '/opt/models/AM' + self.LM_PATH = '/opt/models/LM' + self.TEMP_FILE_PATH = '/opt/tmp' + self.CONFIG_FILES_PATH = '/opt/config' + self.SAVE_AUDIO = False + self.SERVICE_PORT = 80 + self.SWAGGER_URL = '/api-doc' + self.SWAGGER_PATH = None + + if not os.path.isdir(self.TEMP_FILE_PATH): + os.mkdir(self.TEMP_FILE_PATH) + if not os.path.isdir(self.CONFIG_FILES_PATH): + os.mkdir(self.CONFIG_FILES_PATH) + + # Environment parameters + if 'SERVICE_PORT' in os.environ: + self.SERVICE_PORT = os.environ['SERVICE_PORT'] + if 'SAVE_AUDIO' in os.environ: + self.SAVE_AUDIO = os.environ['SAVE_AUDIO'] + if 'SWAGGER_PATH' in os.environ: + self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] + + self.loadConfig() + + def loadConfig(self): + # get decoder parameters from "decode.cfg" + decoder_settings = configparser.ConfigParser() + if not os.path.exists(self.AM_PATH+'/decode.cfg'): + return False + decoder_settings.read(self.AM_PATH+'/decode.cfg') + + # Prepare "online.conf" + self.AM_PATH = 
self.AM_PATH+"/" + \ + decoder_settings.get('decoder_params', 'ampath') + with open(self.AM_PATH+"/conf/online.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--ivector-extraction-config=" + + self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") + f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf\n") + f.write( + "--beam="+decoder_settings.get('decoder_params', 'beam')+"\n") + f.write( + "--lattice-beam="+decoder_settings.get('decoder_params', 'lattice_beam')+"\n") + f.write("--acoustic-scale=" + + decoder_settings.get('decoder_params', 'acwt')+"\n") + f.write( + "--min-active="+decoder_settings.get('decoder_params', 'min_active')+"\n") + f.write( + "--max-active="+decoder_settings.get('decoder_params', 'max_active')+"\n") + f.write("--frame-subsampling-factor="+decoder_settings.get( + 'decoder_params', 'frame_subsampling_factor')+"\n") + + # Prepare "ivector_extractor.conf" + with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") + f.write("--cmvn-config="+self.AM_PATH + + "/conf/online_cmvn.conf\n") + f.write("--lda-matrix="+self.AM_PATH + + "/ivector_extractor/final.mat\n") + f.write("--global-cmvn-stats="+self.AM_PATH + + "/ivector_extractor/global_cmvn.stats\n") + f.write("--diag-ubm="+self.AM_PATH + + "/ivector_extractor/final.dubm\n") + f.write("--ivector-extractor="+self.AM_PATH + + "/ivector_extractor/final.ie") + + # Prepare "word_boundary.int" if not exist + if not os.path.exists(self.LM_PATH+"/word_boundary.int") and os.path.exists(self.AM_PATH+"phones.txt"): + with open(self.AM_PATH+"phones.txt") as f: + phones = f.readlines() + + with open(self.LM_PATH+"/word_boundary.int", "w") as f: + for phone in phones: + phone = phone.strip() + phone = re.sub('^ .*', '', phone) + phone = re.sub('^#\d+ .*', '', phone) + if phone != '': + id = phone.split(' ')[1] + if '_I ' in phone: + f.write(id+" internal\n") + elif '_B ' in phone: + f.write(id+" begin\n") + elif '_E ' in phone: + f.write(id+" end\n") + elif '_S ' in phone: + f.write(id+" singleton\n") + else: + f.write(id+" nonword\n") + + def swaggerUI(self, app): + ### swagger specific ### + swagger_yml = yaml.load( + open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swaggerui = get_swaggerui_blueprint( + # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + self.SWAGGER_URL, + self.SWAGGER_PATH, + config={ # Swagger UI config overrides + 'app_name': "STT API Documentation", + 'spec': swagger_yml + } + ) + app.register_blueprint(swaggerui, url_prefix=self.SWAGGER_URL) + ### end swagger specific ### + + def read_audio(self, file, sample_rate): + file_path = self.TEMP_FILE_PATH+file.filename.lower() + file.save(file_path) try: - data, sr = librosa.load(audio,sr=None) - if sr != self.sr: - self.log.info('Resample audio file: '+str(sr)+'Hz -> '+str(self.sr)+'Hz') - data = librosa.resample(data, sr, self.sr) + data, sr = librosa.load(file_path, sr=None) + if sr != sample_rate: + self.log.info('Resample audio file: '+str(sr) + + 'Hz -> '+str(sample_rate)+'Hz') + data = librosa.resample(data, sr, sample_rate) data = (data * 32767).astype(np.int16) - self.data = data - self.dur = len(self.data) / self.sr + self.dur = len(data) / sample_rate + self.data = Vector(data) + + if not self.SAVE_AUDIO: + os.remove(file_path) except 
Exception as e: self.log.error(e) raise ValueError("The uploaded file format is not supported!!!") - - def getDataKaldyVector(self): - return Vector(self.data) \ No newline at end of file + + def run(self, asr, metadata): + feats = asr.compute_feat(self.data) + mfcc, ivector = asr.get_frames(feats) + decode = asr.decoder(feats) + if metadata: + spk = SpeakerDiarization(asr.get_sample_rate()) + spkSeg = spk.run(self.data, self.dur, mfcc) + data = asr.wordTimestamp(decode["text"], decode['lattice'], asr.frame_shift, asr.decodable_opts.frame_subsampling_factor) + output = self.process_output(data, spkSeg) + return output + else: + return self.parse_text(decode["text"]) + + + # return a json object including word-data, speaker-data + def process_output(self, data, spkrs): + speakers = [] + text = [] + i = 0 + text_ = "" + words=[] + for word in data['words']: + if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: + text_ += word["word"] + " " + words.append(word) + else: + speaker = {} + speaker["start"]=words[0]["start"] + speaker["end"]=words[len(words)-1]["end"] + speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) + speaker["words"]=words + + text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + speakers.append(speaker) + + words=[word] + text_=word["word"] + " " + i+=1 + + speaker = {} + speaker["start"]=words[0]["start"] + speaker["end"]=words[len(words)-1]["end"] + speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) + speaker["words"]=words + + text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + speakers.append(speaker) + + return {'speakers': speakers, 'text': text} + + # remove extra symbols + def parse_text(self, text): + text = re.sub(r"", "", text) # remove symbol + text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark + text = re.sub(r"", "", text) # remove + text = re.sub(r"' ", "'", text) # remove space after quote ' + text = re.sub(r" +", " ", text) # remove multiple spaces + text = text.strip() + return text From f6a3659028fe77a6874dceec685e17a5eaa1bb0d Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Sat, 3 Oct 2020 15:53:48 +0200 Subject: [PATCH 027/172] update readme --- README.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/README.md b/README.md index a2540a8..fb69cc2 100644 --- a/README.md +++ b/README.md @@ -140,9 +140,7 @@ Convert a speech to text > `post`
> Make a POST request >> Arguments : ->> - **{File} file** : Audio file (file format: wav, mp3, aiff, flac, ogg) ->> - **{Integer} nbrSpeaker (optional)**: Number of speakers engaged in dialog ->> - **{String} speaker (optional)**: Do speaker diarization (yes|no) +>> - **{File} file** : Audio file (file format: wav, mp3, flac, ogg) > >> Header : >> - **{String} Accept**: response content type (text/plain|application/json) From 37b48db04370f8485384ca4c5fb5ee0a4c4c44d1 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 13 Oct 2020 11:53:03 +0200 Subject: [PATCH 028/172] change audio file loader and saved filename --- tools.py | 81 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 45 insertions(+), 36 deletions(-) diff --git a/tools.py b/tools.py index 05fe391..5d925bf 100644 --- a/tools.py +++ b/tools.py @@ -14,12 +14,13 @@ # other packages import configparser +import librosa import logging import os import re +import uuid import json import yaml -import scipy.io.wavfile import numpy as np from flask_swagger_ui import get_swaggerui_blueprint ############## @@ -36,7 +37,7 @@ def __init__(self): self.LM_PATH = '/opt/models/LM' self.TEMP_FILE_PATH = '/opt/tmp' self.CONFIG_FILES_PATH = '/opt/config' - self.SAVE_AUDIO=False + self.SAVE_AUDIO = False self.SERVICE_PORT = 80 self.NBR_THREADS = 100 self.SWAGGER_URL = '/api-doc' @@ -58,16 +59,17 @@ def __init__(self): if 'SWAGGER_PATH' in os.environ: self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] - # start loading ASR configuration self.log.info("Create the new config files") self.loadConfig() def swaggerUI(self, app): ### swagger specific ### - swagger_yml = yaml.load(open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swagger_yml = yaml.load( + open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) swaggerui = get_swaggerui_blueprint( - self.SWAGGER_URL, # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + self.SWAGGER_URL, self.SWAGGER_PATH, config={ # Swagger UI config overrides 'app_name': "STT API Documentation", @@ -77,15 +79,20 @@ def swaggerUI(self, app): app.register_blueprint(swaggerui, url_prefix=self.SWAGGER_URL) ### end swagger specific ### - def getAudio(self,file): + def getAudio(self, file): + filename = str(uuid.uuid4()) + file_path = self.TEMP_FILE_PATH+"/"+filename + file.save(file_path) try: - file_path = self.TEMP_FILE_PATH+"/"+file.filename.lower() - file.save(file_path) - self.rate, self.data = scipy.io.wavfile.read(file_path) + data, sr = librosa.load(file_path) + self.data = (data * 32767).astype(np.int16) + self.rate = sr + except Exception as e: + self.log.error(e) + raise ValueError("The uploaded file format is not supported!!!") + finally: if not self.SAVE_AUDIO: os.remove(file_path) - except Exception as e: - raise ValueError('Unsupported audio file! 
Only WAVE format is supported.') # re-create config files def loadConfig(self): @@ -166,10 +173,10 @@ def loadConfig(self): # remove extra symbols def parse_text(self, text): - text = re.sub(r"", "", text) # remove symbol - text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark - text = re.sub(r"' ", "'", text) # remove space after quote ' - text = re.sub(r" +", " ", text) # remove multiple spaces + text = re.sub(r"", "", text) # remove symbol + text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark + text = re.sub(r"' ", "'", text) # remove space after quote ' + text = re.sub(r" +", " ", text) # remove multiple spaces text = text.strip() return text @@ -178,7 +185,7 @@ def get_response(self, dataJson, is_metadata, is_spkDiarization, nbrOfSpk): if dataJson is not None: data = json.loads(dataJson) if not is_metadata: - text = data['text'] # get text from response + text = data['text'] # get text from response return self.parse_text(text) elif 'words' in data and 'features' in data: @@ -194,12 +201,12 @@ def get_response(self, dataJson, is_metadata, is_spkDiarization, nbrOfSpk): feats = np.squeeze(feats) mask = np.ones(shape=(feats.shape[0],)) for pos in seg: - mask[pos-30:pos]=0 + mask[pos-30:pos] = 0 # Do speaker diarization and get speaker segments spk = SpeakerDiarization() spk.set_maxNrSpeakers(nbrOfSpk) - spkrs = spk.run(feats,mask) + spkrs = spk.run(feats, mask) # Generate final output data return self.process_output(data, spkrs) @@ -212,39 +219,40 @@ def get_response(self, dataJson, is_metadata, is_spkDiarization, nbrOfSpk): else: return {'speakers': [], 'text': '', 'words': []} - # return a json object including word-data, speaker-data + def process_output(self, data, spkrs): speakers = [] text = [] i = 0 text_ = "" - words=[] + words = [] for word in data['words']: if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: - text_ += word["word"] + " " + text_ += word["word"] + " " words.append(word) else: speaker = {} - speaker["start"]=words[0]["start"] - speaker["end"]=words[len(words)-1]["end"] - speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) - speaker["words"]=words + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words - text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + text.append( + 'spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) speakers.append(speaker) - words=[word] - text_=word["word"] + " " - i+=1 + words = [word] + text_ = word["word"] + " " + i += 1 speaker = {} - speaker["start"]=words[0]["start"] - speaker["end"]=words[len(words)-1]["end"] - speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) - speaker["words"]=words + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words - text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) + text.append('spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) speakers.append(speaker) return {'speakers': speakers, 'text': text} @@ -362,7 +370,7 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): maskSAD = mask maskUEM = np.ones([1, nFeatures]) - + mask = np.logical_and(maskUEM, maskSAD) mask = mask[0][0:nFeatures] nSpeechFeatures = np.sum(mask) @@ -434,7 +442,8 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): else: return None - self.log.info("Speaker Diarization time in seconds: %s" % 
(time.time() - start_time)) + self.log.info("Speaker Diarization time in seconds: %s" % + (time.time() - start_time)) except ValueError as v: self.log.info(v) return [[0, duration, 1], From f6f9da2fc5a43aa5e911273916a85ff132f63a05 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 13 Oct 2020 11:56:24 +0200 Subject: [PATCH 029/172] update swagger --- document/swagger.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/document/swagger.yml b/document/swagger.yml index 8a93b7c..e763d3b 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,7 +24,7 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File - Waveform Format" + description: "Audio File (wav, mp3, flac, ogg)" required: true type: "file" responses: From 3782a1210daf02034cfef1dabca2c984f2c23bf3 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 13 Oct 2020 12:15:04 +0200 Subject: [PATCH 030/172] change audio file loader and saved filename --- document/swagger.yml | 2 +- tools.py | 11 ++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/document/swagger.yml b/document/swagger.yml index e763d3b..3db05a0 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,7 +24,7 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, flac, ogg)" + description: "Audio File (wav, mp3, flac, ogg, wma, m4a)" required: true type: "file" responses: diff --git a/tools.py b/tools.py index cfb6117..3129d99 100644 --- a/tools.py +++ b/tools.py @@ -35,7 +35,7 @@ ############## # other packages -import configparser, sys, os, re, time, logging, yaml +import configparser, sys, os, re, time, logging, yaml, uuid from flask_swagger_ui import get_swaggerui_blueprint ############## @@ -572,7 +572,8 @@ def swaggerUI(self, app): ### end swagger specific ### def read_audio(self, file, sample_rate): - file_path = self.TEMP_FILE_PATH+file.filename.lower() + filename = str(uuid.uuid4()) + file_path = self.TEMP_FILE_PATH+"/"+filename file.save(file_path) try: data, sr = librosa.load(file_path, sr=None) @@ -583,12 +584,12 @@ def read_audio(self, file, sample_rate): data = (data * 32767).astype(np.int16) self.dur = len(data) / sample_rate self.data = Vector(data) - - if not self.SAVE_AUDIO: - os.remove(file_path) except Exception as e: self.log.error(e) raise ValueError("The uploaded file format is not supported!!!") + finally: + if not self.SAVE_AUDIO: + os.remove(file_path) def run(self, asr, metadata): feats = asr.compute_feat(self.data) From 43958a74a45116c59f19a26a0b140f5eb8dca42a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 13 Oct 2020 12:29:17 +0200 Subject: [PATCH 031/172] install ffmpeg to extend the supported audio format --- Dockerfile | 3 ++- document/swagger.yml | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/Dockerfile b/Dockerfile index 1c6f518..e3bd38a 100644 --- a/Dockerfile +++ b/Dockerfile @@ -70,7 +70,8 @@ RUN cd /opt/vosk-api/python && \ WORKDIR /usr/src/speech-to-text # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml +RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml && \ + apt-get install -y ffmpeg COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py . 
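Note on the ffmpeg dependency added in the Dockerfile above: the worker hands uploads to librosa.load(), which decodes WAV, FLAC and OGG natively through soundfile and falls back to audioread for compressed formats such as mp3, wma or m4a; that fallback needs a system decoder, which is what the ffmpeg package provides. A minimal sketch of the resulting loading path, with load_as_pcm16 as an illustrative helper name that is not part of the worker code:

    import librosa
    import numpy as np

    def load_as_pcm16(file_path, target_sr=16000):
        # librosa decodes any format ffmpeg understands, resamples to
        # target_sr and downmixes to mono, returning float32 in [-1, 1]
        data, _ = librosa.load(file_path, sr=target_sr, mono=True)
        # scale to 16-bit PCM, the sample format the recognizer expects
        return (data * 32767).astype(np.int16)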
diff --git a/document/swagger.yml b/document/swagger.yml index e763d3b..3db05a0 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,7 +24,7 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, flac, ogg)" + description: "Audio File (wav, mp3, flac, ogg, wma, m4a)" required: true type: "file" responses: From a23a19725b28287a44de69caaba06bbccda9d010 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 3 Nov 2020 17:24:20 +0100 Subject: [PATCH 032/172] update worker and fix some bugs --- Dockerfile | 8 +-- document/swagger.yml | 2 +- run.py | 10 ++-- tools.py | 133 +++++++++++++++++++++++++++---------------- vosk-api | 2 +- 5 files changed, 96 insertions(+), 59 deletions(-) diff --git a/Dockerfile b/Dockerfile index e3bd38a..c8e95cd 100644 --- a/Dockerfile +++ b/Dockerfile @@ -59,6 +59,10 @@ RUN apt install -y software-properties-common && wget https://apt.llvm.org/llvm. pip3 install websockets && \ pip3 install librosa webrtcvad scipy sklearn +# Install main service packages +RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml && \ + apt-get install -y ffmpeg + # build VOSK KALDI COPY vosk-api /opt/vosk-api RUN cd /opt/vosk-api/python && \ @@ -69,10 +73,6 @@ RUN cd /opt/vosk-api/python && \ # Define the main folder WORKDIR /usr/src/speech-to-text -# Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml && \ - apt-get install -y ffmpeg - COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py . COPY run.py . diff --git a/document/swagger.yml b/document/swagger.yml index 3db05a0..b52b52c 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,7 +24,7 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, flac, ogg, wma, m4a)" + description: "Audio File - Waveform Audio File Format is required. 
Best configuration (16KHz, 16b, mono)" required: true type: "file" responses: diff --git a/run.py b/run.py index b548209..59bc5b1 100644 --- a/run.py +++ b/run.py @@ -19,7 +19,7 @@ worker.log.info('Load acoustic model and decoding graph') model = Model(worker.AM_PATH, worker.LM_PATH, worker.CONFIG_FILES_PATH+"/online.conf") - +spkModel = None # API @app.route('/transcribe', methods=['POST']) @@ -43,11 +43,13 @@ def transcribe(): if 'file' in request.files.keys(): file = request.files['file'] worker.getAudio(file) - rec = KaldiRecognizer(model, worker.rate, is_metadata) - data_ = rec.Decode(worker.data) + rec = KaldiRecognizer(model, spkModel, worker.rate, False) + rec.AcceptWaveform(worker.data) + data_ = rec.FinalResult() if is_metadata: data_ = rec.GetMetadata() - data = worker.get_response(data_, is_metadata, is_metadata, nbrOfSpk) + data = worker.get_response(data_, is_metadata, nbrOfSpk) + worker.clean() else: raise ValueError('No audio file was uploaded') diff --git a/tools.py b/tools.py index 5d925bf..958502f 100644 --- a/tools.py +++ b/tools.py @@ -22,6 +22,7 @@ import json import yaml import numpy as np +from scipy.io import wavfile from flask_swagger_ui import get_swaggerui_blueprint ############## @@ -81,18 +82,20 @@ def swaggerUI(self, app): def getAudio(self, file): filename = str(uuid.uuid4()) - file_path = self.TEMP_FILE_PATH+"/"+filename - file.save(file_path) + self.file_path = self.TEMP_FILE_PATH+"/"+filename + file.save(self.file_path) try: - data, sr = librosa.load(file_path) - self.data = (data * 32767).astype(np.int16) - self.rate = sr + self.rate, self.data = wavfile.read(self.file_path) + # if stereo file, convert to mono by computing the mean of the channels + if len(self.data.shape) == 2 and self.data.shape[1] == 2: + self.data = np.mean(self.data, axis=1, dtype=np.int16) except Exception as e: self.log.error(e) raise ValueError("The uploaded file format is not supported!!!") - finally: - if not self.SAVE_AUDIO: - os.remove(file_path) + + def clean(self): + if not self.SAVE_AUDIO: + os.remove(self.file_path) # re-create config files def loadConfig(self): @@ -125,9 +128,6 @@ def loadConfig(self): "--max-active="+decoder_settings.get('decoder_params', 'max_active')+"\n") f.write("--frame-subsampling-factor="+decoder_settings.get( 'decoder_params', 'frame_subsampling_factor')+"\n") - f.write("--endpoint.rule2.min-trailing-silence=0.5\n") - f.write("--endpoint.rule3.min-trailing-silence=1.0\n") - f.write("--endpoint.rule4.min-trailing-silence=2.0\n") # Prepare "ivector_extractor.conf" with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: @@ -181,39 +181,23 @@ def parse_text(self, text): return text # Postprocess response - def get_response(self, dataJson, is_metadata, is_spkDiarization, nbrOfSpk): + def get_response(self, dataJson, is_metadata, nbrOfSpk): if dataJson is not None: data = json.loads(dataJson) if not is_metadata: text = data['text'] # get text from response return self.parse_text(text) - elif 'words' in data and 'features' in data: - if is_spkDiarization: - # Get Features and spoken segments and clean data - features = data['features'] - seg = data['segments'] if data['segments'] is not None else [] - del data['features'] - del data['segments'] - - # Prepare the parameters for SpeakerDiarization input - feats = np.array(features) - feats = np.squeeze(feats) - mask = np.ones(shape=(feats.shape[0],)) - for pos in seg: - mask[pos-30:pos] = 0 - - # Do speaker diarization and get speaker segments - spk = SpeakerDiarization() - 
spk.set_maxNrSpeakers(nbrOfSpk) - spkrs = spk.run(feats, mask) - - # Generate final output data - return self.process_output(data, spkrs) - - del data['features'] - del data['segments'] - return data + elif 'words' in data: + # Do speaker diarization and get speaker segments + spk = SpeakerDiarization() + spk.set_maxNrSpeakers(nbrOfSpk) + spkrs = spk.run(self.file_path) + + # Generate final output data + return self.process_output(data, spkrs) + elif 'text' in data: + return {'speakers': [], 'text': data['text'], 'words': []} else: return {'speakers': [], 'text': '', 'words': []} else: @@ -228,6 +212,8 @@ def process_output(self, data, spkrs): text_ = "" words = [] for word in data['words']: + if i+1 == len(spkrs): + continue if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: text_ += word["word"] + " " words.append(word) @@ -266,10 +252,8 @@ def __init__(self): # MFCC FEATURES PARAMETERS self.frame_length_s = 0.025 self.frame_shift_s = 0.01 - self.num_bins = 40 - self.num_ceps = 40 - self.low_freq = 40 - self.high_freq = -200 + self.num_bins = 30 + self.num_ceps = 30 ##### # Segment @@ -321,11 +305,57 @@ def __init__(self): self.nbIter = 10 # Number of expectation-maximization (EM) iterations self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames ###### + + def compute_feat_Librosa(self,audioFile): + try: + self.data, self.sr = librosa.load(audioFile,sr=None) + frame_length_inSample = self.frame_length_s * self.sr + hop = int(self.frame_shift_s * self.sr) + NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) + if self.sr >= 16000: + mfccNumpy = librosa.feature.mfcc(y=self.data, + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop, + fmin=20, + fmax=7600).T + else: + mfccNumpy = librosa.feature.mfcc(y=self.data, + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop).T + + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed when extracting features!!!") + else: + return mfccNumpy + + def computeVAD_WEBRTC(self, data, sr, nFeatures): + try: + va_framed = py_webrtcvad(data, fs=sr, fs_vad=sr, hoplength=30, vad_mode=0) + segments = get_py_webrtcvad_segments(va_framed,sr) + maskSAD = np.zeros([1,nFeatures]) + for seg in segments: + start=int(np.round(seg[0]/self.frame_shift_s)) + end=int(np.round(seg[1]/self.frame_shift_s)) + maskSAD[0][start:end]=1 + except Exception as e: + self.log.error(e) + raise ValueError("Speaker diarization failed while voice activity detection!!!") + else: + return maskSAD def set_maxNrSpeakers(self, nbr): self.maxNrSpeakers = nbr - def run(self, feats, mask): + def run(self, audioFile): try: def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): numberOfSpeechFeatures = finalSegmentTable[-1, 2].astype(int)+1 @@ -361,6 +391,9 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): start_time = time.time() + self.log.info('Start Speaker diarization') + + feats = self.compute_feat_Librosa(audioFile) nFeatures = feats.shape[0] duration = nFeatures * self.frame_shift_s @@ -368,7 +401,7 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): return [[0, duration, 1], [duration, -1, -1]] - maskSAD = mask + maskSAD = self.computeVAD_WEBRTC(self.data, self.sr, nFeatures) maskUEM = np.ones([1, nFeatures]) mask = np.logical_and(maskUEM, maskSAD) @@ -440,16 +473,18 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): seg 
= getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze( finalClusteringTableResegmentation), duration) else: - return None + return [[0, duration, 1], + [duration, -1, -1]] - self.log.info("Speaker Diarization time in seconds: %s" % - (time.time() - start_time)) + self.log.info("Speaker Diarization time in seconds: %d" % + int(time.time() - start_time)) except ValueError as v: - self.log.info(v) + self.log.error(v) return [[0, duration, 1], [duration, -1, -1]] except Exception as e: self.log.error(e) - return None + return [[0, duration, 1], + [duration, -1, -1]] else: return seg diff --git a/vosk-api b/vosk-api index fec4a1a..a38506d 160000 --- a/vosk-api +++ b/vosk-api @@ -1 +1 @@ -Subproject commit fec4a1ad76a3c2e66bad84acd5cead2070b3d1b6 +Subproject commit a38506d69460438d7f2b074470e72ef0ab973bf0 From ad9d5db2b795e42093e3233a39323e32156aef79 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 3 Nov 2020 17:32:13 +0100 Subject: [PATCH 033/172] remove extra parameters --- run.py | 5 ++--- tools.py | 9 +++------ 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/run.py b/run.py index 59bc5b1..6ecfa3f 100644 --- a/run.py +++ b/run.py @@ -29,7 +29,6 @@ def transcribe(): (strftime("%d/%b/%d %H:%M:%S", gmtime()))) is_metadata = False - nbrOfSpk = 10 # get response content type if request.headers.get('accept').lower() == 'application/json': @@ -43,12 +42,12 @@ def transcribe(): if 'file' in request.files.keys(): file = request.files['file'] worker.getAudio(file) - rec = KaldiRecognizer(model, spkModel, worker.rate, False) + rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) rec.AcceptWaveform(worker.data) data_ = rec.FinalResult() if is_metadata: data_ = rec.GetMetadata() - data = worker.get_response(data_, is_metadata, nbrOfSpk) + data = worker.get_response(data_, is_metadata) worker.clean() else: raise ValueError('No audio file was uploaded') diff --git a/tools.py b/tools.py index 958502f..0cfe1f8 100644 --- a/tools.py +++ b/tools.py @@ -43,6 +43,7 @@ def __init__(self): self.NBR_THREADS = 100 self.SWAGGER_URL = '/api-doc' self.SWAGGER_PATH = '' + self.ONLINE = False if not os.path.isdir(self.CONFIG_FILES_PATH): os.mkdir(self.CONFIG_FILES_PATH) @@ -181,7 +182,7 @@ def parse_text(self, text): return text # Postprocess response - def get_response(self, dataJson, is_metadata, nbrOfSpk): + def get_response(self, dataJson, is_metadata): if dataJson is not None: data = json.loads(dataJson) if not is_metadata: @@ -191,7 +192,6 @@ def get_response(self, dataJson, is_metadata, nbrOfSpk): elif 'words' in data: # Do speaker diarization and get speaker segments spk = SpeakerDiarization() - spk.set_maxNrSpeakers(nbrOfSpk) spkrs = spk.run(self.file_path) # Generate final output data @@ -296,7 +296,7 @@ def __init__(self): self.bestClusteringCriterion = 'elbow' self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral self.percentile = 40 - self.maxNrSpeakers = 16 # If known, max nr of speakers in a sesssion in the database. This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps + self.maxNrSpeakers = 10 # If known, max nr of speakers in a sesssion in the database. 
This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps ###### # RESEGMENTATION @@ -352,9 +352,6 @@ def computeVAD_WEBRTC(self, data, sr, nFeatures): else: return maskSAD - def set_maxNrSpeakers(self, nbr): - self.maxNrSpeakers = nbr - def run(self, audioFile): try: def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): From d13326ec61dd9947323cabc6d08c0f5ace2e1a96 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 3 Nov 2020 17:35:01 +0100 Subject: [PATCH 034/172] update README --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8e53305..45c75f7 100644 --- a/README.md +++ b/README.md @@ -55,14 +55,14 @@ If you want to use our service alone without LinTO-Platform-STT-Service-Manager, ```bash wget https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/linSTT_AM_fr-FR_v1.0.0.zip -wget https://dl.linto.ai/downloads/model-distribution/decoding-graphs/LVCSR/fr-FR/decoding_graph_fr-FR_Small_v1.0.0.zip +wget https://dl.linto.ai/downloads/model-distribution/decoding-graphs/LVCSR/fr-FR/decoding_graph_fr-FR_Small_v1.1.0.zip ``` 2- Uncompress both files ```bash unzip linSTT_AM_fr-FR_v1.0.0.zip -d AM_fr-FR -unzip decoding_graph_fr-FR_Small_v1.0.0.zip -d DG_fr-FR_Small +unzip decoding_graph_fr-FR_Small_v1.1.0.zip -d DG_fr-FR_Small ``` 3- Move the uncompressed files into the shared storage directory From e21264e36b1a7f00f8f16d7d1e9be45aea79b2ce Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 3 Nov 2020 17:45:06 +0100 Subject: [PATCH 035/172] update README --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index fb69cc2..43df31e 100644 --- a/README.md +++ b/README.md @@ -55,14 +55,14 @@ If you want to use our service alone without LinTO-Platform-STT-Service-Manager, ```bash wget https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/linSTT_AM_fr-FR_v1.0.0.zip -wget https://dl.linto.ai/downloads/model-distribution/decoding-graphs/LVCSR/fr-FR/decoding_graph_fr-FR_Small_v1.0.0.zip +wget https://dl.linto.ai/downloads/model-distribution/decoding-graphs/LVCSR/fr-FR/decoding_graph_fr-FR_Small_v1.1.0.zip ``` 2- Uncompress both files ```bash unzip linSTT_AM_fr-FR_v1.0.0.zip -d AM_fr-FR -unzip decoding_graph_fr-FR_Small_v1.0.0.zip -d DG_fr-FR_Small +unzip decoding_graph_fr-FR_Small_v1.1.0.zip -d DG_fr-FR_Small ``` 3- Move the uncompressed files into the shared storage directory From d4b127f19f9cef1cbd11cb18f2168a796b0a939a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 6 Nov 2020 15:47:34 +0100 Subject: [PATCH 036/172] add confidence score feature --- run.py | 5 +++-- tools.py | 4 ++-- vosk-api | 2 +- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/run.py b/run.py index 6ecfa3f..822a650 100644 --- a/run.py +++ b/run.py @@ -3,7 +3,7 @@ from flask import Flask, request, abort, Response, json from vosk import Model, KaldiRecognizer -from tools import WorkerStreaming +from tools import Worker from time import gmtime, strftime from gevent.pywsgi import WSGIServer @@ -13,7 +13,7 @@ app = Flask("__stt-standelone-worker__") # create WorkerStreaming object -worker = WorkerStreaming() +worker = Worker() # Load ASR models (acoustic model and decoding graph) worker.log.info('Load acoustic model and decoding graph') @@ -45,6 +45,7 @@ def transcribe(): rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) rec.AcceptWaveform(worker.data) data_ = rec.FinalResult() + 
worker.log.info(rec.uttConfidence()) if is_metadata: data_ = rec.GetMetadata() data = worker.get_response(data_, is_metadata) diff --git a/tools.py b/tools.py index 0cfe1f8..39e2700 100644 --- a/tools.py +++ b/tools.py @@ -27,10 +27,10 @@ ############## -class WorkerStreaming: +class Worker: def __init__(self): # Set logger config - self.log = logging.getLogger("__stt-standelone-worker-streaming__") + self.log = logging.getLogger("__stt-standelone-worker__") logging.basicConfig(level=logging.INFO) # Main parameters diff --git a/vosk-api b/vosk-api index a38506d..7f555e4 160000 --- a/vosk-api +++ b/vosk-api @@ -1 +1 @@ -Subproject commit a38506d69460438d7f2b074470e72ef0ab973bf0 +Subproject commit 7f555e464c1d6b16233354491868f46d009c453c From 088cbb55e347042dec3604c4fab5db1bb84a5fa5 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 12 Jan 2021 13:28:12 +0100 Subject: [PATCH 037/172] fix some bugs: response error when generating the speaker information --- docker-compose.yml | 2 +- pyBK | 2 +- tools.py | 79 +++++++++++++++++++++++++--------------------- 3 files changed, 45 insertions(+), 38 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index 08c14d0..f7da7db 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,7 +5,7 @@ services: stt-worker: container_name: stt-standalone-worker build: . - image: lintoai/linto-platform-stt-standalone-worker:latest + image: lintoai/linto-platform-stt-standalone-worker:latest-unstable volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM diff --git a/pyBK b/pyBK index 7738eb7..1e5dc7d 160000 --- a/pyBK +++ b/pyBK @@ -1 +1 @@ -Subproject commit 7738eb75dfc65438fbcd0eed9bb6a1f086b4bd6c +Subproject commit 1e5dc7de4e0a7d43a44152a68beca0699c14fd4c diff --git a/tools.py b/tools.py index 39e2700..92c91ed 100644 --- a/tools.py +++ b/tools.py @@ -206,42 +206,49 @@ def get_response(self, dataJson, is_metadata): # return a json object including word-data, speaker-data def process_output(self, data, spkrs): - speakers = [] - text = [] - i = 0 - text_ = "" - words = [] - for word in data['words']: - if i+1 == len(spkrs): - continue - if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: - text_ += word["word"] + " " - words.append(word) - else: - speaker = {} - speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] - speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) - speaker["words"] = words - - text.append( - 'spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) - speakers.append(speaker) - - words = [word] - text_ = word["word"] + " " - i += 1 - - speaker = {} - speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] - speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) - speaker["words"] = words - - text.append('spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) - speakers.append(speaker) - - return {'speakers': speakers, 'text': text} + try: + speakers = [] + text = [] + i = 0 + text_ = "" + words = [] + for word in data['words']: + if i+1 == len(spkrs): + continue + if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: + text_ += word["word"] + " " + words.append(word) + elif len(words) != 0: + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words + + text.append( + 'spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) + speakers.append(speaker) + + words = [word] + text_ = word["word"] + " " + i += 1 + else: + words = [word] + 
text_ = word["word"] + " " + i += 1 + + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words + + text.append('spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) + speakers.append(speaker) + + return {'speakers': speakers, 'text': text} + except: + return { 'data': data, 'spks': spkrs } class SpeakerDiarization: From 05651b8adddf128fa702ce1f1cdda82be8cfed99 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 12 Jan 2021 15:14:23 +0100 Subject: [PATCH 038/172] add the confidence score to the response json content --- run.py | 4 +-- tools.py | 75 +++++++++++++++++++++++++++++++------------------------- 2 files changed, 44 insertions(+), 35 deletions(-) diff --git a/run.py b/run.py index 822a650..5a7fb25 100644 --- a/run.py +++ b/run.py @@ -45,10 +45,10 @@ def transcribe(): rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) rec.AcceptWaveform(worker.data) data_ = rec.FinalResult() - worker.log.info(rec.uttConfidence()) + confidence = rec.uttConfidence() if is_metadata: data_ = rec.GetMetadata() - data = worker.get_response(data_, is_metadata) + data = worker.get_response(data_, confidence, is_metadata) worker.clean() else: raise ValueError('No audio file was uploaded') diff --git a/tools.py b/tools.py index 92c91ed..8844e48 100644 --- a/tools.py +++ b/tools.py @@ -182,9 +182,10 @@ def parse_text(self, text): return text # Postprocess response - def get_response(self, dataJson, is_metadata): + def get_response(self, dataJson, confidence, is_metadata): if dataJson is not None: data = json.loads(dataJson) + data['conf'] = confidence if not is_metadata: text = data['text'] # get text from response return self.parse_text(text) @@ -197,11 +198,11 @@ def get_response(self, dataJson, is_metadata): # Generate final output data return self.process_output(data, spkrs) elif 'text' in data: - return {'speakers': [], 'text': data['text'], 'words': []} + return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': []} else: - return {'speakers': [], 'text': '', 'words': []} + return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} else: - return {'speakers': [], 'text': '', 'words': []} + return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} # return a json object including word-data, speaker-data @@ -243,12 +244,13 @@ def process_output(self, data, spkrs): speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) speaker["words"] = words - text.append('spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) + text.append('spk'+str(int(spkrs[i][2])) + + ' : ' + self.parse_text(text_)) speakers.append(speaker) - return {'speakers': speakers, 'text': text} + return {'speakers': speakers, 'text': text, 'confidence-score': data['conf']} except: - return { 'data': data, 'spks': spkrs } + return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} class SpeakerDiarization: @@ -312,50 +314,57 @@ def __init__(self): self.nbIter = 10 # Number of expectation-maximization (EM) iterations self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames ###### - - def compute_feat_Librosa(self,audioFile): + + def compute_feat_Librosa(self, audioFile): try: - self.data, self.sr = librosa.load(audioFile,sr=None) + self.data, self.sr = librosa.load(audioFile, sr=None) frame_length_inSample = self.frame_length_s * self.sr hop = int(self.frame_shift_s * self.sr) 
NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) if self.sr >= 16000: mfccNumpy = librosa.feature.mfcc(y=self.data, - sr=self.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop, - fmin=20, - fmax=7600).T + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop, + fmin=20, + fmax=7600).T else: mfccNumpy = librosa.feature.mfcc(y=self.data, - sr=self.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop).T + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop).T except Exception as e: self.log.error(e) - raise ValueError("Speaker diarization failed when extracting features!!!") + raise ValueError( + "Speaker diarization failed when extracting features!!!") else: return mfccNumpy def computeVAD_WEBRTC(self, data, sr, nFeatures): try: - va_framed = py_webrtcvad(data, fs=sr, fs_vad=sr, hoplength=30, vad_mode=0) - segments = get_py_webrtcvad_segments(va_framed,sr) - maskSAD = np.zeros([1,nFeatures]) + if sr not in [8000, 16000, 32000, 48000]: + data = librosa.resample(data, sr, 16000) + sr = 16000 + + va_framed = py_webrtcvad( + data, fs=sr, fs_vad=sr, hoplength=30, vad_mode=0) + segments = get_py_webrtcvad_segments(va_framed, sr) + maskSAD = np.zeros([1, nFeatures]) for seg in segments: - start=int(np.round(seg[0]/self.frame_shift_s)) - end=int(np.round(seg[1]/self.frame_shift_s)) - maskSAD[0][start:end]=1 + start = int(np.round(seg[0]/self.frame_shift_s)) + end = int(np.round(seg[1]/self.frame_shift_s)) + maskSAD[0][start:end] = 1 except Exception as e: self.log.error(e) - raise ValueError("Speaker diarization failed while voice activity detection!!!") + raise ValueError( + "Speaker diarization failed while voice activity detection!!!") else: return maskSAD @@ -478,7 +487,7 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): finalClusteringTableResegmentation), duration) else: return [[0, duration, 1], - [duration, -1, -1]] + [duration, -1, -1]] self.log.info("Speaker Diarization time in seconds: %d" % int(time.time() - start_time)) From 19054c4a4dc3275c44064b859b7d7e7c0a3c611c Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 12 Jan 2021 15:51:39 +0100 Subject: [PATCH 039/172] update RELEASE --- RELEASE.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/RELEASE.md b/RELEASE.md index 8712413..e190830 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,8 @@ +# 3.1.1 +- Change Pykaldi with vosk-API (no python wrapper for decoding function, no extrat packages during installation, c++ implementation based on kaldi functions) +- New feature: Compute a confidence score per transcription +- Fix minor bugs + # 2.2.1 - Fix minor bugs - put SWAGGER_PATH parameter as optional From 49e9528735359309122d8da220f094092e622a3a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 12 Jan 2021 16:19:43 +0100 Subject: [PATCH 040/172] update worker --- .gitmodules | 6 + Dockerfile | 180 ++++------- Jenkinsfile | 1 - RELEASE.md | 5 + document/swagger.yml | 2 +- pyBK | 1 + run.py | 70 ++-- tools.py | 749 +++++++++++++++++-------------------------- vosk-api | 1 + 9 files changed, 426 insertions(+), 589 deletions(-) create mode 100644 .gitmodules create mode 160000 pyBK mode change 100755 => 100644 run.py create mode 160000 vosk-api diff --git a/.gitmodules b/.gitmodules new file mode 100644 index 0000000..b131dc4 --- /dev/null +++ b/.gitmodules @@ -0,0 +1,6 @@ +[submodule "vosk-api"] + 
path = vosk-api + url = https://github.com/irebai/vosk-api.git +[submodule "pyBK"] + path = pyBK + url = https://github.com/irebai/pyBK.git diff --git a/Dockerfile b/Dockerfile index 5e9f2fe..c8e95cd 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,127 +1,79 @@ -# Dockerfile for building PyKaldi image from Ubuntu 16.04 image FROM ubuntu:18.04 LABEL maintainer="irebai@linagora.com" -# Install necessary system packages -RUN apt-get update \ - && apt-get install -y \ - python3 \ +RUN apt-get update &&\ + apt-get install -y \ + python2.7 \ + python3 \ python3-pip \ - python2.7 \ - autoconf \ - automake \ - cmake \ - make \ - curl \ - g++ \ - git \ - graphviz \ - libatlas3-base \ - libtool \ - pkg-config \ - sox \ - subversion \ - bzip2 \ - unzip \ - wget \ - zlib1g-dev \ - ca-certificates \ - gfortran \ - patch \ - ffmpeg \ - nano && \ - ln -s /usr/bin/python3 /usr/bin/python && \ - ln -s /usr/bin/pip3 /usr/bin/pip + git \ + swig \ + nano \ + sox \ + automake wget unzip build-essential libtool zlib1g-dev locales libatlas-base-dev ca-certificates gfortran subversion &&\ + apt-get clean -# Install necessary Python packages (pykaldi dependencies) -RUN pip install --upgrade pip \ - numpy \ - setuptools \ - pyparsing \ - ninja +## Build kaldi and Clean installation (intel, openfst, src/*) +RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \ + cd /opt/kaldi/tools && \ + ./extras/install_mkl.sh && \ + make -j $(nproc) && \ + cd /opt/kaldi/src && \ + ./configure --shared && \ + make depend -j $(nproc) && \ + make -j $(nproc) && \ + mkdir -p /opt/kaldi/src_ && \ + mv /opt/kaldi/src/base \ + /opt/kaldi/src/chain \ + /opt/kaldi/src/cudamatrix \ + /opt/kaldi/src/decoder \ + /opt/kaldi/src/feat \ + /opt/kaldi/src/fstext \ + /opt/kaldi/src/gmm \ + /opt/kaldi/src/hmm \ + /opt/kaldi/src/ivector \ + /opt/kaldi/src/kws \ + /opt/kaldi/src/lat \ + /opt/kaldi/src/lm \ + /opt/kaldi/src/matrix \ + /opt/kaldi/src/nnet \ + /opt/kaldi/src/nnet2 \ + /opt/kaldi/src/nnet3 \ + /opt/kaldi/src/online2 \ + /opt/kaldi/src/rnnlm \ + /opt/kaldi/src/sgmm2 \ + /opt/kaldi/src/transform \ + /opt/kaldi/src/tree \ + /opt/kaldi/src/util \ + /opt/kaldi/src/itf \ + /opt/kaldi/src/lib /opt/kaldi/src_ && \ + cd /opt/kaldi && rm -r src && mv src_ src && rm src/*/*.cc && rm src/*/*.o && rm src/*/*.so && \ + cd /opt/intel/mkl/lib && rm -f intel64/*.a intel64_lin/*.a && \ + cd /opt/kaldi/tools && mkdir openfst_ && mv openfst-*/lib openfst-*/include openfst-*/bin openfst_ && rm openfst_/lib/*.so* openfst_/lib/*.la && \ + rm -r openfst-*/* && mv openfst_/* openfst-*/ && rm -r openfst_ -## Install Protobuf, CLIF, Kaldi and PyKaldi and Clean installation -RUN git clone --depth 1 https://github.com/pykaldi/pykaldi.git /pykaldi \ - && cd /pykaldi/tools \ - && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_kaldi.sh \ - && sed -i "s/\-j 2/-j $(nproc)/g" ./install_clif.sh \ - && sed -i "s/make \-j4/make -j $(nproc)/g" ./install_protobuf.sh \ - && ./check_dependencies.sh \ - && ./install_protobuf.sh \ - && ./install_clif.sh \ - && ./install_kaldi.sh \ - && cd /pykaldi \ - && python setup.py install \ - && rm -rf /pykaldi/CMakeLists.txt \ - /pykaldi/LICENSE \ - /pykaldi/README.md \ - /pykaldi/setup.cfg \ - /pykaldi/setup.py \ - /pykaldi/docker \ - /pykaldi/docs \ - /pykaldi/extras \ - /pykaldi/pykaldi.egg-info \ - /pykaldi/tests \ - /pykaldi/build/CMakeCache.txt \ - /pykaldi/build/bdist.linux-x86_64 \ - /pykaldi/build/build.ninja \ - /pykaldi/build/cmake_install.cmake \ - /pykaldi/build/docs \ - /pykaldi/build/kaldi \ - 
/pykaldi/build/lib \ - /pykaldi/build/rules.ninja \ - /pykaldi/tools/check_dependencies.sh \ - /pykaldi/tools/clif* \ - /pykaldi/tools/find_python_library.py \ - /pykaldi/tools/install_* \ - /pykaldi/tools/protobuf \ - /pykaldi/tools/use_namespace.sh \ - /pykaldi/tools/kaldi/COPYING \ - /pykaldi/tools/kaldi/INSTALL \ - /pykaldi/tools/kaldi/README.md \ - /pykaldi/tools/kaldi/egs \ - /pykaldi/tools/kaldi/misc \ - /pykaldi/tools/kaldi/scripts \ - /pykaldi/tools/kaldi/windows \ - && mkdir -p /pykaldi/tools/kaldi/src_/lib \ - && mv /pykaldi/tools/kaldi/src/base/libkaldi-base.so \ - /pykaldi/tools/kaldi/src/chain/libkaldi-chain.so \ - /pykaldi/tools/kaldi/src/cudamatrix/libkaldi-cudamatrix.so \ - /pykaldi/tools/kaldi/src/decoder/libkaldi-decoder.so \ - /pykaldi/tools/kaldi/src/feat/libkaldi-feat.so \ - /pykaldi/tools/kaldi/src/fstext/libkaldi-fstext.so \ - /pykaldi/tools/kaldi/src/gmm/libkaldi-gmm.so \ - /pykaldi/tools/kaldi/src/hmm/libkaldi-hmm.so \ - /pykaldi/tools/kaldi/src/ivector/libkaldi-ivector.so \ - /pykaldi/tools/kaldi/src/kws/libkaldi-kws.so \ - /pykaldi/tools/kaldi/src/lat/libkaldi-lat.so \ - /pykaldi/tools/kaldi/src/lm/libkaldi-lm.so \ - /pykaldi/tools/kaldi/src/matrix/libkaldi-matrix.so \ - /pykaldi/tools/kaldi/src/nnet/libkaldi-nnet.so \ - /pykaldi/tools/kaldi/src/nnet2/libkaldi-nnet2.so \ - /pykaldi/tools/kaldi/src/nnet3/libkaldi-nnet3.so \ - /pykaldi/tools/kaldi/src/online2/libkaldi-online2.so \ - /pykaldi/tools/kaldi/src/rnnlm/libkaldi-rnnlm.so \ - /pykaldi/tools/kaldi/src/sgmm2/libkaldi-sgmm2.so \ - /pykaldi/tools/kaldi/src/transform/libkaldi-transform.so \ - /pykaldi/tools/kaldi/src/tree/libkaldi-tree.so \ - /pykaldi/tools/kaldi/src/util/libkaldi-util.so \ - /pykaldi/tools/kaldi/src_/lib \ - && rm -rf /pykaldi/tools/kaldi/src && mv /pykaldi/tools/kaldi/src_ /pykaldi/tools/kaldi/src \ - && cd /pykaldi/tools/kaldi/tools && mkdir openfsttmp && mv openfst-*/lib openfst-*/include openfst-*/bin openfsttmp && rm openfsttmp/lib/*.a openfsttmp/lib/*.la && \ - rm -r openfst-*/* && mv openfsttmp/* openfst-*/ && rm -r openfsttmp - -# Define the main folder -WORKDIR /usr/src/speech-to-text +# Install pyBK (speaker diarization toolkit) +RUN apt install -y software-properties-common && wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && ./llvm.sh 10 && \ + export LLVM_CONFIG=/usr/bin/llvm-config-10 && \ + pip3 install numpy && \ + pip3 install websockets && \ + pip3 install librosa webrtcvad scipy sklearn # Install main service packages -RUN pip3 install flask flask-cors flask-swagger-ui pyyaml librosa gevent -RUN git clone https://github.com/irebai/pyBK.git /pykaldi/tools/pyBK \ - && cp /pykaldi/tools/pyBK/diarizationFunctions.py . +RUN pip3 install flask flask-cors flask-swagger-ui gevent pyyaml && \ + apt-get install -y ffmpeg -# Set environment variables -ENV PATH /pykaldi/tools/kaldi/egs/wsj/s5/utils/:$PATH +# build VOSK KALDI +COPY vosk-api /opt/vosk-api +RUN cd /opt/vosk-api/python && \ + export KALDI_ROOT=/opt/kaldi && \ + export KALDI_MKL=1 && \ + python3 setup.py install --user --single-version-externally-managed --root=/ + +# Define the main folder +WORKDIR /usr/src/speech-to-text +COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py . COPY run.py . 
diff --git a/Jenkinsfile b/Jenkinsfile index d027c84..b4bdffc 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -24,7 +24,6 @@ pipeline { docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push("${VERSION}") image.push('latest') - image.push('offline') } } } diff --git a/RELEASE.md b/RELEASE.md index 8712413..e190830 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,8 @@ +# 3.1.1 +- Change Pykaldi with vosk-API (no python wrapper for decoding function, no extrat packages during installation, c++ implementation based on kaldi functions) +- New feature: Compute a confidence score per transcription +- Fix minor bugs + # 2.2.1 - Fix minor bugs - put SWAGGER_PATH parameter as optional diff --git a/document/swagger.yml b/document/swagger.yml index 3db05a0..b52b52c 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -24,7 +24,7 @@ paths: parameters: - name: "file" in: "formData" - description: "Audio File (wav, mp3, flac, ogg, wma, m4a)" + description: "Audio File - Waveform Audio File Format is required. Best configuration (16KHz, 16b, mono)" required: true type: "file" responses: diff --git a/pyBK b/pyBK new file mode 160000 index 0000000..1e5dc7d --- /dev/null +++ b/pyBK @@ -0,0 +1 @@ +Subproject commit 1e5dc7de4e0a7d43a44152a68beca0699c14fd4c diff --git a/run.py b/run.py old mode 100755 new mode 100644 index e485961..48cf57d --- a/run.py +++ b/run.py @@ -2,73 +2,95 @@ # -*- coding: utf-8 -*- from flask import Flask, request, abort, Response, json -from tools import ASR, SttStandelone +from vosk import Model, KaldiRecognizer +from tools import Worker from time import gmtime, strftime from gevent.pywsgi import WSGIServer import os +from gevent.pywsgi import WSGIServer + + + app = Flask("__stt-standelone-worker__") -stt = SttStandelone() +# create WorkerStreaming object +worker = Worker() # Load ASR models (acoustic model and decoding graph) -stt.log.info('Load acoustic model and decoding graph') -asr = ASR(stt.AM_PATH, stt.LM_PATH, stt.CONFIG_FILES_PATH) - +worker.log.info('Load acoustic model and decoding graph') +model = Model(worker.AM_PATH, worker.LM_PATH, + worker.CONFIG_FILES_PATH+"/online.conf") +spkModel = None +# API @app.route('/transcribe', methods=['POST']) def transcribe(): try: - stt.log.info('[%s] New user entry on /transcribe' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) - - #get response content type - metadata = False + worker.log.info('[%s] New user entry on /transcribe' % + (strftime("%d/%b/%d %H:%M:%S", gmtime()))) + + is_metadata = False + + # get response content type if request.headers.get('accept').lower() == 'application/json': - metadata = True + is_metadata = True elif request.headers.get('accept').lower() == 'text/plain': - metadata = False + is_metadata = False else: raise ValueError('Not accepted header') - #get input file + # get input file if 'file' in request.files.keys(): file = request.files['file'] - stt.read_audio(file,asr.get_sample_rate()) - output = stt.run(asr, metadata) + worker.getAudio(file) + rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) + rec.AcceptWaveform(worker.data) + data_ = rec.FinalResult() + confidence = rec.uttConfidence() + if is_metadata: + data_ = rec.GetMetadata() + data = worker.get_response(data_, confidence, is_metadata) + worker.clean() else: raise ValueError('No audio file was uploaded') - return output, 200 + return data, 200 except ValueError as error: return str(error), 400 except Exception as e: - app.logger.error(e) + worker.log.error(e) return 'Server Error', 
500 + + # Rejected request handlers @app.errorhandler(405) def method_not_allowed(error): return 'The method is not allowed for the requested URL', 405 + @app.errorhandler(404) def page_not_found(error): return 'The requested URL was not found', 404 + @app.errorhandler(500) def server_error(error): - app.logger.error(error) + worker.log.error(error) return 'Server Error', 500 + if __name__ == '__main__': try: # start SwaggerUI - if os.path.exists(stt.SWAGGER_PATH): - stt.swaggerUI(app) + if worker.SWAGGER_PATH != '': + worker.swaggerUI(app) + # Run server - #Run server - app.logger.info('Server ready for transcription...') - http_server = WSGIServer(('', stt.SERVICE_PORT), app) + http_server = WSGIServer(('', worker.SERVICE_PORT), app) http_server.serve_forever() + except Exception as e: - app.logger.error(e) - exit(e) \ No newline at end of file + worker.log.error(e) + exit(e) diff --git a/tools.py b/tools.py index 3129d99..8844e48 100644 --- a/tools.py +++ b/tools.py @@ -1,190 +1,268 @@ -# Kaldi ASR decoder -from kaldi.asr import NnetLatticeFasterOnlineRecognizer -from kaldi.decoder import (LatticeFasterDecoderOptions, - LatticeFasterOnlineDecoder) -from kaldi.nnet3 import NnetSimpleLoopedComputationOptions -from kaldi.online2 import (OnlineEndpointConfig, - OnlineIvectorExtractorAdaptationState, - OnlineNnetFeaturePipelineConfig, - OnlineNnetFeaturePipelineInfo, - OnlineNnetFeaturePipeline, - OnlineSilenceWeighting) -from kaldi.util.options import ParseOptions -from kaldi.util.table import SequentialWaveReader -from kaldi.matrix import Matrix, Vector -############## +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- -# word to CTM -from kaldi.lat.align import (WordBoundaryInfoNewOpts, - WordBoundaryInfo, - word_align_lattice) -from kaldi.lat.functions import (compact_lattice_to_word_alignment, - compact_lattice_shortest_path) -from kaldi.asr import NnetRecognizer -import kaldi.fstext as _fst +#  ASR +from vosk import Model, KaldiRecognizer ############## # Speaker Diarization -from diarizationFunctions import * -import numpy as np +from pyBK.diarizationFunctions import * import librosa -from kaldi.ivector import (compute_vad_energy, - VadEnergyOptions) -from kaldi.feat.mfcc import Mfcc, MfccOptions -from kaldi.util.options import ParseOptions +import time +import webrtcvad ############## # other packages -import configparser, sys, os, re, time, logging, yaml, uuid +import configparser +import librosa +import logging +import os +import re +import uuid +import json +import yaml +import numpy as np +from scipy.io import wavfile from flask_swagger_ui import get_swaggerui_blueprint ############## -class ASR: - def __init__(self, AM_PATH, LM_PATH, CONFIG_FILES_PATH): - self.log = logging.getLogger('__stt-standelone-worker__.ASR') - self.AM_PATH = AM_PATH - self.LM_PATH = LM_PATH - self.CONFIG_FILES_PATH = CONFIG_FILES_PATH - self.LoadModels() - - def LoadModels(self): - try: - # Define online feature pipeline - po = ParseOptions("") - - decoder_opts = LatticeFasterDecoderOptions() - self.endpoint_opts = OnlineEndpointConfig() - self.decodable_opts = NnetSimpleLoopedComputationOptions() - feat_opts = OnlineNnetFeaturePipelineConfig() - - - decoder_opts.register(po) - self.endpoint_opts.register(po) - self.decodable_opts.register(po) - feat_opts.register(po) - - po.read_config_file(self.CONFIG_FILES_PATH+"/online.conf") - self.feat_info = OnlineNnetFeaturePipelineInfo.from_config( - feat_opts) - - # Set metadata parameters - self.samp_freq = self.feat_info.mfcc_opts.frame_opts.samp_freq - 
self.frame_shift = self.feat_info.mfcc_opts.frame_opts.frame_shift_ms / 1000 - self.acwt = self.decodable_opts.acoustic_scale - - # Load Acoustic and graph models and other files - self.transition_model, self.acoustic_model = NnetRecognizer.read_model( - self.AM_PATH+"/final.mdl") - graph = _fst.read_fst_kaldi(self.LM_PATH+"/HCLG.fst") - self.decoder_graph = LatticeFasterOnlineDecoder( - graph, decoder_opts) - self.symbols = _fst.SymbolTable.read_text( - self.LM_PATH+"/words.txt") - self.info = WordBoundaryInfo.from_file( - WordBoundaryInfoNewOpts(), self.LM_PATH+"/word_boundary.int") - - - self.asr = NnetLatticeFasterOnlineRecognizer(self.transition_model, self.acoustic_model, self.decoder_graph, - self.symbols, decodable_opts=self.decodable_opts, endpoint_opts=self.endpoint_opts) - del graph, decoder_opts - except Exception as e: - self.log.error(e) - raise ValueError( - "AM and LM loading failed!!! (see logs for more details)") +class Worker: + def __init__(self): + # Set logger config + self.log = logging.getLogger("__stt-standelone-worker__") + logging.basicConfig(level=logging.INFO) - def get_sample_rate(self): - return self.samp_freq + # Main parameters + self.AM_PATH = '/opt/models/AM' + self.LM_PATH = '/opt/models/LM' + self.TEMP_FILE_PATH = '/opt/tmp' + self.CONFIG_FILES_PATH = '/opt/config' + self.SAVE_AUDIO = False + self.SERVICE_PORT = 80 + self.NBR_THREADS = 100 + self.SWAGGER_URL = '/api-doc' + self.SWAGGER_PATH = '' + self.ONLINE = False - def get_frames(self, feat_pipeline): - rows = feat_pipeline.num_frames_ready() - cols = feat_pipeline.dim() - frames = Matrix(rows, cols) - feat_pipeline.get_frames(range(rows), frames) - return frames[:, :self.feat_info.mfcc_opts.num_ceps], frames[:, self.feat_info.mfcc_opts.num_ceps:] - # return feats + ivectors + if not os.path.isdir(self.CONFIG_FILES_PATH): + os.mkdir(self.CONFIG_FILES_PATH) - def compute_feat(self, wav): - try: - feat_pipeline = OnlineNnetFeaturePipeline(self.feat_info) - feat_pipeline.accept_waveform(self.samp_freq, wav) - feat_pipeline.input_finished() - except Exception as e: - self.log.error(e) - raise ValueError("Feature extraction failed!!!") - else: - return feat_pipeline + if not os.path.isdir(self.TEMP_FILE_PATH): + os.mkdir(self.TEMP_FILE_PATH) + + # Environment parameters + if 'NBR_THREADS' in os.environ: + if int(os.environ['NBR_THREADS']) > 0: + self.NBR_THREADS = int(os.environ['NBR_THREADS']) + else: + self.log.warning( + "You must to provide a positif number of threads 'NBR_THREADS'") + if 'SWAGGER_PATH' in os.environ: + self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] + + # start loading ASR configuration + self.log.info("Create the new config files") + self.loadConfig() + + def swaggerUI(self, app): + ### swagger specific ### + swagger_yml = yaml.load( + open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) + swaggerui = get_swaggerui_blueprint( + # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' + self.SWAGGER_URL, + self.SWAGGER_PATH, + config={ # Swagger UI config overrides + 'app_name': "STT API Documentation", + 'spec': swagger_yml + } + ) + app.register_blueprint(swaggerui, url_prefix=self.SWAGGER_URL) + ### end swagger specific ### - def decoder(self, feats): + def getAudio(self, file): + filename = str(uuid.uuid4()) + self.file_path = self.TEMP_FILE_PATH+"/"+filename + file.save(self.file_path) try: - start_time = time.time() - self.log.info("Start Decoding: %s" % (start_time)) - self.asr.set_input_pipeline(feats) - decode = self.asr.decode() - self.log.info("Decode time in 
seconds: %s" % - (time.time() - start_time)) + self.rate, self.data = wavfile.read(self.file_path) + # if stereo file, convert to mono by computing the mean of the channels + if len(self.data.shape) == 2 and self.data.shape[1] == 2: + self.data = np.mean(self.data, axis=1, dtype=np.int16) except Exception as e: self.log.error(e) - raise ValueError("Decoder failed to transcribe the input audio!!!") + raise ValueError("The uploaded file format is not supported!!!") + + def clean(self): + if not self.SAVE_AUDIO: + os.remove(self.file_path) + + # re-create config files + def loadConfig(self): + # load decoder parameters from "decode.cfg" + decoder_settings = configparser.ConfigParser() + if os.path.exists(self.AM_PATH+'/decode.cfg') == False: + return False + decoder_settings.read(self.AM_PATH+'/decode.cfg') + + # Prepare "online.conf" + self.AM_PATH = self.AM_PATH+"/" + \ + decoder_settings.get('decoder_params', 'ampath') + with open(self.AM_PATH+"/conf/online.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--ivector-extraction-config=" + + self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") + f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf\n") + f.write( + "--beam="+decoder_settings.get('decoder_params', 'beam')+"\n") + f.write( + "--lattice-beam="+decoder_settings.get('decoder_params', 'lattice_beam')+"\n") + f.write("--acoustic-scale=" + + decoder_settings.get('decoder_params', 'acwt')+"\n") + f.write( + "--min-active="+decoder_settings.get('decoder_params', 'min_active')+"\n") + f.write( + "--max-active="+decoder_settings.get('decoder_params', 'max_active')+"\n") + f.write("--frame-subsampling-factor="+decoder_settings.get( + 'decoder_params', 'frame_subsampling_factor')+"\n") + + # Prepare "ivector_extractor.conf" + with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: + values = f.readlines() + with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: + for i in values: + f.write(i) + f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") + f.write("--cmvn-config="+self.AM_PATH + + "/conf/online_cmvn.conf\n") + f.write("--lda-matrix="+self.AM_PATH + + "/ivector_extractor/final.mat\n") + f.write("--global-cmvn-stats="+self.AM_PATH + + "/ivector_extractor/global_cmvn.stats\n") + f.write("--diag-ubm="+self.AM_PATH + + "/ivector_extractor/final.dubm\n") + f.write("--ivector-extractor="+self.AM_PATH + + "/ivector_extractor/final.ie") + + # Prepare "word_boundary.int" if not exist + if not os.path.exists(self.LM_PATH+"/word_boundary.int") and os.path.exists(self.AM_PATH+"/phones.txt"): + self.log.info("Create word_boundary.int based on phones.txt") + with open(self.AM_PATH+"/phones.txt") as f: + phones = f.readlines() + + with open(self.LM_PATH+"/word_boundary.int", "w") as f: + for phone in phones: + phone = phone.strip() + phone = re.sub('^ .*', '', phone) + phone = re.sub('^#\d+ .*', '', phone) + if phone != '': + id = phone.split(' ')[1] + if '_I ' in phone: + f.write(id+" internal\n") + elif '_B ' in phone: + f.write(id+" begin\n") + elif '_E ' in phone: + f.write(id+" end\n") + elif '_S ' in phone: + f.write(id+" singleton\n") + else: + f.write(id+" nonword\n") + + # remove extra symbols + def parse_text(self, text): + text = re.sub(r"", "", text) # remove symbol + text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark + text = re.sub(r"' ", "'", text) # remove space after quote ' + text = re.sub(r" +", " ", text) # remove multiple spaces + text = 
text.strip() + return text + + # Postprocess response + def get_response(self, dataJson, confidence, is_metadata): + if dataJson is not None: + data = json.loads(dataJson) + data['conf'] = confidence + if not is_metadata: + text = data['text'] # get text from response + return self.parse_text(text) + + elif 'words' in data: + # Do speaker diarization and get speaker segments + spk = SpeakerDiarization() + spkrs = spk.run(self.file_path) + + # Generate final output data + return self.process_output(data, spkrs) + elif 'text' in data: + return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': []} + else: + return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} else: - return decode + return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} + + # return a json object including word-data, speaker-data - def wordTimestamp(self, text, lattice, frame_shift, frame_subsampling): + def process_output(self, data, spkrs): try: - _fst.utils.scale_compact_lattice( - [[1.0, 0], [0, float(self.acwt)]], lattice) - bestPath = compact_lattice_shortest_path(lattice) - _fst.utils.scale_compact_lattice( - [[1.0, 0], [0, 1.0/float(self.acwt)]], bestPath) - bestLattice = word_align_lattice( - bestPath, self.transition_model, self.info, 0) - alignment = compact_lattice_to_word_alignment(bestLattice[1]) - words = _fst.indices_to_symbols(self.symbols, alignment[0]) - start = alignment[1] - dur = alignment[2] - - output = {} - output["words"] = [] - for i in range(len(words)): - meta = {} - meta["word"] = words[i] - meta["start"] = round(start[i] * frame_shift * frame_subsampling, 2) - meta["end"] = round((start[i]+dur[i]) * frame_shift * frame_subsampling, 2) - output["words"].append(meta) - text += " "+meta["word"] - output["text"] = text + speakers = [] + text = [] + i = 0 + text_ = "" + words = [] + for word in data['words']: + if i+1 == len(spkrs): + continue + if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: + text_ += word["word"] + " " + words.append(word) + elif len(words) != 0: + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words + + text.append( + 'spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) + speakers.append(speaker) + + words = [word] + text_ = word["word"] + " " + i += 1 + else: + words = [word] + text_ = word["word"] + " " + i += 1 - except Exception as e: - self.log.error(e) - raise ValueError("Decoder failed to create the word timestamps!!!") - else: - return output + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) + speaker["words"] = words + + text.append('spk'+str(int(spkrs[i][2])) + + ' : ' + self.parse_text(text_)) + speakers.append(speaker) + + return {'speakers': speakers, 'text': text, 'confidence-score': data['conf']} + except: + return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} class SpeakerDiarization: - def __init__(self, sample_rate): + def __init__(self): self.log = logging.getLogger( '__stt-standelone-worker__.SPKDiarization') - # MFCC FEATURES PARAMETERS - self.sr = sample_rate + # MFCC FEATURES PARAMETERS self.frame_length_s = 0.025 self.frame_shift_s = 0.01 - self.num_bins = 40 - self.num_ceps = 40 - self.low_freq = 40 - self.high_freq = -200 - if self.sr == 16000: - self.low_freq = 20 - self.high_freq = 7600 - ##### - - # 
VAD PARAMETERS - self.vad_ops = VadEnergyOptions() - self.vad_ops.vad_energy_mean_scale = 0.9 - self.vad_ops.vad_energy_threshold = 5 - #vad_ops.vad_frames_context = 2 - #vad_ops.vad_proportion_threshold = 0.12 + self.num_bins = 30 + self.num_ceps = 30 ##### # Segment @@ -237,85 +315,52 @@ def __init__(self, sample_rate): self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames ###### - def compute_feat_KALDI(self, wav): + def compute_feat_Librosa(self, audioFile): try: - po = ParseOptions("") - mfcc_opts = MfccOptions() - mfcc_opts.use_energy = False - mfcc_opts.frame_opts.samp_freq = self.sr - mfcc_opts.frame_opts.frame_length_ms = self.frame_length_s*1000 - mfcc_opts.frame_opts.frame_shift_ms = self.frame_shift_s*1000 - mfcc_opts.frame_opts.allow_downsample = False - mfcc_opts.mel_opts.num_bins = self.num_bins - mfcc_opts.mel_opts.low_freq = self.low_freq - mfcc_opts.mel_opts.high_freq = self.high_freq - mfcc_opts.num_ceps = self.num_ceps - mfcc_opts.register(po) - - # Create MFCC object and obtain sample frequency - mfccObj = Mfcc(mfcc_opts) - mfccKaldi = mfccObj.compute_features(wav, self.sr, 1.0) + self.data, self.sr = librosa.load(audioFile, sr=None) + frame_length_inSample = self.frame_length_s * self.sr + hop = int(self.frame_shift_s * self.sr) + NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) + if self.sr >= 16000: + mfccNumpy = librosa.feature.mfcc(y=self.data, + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop, + fmin=20, + fmax=7600).T + else: + mfccNumpy = librosa.feature.mfcc(y=self.data, + sr=self.sr, + dct_type=2, + n_mfcc=self.num_ceps, + n_mels=self.num_bins, + n_fft=NFFT, + hop_length=hop).T + except Exception as e: self.log.error(e) raise ValueError( - "Speaker diarization failed while extracting features!!!") + "Speaker diarization failed when extracting features!!!") else: - return mfccKaldi + return mfccNumpy - def computeVAD_KALDI(self, feats): + def computeVAD_WEBRTC(self, data, sr, nFeatures): try: - vadStream = compute_vad_energy(self.vad_ops, feats) - vad = Vector(vadStream) - VAD = vad.numpy() - - #  segmentation - occurence = [] - value = [] - occurence.append(1) - value.append(VAD[0]) - - # compute the speech and non-speech frames - for i in range(1, len(VAD)): - if value[-1] == VAD[i]: - occurence[-1] += 1 - else: - occurence.append(1) - value.append(VAD[i]) - - # filter the speech and non-speech segments that are below 30 frames - i = 0 - while(i < len(occurence)): - if i != 0 and (occurence[i] < 30 or value[i-1] == value[i]): - occurence[i-1] += occurence[i] - del value[i] - del occurence[i] - else: - i += 1 - - # split if and only if the silence is above 50 frames - i = 0 - while(i < len(occurence)): - if i != 0 and ((occurence[i] < 30 and value[i] == 0.0) or value[i-1] == value[i]): - occurence[i-1] += occurence[i] - del value[i] - del occurence[i] - else: - i += 1 - - # compute VAD mask - maskSAD = np.zeros(len(VAD)) - start = 0 - for i in range(len(occurence)): - if value[i] == 1.0: - end = start+occurence[i] - maskSAD[start:end] = 1 - start = end - else: - start += occurence[i] - - maskSAD = np.expand_dims(maskSAD, axis=0) - except ValueError as v: - self.log.error(v) + if sr not in [8000, 16000, 32000, 48000]: + data = librosa.resample(data, sr, 16000) + sr = 16000 + + va_framed = py_webrtcvad( + data, fs=sr, fs_vad=sr, hoplength=30, vad_mode=0) + segments = get_py_webrtcvad_segments(va_framed, sr) + maskSAD = np.zeros([1, nFeatures]) + for seg in segments: + start 
= int(np.round(seg[0]/self.frame_shift_s)) + end = int(np.round(seg[1]/self.frame_shift_s)) + maskSAD[0][start:end] = 1 except Exception as e: self.log.error(e) raise ValueError( @@ -323,7 +368,7 @@ def computeVAD_KALDI(self, feats): else: return maskSAD - def run(self, wav, dur, feats=None): + def run(self, audioFile): try: def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): numberOfSpeechFeatures = finalSegmentTable[-1, 2].astype(int)+1 @@ -357,18 +402,19 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): seg[0][0] = 0.0 return seg - start_time = time.time() - self.log.info("Start Speaker Diarization: %s" % (start_time)) - if self.maxNrSpeakers == 1 or dur < 5: - self.log.info("Speaker Diarization time in seconds: %s" % - (time.time() - start_time)) - return [[0, dur, 1], - [dur, -1, -1]] - if feats == None: - feats = self.compute_feat_KALDI(wav) + + self.log.info('Start Speaker diarization') + + feats = self.compute_feat_Librosa(audioFile) nFeatures = feats.shape[0] - maskSAD = self.computeVAD_KALDI(feats) + duration = nFeatures * self.frame_shift_s + + if duration < 5: + return [[0, duration, 1], + [duration, -1, -1]] + + maskSAD = self.computeVAD_WEBRTC(self.data, self.sr, nFeatures) maskUEM = np.ones([1, nFeatures]) mask = np.logical_and(maskUEM, maskSAD) @@ -393,8 +439,9 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): windowRate = int(self.maximumKBMWindowRate) if windowRate == 0: - raise ValueError( - 'The audio is to short in order to perform the speaker diarization!!!') + #self.log.info('The audio is to short in order to perform the speaker diarization!!!') + return [[0, duration, 1], + [duration, -1, -1]] poolSize = np.floor((nSpeechFeatures-self.windowLength)/windowRate) if self.useRelativeKBMsize: @@ -436,217 +483,21 @@ def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): if self.resegmentation and np.size(np.unique(finalClusteringTable[:, bestClusteringID.astype(int)-1]), 0) > 1: finalClusteringTableResegmentation, finalSegmentTable = performResegmentation(data, speechMapping, mask, finalClusteringTable[:, bestClusteringID.astype( int)-1], segmentTable, self.modelSize, self.nbIter, self.smoothWin, nSpeechFeatures) - seg = getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze(finalClusteringTableResegmentation), dur) + seg = getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze( + finalClusteringTableResegmentation), duration) else: - seg = getSegmentationFile( - self.frame_shift_s, segmentTable, finalClusteringTable[:, bestClusteringID.astype(int)-1]) - self.log.info("Speaker Diarization time in seconds: %s" % - (time.time() - start_time)) + return [[0, duration, 1], + [duration, -1, -1]] + + self.log.info("Speaker Diarization time in seconds: %d" % + int(time.time() - start_time)) except ValueError as v: - self.log.info(v) - return [[0, dur, 1], - [dur, -1, -1]] + self.log.error(v) + return [[0, duration, 1], + [duration, -1, -1]] except Exception as e: self.log.error(e) - raise ValueError("Speaker Diarization failed!!!") + return [[0, duration, 1], + [duration, -1, -1]] else: return seg - - -class SttStandelone: - def __init__(self): - self.log = logging.getLogger("__stt-standelone-worker-streaming__") - logging.basicConfig(level=logging.INFO) - - # Main parameters - self.AM_PATH = '/opt/models/AM' - self.LM_PATH = '/opt/models/LM' - self.TEMP_FILE_PATH = '/opt/tmp' - self.CONFIG_FILES_PATH = '/opt/config' - self.SAVE_AUDIO = False - self.SERVICE_PORT = 80 - 
self.SWAGGER_URL = '/api-doc' - self.SWAGGER_PATH = None - - if not os.path.isdir(self.TEMP_FILE_PATH): - os.mkdir(self.TEMP_FILE_PATH) - if not os.path.isdir(self.CONFIG_FILES_PATH): - os.mkdir(self.CONFIG_FILES_PATH) - - # Environment parameters - if 'SERVICE_PORT' in os.environ: - self.SERVICE_PORT = os.environ['SERVICE_PORT'] - if 'SAVE_AUDIO' in os.environ: - self.SAVE_AUDIO = os.environ['SAVE_AUDIO'] - if 'SWAGGER_PATH' in os.environ: - self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] - - self.loadConfig() - - def loadConfig(self): - # get decoder parameters from "decode.cfg" - decoder_settings = configparser.ConfigParser() - if not os.path.exists(self.AM_PATH+'/decode.cfg'): - return False - decoder_settings.read(self.AM_PATH+'/decode.cfg') - - # Prepare "online.conf" - self.AM_PATH = self.AM_PATH+"/" + \ - decoder_settings.get('decoder_params', 'ampath') - with open(self.AM_PATH+"/conf/online.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/online.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--ivector-extraction-config=" + - self.CONFIG_FILES_PATH+"/ivector_extractor.conf\n") - f.write("--mfcc-config="+self.AM_PATH+"/conf/mfcc.conf\n") - f.write( - "--beam="+decoder_settings.get('decoder_params', 'beam')+"\n") - f.write( - "--lattice-beam="+decoder_settings.get('decoder_params', 'lattice_beam')+"\n") - f.write("--acoustic-scale=" + - decoder_settings.get('decoder_params', 'acwt')+"\n") - f.write( - "--min-active="+decoder_settings.get('decoder_params', 'min_active')+"\n") - f.write( - "--max-active="+decoder_settings.get('decoder_params', 'max_active')+"\n") - f.write("--frame-subsampling-factor="+decoder_settings.get( - 'decoder_params', 'frame_subsampling_factor')+"\n") - - # Prepare "ivector_extractor.conf" - with open(self.AM_PATH+"/conf/ivector_extractor.conf") as f: - values = f.readlines() - with open(self.CONFIG_FILES_PATH+"/ivector_extractor.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--splice-config="+self.AM_PATH+"/conf/splice.conf\n") - f.write("--cmvn-config="+self.AM_PATH + - "/conf/online_cmvn.conf\n") - f.write("--lda-matrix="+self.AM_PATH + - "/ivector_extractor/final.mat\n") - f.write("--global-cmvn-stats="+self.AM_PATH + - "/ivector_extractor/global_cmvn.stats\n") - f.write("--diag-ubm="+self.AM_PATH + - "/ivector_extractor/final.dubm\n") - f.write("--ivector-extractor="+self.AM_PATH + - "/ivector_extractor/final.ie") - - # Prepare "word_boundary.int" if not exist - if not os.path.exists(self.LM_PATH+"/word_boundary.int") and os.path.exists(self.AM_PATH+"phones.txt"): - with open(self.AM_PATH+"phones.txt") as f: - phones = f.readlines() - - with open(self.LM_PATH+"/word_boundary.int", "w") as f: - for phone in phones: - phone = phone.strip() - phone = re.sub('^ .*', '', phone) - phone = re.sub('^#\d+ .*', '', phone) - if phone != '': - id = phone.split(' ')[1] - if '_I ' in phone: - f.write(id+" internal\n") - elif '_B ' in phone: - f.write(id+" begin\n") - elif '_E ' in phone: - f.write(id+" end\n") - elif '_S ' in phone: - f.write(id+" singleton\n") - else: - f.write(id+" nonword\n") - - def swaggerUI(self, app): - ### swagger specific ### - swagger_yml = yaml.load( - open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) - swaggerui = get_swaggerui_blueprint( - # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' - self.SWAGGER_URL, - self.SWAGGER_PATH, - config={ # Swagger UI config overrides - 'app_name': "STT API Documentation", - 'spec': swagger_yml - } - ) - app.register_blueprint(swaggerui, 
url_prefix=self.SWAGGER_URL) - ### end swagger specific ### - - def read_audio(self, file, sample_rate): - filename = str(uuid.uuid4()) - file_path = self.TEMP_FILE_PATH+"/"+filename - file.save(file_path) - try: - data, sr = librosa.load(file_path, sr=None) - if sr != sample_rate: - self.log.info('Resample audio file: '+str(sr) + - 'Hz -> '+str(sample_rate)+'Hz') - data = librosa.resample(data, sr, sample_rate) - data = (data * 32767).astype(np.int16) - self.dur = len(data) / sample_rate - self.data = Vector(data) - except Exception as e: - self.log.error(e) - raise ValueError("The uploaded file format is not supported!!!") - finally: - if not self.SAVE_AUDIO: - os.remove(file_path) - - def run(self, asr, metadata): - feats = asr.compute_feat(self.data) - mfcc, ivector = asr.get_frames(feats) - decode = asr.decoder(feats) - if metadata: - spk = SpeakerDiarization(asr.get_sample_rate()) - spkSeg = spk.run(self.data, self.dur, mfcc) - data = asr.wordTimestamp(decode["text"], decode['lattice'], asr.frame_shift, asr.decodable_opts.frame_subsampling_factor) - output = self.process_output(data, spkSeg) - return output - else: - return self.parse_text(decode["text"]) - - - # return a json object including word-data, speaker-data - def process_output(self, data, spkrs): - speakers = [] - text = [] - i = 0 - text_ = "" - words=[] - for word in data['words']: - if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: - text_ += word["word"] + " " - words.append(word) - else: - speaker = {} - speaker["start"]=words[0]["start"] - speaker["end"]=words[len(words)-1]["end"] - speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) - speaker["words"]=words - - text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) - speakers.append(speaker) - - words=[word] - text_=word["word"] + " " - i+=1 - - speaker = {} - speaker["start"]=words[0]["start"] - speaker["end"]=words[len(words)-1]["end"] - speaker["speaker_id"]='spk'+str(int(spkrs[i][2])) - speaker["words"]=words - - text.append('spk'+str(int(spkrs[i][2]))+' : '+ self.parse_text(text_)) - speakers.append(speaker) - - return {'speakers': speakers, 'text': text} - - # remove extra symbols - def parse_text(self, text): - text = re.sub(r"", "", text) # remove symbol - text = re.sub(r"#nonterm:[^ ]* ", "", text) # remove entity's mark - text = re.sub(r"", "", text) # remove - text = re.sub(r"' ", "'", text) # remove space after quote ' - text = re.sub(r" +", " ", text) # remove multiple spaces - text = text.strip() - return text diff --git a/vosk-api b/vosk-api new file mode 160000 index 0000000..7f555e4 --- /dev/null +++ b/vosk-api @@ -0,0 +1 @@ +Subproject commit 7f555e464c1d6b16233354491868f46d009c453c From a78f891fc1cb76e2a55d369af086b5853da90376 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 12 Jan 2021 16:21:23 +0100 Subject: [PATCH 041/172] update README --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 43df31e..bd11978 100644 --- a/README.md +++ b/README.md @@ -140,7 +140,8 @@ Convert a speech to text > `post`
> Make a POST request >> Arguments : ->> - **{File} file** : Audio file (file format: wav, mp3, flac, ogg) +>> - **{File} file** Audio File - Waveform Audio File Format is required + > >> Header : >> - **{String} Accept**: response content type (text/plain|application/json) From 5b9ebb6a41be19d628fbe1cef3c0c47bad9ca16c Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 13:37:40 +0100 Subject: [PATCH 042/172] update --- docker-compose.yml | 2 +- run.py | 8 -------- 2 files changed, 1 insertion(+), 9 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index f7da7db..08c14d0 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,7 +5,7 @@ services: stt-worker: container_name: stt-standalone-worker build: . - image: lintoai/linto-platform-stt-standalone-worker:latest-unstable + image: lintoai/linto-platform-stt-standalone-worker:latest volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM diff --git a/run.py b/run.py index c40e4f0..8f594d3 100644 --- a/run.py +++ b/run.py @@ -6,14 +6,6 @@ from tools import Worker from time import gmtime, strftime from gevent.pywsgi import WSGIServer -import os - -from gevent.pywsgi import WSGIServer - - - -from gevent.pywsgi import WSGIServer - app = Flask("__stt-standelone-worker__") From ed50dde63ec61d2ea633cbc06e805274c6c92dcc Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 13:38:33 +0100 Subject: [PATCH 043/172] update --- run.py | 1 - 1 file changed, 1 deletion(-) diff --git a/run.py b/run.py index 5a7fb25..ea77af2 100644 --- a/run.py +++ b/run.py @@ -5,7 +5,6 @@ from vosk import Model, KaldiRecognizer from tools import Worker from time import gmtime, strftime - from gevent.pywsgi import WSGIServer From cf8192af0b1059f0266a29c92615553acac8d362 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 13:39:06 +0100 Subject: [PATCH 044/172] update --- run.py | 1 - 1 file changed, 1 deletion(-) diff --git a/run.py b/run.py index ea77af2..8f594d3 100644 --- a/run.py +++ b/run.py @@ -8,7 +8,6 @@ from gevent.pywsgi import WSGIServer - app = Flask("__stt-standelone-worker__") # create WorkerStreaming object From ae038127505d4172a147a673a913eb11a7e749dd Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 14:50:48 +0100 Subject: [PATCH 045/172] update README --- .envdefault | 2 +- README.md | 99 +++++++++++++++++++++++++---------------------------- 2 files changed, 48 insertions(+), 53 deletions(-) diff --git a/.envdefault b/.envdefault index 2246e24..130f6ef 100644 --- a/.envdefault +++ b/.envdefault @@ -1,3 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=/path/to/swagger/file \ No newline at end of file +SWAGGER_PATH=./document/swagger.yml \ No newline at end of file diff --git a/README.md b/README.md index bd11978..a966856 100644 --- a/README.md +++ b/README.md @@ -5,12 +5,21 @@ This service is mandatory in a LinTO platform stack as the main worker for speec Generally, Automatic Speech Recognition (ASR) is the task of recognition and translation of spoken language into text. Our ASR system takes advantages from the recent advances in machine learning technologies and in particular deep learning ones (TDNN, LSTM, attentation-based architecture). The core of our system consists of two main components: an acoustic model and a decoding graph. A high-performance ASR system relies on an accurate acoustic model as well as a perfect decoding graph. 
## Usage -See documentation : [doc.linto.ai](https://doc.linto.ai/#/services/linstt) +See documentation : [doc.linto.ai](https://doc.linto.ai) # Deploy With our proposed stack [linto-platform-stack](https://github.com/linto-ai/linto-platform-stack) +# Hardware requirements +In order to install and run this service, you need to have at least: + * 5Go available on your hard drive for the installation, and + * 500Mo/3Go/7Go of RAM memory available for models loading and decoding. The size depends mainly on the choosed decoding model (small, medium or big). + +While there is no specific minimal requirement on the CPU, speech recognition is a computationally task. + +**`—The better your hardware performance, the lower your decoding time—`** + # Develop ## Installation @@ -20,6 +29,7 @@ To start the LinSTT service on your local machine or your cloud, you need first ```bash git clone https://github.com/linto-ai/linto-platform-stt-standalone-worker +git submodule update --init cd linto-platform-stt-standalone-worker mv .envdefault .env ``` @@ -27,7 +37,7 @@ mv .envdefault .env Then, to build the docker image, execute: ```bash -docker build -t lintoai/linto-platform-stt-standalone-worker . +docker build -t lintoai/linto-platform-stt-standalone-worker:latest . ``` Or by docker-compose, by using: @@ -42,16 +52,12 @@ Or, download the pre-built image from docker-hub: docker pull lintoai/linto-platform-stt-standalone-worker:latest ``` -NOTE: You must install docker on your machine. +NB: You must install docker and docker-compose on your machine. ## Configuration -The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [linstt download](services/linstt_download). +The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [Downloads](https://doc.linto.ai/#/services/linstt_download). -### Outside LinTO-Platform-STT-Service-Manager - -If you want to use our service alone without LinTO-Platform-STT-Service-Manager, you must `unzip` the files and put the extracted ones in the [shared storage](https://doc.linto.ai/#/infra?id=shared-storage). For example, - -1- Download the French acoustic model and the small decoding graph +1- Download the French acoustic model and the small decoding graph (linstt.v1). You can download the latest version for optimal performance and you should make sure that you have the hardware requirement in terms of RAM. 
```bash wget https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/linSTT_AM_fr-FR_v1.0.0.zip @@ -68,41 +74,31 @@ unzip decoding_graph_fr-FR_Small_v1.1.0.zip -d DG_fr-FR_Small 3- Move the uncompressed files into the shared storage directory ```bash -mv AM_fr-FR ~/linto_shared/data -mv DG_fr-FR_Small ~/linto_shared/data +mkdir ~/linstt_model_storage +mv AM_fr-FR ~/linstt_model_storage +mv DG_fr-FR ~/linstt_model_storage ``` -4- Rename the default environment file `.envdefault` included in the repository `linto-platform-stt-standalone-worker` and configure it by providing the full path of the following parameters: - - AM_PATH=/full/path/to/linto_shared/data/AM_fr-FR - LM_PATH=/full/path/to/linto_shared/data/DG_fr-FR_Small +4- Configure the environment file `.env` included in this repository -5- If you want to use Swagger interface, you need to set the corresponding environment parameter: - SWAGGER_PATH=/full/path/to/swagger/file + AM_PATH=~/linstt_model_storage/AM_fr-FR + LM_PATH=~/linstt_model_storage/DG_fr-FR -NOTE: if you want to use the user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in the repository `linto-platform-stt-standalone-worker`. Specifically, in the section `host`, specify the address of the machine in which the service is deployed. - -### Using LinTO-Platform-STT-Service-Manager -In case you want to use `LinTO-Platform-STT-Service-Manager`, you need to: - -1- Create an acoustic model and upload the approriate file - -2- Create a language model and upload the corresponding decoding graph - -3- Configure the environment file of this service. - -For more details, see instructions in [LinTO - STT-Manager](https://doc.linto.ai/#/services/stt_manager) +NB: if you want to use the visual user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in this repository. Specifically, in the section `host`, specify the adress of the machine in which the service is deployed. ## Execute -In order to run the service alone, you have only to execute: +In order to run the service, you have only to execute: ```bash cd linto-platform-stt-standalone-worker -docker-compose up +docker run -p 8888:80 -v /full/path/to/linstt_model_storage/AM_fr-FR:/opt/models/AM -v /full/path/to/linstt_model_storage/DG_fr-FR:/opt/models/LM -v /full/path/to/linto-platform-stt-standalone-worker/document/swagger.yml:/opt/swagger.yml -e SWAGGER_PATH="/opt/swagger.yml" lintoai/linto-platform-stt-standalone-worker:latest ``` -Then you can acces it on [localhost:8888](localhost:8888) -To run and manager LinSTT under `LinTO-Platform-STT-Service-Manager` service, you need to create a service first and then to start it. See [LinTO - STT-Manager](https://doc.linto.ai/#/services/stt_manager_how2use?id=how-to-use-it) +or simply by executing: +```bash +cd linto-platform-stt-standalone-worker +docker-compose up +``` Our service requires an audio file in `Waveform format`. It should has the following parameters: @@ -112,27 +108,10 @@ Our service requires an audio file in `Waveform format`. It should has the follo - microphone: any type - duration: <30 minutes -Other formats are also supported: mp3, aiff, flac, and ogg. 
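For reference, here is a minimal client sketch for the `/transcribe` route documented below. It assumes the worker is reachable on `localhost:8888` (as in the `docker run` example above) and that the `requests` package is available on the client side; `audio.wav` is only a placeholder file name.

```python
# Sketch of a /transcribe call; "audio.wav" is a placeholder for any 16kHz, 16-bit mono WAV.
import requests

with open("audio.wav", "rb") as audio:
    response = requests.post(
        "http://localhost:8888/transcribe",
        files={"file": ("audio.wav", audio, "audio/wav")},
        headers={"Accept": "application/json"},  # "text/plain" returns the raw transcription only
    )

response.raise_for_status()
print(response.json())
```

With `Accept: application/json` the body follows the shape built in `tools.py` (`speakers`, `text`, `confidence-score`, `words`); with `Accept: text/plain` only the cleaned transcription string is returned.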
- -### Run Example Applications -To run an automated test go to the test folder - -```bash -cd linto-platform-stt-standalone-worker/test -``` - -And run the test script: - -```bash -./test_deployment.sh -``` - -Or use swagger interface to perform your personal test: localhost:8888/api-doc/ - - +### API -#### ** /transcribe ** +#### /transcribe Convert a speech to text @@ -149,3 +128,19 @@ Convert a speech to text > **{text|Json}** : Return the full transcription or a json object with metadata + + +### Run Example Applications +To run an automated test, go to the test folder: + +```bash +cd linto-platform-stt-standalone-worker/test +``` + +And run the test script: + +```bash +./test_deployment.sh +``` + +To run personal test, you can use swagger interface: `localhost:8888/api-doc/` \ No newline at end of file From 8674379d28993e919ea5462d9481d6ced4d2c641 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 14:54:13 +0100 Subject: [PATCH 046/172] update README --- .envdefault | 3 +- README.md | 80 ++++++++++++++++++++++++++++++++++------------------- 2 files changed, 52 insertions(+), 31 deletions(-) diff --git a/.envdefault b/.envdefault index 80acea5..e997778 100644 --- a/.envdefault +++ b/.envdefault @@ -1,4 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=/path/to/swagger/file -NBR_PROCESSES=1 \ No newline at end of file +SWAGGER_PATH=./document/swagger.yml diff --git a/README.md b/README.md index 45c75f7..3270b8c 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,17 @@ See documentation : [doc.linto.ai](https://doc.linto.ai) With our proposed stack [linto-platform-stack](https://github.com/linto-ai/linto-platform-stack) +# Hardware requirements +In order to install and run this service, you need to have at least: + +* 5Go available on your hard drive for the installation, and + +* 500Mo/3Go/7Go of RAM memory available for models loading and decoding. The size depends mainly on the choosed decoding model (small, medium or big). + +While there is no specific minimal requirement on the CPU, speech recognition is a computationally task. + +**`—The better your hardware performance, the lower your decoding time—`** + # Develop ## Installation @@ -20,6 +31,7 @@ To start the LinSTT service on your local machine or your cloud, you need first ```bash git clone https://github.com/linto-ai/linto-platform-stt-standalone-worker +git submodule update --init cd linto-platform-stt-standalone-worker mv .envdefault .env ``` @@ -27,7 +39,7 @@ mv .envdefault .env Then, to build the docker image, execute: ```bash -docker build -t lintoai/linto-platform-stt-standalone-worker . +docker build -t lintoai/linto-platform-stt-standalone-worker:latest . ``` Or by docker-compose, by using: @@ -42,16 +54,12 @@ Or, download the pre-built image from docker-hub: docker pull lintoai/linto-platform-stt-standalone-worker:latest ``` -NB: You must install docker on your machine. +NB: You must install docker and docker-compose on your machine. ## Configuration The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [Downloads](https://doc.linto.ai/#/services/linstt_download). 
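The settings that the worker actually reads from the acoustic-model package are defined in `loadConfig()` (see `tools.py` in this series): a `decode.cfg` file at the root of the AM directory, holding a single `[decoder_params]` section. The sketch below only illustrates that layout — the key names come from `loadConfig()`, but every value is a placeholder, not a recommended setting.

```python
# Illustrative parse of the decode.cfg consumed by loadConfig(); all values are placeholders.
import configparser

EXAMPLE_DECODE_CFG = """
[decoder_params]
ampath = am
beam = 10.0
lattice_beam = 4.0
acwt = 1.0
min_active = 200
max_active = 7000
frame_subsampling_factor = 3
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE_DECODE_CFG)
print(cfg.get("decoder_params", "beam"))  # -> "10.0"
```

`ampath` names the sub-directory of the AM package that holds `conf/` and `ivector_extractor/`; the remaining keys are copied into the `online.conf` that the worker generates at startup.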
-### Outside LinTO-Platform-STT-Service-Manager - -If you want to use our service alone without LinTO-Platform-STT-Service-Manager, you must `unzip` the files and put the extracted ones in the [shared storage](https://doc.linto.ai/#/infra?id=shared-storage). For example, - -1- Download the French acoustic model and the small decoding graph +1- Download the French acoustic model and the small decoding graph (linstt.v1). You can download the latest version for optimal performance and you should make sure that you have the hardware requirement in terms of RAM. ```bash wget https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/linSTT_AM_fr-FR_v1.0.0.zip @@ -68,38 +76,31 @@ unzip decoding_graph_fr-FR_Small_v1.1.0.zip -d DG_fr-FR_Small 3- Move the uncompressed files into the shared storage directory ```bash -mv AM_fr-FR ~/linto_shared/data -mv DG_fr-FR_Small ~/linto_shared/data +mkdir ~/linstt_model_storage +mv AM_fr-FR ~/linstt_model_storage +mv DG_fr-FR ~/linstt_model_storage ``` 4- Configure the environment file `.env` included in this repository - AM_PATH=/full/path/to/linto_shared/data/AM_fr-FR - LM_PATH=/full/path/to/linto_shared/data/DG_fr-FR_Small - + AM_PATH=~/linstt_model_storage/AM_fr-FR + LM_PATH=~/linstt_model_storage/DG_fr-FR NB: if you want to use the visual user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in this repository. Specifically, in the section `host`, specify the adress of the machine in which the service is deployed. -### Using LinTO-Platform-STT-Service-Manager -In case you want to use `LinTO-Platform-STT-Service-Manager`, you need to: - -1- Create an acoustic model and upload the approriate file - -2- Create a language model and upload the corresponding decoding graph - -3- Configure the environmenet file of this service. - -For more details, see configuration instruction in [LinTO - STT-Manager](https://doc.linto.ai/#/manager) - ## Execute -In order to run the service alone, you have only to execute: +In order to run the service, you have only to execute: ```bash cd linto-platform-stt-standalone-worker -docker-compose up +docker run -p 8888:80 -v /full/path/to/linstt_model_storage/AM_fr-FR:/opt/models/AM -v /full/path/to/linstt_model_storage/DG_fr-FR:/opt/models/LM -v /full/path/to/linto-platform-stt-standalone-worker/document/swagger.yml:/opt/swagger.yml -e SWAGGER_PATH="/opt/swagger.yml" lintoai/linto-platform-stt-standalone-worker:latest ``` -To run and manager LinSTT under `LinTO-Platform-STT-Service-Manager` service, you need to create a service first and then to start it. See [LinTO - STT-Manager](services/manager?id=execute) +or simply by executing: +```bash +cd linto-platform-stt-standalone-worker +docker-compose up +``` Our service requires an audio file in `Waveform format`. It should has the following parameters: @@ -109,8 +110,30 @@ Our service requires an audio file in `Waveform format`. It should has the follo - microphone: any type - duration: <30 minutes +### API + + +#### /transcribe + +Convert a speech to text + +### Functionality +> `post`
+> Make a POST request +>> Arguments : +>> - **{File} file** Audio File - Waveform Audio File Format is required + +> +>> Header : +>> - **{String} Accept**: response content type (text/plain|application/json) +> +> **{text|Json}** : Return the full transcription or a json object with metadata + + + + ### Run Example Applications -To run an automated test go to the test folder +To run an automated test, go to the test folder: ```bash cd linto-platform-stt-standalone-worker/test @@ -122,5 +145,4 @@ And run the test script: ./test_deployment.sh ``` -Or use swagger interface to perform your personal test - +To run personal test, you can use swagger interface: `localhost:8888/api-doc/` \ No newline at end of file From 31db0d0bdcd2336da05bf334f0becd12a3543993 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Mon, 22 Feb 2021 14:54:50 +0100 Subject: [PATCH 047/172] update README --- README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index a966856..3270b8c 100644 --- a/README.md +++ b/README.md @@ -13,8 +13,10 @@ With our proposed stack [linto-platform-stack](https://github.com/linto-ai/linto # Hardware requirements In order to install and run this service, you need to have at least: - * 5Go available on your hard drive for the installation, and - * 500Mo/3Go/7Go of RAM memory available for models loading and decoding. The size depends mainly on the choosed decoding model (small, medium or big). + +* 5Go available on your hard drive for the installation, and + +* 500Mo/3Go/7Go of RAM memory available for models loading and decoding. The size depends mainly on the choosed decoding model (small, medium or big). While there is no specific minimal requirement on the CPU, speech recognition is a computationally task. From a7c5cd5bd28519b4e38b32684a891e9f3e04e9a3 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 4 Mar 2021 11:36:21 +0100 Subject: [PATCH 048/172] update README --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3270b8c..419dcd6 100644 --- a/README.md +++ b/README.md @@ -31,8 +31,8 @@ To start the LinSTT service on your local machine or your cloud, you need first ```bash git clone https://github.com/linto-ai/linto-platform-stt-standalone-worker -git submodule update --init cd linto-platform-stt-standalone-worker +git submodule update --init mv .envdefault .env ``` From a86fb9e23707db29fbe33f1e55867b2188247e4a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 26 Mar 2021 13:57:02 +0100 Subject: [PATCH 049/172] update docker compose config --- docker-compose.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docker-compose.yml b/docker-compose.yml index f7da7db..08c14d0 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,7 +5,7 @@ services: stt-worker: container_name: stt-standalone-worker build: . 
- image: lintoai/linto-platform-stt-standalone-worker:latest-unstable + image: lintoai/linto-platform-stt-standalone-worker:latest volumes: - ${AM_PATH}:/opt/models/AM - ${LM_PATH}:/opt/models/LM From 57e110ac1dc3b222d23e8e772071eeeab8120950 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 26 Mar 2021 13:59:29 +0100 Subject: [PATCH 050/172] update env file --- .envdefault | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.envdefault b/.envdefault index e997778..130f6ef 100644 --- a/.envdefault +++ b/.envdefault @@ -1,3 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=./document/swagger.yml +SWAGGER_PATH=./document/swagger.yml \ No newline at end of file From 000758b8807e2d23b7fff62be21b6c8fe5a5aedb Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 26 Mar 2021 14:00:58 +0100 Subject: [PATCH 051/172] update readme --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3270b8c..419dcd6 100644 --- a/README.md +++ b/README.md @@ -31,8 +31,8 @@ To start the LinSTT service on your local machine or your cloud, you need first ```bash git clone https://github.com/linto-ai/linto-platform-stt-standalone-worker -git submodule update --init cd linto-platform-stt-standalone-worker +git submodule update --init mv .envdefault .env ``` From ff279c53b3cd011cd7537f6d286e2412319a860b Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 31 Mar 2021 11:14:26 +0200 Subject: [PATCH 052/172] remove speaker diarization from the linstt functions. add speaker diarization punctuation services dependencies for stt service --- .envdefault | 9 +- Dockerfile | 11 +- docker-compose.yml | 7 + docker-entrypoint.sh | 33 ++++ run.py | 51 ++++-- tools.py | 396 ++++++++++++++++--------------------------- wait-for-it.sh | 184 ++++++++++++++++++++ 7 files changed, 427 insertions(+), 264 deletions(-) create mode 100755 docker-entrypoint.sh create mode 100755 wait-for-it.sh diff --git a/.envdefault b/.envdefault index 130f6ef..8cc601e 100644 --- a/.envdefault +++ b/.envdefault @@ -1,3 +1,10 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=./document/swagger.yml \ No newline at end of file +SWAGGER_PATH=./document/swagger.yml + +# dependent services config +PUCTUATION_HOST=text-punctuation-worker-host-name +PUCTUATION_PORT=8080 +PUCTUATION_ROUTE="/api/route/path/" +SPEAKER_DIARIZATION_HOST=speaker-diarization-worker-host-name +SPEAKER_DIARIZATION_PORT=80 \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index c8e95cd..ee79f37 100644 --- a/Dockerfile +++ b/Dockerfile @@ -70,13 +70,18 @@ RUN cd /opt/vosk-api/python && \ export KALDI_MKL=1 && \ python3 setup.py install --user --single-version-externally-managed --root=/ +# Install curl for healthcheck +RUN apt-get install -y curl + # Define the main folder WORKDIR /usr/src/speech-to-text COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py -COPY tools.py . -COPY run.py . 
+COPY tools.py run.py docker-entrypoint.sh wait-for-it.sh ./ EXPOSE 80 -CMD python3 ./run.py \ No newline at end of file +HEALTHCHECK CMD curl http://localhost/healthcheck || exit 1 + +# Entrypoint handles the passed arguments +ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file diff --git a/docker-compose.yml b/docker-compose.yml index 08c14d0..d4baa12 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -16,3 +16,10 @@ services: env_file: .env environment: SWAGGER_PATH: /opt/swagger.yml + networks: + - linstt-net + +networks: + internal: + linstt-net: + external: true \ No newline at end of file diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh new file mode 100755 index 0000000..3555b4b --- /dev/null +++ b/docker-entrypoint.sh @@ -0,0 +1,33 @@ +#!/bin/bash +set -e + +max_attempts=3 +delay=5 + +for retry in $(seq 1 $max_attempts); do + echo "Waiting punctuation service... [attempt=$retry]" + punctuation_state=1 + ./wait-for-it.sh $PUCTUATION_HOST:$PUCTUATION_PORT --timeout=$delay || punctuation_state=0 +done + +if [ $punctuation_state == 1 ]; then + echo "$PUCTUATION_HOST:$PUCTUATION_PORT is up" +else + echo "punctuation service is not runninig" +fi + +for retry in $(seq 1 $max_attempts); do + echo "Waiting speaker diarization service... [attempt=$retry]" + spkdiarization_state=1 + ./wait-for-it.sh $SPEAKER_DIARIZATION_HOST:$SPEAKER_DIARIZATION_PORT --timeout=$delay || spkdiarization_state=0 +done + +if [ $spkdiarization_state == 1 ]; then + echo "$SPEAKER_DIARIZATION_HOST:$SPEAKER_DIARIZATION_PORT is up" +else + echo "speaker diarization service is not runninig" +fi + +echo "RUNNING service" + +python3 ./run.py --puctuation $punctuation_state --speaker_diarization $spkdiarization_state \ No newline at end of file diff --git a/run.py b/run.py index 8f594d3..7cba003 100644 --- a/run.py +++ b/run.py @@ -3,15 +3,18 @@ from flask import Flask, request, abort, Response, json from vosk import Model, KaldiRecognizer -from tools import Worker +from tools import Worker, SpeakerDiarization, Punctuation from time import gmtime, strftime from gevent.pywsgi import WSGIServer - +import argparse +import os app = Flask("__stt-standelone-worker__") -# create WorkerStreaming object +# instantiate services worker = Worker() +punctuation = Punctuation() +speakerdiarization = SpeakerDiarization() # Load ASR models (acoustic model and decoding graph) worker.log.info('Load acoustic model and decoding graph') @@ -19,7 +22,20 @@ worker.CONFIG_FILES_PATH+"/online.conf") spkModel = None +def decode(is_metadata): + rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) + rec.AcceptWaveform(worker.data) + data = rec.FinalResult() + confidence = rec.uttConfidence() + if is_metadata: + data = rec.GetMetadata() + return data, confidence + # API +@app.route('/healthcheck', methods=['GET']) +def healthcheck(): + return "1", 200 + @app.route('/transcribe', methods=['POST']) def transcribe(): try: @@ -40,18 +56,15 @@ def transcribe(): if 'file' in request.files.keys(): file = request.files['file'] worker.getAudio(file) - rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) - rec.AcceptWaveform(worker.data) - data_ = rec.FinalResult() - confidence = rec.uttConfidence() - if is_metadata: - data_ = rec.GetMetadata() - data = worker.get_response(data_, confidence, is_metadata) + data, confidence = decode(is_metadata) + spk = speakerdiarization.get(worker.file_path) + trans = worker.get_response(data, spk, confidence, is_metadata) + response = punctuation.get(trans) 
worker.clean() else: raise ValueError('No audio file was uploaded') - return data, 200 + return response, 200 except ValueError as error: return str(error), 400 except Exception as e: @@ -79,6 +92,22 @@ def server_error(error): if __name__ == '__main__': try: + parser = argparse.ArgumentParser() + parser.add_argument( + '--puctuation', + type=int, + help='punctuation service status', + default=0) + parser.add_argument( + '--speaker_diarization', + type=int, + help='speaker diarization service status', + default=0) + args = parser.parse_args() + + punctuation.setParam(True if args.puctuation else False) + speakerdiarization.setParam(True if args.speaker_diarization else False) + # start SwaggerUI if worker.SWAGGER_PATH != '': worker.swaggerUI(app) diff --git a/tools.py b/tools.py index 8844e48..286490b 100644 --- a/tools.py +++ b/tools.py @@ -24,13 +24,14 @@ import numpy as np from scipy.io import wavfile from flask_swagger_ui import get_swaggerui_blueprint +import requests ############## class Worker: def __init__(self): # Set logger config - self.log = logging.getLogger("__stt-standelone-worker__") + self.log = logging.getLogger("__stt-standelone-worker__.Worker") logging.basicConfig(level=logging.INFO) # Main parameters @@ -40,7 +41,6 @@ def __init__(self): self.CONFIG_FILES_PATH = '/opt/config' self.SAVE_AUDIO = False self.SERVICE_PORT = 80 - self.NBR_THREADS = 100 self.SWAGGER_URL = '/api-doc' self.SWAGGER_PATH = '' self.ONLINE = False @@ -52,12 +52,9 @@ def __init__(self): os.mkdir(self.TEMP_FILE_PATH) # Environment parameters - if 'NBR_THREADS' in os.environ: - if int(os.environ['NBR_THREADS']) > 0: - self.NBR_THREADS = int(os.environ['NBR_THREADS']) - else: - self.log.warning( - "You must to provide a positif number of threads 'NBR_THREADS'") + if 'SAVE_AUDIO' in os.environ: + self.SAVE_AUDIO = True if os.environ['SAVE_AUDIO'].lower( + ) == "true" else False if 'SWAGGER_PATH' in os.environ: self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] @@ -182,7 +179,7 @@ def parse_text(self, text): return text # Postprocess response - def get_response(self, dataJson, confidence, is_metadata): + def get_response(self, dataJson, speakers, confidence, is_metadata): if dataJson is not None: data = json.loads(dataJson) data['conf'] = confidence @@ -191,12 +188,12 @@ def get_response(self, dataJson, confidence, is_metadata): return self.parse_text(text) elif 'words' in data: - # Do speaker diarization and get speaker segments - spk = SpeakerDiarization() - spkrs = spk.run(self.file_path) + if speakers is not None: + # Generate final output data + return self.process_output_v2(data, speakers) + else: + return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': data['words']} - # Generate final output data - return self.process_output(data, spkrs) elif 'text' in data: return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': []} else: @@ -205,7 +202,6 @@ def get_response(self, dataJson, confidence, is_metadata): return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} # return a json object including word-data, speaker-data - def process_output(self, data, spkrs): try: speakers = [] @@ -252,252 +248,154 @@ def process_output(self, data, spkrs): except: return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} - -class SpeakerDiarization: - def __init__(self): - self.log = logging.getLogger( - '__stt-standelone-worker__.SPKDiarization') - - # MFCC FEATURES PARAMETERS - self.frame_length_s = 
0.025 - self.frame_shift_s = 0.01 - self.num_bins = 30 - self.num_ceps = 30 - ##### - - # Segment - self.seg_length = 100 # Window size in frames - self.seg_increment = 100 # Window increment after and before window in frames - self.seg_rate = 100 # Window shifting in frames - ##### - - # KBM - # Minimum number of Gaussians in the initial pool - self.minimumNumberOfInitialGaussians = 1024 - self.maximumKBMWindowRate = 50 # Maximum window rate for Gaussian computation - self.windowLength = 200 # Window length for computing Gaussians - self.kbmSize = 320 # Number of final Gaussian components in the KBM - # If set to 1, the KBM size is set as a proportion, given by "relKBMsize", of the pool size - self.useRelativeKBMsize = 1 - # Relative KBM size if "useRelativeKBMsize = 1" (value between 0 and 1). - self.relKBMsize = 0.3 - ###### - - # BINARY_KEY - self.topGaussiansPerFrame = 5 # Number of top selected components per frame - self.bitsPerSegmentFactor = 0.2 # Percentage of bits set to 1 in the binary keys - ###### - - # CLUSTERING - self.N_init = 16 # Number of initial clusters - # Set to one to perform linkage clustering instead of clustering/reassignment - self.linkage = 0 - # Linkage criterion used if linkage==1 ('average', 'single', 'complete') - self.linkageCriterion = 'average' - # Similarity metric: 'cosine' for cumulative vectors, and 'jaccard' for binary keys - self.metric = 'cosine' - ###### - - # CLUSTERING_SELECTION - # Distance metric used in the selection of the output clustering solution ('jaccard','cosine') - self.metric_clusteringSelection = 'cosine' - # Method employed for number of clusters selection. Can be either 'elbow' for an elbow criterion based on within-class sum of squares (WCSS) or 'spectral' for spectral clustering - self.bestClusteringCriterion = 'elbow' - self.sigma = 1 # Spectral clustering parameters, employed if bestClusteringCriterion == spectral - self.percentile = 40 - self.maxNrSpeakers = 10 # If known, max nr of speakers in a sesssion in the database. 
This is to limit the effect of changes in very small meaningless eigenvalues values generating huge eigengaps - ###### - - # RESEGMENTATION - self.resegmentation = 1 # Set to 1 to perform re-segmentation - self.modelSize = 6 # Number of GMM components - self.nbIter = 10 # Number of expectation-maximization (EM) iterations - self.smoothWin = 100 # Size of the likelihood smoothing window in nb of frames - ###### - - def compute_feat_Librosa(self, audioFile): + # return a json object including word-data, speaker-data + def process_output_v2(self, data, spkrs): try: - self.data, self.sr = librosa.load(audioFile, sr=None) - frame_length_inSample = self.frame_length_s * self.sr - hop = int(self.frame_shift_s * self.sr) - NFFT = int(2**np.ceil(np.log2(frame_length_inSample))) - if self.sr >= 16000: - mfccNumpy = librosa.feature.mfcc(y=self.data, - sr=self.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop, - fmin=20, - fmax=7600).T - else: - mfccNumpy = librosa.feature.mfcc(y=self.data, - sr=self.sr, - dct_type=2, - n_mfcc=self.num_ceps, - n_mels=self.num_bins, - n_fft=NFFT, - hop_length=hop).T + speakers = [] + text = [] + i = 0 + text_ = "" + words = [] - except Exception as e: - self.log.error(e) - raise ValueError( - "Speaker diarization failed when extracting features!!!") - else: - return mfccNumpy + for word in data['words']: + if i+1 == len(spkrs): + continue + if i+1 < len(spkrs) and word["end"] < spkrs[i+1]["seg_begin"]: + text_ += word["word"] + " " + words.append(word) + elif len(words) != 0: + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = str(spkrs[i]["spk_id"]) + speaker["words"] = words - def computeVAD_WEBRTC(self, data, sr, nFeatures): - try: - if sr not in [8000, 16000, 32000, 48000]: - data = librosa.resample(data, sr, 16000) - sr = 16000 - - va_framed = py_webrtcvad( - data, fs=sr, fs_vad=sr, hoplength=30, vad_mode=0) - segments = get_py_webrtcvad_segments(va_framed, sr) - maskSAD = np.zeros([1, nFeatures]) - for seg in segments: - start = int(np.round(seg[0]/self.frame_shift_s)) - end = int(np.round(seg[1]/self.frame_shift_s)) - maskSAD[0][start:end] = 1 + text.append( + str(spkrs[i]["spk_id"])+' : ' + self.parse_text(text_)) + speakers.append(speaker) + + words = [word] + text_ = word["word"] + " " + i += 1 + else: + words = [word] + text_ = word["word"] + " " + i += 1 + + speaker = {} + speaker["start"] = words[0]["start"] + speaker["end"] = words[len(words)-1]["end"] + speaker["speaker_id"] = str(spkrs[i]["spk_id"]) + speaker["words"] = words + + text.append(str(spkrs[i]["spk_id"]) + + ' : ' + self.parse_text(text_)) + speakers.append(speaker) + + return {'speakers': speakers, 'text': text, 'confidence-score': data['conf']} except Exception as e: self.log.error(e) - raise ValueError( - "Speaker diarization failed while voice activity detection!!!") - else: - return maskSAD + return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} - def run(self, audioFile): - try: - def getSegments(frameshift, finalSegmentTable, finalClusteringTable, dur): - numberOfSpeechFeatures = finalSegmentTable[-1, 2].astype(int)+1 - solutionVector = np.zeros([1, numberOfSpeechFeatures]) - for i in np.arange(np.size(finalSegmentTable, 0)): - solutionVector[0, np.arange( - finalSegmentTable[i, 1], finalSegmentTable[i, 2]+1).astype(int)] = finalClusteringTable[i] - seg = np.empty([0, 3]) - solutionDiff = np.diff(solutionVector)[0] - 
first = 0 - for i in np.arange(0, np.size(solutionDiff, 0)): - if solutionDiff[i]: - last = i+1 - seg1 = (first)*frameshift - seg2 = (last-first)*frameshift - seg3 = solutionVector[0, last-1] - if seg.shape[0] != 0 and seg3 == seg[-1][2]: - seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg, [seg1, seg2, seg3])) - first = i+1 - last = np.size(solutionVector, 1) - seg1 = (first-1)*frameshift - seg2 = (last-first+1)*frameshift - seg3 = solutionVector[0, last-1] - if seg3 == seg[-1][2]: - seg[-1][1] += seg2 - elif seg3 and seg2 > 0.3: # and seg2 > 0.1 - seg = np.vstack((seg, [seg1, seg2, seg3])) - seg = np.vstack((seg, [dur, -1, -1])) - seg[0][0] = 0.0 - return seg - - start_time = time.time() - - self.log.info('Start Speaker diarization') - - feats = self.compute_feat_Librosa(audioFile) - nFeatures = feats.shape[0] - duration = nFeatures * self.frame_shift_s - - if duration < 5: - return [[0, duration, 1], - [duration, -1, -1]] - - maskSAD = self.computeVAD_WEBRTC(self.data, self.sr, nFeatures) - maskUEM = np.ones([1, nFeatures]) - - mask = np.logical_and(maskUEM, maskSAD) - mask = mask[0][0:nFeatures] - nSpeechFeatures = np.sum(mask) - speechMapping = np.zeros(nFeatures) - # you need to start the mapping from 1 and end it in the actual number of features independently of the indexing style - # so that we don't lose features on the way - speechMapping[np.nonzero(mask)] = np.arange(1, nSpeechFeatures+1) - data = feats[np.where(mask == 1)] - del feats - - segmentTable = getSegmentTable( - mask, speechMapping, self.seg_length, self.seg_increment, self.seg_rate) - numberOfSegments = np.size(segmentTable, 0) - # create the KBM - # set the window rate in order to obtain "minimumNumberOfInitialGaussians" gaussians - if np.floor((nSpeechFeatures-self.windowLength)/self.minimumNumberOfInitialGaussians) < self.maximumKBMWindowRate: - windowRate = int(np.floor( - (np.size(data, 0)-self.windowLength)/self.minimumNumberOfInitialGaussians)) - else: - windowRate = int(self.maximumKBMWindowRate) - if windowRate == 0: - #self.log.info('The audio is to short in order to perform the speaker diarization!!!') - return [[0, duration, 1], - [duration, -1, -1]] +class SpeakerDiarization: + def __init__(self): + self.SPEAKER_DIARIZATION_ISON = False + self.SPEAKER_DIARIZATION_HOST = None + self.SPEAKER_DIARIZATION_PORT = None + self.url = None + self.log = logging.getLogger( + "__stt-standelone-worker__.SpeakerDiarization") + logging.basicConfig(level=logging.INFO) - poolSize = np.floor((nSpeechFeatures-self.windowLength)/windowRate) - if self.useRelativeKBMsize: - kbmSize = int(np.floor(poolSize*self.relKBMsize)) + def setParam(self, SPEAKER_DIARIZATION_ISON): + self.SPEAKER_DIARIZATION_ISON = SPEAKER_DIARIZATION_ISON + if self.SPEAKER_DIARIZATION_ISON: + self.SPEAKER_DIARIZATION_HOST = os.environ['SPEAKER_DIARIZATION_HOST'] + self.SPEAKER_DIARIZATION_PORT = os.environ['SPEAKER_DIARIZATION_PORT'] + self.url = "http://"+self.SPEAKER_DIARIZATION_HOST + \ + ":"+self.SPEAKER_DIARIZATION_PORT+"/" + self.log.info(self.url) if self.url is not None else self.log.warn( + "The Speaker Diarization service is not running!") + + def get(self, audio_path): + try: + if self.SPEAKER_DIARIZATION_ISON: + file = open(audio_path, 'rb') + result = requests.post(self.url, files={'file': file}) + if result.status_code != 200: + raise ValueError(result.text) + + speakers = json.loads(result.text) + speakers = speakers["segments"] + + last_spk = { + 'seg_begin': speakers[len(speakers) - 1]["seg_end"] + 
10, + 'seg_end': -1, + 'spk_id': -1, + 'seg_id': -1, + } + speakers.append(last_spk) + + return speakers else: - kbmSize = int(self.kbmSize) - - # Training pool of',int(poolSize),'gaussians with a rate of',int(windowRate),'frames' - kbm, gmPool = trainKBM( - data, self.windowLength, windowRate, kbmSize) + raise ValueError('Service is OFF') + except Exception as e: + self.log.error(str(e)) + return None + except ValueError as error: + self.log.error(str(error)) + return None - #'Selected',kbmSize,'gaussians from the pool' - Vg = getVgMatrix(data, gmPool, kbm, self.topGaussiansPerFrame) - #'Computing binary keys for all segments... ' - segmentBKTable, segmentCVTable = getSegmentBKs( - segmentTable, kbmSize, Vg, self.bitsPerSegmentFactor, speechMapping) +class Punctuation: + def __init__(self): + self.PUCTUATION_ISON = False + self.PUCTUATION_HOST = None + self.PUCTUATION_PORT = None + self.PUCTUATION_ROUTE = None + self.url = None + self.log = logging.getLogger("__stt-standelone-worker__.Punctuation") + logging.basicConfig(level=logging.INFO) - #'Performing initial clustering... ' - initialClustering = np.digitize(np.arange(numberOfSegments), np.arange( - 0, numberOfSegments, numberOfSegments/self.N_init)) + def setParam(self, PUCTUATION_ISON): + self.PUCTUATION_ISON = PUCTUATION_ISON + if self.PUCTUATION_ISON: + self.PUCTUATION_HOST = os.environ['PUCTUATION_HOST'] + self.PUCTUATION_PORT = os.environ['PUCTUATION_PORT'] + self.PUCTUATION_ROUTE = os.environ['PUCTUATION_ROUTE'] + self.PUCTUATION_ROUTE = re.sub('^/','',self.PUCTUATION_ROUTE) + self.PUCTUATION_ROUTE = re.sub('"|\'','',self.PUCTUATION_ROUTE) + self.url = "http://"+self.PUCTUATION_HOST+":"+self.PUCTUATION_PORT+"/"+self.PUCTUATION_ROUTE + self.log.info(self.url) if self.url is not None else self.log.warn( + "The Punctuation service is not running!") + + def get(self, text): + try: + if self.PUCTUATION_ISON: + if isinstance(text, dict): + text_punc = [] + for utterance in text['text']: + data = utterance.split(':') + result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) + if result.status_code != 200: + raise ValueError(result.text) + + text_punc.append(data[0]+": "+result.text.encode('latin-1').decode('utf-8')) + text['text'] = text_punc + return text + else: + result = requests.post(self.url, data=text.encode('utf-8'), headers={'content-type': 'application/octet-stream'}) + if result.status_code != 200: + raise ValueError(result.text.encode('latin-1').decode('utf-8')) - #'Performing agglomerative clustering... ' - if self.linkage: - finalClusteringTable, k = performClusteringLinkage( - segmentBKTable, segmentCVTable, self.N_init, self.linkageCriterion, self.metric) + return result.text else: - finalClusteringTable, k = performClustering( - speechMapping, segmentTable, segmentBKTable, segmentCVTable, Vg, self.bitsPerSegmentFactor, kbmSize, self.N_init, initialClustering, self.metric) - - #'Selecting best clustering...' 
- if self.bestClusteringCriterion == 'elbow': - bestClusteringID = getBestClustering( - self.metric_clusteringSelection, segmentBKTable, segmentCVTable, finalClusteringTable, k, self.maxNrSpeakers) - elif self.bestClusteringCriterion == 'spectral': - bestClusteringID = getSpectralClustering(self.metric_clusteringSelection, finalClusteringTable, - self.N_init, segmentBKTable, segmentCVTable, k, self.sigma, self.percentile, self.maxNrSpeakers)+1 - - if self.resegmentation and np.size(np.unique(finalClusteringTable[:, bestClusteringID.astype(int)-1]), 0) > 1: - finalClusteringTableResegmentation, finalSegmentTable = performResegmentation(data, speechMapping, mask, finalClusteringTable[:, bestClusteringID.astype( - int)-1], segmentTable, self.modelSize, self.nbIter, self.smoothWin, nSpeechFeatures) - seg = getSegments(self.frame_shift_s, finalSegmentTable, np.squeeze( - finalClusteringTableResegmentation), duration) - else: - return [[0, duration, 1], - [duration, -1, -1]] - - self.log.info("Speaker Diarization time in seconds: %d" % - int(time.time() - start_time)) - except ValueError as v: - self.log.error(v) - return [[0, duration, 1], - [duration, -1, -1]] + raise ValueError('Service is OFF') except Exception as e: - self.log.error(e) - return [[0, duration, 1], - [duration, -1, -1]] - else: - return seg + self.log.error(str(e)) + return text + except ValueError as error: + self.log.error(str(error)) + return text + diff --git a/wait-for-it.sh b/wait-for-it.sh new file mode 100755 index 0000000..ea66f79 --- /dev/null +++ b/wait-for-it.sh @@ -0,0 +1,184 @@ +#!/usr/bin/env bash +# Use this script to test if a given TCP host/port are available + +WAITFORIT_cmdname=${0##*/} + +echoerr() { if [[ $WAITFORIT_QUIET -ne 1 ]]; then echo "$@" 1>&2; fi } + +usage() +{ + cat << USAGE >&2 +Usage: + $WAITFORIT_cmdname host:port [-s] [-t timeout] [-- command args] + -h HOST | --host=HOST Host or IP under test + -p PORT | --port=PORT TCP port under test + Alternatively, you specify the host and port as host:port + -s | --strict Only execute subcommand if the test succeeds + -q | --quiet Don't output any status messages + -t TIMEOUT | --timeout=TIMEOUT + Timeout in seconds, zero for no timeout + -- COMMAND ARGS Execute command with args after the test finishes +USAGE + exit 1 +} + +wait_for() +{ + if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then + echoerr "$WAITFORIT_cmdname: waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT" + else + echoerr "$WAITFORIT_cmdname: waiting for $WAITFORIT_HOST:$WAITFORIT_PORT without a timeout" + fi + WAITFORIT_start_ts=$(date +%s) + while : + do + if [[ $WAITFORIT_ISBUSY -eq 1 ]]; then + nc -z $WAITFORIT_HOST $WAITFORIT_PORT + WAITFORIT_result=$? + else + (echo > /dev/tcp/$WAITFORIT_HOST/$WAITFORIT_PORT) >/dev/null 2>&1 + WAITFORIT_result=$? 
+ fi + if [[ $WAITFORIT_result -eq 0 ]]; then + WAITFORIT_end_ts=$(date +%s) + echoerr "$WAITFORIT_cmdname: $WAITFORIT_HOST:$WAITFORIT_PORT is available after $((WAITFORIT_end_ts - WAITFORIT_start_ts)) seconds" + break + fi + sleep 1 + done + return $WAITFORIT_result +} + +wait_for_wrapper() +{ + # In order to support SIGINT during timeout: http://unix.stackexchange.com/a/57692 + if [[ $WAITFORIT_QUIET -eq 1 ]]; then + timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --quiet --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT & + else + timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT & + fi + WAITFORIT_PID=$! + trap "kill -INT -$WAITFORIT_PID" INT + wait $WAITFORIT_PID + WAITFORIT_RESULT=$? + if [[ $WAITFORIT_RESULT -ne 0 ]]; then + echoerr "$WAITFORIT_cmdname: timeout occurred after waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT" + fi + return $WAITFORIT_RESULT +} + +# process arguments +while [[ $# -gt 0 ]] +do + case "$1" in + *:* ) + WAITFORIT_hostport=(${1//:/ }) + WAITFORIT_HOST=${WAITFORIT_hostport[0]} + WAITFORIT_PORT=${WAITFORIT_hostport[1]} + shift 1 + ;; + --child) + WAITFORIT_CHILD=1 + shift 1 + ;; + -q | --quiet) + WAITFORIT_QUIET=1 + shift 1 + ;; + -s | --strict) + WAITFORIT_STRICT=1 + shift 1 + ;; + -h) + WAITFORIT_HOST="$2" + if [[ $WAITFORIT_HOST == "" ]]; then break; fi + shift 2 + ;; + --host=*) + WAITFORIT_HOST="${1#*=}" + shift 1 + ;; + -p) + WAITFORIT_PORT="$2" + if [[ $WAITFORIT_PORT == "" ]]; then break; fi + shift 2 + ;; + --port=*) + WAITFORIT_PORT="${1#*=}" + shift 1 + ;; + -t) + WAITFORIT_TIMEOUT="$2" + if [[ $WAITFORIT_TIMEOUT == "" ]]; then break; fi + shift 2 + ;; + --timeout=*) + WAITFORIT_TIMEOUT="${1#*=}" + shift 1 + ;; + --) + shift + WAITFORIT_CLI=("$@") + break + ;; + --help) + usage + ;; + *) + echoerr "Unknown argument: $1" + usage + ;; + esac +done + +if [[ "$WAITFORIT_HOST" == "" || "$WAITFORIT_PORT" == "" ]]; then + echoerr "Error: you need to provide a host and port to test." + usage +fi + +WAITFORIT_TIMEOUT=${WAITFORIT_TIMEOUT:-15} +WAITFORIT_STRICT=${WAITFORIT_STRICT:-0} +WAITFORIT_CHILD=${WAITFORIT_CHILD:-0} +WAITFORIT_QUIET=${WAITFORIT_QUIET:-0} + +# Check to see if timeout is from busybox? +WAITFORIT_TIMEOUT_PATH=$(type -p timeout) +WAITFORIT_TIMEOUT_PATH=$(realpath $WAITFORIT_TIMEOUT_PATH 2>/dev/null || readlink -f $WAITFORIT_TIMEOUT_PATH) + +WAITFORIT_BUSYTIMEFLAG="" +if [[ $WAITFORIT_TIMEOUT_PATH =~ "busybox" ]]; then + WAITFORIT_ISBUSY=1 + # Check if busybox timeout uses -t flag + # (recent Alpine versions don't support -t anymore) + if timeout &>/dev/stdout | grep -q -e '-t '; then + WAITFORIT_BUSYTIMEFLAG="-t" + fi +else + WAITFORIT_ISBUSY=0 +fi + +if [[ $WAITFORIT_CHILD -gt 0 ]]; then + wait_for + WAITFORIT_RESULT=$? + exit $WAITFORIT_RESULT +else + if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then + wait_for_wrapper + WAITFORIT_RESULT=$? + else + wait_for + WAITFORIT_RESULT=$? 
+ fi +fi + +if [[ $WAITFORIT_CLI != "" ]]; then + echo $WAITFORIT_RESULT + echo $WAITFORIT_STRICT + if [[ $WAITFORIT_RESULT -ne 0 && $WAITFORIT_STRICT -eq 1 ]]; then + echoerr "$WAITFORIT_cmdname: strict mode, refusing to execute subprocess" + exit $WAITFORIT_RESULT + fi + exec "${WAITFORIT_CLI[@]}" +else + exit $WAITFORIT_RESULT +fi \ No newline at end of file From d20f01ac01dd1ec820e679eaecede2ca4972824f Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 31 Mar 2021 12:35:50 +0200 Subject: [PATCH 053/172] update Dockerfile --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index ee79f37..b0f03f5 100644 --- a/Dockerfile +++ b/Dockerfile @@ -71,7 +71,7 @@ RUN cd /opt/vosk-api/python && \ python3 setup.py install --user --single-version-externally-managed --root=/ # Install curl for healthcheck -RUN apt-get install -y curl +RUN apt-get update && apt-get install -y curl # Define the main folder WORKDIR /usr/src/speech-to-text From 53116ee7f7428e41a6736c245028a0cd2c8a43a1 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 31 Mar 2021 16:12:27 +0200 Subject: [PATCH 054/172] fix text punctuation error and clean entrypoint code --- docker-entrypoint.sh | 14 ++------------ tools.py | 22 +++++++++++++--------- 2 files changed, 15 insertions(+), 21 deletions(-) diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 3555b4b..6a826a7 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -8,26 +8,16 @@ for retry in $(seq 1 $max_attempts); do echo "Waiting punctuation service... [attempt=$retry]" punctuation_state=1 ./wait-for-it.sh $PUCTUATION_HOST:$PUCTUATION_PORT --timeout=$delay || punctuation_state=0 + if [ $punctuation_state == 1 ]; then break; fi done -if [ $punctuation_state == 1 ]; then - echo "$PUCTUATION_HOST:$PUCTUATION_PORT is up" -else - echo "punctuation service is not runninig" -fi - for retry in $(seq 1 $max_attempts); do echo "Waiting speaker diarization service... 
[attempt=$retry]" spkdiarization_state=1 ./wait-for-it.sh $SPEAKER_DIARIZATION_HOST:$SPEAKER_DIARIZATION_PORT --timeout=$delay || spkdiarization_state=0 + if [ $spkdiarization_state == 1 ]; then break; fi done -if [ $spkdiarization_state == 1 ]; then - echo "$SPEAKER_DIARIZATION_HOST:$SPEAKER_DIARIZATION_PORT is up" -else - echo "speaker diarization service is not runninig" -fi - echo "RUNNING service" python3 ./run.py --puctuation $punctuation_state --speaker_diarization $spkdiarization_state \ No newline at end of file diff --git a/tools.py b/tools.py index 286490b..02af229 100644 --- a/tools.py +++ b/tools.py @@ -374,15 +374,19 @@ def get(self, text): try: if self.PUCTUATION_ISON: if isinstance(text, dict): - text_punc = [] - for utterance in text['text']: - data = utterance.split(':') - result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) - if result.status_code != 200: - raise ValueError(result.text) - - text_punc.append(data[0]+": "+result.text.encode('latin-1').decode('utf-8')) - text['text'] = text_punc + if isinstance(text['text'], list): + text_punc = [] + for utterance in text['text']: + data = utterance.split(':') + result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) + if result.status_code != 200: + raise ValueError(result.text) + + text_punc.append(data[0]+": "+result.text.encode('latin-1').decode('utf-8')) + text['text'] = text_punc + else: + result = requests.post(self.url, data=text['text'].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) + text['text'] = result.text.encode('latin-1').decode('utf-8') return text else: result = requests.post(self.url, data=text.encode('utf-8'), headers={'content-type': 'application/octet-stream'}) From 7437b6b23575a70279d77d072f25dbb73a7e714b Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 31 Mar 2021 16:50:15 +0200 Subject: [PATCH 055/172] update swagger param --- tools.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools.py b/tools.py index 02af229..36ab763 100644 --- a/tools.py +++ b/tools.py @@ -57,6 +57,8 @@ def __init__(self): ) == "true" else False if 'SWAGGER_PATH' in os.environ: self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] + if 'SWAGGER_URL' in os.environ: + self.SWAGGER_URL = os.environ['SWAGGER_URL'] # start loading ASR configuration self.log.info("Create the new config files") From e6766319c6b07aa22d023836a7ee900b8486cbfa Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 31 Mar 2021 17:07:36 +0200 Subject: [PATCH 056/172] add prefix to swagger ui --- tools.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tools.py b/tools.py index 36ab763..c02859f 100644 --- a/tools.py +++ b/tools.py @@ -42,6 +42,7 @@ def __init__(self): self.SAVE_AUDIO = False self.SERVICE_PORT = 80 self.SWAGGER_URL = '/api-doc' + self.SWAGGER_PREFIX = '' self.SWAGGER_PATH = '' self.ONLINE = False @@ -57,8 +58,8 @@ def __init__(self): ) == "true" else False if 'SWAGGER_PATH' in os.environ: self.SWAGGER_PATH = os.environ['SWAGGER_PATH'] - if 'SWAGGER_URL' in os.environ: - self.SWAGGER_URL = os.environ['SWAGGER_URL'] + if 'SWAGGER_PREFIX' in os.environ: + self.SWAGGER_PREFIX = os.environ['SWAGGER_PREFIX'] # start loading ASR configuration self.log.info("Create the new config files") @@ -70,7 +71,7 @@ def swaggerUI(self, app): open(self.SWAGGER_PATH, 'r'), Loader=yaml.Loader) swaggerui = get_swaggerui_blueprint( # Swagger UI static files will be 
mapped to '{SWAGGER_URL}/dist/' - self.SWAGGER_URL, + self.SWAGGER_PREFIX+self.SWAGGER_URL, self.SWAGGER_PATH, config={ # Swagger UI config overrides 'app_name': "STT API Documentation", From ad3cc3ae07fe85d274eb68085f79720b16aabc14 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 1 Apr 2021 16:19:55 +0200 Subject: [PATCH 057/172] remove healthcheck --- Dockerfile | 2 -- 1 file changed, 2 deletions(-) diff --git a/Dockerfile b/Dockerfile index b0f03f5..3aa8b30 100644 --- a/Dockerfile +++ b/Dockerfile @@ -81,7 +81,5 @@ COPY tools.py run.py docker-entrypoint.sh wait-for-it.sh ./ EXPOSE 80 -HEALTHCHECK CMD curl http://localhost/healthcheck || exit 1 - # Entrypoint handles the passed arguments ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file From 59a81b4d4c90bfbb8427b07564110de3ebb7c1ca Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 14 Apr 2021 14:28:18 +0200 Subject: [PATCH 058/172] fix services call --- tools.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools.py b/tools.py index c02859f..b8c4f9a 100644 --- a/tools.py +++ b/tools.py @@ -342,7 +342,7 @@ def get(self, audio_path): return speakers else: - raise ValueError('Service is OFF') + return None except Exception as e: self.log.error(str(e)) return None @@ -398,7 +398,7 @@ def get(self, text): return result.text else: - raise ValueError('Service is OFF') + return text except Exception as e: self.log.error(str(e)) return text From fec438fcd135b5d15fdc6069dfce9eef3ee2454b Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 28 Apr 2021 15:57:38 +0200 Subject: [PATCH 059/172] add a new response type to activate/deactivate speaker diarization --- run.py | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/run.py b/run.py index 7cba003..3c698b8 100644 --- a/run.py +++ b/run.py @@ -43,12 +43,17 @@ def transcribe(): (strftime("%d/%b/%d %H:%M:%S", gmtime()))) is_metadata = False + do_spk = True # get response content type if request.headers.get('accept').lower() == 'application/json': is_metadata = True + elif request.headers.get('accept').lower() == 'application/json-nospk': + is_metadata = True + do_spk = False elif request.headers.get('accept').lower() == 'text/plain': is_metadata = False + do_spk = False else: raise ValueError('Not accepted header') @@ -57,7 +62,9 @@ def transcribe(): file = request.files['file'] worker.getAudio(file) data, confidence = decode(is_metadata) - spk = speakerdiarization.get(worker.file_path) + spk = None + if do_spk: + spk = speakerdiarization.get(worker.file_path) trans = worker.get_response(data, spk, confidence, is_metadata) response = punctuation.get(trans) worker.clean() From fc060f5120bb9b9da48f156311ab8b41164b709e Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 28 Apr 2021 17:15:50 +0200 Subject: [PATCH 060/172] fix punctuation text encode --- tools.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tools.py b/tools.py index b8c4f9a..6e34818 100644 --- a/tools.py +++ b/tools.py @@ -381,20 +381,21 @@ def get(self, text): text_punc = [] for utterance in text['text']: data = utterance.split(':') + self.log.info(data[1].strip()) result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) if result.status_code != 200: raise ValueError(result.text) - text_punc.append(data[0]+": "+result.text.encode('latin-1').decode('utf-8')) + text_punc.append(data[0]+": "+result.text) text['text'] = text_punc else: result = requests.post(self.url, 
data=text['text'].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) - text['text'] = result.text.encode('latin-1').decode('utf-8') + text['text'] = result.text return text else: result = requests.post(self.url, data=text.encode('utf-8'), headers={'content-type': 'application/octet-stream'}) if result.status_code != 200: - raise ValueError(result.text.encode('latin-1').decode('utf-8')) + raise ValueError(result.text) return result.text else: From 349d3d9e0f4bed6bdc08d4e2a6e3a3e509ec910a Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Wed, 28 Apr 2021 17:17:32 +0200 Subject: [PATCH 061/172] remove verification message --- tools.py | 1 - 1 file changed, 1 deletion(-) diff --git a/tools.py b/tools.py index 6e34818..6a3324d 100644 --- a/tools.py +++ b/tools.py @@ -381,7 +381,6 @@ def get(self, text): text_punc = [] for utterance in text['text']: data = utterance.split(':') - self.log.info(data[1].strip()) result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) if result.status_code != 200: raise ValueError(result.text) From 4d58d1a84486f528d8cc494baff4a7f028d7b38f Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 11 May 2021 15:36:32 +0200 Subject: [PATCH 062/172] clean memory --- run.py | 1 + tools.py | 1 + 2 files changed, 2 insertions(+) diff --git a/run.py b/run.py index 3c698b8..cfec1ab 100644 --- a/run.py +++ b/run.py @@ -29,6 +29,7 @@ def decode(is_metadata): confidence = rec.uttConfidence() if is_metadata: data = rec.GetMetadata() + del rec return data, confidence # API diff --git a/tools.py b/tools.py index 6a3324d..226f6cc 100644 --- a/tools.py +++ b/tools.py @@ -97,6 +97,7 @@ def getAudio(self, file): def clean(self): if not self.SAVE_AUDIO: os.remove(self.file_path) + del self.data # re-create config files def loadConfig(self): From aa957e4601977ead3002c807dcf4aa492a2f530e Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 26 Aug 2021 21:03:20 +0200 Subject: [PATCH 063/172] update linstt worker --- .gitmodules | 6 ------ Dockerfile | 20 ++++---------------- docker-entrypoint.sh | 8 +++++++- pyBK | 1 - requirements.txt | 7 +++++++ run.py | 26 ++++++++++++++++++-------- tools.py | 20 ++++---------------- vosk-api | 1 - 8 files changed, 40 insertions(+), 49 deletions(-) delete mode 100644 .gitmodules delete mode 160000 pyBK create mode 100644 requirements.txt delete mode 160000 vosk-api diff --git a/.gitmodules b/.gitmodules deleted file mode 100644 index b131dc4..0000000 --- a/.gitmodules +++ /dev/null @@ -1,6 +0,0 @@ -[submodule "vosk-api"] - path = vosk-api - url = https://github.com/irebai/vosk-api.git -[submodule "pyBK"] - path = pyBK - url = https://github.com/irebai/pyBK.git diff --git a/Dockerfile b/Dockerfile index 3aa8b30..3283b9c 100644 --- a/Dockerfile +++ b/Dockerfile @@ -52,31 +52,19 @@ RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi && \ cd /opt/kaldi/tools && mkdir openfst_ && mv openfst-*/lib openfst-*/include openfst-*/bin openfst_ && rm openfst_/lib/*.so* openfst_/lib/*.la && \ rm -r openfst-*/* && mv openfst_/* openfst-*/ && rm -r openfst_ -# Install pyBK (speaker diarization toolkit) -RUN apt install -y software-properties-common && wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && ./llvm.sh 10 && \ - export LLVM_CONFIG=/usr/bin/llvm-config-10 && \ - pip3 install numpy && \ - pip3 install websockets && \ - pip3 install librosa webrtcvad scipy sklearn - -# Install main service packages -RUN pip3 install flask flask-cors 
flask-swagger-ui gevent pyyaml && \ - apt-get install -y ffmpeg +# Install python dependencies +COPY requirements.txt ./ +RUN pip3 install --no-cache-dir -r requirements.txt # build VOSK KALDI -COPY vosk-api /opt/vosk-api -RUN cd /opt/vosk-api/python && \ +RUN git clone --depth 1 https://github.com/irebai/vosk-api.git /opt/vosk-api && cd /opt/vosk-api/python && \ export KALDI_ROOT=/opt/kaldi && \ export KALDI_MKL=1 && \ python3 setup.py install --user --single-version-externally-managed --root=/ -# Install curl for healthcheck -RUN apt-get update && apt-get install -y curl - # Define the main folder WORKDIR /usr/src/speech-to-text -COPY pyBK/diarizationFunctions.py pyBK/diarizationFunctions.py COPY tools.py run.py docker-entrypoint.sh wait-for-it.sh ./ EXPOSE 80 diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 6a826a7..8ca4752 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -4,20 +4,26 @@ set -e max_attempts=3 delay=5 +punctuation_state=0 +if [[ ! -z $PUCTUATION_HOST && ! -z $PUCTUATION_PORT ]]; then for retry in $(seq 1 $max_attempts); do echo "Waiting punctuation service... [attempt=$retry]" punctuation_state=1 ./wait-for-it.sh $PUCTUATION_HOST:$PUCTUATION_PORT --timeout=$delay || punctuation_state=0 if [ $punctuation_state == 1 ]; then break; fi done +fi +spkdiarization_state=0 +if [[ ! -z $SPEAKER_DIARIZATION_HOST && ! -z $SPEAKER_DIARIZATION_PORT ]]; then for retry in $(seq 1 $max_attempts); do echo "Waiting speaker diarization service... [attempt=$retry]" spkdiarization_state=1 ./wait-for-it.sh $SPEAKER_DIARIZATION_HOST:$SPEAKER_DIARIZATION_PORT --timeout=$delay || spkdiarization_state=0 if [ $spkdiarization_state == 1 ]; then break; fi done +fi -echo "RUNNING service" +echo "Start service" python3 ./run.py --puctuation $punctuation_state --speaker_diarization $spkdiarization_state \ No newline at end of file diff --git a/pyBK b/pyBK deleted file mode 160000 index 1e5dc7d..0000000 --- a/pyBK +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 1e5dc7de4e0a7d43a44152a68beca0699c14fd4c diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..86cc88c --- /dev/null +++ b/requirements.txt @@ -0,0 +1,7 @@ +flask>=1.1.2 +flask-cors>=3.0.10 +flask-swagger-ui>=3.36.0 +gevent>=21.8.0 +pyyaml>=5.4.1 +wavio>=0.0.4 +requests>=2.26.0 \ No newline at end of file diff --git a/run.py b/run.py index cfec1ab..fdcb819 100644 --- a/run.py +++ b/run.py @@ -23,13 +23,19 @@ spkModel = None def decode(is_metadata): - rec = KaldiRecognizer(model, spkModel, worker.rate, worker.ONLINE) - rec.AcceptWaveform(worker.data) - data = rec.FinalResult() - confidence = rec.uttConfidence() + if is_metadata and len(worker.data) / worker.rate > 30 : + recognizer = KaldiRecognizer(model, spkModel, worker.rate, is_metadata, True) + for i in range(0, len(worker.data), int(worker.rate/4)): + if recognizer.AcceptWaveform(worker.data[i:i + int(worker.rate/4)]): + recognizer.Result() + else: + recognizer = KaldiRecognizer(model, None, worker.rate, is_metadata, False) + recognizer.AcceptWaveform(worker.data) + + data = recognizer.FinalResult() + confidence = recognizer.uttConfidence() if is_metadata: - data = rec.GetMetadata() - del rec + data = recognizer.GetMetadata() return data, confidence # API @@ -40,7 +46,7 @@ def healthcheck(): @app.route('/transcribe', methods=['POST']) def transcribe(): try: - worker.log.info('[%s] New user entry on /transcribe' % + worker.log.info('[%s] Transcribe request received' % (strftime("%d/%b/%d %H:%M:%S", gmtime()))) is_metadata = False @@ 
-62,19 +68,23 @@ def transcribe(): if 'file' in request.files.keys(): file = request.files['file'] worker.getAudio(file) + worker.log.info("Start decoding [Audio duration={}(s)]".format(str(int(len(worker.data) / worker.rate)))) data, confidence = decode(is_metadata) + worker.log.info("Decoding complete") spk = None if do_spk: spk = speakerdiarization.get(worker.file_path) trans = worker.get_response(data, spk, confidence, is_metadata) response = punctuation.get(trans) worker.clean() + worker.log.info("... Complete") else: raise ValueError('No audio file was uploaded') return response, 200 except ValueError as error: - return str(error), 400 + worker.log.error(e) + return 'Server Error', 400 except Exception as e: worker.log.error(e) return 'Server Error', 500 diff --git a/tools.py b/tools.py index 226f6cc..8238ecc 100644 --- a/tools.py +++ b/tools.py @@ -1,20 +1,7 @@ #!/usr/bin/env python3 # -*- coding: utf-8 -*- -#  ASR -from vosk import Model, KaldiRecognizer -############## - -# Speaker Diarization -from pyBK.diarizationFunctions import * -import librosa -import time -import webrtcvad -############## - -# other packages import configparser -import librosa import logging import os import re @@ -22,10 +9,9 @@ import json import yaml import numpy as np -from scipy.io import wavfile +import wavio from flask_swagger_ui import get_swaggerui_blueprint import requests -############## class Worker: @@ -86,7 +72,9 @@ def getAudio(self, file): self.file_path = self.TEMP_FILE_PATH+"/"+filename file.save(self.file_path) try: - self.rate, self.data = wavfile.read(self.file_path) + file_content = wavio.read(self.file_path) + self.rate = file_content.rate + self.data = file_content.data # if stereo file, convert to mono by computing the mean of the channels if len(self.data.shape) == 2 and self.data.shape[1] == 2: self.data = np.mean(self.data, axis=1, dtype=np.int16) diff --git a/vosk-api b/vosk-api deleted file mode 160000 index 7f555e4..0000000 --- a/vosk-api +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 7f555e464c1d6b16233354491868f46d009c453c From 44888b82eeb905c3498b365f7b9db240d3d4eab9 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 27 Aug 2021 10:32:07 +0200 Subject: [PATCH 064/172] remove speaker information from response --- tools.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools.py b/tools.py index 8238ecc..5a6eaae 100644 --- a/tools.py +++ b/tools.py @@ -184,14 +184,14 @@ def get_response(self, dataJson, speakers, confidence, is_metadata): # Generate final output data return self.process_output_v2(data, speakers) else: - return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': data['words']} + return {'text': data['text'], 'confidence-score': data['conf'], 'words': data['words']} elif 'text' in data: - return {'speakers': [], 'text': data['text'], 'confidence-score': data['conf'], 'words': []} + return {'text': data['text'], 'confidence-score': data['conf'], 'words': []} else: - return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} + return {'text': '', 'confidence-score': 0, 'words': []} else: - return {'speakers': [], 'text': '', 'confidence-score': 0, 'words': []} + return {'text': '', 'confidence-score': 0, 'words': []} # return a json object including word-data, speaker-data def process_output(self, data, spkrs): From 0e9843c89f6beeacae091374627cc726c2a86d86 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 27 Aug 2021 10:49:46 +0200 Subject: [PATCH 065/172] update linstt --- run.py | 2 +- 1 
file changed, 1 insertion(+), 1 deletion(-) diff --git a/run.py b/run.py index fdcb819..a4bef94 100644 --- a/run.py +++ b/run.py @@ -23,7 +23,7 @@ spkModel = None def decode(is_metadata): - if is_metadata and len(worker.data) / worker.rate > 30 : + if is_metadata and len(worker.data) / worker.rate > 1800 : recognizer = KaldiRecognizer(model, spkModel, worker.rate, is_metadata, True) for i in range(0, len(worker.data), int(worker.rate/4)): if recognizer.AcceptWaveform(worker.data[i:i + int(worker.rate/4)]): From 30af1fc86b20064c3154021a3c0732e3ae5cf7fe Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 31 Aug 2021 22:37:52 +0200 Subject: [PATCH 066/172] add new features/roots --- document/swagger.yml | 37 +++++++++++++++++++++- run.py | 74 ++++++++++++++++++++++++++++++++++---------- tools.py | 25 +++++++++------ 3 files changed, 109 insertions(+), 27 deletions(-) diff --git a/document/swagger.yml b/document/swagger.yml index b52b52c..4426218 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -7,7 +7,7 @@ info: schemes: - http -host: localhost:8888 +host: 127.0.0.1:8888 basePath: / paths: @@ -30,3 +30,38 @@ paths: responses: 200: description: Successfully transcribe the audio + 400: + description: Request error + 500: + description: Server error + /transcription/{PID}: + get: + tags: + - "Speech-To-Text API" + summary: Perform Speech-to-Text + consumes: + - "multipart/form-data" + produces: + - "application/json" + - "text/plain" + parameters: + - name: "PID" + in: "path" + description: "PID of a transcribe request" + required: true + type: "string" + responses: + 200: + description: Get transcription + 400: + description: Invalid PID + /get/pids: + get: + tags: + - "Speech-To-Text API" + summary: Get PIDs + produces: + - "application/json" + responses: + 200: + description: Get list of PIDs \ No newline at end of file diff --git a/run.py b/run.py index a4bef94..22d9677 100644 --- a/run.py +++ b/run.py @@ -8,9 +8,13 @@ from gevent.pywsgi import WSGIServer import argparse import os +import _thread +import uuid app = Flask("__stt-standelone-worker__") +max_duration = 1800 + # instantiate services worker = Worker() punctuation = Punctuation() @@ -23,7 +27,7 @@ spkModel = None def decode(is_metadata): - if is_metadata and len(worker.data) / worker.rate > 1800 : + if is_metadata and len(worker.data) / worker.rate > max_duration : recognizer = KaldiRecognizer(model, spkModel, worker.rate, is_metadata, True) for i in range(0, len(worker.data), int(worker.rate/4)): if recognizer.AcceptWaveform(worker.data[i:i + int(worker.rate/4)]): @@ -38,11 +42,44 @@ def decode(is_metadata): data = recognizer.GetMetadata() return data, confidence +def processing(is_metadata, do_spk, audio_buffer, file_path=None): + try: + worker.log.info("Start decoding") + data, confidence = decode(is_metadata) + worker.log.info("Decoding complete") + worker.log.info("Post Processing ...") + spk = None + if do_spk: + spk = speakerdiarization.get(audio_buffer) + trans = worker.get_response(data, spk, confidence, is_metadata) + response = punctuation.get(trans) + worker.log.info("... 
Complete") + if file_path is not None: + with open(file_path, 'w') as outfile: + json.dump(response, outfile) + else: + return response + except Exception as e: + worker.log.error(e) + exit(1) + # API @app.route('/healthcheck', methods=['GET']) def healthcheck(): return "1", 200 +@app.route('/transcription/', methods=['GET']) +def transcription(PID): + file_path = worker.TRANS_FILES_PATH + "/" + str(PID) + if os.path.exists(file_path): + return json.load(open(file_path,)), 200 + else: + return "PID {} is invalid".format(str(PID)), 400 + +@app.route('/get/pids', methods=['GET']) +def get(): + return json.load(open(worker.TRANS_FILES_PATH + "/pids.json")), 200 + @app.route('/transcribe', methods=['POST']) def transcribe(): try: @@ -65,26 +102,29 @@ def transcribe(): raise ValueError('Not accepted header') # get input file - if 'file' in request.files.keys(): - file = request.files['file'] - worker.getAudio(file) - worker.log.info("Start decoding [Audio duration={}(s)]".format(str(int(len(worker.data) / worker.rate)))) - data, confidence = decode(is_metadata) - worker.log.info("Decoding complete") - spk = None - if do_spk: - spk = speakerdiarization.get(worker.file_path) - trans = worker.get_response(data, spk, confidence, is_metadata) - response = punctuation.get(trans) - worker.clean() - worker.log.info("... Complete") - else: + if 'file' not in request.files.keys(): raise ValueError('No audio file was uploaded') + audio_buffer = request.files['file'].read() + worker.getAudio(audio_buffer) + duration = int(len(worker.data) / worker.rate) + if duration > max_duration: + filename = str(uuid.uuid4()) + file_path = worker.TRANS_FILES_PATH + "/" + filename + + pids = json.load(open(worker.TRANS_FILES_PATH + "/pids.json")) + pids['pids'].append({'pid':filename, 'time':strftime("%d/%b/%d %H:%M:%S", gmtime())}) + with open(worker.TRANS_FILES_PATH + "/pids.json", 'w') as pids_file: + json.dump(pids, pids_file) + + _thread.start_new_thread(processing, (is_metadata, do_spk, audio_buffer, file_path,)) + return "The approximate decoding time is {} seconds. 
Use this PID={} to get the transcription after decoding.".format(str(int(duration*0.33)), filename), 200 + response = processing(is_metadata, do_spk, audio_buffer) + return response, 200 - except ValueError as error: + except ValueError as e: worker.log.error(e) - return 'Server Error', 400 + return str(e), 400 except Exception as e: worker.log.error(e) return 'Server Error', 500 diff --git a/tools.py b/tools.py index 5a6eaae..0450885 100644 --- a/tools.py +++ b/tools.py @@ -4,6 +4,7 @@ import configparser import logging import os +import io import re import uuid import json @@ -24,6 +25,7 @@ def __init__(self): self.AM_PATH = '/opt/models/AM' self.LM_PATH = '/opt/models/LM' self.TEMP_FILE_PATH = '/opt/tmp' + self.TRANS_FILES_PATH = '/opt/trans' self.CONFIG_FILES_PATH = '/opt/config' self.SAVE_AUDIO = False self.SERVICE_PORT = 80 @@ -38,6 +40,12 @@ def __init__(self): if not os.path.isdir(self.TEMP_FILE_PATH): os.mkdir(self.TEMP_FILE_PATH) + if not os.path.isdir(self.TRANS_FILES_PATH): + os.mkdir(self.TRANS_FILES_PATH) + + with open(self.TRANS_FILES_PATH + "/pids.json", 'w') as outfile: + json.dump({'pids':[]}, outfile) + # Environment parameters if 'SAVE_AUDIO' in os.environ: self.SAVE_AUDIO = True if os.environ['SAVE_AUDIO'].lower( @@ -68,11 +76,8 @@ def swaggerUI(self, app): ### end swagger specific ### def getAudio(self, file): - filename = str(uuid.uuid4()) - self.file_path = self.TEMP_FILE_PATH+"/"+filename - file.save(self.file_path) try: - file_content = wavio.read(self.file_path) + file_content = wavio.read(io.BytesIO(file)) self.rate = file_content.rate self.data = file_content.data # if stereo file, convert to mono by computing the mean of the channels @@ -82,10 +87,12 @@ def getAudio(self, file): self.log.error(e) raise ValueError("The uploaded file format is not supported!!!") - def clean(self): - if not self.SAVE_AUDIO: - os.remove(self.file_path) - del self.data + def saveFile(self, file): + if self.SAVE_AUDIO: + filename = str(uuid.uuid4()) + self.file_path = self.TEMP_FILE_PATH+"/"+filename + file.save(self.file_path) + # re-create config files def loadConfig(self): @@ -184,7 +191,7 @@ def get_response(self, dataJson, speakers, confidence, is_metadata): # Generate final output data return self.process_output_v2(data, speakers) else: - return {'text': data['text'], 'confidence-score': data['conf'], 'words': data['words']} + return {'text': self.parse_text(data['text']), 'confidence-score': data['conf'], 'words': data['words']} elif 'text' in data: return {'text': data['text'], 'confidence-score': data['conf'], 'words': []} From 86fc204df63d87cf6831011f8970fa1248d4e59b Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Tue, 31 Aug 2021 22:52:58 +0200 Subject: [PATCH 067/172] rename PID to jobid --- document/swagger.yml | 14 +++++++------- run.py | 24 ++++++++++++------------ tools.py | 4 ++-- 3 files changed, 21 insertions(+), 21 deletions(-) diff --git a/document/swagger.yml b/document/swagger.yml index 4426218..d169694 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -34,7 +34,7 @@ paths: description: Request error 500: description: Server error - /transcription/{PID}: + /transcription/{jobid}: get: tags: - "Speech-To-Text API" @@ -45,23 +45,23 @@ paths: - "application/json" - "text/plain" parameters: - - name: "PID" + - name: "jobid" in: "path" - description: "PID of a transcribe request" + description: "jobid of a transcribe request" required: true type: "string" responses: 200: description: Get transcription 400: - description: Invalid PID - /get/pids: + 
description: Invalid jobid + /get/jobids: get: tags: - "Speech-To-Text API" - summary: Get PIDs + summary: Get jobids produces: - "application/json" responses: 200: - description: Get list of PIDs \ No newline at end of file + description: Get list of jobids \ No newline at end of file diff --git a/run.py b/run.py index 22d9677..2a039c4 100644 --- a/run.py +++ b/run.py @@ -68,17 +68,17 @@ def processing(is_metadata, do_spk, audio_buffer, file_path=None): def healthcheck(): return "1", 200 -@app.route('/transcription/', methods=['GET']) -def transcription(PID): - file_path = worker.TRANS_FILES_PATH + "/" + str(PID) +@app.route('/transcription/', methods=['GET']) +def transcription(jobid): + file_path = worker.TRANS_FILES_PATH + "/" + str(jobid) if os.path.exists(file_path): return json.load(open(file_path,)), 200 else: - return "PID {} is invalid".format(str(PID)), 400 + return "jobid {} is invalid".format(str(jobid)), 400 -@app.route('/get/pids', methods=['GET']) +@app.route('/get/jobids', methods=['GET']) def get(): - return json.load(open(worker.TRANS_FILES_PATH + "/pids.json")), 200 + return json.load(open(worker.TRANS_FILES_PATH + "/jobids.json")), 200 @app.route('/transcribe', methods=['POST']) def transcribe(): @@ -109,16 +109,16 @@ def transcribe(): worker.getAudio(audio_buffer) duration = int(len(worker.data) / worker.rate) if duration > max_duration: - filename = str(uuid.uuid4()) - file_path = worker.TRANS_FILES_PATH + "/" + filename + jobid = str(uuid.uuid4()) + file_path = worker.TRANS_FILES_PATH + "/" + jobid - pids = json.load(open(worker.TRANS_FILES_PATH + "/pids.json")) - pids['pids'].append({'pid':filename, 'time':strftime("%d/%b/%d %H:%M:%S", gmtime())}) - with open(worker.TRANS_FILES_PATH + "/pids.json", 'w') as pids_file: + pids = json.load(open(worker.TRANS_FILES_PATH + "/jobids.json")) + pids['jobids'].append({'jobid':jobid, 'time':strftime("%d/%b/%d %H:%M:%S", gmtime())}) + with open(worker.TRANS_FILES_PATH + "/jobids.json", 'w') as pids_file: json.dump(pids, pids_file) _thread.start_new_thread(processing, (is_metadata, do_spk, audio_buffer, file_path,)) - return "The approximate decoding time is {} seconds. Use this PID={} to get the transcription after decoding.".format(str(int(duration*0.33)), filename), 200 + return "The approximate decoding time is {} seconds. 
Use this jobid={} to get the transcription after decoding.".format(str(int(duration*0.33)), jobid), 200 response = processing(is_metadata, do_spk, audio_buffer) return response, 200 diff --git a/tools.py b/tools.py index 0450885..d9353d7 100644 --- a/tools.py +++ b/tools.py @@ -43,8 +43,8 @@ def __init__(self): if not os.path.isdir(self.TRANS_FILES_PATH): os.mkdir(self.TRANS_FILES_PATH) - with open(self.TRANS_FILES_PATH + "/pids.json", 'w') as outfile: - json.dump({'pids':[]}, outfile) + with open(self.TRANS_FILES_PATH + "/jobids.json", 'w') as outfile: + json.dump({'jobids':[]}, outfile) # Environment parameters if 'SAVE_AUDIO' in os.environ: From 701e4e2046c68b0b97579751e7301623e947a176 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 2 Sep 2021 17:12:40 +0200 Subject: [PATCH 068/172] fix audio format to send to the service of speaker diarization --- tools.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools.py b/tools.py index d9353d7..6bee1bc 100644 --- a/tools.py +++ b/tools.py @@ -317,11 +317,10 @@ def setParam(self, SPEAKER_DIARIZATION_ISON): self.log.info(self.url) if self.url is not None else self.log.warn( "The Speaker Diarization service is not running!") - def get(self, audio_path): + def get(self, audio_buffer): try: if self.SPEAKER_DIARIZATION_ISON: - file = open(audio_path, 'rb') - result = requests.post(self.url, files={'file': file}) + result = requests.post(self.url, files={'file': audio_buffer}) if result.status_code != 200: raise ValueError(result.text) From 2df35e63aa96ac530fd77042372c38b67669b4e7 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 2 Sep 2021 20:51:34 +0200 Subject: [PATCH 069/172] update README --- README.md | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 64 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 419dcd6..aae3c2d 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,10 @@ This service is mandatory in a LinTO platform stack as the main worker for speec Generally, Automatic Speech Recognition (ASR) is the task of recognition and translation of spoken language into text. Our ASR system takes advantages from the recent advances in machine learning technologies and in particular deep learning ones (TDNN, LSTM, attentation-based architecture). The core of our system consists of two main components: an acoustic model and a decoding graph. A high-performance ASR system relies on an accurate acoustic model as well as a perfect decoding graph. +**NB**: The service works as follows: +* If the audio's duration is less that 30 minutes, the service will return the transcription after decoding. +* Otherwise, the server will return a **jobid** that could be used to get the transcription after decoding using the API **`/transcription/{jobid}`**. + ## Usage See documentation : [doc.linto.ai](https://doc.linto.ai) @@ -27,7 +31,7 @@ While there is no specific minimal requirement on the CPU, speech recognition is ## Installation ### Packaged in Docker -To start the LinSTT service on your local machine or your cloud, you need first to download the source code and set the environment file, as follows: +To start the STT worker on your local machine or your cloud, you need first to download the source code and set the environment file, as follows: ```bash git clone https://github.com/linto-ai/linto-platform-stt-standalone-worker @@ -57,7 +61,7 @@ docker pull lintoai/linto-platform-stt-standalone-worker:latest NB: You must install docker and docker-compose on your machine. 
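A minimal client-side sketch of the flow described above (short recordings return the transcription directly, long recordings return a `jobid` to poll). It assumes the worker is reachable on `localhost:8888`, as in the test script, and that the Python `requests` package is installed; adapt names and paths to your deployment:

```python
import requests

BASE = "http://localhost:8888"   # assumed address; adjust to your deployment

# Short recording: the transcription comes back directly
with open("bonjour.wav", "rb") as f:
    resp = requests.post(
        f"{BASE}/transcribe",
        files={"file": ("bonjour.wav", f, "audio/wav")},
        headers={"accept": "application/json"},   # use "text/plain" for raw text only
    )
print(resp.status_code, resp.text)

# Long recording: the worker answers with a jobid; poll until the result is ready
jobid = "copy-the-jobid-from-the-response-above"   # placeholder value
poll = requests.get(f"{BASE}/transcription/{jobid}")
print(poll.status_code, poll.text)   # 400 while decoding, 200 with the transcription once done

# GET {BASE}/jobids lists the identifiers the worker has handed out so far
```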
## Configuration -The LinSTT service that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [Downloads](https://doc.linto.ai/#/services/linstt_download). +The STT worker that will be set-up here require KALDI models, the acoustic model and the decoding graph. Indeed, these models are not included in the repository; you must download them in order to run LinSTT. You can use our pre-trained models from here: [Downloads](https://doc.linto.ai/#/services/linstt_download). 1- Download the French acoustic model and the small decoding graph (linstt.v1). You can download the latest version for optimal performance and you should make sure that you have the hardware requirement in terms of RAM. @@ -83,8 +87,10 @@ mv DG_fr-FR ~/linstt_model_storage 4- Configure the environment file `.env` included in this repository +```bash AM_PATH=~/linstt_model_storage/AM_fr-FR LM_PATH=~/linstt_model_storage/DG_fr-FR +``` NB: if you want to use the visual user interface of the service, you need also to configure the swagger file `document/swagger.yml` included in this repository. Specifically, in the section `host`, specify the adress of the machine in which the service is deployed. @@ -129,6 +135,32 @@ Convert a speech to text > > **{text|Json}** : Return the full transcription or a json object with metadata + +#### /transcription/{jobid} + +Get the transcription using the jobid + +### Functionality +> `get`
+> Make a GET request +>> Arguments : +>> - **{String} jobid** jobid - An identifier used to find the corresponding transcription +> +> **{text|Json}** : Return the transcription + + +#### /jobids + +List of the transcription jobids + +### Functionality +> `get`
+> Make a GET request +>> Arguments : +>> - no arguments +> +> **{Json}** : Return a json object with jobids + @@ -145,4 +177,33 @@ And run the test script: ./test_deployment.sh ``` -To run personal test, you can use swagger interface: `localhost:8888/api-doc/` \ No newline at end of file +To run personal test, you can use swagger interface: `localhost:8888/api-doc/` + + +### Extrat metadata +If you would like to have a transcription with speaker information and punctuation marks, it's possible thanks to our open-source services: + +* Speaker diarization worker: https://github.com/linto-ai/linto-platform-speaker-diarization-worker +* Text punctuation worker: https://github.com/linto-ai/linto-platform-text-punctuation-worker + +To do that, you need first to start either the speaker or punctuation service or you can start both if it's necessary. **Please read the documentation to know how to install, configure, and start these services.** + +Once the services are on, you need to configure the STT worker as follows: + +1- Edit the environment file `.env` as follows: + +* if you started the punctuation worker, the following variables should be used + +```bash + PUCTUATION_HOST=text-punctuation-worker-host-name + PUCTUATION_PORT=worker-port-example-80 + PUCTUATION_ROUTE=/api/route/path/ +``` +* if you started the speaker diarization worker, the following variables should be used + +```bash + SPEAKER_DIARIZATION_HOST=speaker-diarization-worker-host-name + SPEAKER_DIARIZATION_PORT=worker-port-example-80 +``` + +2- Start the service using the same command described in section **Execute** \ No newline at end of file From 5058f436d0967b74be1736b845299766ac9ca773 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Thu, 2 Sep 2021 20:57:23 +0200 Subject: [PATCH 070/172] remove extra parameters --- .envdefault | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/.envdefault b/.envdefault index 8cc601e..130f6ef 100644 --- a/.envdefault +++ b/.envdefault @@ -1,10 +1,3 @@ AM_PATH=/path/to/acoustic/models/dir LM_PATH=/path/to/language/models/dir -SWAGGER_PATH=./document/swagger.yml - -# dependent services config -PUCTUATION_HOST=text-punctuation-worker-host-name -PUCTUATION_PORT=8080 -PUCTUATION_ROUTE="/api/route/path/" -SPEAKER_DIARIZATION_HOST=speaker-diarization-worker-host-name -SPEAKER_DIARIZATION_PORT=80 \ No newline at end of file +SWAGGER_PATH=./document/swagger.yml \ No newline at end of file From 2d5a26e64899d9634f3c7f0f6a79fb47413030f2 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 3 Sep 2021 09:55:38 +0200 Subject: [PATCH 071/172] update the response --- run.py | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/run.py b/run.py index 2a039c4..8adbabe 100644 --- a/run.py +++ b/run.py @@ -13,7 +13,7 @@ app = Flask("__stt-standelone-worker__") -max_duration = 1800 +max_duration = 10 # instantiate services worker = Worker() @@ -118,7 +118,13 @@ def transcribe(): json.dump(pids, pids_file) _thread.start_new_thread(processing, (is_metadata, do_spk, audio_buffer, file_path,)) - return "The approximate decoding time is {} seconds. 
Use this jobid={} to get the transcription after decoding.".format(str(int(duration*0.33)), jobid), 200 + estdur = str(int(duration*0.33)) + response = { + 'jobid': jobid, + 'decoding_time': '~' + estdur + ' seconds', + 'message': "Use the jobid to get the transcription after decoding", + } + return response, 200 response = processing(is_metadata, do_spk, audio_buffer) return response, 200 From 6e96e339046ddd04c73681ba0a3d35da04b34654 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 3 Sep 2021 12:17:43 +0200 Subject: [PATCH 072/172] update the decoding function --- document/swagger.yml | 4 ++-- run.py | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/document/swagger.yml b/document/swagger.yml index d169694..a1ed3e7 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -7,7 +7,7 @@ info: schemes: - http -host: 127.0.0.1:8888 +host: localhost:8888 basePath: / paths: @@ -55,7 +55,7 @@ paths: description: Get transcription 400: description: Invalid jobid - /get/jobids: + /jobids: get: tags: - "Speech-To-Text API" diff --git a/run.py b/run.py index 8adbabe..e095af8 100644 --- a/run.py +++ b/run.py @@ -13,7 +13,7 @@ app = Flask("__stt-standelone-worker__") -max_duration = 10 +max_duration = 1800 # instantiate services worker = Worker() @@ -27,7 +27,7 @@ spkModel = None def decode(is_metadata): - if is_metadata and len(worker.data) / worker.rate > max_duration : + if len(worker.data) / worker.rate > max_duration : recognizer = KaldiRecognizer(model, spkModel, worker.rate, is_metadata, True) for i in range(0, len(worker.data), int(worker.rate/4)): if recognizer.AcceptWaveform(worker.data[i:i + int(worker.rate/4)]): @@ -76,7 +76,7 @@ def transcription(jobid): else: return "jobid {} is invalid".format(str(jobid)), 400 -@app.route('/get/jobids', methods=['GET']) +@app.route('/jobids', methods=['GET']) def get(): return json.load(open(worker.TRANS_FILES_PATH + "/jobids.json")), 200 @@ -118,7 +118,7 @@ def transcribe(): json.dump(pids, pids_file) _thread.start_new_thread(processing, (is_metadata, do_spk, audio_buffer, file_path,)) - estdur = str(int(duration*0.33)) + estdur = str(int(duration*0.3)) if is_metadata else str(int(duration*0.18)) response = { 'jobid': jobid, 'decoding_time': '~' + estdur + ' seconds', From 14604313f33f1c1c5056abb518bc51b03293cdf4 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 3 Sep 2021 12:31:48 +0200 Subject: [PATCH 073/172] update README --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index aae3c2d..bbeb319 100644 --- a/README.md +++ b/README.md @@ -123,7 +123,7 @@ Our service requires an audio file in `Waveform format`. It should has the follo Convert a speech to text -### Functionality +#### Functionality > `post`
> Make a POST request >> Arguments : @@ -140,7 +140,7 @@ Convert a speech to text Get the transcription using the jobid -### Functionality +#### Functionality > `get`
> Make a GET request >> Arguments : @@ -153,7 +153,7 @@ Get the transcription using the jobid List of the transcription jobids -### Functionality +#### Functionality > `get`
> Make a GET request >> Arguments : @@ -180,7 +180,7 @@ And run the test script: To run personal test, you can use swagger interface: `localhost:8888/api-doc/` -### Extrat metadata +### Additional metadata If you would like to have a transcription with speaker information and punctuation marks, it's possible thanks to our open-source services: * Speaker diarization worker: https://github.com/linto-ai/linto-platform-speaker-diarization-worker From 04535cac1c1f8c23a20bb7915f742b5ef5d8b654 Mon Sep 17 00:00:00 2001 From: Ilyes Rebai Date: Fri, 1 Oct 2021 00:30:14 +0200 Subject: [PATCH 074/172] fix response format --- run.py | 6 ++- tools.py | 120 ++++++++++++++++++------------------------------------- 2 files changed, 44 insertions(+), 82 deletions(-) diff --git a/run.py b/run.py index e095af8..7a47547 100644 --- a/run.py +++ b/run.py @@ -50,8 +50,12 @@ def processing(is_metadata, do_spk, audio_buffer, file_path=None): worker.log.info("Post Processing ...") spk = None if do_spk: - spk = speakerdiarization.get(audio_buffer) + spk = speakerdiarization.get(audio_buffer, int(len(worker.data) / worker.rate)) trans = worker.get_response(data, spk, confidence, is_metadata) + + if trans is None: + raise ValueError('Transcription error') + response = punctuation.get(trans) worker.log.info("... Complete") if file_path is not None: diff --git a/tools.py b/tools.py index 6bee1bc..e285a66 100644 --- a/tools.py +++ b/tools.py @@ -182,73 +182,17 @@ def get_response(self, dataJson, speakers, confidence, is_metadata): if dataJson is not None: data = json.loads(dataJson) data['conf'] = confidence - if not is_metadata: - text = data['text'] # get text from response - return self.parse_text(text) - - elif 'words' in data: - if speakers is not None: + if 'text' in data: + if not is_metadata: + text = data['text'] # get text from response + return self.parse_text(text) + elif 'words' in data and len(data['words']) > 0: # Generate final output data - return self.process_output_v2(data, speakers) - else: - return {'text': self.parse_text(data['text']), 'confidence-score': data['conf'], 'words': data['words']} - - elif 'text' in data: - return {'text': data['text'], 'confidence-score': data['conf'], 'words': []} - else: - return {'text': '', 'confidence-score': 0, 'words': []} - else: - return {'text': '', 'confidence-score': 0, 'words': []} + return self.process_output(data, speakers) + return None # return a json object including word-data, speaker-data def process_output(self, data, spkrs): - try: - speakers = [] - text = [] - i = 0 - text_ = "" - words = [] - for word in data['words']: - if i+1 == len(spkrs): - continue - if i+1 < len(spkrs) and word["end"] < spkrs[i+1][0]: - text_ += word["word"] + " " - words.append(word) - elif len(words) != 0: - speaker = {} - speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] - speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) - speaker["words"] = words - - text.append( - 'spk'+str(int(spkrs[i][2]))+' : ' + self.parse_text(text_)) - speakers.append(speaker) - - words = [word] - text_ = word["word"] + " " - i += 1 - else: - words = [word] - text_ = word["word"] + " " - i += 1 - - speaker = {} - speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] - speaker["speaker_id"] = 'spk'+str(int(spkrs[i][2])) - speaker["words"] = words - - text.append('spk'+str(int(spkrs[i][2])) + - ' : ' + self.parse_text(text_)) - speakers.append(speaker) - - return {'speakers': speakers, 'text': text, 'confidence-score': data['conf']} - 
except: - return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} - - # return a json object including word-data, speaker-data - def process_output_v2(self, data, spkrs): try: speakers = [] text = [] @@ -256,6 +200,9 @@ def process_output_v2(self, data, spkrs): text_ = "" words = [] + # Capitalize first word + data['words'][0]['word'] = data['words'][0]['word'].capitalize() + for word in data['words']: if i+1 == len(spkrs): continue @@ -265,7 +212,7 @@ def process_output_v2(self, data, spkrs): elif len(words) != 0: speaker = {} speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] + speaker["end"] = words[-1]["end"] speaker["speaker_id"] = str(spkrs[i]["spk_id"]) speaker["words"] = words @@ -281,9 +228,13 @@ def process_output_v2(self, data, spkrs): text_ = word["word"] + " " i += 1 + if i == 0: + words = data['words'] + text_ = data['text'].capitalize() + speaker = {} speaker["start"] = words[0]["start"] - speaker["end"] = words[len(words)-1]["end"] + speaker["end"] = words[-1]["end"] speaker["speaker_id"] = str(spkrs[i]["spk_id"]) speaker["words"] = words @@ -294,7 +245,7 @@ def process_output_v2(self, data, spkrs): return {'speakers': speakers, 'text': text, 'confidence-score': data['conf']} except Exception as e: self.log.error(e) - return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'spks': []} + return {'text': data['text'], 'words': data['words'], 'confidence-score': data['conf'], 'speakers': []} class SpeakerDiarization: @@ -317,7 +268,14 @@ def setParam(self, SPEAKER_DIARIZATION_ISON): self.log.info(self.url) if self.url is not None else self.log.warn( "The Speaker Diarization service is not running!") - def get(self, audio_buffer): + def get(self, audio_buffer, duration): + emptyReturn = [{ + "seg_id":1, + "spk_id":"spk1", + "seg_begin":0, + "seg_end":duration, + }] + try: if self.SPEAKER_DIARIZATION_ISON: result = requests.post(self.url, files={'file': audio_buffer}) @@ -337,13 +295,13 @@ def get(self, audio_buffer): return speakers else: - return None + return emptyReturn except Exception as e: self.log.error(str(e)) - return None + return emptyReturn except ValueError as error: self.log.error(str(error)) - return None + return emptyReturn class Punctuation: @@ -368,24 +326,24 @@ def setParam(self, PUCTUATION_ISON): self.log.info(self.url) if self.url is not None else self.log.warn( "The Punctuation service is not running!") - def get(self, text): + def get(self, obj): try: if self.PUCTUATION_ISON: - if isinstance(text, dict): - if isinstance(text['text'], list): + if isinstance(obj, dict): + if isinstance(obj['text'], list): text_punc = [] - for utterance in text['text']: + for utterance in obj['text']: data = utterance.split(':') result = requests.post(self.url, data=data[1].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) if result.status_code != 200: raise ValueError(result.text) text_punc.append(data[0]+": "+result.text) - text['text'] = text_punc + obj['text-punc'] = text_punc else: - result = requests.post(self.url, data=text['text'].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) - text['text'] = result.text - return text + result = requests.post(self.url, data=obj['text'].strip().encode('utf-8'), headers={'content-type': 'application/octet-stream'}) + obj['text-punc'] = result.text + return obj else: result = requests.post(self.url, data=text.encode('utf-8'), headers={'content-type': 
'application/octet-stream'}) if result.status_code != 200: @@ -393,11 +351,11 @@ def get(self, text): return result.text else: - return text + return obj except Exception as e: self.log.error(str(e)) - return text + return obj except ValueError as error: self.log.error(str(error)) - return text + return obj From 2f48a88da82ba4aa8371670a5133af1e1dace9b2 Mon Sep 17 00:00:00 2001 From: Houpert Date: Wed, 16 Feb 2022 16:25:55 +0100 Subject: [PATCH 075/172] Update Dockerfile --- Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Dockerfile b/Dockerfile index 8aae3ae..f7ec8bb 100644 --- a/Dockerfile +++ b/Dockerfile @@ -42,7 +42,7 @@ RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi /opt/kal fi \ && sed -i 's:-msse -msse2:-msse -msse2:g' kaldi.mk \ && sed -i 's: -O1 : -O3 :g' kaldi.mk \ - && make -j 32 online2 lm rnnlm + && make -j $(nproc) online2 lm rnnlm # Install python dependencies COPY requirements.txt ./ @@ -69,4 +69,4 @@ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" HEALTHCHECK CMD ./healthcheck.sh -ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file +ENTRYPOINT ["./docker-entrypoint.sh"] From f6fbe703fe237fe6dd6dd775cd7c188c726cd6d5 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Wed, 9 Mar 2022 09:57:29 +0000 Subject: [PATCH 076/172] 3.3.0 Vosk rebase and streaming. See RELEASE.md --- .envdefault | 17 +++++ .gitignore | 4 +- Dockerfile | 6 +- README.md | 133 ++++++++++++++++++++++------------- RELEASE.md | 7 ++ celery_app/tasks.py | 10 +-- docker-entrypoint.sh | 30 +++++++- http_server/ingress.py | 20 +++++- lin_to_vosk.py | 86 ++++++++++++++++++++++ requirements.txt | 2 + stt/processing/__init__.py | 20 ++---- stt/processing/decoding.py | 18 ++--- stt/processing/model.py | 81 --------------------- stt/processing/streaming.py | 107 ++++++++++++++++++++++++++++ websocket/__init__.py | 0 websocket/websocketserver.py | 21 ++++++ 16 files changed, 398 insertions(+), 164 deletions(-) create mode 100644 .envdefault create mode 100755 lin_to_vosk.py delete mode 100644 stt/processing/model.py create mode 100644 stt/processing/streaming.py create mode 100644 websocket/__init__.py create mode 100644 websocket/websocketserver.py diff --git a/.envdefault b/.envdefault new file mode 100644 index 0000000..33a394c --- /dev/null +++ b/.envdefault @@ -0,0 +1,17 @@ +# SERVING PARAMETERS +SERVICE_MODE=http +MODEL_TYPE=lin + +# HTTP PARAMETERS +ENABLE_STREAMING=true + +# TASK PARAMETERS +SERVICE_NAME=stt +SERVICES_BROKER=redis://192.168.0.1:6379 +BROKER_PASS=password + +# WEBSOCKET PARAMETERS +STREAMING_PORT=80 + +# CONCURRENCY +CONCURRENCY=2 \ No newline at end of file diff --git a/.gitignore b/.gitignore index d2b976b..8ad21e4 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,3 @@ -start_container.sh \ No newline at end of file +start_container.sh +.env +test/* \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index f7ec8bb..af7f731 100644 --- a/Dockerfile +++ b/Dockerfile @@ -49,9 +49,9 @@ COPY requirements.txt ./ RUN pip install --no-cache-dir -r requirements.txt # Install Custom Vosk API -RUN git clone --depth 1 https://github.com/linto-ai/linto-vosk-api.git /opt/vosk-api && cd /opt/vosk-api/python && \ +RUN git clone --depth 1 https://github.com/alphacep/vosk-api /opt/vosk-api && cd /opt/vosk-api/python && \ cd /opt/vosk-api/src \ - && KALDI_MKL=$KALDI_MKL KALDI_ROOT=/opt/kaldi make -j 32 \ + && KALDI_MKL=$KALDI_MKL KALDI_ROOT=/opt/kaldi make -j $(nproc) \ && cd /opt/vosk-api/python \ && python3 ./setup.py install 
@@ -60,8 +60,10 @@ WORKDIR /usr/src/app COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server +COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ +COPY lin_to_vosk.py /usr/src/app/lin_to_vosk.py RUN mkdir -p /var/log/supervisor/ diff --git a/README.md b/README.md index bf1b4a8..aa711f7 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,6 @@ # LINTO-PLATFORM-STT LinTO-platform-stt is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack). -The STT-worker is configured with an acoustic model and a language model to perform Speech-To-Text tasks with high efficiency. - LinTO-platform-stt can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## Pre-requisites @@ -10,66 +8,101 @@ LinTO-platform-stt can either be used as a standalone transcription service or d ### Hardware To run the transcription models you'll need: * At least 7Go of disk space to build the docker image. -* 500MB-3GB-7GB of RAM depending on the model used (small-medium-large). +* Up to 7GB of RAM depending on the model used. * One CPU per worker. Inference time scales on CPU performances. ### Model -The transcription service relies on 2 models: -* An acoustic model. -* A language model (or decoding graph). +LinTO-Platform-STT accepts two kinds of models: +* LinTO Acoustic and Languages models. +* Vosk models. -We provide some models on [dl.linto.ai](https://dl.linto.ai/downloads/model-distribution/). +We provide home-cured models (v2) on [dl.linto.ai](https://doc.linto.ai/#/services/linstt_download). +Or you can also use Vosk models available [here](https://alphacephei.com/vosk/models). ### Docker The transcription service requires docker up and running. ### (micro-service) Service broker and shared folder -The STT only entry point in job mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. -On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder. +The STT only entry point in task mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. +On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder (SHARED_FOLDER). ## Deploy linto-platform-stt -linto-platform-stt can be deployed three ways: -* As a standalone transcription service through an HTTP API. -* As a micro-service connected to a message broker. -**1- First step is to build the image:** +**1- First step is to build or pull the image:** ```bash git clone https://github.com/linto-ai/linto-platform-stt.git cd linto-platform-stt docker build . -t linto-platform-stt:latest ``` +or + +```bash +docker pull lintoai/linto-platform-stt +``` **2- Download the models** -Have the acoustic and language model ready at AM_PATH and LM_PATH. +Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL. -### HTTP API +**3- Fill the .env** + +```bash +cp .envdefault .env +``` + +| PARAMETER | DESCRIPTION | EXEMPLE | +|---|---|---| +| SERVING_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | +| MODEL_TYPE | Type of STT model used. 
| lin\|vosk | +| ENABLE_STREAMING | Using http serving mode, enable the /streaming websocket route | true\|false | +| SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | +| SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | +| BROKER_PASS | Using the task mode, broker password | my-password | +| STREAMING_PORT | Using the websocket mode, the listening port for ingoing WS connexions. | 80 | +| CONCURRENCY | Maximum number of parallel requests | >1 | + +### Serving mode +![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png) + +STT can be use three ways: +* Through an [HTTP API](#http-server) using the **http**'s mode. +* Through a [message broker](#micro-service-within-linto-platform-stack) using the **task**'s mode. +* Through a [websocket server](#websocket-server) **websocket**'s mode. + +Mode is specified using the .env value or environment variable ```SERVING_MODE```. +```bash +SERVING_MODE=http +``` +### HTTP Server +The HTTP serving mode deploys a HTTP server and a swagger-ui to allow transcription request on a dedicated route. + +The SERVING_MODE value in the .env should be set to ```http```. ```bash docker run --rm \ -p HOST_SERVING_PORT:80 \ --v AM_PATH:/opt/models/AM \ --v LM_PATH:/opt/models/LM \ ---env SERVICE_NAME=stt \ ---env LANGUAGE=en_US \ ---env SERVICE_MODE=http \ ---env CONCURRENCY=10 \ +-v AM_PATH:/opt/AM \ +-v LM_PATH:/opt/LM \ +--env-file .env \ linto-platform-stt:latest ``` -This will run a container providing an http API binded on the host HOST_SERVING_PORT port. +This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. **Parameters:** | Variables | Description | Example | |:-|:-|:-| | HOST_SERVING_PORT | Host serving port | 80 | -| AM_PATH | Path to the acoustic model | /my/path/to/models/AM_fr-FR_v2.2.0 | -| LM_PATH | Path to the language model | /my/path/to/models/AM_fr-FR_v2.2.0 | -| LANGUAGE | Language code as a BCP-47 code | en-US, fr_FR, ... | -| CONCURRENCY | Number of worker (1 worker = 1 cpu) | 4 | +| AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 | +| LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 | +| MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model | ### Micro-service within LinTO-Platform stack +The HTTP serving mode connect a celery worker to a message broker. + +The SERVING_MODE value in the .env should be set to ```task```. + >LinTO-platform-stt can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. >LinTO-platform-stt in task mode is not intended to be launch manually. 
>However, if you intent to connect it to your custom message's broker here are the parameters: @@ -81,35 +114,29 @@ docker run --rm \ -v AM_PATH:/opt/models/AM \ -v LM_PATH:/opt/models/LM \ -v SHARED_AUDIO_FOLDER:/opt/audio \ ---env SERVICES_BROKER=MY_SERVICE_BROKER \ ---env BROKER_PASS=MY_BROKER_PASS \ ---env SERVICE_NAME=stt \ ---env LANGUAGE=en_US \ ---env SERVICE_MODE=task \ ---env CONCURRENCY=10 \ -linstt:dev +--env-file .env \ +linto-platform-stt:latest ``` **Parameters:** | Variables | Description | Example | |:-|:-|:-| -| AM_PATH | Path to the acoustic model | /my/path/to/models/AM_fr-FR_v2.2.0 | -| LM_PATH | Path to the language model | /my/path/to/models/AM_fr-FR_v2.2.0 | -| SERVICES_BROKER | Service broker uri | redis://my_redis_broker:6379 | -| BROKER_PASS | Service broker password (Leave empty if there is no password) | my_password | -| SERVICE_NAME* | Transcription service name | my_stt | -| LANGUAGE | Transcription language | en-US | -| CONCURRENCY | Number of worker (1 worker = 1 cpu) | [ 1 -> numberOfCPU] | +| AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 | +| LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 | +| MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model | +| SHARED_AUDIO_FOLDER | Shared audio folder mounted to /opt/audio | /my/path/to/models/vosk-model | -(* SERVICE NAME needs to be the same as the linto-platform-transcription-service if used.) +### Websocket Server +Websocket server's mode deploy a streaming transcription service only. -## Usages +The SERVING_MODE value in the .env should be set to ```websocket```. -### HTTP API +Usage is the same as the [http streaming API](#/streaming) +## Usages +### HTTP API #### /healthcheck - Returns the state of the API Method: GET @@ -117,12 +144,11 @@ Method: GET Returns "1" if healthcheck passes. #### /transcribe - Transcription API * Method: POST * Response content: text/plain or application/json -* File: An Wave f ile 16b 16Khz +* File: An Wave file 16b 16Khz Return the transcripted text using "text/plain" or a json object when using "application/json" structure as followed: ```json @@ -136,8 +162,20 @@ Return the transcripted text using "text/plain" or a json object when using "app } ``` +#### /streaming +The /streaming route is accessible if the ENABLE_STREAMING environment variable is set to true. + +The route accepts websocket connexions. Exchanges are structured as followed: +1. Client send a json {"config": {"sample_rate":16000}}. +2. Client send audio chunk (go to 3- ) or {"eof" : 1} (go to 5-). +3. Server send either a partial result {"partial" : "this is a "} or a final result {"text": "this is a transcription"}. +4. Back to 2- +5. Server send a final result and close the connexion. + +> Connexion will be closed and the worker will be freed if no chunk are received for 10s. + #### /docs -The /docs route offers a OpenAPI/swagger interface. +The /docs route offers a OpenAPI/swagger interface. ### Through the message broker @@ -169,6 +207,7 @@ On a successfull transcription the returned object is a json object structured a * The word field contains each word with their time stamp and individual confidence. (Empty if with_metadata=False) * The confidence field contains the overall confidence for the transcription. 
(0.0 if with_metadata=False) + ## Test ### Curl You can test you http API using curl: diff --git a/RELEASE.md b/RELEASE.md index e6f14a9..0a146d5 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,10 @@ +# 3.3.0 +- Added optional streaming route to the http serving mode +- Added serving mode: websocket +- Added Dynamic model conversion allowing to use either Vosk Models or Linagora AM/LM models +- Changer Vosk dependency to alphacep/vosk +- Updated README.md + # 3.2.1 - Repository total rework. The goal being to have a simple transcription service embeddable within a micro-service infrastructure. - Changed repository name from linto-platform-stt-standalone-worker to linto-platform-stt. diff --git a/celery_app/tasks.py b/celery_app/tasks.py index aaf5975..6921a0c 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -1,15 +1,17 @@ import os +import asyncio from stt import logger from stt.processing import model from celery_app.celeryapp import celery from stt.processing.utils import load_wave -from stt.processing.decoding import decode +from stt.processing import decode @celery.task(name="transcribe_task") def transcribe_task(file_name: str, with_metadata: bool): - """ transcribe_task do a synchronous call to the transcribe worker API """ + """ transcribe_task """ logger.info("Received transcription task for {}".format(file_name)) + # Load wave file_path = os.path.join("/opt/audio", file_name) try: @@ -25,6 +27,4 @@ def transcribe_task(file_name: str, with_metadata: bool): logger.error("Failed to decode: {}".format(e)) raise Exception("Failed to decode {}".format(file_path)) - return result - - \ No newline at end of file + return result \ No newline at end of file diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 16c25d2..212b145 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -3,10 +3,29 @@ set -ea echo "RUNNING STT" +# Check model +echo "Checking model format ..." +if [ -z "$MODEL_TYPE" ] +then + echo "Model type not specified, expecting Vosk Model" + export MODEL_TYPE=vosk +fi + +if [ "$MODEL_TYPE" = "vosk" ] +then + echo "Using Vosk format's model" + +elif [ "$MODEL_TYPE" = "lin" ] +then + echo "Processing model ... " + ./lin_to_vosk.py +else + echo "Unknown model type $MODEL_TYPE. Assuming vosk model" +fi # Launch parameters, environement variables and dependencies check if [ -z "$SERVICE_MODE" ] then - echo "ERROR: Must specify a serving mode: [ http | task ]" + echo "ERROR: Must specify a serving mode: [ http | task | websocket ]" exit -1 else if [ "$SERVICE_MODE" = "http" ] @@ -18,11 +37,16 @@ else if [[ -z "$SERVICES_BROKER" ]] then echo "ERROR: SERVICES_BROKER variable not specified, cannot start celery worker." 
- return -1 + exit -1 fi /usr/src/app/wait-for-it.sh $(echo $SERVICES_BROKER | cut -d'/' -f 3) --timeout=20 --strict -- echo " $SERVICES_BROKER (Service Broker) is up" echo "RUNNING STT CELERY WORKER" - celery --app=celery_app.celeryapp worker -Ofair -n ${SERVICE_NAME}_worker@%h --queues=${SERVICE_NAME} -c ${CONCURRENCY} + celery --app=celery_app.celeryapp worker -Ofair --queues=${SERVICE_NAME} -c ${CONCURRENCY} -n ${SERVICE_NAME}_worker@%h + + elif [ "$SERVICE_MODE" == "websocket" ] + then + echo "Running Websocket server on port ${STREAMING_PORT:=80}" + python websocket/websocketserver.py else echo "ERROR: Wrong serving command: $1" exit -1 diff --git a/http_server/ingress.py b/http_server/ingress.py index c43a353..69b6e47 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -6,19 +6,33 @@ import json from flask import Flask, request, abort, Response, json +from flask_sock import Sock from serving import GunicornServing from confparser import createParser from swagger import setupSwaggerUI -from stt.processing import model -from stt.processing import decode, formatAudio +from stt.processing import model, decode, formatAudio +from stt.processing.streaming import ws_streaming + app = Flask("__stt-standalone-worker__") +app.config["JSON_AS_ASCII"] = False +app.config["JSON_SORT_KEYS"] = False logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s: %(message)s', datefmt='%d/%m/%Y %H:%M:%S') logger = logging.getLogger("__stt-standalone-worker__") +# If websocket streaming route is enabled +if os.environ.get('ENABLE_STREAMING', False) in [True, "true", 1]: + logger.info("Init websocket serving ...") + sock = Sock(app) + logger.info("Streaming is enabled") + + @sock.route('/streaming') + def streaming(ws): + ws_streaming(ws, model) + @app.route('/healthcheck', methods=['GET']) def healthcheck(): return json.dumps({"healthcheck": "OK"}), 200 @@ -98,7 +112,7 @@ def server_error(error): logger.warning("Could not setup swagger: {}".format(str(e))) serving = GunicornServing(app, {'bind': '{}:{}'.format("0.0.0.0", args.service_port), - 'workers': args.workers,}) + 'workers': args.workers, 'timeout': 3600}) logger.info(args) try: serving.run() diff --git a/lin_to_vosk.py b/lin_to_vosk.py new file mode 100755 index 0000000..1df8ee7 --- /dev/null +++ b/lin_to_vosk.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +import os +import re +import configparser + +#LANGUAGE_MODEL_PATH= "/home/rbaraglia/training_ground/STT/fr-FR_Big_v2.2.0" +#ACOUSTIC_MODEL_PATH= "/home/rbaraglia/training_ground/STT/AM_fr-FR_v2.2.0" +#TARGET_PATH= "/home/rbaraglia/training_ground/STT/generated_model" + +LANGUAGE_MODEL_PATH="/opt/LM" +ACOUSTIC_MODEL_PATH="/opt/AM" +TARGET_PATH="/opt/model" + +def lin_to_vosk_format(am_path: str, lm_path: str, target_path: str): + os.mkdir(target_path) + # Create directory structure + print("Create directory structure") + for subfolder in ["am", "conf", "graph", "ivector", "rescore"]: + os.mkdir(os.path.join(target_path, subfolder)) + + # Populate am directory + # final.mdl + print("Populate am directory") + for f in ["final.mdl"]: + print(f) + os.symlink(os.path.join(am_path, f), + os.path.join(target_path, "am", f)) + + # Populate conf directory + print("Populate conf directory") + print("mfcc.conf") + os.symlink(os.path.join(am_path, "conf", "mfcc.conf"), + os.path.join(target_path, "conf", "mfcc.conf")) + + print("model.conf") + with open(os.path.join(target_path, "conf", "model.conf"), 'w') as f: + f.write("--min-active=200\n") + f.write("--max-active=7000\n") + 
f.write("--beam=13.0\n") + f.write("--lattice-beam=6.0\n") + f.write("--acoustic-scale=1.0\n") + f.write("--frame-subsampling-factor=3\n") + f.write("--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10\n") + f.write("--endpoint.rule2.min-trailing-silence=0.5\n") + f.write("--endpoint.rule3.min-trailing-silence=1.0\n") + f.write("--endpoint.rule4.min-trailing-silence=2.0\n") + + # Populate graph directory + print("Populate graph directory") + for f in ["HCLG.fst", "words.txt"]: + print(f) + os.symlink(os.path.join(lm_path, f), + os.path.join(target_path, "graph", f)) + + print("phones.txt") + os.symlink(os.path.join(am_path, "phones.txt"), + os.path.join(target_path, "graph", "phones.txt")) + + # Populate graph/phones directory + os.mkdir(os.path.join(target_path, "graph", "phones")) + + print("Populate graph/phones directory") + + print("word_boundary.int") + os.symlink(os.path.join(lm_path, "word_boundary.int"), + os.path.join(target_path, "graph", "phones", "word_boundary.int")) + + # Populate ivector directory + print("Populate graph/phones directory") + for f in ["final.dubm", "final.ie", "final.mat", "global_cmvn.stats", "online_cmvn.conf"]: + print(f) + os.symlink(os.path.join(am_path, "ivector_extractor", f), + os.path.join(target_path, "ivector", f)) + + print("splice.conf") + with open(os.path.join(am_path, "ivector_extractor", "splice_opts"), 'r') as in_f: + with open(os.path.join(target_path, "ivector", "splice.conf"), 'w') as out_f: + for param in in_f.read().split(" "): + out_f.write(f"{param}\n") + + # Populate rescore + # ? + +if __name__ == "__main__": + lin_to_vosk_format(ACOUSTIC_MODEL_PATH, LANGUAGE_MODEL_PATH, TARGET_PATH) + diff --git a/requirements.txt b/requirements.txt index cb9cd90..c863c6a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,7 +3,9 @@ numpy>=1.18.5 flask>=1.1.2 flask-cors>=3.0.10 flask-swagger-ui>=3.36.0 +flask-sock gunicorn pyyaml>=5.4.1 wavio>=0.0.4 requests>=2.26.0 +websockets \ No newline at end of file diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 701499d..d8c095e 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,29 +1,23 @@ import os from time import time +from vosk import Model + from stt import logger -from stt.processing.model import prepare, loadModel from stt.processing.decoding import decode from stt.processing.utils import load_wave, formatAudio +#from stt.processing.model import loadModel -# Model locations (should be mounted) -AM_PATH='/opt/models/AM' -LM_PATH='/opt/models/LM' -CONF_PATH='/opt/config' +__all__ = ["model", "logger", "decode", "load_wave", "formatAudio"] -# Prepare Model -logger.debug("Setting folders and configuration files") -try: - prepare(AM_PATH, LM_PATH, CONF_PATH) -except Exception as e: - logger.error("Could not prepare service: {}".format(str(e))) - exit(-1) +# Model locations (should be mounted) +MODEL_PATH="/opt/model" # Load ASR models (acoustic model and decoding graph) logger.info('Loading acoustic model and decoding graph ...') start = time() try: - model = loadModel(AM_PATH, LM_PATH, os.path.join(CONF_PATH, "online.conf")) + model = Model(MODEL_PATH) except Exception as e: raise Exception("Failed to load transcription model: {}".format(str(e))) exit(-1) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 50c6532..fec7bec 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -1,13 +1,14 @@ import json +import re from vosk import KaldiRecognizer, Model def decode(audio_data: bytes, model: Model, 
sampling_rate: int, with_metadata: bool) -> dict: ''' Transcribe the audio data using the vosk library with the defined model.''' - result = {'text':'', 'words':[], 'confidence-score': 0.0} + result = {'text':'', 'confidence-score': 0.0, 'words':[]} - recognizer = KaldiRecognizer(model, sampling_rate, False) - recognizer.SetMaxAlternatives(1) + recognizer = KaldiRecognizer(model, sampling_rate) + recognizer.SetMaxAlternatives(0) # Set confidence per words recognizer.SetWords(with_metadata) recognizer.AcceptWaveform(audio_data) @@ -20,10 +21,9 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b except Exception: return result - result['text'] = decoder_result['text'].strip() - if 'words' in decoder_result: - result['words'] = decoder_result['words'] - if 'confidence' in decoder_result: - result['confidence-score'] = decoder_result['confidence'] - + result["text"] = re.sub(" " , "", decoder_result["text"]) + if "word" in decoder_result: + result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] + if "confidence" in decoder_result: + result["confidence-score"] = sum([w["conf"] for w in words]) / len(words) return result \ No newline at end of file diff --git a/stt/processing/model.py b/stt/processing/model.py deleted file mode 100644 index 866037b..0000000 --- a/stt/processing/model.py +++ /dev/null @@ -1,81 +0,0 @@ -import os -import re -import configparser - -from vosk import Model - -def prepare(am_path:str, lm_path:str, config_path:str): - ''' Prepare folder and configuration files needed for the model usage ''' - - if not os.path.isdir(config_path): - os.mkdir(config_path) - - # load decoder parameters from "decode.cfg" - decoder_settings = configparser.ConfigParser() - if not os.path.exists(am_path+'/decode.cfg'): - raise FileNotFoundError("decode.cfg file is missing") - - decoder_settings.read(am_path+'/decode.cfg') - - # Prepare "online.conf" - with open(am_path+"/conf/online.conf") as f: - values = f.readlines() - with open(config_path+"/online.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--ivector-extraction-config=" + - config_path+"/ivector_extractor.conf\n") - f.write("--mfcc-config=" + os.path.join(am_path, "conf/mfcc.conf") + "\n") - f.write("--beam=" + decoder_settings.get('decoder_params', 'beam') + "\n") - f.write("--lattice-beam=" + decoder_settings.get('decoder_params', 'lattice_beam')+"\n") - f.write("--acoustic-scale=" + decoder_settings.get('decoder_params', 'acwt') + "\n") - f.write("--min-active=" + decoder_settings.get('decoder_params', 'min_active') + "\n") - f.write("--max-active=" + decoder_settings.get('decoder_params', 'max_active') + "\n") - f.write("--frame-subsampling-factor=" + decoder_settings.get('decoder_params', 'frame_subsampling_factor') + "\n") - - # Prepare "ivector_extractor.conf" - with open(am_path+"/conf/ivector_extractor.conf") as f: - values = f.readlines() - with open(config_path+"/ivector_extractor.conf", 'w') as f: - for i in values: - f.write(i) - f.write("--splice-config="+am_path+"/conf/splice.conf\n") - f.write("--cmvn-config="+am_path + - "/conf/online_cmvn.conf\n") - f.write("--lda-matrix="+am_path + - "/ivector_extractor/final.mat\n") - f.write("--global-cmvn-stats="+am_path + - "/ivector_extractor/global_cmvn.stats\n") - f.write("--diag-ubm="+am_path + - "/ivector_extractor/final.dubm\n") - f.write("--ivector-extractor="+am_path + - "/ivector_extractor/final.ie") - - # Prepare "word_boundary.int" if not exist - if not os.path.exists(lm_path+"/word_boundary.int") and 
os.path.exists(am_path+"/phones.txt"): - print("Create word_boundary.int based on phones.txt") - with open(am_path+"/phones.txt", 'r') as f: - phones = f.readlines() - - with open(lm_path+"/word_boundary.int", "w") as f: - for phone in phones: - phone = phone.strip() - phone = re.sub('^ .*', '', phone) - phone = re.sub('^#\d+ .*', '', phone) - if phone != '': - id = phone.split(' ')[1] - if '_I ' in phone: - f.write(id+" internal\n") - elif '_B ' in phone: - f.write(id+" begin\n") - elif '_E ' in phone: - f.write(id+" end\n") - elif '_S ' in phone: - f.write(id+" singleton\n") - else: - f.write(id+" nonword\n") - -def loadModel(am_path: str, lm_path: str, config_path: str) -> Model: - """ Load STT model """ - print("MODEL" , os.path.join(config_path, "online.conf")) - return Model(am_path,lm_path, config_path) diff --git a/stt/processing/streaming.py b/stt/processing/streaming.py new file mode 100644 index 0000000..5e36b8c --- /dev/null +++ b/stt/processing/streaming.py @@ -0,0 +1,107 @@ +import json +import re +from typing import Union + +from websockets.legacy.server import WebSocketServerProtocol +from simple_websocket.ws import Server as WSServer +from vosk import KaldiRecognizer, Model + +from stt import logger + +async def wssDecode(ws: WebSocketServerProtocol, model: Model): + """ Async Decode function endpoint """ + # Wait for config + res = await ws.recv() + + # Parse config + try: + config = json.loads(res)["config"] + sample_rate = config["sample_rate"] + except Exception as e : + logger.error("Failed to read stream configuration") + await ws.close(reason="Failed to load configuration") + + # Recognizer + try: + recognizer = KaldiRecognizer(model, sample_rate) + except Exception as e: + logger.error("Failed to load recognizer") + await ws.close(reason="Failed to load recognizer") + + # Wait for chunks + while True: + try: + # Client data + message = await ws.recv() + if message is None or message == "": # Timeout + ws.close() + except Exception as e: + print("Connection closed by client: {}".format(str(e))) + break + + # End frame + if "eof" in str(message): + ret = recognizer.FinalResult() + await ws.send(json.dumps(ret)) + await ws.close(reason="End of stream") + break + + # Audio chunk + if recognizer.AcceptWaveform(message): + ret = recognizer.Result() # Result seems to not work properly + await ws.send(ret) + + else: + ret = recognizer.PartialResult() + last_utterance = ret + await ws.send(ret) + +def ws_streaming(ws: WSServer, model: Model): + """ Sync Decode function endpoint""" + # Wait for config + res = ws.receive(timeout=10) + + # Timeout + if res is None: + pass + + # Parse config + try: + config = json.loads(res)["config"] + sample_rate = config["sample_rate"] + except Exception as e : + logger.error("Failed to read stream configuration") + ws.close() + + # Recognizer + try: + recognizer = KaldiRecognizer(model, sample_rate) + except Exception as e: + logger.error("Failed to load recognizer") + ws.close() + + # Wait for chunks + while True: + try: + # Client data + message = ws.receive(timeout=10) + if message is None: # Timeout + ws.close() + except Exception: + print("Connection closed by client") + break + # End frame + if "eof" in str(message): + ret = recognizer.FinalResult() + ws.send(json.dumps(re.sub(" ", "", ret))) + ws.close() + break + # Audio chunk + print("Received chunk") + if recognizer.AcceptWaveform(message): + ret = recognizer.Result() + ws.send(re.sub(" ", "", ret)) + + else: + ret = recognizer.PartialResult() + ws.send(re.sub(" ", "", ret)) \ No 
newline at end of file diff --git a/websocket/__init__.py b/websocket/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/websocket/websocketserver.py b/websocket/websocketserver.py new file mode 100644 index 0000000..eb3f9f2 --- /dev/null +++ b/websocket/websocketserver.py @@ -0,0 +1,21 @@ +import os +import asyncio + +import websockets + +from stt.processing import model +from stt.processing.streaming import wssDecode + +async def _fun_wrapper(ws): + """ Wrap wssDecode function to add STT Model reference """ + return await wssDecode(ws, model) + +async def WSServer(port: int): + """ Launch the websocket server """ + async with websockets.serve(_fun_wrapper, "0.0.0.0", serving_port): + await asyncio.Future() + +if __name__ == "__main__": + serving_port = os.environ.get("STREAMING_PORT", 80) + asyncio.run(WSServer(serving_port)) + \ No newline at end of file From 36dfd29f967dd300d6beff34180807e1729f2717 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Wed, 9 Mar 2022 10:30:31 +0000 Subject: [PATCH 077/172] Fixed key --- stt/processing/decoding.py | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index fec7bec..a072741 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -20,10 +20,8 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b decoder_result = json.loads(decoder_result_raw) except Exception: return result - result["text"] = re.sub(" " , "", decoder_result["text"]) - if "word" in decoder_result: + if "result" in decoder_result: result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] - if "confidence" in decoder_result: - result["confidence-score"] = sum([w["conf"] for w in words]) / len(words) + result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len(result["words"]) return result \ No newline at end of file From 07032f9d81f2d785cecc67dd5875ff6083d9cfa5 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Wed, 9 Mar 2022 10:39:11 +0000 Subject: [PATCH 078/172] Fixed division by zero --- stt/processing/decoding.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index a072741..3a7d33e 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -23,5 +23,6 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b result["text"] = re.sub(" " , "", decoder_result["text"]) if "result" in decoder_result: result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] - result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len(result["words"]) + if len(result["words"]): + result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len(result["words"]) return result \ No newline at end of file From 0cf497d14368615e85bda56c64f9c8c30ae2e333 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Thu, 1 Sep 2022 13:35:05 +0000 Subject: [PATCH 079/172] Auto styling and linting --- .gitignore | 2 +- Makefile | 13 ++++++ celery_app/celeryapp.py | 22 ++++++---- celery_app/tasks.py | 12 +++--- http_server/confparser.py | 80 +++++++++++++++--------------------- http_server/ingress.py | 80 +++++++++++++++++++++--------------- http_server/serving.py | 11 +++-- http_server/swagger.py | 14 +++---- stt/__init__.py | 8 ++-- stt/processing/__init__.py | 11 ++--- stt/processing/decoding.py | 15 ++++--- stt/processing/streaming.py | 42 ++++++++++--------- 
stt/processing/utils.py | 9 ++-- websocket/websocketserver.py | 3 +- 14 files changed, 174 insertions(+), 148 deletions(-) create mode 100644 Makefile diff --git a/.gitignore b/.gitignore index c556a27..0b8d9ad 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,3 @@ start_container.sh -.env +.env* test/* diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..71be1a8 --- /dev/null +++ b/Makefile @@ -0,0 +1,13 @@ +.DEFAULT_GOAL := help + +target_dirs := stt http_server celery_app + +help: + @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' + +style: ## update code style. + black -l 100 ${target_dirs} + isort ${target_dirs} + +lint: ## run pylint linter. + pylint ${target_dirs} diff --git a/celery_app/celeryapp.py b/celery_app/celeryapp.py index 5f1d96e..d4a5cb4 100644 --- a/celery_app/celeryapp.py +++ b/celery_app/celeryapp.py @@ -1,26 +1,30 @@ import os + from celery import Celery from stt import logger -celery = Celery(__name__, include=['celery_app.tasks']) +celery = Celery(__name__, include=["celery_app.tasks"]) service_name = os.environ.get("SERVICE_NAME") broker_url = os.environ.get("SERVICES_BROKER") if os.environ.get("BROKER_PASS", False): - components = broker_url.split('//') + components = broker_url.split("//") broker_url = f'{components[0]}//:{os.environ.get("BROKER_PASS")}@{components[1]}' celery.conf.broker_url = "{}/0".format(broker_url) celery.conf.result_backend = "{}/1".format(broker_url) -celery.conf.update( - result_expires=3600, - task_acks_late=True, - task_track_started = True) +celery.conf.update(result_expires=3600, task_acks_late=True, task_track_started=True) # Queues celery.conf.update( - {'task_routes': { - 'transcribe_task' : {'queue': service_name},} + { + "task_routes": { + "transcribe_task": {"queue": service_name}, + } } ) -logger.info("Celery configured for broker located at {} with service name {}".format(broker_url, service_name)) \ No newline at end of file +logger.info( + "Celery configured for broker located at {} with service name {}".format( + broker_url, service_name + ) +) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index 6921a0c..f2a2b08 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -1,15 +1,15 @@ -import os import asyncio +import os -from stt import logger -from stt.processing import model from celery_app.celeryapp import celery +from stt import logger +from stt.processing import decode, model from stt.processing.utils import load_wave -from stt.processing import decode + @celery.task(name="transcribe_task") def transcribe_task(file_name: str, with_metadata: bool): - """ transcribe_task """ + """transcribe_task""" logger.info("Received transcription task for {}".format(file_name)) # Load wave @@ -27,4 +27,4 @@ def transcribe_task(file_name: str, with_metadata: bool): logger.error("Failed to decode: {}".format(e)) raise Exception("Failed to decode {}".format(file_path)) - return result \ No newline at end of file + return result diff --git a/http_server/confparser.py b/http_server/confparser.py index 4c2171d..f676e1a 100644 --- a/http_server/confparser.py +++ b/http_server/confparser.py @@ -1,68 +1,52 @@ -import os import argparse +import os __all__ = ["createParser"] + def createParser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser() - + # SERVICE parser.add_argument( - '--service_name', + "--service_name", type=str, - help='Service Name', - default=os.environ.get('SERVICE_NAME', 'stt')) + help="Service 
Name", + default=os.environ.get("SERVICE_NAME", "stt"), + ) # MODELS + parser.add_argument("--am_path", type=str, help="Acoustic Model Path", default="/opt/models/AM") + parser.add_argument("--lm_path", type=str, help="Decoding graph path", default="/opt/models/LM") parser.add_argument( - '--am_path', - type=str, - help='Acoustic Model Path', - default='/opt/models/AM') - parser.add_argument( - '--lm_path', - type=str, - help='Decoding graph path', - default='/opt/models/LM') - parser.add_argument( - '--config_path', - type=str, - help='Configuration files path', - default='/opt/config') - - #GUNICORN - parser.add_argument( - '--service_port', - type=int, - help='Service port', - default=80) + "--config_path", type=str, help="Configuration files path", default="/opt/config" + ) + + # GUNICORN + parser.add_argument("--service_port", type=int, help="Service port", default=80) parser.add_argument( - '--workers', + "--workers", type=int, help="Number of Gunicorn workers (default=CONCURRENCY + 1)", - default=int(os.environ.get('CONCURRENCY', 1)) + 1) - - #SWAGGER - parser.add_argument( - '--swagger_url', - type=str, - help='Swagger interface url', - default='/docs') + default=int(os.environ.get("CONCURRENCY", 1)) + 1, + ) + + # SWAGGER + parser.add_argument("--swagger_url", type=str, help="Swagger interface url", default="/docs") parser.add_argument( - '--swagger_prefix', + "--swagger_prefix", type=str, - help='Swagger prefix', - default=os.environ.get('SWAGGER_PREFIX', '')) + help="Swagger prefix", + default=os.environ.get("SWAGGER_PREFIX", ""), + ) parser.add_argument( - '--swagger_path', + "--swagger_path", type=str, - help='Swagger file path', - default=os.environ.get('SWAGGER_PATH', '/usr/src/app/document/swagger.yml')) - - #MISC - parser.add_argument( - '--debug', - action='store_true', - help='Display debug logs') + help="Swagger file path", + default=os.environ.get("SWAGGER_PATH", "/usr/src/app/document/swagger.yml"), + ) + + # MISC + parser.add_argument("--debug", action="store_true", help="Display debug logs") - return parser \ No newline at end of file + return parser diff --git a/http_server/ingress.py b/http_server/ingress.py index 69b6e47..ffe21ce 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -1,78 +1,81 @@ #!/usr/bin/env python3 +import json +import logging import os from time import time -import logging -import json -from flask import Flask, request, abort, Response, json +from confparser import createParser +from flask import Flask, Response, abort, json, request from flask_sock import Sock - from serving import GunicornServing -from confparser import createParser from swagger import setupSwaggerUI -from stt.processing import model, decode, formatAudio +from stt.processing import decode, formatAudio, model from stt.processing.streaming import ws_streaming - app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False app.config["JSON_SORT_KEYS"] = False -logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s: %(message)s', datefmt='%d/%m/%Y %H:%M:%S') +logging.basicConfig( + format="%(asctime)s %(name)s %(levelname)s: %(message)s", datefmt="%d/%m/%Y %H:%M:%S" +) logger = logging.getLogger("__stt-standalone-worker__") # If websocket streaming route is enabled -if os.environ.get('ENABLE_STREAMING', False) in [True, "true", 1]: +if os.environ.get("ENABLE_STREAMING", False) in [True, "true", 1]: logger.info("Init websocket serving ...") sock = Sock(app) logger.info("Streaming is enabled") - @sock.route('/streaming') + 
@sock.route("/streaming") def streaming(ws): ws_streaming(ws, model) -@app.route('/healthcheck', methods=['GET']) + +@app.route("/healthcheck", methods=["GET"]) def healthcheck(): return json.dumps({"healthcheck": "OK"}), 200 -@app.route("/oas_docs", methods=['GET']) + +@app.route("/oas_docs", methods=["GET"]) def oas_docs(): return "Not Implemented", 501 -@app.route('/transcribe', methods=['POST']) + +@app.route("/transcribe", methods=["POST"]) def transcribe(): try: - logger.info('Transcribe request received') + logger.info("Transcribe request received") # get response content type - logger.debug(request.headers.get('accept').lower()) - if request.headers.get('accept').lower() == 'application/json': + logger.debug(request.headers.get("accept").lower()) + if request.headers.get("accept").lower() == "application/json": join_metadata = True - elif request.headers.get('accept').lower() == 'text/plain': + elif request.headers.get("accept").lower() == "text/plain": join_metadata = False else: - raise ValueError('Not accepted header') + raise ValueError("Not accepted header") logger.debug("Metadata: {}".format(join_metadata)) # get input file - if 'file' in request.files.keys(): - file_buffer = request.files['file'].read() + if "file" in request.files.keys(): + file_buffer = request.files["file"].read() audio_data, sampling_rate = formatAudio(file_buffer) start_t = time() - + # Transcription transcription = decode(audio_data, model, sampling_rate, join_metadata) logger.debug("Transcription complete (t={}s)".format(time() - start_t)) - + logger.debug("... Complete") - + else: - raise ValueError('No audio file was uploaded') + raise ValueError("No audio file was uploaded") if join_metadata: - return json.dumps(transcription,ensure_ascii=False) , 200 + return json.dumps(transcription, ensure_ascii=False), 200 else: return transcription["text"], 200 return response, 200 @@ -81,23 +84,26 @@ def transcribe(): return str(error), 400 except Exception as e: logger.error(e) - return 'Server Error: {}'.format(str(e)), 500 + return "Server Error: {}".format(str(e)), 500 + -# Rejected request handlers @app.errorhandler(405) def method_not_allowed(error): - return 'The method is not allowed for the requested URL', 405 + return "The method is not allowed for the requested URL", 405 + @app.errorhandler(404) def page_not_found(error): - return 'The requested URL was not found', 404 + return "The requested URL was not found", 404 + @app.errorhandler(500) def server_error(error): logger.error(error) - return 'Server Error', 500 + return "Server Error", 500 -if __name__ == '__main__': + +if __name__ == "__main__": logger.info("Startup...") parser = createParser() @@ -110,9 +116,15 @@ def server_error(error): logger.debug("Swagger UI set.") except Exception as e: logger.warning("Could not setup swagger: {}".format(str(e))) - - serving = GunicornServing(app, {'bind': '{}:{}'.format("0.0.0.0", args.service_port), - 'workers': args.workers, 'timeout': 3600}) + + serving = GunicornServing( + app, + { + "bind": "{}:{}".format("0.0.0.0", args.service_port), + "workers": args.workers, + "timeout": 3600, + }, + ) logger.info(args) try: serving.run() diff --git a/http_server/serving.py b/http_server/serving.py index 076f34d..d2dd7e8 100644 --- a/http_server/serving.py +++ b/http_server/serving.py @@ -1,17 +1,20 @@ import gunicorn.app.base -class GunicornServing(gunicorn.app.base.BaseApplication): +class GunicornServing(gunicorn.app.base.BaseApplication): def __init__(self, app, options=None): self.options = options or {} 
self.application = app super().__init__() def load_config(self): - config = {key: value for key, value in self.options.items() - if key in self.cfg.settings and value is not None} + config = { + key: value + for key, value in self.options.items() + if key in self.cfg.settings and value is not None + } for key, value in config.items(): self.cfg.set(key.lower(), value) def load(self): - return self.application \ No newline at end of file + return self.application diff --git a/http_server/swagger.py b/http_server/swagger.py index c0af319..fe58685 100644 --- a/http_server/swagger.py +++ b/http_server/swagger.py @@ -1,17 +1,17 @@ import yaml from flask_swagger_ui import get_swaggerui_blueprint + def setupSwaggerUI(app, args): - '''Setup Swagger UI within the app''' - swagger_yml = yaml.load( - open(args.swagger_path, 'r'), Loader=yaml.Loader) + """Setup Swagger UI within the app""" + swagger_yml = yaml.load(open(args.swagger_path, "r"), Loader=yaml.Loader) swaggerui = get_swaggerui_blueprint( # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' args.swagger_prefix + args.swagger_url, args.swagger_path, config={ # Swagger UI config overrides - 'app_name': "LinTO Platform STT", - 'spec': swagger_yml - } + "app_name": "LinTO Platform STT", + "spec": swagger_yml, + }, ) - app.register_blueprint(swaggerui, url_prefix=args.swagger_url) \ No newline at end of file + app.register_blueprint(swaggerui, url_prefix=args.swagger_url) diff --git a/stt/__init__.py b/stt/__init__.py index 8e6dc75..a624077 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -1,5 +1,7 @@ -import os import logging +import os -logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s: %(message)s', datefmt='%d/%m/%Y %H:%M:%S') -logger = logging.getLogger("__stt__") \ No newline at end of file +logging.basicConfig( + format="%(asctime)s %(name)s %(levelname)s: %(message)s", datefmt="%d/%m/%Y %H:%M:%S" +) +logger = logging.getLogger("__stt__") diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index d8c095e..d1a29db 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -5,20 +5,21 @@ from stt import logger from stt.processing.decoding import decode -from stt.processing.utils import load_wave, formatAudio -#from stt.processing.model import loadModel +from stt.processing.utils import formatAudio, load_wave + +# from stt.processing.model import loadModel __all__ = ["model", "logger", "decode", "load_wave", "formatAudio"] # Model locations (should be mounted) -MODEL_PATH="/opt/model" +MODEL_PATH = "/opt/model" # Load ASR models (acoustic model and decoding graph) -logger.info('Loading acoustic model and decoding graph ...') +logger.info("Loading acoustic model and decoding graph ...") start = time() try: model = Model(MODEL_PATH) except Exception as e: raise Exception("Failed to load transcription model: {}".format(str(e))) exit(-1) -logger.info('Acoustic model and decoding graph loaded. (t={}s)'.format(time() - start)) +logger.info("Acoustic model and decoding graph loaded. 
(t={}s)".format(time() - start)) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 3a7d33e..9908ba0 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -3,12 +3,13 @@ from vosk import KaldiRecognizer, Model + def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: bool) -> dict: - ''' Transcribe the audio data using the vosk library with the defined model.''' - result = {'text':'', 'confidence-score': 0.0, 'words':[]} + """Transcribe the audio data using the vosk library with the defined model.""" + result = {"text": "", "confidence-score": 0.0, "words": []} recognizer = KaldiRecognizer(model, sampling_rate) - recognizer.SetMaxAlternatives(0) # Set confidence per words + recognizer.SetMaxAlternatives(0) # Set confidence per words recognizer.SetWords(with_metadata) recognizer.AcceptWaveform(audio_data) @@ -20,9 +21,11 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b decoder_result = json.loads(decoder_result_raw) except Exception: return result - result["text"] = re.sub(" " , "", decoder_result["text"]) + result["text"] = re.sub(" ", "", decoder_result["text"]) if "result" in decoder_result: result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] if len(result["words"]): - result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len(result["words"]) - return result \ No newline at end of file + result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len( + result["words"] + ) + return result diff --git a/stt/processing/streaming.py b/stt/processing/streaming.py index 5e36b8c..36f9eca 100644 --- a/stt/processing/streaming.py +++ b/stt/processing/streaming.py @@ -2,43 +2,44 @@ import re from typing import Union -from websockets.legacy.server import WebSocketServerProtocol from simple_websocket.ws import Server as WSServer from vosk import KaldiRecognizer, Model +from websockets.legacy.server import WebSocketServerProtocol + +from stt import logger -from stt import logger async def wssDecode(ws: WebSocketServerProtocol, model: Model): - """ Async Decode function endpoint """ + """Async Decode function endpoint""" # Wait for config res = await ws.recv() - + # Parse config try: config = json.loads(res)["config"] sample_rate = config["sample_rate"] - except Exception as e : + except Exception as e: logger.error("Failed to read stream configuration") await ws.close(reason="Failed to load configuration") - + # Recognizer - try: + try: recognizer = KaldiRecognizer(model, sample_rate) except Exception as e: logger.error("Failed to load recognizer") await ws.close(reason="Failed to load recognizer") - + # Wait for chunks - while True: + while True: try: # Client data message = await ws.recv() - if message is None or message == "": # Timeout + if message is None or message == "": # Timeout ws.close() except Exception as e: print("Connection closed by client: {}".format(str(e))) break - + # End frame if "eof" in str(message): ret = recognizer.FinalResult() @@ -48,16 +49,17 @@ async def wssDecode(ws: WebSocketServerProtocol, model: Model): # Audio chunk if recognizer.AcceptWaveform(message): - ret = recognizer.Result() # Result seems to not work properly + ret = recognizer.Result() # Result seems to not work properly await ws.send(ret) - + else: ret = recognizer.PartialResult() last_utterance = ret await ws.send(ret) + def ws_streaming(ws: WSServer, model: Model): - """ Sync Decode function endpoint""" + """Sync Decode function endpoint""" # Wait 
for config res = ws.receive(timeout=10) @@ -69,7 +71,7 @@ def ws_streaming(ws: WSServer, model: Model): try: config = json.loads(res)["config"] sample_rate = config["sample_rate"] - except Exception as e : + except Exception as e: logger.error("Failed to read stream configuration") ws.close() @@ -81,11 +83,11 @@ def ws_streaming(ws: WSServer, model: Model): ws.close() # Wait for chunks - while True: + while True: try: # Client data message = ws.receive(timeout=10) - if message is None: # Timeout + if message is None: # Timeout ws.close() except Exception: print("Connection closed by client") @@ -95,13 +97,13 @@ def ws_streaming(ws: WSServer, model: Model): ret = recognizer.FinalResult() ws.send(json.dumps(re.sub(" ", "", ret))) ws.close() - break + break # Audio chunk print("Received chunk") if recognizer.AcceptWaveform(message): ret = recognizer.Result() ws.send(re.sub(" ", "", ret)) - + else: ret = recognizer.PartialResult() - ws.send(re.sub(" ", "", ret)) \ No newline at end of file + ws.send(re.sub(" ", "", ret)) diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 016716d..642f427 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -1,16 +1,17 @@ import io import wavio -from numpy import squeeze, int16 +from numpy import int16, squeeze + def load_wave(file_path): - ''' Formats audio from a wavFile buffer to a bytebuffer''' + """Formats audio from a wavFile buffer to a bytebuffer""" audio = squeeze(wavio.read(file_path).data) return audio.tobytes() def formatAudio(file_buffer): - ''' Formats audio from a wavFile buffer to a numpy array for processing.''' + """Formats audio from a wavFile buffer to a numpy array for processing.""" file_buffer_io = io.BytesIO(file_buffer) file_content = wavio.read(file_buffer_io) # if stereo file, convert to mono by computing the mean over the channels @@ -21,4 +22,4 @@ def formatAudio(file_buffer): data = mean(data, axis=1, dtype=int16) return data.tobytes(), file_content.rate else: - raise Exception("Audio Format not supported.") \ No newline at end of file + raise Exception("Audio Format not supported.") diff --git a/websocket/websocketserver.py b/websocket/websocketserver.py index eb3f9f2..9f1f683 100644 --- a/websocket/websocketserver.py +++ b/websocket/websocketserver.py @@ -6,10 +6,12 @@ from stt.processing import model from stt.processing.streaming import wssDecode + async def _fun_wrapper(ws): """ Wrap wssDecode function to add STT Model reference """ return await wssDecode(ws, model) + async def WSServer(port: int): """ Launch the websocket server """ async with websockets.serve(_fun_wrapper, "0.0.0.0", serving_port): @@ -18,4 +20,3 @@ async def WSServer(port: int): if __name__ == "__main__": serving_port = os.environ.get("STREAMING_PORT", 80) asyncio.run(WSServer(serving_port)) - \ No newline at end of file From 4cf9be25e4a0150fd288a6162e87a5744318aee6 Mon Sep 17 00:00:00 2001 From: HOUPERT Date: Fri, 2 Sep 2022 14:15:33 +0200 Subject: [PATCH 080/172] Add github action for dockerhub description --- .github/workflows/dockerhub-description.yml | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 .github/workflows/dockerhub-description.yml diff --git a/.github/workflows/dockerhub-description.yml b/.github/workflows/dockerhub-description.yml new file mode 100644 index 0000000..0367b21 --- /dev/null +++ b/.github/workflows/dockerhub-description.yml @@ -0,0 +1,20 @@ +name: Update Docker Hub Description +on: + push: + branches: + - master + paths: + - README.md + - 
.github/workflows/dockerhub-description.yml +jobs: + dockerHubDescription: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - name: Docker Hub Description + uses: peter-evans/dockerhub-description@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + repository: lintoai/linto-platform-stt + readme-filepath: ./README.md From 8d8346d26c7f418780b15218dab13b9be8ca120d Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Mon, 12 Sep 2022 09:23:45 +0000 Subject: [PATCH 081/172] 3.3.1: Fixes and style --- README.md | 4 +- RELEASE.md | 5 +++ celery_app/celeryapp.py | 8 ++-- celery_app/tasks.py | 14 +++---- http_server/confparser.py | 5 ++- http_server/ingress.py | 33 ++++++++-------- http_server/swagger.py | 3 +- lin_to_vosk.py | 74 ++++++++++++++++++++++-------------- stt/__init__.py | 3 +- stt/processing/__init__.py | 9 ++--- stt/processing/decoding.py | 6 +-- stt/processing/streaming.py | 24 ++++++------ stt/processing/utils.py | 5 +-- websocket/websocketserver.py | 7 ++-- 14 files changed, 112 insertions(+), 88 deletions(-) diff --git a/README.md b/README.md index aa711f7..7fd1fa0 100644 --- a/README.md +++ b/README.md @@ -111,8 +111,8 @@ You need a message broker up and running at MY_SERVICE_BROKER. ```bash docker run --rm \ --v AM_PATH:/opt/models/AM \ --v LM_PATH:/opt/models/LM \ +-v AM_PATH:/opt/AM \ +-v LM_PATH:/opt/LM \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file .env \ linto-platform-stt:latest diff --git a/RELEASE.md b/RELEASE.md index 0a146d5..2626e10 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,8 @@ +# 3.3.1 +- Fixed lin_to_vosk throwing an error on a already existing container. +- Corrected an error on the README regarding mounting model volumes. +- Code styling (PEP 8) + # 3.3.0 - Added optional streaming route to the http serving mode - Added serving mode: websocket diff --git a/celery_app/celeryapp.py b/celery_app/celeryapp.py index d4a5cb4..e04d73b 100644 --- a/celery_app/celeryapp.py +++ b/celery_app/celeryapp.py @@ -10,8 +10,8 @@ if os.environ.get("BROKER_PASS", False): components = broker_url.split("//") broker_url = f'{components[0]}//:{os.environ.get("BROKER_PASS")}@{components[1]}' -celery.conf.broker_url = "{}/0".format(broker_url) -celery.conf.result_backend = "{}/1".format(broker_url) +celery.conf.broker_url = f"{broker_url}/0" +celery.conf.result_backend = f"{broker_url}/1" celery.conf.update(result_expires=3600, task_acks_late=True, task_track_started=True) # Queues @@ -24,7 +24,5 @@ ) logger.info( - "Celery configured for broker located at {} with service name {}".format( - broker_url, service_name - ) + f"Celery configured for broker located at {broker_url} with service name {service_name}" ) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index f2a2b08..ce2ca4d 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -10,21 +10,21 @@ @celery.task(name="transcribe_task") def transcribe_task(file_name: str, with_metadata: bool): """transcribe_task""" - logger.info("Received transcription task for {}".format(file_name)) + logger.info(f"Received transcription task for {file_name}") # Load wave file_path = os.path.join("/opt/audio", file_name) try: file_content = load_wave(file_path) - except Exception as e: - logger.error("Failed to load ressource: {}".format(e)) - raise Exception("Could not open ressource {}".format(file_path)) + except Exception as err: + logger.error(f"Failed to load ressource: {repr(err)}") + raise Exception(f"Could not open ressource {file_path}") from err # 
Decode try: result = decode(file_content, model, 16000, with_metadata) - except Exception as e: - logger.error("Failed to decode: {}".format(e)) - raise Exception("Failed to decode {}".format(file_path)) + except Exception as err: + logger.error(f"Failed to decode: {repr(err)}") + raise Exception(f"Failed to decode {file_path}") from err return result diff --git a/http_server/confparser.py b/http_server/confparser.py index f676e1a..2396d71 100644 --- a/http_server/confparser.py +++ b/http_server/confparser.py @@ -19,7 +19,10 @@ def createParser() -> argparse.ArgumentParser: parser.add_argument("--am_path", type=str, help="Acoustic Model Path", default="/opt/models/AM") parser.add_argument("--lm_path", type=str, help="Decoding graph path", default="/opt/models/LM") parser.add_argument( - "--config_path", type=str, help="Configuration files path", default="/opt/config" + "--config_path", + type=str, + help="Configuration files path", + default="/opt/config", ) # GUNICORN diff --git a/http_server/ingress.py b/http_server/ingress.py index ffe21ce..5a9c661 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -19,7 +19,8 @@ app.config["JSON_SORT_KEYS"] = False logging.basicConfig( - format="%(asctime)s %(name)s %(levelname)s: %(message)s", datefmt="%d/%m/%Y %H:%M:%S" + format="%(asctime)s %(name)s %(levelname)s: %(message)s", + datefmt="%d/%m/%Y %H:%M:%S", ) logger = logging.getLogger("__stt-standalone-worker__") @@ -30,8 +31,8 @@ logger.info("Streaming is enabled") @sock.route("/streaming") - def streaming(ws): - ws_streaming(ws, model) + def streaming(web_socket): + ws_streaming(web_socket, model) @app.route("/healthcheck", methods=["GET"]) @@ -76,24 +77,22 @@ def transcribe(): if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 - else: - return transcription["text"], 200 - return response, 200 + return transcription["text"], 200 except ValueError as error: return str(error), 400 - except Exception as e: - logger.error(e) - return "Server Error: {}".format(str(e)), 500 + except Exception as error: + logger.error(error) + return "Server Error: {}".format(str(error)), 500 @app.errorhandler(405) -def method_not_allowed(error): +def method_not_allowed(_): return "The method is not allowed for the requested URL", 405 @app.errorhandler(404) -def page_not_found(error): +def page_not_found(_): return "The requested URL was not found", 404 @@ -114,13 +113,13 @@ def server_error(error): if args.swagger_path is not None: setupSwaggerUI(app, args) logger.debug("Swagger UI set.") - except Exception as e: - logger.warning("Could not setup swagger: {}".format(str(e))) + except Exception as err: + logger.warning("Could not setup swagger: {}".format(str(err))) serving = GunicornServing( app, { - "bind": "{}:{}".format("0.0.0.0", args.service_port), + "bind": f"0.0.0.0:{args.service_port}", "workers": args.workers, "timeout": 3600, }, @@ -130,7 +129,7 @@ def server_error(error): serving.run() except KeyboardInterrupt: logger.info("Process interrupted by user") - except Exception as e: - logger.error(str(e)) + except Exception as err: + logger.error(str(err)) logger.critical("Service is shut down (Error)") - exit(e) + exit(err) diff --git a/http_server/swagger.py b/http_server/swagger.py index fe58685..a9b93d0 100644 --- a/http_server/swagger.py +++ b/http_server/swagger.py @@ -4,7 +4,8 @@ def setupSwaggerUI(app, args): """Setup Swagger UI within the app""" - swagger_yml = yaml.load(open(args.swagger_path, "r"), Loader=yaml.Loader) + with open(args.swagger_path, "r") as 
yml_file: + swagger_yml = yaml.load(yml_file, Loader=yaml.Loader) swaggerui = get_swaggerui_blueprint( # Swagger UI static files will be mapped to '{SWAGGER_URL}/dist/' args.swagger_prefix + args.swagger_url, diff --git a/lin_to_vosk.py b/lin_to_vosk.py index 9c8d513..62025a0 100755 --- a/lin_to_vosk.py +++ b/lin_to_vosk.py @@ -1,35 +1,42 @@ #!/usr/bin/env python3 +import configparser import os import re -import configparser -LANGUAGE_MODEL_PATH="/opt/LM" -ACOUSTIC_MODEL_PATH="/opt/AM" -TARGET_PATH="/opt/model" +LANGUAGE_MODEL_PATH = "/opt/LM" +ACOUSTIC_MODEL_PATH = "/opt/AM" +TARGET_PATH = "/opt/model" + def lin_to_vosk_format(am_path: str, lm_path: str, target_path: str): + if os.path.exists(target_path): + print( + "Target model folder already exist, assuming model has already been converted. Skipping..." + ) + return os.mkdir(target_path) # Create directory structure print("Create directory structure") for subfolder in ["am", "conf", "graph", "ivector", "rescore"]: os.mkdir(os.path.join(target_path, subfolder)) - + # Populate am directory # final.mdl print("Populate am directory") for f in ["final.mdl"]: print(f) - os.symlink(os.path.join(am_path, f), - os.path.join(target_path, "am", f)) + os.symlink(os.path.join(am_path, f), os.path.join(target_path, "am", f)) # Populate conf directory print("Populate conf directory") print("mfcc.conf") - os.symlink(os.path.join(am_path, "conf", "mfcc.conf"), - os.path.join(target_path, "conf", "mfcc.conf")) - + os.symlink( + os.path.join(am_path, "conf", "mfcc.conf"), + os.path.join(target_path, "conf", "mfcc.conf"), + ) + print("model.conf") - with open(os.path.join(target_path, "conf", "model.conf"), 'w') as f: + with open(os.path.join(target_path, "conf", "model.conf"), "w") as f: f.write("--min-active=200\n") f.write("--max-active=7000\n") f.write("--beam=13.0\n") @@ -45,38 +52,49 @@ def lin_to_vosk_format(am_path: str, lm_path: str, target_path: str): print("Populate graph directory") for f in ["HCLG.fst", "words.txt"]: print(f) - os.symlink(os.path.join(lm_path, f), - os.path.join(target_path, "graph", f)) + os.symlink(os.path.join(lm_path, f), os.path.join(target_path, "graph", f)) print("phones.txt") - os.symlink(os.path.join(am_path, "phones.txt"), - os.path.join(target_path, "graph", "phones.txt")) - + os.symlink( + os.path.join(am_path, "phones.txt"), + os.path.join(target_path, "graph", "phones.txt"), + ) + # Populate graph/phones directory os.mkdir(os.path.join(target_path, "graph", "phones")) - + print("Populate graph/phones directory") - + print("word_boundary.int") - os.symlink(os.path.join(lm_path, "word_boundary.int"), - os.path.join(target_path, "graph", "phones", "word_boundary.int")) - + os.symlink( + os.path.join(lm_path, "word_boundary.int"), + os.path.join(target_path, "graph", "phones", "word_boundary.int"), + ) + # Populate ivector directory print("Populate graph/phones directory") - for f in ["final.dubm", "final.ie", "final.mat", "global_cmvn.stats", "online_cmvn.conf"]: + for f in [ + "final.dubm", + "final.ie", + "final.mat", + "global_cmvn.stats", + "online_cmvn.conf", + ]: print(f) - os.symlink(os.path.join(am_path, "ivector_extractor", f), - os.path.join(target_path, "ivector", f)) - + os.symlink( + os.path.join(am_path, "ivector_extractor", f), + os.path.join(target_path, "ivector", f), + ) + print("splice.conf") - with open(os.path.join(am_path, "ivector_extractor", "splice_opts"), 'r') as in_f: - with open(os.path.join(target_path, "ivector", "splice.conf"), 'w') as out_f: + with open(os.path.join(am_path, 
"ivector_extractor", "splice_opts"), "r") as in_f: + with open(os.path.join(target_path, "ivector", "splice.conf"), "w") as out_f: for param in in_f.read().split(" "): out_f.write(f"{param}\n") # Populate rescore # ? + if __name__ == "__main__": lin_to_vosk_format(ACOUSTIC_MODEL_PATH, LANGUAGE_MODEL_PATH, TARGET_PATH) - diff --git a/stt/__init__.py b/stt/__init__.py index a624077..73c3a1a 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -2,6 +2,7 @@ import os logging.basicConfig( - format="%(asctime)s %(name)s %(levelname)s: %(message)s", datefmt="%d/%m/%Y %H:%M:%S" + format="%(asctime)s %(name)s %(levelname)s: %(message)s", + datefmt="%d/%m/%Y %H:%M:%S", ) logger = logging.getLogger("__stt__") diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index d1a29db..2a3eca5 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,4 +1,5 @@ import os +import sys from time import time from vosk import Model @@ -7,8 +8,6 @@ from stt.processing.decoding import decode from stt.processing.utils import formatAudio, load_wave -# from stt.processing.model import loadModel - __all__ = ["model", "logger", "decode", "load_wave", "formatAudio"] # Model locations (should be mounted) @@ -19,7 +18,7 @@ start = time() try: model = Model(MODEL_PATH) -except Exception as e: - raise Exception("Failed to load transcription model: {}".format(str(e))) - exit(-1) +except Exception as err: + raise Exception("Failed to load transcription model: {}".format(str(err))) from err + sys.exit(-1) logger.info("Acoustic model and decoding graph loaded. (t={}s)".format(time() - start)) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 9908ba0..2e1fb7c 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -15,8 +15,8 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b recognizer.AcceptWaveform(audio_data) try: decoder_result_raw = recognizer.FinalResult() - except Exception as e: - raise Exception("Failed to decode") + except Exception as err: + raise Exception("Failed to decode") from err try: decoder_result = json.loads(decoder_result_raw) except Exception: @@ -24,7 +24,7 @@ def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: b result["text"] = re.sub(" ", "", decoder_result["text"]) if "result" in decoder_result: result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] - if len(result["words"]): + if result["words"]: result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len( result["words"] ) diff --git a/stt/processing/streaming.py b/stt/processing/streaming.py index 36f9eca..28274b8 100644 --- a/stt/processing/streaming.py +++ b/stt/processing/streaming.py @@ -58,10 +58,10 @@ async def wssDecode(ws: WebSocketServerProtocol, model: Model): await ws.send(ret) -def ws_streaming(ws: WSServer, model: Model): +def ws_streaming(websocket_server: WSServer, model: Model): """Sync Decode function endpoint""" # Wait for config - res = ws.receive(timeout=10) + res = websocket_server.receive(timeout=10) # Timeout if res is None: @@ -71,39 +71,39 @@ def ws_streaming(ws: WSServer, model: Model): try: config = json.loads(res)["config"] sample_rate = config["sample_rate"] - except Exception as e: + except Exception: logger.error("Failed to read stream configuration") - ws.close() + websocket_server.close() # Recognizer try: recognizer = KaldiRecognizer(model, sample_rate) - except Exception as e: + except Exception: logger.error("Failed to load recognizer") - 
ws.close() + websocket_server.close() # Wait for chunks while True: try: # Client data - message = ws.receive(timeout=10) + message = websocket_server.receive(timeout=10) if message is None: # Timeout - ws.close() + websocket_server.close() except Exception: print("Connection closed by client") break # End frame if "eof" in str(message): ret = recognizer.FinalResult() - ws.send(json.dumps(re.sub(" ", "", ret))) - ws.close() + websocket_server.send(json.dumps(re.sub(" ", "", ret))) + websocket_server.close() break # Audio chunk print("Received chunk") if recognizer.AcceptWaveform(message): ret = recognizer.Result() - ws.send(re.sub(" ", "", ret)) + websocket_server.send(re.sub(" ", "", ret)) else: ret = recognizer.PartialResult() - ws.send(re.sub(" ", "", ret)) + websocket_server.send(re.sub(" ", "", ret)) diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 642f427..d003fc8 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -1,7 +1,7 @@ import io import wavio -from numpy import int16, squeeze +from numpy import int16, squeeze, mean def load_wave(file_path): @@ -21,5 +21,4 @@ def formatAudio(file_buffer): elif file_content.data.shape[1] == 2: data = mean(data, axis=1, dtype=int16) return data.tobytes(), file_content.rate - else: - raise Exception("Audio Format not supported.") + raise Exception("Audio Format not supported.") diff --git a/websocket/websocketserver.py b/websocket/websocketserver.py index 9f1f683..81e035b 100644 --- a/websocket/websocketserver.py +++ b/websocket/websocketserver.py @@ -1,5 +1,5 @@ -import os import asyncio +import os import websockets @@ -8,15 +8,16 @@ async def _fun_wrapper(ws): - """ Wrap wssDecode function to add STT Model reference """ + """Wrap wssDecode function to add STT Model reference""" return await wssDecode(ws, model) async def WSServer(port: int): - """ Launch the websocket server """ + """Launch the websocket server""" async with websockets.serve(_fun_wrapper, "0.0.0.0", serving_port): await asyncio.Future() + if __name__ == "__main__": serving_port = os.environ.get("STREAMING_PORT", 80) asyncio.run(WSServer(serving_port)) From 7394acc0b5f407620d68fc20ba77787fbd82509f Mon Sep 17 00:00:00 2001 From: rbaraglia Date: Mon, 3 Oct 2022 14:34:41 +0200 Subject: [PATCH 082/172] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index aa711f7..33c6a34 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ cp .envdefault .env ### Serving mode ![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png) -STT can be use three ways: +STT can be used three ways: * Through an [HTTP API](#http-server) using the **http**'s mode. * Through a [message broker](#micro-service-within-linto-platform-stack) using the **task**'s mode. * Through a [websocket server](#websocket-server) **websocket**'s mode. From 45c59cf831e8e194cb2bcb30f9369b8d5c60d813 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Thu, 27 Oct 2022 08:47:34 +0000 Subject: [PATCH 083/172] Update README.md --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 7fd1fa0..f76955c 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| -| SERVING_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | +| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | | MODEL_TYPE | Type of STT model used. 
| lin\|vosk | | ENABLE_STREAMING | Using http serving mode, enable the /streaming websocket route | true\|false | | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | @@ -72,12 +72,12 @@ STT can be use three ways: Mode is specified using the .env value or environment variable ```SERVING_MODE```. ```bash -SERVING_MODE=http +SERVICE_MODE=http ``` ### HTTP Server The HTTP serving mode deploys a HTTP server and a swagger-ui to allow transcription request on a dedicated route. -The SERVING_MODE value in the .env should be set to ```http```. +The SERVICE_MODE value in the .env should be set to ```http```. ```bash docker run --rm \ @@ -101,7 +101,7 @@ This will run a container providing an [HTTP API](#http-api) binded on the host ### Micro-service within LinTO-Platform stack The HTTP serving mode connect a celery worker to a message broker. -The SERVING_MODE value in the .env should be set to ```task```. +The SERVICE_MODE value in the .env should be set to ```task```. >LinTO-platform-stt can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. >LinTO-platform-stt in task mode is not intended to be launch manually. @@ -130,7 +130,7 @@ linto-platform-stt:latest ### Websocket Server Websocket server's mode deploy a streaming transcription service only. -The SERVING_MODE value in the .env should be set to ```websocket```. +The SERVICE_MODE value in the .env should be set to ```websocket```. Usage is the same as the [http streaming API](#/streaming) From 9abd1744362947adcfc5f6e98ccc8e1021ac9448 Mon Sep 17 00:00:00 2001 From: Rudy Baraglia Date: Thu, 27 Oct 2022 09:32:26 +0000 Subject: [PATCH 084/172] Fix broken models link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f76955c..06e2ef0 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ LinTO-Platform-STT accepts two kinds of models: * LinTO Acoustic and Languages models. * Vosk models. -We provide home-cured models (v2) on [dl.linto.ai](https://doc.linto.ai/#/services/linstt_download). +We provide home-cured models (v2) on [dl.linto.ai](https://doc.linto.ai/docs/developpers/apis/ASR/models). Or you can also use Vosk models available [here](https://alphacephei.com/vosk/models). ### Docker From 88999116f95a0cfd4f4408f788a31aa970566b45 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 21 Dec 2022 10:41:55 +0100 Subject: [PATCH 085/172] Fix stereo to mono conversion --- RELEASE.md | 3 +++ stt/processing/utils.py | 4 +++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/RELEASE.md b/RELEASE.md index 2626e10..9966250 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 3.3.2 +- Fixed use of stereo audio in http serving mode + # 3.3.1 - Fixed lin_to_vosk throwing an error on a already existing container. - Corrected an error on the README regarding mounting model volumes. 
diff --git a/stt/processing/utils.py b/stt/processing/utils.py index d003fc8..b81cc5d 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -19,6 +19,8 @@ def formatAudio(file_buffer): if file_content.data.shape[1] == 1: data = squeeze(file_content.data) elif file_content.data.shape[1] == 2: - data = mean(data, axis=1, dtype=int16) + data = mean(file_content.data, axis=1, dtype=int16) + else: + raise Exception("Audio Format not supported.") return data.tobytes(), file_content.rate raise Exception("Audio Format not supported.") From d929283f51b4a101ead26e9de462463dbc9eec71 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 13:15:07 +0100 Subject: [PATCH 086/172] First version of STT platform with OpenAI Whisper (and SpeechBrain for word alignment) --- .envdefault | 9 +- Dockerfile | 44 +--- Jenkinsfile | 21 ++ README.md | 96 ++++----- RELEASE.md | 3 + celery_app/tasks.py | 8 +- docker-entrypoint.sh | 19 +- http_server/ingress.py | 17 +- lin_to_vosk.py | 100 --------- load_alignment_model.py | 79 +++++++ requirements.txt | 8 +- stt/processing/__init__.py | 46 +++-- stt/processing/alignment_model.py | 66 ++++++ stt/processing/decoding.py | 331 +++++++++++++++++++++++++++--- stt/processing/load_model.py | 62 ++++++ stt/processing/streaming.py | 109 ---------- stt/processing/utils.py | 48 +++-- stt/processing/word_alignment.py | 169 +++++++++++++++ websocket/websocketserver.py | 23 --- 19 files changed, 844 insertions(+), 414 deletions(-) delete mode 100755 lin_to_vosk.py create mode 100644 load_alignment_model.py create mode 100644 stt/processing/alignment_model.py create mode 100644 stt/processing/load_model.py delete mode 100644 stt/processing/streaming.py create mode 100644 stt/processing/word_alignment.py delete mode 100644 websocket/websocketserver.py diff --git a/.envdefault b/.envdefault index 33a394c..61a57bd 100644 --- a/.envdefault +++ b/.envdefault @@ -1,17 +1,12 @@ # SERVING PARAMETERS SERVICE_MODE=http -MODEL_TYPE=lin - -# HTTP PARAMETERS -ENABLE_STREAMING=true +MODEL_TYPE=/opt/model.pt +LANGUAGE=fr # TASK PARAMETERS SERVICE_NAME=stt SERVICES_BROKER=redis://192.168.0.1:6379 BROKER_PASS=password -# WEBSOCKET PARAMETERS -STREAMING_PORT=80 - # CONCURRENCY CONCURRENCY=2 \ No newline at end of file diff --git a/Dockerfile b/Dockerfile index bdf65c0..4761b3d 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ FROM python:3.9 -LABEL maintainer="irebai@linagora.com, rbaraglia@linagora.com" +LABEL maintainer="jlouradour@linagora.com" ARG KALDI_MKL @@ -11,6 +11,7 @@ RUN apt-get update && \ unzip \ xz-utils \ sox \ + ffmpeg \ g++ \ make \ cmake \ @@ -20,40 +21,20 @@ RUN apt-get update && \ autoconf \ libtool \ pkg-config \ - ca-certificates \ - && rm -rf /var/lib/apt/lists/* + ca-certificates -# Build vosk-kaldi -RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi /opt/kaldi \ - && cd /opt/kaldi/tools \ - && sed -i 's:status=0:exit 0:g' extras/check_dependencies.sh \ - && sed -i 's:--enable-ngram-fsts:--enable-ngram-fsts --disable-bin:g' Makefile \ - && make -j $(nproc) openfst cub \ - && if [ "x$KALDI_MKL" != "x1" ] ; then \ - extras/install_openblas_clapack.sh; \ - else \ - extras/install_mkl.sh; \ - fi \ - && cd /opt/kaldi/src \ - && if [ "x$KALDI_MKL" != "x1" ] ; then \ - ./configure --mathlib=OPENBLAS_CLAPACK --shared; \ - else \ - ./configure --mathlib=MKL --shared; \ - fi \ - && sed -i 's:-msse -msse2:-msse -msse2:g' kaldi.mk \ - && sed -i 's: -O1 : -O3 :g' kaldi.mk \ - && make -j $(nproc) online2 lm rnnlm +RUN rm -rf 
/var/lib/apt/lists/* # Install python dependencies COPY requirements.txt ./ -RUN pip install --no-cache-dir -r requirements.txt +RUN pip install --force-reinstall --no-cache-dir -r requirements.txt -# Install Custom Vosk API -RUN git clone --depth 1 https://github.com/alphacep/vosk-api /opt/vosk-api && cd /opt/vosk-api/python && \ - cd /opt/vosk-api/src \ - && KALDI_MKL=$KALDI_MKL KALDI_ROOT=/opt/kaldi make -j $(nproc) \ - && cd /opt/vosk-api/python \ - && python3 ./setup.py install +# Download alignment model +COPY load_alignment_model.py ./ +RUN python3 load_alignment_model.py + +# Cleaning +RUN rm requirements.txt load_alignment_model.py WORKDIR /usr/src/app @@ -63,9 +44,6 @@ COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ -COPY lin_to_vosk.py /usr/src/app/lin_to_vosk.py - -RUN mkdir -p /var/log/supervisor/ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" diff --git a/Jenkinsfile b/Jenkinsfile index 95e42b0..572c1c5 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -47,5 +47,26 @@ pipeline { } } } + + // stage('Docker build for whisper branch'){ + // when{ + // branch 'feature/whisper' + // } + // steps { + // echo 'Publishing whisper' + // script { + // image = docker.build(env.DOCKER_HUB_REPO) + // VERSION = sh( + // returnStdout: true, + // script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + // ).trim() + + // docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + // image.push("${VERSION}") + // image.push('whisper') + // } + // } + // } + // } }// end stages } \ No newline at end of file diff --git a/README.md b/README.md index ec70060..50f03a8 100644 --- a/README.md +++ b/README.md @@ -7,17 +7,26 @@ LinTO-platform-stt can either be used as a standalone transcription service or d ### Hardware To run the transcription models you'll need: -* At least 7Go of disk space to build the docker image. +* At least 8Go of disk space to build the docker image. * Up to 7GB of RAM depending on the model used. * One CPU per worker. Inference time scales on CPU performances. ### Model -LinTO-Platform-STT accepts two kinds of models: -* LinTO Acoustic and Languages models. -* Vosk models. - -We provide home-cured models (v2) on [dl.linto.ai](https://doc.linto.ai/docs/developpers/apis/ASR/models). -Or you can also use Vosk models available [here](https://alphacephei.com/vosk/models). +LinTO-Platform-STT accepts one Whisper models in the PyTorch format. 
+ +You can download mutli-lingual models with the following links: +* tiny: "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt +* base: https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt +* small: https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt +* medium: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt +* large-v1: https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt +* large-v2: https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt + +Models specialized for English can also be found: +* tiny.en: "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt +* base.en: https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt +* small.en: https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt +* medium.en: https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt ### Docker The transcription service requires docker up and running. @@ -39,11 +48,14 @@ or ```bash docker pull lintoai/linto-platform-stt -``` +``` with the following links **2- Download the models** -Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL. +Have the Whisper model file ready at ASR_PATH. + +You can downloaded with the links mentioned above, if you don't have already a Whisper model. +If you already used Whisper in the past, you may have models in ~/.cache/whisper. **3- Fill the .env** @@ -54,12 +66,10 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| | SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | -| MODEL_TYPE | Type of STT model used. | lin\|vosk | -| ENABLE_STREAMING | Using http serving mode, enable the /streaming websocket route | true\|false | +| MODEL_TYPE | Path to the model or type of model used. | ASR_PATH\|small\|medium\|large-v1\|... | | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | | SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | | BROKER_PASS | Using the task mode, broker password | my-password | -| STREAMING_PORT | Using the websocket mode, the listening port for ingoing WS connexions. | 80 | | CONCURRENCY | Maximum number of parallel requests | >1 | ### Serving mode @@ -82,8 +92,7 @@ The SERVICE_MODE value in the .env should be set to ```http```. 
```bash docker run --rm \ -p HOST_SERVING_PORT:80 \ --v AM_PATH:/opt/AM \ --v LM_PATH:/opt/LM \ +-v ASR_PATH:/opt/model.pt \ --env-file .env \ linto-platform-stt:latest ``` @@ -94,9 +103,7 @@ This will run a container providing an [HTTP API](#http-api) binded on the host | Variables | Description | Example | |:-|:-|:-| | HOST_SERVING_PORT | Host serving port | 80 | -| AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 | -| LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 | -| MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model | +| ASR_PATH | (Optional) Path to the Whisper model on the host machine to /opt/model.pt | /my/path/to/models/medium.pt | ### Micro-service within LinTO-Platform stack The HTTP serving mode connect a celery worker to a message broker. @@ -111,8 +118,7 @@ You need a message broker up and running at MY_SERVICE_BROKER. ```bash docker run --rm \ --v AM_PATH:/opt/AM \ --v LM_PATH:/opt/LM \ +-v ASR_PATH:/opt/model.pt \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file .env \ linto-platform-stt:latest @@ -121,19 +127,10 @@ linto-platform-stt:latest **Parameters:** | Variables | Description | Example | |:-|:-|:-| -| AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 | -| LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 | -| MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model | +| ASR_PATH | (Optional) Path to the Whisper model on the host machine to /opt/model.pt | /my/path/to/models/medium.pt | | SHARED_AUDIO_FOLDER | Shared audio folder mounted to /opt/audio | /my/path/to/models/vosk-model | -### Websocket Server -Websocket server's mode deploy a streaming transcription service only. - -The SERVICE_MODE value in the .env should be set to ```websocket```. - -Usage is the same as the [http streaming API](#/streaming) - ## Usages ### HTTP API #### /healthcheck @@ -153,27 +150,20 @@ Transcription API Return the transcripted text using "text/plain" or a json object when using "application/json" structure as followed: ```json { - "text" : "This is the transcription", - "words" : [ - {"word":"This", "start": 0.123, "end": 0.453, "conf": 0.9}, - ... - ] - "confidence-score": 0.879 + "text" : "This is the transcription as text", + "words": [ + { + "word" : "This", + "start": 0.0, + "end": 0.124, + "conf": 0.82341 + }, + ... + ], + "confidence-score": 0.879 } ``` -#### /streaming -The /streaming route is accessible if the ENABLE_STREAMING environment variable is set to true. - -The route accepts websocket connexions. Exchanges are structured as followed: -1. Client send a json {"config": {"sample_rate":16000}}. -2. Client send audio chunk (go to 3- ) or {"eof" : 1} (go to 5-). -3. Server send either a partial result {"partial" : "this is a "} or a final result {"text": "this is a transcription"}. -4. Back to 2- -5. Server send a final result and close the connexion. - -> Connexion will be closed and the worker will be freed if no chunk are received for 10s. - #### /docs The /docs route offers a OpenAPI/swagger interface. 
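For instance, assuming the service is bound to host port 8080 and `audio.wav` is a local WAV file (both are placeholders), a transcription request can be sent to the /transcribe route with curl:

```bash
curl -X POST "http://localhost:8080/transcribe" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav;type=audio/wav"
```

Using `accept: text/plain` instead returns the transcription as raw text, without word timestamps or confidence scores.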
@@ -189,17 +179,17 @@ STT-Worker accepts requests with the following arguments: On a successfull transcription the returned object is a json object structured as follow: ```json { - "text" : "this is the transcription as text", + "text" : "This is the transcription as text", "words": [ { - "word" : "this", + "word" : "This", "start": 0.0, "end": 0.124, - "conf": 1.0 + "conf": 0.82341 }, ... ], - "confidence-score": "" + "confidence-score": 0.879 } ``` @@ -220,5 +210,5 @@ This project is developped under the AGPLv3 License (see LICENSE). ## Acknowlegment. -* [Vosk, speech recognition toolkit](https://alphacephei.com/vosk/). -* [Kaldi Speech Recognition Toolkit](https://github.com/kaldi-asr/kaldi) +* [OpenAI Whisper](https://github.com/openai/whisper) +* [SpeechBrain](https://github.com/speechbrain/speechbrain). diff --git a/RELEASE.md b/RELEASE.md index 2626e10..a569376 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.0 +- Integration of Whisper + # 3.3.1 - Fixed lin_to_vosk throwing an error on a already existing container. - Corrected an error on the README regarding mounting model volumes. diff --git a/celery_app/tasks.py b/celery_app/tasks.py index ce2ca4d..3b7251f 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -3,8 +3,8 @@ from celery_app.celeryapp import celery from stt import logger -from stt.processing import decode, model -from stt.processing.utils import load_wave +from stt.processing import decode, model, alignment_model +from stt.processing.utils import load_audiofile @celery.task(name="transcribe_task") @@ -15,14 +15,14 @@ def transcribe_task(file_name: str, with_metadata: bool): # Load wave file_path = os.path.join("/opt/audio", file_name) try: - file_content = load_wave(file_path) + file_content = load_audiofile(file_path) except Exception as err: logger.error(f"Failed to load ressource: {repr(err)}") raise Exception(f"Could not open ressource {file_path}") from err # Decode try: - result = decode(file_content, model, 16000, with_metadata) + result = decode(file_content, model, alignment_model, with_metadata) except Exception as err: logger.error(f"Failed to decode: {repr(err)}") raise Exception(f"Failed to decode {file_path}") from err diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 212b145..4d67cca 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -7,21 +7,10 @@ echo "RUNNING STT" echo "Checking model format ..." if [ -z "$MODEL_TYPE" ] then - echo "Model type not specified, expecting Vosk Model" - export MODEL_TYPE=vosk + echo "Model type not specified, choosing Whisper medium model" + export MODEL_TYPE=medium fi -if [ "$MODEL_TYPE" = "vosk" ] -then - echo "Using Vosk format's model" - -elif [ "$MODEL_TYPE" = "lin" ] -then - echo "Processing model ... " - ./lin_to_vosk.py -else - echo "Unknown model type $MODEL_TYPE. 
Assuming vosk model" -fi # Launch parameters, environement variables and dependencies check if [ -z "$SERVICE_MODE" ] then @@ -43,10 +32,6 @@ else echo "RUNNING STT CELERY WORKER" celery --app=celery_app.celeryapp worker -Ofair --queues=${SERVICE_NAME} -c ${CONCURRENCY} -n ${SERVICE_NAME}_worker@%h - elif [ "$SERVICE_MODE" == "websocket" ] - then - echo "Running Websocket server on port ${STREAMING_PORT:=80}" - python websocket/websocketserver.py else echo "ERROR: Wrong serving command: $1" exit -1 diff --git a/http_server/ingress.py b/http_server/ingress.py index 5a9c661..6ccd090 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -11,8 +11,7 @@ from serving import GunicornServing from swagger import setupSwaggerUI -from stt.processing import decode, formatAudio, model -from stt.processing.streaming import ws_streaming +from stt.processing import decode, load_wave_buffer, model, alignment_model app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False @@ -24,16 +23,6 @@ ) logger = logging.getLogger("__stt-standalone-worker__") -# If websocket streaming route is enabled -if os.environ.get("ENABLE_STREAMING", False) in [True, "true", 1]: - logger.info("Init websocket serving ...") - sock = Sock(app) - logger.info("Streaming is enabled") - - @sock.route("/streaming") - def streaming(web_socket): - ws_streaming(web_socket, model) - @app.route("/healthcheck", methods=["GET"]) def healthcheck(): @@ -63,11 +52,11 @@ def transcribe(): # get input file if "file" in request.files.keys(): file_buffer = request.files["file"].read() - audio_data, sampling_rate = formatAudio(file_buffer) + audio_data = load_wave_buffer(file_buffer) start_t = time() # Transcription - transcription = decode(audio_data, model, sampling_rate, join_metadata) + transcription = decode(audio_data, model, alignment_model, join_metadata) logger.debug("Transcription complete (t={}s)".format(time() - start_t)) logger.debug("... Complete") diff --git a/lin_to_vosk.py b/lin_to_vosk.py deleted file mode 100755 index 62025a0..0000000 --- a/lin_to_vosk.py +++ /dev/null @@ -1,100 +0,0 @@ -#!/usr/bin/env python3 -import configparser -import os -import re - -LANGUAGE_MODEL_PATH = "/opt/LM" -ACOUSTIC_MODEL_PATH = "/opt/AM" -TARGET_PATH = "/opt/model" - - -def lin_to_vosk_format(am_path: str, lm_path: str, target_path: str): - if os.path.exists(target_path): - print( - "Target model folder already exist, assuming model has already been converted. Skipping..." 
- ) - return - os.mkdir(target_path) - # Create directory structure - print("Create directory structure") - for subfolder in ["am", "conf", "graph", "ivector", "rescore"]: - os.mkdir(os.path.join(target_path, subfolder)) - - # Populate am directory - # final.mdl - print("Populate am directory") - for f in ["final.mdl"]: - print(f) - os.symlink(os.path.join(am_path, f), os.path.join(target_path, "am", f)) - - # Populate conf directory - print("Populate conf directory") - print("mfcc.conf") - os.symlink( - os.path.join(am_path, "conf", "mfcc.conf"), - os.path.join(target_path, "conf", "mfcc.conf"), - ) - - print("model.conf") - with open(os.path.join(target_path, "conf", "model.conf"), "w") as f: - f.write("--min-active=200\n") - f.write("--max-active=7000\n") - f.write("--beam=13.0\n") - f.write("--lattice-beam=6.0\n") - f.write("--acoustic-scale=1.0\n") - f.write("--frame-subsampling-factor=3\n") - f.write("--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10\n") - f.write("--endpoint.rule2.min-trailing-silence=0.5\n") - f.write("--endpoint.rule3.min-trailing-silence=1.0\n") - f.write("--endpoint.rule4.min-trailing-silence=2.0\n") - - # Populate graph directory - print("Populate graph directory") - for f in ["HCLG.fst", "words.txt"]: - print(f) - os.symlink(os.path.join(lm_path, f), os.path.join(target_path, "graph", f)) - - print("phones.txt") - os.symlink( - os.path.join(am_path, "phones.txt"), - os.path.join(target_path, "graph", "phones.txt"), - ) - - # Populate graph/phones directory - os.mkdir(os.path.join(target_path, "graph", "phones")) - - print("Populate graph/phones directory") - - print("word_boundary.int") - os.symlink( - os.path.join(lm_path, "word_boundary.int"), - os.path.join(target_path, "graph", "phones", "word_boundary.int"), - ) - - # Populate ivector directory - print("Populate graph/phones directory") - for f in [ - "final.dubm", - "final.ie", - "final.mat", - "global_cmvn.stats", - "online_cmvn.conf", - ]: - print(f) - os.symlink( - os.path.join(am_path, "ivector_extractor", f), - os.path.join(target_path, "ivector", f), - ) - - print("splice.conf") - with open(os.path.join(am_path, "ivector_extractor", "splice_opts"), "r") as in_f: - with open(os.path.join(target_path, "ivector", "splice.conf"), "w") as out_f: - for param in in_f.read().split(" "): - out_f.write(f"{param}\n") - - # Populate rescore - # ? 
- - -if __name__ == "__main__": - lin_to_vosk_format(ACOUSTIC_MODEL_PATH, LANGUAGE_MODEL_PATH, TARGET_PATH) diff --git a/load_alignment_model.py b/load_alignment_model.py new file mode 100644 index 0000000..0cf6087 --- /dev/null +++ b/load_alignment_model.py @@ -0,0 +1,79 @@ +import os +import urllib.request +import zipfile + +import huggingface_hub +import speechbrain as sb +import requests + + +def load_alignment_model(name, download_root = "/opt"): + if name.startswith("linSTT"): + destdir = os.path.join(download_root, name) + if not os.path.exists(destdir): + # Download model + url = f"https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/{name}.zip" + destzip = destdir+".zip" + if not os.path.exists(destzip): + print("Downloading", url, "into", destdir) + os.makedirs(download_root, exist_ok=True) + urllib.request.urlretrieve(url, destzip) + with zipfile.ZipFile(destzip, 'r') as z: + os.makedirs(destdir, exist_ok=True) + z.extractall(destdir) + assert os.path.isdir(destdir) + os.remove(destzip) + else: + destdir = name + load_speechbrain_model(destdir, download_root = download_root) + +def load_speechbrain_model(source, device = None, download_root = "/opt"): + + if os.path.isdir(source): + yaml_file = os.path.join(source, "hyperparams.yaml") + assert os.path.isfile(yaml_file), f"Hyperparams file {yaml_file} not found" + else: + try: + yaml_file = huggingface_hub.hf_hub_download(repo_id=source, filename="hyperparams.yaml", cache_dir = os.path.join(download_root, "huggingface/hub")) + except requests.exceptions.HTTPError: + yaml_file = None + + overrides = make_yaml_overrides(yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) + savedir = os.path.join(download_root, "speechbrain") + try: + model = sb.pretrained.EncoderASR.from_hparams(source = source, savedir = savedir, overrides = overrides) + except ValueError: + model = sb.pretrained.EncoderDecoderASR.from_hparams(source = source, savedir = savedir, overrides = overrides) + return model + +def make_yaml_overrides(yaml_file, key_values): + """ + return a dictionary of overrides to be used with speechbrain + yaml_file: path to yaml file + key_values: dict of key values to override + """ + if yaml_file is None: return None + + override = {} + with open(yaml_file, "r") as f: + parent = None + for line in f: + if line.strip() == "": + parent = None + elif line == line.lstrip(): + if ":" in line: + parent = line.split(":")[0].strip() + if parent in key_values: + override[parent] = key_values[parent] + elif ":" in line: + child = line.strip().split(":")[0].strip() + if child in key_values: + override[parent] = override.get(parent, {}) | {child: key_values[child]} + return override + + +if __name__ == "__main__": + + import sys + assert len(sys.argv) in [1, 2], f"Usage: {sys.argv[0]} " + load_alignment_model(sys.argv[1] if len(sys.argv) > 1 else "linSTT_speechbrain_fr-FR_v1.0.0") diff --git a/requirements.txt b/requirements.txt index 132bdfc..a93dc9f 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,11 +1,13 @@ celery[redis,auth,msgpack]>=4.4.7 -numpy>=1.18.5 flask>=1.1.2 flask-cors>=3.0.10 -flask-swagger-ui>=3.36.0 flask-sock +flask-swagger-ui>=3.36.0 gunicorn +num2words pyyaml>=5.4.1 -wavio>=0.0.4 requests>=2.26.0 +speechbrain +wavio>=0.0.4 websockets +git+https://github.com/openai/whisper.git \ No newline at end of file diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 2a3eca5..a4d6182 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,24 
+1,46 @@ import os -import sys +import logging from time import time -from vosk import Model +import torch +import whisper from stt import logger -from stt.processing.decoding import decode -from stt.processing.utils import formatAudio, load_wave +from stt.processing.decoding import decode, get_default_language +from stt.processing.utils import load_wave_buffer, load_audiofile -__all__ = ["model", "logger", "decode", "load_wave", "formatAudio"] +from .load_model import load_whisper_model, load_speechbrain_model -# Model locations (should be mounted) -MODEL_PATH = "/opt/model" +__all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] -# Load ASR models (acoustic model and decoding graph) -logger.info("Loading acoustic model and decoding graph ...") +# Set logger level +logger.setLevel(logging.INFO) + +# Set device +device = os.environ.get("DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") +try: + device = torch.device(device) +except Exception as err: + raise Exception("Failed to set device: {}".format(str(err))) from err + +# Check language +available_languages = [k for k,v in whisper.tokenizer.LANGUAGES.items()] + [None] +if get_default_language() not in available_languages: + raise RuntimeError(f"Langaue {get_default_language()} is not available. Available languages are: {available_languages}") + +# Load ASR model +model_type = os.environ.get("MODEL_TYPE", "medium") +logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.isfile(model_type) else 'remote'})...") start = time() try: - model = Model(MODEL_PATH) + model = load_whisper_model(model_type, device = device) except Exception as err: raise Exception("Failed to load transcription model: {}".format(str(err))) from err - sys.exit(-1) -logger.info("Acoustic model and decoding graph loaded. (t={}s)".format(time() - start)) +logger.info("Model loaded. (t={}s)".format(time() - start)) + +# Load alignment model +alignment_model_type = os.environ.get("ALIGNMENT_MODEL_TYPE", "/opt/linSTT_speechbrain_fr-FR_v1.0.0") +logger.info(f"Loading alignment model...") +start = time() +alignment_model = load_speechbrain_model(alignment_model_type, device = device, download_root = "/opt") +logger.info("Alignment Model loaded. (t={}s)".format(time() - start)) diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py new file mode 100644 index 0000000..f6d52c8 --- /dev/null +++ b/stt/processing/alignment_model.py @@ -0,0 +1,66 @@ +import math +import torch +import torch.nn.utils.rnn as rnn_utils + +from stt import logger + +def speechbrain_get_vocab(model): + tokenizer = model.tokenizer + labels = [{'':" ", ' ⁇ ':""}.get(i,i).lower() for i in tokenizer.decode([[i] for i in range(tokenizer.get_piece_size())])] + blank_id = labels.index("") + return labels, blank_id + + +# The following limit is to handle the corner Case of too long audio segment (which is better to split it to avoid memory overflow). +# But it is 2240400 / 16000 Hz ~ 140 seconds, which should not happen for segments detected by Whisper (usually one sentence). +# Also note that Whisper works with 30 seconds segment, so there is chance that this limit is never reached. 
+MAX_LEN = 2240400 + +def speechbrain_compute_log_probas(model, audios, max_len = MAX_LEN): + # Single audio + if not isinstance(audios, list): + audios = [audios] + log_probas = speechbrain_compute_log_probas(model, audios, max_len = max_len) + return log_probas[0] + + # Batch of audios (can occur when max_len is reached) + assert len(audios) > 0, "audios must be a non-empty list" + if not isinstance(audios[0], torch.Tensor): + audios = [torch.from_numpy(a) for a in audios] + if max([len(a) for a in audios]) > max_len: + # Split audios into chunks of max_len + batch_size = len(audios) + chunks = [] + i_audio = [] + for a in audios: + chunks.extend([a[i:min(i+max_len, len(a))] for i in range(0, len(a), max_len)]) + i_audio.append(len(chunks)) + if len(chunks) > 1: + logger.warning("Audio too long, splitting into {} chunks for alignment".format(len(chunks))) + # Decode chunks of audio and concatenate results + log_probas = [[] for i in range(len(audios))] + for i in range(0, len(chunks), batch_size): + chunk = chunks[i:min(i+batch_size, len(chunks))] + log_probas_tmp = speechbrain_compute_log_probas(model, chunk) + for j in range(i,i+len(chunk)): + k = 0 + while j >= i_audio[k]: + k += 1 + log_probas[k].append(log_probas_tmp[j-i]) + log_probas = [torch.cat(p, dim = 0) for p in log_probas] + log_probas, wav_lens = pack_sequences(log_probas, device = model.device) + else: + batch, wav_lens = pack_sequences(audios, device = model.device) + log_probas = model.forward(batch, wav_lens) + + log_probas = torch.log_softmax(log_probas, dim=-1) + return log_probas + +def pack_sequences(tensors, device = "cpu"): + if len(tensors) == 1: + return tensors[0].unsqueeze(0).to(device), torch.Tensor([1.]).to(device) + tensor = rnn_utils.pad_sequence(tensors, batch_first=True) + wav_lens = [len(x) for x in tensors] + maxwav_lens = max(wav_lens) + wav_lens = torch.Tensor([l/maxwav_lens for l in wav_lens]) + return tensor.to(device), wav_lens.to(device) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 2e1fb7c..7290af4 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -1,31 +1,316 @@ -import json +import os + +import whisper +from whisper.audio import SAMPLE_RATE + +import math +import numpy as np +import torch + import re +import string +from num2words import num2words + +from stt import logger +from .word_alignment import compute_alignment -from vosk import KaldiRecognizer, Model +# TODO: understand and remove this limitations +torch.set_num_threads(1) +def get_default_language(): + return os.environ.get("LANGUAGE", None) -def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: bool) -> dict: - """Transcribe the audio data using the vosk library with the defined model.""" +def decode(audio: torch.Tensor, + model: whisper.model.Whisper, + alignment_model: "Any", + with_word_timestamps: bool, + language: str = None, + beam_size: int = None, + no_speech_threshold: float = 0.6, + logprob_threshold: float = -1.0, + compression_ratio_threshold: float = 2.4, + normalize_text_as_words = False, + ) -> dict: + """Transcribe the audio data using Whisper with the defined model.""" result = {"text": "", "confidence-score": 0.0, "words": []} - recognizer = KaldiRecognizer(model, sampling_rate) - recognizer.SetMaxAlternatives(0) # Set confidence per words - recognizer.SetWords(with_metadata) + fp16 = model.device != torch.device("cpu") + + if language is None: + language = get_default_language() + + logger.info(f"Transcribing audio with language 
{language}...") + + whisper_res = model.transcribe(audio, + language = language, + fp16 = fp16, + temperature = 0.0, # For deterministic results + beam_size = beam_size, + no_speech_threshold = no_speech_threshold, + logprob_threshold = logprob_threshold, + compression_ratio_threshold = compression_ratio_threshold + ) + + text = whisper_res["text"].strip() + if normalize_text_as_words: + text = normalize_text(text, language) + text = remove_punctuation(text) + segments = whisper_res["segments"] + + result["text"] = text + result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() + if not with_word_timestamps: + if not normalize_text_as_words: + text = normalize_text(text, language) + text = remove_punctuation(text) + result["words"] = text.split() + else: + # Compute word timestamps + result["words"] = [] + max_t = audio.shape[0] + for segment in segments: + offset = segment["start"] + start = min(max_t, round(segment["start"] * SAMPLE_RATE)) + end = min(max_t, round(segment["end"] * SAMPLE_RATE)) + sub_audio = audio[start:end] + sub_text = segment["text"] + sub_text = normalize_text(sub_text, language) + sub_text = remove_punctuation(sub_text) + labels, emission, trellis, segments, word_segments = compute_alignment(sub_audio, sub_text, alignment_model) + ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) + sub_words = sub_text.split() + assert len(sub_words) == len(word_segments), f"Unexpected number of words: {len(sub_words)} != {len(word_segments)}" + for word, segment in zip(sub_words, word_segments): + result["words"].append({ + "word": word, + "start": segment.start * ratio + offset, + "end": segment.end * ratio + offset, + "conf": segment.score, + }) - recognizer.AcceptWaveform(audio_data) - try: - decoder_result_raw = recognizer.FinalResult() - except Exception as err: - raise Exception("Failed to decode") from err - try: - decoder_result = json.loads(decoder_result_raw) - except Exception: - return result - result["text"] = re.sub(" ", "", decoder_result["text"]) - if "result" in decoder_result: - result["words"] = [w for w in decoder_result["result"] if w["word"] != ""] - if result["words"]: - result["confidence-score"] = sum([w["conf"] for w in result["words"]]) / len( - result["words"] - ) return result + + +custom_punctuations = string.punctuation.replace("'", "").replace("-", "") + +def remove_punctuation(text: str) -> str: + # Remove all punctuation except apostrophe + return text.translate(str.maketrans("", "", custom_punctuations)) + +_whitespace_re = re.compile(r'[^\S\r\n]+') + +def collapse_whitespace(text): + return re.sub(_whitespace_re, ' ', text).strip() + + +def normalize_text(text: str, lang: str) -> str: + """ Transform digits into characters... 
""" + + # Roman digits + if re.search(r"[IVX]", text): + if lang == "en": + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(st|nd|rd|th)?\b", text) + digits = ["".join(d) for d in digits] + elif lang == "fr": + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(ème|eme|e|er|ère)?\b", text) + digits = ["".join(d) for d in digits] + else: + digits = [] + if digits: + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + for s in digits: + filtered = re.sub("[a-z]", "", s) + ordinal = filtered != s + digit = romanToDecimal(filtered) + v = undigit(str(digit), lang=lang, to= "ordinal" if ordinal else "cardinal") + text = re.sub(r"\b" + s + r"\b", v, text) + + # Ordinal digits + if lang == "en": + digits = re.findall(r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:th)\b", text) + elif lang == "fr": + digits = re.findall(r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:ème|eme|e)\b", text) + else: + logger.warn(f"Language {lang} not supported for normalization. Some words might be mis-localized.") + digits = [] + if digits: + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + for digit in digits: + word = undigit(re.findall(r"\d+", digit)[0], to= "ordinal", lang = lang) + text = re.sub(r'\b'+str(digit)+r'\b', word, text) + + # Cardinal digits + digits = re.findall(r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)",text) + digits = list(map(lambda s: s.strip(r"[/ ]"), digits)) + digits = list(set(digits)) + digits = digits + flatten([c.split() for c in digits if " " in c]) + digits = digits + flatten([c.split("/") for c in digits if "/" in c]) + digits = sorted(digits, reverse=True, key=lambda x: (len(x), x)) + for digit in digits: + digitf = re.sub("/+", "/", digit) + if not digitf: + continue + numslash = len(re.findall("/", digitf)) + if numslash == 0: + word = undigit(digitf, lang = lang) + elif numslash == 1: # Fraction or date + i = digitf.index("/") + is_date = False + if len(digitf[i+1:]) == 2: + try: + first = int(digitf[:i]) + second = int(digitf[i+1:]) + is_date = first > 0 and first < 32 and second > 0 and second < 13 + except: pass + if is_date: + first = undigit(digitf[:i].lstrip("0"), lang = lang) + if first == "un": first = "premier" + second = _int_to_month[second] + else: + first = undigit(digitf[:i], lang = lang) + second = undigit(digitf[i+1:], to="denominator", lang = lang) + if float(digitf[:i]) > 2. and second[-1] != "s": + second += "s" + word = first + " " + second + elif numslash == 2: # Maybe a date + i1 = digitf.index("/") + i2 = digitf.index("/", i1+1) + is_date = False + if len(digitf[i1+1:i2]) == 2 and len(digitf[i2+1:]) == 4: + try: + first = int(digitf[:i1]) + second = int(digitf[i1+1:i2]) + third = int(digitf[i2+1:]) + is_date = first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 + except: pass + third = undigit(digitf[i2+1:], lang = lang) + if is_date: + first = undigit(digitf[:i1].lstrip("0"), lang = lang) + if first == "un": first = "premier" + second = _int_to_month.get(lang, {}).get(int(digitf[i1+1:i2]), digitf[i1+1:i2]) + word = " ".join([first, second, third]) + else: + word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + else: + word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + # Replace + if " " in digit: + text = re.sub(r'\b'+str(digit)+r'\b', " "+word+" ", text) + else: + text = re.sub(str(digit), " "+word+" ", text) + + # TODO: symbols (currencies...) 
+ + return collapse_whitespace(text) + +def undigit(str, lang, to="cardinal"): + str = re.sub(" ","", str) + if to == "denominator": + assert lang == "fr" + if str == "2": return "demi" + if str == "3": return "tiers" + if str == "4": return "quart" + to = "ordinal" + if str.startswith("0") and to == "cardinal": + numZeros = len(re.findall(r"0+", str)[0]) + if numZeros < len(str): + return numZeros * (my_num2words(0, lang=lang, to="cardinal")+" ") + my_num2words(float(str), lang=lang, to=to) + return my_num2words(float(str), lang=lang, to=to) + + +def my_num2words(x, lang, to = "cardinal", orig = ""): + """ + Bugfix for num2words + """ + try: + if lang == "fr" and to == "ordinal": + return num2words(x, lang=lang, to=to).replace("vingtsième", "vingtième") + else: + return num2words(x, lang=lang, to=to) + except OverflowError: + if x == math.inf: # ! + return " ".join(my_num2words(xi, lang=lang, to=to) for xi in orig) + if x == -math.inf: # ! + return "moins " + my_num2words(-x, lang=lang, to=to, orig=orig.replace("-" , "")) + # TODO: print a warning + return my_num2words(x//10, lang=lang, to=to) + +def flatten(l): + """ + flatten a list of lists + """ + return [item for sublist in l for item in sublist] + +_int_to_month = { + "fr": { + 1: "janvier", + 2: "février", + 3: "mars", + 4: "avril", + 5: "mai", + 6: "juin", + 7: "juillet", + 8: "août", + 9: "septembre", + 10: "octobre", + 11: "novembre", + 12: "décembre", + }, + "en": { + 1: "january", + 2: "february", + 3: "march", + 4: "april", + 5: "may", + 6: "june", + 7: "july", + 8: "august", + 9: "september", + 10: "october", + 11: "november", + 12: "december", + } +} + + +def romanToDecimal(str): + def value(r): + if (r == 'I'): + return 1 + if (r == 'V'): + return 5 + if (r == 'X'): + return 10 + if (r == 'L'): + return 50 + if (r == 'C'): + return 100 + if (r == 'D'): + return 500 + if (r == 'M'): + return 1000 + return -1 + + res = 0 + i = 0 + while (i < len(str)): + # Getting value of symbol s[i] + s1 = value(str[i]) + if (i + 1 < len(str)): + # Getting value of symbol s[i + 1] + s2 = value(str[i + 1]) + # Comparing both values + if (s1 >= s2): + # Value of current symbol is greater + # or equal to the next symbol + res = res + s1 + i = i + 1 + else: + # Value of current symbol is greater + # or equal to the next symbol + res = res + s2 - s1 + i = i + 2 + else: + res = res + s1 + i = i + 1 + return res diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py new file mode 100644 index 0000000..da5d98c --- /dev/null +++ b/stt/processing/load_model.py @@ -0,0 +1,62 @@ +import whisper + +import os +import requests +import huggingface_hub +import speechbrain as sb + +def load_whisper_model(model_type_or_file, device = "cpu", download_root = "/opt"): + + model = whisper.load_model(model_type_or_file, device = device, download_root = os.path.join(download_root, "whisper")) + + model.eval() + model.requires_grad_(False) + return model + +def load_speechbrain_model(source, device = "cpu", download_root = "/opt"): + + if os.path.isdir(source): + yaml_file = os.path.join(source, "hyperparams.yaml") + assert os.path.isfile(yaml_file), f"Hyperparams file {yaml_file} not found" + else: + try: + yaml_file = huggingface_hub.hf_hub_download(repo_id=source, filename="hyperparams.yaml", cache_dir = os.path.join(download_root, "huggingface/hub")) + except requests.exceptions.HTTPError: + yaml_file = None + overrides = make_yaml_overrides(yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) + + savedir = 
os.path.join(download_root, "speechbrain") + try: + model = sb.pretrained.EncoderASR.from_hparams(source = source, run_opts= {"device": device}, savedir = savedir, overrides = overrides) + except ValueError: + model = sb.pretrained.EncoderDecoderASR.from_hparams(source = source, run_opts= {"device": device}, savedir = savedir, overrides = overrides) + + model.train(False) + model.requires_grad_(False) + return model + + +def make_yaml_overrides(yaml_file, key_values): + """ + return a dictionary of overrides to be used with speechbrain (hyperyaml files) + yaml_file: path to yaml file + key_values: dict of key values to override + """ + if yaml_file is None: return None + + override = {} + with open(yaml_file, "r") as f: + parent = None + for line in f: + if line.strip() == "": + parent = None + elif line == line.lstrip(): + if ":" in line: + parent = line.split(":")[0].strip() + if parent in key_values: + override[parent] = key_values[parent] + elif ":" in line: + child = line.strip().split(":")[0].strip() + if child in key_values: + override[parent] = override.get(parent, {}) | {child: key_values[child]} + return override diff --git a/stt/processing/streaming.py b/stt/processing/streaming.py deleted file mode 100644 index 28274b8..0000000 --- a/stt/processing/streaming.py +++ /dev/null @@ -1,109 +0,0 @@ -import json -import re -from typing import Union - -from simple_websocket.ws import Server as WSServer -from vosk import KaldiRecognizer, Model -from websockets.legacy.server import WebSocketServerProtocol - -from stt import logger - - -async def wssDecode(ws: WebSocketServerProtocol, model: Model): - """Async Decode function endpoint""" - # Wait for config - res = await ws.recv() - - # Parse config - try: - config = json.loads(res)["config"] - sample_rate = config["sample_rate"] - except Exception as e: - logger.error("Failed to read stream configuration") - await ws.close(reason="Failed to load configuration") - - # Recognizer - try: - recognizer = KaldiRecognizer(model, sample_rate) - except Exception as e: - logger.error("Failed to load recognizer") - await ws.close(reason="Failed to load recognizer") - - # Wait for chunks - while True: - try: - # Client data - message = await ws.recv() - if message is None or message == "": # Timeout - ws.close() - except Exception as e: - print("Connection closed by client: {}".format(str(e))) - break - - # End frame - if "eof" in str(message): - ret = recognizer.FinalResult() - await ws.send(json.dumps(ret)) - await ws.close(reason="End of stream") - break - - # Audio chunk - if recognizer.AcceptWaveform(message): - ret = recognizer.Result() # Result seems to not work properly - await ws.send(ret) - - else: - ret = recognizer.PartialResult() - last_utterance = ret - await ws.send(ret) - - -def ws_streaming(websocket_server: WSServer, model: Model): - """Sync Decode function endpoint""" - # Wait for config - res = websocket_server.receive(timeout=10) - - # Timeout - if res is None: - pass - - # Parse config - try: - config = json.loads(res)["config"] - sample_rate = config["sample_rate"] - except Exception: - logger.error("Failed to read stream configuration") - websocket_server.close() - - # Recognizer - try: - recognizer = KaldiRecognizer(model, sample_rate) - except Exception: - logger.error("Failed to load recognizer") - websocket_server.close() - - # Wait for chunks - while True: - try: - # Client data - message = websocket_server.receive(timeout=10) - if message is None: # Timeout - websocket_server.close() - except Exception: - 
print("Connection closed by client") - break - # End frame - if "eof" in str(message): - ret = recognizer.FinalResult() - websocket_server.send(json.dumps(re.sub(" ", "", ret))) - websocket_server.close() - break - # Audio chunk - print("Received chunk") - if recognizer.AcceptWaveform(message): - ret = recognizer.Result() - websocket_server.send(re.sub(" ", "", ret)) - - else: - ret = recognizer.PartialResult() - websocket_server.send(re.sub(" ", "", ret)) diff --git a/stt/processing/utils.py b/stt/processing/utils.py index d003fc8..6956161 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -1,24 +1,40 @@ import io - import wavio -from numpy import int16, squeeze, mean +import os +import numpy as np +import torch +import torchaudio +import whisper +def conform_audio(audio, sample_rate = 16_000): + if sample_rate != whisper.audio.SAMPLE_RATE: + # Down or Up sample to the right sampling rate + audio = torchaudio.transforms.Resample(sample_rate, whisper.audio.SAMPLE_RATE)(audio) + if audio.shape[0] > 1: + # Stereo to mono + # audio = torchaudio.transforms.DownmixMono()(audio, channels_first = True) + audio = audio.mean(0) + else: + audio = audio.squeeze(0) + return audio -def load_wave(file_path): - """Formats audio from a wavFile buffer to a bytebuffer""" - audio = squeeze(wavio.read(file_path).data) - return audio.tobytes() +def load_audiofile(path): + if not os.path.isfile(path): + raise RuntimeError("File not found: %s" % path) + elif not os.access(path, os.R_OK): + raise RuntimeError("Missing reading permission for: %s" % path) + # audio, sample_rate = torchaudio.load(path) + # return conform_audio(audio, sample_rate) + audio = whisper.load_audio(path) + audio = torch.from_numpy(audio) + return audio -def formatAudio(file_buffer): - """Formats audio from a wavFile buffer to a numpy array for processing.""" +def load_wave_buffer(file_buffer): + """ Formats audio from a wavFile buffer to a torch array for processing. 
""" file_buffer_io = io.BytesIO(file_buffer) file_content = wavio.read(file_buffer_io) - # if stereo file, convert to mono by computing the mean over the channels - if file_content.data.ndim == 2: - if file_content.data.shape[1] == 1: - data = squeeze(file_content.data) - elif file_content.data.shape[1] == 2: - data = mean(data, axis=1, dtype=int16) - return data.tobytes(), file_content.rate - raise Exception("Audio Format not supported.") + sample_rate = file_content.rate + audio = torch.from_numpy(file_content.data.astype(np.float32)/32768) + audio = audio.transpose(0,1) + return conform_audio(audio, sample_rate) diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py new file mode 100644 index 0000000..974e528 --- /dev/null +++ b/stt/processing/word_alignment.py @@ -0,0 +1,169 @@ +import unicodedata +from dataclasses import dataclass +import torch + +from stt import logger +from .alignment_model import speechbrain_compute_log_probas as compute_log_probas +from .alignment_model import speechbrain_get_vocab as get_vocab + + +def compute_alignment(audio, transcript, model): + """ Compute the alignment of the audio and a transcript, for a given model that returns log-probabilities on the charset defined the transcript.""" + + emission = compute_log_probas(model, audio) + labels, blank_id = get_vocab(model) + labels = labels[:emission.shape[1]] + dictionary = {c: i for i, c in enumerate(labels)} + + tokens = [loose_get_char_index(dictionary, c, blank_id) for c in transcript] + tokens = [i for i in tokens if i is not None] + + trellis = get_trellis(emission, tokens, blank_id = blank_id) + + path = backtrack(trellis, emission, tokens, blank_id = blank_id) + + segments = merge_repeats(transcript, path) + + word_segments = merge_words(segments) + + return labels, emission, trellis, segments, word_segments + +def loose_get_char_index(dictionary, c, default): + i = dictionary.get(c, None) + if i is None: + other_char = list(set([c.lower(), c.upper(), transliterate(c), transliterate(c).lower(), transliterate(c).upper()])) + for c2 in other_char: + i = dictionary.get(c2, None) + if i is not None: + break + if i is None: + logger.warn("Cannot find label " + " / ".join(list(set([c] + other_char)))) + i = default + return i + +def transliterate(c): + # Transliterates a character to its closest ASCII equivalent. + # For example, "é" becomes "e". + # This is useful for converting Vietnamese text to ASCII. + # See https://stackoverflow.com/a/517974/446579 + return unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii") + +def get_trellis(emission, tokens, blank_id=0, use_max = False): + num_frame = emission.size(0) + num_tokens = len(tokens) + + # Trellis has extra diemsions for both time axis and tokens. + # The extra dim for tokens represents (start-of-sentence) + # The extra dim for time axis is for simplification of the code. 
+ trellis = torch.empty((num_frame + 1, num_tokens + 1)).to(emission.device) + trellis[0, 0] = 0 + trellis[1:, 0] = torch.cumsum(emission[:, blank_id], 0) + trellis[0, -num_tokens:] = -float("inf") + trellis[-num_tokens:, 0] = float("inf") + + for t in range(num_frame): + trellis[t + 1, 1:] = torch.maximum( + # Score for staying at the same token + trellis[t, 1:] + emission[t, blank_id], + torch.maximum(trellis[t, 1:] + emission[t, tokens], + # Score for changing to the next token + trellis[t, :-1] + emission[t, tokens]) + ) if use_max else torch.logaddexp( + trellis[t, 1:] + emission[t, blank_id], + torch.logaddexp(trellis[t, 1:] + emission[t, tokens], + trellis[t, :-1] + emission[t, tokens]) + ) + return trellis + +@dataclass +class Point: + token_index: int + time_index: int + score: float + + +def backtrack(trellis, emission, tokens, blank_id=0): + # Note: + # j and t are indices for trellis, which has extra dimensions + # for time and tokens at the beginning. + # When referring to time frame index `T` in trellis, + # the corresponding index in emission is `T-1`. + # Similarly, when referring to token index `J` in trellis, + # the corresponding index in transcript is `J-1`. + j = trellis.size(1) - 1 + t_start = torch.argmax(trellis[:, j]).item() + + path = [] + for t in range(t_start, 0, -1): + # 1. Figure out if the current position was stay or change + # Note (again): + # `emission[J-1]` is the emission at time frame `J` of trellis dimension. + # Score for token staying the same from time frame J-1 to T. + stayed = trellis[t - 1, j] + emission[t - 1, blank_id] + # Score for token changing from C-1 at T-1 to J at T. + changed = trellis[t - 1, j - 1] + emission[t - 1, tokens[j - 1]] + + # 2. Store the path with frame-wise probability. + prob = emission[t - 1, tokens[j - 1] if changed > stayed else 0].exp().item() + # Return token index and time index in non-trellis coordinate. + path.append(Point(j - 1, t - 1, prob)) + + # 3. 
Update the token + if changed > stayed: + j -= 1 + if j == 0: + break + else: + raise ValueError("Failed to align") + return path[::-1] + + +# Merge the labels +@dataclass +class Segment: + label: str + start: int + end: int + score: float + + def __repr__(self): + return f"{self.label}\t({self.score:4.2f}): [{self.start:5d}, {self.end:5d})" + + @property + def length(self): + return self.end - self.start + + +def merge_repeats(transcript, path): + i1, i2 = 0, 0 + segments = [] + while i1 < len(path): + while i2 < len(path) and path[i1].token_index == path[i2].token_index: + i2 += 1 + score = sum(path[k].score for k in range(i1, i2)) / (i2 - i1) + segments.append( + Segment( + transcript[path[i1].token_index], + path[i1].time_index, + path[i2 - 1].time_index + 1, + score, + ) + ) + i1 = i2 + return segments + +def merge_words(segments, separator=" "): + words = [] + i1, i2 = 0, 0 + while i1 < len(segments): + if i2 >= len(segments) or segments[i2].label == separator: + if i1 != i2: + segs = segments[i1:i2] + word = "".join([seg.label for seg in segs]) + score = sum(seg.score * seg.length for seg in segs) / sum(seg.length for seg in segs) + words.append(Segment(word, segments[i1].start, segments[i2 - 1].end, score)) + i1 = i2 + 1 + i2 = i1 + else: + i2 += 1 + return words \ No newline at end of file diff --git a/websocket/websocketserver.py b/websocket/websocketserver.py deleted file mode 100644 index 81e035b..0000000 --- a/websocket/websocketserver.py +++ /dev/null @@ -1,23 +0,0 @@ -import asyncio -import os - -import websockets - -from stt.processing import model -from stt.processing.streaming import wssDecode - - -async def _fun_wrapper(ws): - """Wrap wssDecode function to add STT Model reference""" - return await wssDecode(ws, model) - - -async def WSServer(port: int): - """Launch the websocket server""" - async with websockets.serve(_fun_wrapper, "0.0.0.0", serving_port): - await asyncio.Future() - - -if __name__ == "__main__": - serving_port = os.environ.get("STREAMING_PORT", 80) - asyncio.run(WSServer(serving_port)) From 0135916b55eb06a5285da8626d275e66a9538374 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 15:57:58 +0100 Subject: [PATCH 087/172] Rename MODEL_TYPE -> MODEL. Document LANGUAGE --- .envdefault | 2 +- README.md | 25 ++++++++++++++++++++++--- docker-entrypoint.sh | 4 ++-- stt/processing/__init__.py | 2 +- 4 files changed, 26 insertions(+), 7 deletions(-) diff --git a/.envdefault b/.envdefault index 61a57bd..4452be3 100644 --- a/.envdefault +++ b/.envdefault @@ -1,6 +1,6 @@ # SERVING PARAMETERS SERVICE_MODE=http -MODEL_TYPE=/opt/model.pt +MODEL=/opt/model.pt LANGUAGE=fr # TASK PARAMETERS diff --git a/README.md b/README.md index 50f03a8..a15b330 100644 --- a/README.md +++ b/README.md @@ -66,12 +66,32 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| | SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | -| MODEL_TYPE | Path to the model or type of model used. | ASR_PATH\|small\|medium\|large-v1\|... | +| MODEL | Path to the model or type of model used. | ASR_PATH\|small\|medium\|large-v1\|... | +| LANGUAGE | (Optional) Language to recognize | fr\|en\|... 
| | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | | SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | | BROKER_PASS | Using the task mode, broker password | my-password | | CONCURRENCY | Maximum number of parallel requests | >1 | +The language is a code of two or three letters. The list of languages supported by Whisper are: +``` +af(afrikaans), am(amharic), ar(arabic), as(assamese), az(azerbaijani), +ba(bashkir), be(belarusian), bg(bulgarian), bn(bengali), bo(tibetan), br(breton), bs(bosnian), +ca(catalan), cs(czech), cy(welsh), da(danish), de(german), el(greek), en(english), es(spanish), +et(estonian), eu(basque), fa(persian), fi(finnish), fo(faroese), fr(french), gl(galician), +gu(gujarati), ha(hausa), haw(hawaiian), he(hebrew), hi(hindi), hr(croatian), ht(haitian creole), +hu(hungarian), hy(armenian), id(indonesian), is(icelandic), it(italian), ja(japanese), +jw(javanese), ka(georgian), kk(kazakh), km(khmer), kn(kannada), ko(korean), la(latin), +lb(luxembourgish), ln(lingala), lo(lao), lt(lithuanian), lv(latvian), mg(malagasy), mi(maori), +mk(macedonian), ml(malayalam), mn(mongolian), mr(marathi), ms(malay), mt(maltese), my(myanmar), +ne(nepali), nl(dutch), nn(nynorsk), no(norwegian), oc(occitan), pa(punjabi), pl(polish), +ps(pashto), pt(portuguese), ro(romanian), ru(russian), sa(sanskrit), sd(sindhi), si(sinhala), +sk(slovak), sl(slovenian), sn(shona), so(somali), sq(albanian), sr(serbian), su(sundanese), +sv(swedish), sw(swahili), ta(tamil), te(telugu), tg(tajik), th(thai), tk(turkmen), tl(tagalog), +tr(turkish), tt(tatar), uk(ukrainian), ur(urdu), uz(uzbek), vi(vietnamese), yi(yiddish), +yo(yoruba), zh(chinese) +``` + ### Serving mode ![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png) @@ -122,9 +142,8 @@ docker run --rm \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file .env \ linto-platform-stt:latest -``` +```| LANGUAGE | (Optional) Language to recognize | fr\|en\|... | -**Parameters:** | Variables | Description | Example | |:-|:-|:-| | ASR_PATH | (Optional) Path to the Whisper model on the host machine to /opt/model.pt | /my/path/to/models/medium.pt | diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 4d67cca..5014d8f 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -5,10 +5,10 @@ echo "RUNNING STT" # Check model echo "Checking model format ..." -if [ -z "$MODEL_TYPE" ] +if [ -z "$MODEL" ] then echo "Model type not specified, choosing Whisper medium model" - export MODEL_TYPE=medium + export MODEL=medium fi # Launch parameters, environement variables and dependencies check diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index a4d6182..19d7f0e 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -29,7 +29,7 @@ raise RuntimeError(f"Langaue {get_default_language()} is not available. Available languages are: {available_languages}") # Load ASR model -model_type = os.environ.get("MODEL_TYPE", "medium") +model_type = os.environ.get("MODEL", "medium") logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.isfile(model_type) else 'remote'})...") start = time() try: From 7291f87e4e74fcee4bf6d83f6e2f1dbaf3794d92 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 17:18:10 +0100 Subject: [PATCH 088/172] Isolate everything related to text normalization in one place. 
And implement stuff for symbols --- stt/processing/decoding.py | 235 +---------------------- stt/processing/text_normalize.py | 316 +++++++++++++++++++++++++++++++ stt/processing/utils.py | 6 + stt/processing/word_alignment.py | 50 +++-- 4 files changed, 358 insertions(+), 249 deletions(-) create mode 100644 stt/processing/text_normalize.py diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 7290af4..dc8d95a 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -3,16 +3,12 @@ import whisper from whisper.audio import SAMPLE_RATE -import math import numpy as np import torch -import re -import string -from num2words import num2words - from stt import logger from .word_alignment import compute_alignment +from .text_normalize import remove_punctuation, normalize_text # TODO: understand and remove this limitations torch.set_num_threads(1) @@ -76,10 +72,14 @@ def decode(audio: torch.Tensor, sub_text = segment["text"] sub_text = normalize_text(sub_text, language) sub_text = remove_punctuation(sub_text) + if not sub_text: + logger.warn(f"Lost text in segment {segment['start']}-{segment['end']}") + continue labels, emission, trellis, segments, word_segments = compute_alignment(sub_audio, sub_text, alignment_model) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() - assert len(sub_words) == len(word_segments), f"Unexpected number of words: {len(sub_words)} != {len(word_segments)}" + assert len(sub_words) == len(word_segments), \ + f"Unexpected number of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}" for word, segment in zip(sub_words, word_segments): result["words"].append({ "word": word, @@ -91,226 +91,3 @@ def decode(audio: torch.Tensor, return result -custom_punctuations = string.punctuation.replace("'", "").replace("-", "") - -def remove_punctuation(text: str) -> str: - # Remove all punctuation except apostrophe - return text.translate(str.maketrans("", "", custom_punctuations)) - -_whitespace_re = re.compile(r'[^\S\r\n]+') - -def collapse_whitespace(text): - return re.sub(_whitespace_re, ' ', text).strip() - - -def normalize_text(text: str, lang: str) -> str: - """ Transform digits into characters... """ - - # Roman digits - if re.search(r"[IVX]", text): - if lang == "en": - digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(st|nd|rd|th)?\b", text) - digits = ["".join(d) for d in digits] - elif lang == "fr": - digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(ème|eme|e|er|ère)?\b", text) - digits = ["".join(d) for d in digits] - else: - digits = [] - if digits: - digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) - for s in digits: - filtered = re.sub("[a-z]", "", s) - ordinal = filtered != s - digit = romanToDecimal(filtered) - v = undigit(str(digit), lang=lang, to= "ordinal" if ordinal else "cardinal") - text = re.sub(r"\b" + s + r"\b", v, text) - - # Ordinal digits - if lang == "en": - digits = re.findall(r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:th)\b", text) - elif lang == "fr": - digits = re.findall(r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:ème|eme|e)\b", text) - else: - logger.warn(f"Language {lang} not supported for normalization. 
Some words might be mis-localized.") - digits = [] - if digits: - digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) - for digit in digits: - word = undigit(re.findall(r"\d+", digit)[0], to= "ordinal", lang = lang) - text = re.sub(r'\b'+str(digit)+r'\b', word, text) - - # Cardinal digits - digits = re.findall(r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)",text) - digits = list(map(lambda s: s.strip(r"[/ ]"), digits)) - digits = list(set(digits)) - digits = digits + flatten([c.split() for c in digits if " " in c]) - digits = digits + flatten([c.split("/") for c in digits if "/" in c]) - digits = sorted(digits, reverse=True, key=lambda x: (len(x), x)) - for digit in digits: - digitf = re.sub("/+", "/", digit) - if not digitf: - continue - numslash = len(re.findall("/", digitf)) - if numslash == 0: - word = undigit(digitf, lang = lang) - elif numslash == 1: # Fraction or date - i = digitf.index("/") - is_date = False - if len(digitf[i+1:]) == 2: - try: - first = int(digitf[:i]) - second = int(digitf[i+1:]) - is_date = first > 0 and first < 32 and second > 0 and second < 13 - except: pass - if is_date: - first = undigit(digitf[:i].lstrip("0"), lang = lang) - if first == "un": first = "premier" - second = _int_to_month[second] - else: - first = undigit(digitf[:i], lang = lang) - second = undigit(digitf[i+1:], to="denominator", lang = lang) - if float(digitf[:i]) > 2. and second[-1] != "s": - second += "s" - word = first + " " + second - elif numslash == 2: # Maybe a date - i1 = digitf.index("/") - i2 = digitf.index("/", i1+1) - is_date = False - if len(digitf[i1+1:i2]) == 2 and len(digitf[i2+1:]) == 4: - try: - first = int(digitf[:i1]) - second = int(digitf[i1+1:i2]) - third = int(digitf[i2+1:]) - is_date = first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 - except: pass - third = undigit(digitf[i2+1:], lang = lang) - if is_date: - first = undigit(digitf[:i1].lstrip("0"), lang = lang) - if first == "un": first = "premier" - second = _int_to_month.get(lang, {}).get(int(digitf[i1+1:i2]), digitf[i1+1:i2]) - word = " ".join([first, second, third]) - else: - word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) - else: - word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) - # Replace - if " " in digit: - text = re.sub(r'\b'+str(digit)+r'\b', " "+word+" ", text) - else: - text = re.sub(str(digit), " "+word+" ", text) - - # TODO: symbols (currencies...) - - return collapse_whitespace(text) - -def undigit(str, lang, to="cardinal"): - str = re.sub(" ","", str) - if to == "denominator": - assert lang == "fr" - if str == "2": return "demi" - if str == "3": return "tiers" - if str == "4": return "quart" - to = "ordinal" - if str.startswith("0") and to == "cardinal": - numZeros = len(re.findall(r"0+", str)[0]) - if numZeros < len(str): - return numZeros * (my_num2words(0, lang=lang, to="cardinal")+" ") + my_num2words(float(str), lang=lang, to=to) - return my_num2words(float(str), lang=lang, to=to) - - -def my_num2words(x, lang, to = "cardinal", orig = ""): - """ - Bugfix for num2words - """ - try: - if lang == "fr" and to == "ordinal": - return num2words(x, lang=lang, to=to).replace("vingtsième", "vingtième") - else: - return num2words(x, lang=lang, to=to) - except OverflowError: - if x == math.inf: # ! - return " ".join(my_num2words(xi, lang=lang, to=to) for xi in orig) - if x == -math.inf: # ! 
- return "moins " + my_num2words(-x, lang=lang, to=to, orig=orig.replace("-" , "")) - # TODO: print a warning - return my_num2words(x//10, lang=lang, to=to) - -def flatten(l): - """ - flatten a list of lists - """ - return [item for sublist in l for item in sublist] - -_int_to_month = { - "fr": { - 1: "janvier", - 2: "février", - 3: "mars", - 4: "avril", - 5: "mai", - 6: "juin", - 7: "juillet", - 8: "août", - 9: "septembre", - 10: "octobre", - 11: "novembre", - 12: "décembre", - }, - "en": { - 1: "january", - 2: "february", - 3: "march", - 4: "april", - 5: "may", - 6: "june", - 7: "july", - 8: "august", - 9: "september", - 10: "october", - 11: "november", - 12: "december", - } -} - - -def romanToDecimal(str): - def value(r): - if (r == 'I'): - return 1 - if (r == 'V'): - return 5 - if (r == 'X'): - return 10 - if (r == 'L'): - return 50 - if (r == 'C'): - return 100 - if (r == 'D'): - return 500 - if (r == 'M'): - return 1000 - return -1 - - res = 0 - i = 0 - while (i < len(str)): - # Getting value of symbol s[i] - s1 = value(str[i]) - if (i + 1 < len(str)): - # Getting value of symbol s[i + 1] - s2 = value(str[i + 1]) - # Comparing both values - if (s1 >= s2): - # Value of current symbol is greater - # or equal to the next symbol - res = res + s1 - i = i + 1 - else: - # Value of current symbol is greater - # or equal to the next symbol - res = res + s2 - s1 - i = i + 2 - else: - res = res + s1 - i = i + 1 - return res diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py new file mode 100644 index 0000000..5ba3358 --- /dev/null +++ b/stt/processing/text_normalize.py @@ -0,0 +1,316 @@ +import math +import re +#import string +import unicodedata +from num2words import num2words + +from stt import logger +from .utils import flatten + +_punctuations = '!"#$%&()*+,/:;<=>?@[\\]^_`{|}~«»¿' # string.punctuation, plus Whisper specific "«»¿", minus apostrophe "'", dash "-", and dot "." (which will be processed as special) + +def remove_punctuation(text: str) -> str: + text = text.translate(str.maketrans("", "", _punctuations)) + # We don't remove dots inside words (e.g. "ab@gmail.com") + text = re.sub(r"\.(\s)",r"\1", text+" ").strip() + return collapse_whitespace(text) + +_whitespace_re = re.compile(r'[^\S\r\n]+') + +def collapse_whitespace(text): + return re.sub(_whitespace_re, ' ', text).strip() + +def transliterate(c): + # Transliterates a character to its closest ASCII equivalent. + # Example: transliterate("à ß œ fl") = "a ss oe fl" + c = re.sub("œ", "oe", c) + c = re.sub("æ", "ae", c) + c = re.sub("Œ", "OE", c) + c = re.sub("Æ", "AE", c) + c = re.sub("ß", "ss", c) + return unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii") + + +def normalize_text(text: str, lang: str) -> str: + """ Transform digits into characters... """ + + # Reorder currencies (1,20€ -> 1 € 20) + coma = "," if lang in ["fr"] else "\." 
+ for c in _currencies: + if c in text: + text = re.sub(r"\b(\d+)" + coma + r"(\d+)\s*" + c, r"\1 " + c + r" \2", text) + + # Roman digits + if re.search(r"[IVX]", text): + if lang == "en": + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|st|nd|rd|th)?\b", text) + digits = ["".join(d) for d in digits] + elif lang == "fr": + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text) + digits = ["".join(d) for d in digits] + else: + digits = [] + if digits: + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + for s in digits: + filtered = re.sub("[a-z]", "", s) + ordinal = filtered != s + digit = roman_to_decimal(filtered) + v = undigit(str(digit), lang=lang, to= "ordinal" if ordinal else "cardinal") + text = re.sub(r"\b" + s + r"\b", v, text) + + # Ordinal digits + if lang == "en": + digits = re.findall(r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:º|th)\b", text) + elif lang == "fr": + digits = re.findall(r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) + else: + logger.warn(f"Language {lang} not supported for normalization. Some words might be mis-localized.") + digits = [] + if digits: + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + for digit in digits: + word = undigit(re.findall(r"\d+", digit)[0], to= "ordinal", lang = lang) + text = re.sub(r'\b'+str(digit)+r'\b', word, text) + + # Cardinal digits + digits = re.findall(r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)",text) + digits = list(map(lambda s: s.strip(r"[/ ]"), digits)) + digits = list(set(digits)) + digits = digits + flatten([c.split() for c in digits if " " in c]) + digits = digits + flatten([c.split("/") for c in digits if "/" in c]) + digits = sorted(digits, reverse=True, key=lambda x: (len(x), x)) + for digit in digits: + digitf = re.sub("/+", "/", digit) + if not digitf: + continue + numslash = len(re.findall("/", digitf)) + if numslash == 0: + word = undigit(digitf, lang = lang) + elif numslash == 1: # Fraction or date + i = digitf.index("/") + is_date = False + if len(digitf[i+1:]) == 2: + try: + first = int(digitf[:i]) + second = int(digitf[i+1:]) + is_date = first > 0 and first < 32 and second > 0 and second < 13 + except: pass + if is_date: + first = digitf[:i].lstrip("0") + use_ordinal = (lang == "fr" and first == "1") or (lang != "fr" and first[-1] in ["1", "2", "3"]) + first = undigit(first, lang = lang, to="ordinal" if use_ordinal else "cardinal") + second = _int_to_month[second] + else: + first = undigit(digitf[:i], lang = lang) + second = undigit(digitf[i+1:], to="denominator", lang = lang) + if float(digitf[:i]) > 2. 
and second[-1] != "s": + second += "s" + word = first + " " + second + elif numslash == 2: # Maybe a date + i1 = digitf.index("/") + i2 = digitf.index("/", i1+1) + is_date = False + if len(digitf[i1+1:i2]) == 2 and len(digitf[i2+1:]) == 4: + try: + first = int(digitf[:i1]) + second = int(digitf[i1+1:i2]) + third = int(digitf[i2+1:]) + is_date = first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 + except: pass + third = undigit(digitf[i2+1:], lang = lang) + if is_date: + first = digitf[:i].lstrip("0") + use_ordinal = (lang == "fr" and first == "1") or (lang != "fr" and first[-1] in ["1", "2", "3"]) + first = undigit(first, lang = lang, to="ordinal" if use_ordinal else "cardinal") + second = _int_to_month.get(lang, {}).get(int(digitf[i1+1:i2]), digitf[i1+1:i2]) + word = " ".join([first, second, third]) + else: + word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + else: + word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + if " " in digit: + text = re.sub(r'\b'+str(digit)+r'\b', " "+word+" ", text) + else: + text = re.sub(str(digit), " "+word+" ", text) + + # Symbols (currencies, percent...) + symbol_table = _symbol_to_word.get(lang, {}) + for k, v in symbol_table.items(): + text = re.sub(k, " "+v+" ", text) + + return collapse_whitespace(text) + +def undigit(str, lang, to="cardinal"): + str = re.sub(" ","", str) + if to == "denominator": + assert lang == "fr" + if str == "2": return "demi" + if str == "3": return "tiers" + if str == "4": return "quart" + to = "ordinal" + if str.startswith("0") and to == "cardinal": + numZeros = len(re.findall(r"0+", str)[0]) + if numZeros < len(str): + return numZeros * (my_num2words(0, lang=lang, to="cardinal")+" ") + my_num2words(float(str), lang=lang, to=to) + return my_num2words(float(str), lang=lang, to=to) + + +def my_num2words(x, lang, to = "cardinal", orig = ""): + """ + Bugfix for num2words + """ + try: + if lang == "fr" and to == "ordinal": + return num2words(x, lang=lang, to=to).replace("vingtsième", "vingtième") + else: + return num2words(x, lang=lang, to=to) + except OverflowError: + if x == math.inf: # ! + return " ".join(my_num2words(xi, lang=lang, to=to) for xi in orig) + if x == -math.inf: # ! 
+ return "moins " + my_num2words(-x, lang=lang, to=to, orig=orig.replace("-" , "")) + # TODO: print a warning + return my_num2words(x//10, lang=lang, to=to) + +def roman_to_decimal(str): + def value(r): + if (r == 'I'): + return 1 + if (r == 'V'): + return 5 + if (r == 'X'): + return 10 + if (r == 'L'): + return 50 + if (r == 'C'): + return 100 + if (r == 'D'): + return 500 + if (r == 'M'): + return 1000 + return -1 + + res = 0 + i = 0 + while (i < len(str)): + s1 = value(str[i]) + if (i + 1 < len(str)): + s2 = value(str[i + 1]) + if (s1 >= s2): + # Value of current symbol is greater or equal to the next symbol + res = res + s1 + i = i + 1 + else: + # Value of current symbol is greater or equal to the next symbol + res = res + s2 - s1 + i = i + 2 + else: + res = res + s1 + i = i + 1 + return res + +_int_to_month = { + "fr": { + 1: "janvier", + 2: "février", + 3: "mars", + 4: "avril", + 5: "mai", + 6: "juin", + 7: "juillet", + 8: "août", + 9: "septembre", + 10: "octobre", + 11: "novembre", + 12: "décembre", + }, + "en": { + 1: "january", + 2: "february", + 3: "march", + 4: "april", + 5: "may", + 6: "june", + 7: "july", + 8: "august", + 9: "september", + 10: "october", + 11: "november", + 12: "december", + } +} + +_currencies = ["€", "$", "£", "¥"] + +_symbol_to_word = { + "fr": { + "%": "pour cents", + "÷": "divisé par", + "\*": "fois", # ? + "×": "fois", + "±": "plus ou moins", + "\+": "plus", + "&": "et", + "@": "arobase", + "m²": "mètres carrés", + "m³": "mètres cubes", + "²": "au carré", + "³": "au cube", + "¼": "un quart", + "½": "un demi", + "¾": "trois quarts", + "§": "section", + "°C": "degrés Celsius", + "°F": "degrés Fahrenheit", + "°K": "kelvins", + "°": "degrés", + "€": "euros", + "¢": "cents", + "\$": "dollars", + "£": "livres", + "¥": "yens", + # Below: not in Whisper tokens + #"₩": "wons", + #"₽": "roubles", + #"₹": "roupies", + #"₺": "liras", + #"₪": "shekels", + #"₴": "hryvnias", + #"₮": "tugriks", + #"℃": "degrés Celsius", + #"℉": "degrés Fahrenheit", + # "Ω": "ohms", + # "Ω": "ohms", + # "K": "kelvins", + # "ℓ": "litres", + }, + "en": { + "%": "percent", + "÷": "divided by", + "\*": "times", # ? 
+ "×": "times", + "±": "plus or minus", + "\+": "plus", + "&": "and", + "@": "at", + "m²": "square meters", + "m³": "cubic meters", + "²": "squared", + "³": "cubed", + "¼": "one quarter", + "½": "one half", + "¾": "three quarters", + "§": "section", + "°C": "degrees Celsius", + "°F": "degrees Fahrenheit", + "°K": "kelvins", + "°": "degrees", + "€": "euros", + "¢": "cents", + "\$": "dollars", + "£": "pounds", + "¥": "yens", + } +} + diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 6956161..1e35c91 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -38,3 +38,9 @@ def load_wave_buffer(file_buffer): audio = torch.from_numpy(file_content.data.astype(np.float32)/32768) audio = audio.transpose(0,1) return conform_audio(audio, sample_rate) + +def flatten(l): + """ + flatten a list of lists + """ + return [item for sublist in l for item in sublist] diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index 974e528..2180d0d 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -1,10 +1,11 @@ -import unicodedata from dataclasses import dataclass import torch from stt import logger from .alignment_model import speechbrain_compute_log_probas as compute_log_probas from .alignment_model import speechbrain_get_vocab as get_vocab +from .utils import flatten +from .text_normalize import transliterate, remove_punctuation def compute_alignment(audio, transcript, model): @@ -13,10 +14,12 @@ def compute_alignment(audio, transcript, model): emission = compute_log_probas(model, audio) labels, blank_id = get_vocab(model) labels = labels[:emission.shape[1]] + labels[blank_id] = " " dictionary = {c: i for i, c in enumerate(labels)} - tokens = [loose_get_char_index(dictionary, c, blank_id) for c in transcript] - tokens = [i for i in tokens if i is not None] + tokens = [loose_get_char_index(dictionary, c) for c in transcript] + tokens = flatten(tokens) + transcript = "".join([labels[i][0] for i in tokens]) # Make sure transcript has the same length as tokens (could be different because of transliteration "œ" -> "oe") trellis = get_trellis(emission, tokens, blank_id = blank_id) @@ -28,25 +31,32 @@ def compute_alignment(audio, transcript, model): return labels, emission, trellis, segments, word_segments -def loose_get_char_index(dictionary, c, default): - i = dictionary.get(c, None) +def loose_get_char_index(dictionary, c): + i = dictionary.get(c, None) + if i is None: + # Try with alternative versions of the character + tc = transliterate(c) + other_char = list(set([c.lower(), c.upper(), tc, tc.lower(), tc.upper()])) + for c2 in other_char: + i = dictionary.get(c2, None) + if i is not None: + i = [i] + break + # Some transliterated versions may correspond to multiple characters if i is None: - other_char = list(set([c.lower(), c.upper(), transliterate(c), transliterate(c).lower(), transliterate(c).upper()])) for c2 in other_char: - i = dictionary.get(c2, None) - if i is not None: - break - if i is None: - logger.warn("Cannot find label " + " / ".join(list(set([c] + other_char)))) - i = default - return i - -def transliterate(c): - # Transliterates a character to its closest ASCII equivalent. - # For example, "é" becomes "e". - # This is useful for converting Vietnamese text to ASCII. 
- # See https://stackoverflow.com/a/517974/446579 - return unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii") + if len(c2) > 1: + candidate = [dictionary[c3] for c3 in c2 if c3 in dictionary] + if len(candidate) > 0 and (i is None or len(candidate) > len(i)): + i = candidate + # If still not found + if i is None: + logger.warn("Cannot find label " + " / ".join(list(set([c] + other_char)))) + i = [] # [default] # Could be [] ... + else: + i = [i] + return i + def get_trellis(emission, tokens, blank_id=0, use_max = False): num_frame = emission.size(0) From d09d704c62586a5d6e56a6fdc6064e0ee5006918 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 18:32:05 +0100 Subject: [PATCH 089/172] set logging level at the right place --- http_server/ingress.py | 5 ++++- stt/processing/__init__.py | 3 --- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index 6ccd090..db739d4 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -12,6 +12,7 @@ from swagger import setupSwaggerUI from stt.processing import decode, load_wave_buffer, model, alignment_model +from stt import logger as stt_logger app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False @@ -96,7 +97,9 @@ def server_error(error): parser = createParser() args = parser.parse_args() - logger.setLevel(logging.DEBUG if args.debug else logging.INFO) + logger_level = logging.DEBUG if args.debug else logging.INFO + logger.setLevel(logger_level) + stt_logger.setLevel(logger_level) try: # Setup SwaggerUI if args.swagger_path is not None: diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 19d7f0e..dc3a6a6 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -13,9 +13,6 @@ __all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] -# Set logger level -logger.setLevel(logging.INFO) - # Set device device = os.environ.get("DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") try: From e186990f14f3bc643e8d0a3cd4b9cf2896198433 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 18:44:18 +0100 Subject: [PATCH 090/172] Robustness to corner cases (no transcription, too long transcription from Whisper, emojis...) 
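
A minimal illustration of the new behaviour (the interactive session below is
hypothetical and not part of this patch; it only relies on the remove_emoji
helper and the alignment fallback introduced in the diff):

    >>> from stt.processing.text_normalize import remove_emoji
    >>> remove_emoji("bonjour 😀👍").strip()
    'bonjour'

When the number of aligned word segments does not match the number of words,
the decoder now logs a warning and falls back to the aligner's own word labels
instead of failing on an assertion, and transcripts with more characters than
emission frames are shrunk before alignment.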
--- stt/processing/decoding.py | 47 ++++++++++++++++++++++---------- stt/processing/text_normalize.py | 4 +++ stt/processing/word_alignment.py | 34 +++++++++++++++++++---- 3 files changed, 64 insertions(+), 21 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index dc8d95a..48d1b66 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -8,7 +8,7 @@ from stt import logger from .word_alignment import compute_alignment -from .text_normalize import remove_punctuation, normalize_text +from .text_normalize import remove_punctuation, normalize_text, remove_emoji # TODO: understand and remove this limitations torch.set_num_threads(1) @@ -26,6 +26,7 @@ def decode(audio: torch.Tensor, logprob_threshold: float = -1.0, compression_ratio_threshold: float = 2.4, normalize_text_as_words = False, + remove_punctuation_from_words = False, ) -> dict: """Transcribe the audio data using Whisper with the defined model.""" result = {"text": "", "confidence-score": 0.0, "words": []} @@ -47,18 +48,23 @@ def decode(audio: torch.Tensor, compression_ratio_threshold = compression_ratio_threshold ) - text = whisper_res["text"].strip() + text = whisper_res["text"] + text = remove_emoji(text).strip() if normalize_text_as_words: text = normalize_text(text, language) - text = remove_punctuation(text) + if remove_punctuation_from_words: + text = remove_punctuation(text) segments = whisper_res["segments"] + if language is None: + language = whisper_res["language"] result["text"] = text - result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() + result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 if not with_word_timestamps: if not normalize_text_as_words: text = normalize_text(text, language) - text = remove_punctuation(text) + if remove_punctuation_from_words: + text = remove_punctuation(text) result["words"] = text.split() else: # Compute word timestamps @@ -70,23 +76,34 @@ def decode(audio: torch.Tensor, end = min(max_t, round(segment["end"] * SAMPLE_RATE)) sub_audio = audio[start:end] sub_text = segment["text"] + logger.debug(f"Aligning text: {sub_text}") + sub_text = remove_emoji(sub_text).strip() sub_text = normalize_text(sub_text, language) - sub_text = remove_punctuation(sub_text) + if remove_punctuation_from_words: + sub_text = remove_punctuation(sub_text) if not sub_text: logger.warn(f"Lost text in segment {segment['start']}-{segment['end']}") continue labels, emission, trellis, segments, word_segments = compute_alignment(sub_audio, sub_text, alignment_model) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() - assert len(sub_words) == len(word_segments), \ - f"Unexpected number of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}" - for word, segment in zip(sub_words, word_segments): - result["words"].append({ - "word": word, - "start": segment.start * ratio + offset, - "end": segment.end * ratio + offset, - "conf": segment.score, - }) + if len(sub_words) == len(word_segments): + for word, segment in zip(sub_words, word_segments): + result["words"].append({ + "word": word, + "start": segment.start * ratio + offset, + "end": segment.end * ratio + offset, + "conf": segment.score, + }) + else: + logger.warn(f"Alignment failed. 
Results might differ on some words.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") + for segment in word_segments: + result["words"].append({ + "word": segment.label, + "start": segment.start * ratio + offset, + "end": segment.end * ratio + offset, + "conf": segment.score, + }) return result diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 5ba3358..af9fdbd 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -30,6 +30,10 @@ def transliterate(c): c = re.sub("ß", "ss", c) return unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii") +def remove_emoji(text): + # Remove emojis + return re.sub(r"[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+", "", text) + def normalize_text(text: str, lang: str) -> str: """ Transform digits into characters... """ diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index 2180d0d..34bacf0 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -1,3 +1,6 @@ +""" +source: https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html +""" from dataclasses import dataclass import torch @@ -5,7 +8,7 @@ from .alignment_model import speechbrain_compute_log_probas as compute_log_probas from .alignment_model import speechbrain_get_vocab as get_vocab from .utils import flatten -from .text_normalize import transliterate, remove_punctuation +from .text_normalize import transliterate def compute_alignment(audio, transcript, model): @@ -17,9 +20,24 @@ def compute_alignment(audio, transcript, model): labels[blank_id] = " " dictionary = {c: i for i, c in enumerate(labels)} - tokens = [loose_get_char_index(dictionary, c) for c in transcript] + default = labels.index("-") if "-" in labels else None + tokens = [loose_get_char_index(dictionary, c, default) for c in transcript] tokens = flatten(tokens) - transcript = "".join([labels[i][0] for i in tokens]) # Make sure transcript has the same length as tokens (could be different because of transliteration "œ" -> "oe") + + num_emissions = emission.shape[0] + num_repetitions = count_repetitions(tokens) + if len(tokens) + num_repetitions > num_emissions: + # It will be impossible to find a path... + # It can happen when Whisper is lost in a loop (ex: "Ha ha ha ha ...") + logger.warn(f"Got too many characters from Whisper. 
Shrinking to the first characters.") + tokens = tokens[:num_emissions] + num_repetitions = count_repetitions(tokens) + while len(tokens) + num_repetitions > num_emissions: + tokens = tokens[:-1] + num_repetitions = count_repetitions(tokens) + + # Make sure transcript has the same length as tokens (it could be different just because of transliteration "œ" -> "oe") + transcript = "".join([labels[i][0] for i in tokens]) trellis = get_trellis(emission, tokens, blank_id = blank_id) @@ -31,7 +49,10 @@ def compute_alignment(audio, transcript, model): return labels, emission, trellis, segments, word_segments -def loose_get_char_index(dictionary, c): +def count_repetitions(tokens): + return sum([a==b for a,b in zip(tokens[1:], tokens[:-1])]) + +def loose_get_char_index(dictionary, c, default = None): i = dictionary.get(c, None) if i is None: # Try with alternative versions of the character @@ -52,7 +73,7 @@ def loose_get_char_index(dictionary, c): # If still not found if i is None: logger.warn("Cannot find label " + " / ".join(list(set([c] + other_char)))) - i = [] # [default] # Could be [] ... + i = [default] if default is not None else [] else: i = [i] return i @@ -124,7 +145,8 @@ def backtrack(trellis, emission, tokens, blank_id=0): if j == 0: break else: - raise ValueError("Failed to align") + logger.warn(f"Failed to align {len(tokens)} tokens") + return path return path[::-1] From f2d33d570e9f27b9bfe28b7e6b1ca05585469580 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 18:50:57 +0100 Subject: [PATCH 091/172] remove unused stuff --- http_server/confparser.py | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/http_server/confparser.py b/http_server/confparser.py index 2396d71..d296dbb 100644 --- a/http_server/confparser.py +++ b/http_server/confparser.py @@ -7,24 +7,6 @@ def createParser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser() - # SERVICE - parser.add_argument( - "--service_name", - type=str, - help="Service Name", - default=os.environ.get("SERVICE_NAME", "stt"), - ) - - # MODELS - parser.add_argument("--am_path", type=str, help="Acoustic Model Path", default="/opt/models/AM") - parser.add_argument("--lm_path", type=str, help="Decoding graph path", default="/opt/models/LM") - parser.add_argument( - "--config_path", - type=str, - help="Configuration files path", - default="/opt/config", - ) - # GUNICORN parser.add_argument("--service_port", type=int, help="Service port", default=80) parser.add_argument( From 3b6a839586a56eeba64e131c29751976affcb760 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 19:08:42 +0100 Subject: [PATCH 092/172] Less cryptic warning --- stt/processing/word_alignment.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index 34bacf0..fc16af4 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -72,7 +72,7 @@ def loose_get_char_index(dictionary, c, default = None): i = candidate # If still not found if i is None: - logger.warn("Cannot find label " + " / ".join(list(set([c] + other_char)))) + logger.warn("Character not correctly handled by alignment model: '" + "' / '".join(list(set([c] + other_char))) + "'") i = [default] if default is not None else [] else: i = [i] From b4bcd9a982062e04cf6f89f21686d7430621022b Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 22 Dec 2022 21:27:21 +0100 Subject: [PATCH 093/172] PEP8 formatting --- stt/processing/__init__.py | 26 +++-- 
stt/processing/alignment_model.py | 29 +++--- stt/processing/decoding.py | 54 +++++----- stt/processing/load_model.py | 32 +++--- stt/processing/text_normalize.py | 159 +++++++++++++++++++----------- stt/processing/utils.py | 10 +- stt/processing/word_alignment.py | 45 +++++---- 7 files changed, 220 insertions(+), 135 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index dc3a6a6..81bd784 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -11,33 +11,41 @@ from .load_model import load_whisper_model, load_speechbrain_model -__all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] +__all__ = ["logger", "decode", "model", "alignment_model", + "load_audiofile", "load_wave_buffer"] # Set device -device = os.environ.get("DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") +device = os.environ.get( + "DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") try: device = torch.device(device) except Exception as err: raise Exception("Failed to set device: {}".format(str(err))) from err # Check language -available_languages = [k for k,v in whisper.tokenizer.LANGUAGES.items()] + [None] +available_languages = [ + k for k, v in whisper.tokenizer.LANGUAGES.items()] + [None] if get_default_language() not in available_languages: - raise RuntimeError(f"Langaue {get_default_language()} is not available. Available languages are: {available_languages}") + raise RuntimeError( + f"Language {get_default_language()} is not available. Available languages are: {available_languages}") # Load ASR model model_type = os.environ.get("MODEL", "medium") -logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.isfile(model_type) else 'remote'})...") +logger.info( + f"Loading Whisper model {model_type} ({'local' if os.path.isfile(model_type) else 'remote'})...") start = time() try: - model = load_whisper_model(model_type, device = device) + model = load_whisper_model(model_type, device=device) except Exception as err: - raise Exception("Failed to load transcription model: {}".format(str(err))) from err + raise Exception( + "Failed to load transcription model: {}".format(str(err))) from err logger.info("Model loaded. (t={}s)".format(time() - start)) # Load alignment model -alignment_model_type = os.environ.get("ALIGNMENT_MODEL_TYPE", "/opt/linSTT_speechbrain_fr-FR_v1.0.0") +alignment_model_type = os.environ.get( + "ALIGNMENT_MODEL_TYPE", "/opt/linSTT_speechbrain_fr-FR_v1.0.0") logger.info(f"Loading alignment model...") start = time() -alignment_model = load_speechbrain_model(alignment_model_type, device = device, download_root = "/opt") +alignment_model = load_speechbrain_model( + alignment_model_type, device=device, download_root="/opt") logger.info("Alignment Model loaded. 
(t={}s)".format(time() - start)) diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index f6d52c8..309b7af 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -4,9 +4,11 @@ from stt import logger + def speechbrain_get_vocab(model): tokenizer = model.tokenizer - labels = [{'':" ", ' ⁇ ':""}.get(i,i).lower() for i in tokenizer.decode([[i] for i in range(tokenizer.get_piece_size())])] + labels = [{'': " ", ' ⁇ ': ""}.get(i, i).lower() for i in tokenizer.decode( + [[i] for i in range(tokenizer.get_piece_size())])] blank_id = labels.index("") return labels, blank_id @@ -14,13 +16,15 @@ def speechbrain_get_vocab(model): # The following limit is to handle the corner Case of too long audio segment (which is better to split it to avoid memory overflow). # But it is 2240400 / 16000 Hz ~ 140 seconds, which should not happen for segments detected by Whisper (usually one sentence). # Also note that Whisper works with 30 seconds segment, so there is chance that this limit is never reached. -MAX_LEN = 2240400 +MAX_LEN = 2240400 + -def speechbrain_compute_log_probas(model, audios, max_len = MAX_LEN): +def speechbrain_compute_log_probas(model, audios, max_len=MAX_LEN): # Single audio if not isinstance(audios, list): audios = [audios] - log_probas = speechbrain_compute_log_probas(model, audios, max_len = max_len) + log_probas = speechbrain_compute_log_probas( + model, audios, max_len=max_len) return log_probas[0] # Batch of audios (can occur when max_len is reached) @@ -33,30 +37,33 @@ def speechbrain_compute_log_probas(model, audios, max_len = MAX_LEN): chunks = [] i_audio = [] for a in audios: - chunks.extend([a[i:min(i+max_len, len(a))] for i in range(0, len(a), max_len)]) + chunks.extend([a[i:min(i+max_len, len(a))] + for i in range(0, len(a), max_len)]) i_audio.append(len(chunks)) if len(chunks) > 1: - logger.warning("Audio too long, splitting into {} chunks for alignment".format(len(chunks))) + logger.warning( + "Audio too long, splitting into {} chunks for alignment".format(len(chunks))) # Decode chunks of audio and concatenate results log_probas = [[] for i in range(len(audios))] for i in range(0, len(chunks), batch_size): chunk = chunks[i:min(i+batch_size, len(chunks))] log_probas_tmp = speechbrain_compute_log_probas(model, chunk) - for j in range(i,i+len(chunk)): + for j in range(i, i+len(chunk)): k = 0 while j >= i_audio[k]: k += 1 log_probas[k].append(log_probas_tmp[j-i]) - log_probas = [torch.cat(p, dim = 0) for p in log_probas] - log_probas, wav_lens = pack_sequences(log_probas, device = model.device) + log_probas = [torch.cat(p, dim=0) for p in log_probas] + log_probas, wav_lens = pack_sequences(log_probas, device=model.device) else: - batch, wav_lens = pack_sequences(audios, device = model.device) + batch, wav_lens = pack_sequences(audios, device=model.device) log_probas = model.forward(batch, wav_lens) log_probas = torch.log_softmax(log_probas, dim=-1) return log_probas -def pack_sequences(tensors, device = "cpu"): + +def pack_sequences(tensors, device="cpu"): if len(tensors) == 1: return tensors[0].unsqueeze(0).to(device), torch.Tensor([1.]).to(device) tensor = rnn_utils.pad_sequence(tensors, batch_first=True) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 48d1b66..f257fb1 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -13,21 +13,23 @@ # TODO: understand and remove this limitations torch.set_num_threads(1) + def get_default_language(): return 
os.environ.get("LANGUAGE", None) + def decode(audio: torch.Tensor, - model: whisper.model.Whisper, - alignment_model: "Any", - with_word_timestamps: bool, - language: str = None, - beam_size: int = None, - no_speech_threshold: float = 0.6, - logprob_threshold: float = -1.0, - compression_ratio_threshold: float = 2.4, - normalize_text_as_words = False, - remove_punctuation_from_words = False, - ) -> dict: + model: whisper.model.Whisper, + alignment_model: "Any", + with_word_timestamps: bool, + language: str = None, + beam_size: int = None, + no_speech_threshold: float = 0.6, + logprob_threshold: float = -1.0, + compression_ratio_threshold: float = 2.4, + normalize_text_as_words=False, + remove_punctuation_from_words=False, + ) -> dict: """Transcribe the audio data using Whisper with the defined model.""" result = {"text": "", "confidence-score": 0.0, "words": []} @@ -39,14 +41,14 @@ def decode(audio: torch.Tensor, logger.info(f"Transcribing audio with language {language}...") whisper_res = model.transcribe(audio, - language = language, - fp16 = fp16, - temperature = 0.0, # For deterministic results - beam_size = beam_size, - no_speech_threshold = no_speech_threshold, - logprob_threshold = logprob_threshold, - compression_ratio_threshold = compression_ratio_threshold - ) + language=language, + fp16=fp16, + temperature=0.0, # For deterministic results + beam_size=beam_size, + no_speech_threshold=no_speech_threshold, + logprob_threshold=logprob_threshold, + compression_ratio_threshold=compression_ratio_threshold + ) text = whisper_res["text"] text = remove_emoji(text).strip() @@ -59,7 +61,8 @@ def decode(audio: torch.Tensor, language = whisper_res["language"] result["text"] = text - result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 + result["confidence-score"] = np.exp(np.array([r["avg_logprob"] + for r in segments])).mean() if len(segments) else 0.0 if not with_word_timestamps: if not normalize_text_as_words: text = normalize_text(text, language) @@ -82,9 +85,11 @@ def decode(audio: torch.Tensor, if remove_punctuation_from_words: sub_text = remove_punctuation(sub_text) if not sub_text: - logger.warn(f"Lost text in segment {segment['start']}-{segment['end']}") + logger.warn( + f"Lost text in segment {segment['start']}-{segment['end']}") continue - labels, emission, trellis, segments, word_segments = compute_alignment(sub_audio, sub_text, alignment_model) + labels, emission, trellis, segments, word_segments = compute_alignment( + sub_audio, sub_text, alignment_model) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() if len(sub_words) == len(word_segments): @@ -96,7 +101,8 @@ def decode(audio: torch.Tensor, "conf": segment.score, }) else: - logger.warn(f"Alignment failed. Results might differ on some words.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") + logger.warn( + f"Alignment failed. 
Results might differ on some words.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") for segment in word_segments: result["words"].append({ "word": segment.label, @@ -106,5 +112,3 @@ def decode(audio: torch.Tensor, }) return result - - diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index da5d98c..27fdf9a 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -5,31 +5,39 @@ import huggingface_hub import speechbrain as sb -def load_whisper_model(model_type_or_file, device = "cpu", download_root = "/opt"): - model = whisper.load_model(model_type_or_file, device = device, download_root = os.path.join(download_root, "whisper")) +def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): + + model = whisper.load_model(model_type_or_file, device=device, + download_root=os.path.join(download_root, "whisper")) model.eval() model.requires_grad_(False) return model -def load_speechbrain_model(source, device = "cpu", download_root = "/opt"): - + +def load_speechbrain_model(source, device="cpu", download_root="/opt"): + if os.path.isdir(source): yaml_file = os.path.join(source, "hyperparams.yaml") - assert os.path.isfile(yaml_file), f"Hyperparams file {yaml_file} not found" + assert os.path.isfile( + yaml_file), f"Hyperparams file {yaml_file} not found" else: try: - yaml_file = huggingface_hub.hf_hub_download(repo_id=source, filename="hyperparams.yaml", cache_dir = os.path.join(download_root, "huggingface/hub")) + yaml_file = huggingface_hub.hf_hub_download( + repo_id=source, filename="hyperparams.yaml", cache_dir=os.path.join(download_root, "huggingface/hub")) except requests.exceptions.HTTPError: yaml_file = None - overrides = make_yaml_overrides(yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) + overrides = make_yaml_overrides( + yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) savedir = os.path.join(download_root, "speechbrain") try: - model = sb.pretrained.EncoderASR.from_hparams(source = source, run_opts= {"device": device}, savedir = savedir, overrides = overrides) + model = sb.pretrained.EncoderASR.from_hparams( + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) except ValueError: - model = sb.pretrained.EncoderDecoderASR.from_hparams(source = source, run_opts= {"device": device}, savedir = savedir, overrides = overrides) + model = sb.pretrained.EncoderDecoderASR.from_hparams( + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) model.train(False) model.requires_grad_(False) @@ -42,7 +50,8 @@ def make_yaml_overrides(yaml_file, key_values): yaml_file: path to yaml file key_values: dict of key values to override """ - if yaml_file is None: return None + if yaml_file is None: + return None override = {} with open(yaml_file, "r") as f: @@ -58,5 +67,6 @@ def make_yaml_overrides(yaml_file, key_values): elif ":" in line: child = line.strip().split(":")[0].strip() if child in key_values: - override[parent] = override.get(parent, {}) | {child: key_values[child]} + override[parent] = override.get(parent, {}) | { + child: key_values[child]} return override diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index af9fdbd..7e2f6fb 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -1,25 +1,30 @@ import math import re -#import string +# import string import unicodedata from num2words 
import num2words from stt import logger from .utils import flatten -_punctuations = '!"#$%&()*+,/:;<=>?@[\\]^_`{|}~«»¿' # string.punctuation, plus Whisper specific "«»¿", minus apostrophe "'", dash "-", and dot "." (which will be processed as special) +# string.punctuation, plus Whisper specific "«»¿", minus apostrophe "'", dash "-", and dot "." (which will be processed as special) +_punctuations = '!"#$%&()*+,/:;<=>?@[\\]^_`{|}~«»¿' + def remove_punctuation(text: str) -> str: text = text.translate(str.maketrans("", "", _punctuations)) # We don't remove dots inside words (e.g. "ab@gmail.com") - text = re.sub(r"\.(\s)",r"\1", text+" ").strip() + text = re.sub(r"\.(\s)", r"\1", text+" ").strip() return collapse_whitespace(text) + _whitespace_re = re.compile(r'[^\S\r\n]+') + def collapse_whitespace(text): return re.sub(_whitespace_re, ' ', text).strip() + def transliterate(c): # Transliterates a character to its closest ASCII equivalent. # Example: transliterate("à ß œ fl") = "a ss oe fl" @@ -30,6 +35,7 @@ def transliterate(c): c = re.sub("ß", "ss", c) return unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii") + def remove_emoji(text): # Remove emojis return re.sub(r"[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+", "", text) @@ -42,43 +48,54 @@ def normalize_text(text: str, lang: str) -> str: coma = "," if lang in ["fr"] else "\." for c in _currencies: if c in text: - text = re.sub(r"\b(\d+)" + coma + r"(\d+)\s*" + c, r"\1 " + c + r" \2", text) - + text = re.sub(r"\b(\d+)" + coma + r"(\d+)\s*" + + c, r"\1 " + c + r" \2", text) + # Roman digits if re.search(r"[IVX]", text): if lang == "en": - digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|st|nd|rd|th)?\b", text) + digits = re.findall( + r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|st|nd|rd|th)?\b", text) digits = ["".join(d) for d in digits] elif lang == "fr": - digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text) + digits = re.findall( + r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text) digits = ["".join(d) for d in digits] else: digits = [] if digits: - digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + digits = sorted(list(set(digits)), reverse=True, + key=lambda x: (len(x), x)) for s in digits: filtered = re.sub("[a-z]", "", s) ordinal = filtered != s digit = roman_to_decimal(filtered) - v = undigit(str(digit), lang=lang, to= "ordinal" if ordinal else "cardinal") + v = undigit(str(digit), lang=lang, + to="ordinal" if ordinal else "cardinal") text = re.sub(r"\b" + s + r"\b", v, text) # Ordinal digits if lang == "en": - digits = re.findall(r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:º|th)\b", text) + digits = re.findall( + r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:º|th)\b", text) elif lang == "fr": - digits = re.findall(r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) + digits = re.findall( + r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) else: - logger.warn(f"Language {lang} not supported for normalization. Some words might be mis-localized.") + logger.warn( + f"Language {lang} not supported for normalization. 
Some words might be mis-localized.") digits = [] if digits: - digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) + digits = sorted(list(set(digits)), reverse=True, + key=lambda x: (len(x), x)) for digit in digits: - word = undigit(re.findall(r"\d+", digit)[0], to= "ordinal", lang = lang) + word = undigit(re.findall(r"\d+", digit) + [0], to="ordinal", lang=lang) text = re.sub(r'\b'+str(digit)+r'\b', word, text) # Cardinal digits - digits = re.findall(r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)",text) + digits = re.findall( + r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)", text) digits = list(map(lambda s: s.strip(r"[/ ]"), digits)) digits = list(set(digits)) digits = digits + flatten([c.split() for c in digits if " " in c]) @@ -90,8 +107,8 @@ def normalize_text(text: str, lang: str) -> str: continue numslash = len(re.findall("/", digitf)) if numslash == 0: - word = undigit(digitf, lang = lang) - elif numslash == 1: # Fraction or date + word = undigit(digitf, lang=lang) + elif numslash == 1: # Fraction or date i = digitf.index("/") is_date = False if len(digitf[i+1:]) == 2: @@ -99,19 +116,22 @@ def normalize_text(text: str, lang: str) -> str: first = int(digitf[:i]) second = int(digitf[i+1:]) is_date = first > 0 and first < 32 and second > 0 and second < 13 - except: pass + except: + pass if is_date: first = digitf[:i].lstrip("0") - use_ordinal = (lang == "fr" and first == "1") or (lang != "fr" and first[-1] in ["1", "2", "3"]) - first = undigit(first, lang = lang, to="ordinal" if use_ordinal else "cardinal") + use_ordinal = (lang == "fr" and first == "1") or ( + lang != "fr" and first[-1] in ["1", "2", "3"]) + first = undigit(first, lang=lang, + to="ordinal" if use_ordinal else "cardinal") second = _int_to_month[second] else: - first = undigit(digitf[:i], lang = lang) - second = undigit(digitf[i+1:], to="denominator", lang = lang) + first = undigit(digitf[:i], lang=lang) + second = undigit(digitf[i+1:], to="denominator", lang=lang) if float(digitf[:i]) > 2. 
and second[-1] != "s": second += "s" word = first + " " + second - elif numslash == 2: # Maybe a date + elif numslash == 2: # Maybe a date i1 = digitf.index("/") i2 = digitf.index("/", i1+1) is_date = False @@ -121,18 +141,24 @@ def normalize_text(text: str, lang: str) -> str: second = int(digitf[i1+1:i2]) third = int(digitf[i2+1:]) is_date = first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 - except: pass - third = undigit(digitf[i2+1:], lang = lang) + except: + pass + third = undigit(digitf[i2+1:], lang=lang) if is_date: first = digitf[:i].lstrip("0") - use_ordinal = (lang == "fr" and first == "1") or (lang != "fr" and first[-1] in ["1", "2", "3"]) - first = undigit(first, lang = lang, to="ordinal" if use_ordinal else "cardinal") - second = _int_to_month.get(lang, {}).get(int(digitf[i1+1:i2]), digitf[i1+1:i2]) + use_ordinal = (lang == "fr" and first == "1") or ( + lang != "fr" and first[-1] in ["1", "2", "3"]) + first = undigit(first, lang=lang, + to="ordinal" if use_ordinal else "cardinal") + second = _int_to_month.get(lang, {}).get( + int(digitf[i1+1:i2]), digitf[i1+1:i2]) word = " ".join([first, second, third]) else: - word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + word = " / ".join([undigit(s, lang=lang) + for s in digitf.split('/')]) else: - word = " / ".join([undigit(s, lang = lang) for s in digitf.split('/')]) + word = " / ".join([undigit(s, lang=lang) + for s in digitf.split('/')]) if " " in digit: text = re.sub(r'\b'+str(digit)+r'\b', " "+word+" ", text) else: @@ -145,37 +171,52 @@ def normalize_text(text: str, lang: str) -> str: return collapse_whitespace(text) + def undigit(str, lang, to="cardinal"): - str = re.sub(" ","", str) + str = re.sub(" ", "", str) if to == "denominator": - assert lang == "fr" - if str == "2": return "demi" - if str == "3": return "tiers" - if str == "4": return "quart" + if lang == "fr": + if str == "2": + return "demi" + if str == "3": + return "tiers" + if str == "4": + return "quart" + elif lang == "en": + if str == "2": + return "half" + if str == "4": + return "quarter" + elif lang == "es": + if str == "2": + return "mitad" + if str == "3": + return "tercio" to = "ordinal" if str.startswith("0") and to == "cardinal": numZeros = len(re.findall(r"0+", str)[0]) if numZeros < len(str): - return numZeros * (my_num2words(0, lang=lang, to="cardinal")+" ") + my_num2words(float(str), lang=lang, to=to) - return my_num2words(float(str), lang=lang, to=to) + return numZeros * (robust_num2words(0, lang=lang)+" ") + robust_num2words(float(str), lang=lang, to=to) + return robust_num2words(float(str), lang=lang, to=to) -def my_num2words(x, lang, to = "cardinal", orig = ""): +def robust_num2words(x, lang, to="cardinal", orig=""): """ Bugfix for num2words """ try: + res = num2words(x, lang=lang, to=to) if lang == "fr" and to == "ordinal": - return num2words(x, lang=lang, to=to).replace("vingtsième", "vingtième") - else: - return num2words(x, lang=lang, to=to) + res = res.replace("vingtsième", "vingtième") + return res except OverflowError: - if x == math.inf: # ! - return " ".join(my_num2words(xi, lang=lang, to=to) for xi in orig) - if x == -math.inf: # ! - return "moins " + my_num2words(-x, lang=lang, to=to, orig=orig.replace("-" , "")) + if x == math.inf: # ! + return " ".join(robust_num2words(xi, lang=lang, to=to) for xi in orig) + if x == -math.inf: # ! 
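# The parsed value overflowed to negative infinity: verbalize the positive
# part recursively and prepend "moins" (French for "minus") -- note that this
# fallback wording is French-specific.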
+ return "moins " + robust_num2words(-x, lang=lang, to=to, orig=orig.replace("-", "")) # TODO: print a warning - return my_num2words(x//10, lang=lang, to=to) + return robust_num2words(x//10, lang=lang, to=to) + def roman_to_decimal(str): def value(r): @@ -214,6 +255,7 @@ def value(r): i = i + 1 return res + _int_to_month = { "fr": { 1: "janvier", @@ -251,10 +293,10 @@ def value(r): "fr": { "%": "pour cents", "÷": "divisé par", - "\*": "fois", # ? + "\*": "fois", # ? "×": "fois", "±": "plus ou moins", - "\+": "plus", + "\+": "plus", "&": "et", "@": "arobase", "m²": "mètres carrés", @@ -275,15 +317,15 @@ def value(r): "£": "livres", "¥": "yens", # Below: not in Whisper tokens - #"₩": "wons", - #"₽": "roubles", - #"₹": "roupies", - #"₺": "liras", - #"₪": "shekels", - #"₴": "hryvnias", - #"₮": "tugriks", - #"℃": "degrés Celsius", - #"℉": "degrés Fahrenheit", + # "₩": "wons", + # "₽": "roubles", + # "₹": "roupies", + # "₺": "liras", + # "₪": "shekels", + # "₴": "hryvnias", + # "₮": "tugriks", + # "℃": "degrés Celsius", + # "℉": "degrés Fahrenheit", # "Ω": "ohms", # "Ω": "ohms", # "K": "kelvins", @@ -292,7 +334,7 @@ def value(r): "en": { "%": "percent", "÷": "divided by", - "\*": "times", # ? + "\*": "times", # ? "×": "times", "±": "plus or minus", "\+": "plus", @@ -317,4 +359,3 @@ def value(r): "¥": "yens", } } - diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 1e35c91..5ff706a 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -6,10 +6,12 @@ import torchaudio import whisper -def conform_audio(audio, sample_rate = 16_000): + +def conform_audio(audio, sample_rate=16_000): if sample_rate != whisper.audio.SAMPLE_RATE: # Down or Up sample to the right sampling rate - audio = torchaudio.transforms.Resample(sample_rate, whisper.audio.SAMPLE_RATE)(audio) + audio = torchaudio.transforms.Resample( + sample_rate, whisper.audio.SAMPLE_RATE)(audio) if audio.shape[0] > 1: # Stereo to mono # audio = torchaudio.transforms.DownmixMono()(audio, channels_first = True) @@ -18,6 +20,7 @@ def conform_audio(audio, sample_rate = 16_000): audio = audio.squeeze(0) return audio + def load_audiofile(path): if not os.path.isfile(path): raise RuntimeError("File not found: %s" % path) @@ -36,9 +39,10 @@ def load_wave_buffer(file_buffer): file_content = wavio.read(file_buffer_io) sample_rate = file_content.rate audio = torch.from_numpy(file_content.data.astype(np.float32)/32768) - audio = audio.transpose(0,1) + audio = audio.transpose(0, 1) return conform_audio(audio, sample_rate) + def flatten(l): """ flatten a list of lists diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index fc16af4..7dd0c8f 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -29,7 +29,8 @@ def compute_alignment(audio, transcript, model): if len(tokens) + num_repetitions > num_emissions: # It will be impossible to find a path... # It can happen when Whisper is lost in a loop (ex: "Ha ha ha ha ...") - logger.warn(f"Got too many characters from Whisper. Shrinking to the first characters.") + logger.warn( + f"Got too many characters from Whisper. 
Shrinking to the first characters.") tokens = tokens[:num_emissions] num_repetitions = count_repetitions(tokens) while len(tokens) + num_repetitions > num_emissions: @@ -39,25 +40,28 @@ def compute_alignment(audio, transcript, model): # Make sure transcript has the same length as tokens (it could be different just because of transliteration "œ" -> "oe") transcript = "".join([labels[i][0] for i in tokens]) - trellis = get_trellis(emission, tokens, blank_id = blank_id) + trellis = get_trellis(emission, tokens, blank_id=blank_id) + + path = backtrack(trellis, emission, tokens, blank_id=blank_id) - path = backtrack(trellis, emission, tokens, blank_id = blank_id) - segments = merge_repeats(transcript, path) word_segments = merge_words(segments) return labels, emission, trellis, segments, word_segments + def count_repetitions(tokens): - return sum([a==b for a,b in zip(tokens[1:], tokens[:-1])]) + return sum([a == b for a, b in zip(tokens[1:], tokens[:-1])]) + -def loose_get_char_index(dictionary, c, default = None): +def loose_get_char_index(dictionary, c, default=None): i = dictionary.get(c, None) if i is None: # Try with alternative versions of the character tc = transliterate(c) - other_char = list(set([c.lower(), c.upper(), tc, tc.lower(), tc.upper()])) + other_char = list( + set([c.lower(), c.upper(), tc, tc.lower(), tc.upper()])) for c2 in other_char: i = dictionary.get(c2, None) if i is not None: @@ -67,19 +71,21 @@ def loose_get_char_index(dictionary, c, default = None): if i is None: for c2 in other_char: if len(c2) > 1: - candidate = [dictionary[c3] for c3 in c2 if c3 in dictionary] + candidate = [dictionary[c3] + for c3 in c2 if c3 in dictionary] if len(candidate) > 0 and (i is None or len(candidate) > len(i)): i = candidate # If still not found if i is None: - logger.warn("Character not correctly handled by alignment model: '" + "' / '".join(list(set([c] + other_char))) + "'") + logger.warn("Character not correctly handled by alignment model: '" + + "' / '".join(list(set([c] + other_char))) + "'") i = [default] if default is not None else [] else: i = [i] return i -def get_trellis(emission, tokens, blank_id=0, use_max = False): +def get_trellis(emission, tokens, blank_id=0, use_max=False): num_frame = emission.size(0) num_tokens = len(tokens) @@ -97,15 +103,16 @@ def get_trellis(emission, tokens, blank_id=0, use_max = False): # Score for staying at the same token trellis[t, 1:] + emission[t, blank_id], torch.maximum(trellis[t, 1:] + emission[t, tokens], - # Score for changing to the next token - trellis[t, :-1] + emission[t, tokens]) + # Score for changing to the next token + trellis[t, :-1] + emission[t, tokens]) ) if use_max else torch.logaddexp( trellis[t, 1:] + emission[t, blank_id], torch.logaddexp(trellis[t, 1:] + emission[t, tokens], - trellis[t, :-1] + emission[t, tokens]) + trellis[t, :-1] + emission[t, tokens]) ) return trellis + @dataclass class Point: token_index: int @@ -135,7 +142,8 @@ def backtrack(trellis, emission, tokens, blank_id=0): changed = trellis[t - 1, j - 1] + emission[t - 1, tokens[j - 1]] # 2. Store the path with frame-wise probability. - prob = emission[t - 1, tokens[j - 1] if changed > stayed else 0].exp().item() + prob = emission[t - 1, tokens[j - 1] + if changed > stayed else 0].exp().item() # Return token index and time index in non-trellis coordinate. 
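# Each step walks one frame back in time (t -> t-1); the token index j is only
# decremented when the "changed" transition scores higher than staying on the
# same token, i.e. when this frame is taken as the emission of token j-1.
# `prob` keeps the frame-wise probability of that token (or of the symbol at
# index 0, treated as the blank, when the path stayed in place).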
path.append(Point(j - 1, t - 1, prob)) @@ -184,6 +192,7 @@ def merge_repeats(transcript, path): i1 = i2 return segments + def merge_words(segments, separator=" "): words = [] i1, i2 = 0, 0 @@ -192,10 +201,12 @@ def merge_words(segments, separator=" "): if i1 != i2: segs = segments[i1:i2] word = "".join([seg.label for seg in segs]) - score = sum(seg.score * seg.length for seg in segs) / sum(seg.length for seg in segs) - words.append(Segment(word, segments[i1].start, segments[i2 - 1].end, score)) + score = sum(seg.score * seg.length for seg in segs) / \ + sum(seg.length for seg in segs) + words.append( + Segment(word, segments[i1].start, segments[i2 - 1].end, score)) i1 = i2 + 1 i2 = i1 else: i2 += 1 - return words \ No newline at end of file + return words From f1c3aaa60d49b393882569f7d7f321d5cfffadeb Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 08:42:47 +0100 Subject: [PATCH 094/172] Support more model types for word alignment (transformers, torchaudio) --- requirements.txt | 1 + stt/processing/__init__.py | 23 ++--- stt/processing/alignment_model.py | 143 ++++++++++++++++++++++++++++-- stt/processing/decoding.py | 7 +- stt/processing/load_model.py | 67 ++++++++++++++ stt/processing/text_normalize.py | 2 + stt/processing/word_alignment.py | 6 +- 7 files changed, 224 insertions(+), 25 deletions(-) diff --git a/requirements.txt b/requirements.txt index a93dc9f..c4e4fd4 100644 --- a/requirements.txt +++ b/requirements.txt @@ -8,6 +8,7 @@ num2words pyyaml>=5.4.1 requests>=2.26.0 speechbrain +transformers wavio>=0.0.4 websockets git+https://github.com/openai/whisper.git \ No newline at end of file diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 81bd784..70b6695 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -6,14 +6,17 @@ import whisper from stt import logger -from stt.processing.decoding import decode, get_default_language +from stt.processing.decoding import decode, get_language from stt.processing.utils import load_wave_buffer, load_audiofile -from .load_model import load_whisper_model, load_speechbrain_model +from .load_model import load_whisper_model, load_alignment_model, get_alignment_model, get_model_type __all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] +# Set informative log +logger.setLevel(logging.INFO) + # Set device device = os.environ.get( "DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") @@ -23,11 +26,12 @@ raise Exception("Failed to set device: {}".format(str(err))) from err # Check language +language = get_language() available_languages = [ k for k, v in whisper.tokenizer.LANGUAGES.items()] + [None] -if get_default_language() not in available_languages: +if language not in available_languages: raise RuntimeError( - f"Language {get_default_language()} is not available. Available languages are: {available_languages}") + f"Language {get_language()} is not available. Available languages are: {available_languages}") # Load ASR model model_type = os.environ.get("MODEL", "medium") @@ -42,10 +46,9 @@ logger.info("Model loaded. 
(t={}s)".format(time() - start)) # Load alignment model -alignment_model_type = os.environ.get( - "ALIGNMENT_MODEL_TYPE", "/opt/linSTT_speechbrain_fr-FR_v1.0.0") -logger.info(f"Loading alignment model...") +alignment_model_name = get_alignment_model(language) +logger.info(f"Loading alignment model {alignment_model_name} ({'local' if os.path.isfile(alignment_model_name) else 'remote'})...") start = time() -alignment_model = load_speechbrain_model( - alignment_model_type, device=device, download_root="/opt") -logger.info("Alignment Model loaded. (t={}s)".format(time() - start)) +alignment_model = load_alignment_model( + alignment_model_name, device=device, download_root="/opt") +logger.info(f"Alignment Model of type {get_model_type(alignment_model)} loaded. (t={time() - start}s)") diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index 309b7af..b6ef333 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -3,32 +3,90 @@ import torch.nn.utils.rnn as rnn_utils from stt import logger +from .load_model import get_model_type +import whisper -def speechbrain_get_vocab(model): +################################################################################ +# Get list of labes (and blank_id) from model + + +def get_vocab(model): + type = get_model_type(model) + if type == "speechbrain": + labels, blank_id = get_vocab_speechbrain(model) + elif type == "transformers": + labels, blank_id = get_vocab_transformers(model) + else: + labels, blank_id = get_vocab_torchaudio(model) + assert isinstance(labels, list) and min( + [isinstance(l, str) for l in labels]), "labels must be a list of strings" + return norm_labels(labels, blank_id), blank_id + + +def get_vocab_speechbrain(model): tokenizer = model.tokenizer - labels = [{'': " ", ' ⁇ ': ""}.get(i, i).lower() for i in tokenizer.decode( + # Is this general enough? + labels = [{'': " ", ' ⁇ ': ""}.get(i, i) for i in tokenizer.decode( [[i] for i in range(tokenizer.get_piece_size())])] blank_id = labels.index("") return labels, blank_id +def get_vocab_torchaudio(model_and_labels): + _, labels = model_and_labels + labels = list(labels) + # WTF : blank_id = labels.index("-") ...? Is it general enough? + blank_id = 0 + return labels, blank_id + + +def get_vocab_transformers(model_and_processor): + _, processor = model_and_processor + labels_dict = dict((v, k) + for k, v in processor.tokenizer.get_vocab().items()) + labels = [labels_dict[i] for i in range(len(labels_dict))] + blank_id = labels.index("") + return labels, blank_id + + +def norm_labels(labels, blank_id): + labels[blank_id] = "" + return [l if l != "|" else " " for l in labels] + +################################################################################ +# Compute log-probabilities from model + + # The following limit is to handle the corner Case of too long audio segment (which is better to split it to avoid memory overflow). # But it is 2240400 / 16000 Hz ~ 140 seconds, which should not happen for segments detected by Whisper (usually one sentence). # Also note that Whisper works with 30 seconds segment, so there is chance that this limit is never reached. 
MAX_LEN = 2240400 -def speechbrain_compute_log_probas(model, audios, max_len=MAX_LEN): +def compute_logprobas(model, audios, max_len=MAX_LEN): + # Single audio if not isinstance(audios, list): audios = [audios] - log_probas = speechbrain_compute_log_probas( - model, audios, max_len=max_len) - return log_probas[0] + logits = compute_logprobas(model, audios, max_len=max_len) + return logits[0] # Batch of audios (can occur when max_len is reached) assert len(audios) > 0, "audios must be a non-empty list" + + type = get_model_type(model) + if type == "speechbrain": + logits = compute_logits_speechbrain(model, audios, max_len) + elif type == "transformers": + logits = compute_logits_transformers(model, audios, max_len) + else: + logits = compute_logits_torchaudio(model, audios, max_len) + + return torch.log_softmax(logits, dim=-1) + + +def compute_logits_speechbrain(model, audios, max_len): if not isinstance(audios[0], torch.Tensor): audios = [torch.from_numpy(a) for a in audios] if max([len(a) for a in audios]) > max_len: @@ -47,7 +105,7 @@ def speechbrain_compute_log_probas(model, audios, max_len=MAX_LEN): log_probas = [[] for i in range(len(audios))] for i in range(0, len(chunks), batch_size): chunk = chunks[i:min(i+batch_size, len(chunks))] - log_probas_tmp = speechbrain_compute_log_probas(model, chunk) + log_probas_tmp = compute_logits_speechbrain(model, chunk) for j in range(i, i+len(chunk)): k = 0 while j >= i_audio[k]: @@ -59,8 +117,7 @@ def speechbrain_compute_log_probas(model, audios, max_len=MAX_LEN): batch, wav_lens = pack_sequences(audios, device=model.device) log_probas = model.forward(batch, wav_lens) - log_probas = torch.log_softmax(log_probas, dim=-1) - return log_probas + return log_probas.cpu().detach() def pack_sequences(tensors, device="cpu"): @@ -71,3 +128,71 @@ def pack_sequences(tensors, device="cpu"): maxwav_lens = max(wav_lens) wav_lens = torch.Tensor([l/maxwav_lens for l in wav_lens]) return tensor.to(device), wav_lens.to(device) + + +def compute_logits_transformers(model_and_processor, audios, max_len): + + model, processor = model_and_processor + + # can be different from processor.feature_extractor.sampling_rate + sample_rate = whisper.audio.SAMPLE_RATE + device = model.device + + audios = [audio.numpy() for audio in audios] + processed_batch = processor(audios, sampling_rate=sample_rate) + + padded_batch = processor.pad( + processed_batch, + padding=True, + max_length=None, + pad_to_multiple_of=None, + return_tensors="pt", + ) + + l = padded_batch.input_values.shape[1] + + with torch.inference_mode(): + if l > max_len: + # Split batch in smaller chunks + logger.warning( + "Audio too long, splitting into {} chunks for alignment".format(math.ceil(l / max_len))) + logits = [] + for i in range(0, l, max_len): + j = min(i + max_len, l) + logits.append(model(padded_batch.input_values[:, i:j].to(device), + attention_mask=padded_batch.attention_mask[:, i:j].to(device)).logits) + logits = torch.cat(logits, dim=1) + else: + logits = model(padded_batch.input_values.to(device), + attention_mask=padded_batch.attention_mask.to(device)).logits + + return logits.cpu().detach() + + +def compute_logits_torchaudio(model_and_labels, audios, max_len): + # TODO: factorize with compute_logits_transformers, and add support for batch of audios + + model, _ = model_and_labels + + all_logits = [] + + with torch.inference_mode(): + for audio in audios: + l = len(audio) + if l > max_len: + # Split audio in smaller chunks + logger.warning( + "Audio too long, splitting into {} chunks for 
alignment".format(math.ceil(l / max_len))) + logits = [] + for i in range(0, l, max_len): + j = min(i + max_len, l) + logits.append(model(audio[i:j].unsqueeze(0))[0]) + logits = torch.cat(logits, dim=1) + else: + logits, _ = model(audio.unsqueeze(0)) + + all_logits.append(logits.cpu().detach()) + + assert len(all_logits) == 1 # TODO: support batch of audios + + return all_logits[0] diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index f257fb1..bb56d0f 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -9,12 +9,13 @@ from stt import logger from .word_alignment import compute_alignment from .text_normalize import remove_punctuation, normalize_text, remove_emoji +from .load_model import load_alignment_model, get_alignment_model # TODO: understand and remove this limitations torch.set_num_threads(1) -def get_default_language(): +def get_language(): return os.environ.get("LANGUAGE", None) @@ -36,7 +37,7 @@ def decode(audio: torch.Tensor, fp16 = model.device != torch.device("cpu") if language is None: - language = get_default_language() + language = get_language() logger.info(f"Transcribing audio with language {language}...") @@ -59,6 +60,8 @@ def decode(audio: torch.Tensor, segments = whisper_res["segments"] if language is None: language = whisper_res["language"] + if alignment_model is None: + alignment_model = load_alignment_model(get_alignment_model(language), device=model.device) result["text"] = text result["confidence-score"] = np.exp(np.array([r["avg_logprob"] diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 27fdf9a..4add720 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -4,6 +4,29 @@ import requests import huggingface_hub import speechbrain as sb +import transformers +import torchaudio + +# Source: https://github.com/m-bain/whisperX (in whisperx/transcribe.py) +ALIGNMENT_MODELS = { + "fr": "/opt/linSTT_speechbrain_fr-FR_v1.0.0", + # "fr": "VOXPOPULI_ASR_BASE_10K_FR", + "en": "WAV2VEC2_ASR_BASE_960H", + # "en": "jonatasgrosman/wav2vec2-large-xlsr-53-english", + "de": "VOXPOPULI_ASR_BASE_10K_DE", + "es": "VOXPOPULI_ASR_BASE_10K_ES", + "it": "VOXPOPULI_ASR_BASE_10K_IT", + "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", + "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", + "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", + "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", +} + + +def get_alignment_model(language): + source = os.environ.get("ALIGNMENT_MODEL") + if not source: + return ALIGNMENT_MODELS.get(language, ALIGNMENT_MODELS["fr"]) def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): @@ -16,6 +39,20 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): return model +def load_alignment_model(source, device="cpu", download_root="/opt"): + + if source in torchaudio.pipelines.__all__: + return load_torchaudio_model(source, device=device, download_root=download_root) + try: + return load_transformers_model(source, device=device, download_root=download_root) + except Exception as err1: + try: + return load_speechbrain_model(source, device=device, download_root=download_root) + except Exception as err2: + raise Exception( + f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 + + def load_speechbrain_model(source, device="cpu", download_root="/opt"): if os.path.isdir(source): @@ -44,6 +81,36 @@ def load_speechbrain_model(source, device="cpu", 
download_root="/opt"): return model +def load_transformers_model(source, device="cpu", download_root="/opt"): + + model = transformers.Wav2Vec2ForCTC.from_pretrained(source).to(device) + processor = transformers.Wav2Vec2Processor.from_pretrained(source) + + model.eval() + model.requires_grad_(False) + return model, processor + + +def load_torchaudio_model(source, device="cpu", download_root="/opt"): + + bundle = torchaudio.pipelines.__dict__[source] + model = bundle.get_model().to(device) + labels = bundle.get_labels() + + model.eval() + model.requires_grad_(False) + return model, labels + + +def get_model_type(model): + if not isinstance(model, tuple): + return "speechbrain" + assert len(model) == 2, "Invalid model type" + if isinstance(model[0], transformers.Wav2Vec2ForCTC): + return "transformers" + return "torchaudio" + + def make_yaml_overrides(yaml_file, key_values): """ return a dictionary of overrides to be used with speechbrain (hyperyaml files) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 7e2f6fb..6065eaa 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -169,6 +169,8 @@ def normalize_text(text: str, lang: str) -> str: for k, v in symbol_table.items(): text = re.sub(k, " "+v+" ", text) + text = re.sub(r" \.",".", text) + return collapse_whitespace(text) diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index 7dd0c8f..4e32bdc 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -5,8 +5,7 @@ import torch from stt import logger -from .alignment_model import speechbrain_compute_log_probas as compute_log_probas -from .alignment_model import speechbrain_get_vocab as get_vocab +from .alignment_model import compute_logprobas, get_vocab from .utils import flatten from .text_normalize import transliterate @@ -14,10 +13,9 @@ def compute_alignment(audio, transcript, model): """ Compute the alignment of the audio and a transcript, for a given model that returns log-probabilities on the charset defined the transcript.""" - emission = compute_log_probas(model, audio) + emission = compute_logprobas(model, audio) labels, blank_id = get_vocab(model) labels = labels[:emission.shape[1]] - labels[blank_id] = " " dictionary = {c: i for i, c in enumerate(labels)} default = labels.index("-") if "-" in labels else None From 766a5d574aab68995d786caa99524a90307abd3c Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 09:21:14 +0100 Subject: [PATCH 095/172] ignore temporary files --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 0b8d9ad..c7b414a 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ start_container.sh .env* test/* +tmp* \ No newline at end of file From 5da1a65f63d7f0466717ee71ecad2436328c261f Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 10:36:04 +0100 Subject: [PATCH 096/172] ensure that word timestamps are increasing --- stt/processing/decoding.py | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index bb56d0f..ed0eef1 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -76,6 +76,17 @@ def decode(audio: torch.Tensor, # Compute word timestamps result["words"] = [] max_t = audio.shape[0] + + # Ensure that the segments start / end time are increasing + # (because there is no guarantee with Whisper) + previous_start = 0.0 + for segment in segments: + if segment["start"] < 
previous_start: + segment["start"] = previous_start + if segment["end"] <= segment["start"]: + segment["end"] = segment["start"] + 1.0 + previous_start = segment["end"] + for segment in segments: offset = segment["start"] start = min(max_t, round(segment["start"] * SAMPLE_RATE)) From b968cfb1487b72fed6208d0d4a6b1ef455851174 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 13:05:12 +0100 Subject: [PATCH 097/172] Allow to have unspecied language (that can change from one segment to another) --- stt/processing/__init__.py | 19 +++++++++---------- stt/processing/decoding.py | 14 +++++++++++--- stt/processing/load_model.py | 33 ++++++++++++++++++++++++--------- 3 files changed, 44 insertions(+), 22 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 70b6695..0119461 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,6 +1,5 @@ import os import logging -from time import time import torch import whisper @@ -35,20 +34,20 @@ # Load ASR model model_type = os.environ.get("MODEL", "medium") -logger.info( - f"Loading Whisper model {model_type} ({'local' if os.path.isfile(model_type) else 'remote'})...") -start = time() +logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})...") try: model = load_whisper_model(model_type, device=device) except Exception as err: raise Exception( "Failed to load transcription model: {}".format(str(err))) from err -logger.info("Model loaded. (t={}s)".format(time() - start)) # Load alignment model alignment_model_name = get_alignment_model(language) -logger.info(f"Loading alignment model {alignment_model_name} ({'local' if os.path.isfile(alignment_model_name) else 'remote'})...") -start = time() -alignment_model = load_alignment_model( - alignment_model_name, device=device, download_root="/opt") -logger.info(f"Alignment Model of type {get_model_type(alignment_model)} loaded. 
(t={time() - start}s)") +if alignment_model_name: + logger.info( + f"Loading alignment model {alignment_model_name} ({'local' if os.path.exists(alignment_model_name) else 'remote'})...") + alignment_model = load_alignment_model( + alignment_model_name, device=device, download_root="/opt") +else: + logger.info("No alignment model preloaded") + alignment_model = {} # Alignement model(s) will be loaded on the fly diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index ed0eef1..abbdd38 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -60,8 +60,16 @@ def decode(audio: torch.Tensor, segments = whisper_res["segments"] if language is None: language = whisper_res["language"] - if alignment_model is None: - alignment_model = load_alignment_model(get_alignment_model(language), device=model.device) + logger.info(f"Detected language: {language}") + if isinstance(alignment_model, dict): + # Load alignment model on the fly + if language not in alignment_model: + alignment_model_name = get_alignment_model(language) + logger.info(f"Loading alignment model {alignment_model_name} ({'local' if os.path.exists(alignment_model_name) else 'remote'})...") + alignment_model[language] = load_alignment_model(alignment_model_name, device=model.device, download_root="/opt") + spec_alignment_model = alignment_model[language] + else: + spec_alignment_model = alignment_model result["text"] = text result["confidence-score"] = np.exp(np.array([r["avg_logprob"] @@ -103,7 +111,7 @@ def decode(audio: torch.Tensor, f"Lost text in segment {segment['start']}-{segment['end']}") continue labels, emission, trellis, segments, word_segments = compute_alignment( - sub_audio, sub_text, alignment_model) + sub_audio, sub_text, spec_alignment_model) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() if len(sub_words) == len(word_segments): diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 4add720..addbc04 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -7,6 +7,9 @@ import transformers import torchaudio +import time +from stt import logger + # Source: https://github.com/m-bain/whisperX (in whisperx/transcribe.py) ALIGNMENT_MODELS = { "fr": "/opt/linSTT_speechbrain_fr-FR_v1.0.0", @@ -26,31 +29,43 @@ def get_alignment_model(language): source = os.environ.get("ALIGNMENT_MODEL") if not source: - return ALIGNMENT_MODELS.get(language, ALIGNMENT_MODELS["fr"]) + return ALIGNMENT_MODELS.get(language, None) def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): + start = time.time() + model = whisper.load_model(model_type_or_file, device=device, download_root=os.path.join(download_root, "whisper")) model.eval() model.requires_grad_(False) + + logger.info("Whisper Model loaded. 
(t={}s)".format(time.time() - start)) + return model def load_alignment_model(source, device="cpu", download_root="/opt"): + start = time.time() + if source in torchaudio.pipelines.__all__: - return load_torchaudio_model(source, device=device, download_root=download_root) - try: - return load_transformers_model(source, device=device, download_root=download_root) - except Exception as err1: + model = load_torchaudio_model(source, device=device, download_root=download_root) + else: try: - return load_speechbrain_model(source, device=device, download_root=download_root) - except Exception as err2: - raise Exception( - f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 + model = load_transformers_model(source, device=device, download_root=download_root) + except Exception as err1: + try: + model = load_speechbrain_model(source, device=device, download_root=download_root) + except Exception as err2: + raise Exception( + f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 + + logger.info(f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)") + + return model def load_speechbrain_model(source, device="cpu", download_root="/opt"): From 939b7576902344139b6567f0924a867dacfdf4c5 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 13:54:12 +0100 Subject: [PATCH 098/172] Update README --- .envdefault | 1 + README.md | 82 +++++++++++++++++++++++++++++++++++++---------------- 2 files changed, 59 insertions(+), 24 deletions(-) diff --git a/.envdefault b/.envdefault index 4452be3..617f4ae 100644 --- a/.envdefault +++ b/.envdefault @@ -1,6 +1,7 @@ # SERVING PARAMETERS SERVICE_MODE=http MODEL=/opt/model.pt +#ALIGNMENT_MODEL=/opt/linSTT_speechbrain_fr-FR_v1.0.0 LANGUAGE=fr # TASK PARAMETERS diff --git a/README.md b/README.md index a15b330..0b27eb5 100644 --- a/README.md +++ b/README.md @@ -12,21 +12,23 @@ To run the transcription models you'll need: * One CPU per worker. Inference time scales on CPU performances. ### Model -LinTO-Platform-STT accepts one Whisper models in the PyTorch format. 
- -You can download mutli-lingual models with the following links: -* tiny: "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt -* base: https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -* small: https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt -* medium: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt -* large-v1: https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt -* large-v2: https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt - -Models specialized for English can also be found: -* tiny.en: "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt -* base.en: https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt -* small.en: https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt -* medium.en: https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt +LinTO-Platform-STT works with two models: +* A Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. +* A wav2vec model to perform word alignment, which can be in the format of SpeechBrain, HuggingFace's Transformers or TorchAudio + +The wav2vec model can be specified either +* with a string corresponding to a `torchaudio` pipeline (e.g. "WAV2VEC2_ASR_BASE_960H") or +* with a string corresponding to a HuggingFace repository of a wav2vec model (e.g. "jonatasgrosman/wav2vec2-large-xlsr-53-english"), or +* with a path corresponding to a folder with a SpeechBrain model + +Default models are provided for the following languages: +* French (fr) +* English (en) +* Spanish (es) +* German (de) +* Dutch (nl) +* Japanese (ja) +* Chinese (zh) ### Docker The transcription service requires docker up and running. @@ -48,15 +50,30 @@ or ```bash docker pull lintoai/linto-platform-stt -``` with the following links +``` **2- Download the models** Have the Whisper model file ready at ASR_PATH. -You can downloaded with the links mentioned above, if you don't have already a Whisper model. If you already used Whisper in the past, you may have models in ~/.cache/whisper. 
+You can download mutli-lingual Whisper models with the following links: +* tiny: "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt +* base: https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt +* small: https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt +* medium: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt +* large-v1: https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt +* large-v2: https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt + +Whisper models specialized for English can also be found here: +* tiny.en: "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt +* base.en: https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt +* small.en: https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt +* medium.en: https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt + +If may also want to download a specific wav2vec model for word alignment. + **3- Fill the .env** ```bash @@ -65,8 +82,9 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| -| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task\|websocket | -| MODEL | Path to the model or type of model used. | ASR_PATH\|small\|medium\|large-v1\|... | +| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task | +| MODEL | Path to the Whisper model, or type of Whisper model used. | ASR_PATH\|small\|medium\|large-v1\|... | +| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | WAV2VEC_PATH\|jonatasgrosman/wav2vec2-large-xlsr-53-english\|WAV2VEC2_ASR_BASE_960H | | LANGUAGE | (Optional) Language to recognize | fr\|en\|... | | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | | SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | @@ -95,10 +113,9 @@ yo(yoruba), zh(chinese) ### Serving mode ![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png) -STT can be used three ways: +STT can be used in two ways: * Through an [HTTP API](#http-server) using the **http**'s mode. * Through a [message broker](#micro-service-within-linto-platform-stack) using the **task**'s mode. -* Through a [websocket server](#websocket-server) **websocket**'s mode. Mode is specified using the .env value or environment variable ```SERVING_MODE```. ```bash @@ -119,11 +136,20 @@ linto-platform-stt:latest This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. +You may also want to mount your cache folder CACHE_PATH (e.g. "~/.cache") ```-v CACHE_PATH:/root/.cache``` +in order to avoid downloading models each time. 
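Once the container is running, the service can also be called from Python. A minimal client sketch, assuming the container is bound to port 8080 on localhost (HOST_SERVING_PORT=8080) and that `audio.wav` stands in for your own recording; the `/transcribe` route, the `file` form field and the accept headers are the ones expected by the HTTP ingress:

```python
# Minimal client sketch for the http serving mode. The port (8080) and the
# audio file name are placeholders; adjust them to your own deployment.
import requests

with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8080/transcribe",
        files={"file": audio_file},
        headers={"accept": "application/json"},  # "text/plain" returns the transcript only
    )

response.raise_for_status()
result = response.json()
print(result["text"])   # full transcript
print(result["words"])  # word-level timestamps and confidence scores
```

With `accept: text/plain` the same request returns only the raw transcript, which can be handier for quick checks.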
+ +Also if you want to specifiy a custom alignment model already downloaded in a folder WAV2VEC_PATH, +you can add option ```-v WAV2VEC_PATH:/opt/wav2vec``` and environment variable ```ALIGNMENT_MODEL=/opt/wav2vec```. + + **Parameters:** | Variables | Description | Example | |:-|:-|:-| | HOST_SERVING_PORT | Host serving port | 80 | -| ASR_PATH | (Optional) Path to the Whisper model on the host machine to /opt/model.pt | /my/path/to/models/medium.pt | +| ASR_PATH | Path to the Whisper model on the host machine mounted to /opt/model.pt | /my/path/to/models/medium.pt | +| CACHE_PATH | (Optional) Path to a folder to download wav2vec alignment models when relevant | /home/username/.cache | +| WAV2VEC_PATH | (Optional) Path to a folder to a custom wav2vec alignment model | /my/path/to/models/wav2vec | ### Micro-service within LinTO-Platform stack The HTTP serving mode connect a celery worker to a message broker. @@ -142,12 +168,20 @@ docker run --rm \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file .env \ linto-platform-stt:latest -```| LANGUAGE | (Optional) Language to recognize | fr\|en\|... | +``` + +You may also want to mount your cache folder CACHE_PATH (e.g. "~/.cache") ```-v CACHE_PATH:/root/.cache``` +in order to avoid downloading models each time. + +Also if you want to specifiy a custom alignment model already downloaded in a folder WAV2VEC_PATH, +you can add option ```-v WAV2VEC_PATH:/opt/wav2vec``` and environment variable ```ALIGNMENT_MODEL=/opt/wav2vec```. | Variables | Description | Example | |:-|:-|:-| -| ASR_PATH | (Optional) Path to the Whisper model on the host machine to /opt/model.pt | /my/path/to/models/medium.pt | | SHARED_AUDIO_FOLDER | Shared audio folder mounted to /opt/audio | /my/path/to/models/vosk-model | +| ASR_PATH | Path to the Whisper model on the host machine mounted to /opt/model.pt | /my/path/to/models/medium.pt | +| CACHE_PATH | (Optional) Path to a folder to download wav2vec alignment models when relevant | /home/username/.cache | +| WAV2VEC_PATH | (Optional) Path to a folder to a custom wav2vec alignment model | /my/path/to/models/wav2vec | ## Usages From af5f8211843044a642d56a58d7ddfa06ba343f2a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 3 Jan 2023 13:54:45 +0100 Subject: [PATCH 099/172] cosm --- load_alignment_model.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/load_alignment_model.py b/load_alignment_model.py index 0cf6087..7ca700e 100644 --- a/load_alignment_model.py +++ b/load_alignment_model.py @@ -1,4 +1,5 @@ -import os +import os +import shutil import urllib.request import zipfile @@ -14,7 +15,9 @@ def load_alignment_model(name, download_root = "/opt"): # Download model url = f"https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/{name}.zip" destzip = destdir+".zip" - if not os.path.exists(destzip): + if os.path.exists(os.path.basename(destzip)): + shutil.move(os.path.basename(destzip), destzip) + if not os.path.exists(destzip): print("Downloading", url, "into", destdir) os.makedirs(download_root, exist_ok=True) urllib.request.urlretrieve(url, destzip) From bef7a48f16cbac4dc40b405364e65e98ce43c78a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 17:25:59 +0100 Subject: [PATCH 100/172] improve logs, readme, update comment --- README.md | 16 +++++++++------- stt/processing/__init__.py | 2 ++ stt/processing/decoding.py | 2 +- stt/processing/word_alignment.py | 8 ++++++-- 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 0b27eb5..4d997ba 
100644 --- a/README.md +++ b/README.md @@ -82,14 +82,14 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| -| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http\|task | -| MODEL | Path to the Whisper model, or type of Whisper model used. | ASR_PATH\|small\|medium\|large-v1\|... | -| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | WAV2VEC_PATH\|jonatasgrosman/wav2vec2-large-xlsr-53-english\|WAV2VEC2_ASR_BASE_960H | -| LANGUAGE | (Optional) Language to recognize | fr\|en\|... | +| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http \| task | +| MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| medium \| large-v1 \| ... | +| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| WAV2VEC2_ASR_BASE_960H \| jonatasgrosman/wav2vec2-large-xlsr-53-english \| ... | +| LANGUAGE | (Optional) Language to recognize | fr \| en \| ... | | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | | SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | | BROKER_PASS | Using the task mode, broker password | my-password | -| CONCURRENCY | Maximum number of parallel requests | >1 | +| CONCURRENCY | Maximum number of parallel requests | 3 | The language is a code of two or three letters. The list of languages supported by Whisper are: ``` @@ -142,11 +142,10 @@ in order to avoid downloading models each time. Also if you want to specifiy a custom alignment model already downloaded in a folder WAV2VEC_PATH, you can add option ```-v WAV2VEC_PATH:/opt/wav2vec``` and environment variable ```ALIGNMENT_MODEL=/opt/wav2vec```. - **Parameters:** | Variables | Description | Example | |:-|:-|:-| -| HOST_SERVING_PORT | Host serving port | 80 | +| HOST_SERVING_PORT | Host serving port | 8080 | | ASR_PATH | Path to the Whisper model on the host machine mounted to /opt/model.pt | /my/path/to/models/medium.pt | | CACHE_PATH | (Optional) Path to a folder to download wav2vec alignment models when relevant | /home/username/.cache | | WAV2VEC_PATH | (Optional) Path to a folder to a custom wav2vec alignment model | /my/path/to/models/wav2vec | @@ -176,6 +175,7 @@ in order to avoid downloading models each time. Also if you want to specifiy a custom alignment model already downloaded in a folder WAV2VEC_PATH, you can add option ```-v WAV2VEC_PATH:/opt/wav2vec``` and environment variable ```ALIGNMENT_MODEL=/opt/wav2vec```. +**Parameters:** | Variables | Description | Example | |:-|:-|:-| | SHARED_AUDIO_FOLDER | Shared audio folder mounted to /opt/audio | /my/path/to/models/vosk-model | @@ -265,3 +265,5 @@ This project is developped under the AGPLv3 License (see LICENSE). * [OpenAI Whisper](https://github.com/openai/whisper) * [SpeechBrain](https://github.com/speechbrain/speechbrain). 
+* [TorchAudio](https://github.com/pytorch/audio) +* [HuggingFace Transformers](https://github.com/huggingface/transformers) \ No newline at end of file diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 0119461..ac22286 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -23,6 +23,7 @@ device = torch.device(device) except Exception as err: raise Exception("Failed to set device: {}".format(str(err))) from err +logger.info(f"Using device {device}") # Check language language = get_language() @@ -31,6 +32,7 @@ if language not in available_languages: raise RuntimeError( f"Language {get_language()} is not available. Available languages are: {available_languages}") +logger.info(f"Using language {language}") # Load ASR model model_type = os.environ.get("MODEL", "medium") diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index abbdd38..f16dd0b 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -11,7 +11,7 @@ from .text_normalize import remove_punctuation, normalize_text, remove_emoji from .load_model import load_alignment_model, get_alignment_model -# TODO: understand and remove this limitations +# This is to avoid hanging in a multi-threaded environment torch.set_num_threads(1) diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index 4e32bdc..ba94a14 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -9,6 +9,7 @@ from .utils import flatten from .text_normalize import transliterate +_unknown_chars = [] def compute_alignment(audio, transcript, model): """ Compute the alignment of the audio and a transcript, for a given model that returns log-probabilities on the charset defined the transcript.""" @@ -54,6 +55,7 @@ def count_repetitions(tokens): def loose_get_char_index(dictionary, c, default=None): + global _unknown_chars i = dictionary.get(c, None) if i is None: # Try with alternative versions of the character @@ -75,8 +77,10 @@ def loose_get_char_index(dictionary, c, default=None): i = candidate # If still not found if i is None: - logger.warn("Character not correctly handled by alignment model: '" + - "' / '".join(list(set([c] + other_char))) + "'") + if c not in _unknown_chars: + logger.warn("Character not correctly handled by alignment model: '" + + "' / '".join(list(set([c] + other_char))) + "'") + _unknown_chars.append(c) i = [default] if default is not None else [] else: i = [i] From d60feecfb1852f0e77933e7f76fed82a3259732e Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 17:46:34 +0100 Subject: [PATCH 101/172] Make it work with GPU. 
Note: CUDA multiprocessing needs "spawn" start method, and gunicorn cannot achieve this --- http_server/ingress.py | 42 +++++++++++++++++++++++--------------- http_server/serving.py | 24 +++++++++++++++++++++- requirements.txt | 1 + stt/processing/__init__.py | 6 +++--- 4 files changed, 52 insertions(+), 21 deletions(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index db739d4..ce12e53 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -2,16 +2,15 @@ import json import logging -import os -from time import time +import time from confparser import createParser from flask import Flask, Response, abort, json, request from flask_sock import Sock -from serving import GunicornServing +from serving import GeventServing, GunicornServing from swagger import setupSwaggerUI -from stt.processing import decode, load_wave_buffer, model, alignment_model +from stt.processing import decode, load_wave_buffer, model, alignment_model, use_gpu from stt import logger as stt_logger app = Flask("__stt-standalone-worker__") @@ -41,29 +40,29 @@ def transcribe(): logger.info("Transcribe request received") # get response content type - logger.debug(request.headers.get("accept").lower()) + # logger.debug(request.headers.get("accept").lower()) if request.headers.get("accept").lower() == "application/json": join_metadata = True elif request.headers.get("accept").lower() == "text/plain": join_metadata = False else: raise ValueError("Not accepted header") - logger.debug("Metadata: {}".format(join_metadata)) + # logger.debug("Metadata: {}".format(join_metadata)) # get input file - if "file" in request.files.keys(): - file_buffer = request.files["file"].read() - audio_data = load_wave_buffer(file_buffer) - start_t = time() + if "file" not in request.files.keys(): + raise ValueError("No audio file was uploaded") - # Transcription - transcription = decode(audio_data, model, alignment_model, join_metadata) - logger.debug("Transcription complete (t={}s)".format(time() - start_t)) + file_buffer = request.files["file"].read() + audio_data = load_wave_buffer(file_buffer) + start_t = time.time() - logger.debug("... 
Complete") + # Transcription + transcription = decode( + audio_data, model, alignment_model, join_metadata) + logger.debug("Transcription complete (t={}s)".format(time.time() - start_t)) - else: - raise ValueError("No audio file was uploaded") + logger.debug(f"END {id}: {time.time()}") if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 @@ -108,7 +107,16 @@ def server_error(error): except Exception as err: logger.warning("Could not setup swagger: {}".format(str(err))) - serving = GunicornServing( + logger.info(f"Using {args.workers} workers") + + if use_gpu: + serving_type = GeventServing + logger.debug("Serving with gevent") + else: + serving_type = GunicornServing + logger.debug("Serving with gunicorn") + + serving = serving_type( app, { "bind": f"0.0.0.0:{args.service_port}", diff --git a/http_server/serving.py b/http_server/serving.py index d2dd7e8..773c463 100644 --- a/http_server/serving.py +++ b/http_server/serving.py @@ -1,5 +1,7 @@ import gunicorn.app.base - +import gevent.pywsgi +import gevent.monkey +gevent.monkey.patch_all() class GunicornServing(gunicorn.app.base.BaseApplication): def __init__(self, app, options=None): @@ -18,3 +20,23 @@ def load_config(self): def load(self): return self.application + +class GeventServing(): + + def __init__(self, app, options=None): + self.options = options or {} + self.application = app + + def run(self): + bind = self.options.get('bind', "0.0.0.0:8080") + workers = self.options.get('workers', 1) + listener = bind.split(':') + try: + assert len(listener) == 2 + listener = (listener[0], int(listener[1])) + except: + print(f"Invalid bind address {bind}") + + server = gevent.pywsgi.WSGIServer(listener, self.application, spawn = workers) + server.serve_forever() + diff --git a/requirements.txt b/requirements.txt index c4e4fd4..6b9b488 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,6 +3,7 @@ flask>=1.1.2 flask-cors>=3.0.10 flask-sock flask-swagger-ui>=3.36.0 +gevent gunicorn num2words pyyaml>=5.4.1 diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index ac22286..492aded 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -10,19 +10,19 @@ from .load_model import load_whisper_model, load_alignment_model, get_alignment_model, get_model_type -__all__ = ["logger", "decode", "model", "alignment_model", +__all__ = ["logger", "use_gpu", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] # Set informative log logger.setLevel(logging.INFO) # Set device -device = os.environ.get( - "DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") +device = os.environ.get("DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") try: device = torch.device(device) except Exception as err: raise Exception("Failed to set device: {}".format(str(err))) from err +use_gpu = device.type == "cuda" logger.info(f"Using device {device}") # Check language From 3759066abea268ded87333663946d01bc3f8bf94 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 18:30:12 +0100 Subject: [PATCH 102/172] fix failure on GPU with torchaudio models --- stt/processing/alignment_model.py | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index b6ef333..a4669a1 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -174,6 +174,12 @@ def compute_logits_torchaudio(model_and_labels, audios, max_len): model, _ = model_and_labels + # Get the device where is 
running the model + device = "cpu" + for p in model.parameters(): + device = p.device + break + all_logits = [] with torch.inference_mode(): @@ -186,10 +192,10 @@ def compute_logits_torchaudio(model_and_labels, audios, max_len): logits = [] for i in range(0, l, max_len): j = min(i + max_len, l) - logits.append(model(audio[i:j].unsqueeze(0))[0]) + logits.append(model(audio[i:j].unsqueeze(0).to(device))[0]) logits = torch.cat(logits, dim=1) else: - logits, _ = model(audio.unsqueeze(0)) + logits, _ = model(audio.unsqueeze(0).to(device)) all_logits.append(logits.cpu().detach()) From 4c6b2b3af5c74228a96df84e6baaed93643fb43a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 18:34:53 +0100 Subject: [PATCH 103/172] give up linstt model for french. Add more HuggingFace wav2vec models to support alignment in more languages --- .envdefault | 5 ++- Dockerfile | 11 +---- load_alignment_model.py | 82 ------------------------------------ stt/processing/load_model.py | 24 +++++++++-- 4 files changed, 24 insertions(+), 98 deletions(-) delete mode 100644 load_alignment_model.py diff --git a/.envdefault b/.envdefault index 617f4ae..ce8ca21 100644 --- a/.envdefault +++ b/.envdefault @@ -1,8 +1,9 @@ # SERVING PARAMETERS SERVICE_MODE=http -MODEL=/opt/model.pt -#ALIGNMENT_MODEL=/opt/linSTT_speechbrain_fr-FR_v1.0.0 LANGUAGE=fr +MODEL=/opt/model.pt +#ALIGNMENT_MODEL=/opt/alignment_model +#DEVICE=cuda:0 # TASK PARAMETERS SERVICE_NAME=stt diff --git a/Dockerfile b/Dockerfile index 4761b3d..844f7ac 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,8 +1,6 @@ FROM python:3.9 LABEL maintainer="jlouradour@linagora.com" -ARG KALDI_MKL - RUN apt-get update && \ apt-get install -y --no-install-recommends \ wget \ @@ -27,14 +25,7 @@ RUN rm -rf /var/lib/apt/lists/* # Install python dependencies COPY requirements.txt ./ -RUN pip install --force-reinstall --no-cache-dir -r requirements.txt - -# Download alignment model -COPY load_alignment_model.py ./ -RUN python3 load_alignment_model.py - -# Cleaning -RUN rm requirements.txt load_alignment_model.py +RUN pip install --force-reinstall --no-cache-dir -r requirements.txt && rm requirements.txt WORKDIR /usr/src/app diff --git a/load_alignment_model.py b/load_alignment_model.py deleted file mode 100644 index 7ca700e..0000000 --- a/load_alignment_model.py +++ /dev/null @@ -1,82 +0,0 @@ -import os -import shutil -import urllib.request -import zipfile - -import huggingface_hub -import speechbrain as sb -import requests - - -def load_alignment_model(name, download_root = "/opt"): - if name.startswith("linSTT"): - destdir = os.path.join(download_root, name) - if not os.path.exists(destdir): - # Download model - url = f"https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/{name}.zip" - destzip = destdir+".zip" - if os.path.exists(os.path.basename(destzip)): - shutil.move(os.path.basename(destzip), destzip) - if not os.path.exists(destzip): - print("Downloading", url, "into", destdir) - os.makedirs(download_root, exist_ok=True) - urllib.request.urlretrieve(url, destzip) - with zipfile.ZipFile(destzip, 'r') as z: - os.makedirs(destdir, exist_ok=True) - z.extractall(destdir) - assert os.path.isdir(destdir) - os.remove(destzip) - else: - destdir = name - load_speechbrain_model(destdir, download_root = download_root) - -def load_speechbrain_model(source, device = None, download_root = "/opt"): - - if os.path.isdir(source): - yaml_file = os.path.join(source, "hyperparams.yaml") - assert os.path.isfile(yaml_file), f"Hyperparams file {yaml_file} not found" - else: - 
try: - yaml_file = huggingface_hub.hf_hub_download(repo_id=source, filename="hyperparams.yaml", cache_dir = os.path.join(download_root, "huggingface/hub")) - except requests.exceptions.HTTPError: - yaml_file = None - - overrides = make_yaml_overrides(yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) - savedir = os.path.join(download_root, "speechbrain") - try: - model = sb.pretrained.EncoderASR.from_hparams(source = source, savedir = savedir, overrides = overrides) - except ValueError: - model = sb.pretrained.EncoderDecoderASR.from_hparams(source = source, savedir = savedir, overrides = overrides) - return model - -def make_yaml_overrides(yaml_file, key_values): - """ - return a dictionary of overrides to be used with speechbrain - yaml_file: path to yaml file - key_values: dict of key values to override - """ - if yaml_file is None: return None - - override = {} - with open(yaml_file, "r") as f: - parent = None - for line in f: - if line.strip() == "": - parent = None - elif line == line.lstrip(): - if ":" in line: - parent = line.split(":")[0].strip() - if parent in key_values: - override[parent] = key_values[parent] - elif ":" in line: - child = line.strip().split(":")[0].strip() - if child in key_values: - override[parent] = override.get(parent, {}) | {child: key_values[child]} - return override - - -if __name__ == "__main__": - - import sys - assert len(sys.argv) in [1, 2], f"Usage: {sys.argv[0]} " - load_alignment_model(sys.argv[1] if len(sys.argv) > 1 else "linSTT_speechbrain_fr-FR_v1.0.0") diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index addbc04..7fae195 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -10,19 +10,35 @@ import time from stt import logger -# Source: https://github.com/m-bain/whisperX (in whisperx/transcribe.py) +# Sources: +# * https://github.com/m-bain/whisperX (in whisperx/transcribe.py) +# * https://pytorch.org/audio/stable/pipelines.html +# * https://huggingface.co/jonatasgrosman + ALIGNMENT_MODELS = { - "fr": "/opt/linSTT_speechbrain_fr-FR_v1.0.0", - # "fr": "VOXPOPULI_ASR_BASE_10K_FR", "en": "WAV2VEC2_ASR_BASE_960H", # "en": "jonatasgrosman/wav2vec2-large-xlsr-53-english", + "fr": "VOXPOPULI_ASR_BASE_10K_FR", + # "fr": "jonatasgrosman/wav2vec2-large-xlsr-53-french", "de": "VOXPOPULI_ASR_BASE_10K_DE", + # "de": "jonatasgrosman/wav2vec2-large-xlsr-53-german", "es": "VOXPOPULI_ASR_BASE_10K_ES", + # "it": "jonatasgrosman/wav2vec2-large-xlsr-53-spanish", "it": "VOXPOPULI_ASR_BASE_10K_IT", + # "it": "jonatasgrosman/wav2vec2-large-xlsr-53-italian", + "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese", "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", + "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish", + "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish", + "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian", + "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek", + "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian", + "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic", + "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian", + "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", - "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", + "vi": "nguyenvulebinh/wav2vec2-base-vietnamese-250h", } From 3353c70e40751aaefe0729d89c236da6d6836602 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 18:59:10 +0100 Subject: [PATCH 104/172] cosm --- 
stt/processing/text_normalize.py | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 6065eaa..7621199 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -44,6 +44,7 @@ def remove_emoji(text): def normalize_text(text: str, lang: str) -> str: """ Transform digits into characters... """ + # Reorder currencies (1,20€ -> 1 € 20) coma = "," if lang in ["fr"] else "\." for c in _currencies: @@ -62,12 +63,14 @@ def normalize_text(text: str, lang: str) -> str: r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text) digits = ["".join(d) for d in digits] else: - digits = [] + digits = re.findall( + r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})\b", text) + digits = ["".join(d) for d in digits] if digits: digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) for s in digits: - filtered = re.sub("[a-z]", "", s) + filtered = re.sub("[a-zèº]", "", s) ordinal = filtered != s digit = roman_to_decimal(filtered) v = undigit(str(digit), lang=lang, @@ -83,7 +86,7 @@ def normalize_text(text: str, lang: str) -> str: r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) else: logger.warn( - f"Language {lang} not supported for normalization. Some words might be mis-localized.") + f"Language {lang} not supported for some normalization. Some words might be mis-localized.") digits = [] if digits: digits = sorted(list(set(digits)), reverse=True, From 01d3f57cf997b6b3a480424d4f9eca977d380f40 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 18:59:30 +0100 Subject: [PATCH 105/172] some wav2vec models do not use attention mask --- stt/processing/alignment_model.py | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index a4669a1..8a7c39f 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -151,6 +151,8 @@ def compute_logits_transformers(model_and_processor, audios, max_len): l = padded_batch.input_values.shape[1] + use_mask = hasattr(padded_batch, "attention_mask") + with torch.inference_mode(): if l > max_len: # Split batch in smaller chunks @@ -159,12 +161,17 @@ def compute_logits_transformers(model_and_processor, audios, max_len): logits = [] for i in range(0, l, max_len): j = min(i + max_len, l) - logits.append(model(padded_batch.input_values[:, i:j].to(device), + if use_mask: + logits.append(model(padded_batch.input_values[:, i:j].to(device), attention_mask=padded_batch.attention_mask[:, i:j].to(device)).logits) + else: + logits.append(model(padded_batch.input_values[:, i:j].to(device)).logits) logits = torch.cat(logits, dim=1) - else: + elif use_mask: logits = model(padded_batch.input_values.to(device), attention_mask=padded_batch.attention_mask.to(device)).logits + else: + logits = model(padded_batch.input_values.to(device)).logits return logits.cpu().detach() From 76838ab0666587e15b6d5eba3b8284f56ca31b75 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 6 Jan 2023 19:09:14 +0100 Subject: [PATCH 106/172] glue the words inside a segment --- stt/processing/decoding.py | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index f16dd0b..5353bcd 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -115,22 +115,34 @@ def decode(audio: torch.Tensor, ratio = len(sub_audio) / 
(trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() if len(sub_words) == len(word_segments): - for word, segment in zip(sub_words, word_segments): + for word, seg in zip(sub_words, word_segments): result["words"].append({ "word": word, - "start": segment.start * ratio + offset, - "end": segment.end * ratio + offset, - "conf": segment.score, + "start": seg.start * ratio + offset, + "end": seg.end * ratio + offset, + "conf": seg.score, }) else: logger.warn( - f"Alignment failed. Results might differ on some words.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") - for segment in word_segments: + f"Alignment failed. Some words might be mis-rendered.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") + for seg in word_segments: result["words"].append({ - "word": segment.label, - "start": segment.start * ratio + offset, - "end": segment.end * ratio + offset, - "conf": segment.score, + "word": seg.label, + "start": seg.start * ratio + offset, + "end": seg.end * ratio + offset, + "conf": seg.score, }) + # Glue the words inside a segment + previous_start = offset + words = result["words"] + for i, word in enumerate(words): + if i == 0: + word["start"] = segment["start"] + else: + word["start"] = words[i-1]["end"] + if i == len(words) - 1: + word["end"] = segment["end"] + else: + word["end"] = .5 * (words[i+1]["start"] + word["end"]) return result From affd53682881ade4edac24f1d638ed47f09a6892 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 9 Jan 2023 09:46:42 +0100 Subject: [PATCH 107/172] fix bug in the position of the first and last word of each segment --- stt/processing/decoding.py | 32 ++++++++++++++------------------ 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 5353bcd..41143d3 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -114,27 +114,21 @@ def decode(audio: torch.Tensor, sub_audio, sub_text, spec_alignment_model) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() - if len(sub_words) == len(word_segments): - for word, seg in zip(sub_words, word_segments): - result["words"].append({ - "word": word, - "start": seg.start * ratio + offset, - "end": seg.end * ratio + offset, - "conf": seg.score, - }) - else: + words = [] + use_original_words = True + if len(sub_words) != len(word_segments): logger.warn( f"Alignment failed. 
Some words might be mis-rendered.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") - for seg in word_segments: - result["words"].append({ - "word": seg.label, - "start": seg.start * ratio + offset, - "end": seg.end * ratio + offset, - "conf": seg.score, - }) + assert len(word_segments) < len(sub_words) + use_original_words = False + for word, seg in zip(sub_words, word_segments): + words.append({ + "word": word if use_original_words else seg.label, + "start": seg.start * ratio + offset, + "end": seg.end * ratio + offset, + "conf": seg.score, + }) # Glue the words inside a segment - previous_start = offset - words = result["words"] for i, word in enumerate(words): if i == 0: word["start"] = segment["start"] @@ -144,5 +138,7 @@ def decode(audio: torch.Tensor, word["end"] = segment["end"] else: word["end"] = .5 * (words[i+1]["start"] + word["end"]) + # Accumulate results + result["words"] += words return result From 2cf0799247b228bb7c6c9e3efbad728414caf58a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 9 Jan 2023 09:47:11 +0100 Subject: [PATCH 108/172] fix alignment model specified with env variable ALIGNMENT_MODEL --- stt/processing/load_model.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 7fae195..b4b5738 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -45,7 +45,8 @@ def get_alignment_model(language): source = os.environ.get("ALIGNMENT_MODEL") if not source: - return ALIGNMENT_MODELS.get(language, None) + source = ALIGNMENT_MODELS.get(language, None) + return source def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): From b875897b8847c46ed62be4ff390fa187386d4d8f Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 9 Jan 2023 09:47:41 +0100 Subject: [PATCH 109/172] better text normalization for numbers/symbols before punctuation marks --- stt/processing/text_normalize.py | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 7621199..fa9933c 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -162,21 +162,27 @@ def normalize_text(text: str, lang: str) -> str: else: word = " / ".join([undigit(s, lang=lang) for s in digitf.split('/')]) - if " " in digit: - text = re.sub(r'\b'+str(digit)+r'\b', " "+word+" ", text) - else: - text = re.sub(str(digit), " "+word+" ", text) + text = replace_keeping_word_boundaries(digit, word, text) # Symbols (currencies, percent...) 
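# Illustration with made-up arguments of what the helper introduced in this
# patch does: replace_keeping_word_boundaries substitutes a symbol with its
# spoken form while keeping the replacement separated from the surrounding
# text by spaces, which collapse_whitespace() cleans up at the end of
# normalize_text.
#
#     replace_keeping_word_boundaries("€", "euros", "ça coûte 3 € aujourd'hui")
#     # -> "ça coûte 3 euros aujourd'hui"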
symbol_table = _symbol_to_word.get(lang, {}) for k, v in symbol_table.items(): - text = re.sub(k, " "+v+" ", text) + text = replace_keeping_word_boundaries(k, v, text) - text = re.sub(r" \.",".", text) + # Remove extra spaces before punctuation + # text = re.sub(r" ([\.,!:;])",r"\1",text) return collapse_whitespace(text) +def replace_keeping_word_boundaries(orig, dest, text): + if orig in text: + text = re.sub(r"(\W)"+orig+r"(\W)", r"\1"+dest+r"\2", text) + text = re.sub(orig+r"(\W)", " "+dest+r"\1", text) + text = re.sub(r"(\W)"+orig, r"\1"+dest+" ", text) + text = re.sub(orig, " "+dest+" ", text) + return text + def undigit(str, lang, to="cardinal"): str = re.sub(" ", "", str) if to == "denominator": From 838ab1058d0a9ce54b3cf9dcdb525c2da3dabd86 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 16 Jan 2023 16:53:56 +0100 Subject: [PATCH 110/172] integrate another approach to get word timestamps from Whisper transcription, based on cross-attention weights (no need to wav2vec model) --- .envdefault | 6 ++-- README.md | 2 +- requirements.txt | 3 +- stt/processing/__init__.py | 24 ++++++++------ stt/processing/decoding.py | 61 ++++++++++++++++++++++++++++++------ stt/processing/load_model.py | 20 +++++++++--- 6 files changed, 87 insertions(+), 29 deletions(-) diff --git a/.envdefault b/.envdefault index ce8ca21..b2105da 100644 --- a/.envdefault +++ b/.envdefault @@ -1,9 +1,11 @@ # SERVING PARAMETERS SERVICE_MODE=http -LANGUAGE=fr +STT_LANGUAGE=fr MODEL=/opt/model.pt -#ALIGNMENT_MODEL=/opt/alignment_model #DEVICE=cuda:0 +#ALIGNMENT_MODEL=fr +#ALIGNMENT_MODEL=wav2vec +#ALIGNMENT_MODEL=/opt/alignment_model # TASK PARAMETERS SERVICE_NAME=stt diff --git a/README.md b/README.md index 4d997ba..df6e3cb 100644 --- a/README.md +++ b/README.md @@ -85,7 +85,7 @@ cp .envdefault .env | SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http \| task | | MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| medium \| large-v1 \| ... | | ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| WAV2VEC2_ASR_BASE_960H \| jonatasgrosman/wav2vec2-large-xlsr-53-english \| ... | -| LANGUAGE | (Optional) Language to recognize | fr \| en \| ... | +| STT_LANGUAGE | (Optional) Language to recognize | fr \| en \| ... 
| | SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | | SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | | BROKER_PASS | Using the task mode, broker password | my-password | diff --git a/requirements.txt b/requirements.txt index 6b9b488..b53c4be 100644 --- a/requirements.txt +++ b/requirements.txt @@ -12,4 +12,5 @@ speechbrain transformers wavio>=0.0.4 websockets -git+https://github.com/openai/whisper.git \ No newline at end of file +# git+https://github.com/openai/whisper.git +git+https://github.com/Jeronymous/whisper-timestamped.git \ No newline at end of file diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 492aded..757a182 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -27,11 +27,14 @@ # Check language language = get_language() -available_languages = [ - k for k, v in whisper.tokenizer.LANGUAGES.items()] + [None] +available_languages = \ + list(whisper.tokenizer.LANGUAGES.keys()) + \ + [k.title() for k in whisper.tokenizer.TO_LANGUAGE_CODE.keys()] + \ + [None] if language not in available_languages: - raise RuntimeError( - f"Language {get_language()} is not available. Available languages are: {available_languages}") + raise ValueError(f"Language {get_language()} is not available. Available languages are: {available_languages}") +if isinstance(language, str): + language = whisper.tokenizer.TO_LANGUAGE_CODE.get(language.lower(), language) logger.info(f"Using language {language}") # Load ASR model @@ -44,12 +47,13 @@ "Failed to load transcription model: {}".format(str(err))) from err # Load alignment model -alignment_model_name = get_alignment_model(language) -if alignment_model_name: +alignment_model = get_alignment_model(os.environ.get("ALIGNMENT_MODEL"), language) +if alignment_model: logger.info( - f"Loading alignment model {alignment_model_name} ({'local' if os.path.exists(alignment_model_name) else 'remote'})...") - alignment_model = load_alignment_model( - alignment_model_name, device=device, download_root="/opt") + f"Loading alignment model {alignment_model} ({'local' if os.path.exists(alignment_model) else 'remote'})...") + alignment_model = load_alignment_model(alignment_model, device=device, download_root="/opt") +elif alignment_model is None: + logger.info("Alignment will be done using Whisper cross-attention weights") else: - logger.info("No alignment model preloaded") + logger.info("No alignment model preloaded. 
It will be loaded on the fly depending on the detected language.") alignment_model = {} # Alignement model(s) will be loaded on the fly diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 41143d3..2f83b4f 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -2,6 +2,7 @@ import whisper from whisper.audio import SAMPLE_RATE +import whisper_timestamped import numpy as np import torch @@ -16,7 +17,7 @@ def get_language(): - return os.environ.get("LANGUAGE", None) + return os.environ.get("STT_LANGUAGE", None) def decode(audio: torch.Tensor, @@ -41,15 +42,23 @@ def decode(audio: torch.Tensor, logger.info(f"Transcribing audio with language {language}...") - whisper_res = model.transcribe(audio, - language=language, - fp16=fp16, - temperature=0.0, # For deterministic results - beam_size=beam_size, - no_speech_threshold=no_speech_threshold, - logprob_threshold=logprob_threshold, - compression_ratio_threshold=compression_ratio_threshold - ) + kwargs = dict( + language=language, + fp16=fp16, + temperature=0.0, # For deterministic results + beam_size=beam_size, + no_speech_threshold=no_speech_threshold, + logprob_threshold=logprob_threshold, + compression_ratio_threshold=compression_ratio_threshold + ) + + if alignment_model is None: + # Use Whisper cross-attention weights + return format_whisper_timestamped_response( + whisper_timestamped.transcribe(model, audio, **kwargs) + ) + + whisper_res = model.transcribe(audio, **kwargs) text = whisper_res["text"] text = remove_emoji(text).strip() @@ -71,6 +80,7 @@ def decode(audio: torch.Tensor, else: spec_alignment_model = alignment_model + result["text"] = text result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 @@ -142,3 +152,34 @@ def decode(audio: torch.Tensor, result["words"] += words return result + +def format_whisper_timestamped_response(transcription): + """Format Whisper response.""" + + # NOCOMMIT + import json + print(json.dumps(transcription, indent=2, ensure_ascii=False)) + + for i, seg in enumerate(transcription["segments"][:-1]): + for expected_keys in ["start", "end", "words", "avg_logprob"]: + assert expected_keys in seg, f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" + + text = transcription["text"].strip() + + segments = [] + + for seg in transcription["segments"]: + seg_proba = np.exp(seg["avg_logprob"]) + for word in seg["words"]: + segments.append({ + "text": word["text"], + "start": word["start"], + "end": word["end"], + "conf": seg_proba, # Same proba for all words within the segment + }) + + return { + "text": text, + "confidence-score": np.mean([np.exp(seg["avg_logprob"]) for seg in transcription["segments"]]), + "segments": segments + } \ No newline at end of file diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index b4b5738..9c3ff29 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -42,11 +42,21 @@ } -def get_alignment_model(language): - source = os.environ.get("ALIGNMENT_MODEL") - if not source: - source = ALIGNMENT_MODELS.get(language, None) - return source +def get_alignment_model(alignment_model_name, language, force = False): + if alignment_model_name in ["wav2vec", "wav2vec2"]: + if language is None: + # Will load alignment model on the fly depending on detected language + return {} + elif language in ALIGNMENT_MODELS: + return ALIGNMENT_MODELS[language] + elif force: + raise ValueError(f"No wav2vec alignment model for language 
'{language}'.") + else: + logger.warn(f"No wav2vec alignment model for language '{language}'. Fallback to English.") + return ALIGNMENT_MODELS["en"] + elif alignment_model_name in whisper.tokenizer.LANGUAGES.keys(): + return get_alignment_model("wav2vec", alignment_model_name, force = True) + return alignment_model_name def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): From 358a64d22c58dca5db9b9fdbaa289abcc0f98905 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 16 Jan 2023 19:02:39 +0100 Subject: [PATCH 111/172] remove unwanted print --- stt/processing/decoding.py | 4 ---- 1 file changed, 4 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 2f83b4f..55c1636 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -155,10 +155,6 @@ def decode(audio: torch.Tensor, def format_whisper_timestamped_response(transcription): """Format Whisper response.""" - - # NOCOMMIT - import json - print(json.dumps(transcription, indent=2, ensure_ascii=False)) for i, seg in enumerate(transcription["segments"][:-1]): for expected_keys in ["start", "end", "words", "avg_logprob"]: From c0491b3af89bed443344551b12f99cb10010fb05 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 15 Feb 2023 17:53:00 +0100 Subject: [PATCH 112/172] fix text normalization --- stt/processing/text_normalize.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index fa9933c..da2675e 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -127,7 +127,7 @@ def normalize_text(text: str, lang: str) -> str: lang != "fr" and first[-1] in ["1", "2", "3"]) first = undigit(first, lang=lang, to="ordinal" if use_ordinal else "cardinal") - second = _int_to_month[second] + second = _int_to_month.get(lang, {}).get(second,digitf[i+1:]) else: first = undigit(digitf[:i], lang=lang) second = undigit(digitf[i+1:], to="denominator", lang=lang) @@ -148,7 +148,7 @@ def normalize_text(text: str, lang: str) -> str: pass third = undigit(digitf[i2+1:], lang=lang) if is_date: - first = digitf[:i].lstrip("0") + first = digitf[:i1].lstrip("0") use_ordinal = (lang == "fr" and first == "1") or ( lang != "fr" and first[-1] in ["1", "2", "3"]) first = undigit(first, lang=lang, @@ -370,3 +370,4 @@ def value(r): "¥": "yens", } } + From 20ce809987eee3a26bca2da27c4638b503c9fda2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 15 Feb 2023 18:26:58 +0100 Subject: [PATCH 113/172] update repo url --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index b53c4be..bb3bebf 100644 --- a/requirements.txt +++ b/requirements.txt @@ -13,4 +13,4 @@ transformers wavio>=0.0.4 websockets # git+https://github.com/openai/whisper.git -git+https://github.com/Jeronymous/whisper-timestamped.git \ No newline at end of file +git+https://github.com/linto-ai/whisper-timestamped.git \ No newline at end of file From 65527f2bf56e3deb6fbb14507860908dd94d1873 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 15 Feb 2023 18:27:17 +0100 Subject: [PATCH 114/172] tune default Whisper options --- stt/processing/decoding.py | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 55c1636..fbf531e 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -26,6 +26,9 @@ def decode(audio: torch.Tensor, with_word_timestamps: 
bool, language: str = None, beam_size: int = None, + best_of: int = None, + temperature: float = 0.0, + condition_on_previous_text: bool = False, no_speech_threshold: float = 0.6, logprob_threshold: float = -1.0, compression_ratio_threshold: float = 2.4, @@ -45,8 +48,10 @@ def decode(audio: torch.Tensor, kwargs = dict( language=language, fp16=fp16, - temperature=0.0, # For deterministic results + temperature=temperature, beam_size=beam_size, + best_of=best_of, + condition_on_previous_text=condition_on_previous_text, no_speech_threshold=no_speech_threshold, logprob_threshold=logprob_threshold, compression_ratio_threshold=compression_ratio_threshold @@ -58,6 +63,10 @@ def decode(audio: torch.Tensor, whisper_timestamped.transcribe(model, audio, **kwargs) ) + # Force deterministic results + torch.manual_seed(1234) + torch.cuda.manual_seed_all(1234) + whisper_res = model.transcribe(audio, **kwargs) text = whisper_res["text"] From d192382d9d4795b3fe8a09d3c978c21ab612830b Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 27 Feb 2023 15:19:05 +0100 Subject: [PATCH 115/172] Use LANGUAGE env variable, and accept more values ("*", "fr-FR", ...) --- .envdefault | 8 +++++++- README.md | 23 +++++++++++++---------- stt/processing/decoding.py | 13 +++++++++++-- 3 files changed, 31 insertions(+), 13 deletions(-) diff --git a/.envdefault b/.envdefault index b2105da..d4bb2e8 100644 --- a/.envdefault +++ b/.envdefault @@ -1,8 +1,14 @@ # SERVING PARAMETERS SERVICE_MODE=http -STT_LANGUAGE=fr MODEL=/opt/model.pt + +# LANGUAGE can be in different formats: en, en-US, English, ... +# If not set or "*", the language will be detected automatically. +LANGUAGE=* + #DEVICE=cuda:0 + +# Only used for alignement using wav2vec models #ALIGNMENT_MODEL=fr #ALIGNMENT_MODEL=wav2vec #ALIGNMENT_MODEL=/opt/alignment_model diff --git a/README.md b/README.md index df6e3cb..164db6f 100644 --- a/README.md +++ b/README.md @@ -82,16 +82,19 @@ cp .envdefault .env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| -| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | http \| task | -| MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| medium \| large-v1 \| ... | -| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| WAV2VEC2_ASR_BASE_960H \| jonatasgrosman/wav2vec2-large-xlsr-53-english \| ... | -| STT_LANGUAGE | (Optional) Language to recognize | fr \| en \| ... | -| SERVICE_NAME | Using the task mode, set the queue's name for task processing | my-stt | -| SERVICE_BROKER | Using the task mode, URL of the message broker | redis://my-broker:6379 | -| BROKER_PASS | Using the task mode, broker password | my-password | -| CONCURRENCY | Maximum number of parallel requests | 3 | - -The language is a code of two or three letters. The list of languages supported by Whisper are: +| SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | `http` \| `task` | +| MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| `medium` \| `large-v1` \| ... | +| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| `WAV2VEC2_ASR_BASE_960H` \| `jonatasgrosman/wav2vec2-large-xlsr-53-english` \| ... | +| LANGUAGE | (Optional) Language to recognize | `*` \| `fr` \| `fr-FR` \| `French` \| `en` \| `en-US` \| `English` \| ... 
| +| SERVICE_NAME | Using the task mode, set the queue's name for task processing | `my-stt` | +| SERVICE_BROKER | Using the task mode, URL of the message broker | `redis://my-broker:6379` | +| BROKER_PASS | Using the task mode, broker password | `my-password` | +| CONCURRENCY | Maximum number of parallel requests | `3` | + +If `*` is used for the `LANGUAGE` environment variable, or if `LANGUAGE` is not defined, +automatic language detection will be performed by Whisper. + +The language can be a code of two or three letters. The list of languages supported by Whisper are: ``` af(afrikaans), am(amharic), ar(arabic), as(assamese), az(azerbaijani), ba(bashkir), be(belarusian), bg(bulgarian), bn(bengali), bo(tibetan), br(breton), bs(bosnian), diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index fbf531e..407b2be 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -17,8 +17,17 @@ def get_language(): - return os.environ.get("STT_LANGUAGE", None) - + """ + Get the language from the environment variable LANGUAGE, and format as expected by Whisper. + """ + language = os.environ.get("LANGUAGE", "*") + # "fr-FR" -> "fr" (language-country code to ISO 639-1 code) + if len(language) > 2 and language[2] == "-": + language = language.split("-")[0] + # "*" means "all languages" + if language == "*": + language = None + return language def decode(audio: torch.Tensor, model: whisper.model.Whisper, From 858cf88240aa9696d14fbbcf797ecdd69a741a01 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 4 Apr 2023 14:50:29 +0200 Subject: [PATCH 116/172] do not use --force-reinstall which could override torch custom installation (CPU...) --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 844f7ac..cb584ca 100644 --- a/Dockerfile +++ b/Dockerfile @@ -25,7 +25,7 @@ RUN rm -rf /var/lib/apt/lists/* # Install python dependencies COPY requirements.txt ./ -RUN pip install --force-reinstall --no-cache-dir -r requirements.txt && rm requirements.txt +RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt WORKDIR /usr/src/app From ab5f3f436cef89249f0d323e9399ada898f1c1c8 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 4 Apr 2023 14:50:41 +0200 Subject: [PATCH 117/172] ignore pycache folders --- .gitignore | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index c7b414a..06b349b 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ start_container.sh .env* test/* -tmp* \ No newline at end of file +tmp* +__pycache__ \ No newline at end of file From 944f3ec0e08a162ad3a5076e48999c46b3153ee3 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 4 Apr 2023 14:51:15 +0200 Subject: [PATCH 118/172] can load more types of models --- stt/processing/__init__.py | 8 ++++---- stt/processing/load_model.py | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 757a182..d0da27a 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -2,13 +2,13 @@ import logging import torch -import whisper +import whisper_timestamped as whisper from stt import logger from stt.processing.decoding import decode, get_language from stt.processing.utils import load_wave_buffer, load_audiofile -from .load_model import load_whisper_model, load_alignment_model, get_alignment_model, get_model_type +from .load_model import load_whisper_model, load_alignment_model, get_alignment_model __all__ = ["logger", 
"use_gpu", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] @@ -32,7 +32,7 @@ [k.title() for k in whisper.tokenizer.TO_LANGUAGE_CODE.keys()] + \ [None] if language not in available_languages: - raise ValueError(f"Language {get_language()} is not available. Available languages are: {available_languages}") + raise ValueError(f"Language '{get_language()}' is not available. Available languages are: {available_languages}") if isinstance(language, str): language = whisper.tokenizer.TO_LANGUAGE_CODE.get(language.lower(), language) logger.info(f"Using language {language}") @@ -46,7 +46,7 @@ raise Exception( "Failed to load transcription model: {}".format(str(err))) from err -# Load alignment model +# Load alignment model (if any) alignment_model = get_alignment_model(os.environ.get("ALIGNMENT_MODEL"), language) if alignment_model: logger.info( diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 9c3ff29..c8754d4 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -1,4 +1,4 @@ -import whisper +import whisper_timestamped as whisper import os import requests From 9d4b1f56186c8135da18fb9aa5f3d9fd90790d17 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 4 Apr 2023 15:19:03 +0200 Subject: [PATCH 119/172] fix output format, and add language key --- stt/processing/decoding.py | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 407b2be..98558be 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -45,7 +45,6 @@ def decode(audio: torch.Tensor, remove_punctuation_from_words=False, ) -> dict: """Transcribe the audio data using Whisper with the defined model.""" - result = {"text": "", "confidence-score": 0.0, "words": []} fp16 = model.device != torch.device("cpu") @@ -99,9 +98,11 @@ def decode(audio: torch.Tensor, spec_alignment_model = alignment_model + result = {} result["text"] = text - result["confidence-score"] = np.exp(np.array([r["avg_logprob"] - for r in segments])).mean() if len(segments) else 0.0 + result["language"] = language + result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 + if not with_word_timestamps: if not normalize_text_as_words: text = normalize_text(text, language) @@ -175,25 +176,27 @@ def format_whisper_timestamped_response(transcription): """Format Whisper response.""" for i, seg in enumerate(transcription["segments"][:-1]): - for expected_keys in ["start", "end", "words", "avg_logprob"]: - assert expected_keys in seg, f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" + for expected_keys in ["start", "end", "words", "avg_logprob"]: + assert expected_keys in seg, f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" text = transcription["text"].strip() - segments = [] + words = [] + + segments = transcription.get("segments", []) - for seg in transcription["segments"]: - seg_proba = np.exp(seg["avg_logprob"]) - for word in seg["words"]: - segments.append({ - "text": word["text"], + for seg in segments: + for word in seg.get("words", []): + words.append({ + "word": word["text"], "start": word["start"], "end": word["end"], - "conf": seg_proba, # Same proba for all words within the segment + "conf": word["confidence"], }) return { "text": text, - "confidence-score": np.mean([np.exp(seg["avg_logprob"]) for seg in transcription["segments"]]), - "segments": 
segments + "language": transcription["language"], + "confidence-score": np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0, + "words": words, } \ No newline at end of file From 06ce05ac818136051ddfeb0b6f267f66ec2d9a11 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 4 Apr 2023 15:54:28 +0200 Subject: [PATCH 120/172] log the detected language (when automatic) --- stt/processing/decoding.py | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 98558be..cac8ab9 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -67,9 +67,11 @@ def decode(audio: torch.Tensor, if alignment_model is None: # Use Whisper cross-attention weights - return format_whisper_timestamped_response( - whisper_timestamped.transcribe(model, audio, **kwargs) - ) + whisper_res = whisper_timestamped.transcribe(model, audio, **kwargs) + if language is None: + language = whisper_res["language"] + logger.info(f"Detected language: {language}") + return format_whisper_timestamped_response(whisper_res) # Force deterministic results torch.manual_seed(1234) From 2a85bc69ec5fea47d217ce46af78b0ab3a708758 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 12 Apr 2023 10:41:16 +0200 Subject: [PATCH 121/172] add support of faster_whisper --- Dockerfile.ctranslate2 | 43 ++++++ Dockerfile => Dockerfile.torch | 4 +- Dockerfile.torch.cpu | 49 ++++++ http_server/ingress.py | 28 ++-- requirements.ctranslate2.txt | 12 ++ requirements.txt => requirements.torch.txt | 2 +- stt/__init__.py | 20 ++- stt/processing/__init__.py | 24 +-- stt/processing/alignment_model.py | 15 +- stt/processing/decoding.py | 169 +++++++++++++++++---- stt/processing/load_model.py | 78 +++++++--- stt/processing/text_normalize.py | 3 +- stt/processing/utils.py | 150 ++++++++++++++++-- stt/processing/word_alignment.py | 8 +- 14 files changed, 497 insertions(+), 108 deletions(-) create mode 100644 Dockerfile.ctranslate2 rename Dockerfile => Dockerfile.torch (88%) create mode 100644 Dockerfile.torch.cpu create mode 100644 requirements.ctranslate2.txt rename requirements.txt => requirements.torch.txt (85%) diff --git a/Dockerfile.ctranslate2 b/Dockerfile.ctranslate2 new file mode 100644 index 0000000..5989b3f --- /dev/null +++ b/Dockerfile.ctranslate2 @@ -0,0 +1,43 @@ +FROM python:3.9 +LABEL maintainer="jlouradour@linagora.com" + +RUN apt-get update && \ + apt-get install -y --no-install-recommends \ + wget \ + nano \ + bzip2 \ + unzip \ + xz-utils \ + sox \ + ffmpeg \ + g++ \ + make \ + cmake \ + git \ + zlib1g-dev \ + automake \ + autoconf \ + libtool \ + pkg-config \ + ca-certificates + +RUN rm -rf /var/lib/apt/lists/* + +# Install python dependencies +COPY requirements.ctranslate2.txt ./ +RUN pip install --no-cache-dir -r requirements.ctranslate2.txt && rm requirements.ctranslate2.txt + +WORKDIR /usr/src/app + +COPY stt /usr/src/app/stt +COPY celery_app /usr/src/app/celery_app +COPY http_server /usr/src/app/http_server +COPY websocket /usr/src/app/websocket +COPY document /usr/src/app/document +COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ + +ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" + +HEALTHCHECK CMD ./healthcheck.sh + +ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file diff --git a/Dockerfile b/Dockerfile.torch similarity index 88% rename from Dockerfile rename to Dockerfile.torch index cb584ca..9db2a58 100644 --- a/Dockerfile +++ b/Dockerfile.torch @@ -24,8 +24,8 @@ RUN apt-get update && \ 
RUN rm -rf /var/lib/apt/lists/* # Install python dependencies -COPY requirements.txt ./ -RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt +COPY requirements.torch.txt ./ +RUN pip install --no-cache-dir -r requirements.torch.txt && rm requirements.torch.txt WORKDIR /usr/src/app diff --git a/Dockerfile.torch.cpu b/Dockerfile.torch.cpu new file mode 100644 index 0000000..68ceda1 --- /dev/null +++ b/Dockerfile.torch.cpu @@ -0,0 +1,49 @@ +FROM python:3.9 +LABEL maintainer="jlouradour@linagora.com" + +RUN apt-get update && \ + apt-get install -y --no-install-recommends \ + wget \ + nano \ + bzip2 \ + unzip \ + xz-utils \ + sox \ + ffmpeg \ + g++ \ + make \ + cmake \ + git \ + zlib1g-dev \ + automake \ + autoconf \ + libtool \ + pkg-config \ + ca-certificates + +RUN rm -rf /var/lib/apt/lists/* + +# Force CPU versions of torch +RUN pip3 install \ + torch==1.13.1+cpu \ + torchaudio==0.13.1+cpu \ + -f https://download.pytorch.org/whl/torch_stable.html + +# Install python dependencies +COPY requirements.torch.txt ./ +RUN pip install --no-cache-dir -r requirements.torch.txt && rm requirements.torch.txt + +WORKDIR /usr/src/app + +COPY stt /usr/src/app/stt +COPY celery_app /usr/src/app/celery_app +COPY http_server /usr/src/app/http_server +COPY websocket /usr/src/app/websocket +COPY document /usr/src/app/document +COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ + +ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" + +HEALTHCHECK CMD ./healthcheck.sh + +ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file diff --git a/http_server/ingress.py b/http_server/ingress.py index ce12e53..fae8c2f 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -5,13 +5,13 @@ import time from confparser import createParser -from flask import Flask, Response, abort, json, request -from flask_sock import Sock -from serving import GeventServing, GunicornServing +from flask import Flask, json, request +from serving import GunicornServing, GeventServing from swagger import setupSwaggerUI -from stt.processing import decode, load_wave_buffer, model, alignment_model, use_gpu +from stt.processing import decode, load_wave_buffer, model, alignment_model from stt import logger as stt_logger +from stt import SHOULD_USE_GEVENT app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False @@ -37,7 +37,7 @@ def oas_docs(): @app.route("/transcribe", methods=["POST"]) def transcribe(): try: - logger.info("Transcribe request received") + logger.info(f"Transcribe request received {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))}") # get response content type # logger.debug(request.headers.get("accept").lower()) @@ -46,33 +46,31 @@ def transcribe(): elif request.headers.get("accept").lower() == "text/plain": join_metadata = False else: - raise ValueError("Not accepted header") + raise ValueError(f"Not accepted header (accept={request.headers.get('accept')} should be either application/json or text/plain)") # logger.debug("Metadata: {}".format(join_metadata)) # get input file if "file" not in request.files.keys(): - raise ValueError("No audio file was uploaded") + raise ValueError(f"No audio file was uploaded (missing 'file' key)") file_buffer = request.files["file"].read() - audio_data = load_wave_buffer(file_buffer) start_t = time.time() + audio_data = load_wave_buffer(file_buffer) # Transcription transcription = decode( audio_data, model, alignment_model, join_metadata) logger.debug("Transcription complete (t={}s)".format(time.time() - start_t)) - 
logger.debug(f"END {id}: {time.time()}") - if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 return transcription["text"], 200 - except ValueError as error: - return str(error), 400 except Exception as error: - logger.error(error) - return "Server Error: {}".format(str(error)), 500 + import traceback + print(traceback.format_exc()) + logger.error(repr(error)) + return "Server Error: {}".format(str(error)), 400 if isinstance(error, ValueError) else 500 @app.errorhandler(405) @@ -109,7 +107,7 @@ def server_error(error): logger.info(f"Using {args.workers} workers") - if use_gpu: + if SHOULD_USE_GEVENT: # TODO: get rid of this serving_type = GeventServing logger.debug("Serving with gevent") else: diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt new file mode 100644 index 0000000..2cf4b8d --- /dev/null +++ b/requirements.ctranslate2.txt @@ -0,0 +1,12 @@ +celery[redis,auth,msgpack]>=4.4.7 +flask>=1.1.2 +flask-cors>=3.0.10 +flask-sock +flask-swagger-ui>=3.36.0 +gevent +gunicorn +pyyaml>=5.4.1 +requests>=2.26.0 +wavio>=0.0.4 +websockets +faster_whisper \ No newline at end of file diff --git a/requirements.txt b/requirements.torch.txt similarity index 85% rename from requirements.txt rename to requirements.torch.txt index bb3bebf..1b15744 100644 --- a/requirements.txt +++ b/requirements.torch.txt @@ -12,5 +12,5 @@ speechbrain transformers wavio>=0.0.4 websockets -# git+https://github.com/openai/whisper.git +# openai-whisper git+https://github.com/linto-ai/whisper-timestamped.git \ No newline at end of file diff --git a/stt/__init__.py b/stt/__init__.py index 73c3a1a..43c2725 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -1,8 +1,26 @@ import logging -import os logging.basicConfig( format="%(asctime)s %(name)s %(levelname)s: %(message)s", datefmt="%d/%m/%Y %H:%M:%S", ) logger = logging.getLogger("__stt__") + +try: + import faster_whisper + USE_CTRANSLATE2 = True +except ImportError: + USE_CTRANSLATE2 = False + +try: + import torch, torchaudio + USE_TORCH = True +except ImportError: + USE_TORCH = False + +# TODO: Get rid of that +if USE_TORCH: + SHOULD_USE_GEVENT = torch.cuda.is_available() + torch.set_num_threads(1) +else: + SHOULD_USE_GEVENT = USE_CTRANSLATE2 diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index d0da27a..86d6fde 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,40 +1,32 @@ import os import logging -import torch -import whisper_timestamped as whisper - from stt import logger -from stt.processing.decoding import decode, get_language -from stt.processing.utils import load_wave_buffer, load_audiofile +from .decoding import decode, get_language +from .utils import get_device, LANGUAGES, load_wave_buffer, load_audiofile from .load_model import load_whisper_model, load_alignment_model, get_alignment_model -__all__ = ["logger", "use_gpu", "decode", "model", "alignment_model", +__all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] # Set informative log logger.setLevel(logging.INFO) # Set device -device = os.environ.get("DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu") -try: - device = torch.device(device) -except Exception as err: - raise Exception("Failed to set device: {}".format(str(err))) from err -use_gpu = device.type == "cuda" +device, use_gpu = get_device() logger.info(f"Using device {device}") # Check language language = get_language() available_languages = \ - list(whisper.tokenizer.LANGUAGES.keys()) + \ - [k.title() for k 
in whisper.tokenizer.TO_LANGUAGE_CODE.keys()] + \ + list(LANGUAGES.keys()) + \ + [k.lower() for k in LANGUAGES.values()] + \ [None] if language not in available_languages: raise ValueError(f"Language '{get_language()}' is not available. Available languages are: {available_languages}") -if isinstance(language, str): - language = whisper.tokenizer.TO_LANGUAGE_CODE.get(language.lower(), language) +if isinstance(language, str) and language not in LANGUAGES: + language = {v: k for k, v in LANGUAGES.items()}[language.lower()] logger.info(f"Using language {language}") # Load ASR model diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index 8a7c39f..f026f7a 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -1,11 +1,12 @@ -import math -import torch -import torch.nn.utils.rnn as rnn_utils - -from stt import logger +from stt import logger, USE_TORCH +from .utils import SAMPLE_RATE from .load_model import get_model_type -import whisper +import math + +if USE_TORCH: + import torch + import torch.nn.utils.rnn as rnn_utils ################################################################################ # Get list of labes (and blank_id) from model @@ -135,7 +136,7 @@ def compute_logits_transformers(model_and_processor, audios, max_len): model, processor = model_and_processor # can be different from processor.feature_extractor.sampling_rate - sample_rate = whisper.audio.SAMPLE_RATE + sample_rate = SAMPLE_RATE device = model.device audios = [audio.numpy() for audio in audios] diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index cac8ab9..77b8881 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -1,19 +1,17 @@ import os -import whisper -from whisper.audio import SAMPLE_RATE -import whisper_timestamped - import numpy as np -import torch +import copy -from stt import logger -from .word_alignment import compute_alignment -from .text_normalize import remove_punctuation, normalize_text, remove_emoji +from stt import logger, USE_CTRANSLATE2 +from .utils import SAMPLE_RATE from .load_model import load_alignment_model, get_alignment_model +from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations +from .word_alignment import compute_alignment -# This is to avoid hanging in a multi-threaded environment -torch.set_num_threads(1) +if not USE_CTRANSLATE2: + import torch + import whisper_timestamped def get_language(): @@ -29,30 +27,83 @@ def get_language(): language = None return language -def decode(audio: torch.Tensor, - model: whisper.model.Whisper, + +def decode(audio, + model, alignment_model: "Any", with_word_timestamps: bool, language: str = None, + remove_punctuation_from_words=False, beam_size: int = None, best_of: int = None, temperature: float = 0.0, condition_on_previous_text: bool = False, no_speech_threshold: float = 0.6, - logprob_threshold: float = -1.0, compression_ratio_threshold: float = 2.4, - normalize_text_as_words=False, - remove_punctuation_from_words=False, ) -> dict: - """Transcribe the audio data using Whisper with the defined model.""" - - fp16 = model.device != torch.device("cpu") if language is None: language = get_language() + kwargs = copy.copy(locals()) + logger.info(f"Transcribing audio with language {language}...") + if USE_CTRANSLATE2: + kwargs.pop("alignment_model") + return decode_ct2(**kwargs) + else: + return decode_torch(**kwargs) + + +def decode_ct2(audio, + model, + with_word_timestamps, + language, + 
remove_punctuation_from_words, + **kwargs + ): + + kwargs["no_speech_threshold"] = 1 # To avoid empty output + if kwargs.get("beam_size") is None: + kwargs["beam_size"] = 1 + if kwargs.get("best_of") is None: + kwargs["best_of"] = 1 + + segments, info = model.transcribe( + audio, + word_timestamps=with_word_timestamps, + language=language, + # Careful with the following options + max_initial_timestamp=10000.0, + **kwargs) + + segments = list(segments) + + return format_faster_whisper_response( + segments, info, + remove_punctuation_from_words=remove_punctuation_from_words + ) + + +def decode_torch(audio, + model, + alignment_model, + with_word_timestamps, + language, + remove_punctuation_from_words, + beam_size, + best_of, + temperature, + condition_on_previous_text, + no_speech_threshold, + compression_ratio_threshold, + normalize_text_as_words=False, + ): + """Transcribe the audio data using Whisper with the defined model.""" + + fp16 = model.device != torch.device("cpu") + kwargs = dict( language=language, fp16=fp16, @@ -61,7 +112,6 @@ def decode(audio: torch.Tensor, best_of=best_of, condition_on_previous_text=condition_on_previous_text, no_speech_threshold=no_speech_threshold, - logprob_threshold=logprob_threshold, compression_ratio_threshold=compression_ratio_threshold ) @@ -71,7 +121,7 @@ def decode(audio: torch.Tensor, if language is None: language = whisper_res["language"] logger.info(f"Detected language: {language}") - return format_whisper_timestamped_response(whisper_res) + return format_whisper_timestamped_response(whisper_res, remove_punctuation_from_words=remove_punctuation_from_words) # Force deterministic results torch.manual_seed(1234) @@ -99,11 +149,12 @@ def decode(audio: torch.Tensor, else: spec_alignment_model = alignment_model - result = {} result["text"] = text result["language"] = language - result["confidence-score"] = np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 + result["confidence-score"] = np.exp( + np.array([r["avg_logprob"] for r in segments]) + ).mean() if len(segments) else 0.0 if not with_word_timestamps: if not normalize_text_as_words: @@ -174,31 +225,91 @@ def decode(audio: torch.Tensor, return result -def format_whisper_timestamped_response(transcription): + +def format_whisper_timestamped_response(transcription, remove_punctuation_from_words=False): """Format Whisper response.""" for i, seg in enumerate(transcription["segments"][:-1]): for expected_keys in ["start", "end", "words", "avg_logprob"]: assert expected_keys in seg, f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" - text = transcription["text"].strip() - words = [] segments = transcription.get("segments", []) for seg in segments: for word in seg.get("words", []): + text = word["text"] + if remove_punctuation_from_words: + text = remove_punctuation(text) words.append({ - "word": word["text"], + "word": text, "start": word["start"], "end": word["end"], "conf": word["confidence"], }) return { - "text": text, + "text": transcription["text"].strip(), "language": transcription["language"], - "confidence-score": np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0, + "confidence-score": round(np.exp(np.array([r["avg_logprob"] for r in segments])).mean(), 2) if len(segments) else 0.0, "words": words, - } \ No newline at end of file + } + + +def format_faster_whisper_response(segments, info, + remove_punctuation_from_words=False): + + language = info.language + duration = info.duration + + def 
checked_timestamps(start, end=None): + if start > duration or (end is not None and end > duration): + print("WARNING, timestamp %f is greater than duration %f" % (max(start, end if end else start), duration)) + if end and end <= start: + if end == start: + pass # end = start + 0.01 + else: + print("WARNING, end timestamp %f is smaller than start timestamp %f" % (end, start)) + if end is None: + return start + return (start, end) + + segments_list = [] + for segment in segments: + start, end = checked_timestamps(segment.start, segment.end) + + words = [] + if segment.words: + for word in segment.words: + if len(words) and (not(word.word.strip()) or word.word.strip()[0] in _punctuations): + words[-1]["text"] += word.word + if word.word.strip() not in _punctuations: + words[-1]["confidence"].append(word.probability) + _, words[-1]["end"] = checked_timestamps(words[-1]["end"], word.end) + continue + words.append( + {"text": word.word, "confidence": [word.probability]} | dict(zip(("start", "end"), checked_timestamps(word.start, word.end))) + ) + + for word in words: + word["text"] = word["text"].strip() + word["confidence"] = round(np.mean([c for c in word["confidence"]]), 2) + + segments_list.append({ + "text": segment.text.strip(), + "start": start, + "end": end, + "avg_logprob": segment.avg_log_prob, + "words": words + }) + + assert len(segments_list) + + transcription = { + "text": " ".join(segment["text"] for segment in segments_list), + "language": language, + "confidence": round(np.exp(np.mean([segment.avg_log_prob for segment in segments])), 2), + "segments": segments_list, + } + return format_whisper_timestamped_response(transcription, remove_punctuation_from_words=remove_punctuation_from_words) \ No newline at end of file diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index c8754d4..dc2e9fc 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -1,14 +1,20 @@ -import whisper_timestamped as whisper - import os import requests -import huggingface_hub -import speechbrain as sb -import transformers -import torchaudio - import time -from stt import logger + +from stt import logger, USE_CTRANSLATE2, USE_TORCH +from .utils import LANGUAGES + +if USE_CTRANSLATE2: + import faster_whisper as whisper +else: + import whisper_timestamped as whisper + +if USE_TORCH: + import huggingface_hub + import speechbrain as sb + import transformers + import torchaudio # Sources: # * https://github.com/m-bain/whisperX (in whisperx/transcribe.py) @@ -42,20 +48,24 @@ } -def get_alignment_model(alignment_model_name, language, force = False): +def get_alignment_model(alignment_model_name, language, force=False): if alignment_model_name in ["wav2vec", "wav2vec2"]: if language is None: - # Will load alignment model on the fly depending on detected language + # Will load alignment model on the fly depending + # on detected language return {} elif language in ALIGNMENT_MODELS: return ALIGNMENT_MODELS[language] elif force: - raise ValueError(f"No wav2vec alignment model for language '{language}'.") + raise ValueError( + f"No wav2vec alignment model for language '{language}'.") else: - logger.warn(f"No wav2vec alignment model for language '{language}'. Fallback to English.") + logger.warn( + f"No wav2vec alignment model for language '{language}'. Fallback to English." 
+ ) return ALIGNMENT_MODELS["en"] - elif alignment_model_name in whisper.tokenizer.LANGUAGES.keys(): - return get_alignment_model("wav2vec", alignment_model_name, force = True) + elif alignment_model_name in LANGUAGES.keys(): + return get_alignment_model("wav2vec", alignment_model_name, force=True) return alignment_model_name @@ -63,11 +73,27 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): start = time.time() - model = whisper.load_model(model_type_or_file, device=device, - download_root=os.path.join(download_root, "whisper")) + if USE_CTRANSLATE2: + if not os.path.isdir(model_type_or_file): + # To specify the cache directory + model_type_or_file = whisper.utils.download_model( + model_type_or_file, + output_dir=os.path.join(download_root, "huggingface/hub") + ) + model = whisper.WhisperModel(model_type_or_file, device=device, + # vvv TODO + compute_type="default", + # cpu_threads=0, + # num_workers=1, + ) - model.eval() - model.requires_grad_(False) + else: + model = whisper.load_model( + model_type_or_file, device=device, + download_root=os.path.join(download_root, "whisper") + ) + model.eval() + model.requires_grad_(False) logger.info("Whisper Model loaded. (t={}s)".format(time.time() - start)) @@ -76,21 +102,29 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): def load_alignment_model(source, device="cpu", download_root="/opt"): + if not USE_TORCH: + raise NotImplementedError( + "Alignement model not available without Torch") + start = time.time() if source in torchaudio.pipelines.__all__: - model = load_torchaudio_model(source, device=device, download_root=download_root) + model = load_torchaudio_model( + source, device=device, download_root=download_root) else: try: - model = load_transformers_model(source, device=device, download_root=download_root) + model = load_transformers_model( + source, device=device, download_root=download_root) except Exception as err1: try: - model = load_speechbrain_model(source, device=device, download_root=download_root) + model = load_speechbrain_model( + source, device=device, download_root=download_root) except Exception as err2: raise Exception( f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 - logger.info(f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)") + logger.info( + f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)") return model diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index da2675e..a4037bd 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -2,7 +2,6 @@ import re # import string import unicodedata -from num2words import num2words from stt import logger from .utils import flatten @@ -44,7 +43,6 @@ def remove_emoji(text): def normalize_text(text: str, lang: str) -> str: """ Transform digits into characters... """ - # Reorder currencies (1,20€ -> 1 € 20) coma = "," if lang in ["fr"] else "\." 
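As a side note on the currency reordering mentioned in the comment just above, the idea can be illustrated with a rough standalone sketch (an approximation for French text with a decimal comma, not the project's actual normalize_text implementation, which covers several currencies and languages):

# Illustrative approximation of the "1,20€ -> 1 € 20" reordering described above.
import re

def reorder_currency(text, coma=","):
    # Move the currency symbol between the integer and decimal parts.
    return re.sub(rf"(\d+){coma}(\d+) *€", r"\1 € \2", text)

print(reorder_currency("ça coûte 1,20€"))  # -> "ça coûte 1 € 20"
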
for c in _currencies: @@ -215,6 +213,7 @@ def robust_num2words(x, lang, to="cardinal", orig=""): """ Bugfix for num2words """ + from num2words import num2words try: res = num2words(x, lang=lang, to=to) if lang == "fr" and to == "ordinal": diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 5ff706a..e8b2bd8 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -1,17 +1,42 @@ +from stt import USE_CTRANSLATE2, USE_TORCH + import io import wavio import os import numpy as np -import torch -import torchaudio -import whisper +SAMPLE_RATE = 16000 # whisper.audio.SAMPLE_RATE + +if USE_CTRANSLATE2: + import ctranslate2 + import faster_whisper +else: + import torch + import torchaudio + import whisper + +def has_cuda(): + if USE_CTRANSLATE2: + return ctranslate2.get_cuda_device_count() > 0 + else: + return torch.cuda.is_available() + +def get_device(): + device = os.environ.get("DEVICE", "cuda" if has_cuda() else "cpu") + use_gpu = "cuda" in device + if not USE_CTRANSLATE2: + try: + device = torch.device(device) + except Exception as err: + raise Exception("Failed to set device: {}".format(str(err))) from err + return device, use_gpu def conform_audio(audio, sample_rate=16_000): - if sample_rate != whisper.audio.SAMPLE_RATE: + if sample_rate != SAMPLE_RATE: + if not USE_TORCH: + raise NotImplementedError("Resampling not available without Torch") # Down or Up sample to the right sampling rate - audio = torchaudio.transforms.Resample( - sample_rate, whisper.audio.SAMPLE_RATE)(audio) + audio = torchaudio.transforms.Resample(sample_rate, SAMPLE_RATE)(audio) if audio.shape[0] > 1: # Stereo to mono # audio = torchaudio.transforms.DownmixMono()(audio, channels_first = True) @@ -26,8 +51,8 @@ def load_audiofile(path): raise RuntimeError("File not found: %s" % path) elif not os.access(path, os.R_OK): raise RuntimeError("Missing reading permission for: %s" % path) - # audio, sample_rate = torchaudio.load(path) - # return conform_audio(audio, sample_rate) + if USE_CTRANSLATE2: + return faster_whisper.decode_audio(path, sampling_rate=SAMPLE_RATE) audio = whisper.load_audio(path) audio = torch.from_numpy(audio) return audio @@ -36,10 +61,13 @@ def load_audiofile(path): def load_wave_buffer(file_buffer): """ Formats audio from a wavFile buffer to a torch array for processing. 
""" file_buffer_io = io.BytesIO(file_buffer) + if USE_CTRANSLATE2: + return faster_whisper.decode_audio(file_buffer_io, sampling_rate=SAMPLE_RATE) file_content = wavio.read(file_buffer_io) sample_rate = file_content.rate - audio = torch.from_numpy(file_content.data.astype(np.float32)/32768) - audio = audio.transpose(0, 1) + audio = file_content.data.astype(np.float32)/32768 + audio = audio.transpose() + audio = torch.from_numpy(audio) return conform_audio(audio, sample_rate) @@ -48,3 +76,105 @@ def flatten(l): flatten a list of lists """ return [item for sublist in l for item in sublist] + +LANGUAGES = { # whisper.tokenizer.LANGUAGES + 'en': 'english', + 'zh': 'chinese', + 'de': 'german', + 'es': 'spanish', + 'ru': 'russian', + 'ko': 'korean', + 'fr': 'french', + 'ja': 'japanese', + 'pt': 'portuguese', + 'tr': 'turkish', + 'pl': 'polish', + 'ca': 'catalan', + 'nl': 'dutch', + 'ar': 'arabic', + 'sv': 'swedish', + 'it': 'italian', + 'id': 'indonesian', + 'hi': 'hindi', + 'fi': 'finnish', + 'vi': 'vietnamese', + 'he': 'hebrew', + 'uk': 'ukrainian', + 'el': 'greek', + 'ms': 'malay', + 'cs': 'czech', + 'ro': 'romanian', + 'da': 'danish', + 'hu': 'hungarian', + 'ta': 'tamil', + 'no': 'norwegian', + 'th': 'thai', + 'ur': 'urdu', + 'hr': 'croatian', + 'bg': 'bulgarian', + 'lt': 'lithuanian', + 'la': 'latin', + 'mi': 'maori', + 'ml': 'malayalam', + 'cy': 'welsh', + 'sk': 'slovak', + 'te': 'telugu', + 'fa': 'persian', + 'lv': 'latvian', + 'bn': 'bengali', + 'sr': 'serbian', + 'az': 'azerbaijani', + 'sl': 'slovenian', + 'kn': 'kannada', + 'et': 'estonian', + 'mk': 'macedonian', + 'br': 'breton', + 'eu': 'basque', + 'is': 'icelandic', + 'hy': 'armenian', + 'ne': 'nepali', + 'mn': 'mongolian', + 'bs': 'bosnian', + 'kk': 'kazakh', + 'sq': 'albanian', + 'sw': 'swahili', + 'gl': 'galician', + 'mr': 'marathi', + 'pa': 'punjabi', + 'si': 'sinhala', + 'km': 'khmer', + 'sn': 'shona', + 'yo': 'yoruba', + 'so': 'somali', + 'af': 'afrikaans', + 'oc': 'occitan', + 'ka': 'georgian', + 'be': 'belarusian', + 'tg': 'tajik', + 'sd': 'sindhi', + 'gu': 'gujarati', + 'am': 'amharic', + 'yi': 'yiddish', + 'lo': 'lao', + 'uz': 'uzbek', + 'fo': 'faroese', + 'ht': 'haitian creole', + 'ps': 'pashto', + 'tk': 'turkmen', + 'nn': 'nynorsk', + 'mt': 'maltese', + 'sa': 'sanskrit', + 'lb': 'luxembourgish', + 'my': 'myanmar', + 'bo': 'tibetan', + 'tl': 'tagalog', + 'mg': 'malagasy', + 'as': 'assamese', + 'tt': 'tatar', + 'haw': 'hawaiian', + 'ln': 'lingala', + 'ha': 'hausa', + 'ba': 'bashkir', + 'jw': 'javanese', + 'su': 'sundanese' +} diff --git a/stt/processing/word_alignment.py b/stt/processing/word_alignment.py index ba94a14..229fb43 100644 --- a/stt/processing/word_alignment.py +++ b/stt/processing/word_alignment.py @@ -1,14 +1,16 @@ """ -source: https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html +Credits: https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html """ +from stt import logger, USE_TORCH from dataclasses import dataclass -import torch -from stt import logger from .alignment_model import compute_logprobas, get_vocab from .utils import flatten from .text_normalize import transliterate +if USE_TORCH: + import torch + _unknown_chars = [] def compute_alignment(audio, transcript, model): From 888227ea274b18719318c4027b008794d68a0f06 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 12 Apr 2023 12:49:51 +0200 Subject: [PATCH 122/172] better format in logger timing --- http_server/ingress.py | 6 +----- stt/__init__.py | 4 ++-- 2 files changed, 3 
insertions(+), 7 deletions(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index fae8c2f..d5524ca 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -17,10 +17,6 @@ app.config["JSON_AS_ASCII"] = False app.config["JSON_SORT_KEYS"] = False -logging.basicConfig( - format="%(asctime)s %(name)s %(levelname)s: %(message)s", - datefmt="%d/%m/%Y %H:%M:%S", -) logger = logging.getLogger("__stt-standalone-worker__") @@ -37,7 +33,7 @@ def oas_docs(): @app.route("/transcribe", methods=["POST"]) def transcribe(): try: - logger.info(f"Transcribe request received {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))}") + logger.info(f"Transcribe request received") # get response content type # logger.debug(request.headers.get("accept").lower()) diff --git a/stt/__init__.py b/stt/__init__.py index 43c2725..5460088 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -1,8 +1,8 @@ import logging logging.basicConfig( - format="%(asctime)s %(name)s %(levelname)s: %(message)s", - datefmt="%d/%m/%Y %H:%M:%S", + format="[%(asctime)s,%(msecs)03d %(name)s] %(levelname)s: %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", ) logger = logging.getLogger("__stt__") From ff1bf622267b14b15d99c05c6e7ca0e665bff86d Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 12 Apr 2023 13:08:37 +0200 Subject: [PATCH 123/172] reorganize code: move alignment model related stuff in alignment_model.py --- stt/processing/__init__.py | 3 +- stt/processing/alignment_model.py | 182 +++++++++++++++++++++++++++++- stt/processing/decoding.py | 4 +- stt/processing/load_model.py | 180 +---------------------------- 4 files changed, 185 insertions(+), 184 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 86d6fde..f891984 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -5,7 +5,8 @@ from .decoding import decode, get_language from .utils import get_device, LANGUAGES, load_wave_buffer, load_audiofile -from .load_model import load_whisper_model, load_alignment_model, get_alignment_model +from .load_model import load_whisper_model +from .alignment_model import load_alignment_model, get_alignment_model __all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index f026f7a..08a5e45 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -1,15 +1,191 @@ from stt import logger, USE_TORCH -from .utils import SAMPLE_RATE -from .load_model import get_model_type +from .utils import SAMPLE_RATE, LANGUAGES +import os import math +import time +import requests if USE_TORCH: import torch import torch.nn.utils.rnn as rnn_utils + import huggingface_hub + import speechbrain as sb + import transformers + import torchaudio ################################################################################ -# Get list of labes (and blank_id) from model +# Load models + +# Sources: +# * https://github.com/m-bain/whisperX (in whisperx/transcribe.py) +# * https://pytorch.org/audio/stable/pipelines.html +# * https://huggingface.co/jonatasgrosman + +ALIGNMENT_MODELS = { + "en": "WAV2VEC2_ASR_BASE_960H", + # "en": "jonatasgrosman/wav2vec2-large-xlsr-53-english", + "fr": "VOXPOPULI_ASR_BASE_10K_FR", + # "fr": "jonatasgrosman/wav2vec2-large-xlsr-53-french", + "de": "VOXPOPULI_ASR_BASE_10K_DE", + # "de": "jonatasgrosman/wav2vec2-large-xlsr-53-german", + "es": "VOXPOPULI_ASR_BASE_10K_ES", + # "it": 
"jonatasgrosman/wav2vec2-large-xlsr-53-spanish", + "it": "VOXPOPULI_ASR_BASE_10K_IT", + # "it": "jonatasgrosman/wav2vec2-large-xlsr-53-italian", + "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese", + "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", + "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish", + "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish", + "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian", + "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek", + "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian", + "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic", + "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian", + "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", + "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", + "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", + "vi": "nguyenvulebinh/wav2vec2-base-vietnamese-250h", +} + + +def get_alignment_model(alignment_model_name, language, force=False): + if alignment_model_name in ["wav2vec", "wav2vec2"]: + if language is None: + # Will load alignment model on the fly depending + # on detected language + return {} + elif language in ALIGNMENT_MODELS: + return ALIGNMENT_MODELS[language] + elif force: + raise ValueError( + f"No wav2vec alignment model for language '{language}'.") + else: + logger.warn( + f"No wav2vec alignment model for language '{language}'. Fallback to English." + ) + return ALIGNMENT_MODELS["en"] + elif alignment_model_name in LANGUAGES.keys(): + return get_alignment_model("wav2vec", alignment_model_name, force=True) + return alignment_model_name + +def load_alignment_model(source, device="cpu", download_root="/opt"): + + if not USE_TORCH: + raise NotImplementedError( + "Alignement model not available without Torch") + + start = time.time() + + if source in torchaudio.pipelines.__all__: + model = load_torchaudio_model( + source, device=device, download_root=download_root) + else: + try: + model = load_transformers_model( + source, device=device, download_root=download_root) + except Exception as err1: + try: + model = load_speechbrain_model( + source, device=device, download_root=download_root) + except Exception as err2: + raise Exception( + f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 + + logger.info( + f"Alignment Model of type {get_model_type(model)} loaded. 
(t={time.time() - start}s)") + + return model + + +def load_speechbrain_model(source, device="cpu", download_root="/opt"): + + if os.path.isdir(source): + yaml_file = os.path.join(source, "hyperparams.yaml") + assert os.path.isfile( + yaml_file), f"Hyperparams file {yaml_file} not found" + else: + try: + yaml_file = huggingface_hub.hf_hub_download( + repo_id=source, filename="hyperparams.yaml", cache_dir=os.path.join(download_root, "huggingface/hub")) + except requests.exceptions.HTTPError: + yaml_file = None + overrides = make_yaml_overrides( + yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) + + savedir = os.path.join(download_root, "speechbrain") + try: + model = sb.pretrained.EncoderASR.from_hparams( + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) + except ValueError: + model = sb.pretrained.EncoderDecoderASR.from_hparams( + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) + + model.train(False) + model.requires_grad_(False) + return model + + +def load_transformers_model(source, device="cpu", download_root="/opt"): + + model = transformers.Wav2Vec2ForCTC.from_pretrained(source).to(device) + processor = transformers.Wav2Vec2Processor.from_pretrained(source) + + model.eval() + model.requires_grad_(False) + return model, processor + + +def load_torchaudio_model(source, device="cpu", download_root="/opt"): + + bundle = torchaudio.pipelines.__dict__[source] + model = bundle.get_model().to(device) + labels = bundle.get_labels() + + model.eval() + model.requires_grad_(False) + return model, labels + + +def get_model_type(model): + if not isinstance(model, tuple): + return "speechbrain" + assert len(model) == 2, "Invalid model type" + if isinstance(model[0], transformers.Wav2Vec2ForCTC): + return "transformers" + return "torchaudio" + + +def make_yaml_overrides(yaml_file, key_values): + """ + return a dictionary of overrides to be used with speechbrain (hyperyaml files) + yaml_file: path to yaml file + key_values: dict of key values to override + """ + if yaml_file is None: + return None + + override = {} + with open(yaml_file, "r") as f: + parent = None + for line in f: + if line.strip() == "": + parent = None + elif line == line.lstrip(): + if ":" in line: + parent = line.split(":")[0].strip() + if parent in key_values: + override[parent] = key_values[parent] + elif ":" in line: + child = line.strip().split(":")[0].strip() + if child in key_values: + override[parent] = override.get(parent, {}) | { + child: key_values[child]} + return override + + +################################################################################ +# Get list of labels (and blank_id) from model def get_vocab(model): diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 77b8881..f6d00e5 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -5,8 +5,8 @@ from stt import logger, USE_CTRANSLATE2 from .utils import SAMPLE_RATE -from .load_model import load_alignment_model, get_alignment_model from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations +from .alignment_model import get_alignment_model, load_alignment_model from .word_alignment import compute_alignment if not USE_CTRANSLATE2: @@ -312,4 +312,4 @@ def checked_timestamps(start, end=None): "confidence": round(np.exp(np.mean([segment.avg_log_prob for segment in segments])), 2), "segments": segments_list, } - return format_whisper_timestamped_response(transcription, 
remove_punctuation_from_words=remove_punctuation_from_words) \ No newline at end of file + return format_whisper_timestamped_response(transcription, remove_punctuation_from_words=remove_punctuation_from_words) diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index dc2e9fc..4ba9f0e 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -1,74 +1,13 @@ import os -import requests import time -from stt import logger, USE_CTRANSLATE2, USE_TORCH -from .utils import LANGUAGES +from stt import logger, USE_CTRANSLATE2 if USE_CTRANSLATE2: import faster_whisper as whisper else: import whisper_timestamped as whisper -if USE_TORCH: - import huggingface_hub - import speechbrain as sb - import transformers - import torchaudio - -# Sources: -# * https://github.com/m-bain/whisperX (in whisperx/transcribe.py) -# * https://pytorch.org/audio/stable/pipelines.html -# * https://huggingface.co/jonatasgrosman - -ALIGNMENT_MODELS = { - "en": "WAV2VEC2_ASR_BASE_960H", - # "en": "jonatasgrosman/wav2vec2-large-xlsr-53-english", - "fr": "VOXPOPULI_ASR_BASE_10K_FR", - # "fr": "jonatasgrosman/wav2vec2-large-xlsr-53-french", - "de": "VOXPOPULI_ASR_BASE_10K_DE", - # "de": "jonatasgrosman/wav2vec2-large-xlsr-53-german", - "es": "VOXPOPULI_ASR_BASE_10K_ES", - # "it": "jonatasgrosman/wav2vec2-large-xlsr-53-spanish", - "it": "VOXPOPULI_ASR_BASE_10K_IT", - # "it": "jonatasgrosman/wav2vec2-large-xlsr-53-italian", - "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese", - "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", - "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish", - "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish", - "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian", - "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek", - "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian", - "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic", - "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian", - "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", - "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", - "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", - "vi": "nguyenvulebinh/wav2vec2-base-vietnamese-250h", -} - - -def get_alignment_model(alignment_model_name, language, force=False): - if alignment_model_name in ["wav2vec", "wav2vec2"]: - if language is None: - # Will load alignment model on the fly depending - # on detected language - return {} - elif language in ALIGNMENT_MODELS: - return ALIGNMENT_MODELS[language] - elif force: - raise ValueError( - f"No wav2vec alignment model for language '{language}'.") - else: - logger.warn( - f"No wav2vec alignment model for language '{language}'. Fallback to English." - ) - return ALIGNMENT_MODELS["en"] - elif alignment_model_name in LANGUAGES.keys(): - return get_alignment_model("wav2vec", alignment_model_name, force=True) - return alignment_model_name - - def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): start = time.time() @@ -97,119 +36,4 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): logger.info("Whisper Model loaded. 
(t={}s)".format(time.time() - start)) - return model - - -def load_alignment_model(source, device="cpu", download_root="/opt"): - - if not USE_TORCH: - raise NotImplementedError( - "Alignement model not available without Torch") - - start = time.time() - - if source in torchaudio.pipelines.__all__: - model = load_torchaudio_model( - source, device=device, download_root=download_root) - else: - try: - model = load_transformers_model( - source, device=device, download_root=download_root) - except Exception as err1: - try: - model = load_speechbrain_model( - source, device=device, download_root=download_root) - except Exception as err2: - raise Exception( - f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 - - logger.info( - f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)") - - return model - - -def load_speechbrain_model(source, device="cpu", download_root="/opt"): - - if os.path.isdir(source): - yaml_file = os.path.join(source, "hyperparams.yaml") - assert os.path.isfile( - yaml_file), f"Hyperparams file {yaml_file} not found" - else: - try: - yaml_file = huggingface_hub.hf_hub_download( - repo_id=source, filename="hyperparams.yaml", cache_dir=os.path.join(download_root, "huggingface/hub")) - except requests.exceptions.HTTPError: - yaml_file = None - overrides = make_yaml_overrides( - yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) - - savedir = os.path.join(download_root, "speechbrain") - try: - model = sb.pretrained.EncoderASR.from_hparams( - source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) - except ValueError: - model = sb.pretrained.EncoderDecoderASR.from_hparams( - source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) - - model.train(False) - model.requires_grad_(False) - return model - - -def load_transformers_model(source, device="cpu", download_root="/opt"): - - model = transformers.Wav2Vec2ForCTC.from_pretrained(source).to(device) - processor = transformers.Wav2Vec2Processor.from_pretrained(source) - - model.eval() - model.requires_grad_(False) - return model, processor - - -def load_torchaudio_model(source, device="cpu", download_root="/opt"): - - bundle = torchaudio.pipelines.__dict__[source] - model = bundle.get_model().to(device) - labels = bundle.get_labels() - - model.eval() - model.requires_grad_(False) - return model, labels - - -def get_model_type(model): - if not isinstance(model, tuple): - return "speechbrain" - assert len(model) == 2, "Invalid model type" - if isinstance(model[0], transformers.Wav2Vec2ForCTC): - return "transformers" - return "torchaudio" - - -def make_yaml_overrides(yaml_file, key_values): - """ - return a dictionary of overrides to be used with speechbrain (hyperyaml files) - yaml_file: path to yaml file - key_values: dict of key values to override - """ - if yaml_file is None: - return None - - override = {} - with open(yaml_file, "r") as f: - parent = None - for line in f: - if line.strip() == "": - parent = None - elif line == line.lstrip(): - if ":" in line: - parent = line.split(":")[0].strip() - if parent in key_values: - override[parent] = key_values[parent] - elif ":" in line: - child = line.strip().split(":")[0].strip() - if child in key_values: - override[parent] = override.get(parent, {}) | { - child: key_values[child]} - return override + return model \ No newline at end of file From 16b5743960b3ece59bb69fe8c0f902fa7373f04c Mon Sep 17 00:00:00 2001 From: 
Jeronymous Date: Wed, 12 Apr 2023 18:32:51 +0200 Subject: [PATCH 124/172] cosm --- stt/processing/load_model.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 4ba9f0e..14ec3e2 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -19,11 +19,11 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): model_type_or_file, output_dir=os.path.join(download_root, "huggingface/hub") ) - model = whisper.WhisperModel(model_type_or_file, device=device, - # vvv TODO + model = whisper.WhisperModel(model_type_or_file, + device=device, compute_type="default", - # cpu_threads=0, - # num_workers=1, + cpu_threads=0, # Can be controled with OMP_NUM_THREADS + num_workers=1, ) else: From 916e1292454a9967023f8968f0cf19b2522fcbe1 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 12 Apr 2023 18:34:35 +0200 Subject: [PATCH 125/172] log processing time at a common place --- http_server/ingress.py | 3 +-- stt/processing/decoding.py | 12 +++++++++--- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index d5524ca..bab7b65 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -50,13 +50,12 @@ def transcribe(): raise ValueError(f"No audio file was uploaded (missing 'file' key)") file_buffer = request.files["file"].read() - start_t = time.time() + audio_data = load_wave_buffer(file_buffer) # Transcription transcription = decode( audio_data, model, alignment_model, join_metadata) - logger.debug("Transcription complete (t={}s)".format(time.time() - start_t)) if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index f6d00e5..6572053 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -1,5 +1,5 @@ import os - +import time import numpy as np import copy @@ -49,11 +49,17 @@ def decode(audio, logger.info(f"Transcribing audio with language {language}...") + start_t = time.time() + if USE_CTRANSLATE2: kwargs.pop("alignment_model") - return decode_ct2(**kwargs) + res = decode_ct2(**kwargs) else: - return decode_torch(**kwargs) + res = decode_torch(**kwargs) + + logger.info("Transcription complete (t={}s)".format(time.time() - start_t)) + + return res def decode_ct2(audio, From 182443be94e92cb3707d65211c8cd5b4593034ac Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 13 Apr 2023 11:22:33 +0200 Subject: [PATCH 126/172] Simplify Dockerfile --- Dockerfile.ctranslate2 | 24 ++---------------------- Dockerfile.ctranslate2.cpu | 23 +++++++++++++++++++++++ Dockerfile.torch | 22 +--------------------- Dockerfile.torch.cpu | 22 +--------------------- 4 files changed, 27 insertions(+), 64 deletions(-) create mode 100644 Dockerfile.ctranslate2.cpu diff --git a/Dockerfile.ctranslate2 b/Dockerfile.ctranslate2 index 5989b3f..e2e0008 100644 --- a/Dockerfile.ctranslate2 +++ b/Dockerfile.ctranslate2 @@ -1,27 +1,7 @@ -FROM python:3.9 +FROM ghcr.io/opennmt/ctranslate2:latest-ubuntu20.04-cuda11.2 LABEL maintainer="jlouradour@linagora.com" -RUN apt-get update && \ - apt-get install -y --no-install-recommends \ - wget \ - nano \ - bzip2 \ - unzip \ - xz-utils \ - sox \ - ffmpeg \ - g++ \ - make \ - cmake \ - git \ - zlib1g-dev \ - automake \ - autoconf \ - libtool \ - pkg-config \ - ca-certificates - -RUN rm -rf /var/lib/apt/lists/* +RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg # Install 
python dependencies COPY requirements.ctranslate2.txt ./ diff --git a/Dockerfile.ctranslate2.cpu b/Dockerfile.ctranslate2.cpu new file mode 100644 index 0000000..46c148e --- /dev/null +++ b/Dockerfile.ctranslate2.cpu @@ -0,0 +1,23 @@ +FROM python:3.9 +LABEL maintainer="jlouradour@linagora.com" + +RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg + +# Install python dependencies +COPY requirements.ctranslate2.txt ./ +RUN pip install --no-cache-dir -r requirements.ctranslate2.txt && rm requirements.ctranslate2.txt + +WORKDIR /usr/src/app + +COPY stt /usr/src/app/stt +COPY celery_app /usr/src/app/celery_app +COPY http_server /usr/src/app/http_server +COPY websocket /usr/src/app/websocket +COPY document /usr/src/app/document +COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ + +ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" + +HEALTHCHECK CMD ./healthcheck.sh + +ENTRYPOINT ["./docker-entrypoint.sh"] \ No newline at end of file diff --git a/Dockerfile.torch b/Dockerfile.torch index 9db2a58..37480c0 100644 --- a/Dockerfile.torch +++ b/Dockerfile.torch @@ -1,27 +1,7 @@ FROM python:3.9 LABEL maintainer="jlouradour@linagora.com" -RUN apt-get update && \ - apt-get install -y --no-install-recommends \ - wget \ - nano \ - bzip2 \ - unzip \ - xz-utils \ - sox \ - ffmpeg \ - g++ \ - make \ - cmake \ - git \ - zlib1g-dev \ - automake \ - autoconf \ - libtool \ - pkg-config \ - ca-certificates - -RUN rm -rf /var/lib/apt/lists/* +RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg # Install python dependencies COPY requirements.torch.txt ./ diff --git a/Dockerfile.torch.cpu b/Dockerfile.torch.cpu index 68ceda1..72582b6 100644 --- a/Dockerfile.torch.cpu +++ b/Dockerfile.torch.cpu @@ -1,27 +1,7 @@ FROM python:3.9 LABEL maintainer="jlouradour@linagora.com" -RUN apt-get update && \ - apt-get install -y --no-install-recommends \ - wget \ - nano \ - bzip2 \ - unzip \ - xz-utils \ - sox \ - ffmpeg \ - g++ \ - make \ - cmake \ - git \ - zlib1g-dev \ - automake \ - autoconf \ - libtool \ - pkg-config \ - ca-certificates - -RUN rm -rf /var/lib/apt/lists/* +RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg # Force CPU versions of torch RUN pip3 install \ From 51ffca54cf8db6f33cc8921737bac435e250e429 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 13 Apr 2023 11:23:18 +0200 Subject: [PATCH 127/172] tune and document default .env file --- .envdefault | 45 +++++++++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/.envdefault b/.envdefault index d4bb2e8..1dbc2b1 100644 --- a/.envdefault +++ b/.envdefault @@ -1,22 +1,39 @@ +############################################ # SERVING PARAMETERS +############################################ +# "http" or "task" SERVICE_MODE=http -MODEL=/opt/model.pt -# LANGUAGE can be in different formats: en, en-US, English, ... -# If not set or "*", the language will be detected automatically. +# Below: used when SERVICE_MODE=task +SERVICE_NAME=stt +SERVICES_BROKER=redis://172.17.0.1:6379 +BROKER_PASS= + +############################################ +# STT MODELING PARAMETERS +############################################ + +# The model can be a path to a model, or a model name ("tiny", "base", "small", "medium", "large-v1" or "large-v2") +MODEL=medium + +# The language can be in different formats: "en", "en-US", "English", ... +# If not set or set to "*", the language will be detected automatically. 
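The accepted LANGUAGE formats documented above are normalized once at startup; the following is a condensed standalone sketch of that resolution logic (mirroring get_language() in stt/processing/utils.py, with the language table truncated for brevity):

# Condensed sketch of how the LANGUAGE setting is resolved.
LANGUAGES = {"en": "english", "fr": "french"}  # truncated for the example

def resolve_language(value):
    if len(value) > 2 and value[2] == "-":      # "en-US" -> "en"
        value = value.split("-")[0]
    if value == "*":                            # "*" or unset -> automatic detection
        return None
    if value not in LANGUAGES:                  # "English"/"French" -> ISO 639-1 code
        value = {v: k for k, v in LANGUAGES.items()}.get(value.lower(), value)
    return value

print(resolve_language("en-US"), resolve_language("French"), resolve_language("*"))  # en fr None
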
LANGUAGE=* -#DEVICE=cuda:0 +# An alignment wav2vec model can be used to get word timestamps. +# It can be a path to a model, a language code (fr, en, ...), or "wav2vec" to automatically chose a model for the language +# This option is experimental (and not implemented with ctranslate2). +# ALIGNMENT_MODEL=wav2vec -# Only used for alignement using wav2vec models -#ALIGNMENT_MODEL=fr -#ALIGNMENT_MODEL=wav2vec -#ALIGNMENT_MODEL=/opt/alignment_model +############################################ +# EFFICIENCY PARAMETERS +############################################ -# TASK PARAMETERS -SERVICE_NAME=stt -SERVICES_BROKER=redis://192.168.0.1:6379 -BROKER_PASS=password +# Device to use. It can be "cuda" to force/check GPU, "cpu" to force computation on CPU, or a specific GPU ("cuda:0", "cuda:1", ...) +# DEVICE=cuda:0 + +# Number of threads per worker when running on CPU +OMP_NUM_THREADS=4 -# CONCURRENCY -CONCURRENCY=2 \ No newline at end of file +# Number of workers +CONCURRENCY=2 From 0d47486708b0866c55d695a72efa3781808442d2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 13 Apr 2023 12:39:27 +0200 Subject: [PATCH 128/172] use --pool=solo option in celery on GPU to avoid CUDA initialization error --- docker-entrypoint.sh | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh index 5014d8f..97a3804 100755 --- a/docker-entrypoint.sh +++ b/docker-entrypoint.sh @@ -1,5 +1,5 @@ #!/bin/bash -set -ea +set -a echo "RUNNING STT" @@ -20,7 +20,7 @@ else if [ "$SERVICE_MODE" = "http" ] then echo "RUNNING STT HTTP SERVER" - python http_server/ingress.py --debug + python3 http_server/ingress.py --debug elif [ "$SERVICE_MODE" == "task" ] then if [[ -z "$SERVICES_BROKER" ]] @@ -28,12 +28,22 @@ else echo "ERROR: SERVICES_BROKER variable not specified, cannot start celery worker." exit -1 fi - /usr/src/app/wait-for-it.sh $(echo $SERVICES_BROKER | cut -d'/' -f 3) --timeout=20 --strict -- echo " $SERVICES_BROKER (Service Broker) is up" + nvidia-smi 2> /dev/null > /dev/null + if [ $? 
-eq 0 ];then + echo "GPU detected" + GPU=1 + OPT="--pool=solo" + else + echo "No GPU detected" + GPU=0 + OPT="" + fi + /usr/src/app/wait-for-it.sh $(echo $SERVICES_BROKER | cut -d'/' -f 3) --timeout=20 --strict -- echo " $SERVICES_BROKER (Service Broker) is up" || exit 1 echo "RUNNING STT CELERY WORKER" - celery --app=celery_app.celeryapp worker -Ofair --queues=${SERVICE_NAME} -c ${CONCURRENCY} -n ${SERVICE_NAME}_worker@%h + celery --app=celery_app.celeryapp worker $OPT -Ofair --queues=${SERVICE_NAME} -c ${CONCURRENCY} -n ${SERVICE_NAME}_worker@%h else - echo "ERROR: Wrong serving command: $1" + echo "ERROR: Wrong serving command: $SERVICE_MODE" exit -1 fi fi From e1c7ecda643cd2b6d04d001052c37329927b7cf3 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 13 Apr 2023 12:40:27 +0200 Subject: [PATCH 129/172] Lazy loading of the model (to avoid deadlocks on multithreaded processes) + misc: - do not necessarily use torchaudio - clarify cache folder business with faster_whisper - little fixes - move get_language into utils.py --- http_server/ingress.py | 5 ++-- requirements.ctranslate2.txt | 1 + requirements.torch.txt | 1 + stt/__init__.py | 19 +++++++----- stt/processing/__init__.py | 35 ++++++++++++++-------- stt/processing/alignment_model.py | 18 ++++++++---- stt/processing/decoding.py | 32 +++++++------------- stt/processing/load_model.py | 49 ++++++++++++++++++++++--------- stt/processing/utils.py | 33 ++++++++++++++++++--- 9 files changed, 126 insertions(+), 67 deletions(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index bab7b65..b55bb03 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -9,9 +9,8 @@ from serving import GunicornServing, GeventServing from swagger import setupSwaggerUI -from stt.processing import decode, load_wave_buffer, model, alignment_model +from stt.processing import decode, load_wave_buffer, model, alignment_model, use_gpu from stt import logger as stt_logger -from stt import SHOULD_USE_GEVENT app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False @@ -102,7 +101,7 @@ def server_error(error): logger.info(f"Using {args.workers} workers") - if SHOULD_USE_GEVENT: # TODO: get rid of this + if use_gpu: # TODO: get rid of this? 
serving_type = GeventServing logger.debug("Serving with gevent") else: diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index 2cf4b8d..84547ac 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -5,6 +5,7 @@ flask-sock flask-swagger-ui>=3.36.0 gevent gunicorn +lockfile pyyaml>=5.4.1 requests>=2.26.0 wavio>=0.0.4 diff --git a/requirements.torch.txt b/requirements.torch.txt index 1b15744..9c40b6b 100644 --- a/requirements.torch.txt +++ b/requirements.torch.txt @@ -5,6 +5,7 @@ flask-sock flask-swagger-ui>=3.36.0 gevent gunicorn +lockfile num2words pyyaml>=5.4.1 requests>=2.26.0 diff --git a/stt/__init__.py b/stt/__init__.py index 5460088..6c57bb2 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -9,18 +9,21 @@ try: import faster_whisper USE_CTRANSLATE2 = True -except ImportError: +except ImportError as err: + try: + import whisper + except: + raise err USE_CTRANSLATE2 = False try: - import torch, torchaudio + import torch USE_TORCH = True except ImportError: USE_TORCH = False -# TODO: Get rid of that -if USE_TORCH: - SHOULD_USE_GEVENT = torch.cuda.is_available() - torch.set_num_threads(1) -else: - SHOULD_USE_GEVENT = USE_CTRANSLATE2 +try: + import torchaudio + USE_TORCHAUDIO = True +except ImportError: + USE_TORCHAUDIO = False diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index f891984..5e72252 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -1,9 +1,10 @@ import os import logging +from lockfile import FileLock -from stt import logger -from .decoding import decode, get_language -from .utils import get_device, LANGUAGES, load_wave_buffer, load_audiofile +from stt import logger, USE_CTRANSLATE2 +from .decoding import decode +from .utils import get_device, get_language, load_wave_buffer, load_audiofile from .load_model import load_whisper_model from .alignment_model import load_alignment_model, get_alignment_model @@ -11,6 +12,23 @@ __all__ = ["logger", "decode", "model", "alignment_model", "load_audiofile", "load_wave_buffer"] +class LazyLoadedModel: + + def __init__(self, model_type, device): + self.model_type = model_type + self.device = device + self._model = None + if USE_CTRANSLATE2: + # May download model here + load_whisper_model(self.model_type, device=self.device) + + def __getattr__(self, name): + if self._model is None: + lockfile = os.path.basename(self.model_type) + with FileLock(lockfile): + self._model = load_whisper_model(self.model_type, device=self.device) + return getattr(self._model, name) + # Set informative log logger.setLevel(logging.INFO) @@ -20,21 +38,14 @@ # Check language language = get_language() -available_languages = \ - list(LANGUAGES.keys()) + \ - [k.lower() for k in LANGUAGES.values()] + \ - [None] -if language not in available_languages: - raise ValueError(f"Language '{get_language()}' is not available. 
Available languages are: {available_languages}") -if isinstance(language, str) and language not in LANGUAGES: - language = {v: k for k, v in LANGUAGES.items()}[language.lower()] logger.info(f"Using language {language}") # Load ASR model model_type = os.environ.get("MODEL", "medium") logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})...") try: - model = load_whisper_model(model_type, device=device) + model = LazyLoadedModel(model_type, device=device) + # model = load_whisper_model(model_type, device=device) except Exception as err: raise Exception( "Failed to load transcription model: {}".format(str(err))) from err diff --git a/stt/processing/alignment_model.py b/stt/processing/alignment_model.py index 08a5e45..a8e6e79 100644 --- a/stt/processing/alignment_model.py +++ b/stt/processing/alignment_model.py @@ -1,4 +1,4 @@ -from stt import logger, USE_TORCH +from stt import logger, USE_TORCH, USE_TORCHAUDIO from .utils import SAMPLE_RATE, LANGUAGES import os @@ -9,9 +9,17 @@ if USE_TORCH: import torch import torch.nn.utils.rnn as rnn_utils - import huggingface_hub - import speechbrain as sb - import transformers + try: + import speechbrain as sb + import huggingface_hub + except ImportError: + pass + try: + import transformers + except ImportError: + pass + +if USE_TORCHAUDIO: import torchaudio ################################################################################ @@ -77,7 +85,7 @@ def load_alignment_model(source, device="cpu", download_root="/opt"): start = time.time() - if source in torchaudio.pipelines.__all__: + if (source in torchaudio.pipelines.__all__) if USE_TORCHAUDIO else False: model = load_torchaudio_model( source, device=device, download_root=download_root) else: diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 6572053..c8a5380 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -4,7 +4,7 @@ import copy from stt import logger, USE_CTRANSLATE2 -from .utils import SAMPLE_RATE +from .utils import SAMPLE_RATE, get_language from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations from .alignment_model import get_alignment_model, load_alignment_model from .word_alignment import compute_alignment @@ -14,20 +14,6 @@ import whisper_timestamped -def get_language(): - """ - Get the language from the environment variable LANGUAGE, and format as expected by Whisper. 
- """ - language = os.environ.get("LANGUAGE", "*") - # "fr-FR" -> "fr" (language-country code to ISO 639-1 code) - if len(language) > 2 and language[2] == "-": - language = language.split("-")[0] - # "*" means "all languages" - if language == "*": - language = None - return language - - def decode(audio, model, alignment_model: "Any", @@ -47,7 +33,7 @@ def decode(audio, kwargs = copy.copy(locals()) - logger.info(f"Transcribing audio with language {language}...") + logger.info("Transcribing audio with " + (f"language {language}" if language else "automatic language detection") + "...") start_t = time.time() @@ -123,7 +109,7 @@ def decode_torch(audio, if alignment_model is None: # Use Whisper cross-attention weights - whisper_res = whisper_timestamped.transcribe(model, audio, **kwargs) + whisper_res = whisper_timestamped.transcribe(model, audio, verbose=None, **kwargs) if language is None: language = whisper_res["language"] logger.info(f"Detected language: {language}") @@ -133,7 +119,7 @@ def decode_torch(audio, torch.manual_seed(1234) torch.cuda.manual_seed_all(1234) - whisper_res = model.transcribe(audio, **kwargs) + whisper_res = model.transcribe(audio, verbose=None, **kwargs) text = whisper_res["text"] text = remove_emoji(text).strip() @@ -294,9 +280,13 @@ def checked_timestamps(start, end=None): words[-1]["confidence"].append(word.probability) _, words[-1]["end"] = checked_timestamps(words[-1]["end"], word.end) continue - words.append( - {"text": word.word, "confidence": [word.probability]} | dict(zip(("start", "end"), checked_timestamps(word.start, word.end))) - ) + start, end = checked_timestamps(word.start, word.end) + words.append({ + "text": word.word, + "confidence": [word.probability], + "start": start, + "end": end + }) for word in words: word["text"] = word["text"].strip() diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 14ec3e2..1476d60 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -4,27 +4,48 @@ from stt import logger, USE_CTRANSLATE2 if USE_CTRANSLATE2: - import faster_whisper as whisper + import faster_whisper else: import whisper_timestamped as whisper -def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): +def load_whisper_model(model_type_or_file, device="cpu", download_root=None): start = time.time() + logger.info("Loading Whisper model {}...".format(model_type_or_file)) + + default_cache_root = os.path.join(os.path.expanduser("~"), ".cache") + if download_root is None: + download_root = default_cache_root + if USE_CTRANSLATE2: if not os.path.isdir(model_type_or_file): - # To specify the cache directory - model_type_or_file = whisper.utils.download_model( - model_type_or_file, - output_dir=os.path.join(download_root, "huggingface/hub") - ) - model = whisper.WhisperModel(model_type_or_file, - device=device, - compute_type="default", - cpu_threads=0, # Can be controled with OMP_NUM_THREADS - num_workers=1, - ) + # Note: There is no good way to set the root cache directory + # with the current version of faster_whisper: + # if "download_root" is specified to faster_whisper.WhisperModel + # (or "output_dir" in faster_whisper.utils.download_model), + # then files are downloaded directly in it without symbolic links + # to the cache directory. So it's different from the behavior + # of the huggingface_hub. + # So we try to create a symbolic link to the cache directory that will be used by HuggingFace... 
+ if not os.path.exists(download_root): + if not os.path.exists(default_cache_root): + os.makedirs(download_root) + if default_cache_root != download_root: + os.symlink(download_root, default_cache_root) + else: + os.symlink(default_cache_root, download_root) + elif not os.path.exists(default_cache_root): + os.symlink(download_root, default_cache_root) + + model = faster_whisper.WhisperModel( + model_type_or_file, + device=device, + compute_type="default", + cpu_threads=0, # Can be controled with OMP_NUM_THREADS + num_workers=1, + # download_root=os.path.join(download_root, f"huggingface/hub/models--guillaumekln--faster-whisper-{model_type_or_file}"), + ) else: model = whisper.load_model( @@ -34,6 +55,6 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root="/opt"): model.eval() model.requires_grad_(False) - logger.info("Whisper Model loaded. (t={}s)".format(time.time() - start)) + logger.info("Whisper model loaded. (t={}s)".format(time.time() - start)) return model \ No newline at end of file diff --git a/stt/processing/utils.py b/stt/processing/utils.py index e8b2bd8..a3719b0 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -1,4 +1,4 @@ -from stt import USE_CTRANSLATE2, USE_TORCH +from stt import USE_CTRANSLATE2, USE_TORCH, USE_TORCHAUDIO import io import wavio @@ -12,9 +12,11 @@ import faster_whisper else: import torch - import torchaudio import whisper +if USE_TORCHAUDIO: + import torchaudio + def has_cuda(): if USE_CTRANSLATE2: return ctranslate2.get_cuda_device_count() > 0 @@ -31,10 +33,33 @@ def get_device(): raise Exception("Failed to set device: {}".format(str(err))) from err return device, use_gpu +def get_language(): + """ + Get the language from the environment variable LANGUAGE, and format as expected by Whisper. + """ + language = os.environ.get("LANGUAGE", "*") + # "fr-FR" -> "fr" (language-country code to ISO 639-1 code) + if len(language) > 2 and language[2] == "-": + language = language.split("-")[0] + # "*" means "all languages" + if language == "*": + language = None + # Convert French -> fr + if isinstance(language, str) and language not in LANGUAGES: + language = {v: k for k, v in LANGUAGES.items()}.get(language.lower(), language) + # Raise an exception for unknown languages + if language not in LANGUAGES: + available_languages = \ + list(LANGUAGES.keys()) + \ + [k[0].upper() + k[1:] for k in LANGUAGES.values()] + \ + ["*", None] + raise ValueError(f"Language '{language}' is not available. 
Available languages are: {available_languages}") + return language + def conform_audio(audio, sample_rate=16_000): if sample_rate != SAMPLE_RATE: - if not USE_TORCH: - raise NotImplementedError("Resampling not available without Torch") + if not USE_TORCHAUDIO: + raise NotImplementedError("Resampling not available without torchaudio") # Down or Up sample to the right sampling rate audio = torchaudio.transforms.Resample(sample_rate, SAMPLE_RATE)(audio) if audio.shape[0] > 1: From adc1cf1d8e6abbf92065d7f24de7331137374966 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 13 Apr 2023 13:36:06 +0200 Subject: [PATCH 130/172] use upper case letters for global variables --- celery_app/tasks.py | 4 ++-- http_server/ingress.py | 6 +++--- stt/processing/__init__.py | 16 ++++++++-------- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index 3b7251f..26ae18a 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -3,7 +3,7 @@ from celery_app.celeryapp import celery from stt import logger -from stt.processing import decode, model, alignment_model +from stt.processing import decode, MODEL, ALIGNMENT_MODEL from stt.processing.utils import load_audiofile @@ -22,7 +22,7 @@ def transcribe_task(file_name: str, with_metadata: bool): # Decode try: - result = decode(file_content, model, alignment_model, with_metadata) + result = decode(file_content, MODEL, ALIGNMENT_MODEL, with_metadata) except Exception as err: logger.error(f"Failed to decode: {repr(err)}") raise Exception(f"Failed to decode {file_path}") from err diff --git a/http_server/ingress.py b/http_server/ingress.py index b55bb03..afed5d0 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -9,7 +9,7 @@ from serving import GunicornServing, GeventServing from swagger import setupSwaggerUI -from stt.processing import decode, load_wave_buffer, model, alignment_model, use_gpu +from stt.processing import decode, load_wave_buffer, MODEL, ALIGNMENT_MODEL, USE_GPU from stt import logger as stt_logger app = Flask("__stt-standalone-worker__") @@ -54,7 +54,7 @@ def transcribe(): # Transcription transcription = decode( - audio_data, model, alignment_model, join_metadata) + audio_data, MODEL, ALIGNMENT_MODEL, join_metadata) if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 @@ -101,7 +101,7 @@ def server_error(error): logger.info(f"Using {args.workers} workers") - if use_gpu: # TODO: get rid of this? + if USE_GPU: # TODO: get rid of this? 
serving_type = GeventServing logger.debug("Serving with gevent") else: diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 5e72252..c4f9e55 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -33,7 +33,7 @@ def __getattr__(self, name): logger.setLevel(logging.INFO) # Set device -device, use_gpu = get_device() +device, USE_GPU = get_device() logger.info(f"Using device {device}") # Check language @@ -44,20 +44,20 @@ def __getattr__(self, name): model_type = os.environ.get("MODEL", "medium") logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})...") try: - model = LazyLoadedModel(model_type, device=device) + MODEL = LazyLoadedModel(model_type, device=device) # model = load_whisper_model(model_type, device=device) except Exception as err: raise Exception( "Failed to load transcription model: {}".format(str(err))) from err # Load alignment model (if any) -alignment_model = get_alignment_model(os.environ.get("ALIGNMENT_MODEL"), language) -if alignment_model: +ALIGNMENT_MODEL = get_alignment_model(os.environ.get("ALIGNMENT_MODEL"), language) +if ALIGNMENT_MODEL: logger.info( - f"Loading alignment model {alignment_model} ({'local' if os.path.exists(alignment_model) else 'remote'})...") - alignment_model = load_alignment_model(alignment_model, device=device, download_root="/opt") -elif alignment_model is None: + f"Loading alignment model {ALIGNMENT_MODEL} ({'local' if os.path.exists(alignment_model) else 'remote'})...") + ALIGNMENT_MODEL = load_alignment_model(ALIGNMENT_MODEL, device=device, download_root="/opt") +elif ALIGNMENT_MODEL is None: logger.info("Alignment will be done using Whisper cross-attention weights") else: logger.info("No alignment model preloaded. 
It will be loaded on the fly depending on the detected language.") - alignment_model = {} # Alignement model(s) will be loaded on the fly + ALIGNMENT_MODEL = {} # Alignement model(s) will be loaded on the fly From e2a9292507abe8564dd23a4efca653f61c63f007 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 14 Apr 2023 18:57:49 +0200 Subject: [PATCH 131/172] use lower precision when possible + use accurate decoding --- stt/processing/__init__.py | 4 +--- stt/processing/decoding.py | 16 +++++++++++++--- stt/processing/load_model.py | 31 +++++++++++++++++++++++-------- 3 files changed, 37 insertions(+), 14 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index c4f9e55..e650424 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -18,9 +18,6 @@ def __init__(self, model_type, device): self.model_type = model_type self.device = device self._model = None - if USE_CTRANSLATE2: - # May download model here - load_whisper_model(self.model_type, device=self.device) def __getattr__(self, name): if self._model is None: @@ -33,6 +30,7 @@ def __getattr__(self, name): logger.setLevel(logging.INFO) # Set device +os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order device, USE_GPU = get_device() logger.info(f"Using device {device}") diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index c8a5380..ebcd052 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -2,6 +2,7 @@ import time import numpy as np import copy +from typing import Tuple, Union from stt import logger, USE_CTRANSLATE2 from .utils import SAMPLE_RATE, get_language @@ -13,6 +14,15 @@ import torch import whisper_timestamped +if "USE_ACCURATE": + default_beam_size = 5 + default_best_of = 5 + default_temperature = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) +else: + default_beam_size = None + default_best_of = None + default_temperature = 0.0 + def decode(audio, model, @@ -20,9 +30,9 @@ def decode(audio, with_word_timestamps: bool, language: str = None, remove_punctuation_from_words=False, - beam_size: int = None, - best_of: int = None, - temperature: float = 0.0, + beam_size: int = default_beam_size, + best_of: int = default_best_of, + temperature: Union[float, Tuple[float, ...]] = default_temperature, condition_on_previous_text: bool = False, no_speech_threshold: float = 0.6, compression_ratio_threshold: float = 2.4, diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 1476d60..ce3dbdd 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -38,14 +38,29 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): elif not os.path.exists(default_cache_root): os.symlink(download_root, default_cache_root) - model = faster_whisper.WhisperModel( - model_type_or_file, - device=device, - compute_type="default", - cpu_threads=0, # Can be controled with OMP_NUM_THREADS - num_workers=1, - # download_root=os.path.join(download_root, f"huggingface/hub/models--guillaumekln--faster-whisper-{model_type_or_file}"), - ) + if device == "cpu": + compute_types = ["int8", "float32"] + else: + compute_types = ["int8_float16", "float16", "float32"] + + model = None + for i, compute_type in enumerate(compute_types): + try: + model = faster_whisper.WhisperModel( + model_type_or_file, + device=device, + compute_type=compute_type, + cpu_threads=0, # Can be controled with OMP_NUM_THREADS + num_workers=1, + # download_root=os.path.join(download_root, 
f"huggingface/hub/models--guillaumekln--faster-whisper-{model_type_or_file}"), + ) + break + except ValueError as err: + # On some old GPU we may have the error + # "ValueError: Requested int8_float16 compute type, + # but the target device or backend do not support efficient int8_float16 computation." + if i == len(compute_types) - 1: + raise err else: model = whisper.load_model( From be66eb58c47bedccff5961fe7da1bf1b5b10b63c Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Fri, 14 Apr 2023 20:06:44 +0200 Subject: [PATCH 132/172] multi-GPU and specification of the right GPU index with faster_whisper --- stt/processing/__init__.py | 4 +++- stt/processing/load_model.py | 10 ++++++++-- stt/processing/utils.py | 16 +++++++++++++++- 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index e650424..40f98fc 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -18,6 +18,9 @@ def __init__(self, model_type, device): self.model_type = model_type self.device = device self._model = None + if USE_CTRANSLATE2: + # This may download the model, and test the device + load_whisper_model(self.model_type, device=self.device) def __getattr__(self, name): if self._model is None: @@ -30,7 +33,6 @@ def __getattr__(self, name): logger.setLevel(logging.INFO) # Set device -os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order device, USE_GPU = get_device() logger.info(f"Using device {device}") diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index ce3dbdd..e4b1f58 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -43,15 +43,21 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): else: compute_types = ["int8_float16", "float16", "float32"] + device_index = 0 + if device.startswith("cuda:"): + device_index = [int(dev) for dev in device[5:].split(",")] + device = "cuda" + model = None for i, compute_type in enumerate(compute_types): try: model = faster_whisper.WhisperModel( model_type_or_file, device=device, + device_index=device_index, compute_type=compute_type, - cpu_threads=0, # Can be controled with OMP_NUM_THREADS - num_workers=1, + # cpu_threads=0, # Can be controled with OMP_NUM_THREADS + # num_workers=1, # download_root=os.path.join(download_root, f"huggingface/hub/models--guillaumekln--faster-whisper-{model_type_or_file}"), ) break diff --git a/stt/processing/utils.py b/stt/processing/utils.py index a3719b0..787c433 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -26,7 +26,21 @@ def has_cuda(): def get_device(): device = os.environ.get("DEVICE", "cuda" if has_cuda() else "cpu") use_gpu = "cuda" in device - if not USE_CTRANSLATE2: + + # The following is to have GPU in the right order (as nvidia-smi show them) + # But somehow it does not work with ctranslate2: + # see https://github.com/guillaumekln/faster-whisper/issues/150 + os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order + + if USE_CTRANSLATE2: + try: + if device.startswith("cuda:"): + _ = [int(dev) for dev in device[5:].split(",")] + else: + assert device in ["cpu", "cuda"] + except: + raise ValueError(f"Invalid DEVICE '{device}' (should be 'cpu' or 'cuda' or 'cuda: or 'cuda:,,...')") + else: try: device = torch.device(device) except Exception as err: From 3afa5f4575c920ee86dfe2006c5b9d7e86e905eb Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 17 Apr 2023 22:03:03 +0200 Subject: [PATCH 133/172] fix GPU order (when several 
GPUs) --- stt/__init__.py | 6 ++++++ stt/processing/__init__.py | 3 --- stt/processing/utils.py | 7 +------ 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/stt/__init__.py b/stt/__init__.py index 6c57bb2..aa3e314 100644 --- a/stt/__init__.py +++ b/stt/__init__.py @@ -1,3 +1,4 @@ +import os import logging logging.basicConfig( @@ -6,6 +7,11 @@ ) logger = logging.getLogger("__stt__") +# The following is to have GPU in the right order (as nvidia-smi show them) +# It is important to set that before loading ctranslate2 +# see https://github.com/guillaumekln/faster-whisper/issues/150 +os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order + try: import faster_whisper USE_CTRANSLATE2 = True diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 40f98fc..095f91b 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -18,9 +18,6 @@ def __init__(self, model_type, device): self.model_type = model_type self.device = device self._model = None - if USE_CTRANSLATE2: - # This may download the model, and test the device - load_whisper_model(self.model_type, device=self.device) def __getattr__(self, name): if self._model is None: diff --git a/stt/processing/utils.py b/stt/processing/utils.py index 787c433..0352de4 100644 --- a/stt/processing/utils.py +++ b/stt/processing/utils.py @@ -26,12 +26,7 @@ def has_cuda(): def get_device(): device = os.environ.get("DEVICE", "cuda" if has_cuda() else "cpu") use_gpu = "cuda" in device - - # The following is to have GPU in the right order (as nvidia-smi show them) - # But somehow it does not work with ctranslate2: - # see https://github.com/guillaumekln/faster-whisper/issues/150 - os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order - + if USE_CTRANSLATE2: try: if device.startswith("cuda:"): From e970615d07d511f9ac3217621566c29c10e50be7 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 18 Apr 2023 09:39:23 +0200 Subject: [PATCH 134/172] add compute_type=int8 for GPU --- stt/processing/load_model.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index e4b1f58..18541e2 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -41,7 +41,7 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): if device == "cpu": compute_types = ["int8", "float32"] else: - compute_types = ["int8_float16", "float16", "float32"] + compute_types = ["int8", "int8_float16", "float16", "float32"] device_index = 0 if device.startswith("cuda:"): From 6cc5b3d9375b37ee087d8708714064595423f45b Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 2 May 2023 16:13:28 +0200 Subject: [PATCH 135/172] fix typo that was fixed in faster_whisper --- requirements.ctranslate2.txt | 2 +- stt/processing/decoding.py | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index 84547ac..e50d462 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -10,4 +10,4 @@ pyyaml>=5.4.1 requests>=2.26.0 wavio>=0.0.4 websockets -faster_whisper \ No newline at end of file +faster_whisper>=0.5.1 \ No newline at end of file diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index ebcd052..f5a3222 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -306,7 +306,7 @@ def checked_timestamps(start, end=None): "text": segment.text.strip(), "start": start, "end": end, - 
"avg_logprob": segment.avg_log_prob, + "avg_logprob": segment.avg_logprob, "words": words }) @@ -315,7 +315,7 @@ def checked_timestamps(start, end=None): transcription = { "text": " ".join(segment["text"] for segment in segments_list), "language": language, - "confidence": round(np.exp(np.mean([segment.avg_log_prob for segment in segments])), 2), + "confidence": round(np.exp(np.mean([segment["avg_logprob"] for segment in segments_list])), 2), "segments": segments_list, } return format_whisper_timestamped_response(transcription, remove_punctuation_from_words=remove_punctuation_from_words) From 5fe83692e86064e2e210d5c36af71a75332369db Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 10 May 2023 17:44:28 +0200 Subject: [PATCH 136/172] give more information in case of error --- celery_app/tasks.py | 12 ++++++++---- http_server/ingress.py | 2 +- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index 26ae18a..3fc38f2 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -17,14 +17,18 @@ def transcribe_task(file_name: str, with_metadata: bool): try: file_content = load_audiofile(file_path) except Exception as err: - logger.error(f"Failed to load ressource: {repr(err)}") - raise Exception(f"Could not open ressource {file_path}") from err + import traceback + msg = f"{traceback.format_exc()}\nFailed to load ressource {file_path}" + logger.error(msg) + raise Exception(msg) # from err # Decode try: result = decode(file_content, MODEL, ALIGNMENT_MODEL, with_metadata) except Exception as err: - logger.error(f"Failed to decode: {repr(err)}") - raise Exception(f"Failed to decode {file_path}") from err + import traceback + msg = f"{traceback.format_exc()}\nFailed to decode {file_path}" + logger.error(msg) + raise Exception(msg) # from err return result diff --git a/http_server/ingress.py b/http_server/ingress.py index afed5d0..0e8a640 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -62,7 +62,7 @@ def transcribe(): except Exception as error: import traceback - print(traceback.format_exc()) + logger.error(traceback.format_exc()) logger.error(repr(error)) return "Server Error: {}".format(str(error)), 400 if isinstance(error, ValueError) else 500 From d045a66d9aa5d3d05c6668b3ce60280e8d5ce6e0 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 10 May 2023 17:45:27 +0200 Subject: [PATCH 137/172] remove dangerous useless assert + add VAD --- requirements.torch.txt | 4 +++- stt/processing/decoding.py | 13 ++++++++----- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/requirements.torch.txt b/requirements.torch.txt index 9c40b6b..75e747c 100644 --- a/requirements.torch.txt +++ b/requirements.torch.txt @@ -14,4 +14,6 @@ transformers wavio>=0.0.4 websockets # openai-whisper -git+https://github.com/linto-ai/whisper-timestamped.git \ No newline at end of file +git+https://github.com/linto-ai/whisper-timestamped.git +onnxruntime +torchaudio \ No newline at end of file diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index f5a3222..37bccee 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -14,7 +14,10 @@ import torch import whisper_timestamped -if "USE_ACCURATE": +USE_ACCURATE = True +USE_VAD = True + +if USE_ACCURATE: default_beam_size = 5 default_best_of = 5 default_temperature = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) @@ -71,13 +74,14 @@ def decode_ct2(audio, kwargs["beam_size"] = 1 if kwargs.get("best_of") is None: kwargs["best_of"] = 1 - + segments, info = model.transcribe( 
audio, word_timestamps=with_word_timestamps, language=language, # Careful with the following options max_initial_timestamp=10000.0, + vad_filter=USE_VAD, **kwargs) segments = list(segments) @@ -114,7 +118,8 @@ def decode_torch(audio, best_of=best_of, condition_on_previous_text=condition_on_previous_text, no_speech_threshold=no_speech_threshold, - compression_ratio_threshold=compression_ratio_threshold + compression_ratio_threshold=compression_ratio_threshold, + vad=USE_VAD, ) if alignment_model is None: @@ -309,8 +314,6 @@ def checked_timestamps(start, end=None): "avg_logprob": segment.avg_logprob, "words": words }) - - assert len(segments_list) transcription = { "text": " ".join(segment["text"] for segment in segments_list), From 28f3809262d70964ecebb234fca307a36bedd0c8 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 11 May 2023 11:13:23 +0200 Subject: [PATCH 138/172] fix call of the model (lazy) wrapping --- stt/processing/__init__.py | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/stt/processing/__init__.py b/stt/processing/__init__.py index 095f91b..9bb51bc 100644 --- a/stt/processing/__init__.py +++ b/stt/processing/__init__.py @@ -19,13 +19,20 @@ def __init__(self, model_type, device): self.device = device self._model = None - def __getattr__(self, name): + def check_loaded(self): if self._model is None: lockfile = os.path.basename(self.model_type) with FileLock(lockfile): self._model = load_whisper_model(self.model_type, device=self.device) + + def __getattr__(self, name): + self.check_loaded() return getattr(self._model, name) + def __call__(self, *args, **kwargs): + self.check_loaded() + return self._model(*args, **kwargs) + # Set informative log logger.setLevel(logging.INFO) From be64669856d873ccb2563a80be6b041a3960e254 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 31 May 2023 09:04:40 +0200 Subject: [PATCH 139/172] build image with tag whisper-latest / 4.0.0 --- Jenkinsfile | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 572c1c5..75a09bd 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -48,25 +48,25 @@ pipeline { } } - // stage('Docker build for whisper branch'){ - // when{ - // branch 'feature/whisper' - // } - // steps { - // echo 'Publishing whisper' - // script { - // image = docker.build(env.DOCKER_HUB_REPO) - // VERSION = sh( - // returnStdout: true, - // script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" - // ).trim() + stage('Docker build for whisper branch'){ + when{ + branch 'feature/whisper' + } + steps { + echo 'Publishing faster_whisper' + script { + image = docker.build(env.DOCKER_HUB_REPO, "-f Dockerfile.ctranslate2 .") + VERSION = sh( + returnStdout: true, + script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + ).trim() - // docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - // image.push("${VERSION}") - // image.push('whisper') - // } - // } - // } - // } + docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + image.push("${VERSION}") + image.push('whisper-latest') + } + } + } + } }// end stages } \ No newline at end of file From 46c3e8e157651678037bb365e3eb6ebfce523e13 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 12 Jun 2023 17:42:17 +0200 Subject: [PATCH 140/172] fix: spaces that were added before "-" and "'" --- RELEASE.md | 3 +++ stt/processing/decoding.py | 6 +++--- 
stt/processing/text_normalize.py | 1 + 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/RELEASE.md b/RELEASE.md index 4ea4e01..df928db 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.1 +- Fix punctuations + # 4.0.0 - Integration of Whisper diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 37bccee..e49e208 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -6,7 +6,7 @@ from stt import logger, USE_CTRANSLATE2 from .utils import SAMPLE_RATE, get_language -from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations +from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations_plus from .alignment_model import get_alignment_model, load_alignment_model from .word_alignment import compute_alignment @@ -289,9 +289,9 @@ def checked_timestamps(start, end=None): words = [] if segment.words: for word in segment.words: - if len(words) and (not(word.word.strip()) or word.word.strip()[0] in _punctuations): + if len(words) and (not(word.word.strip()) or word.word.strip()[0] in _punctuations_plus): words[-1]["text"] += word.word - if word.word.strip() not in _punctuations: + if word.word.strip() not in _punctuations_plus: words[-1]["confidence"].append(word.probability) _, words[-1]["end"] = checked_timestamps(words[-1]["end"], word.end) continue diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index a4037bd..a7e0495 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -8,6 +8,7 @@ # string.punctuation, plus Whisper specific "«»¿", minus apostrophe "'", dash "-", and dot "." (which will be processed as special) _punctuations = '!"#$%&()*+,/:;<=>?@[\\]^_`{|}~«»¿' +_punctuations_plus = _punctuations + "'-" def remove_punctuation(text: str) -> str: From a6289a77635e3c7e6fd13b47b59216c6c9bfa007 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 27 Jun 2023 17:51:58 +0200 Subject: [PATCH 141/172] Small list of punctuations --- stt/processing/text_normalize.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index a7e0495..427d22c 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -6,8 +6,8 @@ from stt import logger from .utils import flatten -# string.punctuation, plus Whisper specific "«»¿", minus apostrophe "'", dash "-", and dot "." 
(which will be processed as special) -_punctuations = '!"#$%&()*+,/:;<=>?@[\\]^_`{|}~«»¿' +# Punctuation marks +_punctuations = '.!?,:;¿。,!?:、…؟،؛' _punctuations_plus = _punctuations + "'-" From 658d77d1b46b87f4abf56b900fe8d1ac8ad70a16 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 27 Jun 2023 17:53:59 +0200 Subject: [PATCH 142/172] update release notes --- RELEASE.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/RELEASE.md b/RELEASE.md index df928db..e9850b1 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.2 +- Do not considers symbols like "$" as punctuation marks + # 4.0.1 - Fix punctuations From 14e1c55522b6db09eb63a31ad4e965716a705fc2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 27 Jun 2023 18:07:12 +0200 Subject: [PATCH 143/172] cosm --- stt/processing/text_normalize.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 427d22c..9077be6 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -7,7 +7,7 @@ from .utils import flatten # Punctuation marks -_punctuations = '.!?,:;¿。,!?:、…؟،؛' +_punctuations = '!,.:;?¿،؛؟…、。!,:?' # + '"”' + ')]}' _punctuations_plus = _punctuations + "'-" From 9dbd02a606badf8c0c6c6ab4cac18321d81dcb5e Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 3 Jul 2023 13:39:53 +0200 Subject: [PATCH 144/172] fix corner cases with punctuations and symbols --- RELEASE.md | 3 ++ stt/processing/decoding.py | 21 +++++++------ stt/processing/text_normalize.py | 54 +++++++++++++++++++++----------- 3 files changed, 51 insertions(+), 27 deletions(-) diff --git a/RELEASE.md b/RELEASE.md index e9850b1..174ae1b 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.3 +- Tune punctuation heuristics + # 4.0.2 - Do not considers symbols like "$" as punctuation marks diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index e49e208..624255c 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -6,7 +6,7 @@ from stt import logger, USE_CTRANSLATE2 from .utils import SAMPLE_RATE, get_language -from .text_normalize import remove_punctuation, normalize_text, remove_emoji, _punctuations_plus +from .text_normalize import remove_punctuation, normalize_text, remove_emoji from .alignment_model import get_alignment_model, load_alignment_model from .word_alignment import compute_alignment @@ -264,8 +264,11 @@ def format_whisper_timestamped_response(transcription, remove_punctuation_from_w } -def format_faster_whisper_response(segments, info, - remove_punctuation_from_words=False): +def format_faster_whisper_response( + segments, info, + remove_punctuation_from_words=False, + glue_punctuations="'-&@.,", + ): language = info.language duration = info.duration @@ -289,13 +292,13 @@ def checked_timestamps(start, end=None): words = [] if segment.words: for word in segment.words: - if len(words) and (not(word.word.strip()) or word.word.strip()[0] in _punctuations_plus): - words[-1]["text"] += word.word - if word.word.strip() not in _punctuations_plus: - words[-1]["confidence"].append(word.probability) - _, words[-1]["end"] = checked_timestamps(words[-1]["end"], word.end) - continue start, end = checked_timestamps(word.start, word.end) + word_strip = word.word.strip() + if glue_punctuations and len(word_strip)>1 and word_strip[0] in glue_punctuations: + words[-1]["text"] += word.word.lstrip() + words[-1]["confidence"].append(word.probability) + words[-1]["end"] = end + continue words.append({ 
"text": word.word, "confidence": [word.probability], diff --git a/stt/processing/text_normalize.py b/stt/processing/text_normalize.py index 9077be6..a5f3d04 100644 --- a/stt/processing/text_normalize.py +++ b/stt/processing/text_normalize.py @@ -6,24 +6,42 @@ from stt import logger from .utils import flatten -# Punctuation marks -_punctuations = '!,.:;?¿،؛؟…、。!,:?' # + '"”' + ')]}' -_punctuations_plus = _punctuations + "'-" - - -def remove_punctuation(text: str) -> str: - text = text.translate(str.maketrans("", "", _punctuations)) - # We don't remove dots inside words (e.g. "ab@gmail.com") - text = re.sub(r"\.(\s)", r"\1", text+" ").strip() - return collapse_whitespace(text) - - -_whitespace_re = re.compile(r'[^\S\r\n]+') - - -def collapse_whitespace(text): - return re.sub(_whitespace_re, ' ', text).strip() - +# All punctuations and symbols EXCEPT: +# * apostrophe (') and hyphen (-) +# * underscore (_) +# * currency symbols ($, €, £, ...) -> \p{Sc} +# * math symbols (%, +, ×). ex: C++ +# * misc (#, @). ex: C#, @user +# and the space character (which can separate several series of punctuation marks) +# Example of punctuations that can output models like Whisper: !,.:;?¿،؛؟…、。!,:?>/]:!(~\u200b[ா「«»“”"< ?;…,*」.)' +_punctuation_regex = r"[^\w\p{Sc}" + re.escape("'-_%+×#@&") + "]" +_leading_punctuations_regex = r"^" + _punctuation_regex + r"+" +_trailing_punctuations_regex = _punctuation_regex + r"+$" + +# A list of symbols that can be an isolated words and not in the exclusion list above +# * & +# * candidates not retained: §, <, =, >, ≤, ≥ +_maybe_word_regex = None # r"[" + re.escape("&") + r"]$" + + +def remove_punctuation(text: str, ensure_no_spaces_in_words: bool=False) -> str: + text = text.strip() + # Note: we don't remove dots inside words (e.g. "ab@gmail.com") + new_text = re.sub(_leading_punctuations_regex, "", text) #.lstrip() + new_text = re.sub(_trailing_punctuations_regex, "", new_text) #.rstrip() + # Let punctuation marks that are alone + if not new_text: + if _maybe_word_regex and re.match(_maybe_word_regex, text): + new_text = text + else: + new_text = "" + # Ensure that there is no space in the middle of a word + if ensure_no_spaces_in_words and " " in new_text: + new_text, tail = new_text.split(" ", 1) + # OK if the tail only contains non alphanumeric characters (then we just keep the first part) + assert not re.search(r"[^\W\d\'\-_]", tail), f"Got unexpected word containing space: {text}" + return remove_punctuation(new_text, ensure_no_spaces_in_words=ensure_no_spaces_in_words) + return new_text def transliterate(c): # Transliterates a character to its closest ASCII equivalent. 
From 4aef237b9102abe85dd41b025d17adbfc5983d70 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 3 Jul 2023 13:45:31 +0200 Subject: [PATCH 145/172] safety --- stt/processing/decoding.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 624255c..4b1e1d5 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -297,7 +297,7 @@ def checked_timestamps(start, end=None): if glue_punctuations and len(word_strip)>1 and word_strip[0] in glue_punctuations: words[-1]["text"] += word.word.lstrip() words[-1]["confidence"].append(word.probability) - words[-1]["end"] = end + words[-1]["end"] = max(words[-1]["end"], end) continue words.append({ "text": word.word, From a36deec24a33df899764a6ab1c4f057b61535291 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 10 Jul 2023 12:13:56 +0200 Subject: [PATCH 146/172] update version of faster_whisper, and fix the version of ctranslate2 --- requirements.ctranslate2.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index e50d462..f1d0e5a 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -10,4 +10,5 @@ pyyaml>=5.4.1 requests>=2.26.0 wavio>=0.0.4 websockets -faster_whisper>=0.5.1 \ No newline at end of file +ctranslate2==3.16.1 +faster_whisper==0.6.0 From e6ea125e602f7c62cf6f0ca5827e6e62697aa3f6 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 10 Jul 2023 12:14:27 +0200 Subject: [PATCH 147/172] fix timeout issues in celery --- celery_app/celeryapp.py | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/celery_app/celeryapp.py b/celery_app/celeryapp.py index e04d73b..b432831 100644 --- a/celery_app/celeryapp.py +++ b/celery_app/celeryapp.py @@ -10,9 +10,14 @@ if os.environ.get("BROKER_PASS", False): components = broker_url.split("//") broker_url = f'{components[0]}//:{os.environ.get("BROKER_PASS")}@{components[1]}' + celery.conf.broker_url = f"{broker_url}/0" celery.conf.result_backend = f"{broker_url}/1" -celery.conf.update(result_expires=3600, task_acks_late=True, task_track_started=True) +celery.conf.task_acks_late = False +celery.conf.task_track_started = True +celery.conf.broker_transport_options = {"visibility_timeout": float("inf")} +# celery.conf.result_backend_transport_options = {"visibility_timeout": float("inf")} +# celery.conf.result_expires = 3600 * 24 # Queues celery.conf.update( From f9435e7a6ee8b89a16f6377940f7df02d9cd71ef Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 10 Jul 2023 12:14:47 +0200 Subject: [PATCH 148/172] cosm --- http_server/ingress.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/http_server/ingress.py b/http_server/ingress.py index 0e8a640..e78967d 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -113,7 +113,7 @@ def server_error(error): { "bind": f"0.0.0.0:{args.service_port}", "workers": args.workers, - "timeout": 3600, + "timeout": 3600 * 24, }, ) logger.info(args) From 00ac19a3dc142787b520ec34dbf0c6b9c258492a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 21 Aug 2023 17:27:22 +0200 Subject: [PATCH 149/172] update to latest --- requirements.ctranslate2.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index f1d0e5a..054d595 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -10,5 +10,5 @@ pyyaml>=5.4.1 requests>=2.26.0 wavio>=0.0.4 
websockets -ctranslate2==3.16.1 -faster_whisper==0.6.0 +ctranslate2==3.18.0 +faster_whisper==0.7.1 \ No newline at end of file From 9d385f7c1df02895bb87bd36b4fd03c611dd43a3 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 22 Aug 2023 15:29:31 +0200 Subject: [PATCH 150/172] support of Whisper models finetuned with transformers Python package (or in HuggingFace formats) --- stt/processing/load_model.py | 258 ++++++++++++++++++++++++++++++++++- 1 file changed, 253 insertions(+), 5 deletions(-) diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 18541e2..8a91b75 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -1,5 +1,8 @@ import os +import sys import time +import shutil +import subprocess from stt import logger, USE_CTRANSLATE2 @@ -48,6 +51,65 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): device_index = [int(dev) for dev in device[5:].split(",")] device = "cuda" + if not os.path.isfile(os.path.join(model_type_or_file, "model.bin")) and \ + model_type_or_file not in ["tiny.en", "tiny", "base.en", "base", "small.en", "small", "medium.en", "medium", "large-v1", "large-v2"]: + + # Convert transformer model + + output_dir = os.path.join(download_root, f"ctranslate2/converters/transformers--{model_type_or_file.replace('/', '--')}") + logger.info(f"CTranslate2 model in {output_dir}") + if not os.path.isdir(output_dir): + + import huggingface_hub + + delete_hf_path = False + if not os.path.isdir(model_type_or_file): + + hf_path = huggingface_hub.hf_hub_download(repo_id=model_type_or_file, filename="pytorch_model.bin") + hf_path = os.path.dirname(os.path.dirname(os.path.dirname(hf_path))) + + delete_hf_path = not os.path.exists(hf_path) + else: + assert os.path.isfile(os.path.join(model_type_or_file, "pytorch_model.bin")), f"Could not find pytorch_model.bin in {model_type_or_file}" + + check_torch_installed() + + # from ctranslate2.converters.transformers import TransformersConverter + # converter = TransformersConverter( + # model_type_or_file, + # activation_scales=None, # Path to the pre-computed activation scales, see https://github.com/mit-han-lab/smoothquant + # copy_files=[], # Note: "tokenizer.json" does not always exist, we will copy it separately + # load_as_float16=False, + # revision=None, + # low_cpu_mem_usage=False, + # trust_remote_code=False, + # ) + + try: + # converter.convert( + # output_dir, + # force=False + # ) + + subprocess.check_call([ + "ct2-transformers-converter", + "--model", model_type_or_file, + "--output_dir", os.path.realpath(output_dir), + "--quantization", "float16", + ]) + except Exception as err: + shutil.rmtree(output_dir, ignore_errors=True) + raise err + + finally: + if delete_hf_path: + logger.info(f"Deleting {hf_path}") + shutil.rmtree(hf_path, ignore_errors=True) + + assert os.path.isdir(output_dir), f"Failed to build {output_dir}" + + model_type_or_file = output_dir + model = None for i, compute_type in enumerate(compute_types): try: @@ -62,6 +124,7 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): ) break except ValueError as err: + logger.info("WARNING: failed to load model with compute_type={}".format(compute_type)) # On some old GPU we may have the error # "ValueError: Requested int8_float16 compute type, # but the target device or backend do not support efficient int8_float16 computation." 
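Editorial aside on the compute-type fallback above: CTranslate2 can also report which compute types a device supports, which would avoid catching ValueError for each candidate. The sketch below is hypothetical (pick_compute_type and its preference order are assumptions for illustration, not part of the patch).

    import ctranslate2

    def pick_compute_type(device: str, device_index: int = 0) -> str:
        # Candidates mirror the ones tried in the loop above, lightest first.
        preferred = ["int8", "int8_float16", "float16", "float32"] if device == "cuda" else ["int8", "float32"]
        supported = ctranslate2.get_supported_compute_types(device, device_index)
        # Fall back to CTranslate2's own "default" selection if none is reported.
        return next((c for c in preferred if c in supported), "default")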
@@ -69,13 +132,198 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): raise err else: - model = whisper.load_model( - model_type_or_file, device=device, - download_root=os.path.join(download_root, "whisper") - ) + + extension = os.path.splitext(model_type_or_file)[-1] if os.path.isfile(model_type_or_file) else None + + if model_type_or_file in whisper.available_models() or extension == ".pt": + + model = whisper.load_model( + model_type_or_file, device=device, + download_root=os.path.join(download_root, "whisper") + ) + + else: + + # Convert HuggingFace model + import torch + + peft_folder = None + + if extension in [".ckpt", ".bin"]: + model_path = model_type_or_file + else: + # Search for the cached file (download if necessary) + if os.path.isdir(model_type_or_file): + for root, _, files in os.walk(model_type_or_file): + if "adapter_config.json" in files: + peft_folder = root + break + try: + import transformers + except ImportError: + raise ImportError(f"If you are trying to download a HuggingFace model with {model_type_or_file}, please install first the transformers library") + from transformers.utils import cached_file + + try: + model_path = cached_file(model_type_or_file, "pytorch_model.bin", cache_dir=download_root, use_auth_token=None, revision=None) + except Exception as e: + try: + if isinstance(e, OSError): + model_path = cached_file(model_type_or_file, "whisper.ckpt", cache_dir=download_root, use_auth_token=None, revision=None) + else: + raise e + except: + if peft_folder is None: + raise RuntimeError(f"Original error: {e}\nCould not find model {model_type_or_file} from HuggingFace nor local folders.") + + # Load HF Model + if peft_folder is not None: + from peft import PeftConfig, PeftModel + import transformers + + peft_config = PeftConfig.from_pretrained(peft_folder) + base_model = peft_config.base_model_name_or_path + + model = transformers.WhisperForConditionalGeneration.from_pretrained(base_model) + model = PeftModel.from_pretrained(model, peft_folder) + hf_state_dict = model.state_dict() + del model + else: + hf_state_dict = torch.load(model_path, map_location="cpu") + + # Rename layers + for key in list(hf_state_dict.keys()): + new_key = hf_to_whisper_states(key) + if new_key is None: + hf_state_dict.pop(key) + elif new_key != key: + hf_state_dict[new_key] = hf_state_dict.pop(key) + + # Init Whisper Model and replace model weights + dims = whisper.model.ModelDimensions(**states_to_dim(hf_state_dict)) + if "proj_out.weight" in hf_state_dict: + hf_state_dict["decoder.proj_out.weight"] = hf_state_dict.pop("proj_out.weight") + print("WARNING: Using untied projection layer") + whisper_model = WhisperUntied(dims) + else: + whisper_model = whisper.model.Whisper(dims) + whisper_model.load_state_dict(hf_state_dict) + del hf_state_dict + whisper_model = whisper_model.to(device) + return whisper_model + model.eval() model.requires_grad_(False) logger.info("Whisper model loaded. 
(t={}s)".format(time.time() - start)) - return model \ No newline at end of file + return model + + +def check_torch_installed(): + try: + import torch + except ImportError: + # Install transformers with torch + subprocess.check_call([sys.executable, "-m", "pip", "install", "transformers[torch]>=4.23"]) + + # # Re-load ctranslate2 + # import importlib + # import ctranslate2 + # importlib.reload(ctranslate2) + # importlib.reload(ctranslate2.converters.transformers) + + # import torch + +# Credit: https://github.com/openai/whisper/discussions/830 +def hf_to_whisper_states(text): + import re + + # From Speechbrain + if text == "_mel_filters": + return None + + # From PEFT + if "default" in text: + # print(f"WARNING: Ignoring {text}") + return None + if text.startswith("base_model.model."): + text = text[len("base_model.model."):] + + text = re.sub('.layers.', '.blocks.', text) + text = re.sub('.self_attn.', '.attn.', text) + text = re.sub('.q_proj.', '.query.', text) + text = re.sub('.k_proj.', '.key.', text) + text = re.sub('.v_proj.', '.value.', text) + text = re.sub('.out_proj.', '.out.', text) + text = re.sub('.fc1.', '.mlp.0.', text) + text = re.sub('.fc2.', '.mlp.2.', text) + text = re.sub('.fc3.', '.mlp.3.', text) + text = re.sub('.fc3.', '.mlp.3.', text) + text = re.sub('.encoder_attn.', '.cross_attn.', text) + text = re.sub('.cross_attn.ln.', '.cross_attn_ln.', text) + text = re.sub('.embed_positions.weight', '.positional_embedding', text) + text = re.sub('.embed_tokens.', '.token_embedding.', text) + text = re.sub('model.', '', text) + text = re.sub('attn.layer_norm.', 'attn_ln.', text) + text = re.sub('.final_layer_norm.', '.mlp_ln.', text) + text = re.sub('encoder.layer_norm.', 'encoder.ln_post.', text) + text = re.sub('decoder.layer_norm.', 'decoder.ln.', text) + return text + +def states_to_dim(state_dict): + n_audio_state = len(state_dict['encoder.ln_post.bias']) + n_text_state = len(state_dict["decoder.ln.bias"]) + return { + "n_mels": state_dict["encoder.conv1.weight"].shape[1], # 80 + "n_vocab": state_dict["decoder.token_embedding.weight"].shape[0], # 51864 / 51865 + "n_audio_ctx": state_dict["encoder.positional_embedding"].shape[0], # 1500 + "n_audio_state": n_audio_state, # 384 / 512 / 768 / 1024 / 1280 + "n_audio_head": n_audio_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_audio_layer": len(set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "encoder.blocks." in k])), # 4 / 6 / 12 / 24 / 32 + "n_text_ctx": state_dict["decoder.positional_embedding"].shape[0], # 448 + "n_text_state": n_text_state, # 384 / 512 / 768 / 1024 / 1280 + "n_text_head": n_text_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_text_layer": len(set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "decoder.blocks." 
in k])), # 4 / 6 / 12 / 24 / 32 + } + +if not USE_CTRANSLATE2: + + class TextDecoderUntied(whisper.model.TextDecoder): + """ + Same as TextDecoder but with untied weights + """ + def __init__(self, *args, **kwargs): + import torch + super().__init__(*args, **kwargs) + + n_vocab, n_state = self.token_embedding.weight.shape + + self.proj_out = torch.nn.Linear(n_state, n_vocab, bias=False) + + def forward(self, x, xa, kv_cache = None): + offset = next(iter(kv_cache.values())).shape[1] if kv_cache else 0 + x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]] + x = x.to(xa.dtype) + + for block in self.blocks: + x = block(x, xa, mask=self.mask, kv_cache=kv_cache) + + x = self.ln(x) + + # logits = self.proj_out(x).float() + # logits = (x @ torch.transpose(self.proj_out.weight.to(x.dtype), 0, 1)).float() + logits = self.proj_out.to(x.dtype)(x).float() + + return logits + + class WhisperUntied(whisper.model.Whisper): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.decoder = TextDecoderUntied( + self.dims.n_vocab, + self.dims.n_text_ctx, + self.dims.n_text_state, + self.dims.n_text_head, + self.dims.n_text_layer, + ) From c7cf09597bfa6819173f89513a9620496d405489 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 22 Aug 2023 15:29:31 +0200 Subject: [PATCH 151/172] support of Whisper models finetuned with transformers Python package (or in HuggingFace formats) --- RELEASE.md | 3 + stt/processing/load_model.py | 258 ++++++++++++++++++++++++++++++++++- 2 files changed, 256 insertions(+), 5 deletions(-) diff --git a/RELEASE.md b/RELEASE.md index 174ae1b..5380da5 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.4 +- Add integration of Whisper models from transformers + # 4.0.3 - Tune punctuation heuristics diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 18541e2..8a91b75 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -1,5 +1,8 @@ import os +import sys import time +import shutil +import subprocess from stt import logger, USE_CTRANSLATE2 @@ -48,6 +51,65 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): device_index = [int(dev) for dev in device[5:].split(",")] device = "cuda" + if not os.path.isfile(os.path.join(model_type_or_file, "model.bin")) and \ + model_type_or_file not in ["tiny.en", "tiny", "base.en", "base", "small.en", "small", "medium.en", "medium", "large-v1", "large-v2"]: + + # Convert transformer model + + output_dir = os.path.join(download_root, f"ctranslate2/converters/transformers--{model_type_or_file.replace('/', '--')}") + logger.info(f"CTranslate2 model in {output_dir}") + if not os.path.isdir(output_dir): + + import huggingface_hub + + delete_hf_path = False + if not os.path.isdir(model_type_or_file): + + hf_path = huggingface_hub.hf_hub_download(repo_id=model_type_or_file, filename="pytorch_model.bin") + hf_path = os.path.dirname(os.path.dirname(os.path.dirname(hf_path))) + + delete_hf_path = not os.path.exists(hf_path) + else: + assert os.path.isfile(os.path.join(model_type_or_file, "pytorch_model.bin")), f"Could not find pytorch_model.bin in {model_type_or_file}" + + check_torch_installed() + + # from ctranslate2.converters.transformers import TransformersConverter + # converter = TransformersConverter( + # model_type_or_file, + # activation_scales=None, # Path to the pre-computed activation scales, see https://github.com/mit-han-lab/smoothquant + # copy_files=[], # Note: "tokenizer.json" does not always 
exist, we will copy it separately + # load_as_float16=False, + # revision=None, + # low_cpu_mem_usage=False, + # trust_remote_code=False, + # ) + + try: + # converter.convert( + # output_dir, + # force=False + # ) + + subprocess.check_call([ + "ct2-transformers-converter", + "--model", model_type_or_file, + "--output_dir", os.path.realpath(output_dir), + "--quantization", "float16", + ]) + except Exception as err: + shutil.rmtree(output_dir, ignore_errors=True) + raise err + + finally: + if delete_hf_path: + logger.info(f"Deleting {hf_path}") + shutil.rmtree(hf_path, ignore_errors=True) + + assert os.path.isdir(output_dir), f"Failed to build {output_dir}" + + model_type_or_file = output_dir + model = None for i, compute_type in enumerate(compute_types): try: @@ -62,6 +124,7 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): ) break except ValueError as err: + logger.info("WARNING: failed to load model with compute_type={}".format(compute_type)) # On some old GPU we may have the error # "ValueError: Requested int8_float16 compute type, # but the target device or backend do not support efficient int8_float16 computation." @@ -69,13 +132,198 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): raise err else: - model = whisper.load_model( - model_type_or_file, device=device, - download_root=os.path.join(download_root, "whisper") - ) + + extension = os.path.splitext(model_type_or_file)[-1] if os.path.isfile(model_type_or_file) else None + + if model_type_or_file in whisper.available_models() or extension == ".pt": + + model = whisper.load_model( + model_type_or_file, device=device, + download_root=os.path.join(download_root, "whisper") + ) + + else: + + # Convert HuggingFace model + import torch + + peft_folder = None + + if extension in [".ckpt", ".bin"]: + model_path = model_type_or_file + else: + # Search for the cached file (download if necessary) + if os.path.isdir(model_type_or_file): + for root, _, files in os.walk(model_type_or_file): + if "adapter_config.json" in files: + peft_folder = root + break + try: + import transformers + except ImportError: + raise ImportError(f"If you are trying to download a HuggingFace model with {model_type_or_file}, please install first the transformers library") + from transformers.utils import cached_file + + try: + model_path = cached_file(model_type_or_file, "pytorch_model.bin", cache_dir=download_root, use_auth_token=None, revision=None) + except Exception as e: + try: + if isinstance(e, OSError): + model_path = cached_file(model_type_or_file, "whisper.ckpt", cache_dir=download_root, use_auth_token=None, revision=None) + else: + raise e + except: + if peft_folder is None: + raise RuntimeError(f"Original error: {e}\nCould not find model {model_type_or_file} from HuggingFace nor local folders.") + + # Load HF Model + if peft_folder is not None: + from peft import PeftConfig, PeftModel + import transformers + + peft_config = PeftConfig.from_pretrained(peft_folder) + base_model = peft_config.base_model_name_or_path + + model = transformers.WhisperForConditionalGeneration.from_pretrained(base_model) + model = PeftModel.from_pretrained(model, peft_folder) + hf_state_dict = model.state_dict() + del model + else: + hf_state_dict = torch.load(model_path, map_location="cpu") + + # Rename layers + for key in list(hf_state_dict.keys()): + new_key = hf_to_whisper_states(key) + if new_key is None: + hf_state_dict.pop(key) + elif new_key != key: + hf_state_dict[new_key] = hf_state_dict.pop(key) + + # Init 
Whisper Model and replace model weights + dims = whisper.model.ModelDimensions(**states_to_dim(hf_state_dict)) + if "proj_out.weight" in hf_state_dict: + hf_state_dict["decoder.proj_out.weight"] = hf_state_dict.pop("proj_out.weight") + print("WARNING: Using untied projection layer") + whisper_model = WhisperUntied(dims) + else: + whisper_model = whisper.model.Whisper(dims) + whisper_model.load_state_dict(hf_state_dict) + del hf_state_dict + whisper_model = whisper_model.to(device) + return whisper_model + model.eval() model.requires_grad_(False) logger.info("Whisper model loaded. (t={}s)".format(time.time() - start)) - return model \ No newline at end of file + return model + + +def check_torch_installed(): + try: + import torch + except ImportError: + # Install transformers with torch + subprocess.check_call([sys.executable, "-m", "pip", "install", "transformers[torch]>=4.23"]) + + # # Re-load ctranslate2 + # import importlib + # import ctranslate2 + # importlib.reload(ctranslate2) + # importlib.reload(ctranslate2.converters.transformers) + + # import torch + +# Credit: https://github.com/openai/whisper/discussions/830 +def hf_to_whisper_states(text): + import re + + # From Speechbrain + if text == "_mel_filters": + return None + + # From PEFT + if "default" in text: + # print(f"WARNING: Ignoring {text}") + return None + if text.startswith("base_model.model."): + text = text[len("base_model.model."):] + + text = re.sub('.layers.', '.blocks.', text) + text = re.sub('.self_attn.', '.attn.', text) + text = re.sub('.q_proj.', '.query.', text) + text = re.sub('.k_proj.', '.key.', text) + text = re.sub('.v_proj.', '.value.', text) + text = re.sub('.out_proj.', '.out.', text) + text = re.sub('.fc1.', '.mlp.0.', text) + text = re.sub('.fc2.', '.mlp.2.', text) + text = re.sub('.fc3.', '.mlp.3.', text) + text = re.sub('.fc3.', '.mlp.3.', text) + text = re.sub('.encoder_attn.', '.cross_attn.', text) + text = re.sub('.cross_attn.ln.', '.cross_attn_ln.', text) + text = re.sub('.embed_positions.weight', '.positional_embedding', text) + text = re.sub('.embed_tokens.', '.token_embedding.', text) + text = re.sub('model.', '', text) + text = re.sub('attn.layer_norm.', 'attn_ln.', text) + text = re.sub('.final_layer_norm.', '.mlp_ln.', text) + text = re.sub('encoder.layer_norm.', 'encoder.ln_post.', text) + text = re.sub('decoder.layer_norm.', 'decoder.ln.', text) + return text + +def states_to_dim(state_dict): + n_audio_state = len(state_dict['encoder.ln_post.bias']) + n_text_state = len(state_dict["decoder.ln.bias"]) + return { + "n_mels": state_dict["encoder.conv1.weight"].shape[1], # 80 + "n_vocab": state_dict["decoder.token_embedding.weight"].shape[0], # 51864 / 51865 + "n_audio_ctx": state_dict["encoder.positional_embedding"].shape[0], # 1500 + "n_audio_state": n_audio_state, # 384 / 512 / 768 / 1024 / 1280 + "n_audio_head": n_audio_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_audio_layer": len(set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "encoder.blocks." in k])), # 4 / 6 / 12 / 24 / 32 + "n_text_ctx": state_dict["decoder.positional_embedding"].shape[0], # 448 + "n_text_state": n_text_state, # 384 / 512 / 768 / 1024 / 1280 + "n_text_head": n_text_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_text_layer": len(set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "decoder.blocks." 
in k])), # 4 / 6 / 12 / 24 / 32 + } + +if not USE_CTRANSLATE2: + + class TextDecoderUntied(whisper.model.TextDecoder): + """ + Same as TextDecoder but with untied weights + """ + def __init__(self, *args, **kwargs): + import torch + super().__init__(*args, **kwargs) + + n_vocab, n_state = self.token_embedding.weight.shape + + self.proj_out = torch.nn.Linear(n_state, n_vocab, bias=False) + + def forward(self, x, xa, kv_cache = None): + offset = next(iter(kv_cache.values())).shape[1] if kv_cache else 0 + x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]] + x = x.to(xa.dtype) + + for block in self.blocks: + x = block(x, xa, mask=self.mask, kv_cache=kv_cache) + + x = self.ln(x) + + # logits = self.proj_out(x).float() + # logits = (x @ torch.transpose(self.proj_out.weight.to(x.dtype), 0, 1)).float() + logits = self.proj_out.to(x.dtype)(x).float() + + return logits + + class WhisperUntied(whisper.model.Whisper): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.decoder = TextDecoderUntied( + self.dims.n_vocab, + self.dims.n_text_ctx, + self.dims.n_text_state, + self.dims.n_text_head, + self.dims.n_text_layer, + ) From 80922dfce999b22c323d911e2a8698fb328f2e21 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 22 Aug 2023 17:09:13 +0200 Subject: [PATCH 152/172] add option for prompt --- RELEASE.md | 1 + stt/processing/decoding.py | 2 ++ 2 files changed, 3 insertions(+) diff --git a/RELEASE.md b/RELEASE.md index 5380da5..3917468 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,5 +1,6 @@ # 4.0.4 - Add integration of Whisper models from transformers +- Add support of prompt from Whisper models (env variable PROMPT) # 4.0.3 - Tune punctuation heuristics diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 4b1e1d5..8b81c5d 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -26,6 +26,7 @@ default_best_of = None default_temperature = 0.0 +default_initial_prompt = os.environ.get("PROMPT", None) def decode(audio, model, @@ -39,6 +40,7 @@ def decode(audio, condition_on_previous_text: bool = False, no_speech_threshold: float = 0.6, compression_ratio_threshold: float = 2.4, + initial_prompt: str = default_initial_prompt, ) -> dict: if language is None: From bc2ca6380b8b0bc5ec8fe19b0dc5dac8beff8797 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 22 Aug 2023 17:13:29 +0200 Subject: [PATCH 153/172] Use persistent prompt (when there is an initial_prompt and when condition_on_previous_text is False --- Dockerfile.ctranslate2 | 2 +- Dockerfile.ctranslate2.cpu | 2 +- requirements.ctranslate2.txt | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/Dockerfile.ctranslate2 b/Dockerfile.ctranslate2 index e2e0008..64afb50 100644 --- a/Dockerfile.ctranslate2 +++ b/Dockerfile.ctranslate2 @@ -1,7 +1,7 @@ FROM ghcr.io/opennmt/ctranslate2:latest-ubuntu20.04-cuda11.2 LABEL maintainer="jlouradour@linagora.com" -RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg +RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ffmpeg git # Install python dependencies COPY requirements.ctranslate2.txt ./ diff --git a/Dockerfile.ctranslate2.cpu b/Dockerfile.ctranslate2.cpu index 46c148e..fc30d21 100644 --- a/Dockerfile.ctranslate2.cpu +++ b/Dockerfile.ctranslate2.cpu @@ -1,7 +1,7 @@ FROM python:3.9 LABEL maintainer="jlouradour@linagora.com" -RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg +RUN apt-get update && 
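The helpers defined above (hf_to_whisper_states and states_to_dim) are what turn a HuggingFace Whisper checkpoint into an openai-whisper compatible model. A minimal sketch of how they fit together, assuming torch and whisper are installed and reusing the names from load_model.py (the checkpoint path is hypothetical):

import torch
import whisper

# Load the HuggingFace checkpoint and rename its keys to the openai-whisper layout
hf_state_dict = torch.load("pytorch_model.bin", map_location="cpu")  # hypothetical local path
for key in list(hf_state_dict.keys()):
    new_key = hf_to_whisper_states(key)
    if new_key is None:              # keys such as "_mel_filters" or PEFT defaults are dropped
        hf_state_dict.pop(key)
    elif new_key != key:
        hf_state_dict[new_key] = hf_state_dict.pop(key)

# Infer the model dimensions from the renamed weights and build the Whisper model
dims = whisper.model.ModelDimensions(**states_to_dim(hf_state_dict))
whisper_model = whisper.model.Whisper(dims)   # tied-embedding case (no "proj_out.weight")
whisper_model.load_state_dict(hf_state_dict)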
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ffmpeg git # Install python dependencies COPY requirements.ctranslate2.txt ./ diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index 054d595..ef4e8cc 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -11,4 +11,5 @@ requests>=2.26.0 wavio>=0.0.4 websockets ctranslate2==3.18.0 -faster_whisper==0.7.1 \ No newline at end of file +#faster_whisper==0.7.1 +git+https://github.com/linto-ai/faster-whisper.git@d9cffcaad763def754124977cc66150f0efcd7ea \ No newline at end of file From 7d386cf2f6b4d5472231b06125958446d70d05d2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 23 Aug 2023 10:39:05 +0200 Subject: [PATCH 154/172] fix possible failure when a segment starts with a punctuation --- RELEASE.md | 1 + stt/processing/decoding.py | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/RELEASE.md b/RELEASE.md index 3917468..a53454e 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,6 +1,7 @@ # 4.0.4 - Add integration of Whisper models from transformers - Add support of prompt from Whisper models (env variable PROMPT) +- Fix possible failure when a Whisper segment starts with a punctuation # 4.0.3 - Tune punctuation heuristics diff --git a/stt/processing/decoding.py b/stt/processing/decoding.py index 8b81c5d..42b3c35 100644 --- a/stt/processing/decoding.py +++ b/stt/processing/decoding.py @@ -296,7 +296,7 @@ def checked_timestamps(start, end=None): for word in segment.words: start, end = checked_timestamps(word.start, word.end) word_strip = word.word.strip() - if glue_punctuations and len(word_strip)>1 and word_strip[0] in glue_punctuations: + if glue_punctuations and len(words) and len(word_strip)>1 and word_strip[0] in glue_punctuations: words[-1]["text"] += word.word.lstrip() words[-1]["confidence"].append(word.probability) words[-1]["end"] = max(words[-1]["end"], end) From 65e2ab0436e88424d3845ff49a927056cd763f85 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 30 Aug 2023 12:09:00 +0200 Subject: [PATCH 155/172] improve README --- README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 164db6f..8e6b04d 100644 --- a/README.md +++ b/README.md @@ -84,12 +84,13 @@ cp .envdefault .env |---|---|---| | SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | `http` \| `task` | | MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| `medium` \| `large-v1` \| ... | -| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| `WAV2VEC2_ASR_BASE_960H` \| `jonatasgrosman/wav2vec2-large-xlsr-53-english` \| ... | | LANGUAGE | (Optional) Language to recognize | `*` \| `fr` \| `fr-FR` \| `French` \| `en` \| `en-US` \| `English` \| ... | -| SERVICE_NAME | Using the task mode, set the queue's name for task processing | `my-stt` | -| SERVICE_BROKER | Using the task mode, URL of the message broker | `redis://my-broker:6379` | -| BROKER_PASS | Using the task mode, broker password | `my-password` | +| PROMPT | (Optional) Prompt to use for the Whisper model | `some free text to encourage a certain transcription style (disfluencies, no punctuation, ...)` | +| ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| `WAV2VEC2_ASR_BASE_960H` \| `jonatasgrosman/wav2vec2-large-xlsr-53-english` \| ... 
| | CONCURRENCY | Maximum number of parallel requests | `3` | +| SERVICE_NAME | (For the task mode) queue's name for task processing | `my-stt` | +| SERVICE_BROKER | (For the task mode) URL of the message broker | `redis://my-broker:6379` | +| BROKER_PASS | (For the task mode only) broker password | `my-password` | If `*` is used for the `LANGUAGE` environment variable, or if `LANGUAGE` is not defined, automatic language detection will be performed by Whisper. From 5eb31eae80b72c9afe8d3762084e6b41aac7c242 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 30 Aug 2023 15:01:14 +0200 Subject: [PATCH 156/172] do not publish numbered tag on whisper branch --- Jenkinsfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Jenkinsfile b/Jenkinsfile index 75a09bd..e96f25d 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -62,7 +62,7 @@ pipeline { ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push("${VERSION}") + // image.push("${VERSION}") image.push('whisper-latest') } } From 97320ce2bd6d130e3168665d9693e899aa1d7f45 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Mon, 13 Nov 2023 15:34:17 +0100 Subject: [PATCH 157/172] Support of new Whisper model large-v3 --- requirements.ctranslate2.txt | 6 +++--- stt/processing/load_model.py | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/requirements.ctranslate2.txt b/requirements.ctranslate2.txt index ef4e8cc..2ddc118 100644 --- a/requirements.ctranslate2.txt +++ b/requirements.ctranslate2.txt @@ -10,6 +10,6 @@ pyyaml>=5.4.1 requests>=2.26.0 wavio>=0.0.4 websockets -ctranslate2==3.18.0 -#faster_whisper==0.7.1 -git+https://github.com/linto-ai/faster-whisper.git@d9cffcaad763def754124977cc66150f0efcd7ea \ No newline at end of file +#faster_whisper==0.10.0 +# This is version faster_whisper==0.9.0 + prompt propagation + fix for large-v3 +git+https://github.com/linto-ai/faster-whisper.git@aad9e7508b528e79be2a9975ac79ef8317f02a6d \ No newline at end of file diff --git a/stt/processing/load_model.py b/stt/processing/load_model.py index 8a91b75..3790593 100644 --- a/stt/processing/load_model.py +++ b/stt/processing/load_model.py @@ -52,7 +52,7 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): device = "cuda" if not os.path.isfile(os.path.join(model_type_or_file, "model.bin")) and \ - model_type_or_file not in ["tiny.en", "tiny", "base.en", "base", "small.en", "small", "medium.en", "medium", "large-v1", "large-v2"]: + not max([model_type_or_file.startswith(prefix) for prefix in ["tiny", "base", "small", "medium", "large"]]): # Convert transformer model From 586f1d60b9a00e722c00cbaef2fe01bbb58acb4a Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 22 Nov 2023 14:43:22 +0100 Subject: [PATCH 158/172] Update release note --- .envdefault | 2 +- RELEASE.md | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/.envdefault b/.envdefault index 1dbc2b1..88c27ea 100644 --- a/.envdefault +++ b/.envdefault @@ -13,7 +13,7 @@ BROKER_PASS= # STT MODELING PARAMETERS ############################################ -# The model can be a path to a model, or a model name ("tiny", "base", "small", "medium", "large-v1" or "large-v2") +# The model can be a path to a model, or a model name ("tiny", "base", "small", "medium", "large-v1", "large-v2" or "large-v3") MODEL=medium # The language can be in different formats: "en", "en-US", "English", ... 
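For reference, the variables documented in the README table above are read inside the service roughly as follows; this is a minimal sketch based on the defaults used elsewhere in this patch series (MODEL falls back to "medium", PROMPT is optional):

import os

model_type = os.environ.get("MODEL", "medium")            # model name ("large-v3", ...) or local path
language = os.environ.get("LANGUAGE")                     # unset or "*" -> automatic language detection
default_initial_prompt = os.environ.get("PROMPT", None)   # optional free text passed to Whisper as initial prompt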
diff --git a/RELEASE.md b/RELEASE.md index a53454e..f507fa5 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,6 @@ +# 4.0.5 +- Support of Whisper large-v3 model + # 4.0.4 - Add integration of Whisper models from transformers - Add support of prompt from Whisper models (env variable PROMPT) From b2fc9f072cd7616a33f585e0a566274785902d65 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 22 Nov 2023 15:15:29 +0100 Subject: [PATCH 159/172] publish tag 4.0.4 --- Jenkinsfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Jenkinsfile b/Jenkinsfile index e96f25d..75a09bd 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -62,7 +62,7 @@ pipeline { ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - // image.push("${VERSION}") + image.push("${VERSION}") image.push('whisper-latest') } } From 74cb4b424fabab2adf28c2887f31e6c7d5b00ce2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 29 Nov 2023 18:36:38 +0100 Subject: [PATCH 160/172] Isolate what is specific to Whisper in a folder --- RELEASE.md | 72 ------------------- .envdefault => whisper/.envdefault | 0 .../Dockerfile.ctranslate2 | 6 +- .../Dockerfile.ctranslate2.cpu | 6 +- Dockerfile.torch => whisper/Dockerfile.torch | 6 +- .../Dockerfile.torch.cpu | 6 +- README.md => whisper/README.md | 0 whisper/RELEASE.md | 16 +++++ .../docker-entrypoint.sh | 0 .../requirements.ctranslate2.txt | 0 .../requirements.torch.txt | 0 {stt => whisper/stt}/__init__.py | 0 {stt => whisper/stt}/processing/__init__.py | 0 .../stt}/processing/alignment_model.py | 0 {stt => whisper/stt}/processing/decoding.py | 3 + {stt => whisper/stt}/processing/load_model.py | 0 .../stt}/processing/text_normalize.py | 0 {stt => whisper/stt}/processing/utils.py | 0 .../stt}/processing/word_alignment.py | 0 19 files changed, 31 insertions(+), 84 deletions(-) delete mode 100644 RELEASE.md rename .envdefault => whisper/.envdefault (100%) rename Dockerfile.ctranslate2 => whisper/Dockerfile.ctranslate2 (81%) rename Dockerfile.ctranslate2.cpu => whisper/Dockerfile.ctranslate2.cpu (80%) rename Dockerfile.torch => whisper/Dockerfile.torch (79%) rename Dockerfile.torch.cpu => whisper/Dockerfile.torch.cpu (83%) rename README.md => whisper/README.md (100%) create mode 100644 whisper/RELEASE.md rename docker-entrypoint.sh => whisper/docker-entrypoint.sh (100%) rename requirements.ctranslate2.txt => whisper/requirements.ctranslate2.txt (100%) rename requirements.torch.txt => whisper/requirements.torch.txt (100%) rename {stt => whisper/stt}/__init__.py (100%) rename {stt => whisper/stt}/processing/__init__.py (100%) rename {stt => whisper/stt}/processing/alignment_model.py (100%) rename {stt => whisper/stt}/processing/decoding.py (99%) rename {stt => whisper/stt}/processing/load_model.py (100%) rename {stt => whisper/stt}/processing/text_normalize.py (100%) rename {stt => whisper/stt}/processing/utils.py (100%) rename {stt => whisper/stt}/processing/word_alignment.py (100%) diff --git a/RELEASE.md b/RELEASE.md deleted file mode 100644 index f507fa5..0000000 --- a/RELEASE.md +++ /dev/null @@ -1,72 +0,0 @@ -# 4.0.5 -- Support of Whisper large-v3 model - -# 4.0.4 -- Add integration of Whisper models from transformers -- Add support of prompt from Whisper models (env variable PROMPT) -- Fix possible failure when a Whisper segment starts with a punctuation - -# 4.0.3 -- Tune punctuation heuristics - -# 4.0.2 -- Do not considers symbols like "$" as punctuation marks - -# 4.0.1 -- Fix punctuations - -# 4.0.0 -- Integration of Whisper - -# 3.3.2 -- 
Fixed use of stereo audio in http serving mode - -# 3.3.1 -- Fixed lin_to_vosk throwing an error on a already existing container. -- Corrected an error on the README regarding mounting model volumes. -- Code styling (PEP 8) - -# 3.3.0 -- Added optional streaming route to the http serving mode -- Added serving mode: websocket -- Added Dynamic model conversion allowing to use either Vosk Models or Linagora AM/LM models -- Changer Vosk dependency to alphacep/vosk -- Updated README.md - -# 3.2.1 -- Repository total rework. The goal being to have a simple transcription service embeddable within a micro-service infrastructure. -- Changed repository name from linto-platform-stt-standalone-worker to linto-platform-stt. -- Added celery connector for microservice integration. -- Added launch option to specify serving mode between task and http. -- Removed diarization functionnality. -- Removed punctuation functionnality. -- Removed Async requests/Job management. -- Updated README to reflect those changes. - -# 3.1.1 -- Change Pykaldi with vosk-API (no python wrapper for decoding function, no extrat packages during installation, c++ implementation based on kaldi functions) -- New feature: Compute a confidence score per transcription -- Fix minor bugs - -# 2.2.1 -- Fix minor bugs -- put SWAGGER_PATH parameter as optional -- Generate the word_boundary file if it does not exist - -# 2.2.0 -- Speaker diarization feature: pyBK package -- Mulithreading feature: Speech decoding and Speaker diarization processes -- Optional parameter: real number of speaker in the audio - -# 2.0.0 -- Reimplement LinTO-Platform-stt-standalone-worker using Pykaldi package - -# 1.1.2 -- New features: - - Word timestamp computing - - Response type: plain/text: simple text output and application/json: the transcription and the words timestamp. 
- - Swagger: integrate swagger in the service using a python package - - Fix minor bugs - -# 1.0.0 -- First build of LinTO-Platform-stt-standalone-worker \ No newline at end of file diff --git a/.envdefault b/whisper/.envdefault similarity index 100% rename from .envdefault rename to whisper/.envdefault diff --git a/Dockerfile.ctranslate2 b/whisper/Dockerfile.ctranslate2 similarity index 81% rename from Dockerfile.ctranslate2 rename to whisper/Dockerfile.ctranslate2 index 64afb50..52fbc44 100644 --- a/Dockerfile.ctranslate2 +++ b/whisper/Dockerfile.ctranslate2 @@ -4,17 +4,17 @@ LABEL maintainer="jlouradour@linagora.com" RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ffmpeg git # Install python dependencies -COPY requirements.ctranslate2.txt ./ +COPY whisper/requirements.ctranslate2.txt ./ RUN pip install --no-cache-dir -r requirements.ctranslate2.txt && rm requirements.ctranslate2.txt WORKDIR /usr/src/app -COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document -COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ +COPY whisper/stt /usr/src/app/stt +COPY whisper/docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" diff --git a/Dockerfile.ctranslate2.cpu b/whisper/Dockerfile.ctranslate2.cpu similarity index 80% rename from Dockerfile.ctranslate2.cpu rename to whisper/Dockerfile.ctranslate2.cpu index fc30d21..c8d6972 100644 --- a/Dockerfile.ctranslate2.cpu +++ b/whisper/Dockerfile.ctranslate2.cpu @@ -4,17 +4,17 @@ LABEL maintainer="jlouradour@linagora.com" RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ffmpeg git # Install python dependencies -COPY requirements.ctranslate2.txt ./ +COPY whisper/requirements.ctranslate2.txt ./ RUN pip install --no-cache-dir -r requirements.ctranslate2.txt && rm requirements.ctranslate2.txt WORKDIR /usr/src/app -COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document -COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ +COPY whisper/stt /usr/src/app/stt +COPY whisper/docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" diff --git a/Dockerfile.torch b/whisper/Dockerfile.torch similarity index 79% rename from Dockerfile.torch rename to whisper/Dockerfile.torch index 37480c0..2f3a0d0 100644 --- a/Dockerfile.torch +++ b/whisper/Dockerfile.torch @@ -4,17 +4,17 @@ LABEL maintainer="jlouradour@linagora.com" RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg # Install python dependencies -COPY requirements.torch.txt ./ +COPY whisper/requirements.torch.txt ./ RUN pip install --no-cache-dir -r requirements.torch.txt && rm requirements.torch.txt WORKDIR /usr/src/app -COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document -COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ +COPY whisper/stt /usr/src/app/stt +COPY whisper/docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" diff --git a/Dockerfile.torch.cpu b/whisper/Dockerfile.torch.cpu similarity index 83% rename from Dockerfile.torch.cpu rename to 
whisper/Dockerfile.torch.cpu index 72582b6..e9198d5 100644 --- a/Dockerfile.torch.cpu +++ b/whisper/Dockerfile.torch.cpu @@ -10,17 +10,17 @@ RUN pip3 install \ -f https://download.pytorch.org/whl/torch_stable.html # Install python dependencies -COPY requirements.torch.txt ./ +COPY whisper/requirements.torch.txt ./ RUN pip install --no-cache-dir -r requirements.torch.txt && rm requirements.torch.txt WORKDIR /usr/src/app -COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document -COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ +COPY whisper/stt /usr/src/app/stt +COPY whisper/docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ ENV PYTHONPATH="${PYTHONPATH}:/usr/src/app/stt" diff --git a/README.md b/whisper/README.md similarity index 100% rename from README.md rename to whisper/README.md diff --git a/whisper/RELEASE.md b/whisper/RELEASE.md new file mode 100644 index 0000000..2d57069 --- /dev/null +++ b/whisper/RELEASE.md @@ -0,0 +1,16 @@ +# 1.0.0 +- Support of Whisper (including large-v3 model) +- Add integration of Whisper models from transformers +- Add support of prompt from Whisper models (env variable PROMPT) +- Fix possible failure when a Whisper segment starts with a punctuation +- Tune punctuation heuristics + +# 0.0.0 +- Added optional streaming route to the http serving mode +- Added serving mode: websocket +- Added Dynamic model conversion allowing to use either Vosk Models or Linagora AM/LM models +- Added celery connector for microservice integration. +- Added launch option to specify serving mode between task and http. +- Removed Async requests/Job management. +- New feature: Compute a confidence score per transcription +- put SWAGGER_PATH parameter as optional diff --git a/docker-entrypoint.sh b/whisper/docker-entrypoint.sh similarity index 100% rename from docker-entrypoint.sh rename to whisper/docker-entrypoint.sh diff --git a/requirements.ctranslate2.txt b/whisper/requirements.ctranslate2.txt similarity index 100% rename from requirements.ctranslate2.txt rename to whisper/requirements.ctranslate2.txt diff --git a/requirements.torch.txt b/whisper/requirements.torch.txt similarity index 100% rename from requirements.torch.txt rename to whisper/requirements.torch.txt diff --git a/stt/__init__.py b/whisper/stt/__init__.py similarity index 100% rename from stt/__init__.py rename to whisper/stt/__init__.py diff --git a/stt/processing/__init__.py b/whisper/stt/processing/__init__.py similarity index 100% rename from stt/processing/__init__.py rename to whisper/stt/processing/__init__.py diff --git a/stt/processing/alignment_model.py b/whisper/stt/processing/alignment_model.py similarity index 100% rename from stt/processing/alignment_model.py rename to whisper/stt/processing/alignment_model.py diff --git a/stt/processing/decoding.py b/whisper/stt/processing/decoding.py similarity index 99% rename from stt/processing/decoding.py rename to whisper/stt/processing/decoding.py index 42b3c35..9dd6855 100644 --- a/stt/processing/decoding.py +++ b/whisper/stt/processing/decoding.py @@ -56,6 +56,7 @@ def decode(audio, kwargs.pop("alignment_model") res = decode_ct2(**kwargs) else: + print("OK") res = decode_torch(**kwargs) logger.info("Transcription complete (t={}s)".format(time.time() - start_t)) @@ -107,6 +108,7 @@ def decode_torch(audio, no_speech_threshold, compression_ratio_threshold, normalize_text_as_words=False, + initial_prompt=None, ): 
"""Transcribe the audio data using Whisper with the defined model.""" @@ -122,6 +124,7 @@ def decode_torch(audio, no_speech_threshold=no_speech_threshold, compression_ratio_threshold=compression_ratio_threshold, vad=USE_VAD, + initial_prompt=initial_prompt, ) if alignment_model is None: diff --git a/stt/processing/load_model.py b/whisper/stt/processing/load_model.py similarity index 100% rename from stt/processing/load_model.py rename to whisper/stt/processing/load_model.py diff --git a/stt/processing/text_normalize.py b/whisper/stt/processing/text_normalize.py similarity index 100% rename from stt/processing/text_normalize.py rename to whisper/stt/processing/text_normalize.py diff --git a/stt/processing/utils.py b/whisper/stt/processing/utils.py similarity index 100% rename from stt/processing/utils.py rename to whisper/stt/processing/utils.py diff --git a/stt/processing/word_alignment.py b/whisper/stt/processing/word_alignment.py similarity index 100% rename from stt/processing/word_alignment.py rename to whisper/stt/processing/word_alignment.py From 8272a5f43fa51bc67ec614e471cc099caeccd908 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 30 Nov 2023 16:03:59 +0100 Subject: [PATCH 161/172] Isolate what is specific to Kaldi in a folder --- .envdefault => kaldi/.envdefault | 0 Dockerfile => kaldi/Dockerfile | 8 ++++---- README.md => kaldi/README.md | 0 RELEASE.md => kaldi/RELEASE.md | 0 docker-entrypoint.sh => kaldi/docker-entrypoint.sh | 0 lin_to_vosk.py => kaldi/lin_to_vosk.py | 0 requirements.txt => kaldi/requirements.txt | 0 {stt => kaldi/stt}/__init__.py | 0 {stt => kaldi/stt}/processing/__init__.py | 0 {stt => kaldi/stt}/processing/decoding.py | 0 {stt => kaldi/stt}/processing/streaming.py | 0 {stt => kaldi/stt}/processing/utils.py | 0 12 files changed, 4 insertions(+), 4 deletions(-) rename .envdefault => kaldi/.envdefault (100%) rename Dockerfile => kaldi/Dockerfile (91%) rename README.md => kaldi/README.md (100%) rename RELEASE.md => kaldi/RELEASE.md (100%) rename docker-entrypoint.sh => kaldi/docker-entrypoint.sh (100%) rename lin_to_vosk.py => kaldi/lin_to_vosk.py (100%) rename requirements.txt => kaldi/requirements.txt (100%) rename {stt => kaldi/stt}/__init__.py (100%) rename {stt => kaldi/stt}/processing/__init__.py (100%) rename {stt => kaldi/stt}/processing/decoding.py (100%) rename {stt => kaldi/stt}/processing/streaming.py (100%) rename {stt => kaldi/stt}/processing/utils.py (100%) diff --git a/.envdefault b/kaldi/.envdefault similarity index 100% rename from .envdefault rename to kaldi/.envdefault diff --git a/Dockerfile b/kaldi/Dockerfile similarity index 91% rename from Dockerfile rename to kaldi/Dockerfile index bdf65c0..f062951 100644 --- a/Dockerfile +++ b/kaldi/Dockerfile @@ -45,7 +45,7 @@ RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi /opt/kal && make -j $(nproc) online2 lm rnnlm # Install python dependencies -COPY requirements.txt ./ +COPY kaldi/requirements.txt ./ RUN pip install --no-cache-dir -r requirements.txt # Install Custom Vosk API @@ -57,13 +57,13 @@ RUN git clone --depth 1 https://github.com/alphacep/vosk-api /opt/vosk-api && cd WORKDIR /usr/src/app -COPY stt /usr/src/app/stt COPY celery_app /usr/src/app/celery_app COPY http_server /usr/src/app/http_server COPY websocket /usr/src/app/websocket COPY document /usr/src/app/document -COPY docker-entrypoint.sh wait-for-it.sh healthcheck.sh ./ -COPY lin_to_vosk.py /usr/src/app/lin_to_vosk.py +COPY kaldi/stt /usr/src/app/stt +COPY kaldi/docker-entrypoint.sh wait-for-it.sh 
healthcheck.sh ./ +COPY kaldi/lin_to_vosk.py /usr/src/app/lin_to_vosk.py RUN mkdir -p /var/log/supervisor/ diff --git a/README.md b/kaldi/README.md similarity index 100% rename from README.md rename to kaldi/README.md diff --git a/RELEASE.md b/kaldi/RELEASE.md similarity index 100% rename from RELEASE.md rename to kaldi/RELEASE.md diff --git a/docker-entrypoint.sh b/kaldi/docker-entrypoint.sh similarity index 100% rename from docker-entrypoint.sh rename to kaldi/docker-entrypoint.sh diff --git a/lin_to_vosk.py b/kaldi/lin_to_vosk.py similarity index 100% rename from lin_to_vosk.py rename to kaldi/lin_to_vosk.py diff --git a/requirements.txt b/kaldi/requirements.txt similarity index 100% rename from requirements.txt rename to kaldi/requirements.txt diff --git a/stt/__init__.py b/kaldi/stt/__init__.py similarity index 100% rename from stt/__init__.py rename to kaldi/stt/__init__.py diff --git a/stt/processing/__init__.py b/kaldi/stt/processing/__init__.py similarity index 100% rename from stt/processing/__init__.py rename to kaldi/stt/processing/__init__.py diff --git a/stt/processing/decoding.py b/kaldi/stt/processing/decoding.py similarity index 100% rename from stt/processing/decoding.py rename to kaldi/stt/processing/decoding.py diff --git a/stt/processing/streaming.py b/kaldi/stt/processing/streaming.py similarity index 100% rename from stt/processing/streaming.py rename to kaldi/stt/processing/streaming.py diff --git a/stt/processing/utils.py b/kaldi/stt/processing/utils.py similarity index 100% rename from stt/processing/utils.py rename to kaldi/stt/processing/utils.py From bbe0c2b511e5940a1378f99a5aa9931ba4a19664 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 30 Nov 2023 17:38:55 +0100 Subject: [PATCH 162/172] uniformize calls (by simplifying decode function) to make both worlds work --- celery_app/tasks.py | 4 ++-- http_server/ingress.py | 9 +++++---- kaldi/stt/processing/__init__.py | 15 ++++++++++++--- kaldi/stt/processing/decoding.py | 4 +++- kaldi/stt/processing/utils.py | 6 +++--- whisper/stt/processing/__init__.py | 26 ++++++++++++++++---------- whisper/stt/processing/decoding.py | 5 +++-- 7 files changed, 44 insertions(+), 25 deletions(-) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index 3fc38f2..4b9a7d6 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -3,7 +3,7 @@ from celery_app.celeryapp import celery from stt import logger -from stt.processing import decode, MODEL, ALIGNMENT_MODEL +from stt.processing import decode, MODEL from stt.processing.utils import load_audiofile @@ -24,7 +24,7 @@ def transcribe_task(file_name: str, with_metadata: bool): # Decode try: - result = decode(file_content, MODEL, ALIGNMENT_MODEL, with_metadata) + result = decode(file_content, MODEL, with_metadata) except Exception as err: import traceback msg = f"{traceback.format_exc()}\nFailed to decode {file_path}" diff --git a/http_server/ingress.py b/http_server/ingress.py index da28e8d..3d3e306 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -10,7 +10,7 @@ from serving import GunicornServing, GeventServing from swagger import setupSwaggerUI -from stt.processing import decode, load_wave_buffer, MODEL, ALIGNMENT_MODEL, USE_GPU +from stt.processing import decode, load_wave_buffer, MODEL, USE_GPU from stt import logger as stt_logger app = Flask("__stt-standalone-worker__") @@ -25,13 +25,15 @@ # If websocket streaming route is enabled if os.environ.get("ENABLE_STREAMING", False) in [True, "true", 1]: + from flask_sock import Sock + from 
stt.processing.streaming import ws_streaming logger.info("Init websocket serving ...") sock = Sock(app) logger.info("Streaming is enabled") @sock.route("/streaming") def streaming(web_socket): - ws_streaming(web_socket, model) + ws_streaming(web_socket, MODEL) @app.route("/healthcheck", methods=["GET"]) @@ -68,8 +70,7 @@ def transcribe(): audio_data = load_wave_buffer(file_buffer) # Transcription - transcription = decode( - audio_data, MODEL, ALIGNMENT_MODEL, join_metadata) + transcription = decode(audio_data, MODEL, join_metadata) if join_metadata: return json.dumps(transcription, ensure_ascii=False), 200 diff --git a/kaldi/stt/processing/__init__.py b/kaldi/stt/processing/__init__.py index 2a3eca5..fc32781 100644 --- a/kaldi/stt/processing/__init__.py +++ b/kaldi/stt/processing/__init__.py @@ -6,9 +6,15 @@ from stt import logger from stt.processing.decoding import decode -from stt.processing.utils import formatAudio, load_wave +from stt.processing.utils import load_wave_buffer, load_audiofile -__all__ = ["model", "logger", "decode", "load_wave", "formatAudio"] +__all__ = [ + "logger", + "decode", + "load_audiofile", "load_wave_buffer", + "MODEL", + "USE_GPU", +] # Model locations (should be mounted) MODEL_PATH = "/opt/model" @@ -17,8 +23,11 @@ logger.info("Loading acoustic model and decoding graph ...") start = time() try: - model = Model(MODEL_PATH) + MODEL = Model(MODEL_PATH) except Exception as err: raise Exception("Failed to load transcription model: {}".format(str(err))) from err sys.exit(-1) logger.info("Acoustic model and decoding graph loaded. (t={}s)".format(time() - start)) + +# Not implemented yet in Kaldi +USE_GPU = False \ No newline at end of file diff --git a/kaldi/stt/processing/decoding.py b/kaldi/stt/processing/decoding.py index 2e1fb7c..8c06007 100644 --- a/kaldi/stt/processing/decoding.py +++ b/kaldi/stt/processing/decoding.py @@ -4,10 +4,12 @@ from vosk import KaldiRecognizer, Model -def decode(audio_data: bytes, model: Model, sampling_rate: int, with_metadata: bool) -> dict: +def decode(audio: tuple[bytes, int], model: Model, with_metadata: bool) -> dict: """Transcribe the audio data using the vosk library with the defined model.""" result = {"text": "", "confidence-score": 0.0, "words": []} + audio_data, sampling_rate = audio + recognizer = KaldiRecognizer(model, sampling_rate) recognizer.SetMaxAlternatives(0) # Set confidence per words recognizer.SetWords(with_metadata) diff --git a/kaldi/stt/processing/utils.py b/kaldi/stt/processing/utils.py index b81cc5d..4de66c7 100644 --- a/kaldi/stt/processing/utils.py +++ b/kaldi/stt/processing/utils.py @@ -4,13 +4,13 @@ from numpy import int16, squeeze, mean -def load_wave(file_path): +def load_audiofile(file_path): """Formats audio from a wavFile buffer to a bytebuffer""" audio = squeeze(wavio.read(file_path).data) - return audio.tobytes() + return (audio.tobytes(), 16000) -def formatAudio(file_buffer): +def load_wave_buffer(file_buffer): """Formats audio from a wavFile buffer to a numpy array for processing.""" file_buffer_io = io.BytesIO(file_buffer) file_content = wavio.read(file_buffer_io) diff --git a/whisper/stt/processing/__init__.py b/whisper/stt/processing/__init__.py index 9bb51bc..6faaab0 100644 --- a/whisper/stt/processing/__init__.py +++ b/whisper/stt/processing/__init__.py @@ -9,9 +9,13 @@ from .load_model import load_whisper_model from .alignment_model import load_alignment_model, get_alignment_model -__all__ = ["logger", "decode", "model", "alignment_model", - "load_audiofile", "load_wave_buffer"] - 
+__all__ = [ + "logger", + "decode", + "load_audiofile", "load_wave_buffer", + "MODEL", + "USE_GPU", +] class LazyLoadedModel: def __init__(self, model_type, device): @@ -48,20 +52,22 @@ def __call__(self, *args, **kwargs): model_type = os.environ.get("MODEL", "medium") logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})...") try: - MODEL = LazyLoadedModel(model_type, device=device) + model = LazyLoadedModel(model_type, device=device) # model = load_whisper_model(model_type, device=device) except Exception as err: raise Exception( "Failed to load transcription model: {}".format(str(err))) from err # Load alignment model (if any) -ALIGNMENT_MODEL = get_alignment_model(os.environ.get("ALIGNMENT_MODEL"), language) -if ALIGNMENT_MODEL: +alignment_model = get_alignment_model(os.environ.get("alignment_model"), language) +if alignment_model: logger.info( - f"Loading alignment model {ALIGNMENT_MODEL} ({'local' if os.path.exists(alignment_model) else 'remote'})...") - ALIGNMENT_MODEL = load_alignment_model(ALIGNMENT_MODEL, device=device, download_root="/opt") -elif ALIGNMENT_MODEL is None: + f"Loading alignment model {alignment_model} ({'local' if os.path.exists(alignment_model) else 'remote'})...") + alignment_model = load_alignment_model(alignment_model, device=device, download_root="/opt") +elif alignment_model is None: logger.info("Alignment will be done using Whisper cross-attention weights") else: logger.info("No alignment model preloaded. It will be loaded on the fly depending on the detected language.") - ALIGNMENT_MODEL = {} # Alignement model(s) will be loaded on the fly + alignment_model = {} # Alignement model(s) will be loaded on the fly + +MODEL = (model, alignment_model) diff --git a/whisper/stt/processing/decoding.py b/whisper/stt/processing/decoding.py index 9dd6855..b78c4db 100644 --- a/whisper/stt/processing/decoding.py +++ b/whisper/stt/processing/decoding.py @@ -29,8 +29,7 @@ default_initial_prompt = os.environ.get("PROMPT", None) def decode(audio, - model, - alignment_model: "Any", + model_and_alignementmodel, # Tuple[model, alignment_model] with_word_timestamps: bool, language: str = None, remove_punctuation_from_words=False, @@ -47,6 +46,8 @@ def decode(audio, language = get_language() kwargs = copy.copy(locals()) + kwargs.pop("model_and_alignementmodel") + kwargs["model"], kwargs["alignment_model"] = model_and_alignementmodel logger.info("Transcribing audio with " + (f"language {language}" if language else "automatic language detection") + "...") From bde383d30baed0eaa038805cc3520b37e25332c3 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 30 Nov 2023 17:39:11 +0100 Subject: [PATCH 163/172] Update Jenkinsfile --- Jenkinsfile | 43 ++++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 75a09bd..81d8ec8 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,10 +1,9 @@ pipeline { agent any environment { - DOCKER_HUB_REPO = "lintoai/linto-platform-stt" + DOCKER_HUB_REPO_KALDI = "lintoai/linto-platform-stt-kaldi" + DOCKER_HUB_REPO_WHISPER = "lintoai/linto-platform-stt-whisper" DOCKER_HUB_CRED = 'docker-hub-credentials' - - VERSION = '' } stages{ @@ -15,10 +14,22 @@ pipeline { steps { echo 'Publishing latest' script { - image = docker.build(env.DOCKER_HUB_REPO) + image = docker.build(env.DOCKER_HUB_REPO_KALDI, "-f kaldi/Dockerfile .") + VERSION = sh( + returnStdout: true, + script: "awk -v RS='' '/#/ {print; exit}' kaldi/RELEASE.md | head -1 | sed 
's/#//' | sed 's/ //'" + ).trim() + + docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { + image.push("${VERSION}") + image.push('latest') + } + } + script { + image = docker.build(env.DOCKER_HUB_REPO_WHISPER, "-f whisper/Dockerfile.ctranslate2 .") VERSION = sh( returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + script: "awk -v RS='' '/#/ {print; exit}' whisper/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { @@ -36,37 +47,27 @@ pipeline { steps { echo 'Publishing unstable' script { - image = docker.build(env.DOCKER_HUB_REPO) + image = docker.build(env.DOCKER_HUB_REPO_KALDI, "-f kaldi/Dockerfile .") VERSION = sh( returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + script: "awk -v RS='' '/#/ {print; exit}' kaldi/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { image.push('latest-unstable') } } - } - } - - stage('Docker build for whisper branch'){ - when{ - branch 'feature/whisper' - } - steps { - echo 'Publishing faster_whisper' script { - image = docker.build(env.DOCKER_HUB_REPO, "-f Dockerfile.ctranslate2 .") + image = docker.build(env.DOCKER_HUB_REPO_WHISPER, "-f whisper/Dockerfile.ctranslate2 .") VERSION = sh( returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" + script: "awk -v RS='' '/#/ {print; exit}' whisper/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push("${VERSION}") - image.push('whisper-latest') + image.push('latest-unstable') } } } } + }// end stages } \ No newline at end of file From 3f88b73ea7b3e830d766675bf9108939736aea8b Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Thu, 30 Nov 2023 17:49:13 +0100 Subject: [PATCH 164/172] fix coding style --- Makefile | 2 +- celery_app/celeryapp.py | 1 - celery_app/tasks.py | 11 +- http_server/ingress.py | 19 +- http_server/serving.py | 16 +- kaldi/stt/processing/__init__.py | 10 +- kaldi/stt/processing/streaming.py | 3 +- kaldi/stt/processing/utils.py | 2 +- whisper/stt/__init__.py | 9 +- whisper/stt/processing/__init__.py | 33 +-- whisper/stt/processing/alignment_model.py | 115 ++++++----- whisper/stt/processing/decoding.py | 229 ++++++++++++--------- whisper/stt/processing/load_model.py | 176 +++++++++------- whisper/stt/processing/text_normalize.py | 142 ++++++------- whisper/stt/processing/utils.py | 237 +++++++++++----------- whisper/stt/processing/word_alignment.py | 64 +++--- 16 files changed, 595 insertions(+), 474 deletions(-) diff --git a/Makefile b/Makefile index 71be1a8..24db387 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ .DEFAULT_GOAL := help -target_dirs := stt http_server celery_app +target_dirs := kaldi/stt whisper/stt http_server celery_app help: @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' diff --git a/celery_app/celeryapp.py b/celery_app/celeryapp.py index b432831..d1c4099 100644 --- a/celery_app/celeryapp.py +++ b/celery_app/celeryapp.py @@ -1,7 +1,6 @@ import os from celery import Celery - from stt import logger celery = Celery(__name__, include=["celery_app.tasks"]) diff --git a/celery_app/tasks.py b/celery_app/tasks.py index 
4b9a7d6..114df2a 100644 --- a/celery_app/tasks.py +++ b/celery_app/tasks.py @@ -1,11 +1,12 @@ import asyncio import os -from celery_app.celeryapp import celery from stt import logger -from stt.processing import decode, MODEL +from stt.processing import MODEL, decode from stt.processing.utils import load_audiofile +from celery_app.celeryapp import celery + @celery.task(name="transcribe_task") def transcribe_task(file_name: str, with_metadata: bool): @@ -18,17 +19,19 @@ def transcribe_task(file_name: str, with_metadata: bool): file_content = load_audiofile(file_path) except Exception as err: import traceback + msg = f"{traceback.format_exc()}\nFailed to load ressource {file_path}" logger.error(msg) - raise Exception(msg) # from err + raise Exception(msg) # from err # Decode try: result = decode(file_content, MODEL, with_metadata) except Exception as err: import traceback + msg = f"{traceback.format_exc()}\nFailed to decode {file_path}" logger.error(msg) - raise Exception(msg) # from err + raise Exception(msg) # from err return result diff --git a/http_server/ingress.py b/http_server/ingress.py index 3d3e306..6c71478 100644 --- a/http_server/ingress.py +++ b/http_server/ingress.py @@ -7,11 +7,10 @@ from confparser import createParser from flask import Flask, json, request -from serving import GunicornServing, GeventServing -from swagger import setupSwaggerUI - -from stt.processing import decode, load_wave_buffer, MODEL, USE_GPU +from serving import GeventServing, GunicornServing from stt import logger as stt_logger +from stt.processing import MODEL, USE_GPU, decode, load_wave_buffer +from swagger import setupSwaggerUI app = Flask("__stt-standalone-worker__") app.config["JSON_AS_ASCII"] = False @@ -27,6 +26,7 @@ if os.environ.get("ENABLE_STREAMING", False) in [True, "true", 1]: from flask_sock import Sock from stt.processing.streaming import ws_streaming + logger.info("Init websocket serving ...") sock = Sock(app) logger.info("Streaming is enabled") @@ -58,7 +58,9 @@ def transcribe(): elif request.headers.get("accept").lower() == "text/plain": join_metadata = False else: - raise ValueError(f"Not accepted header (accept={request.headers.get('accept')} should be either application/json or text/plain)") + raise ValueError( + f"Not accepted header (accept={request.headers.get('accept')} should be either application/json or text/plain)" + ) # logger.debug("Metadata: {}".format(join_metadata)) # get input file @@ -66,7 +68,7 @@ def transcribe(): raise ValueError(f"No audio file was uploaded (missing 'file' key)") file_buffer = request.files["file"].read() - + audio_data = load_wave_buffer(file_buffer) # Transcription @@ -78,6 +80,7 @@ def transcribe(): except Exception as error: import traceback + logger.error(traceback.format_exc()) logger.error(repr(error)) return "Server Error: {}".format(str(error)), 400 if isinstance(error, ValueError) else 500 @@ -116,8 +119,8 @@ def server_error(error): logger.warning("Could not setup swagger: {}".format(str(err))) logger.info(f"Using {args.workers} workers") - - if USE_GPU: # TODO: get rid of this? + + if USE_GPU: # TODO: get rid of this? 
serving_type = GeventServing logger.debug("Serving with gevent") else: diff --git a/http_server/serving.py b/http_server/serving.py index 725f763..9230eb4 100644 --- a/http_server/serving.py +++ b/http_server/serving.py @@ -1,6 +1,7 @@ -import gunicorn.app.base -import gevent.pywsgi import gevent.monkey +import gevent.pywsgi +import gunicorn.app.base + gevent.monkey.patch_all() @@ -22,22 +23,21 @@ def load_config(self): def load(self): return self.application -class GeventServing(): +class GeventServing: def __init__(self, app, options=None): self.options = options or {} self.application = app def run(self): - bind = self.options.get('bind', "0.0.0.0:8080") - workers = self.options.get('workers', 1) - listener = bind.split(':') + bind = self.options.get("bind", "0.0.0.0:8080") + workers = self.options.get("workers", 1) + listener = bind.split(":") try: assert len(listener) == 2 listener = (listener[0], int(listener[1])) except: print(f"Invalid bind address {bind}") - server = gevent.pywsgi.WSGIServer(listener, self.application, spawn = workers) + server = gevent.pywsgi.WSGIServer(listener, self.application, spawn=workers) server.serve_forever() - diff --git a/kaldi/stt/processing/__init__.py b/kaldi/stt/processing/__init__.py index fc32781..9f99406 100644 --- a/kaldi/stt/processing/__init__.py +++ b/kaldi/stt/processing/__init__.py @@ -2,16 +2,16 @@ import sys from time import time -from vosk import Model - from stt import logger from stt.processing.decoding import decode -from stt.processing.utils import load_wave_buffer, load_audiofile +from stt.processing.utils import load_audiofile, load_wave_buffer +from vosk import Model __all__ = [ "logger", "decode", - "load_audiofile", "load_wave_buffer", + "load_audiofile", + "load_wave_buffer", "MODEL", "USE_GPU", ] @@ -30,4 +30,4 @@ logger.info("Acoustic model and decoding graph loaded. 
(t={}s)".format(time() - start)) # Not implemented yet in Kaldi -USE_GPU = False \ No newline at end of file +USE_GPU = False diff --git a/kaldi/stt/processing/streaming.py b/kaldi/stt/processing/streaming.py index 28274b8..a33ecfc 100644 --- a/kaldi/stt/processing/streaming.py +++ b/kaldi/stt/processing/streaming.py @@ -3,11 +3,10 @@ from typing import Union from simple_websocket.ws import Server as WSServer +from stt import logger from vosk import KaldiRecognizer, Model from websockets.legacy.server import WebSocketServerProtocol -from stt import logger - async def wssDecode(ws: WebSocketServerProtocol, model: Model): """Async Decode function endpoint""" diff --git a/kaldi/stt/processing/utils.py b/kaldi/stt/processing/utils.py index 4de66c7..eb3349d 100644 --- a/kaldi/stt/processing/utils.py +++ b/kaldi/stt/processing/utils.py @@ -1,7 +1,7 @@ import io import wavio -from numpy import int16, squeeze, mean +from numpy import int16, mean, squeeze def load_audiofile(file_path): diff --git a/whisper/stt/__init__.py b/whisper/stt/__init__.py index aa3e314..f5551af 100644 --- a/whisper/stt/__init__.py +++ b/whisper/stt/__init__.py @@ -1,5 +1,5 @@ -import os import logging +import os logging.basicConfig( format="[%(asctime)s,%(msecs)03d %(name)s] %(levelname)s: %(message)s", @@ -8,12 +8,13 @@ logger = logging.getLogger("__stt__") # The following is to have GPU in the right order (as nvidia-smi show them) -# It is important to set that before loading ctranslate2 +# It is important to set that before loading ctranslate2 # see https://github.com/guillaumekln/faster-whisper/issues/150 -os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID' # GPU in the right order +os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # GPU in the right order try: import faster_whisper + USE_CTRANSLATE2 = True except ImportError as err: try: @@ -24,12 +25,14 @@ try: import torch + USE_TORCH = True except ImportError: USE_TORCH = False try: import torchaudio + USE_TORCHAUDIO = True except ImportError: USE_TORCHAUDIO = False diff --git a/whisper/stt/processing/__init__.py b/whisper/stt/processing/__init__.py index 6faaab0..b0e7f6d 100644 --- a/whisper/stt/processing/__init__.py +++ b/whisper/stt/processing/__init__.py @@ -1,23 +1,25 @@ -import os import logging +import os + from lockfile import FileLock +from stt import USE_CTRANSLATE2, logger -from stt import logger, USE_CTRANSLATE2 +from .alignment_model import get_alignment_model, load_alignment_model from .decoding import decode -from .utils import get_device, get_language, load_wave_buffer, load_audiofile - from .load_model import load_whisper_model -from .alignment_model import load_alignment_model, get_alignment_model +from .utils import get_device, get_language, load_audiofile, load_wave_buffer __all__ = [ "logger", "decode", - "load_audiofile", "load_wave_buffer", + "load_audiofile", + "load_wave_buffer", "MODEL", "USE_GPU", ] -class LazyLoadedModel: + +class LazyLoadedModel: def __init__(self, model_type, device): self.model_type = model_type self.device = device @@ -32,11 +34,12 @@ def check_loaded(self): def __getattr__(self, name): self.check_loaded() return getattr(self._model, name) - + def __call__(self, *args, **kwargs): self.check_loaded() return self._model(*args, **kwargs) + # Set informative log logger.setLevel(logging.INFO) @@ -50,24 +53,28 @@ def __call__(self, *args, **kwargs): # Load ASR model model_type = os.environ.get("MODEL", "medium") -logger.info(f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})...") 
+logger.info( + f"Loading Whisper model {model_type} ({'local' if os.path.exists(model_type) else 'remote'})..." +) try: model = LazyLoadedModel(model_type, device=device) # model = load_whisper_model(model_type, device=device) except Exception as err: - raise Exception( - "Failed to load transcription model: {}".format(str(err))) from err + raise Exception("Failed to load transcription model: {}".format(str(err))) from err # Load alignment model (if any) alignment_model = get_alignment_model(os.environ.get("alignment_model"), language) if alignment_model: logger.info( - f"Loading alignment model {alignment_model} ({'local' if os.path.exists(alignment_model) else 'remote'})...") + f"Loading alignment model {alignment_model} ({'local' if os.path.exists(alignment_model) else 'remote'})..." + ) alignment_model = load_alignment_model(alignment_model, device=device, download_root="/opt") elif alignment_model is None: logger.info("Alignment will be done using Whisper cross-attention weights") else: - logger.info("No alignment model preloaded. It will be loaded on the fly depending on the detected language.") + logger.info( + "No alignment model preloaded. It will be loaded on the fly depending on the detected language." + ) alignment_model = {} # Alignement model(s) will be loaded on the fly MODEL = (model, alignment_model) diff --git a/whisper/stt/processing/alignment_model.py b/whisper/stt/processing/alignment_model.py index a8e6e79..ea958db 100644 --- a/whisper/stt/processing/alignment_model.py +++ b/whisper/stt/processing/alignment_model.py @@ -1,17 +1,19 @@ -from stt import logger, USE_TORCH, USE_TORCHAUDIO -from .utils import SAMPLE_RATE, LANGUAGES - -import os import math +import os import time + import requests +from stt import USE_TORCH, USE_TORCHAUDIO, logger + +from .utils import LANGUAGES, SAMPLE_RATE if USE_TORCH: import torch import torch.nn.utils.rnn as rnn_utils + try: - import speechbrain as sb import huggingface_hub + import speechbrain as sb except ImportError: pass try: @@ -66,8 +68,7 @@ def get_alignment_model(alignment_model_name, language, force=False): elif language in ALIGNMENT_MODELS: return ALIGNMENT_MODELS[language] elif force: - raise ValueError( - f"No wav2vec alignment model for language '{language}'.") + raise ValueError(f"No wav2vec alignment model for language '{language}'.") else: logger.warn( f"No wav2vec alignment model for language '{language}'. Fallback to English." 
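When a torchaudio pipeline name such as WAV2VEC2_ASR_BASE_960H is given for ALIGNMENT_MODEL, loading boils down to the following sketch, mirroring load_torchaudio_model below (assumes torchaudio is installed):

import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H   # one of the names accepted by ALIGNMENT_MODEL
model = bundle.get_model().to("cpu")                    # wav2vec2 acoustic model used for word alignment
labels = bundle.get_labels()                            # character vocabulary used for CTC alignment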
@@ -77,57 +78,59 @@ def get_alignment_model(alignment_model_name, language, force=False): return get_alignment_model("wav2vec", alignment_model_name, force=True) return alignment_model_name -def load_alignment_model(source, device="cpu", download_root="/opt"): +def load_alignment_model(source, device="cpu", download_root="/opt"): if not USE_TORCH: - raise NotImplementedError( - "Alignement model not available without Torch") + raise NotImplementedError("Alignement model not available without Torch") start = time.time() if (source in torchaudio.pipelines.__all__) if USE_TORCHAUDIO else False: - model = load_torchaudio_model( - source, device=device, download_root=download_root) + model = load_torchaudio_model(source, device=device, download_root=download_root) else: try: - model = load_transformers_model( - source, device=device, download_root=download_root) + model = load_transformers_model(source, device=device, download_root=download_root) except Exception as err1: try: - model = load_speechbrain_model( - source, device=device, download_root=download_root) + model = load_speechbrain_model(source, device=device, download_root=download_root) except Exception as err2: raise Exception( - f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}") from err2 + f"Failed to load alignment model:\n<<< transformers <<<\n{str(err1)}\n<<< speechbrain <<<\n{str(err2)}" + ) from err2 logger.info( - f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)") + f"Alignment Model of type {get_model_type(model)} loaded. (t={time.time() - start}s)" + ) return model def load_speechbrain_model(source, device="cpu", download_root="/opt"): - if os.path.isdir(source): yaml_file = os.path.join(source, "hyperparams.yaml") - assert os.path.isfile( - yaml_file), f"Hyperparams file {yaml_file} not found" + assert os.path.isfile(yaml_file), f"Hyperparams file {yaml_file} not found" else: try: yaml_file = huggingface_hub.hf_hub_download( - repo_id=source, filename="hyperparams.yaml", cache_dir=os.path.join(download_root, "huggingface/hub")) + repo_id=source, + filename="hyperparams.yaml", + cache_dir=os.path.join(download_root, "huggingface/hub"), + ) except requests.exceptions.HTTPError: yaml_file = None overrides = make_yaml_overrides( - yaml_file, {"save_path": os.path.join(download_root, "speechbrain")}) + yaml_file, {"save_path": os.path.join(download_root, "speechbrain")} + ) savedir = os.path.join(download_root, "speechbrain") try: model = sb.pretrained.EncoderASR.from_hparams( - source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides + ) except ValueError: model = sb.pretrained.EncoderDecoderASR.from_hparams( - source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides) + source=source, run_opts={"device": device}, savedir=savedir, overrides=overrides + ) model.train(False) model.requires_grad_(False) @@ -135,7 +138,6 @@ def load_speechbrain_model(source, device="cpu", download_root="/opt"): def load_transformers_model(source, device="cpu", download_root="/opt"): - model = transformers.Wav2Vec2ForCTC.from_pretrained(source).to(device) processor = transformers.Wav2Vec2Processor.from_pretrained(source) @@ -145,7 +147,6 @@ def load_transformers_model(source, device="cpu", download_root="/opt"): def load_torchaudio_model(source, device="cpu", download_root="/opt"): - bundle = torchaudio.pipelines.__dict__[source] 
model = bundle.get_model().to(device) labels = bundle.get_labels() @@ -187,8 +188,7 @@ def make_yaml_overrides(yaml_file, key_values): elif ":" in line: child = line.strip().split(":")[0].strip() if child in key_values: - override[parent] = override.get(parent, {}) | { - child: key_values[child]} + override[parent] = override.get(parent, {}) | {child: key_values[child]} return override @@ -205,15 +205,18 @@ def get_vocab(model): else: labels, blank_id = get_vocab_torchaudio(model) assert isinstance(labels, list) and min( - [isinstance(l, str) for l in labels]), "labels must be a list of strings" + [isinstance(l, str) for l in labels] + ), "labels must be a list of strings" return norm_labels(labels, blank_id), blank_id def get_vocab_speechbrain(model): tokenizer = model.tokenizer # Is this general enough? - labels = [{'': " ", ' ⁇ ': ""}.get(i, i) for i in tokenizer.decode( - [[i] for i in range(tokenizer.get_piece_size())])] + labels = [ + {"": " ", " ⁇ ": ""}.get(i, i) + for i in tokenizer.decode([[i] for i in range(tokenizer.get_piece_size())]) + ] blank_id = labels.index("") return labels, blank_id @@ -228,8 +231,7 @@ def get_vocab_torchaudio(model_and_labels): def get_vocab_transformers(model_and_processor): _, processor = model_and_processor - labels_dict = dict((v, k) - for k, v in processor.tokenizer.get_vocab().items()) + labels_dict = dict((v, k) for k, v in processor.tokenizer.get_vocab().items()) labels = [labels_dict[i] for i in range(len(labels_dict))] blank_id = labels.index("") return labels, blank_id @@ -239,6 +241,7 @@ def norm_labels(labels, blank_id): labels[blank_id] = "" return [l if l != "|" else " " for l in labels] + ################################################################################ # Compute log-probabilities from model @@ -250,7 +253,6 @@ def norm_labels(labels, blank_id): def compute_logprobas(model, audios, max_len=MAX_LEN): - # Single audio if not isinstance(audios, list): audios = [audios] @@ -280,22 +282,22 @@ def compute_logits_speechbrain(model, audios, max_len): chunks = [] i_audio = [] for a in audios: - chunks.extend([a[i:min(i+max_len, len(a))] - for i in range(0, len(a), max_len)]) + chunks.extend([a[i : min(i + max_len, len(a))] for i in range(0, len(a), max_len)]) i_audio.append(len(chunks)) if len(chunks) > 1: logger.warning( - "Audio too long, splitting into {} chunks for alignment".format(len(chunks))) + "Audio too long, splitting into {} chunks for alignment".format(len(chunks)) + ) # Decode chunks of audio and concatenate results log_probas = [[] for i in range(len(audios))] for i in range(0, len(chunks), batch_size): - chunk = chunks[i:min(i+batch_size, len(chunks))] + chunk = chunks[i : min(i + batch_size, len(chunks))] log_probas_tmp = compute_logits_speechbrain(model, chunk) - for j in range(i, i+len(chunk)): + for j in range(i, i + len(chunk)): k = 0 while j >= i_audio[k]: k += 1 - log_probas[k].append(log_probas_tmp[j-i]) + log_probas[k].append(log_probas_tmp[j - i]) log_probas = [torch.cat(p, dim=0) for p in log_probas] log_probas, wav_lens = pack_sequences(log_probas, device=model.device) else: @@ -307,16 +309,15 @@ def compute_logits_speechbrain(model, audios, max_len): def pack_sequences(tensors, device="cpu"): if len(tensors) == 1: - return tensors[0].unsqueeze(0).to(device), torch.Tensor([1.]).to(device) + return tensors[0].unsqueeze(0).to(device), torch.Tensor([1.0]).to(device) tensor = rnn_utils.pad_sequence(tensors, batch_first=True) wav_lens = [len(x) for x in tensors] maxwav_lens = max(wav_lens) - wav_lens = 
torch.Tensor([l/maxwav_lens for l in wav_lens]) + wav_lens = torch.Tensor([l / maxwav_lens for l in wav_lens]) return tensor.to(device), wav_lens.to(device) def compute_logits_transformers(model_and_processor, audios, max_len): - model, processor = model_and_processor # can be different from processor.feature_extractor.sampling_rate @@ -342,19 +343,28 @@ def compute_logits_transformers(model_and_processor, audios, max_len): if l > max_len: # Split batch in smaller chunks logger.warning( - "Audio too long, splitting into {} chunks for alignment".format(math.ceil(l / max_len))) + "Audio too long, splitting into {} chunks for alignment".format( + math.ceil(l / max_len) + ) + ) logits = [] for i in range(0, l, max_len): j = min(i + max_len, l) if use_mask: - logits.append(model(padded_batch.input_values[:, i:j].to(device), - attention_mask=padded_batch.attention_mask[:, i:j].to(device)).logits) + logits.append( + model( + padded_batch.input_values[:, i:j].to(device), + attention_mask=padded_batch.attention_mask[:, i:j].to(device), + ).logits + ) else: logits.append(model(padded_batch.input_values[:, i:j].to(device)).logits) logits = torch.cat(logits, dim=1) elif use_mask: - logits = model(padded_batch.input_values.to(device), - attention_mask=padded_batch.attention_mask.to(device)).logits + logits = model( + padded_batch.input_values.to(device), + attention_mask=padded_batch.attention_mask.to(device), + ).logits else: logits = model(padded_batch.input_values.to(device)).logits @@ -371,7 +381,7 @@ def compute_logits_torchaudio(model_and_labels, audios, max_len): for p in model.parameters(): device = p.device break - + all_logits = [] with torch.inference_mode(): @@ -380,7 +390,10 @@ def compute_logits_torchaudio(model_and_labels, audios, max_len): if l > max_len: # Split audio in smaller chunks logger.warning( - "Audio too long, splitting into {} chunks for alignment".format(math.ceil(l / max_len))) + "Audio too long, splitting into {} chunks for alignment".format( + math.ceil(l / max_len) + ) + ) logits = [] for i in range(0, l, max_len): j = min(i + max_len, l) diff --git a/whisper/stt/processing/decoding.py b/whisper/stt/processing/decoding.py index b78c4db..9f8411f 100644 --- a/whisper/stt/processing/decoding.py +++ b/whisper/stt/processing/decoding.py @@ -1,17 +1,18 @@ +import copy import os import time -import numpy as np -import copy from typing import Tuple, Union -from stt import logger, USE_CTRANSLATE2 -from .utils import SAMPLE_RATE, get_language -from .text_normalize import remove_punctuation, normalize_text, remove_emoji +import numpy as np +from stt import USE_CTRANSLATE2, logger + from .alignment_model import get_alignment_model, load_alignment_model +from .text_normalize import normalize_text, remove_emoji, remove_punctuation +from .utils import SAMPLE_RATE, get_language from .word_alignment import compute_alignment if not USE_CTRANSLATE2: - import torch + import torch import whisper_timestamped USE_ACCURATE = True @@ -28,20 +29,21 @@ default_initial_prompt = os.environ.get("PROMPT", None) -def decode(audio, - model_and_alignementmodel, # Tuple[model, alignment_model] - with_word_timestamps: bool, - language: str = None, - remove_punctuation_from_words=False, - beam_size: int = default_beam_size, - best_of: int = default_best_of, - temperature: Union[float, Tuple[float, ...]] = default_temperature, - condition_on_previous_text: bool = False, - no_speech_threshold: float = 0.6, - compression_ratio_threshold: float = 2.4, - initial_prompt: str = default_initial_prompt, - ) -> 
dict: +def decode( + audio, + model_and_alignementmodel, # Tuple[model, alignment_model] + with_word_timestamps: bool, + language: str = None, + remove_punctuation_from_words=False, + beam_size: int = default_beam_size, + best_of: int = default_best_of, + temperature: Union[float, Tuple[float, ...]] = default_temperature, + condition_on_previous_text: bool = False, + no_speech_threshold: float = 0.6, + compression_ratio_threshold: float = 2.4, + initial_prompt: str = default_initial_prompt, +) -> dict: if language is None: language = get_language() @@ -49,7 +51,11 @@ def decode(audio, kwargs.pop("model_and_alignementmodel") kwargs["model"], kwargs["alignment_model"] = model_and_alignementmodel - logger.info("Transcribing audio with " + (f"language {language}" if language else "automatic language detection") + "...") + logger.info( + "Transcribing audio with " + + (f"language {language}" if language else "automatic language detection") + + "..." + ) start_t = time.time() @@ -61,24 +67,19 @@ def decode(audio, res = decode_torch(**kwargs) logger.info("Transcription complete (t={}s)".format(time.time() - start_t)) - - return res + return res -def decode_ct2(audio, - model, - with_word_timestamps, - language, - remove_punctuation_from_words, - **kwargs - ): - kwargs["no_speech_threshold"] = 1 # To avoid empty output +def decode_ct2( + audio, model, with_word_timestamps, language, remove_punctuation_from_words, **kwargs +): + kwargs["no_speech_threshold"] = 1 # To avoid empty output if kwargs.get("beam_size") is None: kwargs["beam_size"] = 1 if kwargs.get("best_of") is None: kwargs["best_of"] = 1 - + segments, info = model.transcribe( audio, word_timestamps=with_word_timestamps, @@ -86,31 +87,32 @@ def decode_ct2(audio, # Careful with the following options max_initial_timestamp=10000.0, vad_filter=USE_VAD, - **kwargs) + **kwargs, + ) segments = list(segments) return format_faster_whisper_response( - segments, info, - remove_punctuation_from_words=remove_punctuation_from_words + segments, info, remove_punctuation_from_words=remove_punctuation_from_words ) -def decode_torch(audio, - model, - alignment_model, - with_word_timestamps, - language, - remove_punctuation_from_words, - beam_size, - best_of, - temperature, - condition_on_previous_text, - no_speech_threshold, - compression_ratio_threshold, - normalize_text_as_words=False, - initial_prompt=None, - ): +def decode_torch( + audio, + model, + alignment_model, + with_word_timestamps, + language, + remove_punctuation_from_words, + beam_size, + best_of, + temperature, + condition_on_previous_text, + no_speech_threshold, + compression_ratio_threshold, + normalize_text_as_words=False, + initial_prompt=None, +): """Transcribe the audio data using Whisper with the defined model.""" fp16 = model.device != torch.device("cpu") @@ -134,12 +136,14 @@ def decode_torch(audio, if language is None: language = whisper_res["language"] logger.info(f"Detected language: {language}") - return format_whisper_timestamped_response(whisper_res, remove_punctuation_from_words=remove_punctuation_from_words) + return format_whisper_timestamped_response( + whisper_res, remove_punctuation_from_words=remove_punctuation_from_words + ) # Force deterministic results torch.manual_seed(1234) torch.cuda.manual_seed_all(1234) - + whisper_res = model.transcribe(audio, verbose=None, **kwargs) text = whisper_res["text"] @@ -156,8 +160,12 @@ def decode_torch(audio, # Load alignment model on the fly if language not in alignment_model: alignment_model_name = get_alignment_model(language) - 
logger.info(f"Loading alignment model {alignment_model_name} ({'local' if os.path.exists(alignment_model_name) else 'remote'})...") - alignment_model[language] = load_alignment_model(alignment_model_name, device=model.device, download_root="/opt") + logger.info( + f"Loading alignment model {alignment_model_name} ({'local' if os.path.exists(alignment_model_name) else 'remote'})..." + ) + alignment_model[language] = load_alignment_model( + alignment_model_name, device=model.device, download_root="/opt" + ) spec_alignment_model = alignment_model[language] else: spec_alignment_model = alignment_model @@ -165,9 +173,9 @@ def decode_torch(audio, result = {} result["text"] = text result["language"] = language - result["confidence-score"] = np.exp( - np.array([r["avg_logprob"] for r in segments]) - ).mean() if len(segments) else 0.0 + result["confidence-score"] = ( + np.exp(np.array([r["avg_logprob"] for r in segments])).mean() if len(segments) else 0.0 + ) if not with_word_timestamps: if not normalize_text_as_words: @@ -202,37 +210,40 @@ def decode_torch(audio, if remove_punctuation_from_words: sub_text = remove_punctuation(sub_text) if not sub_text: - logger.warn( - f"Lost text in segment {segment['start']}-{segment['end']}") + logger.warn(f"Lost text in segment {segment['start']}-{segment['end']}") continue labels, emission, trellis, segments, word_segments = compute_alignment( - sub_audio, sub_text, spec_alignment_model) + sub_audio, sub_text, spec_alignment_model + ) ratio = len(sub_audio) / (trellis.size(0) * SAMPLE_RATE) sub_words = sub_text.split() words = [] use_original_words = True if len(sub_words) != len(word_segments): logger.warn( - f"Alignment failed. Some words might be mis-rendered.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}") + f"Alignment failed. 
Some words might be mis-rendered.\nNumber of words: {len(sub_words)} != {len(word_segments)}\n>>>\n{sub_words}\n<<<\n{[segment.label for segment in word_segments]}" + ) assert len(word_segments) < len(sub_words) use_original_words = False for word, seg in zip(sub_words, word_segments): - words.append({ - "word": word if use_original_words else seg.label, - "start": seg.start * ratio + offset, - "end": seg.end * ratio + offset, - "conf": seg.score, - }) + words.append( + { + "word": word if use_original_words else seg.label, + "start": seg.start * ratio + offset, + "end": seg.end * ratio + offset, + "conf": seg.score, + } + ) # Glue the words inside a segment for i, word in enumerate(words): if i == 0: word["start"] = segment["start"] else: - word["start"] = words[i-1]["end"] + word["start"] = words[i - 1]["end"] if i == len(words) - 1: word["end"] = segment["end"] else: - word["end"] = .5 * (words[i+1]["start"] + word["end"]) + word["end"] = 0.5 * (words[i + 1]["start"] + word["end"]) # Accumulate results result["words"] += words @@ -244,7 +255,9 @@ def format_whisper_timestamped_response(transcription, remove_punctuation_from_w for i, seg in enumerate(transcription["segments"][:-1]): for expected_keys in ["start", "end", "words", "avg_logprob"]: - assert expected_keys in seg, f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" + assert ( + expected_keys in seg + ), f"Missing '{expected_keys}' in segment {i} (that has keys {list(seg.keys())})" words = [] @@ -255,36 +268,43 @@ def format_whisper_timestamped_response(transcription, remove_punctuation_from_w text = word["text"] if remove_punctuation_from_words: text = remove_punctuation(text) - words.append({ - "word": text, - "start": word["start"], - "end": word["end"], - "conf": word["confidence"], - }) + words.append( + { + "word": text, + "start": word["start"], + "end": word["end"], + "conf": word["confidence"], + } + ) return { "text": transcription["text"].strip(), "language": transcription["language"], - "confidence-score": round(np.exp(np.array([r["avg_logprob"] for r in segments])).mean(), 2) if len(segments) else 0.0, + "confidence-score": round(np.exp(np.array([r["avg_logprob"] for r in segments])).mean(), 2) + if len(segments) + else 0.0, "words": words, } def format_faster_whisper_response( - segments, info, + segments, + info, remove_punctuation_from_words=False, glue_punctuations="'-&@.,", - ): - +): language = info.language duration = info.duration def checked_timestamps(start, end=None): if start > duration or (end is not None and end > duration): - print("WARNING, timestamp %f is greater than duration %f" % (max(start, end if end else start), duration)) + print( + "WARNING, timestamp %f is greater than duration %f" + % (max(start, end if end else start), duration) + ) if end and end <= start: if end == start: - pass # end = start + 0.01 + pass # end = start + 0.01 else: print("WARNING, end timestamp %f is smaller than start timestamp %f" % (end, start)) if end is None: @@ -300,34 +320,47 @@ def checked_timestamps(start, end=None): for word in segment.words: start, end = checked_timestamps(word.start, word.end) word_strip = word.word.strip() - if glue_punctuations and len(words) and len(word_strip)>1 and word_strip[0] in glue_punctuations: + if ( + glue_punctuations + and len(words) + and len(word_strip) > 1 + and word_strip[0] in glue_punctuations + ): words[-1]["text"] += word.word.lstrip() words[-1]["confidence"].append(word.probability) words[-1]["end"] = max(words[-1]["end"], end) continue - 
words.append({ - "text": word.word, - "confidence": [word.probability], - "start": start, - "end": end - }) + words.append( + { + "text": word.word, + "confidence": [word.probability], + "start": start, + "end": end, + } + ) for word in words: word["text"] = word["text"].strip() word["confidence"] = round(np.mean([c for c in word["confidence"]]), 2) - segments_list.append({ - "text": segment.text.strip(), - "start": start, - "end": end, - "avg_logprob": segment.avg_logprob, - "words": words - }) - + segments_list.append( + { + "text": segment.text.strip(), + "start": start, + "end": end, + "avg_logprob": segment.avg_logprob, + "words": words, + } + ) + transcription = { "text": " ".join(segment["text"] for segment in segments_list), "language": language, - "confidence": round(np.exp(np.mean([segment["avg_logprob"] for segment in segments_list])), 2), + "confidence": round( + np.exp(np.mean([segment["avg_logprob"] for segment in segments_list])), 2 + ), "segments": segments_list, } - return format_whisper_timestamped_response(transcription, remove_punctuation_from_words=remove_punctuation_from_words) + return format_whisper_timestamped_response( + transcription, remove_punctuation_from_words=remove_punctuation_from_words + ) diff --git a/whisper/stt/processing/load_model.py b/whisper/stt/processing/load_model.py index 3790593..b87a414 100644 --- a/whisper/stt/processing/load_model.py +++ b/whisper/stt/processing/load_model.py @@ -1,18 +1,18 @@ import os -import sys -import time import shutil import subprocess +import sys +import time -from stt import logger, USE_CTRANSLATE2 +from stt import USE_CTRANSLATE2, logger if USE_CTRANSLATE2: import faster_whisper else: import whisper_timestamped as whisper -def load_whisper_model(model_type_or_file, device="cpu", download_root=None): +def load_whisper_model(model_type_or_file, device="cpu", download_root=None): start = time.time() logger.info("Loading Whisper model {}...".format(model_type_or_file)) @@ -51,26 +51,34 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): device_index = [int(dev) for dev in device[5:].split(",")] device = "cuda" - if not os.path.isfile(os.path.join(model_type_or_file, "model.bin")) and \ - not max([model_type_or_file.startswith(prefix) for prefix in ["tiny", "base", "small", "medium", "large"]]): - + if not os.path.isfile(os.path.join(model_type_or_file, "model.bin")) and not max( + [ + model_type_or_file.startswith(prefix) + for prefix in ["tiny", "base", "small", "medium", "large"] + ] + ): # Convert transformer model - output_dir = os.path.join(download_root, f"ctranslate2/converters/transformers--{model_type_or_file.replace('/', '--')}") + output_dir = os.path.join( + download_root, + f"ctranslate2/converters/transformers--{model_type_or_file.replace('/', '--')}", + ) logger.info(f"CTranslate2 model in {output_dir}") if not os.path.isdir(output_dir): - import huggingface_hub delete_hf_path = False if not os.path.isdir(model_type_or_file): - - hf_path = huggingface_hub.hf_hub_download(repo_id=model_type_or_file, filename="pytorch_model.bin") + hf_path = huggingface_hub.hf_hub_download( + repo_id=model_type_or_file, filename="pytorch_model.bin" + ) hf_path = os.path.dirname(os.path.dirname(os.path.dirname(hf_path))) delete_hf_path = not os.path.exists(hf_path) else: - assert os.path.isfile(os.path.join(model_type_or_file, "pytorch_model.bin")), f"Could not find pytorch_model.bin in {model_type_or_file}" + assert os.path.isfile( + os.path.join(model_type_or_file, "pytorch_model.bin") + ), 
f"Could not find pytorch_model.bin in {model_type_or_file}" check_torch_installed() @@ -91,16 +99,21 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): # force=False # ) - subprocess.check_call([ - "ct2-transformers-converter", - "--model", model_type_or_file, - "--output_dir", os.path.realpath(output_dir), - "--quantization", "float16", - ]) + subprocess.check_call( + [ + "ct2-transformers-converter", + "--model", + model_type_or_file, + "--output_dir", + os.path.realpath(output_dir), + "--quantization", + "float16", + ] + ) except Exception as err: shutil.rmtree(output_dir, ignore_errors=True) raise err - + finally: if delete_hf_path: logger.info(f"Deleting {hf_path}") @@ -124,26 +137,28 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): ) break except ValueError as err: - logger.info("WARNING: failed to load model with compute_type={}".format(compute_type)) + logger.info( + "WARNING: failed to load model with compute_type={}".format(compute_type) + ) # On some old GPU we may have the error - # "ValueError: Requested int8_float16 compute type, + # "ValueError: Requested int8_float16 compute type, # but the target device or backend do not support efficient int8_float16 computation." if i == len(compute_types) - 1: raise err else: - - extension = os.path.splitext(model_type_or_file)[-1] if os.path.isfile(model_type_or_file) else None + extension = ( + os.path.splitext(model_type_or_file)[-1] if os.path.isfile(model_type_or_file) else None + ) if model_type_or_file in whisper.available_models() or extension == ".pt": - model = whisper.load_model( - model_type_or_file, device=device, - download_root=os.path.join(download_root, "whisper") + model_type_or_file, + device=device, + download_root=os.path.join(download_root, "whisper"), ) else: - # Convert HuggingFace model import torch @@ -161,25 +176,41 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): try: import transformers except ImportError: - raise ImportError(f"If you are trying to download a HuggingFace model with {model_type_or_file}, please install first the transformers library") + raise ImportError( + f"If you are trying to download a HuggingFace model with {model_type_or_file}, please install first the transformers library" + ) from transformers.utils import cached_file try: - model_path = cached_file(model_type_or_file, "pytorch_model.bin", cache_dir=download_root, use_auth_token=None, revision=None) + model_path = cached_file( + model_type_or_file, + "pytorch_model.bin", + cache_dir=download_root, + use_auth_token=None, + revision=None, + ) except Exception as e: try: if isinstance(e, OSError): - model_path = cached_file(model_type_or_file, "whisper.ckpt", cache_dir=download_root, use_auth_token=None, revision=None) + model_path = cached_file( + model_type_or_file, + "whisper.ckpt", + cache_dir=download_root, + use_auth_token=None, + revision=None, + ) else: raise e except: if peft_folder is None: - raise RuntimeError(f"Original error: {e}\nCould not find model {model_type_or_file} from HuggingFace nor local folders.") + raise RuntimeError( + f"Original error: {e}\nCould not find model {model_type_or_file} from HuggingFace nor local folders." 
+ ) # Load HF Model if peft_folder is not None: - from peft import PeftConfig, PeftModel import transformers + from peft import PeftConfig, PeftModel peft_config = PeftConfig.from_pretrained(peft_folder) base_model = peft_config.base_model_name_or_path @@ -191,7 +222,7 @@ def load_whisper_model(model_type_or_file, device="cpu", download_root=None): else: hf_state_dict = torch.load(model_path, map_location="cpu") - # Rename layers + # Rename layers for key in list(hf_state_dict.keys()): new_key = hf_to_whisper_states(key) if new_key is None: @@ -235,73 +266,82 @@ def check_torch_installed(): # import torch + # Credit: https://github.com/openai/whisper/discussions/830 def hf_to_whisper_states(text): import re - + # From Speechbrain if text == "_mel_filters": return None - + # From PEFT if "default" in text: # print(f"WARNING: Ignoring {text}") return None if text.startswith("base_model.model."): - text = text[len("base_model.model."):] - - text = re.sub('.layers.', '.blocks.', text) - text = re.sub('.self_attn.', '.attn.', text) - text = re.sub('.q_proj.', '.query.', text) - text = re.sub('.k_proj.', '.key.', text) - text = re.sub('.v_proj.', '.value.', text) - text = re.sub('.out_proj.', '.out.', text) - text = re.sub('.fc1.', '.mlp.0.', text) - text = re.sub('.fc2.', '.mlp.2.', text) - text = re.sub('.fc3.', '.mlp.3.', text) - text = re.sub('.fc3.', '.mlp.3.', text) - text = re.sub('.encoder_attn.', '.cross_attn.', text) - text = re.sub('.cross_attn.ln.', '.cross_attn_ln.', text) - text = re.sub('.embed_positions.weight', '.positional_embedding', text) - text = re.sub('.embed_tokens.', '.token_embedding.', text) - text = re.sub('model.', '', text) - text = re.sub('attn.layer_norm.', 'attn_ln.', text) - text = re.sub('.final_layer_norm.', '.mlp_ln.', text) - text = re.sub('encoder.layer_norm.', 'encoder.ln_post.', text) - text = re.sub('decoder.layer_norm.', 'decoder.ln.', text) + text = text[len("base_model.model.") :] + + text = re.sub(".layers.", ".blocks.", text) + text = re.sub(".self_attn.", ".attn.", text) + text = re.sub(".q_proj.", ".query.", text) + text = re.sub(".k_proj.", ".key.", text) + text = re.sub(".v_proj.", ".value.", text) + text = re.sub(".out_proj.", ".out.", text) + text = re.sub(".fc1.", ".mlp.0.", text) + text = re.sub(".fc2.", ".mlp.2.", text) + text = re.sub(".fc3.", ".mlp.3.", text) + text = re.sub(".fc3.", ".mlp.3.", text) + text = re.sub(".encoder_attn.", ".cross_attn.", text) + text = re.sub(".cross_attn.ln.", ".cross_attn_ln.", text) + text = re.sub(".embed_positions.weight", ".positional_embedding", text) + text = re.sub(".embed_tokens.", ".token_embedding.", text) + text = re.sub("model.", "", text) + text = re.sub("attn.layer_norm.", "attn_ln.", text) + text = re.sub(".final_layer_norm.", ".mlp_ln.", text) + text = re.sub("encoder.layer_norm.", "encoder.ln_post.", text) + text = re.sub("decoder.layer_norm.", "decoder.ln.", text) return text + def states_to_dim(state_dict): - n_audio_state = len(state_dict['encoder.ln_post.bias']) + n_audio_state = len(state_dict["encoder.ln_post.bias"]) n_text_state = len(state_dict["decoder.ln.bias"]) return { - "n_mels": state_dict["encoder.conv1.weight"].shape[1], # 80 - "n_vocab": state_dict["decoder.token_embedding.weight"].shape[0], # 51864 / 51865 - "n_audio_ctx": state_dict["encoder.positional_embedding"].shape[0], # 1500 - "n_audio_state": n_audio_state, # 384 / 512 / 768 / 1024 / 1280 - "n_audio_head": n_audio_state // 64, # 6 / 8 / 12 / 16 / 20 - "n_audio_layer": len(set([".".join(k.split(".")[:3]) for k in 
state_dict.keys() if "encoder.blocks." in k])), # 4 / 6 / 12 / 24 / 32 - "n_text_ctx": state_dict["decoder.positional_embedding"].shape[0], # 448 - "n_text_state": n_text_state, # 384 / 512 / 768 / 1024 / 1280 - "n_text_head": n_text_state // 64, # 6 / 8 / 12 / 16 / 20 - "n_text_layer": len(set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "decoder.blocks." in k])), # 4 / 6 / 12 / 24 / 32 + "n_mels": state_dict["encoder.conv1.weight"].shape[1], # 80 + "n_vocab": state_dict["decoder.token_embedding.weight"].shape[0], # 51864 / 51865 + "n_audio_ctx": state_dict["encoder.positional_embedding"].shape[0], # 1500 + "n_audio_state": n_audio_state, # 384 / 512 / 768 / 1024 / 1280 + "n_audio_head": n_audio_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_audio_layer": len( + set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "encoder.blocks." in k]) + ), # 4 / 6 / 12 / 24 / 32 + "n_text_ctx": state_dict["decoder.positional_embedding"].shape[0], # 448 + "n_text_state": n_text_state, # 384 / 512 / 768 / 1024 / 1280 + "n_text_head": n_text_state // 64, # 6 / 8 / 12 / 16 / 20 + "n_text_layer": len( + set([".".join(k.split(".")[:3]) for k in state_dict.keys() if "decoder.blocks." in k]) + ), # 4 / 6 / 12 / 24 / 32 } + if not USE_CTRANSLATE2: class TextDecoderUntied(whisper.model.TextDecoder): """ Same as TextDecoder but with untied weights """ + def __init__(self, *args, **kwargs): import torch + super().__init__(*args, **kwargs) n_vocab, n_state = self.token_embedding.weight.shape self.proj_out = torch.nn.Linear(n_state, n_vocab, bias=False) - def forward(self, x, xa, kv_cache = None): + def forward(self, x, xa, kv_cache=None): offset = next(iter(kv_cache.values())).shape[1] if kv_cache else 0 x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]] x = x.to(xa.dtype) diff --git a/whisper/stt/processing/text_normalize.py b/whisper/stt/processing/text_normalize.py index a5f3d04..cde8f38 100644 --- a/whisper/stt/processing/text_normalize.py +++ b/whisper/stt/processing/text_normalize.py @@ -4,6 +4,7 @@ import unicodedata from stt import logger + from .utils import flatten # All punctuations and symbols EXCEPT: @@ -21,14 +22,14 @@ # A list of symbols that can be an isolated words and not in the exclusion list above # * & # * candidates not retained: §, <, =, >, ≤, ≥ -_maybe_word_regex = None # r"[" + re.escape("&") + r"]$" +_maybe_word_regex = None # r"[" + re.escape("&") + r"]$" -def remove_punctuation(text: str, ensure_no_spaces_in_words: bool=False) -> str: +def remove_punctuation(text: str, ensure_no_spaces_in_words: bool = False) -> str: text = text.strip() # Note: we don't remove dots inside words (e.g. "ab@gmail.com") - new_text = re.sub(_leading_punctuations_regex, "", text) #.lstrip() - new_text = re.sub(_trailing_punctuations_regex, "", new_text) #.rstrip() + new_text = re.sub(_leading_punctuations_regex, "", text) # .lstrip() + new_text = re.sub(_trailing_punctuations_regex, "", new_text) # .rstrip() # Let punctuation marks that are alone if not new_text: if _maybe_word_regex and re.match(_maybe_word_regex, text): @@ -43,6 +44,7 @@ def remove_punctuation(text: str, ensure_no_spaces_in_words: bool=False) -> str: return remove_punctuation(new_text, ensure_no_spaces_in_words=ensure_no_spaces_in_words) return new_text + def transliterate(c): # Transliterates a character to its closest ASCII equivalent. 
# Example: transliterate("à ß œ fl") = "a ss oe fl" @@ -56,66 +58,62 @@ def transliterate(c): def remove_emoji(text): # Remove emojis - return re.sub(r"[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+", "", text) + return re.sub( + r"[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+", + "", + text, + ) def normalize_text(text: str, lang: str) -> str: - """ Transform digits into characters... """ + """Transform digits into characters...""" # Reorder currencies (1,20€ -> 1 € 20) coma = "," if lang in ["fr"] else "\." for c in _currencies: if c in text: - text = re.sub(r"\b(\d+)" + coma + r"(\d+)\s*" + - c, r"\1 " + c + r" \2", text) + text = re.sub(r"\b(\d+)" + coma + r"(\d+)\s*" + c, r"\1 " + c + r" \2", text) # Roman digits if re.search(r"[IVX]", text): if lang == "en": - digits = re.findall( - r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|st|nd|rd|th)?\b", text) + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|st|nd|rd|th)?\b", text) digits = ["".join(d) for d in digits] elif lang == "fr": digits = re.findall( - r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text) + r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})(º|ème|eme|e|er|ère)?\b", text + ) digits = ["".join(d) for d in digits] else: - digits = re.findall( - r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})\b", text) + digits = re.findall(r"\b(?=[XVI])M*(XX{0,3})(I[XV]|V?I{0,3})\b", text) digits = ["".join(d) for d in digits] if digits: - digits = sorted(list(set(digits)), reverse=True, - key=lambda x: (len(x), x)) + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) for s in digits: filtered = re.sub("[a-zèº]", "", s) ordinal = filtered != s digit = roman_to_decimal(filtered) - v = undigit(str(digit), lang=lang, - to="ordinal" if ordinal else "cardinal") + v = undigit(str(digit), lang=lang, to="ordinal" if ordinal else "cardinal") text = re.sub(r"\b" + s + r"\b", v, text) # Ordinal digits if lang == "en": - digits = re.findall( - r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:º|th)\b", text) + digits = re.findall(r"\b\d*1(?:st)|\d*2(?:nd)|\d*3(?:rd)|\d+(?:º|th)\b", text) elif lang == "fr": - digits = re.findall( - r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) + digits = re.findall(r"\b1(?:ère|ere|er|re|r)|2(?:nd|nde)|\d+(?:º|ème|eme|e)\b", text) else: logger.warn( - f"Language {lang} not supported for some normalization. Some words might be mis-localized.") + f"Language {lang} not supported for some normalization. Some words might be mis-localized." 
+ ) digits = [] if digits: - digits = sorted(list(set(digits)), reverse=True, - key=lambda x: (len(x), x)) + digits = sorted(list(set(digits)), reverse=True, key=lambda x: (len(x), x)) for digit in digits: - word = undigit(re.findall(r"\d+", digit) - [0], to="ordinal", lang=lang) - text = re.sub(r'\b'+str(digit)+r'\b', word, text) + word = undigit(re.findall(r"\d+", digit)[0], to="ordinal", lang=lang) + text = re.sub(r"\b" + str(digit) + r"\b", word, text) # Cardinal digits - digits = re.findall( - r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)", text) + digits = re.findall(r"(?:\-?\b[\d/]*\d+(?: \d\d\d)+\b)|(?:\-?\d[/\d]*)", text) digits = list(map(lambda s: s.strip(r"[/ ]"), digits)) digits = list(set(digits)) digits = digits + flatten([c.split() for c in digits if " " in c]) @@ -131,54 +129,55 @@ def normalize_text(text: str, lang: str) -> str: elif numslash == 1: # Fraction or date i = digitf.index("/") is_date = False - if len(digitf[i+1:]) == 2: + if len(digitf[i + 1 :]) == 2: try: first = int(digitf[:i]) - second = int(digitf[i+1:]) + second = int(digitf[i + 1 :]) is_date = first > 0 and first < 32 and second > 0 and second < 13 except: pass if is_date: first = digitf[:i].lstrip("0") use_ordinal = (lang == "fr" and first == "1") or ( - lang != "fr" and first[-1] in ["1", "2", "3"]) - first = undigit(first, lang=lang, - to="ordinal" if use_ordinal else "cardinal") - second = _int_to_month.get(lang, {}).get(second,digitf[i+1:]) + lang != "fr" and first[-1] in ["1", "2", "3"] + ) + first = undigit(first, lang=lang, to="ordinal" if use_ordinal else "cardinal") + second = _int_to_month.get(lang, {}).get(second, digitf[i + 1 :]) else: first = undigit(digitf[:i], lang=lang) - second = undigit(digitf[i+1:], to="denominator", lang=lang) - if float(digitf[:i]) > 2. 
and second[-1] != "s": + second = undigit(digitf[i + 1 :], to="denominator", lang=lang) + if float(digitf[:i]) > 2.0 and second[-1] != "s": second += "s" word = first + " " + second elif numslash == 2: # Maybe a date i1 = digitf.index("/") - i2 = digitf.index("/", i1+1) + i2 = digitf.index("/", i1 + 1) is_date = False - if len(digitf[i1+1:i2]) == 2 and len(digitf[i2+1:]) == 4: + if len(digitf[i1 + 1 : i2]) == 2 and len(digitf[i2 + 1 :]) == 4: try: first = int(digitf[:i1]) - second = int(digitf[i1+1:i2]) - third = int(digitf[i2+1:]) - is_date = first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 + second = int(digitf[i1 + 1 : i2]) + third = int(digitf[i2 + 1 :]) + is_date = ( + first > 0 and first < 32 and second > 0 and second < 13 and third > 1000 + ) except: pass - third = undigit(digitf[i2+1:], lang=lang) + third = undigit(digitf[i2 + 1 :], lang=lang) if is_date: first = digitf[:i1].lstrip("0") use_ordinal = (lang == "fr" and first == "1") or ( - lang != "fr" and first[-1] in ["1", "2", "3"]) - first = undigit(first, lang=lang, - to="ordinal" if use_ordinal else "cardinal") + lang != "fr" and first[-1] in ["1", "2", "3"] + ) + first = undigit(first, lang=lang, to="ordinal" if use_ordinal else "cardinal") second = _int_to_month.get(lang, {}).get( - int(digitf[i1+1:i2]), digitf[i1+1:i2]) + int(digitf[i1 + 1 : i2]), digitf[i1 + 1 : i2] + ) word = " ".join([first, second, third]) else: - word = " / ".join([undigit(s, lang=lang) - for s in digitf.split('/')]) + word = " / ".join([undigit(s, lang=lang) for s in digitf.split("/")]) else: - word = " / ".join([undigit(s, lang=lang) - for s in digitf.split('/')]) + word = " / ".join([undigit(s, lang=lang) for s in digitf.split("/")]) text = replace_keeping_word_boundaries(digit, word, text) # Symbols (currencies, percent...) @@ -194,12 +193,13 @@ def normalize_text(text: str, lang: str) -> str: def replace_keeping_word_boundaries(orig, dest, text): if orig in text: - text = re.sub(r"(\W)"+orig+r"(\W)", r"\1"+dest+r"\2", text) - text = re.sub(orig+r"(\W)", " "+dest+r"\1", text) - text = re.sub(r"(\W)"+orig, r"\1"+dest+" ", text) - text = re.sub(orig, " "+dest+" ", text) + text = re.sub(r"(\W)" + orig + r"(\W)", r"\1" + dest + r"\2", text) + text = re.sub(orig + r"(\W)", " " + dest + r"\1", text) + text = re.sub(r"(\W)" + orig, r"\1" + dest + " ", text) + text = re.sub(orig, " " + dest + " ", text) return text + def undigit(str, lang, to="cardinal"): str = re.sub(" ", "", str) if to == "denominator": @@ -224,7 +224,9 @@ def undigit(str, lang, to="cardinal"): if str.startswith("0") and to == "cardinal": numZeros = len(re.findall(r"0+", str)[0]) if numZeros < len(str): - return numZeros * (robust_num2words(0, lang=lang)+" ") + robust_num2words(float(str), lang=lang, to=to) + return numZeros * (robust_num2words(0, lang=lang) + " ") + robust_num2words( + float(str), lang=lang, to=to + ) return robust_num2words(float(str), lang=lang, to=to) @@ -233,6 +235,7 @@ def robust_num2words(x, lang, to="cardinal", orig=""): Bugfix for num2words """ from num2words import num2words + try: res = num2words(x, lang=lang, to=to) if lang == "fr" and to == "ordinal": @@ -244,34 +247,34 @@ def robust_num2words(x, lang, to="cardinal", orig=""): if x == -math.inf: # ! 
return "moins " + robust_num2words(-x, lang=lang, to=to, orig=orig.replace("-", "")) # TODO: print a warning - return robust_num2words(x//10, lang=lang, to=to) + return robust_num2words(x // 10, lang=lang, to=to) def roman_to_decimal(str): def value(r): - if (r == 'I'): + if r == "I": return 1 - if (r == 'V'): + if r == "V": return 5 - if (r == 'X'): + if r == "X": return 10 - if (r == 'L'): + if r == "L": return 50 - if (r == 'C'): + if r == "C": return 100 - if (r == 'D'): + if r == "D": return 500 - if (r == 'M'): + if r == "M": return 1000 return -1 res = 0 i = 0 - while (i < len(str)): + while i < len(str): s1 = value(str[i]) - if (i + 1 < len(str)): + if i + 1 < len(str): s2 = value(str[i + 1]) - if (s1 >= s2): + if s1 >= s2: # Value of current symbol is greater or equal to the next symbol res = res + s1 i = i + 1 @@ -313,7 +316,7 @@ def value(r): 10: "october", 11: "november", 12: "december", - } + }, } _currencies = ["€", "$", "£", "¥"] @@ -386,6 +389,5 @@ def value(r): "\$": "dollars", "£": "pounds", "¥": "yens", - } + }, } - diff --git a/whisper/stt/processing/utils.py b/whisper/stt/processing/utils.py index 0352de4..106167a 100644 --- a/whisper/stt/processing/utils.py +++ b/whisper/stt/processing/utils.py @@ -1,32 +1,35 @@ -from stt import USE_CTRANSLATE2, USE_TORCH, USE_TORCHAUDIO - import io -import wavio import os + import numpy as np +import wavio +from stt import USE_CTRANSLATE2, USE_TORCH, USE_TORCHAUDIO -SAMPLE_RATE = 16000 # whisper.audio.SAMPLE_RATE +SAMPLE_RATE = 16000 # whisper.audio.SAMPLE_RATE if USE_CTRANSLATE2: import ctranslate2 import faster_whisper else: import torch + import whisper if USE_TORCHAUDIO: import torchaudio + def has_cuda(): if USE_CTRANSLATE2: return ctranslate2.get_cuda_device_count() > 0 else: return torch.cuda.is_available() + def get_device(): device = os.environ.get("DEVICE", "cuda" if has_cuda() else "cpu") use_gpu = "cuda" in device - + if USE_CTRANSLATE2: try: if device.startswith("cuda:"): @@ -34,7 +37,9 @@ def get_device(): else: assert device in ["cpu", "cuda"] except: - raise ValueError(f"Invalid DEVICE '{device}' (should be 'cpu' or 'cuda' or 'cuda: or 'cuda:,,...')") + raise ValueError( + f"Invalid DEVICE '{device}' (should be 'cpu' or 'cuda' or 'cuda: or 'cuda:,,...')" + ) else: try: device = torch.device(device) @@ -42,6 +47,7 @@ def get_device(): raise Exception("Failed to set device: {}".format(str(err))) from err return device, use_gpu + def get_language(): """ Get the language from the environment variable LANGUAGE, and format as expected by Whisper. @@ -58,13 +64,17 @@ def get_language(): language = {v: k for k, v in LANGUAGES.items()}.get(language.lower(), language) # Raise an exception for unknown languages if language not in LANGUAGES: - available_languages = \ - list(LANGUAGES.keys()) + \ - [k[0].upper() + k[1:] for k in LANGUAGES.values()] + \ - ["*", None] - raise ValueError(f"Language '{language}' is not available. Available languages are: {available_languages}") + available_languages = ( + list(LANGUAGES.keys()) + + [k[0].upper() + k[1:] for k in LANGUAGES.values()] + + ["*", None] + ) + raise ValueError( + f"Language '{language}' is not available. Available languages are: {available_languages}" + ) return language + def conform_audio(audio, sample_rate=16_000): if sample_rate != SAMPLE_RATE: if not USE_TORCHAUDIO: @@ -93,13 +103,13 @@ def load_audiofile(path): def load_wave_buffer(file_buffer): - """ Formats audio from a wavFile buffer to a torch array for processing. 
""" + """Formats audio from a wavFile buffer to a torch array for processing.""" file_buffer_io = io.BytesIO(file_buffer) if USE_CTRANSLATE2: return faster_whisper.decode_audio(file_buffer_io, sampling_rate=SAMPLE_RATE) file_content = wavio.read(file_buffer_io) sample_rate = file_content.rate - audio = file_content.data.astype(np.float32)/32768 + audio = file_content.data.astype(np.float32) / 32768 audio = audio.transpose() audio = torch.from_numpy(audio) return conform_audio(audio, sample_rate) @@ -111,104 +121,105 @@ def flatten(l): """ return [item for sublist in l for item in sublist] -LANGUAGES = { # whisper.tokenizer.LANGUAGES - 'en': 'english', - 'zh': 'chinese', - 'de': 'german', - 'es': 'spanish', - 'ru': 'russian', - 'ko': 'korean', - 'fr': 'french', - 'ja': 'japanese', - 'pt': 'portuguese', - 'tr': 'turkish', - 'pl': 'polish', - 'ca': 'catalan', - 'nl': 'dutch', - 'ar': 'arabic', - 'sv': 'swedish', - 'it': 'italian', - 'id': 'indonesian', - 'hi': 'hindi', - 'fi': 'finnish', - 'vi': 'vietnamese', - 'he': 'hebrew', - 'uk': 'ukrainian', - 'el': 'greek', - 'ms': 'malay', - 'cs': 'czech', - 'ro': 'romanian', - 'da': 'danish', - 'hu': 'hungarian', - 'ta': 'tamil', - 'no': 'norwegian', - 'th': 'thai', - 'ur': 'urdu', - 'hr': 'croatian', - 'bg': 'bulgarian', - 'lt': 'lithuanian', - 'la': 'latin', - 'mi': 'maori', - 'ml': 'malayalam', - 'cy': 'welsh', - 'sk': 'slovak', - 'te': 'telugu', - 'fa': 'persian', - 'lv': 'latvian', - 'bn': 'bengali', - 'sr': 'serbian', - 'az': 'azerbaijani', - 'sl': 'slovenian', - 'kn': 'kannada', - 'et': 'estonian', - 'mk': 'macedonian', - 'br': 'breton', - 'eu': 'basque', - 'is': 'icelandic', - 'hy': 'armenian', - 'ne': 'nepali', - 'mn': 'mongolian', - 'bs': 'bosnian', - 'kk': 'kazakh', - 'sq': 'albanian', - 'sw': 'swahili', - 'gl': 'galician', - 'mr': 'marathi', - 'pa': 'punjabi', - 'si': 'sinhala', - 'km': 'khmer', - 'sn': 'shona', - 'yo': 'yoruba', - 'so': 'somali', - 'af': 'afrikaans', - 'oc': 'occitan', - 'ka': 'georgian', - 'be': 'belarusian', - 'tg': 'tajik', - 'sd': 'sindhi', - 'gu': 'gujarati', - 'am': 'amharic', - 'yi': 'yiddish', - 'lo': 'lao', - 'uz': 'uzbek', - 'fo': 'faroese', - 'ht': 'haitian creole', - 'ps': 'pashto', - 'tk': 'turkmen', - 'nn': 'nynorsk', - 'mt': 'maltese', - 'sa': 'sanskrit', - 'lb': 'luxembourgish', - 'my': 'myanmar', - 'bo': 'tibetan', - 'tl': 'tagalog', - 'mg': 'malagasy', - 'as': 'assamese', - 'tt': 'tatar', - 'haw': 'hawaiian', - 'ln': 'lingala', - 'ha': 'hausa', - 'ba': 'bashkir', - 'jw': 'javanese', - 'su': 'sundanese' + +LANGUAGES = { # whisper.tokenizer.LANGUAGES + "en": "english", + "zh": "chinese", + "de": "german", + "es": "spanish", + "ru": "russian", + "ko": "korean", + "fr": "french", + "ja": "japanese", + "pt": "portuguese", + "tr": "turkish", + "pl": "polish", + "ca": "catalan", + "nl": "dutch", + "ar": "arabic", + "sv": "swedish", + "it": "italian", + "id": "indonesian", + "hi": "hindi", + "fi": "finnish", + "vi": "vietnamese", + "he": "hebrew", + "uk": "ukrainian", + "el": "greek", + "ms": "malay", + "cs": "czech", + "ro": "romanian", + "da": "danish", + "hu": "hungarian", + "ta": "tamil", + "no": "norwegian", + "th": "thai", + "ur": "urdu", + "hr": "croatian", + "bg": "bulgarian", + "lt": "lithuanian", + "la": "latin", + "mi": "maori", + "ml": "malayalam", + "cy": "welsh", + "sk": "slovak", + "te": "telugu", + "fa": "persian", + "lv": "latvian", + "bn": "bengali", + "sr": "serbian", + "az": "azerbaijani", + "sl": "slovenian", + "kn": "kannada", + "et": "estonian", + "mk": "macedonian", + "br": "breton", + 
"eu": "basque", + "is": "icelandic", + "hy": "armenian", + "ne": "nepali", + "mn": "mongolian", + "bs": "bosnian", + "kk": "kazakh", + "sq": "albanian", + "sw": "swahili", + "gl": "galician", + "mr": "marathi", + "pa": "punjabi", + "si": "sinhala", + "km": "khmer", + "sn": "shona", + "yo": "yoruba", + "so": "somali", + "af": "afrikaans", + "oc": "occitan", + "ka": "georgian", + "be": "belarusian", + "tg": "tajik", + "sd": "sindhi", + "gu": "gujarati", + "am": "amharic", + "yi": "yiddish", + "lo": "lao", + "uz": "uzbek", + "fo": "faroese", + "ht": "haitian creole", + "ps": "pashto", + "tk": "turkmen", + "nn": "nynorsk", + "mt": "maltese", + "sa": "sanskrit", + "lb": "luxembourgish", + "my": "myanmar", + "bo": "tibetan", + "tl": "tagalog", + "mg": "malagasy", + "as": "assamese", + "tt": "tatar", + "haw": "hawaiian", + "ln": "lingala", + "ha": "hausa", + "ba": "bashkir", + "jw": "javanese", + "su": "sundanese", } diff --git a/whisper/stt/processing/word_alignment.py b/whisper/stt/processing/word_alignment.py index 229fb43..e7a9256 100644 --- a/whisper/stt/processing/word_alignment.py +++ b/whisper/stt/processing/word_alignment.py @@ -1,24 +1,26 @@ """ Credits: https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html """ -from stt import logger, USE_TORCH from dataclasses import dataclass +from stt import USE_TORCH, logger + from .alignment_model import compute_logprobas, get_vocab -from .utils import flatten from .text_normalize import transliterate +from .utils import flatten if USE_TORCH: import torch _unknown_chars = [] + def compute_alignment(audio, transcript, model): - """ Compute the alignment of the audio and a transcript, for a given model that returns log-probabilities on the charset defined the transcript.""" + """Compute the alignment of the audio and a transcript, for a given model that returns log-probabilities on the charset defined the transcript.""" emission = compute_logprobas(model, audio) labels, blank_id = get_vocab(model) - labels = labels[:emission.shape[1]] + labels = labels[: emission.shape[1]] dictionary = {c: i for i, c in enumerate(labels)} default = labels.index("-") if "-" in labels else None @@ -30,8 +32,7 @@ def compute_alignment(audio, transcript, model): if len(tokens) + num_repetitions > num_emissions: # It will be impossible to find a path... # It can happen when Whisper is lost in a loop (ex: "Ha ha ha ha ...") - logger.warn( - f"Got too many characters from Whisper. Shrinking to the first characters.") + logger.warn(f"Got too many characters from Whisper. 
Shrinking to the first characters.") tokens = tokens[:num_emissions] num_repetitions = count_repetitions(tokens) while len(tokens) + num_repetitions > num_emissions: @@ -62,8 +63,7 @@ def loose_get_char_index(dictionary, c, default=None): if i is None: # Try with alternative versions of the character tc = transliterate(c) - other_char = list( - set([c.lower(), c.upper(), tc, tc.lower(), tc.upper()])) + other_char = list(set([c.lower(), c.upper(), tc, tc.lower(), tc.upper()])) for c2 in other_char: i = dictionary.get(c2, None) if i is not None: @@ -73,15 +73,17 @@ def loose_get_char_index(dictionary, c, default=None): if i is None: for c2 in other_char: if len(c2) > 1: - candidate = [dictionary[c3] - for c3 in c2 if c3 in dictionary] + candidate = [dictionary[c3] for c3 in c2 if c3 in dictionary] if len(candidate) > 0 and (i is None or len(candidate) > len(i)): i = candidate # If still not found if i is None: if c not in _unknown_chars: - logger.warn("Character not correctly handled by alignment model: '" + - "' / '".join(list(set([c] + other_char))) + "'") + logger.warn( + "Character not correctly handled by alignment model: '" + + "' / '".join(list(set([c] + other_char))) + + "'" + ) _unknown_chars.append(c) i = [default] if default is not None else [] else: @@ -103,16 +105,23 @@ def get_trellis(emission, tokens, blank_id=0, use_max=False): trellis[-num_tokens:, 0] = float("inf") for t in range(num_frame): - trellis[t + 1, 1:] = torch.maximum( - # Score for staying at the same token - trellis[t, 1:] + emission[t, blank_id], - torch.maximum(trellis[t, 1:] + emission[t, tokens], - # Score for changing to the next token - trellis[t, :-1] + emission[t, tokens]) - ) if use_max else torch.logaddexp( - trellis[t, 1:] + emission[t, blank_id], - torch.logaddexp(trellis[t, 1:] + emission[t, tokens], - trellis[t, :-1] + emission[t, tokens]) + trellis[t + 1, 1:] = ( + torch.maximum( + # Score for staying at the same token + trellis[t, 1:] + emission[t, blank_id], + torch.maximum( + trellis[t, 1:] + emission[t, tokens], + # Score for changing to the next token + trellis[t, :-1] + emission[t, tokens], + ), + ) + if use_max + else torch.logaddexp( + trellis[t, 1:] + emission[t, blank_id], + torch.logaddexp( + trellis[t, 1:] + emission[t, tokens], trellis[t, :-1] + emission[t, tokens] + ), + ) ) return trellis @@ -146,8 +155,7 @@ def backtrack(trellis, emission, tokens, blank_id=0): changed = trellis[t - 1, j - 1] + emission[t - 1, tokens[j - 1]] # 2. Store the path with frame-wise probability. - prob = emission[t - 1, tokens[j - 1] - if changed > stayed else 0].exp().item() + prob = emission[t - 1, tokens[j - 1] if changed > stayed else 0].exp().item() # Return token index and time index in non-trellis coordinate. 
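        # Note: `prob` above is the frame-wise probability of the emitted symbol;
        # when the path stays on the current token, emission index 0 is used,
        # which matches the blank token only under the default blank_id == 0.
        # Each Point stores (token index, time index, score) in frame units.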
path.append(Point(j - 1, t - 1, prob)) @@ -205,10 +213,10 @@ def merge_words(segments, separator=" "): if i1 != i2: segs = segments[i1:i2] word = "".join([seg.label for seg in segs]) - score = sum(seg.score * seg.length for seg in segs) / \ - sum(seg.length for seg in segs) - words.append( - Segment(word, segments[i1].start, segments[i2 - 1].end, score)) + score = sum(seg.score * seg.length for seg in segs) / sum( + seg.length for seg in segs + ) + words.append(Segment(word, segments[i1].start, segments[i2 - 1].end, score)) i1 = i2 + 1 i2 = i1 else: From 38d866b9d51035f9defd1d0b724739591192d3a6 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 12 Dec 2023 17:25:28 +0100 Subject: [PATCH 165/172] Update READMEs (and add a main one) --- README.md | 12 ++++++++++ kaldi/README.md | 26 ++++++++++++---------- whisper/README.md | 56 ++++++++++++++++++++++++----------------------- 3 files changed, 55 insertions(+), 39 deletions(-) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..ac846d3 --- /dev/null +++ b/README.md @@ -0,0 +1,12 @@ +# LinTO-Platform-STT + +LinTO-Platform-STT is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack), +which can currently work with Speech-To-Text (STT) models. +The following families of STT models are currently supported (please refer to respective documentation for more details): +* [Kaldi models](kaldi/README.md) +* [Whisper models](whisper/README.md) + +LinTO-Platform-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. + +## License +This project is developped under the AGPLv3 License (see LICENSE). diff --git a/kaldi/README.md b/kaldi/README.md index ec70060..9ab215f 100644 --- a/kaldi/README.md +++ b/kaldi/README.md @@ -1,7 +1,9 @@ -# LINTO-PLATFORM-STT -LinTO-platform-stt is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack). +# LinTO-Platform-STT-Kaldi -LinTO-platform-stt can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. +LinTO-Platform-STT-Kaldi is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) +based on Speech-To-Text (STT) models trained with [Kaldi](https://github.com/kaldi-asr/kaldi). + +LinTO-Platform-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## Pre-requisites @@ -12,7 +14,7 @@ To run the transcription models you'll need: * One CPU per worker. Inference time scales on CPU performances. ### Model -LinTO-Platform-STT accepts two kinds of models: +LinTO-Platform-STT-Kaldi accepts two kinds of models: * LinTO Acoustic and Languages models. * Vosk models. @@ -26,19 +28,19 @@ The transcription service requires docker up and running. The STT only entry point in task mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder (SHARED_FOLDER). -## Deploy linto-platform-stt +## Deploy LinTO-Platform-STT-Kaldi **1- First step is to build or pull the image:** ```bash git clone https://github.com/linto-ai/linto-platform-stt.git cd linto-platform-stt -docker build . -t linto-platform-stt:latest +docker build . 
-f kaldi/Dockerfile -t linto-platform-stt-kaldi:latest ``` or ```bash -docker pull lintoai/linto-platform-stt +docker pull lintoai/linto-platform-stt-kaldi ``` **2- Download the models** @@ -48,7 +50,7 @@ Have the acoustic and language model ready at AM_PATH and LM_PATH if you are usi **3- Fill the .env** ```bash -cp .envdefault .env +cp kaldi/.envdefault kaldi/.env ``` | PARAMETER | DESCRIPTION | EXEMPLE | @@ -84,8 +86,8 @@ docker run --rm \ -p HOST_SERVING_PORT:80 \ -v AM_PATH:/opt/AM \ -v LM_PATH:/opt/LM \ ---env-file .env \ -linto-platform-stt:latest +--env-file kaldi/.env \ +linto-platform-stt-kaldi:latest ``` This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. @@ -114,8 +116,8 @@ docker run --rm \ -v AM_PATH:/opt/AM \ -v LM_PATH:/opt/LM \ -v SHARED_AUDIO_FOLDER:/opt/audio \ ---env-file .env \ -linto-platform-stt:latest +--env-file kaldi/.env \ +linto-platform-stt-kaldi:latest ``` **Parameters:** diff --git a/whisper/README.md b/whisper/README.md index 8e6b04d..f460093 100644 --- a/whisper/README.md +++ b/whisper/README.md @@ -1,7 +1,9 @@ -# LINTO-PLATFORM-STT -LinTO-platform-stt is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack). +# LinTO-Platform-STT-Whisper -LinTO-platform-stt can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. +LinTO-Platform-STT-Whisper is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) +based on Speech-To-Text (STT) [Whisper models](https://openai.com/research/whisper). + +LinTO-Platform-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## Pre-requisites @@ -11,24 +13,22 @@ To run the transcription models you'll need: * Up to 7GB of RAM depending on the model used. * One CPU per worker. Inference time scales on CPU performances. -### Model -LinTO-Platform-STT works with two models: -* A Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. -* A wav2vec model to perform word alignment, which can be in the format of SpeechBrain, HuggingFace's Transformers or TorchAudio +### Model(s) + +LinTO-Platform-STT-Whisper works with a Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. + +#### Optional alignment model (deprecated) +It can also work with a wav2vec model to perform word alignment. The wav2vec model can be specified either -* with a string corresponding to a `torchaudio` pipeline (e.g. "WAV2VEC2_ASR_BASE_960H") or -* with a string corresponding to a HuggingFace repository of a wav2vec model (e.g. "jonatasgrosman/wav2vec2-large-xlsr-53-english"), or -* with a path corresponding to a folder with a SpeechBrain model - -Default models are provided for the following languages: -* French (fr) -* English (en) -* Spanish (es) -* German (de) -* Dutch (nl) -* Japanese (ja) -* Chinese (zh) +* (TorchAudio) with a string corresponding to a `torchaudio` pipeline (e.g. "WAV2VEC2_ASR_BASE_960H") or +* (HuggingFace's Transformers) with a string corresponding to a HuggingFace repository of a wav2vec model (e.g. 
"jonatasgrosman/wav2vec2-large-xlsr-53-english"), or +* (SpeechBrain) with a path corresponding to a folder with a SpeechBrain model + +Default wav2vec models are provided for French (fr), English (en), Spanish (es), German (de), Dutch (nl), Japanese (ja), Chinese (zh). + +But we advise not to use a companion wav2vec alignment model. +This is not needed neither tested anymore. ### Docker The transcription service requires docker up and running. @@ -37,19 +37,19 @@ The transcription service requires docker up and running. The STT only entry point in task mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder (SHARED_FOLDER). -## Deploy linto-platform-stt +## Deploy LinTO-Platform-STT-Whisper **1- First step is to build or pull the image:** ```bash git clone https://github.com/linto-ai/linto-platform-stt.git cd linto-platform-stt -docker build . -t linto-platform-stt:latest +docker build . -f whisper/Dockerfile.ctranslate2 -t linto-platform-stt-whisper:latest ``` or ```bash -docker pull lintoai/linto-platform-stt +docker pull lintoai/linto-platform-stt-whisper ``` **2- Download the models** @@ -77,7 +77,7 @@ If may also want to download a specific wav2vec model for word alignment. **3- Fill the .env** ```bash -cp .envdefault .env +cp whisper/.envdefault whisper/.env ``` | PARAMETER | DESCRIPTION | EXEMPLE | @@ -134,8 +134,8 @@ The SERVICE_MODE value in the .env should be set to ```http```. docker run --rm \ -p HOST_SERVING_PORT:80 \ -v ASR_PATH:/opt/model.pt \ ---env-file .env \ -linto-platform-stt:latest +--env-file whisper/.env \ +linto-platform-stt-whisper:latest ``` This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. @@ -169,8 +169,8 @@ You need a message broker up and running at MY_SERVICE_BROKER. docker run --rm \ -v ASR_PATH:/opt/model.pt \ -v SHARED_AUDIO_FOLDER:/opt/audio \ ---env-file .env \ -linto-platform-stt:latest +--env-file whisper/.env \ +linto-platform-stt-whisper:latest ``` You may also want to mount your cache folder CACHE_PATH (e.g. "~/.cache") ```-v CACHE_PATH:/root/.cache``` @@ -267,7 +267,9 @@ This project is developped under the AGPLv3 License (see LICENSE). ## Acknowlegment. +* [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) * [OpenAI Whisper](https://github.com/openai/whisper) +* [Ctranslate2](https://github.com/OpenNMT/CTranslate2) * [SpeechBrain](https://github.com/speechbrain/speechbrain). * [TorchAudio](https://github.com/pytorch/audio) * [HuggingFace Transformers](https://github.com/huggingface/transformers) \ No newline at end of file From 4c8f1c9db10451ae015b3912e2b0c525c5ed4ac4 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Tue, 12 Dec 2023 17:29:50 +0100 Subject: [PATCH 166/172] restart release history for the two new images --- kaldi/RELEASE.md | 55 +++------------------------------------------- whisper/RELEASE.md | 17 ++------------ 2 files changed, 5 insertions(+), 67 deletions(-) diff --git a/kaldi/RELEASE.md b/kaldi/RELEASE.md index 9966250..4bd02f5 100644 --- a/kaldi/RELEASE.md +++ b/kaldi/RELEASE.md @@ -1,52 +1,3 @@ -# 3.3.2 -- Fixed use of stereo audio in http serving mode - -# 3.3.1 -- Fixed lin_to_vosk throwing an error on a already existing container. -- Corrected an error on the README regarding mounting model volumes. 
-- Code styling (PEP 8) - -# 3.3.0 -- Added optional streaming route to the http serving mode -- Added serving mode: websocket -- Added Dynamic model conversion allowing to use either Vosk Models or Linagora AM/LM models -- Changer Vosk dependency to alphacep/vosk -- Updated README.md - -# 3.2.1 -- Repository total rework. The goal being to have a simple transcription service embeddable within a micro-service infrastructure. -- Changed repository name from linto-platform-stt-standalone-worker to linto-platform-stt. -- Added celery connector for microservice integration. -- Added launch option to specify serving mode between task and http. -- Removed diarization functionnality. -- Removed punctuation functionnality. -- Removed Async requests/Job management. -- Updated README to reflect those changes. - -# 3.1.1 -- Change Pykaldi with vosk-API (no python wrapper for decoding function, no extrat packages during installation, c++ implementation based on kaldi functions) -- New feature: Compute a confidence score per transcription -- Fix minor bugs - -# 2.2.1 -- Fix minor bugs -- put SWAGGER_PATH parameter as optional -- Generate the word_boundary file if it does not exist - -# 2.2.0 -- Speaker diarization feature: pyBK package -- Mulithreading feature: Speech decoding and Speaker diarization processes -- Optional parameter: real number of speaker in the audio - -# 2.0.0 -- Reimplement LinTO-Platform-stt-standalone-worker using Pykaldi package - -# 1.1.2 -- New features: - - Word timestamp computing - - Response type: plain/text: simple text output and application/json: the transcription and the words timestamp. - - Swagger: integrate swagger in the service using a python package - - Fix minor bugs - -# 1.0.0 -- First build of LinTO-Platform-stt-standalone-worker \ No newline at end of file +# 1.0.0 +- First build of linto-platform-stt-kaldi +- Based on 3.3.2 of linto-platform-stt (https://github.com/linto-ai/linto-platform-stt/blob/4361300a4463c90cec0bf3fa2975d7cc2ddf8d36/RELEASE.md) diff --git a/whisper/RELEASE.md b/whisper/RELEASE.md index 2d57069..53f203c 100644 --- a/whisper/RELEASE.md +++ b/whisper/RELEASE.md @@ -1,16 +1,3 @@ # 1.0.0 -- Support of Whisper (including large-v3 model) -- Add integration of Whisper models from transformers -- Add support of prompt from Whisper models (env variable PROMPT) -- Fix possible failure when a Whisper segment starts with a punctuation -- Tune punctuation heuristics - -# 0.0.0 -- Added optional streaming route to the http serving mode -- Added serving mode: websocket -- Added Dynamic model conversion allowing to use either Vosk Models or Linagora AM/LM models -- Added celery connector for microservice integration. -- Added launch option to specify serving mode between task and http. -- Removed Async requests/Job management. 
-- New feature: Compute a confidence score per transcription -- put SWAGGER_PATH parameter as optional +- First build of linto-platform-stt-whisper +- Based on 4.0.5 of linto-platform-stt https://github.com/linto-ai/linto-platform-stt/blob/a54b7b7ac2bc491a1795bb6dfb318a39c8b76d63/RELEASE.md From 4bb3c1d5a32199b5b531640c23c2a84804730013 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 13 Dec 2023 15:06:25 +0100 Subject: [PATCH 167/172] rename linto-platform-stt -> linto-plaftorm --- .github/workflows/dockerhub-description.yml | 17 +++++++++++--- Jenkinsfile | 4 ++-- README.md | 6 ++--- document/swagger.yml | 2 +- http_server/swagger.py | 2 +- kaldi/README.md | 26 ++++++++++----------- kaldi/RELEASE.md | 4 ++-- whisper/README.md | 26 ++++++++++----------- whisper/RELEASE.md | 4 ++-- 9 files changed, 51 insertions(+), 40 deletions(-) diff --git a/.github/workflows/dockerhub-description.yml b/.github/workflows/dockerhub-description.yml index 0367b21..1301449 100644 --- a/.github/workflows/dockerhub-description.yml +++ b/.github/workflows/dockerhub-description.yml @@ -7,7 +7,7 @@ on: - README.md - .github/workflows/dockerhub-description.yml jobs: - dockerHubDescription: + dockerHubDescriptionKaldi: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 @@ -16,5 +16,16 @@ jobs: with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_PASSWORD }} - repository: lintoai/linto-platform-stt - readme-filepath: ./README.md + repository: lintoai/linto-stt-kaldi + readme-filepath: ./kaldi/README.md + dockerHubDescriptionWhisper: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - name: Docker Hub Description + uses: peter-evans/dockerhub-description@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + repository: lintoai/linto-stt-whisper + readme-filepath: ./whisper/README.md diff --git a/Jenkinsfile b/Jenkinsfile index 81d8ec8..99a4886 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,8 +1,8 @@ pipeline { agent any environment { - DOCKER_HUB_REPO_KALDI = "lintoai/linto-platform-stt-kaldi" - DOCKER_HUB_REPO_WHISPER = "lintoai/linto-platform-stt-whisper" + DOCKER_HUB_REPO_KALDI = "lintoai/linto-stt-kaldi" + DOCKER_HUB_REPO_WHISPER = "lintoai/linto-stt-whisper" DOCKER_HUB_CRED = 'docker-hub-credentials' } diff --git a/README.md b/README.md index ac846d3..09009fe 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,12 @@ -# LinTO-Platform-STT +# LinTO-STT -LinTO-Platform-STT is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack), +LinTO-STT is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack), which can currently work with Speech-To-Text (STT) models. The following families of STT models are currently supported (please refer to respective documentation for more details): * [Kaldi models](kaldi/README.md) * [Whisper models](whisper/README.md) -LinTO-Platform-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. +LinTO-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## License This project is developped under the AGPLv3 License (see LICENSE). 
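As a quick orientation on the rename above: once both images are published under their new names, either flavour can be pulled directly from Docker Hub instead of being built locally. A minimal sketch, assuming the `latest` tags pushed by the CI pipeline are available:

```bash
# Kaldi-based image (LinTO acoustic/language models or Vosk models)
docker pull lintoai/linto-stt-kaldi:latest

# Whisper-based image
docker pull lintoai/linto-stt-whisper:latest
```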
diff --git a/document/swagger.yml b/document/swagger.yml index 70bc9fc..6da4ed6 100644 --- a/document/swagger.yml +++ b/document/swagger.yml @@ -2,7 +2,7 @@ swagger: "2.0" info: version: "1.0.0" - title: LinTo-Platform-STT + title: LinTo-STT description: Speech To Text API contact: email: "support@linto.ai" diff --git a/http_server/swagger.py b/http_server/swagger.py index a9b93d0..31344cd 100644 --- a/http_server/swagger.py +++ b/http_server/swagger.py @@ -11,7 +11,7 @@ def setupSwaggerUI(app, args): args.swagger_prefix + args.swagger_url, args.swagger_path, config={ # Swagger UI config overrides - "app_name": "LinTO Platform STT", + "app_name": "LinTO STT", "spec": swagger_yml, }, ) diff --git a/kaldi/README.md b/kaldi/README.md index 9ab215f..444d3b3 100644 --- a/kaldi/README.md +++ b/kaldi/README.md @@ -1,9 +1,9 @@ -# LinTO-Platform-STT-Kaldi +# LinTO-STT-Kaldi -LinTO-Platform-STT-Kaldi is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) +LinTO-STT-Kaldi is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) based on Speech-To-Text (STT) models trained with [Kaldi](https://github.com/kaldi-asr/kaldi). -LinTO-Platform-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. +LinTO-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## Pre-requisites @@ -14,7 +14,7 @@ To run the transcription models you'll need: * One CPU per worker. Inference time scales on CPU performances. ### Model -LinTO-Platform-STT-Kaldi accepts two kinds of models: +LinTO-STT-Kaldi accepts two kinds of models: * LinTO Acoustic and Languages models. * Vosk models. @@ -28,19 +28,19 @@ The transcription service requires docker up and running. The STT only entry point in task mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder (SHARED_FOLDER). -## Deploy LinTO-Platform-STT-Kaldi +## Deploy LinTO-STT-Kaldi **1- First step is to build or pull the image:** ```bash -git clone https://github.com/linto-ai/linto-platform-stt.git -cd linto-platform-stt -docker build . -f kaldi/Dockerfile -t linto-platform-stt-kaldi:latest +git clone https://github.com/linto-ai/linto-stt.git +cd linto-stt +docker build . -f kaldi/Dockerfile -t linto-stt-kaldi:latest ``` or ```bash -docker pull lintoai/linto-platform-stt-kaldi +docker pull lintoai/linto-stt-kaldi ``` **2- Download the models** @@ -87,7 +87,7 @@ docker run --rm \ -v AM_PATH:/opt/AM \ -v LM_PATH:/opt/LM \ --env-file kaldi/.env \ -linto-platform-stt-kaldi:latest +linto-stt-kaldi:latest ``` This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. @@ -105,8 +105,8 @@ The HTTP serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```. ->LinTO-platform-stt can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. ->LinTO-platform-stt in task mode is not intended to be launch manually. 
+>LinTO-STT-Kaldi can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. +>LinTO-STT-Kaldi in task mode is not intended to be launch manually. >However, if you intent to connect it to your custom message's broker here are the parameters: You need a message broker up and running at MY_SERVICE_BROKER. @@ -117,7 +117,7 @@ docker run --rm \ -v LM_PATH:/opt/LM \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file kaldi/.env \ -linto-platform-stt-kaldi:latest +linto-stt-kaldi:latest ``` **Parameters:** diff --git a/kaldi/RELEASE.md b/kaldi/RELEASE.md index 4bd02f5..e11f89a 100644 --- a/kaldi/RELEASE.md +++ b/kaldi/RELEASE.md @@ -1,3 +1,3 @@ # 1.0.0 -- First build of linto-platform-stt-kaldi -- Based on 3.3.2 of linto-platform-stt (https://github.com/linto-ai/linto-platform-stt/blob/4361300a4463c90cec0bf3fa2975d7cc2ddf8d36/RELEASE.md) +- First build of linto-stt-kaldi +- Based on 3.3.2 of linto-stt (https://github.com/linto-ai/linto-stt/blob/4361300a4463c90cec0bf3fa2975d7cc2ddf8d36/RELEASE.md) diff --git a/whisper/README.md b/whisper/README.md index f460093..f1eccd1 100644 --- a/whisper/README.md +++ b/whisper/README.md @@ -1,9 +1,9 @@ -# LinTO-Platform-STT-Whisper +# LinTO-STT-Whisper -LinTO-Platform-STT-Whisper is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) +LinTO-STT-Whisper is the transcription service within the [LinTO stack](https://github.com/linto-ai/linto-platform-stack) based on Speech-To-Text (STT) [Whisper models](https://openai.com/research/whisper). -LinTO-Platform-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. +LinTO-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. ## Pre-requisites @@ -15,7 +15,7 @@ To run the transcription models you'll need: ### Model(s) -LinTO-Platform-STT-Whisper works with a Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. +LinTO-STT-Whisper works with a Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. #### Optional alignment model (deprecated) @@ -37,19 +37,19 @@ The transcription service requires docker up and running. The STT only entry point in task mode are tasks posted on a message broker. Supported message broker are RabbitMQ, Redis, Amazon SQS. On addition, as to prevent large audio from transiting through the message broker, STT-Worker use a shared storage folder (SHARED_FOLDER). -## Deploy LinTO-Platform-STT-Whisper +## Deploy LinTO-STT-Whisper **1- First step is to build or pull the image:** ```bash -git clone https://github.com/linto-ai/linto-platform-stt.git -cd linto-platform-stt -docker build . -f whisper/Dockerfile.ctranslate2 -t linto-platform-stt-whisper:latest +git clone https://github.com/linto-ai/linto-stt.git +cd linto-stt +docker build . 
-f whisper/Dockerfile.ctranslate2 -t linto-stt-whisper:latest ``` or ```bash -docker pull lintoai/linto-platform-stt-whisper +docker pull lintoai/linto-stt-whisper ``` **2- Download the models** @@ -135,7 +135,7 @@ docker run --rm \ -p HOST_SERVING_PORT:80 \ -v ASR_PATH:/opt/model.pt \ --env-file whisper/.env \ -linto-platform-stt-whisper:latest +linto-stt-whisper:latest ``` This will run a container providing an [HTTP API](#http-api) binded on the host HOST_SERVING_PORT port. @@ -159,8 +159,8 @@ The HTTP serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```. ->LinTO-platform-stt can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. ->LinTO-platform-stt in task mode is not intended to be launch manually. +>LinTO-STT-Whisper can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. +>LinTO-STT-Whisper in task mode is not intended to be launch manually. >However, if you intent to connect it to your custom message's broker here are the parameters: You need a message broker up and running at MY_SERVICE_BROKER. @@ -170,7 +170,7 @@ docker run --rm \ -v ASR_PATH:/opt/model.pt \ -v SHARED_AUDIO_FOLDER:/opt/audio \ --env-file whisper/.env \ -linto-platform-stt-whisper:latest +linto-stt-whisper:latest ``` You may also want to mount your cache folder CACHE_PATH (e.g. "~/.cache") ```-v CACHE_PATH:/root/.cache``` diff --git a/whisper/RELEASE.md b/whisper/RELEASE.md index 53f203c..2967139 100644 --- a/whisper/RELEASE.md +++ b/whisper/RELEASE.md @@ -1,3 +1,3 @@ # 1.0.0 -- First build of linto-platform-stt-whisper -- Based on 4.0.5 of linto-platform-stt https://github.com/linto-ai/linto-platform-stt/blob/a54b7b7ac2bc491a1795bb6dfb318a39c8b76d63/RELEASE.md +- First build of linto-stt-whisper +- Based on 4.0.5 of linto-stt https://github.com/linto-ai/linto-stt/blob/a54b7b7ac2bc491a1795bb6dfb318a39c8b76d63/RELEASE.md From ff42bd4ad09c740a13f8faa8d6d6ab1665082aa2 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 13 Dec 2023 15:37:30 +0100 Subject: [PATCH 168/172] remove wrong part --- kaldi/README.md | 4 ---- whisper/README.md | 4 ---- 2 files changed, 8 deletions(-) diff --git a/kaldi/README.md b/kaldi/README.md index 444d3b3..3bfa6c8 100644 --- a/kaldi/README.md +++ b/kaldi/README.md @@ -105,10 +105,6 @@ The HTTP serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```. ->LinTO-STT-Kaldi can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. ->LinTO-STT-Kaldi in task mode is not intended to be launch manually. ->However, if you intent to connect it to your custom message's broker here are the parameters: - You need a message broker up and running at MY_SERVICE_BROKER. ```bash diff --git a/whisper/README.md b/whisper/README.md index f1eccd1..015eb42 100644 --- a/whisper/README.md +++ b/whisper/README.md @@ -159,10 +159,6 @@ The HTTP serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```. 
->LinTO-STT-Whisper can be deployed within the linto-platform-stack through the use of linto-platform-services-manager. Used this way, the container spawn celery worker waiting for transcription task on a message broker. ->LinTO-STT-Whisper in task mode is not intended to be launch manually. ->However, if you intent to connect it to your custom message's broker here are the parameters: - You need a message broker up and running at MY_SERVICE_BROKER. ```bash From 690b7ee813c78b7ea82c9e3c9a8a643ef5ecc2a3 Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 13 Dec 2023 16:07:01 +0100 Subject: [PATCH 169/172] update the README with most common usage --- whisper/README.md | 67 +++++++++++++++++++++++++++-------------------- 1 file changed, 39 insertions(+), 28 deletions(-) diff --git a/whisper/README.md b/whisper/README.md index 015eb42..d22fd72 100644 --- a/whisper/README.md +++ b/whisper/README.md @@ -15,11 +15,13 @@ To run the transcription models you'll need: ### Model(s) -LinTO-STT-Whisper works with a Whisper model to perform Automatic Speech Recognition, which must be in the PyTorch format. +LinTO-STT-Whisper works with a Whisper model to perform Automatic Speech Recognition. +If not downloaded already, the model will be downloaded when calling the first transcription, +and can occupy several GB of disk space. #### Optional alignment model (deprecated) -It can also work with a wav2vec model to perform word alignment. +LinTO-STT-Whisper has also the option to work with a wav2vec model to perform word alignment. The wav2vec model can be specified either * (TorchAudio) with a string corresponding to a `torchaudio` pipeline (e.g. "WAV2VEC2_ASR_BASE_960H") or * (HuggingFace's Transformers) with a string corresponding to a HuggingFace repository of a wav2vec model (e.g. "jonatasgrosman/wav2vec2-large-xlsr-53-english"), or @@ -39,7 +41,7 @@ On addition, as to prevent large audio from transiting through the message broke ## Deploy LinTO-STT-Whisper -**1- First step is to build or pull the image:** +### 1- First step is to build or pull the image ```bash git clone https://github.com/linto-ai/linto-stt.git @@ -52,29 +54,7 @@ or docker pull lintoai/linto-stt-whisper ``` -**2- Download the models** - -Have the Whisper model file ready at ASR_PATH. - -If you already used Whisper in the past, you may have models in ~/.cache/whisper. 
- -You can download mutli-lingual Whisper models with the following links: -* tiny: "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt -* base: https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -* small: https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt -* medium: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt -* large-v1: https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt -* large-v2: https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt - -Whisper models specialized for English can also be found here: -* tiny.en: "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt -* base.en: https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt -* small.en: https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt -* medium.en: https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt - -If may also want to download a specific wav2vec model for word alignment. - -**3- Fill the .env** +### 2- Fill the .env ```bash cp whisper/.envdefault whisper/.env @@ -83,7 +63,7 @@ cp whisper/.envdefault whisper/.env | PARAMETER | DESCRIPTION | EXEMPLE | |---|---|---| | SERVICE_MODE | STT serving mode see [Serving mode](#serving-mode) | `http` \| `task` | -| MODEL | Path to the Whisper model, or type of Whisper model used. | \ \| `medium` \| `large-v1` \| ... | +| MODEL | Path to a Whisper model, type of Whisper model used, or HuggingFace identifier of a Whisper model. | \ \| `large-v3` \| `distil-whisper/distil-large-v2` \| ... | | LANGUAGE | (Optional) Language to recognize | `*` \| `fr` \| `fr-FR` \| `French` \| `en` \| `en-US` \| `English` \| ... | | PROMPT | (Optional) Prompt to use for the Whisper model | `some free text to encourage a certain transcription style (disfluencies, no punctuation, ...)` | | ALIGNMENT_MODEL | (Optional) Path to the wav2vec model for word alignment, or name of HuggingFace repository or torchaudio pipeline | \ \| `WAV2VEC2_ASR_BASE_960H` \| `jonatasgrosman/wav2vec2-large-xlsr-53-english` \| ... | @@ -92,6 +72,36 @@ cp whisper/.envdefault whisper/.env | SERVICE_BROKER | (For the task mode) URL of the message broker | `redis://my-broker:6379` | | BROKER_PASS | (For the task mode only) broker password | `my-password` | +#### MODEL environment variable + +**Warning:** +The model will be (downloaded if required and) loaded in memory when calling the first transcription. +When using a Whisper model from Hugging Face (transformers) along with ctranslate2 (faster_whisper), +it will also download torch library to make the conversion from torch to ctranslate2. 
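One way to avoid pulling torch inside the container at run time is to convert the Hugging Face model to the CTranslate2 format beforehand and point `MODEL` at the resulting folder. The sketch below is only illustrative and is not part of this repository's tooling; the `ct2-transformers-converter` command and its flags come from the CTranslate2 / faster-whisper ecosystem and should be checked against their documentation:

```bash
# Illustrative offline conversion (assumes CTranslate2's converter is installed locally)
pip install ctranslate2 "transformers[torch]"

ct2-transformers-converter \
    --model distil-whisper/distil-large-v2 \
    --output_dir ./distil-large-v2-ct2 \
    --quantization float16

# The output folder could then be passed as MODEL (e.g. mounted into the container),
# assuming CTranslate2-format folders are accepted by the service.
```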
+ +If you want to preload the model (and later specify a path `ASR_PATH` as `MODEL`), +you may want to download one of OpenAI Whisper models: +* Mutli-lingual Whisper models can be downloaded with the following links: + * [tiny](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt) + * [base](https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt) + * [small](https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt) + * [medium](https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt) + * [large-v1](https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt) + * [large-v2](https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt) + * [large-v3](https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt) +* Whisper models specialized for English can also be found here: + * [tiny.en](https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt) + * [base.en](https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt) + * [small.en](https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt) + * [medium.en](https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt) + +If you already used Whisper in the past locally using [OpenAI-Whipser](https://github.com/openai/whisper), models can be found under ~/.cache/whisper. + +The same apply for Whisper models from Hugging Face (transformers), as for instance https://huggingface.co/distil-whisper/distil-large-v2 +(you can either download the model or use the Hugging Face identifier `distil-whisper/distil-large-v2`). + +#### LANGUAGE + If `*` is used for the `LANGUAGE` environment variable, or if `LANGUAGE` is not defined, automatic language detection will be performed by Whisper. @@ -113,6 +123,7 @@ sv(swedish), sw(swahili), ta(tamil), te(telugu), tg(tajik), th(thai), tk(turkmen tr(turkish), tt(tatar), uk(ukrainian), ur(urdu), uz(uzbek), vi(vietnamese), yi(yiddish), yo(yoruba), zh(chinese) ``` +and also `yue(cantonese)` since large-v3. ### Serving mode ![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png) @@ -266,6 +277,6 @@ This project is developped under the AGPLv3 License (see LICENSE). * [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) * [OpenAI Whisper](https://github.com/openai/whisper) * [Ctranslate2](https://github.com/OpenNMT/CTranslate2) -* [SpeechBrain](https://github.com/speechbrain/speechbrain). 
+* [SpeechBrain](https://github.com/speechbrain/speechbrain) * [TorchAudio](https://github.com/pytorch/audio) * [HuggingFace Transformers](https://github.com/huggingface/transformers) \ No newline at end of file From cfaaaf0a8835cd1a2b3e5cbc08ac37e253ba9048 Mon Sep 17 00:00:00 2001 From: Houpert Date: Wed, 13 Dec 2023 16:09:19 +0100 Subject: [PATCH 170/172] Update Jenkinsfile with new structure --- Jenkinsfile | 123 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 84 insertions(+), 39 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 99a4886..2548775 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,73 +1,118 @@ +def buildWhisper(image_name, version) { + echo "Building Dockerfile for ${image_name}... with version ${version}" + + script { + def image = docker.build(image_name, "-f whisper/Dockerfile.ctranslate2 .") + + docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { + if (version == 'latest-unstable') { + image.push('latest-unstable') + } else { + image.push('latest') + image.push(version) + } + } + } +} + +def buildKaldi(image_name, version) { + echo "Building Dockerfile for ${image_name}... with version ${version}" + + script { + def image = docker.build(image_name, "-f kaldi/Dockerfile .") + + docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { + if (version == 'latest-unstable') { + image.push('latest-unstable') + } else { + image.push('latest') + image.push(version) + } + } + } +} + pipeline { agent any environment { DOCKER_HUB_REPO_KALDI = "lintoai/linto-stt-kaldi" DOCKER_HUB_REPO_WHISPER = "lintoai/linto-stt-whisper" - DOCKER_HUB_CRED = 'docker-hub-credentials' + + VERSION_KALDI = '' + VERSION_WHISPER = '' } - stages{ - stage('Docker build for master branch'){ - when{ + stages { + stage('Docker build for master branch') { + when { branch 'master' } steps { echo 'Publishing latest' script { - image = docker.build(env.DOCKER_HUB_REPO_KALDI, "-f kaldi/Dockerfile .") - VERSION = sh( + def changedFiles = sh(returnStdout: true, script: 'git diff --name-only HEAD^ HEAD').trim() + echo "My changed files: ${changedFiles}" + + VERSION_KALDI = sh( returnStdout: true, script: "awk -v RS='' '/#/ {print; exit}' kaldi/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push("${VERSION}") - image.push('latest') - } - } - script { - image = docker.build(env.DOCKER_HUB_REPO_WHISPER, "-f whisper/Dockerfile.ctranslate2 .") - VERSION = sh( + VERSION_WHISPER = sh( returnStdout: true, script: "awk -v RS='' '/#/ {print; exit}' whisper/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() + + if (changedFiles.contains('celery_app') || changedFiles.contains('http_server') || changedFiles.contains('websocket') || changedFiles.contains('document')) { + echo "Build kaldi version ${VERSION_KALDI}" + buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION_KALDI) - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push("${VERSION}") - image.push('latest') + echo "Build whisper version ${VERSION_WHISPER}" + buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION_WHISPER) + }else { + if (changedFiles.contains('kaldi')) { + echo "Build kaldi version ${VERSION_KALDI}" + buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION_KALDI) + } + if (changedFiles.contains('whisper')) { + echo "Build whisper version ${VERSION_WHISPER}" + buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION_WHISPER) + } } } } } - stage('Docker build 
for next (unstable) branch'){ - when{ + stage('Docker build for next (unstable) branch') { + when { branch 'next' } steps { echo 'Publishing unstable' script { - image = docker.build(env.DOCKER_HUB_REPO_KALDI, "-f kaldi/Dockerfile .") - VERSION = sh( - returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' kaldi/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" - ).trim() - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push('latest-unstable') - } - } - script { - image = docker.build(env.DOCKER_HUB_REPO_WHISPER, "-f whisper/Dockerfile.ctranslate2 .") - VERSION = sh( - returnStdout: true, - script: "awk -v RS='' '/#/ {print; exit}' whisper/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" - ).trim() - docker.withRegistry('https://registry.hub.docker.com', env.DOCKER_HUB_CRED) { - image.push('latest-unstable') + def changedFiles = sh(returnStdout: true, script: 'git diff --name-only HEAD^ HEAD').trim() + echo "My changed files: ${changedFiles}" + + VERSION = 'latest-unstable' + + if (changedFiles.contains('celery_app') || changedFiles.contains('http_server') || changedFiles.contains('websocket') || changedFiles.contains('document')) { + echo 'Files in studio-api path are modified. Running specific build steps for studio-api...' + echo "Build whisper and kaldi version ${VERSION}" + + buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION) + buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION) + }else { + if (changedFiles.contains('kaldi')) { + echo "Build kaldi version ${VERSION}" + buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION) + } + if (changedFiles.contains('whisper')) { + echo "Build whisper version ${VERSION}" + buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION) + } } } } } - - }// end stages -} \ No newline at end of file + } +} From 9035609fca366d5e0ad3b82f1284b68131dfc07d Mon Sep 17 00:00:00 2001 From: Jeronymous Date: Wed, 13 Dec 2023 17:00:29 +0100 Subject: [PATCH 171/172] code factoring --- Jenkinsfile | 88 +++++++++++++---------------------------------------- 1 file changed, 21 insertions(+), 67 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 2548775..cd1ad07 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -1,32 +1,17 @@ -def buildWhisper(image_name, version) { - echo "Building Dockerfile for ${image_name}... with version ${version}" +def buildDockerfile(main_folder, dockerfilePath, image_name, version, changedFiles) { + if (changedFiles.contains(main_folder) || changedFiles.contains('celery_app') || changedFiles.contains('http_server') || changedFiles.contains('websocket') || changedFiles.contains('document')) { + echo "Building Dockerfile for ${image_name} with version ${version} (using ${dockerfilePath})" - script { - def image = docker.build(image_name, "-f whisper/Dockerfile.ctranslate2 .") + script { + def image = docker.build(image_name, "-f ${dockerfilePath} .") - docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { - if (version == 'latest-unstable') { - image.push('latest-unstable') - } else { - image.push('latest') - image.push(version) - } - } - } -} - -def buildKaldi(image_name, version) { - echo "Building Dockerfile for ${image_name}... 
with version ${version}" - - script { - def image = docker.build(image_name, "-f kaldi/Dockerfile .") - - docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { - if (version == 'latest-unstable') { - image.push('latest-unstable') - } else { - image.push('latest') - image.push(version) + docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { + if (version == 'latest-unstable') { + image.push('latest-unstable') + } else { + image.push('latest') + image.push(version) + } } } } @@ -37,11 +22,8 @@ pipeline { environment { DOCKER_HUB_REPO_KALDI = "lintoai/linto-stt-kaldi" DOCKER_HUB_REPO_WHISPER = "lintoai/linto-stt-whisper" - - VERSION_KALDI = '' - VERSION_WHISPER = '' } - + stages { stage('Docker build for master branch') { when { @@ -53,32 +35,18 @@ pipeline { def changedFiles = sh(returnStdout: true, script: 'git diff --name-only HEAD^ HEAD').trim() echo "My changed files: ${changedFiles}" - VERSION_KALDI = sh( + version_kaldi = sh( returnStdout: true, script: "awk -v RS='' '/#/ {print; exit}' kaldi/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() - VERSION_WHISPER = sh( + version_whisper = sh( returnStdout: true, script: "awk -v RS='' '/#/ {print; exit}' whisper/RELEASE.md | head -1 | sed 's/#//' | sed 's/ //'" ).trim() - - if (changedFiles.contains('celery_app') || changedFiles.contains('http_server') || changedFiles.contains('websocket') || changedFiles.contains('document')) { - echo "Build kaldi version ${VERSION_KALDI}" - buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION_KALDI) - echo "Build whisper version ${VERSION_WHISPER}" - buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION_WHISPER) - }else { - if (changedFiles.contains('kaldi')) { - echo "Build kaldi version ${VERSION_KALDI}" - buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION_KALDI) - } - if (changedFiles.contains('whisper')) { - echo "Build whisper version ${VERSION_WHISPER}" - buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION_WHISPER) - } - } + buildDockerfile('kaldi', 'kaldi/Dockerfile', env.DOCKER_HUB_REPO_KALDI, version_kaldi, changedFiles) + buildDockerfile('whisper', 'whisper/Dockerfile.ctranslate2', env.DOCKER_HUB_REPO_WHISPER, version_whisper, changedFiles) } } } @@ -93,26 +61,12 @@ pipeline { def changedFiles = sh(returnStdout: true, script: 'git diff --name-only HEAD^ HEAD').trim() echo "My changed files: ${changedFiles}" - VERSION = 'latest-unstable' - - if (changedFiles.contains('celery_app') || changedFiles.contains('http_server') || changedFiles.contains('websocket') || changedFiles.contains('document')) { - echo 'Files in studio-api path are modified. Running specific build steps for studio-api...' 
- echo "Build whisper and kaldi version ${VERSION}" + version = 'latest-unstable' - buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION) - buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION) - }else { - if (changedFiles.contains('kaldi')) { - echo "Build kaldi version ${VERSION}" - buildKaldi(env.DOCKER_HUB_REPO_KALDI, VERSION) - } - if (changedFiles.contains('whisper')) { - echo "Build whisper version ${VERSION}" - buildWhisper(env.DOCKER_HUB_REPO_WHISPER, VERSION) - } - } + buildDockerfile('kaldi', 'kaldi/Dockerfile', env.DOCKER_HUB_REPO_KALDI, version, changedFiles) + buildDockerfile('whisper', 'whisper/Dockerfile.ctranslate2', env.DOCKER_HUB_REPO_WHISPER, version, changedFiles) } } } } -} +} \ No newline at end of file From 3b29d7163c0ad25856e5589932f1f43605cceefc Mon Sep 17 00:00:00 2001 From: gaydmi Date: Mon, 18 Dec 2023 10:21:17 +0100 Subject: [PATCH 172/172] Fixed small typos in README file --- kaldi/README.md | 2 +- whisper/README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kaldi/README.md b/kaldi/README.md index 3bfa6c8..0e3a31a 100644 --- a/kaldi/README.md +++ b/kaldi/README.md @@ -101,7 +101,7 @@ This will run a container providing an [HTTP API](#http-api) binded on the host | MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model | ### Micro-service within LinTO-Platform stack -The HTTP serving mode connect a celery worker to a message broker. +The TASK serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```. diff --git a/whisper/README.md b/whisper/README.md index d22fd72..20a3c7d 100644 --- a/whisper/README.md +++ b/whisper/README.md @@ -166,7 +166,7 @@ you can add option ```-v WAV2VEC_PATH:/opt/wav2vec``` and environment variable ` | WAV2VEC_PATH | (Optional) Path to a folder to a custom wav2vec alignment model | /my/path/to/models/wav2vec | ### Micro-service within LinTO-Platform stack -The HTTP serving mode connect a celery worker to a message broker. +The TASK serving mode connect a celery worker to a message broker. The SERVICE_MODE value in the .env should be set to ```task```.
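
For reference, a hedged sketch of what a `whisper/.env` could contain for this task serving mode, using only variables documented above (all values are placeholders to adapt):

```bash
# Hypothetical whisper/.env for task mode; adapt broker URL, password and model choice
SERVICE_MODE=task
MODEL=large-v3
LANGUAGE=fr
SERVICE_BROKER=redis://my-broker:6379
BROKER_PASS=my-password
```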