√(noham)² 2024-07-18 00:42:59 +02:00
parent 5c9313c4ca
commit 3cf13b815c
180 changed files with 34499 additions and 2 deletions

.gitignore

@@ -127,7 +127,16 @@ dmypy.json
# node
node_modules/
*.tar
/test
# /yolov7-setup
/yolov7-tracker-example
# /yolov7-tracker-example
/yolov7-tracker-example/cfg/training/yolov7x_dataset1_2024_06_19.yaml
/yolov7-tracker-example/data/dataset1_2024_06_19
/yolov7-tracker-example/runs
/yolov7-tracker-example/tracker/config_files/dataset1_2024_06_19.yaml
/yolov7-tracker-example/wandb
/yolov7-tracker-example/info_SF.txt
/yolov7-tracker-example/400m.mp4


@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.


@@ -0,0 +1,194 @@
# YOLO detector and SOTA Multi-object tracker Toolbox
## ❗❗Important Notes
Compared to the previous version, this is an ***entirely new version (branch v2)***!!!
**Please use this version directly, as I have almost rewritten all the code to ensure better readability and improved results, as well as to correct some errors in the past code.**
```bash
git clone https://github.com/JackWoo0831/Yolov7-tracker.git
git checkout v2 # change to v2 branch !!
```
🙌 ***If you have any suggestions for adding trackers***, please leave a comment in the Issues section with the paper title or link! Everyone is welcome to contribute to making this repo better.
<div align="center">
**Language**: English | [简体中文](README_CN.md)
</div>
## ❤️ Introduction
This repo is a toolbox that implements **multi-object trackers following the tracking-by-detection paradigm**. The detector supports:
- YOLOX
- YOLO v7
- YOLO v8,
and the tracker supports:
- SORT
- DeepSORT
- ByteTrack ([ECCV2022](https://arxiv.org/pdf/2110.06864))
- Bot-SORT ([arxiv2206](https://arxiv.org/pdf/2206.14651.pdf))
- OCSORT ([CVPR2023](https://openaccess.thecvf.com/content/CVPR2023/papers/Cao_Observation-Centric_SORT_Rethinking_SORT_for_Robust_Multi-Object_Tracking_CVPR_2023_paper.pdf))
- C_BIoU Track ([arxiv2211](https://arxiv.org/pdf/2211.14317v2.pdf))
- Strong SORT ([IEEE TMM 2023](https://arxiv.org/pdf/2202.13514))
- Sparse Track ([arxiv 2306](https://arxiv.org/pdf/2306.05238))
and the ReID model supports:
- OSNet
- Extractor from DeepSORT
The highlights are:
- Supports more trackers than MMTracking
- Rewrites multiple trackers in a ***unified code style***, without the need to configure a separate environment for each tracker
- Modular design, which ***decouples*** the detector, tracker, ReID model and Kalman filter so that experiments are easy to conduct (see the sketch below)
![gif](figure/demo.gif)
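To make the decoupling concrete, below is a minimal, hypothetical sketch of a tracking-by-detection loop; the `Detection`, `Detector`, `Tracker` and `run` names are illustrative placeholders and do not correspond to this repo's actual classes.
```python
# Hypothetical sketch of a decoupled tracking-by-detection loop (not the repo's real API).
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels

@dataclass
class Detection:
    box: Box
    score: float
    cls: int

class Detector:
    """Any YOLO variant only has to return per-frame detections."""
    def __call__(self, frame) -> List[Detection]:
        raise NotImplementedError

class Tracker:
    """Any tracker (SORT, ByteTrack, ...) only has to implement an update step."""
    def update(self, detections: List[Detection]) -> List[dict]:
        raise NotImplementedError

def run(detector: Detector, tracker: Tracker, frames: Iterable) -> List[List[dict]]:
    """Detection and association stay separate, so either side can be swapped."""
    results = []
    for frame in frames:
        detections = detector(frame)         # detection stage
        tracks = tracker.update(detections)  # association / Kalman stage
        results.append(tracks)
    return results
```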
## 🗺️ Roadmap
- [x] Add StrongSort and SparseTrack
- [x] Add save video function
- [x] Add timer function to calculate fps
- [ ] Add more ReID modules.
## 🔨 Installation
The basic environment is:
- Ubuntu 18.04
- Python 3.9, PyTorch 1.12
Run the following command to install the other packages:
```bash
pip3 install -r requirements.txt
```
### 🔍 Detector installation
1. YOLOX:
The version of YOLOX is **0.1.0 (same as ByteTrack)**. To install it, you can clone the ByteTrack repo somewhere, and run:
``` bash
git clone https://github.com/ifzhang/ByteTrack.git
cd ByteTrack
python3 setup.py develop
```
2. YOLO v7:
There is no need to execute additional steps, as this repo itself is based on YOLO v7.
3. YOLO v8:
Please run:
```bash
pip3 install ultralytics==8.0.94
```
### 📑 Data preparation
***If you do not want to test on a specific dataset and only want to run demos, please skip this section.***
***No matter which dataset you want to test on, please organize it in the following way (YOLO style):***
```
dataset_name
  |---images
        |---train
              |---sequence_name1
                    |---000001.jpg
                    |---000002.jpg ...
        |---val ...
        |---test ...
```
You can refer to the codes in `./tools` to see how to organize the datasets.
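As a rough, repo-independent illustration of the layout above, the sketch below just walks the tree and counts frames per sequence; the `.jpg` extension and the `images/<split>/<sequence>` nesting are taken from the example tree, not from the repo's loaders.
```python
# Illustrative only: enumerate sequences and frames in the YOLO-style layout shown above.
from pathlib import Path

def list_sequences(dataset_root: str, split: str = "train") -> dict:
    split_dir = Path(dataset_root) / "images" / split
    if not split_dir.is_dir():
        return {}
    sequences = {}
    for seq_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        sequences[seq_dir.name] = sorted(seq_dir.glob("*.jpg"))
    return sequences

if __name__ == "__main__":
    for name, frames in list_sequences("dataset_name").items():
        print(f"{name}: {len(frames)} frames")
```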
***Then, you need to prepare a `yaml` file to indicate the path so that the code can find the images.***
Some examples are in `tracker/config_files`. The important keys are:
```
DATASET_ROOT: '/data/xxxx/datasets/MOT17' # your dataset root
SPLIT: test # train, test or val
CATEGORY_NAMES: # same in YOLO training
- 'pedestrian'
CATEGORY_DICT:
0: 'pedestrian'
```
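The repo's scripts consume this file themselves; purely as an illustration of the keys above, such a config can be read with PyYAML (an assumption about the tooling, and the file name below is hypothetical):
```python
# Illustrative only: read the dataset yaml and resolve the image directory for the split.
from pathlib import Path
import yaml  # PyYAML

with open("tracker/config_files/your_dataset.yaml") as f:  # hypothetical file name
    cfg = yaml.safe_load(f)

images_dir = Path(cfg["DATASET_ROOT"]) / "images" / cfg["SPLIT"]
print("categories:", cfg["CATEGORY_NAMES"])
print("images expected under:", images_dir)
```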
## 🚗 Practice
### 🏃 Training
Trackers generally do not have parameters that need to be trained. Please refer to the training methods of the different detectors to train the YOLO models.
Some references may help you:
- YOLOX: `tracker/yolox_utils/train_yolox.py`
- YOLO v7:
```shell
python train_aux.py --dataset visdrone --workers 8 --device <$GPU_id$> --batch-size 16 --data data/visdrone_all.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights <$YOLO v7 pretrained model path$> --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml
```
- YOLO v8: `tracker/yolov8_utils/train_yolov8.py` (see the sketch below)
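The YOLO v8 path presumably relies on the ultralytics package installed earlier; as a hedged illustration only, a minimal training call looks roughly like the following. The weight file, data yaml and hyperparameters are placeholders, so prefer the repo's own script for real training.
```python
# Rough sketch of a YOLOv8 training call via ultralytics (placeholders, not the repo's script).
from ultralytics import YOLO

model = YOLO("yolov8l.pt")  # pretrained weights; downloaded automatically if missing
model.train(data="data/your_dataset.yaml", imgsz=1280, epochs=60, batch=16)
```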
### 😊 Tracking!
If you only want to run a demo:
```bash
python tracker/track_demo.py --obj ${video path or images folder path} --detector ${yolox, yolov8 or yolov7} --tracker ${tracker name} --kalman_format ${kalman format, sort, byte, ...} --detector_model_path ${detector weight path} --save_images
```
For example:
```bash
python tracker/track_demo.py --obj M0203.mp4 --detector yolov8 --tracker deepsort --kalman_format byte --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt --save_images
```
If you want to run trackers on a dataset:
```bash
python tracker/track.py --dataset ${dataset name, related with the yaml file} --detector ${yolox, yolov8 or yolov7} --tracker ${tracker name} --kalman_format ${kalman format, sort, byte, ...} --detector_model_path ${detector weight path}
```
For example:
- SORT: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker sort --kalman_format sort --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt `
- DeepSORT: `python tracker/track.py --dataset uavdt --detector yolov7 --tracker deepsort --kalman_format byte --detector_model_path weights/yolov7_UAVDT_35epochs_20230507.pt`
- ByteTrack: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker bytetrack --kalman_format byte --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- OCSort: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker ocsort --kalman_format ocsort --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- C-BIoU Track: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker c_bioutrack --kalman_format bot --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- BoT-SORT: `python tracker/track.py --dataset uavdt --detector yolox --tracker botsort --kalman_format bot --detector_model_path weights/yolox_m_uavdt_50epochs.pth.tar`
- Strong SORT: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker strongsort --kalman_format strongsort --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- Sparse Track: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker sparsetrack --kalman_format bot --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
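If you want to run several of the example configurations above back to back, a small convenience sketch (not part of the repo) can shell out to `tracker/track.py` with the same flags; the tracker/kalman_format pairs and the weight path simply mirror the examples above.
```python
# Convenience sketch: run several trackers on the same dataset via tracker/track.py.
import subprocess

DETECTOR = "yolov8"
WEIGHTS = "weights/yolov8l_UAVDT_60epochs_20230509.pt"  # weight path from the examples above
RUNS = [
    ("sort", "sort"),
    ("bytetrack", "byte"),
    ("ocsort", "ocsort"),
]

for tracker, kalman_format in RUNS:
    cmd = [
        "python", "tracker/track.py",
        "--dataset", "uavdt",
        "--detector", DETECTOR,
        "--tracker", tracker,
        "--kalman_format", kalman_format,
        "--detector_model_path", WEIGHTS,
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop on the first failing run
```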
### ✅ Evaluation
Coming Soon. As an alternative, after obtaining the result txt file, you can use the [Easier to use TrackEval repo](https://github.com/JackWoo0831/Easier_To_Use_TrackEval).
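If you want a quick sanity check before full evaluation, result txt files in this family of trackers are commonly written in the MOTChallenge style (`frame,id,x,y,w,h,score,...` per line). That column layout is an assumption about this repo's output, so verify it against your own files; the sketch below just counts active tracks per frame.
```python
# Hedged sketch: count active track IDs per frame in a MOTChallenge-style result txt.
# The column order (frame, id, x, y, w, h, score, ...) is assumed, not guaranteed.
from collections import defaultdict

def tracks_per_frame(result_txt: str) -> dict:
    per_frame = defaultdict(set)
    with open(result_txt) as f:
        for line in f:
            fields = line.strip().split(",")
            if len(fields) < 6:
                continue  # skip malformed lines
            frame_id, track_id = int(float(fields[0])), int(float(fields[1]))
            per_frame[frame_id].add(track_id)
    return {frame: len(ids) for frame, ids in sorted(per_frame.items())}

if __name__ == "__main__":
    print(tracks_per_frame("result.txt"))  # hypothetical output file name
```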


@@ -0,0 +1,186 @@
# YOLO Detector and SOTA Multi-object Tracker Toolbox
## ❗❗ Important Notes
Compared to the previous version, this is an ***entirely new version (branch v2)***!
**Please use this version directly, as I have rewritten almost all the code to ensure better readability and improved results, and to correct some errors in the past code.**
```bash
git clone https://github.com/JackWoo0831/Yolov7-tracker.git
git checkout v2 # change to v2 branch !!
```
🙌 ***If you have any suggestions for adding trackers***, please leave a comment in the Issues section with the paper title or link! Everyone is welcome to help make this repo better.
## ❤️ Introduction
This repo is a toolbox that implements multi-object trackers following the ***tracking-by-detection paradigm***. The detector supports:
- YOLOX
- YOLO v7
- YOLO v8,
and the tracker supports:
- SORT
- DeepSORT
- ByteTrack ([ECCV2022](https://arxiv.org/pdf/2110.06864))
- Bot-SORT ([arxiv2206](https://arxiv.org/pdf/2206.14651.pdf))
- OCSORT ([CVPR2023](https://openaccess.thecvf.com/content/CVPR2023/papers/Cao_Observation-Centric_SORT_Rethinking_SORT_for_Robust_Multi-Object_Tracking_CVPR_2023_paper.pdf))
- C_BIoU Track ([arxiv2211](https://arxiv.org/pdf/2211.14317v2.pdf))
- Strong SORT ([IEEE TMM 2023](https://arxiv.org/pdf/2202.13514))
- Sparse Track ([arxiv 2306](https://arxiv.org/pdf/2306.05238))
and the ReID model supports:
- OSNet
- Extractor from DeepSORT
The highlights are:
- Supports more trackers than MMTracking
- Rewrites multiple trackers in a ***unified code style***, without the need to configure a separate environment for each tracker
- Modular design, which **decouples** the detector, tracker, appearance (ReID) module and Kalman filter so that experiments are easy to conduct
![gif](figure/demo.gif)
## 🗺️ Roadmap
- [x] Add StrongSort and SparseTrack
- [x] Add save video function
- [x] Add timer function to calculate fps
- [ ] Add more ReID modules.
## 🔨 Installation
The basic environment is:
- Ubuntu 18.04
- Python 3.9, PyTorch 1.12
Run the following command to install the other packages:
```bash
pip3 install -r requirements.txt
```
### 🔍 Detector installation
1. YOLOX:
The version of YOLOX is 0.1.0 (the same as ByteTrack). To install it, you can clone the ByteTrack repo somewhere and run:
``` bash
git clone https://github.com/ifzhang/ByteTrack.git
cd ByteTrack
python3 setup.py develop
```
2. YOLO v7:
Since this repo itself is based on YOLO v7, there is no need to execute additional steps.
3. YOLO v8:
Please run:
```bash
pip3 install ultralytics==8.0.94
```
### 📑 Data preparation
***If you do not want to test on a specific dataset and only want to run demos, please skip this section.***
***No matter which dataset you want to test on, please organize it in the following way (YOLO style):***
```
dataset_name
  |---images
        |---train
              |---sequence_name1
                    |---000001.jpg
                    |---000002.jpg ...
        |---val ...
        |---test ...
```
You can refer to the code in `./tools` to see how to organize the datasets.
***Then, you need to prepare a `yaml` file to indicate the paths so that the code can find the images.***
Some examples are in `tracker/config_files`. The important keys are:
```
DATASET_ROOT: '/data/xxxx/datasets/MOT17' # your dataset root
SPLIT: test # train, test or val
CATEGORY_NAMES: # same in YOLO training
- 'pedestrian'
CATEGORY_DICT:
0: 'pedestrian'
```
## 🚗 Practice
### 🏃 Training
Trackers generally do not have parameters that need to be trained. Please refer to the training methods of the different detectors to train the YOLO models.
The following references may help you:
- YOLOX: `tracker/yolox_utils/train_yolox.py`
- YOLO v7:
```shell
python train_aux.py --dataset visdrone --workers 8 --device <$GPU_id$> --batch-size 16 --data data/visdrone_all.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights <$YOLO v7 pretrained model path$> --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml
```
- YOLO v8: `tracker/yolov8_utils/train_yolov8.py`
### 😊 Tracking!
If you only want to run a demo:
```bash
python tracker/track_demo.py --obj ${video path or images folder path} --detector ${yolox, yolov8 or yolov7} --tracker ${tracker name} --kalman_format ${kalman format, sort, byte, ...} --detector_model_path ${detector weight path} --save_images
```
For example:
```bash
python tracker/track_demo.py --obj M0203.mp4 --detector yolov8 --tracker deepsort --kalman_format byte --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt --save_images
```
If you want to run trackers on a dataset:
```bash
python tracker/track.py --dataset ${dataset name, related with the yaml file} --detector ${yolox, yolov8 or yolov7} --tracker ${tracker name} --kalman_format ${kalman format, sort, byte, ...} --detector_model_path ${detector weight path}
```
For example:
- SORT: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker sort --kalman_format sort --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt `
- DeepSORT: `python tracker/track.py --dataset uavdt --detector yolov7 --tracker deepsort --kalman_format byte --detector_model_path weights/yolov7_UAVDT_35epochs_20230507.pt`
- ByteTrack: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker bytetrack --kalman_format byte --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- OCSort: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker ocsort --kalman_format ocsort --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- C-BIoU Track: `python tracker/track.py --dataset uavdt --detector yolov8 --tracker c_bioutrack --kalman_format bot --detector_model_path weights/yolov8l_UAVDT_60epochs_20230509.pt`
- BoT-SORT: `python tracker/track.py --dataset uavdt --detector yolox --tracker botsort --kalman_format bot --detector_model_path weights/yolox_m_uavdt_50epochs.pth.tar`
### ✅ Evaluation
Coming soon. As an alternative, you can use the [Easier to use TrackEval repo](https://github.com/JackWoo0831/Easier_To_Use_TrackEval).


@@ -0,0 +1,49 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# CSP-ResNet backbone
backbone:
# [from, number, module, args]
[[-1, 1, Stem, [128]], # 0-P1/2
[-1, 3, ResCSPC, [128]],
[-1, 1, Conv, [256, 3, 2]], # 2-P3/8
[-1, 4, ResCSPC, [256]],
[-1, 1, Conv, [512, 3, 2]], # 4-P3/8
[-1, 6, ResCSPC, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 6-P3/8
[-1, 3, ResCSPC, [1024]], # 7
]
# CSP-Res-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 8
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[5, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 2, ResCSPB, [256]], # 13
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[3, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 2, ResCSPB, [128]], # 18
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 13], 1, Concat, [1]], # cat
[-1, 2, ResCSPB, [256]], # 22
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 8], 1, Concat, [1]], # cat
[-1, 2, ResCSPB, [512]], # 26
[-1, 1, Conv, [1024, 3, 1]],
[[19,23,27], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]
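Each row in the config above follows the `[from, number, module, args]` convention noted in its comments, which the detector's model builder consumes. As a loose, hypothetical illustration only (this is not YOLOv7's actual parsing code), such a list can be walked like this:
```python
# Hypothetical walk over a [from, number, module, args] layer list like the yaml above.
# It only prints the build plan; constructing real modules is left to the detector code.
layers = [
    [-1, 1, "Stem", [128]],
    [-1, 3, "ResCSPC", [128]],
    [-1, 1, "Conv", [256, 3, 2]],
]

for index, (frm, number, module, args) in enumerate(layers):
    source = "previous layer" if frm == -1 else f"layer {frm}"
    print(f"{index}: {number} x {module}{args}, fed from {source}")
```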


@@ -0,0 +1,49 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# CSP-ResNeXt backbone
backbone:
# [from, number, module, args]
[[-1, 1, Stem, [128]], # 0-P1/2
[-1, 3, ResXCSPC, [128]],
[-1, 1, Conv, [256, 3, 2]], # 2-P3/8
[-1, 4, ResXCSPC, [256]],
[-1, 1, Conv, [512, 3, 2]], # 4-P3/8
[-1, 6, ResXCSPC, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 6-P3/8
[-1, 3, ResXCSPC, [1024]], # 7
]
# CSP-ResX-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 8
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[5, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 2, ResXCSPB, [256]], # 13
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[3, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 2, ResXCSPB, [128]], # 18
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 13], 1, Concat, [1]], # cat
[-1, 2, ResXCSPB, [256]], # 22
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 8], 1, Concat, [1]], # cat
[-1, 2, ResXCSPB, [512]], # 26
[-1, 1, Conv, [1024, 3, 1]],
[[19,23,27], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]


@@ -0,0 +1,52 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.33 # model depth multiple
width_multiple: 1.25 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Bottleneck, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 2, BottleneckCSPC, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 8, BottleneckCSPC, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 8, BottleneckCSPC, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 4, BottleneckCSPC, [1024]], # 10
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 11
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[8, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [256]], # 16
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[6, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [128]], # 21
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 16], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [256]], # 25
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 11], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [512]], # 29
[-1, 1, Conv, [1024, 3, 1]],
[[22,26,30], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]


@@ -0,0 +1,52 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Bottleneck, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 2, BottleneckCSPC, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 8, BottleneckCSPC, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 8, BottleneckCSPC, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 4, BottleneckCSPC, [1024]], # 10
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 11
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[8, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [256]], # 16
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[6, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [128]], # 21
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 16], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [256]], # 25
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 11], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [512]], # 29
[-1, 1, Conv, [1024, 3, 1]],
[[22,26,30], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]


@@ -0,0 +1,63 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # expand model depth
width_multiple: 1.25 # expand layer channels
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, DownC, [128]], # 2-P2/4
[-1, 3, BottleneckCSPA, [128]],
[-1, 1, DownC, [256]], # 4-P3/8
[-1, 15, BottleneckCSPA, [256]],
[-1, 1, DownC, [512]], # 6-P4/16
[-1, 15, BottleneckCSPA, [512]],
[-1, 1, DownC, [768]], # 8-P5/32
[-1, 7, BottleneckCSPA, [768]],
[-1, 1, DownC, [1024]], # 10-P6/64
[-1, 7, BottleneckCSPA, [1024]], # 11
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 12
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-6, 1, Conv, [384, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [384]], # 17
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-13, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [256]], # 22
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-20, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [128]], # 27
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, DownC, [256]],
[[-1, 22], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [256]], # 31
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, DownC, [384]],
[[-1, 17], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [384]], # 35
[-1, 1, Conv, [768, 3, 1]],
[-2, 1, DownC, [512]],
[[-1, 12], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [512]], # 39
[-1, 1, Conv, [1024, 3, 1]],
[[28,32,36,40], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]


@@ -0,0 +1,63 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # expand model depth
width_multiple: 1.25 # expand layer channels
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, DownC, [128]], # 2-P2/4
[-1, 3, BottleneckCSPA, [128]],
[-1, 1, DownC, [256]], # 4-P3/8
[-1, 7, BottleneckCSPA, [256]],
[-1, 1, DownC, [512]], # 6-P4/16
[-1, 7, BottleneckCSPA, [512]],
[-1, 1, DownC, [768]], # 8-P5/32
[-1, 3, BottleneckCSPA, [768]],
[-1, 1, DownC, [1024]], # 10-P6/64
[-1, 3, BottleneckCSPA, [1024]], # 11
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 12
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-6, 1, Conv, [384, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [384]], # 17
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-13, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [256]], # 22
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-20, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [128]], # 27
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, DownC, [256]],
[[-1, 22], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [256]], # 31
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, DownC, [384]],
[[-1, 17], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [384]], # 35
[-1, 1, Conv, [768, 3, 1]],
[-2, 1, DownC, [512]],
[[-1, 12], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [512]], # 39
[-1, 1, Conv, [1024, 3, 1]],
[[28,32,36,40], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]


@@ -0,0 +1,63 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # expand model depth
width_multiple: 1.0 # expand layer channels
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
[-1, 3, BottleneckCSPA, [128]],
[-1, 1, Conv, [256, 3, 2]], # 4-P3/8
[-1, 7, BottleneckCSPA, [256]],
[-1, 1, Conv, [384, 3, 2]], # 6-P4/16
[-1, 7, BottleneckCSPA, [384]],
[-1, 1, Conv, [512, 3, 2]], # 8-P5/32
[-1, 3, BottleneckCSPA, [512]],
[-1, 1, Conv, [640, 3, 2]], # 10-P6/64
[-1, 3, BottleneckCSPA, [640]], # 11
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [320]], # 12
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-6, 1, Conv, [256, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [256]], # 17
[-1, 1, Conv, [192, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-13, 1, Conv, [192, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [192]], # 22
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-20, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [128]], # 27
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [192, 3, 2]],
[[-1, 22], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [192]], # 31
[-1, 1, Conv, [384, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 17], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [256]], # 35
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [320, 3, 2]],
[[-1, 12], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [320]], # 39
[-1, 1, Conv, [640, 3, 1]],
[[28,32,36,40], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
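
This variant, like the other P6 configs, opens with a ReOrg layer, which is why layer 1 is a stride-1 conv yet is still tagged P1/2. A minimal sketch of the usual space-to-depth behaviour (the repo's own module may differ in detail):

```python
import torch
import torch.nn as nn

class ReOrg(nn.Module):
    # Space-to-depth: sample the input at four pixel phase offsets and stack
    # them on the channel axis, so H and W halve and channels multiply by 4.
    def forward(self, x):
        return torch.cat([x[..., ::2, ::2],
                          x[..., 1::2, ::2],
                          x[..., ::2, 1::2],
                          x[..., 1::2, 1::2]], dim=1)

x = torch.randn(1, 3, 640, 640)
print(ReOrg()(x).shape)  # torch.Size([1, 12, 320, 320]) -> the "P1/2" tag on layer 1
```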

View File

@@ -0,0 +1,63 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # expand model depth
width_multiple: 1.0 # expand layer channels
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
[-1, 3, BottleneckCSPA, [128]],
[-1, 1, Conv, [256, 3, 2]], # 4-P3/8
[-1, 7, BottleneckCSPA, [256]],
[-1, 1, Conv, [512, 3, 2]], # 6-P4/16
[-1, 7, BottleneckCSPA, [512]],
[-1, 1, Conv, [768, 3, 2]], # 8-P5/32
[-1, 3, BottleneckCSPA, [768]],
[-1, 1, Conv, [1024, 3, 2]], # 10-P6/64
[-1, 3, BottleneckCSPA, [1024]], # 11
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 12
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-6, 1, Conv, [384, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [384]], # 17
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-13, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [256]], # 22
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[-20, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 3, BottleneckCSPB, [128]], # 27
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 22], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [256]], # 31
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [384, 3, 2]],
[[-1, 17], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [384]], # 35
[-1, 1, Conv, [768, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 12], 1, Concat, [1]], # cat
[-1, 3, BottleneckCSPB, [512]], # 39
[-1, 1, Conv, [1024, 3, 1]],
[[28,32,36,40], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
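
The "# route backbone P5/P4/P3" comments above rely on relative `from` indices: a negative value f in row i points at layer i + f. Checking the three routes by hand (this mirrors what the model parser does; it is not the parser itself):

```python
# Head rows continue the numbering after the 12-layer backbone (0-11),
# so SPPCSPC above is layer 12 and the routes resolve like this:
routes = {15: -6, 20: -13, 25: -20}
for row, f in routes.items():
    print(f"head row {row}: from {f} -> layer {row + f}")
# row 15 -> layer 9  (P5 stage output)
# row 20 -> layer 7  (P4 stage output)
# row 25 -> layer 5  (P3 stage output)
```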

View File

@@ -0,0 +1,51 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# darknet53 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Bottleneck, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 2, Bottleneck, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 8, Bottleneck, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 8, Bottleneck, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 4, Bottleneck, [1024]], # 10
]
# YOLOv3-SPP head
head:
[[-1, 1, Bottleneck, [1024, False]],
[-1, 1, SPP, [512, [5, 9, 13]]],
[-1, 1, Conv, [1024, 3, 1]],
[-1, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [1024, 3, 1]], # 15 (P5/32-large)
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 8], 1, Concat, [1]], # cat backbone P4
[-1, 1, Bottleneck, [512, False]],
[-1, 1, Bottleneck, [512, False]],
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [512, 3, 1]], # 22 (P4/16-medium)
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P3
[-1, 1, Bottleneck, [256, False]],
[-1, 2, Bottleneck, [256, False]], # 27 (P3/8-small)
[[27, 22, 15], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
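
The YOLOv3-SPP head above differs from plain YOLOv3 only by the SPP row with pooling sizes [5, 9, 13]. Below is a self-contained sketch of such a block; it uses plain Conv2d without BN/activation and follows common YOLO implementations rather than this repo's exact module:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    # Spatial pyramid pooling: parallel stride-1 max-pools with kernels
    # (5, 9, 13), concatenated with the projected input, then fused by 1x1 conv.
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv2 = nn.Conv2d(c_ * (len(k) + 1), c2, 1, 1, bias=False)
        self.m = nn.ModuleList(nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k)

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], dim=1))

y = SPP(1024, 512)(torch.randn(1, 1024, 20, 20))
print(y.shape)  # torch.Size([1, 512, 20, 20])
```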

View File

@@ -0,0 +1,51 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# darknet53 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Bottleneck, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 2, Bottleneck, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 8, Bottleneck, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 8, Bottleneck, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 4, Bottleneck, [1024]], # 10
]
# YOLOv3 head
head:
[[-1, 1, Bottleneck, [1024, False]],
[-1, 1, Conv, [512, [1, 1]]],
[-1, 1, Conv, [1024, 3, 1]],
[-1, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [1024, 3, 1]], # 15 (P5/32-large)
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 8], 1, Concat, [1]], # cat backbone P4
[-1, 1, Bottleneck, [512, False]],
[-1, 1, Bottleneck, [512, False]],
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [512, 3, 1]], # 22 (P4/16-medium)
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P3
[-1, 1, Bottleneck, [256, False]],
[-1, 2, Bottleneck, [256, False]], # 27 (P3/8-small)
[[27, 22, 15], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
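
The nn.Upsample rows in this and the other heads pass their args positionally, i.e. [None, 2, 'nearest'] becomes size=None, scale_factor=2, mode='nearest'. For example:

```python
import torch
import torch.nn as nn

up = nn.Upsample(None, 2, 'nearest')  # size=None, scale_factor=2, mode='nearest'
x = torch.randn(1, 256, 20, 20)       # e.g. P5/32 features for a 640x640 input
print(up(x).shape)                    # torch.Size([1, 256, 40, 40]) -> P4/16 resolution
```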

View File

@@ -0,0 +1,52 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# CSP-Darknet backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Bottleneck, [64]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 2, BottleneckCSPC, [128]],
[-1, 1, Conv, [256, 3, 2]], # 5-P3/8
[-1, 8, BottleneckCSPC, [256]],
[-1, 1, Conv, [512, 3, 2]], # 7-P4/16
[-1, 8, BottleneckCSPC, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
[-1, 4, BottleneckCSPC, [1024]], # 10
]
# CSP-Dark-PAN head
head:
[[-1, 1, SPPCSPC, [512]], # 11
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[8, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [256]], # 16
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[6, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 2, BottleneckCSPB, [128]], # 21
[-1, 1, Conv, [256, 3, 1]],
[-2, 1, Conv, [256, 3, 2]],
[[-1, 16], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [256]], # 25
[-1, 1, Conv, [512, 3, 1]],
[-2, 1, Conv, [512, 3, 2]],
[[-1, 11], 1, Concat, [1]], # cat
[-1, 2, BottleneckCSPB, [512]], # 29
[-1, 1, Conv, [1024, 3, 1]],
[[22,26,30], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
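
The anchor rows above are width,height pairs in input-image pixels, one row per output stride. Dividing by the stride gives each anchor's size in grid cells, which is the form the head works with (standard YOLO convention, stated as background rather than quoted from this repo):

```python
anchors = {
    8:  [(12, 16), (19, 36), (40, 28)],       # P3/8
    16: [(36, 75), (76, 55), (72, 146)],      # P4/16
    32: [(142, 110), (192, 243), (459, 401)], # P5/32
}
for stride, boxes in anchors.items():
    print(f"stride {stride}:", [(w / stride, h / stride) for w, h in boxes])
```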

View File

@@ -0,0 +1,202 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7-d6 backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [96, 3, 1]], # 1-P1/2
[-1, 1, DownC, [192]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [192, 1, 1]], # 14
[-1, 1, DownC, [384]], # 15-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 27
[-1, 1, DownC, [768]], # 28-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 40
[-1, 1, DownC, [1152]], # 41-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [1152, 1, 1]], # 53
[-1, 1, DownC, [1536]], # 54-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [1536, 1, 1]], # 66
]
# yolov7-d6 head
head:
[[-1, 1, SPPCSPC, [768]], # 67
[-1, 1, Conv, [576, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[53, 1, Conv, [576, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [576, 1, 1]], # 83
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[40, 1, Conv, [384, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 99
[-1, 1, Conv, [192, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[27, 1, Conv, [192, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [192, 1, 1]], # 115
[-1, 1, DownC, [384]],
[[-1, 99], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 129
[-1, 1, DownC, [576]],
[[-1, 83], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [576, 1, 1]], # 143
[-1, 1, DownC, [768]],
[[-1, 67], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 157
[115, 1, Conv, [384, 3, 1]],
[129, 1, Conv, [768, 3, 1]],
[143, 1, Conv, [1152, 3, 1]],
[157, 1, Conv, [1536, 3, 1]],
[[158,159,160,161], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
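
The repeated pattern in the d6 backbone above is an ELAN-style block: two 1x1 stems, a chain of 3x3 convs, and a Concat that taps every second conv. For the first block, the Concat at layer 13 gathers layers 12, 10, 8, 6, 4 and 3, each 64 channels wide, so the 1x1 Conv at layer 14 sees 6 * 64 = 384 channels and squeezes them to 192:

```python
import torch

concat_row, offsets = 13, [-1, -3, -5, -7, -9, -10]
print("sources:", [concat_row + o for o in offsets])   # [12, 10, 8, 6, 4, 3]
branches = [torch.randn(1, 64, 160, 160) for _ in offsets]  # spatial size arbitrary
print(torch.cat(branches, dim=1).shape)                # torch.Size([1, 384, 160, 160])
```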

View File

@@ -0,0 +1,180 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7-e6 backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [80, 3, 1]], # 1-P1/2
[-1, 1, DownC, [160]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 12
[-1, 1, DownC, [320]], # 13-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 23
[-1, 1, DownC, [640]], # 24-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 34
[-1, 1, DownC, [960]], # 35-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 45
[-1, 1, DownC, [1280]], # 46-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 56
]
# yolov7-e6 head
head:
[[-1, 1, SPPCSPC, [640]], # 57
[-1, 1, Conv, [480, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[45, 1, Conv, [480, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 71
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[34, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 85
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[23, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 99
[-1, 1, DownC, [320]],
[[-1, 85], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 111
[-1, 1, DownC, [480]],
[[-1, 71], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 123
[-1, 1, DownC, [640]],
[[-1, 57], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 135
[99, 1, Conv, [320, 3, 1]],
[111, 1, Conv, [640, 3, 1]],
[123, 1, Conv, [960, 3, 1]],
[135, 1, Conv, [1280, 3, 1]],
[[136,137,138,139], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
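
The e6 backbone above downsamples exclusively with DownC rows ([160], [320], ... [1280]). Below is a hedged reimplementation of a DownC-style block: one branch max-pools then projects with a 1x1 conv, the other projects then uses a stride-2 3x3 conv, and the two halves are concatenated to the target width; the real module in the repo may arrange its branches differently.

```python
import torch
import torch.nn as nn

def conv_bn_act(c1, c2, k=1, s=1):
    return nn.Sequential(
        nn.Conv2d(c1, c2, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c2),
        nn.SiLU(),
    )

class DownCSketch(nn.Module):
    # DownC-style downsampling (sketch): strided-conv branch + pooled branch,
    # concatenated so the output has c2 channels at half resolution.
    def __init__(self, c1, c2, k=2):
        super().__init__()
        self.cv1 = conv_bn_act(c1, c1 // 2, 1, 1)
        self.cv2 = conv_bn_act(c1 // 2, c2 // 2, 3, k)
        self.cv3 = conv_bn_act(c1, c2 // 2, 1, 1)
        self.mp = nn.MaxPool2d(kernel_size=k, stride=k)

    def forward(self, x):
        return torch.cat([self.cv2(self.cv1(x)), self.cv3(self.mp(x))], dim=1)

y = DownCSketch(160, 320)(torch.randn(1, 160, 160, 160))
print(y.shape)  # torch.Size([1, 320, 80, 80])
```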

View File

@@ -0,0 +1,301 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7-e6e backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [80, 3, 1]], # 1-P1/2
[-1, 1, DownC, [160]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 12
[-11, 1, Conv, [64, 1, 1]],
[-12, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 22
[[-1, -11], 1, Shortcut, [1]], # 23
[-1, 1, DownC, [320]], # 24-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 34
[-11, 1, Conv, [128, 1, 1]],
[-12, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 44
[[-1, -11], 1, Shortcut, [1]], # 45
[-1, 1, DownC, [640]], # 46-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 56
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 66
[[-1, -11], 1, Shortcut, [1]], # 67
[-1, 1, DownC, [960]], # 68-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 78
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 88
[[-1, -11], 1, Shortcut, [1]], # 89
[-1, 1, DownC, [1280]], # 90-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 100
[-11, 1, Conv, [512, 1, 1]],
[-12, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 110
[[-1, -11], 1, Shortcut, [1]], # 111
]
# yolov7-e6e head
head:
[[-1, 1, SPPCSPC, [640]], # 112
[-1, 1, Conv, [480, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[89, 1, Conv, [480, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 126
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 136
[[-1, -11], 1, Shortcut, [1]], # 137
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[67, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 151
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 161
[[-1, -11], 1, Shortcut, [1]], # 162
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[45, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 176
[-11, 1, Conv, [128, 1, 1]],
[-12, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 186
[[-1, -11], 1, Shortcut, [1]], # 187
[-1, 1, DownC, [320]],
[[-1, 162], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 199
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 209
[[-1, -11], 1, Shortcut, [1]], # 210
[-1, 1, DownC, [480]],
[[-1, 137], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 222
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 232
[[-1, -11], 1, Shortcut, [1]], # 233
[-1, 1, DownC, [640]],
[[-1, 112], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 245
[-11, 1, Conv, [512, 1, 1]],
[-12, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 255
[[-1, -11], 1, Shortcut, [1]], # 256
[187, 1, Conv, [320, 3, 1]],
[210, 1, Conv, [640, 3, 1]],
[233, 1, Conv, [960, 3, 1]],
[256, 1, Conv, [1280, 3, 1]],
[[257,258,259,260], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
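
What distinguishes e6e from e6 above is that every ELAN stack is duplicated and the two copies are merged by `[[-1, -11], 1, Shortcut, [1]]` rows. Shortcut is read here as an element-wise sum of two same-shaped tensors (a sketch; the repo's module may handle its argument differently):

```python
import torch
import torch.nn as nn

class Shortcut(nn.Module):
    # Element-wise sum of two same-shaped feature maps (sketch).
    def __init__(self, dimension=0):
        super().__init__()
        self.d = dimension  # kept for signature parity; unused for a plain add

    def forward(self, xs):
        return xs[0] + xs[1]

a = torch.randn(1, 160, 80, 80)
b = torch.randn(1, 160, 80, 80)
print(Shortcut(1)([a, b]).shape)  # torch.Size([1, 160, 80, 80])
```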

View File

@@ -0,0 +1,112 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# YOLOv7-tiny backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 2]], # 0-P1/2
[-1, 1, Conv, [64, 3, 2]], # 1-P2/4
[-1, 1, Conv, [32, 1, 1]],
[-2, 1, Conv, [32, 1, 1]],
[-1, 1, Conv, [32, 3, 1]],
[-1, 1, Conv, [32, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1]], # 7
[-1, 1, MP, []], # 8-P3/8
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 14
[-1, 1, MP, []], # 15-P4/16
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 21
[-1, 1, MP, []], # 22-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 28
]
# YOLOv7-tiny head
head:
[[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, SP, [5]],
[-2, 1, SP, [9]],
[-3, 1, SP, [13]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[[-1, -7], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 37
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[21, 1, Conv, [128, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 47
[-1, 1, Conv, [64, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[14, 1, Conv, [64, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [32, 1, 1]],
[-2, 1, Conv, [32, 1, 1]],
[-1, 1, Conv, [32, 3, 1]],
[-1, 1, Conv, [32, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1]], # 57
[-1, 1, Conv, [128, 3, 2]],
[[-1, 47], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 65
[-1, 1, Conv, [256, 3, 2]],
[[-1, 37], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 73
[57, 1, Conv, [128, 3, 1]],
[65, 1, Conv, [256, 3, 1]],
[73, 1, Conv, [512, 3, 1]],
[[74,75,76], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
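
The tiny head above builds its SPP-like block from three SP rows ([5], [9], [13]) plus a Concat. Treating SP as a stride-1 max-pool with "same" padding (an assumption about the module) keeps all branches at the same spatial size, so concatenating the four 256-channel inputs yields 1024 channels before the following 1x1 conv:

```python
import torch
import torch.nn as nn

class SP(nn.Module):
    # Stride-1 max-pool with "same" padding (assumed definition of SP).
    def __init__(self, k=3, s=1):
        super().__init__()
        self.m = nn.MaxPool2d(kernel_size=k, stride=s, padding=k // 2)

    def forward(self, x):
        return self.m(x)

x = torch.randn(1, 256, 20, 20)
pooled = [SP(k)(x) for k in (5, 9, 13)]
print(torch.cat([x] + pooled, dim=1).shape)  # torch.Size([1, 1024, 20, 20])
```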

View File

@@ -0,0 +1,112 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# yolov7-tiny backbone
backbone:
# [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True
[[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]], # 0-P1/2
[-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]], # 1-P2/4
[-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 7
[-1, 1, MP, []], # 8-P3/8
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 14
[-1, 1, MP, []], # 15-P4/16
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 21
[-1, 1, MP, []], # 22-P5/32
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 28
]
# yolov7-tiny head
head:
[[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, SP, [5]],
[-2, 1, SP, [9]],
[-3, 1, SP, [13]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -7], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 37
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[21, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 47
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[14, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 57
[-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
[[-1, 47], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 65
[-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
[[-1, 37], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 73
[57, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[65, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[73, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[74,75,76], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
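
This second tiny config is the same architecture as the previous one, except every Conv row carries the extra args None, 1, nn.LeakyReLU(0.1), i.e. padding, groups and activation are passed explicitly. A sketch of a Conv block that honours such an override (taking SiLU as the default activation is an assumption):

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    # Conv + BN + activation; the trailing config args select padding, groups
    # and the activation module (sketch of the usual YOLO-style block).
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super().__init__()
        p = k // 2 if p is None else p
        self.conv = nn.Conv2d(c1, c2, k, s, p, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

layer = Conv(3, 32, 3, 2, None, 1, nn.LeakyReLU(0.1))
print(layer(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 32, 320, 320])
```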

View File

@@ -0,0 +1,158 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7-w6 backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 10
[-1, 1, Conv, [256, 3, 2]], # 11-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 19
[-1, 1, Conv, [512, 3, 2]], # 20-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 28
[-1, 1, Conv, [768, 3, 2]], # 29-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 37
[-1, 1, Conv, [1024, 3, 2]], # 38-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 46
]
# yolov7-w6 head
head:
[[-1, 1, SPPCSPC, [512]], # 47
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[37, 1, Conv, [384, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 59
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[28, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 71
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[19, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 83
[-1, 1, Conv, [256, 3, 2]],
[[-1, 71], 1, Concat, [1]], # cat
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 93
[-1, 1, Conv, [384, 3, 2]],
[[-1, 59], 1, Concat, [1]], # cat
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 103
[-1, 1, Conv, [512, 3, 2]],
[[-1, 47], 1, Concat, [1]], # cat
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 113
[83, 1, Conv, [256, 3, 1]],
[93, 1, Conv, [512, 3, 1]],
[103, 1, Conv, [768, 3, 1]],
[113, 1, Conv, [1024, 3, 1]],
[[114,115,116,117], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
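
The w6 head above ends in a 4-scale Detect over strides 8/16/32/64. For a 1280x1280 input (the usual training size for these P6 configs, taken here as an assumption) the raw prediction count works out as follows:

```python
img = 1280
strides = {"P3": 8, "P4": 16, "P5": 32, "P6": 64}
na = 3  # anchors per level
total = 0
for level, s in strides.items():
    cells = (img // s) ** 2
    total += na * cells
    print(level, f"{img // s}x{img // s} grid,", na * cells, "boxes")
print("total raw predictions:", total)  # 102000
```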

View File

@@ -0,0 +1,140 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# yolov7 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 11
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 16-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 24
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 29-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 37
[-1, 1, MP, []],
[-1, 1, Conv, [512, 1, 1]],
[-3, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 42-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 50
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [512]], # 51
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[37, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 63
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[24, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 75
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3, 63], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 88
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3, 51], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 101
[75, 1, RepConv, [256, 3, 1]],
[88, 1, RepConv, [512, 3, 1]],
[101, 1, RepConv, [1024, 3, 1]],
[[102,103,104], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
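
Unlike the P6 configs, the yolov7 head above feeds Detect through RepConv rows. The point of RepConv-style layers is that the training-time 3x3 and 1x1 branches can be folded into a single 3x3 conv for inference. A minimal sketch of that folding, without BatchNorm or the identity branch the repo's module also handles:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv3 = nn.Conv2d(8, 8, 3, 1, 1, bias=True)   # 3x3 training branch
conv1 = nn.Conv2d(8, 8, 1, 1, 0, bias=True)   # 1x1 training branch

# Fold both branches into one 3x3 conv: zero-pad the 1x1 kernel to 3x3 and add.
fused = nn.Conv2d(8, 8, 3, 1, 1, bias=True)
fused.weight.data = conv3.weight.data + F.pad(conv1.weight.data, [1, 1, 1, 1])
fused.bias.data = conv3.bias.data + conv1.bias.data

x = torch.randn(1, 8, 32, 32)
y_train = conv3(x) + conv1(x)
print(torch.allclose(y_train, fused(x), atol=1e-5))  # True
```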

View File

@@ -0,0 +1,156 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# yolov7x backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [40, 3, 1]], # 0
[-1, 1, Conv, [80, 3, 2]], # 1-P1/2
[-1, 1, Conv, [80, 3, 1]],
[-1, 1, Conv, [160, 3, 2]], # 3-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 13
[-1, 1, MP, []],
[-1, 1, Conv, [160, 1, 1]],
[-3, 1, Conv, [160, 1, 1]],
[-1, 1, Conv, [160, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 18-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 28
[-1, 1, MP, []],
[-1, 1, Conv, [320, 1, 1]],
[-3, 1, Conv, [320, 1, 1]],
[-1, 1, Conv, [320, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 33-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 43
[-1, 1, MP, []],
[-1, 1, Conv, [640, 1, 1]],
[-3, 1, Conv, [640, 1, 1]],
[-1, 1, Conv, [640, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 48-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 58
]
# yolov7x head
head:
[[-1, 1, SPPCSPC, [640]], # 59
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[43, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 73
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[28, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 87
[-1, 1, MP, []],
[-1, 1, Conv, [160, 1, 1]],
[-3, 1, Conv, [160, 1, 1]],
[-1, 1, Conv, [160, 3, 2]],
[[-1, -3, 73], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 102
[-1, 1, MP, []],
[-1, 1, Conv, [320, 1, 1]],
[-3, 1, Conv, [320, 1, 1]],
[-1, 1, Conv, [320, 3, 2]],
[[-1, -3, 59], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 117
[87, 1, Conv, [320, 3, 1]],
[102, 1, Conv, [640, 3, 1]],
[117, 1, Conv, [1280, 3, 1]],
[[118,119,120], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
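
Both yolov7 and yolov7x downsample in the backbone and head with a two-branch pattern: MP followed by a 1x1 conv on one path, a 1x1 conv followed by a stride-2 3x3 conv on the other, joined by Concat (rows 14-18 above, for example). A sketch of that pattern with plain Conv2d layers (no BN/activation, for brevity):

```python
import torch
import torch.nn as nn

class MPDown(nn.Module):
    # Two-branch downsample: pooled + 1x1 projection on one side,
    # 1x1 projection + stride-2 3x3 conv on the other, then Concat.
    def __init__(self, c1, c_branch):
        super().__init__()
        self.mp = nn.MaxPool2d(2, 2)
        self.cv_pool = nn.Conv2d(c1, c_branch, 1, 1)
        self.cv_pre = nn.Conv2d(c1, c_branch, 1, 1)
        self.cv_down = nn.Conv2d(c_branch, c_branch, 3, 2, 1)

    def forward(self, x):
        a = self.cv_pool(self.mp(x))
        b = self.cv_down(self.cv_pre(x))
        return torch.cat([b, a], dim=1)

y = MPDown(320, 160)(torch.randn(1, 320, 160, 160))
print(y.shape)  # torch.Size([1, 320, 80, 80]) -- matches rows 14-18 above
```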

View File

@@ -0,0 +1,207 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7 backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [96, 3, 1]], # 1-P1/2
[-1, 1, DownC, [192]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [192, 1, 1]], # 14
[-1, 1, DownC, [384]], # 15-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 27
[-1, 1, DownC, [768]], # 28-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 40
[-1, 1, DownC, [1152]], # 41-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [1152, 1, 1]], # 53
[-1, 1, DownC, [1536]], # 54-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [1536, 1, 1]], # 66
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [768]], # 67
[-1, 1, Conv, [576, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[53, 1, Conv, [576, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [576, 1, 1]], # 83
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[40, 1, Conv, [384, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 99
[-1, 1, Conv, [192, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[27, 1, Conv, [192, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [192, 1, 1]], # 115
[-1, 1, DownC, [384]],
[[-1, 99], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 129
[-1, 1, DownC, [576]],
[[-1, 83], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [576, 1, 1]], # 143
[-1, 1, DownC, [768]],
[[-1, 67], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 157
[115, 1, Conv, [384, 3, 1]],
[129, 1, Conv, [768, 3, 1]],
[143, 1, Conv, [1152, 3, 1]],
[157, 1, Conv, [1536, 3, 1]],
[115, 1, Conv, [384, 3, 1]],
[99, 1, Conv, [768, 3, 1]],
[83, 1, Conv, [1152, 3, 1]],
[67, 1, Conv, [1536, 3, 1]],
[[158,159,160,161,162,163,164,165], 1, IAuxDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
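
The IAuxDetect row above takes eight inputs: four lead branches (layers 158-161, fed from 115/129/143/157) and four coarser auxiliary branches (162-165, fed from 115/99/83/67). During training the auxiliary predictions are typically supervised with a down-weighted copy of the same loss; the 0.25 factor below is an illustrative assumption, not a value read from this repo.

```python
import torch

def combined_loss(lead_losses, aux_losses, aux_weight=0.25):
    # Sum the per-level lead losses and add the auxiliary losses scaled down
    # by aux_weight (illustrative weighting, assumed).
    return torch.stack(lead_losses).sum() + aux_weight * torch.stack(aux_losses).sum()

lead = [torch.tensor(1.2), torch.tensor(0.8), torch.tensor(0.5), torch.tensor(0.3)]
aux = [torch.tensor(1.5), torch.tensor(1.0), torch.tensor(0.7), torch.tensor(0.4)]
print(combined_loss(lead, aux))  # tensor(3.7000)
```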

View File

@@ -0,0 +1,185 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7 backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [80, 3, 1]], # 1-P1/2
[-1, 1, DownC, [160]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 12
[-1, 1, DownC, [320]], # 13-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 23
[-1, 1, DownC, [640]], # 24-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 34
[-1, 1, DownC, [960]], # 35-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 45
[-1, 1, DownC, [1280]], # 46-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 56
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [640]], # 57
[-1, 1, Conv, [480, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[45, 1, Conv, [480, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 71
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[34, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 85
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[23, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 99
[-1, 1, DownC, [320]],
[[-1, 85], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 111
[-1, 1, DownC, [480]],
[[-1, 71], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 123
[-1, 1, DownC, [640]],
[[-1, 57], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 135
[99, 1, Conv, [320, 3, 1]],
[111, 1, Conv, [640, 3, 1]],
[123, 1, Conv, [960, 3, 1]],
[135, 1, Conv, [1280, 3, 1]],
[99, 1, Conv, [320, 3, 1]],
[85, 1, Conv, [640, 3, 1]],
[71, 1, Conv, [960, 3, 1]],
[57, 1, Conv, [1280, 3, 1]],
[[136,137,138,139,140,141,142,143], 1, IAuxDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
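
This training config, like the previous one, differs from its deploy counterpart only in the four extra auxiliary branches and the IAuxDetect row. A generic way to compare such files is to load them and look at the row counts. A small inspection script (not part of the repo; the path below is a placeholder and PyYAML is required):

```python
import yaml

path = "cfg/training/yolov7-e6.yaml"  # placeholder path, adjust to your checkout
with open(path) as f:
    cfg = yaml.safe_load(f)

print("classes:", cfg["nc"])
print("anchor rows (one per pyramid level):", len(cfg["anchors"]))
print("backbone rows:", len(cfg["backbone"]))
print("head rows:", len(cfg["head"]))
print("last head row (detection layer):", cfg["head"][-1])
```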

View File

@@ -0,0 +1,306 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7 backbone
backbone:
# [from, number, module, args],
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [80, 3, 1]], # 1-P1/2
[-1, 1, DownC, [160]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 12
[-11, 1, Conv, [64, 1, 1]],
[-12, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 22
[[-1, -11], 1, Shortcut, [1]], # 23
[-1, 1, DownC, [320]], # 24-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 34
[-11, 1, Conv, [128, 1, 1]],
[-12, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 44
[[-1, -11], 1, Shortcut, [1]], # 45
[-1, 1, DownC, [640]], # 46-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 56
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 66
[[-1, -11], 1, Shortcut, [1]], # 67
[-1, 1, DownC, [960]], # 68-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 78
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [960, 1, 1]], # 88
[[-1, -11], 1, Shortcut, [1]], # 89
[-1, 1, DownC, [1280]], # 90-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 100
[-11, 1, Conv, [512, 1, 1]],
[-12, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 110
[[-1, -11], 1, Shortcut, [1]], # 111
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [640]], # 112
[-1, 1, Conv, [480, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[89, 1, Conv, [480, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 126
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 136
[[-1, -11], 1, Shortcut, [1]], # 137
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[67, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 151
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 161
[[-1, -11], 1, Shortcut, [1]], # 162
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[45, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 176
[-11, 1, Conv, [128, 1, 1]],
[-12, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 186
[[-1, -11], 1, Shortcut, [1]], # 187
[-1, 1, DownC, [320]],
[[-1, 162], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 199
[-11, 1, Conv, [256, 1, 1]],
[-12, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 209
[[-1, -11], 1, Shortcut, [1]], # 210
[-1, 1, DownC, [480]],
[[-1, 137], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 222
[-11, 1, Conv, [384, 1, 1]],
[-12, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [480, 1, 1]], # 232
[[-1, -11], 1, Shortcut, [1]], # 233
[-1, 1, DownC, [640]],
[[-1, 112], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 245
[-11, 1, Conv, [512, 1, 1]],
[-12, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 255
[[-1, -11], 1, Shortcut, [1]], # 256
[187, 1, Conv, [320, 3, 1]],
[210, 1, Conv, [640, 3, 1]],
[233, 1, Conv, [960, 3, 1]],
[256, 1, Conv, [1280, 3, 1]],
[186, 1, Conv, [320, 3, 1]],
[161, 1, Conv, [640, 3, 1]],
[136, 1, Conv, [960, 3, 1]],
[112, 1, Conv, [1280, 3, 1]],
[[257,258,259,260,261,262,263,264], 1, IAuxDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]
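Some worked numbers for this configuration, taken from the anchor table above and the IAuxDetect definition later in this commit (self.no = nc + 5, self.na = len(anchors[0]) // 2); the 1280 input size is only an illustrative assumption for a P6 model:

nc = 80
na = 6 // 2                              # each anchor row above lists 3 (w, h) pairs
print(na * (nc + 5))                     # 255 output channels per detection conv

img = 1280                               # assumed input resolution for illustration
for level, stride in [('P3', 8), ('P4', 16), ('P5', 32), ('P6', 64)]:
    print(level, img // stride)          # grid sizes 160, 80, 40, 20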

View File

@ -0,0 +1,112 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# yolov7-tiny backbone
backbone:
# [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True
[[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]], # 0-P1/2
[-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]], # 1-P2/4
[-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 7
[-1, 1, MP, []], # 8-P3/8
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 14
[-1, 1, MP, []], # 15-P4/16
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 21
[-1, 1, MP, []], # 22-P5/32
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 28
]
# yolov7-tiny head
head:
[[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, SP, [5]],
[-2, 1, SP, [9]],
[-3, 1, SP, [13]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -7], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 37
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[21, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 47
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[14, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 57
[-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
[[-1, 47], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 65
[-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
[[-1, 37], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 73
[57, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[65, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[73, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
[[74,75,76], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]
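A sketch of instantiating a config like the one above through the Model class added later in this commit; the yaml path and the models.yolo import path are assumptions about where these files live in the repository:

import torch
from models.yolo import Model            # defined later in this commit

model = Model('cfg/training/yolov7-tiny.yaml', ch=3, nc=80)   # path is an assumption
model.eval()
with torch.no_grad():
    pred = model(torch.zeros(1, 3, 640, 640))
print(pred[0].shape)                     # (1, 25200, 85): 3 anchors x (80^2 + 40^2 + 20^2) cells, nc + 5 values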

View File

@ -0,0 +1,163 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [ 19,27, 44,40, 38,94 ] # P3/8
- [ 96,68, 86,152, 180,137 ] # P4/16
- [ 140,301, 303,264, 238,542 ] # P5/32
- [ 436,615, 739,380, 925,792 ] # P6/64
# yolov7 backbone
backbone:
# [from, number, module, args]
[[-1, 1, ReOrg, []], # 0
[-1, 1, Conv, [64, 3, 1]], # 1-P1/2
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 10
[-1, 1, Conv, [256, 3, 2]], # 11-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 19
[-1, 1, Conv, [512, 3, 2]], # 20-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 28
[-1, 1, Conv, [768, 3, 2]], # 29-P5/32
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[-1, 1, Conv, [384, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [768, 1, 1]], # 37
[-1, 1, Conv, [1024, 3, 2]], # 38-P6/64
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 46
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [512]], # 47
[-1, 1, Conv, [384, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[37, 1, Conv, [384, 1, 1]], # route backbone P5
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 59
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[28, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 71
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[19, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 83
[-1, 1, Conv, [256, 3, 2]],
[[-1, 71], 1, Concat, [1]], # cat
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 93
[-1, 1, Conv, [384, 3, 2]],
[[-1, 59], 1, Concat, [1]], # cat
[-1, 1, Conv, [384, 1, 1]],
[-2, 1, Conv, [384, 1, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[-1, 1, Conv, [192, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [384, 1, 1]], # 103
[-1, 1, Conv, [512, 3, 2]],
[[-1, 47], 1, Concat, [1]], # cat
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 113
[83, 1, Conv, [256, 3, 1]],
[93, 1, Conv, [512, 3, 1]],
[103, 1, Conv, [768, 3, 1]],
[113, 1, Conv, [1024, 3, 1]],
[83, 1, Conv, [320, 3, 1]],
[71, 1, Conv, [640, 3, 1]],
[59, 1, Conv, [960, 3, 1]],
[47, 1, Conv, [1280, 3, 1]],
[[114,115,116,117,118,119,120,121], 1, IAuxDetect, [nc, anchors]], # Detect(P3, P4, P5, P6)
]

View File

@ -0,0 +1,140 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# yolov7 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 11
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 16-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 24
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 29-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 37
[-1, 1, MP, []],
[-1, 1, Conv, [512, 1, 1]],
[-3, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 42-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 50
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [512]], # 51
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[37, 1, Conv, [256, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 63
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[24, 1, Conv, [128, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]], # 75
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3, 63], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 88
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3, 51], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 101
[75, 1, RepConv, [256, 3, 1]],
[88, 1, RepConv, [512, 3, 1]],
[101, 1, RepConv, [1024, 3, 1]],
[[102,103,104], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]
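The anchors above are given in input-image pixels; inside Model.__init__ (later in this commit) they are divided by the per-level stride so the detection head works in grid units. A short reproduction of that arithmetic with the values from this file:

import torch

anchors = torch.tensor([[12, 16, 19, 36, 40, 28],
                        [36, 75, 76, 55, 72, 146],
                        [142, 110, 192, 243, 459, 401]], dtype=torch.float32).view(3, 3, 2)
strides = torch.tensor([8., 16., 32.]).view(-1, 1, 1)      # P3/8, P4/16, P5/32
print(anchors / strides)                 # anchors in grid units, as stored on the detection head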

View File

@ -0,0 +1,156 @@
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# yolov7 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [40, 3, 1]], # 0
[-1, 1, Conv, [80, 3, 2]], # 1-P1/2
[-1, 1, Conv, [80, 3, 1]],
[-1, 1, Conv, [160, 3, 2]], # 3-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 13
[-1, 1, MP, []],
[-1, 1, Conv, [160, 1, 1]],
[-3, 1, Conv, [160, 1, 1]],
[-1, 1, Conv, [160, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 18-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 28
[-1, 1, MP, []],
[-1, 1, Conv, [320, 1, 1]],
[-3, 1, Conv, [320, 1, 1]],
[-1, 1, Conv, [320, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 33-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 43
[-1, 1, MP, []],
[-1, 1, Conv, [640, 1, 1]],
[-3, 1, Conv, [640, 1, 1]],
[-1, 1, Conv, [640, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 48-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [1280, 1, 1]], # 58
]
# yolov7 head
head:
[[-1, 1, SPPCSPC, [640]], # 59
[-1, 1, Conv, [320, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[43, 1, Conv, [320, 1, 1]], # route backbone P4
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 73
[-1, 1, Conv, [160, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[28, 1, Conv, [160, 1, 1]], # route backbone P3
[[-1, -2], 1, Concat, [1]],
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [160, 1, 1]], # 87
[-1, 1, MP, []],
[-1, 1, Conv, [160, 1, 1]],
[-3, 1, Conv, [160, 1, 1]],
[-1, 1, Conv, [160, 3, 2]],
[[-1, -3, 73], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [320, 1, 1]], # 102
[-1, 1, MP, []],
[-1, 1, Conv, [320, 1, 1]],
[-3, 1, Conv, [320, 1, 1]],
[-1, 1, Conv, [320, 3, 2]],
[[-1, -3, 59], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]],
[-2, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[-1, 1, Conv, [512, 3, 1]],
[[-1, -3, -5, -7, -8], 1, Concat, [1]],
[-1, 1, Conv, [640, 1, 1]], # 117
[87, 1, Conv, [320, 3, 1]],
[102, 1, Conv, [640, 3, 1]],
[117, 1, Conv, [1280, 3, 1]],
[[118,119,120], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
]

View File

@ -0,0 +1,23 @@
# COCO 2017 dataset http://cocodataset.org
# download command/URL (optional)
download: bash ./scripts/get_coco.sh
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ./coco/train2017.txt # 118287 images
val: ./coco/val2017.txt # 5000 images
test: ./coco/test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
# number of classes
nc: 80
# class names
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush' ]
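A quick consistency check for dataset files in this format; the data/coco.yaml path is an assumption about where the file is saved:

import yaml

with open('data/coco.yaml') as f:        # path is an assumption
    data = yaml.safe_load(f)
assert data['nc'] == len(data['names']), 'nc must match the number of class names'
print(data['nc'], data['names'][:3])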

View File

@ -0,0 +1,29 @@
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.3 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 0.7 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.2 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.0 # image mixup (probability)
copy_paste: 0.0 # image copy paste (probability)
paste_in: 0.0 # image copy paste (probability)
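The lr0/lrf pair above defines the start and end of the learning-rate schedule (final LR = lr0 * lrf, per the comment). A sketch of that relationship, assuming a cosine-style ramp; the exact schedule shape and the 300-epoch count are assumptions, not taken from this commit:

import math

lr0, lrf, epochs = 0.01, 0.1, 300        # epochs is assumed for illustration
def lr_at(epoch):
    # cosine anneal from lr0 down to lr0 * lrf (assumed schedule shape)
    return lr0 * ((1 - math.cos(epoch / epochs * math.pi)) / 2 * (lrf - 1) + 1)

print(lr_at(0), lr_at(epochs))           # 0.01 at the start, 0.001 at the end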

View File

@ -0,0 +1,29 @@
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.1 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.3 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 0.7 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.2 # image translation (+/- fraction)
scale: 0.9 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.15 # image mixup (probability)
copy_paste: 0.0 # image copy paste (probability)
paste_in: 0.15 # image copy paste (probability)

View File

@ -0,0 +1,29 @@
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.3 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 0.7 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.2 # image translation (+/- fraction)
scale: 0.9 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.15 # image mixup (probability)
copy_paste: 0.0 # image copy paste (probability)
paste_in: 0.15 # image copy paste (probability)

View File

@ -0,0 +1,29 @@
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.01 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.5 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 1.0 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.1 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.05 # image mixup (probability)
copy_paste: 0.0 # image copy paste (probability)
paste_in: 0.05 # image copy paste (probability)

View File

@ -0,0 +1,7 @@
train: ./mot17/train.txt
val: ./mot17/val.txt
test: ./mot17/val.txt
nc: 1
names: ['pedestrian']

View File

@ -0,0 +1,7 @@
train: ./uavdt/train.txt
val: ./uavdt/test.txt
test: ./uavdt/test.txt
nc: 1
names: ['car']

View File

@ -0,0 +1,8 @@
train: ./visdrone/train.txt
val: ./visdrone/val.txt
test: ./visdrone/test.txt
nc: 10
names: ['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor']

View File

@ -0,0 +1,8 @@
train: ./visdrone/train.txt
val: ./visdrone/val.txt
test: ./visdrone/test.txt
nc: 4
names: ['car', 'van', 'truck', 'bus']
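A hypothetical example of producing another dataset file in the same format; every path and class name below is invented for illustration:

import yaml

dataset = {
    'train': './my_dataset/train.txt',   # hypothetical paths
    'val':   './my_dataset/val.txt',
    'test':  './my_dataset/val.txt',
    'nc': 2,
    'names': ['runner', 'bib'],          # hypothetical class names
}
with open('my_dataset.yaml', 'w') as f:
    yaml.safe_dump(dataset, f, sort_keys=False)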

View File

@ -0,0 +1,184 @@
import argparse
import time
from pathlib import Path
import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, check_imshow, non_max_suppression, apply_classifier, \
scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized, TracedModel
def detect(save_img=False):
source, weights, view_img, save_txt, imgsz, trace = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, not opt.no_trace
save_img = not opt.nosave and not source.endswith('.txt') # save inference images
webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
('rtsp://', 'rtmp://', 'http://', 'https://'))
# Directories
save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
# Initialize
set_logging()
device = select_device(opt.device)
half = device.type != 'cpu' # half precision only supported on CUDA
# Load model
model = attempt_load(weights, map_location=device) # load FP32 model
stride = int(model.stride.max()) # model stride
imgsz = check_img_size(imgsz, s=stride) # check img_size
if trace:
model = TracedModel(model, device, opt.img_size)
if half:
model.half() # to FP16
# Second-stage classifier
classify = False
if classify:
modelc = load_classifier(name='resnet101', n=2) # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model'])
        modelc.to(device).eval()
# Set Dataloader
vid_path, vid_writer = None, None
if webcam:
view_img = check_imshow()
cudnn.benchmark = True # set True to speed up constant image size inference
dataset = LoadStreams(source, img_size=imgsz, stride=stride)
else:
dataset = LoadImages(source, img_size=imgsz, stride=stride)
# Get names and colors
names = model.module.names if hasattr(model, 'module') else model.names
colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]
# Run inference
if device.type != 'cpu':
model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters()))) # run once
t0 = time.time()
for path, img, im0s, vid_cap in dataset:
img = torch.from_numpy(img).to(device)
img = img.half() if half else img.float() # uint8 to fp16/32
img /= 255.0 # 0 - 255 to 0.0 - 1.0
if img.ndimension() == 3:
img = img.unsqueeze(0)
# Inference
t1 = time_synchronized()
pred = model(img, augment=opt.augment)[0]
# Apply NMS
pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
t2 = time_synchronized()
# Apply Classifier
if classify:
pred = apply_classifier(pred, modelc, img, im0s)
# Process detections
for i, det in enumerate(pred): # detections per image
if webcam: # batch_size >= 1
p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
else:
p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)
p = Path(p) # to Path
save_path = str(save_dir / p.name) # img.jpg
txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}') # img.txt
s += '%gx%g ' % img.shape[2:] # print string
gn = torch.tensor(im0.shape)[[1, 0, 1, 0]] # normalization gain whwh
if len(det):
# Rescale boxes from img_size to im0 size
det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
# Print results
for c in det[:, -1].unique():
n = (det[:, -1] == c).sum() # detections per class
s += f"{n} {names[int(c)]}{'s' * (n > 1)}, " # add to string
# Write results
for *xyxy, conf, cls in reversed(det):
if save_txt: # Write to file
xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh) # label format
with open(txt_path + '.txt', 'a') as f:
f.write(('%g ' * len(line)).rstrip() % line + '\n')
if save_img or view_img: # Add bbox to image
label = f'{names[int(cls)]} {conf:.2f}'
plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)
# Print time (inference + NMS)
#print(f'{s}Done. ({t2 - t1:.3f}s)')
# Stream results
if view_img:
cv2.imshow(str(p), im0)
cv2.waitKey(1) # 1 millisecond
# Save results (image with detections)
if save_img:
if dataset.mode == 'image':
cv2.imwrite(save_path, im0)
print(f" The image with the result is saved in: {save_path}")
else: # 'video' or 'stream'
if vid_path != save_path: # new video
vid_path = save_path
if isinstance(vid_writer, cv2.VideoWriter):
vid_writer.release() # release previous video writer
if vid_cap: # video
fps = vid_cap.get(cv2.CAP_PROP_FPS)
w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
else: # stream
fps, w, h = 30, im0.shape[1], im0.shape[0]
save_path += '.mp4'
vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
vid_writer.write(im0)
if save_txt or save_img:
s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
#print(f"Results saved to {save_dir}{s}")
print(f'Done. ({time.time() - t0:.3f}s)')
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='yolov7.pt', help='model.pt path(s)')
parser.add_argument('--source', type=str, default='inference/images', help='source') # file/folder, 0 for webcam
parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--view-img', action='store_true', help='display results')
parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
parser.add_argument('--augment', action='store_true', help='augmented inference')
parser.add_argument('--update', action='store_true', help='update all models')
parser.add_argument('--project', default='runs/detect', help='save results to project/name')
parser.add_argument('--name', default='exp', help='save results to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
parser.add_argument('--no-trace', action='store_true', help='don`t trace model')
opt = parser.parse_args()
print(opt)
#check_requirements(exclude=('pycocotools', 'thop'))
with torch.no_grad():
if opt.update: # update all models (to fix SourceChangeWarning)
for opt.weights in ['yolov7.pt']:
detect()
strip_optimizer(opt.weights)
else:
detect()
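An example invocation of the script above; it assumes the file is saved as detect.py and that the weights file and source folder exist locally. All flags come from the argparse definitions above:

import subprocess

subprocess.run([
    'python', 'detect.py',               # filename is an assumption
    '--weights', 'yolov7.pt',
    '--source', 'inference/images',
    '--img-size', '640',
    '--conf-thres', '0.25',
    '--view-img',
], check=True)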

Binary file not shown (image added, 12 MiB).

Binary file not shown (image added, 151 KiB).

Binary file not shown (image added, 102 KiB).

Binary file not shown (image added, 164 KiB).

Binary file not shown (image added, 347 KiB).

View File

@ -0,0 +1,97 @@
"""PyTorch Hub models
Usage:
    import torch
    model = torch.hub.load('repo', 'model')
"""
from pathlib import Path
import torch
from models.yolo import Model
from utils.general import check_requirements, set_logging
from utils.google_utils import attempt_download
from utils.torch_utils import select_device
dependencies = ['torch', 'yaml']
check_requirements(Path(__file__).parent / 'requirements.txt', exclude=('pycocotools', 'thop'))
set_logging()
def create(name, pretrained, channels, classes, autoshape):
"""Creates a specified model
Arguments:
name (str): name of model, i.e. 'yolov7'
pretrained (bool): load pretrained weights into the model
channels (int): number of input channels
classes (int): number of model classes
Returns:
pytorch model
"""
try:
cfg = list((Path(__file__).parent / 'cfg').rglob(f'{name}.yaml'))[0] # model.yaml path
model = Model(cfg, channels, classes)
if pretrained:
fname = f'{name}.pt' # checkpoint filename
attempt_download(fname) # download if not found locally
ckpt = torch.load(fname, map_location=torch.device('cpu')) # load
msd = model.state_dict() # model state_dict
csd = ckpt['model'].float().state_dict() # checkpoint state_dict as FP32
csd = {k: v for k, v in csd.items() if msd[k].shape == v.shape} # filter
model.load_state_dict(csd, strict=False) # load
if len(ckpt['model'].names) == classes:
model.names = ckpt['model'].names # set class names attribute
if autoshape:
model = model.autoshape() # for file/URI/PIL/cv2/np inputs and NMS
device = select_device('0' if torch.cuda.is_available() else 'cpu') # default to GPU if available
return model.to(device)
except Exception as e:
        s = 'Cache may be out of date, try force_reload=True.'
raise Exception(s) from e
def custom(path_or_model='path/to/model.pt', autoshape=True):
"""custom mode
Arguments (3 options):
path_or_model (str): 'path/to/model.pt'
path_or_model (dict): torch.load('path/to/model.pt')
path_or_model (nn.Module): torch.load('path/to/model.pt')['model']
Returns:
pytorch model
"""
model = torch.load(path_or_model) if isinstance(path_or_model, str) else path_or_model # load checkpoint
if isinstance(model, dict):
model = model['ema' if model.get('ema') else 'model'] # load model
hub_model = Model(model.yaml).to(next(model.parameters()).device) # create
hub_model.load_state_dict(model.float().state_dict()) # load state_dict
hub_model.names = model.names # class names
if autoshape:
hub_model = hub_model.autoshape() # for file/URI/PIL/cv2/np inputs and NMS
device = select_device('0' if torch.cuda.is_available() else 'cpu') # default to GPU if available
return hub_model.to(device)
def yolov7(pretrained=True, channels=3, classes=80, autoshape=True):
return create('yolov7', pretrained, channels, classes, autoshape)
if __name__ == '__main__':
model = custom(path_or_model='yolov7.pt') # custom example
# model = create(name='yolov7', pretrained=True, channels=3, classes=80, autoshape=True) # pretrained example
# Verify inference
import numpy as np
from PIL import Image
imgs = [np.zeros((640, 480, 3))]
results = model(imgs) # batched inference
results.print()
results.save()
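Besides running this file directly, the yolov7 entrypoint defined above can be reached through torch.hub from a local clone; this assumes torch >= 1.9 (for source='local'), that the command runs from the repository root, and that a yolov7.yaml exists under the cfg directory added in this commit:

import torch

# pretrained=False builds the model from the yaml only, so no checkpoint download is attempted
model = torch.hub.load('.', 'yolov7', pretrained=False, source='local')
print(type(model).__name__)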

Binary file not shown (image added, 130 KiB).

View File

@ -0,0 +1 @@
# init

File diff suppressed because it is too large.

View File

@ -0,0 +1,106 @@
import numpy as np
import torch
import torch.nn as nn
from models.common import Conv, DWConv
from utils.google_utils import attempt_download
class CrossConv(nn.Module):
# Cross Convolution Downsample
def __init__(self, c1, c2, k=3, s=1, g=1, e=1.0, shortcut=False):
# ch_in, ch_out, kernel, stride, groups, expansion, shortcut
super(CrossConv, self).__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, (1, k), (1, s))
self.cv2 = Conv(c_, c2, (k, 1), (s, 1), g=g)
self.add = shortcut and c1 == c2
def forward(self, x):
return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
class Sum(nn.Module):
# Weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
def __init__(self, n, weight=False): # n: number of inputs
super(Sum, self).__init__()
self.weight = weight # apply weights boolean
self.iter = range(n - 1) # iter object
if weight:
self.w = nn.Parameter(-torch.arange(1., n) / 2, requires_grad=True) # layer weights
def forward(self, x):
y = x[0] # no weight
if self.weight:
w = torch.sigmoid(self.w) * 2
for i in self.iter:
y = y + x[i + 1] * w[i]
else:
for i in self.iter:
y = y + x[i + 1]
return y
class MixConv2d(nn.Module):
# Mixed Depthwise Conv https://arxiv.org/abs/1907.09595
def __init__(self, c1, c2, k=(1, 3), s=1, equal_ch=True):
super(MixConv2d, self).__init__()
groups = len(k)
if equal_ch: # equal c_ per group
i = torch.linspace(0, groups - 1E-6, c2).floor() # c2 indices
c_ = [(i == g).sum() for g in range(groups)] # intermediate channels
else: # equal weight.numel() per group
b = [c2] + [0] * groups
a = np.eye(groups + 1, groups, k=-1)
a -= np.roll(a, 1, axis=1)
a *= np.array(k) ** 2
a[0] = 1
c_ = np.linalg.lstsq(a, b, rcond=None)[0].round() # solve for equal weight indices, ax = b
self.m = nn.ModuleList([nn.Conv2d(c1, int(c_[g]), k[g], s, k[g] // 2, bias=False) for g in range(groups)])
self.bn = nn.BatchNorm2d(c2)
self.act = nn.LeakyReLU(0.1, inplace=True)
def forward(self, x):
return x + self.act(self.bn(torch.cat([m(x) for m in self.m], 1)))
class Ensemble(nn.ModuleList):
# Ensemble of models
def __init__(self):
super(Ensemble, self).__init__()
def forward(self, x, augment=False):
y = []
for module in self:
y.append(module(x, augment)[0])
# y = torch.stack(y).max(0)[0] # max ensemble
# y = torch.stack(y).mean(0) # mean ensemble
y = torch.cat(y, 1) # nms ensemble
return y, None # inference, train output
def attempt_load(weights, map_location=None):
# Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
model = Ensemble()
for w in weights if isinstance(weights, list) else [weights]:
# attempt_download(w)
ckpt = torch.load(w, map_location=map_location) # load
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval()) # FP32 model
# Compatibility updates
for m in model.modules():
if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
m.inplace = True # pytorch 1.7.0 compatibility
elif type(m) is nn.Upsample:
m.recompute_scale_factor = None # torch 1.11.0 compatibility
elif type(m) is Conv:
m._non_persistent_buffers_set = set() # pytorch 1.6.0 compatibility
if len(model) == 1:
return model[-1] # return model
else:
print('Ensemble created with %s\n' % weights)
for k in ['names', 'stride']:
setattr(model, k, getattr(model[-1], k))
return model # return ensemble
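A short usage sketch for attempt_load; note that attempt_download is commented out above, so the checkpoint must already exist locally, and the yolov7.pt filename plus the models.experimental import path are assumptions:

from models.experimental import attempt_load   # import path is an assumption

model = attempt_load('yolov7.pt', map_location='cpu')   # fused, eval-mode FP32 model
print(model.stride, model.names[:3])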

View File

@ -0,0 +1,98 @@
import argparse
import sys
import time
sys.path.append('./') # to run '$ python *.py' files in subdirectories
import torch
import torch.nn as nn
import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_size
from utils.torch_utils import select_device
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='./yolor-csp-c.pt', help='weights path')
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') # height, width
parser.add_argument('--batch-size', type=int, default=1, help='batch size')
parser.add_argument('--dynamic', action='store_true', help='dynamic ONNX axes')
parser.add_argument('--grid', action='store_true', help='export Detect() layer grid')
parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
opt = parser.parse_args()
opt.img_size *= 2 if len(opt.img_size) == 1 else 1 # expand
print(opt)
set_logging()
t = time.time()
# Load PyTorch model
device = select_device(opt.device)
model = attempt_load(opt.weights, map_location=device) # load FP32 model
labels = model.names
# Checks
gs = int(max(model.stride)) # grid size (max stride)
opt.img_size = [check_img_size(x, gs) for x in opt.img_size] # verify img_size are gs-multiples
# Input
img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device) # image size(1,3,320,192) iDetection
# Update model
for k, m in model.named_modules():
m._non_persistent_buffers_set = set() # pytorch 1.6.0 compatibility
if isinstance(m, models.common.Conv): # assign export-friendly activations
if isinstance(m.act, nn.Hardswish):
m.act = Hardswish()
elif isinstance(m.act, nn.SiLU):
m.act = SiLU()
# elif isinstance(m, models.yolo.Detect):
# m.forward = m.forward_export # assign forward (optional)
model.model[-1].export = not opt.grid # set Detect() layer grid export
y = model(img) # dry run
# TorchScript export
try:
print('\nStarting TorchScript export with torch %s...' % torch.__version__)
f = opt.weights.replace('.pt', '.torchscript.pt') # filename
ts = torch.jit.trace(model, img, strict=False)
ts.save(f)
print('TorchScript export success, saved as %s' % f)
except Exception as e:
print('TorchScript export failure: %s' % e)
# ONNX export
try:
import onnx
print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
f = opt.weights.replace('.pt', '.onnx') # filename
torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
output_names=['classes', 'boxes'] if y is None else ['output'],
dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'}, # size(1,3,640,640)
'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic else None)
# Checks
onnx_model = onnx.load(f) # load onnx model
onnx.checker.check_model(onnx_model) # check onnx model
# print(onnx.helper.printable_graph(onnx_model.graph)) # print a human readable model
print('ONNX export success, saved as %s' % f)
except Exception as e:
print('ONNX export failure: %s' % e)
# CoreML export
try:
import coremltools as ct
print('\nStarting CoreML export with coremltools %s...' % ct.__version__)
# convert model from torchscript and apply pixel scaling as per detect.py
model = ct.convert(ts, inputs=[ct.ImageType(name='image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
f = opt.weights.replace('.pt', '.mlmodel') # filename
model.save(f)
print('CoreML export success, saved as %s' % f)
except Exception as e:
print('CoreML export failure: %s' % e)
# Finish
print('\nExport complete (%.2fs). Visualize with https://github.com/lutzroeder/netron.' % (time.time() - t))
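An optional sanity check on the exported ONNX file; it assumes onnxruntime is installed, the ONNX branch above succeeded, and the default 640x640 export size was used. The 'images' input name matches input_names in the export call above:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('yolov7.onnx', providers=['CPUExecutionProvider'])
x = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = sess.run(None, {'images': x})
print([o.shape for o in outputs])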

View File

@ -0,0 +1,550 @@
import argparse
import logging
import sys
from copy import deepcopy
sys.path.append('./') # to run '$ python *.py' files in subdirectories
logger = logging.getLogger(__name__)
from models.common import *
from models.experimental import *
from utils.autoanchor import check_anchor_order
from utils.general import make_divisible, check_file, set_logging
from utils.torch_utils import time_synchronized, fuse_conv_and_bn, model_info, scale_img, initialize_weights, \
select_device, copy_attr
from utils.loss import SigmoidBin
try:
import thop # for FLOPS computation
except ImportError:
thop = None
class Detect(nn.Module):
stride = None # strides computed during build
export = False # onnx export
def __init__(self, nc=80, anchors=(), ch=()): # detection layer
super(Detect, self).__init__()
self.nc = nc # number of classes
self.no = nc + 5 # number of outputs per anchor
self.nl = len(anchors) # number of detection layers
self.na = len(anchors[0]) // 2 # number of anchors
self.grid = [torch.zeros(1)] * self.nl # init grid
a = torch.tensor(anchors).float().view(self.nl, -1, 2)
self.register_buffer('anchors', a) # shape(nl,na,2)
self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2)
self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv
def forward(self, x):
# x = x.copy() # for profiling
z = [] # inference output
self.training |= self.export
for i in range(self.nl):
x[i] = self.m[i](x[i]) # conv
bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
if not self.training: # inference
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
z.append(y.view(bs, -1, self.no))
return x if self.training else (torch.cat(z, 1), x)
@staticmethod
def _make_grid(nx=20, ny=20):
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
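# Note: at inference the decode above maps sigmoid outputs to boxes roughly as
#   xy = (2 * sigmoid(tx, ty) - 0.5 + grid_cell) * stride
#   wh = (2 * sigmoid(tw, th)) ** 2 * anchor
# so a predicted box can reach at most 4x its anchor, which is consistent with the
# anchor_t: 4.0 threshold in the hyp*.yaml files earlier in this commit.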
class IDetect(nn.Module):
stride = None # strides computed during build
export = False # onnx export
def __init__(self, nc=80, anchors=(), ch=()): # detection layer
super(IDetect, self).__init__()
self.nc = nc # number of classes
self.no = nc + 5 # number of outputs per anchor
self.nl = len(anchors) # number of detection layers
self.na = len(anchors[0]) // 2 # number of anchors
self.grid = [torch.zeros(1)] * self.nl # init grid
a = torch.tensor(anchors).float().view(self.nl, -1, 2)
self.register_buffer('anchors', a) # shape(nl,na,2)
self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2)
self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv
self.ia = nn.ModuleList(ImplicitA(x) for x in ch)
self.im = nn.ModuleList(ImplicitM(self.no * self.na) for _ in ch)
def forward(self, x):
# x = x.copy() # for profiling
z = [] # inference output
self.training |= self.export
for i in range(self.nl):
x[i] = self.m[i](self.ia[i](x[i])) # conv
x[i] = self.im[i](x[i])
bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
if not self.training: # inference
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
z.append(y.view(bs, -1, self.no))
return x if self.training else (torch.cat(z, 1), x)
@staticmethod
def _make_grid(nx=20, ny=20):
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
class IAuxDetect(nn.Module):
stride = None # strides computed during build
export = False # onnx export
def __init__(self, nc=80, anchors=(), ch=()): # detection layer
super(IAuxDetect, self).__init__()
self.nc = nc # number of classes
self.no = nc + 5 # number of outputs per anchor
self.nl = len(anchors) # number of detection layers
self.na = len(anchors[0]) // 2 # number of anchors
self.grid = [torch.zeros(1)] * self.nl # init grid
a = torch.tensor(anchors).float().view(self.nl, -1, 2)
self.register_buffer('anchors', a) # shape(nl,na,2)
self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2)
self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch[:self.nl]) # output conv
self.m2 = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch[self.nl:]) # output conv
self.ia = nn.ModuleList(ImplicitA(x) for x in ch[:self.nl])
self.im = nn.ModuleList(ImplicitM(self.no * self.na) for _ in ch[:self.nl])
def forward(self, x):
# x = x.copy() # for profiling
z = [] # inference output
self.training |= self.export
for i in range(self.nl):
x[i] = self.m[i](self.ia[i](x[i])) # conv
x[i] = self.im[i](x[i])
bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
x[i+self.nl] = self.m2[i](x[i+self.nl])
x[i+self.nl] = x[i+self.nl].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
if not self.training: # inference
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
z.append(y.view(bs, -1, self.no))
return x if self.training else (torch.cat(z, 1), x[:self.nl])
@staticmethod
def _make_grid(nx=20, ny=20):
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
class IBin(nn.Module):
stride = None # strides computed during build
export = False # onnx export
def __init__(self, nc=80, anchors=(), ch=(), bin_count=21): # detection layer
super(IBin, self).__init__()
self.nc = nc # number of classes
self.bin_count = bin_count
self.w_bin_sigmoid = SigmoidBin(bin_count=self.bin_count, min=0.0, max=4.0)
self.h_bin_sigmoid = SigmoidBin(bin_count=self.bin_count, min=0.0, max=4.0)
# classes, x,y,obj
self.no = nc + 3 + \
self.w_bin_sigmoid.get_length() + self.h_bin_sigmoid.get_length() # w-bce, h-bce
# + self.x_bin_sigmoid.get_length() + self.y_bin_sigmoid.get_length()
self.nl = len(anchors) # number of detection layers
self.na = len(anchors[0]) // 2 # number of anchors
self.grid = [torch.zeros(1)] * self.nl # init grid
a = torch.tensor(anchors).float().view(self.nl, -1, 2)
self.register_buffer('anchors', a) # shape(nl,na,2)
self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2)
self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv
self.ia = nn.ModuleList(ImplicitA(x) for x in ch)
self.im = nn.ModuleList(ImplicitM(self.no * self.na) for _ in ch)
def forward(self, x):
#self.x_bin_sigmoid.use_fw_regression = True
#self.y_bin_sigmoid.use_fw_regression = True
self.w_bin_sigmoid.use_fw_regression = True
self.h_bin_sigmoid.use_fw_regression = True
# x = x.copy() # for profiling
z = [] # inference output
self.training |= self.export
for i in range(self.nl):
x[i] = self.m[i](self.ia[i](x[i])) # conv
x[i] = self.im[i](x[i])
bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
if not self.training: # inference
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
#y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
#px = (self.x_bin_sigmoid.forward(y[..., 0:12]) + self.grid[i][..., 0]) * self.stride[i]
#py = (self.y_bin_sigmoid.forward(y[..., 12:24]) + self.grid[i][..., 1]) * self.stride[i]
pw = self.w_bin_sigmoid.forward(y[..., 2:24]) * self.anchor_grid[i][..., 0]
ph = self.h_bin_sigmoid.forward(y[..., 24:46]) * self.anchor_grid[i][..., 1]
#y[..., 0] = px
#y[..., 1] = py
y[..., 2] = pw
y[..., 3] = ph
y = torch.cat((y[..., 0:4], y[..., 46:]), dim=-1)
z.append(y.view(bs, -1, y.shape[-1]))
return x if self.training else (torch.cat(z, 1), x)
@staticmethod
def _make_grid(nx=20, ny=20):
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
class Model(nn.Module):
def __init__(self, cfg='yolor-csp-c.yaml', ch=3, nc=None, anchors=None): # model, input channels, number of classes
super(Model, self).__init__()
self.traced = False
if isinstance(cfg, dict):
self.yaml = cfg # model dict
else: # is *.yaml
import yaml # for torch hub
self.yaml_file = Path(cfg).name
with open(cfg) as f:
self.yaml = yaml.load(f, Loader=yaml.SafeLoader) # model dict
# Define model
ch = self.yaml['ch'] = self.yaml.get('ch', ch) # input channels
if nc and nc != self.yaml['nc']:
logger.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
self.yaml['nc'] = nc # override yaml value
if anchors:
logger.info(f'Overriding model.yaml anchors with anchors={anchors}')
self.yaml['anchors'] = round(anchors) # override yaml value
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
self.names = [str(i) for i in range(self.yaml['nc'])] # default names
# print([x.shape for x in self.forward(torch.zeros(1, ch, 64, 64))])
# Build strides, anchors
m = self.model[-1] # Detect()
if isinstance(m, Detect):
s = 256 # 2x min stride
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
m.anchors /= m.stride.view(-1, 1, 1)
check_anchor_order(m)
self.stride = m.stride
self._initialize_biases() # only run once
# print('Strides: %s' % m.stride.tolist())
if isinstance(m, IDetect):
s = 256 # 2x min stride
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
m.anchors /= m.stride.view(-1, 1, 1)
check_anchor_order(m)
self.stride = m.stride
self._initialize_biases() # only run once
# print('Strides: %s' % m.stride.tolist())
if isinstance(m, IAuxDetect):
s = 256 # 2x min stride
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))[:4]]) # forward
#print(m.stride)
m.anchors /= m.stride.view(-1, 1, 1)
check_anchor_order(m)
self.stride = m.stride
self._initialize_aux_biases() # only run once
# print('Strides: %s' % m.stride.tolist())
if isinstance(m, IBin):
s = 256 # 2x min stride
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
m.anchors /= m.stride.view(-1, 1, 1)
check_anchor_order(m)
self.stride = m.stride
self._initialize_biases_bin() # only run once
# print('Strides: %s' % m.stride.tolist())
# Init weights, biases
initialize_weights(self)
self.info()
logger.info('')
def forward(self, x, augment=False, profile=False):
if augment:
img_size = x.shape[-2:] # height, width
s = [1, 0.83, 0.67] # scales
f = [None, 3, None] # flips (2-ud, 3-lr)
y = [] # outputs
for si, fi in zip(s, f):
xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
yi = self.forward_once(xi)[0] # forward
# cv2.imwrite(f'img_{si}.jpg', 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1]) # save
yi[..., :4] /= si # de-scale
if fi == 2:
yi[..., 1] = img_size[0] - yi[..., 1] # de-flip ud
elif fi == 3:
yi[..., 0] = img_size[1] - yi[..., 0] # de-flip lr
y.append(yi)
return torch.cat(y, 1), None # augmented inference, train
else:
return self.forward_once(x, profile) # single-scale inference, train
def forward_once(self, x, profile=False):
y, dt = [], [] # outputs
for m in self.model:
if m.f != -1: # if not from previous layer
x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
if not hasattr(self, 'traced'):
self.traced=False
if self.traced:
if isinstance(m, Detect) or isinstance(m, IDetect) or isinstance(m, IAuxDetect):
break
if profile:
c = isinstance(m, (Detect, IDetect, IAuxDetect, IBin))
o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0 # FLOPS
for _ in range(10):
m(x.copy() if c else x)
t = time_synchronized()
for _ in range(10):
m(x.copy() if c else x)
dt.append((time_synchronized() - t) * 100)
print('%10.1f%10.0f%10.1fms %-40s' % (o, m.np, dt[-1], m.type))
x = m(x) # run
y.append(x if m.i in self.save else None) # save output
if profile:
print('%.1fms total' % sum(dt))
return x
def _initialize_biases(self, cf=None): # initialize biases into Detect(), cf is class frequency
# https://arxiv.org/abs/1708.02002 section 3.3
# cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
m = self.model[-1] # Detect() module
for mi, s in zip(m.m, m.stride): # from
b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85)
b.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
def _initialize_aux_biases(self, cf=None): # initialize biases into Detect(), cf is class frequency
# https://arxiv.org/abs/1708.02002 section 3.3
# cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
m = self.model[-1] # Detect() module
for mi, mi2, s in zip(m.m, m.m2, m.stride): # from
b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85)
b.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
b2 = mi2.bias.view(m.na, -1) # conv.bias(255) to (3,85)
b2.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b2.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
mi2.bias = torch.nn.Parameter(b2.view(-1), requires_grad=True)
def _initialize_biases_bin(self, cf=None): # initialize biases into Detect(), cf is class frequency
# https://arxiv.org/abs/1708.02002 section 3.3
# cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
m = self.model[-1] # Bin() module
bc = m.bin_count
for mi, s in zip(m.m, m.stride): # from
b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85)
old = b[:, (0,1,2,bc+3)].data
obj_idx = 2*bc+4
b[:, :obj_idx].data += math.log(0.6 / (bc + 1 - 0.99))
b[:, obj_idx].data += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b[:, (obj_idx+1):].data += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
b[:, (0,1,2,bc+3)].data = old
mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
def _print_biases(self):
m = self.model[-1] # Detect() module
for mi in m.m: # from
b = mi.bias.detach().view(m.na, -1).T # conv.bias(255) to (3,85)
print(('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))
# def _print_weights(self):
# for m in self.model.modules():
# if type(m) is Bottleneck:
# print('%10.3g' % (m.w.detach().sigmoid() * 2)) # shortcut weights
def fuse(self): # fuse model Conv2d() + BatchNorm2d() layers
print('Fusing layers... ')
for m in self.model.modules():
if isinstance(m, RepConv):
#print(f" fuse_repvgg_block")
m.fuse_repvgg_block()
elif isinstance(m, RepConv_OREPA):
#print(f" switch_to_deploy")
m.switch_to_deploy()
elif type(m) is Conv and hasattr(m, 'bn'):
m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv
delattr(m, 'bn') # remove batchnorm
m.forward = m.fuseforward # update forward
self.info()
return self
def nms(self, mode=True): # add or remove NMS module
present = type(self.model[-1]) is NMS # last layer is NMS
if mode and not present:
print('Adding NMS... ')
m = NMS() # module
m.f = -1 # from
m.i = self.model[-1].i + 1 # index
self.model.add_module(name='%s' % m.i, module=m) # add
self.eval()
elif not mode and present:
print('Removing NMS... ')
self.model = self.model[:-1] # remove
return self
def autoshape(self): # add autoShape module
print('Adding autoShape... ')
m = autoShape(self) # wrap model
copy_attr(m, self, include=('yaml', 'nc', 'hyp', 'names', 'stride'), exclude=()) # copy attributes
return m
def info(self, verbose=False, img_size=640): # print model information
model_info(self, verbose, img_size)
def parse_model(d, ch): # model_dict, input_channels(3)
logger.info('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors # number of anchors
no = na * (nc + 5) # number of outputs = anchors * (classes + 5)
layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out
for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']): # from, number, module, args
m = eval(m) if isinstance(m, str) else m # eval strings
for j, a in enumerate(args):
try:
args[j] = eval(a) if isinstance(a, str) else a # eval strings
except:
pass
n = max(round(n * gd), 1) if n > 1 else n # depth gain
if m in [nn.Conv2d, Conv, RobustConv, RobustConv2, DWConv, GhostConv, RepConv, RepConv_OREPA, DownC,
SPP, SPPF, SPPCSPC, GhostSPPCSPC, MixConv2d, Focus, Stem, GhostStem, CrossConv,
Bottleneck, BottleneckCSPA, BottleneckCSPB, BottleneckCSPC,
RepBottleneck, RepBottleneckCSPA, RepBottleneckCSPB, RepBottleneckCSPC,
Res, ResCSPA, ResCSPB, ResCSPC,
RepRes, RepResCSPA, RepResCSPB, RepResCSPC,
ResX, ResXCSPA, ResXCSPB, ResXCSPC,
RepResX, RepResXCSPA, RepResXCSPB, RepResXCSPC,
Ghost, GhostCSPA, GhostCSPB, GhostCSPC,
SwinTransformerBlock, STCSPA, STCSPB, STCSPC,
SwinTransformer2Block, ST2CSPA, ST2CSPB, ST2CSPC]:
c1, c2 = ch[f], args[0]
if c2 != no: # if not output
c2 = make_divisible(c2 * gw, 8)
args = [c1, c2, *args[1:]]
if m in [DownC, SPPCSPC, GhostSPPCSPC,
BottleneckCSPA, BottleneckCSPB, BottleneckCSPC,
RepBottleneckCSPA, RepBottleneckCSPB, RepBottleneckCSPC,
ResCSPA, ResCSPB, ResCSPC,
RepResCSPA, RepResCSPB, RepResCSPC,
ResXCSPA, ResXCSPB, ResXCSPC,
RepResXCSPA, RepResXCSPB, RepResXCSPC,
GhostCSPA, GhostCSPB, GhostCSPC,
STCSPA, STCSPB, STCSPC,
ST2CSPA, ST2CSPB, ST2CSPC]:
args.insert(2, n) # number of repeats
n = 1
elif m is nn.BatchNorm2d:
args = [ch[f]]
elif m is Concat:
c2 = sum([ch[x] for x in f])
elif m is Chuncat:
c2 = sum([ch[x] for x in f])
elif m is Shortcut:
c2 = ch[f[0]]
elif m is Foldcut:
c2 = ch[f] // 2
elif m in [Detect, IDetect, IAuxDetect, IBin]:
args.append([ch[x] for x in f])
if isinstance(args[1], int): # number of anchors
args[1] = [list(range(args[1] * 2))] * len(f)
elif m is ReOrg:
c2 = ch[f] * 4
elif m is Contract:
c2 = ch[f] * args[0] ** 2
elif m is Expand:
c2 = ch[f] // args[0] ** 2
else:
c2 = ch[f]
m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module
t = str(m)[8:-2].replace('__main__.', '') # module type
np = sum([x.numel() for x in m_.parameters()]) # number params
m_.i, m_.f, m_.type, m_.np = i, f, t, np # attach index, 'from' index, type, number params
logger.info('%3s%18s%3s%10.0f %-40s%-30s' % (i, f, n, np, t, args)) # print
save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist
layers.append(m_)
if i == 0:
ch = []
ch.append(c2)
return nn.Sequential(*layers), sorted(save)
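# --- illustrative note (not part of the original file) ---
# Worked example of the depth/width scaling in parse_model for a hypothetical
# yaml entry [-1, 6, Conv, [256, 3, 1]] with depth_multiple gd = 0.33 and
# width_multiple gw = 0.50:
#   n  = max(round(6 * 0.33), 1)       = 2    # repeats
#   c2 = make_divisible(256 * 0.50, 8) = 128  # output channels, rounded to a multiple of 8
#   args becomes [c1, 128, 3, 1], and the module is wrapped in nn.Sequential when n > 1.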
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--cfg', type=str, default='yolor-csp-c.yaml', help='model.yaml')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--profile', action='store_true', help='profile model speed')
opt = parser.parse_args()
opt.cfg = check_file(opt.cfg) # check file
set_logging()
device = select_device(opt.device)
# Create model
model = Model(opt.cfg).to(device)
model.train()
if opt.profile:
img = torch.rand(1, 3, 640, 640).to(device)
y = model(img, profile=True)
# Profile
# img = torch.rand(8 if torch.cuda.is_available() else 1, 3, 640, 640).to(device)
# y = model(img, profile=True)
# Tensorboard
# from torch.utils.tensorboard import SummaryWriter
# tb_writer = SummaryWriter()
# print("Run 'tensorboard --logdir=models/runs' to view tensorboard at http://localhost:6006/")
# tb_writer.add_graph(model.model, img) # add model to tensorboard
# tb_writer.add_image('test', img[0], dataformats='CWH') # add model to tensorboard
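# --- illustrative sketch (not part of the original file) ---
# Stand-alone version of the box decode used by the detection heads above,
# assuming one 20x20 feature map, 3 anchors and a stride of 32. The names
# (raw, toy_anchors, decode_example) are made up for illustration only.
import torch

def decode_example():
    bs, na, ny, nx, no = 1, 3, 20, 20, 85
    raw = torch.randn(bs, na, ny, nx, no)  # head output after view/permute
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    grid = torch.stack((xv, yv), 2).view(1, 1, ny, nx, 2).float()
    toy_anchors = torch.tensor([[10., 13.], [16., 30.], [33., 23.]]).view(1, na, 1, 1, 2)
    stride = 32.
    y = raw.sigmoid()
    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride  # xy in input-image pixels
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * toy_anchors      # wh scaled by the anchor
    return y.view(bs, -1, no)                               # (bs, na*ny*nx, no)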

View File

@ -0,0 +1,24 @@
numpy
cython-bbox==0.1.3
loguru
motmetrics==1.4.0
ninja
pandas
Pillow
PyYAML
scikit-learn
scipy
seaborn
thop
tensorboard
lap
tabulate
tqdm
wandb
gdown

View File

@ -0,0 +1,22 @@
#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash ./scripts/get_coco.sh
# Download/unzip labels
d='./' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
echo 'Downloading' $url$f ' ...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
# Download/unzip images
d='./coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip' # 1G, 5k images
f3='test2017.zip' # 7G, 41k images (optional)
for f in $f1 $f2 $f3; do
echo 'Downloading' $url$f '...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
done
wait # finish background tasks

View File

@ -0,0 +1,350 @@
import argparse
import json
import os
from pathlib import Path
from threading import Thread
import numpy as np
import torch
import yaml
from tqdm import tqdm
from models.experimental import attempt_load
from utils.datasets import create_dataloader
from utils.general import coco80_to_coco91_class, check_dataset, check_file, check_img_size, check_requirements, \
box_iou, non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, set_logging, increment_path, colorstr
from utils.metrics import ap_per_class, ConfusionMatrix
from utils.plots import plot_images, output_to_target, plot_study_txt
from utils.torch_utils import select_device, time_synchronized, TracedModel
def test(data,
weights=None,
batch_size=32,
imgsz=640,
conf_thres=0.001,
iou_thres=0.6, # for NMS
save_json=False,
single_cls=False,
augment=False,
verbose=False,
model=None,
dataloader=None,
save_dir=Path(''), # for saving images
save_txt=False, # for auto-labelling
save_hybrid=False, # for hybrid auto-labelling
save_conf=False, # save auto-label confidences
plots=True,
wandb_logger=None,
compute_loss=None,
half_precision=True,
trace=False,
is_coco=False):
# Initialize/load model and set device
training = model is not None
if training: # called by train.py
device = next(model.parameters()).device # get model device
else: # called directly
set_logging()
device = select_device(opt.device, batch_size=batch_size)
# Directories
save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
# Load model
model = attempt_load(weights, map_location=device) # load FP32 model
gs = max(int(model.stride.max()), 32) # grid size (max stride)
imgsz = check_img_size(imgsz, s=gs) # check img_size
if trace:
model = TracedModel(model, device, opt.img_size)
# Half
half = device.type != 'cpu' and half_precision # half precision only supported on CUDA
if half:
model.half()
# Configure
model.eval()
if isinstance(data, str):
is_coco = data.endswith('coco.yaml')
with open(data) as f:
data = yaml.load(f, Loader=yaml.SafeLoader)
check_dataset(data) # check
nc = 1 if single_cls else int(data['nc']) # number of classes
iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95
niou = iouv.numel()
# Logging
log_imgs = 0
if wandb_logger and wandb_logger.wandb:
log_imgs = min(wandb_logger.log_imgs, 100)
# Dataloader
if not training:
if device.type != 'cpu':
model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters()))) # run once
task = opt.task if opt.task in ('train', 'val', 'test') else 'val' # path to train/val/test images
dataloader = create_dataloader(data[task], imgsz, batch_size, gs, opt, pad=0.5, rect=True,
prefix=colorstr(f'{task}: '))[0]
seen = 0
confusion_matrix = ConfusionMatrix(nc=nc)
names = {k: v for k, v in enumerate(model.names if hasattr(model, 'names') else model.module.names)}
coco91class = coco80_to_coco91_class()
s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
loss = torch.zeros(3, device=device)
jdict, stats, ap, ap_class, wandb_images = [], [], [], [], []
for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
img = img.to(device, non_blocking=True)
img = img.half() if half else img.float() # uint8 to fp16/32
img /= 255.0 # 0 - 255 to 0.0 - 1.0
targets = targets.to(device)
nb, _, height, width = img.shape # batch size, channels, height, width
with torch.no_grad():
# Run model
t = time_synchronized()
out, train_out = model(img, augment=augment) # inference and training outputs
t0 += time_synchronized() - t
# Compute loss
if compute_loss:
loss += compute_loss([x.float() for x in train_out], targets)[1][:3] # box, obj, cls
# Run NMS
targets[:, 2:] *= torch.Tensor([width, height, width, height]).to(device) # to pixels
lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else [] # for autolabelling
t = time_synchronized()
out = non_max_suppression(out, conf_thres=conf_thres, iou_thres=iou_thres, labels=lb, multi_label=True)
t1 += time_synchronized() - t
# Statistics per image
for si, pred in enumerate(out):
labels = targets[targets[:, 0] == si, 1:]
nl = len(labels)
tcls = labels[:, 0].tolist() if nl else [] # target class
path = Path(paths[si])
seen += 1
if len(pred) == 0:
if nl:
stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
continue
# Predictions
predn = pred.clone()
scale_coords(img[si].shape[1:], predn[:, :4], shapes[si][0], shapes[si][1]) # native-space pred
# Append to text file
if save_txt:
gn = torch.tensor(shapes[si][0])[[1, 0, 1, 0]] # normalization gain whwh
for *xyxy, conf, cls in predn.tolist():
xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh
line = (cls, *xywh, conf) if save_conf else (cls, *xywh) # label format
with open(save_dir / 'labels' / (path.stem + '.txt'), 'a') as f:
f.write(('%g ' * len(line)).rstrip() % line + '\n')
# W&B logging - Media Panel Plots
if len(wandb_images) < log_imgs and wandb_logger.current_epoch > 0: # Check for test operation
if wandb_logger.current_epoch % wandb_logger.bbox_interval == 0:
box_data = [{"position": {"minX": xyxy[0], "minY": xyxy[1], "maxX": xyxy[2], "maxY": xyxy[3]},
"class_id": int(cls),
"box_caption": "%s %.3f" % (names[cls], conf),
"scores": {"class_score": conf},
"domain": "pixel"} for *xyxy, conf, cls in pred.tolist()]
boxes = {"predictions": {"box_data": box_data, "class_labels": names}} # inference-space
wandb_images.append(wandb_logger.wandb.Image(img[si], boxes=boxes, caption=path.name))
wandb_logger.log_training_progress(predn, path, names) if wandb_logger and wandb_logger.wandb_run else None
# Append to pycocotools JSON dictionary
if save_json:
# [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ...
image_id = int(path.stem) if path.stem.isnumeric() else path.stem
box = xyxy2xywh(predn[:, :4]) # xywh
box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner
for p, b in zip(pred.tolist(), box.tolist()):
jdict.append({'image_id': image_id,
'category_id': coco91class[int(p[5])] if is_coco else int(p[5]),
'bbox': [round(x, 3) for x in b],
'score': round(p[4], 5)})
# Assign all predictions as incorrect
correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device)
if nl:
detected = [] # target indices
tcls_tensor = labels[:, 0]
# target boxes
tbox = xywh2xyxy(labels[:, 1:5])
scale_coords(img[si].shape[1:], tbox, shapes[si][0], shapes[si][1]) # native-space labels
if plots:
confusion_matrix.process_batch(predn, torch.cat((labels[:, 0:1], tbox), 1))
# Per target class
for cls in torch.unique(tcls_tensor):
ti = (cls == tcls_tensor).nonzero(as_tuple=False).view(-1) # prediction indices
pi = (cls == pred[:, 5]).nonzero(as_tuple=False).view(-1) # target indices
# Search for detections
if pi.shape[0]:
# Prediction to target ious
ious, i = box_iou(predn[pi, :4], tbox[ti]).max(1) # best ious, indices
# Append detections
detected_set = set()
for j in (ious > iouv[0]).nonzero(as_tuple=False):
d = ti[i[j]] # detected target
if d.item() not in detected_set:
detected_set.add(d.item())
detected.append(d)
correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn
if len(detected) == nl: # all targets already located in image
break
# Append statistics (correct, conf, pcls, tcls)
stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))
# Plot images
if plots and batch_i < 3:
f = save_dir / f'test_batch{batch_i}_labels.jpg' # labels
Thread(target=plot_images, args=(img, targets, paths, f, names), daemon=True).start()
f = save_dir / f'test_batch{batch_i}_pred.jpg' # predictions
Thread(target=plot_images, args=(img, output_to_target(out), paths, f, names), daemon=True).start()
# Compute statistics
stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy
if len(stats) and stats[0].any():
p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
ap50, ap = ap[:, 0], ap.mean(1) # AP@0.5, AP@0.5:0.95
mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class
else:
nt = torch.zeros(1)
# Print results
pf = '%20s' + '%12i' * 2 + '%12.3g' * 4 # print format
print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))
# Print results per class
if (verbose or (nc < 50 and not training)) and nc > 1 and len(stats):
for i, c in enumerate(ap_class):
print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))
# Print speeds
t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple
if not training:
print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t)
# Plots
if plots:
confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
if wandb_logger and wandb_logger.wandb:
val_batches = [wandb_logger.wandb.Image(str(f), caption=f.name) for f in sorted(save_dir.glob('test*.jpg'))]
wandb_logger.log({"Validation": val_batches})
if wandb_images:
wandb_logger.log({"Bounding Box Debugger/Images": wandb_images})
# Save JSON
if save_json and len(jdict):
w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else '' # weights
anno_json = './coco/annotations/instances_val2017.json' # annotations json
pred_json = str(save_dir / f"{w}_predictions.json") # predictions json
print('\nEvaluating pycocotools mAP... saving %s...' % pred_json)
with open(pred_json, 'w') as f:
json.dump(jdict, f)
try: # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
anno = COCO(anno_json) # init annotations api
pred = anno.loadRes(pred_json) # init predictions api
eval = COCOeval(anno, pred, 'bbox')
if is_coco:
eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files] # image IDs to evaluate
eval.evaluate()
eval.accumulate()
eval.summarize()
map, map50 = eval.stats[:2] # update results (mAP@0.5:0.95, mAP@0.5)
except Exception as e:
print(f'pycocotools unable to run: {e}')
# Return results
model.float() # for training
if not training:
s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
print(f"Results saved to {save_dir}{s}")
maps = np.zeros(nc) + map
for i, c in enumerate(ap_class):
maps[c] = ap[i]
return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t
if __name__ == '__main__':
parser = argparse.ArgumentParser(prog='test.py')
parser.add_argument('--dataset', type=str, default='COCO', help='dataset name')
parser.add_argument('--weights', nargs='+', type=str, default='yolov7.pt', help='model.pt path(s)')
parser.add_argument('--data', type=str, default='data/coco.yaml', help='*.data path')
parser.add_argument('--batch-size', type=int, default=32, help='size of each image batch')
parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.65, help='IOU threshold for NMS')
parser.add_argument('--task', default='val', help='train, val, test, speed or study')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset')
parser.add_argument('--augment', action='store_true', help='augmented inference')
parser.add_argument('--verbose', action='store_true', help='report mAP by class')
parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
parser.add_argument('--save-hybrid', action='store_true', help='save label+prediction hybrid results to *.txt')
parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file')
parser.add_argument('--project', default='runs/test', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
parser.add_argument('--no-trace', action='store_true', help="don't trace model")
opt = parser.parse_args()
opt.save_json |= opt.data.endswith('coco.yaml')
opt.data = check_file(opt.data) # check file
print(opt)
#check_requirements()
if opt.task in ('train', 'val', 'test'): # run normally
test(opt.data,
opt.weights,
opt.batch_size,
opt.img_size,
opt.conf_thres,
opt.iou_thres,
opt.save_json,
opt.single_cls,
opt.augment,
opt.verbose,
save_txt=opt.save_txt | opt.save_hybrid,
save_hybrid=opt.save_hybrid,
save_conf=opt.save_conf,
trace=not opt.no_trace,
)
elif opt.task == 'speed': # speed benchmarks
for w in opt.weights:
test(opt.data, w, opt.batch_size, opt.img_size, 0.25, 0.45, save_json=False, plots=False)
elif opt.task == 'study': # run over a range of settings and save/plot
# python test.py --task study --data coco.yaml --iou 0.65 --weights yolov7.pt
x = list(range(256, 1536 + 128, 128)) # x axis (image sizes)
for w in opt.weights:
f = f'study_{Path(opt.data).stem}_{Path(w).stem}.txt' # filename to save to
y = [] # y axis
for i in x: # img-size
print(f'\nRunning {f} point {i}...')
r, _, t = test(opt.data, w, opt.batch_size, i, opt.conf_thres, opt.iou_thres, opt.save_json,
plots=False)
y.append(r + t) # results and times
np.savetxt(f, y, fmt='%10.4g') # save
os.system('zip -r study.zip study_*.txt')
plot_study_txt(x=x) # plot
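# --- illustrative sketch (not part of the original file) ---
# Stand-alone version of the TP matching done in the "Statistics per image"
# block above: a prediction counts as correct at every IoU threshold it clears
# against a not-yet-matched ground-truth box of the same class. box_iou here is
# a local helper for the sketch, not the repo's utils.general.box_iou.
import torch

def box_iou(a, b):  # a: (N, 4) xyxy, b: (M, 4) xyxy -> (N, M) IoU matrix
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(0).prod(2)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_predictions(pred_boxes, pred_cls, gt_boxes, gt_cls,
                      iouv=torch.linspace(0.5, 0.95, 10)):
    correct = torch.zeros(pred_boxes.shape[0], iouv.numel(), dtype=torch.bool)
    for c in gt_cls.unique():
        pi = (pred_cls == c).nonzero(as_tuple=False).view(-1)  # predictions of class c
        ti = (gt_cls == c).nonzero(as_tuple=False).view(-1)    # targets of class c
        if not pi.numel():
            continue
        ious, idx = box_iou(pred_boxes[pi], gt_boxes[ti]).max(1)  # best target per prediction
        used = set()
        for j in (ious > iouv[0]).nonzero(as_tuple=False).view(-1):
            d = ti[idx[j]].item()
            if d not in used:  # each ground-truth box may only be matched once
                used.add(d)
                correct[pi[j]] = ious[j] > iouv
    return correct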

View File

@ -0,0 +1,180 @@
"""
将UAVDT转换为yolo v5格式
class_id, xc_norm, yc_norm, w_norm, h_norm
"""
import os
import os.path as osp
import argparse
import cv2
import glob
import numpy as np
import random
DATA_ROOT = '/data/wujiapeng/datasets/MOT17/'
image_wh_dict = {} # seq -> (w, h) dict, used for normalization
def generate_imgs_and_labels(opts):
"""
Generate the txt file of image paths and the YOLO-format ground-truth files
"""
if opts.split == 'test':
seq_list = os.listdir(osp.join(DATA_ROOT, 'test'))
else:
seq_list = os.listdir(osp.join(DATA_ROOT, 'train'))
seq_list = [item for item in seq_list if 'FRCNN' in item] # keep only the FRCNN copies, one detector variant is enough
if 'val' in opts.split: opts.half = True # the val split uses half of the training sequences
print('--------------------------')
print(f'Total {len(seq_list)} seqs!!')
print(seq_list)
if opts.random:
random.shuffle(seq_list)
# define categories: MOT has only one class
CATEGOTY_ID = 0 # pedestrian
# define the frame range
frame_range = {'start': 0.0, 'end': 1.0}
if opts.half: # half: keep only the first half of each sequence
frame_range['end'] = 0.5
if opts.split == 'test':
process_train_test(seqs=seq_list, frame_range=frame_range, cat_id=CATEGOTY_ID, split='test')
else:
process_train_test(seqs=seq_list, frame_range=frame_range, cat_id=CATEGOTY_ID, split=opts.split)
def process_train_test(seqs: list, frame_range: dict, cat_id: int = 0, split: str = 'train') -> None:
"""
Process the MOT17 train/test splits.
The steps are similar for both, so they are factored into this helper.
"""
for seq in seqs:
print(f'Dealing with {split} dataset...')
img_dir = osp.join(DATA_ROOT, 'train', seq, 'img1') if split != 'test' else osp.join(DATA_ROOT, 'test', seq, 'img1') # image directory
imgs = sorted(os.listdir(img_dir)) # relative paths of all images
seq_length = len(imgs) # sequence length
if split != 'test':
# get the image width and height
img_eg = cv2.imread(osp.join(img_dir, imgs[0]))
w0, h0 = img_eg.shape[1], img_eg.shape[0] # original width and height
ann_of_seq_path = os.path.join(img_dir, '../', 'gt', 'gt.txt') # GT file path
ann_of_seq = np.loadtxt(ann_of_seq_path, dtype=np.float32, delimiter=',') # GT content
gt_to_path = osp.join(DATA_ROOT, 'labels', split, seq) # ground-truth folder to write to
# create it if it does not exist
if not osp.exists(gt_to_path):
os.makedirs(gt_to_path)
exist_gts = [] # one flag per frame of this seq: whether the frame has any GT boxes
# frames without GT are later skipped when writing the image paths to train.txt
for idx, img in enumerate(imgs):
# img looks like: img000001.jpg
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
# step 1: create image symlinks
# print('step1, creating imgs symlink...')
if opts.generate_imgs:
img_to_path = osp.join(DATA_ROOT, 'images', split, seq) # where the images of this sequence are stored
if not osp.exists(img_to_path):
os.makedirs(img_to_path)
os.symlink(osp.join(img_dir, img),
osp.join(img_to_path, img)) # create symlink
# step 2: generate ground-truth files
# print('step2, generating gt files...')
ann_of_current_frame = ann_of_seq[ann_of_seq[:, 0] == float(idx + 1), :] # select the objects of the current frame from the GT file
exist_gts.append(True if ann_of_current_frame.shape[0] != 0 else False)
gt_to_file = osp.join(gt_to_path, img[: -4] + '.txt')
with open(gt_to_file, 'w') as f_gt:
for i in range(ann_of_current_frame.shape[0]):
if int(ann_of_current_frame[i][6]) == 1 and int(ann_of_current_frame[i][7]) == 1 \
and float(ann_of_current_frame[i][8]) > 0.25:
# bbox xywh
x0, y0 = int(ann_of_current_frame[i][2]), int(ann_of_current_frame[i][3])
x0, y0 = max(x0, 0), max(y0, 0)
w, h = int(ann_of_current_frame[i][4]), int(ann_of_current_frame[i][5])
xc, yc = x0 + w // 2, y0 + h // 2 # center point x, y
# normalize
xc, yc = xc / w0, yc / h0
xc, yc = min(xc, 1.0), min(yc, 1.0)
w, h = w / w0, h / h0
w, h = min(w, 1.0), min(h, 1.0)
assert w <= 1 and h <= 1, f'{w}, {h} must be normalized, original size {w0}, {h0}'
assert xc >= 0 and yc >= 0, f'{x0}, {y0} must be positive'
assert xc <= 1 and yc <= 1, f'{x0}, {y0} must be <= 1'
category_id = cat_id
write_line = '{:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
category_id, xc, yc, w, h)
f_gt.write(write_line)
f_gt.close()
else: # test: only create image symlinks
for idx, img in enumerate(imgs):
# img looks like: img000001.jpg
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
# step 1: create image symlinks
# print('step1, creating imgs symlink...')
if opts.generate_imgs:
img_to_path = osp.join(DATA_ROOT, 'images', split, seq) # where the images of this sequence are stored
if not osp.exists(img_to_path):
os.makedirs(img_to_path)
os.symlink(osp.join(img_dir, img),
osp.join(img_to_path, img)) # create symlink
# step 3: generate the image index file, e.g. train.txt
print(f'generating img index file of {seq}')
to_file = os.path.join('./mot17/', split + '.txt')
with open(to_file, 'a') as f:
for idx, img in enumerate(imgs):
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
if split == 'test' or exist_gts[idx]:
f.write('MOT17/' + 'images/' + split + '/' \
+ seq + '/' + img + '\n')
f.close()
if __name__ == '__main__':
if not osp.exists('./mot17'):
os.system('mkdir mot17')
parser = argparse.ArgumentParser()
parser.add_argument('--split', type=str, default='train', help='train, test or val')
parser.add_argument('--generate_imgs', action='store_true', help='whether to generate soft links of imgs')
parser.add_argument('--certain_seqs', action='store_true', help='for debug')
parser.add_argument('--half', action='store_true', help='half frames')
parser.add_argument('--ratio', type=float, default=0.8, help='ratio used to split train and test sequences')
parser.add_argument('--random', action='store_true', help='random split train and test')
opts = parser.parse_args()
generate_imgs_and_labels(opts)
# python tools/convert_MOT17_to_yolo.py --split train --generate_imgs
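# --- illustrative sketch (not part of the original file) ---
# The normalisation shared by the converter scripts in this commit, pulled out
# into a small helper: a top-left (x0, y0, w, h) pixel box becomes a
# "class_id xc yc w h" line with every value scaled to [0, 1]. to_yolo_line is
# a made-up name for this sketch.
def to_yolo_line(class_id, x0, y0, w, h, img_w, img_h):
    xc = min((x0 + w / 2) / img_w, 1.0)  # normalized box centre
    yc = min((y0 + h / 2) / img_h, 1.0)
    w, h = min(w / img_w, 1.0), min(h / img_h, 1.0)  # normalized box size
    return '{:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(class_id, xc, yc, w, h)

# e.g. to_yolo_line(0, 100, 200, 50, 80, 1920, 1080)
# -> '0 0.065104 0.222222 0.026042 0.074074\n'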

View File

@ -0,0 +1,159 @@
"""
将UAVDT转换为yolo v5格式
class_id, xc_norm, yc_norm, w_norm, h_norm
"""
import os
import os.path as osp
import argparse
import cv2
import glob
import numpy as np
import random
DATA_ROOT = '/data/wujiapeng/datasets/UAVDT/'
image_wh_dict = {} # seq -> (w, h) dict, used for normalization
def generate_imgs_and_labels(opts):
"""
Generate the txt file of image paths and the YOLO-format ground-truth files
"""
seq_list = os.listdir(osp.join(DATA_ROOT, 'UAV-benchmark-M'))
print('--------------------------')
print(f'Total {len(seq_list)} seqs!!')
# split into train and test
if opts.random:
random.shuffle(seq_list)
bound = int(opts.ratio * len(seq_list))
train_seq_list = seq_list[: bound]
test_seq_list = seq_list[bound:]
del bound
print(f'train dataset: {train_seq_list}')
print(f'test dataset: {test_seq_list}')
print('--------------------------')
if not osp.exists('./uavdt/'):
os.makedirs('./uavdt/')
# define categories: UAVDT has only one class
CATEGOTY_ID = 0 # car
# define the frame range
frame_range = {'start': 0.0, 'end': 1.0}
if opts.half: # half: keep only the first half of each sequence
frame_range['end'] = 0.5
# process train and test separately
process_train_test(train_seq_list, frame_range, CATEGOTY_ID, 'train')
process_train_test(test_seq_list, {'start': 0.0, 'end': 1.0}, CATEGOTY_ID, 'test')
print('All Done!!')
def process_train_test(seqs: list, frame_range: dict, cat_id: int = 0, split: str = 'train') -> None:
"""
Process the UAVDT train/test splits.
The steps are similar for both, so they are factored into this helper.
"""
for seq in seqs:
print('Dealing with train dataset...')
img_dir = osp.join(DATA_ROOT, 'UAV-benchmark-M', seq, 'img1') # image directory
imgs = sorted(os.listdir(img_dir)) # relative paths of all images
seq_length = len(imgs) # sequence length
# get the image width and height
img_eg = cv2.imread(osp.join(img_dir, imgs[0]))
w0, h0 = img_eg.shape[1], img_eg.shape[0] # original width and height
ann_of_seq_path = os.path.join(img_dir, '../', 'gt', 'gt.txt') # GT file path
ann_of_seq = np.loadtxt(ann_of_seq_path, dtype=np.float32, delimiter=',') # GT content
gt_to_path = osp.join(DATA_ROOT, 'labels', split, seq) # ground-truth folder to write to
# create it if it does not exist
if not osp.exists(gt_to_path):
os.makedirs(gt_to_path)
exist_gts = [] # one flag per frame of this seq: whether the frame has any GT boxes
# frames without GT are later skipped when writing the image paths to train.txt
for idx, img in enumerate(imgs):
# img looks like: img000001.jpg
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
# step 1: create image symlinks
# print('step1, creating imgs symlink...')
if opts.generate_imgs:
img_to_path = osp.join(DATA_ROOT, 'images', split, seq) # where the images of this sequence are stored
if not osp.exists(img_to_path):
os.makedirs(img_to_path)
os.symlink(osp.join(img_dir, img),
osp.join(img_to_path, img)) # create symlink
# step 2: generate ground-truth files
# print('step2, generating gt files...')
ann_of_current_frame = ann_of_seq[ann_of_seq[:, 0] == float(idx + 1), :] # select the objects of the current frame from the GT file
exist_gts.append(True if ann_of_current_frame.shape[0] != 0 else False)
gt_to_file = osp.join(gt_to_path, img[:-4] + '.txt')
with open(gt_to_file, 'w') as f_gt:
for i in range(ann_of_current_frame.shape[0]):
if int(ann_of_current_frame[i][6]) == 1:
# bbox xywh
x0, y0 = int(ann_of_current_frame[i][2]), int(ann_of_current_frame[i][3])
w, h = int(ann_of_current_frame[i][4]), int(ann_of_current_frame[i][5])
xc, yc = x0 + w // 2, y0 + h // 2 # center point x, y
# normalize
xc, yc = xc / w0, yc / h0
w, h = w / w0, h / h0
category_id = cat_id
write_line = '{:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
category_id, xc, yc, w, h)
f_gt.write(write_line)
f_gt.close()
# step 3: generate the image index file, e.g. train.txt
print(f'generating img index file of {seq}')
to_file = os.path.join('./uavdt/', split + '.txt')
with open(to_file, 'a') as f:
for idx, img in enumerate(imgs):
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
if exist_gts[idx]:
f.write('UAVDT/' + 'images/' + split + '/' \
+ seq + '/' + img + '\n')
f.close()
if __name__ == '__main__':
if not osp.exists('./uavdt'):
os.system('mkdir ./uavdt')
else:
os.system('rm -rf ./uavdt/*')
parser = argparse.ArgumentParser()
parser.add_argument('--generate_imgs', action='store_true', help='whether to generate soft links of imgs')
parser.add_argument('--certain_seqs', action='store_true', help='for debug')
parser.add_argument('--half', action='store_true', help='half frames')
parser.add_argument('--ratio', type=float, default=0.8, help='ratio used to split train and test sequences')
parser.add_argument('--random', action='store_true', help='random split train and test')
opts = parser.parse_args()
generate_imgs_and_labels(opts)
# python tools/convert_UAVDT_to_yolo.py --generate_imgs --half --random

View File

@ -0,0 +1,182 @@
"""
将VisDrone转换为yolo v5格式
class_id, xc_norm, yc_norm, w_norm, h_norm
"""
import os
import os.path as osp
import argparse
import cv2
import glob
DATA_ROOT = '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019'
# the following two sequence lists are only used when tracking cars only
certain_seqs = ['uav0000071_03240_v', 'uav0000072_04488_v','uav0000072_05448_v', 'uav0000072_06432_v','uav0000124_00944_v','uav0000126_00001_v','uav0000138_00000_v','uav0000145_00000_v','uav0000150_02310_v','uav0000222_03150_v','uav0000239_12336_v','uav0000243_00001_v',
'uav0000248_00001_v','uav0000263_03289_v','uav0000266_03598_v','uav0000273_00001_v','uav0000279_00001_v','uav0000281_00460_v','uav0000289_00001_v','uav0000289_06922_v','uav0000307_00000_v',
'uav0000308_00000_v','uav0000308_01380_v','uav0000326_01035_v','uav0000329_04715_v','uav0000361_02323_v','uav0000366_00001_v']
ignored_seqs = ['uav0000013_00000_v', 'uav0000013_01073_v', 'uav0000013_01392_v',
'uav0000020_00406_v', 'uav0000079_00480_v',
'uav0000084_00000_v', 'uav0000099_02109_v', 'uav0000086_00000_v',
'uav0000073_00600_v', 'uav0000073_04464_v', 'uav0000088_00290_v']
image_wh_dict = {} # seq -> (w, h) dict, used for normalization
def generate_imgs(split_name='VisDrone2019-MOT-train', generate_imgs=True, if_certain_seqs=False, car_only=False):
"""
Create the image folders, e.g. VisDrone/images/VisDrone2019-MOT-train/uav0000076_00720_v/000010.jpg
Also build the seq -> (w, h) dict for later steps.
split: str, 'VisDrone2019-MOT-train', 'VisDrone2019-MOT-val' or 'VisDrone2019-MOT-test-dev'
if_certain_seqs: bool, use for debug.
"""
if not if_certain_seqs:
seq_list = os.listdir(osp.join(DATA_ROOT, split_name, 'sequences')) # all sequence names
else:
seq_list = certain_seqs
if car_only: # when tracking cars only, skip pedestrian-heavy videos
seq_list = [seq for seq in seq_list if seq not in ignored_seqs]
# iterate over all sequences, create image symlinks and update the seq->(w,h) dict
if_write_txt = True if glob.glob('./visdrone/*.txt') else False
# if_write_txt = True if not osp.exists(f'./visdrone/.txt') else False # whether the txt needs to be written, used to generate visdrone.train
if not if_write_txt:
for seq in seq_list:
img_dir = osp.join(DATA_ROOT, split_name, 'sequences', seq) # path of all images in this sequence
imgs = sorted(os.listdir(img_dir)) # all images
if generate_imgs:
to_path = osp.join(DATA_ROOT, 'images', split_name, seq) # where the images of this sequence are stored
if not osp.exists(to_path):
os.makedirs(to_path)
for img in imgs: # iterate over the images of this sequence
os.symlink(osp.join(img_dir, img),
osp.join(to_path, img)) # create symlink
img_sample = cv2.imread(osp.join(img_dir, imgs[0])) # first image of each sequence, used to get w, h
w, h = img_sample.shape[1], img_sample.shape[0] # w, h
image_wh_dict[seq] = (w, h) # update the seq->(w,h) dict
# print(image_wh_dict)
# return
else:
with open('./visdrone.txt', 'a') as f:
for seq in seq_list:
img_dir = osp.join(DATA_ROOT, split_name, 'sequences', seq) # path of all images in this sequence
imgs = sorted(os.listdir(img_dir)) # all images
if generate_imgs:
to_path = osp.join(DATA_ROOT, 'images', split_name, seq) # where the images of this sequence are stored
if not osp.exists(to_path):
os.makedirs(to_path)
for img in imgs: # iterate over the images of this sequence
f.write('VisDrone2019/' + 'VisDrone2019/' + 'images/' + split_name + '/' \
+ seq + '/' + img + '\n')
os.symlink(osp.join(img_dir, img),
osp.join(to_path, img)) # create symlink
img_sample = cv2.imread(osp.join(img_dir, imgs[0])) # first image of each sequence, used to get w, h
w, h = img_sample.shape[1], img_sample.shape[0] # w, h
image_wh_dict[seq] = (w, h) # update the seq->(w,h) dict
f.close()
if if_certain_seqs: # for debug
print(image_wh_dict)
def generate_labels(split='VisDrone2019-MOT-train', if_certain_seqs=False, car_only=False):
"""
split: str, 'train', 'val' or 'test'
if_certain_seqs: bool, use for debug.
"""
# from choose_anchors import image_wh_dict
# print(image_wh_dict)
if not if_certain_seqs:
seq_list = os.listdir(osp.join(DATA_ROOT, split, 'sequences')) # sequence list
else:
seq_list = certain_seqs
if car_only: # when tracking cars only, skip pedestrian-heavy videos
seq_list = [seq for seq in seq_list if seq not in ignored_seqs]
category_list = ['4', '5', '6', '9']
else:
category_list = [str(i) for i in range(1, 11)]
# category IDs start from 0
category_dict = {category_list[idx]: idx for idx in range(len(category_list))}
# one label txt per image
# split out from the per-sequence annotation txt
for seq in seq_list:
seq_dir = osp.join(DATA_ROOT, split, 'annotations', seq + '.txt') # ground-truth file
with open(seq_dir, 'r') as f:
lines = f.readlines()
for row in lines:
current_line = row.split(',')
frame = current_line[0] # frame number
if current_line[6] == '0' or current_line[7] not in category_list:
continue
to_file = osp.join(DATA_ROOT, 'labels', split, seq) # path to write to
# create it if it does not exist
if not osp.exists(to_file):
os.makedirs(to_file)
to_file = osp.join(to_file, frame.zfill(7) + '.txt')
category_id = category_dict[current_line[7]]
x0, y0 = int(current_line[2]), int(current_line[3]) # top-left x, y
w, h = int(current_line[4]), int(current_line[5]) # width, height
x_c, y_c = x0 + w // 2, y0 + h // 2 # center point x, y
image_w, image_h = image_wh_dict[seq][0], image_wh_dict[seq][1] # image width and height
# normalize
w, h = w / image_w, h / image_h
x_c, y_c = x_c / image_w, y_c / image_h
with open(to_file, 'a') as f_to:
write_line = '{:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
category_id, x_c, y_c, w, h)
f_to.write(write_line)
f_to.close()
f.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--split', type=str, default='VisDrone2019-MOT-train', help='train or test')
parser.add_argument('--generate_imgs', action='store_true', help='whether to generate soft links of imgs')
parser.add_argument('--car_only', action='store_true', help='only cars')
parser.add_argument('--if_certain_seqs', action='store_true', help='for debug')
opt = parser.parse_args()
print('generating images...')
generate_imgs(opt.split, opt.generate_imgs, opt.if_certain_seqs, opt.car_only)
print('generating labels...')
generate_labels(opt.split, opt.if_certain_seqs, opt.car_only)
print('Done!')
# python convert_VisDrone_to_yolo.py --split VisDrone2019-MOT-train
# python convert_VisDrone_to_yolo.py --split VisDrone2019-MOT-train --car_only --if_certain_seqs

View File

@ -0,0 +1,168 @@
"""
将VisDrone转换为yolo v5格式
class_id, xc_norm, yc_norm, w_norm, h_norm
改动:
1. 将产生img和label函数合成一个
2. 增加如果无label就不产生当前img路径的功能
3. 增加half选项 每个视频截取一半
"""
import os
import os.path as osp
import argparse
import cv2
import glob
import numpy as np
DATA_ROOT = '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019'
# the following two sequence lists are only used when tracking cars only
certain_seqs = ['uav0000071_03240_v', 'uav0000072_04488_v','uav0000072_05448_v', 'uav0000072_06432_v','uav0000124_00944_v','uav0000126_00001_v','uav0000138_00000_v','uav0000145_00000_v','uav0000150_02310_v','uav0000222_03150_v','uav0000239_12336_v','uav0000243_00001_v',
'uav0000248_00001_v','uav0000263_03289_v','uav0000266_03598_v','uav0000273_00001_v','uav0000279_00001_v','uav0000281_00460_v','uav0000289_00001_v','uav0000289_06922_v','uav0000307_00000_v',
'uav0000308_00000_v','uav0000308_01380_v','uav0000326_01035_v','uav0000329_04715_v','uav0000361_02323_v','uav0000366_00001_v']
ignored_seqs = ['uav0000013_00000_v', 'uav0000013_01073_v', 'uav0000013_01392_v',
'uav0000020_00406_v', 'uav0000079_00480_v',
'uav0000084_00000_v', 'uav0000099_02109_v', 'uav0000086_00000_v',
'uav0000073_00600_v', 'uav0000073_04464_v', 'uav0000088_00290_v']
image_wh_dict = {} # seq -> (w, h) dict, used for normalization
def generate_imgs_and_labels(opts):
"""
Generate the txt file of image paths and the YOLO-format ground-truth files
"""
if not opts.certain_seqs:
seq_list = os.listdir(osp.join(DATA_ROOT, opts.split_name, 'sequences')) # all sequence names
else:
seq_list = certain_seqs
if opts.car_only: # when tracking cars only, skip pedestrian-heavy videos
seq_list = [seq for seq in seq_list if seq not in ignored_seqs]
category_list = [4, 5, 6, 9] # category IDs of interest, List[int]
else:
category_list = [i for i in range(1, 11)]
print(f'Total {len(seq_list)} seqs!!')
if not osp.exists('./visdrone/'):
os.makedirs('./visdrone/')
# category IDs start from 0
category_dict = {category_list[idx]: idx for idx in range(len(category_list))}
txt_name_dict = {'VisDrone2019-MOT-train': 'train',
'VisDrone2019-MOT-val': 'val',
'VisDrone2019-MOT-test-dev': 'test'} # mapping from split name to generated txt file name
# skip writing the txt if it already exists
write_txt = False if os.path.isfile(os.path.join('./visdrone', txt_name_dict[opts.split_name] + '.txt')) else True
print(f'write txt is {write_txt}')
frame_range = {'start': 0.0, 'end': 1.0}
if opts.half: # VisDrone-half: keep only the first half of each sequence
frame_range['end'] = 0.5
# process the data sequence by sequence
for seq in seq_list:
img_dir = osp.join(DATA_ROOT, opts.split_name, 'sequences', seq) # path of all images in this sequence
imgs = sorted(os.listdir(img_dir)) # all images
seq_length = len(imgs) # sequence length
img_eg = cv2.imread(os.path.join(img_dir, imgs[0])) # first image of the sequence, used to get width and height
w0, h0 = img_eg.shape[1], img_eg.shape[0] # original width and height
ann_of_seq_path = os.path.join(DATA_ROOT, opts.split_name, 'annotations', seq + '.txt') # GT file path
ann_of_seq = np.loadtxt(ann_of_seq_path, dtype=np.float32, delimiter=',') # GT content
gt_to_path = osp.join(DATA_ROOT, 'labels', opts.split_name, seq) # ground-truth folder to write to
# create it if it does not exist
if not osp.exists(gt_to_path):
os.makedirs(gt_to_path)
exist_gts = [] # one flag per frame of this seq: whether the frame has any GT boxes
# frames without GT are later skipped when writing the image paths to train.txt
for idx, img in enumerate(imgs):
# img: relative path, i.e. the image name, e.g. 0000001.jpg
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
# step 1: create image symlinks
# print('step1, creating imgs symlink...')
if opts.generate_imgs:
img_to_path = osp.join(DATA_ROOT, 'images', opts.split_name, seq) # where the images of this sequence are stored
if not osp.exists(img_to_path):
os.makedirs(img_to_path)
os.symlink(osp.join(img_dir, img),
osp.join(img_to_path, img)) # create symlink
# print('Done!\n')
# step 2: generate ground-truth files
# print('step2, generating gt files...')
# read from this sequence's ground-truth file
# ann_idx = int(ann_of_seq[:, 0]) == idx + 1
ann_of_current_frame = ann_of_seq[ann_of_seq[:, 0] == float(idx + 1), :] # select the objects of the current frame from the GT file
exist_gts.append(True if ann_of_current_frame.shape[0] != 0 else False)
gt_to_file = osp.join(gt_to_path, img[:-4] + '.txt')
with open(gt_to_file, 'a') as f_gt:
for i in range(ann_of_current_frame.shape[0]):
category = int(ann_of_current_frame[i][7])
if int(ann_of_current_frame[i][6]) == 1 and category in category_list:
# bbox xywh
x0, y0 = int(ann_of_current_frame[i][2]), int(ann_of_current_frame[i][3])
w, h = int(ann_of_current_frame[i][4]), int(ann_of_current_frame[i][5])
xc, yc = x0 + w // 2, y0 + h // 2 # center point x, y
# normalize
xc, yc = xc / w0, yc / h0
w, h = w / w0, h / h0
category_id = category_dict[category]
write_line = '{:d} {:.6f} {:.6f} {:.6f} {:.6f}\n'.format(
category_id, xc, yc, w, h)
f_gt.write(write_line)
f_gt.close()
# print('Done!\n')
print(f'img symlink and gt files of seq {seq} Done!')
# step 3: generate the image index file, e.g. train.txt
print(f'generating img index file of {seq}')
if write_txt:
to_file = os.path.join('./visdrone', txt_name_dict[opts.split_name] + '.txt')
with open(to_file, 'a') as f:
for idx, img in enumerate(imgs):
if idx < int(seq_length * frame_range['start']) or idx > int(seq_length * frame_range['end']):
continue
if exist_gts[idx]:
f.write('VisDrone2019/' + 'VisDrone2019/' + 'images/' + opts.split_name + '/' \
+ seq + '/' + img + '\n')
f.close()
print('All done!!')
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--split_name', type=str, default='VisDrone2019-MOT-train', help='train or test')
parser.add_argument('--generate_imgs', action='store_true', help='whether to generate soft links of imgs')
parser.add_argument('--car_only', action='store_true', help='only cars')
parser.add_argument('--certain_seqs', action='store_true', help='for debug')
parser.add_argument('--half', action='store_true', help='half frames')
opts = parser.parse_args()
generate_imgs_and_labels(opts)
# python tools/convert_VisDrone_to_yolov2.py --split_name VisDrone2019-MOT-train --generate_imgs --car_only --half

View File

@ -0,0 +1,479 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d7cbe5ee",
"metadata": {},
"source": [
"# Reparameterization"
]
},
{
"cell_type": "markdown",
"id": "13393b70",
"metadata": {},
"source": [
"## YOLOv7 reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf53becf",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.105.m.0.weight'].data[i, :, :, :] *= state_dict['model.105.im.0.implicit'].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.105.m.1.weight'].data[i, :, :, :] *= state_dict['model.105.im.1.implicit'].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.105.m.2.weight'].data[i, :, :, :] *= state_dict['model.105.im.2.implicit'].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.105.m.0.bias'].data += state_dict['model.105.m.0.weight'].mul(state_dict['model.105.ia.0.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.105.m.1.bias'].data += state_dict['model.105.m.1.weight'].mul(state_dict['model.105.ia.1.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.105.m.2.bias'].data += state_dict['model.105.m.2.weight'].mul(state_dict['model.105.ia.2.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.105.m.0.bias'].data *= state_dict['model.105.im.0.implicit'].data.squeeze()\n",
"model.state_dict()['model.105.m.1.bias'].data *= state_dict['model.105.im.1.implicit'].data.squeeze()\n",
"model.state_dict()['model.105.m.2.bias'].data *= state_dict['model.105.im.2.implicit'].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7.pt')\n"
]
},
{
"cell_type": "markdown",
"id": "5b396a53",
"metadata": {},
"source": [
"## YOLOv7x reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d54d17f",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7x.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7x.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.121.m.0.weight'].data[i, :, :, :] *= state_dict['model.121.im.0.implicit'].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.121.m.1.weight'].data[i, :, :, :] *= state_dict['model.121.im.1.implicit'].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.121.m.2.weight'].data[i, :, :, :] *= state_dict['model.121.im.2.implicit'].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.121.m.0.bias'].data += state_dict['model.121.m.0.weight'].mul(state_dict['model.121.ia.0.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.121.m.1.bias'].data += state_dict['model.121.m.1.weight'].mul(state_dict['model.121.ia.1.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.121.m.2.bias'].data += state_dict['model.121.m.2.weight'].mul(state_dict['model.121.ia.2.implicit']).sum(1).squeeze()\n",
"model.state_dict()['model.121.m.0.bias'].data *= state_dict['model.121.im.0.implicit'].data.squeeze()\n",
"model.state_dict()['model.121.m.1.bias'].data *= state_dict['model.121.im.1.implicit'].data.squeeze()\n",
"model.state_dict()['model.121.m.2.bias'].data *= state_dict['model.121.im.2.implicit'].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7x.pt')\n"
]
},
{
"cell_type": "markdown",
"id": "11a9108e",
"metadata": {},
"source": [
"## YOLOv7-W6 reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d032c629",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7-w6.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7-w6.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"idx = 118\n",
"idx2 = 122\n",
"\n",
"# copy weights of lead head\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data -= model.state_dict()['model.{}.m.0.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data -= model.state_dict()['model.{}.m.1.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data -= model.state_dict()['model.{}.m.2.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data -= model.state_dict()['model.{}.m.3.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data -= model.state_dict()['model.{}.m.0.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data -= model.state_dict()['model.{}.m.1.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data -= model.state_dict()['model.{}.m.2.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data -= model.state_dict()['model.{}.m.3.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.bias'.format(idx2)].data\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.{}.m.0.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.0.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.1.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.1.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.2.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.2.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.3.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.3.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].mul(state_dict['model.{}.ia.0.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].mul(state_dict['model.{}.ia.1.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].mul(state_dict['model.{}.ia.2.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].mul(state_dict['model.{}.ia.3.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data *= state_dict['model.{}.im.0.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data *= state_dict['model.{}.im.1.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data *= state_dict['model.{}.im.2.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data *= state_dict['model.{}.im.3.implicit'.format(idx2)].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7-w6.pt')\n"
]
},
{
"cell_type": "markdown",
"id": "5f093d43",
"metadata": {},
"source": [
"## YOLOv7-E6 reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa2b2142",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7-e6.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7-e6.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"idx = 140\n",
"idx2 = 144\n",
"\n",
"# copy weights of lead head\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data -= model.state_dict()['model.{}.m.0.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data -= model.state_dict()['model.{}.m.1.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data -= model.state_dict()['model.{}.m.2.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data -= model.state_dict()['model.{}.m.3.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data -= model.state_dict()['model.{}.m.0.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data -= model.state_dict()['model.{}.m.1.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data -= model.state_dict()['model.{}.m.2.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data -= model.state_dict()['model.{}.m.3.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.bias'.format(idx2)].data\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.{}.m.0.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.0.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.1.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.1.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.2.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.2.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.3.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.3.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].mul(state_dict['model.{}.ia.0.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].mul(state_dict['model.{}.ia.1.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].mul(state_dict['model.{}.ia.2.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].mul(state_dict['model.{}.ia.3.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data *= state_dict['model.{}.im.0.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data *= state_dict['model.{}.im.1.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data *= state_dict['model.{}.im.2.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data *= state_dict['model.{}.im.3.implicit'.format(idx2)].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7-e6.pt')\n"
]
},
{
"cell_type": "markdown",
"id": "a3bccf89",
"metadata": {},
"source": [
"## YOLOv7-D6 reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5216b70",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7-d6.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7-d6.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"idx = 162\n",
"idx2 = 166\n",
"\n",
"# copy weights of lead head\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data -= model.state_dict()['model.{}.m.0.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data -= model.state_dict()['model.{}.m.1.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data -= model.state_dict()['model.{}.m.2.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data -= model.state_dict()['model.{}.m.3.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data -= model.state_dict()['model.{}.m.0.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data -= model.state_dict()['model.{}.m.1.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data -= model.state_dict()['model.{}.m.2.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data -= model.state_dict()['model.{}.m.3.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.bias'.format(idx2)].data\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.{}.m.0.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.0.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.1.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.1.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.2.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.2.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.3.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.3.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].mul(state_dict['model.{}.ia.0.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].mul(state_dict['model.{}.ia.1.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].mul(state_dict['model.{}.ia.2.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].mul(state_dict['model.{}.ia.3.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data *= state_dict['model.{}.im.0.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data *= state_dict['model.{}.im.1.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data *= state_dict['model.{}.im.2.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data *= state_dict['model.{}.im.3.implicit'.format(idx2)].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7-d6.pt')\n"
]
},
{
"cell_type": "markdown",
"id": "334c273b",
"metadata": {},
"source": [
"## YOLOv7-E6E reparameterization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "635fd8d2",
"metadata": {},
"outputs": [],
"source": [
"# import\n",
"from copy import deepcopy\n",
"from models.yolo import Model\n",
"import torch\n",
"from utils.torch_utils import select_device, is_parallel\n",
"\n",
"device = select_device('0', batch_size=1)\n",
"# model trained by cfg/training/*.yaml\n",
"ckpt = torch.load('cfg/training/yolov7-e6e.pt', map_location=device)\n",
"# reparameterized model in cfg/deploy/*.yaml\n",
"model = Model('cfg/deploy/yolov7-e6e.yaml', ch=3, nc=80).to(device)\n",
"\n",
"# copy intersect weights\n",
"state_dict = ckpt['model'].float().state_dict()\n",
"exclude = []\n",
"intersect_state_dict = {k: v for k, v in state_dict.items() if k in model.state_dict() and not any(x in k for x in exclude) and v.shape == model.state_dict()[k].shape}\n",
"model.load_state_dict(intersect_state_dict, strict=False)\n",
"model.names = ckpt['model'].names\n",
"model.nc = ckpt['model'].nc\n",
"\n",
"idx = 261\n",
"idx2 = 265\n",
"\n",
"# copy weights of lead head\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data -= model.state_dict()['model.{}.m.0.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data -= model.state_dict()['model.{}.m.1.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data -= model.state_dict()['model.{}.m.2.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data -= model.state_dict()['model.{}.m.3.weight'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.weight'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.weight'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.weight'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.weight'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data -= model.state_dict()['model.{}.m.0.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data -= model.state_dict()['model.{}.m.1.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data -= model.state_dict()['model.{}.m.2.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data -= model.state_dict()['model.{}.m.3.bias'.format(idx)].data\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.bias'.format(idx2)].data\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.bias'.format(idx2)].data\n",
"\n",
"# reparametrized YOLOR\n",
"for i in range(255):\n",
" model.state_dict()['model.{}.m.0.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.0.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.1.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.1.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.2.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.2.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
" model.state_dict()['model.{}.m.3.weight'.format(idx)].data[i, :, :, :] *= state_dict['model.{}.im.3.implicit'.format(idx2)].data[:, i, : :].squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data += state_dict['model.{}.m.0.weight'.format(idx2)].mul(state_dict['model.{}.ia.0.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data += state_dict['model.{}.m.1.weight'.format(idx2)].mul(state_dict['model.{}.ia.1.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data += state_dict['model.{}.m.2.weight'.format(idx2)].mul(state_dict['model.{}.ia.2.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data += state_dict['model.{}.m.3.weight'.format(idx2)].mul(state_dict['model.{}.ia.3.implicit'.format(idx2)]).sum(1).squeeze()\n",
"model.state_dict()['model.{}.m.0.bias'.format(idx)].data *= state_dict['model.{}.im.0.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.1.bias'.format(idx)].data *= state_dict['model.{}.im.1.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.2.bias'.format(idx)].data *= state_dict['model.{}.im.2.implicit'.format(idx2)].data.squeeze()\n",
"model.state_dict()['model.{}.m.3.bias'.format(idx)].data *= state_dict['model.{}.im.3.implicit'.format(idx2)].data.squeeze()\n",
"\n",
"# model to be saved\n",
"ckpt = {'model': deepcopy(model.module if is_parallel(model) else model).half(),\n",
" 'optimizer': None,\n",
" 'training_results': None,\n",
" 'epoch': -1}\n",
"\n",
"# save reparameterized model\n",
"torch.save(ckpt, 'cfg/deploy/yolov7-e6e.pt')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63a62625",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
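For readers following the notebook above, the intersect-weight copy that every cell repeats can be illustrated on toy modules. The snippet below is a hedged sketch (the layers are made up and are not the YOLOv7 head): only keys present in both state dicts with matching shapes are transferred.

# Hedged sketch of the intersect-weight copy used in the cells above.
# The two small models below are illustrative stand-ins, not YOLOv7.
import torch.nn as nn

trained = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 8, 3))
deploy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3))

state_dict = trained.state_dict()
exclude = []
intersect_state_dict = {k: v for k, v in state_dict.items()
                        if k in deploy.state_dict()
                        and not any(x in k for x in exclude)
                        and v.shape == deploy.state_dict()[k].shape}
deploy.load_state_dict(intersect_state_dict, strict=False)
print(sorted(intersect_state_dict))  # only '0.weight' and '0.bias' survive the shape check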

View File

@ -0,0 +1,32 @@
# Config file of MOT17 dataset
DATASET_ROOT: '/data/wujiapeng/datasets/MOT17' # your dataset root
SPLIT: test
CATEGORY_NAMES: # category names to show
- 'pedestrian'
CATEGORY_DICT:
0: 'pedestrian'
CERTAIN_SEQS:
-
IGNORE_SEQS: # Seqs you want to ignore
-
YAML_DICT: '' # NOTE: ONLY for yolo v5 model loader(func DetectMultiBackend)
TRACK_EVAL: # If use TrackEval to evaluate, use these configs
'DISPLAY_LESS_PROGRESS': False
'GT_FOLDER': '/data/wujiapeng/datasets/MOT17/train'
'TRACKERS_FOLDER': './tracker/results'
'SKIP_SPLIT_FOL': True
'TRACKER_SUB_FOLDER': ''
'SEQ_INFO':
'MOT17-02-SDP': null
'MOT17-04-SDP': null
'MOT17-05-SDP': null
'MOT17-09-SDP': null
'MOT17-10-SDP': null
'MOT17-11-SDP': null
'MOT17-13-SDP': null
'GT_LOC_FORMAT': '{gt_folder}/{seq}/gt/gt.txt'
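These dataset configs are consumed by the tracking script further down via PyYAML; a minimal loading sketch follows (the file name 'mot17.yaml' is an assumption about how this config is saved).

# Hedged sketch: read a dataset config such as the MOT17 file above.
# './tracker/config_files/mot17.yaml' is an assumed path/name.
import yaml

with open('./tracker/config_files/mot17.yaml', 'r') as f:
    cfgs = yaml.load(f, Loader=yaml.FullLoader)

print(cfgs['DATASET_ROOT'], cfgs['SPLIT'])   # '/data/wujiapeng/datasets/MOT17' 'test'
print(cfgs['TRACK_EVAL']['GT_FOLDER'])       # ground-truth folder used by TrackEval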

View File

@ -0,0 +1,26 @@
# Config file of UAVDT dataset
DATASET_ROOT: '/data/wujiapeng/datasets/UAVDT' # your dataset root
SPLIT: test
CATEGORY_NAMES: # category names to show
- 'car'
CATEGORY_DICT:
0: 'car'
CERTAIN_SEQS:
-
IGNORE_SEQS: # Seqs you want to ignore
-
YAML_DICT: './data/UAVDT.yaml' # NOTE: ONLY for yolo v5 model loader(func DetectMultiBackend)
TRACK_EVAL: # If use TrackEval to evaluate, use these configs
'DISPLAY_LESS_PROGRESS': False
'GT_FOLDER': '/data/wujiapeng/datasets/UAVDT/UAV-benchmark-M'
'TRACKERS_FOLDER': './tracker/results'
'SKIP_SPLIT_FOL': True
'TRACKER_SUB_FOLDER': ''
'SEQ_INFO':
'M0101': 407
'GT_LOC_FORMAT': '{gt_folder}/{seq}/gt/gt.txt'

View File

@ -0,0 +1,61 @@
# Config file of VisDrone dataset
DATASET_ROOT: '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019'
SPLIT: test
CATEGORY_NAMES:
- 'pedestrian'
- 'people'
- 'bicycle'
- 'car'
- 'van'
- 'truck'
- 'tricycle'
- 'awning-tricycle'
- 'bus'
- 'motor'
CATEGORY_DICT:
0: 'pedestrian'
1: 'people'
2: 'bicycle'
3: 'car'
4: 'van'
5: 'truck'
6: 'tricycle'
7: 'awning-tricycle'
8: 'bus'
9: 'motor'
CERTAIN_SEQS:
-
IGNORE_SEQS: # Seqs you want to ignore
-
YAML_DICT: './data/Visdrone_all.yaml' # NOTE: ONLY for yolo v5 model loader(func DetectMultiBackend)
TRACK_EVAL: # If use TrackEval to evaluate, use these configs
'DISPLAY_LESS_PROGRESS': False
'GT_FOLDER': '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019/VisDrone2019-MOT-test-dev/annotations'
'TRACKERS_FOLDER': './tracker/results'
'SKIP_SPLIT_FOL': True
'TRACKER_SUB_FOLDER': ''
'SEQ_INFO':
'uav0000009_03358_v': 219
'uav0000073_00600_v': 328
'uav0000073_04464_v': 312
'uav0000077_00720_v': 780
'uav0000088_00290_v': 296
'uav0000119_02301_v': 179
'uav0000120_04775_v': 1000
'uav0000161_00000_v': 308
'uav0000188_00000_v': 260
'uav0000201_00000_v': 677
'uav0000249_00001_v': 360
'uav0000249_02688_v': 244
'uav0000297_00000_v': 146
'uav0000297_02761_v': 373
'uav0000306_00230_v': 420
'uav0000355_00001_v': 468
'uav0000370_00001_v': 265
'GT_LOC_FORMAT': '{gt_folder}/{seq}.txt'

View File

@ -0,0 +1,51 @@
# Config file of VisDrone dataset
DATASET_ROOT: '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019'
SPLIT: test
CATEGORY_NAMES:
- 'pedestrian'
- 'car'
- 'van'
- 'truck'
- 'bus'
CATEGORY_DICT:
0: 'pedestrian'
1: 'car'
2: 'van'
3: 'truck'
4: 'bus'
CERTAIN_SEQS:
-
IGNORE_SEQS: # Seqs you want to ignore
-
YAML_DICT: './data/Visdrone_all.yaml' # NOTE: ONLY for yolo v5 model loader(func DetectMultiBackend)
TRACK_EVAL: # If use TrackEval to evaluate, use these configs
'DISPLAY_LESS_PROGRESS': False
'GT_FOLDER': '/data/wujiapeng/datasets/VisDrone2019/VisDrone2019/VisDrone2019-MOT-test-dev/annotations'
'TRACKERS_FOLDER': './tracker/results'
'SKIP_SPLIT_FOL': True
'TRACKER_SUB_FOLDER': ''
'SEQ_INFO':
'uav0000009_03358_v': 219
'uav0000073_00600_v': 328
'uav0000073_04464_v': 312
'uav0000077_00720_v': 780
'uav0000088_00290_v': 296
'uav0000119_02301_v': 179
'uav0000120_04775_v': 1000
'uav0000161_00000_v': 308
'uav0000188_00000_v': 260
'uav0000201_00000_v': 677
'uav0000249_00001_v': 360
'uav0000249_02688_v': 244
'uav0000297_00000_v': 146
'uav0000297_02761_v': 373
'uav0000306_00230_v': 420
'uav0000355_00001_v': 468
'uav0000370_00001_v': 265
'GT_LOC_FORMAT': '{gt_folder}/{seq}.txt'

View File

@ -0,0 +1,37 @@
import time
class Timer(object):
"""A simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
self.duration = 0.
def tic(self):
# using time.time instead of time.clock because time.clock
# does not normalize for multithreading
self.start_time = time.time()
def toc(self, average=True):
self.diff = time.time() - self.start_time
self.total_time += self.diff
self.calls += 1
self.average_time = self.total_time / self.calls
if average:
self.duration = self.average_time
else:
self.duration = self.diff
return self.duration
def clear(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
self.duration = 0.
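A short usage sketch for the Timer above (not part of the original file); the module name my_timer matches the import used by the tracking script below.

# Hedged usage sketch of the Timer class defined above.
import time
from my_timer import Timer  # module name taken from the import in the tracking script

timer = Timer()
for _ in range(3):
    timer.tic()
    time.sleep(0.01)   # stand-in for per-frame work
    timer.toc()        # average=True by default, so duration holds the running mean
print(timer.calls, round(timer.average_time, 4), round(timer.total_time, 4))
timer.clear()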

View File

@ -0,0 +1,305 @@
"""
main code for track
"""
import sys, os
import numpy as np
import torch
import cv2
from PIL import Image
from tqdm import tqdm
import yaml
from loguru import logger
import argparse
from tracking_utils.envs import select_device
from tracking_utils.tools import *
from tracking_utils.visualization import plot_img, save_video
from my_timer import Timer
from tracker_dataloader import TestDataset
# trackers
from trackers.byte_tracker import ByteTracker
from trackers.sort_tracker import SortTracker
from trackers.botsort_tracker import BotTracker
from trackers.c_biou_tracker import C_BIoUTracker
from trackers.ocsort_tracker import OCSortTracker
from trackers.deepsort_tracker import DeepSortTracker
from trackers.strongsort_tracker import StrongSortTracker
from trackers.sparse_tracker import SparseTracker
# YOLOX modules
try:
from yolox.exp import get_exp
from yolox_utils.postprocess import postprocess_yolox
from yolox.utils import fuse_model
except Exception as e:
logger.warning(e)
logger.warning('Load yolox fail. If you want to use yolox, please check the installation.')
pass
# YOLOv7 modules
try:
sys.path.append(os.getcwd())
from models.experimental import attempt_load
from utils.torch_utils import select_device, time_synchronized, TracedModel
from utils.general import non_max_suppression, scale_coords, check_img_size
from yolov7_utils.postprocess import postprocess as postprocess_yolov7
except Exception as e:
logger.warning(e)
logger.warning('Load yolov7 fail. If you want to use yolov7, please check the installation.')
pass
# YOLOv8 modules
try:
from ultralytics import YOLO
from yolov8_utils.postprocess import postprocess as postprocess_yolov8
except Exception as e:
logger.warning(e)
logger.warning('Load yolov8 fail. If you want to use yolov8, please check the installation.')
pass
TRACKER_DICT = {
'sort': SortTracker,
'bytetrack': ByteTracker,
'botsort': BotTracker,
'c_bioutrack': C_BIoUTracker,
'ocsort': OCSortTracker,
'deepsort': DeepSortTracker,
'strongsort': StrongSortTracker,
'sparsetrack': SparseTracker
}
def get_args():
parser = argparse.ArgumentParser()
"""general"""
parser.add_argument('--dataset', type=str, default='visdrone_part', help='visdrone, mot17, etc.')
parser.add_argument('--detector', type=str, default='yolov8', help='yolov7, yolox, etc.')
parser.add_argument('--tracker', type=str, default='sort', help='sort, deepsort, etc')
parser.add_argument('--reid_model', type=str, default='osnet_x0_25', help='osnet or deepsort')
parser.add_argument('--kalman_format', type=str, default='default', help='use what kind of Kalman, sort, deepsort, byte, etc.')
parser.add_argument('--img_size', type=int, default=1280, help='image size, [h, w]')
parser.add_argument('--conf_thresh', type=float, default=0.2, help='filter tracks')
parser.add_argument('--nms_thresh', type=float, default=0.7, help='thresh for NMS')
parser.add_argument('--iou_thresh', type=float, default=0.5, help='IOU thresh to filter tracks')
parser.add_argument('--device', type=str, default='6', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
"""yolox"""
parser.add_argument('--yolox_exp_file', type=str, default='./tracker/yolox_utils/yolox_m.py')
"""model path"""
parser.add_argument('--detector_model_path', type=str, default='./weights/best.pt', help='model path')
parser.add_argument('--trace', type=bool, default=False, help='traced model of YOLO v7')
# other model path
parser.add_argument('--reid_model_path', type=str, default='./weights/osnet_x0_25.pth', help='path for reid model path')
parser.add_argument('--dhn_path', type=str, default='./weights/DHN.pth', help='path of DHN path for DeepMOT')
"""other options"""
parser.add_argument('--discard_reid', action='store_true', help='discard the ReID model; only relevant for trackers like BoT-SORT that need a ReID part')
parser.add_argument('--track_buffer', type=int, default=30, help='tracking buffer')
parser.add_argument('--gamma', type=float, default=0.1, help='param to control the fusion of motion and appearance distances')
parser.add_argument('--min_area', type=float, default=150, help='used to filter out small bboxes')
parser.add_argument('--save_dir', type=str, default='track_results/{dataset_name}/{split}')
parser.add_argument('--save_images', action='store_true', help='save tracking results (image)')
parser.add_argument('--save_videos', action='store_true', help='save tracking results (video)')
parser.add_argument('--track_eval', type=bool, default=True, help='Use TrackEval to evaluate')
return parser.parse_args()
def main(args, dataset_cfgs):
"""1. set some params"""
# NOTE: if save video, you must save image
if args.save_videos:
args.save_images = True
"""2. load detector"""
device = select_device(args.device)
if args.detector == 'yolox':
exp = get_exp(args.yolox_exp_file, None) # TODO: modify num_classes etc. for specific dataset
model_img_size = exp.input_size
model = exp.get_model()
model.to(device)
model.eval()
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
ckpt = torch.load(args.detector_model_path, map_location=device)
model.load_state_dict(ckpt['model'])
logger.info("loaded checkpoint done")
model = fuse_model(model)
stride = None # match with yolo v7
logger.info(f'Now detector is on device {next(model.parameters()).device}')
elif args.detector == 'yolov7':
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
model = attempt_load(args.detector_model_path, map_location=device)
# get inference img size
stride = int(model.stride.max()) # model stride
model_img_size = check_img_size(args.img_size, s=stride) # check img_size
# Traced model
model = TracedModel(model, device=device, img_size=args.img_size)
# model.half()
logger.info("loaded checkpoint done")
logger.info(f'Now detector is on device {next(model.parameters()).device}')
elif args.detector == 'yolov8':
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
model = YOLO(args.detector_model_path)
model_img_size = [None, None]
stride = None
logger.info("loaded checkpoint done")
else:
logger.error(f"detector {args.detector} is not supported")
exit(0)
"""3. load sequences"""
DATA_ROOT = dataset_cfgs['DATASET_ROOT']
SPLIT = dataset_cfgs['SPLIT']
seqs = sorted(os.listdir(os.path.join(DATA_ROOT, 'images', SPLIT)))
seqs = [seq for seq in seqs if seq not in dataset_cfgs['IGNORE_SEQS']]
if None not in dataset_cfgs['CERTAIN_SEQS']:
seqs = dataset_cfgs['CERTAIN_SEQS']
logger.info(f'Total {len(seqs)} seqs will be tracked: {seqs}')
save_dir = args.save_dir.format(dataset_name=args.dataset, split=SPLIT)
"""4. Tracking"""
# set timer
timer = Timer()
seq_fps = []
for seq in seqs:
logger.info(f'--------------tracking seq {seq}--------------')
dataset = TestDataset(DATA_ROOT, SPLIT, seq_name=seq, img_size=model_img_size, model=args.detector, stride=stride)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)
tracker = TRACKER_DICT[args.tracker](args, )
process_bar = enumerate(data_loader)
process_bar = tqdm(process_bar, total=len(data_loader), ncols=150)
results = []
for frame_idx, (ori_img, img) in process_bar:
# start timing this frame
timer.tic()
if args.detector == 'yolov8':
img = img.squeeze(0).cpu().numpy()
else:
img = img.to(device) # (1, C, H, W)
img = img.float()
ori_img = ori_img.squeeze(0)
# get detector output
with torch.no_grad():
if args.detector == 'yolov8':
output = model.predict(img, conf=args.conf_thresh, iou=args.nms_thresh)
else:
output = model(img)
# postprocess output to original scales
if args.detector == 'yolox':
output = postprocess_yolox(output, len(dataset_cfgs['CATEGORY_NAMES']), conf_thresh=args.conf_thresh,
img=img, ori_img=ori_img)
elif args.detector == 'yolov7':
output = postprocess_yolov7(output, args.conf_thresh, args.nms_thresh, img.shape[2:], ori_img.shape)
elif args.detector == 'yolov8':
output = postprocess_yolov8(output)
else: raise NotImplementedError
# output: (tlbr, conf, cls)
# convert tlbr to tlwh
if isinstance(output, torch.Tensor):
output = output.detach().cpu().numpy()
output[:, 2] -= output[:, 0]
output[:, 3] -= output[:, 1]
current_tracks = tracker.update(output, img, ori_img.cpu().numpy())
# save results
cur_tlwh, cur_id, cur_cls, cur_score = [], [], [], []
for trk in current_tracks:
bbox = trk.tlwh
id = trk.track_id
cls = trk.category
score = trk.score
# filter low area bbox
if bbox[2] * bbox[3] > args.min_area:
cur_tlwh.append(bbox)
cur_id.append(id)
cur_cls.append(cls)
cur_score.append(score)
# results.append((frame_id + 1, id, bbox, cls))
results.append((frame_idx + 1, cur_id, cur_tlwh, cur_cls, cur_score))
timer.toc()
if args.save_images:
plot_img(img=ori_img, frame_id=frame_idx, results=[cur_tlwh, cur_id, cur_cls],
save_dir=os.path.join(save_dir, 'vis_results'))
save_results(folder_name=os.path.join(args.dataset, SPLIT),
seq_name=seq,
results=results)
# show the fps
seq_fps.append(frame_idx / timer.total_time)
logger.info(f'fps of seq {seq}: {seq_fps[-1]}')
timer.clear()
if args.save_videos:
save_video(images_path=os.path.join(save_dir, 'vis_results'))
logger.info(f'save video of {seq} done')
# show the average fps
logger.info(f'average fps: {np.mean(seq_fps)}')
if __name__ == '__main__':
args = get_args()
with open(f'./tracker/config_files/{args.dataset}.yaml', 'r') as f:
cfgs = yaml.load(f, Loader=yaml.FullLoader)
main(args, cfgs)
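One detail worth pinning down in the loop above is the in-place tlbr-to-tlwh conversion applied to the detector output before tracker.update; a tiny worked example with made-up values:

# Hedged worked example of the tlbr -> tlwh conversion performed above.
import numpy as np

output = np.array([[10., 20., 50., 80., 0.9, 0.]])  # (x1, y1, x2, y2, conf, cls)
output[:, 2] -= output[:, 0]   # width  = x2 - x1 -> 40
output[:, 3] -= output[:, 1]   # height = y2 - y1 -> 60
print(output)                  # [[10. 20. 40. 60.  0.9  0. ]]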

View File

@ -0,0 +1,266 @@
"""
main code for track
"""
import sys, os
import numpy as np
import torch
import cv2
from PIL import Image
from tqdm import tqdm
import yaml
from loguru import logger
import argparse
from tracking_utils.envs import select_device
from tracking_utils.tools import *
from tracking_utils.visualization import plot_img, save_video
from tracker_dataloader import TestDataset, DemoDataset
# trackers
from trackers.byte_tracker import ByteTracker
from trackers.sort_tracker import SortTracker
from trackers.botsort_tracker import BotTracker
from trackers.c_biou_tracker import C_BIoUTracker
from trackers.ocsort_tracker import OCSortTracker
from trackers.deepsort_tracker import DeepSortTracker
# YOLOX modules
try:
from yolox.exp import get_exp
from yolox_utils.postprocess import postprocess_yolox
from yolox.utils import fuse_model
except Exception as e:
logger.warning(e)
logger.warning('Load yolox fail. If you want to use yolox, please check the installation.')
pass
# YOLOv7 modules
try:
sys.path.append(os.getcwd())
from models.experimental import attempt_load
from utils.torch_utils import select_device, time_synchronized, TracedModel
from utils.general import non_max_suppression, scale_coords, check_img_size
from yolov7_utils.postprocess import postprocess as postprocess_yolov7
except Exception as e:
logger.warning(e)
logger.warning('Load yolov7 fail. If you want to use yolov7, please check the installation.')
pass
# YOLOv8 modules
try:
from ultralytics import YOLO
from yolov8_utils.postprocess import postprocess as postprocess_yolov8
except Exception as e:
logger.warning(e)
logger.warning('Load yolov8 fail. If you want to use yolov8, please check the installation.')
pass
TRACKER_DICT = {
'sort': SortTracker,
'bytetrack': ByteTracker,
'botsort': BotTracker,
'c_bioutrack': C_BIoUTracker,
'ocsort': OCSortTracker,
'deepsort': DeepSortTracker
}
def get_args():
parser = argparse.ArgumentParser()
"""general"""
parser.add_argument('--obj', type=str, required=True, default='demo.mp4', help='video or images folder PATH')
parser.add_argument('--detector', type=str, default='yolov8', help='yolov7, yolox, etc.')
parser.add_argument('--tracker', type=str, default='sort', help='sort, deepsort, etc')
parser.add_argument('--reid_model', type=str, default='osnet_x0_25', help='osnet or deepsort')
parser.add_argument('--kalman_format', type=str, default='default', help='use what kind of Kalman, sort, deepsort, byte, etc.')
parser.add_argument('--img_size', type=int, default=1280, help='image size, [h, w]')
parser.add_argument('--conf_thresh', type=float, default=0.2, help='filter tracks')
parser.add_argument('--nms_thresh', type=float, default=0.7, help='thresh for NMS')
parser.add_argument('--iou_thresh', type=float, default=0.5, help='IOU thresh to filter tracks')
parser.add_argument('--device', type=str, default='6', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
"""yolox"""
parser.add_argument('--num_classes', type=int, default=1)
parser.add_argument('--yolox_exp_file', type=str, default='./tracker/yolox_utils/yolox_m.py')
"""model path"""
parser.add_argument('--detector_model_path', type=str, default='./weights/best.pt', help='model path')
parser.add_argument('--trace', type=bool, default=False, help='traced model of YOLO v7')
# other model path
parser.add_argument('--reid_model_path', type=str, default='./weights/osnet_x0_25.pth', help='path for reid model path')
parser.add_argument('--dhn_path', type=str, default='./weights/DHN.pth', help='path of DHN path for DeepMOT')
"""other options"""
parser.add_argument('--discard_reid', action='store_true', help='discard the ReID model; only relevant for trackers like BoT-SORT that need a ReID part')
parser.add_argument('--track_buffer', type=int, default=30, help='tracking buffer')
parser.add_argument('--gamma', type=float, default=0.1, help='param to control the fusion of motion and appearance distances')
parser.add_argument('--min_area', type=float, default=150, help='used to filter out small bboxes')
parser.add_argument('--save_dir', type=str, default='track_demo_results')
parser.add_argument('--save_images', action='store_true', help='save tracking results (image)')
parser.add_argument('--save_videos', action='store_true', help='save tracking results (video)')
parser.add_argument('--track_eval', type=bool, default=True, help='Use TrackEval to evaluate')
return parser.parse_args()
def main(args):
"""1. set some params"""
# NOTE: if save video, you must save image
if args.save_videos:
args.save_images = True
"""2. load detector"""
device = select_device(args.device)
if args.detector == 'yolox':
exp = get_exp(args.yolox_exp_file, None) # TODO: modify num_classes etc. for specific dataset
model_img_size = exp.input_size
model = exp.get_model()
model.to(device)
model.eval()
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
ckpt = torch.load(args.detector_model_path, map_location=device)
model.load_state_dict(ckpt['model'])
logger.info("loaded checkpoint done")
model = fuse_model(model)
stride = None # match with yolo v7
logger.info(f'Now detector is on device {next(model.parameters()).device}')
elif args.detector == 'yolov7':
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
model = attempt_load(args.detector_model_path, map_location=device)
# get inference img size
stride = int(model.stride.max()) # model stride
model_img_size = check_img_size(args.img_size, s=stride) # check img_size
# Traced model
model = TracedModel(model, device=device, img_size=args.img_size)
# model.half()
logger.info("loaded checkpoint done")
logger.info(f'Now detector is on device {next(model.parameters()).device}')
elif args.detector == 'yolov8':
logger.info(f"loading detector {args.detector} checkpoint {args.detector_model_path}")
model = YOLO(args.detector_model_path)
model_img_size = [None, None]
stride = None
logger.info("loaded checkpoint done")
else:
logger.error(f"detector {args.detector} is not supported")
exit(0)
"""3. load sequences"""
dataset = DemoDataset(file_name=args.obj, img_size=model_img_size, model=args.detector, stride=stride, )
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)
tracker = TRACKER_DICT[args.tracker](args, )
save_dir = args.save_dir
process_bar = enumerate(data_loader)
process_bar = tqdm(process_bar, total=len(data_loader), ncols=150)
results = []
"""4. Tracking"""
for frame_idx, (ori_img, img) in process_bar:
if args.detector == 'yolov8':
img = img.squeeze(0).cpu().numpy()
else:
img = img.to(device) # (1, C, H, W)
img = img.float()
ori_img = ori_img.squeeze(0)
# get detector output
with torch.no_grad():
if args.detector == 'yolov8':
output = model.predict(img, conf=args.conf_thresh, iou=args.nms_thresh)
else:
output = model(img)
# postprocess output to original scales
if args.detector == 'yolox':
output = postprocess_yolox(output, args.num_classes, conf_thresh=args.conf_thresh,
img=img, ori_img=ori_img)
elif args.detector == 'yolov7':
output = postprocess_yolov7(output, args.conf_thresh, args.nms_thresh, img.shape[2:], ori_img.shape)
elif args.detector == 'yolov8':
output = postprocess_yolov8(output)
else: raise NotImplementedError
# output: (tlbr, conf, cls)
# convert tlbr to tlwh
if isinstance(output, torch.Tensor):
output = output.detach().cpu().numpy()
output[:, 2] -= output[:, 0]
output[:, 3] -= output[:, 1]
current_tracks = tracker.update(output, img, ori_img.cpu().numpy())
# save results
cur_tlwh, cur_id, cur_cls, cur_score = [], [], [], []
for trk in current_tracks:
bbox = trk.tlwh
id = trk.track_id
cls = trk.category
score = trk.score
# filter low area bbox
if bbox[2] * bbox[3] > args.min_area:
cur_tlwh.append(bbox)
cur_id.append(id)
cur_cls.append(cls)
cur_score.append(score)
# results.append((frame_id + 1, id, bbox, cls))
results.append((frame_idx + 1, cur_id, cur_tlwh, cur_cls, cur_score))
if args.save_images:
plot_img(img=ori_img, frame_id=frame_idx, results=[cur_tlwh, cur_id, cur_cls],
save_dir=os.path.join(save_dir, 'vis_results'))
save_results(folder_name=os.path.join(save_dir, 'txt_results'),
seq_name='demo',
results=results)
if args.save_videos:
save_video(images_path=os.path.join(save_dir, 'vis_results'))
logger.info(f'save video done')
if __name__ == '__main__':
args = get_args()
main(args)
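For completeness, the demo entry point above can also be driven without the command line by hand-building the namespace that get_args would return; everything below (paths and values) is an assumption, not part of the original file.

# Hedged sketch: arguments mirroring get_args() above; values are illustrative.
from argparse import Namespace

args = Namespace(
    obj='demo.mp4', detector='yolov8', tracker='sort',
    reid_model='osnet_x0_25', kalman_format='default',
    img_size=1280, conf_thresh=0.2, nms_thresh=0.7, iou_thresh=0.5,
    device='0', num_classes=1,
    yolox_exp_file='./tracker/yolox_utils/yolox_m.py',
    detector_model_path='./weights/best.pt', trace=False,
    reid_model_path='./weights/osnet_x0_25.pth', dhn_path='./weights/DHN.pth',
    discard_reid=False, track_buffer=30, gamma=0.1, min_area=150,
    save_dir='track_demo_results', save_images=False, save_videos=False,
    track_eval=True,
)
# main(args)  # uncomment when running inside this repository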

View File

@ -0,0 +1,223 @@
import numpy as np
import torch
import cv2
import os
import os.path as osp
from torch.utils.data import Dataset
class TestDataset(Dataset):
""" This class generate origin image, preprocessed image for inference
NOTE: for every sequence, initialize a TestDataset class
"""
def __init__(self, data_root, split, seq_name, img_size=[640, 640], legacy_yolox=True, model='yolox', **kwargs) -> None:
"""
Args:
data_root: path for entire dataset
seq_name: name of sequence
img_size: List[int, int] | Tuple[int, int] image size for detection model
legacy_yolox: bool, to be compatible with older versions of yolox
model: detection model; currently supports yolox, yolov7 and yolov8
"""
super().__init__()
self.model = model
self.data_root = data_root
self.seq_name = seq_name
self.img_size = img_size
self.split = split
self.seq_path = osp.join(self.data_root, 'images', self.split, self.seq_name)
self.imgs_in_seq = sorted(os.listdir(self.seq_path))
self.legacy = legacy_yolox
self.other_param = kwargs
def __getitem__(self, idx):
if self.model == 'yolox':
return self._getitem_yolox(idx)
elif self.model == 'yolov7':
return self._getitem_yolov7(idx)
elif self.model == 'yolov8':
return self._getitem_yolov8(idx)
def _getitem_yolox(self, idx):
img = cv2.imread(osp.join(self.seq_path, self.imgs_in_seq[idx]))
img_resized, _ = self._preprocess_yolox(img, self.img_size, )
if self.legacy:
img_resized = img_resized[::-1, :, :].copy() # BGR -> RGB
img_resized /= 255.0
img_resized -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
img_resized /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
return torch.from_numpy(img), torch.from_numpy(img_resized)
def _getitem_yolov7(self, idx):
img = cv2.imread(osp.join(self.seq_path, self.imgs_in_seq[idx]))
img_resized = self._preprocess_yolov7(img, ) # torch.Tensor
return torch.from_numpy(img), img_resized
def _getitem_yolov8(self, idx):
img = cv2.imread(osp.join(self.seq_path, self.imgs_in_seq[idx])) # (h, w, c)
# img = self._preprocess_yolov8(img)
return torch.from_numpy(img), torch.from_numpy(img)
def _preprocess_yolox(self, img, size, swap=(2, 0, 1)):
""" convert origin image to resized image, YOLOX-manner
Args:
img: np.ndarray
size: List[int, int] | Tuple[int, int]
swap: (H, W, C) -> (C, H, W)
Returns:
np.ndarray, float
"""
if len(img.shape) == 3:
padded_img = np.ones((size[0], size[1], 3), dtype=np.uint8) * 114
else:
padded_img = np.ones(size, dtype=np.uint8) * 114
r = min(size[0] / img.shape[0], size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR,
).astype(np.uint8)
padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
def _preprocess_yolov7(self, img, ):
img_resized = self._letterbox(img, new_shape=self.img_size, stride=self.other_param['stride'], )[0]
img_resized = img_resized[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, HWC to CHW
img_resized = np.ascontiguousarray(img_resized)
img_resized = torch.from_numpy(img_resized).float()
img_resized /= 255.0
return img_resized
def _preprocess_yolov8(self, img, ):
img = img.transpose((2, 0, 1))
img = np.ascontiguousarray(img)
return img
def _letterbox(self, img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
# Resize and pad image while meeting stride-multiple constraints
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if auto: # minimum rectangle
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
elif scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
return img, ratio, (dw, dh)
def __len__(self, ):
return len(self.imgs_in_seq)
class DemoDataset(TestDataset):
"""
dataset for demo
"""
def __init__(self, file_name, img_size=[640, 640], model='yolox', legacy_yolox=True, **kwargs) -> None:
self.file_name = file_name
self.model = model
self.img_size = img_size
self.is_video = '.mp4' in file_name or '.avi' in file_name
if not self.is_video:
self.imgs_in_seq = sorted(os.listdir(file_name))
else:
self.imgs_in_seq = []
self.cap = cv2.VideoCapture(file_name)
while True:
ret, frame = self.cap.read()
if not ret: break
self.imgs_in_seq.append(frame)
self.legacy = legacy_yolox
def __getitem__(self, idx):
if not self.is_video:
img = cv2.imread(osp.join(self.file_name, self.imgs_in_seq[idx]))
else:
img = self.imgs_in_seq[idx]
if self.model == 'yolox':
return self._getitem_yolox(img)
elif self.model == 'yolov7':
return self._getitem_yolov7(img)
elif self.model == 'yolov8':
return self._getitem_yolov8(img)
def _getitem_yolox(self, img):
img_resized, _ = self._preprocess_yolox(img, self.img_size, )
if self.legacy:
img_resized = img_resized[::-1, :, :].copy() # BGR -> RGB
img_resized /= 255.0
img_resized -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
img_resized /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
return torch.from_numpy(img), torch.from_numpy(img_resized)
def _getitem_yolov7(self, img):
img_resized = self._preprocess_yolov7(img, ) # torch.Tensor
return torch.from_numpy(img), img_resized
def _getitem_yolov8(self, img):
# img = self._preprocess_yolov8(img)
return torch.from_numpy(img), torch.from_numpy(img)
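The letterbox arithmetic inside _letterbox above is easy to trace by hand; a worked example for a hypothetical 1920x1080 frame scaled toward 640 with stride 32 and auto=True:

# Hedged worked example of the letterbox math used by _letterbox above.
import numpy as np

shape, new_shape, stride = (1080, 1920), (640, 640), 32
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])          # 1/3
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))     # (640, 360)
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # 0, 280
dw, dh = np.mod(dw, stride), np.mod(dh, stride)                    # auto=True -> 0, 24
print(new_unpad, dw / 2, dh / 2)  # (640, 360) 0.0 12.0 -> padded image is 640x384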

View File

@ -0,0 +1,133 @@
import numpy as np
from collections import OrderedDict
class TrackState(object):
New = 0
Tracked = 1
Lost = 2
Removed = 3
class BaseTrack(object):
_count = 0
track_id = 0
is_activated = False
state = TrackState.New
history = OrderedDict()
features = []
curr_feature = None
score = 0
start_frame = 0
frame_id = 0
time_since_update = 0
# multi-camera
location = (np.inf, np.inf)
@property
def end_frame(self):
return self.frame_id
@staticmethod
def next_id():
BaseTrack._count += 1
return BaseTrack._count
def activate(self, *args):
raise NotImplementedError
def predict(self):
raise NotImplementedError
def update(self, *args, **kwargs):
raise NotImplementedError
def mark_lost(self):
self.state = TrackState.Lost
def mark_removed(self):
self.state = TrackState.Removed
@property
def tlwh(self):
"""Get current position in bounding box format `(top left x, top left y,
width, height)`.
"""
if self.mean is None:
return self._tlwh.copy()
ret = self.mean[:4].copy()
ret[:2] -= ret[2:] / 2
return ret
@property
def tlbr(self):
"""Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
`(top left, bottom right)`.
"""
ret = self.tlwh.copy()
ret[2:] += ret[:2]
return ret
@property
def xywh(self):
"""Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
`(top left, bottom right)`.
"""
ret = self.tlwh.copy()
ret[:2] += ret[2:] / 2.0
return ret
@staticmethod
# @jit(nopython=True)
def tlwh_to_xyah(tlwh):
"""Convert bounding box to format `(center x, center y, aspect ratio,
height)`, where the aspect ratio is `width / height`.
"""
ret = np.asarray(tlwh).copy()
ret[:2] += ret[2:] / 2
ret[2] /= ret[3]
return ret
@staticmethod
def tlwh_to_xywh(tlwh):
"""Convert bounding box to format `(center x, center y, width,
height)`.
"""
ret = np.asarray(tlwh).copy()
ret[:2] += ret[2:] / 2
return ret
@staticmethod
def tlwh_to_xysa(tlwh):
"""Convert bounding box to format `(center x, center y, width,
height)`.
"""
ret = np.asarray(tlwh).copy()
ret[:2] += ret[2:] / 2
ret[2] = tlwh[2] * tlwh[3]
ret[3] = tlwh[2] / tlwh[3]
return ret
def to_xyah(self):
return self.tlwh_to_xyah(self.tlwh)
def to_xywh(self):
return self.tlwh_to_xywh(self.tlwh)
@staticmethod
def tlbr_to_tlwh(tlbr):
ret = np.asarray(tlbr).copy()
ret[2:] -= ret[:2]
return ret
@staticmethod
# @jit(nopython=True)
def tlwh_to_tlbr(tlwh):
ret = np.asarray(tlwh).copy()
ret[2:] += ret[:2]
return ret
def __repr__(self):
return 'OT_{}_({}-{})'.format(self.track_id, self.start_frame, self.end_frame)
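A quick worked example of the static box conversions above (the import path is an assumption based on how this module is referenced elsewhere in the commit):

# Hedged worked example of BaseTrack's static bbox conversions above.
import numpy as np
from trackers.basetrack import BaseTrack  # assumed import path within this repo

tlwh = np.array([10., 20., 40., 60.])       # top-left x/y, width, height
print(BaseTrack.tlwh_to_tlbr(tlwh))         # [10. 20. 50. 80.]
print(BaseTrack.tlwh_to_xyah(tlwh))         # [30. 50.  0.667 60.]  (cx, cy, w/h, h)
print(BaseTrack.tlwh_to_xywh(tlwh))         # [30. 50. 40. 60.]     (cx, cy, w, h)
print(BaseTrack.tlbr_to_tlwh(np.array([10., 20., 50., 80.])))  # back to [10. 20. 40. 60.]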

View File

@ -0,0 +1,329 @@
"""
Bot sort
"""
import numpy as np
import torch
from torchvision.ops import nms
import cv2
import torchvision.transforms as T
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_reid
from .matching import *
from .reid_models.OSNet import *
from .reid_models.load_model_tools import load_pretrained_weights
from .reid_models.deepsort_reid import Extractor
from .camera_motion_compensation import GMC
REID_MODEL_DICT = {
'osnet_x1_0': osnet_x1_0,
'osnet_x0_75': osnet_x0_75,
'osnet_x0_5': osnet_x0_5,
'osnet_x0_25': osnet_x0_25,
'deepsort': Extractor
}
def load_reid_model(reid_model, reid_model_path):
if 'osnet' in reid_model:
func = REID_MODEL_DICT[reid_model]
model = func(num_classes=1, pretrained=False, )
load_pretrained_weights(model, reid_model_path)
model.cuda().eval()
elif 'deepsort' in reid_model:
model = REID_MODEL_DICT[reid_model](reid_model_path, use_cuda=True)
else:
raise NotImplementedError
return model
class BotTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
self.with_reid = not args.discard_reid
self.reid_model, self.crop_transforms = None, None
if self.with_reid:
self.reid_model = load_reid_model(args.reid_model, args.reid_model_path)
self.crop_transforms = T.Compose([
# T.ToPILImage(),
# T.Resize(size=(256, 128)),
T.ToTensor(), # (c, 128, 128)
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
# camera motion compensation module
self.gmc = GMC(method='orb', downscale=2, verbose=None)
def reid_preprocess(self, obj_bbox):
"""
preprocess cropped object bboxes
obj_bbox: np.ndarray, shape=(h_obj, w_obj, c)
return:
torch.Tensor of shape (c, 128, 128)
"""
obj_bbox = cv2.resize(obj_bbox.astype(np.float32) / 255.0, dsize=(128, 128)) # shape: (128, 128, c)
return self.crop_transforms(obj_bbox)
def get_feature(self, tlwhs, ori_img):
"""
get appearance feature of an object
tlwhs: shape (num_of_objects, 4)
ori_img: original image, np.ndarray, shape(H, W, C)
"""
obj_bbox = []
for tlwh in tlwhs:
tlwh = list(map(int, tlwh))
# if any(tlbr_ == -1 for tlbr_ in tlwh):
# print(tlwh)
tlbr_tensor = self.reid_preprocess(ori_img[tlwh[1]: tlwh[1] + tlwh[3], tlwh[0]: tlwh[0] + tlwh[2]])
obj_bbox.append(tlbr_tensor)
if not obj_bbox:
return np.array([])
obj_bbox = torch.stack(obj_bbox, dim=0)
obj_bbox = obj_bbox.cuda()
features = self.reid_model(obj_bbox) # shape: (num_of_objects, feature_dim)
return features.cpu().detach().numpy()
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to original size), tlwh format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
inds_low = scores > 0.1
inds_high = scores < self.args.conf_thresh
inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
cates = categories[remain_inds]
cates_second = categories[inds_second]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
"""Step 1: Extract reid features"""
if self.with_reid:
features_keep = self.get_feature(tlwhs=dets[:, :4], ori_img=ori_img)
if len(dets) > 0:
if self.with_reid:
detections = [Tracklet_w_reid(tlwh, s, cate, motion=self.motion, feat=feat) for
(tlwh, s, cate, feat) in zip(dets, scores_keep, cates, features_keep)]
else:
detections = [Tracklet(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with high score detection boxes'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
# Camera motion compensation
warp = self.gmc.apply(ori_img, dets)
self.gmc.multi_gmc(tracklet_pool, warp)
self.gmc.multi_gmc(unconfirmed, warp)
ious_dists = iou_distance(tracklet_pool, detections)
ious_dists_mask = (ious_dists > 0.5) # high conf iou
if self.with_reid:
# mixed cost matrix
emb_dists = embedding_distance(tracklet_pool, detections) / 2.0
raw_emb_dists = emb_dists.copy()
emb_dists[emb_dists > 0.25] = 1.0
emb_dists[ious_dists_mask] = 1.0
dists = np.minimum(ious_dists, emb_dists)
else:
dists = ious_dists
matches, u_track, u_detection = linear_assignment(dists, thresh=0.9)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
''' Step 3: Second association, with low score detection boxes'''
# association the untrack to the low score detections
if len(dets_second) > 0:
'''Detections'''
detections_second = [Tracklet(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets_second, scores_second, cates_second)]
else:
detections_second = []
r_tracked_tracklets = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
dists = iou_distance(r_tracked_tracklets, detections_second)
matches, u_track, u_detection_second = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = r_tracked_tracklets[itracked]
det = detections_second[idet]
if track.state == TrackState.Tracked:
track.update(det, self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = r_tracked_tracklets[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detections[i] for i in u_detection]
ious_dists = iou_distance(unconfirmed, detections)
ious_dists_mask = (ious_dists > 0.5)
if self.with_reid:
emb_dists = embedding_distance(unconfirmed, detections) / 2.0
raw_emb_dists = emb_dists.copy()
emb_dists[emb_dists > 0.25] = 1.0
emb_dists[ious_dists_mask] = 1.0
dists = np.minimum(ious_dists, emb_dists)
else:
dists = ious_dists
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Ramained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
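A minimal sketch of the mixed cost used in the first association above: embedding distances above 0.25, and any pair whose IoU distance exceeds 0.5, are pushed to 1.0 before taking the element-wise minimum with the IoU cost (toy values, illustrative only):
import numpy as np

ious_dists = np.array([[0.20, 0.80, 0.40],    # 2 tracks x 3 detections, toy IoU distances
                       [0.90, 0.30, 0.60]])
emb_dists = np.array([[0.10, 0.40, 0.20],     # toy (already halved) embedding distances
                      [0.50, 0.05, 0.30]])

ious_dists_mask = ious_dists > 0.5            # pairs with too little spatial overlap
emb_dists[emb_dists > 0.25] = 1.0             # appearance too dissimilar
emb_dists[ious_dists_mask] = 1.0              # also reject pairs the IoU gate rejects
dists = np.minimum(ious_dists, emb_dists)     # fused cost passed to linear_assignment
print(dists)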

View File

@ -0,0 +1,201 @@
"""
ByteTrack
"""
import numpy as np
from collections import deque
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet
from .matching import *
class ByteTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original image size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
inds_low = scores > 0.1
inds_high = scores < self.args.conf_thresh
inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
cates = categories[remain_inds]
cates_second = categories[inds_second]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
if len(dets) > 0:
'''Detections'''
detections = [Tracklet(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with high score detection boxes'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
dists = iou_distance(tracklet_pool, detections)
matches, u_track, u_detection = linear_assignment(dists, thresh=0.9)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
''' Step 3: Second association, with low score detection boxes'''
# associate the unmatched tracks with the low score detections
if len(dets_second) > 0:
'''Detections'''
detections_second = [Tracklet(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets_second, scores_second, cates_second)]
else:
detections_second = []
r_tracked_tracklets = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
dists = iou_distance(r_tracked_tracklets, detections_second)
matches, u_track, u_detection_second = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = r_tracked_tracklets[itracked]
det = detections_second[idet]
if track.state == TrackState.Tracked:
track.update(det, self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = r_tracked_tracklets[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detections[i] for i in u_detection]
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
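A minimal sketch of the two-stage score split performed at the top of update(): boxes above conf_thresh go to the first association, boxes between 0.1 and conf_thresh to the second, and the rest are discarded (toy detections):
import numpy as np

output_results = np.array([[10, 10, 50, 80, 0.90, 0],   # [x1, y1, x2, y2, score, class]
                           [60, 20, 90, 70, 0.35, 0],
                           [ 5, 40, 30, 90, 0.08, 0]])
conf_thresh = 0.5

scores = output_results[:, 4]
remain_inds = scores > conf_thresh
inds_second = np.logical_and(scores > 0.1, scores < conf_thresh)
print(output_results[remain_inds, :4])   # high-score boxes -> first association
print(output_results[inds_second, :4])   # low-score boxes -> second association; the 0.08 box is dropped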

View File

@ -0,0 +1,204 @@
"""
C_BIoU Track
"""
import numpy as np
from collections import deque
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_bbox_buffer
from .matching import *
class C_BIoUTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original image size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
inds_low = scores > 0.1
inds_high = scores < self.args.conf_thresh
inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
cates = categories[remain_inds]
cates_second = categories[inds_second]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
if len(dets) > 0:
'''Detections'''
detections = [Tracklet_w_bbox_buffer(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with high score detection boxes'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
dists = buffered_iou_distance(tracklet_pool, detections, level=1)
matches, u_track, u_detection = linear_assignment(dists, thresh=0.9)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
''' Step 3: Second association, with low score detection boxes'''
# associate the unmatched tracks with the low score detections
if len(dets_second) > 0:
'''Detections'''
detections_second = [Tracklet_w_bbox_buffer(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets_second, scores_second, cates_second)]
else:
detections_second = []
r_tracked_tracklets = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
dists = buffered_iou_distance(r_tracked_tracklets, detections_second, level=2)
matches, u_track, u_detection_second = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = r_tracked_tracklets[itracked]
det = detections_second[idet]
if track.state == TrackState.Tracked:
track.update(det, self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = r_tracked_tracklets[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detections[i] for i in u_detection]
dists = buffered_iou_distance(unconfirmed, detections, level=1)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
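Tracklet_w_bbox_buffer is defined elsewhere; the C-BIoU idea it implements is to expand both boxes by a relative buffer before computing IoU. A rough sketch of such an expansion, assuming buffer scales of 0.3 (level 1) and 0.5 (level 2), which are illustrative values only:
import numpy as np

def buffer_tlwh(tlwh, b):
    # expand a (top-left x, top-left y, w, h) box by a relative buffer b on every side
    x, y, w, h = tlwh
    return np.array([x - b * w, y - b * h, (1 + 2 * b) * w, (1 + 2 * b) * h])

det = np.array([100.0, 100.0, 40.0, 80.0])
print(buffer_tlwh(det, 0.3))   # level-1 style buffer (assumed scale)
print(buffer_tlwh(det, 0.5))   # level-2 style buffer (assumed scale)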

View File

@ -0,0 +1,264 @@
import cv2
import numpy as np
import copy
import matplotlib.pyplot as plt
"""GMC Module"""
class GMC:
def __init__(self, method='orb', downscale=2, verbose=None):
super(GMC, self).__init__()
self.method = method
self.downscale = max(1, int(downscale))
if self.method == 'orb':
self.detector = cv2.FastFeatureDetector_create(20)
self.extractor = cv2.ORB_create()
self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
elif self.method == 'sift':
self.detector = cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20)
self.extractor = cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.02, edgeThreshold=20)
self.matcher = cv2.BFMatcher(cv2.NORM_L2)
elif self.method == 'ecc':
number_of_iterations = 100
termination_eps = 1e-5
self.warp_mode = cv2.MOTION_EUCLIDEAN
self.criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, number_of_iterations, termination_eps)
elif self.method == 'file' or self.method == 'files':
seqName = verbose[0]
ablation = verbose[1]
if ablation:
filePath = r'tracker/GMC_files/MOT17_ablation'
else:
filePath = r'tracker/GMC_files/MOTChallenge'
if '-FRCNN' in seqName:
seqName = seqName[:-6]
elif '-DPM' in seqName:
seqName = seqName[:-4]
elif '-SDP' in seqName:
seqName = seqName[:-4]
self.gmcFile = open(filePath + "/GMC-" + seqName + ".txt", 'r')
if self.gmcFile is None:
raise ValueError("Error: Unable to open GMC file in directory:" + filePath)
elif self.method == 'none' or self.method == 'None':
self.method = 'none'
else:
raise ValueError("Error: Unknown CMC method:" + method)
self.prevFrame = None
self.prevKeyPoints = None
self.prevDescriptors = None
self.initializedFirstFrame = False
def apply(self, raw_frame, detections=None):
if self.method == 'orb' or self.method == 'sift':
return self.applyFeatures(raw_frame, detections)
elif self.method == 'ecc':
return self.applyEcc(raw_frame, detections)
elif self.method == 'file':
return self.applyFile(raw_frame, detections)
elif self.method == 'none':
return np.eye(2, 3)
else:
return np.eye(2, 3)
def applyEcc(self, raw_frame, detections=None):
# Initialize
height, width, _ = raw_frame.shape
frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY)
H = np.eye(2, 3, dtype=np.float32)
# Downscale image (TODO: consider using pyramids)
if self.downscale > 1.0:
frame = cv2.GaussianBlur(frame, (3, 3), 1.5)
frame = cv2.resize(frame, (width // self.downscale, height // self.downscale))
width = width // self.downscale
height = height // self.downscale
# Handle first frame
if not self.initializedFirstFrame:
# Initialize data
self.prevFrame = frame.copy()
# Initialization done
self.initializedFirstFrame = True
return H
# Run the ECC algorithm. The results are stored in warp_matrix.
# (cc, H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, self.criteria)
try:
(cc, H) = cv2.findTransformECC(self.prevFrame, frame, H, self.warp_mode, self.criteria, None, 1)
except Exception:
print('Warning: find transform failed. Set warp as identity')
return H
def applyFeatures(self, raw_frame, detections=None):
# Initialize
height, width, _ = raw_frame.shape
frame = cv2.cvtColor(raw_frame, cv2.COLOR_BGR2GRAY)
H = np.eye(2, 3)
# Downscale image (TODO: consider using pyramids)
if self.downscale > 1.0:
# frame = cv2.GaussianBlur(frame, (3, 3), 1.5)
frame = cv2.resize(frame, (width // self.downscale, height // self.downscale))
width = width // self.downscale
height = height // self.downscale
# find the keypoints
mask = np.zeros_like(frame)
# mask[int(0.05 * height): int(0.95 * height), int(0.05 * width): int(0.95 * width)] = 255
mask[int(0.02 * height): int(0.98 * height), int(0.02 * width): int(0.98 * width)] = 255
if detections is not None:
for det in detections:
tlbr = (det[:4] / self.downscale).astype(np.int_)
mask[tlbr[1]:tlbr[3], tlbr[0]:tlbr[2]] = 0
keypoints = self.detector.detect(frame, mask)
# compute the descriptors
keypoints, descriptors = self.extractor.compute(frame, keypoints)
# Handle first frame
if not self.initializedFirstFrame:
# Initialize data
self.prevFrame = frame.copy()
self.prevKeyPoints = copy.copy(keypoints)
self.prevDescriptors = copy.copy(descriptors)
# Initialization done
self.initializedFirstFrame = True
return H
# Match descriptors.
knnMatches = self.matcher.knnMatch(self.prevDescriptors, descriptors, 2)
# Filtered matches based on smallest spatial distance
matches = []
spatialDistances = []
maxSpatialDistance = 0.25 * np.array([width, height])
# Handle empty matches case
if len(knnMatches) == 0:
# Store to next iteration
self.prevFrame = frame.copy()
self.prevKeyPoints = copy.copy(keypoints)
self.prevDescriptors = copy.copy(descriptors)
return H
for m, n in knnMatches:
if m.distance < 0.9 * n.distance:
prevKeyPointLocation = self.prevKeyPoints[m.queryIdx].pt
currKeyPointLocation = keypoints[m.trainIdx].pt
spatialDistance = (prevKeyPointLocation[0] - currKeyPointLocation[0],
prevKeyPointLocation[1] - currKeyPointLocation[1])
if (np.abs(spatialDistance[0]) < maxSpatialDistance[0]) and \
(np.abs(spatialDistance[1]) < maxSpatialDistance[1]):
spatialDistances.append(spatialDistance)
matches.append(m)
meanSpatialDistances = np.mean(spatialDistances, 0)
stdSpatialDistances = np.std(spatialDistances, 0)
inliers = (spatialDistances - meanSpatialDistances) < 2.5 * stdSpatialDistances
goodMatches = []
prevPoints = []
currPoints = []
for i in range(len(matches)):
if inliers[i, 0] and inliers[i, 1]:
goodMatches.append(matches[i])
prevPoints.append(self.prevKeyPoints[matches[i].queryIdx].pt)
currPoints.append(keypoints[matches[i].trainIdx].pt)
prevPoints = np.array(prevPoints)
currPoints = np.array(currPoints)
# Draw the keypoint matches on the output image
if 0:
matches_img = np.hstack((self.prevFrame, frame))
matches_img = cv2.cvtColor(matches_img, cv2.COLOR_GRAY2BGR)
W = np.size(self.prevFrame, 1)
for m in goodMatches:
prev_pt = np.array(self.prevKeyPoints[m.queryIdx].pt, dtype=np.int_)
curr_pt = np.array(keypoints[m.trainIdx].pt, dtype=np.int_)
curr_pt[0] += W
color = np.random.randint(0, 255, (3,))
color = (int(color[0]), int(color[1]), int(color[2]))
matches_img = cv2.line(matches_img, prev_pt, curr_pt, tuple(color), 1, cv2.LINE_AA)
matches_img = cv2.circle(matches_img, prev_pt, 2, tuple(color), -1)
matches_img = cv2.circle(matches_img, curr_pt, 2, tuple(color), -1)
plt.figure()
plt.imshow(matches_img)
plt.show()
# Find rigid matrix
if (np.size(prevPoints, 0) > 4) and (np.size(prevPoints, 0) == np.size(currPoints, 0)):
H, inliers = cv2.estimateAffinePartial2D(prevPoints, currPoints, cv2.RANSAC)
# Handle downscale
if self.downscale > 1.0:
H[0, 2] *= self.downscale
H[1, 2] *= self.downscale
else:
print('Warning: not enough matching points')
# Store to next iteration
self.prevFrame = frame.copy()
self.prevKeyPoints = copy.copy(keypoints)
self.prevDescriptors = copy.copy(descriptors)
return H
def applyFile(self, raw_frame, detections=None):
line = self.gmcFile.readline()
tokens = line.split("\t")
H = np.eye(2, 3, dtype=np.float64)
H[0, 0] = float(tokens[1])
H[0, 1] = float(tokens[2])
H[0, 2] = float(tokens[3])
H[1, 0] = float(tokens[4])
H[1, 1] = float(tokens[5])
H[1, 2] = float(tokens[6])
return H
@staticmethod
def multi_gmc(stracks, H=np.eye(2, 3)):
"""
GMC module prediction
:param stracks: List[Strack]
"""
if len(stracks) > 0:
multi_mean = np.asarray([st.kalman_filter.kf.x.copy() for st in stracks])
multi_covariance = np.asarray([st.kalman_filter.kf.P for st in stracks])
R = H[:2, :2]
R8x8 = np.kron(np.eye(4, dtype=float), R)
t = H[:2, 2]
for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)):
mean = R8x8.dot(mean)
mean[:2] += t
cov = R8x8.dot(cov).dot(R8x8.transpose())
stracks[i].kalman_filter.kf.x = mean
stracks[i].kalman_filter.kf.P = cov
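A minimal sketch of what multi_gmc above does to a single 8-dimensional Kalman state: the 2x2 rotation block of the warp is applied block-wise via a Kronecker product and the translation is added to the position entries (toy warp and state):
import numpy as np

H = np.array([[0.999, -0.010,  5.0],           # toy 2x3 warp (rotation + translation)
              [0.010,  0.999, -3.0]])
mean = np.array([100., 50., 40., 80., 1., 0.5, 0., 0.])   # toy 8-dim state
cov = np.eye(8)

R = H[:2, :2]
R8x8 = np.kron(np.eye(4), R)                   # apply R to each consecutive pair of state entries
t = H[:2, 2]

mean = R8x8.dot(mean)
mean[:2] += t                                  # the translation shifts only the position
cov = R8x8.dot(cov).dot(R8x8.T)
print(mean[:2])                                # camera-motion-compensated position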

View File

@ -0,0 +1,327 @@
"""
Deep Sort
"""
import numpy as np
import torch
from torchvision.ops import nms
import cv2
import torchvision.transforms as T
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_reid
from .matching import *
from .reid_models.OSNet import *
from .reid_models.load_model_tools import load_pretrained_weights
from .reid_models.deepsort_reid import Extractor
REID_MODEL_DICT = {
'osnet_x1_0': osnet_x1_0,
'osnet_x0_75': osnet_x0_75,
'osnet_x0_5': osnet_x0_5,
'osnet_x0_25': osnet_x0_25,
'deepsort': Extractor
}
def load_reid_model(reid_model, reid_model_path):
if 'osnet' in reid_model:
func = REID_MODEL_DICT[reid_model]
model = func(num_classes=1, pretrained=False, )
load_pretrained_weights(model, reid_model_path)
model.cuda().eval()
elif 'deepsort' in reid_model:
model = REID_MODEL_DICT[reid_model](reid_model_path, use_cuda=True)
else:
raise NotImplementedError
return model
class DeepSortTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
self.with_reid = not args.discard_reid
self.reid_model, self.crop_transforms = None, None
if self.with_reid:
self.reid_model = load_reid_model(args.reid_model, args.reid_model_path)
self.crop_transforms = T.Compose([
# T.ToPILImage(),
# T.Resize(size=(256, 128)),
T.ToTensor(), # (c, 128, 256)
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
self.bbox_crop_size = (64, 128) if 'deepsort' in args.reid_model else (128, 128)
def reid_preprocess(self, obj_bbox):
"""
preprocess cropped object bboxes
obj_bbox: np.ndarray, shape=(h_obj, w_obj, c)
return:
torch.Tensor of shape (c, h, w), resized according to self.bbox_crop_size
"""
obj_bbox = cv2.resize(obj_bbox.astype(np.float32) / 255.0, dsize=self.bbox_crop_size) # shape: (h, w, c)
return self.crop_transforms(obj_bbox)
def get_feature(self, tlwhs, ori_img):
"""
get the appearance features of the objects
tlwhs: shape (num_of_objects, 4)
ori_img: original image, np.ndarray, shape(H, W, C)
"""
obj_bbox = []
for tlwh in tlwhs:
tlwh = list(map(int, tlwh))
# limit to the legal range
tlwh[0], tlwh[1] = max(tlwh[0], 0), max(tlwh[1], 0)
tlbr_tensor = self.reid_preprocess(ori_img[tlwh[1]: tlwh[1] + tlwh[3], tlwh[0]: tlwh[0] + tlwh[2]])
obj_bbox.append(tlbr_tensor)
if not obj_bbox:
return np.array([])
obj_bbox = torch.stack(obj_bbox, dim=0)
obj_bbox = obj_bbox.cuda()
features = self.reid_model(obj_bbox) # shape: (num_of_objects, feature_dim)
return features.cpu().detach().numpy()
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original image size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
dets = bboxes[remain_inds]
cates = categories[remain_inds]
scores_keep = scores[remain_inds]
features_keep = self.get_feature(tlwhs=dets[:, :4], ori_img=ori_img)
if len(dets) > 0:
'''Detections'''
detections = [Tracklet_w_reid(tlwh, s, cate, motion=self.motion, feat=feat) for
(tlwh, s, cate, feat) in zip(dets, scores_keep, cates, features_keep)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with appearance'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
matches, u_track, u_detection = matching_cascade(distance_metric=self.gated_metric,
matching_thresh=0.9,
cascade_depth=30,
tracks=tracklet_pool,
detections=detections
)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
'''Step 3: Second association, with iou'''
tracklet_for_iou = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
detection_for_iou = [detections[i] for i in u_detection]
dists = iou_distance(tracklet_for_iou, detection_for_iou)
matches, u_track, u_detection = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = tracklet_for_iou[itracked]
det = detection_for_iou[idet]
if track.state == TrackState.Tracked:
track.update(detection_for_iou[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = tracklet_for_iou[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detection_for_iou[i] for i in u_detection]
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def gated_metric(self, tracks, dets):
"""
get cost matrix: first calculate the appearance cost, then gate it by the Kalman state.
tracks: List[STrack]
dets: List[STrack]
"""
appearance_dist = nearest_embedding_distance(tracks=tracks, detections=dets, metric='cosine')
cost_matrix = self.gate_cost_matrix(appearance_dist, tracks, dets)
return cost_matrix
def gate_cost_matrix(self, cost_matrix, tracks, dets, max_appearance_thresh=0.15, gated_cost=1e5, only_position=False):
"""
gate the cost matrix by the Kalman state distance, constrained by the
0.95 confidence interval of the chi-square distribution
cost_matrix: np.ndarray, shape (len(tracks), len(dets))
tracks: List[STrack]
dets: List[STrack]
gated_cost: a very large constant assigned to infeasible associations
only_position: use [xc, yc, a, h] as the state vector or only [xc, yc]
return:
updated cost_matrix, np.ndarray
"""
gating_dim = 2 if only_position else 4
gating_threshold = chi2inv95[gating_dim]
measurements = np.asarray([Tracklet.tlwh_to_xyah(det.tlwh) for det in dets]) # (len(dets), 4)
cost_matrix[cost_matrix > max_appearance_thresh] = gated_cost
for row, track in enumerate(tracks):
gating_distance = track.kalman_filter.gating_distance(measurements, )
cost_matrix[row, gating_distance > gating_threshold] = gated_cost
return cost_matrix
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
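A minimal sketch of the gating performed by gate_cost_matrix above: appearance costs beyond 0.15, and pairs whose squared Mahalanobis distance exceeds the 4-dof chi-square threshold, are pushed to a prohibitively large constant (toy matrices):
import numpy as np

chi2inv95_4dof = 9.4877                         # same constant as chi2inv95[4] in matching.py
appearance_cost = np.array([[0.05, 0.40],       # toy cosine distances, 2 tracks x 2 detections
                            [0.30, 0.10]])
gating_distance = np.array([[2.0, 25.0],        # toy squared Mahalanobis distances
                            [30.0, 3.0]])

gated = appearance_cost.copy()
gated[appearance_cost > 0.15] = 1e5             # appearance too dissimilar
gated[gating_distance > chi2inv95_4dof] = 1e5   # motion says the pair is implausible
print(gated)                                    # only the diagonal pairs stay feasible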

View File

@ -0,0 +1,74 @@
from filterpy.kalman import KalmanFilter
import numpy as np
import scipy
class BaseKalman:
def __init__(self,
state_dim: int = 8,
observation_dim: int = 4,
F: np.ndarray = np.zeros((0, )),
P: np.ndarray = np.zeros((0, )),
Q: np.ndarray = np.zeros((0, )),
H: np.ndarray = np.zeros((0, )),
R: np.ndarray = np.zeros((0, )),
) -> None:
self.kf = KalmanFilter(dim_x=state_dim, dim_z=observation_dim, dim_u=0)
if F.shape[0] > 0: self.kf.F = F # if valid
if P.shape[0] > 0: self.kf.P = P
if Q.shape[0] > 0: self.kf.Q = Q
if H.shape[0] > 0: self.kf.H = H
if R.shape[0] > 0: self.kf.R = R
def initialize(self, observation):
raise NotImplementedError
def predict(self, ):
self.kf.predict()
def update(self, observation, **kwargs):
self.kf.update(observation)
def get_state(self, ):
return self.kf.x
def gating_distance(self, measurements, only_position=False):
"""Compute gating distance between state distribution and measurements.
A suitable distance threshold can be obtained from `chi2inv95`. If
`only_position` is False, the chi-square distribution has 4 degrees of
freedom, otherwise 2.
Parameters
----------
measurements : ndarray
An Nx4 dimensional matrix of N measurements, note the format (whether xywh or xyah or others)
should be identical to state definition
only_position : Optional[bool]
If True, distance computation is done with respect to the bounding
box center position only.
Returns
-------
ndarray
Returns an array of length N, where the i-th element contains the
squared Mahalanobis distance between (mean, covariance) and
`measurements[i]`.
"""
# map state space to measurement space
mean = self.kf.x.copy()
mean = np.dot(self.kf.H, mean)
covariance = np.linalg.multi_dot((self.kf.H, self.kf.P, self.kf.H.T))
if only_position:
mean, covariance = mean[:2], covariance[:2, :2]
measurements = measurements[:, :2]
cholesky_factor = np.linalg.cholesky(covariance)
d = measurements - mean
z = scipy.linalg.solve_triangular(
cholesky_factor, d.T, lower=True, check_finite=False,
overwrite_b=True)
squared_maha = np.sum(z * z, axis=0)
return squared_maha
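A minimal numeric sketch of the gating_distance computation above, with the state already projected into measurement space (toy mean and diagonal covariance):
import numpy as np
import scipy.linalg

mean = np.array([50.0, 40.0, 0.5, 100.0])             # projected state, x-y-a-h
covariance = np.diag([25.0, 25.0, 0.01, 100.0])       # toy projected covariance
measurements = np.array([[52.0, 41.0, 0.5, 102.0],
                         [90.0, 10.0, 0.4,  80.0]])

cholesky_factor = np.linalg.cholesky(covariance)
d = measurements - mean
z = scipy.linalg.solve_triangular(cholesky_factor, d.T, lower=True)
squared_maha = np.sum(z * z, axis=0)
print(squared_maha)                                    # gate against chi2inv95[4] = 9.4877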

View File

@ -0,0 +1,99 @@
from numpy.core.multiarray import zeros as zeros
from .base_kalman import BaseKalman
import numpy as np
import cv2
class BotKalman(BaseKalman):
def __init__(self, ):
state_dim = 8 # [x, y, w, h, vx, vy, vw, vh]
observation_dim = 4
F = np.eye(state_dim, state_dim)
'''
[1, 0, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 1, 0, 0]
...
'''
for i in range(state_dim // 2):
F[i, i + state_dim // 2] = 1
H = np.eye(state_dim // 2, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
self._std_weight_position = 1. / 20
self._std_weight_velocity = 1. / 160
def initialize(self, observation):
""" init x, P, Q, R
Args:
observation: x-y-w-h format
"""
# init x, P, Q, R
mean_pos = observation
mean_vel = np.zeros_like(observation)
self.kf.x = np.r_[mean_pos, mean_vel] # x_{0, 0}
std = [
2 * self._std_weight_position * observation[2], # related to h
2 * self._std_weight_position * observation[3],
2 * self._std_weight_position * observation[2],
2 * self._std_weight_position * observation[3],
10 * self._std_weight_velocity * observation[2],
10 * self._std_weight_velocity * observation[3],
10 * self._std_weight_velocity * observation[2],
10 * self._std_weight_velocity * observation[3],
]
self.kf.P = np.diag(np.square(std)) # P_{0, 0}
def predict(self, ):
""" predict step
x_{n + 1, n} = F * x_{n, n}
P_{n + 1, n} = F * P_{n, n} * F^T + Q
"""
std_pos = [
self._std_weight_position * self.kf.x[2],
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[2],
self._std_weight_position * self.kf.x[3]]
std_vel = [
self._std_weight_velocity * self.kf.x[2],
self._std_weight_velocity * self.kf.x[3],
self._std_weight_velocity * self.kf.x[2],
self._std_weight_velocity * self.kf.x[3]]
Q = np.diag(np.square(np.r_[std_pos, std_vel]))
self.kf.predict(Q=Q)
def update(self, z):
""" update step
Args:
z: observation x-y-w-h format
K_n = P_{n, n - 1} * H^T * (H P_{n, n - 1} H^T + R)^{-1}
x_{n, n} = x_{n, n - 1} + K_n * (z - H * x_{n, n - 1})
P_{n, n} = (I - K_n * H) P_{n, n - 1} (I - K_n * H)^T + K_n R_n K_n^T
"""
std = [
self._std_weight_position * self.kf.x[2],
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[2],
self._std_weight_position * self.kf.x[3]]
R = np.diag(np.square(std))
self.kf.update(z=z, R=R)
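A minimal usage sketch of the filter above; the import path is an assumption and should be adjusted to where BotKalman actually lives in the package:
import numpy as np
from tracker.trackers.kalman_filters.botsort_kalman import BotKalman   # path is an assumption

kf = BotKalman()
kf.initialize(np.array([100.0, 200.0, 40.0, 80.0]))   # first observation, x-y-w-h
kf.predict()                                          # x_{1, 0}, P_{1, 0}
kf.update(np.array([102.0, 198.0, 41.0, 79.0]))       # correct with the next observation
print(kf.get_state()[:4])                             # filtered x-y-w-h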

View File

@ -0,0 +1,97 @@
from .base_kalman import BaseKalman
import numpy as np
class ByteKalman(BaseKalman):
def __init__(self, ):
state_dim = 8 # [x, y, a, h, vx, vy, va, vh]
observation_dim = 4
F = np.eye(state_dim, state_dim)
'''
[1, 0, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 1, 0, 0]
...
'''
for i in range(state_dim // 2):
F[i, i + state_dim // 2] = 1
H = np.eye(state_dim // 2, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
self._std_weight_position = 1. / 20
self._std_weight_velocity = 1. / 160
def initialize(self, observation):
""" init x, P, Q, R
Args:
observation: x-y-a-h format
"""
# init x, P, Q, R
mean_pos = observation
mean_vel = np.zeros_like(observation)
self.kf.x = np.r_[mean_pos, mean_vel] # x_{0, 0}
std = [
2 * self._std_weight_position * observation[3], # related to h
2 * self._std_weight_position * observation[3],
1e-2,
2 * self._std_weight_position * observation[3],
10 * self._std_weight_velocity * observation[3],
10 * self._std_weight_velocity * observation[3],
1e-5,
10 * self._std_weight_velocity * observation[3],
]
self.kf.P = np.diag(np.square(std)) # P_{0, 0}
def predict(self, ):
""" predict step
x_{n + 1, n} = F * x_{n, n}
P_{n + 1, n} = F * P_{n, n} * F^T + Q
"""
std_pos = [
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[3],
1e-2,
self._std_weight_position * self.kf.x[3]]
std_vel = [
self._std_weight_velocity * self.kf.x[3],
self._std_weight_velocity * self.kf.x[3],
1e-5,
self._std_weight_velocity * self.kf.x[3]]
Q = np.diag(np.square(np.r_[std_pos, std_vel]))
self.kf.predict(Q=Q)
def update(self, z):
""" update step
Args:
z: observation x-y-a-h format
K_n = P_{n, n - 1} * H^T * (H P_{n, n - 1} H^T + R)^{-1}
x_{n, n} = x_{n, n - 1} + K_n * (z - H * x_{n, n - 1})
P_{n, n} = (I - K_n * H) P_{n, n - 1} (I - K_n * H)^T + K_n R_n K_n^T
"""
std = [
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[3],
1e-1,
self._std_weight_position * self.kf.x[3]]
R = np.diag(np.square(std))
self.kf.update(z=z, R=R)

View File

@ -0,0 +1,144 @@
from numpy.core.multiarray import zeros as zeros
from .base_kalman import BaseKalman
import numpy as np
from copy import deepcopy
class OCSORTKalman(BaseKalman):
def __init__(self, ):
state_dim = 7 # [x, y, s, a, vx, vy, vs] s: area
observation_dim = 4
F = np.array([[1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]])
H = np.eye(state_dim // 2 + 1, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
# TODO check
# give high uncertainty to the unobservable initial velocities
self.kf.R[2:, 2:] *= 10 # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
self.kf.P[4:, 4:] *= 1000
self.kf.P *= 10
self.kf.Q[-1, -1] *= 0.01
self.kf.Q[4:, 4:] *= 0.01
# keep all observations
self.history_obs = []
self.attr_saved = None
self.observed = False
def initialize(self, observation):
"""
Args:
observation: x-y-s-a
"""
self.kf.x = self.kf.x.flatten()
self.kf.x[:4] = observation
def predict(self, ):
""" predict step
"""
# s + vs
if (self.kf.x[6] + self.kf.x[2] <= 0):
self.kf.x[6] *= 0.0
self.kf.predict()
def _freeze(self, ):
""" freeze all the param of Kalman
"""
self.attr_saved = deepcopy(self.kf.__dict__)
def _unfreeze(self, ):
""" when observe an lost object again, use the virtual trajectory
"""
if self.attr_saved is not None:
new_history = deepcopy(self.history_obs)
self.kf.__dict__ = self.attr_saved
self.history_obs = self.history_obs[:-1]
occur = [int(d is None) for d in new_history]
indices = np.where(np.array(occur)==0)[0]
index1 = indices[-2]
index2 = indices[-1]
box1 = new_history[index1]
x1, y1, s1, r1 = box1
w1 = np.sqrt(s1 * r1)
h1 = np.sqrt(s1 / r1)
box2 = new_history[index2]
x2, y2, s2, r2 = box2
w2 = np.sqrt(s2 * r2)
h2 = np.sqrt(s2 / r2)
time_gap = index2 - index1
dx = (x2-x1)/time_gap
dy = (y2-y1)/time_gap
dw = (w2-w1)/time_gap
dh = (h2-h1)/time_gap
for i in range(index2 - index1):
"""
The default virtual trajectory generation is by linear
motion (constant speed hypothesis), you could modify this
part to implement your own.
"""
x = x1 + (i+1) * dx
y = y1 + (i+1) * dy
w = w1 + (i+1) * dw
h = h1 + (i+1) * dh
s = w * h
r = w / float(h)
new_box = np.array([x, y, s, r]).reshape((4, 1))
"""
I still use predict-update loop here to refresh the parameters,
but this can be faster by directly modifying the internal parameters
as suggested in the paper. I keep this naive but slow way for
easy read and understanding
"""
self.kf.update(new_box)
if not i == (index2-index1-1):
self.kf.predict()
def update(self, z):
""" update step
For simplicity, modify self.kf directly, since OC-SORT alters the internal Kalman state
Args:
z: observation x-y-s-a format
"""
self.history_obs.append(z)
if z is None:
if self.observed:
self._freeze()
self.observed = False
self.kf.update(z)
else:
if not self.observed: # Get observation, use online smoothing to re-update parameters
self._unfreeze()
self.kf.update(z)
self.observed = True
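A minimal numeric sketch of the linear virtual trajectory generated in _unfreeze above: two x-y-s-r observations separated by a gap are interpolated in x, y, w, h and converted back to x-y-s-r (toy values):
import numpy as np

x1, y1, s1, r1 = 100.0, 100.0, 3200.0, 0.5     # last observation before the track was lost
x2, y2, s2, r2 = 130.0, 112.0, 3872.0, 0.5     # observation when the track is found again
time_gap = 3

w1, h1 = np.sqrt(s1 * r1), np.sqrt(s1 / r1)
w2, h2 = np.sqrt(s2 * r2), np.sqrt(s2 / r2)
dx, dy = (x2 - x1) / time_gap, (y2 - y1) / time_gap
dw, dh = (w2 - w1) / time_gap, (h2 - h1) / time_gap

for i in range(time_gap):
    x, y = x1 + (i + 1) * dx, y1 + (i + 1) * dy
    w, h = w1 + (i + 1) * dw, h1 + (i + 1) * dh
    print(np.array([x, y, w * h, w / h]))       # virtual x-y-s-r observation fed to kf.update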

View File

@ -0,0 +1,73 @@
from numpy.core.multiarray import zeros as zeros
from .base_kalman import BaseKalman
import numpy as np
from copy import deepcopy
class SORTKalman(BaseKalman):
def __init__(self, ):
state_dim = 7 # [x, y, s, a, vx, vy, vs] s: area
observation_dim = 4
F = np.array([[1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1]])
H = np.eye(state_dim // 2 + 1, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
# TODO check
# give high uncertainty to the unobservable initial velocities
self.kf.R[2:, 2:] *= 10 # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
self.kf.P[4:, 4:] *= 1000
self.kf.P *= 10
self.kf.Q[-1, -1] *= 0.01
self.kf.Q[4:, 4:] *= 0.01
# keep all observations
self.history_obs = []
self.attr_saved = None
self.observed = False
def initialize(self, observation):
"""
Args:
observation: x-y-s-a
"""
self.kf.x = self.kf.x.flatten()
self.kf.x[:4] = observation
def predict(self, ):
""" predict step
"""
# s + vs
if (self.kf.x[6] + self.kf.x[2] <= 0):
self.kf.x[6] *= 0.0
self.kf.predict()
def update(self, z):
""" update step
For simplicity, modify self.kf directly, since OC-SORT alters the internal Kalman state
Args:
z: observation x-y-s-a format
"""
self.kf.update(z)

View File

@ -0,0 +1,101 @@
from .base_kalman import BaseKalman
import numpy as np
class NSAKalman(BaseKalman):
def __init__(self, ):
state_dim = 8 # [x, y, a, h, vx, vy, va, vh]
observation_dim = 4
F = np.eye(state_dim, state_dim)
'''
[1, 0, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 1, 0, 0]
...
'''
for i in range(state_dim // 2):
F[i, i + state_dim // 2] = 1
H = np.eye(state_dim // 2, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
self._std_weight_position = 1. / 20
self._std_weight_velocity = 1. / 160
def initialize(self, observation):
""" init x, P, Q, R
Args:
observation: x-y-a-h format
"""
# init x, P, Q, R
mean_pos = observation
mean_vel = np.zeros_like(observation)
self.kf.x = np.r_[mean_pos, mean_vel] # x_{0, 0}
std = [
2 * self._std_weight_position * observation[3], # related to h
2 * self._std_weight_position * observation[3],
1e-2,
2 * self._std_weight_position * observation[3],
10 * self._std_weight_velocity * observation[3],
10 * self._std_weight_velocity * observation[3],
1e-5,
10 * self._std_weight_velocity * observation[3],
]
self.kf.P = np.diag(np.square(std)) # P_{0, 0}
def predict(self, ):
""" predict step
x_{n + 1, n} = F * x_{n, n}
P_{n + 1, n} = F * P_{n, n} * F^T + Q
"""
std_pos = [
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[3],
1e-2,
self._std_weight_position * self.kf.x[3]]
std_vel = [
self._std_weight_velocity * self.kf.x[3],
self._std_weight_velocity * self.kf.x[3],
1e-5,
self._std_weight_velocity * self.kf.x[3]]
Q = np.diag(np.square(np.r_[std_pos, std_vel]))
self.kf.predict(Q=Q)
def update(self, z, score):
""" update step
Args:
z: observation x-y-a-h format
score: the detection score/confidence required by NSA kalman
K_n = P_{n, n - 1} * H^T * (H P_{n, n - 1} H^T + R)^{-1}
x_{n, n} = x_{n, n - 1} + K_n * (z - H * x_{n, n - 1})
P_{n, n} = (I - K_n * H) P_{n, n - 1} (I - K_n * H)^T + K_n R_n K_n^T
"""
std = [
self._std_weight_position * self.kf.x[3],
self._std_weight_position * self.kf.x[3],
1e-1,
self._std_weight_position * self.kf.x[3]]
# NSA
std = [(1. - score) * x for x in std]
R = np.diag(np.square(std))
self.kf.update(z=z, R=R)
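A minimal sketch of the NSA noise scaling used in update() above: the measurement noise shrinks as the detection confidence grows, so confident detections pull the state harder (toy height and scores):
import numpy as np

h = 80.0                                               # current box height
base_std = np.array([h / 20, h / 20, 1e-1, h / 20])    # same structure as std in update()

for score in (0.9, 0.5, 0.2):
    R = np.diag(np.square((1.0 - score) * base_std))
    print(score, np.diag(R))                           # higher score -> smaller measurement noise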

View File

@ -0,0 +1,27 @@
from .base_kalman import BaseKalman
import numpy as np
class UCMCKalman(BaseKalman):
def __init__(self, ):
state_dim = 8
observation_dim = 4
F = np.eye(state_dim, state_dim)
'''
[1, 0, 0, 0, 1, 0, 0, 0]
[0, 1, 0, 0, 0, 1, 0, 0]
...
'''
for i in range(state_dim // 2):
F[i, i + state_dim // 2] = 1
H = np.eye(state_dim // 2, state_dim)
super().__init__(state_dim=state_dim,
observation_dim=observation_dim,
F=F,
H=H)
self._std_weight_position = 1. / 20
self._std_weight_velocity = 1. / 160

View File

@ -0,0 +1,388 @@
import cv2
import numpy as np
import scipy
import lap
from scipy.spatial.distance import cdist
import math
from cython_bbox import bbox_overlaps as bbox_ious
import time
chi2inv95 = {
1: 3.8415,
2: 5.9915,
3: 7.8147,
4: 9.4877,
5: 11.070,
6: 12.592,
7: 14.067,
8: 15.507,
9: 16.919}
def merge_matches(m1, m2, shape):
O,P,Q = shape
m1 = np.asarray(m1)
m2 = np.asarray(m2)
M1 = scipy.sparse.coo_matrix((np.ones(len(m1)), (m1[:, 0], m1[:, 1])), shape=(O, P))
M2 = scipy.sparse.coo_matrix((np.ones(len(m2)), (m2[:, 0], m2[:, 1])), shape=(P, Q))
mask = M1*M2
match = mask.nonzero()
match = list(zip(match[0], match[1]))
unmatched_O = tuple(set(range(O)) - set([i for i, j in match]))
unmatched_Q = tuple(set(range(Q)) - set([j for i, j in match]))
return match, unmatched_O, unmatched_Q
def _indices_to_matches(cost_matrix, indices, thresh):
matched_cost = cost_matrix[tuple(zip(*indices))]
matched_mask = (matched_cost <= thresh)
matches = indices[matched_mask]
unmatched_a = tuple(set(range(cost_matrix.shape[0])) - set(matches[:, 0]))
unmatched_b = tuple(set(range(cost_matrix.shape[1])) - set(matches[:, 1]))
return matches, unmatched_a, unmatched_b
def linear_assignment(cost_matrix, thresh):
if cost_matrix.size == 0:
return np.empty((0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple(range(cost_matrix.shape[1]))
matches, unmatched_a, unmatched_b = [], [], []
cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)
for ix, mx in enumerate(x):
if mx >= 0:
matches.append([ix, mx])
unmatched_a = np.where(x < 0)[0]
unmatched_b = np.where(y < 0)[0]
matches = np.asarray(matches)
return matches, unmatched_a, unmatched_b
def ious(atlbrs, btlbrs):
"""
Compute cost based on IoU
:type atlbrs: list[tlbr] | np.ndarray
:type btlbrs: list[tlbr] | np.ndarray
:rtype ious np.ndarray
"""
ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float64)
if ious.size == 0:
return ious
ious = bbox_ious(
np.ascontiguousarray(atlbrs, dtype=np.float64),
np.ascontiguousarray(btlbrs, dtype=np.float64)
)
return ious
def iou_distance(atracks, btracks):
"""
Compute cost based on IoU
:type atracks: list[STrack]
:type btracks: list[STrack]
:rtype cost_matrix np.ndarray
"""
if (len(atracks)>0 and isinstance(atracks[0], np.ndarray)) or (len(btracks) > 0 and isinstance(btracks[0], np.ndarray)):
atlbrs = atracks
btlbrs = btracks
else:
atlbrs = [track.tlbr for track in atracks]
btlbrs = [track.tlbr for track in btracks]
_ious = ious(atlbrs, btlbrs)
cost_matrix = 1 - _ious
return cost_matrix
def v_iou_distance(atracks, btracks):
"""
Compute cost based on IoU
:type atracks: list[STrack]
:type btracks: list[STrack]
:rtype cost_matrix np.ndarray
"""
if (len(atracks)>0 and isinstance(atracks[0], np.ndarray)) or (len(btracks) > 0 and isinstance(btracks[0], np.ndarray)):
atlbrs = atracks
btlbrs = btracks
else:
atlbrs = [track.tlwh_to_tlbr(track.pred_bbox) for track in atracks]
btlbrs = [track.tlwh_to_tlbr(track.pred_bbox) for track in btracks]
_ious = ious(atlbrs, btlbrs)
cost_matrix = 1 - _ious
return cost_matrix
def embedding_distance(tracks, detections, metric='cosine'):
"""
:param tracks: list[STrack]
:param detections: list[BaseTrack]
:param metric:
:return: cost_matrix np.ndarray
"""
cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float64)
if cost_matrix.size == 0:
return cost_matrix
det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float64)
#for i, track in enumerate(tracks):
#cost_matrix[i, :] = np.maximum(0.0, cdist(track.smooth_feat.reshape(1,-1), det_features, metric))
track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float64)
cost_matrix = np.maximum(0.0, cdist(track_features, det_features, metric)) # normalized features
return cost_matrix
def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98):
if cost_matrix.size == 0:
return cost_matrix
gating_dim = 2 if only_position else 4
gating_threshold = chi2inv95[gating_dim]
measurements = np.asarray([det.to_xyah() for det in detections])
for row, track in enumerate(tracks):
gating_distance = kf.gating_distance(
track.mean, track.covariance, measurements, only_position, metric='maha')
cost_matrix[row, gating_distance > gating_threshold] = np.inf
cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_) * gating_distance
return cost_matrix
def fuse_iou(cost_matrix, tracks, detections):
if cost_matrix.size == 0:
return cost_matrix
reid_sim = 1 - cost_matrix
iou_dist = iou_distance(tracks, detections)
iou_sim = 1 - iou_dist
fuse_sim = reid_sim * (1 + iou_sim) / 2
det_scores = np.array([det.score for det in detections])
det_scores = np.expand_dims(det_scores, axis=0).repeat(cost_matrix.shape[0], axis=0)
#fuse_sim = fuse_sim * (1 + det_scores) / 2
fuse_cost = 1 - fuse_sim
return fuse_cost
def fuse_score(cost_matrix, detections):
if cost_matrix.size == 0:
return cost_matrix
iou_sim = 1 - cost_matrix
det_scores = np.array([det.score for det in detections])
det_scores = np.expand_dims(det_scores, axis=0).repeat(cost_matrix.shape[0], axis=0)
fuse_sim = iou_sim * det_scores
fuse_cost = 1 - fuse_sim
return fuse_cost
def greedy_assignment_iou(dist, thresh):
matched_indices = []
if dist.shape[1] == 0:
return np.array(matched_indices, np.int32).reshape(-1, 2)
for i in range(dist.shape[0]):
j = dist[i].argmin()
if dist[i][j] < thresh:
dist[:, j] = 1.
matched_indices.append([j, i])
return np.array(matched_indices, np.int32).reshape(-1, 2)
def greedy_assignment(dists, threshs):
matches = greedy_assignment_iou(dists.T, threshs)
u_det = [d for d in range(dists.shape[1]) if not (d in matches[:, 1])]
u_track = [d for d in range(dists.shape[0]) if not (d in matches[:, 0])]
return matches, u_track, u_det
def fuse_score_matrix(cost_matrix, detections, tracks):
if cost_matrix.size == 0:
return cost_matrix
iou_sim = 1 - cost_matrix
det_scores = np.array([det.score for det in detections])
det_scores = np.expand_dims(det_scores, axis=0).repeat(cost_matrix.shape[0], axis=0)
trk_scores = np.array([trk.score for trk in tracks])
trk_scores = np.expand_dims(trk_scores, axis=1).repeat(cost_matrix.shape[1], axis=1)
mid_scores = (det_scores + trk_scores) / 2
fuse_sim = iou_sim * mid_scores
fuse_cost = 1 - fuse_sim
return fuse_cost
"""
calculate buffered IoU, used in C_BIoU_Tracker
"""
def buffered_iou_distance(atracks, btracks, level=1):
"""
atracks: list[C_BIoUSTrack], tracks
btracks: list[C_BIoUSTrack], detections
level: cascade level, 1 or 2
"""
assert level in [1, 2], 'level must be 1 or 2'
if level == 1: # use motion_state1(tracks) and buffer_bbox1(detections) to calculate
atlbrs = [track.tlwh_to_tlbr(track.motion_state1) for track in atracks]
btlbrs = [det.tlwh_to_tlbr(det.buffer_bbox1) for det in btracks]
else:
atlbrs = [track.tlwh_to_tlbr(track.motion_state2) for track in atracks]
btlbrs = [det.tlwh_to_tlbr(det.buffer_bbox2) for det in btracks]
_ious = ious(atlbrs, btlbrs)
cost_matrix = 1 - _ious
return cost_matrix
"""
observation centric association, with velocity, for OC Sort
"""
def observation_centric_association(tracklets, detections, iou_threshold, velocities, previous_obs, vdc_weight):
if(len(tracklets) == 0):
return np.empty((0, 2), dtype=int), tuple(range(len(tracklets))), tuple(range(len(detections)))
# get numpy format bboxes
trk_tlbrs = np.array([track.tlbr for track in tracklets])
det_tlbrs = np.array([det.tlbr for det in detections])
det_scores = np.array([det.score for det in detections])
iou_matrix = bbox_ious(trk_tlbrs, det_tlbrs)
Y, X = speed_direction_batch(det_tlbrs, previous_obs)
inertia_Y, inertia_X = velocities[:,0], velocities[:,1]
inertia_Y = np.repeat(inertia_Y[:, np.newaxis], Y.shape[1], axis=1)
inertia_X = np.repeat(inertia_X[:, np.newaxis], X.shape[1], axis=1)
diff_angle_cos = inertia_X * X + inertia_Y * Y
diff_angle_cos = np.clip(diff_angle_cos, a_min=-1, a_max=1)
diff_angle = np.arccos(diff_angle_cos)
diff_angle = (np.pi / 2.0 - np.abs(diff_angle)) / np.pi
valid_mask = np.ones(previous_obs.shape[0])
valid_mask[np.where(previous_obs[:, 4] < 0)] = 0
scores = np.repeat(det_scores[:, np.newaxis], trk_tlbrs.shape[0], axis=1)
valid_mask = np.repeat(valid_mask[:, np.newaxis], X.shape[1], axis=1)
angle_diff_cost = (valid_mask * diff_angle) * vdc_weight
angle_diff_cost = angle_diff_cost * scores.T
matches, unmatched_a, unmatched_b = linear_assignment(- (iou_matrix + angle_diff_cost), thresh=0.9)
return matches, unmatched_a, unmatched_b
"""
helper func of observation_centric_association
"""
def speed_direction_batch(dets, tracks):
tracks = tracks[..., np.newaxis]
CX1, CY1 = (dets[:, 0] + dets[:, 2]) / 2.0, (dets[:,1] + dets[:,3]) / 2.0
CX2, CY2 = (tracks[:, 0] + tracks[:, 2]) / 2.0, (tracks[:, 1] + tracks[:, 3]) / 2.0
dx = CX2 - CX1
dy = CY2 - CY1
norm = np.sqrt(dx**2 + dy**2) + 1e-6
dx = dx / norm
dy = dy / norm
return dy, dx # size: num_track x num_det
def matching_cascade(
distance_metric, matching_thresh, cascade_depth, tracks, detections,
track_indices=None, detection_indices=None):
"""
Run matching cascade in DeepSORT
distance_metric: function that calculates the cost matrix
matching_thresh: float, Associations with cost larger than this value are disregarded.
cascade_depth: int, equal to the max_age of a tracklet
tracks: List[STrack], current tracks
detections: List[STrack], current detections
track_indices: List[int], tracks that will be calculated, Default None
detection_indices: List[int], detections that will be calculated, Default None
return:
matched pairs, unmatched tracks, unmatched detections: List[int], List[int], List[int]
"""
if track_indices is None:
track_indices = list(range(len(tracks)))
if detection_indices is None:
detection_indices = list(range(len(detections)))
detections_to_match = detection_indices
matches = []
for level in range(cascade_depth):
"""
match the most recently updated tracks first
"""
if not len(detections_to_match): # No detections left
break
track_indices_l = [
k for k in track_indices
if tracks[k].time_since_update == 1 + level
] # filter tracks whose age is equal to level + 1 (The age of Newest track = 1)
if not len(track_indices_l): # Nothing to match at this level
continue
# tracks and detections which will be matched in the current level
track_l = [tracks[idx] for idx in track_indices_l] # List[STrack]
det_l = [detections[idx] for idx in detections_to_match] # List[STrack]
# calculate the cost matrix
cost_matrix = distance_metric(track_l, det_l)
# solve the linear assignment problem
matched_row_col, unmatched_row, unmatched_col = \
linear_assignment(cost_matrix, matching_thresh)
for row, col in matched_row_col: # for the matched pairs
matches.append((track_indices_l[row], detections_to_match[col]))
unmatched_detection_l = [] # current detections not matched
for col in unmatched_col: # for detections not matched
unmatched_detection_l.append(detections_to_match[col])
detections_to_match = unmatched_detection_l # update the detections to match for the next level
unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
return matches, unmatched_tracks, detections_to_match
def nearest_embedding_distance(tracks, detections, metric='cosine'):
"""
different from embedding_distance, this function calculates the
nearest distance between all historical features of a track and the detections
tracks: list[STrack]
detections: list[STrack]
metric: str, cosine or euclidean
TODO: support euclidean distance
return:
cost_matrix, np.ndarray, shape(len(tracks), len(detections))
"""
cost_matrix = np.zeros((len(tracks), len(detections)))
det_features = np.asarray([det.features[-1] for det in detections])
for row, track in enumerate(tracks):
track_history_features = np.asarray(track.features)
dist = 1. - cal_cosine_distance(track_history_features, det_features)
dist = dist.min(axis=0)
cost_matrix[row, :] = dist
return cost_matrix
def cal_cosine_distance(mat1, mat2):
"""
simple function to calculate the cosine similarity between two matrices (callers take 1 - result as the distance)
:param mat1: np.ndarray, shape(M, dim)
:param mat2: np.ndarray, shape(N, dim)
:return: np.ndarray, shape(M, N)
"""
# result = mat1·mat2^T / |mat1|·|mat2|
# norm mat1 and mat2
mat1 = mat1 / np.linalg.norm(mat1, axis=1, keepdims=True)
mat2 = mat2 / np.linalg.norm(mat2, axis=1, keepdims=True)
return np.dot(mat1, mat2.T)
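A minimal usage sketch of the linear_assignment logic above on a toy cost matrix, mirroring the implementation with lap.lapjv (cost_limit plays the role of thresh):
import numpy as np
import lap

cost = np.array([[0.2, 0.9, 0.8],
                 [0.7, 0.1, 0.9]])
_, x, y = lap.lapjv(cost, extend_cost=True, cost_limit=0.5)
matches = np.asarray([[ix, mx] for ix, mx in enumerate(x) if mx >= 0])
print(matches)               # [[0 0] [1 1]]
print(np.where(x < 0)[0])    # unmatched rows (tracks): none here
print(np.where(y < 0)[0])    # unmatched columns (detections): [2]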

View File

@ -0,0 +1,237 @@
"""
OC Sort
"""
import numpy as np
from collections import deque
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_velocity
from .matching import *
from cython_bbox import bbox_overlaps as bbox_ious
class OCSortTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
self.delta_t = 3
@staticmethod
def k_previous_obs(observations, cur_age, k):
if len(observations) == 0:
return [-1, -1, -1, -1, -1]
for i in range(k):
dt = k - i
if cur_age - dt in observations:
return observations[cur_age-dt]
max_age = max(observations.keys())
return observations[max_age]
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original image size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
inds_low = scores > 0.1
inds_high = scores < self.args.conf_thresh
inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
cates = categories[remain_inds]
cates_second = categories[inds_second]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
if len(dets) > 0:
'''Detections'''
detections = [Tracklet_w_velocity(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, Observation Centric Momentum'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
velocities = np.array(
[trk.velocity if trk.velocity is not None else np.array((0, 0)) for trk in tracklet_pool])
# last observation, observation-centric
# last_boxes = np.array([trk.last_observation for trk in tracklet_pool])
# historical observations
k_observations = np.array(
[self.k_previous_obs(trk.observations, trk.age, self.delta_t) for trk in tracklet_pool])
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
# Observation centric cost matrix and assignment
matches, u_track, u_detection = observation_centric_association(
tracklets=tracklet_pool, detections=detections, iou_threshold=0.3,
velocities=velocities, previous_obs=k_observations, vdc_weight=0.2
)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
''' Step 3: Second association, with low score detection boxes'''
# association the untrack to the low score detections
if len(dets_second) > 0:
'''Detections'''
detections_second = [Tracklet_w_velocity(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets_second, scores_second, cates_second)]
else:
detections_second = []
r_tracked_tracklets = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
# for unmatched tracks in the first round, use the last observation
r_tracked_tracklets_last_observ = [tracklet_pool[i].last_observation[:4] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
detections_second_bbox = [det.tlbr for det in detections_second]
dists = 1. - ious(r_tracked_tracklets_last_observ, detections_second_bbox)
matches, u_track, u_detection_second = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = r_tracked_tracklets[itracked]
det = detections_second[idet]
if track.state == TrackState.Tracked:
track.update(det, self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = r_tracked_tracklets[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detections[i] for i in u_detection]
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
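A hedged usage sketch for OCSortTracker (not part of the committed file): the args fields follow the constructor above, the detections array follows the [x1, y1, x2, y2, score, class] layout that update() slices, and the blank frames plus the fixed-box detector are placeholder stand-ins for a real video and detector. The .matching utilities are assumed to behave as in the OC-SORT reference implementation.

from types import SimpleNamespace
import numpy as np

args = SimpleNamespace(conf_thresh=0.5, track_buffer=30, kalman_format='ocsort')
tracker = OCSortTracker(args, frame_rate=30)

video_frames = [np.zeros((720, 1280, 3), dtype=np.uint8)] * 3   # stand-in frames

def detector(frame):                                            # hypothetical detector stub
    # three fixed boxes per frame: [x1, y1, x2, y2, score, class]
    return np.array([[100., 100., 180., 260., 0.9, 0.],
                     [400., 120., 470., 300., 0.8, 0.],
                     [700., 300., 790., 500., 0.7, 0.]])

for frame in video_frames:
    online = tracker.update(detector(frame), frame, frame)
    print([t.track_id for t in online])                         # same ids persist across frames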


@@ -0,0 +1,98 @@
"""
AFLink code from StrongSORT (StrongSORT: Make DeepSORT Great Again, arXiv),
copied from the original repo
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import cv2
import logging
import torchvision.transforms as transforms
class TemporalBlock(nn.Module):
def __init__(self, cin, cout):
super(TemporalBlock, self).__init__()
self.conv = nn.Conv2d(cin, cout, (7, 1), bias=False)
self.relu = nn.ReLU(inplace=True)
self.bnf = nn.BatchNorm1d(cout)
self.bnx = nn.BatchNorm1d(cout)
self.bny = nn.BatchNorm1d(cout)
def bn(self, x):
x[:, :, :, 0] = self.bnf(x[:, :, :, 0])
x[:, :, :, 1] = self.bnx(x[:, :, :, 1])
x[:, :, :, 2] = self.bny(x[:, :, :, 2])
return x
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)
return x
class FusionBlock(nn.Module):
def __init__(self, cin, cout):
super(FusionBlock, self).__init__()
self.conv = nn.Conv2d(cin, cout, (1, 3), bias=False)
self.bn = nn.BatchNorm2d(cout)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)
return x
class Classifier(nn.Module):
def __init__(self, cin):
super(Classifier, self).__init__()
self.fc1 = nn.Linear(cin*2, cin//2)
self.relu = nn.ReLU(inplace=True)
self.fc2 = nn.Linear(cin//2, 2)
def forward(self, x1, x2):
x = torch.cat((x1, x2), dim=1)
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
return x
class PostLinker(nn.Module):
def __init__(self):
super(PostLinker, self).__init__()
self.TemporalModule_1 = nn.Sequential(
TemporalBlock(1, 32),
TemporalBlock(32, 64),
TemporalBlock(64, 128),
TemporalBlock(128, 256)
)
self.TemporalModule_2 = nn.Sequential(
TemporalBlock(1, 32),
TemporalBlock(32, 64),
TemporalBlock(64, 128),
TemporalBlock(128, 256)
)
self.FusionBlock_1 = FusionBlock(256, 256)
self.FusionBlock_2 = FusionBlock(256, 256)
self.pooling = nn.AdaptiveAvgPool2d((1, 1))
self.classifier = Classifier(256)
def forward(self, x1, x2):
x1 = x1[:, :, :, :3]
x2 = x2[:, :, :, :3]
x1 = self.TemporalModule_1(x1) # [B,1,30,3] -> [B,256,6,3]
x2 = self.TemporalModule_2(x2)
x1 = self.FusionBlock_1(x1)
x2 = self.FusionBlock_2(x2)
x1 = self.pooling(x1).squeeze(-1).squeeze(-1)
x2 = self.pooling(x2).squeeze(-1).squeeze(-1)
y = self.classifier(x1, x2)
if not self.training:
y = torch.softmax(y, dim=1)
return y
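A minimal shape check for PostLinker (a sketch, not the AFLink training or linking pipeline): each input is assumed to be a [B, 1, 30, 3] trajectory tensor of (frame, x, y) triplets, as the shape comment in forward() suggests.

import torch

model = PostLinker().eval()             # eval mode so forward() returns softmax scores
traj_1 = torch.randn(2, 1, 30, 3)
traj_2 = torch.randn(2, 1, 30, 3)
with torch.no_grad():
    scores = model(traj_1, traj_2)
print(scores.shape)                     # expected: torch.Size([2, 2]) association probabilities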


@@ -0,0 +1,598 @@
from __future__ import division, absolute_import
import warnings
import torch
from torch import nn
from torch.nn import functional as F
__all__ = [
'osnet_x1_0', 'osnet_x0_75', 'osnet_x0_5', 'osnet_x0_25', 'osnet_ibn_x1_0'
]
pretrained_urls = {
'osnet_x1_0':
'https://drive.google.com/uc?id=1LaG1EJpHrxdAxKnSCJ_i0u-nbxSAeiFY',
'osnet_x0_75':
'https://drive.google.com/uc?id=1uwA9fElHOk3ZogwbeY5GkLI6QPTX70Hq',
'osnet_x0_5':
'https://drive.google.com/uc?id=16DGLbZukvVYgINws8u8deSaOqjybZ83i',
'osnet_x0_25':
'https://drive.google.com/uc?id=1rb8UN5ZzPKRc_xvtHlyDh-cSz88YX9hs',
'osnet_ibn_x1_0':
'https://drive.google.com/uc?id=1sr90V6irlYYDd4_4ISU2iruoRG8J__6l'
}
##########
# Basic layers
##########
class ConvLayer(nn.Module):
"""Convolution layer (conv + bn + relu)."""
def __init__(
self,
in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
groups=1,
IN=False
):
super(ConvLayer, self).__init__()
self.conv = nn.Conv2d(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
bias=False,
groups=groups
)
if IN:
self.bn = nn.InstanceNorm2d(out_channels, affine=True)
else:
self.bn = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)
return x
class Conv1x1(nn.Module):
"""1x1 convolution + bn + relu."""
def __init__(self, in_channels, out_channels, stride=1, groups=1):
super(Conv1x1, self).__init__()
self.conv = nn.Conv2d(
in_channels,
out_channels,
1,
stride=stride,
padding=0,
bias=False,
groups=groups
)
self.bn = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)
return x
class Conv1x1Linear(nn.Module):
"""1x1 convolution + bn (w/o non-linearity)."""
def __init__(self, in_channels, out_channels, stride=1):
super(Conv1x1Linear, self).__init__()
self.conv = nn.Conv2d(
in_channels, out_channels, 1, stride=stride, padding=0, bias=False
)
self.bn = nn.BatchNorm2d(out_channels)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
return x
class Conv3x3(nn.Module):
"""3x3 convolution + bn + relu."""
def __init__(self, in_channels, out_channels, stride=1, groups=1):
super(Conv3x3, self).__init__()
self.conv = nn.Conv2d(
in_channels,
out_channels,
3,
stride=stride,
padding=1,
bias=False,
groups=groups
)
self.bn = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)
return x
class LightConv3x3(nn.Module):
"""Lightweight 3x3 convolution.
1x1 (linear) + dw 3x3 (nonlinear).
"""
def __init__(self, in_channels, out_channels):
super(LightConv3x3, self).__init__()
self.conv1 = nn.Conv2d(
in_channels, out_channels, 1, stride=1, padding=0, bias=False
)
self.conv2 = nn.Conv2d(
out_channels,
out_channels,
3,
stride=1,
padding=1,
bias=False,
groups=out_channels
)
self.bn = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
x = self.conv1(x)
x = self.conv2(x)
x = self.bn(x)
x = self.relu(x)
return x
##########
# Building blocks for omni-scale feature learning
##########
class ChannelGate(nn.Module):
"""A mini-network that generates channel-wise gates conditioned on input tensor."""
def __init__(
self,
in_channels,
num_gates=None,
return_gates=False,
gate_activation='sigmoid',
reduction=16,
layer_norm=False
):
super(ChannelGate, self).__init__()
if num_gates is None:
num_gates = in_channels
self.return_gates = return_gates
self.global_avgpool = nn.AdaptiveAvgPool2d(1)
self.fc1 = nn.Conv2d(
in_channels,
in_channels // reduction,
kernel_size=1,
bias=True,
padding=0
)
self.norm1 = None
if layer_norm:
self.norm1 = nn.LayerNorm((in_channels // reduction, 1, 1))
self.relu = nn.ReLU(inplace=True)
self.fc2 = nn.Conv2d(
in_channels // reduction,
num_gates,
kernel_size=1,
bias=True,
padding=0
)
if gate_activation == 'sigmoid':
self.gate_activation = nn.Sigmoid()
elif gate_activation == 'relu':
self.gate_activation = nn.ReLU(inplace=True)
elif gate_activation == 'linear':
self.gate_activation = None
else:
raise RuntimeError(
"Unknown gate activation: {}".format(gate_activation)
)
def forward(self, x):
input = x
x = self.global_avgpool(x)
x = self.fc1(x)
if self.norm1 is not None:
x = self.norm1(x)
x = self.relu(x)
x = self.fc2(x)
if self.gate_activation is not None:
x = self.gate_activation(x)
if self.return_gates:
return x
return input * x
class OSBlock(nn.Module):
"""Omni-scale feature learning block."""
def __init__(
self,
in_channels,
out_channels,
IN=False,
bottleneck_reduction=4,
**kwargs
):
super(OSBlock, self).__init__()
mid_channels = out_channels // bottleneck_reduction
self.conv1 = Conv1x1(in_channels, mid_channels)
self.conv2a = LightConv3x3(mid_channels, mid_channels)
self.conv2b = nn.Sequential(
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
)
self.conv2c = nn.Sequential(
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
)
self.conv2d = nn.Sequential(
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
LightConv3x3(mid_channels, mid_channels),
)
self.gate = ChannelGate(mid_channels)
self.conv3 = Conv1x1Linear(mid_channels, out_channels)
self.downsample = None
if in_channels != out_channels:
self.downsample = Conv1x1Linear(in_channels, out_channels)
self.IN = None
if IN:
self.IN = nn.InstanceNorm2d(out_channels, affine=True)
def forward(self, x):
identity = x
x1 = self.conv1(x)
x2a = self.conv2a(x1)
x2b = self.conv2b(x1)
x2c = self.conv2c(x1)
x2d = self.conv2d(x1)
x2 = self.gate(x2a) + self.gate(x2b) + self.gate(x2c) + self.gate(x2d)
x3 = self.conv3(x2)
if self.downsample is not None:
identity = self.downsample(identity)
out = x3 + identity
if self.IN is not None:
out = self.IN(out)
return F.relu(out)
##########
# Network architecture
##########
class OSNet(nn.Module):
"""Omni-Scale Network.
Reference:
- Zhou et al. Omni-Scale Feature Learning for Person Re-Identification. ICCV, 2019.
- Zhou et al. Learning Generalisable Omni-Scale Representations
for Person Re-Identification. TPAMI, 2021.
"""
def __init__(
self,
num_classes,
blocks,
layers,
channels,
feature_dim=512,
loss='softmax',
IN=False,
**kwargs
):
super(OSNet, self).__init__()
num_blocks = len(blocks)
assert num_blocks == len(layers)
assert num_blocks == len(channels) - 1
self.loss = loss
self.feature_dim = feature_dim
# convolutional backbone
self.conv1 = ConvLayer(3, channels[0], 7, stride=2, padding=3, IN=IN)
self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
self.conv2 = self._make_layer(
blocks[0],
layers[0],
channels[0],
channels[1],
reduce_spatial_size=True,
IN=IN
)
self.conv3 = self._make_layer(
blocks[1],
layers[1],
channels[1],
channels[2],
reduce_spatial_size=True
)
self.conv4 = self._make_layer(
blocks[2],
layers[2],
channels[2],
channels[3],
reduce_spatial_size=False
)
self.conv5 = Conv1x1(channels[3], channels[3])
self.global_avgpool = nn.AdaptiveAvgPool2d(1)
# fully connected layer
self.fc = self._construct_fc_layer(
self.feature_dim, channels[3], dropout_p=None
)
# identity classification layer
self.classifier = nn.Linear(self.feature_dim, num_classes)
self._init_params()
def _make_layer(
self,
block,
layer,
in_channels,
out_channels,
reduce_spatial_size,
IN=False
):
layers = []
layers.append(block(in_channels, out_channels, IN=IN))
for i in range(1, layer):
layers.append(block(out_channels, out_channels, IN=IN))
if reduce_spatial_size:
layers.append(
nn.Sequential(
Conv1x1(out_channels, out_channels),
nn.AvgPool2d(2, stride=2)
)
)
return nn.Sequential(*layers)
def _construct_fc_layer(self, fc_dims, input_dim, dropout_p=None):
if fc_dims is None or fc_dims < 0:
self.feature_dim = input_dim
return None
if isinstance(fc_dims, int):
fc_dims = [fc_dims]
layers = []
for dim in fc_dims:
layers.append(nn.Linear(input_dim, dim))
layers.append(nn.BatchNorm1d(dim))
layers.append(nn.ReLU(inplace=True))
if dropout_p is not None:
layers.append(nn.Dropout(p=dropout_p))
input_dim = dim
self.feature_dim = fc_dims[-1]
return nn.Sequential(*layers)
def _init_params(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu'
)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.BatchNorm1d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
def featuremaps(self, x):
x = self.conv1(x)
x = self.maxpool(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.conv4(x)
x = self.conv5(x)
return x
def forward(self, x, return_featuremaps=False):
x = self.featuremaps(x)
if return_featuremaps:
return x
v = self.global_avgpool(x)
v = v.view(v.size(0), -1)
if self.fc is not None:
v = self.fc(v)
if not self.training:
return v
y = self.classifier(v)
if self.loss == 'softmax':
return y
elif self.loss == 'triplet':
return y, v
else:
raise KeyError("Unsupported loss: {}".format(self.loss))
def init_pretrained_weights(model, key=''):
"""Initializes model with pretrained weights.
Layers that don't match with pretrained layers in name or size are kept unchanged.
"""
import os
import errno
import gdown
from collections import OrderedDict
def _get_torch_home():
ENV_TORCH_HOME = 'TORCH_HOME'
ENV_XDG_CACHE_HOME = 'XDG_CACHE_HOME'
DEFAULT_CACHE_DIR = '~/.cache'
torch_home = os.path.expanduser(
os.getenv(
ENV_TORCH_HOME,
os.path.join(
os.getenv(ENV_XDG_CACHE_HOME, DEFAULT_CACHE_DIR), 'torch'
)
)
)
return torch_home
torch_home = _get_torch_home()
model_dir = os.path.join(torch_home, 'checkpoints')
try:
os.makedirs(model_dir)
except OSError as e:
if e.errno == errno.EEXIST:
# Directory already exists, ignore.
pass
else:
# Unexpected OSError, re-raise.
raise
filename = key + '_imagenet.pth'
cached_file = os.path.join(model_dir, filename)
if not os.path.exists(cached_file):
gdown.download(pretrained_urls[key], cached_file, quiet=False)
state_dict = torch.load(cached_file)
model_dict = model.state_dict()
new_state_dict = OrderedDict()
matched_layers, discarded_layers = [], []
for k, v in state_dict.items():
if k.startswith('module.'):
k = k[7:] # discard module.
if k in model_dict and model_dict[k].size() == v.size():
new_state_dict[k] = v
matched_layers.append(k)
else:
discarded_layers.append(k)
model_dict.update(new_state_dict)
model.load_state_dict(model_dict)
if len(matched_layers) == 0:
warnings.warn(
'The pretrained weights from "{}" cannot be loaded, '
'please check the key names manually '
'(** ignored and continue **)'.format(cached_file)
)
else:
print(
'Successfully loaded imagenet pretrained weights from "{}"'.
format(cached_file)
)
if len(discarded_layers) > 0:
print(
'** The following layers are discarded '
'due to unmatched keys or layer size: {}'.
format(discarded_layers)
)
##########
# Instantiation
##########
def osnet_x1_0(num_classes=1000, pretrained=True, loss='softmax', **kwargs):
# standard size (width x1.0)
model = OSNet(
num_classes,
blocks=[OSBlock, OSBlock, OSBlock],
layers=[2, 2, 2],
channels=[64, 256, 384, 512],
loss=loss,
**kwargs
)
if pretrained:
init_pretrained_weights(model, key='osnet_x1_0')
return model
def osnet_x0_75(num_classes=1000, pretrained=True, loss='softmax', **kwargs):
# medium size (width x0.75)
model = OSNet(
num_classes,
blocks=[OSBlock, OSBlock, OSBlock],
layers=[2, 2, 2],
channels=[48, 192, 288, 384],
loss=loss,
**kwargs
)
if pretrained:
init_pretrained_weights(model, key='osnet_x0_75')
return model
def osnet_x0_5(num_classes=1000, pretrained=True, loss='softmax', **kwargs):
# tiny size (width x0.5)
model = OSNet(
num_classes,
blocks=[OSBlock, OSBlock, OSBlock],
layers=[2, 2, 2],
channels=[32, 128, 192, 256],
loss=loss,
**kwargs
)
if pretrained:
init_pretrained_weights(model, key='osnet_x0_5')
return model
def osnet_x0_25(num_classes=1000, pretrained=True, loss='softmax', **kwargs):
# very tiny size (width x0.25)
model = OSNet(
num_classes,
blocks=[OSBlock, OSBlock, OSBlock],
layers=[2, 2, 2],
channels=[16, 64, 96, 128],
loss=loss,
**kwargs
)
if pretrained:
init_pretrained_weights(model, key='osnet_x0_25')
return model
def osnet_ibn_x1_0(
num_classes=1000, pretrained=True, loss='softmax', **kwargs
):
# standard size (width x1.0) + IBN layer
# Ref: Pan et al. Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net. ECCV, 2018.
model = OSNet(
num_classes,
blocks=[OSBlock, OSBlock, OSBlock],
layers=[2, 2, 2],
channels=[64, 256, 384, 512],
loss=loss,
IN=True,
**kwargs
)
if pretrained:
init_pretrained_weights(model, key='osnet_ibn_x1_0')
return model
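A quick sketch of using the factory functions above for feature extraction only (pretrained=False avoids the gdown download; num_classes is irrelevant when the classifier head is unused in eval mode).

import torch

model = osnet_x0_25(num_classes=1, pretrained=False)
model.eval()
dummy = torch.randn(1, 3, 256, 128)     # a typical ReID input resolution (H=256, W=128)
with torch.no_grad():
    feat = model(dummy)                 # eval mode returns the 512-d feature vector, not logits
print(feat.shape)                       # expected: torch.Size([1, 512])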


@@ -0,0 +1,3 @@
"""
file for reid_models folder
"""


@@ -0,0 +1,157 @@
"""
file for DeepSORT Re-ID model
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import cv2
import logging
import torchvision.transforms as transforms
class BasicBlock(nn.Module):
def __init__(self, c_in, c_out, is_downsample=False):
super(BasicBlock, self).__init__()
self.is_downsample = is_downsample
if is_downsample:
self.conv1 = nn.Conv2d(
c_in, c_out, 3, stride=2, padding=1, bias=False)
else:
self.conv1 = nn.Conv2d(
c_in, c_out, 3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(c_out)
self.relu = nn.ReLU(True)
self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(c_out)
if is_downsample:
self.downsample = nn.Sequential(
nn.Conv2d(c_in, c_out, 1, stride=2, bias=False),
nn.BatchNorm2d(c_out)
)
elif c_in != c_out:
self.downsample = nn.Sequential(
nn.Conv2d(c_in, c_out, 1, stride=1, bias=False),
nn.BatchNorm2d(c_out)
)
self.is_downsample = True
def forward(self, x):
y = self.conv1(x)
y = self.bn1(y)
y = self.relu(y)
y = self.conv2(y)
y = self.bn2(y)
if self.is_downsample:
x = self.downsample(x)
return F.relu(x.add(y), True)
def make_layers(c_in, c_out, repeat_times, is_downsample=False):
blocks = []
for i in range(repeat_times):
if i == 0:
blocks += [BasicBlock(c_in, c_out, is_downsample=is_downsample), ]
else:
blocks += [BasicBlock(c_out, c_out), ]
return nn.Sequential(*blocks)
class Net(nn.Module):
def __init__(self, num_classes=751, reid=False):
super(Net, self).__init__()
# 3 128 64
self.conv = nn.Sequential(
nn.Conv2d(3, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64),
nn.ReLU(inplace=True),
# nn.Conv2d(32,32,3,stride=1,padding=1),
# nn.BatchNorm2d(32),
# nn.ReLU(inplace=True),
nn.MaxPool2d(3, 2, padding=1),
)
# 64 64 32
self.layer1 = make_layers(64, 64, 2, False)
# 64 64 32
self.layer2 = make_layers(64, 128, 2, True)
# 128 32 16
self.layer3 = make_layers(128, 256, 2, True)
# 256 16 8
self.layer4 = make_layers(256, 512, 2, True)
# 512 8 4
self.avgpool = nn.AvgPool2d((8, 4), 1)
# 512 1 1
self.reid = reid
self.classifier = nn.Sequential(
nn.Linear(512, 256),
nn.BatchNorm1d(256),
nn.ReLU(inplace=True),
nn.Dropout(),
nn.Linear(256, num_classes),
)
def forward(self, x):
x = self.conv(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
x = x.view(x.size(0), -1)
# B x 512
if self.reid:
x = x.div(x.norm(p=2, dim=1, keepdim=True))
return x
# classifier
x = self.classifier(x)
return x
class Extractor(object):
def __init__(self, model_path, use_cuda=True):
self.net = Net(reid=True)
self.device = "cuda" if torch.cuda.is_available() and use_cuda else "cpu"
state_dict = torch.load(model_path, map_location=torch.device(self.device))[
'net_dict']
self.net.load_state_dict(state_dict)
logger = logging.getLogger("root.tracker")
logger.info("Loading weights from {}... Done!".format(model_path))
self.net.to(self.device)
self.size = (64, 128)
self.norm = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
def _preprocess(self, im_crops):
"""
TODO:
1. to float with scale from 0 to 1
2. resize to (64, 128) as the Market1501 dataset did
3. concatenate to a numpy array
4. to torch Tensor
5. normalize
"""
def _resize(im, size):
try:
return cv2.resize(im.astype(np.float32)/255., size)
except:
print('Error: size in bbox exists zero, ', im.shape)
exit(0)
im_batch = torch.cat([self.norm(_resize(im, self.size)).unsqueeze(
0) for im in im_crops], dim=0).float()
return im_batch
def __call__(self, im_crops):
if isinstance(im_crops, list):
im_batch = self._preprocess(im_crops)
else:
im_batch = im_crops
with torch.no_grad():
im_batch = im_batch.to(self.device)
features = self.net(im_batch)
return features
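A small sketch of the ReID branch of Net above: with reid=True the forward pass returns L2-normalised 512-d embeddings (the Extractor class wraps this together with resizing, normalisation and checkpoint loading).

import torch

net = Net(reid=True).eval()
crops = torch.randn(4, 3, 128, 64)      # (B, C, H, W): four 128x64 person crops
with torch.no_grad():
    emb = net(crops)
print(emb.shape, emb.norm(dim=1))       # expected: torch.Size([4, 512]), norms close to 1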


@@ -0,0 +1,273 @@
"""
load checkpoint file
copied from https://github.com/mikel-brostrom/Yolov5_StrongSORT_OSNet
"""
from __future__ import division, print_function, absolute_import
import pickle
import shutil
import os.path as osp
import warnings
from functools import partial
from collections import OrderedDict
import torch
import torch.nn as nn
__all__ = [
'save_checkpoint', 'load_checkpoint', 'resume_from_checkpoint',
'open_all_layers', 'open_specified_layers', 'count_num_param',
'load_pretrained_weights'
]
def load_checkpoint(fpath):
r"""Loads checkpoint.
``UnicodeDecodeError`` is handled, which means
files saved with Python 2 can be read in Python 3.
Args:
fpath (str): path to checkpoint.
Returns:
dict
Examples::
>>> from torchreid.utils import load_checkpoint
>>> fpath = 'log/my_model/model.pth.tar-10'
>>> checkpoint = load_checkpoint(fpath)
"""
if fpath is None:
raise ValueError('File path is None')
fpath = osp.abspath(osp.expanduser(fpath))
if not osp.exists(fpath):
raise FileNotFoundError('File is not found at "{}"'.format(fpath))
map_location = None if torch.cuda.is_available() else 'cpu'
try:
checkpoint = torch.load(fpath, map_location=map_location)
except UnicodeDecodeError:
pickle.load = partial(pickle.load, encoding="latin1")
pickle.Unpickler = partial(pickle.Unpickler, encoding="latin1")
checkpoint = torch.load(
fpath, pickle_module=pickle, map_location=map_location
)
except Exception:
print('Unable to load checkpoint from "{}"'.format(fpath))
raise
return checkpoint
def resume_from_checkpoint(fpath, model, optimizer=None, scheduler=None):
r"""Resumes training from a checkpoint.
This will load (1) model weights and (2) ``state_dict``
of optimizer if ``optimizer`` is not None.
Args:
fpath (str): path to checkpoint.
model (nn.Module): model.
optimizer (Optimizer, optional): an Optimizer.
scheduler (LRScheduler, optional): an LRScheduler.
Returns:
int: start_epoch.
Examples::
>>> from torchreid.utils import resume_from_checkpoint
>>> fpath = 'log/my_model/model.pth.tar-10'
>>> start_epoch = resume_from_checkpoint(
>>> fpath, model, optimizer, scheduler
>>> )
"""
print('Loading checkpoint from "{}"'.format(fpath))
checkpoint = load_checkpoint(fpath)
model.load_state_dict(checkpoint['state_dict'])
print('Loaded model weights')
if optimizer is not None and 'optimizer' in checkpoint.keys():
optimizer.load_state_dict(checkpoint['optimizer'])
print('Loaded optimizer')
if scheduler is not None and 'scheduler' in checkpoint.keys():
scheduler.load_state_dict(checkpoint['scheduler'])
print('Loaded scheduler')
start_epoch = checkpoint['epoch']
print('Last epoch = {}'.format(start_epoch))
if 'rank1' in checkpoint.keys():
print('Last rank1 = {:.1%}'.format(checkpoint['rank1']))
return start_epoch
def adjust_learning_rate(
optimizer,
base_lr,
epoch,
stepsize=20,
gamma=0.1,
linear_decay=False,
final_lr=0,
max_epoch=100
):
r"""Adjusts learning rate.
Deprecated.
"""
if linear_decay:
# linearly decay learning rate from base_lr to final_lr
frac_done = epoch / max_epoch
lr = frac_done*final_lr + (1.-frac_done) * base_lr
else:
# decay learning rate by gamma for every stepsize
lr = base_lr * (gamma**(epoch // stepsize))
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def set_bn_to_eval(m):
r"""Sets BatchNorm layers to eval mode."""
# 1. no update for running mean and var
# 2. scale and shift parameters are still trainable
classname = m.__class__.__name__
if classname.find('BatchNorm') != -1:
m.eval()
def open_all_layers(model):
r"""Opens all layers in model for training.
Examples::
>>> from torchreid.utils import open_all_layers
>>> open_all_layers(model)
"""
model.train()
for p in model.parameters():
p.requires_grad = True
def open_specified_layers(model, open_layers):
r"""Opens specified layers in model for training while keeping
other layers frozen.
Args:
model (nn.Module): neural net model.
open_layers (str or list): layers open for training.
Examples::
>>> from torchreid.utils import open_specified_layers
>>> # Only model.classifier will be updated.
>>> open_layers = 'classifier'
>>> open_specified_layers(model, open_layers)
>>> # Only model.fc and model.classifier will be updated.
>>> open_layers = ['fc', 'classifier']
>>> open_specified_layers(model, open_layers)
"""
if isinstance(model, nn.DataParallel):
model = model.module
if isinstance(open_layers, str):
open_layers = [open_layers]
for layer in open_layers:
assert hasattr(
model, layer
), '"{}" is not an attribute of the model, please provide the correct name'.format(
layer
)
for name, module in model.named_children():
if name in open_layers:
module.train()
for p in module.parameters():
p.requires_grad = True
else:
module.eval()
for p in module.parameters():
p.requires_grad = False
def count_num_param(model):
r"""Counts number of parameters in a model while ignoring ``self.classifier``.
Args:
model (nn.Module): network model.
Examples::
>>> from torchreid.utils import count_num_param
>>> model_size = count_num_param(model)
.. warning::
This method is deprecated in favor of
``torchreid.utils.compute_model_complexity``.
"""
warnings.warn(
'This method is deprecated and will be removed in the future.'
)
num_param = sum(p.numel() for p in model.parameters())
if isinstance(model, nn.DataParallel):
model = model.module
if hasattr(model,
'classifier') and isinstance(model.classifier, nn.Module):
# we ignore the classifier because it is unused at test time
num_param -= sum(p.numel() for p in model.classifier.parameters())
return num_param
def load_pretrained_weights(model, weight_path):
r"""Loads pretrianed weights to model.
Features::
- Incompatible layers (unmatched in name or size) will be ignored.
- Can automatically deal with keys containing "module.".
Args:
model (nn.Module): network model.
weight_path (str): path to pretrained weights.
Examples::
>>> from torchreid.utils import load_pretrained_weights
>>> weight_path = 'log/my_model/model-best.pth.tar'
>>> load_pretrained_weights(model, weight_path)
"""
checkpoint = load_checkpoint(weight_path)
if 'state_dict' in checkpoint:
state_dict = checkpoint['state_dict']
else:
state_dict = checkpoint
model_dict = model.state_dict()
new_state_dict = OrderedDict()
matched_layers, discarded_layers = [], []
for k, v in state_dict.items():
if k.startswith('module.'):
k = k[7:] # discard module.
if k in model_dict and model_dict[k].size() == v.size():
new_state_dict[k] = v
matched_layers.append(k)
else:
discarded_layers.append(k)
model_dict.update(new_state_dict)
model.load_state_dict(model_dict)
if len(matched_layers) == 0:
warnings.warn(
'The pretrained weights "{}" cannot be loaded, '
'please check the key names manually '
'(** ignored and continue **)'.format(weight_path)
)
else:
print(
'Successfully loaded pretrained weights from "{}"'.
format(weight_path)
)
if len(discarded_layers) > 0:
print(
'** The following layers are discarded '
'due to unmatched keys or layer size: {}'.
format(discarded_layers)
)
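A hedged example of pairing load_pretrained_weights above with an OSNet backbone; the checkpoint path is hypothetical, and mismatched heads (e.g. the classifier) are reported and skipped rather than raising.

# assuming osnet_x0_25 from OSNet.py above is available in scope
model = osnet_x0_25(num_classes=1, pretrained=False)
load_pretrained_weights(model, 'weights/osnet_x0_25.pth')   # hypothetical checkpoint path
model.eval()                                                # .cuda() optional, if a GPU is available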


@@ -0,0 +1,169 @@
"""
Sort
"""
import numpy as np
from collections import deque
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet
from .matching import *
class SortTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
dets = bboxes[remain_inds]
cates = categories[remain_inds]
scores_keep = scores[remain_inds]
if len(dets) > 0:
'''Detections'''
detections = [Tracklet(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with high score detection boxes'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
dists = iou_distance(tracklet_pool, detections)
matches, u_track, u_detection = linear_assignment(dists, thresh=0.9)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detections[i] for i in u_detection]
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 3: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 4: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
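A tiny sketch of the module-level helpers above (identical copies are shared by the other trackers in this commit): joint_tracklets keeps the first occurrence of each track_id and sub_tracklets removes the ids present in the second list. _T is a hypothetical stub.

class _T:
    def __init__(self, tid):
        self.track_id = tid

a = [_T(1), _T(2), _T(3)]
b = [_T(2), _T(4)]
print([t.track_id for t in joint_tracklets(a, b)])   # expected: [1, 2, 3, 4]
print([t.track_id for t in sub_tracklets(a, b)])     # expected: [1, 3]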


@@ -0,0 +1,338 @@
"""
Sparse Track
"""
import numpy as np
import torch
from torchvision.ops import nms
import cv2
import torchvision.transforms as T
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_depth
from .matching import *
from .reid_models.OSNet import *
from .reid_models.load_model_tools import load_pretrained_weights
from .reid_models.deepsort_reid import Extractor
from .camera_motion_compensation import GMC
REID_MODEL_DICT = {
'osnet_x1_0': osnet_x1_0,
'osnet_x0_75': osnet_x0_75,
'osnet_x0_5': osnet_x0_5,
'osnet_x0_25': osnet_x0_25,
'deepsort': Extractor
}
def load_reid_model(reid_model, reid_model_path):
if 'osnet' in reid_model:
func = REID_MODEL_DICT[reid_model]
model = func(num_classes=1, pretrained=False, )
load_pretrained_weights(model, reid_model_path)
model.cuda().eval()
elif 'deepsort' in reid_model:
model = REID_MODEL_DICT[reid_model](reid_model_path, use_cuda=True)
else:
raise NotImplementedError
return model
class SparseTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
# camera motion compensation module
self.gmc = GMC(method='orb', downscale=2, verbose=None)
def get_deep_range(self, obj, step):
col = []
for t in obj:
lend = (t.deep_vec)[2]
col.append(lend)
max_len, min_len = max(col), min(col)
if max_len != min_len:
deep_range = np.arange(min_len, max_len, (max_len - min_len + 1) / step)
if deep_range[-1] < max_len:
deep_range = np.concatenate([deep_range, np.array([max_len],)])
deep_range[0] = np.floor(deep_range[0])
deep_range[-1] = np.ceil(deep_range[-1])
else:
deep_range = [min_len,]
mask = self.get_sub_mask(deep_range, col)
return mask
def get_sub_mask(self, deep_range, col):
min_len = deep_range[0]
max_len = deep_range[-1]
if max_len == min_len:
lc = min_len
mask = []
for d in deep_range:
if d > deep_range[0] and d < deep_range[-1]:
mask.append((col >= lc) & (col < d))
lc = d
elif d == deep_range[-1]:
mask.append((col >= lc) & (col <= d))
lc = d
else:
lc = d
continue
return mask
# core function
def DCM(self, detections, tracks, activated_tracklets, refind_tracklets, levels, thresh, is_fuse):
if len(detections) > 0:
det_mask = self.get_deep_range(detections, levels)
else:
det_mask = []
if len(tracks)!=0:
track_mask = self.get_deep_range(tracks, levels)
else:
track_mask = []
u_detection, u_tracks, res_det, res_track = [], [], [], []
if len(track_mask) != 0:
if len(track_mask) < len(det_mask):
for i in range(len(det_mask) - len(track_mask)):
idx = np.argwhere(det_mask[len(track_mask) + i] == True)
for idd in idx:
res_det.append(detections[idd[0]])
elif len(track_mask) > len(det_mask):
for i in range(len(track_mask) - len(det_mask)):
idx = np.argwhere(track_mask[len(det_mask) + i] == True)
for idd in idx:
res_track.append(tracks[idd[0]])
for dm, tm in zip(det_mask, track_mask):
det_idx = np.argwhere(dm == True)
trk_idx = np.argwhere(tm == True)
# search det
det_ = []
for idd in det_idx:
det_.append(detections[idd[0]])
det_ = det_ + u_detection
# search trk
track_ = []
for idt in trk_idx:
track_.append(tracks[idt[0]])
# update trk
track_ = track_ + u_tracks
dists = iou_distance(track_, det_)
matches, u_track_, u_det_ = linear_assignment(dists, thresh)
for itracked, idet in matches:
track = track_[itracked]
det = det_[idet]
if track.state == TrackState.Tracked:
track.update(det_[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
u_tracks = [track_[t] for t in u_track_]
u_detection = [det_[t] for t in u_det_]
u_tracks = u_tracks + res_track
u_detection = u_detection + res_det
else:
u_detection = detections
return activated_tracklets, refind_tracklets, u_tracks, u_detection
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original size), tlwh format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
inds_low = scores > 0.1
inds_high = scores < self.args.conf_thresh
inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
cates = categories[remain_inds]
cates_second = categories[inds_second]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
if len(dets) > 0:
detections = [Tracklet_w_depth(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets, scores_keep, cates)]
else:
detections = []
''' Step 1: Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with high score detection boxes, depth cascade matching'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
# Camera motion compensation
warp = self.gmc.apply(ori_img, dets)
self.gmc.multi_gmc(tracklet_pool, warp)
self.gmc.multi_gmc(unconfirmed, warp)
# depth cascade matching
activated_tracklets, refind_tracklets, u_track, u_detection_high = self.DCM(
detections,
tracklet_pool,
activated_tracklets,
refind_tracklets,
levels=3,
thresh=0.75,
is_fuse=True)
''' Step 3: Second association, with low score detection boxes, depth cascade matching'''
if len(dets_second) > 0:
'''Detections'''
detections_second = [Tracklet_w_depth(tlwh, s, cate, motion=self.motion) for
(tlwh, s, cate) in zip(dets_second, scores_second, cates_second)]
else:
detections_second = []
r_tracked_tracklets = [t for t in u_track if t.state == TrackState.Tracked]
activated_tracklets, refind_tracklets, u_track, u_detection_sec = self.DCM(
detections_second,
r_tracked_tracklets,
activated_tracklets,
refind_tracklets,
levels=3,
thresh=0.3,
is_fuse=False)
for track in u_track:
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = u_detection_high
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
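A standalone sketch of the pseudo-depth binning that DCM above relies on: objects are split into `step` depth bins from the third component of deep_vec, and association then runs bin by bin. __new__ is used only to call the helper without building a full tracker (no args or GMC needed); _D is a hypothetical stub.

import numpy as np

class _D:                                # hypothetical stub exposing only deep_vec
    def __init__(self, z):
        self.deep_vec = (0.0, 0.0, z)

st = SparseTracker.__new__(SparseTracker)            # bypass __init__ for this standalone check
masks = st.get_deep_range([_D(z) for z in (10.0, 12.0, 30.0, 31.0, 55.0)], step=3)
for m in masks:
    print(np.asarray(m).astype(int))     # one boolean mask per depth bin, e.g. [1 1 0 0 0]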


@@ -0,0 +1,327 @@
"""
Strong Sort
"""
import numpy as np
import torch
from torchvision.ops import nms
import cv2
import torchvision.transforms as T
from .basetrack import BaseTrack, TrackState
from .tracklet import Tracklet, Tracklet_w_reid
from .matching import *
from .reid_models.OSNet import *
from .reid_models.load_model_tools import load_pretrained_weights
from .reid_models.deepsort_reid import Extractor
REID_MODEL_DICT = {
'osnet_x1_0': osnet_x1_0,
'osnet_x0_75': osnet_x0_75,
'osnet_x0_5': osnet_x0_5,
'osnet_x0_25': osnet_x0_25,
'deepsort': Extractor
}
def load_reid_model(reid_model, reid_model_path):
if 'osnet' in reid_model:
func = REID_MODEL_DICT[reid_model]
model = func(num_classes=1, pretrained=False, )
load_pretrained_weights(model, reid_model_path)
model.cuda().eval()
elif 'deepsort' in reid_model:
model = REID_MODEL_DICT[reid_model](reid_model_path, use_cuda=True)
else:
raise NotImplementedError
return model
class StrongSortTracker(object):
def __init__(self, args, frame_rate=30):
self.tracked_tracklets = [] # type: list[Tracklet]
self.lost_tracklets = [] # type: list[Tracklet]
self.removed_tracklets = [] # type: list[Tracklet]
self.frame_id = 0
self.args = args
self.det_thresh = args.conf_thresh + 0.1
self.buffer_size = int(frame_rate / 30.0 * args.track_buffer)
self.max_time_lost = self.buffer_size
self.motion = args.kalman_format
self.with_reid = not args.discard_reid
self.reid_model, self.crop_transforms = None, None
if self.with_reid:
self.reid_model = load_reid_model(args.reid_model, args.reid_model_path)
self.crop_transforms = T.Compose([
# T.ToPILImage(),
# T.Resize(size=(256, 128)),
T.ToTensor(), # (c, 128, 256)
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
self.bbox_crop_size = (64, 128) if 'deepsort' in args.reid_model else (128, 128)
self.lambda_ = 0.98 # coefficient of the appearance/motion cost mix, Eq. 10 in the StrongSORT paper
def reid_preprocess(self, obj_bbox):
"""
preprocess cropped object bboxes
obj_bbox: np.ndarray, shape=(h_obj, w_obj, c)
return:
torch.Tensor of shape (c, h, w), where (w, h) = self.bbox_crop_size
"""
obj_bbox = cv2.resize(obj_bbox.astype(np.float32) / 255.0, dsize=self.bbox_crop_size) # shape: (h, w, c)
return self.crop_transforms(obj_bbox)
def get_feature(self, tlwhs, ori_img):
"""
get appearance features of the objects
tlwhs: shape (num_of_objects, 4)
ori_img: original image, np.ndarray, shape(H, W, C)
"""
obj_bbox = []
for tlwh in tlwhs:
tlwh = list(map(int, tlwh))
# limit to the legal range
tlwh[0], tlwh[1] = max(tlwh[0], 0), max(tlwh[1], 0)
tlbr_tensor = self.reid_preprocess(ori_img[tlwh[1]: tlwh[1] + tlwh[3], tlwh[0]: tlwh[0] + tlwh[2]])
obj_bbox.append(tlbr_tensor)
if not obj_bbox:
return np.array([])
obj_bbox = torch.stack(obj_bbox, dim=0)
obj_bbox = obj_bbox.cuda()
features = self.reid_model(obj_bbox) # shape: (num_of_objects, feature_dim)
return features.cpu().detach().numpy()
def update(self, output_results, img, ori_img):
"""
output_results: processed detections (scaled to the original size), tlbr format
"""
self.frame_id += 1
activated_tracklets = []
refind_tracklets = []
lost_tracklets = []
removed_tracklets = []
scores = output_results[:, 4]
bboxes = output_results[:, :4]
categories = output_results[:, -1]
remain_inds = scores > self.args.conf_thresh
dets = bboxes[remain_inds]
cates = categories[remain_inds]
scores_keep = scores[remain_inds]
features_keep = self.get_feature(tlwhs=dets[:, :4], ori_img=ori_img)
if len(dets) > 0:
'''Detections'''
detections = [Tracklet_w_reid(tlwh, s, cate, motion=self.motion, feat=feat) for
(tlwh, s, cate, feat) in zip(dets, scores_keep, cates, features_keep)]
else:
detections = []
''' Add newly detected tracklets to tracked_tracklets'''
unconfirmed = []
tracked_tracklets = [] # type: list[Tracklet]
for track in self.tracked_tracklets:
if not track.is_activated:
unconfirmed.append(track)
else:
tracked_tracklets.append(track)
''' Step 2: First association, with appearance'''
tracklet_pool = joint_tracklets(tracked_tracklets, self.lost_tracklets)
# Predict the current location with Kalman
for tracklet in tracklet_pool:
tracklet.predict()
# vanilla matching
cost_matrix = self.gated_metric(tracklet_pool, detections)
matches, u_track, u_detection = linear_assignment(cost_matrix, thresh=0.9)
for itracked, idet in matches:
track = tracklet_pool[itracked]
det = detections[idet]
if track.state == TrackState.Tracked:
track.update(detections[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
'''Step 3: Second association, with iou'''
tracklet_for_iou = [tracklet_pool[i] for i in u_track if tracklet_pool[i].state == TrackState.Tracked]
detection_for_iou = [detections[i] for i in u_detection]
dists = iou_distance(tracklet_for_iou, detection_for_iou)
matches, u_track, u_detection = linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
track = tracklet_for_iou[itracked]
det = detection_for_iou[idet]
if track.state == TrackState.Tracked:
track.update(detection_for_iou[idet], self.frame_id)
activated_tracklets.append(track)
else:
track.re_activate(det, self.frame_id, new_id=False)
refind_tracklets.append(track)
for it in u_track:
track = tracklet_for_iou[it]
if not track.state == TrackState.Lost:
track.mark_lost()
lost_tracklets.append(track)
'''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
detections = [detection_for_iou[i] for i in u_detection]
dists = iou_distance(unconfirmed, detections)
matches, u_unconfirmed, u_detection = linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:
unconfirmed[itracked].update(detections[idet], self.frame_id)
activated_tracklets.append(unconfirmed[itracked])
for it in u_unconfirmed:
track = unconfirmed[it]
track.mark_removed()
removed_tracklets.append(track)
""" Step 4: Init new tracklets"""
for inew in u_detection:
track = detections[inew]
if track.score < self.det_thresh:
continue
track.activate(self.frame_id)
activated_tracklets.append(track)
""" Step 5: Update state"""
for track in self.lost_tracklets:
if self.frame_id - track.end_frame > self.max_time_lost:
track.mark_removed()
removed_tracklets.append(track)
# print('Remained match {} s'.format(t4-t3))
self.tracked_tracklets = [t for t in self.tracked_tracklets if t.state == TrackState.Tracked]
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, activated_tracklets)
self.tracked_tracklets = joint_tracklets(self.tracked_tracklets, refind_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.tracked_tracklets)
self.lost_tracklets.extend(lost_tracklets)
self.lost_tracklets = sub_tracklets(self.lost_tracklets, self.removed_tracklets)
self.removed_tracklets.extend(removed_tracklets)
self.tracked_tracklets, self.lost_tracklets = remove_duplicate_tracklets(self.tracked_tracklets, self.lost_tracklets)
# get scores of lost tracks
output_tracklets = [track for track in self.tracked_tracklets if track.is_activated]
return output_tracklets
def gated_metric(self, tracks, dets):
"""
get the cost matrix: first compute the appearance cost, then gate it by the Kalman state.
tracks: List[STrack]
dets: List[STrack]
"""
appearance_dist = embedding_distance(tracks=tracks, detections=dets, metric='cosine')
cost_matrix = self.gate_cost_matrix(appearance_dist, tracks, dets)
return cost_matrix
def gate_cost_matrix(self, cost_matrix, tracks, dets, max_appearance_thresh=0.15, gated_cost=1e5, only_position=False):
"""
gate the cost matrix by the Kalman state distance, constrained by the
0.95 confidence interval of the chi-square distribution
cost_matrix: np.ndarray, shape (len(tracks), len(dets))
tracks: List[STrack]
dets: List[STrack]
gated_cost: a very large constant assigned to infeasible associations
only_position: use [xc, yc, a, h] as the state vector or only [xc, yc]
return:
updated cost_matrix, np.ndarray
"""
gating_dim = 2 if only_position else 4
gating_threshold = chi2inv95[gating_dim]
measurements = np.asarray([Tracklet.tlwh_to_xyah(det.tlwh) for det in dets]) # (len(dets), 4)
cost_matrix[cost_matrix > max_appearance_thresh] = gated_cost
for row, track in enumerate(tracks):
gating_distance = track.kalman_filter.gating_distance(measurements, )
cost_matrix[row, gating_distance > gating_threshold] = gated_cost
cost_matrix[row] = self.lambda_ * cost_matrix[row] + (1 - self.lambda_) * gating_distance
return cost_matrix
def joint_tracklets(tlista, tlistb):
exists = {}
res = []
for t in tlista:
exists[t.track_id] = 1
res.append(t)
for t in tlistb:
tid = t.track_id
if not exists.get(tid, 0):
exists[tid] = 1
res.append(t)
return res
def sub_tracklets(tlista, tlistb):
tracklets = {}
for t in tlista:
tracklets[t.track_id] = t
for t in tlistb:
tid = t.track_id
if tracklets.get(tid, 0):
del tracklets[tid]
return list(tracklets.values())
def remove_duplicate_tracklets(trackletsa, trackletsb):
pdist = iou_distance(trackletsa, trackletsb)
pairs = np.where(pdist < 0.15)
dupa, dupb = list(), list()
for p, q in zip(*pairs):
timep = trackletsa[p].frame_id - trackletsa[p].start_frame
timeq = trackletsb[q].frame_id - trackletsb[q].start_frame
if timep > timeq:
dupb.append(q)
else:
dupa.append(p)
resa = [t for i, t in enumerate(trackletsa) if not i in dupa]
resb = [t for i, t in enumerate(trackletsb) if not i in dupb]
return resa, resb
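A numeric sketch of the cost mixing performed by gate_cost_matrix above (Eq. 10 of the StrongSORT paper): appearance costs above the threshold or outside the chi-square gate are pushed to gated_cost, and the remaining entries are blended with the Mahalanobis distance via lambda_. Plain NumPy, no tracker state involved; the numbers are illustrative only.

import numpy as np

lambda_, gated_cost, appearance_thresh, gate = 0.98, 1e5, 0.15, 9.4877   # 9.4877 = chi2inv95[4]
appearance = np.array([[0.05, 0.30],
                       [0.10, 0.12]])          # (num_tracks, num_dets) cosine costs
mahalanobis = np.array([[2.0, 50.0],
                        [3.0, 1.0]])           # squared Mahalanobis distances

cost = appearance.copy()
cost[cost > appearance_thresh] = gated_cost    # appearance gate
cost[mahalanobis > gate] = gated_cost          # motion (Kalman) gate
cost = lambda_ * cost + (1 - lambda_) * mahalanobis
print(np.round(cost, 3))                       # gated pairs stay huge, valid pairs stay small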


@@ -0,0 +1,366 @@
"""
implements base elements of trajectory
"""
import numpy as np
from collections import deque
from .basetrack import BaseTrack, TrackState
from .kalman_filters.bytetrack_kalman import ByteKalman
from .kalman_filters.botsort_kalman import BotKalman
from .kalman_filters.ocsort_kalman import OCSORTKalman
from .kalman_filters.sort_kalman import SORTKalman
from .kalman_filters.strongsort_kalman import NSAKalman
MOTION_MODEL_DICT = {
'sort': SORTKalman,
'byte': ByteKalman,
'bot': BotKalman,
'ocsort': OCSORTKalman,
'strongsort': NSAKalman,
}
STATE_CONVERT_DICT = {
'sort': 'xysa',
'byte': 'xyah',
'bot': 'xywh',
'ocsort': 'xysa',
'strongsort': 'xyah'
}
class Tracklet(BaseTrack):
def __init__(self, tlwh, score, category, motion='byte'):
# initial position
self._tlwh = np.asarray(tlwh, dtype=np.float64) # np.float was removed in recent NumPy versions
self.is_activated = False
self.score = score
self.category = category
# kalman
self.motion = motion
self.kalman_filter = MOTION_MODEL_DICT[motion]()
self.convert_func = self.__getattribute__('tlwh_to_' + STATE_CONVERT_DICT[motion])
# init kalman
self.kalman_filter.initialize(self.convert_func(self._tlwh))
def predict(self):
self.kalman_filter.predict()
self.time_since_update += 1
def activate(self, frame_id):
self.track_id = self.next_id()
self.state = TrackState.Tracked
if frame_id == 1:
self.is_activated = True
self.frame_id = frame_id
self.start_frame = frame_id
def re_activate(self, new_track, frame_id, new_id=False):
# TODO different convert
self.kalman_filter.update(self.convert_func(new_track.tlwh))
self.state = TrackState.Tracked
self.is_activated = True
self.frame_id = frame_id
if new_id:
self.track_id = self.next_id()
self.score = new_track.score
def update(self, new_track, frame_id):
self.frame_id = frame_id
new_tlwh = new_track.tlwh
self.score = new_track.score
self.kalman_filter.update(self.convert_func(new_tlwh))
self.state = TrackState.Tracked
self.is_activated = True
self.time_since_update = 0
@property
def tlwh(self):
"""Get current position in bounding box format `(top left x, top left y,
width, height)`.
"""
return self.__getattribute__(STATE_CONVERT_DICT[self.motion] + '_to_tlwh')()
def xyah_to_tlwh(self, ):
x = self.kalman_filter.kf.x
ret = x[:4].copy()
ret[2] *= ret[3]
ret[:2] -= ret[2:] / 2
return ret
def xywh_to_tlwh(self, ):
x = self.kalman_filter.kf.x
ret = x[:4].copy()
ret[:2] -= ret[2:] / 2
return ret
def xysa_to_tlwh(self, ):
x = self.kalman_filter.kf.x
ret = x[:4].copy()
ret[2] = np.sqrt(x[2] * x[3])
ret[3] = x[2] / ret[2]
ret[:2] -= ret[2:] / 2
return ret
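# A small self-contained sketch of the box conversions behind the properties above,
# written as standalone functions with hypothetical names: a tlwh <-> xyah round trip.
import numpy as np

def _tlwh_to_xyah(tlwh):
    """(top-left x, top-left y, w, h) -> (center x, center y, aspect w/h, h)."""
    x, y, w, h = np.asarray(tlwh, dtype=float)
    return np.array([x + w / 2.0, y + h / 2.0, w / h, h])

def _xyah_to_tlwh(xyah):
    """Inverse of _tlwh_to_xyah."""
    cx, cy, a, h = np.asarray(xyah, dtype=float)
    w = a * h
    return np.array([cx - w / 2.0, cy - h / 2.0, w, h])

# round trip: _xyah_to_tlwh(_tlwh_to_xyah([100, 50, 40, 80])) -> [100., 50., 40., 80.]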
class Tracklet_w_reid(Tracklet):
"""
Tracklet class with reid features, for botsort, deepsort, etc.
"""
def __init__(self, tlwh, score, category, motion='byte',
feat=None, feat_history=50):
super().__init__(tlwh, score, category, motion)
self.smooth_feat = None # EMA feature
self.curr_feat = None # current feature
self.features = deque([], maxlen=feat_history) # all features
if feat is not None:
self.update_features(feat)
self.alpha = 0.9
def update_features(self, feat):
feat /= np.linalg.norm(feat)
self.curr_feat = feat
if self.smooth_feat is None:
self.smooth_feat = feat
else:
self.smooth_feat = self.alpha * self.smooth_feat + (1 - self.alpha) * feat
self.features.append(feat)
self.smooth_feat /= np.linalg.norm(self.smooth_feat)
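# A toy sketch of the EMA feature smoothing above, decoupled from the class:
# features are L2-normalised, blended with weight alpha, and re-normalised.
# The function name is hypothetical.
import numpy as np

def _ema_feature(smooth_feat, new_feat, alpha=0.9):
    new_feat = np.asarray(new_feat, dtype=float)
    new_feat = new_feat / np.linalg.norm(new_feat)
    if smooth_feat is None:
        smooth_feat = new_feat
    else:
        smooth_feat = alpha * smooth_feat + (1.0 - alpha) * new_feat
    return smooth_feat / np.linalg.norm(smooth_feat)

# feat = None
# for f in reid_features:  # hypothetical iterable of ReID vectors
#     feat = _ema_feature(feat, f)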
def re_activate(self, new_track, frame_id, new_id=False):
# TODO different convert
if isinstance(self.kalman_filter, NSAKalman):
self.kalman_filter.update(self.convert_func(new_track.tlwh), new_track.score)
else:
self.kalman_filter.update(self.convert_func(new_track.tlwh))
if new_track.curr_feat is not None:
self.update_features(new_track.curr_feat)
self.state = TrackState.Tracked
self.is_activated = True
self.frame_id = frame_id
if new_id:
self.track_id = self.next_id()
self.score = new_track.score
def update(self, new_track, frame_id):
self.frame_id = frame_id
new_tlwh = new_track.tlwh
self.score = new_track.score
if isinstance(self.kalman_filter, NSAKalman):
self.kalman_filter.update(self.convert_func(new_tlwh), self.score)
else:
self.kalman_filter.update(self.convert_func(new_tlwh))
self.state = TrackState.Tracked
self.is_activated = True
if new_track.curr_feat is not None:
self.update_features(new_track.curr_feat)
self.time_since_update = 0
class Tracklet_w_velocity(Tracklet):
"""
Tracklet class with reid features, for ocsort.
"""
def __init__(self, tlwh, score, category, motion='byte', delta_t=3):
super().__init__(tlwh, score, category, motion)
self.last_observation = np.array([-1, -1, -1, -1, -1]) # placeholder
self.observations = dict()
self.history_observations = []
self.velocity = None
self.delta_t = delta_t
self.age = 0 # mark the age
@staticmethod
def speed_direction(bbox1, bbox2):
cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0
cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0
speed = np.array([cy2 - cy1, cx2 - cx1])
norm = np.sqrt((cy2 - cy1)**2 + (cx2 - cx1)**2) + 1e-6
return speed / norm
def predict(self):
self.kalman_filter.predict()
self.age += 1
self.time_since_update += 1
def update(self, new_track, frame_id):
self.frame_id = frame_id
new_tlwh = new_track.tlwh
self.score = new_track.score
self.kalman_filter.update(self.convert_func(new_tlwh))
self.state = TrackState.Tracked
self.is_activated = True
self.time_since_update = 0
# update velocity and history buffer
new_tlbr = Tracklet_w_bbox_buffer.tlwh_to_tlbr(new_tlwh)
if self.last_observation.sum() >= 0: # there is a previous observation
previous_box = None
for i in range(self.delta_t):
dt = self.delta_t - i
if self.age - dt in self.observations:
previous_box = self.observations[self.age-dt]
break
if previous_box is None:
previous_box = self.last_observation
"""
Estimate the track speed direction with observations \Delta t steps away
"""
self.velocity = self.speed_direction(previous_box, new_tlbr)
new_observation = np.r_[new_tlbr, new_track.score]
self.last_observation = new_observation
self.observations[self.age] = new_observation
self.history_observations.append(new_observation)
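# A toy sketch of how the velocity reference box is chosen above: prefer the
# observation exactly delta_t frames back, falling back to the closest newer one.
# The 'observations' dict in the usage comment is made up (age -> stored box).
def _find_previous_observation(observations, age, delta_t=3):
    for i in range(delta_t):
        dt = delta_t - i
        if age - dt in observations:
            return observations[age - dt]
    return None  # caller falls back to last_observation

# _find_previous_observation({3: 'box@3', 5: 'box@5'}, age=7) -> 'box@5'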
class Tracklet_w_bbox_buffer(Tracklet):
"""
Tracklet class with buffer of bbox, for C_BIoU track.
"""
def __init__(self, tlwh, score, category, motion='byte'):
super().__init__(tlwh, score, category, motion)
# params in motion state
self.b1, self.b2, self.n = 0.3, 0.5, 5
self.origin_bbox_buffer = deque() # a deque store the original bbox(tlwh) from t - self.n to t, where t is the last time detected
self.origin_bbox_buffer.append(self._tlwh)
# buffered bbox, two buffer sizes
self.buffer_bbox1 = self.get_buffer_bbox(level=1)
self.buffer_bbox2 = self.get_buffer_bbox(level=2)
# motion state, s^{t + \delta} = o^t + (\delta / n) * \sum_{i=t-n+1}^t(o^i - o^{i-1}) = o^t + (\delta / n) * (o^t - o^{t - n})
self.motion_state1 = self.buffer_bbox1.copy()
self.motion_state2 = self.buffer_bbox2.copy()
def get_buffer_bbox(self, level=1, bbox=None):
"""
get the buffered bbox: (x, y, w, h) -> (x - b*w, y - b*h, w + 2*b*w, h + 2*b*h)
level = 1: b = self.b1; level = 2: b = self.b2
bbox: if not None, use bbox to calculate buffer_bbox, else use self._tlwh
"""
assert level in [1, 2], 'level must be 1 or 2'
b = self.b1 if level == 1 else self.b2
if bbox is None:
buffer_bbox = self._tlwh + np.array([-b*self._tlwh[2], -b*self._tlwh[3], 2*b*self._tlwh[2], 2*b*self._tlwh[3]])
else:
buffer_bbox = bbox + np.array([-b*bbox[2], -b*bbox[3], 2*b*bbox[2], 2*b*bbox[3]])
return np.maximum(0.0, buffer_bbox)
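# A quick numeric check of the buffered box above (C-BIoU): each side of an
# (x, y, w, h) box is expanded by a ratio b. The box in the usage comment is made up.
import numpy as np

def _buffer_bbox(tlwh, b):
    x, y, w, h = np.asarray(tlwh, dtype=float)
    return np.maximum(0.0, np.array([x - b * w, y - b * h, w + 2 * b * w, h + 2 * b * h]))

# _buffer_bbox([100, 100, 50, 40], b=0.3) -> [85., 88., 80., 64.]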
def re_activate(self, new_track, frame_id, new_id=False):
# TODO different convert
self.kalman_filter.update(self.convert_func(new_track.tlwh))
self.state = TrackState.Tracked
self.is_activated = True
self.frame_id = frame_id
if new_id:
self.track_id = self.next_id()
self.score = new_track.score
self._tlwh = new_track._tlwh
# update stored bbox
if (len(self.origin_bbox_buffer) > self.n):
self.origin_bbox_buffer.popleft()
self.origin_bbox_buffer.append(self._tlwh)
else:
self.origin_bbox_buffer.append(self._tlwh)
self.buffer_bbox1 = self.get_buffer_bbox(level=1)
self.buffer_bbox2 = self.get_buffer_bbox(level=2)
self.motion_state1 = self.buffer_bbox1.copy()
self.motion_state2 = self.buffer_bbox2.copy()
def update(self, new_track, frame_id):
self.frame_id = frame_id
new_tlwh = new_track.tlwh
self.score = new_track.score
self.kalman_filter.update(self.convert_func(new_tlwh))
self.state = TrackState.Tracked
self.is_activated = True
# update stored bbox
if (len(self.origin_bbox_buffer) > self.n):
self.origin_bbox_buffer.popleft()
self.origin_bbox_buffer.append(new_tlwh)
else:
self.origin_bbox_buffer.append(new_tlwh)
# update motion state; check time_since_update *before* resetting it,
# otherwise the extrapolation branch below can never be reached
if self.time_since_update: # have some unmatched frames
if len(self.origin_bbox_buffer) < self.n:
self.motion_state1 = self.get_buffer_bbox(level=1, bbox=new_tlwh)
self.motion_state2 = self.get_buffer_bbox(level=2, bbox=new_tlwh)
else: # s^{t + \delta} = o^t + (\delta / n) * (o^t - o^{t - n})
motion_state = self.origin_bbox_buffer[-1] + \
(self.time_since_update / self.n) * (self.origin_bbox_buffer[-1] - self.origin_bbox_buffer[0])
self.motion_state1 = self.get_buffer_bbox(level=1, bbox=motion_state)
self.motion_state2 = self.get_buffer_bbox(level=2, bbox=motion_state)
else: # no unmatched frames, use current detection as motion state
self.motion_state1 = self.get_buffer_bbox(level=1, bbox=new_tlwh)
self.motion_state2 = self.get_buffer_bbox(level=2, bbox=new_tlwh)
self.time_since_update = 0
class Tracklet_w_depth(Tracklet):
"""
tracklet with depth info (i.e., 2000 - y2), for SparseTrack
"""
def __init__(self, tlwh, score, category, motion='byte'):
super().__init__(tlwh, score, category, motion)
@property
# @jit(nopython=True)
def deep_vec(self):
"""Convert bounding box to format `((top left, bottom right)`, i.e.,
`(top left, bottom right)`.
"""
ret = self.tlwh.copy()
cx = ret[0] + 0.5 * ret[2]
y2 = ret[1] + ret[3]
lendth = 2000 - y2
return np.asarray([cx, y2, lendth], dtype=np.float)
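# A standalone sketch of the pseudo-depth above: objects lower in the image
# (larger y2) get a smaller depth value. The 2000 constant mirrors the property
# above; the box in the usage comment is a made-up example.
import numpy as np

def _deep_vec(tlwh, base=2000.0):
    x, y, w, h = np.asarray(tlwh, dtype=float)
    cx, y2 = x + 0.5 * w, y + h
    return np.array([cx, y2, base - y2])

# _deep_vec([100, 500, 50, 120]) -> [125., 620., 1380.]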

View File

@ -0,0 +1,5 @@
from .eval import Evaluator
from . import datasets
from . import metrics
from . import plotting
from . import utils

View File

@ -0,0 +1,65 @@
from functools import wraps
from time import perf_counter
import inspect
DO_TIMING = False
DISPLAY_LESS_PROGRESS = False
timer_dict = {}
counter = 0
def time(f):
@wraps(f)
def wrap(*args, **kw):
if DO_TIMING:
# Run function with timing
ts = perf_counter()
result = f(*args, **kw)
te = perf_counter()
tt = te-ts
# Get function name
arg_names = inspect.getfullargspec(f)[0]
if arg_names[0] == 'self' and DISPLAY_LESS_PROGRESS:
return result
elif arg_names[0] == 'self':
method_name = type(args[0]).__name__ + '.' + f.__name__
else:
method_name = f.__name__
# Record accumulative time in each function for analysis
if method_name in timer_dict.keys():
timer_dict[method_name] += tt
else:
timer_dict[method_name] = tt
# If code is finished, display timing summary
if method_name == "Evaluator.evaluate":
print("")
print("Timing analysis:")
for key, value in timer_dict.items():
print('%-70s %2.4f sec' % (key, value))
else:
# Get function argument values for printing special arguments of interest
arg_titles = ['tracker', 'seq', 'cls']
arg_vals = []
for i, a in enumerate(arg_names):
if a in arg_titles:
arg_vals.append(args[i])
arg_text = '(' + ', '.join(arg_vals) + ')'
# Display methods and functions with different indentation.
if arg_names[0] == 'self':
print('%-74s %2.4f sec' % (' '*4 + method_name + arg_text, tt))
elif arg_names[0] == 'test':
pass
else:
global counter
counter += 1
print('%i %-70s %2.4f sec' % (counter, method_name + arg_text, tt))
return result
else:
# If config["TIME_PROGRESS"] is false, or config["USE_PARALLEL"] is true, run functions normally without timing.
return f(*args, **kw)
return wrap
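# A minimal usage sketch of a timing decorator in the same spirit as 'time' above,
# kept self-contained (it does not toggle DO_TIMING or the summary printing;
# the decorated function below is hypothetical).
from functools import wraps
from time import perf_counter

def _timed(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = perf_counter()
        result = f(*args, **kw)
        print('%-40s %2.4f sec' % (f.__name__, perf_counter() - ts))
        return result
    return wrap

@_timed
def _busy_sum(n):
    return sum(i * i for i in range(n))

# _busy_sum(100000) prints something like: _busy_sum    0.0123 sec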

View File

@ -0,0 +1,6 @@
import baseline_utils
import stp
import non_overlap
import pascal_colormap
import thresholder
import vizualize

View File

@ -0,0 +1,321 @@
import os
import csv
import numpy as np
from copy import deepcopy
from PIL import Image
from pycocotools import mask as mask_utils
from scipy.optimize import linear_sum_assignment
from trackeval.baselines.pascal_colormap import pascal_colormap
def load_seq(file_to_load):
""" Load input data from file in RobMOTS format (e.g. provided detections).
Returns: Data object with the following structure (see the STP tracker for example usage):
data['cls'][t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles'}
"""
fp = open(file_to_load)
dialect = csv.Sniffer().sniff(fp.readline(), delimiters=' ')
dialect.skipinitialspace = True
fp.seek(0)
reader = csv.reader(fp, dialect)
read_data = {}
num_timesteps = 0
for i, row in enumerate(reader):
if row[-1] == '':
row = row[:-1]
t = int(row[0])
cid = row[1]
c = int(row[2])
s = row[3]
h = row[4]
w = row[5]
rle = row[6]
if t >= num_timesteps:
num_timesteps = t + 1
if c in read_data.keys():
if t in read_data[c].keys():
read_data[c][t]['ids'].append(cid)
read_data[c][t]['scores'].append(s)
read_data[c][t]['im_hs'].append(h)
read_data[c][t]['im_ws'].append(w)
read_data[c][t]['mask_rles'].append(rle)
else:
read_data[c][t] = {}
read_data[c][t]['ids'] = [cid]
read_data[c][t]['scores'] = [s]
read_data[c][t]['im_hs'] = [h]
read_data[c][t]['im_ws'] = [w]
read_data[c][t]['mask_rles'] = [rle]
else:
read_data[c] = {t: {}}
read_data[c][t]['ids'] = [cid]
read_data[c][t]['scores'] = [s]
read_data[c][t]['im_hs'] = [h]
read_data[c][t]['im_ws'] = [w]
read_data[c][t]['mask_rles'] = [rle]
fp.close()
data = {}
for c in read_data.keys():
data[c] = [{} for _ in range(num_timesteps)]
for t in range(num_timesteps):
if t in read_data[c].keys():
data[c][t]['ids'] = np.atleast_1d(read_data[c][t]['ids']).astype(int)
data[c][t]['scores'] = np.atleast_1d(read_data[c][t]['scores']).astype(float)
data[c][t]['im_hs'] = np.atleast_1d(read_data[c][t]['im_hs']).astype(int)
data[c][t]['im_ws'] = np.atleast_1d(read_data[c][t]['im_ws']).astype(int)
data[c][t]['mask_rles'] = np.atleast_1d(read_data[c][t]['mask_rles']).astype(str)
else:
data[c][t]['ids'] = np.empty(0).astype(int)
data[c][t]['scores'] = np.empty(0).astype(float)
data[c][t]['im_hs'] = np.empty(0).astype(int)
data[c][t]['im_ws'] = np.empty(0).astype(int)
data[c][t]['mask_rles'] = np.empty(0).astype(str)
return data
def threshold(tdata, thresh):
""" Removes detections below a certian threshold ('thresh') score. """
new_data = {}
to_keep = tdata['scores'] > thresh
for field in ['ids', 'scores', 'im_hs', 'im_ws', 'mask_rles']:
new_data[field] = tdata[field][to_keep]
return new_data
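# A toy illustration of the thresholding above, with made-up detection fields:
# every per-detection array is filtered by the same boolean score mask.
import numpy as np

_t_data = {'ids': np.array([1, 2, 3]),
           'scores': np.array([0.9, 0.1, 0.4]),
           'im_hs': np.array([480, 480, 480]),
           'im_ws': np.array([640, 640, 640]),
           'mask_rles': np.array(['a', 'b', 'c'])}
# threshold(_t_data, 0.2) keeps the detections with ids 1 and 3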
def create_coco_mask(mask_rles, im_hs, im_ws):
""" Converts mask as rle text (+ height and width) to encoded version used by pycocotools. """
coco_masks = [{'size': [h, w], 'counts': m.encode(encoding='UTF-8')}
for h, w, m in zip(im_hs, im_ws, mask_rles)]
return coco_masks
def mask_iou(mask_rles1, mask_rles2, im_hs, im_ws, do_ioa=0):
""" Calculate mask IoU between two masks.
Further allows 'intersection over area' instead of IoU (over the area of mask_rle1).
Allows either to pass in 1 boolean for do_ioa for all mask_rles2 or also one for each mask_rles2.
It is recommended that mask_rles1 is a detection and mask_rles2 is a groundtruth.
"""
coco_masks1 = create_coco_mask(mask_rles1, im_hs, im_ws)
coco_masks2 = create_coco_mask(mask_rles2, im_hs, im_ws)
if not hasattr(do_ioa, "__len__"):
do_ioa = [do_ioa]*len(coco_masks2)
assert(len(coco_masks2) == len(do_ioa))
if len(coco_masks1) == 0 or len(coco_masks2) == 0:
iou = np.zeros((len(coco_masks1), len(coco_masks2)))
else:
iou = mask_utils.iou(coco_masks1, coco_masks2, do_ioa)
return iou
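# A self-contained check of the mask IoU path above using pycocotools directly
# (the same dependency imported at the top of this file): two overlapping 10x10
# binary masks are RLE-encoded and compared; expected IoU is 16 / 56 ~= 0.2857.
import numpy as np
from pycocotools import mask as mask_utils

_m1 = np.zeros((10, 10), dtype=np.uint8); _m1[2:8, 2:8] = 1
_m2 = np.zeros((10, 10), dtype=np.uint8); _m2[4:10, 4:10] = 1
_rle1 = mask_utils.encode(np.asfortranarray(_m1))
_rle2 = mask_utils.encode(np.asfortranarray(_m2))
# mask_utils.iou([_rle1], [_rle2], [0]) -> [[0.2857...]]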
def sort_by_score(t_data):
""" Sorts data by score """
sort_index = np.argsort(t_data['scores'])[::-1]
for k in t_data.keys():
t_data[k] = t_data[k][sort_index]
return t_data
def mask_NMS(t_data, nms_threshold=0.5, already_sorted=False):
""" Remove redundant masks by performing non-maximum suppression (NMS) """
# Sort by score
if not already_sorted:
t_data = sort_by_score(t_data)
# Calculate the mask IoU between all detections in the timestep.
mask_ious_all = mask_iou(t_data['mask_rles'], t_data['mask_rles'], t_data['im_hs'], t_data['im_ws'])
# Determine which masks NMS should remove
# (those overlapping greater than nms_threshold with another mask that has a higher score)
num_dets = len(t_data['mask_rles'])
to_remove = [False for _ in range(num_dets)]
for i in range(num_dets):
if not to_remove[i]:
for j in range(i + 1, num_dets):
if mask_ious_all[i, j] > nms_threshold:
to_remove[j] = True
# Remove detections which should be removed
to_keep = np.logical_not(to_remove)
for k in t_data.keys():
t_data[k] = t_data[k][to_keep]
return t_data
def non_overlap(t_data, already_sorted=False):
""" Enforces masks to be non-overlapping in an image, does this by putting masks 'on top of one another',
such that higher score masks 'occlude' and thus remove parts of lower scoring masks.
Help wanted: if anyone knows a way to do this WITHOUT converting the RLE to the np.array let me know, because that
would be MUCH more efficient. (I have tried, but haven't yet had success).
"""
# Sort by score
if not already_sorted:
t_data = sort_by_score(t_data)
# Get coco masks
coco_masks = create_coco_mask(t_data['mask_rles'], t_data['im_hs'], t_data['im_ws'])
# Create a single np.array to hold all of the non-overlapping mask
masks_array = np.zeros((t_data['im_hs'][0], t_data['im_ws'][0]), 'uint8')
# Decode each mask into a np.array, and place it into the overall array for the whole frame.
# Since masks with the lowest score are placed first, they are 'partially overridden' by masks with a higher score
# if they overlap.
for i, mask in enumerate(coco_masks[::-1]):
masks_array[mask_utils.decode(mask).astype('bool')] = i + 1
# Encode the resulting np.array back into a set of coco_masks which are now non-overlapping.
num_dets = len(coco_masks)
for i, j in enumerate(range(1, num_dets + 1)[::-1]):
coco_masks[i] = mask_utils.encode(np.asfortranarray(masks_array == j, dtype=np.uint8))
# Convert from coco_mask back into our mask_rle format.
t_data['mask_rles'] = [m['counts'].decode("utf-8") for m in coco_masks]
return t_data
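# A tiny sketch of the 'painting' idea above: masks are written into one array
# from lowest to highest score, so higher-scoring masks overwrite (occlude)
# lower-scoring ones where they overlap. The arrays below are made up.
import numpy as np

_canvas = np.zeros((4, 4), dtype=np.uint8)
_low = np.zeros((4, 4), dtype=bool); _low[1:4, 1:4] = True    # lower score -> id 2
_high = np.zeros((4, 4), dtype=bool); _high[0:2, 0:2] = True  # higher score -> id 1
for _obj_id, _m in [(2, _low), (1, _high)]:  # paint lowest score first
    _canvas[_m] = _obj_id
# pixel (1, 1) lies in both masks and ends up with id 1 (the higher score wins)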
def masks2boxes(mask_rles, im_hs, im_ws):
""" Extracts bounding boxes which surround a set of masks. """
coco_masks = create_coco_mask(mask_rles, im_hs, im_ws)
boxes = np.array([mask_utils.toBbox(x) for x in coco_masks])
if len(boxes) == 0:
boxes = np.empty((0, 4))
return boxes
def box_iou(bboxes1, bboxes2, box_format='xywh', do_ioa=False, do_giou=False):
""" Calculates the IOU (intersection over union) between two arrays of boxes.
Allows variable box formats ('xywh' and 'x0y0x1y1').
If do_ioa (intersection over area), then calculates the intersection over the area of boxes1 - this is commonly
used to determine if detections are within crowd ignore region.
If do_giou (generalized intersection over union), then calculates GIoU.
"""
if len(bboxes1) == 0 or len(bboxes2) == 0:
ious = np.zeros((len(bboxes1), len(bboxes2)))
return ious
if box_format in 'xywh':
# layout: (x0, y0, w, h)
bboxes1 = deepcopy(bboxes1)
bboxes2 = deepcopy(bboxes2)
bboxes1[:, 2] = bboxes1[:, 0] + bboxes1[:, 2]
bboxes1[:, 3] = bboxes1[:, 1] + bboxes1[:, 3]
bboxes2[:, 2] = bboxes2[:, 0] + bboxes2[:, 2]
bboxes2[:, 3] = bboxes2[:, 1] + bboxes2[:, 3]
elif box_format not in 'x0y0x1y1':
raise (Exception('box_format %s is not implemented' % box_format))
# layout: (x0, y0, x1, y1)
min_ = np.minimum(bboxes1[:, np.newaxis, :], bboxes2[np.newaxis, :, :])
max_ = np.maximum(bboxes1[:, np.newaxis, :], bboxes2[np.newaxis, :, :])
intersection = np.maximum(min_[..., 2] - max_[..., 0], 0) * np.maximum(min_[..., 3] - max_[..., 1], 0)
area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1])
if do_ioa:
ioas = np.zeros_like(intersection)
valid_mask = area1 > 0 + np.finfo('float').eps
ioas[valid_mask, :] = intersection[valid_mask, :] / area1[valid_mask][:, np.newaxis]
return ioas
else:
area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1])
union = area1[:, np.newaxis] + area2[np.newaxis, :] - intersection
intersection[area1 <= 0 + np.finfo('float').eps, :] = 0
intersection[:, area2 <= 0 + np.finfo('float').eps] = 0
intersection[union <= 0 + np.finfo('float').eps] = 0
union[union <= 0 + np.finfo('float').eps] = 1
ious = intersection / union
if do_giou:
enclosing_area = np.maximum(max_[..., 2] - min_[..., 0], 0) * np.maximum(max_[..., 3] - min_[..., 1], 0)
eps = 1e-7
# giou
ious = ious - ((enclosing_area - union) / (enclosing_area + eps))
return ious
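# A scalar sanity check for the vectorised box_iou above, using the x0y0x1y1
# format and made-up boxes: intersection 25, union 175, IoU ~0.1429.
def _simple_iou(b1, b2):
    x0, y0 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x1, y1 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(x1 - x0, 0) * max(y1 - y0, 0)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

# _simple_iou([0, 0, 10, 10], [5, 5, 15, 15]) -> 0.142857...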
def match(match_scores):
match_rows, match_cols = linear_sum_assignment(-match_scores)
return match_rows, match_cols
def write_seq(output_data, out_file):
out_loc = os.path.dirname(out_file)
if not os.path.exists(out_loc):
os.makedirs(out_loc, exist_ok=True)
fp = open(out_file, 'w', newline='')
writer = csv.writer(fp, delimiter=' ')
for row in output_data:
writer.writerow(row)
fp.close()
def combine_classes(data):
""" Converts data from a class-separated to a class-combined format.
Input format: data['cls'][t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles'}
Output format: data[t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles', 'cls'}
"""
output_data = [{} for _ in list(data.values())[0]]
for cls, cls_data in data.items():
for timestep, t_data in enumerate(cls_data):
for k in t_data.keys():
if k in output_data[timestep].keys():
output_data[timestep][k] += list(t_data[k])
else:
output_data[timestep][k] = list(t_data[k])
if 'cls' in output_data[timestep].keys():
output_data[timestep]['cls'] += [cls]*len(t_data['ids'])
else:
output_data[timestep]['cls'] = [cls]*len(t_data['ids'])
for timestep, t_data in enumerate(output_data):
for k in t_data.keys():
output_data[timestep][k] = np.array(output_data[timestep][k])
return output_data
def save_as_png(t_data, out_file, im_h, im_w):
""" Save a set of segmentation masks into a PNG format, the same as used for the DAVIS dataset."""
if len(t_data['mask_rles']) > 0:
coco_masks = create_coco_mask(t_data['mask_rles'], t_data['im_hs'], t_data['im_ws'])
list_of_np_masks = [mask_utils.decode(mask) for mask in coco_masks]
png = np.zeros((t_data['im_hs'][0], t_data['im_ws'][0]))
for mask, c_id in zip(list_of_np_masks, t_data['ids']):
png[mask.astype("bool")] = c_id + 1
else:
png = np.zeros((im_h, im_w))
if not os.path.exists(os.path.dirname(out_file)):
os.makedirs(os.path.dirname(out_file))
colmap = (np.array(pascal_colormap) * 255).round().astype("uint8")
palimage = Image.new('P', (16, 16))
palimage.putpalette(colmap)
im = Image.fromarray(np.squeeze(png.astype("uint8")))
im2 = im.quantize(palette=palimage)
im2.save(out_file)
def get_frame_size(data):
""" Gets frame height and width from data. """
for cls, cls_data in data.items():
for timestep, t_data in enumerate(cls_data):
if len(t_data['im_hs']) > 0:
im_h = t_data['im_hs'][0]
im_w = t_data['im_ws'][0]
return im_h, im_w
return None

View File

@ -0,0 +1,92 @@
"""
Non-Overlap: Code to take in a set of raw detections and produce a set of non-overlapping detections from it.
Author: Jonathon Luiten
"""
import os
import sys
from multiprocessing.pool import Pool
from multiprocessing import freeze_support
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))
from trackeval.baselines import baseline_utils as butils
from trackeval.utils import get_code_path
code_path = get_code_path()
config = {
'INPUT_FOL': os.path.join(code_path, 'data/detections/rob_mots/{split}/raw_supplied/data/'),
'OUTPUT_FOL': os.path.join(code_path, 'data/detections/rob_mots/{split}/non_overlap_supplied/data/'),
'SPLIT': 'train', # valid: 'train', 'val', 'test'.
'Benchmarks': None, # If None, all benchmarks in SPLIT.
'Num_Parallel_Cores': None, # If None, run without parallel.
'THRESHOLD_NMS_MASK_IOU': 0.5,
}
def do_sequence(seq_file):
# Load input data from file (e.g. provided detections)
# data format: data['cls'][t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles'}
data = butils.load_seq(seq_file)
# Converts data from a class-separated to a class-combined format.
# data[t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles', 'cls'}
data = butils.combine_classes(data)
# Where to accumulate output data for writing out
output_data = []
# Run for each timestep.
for timestep, t_data in enumerate(data):
# Remove redundant masks by performing non-maximum suppression (NMS)
t_data = butils.mask_NMS(t_data, nms_threshold=config['THRESHOLD_NMS_MASK_IOU'])
# Perform non-overlap, to get non_overlapping masks.
t_data = butils.non_overlap(t_data, already_sorted=True)
# Save result in output format to write to file later.
# Output Format = [timestep ID class score im_h im_w mask_RLE]
for i in range(len(t_data['ids'])):
row = [timestep, int(t_data['ids'][i]), t_data['cls'][i], t_data['scores'][i], t_data['im_hs'][i],
t_data['im_ws'][i], t_data['mask_rles'][i]]
output_data.append(row)
# Write results to file
out_file = seq_file.replace(config['INPUT_FOL'].format(split=config['SPLIT']),
config['OUTPUT_FOL'].format(split=config['SPLIT']))
butils.write_seq(output_data, out_file)
print('DONE:', seq_file)
if __name__ == '__main__':
# Required to fix bug in multiprocessing on windows.
freeze_support()
# Obtain list of sequences to run tracker for.
if config['Benchmarks']:
benchmarks = config['Benchmarks']
else:
benchmarks = ['davis_unsupervised', 'kitti_mots', 'youtube_vis', 'ovis', 'bdd_mots', 'tao']
if config['SPLIT'] != 'train':
benchmarks += ['waymo', 'mots_challenge']
seqs_todo = []
for bench in benchmarks:
bench_fol = os.path.join(config['INPUT_FOL'].format(split=config['SPLIT']), bench)
seqs_todo += [os.path.join(bench_fol, seq) for seq in os.listdir(bench_fol)]
# Run in parallel
if config['Num_Parallel_Cores']:
with Pool(config['Num_Parallel_Cores']) as pool:
results = pool.map(do_sequence, seqs_todo)
# Run in series
else:
for seq_todo in seqs_todo:
do_sequence(seq_todo)

View File

@ -0,0 +1,257 @@
pascal_colormap = [
0 , 0, 0,
0.5020, 0, 0,
0, 0.5020, 0,
0.5020, 0.5020, 0,
0, 0, 0.5020,
0.5020, 0, 0.5020,
0, 0.5020, 0.5020,
0.5020, 0.5020, 0.5020,
0.2510, 0, 0,
0.7529, 0, 0,
0.2510, 0.5020, 0,
0.7529, 0.5020, 0,
0.2510, 0, 0.5020,
0.7529, 0, 0.5020,
0.2510, 0.5020, 0.5020,
0.7529, 0.5020, 0.5020,
0, 0.2510, 0,
0.5020, 0.2510, 0,
0, 0.7529, 0,
0.5020, 0.7529, 0,
0, 0.2510, 0.5020,
0.5020, 0.2510, 0.5020,
0, 0.7529, 0.5020,
0.5020, 0.7529, 0.5020,
0.2510, 0.2510, 0,
0.7529, 0.2510, 0,
0.2510, 0.7529, 0,
0.7529, 0.7529, 0,
0.2510, 0.2510, 0.5020,
0.7529, 0.2510, 0.5020,
0.2510, 0.7529, 0.5020,
0.7529, 0.7529, 0.5020,
0, 0, 0.2510,
0.5020, 0, 0.2510,
0, 0.5020, 0.2510,
0.5020, 0.5020, 0.2510,
0, 0, 0.7529,
0.5020, 0, 0.7529,
0, 0.5020, 0.7529,
0.5020, 0.5020, 0.7529,
0.2510, 0, 0.2510,
0.7529, 0, 0.2510,
0.2510, 0.5020, 0.2510,
0.7529, 0.5020, 0.2510,
0.2510, 0, 0.7529,
0.7529, 0, 0.7529,
0.2510, 0.5020, 0.7529,
0.7529, 0.5020, 0.7529,
0, 0.2510, 0.2510,
0.5020, 0.2510, 0.2510,
0, 0.7529, 0.2510,
0.5020, 0.7529, 0.2510,
0, 0.2510, 0.7529,
0.5020, 0.2510, 0.7529,
0, 0.7529, 0.7529,
0.5020, 0.7529, 0.7529,
0.2510, 0.2510, 0.2510,
0.7529, 0.2510, 0.2510,
0.2510, 0.7529, 0.2510,
0.7529, 0.7529, 0.2510,
0.2510, 0.2510, 0.7529,
0.7529, 0.2510, 0.7529,
0.2510, 0.7529, 0.7529,
0.7529, 0.7529, 0.7529,
0.1255, 0, 0,
0.6275, 0, 0,
0.1255, 0.5020, 0,
0.6275, 0.5020, 0,
0.1255, 0, 0.5020,
0.6275, 0, 0.5020,
0.1255, 0.5020, 0.5020,
0.6275, 0.5020, 0.5020,
0.3765, 0, 0,
0.8784, 0, 0,
0.3765, 0.5020, 0,
0.8784, 0.5020, 0,
0.3765, 0, 0.5020,
0.8784, 0, 0.5020,
0.3765, 0.5020, 0.5020,
0.8784, 0.5020, 0.5020,
0.1255, 0.2510, 0,
0.6275, 0.2510, 0,
0.1255, 0.7529, 0,
0.6275, 0.7529, 0,
0.1255, 0.2510, 0.5020,
0.6275, 0.2510, 0.5020,
0.1255, 0.7529, 0.5020,
0.6275, 0.7529, 0.5020,
0.3765, 0.2510, 0,
0.8784, 0.2510, 0,
0.3765, 0.7529, 0,
0.8784, 0.7529, 0,
0.3765, 0.2510, 0.5020,
0.8784, 0.2510, 0.5020,
0.3765, 0.7529, 0.5020,
0.8784, 0.7529, 0.5020,
0.1255, 0, 0.2510,
0.6275, 0, 0.2510,
0.1255, 0.5020, 0.2510,
0.6275, 0.5020, 0.2510,
0.1255, 0, 0.7529,
0.6275, 0, 0.7529,
0.1255, 0.5020, 0.7529,
0.6275, 0.5020, 0.7529,
0.3765, 0, 0.2510,
0.8784, 0, 0.2510,
0.3765, 0.5020, 0.2510,
0.8784, 0.5020, 0.2510,
0.3765, 0, 0.7529,
0.8784, 0, 0.7529,
0.3765, 0.5020, 0.7529,
0.8784, 0.5020, 0.7529,
0.1255, 0.2510, 0.2510,
0.6275, 0.2510, 0.2510,
0.1255, 0.7529, 0.2510,
0.6275, 0.7529, 0.2510,
0.1255, 0.2510, 0.7529,
0.6275, 0.2510, 0.7529,
0.1255, 0.7529, 0.7529,
0.6275, 0.7529, 0.7529,
0.3765, 0.2510, 0.2510,
0.8784, 0.2510, 0.2510,
0.3765, 0.7529, 0.2510,
0.8784, 0.7529, 0.2510,
0.3765, 0.2510, 0.7529,
0.8784, 0.2510, 0.7529,
0.3765, 0.7529, 0.7529,
0.8784, 0.7529, 0.7529,
0, 0.1255, 0,
0.5020, 0.1255, 0,
0, 0.6275, 0,
0.5020, 0.6275, 0,
0, 0.1255, 0.5020,
0.5020, 0.1255, 0.5020,
0, 0.6275, 0.5020,
0.5020, 0.6275, 0.5020,
0.2510, 0.1255, 0,
0.7529, 0.1255, 0,
0.2510, 0.6275, 0,
0.7529, 0.6275, 0,
0.2510, 0.1255, 0.5020,
0.7529, 0.1255, 0.5020,
0.2510, 0.6275, 0.5020,
0.7529, 0.6275, 0.5020,
0, 0.3765, 0,
0.5020, 0.3765, 0,
0, 0.8784, 0,
0.5020, 0.8784, 0,
0, 0.3765, 0.5020,
0.5020, 0.3765, 0.5020,
0, 0.8784, 0.5020,
0.5020, 0.8784, 0.5020,
0.2510, 0.3765, 0,
0.7529, 0.3765, 0,
0.2510, 0.8784, 0,
0.7529, 0.8784, 0,
0.2510, 0.3765, 0.5020,
0.7529, 0.3765, 0.5020,
0.2510, 0.8784, 0.5020,
0.7529, 0.8784, 0.5020,
0, 0.1255, 0.2510,
0.5020, 0.1255, 0.2510,
0, 0.6275, 0.2510,
0.5020, 0.6275, 0.2510,
0, 0.1255, 0.7529,
0.5020, 0.1255, 0.7529,
0, 0.6275, 0.7529,
0.5020, 0.6275, 0.7529,
0.2510, 0.1255, 0.2510,
0.7529, 0.1255, 0.2510,
0.2510, 0.6275, 0.2510,
0.7529, 0.6275, 0.2510,
0.2510, 0.1255, 0.7529,
0.7529, 0.1255, 0.7529,
0.2510, 0.6275, 0.7529,
0.7529, 0.6275, 0.7529,
0, 0.3765, 0.2510,
0.5020, 0.3765, 0.2510,
0, 0.8784, 0.2510,
0.5020, 0.8784, 0.2510,
0, 0.3765, 0.7529,
0.5020, 0.3765, 0.7529,
0, 0.8784, 0.7529,
0.5020, 0.8784, 0.7529,
0.2510, 0.3765, 0.2510,
0.7529, 0.3765, 0.2510,
0.2510, 0.8784, 0.2510,
0.7529, 0.8784, 0.2510,
0.2510, 0.3765, 0.7529,
0.7529, 0.3765, 0.7529,
0.2510, 0.8784, 0.7529,
0.7529, 0.8784, 0.7529,
0.1255, 0.1255, 0,
0.6275, 0.1255, 0,
0.1255, 0.6275, 0,
0.6275, 0.6275, 0,
0.1255, 0.1255, 0.5020,
0.6275, 0.1255, 0.5020,
0.1255, 0.6275, 0.5020,
0.6275, 0.6275, 0.5020,
0.3765, 0.1255, 0,
0.8784, 0.1255, 0,
0.3765, 0.6275, 0,
0.8784, 0.6275, 0,
0.3765, 0.1255, 0.5020,
0.8784, 0.1255, 0.5020,
0.3765, 0.6275, 0.5020,
0.8784, 0.6275, 0.5020,
0.1255, 0.3765, 0,
0.6275, 0.3765, 0,
0.1255, 0.8784, 0,
0.6275, 0.8784, 0,
0.1255, 0.3765, 0.5020,
0.6275, 0.3765, 0.5020,
0.1255, 0.8784, 0.5020,
0.6275, 0.8784, 0.5020,
0.3765, 0.3765, 0,
0.8784, 0.3765, 0,
0.3765, 0.8784, 0,
0.8784, 0.8784, 0,
0.3765, 0.3765, 0.5020,
0.8784, 0.3765, 0.5020,
0.3765, 0.8784, 0.5020,
0.8784, 0.8784, 0.5020,
0.1255, 0.1255, 0.2510,
0.6275, 0.1255, 0.2510,
0.1255, 0.6275, 0.2510,
0.6275, 0.6275, 0.2510,
0.1255, 0.1255, 0.7529,
0.6275, 0.1255, 0.7529,
0.1255, 0.6275, 0.7529,
0.6275, 0.6275, 0.7529,
0.3765, 0.1255, 0.2510,
0.8784, 0.1255, 0.2510,
0.3765, 0.6275, 0.2510,
0.8784, 0.6275, 0.2510,
0.3765, 0.1255, 0.7529,
0.8784, 0.1255, 0.7529,
0.3765, 0.6275, 0.7529,
0.8784, 0.6275, 0.7529,
0.1255, 0.3765, 0.2510,
0.6275, 0.3765, 0.2510,
0.1255, 0.8784, 0.2510,
0.6275, 0.8784, 0.2510,
0.1255, 0.3765, 0.7529,
0.6275, 0.3765, 0.7529,
0.1255, 0.8784, 0.7529,
0.6275, 0.8784, 0.7529,
0.3765, 0.3765, 0.2510,
0.8784, 0.3765, 0.2510,
0.3765, 0.8784, 0.2510,
0.8784, 0.8784, 0.2510,
0.3765, 0.3765, 0.7529,
0.8784, 0.3765, 0.7529,
0.3765, 0.8784, 0.7529,
0.8784, 0.8784, 0.7529]

View File

@ -0,0 +1,144 @@
"""
STP: Simplest Tracker Possible
Author: Jonathon Luiten
This simple tracker assigns track IDs that maximise the 'bounding box IoU' between previous tracks and current
detections. It is also able to match detections to tracks at more than one timestep previously.
"""
import os
import sys
import numpy as np
from multiprocessing.pool import Pool
from multiprocessing import freeze_support
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))
from trackeval.baselines import baseline_utils as butils
from trackeval.utils import get_code_path
code_path = get_code_path()
config = {
'INPUT_FOL': os.path.join(code_path, 'data/detections/rob_mots/{split}/non_overlap_supplied/data/'),
'OUTPUT_FOL': os.path.join(code_path, 'data/trackers/rob_mots/{split}/STP/data/'),
'SPLIT': 'train', # valid: 'train', 'val', 'test'.
'Benchmarks': None, # If None, all benchmarks in SPLIT.
'Num_Parallel_Cores': None, # If None, run without parallel.
'DETECTION_THRESHOLD': 0.5,
'ASSOCIATION_THRESHOLD': 1e-10,
'MAX_FRAMES_SKIP': 7
}
def track_sequence(seq_file):
# Load input data from file (e.g. provided detections)
# data format: data['cls'][t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles'}
data = butils.load_seq(seq_file)
# Where to accumulate output data for writing out
output_data = []
# To ensure IDs are unique per object across all classes.
curr_max_id = 0
# Run tracker for each class.
for cls, cls_data in data.items():
# Initialize container for holding previously tracked objects.
prev = {'boxes': np.empty((0, 4)),
'ids': np.array([], int),
'timesteps': np.array([])}
# Run tracker for each timestep.
for timestep, t_data in enumerate(cls_data):
# Threshold detections.
t_data = butils.threshold(t_data, config['DETECTION_THRESHOLD'])
# Convert mask dets to bounding boxes.
boxes = butils.masks2boxes(t_data['mask_rles'], t_data['im_hs'], t_data['im_ws'])
# Calculate IoU between previous and current frame dets.
ious = butils.box_iou(prev['boxes'], boxes)
# Score that decays quickly with how many timesteps ago a previous det was last matched.
prev_timestep_scores = np.power(10, -1 * prev['timesteps'])
# Matching score is such that it first tries to match 'most recent timesteps',
# and within each timestep maximised IoU.
match_scores = prev_timestep_scores[:, np.newaxis] * ious
# Find best matching between current dets and previous tracks.
match_rows, match_cols = butils.match(match_scores)
# Remove matches that have an IoU below a certain threshold.
actually_matched_mask = ious[match_rows, match_cols] > config['ASSOCIATION_THRESHOLD']
match_rows = match_rows[actually_matched_mask]
match_cols = match_cols[actually_matched_mask]
# Assign the prev track ID to the current dets if they were matched.
ids = np.nan * np.ones((len(boxes),))
ids[match_cols] = prev['ids'][match_rows]
# Create new track IDs for dets that were not matched to previous tracks.
num_not_matched = len(ids) - len(match_cols)
new_ids = np.arange(curr_max_id + 1, curr_max_id + num_not_matched + 1)
ids[np.isnan(ids)] = new_ids
# Update maximum ID to ensure future added tracks have a unique ID value.
curr_max_id += num_not_matched
# Drop tracks from 'previous tracks' if they have not been matched in the last MAX_FRAMES_SKIP frames.
unmatched_rows = [i for i in range(len(prev['ids'])) if
i not in match_rows and (prev['timesteps'][i] + 1 <= config['MAX_FRAMES_SKIP'])]
# Update the set of previous tracking results to include the newly tracked detections.
prev['ids'] = np.concatenate((ids, prev['ids'][unmatched_rows]), axis=0)
prev['boxes'] = np.concatenate((np.atleast_2d(boxes), np.atleast_2d(prev['boxes'][unmatched_rows])), axis=0)
prev['timesteps'] = np.concatenate((np.zeros((len(ids),)), prev['timesteps'][unmatched_rows] + 1), axis=0)
# Save result in output format to write to file later.
# Output Format = [timestep ID class score im_h im_w mask_RLE]
for i in range(len(t_data['ids'])):
row = [timestep, int(ids[i]), cls, t_data['scores'][i], t_data['im_hs'][i], t_data['im_ws'][i],
t_data['mask_rles'][i]]
output_data.append(row)
# Write results to file
out_file = seq_file.replace(config['INPUT_FOL'].format(split=config['SPLIT']),
config['OUTPUT_FOL'].format(split=config['SPLIT']))
butils.write_seq(output_data, out_file)
print('DONE:', seq_file)
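# A toy sketch of the matching logic used in track_sequence above: scores prefer
# tracks seen more recently (factor 10**-timesteps) and, within that, higher IoU.
# The numbers are made up; linear_sum_assignment maximises the total score via negation.
import numpy as np
from scipy.optimize import linear_sum_assignment

_ious = np.array([[0.6, 0.1],   # track 0 vs detections 0, 1
                  [0.5, 0.7]])  # track 1 vs detections 0, 1
_timesteps = np.array([0, 2])   # track 1 was last matched 2 frames ago
_scores = np.power(10.0, -1 * _timesteps)[:, np.newaxis] * _ious
_rows, _cols = linear_sum_assignment(-_scores)
# -> matches (track 0, det 0) and (track 1, det 1)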
if __name__ == '__main__':
# Required to fix bug in multiprocessing on windows.
freeze_support()
# Obtain list of sequences to run tracker for.
if config['Benchmarks']:
benchmarks = config['Benchmarks']
else:
benchmarks = ['davis_unsupervised', 'kitti_mots', 'youtube_vis', 'ovis', 'bdd_mots', 'tao']
if config['SPLIT'] != 'train':
benchmarks += ['waymo', 'mots_challenge']
seqs_todo = []
for bench in benchmarks:
bench_fol = os.path.join(config['INPUT_FOL'].format(split=config['SPLIT']), bench)
seqs_todo += [os.path.join(bench_fol, seq) for seq in os.listdir(bench_fol)]
# Run in parallel
if config['Num_Parallel_Cores']:
with Pool(config['Num_Parallel_Cores']) as pool:
results = pool.map(track_sequence, seqs_todo)
# Run in series
else:
for seq_todo in seqs_todo:
track_sequence(seq_todo)

View File

@ -0,0 +1,92 @@
"""
Thresholder
Author: Jonathon Luiten
Simply reads in a set of detections, thresholds them at a certain score threshold, and writes them out again.
"""
import os
import sys
from multiprocessing.pool import Pool
from multiprocessing import freeze_support
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))
from trackeval.baselines import baseline_utils as butils
from trackeval.utils import get_code_path
THRESHOLD = 0.2
code_path = get_code_path()
config = {
'INPUT_FOL': os.path.join(code_path, 'data/detections/rob_mots/{split}/non_overlap_supplied/data/'),
'OUTPUT_FOL': os.path.join(code_path, 'data/detections/rob_mots/{split}/threshold_' + str(100*THRESHOLD) + '/data/'),
'SPLIT': 'train', # valid: 'train', 'val', 'test'.
'Benchmarks': None, # If None, all benchmarks in SPLIT.
'Num_Parallel_Cores': None, # If None, run without parallel.
'DETECTION_THRESHOLD': THRESHOLD,
}
def do_sequence(seq_file):
# Load input data from file (e.g. provided detections)
# data format: data['cls'][t] = {'ids', 'scores', 'im_hs', 'im_ws', 'mask_rles'}
data = butils.load_seq(seq_file)
# Where to accumulate output data for writing out
output_data = []
# Run for each class.
for cls, cls_data in data.items():
# Run for each timestep.
for timestep, t_data in enumerate(cls_data):
# Threshold detections.
t_data = butils.threshold(t_data, config['DETECTION_THRESHOLD'])
# Save result in output format to write to file later.
# Output Format = [timestep ID class score im_h im_w mask_RLE]
for i in range(len(t_data['ids'])):
row = [timestep, int(t_data['ids'][i]), cls, t_data['scores'][i], t_data['im_hs'][i],
t_data['im_ws'][i], t_data['mask_rles'][i]]
output_data.append(row)
# Write results to file
out_file = seq_file.replace(config['INPUT_FOL'].format(split=config['SPLIT']),
config['OUTPUT_FOL'].format(split=config['SPLIT']))
butils.write_seq(output_data, out_file)
print('DONE:', seq_file)
if __name__ == '__main__':
# Required to fix bug in multiprocessing on windows.
freeze_support()
# Obtain list of sequences to run tracker for.
if config['Benchmarks']:
benchmarks = config['Benchmarks']
else:
benchmarks = ['davis_unsupervised', 'kitti_mots', 'youtube_vis', 'ovis', 'bdd_mots', 'tao']
if config['SPLIT'] != 'train':
benchmarks += ['waymo', 'mots_challenge']
seqs_todo = []
for bench in benchmarks:
bench_fol = os.path.join(config['INPUT_FOL'].format(split=config['SPLIT']), bench)
seqs_todo += [os.path.join(bench_fol, seq) for seq in os.listdir(bench_fol)]
# Run in parallel
if config['Num_Parallel_Cores']:
with Pool(config['Num_Parallel_Cores']) as pool:
results = pool.map(do_sequence, seqs_todo)
# Run in series
else:
for seq_todo in seqs_todo:
do_sequence(seq_todo)

Some files were not shown because too many files have changed in this diff