commit 189b75ad3fc2d4a0d40a818ca298526d254ccdc4 Author: Greg Kroah-Hartman Date: Sat Jan 26 09:38:36 2019 +0100 Linux 4.9.153 commit 4b527f25a4ac64c0d17c882cdf8322be66f31611 Author: Dave Airlie Date: Thu Jan 24 18:54:15 2019 +0000 locking/qspinlock: Pull in asm/byteorder.h to ensure correct endianness This commit is not required upstream, but is required for the 4.9.y stable series. Upstream commit 101110f6271c ("Kbuild: always define endianess in kconfig.h") ensures that either __LITTLE_ENDIAN or __BIG_ENDIAN is defined to reflect the endianness of the target CPU architecture regardless of whether or not has been #included. The upstream definition of 'struct qspinlock' relies on this property. Unfortunately, the 4.9.y stable series does not provide this guarantee, so the 'spin_unlock()' routine can erroneously treat the underlying lockword as big-endian on little-endian architectures using native qspinlock (i.e. x86_64 without PV) if the caller has not included . This can lead to hangs such as the one in 'i915_gem_request()' reported via bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202063 Fix the issue by ensuring that is #included in , where 'struct qspinlock' is defined. Cc: # 4.9 Signed-off-by: Dave Airlie [will: wrote commit message] Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit cac2590d2582c04f995c2b1cd2c30d8a0c11b36e Author: Corey Minyard Date: Fri Nov 16 09:59:21 2018 -0600 ipmi:ssif: Fix handling of multi-part return messages commit 7d6380cd40f7993f75c4bde5b36f6019237e8719 upstream. The block number was not being compared right, it was off by one when checking the response. Some statistics wouldn't be incremented properly in some cases. Check to see if that middle-part messages always have 31 bytes of data. Signed-off-by: Corey Minyard Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Greg Kroah-Hartman commit 28aeb4c95f8a8da9c65461979fbb8eddf7d4f56f Author: Michal Hocko Date: Fri Dec 28 00:38:17 2018 -0800 mm, proc: be more verbose about unstable VMA flags in /proc//smaps [ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ] Patch series "THP eligibility reporting via proc". This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution. A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse. The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful. The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1] http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz This patch (of 3): Even though vma flags exported via /proc//smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time. Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc//smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel. http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com : Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp. In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well. Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Jan Kara Acked-by: Dan Williams Acked-by: David Rientjes Acked-by: Mike Rapoport Acked-by: Vlastimil Babka Cc: Dan Williams Cc: David Rientjes Cc: Paul Oppenheimer Cc: William Kucharski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit c6e4be626e83a3e61004c944bb5aa210ec3447cc Author: Brian Foster Date: Fri Dec 28 00:37:20 2018 -0800 mm/page-writeback.c: don't break integrity writeback on ->writepage() error [ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ] write_cache_pages() is used in both background and integrity writeback scenarios by various filesystems. Background writeback is mostly concerned with cleaning a certain number of dirty pages based on various mm heuristics. It may not write the full set of dirty pages or wait for I/O to complete. Integrity writeback is responsible for persisting a set of dirty pages before the writeback job completes. For example, an fsync() call must perform integrity writeback to ensure data is on disk before the call returns. write_cache_pages() unconditionally breaks out of its processing loop in the event of a ->writepage() error. This is fine for background writeback, which had no strict requirements and will eventually come around again. This can cause problems for integrity writeback on filesystems that might need to clean up state associated with failed page writeouts. For example, XFS performs internal delayed allocation accounting before returning a ->writepage() error, where applicable. If the current writeback happens to be associated with an unmount and write_cache_pages() completes the writeback prematurely due to error, the filesystem is unmounted in an inconsistent state if dirty+delalloc pages still exist. To handle this problem, update write_cache_pages() to always process the full set of pages for integrity writeback regardless of ->writepage() errors. Save the first encountered error and return it to the caller once complete. This facilitates XFS (or any other fs that expects integrity writeback to process the entire set of dirty pages) to clean up its internal state completely in the event of persistent mapping errors. Background writeback continues to exit on the first error encountered. [akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com Signed-off-by: Brian Foster Reviewed-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 3cc44695613e98bbe3df60fc5297a3a9b55e45a4 Author: Junxiao Bi Date: Fri Dec 28 00:32:50 2018 -0800 ocfs2: fix panic due to unrecovered local alloc [ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ] mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic. This issue was reported at https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run. (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777 o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[] [] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi Reviewed-by: Yiwen Jiang Acked-by: Joseph Qi Cc: Jun Piao Cc: Mark Fasheh Cc: Joel Becker Cc: Changwei Ge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 7e4c9c4738231f4b96e819e31c94b877d2cf5178 Author: Qian Cai Date: Thu Dec 13 08:27:27 2018 -0500 scsi: megaraid: fix out-of-bound array accesses [ Upstream commit c7a082e4242fd8cd21a441071e622f87c16bdacc ] UBSAN reported those with MegaRAID SAS-3 3108, [ 77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1 [ 77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 77.495791] Workqueue: events work_for_cpu_fn [ 77.500154] Call trace: [ 77.502610] dump_backtrace+0x0/0x2c8 [ 77.506279] show_stack+0x24/0x30 [ 77.509604] dump_stack+0x118/0x19c [ 77.513098] ubsan_epilogue+0x14/0x60 [ 77.516765] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 77.521767] mr_update_load_balance_params+0x150/0x158 [megaraid_sas] [ 77.528230] MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas] [ 77.533825] megasas_get_map_info+0x244/0x2f0 [megaraid_sas] [ 77.539505] megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas] [ 77.545794] megasas_init_fw+0x1ab4/0x3518 [megaraid_sas] [ 77.551212] megasas_probe_one+0x2c4/0xbe0 [megaraid_sas] [ 77.556614] local_pci_probe+0x7c/0xf0 [ 77.560365] work_for_cpu_fn+0x34/0x50 [ 77.564118] process_one_work+0x61c/0xf08 [ 77.568129] worker_thread+0x534/0xa70 [ 77.571882] kthread+0x1c8/0x1d0 [ 77.575114] ret_from_fork+0x10/0x1c [ 89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1 [ 89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 89.268903] Workqueue: events_unbound async_run_entry_fn [ 89.274222] Call trace: [ 89.276680] dump_backtrace+0x0/0x2c8 [ 89.280348] show_stack+0x24/0x30 [ 89.283671] dump_stack+0x118/0x19c [ 89.287167] ubsan_epilogue+0x14/0x60 [ 89.290835] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 89.295828] MR_LdRaidGet+0x50/0x58 [megaraid_sas] [ 89.300638] megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas] [ 89.306576] megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas] [ 89.313468] megasas_queue_command+0x398/0x3d0 [megaraid_sas] [ 89.319222] scsi_dispatch_cmd+0x1dc/0x8a8 [ 89.323321] scsi_request_fn+0x8e8/0xdd0 [ 89.327249] __blk_run_queue+0xc4/0x158 [ 89.331090] blk_execute_rq_nowait+0xf4/0x158 [ 89.335449] blk_execute_rq+0xdc/0x158 [ 89.339202] __scsi_execute+0x130/0x258 [ 89.343041] scsi_probe_and_add_lun+0x2fc/0x1488 [ 89.347661] __scsi_scan_target+0x1cc/0x8c8 [ 89.351848] scsi_scan_channel.part.3+0x8c/0xc0 [ 89.356382] scsi_scan_host_selected+0x130/0x1f0 [ 89.361002] do_scsi_scan_host+0xd8/0xf0 [ 89.364927] do_scan_async+0x9c/0x320 [ 89.368594] async_run_entry_fn+0x138/0x420 [ 89.372780] process_one_work+0x61c/0xf08 [ 89.376793] worker_thread+0x13c/0xa70 [ 89.380546] kthread+0x1c8/0x1d0 [ 89.383778] ret_from_fork+0x10/0x1c This is because when populating Driver Map using firmware raid map, all non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later. From drivers/scsi/megaraid/megaraid_sas_base.c , memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS); From drivers/scsi/megaraid/megaraid_sas_fp.c , /* For non existing VDs, iterate to next VD*/ if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1)) continue; However, there are a few places that failed to skip those non-existing VDs due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff, map) and triggered the out-of-bound accesses. Fixes: 51087a8617fe ("megaraid_sas : Extended VD support") Signed-off-by: Qian Cai Acked-by: Sumit Saxena Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 1da52f241f5251225f1594af2a2af3f3f792263d Author: Kevin Barnett Date: Fri Dec 7 16:29:51 2018 -0600 scsi: smartpqi: correct lun reset issues [ Upstream commit 2ba55c9851d74eb015a554ef69ddf2ef061d5780 ] Problem: The Linux kernel takes a logical volume offline after a LUN reset. This is generally accompanied by this message in the dmesg output: Device offlined - not ready after error recovery Root Cause: The root cause is a "quirk" in the timeout handling in the Linux SCSI layer. The Linux kernel places a 30-second timeout on most media access commands (reads and writes) that it send to device drivers. When a media access command times out, the Linux kernel goes into error recovery mode for the LUN that was the target of the command that timed out. Every command that timed out is kept on a list inside of the Linux kernel to be retried later. The kernel attempts to recover the command(s) that timed out by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and TEST UNIT READY commands are successful, the kernel retries the command(s) that timed out. Each SCSI command issued by the kernel has a result field associated with it. This field indicates the final result of the command (success or error). When a command times out, the kernel places a value in this result field indicating that the command timed out. The "quirk" is that after the LUN reset and TEST UNIT READY commands are completed, the kernel checks each command on the timed-out command list before retrying it. If the result field is still "timed out", the kernel treats that command as not having been successfully recovered for a retry. If the number of commands that are in this state are greater than two, the kernel takes the LUN offline. Fix: When our RAIDStack receives a LUN reset, it simply waits until all outstanding commands complete. Generally, all of these outstanding commands complete successfully. Therefore, the fix in the smartpqi driver is to always set the command result field to indicate success when a request completes successfully. This normally isn’t necessary because the result field is always initialized to success when the command is submitted to the driver. So when the command completes successfully, the result field is left untouched. But in this case, the kernel changes the result field behind the driver’s back and then expects the field to be changed by the driver as the commands that timed-out complete. Reviewed-by: Dave Carroll Reviewed-by: Scott Teel Signed-off-by: Kevin Barnett Signed-off-by: Don Brace Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 54e6f64efc14213c2e76ea063645392a955b67e4 Author: Daniel Vetter Date: Wed Dec 19 13:39:09 2018 +0100 sysfs: Disable lockdep for driver bind/unbind files [ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ] This is the much more correct fix for my earlier attempt at: https://lkml.org/lkml/2018/12/10/118 Short recap: - There's not actually a locking issue, it's just lockdep being a bit too eager to complain about a possible deadlock. - Contrary to what I claimed the real problem is recursion on kn->count. Greg pointed me at sysfs_break_active_protection(), used by the scsi subsystem to allow a sysfs file to unbind itself. That would be a real deadlock, which isn't what's happening here. Also, breaking the active protection means we'd need to manually handle all the lifetime fun. - With Rafael we discussed the task_work approach, which kinda works, but has two downsides: It's a functional change for a lockdep annotation issue, and it won't work for the bind file (which needs to get the errno from the driver load function back to userspace). - Greg also asked why this never showed up: To hit this you need to unregister a 2nd driver from the unload code of your first driver. I guess only gpus do that. The bug has always been there, but only with a recent patch series did we add more locks so that lockdep built a chain from unbinding the snd-hda driver to the acpi_video_unregister call. Full lockdep splat: [12301.898799] ============================================ [12301.898805] WARNING: possible recursive locking detected [12301.898811] 4.20.0-rc7+ #84 Not tainted [12301.898815] -------------------------------------------- [12301.898821] bash/5297 is trying to acquire lock: [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80 [12301.898841] but task is already holding lock: [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898856] other info that might help us debug this: [12301.898862] Possible unsafe locking scenario: [12301.898867] CPU0 [12301.898870] ---- [12301.898874] lock(kn->count#39); [12301.898879] lock(kn->count#39); [12301.898883] *** DEADLOCK *** [12301.898891] May be due to missing lock nesting notation [12301.898899] 5 locks held by bash/5297: [12301.898903] #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0 [12301.898915] #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190 [12301.898925] #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898936] #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240 [12301.898950] #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40 [12301.898960] stack backtrace: [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84 [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [12301.898982] Call Trace: [12301.898989] dump_stack+0x67/0x9b [12301.898997] __lock_acquire+0x6ad/0x1410 [12301.899003] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899010] ? find_held_lock+0x2d/0x90 [12301.899017] ? mutex_spin_on_owner+0xe4/0x150 [12301.899023] ? find_held_lock+0x2d/0x90 [12301.899030] ? lock_acquire+0x90/0x180 [12301.899036] lock_acquire+0x90/0x180 [12301.899042] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899049] __kernfs_remove+0x296/0x310 [12301.899055] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899060] ? kernfs_name_hash+0xd/0x80 [12301.899066] ? kernfs_find_ns+0x6c/0x100 [12301.899073] kernfs_remove_by_name_ns+0x3b/0x80 [12301.899080] bus_remove_driver+0x92/0xa0 [12301.899085] acpi_video_unregister+0x24/0x40 [12301.899127] i915_driver_unload+0x42/0x130 [i915] [12301.899160] i915_pci_remove+0x19/0x30 [i915] [12301.899169] pci_device_remove+0x36/0xb0 [12301.899176] device_release_driver_internal+0x185/0x240 [12301.899183] unbind_store+0xaf/0x180 [12301.899189] kernfs_fop_write+0x104/0x190 [12301.899195] __vfs_write+0x31/0x180 [12301.899203] ? rcu_read_lock_sched_held+0x6f/0x80 [12301.899209] ? rcu_sync_lockdep_assert+0x29/0x50 [12301.899216] ? __sb_start_write+0x13c/0x1a0 [12301.899221] ? vfs_write+0x17f/0x1b0 [12301.899227] vfs_write+0xb9/0x1b0 [12301.899233] ksys_write+0x50/0xc0 [12301.899239] do_syscall_64+0x4b/0x180 [12301.899247] entry_SYSCALL_64_after_hwframe+0x49/0xbe [12301.899253] RIP: 0033:0x7f452ac7f7a4 [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4 [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001 [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730 [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d Looking around I've noticed that usb and i2c already handle similar recursion problems, where a sysfs file can unbind the same type of sysfs somewhere else in the hierarchy. Relevant commits are: commit 356c05d58af05d582e634b54b40050c73609617b Author: Alan Stern Date: Mon May 14 13:30:03 2012 -0400 sysfs: get rid of some lockdep false positives commit e9b526fe704812364bca07edd15eadeba163ebfb Author: Alexander Sverdlin Date: Fri May 17 14:56:35 2013 +0200 i2c: suppress lockdep warning on delete_device Implement the same trick for driver bind/unbind. v2: Put the macro into bus.c (Greg). Reviewed-by: Rafael J. Wysocki Cc: Ramalingam C Cc: Arend van Spriel Cc: Andy Shevchenko Cc: Geert Uytterhoeven Cc: Bartosz Golaszewski Cc: Heikki Krogerus Cc: Vivek Gautam Cc: Joe Perches Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit d9c8a99f35c343b7ba7543ebe367df5f0b4e289f Author: Takashi Sakamoto Date: Wed Dec 19 20:00:42 2018 +0900 ALSA: bebob: fix model-id of unit for Apogee Ensemble [ Upstream commit 644b2e97405b0b74845e1d3c2b4fe4c34858062b ] This commit fixes hard-coded model-id for an unit of Apogee Ensemble with a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly known as BridgeCo AG). I note that this model supports three modes in the number of data channels in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is switched by Vendor-dependent AV/C command, like: $ cd linux-firewire-utils $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs) When switching between different mode, the unit disappears from IEEE 1394 bus, then appears on the bus with different combination of stream formats. In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else up to 192.0 kHz. $ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': True, 'chip_ID': 21474898341, 'cmc': True, 'cyc_clk_acc': 100, 'generation': 2, 'imc': True, 'isc': True, 'link_spd': 2, 'max_ROM': 1, 'max_rec': 512, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['HARDWARE_VERSION', 19], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble'], ['VERSION', 5297], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble']]], [ 'DEPENDENT_INFO', [ ['SPECIFIER_ID', 2037], ['VERSION', 1], [(58, 'IMMEDIATE'), 16777159], [(59, 'IMMEDIATE'), 1048576], [(60, 'IMMEDIATE'), 16777159], [(61, 'IMMEDIATE'), 6291456]]]]} Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit 69855b55f1ecd58140ed9d851c92e0c76c0345ea Author: Nikos Tsironis Date: Wed Oct 31 17:53:08 2018 -0400 dm snapshot: Fix excessive memory usage and workqueue stalls [ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ] kcopyd has no upper limit to the number of jobs one can allocate and issue. Under certain workloads this can lead to excessive memory usage and workqueue stalls. For example, when creating multiple dm-snapshot targets with a 4K chunk size and then writing to the origin through the page cache. Syncing the page cache causes a large number of BIOs to be issued to the dm-snapshot origin target, which itself issues an even larger (because of the BIO splitting taking place) number of kcopyd jobs. Running the following test, from the device mapper test suite [1], dmtest run --suite snapshot -n many_snapshots_of_same_volume_N , with 8 active snapshots, results in the kcopyd job slab cache growing to 10G. Depending on the available system RAM this can lead to the OOM killer killing user processes: [463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0 [463.492894] kthreadd cpuset=/ mems_allowed=0 [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3 [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [463.492952] Call Trace: [463.492964] dump_stack+0x7d/0xbb [463.492973] dump_header+0x6b/0x2fc [463.492987] ? lockdep_hardirqs_on+0xee/0x190 [463.493012] oom_kill_process+0x302/0x370 [463.493021] out_of_memory+0x113/0x560 [463.493030] __alloc_pages_slowpath+0xf40/0x1020 [463.493055] __alloc_pages_nodemask+0x348/0x3c0 [463.493067] cache_grow_begin+0x81/0x8b0 [463.493072] ? cache_grow_begin+0x874/0x8b0 [463.493078] fallback_alloc+0x1e4/0x280 [463.493092] kmem_cache_alloc_node+0xd6/0x370 [463.493098] ? copy_process.part.31+0x1c5/0x20d0 [463.493105] copy_process.part.31+0x1c5/0x20d0 [463.493115] ? __lock_acquire+0x3cc/0x1550 [463.493121] ? __switch_to_asm+0x34/0x70 [463.493129] ? kthread_create_worker_on_cpu+0x70/0x70 [463.493135] ? finish_task_switch+0x90/0x280 [463.493165] _do_fork+0xe0/0x6d0 [463.493191] ? kthreadd+0x19f/0x220 [463.493233] kernel_thread+0x25/0x30 [463.493235] kthreadd+0x1bf/0x220 [463.493242] ? kthread_create_on_cpu+0x90/0x90 [463.493248] ret_from_fork+0x3a/0x50 [463.493279] Mem-Info: [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0 [463.493285] active_file:80216 inactive_file:80107 isolated_file:435 [463.493285] unevictable:0 dirty:51266 writeback:109372 unstable:0 [463.493285] slab_reclaimable:31191 slab_unreclaimable:3483521 [463.493285] mapped:526 shmem:4903 pagetables:1759 bounce:0 [463.493285] free:33623 free_pcp:2392 free_cma:0 ... [463.493489] Unreclaimable slab info: [463.493513] Name Used Total [463.493522] bio-6 1028KB 1028KB [463.493525] bio-5 1028KB 1028KB [463.493528] dm_snap_pending_exception 236783KB 243789KB [463.493531] dm_exception 41KB 42KB [463.493534] bio-4 1216KB 1216KB [463.493537] bio-3 439396KB 439396KB [463.493539] kcopyd_job 6973427KB 6973427KB ... [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB Moreover, issuing a large number of kcopyd jobs results in kcopyd hogging the CPU, while processing them. As a result, processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is stalled for long periods of time, hurting performance. Running the aforementioned test we get, in dmesg, messages like the following: [67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s! [67501.195586] Showing busy workqueues and worker pools: [67501.195591] workqueue events: flags=0x0 [67501.195597] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195611] pending: cache_reap [67501.195641] workqueue mm_percpu_wq: flags=0x8 [67501.195645] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195656] pending: vmstat_update [67501.195682] workqueue kblockd: flags=0x18 [67501.195687] pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256 [67501.195698] pending: blk_timeout_work [67501.195753] workqueue kcopyd: flags=0x8 [67501.195757] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195768] pending: do_work [dm_mod] [67501.195802] workqueue kcopyd: flags=0x8 [67501.195806] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195817] pending: do_work [dm_mod] [67501.195834] workqueue kcopyd: flags=0x8 [67501.195838] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195848] pending: do_work [dm_mod] [67501.195881] workqueue kcopyd: flags=0x8 [67501.195885] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195896] pending: do_work [dm_mod] [67501.195920] workqueue kcopyd: flags=0x8 [67501.195924] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256 [67501.195935] in-flight: 67:do_work [dm_mod] [67501.195945] pending: do_work [dm_mod] [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765 The root cause for these issues is the way dm-snapshot uses kcopyd. In particular, the lack of an explicit or implicit limit to the maximum number of in-flight COW jobs. The merging path is not affected because it implicitly limits the in-flight kcopyd jobs to one. Fix these issues by using a semaphore to limit the maximum number of in-flight kcopyd jobs. We grab the semaphore before allocating a new kcopyd job in start_copy() and start_full_bio() and release it after the job finishes in copy_callback(). The initial semaphore value is configurable through a module parameter, to allow fine tuning the maximum number of in-flight COW jobs. Setting this parameter to zero initializes the semaphore to INT_MAX. A default value of 2048 maximum in-flight kcopyd jobs was chosen. This value was decided experimentally as a trade-off between memory consumption, stalling the kernel's workqueues and maintaining a high enough throughput. Re-running the aforementioned test: * Workqueue stalls are eliminated * kcopyd's job slab cache uses a maximum of 130MB * The time taken by the test to write to the snapshot-origin target is reduced from 05m20.48s to 03m26.38s [1] https://github.com/jthornber/device-mapper-test-suite Signed-off-by: Nikos Tsironis Signed-off-by: Ilias Tsitsimpis Signed-off-by: Mike Snitzer Signed-off-by: Sasha Levin commit 7a6da6292435a5245576a5304a3a30f28abd05ea Author: Arnaldo Carvalho de Melo Date: Tue Dec 11 15:00:52 2018 -0300 tools lib subcmd: Don't add the kernel sources to the include path [ Upstream commit ece9804985b57e1ccd83b1fb6288520955a29d51 ] At some point we decided not to directly include kernel sources files when building tools/perf/, but when tools/lib/subcmd/ was forked from tools/perf it somehow ended up adding it via these two lines in its Makefile: CFLAGS += -I$(srctree)/include/uapi CFLAGS += -I$(srctree)/include As $(srctree) points to the kernel sources. Removing those lines and keeping just: CFLAGS += -I$(srctree)/tools/include/ Is enough to build tools/perf and tools/objtool. This fixes the build when building from the sources in environments such as the Android NDK crossbuilding from a fedora:26 system: subcmd-util.h:11:15: error: expected ',' or ';' before 'void' static inline void report(const char *prefix, const char *err, va_list params) ^ In file included from /git/perf/include/uapi/linux/stddef.h:2:0, from /git/perf/include/uapi/linux/posix_types.h:5, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33, from run-command.c:2: subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h file that includes linux/posix_types.h ends up getting the one in the kernel sources causing the breakage. Fix it. Test built tools/objtool/ too. Reported-by: Jiri Olsa Tested-by: Jiri Olsa Cc: Adrian Hunter Cc: Josh Poimboeuf Cc: Namhyung Kim Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library") Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit e9566e9990414e73f5d875307286a75bb9838aa3 Author: Nikos Tsironis Date: Wed Oct 31 17:53:09 2018 -0400 dm kcopyd: Fix bug causing workqueue stalls [ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ] When using kcopyd to run callbacks through dm_kcopyd_do_callback() or submitting copy jobs with a source size of 0, the jobs are pushed directly to the complete_jobs list, which could be under processing by the kcopyd thread. As a result, the kcopyd thread can continue running completed jobs indefinitely, without releasing the CPU, as long as someone keeps submitting new completed jobs through the aforementioned paths. Processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is thus stalled for excessive amounts of time, hurting performance. Running the following test, from the device mapper test suite [1], dmtest run --suite snapshot -n parallel_io_to_many_snaps_N , with 8 active snapshots, we get, in dmesg, messages like the following: [68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s! [68899.949282] Showing busy workqueues and worker pools: [68899.949288] workqueue events: flags=0x0 [68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949306] pending: vmstat_shepherd, cache_reap [68899.949331] workqueue mm_percpu_wq: flags=0x8 [68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949345] pending: vmstat_update [68899.949387] workqueue dm_bufio_cache: flags=0x8 [68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [68899.949400] pending: work_fn [dm_bufio] [68899.949423] workqueue kcopyd: flags=0x8 [68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949437] pending: do_work [dm_mod] [68899.949452] workqueue kcopyd: flags=0x8 [68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949466] in-flight: 13:do_work [dm_mod] [68899.949474] pending: do_work [dm_mod] [68899.949487] workqueue kcopyd: flags=0x8 [68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949501] pending: do_work [dm_mod] [68899.949515] workqueue kcopyd: flags=0x8 [68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949529] pending: do_work [dm_mod] [68899.949541] workqueue kcopyd: flags=0x8 [68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949555] pending: do_work [dm_mod] [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084 Fix this by splitting the complete_jobs list into two parts: A user facing part, named callback_jobs, and one used internally by kcopyd, retaining the name complete_jobs. dm_kcopyd_do_callback() and dispatch_job() now push their jobs to the callback_jobs list, which is spliced to the complete_jobs list once, every time the kcopyd thread wakes up. This prevents kcopyd from hogging the CPU indefinitely and causing workqueue stalls. Re-running the aforementioned test: * Workqueue stalls are eliminated * The maximum writing time among all targets is reduced from 09m37.10s to 06m04.85s and the total run time of the test is reduced from 10m43.591s to 7m19.199s [1] https://github.com/jthornber/device-mapper-test-suite Signed-off-by: Nikos Tsironis Signed-off-by: Ilias Tsitsimpis Signed-off-by: Mike Snitzer Signed-off-by: Sasha Levin commit 8691aa52fd2c943437dbf959c6c992ca1b72ab15 Author: Arnaldo Carvalho de Melo Date: Thu Dec 6 13:52:13 2018 -0300 perf parse-events: Fix unchecked usage of strncpy() [ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ] The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it. This fixes this warning on an Alpine Linux Edge system with gcc 8.2: util/parse-events.c: In function 'print_symbol_events': util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2508:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2511:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Fixes: 947b4ad1d198 ("perf list: Fix max event string size") Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit f9e281df94bac2e6538d80c06b8bf01beb5759d7 Author: Arnaldo Carvalho de Melo Date: Thu Dec 6 11:29:48 2018 -0300 perf svghelper: Fix unchecked usage of strncpy() [ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ] The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it. In this specific case this would only happen if fgets() was buggy, as its man page states that it should read one less byte than the size of the destination buffer, so that it can put the nul byte at the end of it, so it would never copy 255 non-nul chars, as fgets reads into the orig buffer at most 254 non-nul chars and terminates it. But lets just switch to strlcpy to keep the original intent and silence the gcc 8.2 warning. This fixes this warning on an Alpine Linux Edge system with gcc 8.2: In function 'cpu_model', inlined from 'svg_cpu_box' at util/svghelper.c:378:2: util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation] strncpy(cpu_m, &buf[13], 255); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Adrian Hunter Cc: Jiri Olsa Cc: Namhyung Kim Cc: Arjan van de Ven Fixes: f48d55ce7871 ("perf: Add a SVG helper library file") Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit 9da7dfdb8d97ea40cb4db8f722a78d6db3dff34f Author: Adrian Hunter Date: Mon Nov 26 14:12:52 2018 +0200 perf intel-pt: Fix error with config term "pt=0" [ Upstream commit 1c6f709b9f96366cc47af23c05ecec9b8c0c392d ] Users should never use 'pt=0', but if they do it may give a meaningless error: $ perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u). Fix that by forcing 'pt=1'. Committer testing: # perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u). /bin/dmesg | grep -i perf may provide additional information. # perf record -e intel_pt/pt=0/u uname pt=0 doesn't make sense, forcing pt=1 Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data ] # Signed-off-by: Adrian Hunter Tested-by: Arnaldo Carvalho de Melo Cc: Jiri Olsa Link: http://lkml.kernel.org/r/b7c5b4e5-9497-10e5-fd43-5f3e4a0fe51d@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit 80c8e52837206e5c61cffd35183798b64e659675 Author: Sergey Senozhatsky Date: Thu Dec 13 13:58:39 2018 +0900 tty/serial: do not free trasnmit buffer page under port lock [ Upstream commit d72402145ace0697a6a9e8e75a3de5bf3375f78d ] LKP has hit yet another circular locking dependency between uart console drivers and debugobjects [1]: CPU0 CPU1 rhltable_init() __init_work() debug_object_init uart_shutdown() /* db->lock */ /* uart_port->lock */ debug_print_object() free_page() printk() call_console_drivers() debug_check_no_obj_freed() /* uart_port->lock */ /* db->lock */ debug_print_object() So there are two dependency chains: uart_port->lock -> db->lock And db->lock -> uart_port->lock This particular circular locking dependency can be addressed in several ways: a) One way would be to move debug_print_object() out of db->lock scope and, thus, break the db->lock -> uart_port->lock chain. b) Another one would be to free() transmit buffer page out of db->lock in UART code; which is what this patch does. It makes sense to apply a) and b) independently: there are too many things going on behind free(), none of which depend on uart_port->lock. The patch fixes transmit buffer page free() in uart_shutdown() and, additionally, in uart_port_startup() (as was suggested by Dmitry Safonov). [1] https://lore.kernel.org/lkml/20181211091154.GL23332@shao2-debian/T/#u Signed-off-by: Sergey Senozhatsky Reviewed-by: Petr Mladek Acked-by: Peter Zijlstra (Intel) Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Andrew Morton Cc: Waiman Long Cc: Dmitry Safonov Cc: Steven Rostedt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit 20f33f37b0b0a89bcc20f9c11cefbdf8e7db80ae Author: Jonas Danielsson Date: Fri Oct 19 16:40:05 2018 +0200 mmc: atmel-mci: do not assume idle after atmci_request_end [ Upstream commit ae460c115b7aa50c9a36cf78fced07b27962c9d0 ] On our AT91SAM9260 board we use the same sdio bus for wifi and for the sd card slot. This caused the atmel-mci to give the following splat on the serial console: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44 Modules linked in: CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 #14 Hardware name: Atmel AT91SAM9 [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (__warn+0xd8/0xf4) [] (__warn) from [] (warn_slowpath_null+0x1c/0x24) [] (warn_slowpath_null) from [] (atmci_send_command+0x24/0x44) [] (atmci_send_command) from [] (atmci_start_request+0x1f4/0x2dc) [] (atmci_start_request) from [] (atmci_request+0xf0/0x164) [] (atmci_request) from [] (mmc_start_request+0x280/0x2d0) [] (mmc_start_request) from [] (mmc_start_areq+0x230/0x330) [] (mmc_start_areq) from [] (mmc_blk_issue_rw_rq+0xc4/0x310) [] (mmc_blk_issue_rw_rq) from [] (mmc_blk_issue_rq+0x118/0x5ac) [] (mmc_blk_issue_rq) from [] (mmc_queue_thread+0xc4/0x118) [] (mmc_queue_thread) from [] (kthread+0x100/0x118) [] (kthread) from [] (ret_from_fork+0x14/0x34) ---[ end trace 594371ddfa284bd6 ]--- This is: WARN_ON(host->cmd); This was fixed on our board by letting atmci_request_end determine what state we are in. Instead of unconditionally setting it to STATE_IDLE on STATE_END_REQUEST. Signed-off-by: Jonas Danielsson Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin commit 4f67ca0965e62df8c8be9bf2893a43327a8b3f66 Author: Masahiro Yamada Date: Tue Dec 11 20:00:45 2018 +0900 kconfig: fix memory leak when EOF is encountered in quotation [ Upstream commit fbac5977d81cb2b2b7e37b11c459055d9585273c ] An unterminated string literal followed by new line is passed to the parser (with "multi-line strings not supported" warning shown), then handled properly there. On the other hand, an unterminated string literal at end of file is never passed to the parser, then results in memory leak. [Test Code] ----------(Kconfig begin)---------- source "Kconfig.inc" config A bool "a" -----------(Kconfig end)----------- --------(Kconfig.inc begin)-------- config B bool "b\No new line at end of file ---------(Kconfig.inc end)--------- [Summary from Valgrind] Before the fix: LEAK SUMMARY: definitely lost: 16 bytes in 1 blocks ... After the fix: LEAK SUMMARY: definitely lost: 0 bytes in 0 blocks ... Eliminate the memory leak path by handling this case. Of course, such a Kconfig file is wrong already, so I will add an error message later. Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin commit 7ff335ee509d50a77a626aef22413d6b02994625 Author: Masahiro Yamada Date: Tue Dec 11 20:00:44 2018 +0900 kconfig: fix file name and line number of warn_ignored_character() [ Upstream commit 77c1c0fa8b1477c5799bdad65026ea5ff676da44 ] Currently, warn_ignore_character() displays invalid file name and line number. The lexer should use current_file->name and yylineno, while the parser should use zconf_curname() and zconf_lineno(). This difference comes from that the lexer is always going ahead of the parser. The parser needs to look ahead one token to make a shift/reduce decision, so the lexer is requested to scan more text from the input file. This commit fixes the warning message from warn_ignored_character(). [Test Code] ----(Kconfig begin)---- / -----(Kconfig end)----- [Output] Before the fix: :0:warning: ignoring unsupported character '/' After the fix: Kconfig:1:warning: ignoring unsupported character '/' Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin commit 73319e8d4d1c18f0e38a7be3c2062d7e870278ac Author: Lucas Stach Date: Thu Nov 15 15:30:26 2018 +0100 clk: imx6q: reset exclusive gates on init [ Upstream commit f7542d817733f461258fd3a47d77da35b2d9fc81 ] The exclusive gates may be set up in the wrong way by software running before the clock driver comes up. In that case the exclusive setup is locked in its initial state, as the complementary function can't be activated without disabling the initial setup first. To avoid this lock situation, reset the exclusive gates to the off state and allow the kernel to provide the proper setup. Signed-off-by: Lucas Stach Reviewed-by: Dong Aisheng Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin commit e562057290b8be55f4eb26ad3a44be3193985e10 Author: David Disseldorp Date: Wed Dec 5 13:18:34 2018 +0100 scsi: target: use consistent left-aligned ASCII INQUIRY data [ Upstream commit 0de263577de5d5e052be5f4f93334e63cc8a7f0b ] spc5r17.pdf specifies: 4.3.1 ASCII data field requirements ASCII data fields shall contain only ASCII printable characters (i.e., code values 20h to 7Eh) and may be terminated with one or more ASCII null (00h) characters. ASCII data fields described as being left-aligned shall have any unused bytes at the end of the field (i.e., highest offset) and the unused bytes shall be filled with ASCII space characters (20h). LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR IDENTIFICATION field in the INQUIRY Device Identification VPD Page are zero-terminated/zero-padded. Fix this inconsistency by using space-padding for all of the above fields. Signed-off-by: David Disseldorp Reviewed-by: Christoph Hellwig Reviewed-by: Bryant G. Ly Reviewed-by: Lee Duncan Reviewed-by: Hannes Reinecke Reviewed-by: Roman Bolshakov Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 6d814b145bdaf03b12585b96418561a2db3c32b4 Author: yupeng Date: Wed Dec 5 18:56:28 2018 -0800 net: call sk_dst_reset when set SO_DONTROUTE [ Upstream commit 0fbe82e628c817e292ff588cd5847fc935e025f2 ] after set SO_DONTROUTE to 1, the IP layer should not route packets if the dest IP address is not in link scope. But if the socket has cached the dst_entry, such packets would be routed until the sk_dst_cache expires. So we should clean the sk_dst_cache when a user set SO_DONTROUTE option. Below are server/client python scripts which could reprodue this issue: server side code: ========================================================================== import socket import struct import time s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(('0.0.0.0', 9000)) s.listen(1) sock, addr = s.accept() sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1)) while True: sock.send(b'foo') time.sleep(1) ========================================================================== client side code: ========================================================================== import socket import time s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('server_address', 9000)) while True: data = s.recv(1024) print(data) ========================================================================== Signed-off-by: yupeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit bb457a479c9ba5b18828a56acaf6b617010bc138 Author: Nathan Chancellor Date: Thu Oct 18 16:03:06 2018 -0400 media: firewire: Fix app_info parameter type in avc_ca{,_app}_info [ Upstream commit b2e9a4eda11fd2cb1e6714e9ad3f455c402568ff ] Clang warns: drivers/media/firewire/firedtv-avc.c:999:45: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_APP_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1000:45: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_APP_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1040:44: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_CA_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1041:44: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_CA_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ 4 warnings generated. Change app_info's type to unsigned char to match the type of the member msg in struct ca_msg, which is the only thing passed into the app_info parameter in this function. Link: https://github.com/ClangBuiltLinux/linux/issues/105 Signed-off-by: Nathan Chancellor Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin commit dc21489da678693c44a7feb8978c19006da30d3b Author: Breno Leitao Date: Fri Nov 23 14:30:11 2018 -0200 powerpc/pseries/cpuidle: Fix preempt warning [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable) [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160 [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300 [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500 [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160 [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin commit 8117508a9c4b38c941c0fe0d2cdeacf54cd317c7 Author: Breno Leitao Date: Thu Nov 8 15:12:42 2018 -0200 powerpc/xmon: Fix invocation inside lock region [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never get into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token(), thus trying to get the same lock, at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This would allow xmon to be possible invoked independent of devtree_lock being held or not. Signed-off-by: Breno Leitao Reviewed-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin commit fffdbf586866e9500b53c9d4b061d3983720375a Author: Joel Fernandes (Google) Date: Sat Nov 3 16:38:18 2018 -0700 pstore/ram: Do not treat empty buffers as valid [ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ] The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it is does not seem like the right thing to do and could lead to future bugs so lets avoid that. It also prevents misleading prints in the logs which claim the buffer is valid. I got something like: found existing buffer, size 0, start 0 When I was expecting: no valid data in buffer (sig = ...) This bails out early (and reports with pr_debug()), since it's an acceptable state. Signed-off-by: Joel Fernandes (Google) Co-developed-by: Kees Cook Signed-off-by: Kees Cook Signed-off-by: Sasha Levin commit 93611460f355f43ddad0338c05c24b9f699832da Author: Daniel Santos Date: Fri Oct 19 03:30:20 2018 -0500 jffs2: Fix use of uninitialized delayed_work, lockdep breakage [ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ] jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos Reviewed-by: Hou Tao Signed-off-by: Boris Brezillon Signed-off-by: Sasha Levin commit 6b32535988df09ed4416a10c7b291b0f799c76a1 Author: Chuck Lever Date: Sun Nov 25 17:13:08 2018 -0500 rxe: IB_WR_REG_MR does not capture MR's iova field [ Upstream commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 ] FRWR memory registration is done with a series of calls and WRs. 1. ULP invokes ib_dma_map_sg() 2. ULP invokes ib_map_mr_sg() 3. ULP posts an IB_WR_REG_MR on the Send queue Step 2 generates an iova. It is permissible for ULPs to change this iova (with certain restrictions) between steps 2 and 3. rxe_map_mr_sg captures the MR's iova but later when rxe processes the REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova after step 2 but before step 3, rxe never captures that change. When the remote sends an RDMA Read targeting that MR, rxe looks up the R_key, but the altered iova does not match the iova stored in the MR, causing the RDMA Read request to fail. Reported-by: Anna Schumaker Signed-off-by: Chuck Lever Reviewed-by: Sagi Grimberg Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin commit 62044cba2acb26f13176434e554ef230976dbc08 Author: Ondrej Mosnacek Date: Fri Nov 16 14:12:02 2018 +0100 selinux: always allow mounting submounts [ Upstream commit 2cbdcb882f97a45f7475c67ac6257bbc16277dfe ] If a superblock has the MS_SUBMOUNT flag set, we should always allow mounting it. These mounts are done automatically by the kernel either as part of mounting some parent mount (e.g. debugfs always mounts tracefs under "tracing" for compatibility) or they are mounted automatically as needed on subdirectory accesses (e.g. NFS crossmnt mounts). Since such automounts are either an implicit consequence of the parent mount (which is already checked) or they can happen during regular accesses (where it doesn't make sense to check against the current task's context), the mount permission check should be skipped for them. Without this patch, attempts to access contents of an automounted directory can cause unexpected SELinux denials. In the current kernel tree, the MS_SUBMOUNT flag is set only via vfs_submount(), which is called only from the following places: - AFS, when automounting special "symlinks" referencing other cells - CIFS, when automounting "referrals" - NFS, when automounting subtrees - debugfs, when automounting tracefs In all cases the submounts are meant to be transparent to the user and it makes sense that if mounting the master is allowed, then so should be the automounts. Note that CAP_SYS_ADMIN capability checking is already skipped for (SB_KERNMOUNT|SB_SUBMOUNT) in: - sget_userns() in fs/super.c: if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !(type->fs_flags & FS_USERNS_MOUNT) && !capable(CAP_SYS_ADMIN)) return ERR_PTR(-EPERM); - sget() in fs/super.c: /* Ensure the requestor has permissions over the target filesystem */ if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !ns_capable(user_ns, CAP_SYS_ADMIN)) return ERR_PTR(-EPERM); Verified internally on patched RHEL 7.6 with a reproducer using NFS+httpd and selinux-tesuite. Fixes: 93faccbbfa95 ("fs: Better permission checking for submounts") Signed-off-by: Ondrej Mosnacek Signed-off-by: Paul Moore Signed-off-by: Sasha Levin commit bd37f21e9f85e598be452b4817292bbe601efcd0 Author: Anders Roxell Date: Wed Oct 17 17:26:22 2018 +0200 arm64: perf: set suppress_bind_attrs flag to true [ Upstream commit 81e9fa8bab381f8b6eb04df7cdf0f71994099bd4 ] The armv8_pmuv3 driver doesn't have a remove function, and when the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, the following Call trace can be seen. [ 1.424287] Failed to register pmu: armv8_pmuv3, reason -17 [ 1.424870] WARNING: CPU: 0 PID: 1 at ../kernel/events/core.c:11771 perf_event_sysfs_init+0x98/0xdc [ 1.425220] Modules linked in: [ 1.425531] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.19.0-rc7-next-20181012-00003-ge7a97b1ad77b-dirty #35 [ 1.425951] Hardware name: linux,dummy-virt (DT) [ 1.426212] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 1.426458] pc : perf_event_sysfs_init+0x98/0xdc [ 1.426720] lr : perf_event_sysfs_init+0x98/0xdc [ 1.426908] sp : ffff00000804bd50 [ 1.427077] x29: ffff00000804bd50 x28: ffff00000934e078 [ 1.427429] x27: ffff000009546000 x26: 0000000000000007 [ 1.427757] x25: ffff000009280710 x24: 00000000ffffffef [ 1.428086] x23: ffff000009408000 x22: 0000000000000000 [ 1.428415] x21: ffff000009136008 x20: ffff000009408730 [ 1.428744] x19: ffff80007b20b400 x18: 000000000000000a [ 1.429075] x17: 0000000000000000 x16: 0000000000000000 [ 1.429418] x15: 0000000000000400 x14: 2e79726f74636572 [ 1.429748] x13: 696420656d617320 x12: 656874206e692065 [ 1.430060] x11: 6d616e20656d6173 x10: 2065687420687469 [ 1.430335] x9 : ffff00000804bd50 x8 : 206e6f7361657220 [ 1.430610] x7 : 2c3376756d705f38 x6 : ffff00000954d7ce [ 1.430880] x5 : 0000000000000000 x4 : 0000000000000000 [ 1.431226] x3 : 0000000000000000 x2 : ffffffffffffffff [ 1.431554] x1 : 4d151327adc50b00 x0 : 0000000000000000 [ 1.431868] Call trace: [ 1.432102] perf_event_sysfs_init+0x98/0xdc [ 1.432382] do_one_initcall+0x6c/0x1a8 [ 1.432637] kernel_init_freeable+0x1bc/0x280 [ 1.432905] kernel_init+0x18/0x160 [ 1.433115] ret_from_fork+0x10/0x18 [ 1.433297] ---[ end trace 27fd415390eb9883 ]--- Rework to set suppress_bind_attrs flag to avoid removing the device when CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, since there's no real reason to remove the armv8_pmuv3 driver. Cc: Arnd Bergmann Co-developed-by: Arnd Bergmann Signed-off-by: Anders Roxell Signed-off-by: Will Deacon Signed-off-by: Sasha Levin commit 8ac4ad063e1a32e321a7a0e6ea9fd305a6ecd033 Author: Maciej W. Rozycki Date: Tue Nov 13 22:42:44 2018 +0000 MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur [ Upstream commit e4849aff1e169b86c561738daf8ff020e9de1011 ] The Broadcom SiByte BCM1250, BCM1125, and BCM1125H SOCs have an onchip DRAM controller that supports memory amounts of up to 16GiB, and due to how the address decoder has been wired in the SOC any memory beyond 1GiB is actually mapped starting from 4GiB physical up, that is beyond the 32-bit addressable limit[1]. Consequently if the maximum amount of memory has been installed, then it will span up to 19GiB. Many of the evaluation boards we support that are based on one of these SOCs have their memory soldered and the amount present fits in the 32-bit address range. The BCM91250A SWARM board however has actual DIMM slots and accepts, depending on the peripherals revision of the SOC, up to 4GiB or 8GiB of memory in commercially available JEDEC modules[2]. I believe this is also the case with the BCM91250C2 LittleSur board. This means that up to either 3GiB or 7GiB of memory requires 64-bit addressing to access. I believe the BCM91480B BigSur board, which has the BCM1480 SOC instead, accepts at least as much memory, although I have no documentation or actual hardware available to verify that. Both systems have PCI slots installed for use by any PCI option boards, including ones that only support 32-bit addressing (additionally the 32-bit PCI host bridge of the BCM1250, BCM1125, and BCM1125H SOCs limits addressing to 32-bits), and there is no IOMMU available. Therefore for PCI DMA to work in the presence of memory beyond enable swiotlb for the affected systems. All the other SOC onchip DMA devices use 40-bit addressing and therefore can address the whole memory, so only enable swiotlb if PCI support and support for DMA beyond 4GiB have been both enabled in the configuration of the kernel. This shows up as follows: Broadcom SiByte BCM1250 B2 @ 800 MHz (SB1 rev 2) Board type: SiByte BCM91250A (SWARM) Determined physical RAM map: memory: 000000000fe7fe00 @ 0000000000000000 (usable) memory: 000000001ffffe00 @ 0000000080000000 (usable) memory: 000000000ffffe00 @ 00000000c0000000 (usable) memory: 0000000087fffe00 @ 0000000100000000 (usable) software IO TLB: mapped [mem 0xcbffc000-0xcfffc000] (64MB) in the bootstrap log and removes failures like these: defxx 0000:02:00.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi0: Receive buffer allocation failed fddi0: Adapter open failed! IP-Config: Failed to open fddi0 defxx 0000:09:08.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi1: Receive buffer allocation failed fddi1: Adapter open failed! IP-Config: Failed to open fddi1 when memory beyond 4GiB is handed out to devices that can only do 32-bit addressing. This updates commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32."). References: [1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R, Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview", "Memory Map", pp. 34-38 [2] "BCM91250A User Manual", Revision 91250A-UM100-R, Broadcom Corporation, 18 May 2004, Section 3: "Physical Description", "Supported DRAM", p. 23 Signed-off-by: Maciej W. Rozycki [paul.burton@mips.com: Remove GPL text from dma.c; SPDX tag covers it] Signed-off-by: Paul Burton Reviewed-by: Christoph Hellwig Patchwork: https://patchwork.linux-mips.org/patch/21108/ References: cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin commit 1871002dd214db617f2e9db15224938130abb14a Author: Takashi Sakamoto Date: Tue Nov 13 12:01:30 2018 +0900 ALSA: oxfw: add support for APOGEE duet FireWire [ Upstream commit fba43f454cdf9caa3185219d116bd2a6e6354552 ] This commit adds support for APOGEE duet FireWire, launched 2007, already discontinued. This model uses Oxford Semiconductor FW971 as its communication engine. Below is information on Configuration ROM of this unit. The unit supports some AV/C commands defined by Audio subunit specification and vendor dependent commands. $ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': False, 'chip_ID': 42949742248, 'cmc': False, 'cyc_clk_acc': 255, 'generation': 0, 'imc': False, 'isc': True, 'link_spd': 3, 'max_ROM': 0, 'max_rec': 64, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 122333], ['DESCRIPTOR', 'Duet'], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 122333], ['DESCRIPTOR', 'Duet']]]]} Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit e97701602266079b8c99cf97c553790908d9c86a Author: Anders Roxell Date: Tue Oct 30 12:35:44 2018 +0100 serial: set suppress_bind_attrs flag only if builtin [ Upstream commit 646097940ad35aa2c1f2012af932d55976a9f255 ] When the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init). devtmpfs gets killed because we try to remove a file and decrement the wb reference count before the noop_backing_device_info gets initialized. [ 0.332075] Serial: AMBA PL011 UART driver [ 0.485276] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, base_baud = 0) is a PL011 rev1 [ 0.502382] console [ttyAMA0] enabled [ 0.515710] Unable to handle kernel paging request at virtual address 0000800074c12000 [ 0.516053] Mem abort info: [ 0.516222] ESR = 0x96000004 [ 0.516417] Exception class = DABT (current EL), IL = 32 bits [ 0.516641] SET = 0, FnV = 0 [ 0.516826] EA = 0, S1PTW = 0 [ 0.516984] Data abort info: [ 0.517149] ISV = 0, ISS = 0x00000004 [ 0.517339] CM = 0, WnR = 0 [ 0.517553] [0000800074c12000] user address but active_mm is swapper [ 0.517928] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 0.518305] Modules linked in: [ 0.518839] CPU: 0 PID: 13 Comm: kdevtmpfs Not tainted 4.19.0-rc5-next-20180928-00002-g2ba39ab0cd01-dirty #82 [ 0.519307] Hardware name: linux,dummy-virt (DT) [ 0.519681] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 0.519959] pc : __destroy_inode+0x94/0x2a8 [ 0.520212] lr : __destroy_inode+0x78/0x2a8 [ 0.520401] sp : ffff0000098c3b20 [ 0.520590] x29: ffff0000098c3b20 x28: 00000000087a3714 [ 0.520904] x27: 0000000000002000 x26: 0000000000002000 [ 0.521179] x25: ffff000009583000 x24: 0000000000000000 [ 0.521467] x23: ffff80007bb52000 x22: ffff80007bbaa7c0 [ 0.521737] x21: ffff0000093f9338 x20: 0000000000000000 [ 0.522033] x19: ffff80007bbb05d8 x18: 0000000000000400 [ 0.522376] x17: 0000000000000000 x16: 0000000000000000 [ 0.522727] x15: 0000000000000400 x14: 0000000000000400 [ 0.523068] x13: 0000000000000001 x12: 0000000000000001 [ 0.523421] x11: 0000000000000000 x10: 0000000000000970 [ 0.523749] x9 : ffff0000098c3a60 x8 : ffff80007bbab190 [ 0.524017] x7 : ffff80007bbaa880 x6 : 0000000000000c88 [ 0.524305] x5 : ffff0000093d96c8 x4 : 61c8864680b583eb [ 0.524567] x3 : ffff0000093d6180 x2 : ffffffffffffffff [ 0.524872] x1 : 0000800074c12000 x0 : 0000800074c12000 [ 0.525207] Process kdevtmpfs (pid: 13, stack limit = 0x(____ptrval____)) [ 0.525529] Call trace: [ 0.525806] __destroy_inode+0x94/0x2a8 [ 0.526108] destroy_inode+0x34/0x88 [ 0.526370] evict+0x144/0x1c8 [ 0.526636] iput+0x184/0x230 [ 0.526871] dentry_unlink_inode+0x118/0x130 [ 0.527152] d_delete+0xd8/0xe0 [ 0.527420] vfs_unlink+0x240/0x270 [ 0.527665] handle_remove+0x1d8/0x330 [ 0.527875] devtmpfsd+0x138/0x1c8 [ 0.528085] kthread+0x14c/0x158 [ 0.528291] ret_from_fork+0x10/0x18 [ 0.528720] Code: 92800002 aa1403e0 d538d081 8b010000 (c85f7c04) [ 0.529367] ---[ end trace 5a3dee47727f877c ]--- Rework to set suppress_bind_attrs flag to avoid removing the device when CONFIG_DEBUG_TEST_DRIVER_REMOVE=y. This applies for pic32_uart and xilinx_uartps as well. Co-developed-by: Arnd Bergmann Signed-off-by: Arnd Bergmann Signed-off-by: Anders Roxell Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit db16eb55395dc14da29a3e385140f195e57be546 Author: Anders Roxell Date: Tue Oct 30 12:35:45 2018 +0100 writeback: don't decrement wb->refcnt if !wb->bdi [ Upstream commit 347a28b586802d09604a149c1a1f6de5dccbe6fa ] This happened while running in qemu-system-aarch64, the AMBA PL011 UART driver when enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE. arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init), devtmpfs' handle_remove() crashes because the reference count is a NULL pointer only because wb->bdi hasn't been initialized yet. Rework so that wb_put have an extra check if wb->bdi before decrement wb->refcnt and also add a WARN_ON_ONCE to get a warning if it happens again in other drivers. Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Co-developed-by: Arnd Bergmann Signed-off-by: Arnd Bergmann Signed-off-by: Anders Roxell Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin commit de390c264e9c31ab3b8c1be0e4db464c3da9a5c5 Author: Miroslav Lichvar Date: Tue Oct 23 14:37:39 2018 +0200 e1000e: allow non-monotonic SYSTIM readings [ Upstream commit e1f65b0d70e9e5c80e15105cd96fa00174d7c436 ] It seems with some NICs supported by the e1000e driver a SYSTIM reading may occasionally be few microseconds before the previous reading and if enabled also pass e1000e_sanitize_systim() without reaching the maximum number of rereads, even if the function is modified to check three consecutive readings (i.e. it doesn't look like a double read error). This causes an underflow in the timecounter and the PHC time jumps hours ahead. This was observed on 82574, I217 and I219. The fastest way to reproduce it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl on the PHC. Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of timecounter_read() in order to allow non-monotonic SYSTIM readings and prevent the PHC from jumping. Cc: Richard Cochran Signed-off-by: Miroslav Lichvar Acked-by: Jacob Keller Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin commit dfc711a4483991e174f989711b869aaf5df93e8b Author: João Paulo Rechi Vita Date: Wed Oct 31 17:21:26 2018 -0700 platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey [ Upstream commit 78f3ac76d9e5219589718b9e4733bee21627b3f5 ] In the past, Asus firmwares would change the panel backlight directly through the EC when the display off hotkey (Fn+F7) was pressed, and only notify the OS of such change, with 0x33 when the LCD was ON and 0x34 when the LCD was OFF. These are currently mapped to KEY_DISPLAYTOGGLE and KEY_DISPLAY_OFF, respectively. Most recently the EC on Asus most machines lost ability to toggle the LCD backlight directly, but unless the OS informs the firmware it is going to handle the display toggle hotkey events, the firmware still tries change the brightness through the EC, to no effect. The end result is a long list (at Endless we counted 11) of Asus laptop models where the display toggle hotkey does not perform any action. Our firmware engineers contacts at Asus were surprised that there were still machines out there with the old behavior. Calling WMNB(ASUS_WMI_DEVID_BACKLIGHT==0x00050011, 2) on the _WDG device tells the firmware that it should let the OS handle the display toggle event, in which case it will simply notify the OS of a key press with 0x35, as shown by the DSDT excerpts bellow. Scope (_SB) { (...) Device (ATKD) { (...) Name (_WDG, Buffer (0x28) { /* 0000 */ 0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11, /* 0008 */ 0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66, /* 0010 */ 0x4E, 0x42, 0x01, 0x02, 0x35, 0xBB, 0x3C, 0x0B, /* 0018 */ 0xC2, 0xE3, 0xED, 0x45, 0x91, 0xC2, 0x4C, 0x5A, /* 0020 */ 0x6D, 0x19, 0x5D, 0x1C, 0xFF, 0x00, 0x01, 0x08 }) Method (WMNB, 3, Serialized) { CreateDWordField (Arg2, Zero, IIA0) CreateDWordField (Arg2, 0x04, IIA1) Local0 = (Arg1 & 0xFFFFFFFF) (...) If ((Local0 == 0x53564544)) { (...) If ((IIA0 == 0x00050011)) { If ((IIA1 == 0x02)) { ^^PCI0.SBRG.EC0.SPIN (0x72, One) ^^PCI0.SBRG.EC0.BLCT = One } Return (One) } } (...) } (...) } (...) } (...) Scope (_SB.PCI0.SBRG.EC0) { (...) Name (BLCT, Zero) (...) Method (_Q10, 0, NotSerialized) // _Qxx: EC Query { If ((BLCT == Zero)) { Local0 = One Local0 = RPIN (0x72) Local0 ^= One SPIN (0x72, Local0) If (ATKP) { Local0 = (0x34 - Local0) ^^^^ATKD.IANE (Local0) } } ElseIf ((BLCT == One)) { If (ATKP) { ^^^^ATKD.IANE (0x35) } } } (...) } Signed-off-by: João Paulo Rechi Vita Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin commit 2af7450e84355619396a9c7c5c3caac2231d3565 Author: David Ahern Date: Sat Jan 5 07:35:04 2019 -0800 ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses [ Upstream commit d4a7e9bb74b5aaf07b89f6531c080b1130bdf019 ] I realized the last patch calls dev_get_by_index_rcu in a branch not holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock. Fixes: ec90ad334986 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address") Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dbbbd01ea8aef42c26215f6c749955a006b9801b Author: David Ahern Date: Fri Jan 4 16:58:15 2019 -0800 ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address [ Upstream commit ec90ad334986fa5856d11dd272f7f22fa86c55c4 ] Similar to c5ee066333eb ("ipv6: Consider sk_bound_dev_if when binding a socket to an address"), binding a socket to v4 mapped addresses needs to consider if the socket is bound to a device. This problem also exists from the beginning of git history. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit eb0e073f07bdb97226681e4e7e41457e72f2f4ae Author: Kai-Heng Feng Date: Wed Jan 2 14:45:07 2019 +0800 r8169: Add support for new Realtek Ethernet [ Upstream commit 36352991835ce99e46b4441dd0eb6980f9a83e8f ] There are two new Realtek Ethernet devices which are re-branded r8168h. Add the IDs to to support them. Signed-off-by: Kai-Heng Feng Reviewed-by: Heiner Kallweit Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman